Re: bswap PRs 69714, 67781
Hi Bernd, First of all, my apologize for the late reply. I was in holidays the past week to celebrate Chinese new year. On Friday, February 12, 2016 05:28:43 PM Bernd Schmidt wrote: > PR69714 is an issue where the bswap pass makes an incorrect > transformation on big-endian targets. The source has a 32-bit bswap, but > PA doesn't have a pattern for that. Still, we recognize that there is a > 16-bit bswap involved, and generate code for that - loading the halfword > at offset 2 from the original memory, as per the proper big-endian > correction. > > The problem is that we recognized the rotation of the _high_ part, which > is at offset 0 on big-endian. The symbolic number is 0x0304, rather than > 0x0102 as it should be. Only the latter form should ever be matched. Which is exactly what the patch for PR67781 was set out to do (see the if (BYTES_BIG_ENDIAN) block in find_bswap_or_nop. The reason why the offset is wrong is due to another if (BYTES_BIG_ENDIAN) block in bswap_replace. I will check the testcase added with that latter block, my guess is that the change was trying to fix a similar issue to PR67781 and PR69714. When removing it the load in avcrc is done without an offset. I should have run the full testsuite also on a big endian system instead of a few selected testcases and a bootstrap in addition to the little endian bootstrap+testsuite. Lesson learned. > The > problem is caused by the patch for PR67781, which was intended to solve > a different big-endian problem. Unfortunately, I think it is based on an > incorrect analysis. > > The real issue with the PR67781 testcase is in fact the masking loop, > identified by Thomas in comment #7 for 67781. > > for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++) >; > n->range = rsize; > > If we have a value of 0x01020304, but a range of 5, it means that > there's an "invisible" high-order byte that we don't care about. On > little-endian, we can just ignore it. On big-endian, this implies that > the data we're interested in is located at an offset. The code that does > the replacements does not use the offset or bytepos fields, it assumes > that the bytepos always matches that of the load instruction. Yes, but the change in find_bswap_or_nop aims at checking that we have 0x05040302 or 0x02030405 for big endian targets and 0x04030201 or 0x01020304 for little endian targets. Before the "if (rsize < n->range)" block, cmpnop and cmpxchg are respectively 0x0504030201 and 0x0102030405. Then for big endian it will only keep the 4 least significant symbolic bytes of cmpxchg (if performs a bitwise and) and the 4 most significant symbolic bytes of cmpnop (it performs a right shift) so you'd get 0x05040302 for cmpnop and 0x02030405 for cmpxchg. Both would translate to a load at offset 0, and then a byteswap for the latter. As said earlier, the problem is in bswap_replace which tries to adjust the address of the load for big endian targets by adding a load offset. With the change in find_bswap_or_nop, an offset is never needed because only pattern that correspond to a load at offset 0 are recognized. I kept for GCC 7 to change that to allow offset and recognize all sub-load and sub-bswap. > The only > offset we can introduce is the big-endian correction, but that assumes > we're always dealing with lowparts. 
> > So, I think the correct/conservative fix for both bugs is to revert the > earlier change for PR67781, and then apply the following on top: > > --- revert.tree-ssa-math-opts.c 2016-02-12 15:22:57.098895058 +0100 > +++ tree-ssa-math-opts.c 2016-02-12 15:23:08.482228474 +0100 > @@ -2473,10 +2473,14 @@ find_bswap_or_nop (gimple *stmt, struct > /* Find real size of result (highest non-zero byte). */ > if (n->base_addr) > { > - int rsize; > + unsigned HOST_WIDE_INT rsize; > uint64_t tmpn; > > for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, > rsize++); > + if (BYTES_BIG_ENDIAN && n->range != rsize) > + /* This implies an offset, which is currently not handled by > +bswap_replace. */ > + return NULL; > n->range = rsize; > } This works too yes with less optimizations for big endian. I'm fine with either solutions. This one is indeed a bit more conservative so I see the appeal to use it for GCC 5 and 6. Best regards, Thomas
Re: [PATCH, testsuite] Fix g++.dg/pr67989.C test failure when running with -march or -mcpu
Ping? On Monday, January 18, 2016 11:33:47 AM Thomas Preud'homme wrote: > On Wednesday, January 13, 2016 06:39:20 PM Bernd Schmidt wrote: > > On 01/12/2016 08:55 AM, Thomas Preud'homme wrote: > > > On Monday, January 11, 2016 04:57:18 PM Bernd Schmidt wrote: > > >> On 01/08/2016 10:33 AM, Thomas Preud'homme wrote: > > >>> 2016-01-08 Thomas Preud'homme > > >>> > > >>> * g++.dg/pr67989.C: Remove ARM-specific option. > > >>> * gcc.target/arm/pr67989.C: New file. > > >> > > >> I checked some other arm tests and they have things like > > >> > > >> /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { > > >> "-march=*" } { "-march=armv4t" } } */ > > >> /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { > > >> "-mthumb" } { "" } } */ > > >> > > >> Do you need the same in your testcase? > > > > > > That was the first approach I took but Kyrill suggested me to use > > > arm_arch_v4t and arm_arch_v4t_ok machinery instead. It should take care > > > about whether the architecture can be selected. > > > > Hmm, the ones I looked at did use dg-add-options, but not the > > corresponding _ok requirement. So I think this is OK. > > Just to make sure: ok as in OK to commit as is? > > Best regards, > > Thomas
Re: [PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets
On Thursday, January 21, 2016 09:21:52 AM Richard Biener wrote: > On Thu, 21 Jan 2016, Thomas Preud'homme wrote: > > On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote: > > > On Tue, 5 Jan 2016, Thomas Preud'homme wrote: > > > > Hi, > > > > > > > > bswap optimization pass generate wrong code on big endian targets when > > > > the > > > > result of a bit operation it analyzed is a partial load of the range > > > > of > > > > memory accessed by the original expression (when one or more bytes at > > > > lowest address were lost in the computation). This is due to the way > > > > cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being > > > > compared to the result of the symbolic expression. Part of the > > > > adjustment > > > > is endian independent: it's to ignore the bytes that were not accessed > > > > by > > > > the original gimple expression. However, when the result has less byte > > > > than that original expression, some more byte need to be ignored and > > > > this > > > > is endian dependent. > > > > > > > > The current code only support loss of bytes at the highest addresses > > > > because there is no code to adjust the address of the load. However, > > > > for > > > > little and big endian targets the bytes at highest address translate > > > > into > > > > different byte significance in the result. This patch first separate > > > > cmpxchg and cmpnop adjustement into 2 steps and then deal with > > > > endianness > > > > correctly for the second step. > > > > > > > > ChangeLog entries are as follow: > > > > > > > > > > > > *** gcc/ChangeLog *** > > > > > > > > 2015-12-16 Thomas Preud'homme > > > > > > > > PR tree-optimization/67781 > > > > * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in > > > > cmpxchg > > > > and cmpnop in two steps: first the ones not accessed in > > > > original > > > > gimple expression in a endian independent way and then the > > > > ones > > > > not > > > > accessed in the final result in an endian-specific way. > > > > > > > > *** gcc/testsuite/ChangeLog *** > > > > > > > > 2015-12-16 Thomas Preud'homme > > > > > > > > PR tree-optimization/67781 > > > > * gcc.c-torture/execute/pr67781.c: New file. 
> > > > > > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > > > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > > > new file mode 100644 > > > > index 000..bf50aa2 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > > > @@ -0,0 +1,34 @@ > > > > +#ifdef __UINT32_TYPE__ > > > > +typedef __UINT32_TYPE__ uint32_t; > > > > +#else > > > > +typedef unsigned uint32_t; > > > > +#endif > > > > + > > > > +#ifdef __UINT8_TYPE__ > > > > +typedef __UINT8_TYPE__ uint8_t; > > > > +#else > > > > +typedef unsigned char uint8_t; > > > > +#endif > > > > + > > > > +struct > > > > +{ > > > > + uint32_t a; > > > > + uint8_t b; > > > > +} s = { 0x123456, 0x78 }; > > > > + > > > > +int pr67781() > > > > +{ > > > > + uint32_t c = (s.a << 8) | s.b; > > > > + return c; > > > > +} > > > > + > > > > +int > > > > +main () > > > > +{ > > > > + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) > > > > +return 0; > > > > + > > > > + if (pr67781 () != 0x12345678) > > > > +__builtin_abort (); > > > > + return 0; > > > > +} > > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > > > > index b00f046..e5a185f 100644 > > > > --- a/gcc/tree-ssa-math-opts.c > > > > +++ b/gcc/tree-ssa-math-opts.c > > > > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct > > > > symbolic_number *n, int limit) > > > > > > > > static gimple * > &
Re: [PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets
On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote: > On Tue, 5 Jan 2016, Thomas Preud'homme wrote: > > Hi, > > > > bswap optimization pass generate wrong code on big endian targets when the > > result of a bit operation it analyzed is a partial load of the range of > > memory accessed by the original expression (when one or more bytes at > > lowest address were lost in the computation). This is due to the way > > cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being > > compared to the result of the symbolic expression. Part of the adjustment > > is endian independent: it's to ignore the bytes that were not accessed by > > the original gimple expression. However, when the result has less byte > > than that original expression, some more byte need to be ignored and this > > is endian dependent. > > > > The current code only support loss of bytes at the highest addresses > > because there is no code to adjust the address of the load. However, for > > little and big endian targets the bytes at highest address translate into > > different byte significance in the result. This patch first separate > > cmpxchg and cmpnop adjustement into 2 steps and then deal with endianness > > correctly for the second step. > > > > ChangeLog entries are as follow: > > > > > > *** gcc/ChangeLog *** > > > > 2015-12-16 Thomas Preud'homme > > > > PR tree-optimization/67781 > > * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in > > cmpxchg > > and cmpnop in two steps: first the ones not accessed in original > > gimple expression in a endian independent way and then the ones > > not > > accessed in the final result in an endian-specific way. > > > > *** gcc/testsuite/ChangeLog *** > > > > 2015-12-16 Thomas Preud'homme > > > > PR tree-optimization/67781 > > * gcc.c-torture/execute/pr67781.c: New file. > > > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > new file mode 100644 > > index 000..bf50aa2 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c > > @@ -0,0 +1,34 @@ > > +#ifdef __UINT32_TYPE__ > > +typedef __UINT32_TYPE__ uint32_t; > > +#else > > +typedef unsigned uint32_t; > > +#endif > > + > > +#ifdef __UINT8_TYPE__ > > +typedef __UINT8_TYPE__ uint8_t; > > +#else > > +typedef unsigned char uint8_t; > > +#endif > > + > > +struct > > +{ > > + uint32_t a; > > + uint8_t b; > > +} s = { 0x123456, 0x78 }; > > + > > +int pr67781() > > +{ > > + uint32_t c = (s.a << 8) | s.b; > > + return c; > > +} > > + > > +int > > +main () > > +{ > > + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) > > +return 0; > > + > > + if (pr67781 () != 0x12345678) > > +__builtin_abort (); > > + return 0; > > +} > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > > index b00f046..e5a185f 100644 > > --- a/gcc/tree-ssa-math-opts.c > > +++ b/gcc/tree-ssa-math-opts.c > > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct > > symbolic_number *n, int limit) > > > > static gimple * > > find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap) > > { > > > > + unsigned rsize; > > + uint64_t tmpn, mask; > > > > /* The number which the find_bswap_or_nop_1 result should match in order > > > > to have a full byte swap. The number is shifted to the right > > according to the size of the symbolic number before using it. */ > > > > @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct > > symbolic_number *n, bool *bswap) > > > >/* Find real size of result (highest non-zero byte). 
*/ > >if (n->base_addr) > > > > -{ > > - int rsize; > > - uint64_t tmpn; > > - > > - for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, > > rsize++); - n->range = rsize; > > -} > > +for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, > > rsize++); > > + else > > +rsize = n->range; > > > > - /* Zero out the extra bits of N and CMP*. */ > > + /* Zero out the bits corresponding to untouched bytes in original > > gimple > > + expression. */
RE: [PATCH, testsuite] Fix PR68632: gcc.target/arm/lto/pr65837 failure on M profile ARM targets
It's indeed fixed now. Thanks. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Christian Bruel > Sent: Wednesday, December 09, 2015 6:13 PM > To: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, testsuite] Fix PR68632: gcc.target/arm/lto/pr65837 > failure on M profile ARM targets > > Hi Thomas, > > > On 12/09/2015 10:57 AM, Thomas Preud'homme wrote: > > gcc.target/arm/lto/pr65837 fails on M profile ARM targets because of > lack of neon instructions. This patch adds the necessary arm_neon_ok > effective target requirement to avoid running this test for such targets. > > > > This case also fails for all configs that don't have neon by default. > This is being fixed with > > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00865.html > > > > ChangeLog entry is as follows: > > > > > > * gcc/testsuite/ChangeLog *** > > > > 2015-12-08 Thomas Preud'homme > > > > PR testsuite/68632 > > * gcc.target/arm/lto/pr65837_0.c: Require arm_neon_ok effective > > target. > > > > > > diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c > b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c > > index 000fc2a..fcc26a1 100644 > > --- a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c > > +++ b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c > > @@ -1,4 +1,5 @@ > > /* { dg-lto-do run } */ > > +/* { dg-require-effective-target arm_neon_ok } */ > > /* { dg-lto-options {{-flto -mfpu=neon}} } */ > > /* { dg-suppress-ld-options {-mfpu=neon} } */ > > > > > > > > Testcase fails without the patch and succeeds with. > > > > Is this ok for trunk? > > > > Best regards, > > > > Thomas > > > >
Re: [PATCH, testsuite] Fix g++.dg/pr67989.C test failure when running with -march or -mcpu
On Wednesday, January 13, 2016 06:39:20 PM Bernd Schmidt wrote: > On 01/12/2016 08:55 AM, Thomas Preud'homme wrote: > > On Monday, January 11, 2016 04:57:18 PM Bernd Schmidt wrote: > >> On 01/08/2016 10:33 AM, Thomas Preud'homme wrote: > >>> 2016-01-08 Thomas Preud'homme > >>> > >>> * g++.dg/pr67989.C: Remove ARM-specific option. > >>> * gcc.target/arm/pr67989.C: New file. > >> > >> I checked some other arm tests and they have things like > >> > >> /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { > >> "-march=*" } { "-march=armv4t" } } */ > >> /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { > >> "-mthumb" } { "" } } */ > >> > >> Do you need the same in your testcase? > > > > That was the first approach I took but Kyrill suggested me to use > > arm_arch_v4t and arm_arch_v4t_ok machinery instead. It should take care > > about whether the architecture can be selected. > > Hmm, the ones I looked at did use dg-add-options, but not the > corresponding _ok requirement. So I think this is OK. Just to make sure: ok as in OK to commit as is? Best regards, Thomas
RE: [PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, December 17, 2015 1:58 PM > > Hi, > > We decided to apply the following patch to the ARM embedded 5 branch. > This is *not* intended for trunk for now. We will send a separate email > for trunk. And now a rebased patch on top of trunk. > > This patch is part of a patch series to add support for ARMv8-M[1] to GCC. > This specific patch fixes some assumptions related to M profile > architectures. Currently GCC (mostly libgcc) contains several assumptions > that the only ARM architecture with Thumb-1 only instructions is ARMv6- > M and the only one with Thumb-2 only instructions is ARMv7-M. ARMv8- > M [1] make this wrong since ARMv8-M baseline is also (mostly) Thumb-1 > only and ARMv8-M mainline is also Thumb-2 only. This patch replace > checks for __ARM_ARCH_*__ for checks against > __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM instead. For > instance, Thumb-1 only can be checked with > #if !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB > == 1). It also fixes the guard for DIV code to not apply to ARMv8-M > Baseline since it uses Thumb-2 instructions. > > [1] For a quick overview of ARMv8-M please refer to the initial cover > letter. > > ChangeLog entries are as follow: > > *** gcc/ChangeLog *** 2015-11-13 Thomas Preud'homme * config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to decide whether to prevent some libgcc routines being included for some multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the link between this condition and the one in libgcc/config/arm/lib1func.S. * config/arm/arm.h (TARGET_ARM_V6M): Add check to TARGET_ARM_ARCH. (TARGET_ARM_V7M): Likewise. *** gcc/testsuite/ChangeLog *** 2015-11-10 Thomas Preud'homme * lib/target-supports.exp (check_effective_target_arm_cortex_m): Use __ARM_ARCH_ISA_ARM to test for Cortex-M devices. *** libgcc/ChangeLog *** 2015-12-17 Thomas Preud'homme * config/arm/bpabi-v6m.S: Fix header comment to mention Thumb-1 rather than ARMv6-M. * config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases for all Thumb-1 only targets. (__only_thumb1__): Define for all Thumb-1 only targets. (THUMB_LDIV0): Test for __only_thumb1__ rather than __ARM_ARCH_6M__. (EQUIV): Likewise. (ARM_FUNC_ALIAS): Likewise. (umodsi3): Add check to __only_thumb1__ to guard the idiv version. (modsi3): Likewise. (HAVE_ARM_CLZ): Remove block defining it. (clzsi2): Test for __only_thumb1__ rather than __ARM_ARCH_6M__ and check __ARM_FEATURE_CLZ instead of HAVE_ARM_CLZ. (clzdi2): Likewise. (ctzsi2): Likewise. (L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__ in guard for checking whether it is defined. (final includes): Test for __only_thumb1__ rather than __ARM_ARCH_6M__ and add comment to indicate the connection between this condition and the one in gcc/config/arm/elf.h. * config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__. * config/arm/t-softfp: Likewise. 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index fd999dd..0d23f39 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -2182,8 +2182,10 @@ extern int making_const_table; #define TARGET_ARM_ARCH\ (arm_base_arch) \ -#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2) -#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2) +#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm \ + && !arm_arch_thumb2) +#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm \ + && arm_arch_thumb2) /* The highest Thumb instruction set version supported by the chip. */ #define TARGET_ARM_ARCH_ISA_THUMB \ diff --git a/gcc/config/arm/elf.h b/gcc/config/arm/elf.h index 3795728..579a580 100644 --- a/gcc/config/arm/elf.h +++ b/gcc/config/arm/elf.h @@ -148,8 +148,9 @@ while (0) /* Horrible hack: We want to prevent some libgcc routines being included - for some multilibs. */ -#ifndef __ARM_ARCH_6M__ + for some multilibs. The condition should match the one in + libgcc/config/arm/lib1funcs.S. */ +#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1 #undef L_fixdfsi #undef L_fixunsdfsi #undef L_truncdfsf2 diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4e349e9..3f96826 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-su
[PATCH, testsuite] Stabilize test result output of dump-noaddr
Hi, Everytime the static pass number of passes change, testsuite output for dump- noaddr will change, leading to a series of noise lines like the following under dg-cmp-results: PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1, -O1 comparison PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1, -O2 comparison PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1, -O2 -flto -fno-use- linker-plugin -flto-partition=none comparison PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1, -O2 -flto -fuse- linker-plugin -fno-fat-lto-objects comparison PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1, -O3 -fomit-frame- pointer -funroll-loops -fpeel-loops -ftracer -finline-functions comparison PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1, -O3 -g comparison PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1, -Og -g comparison PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1, -Os comparison NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1, -O1 comparison NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1, -O2 comparison NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1, -O2 -flto -fno-use- linker-plugin -flto-partition=none comparison NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1, -O2 -flto -fuse- linker-plugin -fno-fat-lto-objects comparison NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1, -O3 -fomit-frame- pointer -funroll-loops -fpeel-loops -ftracer -finline-functions comparison NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1, -O3 -g comparison NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1, -Og -g comparison NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1, -Os comparison This patch solve this problem by replacing the static pass number in the output by a star, allowing for a stable output while retaining easy copy/ pasting in shell. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-12-30 Thomas Preud'homme * gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Replace static pass number in output by a star. diff --git a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x b/gcc/ testsuite/gcc.c-torture/unsorted/dump-noaddr.x index a8174e0..001dd6b 100644 --- a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x +++ b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x @@ -18,6 +18,7 @@ proc dump_compare { src options } { foreach dump1 [lsort [glob -nocomplain dump1/*]] { regsub dump1/ $dump1 dump2/ dump2 set dumptail "gcc.c-torture/unsorted/[file tail $dump1]" + regsub {\.\d+((t|r|i)\.[^.]+)$} $dumptail {.*\1} dumptail #puts "$option $dump1" set tmp [ diff "$dump1" "$dump2" ] if { $tmp == 0 } { Is this ok for stage3? Best regards, Thomas
Re: [PATCH, testsuite] Fix g++.dg/pr67989.C test failure when running with -march or -mcpu
On Monday, January 11, 2016 04:57:18 PM Bernd Schmidt wrote: > On 01/08/2016 10:33 AM, Thomas Preud'homme wrote: > > 2016-01-08 Thomas Preud'homme > > > > * g++.dg/pr67989.C: Remove ARM-specific option. > > * gcc.target/arm/pr67989.C: New file. > > I checked some other arm tests and they have things like > > /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { > "-march=*" } { "-march=armv4t" } } */ > /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { > "-mthumb" } { "" } } */ > > Do you need the same in your testcase? That was the first approach I took but Kyrill suggested me to use arm_arch_v4t and arm_arch_v4t_ok machinery instead. It should take care about whether the architecture can be selected. Best regards, Thomas
Re: [PATCH, testsuite] Fix g++.dg/pr67989.C test failure when running with -march or -mcpu
On Thursday, January 07, 2016 10:26:28 AM Richard Earnshaw wrote: > On 07/01/16 09:15, Kyrill Tkachov wrote: > > In this case perhaps we should go the route of just removing the > > target-specific option > > altogether. > > > > Richard, that's the approach you recommended, right? > > Yes. > > I think if you really need to test a specific set of target flags, then > it might be acceptable to have a duplicate of the test in dg.target/arm > (but please put a comment in the (arm version of the) test to explain > why it has been duplicated. What about the following: *** gcc/testsuite/ChangeLog *** 2016-01-08 Thomas Preud'homme * g++.dg/pr67989.C: Remove ARM-specific option. * gcc.target/arm/pr67989.C: New file. diff --git a/gcc/testsuite/g++.dg/pr67989.C b/gcc/testsuite/g++.dg/pr67989.C index 90261c450b4b9429fb989f7df62f3743017c7363..c3023557d31a21aead717fd58483c82e3e74da95 100644 --- a/gcc/testsuite/g++.dg/pr67989.C +++ b/gcc/testsuite/g++.dg/pr67989.C @@ -1,6 +1,5 @@ /* { dg-do compile } */ /* { dg-options "-std=c++11 -O2" } */ -/* { dg-additional-options "-marm -march=armv4t" { target arm*-*-* } } */ __extension__ typedef unsigned long long int uint64_t; namespace std __attribute__ ((__visibility__ ("default"))) diff --git a/gcc/testsuite/gcc.target/arm/pr67989.C b/gcc/testsuite/ gcc.target/arm/pr67989.C new file mode 100644 index ..0006924e24f698711e1e501d09b5098049522ad6 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr67989.C @@ -0,0 +1,82 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c++11 -O2" } */ +/* { dg-require-effective-target arm_arch_v4t_ok } */ +/* { dg-add-options arm_arch_v4t } */ +/* { dg-additional-options "-marm" } */ + +/* Duplicate version of the test in g++.dg to be able to run this test only if + ARMv4t in ARM execution state can be targetted. Newer architecture don't + expose the bug this testcase was written for. */ + + +__extension__ typedef unsigned long long int uint64_t; +namespace std __attribute__ ((__visibility__ ("default"))) +{ + typedef enum memory_order + { +memory_order_seq_cst + } memory_order; +} + +namespace std __attribute__ ((__visibility__ ("default"))) +{ + template < typename _Tp > struct atomic + { +static constexpr int _S_min_alignment + = (sizeof (_Tp) & (sizeof (_Tp) - 1)) || sizeof (_Tp) > 16 + ? 0 : sizeof (_Tp); +static constexpr int _S_alignment + = _S_min_alignment > alignof (_Tp) ? _S_min_alignment : alignof (_Tp); + alignas (_S_alignment) _Tp _M_i; +operator _Tp () const noexcept +{ + return load (); +} +_Tp load (memory_order __m = memory_order_seq_cst) const noexcept +{ + _Tp tmp; +__atomic_load (&_M_i, &tmp, __m); +} + }; +} + +namespace lldb_private +{ + namespace imp + { + } + class Address; +} +namespace lldb +{ + typedef uint64_t addr_t; + class SBSection + { + }; + class SBAddress + { +void SetAddress (lldb::SBSection section, lldb::addr_t offset); + lldb_private::Address & ref (); + }; +} +namespace lldb_private +{ + class Address + { + public: +const Address & SetOffset (lldb::addr_t offset) +{ + bool changed = m_offset != offset; +} +std::atomic < lldb::addr_t > m_offset; + }; +} + +using namespace lldb; +using namespace lldb_private; +void +SBAddress::SetAddress (lldb::SBSection section, lldb::addr_t offset) +{ + Address & addr = ref (); + addr.SetOffset (offset); +} Is this ok for stage3? Best regards, Thomas
RE: [PATCH, ARM, ping1] Fix gcc.c-torture/execute/loop-2b.c execution failure on cortex-m0
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Wednesday, December 16, 2015 5:11 PM > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; > Kyrylo Tkachov > Subject: [PATCH, ARM] Fix gcc.c-torture/execute/loop-2b.c execution > failure on cortex-m0 > > During reorg pass, thumb1_reorg () is tasked with rewriting mov rd, rn to > subs rd, rn, 0 to avoid a comparison against 0 instruction before doing a > conditional branch based on it. The actual avoiding of cmp is done in > cbranchsi4_insn instruction C output template. When the condition is > met, the source register (rn) is also propagated into the comparison in > place the destination register (rd). > > However, right now thumb1_reorg () only look for a mov followed by a > cbranchsi but does not check whether the comparison in cbranchsi is > against the constant 0. This is not safe because a non clobbering > instruction could exist between the mov and the comparison that > modifies the source register. This is what happens here with a post > increment of the source register after the mov, which skip the &a[i] == > &a[1] comparison for iteration i == 1. > > This patch fixes the issue by checking that the comparison is against > constant 0. > > ChangeLog entry is as follow: > > > *** gcc/ChangeLog *** > > 2015-12-07 Thomas Preud'homme > > * config/arm/arm.c (thumb1_reorg): Check that the comparison is > against the constant 0. > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 42bf272..49c0a06 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -17195,7 +17195,7 @@ thumb1_reorg (void) >FOR_EACH_BB_FN (bb, cfun) > { >rtx dest, src; > - rtx pat, op0, set = NULL; > + rtx cmp, op0, op1, set = NULL; >rtx_insn *prev, *insn = BB_END (bb); >bool insn_clobbered = false; > > @@ -17208,8 +17208,13 @@ thumb1_reorg (void) > continue; > >/* Get the register with which we are comparing. */ > - pat = PATTERN (insn); > - op0 = XEXP (XEXP (SET_SRC (pat), 0), 0); > + cmp = XEXP (SET_SRC (PATTERN (insn)), 0); > + op0 = XEXP (cmp, 0); > + op1 = XEXP (cmp, 1); > + > + /* Check that comparison is against ZERO. */ > + if (!CONST_INT_P (op1) || INTVAL (op1) != 0) > + continue; > >/* Find the first flag setting insn before INSN in basic block BB. */ >gcc_assert (insn != BB_HEAD (bb)); > @@ -17249,7 +17254,7 @@ thumb1_reorg (void) > PATTERN (prev) = gen_rtx_SET (dest, src); > INSN_CODE (prev) = -1; > /* Set test register in INSN to dest. */ > - XEXP (XEXP (SET_SRC (pat), 0), 0) = copy_rtx (dest); > + XEXP (cmp, 0) = copy_rtx (dest); > INSN_CODE (insn) = -1; > } > } > > > Testsuite shows no regression when run for arm-none-eabi with - > mcpu=cortex-m0 -mthumb > > Is this ok for trunk? > > Best regards, > > Thomas
Re: [PATCH, testsuite] Fix g++.dg/pr67989.C test failure when running with -march or -mcpu
On Tuesday, January 05, 2016 10:47:38 AM Kyrill Tkachov wrote: > Hi Thomas, Hi Kyrill, > > > > diff --git a/gcc/testsuite/g++.dg/pr67989.C > > b/gcc/testsuite/g++.dg/pr67989.C index > > 90261c450b4b9429fb989f7df62f3743017c7363..61be8e172a96df5bb76f7ecd8543dadf > > 825e7dc7 100644 > > --- a/gcc/testsuite/g++.dg/pr67989.C > > +++ b/gcc/testsuite/g++.dg/pr67989.C > > @@ -1,5 +1,6 @@ > > > > /* { dg-do compile } */ > > /* { dg-options "-std=c++11 -O2" } */ > > > > +/* { dg-skip-if "do not override -mcpu" { arm*-*-* } { "-march=*" > > "-mcpu=*" } { "-march=armv4t" } } */ > > > > /* { dg-additional-options "-marm -march=armv4t" { target arm*-*-* } } > > */ > > How about we try to do it using the add_options_for_arm_arch_v4t machinery > and the arm_arch_v4t_ok check? I don't quite understand. dg-add-options doesn't take a selector according to GCC internals documentation and dg-additional-options doesn't take feature. If I use dg-add-options with a require-effective-target that will limit this test to ARM. Did I misunderstand your point? Best regards, Thomas
[PATCH, testsuite] Fix g++.dg/pr67989.C test failure when running with -march or -mcpu
Hi, g++.dg/pr67989.C passes -march=armv4t to gcc when compiling which fails if RUNTESTFLAGS passes -mcpu or -march with a different value. This patch adds a dg-skip-if directive to skip the test when such a thing happens. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-12-31 Thomas Preud'homme * g++.dg/pr67989.C: Skip test if already running it with -mcpu or -march with different value. diff --git a/gcc/testsuite/g++.dg/pr67989.C b/gcc/testsuite/g++.dg/pr67989.C index 90261c450b4b9429fb989f7df62f3743017c7363..61be8e172a96df5bb76f7ecd8543dadf825e7dc7 100644 --- a/gcc/testsuite/g++.dg/pr67989.C +++ b/gcc/testsuite/g++.dg/pr67989.C @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-std=c++11 -O2" } */ +/* { dg-skip-if "do not override -mcpu" { arm*-*-* } { "-march=*" "-mcpu=*" } { "-march=armv4t" } } */ /* { dg-additional-options "-marm -march=armv4t" { target arm*-*-* } } */ __extension__ typedef unsigned long long int uint64_t; Is this ok for stage3? Best regards, Thomas
Re: [PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets
On Tuesday, January 05, 2016 01:53:37 PM you wrote: > > Regression testsuite was run on a bootstrapped native x86_64-linux-gnu GCC > and on an arm-none-eabi GCC cross-compiler without any regression. I'm > waiting for a slot on gcc110 to do a big endian bootstrap but at least the > testcase works on mips-linux. I'll send an update once bootstrap is > complete. Bootstrap went fine on gcc110 with the following language enabled: c,c++,objc,obj-c++,java,fortran,ada,go,lto. Best regards, Thomas
[PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets
Hi, bswap optimization pass generate wrong code on big endian targets when the result of a bit operation it analyzed is a partial load of the range of memory accessed by the original expression (when one or more bytes at lowest address were lost in the computation). This is due to the way cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being compared to the result of the symbolic expression. Part of the adjustment is endian independent: it's to ignore the bytes that were not accessed by the original gimple expression. However, when the result has less byte than that original expression, some more byte need to be ignored and this is endian dependent. The current code only support loss of bytes at the highest addresses because there is no code to adjust the address of the load. However, for little and big endian targets the bytes at highest address translate into different byte significance in the result. This patch first separate cmpxchg and cmpnop adjustement into 2 steps and then deal with endianness correctly for the second step. ChangeLog entries are as follow: *** gcc/ChangeLog *** 2015-12-16 Thomas Preud'homme PR tree-optimization/67781 * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in cmpxchg and cmpnop in two steps: first the ones not accessed in original gimple expression in a endian independent way and then the ones not accessed in the final result in an endian-specific way. *** gcc/testsuite/ChangeLog *** 2015-12-16 Thomas Preud'homme PR tree-optimization/67781 * gcc.c-torture/execute/pr67781.c: New file. diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c b/gcc/testsuite/gcc.c-torture/execute/pr67781.c new file mode 100644 index 000..bf50aa2 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c @@ -0,0 +1,34 @@ +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#ifdef __UINT8_TYPE__ +typedef __UINT8_TYPE__ uint8_t; +#else +typedef unsigned char uint8_t; +#endif + +struct +{ + uint32_t a; + uint8_t b; +} s = { 0x123456, 0x78 }; + +int pr67781() +{ + uint32_t c = (s.a << 8) | s.b; + return c; +} + +int +main () +{ + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + + if (pr67781 () != 0x12345678) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index b00f046..e5a185f 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number *n, int limit) static gimple * find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap) { + unsigned rsize; + uint64_t tmpn, mask; /* The number which the find_bswap_or_nop_1 result should match in order to have a full byte swap. The number is shifted to the right according to the size of the symbolic number before using it. */ @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap) /* Find real size of result (highest non-zero byte). */ if (n->base_addr) -{ - int rsize; - uint64_t tmpn; - - for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++); - n->range = rsize; -} +for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++); + else +rsize = n->range; - /* Zero out the extra bits of N and CMP*. */ + /* Zero out the bits corresponding to untouched bytes in original gimple + expression. 
*/ if (n->range < (int) sizeof (int64_t)) { - uint64_t mask; - mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1; cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER; cmpnop &= mask; } + /* Zero out the bits corresponding to unused bytes in the result of the + gimple expression. */ + if (rsize < n->range) +{ + if (BYTES_BIG_ENDIAN) + { + mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1; + cmpxchg &= mask; + cmpnop >>= (n->range - rsize) * BITS_PER_MARKER; + } + else + { + mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1; + cmpxchg >>= (n->range - rsize) * BITS_PER_MARKER; + cmpnop &= mask; + } + n->range = rsize; +} + /* A complete byte swap should make the symbolic number to start with the largest digit in the highest order byte. Unchanged symbolic number indicates a read with same endianness as target architecture. */ Regression testsuite was run on a bootstrapped native x86_64-linux-gnu GCC and on an arm-none-eabi GCC cross-compiler without any regression. I'm waiting for a slot on gcc110 to do a big endian bootstrap but at least the testcase works on mips-linux. I
RE: [PATCH, ARM, 1/3] Document --with-multilib-list for arm*-*-* targets
> From: Gerald Pfeifer [mailto:ger...@pfeifer.com] > Sent: Sunday, January 03, 2016 6:49 AM > > On Wed, 16 Dec 2015, Thomas Preud'homme wrote: > > Currently, the documentation for --with-multilib-list in > > gcc/doc/install.texi only mentions sh*-*-* and x86-64-*-linux* targets. > > However, arm*-*-* targets also support this option. This patch adds > > documention for the meaning of this option for arm*-*-* targets. > > > > 2015-12-09 Thomas Preud'homme > > > > * doc/install.texi (--with-multilib-list): Describe the meaning of > > the > > option for arm*-*-* targets. > > Ok (since I don't think I saw a response from the ARM maintainers). > > (The list of options for -mfpu= is a bit inconsistent with the other > cases, in that it has -mfpu= only in the first case, but I guess this > is fine if you want to keep it that way.) Oh indeed right. How could I miss that? I fixed it and committed the following: 2016-01-04 Thomas Preud'homme * doc/install.texi (--with-multilib-list): Describe the meaning of the option for arm*-*-* targets. diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 81cadb5ed59c3ce8de2b8f8a474fcb3d9de10f32..f3052c07b06bca2e1fc77f0133b723563467a94d 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1102,9 +1102,19 @@ sysv, aix. @item --with-multilib-list=@var{list} @itemx --without-multilib-list Specify what multilibs to build. -Currently only implemented for sh*-*-* and x86-64-*-linux*. +Currently only implemented for arm*-*-*, sh*-*-* and x86-64-*-linux*. @table @code +@item arm*-*-* +@var{list} is either @code{default} or @code{aprofile}. Specifying +@code{default} is equivalent to omitting this option while specifying +@code{aprofile} builds multilibs for each combination of ISA (@code{-marm} or +@code{-mthumb}), architecture (@code{-march=armv7-a}, @code{-march=armv7ve}, +or @code{-march=armv8-a}), FPU available (none, @code{-mfpu=vfpv3-d16}, +@code{-mfpu=neon}, @code{-mfpu=vfpv4-d16}, @code{-mfpu=neon-vfpv4} or +@code{-mfpu=neon-fp-armv8} depending on architecture) and floating-point ABI +(@code{-mfloat-abi=softfp} or @code{-mfloat-abi=hard}). + @item sh*-*-* @var{list} is a comma separated list of CPU names. These must be of the form @code{sh*} or @code{m*} (in which case they match the compiler option Best regards, Thomas
[RFC][PATCH, ARM 8/8] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic
[Sending on behalf of Andre Vieira] Hello, This patch adds support ARMv8-M's Security Extension's cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether an entry function was called from a non-secure state. See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on Development Tools (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html) for further details. *** gcc/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc/config/arm/arm-builtins.c (arm_builtins): Define ARM_BUILTIN_CMSE_NONSECURE_CALLER. (bdesc_2arg): Add line for cmse_nonsecure_caller. (arm_init_builtins): Init for cmse_nonsecure_caller. (arm_expand_builtin): Handle cmse_nonsecure_caller. * gcc/config/arm/arm_cmse.h (cmse_nonsecure_caller): New. *** gcc/testsuite/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-1.c: Added test for cmse_nonsecure_caller. diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 11cd17d0b8f3c29ccbe16cb463a17d55ba0fa1e3..7934cf1d4d96c40255d3e93dc9902b4568014984 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -515,6 +515,8 @@ enum arm_builtins ARM_BUILTIN_GET_FPSCR, ARM_BUILTIN_SET_FPSCR, + ARM_BUILTIN_CMSE_NONSECURE_CALLER, + #undef CRYPTO1 #undef CRYPTO2 #undef CRYPTO3 @@ -1263,6 +1265,10 @@ static const struct builtin_description bdesc_2arg[] = FP_BUILTIN (set_fpscr, SET_FPSCR) #undef FP_BUILTIN + {ARM_FSET_MAKE_CPU2 (FL2_CMSE), CODE_FOR_andsi3, + "__builtin_arm_cmse_nonsecure_caller", ARM_BUILTIN_CMSE_NONSECURE_CALLER, + UNKNOWN, 0}, + #define CRC32_BUILTIN(L, U) \ {ARM_FSET_EMPTY, CODE_FOR_##L, "__builtin_arm_"#L, \ ARM_BUILTIN_##U, UNKNOWN, 0}, @@ -1797,6 +1803,17 @@ arm_init_builtins (void) = add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr, ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE); } + + if (arm_arch_cmse) +{ + tree ftype_cmse_nonsecure_caller + = build_function_type_list (unsigned_type_node, NULL); + arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER] + = add_builtin_function ("__builtin_arm_cmse_nonsecure_caller", + ftype_cmse_nonsecure_caller, + ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD, + NULL, NULL_TREE); +} } /* Return the ARM builtin for CODE. 
*/ @@ -2356,6 +2373,14 @@ arm_expand_builtin (tree exp, emit_insn (pat); return target; +case ARM_BUILTIN_CMSE_NONSECURE_CALLER: + icode = CODE_FOR_andsi3; + target = gen_reg_rtx (SImode); + op0 = arm_return_addr (0, NULL_RTX); + pat = GEN_FCN (icode) (target, op0, const1_rtx); + emit_insn (pat); + return target; + case ARM_BUILTIN_TEXTRMSB: case ARM_BUILTIN_TEXTRMUB: case ARM_BUILTIN_TEXTRMSH: diff --git a/gcc/config/arm/arm_cmse.h b/gcc/config/arm/arm_cmse.h index ab20a3ec46025f268a1e9bed895d27da9af7aab6..0bdff668d03d54e1acf2bdd3b5ff1bfb2b463bd8 100644 --- a/gcc/config/arm/arm_cmse.h +++ b/gcc/config/arm/arm_cmse.h @@ -163,6 +163,13 @@ __attribute__ ((__always_inline__)) cmse_TTAT (void *p) CMSE_TT_ASM (at) +//TODO: diagnose use outside cmse_nonsecure_entry functions +__extension__ static __inline int __attribute__ ((__always_inline__)) +cmse_nonsecure_caller (void) +{ + return __builtin_arm_cmse_nonsecure_caller (); +} + #define CMSE_AU_NONSECURE 2 #define CMSE_MPU_NONSECURE 16 #define CMSE_NONSECURE 18 diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c index 1c3d4e9e934f4b1166d4d98383cf4ae8c3515117..ccecf396d3cda76536537b4d146bbb5f70589fd5 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c @@ -66,3 +66,32 @@ int foo (char * p) /* { dg-final { scan-assembler-times "ttat " 2 } } */ /* { dg-final { scan-assembler-times "bl.cmse_check_address_range" 7 } } */ /* { dg-final { scan-assembler-not "cmse_check_pointed_object" } } */ + +typedef int (*int_ret_funcptr_t) (void); +typedef int __attribute__ ((cmse_nonsecure_call)) (*int_ret_nsfuncptr_t) (void); + +int __attribute__ ((cmse_nonsecure_entry)) +baz (void) +{ + return cmse_nonsecure_caller (); +} + +int __attribute__ ((cmse_nonsecure_entry)) +qux (int_ret_funcptr_t int_ret_funcptr) +{ + int_ret_nsfuncptr_t int_ret_nsfunc_ptr; + + if (cmse_is_nsfptr (int_ret_funcptr)) +{ + int_ret_nsfunc_ptr = cmse_nsfptr_create (int_ret_funcptr); + return int_ret_nsfunc_ptr (); +} + return 0; +} +/* { dg-final { scan-assembler "baz:" } } */ +/* { dg-final { scan-assembler "__acle_se_baz:" }
[RFC][PATCH, ARM 7/8] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call
[Sending on behalf of Andre Vieira] Hello, This patch extends support for the ARMv8-M Security Extensions 'cmse_nonsecure_call' to use a new library function '__gnu_cmse_nonsecure_call'. This library function is responsible for (without using r0-r3 or d0-d7): 1) saving and clearing all callee-saved registers using the secure stack 2) clearing the LSB of the address passed in r4 and using blxns to 'jump' to it 3) clearing ASPR, including the 'ge bits' if DSP is enabled 4) clearing FPSCR if using non-soft float-abi 5) restoring callee-saved registers. The decisions whether to include DSP 'ge bits' clearing and floating point registers (single/double precision) all depends on the multilib used. See Section 5.5 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). *** gcc/ChangeLog *** *** gcc/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc/config/arm/arm.c (detect_cmse_nonsecure_call): New. (cmse_nonsecure_call_clear_caller_saved): New. * gcc/config/arm/arm-protos.h (detect_cmse_nonsecure_call): New. * gcc/config/arm/arm.md (call): Handle cmse_nonsecure_entry. (call_value): Likewise. (nonsecure_call_internal): New. (nonsecure_call_value_internal): New. * gcc/config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New. (*nonsecure_call_value_reg_thumb1_v5): New. * gcc/config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New. (*nonsecure_call_value_reg_thumb2): New. * gcc/config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New. * libgcc/config/arm/cmse_nonsecure_call.S: New. * libgcc/config/arm/t-arm: Compile cmse_nonsecure_call.S *** gcc/testsuite/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c: New. * gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c: New. * gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-6.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/hard/cmse-13.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/hard/cmse-7.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/hard/cmse-8.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/soft/cmse-13.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/soft/cmse-7.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/soft/cmse-8.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New. * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New. 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9ee8c333046d9a5bb0487f7b710a5aff42d2..694ee02f534019a5fc9377757f3269dfe6ccfbc0 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -132,6 +132,7 @@ extern int arm_const_double_inline_cost (rtx); extern bool arm_const_double_by_parts (rtx); extern bool arm_const_double_by_immediates (rtx); extern void arm_emit_call_insn (rtx, rtx, bool); +bool detect_cmse_nonsecure_call (tree); extern const char *output_call (rtx *); void arm_emit_movpair (rtx, rtx); extern const char *output_mov_long_double_arm_from_arm (rtx *); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 4b4eea88cbec8e04d5b92210f0af2440ce6fb6e4..320f7b447501047a59ceef4f7ded2dadc2088664 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -17403,6 +17403,129 @@ note_invalid_constants (rtx_insn *insn, HOST_WIDE_INT address, int do_pushes) return; } +/* Saves callee saved registers, clears callee saved registers and caller saved + registers not used to pass arguments before a cmse_nonsecure_call. And + restores the callee saved registers after. */ + +static void +cmse_nonsecure_call_clear_caller_saved (void) +{ + basic_block bb; + + FOR_EACH_BB_FN (bb, cfun) +{ + rtx_insn *insn; + + FOR_BB_INSNS (bb, insn) + { + uint64_t to_clear_mask, float_mask; + rtx_insn *seq; + rtx pat, call, unspec, link, reg, cleared_reg, tmp; + unsigned int regno, maxregno; + rtx address; + + if (!NONDEBUG_INSN_P (insn)) + continue; + + if (!CALL_P (insn)) + continue; + + pat = PATTERN (insn); + gcc_assert (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) > 0); + call
[RFC][PATCH, ARM 6/8] Handling ARMv8-M Security Extension's cmse_nonsecure_call attribute
[Sending on behalf of Andre Vieira] Hello, This patch adds support for the ARMv8-M Security Extensions 'cmse_nonsecure_call' attribute. This attribute may only be used for function types and when used in combination with the '-mcmse' compilation flag. See Section 5.5 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). We currently do not support cmse_nonsecure_call functions that pass arguments or return variables on the stack and we diagnose this. *** gcc/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc/config/arm/arm.c (gimplify.h): New include. (arm_handle_cmse_nonsecure_call): New. (arm_attribute_table): Added cmse_nonsecure_call. *** gcc/testsuite/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-3.c: Add tests. * gcc.target/arm/cmse/cmse-4.c: Add tests. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 0700478ca38307f35d0cb01f83ea182802ba28fa..4b4eea88cbec8e04d5b92210f0af2440ce6fb6e4 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -61,6 +61,7 @@ #include "builtins.h" #include "tm-constrs.h" #include "rtl-iter.h" +#include "gimplify.h" /* This file should be included last. */ #include "target-def.h" @@ -136,6 +137,7 @@ static tree arm_handle_isr_attribute (tree *, tree, tree, int, bool *); static tree arm_handle_notshared_attribute (tree *, tree, tree, int, bool *); #endif static tree arm_handle_cmse_nonsecure_entry (tree *, tree, tree, int, bool *); +static tree arm_handle_cmse_nonsecure_call (tree *, tree, tree, int, bool *); static void arm_output_function_epilogue (FILE *, HOST_WIDE_INT); static void arm_output_function_prologue (FILE *, HOST_WIDE_INT); static int arm_comp_type_attributes (const_tree, const_tree); @@ -347,6 +349,8 @@ static const struct attribute_spec arm_attribute_table[] = /* ARMv8-M Security Extensions support. */ { "cmse_nonsecure_entry", 0, 0, true, false, false, arm_handle_cmse_nonsecure_entry, false }, + { "cmse_nonsecure_call", 0, 0, true, false, false, +arm_handle_cmse_nonsecure_call, false }, { NULL, 0, 0, false, false, false, NULL, false } }; @@ -6667,6 +6671,76 @@ arm_handle_cmse_nonsecure_entry (tree *node, tree name, return NULL_TREE; } + +/* Called upon detection of the use of the cmse_nonsecure_call attribute, this + function will check whether the attribute is allowed here and will add the + attribute to the function type tree or otherwise issue a diagnose. The + reason we check this at declaration time is to only allow the use of the + attribute with declartions of function pointers and not function + declartions. */ + +static tree +arm_handle_cmse_nonsecure_call (tree *node, tree name, +tree /* args */, +int /* flags */, +bool *no_add_attrs) +{ + tree decl = NULL_TREE; + tree type, fntype, main_variant; + + if (!use_cmse) +{ + *no_add_attrs = true; + return NULL_TREE; +} + + if (TREE_CODE (*node) == VAR_DECL || TREE_CODE (*node) == TYPE_DECL) +{ + decl = *node; + type = TREE_TYPE (decl); +} + + if (!decl + || (!(TREE_CODE (type) == POINTER_TYPE + && TREE_CODE (TREE_TYPE (type)) == FUNCTION_TYPE) + && TREE_CODE (type) != FUNCTION_TYPE)) +{ + warning (OPT_Wattributes, "%qE attribute only applies to base type of a " +"function pointer", name); + *no_add_attrs = true; + return NULL_TREE; +} + + /* type is either a function pointer, when the attribute is used on a function + * pointer, or a function type when used in a typedef. 
*/ + if (TREE_CODE (type) == FUNCTION_TYPE) +fntype = type; + else +fntype = TREE_TYPE (type); + + *no_add_attrs |= cmse_func_args_or_return_in_stack (NULL, name, fntype); + + if (*no_add_attrs) +return NULL_TREE; + + /* Prevent tree's being shared among function types with and without + cmse_nonsecure_call attribute. Do however make sure they keep the same + main_variant, this is required for correct DIE output. */ + main_variant = TYPE_MAIN_VARIANT (fntype); + fntype = build_distinct_type_copy (fntype); + TYPE_MAIN_VARIANT (fntype) = main_variant; + if (TREE_CODE (type) == FUNCTION_TYPE) +TREE_TYPE (decl) = fntype; + else +TREE_TYPE (type) = fntype; + + /* Construct a type attribute and add it to the function type. */ + tree attrs = tree_cons (get_identifier ("cmse_nonsecure_call"), NULL_TREE, + TYPE_ATTRIBUTES (fntype)); + TYPE_ATTRIBUTES (fntype) = attrs; + return NULL_TREE; +} + /* Return 0 if the attributes for two t
[RFC][PATCH, ARM 5/8] ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers
[Sending on behalf of Andre Vieira] Hello, This patch extends support for the ARMv8-M Security Extensions 'cmse_nonsecure_entry' attribute to safeguard against leak of information through unbanked registers. When returning from a nonsecure entry function we clear all caller-saved registers that are not used to pass return values, by writing either the LR, in case of general purpose registers, or the value 0, in case of FP registers. We use the LR to write to APSR and FPSCR too. We currently only support 32 FP registers as in we only clear D0-D7. We currently do not support entry functions that pass arguments or return variables on the stack and we diagnose this. This patch relies on the existing code to make sure callee-saved registers used in cmse_nonsecure_entry functions are saved and restored thus retaining their nonsecure mode value, this should be happening already as it is required by AAPCS. *** gcc/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc/config/arm/arm.c (output_return_instruction): Clear registers. (thumb2_expand_return): Likewise. (thumb1_expand_epilogue): Likewise. (arm_expand_epilogue): Likewise. (cmse_nonsecure_entry_clear_before_return): New. * gcc/config/arm/arm.h (TARGET_DSP_ADD): New macro define. * gcc/config/arm/thumb1.md (*epilogue_insns): Change length attribute. * gcc/config/arm/thumb2.md (*thumb2_return): Likewise. *** gcc/testsuite/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse.exp: Test different multilibs separate. * gcc.target/arm/cmse/baseline/cmse-2.c: Test that registers are cleared. * gcc.target/arm/cmse/mainline/soft/cmse-5.c: New. * gcc.target/arm/cmse/mainline/hard/cmse-5.c: New. * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: New. * gcc.target/arm/cmse/mainline/softfp/cmse-5.c: New. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: New. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index f12e3c93bbe24b10ed8eee6687161826773ef649..b06e0586a3da50f57645bda13629bc4dbd3d53b7 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -230,6 +230,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void); /* Integer SIMD instructions, and extend-accumulate instructions. */ #define TARGET_INT_SIMD \ (TARGET_32BIT && arm_arch6 && (arm_arch_notm || arm_arch7em)) +/* Parallel addition and subtraction instructions. */ +#define TARGET_DSP_ADD \ + (TARGET_ARM_ARCH >= 6 && (arm_arch_notm || arm_arch7em)) /* Should MOVW/MOVT be used in preference to a constant pool. */ #define TARGET_USE_MOVT \ diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e530b772e3cc053c16421a2a2861d815d53ebb01..0700478ca38307f35d0cb01f83ea182802ba28fa 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -19755,6 +19755,24 @@ output_return_instruction (rtx operand, bool really_return, bool reverse, default: if (IS_CMSE_ENTRY (func_type)) { + char flags[12] = "APSR_nzcvq"; + /* Check if we have to clear the 'GE bits' which is only used if +parallel add and subtraction instructions are available. */ + if (TARGET_DSP_ADD) + { + /* If so also clear the ge flags. */ + flags[10] = 'g'; + flags[11] = '\0'; + } + snprintf (instr, sizeof (instr), "msr%s\t%s, %%|lr", conditional, + flags); + output_asm_insn (instr, & operand); + if (TARGET_HARD_FLOAT && TARGET_VFP) + { + snprintf (instr, sizeof (instr), "vmsr%s\tfpscr, %%|lr", + conditional); + output_asm_insn (instr, & operand); + } snprintf (instr, sizeof (instr), "bxns%s\t%%|lr", conditional); } /* Use bx if it's available. 
*/ @@ -23999,6 +24017,17 @@ thumb_pop (FILE *f, unsigned long mask) static void thumb1_cmse_nonsecure_entry_return (FILE *f, int reg_containing_return_addr) { + char flags[12] = "APSR_nzcvq"; + /* Check if we have to clear the 'GE bits' which is only used if + parallel add and subtraction instructions are available. */ + if (TARGET_DSP_ADD) +{ + flags[10] = 'g'; + flags[11] = '\0'; +} + asm_fprintf (f, "\tmsr\t%s, %r\n", flags, reg_containing_return_addr); + if (TARGET_HARD_FLOAT && TARGET_VFP) +asm_fprintf (f, "\tvmsr\tfpscr, %r\n", reg_containing_return_addr); asm_fprintf (f, "\tbxns\t%r\n", reg_containing_return_addr); }
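To make the behaviour described in this patch concrete, here is a minimal sketch (not taken from the patch; the function name is invented) of a non-secure entry function. With -mcmse (patch 1/8), the epilogue generated for it is expected to clear the caller-saved registers that do not hold the return value, write LR to APSR (and to FPSCR on hard-float targets) and return with bxns:

/* Hypothetical example; compile with -mcmse for an ARMv8-M target.  */
int __attribute__ ((cmse_nonsecure_entry))
read_status (int channel)
{
  /* Secure-side computation may leave secret data in r1-r3, r12 and, for
     hard-float targets, in d0-d7; the epilogue added by this patch clears
     them before the bxns back to the non-secure caller.  */
  return channel > 0;
}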
[RFC][PATCH, ARM 4/8] ARMv8-M Security Extension's cmse_nonsecure_entry: __acle_se label and bxns return
[Sending on behalf of Andre Vieira] Hello, This patch extends support for the ARMv8-M Security Extensions 'cmse_nonsecure_entry' attribute in two ways: 1) Generate two labels for the function, the regular function name and one with the function's name appended to '__acle_se_', this will trigger the linker to create a secure gateway veneer for this entry function. 2) Return from cmse_nonsecure_entry marked functions using bxns. See Section 5.4 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). *** gcc/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc/config/arm/arm.c (use_return_insn): Change to return with bxns when cmse_nonsecure_entry. (output_return_instruction): Likewise. (arm_output_function_prologue): Likewise. (thumb_pop): Likewise. (thumb_exit): Likewise. (arm_function_ok_for_sibcall): Disable sibcall for entry functions. (arm_asm_declare_function_name): New. (thumb1_cmse_nonsecure_entry_return): New. * gcc/config/arm/arm-protos.h (arm_asm_declare_function_name): New. * gcc/config/arm/elf.h (ASM_DECLARE_FUNCTION_NAME): Redefine to use arm_asm_declare_function_name. *** gcc/testsuite/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-2.c: New. * gcc.target/arm/cmse/cmse-4.c: New. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 85dca057d63544c672188db39b05a33b1be10915..9ee8c333046d9a5bb0487f7b710a5aff42d2 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -31,6 +31,7 @@ extern int arm_volatile_func (void); extern void arm_expand_prologue (void); extern void arm_expand_epilogue (bool); extern void arm_declare_function_name (FILE *, const char *, tree); +extern void arm_asm_declare_function_name (FILE *, const char *, tree); extern void thumb2_expand_return (bool); extern const char *arm_strip_name_encoding (const char *); extern void arm_asm_output_labelref (FILE *, const char *); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 5b9e51b10e91eee64e3383c1ed50269c3e6cf24c..e530b772e3cc053c16421a2a2861d815d53ebb01 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3795,6 +3795,11 @@ use_return_insn (int iscond, rtx sibling) return 0; } + /* ARMv8-M nonsecure entry function need to use bxns to return and thus need + several instructions if anything needs to be popped. */ + if (saved_int_regs && IS_CMSE_ENTRY (func_type)) +return 0; + /* If there are saved registers but the LR isn't saved, then we need two instructions for the return. */ if (saved_int_regs && !(saved_int_regs & (1 << LR_REGNUM))) @@ -6820,6 +6825,11 @@ arm_function_ok_for_sibcall (tree decl, tree exp) if (IS_INTERRUPT (func_type)) return false; + /* ARMv8-M non-secure entry functions need to return with bxns which is only + generated for entry functions themselves. */ + if (IS_CMSE_ENTRY (arm_current_func_type ())) +return false; + if (!VOID_TYPE_P (TREE_TYPE (DECL_RESULT (cfun->decl { /* Check that the return value locations are the same. For @@ -19607,6 +19617,7 @@ output_return_instruction (rtx operand, bool really_return, bool reverse, (e.g. interworking) then we can load the return address directly into the PC. Otherwise we must load it into LR. 
*/ if (really_return + && !IS_CMSE_ENTRY (func_type) && (IS_INTERRUPT (func_type) || !TARGET_INTERWORK)) return_reg = reg_names[PC_REGNUM]; else @@ -19742,8 +19753,12 @@ output_return_instruction (rtx operand, bool really_return, bool reverse, break; default: + if (IS_CMSE_ENTRY (func_type)) + { + snprintf (instr, sizeof (instr), "bxns%s\t%%|lr", conditional); + } /* Use bx if it's available. */ - if (arm_arch5 || arm_arch4t) + else if (arm_arch5 || arm_arch4t) sprintf (instr, "bx%s\t%%|lr", conditional); else sprintf (instr, "mov%s\t%%|pc, %%|lr", conditional); @@ -19756,6 +19771,42 @@ output_return_instruction (rtx operand, bool really_return, bool reverse, return ""; } +/* Output in FILE asm statements needed to declare the NAME of the function + defined by its DECL node. */ + +void +arm_asm_declare_function_name (FILE *file, const char *name, tree decl) +{ + size_t cmse_name_len; + char *cmse_name = 0; + char cmse_prefix[] = "__acle_se_"; + + if (use_cmse && lookup_attribute ("cmse_nonsecure_entry", + DECL_ATTRIBUTES (decl))) +{ +
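As an illustration of the two points above (example invented, not actual compiler output), an entry function such as:

int __attribute__ ((cmse_nonsecure_entry))
get_version (void)
{
  return 3;
}

/* is expected to be emitted with two labels and a bxns return, roughly:
     get_version:
     __acle_se_get_version:
             movs    r0, #3
             bxns    lr
   The extra '__acle_se_' label is what triggers the linker to create the
   secure gateway veneer for this entry function (sketch only; any
   prologue/epilogue code is omitted).  */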
[RFC][PATCH, ARM 3/8] Handling ARMv8-M Security Extension's cmse_nonsecure_entry attribute
[Sending on behalf of Andre Vieira] Hello, This patch adds support for the ARMv8-M Security Extensions 'cmse_nonsecure_entry' attribute. In this patch we implement the attribute handling and diagnosis around the attribute. See Section 5.4 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). *** gcc/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc/config/arm/arm.c (arm_handle_cmse_nonsecure_entry): New. (arm_attribute_table): Added cmse_nonsecure_entry (arm_compute_func_type): Handle cmse_nonsecure_entry. (cmse_func_args_or_return_in_stack): New. (arm_handle_cmse_nonsecure_entry): New. * gcc/config/arm/arm.h (ARM_FT_CMSE_ENTRY): New macro define. (IS_CMSE_ENTRY): Likewise. *** gcc/testsuite/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-3.c: New. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index cf6d9466fb79e4f8a2dbfe725c52d5be8ea24fd2..f12e3c93bbe24b10ed8eee6687161826773ef649 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1375,6 +1375,7 @@ enum reg_class #define ARM_FT_VOLATILE(1 << 4) /* Does not return. */ #define ARM_FT_NESTED (1 << 5) /* Embedded inside another func. */ #define ARM_FT_STACKALIGN (1 << 6) /* Called with misaligned stack. */ +#define ARM_FT_CMSE_ENTRY (1 << 7) /* ARMv8-M non-secure entry function. */ /* Some macros to test these flags. */ #define ARM_FUNC_TYPE(t) (t & ARM_FT_TYPE_MASK) @@ -1383,6 +1384,7 @@ enum reg_class #define IS_NAKED(t)(t & ARM_FT_NAKED) #define IS_NESTED(t) (t & ARM_FT_NESTED) #define IS_STACKALIGN(t) (t & ARM_FT_STACKALIGN) +#define IS_CMSE_ENTRY(t) (t & ARM_FT_CMSE_ENTRY) /* Structure used to hold the function stack frame layout. Offsets are diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 2223101fbf96bceb4beb3a7d6cb04162481dc3bf..5b9e51b10e91eee64e3383c1ed50269c3e6cf24c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -135,6 +135,7 @@ static tree arm_handle_isr_attribute (tree *, tree, tree, int, bool *); #if TARGET_DLLIMPORT_DECL_ATTRIBUTES static tree arm_handle_notshared_attribute (tree *, tree, tree, int, bool *); #endif +static tree arm_handle_cmse_nonsecure_entry (tree *, tree, tree, int, bool *); static void arm_output_function_epilogue (FILE *, HOST_WIDE_INT); static void arm_output_function_prologue (FILE *, HOST_WIDE_INT); static int arm_comp_type_attributes (const_tree, const_tree); @@ -343,6 +344,9 @@ static const struct attribute_spec arm_attribute_table[] = { "notshared",0, 0, false, true, false, arm_handle_notshared_attribute, false }, #endif + /* ARMv8-M Security Extensions support. */ + { "cmse_nonsecure_entry", 0, 0, true, false, false, +arm_handle_cmse_nonsecure_entry, false }, { NULL, 0, 0, false, false, false, NULL, false } }; @@ -3562,6 +3566,9 @@ arm_compute_func_type (void) else type |= arm_isr_value (TREE_VALUE (a)); + if (lookup_attribute ("cmse_nonsecure_entry", attr)) +type |= ARM_FT_CMSE_ENTRY; + return type; } @@ -6552,6 +6559,109 @@ arm_handle_notshared_attribute (tree *node, } #endif +/* This function is used to check whether functions with attributes + cmse_nonsecure_call or cmse_nonsecure_entry use the stack to pass arguments + or return variables. If the function does indeed use the stack this + function returns true and diagnoses this, otherwise it returns false. 
*/ + +static bool +cmse_func_args_or_return_in_stack (tree fndecl, tree name, tree fntype) +{ + function_args_iterator args_iter; + CUMULATIVE_ARGS args_so_far_v; + cumulative_args_t args_so_far; + bool first_param = true; + tree arg_type, prev_arg_type = NULL_TREE, ret_type; + + /* Error out if any argument is passed on the stack. */ + arm_init_cumulative_args (&args_so_far_v, fntype, NULL_RTX, fndecl); + args_so_far = pack_cumulative_args (&args_so_far_v); + FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter) +{ + rtx arg_rtx; + machine_mode arg_mode = TYPE_MODE (arg_type); + + prev_arg_type = arg_type; + if (VOID_TYPE_P (arg_type)) + continue; + + if (!first_param) + arm_function_arg_advance (args_so_far, arg_mode, arg_type, true); + arg_rtx = arm_function_arg (args_so_far, arg_mode, arg_type, true); + if (!arg_rtx + || arm_arg_partial_bytes (args_so_far, arg_mode, arg_type, true)) + { + error ("%qE attribute not available to functions with arguments " +"passed on the stack", name); + return true; + } + first_param = false; +} + + /* Error out for
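A sketch of the diagnosis described above (declarations invented): under the AAPCS only four core registers are available for integer arguments, so a prototype whose arguments spill to the stack is rejected for this attribute, while one that fits in registers is accepted.

/* Accepted: all four arguments are passed in r0-r3.  */
int __attribute__ ((cmse_nonsecure_entry))
entry_ok (int a, int b, int c, int d);

/* Rejected: the fifth argument would be passed on the stack, so the
   compiler is expected to emit the "attribute not available to functions
   with arguments passed on the stack" error added in this patch.  */
int __attribute__ ((cmse_nonsecure_entry))
entry_bad (int a, int b, int c, int d, int e);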
[RFC][PATCH, ARM 2/8] Add RTL patterns for thumb1 push/pop
[Sending on behalf of Andre Vieira] Hello, This patch adds RTL patterns for the push and pop instructions for thumb1. These are needed by subsequent patches in the series. *** gcc/ChangeLog *** 2015-10-27 Andre Vieira Thomas Preud'homme * gcc/config/arm/arm-ldmstm.nl (constr thumb): Enabled stackpointer to be written/read. * gcc/config/arm/ldmstm.md: Regenerated. * gcc/config/arm/thumb1.md (*thumb1_pop_single): New. (*thumb1_load_multiple_operation): New. * gcc/config/arm/arm.c (thumb_pop): Fix of comment. diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml index 62982df594d5d4a1407df359e927c66986a9788c..f3ee741e93927d8d44a9eccec8970b46a8984216 100644 --- a/gcc/config/arm/arm-ldmstm.ml +++ b/gcc/config/arm/arm-ldmstm.ml @@ -63,7 +63,7 @@ let rec final_offset addrmode nregs = | DB -> -4 * nregs let constr thumb = - if thumb then "l" else "rk" + if thumb then "lk" else "rk" let inout_constr op_type = match op_type with diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 06a6184ee0c4ed1a7cec1de4c1786e297cc57872..2223101fbf96bceb4beb3a7d6cb04162481dc3bf 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -23773,8 +23773,8 @@ thumb1_emit_multi_reg_push (unsigned long mask, unsigned long real_regs) return insn; } -/* Emit code to push or pop registers to or from the stack. F is the - assembly file. MASK is the registers to pop. */ +/* Emit code to pop registers from the stack. F is the assembly file. + MASK is the registers to pop. */ static void thumb_pop (FILE *f, unsigned long mask) { diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md index ebb09ab86e799f3606e0988980edf3cd0189272b..8c0472e07799bd9d08759e35b6b98f3536d3d013 100644 --- a/gcc/config/arm/ldmstm.md +++ b/gcc/config/arm/ldmstm.md @@ -43,7 +43,7 @@ (define_insn "*thumb_ldm4_ia" [(match_parallel 0 "load_multiple_operation" [(set (match_operand:SI 1 "low_register_operand" "") - (mem:SI (match_operand:SI 5 "s_register_operand" "l"))) + (mem:SI (match_operand:SI 5 "s_register_operand" "lk"))) (set (match_operand:SI 2 "low_register_operand" "") (mem:SI (plus:SI (match_dup 5) (const_int 4 @@ -80,7 +80,7 @@ (define_insn "*thumb_ldm4_ia_update" [(match_parallel 0 "load_multiple_operation" -[(set (match_operand:SI 5 "s_register_operand" "+&l") +[(set (match_operand:SI 5 "s_register_operand" "+&lk") (plus:SI (match_dup 5) (const_int 16))) (set (match_operand:SI 1 "low_register_operand" "") (mem:SI (match_dup 5))) @@ -133,7 +133,7 @@ (define_insn "*thumb_stm4_ia_update" [(match_parallel 0 "store_multiple_operation" -[(set (match_operand:SI 5 "s_register_operand" "+&l") +[(set (match_operand:SI 5 "s_register_operand" "+&lk") (plus:SI (match_dup 5) (const_int 16))) (set (mem:SI (match_dup 5)) (match_operand:SI 1 "low_register_operand" "")) @@ -491,7 +491,7 @@ (define_insn "*thumb_ldm3_ia" [(match_parallel 0 "load_multiple_operation" [(set (match_operand:SI 1 "low_register_operand" "") - (mem:SI (match_operand:SI 4 "s_register_operand" "l"))) + (mem:SI (match_operand:SI 4 "s_register_operand" "lk"))) (set (match_operand:SI 2 "low_register_operand" "") (mem:SI (plus:SI (match_dup 4) (const_int 4 @@ -522,7 +522,7 @@ (define_insn "*thumb_ldm3_ia_update" [(match_parallel 0 "load_multiple_operation" -[(set (match_operand:SI 4 "s_register_operand" "+&l") +[(set (match_operand:SI 4 "s_register_operand" "+&lk") (plus:SI (match_dup 4) (const_int 12))) (set (match_operand:SI 1 "low_register_operand" "") (mem:SI (match_dup 4))) @@ -568,7 +568,7 @@ (define_insn "*thumb_stm3_ia_update" [(match_parallel 
0 "store_multiple_operation" -[(set (match_operand:SI 4 "s_register_operand" "+&l") +[(set (match_operand:SI 4 "s_register_operand" "+&lk") (plus:SI (match_dup 4) (const_int 12))) (set (mem:SI (match_dup 4)) (match_operand:SI 1 "low_register_operand" "")) @@ -877,7 +877,7 @@ (define_insn "*thumb_ldm2_ia" [(match_parallel 0 "load_multiple_operation" [(set (match_operand:SI 1 "low_
RE: [RFC][PATCH, ARM 1/8] Add support for ARMv8-M's Security Extensions flag and intrinsics
And even better, with the patch (see below ChangeLog entries)! Sigh... > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Saturday, December 26, 2015 9:41 AM > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; > Kyrylo Tkachov > Subject: [RFC][PATCH, ARM 1/8] Add support for ARMv8-M's Security > Extensions flag and intrinsics > > [Sending on behalf of Andre Vieira] > > Hello, > > This patch adds the support of the '-mcmse' option to enable ARMv8-M's > Security Extensions and supports the following intrinsics: > cmse_TT > cmse_TT_fptr > cmse_TTT > cmse_TTT_fptr > cmse_TTA > cmse_TTA_fptr > cmse_TTAT > cmse_TTAT_fptr > cmse_check_address_range > cmse_check_pointed_object > cmse_is_nsfptr > cmse_nsfptr_create > > It also defines the mandatory cmse_address_info struct and the > __ARM_FEATURE_CMSE macro. > See Chapter 4, Sections 5.2, 5.3 and 5.6 of ARM®v8-M Security > Extensions: Requirements on Development Tools > (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index > .html). > > *** gcc/ChangeLog *** > 2015-10-27 Andre Vieira > Thomas Preud'homme > > * gcc/config.gcc (extra_headers): Added arm_cmse.h. > * gcc/config/arm/arm-arches.def (ARM_ARCH): > (armv8-m): Add FL2_CMSE. > (armv8-m.main): Likewise. > (armv8-m.main+dsp): Likewise. > * gcc/config/arm/arm-c.c > (arm_cpu_builtins): Added __ARM_FEATURE_CMSE macro. > * gcc/config/arm/arm-protos.h > (arm_is_constant_pool_ref): Define FL2_CMSE. > * gcc/config/arm.c (arm_arch_cmse): New. > (arm_option_override): New error for unsupported cmse target. > * gcc/config/arm/arm.h (arm_arch_cmse): New. > * gcc/config/arm/arm.opt (mcmse): New. > * gcc/doc/invoke.texi (ARM Options): Add -mcmse. > * gcc/config/arm/arm_cmse.h: New file. > * libgcc/config/arm/cmse.c: Likewise. > * libgcc/config/arm/t-arm (HAVE_CMSE): New. > > > *** gcc/testsuite/ChangeLog *** > 2015-10-27 Andre Vieira > Thomas Preud'homme > > * gcc.target/arm/cmse/cmse.exp: New. > * gcc.target/arm/cmse/cmse-1.c: New. > * gcc.target/arm/cmse/cmse-12.c: New. > * lib/target-supports.exp > (check_effective_target_arm_cmse_ok): New. 
diff --git a/gcc/config.gcc b/gcc/config.gcc index 882e4134b4c883a5fe0f19996e54ac63769bada1..701082e82ee3da6c5bf00da799293c92af8624ff 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -321,7 +321,7 @@ arc*-*-*) arm*-*-*) cpu_type=arm extra_objs="arm-builtins.o aarch-common.o" - extra_headers="mmintrin.h arm_neon.h arm_acle.h" + extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_cmse.h" target_type_format_char='%' c_target_objs="arm-c.o" cxx_target_objs="arm-c.o" diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def index 1d0301a3b9414127d387834584f3e42c225b6d3f..52518e64e07a1b085ae5ed1932b598e064258971 100644 --- a/gcc/config/arm/arm-arches.def +++ b/gcc/config/arm/arm-arches.def @@ -58,11 +58,11 @@ ARM_ARCH("armv7e-m", cortexm4, 7EM, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_F ARM_ARCH("armv8-a", cortexa53, 8A,ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A)) ARM_ARCH("armv8-a+crc",cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A)) ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE, -ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE)) +ARM_FSET_MAKE ( FL_FOR_ARCH8M_BASE, FL2_CMSE)) ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN, -ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_FOR_ARCH8M_MAIN)) +ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8M_MAIN, FL2_CMSE)) ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN, -ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN)) +ARM_FSET_MAKE (FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN, FL2_CMSE)) ARM_ARCH("iwmmxt", iwmmxt, 5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)) ARM_ARCH("iwmmxt2", iwmmxt2,5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)) diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index 7dee28ec52df68f8c7a60fe66e1b049fed39c1c0..459ddbb1f41947cbeeb1a291ab7395843528e562 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -73
[RFC][PATCH, ARM 1/8] Add support for ARMv8-M's Security Extensions flag and intrinsics
[Sending on behalf of Andre Vieira]

Hello,

This patch adds support for the '-mcmse' option to enable ARMv8-M's Security Extensions and supports the following intrinsics:
cmse_TT
cmse_TT_fptr
cmse_TTT
cmse_TTT_fptr
cmse_TTA
cmse_TTA_fptr
cmse_TTAT
cmse_TTAT_fptr
cmse_check_address_range
cmse_check_pointed_object
cmse_is_nsfptr
cmse_nsfptr_create

It also defines the mandatory cmse_address_info struct and the __ARM_FEATURE_CMSE macro. See Chapter 4, Sections 5.2, 5.3 and 5.6 of ARM®v8-M Security Extensions: Requirements on Development Tools (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

*** gcc/ChangeLog ***

2015-10-27  Andre Vieira
            Thomas Preud'homme

        * gcc/config.gcc (extra_headers): Added arm_cmse.h.
        * gcc/config/arm/arm-arches.def (ARM_ARCH):
        (armv8-m): Add FL2_CMSE.
        (armv8-m.main): Likewise.
        (armv8-m.main+dsp): Likewise.
        * gcc/config/arm/arm-c.c (arm_cpu_builtins): Added __ARM_FEATURE_CMSE macro.
        * gcc/config/arm/arm-protos.h (arm_is_constant_pool_ref): Define FL2_CMSE.
        * gcc/config/arm.c (arm_arch_cmse): New.
        (arm_option_override): New error for unsupported cmse target.
        * gcc/config/arm/arm.h (arm_arch_cmse): New.
        * gcc/config/arm/arm.opt (mcmse): New.
        * gcc/doc/invoke.texi (ARM Options): Add -mcmse.
        * gcc/config/arm/arm_cmse.h: New file.
        * libgcc/config/arm/cmse.c: Likewise.
        * libgcc/config/arm/t-arm (HAVE_CMSE): New.

*** gcc/testsuite/ChangeLog ***

2015-10-27  Andre Vieira
            Thomas Preud'homme

        * gcc.target/arm/cmse/cmse.exp: New.
        * gcc.target/arm/cmse/cmse-1.c: New.
        * gcc.target/arm/cmse/cmse-12.c: New.
        * lib/target-supports.exp (check_effective_target_arm_cmse_ok): New.

We welcome any comments.

Cheers,

Andre
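As a usage sketch for the new option and one of the intrinsics above (the buffer-checking function is invented; the exact prototype and flag name follow the Requirements on Development Tools document referenced above):

#include <arm_cmse.h>

/* Verify that a buffer handed in by the non-secure world really lies in
   non-secure memory before the secure side dereferences it.  Returns
   non-zero when the whole range passes the check.  Compile with -mcmse.  */
int buffer_is_nonsecure (void *buf, unsigned len)
{
  return cmse_check_address_range (buf, len, CMSE_NONSECURE) != 0;
}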
[RFC][PATCH, ARM 0/8] ARMv8-M Security Extensions
[Sending on behalf of Andre Vieira]

Hello,

This patch series aims at implementing alpha-status support for ARMv8-M's Security Extensions. It is only posted as RFC at this stage. You can find the specification of ARMv8-M Security Extensions in: ARM®v8-M Security Extensions: Requirements on Development Tools (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html).

We currently:
- do not support passing arguments or returning on the stack for cmse_nonsecure_{call,entry} functions,
- do not guarantee padding bits are cleared for arguments or return variables of cmse_nonsecure_{call,entry} functions,
- only test Security Extensions for -mfpu=fpv5-d16 and fpv5-sp-d16 and only support single and double precision FPUs with d16.

Andre Vieira (8):
  Add support for ARMv8-M's Security Extensions flag and intrinsics
  Add RTL patterns for thumb1 push/pop
  Handling ARMv8-M Security Extension's cmse_nonsecure_entry attribute
  ARMv8-M Security Extension's cmse_nonsecure_entry: __acle_se label and bxns return
  ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers
  Handling ARMv8-M Security Extension's cmse_nonsecure_call attribute
  ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call
  Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic

Cheers,

Andre
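To give an idea of the user-visible side of the two attributes named in the limitations above, here is an invented sketch of a call through a non-secure function pointer; the helper name comes from the title of patch 7/8 and the attribute placement on the function type is an assumption based on the referenced specification.

/* Non-secure function type: calls through pointers of this type transition
   from secure to non-secure state (sketch).  */
typedef int __attribute__ ((cmse_nonsecure_call)) ns_callback_t (int);

int run_callback (ns_callback_t *cb, int arg)
{
  /* Expected to be routed through the __gnu_cmse_nonsecure_call helper
     added in patch 7/8 of this series (presumably so that secure register
     state is not leaked across the transition).  */
  return cb (arg);
}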
[PATCH, ARM 7/6] Enable atomics for ARMv8-M Mainline
Hi,

This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This specific patch enables atomics for ARMv8-M Mainline. No change is needed to the existing patterns since the Thumb-2 backend can already handle them fine.

[1] For a quick overview of ARMv8-M please refer to the initial cover letter.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-12-17  Thomas Preud'homme

        * config/arm/arm.h (TARGET_HAVE_LDACQ): Enable for ARMv8-M Mainline.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1f79c37b5c36a410a2d500ba92c62a5ba4ca1178..fa2a6fb03ffd2ca53bfb7e7c8f03022b626880e0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -258,7 +258,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
        || arm_arch7) && arm_arch_notm)

 /* Nonzero if this chip supports load-acquire and store-release.  */
-#define TARGET_HAVE_LDACQ	(TARGET_ARM_ARCH >= 8 && arm_arch_notm)
+#define TARGET_HAVE_LDACQ	(TARGET_ARM_ARCH >= 8 && TARGET_32BIT)

 /* Nonzero if this chip provides the movw and movt instructions.  */
 #define TARGET_HAVE_MOVT	(arm_arch_thumb2 || arm_arch8)

Testing:
* Toolchain was built successfully with and without the ARMv8-M support patches with the following multilib list: armv6-m,armv7-m,armv7e-m,cortex-m7. The code generation for crtbegin.o, crtend.o, crti.o, crtn.o, libgcc.a, libgcov.a, libc.a, libg.a, libgloss-linux.a, libm.a, libnosys.a, librdimon.a, librdpmon.a, libstdc++.a and libsupc++.a is unchanged for all these targets.
* GCC also showed no testsuite regression when targeting ARMv8-M Baseline compared to ARMv6-M on ARM Fast Models and when targeting ARMv6-M and ARMv7-M (compared to without the patch).
* GCC was bootstrapped successfully targeting Thumb-1 and targeting Thumb-2.

Is this ok for stage3?

Best regards,

Thomas
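A sketch of the kind of source affected (functions invented): with TARGET_HAVE_LDACQ now true for ARMv8-M Mainline, acquire loads and release stores like the ones below are expected to map onto the load-acquire/store-release instructions through the existing Thumb-2 patterns rather than plain accesses plus barriers.

#include <stdatomic.h>

/* Acquire load: candidate for a load-acquire instruction.  */
int read_flag (atomic_int *flag)
{
  return atomic_load_explicit (flag, memory_order_acquire);
}

/* Release store: candidate for a store-release instruction.  */
void publish_flag (atomic_int *flag, int value)
{
  atomic_store_explicit (flag, value, memory_order_release);
}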
[arm-embedded][PATCH, ARM, 3/3] Add multilib support for bare-metal ARM architectures
Hi, We decided to apply the following patch to the ARM embedded 5 branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Wednesday, December 16, 2015 8:04 PM > To: 'Ramana Radhakrishnan'; Richard Earnshaw; Kyrylo Tkachov; gcc- > patches > Cc: Jasmin J. > Subject: [PATCH, ARM, 3/3] Add multilib support for bare-metal ARM > architectures > > Hi Ramana, > > As suggested in your initial answer to this thread, we updated the > multilib patch provided in ARM's embedded branch to be up-to-date > with regards to supported CPUs in GCC. As to the need to modify > Makefile.in and configure.ac, this is because the patch aims to let control > to the user as to what multilib should be built. To this effect, it takes a > list > of architecture at configure time and that list needs to be passed down > to t-baremetal Makefile to set the multilib variables appropriately. > > ChangeLog entry is as follows: > > > *** gcc/ChangeLog *** > > 2015-12-15 Thomas Preud'homme > > * Makefile.in (with_multilib_list): New variables substituted by > configure. > * config.gcc: Handle bare-metal multilibs in --with-multilib-list > option. > * config/arm/t-baremetal: New file. > * configure.ac (with_multilib_list): New AC_SUBST. > * configure: Regenerate. > * doc/install.texi (--with-multilib-list): Update description for > arm*-*-* targets to mention bare-metal multilibs. > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index > 1f698798aa2df3f44d6b3a478bb4bf48e9fa7372..18b790afa114aa7580be06 > 62d3ac9ffbc94e919d 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -546,6 +546,7 @@ lang_opt_files=@lang_opt_files@ $(srcdir)/c- > family/c.opt $(srcdir)/common.opt > lang_specs_files=@lang_specs_files@ > lang_tree_files=@lang_tree_files@ > target_cpu_default=@target_cpu_default@ > +with_multilib_list=@with_multilib_list@ > OBJC_BOEHM_GC=@objc_boehm_gc@ > extra_modes_file=@extra_modes_file@ > extra_opt_files=@extra_opt_files@ > diff --git a/gcc/config.gcc b/gcc/config.gcc > index > af948b5e203f6b4f53dfca38e9d02d060d00c97b..d8098ed3cefacd00cb1059 > 0db1ec86d48e9fcdbc 100644 > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -3787,15 +3787,25 @@ case "${target}" in > default) > ;; > *) > - echo "Error: --with-multilib- > list=${with_multilib_list} not supported." 1>&2 > - exit 1 > + for arm_multilib in ${arm_multilibs}; do > + case ${arm_multilib} in > + armv6-m | armv7-m | armv7e-m > | armv7-r | armv8-m.base | armv8-m.main) > + > tmake_profile_file="arm/t-baremetal" > + ;; > + *) > + echo "Error: --with- > multilib-list=${with_multilib_list} not supported." 1>&2 > + exit 1 > + ;; > + esac > + done > ;; > esac > > if test "x${tmake_profile_file}" != x ; then > - # arm/t-aprofile is only designed to work > - # without any with-cpu, with-arch, with- > mode, > - # with-fpu or with-float options. > + # arm/t-aprofile and arm/t-baremetal are > only > + # designed to work without any with-cpu, > + # with-arch, with-mode, with-fpu or > with-float > + # options. > if test "x$with_arch" != x \ > || test "x$with_cpu" != x \ > || test "x$with_float" != x \ > diff --git a/gcc/config/arm/t-baremetal b/gcc/config/arm/t-baremetal > new file mode 100644 > index > ..ffd29815e6ec22c747e777 > 47ed9b69e0ae21b63a > --- /dev/null > +++ b/gcc/config/arm/t-baremetal > @@ -0,0 +1,130 @@ > +# A set of predefined MULTILIB which can be used for different ARM > targets. > +# Via the configure option --
[arm-embedded][PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs
Hi, We decided to apply the following patch to the ARM embedded 5 branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Wednesday, December 16, 2015 7:59 PM > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; > Kyrylo Tkachov > Subject: [PATCH, GCC/ARM, 2/3] Error out for incompatible ARM > multilibs > > Currently in config.gcc, only the first multilib in a multilib list is > checked for > validity and the following elements are ignored due to the break which > only breaks out of loop in shell. A loop is also done over the multilib list > elements despite no combination being legal. This patch rework the code > to address both issues. > > ChangeLog entry is as follows: > > > 2015-11-24 Thomas Preud'homme > > * config.gcc: Error out when conflicting multilib is detected. Do not > loop over multilibs since no combination is legal. > > > diff --git a/gcc/config.gcc b/gcc/config.gcc > index 59aee2c..be3c720 100644 > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -3772,38 +3772,40 @@ case "${target}" in > # Add extra multilibs > if test "x$with_multilib_list" != x; then > arm_multilibs=`echo $with_multilib_list | sed -e > 's/,/ /g'` > - for arm_multilib in ${arm_multilibs}; do > - case ${arm_multilib} in > - aprofile) > + case ${arm_multilibs} in > + aprofile) > # Note that arm/t-aprofile is a > # stand-alone make file fragment to be > # used only with itself. We do not > # specifically use the > # TM_MULTILIB_OPTION framework > because > # this shorthand is more > - # pragmatic. Additionally it is only > - # designed to work without any > - # with-cpu, with-arch with-mode > + # pragmatic. > + tmake_profile_file="arm/t-aprofile" > + ;; > + default) > + ;; > + *) > + echo "Error: --with-multilib- > list=${with_multilib_list} not supported." 1>&2 > + exit 1 > + ;; > + esac > + > + if test "x${tmake_profile_file}" != x ; then > + # arm/t-aprofile is only designed to work > + # without any with-cpu, with-arch, with- > mode, > # with-fpu or with-float options. > - if test "x$with_arch" != x \ > - || test "x$with_cpu" != x \ > - || test "x$with_float" != x \ > - || test "x$with_fpu" != x \ > - || test "x$with_mode" != x ; > then > - echo "Error: You cannot use > any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=aprofile" > 1>&2 > - exit 1 > - fi > - tmake_file="${tmake_file} > arm/t-aprofile" > - break > - ;; > - default) > - ;; > - *) > - echo "Error: --with-multilib- > list=${with_multilib_list} not supported." 1>&2 > - exit 1 > - ;; > - esac > - done > + if test "x$with_arch" != x \ > + || test "x$with_cpu" != x \ > + || test "x$with_float" != x \ > + || test "x$with_fpu" != x \ > + || test "x$with_mode" != x ; then > +
[arm-embedded][PATCH, ARM, 1/3] Document --with-multilib-list for arm*-*-* targets
Hi, We decided to apply the following patch to the ARM embedded 5 branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Wednesday, December 16, 2015 7:56 PM > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; > Kyrylo Tkachov > Subject: [PATCH, ARM, 1/3] Document --with-multilib-list for arm*-*-* > targets > > Currently, the documentation for --with-multilib-list in > gcc/doc/install.texi only mentions sh*-*-* and x86-64-*-linux* targets. > However, arm*-*-* targets also support this option. This patch adds > documention for the meaning of this option for arm*-*-* targets. > > ChangeLog entry is as follows: > > > *** gcc/ChangeLog *** > > 2015-12-09 Thomas Preud'homme > > * doc/install.texi (--with-multilib-list): Describe the meaning of the > option for arm*-*-* targets. > > > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi > index 57399ed..2c93eb0 100644 > --- a/gcc/doc/install.texi > +++ b/gcc/doc/install.texi > @@ -1102,9 +1102,19 @@ sysv, aix. > @item --with-multilib-list=@var{list} > @itemx --without-multilib-list > Specify what multilibs to build. > -Currently only implemented for sh*-*-* and x86-64-*-linux*. > +Currently only implemented for arm*-*-*, sh*-*-* and x86-64-*-linux*. > > @table @code > +@item arm*-*-* > +@var{list} is either @code{default} or @code{aprofile}. Specifying > +@code{default} is equivalent to omitting this option while specifying > +@code{aprofile} builds multilibs for each combination of ISA (@code{- > marm} or > +@code{-mthumb}), architecture (@code{-march=armv7-a}, @code{- > march=armv7ve}, > +or @code{-march=armv8-a}), FPU available (none, @code{- > mfpu=vfpv3-d16}, > +@code{neon}, @code{vfpv4-d16}, @code{neon-vfpv4} or > @code{neon-fp-armv8} > +depending on architecture) and floating-point ABI (@code{-mfloat- > abi=softfp} > +or @code{-mfloat-abi=hard}). > + > @item sh*-*-* > @var{list} is a comma separated list of CPU names. These must be of > the > form @code{sh*} or @code{m*} (in which case they match the compiler > option > > > PDF builds fine out of the updated file and look as expected. > > Is this ok for trunk? > > Best regards, > > Thomas
RE: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets
Reverted now. > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Wednesday, December 09, 2015 5:56 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm- > none-eabi targets > > c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets > due to -fcilkplus needing -pthread which is not available for those targets. > This patch solves this issue by adding a condition to the cilkplus effective > target that compiling with -fcilkplus succeeds and requires cilkplus as an > effective target for attr-simd-3.c testcase. > > ChangeLog entry is as follows: > > > *** gcc/testsuite/ChangeLog *** > > 2015-12-08 Thomas Preud'homme > > PR testsuite/68629 > * lib/target-supports.exp (check_effective_target_cilkplus): Also > check that compiling with -fcilkplus does not give an error. > * c-c++-common/attr-simd-3.c: Require cilkplus effective target. > > > diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c- > c++-common/attr-simd-3.c > index d61ba82..1970c67 100644 > --- a/gcc/testsuite/c-c++-common/attr-simd-3.c > +++ b/gcc/testsuite/c-c++-common/attr-simd-3.c > @@ -1,4 +1,5 @@ > /* { dg-do compile } */ > +/* { dg-require-effective-target "cilkplus" } */ > /* { dg-options "-fcilkplus" } */ > /* { dg-prune-output "undeclared here \\(not in a > function\\)|\[^\n\r\]* was not declared in this scope" } */ > > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 4e349e9..95b903c 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } { > if { [istarget avr-*-*] } { > return 0; > } > -return 1 > +return [ check_no_compiler_messages_nocache fcilkplus_available > executable { > + #ifdef __cplusplus > + extern "C" > + #endif > + int dummy; > + } "-fcilkplus" ] > } > > proc check_linker_plugin_available { } { > > > Testsuite shows no regression when run with > + an arm-none-eabi GCC cross-compiler targeting Cortex-M3 > + a bootstrapped x86_64-linux-gnu GCC native compiler > > Is this ok for trunk? > > Best regards, > > Thomas >
RE: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets
Hi, > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Thursday, December 17, 2015 4:26 PM > > > > > > > --- a/gcc/testsuite/lib/target-supports.exp > > > > +++ b/gcc/testsuite/lib/target-supports.exp > > > > @@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } { > > > > if { [istarget avr-*-*] } { > > > > return 0; > > > > } > > > > -return 1 > > > > +return [ check_no_compiler_messages_nocache > fcilkplus_available executable { > > > > + #ifdef __cplusplus > > > > + extern "C" > > > > + #endif > > > > + int dummy; > > > > + } "-fcilkplus" ] > > > > } > > That change has been obviously bad. If anything, you want to make it > compile time only, i.e. check_no_compiler_messages_nocache > fcilkplus_available assembly Indeed, I failed to parse the space and didn't realize the kind of testing could be selected. > Just look at cilk-plus.exp: > It checks check_effective_target_cilkplus, and performs lots of tests if it > it returns true, and then checks check_libcilkrts_available and performs > further tests. > So, if any use of -fcilkplus fails on your target, then putting it > into check_effective_target_cilkplus is fine, you won't lose any Cilk+ > testing that way. Otherwise, if it is conditional say only some constructs, > say array notation is fine, but _Cilk_for is not, then even that is wrong. Ok. When I saw the very small list of target for which the condition returned true, I thought the goal was only to check if the target *could* support cilkplus and that actual support was tested by cilk-plus.exp. I'll revert this commit and prepare a patch to add arm in that list. > > In any case, IMHO the attr-simd-3.c test just should be moved into > c-c++-common/cilk-plus/SE/ directory. That was my thought initially but then I changed my mind, thinking that the test was placed there for a reason. I'll prepare a third patch to do that. My apologize for the breakage. Best regards, Thomas
[arm-embedded][PATCH, ARM 6/6] Add support for CB(N)Z and (U|S)DIV to ARMv8-M Baseline
Hi, We decided to apply the following patch to the ARM embedded 5 branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, December 17, 2015 4:18 PM > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; > Kyrylo Tkachov > Subject: [PATCH, ARM 6/6] Add support for CB(N)Z and (U|S)DIV to > ARMv8-M Baseline > > Hi, > > This patch is part of a patch series to add support for ARMv8-M[1] to GCC. > This specific patch makes the compiler start generating code with the > new CB(N)Z and (U|S)DIV instructions for ARMv8-M Baseline. > > Sharing of instruction patterns for div insn template with ARM or Thumb- > 2 was done by allowing %? punctuation character for Thumb-1. This is > safe to do since the compiler would fault in arm_print_condition if a > condition code is not handled by a branch in Thumb1. > > Unfortunately, cbz cannot be shared with cbranchsi4 because it would > lead to worse code for Thumb-1. Indeed, choosing cb(n)z over the other > alternatives for cbranchsi4 depends on the distance between target and > pc which lead insn-attrtab to evaluate the minimum length of this > pattern to be 2 as it cannot computer the distance statically. It would be > possible to determine that this alternative is not available for non > ARMv8-M Thumb-1 target statically but genattrtab is not currently > capable to do it, so this is for a later patch. > > > [1] For a quick overview of ARMv8-M please refer to the initial cover > letter. > > ChangeLog entry is as follows: > > > *** gcc/ChangeLog *** > > 2015-11-13 Thomas Preud'homme > > * config/arm/arm.c (arm_print_operand_punct_valid_p): Make %? > valid > for Thumb-1. > * config/arm/arm.h (TARGET_HAVE_CBZ): Define. > (TARGET_IDIV): Set for all Thumb targets provided they have > hardware > divide feature. > * config/arm/thumb1.md (thumb1_cbz): New insn. > > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index 015df50..247f144 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -263,9 +263,12 @@ extern void > (*arm_lang_output_object_attributes_hook)(void); > /* Nonzero if this chip provides the movw and movt instructions. */ > #define TARGET_HAVE_MOVT (arm_arch_thumb2 || arm_arch8) > > +/* Nonzero if this chip provides the cb{n}z instruction. */ > +#define TARGET_HAVE_CBZ (arm_arch_thumb2 || arm_arch8) > + > /* Nonzero if integer division instructions supported. */ > #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \ > - || (TARGET_THUMB2 && > arm_arch_thumb_hwdiv)) > + || (TARGET_THUMB && > arm_arch_thumb_hwdiv)) > > /* Nonzero if disallow volatile memory access in IT block. */ > #define TARGET_NO_VOLATILE_CE > (arm_arch_no_volatile_ce) > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index d832309..5ef3a1d 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -22568,7 +22568,7 @@ arm_print_operand_punct_valid_p > (unsigned char code) > { >return (code == '@' || code == '|' || code == '.' > || code == '(' || code == ')' || code == '#' > - || (TARGET_32BIT && (code == '?')) > + || code == '?' 
> || (TARGET_THUMB2 && (code == '!')) > || (TARGET_THUMB && (code == '_'))); > } > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > index 7e3bcb4..074b267 100644 > --- a/gcc/config/arm/thumb1.md > +++ b/gcc/config/arm/thumb1.md > @@ -973,6 +973,92 @@ >DONE; > }) > > +;; A pattern for the cb(n)z instruction added in ARMv8-M baseline > profile, > +;; adapted from cbranchsi4_insn. Modifying cbranchsi4_insn instead > leads to > +;; code generation difference for ARMv6-M because the minimum > length of the > +;; instruction becomes 2 even for it due to a limitation in genattrtab's > +;; handling of pc in the length condition. > +(define_insn "thumb1_cbz" > + [(set (pc) (if_then_else > + (match_operator 0 "equality_operator" > +[(match_operand:SI 1 "s_register_operand" "l") > + (const_int 0)]) > + (label_ref (match_operand 2 "" "")) > + (pc)))] > + "TARGET_THUMB1 && TARGET_HAVE_MOVT" > +{ > + if (get_attr_length (insn) == 2) > +{
[PATCH, ARM 6/6] Add support for CB(N)Z and (U|S)DIV to ARMv8-M Baseline
Hi, This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This specific patch makes the compiler start generating code with the new CB(N)Z and (U|S)DIV instructions for ARMv8-M Baseline. Sharing of instruction patterns for div insn template with ARM or Thumb-2 was done by allowing %? punctuation character for Thumb-1. This is safe to do since the compiler would fault in arm_print_condition if a condition code is not handled by a branch in Thumb1. Unfortunately, cbz cannot be shared with cbranchsi4 because it would lead to worse code for Thumb-1. Indeed, choosing cb(n)z over the other alternatives for cbranchsi4 depends on the distance between target and pc which lead insn-attrtab to evaluate the minimum length of this pattern to be 2 as it cannot computer the distance statically. It would be possible to determine that this alternative is not available for non ARMv8-M Thumb-1 target statically but genattrtab is not currently capable to do it, so this is for a later patch. [1] For a quick overview of ARMv8-M please refer to the initial cover letter. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2015-11-13 Thomas Preud'homme * config/arm/arm.c (arm_print_operand_punct_valid_p): Make %? valid for Thumb-1. * config/arm/arm.h (TARGET_HAVE_CBZ): Define. (TARGET_IDIV): Set for all Thumb targets provided they have hardware divide feature. * config/arm/thumb1.md (thumb1_cbz): New insn. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 015df50..247f144 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -263,9 +263,12 @@ extern void (*arm_lang_output_object_attributes_hook)(void); /* Nonzero if this chip provides the movw and movt instructions. */ #define TARGET_HAVE_MOVT (arm_arch_thumb2 || arm_arch8) +/* Nonzero if this chip provides the cb{n}z instruction. */ +#define TARGET_HAVE_CBZ(arm_arch_thumb2 || arm_arch8) + /* Nonzero if integer division instructions supported. */ #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \ -|| (TARGET_THUMB2 && arm_arch_thumb_hwdiv)) +|| (TARGET_THUMB && arm_arch_thumb_hwdiv)) /* Nonzero if disallow volatile memory access in IT block. */ #define TARGET_NO_VOLATILE_CE (arm_arch_no_volatile_ce) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index d832309..5ef3a1d 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -22568,7 +22568,7 @@ arm_print_operand_punct_valid_p (unsigned char code) { return (code == '@' || code == '|' || code == '.' || code == '(' || code == ')' || code == '#' - || (TARGET_32BIT && (code == '?')) + || code == '?' || (TARGET_THUMB2 && (code == '!')) || (TARGET_THUMB && (code == '_'))); } diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index 7e3bcb4..074b267 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -973,6 +973,92 @@ DONE; }) +;; A pattern for the cb(n)z instruction added in ARMv8-M baseline profile, +;; adapted from cbranchsi4_insn. Modifying cbranchsi4_insn instead leads to +;; code generation difference for ARMv6-M because the minimum length of the +;; instruction becomes 2 even for it due to a limitation in genattrtab's +;; handling of pc in the length condition. 
+(define_insn "thumb1_cbz" + [(set (pc) (if_then_else + (match_operator 0 "equality_operator" + [(match_operand:SI 1 "s_register_operand" "l") + (const_int 0)]) + (label_ref (match_operand 2 "" "")) + (pc)))] + "TARGET_THUMB1 && TARGET_HAVE_MOVT" +{ + if (get_attr_length (insn) == 2) +{ + if (GET_CODE (operands[0]) == EQ) + return "cbz\t%1, %l2"; + else + return "cbnz\t%1, %l2"; +} + else +{ + rtx t = cfun->machine->thumb1_cc_insn; + if (t != NULL_RTX) + { + if (!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1]) + || !rtx_equal_p (cfun->machine->thumb1_cc_op1, operands[2])) + t = NULL_RTX; + if (cfun->machine->thumb1_cc_mode == CC_NOOVmode) + { + if (!noov_comparison_operator (operands[0], VOIDmode)) + t = NULL_RTX; + } + else if (cfun->machine->thumb1_cc_mode != CCmode) + t = NULL_RTX; + } + if (t == NULL_RTX) + { + output_asm_insn ("cmp\t%1, #0", operands); + cfun->machine->thumb1_cc_insn = insn; + cfun->machine->thumb1_cc_op0 = operands[1]; + cfun->machi
[arm-embedded][PATCH, ARM 5/6] Add support for MOVT/MOVW to ARMv8-M Baseline
Hi, We decided to apply the following patch to the ARM embedded 5 branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, December 17, 2015 4:08 PM > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; > Kyrylo Tkachov > Subject: [PATCH, ARM 5/6] Add support for MOVT/MOVW to ARMv8-M > Baseline > > Hi, > > This patch is part of a patch series to add support for ARMv8-M[1] to GCC. > This specific patch makes the compiler start generating code with the > new MOVT/MOVW instructions for ARMv8-M Baseline. > > [1] For a quick overview of ARMv8-M please refer to the initial cover > letter. > > ChangeLog entry is as follows: > > > *** gcc/ChangeLog *** > > 2015-11-13 Thomas Preud'homme > > * config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as > having MOVT. > * config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check > MOVT/MOVW > availability with TARGET_HAVE_MOVT. > (thumb_legitimate_constant_p): Legalize high part of a label_ref as a > constant. > (thumb1_rtx_costs): Also return 0 if setting a half word constant and > movw is available. > (thumb1_size_rtx_costs): Make set of half word constant also cost 1 > extra instruction if MOVW is available. Make constant with bottom > half > word zero cost 2 instruction if MOVW is available. > * config/arm/arm.md (define_attr "arch"): Add v8mb. > (define_attr "arch_enabled"): Set to yes if arch value is v8mb and > target is ARMv8-M Baseline. > * config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M > Baseline only > alternative for constants satisfying j constraint. > (thumb1_movsi_insn): Likewise. > (movsi splitter for K alternative): Tighten condition to not trigger > if movt is available and j constraint is satisfied. > (Pe immediate splitter): Likewise. > (thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for > constant fitting in an halfword to use movw. > * doc/sourcebuild.texi (arm_thumb1_movt_ko): Document new > ARM > effective target. > > > *** gcc/testsuite/ChangeLog *** > > 2015-11-13 Thomas Preud'homme > > * lib/target-supports.exp > (check_effective_target_arm_thumb1_movt_ko): > Define effective target. > * gcc.target/arm/pr42574.c: Require arm_thumb1_movt_ko instead > of > arm_thumb1_ok as effective target to exclude ARMv8-M Baseline. > > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index ff3cfcd..015df50 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -261,7 +261,7 @@ extern void > (*arm_lang_output_object_attributes_hook)(void); > #define TARGET_HAVE_LDACQ(TARGET_ARM_ARCH >= 8 && > arm_arch_notm) > > /* Nonzero if this chip provides the movw and movt instructions. */ > -#define TARGET_HAVE_MOVT (arm_arch_thumb2) > +#define TARGET_HAVE_MOVT (arm_arch_thumb2 || arm_arch8) > > /* Nonzero if integer division instructions supported. */ > #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \ > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 51d501e..d832309 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -8158,6 +8158,12 @@ arm_legitimate_constant_p_1 > (machine_mode, rtx x) > static bool > thumb_legitimate_constant_p (machine_mode mode > ATTRIBUTE_UNUSED, rtx x) > { > + /* Splitters for TARGET_USE_MOVT call arm_emit_movpair which > creates high > + RTX. These RTX must therefore be allowed for Thumb-1 so that > when run > + for ARMv8-M baseline or later the result is valid. 
*/ > + if (TARGET_HAVE_MOVT && GET_CODE (x) == HIGH) > +x = XEXP (x, 0); > + >return (CONST_INT_P (x) > || CONST_DOUBLE_P (x) > || CONSTANT_ADDRESS_P (x) > @@ -8244,7 +8250,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, > enum rtx_code outer) > case CONST_INT: >if (outer == SET) > { > - if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256) > + if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256 > + || (TARGET_HAVE_MOVT && !(INTVAL (x) & 0x))) > return 0; > if (thumb_shiftable_const (INTVAL (x))) > return COSTS_N_INSNS (2); > @@ -8994,16 +9001,24 @@ thumb1_size_rtx_costs (rtx x, enum > rtx_code code, enum rtx_code outer) >the mode. */ >
[PATCH, ARM 5/6] Add support for MOVT/MOVW to ARMv8-M Baseline
Hi, This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This specific patch makes the compiler start generating code with the new MOVT/MOVW instructions for ARMv8-M Baseline. [1] For a quick overview of ARMv8-M please refer to the initial cover letter. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2015-11-13 Thomas Preud'homme * config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as having MOVT. * config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check MOVT/MOVW availability with TARGET_HAVE_MOVT. (thumb_legitimate_constant_p): Legalize high part of a label_ref as a constant. (thumb1_rtx_costs): Also return 0 if setting a half word constant and movw is available. (thumb1_size_rtx_costs): Make set of half word constant also cost 1 extra instruction if MOVW is available. Make constant with bottom half word zero cost 2 instruction if MOVW is available. * config/arm/arm.md (define_attr "arch"): Add v8mb. (define_attr "arch_enabled"): Set to yes if arch value is v8mb and target is ARMv8-M Baseline. * config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M Baseline only alternative for constants satisfying j constraint. (thumb1_movsi_insn): Likewise. (movsi splitter for K alternative): Tighten condition to not trigger if movt is available and j constraint is satisfied. (Pe immediate splitter): Likewise. (thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for constant fitting in an halfword to use movw. * doc/sourcebuild.texi (arm_thumb1_movt_ko): Document new ARM effective target. *** gcc/testsuite/ChangeLog *** 2015-11-13 Thomas Preud'homme * lib/target-supports.exp (check_effective_target_arm_thumb1_movt_ko): Define effective target. * gcc.target/arm/pr42574.c: Require arm_thumb1_movt_ko instead of arm_thumb1_ok as effective target to exclude ARMv8-M Baseline. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index ff3cfcd..015df50 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -261,7 +261,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void); #define TARGET_HAVE_LDACQ (TARGET_ARM_ARCH >= 8 && arm_arch_notm) /* Nonzero if this chip provides the movw and movt instructions. */ -#define TARGET_HAVE_MOVT (arm_arch_thumb2) +#define TARGET_HAVE_MOVT (arm_arch_thumb2 || arm_arch8) /* Nonzero if integer division instructions supported. */ #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \ diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 51d501e..d832309 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -8158,6 +8158,12 @@ arm_legitimate_constant_p_1 (machine_mode, rtx x) static bool thumb_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x) { + /* Splitters for TARGET_USE_MOVT call arm_emit_movpair which creates high + RTX. These RTX must therefore be allowed for Thumb-1 so that when run + for ARMv8-M baseline or later the result is valid. */ + if (TARGET_HAVE_MOVT && GET_CODE (x) == HIGH) +x = XEXP (x, 0); + return (CONST_INT_P (x) || CONST_DOUBLE_P (x) || CONSTANT_ADDRESS_P (x) @@ -8244,7 +8250,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer) case CONST_INT: if (outer == SET) { - if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256) + if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256 + || (TARGET_HAVE_MOVT && !(INTVAL (x) & 0x))) return 0; if (thumb_shiftable_const (INTVAL (x))) return COSTS_N_INSNS (2); @@ -8994,16 +9001,24 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer) the mode. 
*/ words = ARM_NUM_INTS (GET_MODE_SIZE (GET_MODE (SET_DEST (x; return COSTS_N_INSNS (words) -+ COSTS_N_INSNS (1) * (satisfies_constraint_J (SET_SRC (x)) - || satisfies_constraint_K (SET_SRC (x)) - /* thumb1_movdi_insn. */ - || ((words > 1) && MEM_P (SET_SRC (x; ++ COSTS_N_INSNS (1) + * (satisfies_constraint_J (SET_SRC (x)) + || satisfies_constraint_K (SET_SRC (x)) +/* Too big immediate for 2byte mov, using movt. */ + || ((unsigned HOST_WIDE_INT) INTVAL (SET_SRC (x)) >= 256 + && TARGET_HAVE_MOVT + && satisfies_constraint_j (SET_SRC (x))) +/* thumb1_movdi_insn. */ + || ((words > 1) && MEM_P (SET_SRC (x; case CONST_INT: if (outer == SET) {
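A sketch of a constant load affected by this change (value invented): with MOVT/MOVW enabled for ARMv8-M Baseline, a 32-bit constant that does not fit the 8-bit Thumb-1 immediate can be synthesised with a movw/movt pair instead of being loaded from a literal pool.

unsigned int magic_constant (void)
{
  /* Expected to become roughly "movw r0, #0x5678" followed by
     "movt r0, #0x1234" (sketch, not verified compiler output).  */
  return 0x12345678u;
}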
[arm-embedded][PATCH, ARM 4/6] Factor out MOVW/MOVT availability and desirability checks
Hi, We decided to apply the following patch to the ARM embedded 5 branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, December 17, 2015 3:59 PM > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; > Kyrylo Tkachov > Subject: [PATCH, ARM 4/6] Factor out MOVW/MOVT availability and > desirability checks > > Hi, > > This patch is part of a patch series to add support for ARMv8-M[1] to GCC. > This specific patch factors out the checks for MOVW/MOVT availability > and whether to use it. To this end, the new macro TARGET_HAVE_MOVT > is introduced and code is modified to use it or the existing > TARGET_USE_MOVT as needed. > > [1] For a quick overview of ARMv8-M please refer to the initial cover > letter. > > ChangeLog entry is as follows: > > > *** gcc/ChangeLog *** > > 2015-11-09 Thomas Preud'homme > > * config/arm/arm.h (TARGET_USE_MOVT): Check MOVT/MOVW > availability > with TARGET_HAVE_MOVT. > (TARGET_HAVE_MOVT): Define. > * config/arm/arm.c (const_ok_for_op): Check MOVT/MOVW > availability with TARGET_HAVE_MOVT. > * config/arm/arm.md (arm_movt): Use TARGET_HAVE_MOVT to > check movt > availability. > (addsi splitter): Use TARGET_USE_MOVT to check whether to use > movt + movw. > (symbol_refs movsi splitter): Remove TARGET_32BIT check. > (arm_movtas_ze): Use TARGET_HAVE_MOVT to check movt > availability. > * config/arm/constraints.md (define_constraint "j"): Use > TARGET_HAVE_MOVT to check movt availability. > > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index fed3205..1831d01 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -233,7 +233,7 @@ extern void > (*arm_lang_output_object_attributes_hook)(void); > > /* Should MOVW/MOVT be used in preference to a constant pool. */ > #define TARGET_USE_MOVT \ > - (arm_arch_thumb2 \ > + (TARGET_HAVE_MOVT \ > && (arm_disable_literal_pool \ > || (!optimize_size && !current_tune->prefer_constant_pool))) > > @@ -268,6 +268,9 @@ extern void > (*arm_lang_output_object_attributes_hook)(void); > /* Nonzero if this chip supports load-acquire and store-release. */ > #define TARGET_HAVE_LDACQ(TARGET_ARM_ARCH >= 8) > > +/* Nonzero if this chip provides the movw and movt instructions. */ > +#define TARGET_HAVE_MOVT (arm_arch_thumb2) > + > /* Nonzero if integer division instructions supported. */ > #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \ >|| (TARGET_THUMB2 && > arm_arch_thumb_hwdiv)) > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 62287bc..ec5197a 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -3851,7 +3851,7 @@ const_ok_for_op (HOST_WIDE_INT i, enum > rtx_code code) > { > case SET: >/* See if we can use movw. */ > - if (arm_arch_thumb2 && (i & 0x) == 0) > + if (TARGET_HAVE_MOVT && (i & 0x) == 0) > return 1; >else > /* Otherwise, try mvn. 
*/ > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index 8ebb1bf..78dafa0 100644 > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -5736,7 +5736,7 @@ >[(set (match_operand:SI 0 "nonimmediate_operand" "=r") > (lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0") > (match_operand:SI 2 "general_operand" "i")))] > - "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])" > + "TARGET_HAVE_MOVT && arm_valid_symbolic_address_p > (operands[2])" >"movt%?\t%0, #:upper16:%c2" >[(set_attr "predicable" "yes") > (set_attr "predicable_short_it" "no") > @@ -5796,8 +5796,7 @@ >[(set (match_operand:SI 0 "arm_general_register_operand" "") > (const:SI (plus:SI (match_operand:SI 1 "general_operand" "") > (match_operand:SI 2 "const_int_operand" > ""] > - "TARGET_THUMB2 > - && arm_disable_literal_pool > + "TARGET_USE_MOVT > && reload_completed > && GET_CODE (operands[1]) == SYMBOL_REF" >[(clobber (const_int 0))] > @@ -5827,8 +5826,7 @@ > (define_split >[(set (match_operand:SI
[PATCH, ARM 4/6] Factor out MOVW/MOVT availability and desirability checks
Hi, This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This specific patch factors out the checks for MOVW/MOVT availability and whether to use it. To this end, the new macro TARGET_HAVE_MOVT is introduced and code is modified to use it or the existing TARGET_USE_MOVT as needed. [1] For a quick overview of ARMv8-M please refer to the initial cover letter. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2015-11-09 Thomas Preud'homme * config/arm/arm.h (TARGET_USE_MOVT): Check MOVT/MOVW availability with TARGET_HAVE_MOVT. (TARGET_HAVE_MOVT): Define. * config/arm/arm.c (const_ok_for_op): Check MOVT/MOVW availability with TARGET_HAVE_MOVT. * config/arm/arm.md (arm_movt): Use TARGET_HAVE_MOVT to check movt availability. (addsi splitter): Use TARGET_USE_MOVT to check whether to use movt + movw. (symbol_refs movsi splitter): Remove TARGET_32BIT check. (arm_movtas_ze): Use TARGET_HAVE_MOVT to check movt availability. * config/arm/constraints.md (define_constraint "j"): Use TARGET_HAVE_MOVT to check movt availability. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index fed3205..1831d01 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -233,7 +233,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void); /* Should MOVW/MOVT be used in preference to a constant pool. */ #define TARGET_USE_MOVT \ - (arm_arch_thumb2 \ + (TARGET_HAVE_MOVT \ && (arm_disable_literal_pool \ || (!optimize_size && !current_tune->prefer_constant_pool))) @@ -268,6 +268,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void); /* Nonzero if this chip supports load-acquire and store-release. */ #define TARGET_HAVE_LDACQ (TARGET_ARM_ARCH >= 8) +/* Nonzero if this chip provides the movw and movt instructions. */ +#define TARGET_HAVE_MOVT (arm_arch_thumb2) + /* Nonzero if integer division instructions supported. */ #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \ || (TARGET_THUMB2 && arm_arch_thumb_hwdiv)) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 62287bc..ec5197a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3851,7 +3851,7 @@ const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code) { case SET: /* See if we can use movw. */ - if (arm_arch_thumb2 && (i & 0x) == 0) + if (TARGET_HAVE_MOVT && (i & 0x) == 0) return 1; else /* Otherwise, try mvn. 
*/ diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 8ebb1bf..78dafa0 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -5736,7 +5736,7 @@ [(set (match_operand:SI 0 "nonimmediate_operand" "=r") (lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0") (match_operand:SI 2 "general_operand" "i")))] - "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])" + "TARGET_HAVE_MOVT && arm_valid_symbolic_address_p (operands[2])" "movt%?\t%0, #:upper16:%c2" [(set_attr "predicable" "yes") (set_attr "predicable_short_it" "no") @@ -5796,8 +5796,7 @@ [(set (match_operand:SI 0 "arm_general_register_operand" "") (const:SI (plus:SI (match_operand:SI 1 "general_operand" "") (match_operand:SI 2 "const_int_operand" ""] - "TARGET_THUMB2 - && arm_disable_literal_pool + "TARGET_USE_MOVT && reload_completed && GET_CODE (operands[1]) == SYMBOL_REF" [(clobber (const_int 0))] @@ -5827,8 +5826,7 @@ (define_split [(set (match_operand:SI 0 "arm_general_register_operand" "") (match_operand:SI 1 "general_operand" ""))] - "TARGET_32BIT - && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF + "TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF && !flag_pic && !target_word_relocations && !arm_tls_referenced_p (operands[1])" [(clobber (const_int 0))] @@ -11030,7 +11028,7 @@ (const_int 16) (const_int 16)) (match_operand:SI 1 "const_int_operand" ""))] - "arm_arch_thumb2" + "TARGET_HAVE_MOVT" "movt%?\t%0, %L1" [(set_attr "predicable" "yes") (set_attr "predicable_short_it" "no") diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index d01a918..838e031 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/confi
RE: [PATCH, ARM 3/6] Fix indentation of FL_FOR_ARCH* definition after adding support for ARMv8-M
[Fixed the subject and added ARM maintainers to recipient.] > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, December 17, 2015 3:51 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH, ARM 3/8] Fix indentation of FL_FOR_ARCH* definition > after adding support for ARMv8-M > > Hi, > > This patch is part of a patch series to add support for ARMv8-M[1] to GCC. > This specific patch fixes the indentation of FL_FOR_ARCH* macros > definition following the patch to add support for ARMv8-M. Since this is > an obvious change, I'm not expecting a review and will commit it as soon > as the other patches in the series are accepted. > > [1] For a quick overview of ARMv8-M please refer to the initial cover > letter. > > ChangeLog entry is as follows: > > > *** gcc/ChangeLog *** > > 2015-11-06 Thomas Preud'homme > > * config/arm/arm-protos.h: Reindent FL_FOR_* macro definitions. > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 1371ee7..bf0d1b4 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -391,32 +391,33 @@ extern bool arm_is_constant_pool_ref (rtx); > #define FL_TUNE (FL_WBUF | FL_VFPV2 | FL_STRONG | > FL_LDSCHED \ >| FL_CO_PROC) > > -#define FL_FOR_ARCH2 FL_NOTM > -#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | FL_MODE32) > -#define FL_FOR_ARCH3M(FL_FOR_ARCH3 | FL_ARCH3M) > -#define FL_FOR_ARCH4 (FL_FOR_ARCH3M | FL_ARCH4) > -#define FL_FOR_ARCH4T(FL_FOR_ARCH4 | FL_THUMB) > -#define FL_FOR_ARCH5 (FL_FOR_ARCH4 | FL_ARCH5) > -#define FL_FOR_ARCH5T(FL_FOR_ARCH5 | FL_THUMB) > -#define FL_FOR_ARCH5E(FL_FOR_ARCH5 | FL_ARCH5E) > -#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB) > -#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE > -#define FL_FOR_ARCH6 (FL_FOR_ARCH5TE | FL_ARCH6) > -#define FL_FOR_ARCH6JFL_FOR_ARCH6 > -#define FL_FOR_ARCH6K(FL_FOR_ARCH6 | FL_ARCH6K) > -#define FL_FOR_ARCH6ZFL_FOR_ARCH6 > -#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ) > -#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2) > -#define FL_FOR_ARCH6M(FL_FOR_ARCH6 & ~FL_NOTM) > -#define FL_FOR_ARCH7 ((FL_FOR_ARCH6T2 & ~FL_NOTM) | > FL_ARCH7) > -#define FL_FOR_ARCH7A(FL_FOR_ARCH7 | FL_NOTM | > FL_ARCH6K) > -#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | > FL_ARM_DIV) > -#define FL_FOR_ARCH7R(FL_FOR_ARCH7A | FL_THUMB_DIV) > -#define FL_FOR_ARCH7M(FL_FOR_ARCH7 | FL_THUMB_DIV) > -#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM) > -#define FL_FOR_ARCH8A(FL_FOR_ARCH7VE | FL_ARCH8) > -#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | > FL_THUMB_DIV) > -#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8) > +#define FL_FOR_ARCH2 FL_NOTM > +#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | FL_MODE32) > +#define FL_FOR_ARCH3M(FL_FOR_ARCH3 | FL_ARCH3M) > +#define FL_FOR_ARCH4 (FL_FOR_ARCH3M | FL_ARCH4) > +#define FL_FOR_ARCH4T(FL_FOR_ARCH4 | FL_THUMB) > +#define FL_FOR_ARCH5 (FL_FOR_ARCH4 | FL_ARCH5) > +#define FL_FOR_ARCH5T(FL_FOR_ARCH5 | FL_THUMB) > +#define FL_FOR_ARCH5E(FL_FOR_ARCH5 | FL_ARCH5E) > +#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB) > +#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE > +#define FL_FOR_ARCH6 (FL_FOR_ARCH5TE | FL_ARCH6) > +#define FL_FOR_ARCH6JFL_FOR_ARCH6 > +#define FL_FOR_ARCH6K(FL_FOR_ARCH6 | FL_ARCH6K) > +#define FL_FOR_ARCH6ZFL_FOR_ARCH6 > +#define FL_FOR_ARCH6ZK FL_FOR_ARCH6K > +#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ) > +#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2) > +#define FL_FOR_ARCH6M(FL_FOR_ARCH6 & ~FL_NOTM) > +#define FL_FOR_ARCH7 
((FL_FOR_ARCH6T2 & ~FL_NOTM) > | FL_ARCH7) > +#define FL_FOR_ARCH7A(FL_FOR_ARCH7 | FL_NOTM | > FL_ARCH6K) > +#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | > FL_THUMB_DIV | FL_ARM_DIV) > +#define FL_FOR_ARCH7R(FL_FOR_ARCH7A | > FL_THUMB_DIV) > +#define FL_FOR_ARCH7M(FL_FOR_ARCH7 | > FL_THUMB_DIV) > +#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | > FL_ARCH7EM) > +#define FL_FOR_ARCH8A(FL_FOR_ARCH7VE | FL_ARCH8) > +#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | > FL_THUMB_DIV) > +#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8) > > /* There are too many feature bits to fit in a single word so the set of > cpu and > fpu capabilities is a structure. A feature set is created and manipulated > > > Is this ok for stage3? > > Best regards, > > Thomas
[arm-embedded][PATCH, ARM 3/6] Fix indentation of FL_FOR_ARCH* definition after adding support for ARMv8-M
Hi, We decided to apply the following patch to the ARM embedded 5 branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, December 17, 2015 3:51 PM > To: gcc-patches@gcc.gnu.org > Subject: [PATCH, ARM 3/8] Fix indentation of FL_FOR_ARCH* definition > after adding support for ARMv8-M > > Hi, > > This patch is part of a patch series to add support for ARMv8-M[1] to GCC. > This specific patch fixes the indentation of FL_FOR_ARCH* macros > definition following the patch to add support for ARMv8-M. Since this is > an obvious change, I'm not expecting a review and will commit it as soon > as the other patches in the series are accepted. > > [1] For a quick overview of ARMv8-M please refer to the initial cover > letter. > > ChangeLog entry is as follows: > > > *** gcc/ChangeLog *** > > 2015-11-06 Thomas Preud'homme > > * config/arm/arm-protos.h: Reindent FL_FOR_* macro definitions. > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 1371ee7..bf0d1b4 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -391,32 +391,33 @@ extern bool arm_is_constant_pool_ref (rtx); > #define FL_TUNE (FL_WBUF | FL_VFPV2 | FL_STRONG | > FL_LDSCHED \ >| FL_CO_PROC) > > -#define FL_FOR_ARCH2 FL_NOTM > -#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | FL_MODE32) > -#define FL_FOR_ARCH3M(FL_FOR_ARCH3 | FL_ARCH3M) > -#define FL_FOR_ARCH4 (FL_FOR_ARCH3M | FL_ARCH4) > -#define FL_FOR_ARCH4T(FL_FOR_ARCH4 | FL_THUMB) > -#define FL_FOR_ARCH5 (FL_FOR_ARCH4 | FL_ARCH5) > -#define FL_FOR_ARCH5T(FL_FOR_ARCH5 | FL_THUMB) > -#define FL_FOR_ARCH5E(FL_FOR_ARCH5 | FL_ARCH5E) > -#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB) > -#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE > -#define FL_FOR_ARCH6 (FL_FOR_ARCH5TE | FL_ARCH6) > -#define FL_FOR_ARCH6JFL_FOR_ARCH6 > -#define FL_FOR_ARCH6K(FL_FOR_ARCH6 | FL_ARCH6K) > -#define FL_FOR_ARCH6ZFL_FOR_ARCH6 > -#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ) > -#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2) > -#define FL_FOR_ARCH6M(FL_FOR_ARCH6 & ~FL_NOTM) > -#define FL_FOR_ARCH7 ((FL_FOR_ARCH6T2 & ~FL_NOTM) | > FL_ARCH7) > -#define FL_FOR_ARCH7A(FL_FOR_ARCH7 | FL_NOTM | > FL_ARCH6K) > -#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | > FL_ARM_DIV) > -#define FL_FOR_ARCH7R(FL_FOR_ARCH7A | FL_THUMB_DIV) > -#define FL_FOR_ARCH7M(FL_FOR_ARCH7 | FL_THUMB_DIV) > -#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM) > -#define FL_FOR_ARCH8A(FL_FOR_ARCH7VE | FL_ARCH8) > -#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | > FL_THUMB_DIV) > -#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8) > +#define FL_FOR_ARCH2 FL_NOTM > +#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | FL_MODE32) > +#define FL_FOR_ARCH3M(FL_FOR_ARCH3 | FL_ARCH3M) > +#define FL_FOR_ARCH4 (FL_FOR_ARCH3M | FL_ARCH4) > +#define FL_FOR_ARCH4T(FL_FOR_ARCH4 | FL_THUMB) > +#define FL_FOR_ARCH5 (FL_FOR_ARCH4 | FL_ARCH5) > +#define FL_FOR_ARCH5T(FL_FOR_ARCH5 | FL_THUMB) > +#define FL_FOR_ARCH5E(FL_FOR_ARCH5 | FL_ARCH5E) > +#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB) > +#define FL_FOR_ARCH5TEJ FL_FOR_ARCH5TE > +#define FL_FOR_ARCH6 (FL_FOR_ARCH5TE | FL_ARCH6) > +#define FL_FOR_ARCH6JFL_FOR_ARCH6 > +#define FL_FOR_ARCH6K(FL_FOR_ARCH6 | FL_ARCH6K) > +#define FL_FOR_ARCH6ZFL_FOR_ARCH6 > +#define FL_FOR_ARCH6ZK FL_FOR_ARCH6K > +#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ) > +#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2) > +#define FL_FOR_ARCH6M(FL_FOR_ARCH6 
& ~FL_NOTM) > +#define FL_FOR_ARCH7 ((FL_FOR_ARCH6T2 & ~FL_NOTM) > | FL_ARCH7) > +#define FL_FOR_ARCH7A(FL_FOR_ARCH7 | FL_NOTM | > FL_ARCH6K) > +#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | > FL_THUMB_DIV | FL_ARM_DIV) > +#define FL_FOR_ARCH7R(FL_FOR_ARCH7A | > FL_THUMB_DIV) > +#define FL_FOR_ARCH7M(FL_FOR_ARCH7 | > FL_THUMB_DIV) > +#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | > FL_ARCH7EM) > +#define FL_FOR_ARCH8A(FL_FOR_ARCH7VE | FL_ARCH8) > +#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | > FL_THUMB_DIV) > +#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8) > > /* There are too many feature bits to fit in a single word so the set of > cpu and > fpu capabilities is a structure. A feature set is created and manipulated > > > Is this ok for stage3? > > Best regards, > > Thomas
[PATCH, ARM 3/8] Fix indentation of FL_FOR_ARCH* definition after adding support for ARMv8-M
Hi, This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This specific patch fixes the indentation of FL_FOR_ARCH* macros definition following the patch to add support for ARMv8-M. Since this is an obvious change, I'm not expecting a review and will commit it as soon as the other patches in the series are accepted. [1] For a quick overview of ARMv8-M please refer to the initial cover letter. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2015-11-06 Thomas Preud'homme * config/arm/arm-protos.h: Reindent FL_FOR_* macro definitions. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 1371ee7..bf0d1b4 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -391,32 +391,33 @@ extern bool arm_is_constant_pool_ref (rtx); #define FL_TUNE(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \ | FL_CO_PROC) -#define FL_FOR_ARCH2 FL_NOTM -#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | FL_MODE32) -#define FL_FOR_ARCH3M (FL_FOR_ARCH3 | FL_ARCH3M) -#define FL_FOR_ARCH4 (FL_FOR_ARCH3M | FL_ARCH4) -#define FL_FOR_ARCH4T (FL_FOR_ARCH4 | FL_THUMB) -#define FL_FOR_ARCH5 (FL_FOR_ARCH4 | FL_ARCH5) -#define FL_FOR_ARCH5T (FL_FOR_ARCH5 | FL_THUMB) -#define FL_FOR_ARCH5E (FL_FOR_ARCH5 | FL_ARCH5E) -#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB) -#define FL_FOR_ARCH5TEJFL_FOR_ARCH5TE -#define FL_FOR_ARCH6 (FL_FOR_ARCH5TE | FL_ARCH6) -#define FL_FOR_ARCH6J FL_FOR_ARCH6 -#define FL_FOR_ARCH6K (FL_FOR_ARCH6 | FL_ARCH6K) -#define FL_FOR_ARCH6Z FL_FOR_ARCH6 -#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ) -#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2) -#define FL_FOR_ARCH6M (FL_FOR_ARCH6 & ~FL_NOTM) -#define FL_FOR_ARCH7 ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7) -#define FL_FOR_ARCH7A (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K) -#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV) -#define FL_FOR_ARCH7R (FL_FOR_ARCH7A | FL_THUMB_DIV) -#define FL_FOR_ARCH7M (FL_FOR_ARCH7 | FL_THUMB_DIV) -#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM) -#define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8) -#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV) -#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8) +#define FL_FOR_ARCH2 FL_NOTM +#define FL_FOR_ARCH3 (FL_FOR_ARCH2 | FL_MODE32) +#define FL_FOR_ARCH3M (FL_FOR_ARCH3 | FL_ARCH3M) +#define FL_FOR_ARCH4 (FL_FOR_ARCH3M | FL_ARCH4) +#define FL_FOR_ARCH4T (FL_FOR_ARCH4 | FL_THUMB) +#define FL_FOR_ARCH5 (FL_FOR_ARCH4 | FL_ARCH5) +#define FL_FOR_ARCH5T (FL_FOR_ARCH5 | FL_THUMB) +#define FL_FOR_ARCH5E (FL_FOR_ARCH5 | FL_ARCH5E) +#define FL_FOR_ARCH5TE (FL_FOR_ARCH5E | FL_THUMB) +#define FL_FOR_ARCH5TEJFL_FOR_ARCH5TE +#define FL_FOR_ARCH6 (FL_FOR_ARCH5TE | FL_ARCH6) +#define FL_FOR_ARCH6J FL_FOR_ARCH6 +#define FL_FOR_ARCH6K (FL_FOR_ARCH6 | FL_ARCH6K) +#define FL_FOR_ARCH6Z FL_FOR_ARCH6 +#define FL_FOR_ARCH6ZK FL_FOR_ARCH6K +#define FL_FOR_ARCH6KZ (FL_FOR_ARCH6K | FL_ARCH6KZ) +#define FL_FOR_ARCH6T2 (FL_FOR_ARCH6 | FL_THUMB2) +#define FL_FOR_ARCH6M (FL_FOR_ARCH6 & ~FL_NOTM) +#define FL_FOR_ARCH7 ((FL_FOR_ARCH6T2 & ~FL_NOTM) | FL_ARCH7) +#define FL_FOR_ARCH7A (FL_FOR_ARCH7 | FL_NOTM | FL_ARCH6K) +#define FL_FOR_ARCH7VE (FL_FOR_ARCH7A | FL_THUMB_DIV | FL_ARM_DIV) +#define FL_FOR_ARCH7R (FL_FOR_ARCH7A | FL_THUMB_DIV) +#define FL_FOR_ARCH7M (FL_FOR_ARCH7 | FL_THUMB_DIV) +#define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM) +#define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8) +#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV) +#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8) /* There 
are too many feature bits to fit in a single word so the set of cpu and fpu capabilities is a structure. A feature set is created and manipulated Is this ok for stage3? Best regards, Thomas
[arm-embedded][PATCH, ARM 2/6] Add support for ARMv8-M
Hi, We decided to apply the following patch to the ARM embedded 5 branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, December 17, 2015 3:25 PM > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; > Kyrylo Tkachov > Subject: [PATCH, ARM 2/6] Add support for ARMv8-M > > Hi, > > This patch is part of a patch series to add support for ARMv8-M[1] to GCC. > This specific patch adds basic support for the new architecture, allowing > the new names to be accepted by -march and the compiler to behave > like ARMv6-M (for ARMv8-M Baseline) and or ARMv7-M (for ARMv8-M > Mainline). The changes are divided in two categories: > > * those to recognize the new architecture name > * those to keep the behavior as previous architectures > > Changes to make the compiler generate code with the new instructions > are in follow-up patches. > > [1] For a quick overview of ARMv8-M please refer to the initial cover > letter. > > > ChangeLog entries are as follow: > > *** gcc/ChangeLog *** > > 2015-11-23 Thomas Preud'homme > > * config/arm/arm-arches.def (armv8-m.base): Define new > architecture. > (armv8-m.main): Likewise. > (armv8-m.main+dsp): Likewise > * config/arm/arm-protos.h (FL_FOR_ARCH8M_BASE): Define. > (FL_FOR_ARCH8M_MAIN): Likewise. > * config/arm/arm-tables.opt: Regenerate. > * config/arm/bpabi.h: Add armv8-m.base, armv8-m.main and > armv8-m.main+dsp to BE8_LINK_SPEC. > * config/arm/arm.h (TARGET_HAVE_LDACQ): Exclude ARMv8-M. > (enum base_architecture): Add BASE_ARCH_8M_BASE and > BASE_ARCH_8M_MAIN. > (TARGET_ARM_V8M): Define. > * config/arm/arm.c (arm_arch_name): Increase size to work with > ARMv8-M > Baseline and Mainline. > (arm_option_override_internal): Also disable arm_restrict_it when > !arm_arch_notm. > (arm_file_start): Increase architecture buffer size. > * doc/invoke.texi: Document architectures armv8-m.base, armv8- > m.main > and armv8-m.main+dsp. > (mno-unaligned-access): Clarify that this is disabled by default for > ARMv8-M Baseline architecture as well. > > > *** gcc/testsuite/ChangeLog *** > > 2015-11-10 Thomas Preud'homme > > * lib/target-supports.exp: Generate > add_options_for_arm_arch_FUNC and > check_effective_target_arm_arch_FUNC_multilib for ARMv8-M > Baseline and > ARMv8-M Mainline architectures. > > > *** libgcc/ChangeLog *** > > 2015-11-10 Thomas Preud'homme > > * config/arm/lib1funcs.S (__ARM_ARCH__): Define to 8 for ARMv8- > M. 
> > > diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm- > arches.def > index > ddf6c3c330f91640d647d266f3d0e2350e7b986a..1d0301a3b9414127d38783 > 4584f3e42c225b6d3f 100644 > --- a/gcc/config/arm/arm-arches.def > +++ b/gcc/config/arm/arm-arches.def > @@ -57,6 +57,12 @@ ARM_ARCH("armv7-m", cortexm3, 7M, > ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ > ARM_ARCH("armv7e-m", cortexm4, 7EM, ARM_FSET_MAKE_CPU1 > (FL_CO_PROC | FL_FOR_ARCH7EM)) > ARM_ARCH("armv8-a", cortexa53, 8A, ARM_FSET_MAKE_CPU1 > (FL_CO_PROC | FL_FOR_ARCH8A)) > ARM_ARCH("armv8-a+crc",cortexa53, 8A, ARM_FSET_MAKE_CPU1 > (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A)) > +ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE, > + ARM_FSET_MAKE_CPU1 ( > FL_FOR_ARCH8M_BASE)) > +ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN, > + ARM_FSET_MAKE_CPU1(FL_CO_PROC | > FL_FOR_ARCH8M_MAIN)) > +ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN, > + ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | > FL_FOR_ARCH8M_MAIN)) > ARM_ARCH("iwmmxt", iwmmxt, 5TE, ARM_FSET_MAKE_CPU1 > (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | > FL_IWMMXT)) > ARM_ARCH("iwmmxt2", iwmmxt2,5TE, ARM_FSET_MAKE_CPU1 > (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | > FL_IWMMXT | FL_IWMMXT2)) > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index > e7328e79650739fca1c3e21b10c194feaa697465..dc7a0871c37bfda267 > 1f197bfe83c20c7888 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -415,6 +415,8 @@ extern bool arm_is_constant_pool_ref (rtx); > #define FL_FOR_ARCH7M(FL_FOR_ARCH7 | FL_THUMB_DIV) > #define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH
[PATCH, ARM 2/6] Add support for ARMv8-M
Hi, This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This specific patch adds basic support for the new architecture, allowing the new names to be accepted by -march and the compiler to behave like ARMv6-M (for ARMv8-M Baseline) and or ARMv7-M (for ARMv8-M Mainline). The changes are divided in two categories: * those to recognize the new architecture name * those to keep the behavior as previous architectures Changes to make the compiler generate code with the new instructions are in follow-up patches. [1] For a quick overview of ARMv8-M please refer to the initial cover letter. ChangeLog entries are as follow: *** gcc/ChangeLog *** 2015-11-23 Thomas Preud'homme * config/arm/arm-arches.def (armv8-m.base): Define new architecture. (armv8-m.main): Likewise. (armv8-m.main+dsp): Likewise * config/arm/arm-protos.h (FL_FOR_ARCH8M_BASE): Define. (FL_FOR_ARCH8M_MAIN): Likewise. * config/arm/arm-tables.opt: Regenerate. * config/arm/bpabi.h: Add armv8-m.base, armv8-m.main and armv8-m.main+dsp to BE8_LINK_SPEC. * config/arm/arm.h (TARGET_HAVE_LDACQ): Exclude ARMv8-M. (enum base_architecture): Add BASE_ARCH_8M_BASE and BASE_ARCH_8M_MAIN. (TARGET_ARM_V8M): Define. * config/arm/arm.c (arm_arch_name): Increase size to work with ARMv8-M Baseline and Mainline. (arm_option_override_internal): Also disable arm_restrict_it when !arm_arch_notm. (arm_file_start): Increase architecture buffer size. * doc/invoke.texi: Document architectures armv8-m.base, armv8-m.main and armv8-m.main+dsp. (mno-unaligned-access): Clarify that this is disabled by default for ARMv8-M Baseline architecture as well. *** gcc/testsuite/ChangeLog *** 2015-11-10 Thomas Preud'homme * lib/target-supports.exp: Generate add_options_for_arm_arch_FUNC and check_effective_target_arm_arch_FUNC_multilib for ARMv8-M Baseline and ARMv8-M Mainline architectures. *** libgcc/ChangeLog *** 2015-11-10 Thomas Preud'homme * config/arm/lib1funcs.S (__ARM_ARCH__): Define to 8 for ARMv8-M. 
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def index ddf6c3c330f91640d647d266f3d0e2350e7b986a..1d0301a3b9414127d387834584f3e42c225b6d3f 100644 --- a/gcc/config/arm/arm-arches.def +++ b/gcc/config/arm/arm-arches.def @@ -57,6 +57,12 @@ ARM_ARCH("armv7-m", cortexm3,7M, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ ARM_ARCH("armv7e-m", cortexm4, 7EM, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH7EM)) ARM_ARCH("armv8-a", cortexa53, 8A,ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A)) ARM_ARCH("armv8-a+crc",cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A)) +ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE, +ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE)) +ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN, +ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_FOR_ARCH8M_MAIN)) +ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN, +ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN)) ARM_ARCH("iwmmxt", iwmmxt, 5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)) ARM_ARCH("iwmmxt2", iwmmxt2,5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index e7328e79650739fca1c3e21b10c194feaa697465..dc7a0871c37bfda2671f197bfe83c20c7888 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -415,6 +415,8 @@ extern bool arm_is_constant_pool_ref (rtx); #define FL_FOR_ARCH7M (FL_FOR_ARCH7 | FL_THUMB_DIV) #define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM) #define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8) +#define FL_FOR_ARCH8M_BASE (FL_FOR_ARCH6M | FL_ARCH8 | FL_THUMB_DIV) +#define FL_FOR_ARCH8M_MAIN (FL_FOR_ARCH7M | FL_ARCH8) /* There are too many feature bits to fit in a single word so the set of cpu and fpu capabilities is a structure. A feature set is created and manipulated diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 48aac41c37a35e27440f67863d4a92457916dd1b..2f24bf4c9a9ae1a12ba284fd160c34e577b2e4c6 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -416,10 +416,19 @@ EnumValue Enum(arm_arch) String(armv8-a+crc) Value(26) EnumValue -Enum(arm_arch) String(iwmmxt) Value(27) +Enum(arm_arch) String(armv8-m.base) Value(27) EnumValue -Enum(arm_arch) String(iwmmxt2) Value(28) +Enum(arm_arch) String(armv8-m.main) Value(28) + +EnumValue +Enum(arm_arch) String(armv8-m.main+dsp) Value(29) + +EnumValue +Enum(arm_arch) St
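The patch above only teaches the compiler to accept the new -march values; as a hedged aside, the following compile-time sketch (not part of the patch) shows how target code can tell the resulting configurations apart. It assumes the usual ACLE predefined macros (__ARM_ARCH, __ARM_ARCH_PROFILE, __ARM_ARCH_ISA_THUMB and __ARM_FEATURE_DSP) are provided by the compiler for these architectures.

/* Illustration only: distinguishing the new ARMv8-M targets at compile
   time via (assumed) ACLE predefined macros.  */
#if defined (__ARM_ARCH_PROFILE) && __ARM_ARCH_PROFILE == 'M' && __ARM_ARCH >= 8
# if __ARM_ARCH_ISA_THUMB == 1
   /* armv8-m.base: behaves like ARMv6-M plus the Baseline additions.  */
# elif defined (__ARM_FEATURE_DSP)
   /* armv8-m.main+dsp: Mainline with the ARMv7E-M DSP instructions.  */
# else
   /* armv8-m.main: Mainline, comparable to ARMv7-M.  */
# endif
#endif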
RE: [arm-embedded][PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions
CH_5E__) || defined(__ARM_ARCH_5TE__) \ || defined(__ARM_ARCH_5TEJ__) #define HAVE_ARM_CLZ 1 #endif #ifdef L_clzsi2 -#if defined(__ARM_ARCH_6M__) +#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 FUNC_START clzsi2 mov r1, #28 mov r3, #1 @@ -1753,7 +1753,7 @@ ARM_FUNC_START clzsi2 #ifdef L_clzdi2 #if !defined(HAVE_ARM_CLZ) -# if defined(__ARM_ARCH_6M__) +# if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 FUNC_START clzdi2 push{r4, lr} # else @@ -1778,7 +1778,7 @@ ARM_FUNC_START clzdi2 bl __clzsi2 # endif 2: -# if defined(__ARM_ARCH_6M__) +# if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 pop {r4, pc} # else RETLDM r4 @@ -1800,7 +1800,7 @@ ARM_FUNC_START clzdi2 #endif /* L_clzdi2 */ #ifdef L_ctzsi2 -#if defined(__ARM_ARCH_6M__) +#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 FUNC_START ctzsi2 neg r1, r0 and r0, r0, r1 @@ -1915,7 +1915,7 @@ ARM_FUNC_START ctzsi2 /* Don't bother with the old interworking routines for Thumb-2. */ /* ??? Maybe only omit these on "m" variants. */ -#if !defined(__thumb2__) && !defined(__ARM_ARCH_6M__) +#if __ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 #if defined L_interwork_call_via_rX @@ -2150,11 +2150,12 @@ LSYM(Lchange_\register): #endif /* Arch supports thumb. */ #ifndef __symbian__ -#ifndef __ARM_ARCH_6M__ +/* The condition here must match the one in gcc/config/arm/elf.h. */ +#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1 #include "ieee754-df.S" #include "ieee754-sf.S" #include "bpabi.S" -#else /* __ARM_ARCH_6M__ */ +#else /* !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 */ #include "bpabi-v6m.S" -#endif /* __ARM_ARCH_6M__ */ +#endif /* !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 */ #endif /* !__symbian__ */ diff --git a/libgcc/config/arm/libunwind.S b/libgcc/config/arm/libunwind.S index cac102231914aa85320ff579168a17afa8479f67..393ec8aaee43948154956b72960860902400df50 100644 --- a/libgcc/config/arm/libunwind.S +++ b/libgcc/config/arm/libunwind.S @@ -58,7 +58,7 @@ #endif #endif -#ifdef __ARM_ARCH_6M__ +#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1 /* r0 points to a 16-word block. Upload these values to the actual core state. */ @@ -169,7 +169,7 @@ FUNC_START gnu_Unwind_Save_WMMXC UNPREFIX \name .endm -#else /* !__ARM_ARCH_6M__ */ +#else /* __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1 */ /* r0 points to a 16-word block. Upload these values to the actual core state. */ @@ -351,7 +351,7 @@ ARM_FUNC_START gnu_Unwind_Save_WMMXC UNPREFIX \name .endm -#endif /* !__ARM_ARCH_6M__ */ +#endif /* __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1 */ UNWIND_WRAPPER _Unwind_RaiseException 1 UNWIND_WRAPPER _Unwind_Resume 1 diff --git a/libgcc/config/arm/t-softfp b/libgcc/config/arm/t-softfp index 4ede438baf6a297737e52db00395f6c3a359f681..554ec9bc47b04445e79e84b1f957bf88680c08d1 100644 --- a/libgcc/config/arm/t-softfp +++ b/libgcc/config/arm/t-softfp @@ -1,2 +1,2 @@ -softfp_wrap_start := '\#ifdef __ARM_ARCH_6M__' +softfp_wrap_start := '\#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1' softfp_wrap_end := '\#endif' Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Thursday, December 17, 2015 1:58 PM > To: gcc-patches@gcc.gnu.org > Subject: [arm-embedded][PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == > ARMv6-M & Thumb-2 only == ARMv7-M assumptions > > Hi, > > We decided to apply the following patch to the ARM embedded 5 branch. > This is *not* intended for trunk for now. We will send a separate email > for trunk. 
> > This patch is part of a patch series to add support for ARMv8-M[1] to GCC. > This specific patch fixes some assumptions related to M profile > architectures. Currently GCC (mostly libgcc) contains several assumptions > that the only ARM architecture with Thumb-1 only instructions is ARMv6- > M and the only one with Thumb-2 only instructions is ARMv7-M. ARMv8- > M [1] make this wrong since ARMv8-M baseline is also (mostly) Thumb-1 > only and ARMv8-M mainline is also Thumb-2 only. This patch replace > checks for __ARM_ARCH_*__ for checks against > __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM instead. For > instance, Thumb-1 only can be checked with > #if !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB > == 1). It also fixes the guard for DIV code to not apply to ARMv8-M > Baseline since it uses Thumb-2 instructions. > > [1] For a quick overview of ARMv8-M please refer to the initial cover > letter. > > ChangeLog entries are as follow: &g
[arm-embedded][PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions
Hi, We decided to apply the following patch to the ARM embedded 5 branch. This is *not* intended for trunk for now. We will send a separate email for trunk. This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This specific patch fixes some assumptions related to M profile architectures. Currently GCC (mostly libgcc) contains several assumptions that the only ARM architecture with Thumb-1 only instructions is ARMv6-M and the only one with Thumb-2 only instructions is ARMv7-M. ARMv8-M [1] make this wrong since ARMv8-M baseline is also (mostly) Thumb-1 only and ARMv8-M mainline is also Thumb-2 only. This patch replace checks for __ARM_ARCH_*__ for checks against __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM instead. For instance, Thumb-1 only can be checked with #if !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB == 1). It also fixes the guard for DIV code to not apply to ARMv8-M Baseline since it uses Thumb-2 instructions. [1] For a quick overview of ARMv8-M please refer to the initial cover letter. ChangeLog entries are as follow: *** gcc/ChangeLog *** 2015-11-13 Thomas Preud'homme * config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to decide whether to prevent some libgcc routines being included for some multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the link between this condition and the one in libgcc/config/arm/lib1func.S. * config/arm/arm.h (TARGET_ARM_V6M): Add check to TARGET_ARM_ARCH. (TARGET_ARM_V7M): Likewise. *** gcc/testsuite/ChangeLog *** 2015-11-10 Thomas Preud'homme * lib/target-supports.exp (check_effective_target_arm_cortex_m): Use __ARM_ARCH_ISA_ARM to test for Cortex-M devices. *** libgcc/ChangeLog *** 2015-11-13 Thomas Preud'homme * config/arm/bpabi-v6m.S: Fix header comment to mention Thumb-1 rather than ARMv6-M. * config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases for all Thumb-1 only targets. (__only_thumb1__): Define for all Thumb-1 only targets. (THUMB_LDIV0): Test for __only_thumb1__ rather than __ARM_ARCH_6M__. (EQUIV): Likewise. (ARM_FUNC_ALIAS): Likewise. (umodsi3): Add check to __only_thumb1__ to guard the idiv version. (modsi3): Likewise. (HAVE_ARM_CLZ): Test for __only_thumb1__ rather than __ARM_ARCH_6M__. (clzsi2): Likewise. (clzdi2): Likewise. (ctzsi2): Likewise. (L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__ in guard for checking whether it is defined. (final includes): Test for __only_thumb1__ rather than __ARM_ARCH_6M__ and add comment to indicate the connection between this condition and the one in gcc/config/arm/elf.h. * config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__. * config/arm/t-softfp: Likewise. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 6ed8ad3..06abcf3 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -2181,8 +2181,10 @@ extern int making_const_table; #define TARGET_ARM_ARCH\ (arm_base_arch) \ -#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2) -#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2) +#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm \ + && !arm_arch_thumb2) +#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm \ + && arm_arch_thumb2) /* The highest Thumb instruction set version supported by the chip. 
*/ #define TARGET_ARM_ARCH_ISA_THUMB \ diff --git a/gcc/config/arm/elf.h b/gcc/config/arm/elf.h index 3795728..579a580 100644 --- a/gcc/config/arm/elf.h +++ b/gcc/config/arm/elf.h @@ -148,8 +148,9 @@ while (0) /* Horrible hack: We want to prevent some libgcc routines being included - for some multilibs. */ -#ifndef __ARM_ARCH_6M__ + for some multilibs. The condition should match the one in + libgcc/config/arm/lib1funcs.S. */ +#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1 #undef L_fixdfsi #undef L_fixunsdfsi #undef L_truncdfsf2 diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 254c4e3..6cf7ee1 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3210,10 +3210,8 @@ proc check_effective_target_arm_cortex_m { } { return 0 } return [check_no_compiler_messages arm_cortex_m assembly { - #if !defined(__ARM_ARCH_7M__) \ -&& !defined (__ARM_ARCH_7EM__) \ -&& !defined (__ARM_ARCH_6M__) - #error !__ARM_ARCH_7M__ && !__ARM_ARCH_7EM__ && !__ARM_ARCH_6M__ + #i
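To make the rewrite concrete, here is a small sketch (not taken from the patch; THUMB1_ONLY is just an illustrative name, similar in spirit to the __only_thumb1__ macro the patch introduces in lib1funcs.S) contrasting the old architecture-name test with the new ISA-based test, assuming the compiler predefines the ACLE macros __ARM_ARCH_ISA_ARM and __ARM_ARCH_ISA_THUMB as described above.

/* Old style: enumerate architecture names, which silently misses
   ARMv8-M Baseline.  */
#if defined (__ARM_ARCH_6M__)
# define THUMB1_ONLY 1
#endif

/* New style: test the property itself - no ARM ISA and only Thumb-1 -
   so ARMv6-M and ARMv8-M Baseline are both covered, while ARMv8-M
   Mainline falls on the Thumb-2 side together with ARMv7-M.  */
#if !defined (__ARM_ARCH_ISA_ARM) && __ARM_ARCH_ISA_THUMB == 1
# define THUMB1_ONLY 1
#endif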
[PATCH, GCC, V8M 0/6] Add support for ARMv8-M
Hi,

I'll be posting a patch series intended for trunk whose aim is to add support for ARMv8-M. This patch series does not include changes to support the security extensions [nor does it include atomics for ARMv8-M Baseline]. Those will be posted as a separate patch series.

=== Quick overview of ARMv8-M ===

ARMv8-M has two profiles[1]: Baseline and Mainline. In terms of features they can be defined as:

ARMv8-M Baseline (armv8-m.base):
* All ARMv6-M features
* 16-bit immediate moves
* Wide branch
* Compare & branch if (not) zero
* Integer divide
* Load/store exclusives
* Atomic load/stores
* Security extensions

ARMv8-M Mainline (armv8-m.main):
* All ARMv7-M features
* Atomic load/stores
* Security extensions

ARMv8-M Mainline with DSP extension (armv8-m.main+dsp):
* ARMv8-M Mainline
* The instructions added to ARMv7E-M on top of ARMv7-M

Note that although certain architectural features of the security extensions are optional for cores implementing ARMv8-M, some of the new instructions are always available in the architecture. Note also that only the security extension instructions are new; all other instructions have previously been available in other ARM architecture profiles.

[1] http://www.arm.com/products/processors/instruction-set-architectures/armv8-m-architecture.php
[PATCH, ARM, 3/3] Add multilib support for bare-metal ARM architectures
Hi Ramana, As suggested in your initial answer to this thread, we updated the multilib patch provided in ARM's embedded branch to be up-to-date with regards to supported CPUs in GCC. As to the need to modify Makefile.in and configure.ac, this is because the patch aims to let control to the user as to what multilib should be built. To this effect, it takes a list of architecture at configure time and that list needs to be passed down to t-baremetal Makefile to set the multilib variables appropriately. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2015-12-15 Thomas Preud'homme * Makefile.in (with_multilib_list): New variables substituted by configure. * config.gcc: Handle bare-metal multilibs in --with-multilib-list option. * config/arm/t-baremetal: New file. * configure.ac (with_multilib_list): New AC_SUBST. * configure: Regenerate. * doc/install.texi (--with-multilib-list): Update description for arm*-*-* targets to mention bare-metal multilibs. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 1f698798aa2df3f44d6b3a478bb4bf48e9fa7372..18b790afa114aa7580be0662d3ac9ffbc94e919d 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -546,6 +546,7 @@ lang_opt_files=@lang_opt_files@ $(srcdir)/c-family/c.opt $(srcdir)/common.opt lang_specs_files=@lang_specs_files@ lang_tree_files=@lang_tree_files@ target_cpu_default=@target_cpu_default@ +with_multilib_list=@with_multilib_list@ OBJC_BOEHM_GC=@objc_boehm_gc@ extra_modes_file=@extra_modes_file@ extra_opt_files=@extra_opt_files@ diff --git a/gcc/config.gcc b/gcc/config.gcc index af948b5e203f6b4f53dfca38e9d02d060d00c97b..d8098ed3cefacd00cb10590db1ec86d48e9fcdbc 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -3787,15 +3787,25 @@ case "${target}" in default) ;; *) - echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2 - exit 1 + for arm_multilib in ${arm_multilibs}; do + case ${arm_multilib} in + armv6-m | armv7-m | armv7e-m | armv7-r | armv8-m.base | armv8-m.main) + tmake_profile_file="arm/t-baremetal" + ;; + *) + echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2 + exit 1 + ;; + esac + done ;; esac if test "x${tmake_profile_file}" != x ; then - # arm/t-aprofile is only designed to work - # without any with-cpu, with-arch, with-mode, - # with-fpu or with-float options. + # arm/t-aprofile and arm/t-baremetal are only + # designed to work without any with-cpu, + # with-arch, with-mode, with-fpu or with-float + # options. if test "x$with_arch" != x \ || test "x$with_cpu" != x \ || test "x$with_float" != x \ diff --git a/gcc/config/arm/t-baremetal b/gcc/config/arm/t-baremetal new file mode 100644 index ..ffd29815e6ec22c747e77747ed9b69e0ae21b63a --- /dev/null +++ b/gcc/config/arm/t-baremetal @@ -0,0 +1,130 @@ +# A set of predefined MULTILIB which can be used for different ARM targets. +# Via the configure option --with-multilib-list, user can customize the +# final MULTILIB implementation. 
+ +comma := , + +with_multilib_list := $(subst $(comma), ,$(with_multilib_list + +MULTILIB_OPTIONS = mthumb/marm +MULTILIB_DIRNAMES = thumb arm +MULTILIB_OPTIONS += march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7/march=armv8-m.base/march=armv8-m.main +MULTILIB_DIRNAMES += armv6-m armv7-m armv7e-m armv7-ar armv8-m.base armv8-m.main +MULTILIB_OPTIONS += mfloat-abi=softfp/mfloat-abi=hard +MULTILIB_DIRNAMES += softfp fpu +MULTILIB_OPTIONS += mfpu=fpv5-sp-d16/mfpu=fpv5-d16/mfpu=fpv4-sp-d16/mfpu=vfpv3-d16 +MULTILIB_DIRNAMES += fpv5-sp-d16 fpv5-d16 fpv4-sp-d16 vfpv3-d16 + +MULTILIB_MATCHES = march?armv6s-m=mcpu?cortex-m0 +MULTILIB_MATCHES += march?armv6s-m=mcpu?cortex-m0.small-multiply +MULTILIB_MATCHES += march?armv6s-m=mcpu?cortex-m0plus +MULTILIB_MATCHES += march?armv6s-m=mcpu?cortex-m0plus.smal
[PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs
Currently in config.gcc, only the first multilib in a multilib list is checked for validity and the following elements are ignored due to the break which only breaks out of loop in shell. A loop is also done over the multilib list elements despite no combination being legal. This patch rework the code to address both issues. ChangeLog entry is as follows: 2015-11-24 Thomas Preud'homme * config.gcc: Error out when conflicting multilib is detected. Do not loop over multilibs since no combination is legal. diff --git a/gcc/config.gcc b/gcc/config.gcc index 59aee2c..be3c720 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -3772,38 +3772,40 @@ case "${target}" in # Add extra multilibs if test "x$with_multilib_list" != x; then arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'` - for arm_multilib in ${arm_multilibs}; do - case ${arm_multilib} in - aprofile) + case ${arm_multilibs} in + aprofile) # Note that arm/t-aprofile is a # stand-alone make file fragment to be # used only with itself. We do not # specifically use the # TM_MULTILIB_OPTION framework because # this shorthand is more - # pragmatic. Additionally it is only - # designed to work without any - # with-cpu, with-arch with-mode + # pragmatic. + tmake_profile_file="arm/t-aprofile" + ;; + default) + ;; + *) + echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2 + exit 1 + ;; + esac + + if test "x${tmake_profile_file}" != x ; then + # arm/t-aprofile is only designed to work + # without any with-cpu, with-arch, with-mode, # with-fpu or with-float options. - if test "x$with_arch" != x \ - || test "x$with_cpu" != x \ - || test "x$with_float" != x \ - || test "x$with_fpu" != x \ - || test "x$with_mode" != x ; then - echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=aprofile" 1>&2 - exit 1 - fi - tmake_file="${tmake_file} arm/t-aprofile" - break - ;; - default) - ;; - *) - echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2 - exit 1 - ;; - esac - done + if test "x$with_arch" != x \ + || test "x$with_cpu" != x \ + || test "x$with_float" != x \ + || test "x$with_fpu" != x \ + || test "x$with_mode" != x ; then + echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=${arm_multilib}" 1>&2 + exit 1 + fi + + tmake_file="${tmake_file} ${tmake_profile_file}" + fi fi ;; Tested with the following multilib lists: + foo -> "Error: --with-multilib-list=foo not supported" as expected + default,aprofile -> "Error: --with-multilib-list=default,aprofile not supported" as expected + aprofile,default -> "Error: --with-multilib-list=aprofile,default not supported" as expected + (nothing) -> libraries in $installdir/arm-none-eabi/lib{,fpu,thumb} + default -> libraries in $installdir/arm-none-eabi/lib{,fpu,thumb} as expected + aprofile -> $installdir/arm-none-eabi/lib contains all supported multilib Is this ok for trunk? Best regards, Thomas
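As an aside, the pitfall described above is easy to misread when thinking in C terms, so here is a standalone C analogy (illustration only, not code from the patch, and the list contents are made up): in C, break inside a switch leaves only the switch and the loop goes on to validate the next element, whereas in the original shell code the break inside the case arm left the enclosing for loop, so elements after the first valid one were never checked.

#include <stdio.h>

int main (void)
{
  /* Hypothetical --with-multilib-list=aprofile,foo split into elements.  */
  const char *multilibs[] = { "aprofile", "foo" };
  unsigned i;

  for (i = 0; i < sizeof multilibs / sizeof multilibs[0]; i++)
    switch (multilibs[i][0])    /* toy dispatch on the first character */
      {
      case 'a':                 /* "aprofile" */
        printf ("valid multilib: %s\n", multilibs[i]);
        break;                  /* leaves only the switch; the for loop
                                   continues and "foo" is still diagnosed */
      case 'd':                 /* "default" */
        break;
      default:
        printf ("Error: --with-multilib-list=%s not supported\n", multilibs[i]);
        return 1;
      }
  return 0;
}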
[PATCH, ARM, 1/3] Document --with-multilib-list for arm*-*-* targets
Currently, the documentation for --with-multilib-list in gcc/doc/install.texi only mentions sh*-*-* and x86-64-*-linux* targets. However, arm*-*-* targets also support this option. This patch adds documentation for the meaning of this option for arm*-*-* targets. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2015-12-09 Thomas Preud'homme * doc/install.texi (--with-multilib-list): Describe the meaning of the option for arm*-*-* targets. diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 57399ed..2c93eb0 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1102,9 +1102,19 @@ sysv, aix. @item --with-multilib-list=@var{list} @itemx --without-multilib-list Specify what multilibs to build. -Currently only implemented for sh*-*-* and x86-64-*-linux*. +Currently only implemented for arm*-*-*, sh*-*-* and x86-64-*-linux*. @table @code +@item arm*-*-* +@var{list} is either @code{default} or @code{aprofile}. Specifying +@code{default} is equivalent to omitting this option while specifying +@code{aprofile} builds multilibs for each combination of ISA (@code{-marm} or +@code{-mthumb}), architecture (@code{-march=armv7-a}, @code{-march=armv7ve}, +or @code{-march=armv8-a}), FPU available (none, @code{-mfpu=vfpv3-d16}, +@code{neon}, @code{vfpv4-d16}, @code{neon-vfpv4} or @code{neon-fp-armv8} +depending on architecture) and floating-point ABI (@code{-mfloat-abi=softfp} +or @code{-mfloat-abi=hard}). + @item sh*-*-* @var{list} is a comma separated list of CPU names. These must be of the form @code{sh*} or @code{m*} (in which case they match the compiler option PDF builds fine out of the updated file and looks as expected. Is this ok for trunk? Best regards, Thomas
[PATCH, ARM] Fix gcc.c-torture/execute/loop-2b.c execution failure on cortex-m0
During the reorg pass, thumb1_reorg () is tasked with rewriting mov rd, rn to subs rd, rn, 0 so as to avoid a separate compare-against-0 instruction before a conditional branch based on it. The cmp itself is elided in the C output template of the cbranchsi4_insn instruction. When the condition is met, the source register (rn) is also propagated into the comparison in place of the destination register (rd). However, right now thumb1_reorg () only looks for a mov followed by a cbranchsi but does not check whether the comparison in cbranchsi is against the constant 0. This is not safe because a non-clobbering instruction that modifies the source register could exist between the mov and the comparison. This is what happens here with a post increment of the source register after the mov, which skips the &a[i] == &a[1] comparison for iteration i == 1. This patch fixes the issue by checking that the comparison is against the constant 0. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2015-12-07 Thomas Preud'homme * config/arm/arm.c (thumb1_reorg): Check that the comparison is against the constant 0. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 42bf272..49c0a06 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -17195,7 +17195,7 @@ thumb1_reorg (void) FOR_EACH_BB_FN (bb, cfun) { rtx dest, src; - rtx pat, op0, set = NULL; + rtx cmp, op0, op1, set = NULL; rtx_insn *prev, *insn = BB_END (bb); bool insn_clobbered = false; @@ -17208,8 +17208,13 @@ thumb1_reorg (void) continue; /* Get the register with which we are comparing. */ - pat = PATTERN (insn); - op0 = XEXP (XEXP (SET_SRC (pat), 0), 0); + cmp = XEXP (SET_SRC (PATTERN (insn)), 0); + op0 = XEXP (cmp, 0); + op1 = XEXP (cmp, 1); + + /* Check that comparison is against ZERO. */ + if (!CONST_INT_P (op1) || INTVAL (op1) != 0) + continue; /* Find the first flag setting insn before INSN in basic block BB. */ gcc_assert (insn != BB_HEAD (bb)); @@ -17249,7 +17254,7 @@ thumb1_reorg (void) PATTERN (prev) = gen_rtx_SET (dest, src); INSN_CODE (prev) = -1; /* Set test register in INSN to dest. */ - XEXP (XEXP (SET_SRC (pat), 0), 0) = copy_rtx (dest); + XEXP (cmp, 0) = copy_rtx (dest); INSN_CODE (insn) = -1; } } Testsuite shows no regression when run for arm-none-eabi with -mcpu=cortex-m0 -mthumb. Is this ok for trunk? Best regards, Thomas
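For readers without the testcase at hand, here is a rough standalone sketch of the kind of loop that trips the old code; it is an illustration of the pattern described above, not the actual gcc.c-torture/execute/loop-2b.c source, and the function name is made up.

int a[4];

int
count_until_second_element (void)
{
  int *p = a;
  int n = 0;

  /* The exit test uses the value of p from before the post-increment.
     A peephole that instead folds the comparison onto the register
     holding the already-incremented pointer would miss the exit when
     p reaches &a[1] and loop too far.  */
  do
    n++;
  while (p++ != &a[1]);

  return n;
}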
[PATCH, testsuite] Fix PR68632: gcc.target/arm/lto/pr65837 failure on M profile ARM targets
gcc.target/arm/lto/pr65837 fails on M profile ARM targets because of lack of neon instructions. This patch adds the necessary arm_neon_ok effective target requirement to avoid running this test for such targets. ChangeLog entry is as follows: * gcc/testsuite/ChangeLog *** 2015-12-08 Thomas Preud'homme PR testsuite/68632 * gcc.target/arm/lto/pr65837_0.c: Require arm_neon_ok effective target. diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c index 000fc2a..fcc26a1 100644 --- a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c +++ b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c @@ -1,4 +1,5 @@ /* { dg-lto-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ /* { dg-lto-options {{-flto -mfpu=neon}} } */ /* { dg-suppress-ld-options {-mfpu=neon} } */ Testcase fails without the patch and succeeds with. Is this ok for trunk? Best regards, Thomas
[PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets
c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets due to -fcilkplus needing -pthread which is not available for those targets. This patch solves this issue by adding a condition to the cilkplus effective target that compiling with -fcilkplus succeeds and requires cilkplus as an effective target for attr-simd-3.c testcase. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-12-08 Thomas Preud'homme PR testsuite/68629 * lib/target-supports.exp (check_effective_target_cilkplus): Also check that compiling with -fcilkplus does not give an error. * c-c++-common/attr-simd-3.c: Require cilkplus effective target. diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c-c++-common/attr-simd-3.c index d61ba82..1970c67 100644 --- a/gcc/testsuite/c-c++-common/attr-simd-3.c +++ b/gcc/testsuite/c-c++-common/attr-simd-3.c @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target "cilkplus" } */ /* { dg-options "-fcilkplus" } */ /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not declared in this scope" } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4e349e9..95b903c 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } { if { [istarget avr-*-*] } { return 0; } -return 1 +return [ check_no_compiler_messages_nocache fcilkplus_available executable { + #ifdef __cplusplus + extern "C" + #endif + int dummy; + } "-fcilkplus" ] } proc check_linker_plugin_available { } { Testsuite shows no regression when run with + an arm-none-eabi GCC cross-compiler targeting Cortex-M3 + a bootstrapped x86_64-linux-gnu GCC native compiler Is this ok for trunk? Best regards, Thomas
[arm-embedded][PATCHv2, ARM, libgcc] New aeabi_idiv function for armv6-m
We decided to apply this to ARM/embedded-5-branch. Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Andre Vieira > Sent: Wednesday, October 28, 2015 1:03 AM > To: gcc-patches@gcc.gnu.org > Subject: Re: [PING][PATCHv2, ARM, libgcc] New aeabi_idiv function for > armv6-m > > Ping. > > BR, > Andre > > On 13/10/15 18:01, Andre Vieira wrote: > > This patch ports the aeabi_idiv routine from Linaro Cortex-Strings > > (https://git.linaro.org/toolchain/cortex-strings.git), which was > > contributed by ARM under Free BSD license. > > > > The new aeabi_idiv routine is used to replace the one in > > libgcc/config/arm/lib1funcs.S. This replacement happens within the > > Thumb1 wrapper. The new routine is under LGPLv3 license. > > > > The main advantage of this version is that it can improve the > > performance of the aeabi_idiv function for Thumb1. This solution will > > also increase the code size. So it will only be used if > > __OPTIMIZE_SIZE__ is not defined. > > > > Make check passed for armv6-m. > > > > libgcc/ChangeLog: > > 2015-08-10 Hale Wang > > Andre Vieira > > > > * config/arm/lib1funcs.S: Add new wrapper. > >
FW: [PATCH, ARM/testsuite] Fix thumb2-slow-flash-data.c failures
[Forwarding to gcc-patches, doh!] Best regards, Thomas --- Begin Message --- Hi, ARM-specific thumb2-slow-flash-data.c testcase shows 2 failures when running for arm-none-eabi with -mcpu=cortex-m7: FAIL: gcc.target/arm/thumb2-slow-flash-data.c (test for excess errors) FAIL: gcc.target/arm/thumb2-slow-flash-data.c scan-assembler-times movt 13 The first one is due to a missing type specifier in the declaration of labelref while the second one is due to different constant synthesis as a result of a different tuning for the CPU selected. This patch fixes these issues by adding the missing type specifier and checking for .word and similar directive instead of the number of movt. The new test passes for all of -mcpu=cortex-m{3,4,7} but fail when removing the -mslow-flash-data switch. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-11-04 Thomas Preud'homme * gcc.target/arm/thumb2-slow-flash-data.c: Add missing typespec for labelref and check use of constant pool by looking for .word and similar directives. diff --git a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c index 9852ea5..089a72b 100644 --- a/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c +++ b/gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data.c @@ -50,7 +50,7 @@ int foo (int a, int b) { int i; - volatile *labelref = &&label1; + volatile int *labelref = &&label1; if (a > b) { @@ -70,5 +70,4 @@ label1: return a + b; } -/* { dg-final { scan-assembler-times "movt" 13 } } */ -/* { dg-final { scan-assembler-times "movt.*LC0\\+4" 1 } } */ +/* { dg-final { scan-assembler-not "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */ Is this ok for trunk? Best regards, Thomas --- End Message ---
[PATCH, ARM] List Cs and US constraints as being used
Hi, The header in gcc/config/arm/constraints.md lists all the ARM-specific constraints defined and the states they apply to, but misses a couple of them. This patch adds the missing Cs and US constraints to the list. Patch was tested by verifying that the arm-none-eabi-gcc cross-compiler can still be built (i.e. the comment remains a comment). diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md index 42935a4..2d9ffb8 100644 --- a/gcc/config/arm/constraints.md +++ b/gcc/config/arm/constraints.md @@ -21,7 +21,7 @@ ;; The following register constraints have been used: ;; - in ARM/Thumb-2 state: t, w, x, y, z ;; - in Thumb state: h, b -;; - in both states: l, c, k, q, US +;; - in both states: l, c, k, q, Cs, Ts, US ;; In ARM state, 'l' is an alias for 'r' ;; 'f' and 'v' were previously used for FPA and MAVERICK registers. Committed as obvious with the following ChangeLog entry: 2015-08-25 Thomas Preud'homme * config/arm/constraints.md: Also list Cs and US ARM-specific constraints as used. Best regards, Thomas
RE: [PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass
Hi, > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Tuesday, July 28, 2015 3:04 PM > > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > > > ChangeLog entry is as follows: > > > > 2015-07-28 Thomas Preud'homme > > > > PR tree-optimization/66828 > > * tree-ssa-math-opts.c (perform_symbolic_merge): Change type > of > > inc > > from int64_t to uint64_t. Can I backport this change to GCC 5 branch? The patch applies cleanly on GCC 5 and shows no regression on a native x86_64-linux-gnu bootstrapped GCC and an arm-none-eabi GCC cross-compiler. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ba37d96..a301c23 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2015-08-04 Thomas Preud'homme + + Backport from mainline + 2015-07-28 Thomas Preud'homme + + PR tree-optimization/66828 + * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc + from int64_t to uint64_t. + 2015-08-03 John David Anglin PR target/67060 diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index c22a677..c699dcadb 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1856,7 +1856,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1, the same base (array, structure, ...). */ if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2)) { - int64_t inc; + uint64_t inc; HOST_WIDE_INT start_sub, end_sub, end1, end2, end; struct symbolic_number *toinc_n_ptr, *n_end; Best regards, Thomas
[PATCH, loop-invariant] Fix PR67043: -fcompare-debug failure with -O3
Hi, Since commit r223113, loop-invariant pass rely on luids to determine if an invariant can be hoisted out of a loop without introducing temporaries. However, nothing is made to ensure luids are up-to-date. This patch adds a DF_LIVE problem and mark all blocks as dirty before using luids to ensure these will be recomputed. ChangeLog entries are as follows: 2015-07-31 Thomas Preud'homme PR tree-optimization/67043 * loop-invariant.c (find_defs): Force recomputation of all luids. 2015-07-29 Thomas Preud'homme PR tree-optimization/67043 * gcc.dg/pr67043.c: New test. Note: the testcase was heavily reduced from the Linux kernel sources by Markus Trippelsdorf and formatted to follow GNU code style. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 1fdb84d..fc53e09 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -676,6 +676,8 @@ find_defs (struct loop *loop) df_remove_problem (df_chain); df_process_deferred_rescans (); df_chain_add_problem (DF_UD_CHAIN); + df_live_add_problem (); + df_live_set_all_dirty (); df_set_flags (DF_RD_PRUNE_DEAD_DEFS); df_analyze_loop (loop); check_invariant_table_size (); diff --git a/gcc/testsuite/gcc.dg/pr67043.c b/gcc/testsuite/gcc.dg/pr67043.c new file mode 100644 index 000..36aa686 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr67043.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fcompare-debug -w" } */ + +extern void rt_mutex_owner (void); +extern void rt_mutex_deadlock_account_lock (int); +extern void signal_pending (void); +__typeof__ (int *) a; +int b; + +int +try_to_take_rt_mutex (int p1) { + rt_mutex_owner (); + if (b) +return 0; + rt_mutex_deadlock_account_lock (p1); + return 1; +} + +void +__rt_mutex_slowlock (int p1) { + int c; + for (;;) { +c = ({ + asm ("" : "=r"(a)); + a; +}); +if (try_to_take_rt_mutex (c)) + break; +if (__builtin_expect (p1 == 0, 0)) + signal_pending (); + } +} Patch was tested by running the testsuite against a bootstrapped native x86_64-linux-gnu GCC and against an arm-none-eabi GCC cross-compiler without any regression. Is this ok for trunk? Best regards, Thomas Preud'homme
RE: [PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > ChangeLog entry is as follows: > > 2015-07-28 Thomas Preud'homme > > PR tree-optimization/66828 > * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of > inc > from int64_t to uint64_t. And the patch is: diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 55382f3..c3098db 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2122,7 +2122,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1, the same base (array, structure, ...). */ if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2)) { - int64_t inc; + uint64_t inc; HOST_WIDE_INT start_sub, end_sub, end1, end2, end; struct symbolic_number *toinc_n_ptr, *n_end; Best regards, Thomas
[PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass
The bswap pass contains the following loop: for (i = 0; i < size; i++, inc <<= BITS_PER_MARKER) In the update to inc and i just before exiting the loop, inc can be shifted by a total of more than 62 bits, making the value too large to be represented by int64_t. This is undefined behavior [1] and it triggers an error under an ubsan bootstrap. This patch changes the type of inc to be unsigned, removing the undefined behavior. [1] C++ 98 standard section 5.8 paragraph 2: "The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined." ChangeLog entry is as follows: 2015-07-28 Thomas Preud'homme PR tree-optimization/66828 * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc from int64_t to uint64_t. Testsuite was run on a native x86_64-linux-gnu bootstrapped GCC and an arm-none-eabi cross-compiler without any regression. Committed as obvious as suggested by Markus Trippelsdorf in PR66828. Best regards, Thomas
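To make the failure mode concrete, here is a minimal standalone sketch, not GCC code: the starting value, the loop bound and the marker width (8, its value in tree-ssa-math-opts.c) are assumptions chosen for illustration. With int64_t, the final update would shift the only set bit past the sign bit, which the wording quoted above leaves undefined and which ubsan reports; with uint64_t every shift is well defined and the value simply wraps modulo 2^64.

#include <stdint.h>
#include <stdio.h>

#define BITS_PER_MARKER 8   /* assumed marker width, see above */

int
main (void)
{
  /* Stand-in for the symbolic number manipulated by the pass.  */
  uint64_t inc = 1;
  int i, size = 8;

  /* Same loop shape as the one quoted above: after the last update the
     value has been shifted by a total of 64 bits, which is fine for an
     unsigned type.  */
  for (i = 0; i < size; i++, inc <<= BITS_PER_MARKER)
    ;

  printf ("%016llx\n", (unsigned long long) inc);  /* prints 0000000000000000 */
  return 0;
}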
RE: [PATCH, ARM] Restrict pr65647 testcase to ARMv6-M effective target
> From: James Greenhalgh [mailto:james.greenha...@arm.com] > Sent: Friday, June 26, 2015 6:15 PM > > This should already have been covered by: > > https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01105.html > > 2015-06-16 James Greenhalgh > > * gcc.target/arm/pr65647.c: Do not override -mfloat-abi > directives > passed by the testsuite driver. Indeed, time to git pull. Sorry for the noise :-( Best regards, Thomas
[PATCH, ARM] Restrict pr65647 testcase to ARMv6-M effective target
Hi, Testcase for PR65647 assumes that the compiler can compile for ARMv6-M which might not be the case if passing some extra options via RUNTESTFLAGS (eg. -marm/-mcpu=cortex-a9). This patch restricts the testcase to ARMv6-M effective targets. Testsuite ChangeLog entry is as follows: 2015-06-25 Thomas Preud'homme * gcc.target/arm/pr65647.c: Restrict to ARMv6-M effective targets. diff --git a/gcc/testsuite/gcc.target/arm/pr65647.c b/gcc/testsuite/gcc.target/arm/pr65647.c index d3b44b2..d828d23 100644 --- a/gcc/testsuite/gcc.target/arm/pr65647.c +++ b/gcc/testsuite/gcc.target/arm/pr65647.c @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target arm_arch_v6m_ok } */ /* { dg-options "-march=armv6-m -mthumb -O3 -w -mfloat-abi=soft" } */ a, b, c, e, g = &e, h, i = 7, l = 1, m, n, o, q = &m, r, s = &r, u, w = 9, x, Patch was tested by running the testcase once with -mcpu=cortex-a9 (skipped as expected) and once with -mcpu=cortex-m0 (passes). Is this ok for trunk? Best regards, Thomas
RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Wednesday, May 27, 2015 11:24 PM > Ah, OK. I was looking at the code prior to the call for > can_move_invariant_reg in move_invariant_reg which implies that DEST > can > be a subreg, but REG can not. > > But with that check in can_move_invariant_reg obviously won't matter. > It feels like we've likely got some dead code here, but that can be a > follow-up if you want to pursue. Are you referring to the subreg code? It's used at the end of the function: inv->reg = reg; inv->orig_regno = regno; > > OK for the trunk. Thanks, committed. Best regards, Thomas
RE: [PATCH 2/3, ARM, libgcc, ping7] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
Ping? > -Original Message- > From: Thomas Preud'homme [mailto:thomas.preudho...@arm.com] > Sent: Thursday, April 30, 2015 3:19 PM > To: Thomas Preud'homme; Richard Earnshaw; 'gcc-patches@gcc.gnu.org'; > Marcus Shawcroft; Ramana Radhakrishnan > (ramana.radhakrish...@arm.com) > Subject: RE: [PATCH 2/3, ARM, libgcc, ping6] Code size optimization for > the fmul/fdiv and dmul/ddiv function in libgcc > > Here is an updated patch that prefix local symbols with __ for more > safety. > They appear in the symtab as local so it is not strictly necessary but one is > never too cautious. Being local, they also do not generate any PLT entry. > They appear only because the jumps are from one section to another > (which is the whole purpose of this patch) and thus need a static > relocation. > > I hope this revised version address all your concerns. > > ChangeLog entry is unchanged: > > *** gcc/libgcc/ChangeLog *** > > 2015-04-30 Tony Wang > > * config/arm/ieee754-sf.S: Expose symbols around fragment > boundaries as function symbols. > * config/arm/ieee754-df.S: Same with above > > diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754- > df.S > index c1468dc..39b0028 100644 > --- a/libgcc/config/arm/ieee754-df.S > +++ b/libgcc/config/arm/ieee754-df.S > @@ -559,7 +559,7 @@ ARM_FUNC_ALIAS aeabi_l2d floatdidf > > #ifdef L_arm_muldivdf3 > > -ARM_FUNC_START muldf3 > +ARM_FUNC_START muldf3, function_section > ARM_FUNC_ALIAS aeabi_dmul muldf3 > do_push {r4, r5, r6, lr} > > @@ -571,7 +571,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3 > COND(and,s,ne) r5, ip, yh, lsr #20 > teqne r4, ip > teqne r5, ip > - bleqLSYM(Lml_s) > + bleq__Lml_s > > @ Add exponents together > add r4, r4, r5 > @@ -689,7 +689,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3 > subsip, r4, #(254 - 1) > do_it hi > cmphi ip, #0x700 > - bhi LSYM(Lml_u) > + bhi __Lml_u > > @ Round the result, merge final exponent. > cmp lr, #0x8000 > @@ -716,9 +716,12 @@ LSYM(Lml_1): > mov lr, #0 > subsr4, r4, #1 > > -LSYM(Lml_u): > + FUNC_END aeabi_dmul > + FUNC_END muldf3 > + > +ARM_SYM_START __Lml_u > @ Overflow? > - bgt LSYM(Lml_o) > + bgt __Lml_o > > @ Check if denormalized result is possible, otherwise return > signed 0. > cmn r4, #(53 + 1) > @@ -778,10 +781,11 @@ LSYM(Lml_u): > do_it eq > biceq xl, xl, r3, lsr #31 > RETLDM "r4, r5, r6" > + SYM_END __Lml_u > > @ One or both arguments are denormalized. > @ Scale them leftwards and preserve sign bit. > -LSYM(Lml_d): > +ARM_SYM_START __Lml_d > teq r4, #0 > bne 2f > and r6, xh, #0x8000 > @@ -804,8 +808,9 @@ LSYM(Lml_d): > beq 3b > orr yh, yh, r6 > RET > + SYM_END __Lml_d > > -LSYM(Lml_s): > +ARM_SYM_START __Lml_s > @ Isolate the INF and NAN cases away > teq r4, ip > and r5, ip, yh, lsr #20 > @@ -817,10 +822,11 @@ LSYM(Lml_s): > orrsr6, xl, xh, lsl #1 > do_it ne > COND(orr,s,ne) r6, yl, yh, lsl #1 > - bne LSYM(Lml_d) > + bne __Lml_d > + SYM_END __Lml_s > > @ Result is 0, but determine sign anyway. 
> -LSYM(Lml_z): > +ARM_SYM_START __Lml_z > eor xh, xh, yh > and xh, xh, #0x8000 > mov xl, #0 > @@ -832,41 +838,42 @@ LSYM(Lml_z): > moveq xl, yl > moveq xh, yh > COND(orr,s,ne) r6, yl, yh, lsl #1 > - beq LSYM(Lml_n) @ 0 * INF or INF * 0 -> NAN > + beq __Lml_n @ 0 * INF or INF * 0 -> NAN > teq r4, ip > bne 1f > orrsr6, xl, xh, lsl #12 > - bne LSYM(Lml_n) @ NAN * -> NAN > + bne __Lml_n @ NAN * -> NAN > 1: teq r5, ip > - bne LSYM(Lml_i) > + bne __Lml_i > orrsr6, yl, yh, lsl #12 > do_it ne, t > movne xl, yl > movne xh, yh > - bne LSYM(Lml_n) @ * NAN -> NAN > + bne __Lml_n @ * NAN -> NAN > + SYM_END __Lml_z > > @ Result is INF, but we need to determine its sign. > -LSYM(Lml_i): > +ARM_SYM_START __Lml_i > eor xh, xh, yh > + SYM_END __Lml_i > > @ Overflow: return INF (sign already in xh). > -LSYM(Lml_o): > +ARM_SYM_START __Lml_o > and xh, xh, #0x8000 &
RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Saturday, May 23, 2015 6:54 AM > > > > - if (!can_move_invariant_reg (loop, inv, reg)) > > + if (!can_move_invariant_reg (loop, inv, dest)) > Won't this run into the same problem if DEST is a SUBREG? One of the very first tests in can_move_invariant_reg is: if (!REG_P (reg) || HARD_REGISTER_P (reg)) return false; So in the case of a subreg the insn will not be moved, which will execute the same code as before my patch. It would be nicer if it could work with subregs, of course, but this makes for a much smaller and safer patch. Best regards, Thomas
RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > From: Steven Bosscher [mailto:stevenb@gmail.com] > > Sent: Tuesday, May 19, 2015 7:21 PM > > > > Not OK. > > This will break in move_invariants() when it looks at REGNO (inv->reg). > > Indeed. I'm even surprised all tests passed. Ok I will just prevent moving > in such a case. I'm running the tests now and will get back to you > tomorrow. Patch is now tested via bootstrap + testsuite run on x86_64-linux-gnu and building arm-none-eabi cross-compiler + testsuite run. Both testsuite run show no regression. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 76a009f..4ce3576 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1626,7 +1626,7 @@ move_invariant_reg (struct loop *loop, unsigned invno) if (REG_P (reg)) regno = REGNO (reg); - if (!can_move_invariant_reg (loop, inv, reg)) + if (!can_move_invariant_reg (loop, inv, dest)) { reg = gen_reg_rtx_and_attrs (dest); diff --git a/gcc/testsuite/gcc.c-torture/compile/pr66168.c b/gcc/testsuite/gcc.c-torture/compile/pr66168.c new file mode 100644 index 000..d6bfc7b --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr66168.c @@ -0,0 +1,15 @@ +int a, b; + +void +fn1 () +{ + for (;;) +{ + for (b = 0; b < 3; b++) + { + char e[2]; + char f = e[1]; + a ^= f ? 1 / f : 0; + } +} +} Ok for trunk? Best regards, Thomas
RE: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
> From: Steven Bosscher [mailto:stevenb@gmail.com] > Sent: Tuesday, May 19, 2015 7:21 PM > > Not OK. > This will break in move_invariants() when it looks at REGNO (inv->reg). Indeed. I'm even surprised all tests passed. Ok I will just prevent moving in such a case. I'm running the tests now and will get back to you tomorrow. Best regards, Thomas
[PATCH] Fix PR66168: ICE due to incorrect invariant register info
Hi, r223113 made it possible for invariant to actually be moved rather than moving the source to a new pseudoregister. However, when doing so the inv->reg is not set up properly: in case of a subreg destination it holds the inner register rather than the subreg expression. This patch fixes that. ChangeLog entries are as follow: *** gcc/ChangeLog *** 2015-05-18 Thomas Preud'homme PR rtl-optimization/66168 * loop-invariant.c (move_invariant_reg): Set inv->reg to destination of inv->insn when moving an invariant without introducing a temporary register. *** gcc/testsuite/ChangeLog *** 2015-05-18 Thomas Preud'homme PR rtl-optimization/66168 * gcc.c-torture/compile/pr66168.c: New test. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 76a009f..30e1945 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1642,9 +1642,13 @@ move_invariant_reg (struct loop *loop, unsigned invno) emit_insn_after (gen_move_insn (dest, reg), inv->insn); } - else if (dump_file) - fprintf (dump_file, "Invariant %d moved without introducing a new " - "temporary register\n", invno); + else + { + reg = SET_DEST (set); + if (dump_file) + fprintf (dump_file, "Invariant %d moved without introducing a new " + "temporary register\n", invno); + } reorder_insns (inv->insn, inv->insn, BB_END (preheader)); /* If there is a REG_EQUAL note on the insn we just moved, and the diff --git a/gcc/testsuite/gcc.c-torture/compile/pr66168.c b/gcc/testsuite/gcc.c-torture/compile/pr66168.c new file mode 100644 index 000..d6bfc7b --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr66168.c @@ -0,0 +1,15 @@ +int a, b; + +void +fn1 () +{ + for (;;) +{ + for (b = 0; b < 3; b++) + { + char e[2]; + char f = e[1]; + a ^= f ? 1 / f : 0; + } +} +} Tested by bootstrapping on x86_64-linux-gnu and building an arm-none-eabi cross-compiler. Testsuite run shows no regression for both of them. Ok for trunk? Best regards, Thomas
RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Wednesday, May 13, 2015 4:05 AM > OK for the trunk. > > Thanks for your patience, Thanks. Committed with the added "PR rtl-optimization/64616" to both ChangeLog entries. Best regards, Thomas
RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > > From: Jeff Law [mailto:l...@redhat.com] > > Sent: Tuesday, May 12, 2015 4:17 AM > > > > >> > > >> + > > >> + /* Check that all uses reached by the def in insn would still be > > reached > > >> + it. */ > > >> + dest_regno = REGNO (reg); > > >> + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = > > >> DF_REF_NEXT_REG (use)) > > [ ... ] > > So isn't this overly conservative if DEST_REGNO is set multiple times > > since it's going to look at all the uses, even those not necessarily > > reached by the original SET of DEST_REGNO? > > > > Or is that not an issue for some reason? And I'm not requiring you to > > make this optimal, but if I'm right, a comment here seems wise. > > My apologize, it is the comment that is incorrect since it doesn't match > the code (a remaining of an old version of this patch). The code actually > checks that the use was dominated by the instruction before it is moved > out of the loop. > > > > > > I think with the wrapping nits fixed and closure on the multi-set issue > > noted immediately above and this will be good for the trunk. > > I'll fix this comment right away. Please find below a patch with the comment fixed. *** gcc/ChangeLog *** 2015-05-12 Thomas Preud'homme * loop-invariant.c (can_move_invariant_reg): New. (move_invariant_reg): Call above new function to decide whether instruction can just be moved, skipping creation of temporary register. *** gcc/testsuite/ChangeLog *** 2015-05-12 Thomas Preud'homme * gcc.dg/loop-8.c: New test. * gcc.dg/loop-9.c: New test. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index e3b560d..76a009f 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1511,6 +1511,79 @@ replace_uses (struct invariant *inv, rtx reg, bool in_group) return 1; } +/* Whether invariant INV setting REG can be moved out of LOOP, at the end of + the block preceding its header. */ + +static bool +can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg) +{ + df_ref def, use; + unsigned int dest_regno, defs_in_loop_count = 0; + rtx_insn *insn = inv->insn; + basic_block bb = BLOCK_FOR_INSN (inv->insn); + + /* We ignore hard register and memory access for cost and complexity reasons. + Hard register are few at this stage and expensive to consider as they + require building a separate data flow. Memory access would require using + df_simulate_* and can_move_insns_across functions and is more complex. */ + if (!REG_P (reg) || HARD_REGISTER_P (reg)) +return false; + + /* Check whether the set is always executed. We could omit this condition if + we know that the register is unused outside of the loop, but it does not + seem worth finding out. */ + if (!inv->always_executed) +return false; + + /* Check that all uses that would be dominated by def are already dominated + by it. */ + dest_regno = REGNO (reg); + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use)) +{ + rtx_insn *use_insn; + basic_block use_bb; + + use_insn = DF_REF_INSN (use); + use_bb = BLOCK_FOR_INSN (use_insn); + + /* Ignore instruction considered for moving. */ + if (use_insn == insn) + continue; + + /* Don't consider uses outside loop. */ + if (!flow_bb_inside_loop_p (loop, use_bb)) + continue; + + /* Don't move if a use is not dominated by def in insn. 
*/ + if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID (use_insn)) + return false; + if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb)) + return false; +} + + /* Check for other defs. Any other def in the loop might reach a use + currently reached by the def in insn. */ + for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG (def)) +{ + basic_block def_bb = DF_REF_BB (def); + + /* Defs in exit block cannot reach a use they weren't already. */ + if (single_succ_p (def_bb)) + { + basic_block def_bb_succ; + + def_bb_succ = single_succ (def_bb); + if (!flow_bb_inside_loop_p (loop, def_bb_succ)) + continue; + } + + if (++defs_in_loop_count > 1) + return false; +} + + return true; +} + /* Move invariant INVNO out of the LOOP. Returns true if this succeeds, false otherwise. */ @@ -1544,11 +1617,8 @@ move_invariant_reg (struct loop *loop, unsigned invno) } } - /* Move the set out of the loop. If the set is always executed (we could
RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Tuesday, May 12, 2015 4:17 AM > > On 05/06/2015 03:47 AM, Thomas Preud'homme wrote: > > Ping? > Something to consider as future work -- I'm pretty sure PRE sets up the > same kind of problematical pattern with a new pseudo (reaching reg) > holding the result of the redundant expression and the original > evaluations turned into copies from the reaching reg to the final > destination. Yes absolutely, this is how the pattern I was interested in was created. The reason I solved it in loop-invariant is that I thought this was done on purpose, with the cleanup left to loop-invariant. When I found a TODO comment about this in loop-invariant, I thought it confirmed my initial thoughts. > > That style is easy to prove correct. There was an issue with the copies > not propagating away that was pretty inherent in the partial redundancy > cases that I could probably dig out of my archives if you're interested. If you think this should also (or instead) be fixed in PRE I can take a look at some point later since it shouldn't be much more work. > It looks like there's a variety of line wrapping issues. Please > double-check line wrapping using an 80 column window. Minor I know, > but > the consistency with the rest of the code is good. Looking in vim, lines seem to be systematically cut at 80 columns, and check_GNU_style.sh only complains about the dg-final line in the new testcases. Could you point me to such an occurrence? > > >> > >> + > >> + /* Check that all uses reached by the def in insn would still be > reached > >> + it. */ > >> + dest_regno = REGNO (reg); > >> + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = > >> DF_REF_NEXT_REG (use)) > [ ... ] > So isn't this overly conservative if DEST_REGNO is set multiple times > since it's going to look at all the uses, even those not necessarily > reached by the original SET of DEST_REGNO? > > Or is that not an issue for some reason? And I'm not requiring you to > make this optimal, but if I'm right, a comment here seems wise. My apologies, it is the comment that is incorrect since it doesn't match the code (a leftover from an old version of this patch). The code actually checks that the use was dominated by the instruction before it is moved out of the loop. This is to prevent the code motion in cases like: foo = 1; bar = 0; for () { bar += foo; foo = 42; } which I met in some of the testsuite cases. > > I think with the wrapping nits fixed and closure on the multi-set issue > noted immediately above and this will be good for the trunk. I'll fix this comment right away. Best regards, Thomas
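For readability, here is the foo/bar pattern mentioned in the message above written out as a self-contained C function. The wrapper function, the declarations and the loop bound are additions for illustration, and whether foo actually lives in a pseudo register at the RTL level depends on target and options; the point is only that hoisting foo = 42 above the loop would change the value added to bar on the first iteration, which is exactly what the dominance check prevents.

int foo, bar;

void
f (int n)
{
  int i;

  foo = 1;
  bar = 0;
  for (i = 0; i < n; i++)
    {
      bar += foo;   /* this use of foo is not dominated by ...            */
      foo = 42;     /* ... this set, so the set must stay inside the loop */
    }
}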
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > Based on my understanding of your answer quoted above, I'll commit > it as is, despite not having been able to come up with a testcase. I'll > wait tomorrow to do so though in case you changed your mind about it. Committed. Best regards, Thomas
[PATCH, ARM] Fix testcase for PR64616
Hi, The testcase made for PR64616 was only passing when using a literal pool. Rather than having an alternative for systems where this is not true, this patch changes the test to check that a global copy propagation occurs in cprop2. This should work across all ARM targets (it works when targeting Cortex-M0, Cortex-M3 and whatever the default core is for ARMv7-a with vfpv3-d16 FPU). ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2015-05-04 Thomas Preud'homme * gcc.target/arm/pr64616.c: Test dump rather than assembly to work across ARM targets. diff --git a/gcc/testsuite/gcc.target/arm/pr64616.c b/gcc/testsuite/gcc.target/arm/pr64616.c index c686ffa..2280f21 100644 --- a/gcc/testsuite/gcc.target/arm/pr64616.c +++ b/gcc/testsuite/gcc.target/arm/pr64616.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2" } */ +/* { dg-options "-O2 -fdump-rtl-cprop2" } */ int f (int); unsigned int glob; @@ -11,4 +11,5 @@ g () glob = 5; } -/* { dg-final { scan-assembler-times "ldr" 2 } } */ +/* { dg-final { scan-rtl-dump "GLOBAL COPY-PROP" "cprop2" } } */ +/* { dg-final { cleanup-rtl-dump "cprop2" } } */ Patch was tested by verifying that the pattern appears when targeting Cortex-M0, Cortex-M3 and the default core for ARMv7-a with vfpv3-d16 FPU. Best regards, Thomas
RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant
Ping? Best regards, Thomas > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Monday, March 16, 2015 8:39 PM > To: 'Steven Bosscher' > Cc: GCC Patches; Eric Botcazou > Subject: RE: [PATCH, stage1] Move insns without introducing new > temporaries in loop2_invariant > > > From: Steven Bosscher [mailto:stevenb@gmail.com] > > Sent: Monday, March 09, 2015 7:48 PM > > To: Thomas Preud'homme > > Cc: GCC Patches; Eric Botcazou > > Subject: Re: [PATCH, stage1] Move insns without introducing new > > temporaries in loop2_invariant > > New patch below. > > > > > It looks like this would run for all candidate loop invariants, right? > > > > If so, you're creating run time of O(n_invariants*n_bbs_in_loop), a > > potential compile time hog for large loops. > > > > But why compute this at all? Perhaps I'm missing something, but you > > already have inv->always_executed available, no? > > Indeed. I didn't realize the information was already there. > > > > > > > > + basic_block use_bb; > > > + > > > + ref = DF_REF_INSN (use); > > > + use_bb = BLOCK_FOR_INSN (ref); > > > > You can use DF_REF_BB. > > Since I need use_insn here I kept BLOCK_FOR_INSN but I used > DF_REF_BB for the def below. > > > So here are the new ChangeLog entries: > > *** gcc/ChangeLog *** > > 2015-03-11 Thomas Preud'homme > > * loop-invariant.c (can_move_invariant_reg): New. > (move_invariant_reg): Call above new function to decide whether > instruction can just be moved, skipping creation of temporary > register. > > *** gcc/testsuite/ChangeLog *** > > 2015-03-12 Thomas Preud'homme > > * gcc.dg/loop-8.c: New test. > * gcc.dg/loop-9.c: New test. > > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > index f79b497..8217d62 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1512,6 +1512,79 @@ replace_uses (struct invariant *inv, rtx reg, > bool in_group) >return 1; > } > > And the new patch: > > +/* Whether invariant INV setting REG can be moved out of LOOP, at the > end of > + the block preceding its header. */ > + > +static bool > +can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx > reg) > +{ > + df_ref def, use; > + unsigned int dest_regno, defs_in_loop_count = 0; > + rtx_insn *insn = inv->insn; > + basic_block bb = BLOCK_FOR_INSN (inv->insn); > + > + /* We ignore hard register and memory access for cost and complexity > reasons. > + Hard register are few at this stage and expensive to consider as they > + require building a separate data flow. Memory access would require > using > + df_simulate_* and can_move_insns_across functions and is more > complex. */ > + if (!REG_P (reg) || HARD_REGISTER_P (reg)) > +return false; > + > + /* Check whether the set is always executed. We could omit this > condition if > + we know that the register is unused outside of the loop, but it does > not > + seem worth finding out. */ > + if (!inv->always_executed) > +return false; > + > + /* Check that all uses reached by the def in insn would still be reached > + it. */ > + dest_regno = REGNO (reg); > + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = > DF_REF_NEXT_REG (use)) > +{ > + rtx_insn *use_insn; > + basic_block use_bb; > + > + use_insn = DF_REF_INSN (use); > + use_bb = BLOCK_FOR_INSN (use_insn); > + > + /* Ignore instruction considered for moving. */ > + if (use_insn == insn) > + continue; > + > + /* Don't consider uses outside loop. 
*/ > + if (!flow_bb_inside_loop_p (loop, use_bb)) > + continue; > + > + /* Don't move if a use is not dominated by def in insn. */ > + if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID > (use_insn)) > + return false; > + if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb)) > + return false; > +} > + > + /* Check for other defs. Any other def in the loop might reach a use > + currently reached by the def in insn. */ > + for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = > DF_REF_NEXT_REG (def)) > +{ > + basic_block def_bb = DF_REF_BB (def); > + > + /* Defs in exit block cannot reach a use they weren't already. */ > + if (single_succ_p (def_bb)) > +
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Tuesday, April 28, 2015 12:27 AM > OK. No need for heroics -- give it a shot, but don't burn an insane > amount of time on it. If we can't get to a reasonable testcase, then so > be it. Ok, I tried but really didn't manage to create a testcase. I did, however, understand the conditions under which this patch is helpful. In the function reg_nonzero_bits_for_combine () in combine.c there is a test to check if last_set_nonzero_bits for a given register is still valid. In the case I'm considering, the test evaluates to false because: (i) the register rX whose nonzero bits are being evaluated was set in an earlier basic block than the one with the instruction using rX (hence rsp->last_set_label < label_tick) (ii) the predecessor of the basic block for that same insn is not the previous basic block analyzed by combine_instructions (hence label_tick_ebb_start == label_tick) (iii) the register rX is set multiple times (hence REG_N_SETS (REGNO (x)) != 1) Yet, the block being processed is dominated by the SET for rX so there is a REG_EQUAL available to narrow down the set of nonzero bits. Based on my understanding of your answer quoted above, I'll commit it as is, despite not having been able to come up with a testcase. I'll wait until tomorrow to do so, though, in case you changed your mind about it. Best regards, Thomas
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Tuesday, April 28, 2015 12:27 AM > To: Thomas Preud'homme; 'Eric Botcazou' > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, combine] Try REG_EQUAL for nonzero_bits > > On 04/27/2015 04:26 AM, Thomas Preud'homme wrote: > >> From: Jeff Law [mailto:l...@redhat.com] > >> Sent: Saturday, April 25, 2015 3:00 AM > >> Do you have a testcase where this change can result in better > generated > >> code. If so please add that testcase. It's OK if it's ARM specific. > > > > Hi Jeff, > > > > Last time I tried I couldn't reduce the code to a small testcase but if I > remember > > well it was mostly due to the problem of finding a good test for creduce > > (zero extension is not unique enough). I'll try again with a more manual > approach > > and get back to you. > OK. No need for heroics -- give it a shot, but don't burn an insane > amount of time on it. If we can't get to a reasonable testcase, then so > be it. Sadly I couldn't get a testcase. I get almost the same sequence of instructions as the program we found the problem in, but couldn't get exactly the same one. In all the cases I constructed, the nonzero_bits info we already had was enough for combine to do its job. I couldn't find what causes this information to be inaccurate. I will try to investigate a bit further on Monday as another pass might not be doing its job properly. Or maybe there's something that prevents the information from being propagated. Best regards, Thomas
RE: [PATCH 2/3, ARM, libgcc, ping6] Code size optimization for the fmul/fdiv and dmul/ddiv function in libgcc
Here is an updated patch that prefix local symbols with __ for more safety. They appear in the symtab as local so it is not strictly necessary but one is never too cautious. Being local, they also do not generate any PLT entry. They appear only because the jumps are from one section to another (which is the whole purpose of this patch) and thus need a static relocation. I hope this revised version address all your concerns. ChangeLog entry is unchanged: *** gcc/libgcc/ChangeLog *** 2015-04-30 Tony Wang * config/arm/ieee754-sf.S: Expose symbols around fragment boundaries as function symbols. * config/arm/ieee754-df.S: Same with above diff --git a/libgcc/config/arm/ieee754-df.S b/libgcc/config/arm/ieee754-df.S index c1468dc..39b0028 100644 --- a/libgcc/config/arm/ieee754-df.S +++ b/libgcc/config/arm/ieee754-df.S @@ -559,7 +559,7 @@ ARM_FUNC_ALIAS aeabi_l2d floatdidf #ifdef L_arm_muldivdf3 -ARM_FUNC_START muldf3 +ARM_FUNC_START muldf3, function_section ARM_FUNC_ALIAS aeabi_dmul muldf3 do_push {r4, r5, r6, lr} @@ -571,7 +571,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3 COND(and,s,ne) r5, ip, yh, lsr #20 teqne r4, ip teqne r5, ip - bleqLSYM(Lml_s) + bleq__Lml_s @ Add exponents together add r4, r4, r5 @@ -689,7 +689,7 @@ ARM_FUNC_ALIAS aeabi_dmul muldf3 subsip, r4, #(254 - 1) do_it hi cmphi ip, #0x700 - bhi LSYM(Lml_u) + bhi __Lml_u @ Round the result, merge final exponent. cmp lr, #0x8000 @@ -716,9 +716,12 @@ LSYM(Lml_1): mov lr, #0 subsr4, r4, #1 -LSYM(Lml_u): + FUNC_END aeabi_dmul + FUNC_END muldf3 + +ARM_SYM_START __Lml_u @ Overflow? - bgt LSYM(Lml_o) + bgt __Lml_o @ Check if denormalized result is possible, otherwise return signed 0. cmn r4, #(53 + 1) @@ -778,10 +781,11 @@ LSYM(Lml_u): do_it eq biceq xl, xl, r3, lsr #31 RETLDM "r4, r5, r6" + SYM_END __Lml_u @ One or both arguments are denormalized. @ Scale them leftwards and preserve sign bit. -LSYM(Lml_d): +ARM_SYM_START __Lml_d teq r4, #0 bne 2f and r6, xh, #0x8000 @@ -804,8 +808,9 @@ LSYM(Lml_d): beq 3b orr yh, yh, r6 RET + SYM_END __Lml_d -LSYM(Lml_s): +ARM_SYM_START __Lml_s @ Isolate the INF and NAN cases away teq r4, ip and r5, ip, yh, lsr #20 @@ -817,10 +822,11 @@ LSYM(Lml_s): orrsr6, xl, xh, lsl #1 do_it ne COND(orr,s,ne) r6, yl, yh, lsl #1 - bne LSYM(Lml_d) + bne __Lml_d + SYM_END __Lml_s @ Result is 0, but determine sign anyway. -LSYM(Lml_z): +ARM_SYM_START __Lml_z eor xh, xh, yh and xh, xh, #0x8000 mov xl, #0 @@ -832,41 +838,42 @@ LSYM(Lml_z): moveq xl, yl moveq xh, yh COND(orr,s,ne) r6, yl, yh, lsl #1 - beq LSYM(Lml_n) @ 0 * INF or INF * 0 -> NAN + beq __Lml_n @ 0 * INF or INF * 0 -> NAN teq r4, ip bne 1f orrsr6, xl, xh, lsl #12 - bne LSYM(Lml_n) @ NAN * -> NAN + bne __Lml_n @ NAN * -> NAN 1: teq r5, ip - bne LSYM(Lml_i) + bne __Lml_i orrsr6, yl, yh, lsl #12 do_it ne, t movne xl, yl movne xh, yh - bne LSYM(Lml_n) @ * NAN -> NAN + bne __Lml_n @ * NAN -> NAN + SYM_END __Lml_z @ Result is INF, but we need to determine its sign. -LSYM(Lml_i): +ARM_SYM_START __Lml_i eor xh, xh, yh + SYM_END __Lml_i @ Overflow: return INF (sign already in xh). -LSYM(Lml_o): +ARM_SYM_START __Lml_o and xh, xh, #0x8000 orr xh, xh, #0x7f00 orr xh, xh, #0x00f0 mov xl, #0 RETLDM "r4, r5, r6" + SYM_END __Lml_o @ Return a quiet NAN. 
-LSYM(Lml_n): +ARM_SYM_START __Lml_n orr xh, xh, #0x7f00 orr xh, xh, #0x00f8 RETLDM "r4, r5, r6" + SYM_END __Lml_n - FUNC_END aeabi_dmul - FUNC_END muldf3 - -ARM_FUNC_START divdf3 +ARM_FUNC_START divdf3 function_section ARM_FUNC_ALIAS aeabi_ddiv divdf3 do_push {r4, r5, r6, lr} @@ -985,7 +992,7 @@ ARM_FUNC_ALIAS aeabi_ddiv divdf3 subsip, r4, #(254 - 1) do_it hi cmphi ip, #0x700 - bhi LSYM(Lml_u) + bhi __Lml_u @ Round the result, merge final exponent. subsip, r5, yh @@ -1009,13 +1016,13 @@ LSYM(Ldv_1): orr xh, xh, #0x0010 mov lr, #0 subsr4, r4, #1 - b LSYM(Lml_u) + b __Lml_u @ Result mightt need to be denormaliz
RE: [PATCH, Aarch64] Add FMA steering pass for Cortex-A57
> From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] > Sent: Thursday, February 05, 2015 5:17 PM > > > > *** gcc/ChangeLog *** > > > > 2015-01-26 Thomas Preud'homme thomas.preudho...@arm.com > > > > * config.gcc: Add cortex-a57-fma-steering.o to extra_objs for > > aarch64-*-*. > > * config/aarch64/t-aarch64: Add a rule for cortex-a57-fma-steering.o. > > * config/aarch64/aarch64.h > (AARCH64_FL_USE_FMA_STEERING_PASS): Define. > > (AARCH64_TUNE_FMA_STEERING): Likewise. > > * config/aarch64/aarch64-cores.def: Set > > AARCH64_FL_USE_FMA_STEERING_PASS for cores with dynamic > steering of > > FMUL/FMADD instructions. > > * config/aarch64/aarch64.c (aarch64_register_fma_steering): Declare. > > (aarch64_override_options): Include cortex-a57-fma-steering.h. Call > > aarch64_register_fma_steering () if > AARCH64_TUNE_FMA_STEERING is true. > > * config/aarch64/cortex-a57-fma-steering.h: New file. > > * config/aarch64/cortex-a57-fma-steering.c: Likewise. > > OK but wait for stage-1 to open for general development before you > commit it please. Done after rebasing it (context line change in aarch64.c due to new header include and adaptation to new signature of AARCH64_CORE macro in aarch64-cores.def). Committed patch below: diff --git a/gcc/config.gcc b/gcc/config.gcc index a1df043..9fec1e8 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -302,7 +302,7 @@ m32c*-*-*) aarch64*-*-*) cpu_type=aarch64 extra_headers="arm_neon.h arm_acle.h" - extra_objs="aarch64-builtins.o aarch-common.o" + extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o" target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c" target_has_targetm_common=yes ;; diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 7c285ba..dfc9cc8 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -40,7 +40,7 @@ /* V8 Architecture Processors. */ AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03") -AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07") +AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, cortexa57, "0x41", "0xd07") AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08") AARCH64_CORE("exynos-m1", exynosm1, cortexa57, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x53", "0x001") AARCH64_CORE("thunderx",thunderx, thunderx, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1") @@ -48,5 +48,5 @@ AARCH64_CORE("xgene1", xgene1,xgene1,8, AARCH64_FL_FOR_ARCH8, xgen /* V8 big.LITTLE implementations. 
*/ -AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03") +AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, cortexa57, "0x41", "0xd07.0xd03") AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd08.0xd03") diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 1f7187b..3fd1b3f 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -200,6 +200,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_FL_CRYPTO (1 << 2) /* Has crypto. */ #define AARCH64_FL_SLOWMUL(1 << 3) /* A slow multiply core. */ #define AARCH64_FL_CRC(1 << 4) /* Has CRC. */ +/* Has static dispatch of FMA. */ +#define AARCH64_FL_USE_FMA_STEERING_PASS (1 << 5) /* Has FP and SIMD. */ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -220,6 +222,8 @@ extern unsigned long aarch64_isa_flags; /* Macros to test tuning flags. */ extern unsigned long aarch64_tune_flags; #define AARCH64_TUNE_SLOWMUL (aarch64_tune_flags & AARCH64_FL_SLOWMUL) +#define AARCH64_TUNE_FMA_STEERING \ + (aarch64_tune_flags & AARCH64_FL_USE_FMA_STEERING_PASS) /* Crypto is an optional extension to AdvSIMD. */ #define TARGET_CRYPTO (TARGET_SIMD &
RE: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Saturday, April 25, 2015 2:57 AM > > +static rtx > > +sign_extend_short_imm (rtx src, machine_mode mode, unsigned int > prec) > > +{ > > + if (GET_MODE_PRECISION (mode) < prec && CONST_INT_P (src) > > + && INTVAL (src) > 0 && val_signbit_known_set_p (mode, INTVAL > (src))) > > +src = GEN_INT (INTVAL (src) | ~GET_MODE_MASK (mode)); > Can you go ahead and put each condition of the && on a separate line. > It uses more vertical space, but IMHO makes this easier to read. As I > said, it was a nit :-) You're perfectly right. Anything that can improve readability of source code is a good thing. > > OK with that fix. Committed. Best regards, Thomas
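As a small worked example of the transformation quoted above (the 8-bit mode width and the constant 0xc0 are hypothetical values; GEN_INT and INTVAL wrap rtx constants in GCC, plain integers stand in for them here), ORing in the complement of the mode mask sign-extends the short immediate to the full width of the constant:

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  int64_t src = 0xc0;          /* sign bit of the 8-bit mode is set */
  uint64_t mode_mask = 0xff;   /* mode mask of an 8-bit mode        */

  /* The operation from the snippet above: src | ~mode_mask.  */
  int64_t extended = src | ~(int64_t) mode_mask;

  printf ("%#llx\n", (unsigned long long) extended);  /* 0xffffffffffffffc0 */
  return 0;
}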
RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Saturday, April 25, 2015 3:00 AM > Do you have a testcase where this change can result in better generated > code. If so please add that testcase. It's OK if it's ARM specific. Hi Jeff, Last time I tried I couldn't reduce the code to a small testcase but if I remember well it was mostly due to the problem of finding a good test for creduce (zero extension is not unique enough). I'll try again with a more manual approach and get back to you. Best regards, Thomas
RE: [PATCH 1/2, combine] Try REG_EQUAL for nonzero_bits
Hi, first of all, sorry for the delay. We quickly entered stage 4 and I thought it was best to wait for stage 1 to update you on this. > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > Of course both approaches are not exclusive. I'll try to test with *both* > rs6000 bootstrap and with a cross-compiler for one of these targets. I did two experiments where I checked the impact of removing the code guarded by SHORT_IMMEDIATES_SIGN_EXTEND. In the first one I removed the code in both rtlanal.c and combine.c. In the second, I only removed the code from combine.c (in both occurrences). In both cases powerpc bootstrap succeeded. I then proceeded to use these 2 resulting compilers to compile the same gcc source (actually the source from removing all code guarded by the macro). I compared the output of objdump on the resulting g++ and found that in both cases the output was different from the one without any modification. Both diffs look like: Disassembly of section .init: @@ -1359,7 +1359,7 @@ Disassembly of section .text: 10003a94: f8 21 ff 81 stdu r1,-128(r1) 10003a98: eb e4 00 00 ld r31,0(r4) 10003a9c: 3c 82 ff f8 addis r4,r2,-8 -10003aa0: 38 84 d7 60 addi r4,r4,-10400 +10003aa0: 38 84 d7 70 addi r4,r4,-10384 10003aa4: 7f e3 fb 78 mr r3,r31 10003aa8: 4b ff f0 d9 bl 10002b80 <003d.plt_call.strcmp@@GLIBC_2.3+0> 10003aac: e8 41 00 28 ld r2,40(r1) @@ -1371,7 +1371,7 @@ Disassembly of section .text: 10003ac4: 79 2a ff e3 rldicl. r10,r9,63,63 10003ac8: 41 82 00 78 beq- 10003b40 <._ZL22sanitize_spec_functioniPPKc+0xc0> 10003acc: 3c 62 ff f8 addis r3,r2,-8 -10003ad0: 38 63 f5 70 addi r3,r3,-2704 +10003ad0: 38 63 f5 b0 addi r3,r3,-2640 10003ad4: 38 21 00 80 addi r1,r1,128 10003ad8: e8 01 00 10 ld r0,16(r1) 10003adc: eb e1 ff f8 ld r31,-8(r1) (This one compares the g++ compiled by the GCC with partial removal of the code guarded by the macro against the g++ compiled by an unmodified GCC.) I may have made a mistake when doing the experiment though and can do it again if you wish. Best regards, Thomas
[PATCH, ARM, regression] Fix ternary operator in arm/unknown-elf.h
I just committed the obvious fix below that fix build failure introduced by revision 222371. *** gcc/ChangeLog *** 2015-04-24 Thomas Preud'homme * config/arm/unknown-elf.h (ASM_OUTPUT_ALIGNED_DECL_LOCAL): fix ternary operator in fprintf and harmonize spacing. diff --git a/gcc/config/arm/unknown-elf.h b/gcc/config/arm/unknown-elf.h index df0b9ce..2e5ab7e 100644 --- a/gcc/config/arm/unknown-elf.h +++ b/gcc/config/arm/unknown-elf.h @@ -80,9 +80,9 @@ \ ASM_OUTPUT_ALIGN (FILE, floor_log2 (ALIGN / BITS_PER_UNIT)); \ ASM_OUTPUT_LABEL (FILE, NAME); \ - fprintf (FILE, "\t.space\t%d\n", SIZE ? (int)(SIZE) : 1); \ + fprintf (FILE, "\t.space\t%d\n", SIZE ? (int) SIZE : 1); \ fprintf (FILE, "\t.size\t%s, %d\n", \ - NAME, SIZE ? (int) SIZE, 1); \ + NAME, SIZE ? (int) SIZE : 1);\ } \ while (0) Best regards, Thomas
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Friday, April 24, 2015 11:15 AM > > So revised review is "ok for the trunk" :-) Committed. Best regards, Thomas
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Friday, April 24, 2015 10:59 AM > Hi Jeff, > > + > > +static bool > > +cprop_reg_p (const_rtx x) > > +{ > > + return REG_P (x) && !HARD_REGISTER_P (x); > > +} > How about instead this move to a more visible location (perhaps a macro > in regs.h or an inline function). Then as a followup, change the > various places that have this sequence to use that common definition > that exist outside of cprop.c. According to Steven this was proposed in the past but was refused (see end of [1]). [1] https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01066.html > > > @@ -1191,7 +1192,7 @@ do_local_cprop (rtx x, rtx_insn *insn) > > /* Rule out USE instructions and ASM statements as we don't want > to > >change the hard registers mentioned. */ > > if (REG_P (x) > > - && (REGNO (x) >= FIRST_PSEUDO_REGISTER > > + && (cprop_reg_p (x) > > || (GET_CODE (PATTERN (insn)) != USE > > && asm_noperands (PATTERN (insn)) < 0))) > Isn't the REG_P test now redundant? I made the same mistake when reviewing that change and indeed it's not. Note the opening parenthesis before cprop_reg_p that contains a bitwise OR expression. So in the case where cprop_reg_p is false, REG_P still needs to be true. We could keep a check on FIRST_PSEUDO_REGISTER but the intent (checking that the register is suitable for propagation) is clearer now, as pointed out by Steven to me. > > OK for the trunk with those changes. > > jeff Given the above I intent to keep the REG_P in the second excerpt and will wait for your input about moving cprop_reg_p to rtl.h Best regards, Thomas
RE: [PATCH, ping1] Fix removing of df problem in df_finish_pass
Committed. I'll wait a week and then ask for approval for a backport to 5.1.1 once 5.1 is released. Best regards, Thomas > -Original Message- > From: Kenneth Zadeck [mailto:zad...@naturalbridge.com] > Sent: Monday, April 20, 2015 9:26 PM > To: Thomas Preud'homme; 'Bernhard Reutner-Fischer'; gcc- > patc...@gcc.gnu.org; 'Paolo Bonzini'; 'Seongbae Park' > Subject: Re: [PATCH, ping1] Fix removing of df problem in df_finish_pass > > As a dataflow maintainer, I approve this patch for the next release. > However, you will have to get approval of a release manager to get it > into 5.0. > > > > On 04/20/2015 04:22 AM, Thomas Preud'homme wrote: > > Ping? > > > >> -Original Message----- > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > >> Sent: Tuesday, March 03, 2015 12:02 PM > >> To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo > Bonzini'; > >> 'Seongbae Park'; 'Kenneth Zadeck' > >> Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass > >> > >>> From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com] > >>> Sent: Saturday, February 28, 2015 4:00 AM > >>>>use df_remove_problem rather than manually removing > problems, > >>> living > >>> > >>> leaving > >> Indeed. Please find updated changelog below: > >> > >> 2015-03-03 Thomas Preud'homme > > >> > >>* df-core.c (df_finish_pass): Iterate over df- > >>> problems_by_index[] and > >>use df_remove_problem rather than manually removing > >> problems, leaving > >>holes in df->problems_in_order[]. > >> > >> Best regards, > >> > >> Thomas > >> > >> > >> > >> > > > >
RE: [PATCH, ping1] Fix removing of df problem in df_finish_pass
Ping? > -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > Sent: Tuesday, March 03, 2015 12:02 PM > To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo Bonzini'; > 'Seongbae Park'; 'Kenneth Zadeck' > Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass > > > From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com] > > Sent: Saturday, February 28, 2015 4:00 AM > > > use df_remove_problem rather than manually removing problems, > > living > > > > leaving > > Indeed. Please find updated changelog below: > > 2015-03-03 Thomas Preud'homme > > * df-core.c (df_finish_pass): Iterate over df- > >problems_by_index[] and > use df_remove_problem rather than manually removing > problems, leaving > holes in df->problems_in_order[]. > > Best regards, > > Thomas > > > >
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Monday, April 13, 2015 8:48 PM > > I know there were several followups between Steven and yourself. > With > stage1 now open, can you post a final version and do a final > bootstrap/test with it? Here is what came out of our discussion with Steven: The RTL cprop pass in GCC operates by doing a local constant/copy propagation first and then a global one. In the local one, if a constant cannot be propagated (eg. due to constraints of the destination instruction) a copy propagation is done instead. However, at the global level copy propagation is only tried if no constant can be propagated, ie. if a constant can be propagated but the constraints of the destination instruction forbids it, no copy propagation will be tried. This patch fixes this issue. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2015-04-15 Thomas Preud'homme Steven Bosscher * cprop.c (cprop_reg_p): New. (hash_scan_set): Use above function to check if register can be propagated. (find_avail_set): Return up to two sets, one whose source is a register and one whose source is a constant. Sets are returned in an array passed as parameter rather than as a return value. (cprop_insn): Use a do while loop rather than a goto. Try each of the sets returned by find_avail_set, starting with the one whose source is a constant. Use cprop_reg_p to check if register can be propagated. (do_local_cprop): Use cprop_reg_p to check if register can be propagated. (implicit_set_cond_p): Likewise. *** gcc/testsuite/ChangeLog *** 2015-04-15 Thomas Preud'homme Steven Bosscher * gcc.target/arm/pr64616.c: New file. And the patch is: diff --git a/gcc/cprop.c b/gcc/cprop.c index c9fb2fc..78541cf 100644 --- a/gcc/cprop.c +++ b/gcc/cprop.c @@ -285,6 +285,15 @@ cprop_constant_p (const_rtx x) return CONSTANT_P (x) && (GET_CODE (x) != CONST || shared_const_p (x)); } +/* Determine whether the rtx X should be treated as a register that can + be propagated. Any pseudo-register is fine. */ + +static bool +cprop_reg_p (const_rtx x) +{ + return REG_P (x) && !HARD_REGISTER_P (x); +} + /* Scan SET present in INSN and add an entry to the hash TABLE. IMPLICIT is true if it's an implicit set, false otherwise. */ @@ -295,8 +304,7 @@ hash_scan_set (rtx set, rtx_insn *insn, struct hash_table_d *table, rtx src = SET_SRC (set); rtx dest = SET_DEST (set); - if (REG_P (dest) - && ! HARD_REGISTER_P (dest) + if (cprop_reg_p (dest) && reg_available_p (dest, insn) && can_copy_p (GET_MODE (dest))) { @@ -321,9 +329,8 @@ hash_scan_set (rtx set, rtx_insn *insn, struct hash_table_d *table, src = XEXP (note, 0), set = gen_rtx_SET (VOIDmode, dest, src); /* Record sets for constant/copy propagation. */ - if ((REG_P (src) + if ((cprop_reg_p (src) && src != dest - && ! HARD_REGISTER_P (src) && reg_available_p (src, insn)) || cprop_constant_p (src)) insert_set_in_table (dest, src, insn, table, implicit); @@ -821,15 +828,15 @@ try_replace_reg (rtx from, rtx to, rtx_insn *insn) return success; } -/* Find a set of REGNOs that are available on entry to INSN's block. Return - NULL no such set is found. */ +/* Find a set of REGNOs that are available on entry to INSN's block. If found, + SET_RET[0] will be assigned a set with a register source and SET_RET[1] a + set with a constant source. If not found the corresponding entry is set to + NULL. 
*/ -static struct cprop_expr * -find_avail_set (int regno, rtx_insn *insn) +static void +find_avail_set (int regno, rtx_insn *insn, struct cprop_expr *set_ret[2]) { - /* SET1 contains the last set found that can be returned to the caller for - use in a substitution. */ - struct cprop_expr *set1 = 0; + set_ret[0] = set_ret[1] = NULL; /* Loops are not possible here. To get a loop we would need two sets available at the start of the block containing INSN. i.e. we would @@ -869,8 +876,10 @@ find_avail_set (int regno, rtx_insn *insn) If the source operand changed, we may still use it for the next iteration of this loop, but we may not use it for substitutions. */ - if (cprop_constant_p (src) || reg_not_set_p (src, insn)) - set1 = set; + if (cprop_constant_p (src)) + set_ret[1] = set; + else if (reg_not_set_p (src, insn)) + set_ret[0] = set; /* If the source of the set is anything except a register, then we have reached the end of the copy chain. */ @@ -881,10 +890,6 @@ find_avail_set (int regno, rtx_insn *insn) and see if we have an available copy into SRC. */ regno = REGNO (
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: Jeff Law [mailto:l...@redhat.com] > Sent: Monday, April 13, 2015 8:48 PM > Thomas, > > I know there were several followups between Steven and yourself. > With > stage1 now open, can you post a final version and do a final > bootstrap/test with it? Sure, I'm testing it right now. Sorry for not doing it earlier, I wasn't sure what constitutes "too much disruption" as per the GCC 6.0 Status Report email. Best regards, Thomas
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > FYI testing your patch with the one cprop_reg_p negated as said in my > previous email shows no regression on arm-none-eabi cross-compiler > targeting Cortex-M3. Testing for x86_64 is ongoing. Sorry, I forgot to report back on this. No regressions on x86_64-linux-gnu either. Do you want me to respin the patch (adding the testcase from the patch I sent, fixing the indentation and adding a ChangeLog)? Best regards, Thomas
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: Steven Bosscher [mailto:stevenb@gmail.com] > Sent: Friday, March 20, 2015 8:14 PM > > I put the cprop_reg_p check there instead of !HARD_REGISTER_P > because > I like to be able to quickly find all places where a similar check is > performed. The check is whether the reg is something that copy > propagation can handle, and that is what I added cprop_reg_p for. Makes sense indeed. I didn't think about the meaning of it. > (Note that cprop can _currently_ handle only pseudos but there is no > reason why a limited set of hard regs can't be handled also, e.g. the > flag registers like in targetm.fixed_condition_code_regs). > > In this case, the result is that REG_P is checked twice. > But then again, cprop_reg_p will be inlined and the double check > optimized away. True. > > Anyway, I guess we've bikeshedded long enough over this patch as it is > :-) Let's post a final form and declare it OK for stage1. What about the cprop_reg_p that needs to be negated? Did I miss something that makes it ok? > > As for PSEUDO_REG_P: If it were up to me, I'd like to have in rtl.h: > > static bool > hard_register_p (rtx x) > { > return (REG_P (x) && HARD_REGISTER_NUM_P (REGNO (x))); > } > > static bool > pseudo_register_p (rtx x) > { > return (REG_P (x) && !HARD_REGISTER_NUM_P (REGNO (x))); > } > > and do away with all the FIRST_PSEUDO_REGISTER tests. But I've > proposed this in the past and there was opposition. Perhaps when we > introduce a rtx_reg class... Ok I'll try to dig up what was the reasons presented. Anyway, it would be done in a separate patch so not a problem for this one. FYI testing your patch with the one cprop_reg_p negated as said in my previous email shows no regression on arm-none-eabi cross-compiler targeting Cortex-M3. Testing for x86_64 is ongoing. Best regards, Thomas
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme > > I noticed in do_local_cprop you replace >= FIRST_PSEUDO_REGISTER by > cprop_reg_p without removing the REG_P as well. Sorry, I missed the parenthesis. REG_P does indeed need to be kept. I'd be tempted to use !HARD_REGISTER_P instead since REG_P is already checked but I don't mind either way. Best regards, Thomas
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
Hi Steven, > From: Steven Bosscher [mailto:stevenb@gmail.com] > Sent: Friday, March 20, 2015 3:54 PM > > > What I meant, is that I believe the tests are already done in > hash_scan_set and should be redundant in cprop_insn (i.e. the test can > be replaced with gcc_[checking_]assert). Ok. > > I've attached a patch with some changes to it: introduce cprop_reg_p() > to get rid of all the "REG_P && regno > FIRST_PSEUDO_REGISTER" tests. > I still have the cprop_constant_p and cprop_reg_p tests in cprop_insn > but this weekend I'll try with gcc_checking_asserts instead. Please > have a look at the patch and let me know if you like it (given it's > mostly yours I hope you do like it ;-) I think it would be preferable to introduce PSEUDO_REG_P in rtl.h as this seems like a common enough pattern [1]. It would be nice to have a HARD_REG_P that would cover the other common patterns REG_P && < FIRST_PSEUDO_REGISTER and REG_P && HARD_REGISTER_P but I can't come up with a good name (HARD_REGISTER_P is confusing because it doesn't check if it's a register in the first place). I noticed in do_local_cprop you replace >= FIRST_PSEUDO_REGISTER by cprop_reg_p without removing the REG_P as well. In implicit_set_cond_p there is a replacement of !REG_P || HARD_REGISTER_P by cprop_reg_p. It seems to me it should rather be replaced by !cprop_reg_p. Otherwise it looks ok. [1] grep -R "REG_P .*&&.*>= FIRST_PSEUDO_REGISTER" . | wc -l returns 23 > > Bootstrapped & tested on powerpc64-unknown-linux-gnu. In building all > of cc1, 35 extra copies are propagated with the patch. I'll try to launch a build and testsuite run with these 2 issues fixed before I leave tonight and will report the result on Monday. Best regards, Thomas
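For concreteness, here is a sketch of what the proposed rtl.h helper could look like, written in terms of the existing REG_P and HARD_REGISTER_P macros so that it matches both cprop_reg_p from the patch and the pseudo_register_p helper Steven proposed elsewhere in this thread; the name PSEUDO_REG_P is the one suggested above, and this is not an actual GCC definition:

/* Sketch only: true for any register rtx that is not a hard register.  */
#define PSEUDO_REG_P(X) (REG_P (X) && !HARD_REGISTER_P (X))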
RE: [PATCH, stage1] Move insns without introducing new temporaries in loop2_invariant
> From: Steven Bosscher [mailto:stevenb@gmail.com]
> Sent: Monday, March 09, 2015 7:48 PM
> To: Thomas Preud'homme
> Cc: GCC Patches; Eric Botcazou
> Subject: Re: [PATCH, stage1] Move insns without introducing new
> temporaries in loop2_invariant

New patch below.

> It looks like this would run for all candidate loop invariants, right?
>
> If so, you're creating run time of O(n_invariants*n_bbs_in_loop), a
> potential compile time hog for large loops.
>
> But why compute this at all? Perhaps I'm missing something, but you
> already have inv->always_executed available, no?

Indeed. I didn't realize the information was already there.

> > +      basic_block use_bb;
> > +
> > +      ref = DF_REF_INSN (use);
> > +      use_bb = BLOCK_FOR_INSN (ref);
>
> You can use DF_REF_BB.

Since I need use_insn here I kept BLOCK_FOR_INSN, but I used DF_REF_BB for the def below.

So here are the new ChangeLog entries:

*** gcc/ChangeLog ***

2015-03-11  Thomas Preud'homme

        * loop-invariant.c (can_move_invariant_reg): New.
        (move_invariant_reg): Call above new function to decide whether
        instruction can just be moved, skipping creation of temporary
        register.

*** gcc/testsuite/ChangeLog ***

2015-03-12  Thomas Preud'homme

        * gcc.dg/loop-8.c: New test.
        * gcc.dg/loop-9.c: New test.

And the new patch:

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index f79b497..8217d62 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1512,6 +1512,79 @@ replace_uses (struct invariant *inv, rtx reg, bool in_group)
   return 1;
 }
 
+/* Whether invariant INV setting REG can be moved out of LOOP, at the end of
+   the block preceding its header.  */
+
+static bool
+can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg)
+{
+  df_ref def, use;
+  unsigned int dest_regno, defs_in_loop_count = 0;
+  rtx_insn *insn = inv->insn;
+  basic_block bb = BLOCK_FOR_INSN (inv->insn);
+
+  /* We ignore hard registers and memory accesses for cost and complexity
+     reasons.  Hard registers are few at this stage and expensive to consider
+     as they require building a separate data flow.  Memory accesses would
+     require using df_simulate_* and can_move_insns_across functions and are
+     more complex.  */
+  if (!REG_P (reg) || HARD_REGISTER_P (reg))
+    return false;
+
+  /* Check whether the set is always executed.  We could omit this condition
+     if we know that the register is unused outside of the loop, but it does
+     not seem worth finding out.  */
+  if (!inv->always_executed)
+    return false;
+
+  /* Check that all uses that are reached by the def in insn would still be
+     reached by it.  */
+  dest_regno = REGNO (reg);
+  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
+    {
+      rtx_insn *use_insn;
+      basic_block use_bb;
+
+      use_insn = DF_REF_INSN (use);
+      use_bb = BLOCK_FOR_INSN (use_insn);
+
+      /* Ignore instruction considered for moving.  */
+      if (use_insn == insn)
+        continue;
+
+      /* Don't consider uses outside loop.  */
+      if (!flow_bb_inside_loop_p (loop, use_bb))
+        continue;
+
+      /* Don't move if a use is not dominated by def in insn.  */
+      if (use_bb == bb && DF_INSN_LUID (insn) >= DF_INSN_LUID (use_insn))
+        return false;
+      if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
+        return false;
+    }
+
+  /* Check for other defs.  Any other def in the loop might reach a use
+     currently reached by the def in insn.  */
+  for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG (def))
+    {
+      basic_block def_bb = DF_REF_BB (def);
+
+      /* Defs in an exit block cannot reach a use they weren't already
+         reaching.  */
+      if (single_succ_p (def_bb))
+        {
+          basic_block def_bb_succ;
+
+          def_bb_succ = single_succ (def_bb);
+          if (!flow_bb_inside_loop_p (loop, def_bb_succ))
+            continue;
+        }
+
+      if (++defs_in_loop_count > 1)
+        return false;
+    }
+
+  return true;
+}
+
 /* Move invariant INVNO out of the LOOP.  Returns true if this succeeds, false
    otherwise.  */
 
@@ -1545,11 +1618,8 @@ move_invariant_reg (struct loop *loop, unsigned invno)
 	}
     }
 
-  /* Move the set out of the loop.  If the set is always executed (we could
-     omit this condition if we know that the register is unused outside of
-     the loop, but it does not seem worth finding out) and it has no uses
-     that would not be dominated by it, we may just move it (TODO).
-     Otherwise we need to create a temporary register.  */
+  /* If possible, just move the set out of the loop.  Otherwise, we
+     need to create a temporary register.  */
   set = single_set (inv->insn);
   reg = dest = SET_DEST (set);
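For illustration, here is a hypothetical C-level sketch (not one of the new gcc.dg/loop-8.c or loop-9.c tests, whose contents are not reproduced here) of the situation the second loop above guards against: the pseudo holding c gets a second def inside the loop, so the invariant set cannot simply be moved to the preheader.

/* Hypothetical sketch only: whether loop2_invariant sees exactly this
   shape depends on earlier passes.  The first assignment to c is loop
   invariant, but c is assigned again inside the loop, so moving the
   first set before the loop could change which def reaches the use in
   the addition.  can_move_invariant_reg therefore returns false and
   the pass falls back to creating a temporary register.  */
int
f (int *a, int n, int flag)
{
  int i, c, s = 0;

  for (i = 0; i < n; i++)
    {
      c = 123;          /* invariant set, candidate for moving */
      if (flag)
        c = a[i];       /* second def of the same pseudo in the loop */
      s += c;
    }
  return s;
}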
RE: [PATCH, stage1] Make function names visible in -fdump-rtl-*-graph
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: Friday, March 13, 2015 5:02 PM
>
> > Is this ok for stage1? It's not a bug but it helps debuggability so is
> > this something we might consider backporting?
>
> It's ok now given you bootstrapped the change.

I did, plus the regression testsuite, on both arm-none-eabi and x86_64-linux-gnu. I forgot to mention it, sorry. I just committed it.

Best regards,

Thomas
[PATCH, stage1] Make function names visible in -fdump-rtl-*-graph
Hi,

The description is longer than the patch so you might want to skip directly to it.

The dot file generated by the -fdump-rtl-*-graph switches groups basic blocks of a given function together in a subgraph and uses the function name as its label. However, when generating an image (for instance an SVG with "dot -Tsvg") the label does not appear. This makes analyzing the resulting file more difficult than it should be.

The section "Subgraphs and clusters" of "The DOT language" document contains the following excerpt:

"The third role for subgraphs directly involves how the graph will be laid out by certain layout engines. If the name of the subgraph begins with cluster, Graphviz notes the subgraph as a special cluster subgraph. If supported, the layout engine will do the layout so that the nodes belonging to the cluster are drawn together, with the entire drawing of the cluster contained within a bounding rectangle. Note that, for good and bad, cluster subgraphs are not part of the DOT language, but solely a syntactic convention adhered to by certain of the layout engines."

Hence prepending "cluster_" to the subgraph id (not its label) improves the output image with many layout engines while making no difference for the other layout engines. The patch also makes the subgraph boundary visible with dashed lines and adds "()" to the label of the subgraph (so for a function f the label would be "f ()").

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2015-03-10  Thomas Preud'homme

        * graph.c (print_graph_cfg): Make function names visible and append
        parentheses to them.  Also make groups of basic blocks belonging to
        the same function visible.

diff --git a/gcc/graph.c b/gcc/graph.c
index a1eb24c..5fb0d78 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -292,9 +292,10 @@ print_graph_cfg (const char *base, struct function *fun)
   pretty_printer graph_slim_pp;
   graph_slim_pp.buffer->stream = fp;
   pretty_printer *const pp = &graph_slim_pp;
-  pp_printf (pp, "subgraph \"%s\" {\n"
-	     "\tcolor=\"black\";\n"
-	     "\tlabel=\"%s\";\n",
+  pp_printf (pp, "subgraph \"cluster_%s\" {\n"
+	     "\tstyle=\"dashed\";\n"
+	     "\tcolor=\"black\";\n"
+	     "\tlabel=\"%s ()\";\n",
 	     funcname, funcname);
   draw_cfg_nodes (pp, fun);
   draw_cfg_edges (pp, fun);

Is this ok for stage1? It's not a bug fix, but it helps debuggability, so is this something we might consider backporting?

Best regards,

Thomas
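P.S. To make the effect concrete: assuming a function named foo (a hypothetical name), the subgraph header emitted by the patched print_graph_cfg would look roughly as follows, derived from the format string above with the node and edge statements omitted:

subgraph "cluster_foo" {
	style="dashed";
	color="black";
	label="foo ()";
	/* basic block nodes and edges follow here */
}

Layout engines that honour the cluster convention then draw the function's basic blocks inside a dashed, labelled box; engines that do not simply treat it as an ordinary subgraph, as the quoted documentation explains.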
RE: [PATCH, stage1] Move insns without introducing new temporaries in loop2_invariant
> From: Steven Bosscher [mailto:stevenb@gmail.com]
> Sent: Monday, March 09, 2015 7:48 PM
> To: Thomas Preud'homme
> Cc: GCC Patches; Eric Botcazou
> Subject: Re: [PATCH, stage1] Move insns without introducing new
> temporaries in loop2_invariant
>
> On Thu, Mar 5, 2015 at 10:53 AM, Thomas Preud'homme wrote:
> > diff --git a/gcc/dominance.c b/gcc/dominance.c
> > index 33d4ae4..09c8c90 100644
> > --- a/gcc/dominance.c
> > +++ b/gcc/dominance.c
> > @@ -982,7 +982,7 @@ nearest_common_dominator_for_set (enum cdi_direction dir, bitmap blocks)
> >
> >     A_Dominated_by_B (node A, node B)
> >     {
> > -      return DFS_Number_In(A) >= DFS_Number_In(A)
> > +      return DFS_Number_In(A) >= DFS_Number_In(B)
> >         && DFS_Number_Out (A) <= DFS_Number_Out(B);
> >     }  */
>
> This hunk is obvious enough ;-)

Thus committed.

Best regards,

Thomas
RE: [PATCH, stage1] Move insns without introducing new temporaries in loop2_invariant
> From: Jiong Wang [mailto:jiong.w...@arm.com]
> Sent: Friday, March 06, 2015 8:10 PM
>
> On 05/03/15 09:53, Thomas Preud'homme wrote:
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2015-02-16  Thomas Preud'homme
> >
> >         * gcc.dg/loop-7.c: Run on all targets and check for loop2_invariant
> >         being able to move instructions without introducing new temporary
> >         register.
>
> Thomas,
>
>   Can you please confirm that relaxing this to all targets will not fail on
> AArch64? It does fail in my quick test.

Indeed, I made a very naïve assumption here and should have tested on a wider range of targets. I'll rework the testcases associated with this patch.

Thanks for catching it, Jiong.

Best regards,

Thomas
[PATCH] Fix PR63743: Incorrect ordering of operands in sequence of commutative operations
Hi,

The improved canonicalization after r216728 causes GCC to more often generate poor code due to a suboptimal ordering of the operands of commutative libcalls. Indeed, if one of the operands of a commutative operation is the result of a previous operation, both operations being implemented by libcalls, the wrong ordering of the operands in the second operation can lead to extra movs. Consider the following case on softfloat targets:

double
test1 (double x, double y)
{
  return x * (x + y);
}

If the result of x + y is passed as the operand that uses the same registers as the return value of the __aeabi_dadd libcall, then no mov is generated; otherwise movs are needed. The following happens on arm softfloat with the right ordering:

	bl	__aeabi_dadd
	ldr	r2, [sp]
	ldr	r3, [sp, #4]
	/* r0, r1 are reused from the return values of the __aeabi_dadd libcall.  */
	bl	__aeabi_dmul

With the wrong ordering one gets:

	bl	__aeabi_dadd
	mov	r2, r0
	mov	r3, r1
	ldr	r0, [sp]
	ldr	r1, [sp, #4]
	bl	__aeabi_dmul

This patch extends the patch written by Yuri Rumyantsev in r219646 to also deal with the case where only one of the operands is the result of a previous operation.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-03-05  Thomas Preud'homme

        PR tree-optimization/63743
        * cfgexpand.c (reorder_operands): Also reorder if only second operand
        had its definition forwarded by TER.

*** gcc/testsuite/ChangeLog ***

2015-03-05  Thomas Preud'homme

        PR tree-optimization/63743
        * gcc.dg/pr63743.c: New test.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 7dfe1f6..4fbc037 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5117,13 +5117,11 @@ reorder_operands (basic_block bb)
 	continue;
       /* Swap operands if the second one is more expensive.  */
       def0 = get_gimple_for_ssa_name (op0);
-      if (!def0)
-	continue;
       def1 = get_gimple_for_ssa_name (op1);
       if (!def1)
 	continue;
       swap = false;
-      if (lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
+      if (!def0 || lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
 	swap = true;
       if (swap)
 	{
@@ -5132,7 +5130,7 @@ reorder_operands (basic_block bb)
 	    fprintf (dump_file, "Swap operands in stmt:\n");
 	    print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 	    fprintf (dump_file, "Cost left opnd=%d, right opnd=%d\n",
-		     lattice[gimple_uid (def0)],
+		     def0 ? lattice[gimple_uid (def0)] : 0,
 		     lattice[gimple_uid (def1)]);
 	  }
 	  swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
diff --git a/gcc/testsuite/gcc.dg/pr63743.c b/gcc/testsuite/gcc.dg/pr63743.c
new file mode 100644
index 000..87254ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr63743.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-expand-details" } */
+
+double
+libcall_dep (double x, double y)
+{
+  return x * (x + y);
+}
+
+/* { dg-final { scan-rtl-dump-times "Swap operands" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */

The testsuite was run in QEMU for an arm-none-eabi GCC cross-compiler targeting Cortex-M3 and natively for a bootstrapped x86_64 GCC, without any regression. CSiBE shows a 0.5034% code size decrease on arm-none-eabi and a 0.0058% code size increase on x86_64-linux-gnu.

Is it ok for trunk (since it fixes a code size regression in 5.0)?

Best regards,

Thomas
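P.S. To make the TER interaction concrete, the GIMPLE for libcall_dep looks roughly like this at expand time (a sketch, not an actual dump):

  _1 = x_2(D) + y_3(D);
  _4 = x_2(D) * _1;

For the multiplication, op0 is x_2(D), a default definition with no defining statement, so get_gimple_for_ssa_name returns NULL for it, while op1 (_1) does have a definition forwarded by TER. Before this patch the NULL def0 made reorder_operands give up on the statement; with the patch the operands are still swapped, so the operand computed by the __aeabi_dadd libcall comes first and its result registers can be reused directly by the __aeabi_dmul call, as in the first assembly sequence above.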
RE: [PATCH, stage1] Move insns without introducing new temporaries in loop2_invariant
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Thursday, March 05, 2015 7:12 PM
>
> > loop header
> > start of loop body
> > //stuff
> > (set (reg 128) (const_int 0))
> > //other stuff
> > end of loop body
> >
> > becomes:
> >
> > (set (reg 129) (const_int 0))
> > loop header
> > start of loop body
> > //stuff
> > (set (reg 128) (reg 129))
> > //other stuff
> > end of loop body
>
> Why doesn't copy-propagation clean this up?  It's run after loop2.

Actually cprop3 is what makes the situation worse in this case, as it propagates the constant that is set outside the loop into the mov that is inside the loop. In the case of PR64616 the constant is a symbol_ref, which makes it a memory access, so the memory access is propagated into the loop and the load ends up being executed on every iteration.

Note that, as I said in the intro, this bug is also solved by [1], which fixes the first thing that goes wrong for this example. That being said, the loop invariant pass ought to simply move instructions when it can safely do so.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00933.html

Best regards,

Thomas
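P.S. A rough sketch of what goes wrong in the PR64616 case (illustrative only; the symbol name is made up and this is not the actual RTL from the PR). After loop2_invariant has hoisted the set through a new temporary, the code looks like

  (set (reg 129) (symbol_ref "some_var"))	;; before the loop, loaded once
  ... loop body ...
  (set (reg 128) (reg 129))

and cprop3 then propagates the constant into the loop:

  ... loop body ...
  (set (reg 128) (symbol_ref "some_var"))	;; typically a literal-pool load on ARM,
						;; now executed on every iteration

which is how the useless ldr ends up inside the loop. Simply moving the original instruction, as this patch does, leaves nothing for cprop3 to push back in.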
[PATCH, stage1] Move insns without introducing new temporaries in loop2_invariant
Note: this is stage1 material.

Currently the loop2_invariant pass hoists instructions out of a loop by creating a new temporary for the destination register of the instruction and leaving a mov from the new temporary to the old register in the loop, as shown below:

loop header
start of loop body
//stuff
(set (reg 128) (const_int 0))
//other stuff
end of loop body

becomes:

(set (reg 129) (const_int 0))
loop header
start of loop body
//stuff
(set (reg 128) (reg 129))
//other stuff
end of loop body

This is one of the errors that led to a useless ldr ending up inside a loop (PR64616). This patch fixes this specific bit (another bit was fixed in [1]) by simply moving the instruction when it is known to be safe. This is decided by looking at all the uses of the register set in the instruction and checking that (i) they are all dominated by the instruction and (ii) there is no other def in the loop that could end up reaching one of the uses.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00933.html

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-02-16  Thomas Preud'homme

        * dominance.c (nearest_common_dominator_for_set): Fix A_Dominated_by_B
        code in comment.
        * loop-invariant.c (can_move_invariant_reg): New.
        (move_invariant_reg): Call above new function to decide whether
        instruction can just be moved, skipping creation of temporary
        register.

*** gcc/testsuite/ChangeLog ***

2015-02-16  Thomas Preud'homme

        * gcc.dg/loop-7.c: Run on all targets and check for loop2_invariant
        being able to move instructions without introducing new temporary
        register.
        * gcc.dg/loop-8.c: New test.

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 33d4ae4..09c8c90 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -982,7 +982,7 @@ nearest_common_dominator_for_set (enum cdi_direction dir, bitmap blocks)
 
    A_Dominated_by_B (node A, node B)
    {
-     return DFS_Number_In(A) >= DFS_Number_In(A)
+     return DFS_Number_In(A) >= DFS_Number_In(B)
        && DFS_Number_Out (A) <= DFS_Number_Out(B);
    }  */
 
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index f79b497..ab2a45c 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1512,6 +1512,99 @@ replace_uses (struct invariant *inv, rtx reg, bool in_group)
   return 1;
 }
 
+/* Whether invariant INV setting REG can be moved out of LOOP, at the end of
+   the block preceding its header.  */
+
+static bool
+can_move_invariant_reg (struct loop *loop, struct invariant *inv, rtx reg)
+{
+  df_ref def, use;
+  bool ret = false;
+  unsigned int i, dest_regno, defs_in_loop_count = 0;
+  rtx_insn *insn = inv->insn;
+  bitmap may_exit, has_exit, always_executed;
+  basic_block *body, bb = BLOCK_FOR_INSN (inv->insn);
+
+  /* We ignore hard registers and memory accesses for cost and complexity
+     reasons.  Hard registers are few at this stage and expensive to consider
+     as they require building a separate data flow.  Memory accesses would
+     require using df_simulate_* and can_move_insns_across functions and are
+     more complex.  */
+  if (!REG_P (reg) || HARD_REGISTER_P (reg))
+    return false;
+
+  /* Check whether the set is always executed.  We could omit this condition
+     if we know that the register is unused outside of the loop, but it does
+     not seem worth finding out.  */
+  may_exit = BITMAP_ALLOC (NULL);
+  has_exit = BITMAP_ALLOC (NULL);
+  always_executed = BITMAP_ALLOC (NULL);
+  body = get_loop_body_in_dom_order (loop);
+  find_exits (loop, body, may_exit, has_exit);
+  compute_always_reached (loop, body, has_exit, always_executed);
+  /* Find bit position for basic block bb.  */
+  for (i = 0; i < loop->num_nodes && body[i] != bb; i++);
+  if (!bitmap_bit_p (always_executed, i))
+    goto cleanup;
+
+  /* Check that all uses that are reached by the def in insn would still be
+     reached by it.  */
+  dest_regno = REGNO (reg);
+  for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use))
+    {
+      rtx ref;
+      basic_block use_bb;
+
+      ref = DF_REF_INSN (use);
+      use_bb = BLOCK_FOR_INSN (ref);
+
+      /* Ignore instruction considered for moving.  */
+      if (ref == insn)
+        continue;
+
+      /* Don't consider uses outside loop.  */
+      if (!flow_bb_inside_loop_p (loop, use_bb))
+        continue;
+
+      /* Don't move if a use is not dominated by def in insn.  */
+      if (use_bb == bb && DF_INSN_LUID (insn) > DF_INSN_LUID (ref))
+        goto cleanup;
+      if (!dominated_by_p (CDI_DOMINATORS, use_bb, bb))
+        goto cleanup;
+
+      /* Check for other defs.  Any other def in the loop might reach a use
+         currently reached by the def in insn.  */
+      if (!defs_in_loop_count)
+        {
+          for (def = DF_REG_DEF_CHAIN (dest_regno); def; def = DF_REF_NEXT_REG (def))
+            {
+              basic_block def_bb = BLOCK_F
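To give a C-level picture of the transformation described above, here is a hypothetical sketch (not the PR64616 testcase and not one of the tests added by the patch): the set of p is loop invariant, its pseudo has a single def and all uses are dominated by it, so with this patch the set can simply be moved before the loop instead of going through a new temporary plus an in-loop register copy.

/* Hypothetical sketch only: whether the RTL has exactly this shape
   depends on earlier passes.  The assignment to p is a loop-invariant
   set of a symbol_ref; moving it out of the loop directly avoids both
   the extra pseudo and the in-loop copy that cprop could later turn
   back into a load.  */
extern int data[16];

int
sum (int n)
{
  int i, s = 0;

  for (i = 0; i < n; i++)
    {
      int *p = data;	/* invariant set: single def, all uses dominated */
      s += p[i];
    }
  return s;
}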
RE: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
Ping?

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Thomas Preud'homme
[SNIP]
> > Likewise for the REG_P and ">= FIRST_PSEUDO_REGISTER" tests here (with
> > the equivalent and IMHO preferable HARD_REGISTER_P test in
> > find_avail_set()).
>
> I'm not sure I follow you here. First, it seems to me that the equivalent
> test is rather REG_P && !HARD_REGISTER_P since here it checks if it's
> a pseudo register.
>
> Then, do you mean the test can be simply removed because of the
> REG_P && !HARD_REGISTER_P in hash_scan_set () called indirectly by
> compute_hash_table () when called in one_cprop_pass () before any
> cprop_insn ()? Or do you mean I should move the check in
> find_avail_set ()?
>
> Best regards,
>
> Thomas