Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)
On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote:

Please find attached the patch PR25529.patch that converts the pattern
(unsigned t * 2)/2 into unsigned t & 0x7FFF:

+/* Simplify (unsigned t * 2)/2 into unsigned t & 0x7FFF.  */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+ (simplify
+  (div (mult @0 INTEGER_CST@1) INTEGER_CST@1)

You don't need to repeat INTEGER_CST, the second time @1 is enough.

+   (with { tree n2 = build_int_cst (TREE_TYPE (@0),
+                                    wi::exact_log2 (@1)); }
+    (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+     (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); }
+                                 { n2; }) { n2; }))

What happens if you write t*3/3?

--
Marc Glisse
Tests for libgomp based on OpenMP Examples 4.0.2.
With this letter I propose a patch with tests for libgomp based on OpenMP Examples 4.0.2, both for C and Fortran. The changes are:

- Renamed existing tests based on OpenMP Examples to make the names more clear.
- Added 16 tests for the simd construct and 10 for the depend clause.

- Sincerely yours, Maxim Blumental

2015-07-06  Maxim Blumenthal  bvm...@gmail.com

* libgomp/testsuite/libgomp.c/examples-4/e.56.3.c: renamed to array_sections-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.56.4.c: renamed to array_sections-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.55.1.c: renamed to async_target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.55.2.c: renamed to async_target-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.1.c: renamed to declare_target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.3.c: renamed to declare_target-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.4.c: renamed to declare_target-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.5.c: renamed to declare_target-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.1.c: renamed to device-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.2.c: renamed to device-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.3.c: renamed to device-3.c
* libgomp/testsuite/libgomp.c/examples-4/simd-1.c: A test for simd construct.
* libgomp/testsuite/libgomp.c/examples-4/simd-2.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-3.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-4.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-5.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-6.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-7.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-8.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/e.50.1.c: renamed to target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.2.c: renamed to target-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.3.c: renamed to target-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.4.c: renamed to target-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.5.c: renamed to target-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.1.c: renamed to target_data-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.2.c: renamed to target_data-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.3.c: renamed to target_data-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.4.c: renamed to target_data-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.6.c: renamed to target_data-6.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.7.c: renamed to target_data-7.c
* libgomp/testsuite/libgomp.c/examples-4/e.52.1.c: renamed to target_update-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.52.2.c: renamed to target_update-2.c
* libgomp/testsuite/libgomp.c/examples-4/task_dep-1.c: A test for task dependencies.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-2.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-3.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-4.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-5.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/e.54.2.c: renamed to teams-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.3.c: renamed to teams-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.4.c: renamed to teams-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.5.c: renamed to teams-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.6.c: renamed to teams-6.c
* libgomp/testsuite/libgomp.fortran/examples-4/e.56.3.f90: renamed to array_sections-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.56.4.f90: renamed to array_sections-4.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.55.1.f90: renamed to async_target-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.55.2.f90: renamed to async_target-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.1.f90: renamed to declare_target-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.2.f90: renamed to declare_target-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.3.f90: renamed to declare_target-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.4.f90: renamed to declare_target-4.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.5.f90: renamed to declare_target-5.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.1.f90: renamed to device-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.2.f90: renamed to device-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.3.f90: renamed to device-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/simd-1.f90: A test for simd construct.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90: Same.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-3.f90: Same.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-4.f90:
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
Kyrill Tkachov wrote:

On 07/07/15 17:34, Alan Lawrence wrote:

Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf says in 12.2.1: "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware".

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or -mfp16-format=alternative, regardless of whether we have hardware support or not. (Without hardware support, gcc generates calls to __gnu_f2h_ieee or __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to support __fp16 just using those hardware instructions without caring about which format is in use.)

Hmmm... In my opinion intrinsics should aim to map to instructions rather than go away and call library functions, but this is the existing functionality that current users might depend on :(

Sorry - to clarify: currently we generate __gnu_f2h_ieee / __gnu_h2f_ieee, to convert between single __fp16 and 'float' values, when there is no HW. General operations on scalar __fp16 values are performed by converting to float, performing operations on float, and converting back. The __fp16 type is available and usable without HW support, but only when -mfp16-format is specified.

(The existing) intrinsics operating on float16x[48] vectors (converting to/from float32x4) are *not* available without hardware support; these intrinsics *are* available without specifying -mfp16-format.
ACLE (4.1.2) allows toolchains to provide __fp16 when not implemented in HW, even if this is not required.

CC'ing the ARM maintainers and Tejas for an ACLE perspective. I think that we'd want to gate the definition of __fp16 on hardware availability as well (the -mfpu option) rather than just arm_fp16_format, but I'm not sure of the impact this will have on existing users.

Sure, but do we require -mfpu *and* -mfp16-format? s/and/or/? Do we require -mfp16-format for float16x[48] intrinsics, or allow format-agnostic code (as HW support allows us to!)?

I don't have very strong opinions as to which way we should go; I merely tried to be consistent with the existing codebase, and to support as much code as possible, although I agree I ignored cases where defining functions unexpectedly might cause problems.

Cheers, Alan
Re: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
On 07/07/15 16:23, Tejas Belagod wrote: Ping!

I've had a look at this (sorry for the delay). I think it's mostly OK, but I have two comments to make.

1) It's quite hard to understand the algorithm and there are no comments to aid understanding (to be fair, there aren't many comments on the other algorithms either).

2) It looks as though the new code calculates both the division and the modulus simultaneously. As such, the divmod code should be simplified to share the same code as the division function itself (saving a few bytes and, more importantly, several cycles when the modulus is required).

Can you please run some tests to validate 2) above? And if correct, adjust the code to handle this case. I think that will go some way to mitigating the code size increase from the new implementation.

R.

On 30/04/15 10:40, Hale Wang wrote:

-Original Message- From: Hale Wang [mailto:hale.w...@arm.com] Sent: Monday, February 09, 2015 9:54 AM To: Richard Earnshaw Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html. Ping for trunk. Is it ok for trunk now?

Thanks, Hale

-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Hale Wang Sent: Friday, December 12, 2014 9:36 AM To: gcc-patches Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk? -Hale

-Original Message- From: Joey Ye [mailto:joey.ye...@gmail.com] Sent: Thursday, November 27, 2014 10:01 AM To: Hale Wang Cc: gcc-patches Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

OK applying to arm/embedded-4_9-branch, though you still need maintainer approval into trunk.
- Joey

On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang hale.w...@arm.com wrote:

Hi,

This patch ports the aeabi_idiv routine from Linaro Cortex-Strings (https://git.linaro.org/toolchain/cortex-strings.git), which was contributed by ARM under Free BSD license. The new aeabi_idiv routine is used to replace the one in libgcc/config/arm/lib1funcs.S. This replacement happens within the Thumb1 wrapper. The new routine is under LGPLv3 license.

The main advantage of this version is that it can improve the performance of the aeabi_idiv function for Thumb1. This solution will also increase the code size, so it will only be used if __OPTIMIZE_SIZE__ is not defined.

Make check passed for armv6-m. OK for trunk?

Thanks, Hale Wang

libgcc/ChangeLog:

2014-11-26  Hale Wang  hale.w...@arm.com

* config/arm/lib1funcs.S: Add new wrapper.

===
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index b617137..de66c81 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -306,34 +306,12 @@ LSYM(Lend_fde):
 #ifdef __ARM_EABI__
 .macro THUMB_LDIV0 name signed
 #if defined(__ARM_ARCH_6M__)
-	.ifc	\signed, unsigned
-	cmp	r0, #0
-	beq	1f
-	mov	r0, #0
-	mvn	r0, r0		@ 0xffffffff
-1:
-	.else
-	cmp	r0, #0
-	beq	2f
-	blt	3f
+
+	push	{r0, lr}
 	mov	r0, #0
-	mvn	r0, r0
-	lsr	r0, r0, #1	@ 0x7fffffff
-	b	2f
-3:	mov	r0, #0x80
-	lsl	r0, r0, #24	@ 0x80000000
-2:
-	.endif
-	push	{r0, r1, r2}
-	ldr	r0, 4f
-	adr	r1, 4f
-	add	r0, r1
-	str	r0, [sp, #8]
-	@ We know we are not on armv4t, so pop pc is safe.
-	pop	{r0, r1, pc}
-	.align	2
-4:
-	.word	__aeabi_idiv0 - 4b
+	bl	SYM(__aeabi_idiv0)
+	pop	{r1, pc}
+
 #elif defined(__thumb2__)
 	.syntax unified
 	.ifc	\signed, unsigned
@@ -927,7 +905,158 @@ LSYM(Lover7):
 	add	dividend, work
 	.endif
 LSYM(Lgot_result):
-.endm
+.endm
+
+#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__)
+.macro	BranchToDiv n, label
+	lsr	curbit, dividend, \n
+	cmp	curbit, divisor
+	blo	\label
+.endm
+
+.macro	DoDiv n
+	lsr	curbit, dividend, \n
+	cmp	curbit, divisor
+	bcc	1f
+	lsl	curbit, divisor, \n
+	sub	dividend, dividend, curbit
+
+1:	adc	result, result
+.endm
+
+.macro	THUMB1_Div_Positive
+	mov	result, #0
+	BranchToDiv #1, LSYM(Lthumb1_div1)
+	BranchToDiv #4, LSYM(Lthumb1_div4)
+	BranchToDiv #8, LSYM(Lthumb1_div8)
+	BranchToDiv #12, LSYM(Lthumb1_div12)
+	BranchToDiv #16, LSYM(Lthumb1_div16)
+LSYM(Lthumb1_div_large_positive):
+	mov	result, #0xff
+	lsl	divisor, divisor, #8
+
RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips jraddiusp instruction.
I'm not sure this is the right approach here. If we get a jraddiusp then the problem that the test is trying to cover can't possibly happen anyway. (The test is checking if a load and final stack adjustment are ever re-ordered, from what I can see.) I'd just mark the test as NOCOMPRESSION instead of just NOMIPS16 and update the comment to say that it is avoiding SAVE, RESTORE and JRADDIUSP.

Another approach would be to add the micromips testcase variant and skip the test if code-quality (ie. -O0).

Catherine

I agree with Matthew here. The testcase already comments that it is preventing the use of the MIPS16 save and restore instructions, so it makes sense to prevent jraddiusp as well. The updated patch and ChangeLog is below. Ok to commit?

Many thanks,

Andrew

testsuite/
* gcc.target/mips/stack-1.c: Do not build the testcase for micromips.

diff --git a/gcc/testsuite/gcc.target/mips/stack-1.c b/gcc/testsuite/gcc.target/mips/stack-1.c
index a28e4bf..5f25c21 100644
--- a/gcc/testsuite/gcc.target/mips/stack-1.c
+++ b/gcc/testsuite/gcc.target/mips/stack-1.c
@@ -2,8 +2,8 @@
 /* { dg-final { scan-assembler "\tlw\t" } } */
 /* { dg-final { scan-assembler-not "\td?addiu\t(\\\$sp,)?\\\$sp,\[1-9\].*\tlw\t" } } */

-/* Avoid use of SAVE and RESTORE.  */
-NOMIPS16 int foo (int y)
+/* Avoid use of SAVE, RESTORE and JRADDIUSP.  */
+NOCOMPRESSION int foo (int y)
 {
   volatile int a = y;
   volatile int *volatile b = &a;
RE: [PATCH] MIPS: fix failing branch range checks for micromips
I see that you are naming these tests after the original branch-number tests that they were derived from. I think it would be better to keep all of the microMIPS tests named umips-???. I don't think preserving the original number is important.

I have named the microMIPS tests umips-branch-??? to keep with the current microMIPS test naming strategy. The numbering starts at 5 as there are already tests numbered 1-4. An updated patch and ChangeLog is below. Ok to commit?

Many thanks,

Andrew

testsuite/
* gcc.target/mips/branch-2.c: Change NOMIPS16 to NOCOMPRESSION.
* gcc.target/mips/branch-3.c: Ditto.
* gcc.target/mips/branch-4.c: Ditto.
* gcc.target/mips/branch-5.c: Ditto.
* gcc.target/mips/branch-6.c: Ditto.
* gcc.target/mips/branch-7.c: Ditto.
* gcc.target/mips/branch-8.c: Ditto.
* gcc.target/mips/branch-9.c: Ditto.
* gcc.target/mips/branch-10.c: Ditto.
* gcc.target/mips/branch-11.c: Ditto.
* gcc.target/mips/branch-12.c: Ditto.
* gcc.target/mips/branch-13.c: Ditto.
* gcc.target/mips/branch-14.c: Ditto.
* gcc.target/mips/branch-15.c: Ditto.
* gcc.target/mips/umips-branch-5.c: New file.
* gcc.target/mips/umips-branch-6.c: New file.
* gcc.target/mips/umips-branch-7.c: New file.
* gcc.target/mips/umips-branch-8.c: New file.
* gcc.target/mips/umips-branch-9.c: New file.
* gcc.target/mips/umips-branch-10.c: New file.
* gcc.target/mips/umips-branch-11.c: New file.
* gcc.target/mips/umips-branch-12.c: New file.
* gcc.target/mips/umips-branch-13.c: New file.
* gcc.target/mips/umips-branch-14.c: New file.
* gcc.target/mips/umips-branch-15.c: New file.
* gcc.target/mips/umips-branch-16.c: New file.
* gcc.target/mips/umips-branch-17.c: New file.
* gcc.target/mips/umips-branch-18.c: New file.
* gcc.target/mips/branch-helper.h (OCCUPY_0x1): New define.
(OCCUPY_0xfffc): New define.
diff --git a/gcc/testsuite/gcc.target/mips/branch-10.c b/gcc/testsuite/gcc.target/mips/branch-10.c
index e2b1b5f..eb21c16 100644
--- a/gcc/testsuite/gcc.target/mips/branch-10.c
+++ b/gcc/testsuite/gcc.target/mips/branch-10.c
@@ -4,7 +4,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();

diff --git a/gcc/testsuite/gcc.target/mips/branch-11.c b/gcc/testsuite/gcc.target/mips/branch-11.c
index 962eb1b..bd8e834 100644
--- a/gcc/testsuite/gcc.target/mips/branch-11.c
+++ b/gcc/testsuite/gcc.target/mips/branch-11.c
@@ -8,7 +8,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();

diff --git a/gcc/testsuite/gcc.target/mips/branch-12.c b/gcc/testsuite/gcc.target/mips/branch-12.c
index 4aef160..4944634 100644
--- a/gcc/testsuite/gcc.target/mips/branch-12.c
+++ b/gcc/testsuite/gcc.target/mips/branch-12.c
@@ -4,7 +4,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();

diff --git a/gcc/testsuite/gcc.target/mips/branch-13.c b/gcc/testsuite/gcc.target/mips/branch-13.c
index 8a6fb04..f5269b9 100644
--- a/gcc/testsuite/gcc.target/mips/branch-13.c
+++ b/gcc/testsuite/gcc.target/mips/branch-13.c
@@ -8,7 +8,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();

diff --git a/gcc/testsuite/gcc.target/mips/branch-14.c b/gcc/testsuite/gcc.target/mips/branch-14.c
index 026417e..c2eecc3 100644
--- a/gcc/testsuite/gcc.target/mips/branch-14.c
+++ b/gcc/testsuite/gcc.target/mips/branch-14.c
@@ -4,14 +4,14 @@

 #include "branch-helper.h"

 void __attribute__((noinline))
-foo (volatile int *x)
+NOCOMPRESSION foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))
     OCCUPY_0x1fff8;
 }

 int
-main (void)
+NOCOMPRESSION main (void)
 {
   int x = 0;
   int y = 1;

diff --git a/gcc/testsuite/gcc.target/mips/branch-15.c b/gcc/testsuite/gcc.target/mips/branch-15.c
index dee7a05..89e25f3
100644
--- a/gcc/testsuite/gcc.target/mips/branch-15.c
+++ b/gcc/testsuite/gcc.target/mips/branch-15.c
@@ -4,14 +4,14 @@

 #include "branch-helper.h"

 void
-foo (volatile int *x)
+NOCOMPRESSION foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))
     OCCUPY_0x1fffc;
 }

 int
-main (void)
+NOCOMPRESSION main (void)
 {
   int x = 0;
   int y = 1;

diff --git a/gcc/testsuite/gcc.target/mips/branch-2.c b/gcc/testsuite/gcc.target/mips/branch-2.c
index 6409c4c..b60e9cd 100644
--- a/gcc/testsuite/gcc.target/mips/branch-2.c
+++ b/gcc/testsuite/gcc.target/mips/branch-2.c
@@ -5,7 +5,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))

diff --git a/gcc/testsuite/gcc.target/mips/branch-3.c b/gcc/testsuite/gcc.target/mips/branch-3.c
index 5fcfece..69300f6 100644
---
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf says in 12.2.1: "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware". This indicates that float16 type and intrinsic availability should be gated on the availability of fp16 in the specified -mfpu. Look at some existing intrinsics like vcvt_f16_f32 for a way to gate these.

I notice that float16x4_t is unconditionally defined in our arm_neon.h, however. I think this is a bug and its definition should be #ifdef'd properly as well.

Thanks,
Kyrill
RE: [PATCH] MIPS: Do not generate micromips code for the no-smartmips-lwxs.c testcase
Hi Andrew,

Instead of adding the -mno-micromips option to dg-options, please change the MIPS16 attribute to NOCOMPRESSION.

Index: gcc.target/mips/no-smartmips-lwxs.c
===
--- gcc.target/mips/no-smartmips-lwxs.c (revision 452061)
+++ gcc.target/mips/no-smartmips-lwxs.c (working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mno-smartmips" } */

-NOMIPS16 int scaled_indexed_word_load (int a[], int b)
+NOCOMPRESSION int scaled_indexed_word_load (int a[], int b)
 {
   return a[b];
 }

OK with that change.

Catherine

Committed as SVN 225519.

Regards,
Andrew
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law l...@redhat.com wrote: On 06/29/2015 07:15 PM, Jim Wilson wrote:

So if these copies require a conversion, then isn't it fundamentally wrong to have a PHI node which copies between them? That would seem to implicate the eipa_sra pass as needing to be aware of these promotions and avoid having these objects with different representations appearing on the lhs/rhs of a PHI node.

My years at Cisco didn't give me a chance to work on the SSA passes, so I don't know much about how they work. But looking at this, I see that PHI nodes are eventually handled by emit_partition_copy in tree-outof-ssa.c, which calls convert_to_mode, so it appears that conversions between different (closely related?) types are OK in a PHI node.

The problem in this case is that we have the exact same type for the src and dest. The only difference is that the ARM forces sign-extension for signed sub-word parameters and zero-extension for signed sub-word locals. Thus, to detect the need for a conversion, you have to have the decls, and we don't have them here.

There is also the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr and friends, which aren't used by the cfglayout/tree-outof-ssa.c code for PHI nodes. So we need to copy some of the SUBREG_PROMOTED_* handling into cfglayout/tree-outof-ssa.c, or modify them to call expand_expr for PHI nodes, and I haven't had any luck getting that to work yet. I still need to learn more about the code to figure out if this is possible.

I also think that the ARM handling of PROMOTE_MODE is wrong. Treating a signed sub-word as unsigned can lead to inefficient code. This part of the problem is much easier for me to fix. It may be hard to convince ARM maintainers that it should be changed though. I need more time to work on this too.

I haven't looked at trying to forbid the optimizer from creating PHI nodes connecting parameters to locals.
That just sounds like a strange thing to forbid, and seems likely to result in worse code by disabling too many optimizations. But maybe it is the right solution if neither of the other two options work. This should only be done when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing the copy to require a conversion. Jim
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf says in 12.2.1: "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or -mfp16-format=alternative, regardless of whether we have hardware support or not. (Without hardware support, gcc generates calls to __gnu_f2h_ieee or __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to support __fp16 just using those hardware instructions without caring about which format is in use.)

Thus we cannot be consistent with both sides of that 'i.e.', unless we also change when __fp16 is available.

I notice that float16x4_t is unconditionally defined in our arm_neon.h, however. I think this is a bug and its definition should be #ifdef'd properly as well.

Hmmm. Is this becoming a question of, which potentially-existing code do we want to break???

Cheers, Alan
Re: [PING][PATCH, 1/2] Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch
On 06/07/15 15:44, Richard Biener wrote: On Mon, 6 Jul 2015, Tom de Vries wrote:

On 25/06/15 09:42, Tom de Vries wrote:

Hi,

this patch merges rewrite_virtuals_into_loop_closed_ssa (originally submitted here: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01236.html ) to trunk.

Bootstrapped and reg-tested on x86_64. OK for trunk?

Ping.

Thanks,
- Tom

0001-Merge-rewrite_virtuals_into_loop_closed_ssa-from-gom.patch

Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch

2015-06-24  Tom de Vries  t...@codesourcery.com

merge from gomp4 branch:

2015-06-24  Tom de Vries  t...@codesourcery.com

* tree-ssa-loop-manip.c (get_virtual_phi): Factor out of ...
(rewrite_virtuals_into_loop_closed_ssa): ... here.
* tree-ssa-loop-manip.c (replace_uses_in_dominated_bbs): Factor out of ...
(rewrite_virtuals_into_loop_closed_ssa): ... here.
* dominance.c (bitmap_get_dominated_by): New function.
* dominance.h (bitmap_get_dominated_by): Declare.
* tree-ssa-loop-manip.c (rewrite_virtuals_into_loop_closed_ssa): Use bitmap_get_dominated_by.
* tree-parloops.c (replace_uses_in_bbs_by)
(rewrite_virtuals_into_loop_closed_ssa): Move to ...
* tree-ssa-loop-manip.c: here.
* tree-ssa-loop-manip.h (rewrite_virtuals_into_loop_closed_ssa): Declare.

2015-06-18  Tom de Vries  t...@codesourcery.com

* tree-parloops.c (rewrite_virtuals_into_loop_closed_ssa): New function.
(transform_to_exit_first_loop_alt): Use rewrite_virtuals_into_loop_closed_ssa.
---
 gcc/dominance.c           | 21
 gcc/dominance.h           |  1 +
 gcc/tree-parloops.c       | 43 +
 gcc/tree-ssa-loop-manip.c | 81 +++
 gcc/tree-ssa-loop-manip.h |  1 +
 5 files changed, 112 insertions(+), 35 deletions(-)

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c66ca2..9b52d79 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -753,6 +753,27 @@ set_immediate_dominator (enum cdi_direction dir, basic_block bb,
   dom_computed[dir_index] = DOM_NO_FAST_QUERY;
 }

+/* Returns in BBS the list of basic blocks immediately dominated by BB, in the
+   direction DIR.
+   As get_dominated_by, but returns result as a bitmap.  */
+
+void
+bitmap_get_dominated_by (enum cdi_direction dir, basic_block bb, bitmap bbs)
+{
+  unsigned int dir_index = dom_convert_dir_to_idx (dir);
+  struct et_node *node = bb->dom[dir_index], *son = node->son, *ason;
+
+  bitmap_clear (bbs);
+
+  gcc_checking_assert (dom_computed[dir_index]);
+
+  if (!son)
+    return;
+
+  bitmap_set_bit (bbs, ((basic_block) son->data)->index);
+  for (ason = son->right; ason != son; ason = ason->right)
+    bitmap_set_bit (bbs, ((basic_block) ason->data)->index);
+}
+

Isn't an immediate_dominated_by_p () predicate better? It's very cheap to compute compared to allocating / populating and querying a bitmap.

Dropped bitmap_get_dominated_by per comment below.

 /* Returns the list of basic blocks immediately dominated by BB, in the
    direction DIR.  */
 vec<basic_block>

diff --git a/gcc/dominance.h b/gcc/dominance.h
index 37e138b..0a1a13e 100644
--- a/gcc/dominance.h
+++ b/gcc/dominance.h
@@ -41,6 +41,7 @@ extern void free_dominance_info (enum cdi_direction);
 extern basic_block get_immediate_dominator (enum cdi_direction, basic_block);
 extern void set_immediate_dominator (enum cdi_direction, basic_block,
				     basic_block);
+extern void bitmap_get_dominated_by (enum cdi_direction, basic_block, bitmap);
 extern vec<basic_block> get_dominated_by (enum cdi_direction, basic_block);
 extern vec<basic_block> get_dominated_by_region (enum cdi_direction,
						 basic_block *,

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e582fe7..df7c351 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1498,25 +1498,6 @@ replace_uses_in_bb_by (tree name, tree val, basic_block bb)
     }
 }

-/* Replace uses of NAME by VAL in blocks BBS.
-   */
-
-static void
-replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
-{
-  gimple use_stmt;
-  imm_use_iterator imm_iter;
-
-  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
-    {
-      if (!bitmap_bit_p (bbs, gimple_bb (use_stmt)->index))
-	continue;
-
-      use_operand_p use_p;
-      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-	SET_USE (use_p, val);
-    }
-}
-
 /* Do transformation from:

     bb preheader:
@@ -1637,18 +1618,11 @@ transform_to_exit_first_loop_alt (struct loop *loop,
   tree control = gimple_cond_lhs (cond_stmt);
   edge e;

-  /* Gather the bbs dominated by the exit block.  */
-  bitmap exit_dominated = BITMAP_ALLOC (NULL);
-  bitmap_set_bit (exit_dominated, exit_block->index);
-  vec<basic_block> exit_dominated_vec
-    = get_dominated_by (CDI_DOMINATORS, exit_block);
-
-  int i;
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
On 07/07/15 17:34, Alan Lawrence wrote:

Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf says in 12.2.1: "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or -mfp16-format=alternative, regardless of whether we have hardware support or not. (Without hardware support, gcc generates calls to __gnu_f2h_ieee or __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to support __fp16 just using those hardware instructions without caring about which format is in use.)

Hmmm... In my opinion intrinsics should aim to map to instructions rather than go away and call library functions, but this is the existing functionality that current users might depend on :(

Thus we cannot be consistent with both sides of that 'i.e.', unless we also change when __fp16 is available.

I notice that float16x4_t is unconditionally defined in our arm_neon.h, however. I think this is a bug and its definition should be #ifdef'd properly as well.

Hmmm. Is this becoming a question of, which potentially-existing code do we want to break???

CC'ing the ARM maintainers and Tejas for an ACLE perspective. I think that we'd want to gate the definition of __fp16 on hardware availability as well (the -mfpu option) rather than just arm_fp16_format, but I'm not sure of the impact this will have on existing users.
Kyrill

Cheers, Alan
Re: Tests for libgomp based on OpenMP Examples 4.0.2.
Comment on the patch: the simd-5.f90 file is marked as xfail, since the test fails because 'simd collapse' is an unsupported combination for Fortran (though it is valid in the OpenMP API).

2015-07-07 19:48 GMT+03:00 Maxim Blumental bvm...@gmail.com:

With this letter I propose a patch with tests for libgomp based on OpenMP Examples 4.0.2, both for C and Fortran. The changes are:

- Renamed existing tests based on OpenMP Examples to make the names more clear.
- Added 16 tests for simd construct and 10 for depend clause.

- Sincerely yours, Maxim Blumental

--
- Sincerely yours, Maxim Blumental
RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips jraddiusp instruction.
-Original Message- From: Andrew Bennett [mailto:andrew.benn...@imgtec.com] Sent: Tuesday, July 07, 2015 12:14 PM To: Moore, Catherine; Matthew Fortune; gcc-patches@gcc.gnu.org Subject: RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips jraddiusp instruction.

I'm not sure this is the right approach here. If we get a jraddiusp then the problem that the test is trying to cover can't possibly happen anyway. (The test is checking if a load and final stack adjustment are ever re-ordered, from what I can see.) I'd just mark the test as NOCOMPRESSION instead of just NOMIPS16 and update the comment to say that it is avoiding SAVE, RESTORE and JRADDIUSP.

Another approach would be to add the micromips testcase variant and skip the test if code-quality (ie. -O0).

Catherine

I agree with Matthew here. The testcase already comments that it is preventing the use of the MIPS16 save and restore instructions, so it makes sense to prevent jraddiusp as well. The updated patch and ChangeLog is below. Ok to commit?

testsuite/
* gcc.target/mips/stack-1.c: Do not build the testcase for micromips.

Yes, this is OK.
Re: flatten cfgloop.h
On Mon, Jul 6, 2015 at 8:11 PM, Jeff Law l...@redhat.com wrote: On 07/06/2015 07:38 AM, Michael Matz wrote: Hi, On Sun, 5 Jul 2015, Prathamesh Kulkarni wrote: Hi, The attached patches flatten cfgloop.h. patch-1.diff moves around prototypes and structures to the respective header files. patch-2.diff (mostly auto-generated) replicates cfgloop.h includes in c files. Bootstrapped and tested on x86_64-unknown-linux-gnu with all front-ends. Built on all targets using config-list.mk. I left includes in cfgloop.h commented with #if 0 ... #endif. OK for trunk ? Does nobody else think that header files for one or two prototypes are fairly silly? Perhaps, but having a .h file for each .c file's exported objects means that we can implement a reasonable policy around where functions are prototyped or structures declared. Yes. At least for infrastructure files. I happily make exceptions for things like tree-vectorizer.h having prototypes for all tree-vect*.c files (they at least look related!). Similar for tree-pass.h containing the interfacing of passes to the pass manager (instead of having a header file for every pass). Contrast to "I put foo in expr.h because that was the most convenient place", which over 25+ years has made our header file dependencies a horrid mess. Yeah - and I think we need to clean this up. The general guidance of "prototype is in the corresponding .h file" is easy. After doing that we can move functions or even get rid of *.[ch] pairs (like what is tree-dfa.[ch] other than a kitchen-sink for stuff - certainly nowhere near "Data flow functions for trees"). Richard. Anyway, your autogenerated part contains changes that seem exaggerated, e.g.: +++ b/gcc/bt-load.c @@ -54,6 +54,14 @@ along with GCC; see the file COPYING3.
If not see #include "predict.h" #include "basic-block.h" #include "df.h" +#include "bitmap.h" +#include "sbitmap.h" +#include "cfgloopmanip.h" +#include "loop-init.h" +#include "cfgloopanal.h" +#include "loop-doloop.h" +#include "loop-invariant.h" +#include "loop-iv.h" Surely bt-load doesn't need anything from doloop.h or invariant.h. Before this goes into trunk this whole autogenerated thing should be cleaned up to add includes only for things that are actually needed. Agreed. jeff
Re: fix segfault in verify_flow_info() with -dx option
On Tue, Jul 7, 2015 at 2:42 AM, Prathamesh Kulkarni prathamesh.kulka...@linaro.org wrote: On 6 July 2015 at 12:00, Richard Biener richard.guent...@gmail.com wrote: On Sun, Jul 5, 2015 at 2:07 PM, Prathamesh Kulkarni prathamesh.kulka...@linaro.org wrote: Hi, Passing -dx causes a segmentation fault: Test case: void f(void) {} ./test.c: In function 'f': ../test.c:3:1: internal compiler error: Segmentation fault } ^ 0xab6baf crash_signal /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/toplev.c:366 0x694b14 verify_flow_info() /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/cfghooks.c:109 0x9f7e64 execute_function_todo /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/passes.c:1997 0x9f86eb execute_todo /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/passes.c:2042 Started with r210068. It looks like -dx causes cfun->cfg to be NULL, and hence the segfault in verify_flow_info(). The attached patch tries to fix it by adding a check on cfun->cfg before calling verify_flow_info() from execute_function_todo(). Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk ? No. We've checked cfun->curr_properties & PROP_cfg already. So whatever is keeping that set but frees the CFG is the offender (and should clear the flag). I think I have somewhat understood what's happening. -dx turns on the flag rtl_dump_and_exit. pass_rest_of_compilation is gated on !rtl_dump_and_exit. Since rtl_dump_and_exit == 1 when -dx is passed, pass_rest_of_compilation and all the rtl passes inserted within pass_rest_of_compilation don't execute. One of these passes is pass_free_cfg, which destroys PROP_cfg, but with -dx passed this pass doesn't get executed and PROP_cfg remains set. Then pass_clean_state::execute() calls free_after_compilation(), which sets cfun->cfg = NULL. And hence after pass_clean_state finishes in execute_function_todo, we end up with cfun->cfg == NULL and PROP_cfg set, which calls verify_flow_info() and we hit the segfault.
The following untested patch tries to fix this by clearing PROP_cfg in free_after_compilation. Would that be the correct approach? Yes, that looks good to me. Richard. Thanks, Prathamesh Richard. Thank you, Prathamesh
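To make the invariant concrete, here is a minimal toy model of the fix's logic (names mirror GCC's, but this is my own illustration, not the passes.c/function.c code): clearing the cfg pointer must be paired with clearing PROP_cfg, because the TODO machinery decides whether to call verify_flow_info from the property flag alone.

```c
#include <assert.h>
#include <stddef.h>

#define PROP_cfg 1u

/* Simplified stand-in for GCC's struct function.  */
struct function_state
{
  unsigned curr_properties;  /* bitmask of PROP_* flags */
  void *cfg;                 /* the control-flow graph, or NULL */
};

/* The proposed fix: when the cfg is freed, the property that
   advertises it must be cleared in the same place.  */
void free_after_compilation (struct function_state *fn)
{
  fn->cfg = NULL;
  fn->curr_properties &= ~PROP_cfg;
}

/* What execute_function_todo effectively guards verification on:
   only the property flag, not the pointer itself.  */
int should_verify_flow_info (const struct function_state *fn)
{
  return (fn->curr_properties & PROP_cfg) != 0;
}
```

With the fix, the state can never be "flag set, cfg NULL", which is exactly the combination that made verify_flow_info dereference a null pointer.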
[patch, driver] Ignore -ftree-parallelize-loops={0,1}
Hi, currently, we have these spec strings in gcc/gcc.c involving ftree-parallelize-loops: ... %{fopenacc|fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)} %{fopenacc|fopenmp|ftree-parallelize-loops=*:-pthread} ... Actually, ftree-parallelize-loops={0,1} means that no parallelization is done, but these spec strings still get activated for these values. The attached patch fixes that, by introducing a spec function gt (short for greater than), and using it in the spec lines. [ I've also tried this approach using the already existing spec function version-compare: ... %{fopenacc|fopenmp:%:include(libgomp.spec)%(link_gomp)} %:version-compare(>= 2 ftree-parallelize-loops= %:include(libgomp.spec)%(link_gomp)) ... But that didn't work out. The function evaluation mechanism evaluates the arguments before testing the function, so we evaluate '%:include(libgomp.spec)' unconditionally. The gcc build breaks on the first xgcc invocation with linking due to a missing libgomp.spec. ] Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Ignore -ftree-parallelize-loops={0,1} using gt --- gcc/gcc.c | 48 ++-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index 0f29b78..34fb437 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -274,6 +274,7 @@ static const char *compare_debug_self_opt_spec_function (int, const char **); static const char *compare_debug_auxbase_opt_spec_function (int, const char **); static const char *pass_through_libs_spec_func (int, const char **); static const char *replace_extension_spec_func (int, const char **); +static const char *greater_than_spec_func (int, const char **); static char *convert_white_space (char *); /* The Specs Language @@ -881,7 +882,7 @@ proper position among the other output files.
*/ %{s} %{t} %{u*} %{z} %{Z} %{!nostdlib:%{!nostartfiles:%S}} VTABLE_VERIFICATION_SPEC \ %{static:} %{L*} %(mfwrap) %(link_libgcc) SANITIZER_EARLY_SPEC %o\ CHKP_SPEC \ -%{fopenacc|fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)}\ +%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):%:include(libgomp.spec)%(link_gomp)} \ %{fcilkplus:%:include(libcilkrts.spec)%(link_cilkrts)}\ %{fgnu-tm:%:include(libitm.spec)%(link_itm)}\ %(mflib) STACK_SPLIT_SPEC \ @@ -1042,7 +1043,8 @@ static const char *const multilib_defaults_raw[] = MULTILIB_DEFAULTS; /* Linking to libgomp implies pthreads. This is particularly important for targets that use different start files and suchlike. */ #ifndef GOMP_SELF_SPECS -#define GOMP_SELF_SPECS "%{fopenacc|fopenmp|ftree-parallelize-loops=*: \ +#define GOMP_SELF_SPECS \ + "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1): \ -pthread}" #endif @@ -1482,6 +1484,7 @@ static const struct spec_function static_spec_functions[] = { "compare-debug-auxbase-opt", compare_debug_auxbase_opt_spec_function }, { "pass-through-libs", pass_through_libs_spec_func }, { "replace-extension", replace_extension_spec_func }, + { "gt", greater_than_spec_func }, #ifdef EXTRA_SPEC_FUNCTIONS EXTRA_SPEC_FUNCTIONS #endif @@ -9428,6 +9431,47 @@ replace_extension_spec_func (int argc, const char **argv) return result; } +/* Returns "" if the n in ARGV[1] == -opt=n is greater than ARGV[2]. + Otherwise, return NULL. */ + +static const char * +greater_than_spec_func (int argc, const char **argv) +{ + char *converted; + + if (argc == 1) +return NULL; + + gcc_assert (argc == 3); + gcc_assert (argv[0][0] == '-'); + gcc_assert (argv[0][1] == '\0'); + + /* Point p to the n in -opt=n.
*/ + const char *p = argv[1]; + while (true) +{ + char c = *p; + if (c == '\0') + gcc_unreachable (); + + ++p; + + if (c == '=') + break; +} + + long arg = strtol (p, &converted, 10); + gcc_assert (converted != p); + + long lim = strtol (argv[2], &converted, 10); + gcc_assert (converted != argv[2]); + + if (arg > lim) +return ""; + + return NULL; +} + /* Insert backslash before spaces in ORIG (usually a file path), to avoid being broken by the spec parser. -- 1.9.1
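The comparison the new spec function performs can be illustrated standalone. The sketch below is my own reimplementation of the parsing logic, not the gcc.c code: it extracts the n from a string of the form "-opt=n" and compares it against a limit, which is exactly what lets the driver distinguish -ftree-parallelize-loops=4 (needs libgomp) from =0 and =1 (no parallelization).

```c
#include <stdlib.h>
#include <string.h>

/* Return nonzero iff OPT has the form "-something=n" with n > LIM.
   Options without an '=' or without a parsable number compare false.
   (Illustration only; the real greater_than_spec_func operates on
   spec-language argv strings and asserts on malformed input.)  */
int opt_value_greater_than (const char *opt, long lim)
{
  const char *eq = strchr (opt, '=');
  if (eq == NULL)
    return 0;                       /* no "=n" part at all */

  char *end;
  long n = strtol (eq + 1, &end, 10);
  if (end == eq + 1)
    return 0;                       /* nothing numeric after '=' */

  return n > lim;
}
```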
Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t 0x7FFFFFFF)
On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote: Please find attached the patch PR25529.patch that converts the pattern (unsigned t * 2)/2 into unsigned t & 0x7FFF +/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFF. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (div (mult @0 INTEGER_CST@1) INTEGER_CST@1) You don't need to repeat INTEGER_CST, the second time @1 is enough. + (with { tree n2 = build_int_cst (TREE_TYPE (@0), + wi::exact_log2 (@1)); } + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); } + { n2; }) { n2; })) What happens if you write t*3/3? Huh, and you posted this patch twice? See my reply to the other copy for the correctness issues and better handling of exact_div. Richard. -- Marc Glisse
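Marc's t*3/3 question points at the correctness hole: the identity relies on the multiplier being a power of two, since only then does the bit wrapped away by the multiply correspond to a mask. A small demonstration in plain C, my own illustration using 32-bit unsigned (wrapping) arithmetic:

```c
#include <stdint.h>

/* (t * 2) / 2 in 32-bit unsigned arithmetic: the multiply wraps mod
   2^32, losing the top bit of t, so the result is t & 0x7FFFFFFF.  */
uint32_t mul2_div2 (uint32_t t)
{
  return (t * 2u) / 2u;
}

/* (t * 3) / 3 is NOT equivalent to masking: the multiply still wraps
   mod 2^32, but 3 is not a power of two, so the truncating division
   does not simply shift the wrapped bits back out.  */
uint32_t mul3_div3 (uint32_t t)
{
  return (t * 3u) / 3u;
}
```

For example, with t = 0x55555556 the product 3*t wraps to 2, and 2/3 is 0, nothing like t; so a pattern keyed only on "the two INTEGER_CSTs match" without a power-of-two check would miscompile.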
Re: [RFC] two-phase marking in gt_cleare_cache
On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com wrote: On 06/07/15 15:29, Richard Biener wrote: On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, Using the attached untested patch, I managed to minimize a test-case failure for PR 66714. The patch introduces two-phase marking in gt_cleare_cache: - first phase, it loops over all the hash table entries and removes those which are dead - second phase, it runs over all the live hash table entries and marks live items that are reachable from those live entries By doing so, we make the behaviour of gt_cleare_cache independent of the order in which the entries are visited, turning: - hard-to-trigger bugs which trigger for one visiting order but not for another, into - more easily triggered bugs which trigger for any visiting order. Any comments? I think it is only half-way correct in your proposed change. You only fix the issue for hashes of the same kind. To truly fix the issue you'd have to change generated code for gt_clear_caches () and provide a clearing-only implementation (or pass an operation-mode bool to the core worker in hash-table.h). [ Btw, we have been discussing a similar issue before: https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ] True, the problem exists at the scope of all variables marked with 'cache', and this patch addresses the problem only within a single variable. Hmm, and don't we rather want to first mark and _then_ clear? I. In favor of first clear and then mark: It allows for: - a lazy one phase implementation for !ENABLE_CHECKING where you do a single clear-or-mark phase (so the clear is lazy). - an eager two phase implementation for ENABLE_CHECKING (where the clear is eager) The approach of first a marking phase and then a clearing phase means you always have to do these two phases (you can't do the marking lazily). True.
First mark and then clear means the marking should be done iteratively. Each time you mark something live, another entry in another hash table could become live. Marking iteratively could become quite costly. I don't see this - marking is done recursively so if one entry makes another live and that makes another live the usual GC marking recursion will deal with this? II. In favor of first mark and then clear: The users of garbage collection will need to be less precise. Because if entry B in the hash is live and would keep A live then A _is_ kept in the end but you'll remove it from the hash, possibly no longer using a still live copy. I'm not sure I understand the scenario you're concerned about, but ... say we have - entry B: item B -> item A - entry A: item A -> item Z If you do clear first and mark second, and you start out with item B live and item A dead: - during the clearing phase you clear entry A and keep entry B, and - during the marking phase you mark item A live. So we no longer have entry A, but item A is kept and entry B is kept. Yes. This makes the cache weaker in that after this GC operation a lookup of A no longer succeeds but it still is there. The whole point of your patch was to make the behavior more predictable and in some way it succeeds (within a cache). As it is supposed to put more stress on the cache logic (it's ENABLE_CHECKING only) it makes sense to clear optimistically (after all it's a cache and not guaranteed to find a still live entry). It would be still nice to cover all caches together because as I remember we've mostly seen issues of caches interacting. Richard. Thanks, - Tom
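The two-phase scheme under discussion can be sketched with a toy cache (my own illustration, not GCC's generated gt_cleare_cache code): phase one drops every entry whose key died, phase two marks whatever the surviving entries reference. Because the phases are separate, the outcome no longer depends on the order in which entries are visited.

```c
#include <stddef.h>

struct item { int live; struct item *ref; };
struct entry { struct item *key; struct item *val; int present; };

/* Mark an item and everything reachable through its ref chain
   (a stand-in for the usual GC marking recursion).  */
static void mark_item (struct item *it)
{
  while (it && !it->live)
    {
      it->live = 1;
      it = it->ref;
    }
}

void clear_cache (struct entry *e, size_t n)
{
  /* Phase 1: remove all entries whose key is dead.  */
  for (size_t i = 0; i < n; i++)
    if (e[i].present && !e[i].key->live)
      e[i].present = 0;

  /* Phase 2: mark everything reachable from the surviving entries.  */
  for (size_t i = 0; i < n; i++)
    if (e[i].present)
      {
        mark_item (e[i].key);
        mark_item (e[i].val);
      }
}
```

Running Richard's scenario through this model (entry B: item B -> item A, entry A: item A -> item Z, with B live and A dead) gives exactly the discussed result: entry A is gone, entry B survives, item A is kept alive as B's value, and a later lookup of A misses even though A still exists.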
GCC 5.2.0 Status Report (2015-07-07), branch frozen
The GCC 5 branch is now frozen for the release of GCC 5.2, all changes require release manager approval from now on. I will shortly announce a first release candidate for GCC 5.2. Previous Report === https://gcc.gnu.org/ml/gcc/2015-06/msg00202.html
Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com wrote: -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, June 30, 2015 4:42 PM To: Ajit Kumar Agarwal Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com wrote: All: The patch below adds a new path splitting optimization pass on the SSA representation. The path splitting optimization moves the join block of an if-then-else that is the same as the loop latch into its predecessors, where it gets merged with them while preserving the SSA representation. The patch is tested for the MicroBlaze and i386 targets. The EEMBC/MiBench benchmarks were run with the MicroBlaze target, and a performance gain of 9.15% is seen on rgbcmy01_lite (EEMBC benchmarks). The DejaGnu tests were run for the MicroBlaze target; no regressions are seen and the new testcases attached pass. For i386, bootstrapping goes through fine and the SPEC CPU2000 benchmarks were run with this patch. The following observations were made with the SPEC CPU2000 benchmarks. Ratio with the path splitting change vs ratio without the path splitting change is 3653.353 vs 3652.14 for the INT benchmarks. Ratio with the path splitting change vs ratio without the path splitting change is 4353.812 vs 4345.351 for the FP benchmarks. Based on comments from the RFC patch, the following changes were done. 1. Added a new pass for the path splitting changes. 2. Placed the new path splitting optimization pass before the copy propagation pass. 3. The join block that is the same as the loop latch is wired into its predecessors so that the CFG cleanup pass will merge the blocks wired together. 4. The copy propagation routines added for the path splitting changes are not needed, as suggested by Jeff.
They are removed in the patch, as the copy propagation in the copied join blocks will be done by the existing copy propagation pass and the update-SSA pass. 5. Only the propagation of phi results of the join block with the phi argument is done, which will not be done by the existing update_ssa or copy propagation pass on the tree SSA representation. 6. Added 2 tests. a) compilation check tests. b) execution tests. 7. Refactored the code for the feasibility check and for finding the join block that is the same as the loop latch node. [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation. Added a new pass for path splitting on the tree SSA representation. The path splitting optimization does a CFG transformation in which the join block of the if-then-else that is the same as the loop latch node is moved and merged with the predecessor blocks while preserving the SSA representation. ChangeLog: 2015-06-30 Ajit Agarwal ajit...@xilinx.com * gcc/Makefile.in: Add the build of the new file tree-ssa-path-split.c * gcc/common.opt: Add the new flag ftree-path-split. * gcc/opts.c: Add an entry for the path splitting pass with optimization level greater than or equal to O2. * gcc/passes.def: Enable and add the new path splitting pass. * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT. * gcc/tree-pass.h: Extern declaration of make_pass_path_split. * gcc/tree-ssa-path-split.c: New file for the path splitting pass. * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase. * gcc/testsuite/gcc.dg/path-split-1.c: New testcase. I'm not 100% sure I understand the transform but from what I see from the testcases it tail-duplicates from a conditional up to a loop latch block (not sure if it includes it and thus ends up creating a loop nest or not). An observation I have is that the pass should at least share the transform stage to some extent with the existing tracer pass (tracer.c), which essentially does the same but not restricted to loops in any way.
The following piece of code from tracer.c can be shared with the existing path splitting pass. { e = find_edge (bb, bb2); copy = duplicate_block (bb2, e, bb); flush_pending_stmts (e); add_phi_args_after_copy (&copy, 1, NULL); } Sharing the above code of the transform stage of tracer.c with the path splitting pass has the following limitation. 1. The duplicated loop latch node is wired to its predecessors, and the existing phi node in the loop latch node, with the phi arguments from its corresponding predecessors, is moved to the duplicated loop latch node that is wired into its predecessors. Due to this, the duplicated loop latch nodes wired into their predecessors will not be merged with the original predecessors by the CFG cleanup phase. So I wonder if your pass could be simply another
[PATCH] Fix PR66739
The following fixes PR66739 - with conditionals not applying a single-use restriction usually causes some regressions. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-07-07 Richard Biener rguent...@suse.de PR middle-end/66739 * match.pd: Condition A - B ==/!= 0 -> A ==/!= B on single-use A - B. Index: gcc/match.pd === --- gcc/match.pd(revision 225453) +++ gcc/match.pd(working copy) @@ -1336,8 +1353,9 @@ (define_operator_list CBRT BUILT_IN_CBRT attempts to synthetize ABS_EXPR. */ (for cmp (eq ne) (simplify - (cmp (minus @0 @1) integer_zerop) - (cmp @0 @1))) + (cmp (minus@2 @0 @1) integer_zerop) + (if (single_use (@2)) + (cmp @0 @1 /* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the signed arithmetic case. That form is created by the compiler
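The underlying identity is easy to check in C: with wrapping unsigned arithmetic, (a - b) == 0 holds exactly when a == b. The patch keeps the fold but gates it on single_use, because a multiply-used A - B must be computed anyway, and comparing the already-computed difference against zero is then free. The sketch below is my own illustration of the two forms being equated:

```c
#include <stdint.h>

/* The form before the fold: compare the (wrapping) difference
   against zero.  */
int diff_is_zero (uint32_t a, uint32_t b)
{
  return (a - b) == 0;
}

/* The form after the fold: compare the operands directly.  */
int direct_eq (uint32_t a, uint32_t b)
{
  return a == b;
}
```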
Re: [PR25530] Convert (unsigned t / 2) * 2 into (unsigned t ~1)
On Tue, Jul 7, 2015 at 6:52 AM, Hurugalawadi, Naveen naveen.hurugalaw...@caviumnetworks.com wrote: Hi, Please find attached the patch PR25530.patch that converts the pattern (unsigned t / 2) * 2 into (unsigned t & ~1). Please review and let me know if it's okay. For EXACT_DIV fold-const.c has /* ((T) (X /[ex] C)) * C cancels out if the conversion is sign-changing only. */ if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg0) == EXACT_DIV_EXPR && operand_equal_p (arg1, TREE_OPERAND (arg0, 1), 0)) return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0)); we know the remainder is zero for EXACT_DIV. It also gives hints that a sign-changing conversion is ok. +/* Simplify (unsigned t / 2) * 2 -> unsigned t & ~1. */ +/* PR25530. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (mult (div @0 INTEGER_CST@1) INTEGER_CST@1) + (with { tree n2 = build_int_cst (TREE_TYPE (@0), + wi::exact_log2 (@1)); } + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (bit_and @0 (lshift (rshift { build_minus_one_cst (TREE_TYPE (@0)); } + { n2; }) { n2; })) you should move the (with inside the (if to save work if the type is not unsigned. Also you are using wi::exact_log2 without checking whether @1 was a power of two (I think exact_log2 returns -1 in this case). Then expressing ~1 with the result expression is really excessive - you should simply build this with @1 - 1 if @1 is a power of two. So please handle exact_div differently, like fold-const.c does. Also I am not sure ceil_div and floor_div can be handled this way. (5 /[ceil] 2) * 2 == 6 but you compute it as 4. So I am only convinced trunc_div works this way. Thanks, Richard. Regression tested on AArch64 and x86_64. Thanks, Naveen gcc/testsuite/ChangeLog: 2015-07-07 Naveen H.S naveen.hurugalaw...@caviumnetworks.com PR middle-end/25530 * gcc.dg/pr25530.c: New test.
gcc/ChangeLog: 2015-07-07 Naveen H.S naveen.hurugalaw...@caviumnetworks.com PR middle-end/25530 * match.pd (mult (div @0 INTEGER_CST@1) INTEGER_CST@1) : New simplifier.
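Richard's ceil_div counterexample is easy to reproduce in C (my own sketch; C's `/` is truncating division, so the ceiling variant is written out by hand):

```c
#include <stdint.h>

/* Truncating division: (t / 2) * 2 really is t & ~1 for unsigned t.  */
uint32_t trunc_div2_mul2 (uint32_t t)
{
  return (t / 2u) * 2u;
}

/* Ceiling division by 2: rounds up, so the round trip can exceed t.  */
uint32_t ceil_div2 (uint32_t t)
{
  return t / 2u + (t % 2u != 0);
}
```

With t = 5: the truncating form gives 4, which matches 5 & ~1; but (5 ceil-div 2) * 2 gives 6, so the mask rewrite is wrong for ceil_div (and similarly for the other rounding modes).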
[PATCH, i386]: Generate BT with immediate operand
Hello! After recent x86 EXTZ/EXTZV improvements, we can extend the BT splitters to generate the BT instruction with immediate operands. The improvement can be seen with the attached testcases. The benefit is obvious for BT with immediates 32 <= n <= 63: 0: 48 b8 00 00 00 00 00movabs $0x1000,%rax 7: 00 00 10 a: 48 85 c7test %rax,%rdi vs.: 0: 48 0f ba e7 3c bt $0x3c,%rdi The benefit with operands 0 <= n <= 31 is also noticeable: 0: f7 c7 00 04 00 00 test $0x400,%edi vs.: 0: 0f ba e7 0a bt $0xa,%edi BT has *slightly* higher latency than TEST (0.33 vs. 0.25 cycles on a modern processor), so I have limited the conversion to -Os in case the bit-test is in the low 32 bits. In addition to the 1556 BT %reg, %reg insns already present in the cc1 executable, the patched compiler generated an additional 628 BT $imm,%reg instructions in cc1. 2015-07-07 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*jcc_bt<mode>): Only split before reload. Remove operand constraints. Change operand 2 predicate to nonmemory operand. Limit const_int values to mode bitsize. Only allow const_int values less than 32 when optimizing for size. (*jcc_bt<mode>_1, *jcc_bt<mode>_mask): Only split before reload. Remove operand constraints. (*bt<mode>): Use SImode for const_int values less than 32. (regmode): Remove mode attribute. testsuite/ChangeLog: 2015-07-07 Uros Bizjak ubiz...@gmail.com * gcc.target/i386/bt-3.c: New test. * gcc.target/i386/bt-4.c: Ditto. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. I'll commit the patch to mainline as soon as the regression test ends. Uros.
Index: config/i386/i386.md === --- config/i386/i386.md (revision 225484) +++ config/i386/i386.md (working copy) @@ -10765,8 +10765,6 @@ DONE; }) -(define_mode_attr regmode [(SI "k") (DI "q")]) - (define_insn "*bt<mode>" [(set (reg:CCC FLAGS_REG) (compare:CCC @@ -10775,11 +10773,132 @@ (const_int 1) (match_operand:SI 1 "nonmemory_operand" "rN")) (const_int 0)))] - "TARGET_USE_BT || optimize_function_for_size_p (cfun)" - "bt{<imodesuffix>}\t{%<regmode>1, %0|%0, %<regmode>1}" + +{ + switch (get_attr_mode (insn)) +{ +case MODE_SI: + return "bt{l}\t{%1, %k0|%k0, %1}"; + +case MODE_DI: + return "bt{q}\t{%q1, %0|%0, %q1}"; + +default: + gcc_unreachable (); +} +} [(set_attr "type" "alu1") (set_attr "prefix_0f" "1") - (set_attr "mode" "<MODE>")]) + (set (attr "mode") + (if_then_else + (and (match_test "CONST_INT_P (operands[1])") + (match_test "INTVAL (operands[1]) < 32")) + (const_string "SI") + (const_string "<MODE>")))]) + +(define_insn_and_split "*jcc_bt<mode>" + [(set (pc) + (if_then_else (match_operator 0 "bt_comparison_operator" + [(zero_extract:SWI48 + (match_operand:SWI48 1 "register_operand") + (const_int 1) + (match_operand:SI 2 "nonmemory_operand")) +(const_int 0)]) + (label_ref (match_operand 3)) + (pc))) + (clobber (reg:CC FLAGS_REG))] + "(TARGET_USE_BT || optimize_function_for_size_p (cfun)) +&& (CONST_INT_P (operands[2]) + ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode) + && INTVAL (operands[2]) + >= (optimize_function_for_size_p (cfun) ?
0 : 32)) + : register_operand (operands[2], SImode)) +&& can_create_pseudo_p ()" + "#" + "&& 1" + [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extract:SWI48 + (match_dup 1) + (const_int 1) + (match_dup 2)) + (const_int 0))) + (set (pc) + (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)]) + (label_ref (match_dup 3)) + (pc)))] +{ + operands[0] = shallow_copy_rtx (operands[0]); + PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0]))); +}) + +(define_insn_and_split "*jcc_bt<mode>_1" + [(set (pc) + (if_then_else (match_operator 0 "bt_comparison_operator" + [(zero_extract:SWI48 + (match_operand:SWI48 1 "register_operand") + (const_int 1) + (zero_extend:SI +(match_operand:QI 2 "register_operand"))) +(const_int 0)]) + (label_ref (match_operand 3)) + (pc))) + (clobber (reg:CC FLAGS_REG))] + "(TARGET_USE_BT || optimize_function_for_size_p (cfun)) +&& can_create_pseudo_p ()" + "#" + "&& 1" + [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extract:SWI48 + (match_dup 1) + (const_int 1) + (match_dup 2)) + (const_int 0))) + (set (pc) + (if_then_else (match_op_dup 0 [(reg:CCC
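The C-level pattern these splitters target is a single-bit test of a register. The sketch below is my own, mirroring the shape of the new bt-3.c/bt-4.c tests rather than their exact contents: for bit positions 32..63 the mask no longer fits in a 32-bit immediate, so a movabs/test pair is replaced by a single bt; for positions 0..31 the replacement is only a size win, hence the -Os restriction discussed above.

```c
#include <stdint.h>

/* Bit 60 set?  The mask 1<<60 needs a 64-bit immediate, so without
   BT this costs a 10-byte movabs plus a test.  */
int bit60_set (uint64_t x)
{
  return (x & (1ULL << 60)) != 0;
}

/* Bit 10 set?  The mask fits in 32 bits; BT only helps code size.  */
int bit10_set (uint64_t x)
{
  return (x & (1ULL << 10)) != 0;
}
```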
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
On Thu, Jul 2, 2015 at 2:07 AM, Richard Earnshaw richard.earns...@foss.arm.com wrote: Not quite, ARM state still has more flexible addressing modes for unsigned byte loads than for signed byte loads. It's even worse with thumb1 where some signed loads have no single-register addressing mode (ie you have to copy zero into another register to use as an index before doing the load). I wasn't aware of the load address problem. That was something I hadn't considered, and will have to look at that. Load is just one instruction though. For most other instructions, a zero-extend results in less efficient code, because it then forces a sign-extend before a signed operation. The fact that parameters and locals are handled differently, which requires conversions when copying between them, results in more inefficient code. And changing TARGET_PROMOTE_FUNCTION_MODE is an ABI change, and hence would be unwise, so changing PROMOTE_MODE is the safer option. Consider this testcase extern signed short gs; short sub (void) { signed short s = gs; int i; for (i = 0; i < 10; i++) { s += 1; if (s > 10) break; } return s; } The inner loop ends up as .L3: adds r3, r3, #1 mov r0, r1 uxth r3, r3 sxth r2, r3 cmp r2, #10 bgt .L8 cmp r2, r1 bne .L3 bx lr We need the sign-extension for the compare. We need the zero-extension for the loop carried dependency. We have two extensions in every loop iteration, plus some extra register usage and register movement. We get better code for this example if we aren't forcing signed shorts to be zero-extended via PROMOTE_MODE. The lack of a reg+immediate address mode for ldrs[bh] in thumb1 does look like a problem though. But this means the difference between generating movs r2, #0 ldrsh r3, [r3, r2] with my patch, or ldrh r3, [r3] lsls r2, r3, #16 asrs r2, r2, #16 without my patch. It isn't clear which sequence is better. The sign-extends in the second sequence can sometimes be optimized away, and sometimes they can't be optimized away.
Similarly, in the first sequence, loading zero into a reg can sometimes be optimized, and sometimes it can't. There is also no guarantee that you get the first sequence with the patch or the second sequence without the patch. There is a splitter for ldrsh, so you can get the second pattern sometimes with the patch. Similarly, it might be possible to get the first pattern without the patch in some cases, though I don't have one at the moment. Jim
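What the uxth/sxth pair in the loop above computes can be written out in C (my own illustration): the loop-carried value is kept zero-extended because of PROMOTE_MODE, while the signed compare needs the sign-extended view, so both extensions execute on every iteration.

```c
#include <stdint.h>

/* uxth: zero-extend the low 16 bits (the PROMOTE_MODE representation
   of the short in a 32-bit register).  */
uint32_t uxth (uint32_t x)
{
  return x & 0xffffu;
}

/* sxth: sign-extend the low 16 bits (the view the signed compare
   "cmp r2, #10; bgt" actually needs).  */
int32_t sxth (uint32_t x)
{
  return (int32_t) (int16_t) (x & 0xffffu);
}
```

The two views only diverge when bit 15 is set, which is exactly why the compiler cannot drop either extension without knowing the value's range.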
[PATCH, committed] PR jit/66779: fix segfault
Fix a segfault where expr.c:fold_single_bit_test was segfaulting due to jit_langhook_type_for_mode not handling QImode. Tested with make check-jit; takes jit.sum from 8289 to 8434 passes. Committed to trunk as r225522. gcc/jit/ChangeLog: PR jit/66779 * dummy-frontend.c (jit_langhook_type_for_mode): Ensure that we handle modes QI, HI, SI, DI, TI. gcc/testsuite/ChangeLog: PR jit/66779 * jit.dg/all-non-failing-tests.h: Add test-pr66779.c. * jit.dg/test-pr66779.c: New testcase. --- gcc/jit/dummy-frontend.c | 11 +++ gcc/testsuite/jit.dg/all-non-failing-tests.h | 10 ++ gcc/testsuite/jit.dg/test-pr66779.c | 143 +++ 3 files changed, 164 insertions(+) create mode 100644 gcc/testsuite/jit.dg/test-pr66779.c diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c index 8001382..3ddab50 100644 --- a/gcc/jit/dummy-frontend.c +++ b/gcc/jit/dummy-frontend.c @@ -154,6 +154,17 @@ jit_langhook_type_for_mode (enum machine_mode mode, int unsignedp) if (mode == TYPE_MODE (double_type_node)) return double_type_node; + if (mode == TYPE_MODE (intQI_type_node)) +return unsignedp ? unsigned_intQI_type_node : intQI_type_node; + if (mode == TYPE_MODE (intHI_type_node)) +return unsignedp ? unsigned_intHI_type_node : intHI_type_node; + if (mode == TYPE_MODE (intSI_type_node)) +return unsignedp ? unsigned_intSI_type_node : intSI_type_node; + if (mode == TYPE_MODE (intDI_type_node)) +return unsignedp ? unsigned_intDI_type_node : intDI_type_node; + if (mode == TYPE_MODE (intTI_type_node)) +return unsignedp ? unsigned_intTI_type_node : intTI_type_node; + if (mode == TYPE_MODE (integer_type_node)) return unsignedp ? 
unsigned_type_node : integer_type_node; diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h b/gcc/testsuite/jit.dg/all-non-failing-tests.h index 21ff428..463eefb 100644 --- a/gcc/testsuite/jit.dg/all-non-failing-tests.h +++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h @@ -161,6 +161,13 @@ #undef create_code #undef verify_code +/* test-pr66779.c */ +#define create_code create_code_pr66779 +#define verify_code verify_code_pr66779 +#include "test-pr66779.c" +#undef create_code +#undef verify_code + /* test-reading-struct.c */ #define create_code create_code_reading_struct #define verify_code verify_code_reading_struct @@ -289,6 +296,9 @@ const struct testcase testcases[] = { {"pr66700_observing_write_through_ptr", create_code_pr66700_observing_write_through_ptr, verify_code_pr66700_observing_write_through_ptr}, + {"pr66779", + create_code_pr66779, + verify_code_pr66779}, {"reading_struct", create_code_reading_struct, verify_code_reading_struct}, diff --git a/gcc/testsuite/jit.dg/test-pr66779.c b/gcc/testsuite/jit.dg/test-pr66779.c new file mode 100644 index 000..ac5a72b --- /dev/null +++ b/gcc/testsuite/jit.dg/test-pr66779.c @@ -0,0 +1,143 @@ +#include <stdlib.h> +#include <stdio.h> + +#include "libgccjit.h" + +#include "harness.h" + +/* Reproducer for PR jit/66779. + + Inject the equivalent of: + T FUNCNAME (T i, T j, T k) + { + bool comp0 = i & 0x40; + bool comp1 = (j == k); + if (comp0 && comp1) +return 7; + else +return 22; + } + for some type T; this was segfaulting during the expansion to RTL + due to missing handling for some machine modes in + jit_langhook_type_for_mode.
*/ +void +create_fn (gcc_jit_context *ctxt, + const char *funcname, + enum gcc_jit_types jit_type) +{ + gcc_jit_type *the_type = +gcc_jit_context_get_type (ctxt, jit_type); + gcc_jit_type *t_bool = +gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_BOOL); + gcc_jit_param *param_i = +gcc_jit_context_new_param (ctxt, NULL, the_type, "i"); + gcc_jit_param *param_j = +gcc_jit_context_new_param (ctxt, NULL, the_type, "j"); + gcc_jit_param *param_k = +gcc_jit_context_new_param (ctxt, NULL, the_type, "k"); + gcc_jit_param *params[3] = { +param_i, +param_j, +param_k + }; + gcc_jit_function *func = +gcc_jit_context_new_function (ctxt, NULL, + GCC_JIT_FUNCTION_EXPORTED, + the_type, + funcname, + 3, params, + 0); + gcc_jit_block *b_entry = gcc_jit_function_new_block (func, "entry"); + gcc_jit_block *b_on_true = gcc_jit_function_new_block (func, "on_true"); + gcc_jit_block *b_on_false = gcc_jit_function_new_block (func, "on_false"); + + gcc_jit_lvalue *comp0 = +gcc_jit_function_new_local (func, NULL, t_bool, "comp0"); + + gcc_jit_block_add_assignment ( +b_entry, NULL, +comp0, +gcc_jit_context_new_comparison ( + ctxt, NULL, + GCC_JIT_COMPARISON_NE, + gcc_jit_context_new_binary_op ( + ctxt, NULL, + GCC_JIT_BINARY_OP_BITWISE_AND, + the_type, +
Re: Tests for libgomp based on OpenMP Examples 4.0.2.
On Tue, Jul 07, 2015 at 20:17:48 +0200, Jakub Jelinek wrote: On Tue, Jul 07, 2015 at 08:08:16PM +0300, Maxim Blumental wrote: Added 16 tests for simd construct and 10 for depend clause. Any new tests that aren't in Examples 4.0.* document should go one level higher, to libgomp.{c,c++,fortran}/ directly. Actually, the examples 4.0.2 document contains simd-* and task_dep-* tests, they are new in terms of examples-4 directory. -- Ilya
Re: [PATCH 1/3] [ARM] PR63870 NEON error messages
On 6 July 2015 at 11:18, Alan Lawrence alan.lawre...@arm.com wrote: I note some parts of this duplicate my https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has been pinged a couple of times. Both Charles' patch, and my two, contain parts the other does not... To resolve the conflicts, I suggest that Alan's patches should be applied as-is first, and I'll rebase mine afterwards. ...and... Further to that - the main difference/conflict between Charles' patch and mine looks to be that I added the const_tree parameter to the existing neon_lane_bounds method, whereas Charles' patch adds a new method arm_neon_lane_bounds. ... I'll clean up this duplication when I do.
[PATCH, committed] PR jit/66783: prevent use of opaque structs
Prevent use of opaque structs for fields, globals and locals. Tested with make check-jit; jit.sum goes from 8434 to 8494 passes. Committed to trunk as r225523. gcc/jit/ChangeLog: PR jit/66783 * jit-recording.h: Within namespace gcc::jit::recording... (type::has_known_size): New virtual function. (struct_::has_known_size): New function. * libgccjit.c (gcc_jit_context_new_field): Verify that the type has a known size. (gcc_jit_context_new_global): Likewise. (gcc_jit_function_new_local): Likewise. gcc/testsuite/ChangeLog: PR jit/66783 * jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c: New test case. * jit.dg/test-error-gcc_jit_context_new_global-opaque-struct.c: New test case. * jit.dg/test-error-gcc_jit_function_new_local-opaque-struct.c: New test case. * jit.dg/test-error-mismatching-types-in-call.c (create_code): Avoid using an opaque struct for local "f". --- gcc/jit/jit-recording.h| 3 ++ gcc/jit/libgccjit.c| 15 ...error-gcc_jit_context_new_field-opaque-struct.c | 31 ...rror-gcc_jit_context_new_global-opaque-struct.c | 32 + ...rror-gcc_jit_function_new_local-opaque-struct.c | 42 ++ .../jit.dg/test-error-mismatching-types-in-call.c | 2 +- 6 files changed, 124 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c create mode 100644 gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_global-opaque-struct.c create mode 100644 gcc/testsuite/jit.dg/test-error-gcc_jit_function_new_local-opaque-struct.c diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h index acd69e9..884304b 100644 --- a/gcc/jit/jit-recording.h +++ b/gcc/jit/jit-recording.h @@ -497,6 +497,7 @@ public: virtual type *is_pointer () = 0; virtual type *is_array () = 0; virtual bool is_void () const { return false; } + virtual bool has_known_size () const { return true; } bool is_numeric () const { @@ -795,6 +796,8 @@ public: type *is_pointer () { return NULL; } type *is_array () { return NULL; } + bool has_known_size () 
const { return m_fields != NULL; } + playback::compound_type * playback_compound_type () { diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 4d7dd8c..85d9f62 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -543,6 +543,11 @@ gcc_jit_context_new_field (gcc_jit_context *ctxt, /* LOC can be NULL. */ RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type"); RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name"); + RETURN_NULL_IF_FAIL_PRINTF1 ( +type->has_known_size (), +ctxt, loc, +"type has unknown size (type: %s)", +type->get_debug_string ()); return (gcc_jit_field *)ctxt->new_field (loc, type, name); } @@ -1033,6 +1038,11 @@ gcc_jit_context_new_global (gcc_jit_context *ctxt, kind); RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type"); RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name"); + RETURN_NULL_IF_FAIL_PRINTF1 ( +type->has_known_size (), +ctxt, loc, +"type has unknown size (type: %s)", +type->get_debug_string ()); return (gcc_jit_lvalue *)ctxt->new_global (loc, kind, type, name); } @@ -1829,6 +1839,11 @@ gcc_jit_function_new_local (gcc_jit_function *func, "Cannot add locals to an imported function"); RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type"); RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name"); + RETURN_NULL_IF_FAIL_PRINTF1 ( +type->has_known_size (), +ctxt, loc, +"type has unknown size (type: %s)", +type->get_debug_string ()); return (gcc_jit_lvalue *)func->new_local (loc, type, name); } diff --git a/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c new file mode 100644 index 000..c4e1448 --- /dev/null +++ b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c @@ -0,0 +1,31 @@ +#include <stdlib.h> +#include <stdio.h> + +#include "libgccjit.h" + +#include "harness.h" + +/* Try to put an opaque struct inside another struct + (or union); the API ought to complain. 
*/ + +void +create_code (gcc_jit_context *ctxt, void *user_data) +{ + gcc_jit_struct *t_opaque = +gcc_jit_context_new_opaque_struct (ctxt, NULL, "opaque"); + + (void)gcc_jit_context_new_field (ctxt, NULL, + gcc_jit_struct_as_type (t_opaque), + "f_opaque"); +} + +void +verify_code (gcc_jit_context *ctxt, gcc_jit_result *result) +{ + CHECK_VALUE (result, NULL); + + /* Verify that the correct error message was emitted. */ + CHECK_STRING_VALUE (gcc_jit_context_get_first_error (ctxt), + "gcc_jit_context_new_field: " + "type has unknown size (type:
Re: Tests for libgomp based on OpenMP Examples 4.0.2.
On Tue, Jul 07, 2015 at 08:08:16PM +0300, Maxim Blumental wrote: Comment on the patch: simd-5.f90 file is marked as xfail since the test fails because 'simd collapse' is an unsupported combination for Fortran (which though is valid in OpenMP API). I'll have a look, that is supposed to work. 2015-07-07 19:48 GMT+03:00 Maxim Blumental bvm...@gmail.com: With this letter I propose a patch with tests for libgomp based on OpenMP Examples 4.0.2 both for C and Fortran. The changes are: Renamed existing tests based on OpenMP Examples to make the names more clear. If anything, the test could be renamed to match https://github.com/OpenMP/Examples/tree/master/sources/ filenames, but certainly not to made up names. The Examples-4/ directory is supposed to only contain the tests from the 4.0.* examples document and no other tests. Added 16 tests for simd construct and 10 for depend clause. Any new tests that aren't in Examples 4.0.* document should go one level higher, to libgomp.{c,c++,fortran}/ directly. Jakub
Adjust -fdump-ada-spec to C++14 switch
The Ada side doesn't know what to do with the move constructors of C++11 so the attached patch makes -fdump-ada-spec skip them. Tested on x86_64-suse-linux, applied on the mainline as obvious. 2015-07-07 Eric Botcazou ebotca...@adacore.com c-family/ * c-ada-spec.h (cpp_operation): Add IS_MOVE_CONSTRUCTOR. * c-ada-spec.c (print_ada_declaration): Skip move constructors. cp/ * decl2.c (cpp_check): Deal with IS_MOVE_CONSTRUCTOR. 2015-07-07 Eric Botcazou ebotca...@adacore.com * g++.dg/other/dump-ada-spec-8.C: New test. -- Eric BotcazouIndex: c-family/c-ada-spec.h === --- c-family/c-ada-spec.h (revision 225410) +++ c-family/c-ada-spec.h (working copy) @@ -30,6 +30,7 @@ typedef enum { IS_CONSTRUCTOR, IS_DESTRUCTOR, IS_COPY_CONSTRUCTOR, + IS_MOVE_CONSTRUCTOR, IS_TEMPLATE, IS_TRIVIAL } cpp_operation; Index: c-family/c-ada-spec.c === --- c-family/c-ada-spec.c (revision 225410) +++ c-family/c-ada-spec.c (working copy) @@ -2891,6 +2891,7 @@ print_ada_declaration (pretty_printer *b bool is_constructor = false; bool is_destructor = false; bool is_copy_constructor = false; + bool is_move_constructor = false; if (!decl_name) return 0; @@ -2901,11 +2902,12 @@ print_ada_declaration (pretty_printer *b is_constructor = cpp_check (t, IS_CONSTRUCTOR); is_destructor = cpp_check (t, IS_DESTRUCTOR); is_copy_constructor = cpp_check (t, IS_COPY_CONSTRUCTOR); + is_move_constructor = cpp_check (t, IS_MOVE_CONSTRUCTOR); } - /* Skip copy constructors: some are internal only, and those that are - not cannot be called easily from Ada anyway. */ - if (is_copy_constructor) + /* Skip copy constructors and C++11 move constructors: some are internal + only and those that are not cannot be called easily from Ada. 
*/ + if (is_copy_constructor || is_move_constructor) return 0; if (is_constructor || is_destructor) Index: cp/decl2.c === --- cp/decl2.c (revision 225410) +++ cp/decl2.c (working copy) @@ -4077,6 +4077,8 @@ cpp_check (tree t, cpp_operation op) return DECL_DESTRUCTOR_P (t); case IS_COPY_CONSTRUCTOR: return DECL_COPY_CONSTRUCTOR_P (t); + case IS_MOVE_CONSTRUCTOR: + return DECL_MOVE_CONSTRUCTOR_P (t); case IS_TEMPLATE: return TREE_CODE (t) == TEMPLATE_DECL; case IS_TRIVIAL: /* { dg-do compile } */ /* { dg-options "-fdump-ada-spec" } */ template<class T, class U> class Generic_Array { Generic_Array(); }; template class Generic_Array<char, int>; /* { dg-final { scan-ada-spec-not "access Generic_Array" } } */ /* { dg-final { cleanup-ada-spec } } */
Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)
You are right, I forgot about that. Is there a mode one can use that changes depending on the target architecture (32-bit on 32-bit architectures and 64-bit on 64-bit architectures)? Yes, Pmode does exactly that, but you cannot use it directly in the MD file. Or does one have to add a 32-bit and a 64-bit variant of window_save? Sort of, you can use the P mode iterator, but the name of the pattern will vary so you'll need to adjust the callers. -- Eric Botcazou
RE: fix PR46029: reimplement if conversion of loads and stores
(if-conversion could directly generate masked load/stores of course and not use a scratch-pad at all in that case). IMO that`s a great idea, but I don`t know how to do it. Hints would be welcome. In particular, how does one generate masked load/stores at the GIMPLE level? But are we correctly handling these cases in the current if conversion code? I`m uncertain to what that is intended to refer, but I believe Sebastian would agree that the new if converter is safer than the old one in terms of correctness at the time of running the code being compiled. Abe's changes would seem like a step forward from a correctness standpoint Not to argue, but as a point of humility: Sebastian did by far the most work on this patch. I just modernized it and helped move it along. Even _that_ was done with Sebastian`s help. even if they take us a step backwards from a performance standpoint. For now, we have a few performance regressions, and so far we have found that it`s non-trivial to remove all of those regressions. We may be better off pushing the current patch to trunk and having the performance regressions currently introduced by the new if converter be fixed by later patches. Pushing to trunk gives us excellent visibility amongst GCC hackers, so the code will get more eyeballs than if it lingers in an uncommitted patch or in a branch. I, for one, would love some help in fixing these performance regressions. ;-) If fixing the performance regressions winds up taking too long, perhaps the current imperfect patch could be undone on trunk just before a release is tagged, and then we`ll push it in again when trunk goes back to being allowed to be unstable? According to my analysis of the data near the end of the page at https://gcc.gnu.org/develop.html, we have until roughly April of 2016 to work on not-yet-perfect patches in trunk. 
So the question is whether we get more non-vectorized if-converted code out of this (and thus whether we want to use --param allow-store-data-races to get the old code back which is nicer to less capable CPUs and probably faster than using scatter/gather or masked loads/stores). I do think conditionalizing some of this on the allow-store-data-races makes sense. I think having both the old if-converter and the new one live on in GCC is nontrivial, but not impossible. I also don`t think it`s the best long-term goal, but only a short-term workaround. In the long run, IMO there should be only one if converter, albeit perhaps with tweaking flags [e.g. -fallow-unsafe-if-conversion]. I also wonder if we should really care about load data races (not sure your patch does). According to a recent long discussion I had with Sebastian, our current patch does not have the flaw I was concerned it might have in terms of loads because: [1] the scratchpad is only being used to if-convert assignments to thread-local scalars, never to globals/statics, and because... [2] the gimplifier is supposed to detect the address of this scalar has been taken and when such is detected in the code being compiled, it causes the scalar to no longer look like a scalar in GIMPLE so that we are also safe from stale-data problems that could come from corner-case code that takes the address of a thread-local variable and gives that address to another thread [which then proceeds to overwrite the value in the supposedly-thread-local scalar that belongs to a different thread from the one doing the writing] Regards, Abe
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org wrote: On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law l...@redhat.com wrote: On 06/29/2015 07:15 PM, Jim Wilson wrote: So if these copies require a conversion, then isn't it fundamentally wrong to have a PHI node which copies between them? That would seem to implicate the eipa_sra pass as needing to be aware of these promotions and avoid having these objects with different representations appearing on the lhs/rhs of a PHI node. My years at Cisco didn't give me a chance to work on the SSA passes, so I don't know much about how they work. But looking at this, I see that PHI nodes are eventually handled by emit_partition_copy in tree-outof-ssa.c, which calls convert_to_mode, so it appears that conversions between different (closely related?) types are OK in a PHI node. The problem in this case is that we have the exact same type for the src and dest. The only difference is that the ARM forces sign-extension for signed sub-word parameters and zero-extension for signed sub-word locals. Thus to detect the need for a conversion, you have to have the decls, and we don't have them here. There is also the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr and friends, which aren't used by the cfglayout/tree-outof-ssa.c code for PHI nodes. So we need to copy some of the SUBREG_PROMOTED_* handling into cfglayout/tree-outof-ssa, or modify them to call expand_expr for PHI nodes, and I haven't had any luck getting that to work yet. I still need to learn more about the code to figure out if this is possible. It probably is. The decls for the parameter based SSA names are available, for the PHI destination there might be no decl. I also think that the ARM handling of PROMOTE_MODE is wrong. Treating a signed sub-word as unsigned can lead to inefficient code. This part of the problem is much easier for me to fix. It may be hard to convince ARM maintainers that it should be changed though. 
I need more time to work on this too. I haven't looked at trying to forbid the optimizer from creating PHI nodes connecting parameters to locals. That just sounds like a strange thing to forbid, and seems likely to result in worse code by disabling too many optimizations. But maybe it is the right solution if neither of the other two options work. This should only be done when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing the copy to require a conversion. I don't think disallowing such PHI nodes is the right thing to do. I'd rather expose the TARGET_PROMOTE_FUNCTION_MODE effect earlier by modifying the parameter types during, say, gimplification. Richard. Jim
Re: [patch 9/9] Final patch with all changes
On 07/07/2015 02:51 PM, Andrew MacLeod wrote: *** sel-sched-ir.h(revision 225452) --- sel-sched-ir.h(working copy) *** along with GCC; see the file COPYING3. *** 22,34 #define GCC_SEL_SCHED_IR_H /* For state_t. */ - #include insn-attr.h - #include regset.h /* For reg_note. */ - #include rtl.h - #include bitmap.h - #include sched-int.h - #include cfgloop.h Should probably drop those For state_t/reg_note. comments too. Thanks, Pedro Alves
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
On 06/29/2015 07:15 PM, Jim Wilson wrote: This is my suggested fix for PR 65932, which is a linux kernel miscompile with gcc-5.1. The problem here is caused by a chain of events. The first is that the relatively new eipa_sra pass creates fake parameters that behave slightly differently than normal parameters. The second is that the optimizer creates phi nodes that copy local variables to fake parameters and/or vice versa. The third is that the out-of-ssa pass assumes that it can emit simple move instructions for these phi nodes. And the fourth is that the ARM port has a PROMOTE_MODE macro that forces QImode and HImode to unsigned, but a TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and short parameters have different in-register representations than local variables, and require a conversion when copying between them, a conversion that the out-of-ssa pass can't easily emit. So if these copies require a conversion, then isn't it fundamentally wrong to have a PHI node which copies between them? That would seem to implicate the eipa_sra pass as needing to be aware of these promotions and avoid having these objects with different representations appearing on the lhs/rhs of a PHI node. Jeff
[gomp4] Handle deviceptr from an outer directive
Hi, This patch fixes an issue where the deviceptr clause in an outer directive was being ignored during implicit variable definition on a nested directive. Committed to gomp-4_0-branch. Jim diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 51aadc0..a721a52 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -116,6 +116,9 @@ enum gimplify_omp_var_data /* Gang-local OpenACC variable. */ GOVD_GANGLOCAL = (1 << 16), + /* OpenACC deviceptr clause. */ + GOVD_USE_DEVPTR = (1 << 17), + GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR | GOVD_LOCAL) @@ -6274,7 +6277,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, } break; } + flags = GOVD_MAP | GOVD_EXPLICIT; + if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR) + flags |= GOVD_USE_DEVPTR; goto do_add; case OMP_CLAUSE_DEPEND: @@ -6662,6 +6668,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) : (flags & GOVD_FORCE_MAP ? GOMP_MAP_FORCE_TOFROM : GOMP_MAP_TOFROM)); + if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST) { @@ -6687,7 +6694,17 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) OMP_CLAUSE_CHAIN (clause) = nc; } else - OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl); + { + if (gimplify_omp_ctxp->outer_context) + { + struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp->outer_context; + splay_tree_node on + = splay_tree_lookup (ctx->variables, (splay_tree_key) decl); + if (on && (on->value & GOVD_USE_DEVPTR)) + OMP_CLAUSE_SET_MAP_KIND (clause, GOMP_MAP_FORCE_PRESENT); + } + OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl); + } } if (code == OMP_CLAUSE_FIRSTPRIVATE && (flags & GOVD_LASTPRIVATE) != 0) {
Re: RE: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
Ping! On 30/04/15 10:40, Hale Wang wrote: -Original Message- From: Hale Wang [mailto:hale.w...@arm.com] Sent: Monday, February 09, 2015 9:54 AM To: Richard Earnshaw Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html. Ping for trunk. Is it ok for trunk now? Thanks, Hale -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Hale Wang Sent: Friday, December 12, 2014 9:36 AM To: gcc-patches Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6- m Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk? -Hale -Original Message- From: Joey Ye [mailto:joey.ye...@gmail.com] Sent: Thursday, November 27, 2014 10:01 AM To: Hale Wang Cc: gcc-patches Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m OK applying to arm/embedded-4_9-branch, though you still need maintainer approval into trunk. - Joey On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang hale.w...@arm.com wrote: Hi, This patch ports the aeabi_idiv routine from Linaro Cortex-Strings (https://git.linaro.org/toolchain/cortex-strings.git), which was contributed by ARM under Free BSD license. The new aeabi_idiv routine is used to replace the one in libgcc/config/arm/lib1funcs.S. This replacement happens within the Thumb1 wrapper. The new routine is under LGPLv3 license. The main advantage of this version is that it can improve the performance of the aeabi_idiv function for Thumb1. This solution will also increase the code size. So it will only be used if __OPTIMIZE_SIZE__ is not defined. Make check passed for armv6-m. OK for trunk? Thanks, Hale Wang libgcc/ChangeLog: 2014-11-26 Hale Wang hale.w...@arm.com * config/arm/lib1funcs.S: Add new wrapper. 
=== diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -306,34 +306,12 @@ LSYM(Lend_fde): #ifdef __ARM_EABI__ .macro THUMB_LDIV0 name signed #if defined(__ARM_ARCH_6M__) - .ifc \signed, unsigned - cmp r0, #0 - beq 1f - mov r0, #0 - mvn r0, r0 @ 0xffffffff -1: - .else - cmp r0, #0 - beq 2f - blt 3f + + push {r0, lr} mov r0, #0 - mvn r0, r0 - lsr r0, r0, #1 @ 0x7fffffff - b 2f -3: mov r0, #0x80 - lsl r0, r0, #24 @ 0x80000000 -2: - .endif - push {r0, r1, r2} - ldr r0, 4f - adr r1, 4f - add r0, r1 - str r0, [sp, #8] - @ We know we are not on armv4t, so pop pc is safe. - pop {r0, r1, pc} - .align 2 -4: - .word __aeabi_idiv0 - 4b + bl SYM(__aeabi_idiv0) + pop {r1, pc} + #elif defined(__thumb2__) .syntax unified .ifc \signed, unsigned @@ -927,7 +905,158 @@ LSYM(Lover7): add dividend, work .endif LSYM(Lgot_result): -.endm +.endm + +#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__) +.macro BranchToDiv n, label + lsr curbit, dividend, \n + cmp curbit, divisor + blo \label +.endm + +.macro DoDiv n + lsr curbit, dividend, \n + cmp curbit, divisor + bcc 1f + lsl curbit, divisor, \n + sub dividend, dividend, curbit + +1: adc result, result +.endm + +.macro THUMB1_Div_Positive + mov result, #0 + BranchToDiv #1, LSYM(Lthumb1_div1) + BranchToDiv #4, LSYM(Lthumb1_div4) + BranchToDiv #8, LSYM(Lthumb1_div8) + BranchToDiv #12, LSYM(Lthumb1_div12) + BranchToDiv #16, LSYM(Lthumb1_div16) +LSYM(Lthumb1_div_large_positive): + mov result, #0xff + lsl divisor, divisor, #8 + rev result, result + lsr curbit, dividend, #16 + cmp curbit, divisor + blo 1f + asr result, #8 + lsl divisor, divisor, #8 + beq LSYM(Ldivbyzero_waypoint) + +1: lsr curbit, dividend, #12 + cmp curbit, divisor + blo LSYM(Lthumb1_div12) + b LSYM(Lthumb1_div16) +LSYM(Lthumb1_div_loop): + lsr divisor, divisor, #8 +LSYM(Lthumb1_div16): + Dodiv #15 + Dodiv #14 + Dodiv #13 + Dodiv #12 +LSYM(Lthumb1_div12): + Dodiv 
#11 + Dodiv #10 + Dodiv #9 + Dodiv #8 + bcs LSYM(Lthumb1_div_loop) +LSYM(Lthumb1_div8): + Dodiv #7 + Dodiv #6 + Dodiv #5 +LSYM(Lthumb1_div5): + Dodiv #4 +LSYM(Lthumb1_div4): + Dodiv #3 +LSYM(Lthumb1_div3): + Dodiv #2 +LSYM(Lthumb1_div2): + Dodiv #1 +LSYM(Lthumb1_div1): + sub
Re: [patch 0/9] Flattening and initial module rebuilding
On 07/07/2015 07:40 AM, Andrew MacLeod wrote: This is a series of 9 patches which does some flattening, some module building, and some basic cleanups. I am presenting them as 9 patches for easier review. The latter couple of patches affect a lot of the same files that follow-on patches then adjust, so I've decided NOT to put the automated changes in with each of those patches. There are 8 patches showing the key changes, and then the 9th patch is an aggregate of the first 8 key changes, plus the final result of the impact on all the source files. This is the only patch I'd like to commit. The automated tools which generate the source changes have been significantly enhanced. When a header is flattened, the source file is checked for the existence of the headers which need moving, and any which are already present are left if they are in the right order. Any duplicates are also removed. A similar process is used when an aggregation file like backend.h or ssa.h is processed. Any occurrences of the aggregated headers are removed from the source file so there are no duplicates. The aggregated headers are typically only placed in a source file if 3 or more of the headers would be replaced. (ie, if only bitmap.h is included, I don't just blindly put backend.h in the file.) This number came from analysis of a fully flattened and include-reduced tree, and seemed to be the sweet spot. With the aggregation and flattening, the order of some includes can get shifted around with other files in between, so the tools also ensure there is a blessed order which will make sure that any pre-reqs are always available. Right now, it's primarily: config.h system.h coretypes.h backend.h tree.h gimple.h rtl.h df.h ssa.h And if any of the aggregators are not present, then any headers which make up the aggregator are in the same relative position. The tools actually produced all these patches with no tweaking to solve compilation failures, which was very helpful. 
The old ones needed some guidance and were a bit finicky. I can adjust any of this quite easily, or present them in a different way if you don't like it this way. Again, my goal is to check in just the final patch which does all the work of the first 8 patches. It would be a lot less turmoil on the branch. I can do it in smaller chunks if need be. The set of 9 patches is fine for the trunk. Just a few discussion points... One of the things I keep thinking about as these changes fly by is your scripts. Is there a reasonable possibility for you to add your scripts to the contrib/ directory or something similar to aid us in any future header file refactoring? Yes, I know that in theory we should never have to do this again, but I also know that reality can be rather different. Presumably the aggregators, by policy, are to have #includes and nothing else, right? If so, we might want a comment to that effect in them. It's a bit of a shame that function.h is in backend.h, along with predict (which is presumably needed by basic-block/cfg?). Jeff
Re: [patch 9/9] Final patch with all changes
On 07/07/2015 06:03 PM, Pedro Alves wrote: On 07/07/2015 02:51 PM, Andrew MacLeod wrote: *** sel-sched-ir.h (revision 225452) --- sel-sched-ir.h (working copy) *** along with GCC; see the file COPYING3. *** 22,34 #define GCC_SEL_SCHED_IR_H /* For state_t. */ - #include insn-attr.h - #include regset.h /* For reg_note. */ - #include rtl.h - #include bitmap.h - #include sched-int.h - #include cfgloop.h Should probably drop those For state_t/reg_note. comments too. Thanks, Pedro Alves Ah right. Previous version had them removed. Missed them when I rebuilt the patch. Thanks Andrew
Re: [PATCH 6/7] Fix DEMANGLE_COMPONENT_LOCAL_NAME
On 07/06/2015 01:39 PM, Mikhail Maltsev wrote: --- libiberty/cp-demangle.c | 7 +++ libiberty/testsuite/demangle-expected | 4 2 files changed, 11 insertions(+) diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index 424b1c5..289a704 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -3243,6 +3243,8 @@ d_expression_1 (struct d_info *di) struct demangle_component *left; struct demangle_component *right; + if (code == NULL) + return NULL; if (op_is_new_cast (op)) left = cplus_demangle_type (di); else @@ -4436,6 +4438,11 @@ d_print_comp_inner (struct d_print_info *dpi, int options, local_name = d_right (typed_name); if (local_name->type == DEMANGLE_COMPONENT_DEFAULT_ARG) local_name = local_name->u.s_unary_num.sub; + if (local_name == NULL) + { + d_print_error (dpi); + return; + } while (local_name->type == DEMANGLE_COMPONENT_RESTRICT_THIS || local_name->type == DEMANGLE_COMPONENT_VOLATILE_THIS || local_name->type == DEMANGLE_COMPONENT_CONST_THIS diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected index 2dbab14..cfa2691 100644 --- a/libiberty/testsuite/demangle-expected +++ b/libiberty/testsuite/demangle-expected @@ -4104,6 +4104,10 @@ _Z111 --format=gnu-v3 _ZDTtl _ZDTtl +# Check for NULL pointer when demangling DEMANGLE_COMPONENT_LOCAL_NAME +--format=gnu-v3 +_ZZN1fEEd_lEv +_ZZN1fEEd_lEv # # Ada (GNAT) tests. # Also OK with a suitable ChangeLog entry. jeff
Re: [PATCH 3/7] Fix trinary op
On 07/06/2015 01:34 PM, Mikhail Maltsev wrote: --- libiberty/cp-demangle.c | 4 +++- libiberty/testsuite/demangle-expected | 6 ++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index 12093cc..44a0a9b 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -3267,7 +3267,9 @@ d_expression_1 (struct d_info *di) struct demangle_component *second; struct demangle_component *third; - if (!strcmp (code, "qu")) + if (code == NULL) + return NULL; + else if (!strcmp (code, "qu")) { /* ?: expression. */ first = d_expression_1 (di); diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected index 6ea64ae..47ca8f5 100644 --- a/libiberty/testsuite/demangle-expected +++ b/libiberty/testsuite/demangle-expected @@ -4091,6 +4091,12 @@ void g1(A1, Bstatic_castbool(1)) _ZNKSt7complexIiE4realB5cxx11Ev std::complexint::real[abi:cxx11]() const # +# Some more crashes revealed by fuzz-testing: +# Check for NULL pointer when demangling trinary operators +--format=gnu-v3 +Av32_f +Av32_f +# # Ada (GNAT) tests. # # Simple test. OK with a suitable ChangeLog entry. And a generic question on the testsuite -- presumably it turns on type demangling? I wanted to verify the flow through d_expression_1 was what I expected it to be and it took a while to realize that c++filt doesn't demangle types by default, thus Av32_f would demangle to Av32_f without ever getting into d_expression_1. jeff
Re: [PATCH 4/7] Fix int overflow
On 07/06/2015 06:04 PM, Mikhail Maltsev wrote: On 07.07.2015 1:55, Jeff Law wrote: len = d_number (di); - if (len <= 0) + if (len <= 0 || len > INT_MAX) return NULL; ret = d_identifier (di, len); di->last_name = ret; Isn't this only helpful if sizeof (long) > sizeof (int)? Otherwise the compiler is going to eliminate that newly added test, right? So with that in mind, what happens on i686-unknown-linux with this test? Jeff Probably it should be fine, because the problem occurred when len became negative after implicit conversion to int (d_identifier does not check for negative length, but it does check that length does not exceed total string length). In this case (i.e. on ILP32 targets) len will not change sign after conversion to int (because it's a no-op). I'm not completely sure about compiler warnings, but AFAIR, in multilib build libiberty is also built for 32-bit target, and I did not get any additional warnings. You may need -Wtype-limits to see the warning. I'm not questioning whether or not the test will cause a problem, but instead questioning if the test does what you expect it to do on a 32bit host. On a host where sizeof (int) == sizeof (long), that len > INT_MAX test is always going to be false. If you want to do overflow testing, you have to compute len in a wider type. You might consider using long long or int64_t depending on the outcome of a configure test. Falling back to a simple long if the host compiler doesn't have long long or int64_t. Interesting exercise feeding those tests into demangle.com :-0 A suitably interested party might be able to exploit that overflow. jeff
Re: [PATCH 7/7] Fix several crashes in d_find_pack
On 07/06/2015 01:40 PM, Mikhail Maltsev wrote: --- libiberty/cp-demangle.c | 3 +++ libiberty/testsuite/demangle-expected | 12 ++++++++++++ 2 files changed, 15 insertions(+) diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index 289a704..4ca285e 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -4203,6 +4203,9 @@ d_find_pack (struct d_print_info *dpi, case DEMANGLE_COMPONENT_CHARACTER: case DEMANGLE_COMPONENT_FUNCTION_PARAM: case DEMANGLE_COMPONENT_UNNAMED_TYPE: +case DEMANGLE_COMPONENT_FIXED_TYPE: +case DEMANGLE_COMPONENT_DEFAULT_ARG: +case DEMANGLE_COMPONENT_NUMBER: return NULL; case DEMANGLE_COMPONENT_EXTENDED_OPERATOR: diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected index cfa2691..b58cea2 100644 --- a/libiberty/testsuite/demangle-expected +++ b/libiberty/testsuite/demangle-expected @@ -4108,6 +4108,18 @@ _ZDTtl --format=gnu-v3 _ZZN1fEEd_lEv _ZZN1fEEd_lEv +# Handle DEMANGLE_COMPONENT_FIXED_TYPE in d_find_pack +--format=gnu-v3 +DpDFT_ +DpDFT_ +# Likewise, DEMANGLE_COMPONENT_DEFAULT_ARG +--format=gnu-v3 +DpZ1fEd_ +DpZ1fEd_ +# Likewise, DEMANGLE_COMPONENT_NUMBER (??? result is probably still wrong) +--format=gnu-v3 +DpDv1_c +(char __vector(1))... # # Ada (GNAT) tests. # OK with a suitable ChangeLog entry. FWIW, demangler.com doesn't give any results for that case. It just returns DpDv1_c Jeff
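The structure of the fix is worth spelling out: d_find_pack walks the component tree with a switch over node kinds, and every leaf kind must explicitly return NULL, otherwise the walk falls through to code that expects child pointers the leaf does not have. A simplified sketch of that pattern; the enum and node type here are illustrative, not libiberty's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified component tree.  */
enum kind { K_NAME, K_NUMBER, K_PACK, K_BINARY };

struct node
{
  enum kind k;
  struct node *left, *right;   /* only meaningful for K_BINARY */
};

/* Recursive search for a pack node.  Each leaf kind must be listed in the
   "return NULL" group, which is what the patch adds for FIXED_TYPE,
   DEFAULT_ARG and NUMBER in the real d_find_pack.  */
static struct node *
find_pack (struct node *n)
{
  if (n == NULL)
    return NULL;
  switch (n->k)
    {
    case K_PACK:
      return n;                /* found the pack */
    case K_NAME:
    case K_NUMBER:             /* leaf kinds: stop here, no children */
      return NULL;
    case K_BINARY:
      {
        struct node *f = find_pack (n->left);
        return f ? f : find_pack (n->right);
      }
    }
  return NULL;
}
```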
Re: [patch 0/9] Flattening and initial module rebuilding
On 07/07/2015 06:21 PM, Jeff Law wrote: On 07/07/2015 07:40 AM, Andrew MacLeod wrote: I can adjust any of this quite easily, or present them in a different way if you don't like it this way. Again, my goal is to check in just the final patch which does all the work of the first 8 patches. It would be a lot less turmoil on the branch. I can do it in smaller chunks if need be. The set of 9 patches is fine for the trunk. Just a few discussion points... One of the things I keep thinking about as these changes fly by is your scripts. Is there a reasonable possibility for you to add your scripts to the contrib/ directory or something similar to aid us in any future header file refactoring? Yes, I know that in theory we should never have to do this again, but I also know that reality can be rather different. Yes, with a bit of tweaking and enhancement they can be generally useful. They are all in python. And no one is allowed to make comments like "OMG that's so inefficient" or "what a horrible way to do that" :-) My goal was getting things done, and sometimes the brute force approach works great when the machines are fast enough :-) Presumably the aggregators, by policy, are to have #includes and nothing else, right? If so, we might want a comment to that effect in them. Yeah, will do. It's a bit of a shame that function.h is in backend.h, along with predict (which is presumably needed by basic-block/cfg?). Yeah, once things settle down someone could tweak things more. If I make the tools available, people can do their own analysis and adjusting. function.h provides cfun, which is used all over the place. 9 backend header files use it, and a few like gimple.h actually require struct function to be defined. predict.h is actually required by gimple.h for a few reasons: enum br_predictor is used in parameter lists, and a few inlines use the TAKEN, NOT_TAKEN macros. It's also needed by cfghooks.h, and between those 2 files, it's just needed by a very good chunk of the backend. ...
219 of the 263 files which include backend.h need it. We could move the 2 enums and TAKEN/NOT_TAKEN to coretypes or something like that and it would probably cut the requirements for it by a *lot*. Andrew For the sake of amusement, here's the output from my initial include reduction logs for each file. Its basically all the unique errors produced by trying to remove the header from every source file in libbackend.a: predict.h: gimple_h : use of enum ‘br_predictor’ without previous declaration gimple_h : use of enum ‘prediction’ without previous declaration gimple_h : ‘TAKEN’ was not declared in this scope gimple_h : ‘NOT_TAKEN’ was not declared in this scope cfghooks_h : use of enum ‘br_predictor’ without previous declaration (1) : predict_h - cfghooks_h use of enum ‘br_predictor’ without previous declaration (4) : predict_h - gimple_h use of enum ‘br_predictor’ without previous declaration use of enum ‘prediction’ without previous declaration ‘TAKEN’ was not declared in this scope ‘NOT_TAKEN’ was not declared in this scope function.h: emit_rtl_h : field ‘expr’ has incomplete type ‘expr_status’ emit_rtl_h : field ‘emit’ has incomplete type ‘emit_status’ emit_rtl_h : field ‘varasm’ has incomplete type ‘varasm_status’ emit_rtl_h : field ‘subsections’ has incomplete type ‘function_subsections’ emit_rtl_h : field ‘eh’ has incomplete type ‘rtl_eh’ emit_rtl_h : invalid use of incomplete type ‘struct sequence_stack’ gimple_h : invalid use of incomplete type ‘struct function’ gimple_ssa_h : invalid use of incomplete type ‘const struct function’ gimple_ssa_h : ‘cfun’ was not declared in this scope tree_ssanames_h : ‘cfun’ was not declared in this scope cfgloop_h : ‘loops_for_fn’ was not declared in this scope cfgloop_h : ‘current_loops’ was not declared in this scope cfgloop_h : ‘cfun’ was not declared in this scope cilk_h : invalid use of incomplete type ‘struct function’ ssa_iterators_h : ‘cfun’ was not declared in this scope tree_scalar_evolution_h : ‘cfun’ was not 
declared in this scope (1) : function_h - tree_scalar_evolution_h ‘cfun’ was not declared in this scope (1) : function_h - tree_ssanames_h ‘cfun’ was not declared in this scope (1) : function_h - ssa_iterators_h ‘cfun’ was not declared in this scope (1) : function_h - cilk_h invalid use of incomplete type ‘struct function’ (1) : function_h - gimple_h invalid use of incomplete type ‘struct function’ (2) : function_h - gimple_ssa_h invalid use of incomplete type ‘const struct function’ ‘cfun’ was not declared in this scope (3) : function_h - cfgloop_h ‘loops_for_fn’ was not declared in this scope ‘current_loops’ was not declared in this scope ‘cfun’ was not declared in this scope (6) : function_h - emit_rtl_h field ‘expr’ has incomplete type
RE: [PATCH] MIPS: fix failing branch range checks for micromips
Hi Andrew, -Original Message- From: Andrew Bennett [mailto:andrew.benn...@imgtec.com] Sent: Tuesday, July 07, 2015 12:13 PM To: Moore, Catherine; gcc-patches@gcc.gnu.org Cc: Matthew Fortune Subject: RE: [PATCH] MIPS: fix failing branch range checks for micromips Ok to commit? testsuite/ * gcc.target/mips/branch-2.c: Change NOMIPS16 to NOCOMPRESSION. * gcc.target/mips/branch-3.c: Ditto * gcc.target/mips/branch-4.c: Ditto. * gcc.target/mips/branch-5.c: Ditto. * gcc.target/mips/branch-6.c: Ditto. * gcc.target/mips/branch-7.c: Ditto. * gcc.target/mips/branch-8.c: Ditto. * gcc.target/mips/branch-9.c: Ditto. * gcc.target/mips/branch-10.c: Ditto. * gcc.target/mips/branch-11.c: Ditto. * gcc.target/mips/branch-12.c: Ditto. * gcc.target/mips/branch-13.c: Ditto. These are OK, except for the splitting of the scan-assembler statements. Please change occurrences of: +/* { dg-final { scan-assembler +\tld\t\\\$1,%got_page\\(\[^)\]*\\)\\(\\\$3\\)\\n } } */ to: +/* { dg-final { scan-assembler \tld\t\\\$1,%got_page\\(\[^)\]*\\)\\(\\\$3\\)\\n } } */ before committing. * gcc.target/mips/branch-14.c: Ditto. * gcc.target/mips/branch-15.c: Ditto. The modifications for these two files need to be removed. These are execution tests and the multilib that is used to link them is important. If the libraries are not compatible with the NOCOMPRESSION attribute, then the link step will fail. You could work around this problem by enabling interlinking, but I think the best approach is to leave these two tests alone. * gcc.target/mips/umips-branch-5.c: New file. * gcc.target/mips/umips-branch-6.c: New file. * gcc.target/mips/umips-branch-7.c: New file. * gcc.target/mips/umips-branch-8.c: New file. * gcc.target/mips/umips-branch-9.c: New file. * gcc.target/mips/umips-branch-10.c: New file. * gcc.target/mips/umips-branch-11.c: New file. * gcc.target/mips/umips-branch-12.c: New file. * gcc.target/mips/umips-branch-13.c: New file. * gcc.target/mips/umips-branch-14.c: New file. 
* gcc.target/mips/umips-branch-15.c: New file. * gcc.target/mips/umips-branch-16.c: New file. Same comment as above on the scan-assembler statements. * gcc.target/mips/umips-branch-17.c: New file. * gcc.target/mips/umips-branch-18.c: New file. These two tests suffer from the same problem as above. They should be deleted altogether. * gcc.target/mips/branch-helper.h (OCCUPY_0x1): New define. (OCCUPY_0xfffc): New define. This is okay. Thanks, Catherine diff --git a/gcc/testsuite/gcc.target/mips/branch-10.c b/gcc/testsuite/gcc.target/mips/branch-10.c index e2b1b5f..eb21c16 100644 --- a/gcc/testsuite/gcc.target/mips/branch-10.c +++ b/gcc/testsuite/gcc.target/mips/branch-10.c @@ -4,7 +4,7 @@ #include branch-helper.h -NOMIPS16 void +NOCOMPRESSION void foo (int (*bar) (void), int *x) { *x = bar (); diff --git a/gcc/testsuite/gcc.target/mips/branch-11.c b/gcc/testsuite/gcc.target/mips/branch-11.c index 962eb1b..bd8e834 100644 --- a/gcc/testsuite/gcc.target/mips/branch-11.c +++ b/gcc/testsuite/gcc.target/mips/branch-11.c @@ -8,7 +8,7 @@ #include branch-helper.h -NOMIPS16 void +NOCOMPRESSION void foo (int (*bar) (void), int *x) { *x = bar (); diff --git a/gcc/testsuite/gcc.target/mips/branch-12.c b/gcc/testsuite/gcc.target/mips/branch-12.c index 4aef160..4944634 100644 --- a/gcc/testsuite/gcc.target/mips/branch-12.c +++ b/gcc/testsuite/gcc.target/mips/branch-12.c @@ -4,7 +4,7 @@ #include branch-helper.h -NOMIPS16 void +NOCOMPRESSION void foo (int (*bar) (void), int *x) { *x = bar (); diff --git a/gcc/testsuite/gcc.target/mips/branch-13.c b/gcc/testsuite/gcc.target/mips/branch-13.c index 8a6fb04..f5269b9 100644 --- a/gcc/testsuite/gcc.target/mips/branch-13.c +++ b/gcc/testsuite/gcc.target/mips/branch-13.c @@ -8,7 +8,7 @@ #include branch-helper.h -NOMIPS16 void +NOCOMPRESSION void foo (int (*bar) (void), int *x) { *x = bar (); diff --git a/gcc/testsuite/gcc.target/mips/branch-14.c b/gcc/testsuite/gcc.target/mips/branch-14.c index 026417e..c2eecc3 100644 --- 
a/gcc/testsuite/gcc.target/mips/branch-14.c +++ b/gcc/testsuite/gcc.target/mips/branch-14.c @@ -4,14 +4,14 @@ #include branch-helper.h void __attribute__((noinline)) -foo (volatile int *x) +NOCOMPRESSION foo (volatile int *x) { if (__builtin_expect (*x == 0, 1)) OCCUPY_0x1fff8; } int -main (void) +NOCOMPRESSION main (void) { int x = 0; int y = 1; diff --git a/gcc/testsuite/gcc.target/mips/branch-15.c b/gcc/testsuite/gcc.target/mips/branch-15.c index dee7a05..89e25f3 100644 --- a/gcc/testsuite/gcc.target/mips/branch-15.c +++ b/gcc/testsuite/gcc.target/mips/branch-15.c @@ -4,14 +4,14 @@ #include branch-helper.h void -foo
Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real
On 07/07/2015 06:37 AM, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from previous patch. 15_native_interpret_real.patch commit e2e7ca148960a82fc88128820f17e7cbd14173cb Author: Alan Lawrence <alan.lawre...@arm.com> Date: Thu Apr 9 10:54:40 2015 +0100 Fix native_interpret_real for HFmode floats on Bigendian with UNITS_PER_WORD=4 (with missing space) OK with ChangeLog in proper form. jeff
[PATCH] libgomp: Introduce gomp_thread::spare_team
Try to re-use the previous team to avoid the use of malloc() and free() in the normal case where the number of threads is the same. Avoid superfluous destruction and initialization of team synchronization objects. Using the microbenchmark posted here https://gcc.gnu.org/ml/gcc-patches/2008-03/msg00930.html shows an improvement in the parallel bench test case (target x86_64-unknown-linux-gnu, median out of 9 test runs, iteration count increased to 20). Before the patch: parallel bench 11.2284 seconds. After the patch: parallel bench 10.7575 seconds. libgomp/ChangeLog 2015-07-07 Sebastian Huber <sebastian.hu...@embedded-brains.de> * libgomp.h (gomp_thread): Add spare_team field. * team.c (gomp_thread_start): Initialize spare team for non-TLS targets. (gomp_new_team): Use spare team if possible. (free_team): Destroy more team objects. (gomp_free_thread): Free spare team if necessary. (free_non_nested_team): New. (gomp_team_end): Move some team object destructions to free_team(). Use free_non_nested_team().
--- libgomp/libgomp.h | 3 +++ libgomp/team.c | 63 --- 2 files changed, 45 insertions(+), 21 deletions(-) diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 5ed0f78..563c1e2 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -448,6 +448,9 @@ struct gomp_thread /* User pthread thread pool */ struct gomp_thread_pool *thread_pool; + + /* Spare team ready for re-use in gomp_new_team() */ + struct gomp_team *spare_team; }; diff --git a/libgomp/team.c b/libgomp/team.c index b98b233..cc19eb0 100644 --- a/libgomp/team.c +++ b/libgomp/team.c @@ -77,6 +77,7 @@ gomp_thread_start (void *xdata) struct gomp_thread local_thr; thr = &local_thr; pthread_setspecific (gomp_tls_key, thr); + thr->spare_team = NULL; #endif gomp_sem_init (&thr->release, 0); @@ -140,19 +141,35 @@ gomp_thread_start (void *xdata) struct gomp_team * gomp_new_team (unsigned nthreads) { + struct gomp_thread *thr = gomp_thread (); + struct gomp_team *spare_team = thr->spare_team; struct gomp_team *team; - size_t size; int i; - size = sizeof (*team) + nthreads * (sizeof (team->ordered_release[0]) - + sizeof (team->implicit_task[0])); - team = gomp_malloc (size); + if (spare_team && spare_team->nthreads == nthreads) + { + thr->spare_team = NULL; + team = spare_team; + } + else + { + size_t extra = sizeof (team->ordered_release[0]) + + sizeof (team->implicit_task[0]); + team = gomp_malloc (sizeof (*team) + nthreads * extra); + +#ifndef HAVE_SYNC_BUILTINS + gomp_mutex_init (&team->work_share_list_free_lock); +#endif + gomp_barrier_init (&team->barrier, nthreads); + gomp_sem_init (&team->master_release, 0); + gomp_mutex_init (&team->task_lock); + + team->nthreads = nthreads; + } team->work_share_chunk = 8; #ifdef HAVE_SYNC_BUILTINS team->single_count = 0; -#else - gomp_mutex_init (&team->work_share_list_free_lock); #endif team->work_shares_to_free = &team->work_shares[0]; gomp_init_work_share (&team->work_shares[0], false, nthreads); @@ -163,14 +180,9 @@ gomp_new_team (unsigned nthreads) team->work_shares[i].next_free = &team->work_shares[i + 1]; team->work_shares[i].next_free = NULL; - team->nthreads = nthreads; - gomp_barrier_init (&team->barrier, nthreads); - - gomp_sem_init (&team->master_release, 0); team->ordered_release = (void *) &team->implicit_task[nthreads]; team->ordered_release[0] = &team->master_release; - gomp_mutex_init (&team->task_lock); team->task_queue = NULL; team->task_count = 0; team->task_queued_count = 0; @@ -187,6 +199,10 @@ gomp_new_team (unsigned nthreads) static void free_team (struct gomp_team *team) { + gomp_sem_destroy (&team->master_release); +#ifndef HAVE_SYNC_BUILTINS + gomp_mutex_destroy (&team->work_share_list_free_lock); +#endif gomp_barrier_destroy (&team->barrier); gomp_mutex_destroy (&team->task_lock); free (team); @@ -225,6 +241,8 @@ gomp_free_thread (void *arg __attribute__((unused))) { struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = thr->thread_pool; + if (thr->spare_team) + free_team (thr->spare_team); if (pool) { if (pool->threads_used > 0) @@ -835,6 +853,18 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads, free (affinity_thr); } +static void +free_non_nested_team (struct gomp_team *team, struct gomp_thread *thr) +{ + struct gomp_thread_pool *pool = thr->thread_pool; + if (pool->last_team) + { + if (thr->spare_team) + free_team (thr->spare_team); + thr->spare_team = pool->last_team; + } + pool->last_team = team; +} /* Terminate the current team. This is only to be called by the master thread. We assume that we must wait for the other threads. */ @@ -894,21 +924,12 @@ gomp_team_end (void)
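The core idea of the patch, stripped of libgomp's details, is a one-slot cache of the most recently freed object: hand it back when the next request matches, skipping both malloc/free and the re-initialization of embedded synchronization objects. A minimal sketch with hypothetical names; unlike libgomp, which keys the spare off the per-thread gomp_thread, this version uses a single global slot and is not thread-safe:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for gomp_team: only the field the cache keys on.  */
struct team
{
  unsigned nthreads;
  /* ... plus synchronization objects whose init/destroy we want to skip */
};

static struct team *spare;   /* at most one cached team */

static struct team *
team_new (unsigned nthreads)
{
  if (spare && spare->nthreads == nthreads)
    {
      struct team *t = spare;
      spare = NULL;            /* reuse: no malloc, no re-initialization */
      return t;
    }
  struct team *t = malloc (sizeof *t);   /* gomp_malloc aborts on failure */
  t->nthreads = nthreads;                /* one-time initialization */
  return t;
}

static void
team_end (struct team *t)
{
  if (spare)
    free (spare);              /* keep at most one spare team */
  spare = t;
}
```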
Re: [PATCH 1/3] [ARM] PR63870 NEON error messages
Alan Lawrence wrote: I note some parts of this duplicate my https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has been pinged a couple of times. Both Charles' patch and my two contain parts the other does not... Cheers, Alan Charles Baylis wrote: gcc/ChangeLog: DATE Charles Baylis <charles.bay...@linaro.org> * config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerators qualifier_lane_index, qualifier_struct_load_store_lane_index. (arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON argument qualifiers. (arm_expand_neon_builtin): Handle NEON argument qualifiers. * config/arm/arm-protos.h (arm_neon_lane_bounds): New prototype. * config/arm/arm.c (arm_neon_lane_bounds): New function. Further to that - the main difference/conflict between Charles' patch and mine looks to be that I added the const_tree parameter to the existing neon_lane_bounds method, whereas Charles' patch adds a new method, arm_neon_lane_bounds. --Alan
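Both patches centre on the same primitive: validating that a lane index passed to an intrinsic lies within the vector's bounds. A trivial standalone sketch of the check; the real (arm_)neon_lane_bounds reports out-of-range lanes through GCC's error machinery rather than returning a flag:

```c
#include <assert.h>

/* A lane index for a vector intrinsic must lie in the half-open range
   [low, high), where high is the element count of the vector type.
   Out-of-range indices should produce a diagnostic at compile time
   rather than silently generating bad code.  */
static int
lane_in_bounds (int lane, int low, int high)
{
  return lane >= low && lane < high;
}
```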
[PATCH 6/16][ARM] Remaining float16 intrinsics: vld..., vst..., vget_low/high, vcombine
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01341.html commit ae6264b144d25fadcbf219e68ddf3d8c5f40be34 Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Dec 11 11:53:59 2014 + ARM 4/4 v2: v(ld|st)[234](q?|_lane|_dup), vcombine, vget_(low|high) (v2 w/ V_uf_sclr) All are tied together with so many iterators! Also vec_extract diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 17e39d8..1ee0a3d 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -241,6 +241,12 @@ typedef struct { #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \ VAR9 (T, N, A, B, C, D, E, F, G, H, I) \ VAR1 (T, N, J) +#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \ + VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) \ + VAR1 (T, N, K) +#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \ + VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \ + VAR1 (T, N, L) /* The NEON builtin data can be found in arm_neon_builtins.def. The mode entries in the following table correspond to the key type of the diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index db73c70..93fb44f 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -162,6 +162,16 @@ typedef struct uint64x2x2_t uint64x2_t val[2]; } uint64x2x2_t; +typedef struct float16x4x2_t +{ + float16x4_t val[2]; +} float16x4x2_t; + +typedef struct float16x8x2_t +{ + float16x8_t val[2]; +} float16x8x2_t; + typedef struct float32x2x2_t { float32x2_t val[2]; @@ -288,6 +298,16 @@ typedef struct uint64x2x3_t uint64x2_t val[3]; } uint64x2x3_t; +typedef struct float16x4x3_t +{ + float16x4_t val[3]; +} float16x4x3_t; + +typedef struct float16x8x3_t +{ + float16x8_t val[3]; +} float16x8x3_t; + typedef struct float32x2x3_t { float32x2_t val[3]; @@ -414,6 +434,16 @@ typedef struct uint64x2x4_t uint64x2_t val[4]; } uint64x2x4_t; +typedef struct float16x4x4_t +{ + float16x4_t val[4]; +} float16x4x4_t; + +typedef struct float16x8x4_t +{ + float16x8_t val[4]; +} 
float16x8x4_t; + typedef struct float32x2x4_t { float32x2_t val[4]; @@ -6031,6 +6061,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b) return (int64x2_t)__builtin_neon_vcombinedi (__a, __b); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcombine_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_neon_vcombinev4hf (__a, __b); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vcombine_f32 (float32x2_t __a, float32x2_t __b) { @@ -6105,6 +6141,12 @@ vget_high_s64 (int64x2_t __a) return (int64x1_t)__builtin_neon_vget_highv2di (__a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vget_high_f16 (float16x8_t __a) +{ + return __builtin_neon_vget_highv8hf (__a); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vget_high_f32 (float32x4_t __a) { @@ -6165,6 +6207,12 @@ vget_low_s32 (int32x4_t __a) return (int32x2_t)__builtin_neon_vget_lowv4si (__a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vget_low_f16 (float16x8_t __a) +{ + return __builtin_neon_vget_lowv8hf (__a); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vget_low_f32 (float32x4_t __a) { @@ -8712,6 +8760,12 @@ vld1_s64 (const int64_t * __a) return (int64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_f16 (const float16_t * __a) +{ + return __builtin_neon_vld1v4hf ((const __builtin_neon_hf *) __a); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vld1_f32 (const float32_t * __a) { @@ -8786,6 +8840,12 @@ vld1q_s64 (const int64_t * __a) return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vld1q_f16 (const float16_t * __a) +{ + return __builtin_neon_vld1v8hf ((const 
__builtin_neon_hf *) __a); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vld1q_f32 (const float32_t * __a) { @@ -9183,6 +9243,12 @@ vst1_s64 (int64_t * __a, int64x1_t __b) } __extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_f16 (float16_t * __a, float16x4_t __b) +{ + __builtin_neon_vst1v4hf ((__builtin_neon_hf *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) vst1_f32 (float32_t * __a, float32x2_t __b) { __builtin_neon_vst1v2sf ((__builtin_neon_sf *) __a, __b); @@ -9257,6 +9323,12 @@ vst1q_s64 (int64_t * __a, int64x2_t __b) } __extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_f16 (float16_t * __a, float16x8_t __b) +{ + __builtin_neon_vst1v8hf ((__builtin_neon_hf *) __a, __b); +} + +__extension__ static __inline
[PATCH 5/16][ARM] Add float16x8_t intrinsics
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01337.html commit 336eb16d3061131fe8d28fad4a473d00768bfe5c Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Dec 9 15:06:38 2014 + ARM float16x8_t intrinsics (v2 - fix v[sg]etq_lane_f16, add vreinterpretq_p16_f16, no vdup_n/lane/vmov_n) diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index a958f63..db73c70 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -5282,6 +5282,15 @@ vgetq_lane_s32 (int32x4_t __a, const int __b) return (int32_t)__builtin_neon_vget_lanev4si (__a, __b); } +#define vgetq_lane_f16(__v, __idx) \ + __extension__\ +({ \ + float16x8_t __vec = (__v); \ + __builtin_arm_lane_check (8, __idx); \ + float16_t __res = __vec[__idx]; \ + __res; \ +}) + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vgetq_lane_f32 (float32x4_t __a, const int __b) { @@ -5424,6 +5433,16 @@ vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __c) return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, __b, __c); } +#define vsetq_lane_f16(__e, __v, __idx)\ + __extension__\ +({ \ + float16_t __elem = (__e);\ + float16x8_t __vec = (__v); \ + __builtin_arm_lane_check (8, __idx); \ + __vec[__idx] = __elem; \ + __vec; \ +}) + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c) { @@ -8907,6 +8926,12 @@ vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, const int __c) return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, __b, __c); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vld1q_lane_f16 (const float16_t * __a, float16x8_t __b, const int __c) +{ + return vsetq_lane_f16 (*__a, __b, __c); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c) { @@ -9062,6 +9087,13 @@ vld1q_dup_s32 (const 
int32_t * __a) return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vld1q_dup_f16 (const float16_t * __a) +{ + float16_t __f = *__a; + return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f }; +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vld1q_dup_f32 (const float32_t * __a) { @@ -12856,6 +12888,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a) } __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_f16 (float16x8_t __a) +{ + return (poly8x16_t) __a; +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) vreinterpretq_p8_f32 (float32x4_t __a) { return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a); @@ -12932,6 +12970,12 @@ vreinterpretq_p16_p8 (poly8x16_t __a) } __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_f16 (float16x8_t __a) +{ + return (poly16x8_t) __a; +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) vreinterpretq_p16_f32 (float32x4_t __a) { return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a); @@ -13001,6 +13045,88 @@ vreinterpretq_p16_u32 (uint32x4_t __a) return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p8 (poly8x16_t __a) +{ + return (float16x8_t) __a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p16 (poly16x8_t __a) +{ + return (float16x8_t) __a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_f32 (float32x4_t __a) +{ + return (float16x8_t) __a; +} + +#ifdef __ARM_FEATURE_CRYPTO +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p64 (poly64x2_t __a) +{ + return 
(float16x8_t) __a; +} + +#endif +#ifdef __ARM_FEATURE_CRYPTO +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p128 (poly128_t __a) +{ + return (float16x8_t) __a; +} + +#endif +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_s64 (int64x2_t __a) +{ + return (float16x8_t) __a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_u64 (uint64x2_t __a) +{ + return
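All of these vreinterpretq_* functions compile to nothing: they only relabel the existing 128-bit pattern with a new element type, leaving the bytes untouched. The scalar analogue of that contract can be sketched with the memcpy idiom, which is the strict-aliasing-safe way to reinterpret bits in plain C:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bit pattern of a float as a uint32_t, without any
   conversion: the same bytes, viewed as a different type.  This is what
   a vreinterpret-style intrinsic does for whole vector registers.  */
static uint32_t
reinterpret_f32_u32 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);   /* compilers fold this to a plain move */
  return u;
}
```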
[PATCH 7/16][AArch64] Add basic fp16 support
Same as https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01340.html except that two of the tests have been moved into the next patch. (The remaining test is AArch64 only.) gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New. (aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16. * config/aarch64/aarch64-modes.def: Add HFmode. * config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define __ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP. * config/aarch64/aarch64.c (aarch64_init_libfuncs, aarch64_promoted_type): New. (aarch64_float_const_representable_p): Disable HFmode. (aarch64_mangle_type): Mangle half-precision floats to "Dh". (TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type. (TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs. * config/aarch64/aarch64.md (mov<mode>): Include HFmode using GPF_F16. (movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New. * config/aarch64/iterators.md (GPF_F16): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/f16_movs_1.c: New test. commit 989af1492bbf268be1ecfae06f3303b90ae514c8 Author: Alan Lawrence <alan.lawre...@arm.com> Date: Tue Dec 2 12:57:39 2014 +0000 AArch64 1/6: Basic HFmode support (less tests), aarch64_fp16_type_node, patterns, mangling, predefines. No --fp16-format option. Disable constants as NYI. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index ec60955..cfb2dc1 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -439,6 +439,9 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = { }; #undef ENTRY +/* This type is not SIMD-specific; it is the user-visible __fp16.
*/ +static tree aarch64_fp16_type_node = NULL_TREE; + static tree aarch64_simd_intOI_type_node = NULL_TREE; static tree aarch64_simd_intEI_type_node = NULL_TREE; static tree aarch64_simd_intCI_type_node = NULL_TREE; @@ -849,6 +852,12 @@ aarch64_init_builtins (void) = add_builtin_function ("__builtin_aarch64_set_fpsr", ftype_set_fpr, AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_fp16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (aarch64_fp16_type_node) = 16; + layout_type (aarch64_fp16_type_node); + + (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, "__fp16"); + if (TARGET_SIMD) aarch64_init_simd_builtins (); if (TARGET_CRC32) diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index b17b90d..c30059b 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -36,6 +36,10 @@ CC_MODE (CC_DLTU); CC_MODE (CC_DGEU); CC_MODE (CC_DGTU); +/* Half-precision floating point for arm_neon.h float16_t. */ +FLOAT_MODE (HF, 2, 0); +ADJUST_FLOAT_FORMAT (HF, &ieee_half_format); + /* Vector modes. */ VECTOR_MODES (INT, 8);/* V8QI V4HI V2SI. */ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI. */ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 17bae08..f338033 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -8339,6 +8339,10 @@ aarch64_mangle_type (const_tree type) if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type)) return "St9__va_list"; + /* Half-precision float. */ + if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16) + return "Dh"; + /* Mangle AArch64-specific internal types. TYPE_NAME is non-NULL_TREE for builtin types. */ if (TYPE_NAME (type) != NULL) @@ -9578,6 +9582,33 @@ aarch64_start_file (void) default_file_start(); } +static void +aarch64_init_libfuncs (void) +{ + /* Half-precision float operations.
The compiler handles all operations + with NULL libfuncs by converting to SFmode. */ + + /* Conversions. */ + set_conv_libfunc (trunc_optab, HFmode, SFmode, "__gnu_f2h_ieee"); + set_conv_libfunc (sext_optab, SFmode, HFmode, "__gnu_h2f_ieee"); + + /* Arithmetic. */ + set_optab_libfunc (add_optab, HFmode, NULL); + set_optab_libfunc (sdiv_optab, HFmode, NULL); + set_optab_libfunc (smul_optab, HFmode, NULL); + set_optab_libfunc (neg_optab, HFmode, NULL); + set_optab_libfunc (sub_optab, HFmode, NULL); + + /* Comparisons. */ + set_optab_libfunc (eq_optab, HFmode, NULL); + set_optab_libfunc (ne_optab, HFmode, NULL); + set_optab_libfunc (lt_optab, HFmode, NULL); + set_optab_libfunc (le_optab, HFmode, NULL); + set_optab_libfunc (ge_optab, HFmode, NULL); + set_optab_libfunc (gt_optab, HFmode, NULL); + set_optab_libfunc (unord_optab, HFmode, NULL); +} + /* Target hook for c_mode_for_suffix. */ static machine_mode aarch64_c_mode_for_suffix (char suffix) @@ -9616,7 +9647,8 @@ aarch64_float_const_representable_p (rtx x) if (!CONST_DOUBLE_P (x))
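For reference, the two conversion libfuncs registered here repack the IEEE fields between binary32 (1 sign / 8 exponent / 23 mantissa bits, bias 127) and binary16 (1 / 5 / 10 bits, bias 15). A deliberately simplified sketch that handles only normal, in-range values; the real __gnu_f2h_ieee in libgcc also handles subnormals, infinities, NaNs, and rounding:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Repack a normal binary32 value into binary16.  Assumes the value is a
   normal number whose exponent fits the half-precision range; mantissa
   bits beyond 10 are truncated rather than rounded.  */
static uint16_t
f32_to_f16_normal (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);                 /* grab the bit pattern */
  uint32_t sign = (u >> 16) & 0x8000;        /* sign moves to bit 15 */
  int32_t exp = (int32_t) ((u >> 23) & 0xFF) - 127 + 15;   /* rebias */
  uint32_t mant = (u >> 13) & 0x3FF;         /* top 10 mantissa bits */
  return (uint16_t) (sign | ((uint32_t) exp << 10) | mant);
}
```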
[PATCH 8/16][ARM/AArch64 Testsuite] Add basic fp16 tests
These were originally part of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01340.html but I have moved them into their own subdirectory and adapted them to execute on ARM also (as per https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00656.html) gcc/testsuite/ChangeLog: * gcc.target/aarch64/fp16/fp16.exp: New. * gcc.target/aarch64/fp16/f16_convs_1.c: New. * gcc.target/aarch64/fp16/f16_convs_2.c: New. commit bc5045c0d3dd34b8cb94910281384f9ab9880325 Author: Alan Lawrence <alan.lawre...@arm.com> Date: Thu May 7 10:08:12 2015 +0100 (ARM+AArch64) Add gcc.target/aarch64/fp16, f16_conv_[12].c tests diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c new file mode 100644 index 000..a1c95fd --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-mfp16-format=ieee" { target arm*-*-* } } */ + +extern void abort (void); + +#define EPSILON 0.0001 + +int +main (int argc, char **argv) +{ + float f1 = 3.14159f; + float f2 = 2.718f; + /* This 'assembler' statement should be portable between ARM and AArch64. */ + asm volatile ("" : : : "memory"); + __fp16 in1 = f1; + __fp16 in2 = f2; + + /* Do the addition on __fp16's (implicitly converts both operands to + float32, adds, converts back to f16, then we convert back to f32). */ + __fp16 res1 = in1 + in2; + asm volatile ("" : : : "memory"); + float f_res_1 = res1; + + /* Do the addition on float32's (we convert both operands to f32, and add, + as above, but skip the final conversion f32 -> f16 -> f32).
*/ + float f1a = in1; + float f2a = in2; + float f_res_2 = f1a + f2a; + + if (__builtin_fabs (f_res_2 - f_res_1) > EPSILON) +abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c new file mode 100644 index 000..6aa3e59 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-mfp16-format=ieee" {target arm*-*-*} } */ + +extern void abort (void); + +#define EPSILON 0.0001 + +int +main (int argc, char **argv) +{ + int i1 = 3; + int i2 = 2; + /* This 'assembler' should be portable across ARM and AArch64. */ + asm volatile ("" : : : "memory"); + + __fp16 in1 = i1; + __fp16 in2 = i2; + + /* Do the addition on __fp16's (implicitly converts both operands to + float32, adds, converts back to f16, then we convert to int). */ + __fp16 res1 = in1 + in2; + asm volatile ("" : : : "memory"); + int result1 = res1; + + /* Do the addition on int's (we convert both operands directly to int, add, + and we're done). */ + int result2 = ((int) in1) + ((int) in2); + + if (__builtin_abs (result2 - result1) > EPSILON) +abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp new file mode 100644 index 000..7dc8d65 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp @@ -0,0 +1,43 @@ +# Tests of 16-bit floating point (__fp16), for both ARM and AArch64. +# Copyright (C) 2015 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# <http://www.gnu.org/licenses/>. + +# GCC testsuite that uses the `dg.exp' driver. + +# Exit immediately if this isn't an ARM or AArch64 target. +if {![istarget arm*-*-*] + && ![istarget aarch64*-*-*]} then { + return +} + +# Load support procs. +load_lib gcc-dg.exp + +# If a testcase doesn't have special options, use these. +global DEFAULT_CFLAGS +if ![info exists DEFAULT_CFLAGS] then { +set DEFAULT_CFLAGS " -ansi -pedantic-errors" +} + +# Initialize `dg'. +dg-init + +# Main loop. +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cC\]]] \ + $DEFAULT_CFLAGS + +# All done. +dg-finish
Re: [PR66726] Factor conversion out of COND_EXPR
On 07/07/15 07:37, Jeff Law wrote: On 07/04/2015 06:32 AM, Kugan wrote: I would also verify that this turns into a MIN_EXPR. I think the patch as-written won't detect the MIN_EXPR until the _next_ time phi-opt is called. And one of the benefits we're really looking for here is to remove barriers to finding these min/max expressions. + diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index d2a5cee..12ab9ee 100644 --- a/gcc/tree-ssa-phiopt.c +++ b/gcc/tree-ssa-phiopt.c @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3. If not see static unsigned int tree_ssa_phiopt_worker (bool, bool); static bool conditional_replacement (basic_block, basic_block, edge, edge, gphi *, tree, tree); +static bool factor_out_conditional_conversion (edge, edge, gphi *, tree, tree); static int value_replacement (basic_block, basic_block, edge, edge, gimple, tree, tree); static bool minmax_replacement (basic_block, basic_block, @@ -342,6 +343,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads) cfgchanged = true; else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; + else if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1)) +cfgchanged = true; So this transformation does not inherently change the CFG, so setting CFGCHANGED isn't really appropriate and may trigger unnecessary cleanups. I think the transformation needs to occur prior to this if-elseif-else block since the transformation should enable the code in the if-elseif-else block to find more optimization opportunities. That will also imply we either restart after the transformation applies, or we update the local variables that are used as arguments to conditional_replacement, abs_replacement and minmax_replacement. } } @@ -410,6 +413,108 @@ replace_phi_edge_with_variable (basic_block cond_block, bb->index); } +/* PR66726: Factor conversion out of COND_EXPR. 
If the arguments of the PHI + stmt are CONVERT_STMT, factor out the conversion and perform the conversion + to the result of PHI stmt. */ + +static bool +factor_out_conditional_conversion (edge e0, edge e1, gphi *phi, + tree arg0, tree arg1) +{ + gimple def0 = NULL, def1 = NULL, new_stmt; + tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE; + tree temp, result; + gimple_stmt_iterator gsi; + + /* One of the arguments has to be an SSA_NAME and the other argument can + be an SSA_NAME or INTEGER_CST. */ + if ((TREE_CODE (arg0) != SSA_NAME + && TREE_CODE (arg0) != INTEGER_CST) + || (TREE_CODE (arg1) != SSA_NAME + && TREE_CODE (arg1) != INTEGER_CST) + || (TREE_CODE (arg0) == INTEGER_CST + && TREE_CODE (arg1) == INTEGER_CST)) +return false; + + /* Handle only PHI statements with two arguments. TODO: If all + other arguments to PHI are INTEGER_CST, we can handle more + than two arguments too. */ + if (gimple_phi_num_args (phi) != 2) +return false; If you're just handling two arguments, then it's probably easiest to just swap arg0/arg1 and e0/e1 if arg0 is not an SSA_NAME, like this: /* First canonicalize to simplify tests. */ if (TREE_CODE (arg0) != SSA_NAME) { std::swap (arg0, arg1); std::swap (e0, e1); } if (TREE_CODE (arg0) != SSA_NAME) return false; That simplifies things a bit since you're going to know from this point forward that arg0 is an SSA_NAME. + + /* If arg0 is an SSA_NAME and the stmt which defines arg0 is + a CONVERT_STMT, use the LHS as new_arg0. */ + if (TREE_CODE (arg0) == SSA_NAME) +{ + def0 = SSA_NAME_DEF_STMT (arg0); + if (!is_gimple_assign (def0) + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def0))) +return false; + new_arg0 = gimple_assign_rhs1 (def0); +} Use gimple_assign_cast_p rather than checking CONVERT_EXPR_CODE_P directly, so something like: /* Now see if ARG0 was defined by a typecast. */ gimple arg0_def = SSA_NAME_DEF_STMT (arg0); if (!is_gimple_assign (arg0_def) || !gimple_assign_cast_p (arg0_def)) return false; Similarly for arg1 when it's an SSA_NAME. 
+ + /* If types of new_arg0 and new_arg1 are different, bail out. */ + if (TREE_TYPE (new_arg0) != TREE_TYPE (new_arg1)) +return false; Do we want to restrict this to just integral types? I haven't thought about it too deeply, so perhaps not. + + /* Replace the PHI stmt with the new_arg0 and new_arg1. Also insert + a new CONVERT_STMT that converts the phi results. */ + gsi = gsi_after_labels (gimple_bb (phi)); + result = PHI_RESULT (phi); + temp = make_ssa_name (TREE_TYPE (new_arg0), phi); + + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fprintf (dump_file, "PHI "); + print_generic_expr (dump_file, gimple_phi_result (phi), 0); +
Re: [PATCH] Do not use floating point registers when compiling with -msoft-float for SPARC
On 2015-07-07 12:32, Eric Botcazou wrote: ChangeLog must just describe the what, nothing more. If the rationale is not obvious, then a comment must be added _in the code_ itself. * config/sparc/sparc.c (sparc_function_value_regno_p): Do not return true on %f0 for a target without FPU. * config/sparc/sparc.md (untyped_call): Do not save %f0 for a target without FPU. (untyped_return): Do not load %f0 for a target without FPU. Understood. Thank you for looking at my patches and coming up with improvements. -- Daniel Cederman
[PATCH 1/16][ARM] PR/63870 Add qualifier to check lane bounds in expand
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html (While this falls under PR/63870, and I will link to that in the ChangeLog, it is only a small step towards fixing that PR.) commit 9812db88cff20a505365f68f4065d2fbab998c9c Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 11:04:49 2014 + ARM: Add qualifier_lane_index diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index f960e0a..7f5bf87 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -77,7 +77,9 @@ enum arm_type_qualifiers /* qualifier_const_pointer | qualifier_map_mode */ qualifier_const_pointer_map_mode = 0x86, /* Polynomial types. */ - qualifier_poly = 0x100 + qualifier_poly = 0x100, + /* Lane indices - must be within range of previous argument = a vector. */ + qualifier_lane_index = 0x200 }; /* The qualifier_internal allows generation of a unary builtin from @@ -108,21 +110,40 @@ arm_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] /* T (T, immediate). */ static enum arm_type_qualifiers -arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] +arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_immediate }; +#define BINOP_IMM_QUALIFIERS (arm_binop_imm_qualifiers) + +/* T (T, lane index). */ +static enum arm_type_qualifiers +arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_lane_index }; #define GETLANE_QUALIFIERS (arm_getlane_qualifiers) /* T (T, T, T, immediate). */ static enum arm_type_qualifiers -arm_lanemac_qualifiers[SIMD_MAX_BUILTIN_ARGS] +arm_mac_n_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_none, qualifier_immediate }; -#define LANEMAC_QUALIFIERS (arm_lanemac_qualifiers) +#define MAC_N_QUALIFIERS (arm_mac_n_qualifiers) + +/* T (T, T, T, lane index). 
*/ +static enum arm_type_qualifiers +arm_mac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, + qualifier_none, qualifier_lane_index }; +#define MAC_LANE_QUALIFIERS (arm_mac_lane_qualifiers) /* T (T, T, immediate). */ static enum arm_type_qualifiers -arm_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] +arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate }; +#define TERNOP_IMM_QUALIFIERS (arm_ternop_imm_qualifiers) + +/* T (T, T, lane index). */ +static enum arm_type_qualifiers +arm_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, qualifier_lane_index }; #define SETLANE_QUALIFIERS (arm_setlane_qualifiers) /* T (T, T). */ @@ -1927,6 +1948,7 @@ arm_expand_unop_builtin (enum insn_code icode, typedef enum { NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + NEON_ARG_LANE_INDEX, NEON_ARG_MEMORY, NEON_ARG_STOP } builtin_arg; @@ -2043,6 +2065,16 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, op[argc] = copy_to_mode_reg (mode[argc], op[argc]); break; + case NEON_ARG_LANE_INDEX: + /* Previous argument must be a vector, which this indexes. */ + gcc_assert (argc > 0); + if (CONST_INT_P (op[argc])) + { + enum machine_mode vmode = mode[argc - 1]; + neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode), exp); + } + /* Fall through - if the lane index isn't a constant then + the next case will error. 
*/ case NEON_ARG_CONSTANT: if (!(*insn_data[icode].operand[opno].predicate) (op[argc], mode[argc])) @@ -2170,7 +2202,9 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target) int operands_k = k - is_void; int expr_args_k = k - 1; - if (d->qualifiers[qualifiers_k] & qualifier_immediate) + if (d->qualifiers[qualifiers_k] & qualifier_lane_index) + args[k] = NEON_ARG_LANE_INDEX; + else if (d->qualifiers[qualifiers_k] & qualifier_immediate) args[k] = NEON_ARG_CONSTANT; else if (d->qualifiers[qualifiers_k] & qualifier_maybe_immediate) { diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 62f91ef..25bdebd 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -86,7 +86,7 @@ extern void neon_pairwise_reduce (rtx, rtx, machine_mode, extern rtx neon_make_constant (rtx); extern tree arm_builtin_vectorized_function (tree, tree, tree); extern void neon_expand_vector_init (rtx, rtx); -extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT); +extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree); extern void neon_const_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT); extern HOST_WIDE_INT neon_element_bits (machine_mode); extern void neon_reinterpret (rtx, rtx); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e79a369..6e074ea 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -12788,12 +12788,12 @@ neon_expand_vector_init (rtx target, rtx vals) } /*
[PATCH 0/16][ARM/AArch64] Float16_t support, v2
This is a respin of the series at https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html, plus the two ARM patches on which these depend (https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html). These two somewhat duplicate Charles Baylis' lane-bounds-checking patch at https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00140.html, in that they both port some of the same AArch64 infrastructure onto ARM; while each has some parts the other doesn't, there don't look to be any serious conflicts; if Charles' patches were to go in first, I would not expect any major problems in rebasing mine over his. Changes since the first version of the float16 series are * to separate out the (non-vector) tests from gcc.testsuite/aarch64 into a .../fp16 subdirectory, as per https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00656.html * dropped the patch rewriting advsimd-intrinsics.exp, following other changes along similar lines by Sandra Loosemore and Christopher Lyon; * Rebased over other testsuite changes, including dropping expected values from many tests of intrinsics with no fp16 variant, and introducing a CHECK_RESULTS_NO_FP16 macro. * Changed the mechanism on ARM by which we passed in -mfpu=neon-fp16: we now try to pass this into all tests, but fail the vcvt_f16.c if float16 is still not supported (e.g. there was a conflicting -mfpu=neon passed to the compiler, or we are running on HW which does not support the instructions). Are these OK for trunk? Thanks, Alan
[PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from previous patch. commit e2e7ca148960a82fc88128820f17e7cbd14173cb Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Apr 9 10:54:40 2015 +0100 Fix native_interpret_real for HFmode floats on Bigendian with UNITS_PER_WORD >= 4 (with missing space) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index e61d946..15a10f0 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -7622,7 +7622,7 @@ native_interpret_real (tree type, const unsigned char *ptr, int len) offset += byte % UNITS_PER_WORD; } else - offset = BYTES_BIG_ENDIAN ? 3 - byte : byte; + offset = BYTES_BIG_ENDIAN ? MIN (3, total_bytes - 1) - byte : byte; value = ptr[offset + ((bitpos / BITS_PER_UNIT) & ~3)]; tmp[bitpos / 32] |= (unsigned long)value << (bitpos & 31);
[PATCH 14/16][ARM/AArch64 testsuite] Update advsimd-intrinsics tests to add float16 vectors
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01347.html, removing many default values of 0x333; to complete that, I introduced new macros CHECK_RESULTS{,_NAMED}_NO_FP16, as writing the same list of vector types in four places seemed too many. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (hfloat16_t, vdup_n_f16, CHECK_RESULTS_NO_FP16, CHECK_RESULTS_NAMED_NO_FP16): New. (result, expected, clean_results): Add float16x4 and float16x8 cases. (CHECK_RESULTS_NAMED): Likewise, using CHECK_RESULTS_NAMED_NO_FP16. (CHECK_RESULTS): Redefine using CHECK_RESULTS_NAMED. (DECL_VARIABLE_64BITS_VARIANTS): Add float16x4 case. (DECL_VARIABLE_128BITS_VARIANTS): Add float16x8 case. * gcc.target/aarch64/advsimd-intrinsics/compute-data-ref.h (buffer, buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and float16x8. * gcc.target/aarch64/advsimd-intrinsics/vbsl.c (exec_vbsl): Change CHECK_RESULTS to CHECK_RESULTS_NO_FP16. * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c (exec_vdup_lane): Likewise. * gcc.target/aarch64/advsimd-intrinsics/vext.c (exec_vext): Likewise. * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c (exec_vdup_vmov): Change CHECK_RESULTS_NAMED to CHECK_RESULTS_NAMED_NO_FP16. * gcc.target/aarch64/advsimd-intrinsics/vcombine.c: Add expected results for float16x4 and float16x8. (exec_vcombine): Add test of float16x4 -> float16x8 case. * gcc.target/aarch64/advsimd-intrinsics/vcreate.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vget_high.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vget_low.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vld1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c: Likewise. 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h index 4e728d5..cf9c358 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h @@ -7,6 +7,7 @@ #include inttypes.h /* helper type, to help write floating point results in integer form. */ +typedef uint16_t hfloat16_t; typedef uint32_t hfloat32_t; typedef uint64_t hfloat64_t; @@ -132,6 +133,7 @@ static ARRAY(result, uint, 32, 2); static ARRAY(result, uint, 64, 1); static ARRAY(result, poly, 8, 8); static ARRAY(result, poly, 16, 4); +static ARRAY(result, float, 16, 4); static ARRAY(result, float, 32, 2); static ARRAY(result, int, 8, 16); static ARRAY(result, int, 16, 8); @@ -143,6 +145,7 @@ static ARRAY(result, uint, 32, 4); static ARRAY(result, uint, 64, 2); static ARRAY(result, poly, 8, 16); static ARRAY(result, poly, 16, 8); +static ARRAY(result, float, 16, 8); static ARRAY(result, float, 32, 4); #ifdef __aarch64__ static ARRAY(result, float, 64, 2); @@ -160,6 +163,7 @@ extern ARRAY(expected, uint, 32, 2); extern ARRAY(expected, uint, 64, 1); extern ARRAY(expected, poly, 8, 8); extern ARRAY(expected, poly, 16, 4); +extern ARRAY(expected, hfloat, 16, 4); extern ARRAY(expected, hfloat, 32, 2); extern ARRAY(expected, int, 8, 16); extern ARRAY(expected, int, 16, 8); @@ -171,38 +175,11 @@ extern ARRAY(expected, uint, 32, 4); extern ARRAY(expected, uint, 64, 2); extern ARRAY(expected, poly, 8, 16); extern ARRAY(expected, poly, 16, 8); +extern ARRAY(expected, hfloat, 16, 8); extern ARRAY(expected, hfloat, 32, 4); extern ARRAY(expected, hfloat, 64, 2); -/* Check results. Operates on all possible vector types. 
*/ -#define CHECK_RESULTS(test_name,comment)\ - { \ -CHECK(test_name, int, 8, 8, PRIx8, expected, comment); \ -CHECK(test_name, int, 16, 4, PRIx16, expected, comment); \ -CHECK(test_name, int, 32, 2, PRIx32, expected, comment); \ -CHECK(test_name, int, 64, 1, PRIx64, expected, comment); \ -CHECK(test_name, uint, 8, 8, PRIx8, expected, comment); \ -CHECK(test_name, uint, 16, 4, PRIx16, expected, comment); \ -CHECK(test_name, uint, 32, 2, PRIx32, expected, comment); \ -CHECK(test_name, uint, 64, 1, PRIx64, expected, comment); \ -CHECK(test_name, poly, 8, 8, PRIx8, expected, comment); \ -CHECK(test_name, poly, 16, 4, PRIx16, expected, comment); \ -CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment); \ - \ -CHECK(test_name, int, 8, 16, PRIx8, expected, comment); \ -CHECK(test_name, int, 16, 8, PRIx16, expected, comment); \ -CHECK(test_name, int, 32, 4, PRIx32, expected, comment); \ -
[PATCH 13/16][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix
Unchanged since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01345.html commit 214fcc00475a543a79ed444f9a64061215397cc8 Author: Alan Lawrence alan.lawre...@arm.com Date: Wed Jan 28 13:01:31 2015 + AArch64 6/N: vcvt{,_high}_f32_f16 (using vect_par_cnst_hi_half, fixing bigendian indices) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 8bcab72..9869b73 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -361,11 +361,11 @@ BUILTIN_VSDQ_I_DI (UNOP, abs, 0) BUILTIN_VDQF (UNOP, abs, 2) - VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) + VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf) VAR1 (BINOP, float_truncate_hi_, 0, v4sf) VAR1 (BINOP, float_truncate_hi_, 0, v8hf) - VAR1 (UNOP, float_extend_lo_, 0, v2df) + VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf) BUILTIN_VDF (UNOP, float_truncate_lo_, 0) /* Implemented by aarch64_ld1<VALL_F16:mode>. */ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 2dc54e1..1a7d858 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1691,36 +1691,57 @@ ;; Float widening operations. 
-(define_insn "vec_unpacks_lo_v4sf" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (vec_select:V2SF - (match_operand:V4SF 1 "register_operand" "w") - (parallel [(const_int 0) (const_int 1)]) - )))] +(define_insn "aarch64_simd_vec_unpacks_lo_<mode>" + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") +(float_extend:<VWIDE> (vec_select:<VHALF> + (match_operand:VQ_HSF 1 "register_operand" "w") + (match_operand:VQ_HSF 2 "vect_par_cnst_lo_half" "") + )))] "TARGET_SIMD" - "fcvtl\\t%0.2d, %1.2s" + "fcvtl\\t%0.<Vwtype>, %1.<Vhalftype>" [(set_attr "type" "neon_fp_cvt_widen_s")] ) -(define_insn "aarch64_float_extend_lo_v2df" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (match_operand:V2SF 1 "register_operand" "w")))] +(define_expand "vec_unpacks_lo_<mode>" + [(match_operand:<VWIDE> 0 "register_operand" "") + (match_operand:VQ_HSF 1 "register_operand" "")] "TARGET_SIMD" - "fcvtl\\t%0.2d, %1.2s" + { +rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false); +emit_insn (gen_aarch64_simd_vec_unpacks_lo_<mode> (operands[0], + operands[1], p)); +DONE; + } +) + +(define_insn "aarch64_simd_vec_unpacks_hi_<mode>" + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") +(float_extend:<VWIDE> (vec_select:<VHALF> + (match_operand:VQ_HSF 1 "register_operand" "w") + (match_operand:VQ_HSF 2 "vect_par_cnst_hi_half" "") + )))] + "TARGET_SIMD" + "fcvtl2\\t%0.<Vwtype>, %1.<Vtype>" [(set_attr "type" "neon_fp_cvt_widen_s")] ) -(define_insn "vec_unpacks_hi_v4sf" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (vec_select:V2SF - (match_operand:V4SF 1 "register_operand" "w") - (parallel [(const_int 2) (const_int 3)]) - )))] +(define_expand "vec_unpacks_hi_<mode>" + [(match_operand:<VWIDE> 0 "register_operand" "") + (match_operand:VQ_HSF 1 "register_operand" "")] + "TARGET_SIMD" + { +rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true); +emit_insn (gen_aarch64_simd_vec_unpacks_hi_<mode> (operands[0], + operands[1], p)); +DONE; + } +) +(define_insn "aarch64_float_extend_lo_<Vwide>" + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") + 
(float_extend:<VWIDE> + (match_operand:VDF 1 "register_operand" "w")))] "TARGET_SIMD" - "fcvtl2\\t%0.2d, %1.4s" + "fcvtl\\t%0<Vmwtype>, %1<Vmtype>" [(set_attr "type" "neon_fp_cvt_widen_s")] ) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index ff1a45c..4f0636f 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -6026,10 +6026,6 @@ vaddlvq_u32 (uint32x4_t a) result; \ }) -/* vcvt_f32_f16 not supported */ - -/* vcvt_high_f32_f16 not supported */ - #define vcvt_n_f32_s32(a, b)\ __extension__ \ ({ \ @@ -13420,6 +13416,12 @@ vcvt_high_f32_f64 (float32x2_t __a, float64x2_t __b) /* vcvt (float -> double). */ +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vcvt_f32_f16 (float16x4_t __a) +{ + return __builtin_aarch64_float_extend_lo_v4sf (__a); +} + __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vcvt_f64_f32 (float32x2_t __a) { @@ -13427,6 +13429,12 @@ vcvt_f64_f32 (float32x2_t __a) return __builtin_aarch64_float_extend_lo_v2df (__a); } +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vcvt_high_f32_f16 (float16x8_t __a) +{ + return __builtin_aarch64_vec_unpacks_hi_v8hf (__a); +} + __extension__ static __inline float64x2_t __attribute__
[PATCH 12/16][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup
This is the remainder of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01343.html combined with https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01344.html, putting together all the intrinsics that didn't require anything outside arm_neon.h. Also update the existing tests in aarch64/. gcc/ChangeLog: * config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16, vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16, vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32, vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32, vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16, vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16, vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32, vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32, vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16, vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16, vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16, vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16, vreinterpret_u16_f16, vreinterpret_u32_f16, vreinterpretq_p8_f16, vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16, vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16, vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16, vreinterpretq_u32_f16, vget_low_f16, vget_high_f16, vld1_dup_f16, vld1q_dup_f16): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vget_high_1.c: Add float16x8-float16x4 case. * gcc.target/aarch64/vget_low_1.c: Likewise. commit beb21a6bce76d4fbedb13fcf25796563b27f6bae Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Jun 29 18:46:49 2015 +0100 [AArch64 5/N v2] vreinterpret, vget_(low|high), vld1(q?)_dup. 
update tests for vget_low/high diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index b915754..ff1a45c 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -2891,6 +2891,12 @@ vgetq_lane_u64 (uint64x2_t __a, const int __b) /* vreinterpret */ __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_f16 (float16x4_t __a) +{ + return (poly8x8_t) __a; +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) vreinterpret_p8_f64 (float64x1_t __a) { return (poly8x8_t) __a; @@ -2987,6 +2993,12 @@ vreinterpretq_p8_s64 (int64x2_t __a) } __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_f16 (float16x8_t __a) +{ + return (poly8x16_t) __a; +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) vreinterpretq_p8_f32 (float32x4_t __a) { return (poly8x16_t) __a; @@ -3023,6 +3035,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a) } __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_f16 (float16x4_t __a) +{ + return (poly16x4_t) __a; +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) vreinterpret_p16_f64 (float64x1_t __a) { return (poly16x4_t) __a; @@ -3119,6 +3137,12 @@ vreinterpretq_p16_s64 (int64x2_t __a) } __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_f16 (float16x8_t __a) +{ + return (poly16x8_t) __a; +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) vreinterpretq_p16_f32 (float32x4_t __a) { return (poly16x8_t) __a; @@ -3154,6 +3178,156 @@ vreinterpretq_p16_p8 (poly8x16_t __a) return (poly16x8_t) __a; } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_f64 (float64x1_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) 
+vreinterpret_f16_s8 (int8x8_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_s16 (int16x4_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_s32 (int32x2_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_s64 (int64x1_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_f32 (float32x2_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_u8 (uint8x8_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_u16 (uint16x4_t __a) +{ + return (float16x4_t) __a; +} +
RE: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS
-Original Message- From: Andrew Bennett [mailto:andrew.benn...@imgtec.com] Sent: Tuesday, July 07, 2015 6:53 AM To: gcc-patches@gcc.gnu.org Cc: Moore, Catherine; Matthew Fortune Subject: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS Hi, When building the call-[1,5,6].c tests for micromips the jrc rather than the jr instruction is used to call the tail* functions. I have updated the test output to allow the jrc instruction to be matched. I have tested this on the mips-mti-elf target using mips32r2/{-mno- micromips/-mmicromips} test options and there are no new regressions. The patch and ChangeLog are below. Ok to commit? testsuite/ * gcc.target/mips/call-1.c: Allow testcase to match the jrc instruction. * gcc.target/mips/call-5.c: Ditto. * gcc.target/mips/call-6.c: Ditto. OK.
Re: Clean-ups in match.pd
On Mon, Jul 6, 2015 at 4:08 PM, Richard Biener richard.guent...@gmail.com wrote: On Sat, Jul 4, 2015 at 4:34 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, these are just some minor changes. I believe I had already promised a build_ function to match integer_each_onep. Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like *-match.c takes about 10 minutes to compile in stage2 these days). Ouch. I have some changes to the code generation in the queue which also supports a more natural if structure (else and elif). Eventually that helps a bit but I suppose the main issue is simply from the large functions. They can be split quite easily I think, but passing down all relevant state might turn out to be tricky unless we start using nested functions here ... (and IIRC those are not supported in C++) Just checking in my dev tree (-O0 build with checking enabled, thus similar to the stage2 situation) reveals nothing interesting. The checkers take up most of the time: CFG verifier: 21.27 ( 8%) usr 0.01 ( 1%) sys 21.46 ( 8%) wall 0 kB ( 0%) ggc early inlining heuristics: 12.59 ( 5%) usr 0.03 ( 2%) sys 12.61 ( 5%) wall 10826 kB ( 1%) ggc tree SSA verifier : 26.30 (10%) usr 0.01 ( 1%) sys 26.34 (10%) wall 0 kB ( 0%) ggc tree STMT verifier : 50.44 (20%) usr 0.10 ( 6%) sys 50.27 (20%) wall 0 kB ( 0%) ggc that's everything >= 5% Trying to figure out if there is some gross algorithms in here (yes, we now verify stuff quite often...) Richard. Richard. 2015-07-06 Marc Glisse marc.gli...@inria.fr * match.pd: Remove element_mode inside HONOR_*. (~ (-A) -> A - 1, ~ (A - 1) -> -A): Handle complex types. (~X | X -> -1, ~X ^ X -> -1): Merge. * tree.c (build_each_one_cst): New function. * tree.h (build_each_one_cst): Likewise. -- Marc Glisse Index: match.pd === --- match.pd(revision 225411) +++ match.pd(working copy) @@ -101,7 +101,7 @@ negative value by 0 gives -0, not +0. 
*/ (simplify (mult @0 real_zerop@1) - (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (element_mode (type))) + (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)) @1)) /* In IEEE floating point, x*1 is not equivalent to x for snans. @@ -108,8 +108,8 @@ Likewise for complex arithmetic with signed zeros. */ (simplify (mult @0 real_onep) - (if (!HONOR_SNANS (element_mode (type)) - && (!HONOR_SIGNED_ZEROS (element_mode (type)) + (if (!HONOR_SNANS (type) + && (!HONOR_SIGNED_ZEROS (type) || !COMPLEX_FLOAT_TYPE_P (type))) (non_lvalue @0))) @@ -116,8 +116,8 @@ /* Transform x * -1.0 into -x. */ (simplify (mult @0 real_minus_onep) - (if (!HONOR_SNANS (element_mode (type)) - && (!HONOR_SIGNED_ZEROS (element_mode (type)) + (if (!HONOR_SNANS (type) + && (!HONOR_SIGNED_ZEROS (type) || !COMPLEX_FLOAT_TYPE_P (type))) (negate @0))) @@ -165,7 +165,7 @@ (rdiv @0 @0) (if (FLOAT_TYPE_P (type) && ! HONOR_NANS (type) - && ! HONOR_INFINITIES (element_mode (type))) + && ! HONOR_INFINITIES (type)) { build_one_cst (type); })) /* Optimize -A / A to -1.0 if we don't care about @@ -174,19 +174,19 @@ (rdiv:c @0 (negate @0)) (if (FLOAT_TYPE_P (type) && ! HONOR_NANS (type) - && ! HONOR_INFINITIES (element_mode (type))) + && ! HONOR_INFINITIES (type)) { build_minus_one_cst (type); })) /* In IEEE floating point, x/1 is not equivalent to x for snans. */ (simplify (rdiv @0 real_onep) - (if (!HONOR_SNANS (element_mode (type))) + (if (!HONOR_SNANS (type)) (non_lvalue @0))) /* In IEEE floating point, x/-1 is not equivalent to -x for snans. */ (simplify (rdiv @0 real_minus_onep) - (if (!HONOR_SNANS (element_mode (type))) + (if (!HONOR_SNANS (type)) (negate @0))) /* If ARG1 is a constant, we can convert this to a multiply by the @@ -297,9 +297,10 @@ @1) /* ~x | x -> -1 */ Please also adjust this comment. Ok with that change. Thanks, Richard. -(simplify - (bit_ior:c (convert? @0) (convert? (bit_not @0))) - (convert { build_all_ones_cst (TREE_TYPE (@0)); })) +(for op (bit_ior bit_xor plus) + (simplify + (op:c (convert? @0) (convert? (bit_not @0))) + (convert { build_all_ones_cst (TREE_TYPE (@0)); }))) /* x ^ x -> 0 */ (simplify @@ -311,11 +312,6 @@ (bit_xor @0 integer_all_onesp@1) (bit_not @0)) -/* ~X ^ X is -1. */ -(simplify - (bit_xor:c (bit_not @0) @0) - { build_all_ones_cst (type); }) - /* x & ~0 -> x */ (simplify (bit_and @0 integer_all_onesp) @@ -603,11 +599,11 @@ (simplify (bit_not (convert? (negate @0))) (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) - (convert (minus @0 { build_one_cst (TREE_TYPE (@0)); })))) + (convert (minus @0 { build_each_one_cst (TREE_TYPE (@0)); })))) /* Convert ~ (A - 1) or ~ (A + -1) to -A.
Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)
2015-06-26 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.md: Window save takes a single integer

This will probably break in 64-bit mode: the operand can be a DImode register. -- Eric Botcazou
Re: [Patch, fortran, pr66578, v1] [F2008] Invalid free on allocate(...,source=a(:)) in block
Hi all, hi Paul, Paul thanks for the review. Committed as r225507. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de Index: gcc/fortran/trans-expr.c === *** gcc/fortran/trans-expr.c (revision 223641) --- gcc/fortran/trans-expr.c (working copy) *** gfc_conv_procedure_call (gfc_se * se, gf *** 5877,5882 --- 5877,5896 fntype = TREE_TYPE (TREE_TYPE (se->expr)); se->expr = build_call_vec (TREE_TYPE (fntype), se->expr, arglist); + /* Allocatable scalar function results must be freed and nullified + after use. This necessitates the creation of a temporary to + hold the result to prevent duplicate calls. */ + if (!byref && sym->ts.type != BT_CHARACTER + && sym->attr.allocatable && !sym->attr.dimension) + { + tmp = gfc_create_var (TREE_TYPE (se->expr), NULL); + gfc_add_modify (se->pre, tmp, se->expr); + se->expr = tmp; + tmp = gfc_call_free (tmp); + gfc_add_expr_to_block (post, tmp); + gfc_add_modify (post, se->expr, build_int_cst (TREE_TYPE (se->expr), 0)); + } + /* If we have a pointer function, but we don't want a pointer, e.g. something like x = f() Index: gcc/fortran/trans-stmt.c === *** gcc/fortran/trans-stmt.c (revision 223641) --- gcc/fortran/trans-stmt.c (working copy) *** gfc_trans_allocate (gfc_code * code) *** 5214,5219 --- 5214,5220 false, false); gfc_add_block_to_block (block, se.pre); gfc_add_block_to_block (post, se.post); + /* Prevent aliasing, i.e., se.expr may be already a variable declaration. */ if (!VAR_P (se.expr)) *** gfc_trans_allocate (gfc_code * code) *** 5223,5230 se.expr); /* We need a regular (non-UID) symbol here, therefore give a prefix. */ ! var = gfc_create_var (TREE_TYPE (tmp), "atmp"); gfc_add_modify_loc (input_location, block, var, tmp); tmp = var; } else --- 5224,5243 se.expr); /* We need a regular (non-UID) symbol here, therefore give a prefix. */ !
var = gfc_create_var (TREE_TYPE (tmp), "expr3"); gfc_add_modify_loc (input_location, block, var, tmp); + + /* Deallocate any allocatable components after all the allocations + and assignments of expr3 have been completed. */ + if (code->expr3->ts.type == BT_DERIVED + && code->expr3->rank == 0 + && code->expr3->ts.u.derived->attr.alloc_comp) + { + tmp = gfc_deallocate_alloc_comp (code->expr3->ts.u.derived, + var, 0); + gfc_add_expr_to_block (post, tmp); + } + tmp = var; } else Index: gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90 === *** gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90 (revision 0) --- gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90 (working copy) *** *** 0 --- 1,70 + ! { dg-do run } + ! { dg-options "-fdump-tree-original" } + ! + ! Test the fix for PR66079. The original problem was with the first + ! allocate statement. The rest of this testcase fixes problems found + ! whilst working on it! + ! + ! Reported by Damian Rouson dam...@sourceryinstitute.org + ! + type subdata + integer, allocatable :: b + endtype + ! block + call newRealVec + ! end block + contains + subroutine newRealVec + type(subdata), allocatable :: d, e, f + character(:), allocatable :: g, h, i + character(8), allocatable :: j + allocate(d,source=subdata(1)) ! memory was lost, now OK + allocate(e,source=d) ! OK + allocate(f,source=create (99)) ! memory was lost, now OK + if (d%b .ne. 1) call abort + if (e%b .ne. 1) call abort + if (f%b .ne. 99) call abort + allocate (g, source = greeting1("good day")) + if (g .ne. "good day") call abort + allocate (h, source = greeting2("hello")) + if (h .ne. "hello") call abort + allocate (i, source = greeting3("hiya!")) + if (i .ne. "hiya!") call abort + call greeting4 (j, "Goodbye ") ! Test that dummy arguments are OK + if (j .ne. "Goodbye ") call abort + end subroutine + + function create (arg) result(res) + integer :: arg + type(subdata), allocatable :: res, res1 + allocate(res, res1, source = subdata(arg)) + end function + + function greeting1 (arg) result(res) !
memory was lost, now OK + character(*) :: arg + Character(:), allocatable :: res + allocate(res, source = arg) + end function + + function greeting2 (arg) result(res) + character(5) :: arg + Character(:), allocatable :: res + allocate(res, source = arg) + end function + + function greeting3 (arg) result(res) + character(5) :: arg + Character(5), allocatable :: res, res1 + allocate(res, res1, source = arg) ! Caused an ICE + if (res1
[patch committed SH] Fix PR target/66780
The attached patch reverts a part of the change in r221165 for target/65249. It turned out that that change causes a wrong-code problem, PR target/66780, which is worse than the ICE with 'R0_REGS' spill failure for a specific program reported by PR65249. I've committed it on trunk and reopened PR target/65249. I'll backport it to 4.9 later and to 5 when the branch reopens. Regards, kaz -- 2015-07-07 Kaz Kojima kkoj...@gcc.gnu.org PR target/66780 * config/sh/sh.md (symGOT_load): Revert a part of 2015-03-03 change for target/65249. diff --git a/config/sh/sh.md b/config/sh/sh.md index 5c8d306..f0cb3cf 100644 --- a/config/sh/sh.md +++ b/config/sh/sh.md @@ -10751,12 +10751,6 @@ label: "__stack_chk_guard") == 0) stack_chk_guard_p = true; - /* Use R0 to avoid long R0 liveness which stack-protector tends to - produce. */ - if (! sh_lra_flag && stack_chk_guard_p - && ! reload_in_progress && ! reload_completed) - operands[2] = gen_rtx_REG (Pmode, R0_REG); - if (TARGET_SHMEDIA) { rtx reg = operands[2];
[PATCH 10/16][AArch64] vld{2,3,4}{,_lane,_dup},vcombine,vcreate
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01342.html commit ef719e5d3d6eccc5cf621851283b7c0ba1a9ee6c Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Aug 5 17:52:28 2014 +0100 AArch64 3/N: v(create|combine|v(ld|st|ld...dup/lane|st...lane)[234](q?))_f16; tests vldN{,_lane,_dup} inc bigendian. Add __builtin_aarch64_simd_hf. Fix some casts, to ..._hf not ..._sf ! diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index a6c3377..5367ba6 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -300,6 +300,12 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define VAR12(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \ VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \ VAR1 (T, N, MAP, L) +#define VAR13(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \ + VAR12 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \ + VAR1 (T, N, MAP, M) +#define VAR14(T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M, N) \ + VAR13 (T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \ + VAR1 (T, X, MAP, N) #include aarch64-builtin-iterators.h @@ -377,6 +383,7 @@ const char *aarch64_scalar_builtin_types[] = { __builtin_aarch64_simd_qi, __builtin_aarch64_simd_hi, __builtin_aarch64_simd_si, + __builtin_aarch64_simd_hf, __builtin_aarch64_simd_sf, __builtin_aarch64_simd_di, __builtin_aarch64_simd_df, @@ -664,6 +671,8 @@ aarch64_init_simd_builtin_scalar_types (void) __builtin_aarch64_simd_qi); (*lang_hooks.types.register_builtin_type) (intHI_type_node, __builtin_aarch64_simd_hi); + (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, + __builtin_aarch64_simd_hf); (*lang_hooks.types.register_builtin_type) (intSI_type_node, __builtin_aarch64_simd_si); (*lang_hooks.types.register_builtin_type) (float_type_node, diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ccf063a..bbf5230 100644 --- a/gcc/config/aarch64/aarch64.c +++ 
b/gcc/config/aarch64/aarch64.c @@ -1063,6 +1063,9 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2) case V2SImode: gen = gen_aarch64_simd_combinev2si; break; + case V4HFmode: + gen = gen_aarch64_simd_combinev4hf; + break; case V2SFmode: gen = gen_aarch64_simd_combinev2sf; break; diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 7425485..d61e619 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -153,6 +153,16 @@ typedef struct uint64x2x2_t uint64x2_t val[2]; } uint64x2x2_t; +typedef struct float16x4x2_t +{ + float16x4_t val[2]; +} float16x4x2_t; + +typedef struct float16x8x2_t +{ + float16x8_t val[2]; +} float16x8x2_t; + typedef struct float32x2x2_t { float32x2_t val[2]; @@ -273,6 +283,16 @@ typedef struct uint64x2x3_t uint64x2_t val[3]; } uint64x2x3_t; +typedef struct float16x4x3_t +{ + float16x4_t val[3]; +} float16x4x3_t; + +typedef struct float16x8x3_t +{ + float16x8_t val[3]; +} float16x8x3_t; + typedef struct float32x2x3_t { float32x2_t val[3]; @@ -393,6 +413,16 @@ typedef struct uint64x2x4_t uint64x2_t val[4]; } uint64x2x4_t; +typedef struct float16x4x4_t +{ + float16x4_t val[4]; +} float16x4x4_t; + +typedef struct float16x8x4_t +{ + float16x8_t val[4]; +} float16x8x4_t; + typedef struct float32x2x4_t { float32x2_t val[4]; @@ -2644,6 +2674,12 @@ vcreate_s64 (uint64_t __a) return (int64x1_t) {__a}; } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcreate_f16 (uint64_t __a) +{ + return (float16x4_t) __a; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vcreate_f32 (uint64_t __a) { @@ -4780,6 +4816,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b) return __builtin_aarch64_combinedi (__a[0], __b[0]); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcombine_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_combinev4hf (__a, __b); +} + __extension__ static __inline 
float32x4_t __attribute__ ((__always_inline__)) vcombine_f32 (float32x2_t __a, float32x2_t __b) { @@ -9908,7 +9950,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b) +--+++++ |uint | Y | Y | N | N | +--+++++ - |float | - | - | N | N | + |float | - | Y | N | N | +--+++++ |poly | Y | Y | - | - | +--+++++ @@ -9922,7 +9964,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b) +--+++++ |uint | Y | Y | Y | Y | +--+++++ - |float | - | - | Y | Y | + |float | - | Y | Y | Y | +--+++++ |poly | Y | Y | - | - | +--+++++ @@ -9936,7 +9978,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b) +--+++++
[PATCH 9/16][AArch64] Add support for float16x{4,8}_t vectors/builtins
As https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01341.html commit 49cb53a94a44fcda845c3f6ef11e88f9be458aad Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Dec 2 13:08:15 2014 + AArch64 2/N: Vector/__builtin basics: define+support types, movs, test ABI. Patterns, builtins, intrinsics for {ld1,st1}{,_lane},v{g,s}et_lane. Tests: vld1-vst1_1, vset_lane_1, vld1_lane.c diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index cfb2dc1..a6c3377 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -66,6 +66,7 @@ #define v8qi_UP V8QImode #define v4hi_UP V4HImode +#define v4hf_UP V4HFmode #define v2si_UP V2SImode #define v2sf_UP V2SFmode #define v1df_UP V1DFmode @@ -73,6 +74,7 @@ #define df_UPDFmode #define v16qi_UP V16QImode #define v8hi_UP V8HImode +#define v8hf_UP V8HFmode #define v4si_UP V4SImode #define v4sf_UP V4SFmode #define v2di_UP V2DImode @@ -523,6 +525,8 @@ aarch64_simd_builtin_std_type (enum machine_mode mode, return aarch64_simd_intCI_type_node; case XImode: return aarch64_simd_intXI_type_node; +case HFmode: + return aarch64_fp16_type_node; case SFmode: return float_type_node; case DFmode: @@ -607,6 +611,8 @@ aarch64_init_simd_builtin_types (void) aarch64_simd_types[Poly64x2_t].eltype = aarch64_simd_types[Poly64_t].itype; /* Continue with standard types. 
*/ + aarch64_simd_types[Float16x4_t].eltype = aarch64_fp16_type_node; + aarch64_simd_types[Float16x8_t].eltype = aarch64_fp16_type_node; aarch64_simd_types[Float32x2_t].eltype = float_type_node; aarch64_simd_types[Float32x4_t].eltype = float_type_node; aarch64_simd_types[Float64x1_t].eltype = double_type_node; diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config/aarch64/aarch64-simd-builtin-types.def index bb54e56..ea219b7 100644 --- a/gcc/config/aarch64/aarch64-simd-builtin-types.def +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def @@ -44,6 +44,8 @@ ENTRY (Poly16x8_t, V8HI, poly, 12) ENTRY (Poly64x1_t, DI, poly, 12) ENTRY (Poly64x2_t, V2DI, poly, 12) + ENTRY (Float16x4_t, V4HF, none, 13) + ENTRY (Float16x8_t, V8HF, none, 13) ENTRY (Float32x2_t, V2SF, none, 13) ENTRY (Float32x4_t, V4SF, none, 13) ENTRY (Float64x1_t, V1DF, none, 13) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index dd2bc47..4dd2bc7 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -367,11 +367,11 @@ VAR1 (UNOP, float_extend_lo_, 0, v2df) VAR1 (UNOP, float_truncate_lo_, 0, v2sf) - /* Implemented by aarch64_ld1VALL:mode. */ - BUILTIN_VALL (LOAD1, ld1, 0) + /* Implemented by aarch64_ld1VALL_F16:mode. */ + BUILTIN_VALL_F16 (LOAD1, ld1, 0) - /* Implemented by aarch64_st1VALL:mode. */ - BUILTIN_VALL (STORE1, st1, 0) + /* Implemented by aarch64_st1VALL_F16:mode. */ + BUILTIN_VALL_F16 (STORE1, st1, 0) /* Implemented by fmamode4. */ BUILTIN_VDQF (TERNOP, fma, 4) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index b90f938..5cc45ed 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -19,8 +19,8 @@ ;; http://www.gnu.org/licenses/. 
(define_expand movmode - [(set (match_operand:VALL 0 nonimmediate_operand ) - (match_operand:VALL 1 general_operand ))] + [(set (match_operand:VALL_F16 0 nonimmediate_operand ) + (match_operand:VALL_F16 1 general_operand ))] TARGET_SIMD if (GET_CODE (operands[0]) == MEM) @@ -2450,7 +2450,7 @@ (define_insn aarch64_get_lanemode [(set (match_operand:VEL 0 aarch64_simd_nonimmediate_operand =r, w, Utv) (vec_select:VEL - (match_operand:VALL 1 register_operand w, w, w) + (match_operand:VALL_F16 1 register_operand w, w, w) (parallel [(match_operand:SI 2 immediate_operand i, i, i)])))] TARGET_SIMD { @@ -4234,8 +4234,9 @@ ) (define_insn aarch64_be_ld1mode - [(set (match_operand:VALLDI 0 register_operand =w) - (unspec:VALLDI [(match_operand:VALLDI 1 aarch64_simd_struct_operand Utv)] + [(set (match_operand:VALLDI_F16 0 register_operand =w) + (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 + aarch64_simd_struct_operand Utv)] UNSPEC_LD1))] TARGET_SIMD ld1\\t{%0Vmtype}, %1 @@ -4243,8 +4244,8 @@ ) (define_insn aarch64_be_st1mode - [(set (match_operand:VALLDI 0 aarch64_simd_struct_operand =Utv) - (unspec:VALLDI [(match_operand:VALLDI 1 register_operand w)] + [(set (match_operand:VALLDI_F16 0 aarch64_simd_struct_operand =Utv) + (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 register_operand w)] UNSPEC_ST1))] TARGET_SIMD st1\\t{%1Vmtype}, %0 @@ -4533,16 +4534,16 @@ DONE; }) -(define_expand aarch64_ld1VALL:mode - [(match_operand:VALL 0 register_operand) +(define_expand aarch64_ld1VALL_F16:mode + [(match_operand:VALL_F16 0
[PATCH 11/16][AArch64] Implement vcvt_{,high_}f16_f32
This comes from https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01343.html but the other/unrelated intrinsics have moved into the next patch. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_v2sf): Reparameterize to... (aarch64_float_truncate_lo_mode): ...this, for both V2SF and V4HF. (aarch64_float_truncate_hi_v4sf): Reparameterize to... (aarch64_float_truncate_hi_Vdbl): ...this, for both V4SF and V8HF. * config/aarch64/aarch64-simd-builtins.def (float_truncate_hi_): Add v8hf variant. (float_truncate_lo_): Use BUILTIN_VDF iterator. * config/aarch64/arm_neon.h (vcvt_f16_f32, vcvt_high_f16_f32): New. * config/aarch64/iterators.md (VDF, Vdtype): New. (VWIDE, Vmwtype): Add cases for V4HF and V2SF. commit 5007fafedc8469ab645edfe65fbf41f75fc74750 Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Dec 2 18:30:05 2014 + AArch64 4/N v2: float_truncate_lo/hi diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 4dd2bc7..8bcab72 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -363,9 +363,10 @@ VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) VAR1 (BINOP, float_truncate_hi_, 0, v4sf) + VAR1 (BINOP, float_truncate_hi_, 0, v8hf) VAR1 (UNOP, float_extend_lo_, 0, v2df) - VAR1 (UNOP, float_truncate_lo_, 0, v2sf) + BUILTIN_VDF (UNOP, float_truncate_lo_, 0) /* Implemented by aarch64_ld1VALL_F16:mode. */ BUILTIN_VALL_F16 (LOAD1, ld1, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 5cc45ed..2dc54e1 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1726,23 +1726,23 @@ ;; Float narrowing operations. 
-(define_insn aarch64_float_truncate_lo_v2sf - [(set (match_operand:V2SF 0 register_operand =w) - (float_truncate:V2SF - (match_operand:V2DF 1 register_operand w)))] +(define_insn aarch64_float_truncate_lo_mode + [(set (match_operand:VDF 0 register_operand =w) + (float_truncate:VDF + (match_operand:VWIDE 1 register_operand w)))] TARGET_SIMD - fcvtn\\t%0.2s, %1.2d + fcvtn\\t%0.Vtype, %1Vmwtype [(set_attr type neon_fp_cvt_narrow_d_q)] ) -(define_insn aarch64_float_truncate_hi_v4sf - [(set (match_operand:V4SF 0 register_operand =w) -(vec_concat:V4SF - (match_operand:V2SF 1 register_operand 0) - (float_truncate:V2SF - (match_operand:V2DF 2 register_operand w] +(define_insn aarch64_float_truncate_hi_Vdbl + [(set (match_operand:VDBL 0 register_operand =w) +(vec_concat:VDBL + (match_operand:VDF 1 register_operand 0) + (float_truncate:VDF + (match_operand:VWIDE 2 register_operand w] TARGET_SIMD - fcvtn2\\t%0.4s, %2.2d + fcvtn2\\t%0.Vdtype, %2Vmwtype [(set_attr type neon_fp_cvt_narrow_d_q)] ) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index d61e619..b915754 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -5726,12 +5726,8 @@ vaddlvq_u32 (uint32x4_t a) result; \ }) -/* vcvt_f16_f32 not supported */ - /* vcvt_f32_f16 not supported */ -/* vcvt_high_f16_f32 not supported */ - /* vcvt_high_f32_f16 not supported */ #define vcvt_n_f32_s32(a, b)\ @@ -13098,6 +13094,18 @@ vcntq_u8 (uint8x16_t __a) /* vcvt (double - float). 
*/ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcvt_f16_f32 (float32x4_t __a) +{ + return __builtin_aarch64_float_truncate_lo_v4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcvt_high_f16_f32 (float16x4_t __a, float32x4_t __b) +{ + return __builtin_aarch64_float_truncate_hi_v8hf (__a, __b); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vcvt_f32_f64 (float64x2_t __a) { diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 96920cf..f6094b1 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -41,6 +41,9 @@ ;; Iterator for General Purpose Float regs, inc float16_t. (define_mode_iterator GPF_F16 [HF SF DF]) +;; Double vector modes. +(define_mode_iterator VDF [V2SF V4HF]) + ;; Integer vector modes. (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI]) @@ -452,6 +455,9 @@ (SI V2SI) (DI V2DI) (DF V2DF)]) +;; Register suffix for double-length mode. +(define_mode_attr Vdtype [(V4HF 8h) (V2SF 4s)]) + ;; Double modes of vector modes (lower case). (define_mode_attr Vdbl [(V8QI v16qi) (V4HI v8hi) (V4HF v8hf) @@ -485,7 +491,8 @@ (define_mode_attr VWIDE [(V8QI V8HI) (V4HI V4SI) (V2SI V2DI) (V16QI V8HI) (V8HI V4SI) (V4SI V2DI) - (HI SI) (SI DI)] + (HI SI) (SI DI) +
[gomp4] libgomp: XFAIL libgomp.oacc-c-c++-common/reduction-4.c for acc_device_nvidia (was: implicit firstprivate and other testcase fixes)
Hi! On Wed, 1 Jul 2015 22:19:01 +0800, Chung-Lin Tang clt...@codesourcery.com wrote: This patch notices the index variable of an acc loop (internally an OMP_FOR) inside an OpenACC construct, and completes the implicit firstprivate behavior as described in the spec. The firstprivate clauses and FIXME in libgomp.oacc-c-c++-common/parallel-loop-2.h has also been removed together in the patch. Thanks! Also a typo-bug in testcase libgomp.oacc-c-c++-common/reduction-4.c is also corrected, where reduction variable names are apparently wrong. Tested without regressions, and applied to gomp-4_0-branch. I'm seeing: WARNING: program timed out. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-4.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test ... with: libgomp: cuStreamSynchronize error: launch timeout (also for C++), and applied in r225513: commit f03018ac39ed0193102fe29139d3c995caa02fd5 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Jul 7 12:45:12 2015 + libgomp: XFAIL libgomp.oacc-c-c++-common/reduction-4.c for acc_device_nvidia ... after r225250 changes. libgomp/ * testsuite/libgomp.oacc-c-c++-common/reduction-4.c: dg-xfail-run-if openacc_nvidia_accel_selected. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@225513 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgomp/ChangeLog.gomp| 5 + libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c | 1 + 2 files changed, 6 insertions(+) diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp index dc0f0bf..5f3dfaf 100644 --- libgomp/ChangeLog.gomp +++ libgomp/ChangeLog.gomp @@ -1,3 +1,8 @@ +2015-07-07 Thomas Schwinge tho...@codesourcery.com + + * testsuite/libgomp.oacc-c-c++-common/reduction-4.c: + dg-xfail-run-if openacc_nvidia_accel_selected. + 2015-06-24 James Norris jnor...@codesourcery.com * testsuite/libgomp.oacc-fortran/if-1.c: Fix syntax. 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c index 416d960..c32f1db 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c @@ -1,4 +1,5 @@ /* { dg-do run { target { ! { hppa*-*-hpux* } } } } */ +/* { dg-xfail-run-if "libgomp: cuStreamSynchronize error: launch timeout" { openacc_nvidia_accel_selected } } */ /* complex reductions. */ Grüße, Thomas
[AArch64][2/2] Define TARGET_UNSPEC_MAY_TRAP_P for AArch64
A second patch to improve rtl loop iv on AArch64. We should define this to tell gcc the pattern hidden by these GOT unspec is safe from trap, so gcc could make more positive decision when handling them, for example in RTL loop iv pass, when deciding whether one instruction is invariant candidate, may_trap_or_fault_p will be invoked which will call this target hook. OK for trunk? 2015-07-07 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function. (TARGET_UNSPEC_MAY_TRAP_P): Define as aarch64_unspec_may_trap_p. -- Regards, Jiong diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e180daa..c7c12ee 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11943,6 +11943,24 @@ aarch64_use_pseudo_pic_reg (void) return aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC; } +/* Implement TARGET_UNSPEC_MAY_TRAP_P. */ + +static int +aarch64_unspec_may_trap_p (const_rtx x, unsigned flags) +{ + switch (XINT (x, 1)) +{ +case UNSPEC_GOTSMALLPIC: +case UNSPEC_GOTSMALLPIC28K: +case UNSPEC_GOTTINYPIC: + return 0; +default: + break; +} + + return default_unspec_may_trap_p (x, flags); +} + #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST aarch64_address_cost @@ -12221,6 +12239,9 @@ aarch64_use_pseudo_pic_reg (void) #undef TARGET_SCHED_FUSION_PRIORITY #define TARGET_SCHED_FUSION_PRIORITY aarch64_sched_fusion_priority +#undef TARGET_UNSPEC_MAY_TRAP_P +#define TARGET_UNSPEC_MAY_TRAP_P aarch64_unspec_may_trap_p + #undef TARGET_USE_PSEUDO_PIC_REG #define TARGET_USE_PSEUDO_PIC_REG aarch64_use_pseudo_pic_reg
Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)
On 2015-07-07 12:35, Eric Botcazou wrote: 2015-06-26 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.md: Window save takes a single integer This will probably break in 64-bit mode, the operand can be a DImode register. You are right, I forgot about that. Is there a mode one can use that changes depending on the target architecture (32-bit on 32-bit architectures and 64-bit on 64-bit architectures)? Or does one have to add a 32-bit and a 64-bit variant of window_save? -- Daniel Cederman
Re: [PATCH] Update instruction cost for LEON
On 2015-07-07 12:37, Eric Botcazou wrote: 2015-07-03 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.c (struct processor_costs): Set div cost for leon to match UT699 and AT697F. Set mul cost for leon3 to match standard leon3. So UT699 is not a standard LEON3? LEON3 exists in multiple revisions and is configurable so I agree that using the word standard in this context is a bit ambiguous. I think we should delay applying this patch. First we need to look into how to properly provide the information on FPU selection and multiplier size to GCC. Otherwise we risk having to change the values again in a short while. -- Daniel Cederman
[PATCH 3/16][ARM] Add float16x4_t intrinsics
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html commit 54a89a084fbd00e4de036f549ca893b74b8f58fb Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 18:40:03 2014 + ARM: float16x4_t intrinsics (v2 - fix v[sg]et_lane_f16 at -O0, no vdup_n/vmov_n) diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index c923e29..b4100c8 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -41,6 +41,7 @@ typedef __simd64_int8_t int8x8_t; typedef __simd64_int16_t int16x4_t; typedef __simd64_int32_t int32x2_t; typedef __builtin_neon_di int64x1_t; +typedef __builtin_neon_hf float16_t; typedef __simd64_float16_t float16x4_t; typedef __simd64_float32_t float32x2_t; typedef __simd64_poly8_t poly8x8_t; @@ -5201,6 +5202,19 @@ vget_lane_s32 (int32x2_t __a, const int __b) return (int32_t)__builtin_neon_vget_lanev2si (__a, __b); } +/* Functions cannot accept or return __FP16 types. Even if the function + were marked always-inline so there were no call sites, the declaration + would nonetheless raise an error. Hence, we must use a macro instead. 
*/ + +#define vget_lane_f16(__v, __idx) \ + __extension__ \ +({ \ + float16x4_t __vec = (__v); \ + __builtin_arm_lane_check (4, __idx); \ + float16_t __res = __vec[__idx]; \ + __res; \ +}) + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vget_lane_f32 (float32x2_t __a, const int __b) { @@ -5333,6 +5347,16 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c) return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c); } +#define vset_lane_f16(__e, __v, __idx) \ + __extension__ \ +({ \ + float16_t __elem = (__e); \ + float16x4_t __vec = (__v); \ + __builtin_arm_lane_check (4, __idx); \ + __vec[__idx] = __elem; \ + __vec; \ +}) + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c) { @@ -5479,6 +5503,12 @@ vcreate_s64 (uint64_t __a) return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcreate_f16 (uint64_t __a) +{ + return (float16x4_t) __a; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vcreate_f32 (uint64_t __a) { @@ -8796,6 +8826,12 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c) return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c) +{ + return vset_lane_f16 (*__a, __b, __c); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c) { @@ -8944,6 +8980,13 @@ vld1_dup_s32 (const int32_t * __a) return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_dup_f16 (const float16_t * __a) 
+{ + float16_t __f = *__a; + return (float16x4_t) { __f, __f, __f, __f }; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vld1_dup_f32 (const float32_t * __a) { @@ -11828,6 +11871,12 @@ vreinterpret_p8_p16 (poly16x4_t __a) } __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_f16 (float16x4_t __a) +{ + return (poly8x8_t) __a; +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) vreinterpret_p8_f32 (float32x2_t __a) { return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a); @@ -11896,6 +11945,12 @@ vreinterpret_p16_p8 (poly8x8_t __a) } __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_f16 (float16x4_t __a) +{ + return (poly16x4_t) __a; +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) vreinterpret_p16_f32 (float32x2_t __a) { return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a); @@ -11957,6 +12012,80 @@ vreinterpret_p16_u32 (uint32x2_t __a) return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_p8 (poly8x8_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_p16 (poly16x4_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_f32 (float32x2_t __a) +{ + return (float16x4_t) __a; +} + +#ifdef __ARM_FEATURE_CRYPTO +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
[PATCH 2/16][ARM] PR/63870 Add __builtin_arm_lane_check.
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01334.html commit 1bb1b208a2c8c8b1ee1186c6128a498583fd64fe Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 18:36:30 2014 + Add __builtin_arm_lane_check diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 7f5bf87..89b1b0c 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -534,12 +534,16 @@ enum arm_builtins #undef CRYPTO2 #undef CRYPTO3 + ARM_BUILTIN_NEON_BASE, + ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE, + #include arm_neon_builtins.def ARM_BUILTIN_MAX }; -#define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data)) +#define ARM_BUILTIN_NEON_PATTERN_START \ +(ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data)) #undef CF #undef VAR1 @@ -898,7 +902,7 @@ arm_init_simd_builtin_scalar_types (void) static void arm_init_neon_builtins (void) { - unsigned int i, fcode = ARM_BUILTIN_NEON_BASE; + unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START; arm_init_simd_builtin_types (); @@ -908,6 +912,15 @@ arm_init_neon_builtins (void) system. */ arm_init_simd_builtin_scalar_types (); + tree lane_check_fpr = build_function_type_list (void_type_node, + intSI_type_node, + intSI_type_node, + NULL); + arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] = + add_builtin_function (__builtin_arm_lane_check, lane_check_fpr, + ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD, + NULL, NULL_TREE); + for (i = 0; i ARRAY_SIZE (neon_builtin_data); i++, fcode++) { bool print_type_signature_p = false; @@ -2171,14 +2184,28 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, return target; } -/* Expand a Neon builtin. These are special because they don't have symbolic +/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds. + Most of these are special because they don't have symbolic constants defined per-instruction or per instruction-variant. Instead, the required info is looked up in the table neon_builtin_data. 
*/ static rtx arm_expand_neon_builtin (int fcode, tree exp, rtx target) { + if (fcode == ARM_BUILTIN_NEON_LANE_CHECK) +{ + tree nlanes = CALL_EXPR_ARG (exp, 0); + gcc_assert (TREE_CODE (nlanes) == INTEGER_CST); + rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1)); + if (CONST_INT_P (lane_idx)) + neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp); + else + error ("%Klane index must be a constant immediate", exp); + /* Don't generate any RTL. */ + return const0_rtx; +} + neon_builtin_datum *d = - neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE]; + neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START]; enum insn_code icode = d->code; builtin_arg args[SIMD_MAX_BUILTIN_ARGS]; int num_args = insn_data[d->code].n_operands;
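The expansion above enforces a simple compile-time contract for `__builtin_arm_lane_check (nlanes, idx)`: the lane index must be a constant in `[0, nlanes)`, and on success the builtin expands to nothing. A rough model of that contract (plain Python with hypothetical names, not GCC internals) is:

```python
def lane_check(nlanes, lane_idx, idx_is_constant=True):
    """Toy model of __builtin_arm_lane_check: reject anything but a
    compile-time-constant lane index in the half-open range [0, nlanes)."""
    if not idx_is_constant:
        # corresponds to the error () branch above: no RTL is generated
        raise ValueError("lane index must be a constant immediate")
    if not 0 <= lane_idx < nlanes:
        # corresponds to neon_lane_bounds () rejecting the index
        raise IndexError("lane %d out of range [0, %d)" % (lane_idx, nlanes))
    # on success the builtin expands to nothing (const0_rtx)
```

This is how the `vget_lane_f16`/`vset_lane_f16` macros in patch 2/16 use it: a `float16x4_t` has four lanes, so they pass 4 as the first argument.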
[PATCH 4/16][ARM] Add float16x8_t type
Unchanged since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01336.html commit b9ccac6243415b304024443b74bdc97b3a5954f2 Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 18:40:24 2014 + Add float16x8_t + V8HFmode support (regardless of -mfp16-format) diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 89b1b0c..17e39d8 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -192,6 +192,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define di_UPDImode #define v16qi_UP V16QImode #define v8hi_UP V8HImode +#define v8hf_UP V8HFmode #define v4si_UP V4SImode #define v4sf_UP V4SFmode #define v2di_UP V2DImode @@ -827,6 +828,7 @@ arm_init_simd_builtin_types (void) /* Continue with standard types. */ arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node; arm_simd_types[Float32x2_t].eltype = float_type_node; + arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node; arm_simd_types[Float32x4_t].eltype = float_type_node; for (i = 0; i nelts; i++) diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def index bcbd20b..b178ae6 100644 --- a/gcc/config/arm/arm-simd-builtin-types.def +++ b/gcc/config/arm/arm-simd-builtin-types.def @@ -44,5 +44,7 @@ ENTRY (Float16x4_t, V4HF, none, 64, float16, 18) ENTRY (Float32x2_t, V2SF, none, 64, float32, 18) + + ENTRY (Float16x8_t, V8HF, none, 128, float16, 19) ENTRY (Float32x4_t, V4SF, none, 128, float32, 19) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 6e074ea..0faa46c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -26251,7 +26251,8 @@ arm_vector_mode_supported_p (machine_mode mode) { /* Neon also supports V2SImode, etc. listed in the clause below. 
*/ if (TARGET_NEON (mode == V2SFmode || mode == V4SImode || mode == V8HImode - || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == V2DImode)) + || mode ==V4HFmode || mode == V16QImode || mode == V4SFmode + || mode == V2DImode || mode == V8HFmode)) return true; if ((TARGET_NEON || TARGET_IWMMXT) diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 373dc85..c0a83b2 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -999,7 +999,7 @@ extern int arm_arch_crc; /* Modes valid for Neon Q registers. */ #define VALID_NEON_QREG_MODE(MODE) \ ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \ - || (MODE) == V4SFmode || (MODE) == V2DImode) + || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode) /* Structure modes valid for Neon registers. */ #define VALID_NEON_STRUCT_MODE(MODE) \ diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index b4100c8..a958f63 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -58,6 +58,7 @@ typedef __simd128_int8_t int8x16_t; typedef __simd128_int16_t int16x8_t; typedef __simd128_int32_t int32x4_t; typedef __simd128_int64_t int64x2_t; +typedef __simd128_float16_t float16x8_t; typedef __simd128_float32_t float32x4_t; typedef __simd128_poly8_t poly8x16_t; typedef __simd128_poly16_t poly16x8_t;
[PATCH 16/16][ARM/AArch64 Testsuite] Add test of vcvt{,_high}_{f16_f32,f32_f16}
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01349.html . Changes are to: use #if defined(__aarch64__) rather than __ARM_64BIT_STATE__; add an initial call to clean_results; use a different mechanism for adding -mfpu=neon-fp16 on ARM (specifically: we try to add that flag for all tests, as AFAICT that is valid anywhere -mfpu=neon is valid; and bail out of the vcvt_f16 test, the only test that actually requires fp16 H/W, if unsuccessful e.g. if a -mfpu=neon was forced on the command-line). This is because the rightmost -mfpu option overrides the previous. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: set additional flags for neon-fp16 support. * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. commit e6cc7467ddf5702d3a122b8ac4163621d0164b37 Author: Alan Lawrence alan.lawre...@arm.com Date: Wed Jan 28 13:02:22 2015 + v2 Test vcvt{,_high on aarch64}_f{32_f16,16_f32}, with neon-fp16 for ARM targets. v2a: #if defined(__aarch64__); + clean_results(); fp16 opts for ARM; fp16_hw_ok diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp index ceada83..5f5e1fe 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp @@ -52,8 +52,10 @@ if {[istarget arm*-*-*]} then { torture-init set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS -# Make sure Neon flags are provided, if necessary. -set additional_flags [add_options_for_arm_neon ] +# Make sure Neon flags are provided, if necessary. We try to add FP16 flags +# for all tests; tests requiring FP16 will abort if a non-FP16 option +# was forced. +set additional_flags [add_options_for_arm_neon_fp16 ] # Main loop. 
gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] \ diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c new file mode 100644 index 000..7a1c256 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c @@ -0,0 +1,98 @@ +/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h +#include math.h + +/* Expected results for vcvt. */ +VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170, + 0x4160, 0x4150 }; +VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 }; + +/* Expected results for vcvt_high_f32_f16. */ +VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130, + 0xc120, 0xc110 }; +/* Expected results for vcvt_high_f16_f32. */ +VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000, + 0xcc00, 0xcb80, 0xcb00, 0xca80 }; + +void +exec_vcvt (void) +{ + clean_results(); + +#define TEST_MSG vcvt_f32_f16 + { +VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 }; + +DECL_VARIABLE (vector_src, float, 16, 4); + +VLOAD (vector_src, buffer_src, , float, f, 16, 4); +DECL_VARIABLE (vector_res, float, 32, 4) = + vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4)); +vst1q_f32 (VECT_VAR (result, float, 32, 4), + VECT_VAR (vector_res, float, 32, 4)); + +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, ); + } +#undef TEST_MSG + + clean_results (); + +#define TEST_MSG vcvt_f16_f32 + { +VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 }; +DECL_VARIABLE (vector_src, float, 32, 4); + +VLOAD (vector_src, buffer_src, q, float, f, 32, 4); +DECL_VARIABLE (vector_res, float, 16, 4) = + vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4)); +vst1_f16 (VECT_VAR (result, float, 16, 4), + VECT_VAR (vector_res, float, 16 ,4)); + +CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, 
expected, ); + } +#undef TEST_MSG + +#if defined(__aarch64__) + clean_results (); + +#define TEST_MSG vcvt_high_f32_f16 + { +DECL_VARIABLE (vector_src, float, 16, 8); +VLOAD (vector_src, buffer, q, float, f, 16, 8); +DECL_VARIABLE (vector_res, float, 32, 4); +VECT_VAR (vector_res, float, 32, 4) = + vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8)); +vst1q_f32 (VECT_VAR (result, float, 32, 4), + VECT_VAR (vector_res, float, 32, 4)); +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, ); + } +#undef TEST_MSG + clean_results (); + +#define TEST_MSG vcvt_high_f16_f32 + { +DECL_VARIABLE (vector_low, float, 16, 4); +VDUP (vector_low, , float, f, 16, 4, 2.0); + +DECL_VARIABLE (vector_src, float, 32, 4); +VLOAD (vector_src, buffer, q, float, f, 32, 4); + +DECL_VARIABLE (vector_res, float, 16, 8) = + vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16,
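The half-precision constants in `expected` can be sanity-checked off-target: since Python 3.6 the `struct` module supports the IEEE 754 binary16 format code `'e'`, so the bit patterns the test expects for the `vcvt_f16_f32` inputs 1.5, 2.5, 3.5, 4.5 can be reproduced directly. This is only a quick cross-check of the constants, not part of the testsuite:

```python
import struct

def f16_bits(x):
    # Round-trip x through IEEE 754 binary16 and return the raw 16-bit pattern.
    return struct.unpack('<H', struct.pack('<e', x))[0]

# Matches VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
for val, expected in [(1.5, 0x3E00), (2.5, 0x4100), (3.5, 0x4300), (4.5, 0x4480)]:
    assert f16_bits(val) == expected, (val, hex(f16_bits(val)))
```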
[AArch64][1/2] Mark GOT related MEM rtx as const to help RTL loop IV
Given below testcase, the instruction which load function address from GOT table is not hoisted out of the loop while it should be, as the value is fixed at runtime. The problem is we havn't mark those GOT related mem as READONLY that RTL loop2_iv pass has make conservative decision in check_maybe_invariant to not hoist them. int bar (int) ; int foo (int a, int bound) { int i = 0; int sum = 0; for (i; i bound; i++) sum = bar (sum); return sum; } this patch mark mem in PIC related pattern as READONLY and NO_TRAP, more cleanup may needed for several other pattern. 2015-07-06 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Mark mem as READONLY and NOTRAP for PIC symbol. gcc/testsuite/ * gcc.target/aarch64/got_mem_hoist.c: New test. -- Regards, Jiong diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4522fc2..4bbc049 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -915,6 +915,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, { machine_mode mode = GET_MODE (dest); rtx gp_rtx = pic_offset_table_rtx; + rtx insn; + rtx mem; /* NOTE: pic_offset_table_rtx can be NULL_RTX, because we can reach here before rtl expand. Tree IVOPT will generate rtl pattern to @@ -958,16 +960,27 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, if (mode == ptr_mode) { if (mode == DImode) - emit_insn (gen_ldr_got_small_28k_di (dest, gp_rtx, imm)); + insn = gen_ldr_got_small_28k_di (dest, gp_rtx, imm); else - emit_insn (gen_ldr_got_small_28k_si (dest, gp_rtx, imm)); + insn = gen_ldr_got_small_28k_si (dest, gp_rtx, imm); + + mem = XVECEXP (SET_SRC (insn), 0, 0); } else { gcc_assert (mode == Pmode); - emit_insn (gen_ldr_got_small_28k_sidi (dest, gp_rtx, imm)); + + insn = gen_ldr_got_small_28k_sidi (dest, gp_rtx, imm); + mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0); } + /* The operand is expected to be MEM. 
Whenever the related insn + pattern changed, above code which calculate mem should be + updated. */ + gcc_assert (GET_CODE (mem) == MEM); + MEM_READONLY_P (mem) = 1; + MEM_NOTRAP_P (mem) = 1; + emit_insn (insn); return; } @@ -980,6 +993,9 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, DImode if dest is dereferenced to access the memeory. This is why we have to handle three different ldr_got_small patterns here (two patterns for ILP32). */ + + rtx insn; + rtx mem; rtx tmp_reg = dest; machine_mode mode = GET_MODE (dest); @@ -990,16 +1006,24 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, if (mode == ptr_mode) { if (mode == DImode) - emit_insn (gen_ldr_got_small_di (dest, tmp_reg, imm)); + insn = gen_ldr_got_small_di (dest, tmp_reg, imm); else - emit_insn (gen_ldr_got_small_si (dest, tmp_reg, imm)); + insn = gen_ldr_got_small_si (dest, tmp_reg, imm); + + mem = XVECEXP (SET_SRC (insn), 0, 0); } else { gcc_assert (mode == Pmode); - emit_insn (gen_ldr_got_small_sidi (dest, tmp_reg, imm)); + + insn = gen_ldr_got_small_sidi (dest, tmp_reg, imm); + mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0); } + gcc_assert (GET_CODE (mem) == MEM); + MEM_READONLY_P (mem) = 1; + MEM_NOTRAP_P (mem) = 1; + emit_insn (insn); return; } diff --git a/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c b/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c new file mode 100644 index 000..6d29718 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fpic -fdump-rtl-loop2_invariant } */ + +int bar (int); +int cal (void *); + +int +foo (int a, int bound) +{ + int i = 0; + int sum = 0; + + for (i; i bound; i++) +sum = cal (bar); + + return sum; +} + +/* The insn which loads function address from GOT table should be moved out + of the loop. */ +/* { dg-final { scan-rtl-dump Decided loop2_invariant } } */
Re: [gomp] Move openacc vector worker single handling to RTL
On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote: On 07/04/15 16:41, Nathan Sidwell wrote: On 07/03/15 19:11, Jakub Jelinek wrote: If the builtins are not meant to be used by users directly (I assume they aren't) nor have a 1-1 correspondence to a library routine, it is much better to emit them as internal calls (see internal-fn.{c,def}) instead of BUILT_IN_NORMAL functions. This patch uses internal builtins; I had to make one additional change to tree-ssa-tail-merge.c's same_succ_def::equal hash compare function. The new internal fn I introduced should compare EQ but not otherwise compare EQUAL, and that was blowing up the hash function, which relied on EQUAL only. I don't know why I didn't hit this problem in the previous patch with the regular builtin. How does this interact with #pragma acc routine {gang,worker,vector,seq} ? Or is that something to be added later on? Jakub
Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute
Ramana Radhakrishnan wrote: This is OK, the ada testing can go in parallel and we should take this in to not delay rc1 any further. I can confirm, no regressions in check-ada (gcc/testsuite/gnats and gcc/testsuite/acats) following an ada bootstrap on cortex-a15/neon/hard-float. That's the existing tests - nothing specifically testing conformance to the AAPCS updates (wrt. arrays), of course. Cheers, Alan
Re: Clean-ups in match.pd
On Tue, Jul 7, 2015 at 12:48 PM, Eric Botcazou ebotca...@adacore.com wrote: Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like *-match.c takes about 10 minutes to compile in stage2 these days). Yeah, it has already taken back all the speedup brought by the rewrite of the RTL gen* stuff by Richard S. :-( And it's going to get worse (read: larger). Looking at the time-report data I don't think splitting into multiple functions will help though. Splitting into multiple files would allow to parallelize the build at least. I'm gathering a profile to see where all the time in the checking stuff goes. As said, one code generation arrangement that is on my TODO list will remove some code duplication, but I'm not sure it will make a big enough difference. Richard. -- Eric Botcazou
Re: [RFC] two-phase marking in gt_cleare_cache
On 07/07/15 10:42, Richard Biener wrote: On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com wrote: On 06/07/15 15:29, Richard Biener wrote: On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, Using attached untested patch, I managed to minimize a test-case failure for PR 66714. The patch introduces two-phase marking in gt_cleare_cache: - first phase, it loops over all the hash table entries and removes those which are dead - second phase, it runs over all the live hash table entries and marks live items that are reachable from those live entries By doing so, we make the behaviour of gt_cleare_cache independent of the order in which the entries are visited, turning: - hard-to-trigger bugs which trigger for one visiting order but not for another, into - more easily triggered bugs which trigger for any visiting order. Any comments? I think it is only half-way correct in your proposed change. You only fix the issue for hashes of the same kind. To truly fix the issue you'd have to change generated code for gt_clear_caches () and provide a clearing-only implementation (or pass a operation mode bool to the core worker in hash-table.h). [ Btw, we have been discussing a similar issue before: https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ] True, the problem exists at the scope of all variables marked with 'cache', and this patch addresses the problem only within a single variable. Hmm, and don't we rather want to first mark and _then_ clear? I. In favor of first clear and then mark: It allows for: - a lazy one phase implementation for !ENABLE_CHECKING where you do a single clear-or-mark phase (so the clear is lazy). - an eager two phase implementation for ENABLE_CHECKING (where the clear is eager) The approach of first a marking phase and then a clearing phase means you always have to do these two phases (you can't do the marking lazily). True. 
First mark and then clear means the marking should be done iteratively. Each time you mark something live, another entry in another hash table could become live. Marking iteratively could become quite costly. I don't see this - marking is done recursively so if one entry makes another live and that makes another live the usual GC marking recursion will deal with this? That is not my understanding. Marking an item live doesn't mean that the associated cache entries become live. For that, we have to iterate again over all hash tables and all entries to find those entries. And by marking those, we may find new items which are live. And the process starts over again, until fixed point. [ If we maintain a per-item list of cache entries the item is the key for, then we can do this recursively, rather than iteratively. ] II. In favor of first mark and then clear: The users of garbage collection will need to be less precise. Because if entry B in the hash is live and would keep A live then A _is_ kept in the end but you'll remove it from the hash, possibly no longer using a still live copy. I'm not sure I understand the scenario you're concerned about, but ... say we have - entry B: item B - item A - entry A: item A - item Z If you do clear first and mark second, and you start out with item B live and item A dead: - during the clearing phase you clear entry A and keep entry B, and - during the marking phase you mark item A live. So we no longer have entry A, but item A is kept and entry B is kept. Yes. This makes the cache weaker in that after this GC operation a lookup of A no longer succeeds but it still is there. The whole point of your patch was to make the behavior more predictable and in some way it succeeds (within a cache). As it is supposed to put more stress on the cache logic (it's ENABLE_CHECKING only) it makes sense to clear optimistically (after all it's a cache and not guaranteed to find a still live entry). 
It would be still nice to cover all caches together because as I remember we've mostly seen issues of caches interacting. Attached patch (completed no-bootstrap c-only build) implements that. Thanks, - Tom Add clear-phase/mark-phase cache clearing 2015-07-07 Tom de Vries t...@codesourcery.com * gengtype.c (finish_cache_funcs): Add phase param to gt_clear_caches_file and gt_clear_caches. (write_roots): Add phase param to gt_clear_caches_file, and use. * ggc-common.c (ggc_mark_roots): Add arg to call to gt_clear_caches. Call gt_clear_caches twice for ENABLE_CHECKING. * ggc.h (gt_clear_caches): Add phase param to declaration. * hash-table.h (gt_cleare_cache): Add and handle phase param. --- gcc/gengtype.c | 11 ++- gcc/ggc-common.c | 11 ++- gcc/ggc.h| 2 +- gcc/hash-table.h | 55 --- 4 files changed, 61 insertions(+), 18 deletions(-) diff --git a/gcc/gengtype.c
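As a toy model of the two-phase scheme being discussed (plain Python dictionaries standing in for GCC's GTY caches, not the real hash-table code): phase one eagerly drops, across all caches together, every entry keyed by a dead item; phase two marks whatever the surviving entries reach. It reproduces the entry-B/item-A scenario above: item A is kept alive in the end, but its own cache entry is optimistically cleared, so a later lookup of A misses.

```python
def clear_caches_two_phase(caches, live):
    # Phase 1: across *all* caches, drop entries keyed by a dead item.
    for cache in caches:
        for key in [k for k in cache if k not in live]:
            del cache[key]
    # Phase 2: mark every item still reachable from a surviving entry
    # (the real GC recurses from here; one pass suffices for this example).
    for cache in caches:
        for value in cache.values():
            live.add(value)
    return caches, live

# entry B (keyed by the live item B) keeps item A;
# entry A is keyed by the initially-dead item A.
caches = [{'B': 'A'}, {'A': 'Z'}]
caches, live = clear_caches_two_phase(caches, live={'B'})
assert live == {'A', 'B'}          # item A is kept alive by entry B ...
assert caches == [{'B': 'A'}, {}]  # ... but a lookup of A no longer succeeds
```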
Re: [PATCH] Update instruction cost for LEON
2015-07-03 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.c (struct processor_costs): Set div cost for leon to match UT699 and AT697F. Set mul cost for leon3 to match standard leon3. So UT699 is not a standard LEON3? -- Eric Botcazou
Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)
On Tue, 7 Jul 2015, Richard Biener wrote: On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote: Please find attached the patch PR25529.patch that converts the pattern (unsigned t * 2)/2 into unsigned t & 0x7FFFFFFF. +/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (div (mult @0 INTEGER_CST@1) INTEGER_CST@1) You don't need to repeat INTEGER_CST, the second time @1 is enough. + (with { tree n2 = build_int_cst (TREE_TYPE (@0), + wi::exact_log2 (@1)); } + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); } + { n2; }) { n2; })) What happens if you write t*3/3? Huh, and you posted this patch twice? See my reply to the other copy for the correctness issues and better handling of exact_div. They are not the same: one is for left shifts and the other one for right shifts. And that makes a big difference: in t*c/c, the division is always exact, so all divisions are equivalent. This is not the case for t/c*c. -- Marc Glisse
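Marc's t*3/3 question is easy to check numerically. With 32-bit wrapping unsigned arithmetic the power-of-two identity holds for every t, while a non-power-of-two multiplier gives counterexamples, so the pattern really must restrict @1. A quick numeric check (plain Python, not a GCC test):

```python
M = 0xFFFFFFFF  # mask for 32-bit unsigned wraparound

def mul_div(t, c):
    # (unsigned t * c) / c with a wrapping 32-bit multiply and truncating divide
    return ((t * c) & M) // c

# (t * 2) / 2 == t & 0x7FFFFFFF for every 32-bit t
for t in (0, 1, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF, M):
    assert mul_div(t, 2) == t & 0x7FFFFFFF

# t*3/3: the multiply wraps, so the result is neither t nor any mask of t
t = 0x60000000
assert ((t * 3) & M) == 0x20000000  # the product has already lost information
assert mul_div(t, 3) != t
```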
Re: [PING 3] Re: [PATCH] warn for unsafe calls to __builtin_return_address
On 07/07/2015 04:41 AM, Martin Sebor wrote: This is a small change to diagnose unsafe calls to __builtin_{frame,return}_address (with an argument 2) that tend to return bogus values or lead to crashes at runtime. I hadn't realized you went through and implemented the suggestion. Thanks for doing this. I hope it gets approved. A review would be appreciated. (It may be that the relevant maintainers haven't noticed there's a code patch here because it is nested within the [PATCH] clarify doc for __builtin_return_address thread.) Thanks, Pedro Alves
Re: Clean-ups in match.pd
Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like *-match.c takes about 10 minutes to compile in stage2 these days). Yeah, it has already taken back all the speedup brought by the rewrite of the RTL gen* stuff by Richard S. :-( -- Eric Botcazou
[PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS
Hi, When building the call-[1,5,6].c tests for micromips the jrc rather than the jr instruction is used to call the tail* functions. I have updated the test output to allow the jrc instruction to be matched. I have tested this on the mips-mti-elf target using mips32r2/{-mno-micromips/-mmicromips} test options and there are no new regressions. The patch and ChangeLog are below. Ok to commit? Many thanks, Andrew testsuite/ * gcc.target/mips/call-1.c: Allow testcase to match the jrc instruction. * gcc.target/mips/call-5.c: Ditto. * gcc.target/mips/call-6.c: Ditto. diff --git a/gcc/testsuite/gcc.target/mips/call-1.c b/gcc/testsuite/gcc.target/mips/call-1.c index 2f4a37e..a00126e 100644 --- a/gcc/testsuite/gcc.target/mips/call-1.c +++ b/gcc/testsuite/gcc.target/mips/call-1.c @@ -3,10 +3,10 @@ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,normal\n1:\tjalrs?\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,normal2\n1:\tjalrs?\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalrs?\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail2\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t } } */ __attribute__ ((noinline)) static void staticfunc () { asm (); } int normal (); diff --git a/gcc/testsuite/gcc.target/mips/call-5.c b/gcc/testsuite/gcc.target/mips/call-5.c index bfb95eb..d8d84d3 100644 --- a/gcc/testsuite/gcc.target/mips/call-5.c +++ 
b/gcc/testsuite/gcc.target/mips/call-5.c @@ -7,8 +7,8 @@ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalr\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjalr\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjalr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t } } */ __attribute__ ((noinline)) static void staticfunc () { asm (); } int normal (); diff --git a/gcc/testsuite/gcc.target/mips/call-6.c b/gcc/testsuite/gcc.target/mips/call-6.c index 117795d..e6c90d7 100644 --- a/gcc/testsuite/gcc.target/mips/call-6.c +++ b/gcc/testsuite/gcc.target/mips/call-6.c @@ -6,8 +6,8 @@ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalr\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjalr\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjalr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t } } */ __attribute__ ((noinline)) static void staticfunc () { asm (); } int normal ();
Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t 0x7FFFFFFF)
On Tue, Jul 7, 2015 at 11:24 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 7 Jul 2015, Richard Biener wrote: On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote: Please find attached the patch PR25529.patch that converts the pattern (unsigned t * 2)/2 into unsigned t & 0x7FFFFFFF. +/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (div (mult @0 INTEGER_CST@1) INTEGER_CST@1) You don't need to repeat INTEGER_CST, the second time @1 is enough. + (with { tree n2 = build_int_cst (TREE_TYPE (@0), + wi::exact_log2 (@1)); } + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); } + { n2; }) { n2; })) What happens if you write t*3/3? Huh, and you posted this patch twice? See my reply to the other copy for the correctness issues and better handling of exact_div. They are not the same: one is for left shifts and the other one for right shifts. And that makes a big difference: in t*c/c, the division is always exact, so all divisions are equivalent. This is not the case for t/c*c. Ah, sorry. Still the same comment about computing the constant and the placing of the 'with' applies. For signed types with TYPE_OVERFLOW_UNDEFINED you can simply cancel the operation (even for non-power-of-two multipliers). In fold-const.c, extract_muldiv contains magic to handle this kind of case. Otherwise for signed division (only the sign of the division matters, so you can probably ignore sign-changing conversions of the multiplication result) you can simplify it to a sign-extension from bit precision - log2, with the proposed introduction of a SEXT_EXPR (see the other thread about type promotion). Richard. -- Marc Glisse
Re: [RFC] two-phase marking in gt_cleare_cache
On Tue, Jul 7, 2015 at 11:39 AM, Tom de Vries tom_devr...@mentor.com wrote: On 07/07/15 10:42, Richard Biener wrote: On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com wrote: On 06/07/15 15:29, Richard Biener wrote: On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, Using attached untested patch, I managed to minimize a test-case failure for PR 66714. The patch introduces two-phase marking in gt_cleare_cache: - first phase, it loops over all the hash table entries and removes those which are dead - second phase, it runs over all the live hash table entries and marks live items that are reachable from those live entries By doing so, we make the behaviour of gt_cleare_cache independent of the order in which the entries are visited, turning: - hard-to-trigger bugs which trigger for one visiting order but not for another, into - more easily triggered bugs which trigger for any visiting order. Any comments? I think it is only half-way correct in your proposed change. You only fix the issue for hashes of the same kind. To truly fix the issue you'd have to change generated code for gt_clear_caches () and provide a clearing-only implementation (or pass a operation mode bool to the core worker in hash-table.h). [ Btw, we have been discussing a similar issue before: https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ] True, the problem exists at the scope of all variables marked with 'cache', and this patch addresses the problem only within a single variable. Hmm, and don't we rather want to first mark and _then_ clear? I. In favor of first clear and then mark: It allows for: - a lazy one phase implementation for !ENABLE_CHECKING where you do a single clear-or-mark phase (so the clear is lazy). 
- an eager two phase implementation for ENABLE_CHECKING (where the clear is eager) The approach of first a marking phase and then a clearing phase means you always have to do these two phases (you can't do the marking lazily). True. First mark and then clear means the marking should be done iteratively. Each time you mark something live, another entry in another hash table could become live. Marking iteratively could become quite costly. I don't see this - marking is done recursively so if one entry makes another live and that makes another live the usual GC marking recursion will deal with this? That is not my understanding. Marking an item live doesn't mean that the associated cache entries become live. For that, we have to iterate again over all hash tables and all entries to find those entries. And by marking those, we may find new items which are live. And the process starts over again, until fixed point. All used predicates are basically ggc_marked_p () AFAIK. So when sth was not marked previosuly GC will recurse to marking it. GC only considers the reference from the cache special not references from entries in the cache. But maybe I am missing something here. [ If we maintain a per-item list of cache entries the item is the key for, then we can do this recursively, rather than iteratively. II. In favor of first mark and then clear: The users of garbage collection will need to be less precise. Because if entry B in the hash is live and would keep A live then A _is_ kept in the end but you'll remove it from the hash, possibly no longer using a still live copy. I'm not sure I understand the scenario you're concerned about, but ... say we have - entry B: item B - item A - entry A: item A - item Z If you do clear first and mark second, and you start out with item B live and item A dead: - during the clearing phase you clear entry A and keep entry B, and - during the marking phase you mark item A live. 
So we no longer have entry A, but item A is kept and entry B is kept. Yes. This makes the cache weaker in that after this GC operation a lookup of A no longer succeeds, but it is still there. The whole point of your patch was to make the behavior more predictable, and in some way it succeeds (within a cache). As it is supposed to put more stress on the cache logic (it's ENABLE_CHECKING only), it makes sense to clear optimistically (after all, it's a cache and not guaranteed to find a still-live entry). It would still be nice to cover all caches together because, as I remember, we've mostly seen issues of caches interacting. The attached patch (completed a no-bootstrap C-only build) implements that. Looks good to me. Thanks, Richard. Thanks, - Tom
Re: [PATCH] Do not use floating point registers when compiling with -msoft-float for SPARC
__builtin_apply* and __builtin_return access the floating-point registers on SPARC even when compiling with -msoft-float. Ouch. The fix is OK for all active branches but... 2015-06-26 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.c (sparc_function_value_regno_p): Floating point registers cannot be used when compiling for a target without FPU. * config/sparc/sparc.md: A function cannot return a value in a floating point register when compiled without floating point support. The ChangeLog must just describe the what, nothing more. If the rationale is not obvious, then a comment must be added _in the code_ itself. * config/sparc/sparc.c (sparc_function_value_regno_p): Do not return true on %f0 for a target without FPU. * config/sparc/sparc.md (untyped_call): Do not save %f0 for a target without FPU. (untyped_return): Do not load %f0 for a target without FPU.

+
+  if ( TARGET_FPU )
+    {
+      rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
+      emit_move_insn (valreg2,
+                      adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8));
+      emit_use (valreg2);
+    }

Superfluous spaces around TARGET_FPU here. -- Eric Botcazou
Re: [patch 4/9] Flatten sel-sched-dump.h and sel-sched-ir.h
On Tue, 7 Jul 2015, Andrew MacLeod wrote: This patch flattens both sel-sched-dump.h and sel-sched-ir.h. Both these files end up including cfgloop.h, so in preparation for flattening cfgloop.h, flatten these. Note they actually have only a small effect on what includes them. This patch removes #include insn-attr.h from sel-sched-ir.h without adding it to .c files. I'm curious how it works, is that file now arranged to be included elsewhere? (sorry if I missed it, but the patch series does not seem to mention insn-attr.h specifically) Thanks. Alexander
Re: [gomp4] Allow parameter declarations with deviceptr
Hi! On Wed, 1 Jul 2015 16:33:24 -0700, Cesar Philippidis ce...@codesourcery.com wrote: On 07/01/2015 02:25 PM, James Norris wrote: This patch allows parameter declarations to be used as arguments to deviceptr for C and C++. Thanks! I suppose this does fix http://gcc.gnu.org/PR64748? Does this fix an existing failure? If not, can you please add a new test case? An earlier submission, http://news.gmane.org/find-root.php?message_id=%3C54E23658.6060105%40codesourcery.com%3E, did include some testsuite changes -- but I had not seen any update of this patch after Jakub's and my review comments. Grüße, Thomas
[PATCH][13/n] Remove GENERIC stmt combining from SCCVN
This moves a few more patterns that show up during bootstrap. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-07-07 Richard Biener rguent...@suse.de * fold-const.c (fold_binary_loc): Move (X & C2) << C1 -> (X << C1) & (C2 << C1) simplification ... * match.pd: ... here. Add (X * C1) % C2 -> 0 simplification pattern derived from extract_muldiv_1. * gcc.dg/vect/vect-over-widen-3-big-array.c: Adjust.

Index: gcc/match.pd
===
--- gcc/match.pd	(revision 225504)
+++ gcc/match.pd	(working copy)
@@ -230,7 +230,14 @@ (define_operator_list CBRT BUILT_IN_CBRT
 /* (X % Y) % Y is just X % Y.  */
 (simplify
  (mod (mod@2 @0 @1) @1)
- @2))
+ @2)
+ /* From extract_muldiv_1: (X * C1) % C2 is zero if C1 is a multiple of C2.  */
+ (simplify
+  (mod (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (ANY_INTEGRAL_TYPE_P (type)
+       && TYPE_OVERFLOW_UNDEFINED (type)
+       && wi::multiple_of_p (@1, @2, TYPE_SIGN (type)))
+   { build_zero_cst (type); })))

 /* X % -C is the same as X % C.  */
 (simplify
@@ -992,6 +999,16 @@ (define_operator_list CBRT BUILT_IN_CBRT
      (if (shift_type == TREE_TYPE (@3))
       (bit_and @4 { newmaskt; }
+
+/* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
+   (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
+(for shift (lshift rshift)
+ (simplify
+  (shift (convert? (bit_and @0 INTEGER_CST@2)) INTEGER_CST@1)
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
+    (bit_and (shift (convert @0) @1) { mask; })
+
 /* Simplifications of conversions.  */

 /* Basic strip-useless-type-conversions / strip_nops.  */

Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 225504)
+++ gcc/fold-const.c	(working copy)
@@ -11194,27 +11140,6 @@ fold_binary_loc (location_t loc,
 	      prec) == 0)
 	return TREE_OPERAND (arg0, 0);

-      /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
-	 (X & C2) >> C1 into (X >> C1) & (C2 >> C1)
-	 if the latter can be further optimized.  */
-      if ((code == LSHIFT_EXPR || code == RSHIFT_EXPR)
-	  && TREE_CODE (arg0) == BIT_AND_EXPR
-	  && TREE_CODE (arg1) == INTEGER_CST
-	  && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST)
-	{
-	  tree mask = fold_build2_loc (loc, code, type,
-				       fold_convert_loc (loc, type,
-							 TREE_OPERAND (arg0, 1)),
-				       arg1);
-	  tree shift = fold_build2_loc (loc, code, type,
-					fold_convert_loc (loc, type,
-							  TREE_OPERAND (arg0, 0)),
-					arg1);
-	  tem = fold_binary_loc (loc, BIT_AND_EXPR, type, shift, mask);
-	  if (tem)
-	    return tem;
-	}
-
       return NULL_TREE;

     case MIN_EXPR:

Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	(revision 225504)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	(working copy)
@@ -58,6 +58,6 @@ int main (void)
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Re: [gomp] Move openacc vector worker single handling to RTL
On 07/07/15 05:54, Jakub Jelinek wrote: On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote: How does this interact with #pragma acc routine {gang,worker,vector,seq} ? Or is that something to be added later on? That is to be added later on. I suspect such routines will trivially work, as they'll be marked up with the loop head/tail functions and levels builtin (the latter might need a bit of reworking). What will need additional work at that point is the callers of routines -- they're typically called from a foo-single mode, but need to get all threads into the called function. I'm thinking each call site will look like a mini-loop[*] surrounded by a head/tail marker. (all that can be done in the device-side compiler once real call sites are known.) nathan [*] of course it won't be a loop. Perhaps fork/join are less confusing names after all. WDYT?
Re: [gomp] Move openacc vector worker single handling to RTL
On Tue, Jul 07, 2015 at 10:12:56AM -0400, Nathan Sidwell wrote: On 07/07/15 05:54, Jakub Jelinek wrote: On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote: How does this interact with #pragma acc routine {gang,worker,vector,seq} ? Or is that something to be added later on? That is to be added later on. I suspect such routines will trivially work, as they'll be marked up with the loop head/tail functions and levels builtin (the latter might need a bit of reworking). What will need additional work at that point is the callers of routines -- they're typically called from a foo-single mode, but need to get all threads into the called function. I'm thinking each call site will look like a mini-loop[*] surrounded by a head/tail marker. (all that can be done in the device-side compiler once real call sites are known.) Wouldn't function attributes be better for that case, and just use the internal functions for the case when the mode is being changed in the middle of a function? I agree that fork/join might be less confusing. BTW, where do you plan to lower the internal functions for non-PTX? Doing it in RTL mach reorg is too late for those, and we shouldn't be writing it for each single target, as for non-PTX (perhaps non-HSA) I bet the behavior is the same. Jakub
[gomp4] OpenACC device_type clause (was: OpenACC: Complete changes to disallow the independent clause after device_type)
Hi Cesar! On Tue, 7 Jul 2015 14:58:20 +0200, I wrote: On Wed, 1 Jul 2015 07:52:13 -0500, James Norris jnor...@codesourcery.com wrote: The independent clause is not available for use with device_type clauses associated with loop directives. [...] Independent of this, would you please verify (also for the corresponding Fortran code) that given these changes:

--- gcc/testsuite/c-c++-common/goacc/dtype-1.c
+++ gcc/testsuite/c-c++-common/goacc/dtype-1.c
@@ -53,7 +53,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop dtype (nVidia) auto
     for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop dtype (nVidia) independent
+#pragma acc loop dtype (nVidia)
       for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidia) seq
         for (i6 = 1; i6 < 10; i6++)
@@ -69,7 +69,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop dtype (nVidia) auto device_type (*) seq
     for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop device_type (nVidia) independent device_type (*) seq
+#pragma acc loop device_type (nVidia) device_type (*) seq
       for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidia) seq
         for (i6 = 1; i6 < 10; i6++)
@@ -85,7 +85,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop device_type (nVidiaGPU) auto device_type (*) seq
     for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop dtype (nVidiaGPU) independent dtype (*) seq
+#pragma acc loop dtype (nVidiaGPU) dtype (*) seq
      for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidiaGPU) seq device_type (*) seq
        for (i6 = 1; i6 < 10; i6++)

...
it is indeed correct to adjust the scan-tree-dump-times as follows:

@@ -147,7 +147,7 @@ test ()
 /* { dg-final { scan-tree-dump-times "acc loop auto private\\(i4\\)" 2 "omplower" } } */
-/* { dg-final { scan-tree-dump-times "acc loop private\\(i5\\)" 2 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "acc loop private\\(i5\\)" 1 "omplower" } } */
 /* { dg-final { scan-tree-dump-times "acc loop seq private\\(i6\\)" 3 "omplower" } } */
@@ -155,6 +155,6 @@ test ()
 /* { dg-final { scan-tree-dump-times "acc loop seq private\\(i4\\)" 1 "omplower" } } */
-/* { dg-final { scan-tree-dump-times "acc loop seq private\\(i5\\)" 1 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "acc loop seq private\\(i5\\)" 2 "omplower" } } */
 /* { dg-final { cleanup-tree-dump "omplower" } } */

Is something possibly going wrong if there are two consecutive device_type clauses, such as in: #pragma acc loop device_type (nVidia) device_type (*) seq ..., or: #pragma acc loop dtype (nVidiaGPU) dtype (*) seq I'm unclear why I had to apply these scan-tree-dump-times changes?

--- gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
+++ gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
@@ -54,7 +54,7 @@ program dtype
   do i3 = 1, 10
 !$acc loop device_type (nVidia) auto
     do i4 = 1, 10
-  !$acc loop dtype (nVidia) independent
+  !$acc loop dtype (nVidia)
       do i5 = 1, 10
 !$acc loop dtype (nVidia) seq
         do i6 = 1, 10
@@ -76,7 +76,7 @@ program dtype
   do i3 = 1, 10
 !$acc loop device_type (nVidia) auto dtype (*) seq
     do i4 = 1, 10
-  !$acc loop dtype (nVidia) independent &
+  !$acc loop dtype (nVidia) &
   !$acc dtype (*) seq
       do i5 = 1, 10
 !$acc loop device_type (nVidia) seq
@@ -99,7 +99,7 @@ program dtype
   do i3 = 1, 10
 !$acc loop dtype (nVidiaGPU) auto device_type (*) seq
     do i4 = 1, 10
-  !$acc loop dtype (nVidiaGPU) independent &
+  !$acc loop dtype (nVidiaGPU) &
   !$acc dtype (*) seq
       do i5 = 1, 10
 !$acc loop dtype (nVidiaGPU) seq device_type (*) seq

No adjustment has been necessary here. Grüße, Thomas
RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, July 07, 2015 2:21 PM To: Ajit Kumar Agarwal Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com wrote: -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, June 30, 2015 4:42 PM To: Ajit Kumar Agarwal Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com wrote: All: The patch below adds a new path-splitting optimization pass on the SSA representation. The path-splitting optimization pass moves the join block of an if-then-else, when it is the same as the loop latch, into its predecessors, where it gets merged, preserving the SSA representation. The patch is tested for the MicroBlaze and i386 targets. The EEMBC/MiBench benchmarks are run with the MicroBlaze target, and a performance gain of 9.15% is seen for rgbcmy01_lite (EEMBC benchmarks). The DejaGnu tests are run for the MicroBlaze target; no regression is seen, and the new testcases attached pass. For i386, bootstrapping goes through fine, and the SPEC CPU2000 benchmarks are run with this patch. The following observations were made with the SPEC CPU2000 benchmarks: the ratio with the path-splitting change vs. without it is 3653.353 vs. 3652.14 for INT benchmarks, and 4353.812 vs. 4345.351 for FP benchmarks. Based on comments on the RFC patch, the following changes were done: 1. Added a new pass for the path splitting changes. 2.
Placed the new path-splitting optimization pass before the copy propagation pass. 3. The join block that is the same as the loop latch is wired into its predecessors so that the CFG cleanup pass will merge the blocks wired together. 4. The copy propagation routines added for the path splitting changes are not needed, as suggested by Jeff. They are removed in the patch, as the copy propagation in the copied join blocks will be done by the existing copy propagation pass and the update-ssa pass. 5. Only the propagation of phi results of the join block with the phi argument is done, which will not be done by the existing update_ssa or copy propagation pass on the tree SSA representation. 6. Added 2 tests: a) compilation check tests, b) execution tests. 7. Refactoring of the code for the feasibility check and finding the join block same as the loop latch node. [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation. Added a new pass for path splitting on the tree SSA representation. The path-splitting optimization does a CFG transformation in which the join block of an if-then-else that is the same as the loop latch node is moved and merged with the predecessor blocks, preserving the SSA representation. ChangeLog: 2015-06-30 Ajit Agarwal ajit...@xilinx.com * gcc/Makefile.in: Add the build of the new file tree-ssa-path-split.c * gcc/common.opt: Add the new flag ftree-path-split. * gcc/opts.c: Add an entry for the path splitting pass with optimization level greater than or equal to O2. * gcc/passes.def: Enable and add the new path splitting pass. * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT. * gcc/tree-pass.h: Extern declaration of make_pass_path_split. * gcc/tree-ssa-path-split.c: New file for the path splitting pass. * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase. * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.
I'm not 100% sure I understand the transform but what I see from the testcases it tail-duplicates from a conditional up to a loop latch block (not sure if it includes it and thus ends up creating a loop nest or not). An observation I have is that the pass should at least share the transform stage to some extent with the existing tracer pass (tracer.c) which essentially does the same but not restricted to loops in any way. The following piece of code from tracer.c can be shared with the existing path splitting pass. { e = find_edge (bb, bb2); copy = duplicate_block (bb2, e, bb); flush_pending_stmts (e); add_phi_args_after_copy (copy, 1, NULL); } Sharing the above code of the transform stage of tracer.c with the path splitting pass has the following limitation. 1. The duplicated loop latch node is wired to its predecessors and the existing phi node in the loop
[patch 6/9] Flatten gimple-streamer.h
This one is amusing... 3 header files, all of them already included in all the files which include this. Bootstraps from scratch on x86_64-unknown-linux-gnu with no new regressions. Also compiles all the files in config-list.mk. * gimple-streamer.h: Remove all includes. Index: gimple-streamer.h === *** gimple-streamer.h (revision 225452) --- gimple-streamer.h (working copy) *** along with GCC; see the file COPYING3. *** 22,30 #ifndef GCC_GIMPLE_STREAMER_H #define GCC_GIMPLE_STREAMER_H - #include tm.h - #include hard-reg-set.h - #include function.h #include lto-streamer.h /* In gimple-streamer-in.c */ --- 22,27
Re: [RFC] two-phase marking in gt_cleare_cache
Hi, On Mon, 6 Jul 2015, Richard Biener wrote: By doing so, we make the behaviour of gt_cleare_cache independent of the order in which the entries are visited, turning: - hard-to-trigger bugs which trigger for one visiting order but not for another, into - more easily triggered bugs which trigger for any visiting order. Any comments? I think it is only half-way correct in your proposed change. You only fix the issue for hashes of the same kind. To truly fix the issue you'd have to change the generated code for gt_clear_caches () and provide a clearing-only implementation (or pass an operation mode bool to the core worker in hash-table.h). Hmm, and don't we rather want to first mark and _then_ clear? Because if entry B in the hash is live and would keep A live, then A _is_ kept in the end, but you'll remove it from the hash, possibly no longer using a still-live copy. I don't think such use has ever worked with the caching hash tables. Even the old (before C++) scheme didn't iterate, i.e. if a cache-hash entry A became live from outside, but it itself kept an entry B live in the hash table (with no other references to B), then this never worked (or only by luck), because the slot was always cleared if !ggc_marked_p, so if B was visited before A it was removed from the hash table (and in particular B's gt_ggc_mx routine was never called, so whatever B needed wasn't even marked). Given this I think the call to gt_ggc_mx is superfluous, because it wouldn't work reliably for multi-step dependencies anyway. Hence a situation that works with that call in place, and breaks without it, is actually a bug waiting to be uncovered. Ciao, Michael.
RE: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS
OK. Committed as SVN 225516. Regards, Andrew