Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)

2015-07-07 Thread Marc Glisse

On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote:


Please find attached the patch PR25529.patch that converts the pattern
(unsigned t * 2)/2 into unsigned t & 0x7FFFFFFF.


+/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF.  */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+ (simplify
+  (div (mult @0 INTEGER_CST@1) INTEGER_CST@1)

You don't need to repeat INTEGER_CST, the second time @1 is enough.

+  (with { tree n2 = build_int_cst (TREE_TYPE (@0),
+  wi::exact_log2 (@1)); }
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+   (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); }
+  { n2; }) { n2; }))

What happens if you write t*3/3?

--
Marc Glisse


Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-07 Thread Maxim Blumental
With this letter I propose a patch with tests for libgomp based on
OpenMP Examples 4.0.2 both for C and Fortran.

The changes are:

 Renamed existing tests based on OpenMP Examples to make the names more clear.
Added 16 tests for simd construct and 10 for depend clause.

-
Sincerely yours,
Maxim Blumental
2015-07-06  Maxim Blumenthal  bvm...@gmail.com

* libgomp/testsuite/libgomp.c/examples-4/e.56.3.c: renamed to 
array_sections-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.56.4.c: renamed to 
array_sections-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.55.1.c: renamed to 
async_target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.55.2.c: renamed to 
async_target-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.1.c: renamed to 
declare_target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.3.c: renamed to 
declare_target-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.4.c: renamed to 
declare_target-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.5.c: renamed to 
declare_target-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.1.c: renamed to device-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.2.c: renamed to device-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.3.c: renamed to device-3.c
* libgomp/testsuite/libgomp.c/examples-4/simd-1.c: A test for simd 
construct.
* libgomp/testsuite/libgomp.c/examples-4/simd-2.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-3.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-4.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-5.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-6.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-7.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-8.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/e.50.1.c: renamed to target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.2.c: renamed to target-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.3.c: renamed to target-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.4.c: renamed to target-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.5.c: renamed to target-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.1.c: renamed to 
target_data-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.2.c: renamed to 
target_data-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.3.c: renamed to 
target_data-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.4.c: renamed to 
target_data-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.6.c: renamed to 
target_data-6.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.7.c: renamed to 
target_data-7.c
* libgomp/testsuite/libgomp.c/examples-4/e.52.1.c: renamed to 
target_update-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.52.2.c: renamed to 
target_update-2.c
* libgomp/testsuite/libgomp.c/examples-4/task_dep-1.c: A test for task 
dependencies.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-2.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-3.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-4.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-5.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/e.54.2.c: renamed to teams-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.3.c: renamed to teams-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.4.c: renamed to teams-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.5.c: renamed to teams-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.6.c: renamed to teams-6.c
* libgomp/testsuite/libgomp.fortran/examples-4/e.56.3.f90: renamed to 
array_sections-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.56.4.f90: renamed to 
array_sections-4.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.55.1.f90: renamed to 
async_target-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.55.2.f90: renamed to 
async_target-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.1.f90: renamed to 
declare_target-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.2.f90: renamed to 
declare_target-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.3.f90: renamed to 
declare_target-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.4.f90: renamed to 
declare_target-4.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.5.f90: renamed to 
declare_target-5.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.1.f90: renamed to 
device-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.2.f90: renamed to 
device-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.3.f90: renamed to 
device-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/simd-1.f90: A test for simd 
construct.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90: Same.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-3.f90: Same.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-4.f90: 

Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics

2015-07-07 Thread Alan Lawrence

Kyrill Tkachov wrote:

On 07/07/15 17:34, Alan Lawrence wrote:

Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two 
prerequisites.

On second thought, the ACLE document at 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf

says in 12.2.1:
"float16 types are only available when the __fp16 type is defined, i.e. when
supported by the hardware"

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or
-mfp16-format=alternative, regardless of whether we have hardware support or 
not.

(Without hardware support, gcc generates calls to  __gnu_f2h_ieee or
__gnu_f2h_alternative instead of vcvtb.f16.f32, and  __gnu_h2f_ieee or
__gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to
support __fp16 just using those hardware instructions without caring about which
format is in use.)


Hmmm... In my opinion intrinsics should aim to map to instructions rather than 
go away and
call library functions, but this is the existing functionality
that current users might depend on :(


Sorry - to clarify: currently we generate __gnu_f2h_ieee / __gnu_h2f_ieee, to 
convert between single __fp16 and 'float' values, when there is no HW. General 
operations on scalar __fp16 values are performed by converting to float, 
performing operations on float, and converting back. The __fp16 type is 
available and usable without HW support, but only when -mfp16-format is specified.


(The existing) intrinsics operating on float16x[48] vectors (converting to/from 
float32x4) are *not* available without hardware support; these intrinsics *are* 
available without specifying -mfp16-format.


ACLE (4.1.2) allows toolchains to provide __fp16 when not implemented in HW, 
even if this is not required.



CC'ing the ARM maintainers and Tejas for an ACLE perspective.
I think that we'd want to gate the definition of __fp16 on hardware 
availability as well
(the -mfpu option) rather than just arm_fp16_format but I'm not sure of the 
impact this will have
on existing users.


Sure... but do we require -mfpu *and* -mfp16-format? s/and/or/?  Do we require 
-mfp16-format for float16x[48] intrinsics, or allow format-agnostic code (as HW 
support allows us to!)?


I don't have very strong opinions as to which way we should go, I merely tried 
to be consistent with the existing codebase, and to support as much code as 
possible, although I agree I ignored cases where defining functions unexpectedly 
might cause problems.


Cheers, Alan



Re: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

2015-07-07 Thread Richard Earnshaw
On 07/07/15 16:23, Tejas Belagod wrote:
 Ping!
 

I've had a look at this (sorry for the delay).  I think it's mostly OK,
but I have two comments to make.

1) It's quite hard to understand the algorithm and there are no comments
to aid understanding (to be fair, there aren't many comments on the
other algorithms either).

2) It looks as though the new code calculates both the division and the
modulus simultaneously.  As such, it means that the divmod code
should be simplified to share the same code as the division function
itself (saving a few bytes, and more importantly several cycles when
modulus is required).

Can you please run some tests to validate 2) above?  And if correct,
adjust the code to handle this case.  I think that will go some way to
mitigating the code size increase from the new implementation.

R.

 On 30/04/15 10:40, Hale Wang wrote:
 -Original Message-
 From: Hale Wang [mailto:hale.w...@arm.com]
 Sent: Monday, February 09, 2015 9:54 AM
 To: Richard Earnshaw
 Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann
 Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for
 armv6-m

 Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html.


 Ping for trunk. Is it ok for trunk now?

 Thanks,
 Hale
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Hale Wang
 Sent: Friday, December 12, 2014 9:36 AM
 To: gcc-patches
 Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for
 armv6- m

 Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk?

 -Hale

 -Original Message-
 From: Joey Ye [mailto:joey.ye...@gmail.com]
 Sent: Thursday, November 27, 2014 10:01 AM
 To: Hale Wang
 Cc: gcc-patches
 Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for
 armv6-m

 OK applying to arm/embedded-4_9-branch, though you still need
 maintainer approval into trunk.

 - Joey

 On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang hale.w...@arm.com
 wrote:
 Hi,

 This patch ports the aeabi_idiv routine from Linaro Cortex-Strings
 (https://git.linaro.org/toolchain/cortex-strings.git), which was
 contributed by ARM under Free BSD license.

 The new aeabi_idiv routine is used to replace the one in
 libgcc/config/arm/lib1funcs.S. This replacement happens within the
 Thumb1 wrapper. The new routine is under LGPLv3 license.

 The main advantage of this version is that it can improve the
 performance of the aeabi_idiv function for Thumb1. This solution
 will also increase the code size. So it will only be used if
 __OPTIMIZE_SIZE__ is
 not defined.

 Make check passed for armv6-m.

 OK for trunk?

 Thanks,
 Hale Wang

 libgcc/ChangeLog:

 2014-11-26  Hale Wang  hale.w...@arm.com

  * config/arm/lib1funcs.S: Add new wrapper.

 ===
 diff --git a/libgcc/config/arm/lib1funcs.S
 b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644
 --- a/libgcc/config/arm/lib1funcs.S
 +++ b/libgcc/config/arm/lib1funcs.S
 @@ -306,34 +306,12 @@ LSYM(Lend_fde):
   #ifdef __ARM_EABI__
   .macro THUMB_LDIV0 name signed
   #if defined(__ARM_ARCH_6M__)
 -   .ifc \signed, unsigned
 -   cmp r0, #0
 -   beq 1f
 -   mov r0, #0
 -   mvn r0, r0  @ 0xffffffff
 -1:
 -   .else
 -   cmp r0, #0
 -   beq 2f
 -   blt 3f
 +
 +   push{r0, lr}
  mov r0, #0
 -   mvn r0, r0
 -   lsr r0, r0, #1  @ 0x7fff
 -   b   2f
 -3: mov r0, #0x80
 -   lsl r0, r0, #24 @ 0x8000
 -2:
 -   .endif
 -   push{r0, r1, r2}
 -   ldr r0, 4f
 -   adr r1, 4f
 -   add r0, r1
 -   str r0, [sp, #8]
 -   @ We know we are not on armv4t, so pop pc is safe.
 -   pop {r0, r1, pc}
 -   .align  2
 -4:
 -   .word   __aeabi_idiv0 - 4b
 +   bl  SYM(__aeabi_idiv0)
 +   pop {r1, pc}
 +
   #elif defined(__thumb2__)
  .syntax unified
  .ifc \signed, unsigned
 @@ -927,7 +905,158 @@ LSYM(Lover7):
  add dividend, work
 .endif
   LSYM(Lgot_result):
 -.endm
 +.endm
 +
 +#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__)
 +.macro BranchToDiv n, label
 +   lsr curbit, dividend, \n
 +   cmp curbit, divisor
 +   blo \label
 +.endm
 +
 +.macro DoDiv n
 +   lsr curbit, dividend, \n
 +   cmp curbit, divisor
 +   bcc 1f
 +   lsl curbit, divisor, \n
 +   sub dividend, dividend, curbit
 +
 +1: adc result, result
 +.endm
 +
 +.macro THUMB1_Div_Positive
 +   mov result, #0
 +   BranchToDiv #1, LSYM(Lthumb1_div1)
 +   BranchToDiv #4, LSYM(Lthumb1_div4)
 +   BranchToDiv #8, LSYM(Lthumb1_div8)
 +   BranchToDiv #12, LSYM(Lthumb1_div12)
 +   BranchToDiv #16, LSYM(Lthumb1_div16)
 +LSYM(Lthumb1_div_large_positive):
 +   mov result, #0xff
 +   lsl divisor, divisor, #8
 +   

RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips jraddiusp instruction.

2015-07-07 Thread Andrew Bennett
  I'm not sure this is the right approach here. If we get a jraddiusp then the
  problem that the test is trying to cover can't possibly happen anyway.
  (The test is checking if a load and final stack adjustment are ever re-
 ordered
  from what I can see.)
 
  I'd just mark the test as NOCOMPRESSION instead of just NOMIPS16 and
  update the comment to say that it is avoiding SAVE, RESTORE and
  JRADDIUSP.
 
 
 Another approach would be to add the micromips testcase variant and skip the
 test if code-quality (ie. -O0).
 Catherine

I agree with Matthew here.  The testcase already comments that it is preventing
the use of the MIPS16 save and restore instructions, so it makes sense to
prevent jraddiusp as well.

The updated patch and ChangeLog is below.

Ok to commit?


Many thanks,



Andrew


testsuite/
* gcc.target/mips/stack-1.c: Do not build the testcase for micromips.


diff --git a/gcc/testsuite/gcc.target/mips/stack-1.c 
b/gcc/testsuite/gcc.target/mips/stack-1.c
index a28e4bf..5f25c21 100644
--- a/gcc/testsuite/gcc.target/mips/stack-1.c
+++ b/gcc/testsuite/gcc.target/mips/stack-1.c
@@ -2,8 +2,8 @@
 /* { dg-final { scan-assembler "\tlw\t" } } */
 /* { dg-final { scan-assembler-not "\td?addiu\t(\\\$sp,)?\\\$sp,\[1-9\].*\tlw\t" } } */
 
-/* Avoid use of SAVE and RESTORE.  */
-NOMIPS16 int foo (int y)
+/* Avoid use of SAVE, RESTORE and JRADDIUSP.  */
+NOCOMPRESSION int foo (int y)
 {
   volatile int a = y;
   volatile int *volatile b = &a;



RE: [PATCH] MIPS: fix failing branch range checks for micromips

2015-07-07 Thread Andrew Bennett
 I see that you are naming these tests after the original branch-number tests
 that they were derived from.
 I think it would be better to keep all of the microMIPS tests named umips-???.
 I don't think preserving the original number is important.

I have named the microMIPS tests umips-branch-??? to keep with the current 
microMIPS
test naming strategy.  The numbering starts at 5 as there are already tests 
numbered 
1-4.

An updated patch and ChangeLog is below.

Ok to commit?


Many thanks,



Andrew



testsuite/
* gcc.target/mips/branch-2.c: Change NOMIPS16 to NOCOMPRESSION.
* gcc.target/mips/branch-3.c: Ditto.
* gcc.target/mips/branch-4.c: Ditto.
* gcc.target/mips/branch-5.c: Ditto.
* gcc.target/mips/branch-6.c: Ditto.
* gcc.target/mips/branch-7.c: Ditto.
* gcc.target/mips/branch-8.c: Ditto.
* gcc.target/mips/branch-9.c: Ditto.
* gcc.target/mips/branch-10.c: Ditto.
* gcc.target/mips/branch-11.c: Ditto.
* gcc.target/mips/branch-12.c: Ditto.
* gcc.target/mips/branch-13.c: Ditto.
* gcc.target/mips/branch-14.c: Ditto.
* gcc.target/mips/branch-15.c: Ditto.
* gcc.target/mips/umips-branch-5.c: New file.
* gcc.target/mips/umips-branch-6.c: New file.
* gcc.target/mips/umips-branch-7.c: New file.
* gcc.target/mips/umips-branch-8.c: New file.
* gcc.target/mips/umips-branch-9.c: New file.
* gcc.target/mips/umips-branch-10.c: New file.
* gcc.target/mips/umips-branch-11.c: New file.
* gcc.target/mips/umips-branch-12.c: New file.
* gcc.target/mips/umips-branch-13.c: New file.
* gcc.target/mips/umips-branch-14.c: New file.
* gcc.target/mips/umips-branch-15.c: New file.
* gcc.target/mips/umips-branch-16.c: New file.
* gcc.target/mips/umips-branch-17.c: New file.
* gcc.target/mips/umips-branch-18.c: New file.
* gcc.target/mips/branch-helper.h (OCCUPY_0x1): New define.
(OCCUPY_0xfffc): New define.

diff --git a/gcc/testsuite/gcc.target/mips/branch-10.c 
b/gcc/testsuite/gcc.target/mips/branch-10.c
index e2b1b5f..eb21c16 100644
--- a/gcc/testsuite/gcc.target/mips/branch-10.c
+++ b/gcc/testsuite/gcc.target/mips/branch-10.c
@@ -4,7 +4,7 @@
 
 #include "branch-helper.h"
 
-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();
diff --git a/gcc/testsuite/gcc.target/mips/branch-11.c 
b/gcc/testsuite/gcc.target/mips/branch-11.c
index 962eb1b..bd8e834 100644
--- a/gcc/testsuite/gcc.target/mips/branch-11.c
+++ b/gcc/testsuite/gcc.target/mips/branch-11.c
@@ -8,7 +8,7 @@
 
 #include "branch-helper.h"
 
-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();
diff --git a/gcc/testsuite/gcc.target/mips/branch-12.c 
b/gcc/testsuite/gcc.target/mips/branch-12.c
index 4aef160..4944634 100644
--- a/gcc/testsuite/gcc.target/mips/branch-12.c
+++ b/gcc/testsuite/gcc.target/mips/branch-12.c
@@ -4,7 +4,7 @@
 
 #include "branch-helper.h"
 
-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();
diff --git a/gcc/testsuite/gcc.target/mips/branch-13.c 
b/gcc/testsuite/gcc.target/mips/branch-13.c
index 8a6fb04..f5269b9 100644
--- a/gcc/testsuite/gcc.target/mips/branch-13.c
+++ b/gcc/testsuite/gcc.target/mips/branch-13.c
@@ -8,7 +8,7 @@
 
 #include "branch-helper.h"
 
-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();
diff --git a/gcc/testsuite/gcc.target/mips/branch-14.c 
b/gcc/testsuite/gcc.target/mips/branch-14.c
index 026417e..c2eecc3 100644
--- a/gcc/testsuite/gcc.target/mips/branch-14.c
+++ b/gcc/testsuite/gcc.target/mips/branch-14.c
@@ -4,14 +4,14 @@
 #include "branch-helper.h"
 
 void __attribute__((noinline))
-foo (volatile int *x)
+NOCOMPRESSION foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))
 OCCUPY_0x1fff8;
 }
 
 int
-main (void)
+NOCOMPRESSION main (void)
 {
   int x = 0;
   int y = 1;
diff --git a/gcc/testsuite/gcc.target/mips/branch-15.c 
b/gcc/testsuite/gcc.target/mips/branch-15.c
index dee7a05..89e25f3 100644
--- a/gcc/testsuite/gcc.target/mips/branch-15.c
+++ b/gcc/testsuite/gcc.target/mips/branch-15.c
@@ -4,14 +4,14 @@
 #include "branch-helper.h"
 
 void
-foo (volatile int *x)
+NOCOMPRESSION foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))
 OCCUPY_0x1fffc;
 }
 
 int
-main (void)
+NOCOMPRESSION main (void)
 {
   int x = 0;
   int y = 1;
diff --git a/gcc/testsuite/gcc.target/mips/branch-2.c 
b/gcc/testsuite/gcc.target/mips/branch-2.c
index 6409c4c..b60e9cd 100644
--- a/gcc/testsuite/gcc.target/mips/branch-2.c
+++ b/gcc/testsuite/gcc.target/mips/branch-2.c
@@ -5,7 +5,7 @@
 
 #include "branch-helper.h"
 
-NOMIPS16 void
+NOCOMPRESSION void
 foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))
diff --git a/gcc/testsuite/gcc.target/mips/branch-3.c 
b/gcc/testsuite/gcc.target/mips/branch-3.c
index 5fcfece..69300f6 100644
--- 

Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics

2015-07-07 Thread Kyrill Tkachov


On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two 
prerequisites.


On second thought, the ACLE document at 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf

says in 12.2.1:
"float16 types are only available when the __fp16 type is defined, i.e. when
supported by the hardware"

This indicates that float16 type and intrinsic availability should be gated on 
the availability of fp16
in the specified -mfpu. Look at some existing intrinsics like vcvt_f16_f32 for 
a way to gate these.

I notice that the float32x4_t is unconditionally defined in our arm_neon.h, 
however.
I think this is a bug and its definition should be #ifdef'd properly as well.

Thanks,

Kyrill



Thanks,
Kyrill





RE: [PATCH] MIPS: Do not generate micromips code for the no-smartmips-lwxs.c testcase

2015-07-07 Thread Andrew Bennett
 Hi Andrew,
 
 Instead of adding the -mno-micromips option to dg-options, please change the
 MIPS16 attribute to NOCOMPRESSION.
 
 Index: gcc.target/mips/no-smartmips-lwxs.c
 ===
 --- gcc.target/mips/no-smartmips-lwxs.c (revision 452061)
 +++ gcc.target/mips/no-smartmips-lwxs.c (working copy)
 @@ -1,7 +1,7 @@
  /* { dg-do compile } */
  /* { dg-options "-mno-smartmips" } */
 
 -NOMIPS16 int scaled_indexed_word_load (int a[], int b)
 +NOCOMPRESSION int scaled_indexed_word_load (int a[], int b)
  {
return a[b];
  }
 
 OK with that change.
 Catherine

Committed as SVN 225519. 

Regards,



Andrew


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-07 Thread Jim Wilson
On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law l...@redhat.com wrote:
 On 06/29/2015 07:15 PM, Jim Wilson wrote:
 So if these copies require a conversion, then isn't it fundamentally
 wrong to have a PHI node which copies between them?  That would seem to
 implicate the eipa_sra pass as needing to be aware of these promotions and
 avoid having these objects with different representations appearing on the
 lhs/rhs of a PHI node.

My years at Cisco didn't give me a chance to work on the SSA passes,
so I don't know much about how they work.  But looking at this, I see
that PHI nodes are eventually handed by emit_partition_copy in
tree-outof-ssa.c, which calls convert_to_mode, so it appears that
conversions between different (closely related?) types are OK in a PHI
node.  The problem in this case is that we have the exact same type
for the src and dest.  The only difference is that the ARM forces
sign-extension for signed sub-word parameters and zero-extension for
signed sub-word locals.  Thus to detect the need for a conversion, you
have to have the decls, and we don't have them here.  There is also
the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr
and friends, which aren't used by the cfglayout/tree-outof-ssa.c code
for PHI nodes.  So we need to copy some of the SUBREG_PROMOTED_*
handling into cfglyout/tree-outof-ssa, or modify them to call
expand_expr for PHI nodes, and I haven't had any luck getting that to
work yet. I still need to learn more about the code to figure out if
this is possible.

I also think that the ARM handling of PROMOTE_MODE is wrong.  Treating
a signed sub-word as unsigned can lead to inefficient code.  This
part of the problem is much easier for me to fix.  It may be hard to
convince ARM maintainers that it should be changed though.  I need
more time to work on this too.

I haven't looked at trying to forbid the optimizer from creating PHI
nodes connecting parameters to locals.  That just sounds like a
strange thing to forbid, and seems likely to result in worse code by
disabling too many optimizations.  But maybe it is the right solution
if neither of the other two options work.  This should only be done
when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing
the copy to require a conversion.

Jim


Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics

2015-07-07 Thread Alan Lawrence

Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two 
prerequisites.


On second thought, the ACLE document at 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf

says in 12.2.1:
"float16 types are only available when the __fp16 type is defined, i.e. when
supported by the hardware"


However, we support __fp16 whenever the user specifies -mfp16-format=ieee or 
-mfp16-format=alternative, regardless of whether we have hardware support or not.


(Without hardware support, gcc generates calls to  __gnu_f2h_ieee or 
__gnu_f2h_alternative instead of vcvtb.f16.f32, and  __gnu_h2f_ieee or 
__gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to 
support __fp16 just using those hardware instructions without caring about which 
format is in use.)


Thus we cannot be consistent with both sides of that 'i.e.', unless we also 
change when __fp16 is available.



I notice that the float32x4_t is unconditionally defined in our arm_neon.h, 
however.
I think this is a bug and its definition should be #ifdef'd properly as well.


Hmmm. Is this becoming a question of, which potentially-existing code do we want 
to break???


Cheers, Alan



Re: [PING][PATCH, 1/2] Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch

2015-07-07 Thread Tom de Vries

On 06/07/15 15:44, Richard Biener wrote:

On Mon, 6 Jul 2015, Tom de Vries wrote:


On 25/06/15 09:42, Tom de Vries wrote:

Hi,

this patch merges rewrite_virtuals_into_loop_closed_ssa (originally
submitted here: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01236.html
) to trunk.

Bootstrapped and reg-tested on x86_64.

OK for trunk?



Ping.

Thanks,
- Tom



0001-Merge-rewrite_virtuals_into_loop_closed_ssa-from-gom.patch


Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch

2015-06-24  Tom de Vries  t...@codesourcery.com

merge from gomp4 branch:
2015-06-24  Tom de Vries  t...@codesourcery.com

* tree-ssa-loop-manip.c (get_virtual_phi): Factor out of ...
(rewrite_virtuals_into_loop_closed_ssa): ... here.

* tree-ssa-loop-manip.c (replace_uses_in_dominated_bbs): Factor out
of ...
(rewrite_virtuals_into_loop_closed_ssa): ... here.

* dominance.c (bitmap_get_dominated_by): New function.
* dominance.h (bitmap_get_dominated_by): Declare.
* tree-ssa-loop-manip.c (rewrite_virtuals_into_loop_closed_ssa): Use
bitmap_get_dominated_by.

* tree-parloops.c (replace_uses_in_bbs_by)
(rewrite_virtuals_into_loop_closed_ssa): Move to ...
* tree-ssa-loop-manip.c: here.
* tree-ssa-loop-manip.h (rewrite_virtuals_into_loop_closed_ssa):
Declare.

2015-06-18  Tom de Vries  t...@codesourcery.com

* tree-parloops.c (rewrite_virtuals_into_loop_closed_ssa): New
function.
(transform_to_exit_first_loop_alt): Use
rewrite_virtuals_into_loop_closed_ssa.
---
   gcc/dominance.c   | 21 
   gcc/dominance.h   |  1 +
   gcc/tree-parloops.c   | 43 +
   gcc/tree-ssa-loop-manip.c | 81 +++
   gcc/tree-ssa-loop-manip.h |  1 +
   5 files changed, 112 insertions(+), 35 deletions(-)

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c66ca2..9b52d79 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -753,6 +753,27 @@ set_immediate_dominator (enum cdi_direction dir,
basic_block bb,
   dom_computed[dir_index] = DOM_NO_FAST_QUERY;
   }

+/* Returns in BBS the list of basic blocks immediately dominated by BB, in
+   the direction DIR.  As get_dominated_by, but returns result as a bitmap.  */
+
+void
+bitmap_get_dominated_by (enum cdi_direction dir, basic_block bb, bitmap bbs)
+{
+  unsigned int dir_index = dom_convert_dir_to_idx (dir);
+  struct et_node *node = bb->dom[dir_index], *son = node->son, *ason;
+
+  bitmap_clear (bbs);
+
+  gcc_checking_assert (dom_computed[dir_index]);
+
+  if (!son)
+return;
+
+  bitmap_set_bit (bbs, ((basic_block) son->data)->index);
+  for (ason = son->right; ason != son; ason = ason->right)
+    bitmap_set_bit (bbs, ((basic_block) ason->data)->index);
+}
+


Isn't an immediate_dominated_by_p () predicate better?  It's very
cheap to compute compared to allocating / populating and querying
a bitmap.



Dropped bitmap_get_dominated_by per comment below.


   /* Returns the list of basic blocks immediately dominated by BB, in the
  direction DIR.  */
   vec<basic_block>
diff --git a/gcc/dominance.h b/gcc/dominance.h
index 37e138b..0a1a13e 100644
--- a/gcc/dominance.h
+++ b/gcc/dominance.h
@@ -41,6 +41,7 @@ extern void free_dominance_info (enum cdi_direction);
   extern basic_block get_immediate_dominator (enum cdi_direction,
basic_block);
   extern void set_immediate_dominator (enum cdi_direction, basic_block,
 basic_block);
+extern void bitmap_get_dominated_by (enum cdi_direction, basic_block,
bitmap);
   extern vec<basic_block> get_dominated_by (enum cdi_direction,
basic_block);
   extern vec<basic_block> get_dominated_by_region (enum cdi_direction,
 basic_block *,
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e582fe7..df7c351 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1498,25 +1498,6 @@ replace_uses_in_bb_by (tree name, tree val,
basic_block bb)
   }
   }

-/* Replace uses of NAME by VAL in blocks BBS.  */
-
-static void
-replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
-{
-  gimple use_stmt;
-  imm_use_iterator imm_iter;
-
-  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
-{
-  if (!bitmap_bit_p (bbs, gimple_bb (use_stmt)->index))
-   continue;
-
-  use_operand_p use_p;
-  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-   SET_USE (use_p, val);
-}
-}
-
   /* Do transformation from:

bb preheader:
@@ -1637,18 +1618,11 @@ transform_to_exit_first_loop_alt (struct loop *loop,
 tree control = gimple_cond_lhs (cond_stmt);
 edge e;

-  /* Gather the bbs dominated by the exit block.  */
-  bitmap exit_dominated = BITMAP_ALLOC (NULL);
-  bitmap_set_bit (exit_dominated, exit_block->index);
-  vec<basic_block> exit_dominated_vec
-= get_dominated_by (CDI_DOMINATORS, exit_block);
-
-  int i;

Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics

2015-07-07 Thread Kyrill Tkachov


On 07/07/15 17:34, Alan Lawrence wrote:

Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two 
prerequisites.

On second thought, the ACLE document at 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf

says in 12.2.1:
"float16 types are only available when the __fp16 type is defined, i.e. when
supported by the hardware"

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or
-mfp16-format=alternative, regardless of whether we have hardware support or 
not.

(Without hardware support, gcc generates calls to  __gnu_f2h_ieee or
__gnu_f2h_alternative instead of vcvtb.f16.f32, and  __gnu_h2f_ieee or
__gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to
support __fp16 just using those hardware instructions without caring about which
format is in use.)


Hmmm... In my opinion intrinsics should aim to map to instructions rather than 
go away and
call library functions, but this is the existing functionality
that current users might depend on :(



Thus we cannot be consistent with both sides of that 'i.e.', unless we also
change when __fp16 is available.


I notice that the float32x4_t is unconditionally defined in our arm_neon.h, 
however.
I think this is a bug and its definition should be #ifdef'd properly as well.

Hmmm. Is this becoming a question of, which potentially-existing code do we want
to break???


CC'ing the ARM maintainers and Tejas for an ACLE perspective.
I think that we'd want to gate the definition of __fp16 on hardware 
availability as well
(the -mfpu option) rather than just arm_fp16_format but I'm not sure of the 
impact this will have
on existing users.

Kyrill



Cheers, Alan




Re: Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-07 Thread Maxim Blumental
Comment on the patch:

simd-5.f90 file is marked as xfail since the test fails because 'simd
collapse' is an unsupported combination for Fortran (which though is
valid in OpenMP API).

2015-07-07 19:48 GMT+03:00 Maxim Blumental bvm...@gmail.com:
 With this letter I propose a patch with tests for libgomp based on
 OpenMP Examples 4.0.2 both for C and Fortran.

 The changes are:

  Renamed existing tests based on OpenMP Examples to make the names more clear.
 Added 16 tests for simd construct and 10 for depend clause.

 -
 Sincerely yours,
 Maxim Blumental



-- 


-
Sincerely yours,
Maxim Blumental


RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips jraddiusp instruction.

2015-07-07 Thread Moore, Catherine


 -Original Message-
 From: Andrew Bennett [mailto:andrew.benn...@imgtec.com]
 Sent: Tuesday, July 07, 2015 12:14 PM
 To: Moore, Catherine; Matthew Fortune; gcc-patches@gcc.gnu.org
 Subject: RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips
 jraddiusp instruction.
 
   I'm not sure this is the right approach here. If we get a jraddiusp
   then the problem that the test is trying to cover can't possibly happen
 anyway.
   (The test is checking if a load and final stack adjustment are ever
   re-
  ordered
   from what I can see.)
  
   I'd just mark the test as NOCOMPRESSION instead of just NOMIPS16 and
   update the comment to say that it is avoiding SAVE, RESTORE and
   JRADDIUSP.
  
 
  Another approach would be to add the micromips testcase variant and
  skip the test if code-quality (i.e. -O0).
  Catherine
 
 I agree with Matthew here.  The testcase already comments that it is
 preventing the use of the MIPS16 save and restore instructions, so it makes
 sense to prevent jraddiusp as well.
 
 The updated patch and ChangeLog is below.
 
 Ok to commit?
 
 
 
 testsuite/
   * gcc.target/mips/stack-1.c: Do not build the testcase for micromips.
 
 
Yes, this is OK.


Re: flatten cfgloop.h

2015-07-07 Thread Richard Biener
On Mon, Jul 6, 2015 at 8:11 PM, Jeff Law l...@redhat.com wrote:
 On 07/06/2015 07:38 AM, Michael Matz wrote:

 Hi,

 On Sun, 5 Jul 2015, Prathamesh Kulkarni wrote:

 Hi,
 The attached patches flatten cfgloop.h.
 patch-1.diff moves around prototypes and structures to respective
 header-files.
 patch-2.diff (mostly auto-generated) replicates cfgloop.h includes in c
 files.
 Bootstrapped and tested on x86_64-unknown-linux-gnu with all front-ends.
 Built on all targets using config-list.mk.
 I left includes in cfgloop.h commented with #if 0 ... #endif.
 OK for trunk ?


 Does nobody else think that header files for one or two prototypes are
 fairly silly?

 Perhaps, but having a .h file for each .c file's exported objects means that
 we can implement a reasonable policy around where functions are prototyped
 or structures declared.

Yes.  At least for infrastructure files.  I happily make exceptions for things
like tree-vectorizer.h having prototypes for all tree-vect*.c files
(the at least
look related!).  Similar for tree-pass.h containing the interfacing of passes
to the pass manager (instead of having a header file for every pass).

 Contrast to "I put foo in expr.h because that was the most convenient place"
 which over 25+ years has made our header file dependencies a horrid mess.

Yeah - and I think we need to clean this up.  The general guidance of
"prototype is in the corresponding .h file" is easy.

After doing that we can move functions or even get rid of *.[ch] pairs
(like what is tree-dfa.[ch] other than a kitchen-sink for "stuff" - certainly
nowhere "Data flow functions for trees")

Richard.



 Anyway, your autogenerated part contains changes that seem exaggerated,
 e.g.:

 +++ b/gcc/bt-load.c
 @@ -54,6 +54,14 @@ along with GCC; see the file COPYING3.  If not see
   #include predict.h
   #include basic-block.h
   #include df.h
 +#include bitmap.h
 +#include sbitmap.h
 +#include cfgloopmanip.h
 +#include loop-init.h
 +#include cfgloopanal.h
 +#include loop-doloop.h
 +#include loop-invariant.h
 +#include loop-iv.h

 Surely bt-load doesn't need anything from doloop.h or invariant.h.  Before
 this goes into trunk this whole autogenerated thing should be cleaned up
 to add includes only for things that are actually needed.

 Agreed.
 jeff




Re: fix segfault in verify_flow_info() with -dx option

2015-07-07 Thread Richard Biener
On Tue, Jul 7, 2015 at 2:42 AM, Prathamesh Kulkarni
prathamesh.kulka...@linaro.org wrote:
 On 6 July 2015 at 12:00, Richard Biener richard.guent...@gmail.com wrote:
 On Sun, Jul 5, 2015 at 2:07 PM, Prathamesh Kulkarni
 prathamesh.kulka...@linaro.org wrote:
 Hi,
 Passing -dx causes segmentation fault:
 Test case: void f(void) {}

 ./test.c: In function 'f':
 ../test.c:3:1: internal compiler error: Segmentation fault
  }
  ^
 0xab6baf crash_signal
 /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/toplev.c:366
 0x694b14 verify_flow_info()
 
 /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/cfghooks.c:109
 0x9f7e64 execute_function_todo
 
 /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/passes.c:1997
 0x9f86eb execute_todo
 
 /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/passes.c:2042

 Started with r210068.
 It looks like -dx causes cfun->cfg to be NULL, and hence the segfault
 in verify_flow_info().
 The attached patch tries to fix it by adding a check to cfun->cfg before 
 calling
 verify_flow_info() from execute_function_todo().
 Bootstrapped and tested on x86_64-unknown-linux-gnu.
 OK for trunk ?

 No.  We've checked cfun->curr_properties & PROP_cfg already.  So whatever
 is keeping that set but frees the CFG is the offender (and should
 clear the flag).
 I think I have somewhat understood what's happening.
 -dx turns on flag rtl_dump_and_exit.
 pass_rest_of_compilation is gated on !rtl_dump_and_exit.
 Since rtl_dump_and_exit == 1 when -dx is passed,
 pass_rest_of_compilation and all the
 rtl passes inserted within pass_rest_of_compilation don't execute.
 One of these passes is pass_free_cfg which destroys PROP_cfg, but with
 -dx passed,
 this pass doesn't get executed and PROP_cfg remains set.
 Then pass_clean_state::execute() calls free_after_compilation(), which
 sets cfun->cfg = NULL.
 And hence after pass_clean_state finishes in execute_function_todo, we
 end up with cfun->cfg == NULL and PROP_cfg set,
 which calls verify_flow_info() and we hit the segfault.

 The following untested patch tries to fix this by clearing PROP_cfg in
 free_after_compilation.
 Shall that be correct approach ?

Yes, that looks good to me.

Richard.

 Thanks,
 Prathamesh

 Richard.

 Thank you,
 Prathamesh


[patch, driver] Ignore -ftree-parallelize-loops={0,1}

2015-07-07 Thread Tom de Vries

Hi,

currently, we have these spec strings in gcc/gcc.c involving 
ftree-parallelize-loops:

...
%{fopenacc|fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)}
%{fopenacc|fopenmp|ftree-parallelize-loops=*:-pthread}
...

Actually, ftree-parallelize-loops={0,1} means that no parallelization is 
done, but these spec strings still get activated for these values.



Attached patch fixes that, by introducing a spec function gt (short for 
greater than), and using it in the spec lines.


[ I've also tried this approach using the already existing spec function 
version-compare:

...
%{fopenacc|fopenmp:%:include(libgomp.spec)%(link_gomp)}
%:version-compare(>= 2 ftree-parallelize-loops=
  %:include(libgomp.spec)%(link_gomp))
...
But that didn't work out. The function evaluation mechanism evaluates 
the arguments before testing the function, so we evaluate 
'%:include(libgomp.spec)' unconditionally. The gcc build breaks on the 
first xgcc invocation with linking due to a missing libgomp.spec. ]



Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom
Ignore -ftree-parallelize-loops={0,1} using gt

---
 gcc/gcc.c | 48 ++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 0f29b78..34fb437 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -274,6 +274,7 @@ static const char *compare_debug_self_opt_spec_function (int, const char **);
 static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
+static const char *greater_than_spec_func (int, const char **);
 static char *convert_white_space (char *);
 
 /* The Specs Language
@@ -881,7 +882,7 @@ proper position among the other output files.  */
 %{s} %{t} %{u*} %{z} %{Z} %{!nostdlib:%{!nostartfiles:%S}}  VTABLE_VERIFICATION_SPEC  \
 %{static:} %{L*} %(mfwrap) %(link_libgcc)  SANITIZER_EARLY_SPEC  %o\
  CHKP_SPEC  \
-%{fopenacc|fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)}\
+%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):%:include(libgomp.spec)%(link_gomp)} \
 %{fcilkplus:%:include(libcilkrts.spec)%(link_cilkrts)}\
 %{fgnu-tm:%:include(libitm.spec)%(link_itm)}\
 %(mflib)  STACK_SPLIT_SPEC \
@@ -1042,7 +1043,8 @@ static const char *const multilib_defaults_raw[] = MULTILIB_DEFAULTS;
 /* Linking to libgomp implies pthreads.  This is particularly important
for targets that use different start files and suchlike.  */
 #ifndef GOMP_SELF_SPECS
-#define GOMP_SELF_SPECS "%{fopenacc|fopenmp|ftree-parallelize-loops=*: " \
+#define GOMP_SELF_SPECS \
+  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1): " \
   "-pthread}"
 #endif
 
@@ -1482,6 +1484,7 @@ static const struct spec_function static_spec_functions[] =
  { "compare-debug-auxbase-opt", compare_debug_auxbase_opt_spec_function },
  { "pass-through-libs",	pass_through_libs_spec_func },
  { "replace-extension",	replace_extension_spec_func },
+  { "gt",			greater_than_spec_func },
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -9428,6 +9431,47 @@ replace_extension_spec_func (int argc, const char **argv)
   return result;
 }
 
+/* Returns "" if the n in ARGV[1] == "-opt=n" is greater than ARGV[2].
+   Otherwise, return NULL.  */
+
+static const char *
+greater_than_spec_func (int argc, const char **argv)
+{
+  char *converted;
+
+  if (argc == 1)
+return NULL;
+
+  gcc_assert (argc == 3);
+  gcc_assert (argv[0][0] == '-');
+  gcc_assert (argv[0][1] == '\0');
+
+  /* Point p to the n in "-opt=n".  */
+  const char *p = argv[1];
+  while (true)
+{
+  char c = *p;
+  if (c == '\0')
+	gcc_unreachable ();
+
+  ++p;
+
+  if (c == '=')
+	break;
+}
+
+  long arg = strtol (p, &converted, 10);
+  gcc_assert (converted != p);
+
+  long lim = strtol (argv[2], &converted, 10);
+  gcc_assert (converted != argv[2]);
+
+  if (arg > lim)
+return "";
+
+  return NULL;
+}
+
 /* Insert backslash before spaces in ORIG (usually a file path), to 
avoid being broken by spec parser.
 
-- 
1.9.1



Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)

2015-07-07 Thread Richard Biener
On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote:

 Please find attached the patch PR25529.patch that converts the pattern:-
  (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)


  +/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF.  */
 +(for div (trunc_div ceil_div floor_div round_div exact_div)
 + (simplify
 +  (div (mult @0 INTEGER_CST@1) INTEGER_CST@1)

 You don't need to repeat INTEGER_CST, the second time @1 is enough.

 +  (with { tree n2 = build_int_cst (TREE_TYPE (@0),
 +  wi::exact_log2 (@1)); }
 +  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
 +   (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); }
 +  { n2; }) { n2; }))

 What happens if you write t*3/3?

Huh, and you posted this patch twice?  See my reply to the other copy
for the correctness issues and better handling of exact_div.

Richard.

 --
 Marc Glisse


Re: [RFC] two-phase marking in gt_cleare_cache

2015-07-07 Thread Richard Biener
On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com wrote:
 On 06/07/15 15:29, Richard Biener wrote:

 On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener
 richard.guent...@gmail.com wrote:

 On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com
 wrote:

 Hi,

 Using attached untested patch, I managed to minimize a test-case failure
 for
 PR 66714.

 The patch introduces two-phase marking in gt_cleare_cache:
 - first phase, it loops over all the hash table entries and removes
those which are dead
 - second phase, it runs over all the live hash table entries and marks
live items that are reachable from those live entries

 By doing so, we make the behaviour of gt_cleare_cache independent of the
 order in which the entries are visited, turning:
 - hard-to-trigger bugs which trigger for one visiting order but not for
another, into
 - more easily triggered bugs which trigger for any visiting order.

 Any comments?


 I think it is only half-way correct in your proposed change.  You only
 fix the issue for hashes of the same kind.  To truly fix the issue you'd
 have to change generated code for gt_clear_caches () and provide
 a clearing-only implementation (or pass a operation mode bool to
 the core worker in hash-table.h).



 [ Btw, we have been discussing a similar issue before:
 https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ]

 True, the problem exists at the scope of all variables marked with 'cache',
 and this patch addresses the problem only within a single variable.

 Hmm, and don't we rather want to first mark and _then_ clear?


 I. In favor of first clear and then mark:

 It allows for:
 - a lazy one phase implementation for !ENABLE_CHECKING where
   you do a single clear-or-mark phase (so the clear is lazy).
 - an eager two phase implementation for ENABLE_CHECKING (where the
   clear is eager)
 The approach of first a marking phase and then a clearing phase means you
 always have to do these two phases (you can't do the marking lazily).

True.

 First mark and then clear means the marking should be done iteratively. Each
 time you mark something live, another entry in another hash table could
 become live. Marking iteratively could become quite costly.

I don't see this - marking is done recursively so if one entry makes another
live and that makes another live the usual GC marking recursion will deal
with this?

 II. In favor of first mark and then clear:

 The users of garbage collection will need to be less precise.

 Because
 if entry B in the hash is live and would keep A live then A _is_ kept in
 the
 end but you'll remove it from the hash, possibly no longer using a still
 live copy.


 I'm not sure I understand the scenario you're concerned about, but ... say
 we have
 - entry B: item B - item A
 - entry A: item A - item Z

 If you do clear first and mark second, and you start out with item B live
 and item A dead:
 - during the clearing phase you clear entry A and keep entry B, and
 - during the marking phase you mark item A live.

 So we no longer have entry A, but item A is kept and entry B is kept.

Yes.  This makes the cache weaker in that after this GC operation
a lookup of A no longer succeeds but it still is there.

The whole point of your patch was to make the behavior more predictable
and in some way it succeeds (within a cache).  As it is supposed to
put more stress on the cache logic (it's ENABLE_CHECKING only)
it makes sense to clear optimistically (after all it's a cache and not
guaranteed to find a still live entry).  It would be still nice to cover
all caches together because as I remember we've mostly seen issues
of caches interacting.

Richard.

 Thanks,
 - Tom



GCC 5.2.0 Status Report (2015-07-07), branch frozen

2015-07-07 Thread Richard Biener

The GCC 5 branch is now frozen for the release of GCC 5.2, all changes
require release manager approval from now on.

I will shortly announce a first release candidate for GCC 5.2.


Previous Report
===

https://gcc.gnu.org/ml/gcc/2015-06/msg00202.html


Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-07-07 Thread Richard Biener
On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal
ajit.kumar.agar...@xilinx.com wrote:


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, June 30, 2015 4:42 PM
 To: Ajit Kumar Agarwal
 Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
 Vidhumouli Hunsigida; Nagaraju Mekala
 Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree 
 ssa representation

 On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal 
 ajit.kumar.agar...@xilinx.com wrote:
 All:

 The below patch added a new path Splitting optimization pass on SSA
 representation. The Path Splitting optimization pass moves the join
 block of the if-then-else, which is the same as the loop latch, into its predecessors, where it gets merged 
 with the predecessors, preserving the SSA representation.

 The patch is tested for Microblaze and i386 target. The EEMBC/Mibench
 benchmarks is run with the Microblaze target And the performance gain
 of 9.15% and rgbcmy01_lite(EEMBC benchmarks). The Deja GNU tests is run for 
 Mircroblaze Target and no regression is seen for Microblaze target and the 
 new testcase attached are passed.

 For i386 bootstrapping goes through fine and the Spec cpu2000
 benchmarks is run with this patch. Following observation were seen with spec 
 cpu2000 benchmarks.

 Ratio of path splitting change vs Ratio of not having path splitting change 
 is 3653.353 vs 3652.14 for INT benchmarks.
 Ratio of path splitting change vs Ratio of not having path splitting change 
 is  4353.812 vs 4345.351 for FP benchmarks.

 Based on comments from RFC patch following changes were done.

 1. Added a new pass for path splitting changes.
 2. Placed the new path  Splitting Optimization pass before the copy 
 propagation pass.
 3. The join block same as the Loop latch is wired into its
 predecessors so that the CFG Cleanup pass will merge the blocks Wired 
 together.
 4. Copy propagation routines added for path splitting changes is not
 needed as suggested by Jeff. They are removed in the patch as The copy 
 propagation in the copied join blocks will be done by the existing copy 
 propagation pass and the update ssa pass.
 5. Only the propagation of phi results of the join block with the phi
 argument is done which will not be done by the existing update_ssa Or copy 
 propagation pass on tree ssa representation.
 6. Added 2 tests.
 a) compilation check  tests.
b) execution tests.
 7. Refactoring of the code for the feasibility check and finding the join 
 block same as loop latch node.

 [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
 representation.

 Added a new pass on path splitting on tree SSA representation. The path
 splitting optimization does the CFG transformation of join block of the
 if-then-else same as the loop latch node is moved and merged with the
 predecessor blocks after preserving the SSA representation.

 ChangeLog:
 2015-06-30  Ajit Agarwal  ajit...@xilinx.com

 * gcc/Makefile.in: Add the build of the new file
 tree-ssa-path-split.c
 * gcc/common.opt: Add the new flag ftree-path-split.
 * gcc/opts.c: Add an entry for Path splitting pass
 with optimization flag greater and equal to O2.
 * gcc/passes.def: Enable and add new pass path splitting.
 * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT.
 * gcc/tree-pass.h: Extern Declaration of make_pass_path_split.
 * gcc/tree-ssa-path-split.c: New file for path splitting pass.
 * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase.
 * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.

I'm not 100% sure I understand the transform but what I see from the 
testcases it tail-duplicates from a conditional up to a loop latch block 
(not sure if it includes it and thus ends up creating a loop nest or not).

An observation I have is that the pass should at least share the transform 
stage to some extent with the existing tracer pass (tracer.c) which 
essentially does the same but not restricted to loops in any way.

 The following piece of code from tracer.c can be shared with the existing 
 path splitting pass.

 {
  e = find_edge (bb, bb2);

   copy = duplicate_block (bb2, e, bb);
   flush_pending_stmts (e);

   add_phi_args_after_copy (copy, 1, NULL);
 }

 Sharing the above code of the transform stage of tracer.c with the path 
 splitting pass has the following limitation.

 1. The duplicated loop latch node is wired to its predecessors and the 
 existing phi node in the loop latch node with the
 Phi arguments from its corresponding predecessors is moved to the duplicated 
 loop latch node that is wired into its predecessors. Due
 To this, the duplicated loop latch nodes wired into its predecessors will not 
 be merged with the original predecessors by CFG cleanup phase .

 So I wonder if your pass could be simply another 

[PATCH] Fix PR66739

2015-07-07 Thread Richard Biener

The following fixes PR66739 - with conditionals not applying
a single-use restriction usually causes some regressions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-07-07  Richard Biener  rguent...@suse.de

PR middle-end/66739
* match.pd: Condition A - B ==/!= 0 -> A ==/!= B on single-use
A - B.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 225453)
+++ gcc/match.pd(working copy)
@@ -1336,8 +1353,9 @@ (define_operator_list CBRT BUILT_IN_CBRT
attempts to synthetize ABS_EXPR.  */
 (for cmp (eq ne)
  (simplify
-  (cmp (minus @0 @1) integer_zerop)
-  (cmp @0 @1)))
+  (cmp (minus@2 @0 @1) integer_zerop)
+  (if (single_use (@2))
+   (cmp @0 @1
 
 /* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
signed arithmetic case.  That form is created by the compiler



Re: [PR25530] Convert (unsigned t / 2) * 2 into (unsigned t & ~1)

2015-07-07 Thread Richard Biener
On Tue, Jul 7, 2015 at 6:52 AM, Hurugalawadi, Naveen
naveen.hurugalaw...@caviumnetworks.com wrote:
 Hi,

 Please find attached the patch PR25530.patch that converts the pattern:-
  (unsigned t / 2) * 2 into (unsigned t & ~1).

 Please review and let me know if its okay.

For EXACT_DIV fold-const.c has

  /* ((T) (X /[ex] C)) * C cancels out if the conversion is
 sign-changing only.  */
  if (TREE_CODE (arg1) == INTEGER_CST
       && TREE_CODE (arg0) == EXACT_DIV_EXPR
       && operand_equal_p (arg1, TREE_OPERAND (arg0, 1), 0))
return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));

we know the remainder is zero for EXACT_DIV.  It also gives hints that
a sign-changing conversion is ok.

+/* Simplify (unsigned t / 2) * 2 -> unsigned t & ~1.  */
+/* PR25530.  */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+ (simplify
+  (mult (div @0 INTEGER_CST@1) INTEGER_CST@1)
+  (with { tree n2 = build_int_cst (TREE_TYPE (@0),
+  wi::exact_log2 (@1)); }
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+   (bit_and @0 (lshift (rshift { build_minus_one_cst (TREE_TYPE (@0)); }
+  { n2; }) { n2; }))

you should move the (with inside the (if to save work if the type is not
unsigned.  Also you are using wi::exact_log2 without checking whether
@1 was a power of two (I think exact_log2 returns -1 in this case).
Then expressing ~1 with the result expression is really excessive - you
should simply build this with @1 - 1 if @1 is a power of two.

So please handle exact_div differently, like fold-const.c does.

Also I am not sure ceil_div and floor_div can be handled this way.
(5 /[ceil] 2) * 2 == 6 but you compute it as 4.  So I am only convinced
trunc_div works this way.

Thanks,
Richard.

 Regression tested on AARH64 and x86_64.

 Thanks,
 Naveen

 gcc/testsuite/ChangeLog:

 2015-07-07  Naveen H.S  naveen.hurugalaw...@caviumnetworks.com

 PR middle-end/25530
 * gcc.dg/pr25530.c: New test.

 gcc/ChangeLog:

 2015-07-07  Naveen H.S  naveen.hurugalaw...@caviumnetworks.com

 PR middle-end/25530
 * match.pd (mult (div @0 INTEGER_CST@1) INTEGER_CST@1) :
 New simplifier.


[PATCH, i386]: Generate BT with immedate operand

2015-07-07 Thread Uros Bizjak
Hello!

After recent x86 EXTZ/EXTZV improvements, we can extend BT splitters
to generate BT instruction with immediate operands.  The improvement
can be seen with attached testcases.

The benefit is obvious for BT with immediates 32 <= n <= 63:

  0:   48 b8 00 00 00 00 00movabs $0x1000,%rax
  7:   00 00 10
  a:   48 85 c7test   %rax,%rdi

vs.:

  0:   48 0f ba e7 3c  bt $0x3c,%rdi

The benefit with operands 0 <= n <= 31 is also noticeable:

  0:   f7 c7 00 04 00 00   test   $0x400,%edi

vs.:

  0:   0f ba e7 0a bt $0xa,%edi

BT has *slightly* higher latency than TEST (0.33 vs. 0.25 cycles on a
modern processor), so I have limited the conversion to -Os in case the
bit-test is in the low 32 bits.

In addition to 1556 BT %reg, %reg insns, already present in cc1
executable, patched compiler generated additional 628 BT #imm,%reg
instructions in cc1.

2015-07-07  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (*jcc_bt<mode>): Only split before reload.
Remove operand constraints.  Change operand 2 predicate to
nonmemory operand.  Limit const_int values to mode bitsize.  Only
allow const_int values less than 32 when optimizing for size.
(*jcc_bt<mode>_1, *jcc_bt<mode>_mask): Only split before reload.
Remove operand constraints.
(*bt<mode>): Use SImode for const_int values less than 32.
(regmode): Remove mode attribute.

testsuite/ChangeLog:

2015-07-07  Uros Bizjak  ubiz...@gmail.com

* gcc.target/i386/bt-3.c: New test.
* gcc.target/i386/bt-4.c: Ditto.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32}. I'll commit the patch to mainline as soon as regression test
ends.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 225484)
+++ config/i386/i386.md (working copy)
@@ -10765,8 +10765,6 @@
   DONE;
 })
 
-(define_mode_attr regmode [(SI "k") (DI "q")])
-
 (define_insn "*bt<mode>"
   [(set (reg:CCC FLAGS_REG)
	(compare:CCC
@@ -10775,11 +10773,132 @@
	    (const_int 1)
	    (match_operand:SI 1 "nonmemory_operand" "rN"))
	  (const_int 0)))]
-  "TARGET_USE_BT || optimize_function_for_size_p (cfun)"
-  "bt{<imodesuffix>}\t{%<regmode>1, %0|%0, %<regmode>1}"
+  ""
+{
+  switch (get_attr_mode (insn))
+    {
+    case MODE_SI:
+      return "bt{l}\t{%1, %k0|%k0, %1}";
+
+    case MODE_DI:
+      return "bt{q}\t{%q1, %0|%0, %q1}";
+
+    default:
+      gcc_unreachable ();
+    }
+}
   [(set_attr "type" "alu1")
    (set_attr "prefix_0f" "1")
-   (set_attr "mode" "<MODE>")])
+   (set (attr "mode")
+	(if_then_else
+	  (and (match_test "CONST_INT_P (operands[1])")
+	       (match_test "INTVAL (operands[1]) < 32"))
+	  (const_string "SI")
+	  (const_string "<MODE>")))])
+
+(define_insn_and_split "*jcc_bt<mode>"
+  [(set (pc)
+	(if_then_else (match_operator 0 "bt_comparison_operator"
+			[(zero_extract:SWI48
+			   (match_operand:SWI48 1 "register_operand")
+			   (const_int 1)
+			   (match_operand:SI 2 "nonmemory_operand"))
+			 (const_int 0)])
+		      (label_ref (match_operand 3))
+		      (pc)))
+   (clobber (reg:CC FLAGS_REG))]
+  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
+   && (CONST_INT_P (operands[2])
+       ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
+	  && INTVAL (operands[2])
+	       >= (optimize_function_for_size_p (cfun) ? 0 : 32))
+       : register_operand (operands[2], SImode))
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (zero_extract:SWI48
+	    (match_dup 1)
+	    (const_int 1)
+	    (match_dup 2))
+	  (const_int 0)))
+   (set (pc)
+	(if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
+		      (label_ref (match_dup 3))
+		      (pc)))]
+{
+  operands[0] = shallow_copy_rtx (operands[0]);
+  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
+})
+
+(define_insn_and_split "*jcc_bt<mode>_1"
+  [(set (pc)
+	(if_then_else (match_operator 0 "bt_comparison_operator"
+			[(zero_extract:SWI48
+			   (match_operand:SWI48 1 "register_operand")
+			   (const_int 1)
+			   (zero_extend:SI
+			     (match_operand:QI 2 "register_operand")))
+			 (const_int 0)])
+		      (label_ref (match_operand 3))
+		      (pc)))
+   (clobber (reg:CC FLAGS_REG))]
+  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+	(compare:CCC
+	  (zero_extract:SWI48
+	    (match_dup 1)
+	    (const_int 1)
+	    (match_dup 2))
+	  (const_int 0)))
+   (set (pc)
+	(if_then_else (match_op_dup 0 [(reg:CCC 

Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-07 Thread Jim Wilson
On Thu, Jul 2, 2015 at 2:07 AM, Richard Earnshaw
richard.earns...@foss.arm.com wrote:
 Not quite, ARM state still has more flexible addressing modes for
 unsigned byte loads than for signed byte loads.  It's even worse with
 thumb1 where some signed loads have no single-register addressing mode
 (ie you have to copy zero into another register to use as an index
 before doing the load).

I wasn't aware of the load address problem.  That was something I
hadn't considered, and will have to look at that.  Load is just one
instruction though.  For most other instructions, a zero-extend
results in less efficient code, because it then forces a sign-extend
before a signed operation.  The fact that parameters and locals are
handled differently which requires conversions when copying between
them results in more inefficient code.  And changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, and hence would be
unwise, so changing PROMOTE_MODE is the safer option.

Consider this testcase
extern signed short gs;

short
sub (void)
{
  signed short s = gs;
  int i;

  for (i = 0; i < 10; i++)
{
  s += 1;
  if (s > 10) break;
}

  return s;
}

The inner loop ends up as
.L3:
adds r3, r3, #1
mov r0, r1
uxth r3, r3
sxth r2, r3
cmp r2, #10
bgt .L8
cmp r2, r1
bne .L3
bx lr

We need the sign-extension for the compare.  We need the
zero-extension for the loop carried dependency.  We have two
extensions in every loop iteration, plus some extra register usage and
register movement.  We get better code for this example if we aren't
forcing signed shorts to be zero-extended via PROMOTE_MODE.

The lack of a reg+immediate address mode for ldrs[bh] in thumb1 does
look like a problem though.  But this means the difference between
generating
movs r2, #0
ldrsh r3, [r3, r2]
with my patch, or
ldrh r3, [r3]
lsls r2, r3, #16
asrs r2, r2, #16
without my patch.  It isn't clear which sequence is better.  The
sign-extends in the second sequence can sometimes be optimized away,
and sometimes they can't be optimized away.  Similarly, in the first
sequence, loading zero into a reg can sometimes be optimized, and
sometimes it can't.  There is also no guarantee that you get the first
sequence with the patch or the second sequence without the patch.
There is a splitter for ldrsh, so you can get the second pattern
sometimes with the patch.  Similarly, it might be possible to get the
first pattern without the patch in some cases, though I don't have one
at the moment.

Jim


[PATCH, committed] PR jit/66779: fix segfault

2015-07-07 Thread David Malcolm
Fix a segfault where expr.c:fold_single_bit_test was segfaulting due to
jit_langhook_type_for_mode not handling QImode.

Tested with make check-jit; takes jit.sum from 8289 to 8434 passes.

Committed to trunk as r225522.

gcc/jit/ChangeLog:
PR jit/66779
* dummy-frontend.c (jit_langhook_type_for_mode): Ensure that we
handle modes QI, HI, SI, DI, TI.

gcc/testsuite/ChangeLog:
PR jit/66779
* jit.dg/all-non-failing-tests.h: Add test-pr66779.c.
* jit.dg/test-pr66779.c: New testcase.
---
 gcc/jit/dummy-frontend.c |  11 +++
 gcc/testsuite/jit.dg/all-non-failing-tests.h |  10 ++
 gcc/testsuite/jit.dg/test-pr66779.c  | 143 +++
 3 files changed, 164 insertions(+)
 create mode 100644 gcc/testsuite/jit.dg/test-pr66779.c

diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c
index 8001382..3ddab50 100644
--- a/gcc/jit/dummy-frontend.c
+++ b/gcc/jit/dummy-frontend.c
@@ -154,6 +154,17 @@ jit_langhook_type_for_mode (enum machine_mode mode, int unsignedp)
   if (mode == TYPE_MODE (double_type_node))
 return double_type_node;
 
+  if (mode == TYPE_MODE (intQI_type_node))
+return unsignedp ? unsigned_intQI_type_node : intQI_type_node;
+  if (mode == TYPE_MODE (intHI_type_node))
+return unsignedp ? unsigned_intHI_type_node : intHI_type_node;
+  if (mode == TYPE_MODE (intSI_type_node))
+return unsignedp ? unsigned_intSI_type_node : intSI_type_node;
+  if (mode == TYPE_MODE (intDI_type_node))
+return unsignedp ? unsigned_intDI_type_node : intDI_type_node;
+  if (mode == TYPE_MODE (intTI_type_node))
+return unsignedp ? unsigned_intTI_type_node : intTI_type_node;
+
   if (mode == TYPE_MODE (integer_type_node))
 return unsignedp ? unsigned_type_node : integer_type_node;
 
diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h 
b/gcc/testsuite/jit.dg/all-non-failing-tests.h
index 21ff428..463eefb 100644
--- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
+++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
@@ -161,6 +161,13 @@
 #undef create_code
 #undef verify_code
 
+/* test-pr66779.c */
+#define create_code create_code_pr66779
+#define verify_code verify_code_pr66779
+#include "test-pr66779.c"
+#undef create_code
+#undef verify_code
+
 /* test-reading-struct.c */
 #define create_code create_code_reading_struct
 #define verify_code verify_code_reading_struct
@@ -289,6 +296,9 @@ const struct testcase testcases[] = {
  {"pr66700_observing_write_through_ptr",
create_code_pr66700_observing_write_through_ptr,
verify_code_pr66700_observing_write_through_ptr},
+  {"pr66779",
+   create_code_pr66779,
+   verify_code_pr66779},
  {"reading_struct ",
create_code_reading_struct ,
verify_code_reading_struct },
diff --git a/gcc/testsuite/jit.dg/test-pr66779.c 
b/gcc/testsuite/jit.dg/test-pr66779.c
new file mode 100644
index 000..ac5a72b
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-pr66779.c
@@ -0,0 +1,143 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "libgccjit.h"
+
+#include "harness.h"
+
+/* Reproducer for PR jit/66779.
+
+   Inject the equivalent of:
+ T FUNCNAME (T i, T j, T k)
+ {
+   bool comp0 = i & 0x40;
+   bool comp1 = (j == k);
+   if (comp0 && comp1)
+return 7;
+   else
+return 22;
+ }
+   for some type T; this was segfaulting during the expansion to RTL
+   due to missing handling for some machine modes in
+   jit_langhook_type_for_mode.  */
+
+void
+create_fn (gcc_jit_context *ctxt,
+  const char *funcname,
+  enum gcc_jit_types jit_type)
+{
+  gcc_jit_type *the_type =
+gcc_jit_context_get_type (ctxt, jit_type);
+  gcc_jit_type *t_bool =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_BOOL);
+  gcc_jit_param *param_i =
gcc_jit_context_new_param (ctxt, NULL, the_type, "i");
+  gcc_jit_param *param_j =
gcc_jit_context_new_param (ctxt, NULL, the_type, "j");
+  gcc_jit_param *param_k =
gcc_jit_context_new_param (ctxt, NULL, the_type, "k");
+  gcc_jit_param *params[3] = {
+param_i,
+param_j,
+param_k
+  };
+  gcc_jit_function *func =
+gcc_jit_context_new_function (ctxt, NULL,
+ GCC_JIT_FUNCTION_EXPORTED,
+ the_type,
+ funcname,
+ 3, params,
+ 0);
+  gcc_jit_block *b_entry = gcc_jit_function_new_block (func, "entry");
+  gcc_jit_block *b_on_true = gcc_jit_function_new_block (func, "on_true");
+  gcc_jit_block *b_on_false = gcc_jit_function_new_block (func, "on_false");
+
+  gcc_jit_lvalue *comp0 =
+gcc_jit_function_new_local (func, NULL, t_bool, "comp0");
+
+  gcc_jit_block_add_assignment (
+b_entry, NULL,
+comp0,
+gcc_jit_context_new_comparison (
+  ctxt, NULL,
+  GCC_JIT_COMPARISON_NE,
+  gcc_jit_context_new_binary_op (
+   ctxt, NULL,
+   GCC_JIT_BINARY_OP_BITWISE_AND,
+   the_type,
+   
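The injected function above can be modeled as plain C for reference (a sketch with T hypothetically fixed to int8_t, one of the sub-word cases whose QImode handling was missing):

```c
#include <stdbool.h>
#include <stdint.h>

/* Plain-C model of the function the JIT test injects, with T fixed
   to int8_t; checks the expected 7-or-22 results.  */
static int8_t pr66779_fn (int8_t i, int8_t j, int8_t k)
{
  bool comp0 = (i & 0x40) != 0;
  bool comp1 = (j == k);
  return (comp0 && comp1) ? 7 : 22;
}
```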

Re: Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-07 Thread Ilya Verbin
On Tue, Jul 07, 2015 at 20:17:48 +0200, Jakub Jelinek wrote:
 On Tue, Jul 07, 2015 at 08:08:16PM +0300, Maxim Blumental wrote:
   Added 16 tests for simd construct and 10 for depend clause.
 
 Any new tests that aren't in Examples 4.0.* document should go one level
 higher, to libgomp.{c,c++,fortran}/ directly.

Actually, the examples 4.0.2 document contains simd-* and task_dep-* tests, they
are new in terms of examples-4 directory.

  -- Ilya


Re: [PATCH 1/3] [ARM] PR63870 NEON error messages

2015-07-07 Thread Charles Baylis
On 6 July 2015 at 11:18, Alan Lawrence alan.lawre...@arm.com wrote:
 I note some parts of this duplicate my
 https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has been
 pinged a couple of times. Both Charles' patch, and my two, contain parts the
 other does not...

To resolve the conflicts, I suggest that Alan's patches should be
applied as-is first, and I'll rebase mine afterwards.

...and...

 Further to that - the main difference/conflict between Charles' patch and mine
 looks to be that I added the const_tree parameter to the existing
 neon_lane_bounds method, whereas Charles' patch adds a new method
 arm_neon_lane_bounds.

... I'll clean up this duplication when I do.


[PATCH, committed] PR jit/66783: prevent use of opaque structs

2015-07-07 Thread David Malcolm
Prevent use of opaque structs for fields, globals and locals.

Tested with make check-jit; jit.sum goes from 8434 to 8494 passes.

Committed to trunk as r225523.

gcc/jit/ChangeLog:
PR jit/66783
* jit-recording.h: Within namespace gcc::jit::recording...
(type::has_known_size): New virtual function.
(struct_has_known_size): New function.
* libgccjit.c (gcc_jit_context_new_field): Verify that the type
has a known size.
(gcc_jit_context_new_global): Likewise.
(gcc_jit_function_new_local): Likewise.

gcc/testsuite/ChangeLog:
PR jit/66783
* jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c: New
test case.
* jit.dg/test-error-gcc_jit_context_new_global-opaque-struct.c:
New test case.
* jit.dg/test-error-gcc_jit_function_new_local-opaque-struct.c:
New test case.
* jit.dg/test-error-mismatching-types-in-call.c (create_code):
Avoid using an opaque struct for local f.
---
 gcc/jit/jit-recording.h|  3 ++
 gcc/jit/libgccjit.c| 15 
 ...error-gcc_jit_context_new_field-opaque-struct.c | 31 
 ...rror-gcc_jit_context_new_global-opaque-struct.c | 32 +
 ...rror-gcc_jit_function_new_local-opaque-struct.c | 42 ++
 .../jit.dg/test-error-mismatching-types-in-call.c  |  2 +-
 6 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c
 create mode 100644 
gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_global-opaque-struct.c
 create mode 100644 
gcc/testsuite/jit.dg/test-error-gcc_jit_function_new_local-opaque-struct.c

diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h
index acd69e9..884304b 100644
--- a/gcc/jit/jit-recording.h
+++ b/gcc/jit/jit-recording.h
@@ -497,6 +497,7 @@ public:
   virtual type *is_pointer () = 0;
   virtual type *is_array () = 0;
   virtual bool is_void () const { return false; }
+  virtual bool has_known_size () const { return true; }
 
   bool is_numeric () const
   {
@@ -795,6 +796,8 @@ public:
   type *is_pointer () { return NULL; }
   type *is_array () { return NULL; }
 
+  bool has_known_size () const { return m_fields != NULL; }
+
   playback::compound_type *
   playback_compound_type ()
   {
diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index 4d7dd8c..85d9f62 100644
--- a/gcc/jit/libgccjit.c
+++ b/gcc/jit/libgccjit.c
@@ -543,6 +543,11 @@ gcc_jit_context_new_field (gcc_jit_context *ctxt,
   /* LOC can be NULL.  */
   RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type");
   RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name");
+  RETURN_NULL_IF_FAIL_PRINTF1 (
+    type->has_known_size (),
+    ctxt, loc,
+    "type has unknown size (type: %s)",
+    type->get_debug_string ());
 
   return (gcc_jit_field *)ctxt->new_field (loc, type, name);
 }
@@ -1033,6 +1038,11 @@ gcc_jit_context_new_global (gcc_jit_context *ctxt,
 kind);
   RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type");
   RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name");
+  RETURN_NULL_IF_FAIL_PRINTF1 (
+    type->has_known_size (),
+    ctxt, loc,
+    "type has unknown size (type: %s)",
+    type->get_debug_string ());
 
   return (gcc_jit_lvalue *)ctxt->new_global (loc, kind, type, name);
 }
@@ -1829,6 +1839,11 @@ gcc_jit_function_new_local (gcc_jit_function *func,
		       "Cannot add locals to an imported function");
   RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type");
   RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name");
+  RETURN_NULL_IF_FAIL_PRINTF1 (
+    type->has_known_size (),
+    ctxt, loc,
+    "type has unknown size (type: %s)",
+    type->get_debug_string ());
 
   return (gcc_jit_lvalue *)func->new_local (loc, type, name);
 }
diff --git 
a/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c 
b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c
new file mode 100644
index 000..c4e1448
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c
@@ -0,0 +1,31 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "libgccjit.h"
+
+#include "harness.h"
+
+/* Try to put an opaque struct inside another struct
+   (or union); the API ought to complain.  */
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{
+  gcc_jit_struct *t_opaque =
+gcc_jit_context_new_opaque_struct (ctxt, NULL, "opaque");
+
+  (void)gcc_jit_context_new_field (ctxt, NULL,
+  gcc_jit_struct_as_type (t_opaque),
+  "f_opaque");
+}
+
+void
+verify_code (gcc_jit_context *ctxt, gcc_jit_result *result)
+{
+  CHECK_VALUE (result, NULL);
+
+  /* Verify that the correct error message was emitted.  */
+  CHECK_STRING_VALUE (gcc_jit_context_get_first_error (ctxt),
		      "gcc_jit_context_new_field:"
		      " type has unknown size (type: 

Re: Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-07 Thread Jakub Jelinek
On Tue, Jul 07, 2015 at 08:08:16PM +0300, Maxim Blumental wrote:
 Comment on the patch:
 
 simd-5.f90 file is marked as xfail since the test fails because 'simd
 collapse' is an unsupported combination for Fortran (which though is
 valid in OpenMP API).

I'll have a look, that is supposed to work.

 2015-07-07 19:48 GMT+03:00 Maxim Blumental bvm...@gmail.com:
  With this letter I propose a patch with tests for libgomp based on
  OpenMP Examples 4.0.2 both for C and Fortran.
 
  The changes are:
 
   Renamed existing tests based on OpenMP Examples to make the names more 
  clear.

If anything, the test could be renamed to match
https://github.com/OpenMP/Examples/tree/master/sources/
filenames, but certainly not to made up names.  The Examples-4/
directory is supposed to only contain the tests from the 4.0.* examples
document and no other tests.

  Added 16 tests for simd construct and 10 for depend clause.

Any new tests that aren't in Examples 4.0.* document should go one level
higher, to libgomp.{c,c++,fortran}/ directly.

Jakub


Adjust -fdump-ada-spec to C++14 switch

2015-07-07 Thread Eric Botcazou
The Ada side doesn't know what to do with the move constructors of C++11 so 
the attached patch makes -fdump-ada-spec skip them.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2015-07-07  Eric Botcazou  ebotca...@adacore.com

c-family/
* c-ada-spec.h (cpp_operation): Add IS_MOVE_CONSTRUCTOR.
* c-ada-spec.c (print_ada_declaration): Skip move constructors.
cp/
* decl2.c (cpp_check): Deal with IS_MOVE_CONSTRUCTOR.


2015-07-07  Eric Botcazou  ebotca...@adacore.com

* g++.dg/other/dump-ada-spec-8.C: New test.

-- 
Eric Botcazou

Index: c-family/c-ada-spec.h
===
--- c-family/c-ada-spec.h	(revision 225410)
+++ c-family/c-ada-spec.h	(working copy)
@@ -30,6 +30,7 @@ typedef enum {
   IS_CONSTRUCTOR,
   IS_DESTRUCTOR,
   IS_COPY_CONSTRUCTOR,
+  IS_MOVE_CONSTRUCTOR,
   IS_TEMPLATE,
   IS_TRIVIAL
 } cpp_operation;
Index: c-family/c-ada-spec.c
===
--- c-family/c-ada-spec.c	(revision 225410)
+++ c-family/c-ada-spec.c	(working copy)
@@ -2891,6 +2891,7 @@ print_ada_declaration (pretty_printer *b
   bool is_constructor = false;
   bool is_destructor = false;
   bool is_copy_constructor = false;
+  bool is_move_constructor = false;
 
   if (!decl_name)
 	return 0;
@@ -2901,11 +2902,12 @@ print_ada_declaration (pretty_printer *b
 	  is_constructor = cpp_check (t, IS_CONSTRUCTOR);
 	  is_destructor = cpp_check (t, IS_DESTRUCTOR);
 	  is_copy_constructor = cpp_check (t, IS_COPY_CONSTRUCTOR);
+	  is_move_constructor = cpp_check (t, IS_MOVE_CONSTRUCTOR);
 	}
 
-  /* Skip copy constructors: some are internal only, and those that are
-	 not cannot be called easily from Ada anyway.  */
-  if (is_copy_constructor)
+  /* Skip copy constructors and C++11 move constructors: some are internal
+	 only and those that are not cannot be called easily from Ada.  */
+  if (is_copy_constructor || is_move_constructor)
 	return 0;
 
   if (is_constructor || is_destructor)
Index: cp/decl2.c
===
--- cp/decl2.c	(revision 225410)
+++ cp/decl2.c	(working copy)
@@ -4077,6 +4077,8 @@ cpp_check (tree t, cpp_operation op)
 	return DECL_DESTRUCTOR_P (t);
   case IS_COPY_CONSTRUCTOR:
 	return DECL_COPY_CONSTRUCTOR_P (t);
+  case IS_MOVE_CONSTRUCTOR:
+	return DECL_MOVE_CONSTRUCTOR_P (t);
   case IS_TEMPLATE:
 	return TREE_CODE (t) == TEMPLATE_DECL;
   case IS_TRIVIAL:
/* { dg-do compile } */
/* { dg-options "-fdump-ada-spec" } */

template<class T, class U> class Generic_Array
{
  Generic_Array();
};

template class Generic_Array<char, int>;

/* { dg-final { scan-ada-spec-not "access Generic_Array" } } */
/* { dg-final { cleanup-ada-spec } } */


Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)

2015-07-07 Thread Eric Botcazou
 You are right, I forgot about that. Is there a mode one can use that
 changes depending on the target architecture (32-bit on 32-bit
 architectures and 64-bit on 64-bit architectures)?

Yes, Pmode does exactly that, but you cannot use it directly in the MD file.

 Or does one have to add a 32-bit and a 64-bit variant of window_save?

Sort of, you can use the P mode iterator, but the name of the pattern will 
vary so you'll need to adjust the callers.
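
A hedged sketch of what the P iterator approach could look like in the MD file (illustrative only — the iterator definition mirrors the one in sparc.md, but the pattern body and condition strings here are placeholders, not the actual window_save implementation):

```
;; Mode iterator selecting the word mode by target architecture.
(define_mode_iterator P [(SI "TARGET_ARCH32") (DI "TARGET_ARCH64")])

;; The pattern name then becomes window_savesi / window_savedi, so
;; callers must use gen_window_savesi or gen_window_savedi (or a
;; wrapper that chooses based on Pmode).
(define_expand "window_save<P:mode>"
  [(use (match_operand:P 0 "register_operand" ""))]
  ""
  "")
```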

-- 
Eric Botcazou


RE: fix PR46029: reimplement if conversion of loads and stores

2015-07-07 Thread Abe

(if-conversion could directly generate masked load/stores
 of course and not use a scratch-pad at all in that case).


IMO that`s a great idea, but I don`t know how to do it.  Hints would be welcome.  In 
particular, how does one generate masked load/stores at the GIMPLE level?



But are we correctly handling these cases in the current if conversion code?


I`m uncertain to what that is intended to refer, but I believe Sebastian would 
agree that the new if converter is safer than the old one in terms of 
correctness at the time of running the code being compiled.



Abe's changes would seem like a step forward from a correctness standpoint


Not to argue, but as a point of humility: Sebastian did by far the most work on 
this patch.  I just modernized it and helped move it along.  Even _that_ was 
done with Sebastian`s help.



even if they take us a step backwards from a performance standpoint.


For now, we have a few performance regressions, and so far we have found that 
it`s non-trivial to remove all of those regressions.
We may be better off pushing the current patch to trunk and having the 
performance regressions currently introduced by the new if converter be fixed 
by later patches.
Pushing to trunk gives us excellent visibility amongst GCC hackers, so the code will 
get more eyeballs than if it lingers in an uncommitted patch or in a branch.
I, for one, would love some help in fixing these performance regressions. ;-)

If fixing the performance regressions winds up taking too long, perhaps the 
current imperfect patch could be undone on trunk just before a release is 
tagged,
and then we`ll push it in again when trunk goes back to being allowed to be 
unstable?  According to my analysis of the data near the end of the page at
https://gcc.gnu.org/develop.html, we have until roughly April of 2016 to work 
on not-yet-perfect patches in trunk.



So the question is whether we get more non-vectorized if-converted
code out of this (and thus whether we want to use --param
allow-store-data-races to get the old code back which is nicer to less
capable CPUs and probably faster than using scatter/gather or masked 
loads/stores).



I do think conditionalizing some of this on the allow-store-data-races makes 
sense.


I think having both the old if-converter and the new one live on in GCC is 
nontrivial, but not impossible.  I also don`t think it`s the best long-term goal,
but only a short-term workaround.  In the long run, IMO there should be only one if 
converter, albeit perhaps with tweaking flags [e.g. 
-fallow-unsafe-if-conversion].



I also wonder if we should really care about load data races (not sure your 
patch does).


According to a recent long discussion I had with Sebastian, our current patch 
does not have the flaw I was concerned it might have in terms of loads because:

  [1] the scratchpad is only being used to if-convert assignments to 
thread-local scalars, never to globals/statics, and because...

  [2] the gimplifier is supposed to detect the address of this scalar has been 
taken and when such is detected in the code being compiled,
  it causes the scalar to no longer look like a scalar in GIMPLE so that we 
are also safe from stale-data problems that could come from
  corner-case code that takes the address of a thread-local variable and 
gives that address to another thread [which then proceeds to
  overwrite the value in the supposedly-thread-local scalar that belongs to 
a different thread from the one doing the writing]
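
As an illustration of the scratchpad scheme itself, here is a hedged C model (not the GIMPLE the converter emits): the predicated store becomes an unconditional store whose destination address is selected by the predicate, so the real location is never written speculatively.

```c
/* Scratchpad model of if-converting "if (cond) a[i] = v;": the store
   always executes, but targets a dummy function-local slot when the
   predicate is false.  */
static void cond_store (int *a, int i, int cond, int v)
{
  int scratch;                       /* the scratchpad */
  int *dest = cond ? &a[i] : &scratch;
  *dest = v;                         /* unconditional, branch-free store */
}
```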


Regards,

Abe




Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-07 Thread Richard Biener
On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org wrote:
On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law l...@redhat.com wrote:
 On 06/29/2015 07:15 PM, Jim Wilson wrote:
 So if these copies require a  conversion, then isn't it
fundamentally
 wrong to have a PHI node which copies between them?  That would seem
to
 implicate the eipa_sra pass as needing to be aware of these
promotions and
 avoid having these objects with different representations appearing
on the
 lhs/rhs of a PHI node.

My years at Cisco didn't give me a chance to work on the SSA passes,
so I don't know much about how they work.  But looking at this, I see
that PHI nodes are eventually handed by emit_partition_copy in
tree-outof-ssa.c, which calls convert_to_mode, so it appears that
conversions between different (closely related?) types are OK in a PHI
node.  The problem in this case is that we have the exact same type
for the src and dest.  The only difference is that the ARM forces
sign-extension for signed sub-word parameters and zero-extension for
signed sub-word locals.  Thus to detect the need for a conversion, you
have to have the decls, and we don't have them here.  There is also
the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr
and friends, which aren't used by the cfglayout/tree-outof-ssa.c code
for PHI nodes.  So we need to copy some of the SUBREG_PROMOTED_*
handling into cfglyout/tree-outof-ssa, or modify them to call
expand_expr for PHI nodes, and I haven't had any luck getting that to
work yet. I still need to learn more about the code to figure out if
this is possible.

It probably is.  The decls for the parameter-based SSA names are available; for 
the PHI destination there might be no decl.

I also think that the ARM handling of PROMOTE_MODE is wrong.  Treating
a signed sub-word and unsigned can lead to inefficient code.  This is
part of the problem is much easier for me to fix.  It may be hard to
convince ARM maintainers that it should be changed though.  I need
more time to work on this too.

I haven't looked at trying to forbid the optimizer from creating PHI
nodes connecting parameters to locals.  That just sounds like a
strange thing to forbid, and seems likely to result in worse code by
disabling too many optimizations.  But maybe it is the right solution
if neither of the other two options work.  This should only be done
when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing
the copy to require a conversion.

I don't think disallowing such PHI nodes is the right thing to do.  I'd rather 
expose the TARGET_PROMOTE_FUNCTION_MODE effect earlier by modifying the 
parameter types during, say, gimplification.

Richard.

Jim




Re: [patch 9/9] Final patch with all changes

2015-07-07 Thread Pedro Alves
On 07/07/2015 02:51 PM, Andrew MacLeod wrote:
 *** sel-sched-ir.h(revision 225452)
 --- sel-sched-ir.h(working copy)
 *** along with GCC; see the file COPYING3.
 *** 22,34 
   #define GCC_SEL_SCHED_IR_H
   
   /* For state_t.  */
 - #include "insn-attr.h"
 - #include "regset.h"
   /* For reg_note.  */
 - #include "rtl.h"
 - #include "bitmap.h"
 - #include "sched-int.h"
 - #include "cfgloop.h"
   

Should probably drop those For state_t/reg_note. comments too.

Thanks,
Pedro Alves


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-07 Thread Jeff Law

On 06/29/2015 07:15 PM, Jim Wilson wrote:

This is my suggested fix for PR 65932, which is a linux kernel
miscompile with gcc-5.1.

The problem here is caused by a chain of events.  The first is that
the relatively new eipa_sra pass creates fake parameters that behave
slightly differently than normal parameters.  The second is that the
optimizer creates phi nodes that copy local variables to fake
parameters and/or vice versa.  The third is that the ouf-of-ssa pass
assumes that it can emit simple move instructions for these phi nodes.
And the fourth is that the ARM port has a PROMOTE_MODE macro that
forces QImode and HImode to unsigned, but a
TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
short parameters have different in register representations than local
variables, and require a conversion when copying between them, a
conversion that the out-of-ssa pass can't easily emit.
So if these copies require a  conversion, then isn't it fundamentally 
wrong to have a PHI node which copies between them?  That would seem to 
implicate the eipa_sra pass as needing to be aware of these promotions 
and avoid having these objects with different representations appearing 
on the lhs/rhs of a PHI node.


Jeff


[gomp4] Handle deviceptr from an outer directive

2015-07-07 Thread James Norris

Hi,

This patch fixes an issue where the deviceptr clause in an outer
directive was being ignored during implicit variable definition
on a nested directive.

Committed to gomp-4_0-branch.

Jim
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 51aadc0..a721a52 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -116,6 +116,9 @@ enum gimplify_omp_var_data
   /* Gang-local OpenACC variable.  */
   GOVD_GANGLOCAL = (1 << 16),
 
+  /* OpenACC deviceptr clause.  */
+  GOVD_USE_DEVPTR = (1 << 17),
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
 			   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
 			   | GOVD_LOCAL)
@@ -6274,7 +6277,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 		}
 	  break;
 	}
+
 	  flags = GOVD_MAP | GOVD_EXPLICIT;
+	  if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
+	flags |= GOVD_USE_DEVPTR;
 	  goto do_add;
 
 	case OMP_CLAUSE_DEPEND:
@@ -6662,6 +6668,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
			   : (flags & GOVD_FORCE_MAP
   ? GOMP_MAP_FORCE_TOFROM
   : GOMP_MAP_TOFROM));
+
   if (DECL_SIZE (decl)
	   && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
 	{
@@ -6687,7 +6694,17 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 	  OMP_CLAUSE_CHAIN (clause) = nc;
 	}
   else
-	OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl);
+	{
+	  if (gimplify_omp_ctxp->outer_context)
+	    {
+	      struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp->outer_context;
+	      splay_tree_node on
+		= splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
+	      if (on && (on->value & GOVD_USE_DEVPTR))
+	OMP_CLAUSE_SET_MAP_KIND (clause, GOMP_MAP_FORCE_PRESENT);
+	}
+	  OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl);
+	}
 }
   if (code == OMP_CLAUSE_FIRSTPRIVATE && (flags & GOVD_LASTPRIVATE) != 0)
 {


Re: RE: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

2015-07-07 Thread Tejas Belagod

Ping!

On 30/04/15 10:40, Hale Wang wrote:

-Original Message-
From: Hale Wang [mailto:hale.w...@arm.com]
Sent: Monday, February 09, 2015 9:54 AM
To: Richard Earnshaw
Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann
Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for
armv6-m

Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html.



Ping for trunk. Is it ok for trunk now?

Thanks,
Hale

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Hale Wang
Sent: Friday, December 12, 2014 9:36 AM
To: gcc-patches
Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for
armv6- m

Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk?

-Hale


-Original Message-
From: Joey Ye [mailto:joey.ye...@gmail.com]
Sent: Thursday, November 27, 2014 10:01 AM
To: Hale Wang
Cc: gcc-patches
Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for
armv6-m

OK applying to arm/embedded-4_9-branch, though you still need
maintainer approval into trunk.

- Joey

On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang hale.w...@arm.com

wrote:

Hi,

This patch ports the aeabi_idiv routine from Linaro Cortex-Strings
(https://git.linaro.org/toolchain/cortex-strings.git), which was
contributed by ARM under Free BSD license.

The new aeabi_idiv routine is used to replace the one in
libgcc/config/arm/lib1funcs.S. This replacement happens within the
Thumb1 wrapper. The new routine is under LGPLv3 license.

The main advantage of this version is that it can improve the
performance of the aeabi_idiv function for Thumb1. This solution
will also increase the code size. So it will only be used if
__OPTIMIZE_SIZE__ is

not defined.


Make check passed for armv6-m.

OK for trunk?

Thanks,
Hale Wang

libgcc/ChangeLog:

2014-11-26  Hale Wang  hale.w...@arm.com

 * config/arm/lib1funcs.S: Add new wrapper.

===
diff --git a/libgcc/config/arm/lib1funcs.S
b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -306,34 +306,12 @@ LSYM(Lend_fde):
  #ifdef __ARM_EABI__
  .macro THUMB_LDIV0 name signed
  #if defined(__ARM_ARCH_6M__)
-   .ifc \signed, unsigned
-   cmp r0, #0
-   beq 1f
-   mov r0, #0
-   mvn r0, r0  @ 0xffffffff
-1:
-   .else
-   cmp r0, #0
-   beq 2f
-   blt 3f
+
+   push    {r0, lr}
 mov r0, #0
-   mvn r0, r0
-   lsr r0, r0, #1  @ 0x7fffffff
-   b   2f
-3: mov r0, #0x80
-   lsl r0, r0, #24 @ 0x80000000
-2:
-   .endif
-   push    {r0, r1, r2}
-   ldr r0, 4f
-   adr r1, 4f
-   add r0, r1
-   str r0, [sp, #8]
-   @ We know we are not on armv4t, so pop pc is safe.
-   pop {r0, r1, pc}
-   .align  2
-4:
-   .word   __aeabi_idiv0 - 4b
+   bl  SYM(__aeabi_idiv0)
+   pop {r1, pc}
+
  #elif defined(__thumb2__)
 .syntax unified
 .ifc \signed, unsigned
@@ -927,7 +905,158 @@ LSYM(Lover7):
 add dividend, work
.endif
  LSYM(Lgot_result):
-.endm
+.endm
+
+#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__)
+
+.macro BranchToDiv n, label
+   lsr curbit, dividend, \n
+   cmp curbit, divisor
+   blo \label
+.endm
+
+.macro DoDiv n
+   lsr curbit, dividend, \n
+   cmp curbit, divisor
+   bcc 1f
+   lsl curbit, divisor, \n
+   sub dividend, dividend, curbit
+
+1: adc result, result
+.endm
+
+.macro THUMB1_Div_Positive
+   mov result, #0
+   BranchToDiv #1, LSYM(Lthumb1_div1)
+   BranchToDiv #4, LSYM(Lthumb1_div4)
+   BranchToDiv #8, LSYM(Lthumb1_div8)
+   BranchToDiv #12, LSYM(Lthumb1_div12)
+   BranchToDiv #16, LSYM(Lthumb1_div16)
+LSYM(Lthumb1_div_large_positive):
+   mov result, #0xff
+   lsl divisor, divisor, #8
+   rev result, result
+   lsr curbit, dividend, #16
+   cmp curbit, divisor
+   blo 1f
+   asr result, #8
+   lsl divisor, divisor, #8
+   beq LSYM(Ldivbyzero_waypoint)
+
+1: lsr curbit, dividend, #12
+   cmp curbit, divisor
+   blo LSYM(Lthumb1_div12)
+   b   LSYM(Lthumb1_div16)
+LSYM(Lthumb1_div_loop):
+   lsr divisor, divisor, #8
+LSYM(Lthumb1_div16):
+   DoDiv   #15
+   DoDiv   #14
+   DoDiv   #13
+   DoDiv   #12
+LSYM(Lthumb1_div12):
+   DoDiv   #11
+   DoDiv   #10
+   DoDiv   #9
+   DoDiv   #8
+   bcs LSYM(Lthumb1_div_loop)
+LSYM(Lthumb1_div8):
+   DoDiv   #7
+   DoDiv   #6
+   DoDiv   #5
+LSYM(Lthumb1_div5):
+   DoDiv   #4
+LSYM(Lthumb1_div4):
+   DoDiv   #3
+LSYM(Lthumb1_div3):
+   DoDiv   #2
+LSYM(Lthumb1_div2):
+   DoDiv   #1
+LSYM(Lthumb1_div1):
+   sub 
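
For reference, each DoDiv step above compares the dividend shifted right by n against the divisor, conditionally subtracts divisor << n, and shifts the resulting quotient bit into the result. A C model of that step and of the full restoring division built from it (an illustrative sketch, not the exact register/carry usage of the assembly; divisor must be nonzero):

```c
#include <stdint.h>

/* One DoDiv step: produce one quotient bit for shift amount n.  */
static void do_div_step (uint32_t *dividend, uint32_t divisor,
                         unsigned n, uint32_t *result)
{
  *result <<= 1;
  if ((*dividend >> n) >= divisor)
    {
      *dividend -= divisor << n;
      *result |= 1;
    }
}

/* Full unsigned division built from 32 steps, for checking.  */
static uint32_t udiv (uint32_t num, uint32_t den)
{
  uint32_t result = 0;
  for (int n = 31; n >= 0; n--)
    do_div_step (&num, den, (unsigned) n, &result);
  return result;
}
```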

Re: [patch 0/9] Flattening and initial module rebuilding

2015-07-07 Thread Jeff Law

On 07/07/2015 07:40 AM, Andrew MacLeod wrote:

This is a series of 9 patches which does some flattening, some module
building, and some basic cleanups.

  I am presenting them as 9 patches for easier review. The latter couple
of patches affect a lot of the same files that follow on patches then
adjust, I've decided NOT to put the automated changes in with each of
those patches.

There are 8 patches showing the key changes, and then the 9th patch is
an aggregate of the first 8 key changes, plus the final result of the
impact on all the source files.  This is the only patch I'd like to commit.

  The automated tools which generate the source changes have been
significantly enhanced.  When a header is flattened, the source file is
checked for the existence of the headers which need moving, and any
which are already present are left if they are in the right order. Any
duplicate are also removed.

  A similar process is used when an aggregation file like backend.h or
ssa.h is processed. Any occurrences of the aggregated headers are
removed from the source file so there are no duplicates.  The aggregated
headers are typically only placed in a source file if 3 or more of the
headers would be replaced.  (ie, if only bitmap.h is included, I don't
just blindly put backend.h in the file.)   This number came from
analysis of a fully flattened and include-reduced tree, and seemed to be
the sweet spot.

  With the aggregation and flattening, the order of some includes can
get shifted around with other files in between, so the tools also ensure
there is a blessed order which will make sure than any pre-reqs are
always available.  Right now, its primarily:

config.h
system.h
coretypes.h
backend.h
tree.h
gimple.h
rtl.h
df.h
ssa.h

And if any of the aggregators are not present, then any headers which
make up the aggregator are in the same relative position.

The tools actually produced all these patches with no tweaking to solve
compilation failures.. which was very helpful.  The old ones needed some
guidance and were a bit finicky.

I can adjust any of this quite easily, or present them in a different
way if you don't like it this way.  Again, my goal is to check in just
the final patch which does all the work of the first 8 patches.  It would
be a lot less turmoil on the branch.  I can do it in smaller chunks if
need be.

The set of 9 patches is fine for the trunk.  Just a few discussion points...

One of the things I keep thinking about as these changes fly by is your 
scripts.  Is there a reasonable possibility for you to add your scripts 
to the contrib/ directory or something similar to aid us in any future 
header file refactoring?  Yes, I know that in theory we should never 
have to do this again, but I also know that reality can be rather different.


Presumably the aggregators, by policy, are to have #includes and nothing 
else, right?  If so, we might want a comment to that effect in them.


It's a bit of a shame that function.h is in backend.h, along with 
predict (which is presumably needed by basic-block/cfg?).



Jeff


Re: [patch 9/9] Final patch with all changes

2015-07-07 Thread Andrew MacLeod

On 07/07/2015 06:03 PM, Pedro Alves wrote:

On 07/07/2015 02:51 PM, Andrew MacLeod wrote:

*** sel-sched-ir.h  (revision 225452)
--- sel-sched-ir.h  (working copy)
*** along with GCC; see the file COPYING3.
*** 22,34 
   #define GCC_SEL_SCHED_IR_H
   
   /* For state_t.  */

- #include "insn-attr.h"
- #include "regset.h"
   /* For reg_note.  */
- #include "rtl.h"
- #include "bitmap.h"
- #include "sched-int.h"
- #include "cfgloop.h"
   

Should probably drop those "For state_t" / "For reg_note" comments too.

Thanks,
Pedro Alves
Ah right. Previous version had them removed. Missed them when I rebuilt 
the patch.

Thanks
Andrew


Re: [PATCH 6/7] Fix DEMANGLE_COMPONENT_LOCAL_NAME

2015-07-07 Thread Jeff Law

On 07/06/2015 01:39 PM, Mikhail Maltsev wrote:

---
  libiberty/cp-demangle.c   | 7 +++
  libiberty/testsuite/demangle-expected | 4 
  2 files changed, 11 insertions(+)

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 424b1c5..289a704 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -3243,6 +3243,8 @@ d_expression_1 (struct d_info *di)
struct demangle_component *left;
struct demangle_component *right;

+   if (code == NULL)
+ return NULL;
if (op_is_new_cast (op))
  left = cplus_demangle_type (di);
else
@@ -4436,6 +4438,11 @@ d_print_comp_inner (struct d_print_info *dpi, int 
options,
local_name = d_right (typed_name);
	if (local_name->type == DEMANGLE_COMPONENT_DEFAULT_ARG)
	  local_name = local_name->u.s_unary_num.sub;
+	if (local_name == NULL)
+	  {
+	    d_print_error (dpi);
+	    return;
+	  }
	while (local_name->type == DEMANGLE_COMPONENT_RESTRICT_THIS
	       || local_name->type == DEMANGLE_COMPONENT_VOLATILE_THIS
	       || local_name->type == DEMANGLE_COMPONENT_CONST_THIS
diff --git a/libiberty/testsuite/demangle-expected
b/libiberty/testsuite/demangle-expected
index 2dbab14..cfa2691 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -4104,6 +4104,10 @@ _Z111
  --format=gnu-v3
  _ZDTtl
  _ZDTtl
+# Check for NULL pointer when demangling DEMANGLE_COMPONENT_LOCAL_NAME
+--format=gnu-v3
+_ZZN1fEEd_lEv
+_ZZN1fEEd_lEv
  #
  # Ada (GNAT) tests.
  #


Also OK with a suitable ChangeLog entry.

jeff


Re: [PATCH 3/7] Fix trinary op

2015-07-07 Thread Jeff Law

On 07/06/2015 01:34 PM, Mikhail Maltsev wrote:

---
  libiberty/cp-demangle.c   | 4 +++-
  libiberty/testsuite/demangle-expected | 6 ++
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 12093cc..44a0a9b 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -3267,7 +3267,9 @@ d_expression_1 (struct d_info *di)
struct demangle_component *second;
struct demangle_component *third;

-	  if (!strcmp (code, "qu"))
+	  if (code == NULL)
+	    return NULL;
+	  else if (!strcmp (code, "qu"))
  {
/* ?: expression.  */
first = d_expression_1 (di);
diff --git a/libiberty/testsuite/demangle-expected
b/libiberty/testsuite/demangle-expected
index 6ea64ae..47ca8f5 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -4091,6 +4091,12 @@ void g1(A<1>, B<static_cast<bool>(1)>)
  _ZNKSt7complexIiE4realB5cxx11Ev
  std::complex<int>::real[abi:cxx11]() const
  #
+# Some more crashes revealed by fuzz-testing:
+# Check for NULL pointer when demangling trinary operators
+--format=gnu-v3
+Av32_f
+Av32_f
+#
  # Ada (GNAT) tests.
  #
  # Simple test.


OK with a suitable ChangeLog entry.

And a generic question on the testsuite -- presumably it turns on type 
demangling?  I wanted to verify the flow through d_expression_1 was 
what I expected it to be, and it took a while to realize that c++filt 
doesn't demangle types by default, thus Av32_f would demangle to Av32_f 
without ever getting into d_expression_1.


jeff


Re: [PATCH 4/7] Fix int overflow

2015-07-07 Thread Jeff Law

On 07/06/2015 06:04 PM, Mikhail Maltsev wrote:

On 07.07.2015 1:55, Jeff Law wrote:


 len = d_number (di);
-  if (len <= 0)
+  if (len <= 0 || len > INT_MAX)
   return NULL;
 ret = d_identifier (di, len);
 di->last_name = ret;

Isn't this only helpful if sizeof (long) > sizeof (int)?  Otherwise the
compiler is going to eliminate that newly added test, right?

So with that in mind, what happens on i686-unknown-linux with this test?


Jeff



Probably it should be fine, because the problem occurred when len became
negative after implicit conversion to int (d_identifier does not check
for negative length, but it does check that the length does not exceed
the total string length). In this case (i.e. on ILP32 targets) len will
not change sign after conversion to int (because the conversion is a
no-op).
I'm not completely sure about compiler warnings, but AFAIR, in multilib
build libiberty is also built for 32-bit target, and I did not get any
additional warnings.

You may need -Wtype-limits to see the warning.

I'm not questioning whether or not the test will cause a problem, but 
instead questioning if the test does what you expect it to do on a 32bit 
host.


On a host where sizeof (int) == sizeof (long), that len > INT_MAX test 
is always going to be false.


If you want to do overflow testing, you have to compute len in a wider 
type.  You might consider using "long long" or "int64_t" depending on 
the outcome of a configure test.  Falling back to a simple "long" if the 
host compiler doesn't have "long long" or "int64_t".


Interesting exercise feeding those tests into demangler.com  :-0  A 
suitably interested party might be able to exploit that overflow.



jeff



Re: [PATCH 7/7] Fix several crashes in d_find_pack

2015-07-07 Thread Jeff Law

On 07/06/2015 01:40 PM, Mikhail Maltsev wrote:

---
  libiberty/cp-demangle.c   |  3 +++
  libiberty/testsuite/demangle-expected | 12 
  2 files changed, 15 insertions(+)

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 289a704..4ca285e 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -4203,6 +4203,9 @@ d_find_pack (struct d_print_info *dpi,
  case DEMANGLE_COMPONENT_CHARACTER:
  case DEMANGLE_COMPONENT_FUNCTION_PARAM:
  case DEMANGLE_COMPONENT_UNNAMED_TYPE:
+case DEMANGLE_COMPONENT_FIXED_TYPE:
+case DEMANGLE_COMPONENT_DEFAULT_ARG:
+case DEMANGLE_COMPONENT_NUMBER:
return NULL;

  case DEMANGLE_COMPONENT_EXTENDED_OPERATOR:
diff --git a/libiberty/testsuite/demangle-expected
b/libiberty/testsuite/demangle-expected
index cfa2691..b58cea2 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -4108,6 +4108,18 @@ _ZDTtl
  --format=gnu-v3
  _ZZN1fEEd_lEv
  _ZZN1fEEd_lEv
+# Handle DEMANGLE_COMPONENT_FIXED_TYPE in d_find_pack
+--format=gnu-v3
+DpDFT_
+DpDFT_
+# Likewise, DEMANGLE_COMPONENT_DEFAULT_ARG
+--format=gnu-v3
+DpZ1fEd_
+DpZ1fEd_
+# Likewise, DEMANGLE_COMPONENT_NUMBER (??? result is probably still wrong)
+--format=gnu-v3
+DpDv1_c
+(char __vector(1))...
  #
  # Ada (GNAT) tests.
  #


OK with a suitable ChangeLog entry.

FWIW, demangler.com doesn't give any results for that case.  It just 
returns DpDv1_c


Jeff


Re: [patch 0/9] Flattening and initial module rebuilding

2015-07-07 Thread Andrew MacLeod

On 07/07/2015 06:21 PM, Jeff Law wrote:

On 07/07/2015 07:40 AM, Andrew MacLeod wrote:


I can adjust any of this quite easily, or present them in a different
way if you don't like it this way.  Again, my goal is to check in just
the final patch which does all the work of first 8 patches. It would
be a lot less turmoil on the branch.   I can do it in smaller chunks if
need be.
The set of 9 patches is fine for the trunk.  Just a few discussion 
points...


One of the things I keep thinking about as these changes fly by is 
your scripts.  Is there a reasonable possibility for you to add your 
scripts to the contrib/ directory or something similar to aid us in 
any future header file refactoring?  Yes, I know that in theory we 
should never have to do this again, but I also know that reality can 
be rather different.
Yes, with a bit of tweaking and enhancement they can be generally 
useful.  They are all in Python.  And no one is allowed to make comments 
like "OMG that's so inefficient" or "what a horrible way to do that" 
:-)   My goal was getting things done, and sometimes the brute force 
approach works great when the machines are fast enough :-)




Presumably the aggregators, by policy, are to have #includes and 
nothing else, right?  If so, we might want a comment to that effect in 
them.


Yeah, will do.



It's a bit of a shame that function.h is in backend.h, along with 
predict (which is presumably needed by basic-block/cfg?).




Yeah, once things settle down someone could tweak things more.  If I make 
the tools available, people can do their own analysis and adjusting.


function.h provides cfun, which is used all over the place: 9 backend 
header files use it, and a few like gimple.h actually require struct 
function to be defined.


predict.h is actually required by gimple.h for a few reasons: enum 
br_predictor is used in parameter lists, and a few inlines use the TAKEN 
and NOT_TAKEN macros.
It's also needed by cfghooks.h, and between those 2 files it's needed 
by a very good chunk of the backend... 219 of the 263 files which 
include backend.h need it.
We could move the 2 enums and TAKEN/NOT_TAKEN to coretypes.h or something 
like that, and it would probably cut the requirements for it by a *lot*.


Andrew


For the sake of amusement, here's the output from my initial include 
reduction logs for each file.  It's basically all the unique errors 
produced by trying to remove the header from every source file in 
libbackend.a:


predict.h:
gimple_h : use of enum ‘br_predictor’ without previous declaration
gimple_h : use of enum ‘prediction’ without previous declaration
gimple_h : ‘TAKEN’ was not declared in this scope
gimple_h : ‘NOT_TAKEN’ was not declared in this scope
cfghooks_h : use of enum ‘br_predictor’ without previous declaration
 (1) : predict_h - cfghooks_h
use of enum ‘br_predictor’ without previous declaration
 (4) : predict_h - gimple_h
use of enum ‘br_predictor’ without previous declaration
use of enum ‘prediction’ without previous declaration
‘TAKEN’ was not declared in this scope
‘NOT_TAKEN’ was not declared in this scope

function.h:
emit_rtl_h : field ‘expr’ has incomplete type ‘expr_status’
emit_rtl_h : field ‘emit’ has incomplete type ‘emit_status’
emit_rtl_h : field ‘varasm’ has incomplete type ‘varasm_status’
emit_rtl_h : field ‘subsections’ has incomplete type ‘function_subsections’
emit_rtl_h : field ‘eh’ has incomplete type ‘rtl_eh’
emit_rtl_h : invalid use of incomplete type ‘struct sequence_stack’
gimple_h : invalid use of incomplete type ‘struct function’
gimple_ssa_h : invalid use of incomplete type ‘const struct function’
gimple_ssa_h : ‘cfun’ was not declared in this scope
tree_ssanames_h : ‘cfun’ was not declared in this scope
cfgloop_h : ‘loops_for_fn’ was not declared in this scope
cfgloop_h : ‘current_loops’ was not declared in this scope
cfgloop_h : ‘cfun’ was not declared in this scope
cilk_h : invalid use of incomplete type ‘struct function’
ssa_iterators_h : ‘cfun’ was not declared in this scope
tree_scalar_evolution_h : ‘cfun’ was not declared in this scope
 (1) : function_h - tree_scalar_evolution_h
‘cfun’ was not declared in this scope
 (1) : function_h - tree_ssanames_h
‘cfun’ was not declared in this scope
 (1) : function_h - ssa_iterators_h
‘cfun’ was not declared in this scope
 (1) : function_h - cilk_h
invalid use of incomplete type ‘struct function’
 (1) : function_h - gimple_h
invalid use of incomplete type ‘struct function’
 (2) : function_h - gimple_ssa_h
invalid use of incomplete type ‘const struct function’
‘cfun’ was not declared in this scope
 (3) : function_h - cfgloop_h
‘loops_for_fn’ was not declared in this scope
‘current_loops’ was not declared in this scope
‘cfun’ was not declared in this scope
 (6) : function_h - emit_rtl_h
field ‘expr’ has incomplete type 

RE: [PATCH] MIPS: fix failing branch range checks for micromips

2015-07-07 Thread Moore, Catherine
Hi Andrew,

 -Original Message-
 From: Andrew Bennett [mailto:andrew.benn...@imgtec.com]
 Sent: Tuesday, July 07, 2015 12:13 PM
 To: Moore, Catherine; gcc-patches@gcc.gnu.org
 Cc: Matthew Fortune
 Subject: RE: [PATCH] MIPS: fix failing branch range checks for micromips
 
 
 Ok to commit?
 
 testsuite/
   * gcc.target/mips/branch-2.c: Change NOMIPS16 to
 NOCOMPRESSION.
   * gcc.target/mips/branch-3.c: Ditto
   * gcc.target/mips/branch-4.c: Ditto.
   * gcc.target/mips/branch-5.c: Ditto.
   * gcc.target/mips/branch-6.c: Ditto.
   * gcc.target/mips/branch-7.c: Ditto.
   * gcc.target/mips/branch-8.c: Ditto.
   * gcc.target/mips/branch-9.c: Ditto.
   * gcc.target/mips/branch-10.c: Ditto.
   * gcc.target/mips/branch-11.c: Ditto.
   * gcc.target/mips/branch-12.c: Ditto.
   * gcc.target/mips/branch-13.c: Ditto.

These are OK, except for the splitting of the scan-assembler statements.

Please change occurrences of:

+/* { dg-final { scan-assembler
+	"\tld\t\\\$1,%got_page\\(\[^)\]*\\)\\(\\\$3\\)\\n" } } */

to:

+/* { dg-final { scan-assembler "\tld\t\\\$1,%got_page\\(\[^)\]*\\)\\(\\\$3\\)\\n" } } */

before committing.


   * gcc.target/mips/branch-14.c: Ditto.
   * gcc.target/mips/branch-15.c: Ditto.

The modifications for these two files need to be removed.   These are execution 
tests and the multilib that is used to link them is important.   If the 
libraries are not compatible with the NOCOMPRESSION attribute, then the link 
step will fail.  You could work around this problem by enabling interlinking, 
but I think the best approach is to leave these two tests alone.

   * gcc.target/mips/umips-branch-5.c: New file.
   * gcc.target/mips/umips-branch-6.c: New file.
   * gcc.target/mips/umips-branch-7.c: New file.
   * gcc.target/mips/umips-branch-8.c: New file.
   * gcc.target/mips/umips-branch-9.c: New file.
   * gcc.target/mips/umips-branch-10.c: New file.
   * gcc.target/mips/umips-branch-11.c: New file.
   * gcc.target/mips/umips-branch-12.c: New file.
   * gcc.target/mips/umips-branch-13.c: New file.
   * gcc.target/mips/umips-branch-14.c: New file.
   * gcc.target/mips/umips-branch-15.c: New file.
   * gcc.target/mips/umips-branch-16.c: New file.

Same comment as above on the scan-assembler statements.

   * gcc.target/mips/umips-branch-17.c: New file.
   * gcc.target/mips/umips-branch-18.c: New file.

These two tests suffer from the same problem as above.  They should be deleted 
altogether.

   * gcc.target/mips/branch-helper.h (OCCUPY_0x1): New define.
   (OCCUPY_0xfffc): New define.

This is okay.

Thanks,
Catherine

 
 diff --git a/gcc/testsuite/gcc.target/mips/branch-10.c
 b/gcc/testsuite/gcc.target/mips/branch-10.c
 index e2b1b5f..eb21c16 100644
 --- a/gcc/testsuite/gcc.target/mips/branch-10.c
 +++ b/gcc/testsuite/gcc.target/mips/branch-10.c
 @@ -4,7 +4,7 @@
 
  #include branch-helper.h
 
 -NOMIPS16 void
 +NOCOMPRESSION void
  foo (int (*bar) (void), int *x)
  {
*x = bar ();
 diff --git a/gcc/testsuite/gcc.target/mips/branch-11.c
 b/gcc/testsuite/gcc.target/mips/branch-11.c
 index 962eb1b..bd8e834 100644
 --- a/gcc/testsuite/gcc.target/mips/branch-11.c
 +++ b/gcc/testsuite/gcc.target/mips/branch-11.c
 @@ -8,7 +8,7 @@
 
  #include branch-helper.h
 
 -NOMIPS16 void
 +NOCOMPRESSION void
  foo (int (*bar) (void), int *x)
  {
*x = bar ();
 diff --git a/gcc/testsuite/gcc.target/mips/branch-12.c
 b/gcc/testsuite/gcc.target/mips/branch-12.c
 index 4aef160..4944634 100644
 --- a/gcc/testsuite/gcc.target/mips/branch-12.c
 +++ b/gcc/testsuite/gcc.target/mips/branch-12.c
 @@ -4,7 +4,7 @@
 
  #include branch-helper.h
 
 -NOMIPS16 void
 +NOCOMPRESSION void
  foo (int (*bar) (void), int *x)
  {
*x = bar ();
 diff --git a/gcc/testsuite/gcc.target/mips/branch-13.c
 b/gcc/testsuite/gcc.target/mips/branch-13.c
 index 8a6fb04..f5269b9 100644
 --- a/gcc/testsuite/gcc.target/mips/branch-13.c
 +++ b/gcc/testsuite/gcc.target/mips/branch-13.c
 @@ -8,7 +8,7 @@
 
  #include branch-helper.h
 
 -NOMIPS16 void
 +NOCOMPRESSION void
  foo (int (*bar) (void), int *x)
  {
*x = bar ();
 diff --git a/gcc/testsuite/gcc.target/mips/branch-14.c
 b/gcc/testsuite/gcc.target/mips/branch-14.c
 index 026417e..c2eecc3 100644
 --- a/gcc/testsuite/gcc.target/mips/branch-14.c
 +++ b/gcc/testsuite/gcc.target/mips/branch-14.c
 @@ -4,14 +4,14 @@
  #include branch-helper.h
 
  void __attribute__((noinline))
 -foo (volatile int *x)
 +NOCOMPRESSION foo (volatile int *x)
  {
if (__builtin_expect (*x == 0, 1))
  OCCUPY_0x1fff8;
  }
 
  int
 -main (void)
 +NOCOMPRESSION main (void)
  {
int x = 0;
int y = 1;
 diff --git a/gcc/testsuite/gcc.target/mips/branch-15.c
 b/gcc/testsuite/gcc.target/mips/branch-15.c
 index dee7a05..89e25f3 100644
 --- a/gcc/testsuite/gcc.target/mips/branch-15.c
 +++ b/gcc/testsuite/gcc.target/mips/branch-15.c
 @@ -4,14 +4,14 @@
  #include branch-helper.h
 
  void
 -foo 

Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-07 Thread Jeff Law

On 07/07/2015 06:37 AM, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes
FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from
previous patch.

15_native_interpret_real.patch


commit e2e7ca148960a82fc88128820f17e7cbd14173cb
Author: Alan Lawrencealan.lawre...@arm.com
Date:   Thu Apr 9 10:54:40 2015 +0100

 Fix native_interpret_real for HFmode floats on Bigendian with 
UNITS_PER_WORD=4

 (with missing space)

OK with ChangeLog in proper form.

jeff



[PATCH] libgomp: Introduce gomp_thread::spare_team

2015-07-07 Thread Sebastian Huber
Try to re-use the previous team to avoid the use of malloc() and free()
in the normal case where the number of threads is the same.  Avoid
superfluous destruction and initialization of team synchronization
objects.
objects.

Using the microbenchmark posted here

https://gcc.gnu.org/ml/gcc-patches/2008-03/msg00930.html

shows an improvement in the parallel bench test case (target
x86_64-unknown-linux-gnu, median out of 9 test runs, iteration count
increased to 20).

Before the patch:

parallel bench 11.2284 seconds

After the patch:

parallel bench 10.7575 seconds

libgomp/ChangeLog
2015-07-07  Sebastian Huber  sebastian.hu...@embedded-brains.de

* libgomp.h (gomp_thread): Add spare_team field.
* team.c (gomp_thread_start): Initialize spare team for non-TLS
targets.
(gomp_new_team): Use spare team if possible.
(free_team): Destroy more team objects.
(gomp_free_thread): Free spare team if necessary.
(free_non_nested_team): New.
(gomp_team_end): Move some team object destructions to
free_team().  Use free_non_nested_team().
---
 libgomp/libgomp.h |  3 +++
 libgomp/team.c| 63 ---
 2 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 5ed0f78..563c1e2 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -448,6 +448,9 @@ struct gomp_thread
 
   /* User pthread thread pool */
   struct gomp_thread_pool *thread_pool;
+
+  /* Spare team ready for re-use in gomp_new_team()  */
+  struct gomp_team *spare_team;
 };
 
 
diff --git a/libgomp/team.c b/libgomp/team.c
index b98b233..cc19eb0 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -77,6 +77,7 @@ gomp_thread_start (void *xdata)
   struct gomp_thread local_thr;
   thr = local_thr;
   pthread_setspecific (gomp_tls_key, thr);
+  thr->spare_team = NULL;
 #endif
   gomp_sem_init (&thr->release, 0);
 
@@ -140,19 +141,35 @@ gomp_thread_start (void *xdata)
 struct gomp_team *
 gomp_new_team (unsigned nthreads)
 {
+  struct gomp_thread *thr = gomp_thread ();
+  struct gomp_team *spare_team = thr->spare_team;
   struct gomp_team *team;
-  size_t size;
   int i;
 
-  size = sizeof (*team) + nthreads * (sizeof (team->ordered_release[0])
-				      + sizeof (team->implicit_task[0]));
-  team = gomp_malloc (size);
+  if (spare_team && spare_team->nthreads == nthreads)
+    {
+      thr->spare_team = NULL;
+      team = spare_team;
+    }
+  else
+    {
+      size_t extra = sizeof (team->ordered_release[0])
+		     + sizeof (team->implicit_task[0]);
+      team = gomp_malloc (sizeof (*team) + nthreads * extra);
+
+#ifndef HAVE_SYNC_BUILTINS
+      gomp_mutex_init (&team->work_share_list_free_lock);
+#endif
+      gomp_barrier_init (&team->barrier, nthreads);
+      gomp_sem_init (&team->master_release, 0);
+      gomp_mutex_init (&team->task_lock);
+
+      team->nthreads = nthreads;
+    }

   team->work_share_chunk = 8;
 #ifdef HAVE_SYNC_BUILTINS
   team->single_count = 0;
-#else
-  gomp_mutex_init (&team->work_share_list_free_lock);
 #endif
   team->work_shares_to_free = &team->work_shares[0];
   gomp_init_work_share (&team->work_shares[0], false, nthreads);
@@ -163,14 +180,9 @@ gomp_new_team (unsigned nthreads)
     team->work_shares[i].next_free = &team->work_shares[i + 1];
   team->work_shares[i].next_free = NULL;
 
-  team->nthreads = nthreads;
-  gomp_barrier_init (&team->barrier, nthreads);
-
-  gomp_sem_init (&team->master_release, 0);
   team->ordered_release = (void *) &team->implicit_task[nthreads];
   team->ordered_release[0] = &team->master_release;

-  gomp_mutex_init (&team->task_lock);
   team->task_queue = NULL;
   team->task_count = 0;
   team->task_queued_count = 0;
@@ -187,6 +199,10 @@ gomp_new_team (unsigned nthreads)
 static void
 free_team (struct gomp_team *team)
 {
+  gomp_sem_destroy (&team->master_release);
+#ifndef HAVE_SYNC_BUILTINS
+  gomp_mutex_destroy (&team->work_share_list_free_lock);
+#endif
   gomp_barrier_destroy (&team->barrier);
   gomp_mutex_destroy (&team->task_lock);
   free (team);
@@ -225,6 +241,8 @@ gomp_free_thread (void *arg __attribute__((unused)))
 {
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_thread_pool *pool = thr-thread_pool;
+  if (thr->spare_team)
+    free_team (thr->spare_team);
   if (pool)
 {
   if (pool-threads_used  0)
@@ -835,6 +853,18 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned 
nthreads,
 free (affinity_thr);
 }
 
+static void
+free_non_nested_team (struct gomp_team *team, struct gomp_thread *thr)
+{
+  struct gomp_thread_pool *pool = thr->thread_pool;
+  if (pool->last_team)
+    {
+      if (thr->spare_team)
+	free_team (thr->spare_team);
+      thr->spare_team = pool->last_team;
+    }
+  pool->last_team = team;
+}
 
 /* Terminate the current team.  This is only to be called by the master
thread.  We assume that we must wait for the other threads.  */
@@ -894,21 +924,12 @@ gomp_team_end (void)

Re: [PATCH 1/3] [ARM] PR63870 NEON error messages

2015-07-07 Thread Alan Lawrence

Alan Lawrence wrote:

I note some parts of this duplicate my
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has been pinged
a couple of times. Both Charles' patch, and my two, contain parts the other does
not...

Cheers, Alan

Charles Baylis wrote:

gcc/ChangeLog:

DATE  Charles Baylis  charles.bay...@linaro.org

* config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerators
qualifier_lane_index, qualifier_struct_load_store_lane_index.
(arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON
argument qualifiers.
(arm_expand_neon_builtin): Handle NEON argument qualifiers.
* config/arm/arm-protos.h: (arm_neon_lane_bounds) New prototype.
* config/arm/arm.c (arm_neon_lane_bounds): New function.


Further to that - the main difference/conflict between Charles' patch and mine 
looks to be that I added the const_tree parameter to the existing 
neon_lane_bounds method, whereas Charles' patch adds a new method 
arm_neon_lane_bounds.


--Alan



[PATCH 6/16][ARM] Remaining float16 intrinsics: vld..., vst..., vget_low/high, vcombine

2015-07-07 Thread Alan Lawrence

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01341.html
commit ae6264b144d25fadcbf219e68ddf3d8c5f40be34
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Thu Dec 11 11:53:59 2014 +

ARM 4/4 v2: v(ld|st)[234](q?|_lane|_dup), vcombine, vget_(low|high) (v2 w/ V_uf_sclr)

All are tied together with so many iterators!

Also vec_extract

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 17e39d8..1ee0a3d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -241,6 +241,12 @@ typedef struct {
 #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
   VAR9 (T, N, A, B, C, D, E, F, G, H, I) \
   VAR1 (T, N, J)
+#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) \
+  VAR1 (T, N, K)
+#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR1 (T, N, L)
 
 /* The NEON builtin data can be found in arm_neon_builtins.def.
The mode entries in the following table correspond to the key type of the
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index db73c70..93fb44f 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -162,6 +162,16 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -288,6 +298,16 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -414,6 +434,16 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+
 typedef struct float32x2x4_t
 {
   float32x2_t val[4];
@@ -6031,6 +6061,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b)
   return (int64x2_t)__builtin_neon_vcombinedi (__a, __b);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcombine_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcombinev4hf (__a, __b);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcombine_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -6105,6 +6141,12 @@ vget_high_s64 (int64x2_t __a)
   return (int64x1_t)__builtin_neon_vget_highv2di (__a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_high_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vget_highv8hf (__a);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_high_f32 (float32x4_t __a)
 {
@@ -6165,6 +6207,12 @@ vget_low_s32 (int32x4_t __a)
   return (int32x2_t)__builtin_neon_vget_lowv4si (__a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_low_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vget_lowv8hf (__a);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_low_f32 (float32x4_t __a)
 {
@@ -8712,6 +8760,12 @@ vld1_s64 (const int64_t * __a)
   return (int64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_f16 (const float16_t * __a)
+{
+  return __builtin_neon_vld1v4hf ((const __builtin_neon_hf *) __a);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_f32 (const float32_t * __a)
 {
@@ -8786,6 +8840,12 @@ vld1q_s64 (const int64_t * __a)
   return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_f16 (const float16_t * __a)
+{
+  return __builtin_neon_vld1v8hf ((const __builtin_neon_hf *) __a);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_f32 (const float32_t * __a)
 {
@@ -9183,6 +9243,12 @@ vst1_s64 (int64_t * __a, int64x1_t __b)
 }
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst1_f16 (float16_t * __a, float16x4_t __b)
+{
+  __builtin_neon_vst1v4hf ((__builtin_neon_hf *) __a, __b);
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
 vst1_f32 (float32_t * __a, float32x2_t __b)
 {
   __builtin_neon_vst1v2sf ((__builtin_neon_sf *) __a, __b);
@@ -9257,6 +9323,12 @@ vst1q_s64 (int64_t * __a, int64x2_t __b)
 }
 
 __extension__ static __inline void __attribute__ ((__always_inline__))
+vst1q_f16 (float16_t * __a, float16x8_t __b)
+{
+  __builtin_neon_vst1v8hf ((__builtin_neon_hf *) __a, __b);
+}
+
+__extension__ static __inline 

[PATCH 5/16][ARM] Add float16x8_t intrinsics

2015-07-07 Thread Alan Lawrence

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01337.html
commit 336eb16d3061131fe8d28fad4a473d00768bfe5c
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Tue Dec 9 15:06:38 2014 +

ARM float16x8_t intrinsics (v2 - fix v[sg]etq_lane_f16, add 
vreinterpretq_p16_f16, no vdup_n/lane/vmov_n)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index a958f63..db73c70 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -5282,6 +5282,15 @@ vgetq_lane_s32 (int32x4_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev4si (__a, __b);
 }
 
+#define vgetq_lane_f16(__v, __idx) \
+  __extension__\
+({ \
+  float16x8_t __vec = (__v);   \
+  __builtin_arm_lane_check (8, __idx); \
+  float16_t __res = __vec[__idx];  \
+  __res;   \
+})
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vgetq_lane_f32 (float32x4_t __a, const int __b)
 {
@@ -5424,6 +5433,16 @@ vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int 
__c)
   return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, 
__b, __c);
 }
 
+#define vsetq_lane_f16(__e, __v, __idx)\
+  __extension__\
+({ \
+  float16_t __elem = (__e);\
+  float16x8_t __vec = (__v);   \
+  __builtin_arm_lane_check (8, __idx); \
+  __vec[__idx] = __elem;   \
+  __vec;   \
+})
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c)
 {
@@ -8907,6 +8926,12 @@ vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, 
const int __c)
   return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) 
__a, __b, __c);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_lane_f16 (const float16_t * __a, float16x8_t __b, const int __c)
+{
+  return vsetq_lane_f16 (*__a, __b, __c);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c)
 {
@@ -9062,6 +9087,13 @@ vld1q_dup_s32 (const int32_t * __a)
   return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) 
__a);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_dup_f32 (const float32_t * __a)
 {
@@ -12856,6 +12888,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
   return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a);
@@ -12932,6 +12970,12 @@ vreinterpretq_p16_p8 (poly8x16_t __a)
 }
 
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f16 (float16x8_t __a)
+{
+  return (poly16x8_t) __a;
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_p16_f32 (float32x4_t __a)
 {
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a);
@@ -13001,6 +13045,88 @@ vreinterpretq_p16_u32 (uint32x4_t __a)
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p8 (poly8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p16 (poly16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_f32 (float32x4_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p64 (poly64x2_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p128 (poly128_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+#endif
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_s64 (int64x2_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_u64 (uint64x2_t __a)
+{
+  return 

[PATCH 7/16][AArch64] Add basic fp16 support

2015-07-07 Thread Alan Lawrence
Same as https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01340.html except that two 
of the tests have been moved into the next patch. (The remaining test is AArch64 
only.)


gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New.
(aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16.

* config/aarch64/aarch64-modes.def: Add HFmode.

* config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define
__ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP.

* config/aarch64/aarch64.c (aarch64_init_libfuncs,
aarch64_promoted_type): New.

(aarch64_float_const_representable_p): Disable HFmode.
(aarch64_mangle_type): Mangle half-precision floats to Dh.
(TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type.
(TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs.

* config/aarch64/aarch64.md (mov<mode>): Include HFmode using GPF_F16.
(movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New.

* config/aarch64/iterators.md (GPF_F16): New.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/f16_movs_1.c: New test.

commit 989af1492bbf268be1ecfae06f3303b90ae514c8
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Tue Dec 2 12:57:39 2014 +

AArch64 1/6: Basic HFmode support (less tests), aarch64_fp16_type_node, patterns, mangling, predefines.

No --fp16-format option.

Disable constants as NYI.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index ec60955..cfb2dc1 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -439,6 +439,9 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = {
 };
 #undef ENTRY
 
+/* This type is not SIMD-specific; it is the user-visible __fp16.  */
+static tree aarch64_fp16_type_node = NULL_TREE;
+
 static tree aarch64_simd_intOI_type_node = NULL_TREE;
 static tree aarch64_simd_intEI_type_node = NULL_TREE;
 static tree aarch64_simd_intCI_type_node = NULL_TREE;
@@ -849,6 +852,12 @@ aarch64_init_builtins (void)
= add_builtin_function ("__builtin_aarch64_set_fpsr", ftype_set_fpr,
 			AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);
 
+  aarch64_fp16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (aarch64_fp16_type_node) = 16;
+  layout_type (aarch64_fp16_type_node);
+
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, "__fp16");
+
   if (TARGET_SIMD)
 aarch64_init_simd_builtins ();
   if (TARGET_CRC32)
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index b17b90d..c30059b 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -36,6 +36,10 @@ CC_MODE (CC_DLTU);
 CC_MODE (CC_DGEU);
 CC_MODE (CC_DGTU);
 
+/* Half-precision floating point for arm_neon.h float16_t.  */
+FLOAT_MODE (HF, 2, 0);
+ADJUST_FLOAT_FORMAT (HF, ieee_half_format);
+
 /* Vector modes.  */
 VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI.  */
 VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 17bae08..f338033 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8339,6 +8339,10 @@ aarch64_mangle_type (const_tree type)
   if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type))
    return "St9__va_list";
 
+  /* Half-precision float.  */
+  if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
+    return "Dh";
+
   /* Mangle AArch64-specific internal types.  TYPE_NAME is non-NULL_TREE for
  builtin types.  */
   if (TYPE_NAME (type) != NULL)
@@ -9578,6 +9582,33 @@ aarch64_start_file (void)
   default_file_start();
 }
 
+static void
+aarch64_init_libfuncs (void)
+{
+   /* Half-precision float operations.  The compiler handles all operations
+ with NULL libfuncs by converting to SFmode.  */
+
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, HFmode, SFmode, "__gnu_f2h_ieee");
+  set_conv_libfunc (sext_optab, SFmode, HFmode, "__gnu_h2f_ieee");
+
+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, HFmode, NULL);
+  set_optab_libfunc (sdiv_optab, HFmode, NULL);
+  set_optab_libfunc (smul_optab, HFmode, NULL);
+  set_optab_libfunc (neg_optab, HFmode, NULL);
+  set_optab_libfunc (sub_optab, HFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, HFmode, NULL);
+  set_optab_libfunc (ne_optab, HFmode, NULL);
+  set_optab_libfunc (lt_optab, HFmode, NULL);
+  set_optab_libfunc (le_optab, HFmode, NULL);
+  set_optab_libfunc (ge_optab, HFmode, NULL);
+  set_optab_libfunc (gt_optab, HFmode, NULL);
+  set_optab_libfunc (unord_optab, HFmode, NULL);
+}
+
 /* Target hook for c_mode_for_suffix.  */
 static machine_mode
 aarch64_c_mode_for_suffix (char suffix)
@@ -9616,7 +9647,8 @@ aarch64_float_const_representable_p (rtx x)
   if (!CONST_DOUBLE_P (x))
 

[PATCH 8/16][ARM/AArch64 Testsuite] Add basic fp16 tests

2015-07-07 Thread Alan Lawrence
These were originally part of 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01340.html but I have moved into 
their own subdirectory and adapted them to execute on ARM also (as per 
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00656.html)


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fp16/fp16.exp: New.
* gcc.target/aarch64/fp16/f16_convs_1.c: New.
* gcc.target/aarch64/fp16/f16_convs_2.c: New.
commit bc5045c0d3dd34b8cb94910281384f9ab9880325
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Thu May 7 10:08:12 2015 +0100

(ARM+AArch64) Add gcc.target/aarch64/fp16, f16_conv_[12].c tests

diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c
new file mode 100644
index 000..a1c95fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-mfp16-format=ieee" { target arm*-*-* } } */
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+int
+main (int argc, char **argv)
+{
+  float f1 = 3.14159f;
+  float f2 = 2.718f;
+  /* This 'assembler' statement should be portable between ARM and AArch64.  */
+  asm volatile ("" : : : "memory");
+  __fp16 in1 = f1;
+  __fp16 in2 = f2;
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+ float32, adds, converts back to f16, then we convert back to f32).  */
+  __fp16 res1 = in1 + in2;
+  asm volatile ("" : : : "memory");
+  float f_res_1 = res1;
+
+  /* Do the addition on float32's (we convert both operands to f32, and add,
+ as above, but skip the final conversion f32 -> f16 -> f32).  */
+  float f1a = in1;
+  float f2a = in2;
+  float f_res_2 = f1a + f2a;
+
+  if (__builtin_fabs (f_res_2 - f_res_1) > EPSILON)
+abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c
new file mode 100644
index 000..6aa3e59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-mfp16-format=ieee" { target arm*-*-* } } */
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+int
+main (int argc, char **argv)
+{
+  int i1 = 3;
+  int i2 = 2;
+  /*  This 'assembler' should be portable across ARM and AArch64.  */
+  asm volatile ("" : : : "memory");
+
+  __fp16 in1 = i1;
+  __fp16 in2 = i2;
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+ float32, adds, converts back to f16, then we convert to int).  */
+  __fp16 res1 = in1 + in2;
+  asm volatile ("" : : : "memory");
+  int result1 = res1;
+
+  /* Do the addition on int's (we convert both operands directly to int, add,
+ and we're done).  */
+  int result2 = ((int) in1) + ((int) in2);
+
+  if (__builtin_abs (result2 - result1) > EPSILON)
+abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp
new file mode 100644
index 000..7dc8d65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp
@@ -0,0 +1,43 @@
+# Tests of 16-bit floating point (__fp16), for both ARM and AArch64.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an ARM or AArch64 target.
+if {![istarget arm*-*-*]
+     && ![istarget aarch64*-*-*]} then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cC\]]] \
	 "" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish


Re: [PR66726] Factor conversion out of COND_EXPR

2015-07-07 Thread Kugan
On 07/07/15 07:37, Jeff Law wrote:
 On 07/04/2015 06:32 AM, Kugan wrote:

 I would also verify that this turns into a MIN_EXPR.  I think the patch
 as-written won't detect the MIN_EXPR until the _next_ time phi-opt is
 called.  And one of the benefits we're really looking for here is to
 remove barriers to finding these min/max expressions.
 

 
 +
 diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
 index d2a5cee..12ab9ee 100644
 --- a/gcc/tree-ssa-phiopt.c
 +++ b/gcc/tree-ssa-phiopt.c
 @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
   static unsigned int tree_ssa_phiopt_worker (bool, bool);
   static bool conditional_replacement (basic_block, basic_block,
edge, edge, gphi *, tree, tree);
 +static bool factor_out_conditional_conversion (edge, edge, gphi *,
 tree, tree);
   static int value_replacement (basic_block, basic_block,
 edge, edge, gimple, tree, tree);
   static bool minmax_replacement (basic_block, basic_block,
 @@ -342,6 +343,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool
 do_hoist_loads)
   cfgchanged = true;
 else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
   cfgchanged = true;
 +  else if (factor_out_conditional_conversion (e1, e2, phi, arg0,
 arg1))
 +cfgchanged = true;
 So this transformation does not inherently change the CFG, so setting
 CFGCHANGED isn't really appropriate and may trigger unnecessary cleanups.
 
 I think the transformation needs to occur prior this if-elseif-else
 block since the transformation should enable the code in the
 if-elseif-else block to find more optimization opportunities.
 
 That will also imply we either restart after the transformation applies,
 or we update the local variables that are used as arguments to
 conditional_replacement, abs_replacement and minmax_replacement.
 
 
   }
   }

 @@ -410,6 +413,108 @@ replace_phi_edge_with_variable (basic_block
 cond_block,
  bb->index);
   }

 +/* PR66726: Factor conversion out of COND_EXPR.  If the arguments of
 the PHI
 +   stmt are CONVERT_STMT, factor out the conversion and perform the
 conversion
 +   to the result of PHI stmt.  */
 +
 +static bool
 +factor_out_conditional_conversion (edge e0, edge e1, gphi *phi,
 +   tree arg0, tree arg1)
 +{
 +  gimple def0 = NULL, def1 = NULL, new_stmt;
 +  tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE;
 +  tree temp, result;
 +  gimple_stmt_iterator gsi;
 +
  +  /* One of the arguments has to be an SSA_NAME and the other argument
  + can be an SSA_NAME or an INTEGER_CST.  */
  +  if ((TREE_CODE (arg0) != SSA_NAME
  +   && TREE_CODE (arg0) != INTEGER_CST)
  +  || (TREE_CODE (arg1) != SSA_NAME
  +   && TREE_CODE (arg1) != INTEGER_CST)
  +  || (TREE_CODE (arg0) == INTEGER_CST
  +   && TREE_CODE (arg1) == INTEGER_CST))
 +return false;
 +
 +  /* Handle only PHI statements with two arguments.  TODO: If all
 + other arguments to PHI are INTEGER_CST, we can handle more
 + than two arguments too.  */
 +  if (gimple_phi_num_args (phi) != 2)
 +return false;
 If you're just handling two arguments, then it's probably easiest to
 just swap arg0/arg1 e0/e1 if arg0 is not an SSA_NAME like this:
 
  /* First canonicalize to simplify tests.  */
   if (TREE_CODE (arg0) != SSA_NAME)
 {
   std::swap (arg0, arg1);
   std::swap (e0, e1);
 }
 
   if (TREE_CODE (arg0) != SSA_NAME)
 return false;
 
 That simplifies things a bit since you're going to know from thsi point
 forward that arg0 is an SSA_NAME.
 
 
 
 +
 +  /* If arg0 is an SSA_NAME and the stmt which defines arg0 is
 + a CONVERT_STMT, use the LHS as new_arg0.  */
 +  if (TREE_CODE (arg0) == SSA_NAME)
 +{
 +  def0 = SSA_NAME_DEF_STMT (arg0);
 +  if (!is_gimple_assign (def0)
 +  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def0)))
 +return false;
 +  new_arg0 = gimple_assign_rhs1 (def0);
 +}
 Use gimple_assign_cast_p rather than checking CONVERT_EXPR_CODE_P
 directly, so something like:
 
   /* Now see if ARG0 was defined by a typecast.  */
   gimple arg0_def = SSA_NAME_DEF_STMT (arg0);
 
   if (!is_gimple_assign (arg0_def) || !gimple_assign_cast_p (arg0_def))
 return false;
 
 Similarly for arg1 when it's an SSA_NAME.
 
 
 +
 +  /* If types of new_arg0 and new_arg1 are different bailout.  */
 +  if (TREE_TYPE (new_arg0) != TREE_TYPE (new_arg1))
 +return false;
  Do we want to restrict this to just integral types?  I haven't thought
  about it too deeply, so perhaps not.
 
 +
 +  /* Replace the PHI stmt with the new_arg0 and new_arg1.  Also insert
 + a new CONVERT_STMT that converts the phi results.  */
 +  gsi = gsi_after_labels (gimple_bb (phi));
 +  result = PHI_RESULT (phi);
 +  temp = make_ssa_name (TREE_TYPE (new_arg0), phi);
 +
 +  if (dump_file  (dump_flags  TDF_DETAILS))
 +{
 +  fprintf (dump_file, PHI );
 +  print_generic_expr (dump_file, gimple_phi_result (phi), 0);
 + 

Re: [PATCH] Do not use floating point registers when compiling with -msoft-float for SPARC

2015-07-07 Thread Daniel Cederman


On 2015-07-07 12:32, Eric Botcazou wrote:


ChangeLog must just describe the what, nothing more.  If the rationale is not
obvious, then a comment must be added _in the code_ itself.

* config/sparc/sparc.c (sparc_function_value_regno_p): Do not return
true on %f0 for a target without FPU.
* config/sparc/sparc.md (untyped_call): Do not save %f0 for a target
without FPU.
(untyped_return): Do not load %f0 for a target without FPU.



Understood. Thank you for looking at my patches and coming up with 
improvements.


--
Daniel Cederman


[PATCH 1/16][ARM] PR/63870 Add qualifier to check lane bounds in expand

2015-07-07 Thread Alan Lawrence

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html

(While this falls under PR/63870, and I will link to that in the ChangeLog, it 
is only a small step towards fixing that PR.)
commit 9812db88cff20a505365f68f4065d2fbab998c9c
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Mon Dec 8 11:04:49 2014 +

ARM: Add qualifier_lane_index

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index f960e0a..7f5bf87 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -77,7 +77,9 @@ enum arm_type_qualifiers
   /* qualifier_const_pointer | qualifier_map_mode  */
   qualifier_const_pointer_map_mode = 0x86,
   /* Polynomial types.  */
-  qualifier_poly = 0x100
+  qualifier_poly = 0x100,
+  /* Lane indices - must be within range of previous argument = a vector.  */
+  qualifier_lane_index = 0x200
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -108,21 +110,40 @@ arm_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
-arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate };
+#define BINOP_IMM_QUALIFIERS (arm_binop_imm_qualifiers)
+
+/* T (T, lane index).  */
+static enum arm_type_qualifiers
+arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_lane_index };
 #define GETLANE_QUALIFIERS (arm_getlane_qualifiers)
 
 /* T (T, T, T, immediate).  */
 static enum arm_type_qualifiers
-arm_lanemac_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_mac_n_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none,
   qualifier_none, qualifier_immediate };
-#define LANEMAC_QUALIFIERS (arm_lanemac_qualifiers)
+#define MAC_N_QUALIFIERS (arm_mac_n_qualifiers)
+
+/* T (T, T, T, lane index).  */
+static enum arm_type_qualifiers
+arm_mac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+  qualifier_none, qualifier_lane_index };
+#define MAC_LANE_QUALIFIERS (arm_mac_lane_qualifiers)
 
 /* T (T, T, immediate).  */
 static enum arm_type_qualifiers
-arm_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate };
+#define TERNOP_IMM_QUALIFIERS (arm_ternop_imm_qualifiers)
+
+/* T (T, T, lane index).  */
+static enum arm_type_qualifiers
+arm_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_lane_index };
 #define SETLANE_QUALIFIERS (arm_setlane_qualifiers)
 
 /* T (T, T).  */
@@ -1927,6 +1948,7 @@ arm_expand_unop_builtin (enum insn_code icode,
 typedef enum {
   NEON_ARG_COPY_TO_REG,
   NEON_ARG_CONSTANT,
+  NEON_ARG_LANE_INDEX,
   NEON_ARG_MEMORY,
   NEON_ARG_STOP
 } builtin_arg;
@@ -2043,6 +2065,16 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
 		op[argc] = copy_to_mode_reg (mode[argc], op[argc]);
 	  break;
 
+	case NEON_ARG_LANE_INDEX:
+	  /* Previous argument must be a vector, which this indexes.  */
+	  gcc_assert (argc > 0);
+	  if (CONST_INT_P (op[argc]))
+		{
+		  enum machine_mode vmode = mode[argc - 1];
+		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode), exp);
+		}
+	  /* Fall through - if the lane index isn't a constant then
+		 the next case will error.  */
 	case NEON_ARG_CONSTANT:
 	  if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
@@ -2170,7 +2202,9 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target)
   int operands_k = k - is_void;
   int expr_args_k = k - 1;
 
-  if (d->qualifiers[qualifiers_k] & qualifier_immediate)
+  if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
+	args[k] = NEON_ARG_LANE_INDEX;
+  else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
 	args[k] = NEON_ARG_CONSTANT;
   else if (d->qualifiers[qualifiers_k] & qualifier_maybe_immediate)
 	{
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 62f91ef..25bdebd 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -86,7 +86,7 @@ extern void neon_pairwise_reduce (rtx, rtx, machine_mode,
 extern rtx neon_make_constant (rtx);
 extern tree arm_builtin_vectorized_function (tree, tree, tree);
 extern void neon_expand_vector_init (rtx, rtx);
-extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
 extern void neon_const_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 extern HOST_WIDE_INT neon_element_bits (machine_mode);
 extern void neon_reinterpret (rtx, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e79a369..6e074ea 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12788,12 +12788,12 @@ neon_expand_vector_init (rtx target, rtx vals)
 }
 
 /* 

[PATCH 0/16][ARM/AArch64] Float16_t support, v2

2015-07-07 Thread Alan Lawrence
This is a respin of the series at 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html, plus the two ARM 
patches on which these depend 
(https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html). These two somewhat 
duplicate Charles Baylis' lane-bounds-checking patch at 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00140.html, in that they both port 
some of the same AArch64 infrastructure onto ARM; while each has some parts the 
other doesn't, there don't look to be any serious conflicts; if Charles' patches 
were to go in first, I would not expect any major problems in rebasing mine over 
his.


Changes since the first version of the float16 series are

* to separate out the (non-vector) tests from gcc.testsuite/aarch64 into a 
.../fp16 subdirectory, as per 
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00656.html


* dropped the patch rewriting advsimd-intrinsics.exp, following other changes 
along similar lines by Sandra Loosemore and Christophe Lyon;


* Rebased over other testsuite changes, including dropping expected values from 
many tests of intrinsics with no fp16 variant, and introducing a 
CHECK_RESULTS_NO_FP16 macro.


* Changed the mechanism on ARM by which we passed in -mfpu=neon-fp16: we now try 
to pass this into all tests, but fail the vcvt_f16.c if float16 is still not 
supported (e.g. there was a conflicting -mfpu=neon passed to the compiler, or we 
are running on HW which does not support the instructions).


Are these OK for trunk?

Thanks, Alan



[PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-07 Thread Alan Lawrence
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes FAIL of 
advsimd-intrinsics vcreate.c on aarch64_be-none-elf from previous patch.
commit e2e7ca148960a82fc88128820f17e7cbd14173cb
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Thu Apr 9 10:54:40 2015 +0100

Fix native_interpret_real for HFmode floats on Bigendian with UNITS_PER_WORD=4

(with missing space)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index e61d946..15a10f0 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -7622,7 +7622,7 @@ native_interpret_real (tree type, const unsigned char *ptr, int len)
 	offset += byte % UNITS_PER_WORD;
 	}
   else
-	offset = BYTES_BIG_ENDIAN ? 3 - byte : byte;
+	offset = BYTES_BIG_ENDIAN ? MIN (3, total_bytes - 1) - byte : byte;
   value = ptr[offset + ((bitpos / BITS_PER_UNIT) & ~3)];
 
   tmp[bitpos / 32] |= (unsigned long)value << (bitpos & 31);


[PATCH 14/16][ARM/AArch64 testsuite] Update advsimd-intrinsics tests to add float16 vectors

2015-07-07 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01347.html, 
removing many default values of 0x333, to complete that I introduced new macros 
CHECK_RESULTS{,_NAMED}_NO_FP16 as writing the same list of vector types in four 
places seemed too many.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (hfloat16_t,
vdup_n_f16, CHECK_RESULTS_NO_FP16, CHECK_RESULTS_NAMED_NO_FP16): New.
(result, expected, clean_results): Add float16x4 and float16x8 cases.
(CHECK_RESULTS_NAMED): Likewise, using CHECK_RESULTS_NAMED_NO_FP16.
(CHECK_RESULTS): Redefine using CHECK_RESULTS_NAMED.

DECL_VARIABLE_64BITS_VARIANTS: Add float16x4 case.
DECL_VARIABLE_128BITS_VARIANTS: Add float16x8 case.

* gcc.target/aarch64/advsimd-intrinsics/compute-data-ref.h (buffer,
buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and float16x8.

* gcc.target/aarch64/advsimd-intrinsics/vbsl.c (exec_vbsl): Change
CHECK_RESULTS to CHECK_RESULTS_NO_FP16.
* gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c (exec_vdup_lane):
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vext.c (exec_vext): Likewise.

* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c (exec_vdup_vmov):
Change CHECK_RESULTS_NAMED to CHECK_RESULTS_NAMED_NO_FP16.

* gcc.target/aarch64/advsimd-intrinsics/vcombine.c: Add expected
results for float16x4 and float16x8.
(exec_vcombine): Add test of float16x4 -> float16x8 case.
* gcc.target/aarch64/advsimd-intrinsics/vcreate.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vget_high.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vget_low.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vldX.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 4e728d5..cf9c358 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -7,6 +7,7 @@
 #include <inttypes.h>
 
 /* helper type, to help write floating point results in integer form.  */
+typedef uint16_t hfloat16_t;
 typedef uint32_t hfloat32_t;
 typedef uint64_t hfloat64_t;
 
@@ -132,6 +133,7 @@ static ARRAY(result, uint, 32, 2);
 static ARRAY(result, uint, 64, 1);
 static ARRAY(result, poly, 8, 8);
 static ARRAY(result, poly, 16, 4);
+static ARRAY(result, float, 16, 4);
 static ARRAY(result, float, 32, 2);
 static ARRAY(result, int, 8, 16);
 static ARRAY(result, int, 16, 8);
@@ -143,6 +145,7 @@ static ARRAY(result, uint, 32, 4);
 static ARRAY(result, uint, 64, 2);
 static ARRAY(result, poly, 8, 16);
 static ARRAY(result, poly, 16, 8);
+static ARRAY(result, float, 16, 8);
 static ARRAY(result, float, 32, 4);
 #ifdef __aarch64__
 static ARRAY(result, float, 64, 2);
@@ -160,6 +163,7 @@ extern ARRAY(expected, uint, 32, 2);
 extern ARRAY(expected, uint, 64, 1);
 extern ARRAY(expected, poly, 8, 8);
 extern ARRAY(expected, poly, 16, 4);
+extern ARRAY(expected, hfloat, 16, 4);
 extern ARRAY(expected, hfloat, 32, 2);
 extern ARRAY(expected, int, 8, 16);
 extern ARRAY(expected, int, 16, 8);
@@ -171,38 +175,11 @@ extern ARRAY(expected, uint, 32, 4);
 extern ARRAY(expected, uint, 64, 2);
 extern ARRAY(expected, poly, 8, 16);
 extern ARRAY(expected, poly, 16, 8);
+extern ARRAY(expected, hfloat, 16, 8);
 extern ARRAY(expected, hfloat, 32, 4);
 extern ARRAY(expected, hfloat, 64, 2);
 
-/* Check results. Operates on all possible vector types.  */
-#define CHECK_RESULTS(test_name,comment)\
-  {	\
-CHECK(test_name, int, 8, 8, PRIx8, expected, comment);		\
-CHECK(test_name, int, 16, 4, PRIx16, expected, comment);		\
-CHECK(test_name, int, 32, 2, PRIx32, expected, comment);		\
-CHECK(test_name, int, 64, 1, PRIx64, expected, comment);		\
-CHECK(test_name, uint, 8, 8, PRIx8, expected, comment);		\
-CHECK(test_name, uint, 16, 4, PRIx16, expected, comment);		\
-CHECK(test_name, uint, 32, 2, PRIx32, expected, comment);		\
-CHECK(test_name, uint, 64, 1, PRIx64, expected, comment);		\
-CHECK(test_name, poly, 8, 8, PRIx8, expected, comment);		\
-CHECK(test_name, poly, 16, 4, PRIx16, expected, comment);		\
-CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment);	\
-	\
-CHECK(test_name, int, 8, 16, PRIx8, expected, comment);		\
-CHECK(test_name, int, 16, 8, PRIx16, expected, comment);		\
-CHECK(test_name, int, 32, 4, PRIx32, expected, comment);		\
-

[PATCH 13/16][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix

2015-07-07 Thread Alan Lawrence

Unchanged since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01345.html
commit 214fcc00475a543a79ed444f9a64061215397cc8
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Wed Jan 28 13:01:31 2015 +

AArch64 6/N: vcvt{,_high}_f32_f16 (using vect_par_cnst_hi_half, fixing bigendian indices)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 8bcab72..9869b73 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -361,11 +361,11 @@
   BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
   BUILTIN_VDQF (UNOP, abs, 2)
 
-  VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
+  VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
   VAR1 (BINOP, float_truncate_hi_, 0, v8hf)
 
-  VAR1 (UNOP, float_extend_lo_, 0, v2df)
+  VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf)
   BUILTIN_VDF (UNOP, float_truncate_lo_, 0)
 
+  /* Implemented by aarch64_ld1<VALL_F16:mode>.  */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2dc54e1..1a7d858 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1691,36 +1691,57 @@
 
 ;; Float widening operations.
 
-(define_insn "vec_unpacks_lo_v4sf"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(float_extend:V2DF
-	  (vec_select:V2SF
-	    (match_operand:V4SF 1 "register_operand" "w")
-	    (parallel [(const_int 0) (const_int 1)])
-	  )))]
+(define_insn "aarch64_simd_vec_unpacks_lo_<mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+    (float_extend:<VWIDE> (vec_select:<VHALF>
+			   (match_operand:VQ_HSF 1 "register_operand" "w")
+			   (match_operand:VQ_HSF 2 "vect_par_cnst_lo_half" "")
+			)))]
   "TARGET_SIMD"
-  "fcvtl\\t%0.2d, %1.2s"
+  "fcvtl\\t%0.<Vwtype>, %1.<Vhalftype>"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "aarch64_float_extend_lo_v2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(float_extend:V2DF
-	  (match_operand:V2SF 1 "register_operand" "w")))]
+(define_expand "vec_unpacks_lo_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand" "")
+   (match_operand:VQ_HSF 1 "register_operand" "")]
   "TARGET_SIMD"
-  "fcvtl\\t%0.2d, %1.2s"
+  {
+    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
+    emit_insn (gen_aarch64_simd_vec_unpacks_lo_<mode> (operands[0],
+						       operands[1], p));
+    DONE;
+  }
+)
+
+(define_insn "aarch64_simd_vec_unpacks_hi_<mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+    (float_extend:<VWIDE> (vec_select:<VHALF>
+			   (match_operand:VQ_HSF 1 "register_operand" "w")
+			   (match_operand:VQ_HSF 2 "vect_par_cnst_hi_half" "")
+			)))]
+  "TARGET_SIMD"
+  "fcvtl2\\t%0.<Vwtype>, %1.<Vtype>"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "vec_unpacks_hi_v4sf"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(float_extend:V2DF
-	  (vec_select:V2SF
-	    (match_operand:V4SF 1 "register_operand" "w")
-	    (parallel [(const_int 2) (const_int 3)])
-	  )))]
+(define_expand "vec_unpacks_hi_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand" "")
+   (match_operand:VQ_HSF 1 "register_operand" "")]
+  "TARGET_SIMD"
+  {
+    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+    emit_insn (gen_aarch64_simd_vec_unpacks_hi_<mode> (operands[0],
+						       operands[1], p));
+    DONE;
+  }
+)
+
+(define_insn "aarch64_float_extend_lo_<Vwide>"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+	(float_extend:<VWIDE>
+	  (match_operand:VDF 1 "register_operand" "w")))]
   "TARGET_SIMD"
-  "fcvtl2\\t%0.2d, %1.4s"
+  "fcvtl\\t%0<Vmwtype>, %1<Vmtype>"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ff1a45c..4f0636f 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -6026,10 +6026,6 @@ vaddlvq_u32 (uint32x4_t a)
result;  \
  })
 
-/* vcvt_f32_f16 not supported */
-
-/* vcvt_high_f32_f16 not supported */
-
 #define vcvt_n_f32_s32(a, b)\
   __extension__ \
 ({  \
@@ -13420,6 +13416,12 @@ vcvt_high_f32_f64 (float32x2_t __a, float64x2_t __b)
 
 /* vcvt (float - double).  */
 
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcvt_f32_f16 (float16x4_t __a)
+{
+  return __builtin_aarch64_float_extend_lo_v4sf (__a);
+}
+
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vcvt_f64_f32 (float32x2_t __a)
 {
@@ -13427,6 +13429,12 @@ vcvt_f64_f32 (float32x2_t __a)
   return __builtin_aarch64_float_extend_lo_v2df (__a);
 }
 
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vcvt_high_f32_f16 (float16x8_t __a)
+{
+  return __builtin_aarch64_vec_unpacks_hi_v8hf (__a);
+}
+
 __extension__ static __inline float64x2_t __attribute__ 

[PATCH 12/16][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup

2015-07-07 Thread Alan Lawrence
This is the remainder of 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01343.html combined with 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01344.html, putting together all 
the intrinsics that didn't require anything outside arm_neon.h. Also update the 
existing tests in aarch64/.


gcc/ChangeLog:

* config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16,
vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16,
vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32,
vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32,
vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16,
vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32,
vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32,
vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16,
vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16,
vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16,
vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
vreinterpret_u16_f16, vreinterpret_u32_f16, vreinterpretq_p8_f16,
vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16,
vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16,
vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16,
vreinterpretq_u32_f16, vget_low_f16, vget_high_f16, vld1_dup_f16,
vld1q_dup_f16): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vget_high_1.c: Add float16x8->float16x4 case.
* gcc.target/aarch64/vget_low_1.c: Likewise.
commit beb21a6bce76d4fbedb13fcf25796563b27f6bae
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Mon Jun 29 18:46:49 2015 +0100

[AArch64 5/N v2] vreinterpret, vget_(low|high), vld1(q?)_dup. update tests for vget_low/high

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b915754..ff1a45c 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -2891,6 +2891,12 @@ vgetq_lane_u64 (uint64x2_t __a, const int __b)
 /* vreinterpret  */
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vreinterpret_p8_f16 (float16x4_t __a)
+{
+  return (poly8x8_t) __a;
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_f64 (float64x1_t __a)
 {
   return (poly8x8_t) __a;
@@ -2987,6 +2993,12 @@ vreinterpretq_p8_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
   return (poly8x16_t) __a;
@@ -3023,6 +3035,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_f16 (float16x4_t __a)
+{
+  return (poly16x4_t) __a;
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
 vreinterpret_p16_f64 (float64x1_t __a)
 {
   return (poly16x4_t) __a;
@@ -3119,6 +3137,12 @@ vreinterpretq_p16_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f16 (float16x8_t __a)
+{
+  return (poly16x8_t) __a;
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_p16_f32 (float32x4_t __a)
 {
   return (poly16x8_t) __a;
@@ -3154,6 +3178,156 @@ vreinterpretq_p16_p8 (poly8x16_t __a)
   return (poly16x8_t) __a;
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f64 (float64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s8 (int8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s16 (int16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s32 (int32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s64 (int64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f32 (float32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u8 (uint8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u16 (uint16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+

RE: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS

2015-07-07 Thread Moore, Catherine


 -Original Message-
 From: Andrew Bennett [mailto:andrew.benn...@imgtec.com]
 Sent: Tuesday, July 07, 2015 6:53 AM
 To: gcc-patches@gcc.gnu.org
 Cc: Moore, Catherine; Matthew Fortune
 Subject: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc 
 instruction
 to be matched when testing with microMIPS
 
 Hi,
 
 When building the call-[1,5,6].c tests for microMIPS, the jrc rather than
 the jr instruction is used to call the tail* functions.
 
 I have updated the test output to allow the jrc instruction to be matched.
 
 I have tested this on the mips-mti-elf target using mips32r2/{-mno-
 micromips/-mmicromips}
 test options and there are no new regressions.
 
 The patch and ChangeLog are below.
 
 Ok to commit?
 
 
 testsuite/
   * gcc.target/mips/call-1.c: Allow testcase to match the jrc instruction.
   * gcc.target/mips/call-5.c: Ditto.
   * gcc.target/mips/call-6.c: Ditto.
 
 
OK.


Re: Clean-ups in match.pd

2015-07-07 Thread Richard Biener
On Mon, Jul 6, 2015 at 4:08 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Sat, Jul 4, 2015 at 4:34 PM, Marc Glisse marc.gli...@inria.fr wrote:
 Hello,

 these are just some minor changes. I believe I had already promised a build_
 function to match integer_each_onep.

 Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like
 *-match.c takes about 10 minutes to compile in stage2 these days).

 Ouch.  I have some changes to the code generation in the queue which
 also supports a more natural if structure (else and elif).  Eventually
 that helps a bit but I suppose the main issue is simply from the large
 functions.  They can be split quite easily I think, but passing down
 all relevant state might turn out to be tricky unless we start using
 nested functions here ... (and IIRC those are not supported in C++)

Just checking in my dev tree (-O0 build with checking enabled, thus
similar to the stage2 situation) reveals nothing interesting.  The checkers
take up most of the time:

 CFG verifier              :  21.27 ( 8%) usr   0.01 ( 1%) sys  21.46 ( 8%) wall       0 kB ( 0%) ggc
 early inlining heuristics :  12.59 ( 5%) usr   0.03 ( 2%) sys  12.61 ( 5%) wall   10826 kB ( 1%) ggc
 tree SSA verifier         :  26.30 (10%) usr   0.01 ( 1%) sys  26.34 (10%) wall       0 kB ( 0%) ggc
 tree STMT verifier        :  50.44 (20%) usr   0.10 ( 6%) sys  50.27 (20%) wall       0 kB ( 0%) ggc

that's everything >= 5%

Trying to figure out whether there are some gross algorithms in here (yes, we
now verify stuff quite often...)

Richard.



 Richard.

 2015-07-06  Marc Glisse  marc.gli...@inria.fr

 * match.pd: Remove element_mode inside HONOR_*.
 (~ (-A) -> A - 1, ~ (A - 1) -> -A): Handle complex types.
 (~X | X -> -1, ~X ^ X -> -1): Merge.
 * tree.c (build_each_one_cst): New function.
 * tree.h (build_each_one_cst): Likewise.

 --
 Marc Glisse
 Index: match.pd
 ===
 --- match.pd(revision 225411)
 +++ match.pd(working copy)
 @@ -101,7 +101,7 @@
 negative value by 0 gives -0, not +0.  */
  (simplify
   (mult @0 real_zerop@1)
 - (if (!HONOR_NANS (type)  !HONOR_SIGNED_ZEROS (element_mode (type)))
 + (if (!HONOR_NANS (type)  !HONOR_SIGNED_ZEROS (type))
@1))

  /* In IEEE floating point, x*1 is not equivalent to x for snans.
 @@ -108,8 +108,8 @@
 Likewise for complex arithmetic with signed zeros.  */
  (simplify
   (mult @0 real_onep)
 - (if (!HONOR_SNANS (element_mode (type))
 -   (!HONOR_SIGNED_ZEROS (element_mode (type))
 + (if (!HONOR_SNANS (type)
 +   (!HONOR_SIGNED_ZEROS (type)
|| !COMPLEX_FLOAT_TYPE_P (type)))
(non_lvalue @0)))

 @@ -116,8 +116,8 @@
  /* Transform x * -1.0 into -x.  */
  (simplify
   (mult @0 real_minus_onep)
 -  (if (!HONOR_SNANS (element_mode (type))
 -(!HONOR_SIGNED_ZEROS (element_mode (type))
 +  (if (!HONOR_SNANS (type)
 +(!HONOR_SIGNED_ZEROS (type)
 || !COMPLEX_FLOAT_TYPE_P (type)))
 (negate @0)))

 @@ -165,7 +165,7 @@
   (rdiv @0 @0)
   (if (FLOAT_TYPE_P (type)
 ! HONOR_NANS (type)
 -   ! HONOR_INFINITIES (element_mode (type)))
 +   ! HONOR_INFINITIES (type))
{ build_one_cst (type); }))

  /* Optimize -A / A to -1.0 if we don't care about
 @@ -174,19 +174,19 @@
   (rdiv:c @0 (negate @0))
   (if (FLOAT_TYPE_P (type)
 ! HONOR_NANS (type)
 -   ! HONOR_INFINITIES (element_mode (type)))
 +   ! HONOR_INFINITIES (type))
{ build_minus_one_cst (type); }))

  /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
  (simplify
   (rdiv @0 real_onep)
 - (if (!HONOR_SNANS (element_mode (type)))
 + (if (!HONOR_SNANS (type))
(non_lvalue @0)))

  /* In IEEE floating point, x/-1 is not equivalent to -x for snans.  */
  (simplify
   (rdiv @0 real_minus_onep)
 - (if (!HONOR_SNANS (element_mode (type)))
 + (if (!HONOR_SNANS (type))
(negate @0)))

  /* If ARG1 is a constant, we can convert this to a multiply by the
 @@ -297,9 +297,10 @@
@1)

  /* ~x | x - -1 */

 Please also adjust this comment.  Ok with that change.

 Thanks,
 Richard.

 -(simplify
 - (bit_ior:c (convert? @0) (convert? (bit_not @0)))
 - (convert { build_all_ones_cst (TREE_TYPE (@0)); }))
 +(for op (bit_ior bit_xor plus)
 + (simplify
 +  (op:c (convert? @0) (convert? (bit_not @0)))
 +  (convert { build_all_ones_cst (TREE_TYPE (@0)); })))

  /* x ^ x - 0 */
  (simplify
 @@ -311,11 +312,6 @@
(bit_xor @0 integer_all_onesp@1)
(bit_not @0))

 -/* ~X ^ X is -1.  */
 -(simplify
 - (bit_xor:c (bit_not @0) @0)
 - { build_all_ones_cst (type); })
 -
  /* x  ~0 - x  */
  (simplify
   (bit_and @0 integer_all_onesp)
 @@ -603,11 +599,11 @@
  (simplify
   (bit_not (convert? (negate @0)))
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 -  (convert (minus @0 { build_one_cst (TREE_TYPE (@0)); }
 +  (convert (minus @0 { build_each_one_cst (TREE_TYPE (@0)); }

  /* Convert ~ (A - 1) or ~ (A + -1) to -A.  

Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)

2015-07-07 Thread Eric Botcazou
 2015-06-26  Daniel Cederman  ceder...@gaisler.com
 
   * config/sparc/sparc.md: Window save takes a single integer

This will probably break in 64-bit mode; the operand can be a DImode register.

-- 
Eric Botcazou


Re: [Patch, fortran, pr66578, v1] [F2008] Invalid free on allocate(...,source=a(:)) in block

2015-07-07 Thread Andre Vehreschild
Hi all, hi Paul,

Paul, thanks for the review. Committed as r225507. 

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
Index: gcc/fortran/trans-expr.c
===
*** gcc/fortran/trans-expr.c	(revision 223641)
--- gcc/fortran/trans-expr.c	(working copy)
*** gfc_conv_procedure_call (gfc_se * se, gf
*** 5877,5882 
--- 5877,5896 
fntype = TREE_TYPE (TREE_TYPE (se->expr));
se->expr = build_call_vec (TREE_TYPE (fntype), se->expr, arglist);
  
+   /* Allocatable scalar function results must be freed and nullified
+  after use. This necessitates the creation of a temporary to
+  hold the result to prevent duplicate calls.  */
+   if (!byref && sym->ts.type != BT_CHARACTER
+	&& sym->attr.allocatable && !sym->attr.dimension)
+ {
+   tmp = gfc_create_var (TREE_TYPE (se->expr), NULL);
+   gfc_add_modify (&se->pre, tmp, se->expr);
+   se->expr = tmp;
+   tmp = gfc_call_free (tmp);
+   gfc_add_expr_to_block (&post, tmp);
+   gfc_add_modify (&post, se->expr, build_int_cst (TREE_TYPE (se->expr), 0));
+ }
+ 
/* If we have a pointer function, but we don't want a pointer, e.g.
   something like
  x = f()
Index: gcc/fortran/trans-stmt.c
===
*** gcc/fortran/trans-stmt.c	(revision 223641)
--- gcc/fortran/trans-stmt.c	(working copy)
*** gfc_trans_allocate (gfc_code * code)
*** 5214,5219 
--- 5214,5220 
   false, false);
  	  gfc_add_block_to_block (&block, &se.pre);
  	  gfc_add_block_to_block (&post, &se.post);
+ 
  	  /* Prevent aliasing, i.e., se.expr may be already a
  		 variable declaration.  */
  	  if (!VAR_P (se.expr))
*** gfc_trans_allocate (gfc_code * code)
*** 5223,5230 
  		 se.expr);
  	  /* We need a regular (non-UID) symbol here, therefore give a
  		 prefix.  */
! 	  var = gfc_create_var (TREE_TYPE (tmp), "atmp");
  	  gfc_add_modify_loc (input_location, &block, var, tmp);
  	  tmp = var;
  	}
  	  else
--- 5224,5243 
  		 se.expr);
  	  /* We need a regular (non-UID) symbol here, therefore give a
  		 prefix.  */
! 	  var = gfc_create_var (TREE_TYPE (tmp), "expr3");
  	  gfc_add_modify_loc (input_location, &block, var, tmp);
+ 
+ 	  /* Deallocate any allocatable components after all the allocations
+ 		 and assignments of expr3 have been completed.  */
+ 	  if (code->expr3->ts.type == BT_DERIVED
+ 		  && code->expr3->rank == 0
+ 		  && code->expr3->ts.u.derived->attr.alloc_comp)
+ 		{
+ 		  tmp = gfc_deallocate_alloc_comp (code->expr3->ts.u.derived,
+ 		   var, 0);
+ 		  gfc_add_expr_to_block (&post, tmp);
+ 		}
+ 
  	  tmp = var;
  	}
  	  else
Index: gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90
===
*** gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90	(revision 0)
--- gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90	(working copy)
***
*** 0 
--- 1,70 
+ ! { dg-do run }
+ ! { dg-options "-fdump-tree-original" }
+ !
+ ! Test the fix for PR66079. The original problem was with the first
+ ! allocate statement. The rest of this testcase fixes problems found
+ ! whilst working on it!
+ !
+ ! Reported by Damian Rouson  dam...@sourceryinstitute.org
+ !
+   type subdata
+ integer, allocatable :: b
+   endtype
+ !  block
+ call newRealVec
+ !  end block
+ contains
+   subroutine newRealVec
+ type(subdata), allocatable :: d, e, f
+ character(:), allocatable :: g, h, i
+ character(8), allocatable :: j
+ allocate(d,source=subdata(1)) ! memory was lost, now OK
+ allocate(e,source=d) ! OK
+ allocate(f,source=create (99)) ! memory was lost, now OK
+ if (d%b .ne. 1) call abort
+ if (e%b .ne. 1) call abort
+ if (f%b .ne. 99) call abort
+ allocate (g, source = greeting1("good day"))
+ if (g .ne. "good day") call abort
+ allocate (h, source = greeting2("hello"))
+ if (h .ne. "hello") call abort
+ allocate (i, source = greeting3("hiya!"))
+ if (i .ne. "hiya!") call abort
+ call greeting4 (j, "Goodbye ") ! Test that dummy arguments are OK
+ if (j .ne. "Goodbye ") call abort
+   end subroutine
+ 
+   function create (arg) result(res)
+ integer :: arg
+ type(subdata), allocatable :: res, res1
+ allocate(res, res1, source = subdata(arg))
+   end function
+ 
+   function greeting1 (arg) result(res) ! memory was lost, now OK
+ character(*) :: arg
+ Character(:), allocatable :: res
+ allocate(res, source = arg)
+   end function
+ 
+   function greeting2 (arg) result(res)
+ character(5) :: arg
+ Character(:), allocatable :: res
+ allocate(res, source = arg)
+   end function
+ 
+   function greeting3 (arg) result(res)
+ character(5) :: arg
+ Character(5), allocatable :: res, res1
+ allocate(res, res1, source = arg) ! Caused an ICE
+ if (res1 

[patch committed SH] Fix PR target/66780

2015-07-07 Thread Kaz Kojima
The attached patch reverts a part of the change in r221165 for
PR target/65249.  It turned out that that change causes a wrong-code
problem, PR target/66780, which is worse than the ICE with 'R0_REGS'
spill failure for a specific program reported in PR target/65249.
I've committed it on trunk and reopened PR target/65249.  I'll
backport it to 4.9 later, and to 5 when the branch reopens.

Regards,
kaz
--
2015-07-07  Kaz Kojima  kkoj...@gcc.gnu.org

PR target/66780
* config/sh/sh.md (symGOT_load): Revert a part of 2015-03-03
change for target/65249.

diff --git a/config/sh/sh.md b/config/sh/sh.md
index 5c8d306..f0cb3cf 100644
--- a/config/sh/sh.md
+++ b/config/sh/sh.md
@@ -10751,12 +10751,6 @@ label:
 __stack_chk_guard) == 0)
 stack_chk_guard_p = true;
 
-  /* Use R0 to avoid long R0 liveness which stack-protector tends to
- produce.  */
-  if (! sh_lra_flag
-      && stack_chk_guard_p && ! reload_in_progress && ! reload_completed)
-    operands[2] = gen_rtx_REG (Pmode, R0_REG);
-
   if (TARGET_SHMEDIA)
 {
   rtx reg = operands[2];


[PATCH 10/16][AArch64] vld{2,3,4}{,_lane,_dup},vcombine,vcreate

2015-07-07 Thread Alan Lawrence

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01342.html
commit ef719e5d3d6eccc5cf621851283b7c0ba1a9ee6c
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Tue Aug 5 17:52:28 2014 +0100

AArch64 3/N: v(create|combine|v(ld|st|ld...dup/lane|st...lane)[234](q?))_f16; tests vldN{,_lane,_dup} inc bigendian. Add __builtin_aarch64_simd_hf.

Fix some casts, to ..._hf not ..._sf !

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a6c3377..5367ba6 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -300,6 +300,12 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define VAR12(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \
   VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, MAP, L)
+#define VAR13(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR12 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR1 (T, N, MAP, M)
+#define VAR14(T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M, N) \
+  VAR13 (T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR1 (T, X, MAP, N)
 
 #include aarch64-builtin-iterators.h
 
@@ -377,6 +383,7 @@ const char *aarch64_scalar_builtin_types[] = {
   "__builtin_aarch64_simd_qi",
   "__builtin_aarch64_simd_hi",
   "__builtin_aarch64_simd_si",
+  "__builtin_aarch64_simd_hf",
   "__builtin_aarch64_simd_sf",
   "__builtin_aarch64_simd_di",
   "__builtin_aarch64_simd_df",
@@ -664,6 +671,8 @@ aarch64_init_simd_builtin_scalar_types (void)
 	 "__builtin_aarch64_simd_qi");
   (*lang_hooks.types.register_builtin_type) (intHI_type_node,
 	 "__builtin_aarch64_simd_hi");
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node,
+	 "__builtin_aarch64_simd_hf");
   (*lang_hooks.types.register_builtin_type) (intSI_type_node,
 	 "__builtin_aarch64_simd_si");
   (*lang_hooks.types.register_builtin_type) (float_type_node,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ccf063a..bbf5230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1063,6 +1063,9 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2)
 	case V2SImode:
 	  gen = gen_aarch64_simd_combinev2si;
 	  break;
+	case V4HFmode:
+	  gen = gen_aarch64_simd_combinev4hf;
+	  break;
 	case V2SFmode:
 	  gen = gen_aarch64_simd_combinev2sf;
 	  break;
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 7425485..d61e619 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -153,6 +153,16 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -273,6 +283,16 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -393,6 +413,16 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+
 typedef struct float32x2x4_t
 {
   float32x2_t val[4];
@@ -2644,6 +2674,12 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t) {__a};
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -4780,6 +4816,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b)
   return __builtin_aarch64_combinedi (__a[0], __b[0]);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcombine_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_combinev4hf (__a, __b);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcombine_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -9908,7 +9950,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b)
+--+++++
|uint  | Y  | Y  | N  | N  |
+--+++++
-   |float | -  | -  | N  | N  |
+   |float | -  | Y  | N  | N  |
+--+++++
|poly  | Y  | Y  | -  | -  |
+--+++++
@@ -9922,7 +9964,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b)
+--+++++
|uint  | Y  | Y  | Y  | Y  |
+--+++++
-   |float | -  | -  | Y  | Y  |
+   |float | -  | Y  | Y  | Y  |
+--+++++
|poly  | Y  | Y  | -  | -  |
+--+++++
@@ -9936,7 +9978,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b)
+--+++++

[PATCH 9/16][AArch64] Add support for float16x{4,8}_t vectors/builtins

2015-07-07 Thread Alan Lawrence

As https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01341.html
commit 49cb53a94a44fcda845c3f6ef11e88f9be458aad
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Tue Dec 2 13:08:15 2014 +

AArch64 2/N: Vector/__builtin basics: define+support types, movs, test ABI.

Patterns, builtins, intrinsics for {ld1,st1}{,_lane},v{g,s}et_lane. Tests: vld1-vst1_1, vset_lane_1, vld1_lane.c

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index cfb2dc1..a6c3377 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -66,6 +66,7 @@
 
 #define v8qi_UP  V8QImode
 #define v4hi_UP  V4HImode
+#define v4hf_UP  V4HFmode
 #define v2si_UP  V2SImode
 #define v2sf_UP  V2SFmode
 #define v1df_UP  V1DFmode
@@ -73,6 +74,7 @@
 #define df_UPDFmode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -523,6 +525,8 @@ aarch64_simd_builtin_std_type (enum machine_mode mode,
   return aarch64_simd_intCI_type_node;
 case XImode:
   return aarch64_simd_intXI_type_node;
+case HFmode:
+  return aarch64_fp16_type_node;
 case SFmode:
   return float_type_node;
 case DFmode:
@@ -607,6 +611,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Poly64x2_t].eltype = aarch64_simd_types[Poly64_t].itype;
 
   /* Continue with standard types.  */
+  aarch64_simd_types[Float16x4_t].eltype = aarch64_fp16_type_node;
+  aarch64_simd_types[Float16x8_t].eltype = aarch64_fp16_type_node;
   aarch64_simd_types[Float32x2_t].eltype = float_type_node;
   aarch64_simd_types[Float32x4_t].eltype = float_type_node;
   aarch64_simd_types[Float64x1_t].eltype = double_type_node;
diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config/aarch64/aarch64-simd-builtin-types.def
index bb54e56..ea219b7 100644
--- a/gcc/config/aarch64/aarch64-simd-builtin-types.def
+++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
@@ -44,6 +44,8 @@
   ENTRY (Poly16x8_t, V8HI, poly, 12)
   ENTRY (Poly64x1_t, DI, poly, 12)
   ENTRY (Poly64x2_t, V2DI, poly, 12)
+  ENTRY (Float16x4_t, V4HF, none, 13)
+  ENTRY (Float16x8_t, V8HF, none, 13)
   ENTRY (Float32x2_t, V2SF, none, 13)
   ENTRY (Float32x4_t, V4SF, none, 13)
   ENTRY (Float64x1_t, V1DF, none, 13)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index dd2bc47..4dd2bc7 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -367,11 +367,11 @@
   VAR1 (UNOP, float_extend_lo_, 0, v2df)
   VAR1 (UNOP, float_truncate_lo_, 0, v2sf)
 
-  /* Implemented by aarch64_ld1VALL:mode.  */
-  BUILTIN_VALL (LOAD1, ld1, 0)
+  /* Implemented by aarch64_ld1VALL_F16:mode.  */
+  BUILTIN_VALL_F16 (LOAD1, ld1, 0)
 
-  /* Implemented by aarch64_st1VALL:mode.  */
-  BUILTIN_VALL (STORE1, st1, 0)
+  /* Implemented by aarch64_st1VALL_F16:mode.  */
+  BUILTIN_VALL_F16 (STORE1, st1, 0)
 
   /* Implemented by fmamode4.  */
   BUILTIN_VDQF (TERNOP, fma, 4)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b90f938..5cc45ed 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,8 +19,8 @@
 ;; http://www.gnu.org/licenses/.
 
 (define_expand movmode
-  [(set (match_operand:VALL 0 nonimmediate_operand )
-	(match_operand:VALL 1 general_operand ))]
+  [(set (match_operand:VALL_F16 0 nonimmediate_operand )
+	(match_operand:VALL_F16 1 general_operand ))]
   TARGET_SIMD
   
 if (GET_CODE (operands[0]) == MEM)
@@ -2450,7 +2450,7 @@
 (define_insn aarch64_get_lanemode
   [(set (match_operand:VEL 0 aarch64_simd_nonimmediate_operand =r, w, Utv)
 	(vec_select:VEL
-	  (match_operand:VALL 1 register_operand w, w, w)
+	  (match_operand:VALL_F16 1 register_operand w, w, w)
 	  (parallel [(match_operand:SI 2 immediate_operand i, i, i)])))]
   TARGET_SIMD
   {
@@ -4234,8 +4234,9 @@
 )
 
 (define_insn aarch64_be_ld1mode
-  [(set (match_operand:VALLDI 0	register_operand =w)
-	(unspec:VALLDI [(match_operand:VALLDI 1 aarch64_simd_struct_operand Utv)]
+  [(set (match_operand:VALLDI_F16 0	register_operand =w)
+	(unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1
+			 aarch64_simd_struct_operand Utv)]
 	UNSPEC_LD1))]
   TARGET_SIMD
   ld1\\t{%0Vmtype}, %1
@@ -4243,8 +4244,8 @@
 )
 
 (define_insn aarch64_be_st1mode
-  [(set (match_operand:VALLDI 0 aarch64_simd_struct_operand =Utv)
-	(unspec:VALLDI [(match_operand:VALLDI 1 register_operand w)]
+  [(set (match_operand:VALLDI_F16 0 aarch64_simd_struct_operand =Utv)
+	(unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 register_operand w)]
 	UNSPEC_ST1))]
   TARGET_SIMD
   st1\\t{%1Vmtype}, %0
@@ -4533,16 +4534,16 @@
   DONE;
 })
 
-(define_expand aarch64_ld1VALL:mode
- [(match_operand:VALL 0 register_operand)
+(define_expand aarch64_ld1VALL_F16:mode
+ [(match_operand:VALL_F16 0 

[PATCH 11/16][AArch64] Implement vcvt_{,high_}f16_f32

2015-07-07 Thread Alan Lawrence
This comes from https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01343.html but the 
other/unrelated intrinsics have moved into the next patch.


gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_v2sf):
Reparameterize to...
(aarch64_float_truncate_lo_mode): ...this, for both V2SF and V4HF.
(aarch64_float_truncate_hi_v4sf): Reparameterize to...
(aarch64_float_truncate_hi_Vdbl): ...this, for both V4SF and V8HF.

* config/aarch64/aarch64-simd-builtins.def (float_truncate_hi_): Add
v8hf variant.
(float_truncate_lo_): Use BUILTIN_VDF iterator.

* config/aarch64/arm_neon.h (vcvt_f16_f32, vcvt_high_f16_f32): New.

* config/aarch64/iterators.md (VDF, Vdtype): New.
(VWIDE, Vmwtype): Add cases for V4HF and V2SF.
commit 5007fafedc8469ab645edfe65fbf41f75fc74750
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Tue Dec 2 18:30:05 2014 +

AArch64 4/N v2: float_truncate_lo/hi

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 4dd2bc7..8bcab72 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -363,9 +363,10 @@
 
   VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
+  VAR1 (BINOP, float_truncate_hi_, 0, v8hf)
 
   VAR1 (UNOP, float_extend_lo_, 0, v2df)
-  VAR1 (UNOP, float_truncate_lo_, 0, v2sf)
+  BUILTIN_VDF (UNOP, float_truncate_lo_, 0)
 
   /* Implemented by aarch64_ld1VALL_F16:mode.  */
   BUILTIN_VALL_F16 (LOAD1, ld1, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5cc45ed..2dc54e1 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1726,23 +1726,23 @@
 
 ;; Float narrowing operations.
 
-(define_insn aarch64_float_truncate_lo_v2sf
-  [(set (match_operand:V2SF 0 register_operand =w)
-  (float_truncate:V2SF
-	(match_operand:V2DF 1 register_operand w)))]
+(define_insn aarch64_float_truncate_lo_mode
+  [(set (match_operand:VDF 0 register_operand =w)
+  (float_truncate:VDF
+	(match_operand:VWIDE 1 register_operand w)))]
   TARGET_SIMD
-  fcvtn\\t%0.2s, %1.2d
+  fcvtn\\t%0.Vtype, %1Vmwtype
   [(set_attr type neon_fp_cvt_narrow_d_q)]
 )
 
-(define_insn aarch64_float_truncate_hi_v4sf
-  [(set (match_operand:V4SF 0 register_operand =w)
-(vec_concat:V4SF
-  (match_operand:V2SF 1 register_operand 0)
-  (float_truncate:V2SF
-	(match_operand:V2DF 2 register_operand w]
+(define_insn aarch64_float_truncate_hi_Vdbl
+  [(set (match_operand:VDBL 0 register_operand =w)
+(vec_concat:VDBL
+  (match_operand:VDF 1 register_operand 0)
+  (float_truncate:VDF
+	(match_operand:VWIDE 2 register_operand w]
   TARGET_SIMD
-  fcvtn2\\t%0.4s, %2.2d
+  fcvtn2\\t%0.Vdtype, %2Vmwtype
   [(set_attr type neon_fp_cvt_narrow_d_q)]
 )
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index d61e619..b915754 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5726,12 +5726,8 @@ vaddlvq_u32 (uint32x4_t a)
result;  \
  })
 
-/* vcvt_f16_f32 not supported */
-
 /* vcvt_f32_f16 not supported */
 
-/* vcvt_high_f16_f32 not supported */
-
 /* vcvt_high_f32_f16 not supported */
 
 #define vcvt_n_f32_s32(a, b)\
@@ -13098,6 +13094,18 @@ vcntq_u8 (uint8x16_t __a)
 
 /* vcvt (double - float).  */
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_f16_f32 (float32x4_t __a)
+{
+  return __builtin_aarch64_float_truncate_lo_v4hf (__a);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvt_high_f16_f32 (float16x4_t __a, float32x4_t __b)
+{
+  return __builtin_aarch64_float_truncate_hi_v8hf (__a, __b);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcvt_f32_f64 (float64x2_t __a)
 {
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 96920cf..f6094b1 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -41,6 +41,9 @@
 ;; Iterator for General Purpose Float regs, inc float16_t.
 (define_mode_iterator GPF_F16 [HF SF DF])
 
+;; Double vector modes.
+(define_mode_iterator VDF [V2SF V4HF])
+
 ;; Integer vector modes.
 (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])
 
@@ -452,6 +455,9 @@
 			(SI   "V2SI")  (DI   "V2DI")
 			(DF   "V2DF")])
 
+;; Register suffix for double-length mode.
+(define_mode_attr Vdtype [(V4HF "8h") (V2SF "4s")])
+
 ;; Double modes of vector modes (lower case).
 (define_mode_attr Vdbl [(V8QI "v16qi") (V4HI "v8hi")
 			(V4HF "v8hf")
@@ -485,7 +491,8 @@
 (define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI")
 			 (V2SI "V2DI") (V16QI "V8HI") 
 			 (V8HI "V4SI") (V4SI "V2DI")
-			 (HI "SI") (SI "DI")])
+			 (HI "SI") (SI "DI")
+			 

[gomp4] libgomp: XFAIL libgomp.oacc-c-c++-common/reduction-4.c for acc_device_nvidia (was: implicit firstprivate and other testcase fixes)

2015-07-07 Thread Thomas Schwinge
Hi!

On Wed, 1 Jul 2015 22:19:01 +0800, Chung-Lin Tang clt...@codesourcery.com 
wrote:
 This patch notices the index variable of an acc loop (internally an OMP_FOR)
 inside an OpenACC construct, and completes the implicit firstprivate
 behavior as described in the spec. The firstprivate clauses and FIXME in
 libgomp.oacc-c-c++-common/parallel-loop-2.h has also been removed together
 in the patch.

Thanks!

 A typo-bug in testcase libgomp.oacc-c-c++-common/reduction-4.c is also
 corrected, where the reduction variable names are apparently wrong.
 
 Tested without regressions, and applied to gomp-4_0-branch.

I'm seeing:

WARNING: program timed out.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-4.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test

... with:

libgomp: cuStreamSynchronize error: launch timeout

 (also for C++), and applied in r225513:

commit f03018ac39ed0193102fe29139d3c995caa02fd5
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Jul 7 12:45:12 2015 +

libgomp: XFAIL libgomp.oacc-c-c++-common/reduction-4.c for acc_device_nvidia

... after r225250 changes.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/reduction-4.c:
dg-xfail-run-if openacc_nvidia_accel_selected.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@225513 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp| 5 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c | 1 +
 2 files changed, 6 insertions(+)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index dc0f0bf..5f3dfaf 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-07-07  Thomas Schwinge  tho...@codesourcery.com
+
+   * testsuite/libgomp.oacc-c-c++-common/reduction-4.c:
+   dg-xfail-run-if openacc_nvidia_accel_selected.
+
 2015-06-24  James Norris  jnor...@codesourcery.com
 
* testsuite/libgomp.oacc-fortran/if-1.c: Fix syntax.
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c 
libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
index 416d960..c32f1db 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
@@ -1,4 +1,5 @@
 /* { dg-do run { target { ! { hppa*-*-hpux* } } } } */
+/* { dg-xfail-run-if "libgomp: cuStreamSynchronize error: launch timeout" { 
openacc_nvidia_accel_selected } } */
 
 /* complex reductions.  */
 


Grüße,
 Thomas




[AArch64][2/2] Define TARGET_UNSPEC_MAY_TRAP_P for AArch64

2015-07-07 Thread Jiong Wang

A second patch to improve rtl loop iv on AArch64.

We should define this hook to tell GCC that the patterns hidden behind
these GOT unspecs cannot trap, so GCC can make less conservative decisions
when handling them. For example, in the RTL loop-iv pass, when deciding
whether an instruction is an invariant candidate, may_trap_or_fault_p is
invoked, which in turn calls this target hook.

OK for trunk?

2015-07-07  Jiong Wang  jiong.w...@arm.com

gcc/
  * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function.
  (TARGET_UNSPEC_MAY_TRAP_P): Define as aarch64_unspec_may_trap_p.
  
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e180daa..c7c12ee 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11943,6 +11943,24 @@ aarch64_use_pseudo_pic_reg (void)
   return aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC;
 }
 
+/* Implement TARGET_UNSPEC_MAY_TRAP_P.  */
+
+static int
+aarch64_unspec_may_trap_p (const_rtx x, unsigned flags)
+{
+  switch (XINT (x, 1))
+{
+case UNSPEC_GOTSMALLPIC:
+case UNSPEC_GOTSMALLPIC28K:
+case UNSPEC_GOTTINYPIC:
+  return 0;
+default:
+  break;
+}
+
+  return default_unspec_may_trap_p (x, flags);
+}
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
@@ -12221,6 +12239,9 @@ aarch64_use_pseudo_pic_reg (void)
 #undef TARGET_SCHED_FUSION_PRIORITY
 #define TARGET_SCHED_FUSION_PRIORITY aarch64_sched_fusion_priority
 
+#undef TARGET_UNSPEC_MAY_TRAP_P
+#define TARGET_UNSPEC_MAY_TRAP_P aarch64_unspec_may_trap_p
+
 #undef TARGET_USE_PSEUDO_PIC_REG
 #define TARGET_USE_PSEUDO_PIC_REG aarch64_use_pseudo_pic_reg
 


Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)

2015-07-07 Thread Daniel Cederman



On 2015-07-07 12:35, Eric Botcazou wrote:

2015-06-26  Daniel Cederman  ceder...@gaisler.com

* config/sparc/sparc.md: Window save takes a single integer


This will probably break in 64-bit mode, the operand can be a DImode register.



You are right, I forgot about that. Is there a mode one can use that 
changes depending on the target architecture (32-bit on 32-bit 
architectures and 64-bit on 64-bit architectures)? Or does one have to 
add a 32-bit and a 64-bit variant of window_save?
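
One existing mechanism worth checking: sparc.md already defines a
pointer-width mode iterator, roughly along these lines (quoted from memory,
so verify against the current sources):

```lisp
;; Pointer-width mode iterator: SImode on 32-bit, DImode on 64-bit targets.
(define_mode_iterator P [(SI "TARGET_ARCH32") (DI "TARGET_ARCH64")])

;; A pattern written with :P then accepts the natural register width:
;;   (match_operand:P 0 "register_operand" "r")
```

Using such an iterator avoids duplicating the pattern for each word size.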


--
Daniel Cederman


Re: [PATCH] Update instruction cost for LEON

2015-07-07 Thread Daniel Cederman

On 2015-07-07 12:37, Eric Botcazou wrote:

2015-07-03  Daniel Cederman  ceder...@gaisler.com

* config/sparc/sparc.c (struct processor_costs): Set div cost
for leon to match UT699 and AT697F. Set mul cost for leon3 to
match standard leon3.


So UT699 is not a standard LEON3?



LEON3 exists in multiple revisions and is configurable so I agree that 
using the word standard in this context is a bit ambiguous.


I think we should delay applying this patch. First we need to look into 
how to properly provide the information on FPU selection and multiplier 
size to GCC. Otherwise we risk having to change the values again in a 
short while.


--
Daniel Cederman



[PATCH 3/16][ARM] Add float16x4_t intrinsics

2015-07-07 Thread Alan Lawrence

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
commit 54a89a084fbd00e4de036f549ca893b74b8f58fb
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Mon Dec 8 18:40:03 2014 +

ARM: float16x4_t intrinsics (v2 - fix v[sg]et_lane_f16 at -O0, no vdup_n/vmov_n)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c923e29..b4100c8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -41,6 +41,7 @@ typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
 typedef __builtin_neon_di int64x1_t;
+typedef __builtin_neon_hf float16_t;
 typedef __simd64_float16_t float16x4_t;
 typedef __simd64_float32_t float32x2_t;
 typedef __simd64_poly8_t poly8x8_t;
@@ -5201,6 +5202,19 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev2si (__a, __b);
 }
 
+/* Functions cannot accept or return __FP16 types.  Even if the function
+   were marked always-inline so there were no call sites, the declaration
+   would nonetheless raise an error.  Hence, we must use a macro instead.  */
+
+#define vget_lane_f16(__v, __idx)		\
+  __extension__	\
+({		\
+  float16x4_t __vec = (__v);		\
+  __builtin_arm_lane_check (4, __idx);	\
+  float16_t __res = __vec[__idx];		\
+  __res;	\
+})
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -5333,6 +5347,16 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#define vset_lane_f16(__e, __v, __idx)		\
+  __extension__	\
+({		\
+  float16_t __elem = (__e);			\
+  float16x4_t __vec = (__v);		\
+  __builtin_arm_lane_check (4, __idx);	\
+  __vec[__idx] = __elem;			\
+  __vec;	\
+})
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c)
 {
@@ -5479,6 +5503,12 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -8796,6 +8826,12 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c)
+{
+  return vset_lane_f16 (*__a, __b, __c);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c)
 {
@@ -8944,6 +8980,13 @@ vld1_dup_s32 (const int32_t * __a)
   return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x4_t) { __f, __f, __f, __f };
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_dup_f32 (const float32_t * __a)
 {
@@ -11828,6 +11871,12 @@ vreinterpret_p8_p16 (poly16x4_t __a)
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vreinterpret_p8_f16 (float16x4_t __a)
+{
+  return (poly8x8_t) __a;
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_f32 (float32x2_t __a)
 {
   return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
@@ -11896,6 +11945,12 @@ vreinterpret_p16_p8 (poly8x8_t __a)
 }
 
 __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_f16 (float16x4_t __a)
+{
+  return (poly16x4_t) __a;
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
 vreinterpret_p16_f32 (float32x2_t __a)
 {
   return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
@@ -11957,6 +12012,80 @@ vreinterpret_p16_u32 (uint32x2_t __a)
   return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p8 (poly8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_p16 (poly16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f32 (float32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))

[PATCH 2/16][ARM] PR/63870 Add __builtin_arm_lane_check.

2015-07-07 Thread Alan Lawrence

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01334.html
commit 1bb1b208a2c8c8b1ee1186c6128a498583fd64fe
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Mon Dec 8 18:36:30 2014 +

Add __builtin_arm_lane_check

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 7f5bf87..89b1b0c 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -534,12 +534,16 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_NEON_BASE,
+  ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
+
 #include "arm_neon_builtins.def"
 
   ARM_BUILTIN_MAX
 };
 
-#define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
+#define ARM_BUILTIN_NEON_PATTERN_START \
+(ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
 
 #undef CF
 #undef VAR1
@@ -898,7 +902,7 @@ arm_init_simd_builtin_scalar_types (void)
 static void
 arm_init_neon_builtins (void)
 {
-  unsigned int i, fcode = ARM_BUILTIN_NEON_BASE;
+  unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
 
   arm_init_simd_builtin_types ();
 
@@ -908,6 +912,15 @@ arm_init_neon_builtins (void)
  system.  */
   arm_init_simd_builtin_scalar_types ();
 
+  tree lane_check_fpr = build_function_type_list (void_type_node,
+		  intSI_type_node,
+		  intSI_type_node,
+		  NULL);
+  arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] =
+  add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
+			ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD,
+			NULL, NULL_TREE);
+
  for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
 {
   bool print_type_signature_p = false;
@@ -2171,14 +2184,28 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
   return target;
 }
 
-/* Expand a Neon builtin. These are special because they don't have symbolic
+/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
+   Most of these are special because they don't have symbolic
constants defined per-instruction or per instruction-variant. Instead, the
required info is looked up in the table neon_builtin_data.  */
 static rtx
 arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 {
+  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
+{
+  tree nlanes = CALL_EXPR_ARG (exp, 0);
+  gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
+  rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
+  if (CONST_INT_P (lane_idx))
+	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
+  else
+	error ("%Klane index must be a constant immediate", exp);
+  /* Don't generate any RTL.  */
+  return const0_rtx;
+}
+
   neon_builtin_datum *d =
-		&neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE];
+		&neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
  enum insn_code icode = d->code;
  builtin_arg args[SIMD_MAX_BUILTIN_ARGS];
  int num_args = insn_data[d->code].n_operands;


[PATCH 4/16][ARM] Add float16x8_t type

2015-07-07 Thread Alan Lawrence

Unchanged since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01336.html
commit b9ccac6243415b304024443b74bdc97b3a5954f2
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Mon Dec 8 18:40:24 2014 +

Add float16x8_t + V8HFmode support (regardless of -mfp16-format)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 89b1b0c..17e39d8 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -192,6 +192,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define di_UPDImode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -827,6 +828,7 @@ arm_init_simd_builtin_types (void)
   /* Continue with standard types.  */
   arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x2_t].eltype = float_type_node;
+  arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   for (i = 0; i  nelts; i++)
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index bcbd20b..b178ae6 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -44,5 +44,7 @@
 
   ENTRY (Float16x4_t, V4HF, none, 64, float16, 18)
   ENTRY (Float32x2_t, V2SF, none, 64, float32, 18)
+
+  ENTRY (Float16x8_t, V8HF, none, 128, float16, 19)
   ENTRY (Float32x4_t, V4SF, none, 128, float32, 19)
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6e074ea..0faa46c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26251,7 +26251,8 @@ arm_vector_mode_supported_p (machine_mode mode)
 {
   /* Neon also supports V2SImode, etc. listed in the clause below.  */
   if (TARGET_NEON  (mode == V2SFmode || mode == V4SImode || mode == V8HImode
-  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == V2DImode))
+  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
+  || mode == V2DImode || mode == V8HFmode))
 return true;
 
   if ((TARGET_NEON || TARGET_IWMMXT)
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 373dc85..c0a83b2 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -999,7 +999,7 @@ extern int arm_arch_crc;
 /* Modes valid for Neon Q registers.  */
 #define VALID_NEON_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode)
+   || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
 
 /* Structure modes valid for Neon registers.  */
 #define VALID_NEON_STRUCT_MODE(MODE) \
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b4100c8..a958f63 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -58,6 +58,7 @@ typedef __simd128_int8_t int8x16_t;
 typedef __simd128_int16_t int16x8_t;
 typedef __simd128_int32_t int32x4_t;
 typedef __simd128_int64_t int64x2_t;
+typedef __simd128_float16_t float16x8_t;
 typedef __simd128_float32_t float32x4_t;
 typedef __simd128_poly8_t poly8x16_t;
 typedef __simd128_poly16_t poly16x8_t;


[PATCH 16/16][ARM/AArch64 Testsuite] Add test of vcvt{,_high}_{f16_f32,f32_f16}

2015-07-07 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01349.html . 
Changes are to:


use #if defined(__aarch64__) rather than __ARM_64BIT_STATE__;
add an initial call to clean_results;
use a different mechanism for adding -mfpu=neon-fp16 on ARM. Specifically, we 
try to add that flag for all tests, as AFAICT it is valid anywhere -mfpu=neon 
is valid, and we bail out of the vcvt_f16 test (the only test that actually 
requires fp16 hardware) if that was unsuccessful, e.g. if -mfpu=neon was 
forced on the command line. This is necessary because the rightmost -mfpu 
option overrides any previous one.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
set additional flags for neon-fp16 support.
* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
commit e6cc7467ddf5702d3a122b8ac4163621d0164b37
Author: Alan Lawrence alan.lawre...@arm.com
Date:   Wed Jan 28 13:02:22 2015 +

v2 Test vcvt{,_high on aarch64}_f{32_f16,16_f32}, with neon-fp16 for ARM targets.

v2a: #if defined(__aarch64__); + clean_results(); fp16 opts for ARM; fp16_hw_ok

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
index ceada83..5f5e1fe 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
@@ -52,8 +52,10 @@ if {[istarget arm*-*-*]} then {
 torture-init
 set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
 
-# Make sure Neon flags are provided, if necessary.
-set additional_flags [add_options_for_arm_neon ""]
+# Make sure Neon flags are provided, if necessary. We try to add FP16 flags
+# for all tests; tests requiring FP16 will abort if a non-FP16 option
+# was forced.
+set additional_flags [add_options_for_arm_neon_fp16 ""]
 
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] \
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
new file mode 100644
index 000..7a1c256
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
@@ -0,0 +1,98 @@
+/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include <math.h>
+
+/* Expected results for vcvt.  */
+VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x41800000, 0x41700000,
+	0x41600000, 0x41500000 };
+VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
+
+/* Expected results for vcvt_high_f32_f16.  */
+VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc1400000, 0xc1300000,
+		 0xc1200000, 0xc1100000 };
+/* Expected results for vcvt_high_f16_f32.  */
+VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000,
+		 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+void
+exec_vcvt (void)
+{
+  clean_results();
+
+#define TEST_MSG "vcvt_f32_f16"
+  {
+VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 };
+
+DECL_VARIABLE (vector_src, float, 16, 4);
+
+VLOAD (vector_src, buffer_src, , float, f, 16, 4);
+DECL_VARIABLE (vector_res, float, 32, 4) =
+	vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+	   VECT_VAR (vector_res, float, 32, 4));
+
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
+  }
+#undef TEST_MSG
+
+  clean_results ();
+
+#define TEST_MSG "vcvt_f16_f32"
+  {
+VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 };
+DECL_VARIABLE (vector_src, float, 32, 4);
+
+VLOAD (vector_src, buffer_src, q, float, f, 32, 4);
+DECL_VARIABLE (vector_res, float, 16, 4) =
+  vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4));
+vst1_f16 (VECT_VAR (result, float, 16, 4),
+	  VECT_VAR (vector_res, float, 16 ,4));
+
+CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  }
+#undef TEST_MSG
+
+#if defined(__aarch64__)
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f32_f16"
+  {
+DECL_VARIABLE (vector_src, float, 16, 8);
+VLOAD (vector_src, buffer, q, float, f, 16, 8);
+DECL_VARIABLE (vector_res, float, 32, 4);
+VECT_VAR (vector_res, float, 32, 4) =
+  vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+	   VECT_VAR (vector_res, float, 32, 4));
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, "");
+  }
+#undef TEST_MSG
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f16_f32"
+  {
+DECL_VARIABLE (vector_low, float, 16, 4);
+VDUP (vector_low, , float, f, 16, 4, 2.0);
+
+DECL_VARIABLE (vector_src, float, 32, 4);
+VLOAD (vector_src, buffer, q, float, f, 32, 4);
+
+DECL_VARIABLE (vector_res, float, 16, 8) =
+  vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 

[AArch64][1/2] Mark GOT related MEM rtx as const to help RTL loop IV

2015-07-07 Thread Jiong Wang

Given the testcase below, the instruction which loads the function address
from the GOT is not hoisted out of the loop, although it should be, as the
value is fixed at runtime.

The problem is that we haven't marked those GOT-related MEMs as READONLY,
so the RTL loop2_iv pass makes the conservative decision in
check_maybe_invariant not to hoist them.
int bar (int) ;

int
foo (int a, int bound)
{
  int i = 0;
  int sum = 0;
  
  for (i; i < bound; i++)
sum = bar (sum);

  return sum;
}

This patch marks the MEM in PIC-related patterns as READONLY and NOTRAP;
more cleanup may be needed for several other patterns.


2015-07-06  Jiong Wang  jiong.w...@arm.com

gcc/
  * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Mark mem as
  READONLY and NOTRAP for PIC symbol.

gcc/testsuite/
  * gcc.target/aarch64/got_mem_hoist.c: New test.
  
-- 
Regards,
Jiong
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4522fc2..4bbc049 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -915,6 +915,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
   {
 	machine_mode mode = GET_MODE (dest);
 	rtx gp_rtx = pic_offset_table_rtx;
+	rtx insn;
+	rtx mem;
 
 	/* NOTE: pic_offset_table_rtx can be NULL_RTX, because we can reach
 	   here before rtl expand.  Tree IVOPT will generate rtl pattern to
@@ -958,16 +960,27 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	if (mode == ptr_mode)
 	  {
 	if (mode == DImode)
-	  emit_insn (gen_ldr_got_small_28k_di (dest, gp_rtx, imm));
+	  insn = gen_ldr_got_small_28k_di (dest, gp_rtx, imm);
 	else
-	  emit_insn (gen_ldr_got_small_28k_si (dest, gp_rtx, imm));
+	  insn = gen_ldr_got_small_28k_si (dest, gp_rtx, imm);
+
+	mem = XVECEXP (SET_SRC (insn), 0, 0);
 	  }
 	else
 	  {
 	gcc_assert (mode == Pmode);
-	emit_insn (gen_ldr_got_small_28k_sidi (dest, gp_rtx, imm));
+
+	insn = gen_ldr_got_small_28k_sidi (dest, gp_rtx, imm);
+	mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0);
 	  }
 
+	/* The operand is expected to be a MEM.  Whenever the related insn
+	   pattern is changed, the code above which calculates MEM should be
+	   updated.  */
+	gcc_assert (GET_CODE (mem) == MEM);
+	MEM_READONLY_P (mem) = 1;
+	MEM_NOTRAP_P (mem) = 1;
+	emit_insn (insn);
 	return;
   }
 
@@ -980,6 +993,9 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
	   DImode if dest is dereferenced to access the memory.
 	   This is why we have to handle three different ldr_got_small
 	   patterns here (two patterns for ILP32).  */
+
+	rtx insn;
+	rtx mem;
 	rtx tmp_reg = dest;
 	machine_mode mode = GET_MODE (dest);
 
@@ -990,16 +1006,24 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	if (mode == ptr_mode)
 	  {
 	if (mode == DImode)
-	  emit_insn (gen_ldr_got_small_di (dest, tmp_reg, imm));
+	  insn = gen_ldr_got_small_di (dest, tmp_reg, imm);
 	else
-	  emit_insn (gen_ldr_got_small_si (dest, tmp_reg, imm));
+	  insn = gen_ldr_got_small_si (dest, tmp_reg, imm);
+
+	mem = XVECEXP (SET_SRC (insn), 0, 0);
 	  }
 	else
 	  {
 	gcc_assert (mode == Pmode);
-	emit_insn (gen_ldr_got_small_sidi (dest, tmp_reg, imm));
+
+	insn = gen_ldr_got_small_sidi (dest, tmp_reg, imm);
+	mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0);
 	  }
 
+	gcc_assert (GET_CODE (mem) == MEM);
+	MEM_READONLY_P (mem) = 1;
+	MEM_NOTRAP_P (mem) = 1;
+	emit_insn (insn);
 	return;
   }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c b/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c
new file mode 100644
index 000..6d29718
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fpic -fdump-rtl-loop2_invariant } */
+
+int bar (int);
+int cal (void *);
+
+int
+foo (int a, int bound)
+{
+  int i = 0;
+  int sum = 0;
+
+  for (i; i < bound; i++)
+sum = cal (bar);
+
+  return sum;
+}
+
+/* The insn which loads function address from GOT table should be moved out
+   of the loop.  */
+/* { dg-final { scan-rtl-dump Decided loop2_invariant } } */


Re: [gomp] Move openacc vector worker single handling to RTL

2015-07-07 Thread Jakub Jelinek
On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote:
 On 07/04/15 16:41, Nathan Sidwell wrote:
 On 07/03/15 19:11, Jakub Jelinek wrote:
 
 If the builtins are not meant to be used by users directly (I assume they
 aren't) nor have a 1-1 correspondence to a library routine, it is much
 better to emit them as internal calls (see internal-fn.{c,def}) instead of
 BUILT_IN_NORMAL functions.
 
 
 This patch uses internal builtins, I had to make one additional change to
 tree-ssa-tail-merge.c's same_succ_def::equal hash compare function.  The new
 internal fn I introduced should compare EQ but not otherwise compare EQUAL,
 and that was blowing up the hash function, which relied on EQUAL only.  I
 don't know why I didn't hit this problem in the previous patch with the
 regular builtin.

How does this interact with
#pragma acc routine {gang,worker,vector,seq} ?
Or is that something to be added later on?

Jakub


Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-07 Thread Alan Lawrence

Ramana Radhakrishnan wrote:


This is OK, the ada testing can go in parallel and we should take this in to 
not delay rc1 any further.


I can confirm, no regressions in check-ada (gcc/testsuite/gnats and 
gcc/testsuite/acats) following an ada bootstrap on cortex-a15/neon/hard-float.


That's the existing tests - nothing specifically testing conformance to the 
AAPCS updates (wrt. arrays), of course.


Cheers, Alan



Re: Clean-ups in match.pd

2015-07-07 Thread Richard Biener
On Tue, Jul 7, 2015 at 12:48 PM, Eric Botcazou ebotca...@adacore.com wrote:
 Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like
 *-match.c takes about 10 minutes to compile in stage2 these days).

 Yeah, it has already taken back all the speedup brought by the rewrite of the
 RTL gen* stuff by Richard S. :-(

And it's going to get worse (read: larger).  Looking at the time-report
data, I don't think splitting into multiple functions will help though.
Splitting into multiple files would at least allow parallelizing the
build.

I'm gathering a profile to see where all the time in the checking stuff goes.

As said, one code generation arrangement that is on my TODO list will
remove some code duplication, but I'm not sure it will make a big
enough difference.

Richard.

 --
 Eric Botcazou


Re: [RFC] two-phase marking in gt_cleare_cache

2015-07-07 Thread Tom de Vries

On 07/07/15 10:42, Richard Biener wrote:

On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com wrote:

On 06/07/15 15:29, Richard Biener wrote:


On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener
richard.guent...@gmail.com wrote:


On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com
wrote:


Hi,

Using attached untested patch, I managed to minimize a test-case failure
for
PR 66714.

The patch introduces two-phase marking in gt_cleare_cache:
- first phase, it loops over all the hash table entries and removes
those which are dead
- second phase, it runs over all the live hash table entries and marks
live items that are reachable from those live entries

By doing so, we make the behaviour of gt_cleare_cache independent of the
order in which the entries are visited, turning:
- hard-to-trigger bugs which trigger for one visiting order but not for
another, into
- more easily triggered bugs which trigger for any visiting order.

Any comments?



I think it is only half-way correct in your proposed change.  You only
fix the issue for hashes of the same kind.  To truly fix the issue you'd
have to change generated code for gt_clear_caches () and provide
a clearing-only implementation (or pass a operation mode bool to
the core worker in hash-table.h).





[ Btw, we have been discussing a similar issue before:
https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ]

True, the problem exists at the scope of all variables marked with 'cache',
and this patch addresses the problem only within a single variable.


Hmm, and don't we rather want to first mark and _then_ clear?



I. In favor of first clear and then mark:

It allows for:
- a lazy one phase implementation for !ENABLE_CHECKING where
   you do a single clear-or-mark phase (so the clear is lazy).
- an eager two phase implementation for ENABLE_CHECKING (where the
   clear is eager)
The approach of first a marking phase and then a clearing phase means you
always have to do these two phases (you can't do the marking lazily).


True.


First mark and then clear means the marking should be done iteratively. Each
time you mark something live, another entry in another hash table could
become live. Marking iteratively could become quite costly.


I don't see this - marking is done recursively so if one entry makes another
live and that makes another live the usual GC marking recursion will deal
with this?



That is not my understanding.  Marking an item live doesn't mean that 
the associated cache entries become live. For that, we have to iterate 
again over all hash tables and all entries to find those entries.
And by marking those, we may find new items which are live. And the 
process starts over again, until fixed point.


[ If we maintain a per-item list of cache entries the item is the key 
for, then we can do this recursively, rather than iteratively. ]



II. In favor of first mark and then clear:

The users of garbage collection will need to be less precise.


Because
if entry B in the hash is live and would keep A live then A _is_ kept in
the
end but you'll remove it from the hash, possibly no longer using a still
live copy.



I'm not sure I understand the scenario you're concerned about, but ... say
we have
- entry B: item B -> item A
- entry A: item A -> item Z

If you do clear first and mark second, and you start out with item B live
and item A dead:
- during the clearing phase you clear entry A and keep entry B, and
- during the marking phase you mark item A live.

So we no longer have entry A, but item A is kept and entry B is kept.


Yes.  This makes the cache weaker in that after this GC operation
a lookup of A no longer succeeds but it still is there.

The whole point of your patch was to make the behavior more predictable
and in some way it succeeds (within a cache).  As it is supposed to
put more stress on the cache logic (it's ENABLE_CHECKING only)
it makes sense to clear optimistically (after all it's a cache and not
guaranteed to find a still live entry).  It would be still nice to cover
all caches together because as I remember we've mostly seen issues
of caches interacting.



Attached patch (completed no-bootstrap c-only build) implements that.

Thanks,
- Tom

Add clear-phase/mark-phase cache clearing

2015-07-07  Tom de Vries  t...@codesourcery.com

	* gengtype.c (finish_cache_funcs): Add phase param to
	gt_clear_caches_file and gt_clear_caches.
	(write_roots): Add phase param to gt_clear_caches_file, and use.
	* ggc-common.c (ggc_mark_roots): Add arg to call to gt_clear_caches.
	Call gt_clear_caches twice for ENABLE_CHECKING.
	* ggc.h (gt_clear_caches): Add phase param to declaration.
	* hash-table.h (gt_cleare_cache): Add and handle phase param.
---
 gcc/gengtype.c   | 11 ++-
 gcc/ggc-common.c | 11 ++-
 gcc/ggc.h|  2 +-
 gcc/hash-table.h | 55 ---
 4 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/gcc/gengtype.c 

Re: [PATCH] Update instruction cost for LEON

2015-07-07 Thread Eric Botcazou
 2015-07-03  Daniel Cederman  ceder...@gaisler.com
 
   * config/sparc/sparc.c (struct processor_costs): Set div cost
   for leon to match UT699 and AT697F. Set mul cost for leon3 to
   match standard leon3.

So UT699 is not a standard LEON3?

-- 
Eric Botcazou


Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)

2015-07-07 Thread Marc Glisse

On Tue, 7 Jul 2015, Richard Biener wrote:


On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote:

On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote:


Please find attached the patch PR25529.patch that converts the pattern:-
(unsigned t * 2)/2 into unsigned t & 0x7FFFFFFF



+/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF.  */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+ (simplify
+  (div (mult @0 INTEGER_CST@1) INTEGER_CST@1)

You don't need to repeat INTEGER_CST, the second time @1 is enough.

+  (with { tree n2 = build_int_cst (TREE_TYPE (@0),
+  wi::exact_log2 (@1)); }
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+   (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); }
+  { n2; }) { n2; }))

What happens if you write t*3/3?


Huh, and you posted this patch twice?  See my reply to the other copy
for the correctness issues and better handling of exact_div


They are not the same, one is for left shifts and the other one for right 
shifts. And that makes a big difference: in t*c/c, the division is always 
exact, so all divisions are equivalent. This is not the case for t/c*c.


--
Marc Glisse


Re: [PING 3] Re: [PATCH] warn for unsafe calls to __builtin_return_address

2015-07-07 Thread Pedro Alves
On 07/07/2015 04:41 AM, Martin Sebor wrote:
 This is a small change to diagnose unsafe calls to
 __builtin_{frame,return}_address (with an argument > 2) that
 tend to return bogus values or lead to crashes at runtime.
 

I hadn't realized you went through and implemented the
suggestion.  Thanks for doing this.  I hope it gets approved.

 A review would be appreciated.

(It may be the relevant maintainers haven't noticed there's a
code patch here because this is nested within
the [PATCH] clarify doc for __builtin_return_address thread.)

Thanks,
Pedro Alves



Re: Clean-ups in match.pd

2015-07-07 Thread Eric Botcazou
 Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like
 *-match.c takes about 10 minutes to compile in stage2 these days).

Yeah, it has already taken back all the speedup brought by the rewrite of the 
RTL gen* stuff by Richard S. :-(

-- 
Eric Botcazou


[PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS

2015-07-07 Thread Andrew Bennett
Hi,

When building the call-[1,5,6].c tests for micromips the jrc rather than the
jr instruction is used to call the tail* functions.

I have updated the test output to allow the jrc instruction to be matched.

I have tested this on the mips-mti-elf target using 
mips32r2/{-mno-micromips/-mmicromips}
test options and there are no new regressions.

The patch and ChangeLog are below.

Ok to commit?


Many thanks,


Andrew



testsuite/
* gcc.target/mips/call-1.c: Allow testcase to match the jrc instruction.
* gcc.target/mips/call-5.c: Ditto.
* gcc.target/mips/call-6.c: Ditto.



diff --git a/gcc/testsuite/gcc.target/mips/call-1.c 
b/gcc/testsuite/gcc.target/mips/call-1.c
index 2f4a37e..a00126e 100644
--- a/gcc/testsuite/gcc.target/mips/call-1.c
+++ b/gcc/testsuite/gcc.target/mips/call-1.c
@@ -3,10 +3,10 @@
 /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,normal\n1:\tjalrs?\t 
} } */
 /* { dg-final { scan-assembler 
\\.reloc\t1f,R_MIPS_JALR,normal2\n1:\tjalrs?\t } } */
 /* { dg-final { scan-assembler 
\\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalrs?\t } } */
-/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjr\t } } */
-/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjr\t } } 
*/
-/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } 
*/
-/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } 
*/
+/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail\n1:)?\tjrc?\t 
} } */
+/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail2\n1:)?\tjrc?\t 
} } */
+/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t 
} } */
+/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t 
} } */
 
 __attribute__ ((noinline)) static void staticfunc () { asm (); }
 int normal ();
diff --git a/gcc/testsuite/gcc.target/mips/call-5.c 
b/gcc/testsuite/gcc.target/mips/call-5.c
index bfb95eb..d8d84d3 100644
--- a/gcc/testsuite/gcc.target/mips/call-5.c
+++ b/gcc/testsuite/gcc.target/mips/call-5.c
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler 
\\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalr\t } } */
 /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjalr\t } } 
*/
 /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjalr\t } 
} */
-/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } 
*/
-/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } 
*/
+/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t 
} } */
+/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t 
} } */
 
 __attribute__ ((noinline)) static void staticfunc () { asm (); }
 int normal ();
diff --git a/gcc/testsuite/gcc.target/mips/call-6.c 
b/gcc/testsuite/gcc.target/mips/call-6.c
index 117795d..e6c90d7 100644
--- a/gcc/testsuite/gcc.target/mips/call-6.c
+++ b/gcc/testsuite/gcc.target/mips/call-6.c
@@ -6,8 +6,8 @@
 /* { dg-final { scan-assembler 
\\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalr\t } } */
 /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjalr\t } } 
*/
 /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjalr\t } 
} */
-/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } 
*/
-/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } 
*/
+/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t 
} } */
+/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t 
} } */
 
 __attribute__ ((noinline)) static void staticfunc () { asm (); }
 int normal ();



Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)

2015-07-07 Thread Richard Biener
On Tue, Jul 7, 2015 at 11:24 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 7 Jul 2015, Richard Biener wrote:

 On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote:

 On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote:

 Please find attached the patch PR25529.patch that converts the
 pattern:-
 (unsigned t * 2)/2 into unsigned t & 0x7FFFFFFF



 +/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF.  */
 +(for div (trunc_div ceil_div floor_div round_div exact_div)
 + (simplify
 +  (div (mult @0 INTEGER_CST@1) INTEGER_CST@1)

 You don't need to repeat INTEGER_CST, the second time @1 is enough.

 +  (with { tree n2 = build_int_cst (TREE_TYPE (@0),
 +  wi::exact_log2 (@1)); }
 +  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
 +   (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); }
 +  { n2; }) { n2; }))

 What happens if you write t*3/3?


 Huh, and you posted this patch twice?  See my reply to the other copy
 for the correctness issues and better handling of exact_div


 They are not the same, one is for left shifts and the other one for right
 shifts. And that makes a big difference: in t*c/c, the division is always
 exact, so all divisions are equivalent. This is not the case for t/c*c.

Ah, sorry.  Still the same comment for computing the constant and
placing of the 'with' applies.  For signed types with TYPE_OVERFLOW_UNDEFINED
you can simply cancel the operation (even for non-power-of-two multipliers).
In fold-const.c extract_muldiv contains magic to handle this kind of cases.
Otherwise for signed division (only the sign of the division matters, so you
can probably ignore sign-changing conversions of the multiplication result)
you can simplify it to a sign-extension from bit precision - log2 with
the proposed introduction of a SEXT_EXPR (see other thread about
type promotion).

Richard.

 --
 Marc Glisse


Re: [RFC] two-phase marking in gt_cleare_cache

2015-07-07 Thread Richard Biener
On Tue, Jul 7, 2015 at 11:39 AM, Tom de Vries tom_devr...@mentor.com wrote:
 On 07/07/15 10:42, Richard Biener wrote:

 On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com
 wrote:

 On 06/07/15 15:29, Richard Biener wrote:


 On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener
 richard.guent...@gmail.com wrote:


 On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com
 wrote:


 Hi,

 Using attached untested patch, I managed to minimize a test-case
 failure
 for
 PR 66714.

 The patch introduces two-phase marking in gt_cleare_cache:
 - first phase, it loops over all the hash table entries and removes
 those which are dead
 - second phase, it runs over all the live hash table entries and marks
 live items that are reachable from those live entries

 By doing so, we make the behaviour of gt_cleare_cache independent of
 the
 order in which the entries are visited, turning:
 - hard-to-trigger bugs which trigger for one visiting order but not
 for
 another, into
 - more easily triggered bugs which trigger for any visiting order.

 Any comments?



 I think it is only half-way correct in your proposed change.  You only
 fix the issue for hashes of the same kind.  To truly fix the issue
 you'd
 have to change generated code for gt_clear_caches () and provide
 a clearing-only implementation (or pass an operation mode bool to
 the core worker in hash-table.h).




 [ Btw, we have been discussing a similar issue before:
 https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ]

 True, the problem exists at the scope of all variables marked with
 'cache',
 and this patch addresses the problem only within a single variable.

 Hmm, and don't we rather want to first mark and _then_ clear?



 I. In favor of first clear and then mark:

 It allows for:
 - a lazy one phase implementation for !ENABLE_CHECKING where
you do a single clear-or-mark phase (so the clear is lazy).
 - an eager two phase implementation for ENABLE_CHECKING (where the
clear is eager)
 The approach of first a marking phase and then a clearing phase means you
 always have to do these two phases (you can't do the marking lazily).


 True.

 First mark and then clear means the marking should be done iteratively.
 Each
 time you mark something live, another entry in another hash table could
 become live. Marking iteratively could become quite costly.


 I don't see this - marking is done recursively so if one entry makes
 another
 live and that makes another live the usual GC marking recursion will deal
 with this?


 That is not my understanding.  Marking an item live doesn't mean that the
 associated cache entries become live. For that, we have to iterate again
 over all hash tables and all entries to find those entries.
 And by marking those, we may find new items which are live. And the process
 starts over again, until fixed point.

All used predicates are basically ggc_marked_p () AFAIK.  So when sth
was not marked previosuly GC will recurse to marking it.  GC only considers
the reference from the cache special not references from entries in the cache.

But maybe I am missing something here.

 [ If we maintain a per-item list of cache entries the item is the key for,
 then we can do this recursively, rather than iteratively.

 II. In favor of first mark and then clear:

 The users of garbage collection will need to be less precise.

 Because
 if entry B in the hash is live and would keep A live then A _is_ kept in
 the
 end but you'll remove it from the hash, possibly no longer using a still
 live copy.


 I'm not sure I understand the scenario you're concerned about, but ...
 say
 we have
 - entry B: item B -> item A
 - entry A: item A -> item Z

 If you do clear first and mark second, and you start out with item B live
 and item A dead:
 - during the clearing phase you clear entry A and keep entry B, and
 - during the marking phase you mark item A live.

 So we no longer have entry A, but item A is kept and entry B is kept.


 Yes.  This makes the cache weaker in that after this GC operation
 a lookup of A no longer succeeds but it still is there.

 The whole point of your patch was to make the behavior more predictable
 and in some way it succeeds (within a cache).  As it is supposed to
 put more stress on the cache logic (it's ENABLE_CHECKING only)
 it makes sense to clear optimistically (after all it's a cache and not
 guaranteed to find a still live entry).  It would be still nice to cover
 all caches together because as I remember we've mostly seen issues
 of caches interacting.


 Attached patch (completed no-bootstrap c-only build) implements that.

Looks good to me.

Thanks,
Richard.

 Thanks,
 - Tom



Re: [PATCH] Do not use floating point registers when compiling with -msoft-float for SPARC

2015-07-07 Thread Eric Botcazou
 __builtin_apply* and __builtin_return access the floating point registers
 on SPARC even when compiling with -msoft-float.

Ouch.  The fix is OK for all active branches but...

 2015-06-26  Daniel Cederman  ceder...@gaisler.com
 
   * config/sparc/sparc.c (sparc_function_value_regno_p): Floating
 point registers cannot be used when compiling for a target
 without FPU.
   * config/sparc/sparc.md: A function cannot return a value in a
 floating point register when compiled without floating point
 support.

ChangeLog must just describe the what, nothing more.  If the rationale is not 
obvious, then a comment must be added _in the code_ itself.

* config/sparc/sparc.c (sparc_function_value_regno_p): Do not return
true on %f0 for a target without FPU.
* config/sparc/sparc.md (untyped_call): Do not save %f0 for a target
without FPU.
(untyped_return): Do not load %f0 for a target without FPU.

 +
 +  if ( TARGET_FPU )
 +{
 +  rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
 +  emit_move_insn (valreg2,
 +  adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8));
 +  emit_use (valreg2);
 +}

Superfluous spaces around TARGET_FPU here.

-- 
Eric Botcazou


Re: [patch 4/9] Flatten sel-sched-dump.h and sel-sched-ir.h

2015-07-07 Thread Alexander Monakov
On Tue, 7 Jul 2015, Andrew MacLeod wrote:

 This patch flattens both sel-sched-dump.h and sel-sched-ir.h. Both these files
 end up including cfgloop.h, so in preparation for flattening cfgloop.h,
 flatten these.  Note they actually have only a small effect on what includes
 them.

This patch removes #include insn-attr.h from sel-sched-ir.h without adding
it to .c files.  I'm curious how it works, is that file now arranged to be
included elsewhere?  (sorry if I missed it, but the patch series does not seem
to mention insn-attr.h specifically)

Thanks.
Alexander


Re: [gomp4] Allow parameter declarations with deviceptr

2015-07-07 Thread Thomas Schwinge
Hi!

On Wed, 1 Jul 2015 16:33:24 -0700, Cesar Philippidis ce...@codesourcery.com 
wrote:
 On 07/01/2015 02:25 PM, James Norris wrote:
 
  This patch allows parameter declarations to be used as
  arguments to deviceptr for C and C++.

Thanks!  I suppose this does fix http://gcc.gnu.org/PR64748?

 Does this fix an existing failure? If not, can you please add a new test
 case?

An earlier submission,
http://news.gmane.org/find-root.php?message_id=%3C54E23658.6060105%40codesourcery.com%3E,
did include some testsuite changes -- but I had not seen any update of
this patch after Jakub's and my review comments.


Grüße,
 Thomas




[PATCH][13/n] Remove GENERIC stmt combining from SCCVN

2015-07-07 Thread Richard Biener

This moves a few more patterns that show up during bootstrap.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-07-07  Richard Biener  rguent...@suse.de

* fold-const.c (fold_binary_loc): Move
(X & C2) << C1 -> (X << C1) & (C2 << C1) simplification ...
* match.pd: ... here.
Add (X * C1) % C2 -> 0 simplification pattern derived from
extract_muldiv_1.

* gcc.dg/vect/vect-over-widen-3-big-array.c: Adjust.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 225504)
+++ gcc/match.pd(working copy)
@@ -230,7 +230,14 @@ (define_operator_list CBRT BUILT_IN_CBRT
  /* (X % Y) % Y is just X % Y.  */
  (simplify
   (mod (mod@2 @0 @1) @1)
-  @2))
+  @2)
+ /* From extract_muldiv_1: (X * C1) % C2 is zero if C1 is a multiple of C2.  */
+ (simplify
+  (mod (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (ANY_INTEGRAL_TYPE_P (type)
+       && TYPE_OVERFLOW_UNDEFINED (type)
+       && wi::multiple_of_p (@1, @2, TYPE_SIGN (type)))
+   { build_zero_cst (type); })))
 
 /* X % -C is the same as X % C.  */
 (simplify
@@ -992,6 +999,16 @@ (define_operator_list CBRT BUILT_IN_CBRT
   (if (shift_type == TREE_TYPE (@3))
(bit_and @4 { newmaskt; }
 
+/* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
+   (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
+(for shift (lshift rshift)
+ (simplify
+  (shift (convert? (bit_and @0 INTEGER_CST@2)) INTEGER_CST@1)
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
+(bit_and (shift (convert @0) @1) { mask; })
+
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 225504)
+++ gcc/fold-const.c(working copy)
@@ -11194,27 +11140,6 @@ fold_binary_loc (location_t loc,
 prec) == 0)
return TREE_OPERAND (arg0, 0);
 
-  /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
-     (X & C2) >> C1 into (X >> C1) & (C2 >> C1)
-     if the latter can be further optimized.  */
-  if ((code == LSHIFT_EXPR || code == RSHIFT_EXPR)
-      && TREE_CODE (arg0) == BIT_AND_EXPR
-      && TREE_CODE (arg1) == INTEGER_CST
-      && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST)
-   {
- tree mask = fold_build2_loc (loc, code, type,
-  fold_convert_loc (loc, type,
-TREE_OPERAND (arg0, 1)),
-  arg1);
- tree shift = fold_build2_loc (loc, code, type,
-   fold_convert_loc (loc, type,
- TREE_OPERAND (arg0, 0)),
-   arg1);
- tem = fold_binary_loc (loc, BIT_AND_EXPR, type, shift, mask);
- if (tem)
-   return tem;
-   }
-
   return NULL_TREE;
 
 case MIN_EXPR:
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c (revision 
225504)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c (working copy)
@@ -58,6 +58,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times vect_recog_over_widening_pattern: 
detected 1 vect } } */
+/* { dg-final { scan-tree-dump-times vect_recog_over_widening_pattern: 
detected 2 vect } } */
 /* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect } } */
 


Re: [gomp] Move openacc vector worker single handling to RTL

2015-07-07 Thread Nathan Sidwell

On 07/07/15 05:54, Jakub Jelinek wrote:

On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote:



How does this interact with
#pragma acc routine {gang,worker,vector,seq} ?
Or is that something to be added later on?


That is to be added later on.  I suspect such routines will trivially work, as 
they'll be marked up with the loop head/tail functions and levels builtin (the 
latter might need a bit of reworking).  What will need additional work at that 
point is the callers of routines -- they're typically called from a foo-single 
mode, but need to get all threads into the called function.  I'm thinking each 
call site will look like a mini-loop[*] surrounded by a head/tail marker.  (all 
that can be done in the device-side compiler once real call sites are known.)


nathan

[*] of course it won't be a loop.  Perhaps fork/join are less confusing names 
after all.  WDYT?


Re: [gomp] Move openacc vector worker single handling to RTL

2015-07-07 Thread Jakub Jelinek
On Tue, Jul 07, 2015 at 10:12:56AM -0400, Nathan Sidwell wrote:
 On 07/07/15 05:54, Jakub Jelinek wrote:
 On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote:
 
 How does this interact with
 #pragma acc routine {gang,worker,vector,seq} ?
 Or is that something to be added later on?
 
 That is to be added later on.  I suspect such routines will trivially work,
 as they'll be marked up with the loop head/tail functions and levels builtin
 (the latter might need a bit of reworking).  What will need additional work
 at that point is the callers of routines -- they're typically called from a
 foo-single mode, but need to get all threads into the called function.  I'm
 thinking each call site will look like a mini-loop[*] surrounded by a
 head/tail marker.  (all that can be done in the device-side compiler once
 real call sites are known.)

Wouldn't function attributes be better for that case, and just use the internal
functions for the case when the mode is being changed in the middle of
function?

I agree that fork/join might be less confusing.

BTW, where do you plan to lower the internal functions for non-PTX?
Doing it in RTL mach reorg is too late for those, we shouldn't be writing it
for each single target, as for non-PTX (perhaps non-HSA) I bet the behavior
is the same.

Jakub


[gomp4] OpenACC device_type clause (was: OpenACC: Complete changes to disallow the independent clause after device_type)

2015-07-07 Thread Thomas Schwinge
Hi Cesar!

On Tue, 7 Jul 2015 14:58:20 +0200, I wrote:
 On Wed, 1 Jul 2015 07:52:13 -0500, James Norris jnor...@codesourcery.com 
 wrote:
  The independent clause is not available for use
  with device_type clauses associated with loop
  directives. [...]

Independent of this, would you please verify (also for the corresponding
Fortran code) that given these changes:

 --- gcc/testsuite/c-c++-common/goacc/dtype-1.c
 +++ gcc/testsuite/c-c++-common/goacc/dtype-1.c
 @@ -53,7 +53,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop dtype (nVidia) auto
  for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop dtype (nVidia) independent
+#pragma acc loop dtype (nVidia)
 for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidia) seq
  for (i6 = 1; i6 < 10; i6++)
@@ -69,7 +69,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop dtype (nVidia) auto device_type (*) seq
  for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop device_type (nVidia) independent device_type (*) seq
+#pragma acc loop device_type (nVidia) device_type (*) seq
 for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidia) seq
  for (i6 = 1; i6 < 10; i6++)
@@ -85,7 +85,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop device_type (nVidiaGPU) auto device_type (*) seq
  for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop dtype (nVidiaGPU) independent dtype (*) seq
+#pragma acc loop dtype (nVidiaGPU) dtype (*) seq
 for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidiaGPU) seq device_type (*) seq
  for (i6 = 1; i6 < 10; i6++)

... it is indeed correct to adjust the scan-tree-dump-times as follows:

 @@ -147,7 +147,7 @@ test ()
  
  /* { dg-final { scan-tree-dump-times acc loop auto private\\(i4\\) 2 
 omplower } } */
  
 -/* { dg-final { scan-tree-dump-times acc loop private\\(i5\\) 2 omplower 
 } } */
 +/* { dg-final { scan-tree-dump-times acc loop private\\(i5\\) 1 omplower 
 } } */
  
  /* { dg-final { scan-tree-dump-times acc loop seq private\\(i6\\) 3 
 omplower } } */
  
 @@ -155,6 +155,6 @@ test ()
  
  /* { dg-final { scan-tree-dump-times acc loop seq private\\(i4\\) 1 
 omplower } } */
  
 -/* { dg-final { scan-tree-dump-times acc loop seq private\\(i5\\) 1 
 omplower } } */
 +/* { dg-final { scan-tree-dump-times acc loop seq private\\(i5\\) 2 
 omplower } } */
  
  /* { dg-final { cleanup-tree-dump omplower } } */

Is possibly something going wrong if there are two consecutive
device_type clauses, such as in:

#pragma acc loop device_type (nVidia) device_type (*) seq

..., or:

#pragma acc loop dtype (nVidiaGPU) dtype (*) seq

I'm unclear why I had to apply these scan-tree-dump-times changes?

 --- gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
 +++ gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
 @@ -54,7 +54,7 @@ program dtype
  do i3 = 1, 10
 !$acc loop device_type (nVidia) auto
 do i4 = 1, 10
 -  !$acc loop dtype (nVidia) independent
 +  !$acc loop dtype (nVidia)
do i5 = 1, 10
   !$acc loop dtype (nVidia) seq
   do i6 = 1, 10
 @@ -76,7 +76,7 @@ program dtype
  do i3 = 1, 10
 !$acc loop device_type (nVidia) auto dtype (*) seq
 do i4 = 1, 10
 -  !$acc loop dtype (nVidia) independent 
 +  !$acc loop dtype (nVidia) 
!$acc dtype (*) seq
do i5 = 1, 10
   !$acc loop device_type (nVidia) seq
 @@ -99,7 +99,7 @@ program dtype
  do i3 = 1, 10
 !$acc loop dtype (nVidiaGPU) auto device_type (*) seq
 do i4 = 1, 10
 -  !$acc loop dtype (nVidiaGPU) independent 
 +  !$acc loop dtype (nVidiaGPU) 
!$acc dtype (*) seq
do i5 = 1, 10
   !$acc loop dtype (nVidiaGPU) seq device_type (*) seq

No adjustment has been necessary here.


Grüße,
 Thomas




RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-07-07 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Tuesday, July 07, 2015 2:21 PM
To: Ajit Kumar Agarwal
Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal 
ajit.kumar.agar...@xilinx.com wrote:


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, June 30, 2015 4:42 PM
 To: Ajit Kumar Agarwal
 Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
 Vidhumouli Hunsigida; Nagaraju Mekala
 Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
 tree ssa representation

 On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal 
 ajit.kumar.agar...@xilinx.com wrote:
 All:

 The patch below adds a new path-splitting optimization pass on the SSA
 representation. The path-splitting pass moves the join block of an
 if-then-else that is the same as the loop latch into its predecessors,
 merging it with them while preserving the SSA representation.

 The patch is tested for the MicroBlaze and i386 targets. The EEMBC/MiBench
 benchmarks were run on the MicroBlaze target, with a performance gain of
 9.15% on rgbcmy01_lite (EEMBC benchmarks). The DejaGnu tests were run for
 the MicroBlaze target; no regressions are seen on MicroBlaze and the new
 attached testcases pass.

 For i386, bootstrapping goes through fine, and the SPEC CPU2000 benchmarks
 were run with this patch. The following observations were made with the
 SPEC CPU2000 benchmarks:

 Ratio with the path-splitting change vs. without it is 3653.353 vs.
 3652.14 for the INT benchmarks.
 Ratio with the path-splitting change vs. without it is 4353.812 vs.
 4345.351 for the FP benchmarks.

 Based on comments from RFC patch following changes were done.

 1. Added a new pass for path splitting changes.
 2. Placed the new path  Splitting Optimization pass before the copy 
 propagation pass.
 3. The join block same as the Loop latch is wired into its 
 predecessors so that the CFG Cleanup pass will merge the blocks Wired 
 together.
 4. The copy propagation routines added for the path-splitting changes are
 not needed, as suggested by Jeff. They are removed in the patch, as copy
 propagation in the copied join blocks will be done by the existing copy
 propagation pass and the update-ssa pass.
 5. Only the propagation of phi results of the join block with the phi 
 argument is done which will not be done by the existing update_ssa Or copy 
 propagation pass on tree ssa representation.
 6. Added 2 tests.
 a) compilation check  tests.
b) execution tests.
 7. Refactoring of the code for the feasibility check and finding the join 
 block same as loop latch node.

 [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
 representation.

 Added a new pass for path splitting on the tree SSA representation. The
 path-splitting optimization performs a CFG transformation in which the join
 block of an if-then-else that is the same as the loop latch node is moved
 and merged with the predecessor blocks, preserving the SSA representation.

 ChangeLog:
 2015-06-30  Ajit Agarwal  ajit...@xilinx.com

 * gcc/Makefile.in: Add the build of the new file
 tree-ssa-path-split.c
 * gcc/common.opt: Add the new flag ftree-path-split.
 * gcc/opts.c: Add an entry for Path splitting pass
 with optimization flag greater and equal to O2.
 * gcc/passes.def: Enable and add new pass path splitting.
 * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT.
 * gcc/tree-pass.h: Extern Declaration of make_pass_path_split.
 * gcc/tree-ssa-path-split.c: New file for path splitting pass.
 * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase.
 * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.

I'm not 100% sure I understand the transform but what I see from the 
testcases it tail-duplicates from a conditional up to a loop latch block 
(not sure if it includes it and thus ends up creating a loop nest or not).

An observation I have is that the pass should at least share the transform 
stage to some extent with the existing tracer pass (tracer.c) which 
essentially does the same but not restricted to loops in any way.

 The following piece of code from tracer.c can be shared with the existing 
 path splitting pass.

  {
    e = find_edge (bb, bb2);

    copy = duplicate_block (bb2, e, bb);
    flush_pending_stmts (e);

    add_phi_args_after_copy (copy, 1, NULL);
  }

 Sharing the above code of the transform stage of tracer.c with the path 
 splitting pass has the following limitation.

 1. The duplicated loop latch node is wired to its predecessors and the 
 existing phi node in the loop 

[patch 6/9] Flatten gimple-streamer.h

2015-07-07 Thread Andrew MacLeod
This one is amusing... 3 header files, all of them already included in 
all the files which include this.


Bootstraps from scratch on x86_64-unknown-linux-gnu with no new 
regressions. Also compiles all the files in config-list.mk.




	* gimple-streamer.h: Remove all includes.

Index: gimple-streamer.h
===
*** gimple-streamer.h	(revision 225452)
--- gimple-streamer.h	(working copy)
*** along with GCC; see the file COPYING3.
*** 22,30 
  #ifndef GCC_GIMPLE_STREAMER_H
  #define GCC_GIMPLE_STREAMER_H
  
- #include "tm.h"
- #include "hard-reg-set.h"
- #include "function.h"
  #include "lto-streamer.h"
  
  /* In gimple-streamer-in.c  */
--- 22,27 


Re: [RFC] two-phase marking in gt_cleare_cache

2015-07-07 Thread Michael Matz
Hi,

On Mon, 6 Jul 2015, Richard Biener wrote:

  By doing so, we make the behaviour of gt_cleare_cache independent of the
  order in which the entries are visited, turning:
  - hard-to-trigger bugs which trigger for one visiting order but not for
another, into
  - more easily triggered bugs which trigger for any visiting order.
 
  Any comments?
 
  I think it is only half-way correct in your proposed change.  You only
  fix the issue for hashes of the same kind.  To truly fix the issue you'd
  have to change generated code for gt_clear_caches () and provide
  a clearing-only implementation (or pass a operation mode bool to
  the core worker in hash-table.h).
 
 Hmm, and don't we rather want to first mark and _then_ clear?  Because 
 if entry B in the hash is live and would keep A live then A _is_ kept in 
 the end but you'll remove it from the hash, possibly no longer using a 
 still live copy.

I don't think such use has ever worked with the caching hash tables.  Even 
the old (pre-C++) scheme didn't iterate, i.e. if a cache-hash entry A 
became live from outside, but it itself kept an entry B live in the hash 
table (with no other references to B), then this never worked (or only by 
luck), because the slot was always cleared if !ggc_marked_p, so if B was 
visited before A it was removed from the hash table (and in particular B's 
gt_ggc_mx routine was never called, so whatever B needed wasn't even 
marked).

Given this I think the call to gt_ggc_mx is superfluous, because it 
wouldn't work reliably for multi-step dependencies anyway.  Hence a 
situation that works with that call in place but breaks without it is 
actually a bug waiting to be uncovered.


Ciao,
Michael.


RE: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS

2015-07-07 Thread Andrew Bennett
 OK.

Committed as SVN 225516.


Regards,



Andrew


  1   2   >