Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)
On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote:

Please find attached the patch PR25529.patch that converts the pattern
(unsigned t * 2)/2 into unsigned t & 0x7FFF:

+/* Simplify (unsigned t * 2)/2 into unsigned t & 0x7FFF.  */
+(for div (trunc_div ceil_div floor_div round_div exact_div)
+ (simplify
+  (div (mult @0 INTEGER_CST@1) INTEGER_CST@1)

You don't need to repeat INTEGER_CST, the second time @1 is enough.

+   (with { tree n2 = build_int_cst (TREE_TYPE (@0),
+                                    wi::exact_log2 (@1)); }
+    (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+     (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); }
+                                 { n2; }) { n2; }))

What happens if you write t*3/3?

--
Marc Glisse
Tests for libgomp based on OpenMP Examples 4.0.2.
With this letter I propose a patch with tests for libgomp based on OpenMP Examples 4.0.2, both for C and Fortran. The changes are:

- Renamed existing tests based on OpenMP Examples to make the names more clear.
- Added 16 tests for the simd construct and 10 for the depend clause.

- Sincerely yours, Maxim Blumental

2015-07-06  Maxim Blumenthal  bvm...@gmail.com

* libgomp/testsuite/libgomp.c/examples-4/e.56.3.c: renamed to array_sections-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.56.4.c: renamed to array_sections-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.55.1.c: renamed to async_target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.55.2.c: renamed to async_target-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.1.c: renamed to declare_target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.3.c: renamed to declare_target-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.4.c: renamed to declare_target-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.53.5.c: renamed to declare_target-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.1.c: renamed to device-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.2.c: renamed to device-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.57.3.c: renamed to device-3.c
* libgomp/testsuite/libgomp.c/examples-4/simd-1.c: A test for simd construct.
* libgomp/testsuite/libgomp.c/examples-4/simd-2.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-3.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-4.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-5.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-6.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-7.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/simd-8.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/e.50.1.c: renamed to target-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.2.c: renamed to target-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.3.c: renamed to target-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.4.c: renamed to target-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.50.5.c: renamed to target-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.1.c: renamed to target_data-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.2.c: renamed to target_data-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.3.c: renamed to target_data-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.4.c: renamed to target_data-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.6.c: renamed to target_data-6.c
* libgomp/testsuite/libgomp.c/examples-4/e.51.7.c: renamed to target_data-7.c
* libgomp/testsuite/libgomp.c/examples-4/e.52.1.c: renamed to target_update-1.c
* libgomp/testsuite/libgomp.c/examples-4/e.52.2.c: renamed to target_update-2.c
* libgomp/testsuite/libgomp.c/examples-4/task_dep-1.c: A test for task dependencies.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-2.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-3.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-4.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/task_dep-5.c: Same.
* libgomp/testsuite/libgomp.c/examples-4/e.54.2.c: renamed to teams-2.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.3.c: renamed to teams-3.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.4.c: renamed to teams-4.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.5.c: renamed to teams-5.c
* libgomp/testsuite/libgomp.c/examples-4/e.54.6.c: renamed to teams-6.c
* libgomp/testsuite/libgomp.fortran/examples-4/e.56.3.f90: renamed to array_sections-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.56.4.f90: renamed to array_sections-4.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.55.1.f90: renamed to async_target-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.55.2.f90: renamed to async_target-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.1.f90: renamed to declare_target-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.2.f90: renamed to declare_target-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.3.f90: renamed to declare_target-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.4.f90: renamed to declare_target-4.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.53.5.f90: renamed to declare_target-5.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.1.f90: renamed to device-1.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.2.f90: renamed to device-2.f90
* libgomp/testsuite/libgomp.fortran/examples-4/e.57.3.f90: renamed to device-3.f90
* libgomp/testsuite/libgomp.fortran/examples-4/simd-1.f90: A test for simd construct.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90: Same.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-3.f90: Same.
* libgomp/testsuite/libgomp.fortran/examples-4/simd-4.f90:
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
Kyrill Tkachov wrote:

On 07/07/15 17:34, Alan Lawrence wrote:

Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf says in 12.2.1: "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware".

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or -mfp16-format=alternative, regardless of whether we have hardware support or not. (Without hardware support, gcc generates calls to __gnu_f2h_ieee or __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to support __fp16 just using those hardware instructions without caring about which format is in use.)

Hmmm... In my opinion intrinsics should aim to map to instructions rather than go away and call library functions, but this is the existing functionality that current users might depend on :(

Sorry - to clarify: currently we generate __gnu_f2h_ieee / __gnu_h2f_ieee, to convert between single __fp16 and 'float' values, when there is no HW. General operations on scalar __fp16 values are performed by converting to float, performing operations on float, and converting back. The __fp16 type is available and usable without HW support, but only when -mfp16-format is specified.

(The existing) intrinsics operating on float16x[48] vectors (converting to/from float32x4) are *not* available without hardware support; these intrinsics *are* available without specifying -mfp16-format.
ACLE (4.1.2) allows toolchains to provide __fp16 when not implemented in HW, even if this is not required.

CC'ing the ARM maintainers and Tejas for an ACLE perspective. I think that we'd want to gate the definition of __fp16 on hardware availability as well (the -mfpu option) rather than just arm_fp16_format, but I'm not sure of the impact this will have on existing users.

Sure, but do we require -mfpu *and* -mfp16-format? s/and/or/? Do we require -mfp16-format for float16x[48] intrinsics, or allow format-agnostic code (as HW support allows us to!)?

I don't have very strong opinions as to which way we should go; I merely tried to be consistent with the existing codebase, and to support as much code as possible, although I agree I ignored cases where defining functions unexpectedly might cause problems.

Cheers, Alan
Re: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
On 07/07/15 16:23, Tejas Belagod wrote: Ping!

I've had a look at this (sorry for the delay). I think it's mostly OK, but I have two comments to make.

1) It's quite hard to understand the algorithm and there are no comments to aid understanding (to be fair, there aren't many comments on the other algorithms either).

2) It looks as though the new code calculates both the division and the modulus simultaneously. As such, the divmod code should be simplified to share the same code as the division function itself (saving a few bytes and, more importantly, several cycles when the modulus is required).

Can you please run some tests to validate 2) above? And if correct, adjust the code to handle this case. I think that will go some way to mitigating the code size increase from the new implementation.

R.

On 30/04/15 10:40, Hale Wang wrote:

-Original Message- From: Hale Wang [mailto:hale.w...@arm.com] Sent: Monday, February 09, 2015 9:54 AM To: Richard Earnshaw Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html. Ping for trunk. Is it ok for trunk now?

Thanks, Hale

-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Hale Wang Sent: Friday, December 12, 2014 9:36 AM To: gcc-patches Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk? -Hale

-Original Message- From: Joey Ye [mailto:joey.ye...@gmail.com] Sent: Thursday, November 27, 2014 10:01 AM To: Hale Wang Cc: gcc-patches Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m

OK applying to arm/embedded-4_9-branch, though you still need maintainer approval into trunk.
- Joey

On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang hale.w...@arm.com wrote:

Hi,

This patch ports the aeabi_idiv routine from Linaro Cortex-Strings (https://git.linaro.org/toolchain/cortex-strings.git), which was contributed by ARM under Free BSD license. The new aeabi_idiv routine is used to replace the one in libgcc/config/arm/lib1funcs.S. This replacement happens within the Thumb1 wrapper. The new routine is under LGPLv3 license.

The main advantage of this version is that it can improve the performance of the aeabi_idiv function for Thumb1. This solution will also increase the code size, so it will only be used if __OPTIMIZE_SIZE__ is not defined.

Make check passed for armv6-m. OK for trunk?

Thanks, Hale Wang

libgcc/ChangeLog:

2014-11-26  Hale Wang  hale.w...@arm.com

* config/arm/lib1funcs.S: Add new wrapper.

===
diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index b617137..de66c81 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -306,34 +306,12 @@ LSYM(Lend_fde):
 #ifdef __ARM_EABI__
 .macro THUMB_LDIV0 name signed
 #if defined(__ARM_ARCH_6M__)
-	.ifc	\signed, unsigned
-	cmp	r0, #0
-	beq	1f
-	mov	r0, #0
-	mvn	r0, r0		@ 0xffffffff
-1:
-	.else
-	cmp	r0, #0
-	beq	2f
-	blt	3f
+
+	push	{r0, lr}
 	mov	r0, #0
-	mvn	r0, r0
-	lsr	r0, r0, #1	@ 0x7fffffff
-	b	2f
-3:	mov	r0, #0x80
-	lsl	r0, r0, #24	@ 0x80000000
-2:
-	.endif
-	push	{r0, r1, r2}
-	ldr	r0, 4f
-	adr	r1, 4f
-	add	r0, r1
-	str	r0, [sp, #8]
-	@ We know we are not on armv4t, so pop pc is safe.
-	pop	{r0, r1, pc}
-	.align	2
-4:
-	.word	__aeabi_idiv0 - 4b
+	bl	SYM(__aeabi_idiv0)
+	pop	{r1, pc}
+
 #elif defined(__thumb2__)
 	.syntax unified
 	.ifc	\signed, unsigned
@@ -927,7 +905,158 @@ LSYM(Lover7):
 	add	dividend, work
 	.endif
 LSYM(Lgot_result):
-.endm
+.endm
+
+#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__)
+.macro	BranchToDiv n, label
+	lsr	curbit, dividend, \n
+	cmp	curbit, divisor
+	blo	\label
+.endm
+
+.macro	DoDiv n
+	lsr	curbit, dividend, \n
+	cmp	curbit, divisor
+	bcc	1f
+	lsl	curbit, divisor, \n
+	sub	dividend, dividend, curbit
+
+1:	adc	result, result
+.endm
+
+.macro	THUMB1_Div_Positive
+	mov	result, #0
+	BranchToDiv #1, LSYM(Lthumb1_div1)
+	BranchToDiv #4, LSYM(Lthumb1_div4)
+	BranchToDiv #8, LSYM(Lthumb1_div8)
+	BranchToDiv #12, LSYM(Lthumb1_div12)
+	BranchToDiv #16, LSYM(Lthumb1_div16)
+LSYM(Lthumb1_div_large_positive):
+	mov	result, #0xff
+	lsl	divisor, divisor, #8
+
RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips jraddiusp instruction.
I'm not sure this is the right approach here. If we get a jraddiusp then the problem that the test is trying to cover can't possibly happen anyway. (The test is checking if a load and final stack adjustment are ever re-ordered, from what I can see.) I'd just mark the test as NOCOMPRESSION instead of just NOMIPS16 and update the comment to say that it is avoiding SAVE, RESTORE and JRADDIUSP.

Another approach would be to add the micromips testcase variant and skip the test if code-quality (ie. -O0).

Catherine

I agree with Matthew here. The testcase already comments that it is preventing the use of the MIPS16 save and restore instructions, so it makes sense to prevent jraddiusp as well. The updated patch and ChangeLog is below. Ok to commit?

Many thanks,

Andrew

testsuite/
* gcc.target/mips/stack-1.c: Do not build the testcase for micromips.

diff --git a/gcc/testsuite/gcc.target/mips/stack-1.c b/gcc/testsuite/gcc.target/mips/stack-1.c
index a28e4bf..5f25c21 100644
--- a/gcc/testsuite/gcc.target/mips/stack-1.c
+++ b/gcc/testsuite/gcc.target/mips/stack-1.c
@@ -2,8 +2,8 @@
 /* { dg-final { scan-assembler "\tlw\t" } } */
 /* { dg-final { scan-assembler-not "\td?addiu\t(\\\$sp,)?\\\$sp,\[1-9\].*\tlw\t" } } */

-/* Avoid use of SAVE and RESTORE.  */
-NOMIPS16 int foo (int y)
+/* Avoid use of SAVE, RESTORE and JRADDIUSP.  */
+NOCOMPRESSION int foo (int y)
 {
   volatile int a = y;
   volatile int *volatile b = &a;
RE: [PATCH] MIPS: fix failing branch range checks for micromips
I see that you are naming these tests after the original branch-number tests that they were derived from. I think it would be better to keep all of the microMIPS tests named umips-???. I don't think preserving the original number is important.

I have named the microMIPS tests umips-branch-??? to keep with the current microMIPS test naming strategy. The numbering starts at 5 as there are already tests numbered 1-4. An updated patch and ChangeLog is below. Ok to commit?

Many thanks,

Andrew

testsuite/
* gcc.target/mips/branch-2.c: Change NOMIPS16 to NOCOMPRESSION.
* gcc.target/mips/branch-3.c: Ditto.
* gcc.target/mips/branch-4.c: Ditto.
* gcc.target/mips/branch-5.c: Ditto.
* gcc.target/mips/branch-6.c: Ditto.
* gcc.target/mips/branch-7.c: Ditto.
* gcc.target/mips/branch-8.c: Ditto.
* gcc.target/mips/branch-9.c: Ditto.
* gcc.target/mips/branch-10.c: Ditto.
* gcc.target/mips/branch-11.c: Ditto.
* gcc.target/mips/branch-12.c: Ditto.
* gcc.target/mips/branch-13.c: Ditto.
* gcc.target/mips/branch-14.c: Ditto.
* gcc.target/mips/branch-15.c: Ditto.
* gcc.target/mips/umips-branch-5.c: New file.
* gcc.target/mips/umips-branch-6.c: New file.
* gcc.target/mips/umips-branch-7.c: New file.
* gcc.target/mips/umips-branch-8.c: New file.
* gcc.target/mips/umips-branch-9.c: New file.
* gcc.target/mips/umips-branch-10.c: New file.
* gcc.target/mips/umips-branch-11.c: New file.
* gcc.target/mips/umips-branch-12.c: New file.
* gcc.target/mips/umips-branch-13.c: New file.
* gcc.target/mips/umips-branch-14.c: New file.
* gcc.target/mips/umips-branch-15.c: New file.
* gcc.target/mips/umips-branch-16.c: New file.
* gcc.target/mips/umips-branch-17.c: New file.
* gcc.target/mips/umips-branch-18.c: New file.
* gcc.target/mips/branch-helper.h (OCCUPY_0x1): New define.
(OCCUPY_0xfffc): New define.
diff --git a/gcc/testsuite/gcc.target/mips/branch-10.c b/gcc/testsuite/gcc.target/mips/branch-10.c
index e2b1b5f..eb21c16 100644
--- a/gcc/testsuite/gcc.target/mips/branch-10.c
+++ b/gcc/testsuite/gcc.target/mips/branch-10.c
@@ -4,7 +4,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();

diff --git a/gcc/testsuite/gcc.target/mips/branch-11.c b/gcc/testsuite/gcc.target/mips/branch-11.c
index 962eb1b..bd8e834 100644
--- a/gcc/testsuite/gcc.target/mips/branch-11.c
+++ b/gcc/testsuite/gcc.target/mips/branch-11.c
@@ -8,7 +8,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();

diff --git a/gcc/testsuite/gcc.target/mips/branch-12.c b/gcc/testsuite/gcc.target/mips/branch-12.c
index 4aef160..4944634 100644
--- a/gcc/testsuite/gcc.target/mips/branch-12.c
+++ b/gcc/testsuite/gcc.target/mips/branch-12.c
@@ -4,7 +4,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();

diff --git a/gcc/testsuite/gcc.target/mips/branch-13.c b/gcc/testsuite/gcc.target/mips/branch-13.c
index 8a6fb04..f5269b9 100644
--- a/gcc/testsuite/gcc.target/mips/branch-13.c
+++ b/gcc/testsuite/gcc.target/mips/branch-13.c
@@ -8,7 +8,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (int (*bar) (void), int *x)
 {
   *x = bar ();

diff --git a/gcc/testsuite/gcc.target/mips/branch-14.c b/gcc/testsuite/gcc.target/mips/branch-14.c
index 026417e..c2eecc3 100644
--- a/gcc/testsuite/gcc.target/mips/branch-14.c
+++ b/gcc/testsuite/gcc.target/mips/branch-14.c
@@ -4,14 +4,14 @@

 #include "branch-helper.h"

 void __attribute__((noinline))
-foo (volatile int *x)
+NOCOMPRESSION foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))
     OCCUPY_0x1fff8;
 }

 int
-main (void)
+NOCOMPRESSION main (void)
 {
   int x = 0;
   int y = 1;

diff --git a/gcc/testsuite/gcc.target/mips/branch-15.c b/gcc/testsuite/gcc.target/mips/branch-15.c
index dee7a05..89e25f3
100644
--- a/gcc/testsuite/gcc.target/mips/branch-15.c
+++ b/gcc/testsuite/gcc.target/mips/branch-15.c
@@ -4,14 +4,14 @@

 #include "branch-helper.h"

 void
-foo (volatile int *x)
+NOCOMPRESSION foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))
     OCCUPY_0x1fffc;
 }

 int
-main (void)
+NOCOMPRESSION main (void)
 {
   int x = 0;
   int y = 1;

diff --git a/gcc/testsuite/gcc.target/mips/branch-2.c b/gcc/testsuite/gcc.target/mips/branch-2.c
index 6409c4c..b60e9cd 100644
--- a/gcc/testsuite/gcc.target/mips/branch-2.c
+++ b/gcc/testsuite/gcc.target/mips/branch-2.c
@@ -5,7 +5,7 @@

 #include "branch-helper.h"

-NOMIPS16 void
+NOCOMPRESSION void
 foo (volatile int *x)
 {
   if (__builtin_expect (*x == 0, 1))

diff --git a/gcc/testsuite/gcc.target/mips/branch-3.c b/gcc/testsuite/gcc.target/mips/branch-3.c
index 5fcfece..69300f6 100644
---
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf says in 12.2.1: "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware". This indicates that float16 type and intrinsic availability should be gated on the availability of fp16 in the specified -mfpu. Look at some existing intrinsics like vcvt_f16_f32 for a way to gate these.

I notice that float16x4_t is unconditionally defined in our arm_neon.h, however. I think this is a bug and its definition should be #ifdef'd properly as well.

Thanks,
Kyrill
RE: [PATCH] MIPS: Do not generate micromips code for the no-smartmips-lwxs.c testcase
Hi Andrew,

Instead of adding the -mno-micromips option to dg-options, please change the MIPS16 attribute to NOCOMPRESSION.

Index: gcc.target/mips/no-smartmips-lwxs.c
===
--- gcc.target/mips/no-smartmips-lwxs.c (revision 452061)
+++ gcc.target/mips/no-smartmips-lwxs.c (working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mno-smartmips" } */

-NOMIPS16 int scaled_indexed_word_load (int a[], int b)
+NOCOMPRESSION int scaled_indexed_word_load (int a[], int b)
 {
   return a[b];
 }

OK with that change.

Catherine

Committed as SVN 225519.

Regards,
Andrew
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law l...@redhat.com wrote: On 06/29/2015 07:15 PM, Jim Wilson wrote:

So if these copies require a conversion, then isn't it fundamentally wrong to have a PHI node which copies between them? That would seem to implicate the eipa_sra pass as needing to be aware of these promotions and avoid having these objects with different representations appearing on the lhs/rhs of a PHI node.

My years at Cisco didn't give me a chance to work on the SSA passes, so I don't know much about how they work. But looking at this, I see that PHI nodes are eventually handled by emit_partition_copy in tree-outof-ssa.c, which calls convert_to_mode, so it appears that conversions between different (closely related?) types are OK in a PHI node.

The problem in this case is that we have the exact same type for the src and dest. The only difference is that the ARM forces sign-extension for signed sub-word parameters and zero-extension for signed sub-word locals. Thus, to detect the need for a conversion, you have to have the decls, and we don't have them here.

There is also the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr and friends, which aren't used by the cfglayout/tree-outof-ssa.c code for PHI nodes. So we need to copy some of the SUBREG_PROMOTED_* handling into cfglayout/tree-outof-ssa.c, or modify them to call expand_expr for PHI nodes, and I haven't had any luck getting that to work yet. I still need to learn more about the code to figure out if this is possible.

I also think that the ARM handling of PROMOTE_MODE is wrong. Treating a signed sub-word as unsigned can lead to inefficient code. This part of the problem is much easier for me to fix. It may be hard to convince ARM maintainers that it should be changed though. I need more time to work on this too.

I haven't looked at trying to forbid the optimizer from creating PHI nodes connecting parameters to locals.
That just sounds like a strange thing to forbid, and seems likely to result in worse code by disabling too many optimizations. But maybe it is the right solution if neither of the other two options work. This should only be done when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing the copy to require a conversion. Jim
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf says in 12.2.1: "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or -mfp16-format=alternative, regardless of whether we have hardware support or not. (Without hardware support, gcc generates calls to __gnu_f2h_ieee or __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to support __fp16 just using those hardware instructions without caring about which format is in use.)

Thus we cannot be consistent with both sides of that 'i.e.', unless we also change when __fp16 is available.

I notice that float16x4_t is unconditionally defined in our arm_neon.h, however. I think this is a bug and its definition should be #ifdef'd properly as well.

Hmmm. Is this becoming a question of, which potentially-existing code do we want to break???

Cheers, Alan
Re: [PING][PATCH, 1/2] Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch
On 06/07/15 15:44, Richard Biener wrote: On Mon, 6 Jul 2015, Tom de Vries wrote:

On 25/06/15 09:42, Tom de Vries wrote:

Hi,

this patch merges rewrite_virtuals_into_loop_closed_ssa (originally submitted here: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01236.html ) to trunk.

Bootstrapped and reg-tested on x86_64. OK for trunk?

Ping.

Thanks,
- Tom

0001-Merge-rewrite_virtuals_into_loop_closed_ssa-from-gom.patch

Merge rewrite_virtuals_into_loop_closed_ssa from gomp4 branch

2015-06-24  Tom de Vries  t...@codesourcery.com

merge from gomp4 branch:

2015-06-24  Tom de Vries  t...@codesourcery.com

* tree-ssa-loop-manip.c (get_virtual_phi): Factor out of ...
(rewrite_virtuals_into_loop_closed_ssa): ... here.
* tree-ssa-loop-manip.c (replace_uses_in_dominated_bbs): Factor out of ...
(rewrite_virtuals_into_loop_closed_ssa): ... here.
* dominance.c (bitmap_get_dominated_by): New function.
* dominance.h (bitmap_get_dominated_by): Declare.
* tree-ssa-loop-manip.c (rewrite_virtuals_into_loop_closed_ssa): Use bitmap_get_dominated_by.
* tree-parloops.c (replace_uses_in_bbs_by)
(rewrite_virtuals_into_loop_closed_ssa): Move to ...
* tree-ssa-loop-manip.c: here.
* tree-ssa-loop-manip.h (rewrite_virtuals_into_loop_closed_ssa): Declare.

2015-06-18  Tom de Vries  t...@codesourcery.com

* tree-parloops.c (rewrite_virtuals_into_loop_closed_ssa): New function.
(transform_to_exit_first_loop_alt): Use rewrite_virtuals_into_loop_closed_ssa.
---
 gcc/dominance.c           | 21
 gcc/dominance.h           |  1 +
 gcc/tree-parloops.c       | 43 +
 gcc/tree-ssa-loop-manip.c | 81 +++
 gcc/tree-ssa-loop-manip.h |  1 +
 5 files changed, 112 insertions(+), 35 deletions(-)

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 9c66ca2..9b52d79 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -753,6 +753,27 @@ set_immediate_dominator (enum cdi_direction dir, basic_block bb,
   dom_computed[dir_index] = DOM_NO_FAST_QUERY;
 }

+/* Returns in BBS the list of basic blocks immediately dominated by BB, in the
+   direction DIR.
+   As get_dominated_by, but returns result as a bitmap.  */
+
+void
+bitmap_get_dominated_by (enum cdi_direction dir, basic_block bb, bitmap bbs)
+{
+  unsigned int dir_index = dom_convert_dir_to_idx (dir);
+  struct et_node *node = bb->dom[dir_index], *son = node->son, *ason;
+
+  bitmap_clear (bbs);
+
+  gcc_checking_assert (dom_computed[dir_index]);
+
+  if (!son)
+    return;
+
+  bitmap_set_bit (bbs, ((basic_block) son->data)->index);
+  for (ason = son->right; ason != son; ason = ason->right)
+    bitmap_set_bit (bbs, ((basic_block) ason->data)->index);
+}
+

Isn't an immediate_dominated_by_p () predicate better? It's very cheap to compute compared to allocating / populating and querying a bitmap.

Dropped bitmap_get_dominated_by per comment below.

 /* Returns the list of basic blocks immediately dominated by BB, in the
    direction DIR.  */
 vec<basic_block>

diff --git a/gcc/dominance.h b/gcc/dominance.h
index 37e138b..0a1a13e 100644
--- a/gcc/dominance.h
+++ b/gcc/dominance.h
@@ -41,6 +41,7 @@ extern void free_dominance_info (enum cdi_direction);
 extern basic_block get_immediate_dominator (enum cdi_direction, basic_block);
 extern void set_immediate_dominator (enum cdi_direction, basic_block,
				     basic_block);
+extern void bitmap_get_dominated_by (enum cdi_direction, basic_block, bitmap);
 extern vec<basic_block> get_dominated_by (enum cdi_direction, basic_block);
 extern vec<basic_block> get_dominated_by_region (enum cdi_direction,
						 basic_block *,

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e582fe7..df7c351 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1498,25 +1498,6 @@ replace_uses_in_bb_by (tree name, tree val, basic_block bb)
     }
 }

-/* Replace uses of NAME by VAL in blocks BBS.
-   */
-
-static void
-replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
-{
-  gimple use_stmt;
-  imm_use_iterator imm_iter;
-
-  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
-    {
-      if (!bitmap_bit_p (bbs, gimple_bb (use_stmt)->index))
-	continue;
-
-      use_operand_p use_p;
-      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-	SET_USE (use_p, val);
-    }
-}
-
 /* Do transformation from:

     bb preheader:
@@ -1637,18 +1618,11 @@ transform_to_exit_first_loop_alt (struct loop *loop,
   tree control = gimple_cond_lhs (cond_stmt);
   edge e;

-  /* Gather the bbs dominated by the exit block.  */
-  bitmap exit_dominated = BITMAP_ALLOC (NULL);
-  bitmap_set_bit (exit_dominated, exit_block->index);
-  vec<basic_block> exit_dominated_vec
-    = get_dominated_by (CDI_DOMINATORS, exit_block);
-
-  int i;
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
On 07/07/15 17:34, Alan Lawrence wrote:

Kyrill Tkachov wrote:

On 07/07/15 14:09, Kyrill Tkachov wrote:

Hi Alan,

On 07/07/15 13:34, Alan Lawrence wrote:

As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html

For some context, the reference for these is at:
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

This patch is ok once you and Charles decide on how to proceed with the two prerequisites.

On second thought, the ACLE document at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf says in 12.2.1: "float16 types are only available when the __fp16 type is defined, i.e. when supported by the hardware"

However, we support __fp16 whenever the user specifies -mfp16-format=ieee or -mfp16-format=alternative, regardless of whether we have hardware support or not. (Without hardware support, gcc generates calls to __gnu_f2h_ieee or __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to support __fp16 just using those hardware instructions without caring about which format is in use.)

Hmmm... In my opinion intrinsics should aim to map to instructions rather than go away and call library functions, but this is the existing functionality that current users might depend on :(

Thus we cannot be consistent with both sides of that 'i.e.', unless we also change when __fp16 is available.

I notice that float16x4_t is unconditionally defined in our arm_neon.h, however. I think this is a bug and its definition should be #ifdef'd properly as well.

Hmmm. Is this becoming a question of, which potentially-existing code do we want to break???

CC'ing the ARM maintainers and Tejas for an ACLE perspective. I think that we'd want to gate the definition of __fp16 on hardware availability as well (the -mfpu option) rather than just arm_fp16_format, but I'm not sure of the impact this will have on existing users.
Kyrill

Cheers, Alan
Re: Tests for libgomp based on OpenMP Examples 4.0.2.
Comment on the patch: the simd-5.f90 file is marked as xfail, since the test fails because 'simd collapse' is an unsupported combination for Fortran (though it is valid in the OpenMP API).

2015-07-07 19:48 GMT+03:00 Maxim Blumental bvm...@gmail.com:

With this letter I propose a patch with tests for libgomp based on OpenMP Examples 4.0.2, both for C and Fortran. The changes are:

- Renamed existing tests based on OpenMP Examples to make the names more clear.
- Added 16 tests for simd construct and 10 for depend clause.

- Sincerely yours, Maxim Blumental

--
- Sincerely yours, Maxim Blumental
RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips jraddiusp instruction.
-Original Message- From: Andrew Bennett [mailto:andrew.benn...@imgtec.com] Sent: Tuesday, July 07, 2015 12:14 PM To: Moore, Catherine; Matthew Fortune; gcc-patches@gcc.gnu.org Subject: RE: [PATCH] MIPS: Update stack-1.c testcase to match micromips jraddiusp instruction.

I'm not sure this is the right approach here. If we get a jraddiusp then the problem that the test is trying to cover can't possibly happen anyway. (The test is checking if a load and final stack adjustment are ever re-ordered, from what I can see.) I'd just mark the test as NOCOMPRESSION instead of just NOMIPS16 and update the comment to say that it is avoiding SAVE, RESTORE and JRADDIUSP.

Another approach would be to add the micromips testcase variant and skip the test if code-quality (ie. -O0).

Catherine

I agree with Matthew here. The testcase already comments that it is preventing the use of the MIPS16 save and restore instructions, so it makes sense to prevent jraddiusp as well. The updated patch and ChangeLog is below. Ok to commit?

testsuite/
* gcc.target/mips/stack-1.c: Do not build the testcase for micromips.

Yes, this is OK.
Re: flatten cfgloop.h
On Mon, Jul 6, 2015 at 8:11 PM, Jeff Law l...@redhat.com wrote: On 07/06/2015 07:38 AM, Michael Matz wrote: Hi, On Sun, 5 Jul 2015, Prathamesh Kulkarni wrote: Hi, The attached patches flatten cfgloop.h. patch-1.diff moves around prototypes and structures to the respective header files. patch-2.diff (mostly auto-generated) replicates cfgloop.h includes in c files. Bootstrapped and tested on x86_64-unknown-linux-gnu with all front-ends. Built on all targets using config-list.mk. I left includes in cfgloop.h commented with #if 0 ... #endif. OK for trunk ? Does nobody else think that header files for one or two prototypes are fairly silly? Perhaps, but having a .h file for each .c file's exported objects means that we can implement a reasonable policy around where functions are prototyped or structures declared. Yes. At least for infrastructure files. I happily make exceptions for things like tree-vectorizer.h having prototypes for all tree-vect*.c files (they at least look related!). Similar for tree-pass.h containing the interfacing of passes to the pass manager (instead of having a header file for every pass). Contrast to "I put foo in expr.h because that was the most convenient place", which over 25+ years has made our header file dependencies a horrid mess. Yeah - and I think we need to clean this up. The general guidance of "prototype is in the corresponding .h file" is easy. After doing that we can move functions or even get rid of *.[ch] pairs (like what is tree-dfa.[ch] other than a kitchen-sink for stuff - certainly nowhere near "Data flow functions for trees"). Richard. Anyway, your autogenerated part contains changes that seem exaggerated, e.g.: +++ b/gcc/bt-load.c @@ -54,6 +54,14 @@ along with GCC; see the file COPYING3.
If not see #include "predict.h" #include "basic-block.h" #include "df.h" +#include "bitmap.h" +#include "sbitmap.h" +#include "cfgloopmanip.h" +#include "loop-init.h" +#include "cfgloopanal.h" +#include "loop-doloop.h" +#include "loop-invariant.h" +#include "loop-iv.h" Surely bt-load doesn't need anything from doloop.h or invariant.h. Before this goes into trunk this whole autogenerated thing should be cleaned up to add includes only for things that are actually needed. Agreed. jeff
Re: fix segfault in verify_flow_info() with -dx option
On Tue, Jul 7, 2015 at 2:42 AM, Prathamesh Kulkarni prathamesh.kulka...@linaro.org wrote: On 6 July 2015 at 12:00, Richard Biener richard.guent...@gmail.com wrote: On Sun, Jul 5, 2015 at 2:07 PM, Prathamesh Kulkarni prathamesh.kulka...@linaro.org wrote: Hi, Passing -dx causes a segmentation fault: Test case: void f(void) {} ./test.c: In function 'f': ../test.c:3:1: internal compiler error: Segmentation fault } ^ 0xab6baf crash_signal /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/toplev.c:366 0x694b14 verify_flow_info() /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/cfghooks.c:109 0x9f7e64 execute_function_todo /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/passes.c:1997 0x9f86eb execute_todo /home/prathamesh.kulkarni/gnu-toolchain/src/gcc.git/gcc/passes.c:2042 Started with r210068. It looks like -dx causes cfun->cfg to be NULL, and hence the segfault in verify_flow_info(). The attached patch tries to fix it by adding a check on cfun->cfg before calling verify_flow_info() from execute_function_todo(). Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk ? No. We've checked cfun->curr_properties & PROP_cfg already. So whatever is keeping that set but frees the CFG is the offender (and should clear the flag). I think I have somewhat understood what's happening. -dx turns on the flag rtl_dump_and_exit. pass_rest_of_compilation is gated on !rtl_dump_and_exit. Since rtl_dump_and_exit == 1 when -dx is passed, pass_rest_of_compilation and all the rtl passes inserted within pass_rest_of_compilation don't execute. One of these passes is pass_free_cfg, which destroys PROP_cfg, but with -dx passed this pass doesn't get executed and PROP_cfg remains set. Then pass_clean_state::execute() calls free_after_compilation(), which sets cfun->cfg = NULL. And hence after pass_clean_state finishes in execute_function_todo, we end up with cfun->cfg == NULL and PROP_cfg set, which calls verify_flow_info() and we hit the segfault.
The following untested patch tries to fix this by clearing PROP_cfg in free_after_compilation. Would that be the correct approach? Yes, that looks good to me. Richard. Thanks, Prathamesh Richard. Thank you, Prathamesh
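To make the invariant concrete, here is a minimal toy model of the fix's logic (names mirror GCC's, but this is my own illustration, not the passes.c/function.c code): clearing the cfg pointer must be paired with clearing PROP_cfg, because the TODO machinery decides whether to call verify_flow_info from the property flag alone.

```c
#include <assert.h>
#include <stddef.h>

#define PROP_cfg 1u

/* Simplified stand-in for GCC's struct function.  */
struct function_state
{
  unsigned curr_properties;  /* bitmask of PROP_* flags */
  void *cfg;                 /* the control-flow graph, or NULL */
};

/* The proposed fix: when the cfg is freed, the property that
   advertises it must be cleared in the same place.  */
void free_after_compilation (struct function_state *fn)
{
  fn->cfg = NULL;
  fn->curr_properties &= ~PROP_cfg;
}

/* What execute_function_todo effectively guards verification on:
   only the property flag, not the pointer itself.  */
int should_verify_flow_info (const struct function_state *fn)
{
  return (fn->curr_properties & PROP_cfg) != 0;
}
```

With the fix, the state can never be "flag set, cfg NULL", which is exactly the combination that made verify_flow_info dereference a null pointer.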
[patch, driver] Ignore -ftree-parallelize-loops={0,1}
Hi, currently, we have these spec strings in gcc/gcc.c involving ftree-parallelize-loops: ... %{fopenacc|fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)} %{fopenacc|fopenmp|ftree-parallelize-loops=*:-pthread} ... Actually, ftree-parallelize-loops={0,1} means that no parallelization is done, but these spec strings still get activated for these values. The attached patch fixes that, by introducing a spec function gt (short for greater than), and using it in the spec lines. [ I've also tried this approach using the already existing spec function version-compare: ... %{fopenacc|fopenmp:%:include(libgomp.spec)%(link_gomp)} %:version-compare(>= 2 ftree-parallelize-loops= %:include(libgomp.spec)%(link_gomp)) ... But that didn't work out. The function evaluation mechanism evaluates the arguments before testing the function, so we evaluate '%:include(libgomp.spec)' unconditionally. The gcc build breaks on the first xgcc invocation with linking due to a missing libgomp.spec. ] Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Ignore -ftree-parallelize-loops={0,1} using gt --- gcc/gcc.c | 48 ++-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index 0f29b78..34fb437 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -274,6 +274,7 @@ static const char *compare_debug_self_opt_spec_function (int, const char **); static const char *compare_debug_auxbase_opt_spec_function (int, const char **); static const char *pass_through_libs_spec_func (int, const char **); static const char *replace_extension_spec_func (int, const char **); +static const char *greater_than_spec_func (int, const char **); static char *convert_white_space (char *); /* The Specs Language @@ -881,7 +882,7 @@ proper position among the other output files.
*/ %{s} %{t} %{u*} %{z} %{Z} %{!nostdlib:%{!nostartfiles:%S}} VTABLE_VERIFICATION_SPEC \ %{static:} %{L*} %(mfwrap) %(link_libgcc) SANITIZER_EARLY_SPEC %o\ CHKP_SPEC \ -%{fopenacc|fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)}\ +%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):%:include(libgomp.spec)%(link_gomp)} \ %{fcilkplus:%:include(libcilkrts.spec)%(link_cilkrts)}\ %{fgnu-tm:%:include(libitm.spec)%(link_itm)}\ %(mflib) STACK_SPLIT_SPEC \ @@ -1042,7 +1043,8 @@ static const char *const multilib_defaults_raw[] = MULTILIB_DEFAULTS; /* Linking to libgomp implies pthreads. This is particularly important for targets that use different start files and suchlike. */ #ifndef GOMP_SELF_SPECS -#define GOMP_SELF_SPECS "%{fopenacc|fopenmp|ftree-parallelize-loops=*: \ +#define GOMP_SELF_SPECS \ + "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1): \ -pthread}" #endif @@ -1482,6 +1484,7 @@ static const struct spec_function static_spec_functions[] = { "compare-debug-auxbase-opt", compare_debug_auxbase_opt_spec_function }, { "pass-through-libs", pass_through_libs_spec_func }, { "replace-extension", replace_extension_spec_func }, + { "gt", greater_than_spec_func }, #ifdef EXTRA_SPEC_FUNCTIONS EXTRA_SPEC_FUNCTIONS #endif @@ -9428,6 +9431,47 @@ replace_extension_spec_func (int argc, const char **argv) return result; } +/* Returns "" if the n in ARGV[1] == -opt=n is greater than ARGV[2]. + Otherwise, return NULL. */ + +static const char * +greater_than_spec_func (int argc, const char **argv) +{ + char *converted; + + if (argc == 1) +return NULL; + + gcc_assert (argc == 3); + gcc_assert (argv[0][0] == '-'); + gcc_assert (argv[0][1] == '\0'); + + /* Point p to the n in -opt=n.
*/ + const char *p = argv[1]; + while (true) +{ + char c = *p; + if (c == '\0') + gcc_unreachable (); + + ++p; + + if (c == '=') + break; +} + + long arg = strtol (p, &converted, 10); + gcc_assert (converted != p); + + long lim = strtol (argv[2], &converted, 10); + gcc_assert (converted != argv[2]); + + if (arg > lim) +return ""; + + return NULL; +} + /* Insert backslash before spaces in ORIG (usually a file path), to avoid being broken by the spec parser. -- 1.9.1
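The comparison the new spec function performs can be illustrated standalone. The sketch below is my own reimplementation of the parsing logic, not the gcc.c code: it extracts the n from a string of the form "-opt=n" and compares it against a limit, which is exactly what lets the driver distinguish -ftree-parallelize-loops=4 (needs libgomp) from =0 and =1 (no parallelization).

```c
#include <stdlib.h>
#include <string.h>

/* Return nonzero iff OPT has the form "-something=n" with n > LIM.
   Options without an '=' or without a parsable number compare false.
   (Illustration only; the real greater_than_spec_func operates on
   spec-language argv strings and asserts on malformed input.)  */
int opt_value_greater_than (const char *opt, long lim)
{
  const char *eq = strchr (opt, '=');
  if (eq == NULL)
    return 0;                       /* no "=n" part at all */

  char *end;
  long n = strtol (eq + 1, &end, 10);
  if (end == eq + 1)
    return 0;                       /* nothing numeric after '=' */

  return n > lim;
}
```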
Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t 0x7FFFFFFF)
On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote: Please find attached the patch PR25529.patch that converts the pattern (unsigned t * 2)/2 into unsigned t & 0x7FFF +/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFF. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (div (mult @0 INTEGER_CST@1) INTEGER_CST@1) You don't need to repeat INTEGER_CST, the second time @1 is enough. + (with { tree n2 = build_int_cst (TREE_TYPE (@0), + wi::exact_log2 (@1)); } + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); } + { n2; }) { n2; })) What happens if you write t*3/3? Huh, and you posted this patch twice? See my reply to the other copy for the correctness issues and better handling of exact_div. Richard. -- Marc Glisse
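Marc's t*3/3 question points at the correctness hole: the identity relies on the multiplier being a power of two, since only then does the bit wrapped away by the multiply correspond to a mask. A small demonstration in plain C, my own illustration using 32-bit unsigned (wrapping) arithmetic:

```c
#include <stdint.h>

/* (t * 2) / 2 in 32-bit unsigned arithmetic: the multiply wraps mod
   2^32, losing the top bit of t, so the result is t & 0x7FFFFFFF.  */
uint32_t mul2_div2 (uint32_t t)
{
  return (t * 2u) / 2u;
}

/* (t * 3) / 3 is NOT equivalent to masking: the multiply still wraps
   mod 2^32, but 3 is not a power of two, so the truncating division
   does not simply shift the wrapped bits back out.  */
uint32_t mul3_div3 (uint32_t t)
{
  return (t * 3u) / 3u;
}
```

For example, with t = 0x55555556 the product 3*t wraps to 2, and 2/3 is 0, nothing like t; so a pattern keyed only on "the two INTEGER_CSTs match" without a power-of-two check would miscompile.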
Re: [RFC] two-phase marking in gt_cleare_cache
On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com wrote: On 06/07/15 15:29, Richard Biener wrote: On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, Using the attached untested patch, I managed to minimize a test-case failure for PR 66714. The patch introduces two-phase marking in gt_cleare_cache: - first phase, it loops over all the hash table entries and removes those which are dead - second phase, it runs over all the live hash table entries and marks live items that are reachable from those live entries By doing so, we make the behaviour of gt_cleare_cache independent of the order in which the entries are visited, turning: - hard-to-trigger bugs which trigger for one visiting order but not for another, into - more easily triggered bugs which trigger for any visiting order. Any comments? I think it is only half-way correct in your proposed change. You only fix the issue for hashes of the same kind. To truly fix the issue you'd have to change generated code for gt_clear_caches () and provide a clearing-only implementation (or pass an operation-mode bool to the core worker in hash-table.h). [ Btw, we have been discussing a similar issue before: https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ] True, the problem exists at the scope of all variables marked with 'cache', and this patch addresses the problem only within a single variable. Hmm, and don't we rather want to first mark and _then_ clear? I. In favor of first clear and then mark: It allows for: - a lazy one phase implementation for !ENABLE_CHECKING where you do a single clear-or-mark phase (so the clear is lazy). - an eager two phase implementation for ENABLE_CHECKING (where the clear is eager) The approach of first a marking phase and then a clearing phase means you always have to do these two phases (you can't do the marking lazily). True.
First mark and then clear means the marking should be done iteratively. Each time you mark something live, another entry in another hash table could become live. Marking iteratively could become quite costly. I don't see this - marking is done recursively so if one entry makes another live and that makes another live the usual GC marking recursion will deal with this? II. In favor of first mark and then clear: The users of garbage collection will need to be less precise. Because if entry B in the hash is live and would keep A live then A _is_ kept in the end but you'll remove it from the hash, possibly no longer using a still live copy. I'm not sure I understand the scenario you're concerned about, but ... say we have - entry B: item B -> item A - entry A: item A -> item Z If you do clear first and mark second, and you start out with item B live and item A dead: - during the clearing phase you clear entry A and keep entry B, and - during the marking phase you mark item A live. So we no longer have entry A, but item A is kept and entry B is kept. Yes. This makes the cache weaker in that after this GC operation a lookup of A no longer succeeds but it still is there. The whole point of your patch was to make the behavior more predictable and in some way it succeeds (within a cache). As it is supposed to put more stress on the cache logic (it's ENABLE_CHECKING only) it makes sense to clear optimistically (after all it's a cache and not guaranteed to find a still live entry). It would be still nice to cover all caches together because as I remember we've mostly seen issues of caches interacting. Richard. Thanks, - Tom
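The two-phase scheme under discussion can be sketched with a toy cache (my own illustration, not GCC's generated gt_cleare_cache code): phase one drops every entry whose key died, phase two marks whatever the surviving entries reference. Because the phases are separate, the outcome no longer depends on the order in which entries are visited.

```c
#include <stddef.h>

struct item { int live; struct item *ref; };
struct entry { struct item *key; struct item *val; int present; };

/* Mark an item and everything reachable through its ref chain
   (a stand-in for the usual GC marking recursion).  */
static void mark_item (struct item *it)
{
  while (it && !it->live)
    {
      it->live = 1;
      it = it->ref;
    }
}

void clear_cache (struct entry *e, size_t n)
{
  /* Phase 1: remove all entries whose key is dead.  */
  for (size_t i = 0; i < n; i++)
    if (e[i].present && !e[i].key->live)
      e[i].present = 0;

  /* Phase 2: mark everything reachable from the surviving entries.  */
  for (size_t i = 0; i < n; i++)
    if (e[i].present)
      {
        mark_item (e[i].key);
        mark_item (e[i].val);
      }
}
```

Running Richard's scenario through this model (entry B: item B -> item A, entry A: item A -> item Z, with B live and A dead) gives exactly the discussed result: entry A is gone, entry B survives, item A is kept alive as B's value, and a later lookup of A misses even though A still exists.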
GCC 5.2.0 Status Report (2015-07-07), branch frozen
The GCC 5 branch is now frozen for the release of GCC 5.2, all changes require release manager approval from now on. I will shortly announce a first release candidate for GCC 5.2. Previous Report === https://gcc.gnu.org/ml/gcc/2015-06/msg00202.html
Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com wrote: -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, June 30, 2015 4:42 PM To: Ajit Kumar Agarwal Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com wrote: All: The patch below adds a new path splitting optimization pass on the SSA representation. The path splitting optimization moves the join block of an if-then-else that is the same as the loop latch into its predecessors, where it gets merged with them while preserving the SSA representation. The patch is tested for the MicroBlaze and i386 targets. The EEMBC/MiBench benchmarks were run with the MicroBlaze target, and a performance gain of 9.15% is seen on rgbcmy01_lite (EEMBC benchmarks). The DejaGnu tests were run for the MicroBlaze target; no regressions are seen and the new testcases attached pass. For i386, bootstrapping goes through fine and the SPEC CPU2000 benchmarks were run with this patch. The following observations were made with the SPEC CPU2000 benchmarks. Ratio with the path splitting change vs ratio without the path splitting change is 3653.353 vs 3652.14 for the INT benchmarks. Ratio with the path splitting change vs ratio without the path splitting change is 4353.812 vs 4345.351 for the FP benchmarks. Based on comments from the RFC patch, the following changes were done. 1. Added a new pass for the path splitting changes. 2. Placed the new path splitting optimization pass before the copy propagation pass. 3. The join block that is the same as the loop latch is wired into its predecessors so that the CFG cleanup pass will merge the blocks wired together. 4. The copy propagation routines added for the path splitting changes are not needed, as suggested by Jeff.
They are removed in the patch, as the copy propagation in the copied join blocks will be done by the existing copy propagation pass and the update-SSA pass. 5. Only the propagation of phi results of the join block with the phi argument is done, which will not be done by the existing update_ssa or copy propagation pass on the tree SSA representation. 6. Added 2 tests. a) compilation check tests. b) execution tests. 7. Refactored the code for the feasibility check and for finding the join block that is the same as the loop latch node. [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation. Added a new pass for path splitting on the tree SSA representation. The path splitting optimization does a CFG transformation in which the join block of the if-then-else that is the same as the loop latch node is moved and merged with the predecessor blocks while preserving the SSA representation. ChangeLog: 2015-06-30 Ajit Agarwal ajit...@xilinx.com * gcc/Makefile.in: Add the build of the new file tree-ssa-path-split.c * gcc/common.opt: Add the new flag ftree-path-split. * gcc/opts.c: Add an entry for the path splitting pass with optimization level greater than or equal to O2. * gcc/passes.def: Enable and add the new path splitting pass. * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT. * gcc/tree-pass.h: Extern declaration of make_pass_path_split. * gcc/tree-ssa-path-split.c: New file for the path splitting pass. * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase. * gcc/testsuite/gcc.dg/path-split-1.c: New testcase. I'm not 100% sure I understand the transform but from what I see from the testcases it tail-duplicates from a conditional up to a loop latch block (not sure if it includes it and thus ends up creating a loop nest or not). An observation I have is that the pass should at least share the transform stage to some extent with the existing tracer pass (tracer.c), which essentially does the same but not restricted to loops in any way.
The following piece of code from tracer.c can be shared with the existing path splitting pass. { e = find_edge (bb, bb2); copy = duplicate_block (bb2, e, bb); flush_pending_stmts (e); add_phi_args_after_copy (&copy, 1, NULL); } Sharing the above code of the transform stage of tracer.c with the path splitting pass has the following limitation. 1. The duplicated loop latch node is wired to its predecessors, and the existing phi node in the loop latch node, with the phi arguments from its corresponding predecessors, is moved to the duplicated loop latch node that is wired into its predecessors. Due to this, the duplicated loop latch nodes wired into their predecessors will not be merged with the original predecessors by the CFG cleanup phase. So I wonder if your pass could be simply another
[PATCH] Fix PR66739
The following fixes PR66739 - with conditionals not applying a single-use restriction usually causes some regressions. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-07-07 Richard Biener rguent...@suse.de PR middle-end/66739 * match.pd: Condition A - B ==/!= 0 -> A ==/!= B on single-use A - B. Index: gcc/match.pd === --- gcc/match.pd(revision 225453) +++ gcc/match.pd(working copy) @@ -1336,8 +1353,9 @@ (define_operator_list CBRT BUILT_IN_CBRT attempts to synthetize ABS_EXPR. */ (for cmp (eq ne) (simplify - (cmp (minus @0 @1) integer_zerop) - (cmp @0 @1))) + (cmp (minus@2 @0 @1) integer_zerop) + (if (single_use (@2)) + (cmp @0 @1 /* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the signed arithmetic case. That form is created by the compiler
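The underlying identity is easy to check in C: with wrapping unsigned arithmetic, (a - b) == 0 holds exactly when a == b. The patch keeps the fold but gates it on single_use, because a multiply-used A - B must be computed anyway, and comparing the already-computed difference against zero is then free. The sketch below is my own illustration of the two forms being equated:

```c
#include <stdint.h>

/* The form before the fold: compare the (wrapping) difference
   against zero.  */
int diff_is_zero (uint32_t a, uint32_t b)
{
  return (a - b) == 0;
}

/* The form after the fold: compare the operands directly.  */
int direct_eq (uint32_t a, uint32_t b)
{
  return a == b;
}
```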
Re: [PR25530] Convert (unsigned t / 2) * 2 into (unsigned t ~1)
On Tue, Jul 7, 2015 at 6:52 AM, Hurugalawadi, Naveen naveen.hurugalaw...@caviumnetworks.com wrote: Hi, Please find attached the patch PR25530.patch that converts the pattern (unsigned t / 2) * 2 into (unsigned t & ~1). Please review and let me know if it's okay. For EXACT_DIV fold-const.c has /* ((T) (X /[ex] C)) * C cancels out if the conversion is sign-changing only. */ if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg0) == EXACT_DIV_EXPR && operand_equal_p (arg1, TREE_OPERAND (arg0, 1), 0)) return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0)); we know the remainder is zero for EXACT_DIV. It also gives hints that a sign-changing conversion is ok. +/* Simplify (unsigned t / 2) * 2 -> unsigned t & ~1. */ +/* PR25530. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (mult (div @0 INTEGER_CST@1) INTEGER_CST@1) + (with { tree n2 = build_int_cst (TREE_TYPE (@0), + wi::exact_log2 (@1)); } + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (bit_and @0 (lshift (rshift { build_minus_one_cst (TREE_TYPE (@0)); } + { n2; }) { n2; })) you should move the (with inside the (if to save work if the type is not unsigned. Also you are using wi::exact_log2 without checking whether @1 was a power of two (I think exact_log2 returns -1 in this case). Then expressing ~1 with the result expression is really excessive - you should simply build this with @1 - 1 if @1 is a power of two. So please handle exact_div differently, like fold-const.c does. Also I am not sure ceil_div and floor_div can be handled this way. (5 /[ceil] 2) * 2 == 6 but you compute it as 4. So I am only convinced trunc_div works this way. Thanks, Richard. Regression tested on AArch64 and x86_64. Thanks, Naveen gcc/testsuite/ChangeLog: 2015-07-07 Naveen H.S naveen.hurugalaw...@caviumnetworks.com PR middle-end/25530 * gcc.dg/pr25530.c: New test.
gcc/ChangeLog: 2015-07-07 Naveen H.S naveen.hurugalaw...@caviumnetworks.com PR middle-end/25530 * match.pd (mult (div @0 INTEGER_CST@1) INTEGER_CST@1) : New simplifier.
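Richard's ceil_div counterexample is easy to reproduce in C (my own sketch; C's `/` is truncating division, so the ceiling variant is written out by hand):

```c
#include <stdint.h>

/* Truncating division: (t / 2) * 2 really is t & ~1 for unsigned t.  */
uint32_t trunc_div2_mul2 (uint32_t t)
{
  return (t / 2u) * 2u;
}

/* Ceiling division by 2: rounds up, so the round trip can exceed t.  */
uint32_t ceil_div2 (uint32_t t)
{
  return t / 2u + (t % 2u != 0);
}
```

With t = 5: the truncating form gives 4, which matches 5 & ~1; but (5 ceil-div 2) * 2 gives 6, so the mask rewrite is wrong for ceil_div (and similarly for the other rounding modes).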
[PATCH, i386]: Generate BT with immediate operand
Hello! After recent x86 EXTZ/EXTZV improvements, we can extend the BT splitters to generate the BT instruction with immediate operands. The improvement can be seen with the attached testcases. The benefit is obvious for BT with immediates 32 <= n <= 63: 0: 48 b8 00 00 00 00 00movabs $0x1000,%rax 7: 00 00 10 a: 48 85 c7test %rax,%rdi vs.: 0: 48 0f ba e7 3c bt $0x3c,%rdi The benefit with operands 0 <= n <= 31 is also noticeable: 0: f7 c7 00 04 00 00 test $0x400,%edi vs.: 0: 0f ba e7 0a bt $0xa,%edi BT has *slightly* higher latency than TEST (0.33 vs. 0.25 cycles on a modern processor), so I have limited the conversion to -Os in case the bit-test is in the low 32 bits. In addition to the 1556 BT %reg, %reg insns already present in the cc1 executable, the patched compiler generated an additional 628 BT $imm,%reg instructions in cc1. 2015-07-07 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*jcc_bt<mode>): Only split before reload. Remove operand constraints. Change operand 2 predicate to nonmemory operand. Limit const_int values to mode bitsize. Only allow const_int values less than 32 when optimizing for size. (*jcc_bt<mode>_1, *jcc_bt<mode>_mask): Only split before reload. Remove operand constraints. (*bt<mode>): Use SImode for const_int values less than 32. (regmode): Remove mode attribute. testsuite/ChangeLog: 2015-07-07 Uros Bizjak ubiz...@gmail.com * gcc.target/i386/bt-3.c: New test. * gcc.target/i386/bt-4.c: Ditto. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. I'll commit the patch to mainline as soon as the regression test ends. Uros.
Index: config/i386/i386.md === --- config/i386/i386.md (revision 225484) +++ config/i386/i386.md (working copy) @@ -10765,8 +10765,6 @@ DONE; }) -(define_mode_attr regmode [(SI "k") (DI "q")]) - (define_insn "*bt<mode>" [(set (reg:CCC FLAGS_REG) (compare:CCC @@ -10775,11 +10773,132 @@ (const_int 1) (match_operand:SI 1 "nonmemory_operand" "rN")) (const_int 0)))] - "TARGET_USE_BT || optimize_function_for_size_p (cfun)" - "bt{<imodesuffix>}\t{%<regmode>1, %0|%0, %<regmode>1}" + +{ + switch (get_attr_mode (insn)) +{ +case MODE_SI: + return "bt{l}\t{%1, %k0|%k0, %1}"; + +case MODE_DI: + return "bt{q}\t{%q1, %0|%0, %q1}"; + +default: + gcc_unreachable (); +} +} [(set_attr "type" "alu1") (set_attr "prefix_0f" "1") - (set_attr "mode" "<MODE>")]) + (set (attr "mode") + (if_then_else + (and (match_test "CONST_INT_P (operands[1])") + (match_test "INTVAL (operands[1]) < 32")) + (const_string "SI") + (const_string "<MODE>")))]) + +(define_insn_and_split "*jcc_bt<mode>" + [(set (pc) + (if_then_else (match_operator 0 "bt_comparison_operator" + [(zero_extract:SWI48 + (match_operand:SWI48 1 "register_operand") + (const_int 1) + (match_operand:SI 2 "nonmemory_operand")) +(const_int 0)]) + (label_ref (match_operand 3)) + (pc))) + (clobber (reg:CC FLAGS_REG))] + "(TARGET_USE_BT || optimize_function_for_size_p (cfun)) +&& (CONST_INT_P (operands[2]) + ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode) + && INTVAL (operands[2]) + >= (optimize_function_for_size_p (cfun) ?
0 : 32)) + : register_operand (operands[2], SImode)) +&& can_create_pseudo_p ()" + "#" + "&& 1" + [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extract:SWI48 + (match_dup 1) + (const_int 1) + (match_dup 2)) + (const_int 0))) + (set (pc) + (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)]) + (label_ref (match_dup 3)) + (pc)))] +{ + operands[0] = shallow_copy_rtx (operands[0]); + PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0]))); +}) + +(define_insn_and_split "*jcc_bt<mode>_1" + [(set (pc) + (if_then_else (match_operator 0 "bt_comparison_operator" + [(zero_extract:SWI48 + (match_operand:SWI48 1 "register_operand") + (const_int 1) + (zero_extend:SI +(match_operand:QI 2 "register_operand"))) +(const_int 0)]) + (label_ref (match_operand 3)) + (pc))) + (clobber (reg:CC FLAGS_REG))] + "(TARGET_USE_BT || optimize_function_for_size_p (cfun)) +&& can_create_pseudo_p ()" + "#" + "&& 1" + [(set (reg:CCC FLAGS_REG) + (compare:CCC + (zero_extract:SWI48 + (match_dup 1) + (const_int 1) + (match_dup 2)) + (const_int 0))) + (set (pc) + (if_then_else (match_op_dup 0 [(reg:CCC
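The C-level pattern these splitters target is a single-bit test of a register. The sketch below is my own, mirroring the shape of the new bt-3.c/bt-4.c tests rather than their exact contents: for bit positions 32..63 the mask no longer fits in a 32-bit immediate, so a movabs/test pair is replaced by a single bt; for positions 0..31 the replacement is only a size win, hence the -Os restriction discussed above.

```c
#include <stdint.h>

/* Bit 60 set?  The mask 1<<60 needs a 64-bit immediate, so without
   BT this costs a 10-byte movabs plus a test.  */
int bit60_set (uint64_t x)
{
  return (x & (1ULL << 60)) != 0;
}

/* Bit 10 set?  The mask fits in 32 bits; BT only helps code size.  */
int bit10_set (uint64_t x)
{
  return (x & (1ULL << 10)) != 0;
}
```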
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
On Thu, Jul 2, 2015 at 2:07 AM, Richard Earnshaw richard.earns...@foss.arm.com wrote: Not quite, ARM state still has more flexible addressing modes for unsigned byte loads than for signed byte loads. It's even worse with thumb1 where some signed loads have no single-register addressing mode (ie you have to copy zero into another register to use as an index before doing the load). I wasn't aware of the load address problem. That was something I hadn't considered, and will have to look at that. Load is just one instruction though. For most other instructions, a zero-extend results in less efficient code, because it then forces a sign-extend before a signed operation. The fact that parameters and locals are handled differently, which requires conversions when copying between them, results in more inefficient code. And changing TARGET_PROMOTE_FUNCTION_MODE is an ABI change, and hence would be unwise, so changing PROMOTE_MODE is the safer option. Consider this testcase extern signed short gs; short sub (void) { signed short s = gs; int i; for (i = 0; i < 10; i++) { s += 1; if (s > 10) break; } return s; } The inner loop ends up as .L3: adds r3, r3, #1 mov r0, r1 uxth r3, r3 sxth r2, r3 cmp r2, #10 bgt .L8 cmp r2, r1 bne .L3 bx lr We need the sign-extension for the compare. We need the zero-extension for the loop carried dependency. We have two extensions in every loop iteration, plus some extra register usage and register movement. We get better code for this example if we aren't forcing signed shorts to be zero-extended via PROMOTE_MODE. The lack of a reg+immediate address mode for ldrs[bh] in thumb1 does look like a problem though. But this means the difference between generating movs r2, #0 ldrsh r3, [r3, r2] with my patch, or ldrh r3, [r3] lsls r2, r3, #16 asrs r2, r2, #16 without my patch. It isn't clear which sequence is better. The sign-extends in the second sequence can sometimes be optimized away, and sometimes they can't be optimized away.
Similarly, in the first sequence, loading zero into a reg can sometimes be optimized, and sometimes it can't. There is also no guarantee that you get the first sequence with the patch or the second sequence without the patch. There is a splitter for ldrsh, so you can get the second pattern sometimes with the patch. Similarly, it might be possible to get the first pattern without the patch in some cases, though I don't have one at the moment. Jim
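What the uxth/sxth pair in the loop above computes can be written out in C (my own illustration): the loop-carried value is kept zero-extended because of PROMOTE_MODE, while the signed compare needs the sign-extended view, so both extensions execute on every iteration.

```c
#include <stdint.h>

/* uxth: zero-extend the low 16 bits (the PROMOTE_MODE representation
   of the short in a 32-bit register).  */
uint32_t uxth (uint32_t x)
{
  return x & 0xffffu;
}

/* sxth: sign-extend the low 16 bits (the view the signed compare
   "cmp r2, #10; bgt" actually needs).  */
int32_t sxth (uint32_t x)
{
  return (int32_t) (int16_t) (x & 0xffffu);
}
```

The two views only diverge when bit 15 is set, which is exactly why the compiler cannot drop either extension without knowing the value's range.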
[PATCH, committed] PR jit/66779: fix segfault
Fix a segfault where expr.c:fold_single_bit_test was segfaulting due to jit_langhook_type_for_mode not handling QImode. Tested with make check-jit; takes jit.sum from 8289 to 8434 passes. Committed to trunk as r225522. gcc/jit/ChangeLog: PR jit/66779 * dummy-frontend.c (jit_langhook_type_for_mode): Ensure that we handle modes QI, HI, SI, DI, TI. gcc/testsuite/ChangeLog: PR jit/66779 * jit.dg/all-non-failing-tests.h: Add test-pr66779.c. * jit.dg/test-pr66779.c: New testcase. --- gcc/jit/dummy-frontend.c | 11 +++ gcc/testsuite/jit.dg/all-non-failing-tests.h | 10 ++ gcc/testsuite/jit.dg/test-pr66779.c | 143 +++ 3 files changed, 164 insertions(+) create mode 100644 gcc/testsuite/jit.dg/test-pr66779.c diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c index 8001382..3ddab50 100644 --- a/gcc/jit/dummy-frontend.c +++ b/gcc/jit/dummy-frontend.c @@ -154,6 +154,17 @@ jit_langhook_type_for_mode (enum machine_mode mode, int unsignedp) if (mode == TYPE_MODE (double_type_node)) return double_type_node; + if (mode == TYPE_MODE (intQI_type_node)) +return unsignedp ? unsigned_intQI_type_node : intQI_type_node; + if (mode == TYPE_MODE (intHI_type_node)) +return unsignedp ? unsigned_intHI_type_node : intHI_type_node; + if (mode == TYPE_MODE (intSI_type_node)) +return unsignedp ? unsigned_intSI_type_node : intSI_type_node; + if (mode == TYPE_MODE (intDI_type_node)) +return unsignedp ? unsigned_intDI_type_node : intDI_type_node; + if (mode == TYPE_MODE (intTI_type_node)) +return unsignedp ? unsigned_intTI_type_node : intTI_type_node; + if (mode == TYPE_MODE (integer_type_node)) return unsignedp ? 
unsigned_type_node : integer_type_node; diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h b/gcc/testsuite/jit.dg/all-non-failing-tests.h index 21ff428..463eefb 100644 --- a/gcc/testsuite/jit.dg/all-non-failing-tests.h +++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h @@ -161,6 +161,13 @@ #undef create_code #undef verify_code +/* test-pr66779.c */ +#define create_code create_code_pr66779 +#define verify_code verify_code_pr66779 +#include "test-pr66779.c" +#undef create_code +#undef verify_code + /* test-reading-struct.c */ #define create_code create_code_reading_struct #define verify_code verify_code_reading_struct @@ -289,6 +296,9 @@ const struct testcase testcases[] = { {"pr66700_observing_write_through_ptr", create_code_pr66700_observing_write_through_ptr, verify_code_pr66700_observing_write_through_ptr}, + {"pr66779", + create_code_pr66779, + verify_code_pr66779}, {"reading_struct", create_code_reading_struct, verify_code_reading_struct}, diff --git a/gcc/testsuite/jit.dg/test-pr66779.c b/gcc/testsuite/jit.dg/test-pr66779.c new file mode 100644 index 000..ac5a72b --- /dev/null +++ b/gcc/testsuite/jit.dg/test-pr66779.c @@ -0,0 +1,143 @@ +#include <stdlib.h> +#include <stdio.h> + +#include "libgccjit.h" + +#include "harness.h" + +/* Reproducer for PR jit/66779. + + Inject the equivalent of: + T FUNCNAME (T i, T j, T k) + { + bool comp0 = i & 0x40; + bool comp1 = (j == k); + if (comp0 && comp1) +return 7; + else +return 22; + } + for some type T; this was segfaulting during the expansion to RTL + due to missing handling for some machine modes in + jit_langhook_type_for_mode.
*/ +void +create_fn (gcc_jit_context *ctxt, + const char *funcname, + enum gcc_jit_types jit_type) +{ + gcc_jit_type *the_type = +gcc_jit_context_get_type (ctxt, jit_type); + gcc_jit_type *t_bool = +gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_BOOL); + gcc_jit_param *param_i = +gcc_jit_context_new_param (ctxt, NULL, the_type, "i"); + gcc_jit_param *param_j = +gcc_jit_context_new_param (ctxt, NULL, the_type, "j"); + gcc_jit_param *param_k = +gcc_jit_context_new_param (ctxt, NULL, the_type, "k"); + gcc_jit_param *params[3] = { +param_i, +param_j, +param_k + }; + gcc_jit_function *func = +gcc_jit_context_new_function (ctxt, NULL, + GCC_JIT_FUNCTION_EXPORTED, + the_type, + funcname, + 3, params, + 0); + gcc_jit_block *b_entry = gcc_jit_function_new_block (func, "entry"); + gcc_jit_block *b_on_true = gcc_jit_function_new_block (func, "on_true"); + gcc_jit_block *b_on_false = gcc_jit_function_new_block (func, "on_false"); + + gcc_jit_lvalue *comp0 = +gcc_jit_function_new_local (func, NULL, t_bool, "comp0"); + + gcc_jit_block_add_assignment ( +b_entry, NULL, +comp0, +gcc_jit_context_new_comparison ( + ctxt, NULL, + GCC_JIT_COMPARISON_NE, + gcc_jit_context_new_binary_op ( + ctxt, NULL, + GCC_JIT_BINARY_OP_BITWISE_AND, + the_type, +
Re: Tests for libgomp based on OpenMP Examples 4.0.2.
On Tue, Jul 07, 2015 at 20:17:48 +0200, Jakub Jelinek wrote: On Tue, Jul 07, 2015 at 08:08:16PM +0300, Maxim Blumental wrote: Added 16 tests for simd construct and 10 for depend clause. Any new tests that aren't in Examples 4.0.* document should go one level higher, to libgomp.{c,c++,fortran}/ directly. Actually, the examples 4.0.2 document contains simd-* and task_dep-* tests, they are new in terms of examples-4 directory. -- Ilya
Re: [PATCH 1/3] [ARM] PR63870 NEON error messages
On 6 July 2015 at 11:18, Alan Lawrence alan.lawre...@arm.com wrote: I note some parts of this duplicate my https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has been pinged a couple of times. Both Charles' patch, and my two, contain parts the other does not... To resolve the conflicts, I suggest that Alan's patches should be applied as-is first, and I'll rebase mine afterwards. ...and... Further to that - the main difference/conflict between Charles' patch and mine looks to be that I added the const_tree parameter to the existing neon_lane_bounds method, whereas Charles' patch adds a new method arm_neon_lane_bounds. ... I'll clean up this duplication when I do.
[PATCH, committed] PR jit/66783: prevent use of opaque structs
Prevent use of opaque structs for fields, globals and locals. Tested with make check-jit; jit.sum goes from 8434 to 8494 passes. Committed to trunk as r225523. gcc/jit/ChangeLog: PR jit/66783 * jit-recording.h: Within namespace gcc::jit::recording... (type::has_known_size): New virtual function. (struct_::has_known_size): New function. * libgccjit.c (gcc_jit_context_new_field): Verify that the type has a known size. (gcc_jit_context_new_global): Likewise. (gcc_jit_function_new_local): Likewise. gcc/testsuite/ChangeLog: PR jit/66783 * jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c: New test case. * jit.dg/test-error-gcc_jit_context_new_global-opaque-struct.c: New test case. * jit.dg/test-error-gcc_jit_function_new_local-opaque-struct.c: New test case. * jit.dg/test-error-mismatching-types-in-call.c (create_code): Avoid using an opaque struct for local "f". --- gcc/jit/jit-recording.h| 3 ++ gcc/jit/libgccjit.c| 15 ...error-gcc_jit_context_new_field-opaque-struct.c | 31 ...rror-gcc_jit_context_new_global-opaque-struct.c | 32 + ...rror-gcc_jit_function_new_local-opaque-struct.c | 42 ++ .../jit.dg/test-error-mismatching-types-in-call.c | 2 +- 6 files changed, 124 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c create mode 100644 gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_global-opaque-struct.c create mode 100644 gcc/testsuite/jit.dg/test-error-gcc_jit_function_new_local-opaque-struct.c diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h index acd69e9..884304b 100644 --- a/gcc/jit/jit-recording.h +++ b/gcc/jit/jit-recording.h @@ -497,6 +497,7 @@ public: virtual type *is_pointer () = 0; virtual type *is_array () = 0; virtual bool is_void () const { return false; } + virtual bool has_known_size () const { return true; } bool is_numeric () const { @@ -795,6 +796,8 @@ public: type *is_pointer () { return NULL; } type *is_array () { return NULL; } + bool has_known_size () 
const { return m_fields != NULL; } + playback::compound_type * playback_compound_type () { diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 4d7dd8c..85d9f62 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -543,6 +543,11 @@ gcc_jit_context_new_field (gcc_jit_context *ctxt, /* LOC can be NULL. */ RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type"); RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name"); + RETURN_NULL_IF_FAIL_PRINTF1 ( +type->has_known_size (), +ctxt, loc, +"type has unknown size (type: %s)", +type->get_debug_string ()); return (gcc_jit_field *)ctxt->new_field (loc, type, name); } @@ -1033,6 +1038,11 @@ gcc_jit_context_new_global (gcc_jit_context *ctxt, kind); RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type"); RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name"); + RETURN_NULL_IF_FAIL_PRINTF1 ( +type->has_known_size (), +ctxt, loc, +"type has unknown size (type: %s)", +type->get_debug_string ()); return (gcc_jit_lvalue *)ctxt->new_global (loc, kind, type, name); } @@ -1829,6 +1839,11 @@ gcc_jit_function_new_local (gcc_jit_function *func, "Cannot add locals to an imported function"); RETURN_NULL_IF_FAIL (type, ctxt, loc, "NULL type"); RETURN_NULL_IF_FAIL (name, ctxt, loc, "NULL name"); + RETURN_NULL_IF_FAIL_PRINTF1 ( +type->has_known_size (), +ctxt, loc, +"type has unknown size (type: %s)", +type->get_debug_string ()); return (gcc_jit_lvalue *)func->new_local (loc, type, name); } diff --git a/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c new file mode 100644 index 000..c4e1448 --- /dev/null +++ b/gcc/testsuite/jit.dg/test-error-gcc_jit_context_new_field-opaque-struct.c @@ -0,0 +1,31 @@ +#include <stdlib.h> +#include <stdio.h> + +#include "libgccjit.h" + +#include "harness.h" + +/* Try to put an opaque struct inside another struct + (or union); the API ought to complain. 
*/ + +void +create_code (gcc_jit_context *ctxt, void *user_data) +{ + gcc_jit_struct *t_opaque = +gcc_jit_context_new_opaque_struct (ctxt, NULL, "opaque"); + + (void)gcc_jit_context_new_field (ctxt, NULL, + gcc_jit_struct_as_type (t_opaque), + "f_opaque"); +} + +void +verify_code (gcc_jit_context *ctxt, gcc_jit_result *result) +{ + CHECK_VALUE (result, NULL); + + /* Verify that the correct error message was emitted. */ + CHECK_STRING_VALUE (gcc_jit_context_get_first_error (ctxt), + "gcc_jit_context_new_field: " + "type has unknown size (type:
Re: Tests for libgomp based on OpenMP Examples 4.0.2.
On Tue, Jul 07, 2015 at 08:08:16PM +0300, Maxim Blumental wrote: Comment on the patch: simd-5.f90 file is marked as xfail since the test fails because 'simd collapse' is an unsupported combination for Fortran (which though is valid in OpenMP API). I'll have a look, that is supposed to work. 2015-07-07 19:48 GMT+03:00 Maxim Blumental bvm...@gmail.com: With this letter I propose a patch with tests for libgomp based on OpenMP Examples 4.0.2 both for C and Fortran. The changes are: Renamed existing tests based on OpenMP Examples to make the names more clear. If anything, the test could be renamed to match https://github.com/OpenMP/Examples/tree/master/sources/ filenames, but certainly not to made up names. The Examples-4/ directory is supposed to only contain the tests from the 4.0.* examples document and no other tests. Added 16 tests for simd construct and 10 for depend clause. Any new tests that aren't in Examples 4.0.* document should go one level higher, to libgomp.{c,c++,fortran}/ directly. Jakub
Adjust -fdump-ada-spec to C++14 switch
The Ada side doesn't know what to do with the move constructors of C++11 so the attached patch makes -fdump-ada-spec skip them. Tested on x86_64-suse-linux, applied on the mainline as obvious. 2015-07-07 Eric Botcazou ebotca...@adacore.com c-family/ * c-ada-spec.h (cpp_operation): Add IS_MOVE_CONSTRUCTOR. * c-ada-spec.c (print_ada_declaration): Skip move constructors. cp/ * decl2.c (cpp_check): Deal with IS_MOVE_CONSTRUCTOR. 2015-07-07 Eric Botcazou ebotca...@adacore.com * g++.dg/other/dump-ada-spec-8.C: New test. -- Eric BotcazouIndex: c-family/c-ada-spec.h === --- c-family/c-ada-spec.h (revision 225410) +++ c-family/c-ada-spec.h (working copy) @@ -30,6 +30,7 @@ typedef enum { IS_CONSTRUCTOR, IS_DESTRUCTOR, IS_COPY_CONSTRUCTOR, + IS_MOVE_CONSTRUCTOR, IS_TEMPLATE, IS_TRIVIAL } cpp_operation; Index: c-family/c-ada-spec.c === --- c-family/c-ada-spec.c (revision 225410) +++ c-family/c-ada-spec.c (working copy) @@ -2891,6 +2891,7 @@ print_ada_declaration (pretty_printer *b bool is_constructor = false; bool is_destructor = false; bool is_copy_constructor = false; + bool is_move_constructor = false; if (!decl_name) return 0; @@ -2901,11 +2902,12 @@ print_ada_declaration (pretty_printer *b is_constructor = cpp_check (t, IS_CONSTRUCTOR); is_destructor = cpp_check (t, IS_DESTRUCTOR); is_copy_constructor = cpp_check (t, IS_COPY_CONSTRUCTOR); + is_move_constructor = cpp_check (t, IS_MOVE_CONSTRUCTOR); } - /* Skip copy constructors: some are internal only, and those that are - not cannot be called easily from Ada anyway. */ - if (is_copy_constructor) + /* Skip copy constructors and C++11 move constructors: some are internal + only and those that are not cannot be called easily from Ada. 
*/ + if (is_copy_constructor || is_move_constructor) return 0; if (is_constructor || is_destructor) Index: cp/decl2.c === --- cp/decl2.c (revision 225410) +++ cp/decl2.c (working copy) @@ -4077,6 +4077,8 @@ cpp_check (tree t, cpp_operation op) return DECL_DESTRUCTOR_P (t); case IS_COPY_CONSTRUCTOR: return DECL_COPY_CONSTRUCTOR_P (t); + case IS_MOVE_CONSTRUCTOR: + return DECL_MOVE_CONSTRUCTOR_P (t); case IS_TEMPLATE: return TREE_CODE (t) == TEMPLATE_DECL; case IS_TRIVIAL: /* { dg-do compile } */ /* { dg-options "-fdump-ada-spec" } */ template<class T, class U> class Generic_Array { Generic_Array(); }; template class Generic_Array<char, int>; /* { dg-final { scan-ada-spec-not "access Generic_Array" } } */ /* { dg-final { cleanup-ada-spec } } */
Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)
You are right, I forgot about that. Is there a mode one can use that changes depending on the target architecture (32-bit on 32-bit architectures and 64-bit on 64-bit architectures)? Yes, Pmode does exactly that, but you cannot use it directly in the MD file. Or does one have to add a 32-bit and a 64-bit variant of window_save? Sort of, you can use the P mode iterator, but the name of the pattern will vary so you'll need to adjust the callers. -- Eric Botcazou
RE: fix PR46029: reimplement if conversion of loads and stores
(if-conversion could directly generate masked load/stores of course and not use a scratch-pad at all in that case). IMO that`s a great idea, but I don`t know how to do it. Hints would be welcome. In particular, how does one generate masked load/stores at the GIMPLE level? But are we correctly handling these cases in the current if conversion code? I`m uncertain to what that is intended to refer, but I believe Sebastian would agree that the new if converter is safer than the old one in terms of correctness at the time of running the code being compiled. Abe's changes would seem like a step forward from a correctness standpoint Not to argue, but as a point of humility: Sebastian did by far the most work on this patch. I just modernized it and helped move it along. Even _that_ was done with Sebastian`s help. even if they take us a step backwards from a performance standpoint. For now, we have a few performance regressions, and so far we have found that it`s non-trivial to remove all of those regressions. We may be better off pushing the current patch to trunk and having the performance regressions currently introduced by the new if converter be fixed by later patches. Pushing to trunk gives us excellent visibility amongst GCC hackers, so the code will get more eyeballs than if it lingers in an uncommitted patch or in a branch. I, for one, would love some help in fixing these performance regressions. ;-) If fixing the performance regressions winds up taking too long, perhaps the current imperfect patch could be undone on trunk just before a release is tagged, and then we`ll push it in again when trunk goes back to being allowed to be unstable? According to my analysis of the data near the end of the page at https://gcc.gnu.org/develop.html, we have until roughly April of 2016 to work on not-yet-perfect patches in trunk. 
So the question is whether we get more non-vectorized if-converted code out of this (and thus whether we want to use --param allow-store-data-races to get the old code back which is nicer to less capable CPUs and probably faster than using scatter/gather or masked loads/stores). I do think conditionalizing some of this on the allow-store-data-races makes sense. I think having both the old if-converter and the new one live on in GCC is nontrivial, but not impossible. I also don`t think it`s the best long-term goal, but only a short-term workaround. In the long run, IMO there should be only one if converter, albeit perhaps with tweaking flags [e.g. -fallow-unsafe-if-conversion]. I also wonder if we should really care about load data races (not sure your patch does). According to a recent long discussion I had with Sebastian, our current patch does not have the flaw I was concerned it might have in terms of loads because: [1] the scratchpad is only being used to if-convert assignments to thread-local scalars, never to globals/statics, and because... [2] the gimplifier is supposed to detect the address of this scalar has been taken and when such is detected in the code being compiled, it causes the scalar to no longer look like a scalar in GIMPLE so that we are also safe from stale-data problems that could come from corner-case code that takes the address of a thread-local variable and gives that address to another thread [which then proceeds to overwrite the value in the supposedly-thread-local scalar that belongs to a different thread from the one doing the writing] Regards, Abe
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson jim.wil...@linaro.org wrote: On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law l...@redhat.com wrote: On 06/29/2015 07:15 PM, Jim Wilson wrote: So if these copies require a conversion, then isn't it fundamentally wrong to have a PHI node which copies between them? That would seem to implicate the eipa_sra pass as needing to be aware of these promotions and avoid having these objects with different representations appearing on the lhs/rhs of a PHI node. My years at Cisco didn't give me a chance to work on the SSA passes, so I don't know much about how they work. But looking at this, I see that PHI nodes are eventually handled by emit_partition_copy in tree-outof-ssa.c, which calls convert_to_mode, so it appears that conversions between different (closely related?) types are OK in a PHI node. The problem in this case is that we have the exact same type for the src and dest. The only difference is that the ARM forces sign-extension for signed sub-word parameters and zero-extension for signed sub-word locals. Thus to detect the need for a conversion, you have to have the decls, and we don't have them here. There is also the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr and friends, which aren't used by the cfglayout/tree-outof-ssa.c code for PHI nodes. So we need to copy some of the SUBREG_PROMOTED_* handling into cfglayout/tree-outof-ssa, or modify them to call expand_expr for PHI nodes, and I haven't had any luck getting that to work yet. I still need to learn more about the code to figure out if this is possible. It probably is. The decls for the parameter based SSA names are available, for the PHI destination there might be no decl. I also think that the ARM handling of PROMOTE_MODE is wrong. Treating a signed sub-word as unsigned can lead to inefficient code. This part of the problem is much easier for me to fix. It may be hard to convince ARM maintainers that it should be changed though. 
I need more time to work on this too. I haven't looked at trying to forbid the optimizer from creating PHI nodes connecting parameters to locals. That just sounds like a strange thing to forbid, and seems likely to result in worse code by disabling too many optimizations. But maybe it is the right solution if neither of the other two options work. This should only be done when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing the copy to require a conversion. I don't think disallowing such PHI nodes is the right thing to do. I'd rather expose the TARGET_PROMOTE_FUNCTION_MODE effect earlier by modifying the parameter types during, say, gimplification. Richard. Jim
Re: [patch 9/9] Final patch with all changes
On 07/07/2015 02:51 PM, Andrew MacLeod wrote: *** sel-sched-ir.h(revision 225452) --- sel-sched-ir.h(working copy) *** along with GCC; see the file COPYING3. *** 22,34 #define GCC_SEL_SCHED_IR_H /* For state_t. */ - #include insn-attr.h - #include regset.h /* For reg_note. */ - #include rtl.h - #include bitmap.h - #include sched-int.h - #include cfgloop.h Should probably drop those For state_t/reg_note. comments too. Thanks, Pedro Alves
Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
On 06/29/2015 07:15 PM, Jim Wilson wrote: This is my suggested fix for PR 65932, which is a linux kernel miscompile with gcc-5.1. The problem here is caused by a chain of events. The first is that the relatively new eipa_sra pass creates fake parameters that behave slightly differently than normal parameters. The second is that the optimizer creates phi nodes that copy local variables to fake parameters and/or vice versa. The third is that the out-of-ssa pass assumes that it can emit simple move instructions for these phi nodes. And the fourth is that the ARM port has a PROMOTE_MODE macro that forces QImode and HImode to unsigned, but a TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and short parameters have different in-register representations than local variables, and require a conversion when copying between them, a conversion that the out-of-ssa pass can't easily emit. So if these copies require a conversion, then isn't it fundamentally wrong to have a PHI node which copies between them? That would seem to implicate the eipa_sra pass as needing to be aware of these promotions and avoid having these objects with different representations appearing on the lhs/rhs of a PHI node. Jeff
[gomp4] Handle deviceptr from an outer directive
Hi, This patch fixes an issue where the deviceptr clause in an outer directive was being ignored during implicit variable definition on a nested directive. Committed to gomp-4_0-branch. Jim diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 51aadc0..a721a52 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -116,6 +116,9 @@ enum gimplify_omp_var_data /* Gang-local OpenACC variable. */ GOVD_GANGLOCAL = (1 << 16), + /* OpenACC deviceptr clause. */ + GOVD_USE_DEVPTR = (1 << 17), + GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR | GOVD_LOCAL) @@ -6274,7 +6277,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, } break; } + flags = GOVD_MAP | GOVD_EXPLICIT; + if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR) + flags |= GOVD_USE_DEVPTR; goto do_add; case OMP_CLAUSE_DEPEND: @@ -6662,6 +6668,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) : (flags & GOVD_FORCE_MAP ? GOMP_MAP_FORCE_TOFROM : GOMP_MAP_TOFROM)); + if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST) { @@ -6687,7 +6694,17 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) OMP_CLAUSE_CHAIN (clause) = nc; } else - OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl); + { + if (gimplify_omp_ctxp->outer_context) + { + struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp->outer_context; + splay_tree_node on + = splay_tree_lookup (ctx->variables, (splay_tree_key) decl); + if (on && (on->value & GOVD_USE_DEVPTR)) + OMP_CLAUSE_SET_MAP_KIND (clause, GOMP_MAP_FORCE_PRESENT); + } + OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl); + } } if (code == OMP_CLAUSE_FIRSTPRIVATE && (flags & GOVD_LASTPRIVATE) != 0) {
Re: RE: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
Ping! On 30/04/15 10:40, Hale Wang wrote: -Original Message- From: Hale Wang [mailto:hale.w...@arm.com] Sent: Monday, February 09, 2015 9:54 AM To: Richard Earnshaw Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html. Ping for trunk. Is it ok for trunk now? Thanks, Hale -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Hale Wang Sent: Friday, December 12, 2014 9:36 AM To: gcc-patches Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6- m Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk? -Hale -Original Message- From: Joey Ye [mailto:joey.ye...@gmail.com] Sent: Thursday, November 27, 2014 10:01 AM To: Hale Wang Cc: gcc-patches Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m OK applying to arm/embedded-4_9-branch, though you still need maintainer approval into trunk. - Joey On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang hale.w...@arm.com wrote: Hi, This patch ports the aeabi_idiv routine from Linaro Cortex-Strings (https://git.linaro.org/toolchain/cortex-strings.git), which was contributed by ARM under Free BSD license. The new aeabi_idiv routine is used to replace the one in libgcc/config/arm/lib1funcs.S. This replacement happens within the Thumb1 wrapper. The new routine is under LGPLv3 license. The main advantage of this version is that it can improve the performance of the aeabi_idiv function for Thumb1. This solution will also increase the code size. So it will only be used if __OPTIMIZE_SIZE__ is not defined. Make check passed for armv6-m. OK for trunk? Thanks, Hale Wang libgcc/ChangeLog: 2014-11-26 Hale Wang hale.w...@arm.com * config/arm/lib1funcs.S: Add new wrapper. 
=== diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -306,34 +306,12 @@ LSYM(Lend_fde): #ifdef __ARM_EABI__ .macro THUMB_LDIV0 name signed #if defined(__ARM_ARCH_6M__) - .ifc \signed, unsigned - cmp r0, #0 - beq 1f - mov r0, #0 - mvn r0, r0 @ 0xffffffff -1: - .else - cmp r0, #0 - beq 2f - blt 3f + + push {r0, lr} mov r0, #0 - mvn r0, r0 - lsr r0, r0, #1 @ 0x7fffffff - b 2f -3: mov r0, #0x80 - lsl r0, r0, #24 @ 0x80000000 -2: - .endif - push {r0, r1, r2} - ldr r0, 4f - adr r1, 4f - add r0, r1 - str r0, [sp, #8] - @ We know we are not on armv4t, so pop pc is safe. - pop {r0, r1, pc} - .align 2 -4: - .word __aeabi_idiv0 - 4b + bl SYM(__aeabi_idiv0) + pop {r1, pc} + #elif defined(__thumb2__) .syntax unified .ifc \signed, unsigned @@ -927,7 +905,158 @@ LSYM(Lover7): add dividend, work .endif LSYM(Lgot_result): -.endm +.endm + +#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__) +.macro BranchToDiv n, label + lsr curbit, dividend, \n + cmp curbit, divisor + blo \label +.endm + +.macro DoDiv n + lsr curbit, dividend, \n + cmp curbit, divisor + bcc 1f + lsl curbit, divisor, \n + sub dividend, dividend, curbit + +1: adc result, result +.endm + +.macro THUMB1_Div_Positive + mov result, #0 + BranchToDiv #1, LSYM(Lthumb1_div1) + BranchToDiv #4, LSYM(Lthumb1_div4) + BranchToDiv #8, LSYM(Lthumb1_div8) + BranchToDiv #12, LSYM(Lthumb1_div12) + BranchToDiv #16, LSYM(Lthumb1_div16) +LSYM(Lthumb1_div_large_positive): + mov result, #0xff + lsl divisor, divisor, #8 + rev result, result + lsr curbit, dividend, #16 + cmp curbit, divisor + blo 1f + asr result, #8 + lsl divisor, divisor, #8 + beq LSYM(Ldivbyzero_waypoint) + +1: lsr curbit, dividend, #12 + cmp curbit, divisor + blo LSYM(Lthumb1_div12) + b LSYM(Lthumb1_div16) +LSYM(Lthumb1_div_loop): + lsr divisor, divisor, #8 +LSYM(Lthumb1_div16): + Dodiv #15 + Dodiv #14 + Dodiv #13 + Dodiv #12 +LSYM(Lthumb1_div12): + Dodiv 
#11 + Dodiv #10 + Dodiv #9 + Dodiv #8 + bcs LSYM(Lthumb1_div_loop) +LSYM(Lthumb1_div8): + Dodiv #7 + Dodiv #6 + Dodiv #5 +LSYM(Lthumb1_div5): + Dodiv #4 +LSYM(Lthumb1_div4): + Dodiv #3 +LSYM(Lthumb1_div3): + Dodiv #2 +LSYM(Lthumb1_div2): + Dodiv #1 +LSYM(Lthumb1_div1): + sub
Re: [patch 0/9] Flattening and initial module rebuilding
On 07/07/2015 07:40 AM, Andrew MacLeod wrote: This is a series of 9 patches which does some flattening, some module building, and some basic cleanups. I am presenting them as 9 patches for easier review. The latter couple of patches affect a lot of the same files that follow-on patches then adjust, so I've decided NOT to put the automated changes in with each of those patches. There are 8 patches showing the key changes, and then the 9th patch is an aggregate of the first 8 key changes, plus the final result of the impact on all the source files. This is the only patch I'd like to commit. The automated tools which generate the source changes have been significantly enhanced. When a header is flattened, the source file is checked for the existence of the headers which need moving, and any which are already present are left if they are in the right order. Any duplicates are also removed. A similar process is used when an aggregation file like backend.h or ssa.h is processed. Any occurrences of the aggregated headers are removed from the source file so there are no duplicates. The aggregated headers are typically only placed in a source file if 3 or more of the headers would be replaced. (ie, if only bitmap.h is included, I don't just blindly put backend.h in the file.) This number came from analysis of a fully flattened and include-reduced tree, and seemed to be the sweet spot. With the aggregation and flattening, the order of some includes can get shifted around with other files in between, so the tools also ensure there is a blessed order which will make sure that any pre-reqs are always available. Right now, it's primarily: config.h system.h coretypes.h backend.h tree.h gimple.h rtl.h df.h ssa.h And if any of the aggregators are not present, then any headers which make up the aggregator are in the same relative position. The tools actually produced all these patches with no tweaking to solve compilation failures, which was very helpful. 
The old ones needed some guidance and were a bit finicky. I can adjust any of this quite easily, or present them in a different way if you don't like it this way. Again, my goal is to check in just the final patch which does all the work of the first 8 patches. It would be a lot less turmoil on the branch. I can do it in smaller chunks if need be. The set of 9 patches is fine for the trunk. Just a few discussion points... One of the things I keep thinking about as these changes fly by is your scripts. Is there a reasonable possibility for you to add your scripts to the contrib/ directory or something similar to aid us in any future header file refactoring? Yes, I know that in theory we should never have to do this again, but I also know that reality can be rather different. Presumably the aggregators, by policy, are to have #includes and nothing else, right? If so, we might want a comment to that effect in them. It's a bit of a shame that function.h is in backend.h, along with predict (which is presumably needed by basic-block/cfg?). Jeff
Re: [patch 9/9] Final patch with all changes
On 07/07/2015 06:03 PM, Pedro Alves wrote: On 07/07/2015 02:51 PM, Andrew MacLeod wrote: *** sel-sched-ir.h (revision 225452) --- sel-sched-ir.h (working copy) *** along with GCC; see the file COPYING3. *** 22,34 #define GCC_SEL_SCHED_IR_H /* For state_t. */ - #include insn-attr.h - #include regset.h /* For reg_note. */ - #include rtl.h - #include bitmap.h - #include sched-int.h - #include cfgloop.h Should probably drop those For state_t/reg_note. comments too. Thanks, Pedro Alves Ah right. Previous version had them removed. Missed them when I rebuilt the patch. Thanks Andrew
Re: [PATCH 6/7] Fix DEMANGLE_COMPONENT_LOCAL_NAME
On 07/06/2015 01:39 PM, Mikhail Maltsev wrote: --- libiberty/cp-demangle.c | 7 +++ libiberty/testsuite/demangle-expected | 4 2 files changed, 11 insertions(+) diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index 424b1c5..289a704 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -3243,6 +3243,8 @@ d_expression_1 (struct d_info *di) struct demangle_component *left; struct demangle_component *right; + if (code == NULL) + return NULL; if (op_is_new_cast (op)) left = cplus_demangle_type (di); else @@ -4436,6 +4438,11 @@ d_print_comp_inner (struct d_print_info *dpi, int options, local_name = d_right (typed_name); if (local_name->type == DEMANGLE_COMPONENT_DEFAULT_ARG) local_name = local_name->u.s_unary_num.sub; + if (local_name == NULL) + { + d_print_error (dpi); + return; + } while (local_name->type == DEMANGLE_COMPONENT_RESTRICT_THIS || local_name->type == DEMANGLE_COMPONENT_VOLATILE_THIS || local_name->type == DEMANGLE_COMPONENT_CONST_THIS diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected index 2dbab14..cfa2691 100644 --- a/libiberty/testsuite/demangle-expected +++ b/libiberty/testsuite/demangle-expected @@ -4104,6 +4104,10 @@ _Z111 --format=gnu-v3 _ZDTtl _ZDTtl +# Check for NULL pointer when demangling DEMANGLE_COMPONENT_LOCAL_NAME +--format=gnu-v3 +_ZZN1fEEd_lEv +_ZZN1fEEd_lEv # # Ada (GNAT) tests. # Also OK with a suitable ChangeLog entry. jeff
Re: [PATCH 3/7] Fix trinary op
On 07/06/2015 01:34 PM, Mikhail Maltsev wrote: --- libiberty/cp-demangle.c | 4 +++- libiberty/testsuite/demangle-expected | 6 ++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index 12093cc..44a0a9b 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -3267,7 +3267,9 @@ d_expression_1 (struct d_info *di) struct demangle_component *second; struct demangle_component *third; - if (!strcmp (code, "qu")) + if (code == NULL) + return NULL; + else if (!strcmp (code, "qu")) { /* ?: expression. */ first = d_expression_1 (di); diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected index 6ea64ae..47ca8f5 100644 --- a/libiberty/testsuite/demangle-expected +++ b/libiberty/testsuite/demangle-expected @@ -4091,6 +4091,12 @@ void g1(A1, Bstatic_castbool(1)) _ZNKSt7complexIiE4realB5cxx11Ev std::complexint::real[abi:cxx11]() const # +# Some more crashes revealed by fuzz-testing: +# Check for NULL pointer when demangling trinary operators +--format=gnu-v3 +Av32_f +Av32_f +# # Ada (GNAT) tests. # # Simple test. OK with a suitable ChangeLog entry. And a generic question on the testsuite -- presumably it turns on type demangling? I wanted to verify the flow through d_expression_1 was what I expected it to be and it took a while to realize that c++filt doesn't demangle types by default, thus Av32_f would demangle to Av32_f without ever getting into d_expression_1. jeff
Re: [PATCH 4/7] Fix int overflow
On 07/06/2015 06:04 PM, Mikhail Maltsev wrote: On 07.07.2015 1:55, Jeff Law wrote: len = d_number (di); - if (len <= 0) + if (len <= 0 || len > INT_MAX) return NULL; ret = d_identifier (di, len); di->last_name = ret; Isn't this only helpful if sizeof (long) > sizeof (int)? Otherwise the compiler is going to eliminate that newly added test, right? So with that in mind, what happens on i686-unknown-linux with this test? Jeff Probably it should be fine, because the problem occurred when len became negative after implicit conversion to int (d_identifier does not check for negative length, but it does check that length does not exceed total string length). In this case (i.e. on ILP32 targets) len will not change sign after conversion to int (because it's a no-op). I'm not completely sure about compiler warnings, but AFAIR, in multilib build libiberty is also built for 32-bit target, and I did not get any additional warnings. You may need -Wtype-limits to see the warning. I'm not questioning whether or not the test will cause a problem, but instead questioning if the test does what you expect it to do on a 32bit host. On a host where sizeof (int) == sizeof (long), that len > INT_MAX test is always going to be false. If you want to do overflow testing, you have to compute len in a wider type. You might consider using long long or int64_t depending on the outcome of a configure test. Falling back to a simple long if the host compiler doesn't have long long or int64_t. Interesting exercise feeding those tests into demangle.com :-0 A suitably interested party might be able to exploit that overflow. jeff
Re: [PATCH 7/7] Fix several crashes in d_find_pack
On 07/06/2015 01:40 PM, Mikhail Maltsev wrote: --- libiberty/cp-demangle.c | 3 +++ libiberty/testsuite/demangle-expected | 12 ++++++++++++ 2 files changed, 15 insertions(+) diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index 289a704..4ca285e 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -4203,6 +4203,9 @@ d_find_pack (struct d_print_info *dpi, case DEMANGLE_COMPONENT_CHARACTER: case DEMANGLE_COMPONENT_FUNCTION_PARAM: case DEMANGLE_COMPONENT_UNNAMED_TYPE: +case DEMANGLE_COMPONENT_FIXED_TYPE: +case DEMANGLE_COMPONENT_DEFAULT_ARG: +case DEMANGLE_COMPONENT_NUMBER: return NULL; case DEMANGLE_COMPONENT_EXTENDED_OPERATOR: diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected index cfa2691..b58cea2 100644 --- a/libiberty/testsuite/demangle-expected +++ b/libiberty/testsuite/demangle-expected @@ -4108,6 +4108,18 @@ _ZDTtl --format=gnu-v3 _ZZN1fEEd_lEv _ZZN1fEEd_lEv +# Handle DEMANGLE_COMPONENT_FIXED_TYPE in d_find_pack +--format=gnu-v3 +DpDFT_ +DpDFT_ +# Likewise, DEMANGLE_COMPONENT_DEFAULT_ARG +--format=gnu-v3 +DpZ1fEd_ +DpZ1fEd_ +# Likewise, DEMANGLE_COMPONENT_NUMBER (??? result is probably still wrong) +--format=gnu-v3 +DpDv1_c +(char __vector(1))... # # Ada (GNAT) tests. # OK with a suitable ChangeLog entry. FWIW, demangler.com doesn't give any results for that case. It just returns DpDv1_c Jeff
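The structure of the fix is worth spelling out: d_find_pack walks the component tree with a switch over node kinds, and every leaf kind must explicitly return NULL, otherwise the walk falls through to code that expects child pointers the leaf does not have. A simplified sketch of that pattern; the enum and node type here are illustrative, not libiberty's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified component tree.  */
enum kind { K_NAME, K_NUMBER, K_PACK, K_BINARY };

struct node
{
  enum kind k;
  struct node *left, *right;   /* only meaningful for K_BINARY */
};

/* Recursive search for a pack node.  Each leaf kind must be listed in the
   "return NULL" group, which is what the patch adds for FIXED_TYPE,
   DEFAULT_ARG and NUMBER in the real d_find_pack.  */
static struct node *
find_pack (struct node *n)
{
  if (n == NULL)
    return NULL;
  switch (n->k)
    {
    case K_PACK:
      return n;                /* found the pack */
    case K_NAME:
    case K_NUMBER:             /* leaf kinds: stop here, no children */
      return NULL;
    case K_BINARY:
      {
        struct node *f = find_pack (n->left);
        return f ? f : find_pack (n->right);
      }
    }
  return NULL;
}
```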
Re: [patch 0/9] Flattening and initial module rebuilding
On 07/07/2015 06:21 PM, Jeff Law wrote: On 07/07/2015 07:40 AM, Andrew MacLeod wrote: I can adjust any of this quite easily, or present them in a different way if you don't like it this way. Again, my goal is to check in just the final patch which does all the work of the first 8 patches. It would be a lot less turmoil on the branch. I can do it in smaller chunks if need be. The set of 9 patches is fine for the trunk. Just a few discussion points... One of the things I keep thinking about as these changes fly by is your scripts. Is there a reasonable possibility for you to add your scripts to the contrib/ directory or something similar to aid us in any future header file refactoring? Yes, I know that in theory we should never have to do this again, but I also know that reality can be rather different. Yes, with a bit of tweaking and enhancement they can be generally useful. They are all in python. And no one is allowed to make comments like "OMG that's so inefficient" or "what a horrible way to do that" :-) My goal was getting things done, and sometimes the brute force approach works great when the machines are fast enough :-) Presumably the aggregators, by policy, are to have #includes and nothing else, right? If so, we might want a comment to that effect in them. Yeah, will do. It's a bit of a shame that function.h is in backend.h, along with predict (which is presumably needed by basic-block/cfg?). Yeah, once things settle down someone could tweak things more. If I make the tools available, people can do their own analysis and adjusting. function.h provides cfun, which is used all over the place. 9 backend header files use it, and a few like gimple.h actually require struct function to be defined. predict.h is actually required by gimple.h for a few reasons: enum br_predictor is used in parameter lists, and a few inlines use the TAKEN, NOT_TAKEN macros. It's also needed by cfghooks.h, and between those 2 files, it's just needed by a very good chunk of the backend. ...
219 of the 263 files which include backend.h need it. We could move the 2 enums and TAKEN/NOT_TAKEN to coretypes or something like that and it would probably cut the requirements for it by a *lot*. Andrew For the sake of amusement, here's the output from my initial include reduction logs for each file. Its basically all the unique errors produced by trying to remove the header from every source file in libbackend.a: predict.h: gimple_h : use of enum ‘br_predictor’ without previous declaration gimple_h : use of enum ‘prediction’ without previous declaration gimple_h : ‘TAKEN’ was not declared in this scope gimple_h : ‘NOT_TAKEN’ was not declared in this scope cfghooks_h : use of enum ‘br_predictor’ without previous declaration (1) : predict_h - cfghooks_h use of enum ‘br_predictor’ without previous declaration (4) : predict_h - gimple_h use of enum ‘br_predictor’ without previous declaration use of enum ‘prediction’ without previous declaration ‘TAKEN’ was not declared in this scope ‘NOT_TAKEN’ was not declared in this scope function.h: emit_rtl_h : field ‘expr’ has incomplete type ‘expr_status’ emit_rtl_h : field ‘emit’ has incomplete type ‘emit_status’ emit_rtl_h : field ‘varasm’ has incomplete type ‘varasm_status’ emit_rtl_h : field ‘subsections’ has incomplete type ‘function_subsections’ emit_rtl_h : field ‘eh’ has incomplete type ‘rtl_eh’ emit_rtl_h : invalid use of incomplete type ‘struct sequence_stack’ gimple_h : invalid use of incomplete type ‘struct function’ gimple_ssa_h : invalid use of incomplete type ‘const struct function’ gimple_ssa_h : ‘cfun’ was not declared in this scope tree_ssanames_h : ‘cfun’ was not declared in this scope cfgloop_h : ‘loops_for_fn’ was not declared in this scope cfgloop_h : ‘current_loops’ was not declared in this scope cfgloop_h : ‘cfun’ was not declared in this scope cilk_h : invalid use of incomplete type ‘struct function’ ssa_iterators_h : ‘cfun’ was not declared in this scope tree_scalar_evolution_h : ‘cfun’ was not 
declared in this scope (1) : function_h - tree_scalar_evolution_h ‘cfun’ was not declared in this scope (1) : function_h - tree_ssanames_h ‘cfun’ was not declared in this scope (1) : function_h - ssa_iterators_h ‘cfun’ was not declared in this scope (1) : function_h - cilk_h invalid use of incomplete type ‘struct function’ (1) : function_h - gimple_h invalid use of incomplete type ‘struct function’ (2) : function_h - gimple_ssa_h invalid use of incomplete type ‘const struct function’ ‘cfun’ was not declared in this scope (3) : function_h - cfgloop_h ‘loops_for_fn’ was not declared in this scope ‘current_loops’ was not declared in this scope ‘cfun’ was not declared in this scope (6) : function_h - emit_rtl_h field ‘expr’ has incomplete type
RE: [PATCH] MIPS: fix failing branch range checks for micromips
Hi Andrew, -Original Message- From: Andrew Bennett [mailto:andrew.benn...@imgtec.com] Sent: Tuesday, July 07, 2015 12:13 PM To: Moore, Catherine; gcc-patches@gcc.gnu.org Cc: Matthew Fortune Subject: RE: [PATCH] MIPS: fix failing branch range checks for micromips Ok to commit? testsuite/ * gcc.target/mips/branch-2.c: Change NOMIPS16 to NOCOMPRESSION. * gcc.target/mips/branch-3.c: Ditto * gcc.target/mips/branch-4.c: Ditto. * gcc.target/mips/branch-5.c: Ditto. * gcc.target/mips/branch-6.c: Ditto. * gcc.target/mips/branch-7.c: Ditto. * gcc.target/mips/branch-8.c: Ditto. * gcc.target/mips/branch-9.c: Ditto. * gcc.target/mips/branch-10.c: Ditto. * gcc.target/mips/branch-11.c: Ditto. * gcc.target/mips/branch-12.c: Ditto. * gcc.target/mips/branch-13.c: Ditto. These are OK, except for the splitting of the scan-assembler statements. Please change occurrences of: +/* { dg-final { scan-assembler +\tld\t\\\$1,%got_page\\(\[^)\]*\\)\\(\\\$3\\)\\n } } */ to: +/* { dg-final { scan-assembler \tld\t\\\$1,%got_page\\(\[^)\]*\\)\\(\\\$3\\)\\n } } */ before committing. * gcc.target/mips/branch-14.c: Ditto. * gcc.target/mips/branch-15.c: Ditto. The modifications for these two files need to be removed. These are execution tests and the multilib that is used to link them is important. If the libraries are not compatible with the NOCOMPRESSION attribute, then the link step will fail. You could work around this problem by enabling interlinking, but I think the best approach is to leave these two tests alone. * gcc.target/mips/umips-branch-5.c: New file. * gcc.target/mips/umips-branch-6.c: New file. * gcc.target/mips/umips-branch-7.c: New file. * gcc.target/mips/umips-branch-8.c: New file. * gcc.target/mips/umips-branch-9.c: New file. * gcc.target/mips/umips-branch-10.c: New file. * gcc.target/mips/umips-branch-11.c: New file. * gcc.target/mips/umips-branch-12.c: New file. * gcc.target/mips/umips-branch-13.c: New file. * gcc.target/mips/umips-branch-14.c: New file. 
* gcc.target/mips/umips-branch-15.c: New file. * gcc.target/mips/umips-branch-16.c: New file. Same comment as above on the scan-assembler statements. * gcc.target/mips/umips-branch-17.c: New file. * gcc.target/mips/umips-branch-18.c: New file. These two tests suffer from the same problem as above. They should be deleted altogether. * gcc.target/mips/branch-helper.h (OCCUPY_0x1): New define. (OCCUPY_0xfffc): New define. This is okay. Thanks, Catherine diff --git a/gcc/testsuite/gcc.target/mips/branch-10.c b/gcc/testsuite/gcc.target/mips/branch-10.c index e2b1b5f..eb21c16 100644 --- a/gcc/testsuite/gcc.target/mips/branch-10.c +++ b/gcc/testsuite/gcc.target/mips/branch-10.c @@ -4,7 +4,7 @@ #include branch-helper.h -NOMIPS16 void +NOCOMPRESSION void foo (int (*bar) (void), int *x) { *x = bar (); diff --git a/gcc/testsuite/gcc.target/mips/branch-11.c b/gcc/testsuite/gcc.target/mips/branch-11.c index 962eb1b..bd8e834 100644 --- a/gcc/testsuite/gcc.target/mips/branch-11.c +++ b/gcc/testsuite/gcc.target/mips/branch-11.c @@ -8,7 +8,7 @@ #include branch-helper.h -NOMIPS16 void +NOCOMPRESSION void foo (int (*bar) (void), int *x) { *x = bar (); diff --git a/gcc/testsuite/gcc.target/mips/branch-12.c b/gcc/testsuite/gcc.target/mips/branch-12.c index 4aef160..4944634 100644 --- a/gcc/testsuite/gcc.target/mips/branch-12.c +++ b/gcc/testsuite/gcc.target/mips/branch-12.c @@ -4,7 +4,7 @@ #include branch-helper.h -NOMIPS16 void +NOCOMPRESSION void foo (int (*bar) (void), int *x) { *x = bar (); diff --git a/gcc/testsuite/gcc.target/mips/branch-13.c b/gcc/testsuite/gcc.target/mips/branch-13.c index 8a6fb04..f5269b9 100644 --- a/gcc/testsuite/gcc.target/mips/branch-13.c +++ b/gcc/testsuite/gcc.target/mips/branch-13.c @@ -8,7 +8,7 @@ #include branch-helper.h -NOMIPS16 void +NOCOMPRESSION void foo (int (*bar) (void), int *x) { *x = bar (); diff --git a/gcc/testsuite/gcc.target/mips/branch-14.c b/gcc/testsuite/gcc.target/mips/branch-14.c index 026417e..c2eecc3 100644 --- 
a/gcc/testsuite/gcc.target/mips/branch-14.c +++ b/gcc/testsuite/gcc.target/mips/branch-14.c @@ -4,14 +4,14 @@ #include branch-helper.h void __attribute__((noinline)) -foo (volatile int *x) +NOCOMPRESSION foo (volatile int *x) { if (__builtin_expect (*x == 0, 1)) OCCUPY_0x1fff8; } int -main (void) +NOCOMPRESSION main (void) { int x = 0; int y = 1; diff --git a/gcc/testsuite/gcc.target/mips/branch-15.c b/gcc/testsuite/gcc.target/mips/branch-15.c index dee7a05..89e25f3 100644 --- a/gcc/testsuite/gcc.target/mips/branch-15.c +++ b/gcc/testsuite/gcc.target/mips/branch-15.c @@ -4,14 +4,14 @@ #include branch-helper.h void -foo
Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real
On 07/07/2015 06:37 AM, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from previous patch. 15_native_interpret_real.patch commit e2e7ca148960a82fc88128820f17e7cbd14173cb Author: Alan Lawrence <alan.lawre...@arm.com> Date: Thu Apr 9 10:54:40 2015 +0100 Fix native_interpret_real for HFmode floats on Bigendian with UNITS_PER_WORD=4 (with missing space) OK with ChangeLog in proper form. jeff
[PATCH] libgomp: Introduce gomp_thread::spare_team
Try to re-use the previous team to avoid the use of malloc() and free() in the normal case where the number of threads is the same. Avoid superfluous destruction and initialization of team synchronization objects. Using the microbenchmark posted here https://gcc.gnu.org/ml/gcc-patches/2008-03/msg00930.html shows an improvement in the parallel bench test case (target x86_64-unknown-linux-gnu, median out of 9 test runs, iteration count increased to 20). Before the patch: parallel bench 11.2284 seconds. After the patch: parallel bench 10.7575 seconds. libgomp/ChangeLog 2015-07-07 Sebastian Huber <sebastian.hu...@embedded-brains.de> * libgomp.h (gomp_thread): Add spare_team field. * team.c (gomp_thread_start): Initialize spare team for non-TLS targets. (gomp_new_team): Use spare team if possible. (free_team): Destroy more team objects. (gomp_free_thread): Free spare team if necessary. (free_non_nested_team): New. (gomp_team_end): Move some team object destructions to free_team(). Use free_non_nested_team().
--- libgomp/libgomp.h | 3 +++ libgomp/team.c | 63 --- 2 files changed, 45 insertions(+), 21 deletions(-) diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 5ed0f78..563c1e2 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -448,6 +448,9 @@ struct gomp_thread /* User pthread thread pool */ struct gomp_thread_pool *thread_pool; + + /* Spare team ready for re-use in gomp_new_team() */ + struct gomp_team *spare_team; }; diff --git a/libgomp/team.c b/libgomp/team.c index b98b233..cc19eb0 100644 --- a/libgomp/team.c +++ b/libgomp/team.c @@ -77,6 +77,7 @@ gomp_thread_start (void *xdata) struct gomp_thread local_thr; thr = &local_thr; pthread_setspecific (gomp_tls_key, thr); + thr->spare_team = NULL; #endif gomp_sem_init (&thr->release, 0); @@ -140,19 +141,35 @@ gomp_thread_start (void *xdata) struct gomp_team * gomp_new_team (unsigned nthreads) { + struct gomp_thread *thr = gomp_thread (); + struct gomp_team *spare_team = thr->spare_team; struct gomp_team *team; - size_t size; int i; - size = sizeof (*team) + nthreads * (sizeof (team->ordered_release[0]) - + sizeof (team->implicit_task[0])); - team = gomp_malloc (size); + if (spare_team && spare_team->nthreads == nthreads) + { + thr->spare_team = NULL; + team = spare_team; + } + else + { + size_t extra = sizeof (team->ordered_release[0]) + + sizeof (team->implicit_task[0]); + team = gomp_malloc (sizeof (*team) + nthreads * extra); + +#ifndef HAVE_SYNC_BUILTINS + gomp_mutex_init (&team->work_share_list_free_lock); +#endif + gomp_barrier_init (&team->barrier, nthreads); + gomp_sem_init (&team->master_release, 0); + gomp_mutex_init (&team->task_lock); + + team->nthreads = nthreads; + } team->work_share_chunk = 8; #ifdef HAVE_SYNC_BUILTINS team->single_count = 0; -#else - gomp_mutex_init (&team->work_share_list_free_lock); #endif team->work_shares_to_free = &team->work_shares[0]; gomp_init_work_share (&team->work_shares[0], false, nthreads); @@ -163,14 +180,9 @@ gomp_new_team (unsigned nthreads) team->work_shares[i].next_free = &team->work_shares[i + 1]; team->work_shares[i].next_free = NULL; - team->nthreads = nthreads; - gomp_barrier_init (&team->barrier, nthreads); - - gomp_sem_init (&team->master_release, 0); team->ordered_release = (void *) &team->implicit_task[nthreads]; team->ordered_release[0] = &team->master_release; - gomp_mutex_init (&team->task_lock); team->task_queue = NULL; team->task_count = 0; team->task_queued_count = 0; @@ -187,6 +199,10 @@ gomp_new_team (unsigned nthreads) static void free_team (struct gomp_team *team) { + gomp_sem_destroy (&team->master_release); +#ifndef HAVE_SYNC_BUILTINS + gomp_mutex_destroy (&team->work_share_list_free_lock); +#endif gomp_barrier_destroy (&team->barrier); gomp_mutex_destroy (&team->task_lock); free (team); @@ -225,6 +241,8 @@ gomp_free_thread (void *arg __attribute__((unused))) { struct gomp_thread *thr = gomp_thread (); struct gomp_thread_pool *pool = thr->thread_pool; + if (thr->spare_team) + free_team (thr->spare_team); if (pool) { if (pool->threads_used > 0) @@ -835,6 +853,18 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads, free (affinity_thr); } +static void +free_non_nested_team (struct gomp_team *team, struct gomp_thread *thr) +{ + struct gomp_thread_pool *pool = thr->thread_pool; + if (pool->last_team) + { + if (thr->spare_team) + free_team (thr->spare_team); + thr->spare_team = pool->last_team; + } + pool->last_team = team; +} /* Terminate the current team. This is only to be called by the master thread. We assume that we must wait for the other threads. */ @@ -894,21 +924,12 @@ gomp_team_end (void)
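The core idea of the patch, stripped of libgomp's details, is a one-slot cache of the most recently freed object: hand it back when the next request matches, skipping both malloc/free and the re-initialization of embedded synchronization objects. A minimal sketch with hypothetical names; unlike libgomp, which keys the spare off the per-thread gomp_thread, this version uses a single global slot and is not thread-safe:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for gomp_team: only the field the cache keys on.  */
struct team
{
  unsigned nthreads;
  /* ... plus synchronization objects whose init/destroy we want to skip */
};

static struct team *spare;   /* at most one cached team */

static struct team *
team_new (unsigned nthreads)
{
  if (spare && spare->nthreads == nthreads)
    {
      struct team *t = spare;
      spare = NULL;            /* reuse: no malloc, no re-initialization */
      return t;
    }
  struct team *t = malloc (sizeof *t);   /* gomp_malloc aborts on failure */
  t->nthreads = nthreads;                /* one-time initialization */
  return t;
}

static void
team_end (struct team *t)
{
  if (spare)
    free (spare);              /* keep at most one spare team */
  spare = t;
}
```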
Re: [PATCH 1/3] [ARM] PR63870 NEON error messages
Alan Lawrence wrote: I note some parts of this duplicate my https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has been pinged a couple of times. Both Charles' patch and my two contain parts the other does not... Cheers, Alan Charles Baylis wrote: gcc/ChangeLog: DATE Charles Baylis <charles.bay...@linaro.org> * config/arm/arm-builtins.c (enum arm_type_qualifiers): New enumerators qualifier_lane_index, qualifier_struct_load_store_lane_index. (arm_expand_neon_args): New parameter. Remove ellipsis. Handle NEON argument qualifiers. (arm_expand_neon_builtin): Handle NEON argument qualifiers. * config/arm/arm-protos.h (arm_neon_lane_bounds): New prototype. * config/arm/arm.c (arm_neon_lane_bounds): New function. Further to that - the main difference/conflict between Charles' patch and mine looks to be that I added the const_tree parameter to the existing neon_lane_bounds method, whereas Charles' patch adds a new method, arm_neon_lane_bounds. --Alan
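Both patches centre on the same primitive: validating that a lane index passed to an intrinsic lies within the vector's bounds. A trivial standalone sketch of the check; the real (arm_)neon_lane_bounds reports out-of-range lanes through GCC's error machinery rather than returning a flag:

```c
#include <assert.h>

/* A lane index for a vector intrinsic must lie in the half-open range
   [low, high), where high is the element count of the vector type.
   Out-of-range indices should produce a diagnostic at compile time
   rather than silently generating bad code.  */
static int
lane_in_bounds (int lane, int low, int high)
{
  return lane >= low && lane < high;
}
```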
[PATCH 6/16][ARM] Remaining float16 intrinsics: vld..., vst..., vget_low/high, vcombine
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01341.html commit ae6264b144d25fadcbf219e68ddf3d8c5f40be34 Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Dec 11 11:53:59 2014 + ARM 4/4 v2: v(ld|st)[234](q?|_lane|_dup), vcombine, vget_(low|high) (v2 w/ V_uf_sclr) All are tied together with so many iterators! Also vec_extract diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 17e39d8..1ee0a3d 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -241,6 +241,12 @@ typedef struct { #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \ VAR9 (T, N, A, B, C, D, E, F, G, H, I) \ VAR1 (T, N, J) +#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \ + VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) \ + VAR1 (T, N, K) +#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \ + VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \ + VAR1 (T, N, L) /* The NEON builtin data can be found in arm_neon_builtins.def. The mode entries in the following table correspond to the key type of the diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index db73c70..93fb44f 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -162,6 +162,16 @@ typedef struct uint64x2x2_t uint64x2_t val[2]; } uint64x2x2_t; +typedef struct float16x4x2_t +{ + float16x4_t val[2]; +} float16x4x2_t; + +typedef struct float16x8x2_t +{ + float16x8_t val[2]; +} float16x8x2_t; + typedef struct float32x2x2_t { float32x2_t val[2]; @@ -288,6 +298,16 @@ typedef struct uint64x2x3_t uint64x2_t val[3]; } uint64x2x3_t; +typedef struct float16x4x3_t +{ + float16x4_t val[3]; +} float16x4x3_t; + +typedef struct float16x8x3_t +{ + float16x8_t val[3]; +} float16x8x3_t; + typedef struct float32x2x3_t { float32x2_t val[3]; @@ -414,6 +434,16 @@ typedef struct uint64x2x4_t uint64x2_t val[4]; } uint64x2x4_t; +typedef struct float16x4x4_t +{ + float16x4_t val[4]; +} float16x4x4_t; + +typedef struct float16x8x4_t +{ + float16x8_t val[4]; +} 
float16x8x4_t; + typedef struct float32x2x4_t { float32x2_t val[4]; @@ -6031,6 +6061,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b) return (int64x2_t)__builtin_neon_vcombinedi (__a, __b); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcombine_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_neon_vcombinev4hf (__a, __b); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vcombine_f32 (float32x2_t __a, float32x2_t __b) { @@ -6105,6 +6141,12 @@ vget_high_s64 (int64x2_t __a) return (int64x1_t)__builtin_neon_vget_highv2di (__a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vget_high_f16 (float16x8_t __a) +{ + return __builtin_neon_vget_highv8hf (__a); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vget_high_f32 (float32x4_t __a) { @@ -6165,6 +6207,12 @@ vget_low_s32 (int32x4_t __a) return (int32x2_t)__builtin_neon_vget_lowv4si (__a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vget_low_f16 (float16x8_t __a) +{ + return __builtin_neon_vget_lowv8hf (__a); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vget_low_f32 (float32x4_t __a) { @@ -8712,6 +8760,12 @@ vld1_s64 (const int64_t * __a) return (int64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_f16 (const float16_t * __a) +{ + return __builtin_neon_vld1v4hf ((const __builtin_neon_hf *) __a); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vld1_f32 (const float32_t * __a) { @@ -8786,6 +8840,12 @@ vld1q_s64 (const int64_t * __a) return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vld1q_f16 (const float16_t * __a) +{ + return __builtin_neon_vld1v8hf ((const 
__builtin_neon_hf *) __a); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vld1q_f32 (const float32_t * __a) { @@ -9183,6 +9243,12 @@ vst1_s64 (int64_t * __a, int64x1_t __b) } __extension__ static __inline void __attribute__ ((__always_inline__)) +vst1_f16 (float16_t * __a, float16x4_t __b) +{ + __builtin_neon_vst1v4hf ((__builtin_neon_hf *) __a, __b); +} + +__extension__ static __inline void __attribute__ ((__always_inline__)) vst1_f32 (float32_t * __a, float32x2_t __b) { __builtin_neon_vst1v2sf ((__builtin_neon_sf *) __a, __b); @@ -9257,6 +9323,12 @@ vst1q_s64 (int64_t * __a, int64x2_t __b) } __extension__ static __inline void __attribute__ ((__always_inline__)) +vst1q_f16 (float16_t * __a, float16x8_t __b) +{ + __builtin_neon_vst1v8hf ((__builtin_neon_hf *) __a, __b); +} + +__extension__ static __inline
[PATCH 5/16][ARM] Add float16x8_t intrinsics
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01337.html commit 336eb16d3061131fe8d28fad4a473d00768bfe5c Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Dec 9 15:06:38 2014 + ARM float16x8_t intrinsics (v2 - fix v[sg]etq_lane_f16, add vreinterpretq_p16_f16, no vdup_n/lane/vmov_n) diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index a958f63..db73c70 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -5282,6 +5282,15 @@ vgetq_lane_s32 (int32x4_t __a, const int __b) return (int32_t)__builtin_neon_vget_lanev4si (__a, __b); } +#define vgetq_lane_f16(__v, __idx) \ + __extension__\ +({ \ + float16x8_t __vec = (__v); \ + __builtin_arm_lane_check (8, __idx); \ + float16_t __res = __vec[__idx]; \ + __res; \ +}) + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vgetq_lane_f32 (float32x4_t __a, const int __b) { @@ -5424,6 +5433,16 @@ vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __c) return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, __b, __c); } +#define vsetq_lane_f16(__e, __v, __idx)\ + __extension__\ +({ \ + float16_t __elem = (__e);\ + float16x8_t __vec = (__v); \ + __builtin_arm_lane_check (8, __idx); \ + __vec[__idx] = __elem; \ + __vec; \ +}) + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c) { @@ -8907,6 +8926,12 @@ vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, const int __c) return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, __b, __c); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vld1q_lane_f16 (const float16_t * __a, float16x8_t __b, const int __c) +{ + return vsetq_lane_f16 (*__a, __b, __c); +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c) { @@ -9062,6 +9087,13 @@ vld1q_dup_s32 (const 
int32_t * __a) return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vld1q_dup_f16 (const float16_t * __a) +{ + float16_t __f = *__a; + return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f }; +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vld1q_dup_f32 (const float32_t * __a) { @@ -12856,6 +12888,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a) } __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_f16 (float16x8_t __a) +{ + return (poly8x16_t) __a; +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) vreinterpretq_p8_f32 (float32x4_t __a) { return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a); @@ -12932,6 +12970,12 @@ vreinterpretq_p16_p8 (poly8x16_t __a) } __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_f16 (float16x8_t __a) +{ + return (poly16x8_t) __a; +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) vreinterpretq_p16_f32 (float32x4_t __a) { return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a); @@ -13001,6 +13045,88 @@ vreinterpretq_p16_u32 (uint32x4_t __a) return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p8 (poly8x16_t __a) +{ + return (float16x8_t) __a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p16 (poly16x8_t __a) +{ + return (float16x8_t) __a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_f32 (float32x4_t __a) +{ + return (float16x8_t) __a; +} + +#ifdef __ARM_FEATURE_CRYPTO +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p64 (poly64x2_t __a) +{ + return 
(float16x8_t) __a; +} + +#endif +#ifdef __ARM_FEATURE_CRYPTO +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_p128 (poly128_t __a) +{ + return (float16x8_t) __a; +} + +#endif +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_s64 (int64x2_t __a) +{ + return (float16x8_t) __a; +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_f16_u64 (uint64x2_t __a) +{ + return
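All of these vreinterpretq_* functions compile to nothing: they only relabel the existing 128-bit pattern with a new element type, leaving the bytes untouched. The scalar analogue of that contract can be sketched with the memcpy idiom, which is the strict-aliasing-safe way to reinterpret bits in plain C:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bit pattern of a float as a uint32_t, without any
   conversion: the same bytes, viewed as a different type.  This is what
   a vreinterpret-style intrinsic does for whole vector registers.  */
static uint32_t
reinterpret_f32_u32 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);   /* compilers fold this to a plain move */
  return u;
}
```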
[PATCH 7/16][AArch64] Add basic fp16 support
Same as https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01340.html except that two of the tests have been moved into the next patch. (The remaining test is AArch64 only.) gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New. (aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16. * config/aarch64/aarch64-modes.def: Add HFmode. * config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define __ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP. * config/aarch64/aarch64.c (aarch64_init_libfuncs, aarch64_promoted_type): New. (aarch64_float_const_representable_p): Disable HFmode. (aarch64_mangle_type): Mangle half-precision floats to "Dh". (TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type. (TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs. * config/aarch64/aarch64.md (mov<mode>): Include HFmode using GPF_F16. (movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New. * config/aarch64/iterators.md (GPF_F16): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/f16_movs_1.c: New test. commit 989af1492bbf268be1ecfae06f3303b90ae514c8 Author: Alan Lawrence <alan.lawre...@arm.com> Date: Tue Dec 2 12:57:39 2014 +0000 AArch64 1/6: Basic HFmode support (less tests), aarch64_fp16_type_node, patterns, mangling, predefines. No --fp16-format option. Disable constants as NYI. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index ec60955..cfb2dc1 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -439,6 +439,9 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = { }; #undef ENTRY +/* This type is not SIMD-specific; it is the user-visible __fp16.
*/ +static tree aarch64_fp16_type_node = NULL_TREE; + static tree aarch64_simd_intOI_type_node = NULL_TREE; static tree aarch64_simd_intEI_type_node = NULL_TREE; static tree aarch64_simd_intCI_type_node = NULL_TREE; @@ -849,6 +852,12 @@ aarch64_init_builtins (void) = add_builtin_function ("__builtin_aarch64_set_fpsr", ftype_set_fpr, AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE); + aarch64_fp16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (aarch64_fp16_type_node) = 16; + layout_type (aarch64_fp16_type_node); + + (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, "__fp16"); + if (TARGET_SIMD) aarch64_init_simd_builtins (); if (TARGET_CRC32) diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index b17b90d..c30059b 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -36,6 +36,10 @@ CC_MODE (CC_DLTU); CC_MODE (CC_DGEU); CC_MODE (CC_DGTU); +/* Half-precision floating point for arm_neon.h float16_t. */ +FLOAT_MODE (HF, 2, 0); +ADJUST_FLOAT_FORMAT (HF, &ieee_half_format); + /* Vector modes. */ VECTOR_MODES (INT, 8);/* V8QI V4HI V2SI. */ VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI. */ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 17bae08..f338033 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -8339,6 +8339,10 @@ aarch64_mangle_type (const_tree type) if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type)) return "St9__va_list"; + /* Half-precision float. */ + if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16) + return "Dh"; + /* Mangle AArch64-specific internal types. TYPE_NAME is non-NULL_TREE for builtin types. */ if (TYPE_NAME (type) != NULL) @@ -9578,6 +9582,33 @@ aarch64_start_file (void) default_file_start(); } +static void +aarch64_init_libfuncs (void) +{ + /* Half-precision float operations.
The compiler handles all operations + with NULL libfuncs by converting to SFmode. */ + + /* Conversions. */ + set_conv_libfunc (trunc_optab, HFmode, SFmode, "__gnu_f2h_ieee"); + set_conv_libfunc (sext_optab, SFmode, HFmode, "__gnu_h2f_ieee"); + + /* Arithmetic. */ + set_optab_libfunc (add_optab, HFmode, NULL); + set_optab_libfunc (sdiv_optab, HFmode, NULL); + set_optab_libfunc (smul_optab, HFmode, NULL); + set_optab_libfunc (neg_optab, HFmode, NULL); + set_optab_libfunc (sub_optab, HFmode, NULL); + + /* Comparisons. */ + set_optab_libfunc (eq_optab, HFmode, NULL); + set_optab_libfunc (ne_optab, HFmode, NULL); + set_optab_libfunc (lt_optab, HFmode, NULL); + set_optab_libfunc (le_optab, HFmode, NULL); + set_optab_libfunc (ge_optab, HFmode, NULL); + set_optab_libfunc (gt_optab, HFmode, NULL); + set_optab_libfunc (unord_optab, HFmode, NULL); +} + /* Target hook for c_mode_for_suffix. */ static machine_mode aarch64_c_mode_for_suffix (char suffix) @@ -9616,7 +9647,8 @@ aarch64_float_const_representable_p (rtx x) if (!CONST_DOUBLE_P (x))
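For reference, the two conversion libfuncs registered here repack the IEEE fields between binary32 (1 sign / 8 exponent / 23 mantissa bits, bias 127) and binary16 (1 / 5 / 10 bits, bias 15). A deliberately simplified sketch that handles only normal, in-range values; the real __gnu_f2h_ieee in libgcc also handles subnormals, infinities, NaNs, and rounding:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Repack a normal binary32 value into binary16.  Assumes the value is a
   normal number whose exponent fits the half-precision range; mantissa
   bits beyond 10 are truncated rather than rounded.  */
static uint16_t
f32_to_f16_normal (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);                 /* grab the bit pattern */
  uint32_t sign = (u >> 16) & 0x8000;        /* sign moves to bit 15 */
  int32_t exp = (int32_t) ((u >> 23) & 0xFF) - 127 + 15;   /* rebias */
  uint32_t mant = (u >> 13) & 0x3FF;         /* top 10 mantissa bits */
  return (uint16_t) (sign | ((uint32_t) exp << 10) | mant);
}
```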
[PATCH 8/16][ARM/AArch64 Testsuite] Add basic fp16 tests
These were originally part of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01340.html but I have moved them into their own subdirectory and adapted them to execute on ARM also (as per https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00656.html) gcc/testsuite/ChangeLog: * gcc.target/aarch64/fp16/fp16.exp: New. * gcc.target/aarch64/fp16/f16_convs_1.c: New. * gcc.target/aarch64/fp16/f16_convs_2.c: New. commit bc5045c0d3dd34b8cb94910281384f9ab9880325 Author: Alan Lawrence <alan.lawre...@arm.com> Date: Thu May 7 10:08:12 2015 +0100 (ARM+AArch64) Add gcc.target/aarch64/fp16, f16_conv_[12].c tests diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c new file mode 100644 index 000..a1c95fd --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-mfp16-format=ieee" { target arm*-*-* } } */ + +extern void abort (void); + +#define EPSILON 0.0001 + +int +main (int argc, char **argv) +{ + float f1 = 3.14159f; + float f2 = 2.718f; + /* This 'assembler' statement should be portable between ARM and AArch64. */ + asm volatile ("" : : : "memory"); + __fp16 in1 = f1; + __fp16 in2 = f2; + + /* Do the addition on __fp16's (implicitly converts both operands to + float32, adds, converts back to f16, then we convert back to f32). */ + __fp16 res1 = in1 + in2; + asm volatile ("" : : : "memory"); + float f_res_1 = res1; + + /* Do the addition on float32's (we convert both operands to f32, and add, + as above, but skip the final conversion f32 -> f16 -> f32).
*/ + float f1a = in1; + float f2a = in2; + float f_res_2 = f1a + f2a; + + if (__builtin_fabs (f_res_2 - f_res_1) > EPSILON) +abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c new file mode 100644 index 000..6aa3e59 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-mfp16-format=ieee" {target arm*-*-*} } */ + +extern void abort (void); + +#define EPSILON 0.0001 + +int +main (int argc, char **argv) +{ + int i1 = 3; + int i2 = 2; + /* This 'assembler' should be portable across ARM and AArch64. */ + asm volatile ("" : : : "memory"); + + __fp16 in1 = i1; + __fp16 in2 = i2; + + /* Do the addition on __fp16's (implicitly converts both operands to + float32, adds, converts back to f16, then we convert to int). */ + __fp16 res1 = in1 + in2; + asm volatile ("" : : : "memory"); + int result1 = res1; + + /* Do the addition on int's (we convert both operands directly to int, add, + and we're done). */ + int result2 = ((int) in1) + ((int) in2); + + if (__builtin_abs (result2 - result1) > EPSILON) +abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp new file mode 100644 index 000..7dc8d65 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp @@ -0,0 +1,43 @@ +# Tests of 16-bit floating point (__fp16), for both ARM and AArch64. +# Copyright (C) 2015 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. 
+# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# <http://www.gnu.org/licenses/>. + +# GCC testsuite that uses the `dg.exp' driver. + +# Exit immediately if this isn't an ARM or AArch64 target. +if {![istarget arm*-*-*] + && ![istarget aarch64*-*-*]} then { + return +} + +# Load support procs. +load_lib gcc-dg.exp + +# If a testcase doesn't have special options, use these. +global DEFAULT_CFLAGS +if ![info exists DEFAULT_CFLAGS] then { +set DEFAULT_CFLAGS " -ansi -pedantic-errors" +} + +# Initialize `dg'. +dg-init + +# Main loop. +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cC\]]] \ + $DEFAULT_CFLAGS + +# All done. +dg-finish
Re: [PR66726] Factor conversion out of COND_EXPR
On 07/07/15 07:37, Jeff Law wrote: On 07/04/2015 06:32 AM, Kugan wrote: I would also verify that this turns into a MIN_EXPR. I think the patch as-written won't detect the MIN_EXPR until the _next_ time phi-opt is called. And one of the benefits we're really looking for here is to remove barriers to finding these min/max expressions. + diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index d2a5cee..12ab9ee 100644 --- a/gcc/tree-ssa-phiopt.c +++ b/gcc/tree-ssa-phiopt.c @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3. If not see static unsigned int tree_ssa_phiopt_worker (bool, bool); static bool conditional_replacement (basic_block, basic_block, edge, edge, gphi *, tree, tree); +static bool factor_out_conditional_conversion (edge, edge, gphi *, tree, tree); static int value_replacement (basic_block, basic_block, edge, edge, gimple, tree, tree); static bool minmax_replacement (basic_block, basic_block, @@ -342,6 +343,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads) cfgchanged = true; else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; + else if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1)) +cfgchanged = true; So this transformation does not inherently change the CFG, so setting CFGCHANGED isn't really appropriate and may trigger unnecessary cleanups. I think the transformation needs to occur prior to this if-elseif-else block since the transformation should enable the code in the if-elseif-else block to find more optimization opportunities. That will also imply we either restart after the transformation applies, or we update the local variables that are used as arguments to conditional_replacement, abs_replacement and minmax_replacement. } } @@ -410,6 +413,108 @@ replace_phi_edge_with_variable (basic_block cond_block, bb->index); } +/* PR66726: Factor conversion out of COND_EXPR. 
If the arguments of the PHI + stmt are CONVERT_STMT, factor out the conversion and perform the conversion + to the result of PHI stmt. */ + +static bool +factor_out_conditional_conversion (edge e0, edge e1, gphi *phi, + tree arg0, tree arg1) +{ + gimple def0 = NULL, def1 = NULL, new_stmt; + tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE; + tree temp, result; + gimple_stmt_iterator gsi; + + /* One of the arguments has to be an SSA_NAME and the other argument can + be an SSA_NAME or INTEGER_CST. */ + if ((TREE_CODE (arg0) != SSA_NAME + && TREE_CODE (arg0) != INTEGER_CST) + || (TREE_CODE (arg1) != SSA_NAME + && TREE_CODE (arg1) != INTEGER_CST) + || (TREE_CODE (arg0) == INTEGER_CST + && TREE_CODE (arg1) == INTEGER_CST)) +return false; + + /* Handle only PHI statements with two arguments. TODO: If all + other arguments to PHI are INTEGER_CST, we can handle more + than two arguments too. */ + if (gimple_phi_num_args (phi) != 2) +return false; If you're just handling two arguments, then it's probably easiest to just swap arg0/arg1 and e0/e1 if arg0 is not an SSA_NAME, like this: /* First canonicalize to simplify tests. */ if (TREE_CODE (arg0) != SSA_NAME) { std::swap (arg0, arg1); std::swap (e0, e1); } if (TREE_CODE (arg0) != SSA_NAME) return false; That simplifies things a bit since you're going to know from this point forward that arg0 is an SSA_NAME. + + /* If arg0 is an SSA_NAME and the stmt which defines arg0 is + a CONVERT_STMT, use the LHS as new_arg0. */ + if (TREE_CODE (arg0) == SSA_NAME) +{ + def0 = SSA_NAME_DEF_STMT (arg0); + if (!is_gimple_assign (def0) + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def0))) +return false; + new_arg0 = gimple_assign_rhs1 (def0); +} Use gimple_assign_cast_p rather than checking CONVERT_EXPR_CODE_P directly, so something like: /* Now see if ARG0 was defined by a typecast. */ gimple arg0_def = SSA_NAME_DEF_STMT (arg0); if (!is_gimple_assign (arg0_def) || !gimple_assign_cast_p (arg0_def)) return false; Similarly for arg1 when it's an SSA_NAME. 
+ + /* If types of new_arg0 and new_arg1 are different, bail out. */ + if (TREE_TYPE (new_arg0) != TREE_TYPE (new_arg1)) +return false; Do we want to restrict this to just integral types? I haven't thought about it too deeply, so perhaps not. + + /* Replace the PHI stmt with the new_arg0 and new_arg1. Also insert + a new CONVERT_STMT that converts the phi results. */ + gsi = gsi_after_labels (gimple_bb (phi)); + result = PHI_RESULT (phi); + temp = make_ssa_name (TREE_TYPE (new_arg0), phi); + + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fprintf (dump_file, "PHI "); + print_generic_expr (dump_file, gimple_phi_result (phi), 0); +
Re: [PATCH] Do not use floating point registers when compiling with -msoft-float for SPARC
On 2015-07-07 12:32, Eric Botcazou wrote: ChangeLog must just describe the what, nothing more. If the rationale is not obvious, then a comment must be added _in the code_ itself. * config/sparc/sparc.c (sparc_function_value_regno_p): Do not return true on %f0 for a target without FPU. * config/sparc/sparc.md (untyped_call): Do not save %f0 for a target without FPU. (untyped_return): Do not load %f0 for a target without FPU. Understood. Thank you for looking at my patches and coming up with improvements. -- Daniel Cederman
[PATCH 1/16][ARM] PR/63870 Add qualifier to check lane bounds in expand
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html (While this falls under PR/63870, and I will link to that in the ChangeLog, it is only a small step towards fixing that PR.) commit 9812db88cff20a505365f68f4065d2fbab998c9c Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 11:04:49 2014 + ARM: Add qualifier_lane_index diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index f960e0a..7f5bf87 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -77,7 +77,9 @@ enum arm_type_qualifiers /* qualifier_const_pointer | qualifier_map_mode */ qualifier_const_pointer_map_mode = 0x86, /* Polynomial types. */ - qualifier_poly = 0x100 + qualifier_poly = 0x100, + /* Lane indices - must be within range of previous argument = a vector. */ + qualifier_lane_index = 0x200 }; /* The qualifier_internal allows generation of a unary builtin from @@ -108,21 +110,40 @@ arm_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] /* T (T, immediate). */ static enum arm_type_qualifiers -arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] +arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_immediate }; +#define BINOP_IMM_QUALIFIERS (arm_binop_imm_qualifiers) + +/* T (T, lane index). */ +static enum arm_type_qualifiers +arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_lane_index }; #define GETLANE_QUALIFIERS (arm_getlane_qualifiers) /* T (T, T, T, immediate). */ static enum arm_type_qualifiers -arm_lanemac_qualifiers[SIMD_MAX_BUILTIN_ARGS] +arm_mac_n_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_none, qualifier_immediate }; -#define LANEMAC_QUALIFIERS (arm_lanemac_qualifiers) +#define MAC_N_QUALIFIERS (arm_mac_n_qualifiers) + +/* T (T, T, T, lane index). 
*/ +static enum arm_type_qualifiers +arm_mac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, + qualifier_none, qualifier_lane_index }; +#define MAC_LANE_QUALIFIERS (arm_mac_lane_qualifiers) /* T (T, T, immediate). */ static enum arm_type_qualifiers -arm_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] +arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate }; +#define TERNOP_IMM_QUALIFIERS (arm_ternop_imm_qualifiers) + +/* T (T, T, lane index). */ +static enum arm_type_qualifiers +arm_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, qualifier_lane_index }; #define SETLANE_QUALIFIERS (arm_setlane_qualifiers) /* T (T, T). */ @@ -1927,6 +1948,7 @@ arm_expand_unop_builtin (enum insn_code icode, typedef enum { NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + NEON_ARG_LANE_INDEX, NEON_ARG_MEMORY, NEON_ARG_STOP } builtin_arg; @@ -2043,6 +2065,16 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, op[argc] = copy_to_mode_reg (mode[argc], op[argc]); break; + case NEON_ARG_LANE_INDEX: + /* Previous argument must be a vector, which this indexes. */ + gcc_assert (argc > 0); + if (CONST_INT_P (op[argc])) + { + enum machine_mode vmode = mode[argc - 1]; + neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode), exp); + } + /* Fall through - if the lane index isn't a constant then + the next case will error. 
*/ case NEON_ARG_CONSTANT: if (!(*insn_data[icode].operand[opno].predicate) (op[argc], mode[argc])) @@ -2170,7 +2202,9 @@ arm_expand_neon_builtin (int fcode, tree exp, rtx target) int operands_k = k - is_void; int expr_args_k = k - 1; - if (d->qualifiers[qualifiers_k] & qualifier_immediate) + if (d->qualifiers[qualifiers_k] & qualifier_lane_index) + args[k] = NEON_ARG_LANE_INDEX; + else if (d->qualifiers[qualifiers_k] & qualifier_immediate) args[k] = NEON_ARG_CONSTANT; else if (d->qualifiers[qualifiers_k] & qualifier_maybe_immediate) { diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 62f91ef..25bdebd 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -86,7 +86,7 @@ extern void neon_pairwise_reduce (rtx, rtx, machine_mode, extern rtx neon_make_constant (rtx); extern tree arm_builtin_vectorized_function (tree, tree, tree); extern void neon_expand_vector_init (rtx, rtx); -extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT); +extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree); extern void neon_const_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT); extern HOST_WIDE_INT neon_element_bits (machine_mode); extern void neon_reinterpret (rtx, rtx); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e79a369..6e074ea 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -12788,12 +12788,12 @@ neon_expand_vector_init (rtx target, rtx vals) } /*
[PATCH 0/16][ARM/AArch64] Float16_t support, v2
This is a respin of the series at https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html, plus the two ARM patches on which these depend (https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html). These two somewhat duplicate Charles Baylis' lane-bounds-checking patch at https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00140.html, in that they both port some of the same AArch64 infrastructure onto ARM; while each has some parts the other doesn't, there don't look to be any serious conflicts; if Charles' patches were to go in first, I would not expect any major problems in rebasing mine over his. Changes since the first version of the float16 series are * to separate out the (non-vector) tests from gcc.testsuite/aarch64 into a .../fp16 subdirectory, as per https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00656.html * dropped the patch rewriting advsimd-intrinsics.exp, following other changes along similar lines by Sandra Loosemore and Christopher Lyon; * Rebased over other testsuite changes, including dropping expected values from many tests of intrinsics with no fp16 variant, and introducing a CHECK_RESULTS_NO_FP16 macro. * Changed the mechanism on ARM by which we passed in -mfpu=neon-fp16: we now try to pass this into all tests, but fail the vcvt_f16.c if float16 is still not supported (e.g. there was a conflicting -mfpu=neon passed to the compiler, or we are running on HW which does not support the instructions). Are these OK for trunk? Thanks, Alan
[PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from previous patch. commit e2e7ca148960a82fc88128820f17e7cbd14173cb Author: Alan Lawrence alan.lawre...@arm.com Date: Thu Apr 9 10:54:40 2015 +0100 Fix native_interpret_real for HFmode floats on Bigendian with UNITS_PER_WORD >= 4 (with missing space) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index e61d946..15a10f0 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -7622,7 +7622,7 @@ native_interpret_real (tree type, const unsigned char *ptr, int len) offset += byte % UNITS_PER_WORD; } else - offset = BYTES_BIG_ENDIAN ? 3 - byte : byte; + offset = BYTES_BIG_ENDIAN ? MIN (3, total_bytes - 1) - byte : byte; value = ptr[offset + ((bitpos / BITS_PER_UNIT) & ~3)]; tmp[bitpos / 32] |= (unsigned long)value << (bitpos & 31);
[PATCH 14/16][ARM/AArch64 testsuite] Update advsimd-intrinsics tests to add float16 vectors
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01347.html, removing many default values of 0x333; to complete that, I introduced new macros CHECK_RESULTS{,_NAMED}_NO_FP16, as writing the same list of vector types in four places seemed too many. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (hfloat16_t, vdup_n_f16, CHECK_RESULTS_NO_FP16, CHECK_RESULTS_NAMED_NO_FP16): New. (result, expected, clean_results): Add float16x4 and float16x8 cases. (CHECK_RESULTS_NAMED): Likewise, using CHECK_RESULTS_NAMED_NO_FP16. (CHECK_RESULTS): Redefine using CHECK_RESULTS_NAMED. (DECL_VARIABLE_64BITS_VARIANTS): Add float16x4 case. (DECL_VARIABLE_128BITS_VARIANTS): Add float16x8 case. * gcc.target/aarch64/advsimd-intrinsics/compute-data-ref.h (buffer, buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and float16x8. * gcc.target/aarch64/advsimd-intrinsics/vbsl.c (exec_vbsl): Change CHECK_RESULTS to CHECK_RESULTS_NO_FP16. * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c (exec_vdup_lane): Likewise. * gcc.target/aarch64/advsimd-intrinsics/vext.c (exec_vext): Likewise. * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c (exec_vdup_vmov): Change CHECK_RESULTS_NAMED to CHECK_RESULTS_NAMED_NO_FP16. * gcc.target/aarch64/advsimd-intrinsics/vcombine.c: Add expected results for float16x4 and float16x8. (exec_vcombine): Add test of float16x4 -> float16x8 case. * gcc.target/aarch64/advsimd-intrinsics/vcreate.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vget_high.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vget_low.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vld1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c: Likewise. 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h index 4e728d5..cf9c358 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h @@ -7,6 +7,7 @@ #include inttypes.h /* helper type, to help write floating point results in integer form. */ +typedef uint16_t hfloat16_t; typedef uint32_t hfloat32_t; typedef uint64_t hfloat64_t; @@ -132,6 +133,7 @@ static ARRAY(result, uint, 32, 2); static ARRAY(result, uint, 64, 1); static ARRAY(result, poly, 8, 8); static ARRAY(result, poly, 16, 4); +static ARRAY(result, float, 16, 4); static ARRAY(result, float, 32, 2); static ARRAY(result, int, 8, 16); static ARRAY(result, int, 16, 8); @@ -143,6 +145,7 @@ static ARRAY(result, uint, 32, 4); static ARRAY(result, uint, 64, 2); static ARRAY(result, poly, 8, 16); static ARRAY(result, poly, 16, 8); +static ARRAY(result, float, 16, 8); static ARRAY(result, float, 32, 4); #ifdef __aarch64__ static ARRAY(result, float, 64, 2); @@ -160,6 +163,7 @@ extern ARRAY(expected, uint, 32, 2); extern ARRAY(expected, uint, 64, 1); extern ARRAY(expected, poly, 8, 8); extern ARRAY(expected, poly, 16, 4); +extern ARRAY(expected, hfloat, 16, 4); extern ARRAY(expected, hfloat, 32, 2); extern ARRAY(expected, int, 8, 16); extern ARRAY(expected, int, 16, 8); @@ -171,38 +175,11 @@ extern ARRAY(expected, uint, 32, 4); extern ARRAY(expected, uint, 64, 2); extern ARRAY(expected, poly, 8, 16); extern ARRAY(expected, poly, 16, 8); +extern ARRAY(expected, hfloat, 16, 8); extern ARRAY(expected, hfloat, 32, 4); extern ARRAY(expected, hfloat, 64, 2); -/* Check results. Operates on all possible vector types. 
*/ -#define CHECK_RESULTS(test_name,comment)\ - { \ -CHECK(test_name, int, 8, 8, PRIx8, expected, comment); \ -CHECK(test_name, int, 16, 4, PRIx16, expected, comment); \ -CHECK(test_name, int, 32, 2, PRIx32, expected, comment); \ -CHECK(test_name, int, 64, 1, PRIx64, expected, comment); \ -CHECK(test_name, uint, 8, 8, PRIx8, expected, comment); \ -CHECK(test_name, uint, 16, 4, PRIx16, expected, comment); \ -CHECK(test_name, uint, 32, 2, PRIx32, expected, comment); \ -CHECK(test_name, uint, 64, 1, PRIx64, expected, comment); \ -CHECK(test_name, poly, 8, 8, PRIx8, expected, comment); \ -CHECK(test_name, poly, 16, 4, PRIx16, expected, comment); \ -CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment); \ - \ -CHECK(test_name, int, 8, 16, PRIx8, expected, comment); \ -CHECK(test_name, int, 16, 8, PRIx16, expected, comment); \ -CHECK(test_name, int, 32, 4, PRIx32, expected, comment); \ -
[PATCH 13/16][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix
Unchanged since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01345.html commit 214fcc00475a543a79ed444f9a64061215397cc8 Author: Alan Lawrence alan.lawre...@arm.com Date: Wed Jan 28 13:01:31 2015 + AArch64 6/N: vcvt{,_high}_f32_f16 (using vect_par_cnst_hi_half, fixing bigendian indices) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 8bcab72..9869b73 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -361,11 +361,11 @@ BUILTIN_VSDQ_I_DI (UNOP, abs, 0) BUILTIN_VDQF (UNOP, abs, 2) - VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) + VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf) VAR1 (BINOP, float_truncate_hi_, 0, v4sf) VAR1 (BINOP, float_truncate_hi_, 0, v8hf) - VAR1 (UNOP, float_extend_lo_, 0, v2df) + VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf) BUILTIN_VDF (UNOP, float_truncate_lo_, 0) /* Implemented by aarch64_ld1<VALL_F16:mode>. */ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 2dc54e1..1a7d858 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1691,36 +1691,57 @@ ;; Float widening operations. 
-(define_insn "vec_unpacks_lo_v4sf" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (vec_select:V2SF - (match_operand:V4SF 1 "register_operand" "w") - (parallel [(const_int 0) (const_int 1)]) - )))] +(define_insn "aarch64_simd_vec_unpacks_lo_<mode>" + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") +(float_extend:<VWIDE> (vec_select:<VHALF> + (match_operand:VQ_HSF 1 "register_operand" "w") + (match_operand:VQ_HSF 2 "vect_par_cnst_lo_half" "") + )))] "TARGET_SIMD" - "fcvtl\\t%0.2d, %1.2s" + "fcvtl\\t%0.<Vwtype>, %1.<Vhalftype>" [(set_attr "type" "neon_fp_cvt_widen_s")] ) -(define_insn "aarch64_float_extend_lo_v2df" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (match_operand:V2SF 1 "register_operand" "w")))] +(define_expand "vec_unpacks_lo_<mode>" + [(match_operand:<VWIDE> 0 "register_operand" "") + (match_operand:VQ_HSF 1 "register_operand" "")] "TARGET_SIMD" - "fcvtl\\t%0.2d, %1.2s" + { +rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false); +emit_insn (gen_aarch64_simd_vec_unpacks_lo_<mode> (operands[0], + operands[1], p)); +DONE; + } +) + +(define_insn "aarch64_simd_vec_unpacks_hi_<mode>" + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") +(float_extend:<VWIDE> (vec_select:<VHALF> + (match_operand:VQ_HSF 1 "register_operand" "w") + (match_operand:VQ_HSF 2 "vect_par_cnst_hi_half" "") + )))] + "TARGET_SIMD" + "fcvtl2\\t%0.<Vwtype>, %1.<Vtype>" [(set_attr "type" "neon_fp_cvt_widen_s")] ) -(define_insn "vec_unpacks_hi_v4sf" - [(set (match_operand:V2DF 0 "register_operand" "=w") - (float_extend:V2DF - (vec_select:V2SF - (match_operand:V4SF 1 "register_operand" "w") - (parallel [(const_int 2) (const_int 3)]) - )))] +(define_expand "vec_unpacks_hi_<mode>" + [(match_operand:<VWIDE> 0 "register_operand" "") + (match_operand:VQ_HSF 1 "register_operand" "")] + "TARGET_SIMD" + { +rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true); +emit_insn (gen_aarch64_simd_vec_unpacks_hi_<mode> (operands[0], + operands[1], p)); +DONE; + } +) +(define_insn "aarch64_float_extend_lo_<Vwide>" + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") + 
(float_extend:<VWIDE> + (match_operand:VDF 1 "register_operand" "w")))] "TARGET_SIMD" - "fcvtl2\\t%0.2d, %1.4s" + "fcvtl\\t%0<Vmwtype>, %1<Vmtype>" [(set_attr "type" "neon_fp_cvt_widen_s")] ) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index ff1a45c..4f0636f 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -6026,10 +6026,6 @@ vaddlvq_u32 (uint32x4_t a) result; \ }) -/* vcvt_f32_f16 not supported */ - -/* vcvt_high_f32_f16 not supported */ - #define vcvt_n_f32_s32(a, b)\ __extension__ \ ({ \ @@ -13420,6 +13416,12 @@ vcvt_high_f32_f64 (float32x2_t __a, float64x2_t __b) /* vcvt (float -> double). */ +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vcvt_f32_f16 (float16x4_t __a) +{ + return __builtin_aarch64_float_extend_lo_v4sf (__a); +} + __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vcvt_f64_f32 (float32x2_t __a) { @@ -13427,6 +13429,12 @@ vcvt_f64_f32 (float32x2_t __a) return __builtin_aarch64_float_extend_lo_v2df (__a); } +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vcvt_high_f32_f16 (float16x8_t __a) +{ + return __builtin_aarch64_vec_unpacks_hi_v8hf (__a); +} + __extension__ static __inline float64x2_t __attribute__
[PATCH 12/16][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup
This is the remainder of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01343.html combined with https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01344.html, putting together all the intrinsics that didn't require anything outside arm_neon.h. Also update the existing tests in aarch64/. gcc/ChangeLog: * config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16, vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16, vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32, vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32, vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16, vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16, vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32, vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32, vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16, vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16, vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16, vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16, vreinterpret_u16_f16, vreinterpret_u32_f16, vreinterpretq_p8_f16, vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16, vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16, vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16, vreinterpretq_u32_f16, vget_low_f16, vget_high_f16, vld1_dup_f16, vld1q_dup_f16): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vget_high_1.c: Add float16x8-float16x4 case. * gcc.target/aarch64/vget_low_1.c: Likewise. commit beb21a6bce76d4fbedb13fcf25796563b27f6bae Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Jun 29 18:46:49 2015 +0100 [AArch64 5/N v2] vreinterpret, vget_(low|high), vld1(q?)_dup. 
update tests for vget_low/high diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index b915754..ff1a45c 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -2891,6 +2891,12 @@ vgetq_lane_u64 (uint64x2_t __a, const int __b) /* vreinterpret */ __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_f16 (float16x4_t __a) +{ + return (poly8x8_t) __a; +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) vreinterpret_p8_f64 (float64x1_t __a) { return (poly8x8_t) __a; @@ -2987,6 +2993,12 @@ vreinterpretq_p8_s64 (int64x2_t __a) } __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) +vreinterpretq_p8_f16 (float16x8_t __a) +{ + return (poly8x16_t) __a; +} + +__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__)) vreinterpretq_p8_f32 (float32x4_t __a) { return (poly8x16_t) __a; @@ -3023,6 +3035,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a) } __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_f16 (float16x4_t __a) +{ + return (poly16x4_t) __a; +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) vreinterpret_p16_f64 (float64x1_t __a) { return (poly16x4_t) __a; @@ -3119,6 +3137,12 @@ vreinterpretq_p16_s64 (int64x2_t __a) } __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) +vreinterpretq_p16_f16 (float16x8_t __a) +{ + return (poly16x8_t) __a; +} + +__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__)) vreinterpretq_p16_f32 (float32x4_t __a) { return (poly16x8_t) __a; @@ -3154,6 +3178,156 @@ vreinterpretq_p16_p8 (poly8x16_t __a) return (poly16x8_t) __a; } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_f64 (float64x1_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) 
+vreinterpret_f16_s8 (int8x8_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_s16 (int16x4_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_s32 (int32x2_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_s64 (int64x1_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_f32 (float32x2_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_u8 (uint8x8_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_u16 (uint16x4_t __a) +{ + return (float16x4_t) __a; +} +
RE: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS
-Original Message- From: Andrew Bennett [mailto:andrew.benn...@imgtec.com] Sent: Tuesday, July 07, 2015 6:53 AM To: gcc-patches@gcc.gnu.org Cc: Moore, Catherine; Matthew Fortune Subject: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS Hi, When building the call-[1,5,6].c tests for micromips the jrc rather than the jr instruction is used to call the tail* functions. I have updated the test output to allow the jrc instruction to be matched. I have tested this on the mips-mti-elf target using mips32r2/{-mno- micromips/-mmicromips} test options and there are no new regressions. The patch and ChangeLog are below. Ok to commit? testsuite/ * gcc.target/mips/call-1.c: Allow testcase to match the jrc instruction. * gcc.target/mips/call-5.c: Ditto. * gcc.target/mips/call-6.c: Ditto. OK.
Re: Clean-ups in match.pd
On Mon, Jul 6, 2015 at 4:08 PM, Richard Biener richard.guent...@gmail.com wrote: On Sat, Jul 4, 2015 at 4:34 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, these are just some minor changes. I believe I had already promised a build_ function to match integer_each_onep. Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like *-match.c takes about 10 minutes to compile in stage2 these days). Ouch. I have some changes to the code generation in the queue which also supports a more natural if structure (else and elif). Eventually that helps a bit but I suppose the main issue is simply from the large functions. They can be split quite easily I think, but passing down all relevant state might turn out to be tricky unless we start using nested functions here ... (and IIRC those are not supported in C++) Just checking in my dev tree (-O0 build with checking enabled, thus similar to the stage2 situation) reveals nothing interesting. The checkers take up most of the time: CFG verifier: 21.27 ( 8%) usr 0.01 ( 1%) sys 21.46 ( 8%) wall 0 kB ( 0%) ggc early inlining heuristics: 12.59 ( 5%) usr 0.03 ( 2%) sys 12.61 ( 5%) wall 10826 kB ( 1%) ggc tree SSA verifier : 26.30 (10%) usr 0.01 ( 1%) sys 26.34 (10%) wall 0 kB ( 0%) ggc tree STMT verifier : 50.44 (20%) usr 0.10 ( 6%) sys 50.27 (20%) wall 0 kB ( 0%) ggc that's everything >= 5% Trying to figure out if there is some gross algorithms in here (yes, we now verify stuff quite often...) Richard. Richard. 2015-07-06 Marc Glisse marc.gli...@inria.fr * match.pd: Remove element_mode inside HONOR_*. (~ (-A) -> A - 1, ~ (A - 1) -> -A): Handle complex types. (~X | X -> -1, ~X ^ X -> -1): Merge. * tree.c (build_each_one_cst): New function. * tree.h (build_each_one_cst): Likewise. -- Marc Glisse Index: match.pd === --- match.pd(revision 225411) +++ match.pd(working copy) @@ -101,7 +101,7 @@ negative value by 0 gives -0, not +0. 
*/ (simplify (mult @0 real_zerop@1) - (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (element_mode (type))) + (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)) @1)) /* In IEEE floating point, x*1 is not equivalent to x for snans. @@ -108,8 +108,8 @@ Likewise for complex arithmetic with signed zeros. */ (simplify (mult @0 real_onep) - (if (!HONOR_SNANS (element_mode (type)) - && (!HONOR_SIGNED_ZEROS (element_mode (type)) + (if (!HONOR_SNANS (type) + && (!HONOR_SIGNED_ZEROS (type) || !COMPLEX_FLOAT_TYPE_P (type))) (non_lvalue @0))) @@ -116,8 +116,8 @@ /* Transform x * -1.0 into -x. */ (simplify (mult @0 real_minus_onep) - (if (!HONOR_SNANS (element_mode (type)) - && (!HONOR_SIGNED_ZEROS (element_mode (type)) + (if (!HONOR_SNANS (type) + && (!HONOR_SIGNED_ZEROS (type) || !COMPLEX_FLOAT_TYPE_P (type))) (negate @0))) @@ -165,7 +165,7 @@ (rdiv @0 @0) (if (FLOAT_TYPE_P (type) && ! HONOR_NANS (type) - && ! HONOR_INFINITIES (element_mode (type))) + && ! HONOR_INFINITIES (type)) { build_one_cst (type); })) /* Optimize -A / A to -1.0 if we don't care about @@ -174,19 +174,19 @@ (rdiv:c @0 (negate @0)) (if (FLOAT_TYPE_P (type) && ! HONOR_NANS (type) - && ! HONOR_INFINITIES (element_mode (type))) + && ! HONOR_INFINITIES (type)) { build_minus_one_cst (type); })) /* In IEEE floating point, x/1 is not equivalent to x for snans. */ (simplify (rdiv @0 real_onep) - (if (!HONOR_SNANS (element_mode (type))) + (if (!HONOR_SNANS (type)) (non_lvalue @0))) /* In IEEE floating point, x/-1 is not equivalent to -x for snans. */ (simplify (rdiv @0 real_minus_onep) - (if (!HONOR_SNANS (element_mode (type))) + (if (!HONOR_SNANS (type)) (negate @0))) /* If ARG1 is a constant, we can convert this to a multiply by the @@ -297,9 +297,10 @@ @1) /* ~x | x -> -1 */ Please also adjust this comment. Ok with that change. Thanks, Richard. -(simplify - (bit_ior:c (convert? @0) (convert? (bit_not @0))) - (convert { build_all_ones_cst (TREE_TYPE (@0)); })) +(for op (bit_ior bit_xor plus) + (simplify + (op:c (convert? @0) (convert? (bit_not @0))) + (convert { build_all_ones_cst (TREE_TYPE (@0)); }))) /* x ^ x -> 0 */ (simplify @@ -311,11 +312,6 @@ (bit_xor @0 integer_all_onesp@1) (bit_not @0)) -/* ~X ^ X is -1. */ -(simplify - (bit_xor:c (bit_not @0) @0) - { build_all_ones_cst (type); }) - /* x & ~0 -> x */ (simplify (bit_and @0 integer_all_onesp) @@ -603,11 +599,11 @@ (simplify (bit_not (convert? (negate @0))) (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) - (convert (minus @0 { build_one_cst (TREE_TYPE (@0)); })))) + (convert (minus @0 { build_each_one_cst (TREE_TYPE (@0)); })))) /* Convert ~ (A - 1) or ~ (A + -1) to -A.
Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)
2015-06-26 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.md: Window save takes a single integer

This will probably break in 64-bit mode: the operand can be a DImode register. -- Eric Botcazou
Re: [Patch, fortran, pr66578, v1] [F2008] Invalid free on allocate(...,source=a(:)) in block
Hi all, hi Paul, Paul thanks for the review. Committed as r225507. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de Index: gcc/fortran/trans-expr.c === *** gcc/fortran/trans-expr.c (revision 223641) --- gcc/fortran/trans-expr.c (working copy) *** gfc_conv_procedure_call (gfc_se * se, gf *** 5877,5882 --- 5877,5896 fntype = TREE_TYPE (TREE_TYPE (se->expr)); se->expr = build_call_vec (TREE_TYPE (fntype), se->expr, arglist); + /* Allocatable scalar function results must be freed and nullified + after use. This necessitates the creation of a temporary to + hold the result to prevent duplicate calls. */ + if (!byref && sym->ts.type != BT_CHARACTER + && sym->attr.allocatable && !sym->attr.dimension) + { + tmp = gfc_create_var (TREE_TYPE (se->expr), NULL); + gfc_add_modify (se->pre, tmp, se->expr); + se->expr = tmp; + tmp = gfc_call_free (tmp); + gfc_add_expr_to_block (post, tmp); + gfc_add_modify (post, se->expr, build_int_cst (TREE_TYPE (se->expr), 0)); + } + /* If we have a pointer function, but we don't want a pointer, e.g. something like x = f() Index: gcc/fortran/trans-stmt.c === *** gcc/fortran/trans-stmt.c (revision 223641) --- gcc/fortran/trans-stmt.c (working copy) *** gfc_trans_allocate (gfc_code * code) *** 5214,5219 --- 5214,5220 false, false); gfc_add_block_to_block (block, se.pre); gfc_add_block_to_block (post, se.post); + /* Prevent aliasing, i.e., se.expr may be already a variable declaration. */ if (!VAR_P (se.expr)) *** gfc_trans_allocate (gfc_code * code) *** 5223,5230 se.expr); /* We need a regular (non-UID) symbol here, therefore give a prefix. */ ! var = gfc_create_var (TREE_TYPE (tmp), "atmp"); gfc_add_modify_loc (input_location, block, var, tmp); tmp = var; } else --- 5224,5243 se.expr); /* We need a regular (non-UID) symbol here, therefore give a prefix. */ !
var = gfc_create_var (TREE_TYPE (tmp), "expr3"); gfc_add_modify_loc (input_location, block, var, tmp); + + /* Deallocate any allocatable components after all the allocations + and assignments of expr3 have been completed. */ + if (code->expr3->ts.type == BT_DERIVED + && code->expr3->rank == 0 + && code->expr3->ts.u.derived->attr.alloc_comp) + { + tmp = gfc_deallocate_alloc_comp (code->expr3->ts.u.derived, + var, 0); + gfc_add_expr_to_block (post, tmp); + } + tmp = var; } else Index: gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90 === *** gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90 (revision 0) --- gcc/testsuite/gfortran.dg/allocatable_scalar_13.f90 (working copy) *** *** 0 --- 1,70 + ! { dg-do run } + ! { dg-options "-fdump-tree-original" } + ! + ! Test the fix for PR66079. The original problem was with the first + ! allocate statement. The rest of this testcase fixes problems found + ! whilst working on it! + ! + ! Reported by Damian Rouson dam...@sourceryinstitute.org + ! + type subdata + integer, allocatable :: b + endtype + ! block + call newRealVec + ! end block + contains + subroutine newRealVec + type(subdata), allocatable :: d, e, f + character(:), allocatable :: g, h, i + character(8), allocatable :: j + allocate(d,source=subdata(1)) ! memory was lost, now OK + allocate(e,source=d) ! OK + allocate(f,source=create (99)) ! memory was lost, now OK + if (d%b .ne. 1) call abort + if (e%b .ne. 1) call abort + if (f%b .ne. 99) call abort + allocate (g, source = greeting1("good day")) + if (g .ne. "good day") call abort + allocate (h, source = greeting2("hello")) + if (h .ne. "hello") call abort + allocate (i, source = greeting3("hiya!")) + if (i .ne. "hiya!") call abort + call greeting4 (j, "Goodbye ") ! Test that dummy arguments are OK + if (j .ne. "Goodbye ") call abort + end subroutine + + function create (arg) result(res) + integer :: arg + type(subdata), allocatable :: res, res1 + allocate(res, res1, source = subdata(arg)) + end function + + function greeting1 (arg) result(res) !
memory was lost, now OK + character(*) :: arg + Character(:), allocatable :: res + allocate(res, source = arg) + end function + + function greeting2 (arg) result(res) + character(5) :: arg + Character(:), allocatable :: res + allocate(res, source = arg) + end function + + function greeting3 (arg) result(res) + character(5) :: arg + Character(5), allocatable :: res, res1 + allocate(res, res1, source = arg) ! Caused an ICE + if (res1
[patch committed SH] Fix PR target/66780
The attached patch reverts a part of the change in r221165 for target/65249. It turned out that that change causes a wrong-code problem, PR target/66780, which is worse than the ICE with 'R0_REGS' spill failure for a specific program reported by PR65249. I've committed it on trunk and reopened PR target/65249. I'll backport it to 4.9 later and to 5 when the branch reopens. Regards, kaz -- 2015-07-07 Kaz Kojima kkoj...@gcc.gnu.org PR target/66780 * config/sh/sh.md (symGOT_load): Revert a part of 2015-03-03 change for target/65249. diff --git a/config/sh/sh.md b/config/sh/sh.md index 5c8d306..f0cb3cf 100644 --- a/config/sh/sh.md +++ b/config/sh/sh.md @@ -10751,12 +10751,6 @@ label: "__stack_chk_guard") == 0) stack_chk_guard_p = true; - /* Use R0 to avoid long R0 liveness which stack-protector tends to - produce. */ - if (! sh_lra_flag && stack_chk_guard_p - && ! reload_in_progress && ! reload_completed) - operands[2] = gen_rtx_REG (Pmode, R0_REG); - if (TARGET_SHMEDIA) { rtx reg = operands[2];
[PATCH 10/16][AArch64] vld{2,3,4}{,_lane,_dup},vcombine,vcreate
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01342.html commit ef719e5d3d6eccc5cf621851283b7c0ba1a9ee6c Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Aug 5 17:52:28 2014 +0100 AArch64 3/N: v(create|combine|v(ld|st|ld...dup/lane|st...lane)[234](q?))_f16; tests vldN{,_lane,_dup} inc bigendian. Add __builtin_aarch64_simd_hf. Fix some casts, to ..._hf not ..._sf ! diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index a6c3377..5367ba6 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -300,6 +300,12 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define VAR12(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \ VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \ VAR1 (T, N, MAP, L) +#define VAR13(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \ + VAR12 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \ + VAR1 (T, N, MAP, M) +#define VAR14(T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M, N) \ + VAR13 (T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \ + VAR1 (T, X, MAP, N) #include aarch64-builtin-iterators.h @@ -377,6 +383,7 @@ const char *aarch64_scalar_builtin_types[] = { __builtin_aarch64_simd_qi, __builtin_aarch64_simd_hi, __builtin_aarch64_simd_si, + __builtin_aarch64_simd_hf, __builtin_aarch64_simd_sf, __builtin_aarch64_simd_di, __builtin_aarch64_simd_df, @@ -664,6 +671,8 @@ aarch64_init_simd_builtin_scalar_types (void) __builtin_aarch64_simd_qi); (*lang_hooks.types.register_builtin_type) (intHI_type_node, __builtin_aarch64_simd_hi); + (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, + __builtin_aarch64_simd_hf); (*lang_hooks.types.register_builtin_type) (intSI_type_node, __builtin_aarch64_simd_si); (*lang_hooks.types.register_builtin_type) (float_type_node, diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ccf063a..bbf5230 100644 --- a/gcc/config/aarch64/aarch64.c +++ 
b/gcc/config/aarch64/aarch64.c @@ -1063,6 +1063,9 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2) case V2SImode: gen = gen_aarch64_simd_combinev2si; break; + case V4HFmode: + gen = gen_aarch64_simd_combinev4hf; + break; case V2SFmode: gen = gen_aarch64_simd_combinev2sf; break; diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 7425485..d61e619 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -153,6 +153,16 @@ typedef struct uint64x2x2_t uint64x2_t val[2]; } uint64x2x2_t; +typedef struct float16x4x2_t +{ + float16x4_t val[2]; +} float16x4x2_t; + +typedef struct float16x8x2_t +{ + float16x8_t val[2]; +} float16x8x2_t; + typedef struct float32x2x2_t { float32x2_t val[2]; @@ -273,6 +283,16 @@ typedef struct uint64x2x3_t uint64x2_t val[3]; } uint64x2x3_t; +typedef struct float16x4x3_t +{ + float16x4_t val[3]; +} float16x4x3_t; + +typedef struct float16x8x3_t +{ + float16x8_t val[3]; +} float16x8x3_t; + typedef struct float32x2x3_t { float32x2_t val[3]; @@ -393,6 +413,16 @@ typedef struct uint64x2x4_t uint64x2_t val[4]; } uint64x2x4_t; +typedef struct float16x4x4_t +{ + float16x4_t val[4]; +} float16x4x4_t; + +typedef struct float16x8x4_t +{ + float16x8_t val[4]; +} float16x8x4_t; + typedef struct float32x2x4_t { float32x2_t val[4]; @@ -2644,6 +2674,12 @@ vcreate_s64 (uint64_t __a) return (int64x1_t) {__a}; } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcreate_f16 (uint64_t __a) +{ + return (float16x4_t) __a; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vcreate_f32 (uint64_t __a) { @@ -4780,6 +4816,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b) return __builtin_aarch64_combinedi (__a[0], __b[0]); } +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcombine_f16 (float16x4_t __a, float16x4_t __b) +{ + return __builtin_aarch64_combinev4hf (__a, __b); +} + __extension__ static __inline 
float32x4_t __attribute__ ((__always_inline__)) vcombine_f32 (float32x2_t __a, float32x2_t __b) { @@ -9908,7 +9950,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b) +--+++++ |uint | Y | Y | N | N | +--+++++ - |float | - | - | N | N | + |float | - | Y | N | N | +--+++++ |poly | Y | Y | - | - | +--+++++ @@ -9922,7 +9964,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b) +--+++++ |uint | Y | Y | Y | Y | +--+++++ - |float | - | - | Y | Y | + |float | - | Y | Y | Y | +--+++++ |poly | Y | Y | - | - | +--+++++ @@ -9936,7 +9978,7 @@ vtstq_p16 (poly16x8_t a, poly16x8_t b) +--+++++
[PATCH 9/16][AArch64] Add support for float16x{4,8}_t vectors/builtins
As https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01341.html commit 49cb53a94a44fcda845c3f6ef11e88f9be458aad Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Dec 2 13:08:15 2014 + AArch64 2/N: Vector/__builtin basics: define+support types, movs, test ABI. Patterns, builtins, intrinsics for {ld1,st1}{,_lane},v{g,s}et_lane. Tests: vld1-vst1_1, vset_lane_1, vld1_lane.c diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index cfb2dc1..a6c3377 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -66,6 +66,7 @@ #define v8qi_UP V8QImode #define v4hi_UP V4HImode +#define v4hf_UP V4HFmode #define v2si_UP V2SImode #define v2sf_UP V2SFmode #define v1df_UP V1DFmode @@ -73,6 +74,7 @@ #define df_UPDFmode #define v16qi_UP V16QImode #define v8hi_UP V8HImode +#define v8hf_UP V8HFmode #define v4si_UP V4SImode #define v4sf_UP V4SFmode #define v2di_UP V2DImode @@ -523,6 +525,8 @@ aarch64_simd_builtin_std_type (enum machine_mode mode, return aarch64_simd_intCI_type_node; case XImode: return aarch64_simd_intXI_type_node; +case HFmode: + return aarch64_fp16_type_node; case SFmode: return float_type_node; case DFmode: @@ -607,6 +611,8 @@ aarch64_init_simd_builtin_types (void) aarch64_simd_types[Poly64x2_t].eltype = aarch64_simd_types[Poly64_t].itype; /* Continue with standard types. 
*/ + aarch64_simd_types[Float16x4_t].eltype = aarch64_fp16_type_node; + aarch64_simd_types[Float16x8_t].eltype = aarch64_fp16_type_node; aarch64_simd_types[Float32x2_t].eltype = float_type_node; aarch64_simd_types[Float32x4_t].eltype = float_type_node; aarch64_simd_types[Float64x1_t].eltype = double_type_node; diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config/aarch64/aarch64-simd-builtin-types.def index bb54e56..ea219b7 100644 --- a/gcc/config/aarch64/aarch64-simd-builtin-types.def +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def @@ -44,6 +44,8 @@ ENTRY (Poly16x8_t, V8HI, poly, 12) ENTRY (Poly64x1_t, DI, poly, 12) ENTRY (Poly64x2_t, V2DI, poly, 12) + ENTRY (Float16x4_t, V4HF, none, 13) + ENTRY (Float16x8_t, V8HF, none, 13) ENTRY (Float32x2_t, V2SF, none, 13) ENTRY (Float32x4_t, V4SF, none, 13) ENTRY (Float64x1_t, V1DF, none, 13) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index dd2bc47..4dd2bc7 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -367,11 +367,11 @@ VAR1 (UNOP, float_extend_lo_, 0, v2df) VAR1 (UNOP, float_truncate_lo_, 0, v2sf) - /* Implemented by aarch64_ld1VALL:mode. */ - BUILTIN_VALL (LOAD1, ld1, 0) + /* Implemented by aarch64_ld1VALL_F16:mode. */ + BUILTIN_VALL_F16 (LOAD1, ld1, 0) - /* Implemented by aarch64_st1VALL:mode. */ - BUILTIN_VALL (STORE1, st1, 0) + /* Implemented by aarch64_st1VALL_F16:mode. */ + BUILTIN_VALL_F16 (STORE1, st1, 0) /* Implemented by fmamode4. */ BUILTIN_VDQF (TERNOP, fma, 4) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index b90f938..5cc45ed 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -19,8 +19,8 @@ ;; http://www.gnu.org/licenses/. 
(define_expand movmode - [(set (match_operand:VALL 0 nonimmediate_operand ) - (match_operand:VALL 1 general_operand ))] + [(set (match_operand:VALL_F16 0 nonimmediate_operand ) + (match_operand:VALL_F16 1 general_operand ))] TARGET_SIMD if (GET_CODE (operands[0]) == MEM) @@ -2450,7 +2450,7 @@ (define_insn aarch64_get_lanemode [(set (match_operand:VEL 0 aarch64_simd_nonimmediate_operand =r, w, Utv) (vec_select:VEL - (match_operand:VALL 1 register_operand w, w, w) + (match_operand:VALL_F16 1 register_operand w, w, w) (parallel [(match_operand:SI 2 immediate_operand i, i, i)])))] TARGET_SIMD { @@ -4234,8 +4234,9 @@ ) (define_insn aarch64_be_ld1mode - [(set (match_operand:VALLDI 0 register_operand =w) - (unspec:VALLDI [(match_operand:VALLDI 1 aarch64_simd_struct_operand Utv)] + [(set (match_operand:VALLDI_F16 0 register_operand =w) + (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 + aarch64_simd_struct_operand Utv)] UNSPEC_LD1))] TARGET_SIMD ld1\\t{%0Vmtype}, %1 @@ -4243,8 +4244,8 @@ ) (define_insn aarch64_be_st1mode - [(set (match_operand:VALLDI 0 aarch64_simd_struct_operand =Utv) - (unspec:VALLDI [(match_operand:VALLDI 1 register_operand w)] + [(set (match_operand:VALLDI_F16 0 aarch64_simd_struct_operand =Utv) + (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 register_operand w)] UNSPEC_ST1))] TARGET_SIMD st1\\t{%1Vmtype}, %0 @@ -4533,16 +4534,16 @@ DONE; }) -(define_expand aarch64_ld1VALL:mode - [(match_operand:VALL 0 register_operand) +(define_expand aarch64_ld1VALL_F16:mode + [(match_operand:VALL_F16 0
[PATCH 11/16][AArch64] Implement vcvt_{,high_}f16_f32
This comes from https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01343.html but the other/unrelated intrinsics have moved into the next patch. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_v2sf): Reparameterize to... (aarch64_float_truncate_lo_mode): ...this, for both V2SF and V4HF. (aarch64_float_truncate_hi_v4sf): Reparameterize to... (aarch64_float_truncate_hi_Vdbl): ...this, for both V4SF and V8HF. * config/aarch64/aarch64-simd-builtins.def (float_truncate_hi_): Add v8hf variant. (float_truncate_lo_): Use BUILTIN_VDF iterator. * config/aarch64/arm_neon.h (vcvt_f16_f32, vcvt_high_f16_f32): New. * config/aarch64/iterators.md (VDF, Vdtype): New. (VWIDE, Vmwtype): Add cases for V4HF and V2SF. commit 5007fafedc8469ab645edfe65fbf41f75fc74750 Author: Alan Lawrence alan.lawre...@arm.com Date: Tue Dec 2 18:30:05 2014 + AArch64 4/N v2: float_truncate_lo/hi diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 4dd2bc7..8bcab72 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -363,9 +363,10 @@ VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) VAR1 (BINOP, float_truncate_hi_, 0, v4sf) + VAR1 (BINOP, float_truncate_hi_, 0, v8hf) VAR1 (UNOP, float_extend_lo_, 0, v2df) - VAR1 (UNOP, float_truncate_lo_, 0, v2sf) + BUILTIN_VDF (UNOP, float_truncate_lo_, 0) /* Implemented by aarch64_ld1VALL_F16:mode. */ BUILTIN_VALL_F16 (LOAD1, ld1, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 5cc45ed..2dc54e1 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1726,23 +1726,23 @@ ;; Float narrowing operations. 
-(define_insn aarch64_float_truncate_lo_v2sf - [(set (match_operand:V2SF 0 register_operand =w) - (float_truncate:V2SF - (match_operand:V2DF 1 register_operand w)))] +(define_insn aarch64_float_truncate_lo_mode + [(set (match_operand:VDF 0 register_operand =w) + (float_truncate:VDF + (match_operand:VWIDE 1 register_operand w)))] TARGET_SIMD - fcvtn\\t%0.2s, %1.2d + fcvtn\\t%0.Vtype, %1Vmwtype [(set_attr type neon_fp_cvt_narrow_d_q)] ) -(define_insn aarch64_float_truncate_hi_v4sf - [(set (match_operand:V4SF 0 register_operand =w) -(vec_concat:V4SF - (match_operand:V2SF 1 register_operand 0) - (float_truncate:V2SF - (match_operand:V2DF 2 register_operand w] +(define_insn aarch64_float_truncate_hi_Vdbl + [(set (match_operand:VDBL 0 register_operand =w) +(vec_concat:VDBL + (match_operand:VDF 1 register_operand 0) + (float_truncate:VDF + (match_operand:VWIDE 2 register_operand w] TARGET_SIMD - fcvtn2\\t%0.4s, %2.2d + fcvtn2\\t%0.Vdtype, %2Vmwtype [(set_attr type neon_fp_cvt_narrow_d_q)] ) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index d61e619..b915754 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -5726,12 +5726,8 @@ vaddlvq_u32 (uint32x4_t a) result; \ }) -/* vcvt_f16_f32 not supported */ - /* vcvt_f32_f16 not supported */ -/* vcvt_high_f16_f32 not supported */ - /* vcvt_high_f32_f16 not supported */ #define vcvt_n_f32_s32(a, b)\ @@ -13098,6 +13094,18 @@ vcntq_u8 (uint8x16_t __a) /* vcvt (double - float). 
*/ +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcvt_f16_f32 (float32x4_t __a) +{ + return __builtin_aarch64_float_truncate_lo_v4hf (__a); +} + +__extension__ static __inline float16x8_t __attribute__ ((__always_inline__)) +vcvt_high_f16_f32 (float16x4_t __a, float32x4_t __b) +{ + return __builtin_aarch64_float_truncate_hi_v8hf (__a, __b); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vcvt_f32_f64 (float64x2_t __a) { diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 96920cf..f6094b1 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -41,6 +41,9 @@ ;; Iterator for General Purpose Float regs, inc float16_t. (define_mode_iterator GPF_F16 [HF SF DF]) +;; Double vector modes. +(define_mode_iterator VDF [V2SF V4HF]) + ;; Integer vector modes. (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI]) @@ -452,6 +455,9 @@ (SI V2SI) (DI V2DI) (DF V2DF)]) +;; Register suffix for double-length mode. +(define_mode_attr Vdtype [(V4HF 8h) (V2SF 4s)]) + ;; Double modes of vector modes (lower case). (define_mode_attr Vdbl [(V8QI v16qi) (V4HI v8hi) (V4HF v8hf) @@ -485,7 +491,8 @@ (define_mode_attr VWIDE [(V8QI V8HI) (V4HI V4SI) (V2SI V2DI) (V16QI V8HI) (V8HI V4SI) (V4SI V2DI) - (HI SI) (SI DI)] + (HI SI) (SI DI) +
[gomp4] libgomp: XFAIL libgomp.oacc-c-c++-common/reduction-4.c for acc_device_nvidia (was: implicit firstprivate and other testcase fixes)
Hi! On Wed, 1 Jul 2015 22:19:01 +0800, Chung-Lin Tang clt...@codesourcery.com wrote: This patch notices the index variable of an acc loop (internally an OMP_FOR) inside an OpenACC construct, and completes the implicit firstprivate behavior as described in the spec. The firstprivate clauses and FIXME in libgomp.oacc-c-c++-common/parallel-loop-2.h has also been removed together in the patch. Thanks! Also a typo-bug in testcase libgomp.oacc-c-c++-common/reduction-4.c is also corrected, where reduction variable names are apparently wrong. Tested without regressions, and applied to gomp-4_0-branch. I'm seeing: WARNING: program timed out. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-4.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test ... with: libgomp: cuStreamSynchronize error: launch timeout (also for C++), and applied in r225513: commit f03018ac39ed0193102fe29139d3c995caa02fd5 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Jul 7 12:45:12 2015 + libgomp: XFAIL libgomp.oacc-c-c++-common/reduction-4.c for acc_device_nvidia ... after r225250 changes. libgomp/ * testsuite/libgomp.oacc-c-c++-common/reduction-4.c: dg-xfail-run-if openacc_nvidia_accel_selected. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@225513 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgomp/ChangeLog.gomp| 5 + libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c | 1 + 2 files changed, 6 insertions(+) diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp index dc0f0bf..5f3dfaf 100644 --- libgomp/ChangeLog.gomp +++ libgomp/ChangeLog.gomp @@ -1,3 +1,8 @@ +2015-07-07 Thomas Schwinge tho...@codesourcery.com + + * testsuite/libgomp.oacc-c-c++-common/reduction-4.c: + dg-xfail-run-if openacc_nvidia_accel_selected. + 2015-06-24 James Norris jnor...@codesourcery.com * testsuite/libgomp.oacc-fortran/if-1.c: Fix syntax. 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c index 416d960..c32f1db 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c @@ -1,4 +1,5 @@ /* { dg-do run { target { ! { hppa*-*-hpux* } } } } */ +/* { dg-xfail-run-if "libgomp: cuStreamSynchronize error: launch timeout" { openacc_nvidia_accel_selected } } */ /* complex reductions. */ Grüße, Thomas
[AArch64][2/2] Define TARGET_UNSPEC_MAY_TRAP_P for AArch64
A second patch to improve rtl loop iv on AArch64. We should define this to tell gcc the pattern hidden by these GOT unspec is safe from trap, so gcc could make more positive decision when handling them, for example in RTL loop iv pass, when deciding whether one instruction is invariant candidate, may_trap_or_fault_p will be invoked which will call this target hook. OK for trunk? 2015-07-07 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_unspec_may_trap_p): New function. (TARGET_UNSPEC_MAY_TRAP_P): Define as aarch64_unspec_may_trap_p. -- Regards, Jiong diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e180daa..c7c12ee 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11943,6 +11943,24 @@ aarch64_use_pseudo_pic_reg (void) return aarch64_cmodel == AARCH64_CMODEL_SMALL_SPIC; } +/* Implement TARGET_UNSPEC_MAY_TRAP_P. */ + +static int +aarch64_unspec_may_trap_p (const_rtx x, unsigned flags) +{ + switch (XINT (x, 1)) +{ +case UNSPEC_GOTSMALLPIC: +case UNSPEC_GOTSMALLPIC28K: +case UNSPEC_GOTTINYPIC: + return 0; +default: + break; +} + + return default_unspec_may_trap_p (x, flags); +} + #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST aarch64_address_cost @@ -12221,6 +12239,9 @@ aarch64_use_pseudo_pic_reg (void) #undef TARGET_SCHED_FUSION_PRIORITY #define TARGET_SCHED_FUSION_PRIORITY aarch64_sched_fusion_priority +#undef TARGET_UNSPEC_MAY_TRAP_P +#define TARGET_UNSPEC_MAY_TRAP_P aarch64_unspec_may_trap_p + #undef TARGET_USE_PSEUDO_PIC_REG #define TARGET_USE_PSEUDO_PIC_REG aarch64_use_pseudo_pic_reg
Re: [PATCH] save takes a single integer (register or 13-bit signed immediate)
On 2015-07-07 12:35, Eric Botcazou wrote: 2015-06-26 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.md: Window save takes a single integer This will probably break in 64-bit mode, the operand can be a DImode register. You are right, I forgot about that. Is there a mode one can use that changes depending on the target architecture (32-bit on 32-bit architectures and 64-bit on 64-bit architectures)? Or does one have to add a 32-bit and a 64-bit variant of window_save? -- Daniel Cederman
Re: [PATCH] Update instruction cost for LEON
On 2015-07-07 12:37, Eric Botcazou wrote: 2015-07-03 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.c (struct processor_costs): Set div cost for leon to match UT699 and AT697F. Set mul cost for leon3 to match standard leon3. So UT699 is not a standard LEON3? LEON3 exists in multiple revisions and is configurable so I agree that using the word standard in this context is a bit ambiguous. I think we should delay applying this patch. First we need to look into how to properly provide the information on FPU selection and multiplier size to GCC. Otherwise we risk having to change the values again in a short while. -- Daniel Cederman
[PATCH 3/16][ARM] Add float16x4_t intrinsics
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html commit 54a89a084fbd00e4de036f549ca893b74b8f58fb Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 18:40:03 2014 + ARM: float16x4_t intrinsics (v2 - fix v[sg]et_lane_f16 at -O0, no vdup_n/vmov_n) diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index c923e29..b4100c8 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -41,6 +41,7 @@ typedef __simd64_int8_t int8x8_t; typedef __simd64_int16_t int16x4_t; typedef __simd64_int32_t int32x2_t; typedef __builtin_neon_di int64x1_t; +typedef __builtin_neon_hf float16_t; typedef __simd64_float16_t float16x4_t; typedef __simd64_float32_t float32x2_t; typedef __simd64_poly8_t poly8x8_t; @@ -5201,6 +5202,19 @@ vget_lane_s32 (int32x2_t __a, const int __b) return (int32_t)__builtin_neon_vget_lanev2si (__a, __b); } +/* Functions cannot accept or return __FP16 types. Even if the function + were marked always-inline so there were no call sites, the declaration + would nonetheless raise an error. Hence, we must use a macro instead. 
*/ + +#define vget_lane_f16(__v, __idx) \ + __extension__ \ +({ \ + float16x4_t __vec = (__v); \ + __builtin_arm_lane_check (4, __idx); \ + float16_t __res = __vec[__idx]; \ + __res; \ +}) + __extension__ static __inline float32_t __attribute__ ((__always_inline__)) vget_lane_f32 (float32x2_t __a, const int __b) { @@ -5333,6 +5347,16 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c) return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c); } +#define vset_lane_f16(__e, __v, __idx) \ + __extension__ \ +({ \ + float16_t __elem = (__e); \ + float16x4_t __vec = (__v); \ + __builtin_arm_lane_check (4, __idx); \ + __vec[__idx] = __elem; \ + __vec; \ +}) + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c) { @@ -5479,6 +5503,12 @@ vcreate_s64 (uint64_t __a) return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vcreate_f16 (uint64_t __a) +{ + return (float16x4_t) __a; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vcreate_f32 (uint64_t __a) { @@ -8796,6 +8826,12 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c) return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c) +{ + return vset_lane_f16 (*__a, __b, __c); +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c) { @@ -8944,6 +8980,13 @@ vld1_dup_s32 (const int32_t * __a) return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vld1_dup_f16 (const float16_t * __a) 
+{ + float16_t __f = *__a; + return (float16x4_t) { __f, __f, __f, __f }; +} + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vld1_dup_f32 (const float32_t * __a) { @@ -11828,6 +11871,12 @@ vreinterpret_p8_p16 (poly16x4_t __a) } __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) +vreinterpret_p8_f16 (float16x4_t __a) +{ + return (poly8x8_t) __a; +} + +__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__)) vreinterpret_p8_f32 (float32x2_t __a) { return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a); @@ -11896,6 +11945,12 @@ vreinterpret_p16_p8 (poly8x8_t __a) } __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) +vreinterpret_p16_f16 (float16x4_t __a) +{ + return (poly16x4_t) __a; +} + +__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__)) vreinterpret_p16_f32 (float32x2_t __a) { return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a); @@ -11957,6 +12012,80 @@ vreinterpret_p16_u32 (uint32x2_t __a) return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a); } +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_p8 (poly8x8_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_p16 (poly16x4_t __a) +{ + return (float16x4_t) __a; +} + +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__)) +vreinterpret_f16_f32 (float32x2_t __a) +{ + return (float16x4_t) __a; +} + +#ifdef __ARM_FEATURE_CRYPTO +__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
[PATCH 2/16][ARM] PR/63870 Add __builtin_arm_lane_check.
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01334.html commit 1bb1b208a2c8c8b1ee1186c6128a498583fd64fe Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 18:36:30 2014 + Add __builtin_arm_lane_check diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 7f5bf87..89b1b0c 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -534,12 +534,16 @@ enum arm_builtins #undef CRYPTO2 #undef CRYPTO3 + ARM_BUILTIN_NEON_BASE, + ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE, + #include arm_neon_builtins.def ARM_BUILTIN_MAX }; -#define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data)) +#define ARM_BUILTIN_NEON_PATTERN_START \ +(ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data)) #undef CF #undef VAR1 @@ -898,7 +902,7 @@ arm_init_simd_builtin_scalar_types (void) static void arm_init_neon_builtins (void) { - unsigned int i, fcode = ARM_BUILTIN_NEON_BASE; + unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START; arm_init_simd_builtin_types (); @@ -908,6 +912,15 @@ arm_init_neon_builtins (void) system. */ arm_init_simd_builtin_scalar_types (); + tree lane_check_fpr = build_function_type_list (void_type_node, + intSI_type_node, + intSI_type_node, + NULL); + arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] = + add_builtin_function (__builtin_arm_lane_check, lane_check_fpr, + ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD, + NULL, NULL_TREE); + for (i = 0; i ARRAY_SIZE (neon_builtin_data); i++, fcode++) { bool print_type_signature_p = false; @@ -2171,14 +2184,28 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode, return target; } -/* Expand a Neon builtin. These are special because they don't have symbolic +/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds. + Most of these are special because they don't have symbolic constants defined per-instruction or per instruction-variant. Instead, the required info is looked up in the table neon_builtin_data. 
*/ static rtx arm_expand_neon_builtin (int fcode, tree exp, rtx target) { + if (fcode == ARM_BUILTIN_NEON_LANE_CHECK) +{ + tree nlanes = CALL_EXPR_ARG (exp, 0); + gcc_assert (TREE_CODE (nlanes) == INTEGER_CST); + rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1)); + if (CONST_INT_P (lane_idx)) + neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp); + else + error ("%Klane index must be a constant immediate", exp); + /* Don't generate any RTL. */ + return const0_rtx; +} + neon_builtin_datum *d = - neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE]; + neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START]; enum insn_code icode = d->code; builtin_arg args[SIMD_MAX_BUILTIN_ARGS]; int num_args = insn_data[d->code].n_operands;
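The expansion above enforces a simple compile-time contract for `__builtin_arm_lane_check (nlanes, idx)`: the lane index must be a constant in `[0, nlanes)`, and on success the builtin expands to nothing. A rough model of that contract (plain Python with hypothetical names, not GCC internals) is:

```python
def lane_check(nlanes, lane_idx, idx_is_constant=True):
    """Toy model of __builtin_arm_lane_check: reject anything but a
    compile-time-constant lane index in the half-open range [0, nlanes)."""
    if not idx_is_constant:
        # corresponds to the error () branch above: no RTL is generated
        raise ValueError("lane index must be a constant immediate")
    if not 0 <= lane_idx < nlanes:
        # corresponds to neon_lane_bounds () rejecting the index
        raise IndexError("lane %d out of range [0, %d)" % (lane_idx, nlanes))
    # on success the builtin expands to nothing (const0_rtx)
```

This is how the `vget_lane_f16`/`vset_lane_f16` macros in patch 2/16 use it: a `float16x4_t` has four lanes, so they pass 4 as the first argument.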
[PATCH 4/16][ARM] Add float16x8_t type
Unchanged since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01336.html commit b9ccac6243415b304024443b74bdc97b3a5954f2 Author: Alan Lawrence alan.lawre...@arm.com Date: Mon Dec 8 18:40:24 2014 + Add float16x8_t + V8HFmode support (regardless of -mfp16-format) diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 89b1b0c..17e39d8 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -192,6 +192,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define di_UPDImode #define v16qi_UP V16QImode #define v8hi_UP V8HImode +#define v8hf_UP V8HFmode #define v4si_UP V4SImode #define v4sf_UP V4SFmode #define v2di_UP V2DImode @@ -827,6 +828,7 @@ arm_init_simd_builtin_types (void) /* Continue with standard types. */ arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node; arm_simd_types[Float32x2_t].eltype = float_type_node; + arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node; arm_simd_types[Float32x4_t].eltype = float_type_node; for (i = 0; i nelts; i++) diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def index bcbd20b..b178ae6 100644 --- a/gcc/config/arm/arm-simd-builtin-types.def +++ b/gcc/config/arm/arm-simd-builtin-types.def @@ -44,5 +44,7 @@ ENTRY (Float16x4_t, V4HF, none, 64, float16, 18) ENTRY (Float32x2_t, V2SF, none, 64, float32, 18) + + ENTRY (Float16x8_t, V8HF, none, 128, float16, 19) ENTRY (Float32x4_t, V4SF, none, 128, float32, 19) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 6e074ea..0faa46c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -26251,7 +26251,8 @@ arm_vector_mode_supported_p (machine_mode mode) { /* Neon also supports V2SImode, etc. listed in the clause below. 
*/ if (TARGET_NEON (mode == V2SFmode || mode == V4SImode || mode == V8HImode - || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == V2DImode)) + || mode ==V4HFmode || mode == V16QImode || mode == V4SFmode + || mode == V2DImode || mode == V8HFmode)) return true; if ((TARGET_NEON || TARGET_IWMMXT) diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 373dc85..c0a83b2 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -999,7 +999,7 @@ extern int arm_arch_crc; /* Modes valid for Neon Q registers. */ #define VALID_NEON_QREG_MODE(MODE) \ ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \ - || (MODE) == V4SFmode || (MODE) == V2DImode) + || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode) /* Structure modes valid for Neon registers. */ #define VALID_NEON_STRUCT_MODE(MODE) \ diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index b4100c8..a958f63 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -58,6 +58,7 @@ typedef __simd128_int8_t int8x16_t; typedef __simd128_int16_t int16x8_t; typedef __simd128_int32_t int32x4_t; typedef __simd128_int64_t int64x2_t; +typedef __simd128_float16_t float16x8_t; typedef __simd128_float32_t float32x4_t; typedef __simd128_poly8_t poly8x16_t; typedef __simd128_poly16_t poly16x8_t;
[PATCH 16/16][ARM/AArch64 Testsuite] Add test of vcvt{,_high}_{f16_f32,f32_f16}
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01349.html . Changes are to: use #if defined(__aarch64__) rather than __ARM_64BIT_STATE__; add an initial call to clean_results; use a different mechanism for adding -mfpu=neon-fp16 on ARM (specifically: we try to add that flag for all tests, as AFAICT that is valid anywhere -mfpu=neon is valid; and bail out of the vcvt_f16 test, the only test that actually requires fp16 H/W, if unsuccessful e.g. if a -mfpu=neon was forced on the command-line). This is because the rightmost -mfpu option overrides the previous. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: set additional flags for neon-fp16 support. * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. commit e6cc7467ddf5702d3a122b8ac4163621d0164b37 Author: Alan Lawrence alan.lawre...@arm.com Date: Wed Jan 28 13:02:22 2015 + v2 Test vcvt{,_high on aarch64}_f{32_f16,16_f32}, with neon-fp16 for ARM targets. v2a: #if defined(__aarch64__); + clean_results(); fp16 opts for ARM; fp16_hw_ok diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp index ceada83..5f5e1fe 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp @@ -52,8 +52,10 @@ if {[istarget arm*-*-*]} then { torture-init set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS -# Make sure Neon flags are provided, if necessary. -set additional_flags [add_options_for_arm_neon ] +# Make sure Neon flags are provided, if necessary. We try to add FP16 flags +# for all tests; tests requiring FP16 will abort if a non-FP16 option +# was forced. +set additional_flags [add_options_for_arm_neon_fp16 ] # Main loop. 
gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] \ diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c new file mode 100644 index 000..7a1c256 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c @@ -0,0 +1,98 @@ +/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */ +#include arm_neon.h +#include arm-neon-ref.h +#include compute-ref-data.h +#include math.h + +/* Expected results for vcvt. */ +VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170, + 0x4160, 0x4150 }; +VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 }; + +/* Expected results for vcvt_high_f32_f16. */ +VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130, + 0xc120, 0xc110 }; +/* Expected results for vcvt_high_f16_f32. */ +VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000, + 0xcc00, 0xcb80, 0xcb00, 0xca80 }; + +void +exec_vcvt (void) +{ + clean_results(); + +#define TEST_MSG vcvt_f32_f16 + { +VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 }; + +DECL_VARIABLE (vector_src, float, 16, 4); + +VLOAD (vector_src, buffer_src, , float, f, 16, 4); +DECL_VARIABLE (vector_res, float, 32, 4) = + vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4)); +vst1q_f32 (VECT_VAR (result, float, 32, 4), + VECT_VAR (vector_res, float, 32, 4)); + +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, ); + } +#undef TEST_MSG + + clean_results (); + +#define TEST_MSG vcvt_f16_f32 + { +VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 }; +DECL_VARIABLE (vector_src, float, 32, 4); + +VLOAD (vector_src, buffer_src, q, float, f, 32, 4); +DECL_VARIABLE (vector_res, float, 16, 4) = + vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4)); +vst1_f16 (VECT_VAR (result, float, 16, 4), + VECT_VAR (vector_res, float, 16 ,4)); + +CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, 
expected, ); + } +#undef TEST_MSG + +#if defined(__aarch64__) + clean_results (); + +#define TEST_MSG vcvt_high_f32_f16 + { +DECL_VARIABLE (vector_src, float, 16, 8); +VLOAD (vector_src, buffer, q, float, f, 16, 8); +DECL_VARIABLE (vector_res, float, 32, 4); +VECT_VAR (vector_res, float, 32, 4) = + vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8)); +vst1q_f32 (VECT_VAR (result, float, 32, 4), + VECT_VAR (vector_res, float, 32, 4)); +CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, ); + } +#undef TEST_MSG + clean_results (); + +#define TEST_MSG vcvt_high_f16_f32 + { +DECL_VARIABLE (vector_low, float, 16, 4); +VDUP (vector_low, , float, f, 16, 4, 2.0); + +DECL_VARIABLE (vector_src, float, 32, 4); +VLOAD (vector_src, buffer, q, float, f, 32, 4); + +DECL_VARIABLE (vector_res, float, 16, 8) = + vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16,
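The half-precision constants in `expected` can be sanity-checked off-target: since Python 3.6 the `struct` module supports the IEEE 754 binary16 format code `'e'`, so the bit patterns the test expects for the `vcvt_f16_f32` inputs 1.5, 2.5, 3.5, 4.5 can be reproduced directly. This is only a quick cross-check of the constants, not part of the testsuite:

```python
import struct

def f16_bits(x):
    # Round-trip x through IEEE 754 binary16 and return the raw 16-bit pattern.
    return struct.unpack('<H', struct.pack('<e', x))[0]

# Matches VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
for val, expected in [(1.5, 0x3E00), (2.5, 0x4100), (3.5, 0x4300), (4.5, 0x4480)]:
    assert f16_bits(val) == expected, (val, hex(f16_bits(val)))
```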
[AArch64][1/2] Mark GOT related MEM rtx as const to help RTL loop IV
Given below testcase, the instruction which load function address from GOT table is not hoisted out of the loop while it should be, as the value is fixed at runtime. The problem is we havn't mark those GOT related mem as READONLY that RTL loop2_iv pass has make conservative decision in check_maybe_invariant to not hoist them. int bar (int) ; int foo (int a, int bound) { int i = 0; int sum = 0; for (i; i bound; i++) sum = bar (sum); return sum; } this patch mark mem in PIC related pattern as READONLY and NO_TRAP, more cleanup may needed for several other pattern. 2015-07-06 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Mark mem as READONLY and NOTRAP for PIC symbol. gcc/testsuite/ * gcc.target/aarch64/got_mem_hoist.c: New test. -- Regards, Jiong diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4522fc2..4bbc049 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -915,6 +915,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, { machine_mode mode = GET_MODE (dest); rtx gp_rtx = pic_offset_table_rtx; + rtx insn; + rtx mem; /* NOTE: pic_offset_table_rtx can be NULL_RTX, because we can reach here before rtl expand. Tree IVOPT will generate rtl pattern to @@ -958,16 +960,27 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, if (mode == ptr_mode) { if (mode == DImode) - emit_insn (gen_ldr_got_small_28k_di (dest, gp_rtx, imm)); + insn = gen_ldr_got_small_28k_di (dest, gp_rtx, imm); else - emit_insn (gen_ldr_got_small_28k_si (dest, gp_rtx, imm)); + insn = gen_ldr_got_small_28k_si (dest, gp_rtx, imm); + + mem = XVECEXP (SET_SRC (insn), 0, 0); } else { gcc_assert (mode == Pmode); - emit_insn (gen_ldr_got_small_28k_sidi (dest, gp_rtx, imm)); + + insn = gen_ldr_got_small_28k_sidi (dest, gp_rtx, imm); + mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0); } + /* The operand is expected to be MEM. 
Whenever the related insn + pattern changed, above code which calculate mem should be + updated. */ + gcc_assert (GET_CODE (mem) == MEM); + MEM_READONLY_P (mem) = 1; + MEM_NOTRAP_P (mem) = 1; + emit_insn (insn); return; } @@ -980,6 +993,9 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, DImode if dest is dereferenced to access the memeory. This is why we have to handle three different ldr_got_small patterns here (two patterns for ILP32). */ + + rtx insn; + rtx mem; rtx tmp_reg = dest; machine_mode mode = GET_MODE (dest); @@ -990,16 +1006,24 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, if (mode == ptr_mode) { if (mode == DImode) - emit_insn (gen_ldr_got_small_di (dest, tmp_reg, imm)); + insn = gen_ldr_got_small_di (dest, tmp_reg, imm); else - emit_insn (gen_ldr_got_small_si (dest, tmp_reg, imm)); + insn = gen_ldr_got_small_si (dest, tmp_reg, imm); + + mem = XVECEXP (SET_SRC (insn), 0, 0); } else { gcc_assert (mode == Pmode); - emit_insn (gen_ldr_got_small_sidi (dest, tmp_reg, imm)); + + insn = gen_ldr_got_small_sidi (dest, tmp_reg, imm); + mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0); } + gcc_assert (GET_CODE (mem) == MEM); + MEM_READONLY_P (mem) = 1; + MEM_NOTRAP_P (mem) = 1; + emit_insn (insn); return; } diff --git a/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c b/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c new file mode 100644 index 000..6d29718 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/got_mem_hoist.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fpic -fdump-rtl-loop2_invariant } */ + +int bar (int); +int cal (void *); + +int +foo (int a, int bound) +{ + int i = 0; + int sum = 0; + + for (i; i bound; i++) +sum = cal (bar); + + return sum; +} + +/* The insn which loads function address from GOT table should be moved out + of the loop. */ +/* { dg-final { scan-rtl-dump Decided loop2_invariant } } */
Re: [gomp] Move openacc vector worker single handling to RTL
On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote: On 07/04/15 16:41, Nathan Sidwell wrote: On 07/03/15 19:11, Jakub Jelinek wrote: If the builtins are not meant to be used by users directly (I assume they aren't) nor have a 1-1 correspondence to a library routine, it is much better to emit them as internal calls (see internal-fn.{c,def}) instead of BUILT_IN_NORMAL functions. This patch uses internal builtins; I had to make one additional change to tree-ssa-tail-merge.c's same_succ_def::equal hash compare function. The new internal fn I introduced should compare EQ but not otherwise compare EQUAL, and that was blowing up the hash function, which relied on EQUAL only. I don't know why I didn't hit this problem in the previous patch with the regular builtin. How does this interact with #pragma acc routine {gang,worker,vector,seq} ? Or is that something to be added later on? Jakub
Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute
Ramana Radhakrishnan wrote: This is OK, the ada testing can go in parallel and we should take this in to not delay rc1 any further. I can confirm, no regressions in check-ada (gcc/testsuite/gnats and gcc/testsuite/acats) following an ada bootstrap on cortex-a15/neon/hard-float. That's the existing tests - nothing specifically testing conformance to the AAPCS updates (wrt. arrays), of course. Cheers, Alan
Re: Clean-ups in match.pd
On Tue, Jul 7, 2015 at 12:48 PM, Eric Botcazou ebotca...@adacore.com wrote: Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like *-match.c takes about 10 minutes to compile in stage2 these days). Yeah, it has already taken back all the speedup brought by the rewrite of the RTL gen* stuff by Richard S. :-( And it's going to get worse (read: larger). Looking at the time-report data I don't think splitting into multiple functions will help though. Splitting into multiple files would allow to parallelize the build at least. I'm gathering a profile to see where all the time in the checking stuff goes. As said, one code generation arrangement that is on my TODO list will remove some code duplication, but I'm not sure it will make a big enough difference. Richard. -- Eric Botcazou
Re: [RFC] two-phase marking in gt_cleare_cache
On 07/07/15 10:42, Richard Biener wrote: On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com wrote: On 06/07/15 15:29, Richard Biener wrote: On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, Using attached untested patch, I managed to minimize a test-case failure for PR 66714. The patch introduces two-phase marking in gt_cleare_cache: - first phase, it loops over all the hash table entries and removes those which are dead - second phase, it runs over all the live hash table entries and marks live items that are reachable from those live entries By doing so, we make the behaviour of gt_cleare_cache independent of the order in which the entries are visited, turning: - hard-to-trigger bugs which trigger for one visiting order but not for another, into - more easily triggered bugs which trigger for any visiting order. Any comments? I think it is only half-way correct in your proposed change. You only fix the issue for hashes of the same kind. To truly fix the issue you'd have to change generated code for gt_clear_caches () and provide a clearing-only implementation (or pass a operation mode bool to the core worker in hash-table.h). [ Btw, we have been discussing a similar issue before: https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ] True, the problem exists at the scope of all variables marked with 'cache', and this patch addresses the problem only within a single variable. Hmm, and don't we rather want to first mark and _then_ clear? I. In favor of first clear and then mark: It allows for: - a lazy one phase implementation for !ENABLE_CHECKING where you do a single clear-or-mark phase (so the clear is lazy). - an eager two phase implementation for ENABLE_CHECKING (where the clear is eager) The approach of first a marking phase and then a clearing phase means you always have to do these two phases (you can't do the marking lazily). True. 
First mark and then clear means the marking should be done iteratively. Each time you mark something live, another entry in another hash table could become live. Marking iteratively could become quite costly. I don't see this - marking is done recursively so if one entry makes another live and that makes another live the usual GC marking recursion will deal with this? That is not my understanding. Marking an item live doesn't mean that the associated cache entries become live. For that, we have to iterate again over all hash tables and all entries to find those entries. And by marking those, we may find new items which are live. And the process starts over again, until fixed point. [ If we maintain a per-item list of cache entries the item is the key for, then we can do this recursively, rather than iteratively. ] II. In favor of first mark and then clear: The users of garbage collection will need to be less precise. Because if entry B in the hash is live and would keep A live then A _is_ kept in the end but you'll remove it from the hash, possibly no longer using a still live copy. I'm not sure I understand the scenario you're concerned about, but ... say we have - entry B: item B - item A - entry A: item A - item Z If you do clear first and mark second, and you start out with item B live and item A dead: - during the clearing phase you clear entry A and keep entry B, and - during the marking phase you mark item A live. So we no longer have entry A, but item A is kept and entry B is kept. Yes. This makes the cache weaker in that after this GC operation a lookup of A no longer succeeds but it still is there. The whole point of your patch was to make the behavior more predictable and in some way it succeeds (within a cache). As it is supposed to put more stress on the cache logic (it's ENABLE_CHECKING only) it makes sense to clear optimistically (after all it's a cache and not guaranteed to find a still live entry). 
It would be still nice to cover all caches together because as I remember we've mostly seen issues of caches interacting. Attached patch (completed no-bootstrap c-only build) implements that. Thanks, - Tom Add clear-phase/mark-phase cache clearing 2015-07-07 Tom de Vries t...@codesourcery.com * gengtype.c (finish_cache_funcs): Add phase param to gt_clear_caches_file and gt_clear_caches. (write_roots): Add phase param to gt_clear_caches_file, and use. * ggc-common.c (ggc_mark_roots): Add arg to call to gt_clear_caches. Call gt_clear_caches twice for ENABLE_CHECKING. * ggc.h (gt_clear_caches): Add phase param to declaration. * hash-table.h (gt_cleare_cache): Add and handle phase param. --- gcc/gengtype.c | 11 ++- gcc/ggc-common.c | 11 ++- gcc/ggc.h| 2 +- gcc/hash-table.h | 55 --- 4 files changed, 61 insertions(+), 18 deletions(-) diff --git a/gcc/gengtype.c
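As a toy model of the two-phase scheme being discussed (plain Python dictionaries standing in for GCC's GTY caches, not the real hash-table code): phase one eagerly drops, across all caches together, every entry keyed by a dead item; phase two marks whatever the surviving entries reach. It reproduces the entry-B/item-A scenario above: item A is kept alive in the end, but its own cache entry is optimistically cleared, so a later lookup of A misses.

```python
def clear_caches_two_phase(caches, live):
    # Phase 1: across *all* caches, drop entries keyed by a dead item.
    for cache in caches:
        for key in [k for k in cache if k not in live]:
            del cache[key]
    # Phase 2: mark every item still reachable from a surviving entry
    # (the real GC recurses from here; one pass suffices for this example).
    for cache in caches:
        for value in cache.values():
            live.add(value)
    return caches, live

# entry B (keyed by the live item B) keeps item A;
# entry A is keyed by the initially-dead item A.
caches = [{'B': 'A'}, {'A': 'Z'}]
caches, live = clear_caches_two_phase(caches, live={'B'})
assert live == {'A', 'B'}          # item A is kept alive by entry B ...
assert caches == [{'B': 'A'}, {}]  # ... but a lookup of A no longer succeeds
```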
Re: [PATCH] Update instruction cost for LEON
2015-07-03 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.c (struct processor_costs): Set div cost for leon to match UT699 and AT697F. Set mul cost for leon3 to match standard leon3. So UT699 is not a standard LEON3? -- Eric Botcazou
Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)
On Tue, 7 Jul 2015, Richard Biener wrote: On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote: Please find attached the patch PR25529.patch that converts the pattern (unsigned t * 2)/2 into unsigned t & 0x7FFFFFFF. +/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (div (mult @0 INTEGER_CST@1) INTEGER_CST@1) You don't need to repeat INTEGER_CST, the second time @1 is enough. + (with { tree n2 = build_int_cst (TREE_TYPE (@0), + wi::exact_log2 (@1)); } + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); } + { n2; }) { n2; })) What happens if you write t*3/3? Huh, and you posted this patch twice? See my reply to the other copy for the correctness issues and better handling of exact_div. They are not the same: one is for left shifts and the other one for right shifts. And that makes a big difference: in t*c/c, the division is always exact, so all divisions are equivalent. This is not the case for t/c*c. -- Marc Glisse
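Marc's t*3/3 question is easy to check numerically. With 32-bit wrapping unsigned arithmetic the power-of-two identity holds for every t, while a non-power-of-two multiplier gives counterexamples, so the pattern really must restrict @1. A quick numeric check (plain Python, not a GCC test):

```python
M = 0xFFFFFFFF  # mask for 32-bit unsigned wraparound

def mul_div(t, c):
    # (unsigned t * c) / c with a wrapping 32-bit multiply and truncating divide
    return ((t * c) & M) // c

# (t * 2) / 2 == t & 0x7FFFFFFF for every 32-bit t
for t in (0, 1, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF, M):
    assert mul_div(t, 2) == t & 0x7FFFFFFF

# t*3/3: the multiply wraps, so the result is neither t nor any mask of t
t = 0x60000000
assert ((t * 3) & M) == 0x20000000  # the product has already lost information
assert mul_div(t, 3) != t
```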
Re: [PING 3] Re: [PATCH] warn for unsafe calls to __builtin_return_address
On 07/07/2015 04:41 AM, Martin Sebor wrote: This is a small change to diagnose unsafe calls to __builtin_{frame,return}_address (with an argument 2) that tend to return bogus values or lead to crashes at runtime. I hadn't realized you went through and implemented the suggestion. Thanks for doing this. I hope it gets approved. A review would be appreciated. (It may be that the relevant maintainers haven't noticed there's a code patch here because it is nested within the [PATCH] clarify doc for __builtin_return_address thread.) Thanks, Pedro Alves
Re: Clean-ups in match.pd
Bootstrap+testsuite on powerpc64le-unknown-linux-gnu (it looks like *-match.c takes about 10 minutes to compile in stage2 these days). Yeah, it has already taken back all the speedup brought by the rewrite of the RTL gen* stuff by Richard S. :-( -- Eric Botcazou
[PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS
Hi, When building the call-[1,5,6].c tests for micromips the jrc rather than the jr instruction is used to call the tail* functions. I have updated the test output to allow the jrc instruction to be matched. I have tested this on the mips-mti-elf target using mips32r2/{-mno-micromips/-mmicromips} test options and there are no new regressions. The patch and ChangeLog are below. Ok to commit? Many thanks, Andrew testsuite/ * gcc.target/mips/call-1.c: Allow testcase to match the jrc instruction. * gcc.target/mips/call-5.c: Ditto. * gcc.target/mips/call-6.c: Ditto. diff --git a/gcc/testsuite/gcc.target/mips/call-1.c b/gcc/testsuite/gcc.target/mips/call-1.c index 2f4a37e..a00126e 100644 --- a/gcc/testsuite/gcc.target/mips/call-1.c +++ b/gcc/testsuite/gcc.target/mips/call-1.c @@ -3,10 +3,10 @@ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,normal\n1:\tjalrs?\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,normal2\n1:\tjalrs?\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalrs?\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail2\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t } } */ __attribute__ ((noinline)) static void staticfunc () { asm (); } int normal (); diff --git a/gcc/testsuite/gcc.target/mips/call-5.c b/gcc/testsuite/gcc.target/mips/call-5.c index bfb95eb..d8d84d3 100644 --- a/gcc/testsuite/gcc.target/mips/call-5.c +++ 
b/gcc/testsuite/gcc.target/mips/call-5.c @@ -7,8 +7,8 @@ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalr\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjalr\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjalr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t } } */ __attribute__ ((noinline)) static void staticfunc () { asm (); } int normal (); diff --git a/gcc/testsuite/gcc.target/mips/call-6.c b/gcc/testsuite/gcc.target/mips/call-6.c index 117795d..e6c90d7 100644 --- a/gcc/testsuite/gcc.target/mips/call-6.c +++ b/gcc/testsuite/gcc.target/mips/call-6.c @@ -6,8 +6,8 @@ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,staticfunc\n1:\tjalr\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail\n1:\tjalr\t } } */ /* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail2\n1:\tjalr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail3\n1:\tjr\t } } */ -/* { dg-final { scan-assembler \\.reloc\t1f,R_MIPS_JALR,tail4\n1:\tjr\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail3\n1:)?\tjrc?\t } } */ +/* { dg-final { scan-assembler (\\.reloc\t1f,R_MIPS_JALR,tail4\n1:)?\tjrc?\t } } */ __attribute__ ((noinline)) static void staticfunc () { asm (); } int normal ();
Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t 0x7FFFFFFF)
On Tue, Jul 7, 2015 at 11:24 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 7 Jul 2015, Richard Biener wrote: On Tue, Jul 7, 2015 at 8:06 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 7 Jul 2015, Hurugalawadi, Naveen wrote: Please find attached the patch PR25529.patch that converts the pattern (unsigned t * 2)/2 into unsigned t & 0x7FFFFFFF. +/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (div (mult @0 INTEGER_CST@1) INTEGER_CST@1) You don't need to repeat INTEGER_CST, the second time @1 is enough. + (with { tree n2 = build_int_cst (TREE_TYPE (@0), + wi::exact_log2 (@1)); } + (if (TYPE_UNSIGNED (TREE_TYPE (@0))) + (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); } + { n2; }) { n2; })) What happens if you write t*3/3? Huh, and you posted this patch twice? See my reply to the other copy for the correctness issues and better handling of exact_div. They are not the same: one is for left shifts and the other one for right shifts. And that makes a big difference: in t*c/c, the division is always exact, so all divisions are equivalent. This is not the case for t/c*c. Ah, sorry. Still the same comment about computing the constant and the placing of the 'with' applies. For signed types with TYPE_OVERFLOW_UNDEFINED you can simply cancel the operation (even for non-power-of-two multipliers). In fold-const.c, extract_muldiv contains magic to handle this kind of case. Otherwise for signed division (only the sign of the division matters, so you can probably ignore sign-changing conversions of the multiplication result) you can simplify it to a sign-extension from bit precision - log2, with the proposed introduction of a SEXT_EXPR (see the other thread about type promotion). Richard. -- Marc Glisse
Re: [RFC] two-phase marking in gt_cleare_cache
On Tue, Jul 7, 2015 at 11:39 AM, Tom de Vries tom_devr...@mentor.com wrote: On 07/07/15 10:42, Richard Biener wrote: On Mon, Jul 6, 2015 at 7:29 PM, Tom de Vries tom_devr...@mentor.com wrote: On 06/07/15 15:29, Richard Biener wrote: On Mon, Jul 6, 2015 at 3:25 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Jul 6, 2015 at 10:57 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, Using attached untested patch, I managed to minimize a test-case failure for PR 66714. The patch introduces two-phase marking in gt_cleare_cache: - first phase, it loops over all the hash table entries and removes those which are dead - second phase, it runs over all the live hash table entries and marks live items that are reachable from those live entries By doing so, we make the behaviour of gt_cleare_cache independent of the order in which the entries are visited, turning: - hard-to-trigger bugs which trigger for one visiting order but not for another, into - more easily triggered bugs which trigger for any visiting order. Any comments? I think it is only half-way correct in your proposed change. You only fix the issue for hashes of the same kind. To truly fix the issue you'd have to change generated code for gt_clear_caches () and provide a clearing-only implementation (or pass a operation mode bool to the core worker in hash-table.h). [ Btw, we have been discussing a similar issue before: https://gcc.gnu.org/ml/gcc/2010-07/msg00077.html ] True, the problem exists at the scope of all variables marked with 'cache', and this patch addresses the problem only within a single variable. Hmm, and don't we rather want to first mark and _then_ clear? I. In favor of first clear and then mark: It allows for: - a lazy one phase implementation for !ENABLE_CHECKING where you do a single clear-or-mark phase (so the clear is lazy). 
- an eager two phase implementation for ENABLE_CHECKING (where the clear is eager) The approach of first a marking phase and then a clearing phase means you always have to do these two phases (you can't do the marking lazily). True. First mark and then clear means the marking should be done iteratively. Each time you mark something live, another entry in another hash table could become live. Marking iteratively could become quite costly. I don't see this - marking is done recursively so if one entry makes another live and that makes another live the usual GC marking recursion will deal with this? That is not my understanding. Marking an item live doesn't mean that the associated cache entries become live. For that, we have to iterate again over all hash tables and all entries to find those entries. And by marking those, we may find new items which are live. And the process starts over again, until fixed point. All used predicates are basically ggc_marked_p () AFAIK. So when sth was not marked previosuly GC will recurse to marking it. GC only considers the reference from the cache special not references from entries in the cache. But maybe I am missing something here. [ If we maintain a per-item list of cache entries the item is the key for, then we can do this recursively, rather than iteratively. II. In favor of first mark and then clear: The users of garbage collection will need to be less precise. Because if entry B in the hash is live and would keep A live then A _is_ kept in the end but you'll remove it from the hash, possibly no longer using a still live copy. I'm not sure I understand the scenario you're concerned about, but ... say we have - entry B: item B - item A - entry A: item A - item Z If you do clear first and mark second, and you start out with item B live and item A dead: - during the clearing phase you clear entry A and keep entry B, and - during the marking phase you mark item A live. 
So we no longer have entry A, but item A is kept and entry B is kept. Yes. This makes the cache weaker in that after this GC operation a lookup of A no longer succeeds, but it is still there. The whole point of your patch was to make the behavior more predictable, and in some way it succeeds (within a cache). As it is supposed to put more stress on the cache logic (it's ENABLE_CHECKING only), it makes sense to clear optimistically (after all, it's a cache and not guaranteed to find a still-live entry). It would still be nice to cover all caches together because, as I remember, we've mostly seen issues of caches interacting. The attached patch (completed a no-bootstrap C-only build) implements that. Looks good to me. Thanks, Richard. Thanks, - Tom
Re: [PATCH] Do not use floating point registers when compiling with -msoft-float for SPARC
__builtin_apply* and __builtin_return access the floating-point registers on SPARC even when compiling with -msoft-float. Ouch. The fix is OK for all active branches but... 2015-06-26 Daniel Cederman ceder...@gaisler.com * config/sparc/sparc.c (sparc_function_value_regno_p): Floating point registers cannot be used when compiling for a target without FPU. * config/sparc/sparc.md: A function cannot return a value in a floating point register when compiled without floating point support. The ChangeLog must just describe the what, nothing more. If the rationale is not obvious, then a comment must be added _in the code_ itself. * config/sparc/sparc.c (sparc_function_value_regno_p): Do not return true on %f0 for a target without FPU. * config/sparc/sparc.md (untyped_call): Do not save %f0 for a target without FPU. (untyped_return): Do not load %f0 for a target without FPU.

+
+  if ( TARGET_FPU )
+    {
+      rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
+      emit_move_insn (valreg2,
+                      adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8));
+      emit_use (valreg2);
+    }

Superfluous spaces around TARGET_FPU here. -- Eric Botcazou
Re: [patch 4/9] Flatten sel-sched-dump.h and sel-sched-ir.h
On Tue, 7 Jul 2015, Andrew MacLeod wrote: This patch flattens both sel-sched-dump.h and sel-sched-ir.h. Both these files end up including cfgloop.h, so in preparation for flattening cfgloop.h, flatten these. Note they actually have only a small effect on what includes them. This patch removes #include insn-attr.h from sel-sched-ir.h without adding it to .c files. I'm curious how it works, is that file now arranged to be included elsewhere? (sorry if I missed it, but the patch series does not seem to mention insn-attr.h specifically) Thanks. Alexander
Re: [gomp4] Allow parameter declarations with deviceptr
Hi! On Wed, 1 Jul 2015 16:33:24 -0700, Cesar Philippidis ce...@codesourcery.com wrote: On 07/01/2015 02:25 PM, James Norris wrote: This patch allows parameter declarations to be used as arguments to deviceptr for C and C++. Thanks! I suppose this does fix http://gcc.gnu.org/PR64748? Does this fix an existing failure? If not, can you please add a new test case? An earlier submission, http://news.gmane.org/find-root.php?message_id=%3C54E23658.6060105%40codesourcery.com%3E, did include some testsuite changes -- but I had not seen any update of this patch after Jakub's and my review comments. Grüße, Thomas
[PATCH][13/n] Remove GENERIC stmt combining from SCCVN
This moves a few more patterns that show up during bootstrap. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-07-07 Richard Biener rguent...@suse.de * fold-const.c (fold_binary_loc): Move (X & C2) << C1 -> (X << C1) & (C2 << C1) simplification ... * match.pd: ... here. Add (X * C1) % C2 -> 0 simplification pattern derived from extract_muldiv_1. * gcc.dg/vect/vect-over-widen-3-big-array.c: Adjust.

Index: gcc/match.pd
===
--- gcc/match.pd	(revision 225504)
+++ gcc/match.pd	(working copy)
@@ -230,7 +230,14 @@ (define_operator_list CBRT BUILT_IN_CBRT
 /* (X % Y) % Y is just X % Y.  */
 (simplify
  (mod (mod@2 @0 @1) @1)
- @2))
+ @2)
+ /* From extract_muldiv_1: (X * C1) % C2 is zero if C1 is a multiple of C2.  */
+ (simplify
+  (mod (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (ANY_INTEGRAL_TYPE_P (type)
+       && TYPE_OVERFLOW_UNDEFINED (type)
+       && wi::multiple_of_p (@1, @2, TYPE_SIGN (type)))
+   { build_zero_cst (type); })))

 /* X % -C is the same as X % C.  */
 (simplify
@@ -992,6 +999,16 @@ (define_operator_list CBRT BUILT_IN_CBRT
      (if (shift_type == TREE_TYPE (@3))
       (bit_and @4 { newmaskt; }
+
+/* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
+   (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
+(for shift (lshift rshift)
+ (simplify
+  (shift (convert? (bit_and @0 INTEGER_CST@2)) INTEGER_CST@1)
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
+    (bit_and (shift (convert @0) @1) { mask; })
+
 /* Simplifications of conversions.  */

 /* Basic strip-useless-type-conversions / strip_nops.  */

Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 225504)
+++ gcc/fold-const.c	(working copy)
@@ -11194,27 +11140,6 @@ fold_binary_loc (location_t loc,
 	      prec) == 0)
 	return TREE_OPERAND (arg0, 0);

-      /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
-	 (X & C2) >> C1 into (X >> C1) & (C2 >> C1)
-	 if the latter can be further optimized.  */
-      if ((code == LSHIFT_EXPR || code == RSHIFT_EXPR)
-	  && TREE_CODE (arg0) == BIT_AND_EXPR
-	  && TREE_CODE (arg1) == INTEGER_CST
-	  && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST)
-	{
-	  tree mask = fold_build2_loc (loc, code, type,
-				       fold_convert_loc (loc, type,
-							 TREE_OPERAND (arg0, 1)),
-				       arg1);
-	  tree shift = fold_build2_loc (loc, code, type,
-					fold_convert_loc (loc, type,
-							  TREE_OPERAND (arg0, 0)),
-					arg1);
-	  tem = fold_binary_loc (loc, BIT_AND_EXPR, type, shift, mask);
-	  if (tem)
-	    return tem;
-	}
-
       return NULL_TREE;

     case MIN_EXPR:

Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	(revision 225504)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	(working copy)
@@ -58,6 +58,6 @@ int main (void)
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
Re: [gomp] Move openacc vector worker single handling to RTL
On 07/07/15 05:54, Jakub Jelinek wrote: On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote: How does this interact with #pragma acc routine {gang,worker,vector,seq} ? Or is that something to be added later on? That is to be added later on. I suspect such routines will trivially work, as they'll be marked up with the loop head/tail functions and levels builtin (the latter might need a bit of reworking). What will need additional work at that point is the callers of routines -- they're typically called from a foo-single mode, but need to get all threads into the called function. I'm thinking each call site will look like a mini-loop[*] surrounded by a head/tail marker. (all that can be done in the device-side compiler once real call sites are known.) nathan [*] of course it won't be a loop. Perhaps fork/join are less confusing names after all. WDYT?
Re: [gomp] Move openacc vector worker single handling to RTL
On Tue, Jul 07, 2015 at 10:12:56AM -0400, Nathan Sidwell wrote: On 07/07/15 05:54, Jakub Jelinek wrote: On Mon, Jul 06, 2015 at 03:34:51PM -0400, Nathan Sidwell wrote: How does this interact with #pragma acc routine {gang,worker,vector,seq} ? Or is that something to be added later on? That is to be added later on. I suspect such routines will trivially work, as they'll be marked up with the loop head/tail functions and levels builtin (the latter might need a bit of reworking). What will need additional work at that point is the callers of routines -- they're typically called from a foo-single mode, but need to get all threads into the called function. I'm thinking each call site will look like a mini-loop[*] surrounded by a head/tail marker. (all that can be done in the device-side compiler once real call sites are known.) Wouldn't function attributes be better for that case, and just use the internal functions for the case when the mode is being changed in the middle of a function? I agree that fork/join might be less confusing. BTW, where do you plan to lower the internal functions for non-PTX? Doing it in RTL mach reorg is too late for those, and we shouldn't be writing it for each single target, as for non-PTX (perhaps non-HSA) I bet the behavior is the same. Jakub
[gomp4] OpenACC device_type clause (was: OpenACC: Complete changes to disallow the independent clause after device_type)
Hi Cesar! On Tue, 7 Jul 2015 14:58:20 +0200, I wrote: On Wed, 1 Jul 2015 07:52:13 -0500, James Norris jnor...@codesourcery.com wrote: The independent clause is not available for use with device_type clauses associated with loop directives. [...] Independent of this, would you please verify (also for the corresponding Fortran code) that given these changes:

--- gcc/testsuite/c-c++-common/goacc/dtype-1.c
+++ gcc/testsuite/c-c++-common/goacc/dtype-1.c
@@ -53,7 +53,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop dtype (nVidia) auto
     for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop dtype (nVidia) independent
+#pragma acc loop dtype (nVidia)
       for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidia) seq
         for (i6 = 1; i6 < 10; i6++)
@@ -69,7 +69,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop dtype (nVidia) auto device_type (*) seq
     for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop device_type (nVidia) independent device_type (*) seq
+#pragma acc loop device_type (nVidia) device_type (*) seq
       for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidia) seq
         for (i6 = 1; i6 < 10; i6++)
@@ -85,7 +85,7 @@ test ()
   for (i3 = 1; i3 < 10; i3++)
 #pragma acc loop device_type (nVidiaGPU) auto device_type (*) seq
     for (i4 = 1; i4 < 10; i4++)
-#pragma acc loop dtype (nVidiaGPU) independent dtype (*) seq
+#pragma acc loop dtype (nVidiaGPU) dtype (*) seq
      for (i5 = 1; i5 < 10; i5++)
 #pragma acc loop device_type (nVidiaGPU) seq device_type (*) seq
        for (i6 = 1; i6 < 10; i6++)

...
it is indeed correct to adjust the scan-tree-dump-times as follows:

@@ -147,7 +147,7 @@ test ()
 /* { dg-final { scan-tree-dump-times "acc loop auto private\\(i4\\)" 2 "omplower" } } */
-/* { dg-final { scan-tree-dump-times "acc loop private\\(i5\\)" 2 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "acc loop private\\(i5\\)" 1 "omplower" } } */
 /* { dg-final { scan-tree-dump-times "acc loop seq private\\(i6\\)" 3 "omplower" } } */
@@ -155,6 +155,6 @@ test ()
 /* { dg-final { scan-tree-dump-times "acc loop seq private\\(i4\\)" 1 "omplower" } } */
-/* { dg-final { scan-tree-dump-times "acc loop seq private\\(i5\\)" 1 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "acc loop seq private\\(i5\\)" 2 "omplower" } } */
 /* { dg-final { cleanup-tree-dump "omplower" } } */

Is something possibly going wrong if there are two consecutive device_type clauses, such as in: #pragma acc loop device_type (nVidia) device_type (*) seq ..., or: #pragma acc loop dtype (nVidiaGPU) dtype (*) seq I'm unclear why I had to apply these scan-tree-dump-times changes?

--- gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
+++ gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
@@ -54,7 +54,7 @@ program dtype
   do i3 = 1, 10
 !$acc loop device_type (nVidia) auto
     do i4 = 1, 10
-  !$acc loop dtype (nVidia) independent
+  !$acc loop dtype (nVidia)
       do i5 = 1, 10
 !$acc loop dtype (nVidia) seq
         do i6 = 1, 10
@@ -76,7 +76,7 @@ program dtype
   do i3 = 1, 10
 !$acc loop device_type (nVidia) auto dtype (*) seq
     do i4 = 1, 10
-  !$acc loop dtype (nVidia) independent &
+  !$acc loop dtype (nVidia) &
   !$acc dtype (*) seq
       do i5 = 1, 10
 !$acc loop device_type (nVidia) seq
@@ -99,7 +99,7 @@ program dtype
   do i3 = 1, 10
 !$acc loop dtype (nVidiaGPU) auto device_type (*) seq
     do i4 = 1, 10
-  !$acc loop dtype (nVidiaGPU) independent &
+  !$acc loop dtype (nVidiaGPU) &
   !$acc dtype (*) seq
       do i5 = 1, 10
 !$acc loop dtype (nVidiaGPU) seq device_type (*) seq

No adjustment has been necessary here. Grüße, Thomas
RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, July 07, 2015 2:21 PM To: Ajit Kumar Agarwal Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On Sat, Jul 4, 2015 at 2:39 PM, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com wrote: -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, June 30, 2015 4:42 PM To: Ajit Kumar Agarwal Cc: l...@redhat.com; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On Tue, Jun 30, 2015 at 10:16 AM, Ajit Kumar Agarwal ajit.kumar.agar...@xilinx.com wrote: All: The patch below adds a new path-splitting optimization pass on the SSA representation. The path-splitting optimization pass moves the join block of an if-then-else, when it is the same as the loop latch, into its predecessors, where it gets merged, preserving the SSA representation. The patch is tested for the MicroBlaze and i386 targets. The EEMBC/MiBench benchmarks are run with the MicroBlaze target, and a performance gain of 9.15% is seen for rgbcmy01_lite (EEMBC benchmarks). The DejaGnu tests are run for the MicroBlaze target; no regression is seen, and the new testcases attached pass. For i386, bootstrapping goes through fine, and the SPEC CPU2000 benchmarks are run with this patch. The following observations were made with the SPEC CPU2000 benchmarks: the ratio with the path-splitting change vs. without it is 3653.353 vs. 3652.14 for INT benchmarks, and 4353.812 vs. 4345.351 for FP benchmarks. Based on comments on the RFC patch, the following changes were done: 1. Added a new pass for the path splitting changes. 2.
Placed the new path-splitting optimization pass before the copy propagation pass. 3. The join block that is the same as the loop latch is wired into its predecessors so that the CFG cleanup pass will merge the blocks wired together. 4. The copy propagation routines added for the path splitting changes are not needed, as suggested by Jeff. They are removed in the patch, as the copy propagation in the copied join blocks will be done by the existing copy propagation pass and the update-ssa pass. 5. Only the propagation of phi results of the join block with the phi argument is done, which will not be done by the existing update_ssa or copy propagation pass on the tree SSA representation. 6. Added 2 tests: a) compilation check tests, b) execution tests. 7. Refactoring of the code for the feasibility check and finding the join block same as the loop latch node. [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation. Added a new pass for path splitting on the tree SSA representation. The path-splitting optimization does a CFG transformation in which the join block of an if-then-else that is the same as the loop latch node is moved and merged with the predecessor blocks, preserving the SSA representation. ChangeLog: 2015-06-30 Ajit Agarwal ajit...@xilinx.com * gcc/Makefile.in: Add the build of the new file tree-ssa-path-split.c * gcc/common.opt: Add the new flag ftree-path-split. * gcc/opts.c: Add an entry for the path splitting pass with optimization level greater than or equal to O2. * gcc/passes.def: Enable and add the new path splitting pass. * gcc/timevar.def: Add the new entry for TV_TREE_PATH_SPLIT. * gcc/tree-pass.h: Extern declaration of make_pass_path_split. * gcc/tree-ssa-path-split.c: New file for the path splitting pass. * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New testcase. * gcc/testsuite/gcc.dg/path-split-1.c: New testcase.
I'm not 100% sure I understand the transform but what I see from the testcases it tail-duplicates from a conditional up to a loop latch block (not sure if it includes it and thus ends up creating a loop nest or not). An observation I have is that the pass should at least share the transform stage to some extent with the existing tracer pass (tracer.c) which essentially does the same but not restricted to loops in any way. The following piece of code from tracer.c can be shared with the existing path splitting pass. { e = find_edge (bb, bb2); copy = duplicate_block (bb2, e, bb); flush_pending_stmts (e); add_phi_args_after_copy (copy, 1, NULL); } Sharing the above code of the transform stage of tracer.c with the path splitting pass has the following limitation. 1. The duplicated loop latch node is wired to its predecessors and the existing phi node in the loop
[patch 6/9] Flatten gimple-streamer.h
This one is amusing... 3 header files, all of them already included in all the files which include this. Bootstraps from scratch on x86_64-unknown-linux-gnu with no new regressions. Also compiles all the files in config-list.mk. * gimple-streamer.h: Remove all includes. Index: gimple-streamer.h === *** gimple-streamer.h (revision 225452) --- gimple-streamer.h (working copy) *** along with GCC; see the file COPYING3. *** 22,30 #ifndef GCC_GIMPLE_STREAMER_H #define GCC_GIMPLE_STREAMER_H - #include tm.h - #include hard-reg-set.h - #include function.h #include lto-streamer.h /* In gimple-streamer-in.c */ --- 22,27
Re: [RFC] two-phase marking in gt_cleare_cache
Hi, On Mon, 6 Jul 2015, Richard Biener wrote: By doing so, we make the behaviour of gt_cleare_cache independent of the order in which the entries are visited, turning: - hard-to-trigger bugs which trigger for one visiting order but not for another, into - more easily triggered bugs which trigger for any visiting order. Any comments? I think it is only half-way correct in your proposed change. You only fix the issue for hashes of the same kind. To truly fix the issue you'd have to change the generated code for gt_clear_caches () and provide a clearing-only implementation (or pass an operation mode bool to the core worker in hash-table.h). Hmm, and don't we rather want to first mark and _then_ clear? Because if entry B in the hash is live and would keep A live, then A _is_ kept in the end, but you'll remove it from the hash, possibly no longer using a still-live copy. I don't think such use has ever worked with the caching hash tables. Even the old (before C++) scheme didn't iterate, i.e. if a cache-hash entry A became live from outside, but it itself kept an entry B live in the hash table (with no other references to B), then this never worked (or only by luck), because the slot was always cleared if !ggc_marked_p, so if B was visited before A it was removed from the hash table (and in particular B's gt_ggc_mx routine was never called, so whatever B needed wasn't even marked). Given this I think the call to gt_ggc_mx is superfluous, because it wouldn't work reliably for multi-step dependencies anyway. Hence a situation that works with that call in place, and breaks without it, is actually a bug waiting to be uncovered. Ciao, Michael.
RE: [PATCH] MIPS: Fix the call-[1,5,6].c tests to allow the jrc instruction to be matched when testing with microMIPS
OK. Committed as SVN 225516. Regards, Andrew