from:"Matthew Wahab"

[Testsuite] Use correct effective-target settings for ARM fp16-aapcs tests.

2016-09-30 Thread Matthew Wahab


The recently added tests gcc.target/arm/fp16-aapcs-{3,4}.c are intended
to check the behaviour of the ARM Alternative FP16 format. They both
check for compiler support of FP16 using dg-require-effective-target
arm_fp16_ok. This is too weak since the directive is true when
-mfp16-format=ieee is set, as it is when the +fp16 extension is
enabled.

This patch changes the directives for both tests to
  dg-require-effective-target arm_fp16_alternative_ok
which is only true when -mfp16-format=alternative is set.

For fp16-aapcs-4.c, it was also necessary to add
-mfp16-format=alternative to the dg-options, rather than use the
arm_fp16_alternative add-options. There seems to be some interaction
between the different directives and the dg-skip-if, but I can't track
it down.

Tested for cross-compiled arm-none-eabi by running the
gcc.target/arm/arm.exp testsuite on an ARMv8.2-A emulator and on an
ARMv8-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-09-28  Matthew Wahab  

* gcc.target/arm/fp16-aapcs-3.c: Replace the arm_fp16_ok with
arm_fp16_alternative_ok as the required effective target.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.  Also add
-mfp16-format=alternative to the dg-options directive and remove
the dg-add-options directive.
>From 5ca74bbfdf2b87904ca21fcaa54952cbd1d3916c Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 28 Sep 2016 10:54:43 +0100
Subject: [PATCH] [Testsuite] Use correct effective-target settings for ARM 
 fp16-aapcs tests.

The recently added tests gcc.target/arm/fp16-aapcs-{3,4}.c are intended
to check the behaviour of the ARM Alternative FP16 format. They both
check for compiler support of FP16 using dg-require-effective-target
arm_fp16_ok. This is too weak since the directive is true when
-mfp16-format=ieee is set, as it is when the +fp16 extension is
enabled.

This patch changes the directives for both tests to
  dg-require-effective-target arm_fp16_alternative_ok
which is only true when -mfp16-format=alternative is set.

For fp16-aapcs-4.c, it was also necessary to add
-mfp16-format=alternative to the dg-options, rather than use the
arm_fp16_alternative add-options. There seems to be some interaction
between the different directives and the dg-skip-if, but I can't track
it down.

Tested for cross-compiled arm-none-eabi by running the
gcc.target/arm/arm.exp testsuite on an ARMv8.2-A emulator and on an
ARMv8-A emulator.

testsuite/
2016-09-28  Matthew Wahab  

	* gcc.target/arm/fp16-aapcs-3.c: Replace the arm_fp16_ok with
	arm_fp16_alternative_ok as the required effective target.
	* gcc.target/arm/fp16-aapcs-4.c: Likewise.  Also add
	-mfp16-format=alternative to the dg-options directive and remove
	the dg-add-options directive.
---
 gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c | 2 +-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
index b7d7e58..84fc0a0 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile }  */
 /* { dg-require-effective-target arm_hard_vfp_ok }  */
-/* { dg-require-effective-target arm_fp16_ok } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
 /* { dg-options "-O2" }  */
 /* { dg-add-options arm_fp16_alternative } */
 
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
index 4c90a56..41c7ab7 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
@@ -1,7 +1,6 @@
 /* { dg-do compile }  */
-/* { dg-require-effective-target arm_fp16_ok } */
-/* { dg-options "-mfloat-abi=softfp -O2" }  */
-/* { dg-add-options arm_fp16_alternative } */
+/* { dg-require-effective-target arm_fp16_alternative_ok } */
+/* { dg-options "-mfloat-abi=softfp -O2 -mfp16-format=alternative" }  */
 /* { dg-skip-if "incompatible float-abi" { arm*-*-* } { "-mfloat-abi=hard" } } */
 
 /* Test __fp16 arguments and return value in registers (softfp).  */
-- 
2.1.4

[ARM] Fix new constraints and attributes of SI/HI data movement patterns

2016-09-29 Thread Matthew Wahab


The patch at https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01975.html
added constraints and "arch" attributes to some data movement patterns,
to fix wrongly generating MOVW instructions when not supported by the
targets. This was needed to fix a broken build but also resulted in MOVW
instructions not being generated when they should be.

This patch fixes the code-gen problems by removing the new attributes
from *arm_movsi_vfp and *thumb2_movsi_vfp since they are unnecessary.
It changes the rest of the added arch attributes from "t2", which
requires that the target be generating Thumb-2 code, to the weaker
"v6t2", which only requires that the target architecture support Thumb-2.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
on ARMv8-A and for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.2-A emulator.

There is one unrelated failure in gcc.target/arm/fp16-aapcs-3.c, which
is a recently added test for the FP16 ARM alternative format. This has
dg-require-effective-target arm_fp16_ok
which is true for +fp16 because that implies -mfp16-format=ieee. The test
should instead be requiring
dg-require-effective-target arm_fp16_alternative_ok
I'll send a patch to fix this.

Ok for trunk?
Matthew

gcc/
2016-09-29  Matthew Wahab  

* config/arm/arm.md (*arm_movsi_insn): Replace "t2" arch attribute
with "v6t2".  Move "arch" attribute above "pool_range".
* config/arm/vfp.md (*arm_movhi_vfp): Likewise.
(*thumb2_movhi_vfp): Likewise.
(*arm_movhi_fp16): Likewise.
(*thumb2_movhi_fp16): Likewise.
(*arm_movsi_vfp): Remove "arch" attribute.
(*thumb2_movsi_vfp): Likewise.
>From 7ef04c9cc749f1705ea657874c9db43e8a7d5320 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 28 Sep 2016 12:08:22 +0100
Subject: [PATCH] [ARM] Fix new constraints and attributes of SI/HI data
 movement patterns

The patch at https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01975.html
added constraints and "arch" attributes to some data movement patterns,
to fix wrongly generating MOVW instructions when not supported by the
targets. This was needed to fix a broken build but also resulted in MOVW
instructions not being generated when they should be.

This patch fixes the code-gen problems by removing the new attributes
from *arm_movsi_vfp and *thumb2_movsi_vfp since they are unnecessary.
It changes the rest of the added arch attributes from "t2", which
requires that the target be generating Thumb-2 code, to the weaker
"v6t2", which only requires that the target architecture support Thumb-2.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
on ARMv8-A and for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.2-A emulator.

There is one unrelated failure in gcc.target/arm/fp16-aapcs-3.c, which
is a recently added test for the FP16 ARM alternative format. This has
dg-require-effective-target arm_fp16_ok
which is true for +fp16 because that implies -mfp16-format=ieee. The test
should instead be requiring
dg-require-effective-target arm_fp16_alternative_ok

gcc/
2016-09-29  Matthew Wahab  

	* config/arm/arm.md (*arm_movsi_insn): Replace "t2" arch attribute
	with "v6t2".  Move "arch" attribute above "pool_range".
	* config/arm/vfp.md (*arm_movhi_vfp): Likewise.
	(*thumb2_movhi_vfp): Likewise.
	(*arm_movhi_fp16): Likewise.
	(*thumb2_movhi_fp16): Likewise.
	(*arm_movsi_vfp): Remove "arch" attribute.
	(*thumb2_movsi_vfp): Likewise.
---
 gcc/config/arm/arm.md |  2 +-
 gcc/config/arm/vfp.md | 10 ++++------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 999292b..396aab7 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6064,8 +6064,8 @@
    str%?\\t%1, %0"
   [(set_attr "type" "mov_reg,mov_imm,mvn_imm,mov_imm,load1,store1")
    (set_attr "predicable" "yes")
+   (set_attr "arch" "*,*,*,v6t2,*,*")
    (set_attr "pool_range" "*,*,*,*,4096,*")
-   (set_attr "arch" "*,*,*,t2,*,*")
    (set_attr "neg_pool_range" "*,*,*,*,4084,*")]
 )
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 21eaf48..f39e590 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -65,7 +65,7 @@
 (const_string "f_mcr")
 (const_string "f_mrc")
 (const_string "fmov")])
-  (set_attr "arch" "*, *, t2, *, *, *, *, *")
+  (set_attr "arch" "*, *, v6t2, *, *, *, *, *")
   (set_attr "pool_range" "*, *, *, *, 256, *, *, *")
   (set_attr "neg_pool_range" "*, *, *, *, 244, *, *, *")
   (set_attr "length" "4")]
@@ -109,7 +109,7 @@
   (set_attr "type"
"mov_reg, mov_

[ARM] Fix invalid instructions generated for data movement.

2016-09-27 Thread Matthew Wahab


Recently added support for ARMv8.2-A
(https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01240.html) included a
number of changes to improve data movement, particularly for HI and HF
mode values. These included the use of the Thumb-2 instruction MOVW and
of the new VMOV.F16 instruction. There are problems with both: the use
of MOVW isn't properly guarded, so it can be generated for targets
that don't support it, and the VMOV.F16 instruction is wrongly marked as
being predicable.

This patch adds guards to the use of the MOVW so that it is only
generated when the target supports Thumb-2 and fixes the predication on
the VMOV.F16 instruction.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
on ARMv8-A and for arm-none-eabi with cross compiled check-gcc on an
ARMv8.2-A emulator.

There is one failure on arm-none-linux-gnueabihf,
gcc.dg/guality/pr36728-1.c, which is due to MOVW not being generated,
even for ARMv7-A. The generated code is otherwise correct. I think I
understand why MOVW isn't being emitted but it'll take time to test
properly.

Since this patch fixes a broken build, is it OK to commit it and to fix
the poor code-gen in a follow-up patch?
Matthew

gcc/
2016-09-27  Matthew Wahab  

* config/arm/arm.md (*arm_movsi_insn): Add "arch" attribute.
* config/arm/vfp.md (*arm_movhi_vfp): Likewise.
(*thumb2_movhi_vfp): Likewise.
(*arm_movhi_fp16): Remove predication operand from VMOV.F16
template.  Expand predicable attribute to mark VMOV.F16 as not
predicable.  Add "arch" attribute.
(*thumb2_movhi_fp16): Likewise.
(*arm_movsi_vfp): Break a long line.  Add "arch" attribute.
(*thumb2_movsi_vfp): Add "arch" attribute.
>From bedba58f504ef1f68ee0f90d9e34563b75653ae5 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 26 Sep 2016 12:05:34 +0100
Subject: [PATCH] [ARM] Fix invalid instructions generated for data movement.

Recently added support for ARMv8.2-A
(https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01240.html) included a
number of changes to improve data movement, particularly for HI and HF
mode values. These included the use of the Thumb-2 instruction MOVW and
of the new VMOV.F16 instruction. There are problems with both: the use
of MOVW isn't properly guarded, so it can be generated for targets
that don't support it. The VMOV.F16 instruction is wrongly marked as
being predicable.

This patch adds guards to the use of the MOVW so that it is only
generated when the target supports Thumb-2 and fixes the predication on
the VMOV.F16 instruction.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
on ARMv8-A and for arm-none-eabi with cross compiled check-gcc on an
ARMv8.2-A emulator.

There is one failure on arm-none-linux-gnueabihf,
gcc.dg/guality/pr36728-1.c, which is due to MOVW not being generated,
even for ARMv7-A. The generated code is otherwise correct. I think I
understand why MOVW isn't being emitted but it'll take time to test
properly.

Since this patch fixes a broken build, is it OK to commit it and to fix
the poor code-gen in a follow-up patch?

gcc/
2016-09-27  Matthew Wahab  

	* config/arm/arm.md (*arm_movsi_insn): Add "arch" attribute.
	* config/arm/vfp.md (*arm_movhi_vfp): Likewise.
	(*thumb2_movhi_vfp): Likewise.
	(*arm_movhi_fp16): Remove predication operand from VMOV.F16
	template.  Expand predicable attribute to mark VMOV.F16 as not
	predicable.  Add "arch" attribute.
	(*thumb2_movhi_fp16): Likewise.
	(*arm_movsi_vfp): Break a long line.  Add "arch" attribute.
	(*thumb2_movsi_vfp): Add "arch" attribute.
---
 gcc/config/arm/arm.md |  1 +
 gcc/config/arm/vfp.md | 18 +++++++++++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 411754f..999292b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6065,6 +6065,7 @@
   [(set_attr "type" "mov_reg,mov_imm,mvn_imm,mov_imm,load1,store1")
    (set_attr "predicable" "yes")
    (set_attr "pool_range" "*,*,*,*,4096,*")
+   (set_attr "arch" "*,*,*,t2,*,*")
    (set_attr "neg_pool_range" "*,*,*,*,4084,*")]
 )
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 5d22c34..21eaf48 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -65,6 +65,7 @@
 (const_string "f_mcr")
 (const_string "f_mrc")
 (const_string "fmov")])
+  (set_attr "arch" "*, *, t2, *, *, *, *, *")
   (set_attr "pool_range" "*, *, *, *, 256, *, *, *")
   (set_attr "neg_pool_range" "*, *, *, *, 244, *, *, *")
   (set_attr "length" "4")]
@@ -108,6 +109,7 @@
   (set_attr "type"
"mov_reg, mov

Re: [PATCH 5/17][ARM] Enable HI mode moves for floating point values.

2016-09-26 Thread Matthew Wahab


On 26/09/16 14:15, Christophe Lyon wrote:

Hi,

Sorry for the delay, I've been travelling.

On 27 July 2016 at 15:57, Ramana Radhakrishnan
 wrote:

On Tue, May 17, 2016 at 3:29 PM, Matthew Wahab
 wrote:

The handling of 16-bit integer data-movement in the ARM backend doesn't
make full use of the VFP instructions when they are available, even when
the values are for use in VFP operations.

This patch adds support for using the VFP instructions and registers
when moving 16-bit integer and floating point data between registers and
between registers and memory.


With this patch, I've noticed 2 regressions which still seem present on
trunk.

When GCC is configured with:
  --target=arm-none-linux-gnueabihf  --disable-libgomp
--disable-libmudflap --disable-libcilkrts --enable-checking
--enable-languages=c,c++,fortran --with-float=hard
--enable-build-with-cxx --with-mode=arm --with-cpu=cortex-a9
--with-fpu=vfp

and the tests run with -march=armv5t in RUNTESTFLAGS

FAIL: gcc.target/arm/constant-pool.c (test for excess errors)
because:
/ccepmUiD.s:29: Error: selected processor does not support ARM mode
`movw r0,4660'

FAIL: gfortran.fortran-torture/execute/nan_inf_fmt.f90 compilation,  -O2
because:
/cc76h4mz.s: Assembler messages:
/cc76h4mz.s:413: Error: selected processor does not support ARM mode
`movw r3,8224'
/cc76h4mz.s:496: Error: selected processor does not support ARM mode
`movw r2,8224'
/cc76h4mz.s:578: Error: selected processor does not support ARM mode
`movw ip,8224'

Christophe


I suspect that this is the same MOVW problem as with the next patch in the series.

Matthew

Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.

2016-09-26 Thread Matthew Wahab


On 26/09/16 14:03, Ramana Radhakrishnan wrote:

On Mon, Sep 26, 2016 at 1:48 PM, Christophe Lyon
 wrote:

On 26 September 2016 at 11:43, Matthew Wahab  wrote:

Hello,

On 25/09/16 14:00, Christophe Lyon wrote:



This patch adds the new intrinsics:
   vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
   vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
   vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
   vzip_f16, vzipq_f16.

This patch also updates the advsimd-intrinsics testsuite to test the f16
variants for ARM targets. These intrinsics are only implemented in the
ARM target so the tests are disabled for AArch64 using an extra
condition on a new convenience macro FP16_SUPPORTED. This patch also
disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
it is no longer needed.



Since you committed this patch, I've noticed that libgcc fails to build
when GCC is configured:
--target arm-none-eabi and default cpu
/tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
/tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
not support ARM mode `movwlt r0,32768'
/tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
not support ARM mode `movwge r0,32767'
make[4]: *** [_ssaddHQ.o] Error 1
make[4]: Leaving directory
`/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'




I can't reproduce the failure, could you send the configure arguments for
the build.



If I'm not mistaken, that is:
  --target=arm-none-eabi  --disable-nls --disable-libgomp
--disable-libmudflap --disable-libcilkrts --enable-checking
--enable-languages=c,c++ --with-newlib

Maybe you've disabled multilibs?



I'm pretty sure I built this as part of reviewing all these patches
with --with-multilib-list=aprofile and didn't see any failures. Not sure
what's going on here.



I think the problem is that the new patterns use MOVW, which is a Thumb-2 
instruction, but don't check for Thumb-2 support in the target. I'm testing a 
patch to fix this.


Matthew

Re: [PATCH 6/17][ARM] Add data processing intrinsics for float16_t.

2016-09-26 Thread Matthew Wahab


Hello,

On 25/09/16 14:00, Christophe Lyon wrote:


This patch adds the new intrinsics:
  vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
  vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
  vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
  vzip_f16, vzipq_f16.

This patch also updates the advsimd-intrinsics testsuite to test the f16
variants for ARM targets. These intrinsics are only implemented in the
ARM target so the tests are disabled for AArch64 using an extra
condition on a new convenience macro FP16_SUPPORTED. This patch also
disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
it is no longer needed.


Since you committed this patch, I've noticed that libgcc fails to build
when GCC is configured:
--target arm-none-eabi and default cpu
/tmp/9649048_29.tmpdir/ccuBwQJJ.s: Assembler messages:
/tmp/9649048_29.tmpdir/ccuBwQJJ.s:64: Error: selected processor does
not support ARM mode `movwlt r0,32768'
/tmp/9649048_29.tmpdir/ccuBwQJJ.s:65: Error: selected processor does
not support ARM mode `movwge r0,32767'
make[4]: *** [_ssaddHQ.o] Error 1
make[4]: Leaving directory
`/tmp/9649048_29.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/arm-none-eabi/fpu/libgcc'




I can't reproduce the failure, could you send the configure arguments for the 
build.


I've tried assembling the string 'movw r0, 32768' and get the error when 
-march=armv6kz or earlier. I suspect the new movhi and/or movhf patterns added 
earlier in the series need the architecture level added as a precondition but 
I'll need to look into it.


Matthew

[ARM] Enable FP16 vector arithmetic operations.

2016-09-23 Thread Matthew Wahab


Hello,

Support for the ARMv8.2-A FP16 NEON arithmetic instructions was added
using non-standard names for the instruction patterns. This was needed
because the NEON floating point semantics meant that their use by the
compiler for HFmode arithmetic operations needed to be restricted. This
follows the implementation for 32-bit NEON instructions.

As with the 32-bit instructions, the restriction on the HFmode
operation can be lifted when -funsafe-math-optimizations is
enabled. This patch does that, defining the standard pattern names
addhf3, subhf3, mulhf3 and fmahf3.

This patch also updates the NEON intrinsics to use the arithmetic
operations when -ffast-math is enabled. This keeps the 16-bit
support consistent with the 32-bit support. It is needed so that code
using the f16 intrinsics is subject to the same optimizations as code
using the f32 intrinsics would be.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
on ARMv8-A and for arm-none-eabi and armeb-none-eabi with cross-compiled
make check on an ARMv8.2-A emulator.

Ok for trunk?
Matthew

gcc/
2016-09-23  Matthew Wahab  

* config/arm/arm_neon.h (vadd_f16): Use standard arithmetic
operations in fast-math mode.
(vaddq_f16): Likewise.
(vmul_f16): Likewise.
(vmulq_f16): Likewise.
(vsub_f16): Likewise.
(vsubq_f16): Likewise.
* config/arm/neon.md (add3): New.
(sub3): New.
(fma:3): New.  Also remove outdated comment.
(mul3): New.

testsuite/
2016-09-23  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-arith-1.c: Expand comment.  Update
expected output of vadd, vsub and vmul instructions.
* gcc.target/arm/armv8_2-fp16-arith-2.c: New.
* gcc.target/arm/armv8_2-fp16-neon-2.c: New.
* gcc.target/arm/armv8_2-fp16-neon-3.c: New.
>From 5c8855d44db480772803b6395cd698c704353408 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 5 Jul 2016 14:53:19 +0100
Subject: [PATCH] [ARM] Enable FP16 vector arithmetic operations.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
on ARMv8-A and for arm-none-eabi and armeb-none-eabi with cross-compiled
make check on an ARMv8.2-A emulator.

gcc/
2016-09-23  Matthew Wahab  

	* config/arm/arm_neon.h (vadd_f16): Use standard arithmetic
	operations in fast-math mode.
	(vaddq_f16): Likewise.
	(vmul_f16): Likewise.
	(vmulq_f16): Likewise.
	(vsub_f16): Likewise.
	(vsubq_f16): Likewise.
	* config/arm/neon.md (add3): New.
	(sub3): New.
	(fma:3): New.  Also remove outdated comment.
	(mul3): New.

testsuite/
2016-09-23  Matthew Wahab  

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Expand comment.  Update
	expected output of vadd, vsub and vmul instructions.
	* gcc.target/arm/armv8_2-fp16-arith-2.c: New.
	* gcc.target/arm/armv8_2-fp16-neon-2.c: New.
	* gcc.target/arm/armv8_2-fp16-neon-3.c: New.
---
 gcc/config/arm/arm_neon.h  |  24 +
 gcc/config/arm/neon.md |  52 ++-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c  |  18 +-
 .../gcc.target/arm/armv8_2-fp16-arith-2.c  | 109 +
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-2.c | 491 +
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-3.c | 108 +
 6 files changed, 796 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-3.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 54bbc7d..b19ed4f 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -14875,13 +14875,21 @@ vabsq_f16 (float16x8_t __a)
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vadd_f16 (float16x4_t __a, float16x4_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a + __b;
+#else
   return __builtin_neon_vaddv4hf (__a, __b);
+#endif
 }
 
 __extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
 vaddq_f16 (float16x8_t __a, float16x8_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a + __b;
+#else
   return __builtin_neon_vaddv8hf (__a, __b);
+#endif
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
@@ -15319,7 +15327,11 @@ vminnmq_f16 (float16x8_t __a, float16x8_t __b)
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vmul_f16 (float16x4_t __a, float16x4_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
   return __builtin_neon_vmulfv4hf (__a, __b);
+#endif
 }
 
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
@@ -15337,7 +15349,11 @@ vmul_n_f16 (float16x4_t __a, float16_t __b)
 __extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
 vmulq_f16 (float16x8_t __a, float16x8_t __b)
 {
+#ifdef __FAST_MATH__
+  return __a * __b;
+#else
   return __builtin_neon_vmulfv8hf (

Re: [ARM] FP16 ARM Alternative format variants of AAPCS tests.

2016-09-21 Thread Matthew Wahab


On 03/08/16 12:43, Ramana Radhakrishnan wrote:

On Mon, Jun 27, 2016 at 11:09 AM, Matthew Wahab
 wrote:


Tests added for FP16 argument and return values being passed in
registers only check the case when the FP16 IEEE format is used. This
patch adds equivalent tests that also check the behaviour when the
ARM Alternative format is used.


[..]

testsuite/
2016-06-27  Matthew Wahab  

 * gcc.target/arm/fp16-aapcs-3.c: New.
 * gcc.target/arm/fp16-aapcs-4.c: New.
 * gcc.target/arm/aapcs/vfp22.c: New.
 * gcc.target/arm/aapcs/vfp23.c: New.
 * gcc.target/arm/aapcs/vfp24.c: New.
 * gcc.target/arm/aapcs/vfp25.c: New.


OK once the pre-reqs are in place.

Thanks,
Ramana




Committed as r240314 after checking the new tests with cross-compiled 
arm-none-linux-gnueabihf. Sorry for the delay.


Matthew

Re: [ARM][PR target/77281] Fix an invalid check for vectors of the same floating-point constants.

2016-08-30 Thread Matthew Wahab


Ping.

On 19/08/16 15:47, Richard Earnshaw (lists) wrote:

On 19/08/16 15:06, Matthew Wahab wrote:

On 19/08/16 14:30, Richard Earnshaw (lists) wrote:

On 19/08/16 12:48, Matthew Wahab wrote:

2016-08-19  Matthew Wahab  

  PR target/77281
  * config/arm/arm.c (neon_valid_immediate): Delete declaration.
  Use const_vec_duplicate_p to check for duplicated elements.

Ok for trunk?


OK.

Thanks.

R.


Is this ok to backport to gcc-6?
Matthew



I believe we're in a release process, so backporting needs RM approval.

R.



Is this ok to backport to gcc-6?
I've tested for arm-none-linux-gnueabihf with native bootstrap and check-gcc.

Matthew

Re: Implement C _FloatN, _FloatNx types [version 6]

2016-08-22 Thread Matthew Wahab


Hello,

On 17/08/16 21:17, Joseph Myers wrote:

[Version 6 changes the testsuite to use dg-add-options systematically
to add any options that may be needed for the types to be supported;
this should allow the _Float128 and _Float64x tests to run for
powerpc64le, but I have not tested that; it could do with powerpc
maintainer testing that the tests do indeed run.  It also has fixes
for the fp-int-convert issues pointed out in
.  There are
no changes to the patch outside the testsuite and associated
documentation of effective-target and dg-add-options keywords.]


ISO/IEC TS 18661-3:2015 defines C bindings to IEEE interchange and
extended types, in the form of _FloatN and _FloatNx type names with
corresponding fN/FN and fNx/FNx constant suffixes and FLTN_* / FLTNX_*
 macros.  This patch implements support for this feature in
GCC.



There are some test failure on arm that appear to be due to this patch.


Index: gcc/testsuite/gcc.dg/torture/fp-int-convert-float32x-timode.c
===
--- gcc/testsuite/gcc.dg/torture/fp-int-convert-float32x-timode.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/fp-int-convert-float32x-timode.c   (working copy)
@@ -0,0 +1,16 @@
+/* Test floating-point conversions.  _Float32x type with TImode.  */
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-add-options float32x } */
+/* { dg-require-effective-target float32x_runtime } */
+
+#define __STDC_WANT_IEC_60559_TYPES_EXT__
+#include 
+#include "fp-int-convert.h"
+
+int
+main (void)
+{
+  TEST_I_F(TItype, UTItype, _Float32, FLT32X_MANT_DIG, FLT32X_MAX_EXP);
+  exit (0);
+


This test fails with an abort at runtime. The other float32x tests pass.


Index: gcc/testsuite/gcc.dg/torture/fp-int-convert.h
===
--- gcc/testsuite/gcc.dg/torture/fp-int-convert.h   (revision 239543)
+++ gcc/testsuite/gcc.dg/torture/fp-int-convert.h   (working copy)
@@ -15,20 +15,21 @@ typedef long TItype;
  typedef unsigned long UTItype;
  #endif

-/* TEST_I_F(I, U, F, P) tests conversions between the pair of signed
-   and unsigned integer types I and U and the floating-point type F,
-   where P is the binary precision of the floating point type.  We
-   test conversions of the values 0, 1, 0x7...f, 0x8...0, 0xf...f.  We
-   also test conversions of values half way between two
-   representable values (rounding both ways), just above half way, and
-   just below half way.  */
-#define TEST_I_F(I, U, F, P)   \
+/* TEST_I_F(I, U, F, P, M) tests conversions between the pair of
+   signed and unsigned integer types I and U and the floating-point
+   type F, where P is the binary precision of the floating point type
+   and M is the MAX_EXP value for that type (so 2^M overflows, 2^(M-1)
+   does not).  We test conversions of the values 0, 1, 0x7...f,
+   0x8...0, 0xf...f.  We also test conversions of values half way
+   between two representable values (rounding both ways), just above
+   half way, and just below half way.  */
+#define TEST_I_F(I, U, F, P, M)\


This change makes gcc.dg/torture/arm-fp16-int-convert-{alt,ieee}.c fail because
they still pass four arguments to the macro, not five.

Matthew

Re: [ARM][PR target/77281] Fix an invalid check for vectors of the same floating-point constants.

2016-08-19 Thread Matthew Wahab


On 19/08/16 14:30, Richard Earnshaw (lists) wrote:

On 19/08/16 12:48, Matthew Wahab wrote:

2016-08-19  Matthew Wahab  

 PR target/77281
 * config/arm/arm.c (neon_valid_immediate): Delete declaration.
 Use const_vec_duplicate_p to check for duplicated elements.

Ok for trunk?


OK.

Thanks.

R.


Is this ok to backport to gcc-6?
Matthew

[ARM][PR target/77281] Fix an invalid check for vectors of the same floating-point constants.

2016-08-19 Thread Matthew Wahab


Hello,

Test gcc.c-torture/execute/ieee/pr72824-2.c fails for arm targets
because the code generated to move a vector of signed and unsigned zeros
treats it as a vector of unsigned zeros.

That is, an assignment x = { 0.f, -0.f, 0.f, -0.f } is treated as the
assignment x = { 0.f, 0.f, 0.f, 0.f }.

This is due to neon_valid_immediate in config/arm/arm.c using real_equal
to compare the vector elements, which treats 0.0 and -0.0 as equal. This
patch replaces the check, using const_vec_duplicate_p instead. It doesn't
add a new test because pr72824-2.c is enough to check the behaviour.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
and for arm-none-eabi with cross-compiled check-gcc.

2016-08-19  Matthew Wahab  

PR target/77281
* config/arm/arm.c (neon_valid_immediate): Delete declaration.
Use const_vec_duplicate_p to check for duplicated elements.

Ok for trunk?
Matthew
>From 90c1c86b7a3d8bc6ac07363aea5fba8f29ef3e96 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Aug 2016 14:43:48 +0100
Subject: [PATCH] [ARM] Fix a wrong test for vectors of the same constants.

---
 gcc/config/arm/arm.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a6afdcc..c1d010c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12471,7 +12471,6 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
 {
   rtx el0 = CONST_VECTOR_ELT (op, 0);
-  const REAL_VALUE_TYPE *r0;
 
   if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE (el0)))
 return -1;
@@ -12480,14 +12479,10 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
   if (GET_MODE_INNER (mode) == HFmode)
 	return -1;
 
-  r0 = CONST_DOUBLE_REAL_VALUE (el0);
-
-  for (i = 1; i < n_elts; i++)
-{
-  rtx elt = CONST_VECTOR_ELT (op, i);
-  if (!real_equal (r0, CONST_DOUBLE_REAL_VALUE (elt)))
-return -1;
-}
+  /* All elements in the vector must be the same.  Note that 0.0 and -0.0
+	 are distinct in this context.  */
+  if (!const_vec_duplicate_p (op))
+	return -1;
 
   if (modconst)
 *modconst = CONST_VECTOR_ELT (op, 0);
-- 
2.1.4

Re: fix fallout of pr22051-2.c on arm

2016-08-04 Thread Matthew Wahab


On 03/08/16 23:08, Prathamesh Kulkarni wrote:

Hi,
The attached patch fixes pr22051-2.c which regressed due to
r238754. Matthew, could you please confirm if this patch fixes the
test-case for you ?



Confirmed. (Tested with arm-none-linux-gnueabihf.)

Thanks
Matthew

Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-08-03 Thread Matthew Wahab


On 03/08/16 12:52, Ramana Radhakrishnan wrote:

On Thu, Jul 28, 2016 at 12:37 PM, Ramana Radhakrishnan
 wrote:

On Mon, Jul 4, 2016 at 3:02 PM, Matthew Wahab
 wrote:

On 19/05/16 15:54, Matthew Wahab wrote:

On 18/05/16 16:20, Joseph Myers wrote:

On Wed, 18 May 2016, Matthew Wahab wrote:

In short: instructions for direct HFmode arithmetic should be described
with patterns with the standard names.  It's the job of the
architecture-independent compiler to ensure that fp16 arithmetic in the
user's source code only generates direct fp16 arithmetic in GIMPLE (and
thus ends up using those patterns) if that is a correct representation of
the source code's semantics according to ACLE.



This patch changes the implementation to use the standard names for the
HFmode arithmetic. Later patches will also be updated to use the
arithmetic operators where appropriate.



All fine except -

Why can we not extend the  and the l in
vfp.md for fp16 and avoid all the unspecs for vcvta and vrnd*
instructions ?



I now feel reasonably convinced that these can go away and be replaced
by extending the  and l expanders to
consider FP16 as well. Given that we are still only in the middle of
stage1 - I'm ok for you to apply this as is and then follow-up with a
patch that gets rid of the UNSPECs . If this holds for add, sub and
other patterns I don't see why it wouldn't hold for all these patterns
as well.

Joseph, do you have any opinions on whether we should be extending the
standard pattern names or not for btrunc, ceil, round, floor,
nearbyint, rint, lround, lfloor and lceil optabs for the HFmode
quantities ?



Sorry for the delay replying.

I didn't extend the lvrint_pattern and vrint_pattern expanders to HF mode
because of the general intention to do fp16 operations through the NEON
intrinsics. If extending them to HF mode produces the expected behaviour for
the standard names that they implement, then I agree that the change should be made.


I would prefer to do that as a separate patch though, to make sure that the new 
operations are properly tested. Some of the existing tests (in gcc.target/arm) 
use builtins that aren't available for HF mode so something else will be needed.


Matthew

Re: [PR70920] transform (intptr_t) x eq/ne CST to x eq/ne (typeof x) cst

2016-08-03 Thread Matthew Wahab


On 29/07/16 15:32, Prathamesh Kulkarni wrote:

On 29 July 2016 at 12:42, Richard Biener  wrote:

On Fri, 29 Jul 2016, Prathamesh Kulkarni wrote:


On 28 July 2016 at 19:18, Richard Biener  wrote:

On Thu, 28 Jul 2016, Prathamesh Kulkarni wrote:


On 28 July 2016 at 15:58, Andreas Schwab  wrote:

On Mo, Jul 25 2016, Prathamesh Kulkarni  wrote:


diff --git a/gcc/testsuite/gcc.dg/pr70920-4.c b/gcc/testsuite/gcc.dg/pr70920-4.c
new file mode 100644
index 000..dedb895
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr70920-4.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ccp-details -Wno-int-to-pointer-cast" } */
+
+#include 
+
+void f1();
+void f2();
+
+void
+foo (int a)
+{
+  void *cst = 0;
+  if ((int *) a == cst)
+{
+  f1 ();
+  if (a)
+ f2 ();
+}
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to if \\(_\[0-9\]* == 0\\)" "ccp1" } } */


This fails on all ilp32 platforms.

[..]


I don't think just matching == 0 is a good idea.  I suggest restricting
the testcase to lp64 targets and maybe adding an ilp32 variant.

Hi,
I restricted the test-case to lp64 targets.
Is this OK to commit ?


Hello,

The test case is failing for arm-none-linux-gnueabihf.

It is correctly skipped if the 'dg-require-effective-target lp64' you added is 
moved to the end of the directives (after the dg-options).


Matthew

Re: [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.

2016-07-04 Thread Matthew Wahab


On 18/05/16 11:58, Matthew Wahab wrote:
> On 18/05/16 02:06, Joseph Myers wrote:
>> On Tue, 17 May 2016, Matthew Wahab wrote:
>>
>>> In some tests, there are unavoidable differences in precision when
>>> calculating the actual and the expected results of an FP16 operation. A
>>> new support function CHECK_FP_BIAS is used so that these tests can check
>>> for an acceptable margin of error. In these tests, the tolerance is
>>> given as the absolute integer difference between the bitvectors of the
>>> expected and the actual results.
>>
>> As far as I can see, CHECK_FP_BIAS is only used in the following patch,
>> but there is another bias test in vsqrth_f16_1.c in this patch.
>
> This is my mistake: CHECK_FP_BIAS is used by the NEON tests and should
> have gone into that patch. The VFP test can do a simpler check, so it doesn't
> need the macro.
>
>> Could you clarify where the "unavoidable differences in precision" come
>> from? Are the results of some of the new instructions not fully specified,
>> only specified within a given precision?  (As far as I can tell the
>> existing v8 instructions for reciprocal and reciprocal square root
>> estimates do have fully defined results, despite being loosely described
>> as estimates.)
>
> The expected results in the new tests are represented as expressions whose
> value is expected to be calculated at compile-time. This makes the tests
> more readable, but differences in precision between the compiler and
> the HW calculations mean that for vrecpe_f16, vrecps_f16, vrsqrts_f16 and
> vsqrth_f16_1.c the expected and actual results are different.
>
> On reflection, it may be better to remove the CHECK_FP_BIAS macro and, for
> the tests that needed it, to drop the compiler calculation and just use the
> expected hexadecimal values.
>
> Other tests depending on compile-time calculations involve relatively
> simple arithmetic operations and it's not clear if they are susceptible to
> the same rounding errors. I have limited knowledge of FP arithmetic, though,
> so I'll look into this.

The scalar tests added in this patch and the vector tests added in the
next patch have been reworked to use the exact values for the expected
results rather than compile-time expressions. The CHECK_FP_BIAS macro is
not used and is removed from this patch.

The intention with these tests and with the vector tests is to check
that the compiler emits code that produces the same results as the
instruction regardless of any optimizations that it may apply. The
expected results for the tests were produced using inline assembler
taking the same inputs as the intrinsics being tested.
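The sketch below illustrates the two checking styles discussed in this thread. It assumes results are compared as raw IEEE binary16 bit patterns held in uint16_t, which lets it run on any host without __fp16. A tolerance of 0 gives the exact comparison that the reworked tests use. A nonzero tolerance mirrors the removed CHECK_FP_BIAS idea of an absolute integer difference between bit vectors. The function names are hypothetical and are not part of the testsuite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the binary16 bit patterns ACTUAL and
   EXPECTED differ by at most TOLERANCE as unsigned integers.  The integer
   distance approximates a count of ulps only when both values have the
   same sign; a tolerance of 0 demands an exact bit-for-bit match.  */
static int
fp16_bits_within (uint16_t actual, uint16_t expected, unsigned tolerance)
{
  unsigned diff = actual > expected ? (unsigned) (actual - expected)
                                    : (unsigned) (expected - actual);
  return diff <= tolerance;
}

/* Return the index of the first lane that fails the check, or N if all
   lanes pass.  */
static size_t
first_fp16_mismatch (const uint16_t *actual, const uint16_t *expected,
                     size_t n, unsigned tolerance)
{
  for (size_t i = 0; i < n; i++)
    if (!fp16_bits_within (actual[i], expected[i], tolerance))
      return i;
  return n;
}
```

For example, 0x3C00, 0x4000 and 0x4200 are 1.0, 2.0 and 3.0 in binary16. A result of 0x4201 for an expected 0x4200 fails an exact check but passes with a tolerance of 1.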

Other changes are to add and use some (limited) templates for scalar
operations and to add progress and error reporting, making the scalar
tests more consistent with those for the vector operations.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-07-04  Jiong Wang  
Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/binary_scalar_op.inc: New.
* gcc.target/aarch64/advsimd-intrinsics/unary_scalar_op.inc: New.
* gcc.target/aarch64/advsimd-intrinsics/ternary_scalar_op.inc: New.
* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c: New.
* gcc.target

Re: [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support.

2016-07-04 Thread Matthew Wahab


On 17/05/16 15:48, Matthew Wahab wrote:
> Support for using the half-precision floating point operations added by
> the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
> to the ACLE for the extension.
>
> This patch adds tests to check the compiler's treatment of the ACLE
> macros and the code generated for the new intrinsics. It does not
> include the executable tests for the
> gcc.target/aarch64/advsimd-intrinsics testsuite. Those are added later
> in the patch series.

Changes since the previous version are:

- Fix the vsqrte/vrsqrte spelling mistake.

- armv8_2-fp16-scalar-2.c: Set option -std=c11, needed to test that
  vaddh_f16 (vmulh_f16 (a, b), c) generates a VMLA. (Options enabled
  by the default -std=gnu11 mean that a VFMA would be generated
  otherwise.)
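The -std setting matters because GCC disables floating-point contraction in ISO modes such as -std=c11, while the GNU modes let it fuse a separate multiply and add into a single FMA. On this target, that means VFMA instead of VMLA. The two forms can give different answers because the fused operation rounds only once. The sketch below shows this in single precision and assumes an IEEE float host. It emulates the fused result in double, which is exact for these particular inputs but is not a general FMA implementation. Both function names are hypothetical.

```c
#include <assert.h>

/* Multiply, round to float, then add: two roundings, as with a separate
   multiply and add (or a chained multiply-accumulate).  The volatile
   temporary stops the compiler from contracting the expression.  */
static float
mul_then_add (float a, float b, float c)
{
  volatile float product = a * b;
  return product + c;
}

/* Fused result emulated in double: the product of two floats is exact in
   double, and for the inputs used here the sum is exact too, so the only
   rounding is the final conversion to float.  Not a general fmaf.  */
static float
fused_mul_add_emulated (float a, float b, float c)
{
  return (float) ((double) a * (double) b + (double) c);
}
```

Take a = 1 + 2^-12 and c = -(1 + 2^-11). The exact a * a is 1 + 2^-11 + 2^-24, which rounds to 1 + 2^-11 in float, so the unfused form yields 0. The fused form keeps the 2^-24 term.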

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-07-04  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
support.

>From b8760efc9da23357dc2bccef36e8ba2fc2f7a856 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:38:02 +0100
Subject: [PATCH 15/17] [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16
 support.

testsuite/
2016-07-04  Matthew Wahab  

	* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
	* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
	support.
---
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c | 490 +
 .../gcc.target/arm/armv8_2-fp16-scalar-1.c | 203 +
 .../gcc.target/arm/armv8_2-fp16-scalar-2.c |  71 +++
 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c   |  13 +
 4 files changed, 777 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
new file mode 100644
index 000..968efae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
@@ -0,0 +1,490 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+/* Test instructions generated for the FP16 vector intrinsics.  */
+
+#include 
+
+#define MSTRCAT(L, str)	L##str
+
+#define UNOP_TEST(insn)\
+  float16x4_t	\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a)	\
+  {		\
+return MSTRCAT (insn, _f16) (a);		\
+  }		\
+  float16x8_t	\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a)	\
+  {		\
+return MSTRCAT (insn, q_f16) (a);		\
+  }
+
+#define BINOP_TEST(insn)	\
+  float16x4_t			\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b)	\
+  {\
+return MSTRCAT (insn, _f16) (a, b);\
+  }\
+  float16x8_t			\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b)	\
+  {\
+return MSTRCAT (insn, q_f16) (a, b);			\
+  }
+
+#define BINOP_LANE_TEST(insn, I)	\
+  float16x4_t\
+  MSTRCAT (test_##insn##_lane, _16x4) (float16x4_t a, float16x4_t b)	\
+  {	\
+return MSTRCAT (insn, _lane_f16) (a, b, I);\
+  }	\
+  float16x8_t\
+  MSTRCAT (test_##insn##_lane, _16x8) (float16x8_t a, float16x4_t b)	\
+  {	\
+return MSTRCAT (insn, q_lane_f16) (a, b, I);			\
+  }
+
+#define BINOP_LANEQ_TEST(insn, I)	\
+  float16x4_t\
+  MSTRCAT (test_##insn##_laneq, _16x4) (float16x4_t a, float16x8_t b)	\
+  {	\
+return MSTRCAT (insn, _laneq_f16) (a, b, I);			\
+  }	\
+  float16x8_t\
+  MSTRCAT (test_##insn##_laneq, _16x8) (float16x8_t a, float16x8_t b)	\
+  {	\
+return MSTRCAT (insn, q_laneq_f16) (a, b, I);			\
+  }	\
+
+#define BINOP_N_TEST(insn)	\
+  float16x4_t			\
+  MSTRCAT (test_##insn##_n, _16x4) (float16x4_t a, float16_t b)	\
+  {\
+return MSTRCAT (insn, _n_f16) (a, b);			\
+  }\
+  float16x8_t			\
+  MSTRCAT (test_##insn##_n, _16x8) (float16x8_t a, float16_t b)	\
+  {\
+return MSTRCAT (insn, q_n_f16) (a, b);			\
+  }
+
+#define TERNOP_TEST(insn)		\
+  float16_t\
+  MSTRCAT (test_##insn, _16) (float16_t a, float16_t b, float16_t c)	\
+  {	\
+return MSTRCAT (insn, h_f16) (a, b, c);\
+  }	\
+  float16x4_t\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b,		\
+			   float16x4_t c)\
+  {
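The test macros above use token pasting to build both the test function's name and the intrinsic's name. For example, UNOP_TEST (vabs) defines test_vabs_16x4, which calls vabs_f16, and test_vabs_16x8, which calls vabsq_f16. The FP16 intrinsics need an ARMv8.2-A target, so the host-runnable sketch below uses the same pattern with hypothetical float and double stand-ins (`myneg_f32`, `mynegq_f32`) in their place.

```c
#include <assert.h>

#define MSTRCAT(L, str) L##str

/* Hypothetical stand-ins for a D-register and a Q-register ("q") intrinsic.  */
static float myneg_f32 (float a) { return -a; }
static double mynegq_f32 (double a) { return -a; }

/* Same shape as UNOP_TEST: one invocation defines two test functions whose
   names and callees are assembled by token pasting.  */
#define UNOP_TEST(insn)                         \
  float                                         \
  MSTRCAT (test_##insn, _d) (float a)           \
  {                                             \
    return MSTRCAT (insn, _f32) (a);            \
  }                                             \
  double                                        \
  MSTRCAT (test_##insn, _q) (double a)          \
  {                                             \
    return MSTRCAT (insn, q_f32) (a);           \
  }

/* Defines test_myneg_d (calls myneg_f32) and test_myneg_q (calls mynegq_f32).  */
UNOP_TEST (myneg)
```

Because `test_##insn` is pasted before MSTRCAT is rescanned, adding a case only takes one more UNOP_TEST line.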

Re: [PATCH 14/17][ARM] Add NEON FP16 intrinsics.

2016-07-04 Thread Matthew Wahab


On 17/05/16 15:46, Matthew Wahab wrote:
> The ARMv8.2-A architecture introduces an optional FP16 extension adding
> half-precision floating point data processing instructions to the
> existing Adv.SIMD (NEON) support. A future version of the ACLE will add
> support for these instructions and this patch implements that support.

Updated to fix the vsqrte/vrsqrte spelling mistake.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  

* config/arm/arm_neon.h (vabd_f16): New.
(vabdq_f16): New.
(vabs_f16): New.
(vabsq_f16): New.
(vadd_f16): New.
(vaddq_f16): New.
(vcage_f16): New.
(vcageq_f16): New.
(vcagt_f16): New.
(vcagtq_f16): New.
(vcale_f16): New.
(vcaleq_f16): New.
(vcalt_f16): New.
(vcaltq_f16): New.
(vceq_f16): New.
(vceqq_f16): New.
(vceqz_f16): New.
(vceqzq_f16): New.
(vcge_f16): New.
(vcgeq_f16): New.
(vcgez_f16): New.
(vcgezq_f16): New.
(vcgt_f16): New.
(vcgtq_f16): New.
(vcgtz_f16): New.
(vcgtzq_f16): New.
(vcle_f16): New.
(vcleq_f16): New.
(vclez_f16): New.
(vclezq_f16): New.
(vclt_f16): New.
(vcltq_f16): New.
(vcltz_f16): New.
(vcltzq_f16): New.
(vcvt_f16_s16): New.
(vcvt_f16_u16): New.
(vcvt_s16_f16): New.
(vcvt_u16_f16): New.
(vcvtq_f16_s16): New.
(vcvtq_f16_u16): New.
(vcvtq_s16_f16): New.
(vcvtq_u16_f16): New.
(vcvta_s16_f16): New.
(vcvta_u16_f16): New.
(vcvtaq_s16_f16): New.
(vcvtaq_u16_f16): New.
(vcvtm_s16_f16): New.
(vcvtm_u16_f16): New.
(vcvtmq_s16_f16): New.
(vcvtmq_u16_f16): New.
(vcvtn_s16_f16): New.
(vcvtn_u16_f16): New.
(vcvtnq_s16_f16): New.
(vcvtnq_u16_f16): New.
(vcvtp_s16_f16): New.
(vcvtp_u16_f16): New.
(vcvtpq_s16_f16): New.
(vcvtpq_u16_f16): New.
(vcvt_n_f16_s16): New.
(vcvt_n_f16_u16): New.
(vcvtq_n_f16_s16): New.
(vcvtq_n_f16_u16): New.
(vcvt_n_s16_f16): New.
(vcvt_n_u16_f16): New.
(vcvtq_n_s16_f16): New.
(vcvtq_n_u16_f16): New.
(vfma_f16): New.
(vfmaq_f16): New.
(vfms_f16): New.
(vfmsq_f16): New.
(vmax_f16): New.
(vmaxq_f16): New.
(vmaxnm_f16): New.
(vmaxnmq_f16): New.
(vmin_f16): New.
(vminq_f16): New.
(vminnm_f16): New.
(vminnmq_f16): New.
(vmul_f16): New.
(vmul_lane_f16): New.
(vmul_n_f16): New.
(vmulq_f16): New.
(vmulq_lane_f16): New.
(vmulq_n_f16): New.
(vneg_f16): New.
(vnegq_f16): New.
(vpadd_f16): New.
(vpmax_f16): New.
(vpmin_f16): New.
(vrecpe_f16): New.
(vrecpeq_f16): New.
(vrnd_f16): New.
(vrndq_f16): New.
(vrnda_f16): New.
(vrndaq_f16): New.
(vrndm_f16): New.
(vrndmq_f16): New.
(vrndn_f16): New.
(vrndnq_f16): New.
(vrndp_f16): New.
(vrndpq_f16): New.
(vrndx_f16): New.
(vrndxq_f16): New.
(vrsqrte_f16): New.
(vrsqrteq_f16): New.
(vrecps_f16): New.
(vrecpsq_f16): New.
(vrsqrts_f16): New.
(vrsqrtsq_f16): New.
(vsub_f16): New.
(vsubq_f16): New.

>From c26f43f3127d18971769f891c252ec5e157026f9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:36:34 +0100
Subject: [PATCH 14/17] [PATCH 14/17][ARM] Add NEON FP16 intrinsics.

2016-07-04  Matthew Wahab  

	* config/arm/arm_neon.h (vabd_f16): New.
	(vabdq_f16): New.
	(vabs_f16): New.
	(vabsq_f16): New.
	(vadd_f16): New.
	(vaddq_f16): New.
	(vcage_f16): New.
	(vcageq_f16): New.
	(vcagt_f16): New.
	(vcagtq_f16): New.
	(vcale_f16): New.
	(vcaleq_f16): New.
	(vcalt_f16): New.
	(vcaltq_f16): New.
	(vceq_f16): New.
	(vceqq_f16): New.
	(vceqz_f16): New.
	(vceqzq_f16): New.
	(vcge_f16): New.
	(vcgeq_f16): New.
	(vcgez_f16): New.
	(vcgezq_f16): New.
	(vcgt_f16): New.
	(vcgtq_f16): New.
	(vcgtz_f16): New.
	(vcgtzq_f16): New.
	(vcle_f16): New.
	(vcleq_f16): New.
	(vclez_f16): New.
	(vclezq_f16): New.
	(vclt_f16): New.
	(vcltq_f16): New.
	(vcltz_f16): New.
	(vcltzq_f16): New.
	(vcvt_f16_s16): New.
	(vcvt_f16_u16): New.
	(vcvt_s16_f16): New.
	(vcvt_u16_f16): New.
	(vcvtq_f16_s16): New.
	(vcvtq_f16_u16): New.
	(vcvtq_s16_f16): New.
	(vcvtq_u16_f16): New.
	(vcvta_s16_f16): New.
	(vcvta_u16_f16): New.
	(vcvtaq_s16_f16): New.
	(vcvtaq_u16_f16): New.
	(vcvtm_s16_f16): New.
	(vcvtm_u16_f16): New.
	(vcvtmq_s16_f16): New.
	(vcvtmq_u16_f16): New.
	(vcvtn_s16_f16): New.

Re: [PATCH 13/17][ARM] Add VFP FP16 intrinsics.

2016-07-04 Thread Matthew Wahab


On 17/05/16 15:44, Matthew Wahab wrote:
> The ARMv8.2-A architecture introduces an optional FP16 extension adding
> half-precision floating point data processing instructions to the
> existing scalar (floating point) support. A future version of the ACLE
> will add support for these instructions and this patch implements that
> support.

Updated to use the standard arithmetic operations for vnegh_f16,
vaddh_f16, vsubh_f16, vmulh_f16 and vdivh_f16.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  

* config.gcc (extra_headers): Add arm_fp16.h
* config/arm/arm_fp16.h: New.
* config/arm/arm_neon.h: Include "arm_fp16.h".

>From a9042ae0e0ea4a61436663a1afea81ccf699e9f9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:36:23 +0100
Subject: [PATCH 13/17] [PATCH 13/17][ARM] Add VFP FP16 intrinsics.

2016-07-04  Matthew Wahab  

	* config.gcc (extra_headers): Add arm_fp16.h
	* config/arm/arm_fp16.h: New.
	* config/arm/arm_neon.h: Include "arm_fp16.h".
---
 gcc/config.gcc|   2 +-
 gcc/config/arm/arm_fp16.h | 255 ++
 gcc/config/arm/arm_neon.h |   1 +
 3 files changed, 257 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/arm/arm_fp16.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1f75f17..4333bc9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -320,7 +320,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm_fp16.h b/gcc/config/arm/arm_fp16.h
new file mode 100644
index 000..c72d8c4
--- /dev/null
+++ b/gcc/config/arm/arm_fp16.h
@@ -0,0 +1,255 @@
+/* ARM FP16 intrinsics include file.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_FP16_H
+#define _GCC_ARM_FP16_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+
+/* Intrinsics for FP16 instructions.  */
+#pragma GCC push_options
+#pragma GCC target ("fpu=fp-armv8")
+
+#if defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
+
+typedef __fp16 float16_t;
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vabsh_f16 (float16_t __a)
+{
+  return __builtin_neon_vabshf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vaddh_f16 (float16_t __a, float16_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvtah_s32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtahssi (__a);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vcvtah_u32_f16 (float16_t __a)
+{
+  return __builtin_neon_vcvtahusi (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_s32 (int32_t __a)
+{
+  return __builtin_neon_vcvthshf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_f16_u32 (uint32_t __a)
+{
+  return __builtin_neon_vcvthuhf (__a);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_s32 (int32_t __a, const int __b)
+{
+  return __builtin_neon_vcvths_nhf (__a, __b);
+}
+
+__extension__ static __inline float16_t __attribute__ ((__always_inline__))
+vcvth_n_f16_u32 (uint32_t __a, const int __b)
+{
+  return __builtin_neon_vcvthu_nhf ((int32_t)__a, __b);
+}
+
+__extension__ static __inline int32_t __attribute__ ((__always_inline__))
+vcvth_n_s32_f16 (float16_t __a, const int __b)
+{
+  return __builtin_neon_vcvths_nsi (__a, _
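The vcvth_n_* intrinsics above take a second, constant argument __b that gives the number of fractional bits in the fixed-point operand. The host-side reference sketch below models that scaling. It assumes the usual fixed-point meaning, where the integer a represents a * 2^-b. It works in single precision rather than FP16, so it models neither FP16 range nor the real instruction's rounding, and it does not check the range of __b that the intrinsics accept. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: fixed-point value A with FBITS fractional bits
   converted to float.  Dividing by a power of two only adjusts the
   exponent here, so the only rounding is in the conversion of A.  */
static float
fixed_s32_to_float (int32_t a, int fbits)
{
  assert (fbits >= 1 && fbits <= 31);
  return (float) a / (float) (UINT64_C (1) << fbits);
}

/* The reverse direction, truncating toward zero as a C conversion does.
   Assumes the scaled value fits in int32_t.  */
static int32_t
float_to_fixed_s32 (float x, int fbits)
{
  assert (fbits >= 1 && fbits <= 31);
  return (int32_t) (x * (float) (UINT64_C (1) << fbits));
}
```

With one fractional bit, the integer 3 represents 1.5. With three fractional bits, -8 represents -1.0.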

Re: [PATCH 12/17][ARM] Add builtins for NEON FP16 intrinsics.

2016-07-04 Thread Matthew Wahab

On 17/05/16 15:42, Matthew Wahab wrote:
> This patch adds the builtins data for the ACLE intrinsics introduced to
> support the NEON instructions of the ARMv8.2-A FP16 extension.

Updated to fix the vsqrte/vrsqrte spelling mistake and correct the changelog.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  

* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
variants).
(vmulf): New (v8hf, v4hf variants).
(vfma): New (v8hf, v4hf variants).
(vfms): New (v8hf, v4hf variants).
(vsub): New (v8hf, v4hf variants).
(vcage): New (v8hf, v4hf variants).
(vcagt): New (v8hf, v4hf variants).
(vcale): New (v8hf, v4hf variants).
(vcalt): New (v8hf, v4hf variants).
(vceq): New (v8hf, v4hf variants).
(vcgt): New (v8hf, v4hf variants).
(vcge): New (v8hf, v4hf variants).
(vcle): New (v8hf, v4hf variants).
(vclt): New (v8hf, v4hf variants).
(vceqz): New (v8hf, v4hf variants).
(vcgez): New (v8hf, v4hf variants).
(vcgtz): New (v8hf, v4hf variants).
(vcltz): New (v8hf, v4hf variants).
(vclez): New (v8hf, v4hf variants).
(vabd): New (v8hf, v4hf variants).
(vmaxf): New (v8hf, v4hf variants).
(vmaxnm): New (v8hf, v4hf variants).
(vminf): New (v8hf, v4hf variants).
(vminnm): New (v8hf, v4hf variants).
(vpmaxf): New (v4hf variant).
(vpminf): New (v4hf variant).
(vpadd): New (v4hf variant).
(vrecps): New (v8hf, v4hf variants).
(vrsqrts): New (v8hf, v4hf variants).
(vabs): New (v8hf, v4hf variants).
(vneg): New (v8hf, v4hf variants).
(vrecpe): New (v8hf, v4hf variants).
(vrnd): New (v8hf, v4hf variants).
(vrnda): New (v8hf, v4hf variants).
(vrndm): New (v8hf, v4hf variants).
(vrndn): New (v8hf, v4hf variants).
(vrndp): New (v8hf, v4hf variants).
(vrndx): New (v8hf, v4hf variants).
(vrsqrte): New (v8hf, v4hf variants).
(vmul_lane): Add v4hf and v8hf variants.
(vmul_n): Add v4hf and v8hf variants.
(vext): New (v8hf, v4hf variants).
(vcvts): New (v8hi, v4hi variants).
(vcvts): New (v8hf, v4hf variants).
(vcvtu): New (v8hi, v4hi variants).
(vcvtu): New (v8hf, v4hf variants).
(vcvts_n): New (v8hf, v4hf variants).
(vcvtu_n): New (v8hi, v4hi variants).
(vcvts_n): New (v8hi, v4hi variants).
(vcvtu_n): New (v8hf, v4hf variants).
(vbsl): New (v8hf, v4hf variants).
(vcvtas): New (v8hf, v4hf variants).
(vcvtau): New (v8hf, v4hf variants).
(vcvtms): New (v8hf, v4hf variants).
(vcvtmu): New (v8hf, v4hf variants).
(vcvtns): New (v8hf, v4hf variants).
(vcvtnu): New (v8hf, v4hf variants).
(vcvtps): New (v8hf, v4hf variants).
(vcvtpu): New (v8hf, v4hf variants).

>From 5df552f65de19667400c63ff939ed5e90a8cbadf Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:36:41 +0100
Subject: [PATCH 12/17] [PATCH 12/17][ARM] Add builtins for NEON FP16
 intrinsics.

2016-07-04  Matthew Wahab  

	* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
	variants).
	(vmulf): New (v8hf, v4hf variants).
	(vfma): New (v8hf, v4hf variants).
	(vfms): New (v8hf, v4hf variants).
	(vsub): New (v8hf, v4hf variants).
	(vcage): New (v8hf, v4hf variants).
	(vcagt): New (v8hf, v4hf variants).
	(vcale): New (v8hf, v4hf variants).
	(vcalt): New (v8hf, v4hf variants).
	(vceq): New (v8hf, v4hf variants).
	(vcgt): New (v8hf, v4hf variants).
	(vcge): New (v8hf, v4hf variants).
	(vcle): New (v8hf, v4hf variants).
	(vclt): New (v8hf, v4hf variants).
	(vceqz): New (v8hf, v4hf variants).
	(vcgez): New (v8hf, v4hf variants).
	(vcgtz): New (v8hf, v4hf variants).
	(vcltz): New (v8hf, v4hf variants).
	(vclez): New (v8hf, v4hf variants).
	(vabd): New (v8hf, v4hf variants).
	(vmaxf): New (v8hf, v4hf variants).
	(vmaxnm): New (v8hf, v4hf variants).
	(vminf): New (v8hf, v4hf variants).
	(vminnm): New (v8hf, v4hf variants).
	(vpmaxf): New (v4hf variant).
	(vpminf): New (v4hf variant).
	(vpadd): New (v4hf variant).
	(vrecps): New (v8hf, v4hf variants).
	(vrsqrts): New (v8hf, v4hf variants).
	(vabs): New (v8hf, v4hf variants).
	(vneg): New (v8hf, v4hf variants).
	(vrecpe): New (v8hf, v4hf variants).
	(vrnd): New (v8hf, v4hf variants).
	(vrnda): New (v8hf, v4hf variants).
	(vrndm): New (v8hf, v4hf variants).
	(vrndn): New (v8hf, v4hf variants).
	(vrndp): New (v8hf, v4hf variants).
	(vrndx): New (v8hf, v4hf variants).
	(vrsqrte): New (v8hf, v4hf variants).
	(vmul_lane): Add v4hf and v8hf variants.
	(vmul_n): Add v4hf and v8hf variants.
	(vext): New (v8hf, v4hf variants).
	(vcvts): New (v8hi, v4hi variants).
	(vcvts): New (v

Re: [PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics.

2016-07-04 Thread Matthew Wahab


On 17/05/16 15:41, Matthew Wahab wrote:
> The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
> require that intrinsics for scalar floating point (VFP) instructions
> are available under different conditions from those for the NEON
> intrinsics.
>
> This patch adds the support code and builtins data for the new VFP
> intrinsics. Because of the similarities between the scalar and NEON
> builtins, the support code for the scalar builtins follows the code for
> the NEON builtins. The declarations for the VFP builtins are also added
> in this patch since the support code expects non-empty tables.

Updated the patch to drop the builtins for vneg, vadd, vsub, vmul and
vdiv, which are no longer needed.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  

* config/arm/arm-builtins.c (hf_UP): New.
(si_UP): New.
(vfp_builtin_data): New.  Update comment.
(enum arm_builtins): Include "arm_vfp_builtins.def".
(ARM_BUILTIN_VFP_PATTERN_START): New.
(arm_init_vfp_builtins): New.
(arm_init_builtins): Add arm_init_vfp_builtins.
(arm_expand_vfp_builtin): New.
(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
long line.
* config/arm/arm_vfp_builtins.def: New file.
* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
(arm-builtins.o): Likewise.

>From 04896868ba0af25b31e9d23c3af5d3a88e70a564 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:33:14 +0100
Subject: [PATCH 11/17] [PATCH 11/17][ARM] Add builtins for VFP FP16
 intrinsics.

2016-07-04  Matthew Wahab  

	* config/arm/arm-builtins.c (hf_UP): New.
	(si_UP): New.
	(vfp_builtin_data): New.  Update comment.
	(enum arm_builtins): Include "arm_vfp_builtins.def".
	(ARM_BUILTIN_VFP_PATTERN_START): New.
	(arm_init_vfp_builtins): New.
	(arm_init_builtins): Add arm_init_vfp_builtins.
	(arm_expand_vfp_builtin): New.
	(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
	long line.
	* config/arm/arm_vfp_builtins.def: New file.
	* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
	(arm-builtins.o): Likewise.
---
 gcc/config/arm/arm-builtins.c   | 75 +
 gcc/config/arm/arm_vfp_builtins.def | 51 +
 gcc/config/arm/t-arm|  4 +-
 3 files changed, 121 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/arm/arm_vfp_builtins.def

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 5dd81b1..70bcc07 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -190,6 +190,8 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ti_UP	 TImode
 #define ei_UP	 EImode
 #define oi_UP	 OImode
+#define hf_UP	 HFmode
+#define si_UP	 SImode
 
 #define UP(X) X##_UP
 
@@ -239,12 +241,22 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def.
-   The mode entries in the following table correspond to the "key" type of the
-   instruction variant, i.e. equivalent to that which would be specified after
-   the assembler mnemonic, which usually refers to the last vector operand.
-   The modes listed per instruction should be the same as those defined for
-   that instruction's pattern in neon.md.  */
+/* The NEON builtin data can be found in arm_neon_builtins.def and
+   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The entries in arm_vfp_builtins.def require
+   TARGET_VFP to be true.  The feature tests are checked when the builtins are
+   expanded.
+
+   The mode entries in the following table correspond to
+   the "key" type of the instruction variant, i.e. equivalent to that which
+   would be specified after the assembler mnemonic, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern in neon.md.  */
+
+static neon_builtin_datum vfp_builtin_data[] =
+{
+#include "arm_vfp_builtins.def"
+};
 
 static neon_builtin_datum neon_builtin_data[] =
 {
@@ -534,6 +546,10 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_VFP_BASE,
+
+#include "arm_vfp_builtins.def"
+
   ARM_BUILTIN_NEON_BASE,
   ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
 
@@ -542,6 +558,9 @@ enum arm_builtins
   ARM_BUILTIN_MAX
 };
 
+#define ARM_BUILTIN_VFP_PATTERN_START \
+  (ARM_BUILTIN_VFP_BASE + 1)
+
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
@@ -1033,6 +1052,20 @@ arm_init_neon_builtins (void)
 }
 }
 
+/* Set up all the scala
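The arm-builtins.c change above includes arm_vfp_builtins.def twice. The first inclusion fills vfp_builtin_data and the second adds enumerators after ARM_BUILTIN_VFP_BASE, so the table and the enum cannot drift apart. ARM_BUILTIN_VFP_PATTERN_START (BASE + 1) then maps an enumerator back to its table row. The self-contained sketch below shows the same pattern. A hypothetical list macro stands in for the .def file, and all MY_-prefixed names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for arm_vfp_builtins.def: one X (...) per builtin.  */
#define VFP_BUILTINS(X) X (vabsh) X (vcvtah) X (vsqrth)

typedef struct { const char *name; } builtin_datum;

/* Data table built from the list, as vfp_builtin_data is from the .def.  */
static const builtin_datum vfp_builtin_data[] = {
#define X(N) { #N },
  VFP_BUILTINS (X)
#undef X
};

/* Enumerators built from the same list, following a BASE marker.  */
enum my_builtins {
  MY_BUILTIN_VFP_BASE,
#define X(N) MY_BUILTIN_##N,
  VFP_BUILTINS (X)
#undef X
  MY_BUILTIN_MAX
};

#define MY_BUILTIN_VFP_PATTERN_START (MY_BUILTIN_VFP_BASE + 1)

/* Map a builtin code back to its row in the data table.  */
static const builtin_datum *
lookup_vfp_builtin (enum my_builtins code)
{
  assert (code >= MY_BUILTIN_VFP_PATTERN_START && code < MY_BUILTIN_MAX);
  return &vfp_builtin_data[code - MY_BUILTIN_VFP_PATTERN_START];
}
```

Adding an entry to the list updates both the table and the enum, and the lookup needs no change.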

Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

2016-07-04 Thread Matthew Wahab


On 18/05/16 01:58, Joseph Myers wrote:
> On Tue, 17 May 2016, Matthew Wahab wrote:
>
>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>> values are done by conversion to single-precision. Any new optimization
>> supported by the instruction descriptions can only apply to code
>> generated using intrinsics added in this patch series.
>
> As with the scalar instructions, I think it is legitimate in most cases to
> optimize arithmetic via single precision to work direct on __fp16 values
> (and this would be natural for vectorization of __fp16 arithmetic).
>
>> A number of the instructions are modelled as two variants, one using
>> UNSPEC and the other using RTL operations, with the model used decided
>> by the funsafe-math-optimizations flag. This follows the
>> single-precision instructions and is due to the half-precision
>> operations having the same conditions and restrictions on their use in
> optimizations (when they are enabled).
>
> (Of course, these restrictions still apply.)

The F16 support generally follows the F32 implementation and, for F32,
direct arithmetic vector operations are only available when
unsafe-math-optimizations is enabled. I want to check the behaviour of
the F16 operations when unsafe-math is enabled, so I'll defer the change
to use standard names for the vector operations to a follow-up patch.
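As an illustration of doing __fp16 arithmetic by conversion to single precision, the sketch below widens two IEEE binary16 bit patterns to float, adds them, and rounds the sum back to binary16 with round-to-nearest-even. It is a portable software model, not GCC code: half_to_float, float_to_half and half_add are hypothetical names, float is assumed to be IEEE binary32, and the ARM Alternative format is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 bit pattern to float (assumed IEEE binary32).  */
static float
half_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t ex = (h >> 10) & 0x1Fu;
  uint32_t man = h & 0x3FFu;
  uint32_t bits;
  float f;

  if (ex == 0)
    {
      /* Zero or subnormal: the value is man * 2^-24, exact in float.  */
      f = (float) man * (1.0f / 16777216.0f);
      return sign ? -f : f;
    }
  if (ex == 31)
    bits = sign | 0x7F800000u | (man << 13);          /* Inf or NaN.  */
  else
    bits = sign | ((ex + 112u) << 23) | (man << 13);  /* Rebias 15 -> 127.  */
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Narrow a float to binary16, rounding to nearest with ties to even.  */
static uint16_t
float_to_half (float f)
{
  uint32_t x, man, r, rem, halfway;
  int e, shift;
  uint16_t sign;

  memcpy (&x, &f, sizeof x);
  sign = (uint16_t) ((x >> 16) & 0x8000u);
  man = x & 0x7FFFFFu;
  if (((x >> 23) & 0xFFu) == 0xFFu)                   /* Inf or NaN.  */
    return (uint16_t) (sign | 0x7C00u | (man ? 0x200u : 0u));

  e = (int) ((x >> 23) & 0xFFu) - 127 + 15;            /* Rebias 127 -> 15.  */
  if (e >= 31)
    return (uint16_t) (sign | 0x7C00u);                /* Overflow to Inf.  */
  if (e <= 0)
    {
      /* The result is subnormal or zero.  */
      if (e < -10)
        return sign;              /* Below half the smallest subnormal.  */
      man |= 0x800000u;           /* Make the implicit leading 1 explicit.  */
      shift = 14 - e;
    }
  else
    shift = 13;

  r = man >> shift;
  rem = man & ((1u << shift) - 1u);
  halfway = 1u << (shift - 1);
  if (rem > halfway || (rem == halfway && (r & 1u)))
    r++;   /* A carry out of the mantissa correctly bumps the exponent.  */

  if (e <= 0)
    return (uint16_t) (sign | r);
  return (uint16_t) (sign | (((uint32_t) e << 10) + r));
}

/* Widen both operands, add in single precision, then narrow the sum.  */
static uint16_t
half_add (uint16_t a, uint16_t b)
{
  return float_to_half (half_to_float (a) + half_to_float (b));
}
```

For example, half_add (0x3C00, 0x3800) computes 1.0 + 0.5 and returns 0x3E00 (1.5), while adding the largest finite value 0x7BFF to itself overflows to infinity (0x7C00).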

There are still some changes from the previous patch:

- Two fma/fmsub patterns *fma4 and <*fmsub4 are
  dropped since they just duplicated *fma4_intrinsic and
  <*fmsub4_intrinsic.

- Patterns neon_vadd_unspec and neon_vsub_unspec are
  dropped because they were redundant.

- 2_fp16 is renamed to 2. This
  implements the abs and neg operations which are always safe to use.

- neon_vsqrte is renamed to neon_vrsqrte. The old name was a
  misspelling of the intrinsic that wasn't caught in testing because the
  relevant test case is missing. The intrinsic is fixed here and in
  other patches, and an advsimd-intrinsics test is added later in the
  (updated) series.

- neon_vcvt_n
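One way to see why abs and neg are always safe to use: on a half-precision value both operations only touch the sign bit, so there is no rounding step whose result could depend on the evaluation precision. The helpers below are a hypothetical illustration on raw 16-bit patterns, not the neon.md patterns themselves; since the ARM Alternative format keeps its sign in the same bit, they apply to it too.

```c
#include <assert.h>
#include <stdint.h>

/* Negate a binary16 value by flipping its sign bit.  */
static uint16_t
half_neg (uint16_t h)
{
  return (uint16_t) (h ^ 0x8000u);
}

/* Take the absolute value of a binary16 value by clearing its sign bit.  */
static uint16_t
half_abs (uint16_t h)
{
  return (uint16_t) (h & 0x7FFFu);
}
```

Because the exponent and mantissa are left untouched, signed zeros and NaN payloads pass through unchanged.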

* config/arm/iterators.md (VCVTHI): New.
(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
(NEON_VAGLTE): New.
(VFM_LANE_AS): New.
(VH_CVTTO): New.
(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
(V_HALF): Add V4HF.  Fix white-space.
(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
(V_s_elem): Likewise.
(V_sz_elem): Fix white-space.
(V_elem_ch): Likewise.
(VH_elem_ch): New.
(scalar_mul_constraint): Add V8HF and V4HF.
(Is_float_mode): Fix white-space.
(Is_d_reg): Fix white-space.
(q): Add HF.  Fix white-space.
(float_sup): New.
(float_SUP): New.
(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
(neon_vfm_lane_as): New.
* config/arm/neon.md (add3_fp16): New.
(sub3_fp16): New.
(mul3add_neon): New.
(fma4_intrinsic): New.
(fmsub4_intrinsic): Fix white-space.
(fmsub4_intrinsic): New.
(2): New.
(neon_v): New.
(neon_v): New.
(neon_vrsqrte): New.
(neon_vpaddv4hf): New.
(neon_vadd): New.
(neon_vsub): New.
(neon_vmulf): New.
(neon_vfma): New.
(neon_vfms): New.
(neon_vc): New.
(neon_vc_fp16insn): New.
(neon_vc_fp16insn_unspec): New.
(neon_vca): New.
(neon_vca_fp16insn): New.
(neon_vca_fp16insn_unspec): New.
(neon_vcz): New.
(neon_vabd): New.
(neon_vf): New.
(neon_vpfv4hf): New.
(neon_): New.
(neon_vrecps): New.
(neon_vrsqrts): New.
(neon_vrecpe): New (VH variant).
(neon_vdup_lane_internal): New.
(neon_vdup_lane): New.
(neon_vcvt): New (VCVTHI variant).
(neon_vcvt): New (VH variant).
(neon_vcvt_n): New (VH variant).
(neon_vcvt_n): New (VCVTHI variant).
(neon_vcvt): New.
(neon_vmul_lane): New.
(neon_vmul_n): New.
* config/arm/unspecs.md (UNSPEC_VCALE): New.
(UNSPEC_VCALT): New.
(UNSPEC_VFMA_LANE): New.
(UNSPEC_VFMS_LANE): New.

testsuite/
2016-07-04  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
options.  Add tests for float16x4_t and float16x8_t.

>From 4cbebc297f74f0c2e3ddac600d7902083c09c934 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 16:19:57 +0100
Subject: [PATCH 09/17] [PATCH 9/17][ARM] Add NEON FP16 arithmetic
 instructions.

2016-07-04  Matthew Wahab  

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4H

Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-07-04 Thread Matthew Wahab

On 19/05/16 15:54, Matthew Wahab wrote:
> On 18/05/16 16:20, Joseph Myers wrote:
>> On Wed, 18 May 2016, Matthew Wahab wrote:
>>
>> In short: instructions for direct HFmode arithmetic should be described
>> with patterns with the standard names.  It's the job of the
>> architecture-independent compiler to ensure that fp16 arithmetic in the
>> user's source code only generates direct fp16 arithmetic in GIMPLE (and
>> thus ends up using those patterns) if that is a correct representation of
>> the source code's semantics according to ACLE.
>>
>> The intrinsics you provide can then be written to use direct arithmetic,
>> and rely on convert_to_real_1 eliminating the promotions, rather than
>> needing built-in functions at all, just like many arm_neon.h intrinsics
>> make direct use of GNU C vector arithmetic.
>
> I think it's clear that this has exhausted my knowledge of FP semantics.
>
> Forcing promotion to single-precision was to settle concerns brought up in
> internal discussions about __fp16 semantics. I'll see if anybody has any
> problem with the changes you suggest.

This patch changes the implementation to use the standard names for the
HFmode arithmetic. Later patches will also be updated to use the
arithmetic operators where appropriate.

Changes since the last version of this patch:
- The standard names for plus, minus, mult, div and fma are defined for
  HF mode.
- The patterns supporting the new ACLE intrinsics vnegh_f16, vaddh_f16,
  vsubh_f16, vmulh_f16 and vdivh_f16 are removed; the arithmetic
  operators will be used instead.
- The tests are updated to expect f16 instructions rather than the f32
  instructions that were previously emitted.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-07-04  Matthew Wahab  

* config/arm/iterators.md (Code iterators): Fix some white-space
in the comments.
(GLTE): New.
(ABSNEG): New.
(FCVT): Moved from vfp.md.
(VCVT_HF_US_N): New.
(VCVT_SI_US_N): New.
(VCVT_HF_US): New.
(VCVTH_US): New.
(FP16_RND): New.
(absneg_str): New.
(FCVTI32typename): Moved from vfp.md.
(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S_N,
UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
(vcvth_op): New.
(fp16_rnd_str): New.
(fp16_rnd_insn): New.
* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
(UNSPEC_VCVT_HF_U_N): New.
(UNSPEC_VCVT_SI_S_N): New.
(UNSPEC_VCVT_SI_U_N): New.
(UNSPEC_VCVTH_S): New.
(UNSPEC_VCVTH_U): New.
(UNSPEC_VCVTA_S): New.
(UNSPEC_VCVTA_U): New.
(UNSPEC_VCVTM_S): New.
(UNSPEC_VCVTM_U): New.
(UNSPEC_VCVTN_S): New.
(UNSPEC_VCVTN_U): New.
(UNSPEC_VCVTP_S): New.
(UNSPEC_VCVTP_U): New.
(UNSPEC_VRND): New.
(UNSPEC_VRNDA): New.
(UNSPEC_VRNDI): New.
(UNSPEC_VRNDM): New.
(UNSPEC_VRNDN): New.
(UNSPEC_VRNDP): New.
(UNSPEC_VRNDX): New.
* config/arm/vfp.md (hf2): New.
(neon_vabshf): New.
(neon_vhf): New.
(neon_vrndihf): New.
(addhf3): New.
(subhf3): New.
(divhf3): New.
(mulhf3): New.
(*mulsf3neghf_vfp): New.
(*negmulhf3_vfp): New.
(*mulsf3addhf_vfp): New.
(*mulhf3subhf_vfp): New.
(*mulhf3neghfaddhf_vfp): New.
(*mulhf3neghfsubhf_vfp): New.
(fmahf4): New.
(neon_vfmahf): New.
(fmsubhf4_fp16): New.
(neon_vfmshf): New.
(*fnmsubhf4): New.
(*fnmaddhf4): New.
(neon_vsqrthf): New.
(neon_vrsqrtshf): New.
(FCVT): Move to iterators.md.
(FCVTI32typename): Likewise.
(neon_vcvthhf): New.
(neon_vcvthsi): New.
(neon_vcvth_nhf_unspec): New.
(neon_vcvth_nhf): New.
(neon_vcvth_nsi_unspec): New.
(neon_vcvth_nsi): New.
(neon_vcvthsi): New.
(neon_hf): New.

testsuite/
2016-07-04  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
* gcc.target/arm/armv8_2-fp16-conv-1.c: New.

>From 780903a1c5ef2e4393c9ee2843307d9041f36f87 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 14:49:17 +0100
Subject: [PATCH 08/17] [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-07-04  Matthew Wahab  

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in

Re: [PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-07-04 Thread Matthew Wahab


On 17/05/16 15:34, Matthew Wahab wrote:
> The ARMv8.2-A FP16 extension adds a number of instructions to support
> data movement for FP16 values. This patch adds these instructions to the
> backend, making them available to the compiler code generator.

This updates the expected output for the test added by the patch since
gcc now generates ldrh/strh for some indexed loads/stores which were
previously done with vld1/vst1.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.
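In the arm.c part of this patch, output_move_vfp chooses the width suffix of the VFP load/store from the operand's mode size: an 8-byte operand gives "64"; otherwise "32" is used unless the FP16 instructions are available and the operand is not 4 bytes wide, in which case "16" is used. The sketch below restates that selection as a stand-alone function. vfp_move_width, vfp_move_mnemonic and the "v%sr.%s" template are hypothetical simplifications, not the template output_move_vfp actually passes to sprintf, and fp16_inst stands in for TARGET_VFP_FP16INST.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the dp/sp logic in output_move_vfp.  MODE_SIZE plays the role
   of GET_MODE_SIZE (GET_MODE (operands[0])) and FP16_INST that of
   TARGET_VFP_FP16INST.  */
static const char *
vfp_move_width (int mode_size, int fp16_inst)
{
  int dp = mode_size == 8;
  int sp = !fp16_inst || mode_size == 4;

  return dp ? "64" : sp ? "32" : "16";
}

/* Build a mnemonic such as "vldr.16" into BUF (hypothetical template).  */
static void
vfp_move_mnemonic (char *buf, size_t len, int load, int mode_size,
                   int fp16_inst)
{
  snprintf (buf, len, "v%sr.%s", load ? "ld" : "st",
            vfp_move_width (mode_size, fp16_inst));
}
```

With fp16_inst clear, a 2-byte operand still gets "32", which matches the behaviour before the patch.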

2016-07-04  Matthew Wahab  
Jiong Wang 

* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
available when FP16 instructions are available.
(output_move_vfp): Add support for 16-bit data moves.
(arm_validize_comparison): Fix some white-space.  Support HFmode
by conversion to SFmode.
* config/arm/arm.md (truncdfhf2): Fix a comment.
(extendhfdf2): Likewise.
(cstorehf4): New.
(movsicc): Fix some white-space.
(movhfcc): New.
(movsfcc): Fix some white-space.
(*cmovhf): New.
* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
instructions are available.
(*thumb2_movhi_vfp): Likewise.
(*arm_movhi_fp16): New.
(*thumb2_movhi_fp16): New.
(*movhf_vfp_fp16): New.
(*movhf_vfp_neon): Disable when VFP FP16 instructions are
available.
(*movhf_vfp): Likewise.
(extendhfsf2): Enable when VFP FP16 instructions are available.
(truncsfhf2): Enable when VFP FP16 instructions are available.

testsuite/
2016-07-04  Matthew Wahab  

* gcc.target/arm/armv8_2_fp16-move-1.c: New.

>From 0633bbb2f2d43a6994adaeb44898e18c304ee728 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:35:04 +0100
Subject: [PATCH 07/17] [PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-07-04  Matthew Wahab  
	Jiong Wang 

	* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
	available when FP16 instructions are available.
	(output_move_vfp): Add support for 16-bit data moves.
	(arm_validize_comparison): Fix some white-space.  Support HFmode
	by conversion to SFmode.
	* config/arm/arm.md (truncdfhf2): Fix a comment.
	(extendhfdf2): Likewise.
	(cstorehf4): New.
	(movsicc): Fix some white-space.
	(movhfcc): New.
	(movsfcc): Fix some white-space.
	(*cmovhf): New.
	* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
	instructions are available.
	(*thumb2_movhi_vfp): Likewise.
	(*arm_movhi_fp16): New.
	(*thumb2_movhi_fp16): New.
	(*movhf_vfp_fp16): New.
	(*movhf_vfp_neon): Disable when VFP FP16 instructions are
	available.
	(*movhf_vfp): Likewise.
	(extendhfsf2): Enable when VFP FP16 instructions are available.
	(truncsfhf2): Enable when VFP FP16 instructions are available.

testsuite/
2016-07-04  Matthew Wahab  

	* gcc.target/arm/armv8_2_fp16-move-1.c: New.
---
 gcc/config/arm/arm.c   |  16 +-
 gcc/config/arm/arm.md  |  81 -
 gcc/config/arm/vfp.md  | 182 -
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c | 165 +++
 4 files changed, 432 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ce18f75..f07e2c1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13187,7 +13187,7 @@ coproc_secondary_reload_class (machine_mode mode, rtx x, bool wb)
 {
   if (mode == HFmode)
 {
-  if (!TARGET_NEON_FP16)
+  if (!TARGET_NEON_FP16 && !TARGET_VFP_FP16INST)
 	return GENERAL_REGS;
   if (s_register_operand (x, mode) || neon_vector_mem_operand (x, 2, true))
 	return NO_REGS;
@@ -18638,6 +18638,8 @@ output_move_vfp (rtx *operands)
   rtx reg, mem, addr, ops[2];
   int load = REG_P (operands[0]);
   int dp = GET_MODE_SIZE (GET_MODE (operands[0])) == 8;
+  int sp = (!TARGET_VFP_FP16INST
+	|| GET_MODE_SIZE (GET_MODE (operands[0])) == 4);
   int integer_p = GET_MODE_CLASS (GET_MODE (operands[0])) == MODE_INT;
   const char *templ;
   char buff[50];
@@ -18684,7 +18686,7 @@ output_move_vfp (rtx *operands)
 
   sprintf (buff, templ,
 	   load ? "ld" : "st",
-	   dp ? "64" : "32",
+	   dp ? "64" : sp ? "32" : "16",
 	   dp ? "P" : "",
 	   integer_p ? "\t%@ int" : "");
   output_asm_insn (buff, ops);
@@ -29326,7 +29328,7 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2)
 {
   enum rtx_code code = GET_CODE (*comparison);
   int code_int;
-  machine_mode mode = (GET_MODE (*op1) == VOIDmode) 
+  machine_mode mode = (GET_MODE (*op1) == VOIDmode)
 ? GET_MODE (*op2) : GET_MODE (*op1);
 
   gcc_assert (GET_MOD

Re: [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions.

2016-07-04 Thread Matthew Wahab


On 17/05/16 15:26, Matthew Wahab wrote:
> The ARMv8.2-A FP16 extension adds to both the VFP and the NEON
> instruction sets. This patch adds support to the testsuite to select
> targets and set options for tests that make use of these
> instructions. It also adds documentation for ARMv8.1-A selectors.

This is a rebase of the patch to take account of changes in
sourcebuild.texi.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

2016-07-04  Matthew Wahab  

* doc/sourcebuild.texi (ARM-specific attributes): Add anchor for
arm_v8_1a_neon_ok.  Add entries for arm_v8_2a_fp16_scalar_ok,
arm_v8_2a_fp16_scalar_hw, arm_v8_2a_fp16_neon_ok and
arm_v8_2a_fp16_neon_hw.
(Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_fp16_scalar,
arm_v8_2a_fp16_neon.
* lib/target-supports.exp
(add_options_for_arm_v8_2a_fp16_scalar): New.
(add_options_for_arm_v8_2a_fp16_neon): New.
(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
(add_options_for_arm_arch_v8_2a): Auto-generate.
(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
(check_effective_target_arm_v8_2a_fp16_neon_hw): New.

>From 47ead98473ac1f6dda5df2638800e5b4c8ec38a1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:34:30 +0100
Subject: [PATCH 03/17] [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A
 with FP16   arithmetic instructions.

2016-07-04  Matthew Wahab  

	* doc/sourcebuild.texi (ARM-specific attributes): Add anchor for
	arm_v8_1a_neon_ok.  Add entries for arm_v8_2a_fp16_scalar_ok,
	arm_v8_2a_fp16_scalar_hw, arm_v8_2a_fp16_neon_ok and
	arm_v8_2a_fp16_neon_hw.
	(Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_fp16_scalar,
	arm_v8_2a_fp16_neon.
	* lib/target-supports.exp
	(add_options_for_arm_v8_2a_fp16_scalar): New.
	(add_options_for_arm_v8_2a_fp16_neon): New.
	(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
	(add_options_for_arm_arch_v8_2a): Auto-generate.
	(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
	(check_effective_target_arm_v8_2a_fp16_neon_hw): New.
---
 gcc/doc/sourcebuild.texi  |  40 ++
 gcc/testsuite/lib/target-supports.exp | 145 +-
 2 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1fa962d..4f83307 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1596,6 +1596,7 @@ ARM target supports @code{-mfpu=neon-fp-armv8 -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
 
 @item arm_v8_1a_neon_ok
+@anchor{arm_v8_1a_neon_ok}
 ARM target supports options to generate ARMv8.1 Adv.SIMD instructions.
 Some multilibs may be incompatible with these options.
 
@@ -1607,6 +1608,28 @@ arm_v8_1a_neon_ok.
 @item arm_acq_rel
 ARM target supports acquire-release instructions.
 
+@item arm_v8_2a_fp16_scalar_ok
+@anchor{arm_v8_2a_fp16_scalar_ok}
+ARM target supports options to generate instructions for ARMv8.2 and
+scalar instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.
+
+@item arm_v8_2a_fp16_scalar_hw
+ARM target supports executing instructions for ARMv8.2 and scalar
+instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.  Implies arm_v8_2a_fp16_neon_ok.
+
+@item arm_v8_2a_fp16_neon_ok
+@anchor{arm_v8_2a_fp16_neon_ok}
+ARM target supports options to generate instructions from ARMv8.2 with
+the FP16 extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_fp16_scalar_ok.
+
+@item arm_v8_2a_fp16_neon_hw
+ARM target supports executing instructions from ARMv8.2 with the FP16
+extension.  Some multilibs may be incompatible with these options.
+Implies arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_scalar_hw.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
@@ -2091,6 +2114,23 @@ the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
 arm vfp3 floating point support; see
 the @ref{arm_vfp3_ok,,arm_vfp3_ok effective target keyword}.
 
+@item arm_v8_1a_neon
+Add opti

Re: [PATCH 1/17][ARM] Add ARMv8.2-A command line option and profile.

2016-07-04 Thread Matthew Wahab


On 17/05/16 15:22, Matthew Wahab wrote:
> This patch adds the command options for the architecture ARMv8.2-A and
> the half-precision extension. The architecture is selected by
> -march=armv8.2-a and has all the properties of -march=armv8.1-a.
>
> This patch also enables the CRC extension (+crc) which is required
> for both ARMv8.2-A and ARMv8.1-A architectures but is not currently
> enabled by default for -march=armv8.1-a.
>
> The half-precision extension is selected using the extension +fp16. This
> enables the VFP FP16 instructions if an ARMv8 VFP unit is also
> specified, e.g. by -mfpu=fp-armv8. It also enables the FP16 NEON
> instructions if an ARMv8 NEON unit is specified, e.g. by
> -mfpu=neon-fp-armv8. Note that if the NEON FP16 instructions are enabled
> then so are the VFP FP16 instructions.

This is a minor respin that moves the setting of arm_fp16_inst in
arm_option_override to immediately before it is used to set the required
arm_fp16_format.
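The option behaviour quoted above can be summarised as two predicates, with the NEON one defined in terms of the VFP one so that NEON FP16 instructions always imply VFP FP16 instructions. This is a hypothetical model for illustration only: struct fp16_opts and its fields are not GCC data structures (the patch itself adds TARGET_VFP_FP16INST and TARGET_NEON_FP16INST to arm.h), and it assumes that an ARMv8 NEON FPU such as neon-fp-armv8 also provides the ARMv8 VFP unit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the relevant command-line choices.  */
struct fp16_opts
{
  bool fp16_ext;    /* -march=armv8.2-a+fp16 */
  bool armv8_vfp;   /* ARMv8 VFP unit, e.g. -mfpu=fp-armv8 */
  bool armv8_neon;  /* ARMv8 NEON unit, e.g. -mfpu=neon-fp-armv8 */
};

static bool
vfp_fp16_enabled (const struct fp16_opts *o)
{
  /* Assumes an ARMv8 NEON unit includes the ARMv8 VFP unit.  */
  return o->fp16_ext && (o->armv8_vfp || o->armv8_neon);
}

static bool
neon_fp16_enabled (const struct fp16_opts *o)
{
  /* Built on the VFP predicate, so NEON FP16 implies VFP FP16.  */
  return vfp_fp16_enabled (o) && o->armv8_neon;
}
```

Defining the NEON predicate on top of the VFP one makes the "NEON FP16 implies VFP FP16" property hold by construction rather than by convention.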

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

2016-07-04  Matthew Wahab  

* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
("armv8.2-a"): New.
("armv8.2-a+fp16"): New.
* config/arm/arm-protos.h (FL2_ARCH8_2): New.
(FL2_FP16INST): New.
(FL2_FOR_ARCH8_2A): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_2): New.
(arm_fp16_inst): New.
(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
for incompatible fp16-format settings.
* config/arm/arm.h (TARGET_VFP_FP16INST): New.
(TARGET_NEON_FP16INST): New.
(arm_arch8_2): Declare.
(arm_fp16_inst): Declare.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
march=armv8.2-a and march=armv8.2-a+fp16.
* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
and armv8.2-a+fp16.
* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
"-march=armv8.2-a" and "-march=armv8.2-a+fp16".

>From e165b4e8bc4338608ff9505a7fd1a26d8a996b0a Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:31:24 +0100
Subject: [PATCH 01/17] [PATCH 1/17][ARM] Add ARMv8.2-A command line option and
 profile.

2016-07-04  Matthew Wahab  

	* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
	("armv8.2-a"): New.
	("armv8.2-a+fp16"): New.
	* config/arm/arm-protos.h (FL2_ARCH8_2): New.
	(FL2_FP16INST): New.
	(FL2_FOR_ARCH8_2A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch8_2): New.
	(arm_fp16_inst): New.
	(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
	for incompatible fp16-format settings.
	* config/arm/arm.h (TARGET_VFP_FP16INST): New.
	(TARGET_NEON_FP16INST): New.
	(arm_arch8_2): Declare.
	(arm_fp16_inst): Declare.
	* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
	march=armv8.2-a and march=armv8.2-a+fp16.
	* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
	and armv8.2-a+fp16.
	* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
	"-march=armv8.2-a" and "-march=armv8.2-a+fp16".
---
 gcc/config/arm/arm-arches.def | 10 --
 gcc/config/arm/arm-protos.h   |  4 
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  | 15 +++
 gcc/config/arm/arm.h  | 14 ++
 gcc/config/arm/bpabi.h|  4 
 gcc/config/arm/t-aprofile |  2 ++
 gcc/doc/invoke.texi   | 13 +
 8 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index fd02b18..2b4a80e 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -58,10 +58,16 @@ ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_F
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
 ARM_ARCH("armv8.1-a", cortexa53,  8A,
-	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
 	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 			 FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.2-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A))
+ARM_ARCH ("armv8.2-a+fp16", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A | FL2_FP16INST))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSE

[ARM] FP16 ARM Alternative format variants of AAPCS tests.

2016-06-27 Thread Matthew Wahab


Hello,

Tests added for FP16 argument and return values being passed in
registers only check the case when the FP16 IEEE format is used. This
patch adds equivalent tests that also check the behaviour when the
ARM Alternative format is used.
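The difference between the two formats exercised by these tests lies in how exponent 31 is interpreted: IEEE binary16 reserves it for infinities and NaNs, whereas the ARM Alternative format treats it as an ordinary exponent. That gives the Alternative format a largest magnitude of 131008 instead of 65504, and no infinities or NaNs. The decoder below is a minimal sketch of that difference; decode_fp16 is a hypothetical name, and float is assumed to be IEEE binary32.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Decode a 16-bit pattern as IEEE binary16 (ALTERNATIVE false) or as the
   ARM Alternative half-precision format (ALTERNATIVE true).  Only the
   treatment of exponent 31 differs between the two.  */
static float
decode_fp16 (uint16_t h, bool alternative)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t ex = (h >> 10) & 0x1Fu;
  uint32_t man = h & 0x3FFu;
  uint32_t bits;
  float f;

  if (ex == 0)
    {
      f = (float) man * (1.0f / 16777216.0f);   /* Zero or subnormal.  */
      return sign ? -f : f;
    }
  if (ex == 31 && !alternative)
    bits = sign | 0x7F800000u | (man << 13);    /* IEEE Inf or NaN.  */
  else
    bits = sign | ((ex + 112u) << 23) | (man << 13);  /* Normal number.  */
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, isinf (decode_fp16 (0x7C00, false)) is true, while decode_fp16 (0x7C00, true) is 65536.0f.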

This patch depends on the testsuite directives added for the FP16 aapcs
tests at https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01794.html.

Tested arm-none-eabi with cross-compiled make check-gcc and
arm-none-linux-gnueabihf with native make check.

Ok for trunk?
Matthew

testsuite/
2016-06-27  Matthew Wahab  

* gcc.target/arm/fp16-aapcs-3.c: New.
* gcc.target/arm/fp16-aapcs-4.c: New.
* gcc.target/arm/aapcs/vfp22.c: New.
* gcc.target/arm/aapcs/vfp23.c: New.
* gcc.target/arm/aapcs/vfp24.c: New.
* gcc.target/arm/aapcs/vfp25.c: New.

>From 13b0cbec24a3fdeaaf6318acb42c79bf76e3414e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 15 Jun 2016 09:22:59 +0100
Subject: [PATCH] [ARM] FP16 ARM Alternative format variants of AAPCS tests.

Tests added for FP16 argument and return values being passed in
registers only check the case when the FP16 IEEE format is used. This
patch adds equivalent tests that also check the behaviour when the
ARM Alternative format is used.

Tested arm-none-eabi with cross-compiled make check and
arm-none-linux-gnueabihf with native make check.

testsuite/
2016-06-27  Matthew Wahab  

	* gcc.target/arm/fp16-aapcs-3.c: New.
	* gcc.target/arm/fp16-aapcs-4.c: New.
	* gcc.target/arm/aapcs/vfp22.c: New.
	* gcc.target/arm/aapcs/vfp23.c: New.
	* gcc.target/arm/aapcs/vfp24.c: New.
	* gcc.target/arm/aapcs/vfp25.c: New.
---
 gcc/testsuite/gcc.target/arm/aapcs/vfp22.c  | 28 +++
 gcc/testsuite/gcc.target/arm/aapcs/vfp23.c  | 30 +
 gcc/testsuite/gcc.target/arm/aapcs/vfp24.c  | 22 +
 gcc/testsuite/gcc.target/arm/aapcs/vfp25.c  | 26 +
 gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c | 21 
 gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c | 21 
 6 files changed, 148 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/vfp22.c
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/vfp23.c
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/vfp24.c
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/vfp25.c
 create mode 100644 gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c

diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp22.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp22.c
new file mode 100644
index 000..1944bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/vfp22.c
@@ -0,0 +1,28 @@
+/* Test AAPCS layout (VFP variant)  */
+
+/* { dg-do run { target arm_eabi } }  */
+/* { dg-require-effective-target arm_hard_vfp_ok }  */
+/* { dg-require-effective-target arm_fp16_hw }  */
+/* { dg-add-options arm_fp16_alternative }  */
+
+#ifndef IN_FRAMEWORK
+#define VFP
+#define TESTFILE "vfp22.c"
+#include "abitest.h"
+
+#else
+#if defined (__ARM_BIG_ENDIAN)
+ARG (__fp16, 1.0f, S0 + 2)
+#else
+ARG (__fp16, 1.0f, S0)
+#endif
+ARG (float, 2.0f, S1)
+ARG (double, 4.0, D1)
+ARG (float, 2.0f, S4)
+#if defined (__ARM_BIG_ENDIAN)
+ARG (__fp16, 1.0f, S5 + 2)
+#else
+ARG (__fp16, 1.0f, S5)
+#endif
+LAST_ARG (int, 3, R0)
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp23.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp23.c
new file mode 100644
index 000..bcacf9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/vfp23.c
@@ -0,0 +1,30 @@
+/* Test AAPCS layout (VFP variant)  */
+
+/* { dg-do run { target arm_eabi } }  */
+/* { dg-require-effective-target arm_hard_vfp_ok }  */
+/* { dg-require-effective-target arm_fp16_hw }  */
+/* { dg-add-options arm_fp16_alternative }  */
+
+#ifndef IN_FRAMEWORK
+#define VFP
+#define TESTFILE "vfp23.c"
+
+__complex__ x = 1.0+2.0i;
+
+#include "abitest.h"
+#else
+#if defined (__ARM_BIG_ENDIAN)
+ARG (__fp16, 1.0f, S0 + 2)
+#else
+ARG (__fp16, 1.0f, S0)
+#endif
+ARG (float, 2.0f, S1)
+ARG (__complex__ double, x, D1)
+ARG (float, 3.0f, S6)
+#if defined (__ARM_BIG_ENDIAN)
+ARG (__fp16, 2.0f, S7 + 2)
+#else
+ARG (__fp16, 2.0f, S7)
+#endif
+LAST_ARG (int, 3, R0)
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp24.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp24.c
new file mode 100644
index 000..eac640e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/vfp24.c
@@ -0,0 +1,22 @@
+/* Test AAPCS layout (VFP variant)  */
+
+/* { dg-do run { target arm_eabi } }  */
+/* { dg-require-effective-target arm_hard_vfp_ok }  */
+/* { dg-require-effective-target arm_fp16_hw }  */
+/* { dg-add-options arm_fp16_alternative }  */
+
+#ifndef IN_FRAMEWORK
+#define VFP
+#define TESTFILE "vfp24.c"
+
+#define PCSATTR __attribute__((pcs("aapcs")))
+
+#include "abitest.h"
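The expected register assignments in vfp22.c and vfp23.c above follow the AAPCS VFP rule of allocating each floating-point argument to the lowest-numbered suitable free registers, with an __fp16 argument taking a whole S register, just as a float would. The following is a minimal model of that rule under stated assumptions: only S0-S15 are used; an argument built from doubles must start on an even S register; and once one such argument goes on the stack, all later ones do too. vfp_alloc_arg and its parameters are hypothetical, and the big-endian S0 + 2 offset the tests use for __fp16 is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_S_REGS 16   /* s0-s15 are available for argument passing.  */

/* Hypothetical allocator state.  */
struct vfp_alloc
{
  bool used[NUM_S_REGS];
  bool exhausted;       /* Set once a VFP argument has gone on the stack.  */
};

/* Allocate NREGS consecutive S registers starting at a multiple of ALIGN
   (1 for __fp16/float, 2 for double-based types).  Returns the first S
   register number, or -1 if the argument goes on the stack.  */
static int
vfp_alloc_arg (struct vfp_alloc *a, int nregs, int align)
{
  if (a->exhausted)
    return -1;
  for (int first = 0; first + nregs <= NUM_S_REGS; first += align)
    {
      bool free_run = true;
      for (int i = first; i < first + nregs; i++)
        if (a->used[i])
          free_run = false;
      if (free_run)
        {
          for (int i = first; i < first + nregs; i++)
            a->used[i] = true;
          return first;
        }
    }
  /* Nothing suitable: all remaining VFP argument registers are unusable.  */
  a->exhausted = true;
  return -1;
}
```

Replaying vfp22.c (__fp16, float, double, float, __fp16) gives S0, S1, S2 (that is, D1), S4 and S5, matching the ARG lines. Back-filling falls out of the lowest-free-register search: float, double, float lands in S0, D1, S1.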

Re: [ARM] Fix, add tests for FP16 aapcs.

2016-06-27 Thread Matthew Wahab


On 10/06/16 15:30, Matthew Wahab wrote:
> On 10/06/16 15:22, Christophe Lyon wrote:
>> On 10 June 2016 at 15:56, Matthew Wahab  wrote:
>>> On 10/06/16 09:32, Christophe Lyon wrote:
>>>>
>>>> On 9 June 2016 at 17:21, Matthew Wahab  wrote:
>>>>>
>>>> It's an improvement, but I'm still seeing a few problems with this patch:
>>>> the vfp* tests are still failing in some of the configurations I test,
>>>> because
>>>> * you force dg-options that contains -mfloat-abi=hard,
>>>> * you check effective-target arm_neon_fp16_hw
>>>> * but you don't call dg-add-options arm_neon_fp16
>>>>
> I understand now. I still think it would be better to use a list of
> require-effective-targets so I'll try that first and use the arm_neon_fp16
> options if that doesn't work.
>

Sorry for the delay. I've added effective-target requirements to the
tests to check for hard-fp and for VFP (i.e. non-NEON) FP16 support. The
directives for the VFP FP16 support are new, so I've split them out into
a separate patch; both patches are attached.

The first patch adds:

- effective-target keywords arm_fp16_ok and arm_fp16_hw to check for
  compiler and hardware support for FP16.

- add-options features arm_fp16_ieee and arm_fp16_alternative, to
  enable FP16 IEEE format and FP16 ARM Alternative format support

Note that the existing add-options feature arm_fp16 enables the default
FP16 format (fp16-format=none).

The second patch updates the tests to use these directives. It also
reworks the gcc.target/arm/fp16-aapcs-1.c test to focus on argument
passing and return values, and adds a softfp variant as fp16-aapcs-2.c.

As before, checked for arm-none-eabi with cross-compiled check-gcc and
arm-linux-gnueabihf with native make check. I also ran the tests for
cross-compiled arm-none-eabi with -mcpu=cortex-m3.

Ok for trunk?
Matthew

PATCH 1/2 ChangeLog
gcc/
2016-06-27  Matthew Wahab  

* doc/sourcebuild.texi (Effective-Target keywords): Add entries
for arm_fp16_ok and arm_fp16_hw.
(Add Options): Add entries for arm_fp16, arm_fp16_ieee and
arm_fp16_alternative.

testsuite/
2016-06-27  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_fp16): Reword
comment.
(add_options_for_arm_fp16_ieee): New.
(add_options_for_arm_fp16_alternative): New.
(check_effective_target_arm_fp16_ok_nocache): Add to comment.  Fix a
long line.
(check_effective_target_arm_fp16_hw): New.

PATCH 2/2 ChangeLog
testsuite/
2016-06-27  Matthew Wahab  

* gcc.target/arm/aapcs/neon-vect10.c: Require
-mfloat-abi=hard.  Replace arm_neon_fp16_ok with arm_neon_fp16_hw.
* gcc.target/arm/aapcs/neon-vect9.c: Likewise.
* gcc.target/arm/aapcs/vfp18.c: Likewise.  Also add
options for ARM FP16 IEEE format.
* gcc.target/arm/aapcs/vfp19.c: Likewise.
* gcc.target/arm/aapcs/vfp20.c: Likewise.
* gcc.target/arm/aapcs/vfp21.c: Likewise.
* gcc.target/arm/fp16-aapcs-1.c: Require
-mfloat-abi=hard.  Also simplify the test.
* gcc.target/arm/fp16-aapcs-2.c: New.

>From ff46f8397b2ae4ffe3be0027849aa8ff63e9ab9b Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 13 Jun 2016 13:30:13 +0100
Subject: [PATCH 1/2] [Testsuite] Selectors and options directives for ARM VFP
 FP16 support.

To support FP16 VFP tests for the ARM backend, this patch adds:

- effective-target keywords arm_fp16_ok and arm_fp16_hw to check for
  compiler and hardware support for FP16.

- add-options features arm_fp16_ieee and arm_fp16_alternative, to
  enable FP16 IEEE format and FP16 ARM Alternative format support

Note that the existing add-options feature arm_fp16 enables the default
FP16 format (fp16-format=none).

gcc/
2016-06-27  Matthew Wahab  

	* doc/sourcebuild.texi (Effective-Target keywords): Add entries
	for arm_fp16_ok and arm_fp16_hw.
	(Add Options): Add entries for arm_fp16, arm_fp16_ieee and
	arm_fp16_alternative.

testsuite/
2016-06-27  Matthew Wahab  

	* lib/target-supports.exp (add_options_for_arm_fp16): Reword
	comment.
	(add_options_for_arm_fp16_ieee): New.
	(add_options_for_arm_fp16_alternative): New.
	(check_effective_target_arm_fp16_ok_nocache): Add to comment.  Fix a
	long line.
	(check_effective_target_arm_fp16_hw): New.
---
 gcc/doc/sourcebuild.texi  | 32 +++
 gcc/testsuite/lib/target-supports.exp | 58 ---
 2 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 95a781d..23d3c3f 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1555,6 +1555,16 @@ options.  Some multilibs may be incompatible with these options.
 AR

Re: [ARM] Fix, add tests for FP16 aapcs.

2016-06-10 Thread Matthew Wahab


On 10/06/16 15:22, Christophe Lyon wrote:

On 10 June 2016 at 15:56, Matthew Wahab  wrote:

On 10/06/16 09:32, Christophe Lyon wrote:


On 9 June 2016 at 17:21, Matthew Wahab  wrote:



It's an improvement, but I'm still seeing a few problems with this patch:
the vfp* tests are still failing in some of the configurations I test, because
* you force dg-options that contains -mfloat-abi=hard,
* you check effective-target arm_neon_fp16_hw
* but you don't call dg-add-options arm_neon_fp16



I'm not sure why the skip-if is failing, it's intended to skip the test if
float-abi={soft,softfp} appears anywhere in the command line. That's needed
because, in some configurations, the directives add an -mfloat-abi=softfp
after the -mfloat-abi=hard from dg-options, making the test fail.

The require-effective-target arm_neon_fp16_hw was intended to select a
hard-float target with FP16 support. I don't think that dg-add-options
arm_neon_fp16 is right because that could also force soft-fp.


I'm seeing arm_neon_fp16_hw test passing with -mfpu=neon-fp16 -mfloat-abi=softfp
but since you don't call dg-add-options arm_neon_fp16, the dg-skip directive
sees only the -mfloat-abi=hard which comes from dg-options.
The test fails to link for me with -mfpu=vfp -mfloat-abi=hard -mfp16-format=ieee,
according to my gcc.log.



I understand now. I still think it would be better to use a list of
require-effective-target directives, so I'll try that first and use the
arm_neon_fp16 options if that doesn't work.


Thanks,
Matthew

Re: [ARM] Fix, add tests for FP16 aapcs.

2016-06-10 Matthew Wahab


On 10/06/16 09:32, Christophe Lyon wrote:

On 9 June 2016 at 17:21, Matthew Wahab  wrote:

A number of tests were added to check for FP16 arguments and return
values being passed in registers. These require -mfloat-abi=hard to be
selected but in some test configurations they were run with
-mfloat-abi=soft or -mfloat-abi=softfp.


It's an improvement, but I'm still seeing a few problems with this patch:
the vfp* tests are still failing in some of the configurations I test, because
* you force dg-options that contains -mfloat-abi=hard,
* you check effective-target arm_neon_fp16_hw
* but you don't call dg-add-options arm_neon_fp16

On non-hf targets, the effective-target arm_neon_fp16_hw will want to
add -mfloat-abi=softfp, but you actually force -mfloat-abi=hard.
So, the dg-skip directive doesn't match, and the test fails to link because
the dejagnu glue code is compiled in soft mode, and conflicts
with the hard mode from vfpXX.o.


Thanks for testing this.

I'm not sure why the skip-if is failing, it's intended to skip the test if 
float-abi={soft,softfp} appears anywhere in the command line. That's needed 
because, in some configurations, the directives add an -mfloat-abi=softfp 
after the -mfloat-abi=hard from dg-options, making the test fail.


The require-effective-target arm_neon_fp16_hw was intended to select a 
hard-float target with FP16 support. I don't think that dg-add-options 
arm_neon_fp16 is right because that could also force soft-fp.


It may be better not to use arm_neon_fp16_hw. The existing aapcs/vfp* tests have
a list of require-effective-target directives to filter out invalid boards. 
I'll see if that can be made to work with arm_hard_vfp_ok and a selector for 
vfp-fp16 hardware.


Matthew

[ARM] Fix, add tests for FP16 aapcs.

2016-06-09 Matthew Wahab


Hello,

A number of tests were added to check for FP16 arguments and return
values being passed in registers. These require -mfloat-abi=hard to be
selected but in some test configurations they were run with
-mfloat-abi=soft or -mfloat-abi=softfp.

Explicit skip-if directives are added to the tests to ensure that they
only run on valid configurations. In addition, the code in the
gcc.target/arm/fp16-aapcs-1.c test is reworked to focus on argument
passing and return values and a softfp variant is added as
fp16-aapcs-2.c.

Tested for arm-none-linux-gnueabihf with native make check and for
arm-none-eabi with cross-compiled check-gcc. Also checked the new tests
with cross-compiled arm-eabi-qemu/-mcpu=cortex-m3/-mthumb.

Ok for trunk?
Matthew

2016-06-09  Matthew Wahab  

* testsuite/gcc.target/arm/aapcs/neon-vect10.c: Skip for
mfloat-abi=soft and mfloat-abi=softfp.  Replace arm_neon_fp16_ok
with arm_neon_fp16_hw.
* testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise.
* testsuite/gcc.target/arm/fp16-aapcs-1.c: Skip for
mfloat-abi=soft and mfloat-abi=softfp.  Also, simplify the test
and set option -mfloat-abi=hard.
* testsuite/gcc.target/arm/fp16-aapcs-2.c: New.
From b02f0283367d4a4c1b012e8ca8e7b5c91f3ac561 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 24 May 2016 09:21:11 +0100
Subject: [PATCH] [ARM] Fix, add tests for FP16 aapcs.

A number of tests were added to check for FP16 arguments and return
values being passed in registers. These require -mfloat-abi=hard to be
selected but, in some test configurations, they were run with
-mfloat-abi=soft or -mfloat-abi=softfp.

Explicit skip-if directives are added to the tests to ensure that they
only run on valid configurations. In addition, the code in the
gcc.target/arm/fp16-aapcs-1.c test is reworked to focus on argument
passing and return values and a softfp variant is added as
fp16-aapcs-2.c.

2016-06-09  Matthew Wahab  

	* testsuite/gcc.target/arm/aapcs/neon-vect10.c: Skip for
	mfloat-abi=soft and mfloat-abi=softfp.  Replace arm_neon_fp16_ok
	with arm_neon_fp16_hw.
	* testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise.
	* testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise.
	* testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise.
	* testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise.
	* testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise.
	* testsuite/gcc.target/arm/fp16-aapcs-1.c: Skip for
	mfloat-abi=soft and mfloat-abi=softfp.  Also, simplify the test
	and set option -mfloat-abi=hard.
	* testsuite/gcc.target/arm/fp16-aapcs-2.c: New.
---
 gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c  |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/vfp18.c   |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/vfp19.c   |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/vfp20.c   |  3 ++-
 gcc/testsuite/gcc.target/arm/aapcs/vfp21.c   |  3 ++-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c  | 22 +-
 gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c  | 21 +
 8 files changed, 46 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c

diff --git a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c
index 680a3b5..6764a19 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c
+++ b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect10.c
@@ -1,8 +1,9 @@
 /* Test AAPCS layout (VFP variant for Neon types) */
 
 /* { dg-do run { target arm_eabi } } */
-/* { dg-require-effective-target arm_neon_fp16_ok } */
+/* { dg-require-effective-target arm_neon_fp16_hw } */
 /* { dg-add-options arm_neon_fp16 } */
+/* { dg-skip-if "" { *-*-* } { "-mfloat-abi=soft" "-mfloat-abi=softfp" } } */
 
 #ifndef IN_FRAMEWORK
 #define VFP
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c
index fc2b13b..b2526a1 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c
+++ b/gcc/testsuite/gcc.target/arm/aapcs/neon-vect9.c
@@ -1,8 +1,9 @@
 /* Test AAPCS layout (VFP variant for Neon types) */
 
 /* { dg-do run { target arm_eabi } } */
-/* { dg-require-effective-target arm_neon_fp16_ok } */
+/* { dg-require-effective-target arm_neon_fp16_hw } */
 /* { dg-add-options arm_neon_fp16 } */
+/* { dg-skip-if "" { *-*-* } { "-mfloat-abi=soft" "-mfloat-abi=softfp" } } */
 
 #ifndef IN_FRAMEWORK
 #define VFP
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c
index 225e9ce..629b884 100644
--- a/gcc/testsuite/gcc.target/arm/aapcs/vfp18.c
+++ b/gcc/testsuite/gcc.targ

Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-06-02 Matthew Wahab


On 01/06/16 15:43, Christophe Lyon wrote:

On 13 May 2016 at 15:41, Ramana Radhakrishnan  wrote:

On Thu, Apr 28, 2016 at 10:20 AM, Matthew Wahab
 wrote:


This patch enables data movement for HF-mode values using VFP registers,
when they are available, to support passing arguments and return values
through the registers.


[..]

HF-mode data moves.

Tested for arm-none-eabi with cross-compiled check-gcc and for
arm-none-linux-gnueabihf with native bootstrap and make check.

Ok for trunk?
Matthew


This is OK - thanks.


Hi,

I'm seeing regressions on non-hf targets (arm-none-eabi,
arm-none-linux-gnueabi):
new FAIL:
gcc.target/arm/aapcs/neon-vect10.c execution test
gcc.target/arm/aapcs/neon-vect9.c execution test

I'm using QEMU (2.4.1). You said you tested arm-none-eabi, so I'm
probably missing something?

Christophe


Hi,

This has also appeared in our internal testing in some configurations. 
It's due to -mfloat-abi=soft or -mfloat-abi=softfp being added to the end
of the command line for the tests. That overrides the -mfloat-abi=hard
option that the tests need. I'm just finishing up a patch to fix this.
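The override described above follows GCC's usual rule that a later occurrence of an option replaces an earlier one. The following host-side C sketch illustrates that rule. The helper `effective_float_abi` is hypothetical and is not part of the testsuite. It computes which -mfloat-abi value takes effect for a given flag list:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the -mfloat-abi value that takes effect for
   the flag list OPTS[0..N-1].  As with most GCC options, a later
   -mfloat-abi= overrides an earlier one, so a flag appended after the
   dg-options wins.  DFLT stands for the toolchain's configured default.  */
static const char *
effective_float_abi (const char *const *opts, size_t n, const char *dflt)
{
  static const char prefix[] = "-mfloat-abi=";
  const size_t len = sizeof prefix - 1;
  const char *abi = dflt;

  assert (opts != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (strncmp (opts[i], prefix, len) == 0)
      abi = opts[i] + len;      /* keep overwriting: the last one wins */
  return abi;
}
```

Suppose dg-options supplies -mfloat-abi=hard and the board flags then append -mfloat-abi=softfp. In that case the sketch reports "softfp", which matches the override that breaks these tests.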


Matthew

Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-19 Matthew Wahab


On 18/05/16 16:20, Joseph Myers wrote:

On Wed, 18 May 2016, Matthew Wahab wrote:


AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like
flush-to-zero that could affect the outcome of a calculation.


The result of a float computation on two values immediately promoted from
fp16 cannot be within the subnormal range for float.  Thus, only one flush
to zero can happen, on the final conversion back to fp16, and that cannot
make the result different from doing direct arithmetic in fp16 (assuming
flush to zero affects conversion from float to fp16 the same way it
affects direct fp16 arithmetic).
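The bound behind the claim about the subnormal range can be written out. Let a and b be nonzero binary16 values promoted directly to float. Binary16's smallest positive value is 2^-24 and its largest finite value is 65504 < 2^16. Binary32's smallest normal value is FLT_MIN = 2^-126. Any nonzero result of a single float operation then satisfies:

```latex
\begin{aligned}
|a \pm b| &\ge 2^{-24} && \text{(both operands are multiples of } 2^{-24}\text{)}\\
|a \cdot b| &\ge 2^{-24} \cdot 2^{-24} = 2^{-48}\\
|a / b| &\ge 2^{-24} / 65504 > 2^{-24} \cdot 2^{-16} = 2^{-40}\\
2^{-48} &> 2^{-126} = \mathrm{FLT\_MIN}
\end{aligned}
```

Each bound is itself representable in float, and rounding is monotonic, so the rounded float result also stays at or above it. It is therefore never subnormal in float. This only covers one operation on directly promoted operands, as in the quoted text.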


[..]


In short: instructions for direct HFmode arithmetic should be described
with patterns with the standard names.  It's the job of the
architecture-independent compiler to ensure that fp16 arithmetic in the
user's source code only generates direct fp16 arithmetic in GIMPLE (and
thus ends up using those patterns) if that is a correct representation of
the source code's semantics according to ACLE.

The intrinsics you provide can then be written to use direct arithmetic,
and rely on convert_to_real_1 eliminating the promotions, rather than
needing built-in functions at all, just like many arm_neon.h intrinsics
make direct use of GNU C vector arithmetic.


I think it's clear that this has exhausted my knowledge of FP semantics.

Forcing promotion to single precision was meant to settle concerns brought up in internal
discussions about __fp16 semantics. I'll see if anybody has any problem with the
changes you suggest.


Thanks,
Matthew

Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-18 Matthew Wahab


On 18/05/16 09:41, Ramana Radhakrishnan wrote:

On Mon, May 16, 2016 at 2:16 PM, Tejas Belagod
 wrote:



We do have plans to fix pre-ACLE behavior of fp16 to conform to current ACLE
spec, but can't say when exactly.


Matthew, could you please take a look at this while you are in this area ?


Ok.

Part of this is likely to involve removing TARGET_CONVERT_TO_TYPE from the ARM
backend. grep doesn't show anywhere that uses this hook; the only other occurrence is
in the ARC backend. Does the hook ever get used?


Matthew

Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-18 Matthew Wahab


On 18/05/16 01:51, Joseph Myers wrote:

On Tue, 17 May 2016, Matthew Wahab wrote:


In most cases the instructions are added using non-standard pattern
names. This is to force operations on __fp16 values to be done, by
conversion, using the single-precision instructions. The exceptions are
the precision preserving operations ABS and NEG.


But why do you need to force that?  If the instructions follow IEEE
semantics including for exceptions and rounding modes, then X OP Y
computed directly with binary16 arithmetic has the same value as results
from promoting to binary32, doing binary32 arithmetic and converting back
to binary16, for OP in + - * /.  (Double-rounding problems can only occur
in round-to-nearest and if the binary32 result is exactly half way between
two representable binary16 values but the exact result is not exactly half
way between.  It's obvious that this can't occur to + - * and only a bit
harder to see this for /.  According to the logic used in
convert.c:convert_to_real_1, double rounding can't occur in this case for
square root either, though I haven't verified that.)
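The double-rounding argument for + - * can be spot-checked on a host machine. The following C sketch is illustrative only and rests on several assumptions:

- Binary16 rounding is modelled in software: round to nearest, ties to even, gradual underflow and overflow to infinity.
- No flush-to-zero is modelled, so the sketch says nothing about the AArch32 behaviour raised in reply.
- The host float is IEEE binary32.
- Only a strided sample of operand pairs is checked.

For + - * the double result is exact, because every binary16 value is an integer multiple of 2^-24 below 2^16. That exact result therefore serves as the once-rounded reference. Division is not checked. All function names are hypothetical.

```c
#include <assert.h>
#include <math.h>     /* for INFINITY only; no libm functions are called */
#include <stdint.h>

/* two_to_tab[k + 80] == 2^k for -80 <= k <= 80; filled by exact doubling.  */
static double two_to_tab[161];

static void
init_two_to (void)
{
  two_to_tab[80] = 1.0;
  for (int k = 1; k <= 80; k++)
    {
      two_to_tab[80 + k] = two_to_tab[79 + k] * 2.0;
      two_to_tab[80 - k] = two_to_tab[81 - k] * 0.5;
    }
}

static double two_to (int k) { return two_to_tab[k + 80]; }

/* Exact value of a finite binary16 bit pattern H.  */
static double
half_value (uint16_t h)
{
  int exp = (h >> 10) & 0x1f, mant = h & 0x3ff;
  assert (exp != 0x1f);                                 /* no Inf/NaN */
  double v = exp == 0 ? mant * two_to (-24)             /* subnormal */
                      : (1024 + mant) * two_to (exp - 25);
  return (h & 0x8000) ? -v : v;
}

/* Round X to binary16: nearest, ties to even, gradual underflow,
   overflow to infinity.  No flush-to-zero is modelled.  */
static double
round_to_half (double x)
{
  double ax = x < 0 ? -x : x;
  if (ax == 0.0)
    return x;
  int e = 0;                                  /* 2^(e-1) <= ax < 2^e */
  while (ax >= two_to (e))
    e++;
  while (ax < two_to (e - 1))
    e--;
  int ulp = e - 11 > -24 ? e - 11 : -24;      /* 11 significant bits */
  double q = ax * two_to (-ulp);              /* exact power-of-two scaling */
  double n = (double) (int64_t) q;            /* integer part */
  double frac = q - n;
  if (frac > 0.5 || (frac == 0.5 && ((int64_t) n & 1)))
    n += 1.0;                                 /* round half to even */
  double r = n * two_to (ulp);
  if (r > 65504.0)
    r = INFINITY;
  return x < 0 ? -r : r;
}

/* Count results where rounding the exact value once differs from rounding
   it first to binary32 and then to binary16, over a strided sample.  */
static long
count_double_rounding_mismatches (unsigned stride)
{
  long mismatches = 0;
  assert (stride > 0);
  init_two_to ();
  for (uint32_t i = 0; i < 0x10000; i += stride)
    for (uint32_t j = 0; j < 0x10000; j += stride)
      {
        if (((i >> 10) & 0x1f) == 0x1f || ((j >> 10) & 0x1f) == 0x1f)
          continue;                           /* skip Inf/NaN encodings */
        double a = half_value ((uint16_t) i), b = half_value ((uint16_t) j);
        float fa = (float) a, fb = (float) b; /* exact conversions */
        /* Assigning to float forces rounding to binary32.  */
        volatile float s = fa + fb, d = fa - fb, p = fa * fb;
        mismatches += round_to_half (a + b) != round_to_half (s);
        mismatches += round_to_half (a - b) != round_to_half (d);
        mismatches += round_to_half (a * b) != round_to_half (p);
      }
  return mismatches;
}
```

If the argument above holds, the count is zero for every stride. A stride of 1 makes the check exhaustive, but it is much slower.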


AArch64 follows IEEE-754 but ARM (AArch32) adds restrictions like flush-to-zero that 
could affect the outcome of a calculation.



So I'd expect e.g.

__fp16 a, b;
__fp16 c = a / b;

to generate the new instructions, because direct binary16 arithmetic is a
correct implementation of (__fp16) ((float) a / (float) b).


Something like

__fp16 a, b, c;
__fp16 d = (a / b) * c;

would be done as the sequence of single precision operations:

vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
vcvtb.f32.f16 s2, s2
vdiv.f32 s15, s0, s1
vmul.f32 s0, s15, s2
vcvtb.f16.f32 s0, s0

Doing this with vdiv.f16 and vmul.f16 could change the calculated result because the
flush-to-zero rule depends on the precision of the operation, so it affects the value
of a vdiv.f16 differently from that of a vdiv.f32.


(At least, that's my understanding.)

Matthew

Re: [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.

2016-05-18 Matthew Wahab


On 18/05/16 02:06, Joseph Myers wrote:

On Tue, 17 May 2016, Matthew Wahab wrote:


In some tests, there are unavoidable differences in precision when calculating
the actual and the expected results of an FP16 operation. A new support function
CHECK_FP_BIAS is used so that these tests can check for an acceptable margin of
error. In these tests, the tolerance is given as the absolute integer difference
between the bitvectors of the expected and the actual results.


As far as I can see, CHECK_FP_BIAS is only used in the following patch, but there
is another bias test in vsqrth_f16_1.c in this patch.


This is my mistake: CHECK_FP_BIAS is used for the NEON tests and should have gone
into that patch. The VFP test can do a simpler check, so it doesn't need the macro.


Could you clarify where the "unavoidable differences in precision" come from? Are
the results of some of the new instructions not fully specified, only specified
within a given precision?  (As far as I can tell the existing v8 instructions for
reciprocal and reciprocal square root estimates do have fully defined results,
despite being loosely described as estimates.)


The expected results in the new tests are represented as expressions whose value is
expected to be calculated at compile time. This makes the tests more readable, but
differences in precision between the compiler and the HW calculations mean
that for vrecpe_f16, vrecps_f16, vrsqrts_f16 and vsqrth_f16_1.c the expected and
actual results are different.


On reflection, it may be better to remove the CHECK_FP_BIAS macro and, for the tests 
that needed it, to drop the compiler calculation and just use the expected 
hexadecimal value.
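One way to produce such hexadecimal expected values is to compute them offline on a host. The following C sketch does that. The helper name `double_to_half_bits` is hypothetical, and the sketch assumes IEEE double on the host. It rounds a finite double to the nearest binary16 value, with ties to even, and returns the bit pattern. It treats -0.0 as +0.0. It only performs the final rounding, so the double it is given must already be accurate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offline helper: bit pattern of the binary16 value nearest to
   the finite double X (round to nearest, ties to even; gradual underflow;
   overflow gives infinity).  -0.0 is treated as +0.0.  */
static uint16_t
double_to_half_bits (double x)
{
  assert (x - x == 0.0);                /* rejects Inf and NaN */
  uint16_t sign = x < 0 ? 0x8000 : 0;
  double ax = x < 0 ? -x : x;
  if (ax == 0.0)
    return sign;

  int e = 0;                            /* find 2^e <= ax < 2^(e+1) */
  double m = ax;
  while (m >= 2.0) { m *= 0.5; e++; }
  while (m < 1.0) { m *= 2.0; e--; }

  /* Scale so that one unit is one ulp: 2^(e-10) for normals, 2^-24 below
     2^-14.  Both scalings are exact.  */
  double q = e >= -14 ? m * 1024.0 : ax * 16777216.0;
  uint32_t n = (uint32_t) q;
  double frac = q - n;
  if (frac > 0.5 || (frac == 0.5 && (n & 1)))
    n++;                                /* round half to even */

  /* Normals: ((e + 14) << 10) + n with n in [1024, 2048] equals the biased
     exponent (e + 15) in bits 10-14 plus the 10 fraction bits; a carry to
     n == 2048 bumps the exponent.  Subnormals are just n.  */
  uint32_t bits = (e >= -14 ? (uint32_t) (e + 14) << 10 : 0) + n;
  if (bits >= 0x7c00)
    bits = 0x7c00;                      /* overflow to infinity */
  return (uint16_t) (sign | bits);
}
```

For example, double_to_half_bits (1.0 / 3.0) gives 0x3555. That value could then be written into a test directly as the expected result.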


Other tests depending on compile-time calculations involve relatively simple
arithmetic operations and it's not clear if they are susceptible to the same rounding
errors. I have limited knowledge of FP arithmetic, though, so I'll look into this.


Matthew

[PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.

2016-05-17 Matthew Wahab


Support for using the half-precision floating point operations added by
the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
to the ACLE for the extension.

This patch adds executable tests for the ACLE scalar (floating point)
intrinsics to the advsimd-intrinsics testsuite. The tests were written
by Jiong Wang.

In some tests, there are unavoidable differences in precision when
calculating the actual and the expected results of an FP16 operation. A
new support function CHECK_FP_BIAS is used so that these tests can check
for an acceptable margin of error. In these tests, the tolerance is
given as the absolute integer difference between the bitvectors of the
expected and the actual results.
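The tolerance described above can be sketched in host C. The function below is a hypothetical stand-in, not the CHECK_FP_BIAS macro from arm-neon-ref.h, whose definition is not shown here. It takes two binary16 bit patterns and accepts the pair if their absolute integer difference is at most the bias.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CHECK_FP_BIAS-style comparison (not the real macro):
   accept ACTUAL if the absolute integer difference between its binary16
   bit pattern and EXPECTED's is at most BIAS.  For finite values of the
   same sign, adjacent bit patterns are adjacent binary16 values, so BIAS
   is a tolerance in units in the last place.  Across a sign change the
   distance is not meaningful (+0.0 and -0.0 are 0x8000 apart).  */
static int
fp16_bits_within (uint16_t expected, uint16_t actual, unsigned bias)
{
  long diff = (long) expected - (long) actual;
  if (diff < 0)
    diff = -diff;
  return diff <= (long) bias;
}
```

For example, 1.0 (0x3c00) and the next binary16 value above it (0x3c01) pass with a bias of 1. Values two steps apart do not.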

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-05-17  Jiong Wang  
Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(CHECK_FP_BIAS): New.
* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c: New.

From fe243d41337fcce0c93a8ce1df68921c680bcfe8 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:40:52 +0100
Subject: [PATCH 16/17] [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE
 intrinsics.

testsuite/
2016-05-17  Jiong Wang  
	    Matthew Wahab  

	* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
	(CHECK_FP_BIAS): New.
	* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
	* gcc.target/aarc

[PATCH 13/17][ARM] Add VFP FP16 intrinsics.

2016-05-17 Matthew Wahab


The ARMv8.2-A architecture introduces an optional FP16 extension adding
half-precision floating point data processing instructions to the
existing scalar (floating point) support. A future version of the ACLE
will add support for these instructions and this patch implements that
support.

The ACLE will introduce new intrinsics for the scalar (floating-point)
instructions together with a new header file arm_fp16.h. The ACLE will
require that the intrinsics are available when both the header file is
included and the ACLE feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
is defined. (The new ACLE feature macros are dealt with in an earlier
patch.)

The patch adds the arm_fp16.h header file with the following new
intrinsics:

float16_t vabsh_f16 (float16_t __a)
int32_t vcvtah_s32_f16 (float16_t __a)
uint32_t vcvtah_u32_f16 (float16_t __a)
float16_t vcvth_f16_s32 (int32_t __a)
float16_t vcvth_f16_u32 (uint32_t __a)
int32_t vcvth_s32_f16 (float16_t __a)
uint32_t vcvth_u32_f16 (float16_t __a)
int32_t vcvtmh_s32_f16 (float16_t __a)
uint32_t vcvtmh_u32_f16 (float16_t __a)
int32_t vcvtnh_s32_f16 (float16_t __a)
uint32_t vcvtnh_u32_f16 (float16_t __a)
int32_t vcvtph_s32_f16 (float16_t __a)
uint32_t vcvtph_u32_f16 (float16_t __a)
float16_t vnegh_f16 (float16_t __a)
float16_t vrndah_f16 (float16_t __a)
float16_t vrndh_f16 (float16_t __a)
float16_t vrndih_f16 (float16_t __a)
float16_t vrndmh_f16 (float16_t __a)
float16_t vrndnh_f16 (float16_t __a)
float16_t vrndph_f16 (float16_t __a)
float16_t vrndxh_f16 (float16_t __a)
float16_t vsqrth_f16 (float16_t __a)

float16_t vaddh_f16 (float16_t __a, float16_t __b)
float16_t vcvth_n_f16_s32 (int32_t __a, const int __b)
float16_t vcvth_n_f16_u32 (uint32_t __a, const int __b)
int32_t vcvth_n_s32_f16 (float16_t __a, const int __b)
uint32_t vcvth_n_u32_f16 (float16_t __a, const int __b)
float16_t vdivh_f16 (float16_t __a, float16_t __b)
float16_t vmaxnmh_f16 (float16_t __a, float16_t __b)
float16_t vminnmh_f16 (float16_t __a, float16_t __b)
float16_t vmulh_f16 (float16_t __a, float16_t __b)
float16_t vsubh_f16 (float16_t __a, float16_t __b)

float16_t vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
float16_t vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)
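A minimal usage sketch follows. It assumes a toolchain whose arm_fp16.h provides the intrinsics listed above and defines float16_t. The header is included only when __ARM_FEATURE_FP16_SCALAR_ARITHMETIC is defined. Otherwise a plain float fallback keeps the code portable. That fallback is an assumption of this sketch and is not part of the ACLE. The names add_h, mul_h and check_half_ops are hypothetical.

```c
#include <assert.h>

#if defined (__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
#include <arm_fp16.h>
typedef float16_t half_t;
static half_t add_h (half_t a, half_t b) { return vaddh_f16 (a, b); }
static half_t mul_h (half_t a, half_t b) { return vmulh_f16 (a, b); }
#else
/* Assumed fallback for targets without the extension: compute in float.
   Unlike the intrinsics, results are not rounded to binary16.  */
typedef float half_t;
static half_t add_h (half_t a, half_t b) { return a + b; }
static half_t mul_h (half_t a, half_t b) { return a * b; }
#endif

/* Small integers and halves are exact in binary16, so both paths agree.  */
static void
check_half_ops (void)
{
  assert (add_h (1.0f, 2.0f) == 3.0f);
  assert (mul_h (1.5f, 2.0f) == 3.0f);
}
```

The check values are chosen to be exactly representable in both formats. The test therefore cannot tell the two paths apart. It only confirms that the wrappers compile and behave sensibly.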


Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config.gcc (extra_headers): Add arm_fp16.h.
* config/arm/arm_fp16.h: New.

From 0c7d4da5a7c8ca9cf3ce2f23072668c4155b35d9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:36:23 +0100
Subject: [PATCH 13/17] [PATCH 13/17][ARM] Add VFP FP16 intrinsics.

2016-05-17  Matthew Wahab  

	* config.gcc (extra_headers): Add arm_fp16.h.
	* config/arm/arm_fp16.h: New.
---
 gcc/config.gcc|   2 +-
 gcc/config/arm/arm_fp16.h | 255 ++
 2 files changed, 256 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/arm/arm_fp16.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 51af122a..e22ff9e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -327,7 +327,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm_fp16.h b/gcc/config/arm/arm_fp16.h
new file mode 100644
index 000..702090a
--- /dev/null
+++ b/gcc/config/arm/arm_fp16.h
@@ -0,0 +1,255 @@
+/* ARM FP16 intrinsics include file.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_FP16_H
+#define _GCC_ARM_FP16_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+
+/* Intrinsics for FP16 instructions.  *

[PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support.

2016-05-17 Matthew Wahab


Support for using the half-precision floating point operations added by
the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
to the ACLE for the extension.

This patch adds tests to check the compiler's treatment of the ACLE
macros and the code generated for the new intrinsics. It does not
include the executable tests for the
gcc.target/aarch64/advsimd-intrinsics testsuite. Those are added later
in the patch series.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
support.

From fe0cac871efe08d491a3b4ac027c29db1a72d15c Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:38:02 +0100
Subject: [PATCH 15/17] [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16
 support.

testsuite/
2016-05-17  Matthew Wahab  

	* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
	* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
	support.
---
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c | 490 +
 .../gcc.target/arm/armv8_2-fp16-scalar-1.c | 203 +
 .../gcc.target/arm/armv8_2-fp16-scalar-2.c |  71 +++
 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c   |  13 +
 4 files changed, 777 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
new file mode 100644
index 000..576031e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
@@ -0,0 +1,490 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+/* Test instructions generated for the FP16 vector intrinsics.  */
+
+#include 
+
+#define MSTRCAT(L, str)	L##str
+
+#define UNOP_TEST(insn)\
+  float16x4_t	\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a)	\
+  {		\
+return MSTRCAT (insn, _f16) (a);		\
+  }		\
+  float16x8_t	\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a)	\
+  {		\
+return MSTRCAT (insn, q_f16) (a);		\
+  }
+
+#define BINOP_TEST(insn)	\
+  float16x4_t			\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b)	\
+  {\
+return MSTRCAT (insn, _f16) (a, b);\
+  }\
+  float16x8_t			\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b)	\
+  {\
+return MSTRCAT (insn, q_f16) (a, b);			\
+  }
+
+#define BINOP_LANE_TEST(insn, I)	\
+  float16x4_t\
+  MSTRCAT (test_##insn##_lane, _16x4) (float16x4_t a, float16x4_t b)	\
+  {	\
+return MSTRCAT (insn, _lane_f16) (a, b, I);\
+  }	\
+  float16x8_t\
+  MSTRCAT (test_##insn##_lane, _16x8) (float16x8_t a, float16x4_t b)	\
+  {	\
+return MSTRCAT (insn, q_lane_f16) (a, b, I);			\
+  }
+
+#define BINOP_LANEQ_TEST(insn, I)	\
+  float16x4_t\
+  MSTRCAT (test_##insn##_laneq, _16x4) (float16x4_t a, float16x8_t b)	\
+  {	\
+return MSTRCAT (insn, _laneq_f16) (a, b, I);			\
+  }	\
+  float16x8_t\
+  MSTRCAT (test_##insn##_laneq, _16x8) (float16x8_t a, float16x8_t b)	\
+  {	\
+return MSTRCAT (insn, q_laneq_f16) (a, b, I);			\
+  }	\
+
+#define BINOP_N_TEST(insn)	\
+  float16x4_t			\
+  MSTRCAT (test_##insn##_n, _16x4) (float16x4_t a, float16_t b)	\
+  {\
+return MSTRCAT (insn, _n_f16) (a, b);			\
+  }\
+  float16x8_t			\
+  MSTRCAT (test_##insn##_n, _16x8) (float16x8_t a, float16_t b)	\
+  {\
+return MSTRCAT (insn, q_n_f16) (a, b);			\
+  }
+
+#define TERNOP_TEST(insn)		\
+  float16_t\
+  MSTRCAT (test_##insn, _16) (float16_t a, float16_t b, float16_t c)	\
+  {	\
+return MSTRCAT (insn, h_f16) (a, b, c);\
+  }	\
+  float16x4_t\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b,		\
+			   float16x4_t c)\
+  {	\
+return MSTRCAT (insn, _f16) (a, b, c);\
+  }	\
+  float16x8_t\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b,		\
+			   float16x8_t c)\
+  {	\
+return MSTRCAT (insn, q_f16) (a, b, c);\
+  }
+
+#define VCMP1_TEST(insn)			\
+  uint16x4_t	\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a)	\
+  {		\
+return
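The helper macros in this test build function names by token pasting. For instance, UNOP_TEST (vabs) would define test_vabs_16x4 calling vabs_f16 and test_vabs_16x8 calling vabsq_f16. Below is a host-portable sketch of the same naming scheme. It uses hypothetical scalar stand-ins, vdemo_f16 and vdemoq_f16, in place of the NEON intrinsics, so it compiles anywhere.

```c
#include <assert.h>

#define MSTRCAT(L, str) L##str

/* Hypothetical scalar stand-ins for an intrinsic pair such as
   vabs_f16/vabsq_f16.  */
static float vdemo_f16 (float a) { return a < 0 ? -a : a; }
static float vdemoq_f16 (float a) { return a < 0 ? -a : a; }

/* Cut-down UNOP_TEST: same naming scheme as in the test, scalar types.  */
#define UNOP_TEST(insn)                         \
  float MSTRCAT (test_##insn, _16x4) (float a)  \
  {                                             \
    return MSTRCAT (insn, _f16) (a);            \
  }                                             \
  float MSTRCAT (test_##insn, _16x8) (float a)  \
  {                                             \
    return MSTRCAT (insn, q_f16) (a);           \
  }

/* Defines test_vdemo_16x4 (calling vdemo_f16) and test_vdemo_16x8
   (calling vdemoq_f16).  */
UNOP_TEST (vdemo)
```

Expansion happens in two steps. The `##` inside UNOP_TEST first produces test_vdemo. The outer MSTRCAT then appends the _16x4 or _16x8 suffix during rescanning.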

[PATCH 14/17][ARM] Add NEON FP16 intrinsics.

2016-05-17 Matthew Wahab

 __b)
float16x8_t vminnmq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vmul_f16 (float16x4_t __a, float16x4_t __b)
float16x4_t vmul_n_f16 (float16x4_t __a, float16_t __b)
float16x8_t vmulq_f16 (float16x8_t __a, float16x8_t __b)
float16x8_t vmulq_n_f16 (float16x8_t __a, float16_t __b)
float16x4_t vpadd_f16 (float16x4_t __a, float16x4_t __b)
float16x4_t vpmax_f16 (float16x4_t __a, float16x4_t __b)
float16x4_t vpmin_f16 (float16x4_t __a, float16x4_t __b)
float16x4_t vrecps_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vrsqrts_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vrsqrtsq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vsub_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vsubq_f16 (float16x8_t __a, float16x8_t __b)

float16x4_t vfma_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
float16x8_t vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
float16x4_t vfms_f16 (float16x4_t __a, float16x4_t __b, float16x4_t __c)
float16x8_t vfmsq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
float16x4_t vmul_lane_f16 (float16x4_t __a, float16x4_t __b, const int __c)
float16x8_t vmulq_lane_f16 (float16x8_t __a, float16x4_t __b, const int __c)


Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm_neon.h: Include arm_fp16.h.
(vabd_f16): New.
(vabdq_f16): New.
(vabs_f16): New.
(vabsq_f16): New.
(vadd_f16): New.
(vaddq_f16): New.
(vcage_f16): New.
(vcageq_f16): New.
(vcagt_f16): New.
(vcagtq_f16): New.
(vcale_f16): New.
(vcaleq_f16): New.
(vcalt_f16): New.
(vcaltq_f16): New.
(vceq_f16): New.
(vceqq_f16): New.
(vceqz_f16): New.
(vceqzq_f16): New.
(vcge_f16): New.
(vcgeq_f16): New.
(vcgez_f16): New.
(vcgezq_f16): New.
(vcgt_f16): New.
(vcgtq_f16): New.
(vcgtz_f16): New.
(vcgtzq_f16): New.
(vcle_f16): New.
(vcleq_f16): New.
(vclez_f16): New.
(vclezq_f16): New.
(vclt_f16): New.
(vcltq_f16): New.
(vcltz_f16): New.
(vcltzq_f16): New.
(vcvt_f16_s16): New.
(vcvt_f16_u16): New.
(vcvt_s16_f16): New.
(vcvt_u16_f16): New.
(vcvtq_f16_s16): New.
(vcvtq_f16_u16): New.
(vcvtq_s16_f16): New.
(vcvtq_u16_f16): New.
(vcvta_s16_f16): New.
(vcvta_u16_f16): New.
(vcvtaq_s16_f16): New.
(vcvtaq_u16_f16): New.
(vcvtm_s16_f16): New.
(vcvtm_u16_f16): New.
(vcvtmq_s16_f16): New.
(vcvtmq_u16_f16): New.
(vcvtn_s16_f16): New.
(vcvtn_u16_f16): New.
(vcvtnq_s16_f16): New.
(vcvtnq_u16_f16): New.
(vcvtp_s16_f16): New.
(vcvtp_u16_f16): New.
(vcvtpq_s16_f16): New.
(vcvtpq_u16_f16): New.
(vcvt_n_f16_s16): New.
(vcvt_n_f16_u16): New.
(vcvtq_n_f16_s16): New.
(vcvtq_n_f16_u16): New.
(vcvt_n_s16_f16): New.
(vcvt_n_u16_f16): New.
(vcvtq_n_s16_f16): New.
(vcvtq_n_u16_f16): New.
(vfma_f16): New.
(vfmaq_f16): New.
(vfms_f16): New.
(vfmsq_f16): New.
(vmax_f16): New.
(vmaxq_f16): New.
(vmaxnm_f16): New.
(vmaxnmq_f16): New.
(vmin_f16): New.
(vminq_f16): New.
(vminnm_f16): New.
(vminnmq_f16): New.
(vmul_f16): New.
(vmul_lane_f16): New.
(vmul_n_f16): New.
(vmulq_f16): New.
(vmulq_lane_f16): New.
(vmulq_n_f16): New.
(vneg_f16): New.
(vnegq_f16): New.
(vpadd_f16): New.
(vpmax_f16): New.
(vpmin_f16): New.
(vrecpe_f16): New.
(vrecpeq_f16): New.
(vrnd_f16): New.
(vrndq_f16): New.
(vrnda_f16): New.
(vrndaq_f16): New.
(vrndm_f16): New.
(vrndmq_f16): New.
(vrndn_f16): New.
(vrndnq_f16): New.
(vrndp_f16): New.
(vrndpq_f16): New.
(vrndx_f16): New.
(vrndxq_f16): New.
(vsqrte_f16): New.
(vsqrteq_f16): New.
(vrecps_f16): New.
(vrecpsq_f16): New.
(vrsqrts_f16): New.
(vrsqrtsq_f16): New.
(vsub_f16): New.
(vsubq_f16): New.

>From 3f8692f5849049af0db05d1cc3b4cda80ae131e0 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:36:34 +0100
Subject: [PATCH 14/17] [PATCH 14/17][ARM] Add NEON FP16 intrinsics.

2016-05-17  Matthew Wahab  

	* config/arm/arm_neon.h (vabd_f16): New.
	(vabdq_f16): New.
	(vabs_f16): New.
	(vabsq_f16): New.
	(vadd_f16): New.
	(vaddq_f16): New.
	(vcage_f16): New.
	(vcageq_f16):

[PATCH 12/17][ARM] Add builtins for NEON FP16 intrinsics.

2016-05-17 Thread Matthew Wahab


This patch adds the builtins data for the ACLE intrinsics introduced to
support the NEON instructions of the ARMv8.2-A FP16 extension.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
variants).
(vmulf): New (v8hf, v4hf variants).
(vfma): New (v8hf, v4hf variants).
(vfms): New (v8hf, v4hf variants).
(vsub): New (v8hf, v4hf variants).
(vcage): New (v8hf, v4hf variants).
(vcagt): New (v8hf, v4hf variants).
(vcale): New (v8hf, v4hf variants).
(vcalt): New (v8hf, v4hf variants).
(vceq): New (v8hf, v4hf variants).
(vcgt): New (v8hf, v4hf variants).
(vcge): New (v8hf, v4hf variants).
(vcle): New (v8hf, v4hf variants).
(vclt): New (v8hf, v4hf variants).
(vceqz): New (v8hf, v4hf variants).
(vcgez): New (v8hf, v4hf variants).
(vcgtz): New (v8hf, v4hf variants).
(vcltz): New (v8hf, v4hf variants).
(vclez): New (v8hf, v4hf variants).
(vabd): New (v8hf, v4hf variants).
(vmaxf): New (v8hf, v4hf variants).
(vmaxnm): New (v8hf, v4hf variants).
(vminf): New (v8hf, v4hf variants).
(vminnm): New (v8hf, v4hf variants).
(vpmaxf): New (v4hf variant).
(vpminf): New (v4hf variant).
(vpadd): New (v4hf variant).
(vrecps): New (v8hf, v4hf variants).
(vrsqrts): New (v8hf, v4hf variants).
(vabs): New (v8hf, v4hf variants).
(vneg): New (v8hf, v4hf variants).
(vrecpe): New (v8hf, v4hf variants).
(vrnd): New (v8hf, v4hf variants).
(vrnda): New (v8hf, v4hf variants).
(vrndm): New (v8hf, v4hf variants).
(vrndn): New (v8hf, v4hf variants).
(vrndp): New (v8hf, v4hf variants).
(vrndx): New (v8hf, v4hf variants).
(vsqrte): New (v8hf, v4hf variants).
(vdup_n): New (v8hf, v4hf variants).
(vdup_lane): New (v8hf, v4hf variants).
(vmul_lane): Add v4hf and v8hf variants.
(vmul_n): Add v4hf and v8hf variants.
(vmul_n): Add v4hf and v8hf variants.
(vext): New (v8hf, v4hf variants).
(vcvts): New (v8hi, v4hi variants).
(vcvts): New (v8hf, v4hf variants).
(vcvtu): New (v8hi, v4hi variants).
(vcvtu): New (v8hf, v4hf variants).
(vcvts_n): New (v8hf, v4hf variants).
(vcvtu_n): New (v8hi, v4hi variants).
(vcvts_n): New (v8hi, v4hi variants).
(vcvtu_n): New (v8hf, v4hf variants).
(vbsl): New (v8hf, v4hf variants).
(vcvtas): New (v8hf, v4hf variants).
(vcvtau): New (v8hf, v4hf variants).
(vcvtms): New (v8hf, v4hf variants).
(vcvtmu): New (v8hf, v4hf variants).
(vcvtns): New (v8hf, v4hf variants).
(vcvtnu): New (v8hf, v4hf variants).
(vcvtps): New (v8hf, v4hf variants).
(vcvtpu): New (v8hf, v4hf variants).

>From ca740dee578be4c67afeec106feaa1633daff63b Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:36:41 +0100
Subject: [PATCH 12/17] [PATCH 12/17][ARM] Add builtins for NEON FP16
 intrinsics.

2016-05-17  Matthew Wahab  

	* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
	variants).
	(vmulf): New (v8hf, v4hf variants).
	(vfma): New (v8hf, v4hf variants).
	(vfms): New (v8hf, v4hf variants).
	(vsub): New (v8hf, v4hf variants).
	(vcage): New (v8hf, v4hf variants).
	(vcagt): New (v8hf, v4hf variants).
	(vcale): New (v8hf, v4hf variants).
	(vcalt): New (v8hf, v4hf variants).
	(vceq): New (v8hf, v4hf variants).
	(vcgt): New (v8hf, v4hf variants).
	(vcge): New (v8hf, v4hf variants).
	(vcle): New (v8hf, v4hf variants).
	(vclt): New (v8hf, v4hf variants).
	(vceqz): New (v8hf, v4hf variants).
	(vcgez): New (v8hf, v4hf variants).
	(vcgtz): New (v8hf, v4hf variants).
	(vcltz): New (v8hf, v4hf variants).
	(vclez): New (v8hf, v4hf variants).
	(vabd): New (v8hf, v4hf variants).
	(vmaxf): New (v8hf, v4hf variants).
	(vmaxnm): New (v8hf, v4hf variants).
	(vminf): New (v8hf, v4hf variants).
	(vminnm): New (v8hf, v4hf variants).
	(vpmaxf): New (v4hf variant).
	(vpminf): New (v4hf variant).
	(vpadd): New (v4hf variant).
	(vrecps): New (v8hf, v4hf variants).
	(vrsqrts): New (v8hf, v4hf variants).
	(vabs): New (v8hf, v4hf variants).
	(vneg): New (v8hf, v4hf variants).
	(vrecpe): New (v8hf, v4hf variants).
	(vrnd): New (v8hf, v4hf variants).
	(vrnda): New (v8hf, v4hf variants).
	(vrndm): New (v8hf, v4hf variants).
	(vrndn): New (v8hf, v4hf variants).
	(vrndp): New (v8hf, v4hf variants).
	(vrndx): New (v8hf, v4hf variants).
	(vsqrte): New (v8hf, v4hf variants).
	(vdup_n): New (v8hf, v4hf variants).
	(vdup_lane): New (v8hf, v4hf variants).
	(vmul_lane): Add v4hf and v8hf variants.
	(vmul_n): Add v4hf and v8hf variants.
	(vmu

[PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17 Thread Matthew Wahab


The ARMv8.2-A FP16 extension adds a number of arithmetic instructions
to the VFP instruction set. This patch adds support for these
instructions to the ARM backend.

In most cases the instructions are added using non-standard pattern
names. This is to force operations on __fp16 values to be done, by
conversion, using the single-precision instructions. The exceptions are
the precision preserving operations ABS and NEG.
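A minimal sketch, in portable C, of why ABS and NEG can be the exceptions: on an IEEE binary16 encoding they only change the sign bit, so no conversion to single precision and no rounding is involved. The sketch works on raw 16-bit encodings held in a uint16_t; the helper names are hypothetical and are not the patterns added by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers on the raw IEEE binary16 encoding:
   bit 15 is the sign, bits 14-10 the exponent, bits 9-0 the fraction.  */
#define FP16_SIGN_MASK 0x8000u

/* NEG: flip the sign bit.  Exact for every input, including
   infinities, NaNs and subnormals.  */
static uint16_t
fp16_neg_bits (uint16_t h)
{
  return (uint16_t) (h ^ FP16_SIGN_MASK);
}

/* ABS: clear the sign bit.  Also exact, so there is no need to widen
   to single precision and round back.  */
static uint16_t
fp16_abs_bits (uint16_t h)
{
  return (uint16_t) (h & 0x7fffu);
}
```

For example, 0x3c00 (1.0) negates to 0xbc00 (-1.0), and 0xc000 (-2.0) has absolute value 0x4000 (2.0).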

The compiler can use these instruction patterns to optimize
half-precision operations. Because the pattern names are non-standard,
half-precision instructions are only generated from the intrinsics
added by this patch series, so existing code will not be affected.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/iterators.md (Code iterators): Fix some white-space
in the comments.
(GLTE): New.
(ABSNEG): New.
(FCVT): Moved from vfp.md.
(VCVT_HF_US_N): New.
(VCVT_SI_US_N): New.
(VCVT_HF_US): New.
(VCVTH_US): New.
(FP16_RND): New.
(absneg_str): New.
(FCVTI32typename): Moved from vfp.md.
(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S and
UNSPEC_VCVTH_U.
(vcvth_op): New.
(fp16_rnd_str): New.
(fp16_rnd_insn): New.
* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
(UNSPEC_VCVT_HF_U_N): New.
(UNSPEC_VCVT_SI_S_N): New.
(UNSPEC_VCVT_SI_U_N): New.
(UNSPEC_VCVTH_S): New.
(UNSPEC_VCVTH_U): New.
(UNSPEC_VCVTA_S): New.
(UNSPEC_VCVTA_U): New.
(UNSPEC_VCVTM_S): New.
(UNSPEC_VCVTM_U): New.
(UNSPEC_VCVTN_S): New.
(UNSPEC_VCVTN_U): New.
(UNSPEC_VCVTP_S): New.
(UNSPEC_VCVTP_U): New.
(UNSPEC_VCVTP_S): New.
(UNSPEC_VCVTP_U): New.
(UNSPEC_VRND): New.
(UNSPEC_VRNDA): New.
(UNSPEC_VRNDI): New.
(UNSPEC_VRNDM): New.
(UNSPEC_VRNDN): New.
(UNSPEC_VRNDP): New.
(UNSPEC_VRNDX): New.
* config/arm/vfp.md (hf2): New.
(neon_vhf): New.
(neon_vhf): New.
(neon_vrndihf): New.
(addhf3_fp16): New.
(neon_vaddhf): New.
(subhf3_fp16): New.
(neon_vsubhf): New.
(divhf3_fp16): New.
(neon_vdivhf): New.
(mulhf3_fp16): New.
(neon_vmulhf): New.
(*mulsf3neghf_vfp): New.
(*negmulhf3_vfp): New.
(*mulsf3addhf_vfp): New.
(*mulhf3subhf_vfp): New.
(*mulhf3neghfaddhf_vfp): New.
(*mulhf3neghfsubhf_vfp): New.
(fmahf4_fp16): New.
(neon_vfmahf): New.
(fmsubhf4_fp16): New.
(neon_vfmshf): New.
(*fnmsubhf4): New.
(*fnmaddhf4): New.
(neon_vsqrthf): New.
(neon_vrsqrtshf): New.
(FCVT): Move to iterators.md.
(FCVTI32typename): Likewise.
(neon_vcvthhf): New.
(neon_vcvthsi): New.
(neon_vcvth_nhf_unspec): New.
(neon_vcvth_nhf): New.
(neon_vcvth_nsi_unspec): New.
(neon_vcvth_nsi): New.
(neon_vcvthsi): New.
(neon_hf): New.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
* gcc.target/arm/armv8_2-fp16-conv-1.c: New.

>From 3e773f2ec85ea66d0be0e3a97ea52826156c00f2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 14:49:17 +0100
Subject: [PATCH 08/17] [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17  Matthew Wahab  

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in the comments.
	(GLTE): New.
	(ABSNEG): New.
	(FCVT): Moved from vfp.md.
	(VCVT_HF_US_N): New.
	(VCVT_SI_US_N): New.
	(VCVT_HF_US): New.
	(VCVTH_US): New.
	(FP16_RND): New.
	(absneg_str): New.
	(FCVTI32typename): Moved from vfp.md.
	(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
	UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
	UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
	UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S_N,
	UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
	(vcvth_op): New.
	(fp16_rnd_str): New.
	(fp16_rnd_insn): New.
	* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
	(UNSPEC_VCVT_HF_U_N): New.
	(UNSPEC_VCVT_SI_S_N): New.
	(UNSPEC_VCVT_SI_U_N): New.
	(UNSPEC_VCVTH_S): New.
	(UNSPEC_VCVTH_U): New.
	(UNSPEC_VCVTA_S): New.
	(UNSPEC_VCVTA_U): New.
	(UNSPEC_VCVTM_S): New.
	(UNSPEC_VCVTM_U): New.
	(UNSPEC_VCVTN_S): New.
	(UNSPEC_VCVTN_U): New.
	(UNSPEC_VCVTP_S): New.
	(UNSPEC_VCVTP_U): New.
	(UNSPEC_VCVTP_S):

[PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics.

2016-05-17 Thread Matthew Wahab


The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
require that intrinsics for scalar floating-point (VFP) instructions
are available under different conditions from those for the NEON
intrinsics.

This patch adds the support code and builtins data for the new VFP
intrinsics. Because of the similarities between the scalar and NEON
builtins, the support code for the scalar builtins follows the code for
the NEON builtins. The declarations for the VFP builtins are also added
in this patch since the support code expects non-empty tables.
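A self-contained sketch of the table layout described above, under stated assumptions: the patch #includes arm_vfp_builtins.def and arm_neon_builtins.def, whereas this sketch uses X-macro lists so that it fits in one file. All names here (VFP_BUILTINS, init_one_builtin and so on) are hypothetical. Each group's first function code is one past its base marker, and a shared per-entry helper assigns the codes, much as arm_init_vfp_builtins reuses arm_init_neon_builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical builtin lists standing in for the .def files.  Each list
   is expanded twice, into the enum and into a data table, so the two
   always stay in step.  */
#define VFP_BUILTINS(X) X (vabshf) X (vneghf) X (vsqrthf)
#define NEON_BUILTINS(X) X (vaddv4hf) X (vaddv8hf)

#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

enum builtin_code
{
  BUILTIN_VFP_BASE,
#define ENUM_ENTRY(name) BUILTIN_##name,
  VFP_BUILTINS (ENUM_ENTRY)
  BUILTIN_NEON_BASE,
  NEON_BUILTINS (ENUM_ENTRY)
#undef ENUM_ENTRY
  BUILTIN_MAX
};

/* First function code of each group: one past its base marker.  */
#define VFP_PATTERN_START (BUILTIN_VFP_BASE + 1)
#define NEON_PATTERN_START (BUILTIN_NEON_BASE + 1)

typedef struct
{
  const char *name;
  unsigned int fcode;   /* Filled in by init_one_builtin.  */
} builtin_datum;

#define DATA_ENTRY(name) { #name, 0 },
static builtin_datum vfp_data[] = { VFP_BUILTINS (DATA_ENTRY) };
static builtin_datum neon_data[] = { NEON_BUILTINS (DATA_ENTRY) };
#undef DATA_ENTRY

/* Shared per-entry setup.  The real helper also builds and registers
   the function type; this one only records the function code.  */
static void
init_one_builtin (unsigned int fcode, builtin_datum *d)
{
  d->fcode = fcode;
}

/* Give consecutive function codes to every entry of TABLE.  */
static void
init_builtin_table (builtin_datum *table, size_t n, unsigned int start)
{
  for (size_t i = 0; i < n; i++)
    init_one_builtin (start + (unsigned int) i, &table[i]);
}

static void
init_all_builtins (void)
{
  init_builtin_table (vfp_data, ARRAY_SIZE (vfp_data), VFP_PATTERN_START);
  init_builtin_table (neon_data, ARRAY_SIZE (neon_data), NEON_PATTERN_START);

  /* The NEON group is last in the enum, so the earlier definition
     BUILTIN_MAX - ARRAY_SIZE (neon_data) gives the same start.  */
  assert ((size_t) NEON_PATTERN_START
	  == (size_t) BUILTIN_MAX - ARRAY_SIZE (neon_data));
}
```

Because the enum and the tables come from the same list, a table index plus its group's start always recovers the enum value.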

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm-builtins.c (hf_UP): New.
(si_UP): New.
(vfp_builtin_data): New.  Update comment.
(enum arm_builtins): Include arm_vfp_builtins.def.
(ARM_BUILTIN_VFP_PATTERN_START): New.
(arm_init_vfp_builtins): New.
(arm_init_builtins): Add arm_init_vfp_builtins.
(arm_expand_vfp_builtin): New.
(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
long line.
* config/arm/arm_vfp_builtins.def: New file.
* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
(arm-builtins.o): Likewise.

>From d1f2b10a2e672b1dc886d8d1efb136d970f967f1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:33:14 +0100
Subject: [PATCH 11/17] [PATCH 11/17][ARM] Add builtins for VFP FP16
 intrinsics.

2016-05-17  Matthew Wahab  

	* config/arm/arm-builtins.c (hf_UP): New.
	(si_UP): New.
	(vfp_builtin_data): New.  Update comment.
	(arm_init_vfp_builtins): New.
	(arm_init_builtins): Add arm_init_vfp_builtins.
	(arm_expand_vfp_builtin): New.
	(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
	long line.
	* config/arm/arm_vfp_builtins.def: New file.
	* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
	(arm-builtins.o): Likewise.
---
 gcc/config/arm/arm-builtins.c   | 75 +
 gcc/config/arm/arm_vfp_builtins.def | 56 +++
 gcc/config/arm/t-arm|  4 +-
 3 files changed, 126 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/arm/arm_vfp_builtins.def

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 5a22b91..58c68a6 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -190,6 +190,8 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ti_UP	 TImode
 #define ei_UP	 EImode
 #define oi_UP	 OImode
+#define hf_UP	 HFmode
+#define si_UP	 SImode
 
 #define UP(X) X##_UP
 
@@ -239,12 +241,22 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def.
-   The mode entries in the following table correspond to the "key" type of the
-   instruction variant, i.e. equivalent to that which would be specified after
-   the assembler mnemonic, which usually refers to the last vector operand.
-   The modes listed per instruction should be the same as those defined for
-   that instruction's pattern in neon.md.  */
+/* The NEON builtin data can be found in arm_neon_builtins.def and
+   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The entries in arm_vfp_builtins.def require
+   TARGET_VFP to be true.  The feature tests are checked when the builtins are
+   expanded.
+
+   The mode entries in the following table correspond to
+   the "key" type of the instruction variant, i.e. equivalent to that which
+   would be specified after the assembler mnemonic, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern in neon.md.  */
+
+static neon_builtin_datum vfp_builtin_data[] =
+{
+#include "arm_vfp_builtins.def"
+};
 
 static neon_builtin_datum neon_builtin_data[] =
 {
@@ -534,6 +546,10 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_VFP_BASE,
+
+#include "arm_vfp_builtins.def"
+
   ARM_BUILTIN_NEON_BASE,
   ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
 
@@ -542,6 +558,9 @@ enum arm_builtins
   ARM_BUILTIN_MAX
 };
 
+#define ARM_BUILTIN_VFP_PATTERN_START \
+  (ARM_BUILTIN_VFP_BASE + 1)
+
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
@@ -1033,6 +1052,20 @@ arm_init_neon_builtins (void)
 }
 }
 
+/* Set up all the scalar floating point builtins.  */
+
+static void
+arm_init_vfp_builtins (void)
+{
+  unsigned int i, fcode = ARM_BUILTIN_VFP_PATTERN_START;
+
+  for (i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++)
+    {
+      neon_builtin_datum *d = &vfp_builtin_data[i];
+      arm_init_neon_builtin (fcode, d);
+    }
+}
+

[PATCH 10/17][ARM] Refactor support code for NEON builtins.

2016-05-17 Thread Matthew Wahab


The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
require that intrinsics for scalar (VFP) instructions are available
under different conditions from those for the NEON intrinsics. To
support this, changes to the builtins support code are needed to enable
the scalar intrinsics to be initialized and expanded independently of
the NEON intrinsics.

This patch prepares for this by refactoring some of the builtin support
code so that it can be used for both the scalar and the NEON intrinsics.
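The expansion side of this split can be sketched the same way. This is a hypothetical model, not the arm-builtins.c code: a shared expander works on a table index, and separate VFP and NEON wrappers each check their own feature flag before delegating, matching the comment added in this series that the feature tests are checked when the builtins are expanded. have_vfp and have_neon stand in for TARGET_VFP and TARGET_NEON, and -1 stands in for the error the compiler would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical function-code layout: a VFP group (three builtins)
   followed by a NEON group (two builtins).  */
enum
{
  VFP_BASE = 0,
  VFP_PATTERN_START = VFP_BASE + 1,
  NEON_BASE = VFP_PATTERN_START + 3,
  NEON_PATTERN_START = NEON_BASE + 1,
  BUILTIN_MAX = NEON_PATTERN_START + 2
};

/* Hypothetical stand-ins for TARGET_VFP and TARGET_NEON.  */
static bool have_vfp = true;
static bool have_neon = false;

/* Shared expander, in the spirit of arm_expand_neon_builtin_1: it only
   sees an index into whichever table the caller selected.  Here it
   simply returns that index.  */
static int
expand_builtin_1 (int table_index)
{
  assert (table_index >= 0);
  return table_index;
}

/* Each wrapper checks its own feature before delegating.  */
static int
expand_vfp_builtin (int fcode)
{
  if (!have_vfp)
    return -1;
  return expand_builtin_1 (fcode - VFP_PATTERN_START);
}

static int
expand_neon_builtin (int fcode)
{
  if (!have_neon)
    return -1;
  return expand_builtin_1 (fcode - NEON_PATTERN_START);
}

/* Top-level dispatch on the function-code range.  */
static int
expand_builtin (int fcode)
{
  assert (fcode >= VFP_PATTERN_START && fcode < BUILTIN_MAX
	  && fcode != NEON_BASE);
  if (fcode >= NEON_PATTERN_START)
    return expand_neon_builtin (fcode);
  return expand_vfp_builtin (fcode);
}
```

With only the VFP flag set, a VFP builtin expands while a NEON builtin is rejected, which is the independence the refactoring is meant to allow.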

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm-builtins.c (ARM_BUILTIN_NEON_PATTERN_START):
Change offset calculation.
(arm_init_neon_builtin): New.
(arm_init_builtins): Move body of a loop to the standalone
function arm_init_neon_builtin.
(arm_expand_neon_builtin_1): New.  Update comment.  Function body
moved from arm_expand_neon_builtin with some white-space fixes.
(arm_expand_neon_builtin): Move code into the standalone function
arm_expand_neon_builtin_1.

>From 01aee04d2dc6d2d089407ab14892164417f8407e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:36:09 +0100
Subject: [PATCH 10/17] [PATCH 10/17][ARM] Refactor support code for NEON
 builtins.

2016-05-17  Matthew Wahab  

	* config/arm/arm-builtins.c (arm_init_neon_builtin): New.
	(arm_init_builtins): Move body of a loop to the standalone
	function arm_init_neon_builtin.
	(arm_expand_neon_builtin_1): New.  Update comment.  Function body
	moved from arm_expand_neon_builtin with some white-space fixes.
	(arm_expand_neon_builtin): Move code into the standalone function
	arm_expand_neon_builtin_1.
---
 gcc/config/arm/arm-builtins.c | 292 +++---
 1 file changed, 158 insertions(+), 134 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 90fb40f..5a22b91 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -543,7 +543,7 @@ enum arm_builtins
 };
 
 #define ARM_BUILTIN_NEON_PATTERN_START \
-(ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
+  (ARM_BUILTIN_NEON_BASE + 1)
 
 #undef CF
 #undef VAR1
@@ -895,6 +895,110 @@ arm_init_simd_builtin_scalar_types (void)
 	 "__builtin_neon_uti");
 }
 
+/* Set up a NEON builtin.  */
+
+static void
+arm_init_neon_builtin (unsigned int fcode,
+		       neon_builtin_datum *d)
+{
+  bool print_type_signature_p = false;
+  char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
+  char namebuf[60];
+  tree ftype = NULL;
+  tree fndecl = NULL;
+
+  d->fcode = fcode;
+
+  /* We must track two variables here.  op_num is
+     the operand number as in the RTL pattern.  This is
+     required to access the mode (e.g. V4SF mode) of the
+     argument, from which the base type can be derived.
+     arg_num is an index in to the qualifiers data, which
+     gives qualifiers to the type (e.g. const unsigned).
+     The reason these two variables may differ by one is the
+     void return type.  While all return types take the 0th entry
+     in the qualifiers array, there is no operand for them in the
+     RTL pattern.  */
+  int op_num = insn_data[d->code].n_operands - 1;
+  int arg_num = d->qualifiers[0] & qualifier_void
+    ? op_num + 1
+    : op_num;
+  tree return_type = void_type_node, args = void_list_node;
+  tree eltype;
+
+  /* Build a function type directly from the insn_data for this
+     builtin.  The build_function_type () function takes care of
+     removing duplicates for us.  */
+  for (; op_num >= 0; arg_num--, op_num--)
+    {
+      machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
+      enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
+
+      if (qualifiers & qualifier_unsigned)
+	{
+	  type_signature[arg_num] = 'u';
+	  print_type_signature_p = true;
+	}
+      else if (qualifiers & qualifier_poly)
+	{
+	  type_signature[arg_num] = 'p';
+	  print_type_signature_p = true;
+	}
+      else
+	type_signature[arg_num] = 's';
+
+      /* Skip an internal operand for vget_{low, high}.  */
+      if (qualifiers & qualifier_internal)
+	continue;
+
+      /* Some builtins have different user-facing types
+	 for certain arguments, encoded in d->mode.  */
+      if (qualifiers & qualifier_map_mode)
+	op_mode = d->mode;
+
+      /* For pointers, we want a pointer to the basic type
+	 of the vector.  */
+      if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
+	op_mode = GET_MODE_INNER (op_mode);
+
+      eltype = arm_simd_builtin_type
+	(op_mode,
+	 (qualifiers & qualifier_unsigned) != 0,
+	 (qualifiers & qualifier_poly) != 0);
+      gcc_assert (eltype != NULL);
+
+      /* Add qualif

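The op_num/arg_num comment in arm_init_neon_builtin can be checked with a small model. This hypothetical function, not part of the patch, performs the same backwards walk and records which RTL operand each qualifier slot describes; QUAL_VOID stands in for qualifier_void.

```c
#include <assert.h>

/* Hypothetical stand-in for the qualifier_void bit.  */
enum { QUAL_NONE = 0, QUAL_VOID = 1 };

/* Record in OP_FOR_ARG[i] the RTL operand number described by qualifier
   slot i, or -1 if the slot has no operand (a void return type).
   N_OPERANDS is the operand count of the RTL pattern and RET_QUALS the
   qualifiers of slot 0.  OP_FOR_ARG needs room for N_OPERANDS + 1
   entries.  Returns the number of qualifier slots used.  */
static int
map_args_to_operands (int n_operands, int ret_quals, int *op_for_arg)
{
  int op_num = n_operands - 1;
  int arg_num = (ret_quals & QUAL_VOID) ? op_num + 1 : op_num;
  int n_slots = arg_num + 1;

  assert (n_operands > 0);

  /* A void return takes slot 0 but has no operand in the pattern.  */
  if (ret_quals & QUAL_VOID)
    op_for_arg[0] = -1;

  /* Walk both indices backwards together, as the loop in the patch
     does, so they stay exactly one apart for void builtins.  */
  for (; op_num >= 0; arg_num--, op_num--)
    op_for_arg[arg_num] = op_num;

  return n_slots;
}
```

For a three-operand pattern with a non-void return the mapping is the identity; for a two-operand pattern with a void return, slots 1 and 2 describe operands 0 and 1.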
[PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

2016-05-17 Thread Matthew Wahab


The ARMv8.2-A FP16 extension adds a number of arithmetic instructions
to the NEON instruction set. This patch adds support for these
instructions to the ARM backend.

As with the VFP FP16 arithmetic instructions, operations on __fp16
values are done by conversion to single-precision. Any new optimization
supported by the instruction descriptions can only apply to code
generated using intrinsics added in this patch series.

A number of the instructions are modelled as two variants, one using
an UNSPEC and the other using RTL operations, with the variant used
chosen by the -funsafe-math-optimizations flag. This follows the
single-precision instructions and is because the half-precision
operations have the same conditions and restrictions on their use in
optimizations (when they are enabled).
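The patch does not spell out the restriction, so the following is only a general illustration, in single precision, of why value-changing floating-point rewrites are gated behind -funsafe-math-optimizations: an operation exposed as ordinary RTL can be transformed by the optimizers (for example reassociated, which that flag permits), and reassociation can change the rounded result, whereas an UNSPEC is opaque to such rewrites. The function names are hypothetical.

```c
#include <assert.h>

/* (a + b) + c, with the intermediate forced to single precision by
   storing it to a volatile float (relevant where FLT_EVAL_METHOD != 0).  */
static float
sum_left (float a, float b, float c)
{
  volatile float t = a + b;
  return t + c;
}

/* a + (b + c): the same operands, reassociated.  */
static float
sum_right (float a, float b, float c)
{
  volatile float t = b + c;
  return a + t;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left gives 1.0f while sum_right gives 0.0f, because the spacing between floats near 1e8 is 8, so -1e8f + 1.0f rounds back to -1e8f.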

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/iterators.md (VCVTHI): New.
(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
(NEON_VAGLTE): New.
(VFM_LANE_AS): New.
(VH_CVTTO): New.
(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
(V_HALF): Add V4HF.  Fix white-space.
(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
(V_s_elem): Likewise.
(V_sz_elem): Fix white-space.
(V_elem_ch): Likewise.
(VH_elem_ch): New.
(scalar_mul_constraint): Add V8HF and V4HF.
(Is_float_mode): Fix white-space.
(Is_d_reg): Likewise.
(q): Add HF.  Fix white-space.
(float_sup): New.
(float_SUP): New.
(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
(neon_vfm_lane_as): New.
* config/arm/neon.md (add3_fp16): New.
(sub3_fp16): New.
(mul3add_neon): New.
(*fma4): New.
(fma4_intrinsic): New.
(fmsub4_intrinsic): Fix white-space.
(*fmsub4): New.
(fmsub4_intrinsic): New.
(2_fp16): New.
(neon_v): New.
(neon_v): New.
(neon_vsqrte): New.
(neon_vpaddv4hf): New.
(neon_vadd): New.
(neon_vsub): New.
(neon_vadd_unspec): New.
(neon_vsub_unspec): New.
(neon_vmulf): New.
(neon_vfma): New.
(neon_vfms): New.
(neon_vc): New.
(neon_vc_fp16insn): New.
(neon_vc_fp16insn_unspec): New.
(neon_vca): New.
(neon_vca_fp16insn): New.
(neon_vca_fp16insn_unspec): New.
(neon_vcz): New.
(neon_vabd): New.
(neon_vf): New.
(neon_vpfv4hf): New.
(neon_): New.
(neon_vrecps): New.
(neon_vrsqrts): New.
(neon_vrecpe): New (VH variant).
(neon_vcvt): New (VCVTHI variant).
(neon_vcvt): New (VH variant).
(neon_vcvt_n): New (VH variant).
(neon_vcvt_n): New (VCVTHI variant).
(neon_vcvt): New (VH variant).
(neon_vmul_lane): New.
(neon_vmul_n): New.
* config/arm/unspecs.md (UNSPEC_VCALE): New.
(UNSPEC_VCALT): New.
(UNSPEC_VFMA_LANE): New.
(UNSPECS_VFMS_LANE): New.
(UNSPECS_VSQRTE): New.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-arith-1.c: Add tests for float16x4_t
and float16x8_t.

>From 623f36632cc2848f16ba1c75f400198a72dc6ea4 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 16:19:57 +0100
Subject: [PATCH 09/17] [PATCH 9/17][ARM] Add NEON FP16 arithmetic
 instructions.

2016-05-17  Matthew Wahab  

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Add V4HF and V8HF.  Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add3_fp16): New.
	(sub3_fp16): New.
	(mul3add_neon): New.
	(*fma4): New.
	(fma4_intrinsic): New.
	(fmsub4_intrinsic): Fix white-space.
	(*fmsub4): New.
	(fmsub4_intrinsic): New.
	(2_fp16): New.
	(neon_v): New.
	(neon_v): New.
	(neon_vsqrte): New.
	(neon_vpadd): New.
	(neon_vadd): New.
	(neon_vsub): New.
	(neon_vadd_unspec): New.
	(neon_vsub_unspec): New.
	(neon_vmulf): New.
	(neon_vfma): New.
	(neon_vfms): New.
	(neon_vc): New.
	(neon_vc_fp16insn): New.
	(neon_vc_fp16insn_unspec): New.
	(neon_vca): New.
	(neon_vca_fp16insn): New.
	(neon_vca_fp16insn_unspec): New.
	(neon_vcz): New.
	(neon_vabd): New.
	(neon_vf): New.
	(neon_

[PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-05-17 Thread Matthew Wahab


The ARMv8.2-A FP16 extension adds a number of instructions to support
data movement for FP16 values. This patch adds these instructions to the
backend, making them available to the compiler code generator.

The new instructions include VSEL which selects between two registers
depending on a condition. This is used to support conditional data
movement which can depend on the result of comparisons between
half-precision values. These comparisons are always done by conversion
to single-precision.
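A portable sketch of the behaviour described above, assuming IEEE binary16 encodings held in a uint16_t: the comparison is done after an exact widening to float, while the selected operand is moved as raw 16-bit data and is never rounded. The helper names are hypothetical; in the patch, the ChangeLog lists new movhfcc and *cmovhf patterns for this case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 encoding to float.  Every binary16 value is
   exactly representable in binary32, so no rounding occurs.  */
static float
fp16_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t frac = h & 0x3ffu;
  uint32_t bits;
  float f;

  if (exp == 0)
    {
      /* Zero or subnormal: the value is frac * 2^-24.  */
      f = (float) frac * 0x1p-24f;
      return sign ? -f : f;
    }
  if (exp == 31)
    bits = sign | 0x7f800000u | (frac << 13);           /* Inf or NaN.  */
  else
    bits = sign | ((exp + 112u) << 23) | (frac << 13);   /* Rebias 15 -> 127.  */

  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Hypothetical conditional move on FP16 data: compare in single
   precision, then move the chosen 16-bit encoding unchanged.  Returns
   A if A > B, otherwise B (including when either operand is a NaN).  */
static uint16_t
fp16_select_gt (uint16_t a, uint16_t b)
{
  return fp16_bits_to_float (a) > fp16_bits_to_float (b) ? a : b;
}
```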

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. This patch was also tested for
arm-none-linux-gnueabihf with native bootstrap and make check and for
arm-none-eabi with check-gcc on an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  
Jiong Wang  

* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
available when FP16 instructions are available.
(output_move_vfp): Add support for 16-bit data moves.
(arm_validize_comparison): Fix some white-space.  Support HFmode
by conversion to SFmode.
* config/arm/arm.md (truncdfhf2): Fix a comment.
(extendhfdf2): Likewise.
(cstorehf4): New.
(movsicc): Fix some white-space.
(movhfcc): New.
(movsfcc): Fix some white-space.
(*cmovhf): New.
* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
instructions are available.
(*thumb_movhi_vfp): Likewise.
(*arm_movhi_fp16): New.
(*thumb_movhi_fp16): New.
(*movhf_vfp_fp16): New.
(*movhf_vfp_neon): Disable when VFP FP16 instructions are
available.
(*movhf_vfp): Likewise.
(extendhfsf2): Enable when VFP FP16 instructions are available.
(truncsfhf2): Enable when VFP FP16 instructions are available.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-move-1.c: New.

>From 83268813cf9aa59940ed17d623606c9e485f6ecf Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:35:04 +0100
Subject: [PATCH 07/17] [PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-05-17  Matthew Wahab  
	Jiong Wang  

	* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
	available when FP16 instructions are available.
	(output_move_vfp): Add support for 16-bit data moves.
	(arm_validize_comparison): Fix some white-space.  Support HFmode
	by conversion to SFmode.
	* config/arm/arm.md (truncdfhf2): Fix a comment.
	(extendhfdf2): Likewise.
	(cstorehf4_fp16): New.
	(movsicc): Fix some white-space.
	(movhfcc): New.
	(movsfcc): Fix some white-space.
	(*cmovhf): New.
	* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
	instructions are available.
	(*thumb_movhi_vfp): Likewise.
	(*arm_movhi_fp16): New.
	(*thumb_movhi_fp16): New.
	(*movhf_vfp_fp16): New.
	(*movhf_vfp_neon): Disable when VFP FP16 instructions are
	available.
	(*movhf_vfp): Likewise.
	(extendhfsf2): Enable when VFP FP16 instructions are available.
	(truncsfhf2): Enable when VFP FP16 instructions are available.

testsuite/
2016-05-17  Matthew Wahab  

	* gcc.target/arm/armv8_2-fp16-move-1.c: New.
---
 gcc/config/arm/arm.c   |  16 +-
 gcc/config/arm/arm.md  |  81 -
 gcc/config/arm/vfp.md  | 182 -
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c | 166 +++
 4 files changed, 433 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6892040..187ebda 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13162,7 +13162,7 @@ coproc_secondary_reload_class (machine_mode mode, rtx x, bool wb)
 {
   if (mode == HFmode)
 {
-  if (!TARGET_NEON_FP16)
+  if (!TARGET_NEON_FP16 && !TARGET_VFP_FP16INST)
 	return GENERAL_REGS;
   if (s_register_operand (x, mode) || neon_vector_mem_operand (x, 2, true))
 	return NO_REGS;
@@ -18613,6 +18613,8 @@ output_move_vfp (rtx *operands)
   rtx reg, mem, addr, ops[2];
   int load = REG_P (operands[0]);
   int dp = GET_MODE_SIZE (GET_MODE (operands[0])) == 8;
+  int sp = (!TARGET_VFP_FP16INST
+	|| GET_MODE_SIZE (GET_MODE (operands[0])) == 4);
   int integer_p = GET_MODE_CLASS (GET_MODE (operands[0])) == MODE_INT;
   const char *templ;
   char buff[50];
@@ -18659,7 +18661,7 @@ output_move_vfp (rtx *operands)
 
   sprintf (buff, templ,
 	   load ? "ld" : "st",
-	   dp ? "64" : "32",
+	   dp ? "64" : sp ? "32" : "16",
 	   dp ? "P" : "",
 	   integer_p ? "\t%@ int" : "");
   output_asm_insn (buff, ops);
@@ -29238,7 +29240,7 @@ arm_validize_comparison (rtx *comparison, rtx * op1, rtx * op2

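The size selection in the output_move_vfp hunk above can be modelled in isolation. This simplified sketch is hypothetical: fp16_inst stands in for TARGET_VFP_FP16INST, and the template "v%sr.%s" is much shorter than the real one in arm.c, which also prints the operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Pick the VLDR/VSTR size suffix from the mode size in bytes, following
   the dp/sp logic in the hunk: 8 bytes is double precision, and without
   the FP16 instructions everything else uses the 32-bit form.  */
static const char *
vfp_move_suffix (int mode_size, bool fp16_inst)
{
  bool dp = mode_size == 8;
  bool sp = !fp16_inst || mode_size == 4;

  assert (mode_size == 2 || mode_size == 4 || mode_size == 8);
  return dp ? "64" : sp ? "32" : "16";
}

/* Format a bare mnemonic such as "vldr.16" into BUF.  */
static void
format_vfp_move (char *buf, size_t len, bool load, int mode_size,
		 bool fp16_inst)
{
  snprintf (buf, len, "v%sr.%s", load ? "ld" : "st",
	    vfp_move_suffix (mode_size, fp16_inst));
}
```

A 2-byte HFmode load formats as "vldr.16" only when the FP16 instructions are available; otherwise the 32-bit form is chosen, as before the patch.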
[PATCH 6/17][ARM] Add data processing intrinsics for float16_t.

2016-05-17 Thread Matthew Wahab


The ACLE specifies a number of intrinsics for manipulating vectors
holding values of most of the integer and floating-point types. These
include 16-bit integer types but not 16-bit floating point, even though
the same instructions are used for both.

A future version of the ACLE extends the data processing intrinsics to
the 16-bit floating-point types, making the intrinsics available
under the same conditions as the ARM __fp16 type.

This patch adds the new intrinsics:
 vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
 vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
 vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
 vzip_f16, vzipq_f16.
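The point that the same instructions serve 16-bit integer and 16-bit floating-point data can be illustrated with a scalar model: the permutations and the bitwise select move lanes without interpreting them. This is a hypothetical sketch of the lane semantics of vzip, vuzp, vrev64, vext and vbsl on four-lane 64-bit vectors, not the arm_neon.h implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 64-bit vector of four 16-bit lanes.  Lanes hold raw
   bit patterns, so the same code serves int16 and FP16 data.  */
typedef struct { uint16_t lane[4]; } v4x16;
typedef struct { v4x16 val[2]; } v4x16x2;

/* vzip-style: interleave the lanes of A and B.  */
static v4x16x2
zip4 (v4x16 a, v4x16 b)
{
  v4x16x2 r;
  for (int i = 0; i < 2; i++)
    {
      r.val[0].lane[2 * i] = a.lane[i];
      r.val[0].lane[2 * i + 1] = b.lane[i];
      r.val[1].lane[2 * i] = a.lane[i + 2];
      r.val[1].lane[2 * i + 1] = b.lane[i + 2];
    }
  return r;
}

/* vuzp-style: even lanes of A then B, and odd lanes of A then B.  */
static v4x16x2
uzp4 (v4x16 a, v4x16 b)
{
  v4x16x2 r;
  for (int i = 0; i < 2; i++)
    {
      r.val[0].lane[i] = a.lane[2 * i];
      r.val[0].lane[i + 2] = b.lane[2 * i];
      r.val[1].lane[i] = a.lane[2 * i + 1];
      r.val[1].lane[i + 2] = b.lane[2 * i + 1];
    }
  return r;
}

/* vrev64-style: reverse the 16-bit lanes within the 64-bit vector.  */
static v4x16
rev64_4 (v4x16 a)
{
  v4x16 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = a.lane[3 - i];
  return r;
}

/* vext-style: lanes N..3 of A followed by lanes 0..N-1 of B.  */
static v4x16
ext4 (v4x16 a, v4x16 b, int n)
{
  v4x16 r;
  assert (n >= 0 && n < 4);
  for (int i = 0; i < 4; i++)
    r.lane[i] = (i + n < 4) ? a.lane[i + n] : b.lane[i + n - 4];
  return r;
}

/* vbsl-style bitwise select: bits of A where MASK is set, else B.  */
static v4x16
bsl4 (v4x16 mask, v4x16 a, v4x16 b)
{
  v4x16 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = (uint16_t) ((mask.lane[i] & a.lane[i])
			    | (~mask.lane[i] & b.lane[i]));
  return r;
}
```

Feeding FP16 encodings such as 0x3c00 (1.0) through these functions leaves every bit pattern intact; only the lane positions change.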

This patch also updates the advsimd-intrinsics testsuite to test the f16
variants for ARM targets. These intrinsics are only implemented in the
ARM target so the tests are disabled for AArch64 using an extra
condition on a new convenience macro FP16_SUPPORTED. This patch also
disables, for the ARM target, the testsuite-defined macro vdup_n_f16 as
it is no longer needed.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
V4HF modes.
(arm_evpc_neon_vzip): Likewise.
(arm_evpc_neon_vrev): Likewise.
(arm_evpc_neon_vtrn): Likewise.
(arm_evpc_neon_vext): Likewise.
* config/arm/arm_neon.h (vbsl_f16): New.
(vbslq_f16): New.
(vdup_n_f16): New.
(vdupq_n_f16): New.
(vdup_lane_f16): New.
(vdupq_lane_f16): New.
(vext_f16): New.
(vextq_f16): New.
(vmov_n_f16): New.
(vmovq_n_f16): New.
(vrev64_f16): New.
(vrev64q_f16): New.
(vtrn_f16): New.
(vtrnq_f16): New.
(vuzp_f16): New.
(vuzpq_f16): New.
(vzip_f16): New.
(vzipq_f16): New.
* config/arm/arm_neon_builtins.def (vdup_n): New (v8hf, v4hf variants).
(vdup_lane): New (v8hf, v4hf variants).
(vext): New (v8hf, v4hf variants).
(vbsl): New (v8hf, v4hf variants).
* config/arm/iterators.md (VDQWH): New.
(VH): New.
(V_double_vector_mode): Add V8HF and V4HF.  Fix white-space.
(Scalar_mul_8_16): Fix white-space.
(Is_d_reg): Add V4HF and V8HF.
* config/arm/neon.md (neon_vdup_lane_internal): New.
(neon_vdup_lane): New.
(neon_vtrn_internal): Replace VDQW with VDQWH.
(*neon_vtrn_insn): Likewise.
(neon_vzip_internal): Likewise. Also fix white-space.
(*neon_vzip_insn): Likewise.
(neon_vuzp_internal): Likewise.
(*neon_vuzp_insn): Likewise.
* config/arm/vec-common.md (vec_perm_const): New.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(FP16_SUPPORTED): New.
(vdup_n_f16): Disable for non-AArch64 targets.
* gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Add __fp16 tests,
conditional on FP16_SUPPORTED.
* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vext.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vrev.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: Add support
for testing __fp16.
* gcc.target/aarch64/advsimd-intrinsics/vtrn.c: Add __fp16 tests,
conditional on FP16_SUPPORTED.
* gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise.

From 08c5cf4b5c6c846a4f62b6ad8776f2388b135e55 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 14:48:29 +0100
Subject: [PATCH 06/17] [PATCH 6/17][ARM] Add data processing intrinsics for
 float16_t.

2016-05-17  Matthew Wahab  

	* config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
	V4HF modes.
	(arm_evpc_neon_vtrn): Likewise.
	(arm_evpc_neon_vrev): Likewise.
	(arm_evpc_neon_vext): Likewise.
	* config/arm/arm_neon.h (vbsl_f16): New.
	(vbslq_f16): New.
	(vdup_n_f16): New.
	(vdupq_n_f16): New.
	(vdup_lane_f16): New.
	(vdupq_lane_f16): New.
	(vext_f16): New.
	(vextq_f16): New.
	(vmov_n_f16): New.
	(vmovq_n_f16): New.
	(vrev64_f16): New.
	(vrev64q_f16): New.
	(vtrn_f16): New.
	(vtrnq_f16): New.
	(vuzp_f16): New.
	(vuzpq_f16): New.
	(vzip_f16): New.
	(vzipq_f16): New.
	* config/arm/arm_neon_builtins.def (vdup_n): New (v8hf, v4hf variants).
	(vdup_lane): New (v8hf, v4hf variants).
	(vext): New (v8hf, v4hf variants).
	(vbsl): New (v8hf, v4hf variants).
	* config/arm/iterators.md (VDQWH): New.
	(VH): New.
	(V_double_vector_mode): Add V8HF

[PATCH 5/17][ARM] Enable HI mode moves for floating point values.

2016-05-17 Matthew Wahab


The handling of 16-bit integer data-movement in the ARM backend doesn't
make full use of the VFP instructions when they are available, even when
the values are for use in VFP operations.

This patch adds support for using the VFP instructions and registers
when moving 16-bit integer and floating point data between registers and
between registers and memory.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Tested this patch for arm-none-linux-gnueabihf
with native bootstrap and make check and for arm-none-eabi with
check-gcc on an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Jiong Wang  
    Matthew Wahab  

* config/arm/arm.c (output_move_vfp): Weaken assert to allow
HImode.
(arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
* config/arm/arm.md (*movhi_insn_arch4): Disable when VFP registers are
available.
(*movhi_bytes): Likewise.
* config/arm/vfp.md (*arm_movhi_vfp): New.
(*thumb2_movhi_vfp): New.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/short-vfp-1.c: New.

From 0b8bc5f2966924c523d6fd75cf73dd01341914e2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:33:04 +0100
Subject: [PATCH 05/17] [PATCH 5/17][ARM] Enable HI mode moves for floating
 point values.

2016-05-17  Jiong Wang  
	    Matthew Wahab  

	* config/arm/arm.c (output_move_vfp): Weaken assert to allow
	HImode.
	(arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
	* config/arm/arm.md (*movhi_bytes): Disable when VFP registers are
	available.  Also fix some white-space.
	* config/arm/vfp.md (*arm_movhi_vfp): New.
	(*thumb2_movhi_vfp): New.

testsuite/
2016-05-17  Matthew Wahab  

	* gcc.target/arm/short-vfp-1.c: New.
---
 gcc/config/arm/arm.c   |  5 ++
 gcc/config/arm/arm.md  |  6 +-
 gcc/config/arm/vfp.md  | 93 ++
 gcc/testsuite/gcc.target/arm/short-vfp-1.c | 45 +++
 4 files changed, 146 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/short-vfp-1.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f3914ef..26a8a48 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -18628,6 +18628,7 @@ output_move_vfp (rtx *operands)
   gcc_assert ((mode == HFmode && TARGET_HARD_FLOAT && TARGET_VFP)
 	  || mode == SFmode
 	  || mode == DFmode
+	  || mode == HImode
 	  || mode == SImode
 	  || mode == DImode
   || (TARGET_NEON && VALID_NEON_DREG_MODE (mode)));
@@ -23422,6 +23423,10 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   if (mode == HFmode)
 	return VFP_REGNO_OK_FOR_SINGLE (regno);
 
+  /* VFP registers can hold HImode values.  */
+  if (mode == HImode)
+	return VFP_REGNO_OK_FOR_SINGLE (regno);
+
   if (TARGET_NEON)
 return (VALID_NEON_DREG_MODE (mode) && VFP_REGNO_OK_FOR_DOUBLE (regno))
|| (VALID_NEON_QREG_MODE (mode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4049f10..3e23178 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6365,7 +6365,7 @@
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r")
 	(match_operand:HI 1 "general_operand"  "rIk,K,n,r,mi"))]
   "TARGET_ARM
-   && arm_arch4
+   && arm_arch4 && !(TARGET_HARD_FLOAT && TARGET_VFP)
&& (register_operand (operands[0], HImode)
|| register_operand (operands[1], HImode))"
   "@
@@ -6391,7 +6391,7 @@
 (define_insn "*movhi_bytes"
   [(set (match_operand:HI 0 "s_register_operand" "=r,r,r")
 	(match_operand:HI 1 "arm_rhs_operand"  "I,rk,K"))]
-  "TARGET_ARM"
+  "TARGET_ARM && !(TARGET_HARD_FLOAT && TARGET_VFP)"
   "@
mov%?\\t%0, %1\\t%@ movhi
mov%?\\t%0, %1\\t%@ movhi
@@ -6399,7 +6399,7 @@
   [(set_attr "predicable" "yes")
(set_attr "type" "mov_imm,mov_reg,mvn_imm")]
 )
-	
+
 ;; We use a DImode scratch because we may occasionally need an additional
 ;; temporary if the address isn't offsettable -- push_reload doesn't seem
 ;; to take any notice of the "o" constraints on reload_memory_operand operand.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 9750ba1..d7c874a 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -18,6 +18,99 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.  */
 
+;; Patterns for HI moves which provide more data transfer instructions when VFP
+;; support is enabled.
+(define_insn "*arm_movhi_vfp"
+ [(set
+   (

[PATCH 4/17][ARM] Define feature macros for FP16.

2016-05-17 Matthew Wahab


The FP16 extension introduced with the ARMv8.2-A architecture adds
instructions operating on FP16 values to the VFP and NEON instruction
sets.

The patch adds the feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
which is defined to be 1 if the VFP FP16 instructions are available; it
is otherwise undefined.

The patch also adds the feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
which is defined to be 1 if the NEON FP16 instructions are available; it
is otherwise undefined.

These two macros will appear in a future version of the ACLE.
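Code can test for these features with the preprocessor. A minimal sketch, relying only on what is stated above (each macro is either undefined or defined to 1, so #ifdef suffices); the helper function names are hypothetical.

```c
/* Each FP16 arithmetic macro is either undefined or defined to 1.  */
static int have_fp16_scalar_arith(void) {
#ifdef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
  return 1;
#else
  return 0;
#endif
}

static int have_fp16_vector_arith(void) {
#ifdef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical dispatch helper: report the widest FP16 arithmetic
   support the translation unit was compiled for.  */
static const char *fp16_arith_level(void) {
  if (have_fp16_vector_arith())
    return "neon";
  if (have_fp16_scalar_arith())
    return "vfp";
  return "none"; /* __fp16 operands are promoted to float instead.  */
}
```

On a non-ARM host both helpers return 0 and the level is "none"; with the NEON FP16 instructions enabled, the scalar macro is expected to be defined as well.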

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm-c.c (arm_cpu_builtins): Define
"__ARM_FEATURE_FP16_SCALAR_ARITHMETIC" and
"__ARM_FEATURE_FP16_VECTOR_ARITHMETIC".

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/attr-fp16-arith-1.c: New.

From 688b4d34a64a40abd4705a9bdaea40929a7a1d26 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:32:15 +0100
Subject: [PATCH 04/17] [PATCH 4/17][ARM] Define feature macros for FP16.

2016-05-17  Matthew Wahab  

	* config/arm/arm-c.c (arm_cpu_builtins): Define
	"__ARM_FEATURE_FP16_SCALAR_ARITHMETIC" and
	"__ARM_FEATURE_FP16_VECTOR_ARITHMETIC".

testsuite/
2016-05-17  Matthew Wahab  

	* gcc.target/arm/attr-fp16-arith-1.c: New.
---
 gcc/config/arm/arm-c.c   |  5 +++
 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c | 45 
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index b98470f..7283700 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -142,6 +142,11 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FP16_ARGS",
 		  arm_fp16_format != ARM_FP16_FORMAT_NONE);
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC",
+		  TARGET_VFP_FP16INST);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC",
+		  TARGET_NEON_FP16INST);
+
   def_or_undef_macro (pfile, "__ARM_FEATURE_FMA", TARGET_FMA);
   def_or_undef_macro (pfile, "__ARM_NEON__", TARGET_NEON);
   def_or_undef_macro (pfile, "__ARM_NEON", TARGET_NEON);
diff --git a/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
new file mode 100644
index 000..5011315
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
+/* Reset fpu to a value compatible with the next pragmas.  */
+#pragma GCC target ("fpu=vfp")
+
+#pragma GCC push_options
+#pragma GCC target ("fpu=fp-armv8")
+
+#ifndef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
+#error __ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined.
+#endif
+
+#pragma GCC push_options
+#pragma GCC target ("fpu=neon-fp-armv8")
+
+#ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
+#error __ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined.
+#endif
+
+#ifndef __ARM_NEON
+#error __ARM_NEON not defined.
+#endif
+
+#if !defined (__ARM_FP) || !(__ARM_FP & 0x2)
+#error Invalid value for __ARM_FP
+#endif
+
+#pragma GCC pop_options
+
+/* Check that the FP version is correctly reset to mfpu=fp-armv8.  */
+
+#if !defined (__ARM_FP) || !(__ARM_FP & 0x2)
+#error __ARM_FP should record FP16 support.
+#endif
+
+#pragma GCC pop_options
+
+/* Check that the FP version is correctly reset to mfpu=vfp.  */
+
+#if !defined (__ARM_FP) || (__ARM_FP & 0x2)
+#error Unexpected value for __ARM_FP.
+#endif
-- 
2.1.4

[PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions.

2016-05-17 Matthew Wahab


The ARMv8.2-A FP16 extension adds to both the VFP and the NEON
instruction sets. This patch adds support to the testsuite to select
targets and set options for tests that make use of these
instructions. It also adds documentation for ARMv8.1-A selectors.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
arm_v8_1a_neon_ok, arm_v8_2a_fp16_scalar_ok, arm_v8_2a_fp16_scalar_hw,
arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_neon_hw.
(Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_scalar,
arm_v8_2a_neon.

testsuite/
2016-05-17  Matthew Wahab  

* lib/target-supports.exp
(add_options_for_arm_v8_2a_fp16_scalar): New.
(add_options_for_arm_v8_2a_fp16_neon): New.
(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
(add_options_for_arm_arch_v8_2a): Auto-generate.
(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
(check_effective_target_arm_v8_2a_fp16_neon_hw): New.

From ba9b4dcf774d0fdffae11ac59537255775e8f1b6 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:34:30 +0100
Subject: [PATCH 03/17] [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A
 with FP16   arithmetic instructions.

2016-05-17  Matthew Wahab  

	* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
	arm_v8_1a_neon_ok, arm_v8_2a_fp16_scalar_ok, arm_v8_2a_fp16_scalar_hw,
	arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_neon_hw.
	(Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_scalar,
	arm_v8_2a_neon.
	* lib/target-supports.exp
	(add_options_for_arm_v8_2a_fp16_scalar): New.
	(add_options_for_arm_v8_2a_fp16_neon): New.
	(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
	(add_options_for_arm_arch_v8_2a): Auto-generate.
	(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
	(check_effective_target_arm_v8_2a_fp16_neon_hw): New.
---
 gcc/doc/sourcebuild.texi  |  40 ++
 gcc/testsuite/lib/target-supports.exp | 145 +-
 2 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index dd6abda..66904a7 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1596,6 +1596,7 @@ ARM target supports @code{-mfpu=neon-fp-armv8 -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
 
 @item arm_v8_1a_neon_ok
+@anchor{arm_v8_1a_neon_ok}
 ARM target supports options to generate ARMv8.1 Adv.SIMD instructions.
 Some multilibs may be incompatible with these options.
 
@@ -1604,6 +1605,28 @@ ARM target supports executing ARMv8.1 Adv.SIMD instructions.  Some
 multilibs may be incompatible with the options needed.  Implies
 arm_v8_1a_neon_ok.
 
+@item arm_v8_2a_fp16_scalar_ok
+@anchor{arm_v8_2a_fp16_scalar_ok}
+ARM target supports options to generate instructions for ARMv8.2 and
+scalar instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.
+
+@item arm_v8_2a_fp16_scalar_hw
+ARM target supports executing instructions for ARMv8.2 and scalar
+instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.  Implies arm_v8_2a_fp16_neon_ok.
+
+@item arm_v8_2a_fp16_neon_ok
+@anchor{arm_v8_2a_fp16_neon_ok}
+ARM target supports options to generate instructions from ARMv8.2 with
+the FP16 extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_fp16_scalar_ok.
+
+@item arm_v8_2a_fp16_neon_hw
+ARM target supports executing instructions from ARMv8.2 with the FP16
+extension.  Some multilibs may be incompatible with these options.
+Implies arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_scalar_hw.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
@@ -2088,6 +2111,23 @@ the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
 arm vfp3 floating point support; see
 the @ref{arm_vfp3_ok,,arm_vfp3_ok effective target keyword}.
 
+@item arm_v8_1a_neon
+Add options for ARMv8.1 with Adv.SIMD support, if this

[PATCH 2/17][Testsuite] Add a selector for ARM FP16 alternative format support.

2016-05-17 Matthew Wahab


The ARMv8.2-A FP16 extension only supports the IEEE format for FP16
data. It is not compatible with the option -mfp16-format=none nor with
the option -mfp16-format=alternative (selecting the ARM alternative FP16
format). Using either with the FP16 extension will trigger a compiler
error.
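The two formats differ in how the top exponent value is interpreted. A minimal decoding sketch, assuming the standard binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits) for both formats and, for the alternative format, the behaviour described in the GCC manual: no infinities or NaNs, with exponent 31 encoding ordinary normal values up to 131008. `decode_fp16` and `scale_pow2` are illustrative names, not GCC functions.

```c
#include <math.h>
#include <stdint.h>

/* Multiply x by 2^e using exact power-of-two steps (no libm needed).  */
static double scale_pow2(double x, int e) {
  for (; e > 0; e--)
    x *= 2.0;
  for (; e < 0; e++)
    x *= 0.5;
  return x;
}

/* Decode a 16-bit pattern.  A nonzero 'alternative' selects the ARM
   alternative format, where exponent 31 is an ordinary exponent.  */
static double decode_fp16(uint16_t h, int alternative) {
  int sign = (h >> 15) & 1;
  int exp = (h >> 10) & 0x1f;
  int frac = h & 0x3ff;
  double v;
  if (exp == 0)
    v = scale_pow2(frac, -24);              /* zero or subnormal */
  else if (exp == 31 && !alternative)
    v = frac ? NAN : INFINITY;              /* IEEE special values */
  else
    v = scale_pow2(1024 + frac, exp - 25);  /* (1 + frac/2^10) * 2^(exp-15) */
  return sign ? -v : v;
}
```

For example, the pattern 0x7C00 decodes to infinity in the IEEE format but to 65536.0 in the alternative format, which is why code written for one format cannot simply be compiled for the other.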

This patch adds the selector arm_fp16_alternative_ok to the testsuite's
target-support code to allow tests to require support for the
alternative format. It also adds selector arm_fp16_none_ok to check
whether -mfp16-format=none is a valid option for the target.  The patch
also updates existing tests to make use of the new selectors.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
arm_fp16_alternative_ok and arm_fp16_none_ok.

testsuite/
2016-05-17  Matthew Wahab  

* g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Use
arm_fp16_alternative_ok.
* g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
* gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
* gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
* gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
* gcc.target/arm/fp16-compile-alt-1.c: Likewise.
* gcc.target/arm/fp16-compile-alt-10.c: Likewise.
* gcc.target/arm/fp16-compile-alt-11.c: Likewise.
* gcc.target/arm/fp16-compile-alt-12.c: Likewise.
* gcc.target/arm/fp16-compile-alt-2.c: Likewise.
* gcc.target/arm/fp16-compile-alt-3.c: Likewise.
* gcc.target/arm/fp16-compile-alt-4.c: Likewise.
* gcc.target/arm/fp16-compile-alt-5.c: Likewise.
* gcc.target/arm/fp16-compile-alt-6.c: Likewise.
* gcc.target/arm/fp16-compile-alt-7.c: Likewise.
* gcc.target/arm/fp16-compile-alt-8.c: Likewise.
* gcc.target/arm/fp16-compile-alt-9.c: Likewise.
* gcc.target/arm/fp16-compile-none-1.c: Use arm_fp16_none_ok.
* gcc.target/arm/fp16-compile-none-2.c: Likewise.
* gcc.target/arm/fp16-rounding-alt-1.c: Use
arm_fp16_alternative_ok.
* lib/target-supports.exp
(check_effective_target_arm_fp16_alternative_ok_nocache): New.
(check_effective_target_arm_fp16_alternative_ok): New.
(check_effective_target_arm_fp16_none_ok_nocache): New.
(check_effective_target_arm_fp16_none_ok): New.

From 1901fdfbd2f8da9809a60e43284a1749b015dfba Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:33:51 +0100
Subject: [PATCH 02/17] [PATCH 2/17][Testsuite] Add a selector for ARM FP16
 alternative format support.

2016-05-17  Matthew Wahab  

	* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
	arm_fp16_alternative_ok and arm_fp16_none_ok.

testsuite/
2016-05-17  Matthew Wahab  

	* g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Use
	arm_fp16_alternative_ok.
	* g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
	* gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
	* gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
	* gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-1.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-10.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-11.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-12.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-2.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-3.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-4.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-5.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-6.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-7.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-8.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-9.c: Likewise.
	* gcc.target/arm/fp16-compile-none-1.c: Use arm_fp16_none_ok.
	* gcc.target/arm/fp16-compile-none-2.c: Likewise.
	* gcc.target/arm/fp16-rounding-alt-1.c: Use
	arm_fp16_alternative_ok.
	* lib/target-supports.exp
	(check_effective_target_arm_fp16_alternative_ok_nocache): New.
	(check_effective_target_arm_fp16_alternative_ok): New.
	(check_effective_target_arm_fp16_none_ok_nocache): New.
	(check_effective_target_arm_fp16_none_ok): New.
---
 gcc/doc/sourcebuild.texi   |  7 +++
 gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C |  1 +
 gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C |  1 +
 .../gcc.dg/torture/arm-fp16-int-convert-alt.c  |  1 +
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c  |  1 +
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-12.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-2.c  |  1 +
 gcc/testsuite/gcc.target/

[PATCH 1/17][ARM] Add ARMv8.2-A command line option and profile.

2016-05-17 Matthew Wahab


This patch adds the command options for the architecture ARMv8.2-A and
the half-precision extension. The architecture is selected by
-march=armv8.2-a and has all the properties of -march=armv8.1-a.

This patch also enables the CRC extension (+crc) which is required
for both ARMv8.2-A and ARMv8.1-A architectures but is not currently
enabled by default for -march=armv8.1-a.

The half-precision extension is selected using the extension +fp16. This
enables the VFP FP16 instructions if an ARMv8 VFP unit is also
specified, e.g. by -mfpu=fp-armv8. It also enables the FP16 NEON
instructions if an ARMv8 NEON unit is specified, e.g. by
-mfpu=neon-fp-armv8. Note that if the NEON FP16 instructions are enabled
then so are the VFP FP16 instructions.
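The enabling rules above can be summarised as two predicates. This is a hypothetical model only: `struct arm_opts`, `enum fpu_kind` and the predicate names do not correspond to GCC internals (the patch itself introduces TARGET_VFP_FP16INST and TARGET_NEON_FP16INST), and every FPU other than the two named ones is folded into FPU_OTHER.

```c
#include <stdbool.h>

/* Hypothetical model; not GCC's internal flags or macros.  */
enum fpu_kind { FPU_OTHER, FPU_FP_ARMV8, FPU_NEON_FP_ARMV8 };

struct arm_opts {
  bool fp16_ext;      /* +fp16 given, e.g. -march=armv8.2-a+fp16 */
  enum fpu_kind fpu;  /* -mfpu=fp-armv8, -mfpu=neon-fp-armv8, ... */
};

/* VFP FP16 instructions: +fp16 plus an ARMv8 VFP unit.  The
   neon-fp-armv8 FPU includes an ARMv8 VFP unit too.  */
static bool vfp_fp16_enabled(struct arm_opts o) {
  return o.fp16_ext && (o.fpu == FPU_FP_ARMV8 || o.fpu == FPU_NEON_FP_ARMV8);
}

/* NEON FP16 instructions: +fp16 plus an ARMv8 NEON unit.  */
static bool neon_fp16_enabled(struct arm_opts o) {
  return o.fp16_ext && o.fpu == FPU_NEON_FP_ARMV8;
}
```

Enumerating every combination of the model confirms the stated property that enabling the NEON FP16 instructions also enables the VFP FP16 ones.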

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
("armv8.2-a"): New.
("armv8.2-a+fp16"): New.
* config/arm/arm-protos.h (FL2_ARCH8_2): New.
(FL2_FP16INST): New.
(FL2_FOR_ARCH8_2A): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_2): New.
(arm_fp16_inst): New.
(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
for incompatible fp16-format settings.
* config/arm/arm.h (TARGET_VFP_FP16INST): New.
(TARGET_NEON_FP16INST): New.
(arm_arch8_2): Declare.
(arm_fp16_inst): Declare.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
march=armv8.2-a and march=armv8.2-a+fp16.
* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
and armv8.2-a+fp16.
* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
"-march=armv8.2-a" and "-march=armv8.2-a+fp16".


From 7df41b0a5d248d842fd4c89082dc1a1055dc4604 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:31:24 +0100
Subject: [PATCH 01/17] [PATCH 1/17][ARM] Add ARMv8.2-A command line option and
 profile.

2016-05-17  Matthew Wahab  

	* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
	("armv8.2-a"): New.
	("armv8.2-a+fp16"): New.
	* config/arm/arm-protos.h (FL2_ARCH8_2): New.
	(FL2_FP16INST): New.
	(FL2_FOR_ARCH8_2A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch8_2): New.
	(arm_fp16_inst): New.
	(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
	for incompatible fp16-format settings.
	* config/arm/arm.h (TARGET_VFP_FP16INST): New.
	(TARGET_NEON_FP16INST): New.
	(arm_arch8_2): Declare.
	(arm_fp16_inst): Declare.
	* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
	march=armv8.2-a and march=armv8.2-a+fp16.
	* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
	and armv8.2-a+fp16.
	* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
	"-march=armv8.2-a" and "-march=armv8.2-a+fp16".
---
 gcc/config/arm/arm-arches.def | 10 --
 gcc/config/arm/arm-protos.h   |  4 
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  | 15 +++
 gcc/config/arm/arm.h  | 14 ++
 gcc/config/arm/bpabi.h|  4 
 gcc/config/arm/t-aprofile |  2 ++
 gcc/doc/invoke.texi   | 13 +
 8 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index fd02b18..2b4a80e 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -58,10 +58,16 @@ ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_F
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
 ARM_ARCH("armv8.1-a", cortexa53,  8A,
-	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
 	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 			 FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.2-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A))
+ARM_ARCH ("armv8.2-a+fp16", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A | FL2_FP16INST))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL

[PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support.

2016-05-17 Matthew Wahab


Hello,

The ARMv8.2-A architecture builds on ARMv8.1-A and includes an optional
extension supporting half-precision floating point (FP16)
arithmetic. This extension adds instructions to the VFP and NEON
instruction sets to provide operations on IEEE754-2008 formatted FP16
values.

This patch set adds support to GCC for the ARMv8.2-A architecture and
for the FP16 extension. The FP16 VFP and NEON instructions are exposed
as new ACLE intrinsics and support is added to the compiler to make use
of data movement and other precision-preserving instructions.

The new half-precision operations are treated as complementary to the
existing FP16 support. To preserve compatibility with existing code, the
ARM __fp16 data-type continues to be treated as a storage-only format
and operations on it continue to be performed by promotion to single precision
floating point. Half-precision operations are only supported through the
use of the intrinsics added by this patch set.

This series also includes a number of patches to improve the handling of
16-bit integer and floating point values. These support code
generation for the ARMv8.2-A FP16 extension but are also made available
independently of it. Among these changes are a number of new ACLE data
processing intrinsics to support half-precision (f16) data.

The patches in this series:

- Add the command line and profile for the new architecture.
- Add selectors to the testsuite target-support to distinguish targets
  using the IEEE FP16 format from those using the ARM Alternative format.
- Add support (selectors and directives) to the testsuite target-support
  for ARMv8.2-A and the FP16 extension.
- Add feature macros for the new features.
- Improve the handling of 16-bit integers when VFP units are available.
- Add vector shuffle intrinsics for float16_t.
- Add data movement instructions introduced by the new extension.
- Add the VFP FP16 arithmetic instructions introduced by the extension.
- Add the NEON FP16 arithmetic instructions introduced by the extension.
- Refactor the code for initializing and expanding the NEON intrinsics.
- Add builtins to support intrinsics for VFP FP16 instructions.
- Add builtins to support intrinsics for NEON FP16 instructions.
- Add intrinsics for VFP FP16 instructions.
- Add intrinsics for NEON FP16 instructions.
- Add tests for ARMv8.2-A and the new FP16 support.
- Add tests for the VFP FP16 intrinsics.
- Add tests for the NEON FP16 intrinsics.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Matthew

[PATCH] Remove TARGET_INVALID_PARAMETER_TYPE and TARGET_INVALID_RETURN_TYPE hooks.

2016-05-03 Matthew Wahab


Hello,

The target hooks TARGET_INVALID_PARAMETER_TYPE and
TARGET_INVALID_RETURN_TYPE were only used by the ARM backend and
are no longer used. This patch removes them.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
and for x86_64-none-linux with native bootstrap and check-gcc.

Ok for trunk?
Matthew

c/
2016-05-03  Matthew Wahab  

* c-decl.c (grokdeclarator): Remove errmsg and use of
targetm.invalid_return_type.
(grokparms): Remove errmsg and use of
targetm.invalid_parameter_type.

cp/
2016-05-03  Matthew Wahab  

* decl.c (grokdeclarator): Remove errmsg and use of
targetm.invalid_return_type.
(grokparms): Remove errmsg and use of
targetm.invalid_parameter_type.

gcc/
2016-05-03  Matthew Wahab  

* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_INVALID_PARAMETER_TYPE): Remove.
(TARGET_INVALID_RETURN_TYPE): Remove.
* system.h: Poison TARGET_INVALID_PARAMETER_TYPE and
TARGET_INVALID_RETURN_TYPE.
* target.def (invalid_parameter_type): Remove.
(invalid_return_type): Remove.
From 67c843ec9c20f361ecc18e8b60235577c277c1e6 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 29 Apr 2016 09:24:24 +0100
Subject: [PATCH] [PATCH] Remove TARGET_INVALID_PARAMETER_TYPE and
 TARGET_INVALID_RETURN_TYPE hooks.

The target hooks TARGET_INVALID_PARAMETER_TYPE and
TARGET_INVALID_RETURN_TYPE were only used by the ARM backend. Since they
are no longer used, this patch removes them.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
and for x86_64-none-linux with native bootstrap and check-gcc.

c/
2016-05-03  Matthew Wahab  

	* c-decl.c (grokdeclarator): Remove errmsg and use of
	targetm.invalid_return_type.
	(grokparms): Remove errmsg and use of
	targetm.invalid_parameter_type.

cp/
2016-05-03  Matthew Wahab  

	* decl.c (grokdeclarator): Remove errmsg and use of
	targetm.invalid_return_type.
	(grokparms): Remove errmsg and use of
	targetm.invalid_parameter_type.

gcc/
2016-05-03  Matthew Wahab  

	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_INVALID_PARAMETER_TYPE): Remove.
	(TARGET_INVALID_RETURN_TYPE): Remove.
	* system.h: Poison TARGET_INVALID_PARAMETER_TYPE and
	TARGET_INVALID_RETURN_TYPE.
	* target.def (invalid_parameter_type): Remove.
	(invalid_return_type): Remove.
---
 gcc/c/c-decl.c | 17 -
 gcc/cp/decl.c  | 16 
 gcc/doc/tm.texi| 14 --
 gcc/doc/tm.texi.in |  4 
 gcc/system.h   |  4 +++-
 gcc/target.def | 22 --
 6 files changed, 3 insertions(+), 74 deletions(-)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index f0c677b..2232693 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -5391,7 +5391,6 @@ grokdeclarator (const struct c_declarator *declarator,
   struct c_arg_info *arg_info = 0;
   addr_space_t as1, as2, address_space;
   location_t loc = UNKNOWN_LOCATION;
-  const char *errmsg;
   tree expr_dummy;
   bool expr_const_operands_dummy;
   enum c_declarator_kind first_non_attr_kind;
@@ -6114,12 +6113,6 @@ grokdeclarator (const struct c_declarator *declarator,
 		  	"an array");
 		type = integer_type_node;
 	  }
-	errmsg = targetm.invalid_return_type (type);
-	if (errmsg)
-	  {
-		error (errmsg);
-		type = integer_type_node;
-	  }
 
 	/* Construct the function type and go to the next
 	   inner layer of declarator.  */
@@ -6856,7 +6849,6 @@ grokparms (struct c_arg_info *arg_info, bool funcdef_flag)
 {
   tree parm, type, typelt;
   unsigned int parmno;
-  const char *errmsg;
 
   /* If there is a parameter of incomplete type in a definition,
 	 this is an error.  In a declaration this is valid, and a
@@ -6905,15 +6897,6 @@ grokparms (struct c_arg_info *arg_info, bool funcdef_flag)
 		}
 	}
 
-	  errmsg = targetm.invalid_parameter_type (type);
-	  if (errmsg)
-	{
-	  error (errmsg);
-	  TREE_VALUE (typelt) = error_mark_node;
-	  TREE_TYPE (parm) = error_mark_node;
-	  arg_types = NULL_TREE;
-	}
-
 	  if (DECL_NAME (parm) && TREE_USED (parm))
 	warn_if_shadowing (parm);
 	}
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 461822b..c866613 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -9271,7 +9271,6 @@ grokdeclarator (const cp_declarator *declarator,
   bool late_return_type_p = false;
   bool array_parameter_p = false;
   source_location saved_loc = input_location;
-  const char *errmsg;
   tree reqs = NULL_TREE;
 
   signed_p = decl_spec_seq_has_spec_p (declspecs, ds_signed);
@@ -10071,12 +10070,6 @@ grokdeclarator (const cp_declarator *declarator,
 		   decl, but to its return type.  */
 		type_quals = TYPE_UNQUALIFIED;
 	  }
-	errmsg = targetm.invalid_return_type (type);
-	if (errmsg)
-	  {
-		error (errmsg);
-		type = integer_type_node;
-	  }
 
 	/* Error about some types functions can't return.

Re: [ARM] Enable __fp16 as a function parameter and return type.

2016-05-03 Thread Matthew Wahab


Hello,

On 28/04/16 16:49, Joseph Myers wrote:

On Thu, 28 Apr 2016, Matthew Wahab wrote:


The ARM target supports the half-precision floating point type __fp16 but does
not allow its use as a function return or parameter type. This patch removes
that restriction and defines the ACLE macro __ARM_FP16_ARGS to indicate this.
The code generated for passing __fp16 values into and out of functions depends
on the level of hardware support but conforms to the AAPCS (see
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf).


The sole use of the TARGET_INVALID_PARAMETER_TYPE and TARGET_INVALID_RETURN_TYPE
hooks was to disallow __fp16 use as a function return or parameter type.  Thus, I
think this patch should completely remove those hooks and poison them in
system.h.


This touches the C and C++ front-ends so I'll send a separate patch to do this.


This patch addresses one incompatibility of the original __fp16 specification with
the more recent ACLE specification and the specification in ISO/IEC TS 18661-3 for
how such types should work. Another such incompatibility is the peculiar rule in
the original specification that conversions from double to __fp16 go via float,
with double rounding.  Do you have plans to eliminate that and move to the
single-rounding semantics that are in current specifications?


The patch aims to preserve the __fp16 semantics currently used by GCC, except for the
obvious argument/return value in registers change. This is to avoid any changes to
the values calculated by existing __fp16 code that might be introduced if things like
the conversion rules are changed.


I note that that AAPCS revision says for __fp16, in 7.1.1 Arithmetic Types, "In a
variadic function call this will be passed as a double-precision value.".  I
haven't checked what this patch implements, but that could be problematic, and
different from what's said under 7.2, "For variadic functions, float arguments
that match the ellipsis (...) are converted to type double.".


The patch keeps the current GCC behaviour (conversion to double). (There's an 
existing test for this in gcc.target/arm/fp16-variadic-1.c.)
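
__fp16 itself is only available on ARM targets, so as a portable illustration of the default argument promotion being discussed, the sketch below shows a variadic callee that has to read float arguments back with va_arg (ap, double), because they were converted to double at the call site. The function name is hypothetical and the code is not taken from fp16-variadic-1.c.

```c
#include <stdarg.h>

/* Sum COUNT floating-point arguments passed through the ellipsis.
   Float arguments undergo the default argument promotions, so each
   one must be fetched as double; asking va_arg for float here would
   be undefined behaviour.  */
static double sum_promoted (int count, ...)
{
  va_list ap;
  double total = 0.0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, double);
  va_end (ap);
  return total;
}
```

For example, sum_promoted (2, 1.5f, 2.25f) returns 3.75: both float arguments arrive as doubles.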



In TS 18661-3, _Float16 is *not* affected by default argument promotions; only
float is.  This reflects how the default conversion of float to double is a legacy
feature; note for example how in C99 and C11 float _Imaginary is not promoted to
double _Imaginary, and float _Complex is not promoted to double _Complex.

Thus it would be better for compatibility with TS 18661-3 to pass __fp16 values to
variadic functions as themselves, unpromoted.  (Formally of course the lack of
promotion is a language feature not an ABI feature; as long as va_arg for _Float16
named works correctly, you could promote at the ABI level and then convert back,
and the only effect would be that sNaNs get quieted, so passing a _Float16 sNaN
through variable arguments would act as a convertFormat operation instead of a
copy operation.  It's not clear that having such an ABI-level promotion is a good
idea, however.)

Now, in the context of the current implementation and current ACLE, arithmetic on
__fp16 values produces float results - the operands are promoted at the C language
level.  This is different from TS 18661-3, where _Float16 arithmetic produces
results whose semantic type is _Float16 but which, if FLT_EVAL_METHOD is 0, are
evaluated with excess range and precision to the range and precision of float.  So
if __fp16 and float are differently passed to variadic functions, you have the
issue that if the argument is an expression resulting from __fp16 arithmetic, the
way it is passed depends on whether current ACLE or TS 18661-3 are followed.  But
if the eventual aim is for __fp16 (when using the IEEE format rather than the
alternative format) to become just a typedef for _Float16, then these issues will
need to be addressed.


When _Float16 support is added, the relationship between __fp16 and _Float16 is
something that will need to be considered. At the moment though, there's only the
__fp16 type and the intention with this patch is to avoid doing anything that changes
the behaviour of existing code.


Matthew

[AArch64] Remove an unused reload hook.

2016-04-28 Thread Matthew Wahab


Hello,

Yvan Roux pointed out that the patch at
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01713.html was never
committed.

From the original submission:

  The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since
  the Aarch64 backend no longer supports reload, this macro is not
  needed and this patch removes it.

This is a rebased and retested version of that patch.

Tested aarch64-none-linux-gnu with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2016-04-26  Matthew Wahab  

* config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
* config/aarch64/aarch64-protos.h
(aarch64_legitimize_reload_address): Remove.
* config/aarch64/aarch64.c (aarch64_legitimize_reload_address):
Remove.
[PATCH] [AArch64] Remove an unused reload hook.

---
 gcc/config/aarch64/aarch64-protos.h |   1 -
 gcc/config/aarch64/aarch64.c| 114 
 gcc/config/aarch64/aarch64.h|  15 -
 3 files changed, 130 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index f22a31c..6a8a850 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -339,7 +339,6 @@ int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
-rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9995494..4a1acc9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5022,120 +5022,6 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
   return x;
 }
 
-/* Try a machine-dependent way of reloading an illegitimate address
-   operand.  If we find one, push the reload and return the new rtx.  */
-
-rtx
-aarch64_legitimize_reload_address (rtx *x_p,
-   machine_mode mode,
-   int opnum, int type,
-   int ind_levels ATTRIBUTE_UNUSED)
-{
-  rtx x = *x_p;
-
-  /* Do not allow mem (plus (reg, const)) if vector struct mode.  */
-  if (aarch64_vect_struct_mode_p (mode)
-  && GET_CODE (x) == PLUS
-  && REG_P (XEXP (x, 0))
-  && CONST_INT_P (XEXP (x, 1)))
-{
-  rtx orig_rtx = x;
-  x = copy_rtx (x);
-  push_reload (orig_rtx, NULL_RTX, x_p, NULL,
-		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
-		   opnum, (enum reload_type) type);
-  return x;
-}
-
-  /* We must recognize output that we have already generated ourselves.  */
-  if (GET_CODE (x) == PLUS
-  && GET_CODE (XEXP (x, 0)) == PLUS
-  && REG_P (XEXP (XEXP (x, 0), 0))
-  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
-  && CONST_INT_P (XEXP (x, 1)))
-{
-  push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
-		   BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
-		   opnum, (enum reload_type) type);
-  return x;
-}
-
-  /* We wish to handle large displacements off a base register by splitting
- the addend across an add and the mem insn.  This can cut the number of
- extra insns needed from 3 to 1.  It is only useful for load/store of a
- single register with 12 bit offset field.  */
-  if (GET_CODE (x) == PLUS
-  && REG_P (XEXP (x, 0))
-  && CONST_INT_P (XEXP (x, 1))
-  && HARD_REGISTER_P (XEXP (x, 0))
-  && mode != TImode
-  && mode != TFmode
-  && aarch64_regno_ok_for_base_p (REGNO (XEXP (x, 0)), true))
-{
-  HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
-  HOST_WIDE_INT low = val & 0xfff;
-  HOST_WIDE_INT high = val - low;
-  HOST_WIDE_INT offs;
-  rtx cst;
-  machine_mode xmode = GET_MODE (x);
-
-  /* In ILP32, xmode can be either DImode or SImode.  */
-  gcc_assert (xmode == DImode || xmode == SImode);
-
-  /* Reload non-zero BLKmode offsets.  This is because we cannot ascertain
-	 BLKmode alignment.  */
-  if (GET_MODE_SIZE (mode) == 0)
-

[ARM] Enable __fp16 as a function parameter and return type.

2016-04-28 Thread Matthew Wahab


Hello,

The ARM target supports the half-precision floating point type __fp16
but does not allow its use as a function return or parameter type. This
patch removes that restriction and defines the ACLE macro
__ARM_FP16_ARGS to indicate this. The code generated for passing __fp16
values into and out of functions depends on the level of hardware
support but conforms to the AAPCS (see
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf).
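
As a minimal sketch of how user code might rely on the new macro (the typedef and function names are hypothetical), a value can be passed by value as __fp16 only when __ARM_FP16_ARGS is defined, falling back to float elsewhere:

```c
/* Pass half-precision values by value only when the ACLE macro says
   the target allows __fp16 parameters and return values; otherwise
   fall back to float.  half_arg_t is a hypothetical name.  */
#if defined (__ARM_FP16_ARGS)
typedef __fp16 half_arg_t;
#else
typedef float half_arg_t;
#endif

static half_arg_t scale_half (half_arg_t x)
{
  /* With __fp16, x is promoted to float for the multiplication and
     the float result is converted back to __fp16 on return.  */
  return x * 2.0f;
}
```

On targets without __ARM_FP16_ARGS this compiles to an ordinary float function, so the same source builds everywhere.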

This patch enables data movement for HF-mode values using VFP registers,
when they are available, to support passing arguments and return values
through the registers.

This patch also fixes the definition of TARGET_NEON_FP16 which used to
require both neon and fp16 features to be enabled. This was
inadvertently weakened, when the macro definition was changed to use
ARM_FPU_FSET_HAS, to only require one of neon or fp16 to be
enabled. This patch returns to the original
requirements. TARGET_NEON_FP16 is only used in instruction selection for
HF-mode data moves.
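
The exact definition of ARM_FPU_FSET_HAS is not shown here, so the following is only an assumed, simplified model of this kind of bug: checking a feature mask with a single "any bit set" test accepts a set containing either feature, whereas the original TARGET_NEON_FP16 semantics need both. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical feature bits standing in for an FPU feature set.  */
enum
{
  FEAT_NEON = 1 << 0,
  FEAT_FP16 = 1 << 1
};

/* Weakened test: true if ANY of the requested features is present.  */
static bool has_any (unsigned set, unsigned feats)
{
  return (set & feats) != 0;
}

/* Intended test: true only if ALL requested features are present.  */
static bool has_all (unsigned set, unsigned feats)
{
  return (set & feats) == feats;
}
```

With a NEON-only feature set, has_any (set, FEAT_NEON | FEAT_FP16) is true while has_all is false, which is the difference between the weakened and the restored definitions.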

Tested for arm-none-eabi with cross-compiled check-gcc and for
arm-none-linux-gnueabihf with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2016-04-27  Matthew Wahab  
Ramana Radhakrishnan  
Jiong Wang  

* config/arm/arm-c.c (arm_cpu_builtins): Use def_or_undef_macro
for __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE.
Define __ARM_FP16_ARGS when appropriate.
* config/arm/arm.c (arm_invalid_parameter_type): Remove
declaration.
(arm_invalid_return_type): Likewise.
(TARGET_INVALID_PARAMETER_TYPE): Remove.
(TARGET_INVALID_RETURN_TYPE): Remove.
(aapcs_vfp_sub_candidate): Allow HFmode.
(aapcs_vfp_allocate): Add comment.  Support HFmode.
(aapcs_vfp_allocate_return_reg): Likewise.
(struct aapcs_cp_arg_layout): Slightly reword comments for
is_return_candidate and allocate_return_reg.
(output_mov_vfp): Update assert.
(arm_hard_regno_mode_ok): Remove comment, update HF-mode
condition.
(arm_invalid_parameter_type): Remove.
(arm_invalid_return_type): Remove.
* config/arm/arm.h (TARGET_NEON_FP16): Fix definition.
* config/arm/arm.md (*arm32_movhf): Disable for TARGET_VFP.
* config/arm/vfp.md (*movhf_vfp): Enable for TARGET_VFP.

gcc/testsuite/
2016-04-27  Matthew Wahab  

* g++.dg/ext/arm-fp16/fp16-param-1.c: Update expected output.  Add
test for __ARM_FP16_ARGS.
* g++.dg/ext/arm-fp16/fp16-return-1.c: Update expected output.
* gcc.target/arm/aapcs/neon-vect10.c: New.
* gcc.target/arm/aapcs/neon-vect9.c: New.
* gcc.target/arm/aapcs/vfp18.c: New.
* gcc.target/arm/aapcs/vfp19.c: New.
* gcc.target/arm/aapcs/vfp20.c: New.
* gcc.target/arm/aapcs/vfp21.c: New.
* gcc.target/arm/fp16-aapcs-1.c: New.
* g++.target/arm/fp16-param-1.c: Update expected output.  Add
test for __ARM_FP16_ARGS.
* g++.target/arm/fp16-return-1.c: Update expected output.

Re: [PR target/70711][ARM] Fix big-endian ARMv8.1-A builds.

2016-04-18 Thread Matthew Wahab


On 18/04/16 10:41, Richard Biener wrote:

On Mon, 18 Apr 2016, Ramana Radhakrishnan wrote:


Testing for armeb-none-eabi with cross-compiled check-gcc and with
command line testing to confirm working executables are built.

Is this ok for trunk and for GCC-6 once testing is completed?


Oops, Thanks for catching this -

Ok for trunk.

Ok for GCC-6 by me but you need RM sign off before applying to the branch.


Ok.

Thanks,
Richard.



Tested and committed to trunk and GCC-6.
Matthew

[PR target/70711][ARM] Fix big-endian ARMv8.1-A builds.

2016-04-18 Thread Matthew Wahab


When ARMv8.1 support was added to the ARM target, the
bpabi.h/BE8_LINK_SPEC list wasn't updated. That means that when GCC
targets ARMv8.1 big-endian, it fails to generate working binaries.

This patch adds the required 'march=armv8.1-a' entries to
BE8_LINK_SPEC. It also adds the missing entries for armv8-a+crc.

Testing for armeb-none-eabi with cross-compiled check-gcc and with
command line testing to confirm working executables are built.

Is this ok for trunk and for GCC-6 once testing is completed?
Matthew

2016-04-18  Matthew Wahab  

PR target/70711
* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for armv8-a+crc,
armv8.1-a and armv8.1-a+crc.
From 627f689c37eeec3f0d846cda4577385158ca8d10 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 18 Apr 2016 09:32:54 +0100
Subject: [PATCH] [ARM] Fix big-endian ARMv8.1-A builds.

When ARMv8.1 support was added to the ARM target, the
bpabi.h/BE8_LINK_SPEC list wasn't updated. That means that when GCC
targets ARMv8.1 big-endian, it fails to generate working binaries.

This patch adds the required 'march=armv8.1-a' entries to
BE8_LINK_SPEC. It also adds the missing entries for armv8-a+crc.

Tested for armeb-none-eabi with cross-compiled check-gcc and with
command line testing to confirm working executables are built.

Is this ok for trunk and for GCC-6?
Matthew

2016-04-18  Matthew Wahab  

	PR target/70711
	* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for armv8-a+crc,
	armv8.1-a and armv8.1-a+crc.

Change-Id: I8a825421b6c4355d9fd611432c11a8d7b4d61bbf
---
 gcc/config/arm/bpabi.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 5d6c4ed..06488ba 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -87,6 +87,9 @@
|march=armv7e-m|mcpu=cortex-m4|mcpu=cortex-m7\
|march=armv6-m|mcpu=cortex-m0\
|march=armv8-a	\
+   |march=armv8-a+crc	\
+   |march=armv8.1-a	\
+   |march=armv8.1-a+crc	\
:%{!r:--be8}}}"
 #else
 #define BE8_LINK_SPEC \
@@ -115,6 +118,9 @@
|march=armv7e-m|mcpu=cortex-m4|mcpu=cortex-m7\
|march=armv6-m|mcpu=cortex-m0\
|march=armv8-a	\
+   |march=armv8-a+crc	\
+   |march=armv8.1-a	\
+   |march=armv8.1-a+crc	\
:%{!r:--be8}}}"
 #endif
 
-- 
2.1.4

Re: [PATCH][AArch64] Remove an unused reload hook.

2016-02-29 Thread Matthew Wahab


On 25/02/16 11:00, Yvan Roux wrote:

Hi,

On 26 January 2015 at 18:01, Matthew Wahab  wrote:

Hello,

The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the
Aarch64 backend no longer supports reload, this macro is not needed and this
patch removes it.

Tested aarch64-none-linux-gnu with gcc-check. No new failures.

Ok for trunk?
Matthew

gcc/
2015-01-26  Matthew Wahab  

 * config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
 * config/aarch64/aarch64-protos.h
 (aarch64_legitimize_reload_address): Remove.
 * config/aarch64/aarch64.c (aarch64_legitimize_reload_address):
 Remove.


It seems that this old patch was forgotten. I guess that it'll have to
wait for GCC 7 now, but I think it's a good thing to clean up the
reload-specific hooks and constructions (I have another patch for
that ongoing).



Thanks for spotting this. I'll take care of it when stage 1 opens.
Matthew

Re: [PATCH 01/15] add more coalescing to simplify constraints

2016-01-21 Thread Matthew Wahab


On 21/01/16 14:22, Matthew Wahab wrote:

On 15/01/16 17:14, Sebastian Pop wrote:

From: Sebastian Pop 

2015-12-30  Aditya Kumar  
Sebastian Pop  

* graphite-dependences.c (constrain_domain): Add call to isl_*_coalesce.
(add_pdr_constraints): Same.
(scop_get_reads): Same.
(scop_get_must_writes): Same.
(scop_get_may_writes): Same.
(scop_get_original_schedule): Same.
(extend_schedule): Same.
(apply_schedule_on_deps): Same.
(carries_deps): Same.
(compute_deps): Same.
(scop_get_dependences): Same.
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::generate_isl_schedule): Same.
* graphite-optimize-isl.c (get_schedule_for_band): Same.
(get_schedule_for_band_list): Same.
(get_schedule_map): Same.
(apply_schedule_map_to_scop): Same.
* graphite-sese-to-poly.c (build_pbb_scattering_polyhedrons): Same.
(build_loop_iteration_domains): Same.
(add_condition_to_pbb): Same.
(add_param_constraints): Same.
(pdr_add_memory_accesses): Same.
(pdr_add_data_dimensions): Same.
---
  gcc/graphite-dependences.c   | 63 ++--
  gcc/graphite-isl-ast-to-gimple.c |  2 ++
  gcc/graphite-optimize-isl.c  | 12 
  gcc/graphite-sese-to-poly.c  | 28 --
  4 files changed, 56 insertions(+), 49 deletions(-)



diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 9626e96..15dd5b0 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c

[..]

  static isl_union_map *
  get_schedule_map (isl_schedule *schedule)
  {
-  isl_band_list *bandList = isl_schedule_get_band_forest (schedule);
-  isl_union_map *schedule_map = get_schedule_for_band_list (bandList);
+  isl_band_list *band_list = isl_schedule_get_band_forest (schedule);
+  isl_union_map *schedule_map = get_schedule_for_band_list (band_list);
isl_band_list_free (bandList);
return schedule_map;
  }


Building arm-none-linux-gnueabihf fails at the isl_band_list_free. Shouldn't bandList
be band_list?


Kyrill points out that it's already fixed: 
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01613.html


Matthew

Re: [PATCH 01/15] add more coalescing to simplify constraints

2016-01-21 Thread Matthew Wahab


On 15/01/16 17:14, Sebastian Pop wrote:

From: Sebastian Pop 

2015-12-30  Aditya Kumar  
Sebastian Pop  

* graphite-dependences.c (constrain_domain): Add call to isl_*_coalesce.
(add_pdr_constraints): Same.
(scop_get_reads): Same.
(scop_get_must_writes): Same.
(scop_get_may_writes): Same.
(scop_get_original_schedule): Same.
(extend_schedule): Same.
(apply_schedule_on_deps): Same.
(carries_deps): Same.
(compute_deps): Same.
(scop_get_dependences): Same.
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::generate_isl_schedule): Same.
* graphite-optimize-isl.c (get_schedule_for_band): Same.
(get_schedule_for_band_list): Same.
(get_schedule_map): Same.
(apply_schedule_map_to_scop): Same.
* graphite-sese-to-poly.c (build_pbb_scattering_polyhedrons): Same.
(build_loop_iteration_domains): Same.
(add_condition_to_pbb): Same.
(add_param_constraints): Same.
(pdr_add_memory_accesses): Same.
(pdr_add_data_dimensions): Same.
---
  gcc/graphite-dependences.c   | 63 ++--
  gcc/graphite-isl-ast-to-gimple.c |  2 ++
  gcc/graphite-optimize-isl.c  | 12 
  gcc/graphite-sese-to-poly.c  | 28 --
  4 files changed, 56 insertions(+), 49 deletions(-)



diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 9626e96..15dd5b0 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c

[..]

  static isl_union_map *
  get_schedule_map (isl_schedule *schedule)
  {
-  isl_band_list *bandList = isl_schedule_get_band_forest (schedule);
-  isl_union_map *schedule_map = get_schedule_for_band_list (bandList);
+  isl_band_list *band_list = isl_schedule_get_band_forest (schedule);
+  isl_union_map *schedule_map = get_schedule_for_band_list (band_list);
isl_band_list_free (bandList);
return schedule_map;
  }


Building arm-none-linux-gnueabihf fails at the isl_band_list_free. Shouldn't bandList 
be band_list?


Matthew

Re: [PR tree-optimization/64946] Push integer type conversion to ABS_EXPR argument when possible.

2016-01-12 Thread Matthew Wahab


On 11/01/16 17:46, Richard Biener wrote:

On January 11, 2016 5:36:33 PM GMT+01:00, Bernd Schmidt wrote:

On 01/11/2016 05:33 PM, Matthew Wahab wrote:


The case I'm trying to fix has (short)abs((int)short_var). I'd thought that if
abs(short_var) was undefined because the result couldn't be represented then the type
conversion from int to short would also be undefined. In fact, it's implementation
defined and S4.5 of the GCC manual says that the value is reduced until it can be
represented. So (short)abs((int)short_var) will produce a value when
abs(short_var) is undefined meaning this transformation isn't correct.

I'll drop this patch.


Maybe we could have an optab and corresponding internal function for an
abs that's always defined.


I'd like to have ABSU_EXPR (or allow unsigned result on ABS_EXPR).
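
One way to picture an abs that is always defined is a function whose result type is unsigned, so that negating INT_MIN wraps modulo 2^N instead of overflowing. This is a sketch of the idea with a hypothetical name; it does not describe any existing GCC tree code or optab.

```c
#include <limits.h>

/* Absolute value with an unsigned result: defined for every int,
   including INT_MIN, because the negation is done in unsigned
   arithmetic, which wraps rather than overflowing.  */
static unsigned int absu (int x)
{
  return x < 0 ? 0u - (unsigned int) x : (unsigned int) x;
}
```

Here absu (INT_MIN) is INT_MAX + 1u, a value that a signed abs cannot represent.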



I'll see if I can do anything along those lines. This looks like something for stage
1 though.


Matthew

Re: [PR tree-optimization/64946] Push integer type conversion to ABS_EXPR argument when possible.

2016-01-11 Thread Matthew Wahab


On 08/01/16 22:24, Joseph Myers wrote:

On Fri, 8 Jan 2016, Matthew Wahab wrote:


Hello,

The C/C++ front-ends apply type conversions to expressions using ABS
with integral arguments of type smaller than int. This means that, for
short x, ABS(x) becomes something like (short)ABS((int)x). When the
argument is the result of memory access, this restricts the vectorizer
apparently because the alignment of the operation doesn't match the
alignment of the memory access. This causes two failures in
gcc.target/aarch64/vec-abs-compile.c where the expected abs instructions
aren't generated.

This patch adds a case to convert_to_integer_1 to push the integer type
conversions applied to an ABS expression into the argument, when it is
safe to do so. This fixes the failing test cases.


Note that, for example, (int) labs (INT_MIN) is well-defined if long is
wider than int (in GNU C, where conversion of out-of-range values to
signed types is well-defined).  But abs (INT_MIN) is undefined.  That is,
such transformations can never be safe unless the minimum value of the
type of the ABS_EXPR argument is not a possible value of the contained
expression.


The case I'm trying to fix has (short)abs((int)short_var). I'd thought that if
abs(short_var) was undefined because the result couldn't be represented then the type
conversion from int to short would also be undefined. In fact, it's implementation
defined and S4.5 of the GCC manual says that the value is reduced until it can be
represented. So (short)abs((int)short_var) will produce a value when abs(short_var)
is undefined meaning this transformation isn't correct. I'll drop this patch.
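
A small illustration of why the transformation is wrong, assuming int is wider than short and GCC's documented modulo reduction for out-of-range conversions to signed types: the front end's form (short)abs((int)x) yields SHRT_MIN for x == SHRT_MIN, whereas an ABS_EXPR evaluated directly in type short has no defined result for that input. The function name is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* The form the front end produces for abs on a short: promote to
   int, take abs in int (always representable when int is wider than
   short), then narrow.  The narrowing is implementation-defined in
   ISO C; GCC reduces the value modulo 2^N, so 32768 becomes -32768.  */
static short abs_via_int (short x)
{
  return (short) abs ((int) x);
}
```

So abs_via_int (SHRT_MIN) == SHRT_MIN under GCC, which an ABS_EXPR on short (undefined for SHRT_MIN, and assumed nonnegative elsewhere) would not preserve.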


Thanks,
Matthew


What's your definition of safe?  I'd expect a patch like this to contain
lots of testcases verifying that the transformation occurs in cases where
it's OK, but that (int) labs (int_var) does not end up with an ABS_EXPR on
type int (for example) (in the absence of a guarantee that said ABS_EXPR
will return INT_MIN for an argument of INT_MIN - which might be the case
for -fwrapv, but can't be otherwise because *_nonnegative*_p assume
ABS_EXPR always produces a nonnegative result).

[PR tree-optimization/64946] Push integer type conversion to ABS_EXPR argument when possible.

2016-01-08 Thread Matthew Wahab


Hello,

The C/C++ front-ends apply type conversions to expressions using ABS
with integral arguments of type smaller than int. This means that, for
short x, ABS(x) becomes something like (short)ABS((int)x). When the
argument is the result of memory access, this restricts the vectorizer
apparently because the alignment of the operation doesn't match the
alignment of the memory access. This causes two failures in
gcc.target/aarch64/vec-abs-compile.c where the expected abs instructions
aren't generated.

This patch adds a case to convert_to_integer_1 to push the integer type
conversions applied to an ABS expression into the argument, when it is
safe to do so. This fixes the failing test cases.

Tested aarch64-none-elf with cross-compiled check-gcc. Also tested
aarch64-none-linux-gnu and x86_64-linux-gnu with native bootstrap and
make check.

Ok for trunk?
Matthew

gcc/
2016-01-07  Matthew Wahab  

PR tree-optimization/64946
* convert.c (convert_to_integer_1): Push narrowing type
conversions for ABS_EXPR into the argument when possible.

diff --git a/gcc/convert.c b/gcc/convert.c
index 4b1e1f1..95ff1e2 100644
--- a/gcc/convert.c
+++ b/gcc/convert.c
@@ -852,6 +852,24 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
 	  }
 	  break;
 
+	case ABS_EXPR:
+	  /* Convert (N) abs ((W)x) -> abs ((N)x)
+	 if N, W and the type T of x are all signed integer types
+	 and the precision of N is >= the precision of T.  */
+	  {
+	tree arg0 = get_unwidened (TREE_OPERAND (expr, 0), type);
+	tree arg0_type = TREE_TYPE (arg0);
+
+	if (!TYPE_UNSIGNED (type)
+		&& !TYPE_UNSIGNED (arg0_type)
+		&& outprec >= TYPE_PRECISION (arg0_type))
+	  {
+		return fold_build1 (ABS_EXPR, type,
+convert (type, arg0));
+	  }
+	break;
+	  }
+
 	case NEGATE_EXPR:
 	case BIT_NOT_EXPR:
 	  /* This is not correct for ABS_EXPR,

Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-12-16 Thread Matthew Wahab


On 10/12/15 11:02, Ramana Radhakrishnan wrote:

On Thu, Dec 10, 2015 at 10:43 AM, Ramana Radhakrishnan
 wrote:

On Mon, Dec 7, 2015 at 4:04 PM, Matthew Wahab
 wrote:

Ping. Updated patch attached.
Matthew


On 26/11/15 15:55, Matthew Wahab wrote:


Hello,


ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable FPU options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.


[..]

gcc/
2015-11-26  Matthew Wahab  

  * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
  * config/arm/arm-protos.h (FL2_ARCH8_1): New.
  (FL2_FOR_ARCH8_1A): New.
  * config/arm/arm-tables.opt: Regenerate.
  * config/arm/arm.c (arm_arch8_1): New.
  (arm_option_override): Set arm_arch8_1.
  * config/arm/arm.h (TARGET_NEON_RDMA): New.
  (arm_arch8_1): Declare.
  * doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
  "armv8.1-a+crc".
  (ARM Options, -mfpu): Fix a typo.





OK.


I couldn't find 0/7 but in addition here you need to update the output
for TAG_FP_SIMD_Arch to be 4.

regards
Ramana


After discussing this offline, it turns out that the relevant attribute 
(Tag_Advanced_SIMD_arch) is set by the assembler.


Matthew

Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-12-15 Thread Matthew Wahab


On 10/12/15 10:49, Ramana Radhakrishnan wrote:

On Mon, Dec 7, 2015 at 4:10 PM, Matthew Wahab wrote:

On 27/11/15 17:11, Matthew Wahab wrote:

On 27/11/15 13:44, Christophe Lyon wrote:

On 26/11/15 16:02, Matthew Wahab wrote



This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM tests to
specify targets and to set up command line options. It builds on the
ARMv8.1 target support added for AArch64 tests, partly reworking that
support to take into account the different configurations that tests may
be run under.

[..]

# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.  Record the command
+# line options that needed.


s/that//


Fixed in attached patch.


Can you also make sure doc/sourcebuild.texi is updated for this helper function?
If not documented, it would be good to add the documentation for the same while you
are here.


Done, I've listed them as ARM attributes based on their names.

Tested this and the other update patch (#4/7) for arm-none-eabi with cross-compiled
check-gcc by running the gcc.target/aarch64/advsimd-intrinsics with and without
ARMv8.1 enabled as a test target.


Ok?
Matthew

testsuite/
2015-12-14  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
the command line options.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
test to allow ARM targets.  Select and record a working set of
command line options.
(check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
targets.

gcc/
2015-12-14  Matthew Wahab  

* doc/sourcebuild.texi (ARM-specific attributes): Add
"arm_v8_1a_neon_ok" and "arm_v8_1a_neon_hw".

From d6a4dfd89cfb29aeaa0e2d58ac9d8271b31879c1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 9 Oct 2015 17:38:12 +0100
Subject: [PATCH 5/7] [Testsuite] Support ARMv8.1 NEON on ARM.

---
 gcc/doc/sourcebuild.texi  |  9 ++
 gcc/testsuite/lib/target-supports.exp | 60 ++-
 2 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 61de4a5..cd49e6d8 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1582,6 +1582,15 @@ Some multilibs may be incompatible with these options.
 ARM target supports @code{-mfpu=neon-fp-armv8 -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
 
+@item arm_v8_1a_neon_ok
+ARM target supports options to generate ARMv8.1 Adv.SIMD instructions.
+Some multilibs may be incompatible with these options.
+
+@item arm_v8_1a_neon_hw
+ARM target supports executing ARMv8.1 Adv.SIMD instructions.  Some
+multilibs may be incompatible with the options needed.  Implies
+arm_v8_1a_neon_ok.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8d28b23..a0de314 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2825,14 +2825,15 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
-# Add the options needed for ARMv8.1 Adv.SIMD.
+# Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
+# options for AArch64 and for ARM.
 
 proc add_options_for_arm_v8_1a_neon { flags } {
-if { [istarget aarch64*-*-*] } {
-	return "$flags -march=armv8.1-a"
-} else {
+if { ! [check_effective_target_arm_v8_1a_neon_ok] } {
 	return "$flags"
 }
+global et_arm_v8_1a_neon_flags
+return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
 proc add_options_for_arm_crc { flags } {
@@ -3280,17 +3281,33 @@ proc check_effective_target_arm_neonv2_hw { } {
 }
 
 # Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.  Record the command
+# line options needed.
 
 proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
-if { ![istarget aarch64*-*-*] } {
-	return 0
+global et_arm_v8_1a_neon_flags
+set et_arm_v8_1a_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+	return 0;
 }
-return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
-	#if !defined (__ARM_FEATURE_QRDMX)
-	#error "__ARM_FEATURE_QRDMX not defined"
-	#endif
-} [add_options_for_arm_v8_1a_neon ""]]
+
+# Iterate through sets of options to find the compiler flags that
+# need to be added to the -march option.  Start with the empty set
+# sin

Re: [PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-12-15 Thread Matthew Wahab


On 10/12/15 10:45, Ramana Radhakrishnan wrote:

On Tue, Dec 8, 2015 at 7:45 AM, Christian Bruel  wrote:

Hi Matthew,


On 26/11/15 16:01, Matthew Wahab wrote:


Hello,

This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
defined when the instructions are available, as it is when
-march=armv8.1-a is enabled with suitable fpu options.

gcc/
2015-11-26  Matthew Wahab  

   * config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_QRDMX.





+  if (TARGET_NEON_RDMA)
+builtin_define ("__ARM_FEATURE_QRDMX");
+

Since it depends on TARGET_NEON, could you please use

   def_or_undef_macro (pfile, "__ARM_FEATURE_QRDMX", TARGET_NEON_RDMA);

instead ?


I think that's what it should be -

OK with that fixed.


Attached is an updated patch using def_or_undef_macro. It also removes
some trailing whitespace in that part of the code.


Still ok?
Matthew

gcc/
2015-12-14  Matthew Wahab  

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_QRDMX.  Clean up some trailing whitespace.


From 8cce5cd7b6d89c49dcf694a5c72ab0ed7c26fe20 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 13:25:09 +0100
Subject: [PATCH 4/7] [ARM] Add __ARM_FEATURE_QRDMX

---
 gcc/config/arm/arm-c.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 7dee28e..a980ed8 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -62,19 +62,21 @@ static void
 arm_cpu_builtins (struct cpp_reader* pfile)
 {
   def_or_undef_macro (pfile, "__ARM_FEATURE_DSP", TARGET_DSP_MULTIPLY);
-  def_or_undef_macro (pfile, "__ARM_FEATURE_QBIT", TARGET_ARM_QBIT); 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_QBIT", TARGET_ARM_QBIT);
   def_or_undef_macro (pfile, "__ARM_FEATURE_SAT", TARGET_ARM_SAT);
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRYPTO", TARGET_CRYPTO);
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_UNALIGNED", unaligned_access);
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_QRDMX", TARGET_NEON_RDMA);
+
   if (TARGET_CRC32)
 builtin_define ("__ARM_FEATURE_CRC32");
 
-  def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT); 
+  def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
   if (TARGET_ARM_FEATURE_LDREX)
-builtin_define_with_int_value ("__ARM_FEATURE_LDREX", 
+builtin_define_with_int_value ("__ARM_FEATURE_LDREX",
    TARGET_ARM_FEATURE_LDREX);
   else
 cpp_undef (pfile, "__ARM_FEATURE_LDREX");
-- 
2.1.4

Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-09 Thread Matthew Wahab


On 08/12/15 09:25, Tobias Burnus wrote:

On Mon, Dec 07, 2015 at 02:09:22PM +, Matthew Wahab wrote:

I wonder whether using
__asm__ __volatile__ ("":::"memory");
would be sufficient as it has a way lower overhead than
__sync_synchronize().


I don't know anything about Fortran or coarrays and I'm curious
whether this affects architectures with weak memory models. Is the
barrier only needed to stop reordering by the compiler or does it
also need to stop reordering by the hardware?


Short answer: I think no mfence is needed as either the communication
is local (to the thread/process) - in which case the hardware will act
correctly - or the communication is remote (different thread, process,
communication to different computer via interlink [ethernet, infiniband,
...); and in the latter case, the communication library has to deal with
it.


Thanks for explaining this, it made things clear. Based on your
description, I agree that hardware reordering shouldn't be a problem.
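For readers less familiar with the distinction, here is a minimal C sketch of the two barriers compared in this thread, assuming GCC-compatible inline asm and builtins. `hidden_handle`, `send_via_handle` and `store_send_load` are hypothetical names that mimic the shape of the `_gfortran_caf_send` call in the tree dump; they are not libgfortran code.

```c
/* Compiler-only barrier: an empty asm statement with a "memory" clobber.
   GCC must assume any memory may have been read or written here, so it
   cannot keep memory values cached in registers across the statement or
   move loads and stores past it.  No instruction is emitted, so the CPU
   may still reorder accesses as observed by other threads.  */
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

/* Full barrier: acts as a compiler barrier and also emits a hardware
   fence on targets that need one (for example a DMB on ARM).  */
#define FULL_BARRIER() __sync_synchronize ()

/* Hypothetical stand-in for a communication library call that writes the
   variable through a handle the caller cannot see.  */
static int *hidden_handle;

__attribute__ ((noinline)) static void
send_via_handle (int value)
{
  *hidden_handle = value;
}

/* Same shape as the quoted dump: store, library call, barrier, load.  */
int
store_send_load (int *var, int use_full_barrier)
{
  *var = 4;
  send_via_handle (42);
  if (use_full_barrier)
    FULL_BARRIER ();
  else
    COMPILER_BARRIER ();
  return *var;  /* Read from memory again after the barrier.  */
}
```

In this sketch `var` is not `restrict`, so the call alone already forces the reload; the barrier is there for the case discussed in the thread, where the compiler would otherwise assume the call cannot modify `*var`.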



and the (main) program code (slightly trimmed):

   static void * restrict caf_token.0;
   static integer(kind=4) * restrict var;
   void _caf_init.1 (void);

   *var = 4;

   desc.3.data = 42;
   _gfortran_caf_send (caf_token.0, 0B /* offset */ var,
   _gfortran_caf_this_image (0), &desc.2, 0B, &desc.3, 4, 4, 0);
   __asm__ __volatile__("":::"memory");  // new
   tmp = *var;

The problem is that in that case the compiler does not know that
"_gfortran_caf_send (caf_token.0," can modify "*var".



Is the restrict attribute on var correct? From what you say, it sounds
like *var could be accessed through other pointers (assuming restrict has
the same meaning as in C).
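For reference, a small sketch of what `restrict` promises in C, which is the assumption behind this question. `accumulate` is a hypothetical example, not code from the patch or from libgfortran.

```c
#include <stddef.h>

/* restrict promises that, for the duration of the call, the int at *dst
   is only accessed through dst and the elements of src only through src.
   The compiler may therefore keep *dst in a register for the whole loop
   instead of reloading it after each read of src[i].  */
void
accumulate (int *restrict dst, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    *dst += src[i];
}
```

Calling `accumulate (&vals[0], vals, 3)` would break that promise, since `vals[0]` is then modified through `dst` and read through `src`, and the behaviour is undefined. That is the kind of hidden access through another pointer that a `restrict` on `var` rules out.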


Matthew

Re: [C] Issue an error on scalar va_list with reverse storage order

2015-12-08 Thread Matthew Wahab


Hello

On 03/12/15 14:53, Eric Botcazou wrote:

further testing revealed an issue with va_arg handling and reverse scalar
storage order on some platforms: when va_list is scalar, passing a field of
a structure with reverse SSO as first argument to va_start/va_arg/va_end
doesn't work because the machinery takes its address and this is not
allowed for such a field (it's really a corner case but
gcc.c-torture/execute/stdarg-2.c does exercise it). Hence the attached
patch which issues an error in this case.


The new gcc.dg/sso-9.c test is failing for aarch64 and arm targets. No
error is generated when I compile the test from the command line for
aarch64-none-elf, whereas GCC for x86_64 does generate the error.

Matthew


2015-12-03  Eric Botcazou  

* c-tree.h (c_build_va_arg): Adjust prototype.
* c-parser.c (c_parser_postfix_expression): Adjust call to above.
* c-typeck.c (c_build_va_arg): Rename LOC parameter to LOC2, add LOC1
parameter, adjust throughout and issue an error if EXPR is a component
with reverse storage order.


2015-12-03  Eric Botcazou  

* gcc.dg/sso-9.c: New test.

Re: [PATCH 6/7][ARM] Add ACLE intrinsics vqrdmlah and vqrdmlsh

2015-12-07 Thread Matthew Wahab


Ping. Updated patch attached.
Matthew

On 26/11/15 16:04, Matthew Wahab wrote:

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah and vqrdmlsh forms of the intrinsics to
the arm_neon.h header, together with the ARM builtins used to implement
them. The intrinsics are available when -march=armv8.1-a is enabled
together with appropriate fpu options.
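The new intrinsics can only be exercised on an ARMv8.1 target. As a host-side cross-check, here is a hypothetical scalar model of `vqrdmlah_s16` and `vqrdmlsh_s16`, written from my reading of the ARMv8.1 VQRDMLAH/VQRDMLSH pseudocode: shift the accumulator up by the element size, add (or subtract) the doubled product plus a rounding constant, shift back down and saturate. The `model_*` and `qrdml_lane16` names are illustrative only and are not part of arm_neon.h.

```c
#include <stdint.h>

/* One 16-bit lane: saturate ((acc * 2^16 +/- 2*b*c + 2^15) >> 16).
   sub == 0 models VQRDMLAH, sub != 0 models VQRDMLSH.  */
static int16_t
qrdml_lane16 (int16_t acc, int16_t b, int16_t c, int sub)
{
  int64_t prod = 2 * (int64_t) b * c;
  int64_t accum = (int64_t) acc * 65536 + (sub ? -prod : prod) + (1 << 15);
  accum >>= 16;  /* Arithmetic shift with GCC and Clang.  */
  if (accum > INT16_MAX)
    return INT16_MAX;
  if (accum < INT16_MIN)
    return INT16_MIN;
  return (int16_t) accum;
}

/* Hypothetical models of vqrdmlah_s16 and vqrdmlsh_s16 (4 x int16).  */
void
model_vqrdmlah_s16 (int16_t r[4], const int16_t a[4], const int16_t b[4],
                    const int16_t c[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = qrdml_lane16 (a[i], b[i], c[i], 0);
}

void
model_vqrdmlsh_s16 (int16_t r[4], const int16_t a[4], const int16_t b[4],
                    const int16_t c[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = qrdml_lane16 (a[i], b[i], c[i], 1);
}
```

For example, with Q15 inputs 0.5 * 0.5 (16384 * 16384) and a zero accumulator, the model gives 8192 for vqrdmlah and -8192 for vqrdmlsh, and (-32768) * (-32768) saturates to 32767.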

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
 (vqrdmlahq_s16, vqrdmlahq_s32): New.
 (vqrdmlsh_s16, vqrdmlsh_s32): New.
 (vqrdmlshq_s16, vqrdmlshq_s32): New.
 * config/arm/arm_neon_builtins.def: Add "vqrdmlah" and "vqrdmlsh".



From 1844027592d818e0de53a3da904ae6bfe1aef534 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:21:44 +0100
Subject: [PATCH 6/7] [ARM] Add neon intrinsics vqrdmlah, vqrdmlsh.

Change-Id: Ic40ff4d477f36ec01714c68e3b83b66208c7958b
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21..b617f80 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1158,6 +1158,56 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
   return (int32x4_t)__builtin_neon_vqrdmulhv4si (__a, __b);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlshv4si (__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmull_s8 (int8x8_t __a, int8x8_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 0b719df..8d5c0ca 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -45,6 +45,8 @@ VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si)
 VAR2 (TERNOP, vqdmlal, v4hi, v2si)
 VAR2 (TERNOP, vqdmlsl, v4hi, v2si)
+VAR4 (TERNOP, vqrdmlah, v4hi, v2si, v8hi, v4si)
+VAR4 (TERNOP, vqrdmlsh, v4hi, v2si, v8hi, v4si)
 VAR3 (BINOP, vmullp, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmulls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmullu, v8qi, v4hi, v2si)
-- 
2.1.4

Re: [PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

2015-12-07 Thread Matthew Wahab


Ping. Updated patch attached.
Matthew

On 26/11/15 16:10, Matthew Wahab wrote:

Attached the missing patch.
Matthew

On 26/11/15 16:04, Matthew Wahab wrote:

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah_lane and vqrdmlsh_lane forms of the
intrinsics to the arm_neon.h header, together with the ARM builtins
used to implement them. The intrinsics are available when
-march=armv8.1-a is enabled together with appropriate fpu options.
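The `_lane` forms differ from the plain ones only in that every element of `b` is multiplied by the single element of `c` selected by the lane index. A hypothetical host-side model under the same reading of the ARMv8.1 pseudocode (the names are illustrative, not arm_neon.h API):

```c
#include <stdint.h>

/* One 16-bit lane: saturate ((acc * 2^16 +/- 2*b*c + 2^15) >> 16).  */
static int16_t
qrdml_lane16 (int16_t acc, int16_t b, int16_t c, int sub)
{
  int64_t prod = 2 * (int64_t) b * c;
  int64_t accum = (int64_t) acc * 65536 + (sub ? -prod : prod) + (1 << 15);
  accum >>= 16;  /* Arithmetic shift with GCC and Clang.  */
  if (accum > INT16_MAX)
    return INT16_MAX;
  if (accum < INT16_MIN)
    return INT16_MIN;
  return (int16_t) accum;
}

/* Model of vqrdmlah_lane_s16 (sub == 0) and vqrdmlsh_lane_s16 (sub != 0).
   In the real intrinsics the lane index must be a compile-time constant
   in the range 0..3; here it is an ordinary argument.  */
void
model_vqrdmlh_lane_s16 (int16_t r[4], const int16_t a[4], const int16_t b[4],
                        const int16_t c[4], int lane, int sub)
{
  for (int i = 0; i < 4; i++)
    r[i] = qrdml_lane16 (a[i], b[i], c[lane], sub);
}
```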

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
 (vqrdmlahq_lane_s32): New.
 (vqrdmlah_lane_s16): New.
 (vqrdmlah_lane_s32): New.
 (vqrdmlshq_lane_s16): New.
 (vqrdmlshq_lane_s32): New.
 (vqrdmlsh_lane_s16): New.
 (vqrdmlsh_lane_s32): New.
 * config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
 "vqrdmlsh_lane".





From 9928f1e8e30c500933fa68f95311cf0f78dd6712 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:22:34 +0100
Subject: [PATCH 7/7] [ARM] Add neon intrinsics vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: Ia0ab4bbe683af2d019d18a34302a7b9798193a79
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b617f80..ed50253 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -7096,6 +7096,56 @@ vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vqrdmulh_lanev2si (__a, __b, __c);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlah_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlah_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlah_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlah_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlsh_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlsh_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlsh_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlsh_lanev2si (__a, __b, __c, __d);
+}
+#endif
+
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmul_n_s16 (int16x4_t __a, int16_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 8d5c0ca..1fdb2a8 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -60,6 +60,8 @@ VAR4 (BINOP, vqdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqdmulh_lane, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqrdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlah_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlsh_lane, v4hi, v2si, v8hi, v4si)
 VAR2 (BINOP, vqdmull, v4hi, v2si)
 VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-- 
2.1.4

Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-12-07 Thread Matthew Wahab


On 27/11/15 17:11, Matthew Wahab wrote:

On 27/11/15 13:44, Christophe Lyon wrote:

On 26/11/15 16:02, Matthew Wahab wrote:



This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.



I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
supported by aarch64-gcc. So it seems to me that
check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
for aarch64 after your patch.



Or does it work because no option is needed and thus "" always
matches and thus the loop always exits after the first iteration
on aarch64?


Yes, the idea is that the empty string will make the function first try
'-march=armv8.1-a' without any other flag. That will work for AArch64 because it
doesn't need any other option.


Maybe a more accurate comment would help remembering that, in case
-mfpu option becomes necessary for aarch64.



Agreed, it's worth having a comment to explain what the 'foreach'
construct is doing.

Matthew


I've added a comment to the foreach construct, to make it clearer what
it's doing.

Matthew

testsuite/
2015-12-07  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
the command line options.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
test to allow ARM targets.  Select and record a working set of
command line options.
(check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
targets.

From 7e2cd1ef475a5c7f4a4722b9ba32bd46e3b30eae Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 9 Oct 2015 17:38:12 +0100
Subject: [PATCH 5/7] [Testsuite] Support ARMv8.1 NEON on ARM.

---
 gcc/testsuite/lib/target-supports.exp | 60 ++-
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4e349e9..6dfb6f6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2816,14 +2816,15 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
-# Add the options needed for ARMv8.1 Adv.SIMD.
+# Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
+# options for AArch64 and for ARM.
 
 proc add_options_for_arm_v8_1a_neon { flags } {
-if { [istarget aarch64*-*-*] } {
-	return "$flags -march=armv8.1-a"
-} else {
+if { ! [check_effective_target_arm_v8_1a_neon_ok] } {
 	return "$flags"
 }
+global et_arm_v8_1a_neon_flags
+return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
 proc add_options_for_arm_crc { flags } {
@@ -3271,17 +3272,33 @@ proc check_effective_target_arm_neonv2_hw { } {
 }
 
 # Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.  Record the command
+# line options that needed.
 
 proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
-if { ![istarget aarch64*-*-*] } {
-	return 0
+global et_arm_v8_1a_neon_flags
+set et_arm_v8_1a_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+	return 0;
 }
-return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
-	#if !defined (__ARM_FEATURE_QRDMX)
-	#error "__ARM_FEATURE_QRDMX not defined"
-	#endif
-} [add_options_for_arm_v8_1a_neon ""]]
+
+# Iterate through sets of options to find the compiler flags that
+# need to be added to the -march option.  Start with the empty set
+# since AArch64 only needs the -march setting.
+foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \
+		   "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache arm_v8_1a_neon_ok object {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+	} "$flags -march=armv8.1-a"] } {
+	set et_arm_v8_1a_neon_flags "$flags -march=armv8.1-a"
+	return 1
+	}
+}
+
+return 0;
 }
 
 proc check_effective_target_arm_v8_1a_neon_ok { } {
@@ -3308,16 +3325,17 @@ proc check_effective_target_arm_v8_neon_hw { } {
 }
 
 # Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.
 
 proc check_effective_target_arm_v8_1a_neon_hw { } {
 if { ![check_effective_target_arm_v8_1a_neon_o

Re: [PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-12-07 Thread Matthew Wahab


Ping. Updated patch attached.
Matthew


On 26/11/15 16:01, Matthew Wahab wrote:

Hello,

This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
defined when the instructions are available, as it is when
-march=armv8.1-a is enabled with suitable fpu options.
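Code that consumes the new macro would typically use the intrinsic only when `__ARM_FEATURE_QRDMX` is defined and fall back to portable C otherwise. A minimal sketch under that assumption; `q15_mla` and `q15_mla_scalar` are hypothetical helpers, and the fallback follows my reading of the ARMv8.1 VQRDMLAH pseudocode rather than any code in this patch.

```c
#include <stddef.h>
#include <stdint.h>

#if defined (__ARM_FEATURE_QRDMX)
#include <arm_neon.h>
#endif

/* Portable lane: saturate ((acc * 2^16 + 2*b*c + 2^15) >> 16).  */
static int16_t
q15_mla_scalar (int16_t acc, int16_t b, int16_t c)
{
  int64_t accum = (int64_t) acc * 65536 + 2 * (int64_t) b * c + (1 << 15);
  accum >>= 16;  /* Arithmetic shift with GCC and Clang.  */
  if (accum > INT16_MAX)
    return INT16_MAX;
  if (accum < INT16_MIN)
    return INT16_MIN;
  return (int16_t) accum;
}

/* dst[i] = vqrdmlah (dst[i], b[i], c[i]) for n elements.  */
void
q15_mla (int16_t *dst, const int16_t *b, const int16_t *c, size_t n)
{
  size_t i = 0;
#if defined (__ARM_FEATURE_QRDMX)
  /* Only compiled when the feature macro is defined, for instance with
     -march=armv8.1-a -mfpu=neon-fp-armv8 -mfloat-abi=hard.  */
  for (; i + 4 <= n; i += 4)
    vst1_s16 (dst + i, vqrdmlah_s16 (vld1_s16 (dst + i), vld1_s16 (b + i),
                                     vld1_s16 (c + i)));
#endif
  /* Remaining elements, or all of them when QRDMX is unavailable.  */
  for (; i < n; i++)
    dst[i] = q15_mla_scalar (dst[i], b[i], c[i]);
}
```

Both paths are meant to give the same results, so the same test data can be used on a host build and on an ARMv8.1 target.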

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_QRDMX.



From 721586aad45f7f75a0c198517602125c9d8f76f2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 13:25:09 +0100
Subject: [PATCH 4/7] [ARM] Add __ARM_FEATURE_QRDMX

Change-Id: I26cde507e8844a731e4fd857fbd30bf87f213f89
---
 gcc/config/arm/arm-c.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 7dee28e..62c9304 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -68,6 +68,9 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_UNALIGNED", unaligned_access);
 
+  if (TARGET_NEON_RDMA)
+builtin_define ("__ARM_FEATURE_QRDMX");
+
   if (TARGET_CRC32)
 builtin_define ("__ARM_FEATURE_CRC32");
 
-- 
2.1.4

Re: [PATCH 3/7][ARM] Add patterns for new instructions

2015-12-07 Thread Matthew Wahab


Ping. Updated patch attached.
Matthew

On 26/11/15 16:00, Matthew Wahab wrote:

Hello,

This patch adds patterns for the instructions, vqrdmlah and vqrdmlsh,
introduced in the ARMv8.1 architecture. The instructions are made
available when -march=armv8.1-a is enabled with suitable fpu settings,
such as -mfpu=neon-fp-armv8 -mfloat-abi=hard.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/iterators.md (VQRDMLH_AS): New.
 (neon_rdma_as): New.
 * config/arm/neon.md
 (neon_vqrdmlh): New.
 (neon_vqrdmlh_lane): New.
 * config/arm/unspecs.md (UNSPEC_VQRDMLAH): New.
 (UNSPEC_VQRDMLSH): New.



From 8b69bae2f0057be09d3cbe3fe3c29155085e260d Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 12:00:50 +0100
Subject: [PATCH 3/7] [ARM] Add patterns for new instructions.

Change-Id: Ia84c345019c7beda2d3c6c39074043d2e005347a
---
 gcc/config/arm/iterators.md |  5 +
 gcc/config/arm/neon.md  | 45 +
 gcc/config/arm/unspecs.md   |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6a54125..c7a6880 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -362,6 +362,8 @@
 (define_int_iterator CRYPTO_SELECTING [UNSPEC_SHA1C UNSPEC_SHA1M
UNSPEC_SHA1P])
 
+(define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
+
 ;;
 ;; Mode attributes
 ;;
@@ -831,3 +833,6 @@
(simple_return " && use_simple_return_p ()")])
 (define_code_attr return_cond_true [(return " && USE_RETURN_INSN (TRUE)")
(simple_return " && use_simple_return_p ()")])
+
+;; Attributes for VQRDMLAH/VQRDMLSH
+(define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 62fb6da..844ef5e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2014,6 +2014,18 @@
   [(set_attr "type" "neon_sat_mul_")]
 )
 
+;; vqrdmlah, vqrdmlsh
+(define_insn "neon_vqrdmlh"
+  [(set (match_operand:VMDQI 0 "s_register_operand" "=w")
+	(unspec:VMDQI [(match_operand:VMDQI 1 "s_register_operand" "0")
+		   (match_operand:VMDQI 2 "s_register_operand" "w")
+		   (match_operand:VMDQI 3 "s_register_operand" "w")]
+		  VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+  "vqrdmlh.\t%0, %2, %3"
+  [(set_attr "type" "neon_sat_mla__long")]
+)
+
 (define_insn "neon_vqdmlal"
   [(set (match_operand: 0 "s_register_operand" "=w")
 (unspec: [(match_operand: 1 "s_register_operand" "0")
@@ -3176,6 +3188,39 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_sat_mul__scalar_q")]
 )
 
+;; vqrdmlah_lane, vqrdmlsh_lane
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMQI 0 "s_register_operand" "=w")
+	(unspec:VMQI [(match_operand:VMQI 1 "s_register_operand" "0")
+		  (match_operand:VMQI 2 "s_register_operand" "w")
+		  (match_operand: 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%q0, %q2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMDI 0 "s_register_operand" "=w")
+	(unspec:VMDI [(match_operand:VMDI 1 "s_register_operand" "0")
+		  (match_operand:VMDI 2 "s_register_operand" "w")
+		  (match_operand:VMDI 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%P0, %P2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
 (define_insn "neon_vmla_lane"
   [(set (match_operand:VMD 0 "s_register_operand" "=w")
 	(unspec:VMD [(match_operand:VMD 1 "s_register_operand" "0")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 67acafd..ffe703c 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -360,5 +360,7 @@
   UNSPEC_NVRINTX
   UNSPEC_NVRINTA
   UNSPEC_NVRINTN
+  UNSPEC_VQRDMLAH
+  UNSPEC_VQRDMLSH
 ])
 
-- 
2.1.4

Re: [PATCH 2/7][ARM] Multilib support for ARMv8.1.

2015-12-07 Thread Matthew Wahab


Ping. Updated patch attached.
Matthew

On 26/11/15 15:58, Matthew Wahab wrote:

This patch sets up multilib support for ARMv8.1, treating it as a
synonym for ARMv8. Since ARMv8.1 integer, FP or SIMD
instructions are only generated for the new, instruction-specific
intrinsics, mapping to ARMv8 rather than adding a new multilib variant
is sufficient.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/t-aprofile: Make "armv8.1-a" and "armv8.1-a+crc"
 matches for "armv8-a".



From c5c0f983e03135fe0cde29077353b429c0c502a2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 23 Oct 2015 09:37:12 +0100
Subject: [PATCH 2/7] [ARM] Multilib support for ARMv8.1

Change-Id: I65ee77768e22452ac15452cf6d4fdec3079ef852
---
 gcc/config/arm/t-aprofile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index cf34161..b23f1bc 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -98,6 +98,8 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?xgene1
 
 # Arch Matches
 MULTILIB_MATCHES   += march?armv8-a=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a+crc
 
 # FPU matches
 MULTILIB_MATCHES   += mfpu?vfpv3-d16=mfpu?vfpv3
-- 
2.1.4

Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-12-07 Thread Matthew Wahab


Ping. Updated patch attached.
Matthew

On 26/11/15 15:55, Matthew Wahab wrote:

Hello,


ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable fpu options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.

This patch adds the command line options and internal feature
macros. The following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrdml{as}h and
- add the ACLE intrinsics for vqrdml{as}h_lane.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Is this ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
 * config/arm/arm-protos.h (FL2_ARCH8_1): New.
 (FL2_FOR_ARCH8_1A): New.
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/arm.c (arm_arch8_1): New.
 (arm_option_override): Set arm_arch8_1.
 * config/arm/arm.h (TARGET_NEON_RDMA): New.
 (arm_arch8_1): Declare.
 * doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
 "armv8.1-a+crc".
 (ARM Options, -mfpu): Fix a typo.


From 65bcf9a875fd31f6201e64cbbd4fdfb0b8f4719e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 11:31:25 +0100
Subject: [PATCH 1/7] [ARM] Add ARMv8.1 architecture flags and options.

Change-Id: I6bb0c7f020613a1a17e40bccc28b00c30d644c70
---
 gcc/config/arm/arm-arches.def |  5 +
 gcc/config/arm/arm-protos.h   |  3 +++
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  |  4 
 gcc/config/arm/arm.h  |  6 ++
 gcc/doc/invoke.texi   |  6 +++---
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c..6c83153 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,11 @@ ARM_ARCH("armv7-m", cortexm3,	7M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH ("armv8.1-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.1-a+crc",cortexa53, 8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e7328e7..d649e86 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -387,6 +387,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
 #define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
+#define FL2_ARCH8_1   (1 << 0)	  /* Architecture 8.1.  */
+
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
 			 | FL_CO_PROC)
@@ -415,6 +417,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 48aac41..db17f6e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -416,10 +416,16 @@ EnumValue
 Enum(arm_arch) String(armv8-a+crc) Value(26)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(27)
+Enum(arm_arch) String(armv8.1-a) Value(27)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(28)
+Enum(arm_arch) String(armv8.1-a+crc) Value(28)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(29)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(30)
 
 Enum
 Name(arm_fpu) Type(int)
diff --git a/

Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-07 Thread Matthew Wahab


On 07/12/15 10:06, Tobias Burnus wrote:

I wrote:

I wonder whether using

__asm__ __volatile__ ("":::"memory");

would be sufficient as it has a way lower overhead than
__sync_synchronize().


Namely, something like the attached patch.

Regarding the original patch submission: Is there a reason that you didn't
include the test case of Deepak from 
https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html
It should work as -fcoarray=lib -lcaf_single "dg-do run" test.

Tobias



I don't know anything about Fortran or coarrays, and I'm curious whether this affects 
architectures with weak memory models. Is the barrier only needed to stop reordering 
by the compiler, or does it also need to stop reordering by the hardware?
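To illustrate the question, here is a minimal C sketch of the two barriers being compared. The function and variable names are hypothetical and do not come from the coarray patch. The empty asm with a "memory" clobber is a compiler-only barrier. `__sync_synchronize ()` is a GCC builtin that also emits a hardware memory barrier.

```c
/* Shared state.  In real code one thread would write these and another
   would read them; this is a sketch, not a data-race-free protocol.  */
static int payload;
static int ready;

/* Compiler-only barrier: stops GCC from moving the store to 'payload'
   past the store to 'ready'.  It emits no instruction, so a weakly
   ordered CPU may still make 'ready' visible to another core first.  */
void
publish_compiler_barrier (int value)
{
  payload = value;
  __asm__ __volatile__ ("" ::: "memory");
  ready = 1;
}

/* Full barrier: __sync_synchronize () orders the two stores for both
   the compiler and the hardware (on ARM this is typically a "dmb").  */
void
publish_full_fence (int value)
{
  payload = value;
  __sync_synchronize ();
  ready = 1;
}

/* Reader side: returns the payload once 'ready' is set, otherwise -1.
   It uses the same full barrier between reading 'ready' and reading
   'payload', which weakly ordered targets also need.  */
int
read_payload (void)
{
  if (!ready)
    return -1;
  __sync_synchronize ();
  return payload;
}

void
reset_state (void)
{
  payload = 0;
  ready = 0;
}
```

x86 does not reorder a store with an earlier store, so for this store/store pattern the compiler barrier alone may be enough there. On ARM, AArch64 or POWER, the hardware fence is what keeps the order visible to other cores.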


Matthew

Re: [AArch64] Rework ARMv8.1 command line options.

2015-12-07 Thread Matthew Wahab


Ping. Updated patch attached.

Matthew

On 27/11/15 09:23, Matthew Wahab wrote:

On 24/11/15 15:22, James Greenhalgh wrote:
 > On Mon, Nov 16, 2015 at 04:31:32PM +0000, Matthew Wahab wrote:
 >>
 >> The command line options for target selection allow ARMv8.1 extensions
 >> to be individually enabled/disabled. They also allow the extensions to
 >> be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
 >> architecture which requires all extensions to be enabled and doesn't make
 >> them available for ARMv8.
 >>
 >> This patch removes the options for the individual ARMv8.1 extensions
 >> except for +lse. This means that setting -march=armv8.1-a will enable
 >> all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
 >> be used with -march=armv8.

 > I think I mentioned it in another review, but this patch seems a good place
 > to solve the problem. Could you please update the documentation to explain
 > what you've written above. As it stands I find myself confused by which
 > features GCC will make available at -march=armv8-a and -march=armv8.1-a.

Attached is a patch with the documentation for the AArch64 -march option
reworked to try to make it clearer what the -march=armv8.1-a option will
do. Extensions with feature modifiers (+crc, +lse) are explicitly stated
as being enabled by -march=armv8.1-a. Extensions without feature
modifiers (RDMA, PAN, LOR) are treated as part of the generic 'ARMv8.1
architecture extension' term in the description of -march=armv8.1-a.
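One way to check which features a given -march setting actually enables is to inspect the ACLE feature macros that the compiler predefines. The following is a minimal sketch. The function name is hypothetical, and the expected results assume an AArch64 compiler that includes this patch.

```c
#include <stdio.h>

/* Record which ACLE feature macros the compiler predefined for the
   selected -march.  Each macro is only defined when the feature is
   enabled, so the checks are done with #ifdef.  Returns the snprintf
   result, i.e. the length of the report.  */
int
report_arch_features (char *buf, size_t len)
{
  int a64 = 0, crc32 = 0, qrdmx = 0;
#ifdef __ARM_ARCH_ISA_A64
  a64 = 1;
#endif
#ifdef __ARM_FEATURE_CRC32
  crc32 = 1;
#endif
#ifdef __ARM_FEATURE_QRDMX
  qrdmx = 1;
#endif
  return snprintf (buf, len, "a64=%d crc32=%d qrdmx=%d", a64, crc32, qrdmx);
}
```

If the documentation above is accurate, building this with -march=armv8-a and then with -march=armv8.1-a should flip crc32 and qrdmx from 0 to 1.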

I've also rearranged the -march section, to put the description of the
values for -march together and reworded the description of the
-march=native option.

Matthew

2015-11-26  Matthew Wahab  

 * config/aarch64/aarch64-option-extensions.def: Remove
 AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
 "rdma".
 * config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
 (AARCH64_FL_LOR): Remove.
 (AARCH64_FL_RDMA): Remove.
 (AARCH64_FL_V8_1): New.
 (AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
 and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
 (AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
 * doc/invoke.texi (AArch64 -march): Rewrite initial paragraph and
 section on -march=native.  Group descriptions of permitted
 architecture names together.  Expand description of
 -march=armv8.1-a.
 (AArch64 -mtune): Slightly rework section on -march=native.
 (AArch64 -mcpu): Slightly rework section on -march=native.
 (AArch64 Feature Modifiers): Remove "pan", "lor" and "rdma".
 State that -march=armv8.1-a enables "crc" and "lse".
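The effect of the aarch64.h change can be sketched as a simplified bit-set model. The bit values and the short FL_* names below are hypothetical and do not match the real AARCH64_FL_* definitions. The model only shows that one V8_1 bit now stands in for the RDMA, PAN and LOR bits, and that RDMA support is tested through that bit.

```c
#include <stdbool.h>

/* Hypothetical bit values for illustration only; the real AARCH64_FL_*
   definitions in config/aarch64/aarch64.h use different bits.  */
enum
{
  FL_FPSIMD = 1 << 1,
  FL_CRC    = 1 << 3,
  FL_LSE    = 1 << 4,
  FL_V8_1   = 1 << 5  /* Stands for all ARMv8.1 extensions without a
                         feature modifier: RDMA, PAN and LOR.  */
};

#define FL_FOR_ARCH8   (FL_FPSIMD)
#define FL_FOR_ARCH8_1 (FL_FOR_ARCH8 | FL_CRC | FL_LSE | FL_V8_1)

/* After the patch, RDMA support is implied by the single V8_1 bit
   rather than by a separate RDMA bit.  */
static bool
isa_rdma (unsigned flags)
{
  return (flags & FL_V8_1) != 0;
}

/* "+nolse" is kept and only clears the LSE bit.  */
static unsigned
apply_nolse (unsigned flags)
{
  return flags & ~(unsigned) FL_LSE;
}
```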



>From 498323fc1992cd75070e86f195d4bba09a5e02e0 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 30 Oct 2015 10:32:59 +
Subject: [PATCH] [AArch64] Rework ARMv8.1 command line options.

Change-Id: Ib9053719f45980255a3d7727e226a53d9f214049
---
 gcc/config/aarch64/aarch64-option-extensions.def |  9 ++---
 gcc/config/aarch64/aarch64.h |  9 ++---
 gcc/doc/invoke.texi  | 47 
 3 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f..4f1d535 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AA

Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-27 Thread Matthew Wahab


On 27/11/15 13:44, Christophe Lyon wrote:

On 26/11/15 16:02, Matthew Wahab wrote:



This patch adds ARMv8.1 support to the GCC DejaGnu testsuite, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.



I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
supported by aarch64-gcc. So it seems to me that
check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
for aarch64 after your patch.



Or does it work because no option is needed and thus "" always
matches and thus the loop always exits after the first iteration
on aarch64?


Yes, the idea is that the empty string will make the function first try 
'-march=armv8.1-a' without any other flag. That will work for AArch64 because it 
doesn't need any other option.
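The selection logic of the 'foreach' loop can be sketched in C. The sketch tries each candidate option set in order and records the first one that works. The probe functions are hypothetical stand-ins for check_no_compiler_messages_nocache. They model the point raised above: an AArch64 compiler rejects -mfpu and -mfloat-abi, but it never sees them because the empty string is tried first.

```c
#include <stddef.h>
#include <string.h>

/* Candidate option sets, in the order the foreach loop tries them.  The
   empty string comes first, so a target that needs nothing beyond
   -march=armv8.1-a (AArch64) succeeds on the first iteration.  */
static const char *const candidate_flags[] = {
  "",
  "-mfpu=neon-fp-armv8",
  "-mfloat-abi=softfp",
  "-mfpu=neon-fp-armv8 -mfloat-abi=softfp",
};

/* Return the first candidate for which COMPILES reports success, or
   NULL if none works.  */
const char *
first_working_flags (int (*compiles) (const char *flags))
{
  for (size_t i = 0; i < sizeof candidate_flags / sizeof candidate_flags[0]; i++)
    if (compiles (candidate_flags[i]))
      return candidate_flags[i];
  return NULL;
}

/* Hypothetical AArch64 probe: -mfpu and -mfloat-abi are ARM-only
   options, so only the empty set is accepted.  */
int
aarch64_probe (const char *flags)
{
  return flags[0] == '\0';
}

/* Hypothetical soft-float ARM probe: needs both the FPU and softfp.  */
int
arm_soft_float_probe (const char *flags)
{
  return strstr (flags, "-mfpu=neon-fp-armv8") != NULL
         && strstr (flags, "-mfloat-abi=softfp") != NULL;
}

/* Hypothetical target where ARMv8.1 Adv.SIMD can never be enabled.  */
int
no_neon_probe (const char *flags)
{
  (void) flags;
  return 0;
}
```

Only the order of the list keeps AArch64 from reaching the ARM-only flags. That dependency is the one the suggested comment would record.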



Maybe a more accurate comment would help remembering that, in case
-mfpu option becomes necessary for aarch64.



Agreed, it's worth having a comment to explain what the 'foreach' construct is 
doing.

Matthew

Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-11-27 Thread Matthew Wahab


On 27/11/15 14:05, Christophe Lyon wrote:

On 26 November 2015 at 16:55, Matthew Wahab  wrote:



ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable fpu options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.

This patch adds the command line options and internal feature
macros. The following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrdml{as}h and
- add the ACLE intrinsics for vqrdml{as}h_lane.





The whole series LGTM, but do you plan to add tests for the new intrinsics?


The Adv.SIMD intrinsics tests are in gcc.target/aarch64/advsimd-intrinsics and
are run for both the AArch64 and ARM backends. The tests for the new intrinsics
were added (yesterday) by the AArch64 version of this patch.


Matthew

Re: [AArch64] Rework ARMv8.1 command line options.

2015-11-27 Thread Matthew Wahab


On 24/11/15 15:22, James Greenhalgh wrote:
> On Mon, Nov 16, 2015 at 04:31:32PM +0000, Matthew Wahab wrote:
>>
>> The command line options for target selection allow ARMv8.1 extensions
>> to be individually enabled/disabled. They also allow the extensions to
>> be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
>> architecture which requires all extensions to be enabled and doesn't make
>> them available for ARMv8.
>>
>> This patch removes the options for the individual ARMv8.1 extensions
>> except for +lse. This means that setting -march=armv8.1-a will enable
>> all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
>> be used with -march=armv8.

> I think I mentioned it in another review, but this patch seems a good place
> to solve the problem. Could you please update the documentation to explain
> what you've written above. As it stands I find myself confused by which
> features GCC will make available at -march=armv8-a and -march=armv8.1-a.

Attached is a patch with the documentation for the AArch64 -march option
reworked to try to make it clearer what the -march=armv8.1-a option will
do. Extensions with feature modifiers (+crc, +lse) are explicitly stated
as being enabled by -march=armv8.1-a. Extensions without feature
modifiers (RDMA, PAN, LOR) are treated as part of the generic 'ARMv8.1
architecture extension' term in the description of -march=armv8.1-a.

I've also rearranged the -march section, to put the description of the
values for -march together and reworded the description of the
-march=native option.

Matthew

2015-11-26  Matthew Wahab  

* config/aarch64/aarch64-option-extensions.def: Remove
AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
"rdma".
* config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
(AARCH64_FL_LOR): Remove.
(AARCH64_FL_RDMA): Remove.
(AARCH64_FL_V8_1): New.
(AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
(AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
* doc/invoke.texi (AArch64 -march): Rewrite initial paragraph and
section on -march=native.  Group descriptions of permitted
architecture names together.  Expand description of
-march=armv8.1-a.
(AArch64 -mtune): Slightly rework section on -march=native.
(AArch64 -mcpu): Slightly rework section on -march=native.
(AArch64 Feature Modifiers): Remove "pan", "lor" and "rdma".
State that -march=armv8.1-a enables "crc" and "lse".

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f7c3c6f5264fe4f95c85a59535aa951ce4..4f1d53515a9a4ff8920fadb13164c85e39990db5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 68c006fa91f6326140cf447c7f4578ac46c24f79..06345f0215ea190b7b089264a0039a201437ecec 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -134,9 +134,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC	(1 << 3)	/* Has CRC.  */
 /* ARMv8.1 architecture extensions.  */
 #define AA

Re: [PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

2015-11-26 Thread Matthew Wahab


Attached the missing patch.
Matthew

On 26/11/15 16:04, Matthew Wahab wrote:

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah_lane and vqrdmlsh_lane forms of the
intrinsics to the arm_neon.h header, together with the ARM builtins
used to implement them. The intrinsics are available when
-march=armv8.1-a is enabled together with appropriate fpu options.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
 (vqrdmlahq_lane_s32): New.
 (vqrdmlah_lane_s16): New.
 (vqrdmlah_lane_s32): New.
 (vqrdmlshq_lane_s16): New.
 (vqrdmlshq_lane_s32): New.
 (vqrdmlsh_lane_s16): New.
 (vqrdmlsh_lane_s32): New.
 * config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
 "vqrdmlsh_lane".



>From cdfee6be49e52056de8999fbc33a432f2cc7254f Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:22:34 +0100
Subject: [PATCH 7/7] [ARM] Add neon intrinsics vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: Ia0ab4bbe683af2d019d18a34302a7b9798193a79
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b617f80..ed50253 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -7096,6 +7096,56 @@ vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vqrdmulh_lanev2si (__a, __b, __c);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlah_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlah_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlah_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlah_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlsh_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlsh_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlsh_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlsh_lanev2si (__a, __b, __c, __d);
+}
+#endif
+
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmul_n_s16 (int16x4_t __a, int16_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 8d5c0ca..1fdb2a8 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -60,6 +60,8 @@ VAR4 (BINOP, vqdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqdmulh_lane, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqrdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlah_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlsh_lane, v4hi, v2si, v8hi, v4si)
 VAR2 (BINOP, vqdmull, v4hi, v2si)
 VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-- 
2.1.4

Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-26 Thread Matthew Wahab


Attached the missing patch.
Matthew

On 26/11/15 16:02, Matthew Wahab wrote:

Hello,

This patch adds ARMv8.1 support to the GCC DejaGnu testsuite, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.

The main changes are
- add_options_for_arm_v8_1a_neon: Call
   check_effective_target_arm_v8_1a_neon_ok to select a suitable set of
   options.
- check_effective_target_arm_v8_1a_neon_ok: Test possible command line
   options, recording the first set that works.
- check_effective_target_arm_v8_1a_neon_hw: Add a test for ARM targets.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

testsuite/
2015-11-26  Matthew Wahab  

 * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
 comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
 the command line options.
 (check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
 test to allow ARM targets.  Select and record a working set of
 command line options.
 (check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
 targets.



>From 6f767289ce83be88bc088c7adf66d137ed335762 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 9 Oct 2015 17:38:12 +0100
Subject: [PATCH 5/7] [Testsuite] Support ARMv8.1 NEON on ARM.

Change-Id: I35436b64996789d54f215d66ed4b0ec5ffe48e37
---
 gcc/testsuite/lib/target-supports.exp | 56 +--
 1 file changed, 41 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index dcd51fd..34bb45d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2816,14 +2816,15 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
-# Add the options needed for ARMv8.1 Adv.SIMD.
+# Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
+# options for AArch64 and for ARM.
 
 proc add_options_for_arm_v8_1a_neon { flags } {
-if { [istarget aarch64*-*-*] } {
-	return "$flags -march=armv8.1-a"
-} else {
+if { ! [check_effective_target_arm_v8_1a_neon_ok] } {
 	return "$flags"
 }
+global et_arm_v8_1a_neon_flags
+return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
 proc add_options_for_arm_crc { flags } {
@@ -3271,17 +3272,29 @@ proc check_effective_target_arm_neonv2_hw { } {
 }
 
 # Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.
 
 proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
-if { ![istarget aarch64*-*-*] } {
-	return 0
+global et_arm_v8_1a_neon_flags
+set et_arm_v8_1a_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+	return 0;
 }
-return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
-	#if !defined (__ARM_FEATURE_QRDMX)
-	#error "__ARM_FEATURE_QRDMX not defined"
-	#endif
-} [add_options_for_arm_v8_1a_neon ""]]
+
+foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \
+		   "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache arm_v8_1a_neon_ok object {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+	} "$flags -march=armv8.1-a"] } {
+	set et_arm_v8_1a_neon_flags "$flags -march=armv8.1-a"
+	return 1
+	}
+}
+
+return 0;
 }
 
 proc check_effective_target_arm_v8_1a_neon_ok { } {
@@ -3308,16 +3321,17 @@ proc check_effective_target_arm_v8_neon_hw { } {
 }
 
 # Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.
 
 proc check_effective_target_arm_v8_1a_neon_hw { } {
 if { ![check_effective_target_arm_v8_1a_neon_ok] } {
 	return 0;
 }
-return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+return [check_runtime arm_v8_1a_neon_hw_available {
 	int
 	main (void)
 	{
+	  #ifdef __ARM_ARCH_ISA_A64
 	  __Int32x2_t a = {0, 1};
 	  __Int32x2_t b = {0, 2};
 	  __Int32x2_t result;
@@ -3327,9 +3341,21 @@ proc check_effective_target_arm_v8_1a_neon_hw { } {
 	   : "w"(a), "w"(b)
 	   : /* No clobbers.  */);
 
+	  #else
+
+	  __simd64_int32_t a = {0, 1};
+	  __simd64_int32_t b = {0, 2};
+	  __simd64_int32_t result;
+
+	  asm ("vqrdmlah.s32 %P0, %P1, %P2"
+	   : "=w"(result)
+	   : &quo

[PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah_lane and vqrdmlsh_lane forms of the
intrinsics to the arm_neon.h header, together with the ARM builtins
used to implement them. The intrinsics are available when
-march=armv8.1-a is enabled together with appropriate fpu options.
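As a reference for what the new intrinsics compute, here is a scalar C model of the 16-bit forms. It is a sketch based on our reading of the architecture's description of VQRDMLAH/VQRDMLSH: a rounding doubling multiply whose result is added to (or subtracted from) the accumulator shifted into the high half, keeping the saturated high half. The model_* names are hypothetical and are not part of arm_neon.h. The q-register forms use 8 lanes, and their lane variants still take the multiplier from a 4-lane vector. The 32-bit variants follow the same pattern with wider elements.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a wide intermediate to the int16_t range.  */
static int16_t
sat_s16 (int64_t x)
{
  if (x > INT16_MAX)
    return INT16_MAX;
  if (x < INT16_MIN)
    return INT16_MIN;
  return (int16_t) x;
}

/* One 16-bit element of VQRDMLAH (SUB == 0) or VQRDMLSH (SUB != 0):
   ((ACC << 16) +/- 2*B*C + (1 << 15)) >> 16, saturated.  */
static int16_t
qrdml_s16 (int16_t acc, int16_t b, int16_t c, int sub)
{
  int64_t prod = 2 * (int64_t) b * c;
  int64_t accum = (int64_t) acc * 65536 + (sub ? -prod : prod) + (1 << 15);
  /* >> on a negative int64_t is an arithmetic shift with GCC and Clang.  */
  return sat_s16 (accum >> 16);
}

/* Model of vqrdmlah_s16 / vqrdmlsh_s16 on 4 lanes.  */
void
model_vqrdml_s16 (int16_t r[4], const int16_t a[4], const int16_t b[4],
                  const int16_t c[4], int sub)
{
  for (int i = 0; i < 4; i++)
    r[i] = qrdml_s16 (a[i], b[i], c[i], sub);
}

/* Model of vqrdmlah_lane_s16 / vqrdmlsh_lane_s16: every element of B
   is multiplied by the single element C[LANE].  */
void
model_vqrdml_lane_s16 (int16_t r[4], const int16_t a[4], const int16_t b[4],
                       const int16_t c[4], int lane, int sub)
{
  assert (lane >= 0 && lane < 4);
  for (int i = 0; i < 4; i++)
    r[i] = qrdml_s16 (a[i], b[i], c[lane], sub);
}
```

In Q15 terms, with a = 0 and b = c = 16384 (0.5), the accumulate form gives 8192 (0.25). The subtract form gives -8192.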

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
(vqrdmlahq_lane_s32): New.
(vqrdmlah_lane_s16): New.
(vqrdmlah_lane_s32): New.
(vqrdmlshq_lane_s16): New.
(vqrdmlshq_lane_s32): New.
(vqrdmlsh_lane_s16): New.
(vqrdmlsh_lane_s32): New.
* config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
"vqrdmlsh_lane".

[PATCH 6/7][ARM] Add ACLE intrinsics vqrdmlah and vqrdmlsh

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah and vqrdmlsh forms of the intrinsics to
the arm_neon.h header, together with the ARM builtins used to implement
them. The intrinsics are available when -march=armv8.1-a is enabled
together with appropriate fpu options.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
(vqrdmlahq_s16, vqrdmlahq_s32): New.
(vqrdmlsh_s16, vqrdmlsh_s32): New.
(vqrdmlshq_s16, vqrdmlshq_s32): New.
* config/arm/arm_neon_builtins.def: Add "vqrdmlah" and "vqrdmlsh".

>From 93e9db5bf06172f18f4e89e9533c66d8a0c4f2ca Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:21:44 +0100
Subject: [PATCH 6/7] [ARM] Add neon intrinsics vqrdmlah, vqrdmlsh.

Change-Id: Ic40ff4d477f36ec01714c68e3b83b66208c7958b
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21..b617f80 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1158,6 +1158,56 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
   return (int32x4_t)__builtin_neon_vqrdmulhv4si (__a, __b);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlshv4si (__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmull_s8 (int8x8_t __a, int8x8_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 0b719df..8d5c0ca 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -45,6 +45,8 @@ VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si)
 VAR2 (TERNOP, vqdmlal, v4hi, v2si)
 VAR2 (TERNOP, vqdmlsl, v4hi, v2si)
+VAR4 (TERNOP, vqrdmlah, v4hi, v2si, v8hi, v4si)
+VAR4 (TERNOP, vqrdmlsh, v4hi, v2si, v8hi, v4si)
 VAR3 (BINOP, vmullp, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmulls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmullu, v8qi, v4hi, v2si)
-- 
2.1.4

[PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds ARMv8.1 support to the GCC DejaGnu testsuite, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.

The main changes are
- add_options_for_arm_v8_1a_neon: Call
  check_effective_target_arm_v8_1a_neon_ok to select a suitable set of
  options.
- check_effective_target_arm_v8_1a_neon_ok: Test possible command line
  options, recording the first set that works.
- check_effective_target_arm_v8_1a_neon_hw: Add a test for ARM targets.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

testsuite/
2015-11-26  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
the command line options.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
test to allow ARM targets.  Select and record a working set of
command line options.
(check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
targets.

[PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
defined when the instructions are available, as they are when
-march=armv8.1-a is enabled with suitable fpu options.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_QRDMX.

>From 4009cf5c0455429a415be9ca239ac09ac86b17dd Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 13:25:09 +0100
Subject: [PATCH 4/7] [ARM] Add __ARM_FEATURE_QRDMX

Change-Id: I26cde507e8844a731e4fd857fbd30bf87f213f89
---
 gcc/config/arm/arm-c.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c336a16..6bf740b 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -66,6 +66,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_SAT", TARGET_ARM_SAT);
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRYPTO", TARGET_CRYPTO);
 
+  if (TARGET_NEON_RDMA)
+builtin_define ("__ARM_FEATURE_QRDMX");
   if (unaligned_access)
 builtin_define ("__ARM_FEATURE_UNALIGNED");
   if (TARGET_CRC32)
-- 
2.1.4

[PATCH 3/7][ARM] Add patterns for new instructions

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds patterns for the instructions, vqrdmlah and vqrdmlsh,
introduced in the ARMv8.1 architecture. The instructions are made
available when -march=armv8.1-a is enabled with suitable fpu settings,
such as -mfpu=neon-fp-armv8 -mfloat-abi=hard.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/iterators.md (VQRDMLH_AS): New.
(neon_rdma_as): New.
* config/arm/neon.md
(neon_vqrdmlh): New.
(neon_vqrdmlh_lane): New.
* config/arm/unspecs.md (UNSPEC_VQRDMLAH): New.
(UNSPEC_VQRDMLSH): New.

From fea646491d51548b775fdfb5a4fd6d6bc72d4c83 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 12:00:50 +0100
Subject: [PATCH 3/7] [ARM] Add patterns for new instructions.

Change-Id: Ia84c345019c7beda2d3c6c39074043d2e005347a
---
 gcc/config/arm/iterators.md |  5 +
 gcc/config/arm/neon.md  | 45 +
 gcc/config/arm/unspecs.md   |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6a54125..c7a6880 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -362,6 +362,8 @@
 (define_int_iterator CRYPTO_SELECTING [UNSPEC_SHA1C UNSPEC_SHA1M
UNSPEC_SHA1P])
 
+(define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
+
 ;;
 ;; Mode attributes
 ;;
@@ -831,3 +833,6 @@
(simple_return " && use_simple_return_p ()")])
 (define_code_attr return_cond_true [(return " && USE_RETURN_INSN (TRUE)")
(simple_return " && use_simple_return_p ()")])
+
+;; Attributes for VQRDMLAH/VQRDMLSH
+(define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 62fb6da..844ef5e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2014,6 +2014,18 @@
   [(set_attr "type" "neon_sat_mul_")]
 )
 
+;; vqrdmlah, vqrdmlsh
+(define_insn "neon_vqrdmlh"
+  [(set (match_operand:VMDQI 0 "s_register_operand" "=w")
+	(unspec:VMDQI [(match_operand:VMDQI 1 "s_register_operand" "0")
+		   (match_operand:VMDQI 2 "s_register_operand" "w")
+		   (match_operand:VMDQI 3 "s_register_operand" "w")]
+		  VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+  "vqrdmlh.\t%0, %2, %3"
+  [(set_attr "type" "neon_sat_mla__long")]
+)
+
 (define_insn "neon_vqdmlal"
   [(set (match_operand: 0 "s_register_operand" "=w")
 (unspec: [(match_operand: 1 "s_register_operand" "0")
@@ -3176,6 +3188,39 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_sat_mul__scalar_q")]
 )
 
+;; vqrdmlah_lane, vqrdmlsh_lane
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMQI 0 "s_register_operand" "=w")
+	(unspec:VMQI [(match_operand:VMQI 1 "s_register_operand" "0")
+		  (match_operand:VMQI 2 "s_register_operand" "w")
+		  (match_operand: 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%q0, %q2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMDI 0 "s_register_operand" "=w")
+	(unspec:VMDI [(match_operand:VMDI 1 "s_register_operand" "0")
+		  (match_operand:VMDI 2 "s_register_operand" "w")
+		  (match_operand:VMDI 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%P0, %P2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
 (define_insn "neon_vmla_lane"
   [(set (match_operand:VMD 0 "s_register_operand" "=w")
 	(unspec:VMD [(match_operand:VMD 1 "s_register_operand" "0")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 44d4e7d..e7ae9a2 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -360,5 +360,7 @@
   UNSPEC_NVRINTX
   UNSPEC_NVRINTA
   UNSPEC_NVRINTN
+  UNSPEC_VQRDMLAH
+  UNSPEC_VQRDMLSH
 ])
 
-- 
2.1.4

[PATCH 2/7][ARM] Multilib support for ARMv8.1.

2015-11-26 Thread Matthew Wahab


This patch sets up multilib support for ARMv8.1, treating it as a
synonym for ARMv8. Since ARMv8.1 integer, FP or SIMD
instructions are only generated for the new, instruction-specific
intrinsics, mapping to ARMv8 rather than adding a new multilib variant
is sufficient.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/t-aprofile: Make "armv8.1-a" and "armv8.1-a+crc"
matches for "armv8-a".

From 9cd389bf72cff391423e17423f4624904aff5474 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 23 Oct 2015 09:37:12 +0100
Subject: [PATCH 2/7] [ARM] Multilib support for ARMv8.1

Change-Id: I65ee77768e22452ac15452cf6d4fdec3079ef852
---
 gcc/config/arm/t-aprofile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index cf34161..b23f1bc 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -98,6 +98,8 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?xgene1
 
 # Arch Matches
 MULTILIB_MATCHES   += march?armv8-a=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a+crc
 
 # FPU matches
 MULTILIB_MATCHES   += mfpu?vfpv3-d16=mfpu?vfpv3
-- 
2.1.4

[PATCH 1/7][ARM] Add support for ARMv8.1.

2015-11-26 Thread Matthew Wahab


Hello,


ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable FPU options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.

This patch adds the command-line options and internal feature
macros. The following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrdml{as}h and
- add the ACLE intrinsics for vqrdml{as}h_lane.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Is this ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
* config/arm/arm-protos.h (FL2_ARCH8_1): New.
(FL2_FOR_ARCH8_1A): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_1): New.
(arm_option_override): Set arm_arch8_1.
* config/arm/arm.h (TARGET_NEON_RDMA): New.
(arm_arch8_1): Declare.
* doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
"armv8.1-a+crc".
(ARM Options, -mfpu): Fix a typo.
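The new FL2_ARCH8_1 flag lives in the second word of the feature set, which is why the armv8.1-a entries use ARM_FSET_MAKE with two arguments instead of ARM_FSET_MAKE_CPU1. A minimal sketch of the idea follows. The struct, the FSET_MAKE macro and the helper functions are hypothetical stand-ins, not the definitions in arm-protos.h, and apart from FL2_ARCH8_1 the bit values are purely illustrative.

```c
#include <stdbool.h>

/* Hypothetical two-word feature set: word 0 holds FL_* bits,
   word 1 holds the newer FL2_* bits.  */
typedef struct
{
  unsigned long cpu[2];
} feature_set;

#define FSET_MAKE(w1, w2) { { (w1), (w2) } }

#define FL_ARCH8     (1UL << 24)  /* Illustrative value only.  */
#define FL_CRC32     (1UL << 29)  /* Illustrative value only.  */
#define FL2_ARCH8_1  (1UL << 0)   /* As defined by this patch.  */

/* Test a flag in the first or second word.  */
static bool
fset_has_cpu1 (feature_set s, unsigned long flag)
{
  return (s.cpu[0] & flag) != 0;
}

static bool
fset_has_cpu2 (feature_set s, unsigned long flag)
{
  return (s.cpu[1] & flag) != 0;
}

/* Counterparts of the "armv8-a" and "armv8.1-a+crc" table entries.  */
static const feature_set armv8a = FSET_MAKE (FL_ARCH8, 0);
static const feature_set armv8_1a_crc
  = FSET_MAKE (FL_ARCH8 | FL_CRC32, FL2_ARCH8_1);
```

Because FL2_ARCH8_1 is tested only against the second word, it can reuse bit 0 without colliding with any FL_* flag. arm_option_override could then derive arm_arch8_1 from such a test.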
From 3ee3a16839c1c316906e33f5384da05ee70dd831 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 11:31:25 +0100
Subject: [PATCH 1/7] [ARM] Add ARMv8.1 architecture flags and options.

Change-Id: I6bb0c7f020613a1a17e40bccc28b00c30d644c70
---
 gcc/config/arm/arm-arches.def |  5 +
 gcc/config/arm/arm-protos.h   |  3 +++
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  |  4 
 gcc/config/arm/arm.h  |  6 ++
 gcc/doc/invoke.texi   |  6 +++---
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c..6c83153 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,11 @@ ARM_ARCH("armv7-m", cortexm3,	7M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH ("armv8.1-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.1-a+crc",cortexa53, 8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e4b8fb3..c3eb6d3 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -388,6 +388,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_IWMMXT2    (1 << 30)   /* "Intel Wireless MMX2 technology".  */
 #define FL_ARCH6KZ    (1 << 31)   /* ARMv6KZ architecture.  */
 
+#define FL2_ARCH8_1   (1 << 0)	  /* Architecture 8.1.  */
+
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
 			 | FL_CO_PROC)
@@ -416,6 +418,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 48aac41..db17f6e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -416,10 +416,16 @@ EnumValue
 Enum(arm_arch) String(armv8-a+crc) Value(26)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(27)
+Enum(arm_arch) String(armv8.1-a) Value(27)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(28)
+Enum(arm_arch) String(armv8.1-a+crc) Value(28)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(29)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(30)
 
 Enum
 Name(arm_fpu) Type(int)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e0cdc2

Re: [AArch64][PATCH 7/7] Add NEON intrinsics vqrdmlah_lane and vqrdmlsh_lane.

2015-11-25 Thread Matthew Wahab


On 23/11/15 13:37, James Greenhalgh wrote:

On Fri, Oct 23, 2015 at 01:30:46PM +0100, Matthew Wahab wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah_lane
and vqrdmlsh_lane for these instructions. The new intrinsics are of the
form vqrdml{as}h[q]_lane_.




diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 9e73809..9b68e4a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10675,6 +10675,59 @@ vqrdmulhq_laneq_s32 (int32x4_t __a, int32x4_t __b, const int __c)
return __builtin_aarch64_sqrdmulh_laneqv4si (__a, __b, __c);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")


Rather than strict alphabetical order, can we group everything which is
under one set of extensions together, to save on the push_options/pop_options
pairs.



Attached the reworked patch that keeps the ARMv8.1 intrinsics together,
bracketed by a single target pragma.

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/
2015-11-24  Matthew Wahab  

* config/aarch64/arm_neon.h
(vqrdmlah_laneq_s16, vqrdmlah_laneq_s32): New.
(vqrdmlahq_laneq_s16, vqrdmlahq_laneq_s32): New.
(vqrdmlsh_laneq_s16, vqrdmlsh_laneq_s32): New.
(vqrdmlshq_laneq_s16, vqrdmlshq_laneq_s32): New.
(vqrdmlah_lane_s16, vqrdmlah_lane_s32): New.
(vqrdmlahq_lane_s16, vqrdmlahq_lane_s32): New.
(vqrdmlahh_s16, vqrdmlahh_lane_s16, vqrdmlahh_laneq_s16): New.
(vqrdmlahs_s32, vqrdmlahs_lane_s32, vqrdmlahs_laneq_s32): New.
(vqrdmlsh_lane_s16, vqrdmlsh_lane_s32): New.
(vqrdmlshq_lane_s16, vqrdmlshq_lane_s32): New.
(vqrdmlshh_s16, vqrdmlshh_lane_s16, vqrdmlshh_laneq_s16): New.
(vqrdmlshs_s32, vqrdmlshs_lane_s32, vqrdmlshs_laneq_s32): New.

gcc/testsuite
2015-11-24  Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc: New file,
support code for vqrdml{as}h_lane tests.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c: New.
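The _lane and _laneq forms differ from the plain intrinsics only in that the third operand supplies one selected element, which is used for every lane. The scalar sketch of vqrdmlah_lane_s16 below assumes the SQRDMLAH semantics from the Arm architecture description. The names `sqrdmlah16` and `ref_vqrdmlah_lane_s16` are hypothetical. This is not how arm_neon.h implements the intrinsic: arm_neon.h forwards to builtins such as __builtin_aarch64_sqrdmlah_laneqv4hi.

```c
#include <stdint.h>

/* One 16-bit SQRDMLAH element: (acc * 2^16 + 2*b*c + 2^15) >> 16,
   saturated to the int16_t range.  */
static int16_t
sqrdmlah16 (int16_t acc, int16_t b, int16_t c)
{
  int64_t sum = (int64_t) acc * 65536 + 2 * (int64_t) b * c + (1 << 15);
  /* Floor division by 2^16; avoids relying on >> of negative values.  */
  int64_t q = sum / 65536 - (sum % 65536 != 0 && sum < 0);
  if (q > INT16_MAX)
    return INT16_MAX;
  if (q < INT16_MIN)
    return INT16_MIN;
  return (int16_t) q;
}

/* Model of vqrdmlah_lane_s16 (a, b, v, lane): every lane of a and b is
   combined with the single element v[lane].  In the real intrinsic the
   lane index must be a compile-time constant, and the _laneq forms take
   an 8-element v instead of a 4-element one.  */
static void
ref_vqrdmlah_lane_s16 (int16_t out[4], const int16_t a[4],
                       const int16_t b[4], const int16_t v[4], int lane)
{
  for (int i = 0; i < 4; i++)
    out[i] = sqrdmlah16 (a[i], b[i], v[lane]);
}
```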

From 03cb214eaf07cceb65f0dc07dca1be739bfe5375 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 14:17:26 +0100
Subject: [PATCH 7/7] Add neon intrinsics: vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: I6d7a372e0a5b83ef0846ab62abbe9b24ada69fc4
---
 gcc/config/aarch64/arm_neon.h  | 168 +
 .../aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc   | 154 +++
 .../aarch64/advsimd-intrinsics/vqrdmlah_lane.c |  57 +++
 .../aarch64/advsimd-intrinsics/vqrdmlsh_lane.c |  61 
 4 files changed, 440 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh_lane.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah_lane.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh_lane.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 63f1627..56db339 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11264,6 +11264,174 @@ vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
 {
   return __builtin_aarch64_sqrdmlshv4si (__a, __b, __c);
 }
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return  __builtin_aarch64_sqrdmlah_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_laneq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlah_laneqv4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s16 (int16x4_t __a, int16x4_t __b, int16x8_t __c, const int __d)
+{
+  return  __builtin_aarch64_sqrdmlsh_laneqv4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_laneq_s32 (int32x2_t __a, int32x2_t __b, int32x4_t __c, const int __d)
+{
+  return __builtin_aarch64_sqrdmlsh_laneqv2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t

Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-11-25 Thread Matthew Wahab


On 23/11/15 13:35, James Greenhalgh wrote:

On Fri, Oct 23, 2015 at 01:26:11PM +0100, Matthew Wahab wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
vqrdmlsh for these instructions. The new intrinsics are of the form
vqrdml{as}h[q]_.




diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e186348..9e73809 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -2649,6 +2649,59 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
return (int32x4_t) __builtin_aarch64_sqrdmulhv4si (__a, __b);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")


Can we please patch the documentation to make it clear that -march=armv8.1-a
always implies -march=armv8.1-a+rdma? The documentation around which
feature modifiers are implied when leaves much to be desired.


I'll rework the documentation as part of the (separate) command-line
clean-up patch.


+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t) __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);


We don't need this cast (likewise the other instances)?



Attached, a reworked patch that removes the casts from the new
intrinsics. It also moves the new intrinsics to before the crypto
intrinsics. The intention is that the intrinsics added in this and the
next patch in the set are put in the same place and bracketed by a
single target pragma.

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/
2015-11-24  Matthew Wahab  

* config/aarch64/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
(vqrdmlahq_s16, vqrdmlahq_s32): New.
(vqrdmlsh_s16, vqrdmlsh_s32): New.
(vqrdmlshq_s16, vqrdmlshq_s32): New.

gcc/testsuite
2015-11-24  Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc: New file,
support code for vqrdml{as}h tests.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c: New.


From e623828ac2d033a9a51766d9843a650aab9f42e9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 27 Aug 2015 13:22:41 +0100
Subject: [PATCH 6/7] Add neon intrinsics: vqrdmlah, vqrdmlsh.

Change-Id: I5c7f8d36ee980d280c1d50f6f212b286084c5acf
---
 gcc/config/aarch64/arm_neon.h  |  53 
 .../aarch64/advsimd-intrinsics/vqrdmlXh.inc| 138 +
 .../aarch64/advsimd-intrinsics/vqrdmlah.c  |  57 +
 .../aarch64/advsimd-intrinsics/vqrdmlsh.c  |  61 +
 4 files changed, 309 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlXh.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlah.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmlsh.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 138b108..63f1627 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11213,6 +11213,59 @@ vbslq_u64 (uint64x2_t __a, uint64x2_t __b, uint64x2_t __c)
   return __builtin_aarch64_simd_bslv2di_ (__a, __b, __c);
 }
 
+/* ARMv8.1 intrinsics.  */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.1-a")
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return __builtin_aarch64_sqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return __builtin_aarch64_sqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return __builtin_aa

Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.

2015-11-25 Thread Matthew Wahab


On 23/11/15 16:38, Matthew Wahab wrote:

On 23/11/15 12:24, James Greenhalgh wrote:

On Tue, Oct 27, 2015 at 03:32:04PM +, Matthew Wahab wrote:

On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:

On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab
 wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions. This
patch adds support in Dejagnu for ARMv8.1 Adv.SIMD specifiers and
checks.

The new test options are
- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
   enable ARMv8.1 Adv.SIMD.
- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
   capable of executing ARMv8.1 Adv.SIMD instructions.




+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+if { ![check_effective_target_arm_v8_1a_neon_ok] } {
+return 0;
+}
+return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+int
+main (void)
+{
+  long long a = 0, b = 1;
+  long long result = 0;
+
+  asm ("sqrdmlah %s0,%s1,%s2"
+   : "=w"(result)
+   : "w"(a), "w"(b)
+   : /* No clobbers.  */);


Hm, those types look wrong, I guess this works but it is an unusual way
to write it. I presume this is to avoid including arm_neon.h each time, but
you could just directly use the internal type names for the arm_neon types.
That is to say __Int32x4_t (or whichever mode you intend to use).



I'll rework the patch to use the internal types names.


Attached, the reworked patch which uses internal type __Int32x2_t and
cleans up the assembler.

Retested aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1
emulator. Also re-ran the cross-compiled
gcc.target/aarch64/advsimd-intrinsics tests for aarch64-none-elf on an
ARMv8 emulator.

Matthew

gcc/testsuite
2015-11-24  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): New.
(check_effective_target_arm_arch_FUNC_ok)
(add_options_for_arm_arch_FUNC)
(check_effective_target_arm_arch_FUNC_multilib): Add "armv8.1-a"
to the list to be generated.
(check_effective_target_arm_v8_1a_neon_ok_nocache): New.
(check_effective_target_arm_v8_1a_neon_ok): New.
(check_effective_target_arm_v8_1a_neon_hw): New.



From 262c24946b2da5833a30b2e3e696bb7ea271059f Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 26 Oct 2015 14:58:36 +
Subject: [PATCH 5/7] [Testsuite] Add dejagnu options for armv8.1 neon

Change-Id: Ib58b8c4930ad3971af3ea682eda043e14cd2e8b3
---
 gcc/testsuite/lib/target-supports.exp | 57 ++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3eb46f2..dcd51fd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2816,6 +2816,16 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+    if { [istarget aarch64*-*-*] } {
+	return "$flags -march=armv8.1-a"
+    } else {
+	return "$flags"
+    }
+}
+
 proc add_options_for_arm_crc { flags } {
 if { ! [check_effective_target_arm_crc_ok] } {
 return "$flags"
@@ -3102,7 +3112,8 @@ foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
  v7r "-march=armv7-r" __ARM_ARCH_7R__
  v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
  v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
- v8a "-march=armv8-a" __ARM_ARCH_8A__ } {
+ v8a "-march=armv8-a" __ARM_ARCH_8A__
+ v8_1a "-march=armv8.1-a" __ARM_ARCH_8A__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	if { [ string match "*-marm*" "FLAG" ] &&
@@ -3259,6 +3270,25 @@ proc check_effective_target_arm_neonv2_hw { } {
 } [add_options_for_arm_neonv2 ""]]
 }
 
+# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
+    if { ![istarget aarch64*-*-*] } {
+	return 0
+    }
+    return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+    } [add_options_for_arm_v8_1a_neon ""]]
+}
+
+proc check_effective_target_arm_v8_1a_neon_ok { } {
+    return [check_cached_effective_target arm_v8_1a_neon_ok \
+		check_effective_target_arm_v8_1a_neon_ok_nocache]
+}
+
 # Return 1 if

Re: [AArch64][dejagnu][PATCH 5/7] Dejagnu support for ARMv8.1 Adv.SIMD.

2015-11-23 Thread Matthew Wahab


On 23/11/15 12:24, James Greenhalgh wrote:

On Tue, Oct 27, 2015 at 03:32:04PM +, Matthew Wahab wrote:

On 24/10/15 08:16, Bernhard Reutner-Fischer wrote:

On October 23, 2015 2:24:26 PM GMT+02:00, Matthew Wahab 
 wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions. This
patch adds support in Dejagnu for ARMv8.1 Adv.SIMD specifiers and
checks.

The new test options are
- { dg-add-options arm_v8_1a_neon }: Add compiler options needed to
   enable ARMv8.1 Adv.SIMD.
- { dg-require-effective-target arm_v8_1a_neon_hw }: Require a target
   capable of executing ARMv8.1 Adv.SIMD instructions.




Hi Matthew,

I have a couple of comments below. Neither need to block the patch, but
I'd appreciate a reply before I say OK.



diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4d5b0a3d..0fb679d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2700,6 +2700,16 @@ proc add_options_for_arm_v8_neon { flags } {
  return "$flags $et_arm_v8_neon_flags -march=armv8-a"
  }

+# Add the options needed for ARMv8.1 Adv.SIMD.
+
+proc add_options_for_arm_v8_1a_neon { flags } {
+if { [istarget aarch64*-*-*] } {
+   return "$flags -march=armv8.1-a"


Should this be -march=armv8.1-a+simd or some other feature flag?



I think it should be armv8.1-a only. +simd is enabled by all -march settings, so it
seems redundant to add it here. An alternative is to add +rdma, but that's also
enabled by armv8.1-a. (I have a patch at
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01973.html which gets rid of +rdma as
part of an armv8.1-a command-line clean-up.)



+# Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
+# otherwise.  The test is valid for AArch64.
+
+proc check_effective_target_arm_v8_1a_neon_hw { } {
+if { ![check_effective_target_arm_v8_1a_neon_ok] } {
+   return 0;
+}
+return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+   int
+   main (void)
+   {
+ long long a = 0, b = 1;
+ long long result = 0;
+
+ asm ("sqrdmlah %s0,%s1,%s2"
+  : "=w"(result)
+  : "w"(a), "w"(b)
+  : /* No clobbers.  */);


Hm, those types look wrong, I guess this works but it is an unusual way
to write it. I presume this is to avoid including arm_neon.h each time, but
you could just directly use the internal type names for the arm_neon types.
That is to say __Int32x4_t (or whichever mode you intend to use).



I'll rework the patch to use the internal types names.

Matthew

[AArch64] Rework ARMv8.1 command line options.

2015-11-16 Thread Matthew Wahab


Hello,

The command line options for target selection allow ARMv8.1 extensions
to be individually enabled/disabled. They also allow the extensions to
be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
architecture, which requires all of its extensions to be enabled and
doesn't make them available in ARMv8.

This patch removes the options for the individual ARMv8.1 extensions
except for +lse. This means that setting -march=armv8.1-a will enable
all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
be used with -march=armv8-a.

The exception to this is +lse since there may be existing code expecting
to be built with -march=armv8-a+lse. Note that +crc, which is enabled by
-march=armv8.1-a, is still an option for -march=armv8-a.
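The effect on the architecture flag words can be sketched in C. The CRC, LSE and V8_1 bit positions are taken from the aarch64.h hunk in this patch. The FP, SIMD and CRYPTO positions, the exact composition of AARCH64_FL_FOR_ARCH8_1 (its definition is truncated in this copy) and the helper `isa_rdma` are assumptions for illustration. They follow the description above: -march=armv8.1-a enables +crc and all ARMv8.1 extensions, while -march=armv8-a can still take +lse but has no way to turn on the v8.1 bit.

```c
#include <stdbool.h>

#define AARCH64_FL_SIMD   (1 << 0)  /* Assumed position.  */
#define AARCH64_FL_FP     (1 << 1)  /* Assumed position.  */
#define AARCH64_FL_CRYPTO (1 << 2)  /* Assumed position.  */
#define AARCH64_FL_CRC    (1 << 3)  /* From the patch.  */
#define AARCH64_FL_LSE    (1 << 4)  /* From the patch.  */
#define AARCH64_FL_V8_1   (1 << 5)  /* From the patch.  */

#define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
#define AARCH64_FL_FOR_ARCH8 (AARCH64_FL_FPSIMD)
/* Sketch: ARMv8.1 = ARMv8 + CRC + LSE + the remaining v8.1 features.  */
#define AARCH64_FL_FOR_ARCH8_1 \
  (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_LSE | AARCH64_FL_V8_1)

/* Hypothetical helper: after the patch, RDMA availability follows the
   single V8_1 bit, as AARCH64_ISA_RDMA does.  */
static bool
isa_rdma (unsigned long flags)
{
  return (flags & AARCH64_FL_V8_1) != 0;
}
```

For example, `isa_rdma (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE)`, which models -march=armv8-a+lse, is false.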

This patch depends on the patch series
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02429.html.

Tested aarch64-none-elf with cross-compiled check-gcc and
aarch64-none-linux-gnu with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-16  Matthew Wahab  

* config/aarch64/aarch64-option-extensions.def: Remove
AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
"rdma".
* config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
(AARCH64_FL_LOR): Remove.
(AARCH64_FL_RDMA): Remove.
(AARCH64_FL_V8_1): New.
(AARCH64_FL_FOR_ARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
(AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
* doc/invoke.texi (AArch64 - Feature Modifiers): Remove "pan",
"lor" and "rdma".
From bc4ea389754127ec639ea2de085a7c82aebae117 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 30 Oct 2015 10:32:59 +
Subject: [PATCH] [AArch64] Rework ARMv8.1 command line options.

Change-Id: Ib9053719f45980255a3d7727e226a53d9f214049
---
 gcc/config/aarch64/aarch64-option-extensions.def | 9 -
 gcc/config/aarch64/aarch64.h | 9 +++--
 gcc/doc/invoke.texi  | 7 ---
 3 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f..4f1d535 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 68c006f..06345f0 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -134,9 +134,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC    (1 << 3)	/* Has CRC.  */
 /* ARMv8.1 architecture extensions.  */
 #define AARCH64_FL_LSE	  (1 << 4)  /* Has Large System Extensions.  */
-#define AARCH64_FL_PAN	  (1 << 5)  /* Has Privileged Access Never.  */
-#define AARCH64_FL_LOR	  (1 << 6)  /* Has Limited Ordering regions.  */
-#define AARCH64_FL_RDMA	  (1 << 7)  /* Has ARMv8.1 Adv.SIMD.  */
+#define AARCH64_FL_V8_1	  (1 << 5)  /* Has ARMv8.1 extensions.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -147,8 +145,7 @@ extern unsigned aarch64_architecture_version;
 /* Architecture flags that effect instruction selection.  */
 #define AARCH64_FL_FOR_ARCH8   (AARCH64_FL_FPSIMD)
 #define AARCH64_FL_FOR_ARCH8_1			   \
-  (AARCH64_FL_FOR_

Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-11-09 Thread Matthew Wahab


On 09/11/15 13:31, Christophe Lyon wrote:

On 30 October 2015 at 16:52, Matthew Wahab  wrote:

On 30/10/15 12:51, Christophe Lyon wrote:


On 23 October 2015 at 14:26, Matthew Wahab 
wrote:


The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
vqrdmlsh for these instructions. The new intrinsics are of the form
vqrdml{as}h[q]_.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.


Is there a publicly available simulator for v8.1? QEMU or Foundation
Model?


Sorry, I don't know.
Matthew



So, what will happen to the testsuite once this is committed?
Are we going to see FAILs when using QEMU?



No, the check at the top of the test files

+/* { dg-require-effective-target arm_v8_1a_neon_hw } */

should make this test UNSUPPORTED if the HW/simulator can't execute it. (Support
for this check is added in patch #5 in this series.) Note that the aarch64-none-linux
make check was run on ARMv8 HW, which can't execute the test and correctly reported it
as unsupported.
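The reply above describes how the effective-target check gates each test. The C sketch below models that gating as a small decision function. It is a hypothetical model for illustration, not DejaGnu's Tcl implementation, and the enum and function names are invented. When the required effective target (here arm_v8_1a_neon_hw) does not hold, the test is never run and is reported as UNSUPPORTED. It can only PASS or FAIL on hardware or a simulator that satisfies the check.

```c
#include <stdbool.h>

/* Hypothetical model of how dg-require-effective-target gates a run test;
   this is not DejaGnu's Tcl code.  */
enum test_result { RESULT_UNSUPPORTED, RESULT_PASS, RESULT_FAIL };

/* TARGET_OK: whether the effective target holds for the board under test.
   RUN_OK: whether the compiled test ran successfully; it is ignored when
   TARGET_OK is false, because the test is never executed then.  */
static enum test_result classify (bool target_ok, bool run_ok)
{
  if (!target_ok)
    return RESULT_UNSUPPORTED;   /* e.g. ARMv8 HW running a v8.1 test.  */
  return run_ok ? RESULT_PASS : RESULT_FAIL;
}
```

This is why running the series on ARMv8 hardware, or on a simulator without v8.1 support, yields UNSUPPORTED results instead of FAILs.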


Matthew

[AArch64] Move iterators from atomics.md to iterators.md

2015-11-02 Thread Matthew Wahab


Hello

One of the review comments for the v8.1 atomics patches was that the
iterators and unspec declarations should be moved out of the atomics.md
file (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01375.html).

The iterators in atomics.md are tied to the unspecv definition in the
same file. This patch moves both into iterators.md.
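The iterators and the unspecv enum are tied together because the int iterator ATOMIC_LDOP ranges over four unspecv values. The int attribute atomic_ldop maps each value to a string, and each instantiation of a pattern substitutes that string for <atomic_ldop>. The C sketch below models that substitution. The enum subset, the function names and the "ld<atomic_ldop>" template are illustrative assumptions, not code taken from atomics.md. The attribute values do line up with the ARMv8.1 LSE mnemonics LDSET, LDCLR, LDEOR and LDADD.

```c
#include <stdio.h>

/* Hypothetical C stand-in for the four unspecv values that the
   ATOMIC_LDOP int iterator ranges over.  */
enum unspecv {
  UNSPECV_ATOMIC_LDOP_OR,
  UNSPECV_ATOMIC_LDOP_BIC,
  UNSPECV_ATOMIC_LDOP_XOR,
  UNSPECV_ATOMIC_LDOP_PLUS
};

/* Mirrors (define_int_attr atomic_ldop ...): the string substituted for
   <atomic_ldop> in each instantiation of a pattern.  */
static const char *atomic_ldop (enum unspecv u)
{
  switch (u)
    {
    case UNSPECV_ATOMIC_LDOP_OR:   return "set";
    case UNSPECV_ATOMIC_LDOP_BIC:  return "clr";
    case UNSPECV_ATOMIC_LDOP_XOR:  return "eor";
    case UNSPECV_ATOMIC_LDOP_PLUS: return "add";
    }
  return NULL;
}

/* Expand the illustrative template "ld<atomic_ldop>" for U into BUF.
   Returns the length written, or -1 if BUF is too small.  */
static int expand_ldop (enum unspecv u, char *buf, size_t size)
{
  int n = snprintf (buf, size, "ld%s", atomic_ldop (u));
  return (n >= 0 && (size_t) n < size) ? n : -1;
}
```

Iterating over all four values produces "ldset", "ldclr", "ldeor" and "ldadd", one per unspec. In the same way, the machine-description reader turns a single pattern into four patterns.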

Tested aarch64-none-elf with cross-compiled check-gcc and
aarch64-none-linux-gnu with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-02  Matthew Wahab  

* config/aarch64/atomics.md (unspecv): Move to iterators.md.
(ATOMIC_LDOP): Likewise.
(atomic_ldop): Likewise.
* config/aarch64/iterators.md (unspecv): Moved from atomics.md.
(ATOMIC_LDOP): Likewise.
(atomic_ldop): Likewise.
>From 90471e373421b838d1069cddb54abe0377fdc244 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 29 Oct 2015 15:44:41 +
Subject: [PATCH] [AArch64] Move atomics iterators into iterators.md

Change-Id: Ie83ae9c983762c10920db6bf3f2e2d4fa33167b2
---
 gcc/config/aarch64/atomics.md   | 28 
 gcc/config/aarch64/iterators.md | 33 +
 2 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index e7ac5f6..3c034fb 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -18,34 +18,6 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-(define_c_enum "unspecv"
- [
-UNSPECV_LX; Represent a load-exclusive.
-UNSPECV_SX; Represent a store-exclusive.
-UNSPECV_LDA; Represent an atomic load or load-acquire.
-UNSPECV_STL; Represent an atomic store or store-release.
-UNSPECV_ATOMIC_CMPSW		; Represent an atomic compare swap.
-UNSPECV_ATOMIC_EXCHG		; Represent an atomic exchange.
-UNSPECV_ATOMIC_CAS			; Represent an atomic CAS.
-UNSPECV_ATOMIC_SWP			; Represent an atomic SWP.
-UNSPECV_ATOMIC_OP			; Represent an atomic operation.
-UNSPECV_ATOMIC_LDOP			; Represent an atomic load-operation
-UNSPECV_ATOMIC_LDOP_OR		; Represent an atomic load-or
-UNSPECV_ATOMIC_LDOP_BIC		; Represent an atomic load-bic
-UNSPECV_ATOMIC_LDOP_XOR		; Represent an atomic load-xor
-UNSPECV_ATOMIC_LDOP_PLUS		; Represent an atomic load-add
-])
-
-;; Iterators for load-operate instructions.
-
-(define_int_iterator ATOMIC_LDOP
- [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
-  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
-
-(define_int_attr atomic_ldop
- [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
-  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
-
 ;; Instruction patterns.
 
 (define_expand "atomic_compare_and_swap"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 964f8f1..fe7ca39 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -305,6 +305,29 @@
 UNSPEC_VEC_SHR  ; Used in aarch64-simd.md.
 ])
 
+;; --
+;; Unspec enumerations for Atomics.  They are here so that they can be
+;; used in the int_iterators for atomic operations.
+;; --
+
+(define_c_enum "unspecv"
+ [
+UNSPECV_LX			; Represent a load-exclusive.
+UNSPECV_SX			; Represent a store-exclusive.
+UNSPECV_LDA			; Represent an atomic load or load-acquire.
+UNSPECV_STL			; Represent an atomic store or store-release.
+UNSPECV_ATOMIC_CMPSW	; Represent an atomic compare swap.
+UNSPECV_ATOMIC_EXCHG	; Represent an atomic exchange.
+UNSPECV_ATOMIC_CAS		; Represent an atomic CAS.
+UNSPECV_ATOMIC_SWP		; Represent an atomic SWP.
+UNSPECV_ATOMIC_OP		; Represent an atomic operation.
+UNSPECV_ATOMIC_LDOP		; Represent an atomic load-operation
+UNSPECV_ATOMIC_LDOP_OR	; Represent an atomic load-or
+UNSPECV_ATOMIC_LDOP_BIC	; Represent an atomic load-bic
+UNSPECV_ATOMIC_LDOP_XOR	; Represent an atomic load-xor
+UNSPECV_ATOMIC_LDOP_PLUS	; Represent an atomic load-add
+])
+
 ;; ---
 ;; Mode attributes
 ;; ---
@@ -958,6 +981,16 @@
 
 (define_int_iterator CRYPTO_SHA256 [UNSPEC_SHA256H UNSPEC_SHA256H2])
 
+;; Iterators for atomic operations.
+
+(define_int_iterator ATOMIC_LDOP
+ [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
+  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
+
+(define_int_attr atomic_ldop
+ [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
+  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])
+
 ;; ---
 ;; Int Iterators Attributes.
 ;; ---
-- 
2.1.4

Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-10-30 Thread Matthew Wahab


On 30/10/15 12:51, Christophe Lyon wrote:

On 23 October 2015 at 14:26, Matthew Wahab  wrote:

The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
vqrdmlsh for these instructions. The new intrinsics are of the form
vqrdml{as}h[q]_.

Tested the series for aarch64-none-linux-gnu with native bootstrap and
make check on an ARMv8 architecture. Also tested aarch64-none-elf with
cross-compiled check-gcc on an ARMv8.1 emulator.



Is there a publicly available simulator for v8.1? QEMU or Foundation Model?



Sorry, I don't know.
Matthew
