[PATCH][PR rtl-optimization/pr52773] Do not reference virtual_outgoing_args after vreg instantiation

2015-01-16 Thread Jeff Law


As discussed in the PR and in the associated patch from Bernd:

https://gcc.gnu.org/ml/gcc-patches/2013-06/msg01147.html

There are cases where we can call into emit_library_call_value after 
virtual register instantiation is complete.  That can leave a reference 
to a virtual register alive until LRA/reload, which then causes an ICE.


Bernd's patch changes the code to use a stack pointer reference instead.
Note this is just for use in CALL_INSN_FUNCTION_USAGE.
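
For readers who do not have calls.c open, here is a toy model of the bookkeeping the patch adds.  It is not GCC code: `Arg`, `build_call_fusage` and the string "notes" are hypothetical stand-ins for the argvec entries and for CALL_INSN_FUNCTION_USAGE.  Only the control flow (exact notes when there is an argument block, otherwise a single generic stack-pointer note) is meant to mirror the change.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one library-call argument.
struct Arg {
  bool in_register;  // passed entirely in a register: no stack usage note
  int offset;        // offset within the argument block, if there is one
};

// Builds a list of usage notes, loosely modelled on
// CALL_INSN_FUNCTION_USAGE.  have_argblock == true models a preallocated
// argument block; false models a target that pushes its arguments.
std::vector<std::string>
build_call_fusage (const std::vector<Arg> &args, bool have_argblock)
{
  std::vector<std::string> fusage;
  bool have_push_fusage = false;

  for (const Arg &arg : args)
    {
      if (arg.in_register)
        continue;

      if (have_argblock)
        {
          // The exact location is known: one note per argument.
          assert (arg.offset >= 0);
          fusage.push_back ("use mem(argblock+"
                            + std::to_string (arg.offset) + ")");
        }
      else if (have_push_fusage)
        // The generic stack note is already present; do not repeat it.
        continue;
      else
        {
          // Pushed argument: only say "somewhere on the stack", relative
          // to the hard stack pointer rather than a virtual register.
          fusage.push_back ("use mem(sp+scratch)");
          have_push_fusage = true;
        }
    }
  return fusage;
}
```

With two stack arguments and no argument block, the model produces one `use mem(sp+scratch)` note; with an argument block it produces one note per stack argument.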


Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  I'll 
fire up another m68k bootstrap since that will likely exercise this more 
given it pushes arguments.


Installed on the trunk on Bernd's behalf.

Jeff
commit d059ba35689a92b87a2f20408f6c2daafc3a39e1
Author: law 
Date:   Sat Jan 17 07:35:40 2015 +

PR rtl-optimization/52773
* calls.c (emit_library_call_value): When pushing arguments use
stack_pointer_rtx rather than virtual_outgoing_args_rtx in
CALL_INSN_FUNCTION_USAGE.  Only emit one of use of the magic
stack pointer reference into CALL_INSN_FUNCTION_USAGE.

PR rtl-optimization/52773
* gcc.c-torture/compile/pr52773.c: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@219796 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 04ae255..80f8c92 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2015-01-17  Bernd Schmidt  
+
+   PR rtl-optimization/52773
+   * calls.c (emit_library_call_value): When pushing arguments use
+   stack_pointer_rtx rather than virtual_outgoing_args_rtx in
+   CALL_INSN_FUNCTION_USAGE.  Only emit one of use of the magic
+   stack pointer reference into CALL_INSN_FUNCTION_USAGE.
+
 2015-01-17  Jeff Law  
 
PR rtl-optimization/32790
diff --git a/gcc/calls.c b/gcc/calls.c
index 1c2f0ad..ec44624 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -3808,6 +3808,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
   int reg_parm_stack_space = 0;
   int needed;
   rtx_insn *before_call;
+  bool have_push_fusage;
   tree tfom;   /* type_for_mode (outmode, 0) */
 
 #ifdef REG_PARM_STACK_SPACE
@@ -4165,6 +4166,8 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 
   /* Push the args that need to be pushed.  */
 
+  have_push_fusage = false;
+
   /* ARGNUM indexes the ARGVEC array in the order in which the arguments
  are to be pushed.  */
   for (count = 0; count < nargs; count++, argnum--)
@@ -4256,14 +4259,19 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
           if (argblock)
             use = plus_constant (Pmode, argblock,
                                  argvec[argnum].locate.offset.constant);
+          else if (have_push_fusage)
+            continue;
           else
-            /* When arguments are pushed, trying to tell alias.c where
-               exactly this argument is won't work, because the
-               auto-increment causes confusion.  So we merely indicate
-               that we access something with a known mode somewhere on
-               the stack.  */
-            use = gen_rtx_PLUS (Pmode, virtual_outgoing_args_rtx,
-                                gen_rtx_SCRATCH (Pmode));
+            {
+              /* When arguments are pushed, trying to tell alias.c where
+                 exactly this argument is won't work, because the
+                 auto-increment causes confusion.  So we merely indicate
+                 that we access something with a known mode somewhere on
+                 the stack.  */
+              use = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+                                  gen_rtx_SCRATCH (Pmode));
+              have_push_fusage = true;
+            }
           use = gen_rtx_MEM (argvec[argnum].mode, use);
           use = gen_rtx_USE (VOIDmode, use);
           call_fusage = gen_rtx_EXPR_LIST (VOIDmode, use, call_fusage);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 41ad87e..3d424ce 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-16  Bernd Schmidt  
+
+   PR rtl-optimization/52773
+   * gcc.c-torture/compile/pr52773.c: New test.
+
 2015-01-16  Paolo Carlini  
 
PR c++/62134
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr52773.c b/gcc/testsuite/gcc.c-torture/compile/pr52773.c
new file mode 100644
index 000..8daa5ee
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr52773.c
@@ -0,0 +1,16 @@
+/* pr52773.c */
+
+struct s {
+short x;
+short _pad[2];
+};
+
+static short mat_a_x;
+
+void transform(const struct s *src, struct s *dst, int n)
+{
+int i;
+
+for (i = 0; i < n; ++i)
+   dst[i].x = (src[i].x * mat_a_x) >> 6;
+}


[PATCH] [PR rtl-optimization/32790] Fix long standing typo/thinko in reg_scan_mark_refs

2015-01-16 Thread Jeff Law


Currently virtually working in PST, not sure if I'll end up in Hawaii or 
points further west or not. :-)



This has been around a long long time.

reg_scan_mark_refs counts the number of references to each pseudo in a 
function.  It has to peek at SET_DEST operands on the off chance that 
they might be a STRICT_LOW_PART, SUBREG, or ZERO_EXTRACT, all of which 
have to be considered as read-write operands.


A bit of text from the manual in case anyone is unaware of how 
ZERO_EXTRACT can appear in a SET_DEST...


@table @code
@findex set
@item (set @var{lval} @var{x})
Represents the action of storing the value of @var{x} into the place
represented by @var{lval}.  @var{lval} must be an expression
representing a place that can be stored in: @code{reg} (or @code{subreg},
@code{strict_low_part} or @code{zero_extract}), @code{mem}, @code{pc},
@code{parallel}, or @code{cc0}.

[ ... ]

If @var{lval} is a @code{zero_extract}, then the referenced part of
the bit-field (a memory or register reference) specified by the
@code{zero_extract} is given the value @var{x} and the rest of the
bit-field is not changed.  Note that @code{sign_extract} can not
appear in @var{lval}.


Unfortunately someone used ZERO_EXTEND rather than ZERO_EXTRACT in the 
test for these special case lvalues that are also reads.  The 
consequences of this goof are tiny, but we might as well fix it. 
Georg-Johann noticed this back in 2007, but nobody ever took corrective 
action.
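
To make the effect of the one-word fix concrete, here is a small self-contained model of the wrapper-stripping loop.  It is only a sketch: `Rtx`, `Code`, `make_reg`, `wrap` and `set_dest_regno` are hypothetical names, and the model keeps just operand 0 of each expression (a real zero_extract has three operands).  The point is that a ZERO_EXTEND can never be a destination, so testing for it never fires, while a ZERO_EXTRACT around a register must be looked through.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified stand-in for GCC's RTL expressions.
enum class Code { REG, MEM, SUBREG, STRICT_LOW_PART, ZERO_EXTRACT, ZERO_EXTEND };

struct Rtx {
  Code code;
  int regno;                      // meaningful only for REG
  std::shared_ptr<Rtx> operand0;  // wrapped expression, if any
};

std::shared_ptr<Rtx> make_reg (int regno)
{
  return std::make_shared<Rtx> (Rtx{Code::REG, regno, nullptr});
}

std::shared_ptr<Rtx> wrap (Code code, std::shared_ptr<Rtx> inner)
{
  return std::make_shared<Rtx> (Rtx{code, -1, std::move (inner)});
}

// Strip the wrappers that may legitimately surround a destination and
// return the number of the register being (partially) written, or -1 if
// the innermost expression is not a register.
int set_dest_regno (const Rtx *dest)
{
  assert (dest != nullptr);
  while (dest->code == Code::SUBREG
         || dest->code == Code::STRICT_LOW_PART
         || dest->code == Code::ZERO_EXTRACT)
    {
      dest = dest->operand0.get ();
      assert (dest != nullptr);  // a wrapper always wraps something
    }
  return dest->code == Code::REG ? dest->regno : -1;
}
```

In this model a destination of the form zero_extract (subreg (reg 42)) resolves to register 42, which is the case the old ZERO_EXTEND test failed to look through.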




Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed on the trunk.




commit acb3ee94191b9d2093e6954a7255758ed8f83125
Author: Jeff Law 
Date:   Sat Jan 17 00:19:23 2015 -0700

PR rtl-optimization/32790
* reginfo.c (reg_scan_mark_refs): Look for ZERO_EXTRACT,
not ZERO_EXTEND in SET_DESTs.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 12bd23a..04ae255 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-01-17  Jeff Law  
+
+   PR rtl-optimization/32790
+   * reginfo.c (reg_scan_mark_refs): Look for ZERO_EXTRACT,
+   not ZERO_EXTEND in SET_DESTs.
+
 2015-01-17  Alan Modra  
 
* cprop.c (do_local_cprop): Revert last change.
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 2a18fb8..9015eeb 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1132,7 +1132,7 @@ reg_scan_mark_refs (rtx x, rtx_insn *insn)
       /* Count a set of the destination if it is a register.  */
       for (dest = SET_DEST (x);
            GET_CODE (dest) == SUBREG || GET_CODE (dest) == STRICT_LOW_PART
-           || GET_CODE (dest) == ZERO_EXTEND;
+           || GET_CODE (dest) == ZERO_EXTRACT;
            dest = XEXP (dest, 0))
         ;
 


Re: [patch libstdc++] Optimize synchronization in std::future if futexes are available.

2015-01-16 Thread Hans-Peter Nilsson
On Fri, 16 Jan 2015, pins...@gmail.com wrote:
> > On Jan 16, 2015, at 9:57 PM, David Edelsohn  wrote:
> >
> > This patch has broken bootstrap on AIX
> >
> > May I mention that this really should have been tested on systems
> > other than x86 Linux.
>
> It also broke all newlib targets too. So you could have tested one listed in 
> the sim-test web page.

For those interested, PR64638.

brgds, H-P


Re: [patch] DW_AT_producer: Ignore -fpreprocessed

2015-01-16 Thread Jakub Jelinek
On Sat, Jan 17, 2015 at 12:42:21AM +0100, Jan Kratochvil wrote:
> Hi,
> 
> I have provided a sufficient fix in GDB for the -fplugin=libcc1plugin feature:
>   [patch+7.9] compile: Filter out -fpreprocessed
>   https://sourceware.org/ml/gdb-patches/2015-01/msg00485.html
> 
> But still I think "-fpreprocessed" is inappropriate for DW_AT_producer.
> 
> Otherwise for an inferior built using ccache the string "-fpreprocessed" is
> put into DW_AT_producer and then GDB compilation with -fplugin=libcc1plugin
> breaks as the GDB-runtime-generated source is not already preprocessed.
> 
> I have rebuilt GCC but I have not run the testsuite with this patch.

Ok for trunk, thanks.

> gcc/ChangeLog
>   * dwarf2out.c (gen_producer_string): Ignore also OPT_fpreprocessed.
> 
> Index: ./gcc/dwarf2out.c
> ===
> --- ./gcc/dwarf2out.c (revision 219770)
> +++ ./gcc/dwarf2out.c (working copy)
> @@ -19624,6 +19624,7 @@ gen_producer_string (void)
>case OPT__sysroot_:
>case OPT_nostdinc:
>case OPT_nostdinc__:
> +  case OPT_fpreprocessed:
>   /* Ignore these.  */
>   continue;
>default:


Jakub
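
The idea behind the approved hunk can be sketched outside of GCC as a simple filter.  This is not the dwarf2out.c implementation: `make_producer_string` and its ignore list are hypothetical, and the sketch works on option strings rather than on GCC's decoded option codes.  It only illustrates "record the options, except those that describe how the compiler was driven".

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: join the compiler name and the options worth
// recording, skipping any option on the ignore list.
std::string
make_producer_string (const std::string &compiler,
                      const std::vector<std::string> &options)
{
  // "-fpreprocessed" is the entry under discussion; the other two
  // correspond to neighbouring cases in the quoted hunk.
  static const std::set<std::string> ignored = {
    "-nostdinc", "-nostdinc++", "-fpreprocessed"
  };

  assert (!compiler.empty ());
  std::string producer = compiler;
  for (const std::string &opt : options)
    {
      if (ignored.count (opt) != 0)
        continue;
      producer += ' ';
      producer += opt;
    }
  return producer;
}
```

A consumer that re-reads the producer string to recompile source text, as described in the quoted message, would then no longer see "-fpreprocessed" for a ccache-built inferior.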


Re: [patch libstdc++] Optimize synchronization in std::future if futexes are available.

2015-01-16 Thread pinskia




> On Jan 16, 2015, at 9:57 PM, David Edelsohn  wrote:
> 
> This patch has broken bootstrap on AIX
> 
> May I mention that this really should have been tested on systems
> other than x86 Linux.

It also broke all newlib targets too. So you could have tested one listed in 
the sim-test web page. 

Thanks,
Andrew



Re: [patch libstdc++] Optimize synchronization in std::future if futexes are available.

2015-01-16 Thread David Edelsohn
This patch has broken bootstrap on AIX

May I mention that this really should have been tested on systems
other than x86 Linux.

In file included from /tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/future:44:0,
                 from /nasfarm/edelsohn/src/src/libstdc++-v3/src/c++11/compatibility-thread-c++0x.cc:30:
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:223:5:
error: 'mutex' does not name a type
 mutex _M_mutex;
 ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:224:5:
error: 'condition_variable' does not name a type
 condition_variable _M_condvar;
 ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:
In member function 'unsigned int
std::__atomic_futex_unsigned<_Waiter_bit>::_M_load(std::memory_order)':
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:232:19:
error: 'mutex' was not declared in this scope
   unique_lock __lock(_M_mutex);
   ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:232:24:
error: template argument 1 is invalid
   unique_lock __lock(_M_mutex);
^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:232:33:
error: '_M_mutex' was not declared in this scope
   unique_lock __lock(_M_mutex);

/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:
In member function 'unsigned int
std::__atomic_futex_unsigned<_Waiter_bit>::_M_load_when_not_equal(unsigned
int, std::memory_order)':
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:239:19:
error: 'mutex' was not declared in this scope
   unique_lock __lock(_M_mutex);
   ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:239:24:
error: template argument 1 is invalid
   unique_lock __lock(_M_mutex);
^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:239:33:
error: '_M_mutex' was not declared in this scope
   unique_lock __lock(_M_mutex);
 ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:241:2:
error: '_M_condvar' was not declared in this scope
  _M_condvar.wait(__lock);
  ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:
In member function 'void
std::__atomic_futex_unsigned<_Waiter_bit>::_M_load_when_equal(unsigned
int, std::memory_order)':

/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:248:19:
error: 'mutex' was not declared in this scope
   unique_lock __lock(_M_mutex);
   ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:248:24:
error: template argument 1 is invalid
   unique_lock __lock(_M_mutex);

/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:248:33:
error: '_M_mutex' was not declared in this scope
   unique_lock __lock(_M_mutex);
 ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:250:2:
error: '_M_condvar' was not declared in this scope
  _M_condvar.wait(__lock);
  ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:
In member function 'bool
std::__atomic_futex_unsigned<_Waiter_bit>::_M_load_when_equal_for(unsigned
int, std::memory_order, const std::chrono::duration<_Rep, _Period>&)':

/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:258:19:
error: 'mutex' was not declared in this scope
   unique_lock __lock(_M_mutex);
   ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:258:24:
error: template argument 1 is invalid
   unique_lock __lock(_M_mutex);
^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:258:33:
error: '_M_mutex' was not declared in this scope
   unique_lock __lock(_M_mutex);
 ^
/tmp/20150117/powerpc-ibm-aix7.1.0.0/libstdc++-v3/include/bits/atomic_futex.h:259:14:
error: '_M_condvar' was not declared in this scope
   return _M_condvar.wait_for(__lock, __rtime,
  ^

etc.

- David


[patch] Update C++11 status in libstdc++ docs

2015-01-16 Thread Jonathan Wakely

Update the C++11 library status table and regenerate HTML.

Committed to trunk.

commit 3413bcff19f3ca144c8b4ba9660d8737d5c12072
Author: Jonathan Wakely 
Date:   Sat Jan 17 03:22:15 2015 +

2015-01-17  Ville Voutilainen  
	Jonathan Wakely  

	* doc/xml/manual/status_cxx2011.xml: Update C++11 status.
	* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index 2dd72ae..f1c9639 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -948,8 +948,8 @@ particular release.
   
   20.11.5
   Class template duration
-  Partial
-  Missing constexpr for non-member arithmetic operations
+  Y
+  
 
 
   20.11.6
@@ -1072,15 +1072,15 @@ particular release.
   
   21.2.3.1
   struct char_traits
-  Partial
-  Missing constexpr
+  Y
+  
 
 
   
   21.2.3.2
   struct char_traits
-  Partial
-  Missing constexpr
+  Y
+  
 
 
   21.2.3.3
@@ -1190,17 +1190,15 @@ particular release.
   
 
 
-  
   22.3.3.2.2
   string conversions
-  N
+  Y
   
 
 
-  
   22.3.3.2.3
   Buffer conversions
-  N
+  Y
   
 
 
@@ -1210,12 +1208,10 @@ particular release.
   
 
 
-  
   22.4.1
   The ctype category
-  Partial
-  Missing codecvt and
- codecvt
+  Y
+  
 
 
   22.4.2
@@ -1320,10 +1316,9 @@ particular release.
   
 
 
-  
   22.5
   Standard code conversion facets
-  N
+  Y
   
 
 
@@ -1624,11 +1619,10 @@ particular release.
   
 
 
-  
   25.3
   Mutating sequence operations
-  Partial
-  rotate returns void.
+  Y
+  
 
 
   25.4
@@ -1671,8 +1665,8 @@ particular release.
 
   26.4
   Complex numbers
-  Partial
-  Missing constexpr
+  Y
+  
 
 
   26.5
@@ -1702,19 +1696,19 @@ particular release.
   26.5.3.1
   Class template linear_congruential_engine
   Y
-  Missing constexpr
+  
 
 
   26.5.3.2
   Class template mersenne_twister_engine
   Y
-  Missing constexpr
+  
 
 
   26.5.3.3
   Class template subtract_with_carry_engine
   Y
-  Missing constexpr
+  
 
 
   26.5.4
@@ -1726,19 +1720,19 @@ particular release.
   26.5.4.2
   Class template discard_block_engine
   Y
-  Missing constexpr
+  
 
 
   26.5.4.3
   Class template independent_bits_engine
   Y
-  Missing constexpr
+  
 
 
   26.5.4.4
   Class template shuffle_order_engine
   Y
-  Missing constexpr
+  
 
 
   26.5.5
@@ -1750,7 +1744,7 @@ particular release.
   26.5.6
   Class random_device
   Y
-  Missing constexpr
+  
 
 
   26.5.7


[patch] libstdc++/58357 DR 488 std::rotate should return an iterator

2015-01-16 Thread Jonathan Wakely

http://cplusplus.github.io/LWG/lwg-defects.html#488

Tested x86_64-linux, committed to trunk.
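
A short usage sketch of what the new return value gives callers.  The helper `rotate_left` is a hypothetical wrapper written for this example; the only library behaviour relied on is that std::rotate returns first + (last - middle), as DR 488 specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rotate V left by K positions and return the index at which the element
// that used to be first now lives.
std::size_t
rotate_left (std::vector<int> &v, std::size_t k)
{
  assert (k <= v.size ());
  // With DR 488, std::rotate returns first + (last - middle), i.e. an
  // iterator to the new position of the element originally at *first.
  std::vector<int>::iterator new_pos
    = std::rotate (v.begin (),
                   v.begin () + static_cast<std::ptrdiff_t> (k),
                   v.end ());
  return static_cast<std::size_t> (new_pos - v.begin ());
}
```

Before this change the caller had to recompute that position by hand, since the libstdc++ overloads returned void.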
commit a5ae94562335d47358186f76ca71cc6cf0560ed8
Author: Jonathan Wakely 
Date:   Tue Sep 23 00:11:26 2014 +0100

	DR 488
	PR libstdc++/58357
	* include/bits/algorithmfwd.h (rotate): Return an iterator.
	* include/bits/stl_algo.h (rotate, __rotate): Likewise.
	* testsuite/25_algorithms/rotate/dr488.cc: New.
	* testsuite/25_algorithms/rotate/check_type.cc: Adjust function type.
	* testsuite/25_algorithms/rotate/requirements/explicit_instantiation/
	2.cc: Likewise.
	* testsuite/25_algorithms/rotate/requirements/explicit_instantiation/
	pod.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/algorithmfwd.h b/libstdc++-v3/include/bits/algorithmfwd.h
index 283c5e6..11361bb 100644
--- a/libstdc++-v3/include/bits/algorithmfwd.h
+++ b/libstdc++-v3/include/bits/algorithmfwd.h
@@ -531,7 +531,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 reverse_copy(_BIter, _BIter, _OIter);
 
   template
-void 
+_FIter
 rotate(_FIter, _FIter, _FIter);
 
   template
diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index da642e6..3325b94 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -1239,14 +1239,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// This is a helper function for the rotate algorithm.
   template
-void
+_ForwardIterator
 __rotate(_ForwardIterator __first,
 	 _ForwardIterator __middle,
 	 _ForwardIterator __last,
 	 forward_iterator_tag)
 {
-  if (__first == __middle || __last  == __middle)
-	return;
+  if (__first == __middle)
+	return __last;
+  else if (__last  == __middle)
+	return __first;
 
   _ForwardIterator __first2 = __middle;
   do
@@ -1259,6 +1261,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
   while (__first2 != __last);
 
+  _ForwardIterator __ret = __first;
+
   __first2 = __middle;
 
   while (__first2 != __last)
@@ -1271,11 +1275,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  else if (__first2 == __last)
 	__first2 = __middle;
 	}
+  return __ret;
 }
 
/// This is a helper function for the rotate algorithm.
   template
-void
+_BidirectionalIterator
 __rotate(_BidirectionalIterator __first,
 	 _BidirectionalIterator __middle,
 	 _BidirectionalIterator __last,
@@ -1285,8 +1290,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_Mutable_BidirectionalIteratorConcept<
   _BidirectionalIterator>)
 
-  if (__first == __middle || __last  == __middle)
-	return;
+  if (__first == __middle)
+	return __last;
+  else if (__last  == __middle)
+	return __first;
 
   std::__reverse(__first,  __middle, bidirectional_iterator_tag());
   std::__reverse(__middle, __last,   bidirectional_iterator_tag());
@@ -1298,14 +1305,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   if (__first == __middle)
-	std::__reverse(__middle, __last,   bidirectional_iterator_tag());
+	{
+	  std::__reverse(__middle, __last,   bidirectional_iterator_tag());
+	  return __last;
+	}
   else
-	std::__reverse(__first,  __middle, bidirectional_iterator_tag());
+	{
+	  std::__reverse(__first,  __middle, bidirectional_iterator_tag());
+	  return __first;
+	}
 }
 
   /// This is a helper function for the rotate algorithm.
   template
-void
+_RandomAccessIterator
 __rotate(_RandomAccessIterator __first,
 	 _RandomAccessIterator __middle,
 	 _RandomAccessIterator __last,
@@ -1315,8 +1328,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_Mutable_RandomAccessIteratorConcept<
   _RandomAccessIterator>)
 
-  if (__first == __middle || __last  == __middle)
-	return;
+  if (__first == __middle)
+	return __last;
+  else if (__last  == __middle)
+	return __first;
 
   typedef typename iterator_traits<_RandomAccessIterator>::difference_type
 	_Distance;
@@ -1329,10 +1344,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (__k == __n - __k)
 	{
 	  std::swap_ranges(__first, __middle, __middle);
-	  return;
+	  return __middle;
 	}
 
   _RandomAccessIterator __p = __first;
+  _RandomAccessIterator __ret = __first + (__last - __middle);
 
   for (;;)
 	{
@@ -1343,7 +1359,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _ValueType __t = _GLIBCXX_MOVE(*__p);
 		  _GLIBCXX_MOVE3(__p + 1, __p + __n, __p);
 		  *(__p + __n - 1) = _GLIBCXX_MOVE(__t);
-		  return;
+		  return __ret;
 		}
 	  _RandomAccessIterator __q = __p + __k;
 	  for (_Distance __i = 0; __i < __n - __k; ++ __i)
@@ -1354,7 +1370,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		}
 	  __n %= __k;
 	  if (__n == 0)
-		return;
+		return __ret;
 	  std::swap(__n, __k);
 	  __k = __n - __k;
 	}
@@ -1366,7 +1382,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _ValueType __t = _GLIBCXX_MOVE(*(__p + __n - 1));
 		  _GLIBCXX_MOVE_BACKWARD3(__p, __p + __n - 1, _

Re: RTL cprop vs. fixed hard regs

2015-01-16 Thread Alan Modra
On Sat, Jan 17, 2015 at 11:16:57AM +1030, Alan Modra wrote:
> On Fri, Jan 16, 2015 at 09:35:16AM -0700, Jeff Law wrote:
> > On 01/16/15 02:42, Alan Modra wrote:
> > >   * cprop.c (do_local_cprop): Disallow replacement of fixed
> > >   hard registers.
> > OK.  Extra credit for a testcase, ppc specific is obviously OK.
> 
> Thanks.  Committed revision 219786.  I'll see if I can come up with a
> reasonable testcase.

And now reverted due to Segher's objection.  Here's the testcase FWIW.

Index: gcc/testsuite/gcc.target/powerpc/cprophard.c
===
--- gcc/testsuite/gcc.target/powerpc/cprophard.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/cprophard.c(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "ld 2,(24|40)\\(1\\)" } } */
+
+/* From a linux kernel mis-compile of net/core/skbuff.c.  */
+register unsigned long current_r1 asm ("r1");
+
+void f (unsigned int n, void (*fun) (unsigned long))
+{
+  while (n--)
+(*fun) (current_r1 & -0x1000);
+}

-- 
Alan Modra
Australia Development Lab, IBM


[rl78] Various fixes and tweaks

2015-01-16 Thread DJ Delorie

Various RL78-specific fixes and tweaks wrt volatiles and addressing
modes.  Committed.

* config/rl78/rl78-real.md (addqi3_real): Allow volatiles.
(addhi3_real): Likewise.  Fix [HL+0] syntax.
(subqi3_real): Likewise.
(subhi3_real): Likewise.
(cbranchqi4_real): Likewise.  Allow saddr,#imm.
(cbranchhi4_real): Likewise.
(cbranchhi4_real_inverted): Likewise.
(cbranchsi4_real_lt): Likewise.
(cbranchsi4_real_ge): Likewise.
(cbranchsi4_real_ge): Likewise.
* config/rl78/rl78-virt.md (add3_virt): Likewise.
(sub3_virt): Likewise.
(cbranchqi4_virt): Likewise.
(cbranchhi4_virt): Likewise.
* config/rl78/rl78.c (rl78_print_operand_1): 'p' modifier means
always use '[reg+imm]' even when imm is zero.
* config/rl78/predicates.md (rl78_volatile_memory_operand): New.
(rl78_general_operand): New.
(rl78_nonimmediate_operand): New.
(rl78_nonfar_operand): Use them.
(rl78_nonfar_nonimm_operand): Likewise.
(rl78_stack_based_mem): Fix.
* config/rl78/constraints.md (Ibqi): New.
(IBqi): New.
(Wsa): New.
(Wsf): New.
(Cs1): Fix.
* config/rl78/rl78-expand.md (andqi3): Accept volatiles.
(iorqi3): Likewise.
(xorqi3): Likewise.
* config/rl78/rl78-protos.h (rl78_sfr_p): New.

* config/rl78/constraints.md (Qs8): New constraint.
* config/rl78/rl78.c (rl78_flags_already_set): New function.
* config/rl78/rl78-protos.h (rl78_flags_already_set): New prototype.
* config/rl78/rl78-real.md (update_Z): New attribute.
Update patterns to set it.
(cbranchqi4_real): Call rl78_flags_already_set() to determine if a
shorter compare and branch sequence can be used.
(cbranchhi4_real): Likewise.
(cbranchhi4_real_inverted): Likewise.

* config/rl78/predicates.md (uword_operand): Allow symbol_refs.
* config/rl78/rl78-c.c (rl78_register_pragmas): Register __near
address space.
* config/rl78/rl78.c (rl78_get_name_encoding): New.
(rl78_option_override): Allow -mes0 only if C.
(characterize_address): Support subregs of symbol_refs.
(rl78_addr_space_address_mode): Move.  Add __near.
(rl78_far_p): Likewise.
(rl78_addr_space_pointer_mode): Likewise.
(rl78_as_legitimate_address): Likewise.
(rl78_addr_space_subset_p): Likewise.
(rl78_addr_space_convert): Likewise.
(rl78_print_operand_1): Support 16-bit addressing of 32-bit
symbols with -mes0.
(transcode_memory_rtx): Don't copy ES if -mes0.  Allow symbol[BC]
addressing.
(rl78_alloc_physical_registers_op1): Change logic to prefer
symbol[BC] addressing.
(frodata_section): New.
(rl78_asm_init_sections): Initialize it.
(rl78_select_section): Put __far readonly symbols in .frodata.
(rl78_make_type_far): New.
(rl78_insert_attributes): Force all readonly symbols to be __far when 
-mes0.
(rl78_asm_out_integer): New.
* config/rl78/rl78.h (ADDR_SPACE_NEAR): New.
* config/rl78/rl78.opt (-mes0): New.

* config/rl78/rl78.h (ASM_OUTPUT_LABELREF): New.
(ASM_OUTPUT_ALIGNED_DECL_COMMON): New.
(ASM_OUTPUT_ALIGNED_DECL_LOCAL): New.
* config/rl78/rl78-protos.h (rl78_output_labelref): New.
(rl78_saddr_p): New.
(rl78_output_aligned_common): New.
* config/rl78/rl78.c (rl78_output_symbol_ref): Strip encodings.
(rl78_handle_saddr_attribute): New.
(rl78_handle_naked_attribute): New.
(rl78_attribute_table): Add saddr.
(rl78_print_operand_1): Don't print '!' on saddr operands.
(rl78_print_operand_1): Strip encodings.
(rl78_sfr_p): New.
(rl78_strip_name_encoding): New.
(rl78_attrlist_to_encoding): New.
(rl78_encode_section_info): New.
(rl78_asm_init_sections): New.
(rl78_select_section): New.
(rl78_output_labelref): New.
(rl78_output_aligned_common): New.
(rl78_asm_out_integer): New.
(rl78_asm_ctor_dtor): New.
(rl78_asm_constructor): New.
(rl78_asm_destructor): New.

* config/rl78/rl78-real.md (movqi_es): Rename to movqi_to_es.
* config/rl78/rl78.c (rl78_expand_epilogue): Update.
(transcode_memory_rtx): Update.
(rl78_expand_epilogue): Use A_REG instead of 0.

Index: config/rl78/predicates.md
===
--- config/rl78/predicates.md   (revision 219790)
+++ config/rl78/predicates.md   (working copy)
@@ -15,24 +15,40 @@
 ;; GNU General Public License for more details.
 
 ;; You should have received a copy of the GNU General Public License
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-(define_predicate "rl78_any_ope

Re: RTL cprop vs. fixed hard regs

2015-01-16 Thread Alan Modra
On Fri, Jan 16, 2015 at 08:09:51PM -0600, Segher Boessenkool wrote:
> On Sat, Jan 17, 2015 at 11:07:12AM +1030, Alan Modra wrote:
> > On Fri, Jan 16, 2015 at 11:03:24AM -0600, Segher Boessenkool wrote:
> > > On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
> > > > OK, so we need to fix this in the rs6000 backend, but it occurs to me
> > > > that cprop also has a bug here.  It shouldn't be touching fixed hard
> > > > registers.
> > > 
> > > Why not?  It cannot allocate a fixed reg to a pseudo, but other than
> > > that there is nothing special about fixed regs; the transform is
> > > perfectly valid as far as I see.
> > 
> > I didn't say that copying to a pseudo and using that was invalid..
> > The bug I see is a mis-optimisation.
> 
> Ah, okay, good :-)
> 
> This same mis-optimisation would happen if r1 was just some regular
> non-fixed register, hrm.  Maybe something else in cprop needs some
> tuning up?

Well, if the original pseudo register dies earlier as a result of
substituting a copy then you've gained.

> > Also, the asm operands case that
> > do_local_cprop already rules out changing is very similar to fixed
> > regs.  Would you argue that changing asm operands is also valid?  :)
> 
> A fixed reg in an asm_operands is a hard reg; a hard reg in an asm_operands
> (before reload) is a register asm variable.  And we had better not change
> register variable asm arguments, since that is what we promise not to do
> with register variables.  The case is not similar at all.

The similarity I see is that we have a hard reg that is a register asm
variable here too.  How else do you get a copy from a fixed hard reg
to a pseudo?  Hrrm, maybe some backend code does that sort of thing.

> > > It isn't a desirable transform in this case, but that is not true for
> > > fixed regs in general (just because the stack pointer is live everywhere).
> > 
> > What's the point in extending the lifetime of some pseudo when you
> > know the original fixed register is available everywhere?
> 
> That is my point: _if_ you know it is live all the time, or if there is no
> advantage to shortening the lifetime of the value in that fixed reg, then
> yes we should not propagate that value.  But that is not true for all
> fixed regs.
> 
> > Do you have
> > some concrete example in mind where this "optimisation" is beneficial?
> 
> The CA_REG in rs6000 is a fixed register.  It isn't a terribly good
> example because it cannot be propagated anyway, for other reasons; but
> it will hopefully help explain my point.  So please pretend we can copy
> it to GPRs :-)  [ The situation with the T bit on SH is similar, but I
> don't know the details there well enough. ]
> 
> There is only one CA_REG.  It is used in quite a few sequences.  It
> contains a totally different value every time.  Because there is only
> one such register the instruction sequences around it cannot be reordered
> very well.
> 
> Propagating the value in such a not-so-very-fixed fixed reg helps reduce
> the lifetimes of the values in those regs, helps reordering, combining,
> scheduling, performance in general.
> 
> If you are only concerned about the stack pointer, you could just check
> for that?  But please add a comment in any case, saying why you exclude
> it (and ideally don't lump it in with tests that are needed for
> correctness).

No, I don't want to special-case sp.  That's horrible.  If the patch I
just committed is wrong, I'll revert it.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Jack Howarth
Confirmed that this patch eliminates

[Bug libgomp/64625] ___OFFLOAD_TABLE__ symbol not produced on x86_64 darwin

and thus exposes

[Bug libgomp/64635] New: darwin produces
libgomp-plugin-host_nonshm.1.dylib but tries to load
libgomp-plugin-host_nonshm.so.1

The additional hack (which should be fixed with configure/Makefile
changes to detect SHLIBEXT)...

@@ -1055,7 +1054,7 @@ static void
 gomp_target_init (void)
 {
   const char *prefix ="libgomp-plugin-";
-  const char *suffix = ".so.1";
+  const char *suffix = ".1.dylib";
   const char *cur, *next;
   char *plugin_name;

to target.c in libgomp eliminates the second bug.
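
For illustration only, here is a minimal sketch of how the suffix could be
chosen per host instead of being hard-coded.  It assumes a compile-time
check on __APPLE__; a real fix would more likely have configure detect
SHLIBEXT as noted above.  The helper names plugin_suffix and
plugin_file_name are hypothetical and are not part of libgomp.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not libgomp code: pick the shared-library suffix
// for the host at compile time rather than hard-coding ".so.1".
inline const char *plugin_suffix ()
{
#if defined(__APPLE__)
  return ".1.dylib";   // Darwin naming, as in the hack above.
#else
  return ".so.1";      // ELF naming used by the existing code.
#endif
}

// Build "libgomp-plugin-<name><suffix>" from a plugin name and a suffix.
inline std::string plugin_file_name (const std::string &name,
                                     const std::string &suffix)
{
  return "libgomp-plugin-" + name + suffix;
}
```

Calling plugin_file_name ("host_nonshm", plugin_suffix ()) would then give
the name that the loader should look for on the current host.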

Native configuration is x86_64-apple-darwin14.1.0

=== libgomp tests ===

Schedule of variations:
unix/-m32
unix/-m64

Running target unix/-m32
Using /sw/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /sw/share/dejagnu/config/unix.exp as generic interface file for target.
Using 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/config/default.exp
as tool-and-target-specific interface file.
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c/c.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c++/c++.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.fortran/fortran.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.graphite/graphite.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c/c.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c++/c++.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
...

=== libgomp Summary for unix/-m32 ===

# of expected passes 5715
# of unsupported tests 281
Running target unix/-m64
Using /sw/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /sw/share/dejagnu/config/unix.exp as generic interface file for target.
Using 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/config/default.exp
as tool-and-target-specific interface file.
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c/c.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.c++/c++.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.fortran/fortran.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.graphite/graphite.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c/c.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-c++/c++.exp
...
Running 
/sw/src/fink.build/gcc50-5.0.0-1000/gcc-5-20150116/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
...

=== libgomp Summary for unix/-m64 ===

# of expected passes 5715
# of unsupported tests 281

=== libgomp Summary ===

# of expected passes 11430
# of unsupported tests 562


On Fri, Jan 16, 2015 at 3:34 PM, Thomas Schwinge wrote:
> Hi!
>
> On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
>> In r219682, I have committed to trunk our current set of OpenACC changes,
>
> Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal
> parameter, as discussed in <https://gcc.gnu.org/PR64625>.
>
> But -- I now wonder whether that's actually the issue that has been
> reported in the PR; doesn't that more look like a problem with the
> __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
> the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
> anyone guess what's going on?
>
> Anyway, as discussed in <https://gcc.gnu.org/PR64625>, I'd like to commit
> this patch either way, OK?
>
> commit 4409d0129118479c1cd1adbcfa96316ac4e734b0
> Author: Thomas Schwinge 
> Date:   Fri Jan 16 20:12:12 2015 +0100
>
> [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter.
>
> gcc/
> * omp-low.c (offload_symbol_decl): Remove variable.
> (get_offload_symbol_decl): Remove function.
> (expand_omp_target): For BUILT_IN_GOMP_TARGET,
> BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL
> instead of &__OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START,
> BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL,
> BUILT_IN_GOACC_UPDATE don't pass it at all.
> libgomp/
> * libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data)
> (GOACC_parallel, GOACC_update): Remove const_void *offload_table
> formal parameter.  Update all users.
> * target.c (GOMP_tar

[RFC][PATCH 3/3] Enable zero/sign extension elimination

2015-01-16 Thread Kugan

Re-enable zero/sign extension elimination using value range that
includes wrapped attribute.
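
To illustrate the idea behind the check (this is a toy model, not the
patch itself): when the value range of a narrow value is known to be
non-negative, zero extension and sign extension produce the same wide
value, so the SUBREG can be marked as promoted for both.  The sketch
below assumes an 8-bit narrow type and a 32-bit wide type; all function
names in it are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Extend an 8-bit value to 32 bits by filling the upper bits with zeros.
inline uint32_t zero_extend8 (uint8_t v)
{
  return v;
}

// Extend an 8-bit value to 32 bits by copying its sign bit upwards.
inline uint32_t sign_extend8 (uint8_t v)
{
  return static_cast<uint32_t> (static_cast<int32_t> (static_cast<int8_t> (v)));
}

// Same shape as the test in the patch: both bounds must be non-negative
// when read as signed values of the narrow type.
inline bool extensions_agree_on_range (int8_t min, int8_t max)
{
  return min >= 0 && max >= 0;
}

// Brute-force check that both extensions agree for every value in
// [min, max].
inline bool verify_range (int8_t min, int8_t max)
{
  for (int v = min; v <= max; ++v)
    if (zero_extend8 (static_cast<uint8_t> (v))
        != sign_extend8 (static_cast<uint8_t> (v)))
      return false;
  return true;
}
```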

Thanks,
Kugan


gcc/ChangeLog:

2015-01-16  Kugan Vivekanandarajah  

* calls.c (precompute_arguments): Check
 promoted_for_signed_and_unsigned_p and set the promoted mode.
* expr.c (expand_expr_real_1): Likewise.
(promoted_for_signed_and_unsigned_p): New function.
* cfgexpand.c (expand_gimple_stmt_1): Call emit_move_insn if
SUBREG is promoted with SRP_SIGNED_AND_UNSIGNED.
* expr.h (promoted_for_signed_and_unsigned_p): New definition.

diff --git a/gcc/calls.c b/gcc/calls.c
index 36aa19f..71d2469 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1620,7 +1620,10 @@ precompute_arguments (int num_actuals, struct arg_data 
*args)
  args[i].initial_value
= gen_lowpart_SUBREG (mode, args[i].value);
  SUBREG_PROMOTED_VAR_P (args[i].initial_value) = 1;
- SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
+ if (promoted_for_signed_and_unsigned_p (args[i].tree_value))
+   SUBREG_PROMOTED_SET (args[i].initial_value, 
SRP_SIGNED_AND_UNSIGNED);
+ else
+   SUBREG_PROMOTED_SET (args[i].initial_value, args[i].unsignedp);
}
}
 }
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 8926e8f..39d52db 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3410,7 +3410,13 @@ expand_gimple_stmt_1 (gimple stmt)
  GET_MODE (target), temp, unsignedp);
  }
 
-   convert_move (SUBREG_REG (target), temp, unsignedp);
+   if ((SUBREG_PROMOTED_GET (target) == SRP_SIGNED_AND_UNSIGNED)
+   && (GET_CODE (temp) == SUBREG)
+   && (GET_MODE (target) == GET_MODE (temp))
+   && (GET_MODE (SUBREG_REG (target)) == GET_MODE (SUBREG_REG (temp))))
+ emit_move_insn (SUBREG_REG (target), SUBREG_REG (temp));
+   else
+ convert_move (SUBREG_REG (target), temp, unsignedp);
  }
else if (nontemporal && emit_storent_insn (target, temp))
  ;
diff --git a/gcc/expr.c b/gcc/expr.c
index fc22862..48a5d13 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -174,6 +174,39 @@ static void emit_single_push_insn (machine_mode, rtx, 
tree);
 static void do_tablejump (rtx, machine_mode, rtx, rtx, rtx, int);
 static rtx const_vector_from_tree (tree);
 
+/* Return TRUE if value in SSA is zero and sign extended for wider mode MODE
+   using value range information stored.  Return FALSE otherwise.
+
+   This is used to check if SUBREG is zero and sign extended and to set
+   promoted mode SRP_SIGNED_AND_UNSIGNED to SUBREG.  */
+
+bool
+promoted_for_signed_and_unsigned_p (tree ssa)
+{
+  wide_int min, max;
+  bool ovf;
+
+  if (ssa == NULL_TREE
+  || TREE_CODE (ssa) != SSA_NAME
+  || !INTEGRAL_TYPE_P (TREE_TYPE (ssa)))
+return false;
+
+  /* Return FALSE if value_range is not recorded for SSA.  */
+  if (get_range_info (ssa, &min, &max, &ovf) != VR_RANGE)
+return false;
+
+  if (ovf)
+return false;
+
+  /* Return true (to set SRP_SIGNED_AND_UNSIGNED to SUBREG) if MSB of the
+ smaller mode is not set (i.e.  MSB of ssa is not set).  */
+  if (!wi::neg_p (min, SIGNED) && !wi::neg_p(max, SIGNED))
+return true;
+  else
+return false;
+
+}
+
 
 /* This is run to set up which modes can be used
directly in memory and to initialize the block move optab.  It is run
@@ -9656,7 +9689,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
tmode,
 
  temp = gen_lowpart_SUBREG (mode, decl_rtl);
  SUBREG_PROMOTED_VAR_P (temp) = 1;
- SUBREG_PROMOTED_SET (temp, unsignedp);
+ if (promoted_for_signed_and_unsigned_p (ssa_name))
+   SUBREG_PROMOTED_SET (temp, SRP_SIGNED_AND_UNSIGNED);
+ else
+   SUBREG_PROMOTED_SET (temp, unsignedp);
  return temp;
}
 
diff --git a/gcc/expr.h b/gcc/expr.h
index a7638b8..8fb1339 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -288,6 +288,7 @@ extern rtx expand_expr_real_1 (tree, rtx, machine_mode,
   enum expand_modifier, rtx *, bool);
 extern rtx expand_expr_real_2 (sepops, rtx, machine_mode,
   enum expand_modifier);
+extern bool promoted_for_signed_and_unsigned_p (tree);
 
 /* Generate code for computing expression EXP.
An rtx for the computed value is returned.  The value is never null.


Re: RTL cprop vs. fixed hard regs

2015-01-16 Thread Segher Boessenkool
On Sat, Jan 17, 2015 at 11:07:12AM +1030, Alan Modra wrote:
> On Fri, Jan 16, 2015 at 11:03:24AM -0600, Segher Boessenkool wrote:
> > On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
> > > OK, so we need to fix this in the rs6000 backend, but it occurs to me
> > > that cprop also has a bug here.  It shouldn't be touching fixed hard
> > > registers.
> > 
> > Why not?  It cannot allocate a fixed reg to a pseudo, but other than
> > that there is nothing special about fixed regs; the transform is
> > perfectly valid as far as I see.
> 
> I didn't say that copying to a pseudo and using that was invalid..
> The bug I see is a mis-optimisation.

Ah, okay, good :-)

This same mis-optimisation would happen if r1 was just some regular
non-fixed register, hrm.  Maybe something else in cprop needs some
tuning up?

> Also, the asm operands case that
> do_local_cprop already rules out changing is very similar to fixed
> regs.  Would you argue that changing asm operands is also valid?  :)

A fixed reg in an asm_operands is a hard reg; a hard reg in an asm_operands
(before reload) is a register asm variable.  And we had better not change
register variable asm arguments, since that is what we promise not to do
with register variables.  The case is not similar at all.

> > It isn't a desirable transform in this case, but that is not true for
> > fixed regs in general (just because the stack pointer is live everywhere).
> 
> What's the point in extending the lifetime of some pseudo when you
> know the original fixed register is available everywhere?

That is my point: _if_ you know it is live all the time, or if there is no
advantage to shortening the lifetime of the value in that fixed reg, then
yes we should not propagate that value.  But that is not true for all
fixed regs.

> Do you have
> some concrete example in mind where this "optimisation" is beneficial?

The CA_REG in rs6000 is a fixed register.  It isn't a terribly good
example because it cannot be propagated anyway, for other reasons; but
it will hopefully help explain my point.  So please pretend we can copy
it to GPRs :-)  [ The situation with the T bit on SH is similar, but I
don't know the details there well enough. ]

There is only one CA_REG.  It is used in quite a few sequences.  It
contains a totally different value every time.  Because there is only
one such register the instruction sequences around it cannot be reordered
very well.

Propagating the value in such a not-so-very-fixed fixed reg helps reduce
the lifetimes of the values in those regs, helps reordering, combining,
scheduling, performance in general.

If you are only concerned about the stack pointer, you could just check
for that?  But please add a comment in any case, saying why you exclude
it (and ideally don't lump it in with tests that are needed for
correctness).

Cheers,


Segher


[RFC][PATCH 2/3] Propagate and save value ranges wrapped information

2015-01-16 Thread Kugan

This patch propagates the value range wrapped attribute and saves it to
SSA_NAME.

Thanks,
Kugan

gcc/ChangeLog:

2015-01-16  Kugan Vivekanandarajah  

* builtins.c (determine_block_size): Use new definition of
 get_range_info.
* gimple-pretty-print.c (dump_ssaname_info): Dump new wrapped info.
* internal-fn.c (get_range_pos_neg): Use new definition of
 get_range_info.
(get_min_precision): Likewise.
* tree-ssa-copy.c (fini_copy_prop): Use new definition of
 duplicate_ssa_range_info.
* tree-ssa-loop-im.c
(move_computations_dom_walker::before_dom_children): Likewise.
* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children): Likewise.
* tree-ssa-loop-niter.c (determine_value_range): Use new definition.
* tree-ssanames.c (set_range_info): Save wrapped information.
(get_range_info): Retrieve wrapped information.
(set_nonzero_bits): Set wrapped info.
(duplicate_ssa_name_range_info): Likewise.
(duplicate_ssa_name_fn): Likewise.
* tree-ssanames.h: (set_range_info): Update definition.
(get_range_info): Ditto.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Use new
declaration get_range_info.
* tree-vrp.c (struct value_range_d): Add wrapped field.
(set_value_range): Calculate and add wrapped field.
(set_and_canonicalize_value_range): Likewise.
(copy_value_range): Likewise.
(set_value_range_to_value): Likewise.
(set_value_range_to_nonnegative): Likewise.
(set_value_range_to_nonnull): Likewise.
(set_value_range_to_truthvalue): Likewise.
(abs_extent_range): Likewise.
(get_value_range): Return wrapped info.
(update_value_range): Save wrapped info.
(extract_range_from_assert): Extract and update wrapped info.
(extract_range_from_ssa_name): Likewise.
(vrp_int_const_binop): Likewise.
(extract_range_from_multiplicative_op_1): Likewise.
(extract_range_from_binary_expr_1): Likewise.
(extract_range_from_binary_expr): Likewise.
(extract_range_from_unary_expr_1): Likewise.
(extract_range_from_comparison): Likewise.
(extract_range_basic): Likewise.
(adjust_range_with_scev): Likewise.
(dump_value_range): Dump wrapped info.
(remove_range_assertions): Update parameters.
(vrp_intersect_ranges_1): Propagate wrapped info.
(vrp_meet_1): Likewise.
(vrp_visit_phi_node): Save wrapped info to SSA.
(vrp_finalize): Likewise.
* tree.h (SSA_NAME_ANTI_RANGE_P): Remove.
(SSA_NAME_RANGE_OVF_P): New.

gcc/testsuite/ChangeLog:

2015-01-16  Kugan Vivekanandarajah  

* gcc.dg/tree-ssa/vrp92.c: Update scanned pattern.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 9280704..83a0882 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3150,7 +3150,7 @@ determine_block_size (tree len, rtx len_rtx,
*probable_max_size = *max_size = GET_MODE_MASK (GET_MODE (len_rtx));
 
   if (TREE_CODE (len) == SSA_NAME)
-   range_type = get_range_info (len, &min, &max);
+   range_type = get_range_info (len, &min, &max, NULL);
   if (range_type == VR_RANGE)
{
  if (wi::fits_uhwi_p (min) && *min_size < min.to_uhwi ())
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 21e98c6..4d41bca 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1812,13 +1812,15 @@ dump_ssaname_info (pretty_printer *buffer, tree node, 
int spc)
   && SSA_NAME_RANGE_INFO (node))
 {
   wide_int min, max, nonzero_bits;
-  value_range_type range_type = get_range_info (node, &min, &max);
+  bool wrapped;
+  value_range_type range_type = get_range_info (node, &min, &max, 
&wrapped);
 
   if (range_type == VR_VARYING)
pp_printf (buffer, "# RANGE VR_VARYING");
   else if (range_type == VR_RANGE || range_type == VR_ANTI_RANGE)
{
  pp_printf (buffer, "# RANGE ");
+ pp_printf (buffer, "WRAPPED  = %s ", wrapped ? "true" : "false");
  pp_printf (buffer, "%s[", range_type == VR_RANGE ? "" : "~");
  pp_wide_int (buffer, min, TYPE_SIGN (TREE_TYPE (node)));
  pp_printf (buffer, ", ");
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 07a9ec5..8955bb8 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -262,7 +262,7 @@ get_range_pos_neg (tree arg)
   if (TREE_CODE (arg) != SSA_NAME)
 return 3;
   wide_int arg_min, arg_max;
-  while (get_range_info (arg, &arg_min, &arg_max) != VR_RANGE)
+  while (get_range_info (arg, &arg_min, &arg_max, NULL) != VR_RANGE)
 {
   gimple g = SSA_NAME_DEF_STMT (arg);
   if (is_gimple_assign (g)
@@ -344,7 +344,7 @@ get_min_precision (tree arg, signop sign)
   if (TREE_CODE (arg) != SSA_NAME)
 return prec + (orig_sign != sign);
   wide_int arg_min, arg_max;
-  while (get_range_info 

[RFC][PATCH 1/3] Free a bit in SSA_NAME to save wrapped information

2015-01-16 Thread Kugan
This frees a spare bit to store the wrapped attribute by going back to
representing VR_ANTI_RANGE as [max + 1, min - 1] in SSA_NAME.
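
As a rough sketch of the encoding (a toy model under simplifying
assumptions, not the tree-ssanames.c code): an anti-range ~[min, max] is
stored as the pair [max + 1, min - 1], and a stored pair whose max is
below its min is decoded back into an anti-range.  The sketch uses plain
int64_t instead of wide_int and ignores wraparound at the limits of the
type; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum RangeKind { kRange, kAntiRange };

// The pair as it would be stored, with no separate range-type bit.
struct Stored { int64_t min, max; };

// The range as seen by callers after decoding.
struct Decoded { RangeKind kind; int64_t min, max; };

// ~[min, max] is stored as [max + 1, min - 1]; a plain range is stored
// unchanged.
inline Stored encode (RangeKind kind, int64_t min, int64_t max)
{
  if (kind == kAntiRange)
    return Stored{max + 1, min - 1};
  return Stored{min, max};
}

// A stored max below the stored min signals an anti-range.
inline Decoded decode (const Stored &s)
{
  if (s.max < s.min)
    return Decoded{kAntiRange, s.max + 1, s.min - 1};
  return Decoded{kRange, s.min, s.max};
}
```

For example, ~[3, 7] is stored as [8, 2] and decodes back to ~[3, 7].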

Thanks,
Kugan

gcc/ChangeLog:

2015-01-16  Kugan Vivekanandarajah  

* tree-ssanames.c (set_range_info): Change range info representation
  and represent VR_ANTI_RANGE as [max + 1, min - 1].
(get_range_info): Likewise.
(set_nonzero_bits): Likewise.
(duplicate_ssa_name_range_info): Likewise.
* tree-ssanames.h (set_range_info): Change prototype.
(get_range_info): Likewise.
(set_nonzero_bits): Likewise.
(duplicate_ssa_name_range_info): Likewise.
* tree-vrp.c (remove_range_assertions): Use new representation.
(vrp_finalize): Likewise.

diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 9c39f65..744dc43 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -200,11 +200,10 @@ make_ssa_name_fn (struct function *fn, tree var, gimple 
stmt)
 /* Store range information RANGE_TYPE, MIN, and MAX to tree ssa_name NAME.  */
 
 void
-set_range_info (tree name, enum value_range_type range_type,
+set_range_info (tree name,
const wide_int_ref &min, const wide_int_ref &max)
 {
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
-  gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
   unsigned int precision = TYPE_PRECISION (TREE_TYPE (name));
 
@@ -219,16 +218,12 @@ set_range_info (tree name, enum value_range_type 
range_type,
   ri->set_nonzero_bits (wi::shwi (-1, precision));
 }
 
-  /* Record the range type.  */
-  if (SSA_NAME_RANGE_TYPE (name) != range_type)
-SSA_NAME_ANTI_RANGE_P (name) = (range_type == VR_ANTI_RANGE);
-
   /* Set the values.  */
   ri->set_min (min);
   ri->set_max (max);
 
   /* If it is a range, try to improve nonzero_bits from the min/max.  */
-  if (range_type == VR_RANGE)
+  if (wi::cmp (min, max, TYPE_SIGN (TREE_TYPE (name))) < 0)
 {
   wide_int xorv = ri->get_min () ^ ri->get_max ();
   if (xorv != 0)
@@ -248,6 +243,7 @@ get_range_info (const_tree name, wide_int *min, wide_int 
*max)
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
   gcc_assert (min && max);
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
+  value_range_type range_type;
 
   /* Return VR_VARYING for SSA_NAMEs with NULL RANGE_INFO or SSA_NAMEs
  with integral types width > 2 * HOST_BITS_PER_WIDE_INT precision.  */
@@ -255,9 +251,22 @@ get_range_info (const_tree name, wide_int *min, wide_int 
*max)
  > 2 * HOST_BITS_PER_WIDE_INT))
 return VR_VARYING;
 
-  *min = ri->get_min ();
-  *max = ri->get_max ();
-  return SSA_NAME_RANGE_TYPE (name);
+   /* If max < min, it is VR_ANTI_RANGE.  */
+  if (wi::cmp (ri->get_max (), ri->get_min (), TYPE_SIGN (TREE_TYPE (name))) < 
0)
+{
+  /* VR_ANTI_RANGE ~[min, max] is encoded as [max + 1, min - 1].  */
+  range_type = VR_ANTI_RANGE;
+  *min = wi::add (ri->get_max (), 1);
+  *max = wi::sub (ri->get_min (), 1);
+}
+  else
+{
+  /* Otherwise (when min <= max), it is VR_RANGE.  */
+  range_type = VR_RANGE;
+  *min = ri->get_min ();
+  *max = ri->get_max ();
+}
+  return range_type;
 }
 
 /* Change non-zero bits bitmask of NAME.  */
@@ -267,7 +276,7 @@ set_nonzero_bits (tree name, const wide_int_ref &mask)
 {
   gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
   if (SSA_NAME_RANGE_INFO (name) == NULL)
-set_range_info (name, VR_RANGE,
+set_range_info (name,
TYPE_MIN_VALUE (TREE_TYPE (name)),
TYPE_MAX_VALUE (TREE_TYPE (name)));
   range_info_def *ri = SSA_NAME_RANGE_INFO (name);
@@ -495,7 +504,8 @@ duplicate_ssa_name_ptr_info (tree name, struct ptr_info_def 
*ptr_info)
 /* Creates a duplicate of the range_info_def at RANGE_INFO of type
RANGE_TYPE for use by the SSA name NAME.  */
 void
-duplicate_ssa_name_range_info (tree name, enum value_range_type range_type,
+duplicate_ssa_name_range_info (tree name,
+  enum value_range_type range_type 
ATTRIBUTE_UNUSED,
   struct range_info_def *range_info)
 {
   struct range_info_def *new_range_info;
@@ -513,8 +523,6 @@ duplicate_ssa_name_range_info (tree name, enum 
value_range_type range_type,
   new_range_info = static_cast<range_info_def *> (ggc_internal_alloc (size));
   memcpy (new_range_info, range_info, size);
 
-  gcc_assert (range_type == VR_RANGE || range_type == VR_ANTI_RANGE);
-  SSA_NAME_ANTI_RANGE_P (name) = (range_type == VR_ANTI_RANGE);
   SSA_NAME_RANGE_INFO (name) = new_range_info;
 }
 
diff --git a/gcc/tree-ssanames.h b/gcc/tree-ssanames.h
index a7eeb8f..0d4b212 100644
--- a/gcc/tree-ssanames.h
+++ b/gcc/tree-ssanames.h
@@ -68,7 +68,7 @@ struct GTY ((variable_size)) range_info_def {
 enum value_range_type { VR_UNDEFINED, VR_RANGE, VR_ANTI_RANGE, VR_VARYING };
 
 /* Sets the value range to SSA.  */
-extern void set_range_info (tree, enum value

[RFC][PATCH 0/3] Re-enable zero/sign extension elimination using value ranges

2015-01-16 Thread Kugan
Hi,

Due to wrapping in the value ranges, there was a regression on
alpha-linux (https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02458.html)
and hence I had to revert the patch that enabled zero/sign extension
elimination using value range. I have now attempted to propagate this
information and enable it again.

PATCH1: Free a bit in SSA_NAME to save wrapped information
PATCH2: Propagate and save value ranges wrapped information
PATCH3: Enable zero/sign extension elimination

Bootstrapped and regression tested for x86_64-linux-gnu with no new
regressions.

Thanks,
Kugan


[patch] libstdc++/60940 unify std::atomic_xxx typedefs with std::atomic types

2015-01-16 Thread Jonathan Wakely

Here's the finished version of the proof-of-concept patch I sent
in stage1.

This changes the atomic_int typedef to be a synonym for
std::atomic<int> instead of std::__atomic_base<int>, and likewise for
the other atomic integral types.

This fixes PR 60940, so that the non-member atomic ops such as
std::atomic_load(), std::atomic_store() etc. work on the typedefs, not
only on the std::atomic types.
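
A small sketch of what this enables, assuming a library that includes
this change (the function name store_then_load is only for
illustration):

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// With the typedefs unified, atomic_int names the same type as
// atomic<int>.
static_assert (std::is_same<std::atomic_int, std::atomic<int>>::value,
               "atomic_int should be a synonym for atomic<int>");

// The non-member operations now accept the typedef directly.
inline int store_then_load (int value)
{
  std::atomic_int counter (0);
  std::atomic_store (&counter, value);
  return std::atomic_load (&counter);
}
```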

Tested x86_64-linux, committed to trunk.
commit b831a68f637d7484e1a5f237f1632fbd3c62c489
Author: Jonathan Wakely 
Date:   Fri Oct 24 15:40:50 2014 +0100

	PR libstdc++/60940
	* include/bits/atomic_base.h: Remove atomic integral typedefs as
	synonyms for __atomic_base<int> etc.
	* include/std/atomic: Make atomic_int a synonym for atomic<int> and
	likewise for all atomic integral types.
	* testsuite/29_atomics/atomic_integral/cons/copy_list.cc: New.
	* testsuite/29_atomics/atomic/60695.cc: Adjust dg-error line number.

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 29ee9e7..5e610f1 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -119,120 +119,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _IntTp>
 struct __atomic_base;
 
-  /// atomic_char
-  typedef __atomic_base<char>			atomic_char;
-
-  /// atomic_schar
-  typedef __atomic_base<signed char>		atomic_schar;
-
-  /// atomic_uchar
-  typedef __atomic_base<unsigned char>		atomic_uchar;
-
-  /// atomic_short
-  typedef __atomic_base<short>			atomic_short;
-
-  /// atomic_ushort
-  typedef __atomic_base<unsigned short>		atomic_ushort;
-
-  /// atomic_int
-  typedef __atomic_base<int>			atomic_int;
-
-  /// atomic_uint
-  typedef __atomic_base<unsigned int>		atomic_uint;
-
-  /// atomic_long
-  typedef __atomic_base<long>			atomic_long;
-
-  /// atomic_ulong
-  typedef __atomic_base<unsigned long>		atomic_ulong;
-
-  /// atomic_llong
-  typedef __atomic_base<long long>		atomic_llong;
-
-  /// atomic_ullong
-  typedef __atomic_base<unsigned long long>	atomic_ullong;
-
-  /// atomic_wchar_t
-  typedef __atomic_base<wchar_t>		atomic_wchar_t;
-
-  /// atomic_char16_t
-  typedef __atomic_base<char16_t>		atomic_char16_t;
-
-  /// atomic_char32_t
-  typedef __atomic_base<char32_t>		atomic_char32_t;
-
-  /// atomic_char32_t
-  typedef __atomic_base<char32_t>		atomic_char32_t;
-
-
-  /// atomic_int_least8_t
-  typedef __atomic_base<int_least8_t>		atomic_int_least8_t;
-
-  /// atomic_uint_least8_t
-  typedef __atomic_base<uint_least8_t>		atomic_uint_least8_t;
-
-  /// atomic_int_least16_t
-  typedef __atomic_base<int_least16_t>		atomic_int_least16_t;
-
-  /// atomic_uint_least16_t
-  typedef __atomic_base<uint_least16_t>		atomic_uint_least16_t;
-
-  /// atomic_int_least32_t
-  typedef __atomic_base<int_least32_t>		atomic_int_least32_t;
-
-  /// atomic_uint_least32_t
-  typedef __atomic_base<uint_least32_t>		atomic_uint_least32_t;
-
-  /// atomic_int_least64_t
-  typedef __atomic_base<int_least64_t>		atomic_int_least64_t;
-
-  /// atomic_uint_least64_t
-  typedef __atomic_base<uint_least64_t>		atomic_uint_least64_t;
-
-
-  /// atomic_int_fast8_t
-  typedef __atomic_base<int_fast8_t>		atomic_int_fast8_t;
-
-  /// atomic_uint_fast8_t
-  typedef __atomic_base<uint_fast8_t>		atomic_uint_fast8_t;
-
-  /// atomic_int_fast16_t
-  typedef __atomic_base<int_fast16_t>		atomic_int_fast16_t;
-
-  /// atomic_uint_fast16_t
-  typedef __atomic_base<uint_fast16_t>		atomic_uint_fast16_t;
-
-  /// atomic_int_fast32_t
-  typedef __atomic_base<int_fast32_t>		atomic_int_fast32_t;
-
-  /// atomic_uint_fast32_t
-  typedef __atomic_base<uint_fast32_t>		atomic_uint_fast32_t;
-
-  /// atomic_int_fast64_t
-  typedef __atomic_base<int_fast64_t>		atomic_int_fast64_t;
-
-  /// atomic_uint_fast64_t
-  typedef __atomic_base<uint_fast64_t>		atomic_uint_fast64_t;
-
-
-  /// atomic_intptr_t
-  typedef __atomic_base<intptr_t>		atomic_intptr_t;
-
-  /// atomic_uintptr_t
-  typedef __atomic_base<uintptr_t>		atomic_uintptr_t;
-
-  /// atomic_size_t
-  typedef __atomic_base<size_t>			atomic_size_t;
-
-  /// atomic_intmax_t
-  typedef __atomic_base<intmax_t>		atomic_intmax_t;
-
-  /// atomic_uintmax_t
-  typedef __atomic_base<uintmax_t>		atomic_uintmax_t;
-
-  /// atomic_ptrdiff_t
-  typedef __atomic_base<ptrdiff_t>		atomic_ptrdiff_t;
-
 
 #define ATOMIC_VAR_INIT(_VI) { _VI }
 
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 61611af..43cf4f3 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -49,21 +49,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-  /// atomic_bool
+  template<typename _Tp>
+    struct atomic;
+
+  /// atomic<bool>
   // NB: No operators or fetch-operations for this type.
-  struct atomic_bool
+  template<>
+  struct atomic<bool>
   {
   private:
     __atomic_base<bool>	_M_base;
 
   public:
-atomic_bool() noexcept = default;
-~atomic_bool() noexcept = default;
-atomic_bool(const atomic_bool&) = delete;
-atomic_bool& operator=(const atomic_bool&) = delete;
-atomic_bool& operator=(const atomic_bool&) volatile = delete;
+atomic() noexcept = default;
+~atomic() noexcept = default;
+atomic(const atomic&) = delete;
+atomic& operator=(const atomic&) = delete;
+atom

Re: RTL cprop vs. fixed hard regs

2015-01-16 Thread Alan Modra
On Fri, Jan 16, 2015 at 09:35:16AM -0700, Jeff Law wrote:
> On 01/16/15 02:42, Alan Modra wrote:
> > * cprop.c (do_local_cprop): Disallow replacement of fixed
> > hard registers.
> OK.  Extra credit for a testcase, ppc specific is obviously OK.

Thanks.  Committed revision 219786.  I'll see if I can come up with a
reasonable testcase.
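
For readers following along, the effect of the change can be modelled
roughly as below.  This is a toy sketch with hypothetical names, not the
code in cprop.c: registers below first_pseudo stand for hard registers,
and a use of a hard register that is marked fixed is never replaced.

```cpp
#include <cassert>
#include <vector>

// Toy model only; this is not GCC's representation of a target.
struct ToyTarget
{
  int first_pseudo;          // Register numbers below this are hard regs.
  std::vector<bool> fixed;   // Indexed by hard register number.
};

// True if REGNO names a hard register that the toy target marks as fixed.
inline bool is_fixed_hard_reg (const ToyTarget &t, int regno)
{
  return regno >= 0 && regno < t.first_pseudo && t.fixed[regno];
}

// Would the toy pass be allowed to replace a use of register REGNO with
// an equivalent value?  Uses of fixed hard registers are left alone.
inline bool may_replace_use (const ToyTarget &t, int regno)
{
  return !is_fixed_hard_reg (t, regno);
}
```

With four hard registers of which only register 1 is fixed, uses of
register 1 are kept while uses of register 0 or of any pseudo remain
candidates for replacement.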

-- 
Alan Modra
Australia Development Lab, IBM


Re: RTL cprop vs. fixed hard regs

2015-01-16 Thread Alan Modra
On Fri, Jan 16, 2015 at 11:03:24AM -0600, Segher Boessenkool wrote:
> On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
> > OK, so we need to fix this in the rs6000 backend, but it occurs to me
> > that cprop also has a bug here.  It shouldn't be touching fixed hard
> > registers.
> 
> Why not?  It cannot allocate a fixed reg to a pseudo, but other than
> that there is nothing special about fixed regs; the transform is
> perfectly valid as far as I see.

I didn't say that copying to a pseudo and using that was invalid..
The bug I see is a mis-optimisation.  Also, the asm operands case that
do_local_cprop already rules out changing is very similar to fixed
regs.  Would you argue that changing asm operands is also valid?  :)

> It isn't a desirable transform in this case, but that is not true for
> fixed regs in general (just because the stack pointer is live everywhere).

What's the point in extending the lifetime of some pseudo when you
know the original fixed register is available everywhere?  Do you have
some concrete example in mind where this "optimisation" is beneficial?

Some ports even include pc in fixed_regs.  So there are obvious
examples where regs in fixed_regs change behind the compiler's back.
Naive users might even expect to see the "current" value of those
regs.  (Again, I'm not saying that it is invalid if gcc substituted an
older value.)

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Ilya Verbin
On 16 Jan 19:23, Jack Howarth wrote:
> As I read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64625#c3, the
> requirement for __OFFLOAD_TABLE__ was no longer present and the
> residual usages of it just had to be removed. The weak symbol on
> darwin is fragile and seems to trip up on the existing code which
> produces undefined symbols for ___OFFLOAD_TABLE__...
> 
> # nm e.50.1.o | grep OFF
>  U ___OFFLOAD_TABLE__
> 
> rather than
> 
> $ nm e.50.1.o | grep OFF
>  w __OFFLOAD_TABLE__
> 
> for all of the test cases.

I believe that the initial patch, which removes get_offload_symbol_decl, will
fix this.

  -- Ilya


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Jack Howarth
As I read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64625#c3, the
requirement for __OFFLOAD_TABLE__ was no longer present and the
residual usages of it just had to be removed. The weak symbol on
darwin is fragile and seems to trip up on the existing code which
produces undefined symbols for ___OFFLOAD_TABLE__...

# nm e.50.1.o | grep OFF
 U ___OFFLOAD_TABLE__

rather than

$ nm e.50.1.o | grep OFF
 w __OFFLOAD_TABLE__

for all of the test cases.

On Fri, Jan 16, 2015 at 6:30 PM, Ilya Verbin  wrote:
> On 16 Jan 18:22, Jack Howarth wrote:
>> On x86_64 Fedora 15, current gcc trunk only produces…
>>
>> nm libgcc_s.so.1 | grep OFF
>> 00215478 d _GLOBAL_OFFSET_TABLE_
>>
>> and not __OFFLOAD_TABLE__.  The libgcc_s.so.1 built on
>> x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
>> symbol.
>>
>> On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin  wrote:
>> > Why do you think so?  __OFFLOAD_TABLE__ symbol lives in 
>> > libgcc/offloadstuff.c
>> > since November without regressions.
>
> That's correct.
> 1. offloadstuff.c isn't linked into libgcc_s.so.1
> 2. __OFFLOAD_TABLE__ is guarded with ENABLE_OFFLOADING, which is disabled in
> default configuration.
>
>   -- Ilya


[patch] libstdc++/56785 reduce space overhead of nested tuples

2015-01-16 Thread Jonathan Wakely

This replaces the current empty _Tuple_impl that terminates the
recursive inheritance hierarchy.  The extra code now lives in the
last base class that holds data, so the recursion terminates there
instead.

The purpose of this is to avoid nested tuples having two instances of
the same _Tuple_impl base class, which cannot be placed at the same
address and so take up space despite being empty.
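
To illustrate the layout problem, here is a minimal sketch using made-up
types (End, InnerOld, OuterOld and so on are not the libstdc++ names).
It assumes the Itanium C++ ABI that GCC uses; other ABIs may lay the
objects out differently.

```cpp
#include <cassert>
#include <cstddef>

// Old scheme: every inheritance chain ends in the same empty type.
struct End { };

struct InnerOld : End { int value; };      // plays the role of tuple<int>
struct OuterOld : End { InnerOld head; };  // plays the role of tuple<tuple<int>>

// New scheme: the last class that holds data ends the chain itself,
// so there is no shared empty terminator type.
struct InnerNew { int value; };
struct OuterNew { InnerNew head; };

// Bytes added by nesting under each scheme.
constexpr std::size_t overhead_old() { return sizeof(OuterOld) - sizeof(InnerOld); }
constexpr std::size_t overhead_new() { return sizeof(OuterNew) - sizeof(InnerNew); }

// OuterOld contains two End subobjects (its own base and the one inside
// head).  Two distinct objects of the same type need distinct addresses,
// so the empty base cannot be folded away and padding appears.
inline void check_layout() {
  assert(overhead_old() > 0);
  assert(overhead_new() == 0);
}
```

In the sketch the nested object costs extra bytes only when both levels
share the empty terminator type, which is the situation the patch removes.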

Tested x86_64-linux, committed to trunk.
commit 65e06eb5b8ee42fb024307538380f8a375aba7ca
Author: Jonathan Wakely 
Date:   Mon Jun 23 23:41:08 2014 +0100

	PR libstdc++/56785
	* include/std/tuple (_Tuple_impl): Remove zero-element specialization
	and define one-element specialization.
	* testsuite/20_util/tuple/56785.cc: New.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index b710049..e500a76 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -158,30 +158,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct _Tuple_impl; 
 
-  /**
-   * Zero-element tuple implementation. This is the basis case for the 
-   * inheritance recursion.
-   */
-  template
-struct _Tuple_impl<_Idx>
-{
-  template friend class _Tuple_impl;
-
-  _Tuple_impl() = default;
-
-  template
-_Tuple_impl(allocator_arg_t, const _Alloc&) { }
-
-  template
-_Tuple_impl(allocator_arg_t, const _Alloc&, const _Tuple_impl&) { }
-
-  template
-_Tuple_impl(allocator_arg_t, const _Alloc&, _Tuple_impl&&) { }
-
-protected:
-  void _M_swap(_Tuple_impl&) noexcept { /* no-op */ }
-};
-
   template
 struct __is_empty_non_tuple : is_empty<_Tp> { };
 
@@ -358,6 +334,130 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 };
 
+  // Basis case of inheritance recursion.
+  template
+struct _Tuple_impl<_Idx, _Head>
+: private _Head_base<_Idx, _Head, __empty_not_final<_Head>::value>
+{
+  template friend class _Tuple_impl;
+
+  typedef _Head_base<_Idx, _Head, __empty_not_final<_Head>::value> _Base;
+
+  static constexpr _Head&
+  _M_head(_Tuple_impl& __t) noexcept { return _Base::_M_head(__t); }
+
+  static constexpr const _Head&
+  _M_head(const _Tuple_impl& __t) noexcept { return _Base::_M_head(__t); }
+
+  constexpr _Tuple_impl()
+  : _Base() { }
+
+  explicit
+  constexpr _Tuple_impl(const _Head& __head)
+  : _Base(__head) { }
+
+  template
+explicit
+constexpr _Tuple_impl(_UHead&& __head)
+	: _Base(std::forward<_UHead>(__head)) { }
+
+  constexpr _Tuple_impl(const _Tuple_impl&) = default;
+
+  constexpr
+  _Tuple_impl(_Tuple_impl&& __in)
+  noexcept(is_nothrow_move_constructible<_Head>::value)
+  : _Base(std::forward<_Head>(_M_head(__in))) { }
+
+  template
+constexpr _Tuple_impl(const _Tuple_impl<_Idx, _UHead>& __in)
+	: _Base(_Tuple_impl<_Idx, _UHead>::_M_head(__in)) { }
+
+  template
+constexpr _Tuple_impl(_Tuple_impl<_Idx, _UHead>&& __in)
+	: _Base(std::forward<_UHead>(_Tuple_impl<_Idx, _UHead>::_M_head(__in)))
+	{ }
+
+  template
+	_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a)
+	: _Base(__tag, __use_alloc<_Head>(__a)) { }
+
+  template
+	_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
+		const _Head& __head)
+	: _Base(__use_alloc<_Head, _Alloc, _Head>(__a), __head) { }
+
+  template
+	_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
+	_UHead&& __head)
+	: _Base(__use_alloc<_Head, _Alloc, _UHead>(__a),
+	std::forward<_UHead>(__head)) { }
+
+  template
+_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
+	const _Tuple_impl& __in)
+	: _Base(__use_alloc<_Head, _Alloc, _Head>(__a), _M_head(__in)) { }
+
+  template
+	_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
+	_Tuple_impl&& __in)
+	: _Base(__use_alloc<_Head, _Alloc, _Head>(__a),
+	std::forward<_Head>(_M_head(__in))) { }
+
+  template
+	_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
+	const _Tuple_impl<_Idx, _UHead>& __in)
+	: _Base(__use_alloc<_Head, _Alloc, _Head>(__a),
+		_Tuple_impl<_Idx, _UHead>::_M_head(__in)) { }
+
+  template
+	_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
+	_Tuple_impl<_Idx, _UHead>&& __in)
+	: _Base(__use_alloc<_Head, _Alloc, _UHead>(__a),
+std::forward<_UHead>(_Tuple_impl<_Idx, _UHead>::_M_head(__in)))
+	{ }
+
+  _Tuple_impl&
+  operator=(const _Tuple_impl& __in)
+  {
+	_M_head(*this) = _M_head(__in);
+	return *this;
+  }
+
+  _Tuple_impl&
+  operator=(_Tuple_impl&& __in)
+  noexcept(is_nothrow_move_assignable<_Head>::value)
+  {
+	_M_head(*this) = std::forward<_Head>(_M_head(__in));
+	return *this;
+  }
+
+  template
+_Tuple_impl&
+operator=(const _Tuple_impl<_Idx, _UHead>& __in)
+{
+	  _M_head(*this) = _Tuple_impl<_Idx, _UHead>::_M_head(__in);
+	  return *this;
+	}
+
+   

Re: [patch] Add <codecvt> and last pieces of C++11 std::lib

2015-01-16 Thread Jonathan Wakely

On 16/01/15 23:38 +, Jonathan Wakely wrote:

This defines the C++11 header <codecvt> and adds the wstring_convert
and wbuffer_convert utilities.

These need lots more tests, so if anyone understands how to use them
please test them and report problems to Bugzilla.

Tested x86_64-linux, committed to trunk.


Here's a tiny tweak to some of the new tests, committed to trunk.
commit 7da829b76fec8438f677b9b2914e77c0bad32a2d
Author: redi 
Date:   Sat Jan 17 00:12:50 2015 +

	* testsuite/22_locale/codecvt/codecvt_utf8/requirements/1.cc:
	Remove unused header.
	* testsuite/22_locale/codecvt/codecvt_utf16/requirements/1.cc:
	Likewise.
	* testsuite/22_locale/codecvt/codecvt_utf8_utf16/requirements/1.cc:
	Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@219781 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/requirements/1.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/requirements/1.cc
index 38bb393..457c65c 100644
--- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/requirements/1.cc
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf16/requirements/1.cc
@@ -21,7 +21,6 @@
 
 #include 
 #include 
-#include 
 
 template
   using codecvt = std::codecvt;
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/requirements/1.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/requirements/1.cc
index 6bc2418..3629cfb 100644
--- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/requirements/1.cc
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/requirements/1.cc
@@ -21,7 +21,6 @@
 
 #include 
 #include 
-#include 
 
 template
   using codecvt = std::codecvt;
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/requirements/1.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/requirements/1.cc
index 5e5f8dd..d73a856 100644
--- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/requirements/1.cc
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/requirements/1.cc
@@ -21,7 +21,6 @@
 
 #include 
 #include 
-#include 
 
 template
   using codecvt = std::codecvt;


[SH] Introduce treg_set_expr

2015-01-16 Thread Oleg Endo
Hi,

The attached patch does a couple of things, which are based on the
treg_set_expr (for an explanation/motivation see below).  Somehow the
stuff just kept piling on and it was difficult to make step-by-step
patches for all the individual issues.  Some patterns needed to be
rewritten to keep the existing test cases happy.  Some patterns became
redundant.  If it is really needed, I could try to split it into
multiple patches that make the changes bit by bit.  However, it only
really makes sense as a whole.

Tested with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and one new failure on SH2A (-m2a):
FAIL: tr1/6_containers/unordered_set/26132.cc execution test

which is a heap-stack collision.  Since that test case has been failing
here before for the other SH variants in the same way, I didn't pursue
it further.

There is one minor fallout/regression that this patch causes, which is
reduced usage of the SH2A movu.{b|w} insn (zero extending QI/HImode mem
load).  I had to disable the expansion of that insn as it makes
eliminating zero extensions a bit difficult in some cases.  The
movu.{b|w} insn should be used as a last-resort option in a peephole-like
pass after combine/split1, but before RA.  I'll try to fix that soon.

Kaz, could you please test the patch on your sh4-linux setup and report
your findings?  Even though it's a bit late, I'd like to get this in for
GCC 5, if it doesn't break too many things.

Cheers,
Oleg


treg_set_expr explanation/motivation:

On SH there are insns that compute a value and store the result in the T
bit (1 bit register), such as comparison results, shifted out MSB/LSB
bits etc.  Then there are also insns which take the T bit as an operand,
such as rotates or add/sub/neg-with-carry.  Some of the insns that set
the T bit are only discovered during combine.  div0s is one such
example:

(define_insn "cmp_div0s"
  [(set (reg:SI T_REG)
(lshiftrt:SI (xor:SI (match_operand:SI 0 "arith_reg_operand" "%r")
 (match_operand:SI 1 "arith_reg_operand" "r"))
 (const_int 31)))]
  "TARGET_SH1"
  "div0s%0,%1"
  [(set_attr "type" "arith")])

In order to match e.g. div0s-addc or div0s-subc sequences, it's usually
required to write down patterns for all the combinations.  Instead of
doing that, I had the idea of a special operand predicate which would
match any expression for which there is an insn in the .md that does
   (set (reg:SI T_REG) ())

This predicate then can be used in insns that take the T bit as an
operand like this:

(define_insn_and_split "*addc"
  [(set (match_operand:SI 0 "arith_reg_dest")
(plus:SI (plus:SI (match_operand:SI 1 "arith_reg_operand")
  (match_operand:SI 2 "arith_reg_or_0_operand"))
 (match_operand 3 "treg_set_expr")))
   (clobber (reg:SI T_REG))]

... which means for operand 3: Match any expression, which can be
calculated into the T bit, using one of the existing patterns in
the .md.  After combine, in the split1 pass, a function is used to split
out the appropriate T bit setting insn and substitute the expression at
operand 3 with a simple T_REG.

This makes the example addc pattern above automatically cover cases such
as reg+reg+1, reg+reg+(reg & 1), reg+reg+((reg >> 31) & 1), since there
are insns that can do T = 1, T = reg & 1, T = (reg >> 31) & 1.

Then there are also some insns (again discovered during combine) which
can only store the result into the T bit, such as the single bit extract
patterns.  However, if those results are required in a GP reg instead of
the T bit, it's usually required to add insn_and_split variants that do
a T -> GP reg move afterwards.
The treg_set_expr predicate can be used to match all those insns with a
single one:
(define_insn_and_split "any_treg_expr_to_reg"
  [(set (match_operand:SI 0 "arith_reg_dest")
(match_operand 1 "treg_set_expr"))
   (clobber (reg:SI T_REG))]

... which then splits out the appropriate T bit setting insn and appends
a T -> GP reg move.

Having the treg_set_expr thing opens some new doors here and there to
implement specific insn (re-)combinations, which combine would not
handle that easily by itself.  For example, some of the single bit zero
extracts can store only the negated extracted bit in the T bit register.
When this is fed into an addc insn, the explicit T bit negation can be
avoided by replacing the addc insn with a subc insn.

The whole thing is implemented by constructing a temporary insn
   (set (reg:SI T_REG) ())
and invoking recog.  However, since this happens while matching the
treg_set_expr predicate, recog must be invoked in a re-entrant way.  To
do that, the global recog_data struct needs to be saved and restored
before returning back into recog.  This seems to work OK.  If any other
target is interested in doing the same, maybe we should extend recog
itself and make it re-entrant.
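
The save/restore idea can be sketched outside of GCC as follows.  This is
only an illustration under assumptions: RecogData, g_recog_data,
toy_recog and toy_treg_set_expr are hypothetical stand-ins, not the
actual recog interfaces or the code in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a global scratch structure like recog_data.
struct RecogData {
  std::string pattern;
  int n_operands = 0;
};
RecogData g_recog_data;

// Toy recognizer: like a non-re-entrant matcher, it overwrites the
// global scratch state on every call.
bool toy_recog(const std::string& pattern, int n_operands) {
  g_recog_data.pattern = pattern;
  g_recog_data.n_operands = n_operands;
  return !pattern.empty();
}

// RAII guard: snapshot the global state on entry, restore it on exit.
class RecogDataSaver {
 public:
  RecogDataSaver() : saved_(g_recog_data) {}
  ~RecogDataSaver() { g_recog_data = saved_; }
  RecogDataSaver(const RecogDataSaver&) = delete;
  RecogDataSaver& operator=(const RecogDataSaver&) = delete;

 private:
  RecogData saved_;
};

// Toy predicate: builds a temporary "(set T <expr>)" pattern and runs a
// nested recognition without disturbing the outer recognizer's state.
bool toy_treg_set_expr(const std::string& expr) {
  RecogDataSaver guard;
  return toy_recog("(set (reg:SI T_REG) " + expr + ")", 1);
}
```

The guard restores the outer state on every exit path, which is the
property the nested recog call relies on.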

gcc/ChangeLog
 

[patch] DW_AT_producer: Ignore -fpreprocessed

2015-01-16 Thread Jan Kratochvil
Hi,

I have provided a sufficient fix in GDB for the -fplugin=libcc1plugin feature:
[patch+7.9] compile: Filter out -fpreprocessed
https://sourceware.org/ml/gdb-patches/2015-01/msg00485.html

But still I think "-fpreprocessed" is inappropriate for DW_AT_producer.

Otherwise for an inferior built using ccache the string "-fpreprocessed" is
put into DW_AT_producer and then GDB compilation with -fplugin=libcc1plugin
breaks as the GDB-runtime-generated source is not already preprocessed.
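
As a rough sketch of the filtering idea (not the dwarf2out.c code, which
switches on option codes rather than strings), the producer string could
be assembled like this.  The function names and the ignore list below
are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Options that say how the compiler was driven rather than how the code
// was generated.  Hypothetical subset, written as command-line strings.
bool ignored_for_producer(const std::string& opt) {
  static const char* const ignored[] = {"-nostdinc", "-nostdinc++",
                                        "-fpreprocessed"};
  for (const char* name : ignored)
    if (opt == name) return true;
  return false;
}

// Join the compiler identification with every option that is not ignored.
std::string producer_string(const std::string& compiler,
                            const std::vector<std::string>& opts) {
  std::string out = compiler;
  for (const std::string& opt : opts) {
    if (ignored_for_producer(opt)) continue;
    out += ' ';
    out += opt;
  }
  return out;
}
```

A consumer that replays the recorded options then never sees
-fpreprocessed, even when the original build went through ccache.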

I have rebuilt GCC but I have not run the testsuite with this patch.


Jan
gcc/ChangeLog
* dwarf2out.c (gen_producer_string): Ignore also OPT_fpreprocessed.

Index: ./gcc/dwarf2out.c
===
--- ./gcc/dwarf2out.c   (revision 219770)
+++ ./gcc/dwarf2out.c   (working copy)
@@ -19624,6 +19624,7 @@ gen_producer_string (void)
   case OPT__sysroot_:
   case OPT_nostdinc:
   case OPT_nostdinc__:
+  case OPT_fpreprocessed:
/* Ignore these.  */
continue;
   default:


[patch] Add <codecvt> and last pieces of C++11 std::lib

2015-01-16 Thread Jonathan Wakely

This defines the C++11 header <codecvt> and adds the wstring_convert
and wbuffer_convert utilities.

These need lots more tests, so if anyone understands how to use them
please test them and report problems to Bugzilla.
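
For anyone wanting a starting point for such tests, here is a minimal
usage sketch of wstring_convert with codecvt_utf8.  The helper names
to_utf8 and from_utf8 are made up for the example.  Note that these
facilities were later deprecated in C++17, so newer compilers may warn.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Encode UTF-32 text as a UTF-8 byte string.
std::string to_utf8(const std::u32string& text) {
  std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
  return conv.to_bytes(text);
}

// Decode a UTF-8 byte string back into UTF-32 text.
std::u32string from_utf8(const std::string& bytes) {
  std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
  return conv.from_bytes(bytes);
}
```

A round trip through both helpers should give back the original text,
which makes for an easy first test.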

Tested x86_64-linux, committed to trunk.
commit dd8ed5fe16c0d0a5805504da9ea98da86837a550
Author: Jonathan Wakely 
Date:   Fri Jan 16 23:07:50 2015 +

Implement C++11 <codecvt> header.

	* config/abi/pre/gnu.ver: Export new symbols.
	* include/Makefile.am: Add codecvt.
	* include/Makefile.in: Regenerate.
	* include/std/codecvt: New header.
	* src/c++11/codecvt.cc (__codecvt_utf8_base, __codecvt_utf16_base,
	__codecvt_utf8_utf16_base): Define specializations.
	* testsuite/22_locale/codecvt/codecvt_utf8/requirements/1.cc: New.
	* testsuite/22_locale/codecvt/codecvt_utf16/requirements/1.cc: New.
	* testsuite/22_locale/codecvt/codecvt_utf8_utf16/requirements/1.cc:
	New.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index dc83ad4..d23306e 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1769,6 +1769,17 @@ GLIBCXX_3.4.21 {
   std::__atomic_futex_unsigned_base*;
 };
 
+# codecvt_utf8 etc.
+_ZNKSt19__codecvt_utf8_base*;
+_ZNSt19__codecvt_utf8_base*;
+_ZT[ISV]St19__codecvt_utf8_base*;
+_ZNKSt20__codecvt_utf16_base*;
+_ZNSt20__codecvt_utf16_base*;
+_ZT[ISV]St20__codecvt_utf16_base*;
+_ZNKSt25__codecvt_utf8_utf16_base*;
+_ZNSt25__codecvt_utf8_utf16_base*;
+_ZT[ISV]St25__codecvt_utf8_utf16_base*;
+
 } GLIBCXX_3.4.20;
 
 
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 4772950..285a504 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -31,6 +31,7 @@ std_headers = \
 	${std_srcdir}/atomic \
 	${std_srcdir}/bitset \
 	${std_srcdir}/chrono \
+	${std_srcdir}/codecvt \
 	${std_srcdir}/complex \
 	${std_srcdir}/condition_variable \
 	${std_srcdir}/deque \
diff --git a/libstdc++-v3/include/std/codecvt b/libstdc++-v3/include/std/codecvt
new file mode 100644
index 000..d58a0ec
--- /dev/null
+++ b/libstdc++-v3/include/std/codecvt
@@ -0,0 +1,179 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+// ISO C++ 14882: 22.5  Standard code conversion facets
+
+/** @file include/codecvt
+ *  This is a Standard C++ Library header.
+ */
+
+#ifndef _GLIBCXX_CODECVT
+#define _GLIBCXX_CODECVT 1
+
+#pragma GCC system_header
+
+#if __cplusplus < 201103L
+# include 
+#else
+
+#include 
+#include 
+
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  enum codecvt_mode
+  {
+consume_header = 4,
+generate_header = 2,
+little_endian = 1
+  };
+
+  template
+class codecvt_utf8 : public codecvt<_Elem, char, mbstate_t>
+{
+public:
+  explicit
+  codecvt_utf8(size_t __refs = 0);
+
+  ~codecvt_utf8();
+};
+
+  template
+class codecvt_utf16 : public codecvt<_Elem, char, mbstate_t>
+{
+public:
+  explicit
+  codecvt_utf16(size_t __refs = 0);
+
+  ~codecvt_utf16();
+};
+
+  template
+class codecvt_utf8_utf16 : public codecvt<_Elem, char, mbstate_t>
+{
+public:
+  explicit
+  codecvt_utf8_utf16(size_t __refs = 0);
+
+  ~codecvt_utf8_utf16();
+};
+
+#define _GLIBCXX_CODECVT_SPECIALIZATION2(_NAME, _ELEM) \
+  template<> \
+class _NAME<_ELEM> \
+: public codecvt<_ELEM, char, mbstate_t> \
+{ \
+public: \
+  typedef _ELEM			intern_type; \
+  typedef char			extern_type; \
+  typedef mbstate_t			state_type; \
+ \
+protected: \
+  _NAME(unsigned long __maxcode, codecvt_mode __mode, size_t __refs) \
+  : codecvt(__refs), _M_maxcode(__maxcode), _M_mode(__mode) { } \
+ \
+  virtual \
+  ~_NAME(); \
+ \
+  virtual result \
+  do_out(state_type& __state, const intern_type* __from

Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Ilya Verbin
On 16 Jan 18:22, Jack Howarth wrote:
> On x86_64 Fedora 15, current gcc trunk only produces…
> 
> nm libgcc_s.so.1 | grep OFF
> 00215478 d _GLOBAL_OFFSET_TABLE_
> 
> and not __OFFLOAD_TABLE__.  The libgcc_s.so.1 built on
> x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
> symbol.
> 
> On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin  wrote:
> > Why do you think so?  __OFFLOAD_TABLE__ symbol lives in 
> > libgcc/offloadstuff.c
> > since November without regressions.

That's correct.
1. offloadstuff.c isn't linked into libgcc_s.so.1
2. __OFFLOAD_TABLE__ is guarded with ENABLE_OFFLOADING, which is disabled in
default configuration.

  -- Ilya


Re: [libgo] Build fix for sparc-linux

2015-01-16 Thread Ian Lance Taylor
On Fri, Jan 16, 2015 at 12:37 PM, Richard Henderson  wrote:
>
> The glibc setcontext modifies %g7, so the SETCONTEXT_CLOBBERS_TLS configure
> test triggers.
>
> My guess is that Solaris doesn't clobber %g7, or Rainer would have noticed
> this before.  Is it worth also checking for __linux__ in the elif test, or
> just wait until someone notices that they have an incompatible set of
> definitions?

Thanks, committed to mainline.

I agree with just committing it--let's see if anybody complains.

Ian


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Jack Howarth
On x86_64 Fedora 15, current gcc trunk only produces…

nm libgcc_s.so.1 | grep OFF
00215478 d _GLOBAL_OFFSET_TABLE_

and not __OFFLOAD_TABLE__.  The libgcc_s.so.1 built on
x86_64-apple-darwin14 doesn't even contain the _GLOBAL_OFFSET_TABLE_
symbol.

On Fri, Jan 16, 2015 at 5:40 PM, Ilya Verbin  wrote:
> On 16 Jan 21:34, Thomas Schwinge wrote:
>> On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
>> Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal
>> parameter, as discussed in .
>>
>> But -- I now wonder whether that's actually the issue that has been
>> reported in the PR; doesn't that more look like a problem with the
>> __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
>> the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
>> anyone guess what's going on?
>
> Why do you think so?  __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c
> since November without regressions.
>
>   -- Ilya


Re: Use static chain and libffi for Go closures

2015-01-16 Thread Ian Lance Taylor
On Fri, Jan 16, 2015 at 2:22 AM, Uros Bizjak  wrote:
>
> You should also revert alpha specific change to
> libgo/go/testing/quick/quick_test.go, please see [1] and [2].
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00038.html
> [2] https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00038/foo.patch

Done like so.  Committed to mainline.

Ian
diff -r bb70e852004f libgo/go/testing/quick/quick_test.go
--- a/libgo/go/testing/quick/quick_test.go  Fri Jan 16 13:28:21 2015 -0800
+++ b/libgo/go/testing/quick/quick_test.go  Fri Jan 16 15:17:42 2015 -0800
@@ -7,7 +7,6 @@
 import (
"math/rand"
"reflect"
-   "runtime"
"testing"
 )
 
@@ -158,12 +157,10 @@
reportError("fFloat32Alias", CheckEqual(fFloat32Alias, fFloat32Alias, 
nil), t)
reportError("fFloat64", CheckEqual(fFloat64, fFloat64, nil), t)
reportError("fFloat64Alias", CheckEqual(fFloat64Alias, fFloat64Alias, 
nil), t)
-   if runtime.GOARCH != "alpha" {
-   reportError("fComplex64", CheckEqual(fComplex64, fComplex64, 
nil), t)
-   reportError("fComplex64Alias", CheckEqual(fComplex64Alias, 
fComplex64Alias, nil), t)
-   reportError("fComplex128", CheckEqual(fComplex128, fComplex128, 
nil), t)
-   reportError("fComplex128Alias", CheckEqual(fComplex128Alias, 
fComplex128Alias, nil), t)
-   }
+   reportError("fComplex64", CheckEqual(fComplex64, fComplex64, nil), t)
+   reportError("fComplex64Alias", CheckEqual(fComplex64Alias, 
fComplex64Alias, nil), t)
+   reportError("fComplex128", CheckEqual(fComplex128, fComplex128, nil), t)
+   reportError("fComplex128Alias", CheckEqual(fComplex128Alias, 
fComplex128Alias, nil), t)
reportError("fInt16", CheckEqual(fInt16, fInt16, nil), t)
reportError("fInt16Alias", CheckEqual(fInt16Alias, fInt16Alias, nil), t)
reportError("fInt32", CheckEqual(fInt32, fInt32, nil), t)


common.opt and optimization attribute housekeeping

2015-01-16 Thread Jan Hubicka
Hi,
this patch finishes the transition of IPA passes to be per-function (and thus
behaving sanely at LTO time when different flags are mixed).
It also goes through common.opt and fixes the attributes of quite a few flags:

fauto-inc-dec, fdelete-dead-exceptions, ffunction-cse,
fgraphite, fstrict-volatile-bitfields, fira-algorithm, fira-region,
fira-share-save-slots, fira-share-spill-slots,
fmodulo-sched-allow-regmoves, fpartial-inlining,
sched-stalled-insns, fsched-stalled-insns-dep, fstrict-overflow,
ftracer, ftree-parallelize-loops, fassociative-math,
freciprocal-math, fvect-cost-model, fsimd-cost-model, flag_stack_reuse

are all function-local properties and thus should be marked as Optimization,
while

fauto-profile, fcommon, fdata-sections, fipa-icf-variables,
ftoplevel-reorder, funit-at-a-time, fwhole-program

currently won't work in optimization attributes (and won't get properly
maintained to LTO).  This is because either they affect variables that are
not annotated or because they are decided globally for the whole compilation
unit.
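
The per-function model can be sketched as follows.  This is a simplified
illustration, not GCC code: OptionSet, FunctionDecl, options_for and
may_use_reference_summary are hypothetical names, with options_for
playing the role that opt_for_fn plays in the patch.

```cpp
#include <cassert>

// Hypothetical option block; one flag is enough for the illustration.
struct OptionSet {
  bool ipa_reference = true;
};
OptionSet g_global_options;

// A function either carries its own option block (for example from an
// optimize attribute or a differently-flagged LTO unit) or inherits the
// global one.
struct FunctionDecl {
  const OptionSet* specific = nullptr;
};

// Look up the options that apply to this particular function.
const OptionSet& options_for(const FunctionDecl& fn) {
  return fn.specific ? *fn.specific : g_global_options;
}

// A summary computed for the callee may only be used if the analysis is
// enabled both for the callee and for the function currently compiled.
bool may_use_reference_summary(const FunctionDecl& callee,
                               const FunctionDecl& current) {
  return options_for(callee).ipa_reference &&
         options_for(current).ipa_reference;
}
```

Checking both sides is the conservative choice when functions built with
different flags end up in the same link.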

There are a few cases I still need to look into:

flag-function-sections
  - Here we can support the optimization attribute but currently don't.
    Basically it needs revisiting the few flag_function_sections uses and
    making them use opt_for_fn.
flag_gnu_tm
flag_gnu_unique
  - Those probably can be turned into Optimization but I have no clue.
flag_profile_reorder_functions
  - Can be handled but isn't at the moment.
flag_split_stack
  - This one needs some unit finalization.
flag_stack_protector
  - Probably can be supported as Optimization.
flag_strict_aliasing
  - This is marked as Optimization, but I believe it does not work that way
    because alias classes are assigned to types that get shared.  It would
    be great to support this since a lot of real code mixes the settings.
flag_toplevel_reorder
  - This can be supported but isn't.  Similarly to
    flag_profile_reorder_functions, it needs a bit of tweaking of lto and
    cgraphunit.
fp_contract_mode
  - This probably should be Optimization but doing so makes the awk
    machinery fail.

Bootstrapped/regtested x86_64-linux, will commit it later today if there are no
complaints.

Honza

* ipa-reference.c (set_reference_optimization_summary,
ipa_reference_get_not_written_global): Do nothing if ipa-reference is 
disabled.
(propagate_bits): If ipa-reference is disabled, do not look into local 
properties.
(analyze_function): Disable analysis when ipa_reference is disabled.
(generate_summary): Do not dump when reference is disabled.
(get_read_write_all_from_node): When ipa-reference is disabled, use the
node flags.
(gate): Enable for LTO.
* optc-save-gen.awk: Handle optimize_debug correctly.
* opth-gen.awk: Likewise.
* common.opt (fauto-inc-dec, fdelete-dead-exceptions, ffunction-cse,
fgraphite, fstrict-volatile-bitfields, fira-algorithm, fira-region,
fira-share-save-slots, fira-share-spill-slots,
fmodulo-sched-allow-regmoves, fpartial-inlining,
sched-stalled-insns, fsched-stalled-insns-dep, fstrict-overflow,
ftracer, ftree-parallelize-loops, fassociative-math,
freciprocal-math, fvect-cost-model, fsimd-cost-model): Mark as
Optimization
(fauto-profile, fcommon, fdata-sections, fipa-icf-variables,
ftoplevel-reorder, funit-at-a-time, fwhole-program): Do not mark as
Optimization.
* ipa-icf.c (gate, sem_item_optimizer::filter_removed_items):
Fix for IPA.
Index: ipa-reference.c
===
--- ipa-reference.c (revision 219756)
+++ ipa-reference.c (working copy)
@@ -198,6 +198,9 @@ set_reference_optimization_summary (stru
 bitmap
 ipa_reference_get_not_read_global (struct cgraph_node *fn)
 {
+  if (!opt_for_fn (fn->decl, flag_ipa_reference)
+  || !opt_for_fn (current_function_decl, flag_ipa_reference))
+return NULL;
   ipa_reference_optimization_summary_t info =
 get_reference_optimization_summary (fn->function_symbol (NULL));
   if (info)
@@ -216,6 +219,9 @@ ipa_reference_get_not_read_global (struc
 bitmap
 ipa_reference_get_not_written_global (struct cgraph_node *fn)
 {
+  if (!opt_for_fn (fn->decl, flag_ipa_reference)
+  || !opt_for_fn (current_function_decl, flag_ipa_reference))
+return NULL;
   ipa_reference_optimization_summary_t info =
 get_reference_optimization_summary (fn);
   if (info)
@@ -381,8 +387,9 @@ propagate_bits (ipa_reference_global_var
 
   /* Only look into nodes we can propagate something.  */
   int flags = flags_from_decl_or_type (y->decl);
-  if (avail > AVAIL_INTERPOSABLE
- || (avail == AVAIL_INTERPOSABLE && (flags & ECF_LEAF)))
+  if (opt_for_fn (y->decl, flag_ipa_reference)
+ && (avail > AVAIL_INTERPOSABLE
+ || (avail == AVAIL_IN

Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Ilya Verbin
Hi!

On 15 Jan 21:20, Thomas Schwinge wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,
> which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> been contributing!

Unfortunately, it broke offloading from shared libraries (I mean common libs
with NEEDED entries, not dlopened).  Such things are not covered by the
testsuite, that's why you missed this issue.  Here is a simple testcase:

+ test.c: +

int f_aaa (void);

int main ()
{
  int x = f_aaa ();
  #pragma omp target
x++;
  return x;
}

+ libaaa.c: +

int f_aaa (void)
{
  int x = 0;
  #pragma omp target
x = 10;
  return x;
}

++

$ gcc -fopenmp -shared -fPIC libaaa.c -o libaaa.so
$ gcc -fopenmp -L. -laaa test.c
$ ./a.out
libgomp: Target function wasn't mapped


The problem seems to be here:

-gomp_register_images_for_device (struct gomp_device_descr *device)
+gomp_register_image_for_device (struct gomp_device_descr *device,
+   struct offload_image_descr *image)
 {
-  int i;
-  for (i = 0; i < num_offload_images; i++)
+  if (!device->offload_regions_registered
+  && (device->type == image->type
+ || device->type == OFFLOAD_TARGET_TYPE_HOST))
 {
-  struct offload_image_descr *image = &offload_images[i];
-  if (image->type == device->type)
-   device->register_image_func (image->host_table, image->target_data);
+  device->register_image_func (image->host_table, image->target_data);
+  device->offload_regions_registered = true;
 }
 }

So, you don't assume that a device can have multiple images from multiple libs?
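
To make the difference concrete, here is a reduced model of the two
registration strategies.  It is a sketch with hypothetical types (Image,
Device) and function names, not the libgomp structures, and it leaves
out the host-device case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an offload image descriptor.
struct Image {
  int type;
};

// Hypothetical stand-in for a device descriptor.
struct Device {
  explicit Device(int t) : type(t) {}
  int type;
  bool regions_registered = false;
  std::vector<const Image*> registered;
};

// Shape of the new code: a per-device flag is set after the first
// matching image, so any later image for the same device is skipped.
void register_once(Device& dev, const Image& img) {
  if (!dev.regions_registered && dev.type == img.type) {
    dev.registered.push_back(&img);
    dev.regions_registered = true;
  }
}

// Shape of the old code: every image of the matching type is registered.
void register_all(Device& dev, const std::vector<Image>& images) {
  for (const Image& img : images)
    if (img.type == dev.type) dev.registered.push_back(&img);
}
```

With one image from the executable and one from libaaa.so, the flagged
variant keeps only the first, which would match the "Target function
wasn't mapped" failure above.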

  -- Ilya


Re: Use static chain and libffi for Go closures

2015-01-16 Thread Ian Lance Taylor
On Thu, Jan 15, 2015 at 2:12 PM, Richard Henderson  wrote:
>
> All of this has been posted before.
>
> I believe the ABI change is something we should have for gcc 5.
>
> Yes, the libffi merge has been causing problems (on targets that
> don't even support libgo, annoyingly), but missing support for
> the new libffi interfaces is easier to remedy in a dot release
> than the change from __thread variable to static chain register
> for implementing closures.

Thanks very much.  Looks great.  Committed to mainline.

Ian


Re: [PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Ilya Verbin
On 16 Jan 21:34, Thomas Schwinge wrote:
> On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> Here is a patch to remove the __OFFLOAD_SYMBOL__ variable/formal
> parameter, as discussed in .
> 
> But -- I now wonder whether that's actually the issue that has been
> reported in the PR; doesn't that more look like a problem with the
> __OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
> the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
> anyone guess what's going on?

Why do you think so?  __OFFLOAD_TABLE__ symbol lives in libgcc/offloadstuff.c
since November without regressions.

  -- Ilya


Re: Fix a MinGW warning in libiberty/strerror.c

2015-01-16 Thread DJ Delorie

> Thanks.  Do I need to hear from someone else approving this, or can I
> go ahead and commit?

Go ahead and commit.


[PATCH, committed] Parallelize the jit testsuite

2015-01-16 Thread David Malcolm
This patch adds "check-jit" to lang_checks_parallelized, and
sets check_jit_parallelize, enabling jit.exp to be split across
multiple jobs.

Running:
  time make -j64 check-jit
I saw:
  Before:
real   6m49.601s
user   6m19.851s
sys0m15.824s
  After:
real   2m55.869s
user   6m33.339s
sys0m17.579s

i.e. about a 57% reduction in wallclock time (409.6s down to 175.9s).

Committed to trunk as r219774.

FWIW, this is dominated by test-threads.c, which has most of the
testsuite in one process, each running in a different thread (the
threads spend most of their time waiting for the jit mutex).

FWIW, it's possible to dramatically speed up the jit testsuite
(by about a factor of 5) by hacking jit.dg/harness.h and setting
GCC_JIT_BOOL_OPTION_SELFCHECK_GC there to 0 (I do this when
developing new changes, but I have it turned back on for final
testing, since we want to shake out any GC memory issues).

gcc/jit/ChangeLog:
* Make-lang.in (lang_checks_parallelized): Add "check-jit".
(check_jit_parallelize): Set this to an arbitrary value (10).
---
 gcc/jit/Make-lang.in | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
index 551b115..e622690 100644
--- a/gcc/jit/Make-lang.in
+++ b/gcc/jit/Make-lang.in
@@ -247,6 +247,11 @@ jit.man:
 jit.srcman:
 
 lang_checks += check-jit
+lang_checks_parallelized += check-jit
+# This number is somewhat arbitrary.  Two tests are much slower
+# than all the others (test-combination.c and test-threads.c) so
+# we want them to be placed in different "buckets".
+check_jit_parallelize = 10
 
 #
 # Install hooks:
-- 
1.8.5.3



Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Andreas Schwab
FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++98  scan-rtl-dump-times expand "(call [^n]*"acc_on_device" 1
FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++11  scan-rtl-dump-times expand "(call [^n]*"acc_on_device" 1
FAIL: c-c++-common/goacc/acc_on_device-2-off.c  -std=c++14  scan-rtl-dump-times expand "(call [^n]*"acc_on_device" 1
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++98  scan-rtl-dump-times expand "(call [^n]*"acc_on_device" 0
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++11  scan-rtl-dump-times expand "(call [^n]*"acc_on_device" 0
XPASS: c-c++-common/goacc/acc_on_device-2.c  -std=c++14  scan-rtl-dump-times expand "(call [^n]*"acc_on_device" 0
FAIL: c-c++-common/goacc/acc_on_device-2-off.c scan-rtl-dump-times expand "(call [^n]*"acc_on_device" 1
FAIL: gcc.dg/goacc/acc_on_device-1.c scan-rtl-dump-times expand "(call [^n]*"acc_on_device" 4

You are making invalid assumptions about the form of a call pattern.

(call_insn 7 6 8 2 (set (reg:SI 0 %d0)
    (call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])
(const_int 4 [0x4]))) 
/daten/aranym/gcc/gcc-20150116/gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c:19
 -1
 (nil)
(nil))
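
To make the mismatch concrete, here is a minimal sketch using POSIX regcomp/regexec.  It is only an approximation: the testsuite uses Tcl regexps, the archived pattern has lost its escapes, and the "direct call" line below is an illustrative guess at what a target calling through a symbol_ref would dump.  The helper and variable names are made up for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the POSIX extended regexp PATTERN.  */
static bool
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}

/* Approximation of what the tests scan for: "(call", then anything on
   the same line up to the quoted function name.  */
static const char scan_pattern[] = "\\(call [^\n]*\"acc_on_device\"";

/* Illustrative (assumed) dump line for a call through a symbol_ref.  */
static const char direct_call[] =
  "(call (mem:QI (symbol_ref:SI (\"acc_on_device\")) [0 acc_on_device S1 A8])";

/* The m68k form quoted above: the callee address sits in a register,
   so the quoted name never appears.  */
static const char indirect_call[] =
  "(call (mem:QI (reg/f:SI 33) [0 acc_on_device S1 A8])";
```

With these definitions, line_matches (scan_pattern, direct_call) is true while line_matches (scan_pattern, indirect_call) is false, which is the shape of the FAIL/XPASS results listed above.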

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Stage3 closing soon, call for patch pings

2015-01-16 Thread Jeff Law

On 01/16/15 13:14, Magnus Granberg wrote:
> On Thursday, 15 January 2015 at 13.26.43, H.J. Lu wrote:
>> On Thu, Jan 15, 2015 at 1:04 PM, Jeff Law  wrote:
>>> Stage3 is closing rapidly.  I've drained my queue of patches I was
>>> tracking for gcc-5.  However, note that I don't track everything.  If
>>> it's a patch for a backend, language other than C or seemingly has
>>> another maintainer that's engaged in review, then I haven't been
>>> tracking the patch.
>>>
>>> So this is my final call for patch pings.  I've got some bandwidth and may
>>> be able to look at a few patches that have otherwise stalled.
>>
>> This one was updated yesterday:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00956.html
>>
>> I guess it won't hurt to list it here.
>>
>> ---
>> H.J.
>
> Jeff, can that be committed?
> Thank you H.J. for the work with it.

Hoping folks more familiar with it will wrap it up.  I'm nowhere near up
to speed on this change yet.


jeff


Re: [debug-early] C++ clones and limbo DIEs

2015-01-16 Thread Jason Merrill

On 01/16/2015 12:50 PM, Aldy Hernandez wrote:
>> Can you remove the first flush and just do it in the second place?
>
> If I only flush the limbo list in the second place, that's basically
> what mainline does, albeit abstracted into a function.  I thought the
> whole point was to get rid of the limbo list, or at least keep it from
> being a structure that has to go through LTO streaming.

I would expect it to be before free_lang_data and LTO streaming.

Jason




[libgo] Build fix for sparc-linux

2015-01-16 Thread Richard Henderson
The glibc setcontext modifies %g7, so the SETCONTEXT_CLOBBERS_TLS configure
test triggers.

My guess is that Solaris doesn't clobber %g7, or Rainer would have noticed this
before.  Is it worth also checking for __linux__ in the elif test, or just wait
until someone notices that they have an incompatible set of 
definitions?


r~
diff --git a/libgo/runtime/proc.c b/libgo/runtime/proc.c
index 20fbc0a..3b00a65 100644
--- a/libgo/runtime/proc.c
+++ b/libgo/runtime/proc.c
@@ -126,6 +126,30 @@ fixcontext(ucontext_t* c)
c->uc_mcontext._mc_tlsbase = tlsbase;
 }
 
+# elif defined(__sparc__)
+
+static inline void
+initcontext(void)
+{
+}
+
+static inline void
+fixcontext(ucontext_t *c)
+{
+   /* ??? Using 
+register unsigned long thread __asm__("%g7");
+c->uc_mcontext.gregs[REG_G7] = thread;
+  results in
+error: variable ‘thread’ might be clobbered by \
+   ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
+  which ought to be false, as %g7 is a fixed register.  */
+
+   if (sizeof (c->uc_mcontext.gregs[REG_G7]) == 8)
+   asm ("stx %%g7, %0" : "=m"(c->uc_mcontext.gregs[REG_G7]));
+   else
+   asm ("st %%g7, %0" : "=m"(c->uc_mcontext.gregs[REG_G7]));
+}
+
 # else
 
 #  error unknown case for SETCONTEXT_CLOBBERS_TLS


[PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter (was: Merge current set of OpenACC changes from gomp-4_0-branch)

2015-01-16 Thread Thomas Schwinge
Hi!

On Thu, 15 Jan 2015 21:20:07 +0100, I wrote:
> In r219682, I have committed to trunk our current set of OpenACC changes,

Here is a patch to remove the __OFFLOAD_TABLE__ variable/formal
parameter, as discussed in .

But -- I now wonder whether that's actually the issue that has been
reported in the PR; doesn't that more look like a problem with the
__OFFLOAD_TABLE__ symbol defined in libgcc/offloadstuff.c, and used in
the mkoffload tools (such as gcc/config/i386/intelmic-mkoffload.c)?  Can
anyone guess what's going on?

Anyway, as discussed in , I'd like to commit
this patch either way, OK?
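
For readers skimming the patch below, the two treatments it applies can be sketched in a few lines of C.  This is an illustration only: old_entry and new_entry are hypothetical names, not libgomp functions.  The idea is that an entry point whose signature is already part of the ABI keeps its pointer parameter and callers pass NULL, whereas the GOACC_* entry points (presumably because they are new with this merge) simply lose the parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Records the last mapnum seen, just so the sketch has an effect.  */
static size_t last_mapnum;

/* Hypothetical entry point whose signature is already part of the ABI:
   the second parameter stays, but it is ignored.  */
static void
old_entry (int device, const void *unused, size_t mapnum)
{
  (void) device;
  (void) unused;
  last_mapnum = mapnum;
}

/* Hypothetical new entry point: the parameter is dropped outright.  */
static void
new_entry (int device, size_t mapnum)
{
  (void) device;
  last_mapnum = mapnum;
}
```

A caller of the first form would write old_entry (dev, NULL, n), which mirrors the build_zero_cst (ptr_type_node) argument pushed for the GOMP_target* builtins.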

commit 4409d0129118479c1cd1adbcfa96316ac4e734b0
Author: Thomas Schwinge 
Date:   Fri Jan 16 20:12:12 2015 +0100

[PR libgomp/64625] Remove __OFFLOAD_TABLE__ variable/formal parameter.

gcc/
* omp-low.c (offload_symbol_decl): Remove variable.
(get_offload_symbol_decl): Remove function.
(expand_omp_target): For BUILT_IN_GOMP_TARGET,
BUILT_IN_GOMP_TARGET_DATA, BUILT_IN_GOMP_TARGET_UPDATE pass NULL
instead of &__OFFLOAD_TABLE__, for BUILT_IN_GOACC_DATA_START,
BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_PARALLEL,
BUILT_IN_GOACC_UPDATE don't pass it at all.
libgomp/
* libgomp_g.h (GOACC_data_start, GOACC_enter_exit_data)
(GOACC_parallel, GOACC_update): Remove const_void *offload_table
formal parameter.  Update all users.
* target.c (GOMP_target, GOMP_target_data, GOMP_target_update):
Document unused formal parameter.
---
 gcc/omp-low.c   | 45 ++---
 libgomp/libgomp_g.h | 10 +-
 libgomp/oacc-parallel.c |  8 
 libgomp/target.c| 11 +--
 4 files changed, 32 insertions(+), 42 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index b7bf338..1589310 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -340,30 +340,6 @@ oacc_max_threads (omp_context *ctx)
 /* Holds offload tables with decls.  */
 vec<tree, va_gc> *offload_funcs, *offload_vars;
 
-/* Holds a decl for __OFFLOAD_TABLE__.  */
-static GTY(()) tree offload_symbol_decl;
-
-/* Get the __OFFLOAD_TABLE__ symbol.  */
-static tree
-get_offload_symbol_decl (void)
-{
-  if (!offload_symbol_decl)
-{
-  tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
- get_identifier ("__OFFLOAD_TABLE__"),
- ptr_type_node);
-  TREE_ADDRESSABLE (decl) = 1;
-  TREE_PUBLIC (decl) = 1;
-  DECL_EXTERNAL (decl) = 1;
-  DECL_WEAK (decl) = 1;
-  DECL_ATTRIBUTES (decl)
-   = tree_cons (get_identifier ("weak"),
-NULL_TREE, DECL_ATTRIBUTES (decl));
-  offload_symbol_decl = decl;
-}
-  return offload_symbol_decl;
-}
-
 /* Convenience function for calling scan_omp_1_op on tree operands.  */
 
 static inline tree
@@ -9119,16 +9095,31 @@ expand_omp_target (struct omp_region *region)
 }
 
   gimple g;
-  tree offload_table = get_offload_symbol_decl ();
   vec<tree, va_gc> *args;
   /* The maximum number used by any start_ix, without varargs.  */
-  unsigned int argcnt = 12;
+  unsigned int argcnt = 11;
 
   vec_alloc (args, argcnt);
   args->quick_push (device);
   if (offloaded)
 args->quick_push (build_fold_addr_expr (child_fn));
-  args->quick_push (build_fold_addr_expr (offload_table));
+  switch (start_ix)
+{
+case BUILT_IN_GOMP_TARGET:
+case BUILT_IN_GOMP_TARGET_DATA:
+case BUILT_IN_GOMP_TARGET_UPDATE:
+  /* This const void * is part of the current ABI, but we're not actually
+using it.  */
+  args->quick_push (build_zero_cst (ptr_type_node));
+  break;
+case BUILT_IN_GOACC_DATA_START:
+case BUILT_IN_GOACC_ENTER_EXIT_DATA:
+case BUILT_IN_GOACC_PARALLEL:
+case BUILT_IN_GOACC_UPDATE:
+  break;
+default:
+  gcc_unreachable ();
+}
   args->quick_push (t1);
   args->quick_push (t2);
   args->quick_push (t3);
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index c1e4e63..5e88d45 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -217,15 +217,15 @@ extern void GOMP_teams (unsigned int, unsigned int);
 
 /* oacc-parallel.c */
 
-extern void GOACC_data_start (int, const void *,
- size_t, void **, size_t *, unsigned short *);
+extern void GOACC_data_start (int, size_t, void **, size_t *,
+ unsigned short *);
 extern void GOACC_data_end (void);
-extern void GOACC_enter_exit_data (int, const void *, size_t, void **,
+extern void GOACC_enter_exit_data (int, size_t, void **,
   size_t *, unsigned short *, int, int, ...);
-extern void GOACC_parallel (int, void (*) (void *), const void *, size_t,
+extern void GOACC_parallel (int, void (*) (void *), size_t,
void **, size_t *, unsigned short *, int, int, int,
int, int, ...);
-extern void GOACC_up

Re: Stage3 closing soon, call for patch pings

2015-01-16 Thread Magnus Granberg
On Thursday, 15 January 2015 at 13.26.43, H.J. Lu wrote:
> On Thu, Jan 15, 2015 at 1:04 PM, Jeff Law  wrote:
> > Stage3 is closing rapidly.  I've drained my queue of patches I was
> > tracking for gcc-5.  However, note that I don't track everything.  If
> > it's a patch for a backend, language other than C or seemingly has
> > another maintainer that's engaged in review, then I haven't been
> > tracking the patch.
> > 
> > So this is my final call for patch pings.  I've got some bandwidth and may
> > be able to look at a few patches that have otherwise stalled.
> 
> This one was updated yesterday:
> 
> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00956.html
> 
> I guess it won't hurt to list it here.
> 
> 
> ---
> H.J.
Jeff, can that be committed?
Thank you H.J. for the work with it.

/Magnus



Re: Merge current set of OpenACC changes from gomp-4_0-branch

2015-01-16 Thread Thomas Schwinge
Hi Gerald!

On Fri, 16 Jan 2015 13:32:10 +0100 (CET), Gerald Pfeifer  
wrote:
> On Thursday 2015-01-15 21:20, Thomas Schwinge wrote:
> > In r219682, I have committed to trunk our current set of OpenACC changes,
> > which we had prepared on gomp-4_0-branch.  Thanks to everyone who has
> > been contributing!
> 
> this breaks bootstrap on FreeBSD 8/amd64 from what I can tell:

Sorry for that.  And, thanks for fixing the num_devices issue.

> /scratch/tmp/gerald/gcc-HEAD/libgomp/oacc-parallel.c:37:20: fatal error: 
> alloca.h: No such file or directory
> compilation terminated.
> 
> 
> % find /usr/include/ -name alloca.h
> %
> 
> i.e., FreeBSD does not feature the alloca.h header and declares
> alloca() in stdlib.h.

The fix is simple enough; committed to trunk in r219771, as obvious:

commit a6f19a7c6b55f96d0c6dc65914857fc8e9b30aaf
Author: tschwinge 
Date:   Fri Jan 16 20:05:21 2015 +

libgomp: Don't use .

libgomp/
* oacc-parallel.c: Don't include .
(GOACC_parallel): Use gomp_alloca instead of alloca.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@219771 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog   | 5 +
 libgomp/oacc-parallel.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 7c106d4..065dfd4 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2015-01-16  Thomas Schwinge  
+
+   * oacc-parallel.c: Don't include .
+   (GOACC_parallel): Use gomp_alloca instead of alloca.
+
 2015-01-16  Gerald Pfeifer  
 
* target.c (num_devices): Guard with PLUGIN_SUPPORT.
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 6d5386b..b5e8060 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include 
 
 static int
 find_pset (int pos, size_t mapnum, unsigned short *kinds)
@@ -151,7 +150,7 @@ GOACC_parallel (int device, void (*fn) (void *), const void 
*offload_table,
   tgt = gomp_map_vars (acc_dev, mapnum, hostaddrs, NULL, sizes, kinds, true,
   false);
 
-  devaddrs = alloca (sizeof (void *) * mapnum);
+  devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
 devaddrs[i] = (void *) (tgt->list[i]->tgt->tgt_start
+ tgt->list[i]->tgt_offset);
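
As a sketch of why this avoids the header problem: assuming gomp_alloca is a thin wrapper around the compiler builtin (that is my understanding, not something shown in this patch), no alloca declaration from any system header is needed.  The macro name my_alloca and the function below are hypothetical and only illustrate the pattern; they need GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for gomp_alloca: go straight to the compiler
   builtin, so neither alloca.h nor a stdlib.h declaration is needed.  */
#define my_alloca(size) __builtin_alloca (size)

/* Fill a stack-allocated scratch array with 1..N and return the sum.  */
static long
sum_first (size_t n)
{
  long *scratch = my_alloca (sizeof (long) * n);
  long total = 0;

  for (size_t i = 0; i < n; i++)
    scratch[i] = (long) i + 1;
  for (size_t i = 0; i < n; i++)
    total += scratch[i];
  return total;
}
```

The storage is released when sum_first returns, exactly as with plain alloca.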


Regards,
 Thomas




Re: [[ARM/AArch64][testsuite] 30/36] Add vpaddl tests.

2015-01-16 Thread Christophe Lyon
On 16 January 2015 at 19:49, Christophe Lyon  wrote:
> On 16 January 2015 at 19:33, Tejas Belagod  wrote:
>>
>>> +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>>> +0x33, 0x33, 0x33, 0x33,
>>> +0x33, 0x33, 0x33, 0x33,
>>> +0x33, 0x33, 0x33, 0x33 };
>>> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
>>> +0x, 0x, 0x, 0x };
>>> +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
>>> +  0x, 0x };
>>> +
>>
>>
>> No poly or float ops.
>>
>>> +#define INSN_NAME vpaddl
>>> +#define TEST_MSG "VPADDL/VPADDLQ"
>>> +
>>> +#define FNNAME1(NAME) void exec_ ## NAME (void)
>>> +#define FNNAME(NAME) FNNAME1(NAME)
>>> +
>>> +FNNAME (INSN_NAME)
>>> +{
>>> +  /* Basic test: y=OP(x), then store the result.  */
>>> +#define TEST_VPADDL1(INSN, Q, T1, T2, W, N, W2, N2)\
>>> +  VECT_VAR(vector_res, T1, W2, N2) =   \
>>> +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \
>>> +  vst1##Q##_##T2##W2(VECT_VAR(result, T1, W2, N2), \
>>> +   VECT_VAR(vector_res, T1, W2, N2))
>>> +
>>> +#define TEST_VPADDL(INSN, Q, T1, T2, W, N, W2, N2) \
>>> +  TEST_VPADDL1(INSN, Q, T1, T2, W, N, W2, N2)
>>> +
>>> +  /* No need for 64 bits variants.  */
>>
>>
>> These look like 64-bit variants.
>>
> I mean no vector element of 64 bits.
>
>>> +  DECL_VARIABLE(vector, int, 8, 8);
>>> +  DECL_VARIABLE(vector, int, 16, 4);
>>> +  DECL_VARIABLE(vector, int, 32, 2);
>>> +  DECL_VARIABLE(vector, uint, 8, 8);
>>> +  DECL_VARIABLE(vector, uint, 16, 4);
>>> +  DECL_VARIABLE(vector, uint, 32, 2);
>>> +  DECL_VARIABLE(vector, int, 8, 16);
>>> +  DECL_VARIABLE(vector, int, 16, 8);
>>> +  DECL_VARIABLE(vector, int, 32, 4);
>>> +  DECL_VARIABLE(vector, uint, 8, 16);
>>> +  DECL_VARIABLE(vector, uint, 16, 8);
>>> +  DECL_VARIABLE(vector, uint, 32, 4);
>>> +
>>
>>
>>> +  /* Apply a unary operator named INSN_NAME.  */
>>
>> Unary op?
>
> Cut & paste error, again.
>
Hmm changed my mind: vpaddl takes only one vector as input, although
it does add 2 vector elements.
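
For reference, a scalar model of the 8-bit, 64-bit-vector signed variant makes that explicit: one input vector, adjacent lanes added pairwise, result lanes twice as wide.  This is a sketch of the documented vpaddl_s8 semantics, not code from the testsuite; ref_vpaddl_s8 is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of vpaddl_s8: IN has 8 lanes of 8 bits, OUT has 4 lanes
   of 16 bits.  Each output lane is the widened sum of two adjacent
   input lanes, so the sum cannot overflow.  */
static void
ref_vpaddl_s8 (const int8_t in[8], int16_t out[4])
{
  for (size_t i = 0; i < 4; i++)
    out[i] = (int16_t) ((int16_t) in[2 * i] + (int16_t) in[2 * i + 1]);
}
```

For example, the input { 1, 2, 3, 4, 127, 127, -128, -128 } gives { 3, 7, 254, -256 }.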


>>
>>> +  TEST_VPADDL(INSN_NAME, , int, s, 8, 8, 16, 4);
>>> +  TEST_VPADDL(INSN_NAME, , int, s, 16, 4, 32, 2);
>>> +  TEST_VPADDL(INSN_NAME, , int, s, 32, 2, 64, 1);
>>> +  TEST_VPADDL(INSN_NAME, , uint, u, 8, 8, 16, 4);
>>> +  TEST_VPADDL(INSN_NAME, , uint, u, 16, 4, 32, 2);
>>> +  TEST_VPADDL(INSN_NAME, , uint, u, 32, 2, 64, 1);
>>> +  TEST_VPADDL(INSN_NAME, q, int, s, 8, 16, 16, 8);
>>> +  TEST_VPADDL(INSN_NAME, q, int, s, 16, 8, 32, 4);
>>> +  TEST_VPADDL(INSN_NAME, q, int, s, 32, 4, 64, 2);
>>> +  TEST_VPADDL(INSN_NAME, q, uint, u, 8, 16, 16, 8);
>>> +  TEST_VPADDL(INSN_NAME, q, uint, u, 16, 8, 32, 4);
>>> +  TEST_VPADDL(INSN_NAME, q, uint, u, 32, 4, 64, 2);
>>> +
>>> +  CHECK_RESULTS (TEST_MSG, "");
>>> +}
>>> +
>>> +int main (void)
>>> +{
>>> +  exec_vpaddl ();
>>> +  return 0;
>>> +}
>>>
>>
>>
>> Otherwise, LGTM.
>>
>> Tejas.
>>


Re: [COMMITTED] Merge libffi with upstream

2015-01-16 Thread Richard Henderson
On 01/15/2015 08:40 AM, Rainer Orth wrote:
> * on Solaris/SPARC, /bin/as requires
> 
>   .type fn,#function

Gas accepts this as well, so let's just use that.

> * Yet unfixed for Solaris/SPARC /bin/as:
> 
> as: "v8.s", line 128: error: invalid digit in radix 10
> 
>   as seems to only understand single-digit labels
> 
> as: "v8.s", line 140: error: statement syntax
> as: "v8.s", line 157: error: unknown opcode ".rept"
> as: "v8.s", line 157: error: statement syntax
> as: "v8.s", line 163: error: unknown opcode ".endr"
> as: "v8.s", line 163: error: statement syntax
> 
>   and knows nothing about .rept/.endr

Here's a diff of v8.S that I've just tested on linux.
That should fix the remaining problems for Solaris.


r~

diff --git a/libffi/src/sparc/v8.S b/libffi/src/sparc/v8.S
index 3a811ef..f675151 100644
--- a/libffi/src/sparc/v8.S
+++ b/libffi/src/sparc/v8.S
@@ -48,7 +48,7 @@
 #ifndef __GNUC__
 .align 8
.globl  C(ffi_flush_icache)
-   .type   C(ffi_flush_icache),@function
+   .type   C(ffi_flush_icache),#function
FFI_HIDDEN(C(ffi_flush_icache))
 
 C(ffi_flush_icache):
@@ -66,14 +66,15 @@ C(ffi_flush_icache):
.size   C(ffi_flush_icache), . - C(ffi_flush_icache)
 #endif
 
-.macro E index
-   .align  16
-   .org2b + \index * 16
-.endm
+#if defined(__sun__) && defined(__svr4__)
+# define E(INDEX)  .align 16
+#else
+# define E(INDEX)  .align 16; .org 2b + INDEX * 16
+#endif
 
 .align 8
.globl  C(ffi_call_v8)
-   .type   C(ffi_call_v8),@function
+   .type   C(ffi_call_v8),#function
FFI_HIDDEN(C(ffi_call_v8))
 
 C(ffi_call_v8):
@@ -114,71 +115,71 @@ C(ffi_call_v8):
! Note that each entry is 4 insns, enforced by the E macro.
.align  16
 2:
-E SPARC_RET_VOID
+E(SPARC_RET_VOID)
ret
 restore
-E SPARC_RET_STRUCT
+E(SPARC_RET_STRUCT)
unimp
-E SPARC_RET_UINT8
+E(SPARC_RET_UINT8)
and %o0, 0xff, %o0
st  %o0, [%i2]
ret
 restore
-E SPARC_RET_SINT8
+E(SPARC_RET_SINT8)
sll %o0, 24, %o0
b   7f
 sra%o0, 24, %o0
-E SPARC_RET_UINT16
+E(SPARC_RET_UINT16)
sll %o0, 16, %o0
b   7f
 srl%o0, 16, %o0
-E SPARC_RET_SINT16
+E(SPARC_RET_SINT16)
sll %o0, 16, %o0
b   7f
 sra%o0, 16, %o0
-E SPARC_RET_UINT32
+E(SPARC_RET_UINT32)
 7: st  %o0, [%i2]
ret
 restore
-E SP_V8_RET_CPLX16
+E(SP_V8_RET_CPLX16)
sth %o0, [%i2+2]
b   9f
 srl%o0, 16, %o0
-E SPARC_RET_INT64
+E(SPARC_RET_INT64)
st  %o0, [%i2]
st  %o1, [%i2+4]
ret
 restore
-E SPARC_RET_INT128
+E(SPARC_RET_INT128)
std %o0, [%i2]
std %o2, [%i2+8]
ret
 restore
-E SPARC_RET_F_8
+E(SPARC_RET_F_8)
st  %f7, [%i2+7*4]
nop
st  %f6, [%i2+6*4]
nop
-E SPARC_RET_F_6
+E(SPARC_RET_F_6)
st  %f5, [%i2+5*4]
nop
st  %f4, [%i2+4*4]
nop
-E SPARC_RET_F_4
+E(SPARC_RET_F_4)
st  %f3, [%i2+3*4]
nop
st  %f2, [%i2+2*4]
nop
-E SPARC_RET_F_2
+E(SPARC_RET_F_2)
st  %f1, [%i2+4]
st  %f0, [%i2]
ret
 restore
-E SP_V8_RET_CPLX8
+E(SP_V8_RET_CPLX8)
stb %o0, [%i2+1]
-   b   10f
+   b   0f
 srl%o0, 8, %o0
-E SPARC_RET_F_1
+E(SPARC_RET_F_1)
st  %f0, [%i2]
ret
 restore
@@ -188,7 +189,7 @@ E SPARC_RET_F_1
ret
 restore
.align  8
-10:stb %o0, [%i2]
+0: stb %o0, [%i2]
ret
 restore
 
@@ -201,17 +202,35 @@ E SPARC_RET_F_1
 sll%l1, 2, %l0 ! size * 4
 1: sll %l1, 4, %l1 ! size * 16
add %l0, %l1, %l0   ! size * 20
-   add %o7, %l0, %o7   ! o7 = 0b + size*20
+   add %o7, %l0, %o7   ! o7 = 8b + size*20
jmp %o7+(2f-8b)
 mov%i5, %g2! load static chain
 2:
-.rept  0x1000
-   call%i1
-nop
-   unimp   (. - 2b) / 20
-   ret
+
+/* The Sun assembler doesn't understand .rept 0x1000.  */
+#define rept1  \
+   call%i1;\
+nop;   \
+   unimp   (. - 2b) / 20;  \
+   ret;\
 restore
-.endr
+
+#define rept16 \
+   rept1; rept1; rept1; rept1; \
+   rept1; rept1; rept1; rept1; \
+   rept1; rept1; rept1; rept1; \
+   rept1; rept1; rept1; rept1
+
+#define rept256\
+   rept16; rept16; rept16; rept16; \
+   rept16; rept16; rept16; rept16; \
+   rept16; rept16; rept16; rept16; \
+   rept16; rept16; rept16; rept16
+
+   rept256; rept256; rept256; rept256
+   rept256; rept2
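
The nested #define trick used above in place of .rept can be tried out in plain C.  The following is only a sketch with made-up macro names (REPT1, REPT16, REPT256); here the "body" just bumps a counter, whereas in v8.S it is the five-instruction trampoline.

```c
#include <assert.h>

/* One unit of work; stands in for the repeated assembly fragment.  */
#define REPT1(c) ((c)++)

/* Each level pastes the previous level 16 times.  */
#define REPT16(c) \
  REPT1 (c); REPT1 (c); REPT1 (c); REPT1 (c); \
  REPT1 (c); REPT1 (c); REPT1 (c); REPT1 (c); \
  REPT1 (c); REPT1 (c); REPT1 (c); REPT1 (c); \
  REPT1 (c); REPT1 (c); REPT1 (c); REPT1 (c)

#define REPT256(c) \
  REPT16 (c); REPT16 (c); REPT16 (c); REPT16 (c); \
  REPT16 (c); REPT16 (c); REPT16 (c); REPT16 (c); \
  REPT16 (c); REPT16 (c); REPT16 (c); REPT16 (c); \
  REPT16 (c); REPT16 (c); REPT16 (c); REPT16 (c)

/* Count how many copies of REPT1 one REPT256 expands to.  */
static unsigned int
count_expansions (void)
{
  unsigned int n = 0;
  REPT256 (n);
  return n;
}
```

count_expansions () returns 256, so sixteen REPT256 invocations would give the 0x1000 copies that .rept 0x1000 produced.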

Re: Fix CC_MODE pessimization in reorg.c

2015-01-16 Thread Richard Henderson
On 01/16/2015 09:47 AM, Eric Botcazou wrote:
> 
> 2015-01-16  Eric Botcazou  
> 
>   * reorg.c (fill_simple_delay_slots): If TARGET_FLAGS_REGNUM is valid,
>   implement a more precise life analysis for it during backward scan.

Ok.


r~



Merge from trunk to gccgo branch

2015-01-16 Thread Ian Lance Taylor
I merged trunk revision 219753 to the gccgo branch.

Ian


[PATCH] Re: Stage 3 RFC: using "jit" for ahead-of-time compilation

2015-01-16 Thread David Malcolm
On Thu, 2015-01-15 at 22:50 +0100, Richard Biener wrote:
> On January 15, 2015 9:05:59 PM CET, David Malcolm  wrote:
> >Release managers: given that this only touches the jit, and that the
> >jit is off by default, any objections if I go ahead and commit this?
> >It's a late-breaking feature, but the jit as a whole is new, and
> >I think the following is a big win, so I'd like to proceed with this in
> >stage 3 (i.e. in the next 24 hours).  There are docs and testcases.
> >
> >New jit API entrypoint: gcc_jit_context_compile_to_file
> >
> >This patch adds a way to use libgccjit for ahead-of-time compilation.
> >I noticed that given the postprocessing steps the jit has to take to
> >turn the .s file into in-memory code (invoke driver to convert to
> >a .so and then dlopen), that it's not much of a leap to support
> >compiling the .s file into objects, dynamic libraries, and executables.
> >
> >Doing so seems like a big win from a feature standpoint: people with
> >pre-existing frontend code who want a backend can then plug in
> >libgccjit and have a compiler, without needing to write it as a GCC
> >frontend, or use LLVM.
> 
> Note that you should make them aware of our runtime license with
> respect to the eligible compilation process.  Which means this is not
> a way to implement proprietary front ends.
> 
> Richard.

IANAL, but as I understand things, the runtime license is an additional
grant of rights that covers certain components of GCC that bear the GCC
Runtime Library Exception, allowing them to be used in certain
additional ways beyond regular GPLv3-compliance.

libgccjit doesn't have that exception; it's GPLv3.

Perhaps an argument could be made for libgccjit to have the exception,
if the FSF think that that would better serve the FSF's mission; right
now, I'm merely trying to provide a technical means to modularity.

Assuming the above is correct, anything linked against it needs to be
GPLv3-compatible.  Hence any such frontend linked against libgccjit
would need to be GPLv3-compatible.

Attached is a patch (on top of the proposed one below), to clarify the
wording in the new tutorial a little, to remind people that such linking
needs to be license-compatible (without actually spelling out what the
license is, though it's visible at the top of the public header file,
libgccjit.h, as GPLv3 or later without the runtime library exception).

Are the combined patches OK by you?

Thanks
Dave


> >"jit" becomes something of a misnomer for this use-case.
> >
> >As an experiment, I used this technique to add a compiler for the
> >language I'll refer to as "brainf" (ahem), and wrote this up for the
> >libgccjit tutorial (it's all in the patch); prebuilt HTML can be seen
> >at:
> >https://dmalcolm.fedorapeople.org/gcc/libgccjit-api-docs-wip/intro/tutorial05.html
> >
> >The main things that are missing are:
> > * specifying libraries to link against (Uli had some ideas about this)
> >  * cross-compilation support (needs some deeper work, especially the
> >test suite, so deferrable to gcc 6, I guess)
> >but the feature is useful with the patch as-is.
> >
> >The new test cases take jit.sum's # of expected passes
> >from 7514 to 7571.
> >
> >gcc/jit/ChangeLog:
> > * docs/cp/topics/results.rst: Rename to...
> > * docs/cp/topics/compilation.rst: ...this, and add section on
> > ahead-of-time compilation.
> > * docs/cp/topics/index.rst: Update for renaming of results.rst
> > to compilation.rst.
> > * docs/examples/emit-alphabet.bf: New file, a sample "brainf"
> > script.
> > * docs/examples/tut05-bf.c: New file, implementing a compiler
> > for "brainf".
> > * docs/internals/test-hello-world.exe.log.txt: Update to reflect
> > changes to logger output.
> > * docs/intro/index.rst: Add tutorial05.rst
> > * docs/intro/tutorial05.rst: New file.
> > * docs/topics/results.rst: Rename to...
> > * docs/topics/compilation.rst: ...this, and add section on
> > ahead-of-time compilation.
> > * docs/topics/index.rst: Update for renaming of results.rst to
> > compilation.rst.
> > * jit-playback.c (gcc::jit::playback::context::compile): Convert
> > return type from result * to void.  Move the code to convert to
> > dso and dlopen the result to a new pure virtual "postprocess"
> > method.
> > (gcc::jit::playback::compile_to_memory::compile_to_memory): New
> > function.
> > (gcc::jit::playback::compile_to_memory::postprocess): New
> > function, based on playback::context::compile.
> > (gcc::jit::playback::compile_to_file::compile_to_file): New
> > function.
> > (gcc::jit::playback::compile_to_file::postprocess): New function.
> > (gcc::jit::playback::compile_to_file::copy_file): New function.
> > (gcc::jit::playback::context::convert_to_dso): Move internals
> > to...
> > (gcc::jit::playback::context::invoke_driver): New method.  Add
> > "-shared" and "-c" options to driver's argv as needed.

Re: [[ARM/AArch64][testsuite] 36/36] Add vqdmull_n tests.

2015-01-16 Thread Christophe Lyon
On 16 January 2015 at 19:40, Tejas Belagod  wrote:
> On 13/01/15 15:18, Christophe Lyon wrote:
>>
>> * gcc.target/aarch64/advsimd-intrinsics/vqdmull_n.c: New file.
>>
>> diff --git
>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_n.c
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_n.c
>> new file mode 100644
>> index 000..9e73009
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_n.c
>> @@ -0,0 +1,92 @@
>> +#include 
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +/* Expected values of cumulative_saturation flag.  */
>> +int VECT_VAR(expected_cumulative_sat,int,16,4) = 0;
>> +int VECT_VAR(expected_cumulative_sat,int,32,2) = 0;
>> +
>> +/* Expected results.  */
>> +VECT_VAR_DECL(expected,int,32,4) [] = { 0x44000, 0x44000,
>> +   0x44000, 0x44000 };
>> +VECT_VAR_DECL(expected,int,64,2) [] = { 0xaa000, 0xaa000 };
>> +
>> +/* Expected values of cumulative_saturation flag when saturation
>> +   occurs.  */
>> +int VECT_VAR(expected_cumulative_sat2,int,16,4) = 1;
>> +int VECT_VAR(expected_cumulative_sat2,int,32,2) = 1;
>> +
>> +/* Expected results when saturation occurs.  */
>> +VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fff, 0x7fff,
>> +0x7fff, 0x7fff };
>> +VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fff,
>> +0x7fff };
>> +
>> +#define INSN_NAME vqdmull
>> +#define TEST_MSG "VQDMULL_N"
>> +
>> +#define FNNAME1(NAME) exec_ ## NAME
>> +#define FNNAME(NAME) FNNAME1(NAME)
>> +
>> +void FNNAME (INSN_NAME) (void)
>> +{
>> +  int i;
>> +
>> +  /* vector_res = vqdmull_n(vector,val), then store the result.  */
>> +#define TEST_VQDMULL_N2(INSN, T1, T2, W, W2, N, L,
>> EXPECTED_CUMULATIVE_SAT, CMT) \
>> +  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W2, N)); \
>> +  VECT_VAR(vector_res, T1, W2, N) =\
>> +INSN##_n_##T2##W(VECT_VAR(vector, T1, W, N),   \
>> +L);\
>> +  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N),  \
>> +VECT_VAR(vector_res, T1, W2, N));  \
>> +  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
>> +
>> +  /* Two auxliary macros are necessary to expand INSN.  */
>> +#define TEST_VQDMULL_N1(INSN, T1, T2, W, W2, N, L,
>> EXPECTED_CUMULATIVE_SAT, CMT) \
>> +  TEST_VQDMULL_N2(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT,
>> CMT)
>> +
>> +#define TEST_VQDMULL_N(T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT)
>> \
>> +  TEST_VQDMULL_N1(INSN_NAME, T1, T2, W, W2, N, L,
>> EXPECTED_CUMULATIVE_SAT, CMT)
>> +
>> +  DECL_VARIABLE(vector, int, 16, 4);
>> +  DECL_VARIABLE(vector, int, 32, 2);
>> +  DECL_VARIABLE(vector2, int, 16, 4);
>> +  DECL_VARIABLE(vector2, int, 32, 2);
>> +
>> +  DECL_VARIABLE(vector_res, int, 32, 4);
>> +  DECL_VARIABLE(vector_res, int, 64, 2);
>> +
>> +  clean_results ();
>> +
>> +  /* Initialize vector.  */
>> +  VDUP(vector, , int, s, 16, 4, 0x1000);
>> +  VDUP(vector, , int, s, 32, 2, 0x1000);
>> +
>> +  /* Initialize vector2.  */
>> +  VDUP(vector2, , int, s, 16, 4, 0x4);
>> +  VDUP(vector2, , int, s, 32, 2, 0x2);
>> +
>> +  /* Choose multiplier arbitrarily.  */
>> +  TEST_VQDMULL_N(int, s, 16, 32, 4, 0x22, expected_cumulative_sat, "");
>> +  TEST_VQDMULL_N(int, s, 32, 64, 2, 0x55, expected_cumulative_sat, "");
>> +
>> +  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
>> +  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, "");
>> +
>> +  VDUP(vector, , int, s, 16, 4, 0x8000);
>> +  VDUP(vector, , int, s, 32, 2, 0x8000);
>> +
>> +#define TEST_MSG2 "with saturation"
>> +  TEST_VQDMULL_N(int, s, 16, 32, 4, 0x8000, expected_cumulative_sat2,
>> TEST_MSG2);
>> +  TEST_VQDMULL_N(int, s, 32, 64, 2, 0x8000, expected_cumulative_sat2,
>> TEST_MSG2);
>> +
>> +  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected2, TEST_MSG2);
>> +  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected2, TEST_MSG2);
>> +}
>> +
>> +int main (void)
>> +{
>> +  FNNAME (INSN_NAME) ();
>> +  return 0;
>> +}
>>
>
> Patches 31 to 36 also LGTM.
>
> A general nit about all the patches. Code like:
>
> +  DECL_VARIABLE(vector, int, 16, 4);
> +  DECL_VARIABLE(vector, int, 32, 2);
> +  DECL_VARIABLE(vector2, int, 16, 4);
> +  DECL_VARIABLE(vector2, int, 32, 2);
> +  DECL_VARIABLE(vector_res, int, 32, 4);
> +  DECL_VARIABLE(vector_res, int, 64, 2);
>
> A space before the '(' is required.
I should probably fix this in all the tests already committed too.


> Thanks for working on these tests.
Thanks for the review.

> Tejas.
>
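
For reference, the arithmetic the quoted vqdmull_n test checks can be modelled per lane in scalar C.  This is a sketch of the documented "saturating doubling multiply long" semantics for 16-bit inputs, not code from the testsuite; ref_sqdmull_s16 is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of vqdmull_n_s16: return 2 * A * B as a 32-bit value,
   saturating on overflow.  *SATURATED is set to 1 on saturation and is
   otherwise left alone, like a cumulative saturation flag.  Only
   A == B == INT16_MIN can overflow, and only on the positive side.  */
static int32_t
ref_sqdmull_s16 (int16_t a, int16_t b, int *saturated)
{
  int64_t doubled = 2 * (int64_t) a * (int64_t) b;

  if (doubled > INT32_MAX)
    {
      *saturated = 1;
      return INT32_MAX;
    }
  return (int32_t) doubled;
}
```

With a = 0x1000 and b = 0x22 this gives 0x44000 without saturation, matching the first expected value in the test; with both inputs at INT16_MIN (bit pattern 0x8000) it saturates to INT32_MAX and sets the flag.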


Re: [[ARM/AArch64][testsuite] 30/36] Add vpaddl tests.

2015-01-16 Thread Christophe Lyon
On 16 January 2015 at 19:33, Tejas Belagod  wrote:
>
>> +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
>> +0x, 0x, 0x, 0x };
>> +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
>> +  0x, 0x };
>> +
>
>
> No poly or float ops.
>
>> +#define INSN_NAME vpaddl
>> +#define TEST_MSG "VPADDL/VPADDLQ"
>> +
>> +#define FNNAME1(NAME) void exec_ ## NAME (void)
>> +#define FNNAME(NAME) FNNAME1(NAME)
>> +
>> +FNNAME (INSN_NAME)
>> +{
>> +  /* Basic test: y=OP(x), then store the result.  */
>> +#define TEST_VPADDL1(INSN, Q, T1, T2, W, N, W2, N2)\
>> +  VECT_VAR(vector_res, T1, W2, N2) =   \
>> +INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \
>> +  vst1##Q##_##T2##W2(VECT_VAR(result, T1, W2, N2), \
>> +   VECT_VAR(vector_res, T1, W2, N2))
>> +
>> +#define TEST_VPADDL(INSN, Q, T1, T2, W, N, W2, N2) \
>> +  TEST_VPADDL1(INSN, Q, T1, T2, W, N, W2, N2)
>> +
>> +  /* No need for 64 bits variants.  */
>
>
> These look like 64-bit variants.
>
I mean no vector element of 64 bits.

>> +  DECL_VARIABLE(vector, int, 8, 8);
>> +  DECL_VARIABLE(vector, int, 16, 4);
>> +  DECL_VARIABLE(vector, int, 32, 2);
>> +  DECL_VARIABLE(vector, uint, 8, 8);
>> +  DECL_VARIABLE(vector, uint, 16, 4);
>> +  DECL_VARIABLE(vector, uint, 32, 2);
>> +  DECL_VARIABLE(vector, int, 8, 16);
>> +  DECL_VARIABLE(vector, int, 16, 8);
>> +  DECL_VARIABLE(vector, int, 32, 4);
>> +  DECL_VARIABLE(vector, uint, 8, 16);
>> +  DECL_VARIABLE(vector, uint, 16, 8);
>> +  DECL_VARIABLE(vector, uint, 32, 4);
>> +
>
>
>> +  /* Apply a unary operator named INSN_NAME.  */
>
> Unary op?

Cut & paste error, again.

>
>> +  TEST_VPADDL(INSN_NAME, , int, s, 8, 8, 16, 4);
>> +  TEST_VPADDL(INSN_NAME, , int, s, 16, 4, 32, 2);
>> +  TEST_VPADDL(INSN_NAME, , int, s, 32, 2, 64, 1);
>> +  TEST_VPADDL(INSN_NAME, , uint, u, 8, 8, 16, 4);
>> +  TEST_VPADDL(INSN_NAME, , uint, u, 16, 4, 32, 2);
>> +  TEST_VPADDL(INSN_NAME, , uint, u, 32, 2, 64, 1);
>> +  TEST_VPADDL(INSN_NAME, q, int, s, 8, 16, 16, 8);
>> +  TEST_VPADDL(INSN_NAME, q, int, s, 16, 8, 32, 4);
>> +  TEST_VPADDL(INSN_NAME, q, int, s, 32, 4, 64, 2);
>> +  TEST_VPADDL(INSN_NAME, q, uint, u, 8, 16, 16, 8);
>> +  TEST_VPADDL(INSN_NAME, q, uint, u, 16, 8, 32, 4);
>> +  TEST_VPADDL(INSN_NAME, q, uint, u, 32, 4, 64, 2);
>> +
>> +  CHECK_RESULTS (TEST_MSG, "");
>> +}
>> +
>> +int main (void)
>> +{
>> +  exec_vpaddl ();
>> +  return 0;
>> +}
>>
>
>
> Otherwise, LGTM.
>
> Tejas.
>


Re: [PING ^ 3] [RFC PATCH, AARCH64] Add support for -mlong-calls option

2015-01-16 Thread David Abdurachmanov
Ping.

This was posted in Stage 1, so it should still be valid for Stage 3.

First we hit an issue with the OpenLoops package, where we generated big function
bodies (1-2MB) [1]. This made the GNU assembler unhappy, as it generated
PC-relative offsets to the literal pool above 1MB. The authors of the package are
attempting to split the functions into smaller pieces, but now we are hitting
GNU bfd linker issues:

virtual_6_pplljjj_eexuuxggg_1_qp.f90:(.text+0x22820): relocation truncated to
fit: R_AARCH64_CALL26 against symbol __ol_last_step_qp_MOD_check_last_aq_v
defined in .text section i

After reading http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-June/073992.html
it seems that this is a GNU linker limitation, as the ARM linker handles this
situation.

GCC AArch64 supports "-mcmodel=large", but that's only for static linking. I
hope "-mlong-calls" can force gfortran to generate long calls instead for
shared linking.
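
For reference, a rough sketch of the range limit behind the "relocation truncated to fit" message. R_AARCH64_CALL26 is the relocation used for BL, whose 26-bit immediate is a signed word offset, so the call target has to lie within +/-128 MiB of the call site. The helper name fits_call26 and the macro CALL26_RANGE are made up for this illustration and do not come from binutils or GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* A BL encodes a signed 26-bit offset counted in 4-byte words, so the
   reachable byte offsets are -2^27 <= offset < 2^27.  */
#define CALL26_RANGE ((int64_t) 1 << 27)

/* Hypothetical helper: does a branch from PLACE to TARGET fit in a
   CALL26 relocation?  Both addresses are assumed to be 4-byte aligned
   instruction addresses.  */
bool
fits_call26 (uint64_t place, uint64_t target)
{
  int64_t offset = (int64_t) (target - place);

  return (offset & 3) == 0
	 && offset >= -CALL26_RANGE
	 && offset < CALL26_RANGE;
}
```

A long call avoids the limit by loading the full target address into a register and branching through it, at the cost of extra instructions per call.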

Can we get this one in GCC 5?

Thanks,
david
- - -
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304

On Dec 9, 2014, at 9:28 AM, Yangfei (Felix) wrote:

> Hi, 
>  This is a ping for:
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02258.html
> 
> Thanks.



Re: [debug-early] C++ clones and limbo DIEs

2015-01-16 Thread Richard Biener
On January 16, 2015 5:31:49 PM CET, Jason Merrill  wrote:
>On 01/16/2015 05:55 AM, Richard Biener wrote:
>> I'd hope that in the very distant future all early DIEs would be "created"
>> by the frontends (that is, dwarf2out.c wouldn't walk into parents/siblings
>> so much).
>
>Are you thinking that the front end would immediately call a debug hook
>for every block, local variable and such, or just for higher level
>entities?

For every block, local variable and such.
Yes.  The FE then also has the chance to append whatever FE specific attributes 
without langhooks in dwarf2out.

>> I hoped we wouldn't need the limbo list at all ... that is, parent DIEs
>> are always present when we create children.  I think that should
>> work in principle if the frontends would create DIEs while parsing.
>
>So create the function DIE as soon as we see the declaration?  That 
>seems reasonable.  Then that would be the point of early debug, not 
>later at EOF.

Yes.

Richard.

>> Note that dwarf2out forces parent DIE creation in some cases
>> but not in some others - it would be interesting to sort out which
>> parent DIEs it thinks it cannot create when we create the DIE
>> for a sibling.  Maybe it's just poor ordering of early_global_decl
>> calls?
>
>Agreed.
>
>Jason




Re: [[ARM/AArch64][testsuite] 36/36] Add vqdmull_n tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vqdmull_n.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_n.c
new file mode 100644
index 000..9e73009
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_n.c
@@ -0,0 +1,92 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,16,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,32,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x44000, 0x44000,
+   0x44000, 0x44000 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xaa000, 0xaa000 };
+
+/* Expected values of cumulative_saturation flag when saturation
+   occurs.  */
+int VECT_VAR(expected_cumulative_sat2,int,16,4) = 1;
+int VECT_VAR(expected_cumulative_sat2,int,32,2) = 1;
+
+/* Expected results when saturation occurs.  */
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fff, 0x7fff,
+0x7fff, 0x7fff };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fff,
+0x7fff };
+
+#define INSN_NAME vqdmull
+#define TEST_MSG "VQDMULL_N"
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  int i;
+
+  /* vector_res = vqdmull_n(vector,val), then store the result.  */
+#define TEST_VQDMULL_N2(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W2, N)); \
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_n_##T2##W(VECT_VAR(vector, T1, W, N),   \
+L);\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N),  \
+VECT_VAR(vector_res, T1, W2, N));  \
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  /* Two auxiliary macros are necessary to expand INSN.  */
+#define TEST_VQDMULL_N1(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) \
+  TEST_VQDMULL_N2(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQDMULL_N(T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) \
+  TEST_VQDMULL_N1(INSN_NAME, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  clean_results ();
+
+  /* Initialize vector.  */
+  VDUP(vector, , int, s, 16, 4, 0x1000);
+  VDUP(vector, , int, s, 32, 2, 0x1000);
+
+  /* Initialize vector2.  */
+  VDUP(vector2, , int, s, 16, 4, 0x4);
+  VDUP(vector2, , int, s, 32, 2, 0x2);
+
+  /* Choose multiplier arbitrarily.  */
+  TEST_VQDMULL_N(int, s, 16, 32, 4, 0x22, expected_cumulative_sat, "");
+  TEST_VQDMULL_N(int, s, 32, 64, 2, 0x55, expected_cumulative_sat, "");
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, "");
+
+  VDUP(vector, , int, s, 16, 4, 0x8000);
+  VDUP(vector, , int, s, 32, 2, 0x8000);
+
+#define TEST_MSG2 "with saturation"
+  TEST_VQDMULL_N(int, s, 16, 32, 4, 0x8000, expected_cumulative_sat2, TEST_MSG2);
+  TEST_VQDMULL_N(int, s, 32, 64, 2, 0x8000, expected_cumulative_sat2, TEST_MSG2);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected2, TEST_MSG2);
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected2, TEST_MSG2);
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
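
For readers following the expected values, a minimal scalar sketch of one lane of a signed saturating doubling multiply long on 16-bit inputs. The function names are invented for this note and are not part of arm_neon.h or the testsuite; the bool flag stands in for the cumulative saturation flag that the test checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of one lane: result = saturate32 (2 * a * b).  Sets *SAT
   when the doubled product does not fit in 32 bits.  Only the case
   a == b == INT16_MIN can overflow, and only upwards, so no lower
   bound check is needed.  Illustrative name and signature.  */
int32_t
ref_qdmull_s16 (int16_t a, int16_t b, bool *sat)
{
  int64_t doubled = 2 * (int64_t) a * (int64_t) b;

  if (doubled > INT32_MAX)
    {
      *sat = true;
      return INT32_MAX;
    }
  return (int32_t) doubled;
}

/* Usage example with the 16-bit inputs used in the patch.  */
void
ref_qdmull_s16_example (void)
{
  bool sat = false;

  /* 2 * 0x1000 * 0x22 == 0x44000, no saturation.  */
  assert (ref_qdmull_s16 (0x1000, 0x22, &sat) == 0x44000);
  assert (!sat);

  /* 2 * (-32768) * (-32768) == 2^31, which saturates.  */
  assert (ref_qdmull_s16 (INT16_MIN, INT16_MIN, &sat) == INT32_MAX);
  assert (sat);
}
```

The first case reproduces the 0x44000 entries in the expected int32x4 results above.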



Patches 31 to 36 also LGTM.

A general nit about all the patches. Code like:

+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);

A space before the '(' is required.

Thanks for working on these tests.

Tejas.



Re: [[ARM/AArch64][testsuite] 30/36] Add vpaddl tests.

2015-01-16 Thread Tejas Belagod



+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+


No poly or float ops.


+#define INSN_NAME vpaddl
+#define TEST_MSG "VPADDL/VPADDLQ"
+
+#define FNNAME1(NAME) void exec_ ## NAME (void)
+#define FNNAME(NAME) FNNAME1(NAME)
+
+FNNAME (INSN_NAME)
+{
+  /* Basic test: y=OP(x), then store the result.  */
+#define TEST_VPADDL1(INSN, Q, T1, T2, W, N, W2, N2)\
+  VECT_VAR(vector_res, T1, W2, N2) =   \
+INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N)); \
+  vst1##Q##_##T2##W2(VECT_VAR(result, T1, W2, N2), \
+   VECT_VAR(vector_res, T1, W2, N2))
+
+#define TEST_VPADDL(INSN, Q, T1, T2, W, N, W2, N2) \
+  TEST_VPADDL1(INSN, Q, T1, T2, W, N, W2, N2)
+
+  /* No need for 64 bits variants.  */


These look like 64-bit variants.


+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 8, 8);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+  DECL_VARIABLE(vector, int, 8, 16);
+  DECL_VARIABLE(vector, int, 16, 8);
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector, uint, 8, 16);
+  DECL_VARIABLE(vector, uint, 16, 8);
+  DECL_VARIABLE(vector, uint, 32, 4);
+



+  /* Apply a unary operator named INSN_NAME.  */

Unary op?


+  TEST_VPADDL(INSN_NAME, , int, s, 8, 8, 16, 4);
+  TEST_VPADDL(INSN_NAME, , int, s, 16, 4, 32, 2);
+  TEST_VPADDL(INSN_NAME, , int, s, 32, 2, 64, 1);
+  TEST_VPADDL(INSN_NAME, , uint, u, 8, 8, 16, 4);
+  TEST_VPADDL(INSN_NAME, , uint, u, 16, 4, 32, 2);
+  TEST_VPADDL(INSN_NAME, , uint, u, 32, 2, 64, 1);
+  TEST_VPADDL(INSN_NAME, q, int, s, 8, 16, 16, 8);
+  TEST_VPADDL(INSN_NAME, q, int, s, 16, 8, 32, 4);
+  TEST_VPADDL(INSN_NAME, q, int, s, 32, 4, 64, 2);
+  TEST_VPADDL(INSN_NAME, q, uint, u, 8, 16, 16, 8);
+  TEST_VPADDL(INSN_NAME, q, uint, u, 16, 8, 32, 4);
+  TEST_VPADDL(INSN_NAME, q, uint, u, 32, 4, 64, 2);
+
+  CHECK_RESULTS (TEST_MSG, "");
+}
+
+int main (void)
+{
+  exec_vpaddl ();
+  return 0;
+}




Otherwise, LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 29/36] Add vpadal tests.

2015-01-16 Thread Tejas Belagod

+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+


No float or poly ops for VPADAL insns.

Otherwise, LGTM.

Tejas.




Re: [[ARM/AArch64][testsuite] 28/36] Add vmnv tests.

2015-01-16 Thread Tejas Belagod

+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+


No float or poly16 for vmvn_*.

Otherwise, LGTM.

Tejas.




Re: [[ARM/AArch64][testsuite] 27/36] Add vmull_n tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmull_n.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c
new file mode 100644
index 000..df28a94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c
@@ -0,0 +1,61 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x11000, 0x11000, 0x11000, 0x11000 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x22000, 0x22000 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x33000, 0x33000, 0x33000, 0x33000 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x44000, 0x44000 };
+
+#define INSN_NAME vmull
+#define TEST_MSG "VMULL_N"
+void exec_vmull_n (void)
+{
+  int i;
+
+  /* vector_res = vmull_n(vector,val), then store the result.  */
+#define TEST_VMULL_N1(INSN, T1, T2, W, W2, N, L)   \
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_n_##T2##W(VECT_VAR(vector, T1, W, N),   \
+L);\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+#define TEST_VMULL_N(INSN, T1, T2, W, W2, N, L)\
+  TEST_VMULL_N1(INSN, T1, T2, W, W2, N, L)
+
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  /* Initialize vector.  */
+  VDUP(vector, , int, s, 16, 4, 0x1000);
+  VDUP(vector, , int, s, 32, 2, 0x1000);
+  VDUP(vector, , uint, u, 16, 4, 0x1000);
+  VDUP(vector, , uint, u, 32, 2, 0x1000);
+
+  /* Choose multiplier arbitrarily.  */
+  TEST_VMULL_N(INSN_NAME, int, s, 16, 32, 4, 0x11);
+  TEST_VMULL_N(INSN_NAME, int, s, 32, 64, 2, 0x22);
+  TEST_VMULL_N(INSN_NAME, uint, u, 16, 32, 4, 0x33);
+  TEST_VMULL_N(INSN_NAME, uint, u, 32, 64, 2, 0x44);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected, "");
+}
+
+int main (void)
+{
+  exec_vmull_n ();
+  return 0;
+}
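
As a quick cross-check of the expected values, a scalar sketch of one lane of a widening multiply by a scalar. The function names are made up for this note; the point is only that the product is computed at twice the input width, so 0x1000 * 0x11 keeps all of its bits.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a signed widening multiply: 16-bit inputs, 32-bit result.
   Illustrative name only.  */
int32_t
ref_mull_n_s16 (int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}

/* Unsigned counterpart.  */
uint32_t
ref_mull_n_u16 (uint16_t a, uint16_t b)
{
  return (uint32_t) a * (uint32_t) b;
}

/* Usage example with the 16-bit inputs and multipliers from the patch.  */
void
ref_mull_n_example (void)
{
  assert (ref_mull_n_s16 (0x1000, 0x11) == 0x11000);
  assert (ref_mull_n_u16 (0x1000, 0x33) == 0x33000);
}
```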



LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 26/36] Add vmull_lane tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmull_lane.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_lane.c
new file mode 100644
index 000..d3aa879
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_lane.c
@@ -0,0 +1,66 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x4000, 0x4000, 0x4000, 0x4000 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x2000, 0x2000 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x4000, 0x4000, 0x4000, 0x4000 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x2000, 0x2000 };
+
+#define TEST_MSG "VMULL_LANE"
+void exec_vmull_lane (void)
+{
+  /* vector_res = vmull_lane(vector,vector2,lane), then store the result.  */
+#define TEST_VMULL_LANE(T1, T2, W, W2, N, L)   \
+  VECT_VAR(vector_res, T1, W2, N) =\
+vmull##_lane_##T2##W(VECT_VAR(vector, T1, W, N),   \
+VECT_VAR(vector2, T1, W, N),   \
+L);\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector2, uint, 16, 4);
+  DECL_VARIABLE(vector2, uint, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  /* Initialize vector.  */
+  VDUP(vector, , int, s, 16, 4, 0x1000);
+  VDUP(vector, , int, s, 32, 2, 0x1000);
+  VDUP(vector, , uint, u, 16, 4, 0x1000);
+  VDUP(vector, , uint, u, 32, 2, 0x1000);
+
+  /* Initialize vector2.  */
+  VDUP(vector2, , int, s, 16, 4, 0x4);
+  VDUP(vector2, , int, s, 32, 2, 0x2);
+  VDUP(vector2, , uint, u, 16, 4, 0x4);
+  VDUP(vector2, , uint, u, 32, 2, 0x2);
+
+  /* Choose lane arbitrarily.  */
+  TEST_VMULL_LANE(int, s, 16, 32, 4, 2);
+  TEST_VMULL_LANE(int, s, 32, 64, 2, 1);
+  TEST_VMULL_LANE(uint, u, 16, 32, 4, 2);
+  TEST_VMULL_LANE(uint, u, 32, 64, 2, 1);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 64, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 64, 2, PRIx32, expected, "");
+}
+
+int main (void)
+{
+  exec_vmull_lane ();
+  return 0;
+}




LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 25/36] Add vmull tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmull.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull.c
new file mode 100644
index 000..3fdd51e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull.c
@@ -0,0 +1,75 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,16,8) [] = { 0x100, 0xe1, 0xc4, 0xa9,
+   0x90, 0x79, 0x64, 0x51 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x100, 0xe1, 0xc4, 0xa9 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x100, 0xe1 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xe100, 0xe2e1, 0xe4c4, 0xe6a9,
+0xe890, 0xea79, 0xec64, 0xee51 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffe00100, 0xffe200e1,
+0xffe400c4, 0xffe600a9 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xffe00100,
+0xffe200e1 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x5500, 0x5501, 0x5504, 0x5505,
+0x5510, 0x5511, 0x5514, 0x5515 };
+
+#define TEST_MSG "VMULL"
+void exec_vmull (void)
+{
+  /* Basic test: y=vmull(x,x), then store the result.  */
+#define TEST_VMULL(T1, T2, W, W2, N)   \
+  VECT_VAR(vector_res, T1, W2, N) =\
+vmull_##T2##W(VECT_VAR(vector, T1, W, N),  \
+ VECT_VAR(vector, T1, W, N));  \
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 8, 8);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+  DECL_VARIABLE(vector, poly, 8, 8);
+  DECL_VARIABLE(vector_res, int, 16, 8);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+  DECL_VARIABLE(vector_res, poly, 16, 8);
+
+  clean_results ();
+
+  VLOAD(vector, buffer, , int, s, 8, 8);
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, , uint, u, 8, 8);
+  VLOAD(vector, buffer, , uint, u, 16, 4);
+  VLOAD(vector, buffer, , uint, u, 32, 2);
+  VLOAD(vector, buffer, , poly, p, 8, 8);
+
+  TEST_VMULL(int, s, 8, 16, 8);
+  TEST_VMULL(int, s, 16, 32, 4);
+  TEST_VMULL(int, s, 32, 64, 2);
+  TEST_VMULL(uint, u, 8, 16, 8);
+  TEST_VMULL(uint, u, 16, 32, 4);
+  TEST_VMULL(uint, u, 32, 64, 2);
+  TEST_VMULL(poly, p, 8, 16, 8);
+
+  CHECK(TEST_MSG, int, 16, 8, PRIx64, expected, "");
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 64, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 16, 8, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 64, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, poly, 16, 8, PRIx16, expected, "");
+}
+
+int main (void)
+{
+  exec_vmull ();
+  return 0;
+}
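
The poly16x8 expected values (0x5500, 0x5501, ...) look odd next to the integer ones, so here is a minimal scalar sketch of one lane of a polynomial (carry-less) widening multiply. The function names are invented for this note. It assumes the poly8 input buffer holds 0xf0, 0xf1, ... as the signed and unsigned expected values above suggest.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a polynomial widening multiply on 8-bit inputs: the partial
   products are combined with XOR instead of addition, so no carries
   propagate.  Illustrative name only.  */
uint16_t
ref_mull_p8 (uint8_t a, uint8_t b)
{
  uint16_t result = 0;

  for (int bit = 0; bit < 8; bit++)
    if (b & (1u << bit))
      result ^= (uint16_t) ((uint16_t) a << bit);
  return result;
}

/* Usage example: squaring spreads the input bits to even positions.  */
void
ref_mull_p8_example (void)
{
  assert (ref_mull_p8 (0xf0, 0xf0) == 0x5500);
  assert (ref_mull_p8 (0xf1, 0xf1) == 0x5501);
  assert (ref_mull_p8 (0xf2, 0xf2) == 0x5504);
  assert (ref_mull_p8 (0xf3, 0xf3) == 0x5505);
}
```

Under that assumption the model reproduces the first four poly16x8 entries in the patch.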



LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 24/36] Add vmul_n tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmul_n.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c
new file mode 100644
index 000..be0ee65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_n.c
@@ -0,0 +1,96 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfef0, 0xff01, 0xff12, 0xff23 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfde0, 0xfe02 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfcd0, 0xfd03, 0xfd36, 0xfd69 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfbc0, 0xfc04 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc3b2, 0xc3a74000 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfab0, 0xfb05, 0xfb5a, 0xfbaf,
+   0xfc04, 0xfc59, 0xfcae, 0xfd03 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xf9a0, 0xfa06,
+   0xfa6c, 0xfad2 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xf890, 0xf907, 0xf97e, 0xf9f5,
+0xfa6c, 0xfae3, 0xfb5a, 0xfbd1 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xf780, 0xf808,
+0xf890, 0xf918 };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc4b1cccd, 0xc4a6b000,
+  0xc49b9333, 0xc4907667 };
+
+#define INSN_NAME vmul_n
+#define TEST_MSG "VMUL_N"
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+#define DECL_VMUL(VAR) \
+  DECL_VARIABLE(VAR, int, 16, 4);  \
+  DECL_VARIABLE(VAR, int, 32, 2);  \
+  DECL_VARIABLE(VAR, uint, 16, 4); \
+  DECL_VARIABLE(VAR, uint, 32, 2); \
+  DECL_VARIABLE(VAR, float, 32, 2);\
+  DECL_VARIABLE(VAR, int, 16, 8);  \
+  DECL_VARIABLE(VAR, int, 32, 4);  \
+  DECL_VARIABLE(VAR, uint, 16, 8); \
+  DECL_VARIABLE(VAR, uint, 32, 4); \
+  DECL_VARIABLE(VAR, float, 32, 4)
+
+  /* vector_res = vmul_n(vector,val), then store the result.  */
+#define TEST_VMUL_N(Q, T1, T2, W, N, L) \
+  VECT_VAR(vector_res, T1, W, N) = \
+vmul##Q##_n_##T2##W(VECT_VAR(vector, T1, W, N),\
+   L); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
+   VECT_VAR(vector_res, T1, W, N))
+
+  DECL_VMUL(vector);
+  DECL_VMUL(vector_res);
+
+  clean_results ();
+
+  /* Initialize vector from pre-initialized values.  */
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, , uint, u, 16, 4);
+  VLOAD(vector, buffer, , uint, u, 32, 2);
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, int, s, 16, 8);
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, uint, u, 16, 8);
+  VLOAD(vector, buffer, q, uint, u, 32, 4);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+
+  /* Choose multiplier arbitrarily.  */
+  TEST_VMUL_N(, int, s, 16, 4, 0x11);
+  TEST_VMUL_N(, int, s, 32, 2, 0x22);
+  TEST_VMUL_N(, uint, u, 16, 4, 0x33);
+  TEST_VMUL_N(, uint, u, 32, 2, 0x44);
+  TEST_VMUL_N(, float, f, 32, 2, 22.3f);
+  TEST_VMUL_N(q, int, s, 16, 8, 0x55);
+  TEST_VMUL_N(q, int, s, 32, 4, 0x66);
+  TEST_VMUL_N(q, uint, u, 16, 8, 0x77);
+  TEST_VMUL_N(q, uint, u, 32, 4, 0x88);
+  TEST_VMUL_N(q, float, f, 32, 4, 88.9f);
+
+  CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
+  CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
+  CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 16, 8, PRIx64, expected, "");
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 16, 8, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
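
A small scalar sketch of why the 16-bit expected values look the way they do: a non-widening multiply keeps only the low bits of the product. The function names are made up for this note, and the inputs assume the 16-bit buffer starts at 0xfff0, 0xfff1, which is consistent with the expected values in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a non-widening 16-bit multiply by a scalar: the product is
   reduced modulo 2^16.  The same bit pattern results for signed lanes.
   Illustrative name only.  */
uint16_t
ref_mul_n_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) ((uint32_t) a * (uint32_t) b);
}

/* Usage example with multipliers 0x11 and 0x33 from the patch.  */
void
ref_mul_n_u16_example (void)
{
  assert (ref_mul_n_u16 (0xfff0, 0x11) == 0xfef0);
  assert (ref_mul_n_u16 (0xfff1, 0x11) == 0xff01);
  assert (ref_mul_n_u16 (0xfff0, 0x33) == 0xfcd0);
  assert (ref_mul_n_u16 (0xfff1, 0x33) == 0xfd03);
}
```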



LGTM.

Tejas.




Re: [[ARM/AArch64][testsuite] 23/36] Add vmul_lane tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c
new file mode 100644
index 000..978cd9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmul_lane.c
@@ -0,0 +1,104 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xffc0, 0xffc4, 0xffc8, 0xffcc };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfde0, 0xfe02 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xbbc0, 0xc004, 0xc448, 0xc88c };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xace0, 0xb212 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc3b6, 0xc3ab };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xffc0, 0xffc4, 0xffc8, 0xffcc,
+   0xffd0, 0xffd4, 0xffd8, 0xffdc };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfde0, 0xfe02,
+   0xfe24, 0xfe46 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xbbc0, 0xc004, 0xc448, 0xc88c,
+0xccd0, 0xd114, 0xd558, 0xd99c };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xace0, 0xb212,
+0xb744, 0xbc76 };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc3b6, 0xc3ab,
+  0xc39f, 0xc394 };
+
+#define TEST_MSG "VMUL_LANE"
+void exec_vmul_lane (void)
+{
+#define DECL_VMUL(VAR) \
+  DECL_VARIABLE(VAR, int, 16, 4);  \
+  DECL_VARIABLE(VAR, int, 32, 2);  \
+  DECL_VARIABLE(VAR, uint, 16, 4); \
+  DECL_VARIABLE(VAR, uint, 32, 2); \
+  DECL_VARIABLE(VAR, float, 32, 2);\
+  DECL_VARIABLE(VAR, int, 16, 8);  \
+  DECL_VARIABLE(VAR, int, 32, 4);  \
+  DECL_VARIABLE(VAR, uint, 16, 8); \
+  DECL_VARIABLE(VAR, uint, 32, 4); \
+  DECL_VARIABLE(VAR, float, 32, 4)
+
+  /* vector_res = vmul_lane(vector,vector2,lane), then store the result.  */
+#define TEST_VMUL_LANE(Q, T1, T2, W, N, N2, L) \
+  VECT_VAR(vector_res, T1, W, N) = \
+vmul##Q##_lane_##T2##W(VECT_VAR(vector, T1, W, N), \
+  VECT_VAR(vector2, T1, W, N2),\
+  L);  \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \
+   VECT_VAR(vector_res, T1, W, N))
+
+  DECL_VMUL(vector);
+  DECL_VMUL(vector_res);
+
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector2, uint, 16, 4);
+  DECL_VARIABLE(vector2, uint, 32, 2);
+  DECL_VARIABLE(vector2, float, 32, 2);
+
+  clean_results ();
+
+  /* Initialize vector from pre-initialized values.  */
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, , uint, u, 16, 4);
+  VLOAD(vector, buffer, , uint, u, 32, 2);
+  VLOAD(vector, buffer, , float, f, 32, 2);
+  VLOAD(vector, buffer, q, int, s, 16, 8);
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, uint, u, 16, 8);
+  VLOAD(vector, buffer, q, uint, u, 32, 4);
+  VLOAD(vector, buffer, q, float, f, 32, 4);
+
+  /* Initialize vector2.  */
+  VDUP(vector2, , int, s, 16, 4, 0x4);
+  VDUP(vector2, , int, s, 32, 2, 0x22);
+  VDUP(vector2, , uint, u, 16, 4, 0x444);
+  VDUP(vector2, , uint, u, 32, 2, 0x532);
+  VDUP(vector2, , float, f, 32, 2, 22.8f);
+
+  /* Choose lane arbitrarily.  */
+  TEST_VMUL_LANE(, int, s, 16, 4, 4, 2);
+  TEST_VMUL_LANE(, int, s, 32, 2, 2, 1);
+  TEST_VMUL_LANE(, uint, u, 16, 4, 4, 2);
+  TEST_VMUL_LANE(, uint, u, 32, 2, 2, 1);
+  TEST_VMUL_LANE(, float, f, 32, 2, 2, 1);
+  TEST_VMUL_LANE(q, int, s, 16, 8, 4, 2);
+  TEST_VMUL_LANE(q, int, s, 32, 4, 2, 0);
+  TEST_VMUL_LANE(q, uint, u, 16, 8, 4, 2);
+  TEST_VMUL_LANE(q, uint, u, 32, 4, 2, 1);
+  TEST_VMUL_LANE(q, float, f, 32, 4, 2, 0);
+
+  CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
+  CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
+  CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 16, 8, PRIx64, expected, "");
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 16, 8, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, "");
+  CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, "");
+}
+
+int main (void)
+{
+  exec_vmul_lane ();
+  return 0;
+}



LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 22/36] Add vmovn tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmovn.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmovn.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmovn.c
new file mode 100644
index 000..bc2c2ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmovn.c
@@ -0,0 +1,50 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+  0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+   0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfff0, 0xfff1 };
+
+#define TEST_MSG "VMOVN"
+void exec_vmovn (void)
+{
+  /* Basic test: vec64=vmovn(vec128), then store the result.  */
+#define TEST_VMOVN(T1, T2, W, W2, N)   \
+  VECT_VAR(vector64, T1, W2, N) =  \
+vmovn_##T2##W(VECT_VAR(vector128, T1, W, N));  \
+  vst1_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector64, T1, W2, N))
+
+  DECL_VARIABLE_64BITS_VARIANTS(vector64);
+  DECL_VARIABLE_128BITS_VARIANTS(vector128);
+
+  TEST_MACRO_128BITS_VARIANTS_2_5(VLOAD, vector128, buffer);
+
+  clean_results ();
+
+  TEST_VMOVN(int, s, 16, 8, 8);
+  TEST_VMOVN(int, s, 32, 16, 4);
+  TEST_VMOVN(int, s, 64, 32, 2);
+  TEST_VMOVN(uint, u, 16, 8, 8);
+  TEST_VMOVN(uint, u, 32, 16, 4);
+  TEST_VMOVN(uint, u, 64, 32, 2);
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
+  CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
+}
+
+int main (void)
+{
+  exec_vmovn ();
+  return 0;
+}
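
For completeness, a scalar sketch of the narrowing move that the test exercises: each result lane is simply the low half of the corresponding input lane. The function names are made up for this note and are not part of arm_neon.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a narrowing move on 16-bit lanes: each result lane is
   the low 8 bits of the corresponding input lane.  Illustrative name
   only.  */
void
ref_movn_u16 (const uint16_t *in, size_t n, uint8_t *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint8_t) (in[i] & 0xff);
}

/* Usage example: the high byte of every lane is dropped.  */
void
ref_movn_u16_example (void)
{
  const uint16_t in[4] = { 0xfff0, 0xfff1, 0x1234, 0x00ff };
  uint8_t out[4];

  ref_movn_u16 (in, 4, out);
  assert (out[0] == 0xf0);
  assert (out[1] == 0xf1);
  assert (out[2] == 0x34);
  assert (out[3] == 0xff);
}
```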



LGTM.

Tejas.



Re: [[ARM/AArch64][testsuite] 21/36] Add vmovl tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vmovl.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmovl.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmovl.c
new file mode 100644
index 000..427c9ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmovl.c
@@ -0,0 +1,77 @@
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+  0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x, 0x };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33,
+   0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+   0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfff0, 0xfff1,
+   0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xfff0,
+   0xfff1 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfff0, 0xfff1 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33,
+0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
+0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
+  0x, 0x };
+


No poly or float for vmovl.

Otherwise, LGTM.

Tejas.




Re: [[ARM/AArch64][testsuite] 05/36] Add vldX_dup test.

2015-01-16 Thread Christophe Lyon
On 16 January 2015 at 16:20, Tejas Belagod  wrote:
> On 13/01/15 15:18, Christophe Lyon wrote:
>>
>>
>>  * gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c: New file.
>>
>> diff --git
>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c
>> new file mode 100644
>> index 000..53cd8f3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c
>> @@ -0,0 +1,671 @@
>> +#include <arm_neon.h>
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +/* Expected results.  */
>> +
>> +/* vld2_dup/chunk 0.  */
>> +VECT_VAR_DECL(expected_vld2_0,int,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
>> +  0xf0, 0xf1, 0xf0, 0xf1 };
>> +VECT_VAR_DECL(expected_vld2_0,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff0,
>> 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_0,int,32,2) [] = { 0xfff0, 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_0,int,64,1) [] = { 0xfff0 };
>> +VECT_VAR_DECL(expected_vld2_0,uint,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
>> +   0xf0, 0xf1, 0xf0, 0xf1 };
>> +VECT_VAR_DECL(expected_vld2_0,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff0,
>> 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_0,uint,32,2) [] = { 0xfff0, 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_0,uint,64,1) [] = { 0xfff0 };
>> +VECT_VAR_DECL(expected_vld2_0,poly,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
>> +   0xf0, 0xf1, 0xf0, 0xf1 };
>> +VECT_VAR_DECL(expected_vld2_0,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff0,
>> 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_0,hfloat,32,2) [] = { 0xc180, 0xc170
>> };
>> +VECT_VAR_DECL(expected_vld2_0,int,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>> +   0x33, 0x33, 0x33, 0x33,
>> +   0x33, 0x33, 0x33, 0x33,
>> +   0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected_vld2_0,int,16,8) [] = { 0x, 0x, 0x,
>> 0x,
>> +   0x, 0x, 0x, 0x };
>> +VECT_VAR_DECL(expected_vld2_0,int,32,4) [] = { 0x, 0x,
>> +   0x, 0x };
>> +VECT_VAR_DECL(expected_vld2_0,int,64,2) [] = { 0x,
>> +   0x };
>> +VECT_VAR_DECL(expected_vld2_0,uint,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected_vld2_0,uint,16,8) [] = { 0x, 0x, 0x,
>> 0x,
>> +0x, 0x, 0x, 0x };
>> +VECT_VAR_DECL(expected_vld2_0,uint,32,4) [] = { 0x, 0x,
>> +0x, 0x };
>> +VECT_VAR_DECL(expected_vld2_0,uint,64,2) [] = { 0x,
>> +0x };
>> +VECT_VAR_DECL(expected_vld2_0,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33,
>> +0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected_vld2_0,poly,16,8) [] = { 0x, 0x, 0x,
>> 0x,
>> +0x, 0x, 0x, 0x };
>> +VECT_VAR_DECL(expected_vld2_0,hfloat,32,4) [] = { 0x, 0x,
>> +  0x, 0x };
>> +
>> +/* vld2_dup/chunk 1.  */
>> +VECT_VAR_DECL(expected_vld2_1,int,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
>> + 0xf0, 0xf1, 0xf0, 0xf1 };
>> +VECT_VAR_DECL(expected_vld2_1,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff0,
>> 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_1,int,32,2) [] = { 0xfff0, 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_1,int,64,1) [] = { 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_1,uint,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
>> +  0xf0, 0xf1, 0xf0, 0xf1 };
>> +VECT_VAR_DECL(expected_vld2_1,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff0,
>> 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_1,uint,32,2) [] = { 0xfff0, 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_1,uint,64,1) [] = { 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_1,poly,8,8) [] = { 0xf0, 0xf1, 0xf0, 0xf1,
>> +  0xf0, 0xf1, 0xf0, 0xf1 };
>> +VECT_VAR_DECL(expected_vld2_1,poly,16,4) [] = { 0xfff0, 0xfff1,
>> +   0xfff0, 0xfff1 };
>> +VECT_VAR_DECL(expected_vld2_1,hfloat,32,2) [] = { 0xc180, 0xc170
>> };
>> +VECT_VAR_DECL(expected_vld2_1,int,8,16) [] = { 

Re: [[ARM/AArch64][testsuite] 20/36] Add vsubw tests, putting most of the code in common with vaddw through vXXWw.inc

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

 * gcc.target/aarch64/advsimd-intrinsics/vXXXw.inc: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vsubw.c: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vaddw.c: Use code from
 vXXXw.inc.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXw.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXw.inc
new file mode 100644
index 000..c535557
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXw.inc
@@ -0,0 +1,70 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=vaddw(x1,x2), then store the result.  */
+#define TEST_VADDW1(INSN, T1, T2, W, W2, N)\
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_##T2##W(VECT_VAR(vector, T1, W2, N), \
+  VECT_VAR(vector2, T1, W, N));\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+#define TEST_VADDW(INSN, T1, T2, W, W2, N) \
+  TEST_VADDW1(INSN, T1, T2, W, W2, N)
+
+  DECL_VARIABLE(vector, int, 16, 8);
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector, int, 64, 2);
+  DECL_VARIABLE(vector, uint, 16, 8);
+  DECL_VARIABLE(vector, uint, 32, 4);
+  DECL_VARIABLE(vector, uint, 64, 2);
+
+  DECL_VARIABLE(vector2, int, 8, 8);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector2, uint, 8, 8);
+  DECL_VARIABLE(vector2, uint, 16, 4);
+  DECL_VARIABLE(vector2, uint, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 16, 8);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  /* Initialize input "vector" from "buffer".  */
+  VLOAD(vector, buffer, q, int, s, 16, 8);
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, int, s, 64, 2);
+  VLOAD(vector, buffer, q, uint, u, 16, 8);
+  VLOAD(vector, buffer, q, uint, u, 32, 4);
+  VLOAD(vector, buffer, q, uint, u, 64, 2);
+
+  /* Choose init value arbitrarily.  */
+  VDUP(vector2, , int, s, 8, 8, -13);
+  VDUP(vector2, , int, s, 16, 4, -14);
+  VDUP(vector2, , int, s, 32, 2, -16);
+  VDUP(vector2, , uint, u, 8, 8, 0xf3);
+  VDUP(vector2, , uint, u, 16, 4, 0xfff1);
+  VDUP(vector2, , uint, u, 32, 2, 0xfff0);
+
+  /* Execute the tests.  */
+  TEST_VADDW(INSN_NAME, int, s, 8, 16, 8);
+  TEST_VADDW(INSN_NAME, int, s, 16, 32, 4);
+  TEST_VADDW(INSN_NAME, int, s, 32, 64, 2);
+  TEST_VADDW(INSN_NAME, uint, u, 8, 16, 8);
+  TEST_VADDW(INSN_NAME, uint, u, 16, 32, 4);
+  TEST_VADDW(INSN_NAME, uint, u, 32, 64, 2);
+
+  CHECK_RESULTS (TEST_MSG, "");
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c
index 95cbb31..27f54f6 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c
@@ -2,6 +2,9 @@
  #include "arm-neon-ref.h"
  #include "compute-ref-data.h"

+#define INSN_NAME vaddw
+#define TEST_MSG "VADDW"
+
  /* Expected results.  */
  VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
0x33, 0x33, 0x33, 0x33 };
@@ -45,76 +48,4 @@ VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 
0x, 0x,
  VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
0x, 0x };

-#define INSN_NAME vaddw
-#define TEST_MSG "VADDW"
-
-#define FNNAME1(NAME) exec_ ## NAME
-#define FNNAME(NAME) FNNAME1(NAME)
-
-void FNNAME (INSN_NAME) (void)
-{
-  /* Basic test: y=vaddw(x1,x2), then store the result.  */
-#define TEST_VADDW1(INSN, T1, T2, W, W2, N)\
-  VECT_VAR(vector_res, T1, W2, N) =\
-INSN##_##T2##W(VECT_VAR(vector, T1, W2, N), \
-  VECT_VAR(vector2, T1, W, N));\
-  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
-
-#define TEST_VADDW(INSN, T1, T2, W, W2, N) \
-  TEST_VADDW1(INSN, T1, T2, W, W2, N)
-
-  DECL_VARIABLE(vector, int, 16, 8);
-  DECL_VARIABLE(vector, int, 32, 4);
-  DECL_VARIABLE(vector, int, 64, 2);
-  DECL_VARIABLE(vector, uint, 16, 8);
-  DECL_VARIABLE(vector, uint, 32, 4);
-  DECL_VARIABLE(vector, uint, 64, 2);
-
-  DECL_VARIABLE(vector2, int, 8, 8);
-  DECL_VARIABLE(vector2, int, 16, 4);
-  DECL_VARIABLE(vector2, int, 32, 2);
-  DECL_VARIABLE(vector2, uint, 8, 8);
-  DECL_VARIABLE(vector2, uint, 16, 4);
-  DECL_VARIABLE(vector2, uint, 32, 2);
-
-  DECL_VARIABLE(vector_res, int, 
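
Editor's note: sharing one body between vaddw.c and vsubw.c through vXXXw.inc depends on the two-level FNNAME/FNNAME1 pair, because an argument that is an operand of ## is not macro-expanded first. The following is a minimal, self-contained sketch of that behaviour. FNNAME_DIRECT, STR, STR1 and the helper functions are hypothetical names added for illustration and are not part of the patch.

```c
#include <assert.h>
#include <string.h>

#define INSN_NAME vsubw

/* Same pair as in the patch: FNNAME expands its argument, then
   FNNAME1 pastes the already-expanded token.  */
#define FNNAME1(NAME) exec_ ## NAME
#define FNNAME(NAME) FNNAME1(NAME)

/* Hypothetical single-level variant, for comparison only.  The
   argument is an operand of ##, so INSN_NAME is pasted unexpanded.  */
#define FNNAME_DIRECT(NAME) exec_ ## NAME

/* Hypothetical stringification helpers used to inspect the result.  */
#define STR1(X) #X
#define STR(X) STR1(X)

/* This defines a function named exec_vsubw.  */
static int FNNAME (INSN_NAME) (void) { return 1; }

static const char *two_level_name (void)
{
  return STR (FNNAME (INSN_NAME));
}

static const char *one_level_name (void)
{
  return STR (FNNAME_DIRECT (INSN_NAME));
}

/* Returns nonzero when both names come out as described above.  */
static int names_are_as_described (void)
{
  assert (exec_vsubw () == 1);
  return strcmp (two_level_name (), "exec_vsubw") == 0
         && strcmp (one_level_name (), "exec_INSN_NAME") == 0;
}
```

With the single-level form, every test file including the .inc would define the same exec_INSN_NAME symbol regardless of how INSN_NAME is set.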

Re: [patch libstdc++] Optimize synchronization in std::future if futexes are available.

2015-01-16 Thread Jonathan Wakely

On 16/01/15 18:00 +0100, Torvald Riegel wrote:

OK for trunk?


OK, thanks.


Re: [[ARM/AArch64][testsuite] 08/36] Add vtrn tests. Refactor vzup and vzip tests.

2015-01-16 Thread Christophe Lyon
On 16 January 2015 at 16:58, Tejas Belagod  wrote:
> On 13/01/15 15:18, Christophe Lyon wrote:
>>
>>
>>  * gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: New file.
>>  * gcc.target/aarch64/advsimd-intrinsics/vtrn.c: New file.
>>  * gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Use code from
>>  vshuffle.inc.
>>  * gcc.target/aarch64/advsimd-intrinsics/vzip.c: Use code from
>>  vshuffle.inc.
>>
>> diff --git
>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
>> new file mode 100644
>> index 000..928f338
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc
>> @@ -0,0 +1,139 @@
>> +#define FNNAME1(NAME) exec_ ## NAME
>> +#define FNNAME(NAME) FNNAME1(NAME)
>> +
>> +void FNNAME (INSN_NAME) (void)
>> +{
>> +  /* In this case, output variables are arrays of vectors.  */
>> +#define DECL_VSHUFFLE(T1, W, N) \
>> +  VECT_ARRAY_TYPE(T1, W, N, 2) VECT_ARRAY_VAR(result_vec, T1, W, N, 2); \
>> +  VECT_VAR_DECL(result_bis, T1, W, N)[2 * N]
>> +
>> +  /* We need to use a temporary result buffer (result_bis), because
>> + the one used for other tests is not large enough. A subset of the
>> + result data is moved from result_bis to result, and it is this
>> + subset which is used to check the actual behaviour. The next
>> + macro enables to move another chunk of data from result_bis to
>> + result.  */
>> +#define TEST_VSHUFFLE(INSN, Q, T1, T2, W, N)   \
>> +  VECT_ARRAY_VAR(result_vec, T1, W, N, 2) =\
>> +INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \
>> + VECT_VAR(vector2, T1, W, N)); \
>> +  vst2##Q##_##T2##W(VECT_VAR(result_bis, T1, W, N),\
>> +   VECT_ARRAY_VAR(result_vec, T1, W, N, 2));   \
>> +  memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis, T1, W, N),   \
>> +sizeof(VECT_VAR(result, T1, W, N)));
>> +
>> +  /* Overwrite "result" with the contents of "result_bis"[X].  */
>> +#define TEST_EXTRA_CHUNK(T1, W, N, X)  \
>> +  memcpy(VECT_VAR(result, T1, W, N), &(VECT_VAR(result_bis, T1, W, N)[X*N]), \
>> +sizeof(VECT_VAR(result, T1, W, N)));
>> +
>> +  DECL_VARIABLE_ALL_VARIANTS(vector1);
>> +  DECL_VARIABLE_ALL_VARIANTS(vector2);
>> +
>> +  /* We don't need 64 bits variants.  */
>> +#define DECL_ALL_VSHUFFLE()\
>> +  DECL_VSHUFFLE(int, 8, 8);\
>> +  DECL_VSHUFFLE(int, 16, 4);   \
>> +  DECL_VSHUFFLE(int, 32, 2);   \
>> +  DECL_VSHUFFLE(uint, 8, 8);   \
>> +  DECL_VSHUFFLE(uint, 16, 4);  \
>> +  DECL_VSHUFFLE(uint, 32, 2);  \
>> +  DECL_VSHUFFLE(poly, 8, 8);   \
>> +  DECL_VSHUFFLE(poly, 16, 4);  \
>> +  DECL_VSHUFFLE(float, 32, 2); \
>> +  DECL_VSHUFFLE(int, 8, 16);   \
>> +  DECL_VSHUFFLE(int, 16, 8);   \
>> +  DECL_VSHUFFLE(int, 32, 4);   \
>> +  DECL_VSHUFFLE(uint, 8, 16);  \
>> +  DECL_VSHUFFLE(uint, 16, 8);  \
>> +  DECL_VSHUFFLE(uint, 32, 4);  \
>> +  DECL_VSHUFFLE(poly, 8, 16);  \
>> +  DECL_VSHUFFLE(poly, 16, 8);  \
>> +  DECL_VSHUFFLE(float, 32, 4)
>> +
>> +  DECL_ALL_VSHUFFLE();
>> +
>> +  /* Initialize input "vector" from "buffer".  */
>> +  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer);
>> +  VLOAD(vector1, buffer, , float, f, 32, 2);
>> +  VLOAD(vector1, buffer, q, float, f, 32, 4);
>> +
>> +  /* Choose arbitrary initialization values.  */
>> +  VDUP(vector2, , int, s, 8, 8, 0x11);
>> +  VDUP(vector2, , int, s, 16, 4, 0x22);
>> +  VDUP(vector2, , int, s, 32, 2, 0x33);
>> +  VDUP(vector2, , uint, u, 8, 8, 0x55);
>> +  VDUP(vector2, , uint, u, 16, 4, 0x66);
>> +  VDUP(vector2, , uint, u, 32, 2, 0x77);
>> +  VDUP(vector2, , poly, p, 8, 8, 0x55);
>> +  VDUP(vector2, , poly, p, 16, 4, 0x66);
>> +  VDUP(vector2, , float, f, 32, 2, 33.6f);
>> +
>> +  VDUP(vector2, q, int, s, 8, 16, 0x11);
>> +  VDUP(vector2, q, int, s, 16, 8, 0x22);
>> +  VDUP(vector2, q, int, s, 32, 4, 0x33);
>> +  VDUP(vector2, q, uint, u, 8, 16, 0x55);
>> +  VDUP(vector2, q, uint, u, 16, 8, 0x66);
>> +  VDUP(vector2, q, uint, u, 32, 4, 0x77);
>> +  VDUP(vector2, q, poly, p, 8, 16, 0x55);
>> +  VDUP(vector2, q, poly, p, 16, 8, 0x66);
>> +  VDUP(vector2, q, float, f, 32, 4, 33.8f);
>> +
>> +#define TEST_ALL_VSHUFFLE(INSN)\
>> +  TEST_VSHUFFLE(INSN, , int, s, 8, 8); \
>> +  TEST_VSHUFFLE(INSN, , int, s, 16, 4);\
>> + 

Re: [[ARM/AArch64][testsuite] 19/36] Add vsubl tests, put most of the code in common with vaddl in vXXXl.inc.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

 * gcc.target/aarch64/advsimd-intrinsics/vXXXl.inc: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vsubl.c: New file.
 * gcc.target/aarch64/advsimd-intrinsics/vaddl.c: Use code from
 vXXXl.inc.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXl.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXl.inc
new file mode 100644
index 000..bd4c8fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vXXXl.inc
@@ -0,0 +1,70 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=vaddl(x1,x2), then store the result.  */
+#define TEST_VADDL1(INSN, T1, T2, W, W2, N)\
+  VECT_VAR(vector_res, T1, W2, N) =\
+INSN##_##T2##W(VECT_VAR(vector, T1, W, N), \
+  VECT_VAR(vector2, T1, W, N));\
+  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
+
+#define TEST_VADDL(INSN, T1, T2, W, W2, N) \
+  TEST_VADDL1(INSN, T1, T2, W, W2, N)
+
+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 8, 8);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+
+  DECL_VARIABLE(vector2, int, 8, 8);
+  DECL_VARIABLE(vector2, int, 16, 4);
+  DECL_VARIABLE(vector2, int, 32, 2);
+  DECL_VARIABLE(vector2, uint, 8, 8);
+  DECL_VARIABLE(vector2, uint, 16, 4);
+  DECL_VARIABLE(vector2, uint, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 16, 8);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+  DECL_VARIABLE(vector_res, uint, 16, 8);
+  DECL_VARIABLE(vector_res, uint, 32, 4);
+  DECL_VARIABLE(vector_res, uint, 64, 2);
+
+  clean_results ();
+
+  /* Initialize input "vector" from "buffer".  */
+  VLOAD(vector, buffer, , int, s, 8, 8);
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, , uint, u, 8, 8);
+  VLOAD(vector, buffer, , uint, u, 16, 4);
+  VLOAD(vector, buffer, , uint, u, 32, 2);
+
+  /* Choose init value arbitrarily.  */
+  VDUP(vector2, , int, s, 8, 8, -13);
+  VDUP(vector2, , int, s, 16, 4, -14);
+  VDUP(vector2, , int, s, 32, 2, -16);
+  VDUP(vector2, , uint, u, 8, 8, 0xf3);
+  VDUP(vector2, , uint, u, 16, 4, 0xfff1);
+  VDUP(vector2, , uint, u, 32, 2, 0xfff0);
+
+  /* Execute the tests.  */
+  TEST_VADDL(INSN_NAME, int, s, 8, 16, 8);
+  TEST_VADDL(INSN_NAME, int, s, 16, 32, 4);
+  TEST_VADDL(INSN_NAME, int, s, 32, 64, 2);
+  TEST_VADDL(INSN_NAME, uint, u, 8, 16, 8);
+  TEST_VADDL(INSN_NAME, uint, u, 16, 32, 4);
+  TEST_VADDL(INSN_NAME, uint, u, 32, 64, 2);
+
+  CHECK_RESULTS (TEST_MSG, "");
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c
index 030785d..020d9f8 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c
@@ -2,6 +2,9 @@
  #include "arm-neon-ref.h"
  #include "compute-ref-data.h"

+#define INSN_NAME vaddl
+#define TEST_MSG "VADDL"
+
  /* Expected results.  */
  VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
0x33, 0x33, 0x33, 0x33 };
@@ -45,76 +48,4 @@ VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 
0x, 0x,
  VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
0x, 0x };

-#define INSN_NAME vaddl
-#define TEST_MSG "VADDL"
-
-#define FNNAME1(NAME) exec_ ## NAME
-#define FNNAME(NAME) FNNAME1(NAME)
-
-void FNNAME (INSN_NAME) (void)
-{
-  /* Basic test: y=vaddl(x1,x2), then store the result.  */
-#define TEST_VADDL1(INSN, T1, T2, W, W2, N)\
-  VECT_VAR(vector_res, T1, W2, N) =\
-INSN##_##T2##W(VECT_VAR(vector, T1, W, N), \
-  VECT_VAR(vector2, T1, W, N));\
-  vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N))
-
-#define TEST_VADDL(INSN, T1, T2, W, W2, N) \
-  TEST_VADDL1(INSN, T1, T2, W, W2, N)
-
-  DECL_VARIABLE(vector, int, 8, 8);
-  DECL_VARIABLE(vector, int, 16, 4);
-  DECL_VARIABLE(vector, int, 32, 2);
-  DECL_VARIABLE(vector, uint, 8, 8);
-  DECL_VARIABLE(vector, uint, 16, 4);
-  DECL_VARIABLE(vector, uint, 32, 2);
-
-  DECL_VARIABLE(vector2, int, 8, 8);
-  DECL_VARIABLE(vector2, int, 16, 4);
-  DECL_VARIABLE(vector2, int, 32, 2);
-  DECL_VARIABLE(vector2, uint, 8, 8);
-  DECL_VARIABLE(vector2, uint, 16, 4);
-  DECL_VARIABLE(vector2, uint, 32, 2);
-
-  DECL_VARIABLE(vector_res, int, 16, 8);
-  DECL_VARIABLE(vecto

Re: [[ARM/AArch64][testsuite] 18/36] Add vsli_n and vsri_n tests.

2015-01-16 Thread Tejas Belagod

+
+void vsri_extra(void)
+{
+/* Test cases with maximum shift amount (this amount is different
+ * from vsli.  */
+


Nit: comment formatting. The same applies in a few other places.

Otherwise, LGTM.

Tejas.



Re: [PATCH, Aarch64] Add FMA steering pass for Cortex-A57

2015-01-16 Thread Andrew Pinski
On Fri, Jan 16, 2015 at 5:42 AM, Thomas Preud'homme
 wrote:
> Hi all,
>
> Quoting the patch:
>
> For better performance, the destination of FMADD/FMSUB instructions should
> have the same parity as their accumulator register if the accumulator contains
> the result of a previous FMUL or FMADD/FMSUB instruction if targeting
> Cortex-A57 processors.  Performance is also increased by otherwise keeping a
> good balance in the parity of the destination register of FMUL or FMADD/FMSUB.
>
> This pass ensures that registers are renamed so that these conditions hold. We
> reuse the existing register renaming facility from regrename.c to build
> dependency chains and expose candidate registers for renaming.
>
> ChangeLog entry is as follows:
>
> gcc/ChangeLog
>
> 2015-01-14 Thomas Preud'homme thomas.preudho...@arm.com
>
> * config.gcc: Add fma_steering.o to extra_objs for aarch64-*-*.
> * config/aarch64/t-aarch64: Add a rule for fma_steering.o.
> * config/aarch64/aarch64.h (AARCH64_FL_USE_FMA_STEERING_PASS): Define.
> (AARCH64_TUNE_FMA_STEERING): Likewise.
> * config/aarch64/aarch64-cores.def: Set
> AARCH64_FL_USE_FMA_STEERING_PASS for cores with dynamic steering of
> FMUL/FMADD instructions.
> * config/aarch64/aarch64.c (aarch64_register_fma_steering): Declare.
> (aarch64_override_options): Call aarch64_register_fma_steering () if
> AARCH64_TUNE_FMA_STEERING is true.
> * config/aarch64/fma_steering.c: New file.
>
>
> Patch can be found below and is also attached to this email for convenience.
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 0dfc08f..7acc5b6 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -302,7 +302,7 @@ m32c*-*-*)
>  aarch64*-*-*)
> cpu_type=aarch64
> extra_headers="arm_neon.h arm_acle.h"
> -   extra_objs="aarch64-builtins.o aarch-common.o"
> +   extra_objs="aarch64-builtins.o aarch-common.o 
> cortex-a57-fma-steering.o"
> target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
> target_has_targetm_common=yes
> ;;
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index 18f5c48..fc770f3 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -35,9 +35,9 @@
>  /* V8 Architecture Processors.  */
>
>  AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FOR_ARCH8 | 
> AARCH64_FL_CRC, cortexa53)
> -AARCH64_CORE("cortex-a57",  cortexa15, cortexa15, 8,  AARCH64_FL_FOR_ARCH8 | 
> AARCH64_FL_CRC, cortexa57)
> +AARCH64_CORE("cortex-a57",  cortexa15, cortexa15, 8,  AARCH64_FL_FOR_ARCH8 | 
> AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, cortexa57)
>  AARCH64_CORE("thunderx",thunderx,  thunderx, 8,  AARCH64_FL_FOR_ARCH8 | 
> AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx)
>
>  /* V8 big.LITTLE implementations.  */
>
> -AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57)
> +AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_USE_FMA_STEERING_PASS, 
> cortexa57)
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index eed86f7..f749811 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -200,6 +200,8 @@ extern unsigned aarch64_architecture_version;
>  #define AARCH64_FL_CRYPTO (1 << 2) /* Has crypto.  */
>  #define AARCH64_FL_SLOWMUL (1 << 3) /* A slow multiply core.  */
>  #define AARCH64_FL_CRC (1 << 4) /* Has CRC.  */
> +/* Has static dispatch of FMA.  */
> +#define AARCH64_FL_USE_FMA_STEERING_PASS (1 << 5)
>
>  /* Has FP and SIMD.  */
>  #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
> @@ -220,6 +222,8 @@ extern unsigned long aarch64_isa_flags;
>  /* Macros to test tuning flags.  */
>  extern unsigned long aarch64_tune_flags;
>  #define AARCH64_TUNE_SLOWMUL   (aarch64_tune_flags & AARCH64_FL_SLOWMUL)
> +#define AARCH64_TUNE_FMA_STEERING \
> +  (aarch64_tune_flags & AARCH64_FL_USE_FMA_STEERING_PASS)
>
>  /* Crypto is an optional extension to AdvSIMD.  */
>  #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5100532..566ac2d 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6716,6 +6716,11 @@ aarch64_parse_tune (void)
>  }
>
>
> +/* Defined in config/aarch64/fma_steering.c.  */
> +
> +void
> +aarch64_register_fma_steering (void);


This is really bad form.  Can you add a header file for this
declaration and maybe future declarations too?
Maybe called aarch64-internal.h and include this header file both in
aarch64.c and fma_steering.c.

Thanks,
Andrew Pinski


> +
>  /* Implement TARGET_OPTION_OVERRIDE.  */
>
>  static void
> @@ -6798,6 +6803,9 @@ aarch64_override_options (void)
> align_functions = aarch64_tune_params
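
Editor's note: the parity rule quoted at the top of this message (the FMADD/FMSUB destination should have the same parity as its accumulator) can be illustrated with a small sketch. This is not the algorithm of the proposed pass, which builds dependency chains using regrename.c; the function names, the plain candidate array and the "first match" policy below are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* True when two register numbers are both even or both odd.  */
static bool
same_parity (unsigned a, unsigned b)
{
  return ((a ^ b) & 1u) == 0;
}

/* Hypothetical helper: return the first candidate register whose
   parity matches the accumulator register, or -1 if none does.  */
static int
pick_dest (unsigned acc_regno, const unsigned *candidates, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    if (same_parity (candidates[i], acc_regno))
      return (int) candidates[i];
  return -1;
}
```

For an accumulator in register 4 and candidates 5, 7 and 6, this sketch picks register 6; when no candidate has the right parity it reports -1 and the destination would be left unchanged.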

Re: [debug-early] C++ clones and limbo DIEs

2015-01-16 Thread Aldy Hernandez

On 01/16/2015 08:31 AM, Jason Merrill wrote:

On 01/16/2015 05:55 AM, Richard Biener wrote:

I'd hope that in the very distant future all early DIEs would be
"created"
by the frontends (that is, dwarf2out.c wouldn't walk into
parents/siblings
so much).


Are you thinking that the front end would immediately call a debug hook
for every block, local variable and such, or just for higher level
entities?


In the very distant future, as in, when Aldy is retired and living on a 
tropical island somewhere? :).





I hoped we wouldn't need the limbo list at all ... that is, parent DIEs
are always present when we create children.  I think that should
work in principle if the frontends would create DIEs while parsing.


So create the function DIE as soon as we see the declaration?  That
seems reasonable.  Then that would be the point of early debug, not
later at EOF.


I'm certainly game to explore this option, though I think we should be 
able to get this working at EOF.  The reason why I didn't take this 
approach originally is because it seemed like a lot more hassle, 
especially since we have to do the same thing for all other front-ends.





Note that dwarf2out forces parent DIE creation in some cases
but not in some others - it would be interesting to sort out which
parent DIEs it thinks it cannot create when we create the DIE
for a sibling.  Maybe it's just poor ordering of early_global_decl
calls?


Agreed.


Regrettably, I have to agree as well.  I can investigate why some DIEs 
end up orphaned and report back.


Thanks.
Aldy


Re: [[ARM/AArch64][testsuite] 17/36] Add vpadd, vpmax and vpmin tests.

2015-01-16 Thread Christophe Lyon
On 16 January 2015 at 18:52, Tejas Belagod  wrote:
> On 13/01/15 15:18, Christophe Lyon wrote:
>>
>> * gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc: New file.
>> * gcc.target/aarch64/advsimd-intrinsics/vpadd.c: New file.
>> * gcc.target/aarch64/advsimd-intrinsics/vpmax.c: New file.
>> * gcc.target/aarch64/advsimd-intrinsics/vpmin.c: New file.
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
>> new file mode 100644
>> index 000..7ac2ed4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
>> @@ -0,0 +1,67 @@
>> +#define FNNAME1(NAME) exec_ ## NAME
>> +#define FNNAME(NAME) FNNAME1(NAME)
>> +
>> +void FNNAME (INSN_NAME) (void)
>> +{
>> +  /* Basic test: y=OP(x), then store the result.  */
>> +#define TEST_VPADD1(INSN, T1, T2, W, N) \
>> +  VECT_VAR(vector_res, T1, W, N) = \
>> +INSN##_##T2##W(VECT_VAR(vector, T1, W, N), \
>> +  VECT_VAR(vector, T1, W, N)); \
>> +  vst1##_##T2##W(VECT_VAR(result, T1, W, N),   \
>> +VECT_VAR(vector_res, T1, W, N))
>> +
>> +#define TEST_VPADD(INSN, T1, T2, W, N) \
>> +  TEST_VPADD1(INSN, T1, T2, W, N)  \
>> +
>> +  /* No need for 64 bits variants.  */
>> +  DECL_VARIABLE(vector, int, 8, 8);
>> +  DECL_VARIABLE(vector, int, 16, 4);
>> +  DECL_VARIABLE(vector, int, 32, 2);
>> +  DECL_VARIABLE(vector, uint, 8, 8);
>> +  DECL_VARIABLE(vector, uint, 16, 4);
>> +  DECL_VARIABLE(vector, uint, 32, 2);
>> +  DECL_VARIABLE(vector, float, 32, 2);
>> +
>> +  DECL_VARIABLE(vector_res, int, 8, 8);
>> +  DECL_VARIABLE(vector_res, int, 16, 4);
>> +  DECL_VARIABLE(vector_res, int, 32, 2);
>> +  DECL_VARIABLE(vector_res, uint, 8, 8);
>> +  DECL_VARIABLE(vector_res, uint, 16, 4);
>> +  DECL_VARIABLE(vector_res, uint, 32, 2);
>> +  DECL_VARIABLE(vector_res, float, 32, 2);
>> +
>> +  clean_results ();
>> +
>> +  /* Initialize input "vector" from "buffer".  */
>> +  VLOAD(vector, buffer, , int, s, 8, 8);
>> +  VLOAD(vector, buffer, , int, s, 16, 4);
>> +  VLOAD(vector, buffer, , int, s, 32, 2);
>> +  VLOAD(vector, buffer, , uint, u, 8, 8);
>> +  VLOAD(vector, buffer, , uint, u, 16, 4);
>> +  VLOAD(vector, buffer, , uint, u, 32, 2);
>> +  VLOAD(vector, buffer, , float, f, 32, 2);
>> +
>> +  /* Apply a unary operator named INSN_NAME.  */
>
>
> Unary op?
>
Hmm cut & paste issue. Thanks

>
>> +  TEST_VPADD(INSN_NAME, int, s, 8, 8);
>> +  TEST_VPADD(INSN_NAME, int, s, 16, 4);
>> +  TEST_VPADD(INSN_NAME, int, s, 32, 2);
>> +  TEST_VPADD(INSN_NAME, uint, u, 8, 8);
>> +  TEST_VPADD(INSN_NAME, uint, u, 16, 4);
>> +  TEST_VPADD(INSN_NAME, uint, u, 32, 2);
>> +  TEST_VPADD(INSN_NAME, float, f, 32, 2);
>> +
>> +  CHECK(TEST_MSG, int, 8, 8, PRIx32, expected, "");
>> +  CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
>> +  CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
>> +  CHECK(TEST_MSG, uint, 8, 8, PRIx32, expected, "");
>> +  CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
>> +  CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
>> +  CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
>> +}
>> +
>> +int main (void)
>> +{
>> +  FNNAME (INSN_NAME) ();
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
>> new file mode 100644
>> index 000..5ddfd3d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
>> @@ -0,0 +1,19 @@
>> +#include 
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +#define INSN_NAME vpadd
>> +#define TEST_MSG "VPADD"
>> +
>> +/* Expected results.  */
>> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xe1, 0xe5, 0xe9, 0xed,
>> +  0xe1, 0xe5, 0xe9, 0xed };
>> +VECT_VAR_DECL(expected,int,16,4) [] = { 0xffe1, 0xffe5, 0xffe1, 0xffe5 };
>> +VECT_VAR_DECL(expected,int,32,2) [] = { 0xffe1, 0xffe1 };
>> +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xe1, 0xe5, 0xe9, 0xed,
>> +   0xe1, 0xe5, 0xe9, 0xed };
>> +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xffe1, 0xffe5, 0xffe1, 0xffe5
>> };
>> +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffe1, 0xffe1 };
>> +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1f8, 0xc1f8 };
>> +
>> +#include "vpXXX.inc"
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
>> new file mode 100644
>> index 000..f27a9a9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
>> @@ -0,0 +1,20 @@
>> +#include 
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +
>> +#define INSN_NAME vpmax
>> +#define TEST_MSG "VPMAX"
>> +
>> +/* Expected results. 
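
Editor's note: on the "Unary op?" remark above, vpadd takes two operands, and the test passes the same vector for both, which is why the expected values in vpadd.c repeat. The following scalar model is a sketch of the 8-lane pairwise add. It assumes the 8-bit input buffer holds 0xf0 through 0xf7, as the reference data in this series suggests but the quoted patch does not show. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of an 8-lane pairwise add: the low half of the result
   holds sums of adjacent lanes of A, the high half those of B.
   Sums wrap modulo 256, as 8-bit lanes do.  */
static void
pairwise_add_u8 (const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
  assert (out != a && out != b);
  for (int i = 0; i < 4; i++)
    {
      out[i] = (uint8_t) (a[2 * i] + a[2 * i + 1]);
      out[4 + i] = (uint8_t) (b[2 * i] + b[2 * i + 1]);
    }
}

/* Returns nonzero when the model reproduces the uint,8,8 expected
   values listed in vpadd.c for the assumed input.  */
static int
model_matches_expected (void)
{
  /* Assumed input buffer contents.  */
  const uint8_t v[8] = { 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7 };
  const uint8_t expected[8] = { 0xe1, 0xe5, 0xe9, 0xed,
                                0xe1, 0xe5, 0xe9, 0xed };
  uint8_t out[8];

  pairwise_add_u8 (v, v, out);
  for (int i = 0; i < 8; i++)
    if (out[i] != expected[i])
      return 0;
  return 1;
}
```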

Fix CC_MODE pessimization in reorg.c

2015-01-16 Thread Eric Botcazou
As exposed in https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01052.html the 
conversion of the original cc0-based Visium port into a CC_MODE port went 
smoothly and didn't affect the run time performance, except for a single but 
notable case: the reorg.c pass cannot put an insn that clobbers the CC reg 
into a conditional branch's delay slot if it comes from before the branch.
I guess this is negligible on most architectures with delay slots but not on 
the Visium where it alone costs 3% on CoreMark.

The attached patch fixes the pessimization very selectively, using the same 
trigger as the compare-elim pass, which means that only aarch64, mn10300, rx 
and visium are potentially affected; now among them only visium has delay 
slots so the patch actually affects visium only.

Tested on visium-elf.  Any objections to me applying it now?


2015-01-16  Eric Botcazou  

* reorg.c (fill_simple_delay_slots): If TARGET_FLAGS_REGNUM is valid,
implement a more precise life analysis for it during backward scan.


-- 
Eric Botcazou
Index: reorg.c
===
--- reorg.c	(revision 219714)
+++ reorg.c	(working copy)
@@ -2072,9 +2072,24 @@ fill_simple_delay_slots (int non_jumps_p
 
   if (slots_filled < slots_to_fill)
 	{
+	  /* If the flags register is dead after the insn, then we want to be
+	 able to accept a candidate that clobbers it.  For this purpose,
+	 we need to filter the flags register during life analysis, so
+	 that it doesn't create RAW and WAW dependencies, while still
+	 creating the necessary WAR dependencies.  */
+	  bool filter_flags
+	= (slots_to_fill == 1
+	   && targetm.flags_regnum != INVALID_REGNUM
+	   && find_regno_note (insn, REG_DEAD, targetm.flags_regnum));
+	  struct resources fset;
 	  CLEAR_RESOURCE (&needed);
 	  CLEAR_RESOURCE (&set);
 	  mark_set_resources (insn, &set, 0, MARK_SRC_DEST);
+	  if (filter_flags)
+	{
+	  CLEAR_RESOURCE (&fset);
+	  mark_set_resources (insn, &fset, 0, MARK_SRC_DEST);
+	}
 	  mark_referenced_resources (insn, &needed, false);
 
 	  for (trial = prev_nonnote_insn (insn); ! stop_search_p (trial, 1);
@@ -2092,7 +2107,9 @@ fill_simple_delay_slots (int non_jumps_p
 	  /* Check for resource conflict first, to avoid unnecessary
 		 splitting.  */
 	  if (! insn_references_resource_p (trial, &set, true)
-		  && ! insn_sets_resource_p (trial, &set, true)
+		  && ! insn_sets_resource_p (trial,
+	 filter_flags ? &fset : &set,
+	 true)
 		  && ! insn_sets_resource_p (trial, &needed, true)
 #ifdef HAVE_cc0
 		  /* Can't separate set of cc0 from its use.  */
@@ -2121,6 +2138,18 @@ fill_simple_delay_slots (int non_jumps_p
 		}
 
 	  mark_set_resources (trial, &set, 0, MARK_SRC_DEST_CALL);
+	  if (filter_flags)
+		{
+		  mark_set_resources (trial, &fset, 0, MARK_SRC_DEST_CALL);
+		  /* If the flags register is set, then it doesn't create RAW
+		 dependencies any longer and it also doesn't create WAW
+		 dependencies since it's dead after the original insn.  */
+		  if (TEST_HARD_REG_BIT (fset.regs, targetm.flags_regnum))
+		{
+		  CLEAR_HARD_REG_BIT (needed.regs, targetm.flags_regnum);
+		  CLEAR_HARD_REG_BIT (fset.regs, targetm.flags_regnum);
+		}
+		}
 	  mark_referenced_resources (trial, &needed, true);
 	}
 	}
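
Editor's note: the filtered life analysis in the patch above can be modelled on toy data. The sketch below is a simplification under stated assumptions, not GCC code. Resources are plain bit masks, struct toy_insn and every function name are hypothetical, FLAGS stands in for targetm.flags_regnum, and the caller is assumed to pass filter_flags only when the flags register is dead after the branch (the REG_DEAD check in the patch).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resource bits; FLAGS plays the role of the flags register.  */
enum { R1 = 1 << 0, R2 = 1 << 1, R3 = 1 << 2, FLAGS = 1 << 3 };

/* Toy insn: bit masks of the resources it sets and uses.  */
struct toy_insn { unsigned sets, uses; };

/* Scan PREV backward from the branch (index 0 is the nearest insn) and
   return the index of the first insn that may fill the delay slot, or
   -1 if there is none.  */
static int
find_delay_candidate (struct toy_insn branch, const struct toy_insn *prev,
                      int n, bool filter_flags)
{
  unsigned set = branch.sets, fset = branch.sets, needed = branch.uses;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      struct toy_insn t = prev[i];
      unsigned waw = filter_flags ? fset : set;

      /* Same three conflict checks as in the patch.  */
      if (!(t.uses & set) && !(t.sets & waw) && !(t.sets & needed))
        return i;

      set |= t.sets;
      fset |= t.sets;
      /* Once an insn on the path sets the flags, earlier flag writes
         no longer create RAW or WAW dependencies.  */
      if (filter_flags && (fset & FLAGS))
        {
          needed &= ~(unsigned) FLAGS;
          fset &= ~(unsigned) FLAGS;
        }
      needed |= t.uses;
    }
  return -1;
}

/* Branch on FLAGS, preceded by a compare, preceded by an add that
   clobbers FLAGS as a side effect.  */
static int
toy_example (bool filter_flags)
{
  struct toy_insn branch = { 0, FLAGS };
  struct toy_insn prev[] = {
    { FLAGS, R1 },        /* compare: sets FLAGS, uses R1 */
    { R2 | FLAGS, R3 },   /* add: sets R2 and clobbers FLAGS, uses R3 */
  };
  return find_delay_candidate (branch, prev, 2, filter_flags);
}
```

In this model the add is rejected without filtering, because it writes a register the branch needs, and accepted with filtering, because the compare between the add and the branch redefines the flags. The compare itself is rejected either way.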


Re: [[ARM/AArch64][testsuite] 03/36] Add vmax, vmin, vhadd, vhsub and vrhadd tests.

2015-01-16 Thread Christophe Lyon
On 16 January 2015 at 18:14, Marcus Shawcroft
 wrote:
> On 16 January 2015 at 16:21, Christophe Lyon  
> wrote:
>
>> My existing tests only cover armv7 so far.
>> I do plan to expand them once they are all in GCC.
>>
>>> Otherwise, they look good to me(but I can't approve it).
>>>
>>> Tejas.
>>>
>
> OK provided, as per the previous couple, that we don't regress or
> introduce new failures on aarch64[_be] or aarch32.

This patch shows failures on aarch64 and aarch64_be for vmax and vmin
when the input is -NaN.
It's a corner case, and my reading of the ARM ARM is that the result
should be the same as on aarch32.
I haven't had time to look at it in more detail though.
So, not OK?

> /Marcus


Re: [[ARM/AArch64][testsuite] 17/36] Add vpadd, vpmax and vpmin tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vpadd.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vpmax.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vpmin.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
new file mode 100644
index 000..7ac2ed4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpXXX.inc
@@ -0,0 +1,67 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: y=OP(x), then store the result.  */
+#define TEST_VPADD1(INSN, T1, T2, W, N)				\
+  VECT_VAR(vector_res, T1, W, N) = \
+INSN##_##T2##W(VECT_VAR(vector, T1, W, N), \
+  VECT_VAR(vector, T1, W, N)); \
+  vst1##_##T2##W(VECT_VAR(result, T1, W, N),   \
+VECT_VAR(vector_res, T1, W, N))
+
+#define TEST_VPADD(INSN, T1, T2, W, N) \
+  TEST_VPADD1(INSN, T1, T2, W, N)  \
+
+  /* No need for 64 bits variants.  */
+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, int, 16, 4);
+  DECL_VARIABLE(vector, int, 32, 2);
+  DECL_VARIABLE(vector, uint, 8, 8);
+  DECL_VARIABLE(vector, uint, 16, 4);
+  DECL_VARIABLE(vector, uint, 32, 2);
+  DECL_VARIABLE(vector, float, 32, 2);
+
+  DECL_VARIABLE(vector_res, int, 8, 8);
+  DECL_VARIABLE(vector_res, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 2);
+  DECL_VARIABLE(vector_res, uint, 8, 8);
+  DECL_VARIABLE(vector_res, uint, 16, 4);
+  DECL_VARIABLE(vector_res, uint, 32, 2);
+  DECL_VARIABLE(vector_res, float, 32, 2);
+
+  clean_results ();
+
+  /* Initialize input "vector" from "buffer".  */
+  VLOAD(vector, buffer, , int, s, 8, 8);
+  VLOAD(vector, buffer, , int, s, 16, 4);
+  VLOAD(vector, buffer, , int, s, 32, 2);
+  VLOAD(vector, buffer, , uint, u, 8, 8);
+  VLOAD(vector, buffer, , uint, u, 16, 4);
+  VLOAD(vector, buffer, , uint, u, 32, 2);
+  VLOAD(vector, buffer, , float, f, 32, 2);
+
+  /* Apply a unary operator named INSN_NAME.  */


Unary op?


+  TEST_VPADD(INSN_NAME, int, s, 8, 8);
+  TEST_VPADD(INSN_NAME, int, s, 16, 4);
+  TEST_VPADD(INSN_NAME, int, s, 32, 2);
+  TEST_VPADD(INSN_NAME, uint, u, 8, 8);
+  TEST_VPADD(INSN_NAME, uint, u, 16, 4);
+  TEST_VPADD(INSN_NAME, uint, u, 32, 2);
+  TEST_VPADD(INSN_NAME, float, f, 32, 2);
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 16, 4, PRIx64, expected, "");
+  CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx32, expected, "");
+  CHECK(TEST_MSG, uint, 16, 4, PRIx64, expected, "");
+  CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, "");
+  CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, "");
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
new file mode 100644
index 000..5ddfd3d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpadd.c
@@ -0,0 +1,19 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vpadd
+#define TEST_MSG "VPADD"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xe1, 0xe5, 0xe9, 0xed,
+  0xe1, 0xe5, 0xe9, 0xed };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xffe1, 0xffe5, 0xffe1, 0xffe5 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xffe1, 0xffe1 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xe1, 0xe5, 0xe9, 0xed,
+   0xe1, 0xe5, 0xe9, 0xed };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xffe1, 0xffe5, 0xffe1, 0xffe5 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffe1, 0xffe1 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1f8, 0xc1f8 };
+
+#include "vpXXX.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
new file mode 100644
index 000..f27a9a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vpmax.c
@@ -0,0 +1,20 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+
+#define INSN_NAME vpmax
+#define TEST_MSG "VPMAX"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+  0xf1, 0xf3, 0xf5, 0xf7 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff1, 0xfff3, 0xfff1, 0xfff3 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfff1, 0xfff1 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf1, 0xf3, 0xf5, 0xf7,
+   0xf1, 0xf3, 0xf5, 0xf7 };
+VECT_VAR_

Re: [debug-early] C++ clones and limbo DIEs

2015-01-16 Thread Aldy Hernandez

On 01/15/2015 07:11 PM, Jason Merrill wrote:
> On 01/15/2015 09:58 PM, Aldy Hernandez wrote:
>> Now back to limbo_die_list... My approach is to flush the limbo list,
>> generically, after the front-ends have finished, by adding a new
>> "early_finish" debug hook.  This gets rid of any permanence into LTO
>> time.  Then I flush it out again, if the middle end (or LTO, etc) has
>> added any limbo DIEs.
>
> Can you remove the first flush and just do it in the second place?


If I only flush the limbo list in the second place, that's basically 
what mainline does, albeit abstracted into a function.  I thought the 
whole point was to get rid of the limbo list, or at least keep it from 
being a structure that has to go through LTO streaming.


Aldy


RE: [PING] [PATCH] Fix parameters of __tsan_vptr_update

2015-01-16 Thread Bernd Edlinger
Hi,

On Fri, 16 Jan 2015 21:25:42, Dmitry Vyukov wrote:
>
> This is just a copy from llvm repo, right?
> Looks good to me.
>

Thanks.

Yes, I found these test cases in the llvm tree, and just adapted them
to work in the gcc test suite.

However, there is a small tweak in the positive test:
we now use a tsan-invisible barrier_wait function
instead of the not very reliable sleep(1).

barrier_wait is bypassing the tsan interceptor, because
it is accessed with dlsym (dlopen ("libpthread.so.0", RTLD_LAZY),
"pthread_barrier_wait").


Bernd.


> On Fri, Jan 16, 2015 at 10:17 AM, Bernd Edlinger
>  wrote:
>> Hi,
>>
>>
>> I think I should ping for this patch now:
>> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00599.html
>>
>> note that by mistake the change log referenced sanitizer.c instead of
>> sanitizer.def, consider that fixed on my local copy.
>>
>>
>> Thanks
>> Bernd.
>>
>>
>>> Date: Sun, 11 Jan 2015 14:15:54 +0100
>>>
>>> Hi,
>>>
>>>
>>>
>>> On Sun, 4 Jan 2015 14:54:56, Bernd Edlinger wrote:

>>>> Hi Jakub,
>>>>
>>>>
>>>> I think I have found a reasonable test case, see the attached patch file.
>>>> The use case is: a class that destroys an owned thread in the destructor.
>>>> The destructor sets the vptr again to thread::vptr but this should
>>>> _not_ trigger a diagnostic message, when the vptr does not really change.
>>>>
>>>> Jakub, this is another test case where the TREE_READONLY prevents
>>>> the tsan instrumentation. So I had first to install your patch:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01432.html
>>>>
>>>> ... to see the test case fail without my patch.

>>>
>>> that has been installed in the meantime.
>>>
>>>> The patch installs cleanly on 4.9 and 4.8, however the 4.8 branch
>>>> has no tsan tests, so I would leave the test case away for 4.8.

>>>
>>> I found, 4.8 does not have BT_FN_VOID_PTR_PTR, and no tsan tests
>>> at all, so it is probably not worth the effort.
>>>
>>>> Boot-strapped and regression-tested on x86_64-linux-gnu
>>>> OK for trunk and 4.9 + 4.8 branches?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.


>>>
>>> I found some test cases in the clang tree, about the __tsan_vptr_update.
>>> So I thought I should use these instead of inventing new ones.
>>>
>>> Attached you'll find an updated patch with one positive and one negative
>>> test for vptr races.
>>>
>>> Tested with x86_64-linux-gnu.
>>> OK for trunk and 4.9 after a while?
>>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>
  

Re: [PATCH 0/4][ARM Intrinsics][RFTesting] Add missing float16x8_t type, and float16x[48] intrinsics

2015-01-16 Thread Christophe Lyon
On 16 January 2015 at 18:22, Alan Lawrence  wrote:
> These add all the V[48]HFmode insns and corresponding intrinsics for ARM.
> Depends on the two patches at
> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html .
>
> Unfortunately I don't at present have a testsuite. I've done some testing
> both manually and on a large internal testsuite for Neon/ACLE intrinsics,
> but I'm wondering if anyone has anything they might be able to contribute?
> Christophe, perhaps you can give me some pointers, how might one add float16
> to the advsimd-intrinsics testsuite / how easy would this be?

I don't expect this to be difficult.
I have some support already in my testsuite at
https://gitorious.org/arm-neon-tests
For instance, if you look at ref_vld1.c, you'll see that GCC's vld1.c test
needs to have something like:
#if defined(__ARM_FP16_FORMAT_IEEE)
  DECL_VARIABLE(vector, float, 16, 4);
  DECL_VARIABLE(vector, float, 16, 8);
#endif
...
#if defined(__ARM_FP16_FORMAT_IEEE)
  TEST_VLD1(vector, buffer, , float, f, 16, 4);
  TEST_VLD1(vector, buffer, q, float, f, 16, 8);
#endif

as well as a similar fragment with the expected results.

I didn't add it yet because I was waiting for the support in GCC to be
complete :-)
I discussed this briefly with Ramana when I started contributing my tests.

However, it might be better to add such tests in a dedicated file
(e.g. vld1-fp16.c)
protected by some dejagnu directive to check that the effective target
does support fp16, and without the #ifdef.


> Cross-tested check-gcc on arm-none-eabi
> Bootstrapped on arm-none-linux-gnueabihf cortex-a15
>

Christophe.


[Obvious ARM Testsuite] mangle-arm-crypto.C needs arm_crypto arguments added

2015-01-16 Thread James Greenhalgh

Hi,

g++.dg/abi/mangle-arm-crypto.C fails on some ARM targets
(see: https://gcc.gnu.org/ml/gcc-testresults/2015-01/msg01654.html ).
Much to my irritation, I can't reproduce the failures locally, but the
reason looks simple enough - you need to pass the correct flags to
enable the crypto intrinsics and instructions.

There is already a dg-add-options directive for this, so use it.

I've committed this patch as obvious as r219758, after checking it doesn't
cause any new problems on the platforms I have access to. I'll watch
Ramana's test results over the weekend to see if this fixes his bug!

Cheers,
James

---
gcc/testsuite/

2015-01-16  James Greenhalgh  

* g++.dg/abi/mangle-arm-crypto.C: Add crypto options, rather
than Neon options.
diff --git a/gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C b/gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C
index aae8847..f3fb1a9 100644
--- a/gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C
+++ b/gcc/testsuite/g++.dg/abi/mangle-arm-crypto.C
@@ -3,7 +3,7 @@
 
 // { dg-do compile }
 // { dg-require-effective-target arm_crypto_ok }
-// { dg-add-options arm_neon }
+// { dg-add-options arm_crypto }
 
 #include 
 

[PATCH 4/4][ARM Intrinsics] Add float16 v(ld|st)[234](q?|_lane|_dup),vcombine,vget_(low|high)

2015-01-16 Thread Alan Lawrence
These intrinsics are all made from patterns in neon.md, and are all tied 
together by iterators - I've tried to reduce coupling a bit but there is 
possibly more that could be done here.


gcc/ChangeLog:

* config/arm/arm-builtins.c (VAR11, VAR12): New.
* config/arm/arm_neon_builtins.def (vcombine, vld2_dup, vld3_dup,
vld4_dup): Add v4hf variant.
(vget_high, vget_low): Add v8hf variant.
(vld1, vst1, vld2, vld2_lane, vst2, vst2_lane, vld3, vld3_lane, vst3,
vst3_lane, vld4, vld4_lane, vst4, vst4_lane): Add v4hf and v8hf variants.

* config/arm/iterators.md (VD_LANE, VD_RE, VQ2, VQ_HS): New.
(VDX): Add V4HF.
(V_DOUBLE): Add case for V4HF.
(VQX): Add V8HF.
(V_HALF): Add case for V8HF.
(VDQX): Add V4HF, V8HF.
(V_elem, V_two_elem, V_three_elem, V_four_elem, V_cmp_result,
V_sz_elem, V_mode_nunits, q): Add cases for V4HF & V8HF.

* config/arm/neon.md (vec_setinternal, vec_extract,
neon_vget_lane_sext_internal, neon_vget_lane_zext_internal,
vec_load_lanesoi, neon_vld2, vec_store_lanesoi,
neon_vst2, vec_load_lanesci, neon_vld3,
neon_vld3qa, neon_vld3qb, vec_store_lanesci,
neon_vst3, neon_vst3qa, neon_vst3qb,
vec_load_lanesxi, neon_vld4, neon_vld4qa,
neon_vld4qb, vec_store_lanesxi, neon_vst4,
neon_vst4qa, neon_vst4qb):
Change VQ iterator to VQ2.

(neon_vcreate, neon_vreinterpretv8qi,
neon_vreinterpretv4hi, neon_vreinterpretv2si,
neon_vreinterpretv2sf, neon_vreinterpretdi):
Change VDX to VD_RE.

(neon_vld2_lane, neon_vst2_lane, neon_vld3_lane,
neon_vst3_lane, neon_vld4_lane, neon_vst4_lane):
Change VD iterator to VD_LANE, and VMQ iterator to VQ_HS.

* config/arm/arm_neon.h (float16x4x2_t, float16x8x2_t, float16x4x3_t,
float16x8x3_t, float16x4x4_t, float16x8x4_t, vcombine_f16,
vget_high_f16, vget_low_f16, vld1_f16, vld1q_f16, vst1_f16, vst1q_f16,
vld2_f16, vld2q_f16, vld2_lane_f16, vld2q_lane_f16, vld2_dup_f16,
vld3_f16, vld3q_f16, vld3_lane_f16, vld3q_lane_f16, vld3_dup_f16,
vld4_f16, vld4q_f16, vld4_lane_f16, vld4q_lane_f16, vld4_dup_f16): New.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9855b86202e80816bada565786f35dd21fe68c91..4c3f0e888969f16ff6c84e2a1bf65321d73ec8b4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -228,6 +228,12 @@ typedef struct {
 #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
   VAR9 (T, N, A, B, C, D, E, F, G, H, I) \
   VAR1 (T, N, J)
+#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) \
+  VAR1 (T, N, K)
+#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR1 (T, N, L)
 
 /* The NEON builtin data can be found in arm_neon_builtins.def.
The mode entries in the following table correspond to the "key" type of the
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 2c53af9d2d4b3ef4948ddb6c196bfa394c4a6d1c..2e9d442f57ca9a46eda465f051fe0c157eb2b88a 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -162,6 +162,16 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -288,6 +298,16 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -414,6 +434,16 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+
 typedef struct float32x2x4_t
 {
   float32x2_t val[4];
@@ -6061,6 +6091,12 @@ vcombine_s64 (int64x1_t __a, int64x1_t __b)
   return (int64x2_t)__builtin_neon_vcombinedi (__a, __b);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcombine_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcombinev4hf (__a, __b);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcombine_f32 (float32x2_t __a, float32x2_t __b)
 {
@@ -6135,6 +6171,12 @@ vget_high_s64 (int64x2_t __a)
   return (int64x1_t)__builtin_neon_vget_highv2di (__a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vget_high_f16 (float16x8_t __a)
+{
+  return __builtin_neon_vget_highv8hf (__a);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vget_high_f32 (float32x4_t __a)
 {
@@ -6195,6 +6237,12 @@ vget_low_s32 (int32x4_t __a)
   return (int32x2_t)__builtin_neon_vget_lowv4si (__a);
 }
 
+__extension_

[PATCH 3/4][ARM Intrinsics]float16x8_t intrinsics: vgetq_lane, vsetq_lane, vdupq_n, vdupq_lane, vld1q_lane, vld1q_dup, vreinterpretq

2015-01-16 Thread Alan Lawrence
Much like the first patch, this adds the equivalent ...q... intrinsics for 
float16x8_t, using GCC vector extensions.


gcc/ChangeLog:

* config/arm/arm_neon.h (vdupq_lane_f16, vld1q_lane_f16, vld1q_dup_f16,
vreinterpretq_p8_f16, vreinterpretq_f16_p8, vreinterpretq_f16_p16,
vreinterpretq_f16_f32, vreinterpretq_f16_p64, vreinterpretq_f16_p128,
vreinterpretq_f16_s64, vreinterpretq_f16_u64, vreinterpretq_f16_s8,
vreinterpretq_f16_s16, vreinterpretq_f16_s32, vreinterpretq_f16_u8,
vreinterpretq_f16_u16, vreinterpretq_f16_u32, vreinterpretq_f32_f16,
vreinterpretq_p64_f16, vreinterpretq_p128_f16, vreinterpretq_s64_f16,
vreinterpretq_u64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16,
vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16,
vreinterpretq_u32_f16): New.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 7259852a6a450c5f693b03cf6342f33190f266c6..d214fd673565c1cd020203c40c514762dfead520 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -5264,6 +5264,16 @@ vgetq_lane_s32 (int32x4_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev4si (__a, __b);
 }
 
+#define vgetq_lane_f16(__v, __i)		\
+  __extension__	\
+({		\
+  float16x8_t __vec = (__v);		\
+  int __idx = (__i);			\
+  __builtin_arm_lane_check (8, __idx);	\
+  float16_t __res = __vec[__idx];		\
+  __res;	\
+})
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vgetq_lane_f32 (float32x4_t __a, const int __b)
 {
@@ -5407,6 +5417,17 @@ vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __c)
   return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#define vsetq_lane_f16(__e, __v, __i)		\
+  __extension__	\
+  ({		\
+  float16_t __elem = (__e);			\
+  float16x8_t __vec = (__v);		\
+  int __idx = (__i);			\
+  __builtin_arm_lane_check (8, __idx);	\
+  __vec[__idx] = __elem;			\
+  __vec;	\
+})
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c)
 {
@@ -5642,6 +5663,13 @@ vdupq_n_s32 (int32_t __a)
   return (int32x4_t)__builtin_neon_vdup_nv4si ((__builtin_neon_si) __a);
 }
 
+#define vdupq_n_f16(__e1)	\
+  __extension__			\
+({\
+  float16_t __e = (__e1);	\
+  (float16x8_t) {__e, __e, __e, __e, __e, __e, __e, __e};	\
+})
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vdupq_n_f32 (float32_t __a)
 {
@@ -5920,6 +5948,12 @@ vdupq_lane_s32 (int32x2_t __a, const int __b)
   return (int32x4_t)__builtin_neon_vdup_lanev4si (__a, __b);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdupq_lane_f16 (float16x4_t __a, const int __b)
+{
+  return vdupq_n_f16 (vget_lane_f16 (__a, __b));
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vdupq_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -8903,6 +8937,12 @@ vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, const int __c)
   return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_lane_f16 (const float16_t * __a, float16x8_t __b, const int __c)
+{
+  return vsetq_lane_f16 (*__a, __b, __c);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c)
 {
@@ -9057,6 +9097,12 @@ vld1q_dup_s32 (const int32_t * __a)
   return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_dup_f16 (const float16_t * __a)
+{
+  return vdupq_n_f16 (*__a);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_dup_f32 (const float32_t * __a)
 {
@@ -12851,6 +12897,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
   return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a);
@@ -12996,6 +13048,88 @@ vreinterpretq_p16_u32 (uint32x4_t __a)
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
 }
 
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p8 (poly8x16_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_f16_p16 (poly16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_

[PATCH 2/4][ARM Intrinsics] Add missing float16x8_t type

2015-01-16 Thread Alan Lawrence
This defines arm_neon.h's float16x8_t type, although no intrinsics yet (see next 
patch). Adding V8HFmode does mean programmers can define a GCC vector of same 
size themselves.


gcc/ChangeLog:

* config/arm/arm.h (VALID_NEON_QREG_MODE): Add V8HFmode.

* config/arm/arm.c (arm_vector_mode_supported_p): Support V8HFmode.

* config/arm/arm-builtins.c (v8hf_UP): New.
(arm_init_simd_builtin_types): Initialise Float16x8_t.

* config/arm/arm-simd-builtin-types.def (Float16x8_t): New.

* config/arm/arm_neon.h (float16x8_t): New typedef.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index baa83490fcd9bf68d9e9bdbd57cdf6f2d3d0e056..a91d656bad7b8250cc38237358fc1065acd47714 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -179,6 +179,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define di_UP    DImode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -814,6 +815,7 @@ arm_init_simd_builtin_types (void)
   /* Continue with standard types.  */
   arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x2_t].eltype = float_type_node;
+  arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   for (i = 0; i < nelts; i++)
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index 7360e268bf8507f975b3cff7c6078a046cde3954..c4cb0e2a32b47d13227999a319237895573f5766 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -44,5 +44,7 @@
 
   ENTRY (Float16x4_t, V4HF, none, 64, float16, 18)
   ENTRY (Float32x2_t, V2SF, none, 64, float32, 18)
+
+  ENTRY (Float16x8_t, V8HF, none, 128, float16, 19)
   ENTRY (Float32x4_t, V4SF, none, 128, float32, 19)
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index d850982563eb35d9d87298473bc3dfcb0527ae0b..0059136ba46f6636fd7ac63e667161b8f77118b8 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1093,7 +1093,7 @@ extern int arm_arch_crc;
 /* Modes valid for Neon Q registers.  */
 #define VALID_NEON_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode)
+   || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
 
 /* Structure modes valid for Neon registers.  */
 #define VALID_NEON_STRUCT_MODE(MODE) \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6944c3f3867d2d8ff814a2302112207a56319454..a2fef7aee5f19e3d89524470de7d8615bd60b5a0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26184,7 +26184,8 @@ arm_vector_mode_supported_p (machine_mode mode)
 {
   /* Neon also supports V2SImode, etc. listed in the clause below.  */
   if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
-  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == V2DImode))
+  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
+  || mode == V2DImode || mode == V8HFmode))
 return true;
 
   if ((TARGET_NEON || TARGET_IWMMXT)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 231d1392b93fe78a37f58595f775b0cc87fb709f..7259852a6a450c5f693b03cf6342f33190f266c6 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -58,6 +58,7 @@ typedef __simd128_int8_t int8x16_t;
 typedef __simd128_int16_t int16x8_t;
 typedef __simd128_int32_t int32x4_t;
 typedef __simd128_int64_t int64x2_t;
+typedef __simd128_float16_t float16x8_t;
 typedef __simd128_float32_t float32x4_t;
 typedef __simd128_poly8_t poly8x16_t;
 typedef __simd128_poly16_t poly16x8_t;

[PATCH 1/4][ARM Intrinsics]float16x4_t intrinsics: vget_lane, vset_lane, vcreate, vdup_n, vdup_lane, vld1_lane, vld1_dup, vreinterpret

2015-01-16 Thread Alan Lawrence
This adds a bunch of new intrinsics, implemented with GCC vector extensions to 
maximise mid-end optimization (the same approach as AArch64). Note that unlike 
AArch64, no attempt is made to support big-endian.
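
To illustrate the vector-extension approach in a form that builds on any GCC
target, here is a small sketch.  It is not taken from the patch: the type
f32x4 and the macros get_lane_f32x4, set_lane_f32x4 and dup_n_f32x4 are
hypothetical stand-ins that use float instead of float16_t, and a runtime
assert takes the place of the compile-time lane check.

```c
#include <assert.h>

/* Hypothetical stand-in for float16x4_t: four floats in one 16-byte vector.  */
typedef float f32x4 __attribute__ ((vector_size (16)));

/* Read lane I of V.  The statement expression evaluates each argument
   exactly once; the assert stands in for a compile-time lane check.  */
#define get_lane_f32x4(v, i)                            \
  __extension__                                         \
  ({                                                    \
    f32x4 get_vec_ = (v);                               \
    int get_idx_ = (i);                                 \
    assert (get_idx_ >= 0 && get_idx_ < 4);             \
    get_vec_[get_idx_];                                 \
  })

/* Return a copy of V with lane I replaced by E.  */
#define set_lane_f32x4(e, v, i)                         \
  __extension__                                         \
  ({                                                    \
    float set_elem_ = (e);                              \
    f32x4 set_vec_ = (v);                               \
    int set_idx_ = (i);                                 \
    assert (set_idx_ >= 0 && set_idx_ < 4);             \
    set_vec_[set_idx_] = set_elem_;                     \
    set_vec_;                                           \
  })

/* Broadcast E to every lane.  */
#define dup_n_f32x4(e)                                  \
  __extension__                                         \
  ({                                                    \
    float dup_elem_ = (e);                              \
    (f32x4) { dup_elem_, dup_elem_, dup_elem_, dup_elem_ }; \
  })
```

Because plain subscripting and vector literals are visible to the mid-end,
the compiler is free to fold or combine these operations, which is the
motivation given above for preferring them to opaque builtins.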


gcc/ChangeLog:

* config/arm/arm_neon.h (vcreate_f16, vdup_lane_f16, vld1_lane_f16,
vld1_dup_f16, vreinterpret_p8_f16, vreinterpret_p16_f16,
vreinterpret_f16_p8, vreinterpret_f16_p16, vreinterpret_f16_f32,
vreinterpret_f16_p64, vreinterpret_f16_s64, vreinterpret_f16_u64,
vreinterpret_f16_s8, vreinterpret_f16_s16, vreinterpret_f16_s32,
vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32,
vreinterpret_f32_f16, vreinterpret_p64_f16, vreinterpret_s64_f16,
vreinterpret_u64_f16, vreinterpret_s8_f16, vreinterpret_s16_f16,
vreinterpret_s32_f16, vreinterpret_u8_f16, vreinterpret_u16_f16,
vreinterpret_u32_f16): New.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index e58b772ee29f910a344d2d3a5be5a7818a79af64..231d1392b93fe78a37f58595f775b0cc87fb709f 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -41,6 +41,7 @@ typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
 typedef __builtin_neon_di int64x1_t;
+typedef __builtin_neon_hf float16_t;
 typedef __simd64_float16_t float16x4_t;
 typedef __simd64_float32_t float32x2_t;
 typedef __simd64_poly8_t poly8x8_t;
@@ -5182,6 +5183,20 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev2si (__a, __b);
 }
 
+/* Functions cannot accept or return __FP16 types.  Even if the function
+   were marked always-inline so there were no call sites, the declaration
+   would nonetheless raise an error.  Hence, we must use a macro instead.  */
+
+#define vget_lane_f16(__v, __i)			\
+  __extension__	\
+({		\
+  float16x4_t __vec = (__v);		\
+  int __idx = (__i);			\
+  __builtin_arm_lane_check (4, __idx);	\
+  float16_t __res = __vec[__idx];		\
+  __res;	\
+})
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -5314,6 +5329,17 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#define vset_lane_f16(__e, __v, __i)		\
+  __extension__	\
+({		\
+  float16_t __elem = (__e);			\
+  float16x4_t __vec = (__v);		\
+  int __idx = (__i);			\
+  __builtin_arm_lane_check (4, __idx);	\
+  __vec[__idx] = __elem;			\
+  __vec;	\
+})
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c)
 {
@@ -5460,6 +5486,12 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -5520,6 +5552,13 @@ vdup_n_s32 (int32_t __a)
   return (int32x2_t)__builtin_neon_vdup_nv2si ((__builtin_neon_si) __a);
 }
 
+#define vdup_n_f16(__e1)			\
+  __extension__	\
+({		\
+  float16_t __e = (__e1);			\
+  (float16x4_t) {__e, __e, __e, __e};	\
+})
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vdup_n_f32 (float32_t __a)
 {
@@ -5800,6 +5839,12 @@ vdup_lane_s32 (int32x2_t __a, const int __b)
   return (int32x2_t)__builtin_neon_vdup_lanev2si (__a, __b);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdup_lane_f16 (float16x4_t __a, const int __b)
+{
+  return vdup_n_f16 (vget_lane_f16 (__a, __b));
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vdup_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -8777,6 +8822,12 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c)
+{
+  return vset_lane_f16 (*__a, __b, __c);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c)
 {
@@ -8925,6 +8976,12 @@ vld1_dup_s32 (const int32_t * __a)
   return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a);
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_dup_f16 (const float16_t * __a)
+{
+  return vdup_n_f16 (*__a);
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_dup_f32 (const float32_t * __a)
 {
@@ -11809,6 +11866,12

[PATCH 2/2][ARM] PR/63870: Add a __builtin_lane_check

2015-01-16 Thread Alan Lawrence
This parallels the present form of __builtin_aarch64_im_lane_boundsi, and allows 
lane indices to be checked for intrinsics that can otherwise be written in 
terms of GCC vector extensions.


The new builtin is not used in this patch but is used in my series of float16_t 
intrinsics (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01434.html), and at 
some point in the future we should rewrite existing intrinsics (for other types) 
to this form too, but I'm leaving that for a later patch series :).
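
As a rough, target-independent analogue of what the builtin provides, the
sketch below rejects a bad lane index at compile time.  It is an assumption
made for illustration, not how the patch works: LANE_CHECK, i32x4 and
get_lane_i32x4 are hypothetical names, and _Static_assert is used in place
of __builtin_arm_lane_check.

```c
#include <assert.h>

/* Hypothetical four-lane integer vector built with GCC vector extensions.  */
typedef int i32x4 __attribute__ ((vector_size (16)));

/* Compile-time check that IDX is a constant in [0, NLANES).  A
   non-constant or out-of-range IDX is a compile error.  */
#define LANE_CHECK(nlanes, idx) \
  _Static_assert ((idx) >= 0 && (idx) < (nlanes), "lane index out of range")

/* Read lane I of V after checking I against the lane count.  */
#define get_lane_i32x4(v, i)      \
  __extension__                   \
  ({                              \
    LANE_CHECK (4, i);            \
    i32x4 lc_vec_ = (v);          \
    lc_vec_[(i)];                 \
  })
```

With these definitions, get_lane_i32x4 (v, 3) compiles, while
get_lane_i32x4 (v, 4) is rejected by the compiler, and the check itself
emits no code.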


Cross-tested check-gcc on arm-none-eabi
Bootstrapped on arm-none-linux-gnueabihf cortex-a15

gcc/ChangeLog:

* config/arm/arm-builtins.c (enum arm_builtins):
Add ARM_BUILTIN_NEON_BASE and ARM_BUILTIN_NEON_LANE_CHECK.
(ARM_BUILTIN_NEON_BASE): Rename macro to
(ARM_BUILTIN_NEON_PATTERN_START): ...this.
(arm_init_neon_builtins): Register __builtin_arm_lane_check.
(arm_expand_neon_builtin): Handle ARM_BUILTIN_NEON_LANE_CHECK.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2ca7ac5ad3cf82941a5d3b6707a0a41f3157190b..baa83490fcd9bf68d9e9bdbd57cdf6f2d3d0e056 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -521,12 +521,16 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_NEON_BASE,
+  ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
+
 #include "arm_neon_builtins.def"
 
   ARM_BUILTIN_MAX
 };
 
-#define ARM_BUILTIN_NEON_BASE (ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
+#define ARM_BUILTIN_NEON_PATTERN_START \
+(ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
 
 #undef CF
 #undef VAR1
@@ -885,7 +889,7 @@ arm_init_simd_builtin_scalar_types (void)
 static void
 arm_init_neon_builtins (void)
 {
-  unsigned int i, fcode = ARM_BUILTIN_NEON_BASE;
+  unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
 
   arm_init_simd_builtin_types ();
 
@@ -895,6 +899,15 @@ arm_init_neon_builtins (void)
  system.  */
   arm_init_simd_builtin_scalar_types ();
 
+  tree lane_check_fpr = build_function_type_list (void_type_node,
+		  intSI_type_node,
+		  intSI_type_node,
+		  NULL);
+  arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] =
+  add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
+			ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD,
+			NULL, NULL_TREE);
+
   for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
 {
   bool print_type_signature_p = false;
@@ -2155,14 +2168,28 @@ arm_expand_neon_args (rtx target, machine_mode map_mode, int fcode,
   return target;
 }
 
-/* Expand a Neon builtin. These are "special" because they don't have symbolic
+/* Expand a Neon builtin, i.e. those registered only if TARGET_NEON holds.
+   Most of these are "special" because they don't have symbolic
constants defined per-instruction or per instruction-variant. Instead, the
required info is looked up in the table neon_builtin_data.  */
 static rtx
 arm_expand_neon_builtin (int fcode, tree exp, rtx target)
 {
+  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
+{
+  tree nlanes = CALL_EXPR_ARG (exp, 0);
+  gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
+  rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
+  if (CONST_INT_P (lane_idx))
+	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
+  else
+	error ("%Klane index must be a constant immediate", exp);
+  /* Don't generate any RTL.  */
+  return const0_rtx;
+}
+
   neon_builtin_datum *d =
-		&neon_builtin_data[fcode - ARM_BUILTIN_NEON_BASE];
+		&neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
   enum insn_code icode = d->code;
   builtin_arg args[SIMD_MAX_BUILTIN_ARGS];
   int num_args = insn_data[d->code].n_operands;

Re: [PING] [PATCH] Fix parameters of __tsan_vptr_update

2015-01-16 Thread Dmitry Vyukov
This is just a copy from llvm repo, right?
Looks good to me.

On Fri, Jan 16, 2015 at 10:17 AM, Bernd Edlinger
 wrote:
> Hi,
>
>
> I think I should ping for this patch now:
> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00599.html
>
> note that by mistake the change log referenced sanitizer.c instead of
> sanitizer.def, consider that fixed on my local copy.
>
>
> Thanks
> Bernd.
>
>
>> Date: Sun, 11 Jan 2015 14:15:54 +0100
>>
>> Hi,
>>
>>
>>
>> On Sun, 4 Jan 2015 14:54:56, Bernd Edlinger wrote:
>>>
>>> Hi Jakub,
>>>
>>>
>>> I think I have found a reasonable test case, see the attached patch file.
>>> The use case is: a class that destroys an owned thread in the destructor.
>>> The destructor sets the vptr again to thread::vptr but this should
>>> _not_ trigger a diagnostic message, when the vptr does not really change.
>>>
>>> Jakub, this is another test case where the TREE_READONLY prevents
>>> the tsan instrumentation. So I had first to install your patch:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01432.html
>>>
>>> ... to see the test case fail without my patch.
>>>
>>
>> that has been installed in the meantime.
>>
>>> The patch installs cleanly on 4.9 and 4.8, however the 4.8 branch
>>> has no tsan tests, so I would leave the test case away for 4.8.
>>>
>>
>> I found, 4.8 does not have BT_FN_VOID_PTR_PTR, and no tsan tests
>> at all, so it is probably not worth the effort.
>>
>>> Boot-strapped and regression-tested on x86_64-linux-gnu
>>> OK for trunk and 4.9 + 4.8 branches?
>>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>>
>>
>> I found some test cases in the clang tree, about the __tsan_vptr_update.
>> So I thought I should use these instead of inventing new ones.
>>
>> Attached you'll find an updated patch with one positive and one negative
>> test for vptr races.
>>
>> Tested with x86_64-linux-gnu.
>> OK for trunk and 4.9 after a while?
>>
>>
>> Thanks
>> Bernd.
>>
>


[PATCH 0/4][ARM Intrinsics][RFTesting] Add missing float16x8_t type, and float16x[48] intrinsics

2015-01-16 Thread Alan Lawrence
These add all the V[48]HFmode insns and corresponding intrinsics for ARM. 
Depends on the two patches at 
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html .


Unfortunately I don't at present have a testsuite. I've done some testing both 
manually and on a large internal testsuite for Neon/ACLE intrinsics, but I'm 
wondering if anyone has anything they might be able to contribute? Christophe, 
perhaps you can give me some pointers, how might one add float16 to the 
advsimd-intrinsics testsuite / how easy would this be?


Cross-tested check-gcc on arm-none-eabi
Bootstrapped on arm-none-linux-gnueabihf cortex-a15



Re: [PATCH][ARM] PR 62066: Call va_end on early return from va_list processing function

2015-01-16 Thread Ramana Radhakrishnan



On 16/01/15 16:56, Kyrill Tkachov wrote:

Hi all,

As the simple PR says we should call va_end before returning early from
a function that started processing the va_list with va_start.
The C spec agrees:
"Each invocation of the va_start and va_copy macros
shall be matched by a corresponding invocation of the va_end macro in
the same function."

Tested arm-none-eabi.

Ok for trunk?


OK.

Thanks,
Ramana



Thanks,
Kyrill

2014-01-16  Kyrylo Tkachov  

  PR target/62066
  * config/arm/arm-builtins.c (arm_expand_neon_args): Call va_end before
  early return 0.
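
The rule quoted from the C spec can be sketched as follows. This is only an illustration under stated assumptions, not the code in arm_expand_neon_args: sum_nonnegative_sketch is a hypothetical function whose early-return path still invokes va_end.

```cpp
#include <cstdarg>

// Hypothetical helper: sums COUNT int arguments, but bails out early
// with -1 if a negative value is seen.
int
sum_nonnegative_sketch (int count, ...)
{
  va_list ap;
  va_start (ap, count);

  int total = 0;
  for (int i = 0; i < count; i++)
    {
      int value = va_arg (ap, int);
      if (value < 0)
        {
          // Early return: va_end must still be invoked in this function.
          va_end (ap);
          return -1;
        }
      total += value;
    }

  // Normal exit path: matches the va_start above.
  va_end (ap);
  return total;
}
```

Every path out of the function passes through exactly one va_end, which is the property the PR asks for.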



Re: [[ARM/AArch64][testsuite] 16/36] Add vqdmlal_n and vqdmlsl_n tests.

2015-01-16 Thread Tejas Belagod

On 13/01/15 15:18, Christophe Lyon wrote:

* gcc.target/aarch64/advsimd-intrinsics/vqdmlXl_n.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vqdmlal_n.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vqdmlsl_n.c: New file.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl_n.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl_n.inc
new file mode 100644
index 000..fd885dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlXl_n.inc
@@ -0,0 +1,59 @@
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* vector_res = vqdmlxl_n(vector, vector3, val),
+ then store the result.  */
+#define TEST_VQDMLXL_N1(INSN, T1, T2, W, W2, N, V, EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W, N));  \
+  VECT_VAR(vector_res, T1, W, N) = \
+INSN##_##T2##W2(VECT_VAR(vector, T1, W, N), \
+   VECT_VAR(vector3, T1, W2, N),   \
+   V); \
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N),\
+   VECT_VAR(vector_res, T1, W, N));\
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+
+#define TEST_VQDMLXL_N(INSN, T1, T2, W, W2, N, V, EXPECTED_CUMULATIVE_SAT, CMT) \
+  TEST_VQDMLXL_N1(INSN, T1, T2, W, W2, N, V, EXPECTED_CUMULATIVE_SAT, CMT)
+
+  DECL_VARIABLE(vector, int, 32, 4);
+  DECL_VARIABLE(vector3, int, 16, 4);
+  DECL_VARIABLE(vector_res, int, 32, 4);
+
+  DECL_VARIABLE(vector, int, 64, 2);
+  DECL_VARIABLE(vector3, int, 32, 2);
+  DECL_VARIABLE(vector_res, int, 64, 2);
+
+  clean_results ();
+
+  VLOAD(vector, buffer, q, int, s, 32, 4);
+  VLOAD(vector, buffer, q, int, s, 64, 2);
+
+  VDUP(vector3, , int, s, 16, 4, 0x55);
+  VDUP(vector3, , int, s, 32, 2, 0x55);
+
+  /* Choose val arbitrarily.  */
+  TEST_VQDMLXL_N(INSN_NAME, int, s, 32, 16, 4, 0x22, expected_cumulative_sat, "");
+  TEST_VQDMLXL_N(INSN_NAME, int, s, 64, 32, 2, 0x33, expected_cumulative_sat, "");
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, "");
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, "");
+
+#define TEST_MSG2 "(check mul cumulative saturation)"
+  VDUP(vector3, , int, s, 16, 4, 0x8000);
+  VDUP(vector3, , int, s, 32, 2, 0x8000);
+
+  TEST_VQDMLXL_N(INSN_NAME, int, s, 32, 16, 4, 0x8000, expected_cumulative_sat2, TEST_MSG2);
+  TEST_VQDMLXL_N(INSN_NAME, int, s, 64, 32, 2, 0x8000, expected_cumulative_sat2, TEST_MSG2);
+
+  CHECK(TEST_MSG, int, 32, 4, PRIx32, expected2, TEST_MSG2);
+  CHECK(TEST_MSG, int, 64, 2, PRIx64, expected2, TEST_MSG2);
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal_n.c
new file mode 100644
index 000..b84bca3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlal_n.c
@@ -0,0 +1,27 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vqdmlal_n
+#define TEST_MSG "VQDMLAL_N"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,32,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,64,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x1684, 0x1685, 0x1686, 0x1687 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x21ce, 0x21cf };
+
+/* Expected values of cumulative_saturation flag when saturation
+   occurs.  */
+int VECT_VAR(expected_cumulative_sat2,int,32,4) = 1;
+int VECT_VAR(expected_cumulative_sat2,int,64,2) = 1;
+
+/* Expected results when saturation occurs.  */
+VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fef, 0x7ff0,
+0x7ff1, 0x7ff2 };
+VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fef,
+0x7ff0 };
+
+#include "vqdmlXl_n.inc"
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl_n.c
new file mode 100644
index 000..ff8d9d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmlsl_n.c
@@ -0,0 +1,29 @@
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+#define INSN_NAME vqdmlsl_n
+#define TEST_MSG "VQDMLSL_N"
+
+/* Expected values of cumulative_saturation flag.  */
+int VECT_VAR(expected_cumulative_sat,int,32,4) = 0;
+int VECT_VAR(expected_cumulative_sat,int,64,2) = 0;
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xe95c, 0xe95d,
+   0xe95e, 0xe95f };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xde12,
+  

Re: [[ARM/AArch64][testsuite] 04/36] Add vld1_lane tests.

2015-01-16 Thread Marcus Shawcroft
On 16 January 2015 at 16:23, Christophe Lyon  wrote:
> On 16 January 2015 at 15:09, Tejas Belagod  wrote:
>> On 13/01/15 15:18, Christophe Lyon wrote:
>>>
>>> * gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c: New file.



>> Hmm.. again, I don't see vld1_lane_f64?
>
> Same answer: unless I am mistaken it isn't supported on armv7, and
> indeed the tests need to be expanded.
>
>>> +#ifndef __CC_ARM
>>> +  /* Check runtime assertions. With RVCT, the check is performed at
>>> + compile-time */
>>> +  //  TEST_VLD1_LANE(, int, s, 64, 1, 1);
>>> +#endif
>>> +
>>
>> Does this belong in this patch?
> Good catch!
> The original testsuite uses RVCT features not present in GCC, and I
> forgot to remove this chunk.
>
>> Otherwise, it looks good to me(I cannot approve though).


OK with that hunk dropped, provided no new fails on aarch64[_be] and arm.
/Marcus


Re: [PING] [PATCH] Fix parameters of __tsan_vptr_update

2015-01-16 Thread Jeff Law

On 01/16/15 00:17, Bernd Edlinger wrote:

Hi,


I think I should ping for this patch now:
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00599.html

note that by mistake the change log referenced sanitizer.c instead of
sanitizer.def, consider that fixed on my local copy.

That patch is fine.  Thanks for pinging.

jeff



Re: [[ARM/AArch64][testsuite] 03/36] Add vmax, vmin, vhadd, vhsub and vrhadd tests.

2015-01-16 Thread Marcus Shawcroft
On 16 January 2015 at 16:21, Christophe Lyon  wrote:

> My existing tests only cover armv7 so far.
> I do plan to expand them once they are all in GCC.
>
>> Otherwise, they look good to me(but I can't approve it).
>>
>> Tejas.
>>

OK provided, as per the previous couple, that we don't regress or
introduce new fails on aarch64[_be] or aarch32.
/Marcus


Re: [[ARM/AArch64][testsuite] 02/36] Be more verbose, and actually confirm that a test was checked.

2015-01-16 Thread Marcus Shawcroft
On 13 January 2015 at 15:18, Christophe Lyon  wrote:
> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (CHECK):
> Add trace.
> (CHECK_FP): Likewise.
> (CHECK_CUMULATIVE_SAT): Likewise.

OK, provided no regressions and no new fails for aarch64, aarch64_be and arm.
/Marcus


Re: [[ARM/AArch64][testsuite] 01/36] Add explicit dependency on Neon Cumulative Saturation flag (QC).

2015-01-16 Thread Marcus Shawcroft
On 13 January 2015 at 15:17, Christophe Lyon  wrote:

> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
> (Set_Neon_Cumulative_Sat): Add parameter.
> (__set_neon_cumulative_sat): Support new parameter.
> * gcc.target/aarch64/advsimd-intrinsics/binary_sat_op.inc
> (TEST_BINARY_SAT_OP1): Call Set_Neon_Cumulative_Sat with new
> argument.
> * gcc.target/aarch64/advsimd-intrinsics/unary_sat_op.inc
> (TEST_UNARY_SAT_OP1): Call Set_Neon_Cumulative_Sat with new
> argument.

OK, provided no regressions and no new fails for aarch64, aarch64_be and arm.
/Marcus


Re: RTL cprop vs. fixed hard regs

2015-01-16 Thread Segher Boessenkool
On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123776.html
> shows gcc-5 miscompiling a powerpc64 linux kernel.  The executive
> summary is that the rs6000 backend has a bug in its RTL description of
> indirect calls.  We specify a parallel containing both the actual call
> and an action that happens after the call, the restore of r2.  The
> restore is simply a memory load:
> (set (reg:DI 2 2)
> (mem/v/c:DI (plus:DI (reg/f:DI 1 1)
> (const_int 40 [0x28])) [0  S8 A8]))
> This leads to cprop concluding that it is valid to replace the
> reference to r1 with another register having the same value before the
> call.  Unfortunately, sometimes a call-clobbered register is chosen.
> 
> OK, so we need to fix this in the rs6000 backend, but it occurs to me
> that cprop also has a bug here.  It shouldn't be touching fixed hard
> registers.

Why not?  It cannot allocate a fixed reg to a pseudo, but other than
that there is nothing special about fixed regs; the transform is
perfectly valid as far as I see.

It isn't a desirable transform in this case, but that is not true for
fixed regs in general (just because the stack pointer is live everywhere).


Segher


[patch libstdc++] Optimize synchronization in std::future if futexes are available.

2015-01-16 Thread Torvald Riegel
This patch optimizes synchronization in std::future by using atomic
operations and futexes if the latter are available, instead of the
mutex/condvar combination in the existing code.  That reduces the space
overhead for futures as well as synchronization runtime overheads.

To do that, the patch introduces an abstraction layer for futexes that
essentially extends an atomic-typed variable with operations that wait
for the variable to (not) have a certain value.  This waiting can then
be implemented internally with a combination of spin-waiting (not
implemented yet) and blocking using OS features such as futexes.  This
approach is similar to what the "synchronic" proposal to ISO C++
(N4195) contains.
The atomic-typed variable is an unsigned int because this is what Linux
futexes currently support as futex variable.

If futexes are not available, the implementation falls back to using a
mutex and condvar.  This leads to very similar code for std::future
compared to the existing code.  The exception is that
_State_baseV2::wait_for and _State_baseV2::wait_until may acquire the
mutex twice if the future is not ready; this may lead to some additional
contention if those functions actually have to wait for the future to
become ready (ie, this is on the slow path).
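
To make the abstraction concrete, here is a minimal sketch of the mutex/condvar fallback, assuming a much simpler interface than the one in the patch. The class waitable_unsigned_sketch and its member names are hypothetical and are not taken from atomic_futex.h; it uses a plain unsigned guarded by the mutex rather than an atomic-typed variable.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical illustration of the fallback: an unsigned value paired
// with a mutex and condition variable, offering "wait until the value
// equals X" and "store and wake waiters" operations.
class waitable_unsigned_sketch
{
  unsigned value_;
  std::mutex mutex_;
  std::condition_variable cond_;

public:
  explicit waitable_unsigned_sketch (unsigned initial) : value_ (initial) {}

  // Returns the current value.
  unsigned
  load ()
  {
    std::lock_guard<std::mutex> lock (mutex_);
    return value_;
  }

  // Blocks until the value equals EXPECTED; returns at once if it
  // already does.
  void
  wait_until_equal (unsigned expected)
  {
    std::unique_lock<std::mutex> lock (mutex_);
    cond_.wait (lock, [&] { return value_ == expected; });
  }

  // Publishes a new value and wakes all waiters.
  void
  store_notify_all (unsigned desired)
  {
    {
      std::lock_guard<std::mutex> lock (mutex_);
      value_ = desired;
    }
    cond_.notify_all ();
  }
};
```

Where futexes are available, the same kind of interface could instead be backed by the unsigned word alone, which is where the space saving described above would come from.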

It would be possible to optimize the space overhead in std::future
further by merging _State_baseV2::_M_retrieved and
_State_baseV2::_M_once into _State_baseV2::_M_status.  This would add
some complexity to the synchronization code, but not a lot.
I'm happy to do this next week if people are interested in this.

Tested on x86_64-linux and the 30_threads/* tests.

OK for trunk?
commit 4b07d1c0ab807fd0fbedacde1e9fec99a7d75b6d
Author: Torvald Riegel 
Date:   Sun Nov 16 12:07:22 2014 +0100

libstdc++: Optimize synchronization in std::future if futexes are available.

	* src/c++11/futex.cc: New file.
	* include/bits/atomic_futex.h: New file.
	* include/std/future (__future_base::_State_baseV2): Use
	atomic_futex_unsigned instead of mutex+condvar.
	* src/c++11/futex.cc: Likewise.
	* include/Makefile.am: Add atomic_futex.h.
	* include/Makefile.in: Likewise.
	* src/c++11/Makefile.am: Add futex.cc.
	* src/c++11/Makefile.in: Likewise.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 700da18..d8d155f 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1759,6 +1759,11 @@ GLIBCXX_3.4.21 {
 _ZNKSt8time_getI[cw]St19istreambuf_iteratorI[cw]St11char_traitsI[cw]EEE3getES3_S3_RSt8ios_baseRSt12_Ios_IostateP2tmPK[cw]SC_;
 _ZNKSt8time_getI[cw]St19istreambuf_iteratorI[cw]St11char_traitsI[cw]EEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateP2tmcc;
 
+extern "C++"
+{
+  std::__atomic_futex_unsigned_base*;
+};
+
 } GLIBCXX_3.4.20;
 
 
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 08e5d5f..4772950 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -83,6 +83,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
 	${bits_srcdir}/basic_string.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 3e5d82e..ebcaa96 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -351,6 +351,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
 	${bits_srcdir}/basic_string.h \
diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
new file mode 100644
index 000..9a418d8
--- /dev/null
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -0,0 +1,288 @@
+// -*- C++ -*- header.
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and C

Re: [PATCH, CHKP] [always_inline 2/2] Fix segfault in SRA

2015-01-16 Thread Jeff Law

On 01/16/15 04:09, Ilya Enkovich wrote:

Hi,

In early SRA, some_callers_have_mismatched_arguments_p is called for a 
function and all its thunks and aliases, but it cannot actually handle a function 
with thunks because it assumes call_stmt for call_edge is not NULL.  This patch 
rejects functions with thunks instead of triggering an ICE.

Bootstrapped and checked on x86_64-unknown-linux-gnu.  Fixes faults revealed by 
the first patch in the series.  OK for trunk?

Thanks,
Ilya
--
2015-01-16  Ilya Enkovich  

* tree-sra.c (some_callers_have_mismatched_arguments_p): Allow thunk
callers.

OK.
jeff



[PATCH][ARM][committed] Move comment about splitting Thumb1 patterns to thumb1.md

2015-01-16 Thread Kyrill Tkachov

Hi all,

Sorry for the rototill, but this bugged me ;)
Since now we have a separate .md file for Thumb1 patterns, I think this 
comment is better placed there.

Applied as obvious with r219755.

Thanks,
Kyrill

2015-01-16  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/arm.md: Move comment about splitting Thumb1 patterns to...
* config/arm/thumb1.md: ... Here.
commit 98c0b0bc525f11407516ced25c8eb7d12c7da1fd
Author: Kyrylo Tkachov 
Date:   Mon Jan 12 13:51:40 2015 +

[ARM] Move comment on Thumb1 splitting to thumb1.md

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index bbefb93..79fd0c6 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -22,25 +22,6 @@
 
 ;;- See file "rtl.def" for documentation on define_insn, match_*, et. al.
 
-;; Beware of splitting Thumb1 patterns that output multiple
-;; assembly instructions, in particular instruction such as SBC and
-;; ADC which consume flags.  For example, in the pattern thumb_subdi3
-;; below, the output SUB implicitly sets the flags (assembled to SUBS)
-;; and then the Carry flag is used by SBC to compute the correct
-;; result.  If we split thumb_subdi3 pattern into two separate RTL
-;; insns (using define_insn_and_split), the scheduler might place
-;; other RTL insns between SUB and SBC, possibly modifying the Carry
-;; flag used by SBC.  This might happen because most Thumb1 patterns
-;; for flag-setting instructions do not have explicit RTL for setting
-;; or clobbering the flags.  Instead, they have the attribute "conds"
-;; with value "set" or "clob".  However, this attribute is not used to
-;; identify dependencies and therefore the scheduler might reorder
-;; these instruction.  Currenly, this problem cannot happen because
-;; there are no separate Thumb1 patterns for individual instruction
-;; that consume flags (except conditional execution, which is treated
-;; differently).  In particular there is no Thumb1 armv6-m pattern for
-;; sbc or adc.
-
 
 ;;---
 ;; Constants
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index ff423d8..b1a5897 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -22,6 +22,27 @@
 ;; Insn patterns
 ;;
 
+;; Beware of splitting Thumb1 patterns that output multiple
+;; assembly instructions, in particular instruction such as SBC and
+;; ADC which consume flags.  For example, in the pattern thumb_subdi3
+;; below, the output SUB implicitly sets the flags (assembled to SUBS)
+;; and then the Carry flag is used by SBC to compute the correct
+;; result.  If we split thumb_subdi3 pattern into two separate RTL
+;; insns (using define_insn_and_split), the scheduler might place
+;; other RTL insns between SUB and SBC, possibly modifying the Carry
+;; flag used by SBC.  This might happen because most Thumb1 patterns
+;; for flag-setting instructions do not have explicit RTL for setting
+;; or clobbering the flags.  Instead, they have the attribute "conds"
+;; with value "set" or "clob".  However, this attribute is not used to
+;; identify dependencies and therefore the scheduler might reorder
+;; these instruction.  Currenly, this problem cannot happen because
+;; there are no separate Thumb1 patterns for individual instruction
+;; that consume flags (except conditional execution, which is treated
+;; differently).  In particular there is no Thumb1 armv6-m pattern for
+;; sbc or adc.
+
+
+
 (define_insn "*thumb1_adddi3"
   [(set (match_operand:DI  0 "register_operand" "=l")
 	(plus:DI (match_operand:DI 1 "register_operand" "%0")

[PATCH 1/2][ARM] PR/63870: Add qualifier to check lane bounds in expand

2015-01-16 Thread Alan Lawrence
This is based loosely upon svn r217440, "[AArch64] Add bounds checking to 
vqdm_lane intrinsics...", but applies to more intrinsics (including e.g. 
vget_lane), and does not do the endianness-flipping present on AArch64: the 
objective is to exactly preserve behaviour on all valid code. (Yes, the new 
qualifier may perhaps give us a location for flipping lanes according to 
endianness in the future, but I'm not doing that here.) Checks for lanes being 
in range for many insns are thus moved from assembly to expand time, so errors 
are reported with inlining history. For example, the previous error message:


vqrdmulh_lane_s16_indices_1.c: In function 'test1':
vqrdmulh_lane_s16_indices_1.c:9:1: error: lane out of range
}
^

becomes:

In file included from vqrdmulh_lane_s16_indices_1.c:3:0:
In function 'vqrdmulh_lane_s16',
inlined from 'test1' at gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_lane_s16_indices_1.c:8:10:
.../install/lib/gcc/arm-none-eabi/5.0.0/include/arm_neon.h:6882:10: error: lane -1 out of range 0 - 3

return (int16x4_t)__builtin_neon_vqrdmulh_lanev4hi (a, b, c);

Note the question of how to common up tests with those in 
gcc.target/aarch64/simd/*_indices_1.c is not resolved by this patch.
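
The reworded diagnostic can be sketched as follows. This is a hedged illustration, not the code in arm.c: lane_bounds_message_sketch is a hypothetical helper, and the assumption that the upper bound is exclusive (so four lanes print as "0 - 3") is inferred from the example message above.

```cpp
#include <string>

// Hypothetical helper: returns an empty string when LANE lies in
// [LOW, HIGH), otherwise a message in the style of the new diagnostic.
std::string
lane_bounds_message_sketch (long lane, long low, long high)
{
  if (lane >= low && lane < high)
    return std::string ();

  // HIGH is treated as exclusive, so the last valid lane is HIGH - 1.
  return "lane " + std::to_string (lane) + " out of range "
         + std::to_string (low) + " - " + std::to_string (high - 1);
}
```

For a four-lane vector, lane_bounds_message_sketch (-1, 0, 4) yields "lane -1 out of range 0 - 3", matching the text in the example output.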


Cross-tested check-gcc on arm-none-eabi
Bootstrapped on arm-none-linux-gnueabihf cortex-a15

gcc/ChangeLog:

* config/arm/arm-builtins.c (enum arm_type_qualifiers):
Add qualifier_lane_index.
(arm_binop_imm_qualifiers, BINOP_IMM_QUALIFIERS): New.
(arm_getlane_qualifiers): Use qualifier_lane_index.
(arm_lanemac_qualifiers): Rename to...
(arm_mac_n_qualifiers): ...this.
(LANEMAC_QUALIFIERS): Rename to...
(MAC_N_QUALIFIERS): ...this.
(arm_mac_lane_qualifiers, MAC_LANE_QUALIFIERS): New.
(arm_setlane_qualifiers): Use qualifier_lane_index.
(arm_ternop_imm_qualifiers, TERNOP_IMM_QUALIFIERS): New.
(enum builtin_arg): Add NEON_ARG_LANE_INDEX.
(arm_expand_neon_args): Handle NEON_ARG_LANE_INDEX.
(arm_expand_neon_builtin): Handle qualifier_lane_index.

* config/arm/arm-protos.h (neon_lane_bounds): Add const_tree parameter.
* config/arm/arm.c (bounds_check): Likewise, improve error message.
(neon_lane_bounds, neon_const_bounds): Add arguments to bounds_check.
* config/arm/arm_neon_builtins.def (vshrs_n, vshru_n, vrshrs_n,
vrshru_n, vshrn_n, vrshrn_n, vqshrns_n, vqshrnu_n, vqrshrns_n,
vqrshrnu_n, vqshrun_n, vqrshrun_n, vshl_n, vqshl_s_n, vqshl_u_n,
vqshlu_n, vshlls_n, vshllu_n): Change qualifiers to BINOP_IMM.
(vsras_n, vsrau_n, vrsras_n, vrsrau_n, vsri_n, vsli_n): Change
qualifiers to TERNOP_IMM.
(vdup_lane): Change qualifiers to GETLANE.
(vmla_lane, vmlals_lane, vmlalu_lane, vqdmlal_lane, vmls_lane,
vmlsls_lane, vmlslu_lane, vqdmlsl_lane): Change qualifiers to MAC_LANE.
(vmla_n, vmlals_n, vmlalu_n, vqdmlal_n, vmls_n, vmlsls_n, vmlslu_n,
vqdmlsl_n): Change qualifiers to MAC_N.

* config/arm/neon.md (neon_vget_lane, neon_vget_laneu,
neon_vget_lanedi, neon_vget_lanev2di, neon_vset_lane,
neon_vset_lanedi, neon_vdup_lane, neon_vdup_lanedi,
neon_vdup_lanev2di, neon_vmul_lane, neon_vmul_lane,
neon_vmull_lane, neon_vqdmull_lane,
neon_vqdmulh_lane, neon_vqdmulh_lane,
neon_vmla_lane, neon_vmla_lane, neon_vmlal_lane,
neon_vqdmlal_lane, neon_vmls_lane, neon_vmls_lane,
neon_vmlsl_lane, neon_vqdmlsl_lane):
Remove call to neon_lane_bounds.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2d2cafe56373fd9fb8cdba9c142c7ac9b188aed1..e7e16c21f619449d395efbfbe4efbda421ffb3a2 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -64,7 +64,9 @@ enum arm_type_qualifiers
   /* qualifier_const_pointer | qualifier_map_mode  */
   qualifier_const_pointer_map_mode = 0x86,
   /* Polynomial types.  */
-  qualifier_poly = 0x100
+  qualifier_poly = 0x100,
+  /* Lane indices - must be within range of previous argument = a vector.  */
+  qualifier_lane_index = 0x200
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -95,21 +97,40 @@ arm_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
-arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate };
+#define BINOP_IMM_QUALIFIERS (arm_binop_imm_qualifiers)
+
+/* T (T, lane index).  */
+static enum arm_type_qualifiers
+arm_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_lane_index };
 #define GETLANE_QUALIFIERS (arm_getlane_qualifiers)
 
 /* T (T, T, T, immediate).  */
 static enum arm_type_qualifiers
-arm_lanemac_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+arm_mac_n_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none,
   qualifier_none, qualifier_immediate };
-#define LANEMAC_QUALIFIERS (arm_lanemac_qualifiers)
+#define MAC_N_QUALIFIERS (arm_mac_n_qualifiers)
+
+/* T (T, T, T, lane

Re: [PATCH, CHKP] [always_inline 1/2] Allow inlining for not instrumented calls to always_inline functions

2015-01-16 Thread Jeff Law

On 01/16/15 04:04, Ilya Enkovich wrote:

Hi,

Currently the compiler emits an error for a non-instrumented call to an 
instrumented always_inline function.  This happens because when we inline, only 
the instrumented version of the function is available and we don't inline thunks.  
This patch solves the problem by splitting the thunk production pass into two 
passes.  The first one removes all functions we don't have to inline.  The 
other one does the rest after local optimizations.

Bootstrapped and checked on x86_64-unknown-linux-gnu.  This patch causes a fault 
in the chkp-strchr.c test and also breaks instrumented bootstrap.  Both are due to 
a problem in SRA fixed by the next patch.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-01-16  Ilya Enkovich  

* ipa-chkp.c (chkp_produce_thunks): Add early param.

 to support splitting of thunk production.  Do not remove bodies of
"always_inline" functions.

Is that an accurate representation of what changed?



(pass_data_ipa_chkp_early_produce_thunks): New.
(pass_ipa_chkp_early_produce_thunks): New.
(pass_ipa_chkp_produce_thunks::execute): Adjust to new
chkp_produce_thunks signature.
(make_pass_ipa_chkp_early_produce_thunks): New.
* passes.def (pass_ipa_chkp_early_produce_thunks): New.
(pass_ipa_chkp_produce_thunks): Move after local optimizations.
* tree-pass.h (make_pass_ipa_chkp_early_produce_thunks): New.

gcc/testsuite/

2015-01-16  Ilya Enkovich  

* gcc.target/i386/chkp-always_inline.c: New.

With the updated ChangeLog, this is fine.

jeff


