Re: [PATCH] Fix PR71132

2016-05-17 Thread H.J. Lu
On Tue, May 17, 2016 at 11:21 AM, H.J. Lu  wrote:
> On Tue, May 17, 2016 at 5:51 AM, Richard Biener  wrote:
>>
>> The following fixes a latent issue in loop distribution caught by
>> the fake edge placement adjustment.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>
>> Richard.
>>
>> 2016-05-17  Richard Biener  
>>
>> PR tree-optimization/71132
>> * tree-loop-distribution.c (create_rdg_cd_edges): Pass in loop.
>> Only add control dependences for blocks in the loop.
>> (build_rdg): Adjust.
>> (generate_code_for_partition): Return whether loop should
>> be destroyed and delay that.
>> (distribute_loop): Likewise.
>> (pass_loop_distribution::execute): Record loops to be destroyed
>> and perform delayed destroying of loops.
>>
>> * gcc.dg/torture/pr71132.c: New testcase.
>>
>
> On x86, this caused:
>
> FAIL: c-c++-common/cilk-plus/AN/builtin_fn_custom.c  -O3 -fcilkplus
> (internal compiler error)
> FAIL: c-c++-common/cilk-plus/AN/builtin_fn_custom.c  -O3 -fcilkplus
> (test for excess errors)
> FAIL: c-c++-common/cilk-plus/AN/builtin_fn_mutating.c  -fcilkplus -O3
> -std=c99 (internal compiler error)
> FAIL: c-c++-common/cilk-plus/AN/builtin_fn_mutating.c  -fcilkplus -O3
> -std=c99 (test for excess errors)
> FAIL: c-c++-common/cilk-plus/AN/builtin_fn_mutating.c  -O3 -fcilkplus
> (internal compiler error)
> FAIL: c-c++-common/cilk-plus/AN/builtin_fn_mutating.c  -O3 -fcilkplus
> (test for excess errors)
> FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -fcilkplus -O3
> -std=c99 (internal compiler error)
> FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -fcilkplus -O3
> -std=c99 (test for excess errors)
> FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -O3 -fcilkplus
> (internal compiler error)
> FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -O3 -fcilkplus
> (test for excess errors)
> FAIL: c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c
> -fcilkplus -O3 -std=c99 (internal compiler error)
> FAIL: c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c
> -fcilkplus -O3 -std=c99 (test for excess errors)
> FAIL: c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c  -O3
> -fcilkplus (internal compiler error)
> FAIL: c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c  -O3
> -fcilkplus (test for excess errors)
> FAIL: gcc.c-torture/compile/pr32399.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> compiler error)
> FAIL: gcc.c-torture/compile/pr32399.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
> excess errors)
> FAIL: gcc.c-torture/compile/pr32399.c   -O3 -g  (internal compiler error)
> FAIL: gcc.c-torture/compile/pr32399.c   -O3 -g  (test for excess errors)
> FAIL: gcc.c-torture/execute/20010221-1.c   -O3 -g  (internal compiler error)
> FAIL: gcc.c-torture/execute/20010221-1.c   -O3 -g  (test for excess errors)
> FAIL: gcc.c-torture/execute/20120919-1.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> compiler error)
> FAIL: gcc.c-torture/execute/20120919-1.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
> excess errors)
> FAIL: gcc.c-torture/execute/20120919-1.c   -O3 -g  (internal compiler error)
> FAIL: gcc.c-torture/execute/20120919-1.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/pr61383-1.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> compiler error)
> FAIL: gcc.dg/torture/pr61383-1.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
> excess errors)
> FAIL: gcc.dg/torture/pr69452.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> compiler error)
> FAIL: gcc.dg/torture/pr69452.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
> excess errors)
> FAIL: gcc.dg/torture/pr69452.c   -O3 -g  (internal compiler error)
> FAIL: gcc.dg/torture/pr69452.c   -O3 -g  (test for excess errors)
> FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -g -O3
> -fcilkplus (internal compiler error)
> FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -g -O3
> -fcilkplus (test for excess errors)
> FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -O3 -fcilkplus
> (internal compiler error)
> FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -O3 -fcilkplus
> (test for excess errors)
> FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -O3
> -ftree-vectorize -fcilkplus -g (internal compiler error)
> FAIL: 

Re: [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.

2016-05-17 Thread Joseph Myers
On Tue, 17 May 2016, Matthew Wahab wrote:

> In some tests, there are unavoidable differences in precision when
> calculating the actual and the expected results of an FP16 operation. A
> new support function CHECK_FP_BIAS is used so that these tests can check
> for an acceptable margin of error. In these tests, the tolerance is
> given as the absolute integer difference between the bitvectors of the
> expected and the actual results.

As far as I can see, CHECK_FP_BIAS is only used in the following patch, 
but there is another bias test in vsqrth_f16_1.c in this patch.
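[Editor's illustration: the bitvector-tolerance idea described in the quoted text, comparing the raw binary16 encodings of the expected and actual results, can be sketched in a few lines. This is a toy Python model, not the actual testsuite macro; the names `half_bits` and `check_fp_bias` are invented here for illustration.]

```python
import struct

def half_bits(x: float) -> int:
    # Bit pattern of x after rounding to IEEE binary16.
    return struct.unpack('<H', struct.pack('<e', x))[0]

def check_fp_bias(expected: float, actual: float, bias: int) -> bool:
    # Accept ACTUAL when its binary16 encoding differs from EXPECTED's
    # by at most BIAS (roughly: at most BIAS units in the last place).
    return abs(half_bits(expected) - half_bits(actual)) <= bias

print(check_fp_bias(1.0, 1.0009765625, 1))  # adjacent binary16 values: True
```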

Could you clarify where the "unavoidable differences in precision" come 
from?  Are the results of some of the new instructions not fully 
specified, only specified within a given precision?  (As far as I can tell 
the existing v8 instructions for reciprocal and reciprocal square root 
estimates do have fully defined results, despite being loosely described 
as estimates.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

2016-05-17 Thread Joseph Myers
On Tue, 17 May 2016, Matthew Wahab wrote:

> As with the VFP FP16 arithmetic instructions, operations on __fp16
> values are done by conversion to single-precision. Any new optimization
> supported by the instruction descriptions can only apply to code
> generated using intrinsics added in this patch series.

As with the scalar instructions, I think it is legitimate in most cases to 
optimize arithmetic via single precision to work direct on __fp16 values 
(and this would be natural for vectorization of __fp16 arithmetic).

> A number of the instructions are modelled as two variants, one using
> UNSPEC and the other using RTL operations, with the model used decided
> by the funsafe-math-optimizations flag. This follows the
> single-precision instructions and is due to the half-precision
> operations having the same conditions and restrictions on their use in
optimizations (when they are enabled).

(Of course, these restrictions still apply.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17 Thread Joseph Myers
On Wed, 18 May 2016, Joseph Myers wrote:

> But why do you need to force that?  If the instructions follow IEEE 
> semantics including for exceptions and rounding modes, then X OP Y 
> computed directly with binary16 arithmetic has the same value as results 
> from promoting to binary32, doing binary32 arithmetic and converting back 
> to binary16, for OP in + - * /.  (Double-rounding problems can only occur 

I should say: this is not the case for fma - (__fp16) fmaf (a, b, c) need 
not be the same as fmaf16 (a, b, c) for fp16 values a, b, c - but I think 
you should use the standard instruction name there as well - if the 
instruction is a fused multiply-add on binary16, it should be described as 
such.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17 Thread Joseph Myers
On Tue, 17 May 2016, Matthew Wahab wrote:

> In most cases the instructions are added using non-standard pattern
> names. This is to force operations on __fp16 values to be done, by
> conversion, using the single-precision instructions. The exceptions are
> the precision preserving operations ABS and NEG.

But why do you need to force that?  If the instructions follow IEEE 
semantics including for exceptions and rounding modes, then X OP Y 
computed directly with binary16 arithmetic has the same value as results 
from promoting to binary32, doing binary32 arithmetic and converting back 
to binary16, for OP in + - * /.  (Double-rounding problems can only occur 
in round-to-nearest and if the binary32 result is exactly half way between 
two representable binary16 values but the exact result is not exactly half 
way between.  It's obvious that this can't occur for + - * and only a bit 
harder to see this for /.  According to the logic used in 
convert.c:convert_to_real_1, double rounding can't occur in this case for 
square root either, though I haven't verified that.)

So I'd expect e.g.

__fp16 a, b;
__fp16 c = a / b;

to generate the new instructions, because direct binary16 arithmetic is a 
correct implementation of (__fp16) ((float) a / (float) b).

(ISO C, even with DTS 18661-5, does not concern itself with the number of 
times an expression raises a given exception beyond whether that is zero 
or nonzero, so changes between two and one instances of "inexact" are not 
a concern.)
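[Editor's illustration: the no-double-rounding claim for division can be checked empirically. The following standalone Python sketch, not part of any patch here, uses the `struct` module's binary16 support to model the roundings, and compares rounding a quotient straight to binary16 against the promote-to-binary32-then-narrow route that the quoted text argues is equivalent.]

```python
import itertools
import struct

def to_half(x: float) -> float:
    # Round a Python float (binary64) to IEEE binary16 and back.
    return struct.unpack('e', struct.pack('e', x))[0]

def to_single(x: float) -> float:
    # Round a Python float (binary64) to IEEE binary32 and back.
    return struct.unpack('f', struct.pack('f', x))[0]

# A spread of exactly-representable binary16 values (all normal, positive,
# chosen so every quotient below stays within binary16 range).
samples = [m * 2.0 ** e
           for e in range(-6, 7, 3)
           for m in (1.0, 1.0009765625, 1.5, 1.9990234375)]

mismatches = 0
for a, b in itertools.product(samples, repeat=2):
    q = a / b                      # binary64 quotient of two binary16 values
    direct = to_half(q)            # round straight to binary16
    via32 = to_half(to_single(q))  # round to binary32 first, then to binary16
    if direct != via32:
        mismatches += 1

print(mismatches)  # 0 if narrowing via binary32 never changes the result
```

Over this sample the two routes agree everywhere, consistent with the argument above that a binary32 quotient cannot land exactly halfway between two binary16 values unless the exact quotient does.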

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix bootstrap on hppa*-*-hpux*

2016-05-17 Thread John David Anglin
r235550 introduced the use of long long, and the macros LLONG_MIN and 
LLONG_MAX.  These macros
are not defined by default and we need to include <climits> when compiling with 
c++ to define them.

Tested on hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11.  Okay for trunk?

Dave
--
John David Anglin   dave.ang...@bell.net


2016-05-17  John David Anglin  

PR bootstrap/71014
* system.h: Include climits instead of limits.h when compiling with c++.

Index: system.h
===
--- system.h(revision 236287)
+++ system.h(working copy)
@@ -301,8 +301,12 @@
 # undef m_slot
 #endif
 
-#if HAVE_LIMITS_H
-# include <limits.h>
+#ifdef __cplusplus
+# include <climits>
+#else
+# if HAVE_LIMITS_H
+#  include <limits.h>
+# endif
 #endif
 
 /* A macro to determine whether a VALUE lies inclusively within a


Re: [PATCH 2/4] BRIG (HSAIL) frontend: The FE itself.

2016-05-17 Thread Joseph Myers
This patch has many improperly formatted diagnostic messages (e.g. 
starting with capital letters, ending with '.' or failing to use %q for 
quoting).  I also note cases where you use %lu as a format for size_t, 
which is not correct (you'd need to add pretty-print.c support for %zu 
before you could use that, however).
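[Editor's illustration: the conventions referred to above (lowercase start, no trailing period, %q-style quoting) can be modeled by a small checker. This is an illustrative Python sketch of the rules as stated, not anything in the GCC sources.]

```python
import re

def diagnostic_style_issues(msg: str) -> list:
    # Flag the GCC diagnostic-message style violations mentioned above.
    issues = []
    if msg[:1].isupper():
        issues.append('starts with a capital letter')
    if msg.endswith('.'):
        issues.append('ends with a period')
    if re.search(r"'[^']*'", msg):
        issues.append("quotes with plain '...' instead of %q directives")
    return issues

print(diagnostic_style_issues("Invalid operand."))
# ['starts with a capital letter', 'ends with a period']
```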

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 0/4] BRIG (HSAIL) frontend

2016-05-17 Thread Joseph Myers
On Mon, 16 May 2016, Pekka Jääskeläinen wrote:

> The diffstat is as follows:

I don't see any .texi files in this diffstat.  New front ends need all 
relevant documentation updated.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH #2] Add PowerPC ISA 3.0 word splat and byte immediate splat support

2016-05-17 Thread Michael Meissner
On Tue, May 17, 2016 at 06:45:49PM -0400, Michael Meissner wrote:
> As I mentioned in the last message, my previous patch had some problems that
> showed up on big endian systems, using RELOAD (one of the tests that failed 
> was
> the vshuf-v32qi.c test in the testsuite).  Little endian and IRA compiled
> the test fine.  This patch fixes the problem.  I went over the alternatives in
> the vsx_mov_{32bit,64bit} patterns, and I removed the '*' constraints,
> and checked all of the other constraints.
> 
> Just to be sure, I build the Spec 2006 benchmark with this patch with the
> following variants, and all built fine (using svn id 236136 as the base
> revision).
> 
>   32-bit power7 big endian, reload
>   32-bit power7 big endian, ira
>   64-bit power7 big endian, reload
>   64-bit power7 big endian, ira
>   64-bit power8 little endian, reload
>   64-bit power8 little endian, ira
> 
> I also built the power8 little endian versions with subversion id 236325, and
> all built fine.  I am having trouble with 236325 on my big endian power7
> system, but it fails with both the straight trunk, and with these patches, so
> it is something else.

FWIW, the problem after subversion id 236136 shows up when the trunk compiler
is built with the host compiler (4.3.4).  I build compilers to build Spec using
the configuration option --with-advance-toolchain=at7.0 so that the libraries
and include files from the Advance Toolchain are used instead of the host
libraries and include files.  The --with-advance-toolchain=at7.0 does not work
as well with a bootstrap compiler, so I tend to build a non-bootstrap compiler
to build spec objects with.

If I build it with the Advance Toolchain (AT 7.0), or presumably with a
bootstrap compiler from trunk, it builds the integer portion of Spec 2006 for
both reload and lra without error (building with the host compiler, 9 of the 12
SpecInt benchmarks would fail).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[PATCH #2] Add PowerPC ISA 3.0 word splat and byte immediate splat support

2016-05-17 Thread Michael Meissner
As I mentioned in the last message, my previous patch had some problems that
showed up on big endian systems, using RELOAD (one of the tests that failed was
the vshuf-v32qi.c test in the testsuite).  Little endian and IRA compiled
the test fine.  This patch fixes the problem.  I went over the alternatives in
the vsx_mov_{32bit,64bit} patterns, and I removed the '*' constraints,
and checked all of the other constraints.

Just to be sure, I build the Spec 2006 benchmark with this patch with the
following variants, and all built fine (using svn id 236136 as the base
revision).

32-bit power7 big endian, reload
32-bit power7 big endian, ira
64-bit power7 big endian, reload
64-bit power7 big endian, ira
64-bit power8 little endian, reload
64-bit power8 little endian, ira

I also built the power8 little endian versions with subversion id 236325, and
all built fine.  I am having trouble with 236325 on my big endian power7
system, but it fails with both the straight trunk, and with these patches, so
it is something else.

The patches bootstrap and pass regression tests on both little endian power8
and big endian power7 systems.  Are these patches ok to install in the trunk?
After a burn-in period, will they be ok to back port to the GCC 6.2 branch?

[gcc]
2016-05-17  Michael Meissner  

PR target/70915
* config/rs6000/constraints.md (wE constraint): New constraint
for a vector constant that can be loaded with XXSPLTIB.
(wM constraint): New constraint for a vector constant of all 1's.
(wS constraint): New constraint for a vector constant that can be
loaded with XXSPLTIB and a vector sign extend instruction.
* config/rs6000/predicates.md (xxspltib_constant_split): New
predicates for wE/wS constraints.
(xxspltib_constant_nosplit): Likewise.
(easy_vector_constant): Add support for constants that can be
loaded via XXSPLTIB.
(all_ones_constant): New predicate for vector constant with all
1's set.
(splat_input_operand): Add support for ISA 3.0 word splat
operations.
* config/rs6000/rs6000.c (xxspltib_constant_p): New function to
return if a constant can be loaded with the ISA 3.0 XXSPLTIB
instruction and possibly with a sign extension.
(output_vec_const_move): Add support for XXSPLTIB. If we are
loading up 0/-1 into Altivec registers, prefer using VSPLTISW
instead of XXLXOR/XXLORC.
(rs6000_expand_vector_init): Add support for ISA 3.0 word splat
operations.
(rs6000_legitimize_reload_address): Likewise.
(rs6000_output_move_128bit): Use output_vec_const_move to emit
constants.
* config/rs6000/vsx.md (VSX_M): Add TImode (if -mvsx-timode) and
combine VSX_M and VSX_M2 into one iterator.
(VSX_M2): Likewise.
(VSINT_84): New iterators for loading constants with XXSPLTIB.
(VSINT_842): Likewise.
(UNSPEC_VSX_SIGN_EXTEND): New UNSPEC.
(xxspltib_v16qi): New insns to load up constants with the ISA 3.0
XXSPLTIB instruction.
(xxspltib__nosplit): Likewise.
(xxspltib__split): New insn to load up constants with
XXSPLTIB and a sign extend instruction.
(vsx_mov): Replace single move that handled all vector types
with separate 32-bit and 64-bit moves.  Combine the movti_
moves (when -mvsx-timode is in effect) into the main vector
moves.  Eliminate separate moves for  , where the
preferred register class () is listed first, and the
secondary register class () is listed second with a '?' to
discourage use.  Prefer loading 0/-1 in any VSX register for ISA
3.0, and Altivec registers for ISA 2.06/2.07 (PR target/70915) so
that if the register was involved in a slow operation, the
clear/set operation does not wait for the slow operation to
finish.  Adjust the length attributes for 32-bit mode.  Use
rs6000_output_move_128bit and drop the use of the string
instructions for 32-bit movti when -mvsx-timode is in effect.  Use
spacing so that the alternatives and attributes don't generate
long lines, and put things in columns, so that it is easier to
match up the operands and attributes with the insn alternatives.
(vsx_mov_64bit): Likewise.
(vsx_mov_32bit): Likewise.
(vsx_movti_64bit): Fold movti into normal vector moves.
(vsx_movti_32bit): Likewise.
(vsx_splat_, V4SI/V4SF modes): Add support for ISA 3.0 word
splat instructions.
(vsx_splat_v4si_internal): Likewise.
(vsx_splat_v4sf_internal): Likewise.
(vector fusion peepholes): Use VSX_M instead of VSX_M2.
(vsx_sign_extend_qi_): New ISA 3.0 instructions to sign
extend vector elements.
(vsx_sign_extend_hi_): Likewise.
  

Re: [PATCH 3/3] jit: implement gcc_jit_rvalue_set_bool_require_tail_call

2016-05-17 Thread Trevor Saunders
On Tue, May 17, 2016 at 06:01:32PM -0400, David Malcolm wrote:
> This implements the libgccjit support for must-tail-call via
> a new:
>   gcc_jit_rvalue_set_bool_require_tail_call
> API entrypoint.

It seems to me like that's not a great name, the rvalue and bool parts
are just about the argument types, not what the function does.  Wouldn't
gcc_jit_set_call_requires_tail_call be better?

Trev



Re: [PATCH 1/3] function: Do the CLEANUP_EXPENSIVE after shrink-wrapping, not before

2016-05-17 Thread Eric Botcazou
> I built cross-compilers for 30 targets, and built Linux with that.
> 6 of those failed for unrelated reasons.  Of the 24 that do build,
> five show a few insns difference between having a cleanup_cfg before
> shrink-wrapping or not (CLEANUP_EXPENSIVE made no difference there).
> These targets are s390, blackfin, m68k, mn10300, nios2.

Interesting, thanks for experimenting.

> It turns out that prepare_shrink_wrap *does* care about block structure:
> namely, it only moves insns from the "head" block to a successor.  It
> then makes a difference when the cleanup_cfg can merge two successor blocks
> (say, the first is a forwarder block).  This happens quite rarely.
> 
> So, shall I put a cleanup_cfg back before shrink-wrapping?

Yes, I'd only swap CLEANUP_EXPENSIVE with 0, i.e. leave the calls themselves
and add a comment on the first one saying that this helps shrink-wrapping too.

-- 
Eric Botcazou


Re: [PATCH 1/3] function: Do the CLEANUP_EXPENSIVE after shrink-wrapping, not before

2016-05-17 Thread Segher Boessenkool
On Tue, May 17, 2016 at 04:17:58AM -0500, Segher Boessenkool wrote:
> On Tue, May 17, 2016 at 11:08:53AM +0200, Eric Botcazou wrote:
> > > How would it?  The shrink-wrapping algorithms do not much care how you
> > > write your control flow.  The only things I can think of are drastic
> > > things like removing some dead code, or converting a switch to a direct
> > > jump, but those had better be done for the immediately preceding passes
> > > already (register allocation).
> > 
> > But the compiler didn't wait until after shrink-wrapping to emit multiple 
> > epilogues and can still do that w/o shrink-wrapping.
> 
> It will only ever generate a single epilogue (unless you also count
> sibcall epilogues), and that is done after shrink-wrapping.  Or you mean
> something else and I just don't see it.
> 
> > > I can put back a  cleanup_cfg (0)  in front if that seems less tricky
> > > (or just safer)?
> > 
> > I think you need to evaluate the effects of the change on a set of sources.
> 
> Yeah I'll do that, thanks for the idea.

I built cross-compilers for 30 targets, and built Linux with that.
6 of those failed for unrelated reasons.  Of the 24 that do build,
five show a few insns difference between having a cleanup_cfg before
shrink-wrapping or not (CLEANUP_EXPENSIVE made no difference there).
These targets are s390, blackfin, m68k, mn10300, nios2.

It turns out that prepare_shrink_wrap *does* care about block structure:
namely, it only moves insns from the "head" block to a successor.  It
then makes a difference when the cleanup_cfg can merge two successor blocks
(say, the first is a forwarder block).  This happens quite rarely.

So, shall I put a cleanup_cfg back before shrink-wrapping?

[ I'm now also looking at what patch #3 (and #2) change; also small stuff ].


Segher


Re: [C++ Patch] PR 70466 ("ICE on invalid code in tree check: expected constructor, have parm_decl in convert_like_real...")

2016-05-17 Thread Paolo Carlini

Hi,

On 17/05/2016 20:15, Jason Merrill wrote:

On 05/17/2016 04:47 AM, Paolo Carlini wrote:

... alternately, if the substance of my patchlet is right, we could
simplify a bit the logic per the below.


Here's a well-formed variant that was accepted by 4.5.  Does your 
patch fix it?  I also think with your patch we can drop the C++11 
check, since list-initialization doesn't exist in C++98.
Oh nice, the new testcase indeed passes with my patch. However, completely 
removing the C++11 check causes a regression in C++98 mode for 
init/explicit1.C: we start warning for it:


struct A { explicit A(int = 0); };
struct B { A a; };

int main()
{
  B b = {};// { dg-warning "explicit" "" { target c++11 } }
}

(just checked, apparently clang too accepts init/explicit1.C in c++98)

Thanks,
Paolo.





[PATCH 1/3] Introduce can_implement_as_sibling_call_p

2016-05-17 Thread David Malcolm
This patch moves part of the logic for determining if tail
call optimizations are possible to a new helper function.

There are no functional changes.

expand_call is 1300 lines long, so there's arguably a
case for doing this on its own, but this change also
enables the followup patch.

The patch changes the logic from a big "if" with joined
|| clauses:

  if (first_problem ()
  ||second_problem ()
  /* ...etc... */
  ||final_problem ())
 try_tail_call = 0;

to a series of separate tests:

  if (first_problem ())
return false;
  if (second_problem ())
return false;
  /* ...etc... */
  if (final_problem ())
return false;

I think the latter form has several advantages over the former:
- IMHO it's easier to read
- it makes it easy to put breakpoints on individual causes of failure
- it makes it easy to put specific error messages on individual causes
  of failure (as done in the followup patch).

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* calls.c (expand_call): Move "Rest of purposes for tail call
optimizations to fail" to...
(can_implement_as_sibling_call_p): ...this new function, and
split into multiple "if" statements.
---
 gcc/calls.c | 114 
 1 file changed, 76 insertions(+), 38 deletions(-)

diff --git a/gcc/calls.c b/gcc/calls.c
index 6cc1fc7..ac8092c 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -2344,6 +2344,78 @@ avoid_likely_spilled_reg (rtx x)
   return x;
 }
 
+/* Helper function for expand_call.
+   Return false if EXP is not implementable as a sibling call.  */
+
+static bool
+can_implement_as_sibling_call_p (tree exp,
+rtx structure_value_addr,
+tree funtype,
+int reg_parm_stack_space,
+tree fndecl,
+int flags,
+tree addr,
+const args_size &args_size)
+{
+  if (!targetm.have_sibcall_epilogue ())
+return false;
+
+  /* Doing sibling call optimization needs some work, since
+ structure_value_addr can be allocated on the stack.
+ It does not seem worth the effort since few optimizable
+ sibling calls will return a structure.  */
+  if (structure_value_addr != NULL_RTX)
+return false;
+
+#ifdef REG_PARM_STACK_SPACE
+  /* If outgoing reg parm stack space changes, we can not do sibcall.  */
+  if (OUTGOING_REG_PARM_STACK_SPACE (funtype)
+  != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl))
+  || (reg_parm_stack_space != REG_PARM_STACK_SPACE (current_function_decl)))
+return false;
+#endif
+
+  /* Check whether the target is able to optimize the call
+ into a sibcall.  */
+  if (!targetm.function_ok_for_sibcall (fndecl, exp))
+return false;
+
+  /* Functions that do not return exactly once may not be sibcall
+ optimized.  */
+  if (flags & (ECF_RETURNS_TWICE | ECF_NORETURN))
+return false;
+
+  if (TYPE_VOLATILE (TREE_TYPE (TREE_TYPE (addr
+return false;
+
+  /* If the called function is nested in the current one, it might access
+ some of the caller's arguments, but could clobber them beforehand if
+ the argument areas are shared.  */
+  if (fndecl && decl_function_context (fndecl) == current_function_decl)
+return false;
+
+  /* If this function requires more stack slots than the current
+ function, we cannot change it into a sibling call.
+ crtl->args.pretend_args_size is not part of the
+ stack allocated by our caller.  */
+  if (args_size.constant > (crtl->args.size - crtl->args.pretend_args_size))
+return false;
+
+  /* If the callee pops its own arguments, then it must pop exactly
+ the same number of arguments as the current function.  */
+  if (targetm.calls.return_pops_args (fndecl, funtype, args_size.constant)
+  != targetm.calls.return_pops_args (current_function_decl,
+TREE_TYPE (current_function_decl),
+crtl->args.size))
+return false;
+
+  if (!lang_hooks.decls.ok_for_sibcall (fndecl))
+return false;
+
+  /* All checks passed.  */
+  return true;
+}
+
 /* Generate all the code for a CALL_EXPR exp
and return an rtx for its value.
Store the value in TARGET (specified as an rtx) if convenient.
@@ -2740,44 +2812,10 @@ expand_call (tree exp, rtx target, int ignore)
 try_tail_call = 0;
 
   /*  Rest of purposes for tail call optimizations to fail.  */
-  if (!try_tail_call
-  || !targetm.have_sibcall_epilogue ()
-  /* Doing sibling call optimization needs some work, since
-structure_value_addr can be allocated on the stack.
-It does not seem worth the effort since few optimizable
-sibling calls will return a structure.  */
-  || structure_value_addr != NULL_RTX
-#ifdef 

[PATCH 2/3] Implement CALL_EXPR_MUST_TAIL_CALL

2016-05-17 Thread David Malcolm
This patch implements support for marking CALL_EXPRs
as being mandatory for tail-call-optimization. expand_call
tries harder to perform the optimization on such CALL_EXPRs,
and issues an error if it fails.

Currently this flag isn't accessible from any frontend,
so the patch uses a plugin for testing the functionality.

Successfully bootstrapped on x86_64-pc-linux-gnu;
adds 8 PASS results to gcc.sum.

OK for trunk?

gcc/ChangeLog:
* calls.c (maybe_complain_about_tail_call): New function.
(initialize_argument_information): Call
maybe_complain_about_tail_call when clearing *may_tailcall.
(can_implement_as_sibling_call_p): Call
maybe_complain_about_tail_call when returning false.
(expand_call): Read CALL_EXPR_MUST_TAIL_CALL and, if set,
ensure try_tail_call is set.  Call maybe_complain_about_tail_call
if tail-call optimization fails.
* cfgexpand.c (expand_call_stmt): Initialize
CALL_EXPR_MUST_TAIL_CALL from gimple_call_must_tail_p.
* gimple-pretty-print.c (dump_gimple_call): Dump
gimple_call_must_tail_p.
* gimple.c (gimple_build_call_from_tree): Call
gimple_call_set_must_tail with the value of
CALL_EXPR_MUST_TAIL_CALL.
* gimple.h (enum gf_mask): Add GF_CALL_MUST_TAIL_CALL.
(gimple_call_set_must_tail): New function.
(gimple_call_must_tail_p): New function.
* print-tree.c (print_node): Update printing of TREE_STATIC
to reflect its use for CALL_EXPR_MUST_TAIL_CALL.
* tree-core.h (struct tree_base): Add MUST_TAIL_CALL to the
trailing comment listing applicable flags.
* tree.h (CALL_EXPR_MUST_TAIL_CALL): New macro.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/must-tail-call-1.c: New test case.
* gcc.dg/plugin/must-tail-call-2.c: New test case.
* gcc.dg/plugin/must_tail_call_plugin.c: New file.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above.
---
 gcc/calls.c| 123 ++---
 gcc/cfgexpand.c|   1 +
 gcc/gimple-pretty-print.c  |   2 +
 gcc/gimple.c   |   1 +
 gcc/gimple.h   |  20 
 gcc/print-tree.c   |   2 +-
 gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c |  22 
 gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c |  58 ++
 .../gcc.dg/plugin/must_tail_call_plugin.c  |  76 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp |   3 +
 gcc/tree-core.h|   3 +
 gcc/tree.h |   5 +
 12 files changed, 299 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/must_tail_call_plugin.c

diff --git a/gcc/calls.c b/gcc/calls.c
index ac8092c..1b12eca 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1102,6 +1102,19 @@ store_unaligned_arguments_into_pseudos (struct arg_data *args, int num_actuals)
   }
 }
 
+/* Issue an error if CALL_EXPR was flagged as requiring
+   tail-call optimization.  */
+
+static void
+maybe_complain_about_tail_call (tree call_expr, const char *reason)
+{
+  gcc_assert (TREE_CODE (call_expr) == CALL_EXPR);
+  if (!CALL_EXPR_MUST_TAIL_CALL (call_expr))
+return;
+
+  error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
+}
+
 /* Fill in ARGS_SIZE and ARGS array based on the parameters found in
CALL_EXPR EXP.
 
@@ -1343,7 +1356,13 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
  /* We can't use sibcalls if a callee-copied argument is
 stored in the current function's frame.  */
  if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base))
-   *may_tailcall = false;
+   {
+ *may_tailcall = false;
+ maybe_complain_about_tail_call (exp,
+ "a callee-copied argument is"
+ " stored in the current"
+ " function's frame");
+   }
 
  args[i].tree_value = build_fold_addr_expr_loc (loc,
 args[i].tree_value);
@@ -1406,6 +1425,9 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED,
= build_fold_addr_expr_loc (loc, make_tree (type, copy));
  type = TREE_TYPE (args[i].tree_value);
  *may_tailcall = false;
+ maybe_complain_about_tail_call (exp,
+ "argument must be passed"
+ " by copying");
   

[PATCH 3/3] jit: implement gcc_jit_rvalue_set_bool_require_tail_call

2016-05-17 Thread David Malcolm
This implements the libgccjit support for must-tail-call via
a new:
  gcc_jit_rvalue_set_bool_require_tail_call
API entrypoint.

(I didn't implement a wrapper for this within the C++ bindings)

Successfully bootstrapped on x86_64-pc-linux-gnu.

gcc/jit/ChangeLog:
* docs/topics/compatibility.rst: Add LIBGCCJIT_ABI_6.
* docs/topics/expressions.rst (Function calls): Add documentation
of gcc_jit_rvalue_set_bool_require_tail_call.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* jit-common.h (gcc::jit::recording::base_call): Add forward decl.
* jit-playback.c: Within namespace gcc::jit::playback...
(context::build_call) Add "require_tail_call" param and use it
to set CALL_EXPR_MUST_TAIL_CALL.
(context::new_call): Add "require_tail_call" param.
(context::new_call_through_ptr): Likewise.
* jit-playback.h: Within namespace gcc::jit::playback...
(context::new_call): Add "require_tail_call" param.
(context::new_call_through_ptr): Likewise.
(context::build_call): Likewise.
* jit-recording.c: Within namespace gcc::jit::recording...
(base_call::base_call): New constructor.
(base_call::write_reproducer_tail_call): New method.
(call::call): Update for inheritance from base_call.
(call::replay_into): Provide m_require_tail_call to call
to new_call.
(call::write_reproducer): Call write_reproducer_tail_call.
(call_through_ptr::call_through_ptr): Update for inheritance from
base_call.
(call_through_ptr::replay_into): Provide m_require_tail_call to call
to new_call_through_ptr.
(recording::call_through_ptr::write_reproducer): Call
write_reproducer_tail_call.
* jit-recording.h: Within namespace gcc::jit::recording...
(rvalue::dyn_cast_base_call): New virtual function.
(class base_call): New subclass of class rvalue.
(class call): Inherit from base_call rather than directly from
rvalue, moving get_precedence and m_args to base_call.
(class call_through_ptr): Likewise.
* libgccjit.c (gcc_jit_rvalue_set_bool_require_tail_call): New
function.
* libgccjit.h
(LIBGCCJIT_HAVE_gcc_jit_rvalue_set_bool_require_tail_call): New
macro.
(gcc_jit_rvalue_set_bool_require_tail_call): New function.
* libgccjit.map (LIBGCCJIT_ABI_6): New.
(gcc_jit_rvalue_set_bool_require_tail_call): Add.

gcc/testsuite/ChangeLog:
* jit.dg/all-non-failing-tests.h: Add
test-factorial-must-tail-call.c.
* jit.dg/test-error-impossible-must-tail-call.c: New test case.
* jit.dg/test-factorial-must-tail-call.c: New test case.
---
 gcc/jit/docs/topics/compatibility.rst  |   7 ++
 gcc/jit/docs/topics/expressions.rst|  24 +
 gcc/jit/jit-common.h   |   1 +
 gcc/jit/jit-playback.c |  23 +++--
 gcc/jit/jit-playback.h |   9 +-
 gcc/jit/jit-recording.c|  60 +---
 gcc/jit/jit-recording.h|  46 ++---
 gcc/jit/libgccjit.c|  20 
 gcc/jit/libgccjit.h|  13 +++
 gcc/jit/libgccjit.map  |   5 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h   |  10 ++
 .../jit.dg/test-error-impossible-must-tail-call.c  |  93 ++
 .../jit.dg/test-factorial-must-tail-call.c | 109 +
 13 files changed, 382 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-error-impossible-must-tail-call.c
 create mode 100644 gcc/testsuite/jit.dg/test-factorial-must-tail-call.c

diff --git a/gcc/jit/docs/topics/compatibility.rst 
b/gcc/jit/docs/topics/compatibility.rst
index d9eacf2..7abd050 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -135,3 +135,10 @@ entrypoints:
 -------------------
 ``LIBGCCJIT_ABI_5`` covers the addition of
 :func:`gcc_jit_context_set_bool_use_external_driver`
+
+.. _LIBGCCJIT_ABI_6:
+
+``LIBGCCJIT_ABI_6``
+-------------------
+``LIBGCCJIT_ABI_6`` covers the addition of
+:func:`gcc_jit_rvalue_set_bool_require_tail_call`
diff --git a/gcc/jit/docs/topics/expressions.rst 
b/gcc/jit/docs/topics/expressions.rst
index 0445332..261483c 100644
--- a/gcc/jit/docs/topics/expressions.rst
+++ b/gcc/jit/docs/topics/expressions.rst
@@ -424,6 +424,30 @@ Function calls
 
   The same caveat as for :c:func:`gcc_jit_context_new_call` applies.
 
+.. function:: void\
+  gcc_jit_rvalue_set_bool_require_tail_call (gcc_jit_rvalue *call,\
+ int require_tail_call)
+
+   Given an :c:type:`gcc_jit_rvalue *` for a call created through
+   :c:func:`gcc_jit_context_new_call` or
+   

[ptx] More test tweaks

2016-05-17 Thread Nathan Sidwell

This adjusts a few more tests:
* I'd missed the optimization glob on a ptx skip-if, so it wasn't being skipped.
* An asm test relied on the register allocator being run to assign an input to 
the same register as an output.
* An atomic test operated on automatic storage, which doesn't work on ptx -- 
atomic operations are only valid for global or shared memory.  Allocating it to 
static storage allows it to pass.
* PTX doesn't follow the IEEE rules for one or more of denormals, signed 
zeros, and NaN signedness.


nathan
2016-05-17  Nathan Sidwell  

	* gcc.c-torture/execute/20030222-1.c: Skip on ptx.
	* gcc.dg/pr68671.c: Fix ptx xfail-if.
	* gcc.dg/torture/pr54261-1.c: Allocate atomic var statically.
	* gcc.dg/torture/type-generic-1.c: Enable UNSAFE for ptx.

Index: gcc.c-torture/execute/20030222-1.c
===================================================================
--- gcc.c-torture/execute/20030222-1.c	(revision 236317)
+++ gcc.c-torture/execute/20030222-1.c	(working copy)
@@ -4,6 +4,7 @@
actually truncated to int, in case a single register is wide enough
for a long long.  */
 /* { dg-skip-if "asm would require extra shift-left-4-byte" { spu-*-* } "*" "" } */
+/* { dg-skip-if "asm requires register allocation" { nvptx-*-* } "*" "" } */
 #include 
 
 void
Index: gcc.dg/pr68671.c
===================================================================
--- gcc.dg/pr68671.c	(revision 236317)
+++ gcc.dg/pr68671.c	(working copy)
@@ -1,7 +1,7 @@
 /* PR tree-optimization/68671 */
 /* { dg-do run } */
 /* { dg-options " -O2 -fno-tree-dce" } */
-/* { dg-xfail-if "ptxas crashes" { nvptx-*-* } { "" } { "" } } */
+/* { dg-xfail-if "ptxas crashes" { nvptx-*-* } { "*" } { "" } } */
 
 volatile int a = -1;
 volatile int b;
Index: gcc.dg/torture/pr54261-1.c
===================================================================
--- gcc.dg/torture/pr54261-1.c	(revision 236317)
+++ gcc.dg/torture/pr54261-1.c	(working copy)
@@ -32,7 +32,10 @@ void g (int *at, int val)
 
 int main(void)
 {
-  int x = 41;
+  /* On PTX it is not valid to perform atomic operations on auto
+ variables, which end up in .local.  Making this static places it
+ in .global.  */
+  static int x = 41;
   int a = 1;
   g (, a);
 
Index: gcc.dg/torture/type-generic-1.c
===================================================================
--- gcc.dg/torture/type-generic-1.c	(revision 236317)
+++ gcc.dg/torture/type-generic-1.c	(working copy)
@@ -3,7 +3,7 @@
 
 /* { dg-do run } */
 /* { dg-skip-if "No Inf/NaN support" { spu-*-* } } */
-/* { dg-options "-DUNSAFE" { target tic6x*-*-* visium-*-* } } */
+/* { dg-options "-DUNSAFE" { target tic6x*-*-* visium-*-* nvptx-*-* } } */
 /* { dg-add-options ieee } */
 
 #include "../tg-tests.h"


Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-17 Thread Cesar Philippidis
On 05/17/2016 02:22 PM, Andrew Pinski wrote:
> On Tue, May 17, 2016 at 2:10 PM, Cesar Philippidis
>  wrote:
>> On 05/13/2016 01:13 PM, Andrew Pinski wrote:
>>> On Fri, May 13, 2016 at 12:58 PM, Richard Biener
>>>  wrote:
 On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
  wrote:
> The cse_sincos pass tries to optimize sequences such as
>
>  sin (x);
>  cos (x);
>
> into a single call to sincos, or cexpi, when available. However, the
> nvptx target has sin and cos instructions, albeit with some loss of
> precision (so it's only enabled with -ffast-math). This patch teaches
> cse_sincos pass to ignore sin, cos and cexpi instructions when the
> target can expand those calls. This yields a 6x speedup in 314.omriq
> from spec accel when running on Nvidia accelerators.
>
> Is this OK for trunk?

 Isn't there an optab for sincos?
>>>
>>> This is exactly what I was going to suggest.  This transformation
>>> should be done in the back-end back to sin/cos instructions.
>>
>> I didn't realize that the 387 has sin, cos and sincos instructions,
>> so yeah, my original patch is bad.
>>
>> Nathan, is this patch ok for trunk and gcc-6? It adds a new sincos
>> pattern in the nvptx backend. I haven't testing a standalone nvptx
>> toolchain prior to this patch, so I'm not sure if my test results
>> look sane. I seem to be getting a different set of failures when I
>> test a clean trunk build multiple times. I attached my results
>> below for reference.
> 
> 
> UNSPEC_SINCOS is unused so why add it?

Good eyes, thanks! I thought I had to create a new insn, but I got away
with an expand. I attached the updated patch.

Cesar

>> g++.sum
>> Tests that now fail, but worked before:
>>
>> nvptx-none-run: g++.dg/abi/param1.C  -std=c++14 execution test
>>
>> Tests that now work, but didn't before:
>>
>> nvptx-none-run: g++.dg/opt/pr30590.C  -std=gnu++98 execution test
>> nvptx-none-run: g++.dg/opt/pr36187.C  -std=gnu++14 execution test
>>
>> gfortran.sum
>> Tests that now fail, but worked before:
>>
>> nvptx-none-run: gfortran.dg/alloc_comp_assign_10.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>> nvptx-none-run: gfortran.dg/allocate_with_source_5.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/func_assign_3.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/internal_pack_15.f90   -O2  execution test
>> nvptx-none-run: gfortran.dg/internal_pack_8.f90   -Os  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O0  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_pack_5.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_product_1.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/intrinsic_verify_1.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/is_iostat_end_eor_1.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>> nvptx-none-run: gfortran.dg/iso_c_binding_rename_1.f03   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>>
>> Tests that now work, but didn't before:
>>
>> nvptx-none-run: gfortran.dg/char_pointer_assign.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>> nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -Os  execution test
>> nvptx-none-run: gfortran.dg/char_result_13.f90   -O3 -g  execution test
>> nvptx-none-run: gfortran.dg/char_result_2.f90   -O1  execution test
>> nvptx-none-run: gfortran.dg/char_type_len.f90   -Os  execution test
>> nvptx-none-run: gfortran.dg/character_array_constructor_1.f90   -O0  
>> execution test
>> nvptx-none-run: gfortran.dg/nested_allocatables_1.f90   -O3 
>> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
>>  execution test
>>
>> gcc.sum
>> Tests that now fail, but worked before:
>>
>> nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution test
>> nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution test
>> nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution test
>> nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer 
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>> nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test
>>
>> Tests that now work, but didn't before:
>>
>> nvptx-none-run: 

Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-17 Thread Andrew Pinski
On Tue, May 17, 2016 at 2:10 PM, Cesar Philippidis
 wrote:
> On 05/13/2016 01:13 PM, Andrew Pinski wrote:
>> On Fri, May 13, 2016 at 12:58 PM, Richard Biener
>>  wrote:
>>> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
>>>  wrote:
 The cse_sincos pass tries to optimize sequences such as

  sin (x);
  cos (x);

 into a single call to sincos, or cexpi, when available. However, the
 nvptx target has sin and cos instructions, albeit with some loss of
 precision (so it's only enabled with -ffast-math). This patch teaches
 cse_sincos pass to ignore sin, cos and cexpi instructions when the
 target can expand those calls. This yields a 6x speedup in 314.omriq
 from spec accel when running on Nvidia accelerators.

 Is this OK for trunk?
>>>
>>> Isn't there an optab for sincos?
>>
>> This is exactly what I was going to suggest.  This transformation
>> should be done in the back-end back to sin/cos instructions.
>
> I didn't realize that the 387 has sin, cos and sincos instructions,
> so yeah, my original patch is bad.
>
> Nathan, is this patch ok for trunk and gcc-6? It adds a new sincos
> pattern in the nvptx backend. I hadn't tested a standalone nvptx
> toolchain prior to this patch, so I'm not sure if my test results
> look sane. I seem to be getting a different set of failures when I
> test a clean trunk build multiple times. I attached my results
> below for reference.


UNSPEC_SINCOS is unused so why add it?

Thanks,
Andrew Pinski


>
> Cesar
>
> g++.sum
> Tests that now fail, but worked before:
>
> nvptx-none-run: g++.dg/abi/param1.C  -std=c++14 execution test
>
> Tests that now work, but didn't before:
>
> nvptx-none-run: g++.dg/opt/pr30590.C  -std=gnu++98 execution test
> nvptx-none-run: g++.dg/opt/pr36187.C  -std=gnu++14 execution test
>
> gfortran.sum
> Tests that now fail, but worked before:
>
> nvptx-none-run: gfortran.dg/alloc_comp_assign_10.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> nvptx-none-run: gfortran.dg/allocate_with_source_5.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/func_assign_3.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/internal_pack_15.f90   -O2  execution test
> nvptx-none-run: gfortran.dg/internal_pack_8.f90   -Os  execution test
> nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O0  execution test
> nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> nvptx-none-run: gfortran.dg/intrinsic_pack_5.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/intrinsic_product_1.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/intrinsic_verify_1.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/is_iostat_end_eor_1.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> nvptx-none-run: gfortran.dg/iso_c_binding_rename_1.f03   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
>
> Tests that now work, but didn't before:
>
> nvptx-none-run: gfortran.dg/char_pointer_assign.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -Os  execution test
> nvptx-none-run: gfortran.dg/char_result_13.f90   -O3 -g  execution test
> nvptx-none-run: gfortran.dg/char_result_2.f90   -O1  execution test
> nvptx-none-run: gfortran.dg/char_type_len.f90   -Os  execution test
> nvptx-none-run: gfortran.dg/character_array_constructor_1.f90   -O0  
> execution test
> nvptx-none-run: gfortran.dg/nested_allocatables_1.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
>
> gcc.sum
> Tests that now fail, but worked before:
>
> nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution test
> nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution test
> nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution test
> nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test
>
> Tests that now work, but didn't before:
>
> nvptx-none-run: gcc.c-torture/execute/20091229-1.c   -O3 -g  execution test
> nvptx-none-run: gcc.c-torture/execute/20101013-1.c   -Os  execution test
> nvptx-none-run: gcc.c-torture/execute/20101025-1.c   -Os  execution test
> nvptx-none-run: gcc.c-torture/execute/20120105-1.c   

Re: inhibit the sincos optimization when the target has sin and cos instructions

2016-05-17 Thread Cesar Philippidis
On 05/13/2016 01:13 PM, Andrew Pinski wrote:
> On Fri, May 13, 2016 at 12:58 PM, Richard Biener
>  wrote:
>> On May 13, 2016 9:18:57 PM GMT+02:00, Cesar Philippidis 
>>  wrote:
>>> The cse_sincos pass tries to optimize sequences such as
>>>
>>>  sin (x);
>>>  cos (x);
>>>
>>> into a single call to sincos, or cexpi, when available. However, the
>>> nvptx target has sin and cos instructions, albeit with some loss of
>>> precision (so it's only enabled with -ffast-math). This patch teaches
>>> cse_sincos pass to ignore sin, cos and cexpi instructions when the
>>> target can expand those calls. This yields a 6x speedup in 314.omriq
>>> from spec accel when running on Nvidia accelerators.
>>>
>>> Is this OK for trunk?
>>
>> Isn't there an optab for sincos?
> 
> This is exactly what I was going to suggest.  This transformation
> should be done in the back-end back to sin/cos instructions.

I didn't realize that the 387 has sin, cos and sincos instructions,
so yeah, my original patch is bad.

Nathan, is this patch ok for trunk and gcc-6? It adds a new sincos 
pattern in the nvptx backend. I hadn't tested a standalone nvptx 
toolchain prior to this patch, so I'm not sure if my test results 
look sane. I seem to be getting a different set of failures when I
test a clean trunk build multiple times. I attached my results
below for reference.

Cesar

g++.sum
Tests that now fail, but worked before:

nvptx-none-run: g++.dg/abi/param1.C  -std=c++14 execution test

Tests that now work, but didn't before:

nvptx-none-run: g++.dg/opt/pr30590.C  -std=gnu++98 execution test
nvptx-none-run: g++.dg/opt/pr36187.C  -std=gnu++14 execution test

gfortran.sum
Tests that now fail, but worked before:

nvptx-none-run: gfortran.dg/alloc_comp_assign_10.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/allocate_with_source_5.f90   -O1  execution test
nvptx-none-run: gfortran.dg/func_assign_3.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O1  execution test
nvptx-none-run: gfortran.dg/inline_sum_3.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/internal_pack_15.f90   -O2  execution test
nvptx-none-run: gfortran.dg/internal_pack_8.f90   -Os  execution test
nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O0  execution test
nvptx-none-run: gfortran.dg/intrinsic_ifunction_2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test
nvptx-none-run: gfortran.dg/intrinsic_pack_5.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/intrinsic_product_1.f90   -O1  execution test
nvptx-none-run: gfortran.dg/intrinsic_verify_1.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/is_iostat_end_eor_1.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/iso_c_binding_rename_1.f03   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test

Tests that now work, but didn't before:

nvptx-none-run: gfortran.dg/char_pointer_assign.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -O1  execution test
nvptx-none-run: gfortran.dg/char_pointer_dummy.f90   -Os  execution test
nvptx-none-run: gfortran.dg/char_result_13.f90   -O3 -g  execution test
nvptx-none-run: gfortran.dg/char_result_2.f90   -O1  execution test
nvptx-none-run: gfortran.dg/char_type_len.f90   -Os  execution test
nvptx-none-run: gfortran.dg/character_array_constructor_1.f90   -O0  execution 
test
nvptx-none-run: gfortran.dg/nested_allocatables_1.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test

gcc.sum
Tests that now fail, but worked before:

nvptx-none-run: gcc.c-torture/execute/20100316-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20100708-1.c   -O1  execution test
nvptx-none-run: gcc.c-torture/execute/20100805-1.c   -O0  execution test
nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
nvptx-none-run: gcc.dg/torture/pr52028.c   -O3 -g  execution test

Tests that now work, but didn't before:

nvptx-none-run: gcc.c-torture/execute/20091229-1.c   -O3 -g  execution test
nvptx-none-run: gcc.c-torture/execute/20101013-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20101025-1.c   -Os  execution test
nvptx-none-run: gcc.c-torture/execute/20120105-1.c   -O0  execution test
nvptx-none-run: gcc.c-torture/execute/20120111-1.c   -O0  execution test

New tests that PASS:

nvptx-none-run: gcc.target/nvptx/sincos-1.c (test for excess errors)
nvptx-none-run: gcc.target/nvptx/sincos-1.c scan-assembler-times cos.approx.f32 
1
nvptx-none-run: gcc.target/nvptx/sincos-1.c 

Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-17 Thread Mikhail Maltsev
On 05/17/2016 06:09 PM, Richard Biener wrote:
> 
> The patch is ok.
> 

Committed as r236344.

-- 
Regards,
Mikhail Maltsev


[C++ Patch] PR 69793 ("ICE on invalid code in "cp_lexer_peek_nth_token"")

2016-05-17 Thread Paolo Carlini

Hi,

this ICE during error recovery exposes a rather more general weakness: 
we should never call cp_lexer_peek_nth_token (*, 2) when a previous 
cp_lexer_peek_token returns CPP_EOF.


The fix seems easy: just reshape the condition a bit and delay the 
latter call. It should also be a net, albeit minor, performance win: in 
general we don't want to call cp_lexer_peek_nth_token (which is out of 
line) unnecessarily.


Tested x86_64-linux.

Thanks,
Paolo.

///
/cp
2016-05-17  Paolo Carlini  

PR c++/69793
* parser.c (cp_parser_template_id): Don't call cp_lexer_peek_nth_token
when the previous cp_lexer_peek_token returns CPP_EOF.

/testsuite
2016-05-17  Paolo Carlini  

PR c++/69793
* g++.dg/template/crash123.C: New.
Index: cp/parser.c
===================================================================
--- cp/parser.c (revision 236338)
+++ cp/parser.c (working copy)
@@ -14835,11 +14835,11 @@ cp_parser_template_id (cp_parser *parser,
   /* If we find the sequence `[:' after a template-name, it's probably
  a digraph-typo for `< ::'. Substitute the tokens and check if we can
  parse correctly the argument list.  */
-  next_token = cp_lexer_peek_token (parser->lexer);
-  next_token_2 = cp_lexer_peek_nth_token (parser->lexer, 2);
-  if (next_token->type == CPP_OPEN_SQUARE
+  if (((next_token = cp_lexer_peek_token (parser->lexer))->type
+   == CPP_OPEN_SQUARE)
   && next_token->flags & DIGRAPH
-  && next_token_2->type == CPP_COLON
+  && ((next_token_2 = cp_lexer_peek_nth_token (parser->lexer, 2))->type
+ == CPP_COLON)
   && !(next_token_2->flags & PREV_WHITE))
 {
   cp_parser_parse_tentatively (parser);
Index: testsuite/g++.dg/template/crash123.C
===================================================================
--- testsuite/g++.dg/template/crash123.C(revision 0)
+++ testsuite/g++.dg/template/crash123.C(working copy)
@@ -0,0 +1,4 @@
+// PR c++/69793
+
+class fpos;
+template < state > bool operator!= (fpos,; operator!= // { dg-error "declared|expected|type" }


Re: [PATCH 2/3] function: Factor out make_*logue_seq

2016-05-17 Thread Jeff Law

On 05/16/2016 07:09 PM, Segher Boessenkool wrote:

Make new functions make_split_prologue_seq, make_prologue_seq, and
make_epilogue_seq.

Tested as in the previous patch; is this okay for trunk?


Segher


2016-05-16  Segher Boessenkool  

* function.c (make_split_prologue_seq, make_prologue_seq,
make_epilogue_seq): New functions, factored out from...
(thread_prologue_and_epilogue_insns): Here.

Please add function comments for the new functions.  OK with that change.

jeff



Re: [PATCH GCC]Enable vect_cond_mixed for AArch64.

2016-05-17 Thread Jeff Law

On 05/17/2016 03:04 AM, Bin Cheng wrote:

Hi,
After supporting all vcond/vcondu patterns in AArch64 backend, now we can 
vectorize VEC_COND_EXPR with different type in comparison operands and value 
operands on AArch64.  GCC uses vect_cond_mixed to control such test cases, for 
now, there are below cases affected by it:

pr61194.c
This was failed for all targets, but was just fixed by my previous tree 
ifcvt patch.
slp-cond-2-big-array.c
slp-cond-2.c
vect-cond-10.c
vect-cond-8.c
vect-cond-9.c

They will start passing after this patch.

Test on AArch64.  Is it OK?

Thanks,
bin

gcc/testsuite/ChangeLog
2016-05-12  Bin Cheng  

* lib/target-supports.exp (check_effective_target_vect_cond_mixed):
Add aarch64*-*-*.

OK.
jeff


Re: New hashtable power 2 rehash policy

2016-05-17 Thread François Dumont

On 14/05/2016 19:06, Daniel Krügler wrote:

2016-05-14 18:13 GMT+02:00 François Dumont :


New patch attached, tested under linux x86_64.

François

1) The function __clp2 is declared using _GLIBCXX14_CONSTEXPR, which
means that it is an inline function if and *only* if
_GLIBCXX14_CONSTEXPR really expands to constexpr, otherwise it is
*not* inline, which is probably not intended and could easily cause
ODR problems. I suggest to mark it unconditionally as inline,
regardless of _GLIBCXX14_CONSTEXPR.


Maybe _GLIBCXX14_CONSTEXPR should expand to inline prior to C++14 mode.

For the moment I simply added the inline as done in other situations.



2) Furthermore I suggest to declare __clp2 as noexcept - this is
(intentionally) *not* implied by constexpr.

3) Is there any reason, why _Power2_rehash_policy::_M_next_bkt
shouldn't be noexcept?

4) Similar to (3) for _Power2_rehash_policy's member functions
_M_bkt_for_elements, _M_need_rehash, _M_state, _M_reset
For noexcept I thought we were only adding it if necessary. We might 
have to go through a lot of code to find all places where noexcept could 
be added. Jonathan will give his feedback.


For the moment I have added it on all those methods.

Thanks for feedback, updated and tested patch attached.

François

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 57024e4..78353ae 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -106,8 +106,10 @@
 #ifndef _GLIBCXX14_CONSTEXPR
 # if __cplusplus >= 201402L
 #  define _GLIBCXX14_CONSTEXPR constexpr
+#  define _GLIBCXX14_USE_CONSTEXPR constexpr
 # else
 #  define _GLIBCXX14_CONSTEXPR
+#  define _GLIBCXX14_USE_CONSTEXPR const
 # endif
 #endif
 
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 2c24c19..caff085 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -31,6 +31,8 @@
 #ifndef _HASHTABLE_POLICY_H
 #define _HASHTABLE_POLICY_H 1
 
+#include <algorithm> // for std::min.
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -457,6 +459,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// smallest prime that keeps the load factor small enough.
   struct _Prime_rehash_policy
   {
+using __has_load_factor = std::true_type;
+
 _Prime_rehash_policy(float __z = 1.0) noexcept
 : _M_max_load_factor(__z), _M_next_resize(0) { }
 
@@ -501,6 +505,132 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 mutable std::size_t	_M_next_resize;
   };
 
+  /// Range hashing function assuming that second arg is a power of 2.
+  struct _Mask_range_hashing
+  {
+typedef std::size_t first_argument_type;
+typedef std::size_t second_argument_type;
+typedef std::size_t result_type;
+
+result_type
+operator()(first_argument_type __num,
+	   second_argument_type __den) const noexcept
+{ return __num & (__den - 1); }
+  };
+
+  /// Compute closest power of 2.
+  _GLIBCXX14_CONSTEXPR
+  inline std::size_t
+  __clp2(std::size_t n) noexcept
+  {
+#if __SIZEOF_SIZE_T__ >= 8
+std::uint_fast64_t x = n;
+#else
+std::uint_fast32_t x = n;
+#endif
+// Algorithm from Hacker's Delight, Figure 3-3.
+x = x - 1;
+x = x | (x >> 1);
+x = x | (x >> 2);
+x = x | (x >> 4);
+x = x | (x >> 8);
+x = x | (x >>16);
+#if __SIZEOF_SIZE_T__ >= 8
+x = x | (x >>32);
+#endif
+return x + 1;
+  }
+
+  /// Rehash policy providing power of 2 bucket numbers. Avoids modulo
+  /// operations.
+  struct _Power2_rehash_policy
+  {
+using __has_load_factor = std::true_type;
+
+_Power2_rehash_policy(float __z = 1.0) noexcept
+: _M_max_load_factor(__z), _M_next_resize(0) { }
+
+float
+max_load_factor() const noexcept
+{ return _M_max_load_factor; }
+
+// Return a bucket size no smaller than n (as long as n is not above the
+// highest power of 2).
+std::size_t
+_M_next_bkt(std::size_t __n) const noexcept
+{
+  _GLIBCXX14_USE_CONSTEXPR size_t __max_width
+	= std::min<std::size_t>(sizeof(size_t), 8);
+  _GLIBCXX14_USE_CONSTEXPR auto __max_bkt
+	= std::size_t(1) << (__max_width * __CHAR_BIT__ - 1);
+
+  std::size_t __res = __clp2(__n);
+
+  if (__res == 0)
+	__res = __max_bkt;
+
+  if (__res == __max_bkt)
+	// Set next resize to the max value so that we never try to rehash again
+	// as we already reach the biggest possible bucket number.
+	// Note that it might result in max_load_factor not being respected.
+	_M_next_resize = std::size_t(-1);
+  else
+	_M_next_resize
+	  = __builtin_ceil(__res * (long double)_M_max_load_factor);
+
+  return __res;
+}
+
+// Return a bucket count appropriate for n elements
+std::size_t
+_M_bkt_for_elements(std::size_t __n) const noexcept
+{ return __builtin_ceil(__n / (long double)_M_max_load_factor); }
+
+// __n_bkt is current bucket count, __n_elt is current 

[PATCH] Fix ICE with redirecting to unreachable in thunks (PR ipa/71146)

2016-05-17 Thread Marek Polacek
Since Honza's change in r236012 we're able to expand thunks inline, and as
a side-effect we can redirect a call within a thunk to __builtin_unreachable (at
least that's my understanding ;).

But that means we need to employ the maybe_remove_unused_call_args function
so that we don't leave __builtin_unreachable with arguments in the IL.

Bootstrapped/regtested on x86_64-linux, applying to trunk (approved by Honza
in the PR).

2016-05-17  Marek Polacek  

PR ipa/71146
* tree-inline.c (expand_call_inline): Call
maybe_remove_unused_call_args.

* g++.dg/ipa/pr71146.C: New test.

diff --git gcc/testsuite/g++.dg/ipa/pr71146.C gcc/testsuite/g++.dg/ipa/pr71146.C
index e69de29..54d34a7 100644
--- gcc/testsuite/g++.dg/ipa/pr71146.C
+++ gcc/testsuite/g++.dg/ipa/pr71146.C
@@ -0,0 +1,29 @@
+// PR ipa/71146
+// { dg-do compile }
+// { dg-options "-O3" }
+
+typedef enum { X } E;
+struct A {
+  virtual void bar ();
+};
+struct B {
+  virtual E fn (const char *, int, int *) = 0;
+};
+struct C : A, B {
+  E fn (const char *, int, int *);
+  void fn2 ();
+  B *foo;
+};
+void C::fn2 () {
+  if (!foo)
+return;
+  foo->fn (0, 0, 0);
+}
+E
+C::fn (const char *, int, int *)
+{
+  fn2 ();
+  foo = 0;
+  fn (0, 0, 0);
+  return X;
+}
diff --git gcc/tree-inline.c gcc/tree-inline.c
index 85ed2c2..954dac3 100644
--- gcc/tree-inline.c
+++ gcc/tree-inline.c
@@ -4486,6 +4486,7 @@ expand_call_inline (basic_block bb, gimple *stmt, 
copy_body_data *id)
   update_stmt (stmt);
   id->src_node->remove ();
   expand_call_inline (bb, stmt, id);
+  maybe_remove_unused_call_args (cfun, stmt);
   return true;
 }
   fn = cg_edge->callee->decl;

Marek


Re: [PATCH] c++/60760 - arithmetic on null pointers should not be allowed in constant expressions

2016-05-17 Thread Jason Merrill

On 05/12/2016 06:34 PM, Martin Sebor wrote:

Attached is a resubmission of the patch for c++/60760 originally
submitted late in the 6.0 cycle along with a patch for c++/67376.
Since c++/60760 was not a regression, it was decided that it
would be safer to defer the fix until after the 6.1.0 release.

While retesting this patch I was happy to notice that it also
fixes another bug: c++/71091 - constexpr reference bound to a null
pointer dereference accepted.


I'm not sure why we need to track nullptr_p through everything.  Can't 
we set *non_constant_p instead in the places where it's problematic, as 
in cxx_eval_binary_expression?


I understand that the complication comes because of needing to allow

constexpr int *p = &*(int*)0;

but I don't see how cxx_eval_component_reference could come up with a 
constant value for the referent of a null pointer, so we already reject


struct A { int i; };
constexpr A* p = nullptr;
constexpr int i = p->i;

In cxx_eval_indirect_ref, we could check !lval and reject the 
constant-expression at that point.


Jason



[Committed] jit: gcc diagnostics are jit errors

2016-05-17 Thread David Malcolm
libgccjit performs numerous checks at the API boundary, but
if these succeed, it ignores errors and other diagnostics emitted
within the core of gcc, and treats the compile of a gcc_jit_context
as having succeeded.

This patch ensures that if any diagnostics are emitted, they
are visible from the libgccjit API, and that the context is
flagged as having failed.

For now any kind of diagnostic is treated as a jit error,
so warnings and notes also count as errors.

Successfully bootstrapped on x86_64-pc-linux-gnu;
adds 19 PASS results to jit.sum.

Committed to trunk as r236342.

gcc/jit/ChangeLog:
* dummy-frontend.c: Include diagnostic.h.
(jit_begin_diagnostic): New function.
(jit_end_diagnostic): New function.
(jit_langhook_init): Register jit_begin_diagnostic
and jit_end_diagnostic with the global_dc.
* jit-playback.c: Include diagnostic.h.
(gcc::jit::playback::context::add_diagnostic): New method.
* jit-playback.h (struct diagnostic_context): Add forward
declaration.
(gcc::jit::playback::context::add_diagnostic): New method.

gcc/testsuite/ChangeLog:
* jit.dg/test-error-array-bounds.c: New test case.
---
 gcc/jit/dummy-frontend.c   | 34 
 gcc/jit/jit-playback.c | 38 ++
 gcc/jit/jit-playback.h |  7 +++
 gcc/testsuite/jit.dg/test-error-array-bounds.c | 72 ++
 4 files changed, 151 insertions(+)
 create mode 100644 gcc/testsuite/jit.dg/test-error-array-bounds.c

diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c
index 7194ba6..2631153 100644
--- a/gcc/jit/dummy-frontend.c
+++ b/gcc/jit/dummy-frontend.c
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "debug.h"
 #include "langhooks.h"
 #include "langhooks-def.h"
+#include "diagnostic.h"
 
 
 #include 
@@ -90,6 +91,35 @@ struct ggc_root_tab jit_root_tab[] =
 LAST_GGC_ROOT_TAB
   };
 
+/* JIT-specific implementation of diagnostic callbacks.  */
+
+/* Implementation of "begin_diagnostic".  */
+
+static void
+jit_begin_diagnostic (diagnostic_context */*context*/,
+ diagnostic_info */*diagnostic*/)
+{
+  gcc_assert (gcc::jit::active_playback_ctxt);
+  JIT_LOG_SCOPE (gcc::jit::active_playback_ctxt->get_logger ());
+
+  /* No-op (apart from logging); the real error-handling is done in the
+ "end_diagnostic" hook.  */
+}
+
+/* Implementation of "end_diagnostic".  */
+
+static void
+jit_end_diagnostic (diagnostic_context *context,
+   diagnostic_info *diagnostic)
+{
+  gcc_assert (gcc::jit::active_playback_ctxt);
+  JIT_LOG_SCOPE (gcc::jit::active_playback_ctxt->get_logger ());
+
+  /* Delegate to the playback context (and thence to the
+ recording context).  */
+  gcc::jit::active_playback_ctxt->add_diagnostic (context, diagnostic);
+}
+
 /* Language hooks.  */
 
 static bool
@@ -105,6 +135,10 @@ jit_langhook_init (void)
   registered_root_tab = true;
 }
 
+  gcc_assert (global_dc);
+  global_dc->begin_diagnostic = jit_begin_diagnostic;
+  global_dc->end_diagnostic = jit_end_diagnostic;
+
   build_common_tree_nodes (false);
 
   /* I don't know why this has to be done explicitly.  */
diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index 579230d..156448d 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "fold-const.h"
 #include "gcc.h"
+#include "diagnostic.h"
 
 #include 
 
@@ -2833,6 +2834,43 @@ add_error_va (location *loc, const char *fmt, va_list ap)
  fmt, ap);
 }
 
+/* Report a diagnostic up to the jit context as an error,
+   so that the compilation is treated as a failure.
+   For now, any kind of diagnostic is treated as an error by the jit
+   API.  */
+
+void
+playback::context::
+add_diagnostic (struct diagnostic_context *diag_context,
+   struct diagnostic_info *diagnostic)
+{
+  /* At this point the text has been formatted into the pretty-printer's
+ output buffer.  */
+  pretty_printer *pp = diag_context->printer;
+  const char *text = pp_formatted_text (pp);
+
+  /* Get location information (if any) from the diagnostic.
+ The recording::context::add_error[_va] methods require a
+ recording::location.  We can't lookup the playback::location
+ from the file/line/column since any playback location instances
+ may have been garbage-collected away by now, so instead we create
+ another recording::location directly.  */
+  location_t gcc_loc = diagnostic_location (diagnostic);
+  recording::location *rec_loc = NULL;
+  if (gcc_loc)
+{
+  expanded_location exploc = expand_location (gcc_loc);
+  if (exploc.file)
+   rec_loc = m_recording_ctxt->new_location (exploc.file,
+ exploc.line,
+  

[Committed] jit: document gcc_jit_context_new_call_through_ptr

2016-05-17 Thread David Malcolm
Every version of libgccjit.h in trunk has had
gcc_jit_context_new_call_through_ptr, but it wasn't
documented until now.

Committed to trunk as r236341.

gcc/jit/ChangeLog:
* docs/topics/expressions.rst (Function calls): Document
gcc_jit_context_new_call_through_ptr.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
---
 gcc/jit/docs/topics/expressions.rst | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/jit/docs/topics/expressions.rst 
b/gcc/jit/docs/topics/expressions.rst
index cb65c43..0445332 100644
--- a/gcc/jit/docs/topics/expressions.rst
+++ b/gcc/jit/docs/topics/expressions.rst
@@ -409,6 +409,22 @@ Function calls
  printf_func,
  2, args));
 
+.. function:: gcc_jit_rvalue *\
+  gcc_jit_context_new_call_through_ptr (gcc_jit_context *ctxt,\
+gcc_jit_location *loc,\
+gcc_jit_rvalue *fn_ptr,\
+int numargs, \
+gcc_jit_rvalue **args)
+
+   Given an rvalue of function pointer type, and the given table of
+   argument rvalues, construct a call to the function pointer, with the
+   result as an rvalue.
+
+   .. note::
+
+  The same caveat as for :c:func:`gcc_jit_context_new_call` applies.
+
+
 Type-coercion
 *
 
-- 
1.8.5.3



Re: [PATCH] integer overflow checking builtins in constant expressions

2016-05-17 Thread Jason Merrill

On 05/01/2016 12:39 PM, Martin Sebor wrote:

+  if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
+{
+  if (tree result = size_binop_loc (EXPR_LOC_OR_LOC (t, input_location),
+   opcode, arg0, arg1))
+   {
+ if (TREE_OVERFLOW (result))
+   {
+ /* Reset TREE_OVERFLOW to avoid warnings for the overflow.  */
+ TREE_OVERFLOW (result) = 0;
+
+ return build_complex (TREE_TYPE (t), result, integer_one_node);
+   }
+
+ return build_complex (TREE_TYPE (t), result, integer_zero_node);
+   }
+}


Should this be in the middle-end somewhere, perhaps shared with 
fold_builtin_arith_overflow?  I notice that the comment for that 
function says that it folds into normal arithmetic if the operation can 
never overflow, but I don't see any code that would accomplish that.


Jason



[Patch] Implement is_[nothrow_]swappable (p0185r1)

2016-05-17 Thread Daniel Krügler
This is an implementation of the Standard is_swappable traits according to

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0185r1.html

During that work it was found that std::array's member swap's exception
specification for zero-size arrays incorrectly depended on the value_type;
that has been fixed as well.

- Daniel


changelog.patch
Description: Binary data


is_swappable.patch
Description: Binary data


Re: [PATCH] Fix PR71132

2016-05-17 Thread H.J. Lu
On Tue, May 17, 2016 at 5:51 AM, Richard Biener  wrote:
>
> The following fixes a latent issue in loop distribution caught by
> the fake edge placement adjustment.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> 2016-05-17  Richard Biener  
>
> PR tree-optimization/71132
> * tree-loop-distribution.c (create_rdg_cd_edges): Pass in loop.
> Only add control dependences for blocks in the loop.
> (build_rdg): Adjust.
> (generate_code_for_partition): Return whether loop should
> be destroyed and delay that.
> (distribute_loop): Likewise.
> (pass_loop_distribution::execute): Record loops to be destroyed
> and perform delayed destroying of loops.
>
> * gcc.dg/torture/pr71132.c: New testcase.
>

On x86, this caused:

FAIL: c-c++-common/cilk-plus/AN/builtin_fn_custom.c  -O3 -fcilkplus
(internal compiler error)
FAIL: c-c++-common/cilk-plus/AN/builtin_fn_custom.c  -O3 -fcilkplus
(test for excess errors)
FAIL: c-c++-common/cilk-plus/AN/builtin_fn_mutating.c  -fcilkplus -O3
-std=c99 (internal compiler error)
FAIL: c-c++-common/cilk-plus/AN/builtin_fn_mutating.c  -fcilkplus -O3
-std=c99 (test for excess errors)
FAIL: c-c++-common/cilk-plus/AN/builtin_fn_mutating.c  -O3 -fcilkplus
(internal compiler error)
FAIL: c-c++-common/cilk-plus/AN/builtin_fn_mutating.c  -O3 -fcilkplus
(test for excess errors)
FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -fcilkplus -O3
-std=c99 (internal compiler error)
FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -fcilkplus -O3
-std=c99 (test for excess errors)
FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -O3 -fcilkplus
(internal compiler error)
FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -O3 -fcilkplus
(internal compiler error)
FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -O3 -fcilkplus
(test for excess errors)
FAIL: c-c++-common/cilk-plus/AN/builtin_func_double.c  -O3 -fcilkplus
(test for excess errors)
FAIL: c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c
-fcilkplus -O3 -std=c99 (internal compiler error)
FAIL: c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c
-fcilkplus -O3 -std=c99 (test for excess errors)
FAIL: c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c  -O3
-fcilkplus (internal compiler error)
FAIL: c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c  -O3
-fcilkplus (test for excess errors)
FAIL: gcc.c-torture/compile/pr32399.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error)
FAIL: gcc.c-torture/compile/pr32399.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
excess errors)
FAIL: gcc.c-torture/compile/pr32399.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/compile/pr32399.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/20010221-1.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/20010221-1.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/20120919-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error)
FAIL: gcc.c-torture/execute/20120919-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
excess errors)
FAIL: gcc.c-torture/execute/20120919-1.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/20120919-1.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/pr61383-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error)
FAIL: gcc.dg/torture/pr61383-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
excess errors)
FAIL: gcc.dg/torture/pr69452.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error)
FAIL: gcc.dg/torture/pr69452.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
excess errors)
FAIL: gcc.dg/torture/pr69452.c   -O3 -g  (internal compiler error)
FAIL: gcc.dg/torture/pr69452.c   -O3 -g  (test for excess errors)
FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -g -O3
-fcilkplus (internal compiler error)
FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -g -O3
-fcilkplus (test for excess errors)
FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -O3 -fcilkplus
(internal compiler error)
FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -O3 -fcilkplus
(test for excess errors)
FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -O3
-ftree-vectorize -fcilkplus -g (internal compiler error)
FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc  -O3
-ftree-vectorize -fcilkplus -g (test for excess errors)
FAIL: gfortran.dg/do_concurrent_2.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error)
FAIL: 

Re: [C++ Patch] PR 70466 ("ICE on invalid code in tree check: expected constructor, have parm_decl in convert_like_real...")

2016-05-17 Thread Jason Merrill

On 05/17/2016 04:47 AM, Paolo Carlini wrote:

... alternately, if the substance of my patchlet is right, we could
simplify a bit the logic per the below.


Here's a well-formed variant that was accepted by 4.5.  Does your patch 
fix it?  I also think with your patch we can drop the C++11 check, since 
list-initialization doesn't exist in C++98.


template < class T, class S >
struct A
{
  explicit A (...) {}
};

template < class T, class S >
A < T, S > foo (T (S::*f) ())
{
  return A < T, S > (f);
}

struct B
{
  void bar () {}
};

int
main ()
{
  foo (&B::bar);
  return 0;
}




Re: PING^5 [PATCH, GCC 5] PR 70613, -fabi-version docs don't match implementation

2016-05-17 Thread Mike Stump
On May 17, 2016, at 8:19 AM, Sandra Loosemore  wrote:
> 
> I thought I remembered mail going by that changes to a release branch require 
> RM approval too.

From time to time, the RM can close any release branch at any time for any 
reason.  :-)  For example, a gcc 3.2.x release branch has been closed for any 
checkins.  When they are trying to get a release out, they may say, no checkins 
til the release is made and so on.  In general, they update 
https://gcc.gnu.org/ to list the state of each branch.  The non-permanent 
closures are generally only til the next release.  The permanent closures are 
because we remove support for an old release once a newer release is 
available.

[gomp4.5] Minor OpenMP 4.5 fortran translation fixes, 3 new taskloop testcases

2016-05-17 Thread Jakub Jelinek
Hi!

Tested on x86_64-linux, committed to gomp-4_5-branch.

2016-05-17  Jakub Jelinek  

* trans-openmp.c (gfc_split_omp_clauses): Handle EXEC_OMP_TARGET_SIMD.
(gfc_trans_omp_teams): Don't wrap into OMP_TEAMS if -fopenmp-simd.
(gfc_trans_omp_target): Set OMP_TARGET_COMBINED if needed.

* testsuite/libgomp.fortran/taskloop-1.f90: Renamed to ...
* testsuite/libgomp.fortran/taskloop1.f90: ... this.
* testsuite/libgomp.fortran/taskloop2.f90: New test.
* testsuite/libgomp.fortran/taskloop3.f90: New test.
* testsuite/libgomp.fortran/taskloop4.f90: New test.

--- gcc/fortran/trans-openmp.c.jj   2016-05-16 17:56:25.0 +0200
+++ gcc/fortran/trans-openmp.c  2016-05-17 12:21:11.289337099 +0200
@@ -3809,6 +3809,10 @@ gfc_split_omp_clauses (gfc_code *code,
 | GFC_OMP_MASK_SIMD;
   innermost = GFC_OMP_SPLIT_SIMD;
   break;
+case EXEC_OMP_TARGET_SIMD:
+  mask = GFC_OMP_MASK_TARGET | GFC_OMP_MASK_SIMD;
+  innermost = GFC_OMP_SPLIT_SIMD;
+  break;
 case EXEC_OMP_TARGET_TEAMS:
   mask = GFC_OMP_MASK_TARGET | GFC_OMP_MASK_TEAMS;
   innermost = GFC_OMP_SPLIT_TEAMS;
@@ -4431,10 +4435,13 @@ gfc_trans_omp_teams (gfc_code *code, gfc
   stmt = gfc_trans_omp_distribute (code, clausesa);
   break;
 }
-  stmt = build2_loc (input_location, OMP_TEAMS, void_type_node, stmt,
-omp_clauses);
-  if (combined)
-OMP_TEAMS_COMBINED (stmt) = 1;
+  if (flag_openmp)
+{
+  stmt = build2_loc (input_location, OMP_TEAMS, void_type_node, stmt,
+omp_clauses);
+  if (combined)
+   OMP_TEAMS_COMBINED (stmt) = 1;
+}
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
 }
@@ -4502,8 +4509,12 @@ gfc_trans_omp_target (gfc_code *code)
   break;
 }
   if (flag_openmp)
-stmt = build2_loc (input_location, OMP_TARGET, void_type_node, stmt,
-  omp_clauses);
+{
+  stmt = build2_loc (input_location, OMP_TARGET, void_type_node, stmt,
+omp_clauses);
+  if (code->op != EXEC_OMP_TARGET)
+   OMP_TARGET_COMBINED (stmt) = 1;
+}
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
 }
--- libgomp/testsuite/libgomp.fortran/taskloop-1.f90.jj 2016-05-16 
16:38:49.100807474 +0200
+++ libgomp/testsuite/libgomp.fortran/taskloop-1.f902016-05-17 
13:06:44.974169085 +0200
@@ -1,44 +0,0 @@
-  common /blk/ q, e
-  integer :: q, r
-  logical :: e
-!$omp parallel
-!$omp single
-  call foo (2, 7)
-  r = bar (12, 18)
-!$omp end single
-!$omp end parallel
-  if (q .ne. 6 .or. r .ne. 17 .or. e) call abort
-contains
-  subroutine foo (a, b)
-integer, intent (in) :: a, b
-common /blk/ q, e
-integer :: q, r, d
-logical :: e
-!$omp taskloop lastprivate (q) nogroup
-do d = a, b, 2
-  q = d
-  if (d < 2 .or. d > 6 .or. iand (d, 1) .ne. 0) then
-!$omp atomic write
-e = .true.
-  end if
-end do
-  end subroutine foo
-  function bar (a, b)
-integer, intent (in) :: a, b
-integer :: bar
-common /blk/ q, e
-integer :: q, r, d, s
-logical :: e
-s = 7
-!$omp taskloop lastprivate (s)
-do d = a, b - 1
-  if (d < 12 .or. d > 17) then
-!$omp atomic write
-e = .true.
-  end if
-  s = d
-end do
-!$omp end taskloop
-bar = s
-  end function bar
-end
--- libgomp/testsuite/libgomp.fortran/taskloop1.f90.jj  2016-05-17 
13:06:28.644391501 +0200
+++ libgomp/testsuite/libgomp.fortran/taskloop1.f90 2016-05-16 
16:38:49.100807474 +0200
@@ -0,0 +1,44 @@
+  common /blk/ q, e
+  integer :: q, r
+  logical :: e
+!$omp parallel
+!$omp single
+  call foo (2, 7)
+  r = bar (12, 18)
+!$omp end single
+!$omp end parallel
+  if (q .ne. 6 .or. r .ne. 17 .or. e) call abort
+contains
+  subroutine foo (a, b)
+integer, intent (in) :: a, b
+common /blk/ q, e
+integer :: q, r, d
+logical :: e
+!$omp taskloop lastprivate (q) nogroup
+do d = a, b, 2
+  q = d
+  if (d < 2 .or. d > 6 .or. iand (d, 1) .ne. 0) then
+!$omp atomic write
+e = .true.
+  end if
+end do
+  end subroutine foo
+  function bar (a, b)
+integer, intent (in) :: a, b
+integer :: bar
+common /blk/ q, e
+integer :: q, r, d, s
+logical :: e
+s = 7
+!$omp taskloop lastprivate (s)
+do d = a, b - 1
+  if (d < 12 .or. d > 17) then
+!$omp atomic write
+e = .true.
+  end if
+  s = d
+end do
+!$omp end taskloop
+bar = s
+  end function bar
+end
--- libgomp/testsuite/libgomp.fortran/taskloop2.f90.jj  2016-05-17 
13:08:16.947916378 +0200
+++ libgomp/testsuite/libgomp.fortran/taskloop2.f90 2016-05-17 
15:42:18.328235190 +0200
@@ -0,0 +1,134 @@
+! { dg-do run }
+! { dg-options "-O2" }
+! { dg-additional-options "-msse2" { target sse2_runtime } }
+! { dg-additional-options "-mavx" { target avx_runtime } }
+
+  integer, save :: u(1024), v(1024), w(1024), m
+  

[GCC 6] [PR target/70860] [nvptx] Handle NULL cfun in nvptx_libcall_value

2016-05-17 Thread Thomas Schwinge
Hi!

On Thu, 28 Apr 2016 12:26:24 +0200, I wrote:
> Richard's r235511 changes (quoted below) cause certain nvptx offloading
> test cases to run into SIGSEGVs:

With these changes recently having been ported to gcc-6-branch in
r236210, these failures/regressions now also show up there:

> [...]
> #4  0x00d14193 in nvptx_libcall_value (mode=mode@entry=SImode)
> at [...]/source-gcc/gcc/config/nvptx/nvptx.c:489
> #5  0x00d17a20 in nvptx_function_value (type=0x7fc1fa359690, 
> func=0x0, outgoing=)
> at [...]/source-gcc/gcc/config/nvptx/nvptx.c:512
> #6  0x006ba220 in hard_function_value 
> (valtype=valtype@entry=0x7fc1fa359690, func=func@entry=0x0, 
> fntype=fntype@entry=0x0, 
> outgoing=outgoing@entry=0) at [...]/source-gcc/gcc/explow.c:1860
> #7  0x0073b0fa in aggregate_value_p 
> (exp=exp@entry=0x7fc1fa41a048, fntype=0x0)
> at [...]/source-gcc/gcc/function.c:2086
> #8  0x00bebc11 in find_func_aliases_for_call (t=0x1feac90, 
> fn=0x7ffe448ca8a0)
> at [...]/source-gcc/gcc/tree-ssa-structalias.c:4644
> #9  find_func_aliases (fn=fn@entry=0x7fc1fa43a540, 
> origt=origt@entry=0x7fc1fa43a7e0)
> at [...]/source-gcc/gcc/tree-ssa-structalias.c:4737
> #10 0x00bf04eb in ipa_pta_execute ()
> at [...]/source-gcc/gcc/tree-ssa-structalias.c:7787
> #11 (anonymous namespace)::pass_ipa_pta::execute (this=)
> at [...]/source-gcc/gcc/tree-ssa-structalias.c:8035
> #12 0x00940bed in execute_one_pass (pass=pass@entry=0x1f43770)
> at [...]/source-gcc/gcc/passes.c:2348
> #13 0x00941972 in execute_ipa_pass_list (pass=0x1f43770)
> at [...]/source-gcc/gcc/passes.c:2778
> #14 0x00607f1f in symbol_table::compile (this=0x7fc1fa359000)
> at [...]/source-gcc/gcc/cgraphunit.c:2435
> #15 0x0056ad48 in lto_main () at 
> [...]/source-gcc/gcc/lto/lto.c:3328
> #16 0x00a065df in compile_file () at 
> [...]/source-gcc/gcc/toplev.c:474
> #17 0x0053753a in do_compile () at 
> [...]/source-gcc/gcc/toplev.c:1998
> #18 toplev::main (this=this@entry=0x7ffe448caba0, argc=argc@entry=18, 
> argv=0x1f1eec0, argv@entry=0x7ffe448caca8)
> at [...]/source-gcc/gcc/toplev.c:2106
> #19 0x005391d7 in main (argc=18, argv=0x7ffe448caca8)
> at [...]/source-gcc/gcc/main.c:39

As obvious, backported my trunk r235748 to gcc-6-branch in r236326:

commit 75737dcecabcfd43fe3a3fa7a0c4d3d215dcdee7
Author: tschwinge 
Date:   Tue May 17 16:08:37 2016 +

[PR target/70860] [nvptx] Handle NULL cfun in nvptx_libcall_value

Backport GCC trunk r235748:

gcc/
PR target/70860
* config/nvptx/nvptx.c (nvptx_libcall_value): Handle NULL cfun.
(nvptx_function_value): Assert non-NULL cfun.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-6-branch@236326 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog| 7 +++
 gcc/config/nvptx/nvptx.c | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index ea107f7..8fd7f61 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-05-17  Thomas Schwinge  
+
+   Backport trunk r235748:
+   PR target/70860
+   * config/nvptx/nvptx.c (nvptx_libcall_value): Handle NULL cfun.
+   (nvptx_function_value): Assert non-NULL cfun.
+
 2016-05-17  Kyrylo Tkachov  
 
Backport from mainline
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index b088cf8..a6c90b6 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -483,7 +483,7 @@ nvptx_strict_argument_naming (cumulative_args_t cum_v)
 static rtx
 nvptx_libcall_value (machine_mode mode, const_rtx)
 {
-  if (!cfun->machine->doing_call)
+  if (!cfun || !cfun->machine->doing_call)
 /* Pretend to return in a hard reg for early uses before pseudos can be
generated.  */
 return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
@@ -502,6 +502,7 @@ nvptx_function_value (const_tree type, const_tree 
ARG_UNUSED (func),
 
   if (outgoing)
 {
+  gcc_assert (cfun);
   cfun->machine->return_mode = mode;
   return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
 }


Regards
 Thomas




Re: [PATCH][AArch64] Improve aarch64_modes_tieable_p

2016-05-17 Thread Wilco Dijkstra
James Greenhalgh wrote:
> It would be handy if you could raise something in bugzilla for the
> register allocator deficiency.

The register allocation issues are well known and we have multiple
workarounds for this in place. When you allow modes to be tieable
the workarounds are not as effective.

> -  if (TARGET_SIMD
> -  && aarch64_vector_mode_p (mode1)
> -  && aarch64_vector_mode_p (mode2))
> +  if (aarch64_vector_mode_p (mode1) && aarch64_vector_mode_p (mode2))
> +return true;

> This relaxes the TARGET_SIMD check that would have prevented
> OImode/CImode/XImode ties when !TARGET_SIMD. What's the reasoning
> behind that?

There is no need for TARGET_SIMD checks here - in order to create a
vector_struct mode you need to call aarch64_array_mode_supported_p first.

> +  /* Also allow any scalar modes with vectors.  */
> +  if (aarch64_vector_mode_supported_p (mode1)
> +  || aarch64_vector_mode_supported_p (mode2))
>  return true;

> Does this always hold? It seems like you might need to be more restrictive
> with what we allow to avoid ties with some of the more obscure modes
> (V4DF etc.).

Well it is safe to always return true - this passes regression tests (it's
just a bad idea from a CQ point of view).

Wilco




Re: [PATCH][AArch64] Adjust SIMD integer preference

2016-05-17 Thread Wilco Dijkstra
ping


From: Wilco Dijkstra
Sent: 22 April 2016 16:35
To: gcc-patches@gcc.gnu.org
Cc: nd
Subject: [PATCH][AArch64] Adjust SIMD integer preference

SIMD operations like combine prefer to have their operands in FP registers,
so increase the cost of integer registers slightly to avoid unnecessary int<->FP
moves. This improves register allocation of scalar SIMD operations.

OK for trunk?

ChangeLog:
2016-04-22  Wilco Dijkstra  

* gcc/config/aarch64/aarch64-simd.md (aarch64_combinez):
Add ? to integer variant.
(aarch64_combinez_be): Likewise.

--

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
e1f5682165cd22ca7d31643b8f4e7f631d99c2d8..d3830838867eec2098b71eb46b7343d0155acf7e
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2645,7 +2645,7 @@
 (define_insn "*aarch64_combinez"
   [(set (match_operand: 0 "register_operand" "=w,w,w")
 (vec_concat:
-  (match_operand:VD_BHSI 1 "general_operand" "w,r,m")
+  (match_operand:VD_BHSI 1 "general_operand" "w,?r,m")
   (match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")))]
   "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "@
@@ -2661,7 +2661,7 @@
   [(set (match_operand: 0 "register_operand" "=w,w,w")
 (vec_concat:
   (match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")
-  (match_operand:VD_BHSI 1 "general_operand" "w,r,m")))]
+  (match_operand:VD_BHSI 1 "general_operand" "w,?r,m")))]
   "TARGET_SIMD && BYTES_BIG_ENDIAN"
   "@
mov\\t%0.8b, %1.8b



Re: [PATCH, wide-int] change fixed_wide_int_storage from class to struct

2016-05-17 Thread Mike Stump
On May 15, 2016, at 1:30 PM, Andrew Pinski  wrote:
> 
> Can we recommend that clang disable this warning by default instead?

No.  We want to ensure the class/struct tags match as there is no good reason 
to have them differ.

> Or use an option flag to disable the warning while compiling gcc?

Don't need the option to disable it, as once we fix gcc, it won't produce these 
warnings.  If clang produces warnings for constructs we do want to use, then we 
disable those warnings as uninteresting.  For example, if they warned for line 
widths more than 80, and we don't have a hard rule about 80, then we'd turn it 
off as overly pedantic for our source base, even though in most files, most of 
the time we were 80 or under.  If we had a hard and fast rule, no lines over 
80, ever, then we'd not turn that warning off; rather, we would fix the 
software to conform to that.

Re: RFA: Generate normal DWARF DW_LOC descriptors for non integer mode pointers

2016-05-17 Thread Jeff Law

On 05/17/2016 06:37 AM, Nick Clifton wrote:

Hi Jeff,


  Currently dwarf2out.c:mem_loc_descriptor() has some special case
  code to handle the situation where an address is held in a register
  whose mode is not of type MODE_INT.  It generates a
  DW_OP_GNU_regval_type expression which may later on be converted into
  a frame pointer based expression.  This is a problem for targets which
  use a partial integer mode for their pointers (eg the msp430).  In
  such cases the conversion to a frame pointer based expression could
  be wrong if the frame pointer is not being used.



I may be missing something, but isn't it the transition to an FP
relative address rather than a SP relative address that's the problem
here?


Yes, I believe so.


Where does that happen?


I did not track it down.  But whilst I was searching for the cause I came
across the code that is modified by the patch.  Reading the code it seemed
obvious to me that the special case for handling non INT_MODE register modes
was not intended for pointers, and when I tried out a small patch it worked.
Maybe rather than tweaking behaviour based on whether or not it's a 
pointer type, we should look at whether or not the object has an integer 
mode (ie, test for MODE_INT or MODE_PARTIAL_INT).





Is it possible we've got the wrong DECL_RTL or somesuch?


I don't think so.  I am not familiar with this code myself, but the dump from
the dwarf2 pass shows:

  (insn 5 2 6 (set (mem/c:HI (plus:PSI (reg/f:PSI 1 R1)
(const_int 4 [0x4])) [1 c+0 S2 A16])
(const_int 5 [0x5])) 
/work/sources/binutils/current/gdb/testsuite/gdb.base/advance.c:41 12 {movhi}
 (nil))

which to me pretty clearly shows that "c" is being stored at R1+4.
Right, but I believe a fair amount of the dwarf stuff goes back to 
trees, which have things like DECL_RTL/DECL_INCOMING_RTL and friends 
embedded inside them.


I wouldn't be terribly surprised to find that it's looking at some stale 
hunk of RTL that wasn't updated for register eliminations or something 
of that nature.


I think we should dig further into why the base register (and offset) is 
wrong and fix that.  We may independently want to tweak the code in 
mem_loc_descriptor to better handle partial integers.


jeff


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-17 Thread Michael Matz
Hi,

On Tue, 17 May 2016, Richard Biener wrote:

> BIT_INSERT_EXPR 

This.

> Any preference?


Ciao,
Michael.


Re: PING^5 [PATCH, GCC 5] PR 70613, -fabi-version docs don't match implementation

2016-05-17 Thread Sandra Loosemore

On 05/17/2016 03:27 AM, Ramana Radhakrishnan wrote:

On Tue, May 17, 2016 at 1:22 AM, Sandra Loosemore
 wrote:

On 05/16/2016 04:35 PM, Jim Wilson wrote:


This is my fifth ping.  I just need someone to rubber stamp it so I
can check it in.



The documentation change looks fine, but as a documentation maintainer only
I don't think I can approve changes to a release branch.


The release branches are open to regression fixes and documentation
fixes. As documentation maintainer you can approve this backport if it
is appropriate. In case of doubt you may want to punt it to the RM's
for final approval.


OK, I was confused about the process.  I thought I remembered mail going 
by that changes to a release branch require RM approval too.


Jim, go ahead and commit the patch.

-Sandra



Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-17 Thread Richard Biener
On Tue, May 17, 2016 at 4:05 PM, Marc Glisse  wrote:
> On Tue, 17 May 2016, Richard Biener wrote:
>
>> On Fri, May 13, 2016 at 3:36 PM, Marc Glisse  wrote:
>>>
>>> On Fri, 13 May 2016, Mikhail Maltsev wrote:
>>>
> I don't know if we might want some :c / single_use restrictions, maybe
> on
> the
> outer convert and the rshift/rotate.
>
 I don't think :c can be used here.
>>>
>>>
>>>
Oops, typo for :s.
>>>
 As for :s, I added it, as you suggested.
>>>
>>>
>>>
>>> :s will be ignored when there is no conversion, but I think that's good
>>> enough for now.
>>
>>
>> Yeah.  Doing :s twice as done in the patch works though.
>
>
> I meant that the output will be a single instruction, and thus any :s will
> be ignored (I think).

Yes, if the outer convert is a no-op then :s will be ignored (and that is good).

 Also, I tried to add some more test cases for rotate with conversions,
 but
 unfortunately GCC does not recognize rotate pattern, when narrowing
 conversions
 are present.
>>>
>>>
>>>
>>> It is usually easier to split your expression into several assignments.
>>> Untested:
>>>
>>> int f(long long a, unsigned long n){
>>>   long long b = ~a;
>>>   unsigned long c = b;
>>>   unsigned long d = ROLL (c, n);
>>>   int e = d;
>>>   return ~e;
>>> }
>>>
>>> this way the rotate pattern is detected early (generic) with no extra
>>> operations to confuse the compiler, while your new transformation will
>>> happen in gimple (most likely the first forwprop pass).
>>>
>>> The patch looks good to me, now wait for Richard's comments.
>>
>>
>> Are you sure narrowing conversions are valid for rotates?
>
>
> My reasoning is that narrowing conversions and bit_not commute (sign
> extensions should be fine as well IIRC). So you can essentially pull
> convert1 out and push convert2 down to @1 (notice how the the result
> converts the input and the output), and simplify the middle (rotate and
> bit_not commute) without any conversion.

Ah, indeed.  I missed the rotation/shift is done on the same type as before.

> Note that an alternative way to handle the transformation would be to fix a
> canonical order for some things that commute. Turn non-extending convert of
> bit_not into bit_not of convert, turn rotate of bit_not to bit_not of rotate
> (or the reverse, whatever), etc. If we are lucky, this might move the 2
> bit_not next to each other, where they can cancel out. But that's more
> ambitious.
>
> I heard that LLVM and Visual Studio were using a prover for such transforms.
> Without going that far, it might be good to use some automation to check
> that a transform works say for all values of all types of precision at most
> 4 or something, if someone is motivated...
>
>> Rotating (char)short_var is not the same as rotating short_var and
>> then truncating to a byte.
>>
>> so at least for the conversion inside the rotate (and shift as well)
>> only nop-conversions look valid to me.
>
>
> Notice that the rotation is done in the type of @0 both before and after.

Yeah, failed to notice that.

The patch is ok.

Thanks,
Richard.

> --
> Marc Glisse


[PATCH 16/17][ARM] Add tests for VFP FP16 ACLE intrinsics.

2016-05-17 Thread Matthew Wahab

Support for using the half-precision floating point operations added by
the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
to the ACLE for the extension.

This patch adds executable tests for the ACLE scalar (floating point)
intrinsics to the advsimd-intrinsics testsuite. The tests were written
by Jiong Wang.

In some tests, there are unavoidable differences in precision when
calculating the actual and the expected results of an FP16 operation. A
new support macro, CHECK_FP_BIAS, is used so that these tests can check
for an acceptable margin of error. In these tests, the tolerance is
given as the absolute integer difference between the bitvectors of the
expected and the actual results.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-05-17  Jiong Wang  
Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(CHECK_FP_BIAS): New.
* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtnh_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtph_s32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vcvtph_u32_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vdivh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vfmah_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vfmsh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vmaxnmh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vminnmh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vmulh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vnegh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndah_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndih_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndmh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndnh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndph_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vrndxh_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vsqrth_f16_1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/vsubh_f16_1.c: New.

>From fe243d41337fcce0c93a8ce1df68921c680bcfe8 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:40:52 +0100
Subject: [PATCH 16/17] [PATCH 16/17][ARM] Add tests for VFP FP16 ACLE
 intrinsics.

testsuite/
2016-05-17  Jiong Wang  
	Matthew Wahab  

	* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
	(CHECK_FP_BIAS): New.
	* gcc.target/aarch64/advsimd-intrinsics/vabsh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vaddh_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtah_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_s32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_f16_u32_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_n_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvth_u32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_s32_f16_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vcvtmh_u32_f16_1.c: New.
	* 

[PATCH 13/17][ARM] Add VFP FP16 intrinsics.

2016-05-17 Thread Matthew Wahab

The ARMv8.2-A architecture introduces an optional FP16 extension adding
half-precision floating point data processing instructions to the
existing scalar (floating point) support. A future version of the ACLE
will add support for these instructions and this patch implements that
support.

The ACLE will introduce new intrinsics for the scalar (floating-point)
instructions together with a new header file arm_fp16.h. The ACLE will
require that the intrinsics are available when both the header file is
included and the ACLE feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
is defined. (The new ACLE feature macros are dealt with in an earlier
patch.)

The patch adds the arm_fp16.h header file with the following new
intrinsics:

float16_t vabsh_f16 (float16_t __a)
int32_t vcvtah_s32_f16 (float16_t __a)
uint32_t vcvtah_u32_f16 (float16_t __a)
float16_t vcvth_f16_s32 (int32_t __a)
float16_t vcvth_f16_u32 (uint32_t __a)
int32_t vcvth_s32_f16 (float16_t __a)
uint32_t vcvth_u32_f16 (float16_t __a)
int32_t vcvtmh_s32_f16 (float16_t __a)
uint32_t vcvtmh_u32_f16 (float16_t __a)
int32_t vcvtnh_s32_f16 (float16_t __a)
uint32_t vcvtnh_u32_f16 (float16_t __a)
int32_t vcvtph_s32_f16 (float16_t __a)
uint32_t vcvtph_u32_f16 (float16_t __a)
float16_t vnegh_f16 (float16_t __a)
float16_t vrndah_f16 (float16_t __a)
float16_t vrndh_f16 (float16_t __a)
float16_t vrndih_f16 (float16_t __a)
float16_t vrndmh_f16 (float16_t __a)
float16_t vrndnh_f16 (float16_t __a)
float16_t vrndph_f16 (float16_t __a)
float16_t vrndxh_f16 (float16_t __a)
float16_t vsqrth_f16 (float16_t __a)

float16_t vaddh_f16 (float16_t __a, float16_t __b)
float16_t vcvth_n_f16_s32 (int32_t __a, const int __b)
float16_t vcvth_n_f16_u32 (uint32_t __a, const int __b)
int32_t vcvth_n_s32_f16 (float16_t __a, const int __b)
uint32_t vcvth_n_u32_f16 (float16_t __a, const int __b)
float16_t vdivh_f16 (float16_t __a, float16_t __b)
float16_t vmaxnmh_f16 (float16_t __a, float16_t __b)
float16_t vminnmh_f16 (float16_t __a, float16_t __b)
float16_t vmulh_f16 (float16_t __a, float16_t __b)
float16_t vsubh_f16 (float16_t __a, float16_t __b)

float16_t vfmah_f16 (float16_t __a, float16_t __b, float16_t __c)
float16_t vfmsh_f16 (float16_t __a, float16_t __b, float16_t __c)


Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config.gcc (extra_headers): Add arm_fp16.h.
* config/arm/arm_fp16.h: New.

>From 0c7d4da5a7c8ca9cf3ce2f23072668c4155b35d9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:36:23 +0100
Subject: [PATCH 13/17] [PATCH 13/17][ARM] Add VFP FP16 intrinsics.

2016-05-17  Matthew Wahab  

	* config.gcc (extra_headers): Add arm_fp16.h.
	* config/arm/arm_fp16.h: New.
---
 gcc/config.gcc|   2 +-
 gcc/config/arm/arm_fp16.h | 255 ++
 2 files changed, 256 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/arm/arm_fp16.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 51af122a..e22ff9e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -327,7 +327,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm_fp16.h b/gcc/config/arm/arm_fp16.h
new file mode 100644
index 000..702090a
--- /dev/null
+++ b/gcc/config/arm/arm_fp16.h
@@ -0,0 +1,255 @@
+/* ARM FP16 intrinsics include file.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef _GCC_ARM_FP16_H
+#define _GCC_ARM_FP16_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+
+/* Intrinsics for FP16 instructions.  */
+#pragma GCC 

[PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16 support.

2016-05-17 Thread Matthew Wahab

Support for using the half-precision floating point operations added by
the ARMv8.2-A FP16 extension is based on the macros and intrinsics added
to the ACLE for the extension.

This patch adds tests to check the compiler's treatment of the ACLE
macros and the code generated for the new intrinsics. It does not
include the executable tests for the
gcc.target/aarch64/advsimd-intrinsics testsuite. Those are added later
in the patch series.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
support.

>From fe0cac871efe08d491a3b4ac027c29db1a72d15c Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:38:02 +0100
Subject: [PATCH 15/17] [PATCH 15/17][ARM] Add tests for ARMv8.2-A FP16
 support.

testsuite/
2016-05-17  Matthew Wahab  

	* gcc.target/arm/armv8_2-fp16-neon-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-1.c: New.
	* gcc.target/arm/armv8_2-fp16-scalar-2.c: New.
	* gcc.target/arm/attr-fp16-arith-1.c: Add a test of intrinsics
	support.
---
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c | 490 +
 .../gcc.target/arm/armv8_2-fp16-scalar-1.c | 203 +
 .../gcc.target/arm/armv8_2-fp16-scalar-2.c |  71 +++
 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c   |  13 +
 4 files changed, 777 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-scalar-2.c

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
new file mode 100644
index 000..576031e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-neon-1.c
@@ -0,0 +1,490 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok }  */
+/* { dg-options "-O2" }  */
+/* { dg-add-options arm_v8_2a_fp16_neon }  */
+
+/* Test instructions generated for the FP16 vector intrinsics.  */
+
+#include 
+
+#define MSTRCAT(L, str)	L##str
+
+#define UNOP_TEST(insn)\
+  float16x4_t	\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a)	\
+  {		\
+return MSTRCAT (insn, _f16) (a);		\
+  }		\
+  float16x8_t	\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a)	\
+  {		\
+return MSTRCAT (insn, q_f16) (a);		\
+  }
+
+#define BINOP_TEST(insn)	\
+  float16x4_t			\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b)	\
+  {\
+return MSTRCAT (insn, _f16) (a, b);\
+  }\
+  float16x8_t			\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b)	\
+  {\
+return MSTRCAT (insn, q_f16) (a, b);			\
+  }
+
+#define BINOP_LANE_TEST(insn, I)	\
+  float16x4_t\
+  MSTRCAT (test_##insn##_lane, _16x4) (float16x4_t a, float16x4_t b)	\
+  {	\
+return MSTRCAT (insn, _lane_f16) (a, b, I);\
+  }	\
+  float16x8_t\
+  MSTRCAT (test_##insn##_lane, _16x8) (float16x8_t a, float16x4_t b)	\
+  {	\
+return MSTRCAT (insn, q_lane_f16) (a, b, I);			\
+  }
+
+#define BINOP_LANEQ_TEST(insn, I)	\
+  float16x4_t\
+  MSTRCAT (test_##insn##_laneq, _16x4) (float16x4_t a, float16x8_t b)	\
+  {	\
+return MSTRCAT (insn, _laneq_f16) (a, b, I);			\
+  }	\
+  float16x8_t\
+  MSTRCAT (test_##insn##_laneq, _16x8) (float16x8_t a, float16x8_t b)	\
+  {	\
+return MSTRCAT (insn, q_laneq_f16) (a, b, I);			\
+  }	\
+
+#define BINOP_N_TEST(insn)	\
+  float16x4_t			\
+  MSTRCAT (test_##insn##_n, _16x4) (float16x4_t a, float16_t b)	\
+  {\
+return MSTRCAT (insn, _n_f16) (a, b);			\
+  }\
+  float16x8_t			\
+  MSTRCAT (test_##insn##_n, _16x8) (float16x8_t a, float16_t b)	\
+  {\
+return MSTRCAT (insn, q_n_f16) (a, b);			\
+  }
+
+#define TERNOP_TEST(insn)		\
+  float16_t\
+  MSTRCAT (test_##insn, _16) (float16_t a, float16_t b, float16_t c)	\
+  {	\
+return MSTRCAT (insn, h_f16) (a, b, c);\
+  }	\
+  float16x4_t\
+  MSTRCAT (test_##insn, _16x4) (float16x4_t a, float16x4_t b,		\
+			   float16x4_t c)\
+  {	\
+return MSTRCAT (insn, _f16) (a, b, c);\
+  }	\
+  float16x8_t\
+  MSTRCAT (test_##insn, _16x8) (float16x8_t a, float16x8_t b,		\
+			   float16x8_t c)\
+  {	\
+return MSTRCAT (insn, q_f16) (a, b, c);\
+  }
+
+#define VCMP1_TEST(insn)			\
+  uint16x4_t	\
+  MSTRCAT 

[PATCH 14/17][ARM] Add NEON FP16 intrinsics.

2016-05-17 Thread Matthew Wahab

The ARMv8.2-A architecture introduces an optional FP16 extension adding
half-precision floating point data processing instructions to the
existing Adv.SIMD (NEON) support. A future version of the ACLE will add
support for these instructions and this patch implements that support.

The ACLE will introduce new intrinsics for the Adv.SIMD instructions and
will require that these intrinsics are available when both the header
file arm_neon.h is included and the ACLE feature macro
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined. (The new ACLE feature
macro is dealt with in an earlier patch.)

The patch adds the following new intrinsics to arm_neon.h:

float16x4_t vabs_f16 (float16x4_t __a)
float16x8_t vabsq_f16 (float16x8_t __a)
uint16x4_t vceqz_f16 (float16x4_t __a)
uint16x8_t vceqzq_f16 (float16x8_t __a)
uint16x4_t vcgez_f16 (float16x4_t __a)
uint16x8_t vcgezq_f16 (float16x8_t __a)
uint16x4_t vcgtz_f16 (float16x4_t __a)
uint16x8_t vcgtzq_f16 (float16x8_t __a)
uint16x4_t vclez_f16 (float16x4_t __a)
uint16x8_t vclezq_f16 (float16x8_t __a)
uint16x4_t vcltz_f16 (float16x4_t __a)
uint16x8_t vcltzq_f16 (float16x8_t __a)
float16x4_t vcvt_f16_s16 (int16x4_t __a)
float16x4_t vcvt_f16_u16 (uint16x4_t __a)
int16x4_t vcvt_s16_f16 (float16x4_t __a)
uint16x4_t vcvt_u16_f16 (float16x4_t __a)
float16x8_t vcvtq_f16_s16 (int16x8_t __a)
float16x8_t vcvtq_f16_u16 (uint16x8_t __a)
int16x8_t vcvtq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtq_u16_f16 (float16x8_t __a)
int16x4_t vcvta_s16_f16 (float16x4_t __a)
uint16x4_t vcvta_u16_f16 (float16x4_t __a)
int16x8_t vcvtaq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtaq_u16_f16 (float16x8_t __a)
int16x4_t vcvtm_s16_f16 (float16x4_t __a)
uint16x4_t vcvtm_u16_f16 (float16x4_t __a)
int16x8_t vcvtmq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtmq_u16_f16 (float16x8_t __a)
int16x4_t vcvtn_s16_f16 (float16x4_t __a)
uint16x4_t vcvtn_u16_f16 (float16x4_t __a)
int16x8_t vcvtnq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtnq_u16_f16 (float16x8_t __a)
int16x4_t vcvtp_s16_f16 (float16x4_t __a)
uint16x4_t vcvtp_u16_f16 (float16x4_t __a)
int16x8_t vcvtpq_s16_f16 (float16x8_t __a)
uint16x8_t vcvtpq_u16_f16 (float16x8_t __a)
float16x4_t vneg_f16 (float16x4_t __a)
float16x8_t vnegq_f16 (float16x8_t __a)
float16x4_t vrecpe_f16 (float16x4_t __a)
float16x8_t vrecpeq_f16 (float16x8_t __a)
float16x4_t vrnd_f16 (float16x4_t __a)
float16x8_t vrndq_f16 (float16x8_t __a)
float16x4_t vrnda_f16 (float16x4_t __a)
float16x8_t vrndaq_f16 (float16x8_t __a)
float16x4_t vrndm_f16 (float16x4_t __a)
float16x8_t vrndmq_f16 (float16x8_t __a)
float16x4_t vrndn_f16 (float16x4_t __a)
float16x8_t vrndnq_f16 (float16x8_t __a)
float16x4_t vrndp_f16 (float16x4_t __a)
float16x8_t vrndpq_f16 (float16x8_t __a)
float16x4_t vrndx_f16 (float16x4_t __a)
float16x8_t vrndxq_f16 (float16x8_t __a)
float16x4_t vsqrte_f16 (float16x4_t __a)
float16x8_t vsqrteq_f16 (float16x8_t __a)

float16x4_t vabd_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vabdq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vadd_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vaddq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcage_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcageq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcagt_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcagtq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcale_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcaleq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcalt_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcaltq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vceq_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vceqq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcge_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcgeq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcgt_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcgtq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vcle_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcleq_f16 (float16x8_t __a, float16x8_t __b)
uint16x4_t vclt_f16 (float16x4_t __a, float16x4_t __b)
uint16x8_t vcltq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vcvt_n_f16_s16 (int16x4_t __a, const int __b)
float16x4_t vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
float16x8_t vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
float16x8_t vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
int16x4_t vcvt_n_s16_f16 (float16x4_t __a, const int __b)
uint16x4_t vcvt_n_u16_f16 (float16x4_t __a, const int __b)
int16x8_t vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
uint16x8_t vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
float16x4_t vmax_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vmaxq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vmin_f16 (float16x4_t __a, float16x4_t __b)
float16x8_t vminq_f16 (float16x8_t __a, float16x8_t __b)
float16x4_t vminnm_f16 (float16x4_t __a, float16x4_t 

Re: [committed] Cherry-pick upstream asan fix for upcoming glibc (PR sanitizer/71160)

2016-05-17 Thread Jakub Jelinek
On Tue, May 17, 2016 at 05:38:27PM +0300, Maxim Ostapenko wrote:
> Hi Jakub,
> 
> thanks for backporting this! Do you have any plans to apply this patch to
> GCC 5 and 6 branches? AFAIK people hit on this ASan + newer Glibc bug by
> using GCC 5.3.1 on Fedora 23.

I don't have the newer glibc on my box, therefore I'm waiting until somebody
confirms the trunk change fixed it before backporting.

> >IMHO even better would be to make sure that in the common case (recent
> >glibc) we don't have failed dlsym calls (still, this hack is useful just in
> >case) - the __isoc99_*printf* interceptors make no sense, glibc has never
> >exported those.  Thus, if we bump ABI of libasan again for GCC 7, IMHO those
> >bogus interceptors should be ifdefed out for glibc or removed completely.
> >Or, if we don't want to break ABI, at least changed so that they actually
> >dlsym the corresponding *printf* (not __isoc99_ prefixed) functions
> >instead of the bogus ones.
> 
> We should definitely bump libasan version on next libsanitizer merge,
> because it would contain ABI breaking changes in ASan. Perhaps we could
> ifdef these __isoc99 interceptors as a local GCC patch then?

Well, with the patch it is nothing urgent, but IMNSHO the bogus interceptors
aren't needed upstream either.

Jakub


[PATCH 12/17][ARM] Add builtins for NEON FP16 intrinsics.

2016-05-17 Thread Matthew Wahab

This patch adds the builtins data for the ACLE intrinsics introduced to
support the NEON instructions of the ARMv8.2-A FP16 extension.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
variants).
(vmulf): New (v8hf, v4hf variants).
(vfma): New (v8hf, v4hf variants).
(vfms): New (v8hf, v4hf variants).
(vsub): New (v8hf, v4hf variants).
(vcage): New (v8hf, v4hf variants).
(vcagt): New (v8hf, v4hf variants).
(vcale): New (v8hf, v4hf variants).
(vcalt): New (v8hf, v4hf variants).
(vceq): New (v8hf, v4hf variants).
(vcgt): New (v8hf, v4hf variants).
(vcge): New (v8hf, v4hf variants).
(vcle): New (v8hf, v4hf variants).
(vclt): New (v8hf, v4hf variants).
(vceqz): New (v8hf, v4hf variants).
(vcgez): New (v8hf, v4hf variants).
(vcgtz): New (v8hf, v4hf variants).
(vcltz): New (v8hf, v4hf variants).
(vclez): New (v8hf, v4hf variants).
(vabd): New (v8hf, v4hf variants).
(vmaxf): New (v8hf, v4hf variants).
(vmaxnm): New (v8hf, v4hf variants).
(vminf): New (v8hf, v4hf variants).
(vminnm): New (v8hf, v4hf variants).
(vpmaxf): New (v4hf variant).
(vpminf): New (v4hf variant).
(vpadd): New (v4hf variant).
(vrecps): New (v8hf, v4hf variants).
(vrsqrts): New (v8hf, v4hf variants).
(vabs): New (v8hf, v4hf variants).
(vneg): New (v8hf, v4hf variants).
(vrecpe): New (v8hf, v4hf variants).
(vrnd): New (v8hf, v4hf variants).
(vrnda): New (v8hf, v4hf variants).
(vrndm): New (v8hf, v4hf variants).
(vrndn): New (v8hf, v4hf variants).
(vrndp): New (v8hf, v4hf variants).
(vrndx): New (v8hf, v4hf variants).
(vsqrte): New (v8hf, v4hf variants).
(vdup_n): New (v8hf, v4hf variants).
(vdup_lane): New (v8hf, v4hf variants).
(vmul_lane): Add v4hf and v8hf variants.
(vmul_n): Add v4hf and v8hf variants.
(vext): New (v8hf, v4hf variants).
(vcvts): New (v8hi, v4hi variants).
(vcvts): New (v8hf, v4hf variants).
(vcvtu): New (v8hi, v4hi variants).
(vcvtu): New (v8hf, v4hf variants).
(vcvts_n): New (v8hf, v4hf variants).
(vcvtu_n): New (v8hi, v4hi variants).
(vcvts_n): New (v8hi, v4hi variants).
(vcvtu_n): New (v8hf, v4hf variants).
(vbsl): New (v8hf, v4hf variants).
(vcvtas): New (v8hf, v4hf variants).
(vcvtau): New (v8hf, v4hf variants).
(vcvtms): New (v8hf, v4hf variants).
(vcvtmu): New (v8hf, v4hf variants).
(vcvtns): New (v8hf, v4hf variants).
(vcvtnu): New (v8hf, v4hf variants).
(vcvtps): New (v8hf, v4hf variants).
(vcvtpu): New (v8hf, v4hf variants).

>From ca740dee578be4c67afeec106feaa1633daff63b Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:36:41 +0100
Subject: [PATCH 12/17] [PATCH 12/17][ARM] Add builtins for NEON FP16
 intrinsics.

2016-05-17  Matthew Wahab  

	* config/arm/arm_neon_builtins.def (vadd): New (v8hf, v4hf
	variants).
	(vmulf): New (v8hf, v4hf variants).
	(vfma): New (v8hf, v4hf variants).
	(vfms): New (v8hf, v4hf variants).
	(vsub): New (v8hf, v4hf variants).
	(vcage): New (v8hf, v4hf variants).
	(vcagt): New (v8hf, v4hf variants).
	(vcale): New (v8hf, v4hf variants).
	(vcalt): New (v8hf, v4hf variants).
	(vceq): New (v8hf, v4hf variants).
	(vcgt): New (v8hf, v4hf variants).
	(vcge): New (v8hf, v4hf variants).
	(vcle): New (v8hf, v4hf variants).
	(vclt): New (v8hf, v4hf variants).
	(vceqz): New (v8hf, v4hf variants).
	(vcgez): New (v8hf, v4hf variants).
	(vcgtz): New (v8hf, v4hf variants).
	(vcltz): New (v8hf, v4hf variants).
	(vclez): New (v8hf, v4hf variants).
	(vabd): New (v8hf, v4hf variants).
	(vmaxf): New (v8hf, v4hf variants).
	(vmaxnm): New (v8hf, v4hf variants).
	(vminf): New (v8hf, v4hf variants).
	(vminnm): New (v8hf, v4hf variants).
	(vpmaxf): New (v4hf variant).
	(vpminf): New (v4hf variant).
	(vpadd): New (v4hf variant).
	(vrecps): New (v8hf, v4hf variants).
	(vrsqrts): New (v8hf, v4hf variants).
	(vabs): New (v8hf, v4hf variants).
	(vneg): New (v8hf, v4hf variants).
	(vrecpe): New (v8hf, v4hf variants).
	(vrnd): New (v8hf, v4hf variants).
	(vrnda): New (v8hf, v4hf variants).
	(vrndm): New (v8hf, v4hf variants).
	(vrndn): New (v8hf, v4hf variants).
	(vrndp): New (v8hf, v4hf variants).
	(vrndx): New (v8hf, v4hf variants).
	(vsqrte): New (v8hf, v4hf variants).
	(vdup_n): New (v8hf, v4hf variants).
	(vdup_lane): New (v8hf, v4hf variants).
	(vmul_lane): Add 

[PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17 Thread Matthew Wahab

The ARMv8.2-A FP16 extension adds a number of arithmetic instructions
to the VFP instruction set. This patch adds support for these
instructions to the ARM backend.

In most cases the instructions are added using non-standard pattern
names. This is to force operations on __fp16 values to be done, by
conversion, using the single-precision instructions. The exceptions are
the precision preserving operations ABS and NEG.

The instruction patterns can be used by the compiler to optimize
half-precision operations. Since the pattern names are non-standard, the
only way for half-precision operations to be generated is by using the
intrinsics added by this patch series, meaning that existing code will
not be affected.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/iterators.md (Code iterators): Fix some white-space
in the comments.
(GLTE): New.
(ABSNEG): New.
(FCVT): Moved from vfp.md.
(VCVT_HF_US_N): New.
(VCVT_SI_US_N): New.
(VCVT_HF_US): New.
(VCVTH_US): New.
(FP16_RND): New.
(absneg_str): New.
(FCVTI32typename): Moved from vfp.md.
(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S_N,
UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
(vcvth_op): New.
(fp16_rnd_str): New.
(fp16_rnd_insn): New.
* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
(UNSPEC_VCVT_HF_U_N): New.
(UNSPEC_VCVT_SI_S_N): New.
(UNSPEC_VCVT_SI_U_N): New.
(UNSPEC_VCVTH_S): New.
(UNSPEC_VCVTH_U): New.
(UNSPEC_VCVTA_S): New.
(UNSPEC_VCVTA_U): New.
(UNSPEC_VCVTM_S): New.
(UNSPEC_VCVTM_U): New.
(UNSPEC_VCVTN_S): New.
(UNSPEC_VCVTN_U): New.
(UNSPEC_VCVTP_S): New.
(UNSPEC_VCVTP_U): New.
(UNSPEC_VRND): New.
(UNSPEC_VRNDA): New.
(UNSPEC_VRNDI): New.
(UNSPEC_VRNDM): New.
(UNSPEC_VRNDN): New.
(UNSPEC_VRNDP): New.
(UNSPEC_VRNDX): New.
* config/arm/vfp.md (hf2): New.
(neon_vhf): New.
(neon_vhf): New.
(neon_vrndihf): New.
(addhf3_fp16): New.
(neon_vaddhf): New.
(subhf3_fp16): New.
(neon_vsubhf): New.
(divhf3_fp16): New.
(neon_vdivhf): New.
(mulhf3_fp16): New.
(neon_vmulhf): New.
(*mulsf3neghf_vfp): New.
(*negmulhf3_vfp): New.
(*mulsf3addhf_vfp): New.
(*mulhf3subhf_vfp): New.
(*mulhf3neghfaddhf_vfp): New.
(*mulhf3neghfsubhf_vfp): New.
(fmahf4_fp16): New.
(neon_vfmahf): New.
(fmsubhf4_fp16): New.
(neon_vfmshf): New.
(*fnmsubhf4): New.
(*fnmaddhf4): New.
(neon_vsqrthf): New.
(neon_vrsqrtshf): New.
(FCVT): Move to iterators.md.
(FCVTI32typename): Likewise.
(neon_vcvthhf): New.
(neon_vcvthsi): New.
(neon_vcvth_nhf_unspec): New.
(neon_vcvth_nhf): New.
(neon_vcvth_nsi_unspec): New.
(neon_vcvth_nsi): New.
(neon_vcvthsi): New.
(neon_hf): New.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
* gcc.target/arm/armv8_2-fp16-conv-1.c: New.

>From 3e773f2ec85ea66d0be0e3a97ea52826156c00f2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 14:49:17 +0100
Subject: [PATCH 08/17] [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17  Matthew Wahab  

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in the comments.
	(GLTE): New.
	(ABSNEG): New.
	(FCVT): Moved from vfp.md.
	(VCVT_HF_US_N): New.
	(VCVT_SI_US_N): New.
	(VCVT_HF_US): New.
	(VCVTH_US): New.
	(FP16_RND): New.
	(absneg_str): New.
	(FCVTI32typename): Moved from vfp.md.
	(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
	UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
	UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
	UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N,  UNSPEC_VCVTH_S_N,
	UNSPEC_VCVTH_U_N, UNSPEC_VCVTH_S and UNSPEC_VCVTH_U.
	(vcvth_op): New.
	(fp16_rnd_str): New.
	(fp16_rnd_insn): New.
	* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
	(UNSPEC_VCVT_HF_U_N): New.
	(UNSPEC_VCVT_SI_S_N): New.
	(UNSPEC_VCVT_SI_U_N): New.
	(UNSPEC_VCVTH_S): New.
	(UNSPEC_VCVTH_U): New.
	(UNSPEC_VCVTA_S): New.
	(UNSPEC_VCVTA_U): New.
	(UNSPEC_VCVTM_S): New.
	(UNSPEC_VCVTM_U): New.
	(UNSPEC_VCVTN_S): New.
	

[PATCH 11/17][ARM] Add builtins for VFP FP16 intrinsics.

2016-05-17 Thread Matthew Wahab

The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
require that intrinsics for scalar floating-point (VFP) instructions
are available under different conditions from those for the NEON
intrinsics.

This patch adds the support code and builtins data for the new VFP
intrinsics. Because of the similarities between the scalar and NEON
builtins, the support code for the scalar builtins follows the code for
the NEON builtins. The declarations for the VFP builtins are also added
in this patch since the support code expects non-empty tables.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm-builtins.c (hf_UP): New.
(si_UP): New.
(arm_vfp_builtin_data): New.  Update comment.
(enum arm_builtins): Include arm_vfp_builtins.def.
(ARM_BUILTIN_VFP_PATTERN_START): New.
(arm_init_vfp_builtins): New.
(arm_init_builtins): Add arm_init_vfp_builtins.
(arm_expand_vfp_builtin): New.
(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
long line.
* config/arm/arm_vfp_builtins.def: New file.
* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
(arm-builtins.o): Likewise.

>From d1f2b10a2e672b1dc886d8d1efb136d970f967f1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 15:33:14 +0100
Subject: [PATCH 11/17] [PATCH 11/17][ARM] Add builtins for VFP FP16
 intrinsics.

2016-05-17  Matthew Wahab  

	* config/arm/arm-builtins.c (hf_UP): New.
	(si_UP): New.
	(arm_vfp_builtin_data): New.  Update comment.
	(arm_init_vfp_builtins): New.
	(arm_init_builtins): Add arm_init_vfp_builtins.
	(arm_expand_vfp_builtin): New.
	(arm_expand_builtins): Update for arm_expand_vfp_builtin.  Fix
	long line.
	* config/arm/arm_vfp_builtins.def: New file.
	* config/arm/t-arm (arm.o): Add arm_vfp_builtins.def.
	(arm-builtins.o): Likewise.
---
 gcc/config/arm/arm-builtins.c   | 75 +
 gcc/config/arm/arm_vfp_builtins.def | 56 +++
 gcc/config/arm/t-arm                |  4 +-
 3 files changed, 126 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/arm/arm_vfp_builtins.def

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 5a22b91..58c68a6 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -190,6 +190,8 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ti_UP	 TImode
 #define ei_UP	 EImode
 #define oi_UP	 OImode
+#define hf_UP	 HFmode
+#define si_UP	 SImode
 
 #define UP(X) X##_UP
 
@@ -239,12 +241,22 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def.
-   The mode entries in the following table correspond to the "key" type of the
-   instruction variant, i.e. equivalent to that which would be specified after
-   the assembler mnemonic, which usually refers to the last vector operand.
-   The modes listed per instruction should be the same as those defined for
-   that instruction's pattern in neon.md.  */
+/* The NEON builtin data can be found in arm_neon_builtins.def and
+   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The entries in arm_vfp_builtins.def require
+   TARGET_VFP to be true.  The feature tests are checked when the builtins are
+   expanded.
+
+   The mode entries in the following table correspond to
+   the "key" type of the instruction variant, i.e. equivalent to that which
+   would be specified after the assembler mnemonic, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern in neon.md.  */
+
+static neon_builtin_datum vfp_builtin_data[] =
+{
+#include "arm_vfp_builtins.def"
+};
 
 static neon_builtin_datum neon_builtin_data[] =
 {
@@ -534,6 +546,10 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_VFP_BASE,
+
+#include "arm_vfp_builtins.def"
+
   ARM_BUILTIN_NEON_BASE,
   ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
 
@@ -542,6 +558,9 @@ enum arm_builtins
   ARM_BUILTIN_MAX
 };
 
+#define ARM_BUILTIN_VFP_PATTERN_START \
+  (ARM_BUILTIN_VFP_BASE + 1)
+
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
@@ -1033,6 +1052,20 @@ arm_init_neon_builtins (void)
 }
 }
 
+/* Set up all the scalar floating point builtins.  */
+
+static void
+arm_init_vfp_builtins (void)
+{
+  unsigned int i, fcode = ARM_BUILTIN_VFP_PATTERN_START;
+
+  for (i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++)
+{
+  neon_builtin_datum *d = &vfp_builtin_data[i];
+  arm_init_neon_builtin (fcode, d);
+}
+}

[PATCH 10/17][ARM] Refactor support code for NEON builtins.

2016-05-17 Thread Matthew Wahab

The ACLE intrinsics introduced to support the ARMv8.2 FP16 extensions
require that intrinsics for scalar (VFP) instructions are available
under different conditions from those for the NEON intrinsics. To
support this, changes to the builtins support code are needed to enable
the scalar intrinsics to be initialized and expanded independently of
the NEON intrinsics.

This patch prepares for this by refactoring some of the builtin support
code so that it can be used for both the scalar and the NEON intrinsics.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm-builtins.c (ARM_BUILTIN_NEON_PATTERN_START):
Change offset calculation.
(arm_init_neon_builtin): New.
(arm_init_builtins): Move body of a loop to the standalone
function arm_init_neon_builtin.
(arm_expand_neon_builtin_1): New.  Update comment.  Function body
moved from arm_expand_neon_builtin with some white-space fixes.
(arm_expand_neon_builtin): Move code into the standalone function
arm_expand_neon_builtin_1.

>From 01aee04d2dc6d2d089407ab14892164417f8407e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:36:09 +0100
Subject: [PATCH 10/17] [PATCH 10/17][ARM] Refactor support code for NEON
 builtins.

2016-05-17  Matthew Wahab  

	* config/arm/arm-builtins.c (arm_init_neon_builtin): New.
	(arm_init_builtins): Move body of a loop to the standalone
	function arm_init_neon_builtin.
	(arm_expand_neon_builtin_1): New.  Update comment.  Function body
	moved from arm_expand_neon_builtin with some white-space fixes.
	(arm_expand_neon_builtin): Move code into the standalone function
	arm_expand_neon_builtin_1.
---
 gcc/config/arm/arm-builtins.c | 292 +++---
 1 file changed, 158 insertions(+), 134 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 90fb40f..5a22b91 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -543,7 +543,7 @@ enum arm_builtins
 };
 
 #define ARM_BUILTIN_NEON_PATTERN_START \
-(ARM_BUILTIN_MAX - ARRAY_SIZE (neon_builtin_data))
+  (ARM_BUILTIN_NEON_BASE + 1)
 
 #undef CF
 #undef VAR1
@@ -895,6 +895,110 @@ arm_init_simd_builtin_scalar_types (void)
 	 "__builtin_neon_uti");
 }
 
+/* Set up a NEON builtin.  */
+
+static void
+arm_init_neon_builtin (unsigned int fcode,
+		   neon_builtin_datum *d)
+{
+  bool print_type_signature_p = false;
+  char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 };
+  char namebuf[60];
+  tree ftype = NULL;
+  tree fndecl = NULL;
+
+  d->fcode = fcode;
+
+  /* We must track two variables here.  op_num is
+ the operand number as in the RTL pattern.  This is
+ required to access the mode (e.g. V4SF mode) of the
+ argument, from which the base type can be derived.
+ arg_num is an index in to the qualifiers data, which
+ gives qualifiers to the type (e.g. const unsigned).
+ The reason these two variables may differ by one is the
+ void return type.  While all return types take the 0th entry
+ in the qualifiers array, there is no operand for them in the
+ RTL pattern.  */
+  int op_num = insn_data[d->code].n_operands - 1;
+  int arg_num = d->qualifiers[0] & qualifier_void
+? op_num + 1
+: op_num;
+  tree return_type = void_type_node, args = void_list_node;
+  tree eltype;
+
+  /* Build a function type directly from the insn_data for this
+ builtin.  The build_function_type () function takes care of
+ removing duplicates for us.  */
+  for (; op_num >= 0; arg_num--, op_num--)
+{
+  machine_mode op_mode = insn_data[d->code].operand[op_num].mode;
+  enum arm_type_qualifiers qualifiers = d->qualifiers[arg_num];
+
+  if (qualifiers & qualifier_unsigned)
+	{
+	  type_signature[arg_num] = 'u';
+	  print_type_signature_p = true;
+	}
+  else if (qualifiers & qualifier_poly)
+	{
+	  type_signature[arg_num] = 'p';
+	  print_type_signature_p = true;
+	}
+  else
+	type_signature[arg_num] = 's';
+
+  /* Skip an internal operand for vget_{low, high}.  */
+  if (qualifiers & qualifier_internal)
+	continue;
+
+  /* Some builtins have different user-facing types
+	 for certain arguments, encoded in d->mode.  */
+  if (qualifiers & qualifier_map_mode)
+	op_mode = d->mode;
+
+  /* For pointers, we want a pointer to the basic type
+	 of the vector.  */
+  if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
+	op_mode = GET_MODE_INNER (op_mode);
+
+  eltype = arm_simd_builtin_type
+	(op_mode,
+	 (qualifiers & qualifier_unsigned) != 0,
+	 (qualifiers & qualifier_poly) != 0);
+  gcc_assert (eltype != NULL);
+
+  /* Add qualifiers.  */
+  if (qualifiers & 

Re: [committed] Cherry-pick upstream asan fix for upcoming glibc (PR sanitizer/71160)

2016-05-17 Thread Maxim Ostapenko

Hi Jakub,

thanks for backporting this! Do you have any plans to apply this patch 
to GCC 5 and 6 branches? AFAIK people hit on this ASan + newer Glibc bug 
by using GCC 5.3.1 on Fedora 23.


On 17/05/16 12:23, Jakub Jelinek wrote:

Hi!

This is a cherry-pick of upstream fix, so that dlsym can call not just
calloc, but also malloc or realloc, even before asan is initialized.

Tested on x86_64-linux, committed so far to trunk.

IMHO even better would be to make sure that in the common case (recent
glibc) we don't have failed dlsym calls (still, this hack is useful just in
case) - the __isoc99_*printf* interceptors make no sense, glibc has never
exported those.  Thus, if we bump ABI of libasan again for GCC 7, IMHO those
bogus interceptors should be ifdefed out for glibc or removed completely.
Or, if we don't want to break ABI, at least changed so that they actually
dlsym the corresponding *printf* (not __isoc99_ prefixed) functions
instead of the bogus ones.


We should definitely bump libasan version on next libsanitizer merge, 
because it would contain ABI breaking changes in ASan. Perhaps we could 
ifdef these __isoc99 interceptors as a local GCC patch then?



2016-05-17  Jakub Jelinek  

PR sanitizer/71160
* asan/asan_malloc_linux.cc: Cherry pick upstream r254395
and r269633.

--- libsanitizer/asan/asan_malloc_linux.cc.jj	2014-09-24 11:08:01.772039066 +0200
+++ libsanitizer/asan/asan_malloc_linux.cc	2016-05-17 11:02:37.859379380 +0200
@@ -24,39 +24,62 @@
  // -- Replacement functions  {{{1
  using namespace __asan;  // NOLINT
  
+static uptr allocated_for_dlsym;

+static const uptr kDlsymAllocPoolSize = 1024;
+static uptr alloc_memory_for_dlsym[kDlsymAllocPoolSize];
+
+static bool IsInDlsymAllocPool(const void *ptr) {
+  uptr off = (uptr)ptr - (uptr)alloc_memory_for_dlsym;
+  return off < sizeof(alloc_memory_for_dlsym);
+}
+
+static void *AllocateFromLocalPool(uptr size_in_bytes) {
+  uptr size_in_words = RoundUpTo(size_in_bytes, kWordSize) / kWordSize;
+  void *mem = (void*)&alloc_memory_for_dlsym[allocated_for_dlsym];
+  allocated_for_dlsym += size_in_words;
+  CHECK_LT(allocated_for_dlsym, kDlsymAllocPoolSize);
+  return mem;
+}
+
  INTERCEPTOR(void, free, void *ptr) {
GET_STACK_TRACE_FREE;
+  if (UNLIKELY(IsInDlsymAllocPool(ptr)))
+return;
   asan_free(ptr, &stack, FROM_MALLOC);
  }
  
  INTERCEPTOR(void, cfree, void *ptr) {

GET_STACK_TRACE_FREE;
+  if (UNLIKELY(IsInDlsymAllocPool(ptr)))
+return;
   asan_free(ptr, &stack, FROM_MALLOC);
  }
  
  INTERCEPTOR(void*, malloc, uptr size) {

+  if (UNLIKELY(!asan_inited))
+// Hack: dlsym calls malloc before REAL(malloc) is retrieved from dlsym.
+return AllocateFromLocalPool(size);
GET_STACK_TRACE_MALLOC;
   return asan_malloc(size, &stack);
  }
  
  INTERCEPTOR(void*, calloc, uptr nmemb, uptr size) {

-  if (UNLIKELY(!asan_inited)) {
+  if (UNLIKELY(!asan_inited))
  // Hack: dlsym calls calloc before REAL(calloc) is retrieved from dlsym.
-const uptr kCallocPoolSize = 1024;
-static uptr calloc_memory_for_dlsym[kCallocPoolSize];
-static uptr allocated;
-uptr size_in_words = ((nmemb * size) + kWordSize - 1) / kWordSize;
-void *mem = (void*)&calloc_memory_for_dlsym[allocated];
-allocated += size_in_words;
-CHECK(allocated < kCallocPoolSize);
-return mem;
-  }
+return AllocateFromLocalPool(nmemb * size);
GET_STACK_TRACE_MALLOC;
   return asan_calloc(nmemb, size, &stack);
  }
  
  INTERCEPTOR(void*, realloc, void *ptr, uptr size) {

GET_STACK_TRACE_MALLOC;
+  if (UNLIKELY(IsInDlsymAllocPool(ptr))) {
+uptr offset = (uptr)ptr - (uptr)alloc_memory_for_dlsym;
+uptr copy_size = Min(size, kDlsymAllocPoolSize - offset);
+void *new_ptr = asan_malloc(size, &stack);
+internal_memcpy(new_ptr, ptr, copy_size);
+return new_ptr;
+  }
   return asan_realloc(ptr, size, &stack);
  }
  


Jakub






[PATCH 9/17][ARM] Add NEON FP16 arithmetic instructions.

2016-05-17 Thread Matthew Wahab

The ARMv8.2-A FP16 extension adds a number of arithmetic instructions
to the NEON instruction set. This patch adds support for these
instructions to the ARM backend.

As with the VFP FP16 arithmetic instructions, operations on __fp16
values are done by conversion to single-precision. Any new optimization
supported by the instruction descriptions can only apply to code
generated using intrinsics added in this patch series.

A number of the instructions are modelled as two variants, one using
UNSPEC and the other using RTL operations, with the model used decided
by the -funsafe-math-optimizations flag. This follows the
single-precision instructions and is due to the half-precision
operations having the same conditions and restrictions on their use in
optimizations (when they are enabled).

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/iterators.md (VCVTHI): New.
(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
(NEON_VAGLTE): New.
(VFM_LANE_AS): New.
(VH_CVTTO): New.
(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
(V_HALF): Add V4HF.  Fix white-space.
(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
(V_s_elem): Likewise.
(V_sz_elem): Fix white-space.
(V_elem_ch): Likewise.
(VH_elem_ch): New.
(scalar_mul_constraint): Add V8HF and V4HF.
(Is_float_mode): Fix white-space.
(Is_d_reg): Likewise.
(q): Add HF.  Fix white-space.
(float_sup): New.
(float_SUP): New.
(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
(neon_vfm_lane_as): New.
* config/arm/neon.md (add3_fp16): New.
(sub3_fp16): New.
(mul3add_neon): New.
(*fma4): New.
(fma4_intrinsic): New.
(fmsub4_intrinsic): Fix white-space.
(*fmsub4): New.
(fmsub4_intrinsic): New.
(2_fp16): New.
(neon_v): New.
(neon_v): New.
(neon_vsqrte): New.
(neon_vpaddv4hf): New.
(neon_vadd): New.
(neon_vsub): New.
(neon_vadd_unspec): New.
(neon_vsub_unspec): New.
(neon_vmulf): New.
(neon_vfma): New.
(neon_vfms): New.
(neon_vc): New.
(neon_vc_fp16insn): New.
(neon_vc_fp16insn_unspec): New.
(neon_vca): New.
(neon_vca_fp16insn): New.
(neon_vca_fp16insn_unspec): New.
(neon_vcz): New.
(neon_vabd): New.
(neon_vf): New.
(neon_vpfv4hf): New.
(neon_): New.
(neon_vrecps): New.
(neon_vrsqrts): New.
(neon_vrecpe): New (VH variant).
(neon_vcvt): New (VCVTHI variant).
(neon_vcvt): New (VH variant).
(neon_vcvt_n): New (VH variant).
(neon_vcvt_n): New (VCVTHI variant).
(neon_vcvt): New (VH variant).
(neon_vmul_lane): New.
(neon_vmul_n): New.
* config/arm/unspecs.md (UNSPEC_VCALE): New.
(UNSPEC_VCALT): New.
(UNSPEC_VFMA_LANE): New.
(UNSPEC_VFMS_LANE): New.
(UNSPEC_VSQRTE): New.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/armv8_2-fp16-arith-1.c: Add tests for float16x4_t
and float16x8_t.

>From 623f36632cc2848f16ba1c75f400198a72dc6ea4 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 16:19:57 +0100
Subject: [PATCH 09/17] [PATCH 9/17][ARM] Add NEON FP16 arithmetic
 instructions.

2016-05-17  Matthew Wahab  

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Add V4HF and V8HF.  Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add3_fp16): New.
	(sub3_fp16): New.
	(mul3add_neon): New.
	(*fma4): New.
	(fma4_intrinsic): New.
	(fmsub4_intrinsic): Fix white-space.
	(*fmsub4): New.
	(fmsub4_intrinsic): New.
	(2_fp16): New.
	(neon_v): New.
	(neon_v): New.
	(neon_vsqrte): New.
	(neon_vpadd): New.
	(neon_vadd): New.
	(neon_vsub): New.
	(neon_vadd_unspec): New.
	(neon_vsub_unspec): New.
	(neon_vmulf): New.
	(neon_vfma): New.
	(neon_vfms): New.
	(neon_vc): New.
	(neon_vc_fp16insn): New.
	(neon_vc_fp16insn_unspec): New.
	(neon_vca): New.
	(neon_vca_fp16insn): New.
	

[PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-05-17 Thread Matthew Wahab

The ARMv8.2-A FP16 extension adds a number of instructions to support
data movement for FP16 values. This patch adds these instructions to the
backend, making them available to the compiler code generator.

The new instructions include VSEL which selects between two registers
depending on a condition. This is used to support conditional data
movement which can depend on the result of comparisons between
half-precision values. These comparisons are always done by conversion
to single-precision.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. This patch was also tested for
arm-none-linux-gnueabihf with native bootstrap and make check and for
arm-none-eabi with check-gcc on an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  
Jiong Wang  

* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
available when FP16 instructions are available.
(output_mov_vfp): Add support for 16-bit data moves.
(arm_validize_comparison): Fix some white-space.  Support HFmode
by conversion to SFmode.
* config/arm/arm.md (truncdfhf2): Fix a comment.
(extendhfdf2): Likewise.
(cstorehf4): New.
(movsicc): Fix some white-space.
(movhfcc): New.
(movsfcc): Fix some white-space.
(*cmovhf): New.
* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
instructions are available.
(*thumb_movhi_vfp): Likewise.
(*arm_movhi_fp16): New.
(*thumb_movhi_fp16): New.
(*movhf_vfp_fp16): New.
(*movhf_vfp_neon): Disable when VFP FP16 instructions are
available.
(*movhf_vfp): Likewise.
(extendhfsf2): Enable when VFP FP16 instructions are available.
(truncsfhf2): Enable when VFP FP16 instructions are available.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/armv8_2_fp16-move-1.c: New.

>From 83268813cf9aa59940ed17d623606c9e485f6ecf Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:35:04 +0100
Subject: [PATCH 07/17] [PATCH 7/17][ARM] Add FP16 data movement instructions.

2016-05-17  Matthew Wahab  
	Jiong Wang  

	* config/arm/arm.c (coproc_secondary_reload_class): Make HFmode
	available when FP16 instructions are available.
	(output_mov_vfp): Add support for 16-bit data moves.
	(arm_validize_comparison): Fix some white-space.  Support HFmode
	by conversion to SFmode.
	* config/arm/arm.md (truncdfhf2): Fix a comment.
	(extendhfdf2): Likewise.
	(cstorehf4_fp16): New.
	(movsicc): Fix some white-space.
	(movhfcc): New.
	(movsfcc): Fix some white-space.
	(*cmovhf): New.
	* config/arm/vfp.md (*arm_movhi_vfp): Disable when VFP FP16
	instructions are available.
	(*thumb_movhi_vfp): Likewise.
	(*arm_movhi_fp16): New.
	(*thumb_movhi_fp16): New.
	(*movhf_vfp_fp16): New.
	(*movhf_vfp_neon): Disable when VFP FP16 instructions are
	available.
	(*movhf_vfp): Likewise.
	(extendhfsf2): Enable when VFP FP16 instructions are available.
	(truncsfhf2): Enable when VFP FP16 instructions are available.

testsuite/
2016-05-17  Matthew Wahab  

	* gcc.target/arm/armv8_2_fp16-move-1.c: New.
---
 gcc/config/arm/arm.c   |  16 +-
 gcc/config/arm/arm.md  |  81 -
 gcc/config/arm/vfp.md  | 182 -
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c | 166 +++
 4 files changed, 433 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6892040..187ebda 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13162,7 +13162,7 @@ coproc_secondary_reload_class (machine_mode mode, rtx x, bool wb)
 {
   if (mode == HFmode)
 {
-  if (!TARGET_NEON_FP16)
+  if (!TARGET_NEON_FP16 && !TARGET_VFP_FP16INST)
 	return GENERAL_REGS;
   if (s_register_operand (x, mode) || neon_vector_mem_operand (x, 2, true))
 	return NO_REGS;
@@ -18613,6 +18613,8 @@ output_move_vfp (rtx *operands)
   rtx reg, mem, addr, ops[2];
   int load = REG_P (operands[0]);
   int dp = GET_MODE_SIZE (GET_MODE (operands[0])) == 8;
+  int sp = (!TARGET_VFP_FP16INST
+	|| GET_MODE_SIZE (GET_MODE (operands[0])) == 4);
   int integer_p = GET_MODE_CLASS (GET_MODE (operands[0])) == MODE_INT;
   const char *templ;
   char buff[50];
@@ -18659,7 +18661,7 @@ output_move_vfp (rtx *operands)
 
   sprintf (buff, templ,
 	   load ? "ld" : "st",
-	   dp ? "64" : "32",
+	   dp ? "64" : sp ? "32" : "16",
 	   dp ? "P" : "",
 	   integer_p ? "\t%@ int" : "");
   output_asm_insn (buff, ops);
@@ -29238,7 +29240,7 @@ arm_validize_comparison (rtx 

[PATCH 6/17][ARM] Add data processing intrinsics for float16_t.

2016-05-17 Thread Matthew Wahab

The ACLE specifies a number of intrinsics for manipulating vectors
holding values in most of the integer and floating point types. These
include 16-bit integer types but not 16-bit floating point, even though
the same instruction is used for both.

A future version of the ACLE extends the data processing intrinsics to
the 16-bit floating point types, making the intrinsics available
under the same conditions as the ARM __fp16 type.

This patch adds the new intrinsics:
 vbsl_f16, vbslq_f16, vdup_n_f16, vdupq_n_f16, vdup_lane_f16,
 vdupq_lane_f16, vext_f16, vextq_f16, vmov_n_f16, vmovq_n_f16,
 vrev64_f16, vrev64q_f16, vtrn_f16, vtrnq_f16, vuzp_f16, vuzpq_f16,
 vzip_f16, vzipq_f16.

This patch also updates the advsimd-intrinsics testsuite to test the f16
variants for ARM targets. These intrinsics are only implemented in the
ARM target so the tests are disabled for AArch64 using an extra
condition on a new convenience macro FP16_SUPPORTED. This patch also
disables, for the ARM target, the testsuite defined macro vdup_n_f16 as
it is no longer needed.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested for aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
V4HF modes.
(arm_evpc_neon_vzip): Likewise.
(arm_evpc_neon_vrev): Likewise.
(arm_evpc_neon_vtrn): Likewise.
(arm_evpc_neon_vext): Likewise.
* config/arm/arm_neon.h (vbsl_f16): New.
(vbslq_f16): New.
(vdup_n_f16): New.
(vdupq_n_f16): New.
(vdup_lane_f16): New.
(vdupq_lane_f16): New.
(vext_f16): New.
(vextq_f16): New.
(vmov_n_f16): New.
(vmovq_n_f16): New.
(vrev64_f16): New.
(vrev64q_f16): New.
(vtrn_f16): New.
(vtrnq_f16): New.
(vuzp_f16): New.
(vuzpq_f16): New.
(vzip_f16): New.
(vzipq_f16): New.
* config/arm/arm_neon_builtins.def (vdup_n): New (v8hf, v4hf variants).
(vdup_lane): New (v8hf, v4hf variants).
(vext): New (v8hf, v4hf variants).
(vbsl): New (v8hf, v4hf variants).
* config/arm/iterators.md (VDQWH): New.
(VH): New.
(V_double_vector_mode): Add V8HF and V4HF.  Fix white-space.
(Scalar_mul_8_16): Fix white-space.
(Is_d_reg): Add V4HF and V8HF.
* config/arm/neon.md (neon_vdup_lane_internal): New.
(neon_vdup_lane): New.
(neon_vtrn_internal): Replace VDQW with VDQWH.
(*neon_vtrn_insn): Likewise.
(neon_vzip_internal): Likewise. Also fix white-space.
(*neon_vzip_insn): Likewise
(neon_vuzp_internal): Likewise.
(*neon_vuzp_insn): Likewise
* config/arm/vec-common.md (vec_perm_const): New.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(FP16_SUPPORTED): New
(vdup_n_f16): Disable for non-AArch64 targets.
* gcc.target/aarch64/advsimd-intrinsics/vbsl.c: Add __fp16 tests,
conditional on FP16_SUPPORTED.
* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vext.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vrev.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: Add support
for testing __fp16.
* gcc.target/aarch64/advsimd-intrinsics/vtrn.c: Add __fp16 tests,
conditional on FP16_SUPPORTED.
* gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vzip.c: Likewise.

>From 08c5cf4b5c6c846a4f62b6ad8776f2388b135e55 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 14:48:29 +0100
Subject: [PATCH 06/17] [PATCH 6/17][ARM] Add data processing intrinsics for
 float16_t.

2016-05-17  Matthew Wahab  

	* config/arm/arm.c (arm_evpc_neon_vuzp): Add support for V8HF and
	V4HF modes.
	(arm_evpc_neon_vtrn): Likewise.
	(arm_evpc_neon_vrev): Likewise.
	(arm_evpc_neon_vext): Likewise.
	* config/arm/arm_neon.h (vbsl_f16): New.
	(vbslq_f16): New.
	(vdup_n_f16): New.
	(vdupq_n_f16): New.
	(vdup_lane_f16): New.
	(vdupq_lane_f16): New.
	(vext_f16): New.
	(vextq_f16): New.
	(vmov_n_f16): New.
	(vmovq_n_f16): New.
	(vrev64_f16): New.
	(vrev64q_f16): New.
	(vtrn_f16): New.
	(vtrnq_f16): New.
	(vuzp_f16): New.
	(vuzpq_f16): New.
	(vzip_f16): New.
	(vzipq_f16): New.
	* config/arm/arm_neon_builtins.def (vdup_n): New (v8hf, v4hf variants).
	(vdup_lane): New (v8hf, v4hf variants).
	(vext): New (v8hf, v4hf variants).
	(vbsl): New (v8hf, v4hf variants).
	

[PATCH 5/17][ARM] Enable HI mode moves for floating point values.

2016-05-17 Thread Matthew Wahab

The handling of 16-bit integer data-movement in the ARM backend doesn't
make full use of the VFP instructions when they are available, even when
the values are for use in VFP operations.

This patch adds support for using the VFP instructions and registers
when moving 16-bit integer and floating point data between registers and
between registers and memory.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Tested this patch for arm-none-linux-gnueabihf
with native bootstrap and make check and for arm-none-eabi with
check-gcc on an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Jiong Wang  
Matthew Wahab  

* config/arm/arm.c (output_move_vfp): Weaken assert to allow
HImode.
(arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
* config/arm/arm.md (*movhi_insn_arch4): Disable when VFP registers are
available.
(*movhi_bytes): Likewise.
* config/arm/vfp.md (*arm_movhi_vfp): New.
(*thumb2_movhi_vfp): New.

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/short-vfp-1.c: New.

>From 0b8bc5f2966924c523d6fd75cf73dd01341914e2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:33:04 +0100
Subject: [PATCH 05/17] [PATCH 5/17][ARM] Enable HI mode moves for floating
 point values.

2016-05-17  Jiong Wang  
	Matthew Wahab  

	* config/arm/arm.c (output_move_vfp): Weaken assert to allow
	HImode.
	(arm_hard_regno_mode_ok): Allow HImode values in VFP registers.
	* config/arm/arm.md (*movhi_bytes): Disable when VFP registers are
	available.  Also fix some white-space.
	* config/arm/vfp.md (*arm_movhi_vfp): New.
	(*thumb2_movhi_vfp): New.

testsuite/
2016-05-17  Matthew Wahab  

	* gcc.target/arm/short-vfp-1.c: New.
---
 gcc/config/arm/arm.c   |  5 ++
 gcc/config/arm/arm.md  |  6 +-
 gcc/config/arm/vfp.md  | 93 ++
 gcc/testsuite/gcc.target/arm/short-vfp-1.c | 45 +++
 4 files changed, 146 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/short-vfp-1.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f3914ef..26a8a48 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -18628,6 +18628,7 @@ output_move_vfp (rtx *operands)
   gcc_assert ((mode == HFmode && TARGET_HARD_FLOAT && TARGET_VFP)
 	  || mode == SFmode
 	  || mode == DFmode
+	  || mode == HImode
 	  || mode == SImode
 	  || mode == DImode
   || (TARGET_NEON && VALID_NEON_DREG_MODE (mode)));
@@ -23422,6 +23423,10 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   if (mode == HFmode)
 	return VFP_REGNO_OK_FOR_SINGLE (regno);
 
+  /* VFP registers can hold HImode values.  */
+  if (mode == HImode)
+	return VFP_REGNO_OK_FOR_SINGLE (regno);
+
   if (TARGET_NEON)
 return (VALID_NEON_DREG_MODE (mode) && VFP_REGNO_OK_FOR_DOUBLE (regno))
|| (VALID_NEON_QREG_MODE (mode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4049f10..3e23178 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6365,7 +6365,7 @@
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r")
 	(match_operand:HI 1 "general_operand"  "rIk,K,n,r,mi"))]
   "TARGET_ARM
-   && arm_arch4
+   && arm_arch4 && !(TARGET_HARD_FLOAT && TARGET_VFP)
&& (register_operand (operands[0], HImode)
|| register_operand (operands[1], HImode))"
   "@
@@ -6391,7 +6391,7 @@
 (define_insn "*movhi_bytes"
   [(set (match_operand:HI 0 "s_register_operand" "=r,r,r")
 	(match_operand:HI 1 "arm_rhs_operand"  "I,rk,K"))]
-  "TARGET_ARM"
+  "TARGET_ARM && !(TARGET_HARD_FLOAT && TARGET_VFP)"
   "@
mov%?\\t%0, %1\\t%@ movhi
mov%?\\t%0, %1\\t%@ movhi
@@ -6399,7 +6399,7 @@
   [(set_attr "predicable" "yes")
(set_attr "type" "mov_imm,mov_reg,mvn_imm")]
 )
-	
+
 ;; We use a DImode scratch because we may occasionally need an additional
 ;; temporary if the address isn't offsettable -- push_reload doesn't seem
 ;; to take any notice of the "o" constraints on reload_memory_operand operand.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 9750ba1..d7c874a 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -18,6 +18,99 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; .  */
 
+;; Patterns for HI moves which provide more data transfer instructions when VFP
+;; support is enabled.
+(define_insn "*arm_movhi_vfp"
+ [(set
+   (match_operand:HI 0 "nonimmediate_operand"
+"=rk,  r, r, m, r, *t,  r, *t")
+   (match_operand:HI 1 "general_operand"
+"rIk, K, n, r, mi, r, *t, *t"))]
+ 

[PATCH 4/17][ARM] Define feature macros for FP16.

2016-05-17 Thread Matthew Wahab

The FP16 extension introduced with the ARMv8.2-A architecture adds
instructions operating on FP16 values to the VFP and NEON instruction
sets.

The patch adds the feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
which is defined to be 1 if the VFP FP16 instructions are available; it
is otherwise undefined.

The patch also adds the feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
which is defined to be 1 if the NEON FP16 instructions are available; it
is otherwise undefined.

These two macros will appear in a future version of the ACLE.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm-c.c (arm_cpu_builtins): Define
"__ARM_FEATURE_FP16_SCALAR_ARITHMETIC" and
"__ARM_FEATURE_FP16_VECTOR_ARITHMETIC".

testsuite/
2016-05-17  Matthew Wahab  

* gcc.target/arm/attr-fp16-arith-1.c: New.

>From 688b4d34a64a40abd4705a9bdaea40929a7a1d26 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:32:15 +0100
Subject: [PATCH 04/17] [PATCH 4/17][ARM] Define feature macros for FP16.

2016-05-17  Matthew Wahab  

	* config/arm/arm-c.c (arm_cpu_builtins): Define
	"__ARM_FEATURE_FP16_SCALAR_ARITHMETIC" and
	"__ARM_FEATURE_FP16_VECTOR_ARITHMETIC".

testsuite/
2016-05-17  Matthew Wahab  

	* gcc.target/arm/attr-fp16-arith-1.c: New.
---
 gcc/config/arm/arm-c.c   |  5 +++
 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c | 45 
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index b98470f..7283700 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -142,6 +142,11 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FP16_ARGS",
 		  arm_fp16_format != ARM_FP16_FORMAT_NONE);
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC",
+		  TARGET_VFP_FP16INST);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC",
+		  TARGET_NEON_FP16INST);
+
   def_or_undef_macro (pfile, "__ARM_FEATURE_FMA", TARGET_FMA);
   def_or_undef_macro (pfile, "__ARM_NEON__", TARGET_NEON);
   def_or_undef_macro (pfile, "__ARM_NEON", TARGET_NEON);
diff --git a/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
new file mode 100644
index 000..5011315
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/attr-fp16-arith-1.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
+/* Reset fpu to a value compatible with the next pragmas.  */
+#pragma GCC target ("fpu=vfp")
+
+#pragma GCC push_options
+#pragma GCC target ("fpu=fp-armv8")
+
+#ifndef __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
+#error __ARM_FEATURE_FP16_SCALAR_ARITHMETIC not defined.
+#endif
+
+#pragma GCC push_options
+#pragma GCC target ("fpu=neon-fp-armv8")
+
+#ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
+#error __ARM_FEATURE_FP16_VECTOR_ARITHMETIC not defined.
+#endif
+
+#ifndef __ARM_NEON
+#error __ARM_NEON not defined.
+#endif
+
+#if !defined (__ARM_FP) || !(__ARM_FP & 0x2)
+#error Invalid value for __ARM_FP
+#endif
+
+#pragma GCC pop_options
+
+/* Check that the FP version is correctly reset to mfpu=fp-armv8.  */
+
+#if !defined (__ARM_FP) || !(__ARM_FP & 0x2)
+#error __ARM_FP should record FP16 support.
+#endif
+
+#pragma GCC pop_options
+
+/* Check that the FP version is correctly reset to mfpu=vfp.  */
+
+#if !defined (__ARM_FP) || (__ARM_FP & 0x2)
+#error Unexpected value for __ARM_FP.
+#endif
-- 
2.1.4



[PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A with FP16 arithmetic instructions.

2016-05-17 Thread Matthew Wahab

The ARMv8.2-A FP16 extension adds to both the VFP and the NEON
instruction sets. This patch adds support to the testsuite to select
targets and set options for tests that make use of these
instructions. It also adds documentation for ARMv8.1-A selectors.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
arm_v8_1a_neon_ok, arm_v8_2a_fp16_scalar_ok, arm_v8_2a_fp16_scalar_hw,
arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_neon_hw.
(Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_scalar,
arm_v8_2a_neon.

testsuite/
2016-05-17  Matthew Wahab  

* lib/target-supports.exp
(add_options_for_arm_v8_2a_fp16_scalar_ok): New.
(add_options_for_arm_v8_2a_fp16_neon): New.
(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
(add_options_for_arm_arch_v8_2a): Auto-generate.
(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
(check_effective_target_arm_v8_2a_fp16_neon_hw): New.

From ba9b4dcf774d0fdffae11ac59537255775e8f1b6 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:34:30 +0100
Subject: [PATCH 03/17] [PATCH 3/17][Testsuite] Add ARM support for ARMv8.2-A
 with FP16 arithmetic instructions.

2016-05-17  Matthew Wahab  

	* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
	arm_v8_1a_neon_ok, arm_v8_2a_fp16_scalar_ok, arm_v8_2a_fp16_scalar_hw,
	arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_neon_hw.
	(Add options): Add entries for arm_v8_1a_neon, arm_v8_2a_scalar,
	arm_v8_2a_neon.
	* lib/target-supports.exp
	(add_options_for_arm_v8_2a_fp16_scalar_ok): New.
	(add_options_for_arm_v8_2a_fp16_neon): New.
	(check_effective_target_arm_arch_v8_2a_ok): Auto-generate.
	(add_options_for_arm_arch_v8_2a): Auto-generate.
	(check_effective_target_arm_arch_v8_2a_multilib): Auto-generate.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_ok): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok_nocache): New.
	(check_effective_target_arm_v8_2a_fp16_neon_ok): New.
	(check_effective_target_arm_v8_2a_fp16_scalar_hw): New.
	(check_effective_target_arm_v8_2a_fp16_neon_hw): New.
---
 gcc/doc/sourcebuild.texi  |  40 ++
 gcc/testsuite/lib/target-supports.exp | 145 +-
 2 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index dd6abda..66904a7 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1596,6 +1596,7 @@ ARM target supports @code{-mfpu=neon-fp-armv8 -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
 
 @item arm_v8_1a_neon_ok
+@anchor{arm_v8_1a_neon_ok}
 ARM target supports options to generate ARMv8.1 Adv.SIMD instructions.
 Some multilibs may be incompatible with these options.
 
@@ -1604,6 +1605,28 @@ ARM target supports executing ARMv8.1 Adv.SIMD instructions.  Some
 multilibs may be incompatible with the options needed.  Implies
 arm_v8_1a_neon_ok.
 
+@item arm_v8_2a_fp16_scalar_ok
+@anchor{arm_v8_2a_fp16_scalar_ok}
+ARM target supports options to generate instructions for ARMv8.2 and
+scalar instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.
+
+@item arm_v8_2a_fp16_scalar_hw
+ARM target supports executing instructions for ARMv8.2 and scalar
+instructions from the FP16 extension.  Some multilibs may be
+incompatible with these options.  Implies arm_v8_2a_fp16_neon_ok.
+
+@item arm_v8_2a_fp16_neon_ok
+@anchor{arm_v8_2a_fp16_neon_ok}
+ARM target supports options to generate instructions from ARMv8.2 with
+the FP16 extension.  Some multilibs may be incompatible with these
+options.  Implies arm_v8_2a_fp16_scalar_ok.
+
+@item arm_v8_2a_fp16_neon_hw
+ARM target supports executing instructions from ARMv8.2 with the FP16
+extension.  Some multilibs may be incompatible with these options.
+Implies arm_v8_2a_fp16_neon_ok and arm_v8_2a_fp16_scalar_hw.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
@@ -2088,6 +2111,23 @@ the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
 arm vfp3 floating point support; see
 the @ref{arm_vfp3_ok,,arm_vfp3_ok effective target 

[PATCH 2/17][Testsuite] Add a selector for ARM FP16 alternative format support.

2016-05-17 Thread Matthew Wahab

The ARMv8.2-A FP16 extension only supports the IEEE format for FP16
data. It is not compatible with the option -mfp16-format=none nor with
the option -mfp16-format=alternative (selecting the ARM alternative FP16
format). Using either with the FP16 extension will trigger a compiler
error.

This patch adds the selector arm_fp16_alternative_ok to the testsuite's
target-support code to allow tests to require support for the
alternative format. It also adds selector arm_fp16_none_ok to check
whether -mfp16-format=none is a valid option for the target.  The patch
also updates existing tests to make use of the new selectors.
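For illustration, a test that needs the alternative format would start along these lines (a hypothetical file; only the selector and option names come from this patch):

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_fp16_alternative_ok } */
/* { dg-options "-mfp16-format=alternative" } */

/* Body exercises __fp16 data in the ARM alternative format.  */
__fp16 x = 2.0;
```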

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
arm_fp16_alternative_ok and arm_fp16_none_ok.

testsuite/
2016-05-17  Matthew Wahab  

* g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Use
arm_fp16_alternative_ok.
* g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
* gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
* gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
* gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
* gcc.target/arm/fp16-compile-alt-1.c: Likewise.
* gcc.target/arm/fp16-compile-alt-10.c: Likewise.
* gcc.target/arm/fp16-compile-alt-11.c: Likewise.
* gcc.target/arm/fp16-compile-alt-12.c: Likewise.
* gcc.target/arm/fp16-compile-alt-2.c: Likewise.
* gcc.target/arm/fp16-compile-alt-3.c: Likewise.
* gcc.target/arm/fp16-compile-alt-4.c: Likewise.
* gcc.target/arm/fp16-compile-alt-5.c: Likewise.
* gcc.target/arm/fp16-compile-alt-6.c: Likewise.
* gcc.target/arm/fp16-compile-alt-7.c: Likewise.
* gcc.target/arm/fp16-compile-alt-8.c: Likewise.
* gcc.target/arm/fp16-compile-alt-9.c: Likewise.
* gcc.target/arm/fp16-compile-none-1.c: Use arm_fp16_none_ok.
* gcc.target/arm/fp16-compile-none-2.c: Likewise.
* gcc.target/arm/fp16-rounding-alt-1.c: Use
arm_fp16_alternative_ok.
* lib/target-supports.exp
(check_effective_target_arm_fp16_alternative_ok_nocache): New.
(check_effective_target_arm_fp16_alternative_ok): New.
(check_effective_target_arm_fp16_none_ok_nocache): New.
(check_effective_target_arm_fp16_none_ok): New.

From 1901fdfbd2f8da9809a60e43284a1749b015dfba Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:33:51 +0100
Subject: [PATCH 02/17] [PATCH 2/17][Testsuite] Add a selector for ARM FP16
 alternative format support.

2016-05-17  Matthew Wahab  

	* doc/sourcebuild.texi (ARM-specific attributes): Add entries for
	arm_fp16_alternative_ok and arm_fp16_none_ok.

testsuite/
2016-05-17  Matthew Wahab  

	* g++.dg/ext/arm-fp16/arm-fp16-ops-3.C: Use
	arm_fp16_alternative_ok.
	* g++.dg/ext/arm-fp16/arm-fp16-ops-4.C: Likewise.
	* gcc.dg/torture/arm-fp16-int-convert-alt.c: Likewise.
	* gcc.dg/torture/arm-fp16-ops-3.c: Likewise.
	* gcc.dg/torture/arm-fp16-ops-4.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-1.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-10.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-11.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-12.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-2.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-3.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-4.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-5.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-6.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-7.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-8.c: Likewise.
	* gcc.target/arm/fp16-compile-alt-9.c: Likewise.
	* gcc.target/arm/fp16-compile-none-1.c: Use arm_fp16_none_ok.
	* gcc.target/arm/fp16-compile-none-2.c: Likewise.
	* gcc.target/arm/fp16-rounding-alt-1.c: Use
	arm_fp16_alternative_ok.
	* lib/target-supports.exp
	(check_effective_target_arm_fp16_alternative_ok_nocache): New.
	(check_effective_target_arm_fp16_alternative_ok): New.
	(check_effective_target_arm_fp16_none_ok_nocache): New.
	(check_effective_target_arm_fp16_none_ok): New.
---
 gcc/doc/sourcebuild.texi   |  7 +++
 gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-3.C |  1 +
 gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops-4.C |  1 +
 .../gcc.dg/torture/arm-fp16-int-convert-alt.c  |  1 +
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-3.c  |  1 +
 gcc/testsuite/gcc.dg/torture/arm-fp16-ops-4.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-1.c  |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-10.c |  1 +
 gcc/testsuite/gcc.target/arm/fp16-compile-alt-11.c |  1 +
 

[PATCH 1/17][ARM] Add ARMv8.2-A command line option and profile.

2016-05-17 Thread Matthew Wahab

This patch adds the command options for the architecture ARMv8.2-A and
the half-precision extension. The architecture is selected by
-march=armv8.2-a and has all the properties of -march=armv8.1-a.

This patch also enables the CRC extension (+crc) which is required
for both ARMv8.2-A and ARMv8.1-A architectures but is not currently
enabled by default for -march=armv8.1-a.

The half-precision extension is selected using the extension +fp16. This
enables the VFP FP16 instructions if an ARMv8 VFP unit is also
specified, e.g. by -mfpu=fp-armv8. It also enables the FP16 NEON
instructions if an ARMv8 NEON unit is specified, e.g. by
-mfpu=neon-fp-armv8. Note that if the NEON FP16 instructions are enabled
then so are the VFP FP16 instructions.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on an
ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  

* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
("armv8.2-a"): New.
("armv8.2-a+fp16"): New.
* config/arm/arm-protos.h (FL2_ARCH8_2): New.
(FL2_FP16INST): New.
(FL2_FOR_ARCH8_2A): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_2): New.
(arm_fp16_inst): New.
(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
for incompatible fp16-format settings.
* config/arm/arm.h (TARGET_VFP_FP16INST): New.
(TARGET_NEON_FP16INST): New.
(arm_arch8_2): Declare.
(arm_fp16_inst): Declare.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
march=armv8.2-a and march=armv8.2-a+fp16.
* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
and armv8.2-a+fp16.
* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
"-march=armv8.2-a" and "-march=armv8.2-a+fp16".


From 7df41b0a5d248d842fd4c89082dc1a1055dc4604 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Thu, 7 Apr 2016 13:31:24 +0100
Subject: [PATCH 01/17] [PATCH 1/17][ARM] Add ARMv8.2-A command line option and
 profile.

2016-05-17  Matthew Wahab  

	* config/arm/arm-arches.def ("armv8.1-a"): Add FL_CRC32.
	("armv8.2-a"): New.
	("armv8.2-a+fp16"): New.
	* config/arm/arm-protos.h (FL2_ARCH8_2): New.
	(FL2_FP16INST): New.
	(FL2_FOR_ARCH8_2A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch8_2): New.
	(arm_fp16_inst): New.
	(arm_option_override): Set arm_arch8_2 and arm_fp16_inst.  Check
	for incompatible fp16-format settings.
	* config/arm/arm.h (TARGET_VFP_FP16INST): New.
	(TARGET_NEON_FP16INST): New.
	(arm_arch8_2): Declare.
	(arm_fp16_inst): Declare.
	* config/arm/bpabi.h (BE8_LINK_SPEC): Add entries for
	march=armv8.2-a and march=armv8.2-a+fp16.
	* config/arm/t-aprofile (Arch Matches): Add entries for armv8.2-a
	and armv8.2-a+fp16.
	* doc/invoke.texi (ARM Options): Add "-march=armv8.1-a",
	"-march=armv8.2-a" and "-march=armv8.2-a+fp16".
---
 gcc/config/arm/arm-arches.def | 10 --
 gcc/config/arm/arm-protos.h   |  4 
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  | 15 +++
 gcc/config/arm/arm.h  | 14 ++
 gcc/config/arm/bpabi.h|  4 
 gcc/config/arm/t-aprofile |  2 ++
 gcc/doc/invoke.texi   | 13 +
 8 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index fd02b18..2b4a80e 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -58,10 +58,16 @@ ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_F
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
 ARM_ARCH("armv8.1-a", cortexa53,  8A,
-	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("armv8.1-a+crc",cortexa53, 8A,
 	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 			 FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.2-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A))
+ARM_ARCH ("armv8.2-a+fp16", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_2A | FL2_FP16INST))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
-
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d8179c4..c1a1eb8 100644
--- a/gcc/config/arm/arm-protos.h
+++ 

[PATCH 0/17][ARM] ARMv8.2-A and FP16 extension support.

2016-05-17 Thread Matthew Wahab

Hello,

The ARMv8.2-A architecture builds on ARMv8.1-A and includes an optional
extension supporting half-precision floating point (FP16)
arithmetic. This extension adds instructions to the VFP and NEON
instruction sets to provide operations on IEEE754-2008 formatted FP16
values.

This patch set adds support to GCC for the ARMv8.2-A architecture and
for the FP16 extension. The FP16 VFP and NEON instructions are exposed
as new ACLE intrinsics and support is added to the compiler to make use
of data movement and other precision-preserving instructions.

The new half-precision operations are treated as complementary to the
existing FP16 support. To preserve compatibility with existing code, the
ARM __fp16 data-type continues to be treated as a storage-only format
and operations on it continue to be by promotion to single precision
floating point. Half-precision operations are only supported through the
use of the intrinsics added by this patch set.
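The storage-only rule means, for example, that an __fp16 addition is really a single-precision addition (a sketch of the language semantics, not code from the patch set; the body compiles only where __fp16 is available):

```c
#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
__fp16
add_halves (__fp16 a, __fp16 b)
{
  /* a and b are promoted to float, the addition is performed in
     single precision, and the result is rounded back to __fp16
     only when it is stored.  */
  return a + b;
}
#endif
```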

This series also includes a number of patches to improve the handling of
16-bit integer and floating point values. These are to support the code
generation of ARMv8.2 FP16 extension but are also made available
independently of it. Among these changes are a number of new ACLE data
processing intrinsics to support half-precision (f16) data.

The patches in this series:

- Add the command line and profile for the new architecture.
- Add selectors to the testsuite target-support to distinguish targets
  using the IEEE FP16 format from those using the ARM Alternative format.
- Add support (selectors and directives) to the testsuite target-support
  for ARMv8.2-A and the FP16 extension.
- Add feature macros for the new features.
- Improve the handling of 16-bit integers when VFP units are available.
- Add vector shuffle intrinsics for float16_t.
- Add data movement instructions introduced by the new extension.
- Add the VFP FP16 arithmetic instructions introduced by the extension.
- Add the NEON FP16 arithmetic instructions introduced by the extension.
- Refactor the code for initializing and expanding the NEON intrinsics.
- Add builtins to support intrinsics for the VFP FP16 instructions.
- Add builtins to support intrinsics for the NEON FP16 instructions.
- Add intrinsics for the VFP FP16 instructions.
- Add intrinsics for the NEON FP16 instructions.
- Add tests for ARMv8.2-A and the new FP16 support.
- Add tests for the VFP FP16 intrinsics.
- Add tests for the NEON FP16 intrinsics.

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator. Also tested aarch64-none-elf with the
advsimd-intrinsics testsuite using an ARMv8.2-A emulator.

Matthew


Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-17 Thread Marc Glisse

On Tue, 17 May 2016, Richard Biener wrote:


On Fri, May 13, 2016 at 3:36 PM, Marc Glisse  wrote:

On Fri, 13 May 2016, Mikhail Maltsev wrote:


I don't know if we might want some :c / single_use restrictions, maybe on
the outer convert and the rshift/rotate.


I don't think :c can be used here.



Oups, typo for :s.


As for :s, I added it, as you suggested.



:s will be ignored when there is no conversion, but I think that's good
enough for now.


Yeah.  Doing :s twice as done in the patch works though.


I meant that the output will be a single instruction, and thus any :s will 
be ignored (I think).



Also, I tried to add some more test cases for rotate with conversions, but
unfortunately GCC does not recognize the rotate pattern when narrowing
conversions are present.



It is usually easier to split your expression into several assignments.
Untested:

int f(long long a, unsigned long n){
  long long b = ~a;
  unsigned long c = b;
  unsigned long d = ROLL (c, n);
  int e = d;
  return ~e;
}

this way the rotate pattern is detected early (generic) with no extra
operations to confuse the compiler, while your new transformation will
happen in gimple (most likely the first forwprop pass).

The patch looks good to me, now wait for Richard's comments.


Are you sure narrowing conversions are valid for rotates?


My reasoning is that narrowing conversions and bit_not commute (sign 
extensions should be fine as well, IIRC). So you can essentially pull 
convert1 out and push convert2 down to @1 (notice how the result 
converts the input and the output), and simplify the middle (rotate and 
bit_not commute) without any conversion.


Note that an alternative way to handle the transformation would be to fix 
a canonical order for some things that commute. Turn non-extending convert 
of bit_not into bit_not of convert, turn rotate of bit_not to bit_not of 
rotate (or the reverse, whatever), etc. If we are lucky, this might move 
the 2 bit_not next to each other, where they can cancel out. But that's 
more ambitious.


I heard that LLVM and Visual Studio were using a prover for such 
transforms. Without going that far, it might be good to use some 
automation to check that a transform works, say, for all values of all 
types of precision at most 4 or something, if someone is motivated...



(char)short_var <

Notice that the rotation is done in the type of @0 both before and after.

--
Marc Glisse


Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-17 Thread Mikhail Maltsev
On 05/17/2016 04:39 PM, Richard Biener wrote:
> 
> Are you sure narrowing conversions are valid for rotates?
> 
> (char)short_var < byte.
> 
Yes, but the transformation leaves conversions as-is. Only bit_not is removed.

-- 
Regards,
Mikhail Maltsev


Re: [PATCH, GCC] PR middle-end/55299, fold bitnot through ASR and rotates

2016-05-17 Thread Richard Biener
On Fri, May 13, 2016 at 3:36 PM, Marc Glisse  wrote:
> On Fri, 13 May 2016, Mikhail Maltsev wrote:
>
>>> I don't know if we might want some :c / single_use restrictions, maybe on
>>> the
>>> outer convert and the rshift/rotate.
>>>
>> I don't think :c can be used here.
>
>
> Oups, typo for :s.
>
>> As for :s, I added it, as you suggested.
>
>
> :s will be ignored when there is no conversion, but I think that's good
> enough for now.

Yeah.  Doing :s twice as done in the patch works though.

>> Also, I tried to add some more test cases for rotate with conversions, but
>> unfortunately GCC does not recognize rotate pattern, when narrowing
>> conversions
>> are present.
>
>
> It is usually easier to split your expression into several assignments.
> Untested:
>
> int f(long long a, unsigned long n){
>   long long b = ~a;
>   unsigned long c = b;
>   unsigned long d = ROLL (c, n);
>   int e = d;
>   return ~e;
> }
>
> this way the rotate pattern is detected early (generic) with no extra
> operations to confuse the compiler, while your new transformation will
> happen in gimple (most likely the first forwprop pass).
>
> The patch looks good to me, now wait for Richard's comments.

Are you sure narrowing conversions are valid for rotates?

(char)short_var < --
> Marc Glisse


Re: [Patch AArch64] Simplify reduc_plus_scal_v2[sd]f sequence

2016-05-17 Thread Marcus Shawcroft
On 17 May 2016 at 12:02, James Greenhalgh  wrote:
> On Tue, May 17, 2016 at 11:32:36AM +0100, Marcus Shawcroft wrote:
>> On 17 May 2016 at 10:06, James Greenhalgh  wrote:
>> >
>> > Hi,
>> >
>> > This is just a simplification, it probably makes life easier for register
>> > allocation in some corner cases and seems the right thing to do. We don't
>> > use the internal version elsewhere, so we're safe to delete it and change
>> > the types.
>> >
>> > OK?
>> >
>> > Bootstrapped on AArch64 with no issues.
>>
>> Help me understand why this is ok for BE ?
>
> The reduc_plus_scal_ pattern wants to take a vector and return a scalar
> value representing the sum of the lanes of that vector. We want to go
> from V2DFmode to DFmode.
>
> The architectural instruction FADDP writes to a scalar value in the low
> bits of the register, leaving zeroes in the upper bits.
>
> i.e.
>
> faddp  d0, v1.2d
>
> 128                 64                  0
>  |       0x0       | v1.d[0] + v1.d[1] |
>
> In the current implementation, we use the
> aarch64_reduc_plus_internal pattern, which treats the result of
> FADDP as a vector of two elements. We then need an extra step to extract
> the correct scalar value from that vector. From GCC's point of view the lane
> containing the result is either lane 0 (little-endian) or lane 1
> (big-endian), which is why the current code is endian dependent. The extract
> operation will always be a NOP move from architectural bits 0-63 to
> architectural bits 0-63 - but we never elide the move as future passes can't
> be certain that the upper bits are zero (they come out of an UNSPEC so
> could be anything).
>
> However, this is all unnecessary. FADDP does exactly what we want,
> regardless of endianness, we just need to model the instruction as writing
> the scalar value in the first place. Which is what this patch wires up.
>
> We probably just missed this optimization in the migration from the
> reduc_splus optabs (which required a vector return value) to the
> reduc_plus_scal optabs (which require a scalar return value).
>
> Does that help?


Yep. Thanks. OK to commit. /Marcus

> Thanks,
> James
>


[PATCH] Fix PR71132

2016-05-17 Thread Richard Biener

The following fixes a latent issue in loop distribution caught by
the fake edge placement adjustment.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-05-17  Richard Biener  

PR tree-optimization/71132
* tree-loop-distribution.c (create_rdg_cd_edges): Pass in loop.
Only add control dependences for blocks in the loop.
(build_rdg): Adjust.
(generate_code_for_partition): Return whether loop should
be destroyed and delay that.
(distribute_loop): Likewise.
(pass_loop_distribution::execute): Record loops to be destroyed
and perform delayed destroying of loops.

* gcc.dg/torture/pr71132.c: New testcase.

Index: gcc/tree-loop-distribution.c
===================================================================
*** gcc/tree-loop-distribution.c	(revision 236309)
--- gcc/tree-loop-distribution.c	(working copy)
*** create_rdg_flow_edges (struct graph *rdg
*** 312,318 
  /* Creates the edges of the reduced dependence graph RDG.  */
  
  static void
! create_rdg_cd_edges (struct graph *rdg, control_dependences *cd)
  {
int i;
  
--- 315,321 
  /* Creates the edges of the reduced dependence graph RDG.  */
  
  static void
! create_rdg_cd_edges (struct graph *rdg, control_dependences *cd, loop_p loop)
  {
int i;
  
*** create_rdg_cd_edges (struct graph *rdg,
*** 324,329 
--- 327,333 
  edge_iterator ei;
  edge e;
  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->preds)
+   if (flow_bb_inside_loop_p (loop, e->src))
  create_edge_for_control_dependence (rdg, e->src, i, cd);
}
else
*** build_rdg (vec loop_nest, contro
*** 455,461 
  
create_rdg_flow_edges (rdg);
if (cd)
! create_rdg_cd_edges (rdg, cd);
  
datarefs.release ();
  
--- 459,465 
  
create_rdg_flow_edges (rdg);
if (cd)
! create_rdg_cd_edges (rdg, cd, loop_nest[0]);
  
datarefs.release ();
  
*** destroy_loop (struct loop *loop)
*** 917,925 
   recompute_dominator (CDI_DOMINATORS, dest));
  }
  
! /* Generates code for PARTITION.  */
  
! static void
  generate_code_for_partition (struct loop *loop,
 partition *partition, bool copy_p)
  {
--- 921,929 
   recompute_dominator (CDI_DOMINATORS, dest));
  }
  
! /* Generates code for PARTITION.  Return whether LOOP needs to be destroyed.  */
  
! static bool 
  generate_code_for_partition (struct loop *loop,
 partition *partition, bool copy_p)
  {
*** generate_code_for_partition (struct loop
*** 930,936 
gcc_assert (!partition_reduction_p (partition)
  || !copy_p);
generate_loops_for_partition (loop, partition, copy_p);
!   return;
  
  case PKIND_MEMSET:
generate_memset_builtin (loop, partition);
--- 934,940 
gcc_assert (!partition_reduction_p (partition)
  || !copy_p);
generate_loops_for_partition (loop, partition, copy_p);
!   return false;
  
  case PKIND_MEMSET:
generate_memset_builtin (loop, partition);
*** generate_code_for_partition (struct loop
*** 947,953 
/* Common tail for partitions we turn into a call.  If this was the last
   partition for which we generate code, we have to destroy the loop.  */
if (!copy_p)
! destroy_loop (loop);
  }
  
  
--- 951,958 
/* Common tail for partitions we turn into a call.  If this was the last
   partition for which we generate code, we have to destroy the loop.  */
if (!copy_p)
! return true;
!   return false;
  }
  
  
*** pgcmp (const void *v1_, const void *v2_)
*** 1397,1407 
  /* Distributes the code from LOOP in such a way that producer
 statements are placed before consumer statements.  Tries to separate
 only the statements from STMTS into separate loops.
!    Returns the number of distributed loops.  */
  
  static int
  distribute_loop (struct loop *loop, vec stmts,
!control_dependences *cd, int *nb_calls)
  {
struct graph *rdg;
partition *partition;
--- 1412,1423 
  /* Distributes the code from LOOP in such a way that producer
 statements are placed before consumer statements.  Tries to separate
 only the statements from STMTS into separate loops.
!    Returns the number of distributed loops.  Set *DESTROY_P to whether
!    LOOP needs to be destroyed.  */
  
  static int
  distribute_loop (struct loop *loop, vec stmts,
!control_dependences *cd, int *nb_calls, bool *destroy_p)
  {
struct graph *rdg;
partition *partition;
*** distribute_loop (struct loop *loop, vec<
*** 1644,1654 
if (dump_file && (dump_flags & TDF_DETAILS))
  dump_rdg_partitions (dump_file, partitions);
  

Re: [AArch64, 1/4] Add the missing support of vfms_n_f32, vfmsq_n_f32, vfmsq_n_f64

2016-05-17 Thread Kyrill Tkachov


On 17/05/16 13:40, Kyrill Tkachov wrote:


On 17/05/16 13:20, James Greenhalgh wrote:

On Mon, May 16, 2016 at 10:09:26AM +0100, Jiong Wang wrote:

Support for vfma_n_f64, vfms_n_f32, vfmsq_n_f32 and vfmsq_n_f64 is
missing from the current GCC arm_neon.h.

Meanwhile, besides "(fma (vec_dup (vec_select)))", fma by element can
also come from "(fma (vec_dup (scalar)))" where the scalar value is already
sitting in a vector register and is then duplicated to the other lanes, with
no lane size change.

This patch implements this and can generate better code in some
contexts. For example:

cat test.c
===
typedef __Float32x2_t float32x2_t;
typedef float float32_t;

float32x2_t
vfma_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
{
   return __builtin_aarch64_fmav2sf (__b, (float32x2_t) {__c, __c}, __a);
}

before (-O2)
===
vfma_n_f32:
 dup v2.2s, v2.s[0]
 fmla    v0.2s, v1.2s, v2.2s
 ret
after
===
vfma_n_f32:
 fmla    v0.2s, v1.2s, v2.s[0]
 ret

OK for trunk?

2016-05-16  Jiong Wang 

This ChangeLog entry is not correctly formatted. There should be two
spaces between your name and your email, and each line should start with
a tab.


gcc/
   * config/aarch64/aarch64-simd.md (*aarch64_fma4_elt_to_128df): Rename
   to *aarch64_fma4_elt_from_dup.
   (*aarch64_fnma4_elt_to_128df): Rename to
*aarch64_fnma4_elt_from_dup.
   * config/aarch64/arm_neon.h (vfma_n_f64): New.
   (vfms_n_f32): Likewise.
   (vfms_n_f64): Likewise.
   (vfmsq_n_f32): Likewise.
   (vfmsq_n_f64): Likewise.

gcc/testsuite/
   * gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c: Use
standard syntax.
   * gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c: Likewise.

The paths of these two entries are incorrect. Remove the gcc/testsuite
from the front. I don't understand what you mean by "Use standard syntax.",
please fix this to describe what you are actually changing.


   * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h: New entry
for float64x1.
   * gcc.target/aarch64/advsimd-intrinsics/vfms_vfma_n.c: New.

These two changes need approval from an ARM maintainer as they are in
common files.

 From an AArch64 perspective, this patch is OK with a fixed ChangeLog. Please
wait for an ARM OK for the test changes.


Considering that the tests' functionality is guarded on #if defined(__aarch64__)
it's a noop on arm and so is ok from that perspective (we have precedence for
tests guarded in such a way in advsimd-intrinsics.exp)



Of course I meant precedent rather than precedence :/

Kyrill


The arm-neon-ref.h additions are ok too.

Thanks,
Kyrill



Thanks,
James


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
bd73bce64414e8bc01732d14311d742cf28f4586..90eaca176b4706e6cc42f16ce2c956f1c8ad17b1
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1579,16 +1579,16 @@
[(set_attr "type" "neon_fp_mla__scalar")]
  )
  -(define_insn "*aarch64_fma4_elt_to_128df"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-(fma:V2DF
-  (vec_duplicate:V2DF
-  (match_operand:DF 1 "register_operand" "w"))
-  (match_operand:V2DF 2 "register_operand" "w")
-  (match_operand:V2DF 3 "register_operand" "0")))]
+(define_insn "*aarch64_fma4_elt_from_dup"
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+(fma:VMUL
+  (vec_duplicate:VMUL
+  (match_operand: 1 "register_operand" "w"))
+  (match_operand:VMUL 2 "register_operand" "w")
+  (match_operand:VMUL 3 "register_operand" "0")))]
"TARGET_SIMD"
-  "fmla\\t%0.2d, %2.2d, %1.2d[0]"
-  [(set_attr "type" "neon_fp_mla_d_scalar_q")]
+  "fmla\t%0., %2., %1.[0]"
+  [(set_attr "type" "neon_mla__scalar")]
  )
(define_insn "*aarch64_fma4_elt_to_64v2df"
@@ -1656,17 +1656,17 @@
[(set_attr "type" "neon_fp_mla__scalar")]
  )
  -(define_insn "*aarch64_fnma4_elt_to_128df"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-(fma:V2DF
-  (neg:V2DF
-(match_operand:V2DF 2 "register_operand" "w"))
-  (vec_duplicate:V2DF
-(match_operand:DF 1 "register_operand" "w"))
-  (match_operand:V2DF 3 "register_operand" "0")))]
-  "TARGET_SIMD"
-  "fmls\\t%0.2d, %2.2d, %1.2d[0]"
-  [(set_attr "type" "neon_fp_mla_d_scalar_q")]
+(define_insn "*aarch64_fnma4_elt_from_dup"
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+(fma:VMUL
+  (neg:VMUL
+(match_operand:VMUL 2 "register_operand" "w"))
+  (vec_duplicate:VMUL
+(match_operand: 1 "register_operand" "w"))
+  (match_operand:VMUL 3 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "fmls\t%0., %2., %1.[0]"
+  [(set_attr "type" "neon_mla__scalar")]
  )
(define_insn "*aarch64_fnma4_elt_to_64v2df"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
2612a325718918cf7cd808f28c09c9c4c7b11c07..ca7ace5aa656163826569d046fcbf02f9f7d4d6c
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ 

Re: [AArch64, 1/4] Add the missing support of vfms_n_f32, vfmsq_n_f32, vfmsq_n_f64

2016-05-17 Thread Kyrill Tkachov


On 17/05/16 13:20, James Greenhalgh wrote:

On Mon, May 16, 2016 at 10:09:26AM +0100, Jiong Wang wrote:

The support of vfma_n_f64, vfms_n_f32, vfmsq_n_f32, vfmsq_n_f64 are
missing in current gcc arm_neon.h.

Meanwhile, besides "(fma (vec_dup (vec_select)))", fma by element can
also come from "(fma (vec_dup (scalar)))", where the scalar value is already
sitting in a vector register and is then duplicated to the other lanes, with
no lane-size change.

This patch implements this and can generate better code in some
contexts. For example:

cat test.c
===
typedef __Float32x2_t float32x2_t;
typedef float float32_t;

float32x2_t
vfma_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
{
   return __builtin_aarch64_fmav2sf (__b,  (float32x2_t) {__c,
__c}, __a);
}

before (-O2)
===
vfma_n_f32:
 dup v2.2s, v2.s[0]
 fmlav0.2s, v1.2s, v2.2s
 ret
after
===
vfma_n_f32:
 fmlav0.2s, v1.2s, v2.s[0]
 ret

OK for trunk?

2016-05-16  Jiong Wang 

This ChangeLog entry is not correctly formatted. There should be two
spaces between your name and your email, and each line should start with
a tab.


gcc/
   * config/aarch64/aarch64-simd.md (*aarch64_fma4_elt_to_128df): Rename
   to *aarch64_fma4_elt_from_dup.
   (*aarch64_fnma4_elt_to_128df): Rename to
*aarch64_fnma4_elt_from_dup.
   * config/aarch64/arm_neon.h (vfma_n_f64): New.
   (vfms_n_f32): Likewise.
   (vfms_n_f64): Likewise.
   (vfmsq_n_f32): Likewise.
   (vfmsq_n_f64): Likewise.

gcc/testsuite/
   * gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c: Use
standard syntax.
   * gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c: Likewise.

The paths of these two entries are incorrect. Remove the gcc/testsuite
from the front. I don't understand what you mean by "Use standard syntax.";
please fix this to describe what you are actually changing.


   * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h: New entry
for float64x1.
   * gcc.target/aarch64/advsimd-intrinsics/vfms_vfma_n.c: New.

These two changes need approval from an ARM maintainer as they are in
common files.

 From an AArch64 perspective, this patch is OK with a fixed ChangeLog. Please
wait for an ARM OK for the test changes.


Considering that the tests' functionality is guarded on #if defined(__aarch64__)
it's a noop on arm and so is ok from that perspective (we have precedence for
tests guarded in such a way in advsimd-intrinsics.exp)

The arm-neon-ref.h additions are ok too.

Thanks,
Kyrill



Thanks,
James


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
bd73bce64414e8bc01732d14311d742cf28f4586..90eaca176b4706e6cc42f16ce2c956f1c8ad17b1
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1579,16 +1579,16 @@
[(set_attr "type" "neon_fp_mla__scalar")]
  )
  
-(define_insn "*aarch64_fma4_elt_to_128df"

-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-(fma:V2DF
-  (vec_duplicate:V2DF
- (match_operand:DF 1 "register_operand" "w"))
-  (match_operand:V2DF 2 "register_operand" "w")
-  (match_operand:V2DF 3 "register_operand" "0")))]
+(define_insn "*aarch64_fma4_elt_from_dup"
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+(fma:VMUL
+  (vec_duplicate:VMUL
+ (match_operand: 1 "register_operand" "w"))
+  (match_operand:VMUL 2 "register_operand" "w")
+  (match_operand:VMUL 3 "register_operand" "0")))]
"TARGET_SIMD"
-  "fmla\\t%0.2d, %2.2d, %1.2d[0]"
-  [(set_attr "type" "neon_fp_mla_d_scalar_q")]
+  "fmla\t%0., %2., %1.[0]"
+  [(set_attr "type" "neon_mla__scalar")]
  )
  
  (define_insn "*aarch64_fma4_elt_to_64v2df"

@@ -1656,17 +1656,17 @@
[(set_attr "type" "neon_fp_mla__scalar")]
  )
  
-(define_insn "*aarch64_fnma4_elt_to_128df"

-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-(fma:V2DF
-  (neg:V2DF
-(match_operand:V2DF 2 "register_operand" "w"))
-  (vec_duplicate:V2DF
-   (match_operand:DF 1 "register_operand" "w"))
-  (match_operand:V2DF 3 "register_operand" "0")))]
-  "TARGET_SIMD"
-  "fmls\\t%0.2d, %2.2d, %1.2d[0]"
-  [(set_attr "type" "neon_fp_mla_d_scalar_q")]
+(define_insn "*aarch64_fnma4_elt_from_dup"
+  [(set (match_operand:VMUL 0 "register_operand" "=w")
+(fma:VMUL
+  (neg:VMUL
+(match_operand:VMUL 2 "register_operand" "w"))
+  (vec_duplicate:VMUL
+   (match_operand: 1 "register_operand" "w"))
+  (match_operand:VMUL 3 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "fmls\t%0., %2., %1.[0]"
+  [(set_attr "type" "neon_mla__scalar")]
  )
  
  (define_insn "*aarch64_fnma4_elt_to_64v2df"

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
2612a325718918cf7cd808f28c09c9c4c7b11c07..ca7ace5aa656163826569d046fcbf02f9f7d4d6c
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -14456,6 +14456,12 @@ vfma_n_f32 (float32x2_t __a, float32x2_t __b, 
float32_t 

Re: [AArch64, 4/4] Reimplement vmvn* intrinscis, remove inline assembly

2016-05-17 Thread James Greenhalgh
On Mon, May 16, 2016 at 10:09:42AM +0100, Jiong Wang wrote:
> This patch removes the inline assembly and reimplements all mvn/mvnq vector
> integer intrinsics through the standard "one_cmpl<mode>2" pattern.  That
> pattern was only introduced after the initial implementation of those
> intrinsics, which is why inline assembly was used historically.
> 
> OK for trunk?
> 
> No regression on the existing advsimd-intrinsics/vmvn.c test.
> 
> 2016-05-16  Jiong Wang
> 
> gcc/
>   * config/aarch64/arm_neon.h (vmvn_s8): Reimplement using C operator.
>   Remove inline assembly.
>   (vmvn_s16): Likewise.
>   (vmvn_s32): Likewise.
>   (vmvn_u8): Likewise.
>   (vmvn_u16): Likewise.
>   (vmvn_u32): Likewise.
>   (vmvnq_s8): Likewise.
>   (vmvnq_s16): Likewise.
>   (vmvnq_s32): Likewise.
>   (vmvnq_u8): Likewise.
>   (vmvnq_u16): Likewise.
>   (vmvnq_u32): Likewise.
>   (vmvn_p8): Likewise.
>   (vmvnq_p16): Likewise.

ChangeLog formatting is incorrect.

Otherwise, this is OK.

Thanks,
James
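The reimplementation James approves above relies on writing the intrinsic as a plain C complement on a vector type, which GCC expands through the standard one_cmpl pattern. A minimal sketch with GCC's generic vector extension; the type and function names are illustrative, not the actual arm_neon.h definitions:

```c
/* Sketch: bitwise NOT on a vector type.  GCC matches this through the
   standard one_cmpl<mode>2 pattern, so no inline assembly is needed;
   on AArch64 it becomes a single mvn instruction. */
typedef signed char v8qi __attribute__ ((vector_size (8)));

static v8qi
vmvn_sketch (v8qi a)
{
  return ~a;  /* elementwise complement of every lane */
}
```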




Re: RFA: Generate normal DWARF DW_LOC descriptors for non integer mode pointers

2016-05-17 Thread Nick Clifton
Hi Jeff,

>>   Currently dwarf2out.c:mem_loc_descriptor() has some special case
>>   code to handle the situation where an address is held in a register
>>   whose mode is not of type MODE_INT.  It generates a
>>   DW_OP_GNU_regval_type expression which may later on be converted into
>>   a frame pointer based expression.  This is a problem for targets which
>>   use a partial integer mode for their pointers (eg the msp430).  In
>>   such cases the conversion to a frame pointer based expression could
>>   be wrong if the frame pointer is not being used.

> I may be missing something, but isn't it the transition to an FP 
> relative address rather than a SP relative address that's the problem 
> here?

Yes, I believe so.

> Where does that happen?  

I did not track it down.  But whilst I was searching for the cause I came
across the code that is modified by the patch.  Reading the code it seemed
obvious to me that the special case for handling non INT_MODE register modes
was not intended for pointers, and when I tried out a small patch it worked.

> Is it possible we've got the wrong DECL_RTL or somesuch?

I don't think so.  I am not familiar with this code myself, but the dump from 
the dwarf2 pass shows:

  (insn 5 2 6 (set (mem/c:HI (plus:PSI (reg/f:PSI 1 R1)
(const_int 4 [0x4])) [1 c+0 S2 A16])
(const_int 5 [0x5])) 
/work/sources/binutils/current/gdb/testsuite/gdb.base/advance.c:41 12 {movhi}
 (nil))

which to me pretty clearly shows that "c" is being stored at R1+4.

Cheers
  Nick


Re: [AArch64, 3/4] Reimplement multiply by element to get rid of inline assembly

2016-05-17 Thread James Greenhalgh
On Mon, May 16, 2016 at 10:09:37AM +0100, Jiong Wang wrote:
> This patch reimplements vector multiply by element on top of the existing
> vmul_lane* intrinsics instead of inline assembly.
> 
> There is no code generation change from this patch.
> 
> OK for trunk?
> 
> 2016-05-16  Jiong Wang
> 
> gcc/
>   * config/aarch64/aarch64-simd.md (vmul_n_f32): Remove inline assembly.
>   Use builtin.
>   (vmul_n_s16): Likewise.
>   (vmul_n_s32): Likewise.
>   (vmul_n_u16): Likewise.
>   (vmul_n_u32): Likewise.
>   (vmulq_n_f32): Likewise.
>   (vmulq_n_f64): Likewise.
>   (vmulq_n_s16): Likewise.
>   (vmulq_n_s32): Likewise.
>   (vmulq_n_u16): Likewise.
>   (vmulq_n_u32): Likewise.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/simd/vmul_elem_1.c: Use intrinsics.

Please format these ChangeLogs correctly, otherwise this is OK.

Thanks,
James



Re: [Patch ARM] Fix PR target/53440 - handle generic thunks better for TARGET_32BIT.

2016-05-17 Thread Ramana Radhakrishnan
On Tue, May 17, 2016 at 1:25 PM, Christophe Lyon
 wrote:
> On 1 April 2016 at 17:32, Ramana Radhakrishnan
>  wrote:
>> I've had this in my tree for a few months now but never got
>> around to submitting it.
>>
>> This partially fixes PR target/53440, at least in ARM and
>> Thumb2 state. I haven't yet managed to get my head around
>> rewriting the Thumb1 support.
>>
>> Tested on armhf with a bootstrap and regression test
>> with no regressions.
>>
>
> Hi Ramana,
>
> It took me a while to understand why the test was failing on a Thumb1 target
> despite the dg-skip directive.
> The problem was that dg-do was after dg-skip.
>
> I've checked in the swap, I hope it is "obvious" enough.

Thanks - yes that's an obvious fix.

Ramana

>
> Christophe.
>
>> Queued for stage1 now as it isn't technically a regression.
>>
>> regards
>> Ramana
>>
>>
>>   Ramana Radhakrishnan  
>>
>> PR target/53440
>> * config/arm/arm.c (arm32_output_mi_thunk): New.
>> (arm_output_mi_thunk): Rename to arm_thumb1_mi_thunk. Rework
>> to split Thumb1 vs TARGET_32BIT functionality.
>> (arm_thumb1_mi_thunk): New.
>>
>>
>> * g++.dg/inherit/thunk1.C: Support arm / aarch64.


Re: [AArch64, 2/4] Extend vector mutiply by element to all supported modes

2016-05-17 Thread James Greenhalgh
On Mon, May 16, 2016 at 10:09:31AM +0100, Jiong Wang wrote:
> AArch64 supports vector multiply by element for V2DF, V2SF, V4SF, V2SI,
> V4SI, V4HI and V8HI.
> 
> All of the above are well supported by the "*aarch64_mul3_elt" pattern, and
> by "*aarch64_mul3_elt_" if there is a lane-size
> change.
> 
> These patterns match "(mul (vec_dup (vec_select)))", which is genuinely
> a vector multiply by element.
> 
> But vector multiply by element can also come from "(mul (vec_dup
> (scalar)))", where the scalar value is already sitting in a vector register
> and is then duplicated to the other lanes, with no lane-size change.
> 
> We already have "*aarch64_mul3_elt_to_128df" to match this, but it is
> restricted to V2DF; this patch extends the support to more modes,
> for example vector integer operations.
> 
> For the testcase included, the following codegen change will happen:
> 
> 
> -   ldr w0, [x3, 160]
> -   dup v1.2s, w0
> -   mul v1.2s, v1.2s, v2.2s
> +   ldr s1, [x3, 160]
> +   mul v1.2s, v0.2s, v1.s[0]
> 
> OK for trunk?
> 
> 2016-05-16  Jiong Wang
> 
> gcc/
>   * config/aarch64/aarch64-simd.md (*aarch64_mul3_elt_to_128df): Extend to all
>   supported modes.  Rename to "*aarch64_mul3_elt_from_dup".
> 
> gcc/testsuite/
>   * /gcc.target/aarch64/simd/vmul_elem_1.c: New.


This ChangeLog formatting is incorrect. It should look like:

gcc/

2016-05-17  Jiong Wang  

* config/aarch64/aarch64-simd.md (*aarch64_mul3_elt_to_128df): Extend
to all supported modes.  Rename to...
(*aarch64_mul3_elt_from_dup): ...this.

gcc/testsuite/

2016-05-17  Jiong Wang  

* gcc.target/aarch64/simd/vmul_elem_1.c: New.

Otherwise, this patch is OK.

Thanks,
James
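The "(mul (vec_dup (scalar)))" shape this patch matches can be sketched in portable C. This uses GCC's generic vector extension for illustration; the names are made up and not the arm_neon.h implementation:

```c
/* Sketch of multiply-by-duplicated-scalar: the scalar already sits in a
   vector register and is broadcast to every lane with no lane-size
   change.  On AArch64 the renamed pattern turns this into a single
   mul-by-element instruction. */
typedef int v2si __attribute__ ((vector_size (8)));

static v2si
vmul_n_sketch (v2si a, int n)
{
  v2si dup = {n, n};  /* (vec_dup (scalar)) */
  return a * dup;     /* elementwise multiply */
}
```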

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> eb18defef15c24bf2334045e92bf7c34b989136d..7f338ff78fabccee868a4befbffed54c3e842dc9
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -371,15 +371,15 @@
>[(set_attr "type" "neon_mul__scalar")]
>  )
>  
> -(define_insn "*aarch64_mul3_elt_to_128df"
> -  [(set (match_operand:V2DF 0 "register_operand" "=w")
> - (mult:V2DF
> -   (vec_duplicate:V2DF
> -  (match_operand:DF 2 "register_operand" "w"))
> -  (match_operand:V2DF 1 "register_operand" "w")))]
> +(define_insn "*aarch64_mul3_elt_from_dup"
> + [(set (match_operand:VMUL 0 "register_operand" "=w")
> +(mult:VMUL
> +  (vec_duplicate:VMUL
> + (match_operand: 1 "register_operand" ""))
> +  (match_operand:VMUL 2 "register_operand" "w")))]
>"TARGET_SIMD"
> -  "fmul\\t%0.2d, %1.2d, %2.d[0]"
> -  [(set_attr "type" "neon_fp_mul_d_scalar_q")]
> +  "mul\t%0., %2., %1.[0]";
> +  [(set_attr "type" "neon_mul__scalar")]
>  )
>  
>  (define_insn "aarch64_rsqrte_2"
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c 
> b/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
> new file mode 100644
> index 
> ..290a4e9adbc5d9ce1335ca28120e437293776f30
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
> @@ -0,0 +1,519 @@
> +/* Test the vmul_n_f64 AArch64 SIMD intrinsic.  */
> +
> +/* { dg-do run } */
> +/* { dg-options "-O2 --save-temps" } */
> +
> +#include "arm_neon.h"
> +
> +extern void abort (void);
> +
> +#define A (132.4f)
> +#define B (-0.0f)
> +#define C (-34.8f)
> +#define D (289.34f)
> +float32_t expected2_1[2] = {A * A, B * A};
> +float32_t expected2_2[2] = {A * B, B * B};
> +float32_t expected4_1[4] = {A * A, B * A, C * A, D * A};
> +float32_t expected4_2[4] = {A * B, B * B, C * B, D * B};
> +float32_t expected4_3[4] = {A * C, B * C, C * C, D * C};
> +float32_t expected4_4[4] = {A * D, B * D, C * D, D * D};
> +float32_t _elemA = A;
> +float32_t _elemB = B;
> +float32_t _elemC = C;
> +float32_t _elemD = D;
> +
> +#define AD (1234.5)
> +#define BD (-0.0)
> +#define CD (71.3)
> +#define DD (-1024.4)
> +float64_t expectedd2_1[2] = {AD * CD, BD * CD};
> +float64_t expectedd2_2[2] = {AD * DD, BD * DD};
> +float64_t _elemdC = CD;
> +float64_t _elemdD = DD;
> +
> +
> +#define AS (1024)
> +#define BS (-31)
> +#define CS (0)
> +#define DS (655)
> +int32_t expecteds2_1[2] = {AS * AS, BS * AS};
> +int32_t expecteds2_2[2] = {AS * BS, BS * BS};
> +int32_t expecteds4_1[4] = {AS * AS, BS * AS, CS * AS, DS * AS};
> +int32_t expecteds4_2[4] = {AS * BS, BS * BS, CS * BS, DS * BS};
> +int32_t expecteds4_3[4] = {AS * CS, BS * CS, CS * CS, DS * CS};
> +int32_t expecteds4_4[4] = {AS * DS, BS * DS, CS * DS, DS * DS};
> +int32_t _elemsA = AS;
> +int32_t _elemsB = BS;
> +int32_t _elemsC = CS;
> +int32_t _elemsD = DS;
> +
> +#define AH ((int16_t) 0)
> +#define BH ((int16_t) -32)
> +#define CH ((int16_t) 102)
> +#define DH ((int16_t) -51)
> +#define EH ((int16_t) 71)
> +#define FH ((int16_t) -91)
> +#define GH ((int16_t) 48)
> +#define HH ((int16_t) 255)
> +int16_t 

[PATCH] Fix PR71104 - call gimplification

2016-05-17 Thread Richard Biener

The following patch addresses PR71104 which shows verify-SSA ICEs
after gimplify-into-SSA.  The issue is that for returns-twice calls
we gimplify register uses in the LHS before the actual call which leads to

  p.0_1 = p;
  _2 = vfork ();
  *p.0_1 = _2;

when gimplifying *p = vfork ().  That of course does not work -
fortunately the C standard allows the operands of the LHS to be evaluated
in an unspecified order relative to the RHS.  That also keeps this order
aligned with that scary C++ proposal of defined evaluation order.  It also
improves code generation, avoiding spilling of the pointer load
around the call.

Exchanging the gimplify calls doesn't fix the issue fully, as for
aggregate returns we don't gimplify the call result into a
temporary.  So we need to make sure not to emit an SSA name when
gimplifying the LHS of a returns-twice call (this path only applies
to non-register returns).

A bootstrap with just the gimplification order exchange is building
target libs right now, I'll re-bootstrap and test the whole thing
again if that succeeds.
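The returns-twice hazard above can be demonstrated with any returns-twice function; here is a self-contained sketch using setjmp/longjmp as a portable stand-in for vfork (the testcases below use vfork and a returns_twice attribute instead):

```c
#include <setjmp.h>

/* Sketch of the returns-twice hazard: setjmp can "return" a second
   time via longjmp, adding an abnormal edge back to the call.  Any
   value computed before the call (such as the address used by a store
   of its result) must not live in an SSA name, or its definition would
   fail to dominate the use reached through that abnormal edge. */
static jmp_buf env;
static int store;
static int *p = &store;

static int
demo (void)
{
  int r = setjmp (env);  /* returns 0 first, then 42 after longjmp */
  *p = r;                /* store runs on both returns of setjmp */
  if (r == 0)
    longjmp (env, 42);   /* triggers the second return */
  return store;
}
```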

Is this ok?  I think it makes sense code-generation-wise.  Code
changes from GCC 6

bar:
.LFB0:
.cfi_startproc
subq$24, %rsp
.cfi_def_cfa_offset 32
callfoo
movqp(%rip), %rax
movq%rax, 8(%rsp)
callvfork
movq8(%rsp), %rdx
movl%eax, (%rdx)
addq$24, %rsp
.cfi_def_cfa_offset 8
ret

to

bar:
.LFB0:
.cfi_startproc
subq$8, %rsp
.cfi_def_cfa_offset 16
callfoo
callvfork
movqp(%rip), %rdx
movl%eax, (%rdx)
addq$8, %rsp
.cfi_def_cfa_offset 8
ret

Thanks,
Richard.

2016-05-17  Richard Biener  

PR middle-end/71104
* gimplify.c (gimplify_modify_expr): Gimplify the RHS before
gimplifying the LHS.  Make sure to gimplify a returning twice
call LHS without using SSA names.

* gcc.dg/pr71104-1.c: New testcase.
* gcc.dg/pr71104-2.c: Likewise.

Index: gcc/gimplify.c
===
*** gcc/gimplify.c  (revision 236317)
--- gcc/gimplify.c  (working copy)
*** gimplify_modify_expr (tree *expr_p, gimp
*** 4708,4717 
   that is what we must do here.  */
maybe_with_size_expr (from_p);
  
-   ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
-   if (ret == GS_ERROR)
- return ret;
- 
/* As a special case, we have to temporarily allow for assignments
   with a CALL_EXPR on the RHS.  Since in GIMPLE a function call is
   a toplevel statement, when gimplifying the GENERIC expression
--- 4708,4713 
*** gimplify_modify_expr (tree *expr_p, gimp
*** 4729,4734 
--- 4725,4746 
if (ret == GS_ERROR)
  return ret;
  
+   /* If we gimplified the RHS to a CALL_EXPR and that call may return
+  twice we have to make sure to gimplify into non-SSA as otherwise
+  the abnormal edge added later will make those defs not dominate
+  their uses.
+  ???  Technically this applies only to the registers used in the
+  resulting non-register *TO_P.  */
+   bool saved_into_ssa = gimplify_ctxp->into_ssa;
+   if (saved_into_ssa
+   && TREE_CODE (*from_p) == CALL_EXPR
+   && call_expr_flags (*from_p) & ECF_RETURNS_TWICE)
+ gimplify_ctxp->into_ssa = false;
+   ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
+   gimplify_ctxp->into_ssa = saved_into_ssa;
+   if (ret == GS_ERROR)
+ return ret;
+ 
/* In case of va_arg internal fn wrappped in a WITH_SIZE_EXPR, add the type
   size as argument to the call.  */
if (TREE_CODE (*from_p) == WITH_SIZE_EXPR)
Index: gcc/testsuite/gcc.dg/pr71104-1.c
===
*** gcc/testsuite/gcc.dg/pr71104-1.c(revision 0)
--- gcc/testsuite/gcc.dg/pr71104-1.c(working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do compile } */
+ 
+ void foo(void);
+ int vfork(void);
+ int *p;
+ 
+ void bar(void)
+ {
+   foo();
+   *p = vfork();
+ }
Index: gcc/testsuite/gcc.dg/pr71104-2.c
===
*** gcc/testsuite/gcc.dg/pr71104-2.c(revision 0)
--- gcc/testsuite/gcc.dg/pr71104-2.c(working copy)
***
*** 0 
--- 1,12 
+ /* { dg-do compile } */
+ 
+ struct Foo { char c[1024]; };
+ void foo(void);
+ struct Foo baz(void) __attribute__((returns_twice));
+ struct Foo *p;
+ 
+ void bar(void)
+ {
+   foo();
+   *p = baz();
+ }


Re: [Patch ARM] Fix PR target/53440 - handle generic thunks better for TARGET_32BIT.

2016-05-17 Thread Christophe Lyon
On 1 April 2016 at 17:32, Ramana Radhakrishnan
 wrote:
> I've had this in my tree for a few months now but never got
> around to submitting it.
>
> This partially fixes PR target/53440, at least in ARM and
> Thumb2 state. I haven't yet managed to get my head around
> rewriting the Thumb1 support.
>
> Tested on armhf with a bootstrap and regression test
> with no regressions.
>

Hi Ramana,

It took me a while to understand why the test was failing on a Thumb1 target
despite the dg-skip directive.
The problem was that dg-do was after dg-skip.

I've checked in the swap, I hope it is "obvious" enough.

Christophe.

> Queued for stage1 now as it isn't technically a regression.
>
> regards
> Ramana
>
>
>   Ramana Radhakrishnan  
>
> PR target/53440
> * config/arm/arm.c (arm32_output_mi_thunk): New.
> (arm_output_mi_thunk): Rename to arm_thumb1_mi_thunk. Rework
> to split Thumb1 vs TARGET_32BIT functionality.
> (arm_thumb1_mi_thunk): New.
>
>
> * g++.dg/inherit/thunk1.C: Support arm / aarch64.
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 236318)
+++ gcc/testsuite/ChangeLog (revision 236319)
@@ -1,3 +1,7 @@
+2016-05-17  Christophe Lyon  
+
+   * g++.dg/inherit/think1.C: Fix dg-do and dg-skip order.
+
 2016-05-17  Kyrylo Tkachov  
 
PR target/70809
Index: gcc/testsuite/g++.dg/inherit/thunk1.C
===
--- gcc/testsuite/g++.dg/inherit/thunk1.C   (revision 236318)
+++ gcc/testsuite/g++.dg/inherit/thunk1.C   (revision 236319)
@@ -1,5 +1,5 @@
+// { dg-do run { target arm*-*-* aarch64*-*-* i?86-*-* x86_64-*-* s390*-*-* 
alpha*-*-* ia64-*-* sparc*-*-* } }
 // { dg-skip-if "" { arm_thumb1_ok } }
-// { dg-do run { target arm*-*-* aarch64*-*-* i?86-*-* x86_64-*-* s390*-*-* 
alpha*-*-* ia64-*-* sparc*-*-* } }
 
 #include 
 


Re: [AArch64, 1/4] Add the missing support of vfms_n_f32, vfmsq_n_f32, vfmsq_n_f64

2016-05-17 Thread James Greenhalgh
On Mon, May 16, 2016 at 10:09:26AM +0100, Jiong Wang wrote:
> The support of vfma_n_f64, vfms_n_f32, vfmsq_n_f32, vfmsq_n_f64 are
> missing in current gcc arm_neon.h.
> 
> Meanwhile, besides "(fma (vec_dup (vec_select)))", fma by element can
> also come from "(fma (vec_dup (scalar)))", where the scalar value is already
> sitting in a vector register and is then duplicated to the other lanes,
> with no lane-size change.
> 
> This patch implements this and can generate better code in some
> contexts. For example:
> 
> cat test.c
> ===
> typedef __Float32x2_t float32x2_t;
> typedef float float32_t;
> 
> float32x2_t
> vfma_n_f32 (float32x2_t __a, float32x2_t __b, float32_t __c)
> {
>   return __builtin_aarch64_fmav2sf (__b,  (float32x2_t) {__c,
> __c}, __a);
> }
> 
> before (-O2)
> ===
> vfma_n_f32:
> dup v2.2s, v2.s[0]
> fmlav0.2s, v1.2s, v2.2s
> ret
> after
> ===
> vfma_n_f32:
> fmlav0.2s, v1.2s, v2.s[0]
> ret
> 
> OK for trunk?
> 
> 2016-05-16  Jiong Wang 

This ChangeLog entry is not correctly formatted. There should be two
spaces between your name and your email, and each line should start with
a tab.

> 
> gcc/
>   * config/aarch64/aarch64-simd.md (*aarch64_fma4_elt_to_128df): Rename
>   to *aarch64_fma4_elt_from_dup.
>   (*aarch64_fnma4_elt_to_128df): Rename to
> *aarch64_fnma4_elt_from_dup.
>   * config/aarch64/arm_neon.h (vfma_n_f64): New.
>   (vfms_n_f32): Likewise.
>   (vfms_n_f64): Likewise.
>   (vfmsq_n_f32): Likewise.
>   (vfmsq_n_f64): Likewise.
> 
> gcc/testsuite/
>   * gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c: Use
> standard syntax.
>   * gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c: Likewise.

The paths of these two entries are incorrect. Remove the gcc/testsuite
from the front. I don't understand what you mean by "Use standard syntax.";
please fix this to describe what you are actually changing.

>   * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h: New entry
> for float64x1.
>   * gcc.target/aarch64/advsimd-intrinsics/vfms_vfma_n.c: New.

These two changes need approval from an ARM maintainer as they are in
common files.

From an AArch64 perspective, this patch is OK with a fixed ChangeLog. Please
wait for an ARM OK for the test changes.

Thanks,
James

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> bd73bce64414e8bc01732d14311d742cf28f4586..90eaca176b4706e6cc42f16ce2c956f1c8ad17b1
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1579,16 +1579,16 @@
>[(set_attr "type" "neon_fp_mla__scalar")]
>  )
>  
> -(define_insn "*aarch64_fma4_elt_to_128df"
> -  [(set (match_operand:V2DF 0 "register_operand" "=w")
> -(fma:V2DF
> -  (vec_duplicate:V2DF
> -   (match_operand:DF 1 "register_operand" "w"))
> -  (match_operand:V2DF 2 "register_operand" "w")
> -  (match_operand:V2DF 3 "register_operand" "0")))]
> +(define_insn "*aarch64_fma4_elt_from_dup"
> +  [(set (match_operand:VMUL 0 "register_operand" "=w")
> +(fma:VMUL
> +  (vec_duplicate:VMUL
> +   (match_operand: 1 "register_operand" "w"))
> +  (match_operand:VMUL 2 "register_operand" "w")
> +  (match_operand:VMUL 3 "register_operand" "0")))]
>"TARGET_SIMD"
> -  "fmla\\t%0.2d, %2.2d, %1.2d[0]"
> -  [(set_attr "type" "neon_fp_mla_d_scalar_q")]
> +  "fmla\t%0., %2., %1.[0]"
> +  [(set_attr "type" "neon_mla__scalar")]
>  )
>  
>  (define_insn "*aarch64_fma4_elt_to_64v2df"
> @@ -1656,17 +1656,17 @@
>[(set_attr "type" "neon_fp_mla__scalar")]
>  )
>  
> -(define_insn "*aarch64_fnma4_elt_to_128df"
> -  [(set (match_operand:V2DF 0 "register_operand" "=w")
> -(fma:V2DF
> -  (neg:V2DF
> -(match_operand:V2DF 2 "register_operand" "w"))
> -  (vec_duplicate:V2DF
> - (match_operand:DF 1 "register_operand" "w"))
> -  (match_operand:V2DF 3 "register_operand" "0")))]
> -  "TARGET_SIMD"
> -  "fmls\\t%0.2d, %2.2d, %1.2d[0]"
> -  [(set_attr "type" "neon_fp_mla_d_scalar_q")]
> +(define_insn "*aarch64_fnma4_elt_from_dup"
> +  [(set (match_operand:VMUL 0 "register_operand" "=w")
> +(fma:VMUL
> +  (neg:VMUL
> +(match_operand:VMUL 2 "register_operand" "w"))
> +  (vec_duplicate:VMUL
> + (match_operand: 1 "register_operand" "w"))
> +  (match_operand:VMUL 3 "register_operand" "0")))]
> +  "TARGET_SIMD"
> +  "fmls\t%0., %2., %1.[0]"
> +  [(set_attr "type" "neon_mla__scalar")]
>  )
>  
>  (define_insn "*aarch64_fnma4_elt_to_64v2df"
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 2612a325718918cf7cd808f28c09c9c4c7b11c07..ca7ace5aa656163826569d046fcbf02f9f7d4d6c
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -14456,6 +14456,12 @@ vfma_n_f32 (float32x2_t __a, float32x2_t __b, 
> float32_t __c)
>return __builtin_aarch64_fmav2sf (__b, vdup_n_f32 (__c), __a);
>  }
>  
> +__extension__ static __inline 

Re: [Patch V2] Fix SLP PR58135.

2016-05-17 Thread Richard Biener
On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
 wrote:
> Hi Richard,
>
> I created the patch by passing -b option to git. Now the patch is more 
> readable.
>
> As per your suggestion I tried to fix the PR by splitting the SLP store group 
> at vector boundary after the SLP tree is built.
>
> Boot strap PASSED on x86_64.
> Checked the patch with check_GNU_style.sh.
>
> The gfortran.dg/pr46519-1.f test now does SLP vectorization, and hence
> generates 2 more vzeroupper instructions.
> As recommended, I adjusted the test case by adding -fno-tree-slp-vectorize so
> that it matches what is expected after loop vectorization alone.
>
> The following tests are now passing.
>
> -- Snip-
> Tests that now work, but didn't before:
>
> gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 
> "basic block vectorized" 1
>
> gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block vectorized" 1
>
> New tests that PASS:
>
> gcc.dg/vect/pr58135.c (test for excess errors) gcc.dg/vect/pr58135.c -flto 
> -ffat-lto-objects (test for excess errors)
>
> -- Snip-
>
> ChangeLog
>
> 2016-05-14  Venkataramanan Kumar  
>  PR tree-optimization/58135
> * tree-vect-slp.c:  When group size is not multiple of vector size,
>  allow splitting of store group at vector boundary.
>
> Test suite  ChangeLog
> 2016-05-14  Venkataramanan Kumar  
> * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
> * gcc.dg/vect/pr58135.c:  Add new.
> * gfortran.dg/pr46519-1.f: Adjust test case.
>
> The attached patch Ok for trunk?


Please avoid the excessive vertical space around the vect_build_slp_tree call.

+  /* Calculate the unrolling factor.  */
+  unrolling_factor = least_common_multiple
+ (nunits, group_size) / group_size;
...
+  else
{
  /* Calculate the unrolling factor based on the smallest type.  */
  if (max_nunits > nunits)
-unrolling_factor = least_common_multiple (max_nunits, group_size)
-   / group_size;
+   unrolling_factor
+   = least_common_multiple (max_nunits, group_size)/group_size;

please compute the "correct" unroll factor immediately and move the
"unrolling of BB required" error into the if() case by postponing the
nunits < group_size check (and use max_nunits here).

+  if (is_a  (vinfo)
+ && nunits < group_size
+ && unrolling_factor != 1
+ && is_a  (vinfo))
+   {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "Build SLP failed: store group "
+  "size not a multiple of the vector size "
+  "in basic block SLP\n");
+ /* Fatal mismatch.  */
+ matches[nunits] = false;

this is too pessimistic - you want to add the extra 'false' at
group_size / max_nunits * max_nunits.

It looks like you leak 'node' in the if () path as well.  You need

  vect_free_slp_tree (node);
  loads.release ();

thus treat it as a failure case.
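The two quantities in Richard's comments can be sketched numerically. These helpers are illustrative only, not the GCC implementation:

```c
/* Sketch of the arithmetic under review: the SLP unrolling factor is
   lcm (nunits, group_size) / group_size, and the store group should be
   split at the largest multiple of the vector size that fits, i.e.
   group_size / max_nunits * max_nunits. */
static unsigned
gcd (unsigned a, unsigned b)
{
  while (b)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

static unsigned
unroll_factor (unsigned nunits, unsigned group_size)
{
  unsigned lcm = nunits / gcd (nunits, group_size) * group_size;
  return lcm / group_size;
}

static unsigned
split_point (unsigned max_nunits, unsigned group_size)
{
  return group_size / max_nunits * max_nunits;
}
```

For example, a store group of 6 with 4-element vectors needs an unroll factor of 2 and splits at element 4.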

Thanks,
Richard.

> Regards,
> Venkat.
>


Fix firefox compilation ICE

2016-05-17 Thread Jan Hubicka
Hi,
this patch fixes an ICE while building Firefox (and probably xalancbmk, too)
with -O3 -flto.  I originally tested the whole patchset on several bigger apps,
including the inline heuristics change which teaches the inliner that thunks
are very cheap.
Mainline doesn't contain that change, which makes us inline into thunks more
heavily and results in some surprises.  I hope this is the last one.

Bootstrapped/regtested x86_64-linux, I am re-testing with last minute change
and will commit afterwards.

Honza

* ipa-inline-transform.c (preserve_function_body_p): Look for
first non-thunk clone.
* lto-cgraph.c (lto_output_edge): When streaming thunk do not look
up call stmt id.
(lto_output_node): Inline thunks don't need body in every
partition.
* lto-streamer-in.c: Do not fixup thunk clones.
Index: ipa-inline-transform.c
===
--- ipa-inline-transform.c  (revision 236275)
+++ ipa-inline-transform.c  (working copy)
@@ -587,9 +587,10 @@ preserve_function_body_p (struct cgraph_
   gcc_assert (symtab->global_info_ready);
   gcc_assert (!node->alias && !node->thunk.thunk_p);
 
-  /* Look if there is any clone around.  */
-  if (node->clones && !node->clones->thunk.thunk_p)
-return true;
+  /* Look if there is any non-thunk clone around.  */
+  for (node = node->clones; node; node = node->next_sibling_clone)
+if (!node->thunk.thunk_p)
+  return true;
   return false;
 }
 
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 236275)
+++ lto-cgraph.c(working copy)
@@ -259,7 +259,7 @@ lto_output_edge (struct lto_simple_outpu
   streamer_write_gcov_count_stream (ob->main_stream, edge->count);
 
   bp = bitpack_create (ob->main_stream);
-  uid = (!gimple_has_body_p (edge->caller->decl)
+  uid = (!gimple_has_body_p (edge->caller->decl) || edge->caller->thunk.thunk_p
 ? edge->lto_stmt_uid : gimple_uid (edge->call_stmt) + 1);
   bp_pack_enum (, cgraph_inline_failed_t,
CIF_N_REASONS, edge->inline_failed);
@@ -398,7 +398,8 @@ lto_output_node (struct lto_simple_outpu
 
   boundary_p = !lto_symtab_encoder_in_partition_p (encoder, node);
 
-  if (node->analyzed && (!boundary_p || node->alias || node->thunk.thunk_p))
+  if (node->analyzed && (!boundary_p || node->alias
+|| (node->thunk.thunk_p && !node->global.inlined_to)))
 tag = LTO_symtab_analyzed_node;
   else
 tag = LTO_symtab_unavail_node;
Index: lto-streamer-in.c
===
--- lto-streamer-in.c   (revision 236275)
+++ lto-streamer-in.c   (working copy)
@@ -952,20 +952,21 @@ fixup_call_stmt_edges (struct cgraph_nod
   fixup_call_stmt_edges_1 (orig, stmts, fn);
   if (orig->clones)
 for (node = orig->clones; node != orig;)
-  {
-   fixup_call_stmt_edges_1 (node, stmts, fn);
-   if (node->clones)
- node = node->clones;
-   else if (node->next_sibling_clone)
- node = node->next_sibling_clone;
-   else
- {
-   while (node != orig && !node->next_sibling_clone)
- node = node->clone_of;
-   if (node != orig)
- node = node->next_sibling_clone;
- }
-  }
+  if (!node->thunk.thunk_p)
+   {
+ fixup_call_stmt_edges_1 (node, stmts, fn);
+ if (node->clones)
+   node = node->clones;
+ else if (node->next_sibling_clone)
+   node = node->next_sibling_clone;
+ else
+   {
+ while (node != orig && !node->next_sibling_clone)
+   node = node->clone_of;
+ if (node != orig)
+   node = node->next_sibling_clone;
+   }
+   }
 }
 
 


Re: VRP: range info of new variables

2016-05-17 Thread Marc Glisse

On Mon, 16 May 2016, Jeff Law wrote:


- Now that I think of it, maybe I should check that the variable is not
a pointer before calling set_range_info? Having range [0, 1] makes it
unlikely, but who knows...

Maybe using an assert would be better.


I don't think having a pointer there would be completely wrong, just 
unlikely, so I'd rather add a check but not assert.



Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 236194)
+++ gcc/tree-vrp.c  (working copy)
@@ -8933,20 +8933,24 @@ simplify_truth_ops_using_ranges (gimple_
 gimple_assign_set_rhs_with_ops (gsi,
need_conversion
? NOP_EXPR : TREE_CODE (op0), op0);
   /* For A != B we substitute A ^ B.  Either with conversion.  */
   else if (need_conversion)
 {
   tree tem = make_ssa_name (TREE_TYPE (op0));
   gassign *newop
= gimple_build_assign (tem, BIT_XOR_EXPR, op0, op1);
   gsi_insert_before (gsi, newop, GSI_SAME_STMT);
+  if (TYPE_PRECISION (TREE_TYPE (tem)) > 1)
+   set_range_info (tem, VR_RANGE,
+   wi::zero (TYPE_PRECISION (TREE_TYPE (tem))),
+   wi::one (TYPE_PRECISION (TREE_TYPE (tem;
Is there actually a case where TYPE_PRECISION (TREE_TYPE (tem)) > 1 is ever 
false?  Would an assert make more sense here?


op0 can have precision 1, so tem can as well. In most cases I would expect 
need_conversion to be false in that case though. However, it doesn't seem 
impossible to have several types with 1-bit precision that are not 
equivalent (different TYPE_SIGN for instance). So again, I don't feel 
comfortable adding an assert. But I am open to proofs that those events 
cannot happen.



 static bool
 simplify_conversion_using_ranges (gimple *stmt)
Your ChangeLog mentions simplify_switch_using_ranges, not 
simplify_conversion_using_ranges.


Oops, bad copy-paste (I keep too much context in the diff for diff -p to 
give useful results), thanks.


This is OK for the trunk -- your call on asserting the variable is not a 
pointer before calling set_range_info.  Similarly on the check that the 
TYPE_PRECISION (TREE_TYPE (tem)) > 1.


--
Marc Glisse


[Patch V2] Fix SLP PR58135.

2016-05-17 Thread Kumar, Venkataramanan
Hi Richard, 

I created the patch by passing -b option to git. Now the patch is more readable.

As per your suggestion I tried to fix the PR by splitting the SLP store group 
at vector boundary after the SLP tree is built.

Bootstrap PASSED on x86_64.
Checked the patch with check_GNU_style.sh.

The gfortran.dg/pr46519-1.f test now does SLP vectorization, and hence
generates 2 more vzeroupper instructions.
As recommended, I adjusted the test case by adding -fno-tree-slp-vectorize so
it behaves as expected after loop vectorization.

The following tests are now passing.

-- Snip-
Tests that now work, but didn't before:

gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 
"basic block vectorized" 1

gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block vectorized" 1

New tests that PASS:

gcc.dg/vect/pr58135.c (test for excess errors) gcc.dg/vect/pr58135.c -flto 
-ffat-lto-objects (test for excess errors)

-- Snip-

ChangeLog

2016-05-14  Venkataramanan Kumar  
 PR tree-optimization/58135
* tree-vect-slp.c:  When group size is not a multiple of vector size, 
 allow splitting of store group at vector boundary. 

Test suite  ChangeLog
2016-05-14  Venkataramanan Kumar  
* gcc.dg/vect/bb-slp-19.c:  Remove XFAIL. 
* gcc.dg/vect/pr58135.c:  Add new.
* gfortran.dg/pr46519-1.f: Adjust test case.

The attached patch Ok for trunk?

Regards,
Venkat.

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-19.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
index 42cd294..c282155 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-19.c
@@ -53,5 +53,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2"  { 
xfail *-*-* }  } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/pr58135.c 
b/gcc/testsuite/gcc.dg/vect/pr58135.c
new file mode 100644
index 000..ca25000
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr58135.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+int a[100];
+void foo ()
+{
+  a[0] = a[1] = a[2] = a[3] = a[4]= 0;
+}
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
diff --git a/gcc/testsuite/gfortran.dg/pr46519-1.f 
b/gcc/testsuite/gfortran.dg/pr46519-1.f
index 51c64b8..46be9f5 100644
--- a/gcc/testsuite/gfortran.dg/pr46519-1.f
+++ b/gcc/testsuite/gfortran.dg/pr46519-1.f
@@ -1,5 +1,5 @@
 ! { dg-do compile { target i?86-*-* x86_64-*-* } }
-! { dg-options "-O3 -mavx -mvzeroupper -mtune=generic -dp" }
+! { dg-options "-O3 -mavx -mvzeroupper -fno-tree-slp-vectorize -mtune=generic 
-dp" }
 
   PROGRAM MG3XDEMO 
   INTEGER LM, NM, NV, NR, NIT
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index d713848..23a127f 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1754,18 +1754,6 @@ vect_analyze_slp_instance (vec_info *vinfo,
 }
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  /* Calculate the unrolling factor.  */
-  unrolling_factor = least_common_multiple (nunits, group_size) / group_size;
-  if (unrolling_factor != 1 && is_a  (vinfo))
-{
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"Build SLP failed: unrolling required in basic"
-" block SLP\n");
-
-  return false;
-}
-
   /* Create a node (a root of the SLP tree) for the packed grouped stores.  */
   scalar_stmts.create (group_size);
   next = stmt;
@@ -1801,21 +1789,43 @@ vect_analyze_slp_instance (vec_info *vinfo,
   /* Build the tree for the SLP instance.  */
   bool *matches = XALLOCAVEC (bool, group_size);
   unsigned npermutes = 0;
-  if ((node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
+
+  node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
  _nunits, , matches, ,
-  NULL, max_tree_size)) != NULL)
+ NULL, max_tree_size);
+
+  if (node != NULL)
+{
+  /* Calculate the unrolling factor.  */
+  unrolling_factor = least_common_multiple
+ (nunits, group_size) / group_size;
+
+  if (is_a  (vinfo)
+ && nunits < group_size
+ && unrolling_factor != 1
+ && is_a  (vinfo))
+   {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "Build SLP failed: store group "
+  "size not a multiple of the vector size "
+  "in basic block SLP\n");
+ /* Fatal mismatch.  */
+ matches[nunits] = false;
+   }
+  else
{
  /* Calculate the unrolling factor based on the smallest type.  */
  if (max_nunits > nunits)
-unrolling_factor = least_common_multiple (max_nunits, group_size)

Re: match.pd: x & C -> x if we know that x & ~C == 0

2016-05-17 Thread Richard Biener
On Tue, May 17, 2016 at 8:59 AM, Marc Glisse  wrote:
> Hello,
>
> the testcase for this patch is taken from
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00683.html
>
> get_nonzero_bits only gives may-be-set bits. To handle an equivalent
> transform for bit_ior, I would need must-be-set bits, maybe I should check
> if that is available somewhere in CCP...
>
> The patch is extremely similar to
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01042.html , I'll make sure
> they are consistent with respect to testing for SSA_NAME / !POINTER_TYPE_P
> if I get feedback on either (today I am leaning towards adding all possible
> checks, just to be sure).

Yeah, better be safe than sorry.

> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

Ok with the missing INTEGRAL_TYPE_P check.

Thanks,
Richard.

>
> 2016-05-17  Marc Glisse  
>
> gcc/
> * match.pd (X & C): New transformation.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/and-1.c: New testcase.
>
> --
> Marc Glisse
> Index: gcc/match.pd
> ===
> --- gcc/match.pd(revision 236300)
> +++ gcc/match.pd(working copy)
> @@ -548,20 +548,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (simplify
>   (bit_and @0 integer_all_onesp)
>(non_lvalue @0))
>
>  /* x & x -> x,  x | x -> x  */
>  (for bitop (bit_and bit_ior)
>   (simplify
>(bitop @0 @0)
>(non_lvalue @0)))
>
> +/* x & C -> x if we know that x & ~C == 0.  */
> +#if GIMPLE
> +(simplify
> + (bit_and SSA_NAME@0 INTEGER_CST@1)
> + (if ((get_nonzero_bits (@0) & wi::bit_not (@1)) == 0)
> +  @0))
> +#endif
> +
>  /* x + (x & 1) -> (x + 1) & ~1 */
>  (simplify
>   (plus:c @0 (bit_and:s @0 integer_onep@1))
>   (bit_and (plus @0 @1) (bit_not @1)))
>
>  /* x & ~(x & y) -> x & ~y */
>  /* x | ~(x | y) -> x | ~y  */
>  (for bitop (bit_and bit_ior)
>   (simplify
>(bitop:c @0 (bit_not (bitop:cs @0 @1)))
> Index: gcc/testsuite/gcc.dg/tree-ssa/and-1.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/and-1.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/and-1.c   (working copy)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized-raw" } */
> +
> +int f(int in) {
> +  in = in | 3;
> +  in = in ^ 1;
> +  in = (in & ~(unsigned long)1);
> +  return in;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "bit_and_expr" "optimized" } } */
>


Re: match.pd: ~X & Y to X ^ Y in some cases

2016-05-17 Thread Richard Biener
On Fri, May 13, 2016 at 9:07 PM, Marc Glisse  wrote:
> Hello,
>
> maybe this would fit better in VRP, but it is easier (and not completely
> useless) to put it in match.pd.
>
> Since the transformation is restricted to GIMPLE, I think I don't need to
> check that @0 is SSA_NAME. I didn't test if @0 has pointer type before
> calling get_range_info because we are doing bit_not on it, but it looks like
> I should because we can do bitops on pointers?

Yes, also because for odd reasons we may get constants into it here (which would
admittedly be a bug).

Thanks,
Richard.

> Adjustment for pr69270.c is exactly the same as in the previous patch from
> today :-)
>
> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.
>
>
> 2016-05-16  Marc Glisse  
>
> gcc/
> * match.pd (~X & Y): New transformation.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/pr69270.c: Adjust.
> * gcc.dg/tree-ssa/andnot-1.c: New testcase.
>
>
> --
> Marc Glisse
> Index: gcc/match.pd
> ===
> --- gcc/match.pd(revision 236194)
> +++ gcc/match.pd(working copy)
> @@ -496,20 +496,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(minus @1 (bit_xor @0 @1)))
>
>  /* Simplify (X & ~Y) | (~X & Y) -> X ^ Y.  */
>  (simplify
>   (bit_ior (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1))
>(bit_xor @0 @1))
>  (simplify
>   (bit_ior:c (bit_and @0 INTEGER_CST@2) (bit_and (bit_not @0)
> INTEGER_CST@1))
>   (if (wi::bit_not (@2) == @1)
>(bit_xor @0 @1)))
> +/* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0.  */
> +#if GIMPLE
> +(simplify
> + (bit_and (bit_not @0) INTEGER_CST@1)
> + (if ((get_nonzero_bits (@0) & wi::bit_not (@1)) == 0)
> +  (bit_xor @0 @1)))
> +#endif
>
>  /* X % Y is smaller than Y.  */
>  (for cmp (lt ge)
>   (simplify
>(cmp (trunc_mod @0 @1) @1)
>(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> { constant_boolean_node (cmp == LT_EXPR, type); })))
>  (for cmp (gt le)
>   (simplify
>(cmp @1 (trunc_mod @0 @1))
> Index: gcc/testsuite/gcc.dg/tree-ssa/andnot-1.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/andnot-1.c(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/andnot-1.c(working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized-raw" } */
> +
> +unsigned f(unsigned i){
> +  i >>= __SIZEOF_INT__ * __CHAR_BIT__ - 3;
> +  i = ~i;
> +  return i & 7;
> +}
> +
> +/* { dg-final { scan-tree-dump "bit_xor_expr" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "bit_not_expr" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "bit_and_expr" "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr69270.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/pr69270.c (revision 236194)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr69270.c (working copy)
> @@ -1,21 +1,19 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fsplit-paths -fdump-tree-dom3-details" } */
>
>  /* There should be two references to bufferstep that turn into
> constants.  */
>  /* { dg-final { scan-tree-dump-times "Replaced .bufferstep_\[0-9\]+. with
> constant .0." 1 "dom3"} } */
>  /* { dg-final { scan-tree-dump-times "Replaced .bufferstep_\[0-9\]+. with
> constant .1." 1 "dom3"} } */
>
>  /* And some assignments ought to fold down to constants.  */
> -/* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = -1;" 1 "dom3"}
> } */
> -/* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = -2;" 1 "dom3"}
> } */
>  /* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = 1;" 1 "dom3"}
> } */
>  /* { dg-final { scan-tree-dump-times "Folded to: _\[0-9\]+ = 0;" 1 "dom3"}
> } */
>
>  /* The XOR operations should have been optimized to constants.  */
>  /* { dg-final { scan-tree-dump-not "bit_xor" "dom3"} } */
>
>
>  extern int *stepsizeTable;
>
>  void
>


Re: [Patch] PR rtl-optimization/71150, guard in_class_p check with REG_P

2016-05-17 Thread Uros Bizjak
On Tue, May 17, 2016 at 1:18 PM, Jiong Wang  wrote:
> On 17/05/16 11:23, Uros Bizjak wrote:
>>
>> On Tue, May 17, 2016 at 12:17 PM, Uros Bizjak  wrote:
>>>
>>> Hello!
>>>
 This bug is introduced by my commit r236181 where the inner rtx of
 SUBREG haven't been checked while it should as "in_class_p" only
 works with REG, and SUBREG_REG is actually not always REG.  If REG_P
 check failed,  then we should fall back to the normal code path. The
 following simple testcase for x86 can reproduce this bug.
 diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
 index 56ab5b4..e4e6c8c 100644
 --- a/gcc/lra-constraints.c
 +++ b/gcc/lra-constraints.c
   @@ -1317,7 +1317,8 @@ process_addr_reg (rtx *loc, bool check_only_p,
 rtx_insn **before, rtx_insn **aft
   register, and this normally will be a subreg which should be reloaded
   as a whole.  This is particularly likely to be triggered when
   -fno-split-wide-types specified.  */
 -  if (in_class_p (reg, cl, _class)
 +  if (!REG_P (reg)
 +  || in_class_p (reg, cl, _class)
|| GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
 loc = _REG (*loc);
>>>
>>> Why not check SUBREG_P instead of !REG_P?
>>
>> Or, alternatively:
>>
>> if ((REG_P && !in_class_p (reg, ...))
>>  || GET_MODE_SIZE ...)
>>
>> Which is IMO much more readable.
>
>
> Thanks for review.
>
> I think your proposed rewrite will be the following,
>
>   if (!(REG_P (reg) && !in_class_p (reg, cl, _class))
>   || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
>
> I feel the original code is easier to understand.
> The check logic is composed of three conditions, in a pre-requisite order,
> if one condition is true then reload the inner rtx, otherwise reload the
> whole subreg.
>
>   if (!REG_P (reg)
>   || in_class_p (reg, cl, _class)
>   || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
>loc = _REG (*loc);

Well, I don't want to bikeshed about it, it is OK with me either way.

(You still need a review of functionality from Vlad, though ...)

Thanks,
Uros.


Re: Avoid inlining into instrumetnation thunks

2016-05-17 Thread Richard Biener
On Mon, May 16, 2016 at 9:34 PM, Jan Hubicka  wrote:
> Hi,
> this patch fixes chkp ICE when we try to inline into an instrumentation thunk.
> This is not really a thunk and thus can't be handled as such.
>
> Bootstrapped/regtested x86_64-linux
>
> Honza
>
> 2016-05-16  Jan Hubicka  
>
> * ipa-inline-analysis.c (compute_inline_parameters): Disable inlinig
> into instrumentation thunks.
> * cif-code.def (CIF_CHKP): New.
>
> Index: ipa-inline-analysis.c
> ===
> --- ipa-inline-analysis.c   (revision 236275)
> +++ ipa-inline-analysis.c   (working copy)
> @@ -2943,7 +2943,13 @@ compute_inline_parameters (struct cgraph
>info->self_size = info->size;
>info->self_time = info->time;
>/* We can not inline instrumetnation clones.  */
> -  info->inlinable = !node->thunk.add_pointer_bounds_args;
> +  if (node->thunk.add_pointer_bounds_args)
> +   {
> +  info->inlinable = false;
> +  node->callees->inline_failed = CIF_CHKP;
> +   }
> +  else
> +info->inlinable = true;
>  }
>else
>  {
> Index: cif-code.def
> ===
> --- cif-code.def(revision 236275)
> +++ cif-code.def(working copy)
> @@ -135,3 +135,7 @@ DEFCIFCODE(CILK_SPAWN, CIF_FINAL_ERROR,
>  /* We proved that the call is unreachable.  */
>  DEFCIFCODE(UNREACHABLE, CIF_FINAL_ERROR,
>N_("unreachable"))
> +
> +/* We can't inline because of instrumentation thunk.  */
> +DEFCIFCODE(CHKP, CIF_FINAL_ERROR,
> +  N_("caller is instrumetnation thunk"))

instrumentation


Re: [PATCH] [rtlfe] Barebones implementation of "__RTL"; next steps?

2016-05-17 Thread Richard Biener
On Mon, May 16, 2016 at 8:48 PM, Jeff Law  wrote:
> On 05/12/2016 08:29 AM, David Malcolm wrote:
>>
>>
>> One wart I ran into is that system.h has this:
>>
>> /* Front ends should never have to include middle-end headers.  Enforce
>>this by poisoning the header double-include protection defines.  */
>> #ifdef IN_GCC_FRONTEND
>> #pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H
>> #endif
>>
>> i.e. the idea of running RTL code from inside the C frontend seems to
>> be banned.
>
> Yea, we really don't want the front-ends to know about the guts of RTL. This
> work would seem to violate that guiding principle.
>
> I'd be more in favor of a true RTL front-end rather than bolting it onto the
> side of the C front-end.

It will require inventing something new for types and decls though, which I
expect to be the majority of the new frontend (similar to the GIMPLE FE).

I don't think that's desirable.

Richard.

> jeff
>


Re: [Patch] PR rtl-optimization/71150, guard in_class_p check with REG_P

2016-05-17 Thread Jiong Wang

On 17/05/16 11:23, Uros Bizjak wrote:

On Tue, May 17, 2016 at 12:17 PM, Uros Bizjak  wrote:

Hello!


This bug was introduced by my commit r236181, where the inner rtx of
SUBREG wasn't checked while it should be, as "in_class_p" only
works with REG, and SUBREG_REG is actually not always a REG.  If the REG_P
check fails, then we should fall back to the normal code path.  The
following simple testcase for x86 can reproduce this bug.
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 56ab5b4..e4e6c8c 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
  @@ -1317,7 +1317,8 @@ process_addr_reg (rtx *loc, bool check_only_p, rtx_insn 
**before, rtx_insn **aft
  register, and this normally will be a subreg which should be reloaded
  as a whole.  This is particularly likely to be triggered when
  -fno-split-wide-types specified.  */
-  if (in_class_p (reg, cl, _class)
+  if (!REG_P (reg)
+  || in_class_p (reg, cl, _class)
   || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
loc = _REG (*loc);

Why not check SUBREG_P instead of !REG_P?

Or, alternatively:

if ((REG_P && !in_class_p (reg, ...))
 || GET_MODE_SIZE ...)

Which is IMO much more readable.


Thanks for review.

I think your proposed rewrite will be the following,

  if (!(REG_P (reg) && !in_class_p (reg, cl, _class))
  || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))

I feel the original code is easier to understand.
The check logic is composed of three conditions, in a pre-requisite order,
if one condition is true then reload the inner rtx, otherwise reload the 
whole subreg.


  if (!REG_P (reg)
  || in_class_p (reg, cl, _class)
  || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
   loc = _REG (*loc);








Uros.




Re: [PATCH PR69848/partial]Propagate comparison into VEC_COND_EXPR if target supports

2016-05-17 Thread Richard Biener
On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng  wrote:
> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
>  wrote:
>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng  wrote:
>>>Hi,
>>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>>
>>>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>
>>>This results in inefficient expanding.  With IR like:
>>>
>>>vect_c_2.8_16 = VEC_COND_EXPR >>0, 0 }, vect_c_2.7_13>;
>>>  _21 = VEC_COND_EXPR ;
>>>
>>>We can do:
>>>1) Expanding time optimization, for example, reverting comparison
>>>operator by switching VEC_COND_EXPR operands.  This is useful when
>>>backend only supports some comparison operators.
>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>>instruction which introduced by expand_vec_cond_expr.
>>>
>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>>if it's used multiple times.  For now, GCC does single_use_only
>>>propagation.  Ideally, we may duplicate the comparison before each use
>>>statement just before expanding, so that TER can successfully backtrack
>>>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>>>do this.  Tree-vect-generic.c looks like a good candidate, but it's so
>>>early that following CSE could undo the transform.  Another possible
>>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>>vectorizable_reduction.
>>
>> I prefer this for now.
> Hi Richard, you mean this patch, or the possible fix before your comment?

The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs
with embedded comparison.

Thanks,
Richard.

> Here is an updated patch addressing comment issue pointed out by
> Bernhard Reutner-Fischer.  Thanks.
>
> Thanks,
> bin
>>
>> Richard.
>>
>>>As for possible comparison CSE opportunities, I checked that it's
>>>simple enough to be handled by RTL CSE.
>>>
>>>Bootstrap and test on x86_64 and AArch64.  Any comments?
>>>
>>>Thanks,
>>>bin
>>>
>>>2016-05-12  Bin Cheng  
>>>
>>>   PR tree-optimization/69848
>>>   * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>>>   (expand_vec_cmp_expr_p): Call above functions.
>>>   * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>>>   * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>>>   (forward_propagate_into_cond): Propagate multiple uses for
>>>   VEC_COND_EXPR.
>>
>>


Re: [PATCH][AArch64] PR target/70809: Delete aarch64_vmls pattern

2016-05-17 Thread James Greenhalgh
On Tue, May 17, 2016 at 11:37:57AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The aarch64_vmls pattern claims to perform a normal vector
> floating-point multiply-subtract but in fact performs a fused
> multiply-subtract. This is fine when -ffp-contract=fast, but it's not guarded
> on anything so will generate the FMLS instruction even when
> -ffp-contract=off.
> 
> The solution is just to delete the pattern. If -ffp-contract=fast then an fma
> operation will have been generated and the fnma4 would be used to
> generate the FMLS instruction.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk and GCC 6 and 5? GCC 4.9 needs a different -mtune option in the
> testcase to trigger the testcase...

OK, thanks.

Please consider the GCC 4.9 backport preapproved with whatever flag is
needed to expose the issue.

Thanks,
James

> 
> Thanks,
> Kyrill
> 
> 2016-05-17  Kyrylo Tkachov  
> 
> PR target/70809
> * config/aarch64/aarch64-simd.md (aarch64_vmls): Delete.
> 
> 2016-05-17  Kyrylo Tkachov  
> 
> PR target/70809
> * gcc.target/aarch64/pr70809_1.c: New test.




Re: [Patch AArch64] Simplify reduc_plus_scal_v2[sd]f sequence

2016-05-17 Thread James Greenhalgh
On Tue, May 17, 2016 at 11:32:36AM +0100, Marcus Shawcroft wrote:
> On 17 May 2016 at 10:06, James Greenhalgh  wrote:
> >
> > Hi,
> >
> > This is just a simplification, it probably makes life easier for register
> > allocation in some corner cases and seems the right thing to do. We don't
> > use the internal version elsewhere, so we're safe to delete it and change
> > the types.
> >
> > OK?
> >
> > Bootstrapped on AArch64 with no issues.
> 
> Help me understand why this is ok for BE ?

The reduc_plus_scal_ pattern wants to take a vector and return a scalar
value representing the sum of the lanes of that vector. We want to go
from V2DFmode to DFmode.

The architectural instruction FADDP writes to a scalar value in the low
bits of the register, leaving zeroes in the upper bits.

i.e.

faddp  d0, v1.2d

  128                64                  0
   |       0x0       | v1.d[0] + v1.d[1] |

In the current implementation, we use the
aarch64_reduc_plus_internal pattern, which treats the result of
FADDP as a vector of two elements. We then need an extra step to extract
the correct scalar value from that vector. From GCC's point of view the lane
containing the result is either lane 0 (little-endian) or lane 1
(big-endian), which is why the current code is endian dependent. The extract
operation will always be a NOP move from architectural bits 0-63 to
architectural bits 0-63 - but we never elide the move as future passes can't
be certain that the upper bits are zero (they come out of an UNSPEC so
could be anything).

However, this is all unnecessary. FADDP does exactly what we want,
regardless of endianness, we just need to model the instruction as writing
the scalar value in the first place. Which is what this patch wires up.

We probably just missed this optimization in the migration from the
reduc_splus optabs (which required a vector return value) to the
reduc_plus_scal optabs (which require a scalar return value).

Does that help?

Thanks,
James



Re: [PATCH][ARM] PR target/70830: Avoid POP-{reglist}^ when returning from interrupt handlers

2016-05-17 Thread Kyrill Tkachov


On 13/05/16 12:05, Kyrill Tkachov wrote:

Hi Christophe,

On 12/05/16 20:57, Christophe Lyon wrote:

On 12 May 2016 at 11:48, Ramana Radhakrishnan  wrote:

On Thu, May 5, 2016 at 12:50 PM, Kyrill Tkachov
 wrote:

Hi all,

In this PR we deal with some fallout from the conversion to unified
assembly.
We now end up emitting instructions like:
   pop {r0,r1,r2,r3,pc}^
which is not legal. We have to use an LDM form.

There are bugs in two arm.c functions: output_return_instruction and
arm_output_multireg_pop.

In output_return_instruction the buggy hunk from the conversion was:
   else
-   if (TARGET_UNIFIED_ASM)
   sprintf (instr, "pop%s\t{", conditional);
-   else
- sprintf (instr, "ldm%sfd\t%%|sp!, {", conditional);

The code was already very obscurely structured and arguably the bug was
latent.
It emitted POP only when TARGET_UNIFIED_ASM was on, and since
TARGET_UNIFIED_ASM was on
only for Thumb, we never went down this path for interrupt handling code, since
the interrupt
attribute is only available for ARM code. After the removal of
TARGET_UNIFIED_ASM we ended up
using POP unconditionally. So this patch adds a check for IS_INTERRUPT and
outputs the
appropriate LDM form.

In arm_output_multireg_pop the buggy hunk was:
-  if ((regno_base == SP_REGNUM) && TARGET_THUMB)
+  if ((regno_base == SP_REGNUM) && update)
  {
-  /* Output pop (not stmfd) because it has a shorter encoding.  */
-  gcc_assert (update);
sprintf (pattern, "pop%s\t{", conditional);
  }

Again, the POP was guarded on TARGET_THUMB and so would never be taken on
interrupt handling
routines. This patch guards that with the appropriate check on interrupt
return.

Also, there are a couple of bugs in the 'else' branch of that 'if':
* The "ldmfd%s" was output without a '\t' at the end which meant that the
base register
name would be concatenated with the 'ldmfd', creating invalid assembly.

* The logic:

   if (regno_base == SP_REGNUM)
   /* update is never true here, hence there is no need to handle
  pop here.  */
 sprintf (pattern, "ldmfd%s", conditional);

   if (update)
 sprintf (pattern, "ldmia%s\t", conditional);
   else
 sprintf (pattern, "ldm%s\t", conditional);

Meant that for "regno == SP_REGNUM && !update" we'd end up printing
"ldmfd%sldm%s\t"
to pattern. I didn't manage to reproduce that condition though, so maybe it
can't ever occur.
This patch fixes both these issues nevertheless.

I've added the testcase from the PR to catch the fix in
output_return_instruction.
The testcase doesn't catch the bugs in arm_output_multireg_pop, but the
existing tests
gcc.target/arm/interrupt-1.c and gcc.target/arm/interrupt-2.c would have
caught them
if only they were assemble tests rather than just compile. So this patch
makes them
assembly tests (and reverts the scan-assembler checks for the correct LDM
pattern).

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk and GCC 6?


Hi Kyrill,

Did you test --with-mode=thumb?
When using arm mode, I see regressions:

   gcc.target/arm/neon-nested-apcs.c (test for excess errors)
   gcc.target/arm/nested-apcs.c (test for excess errors)


It's because I have a local patch in my binutils that makes gas warn on the
deprecated sequences that these two tests generate (they use the deprecated 
-mapcs option),
so these tests were already showing the (test for excess errors) FAIL for me,
so they didn't appear in my tests diff for this patch. :(

I've reproduced the failure with a clean tree.
Where before we generated:
ldm     sp, {fp, sp, pc}
now we generate:
pop     {fp, sp, pc}

which are not equivalent (pop performs a write-back) and gas warns:
Warning: writeback of base register when in register list is UNPREDICTABLE

I'm testing a patch to fix this.
Sorry for the regression.


Here is the fix.
I had erroneously removed the update check from the condition for the "pop".
Of course, if we're not updating SP we can't use POP, which has an implicit
writeback.

Bootstrapped on arm-none-linux-gnueabihf. Tested with -mthumb and -marm.

Ok for trunk and GCC 6?

Thanks,
Kyrill

2016-05-17  Kyrylo Tkachov  

PR target/70830
* config/arm/arm.c (arm_output_multireg_pop): Guard "pop" on update.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 2cc7f7b452a62f898346a51ca7ede0d19bcfcfad..68985547a634bb93ab59416e24aaa046ab99f6a6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17624,10 +17624,8 @@ arm_output_multireg_pop (rtx *operands, bool return_pc, rtx cond, bool reverse,
 
   conditional = reverse ? "%?%D0" : "%?%d0";
   /* Can't use POP if returning from an interrupt.  */
-  if ((regno_base == SP_REGNUM) && !(interrupt_p && return_pc))
-{
-  sprintf (pattern, "pop%s\t{", conditional);
-}
+  if ((regno_base == SP_REGNUM) && update && !(interrupt_p && return_pc))
+

[PATCH][AArch64] PR target/70809: Delete aarch64_vmls pattern

2016-05-17 Thread Kyrill Tkachov

Hi all,

The aarch64_vmls pattern claims to perform a normal vector floating-point
multiply-subtract but in fact performs a fused multiply-subtract. This is
fine when -ffp-contract=fast, but it is not guarded on anything, so the FMLS
instruction is generated even when -ffp-contract=off.

The solution is just to delete the pattern. If -ffp-contract=fast, an fma
operation will have been generated and the fnma4 pattern will be used to
generate the FMLS instruction.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk and GCC 6 and 5? GCC 4.9 needs a different -mtune option in the
testcase to trigger the bug...

Thanks,
Kyrill

2016-05-17  Kyrylo Tkachov  

PR target/70809
* config/aarch64/aarch64-simd.md (aarch64_vmls): Delete.

2016-05-17  Kyrylo Tkachov  

PR target/70809
* gcc.target/aarch64/pr70809_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a66948a28e99f4437824a8640b092f7be1c917f6..90272a09f2dd925cfc01caa09e9e8963a8e6c6ed 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1919,16 +1919,6 @@ (define_expand "vec_pack_trunc_df"
   }
 )
 
-(define_insn "aarch64_vmls"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-   (minus:VDQF (match_operand:VDQF 1 "register_operand" "0")
-		   (mult:VDQF (match_operand:VDQF 2 "register_operand" "w")
-			  (match_operand:VDQF 3 "register_operand" "w"))))]
-  "TARGET_SIMD"
- "fmls\\t%0., %2., %3."
-  [(set_attr "type" "neon_fp_mla__scalar")]
-)
-
 ;; FP Max/Min
 ;; Max/Min are introduced by idiom recognition by GCC's mid-end.  An
 ;; expression like:
diff --git a/gcc/testsuite/gcc.target/aarch64/pr70809_1.c b/gcc/testsuite/gcc.target/aarch64/pr70809_1.c
new file mode 100644
index ..df88c71c42afc7fafff703f801bbfced8daafc95
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr70809_1.c
@@ -0,0 +1,18 @@
+/* PR target/70809.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -ffp-contract=off -mtune=xgene1" } */
+
+/* Check that vector FMLS is not generated when contraction is disabled.  */
+
+void
+foo (float *__restrict__ __attribute__ ((aligned (16))) a,
+ float *__restrict__ __attribute__ ((aligned (16))) x,
+ float *__restrict__ __attribute__ ((aligned (16))) y,
+ float *__restrict__ __attribute__ ((aligned (16))) z)
+{
+  unsigned i = 0;
+  for (i = 0; i < 256; i++)
+a[i] = x[i] - (y[i] * z[i]);
+}
+
+/* { dg-final { scan-assembler-not "fmls\tv.*" } } */


Re: [Patch AArch64] Simplify reduc_plus_scal_v2[sd]f sequence

2016-05-17 Thread Marcus Shawcroft
On 17 May 2016 at 10:06, James Greenhalgh  wrote:
>
> Hi,
>
> This is just a simplification, it probably makes life easier for register
> allocation in some corner cases and seems the right thing to do. We don't
> use the internal version elsewhere, so we're safe to delete it and change
> the types.
>
> OK?
>
> Bootstrapped on AArch64 with no issues.

Help me understand why this is ok for BE ?

Cheers
/Marcus


Re: [Patch AArch64] Delete ASM_OUTPUT_DEF and fallback to default .set directive

2016-05-17 Thread Marcus Shawcroft
On 17 May 2016 at 10:13, James Greenhalgh  wrote:
>
> Hi,
>
> As in the ARM port [1] , the AArch64 port wants to put out "b = a" to set
> an alias. This doesn't cause us any trouble yet, as the AArch64 port doesn't
> warn for this construct - but at the same time there is no reason for us
> not to put out a .set directive - this seems to have been copied from the
> ARM port when section anchor support was added in 2012. Looking through
> the chain, we'll get a default definition for ASM_OUTPUT_DEF if SET_ASM_OP
> is defined, and we get SET_ASM_OP defined through config/elfos.h for
> all the AArch64 targets I can see in config.gcc. So we're safe to drop
> this.
>
> Bootstrapped on aarch64-none-linux-gnu.
>
> OK?

OK /Marcus


Re: [Patch] PR rtl-optimization/71150, guard in_class_p check with REG_P

2016-05-17 Thread Uros Bizjak
On Tue, May 17, 2016 at 12:17 PM, Uros Bizjak  wrote:
> Hello!
>
>> This bug was introduced by my commit r236181, where the inner rtx of a
>> SUBREG wasn't checked though it should be: "in_class_p" only works with
>> a REG, and SUBREG_REG is not always a REG.  If the REG_P check fails,
>> we should fall back to the normal code path.  The following simple
>> testcase for x86 reproduces this bug.
>
>> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
>> index 56ab5b4..e4e6c8c 100644
>> --- a/gcc/lra-constraints.c
>> +++ b/gcc/lra-constraints.c
>>  @@ -1317,7 +1317,8 @@ process_addr_reg (rtx *loc, bool check_only_p, 
>> rtx_insn **before, rtx_insn **aft
>>  register, and this normally will be a subreg which should be reloaded
>>  as a whole.  This is particularly likely to be triggered when
>>  -fno-split-wide-types specified.  */
>>-  if (in_class_p (reg, cl, _class)
>>+  if (!REG_P (reg)
>>+  || in_class_p (reg, cl, _class)
>>   || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
>>loc = _REG (*loc);
>
> Why not check SUBREG_P instead of !REG_P?

Or, alternatively:

if ((REG_P (reg) && !in_class_p (reg, ...))
|| GET_MODE_SIZE ...)

Which is IMO much more readable.

Uros.


Re: [PATCH, ARM 7/7, ping1] Enable atomics for ARMv8-M Mainline

2016-05-17 Thread Thomas Preudhomme
Ping?

*** gcc/ChangeLog ***

2015-12-17  Thomas Preud'homme  

* config/arm/arm.h (TARGET_HAVE_LDACQ): Enable for ARMv8-M Mainline.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 347b5b0a5cc0bc1e3b5020c8124d968e76ce48a4..e154bd31b8084f9f45ad4409e7b38de652538c51 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -266,7 +266,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 || arm_arch7) && arm_arch_notm)
 
 /* Nonzero if this chip supports load-acquire and store-release.  */
-#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
+#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && TARGET_32BIT)
 
 /* Nonzero if this chip provides the movw and movt instructions.  */
 #define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)


Best regards,

Thomas

On Thursday 17 December 2015 17:39:29 Thomas Preud'homme wrote:
> Hi,
> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch enable atomics for ARMv8-M Mainline. No change is
> needed to existing patterns since Thumb-2 backend can already handle them
> fine.
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> 
> ChangeLog entries are as follow:
> 
> *** gcc/ChangeLog ***
> 
> 2015-12-17  Thomas Preud'homme  
> 
> * config/arm/arm.h (TARGET_HAVE_LDACQ): Enable for ARMv8-M Mainline.
> 
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 1f79c37b5c36a410a2d500ba92c62a5ba4ca1178..fa2a6fb03ffd2ca53bfb7e7c8f03022b626880e0 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -258,7 +258,7 @@ extern void
> (*arm_lang_output_object_attributes_hook)(void);
>|| arm_arch7) && arm_arch_notm)
> 
>  /* Nonzero if this chip supports load-acquire and store-release.  */
> -#define TARGET_HAVE_LDACQ(TARGET_ARM_ARCH >= 8 && arm_arch_notm)
> +#define TARGET_HAVE_LDACQ(TARGET_ARM_ARCH >= 8 && TARGET_32BIT)
> 
>  /* Nonzero if this chip provides the movw and movt instructions.  */
>  #define TARGET_HAVE_MOVT (arm_arch_thumb2 || arm_arch8)
> 
> 
> Testing:
> 
> * Toolchain was built successfully with and without the ARMv8-M support
> patches with the following multilib list:
> armv6-m,armv7-m,armv7e-m,cortex-m7. The code generation for crtbegin.o,
> crtend.o, crti.o, crtn.o, libgcc.a, libgcov.a, libc.a, libg.a,
> libgloss-linux.a, libm.a, libnosys.a, librdimon.a, librdpmon.a, libstdc++.a
> and libsupc++.a is unchanged for all these targets.
> 
> * GCC also showed no testsuite regression when targeting ARMv8-M Baseline
> compared to ARMv6-M on ARM Fast Models, and when targeting ARMv6-M and
> ARMv7-M (compared to without the patch)
> * GCC was bootstrapped successfully targeting Thumb-1 and targeting Thumb-2
> 
> Is this ok for stage3?
> 
> Best regards,
> 
> Thomas



Re: [PATCH, ARM 6/7, ping1] Add support for CB(N)Z and (U|S)DIV to ARMv8-M Baseline

2016-05-17 Thread Thomas Preudhomme
Ping?

*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* config/arm/arm.c (arm_print_operand_punct_valid_p): Make %? valid
for Thumb-1.
* config/arm/arm.h (TARGET_HAVE_CBZ): Define.
(TARGET_IDIV): Set for all Thumb targets provided they have hardware
divide feature.
* config/arm/thumb1.md (thumb1_cbz): New insn.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f42e996e5a7ce979fe406b8261d50fb2ba005f6b..347b5b0a5cc0bc1e3b5020c8124d968e76ce48a4 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -271,9 +271,12 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 /* Nonzero if this chip provides the movw and movt instructions.  */
 #define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)
 
+/* Nonzero if this chip provides the cb{n}z instruction.  */
+#define TARGET_HAVE_CBZ	(arm_arch_thumb2 || arm_arch8)
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \
-|| (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
+|| (TARGET_THUMB && arm_arch_thumb_hwdiv))
 
 /* Nonzero if disallow volatile memory access in IT block.  */
 #define TARGET_NO_VOLATILE_CE  (arm_arch_no_volatile_ce)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 13b4b71ac8f9c1da8ef1945f7ff6985ca59f6832..445972ce0e3fd27d4411840ff69e9edbb23994fc 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22684,7 +22684,7 @@ arm_print_operand_punct_valid_p (unsigned char code)
 {
   return (code == '@' || code == '|' || code == '.'
  || code == '(' || code == ')' || code == '#'
- || (TARGET_32BIT && (code == '?'))
+ || code == '?'
  || (TARGET_THUMB2 && (code == '!'))
  || (TARGET_THUMB && (code == '_')));
 }
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 4572456b8bc98503061846cad94bc642943db3a2..1b01ef6ce731fe3ff37c3d8c048fb9d5e7829b35 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -973,6 +973,92 @@
   DONE;
 })
 
+;; A pattern for the cb(n)z instruction added in ARMv8-M baseline profile,
+;; adapted from cbranchsi4_insn.  Modifying cbranchsi4_insn instead leads to
+;; code generation difference for ARMv6-M because the minimum length of the
+;; instruction becomes 2 even for it due to a limitation in genattrtab's
+;; handling of pc in the length condition.
+(define_insn "thumb1_cbz"
+  [(set (pc) (if_then_else
+ (match_operator 0 "equality_operator"
+  [(match_operand:SI 1 "s_register_operand" "l")
+   (const_int 0)])
+ (label_ref (match_operand 2 "" ""))
+ (pc)))]
+  "TARGET_THUMB1 && TARGET_HAVE_MOVT"
+{
+  if (get_attr_length (insn) == 2)
+{
+  if (GET_CODE (operands[0]) == EQ)
+   return "cbz\t%1, %l2";
+  else
+   return "cbnz\t%1, %l2";
+}
+  else
+{
+  rtx t = cfun->machine->thumb1_cc_insn;
+  if (t != NULL_RTX)
+   {
+ if (!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1])
+ || !rtx_equal_p (cfun->machine->thumb1_cc_op1, operands[2]))
+   t = NULL_RTX;
+ if (cfun->machine->thumb1_cc_mode == CC_NOOVmode)
+   {
+ if (!noov_comparison_operator (operands[0], VOIDmode))
+   t = NULL_RTX;
+   }
+ else if (cfun->machine->thumb1_cc_mode != CCmode)
+   t = NULL_RTX;
+   }
+  if (t == NULL_RTX)
+   {
+ output_asm_insn ("cmp\t%1, #0", operands);
+ cfun->machine->thumb1_cc_insn = insn;
+ cfun->machine->thumb1_cc_op0 = operands[1];
+ cfun->machine->thumb1_cc_op1 = operands[2];
+ cfun->machine->thumb1_cc_mode = CCmode;
+   }
+  else
+   /* Ensure we emit the right type of condition code on the jump.  */
+   XEXP (operands[0], 0) = gen_rtx_REG (cfun->machine->thumb1_cc_mode,
+CC_REGNUM);
+
+  switch (get_attr_length (insn))
+   {
+   case 4:  return "b%d0\t%l2";
+   case 6:  return "b%D0\t.LCB%=;b\t%l2\t%@long jump\n.LCB%=:";
+   case 8: return "b%D0\t.LCB%=;bl\t%l2\t%@far jump\n.LCB%=:";
+   default: gcc_unreachable ();
+   }
+}
+}
+  [(set (attr "far_jump")
+   (if_then_else
+   (eq_attr "length" "8")
+   (const_string "yes")
+   (const_string "no")))
+   (set (attr "length")
+   (if_then_else
+   (and (ge (minus (match_dup 2) (pc)) (const_int 2))
+(le (minus (match_dup 2) (pc)) (const_int 128))
+(not (match_test "which_alternative")))
+   (const_int 2)
+   (if_then_else
+   (and (ge (minus (match_dup 2) (pc)) (const_int -250))
+(le (minus (match_dup 2) (pc)) (const_int 256)))
+   (const_int 4)
+   

Re: [PATCH, ARM 5/7, ping1] Add support for MOVT/MOVW to ARMv8-M Baseline

2016-05-17 Thread Thomas Preudhomme
Ping?

*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as having MOVT.
* config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check MOVT/MOVW
availability with TARGET_HAVE_MOVT.
(thumb_legitimate_constant_p): Legalize high part of a label_ref as a
constant.
(thumb1_rtx_costs): Also return 0 if setting a half word constant and
movw is available.
(thumb1_size_rtx_costs): Make set of half word constant also cost 1
extra instruction if MOVW is available.  Make constant with bottom half
word zero cost 2 instructions if MOVW is available.
* config/arm/arm.md (define_attr "arch"): Add v8mb.
(define_attr "arch_enabled"): Set to yes if arch value is v8mb and
target is ARMv8-M Baseline.
* config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M Baseline only
alternative for constants satisfying j constraint.
(thumb1_movsi_insn): Likewise.
(movsi splitter for K alternative): Tighten condition to not trigger
if movt is available and j constraint is satisfied.
(Pe immediate splitter): Likewise.
(thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for
constant fitting in an halfword to use movw.
* doc/sourcebuild.texi (arm_thumb1_movt_ko): Document new ARM
effective target.


*** gcc/testsuite/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_arm_thumb1_movt_ko):
Define effective target.
* gcc.target/arm/pr42574.c: Require arm_thumb1_movt_ko instead of
arm_thumb1_ok as effective target to exclude ARMv8-M Baseline.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 47216b4a1959ccdb18e329db411bf7f941e67163..f42e996e5a7ce979fe406b8261d50fb2ba005f6b 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -269,7 +269,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 #define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
 
 /* Nonzero if this chip provides the movw and movt instructions.  */
-#define TARGET_HAVE_MOVT   (arm_arch_thumb2)
+#define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)
 
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d75a34f10d5ed22cff0a0b5d3ad433f111b059ee..13b4b71ac8f9c1da8ef1945f7ff6985ca59f6832 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8220,6 +8220,12 @@ arm_legitimate_constant_p_1 (machine_mode, rtx x)
 static bool
 thumb_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
+  /* Splitters for TARGET_USE_MOVT call arm_emit_movpair which creates high
+ RTX.  These RTX must therefore be allowed for Thumb-1 so that when run
+ for ARMv8-M baseline or later the result is valid.  */
+  if (TARGET_HAVE_MOVT && GET_CODE (x) == HIGH)
+x = XEXP (x, 0);
+
   return (CONST_INT_P (x)
  || CONST_DOUBLE_P (x)
  || CONSTANT_ADDRESS_P (x)
@@ -8306,7 +8312,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum 
rtx_code outer)
 case CONST_INT:
   if (outer == SET)
{
- if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
+ if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256
+ || (TARGET_HAVE_MOVT && !(INTVAL (x) & 0x)))
return 0;
  if (thumb_shiftable_const (INTVAL (x)))
return COSTS_N_INSNS (2);
@@ -9056,16 +9063,24 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum 
rtx_code outer)
 the mode.  */
   words = ARM_NUM_INTS (GET_MODE_SIZE (GET_MODE (SET_DEST (x;
   return COSTS_N_INSNS (words)
-+ COSTS_N_INSNS (1) * (satisfies_constraint_J (SET_SRC (x))
-   || satisfies_constraint_K (SET_SRC (x))
-  /* thumb1_movdi_insn.  */
-   || ((words > 1) && MEM_P (SET_SRC (x;
++ COSTS_N_INSNS (1)
+  * (satisfies_constraint_J (SET_SRC (x))
+ || satisfies_constraint_K (SET_SRC (x))
+/* Too big immediate for 2byte mov, using movt.  */
+ || ((unsigned HOST_WIDE_INT) INTVAL (SET_SRC (x)) >= 256
+ && TARGET_HAVE_MOVT
+ && satisfies_constraint_j (SET_SRC (x)))
+/* thumb1_movdi_insn.  */
+ || ((words > 1) && MEM_P (SET_SRC (x;
 
 case CONST_INT:
   if (outer == SET)
 {
   if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
 return COSTS_N_INSNS (1);
+ /* movw is 4byte long.  */
+ if (TARGET_HAVE_MOVT && !(INTVAL (x) & 0x))
+   return COSTS_N_INSNS (2);
   

Re: [PATCH, ARM 4/7, ping1] Factor out MOVW/MOVT availability and desirability checks

2016-05-17 Thread Thomas Preudhomme
Ping?

*** gcc/ChangeLog ***

2015-11-09  Thomas Preud'homme  

* config/arm/arm.h (TARGET_USE_MOVT): Check MOVT/MOVW availability
with TARGET_HAVE_MOVT.
(TARGET_HAVE_MOVT): Define.
* config/arm/arm.c (const_ok_for_op): Check MOVT/MOVW
availability with TARGET_HAVE_MOVT.
* config/arm/arm.md (arm_movt): Use TARGET_HAVE_MOVT to check movt
availability.
(addsi splitter): Use TARGET_USE_MOVT to check whether to use
movt + movw.
(symbol_refs movsi splitter): Remove TARGET_32BIT check.
(arm_movtas_ze): Use TARGET_HAVE_MOVT to check movt availability.
* config/arm/constraints.md (define_constraint "j"): Use
TARGET_HAVE_MOVT to check movt availability.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1d976b36300d92d538098b3cf83c60d62ed2be1c..47216b4a1959ccdb18e329db411bf7f941e67163 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -237,7 +237,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 
 /* Should MOVW/MOVT be used in preference to a constant pool.  */
 #define TARGET_USE_MOVT \
-  (arm_arch_thumb2 \
+  (TARGET_HAVE_MOVT \
&& (arm_disable_literal_pool \
|| (!optimize_size && !current_tune->prefer_constant_pool)))
 
@@ -268,6 +268,9 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
 /* Nonzero if this chip supports load-acquire and store-release.  */
 #define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
 
+/* Nonzero if this chip provides the movw and movt instructions.  */
+#define TARGET_HAVE_MOVT   (arm_arch_thumb2)
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \
 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7b95ba0b379c31ee650e714ce2198a43b1cadbac..d75a34f10d5ed22cff0a0b5d3ad433f111b059ee 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3897,7 +3897,7 @@ const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code)
 {
 case SET:
   /* See if we can use movw.  */
-  if (arm_arch_thumb2 && (i & 0x) == 0)
+  if (TARGET_HAVE_MOVT && (i & 0x) == 0)
return 1;
   else
/* Otherwise, try mvn.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4049f104c6d5fd8bfd8f68ecdfae6a3d34d4333f..094423477acb8d9223fd06c17e82bfd0a94d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5705,7 +5705,7 @@
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
(lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
   (match_operand:SI 2 "general_operand"  "i")))]
-  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
+  "TARGET_HAVE_MOVT && arm_valid_symbolic_address_p (operands[2])"
   "movt%?\t%0, #:upper16:%c2"
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
@@ -5765,8 +5765,7 @@
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
(const:SI (plus:SI (match_operand:SI 1 "general_operand" "")
   (match_operand:SI 2 "const_int_operand" ""]
-  "TARGET_THUMB2
-   && arm_disable_literal_pool
+  "TARGET_USE_MOVT
&& reload_completed
&& GET_CODE (operands[1]) == SYMBOL_REF"
   [(clobber (const_int 0))]
@@ -5796,8 +5795,7 @@
 (define_split
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
(match_operand:SI 1 "general_operand" ""))]
-  "TARGET_32BIT
-   && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
+  "TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
&& !flag_pic && !target_word_relocations
&& !arm_tls_referenced_p (operands[1])"
   [(clobber (const_int 0))]
@@ -10965,7 +10963,7 @@
(const_int 16)
(const_int 16))
 (match_operand:SI 1 "const_int_operand" ""))]
-  "arm_arch_thumb2"
+  "TARGET_HAVE_MOVT"
   "movt%?\t%0, %L1"
  [(set_attr "predicable" "yes")
   (set_attr "predicable_short_it" "no")
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 3b71c4a527064290066348cb234c6abb8c8e2e43..4ece5f013c92adee04157b5c909e1d47c894c994 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -66,7 +66,7 @@
 
 (define_constraint "j"
  "A constant suitable for a MOVW instruction. (ARM/Thumb-2)"
- (and (match_test "TARGET_32BIT && arm_arch_thumb2")
+ (and (match_test "TARGET_HAVE_MOVT")
   (ior (and (match_code "high")
(match_test "arm_valid_symbolic_address_p (XEXP (op, 0))"))
   (and (match_code "const_int")


Best regards,

Thomas

On Thursday 17 December 2015 15:59:13 Thomas Preud'homme wrote:
> Hi,
> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch factors out the checks for MOVW/MOVT 
