Re: [PATCH] Fix obvious typo that breaks boot strap in delayed folding

2015-11-16 Thread Andreas Schwab
Andi Kleen  writes:

> On Sun, Nov 15, 2015 at 11:09:18PM +0100, Andreas Schwab wrote:
>> Andi Kleen  writes:
>> 
>> > Fix the obivous typos. To to commit?
>> 
>> They are not typos.
>
> Ok. What do you suggest to fix the error then?

Someone will need to fix the regression introduced by the C++ delayed
folding.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-16 Thread Václav Haisman
On 16.11.2015 03:00, David Edelsohn wrote:
>[..]
> Ten days and still waiting for a response on libtool-patches.
> 
> - David
> 

Drop a hint to libt...@gnu.org?

-- 
VH





Re: [PATCH] Fix PR56118

2015-11-16 Thread Richard Biener
On Fri, 13 Nov 2015, Alan Lawrence wrote:

> On 10/11/15 09:34, Richard Biener wrote:
> > 
> > The following fixes PR56118 by adjusting the cost model handling of
> > basic-block vectorization to favor the vectorized version in case
> > estimated cost is the same as the estimated cost of the scalar
> > version.  This makes sense because we over-estimate the vectorized
> > cost in several places.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > 
> > Richard.
> > 
> > 2015-11-10  Richard Biener  
> > 
> > PR tree-optimization/56118
> > * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
> > cost favor vectorized version.
> > 
> > * gcc.target/i386/pr56118.c: New testcase.
> 
> On AArch64 and ARM targets, this causes PASS->FAIL of
> 
> gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
> gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump slp2
> "vectorization is not profitable"
> 
> that sounds like a good thing ;)

It depends ;)  You may want to look at the generated code with/without
vectorization and decide for yourselves.  The testcase is

int foo (int *p)
{
  int x[4];
  int tem0, tem1, tem2, tem3;
  tem0 = p[0] + 1;
  x[0] = tem0;
  tem1 = p[1] + 2;
  x[1] = tem1;
  tem2 = p[2] + 3;
  x[2] = tem2;
  tem3 = p[3] + 4;
  x[3] = tem3;
  bar (x);
  return tem0 + tem1 + tem2 + tem3;
}

which was added to cover the situation where we vectorize the store
to x[] but have to keep the scalar computations for tem[0-3] for
the final reduction.  The scalar cost for this kernel is 3*4
while the vector cost is unaligned load + aligned load + vector op
+ aligned vector store.  We compensate for the out-of-kernel uses
by making the scalar cost just the stores (4).  Now if all of the
vector cost pieces are 1 then we have 4 vs. 4 here.  On x86_64
the unaligned load cost is 2 and thus vectorization is deemed
non-profitable.

In reality this depends on the actual constants used in the plus
as that tells you whether the constant is free for the scalar
plus or not (for vectors it almost always comes from the constant
pool).

On x86_64 the assembler difference is

	movl	(%rdi), %eax
	movdqu	(%rdi), %xmm0
	leal	1(%rax), %r13d
	movl	4(%rdi), %eax
	paddd	.LC0(%rip), %xmm0
	movaps	%xmm0, (%rsp)
	leal	2(%rax), %r12d
	movl	8(%rdi), %eax
	addl	%r13d, %r12d
	leal	3(%rax), %ebp
	movl	12(%rdi), %eax
	movq	%rsp, %rdi
	addl	%r12d, %ebp
	leal	4(%rax), %ebx
	call	bar
	leal	0(%rbp,%rbx), %eax

vs.

	movl	(%rdi), %eax
	movl	12(%rdi), %ebx
	leal	1(%rax), %r13d
	movl	4(%rdi), %eax
	addl	$4, %ebx
	movl	%ebx, 12(%rsp)
	movl	%r13d, (%rsp)
	leal	2(%rax), %r12d
	movl	8(%rdi), %eax
	movq	%rsp, %rdi
	movl	%r12d, 4(%rsp)
	leal	3(%rax), %ebp
	movl	%ebp, 8(%rsp)
	call	bar
	leal	0(%r13,%r12), %eax
	addl	%ebp, %eax
	addl	%ebx, %eax

Clearly the testcase may need adjustment so that it is not so close to
the edge on individual targets.  I'm considering changing it.

The testcase also shows the lack of reduction vectorization in BBs
(I have partially finished patches for this but got distracted...).

>, so I imagine the xfail directive may
> just need updating. The test also looks to be failing on powerpc64 (according
> to https://gcc.gnu.org/ml/gcc-testresults/2015-11/msg01327.html).

I'll try making the testcase more complicated instead.

Richard.

> Regards, Alan
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Adjust gcc.dg/vect/bb-slp-32.c

2015-11-16 Thread Richard Biener

This adjusts the mentioned testcase to make it pass more reliably.

Tested on x86_64-unknown-linux-gnu.

Richard.

2015-11-16  Richard Biener  

* gcc.dg/vect/bb-slp-32.c: Adjust testcase.

Index: gcc/testsuite/gcc.dg/vect/bb-slp-32.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-32.c   (revision 230310)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-32.c   (working copy)
@@ -3,17 +3,17 @@
 /* { dg-additional-options "-fvect-cost-model=dynamic" } */
 
 void bar (int *);
-int foo (int *p)
+int foo (int *p, int a, int b)
 {
   int x[4];
   int tem0, tem1, tem2, tem3;
-  tem0 = p[0] + 1;
+  tem0 = p[0] + 1 + a;
   x[0] = tem0;
-  tem1 = p[1] + 2;
+  tem1 = p[1] + 2 + b;
   x[1] = tem1;
-  tem2 = p[2] + 3;
+  tem2 = p[2] + 3 + b;
   x[2] = tem2;
-  tem3 = p[3] + 4;
+  tem3 = p[3] + 4 + a;
   x[3] = tem3;
   bar (x);
   return tem0 + tem1 + tem2 + tem3;


Re: [Patch, vrp] Allow VRP type conversion folding only for widenings upto word mode

2015-11-16 Thread Richard Biener
On Sat, 14 Nov 2015, Senthil Kumar Selvaraj wrote:

> On Sat, Nov 14, 2015 at 09:57:40AM +0100, Richard Biener wrote:
> > On November 14, 2015 9:49:28 AM GMT+01:00, Senthil Kumar Selvaraj 
> >  wrote:
> > >On Sat, Nov 14, 2015 at 09:13:41AM +0100, Marc Glisse wrote:
> > >> On Sat, 14 Nov 2015, Senthil Kumar Selvaraj wrote:
> > >> 
> > >> >This patch came out of a discussion held in the gcc mailing list
> > >> >(https://gcc.gnu.org/ml/gcc/2015-11/msg00067.html).
> > >> >
> > >> >The patch restricts folding of conditional exprs with lhs previously
> > >> >set by a type conversion to occur only if the source of the type
> > >> >conversion's mode is word mode or smaller.
> > >> >
> > >> >Bootstrapped and reg tested on x86_64 (with
> > >--enable-languages=c,c++).
> > >> >
> > >> >If ok, could you commit please? I don't have commit access.
> > >> >
> > >> >Regards
> > >> >Senthil
> > >> >
> > >> >gcc/ChangeLog
> > >> >
> > >> >2015-11-11  Senthil Kumar Selvaraj 
> > >
> > >> >
> > >> >* tree-vrp.c (simplify_cond_using_ranges): Fold only
> > >> >if innerop's mode is word_mode or smaller.
> > >> >
> > >> >
> > >> >diff --git gcc/tree-vrp.c gcc/tree-vrp.c
> > >> >index e2393e4..c139bc6 100644
> > >> >--- gcc/tree-vrp.c
> > >> >+++ gcc/tree-vrp.c
> > >> >@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
> > >> >  innerop = gimple_assign_rhs1 (def_stmt);
> > >> >
> > >> >  if (TREE_CODE (innerop) == SSA_NAME
> > >> >+ && (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
> > >> >+   <= GET_MODE_SIZE(word_mode))
> > >> >  && !POINTER_TYPE_P (TREE_TYPE (innerop)))
> > >> >{
> > >> >  value_range *vr = get_value_range (innerop);
> > >> 
> > >> I thought the result of the discussion was that the transformation is
> > >ok if
> > >> either it is narrowing or it widens but to something no bigger than
> > >> word_mode. So you should have 2 comparisons, or 1 with a max.
> > >
> > >Hmm, I came to the opposite conclusion - I thought Richard only okayed
> > >"widening upto word-mode", not the narrowing. 
> > 
> > I didn't mean to suggest narrowing is not OK.  In fact narrowing is always 
> > OK.
> 
> My bad. Here's a revised patch that checks for both conditions, using
> max as Marc suggested to limit to word_mode or narrowing conversions.
> 
> Bootstrapped and regtested for x86_64 with c and c++.
> 
> Is this ok? If yes, would you commit it
> for me please? I don't have commit access.
> 
> gcc/ChangeLog
> 2015-11-14  Senthil Kumar Selvaraj  
> 
>   * tree-vrp.c (simplify_cond_using_ranges): Fold only
>   if innerop's mode is smaller than or equal to word_mode or op0's mode.
> 
> 
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index e2393e4..cfd90e7 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -9467,7 +9467,10 @@ simplify_cond_using_ranges (gcond *stmt)
>innerop = gimple_assign_rhs1 (def_stmt);
>  
>if (TREE_CODE (innerop) == SSA_NAME
> -   && !POINTER_TYPE_P (TREE_TYPE (innerop)))
> +   && !POINTER_TYPE_P (TREE_TYPE (innerop))
> + && (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (innerop)))
> +   <= std::max (GET_MODE_SIZE (word_mode),
> +GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0))

Please use TYPE_PRECISION (...) and GET_MODE_PRECISION (word_mode) and
add a comment as to what we are testing here and why.

Btw, ideally we'd factor out a

bool
desired_pro_or_demotion_p (tree to_type, tree from_type) {}

function somewhere as we have similar tests throughout the compiler
that we might want to unify (and also have a central place to
eventually add a target hook if ever desired).

In fact in other places we also check that the type we promote/demote
to matches its mode precision or the type we promote/demote from
already does not.

I'd suggest tree.[ch] for that function.

Please also add a testcase.

Thanks,
Richard.

>   {
> value_range *vr = get_value_range (innerop);
>  
> 
> Regards
> Senthil
> > 
> > Richard.
> > 
> > >Richard?
> > >
> > >Regards
> > >Senthil
> > >> 
> > >> -- 
> > >> Marc Glisse
> > 
> > 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] [ARM/Aarch64] add initial Qualcomm support

2015-11-16 Thread Kyrill Tkachov

Hi Jim,

On 13/11/15 17:53, Jim Wilson wrote:

Revised patch with the also missing xgene1 part added.

Jim

aprofile.patch


2015-11-13  Jim Wilson

* gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add lines for exynos-m1
and qdf24xx and xgene1 to match -march=armv8-a.


This is ok.
Thanks,
Kyrill



Re: [PATCH] g++.dg/init/vbase1.C and g++.dg/cpp/ucn-1.C

2015-11-16 Thread Eric Botcazou
> No RISC architecture can store directly to MEM, so the expected RTL in
> g++.dg/init/vbase1.C is wrong.  I am adding XFAIL for PowerPC.  This
> probably should be disabled for ARM and other RISC architectures.

Some of them can store 0 directly to MEM though, for example SPARC.

-- 
Eric Botcazou


Re: [AArch64] Cortex-A57 Choose some new branch costs.

2015-11-16 Thread James Greenhalgh
On Fri, Nov 13, 2015 at 11:40:56AM +0100, Richard Biener wrote:
> On Fri, Nov 13, 2015 at 11:13 AM, James Greenhalgh
>  wrote:
> >
> > Hi,
> >
> > With all the work that has recently gone in to ifcvt, I thought I'd revisit
> > the branch cost settings for Cortex-A57. After a run of experiments [1],
> > I found {1, 3} to be the sweet spot, giving a small set of performance
> > improvements across some popular benchmarks.
> >
> > I'd therefore like to propose changing the branch cost to those numbers.
> >
> > Patch bootstrapped tuning for Cortex-A57 with no issues. I'll revisit
> > the same for Cortex-A53.
> >
> > OK?
> 
> Can you re-do the experiment with adding a LOGICAL_OP_NON_SHORT_CIRCUIT
> target macro for aarch64?  Its fallback uses BRANCH_COST (see fold-const.c
> and tree-ssa-ifcombine.c)

This didn't drastically change the results for the workloads I was
using as benchmarks. I'd still pick {1, 3} out as the winner.

Thanks,
James

> > ---
> > 2015-11-12  James Greenhalgh  
> >
> > * config/aarch64/aarch64.c (cortexa57_branch_costs): New.
> > (cortexa57_tunings): Use it.
> >
> 


Re: [PATCH] simplify-rtx: Simplify sign_extend of lshiftrt to zero_extend (PR68330)

2015-11-16 Thread Eric Botcazou
> Since r230164, in PR68330 combine ends up with a sign_extend of an
> lshiftrt by some constant, and it does not know to morph that into a
> zero_extract (the extend will always extend with zeroes).  I think
> it is best to let simplify-rtx always replace such a sign_extend by
> a zero_extend, after which everything works as expected.

Agreed.

> 2015-11-15  Segher Boessenkool  
> 
>   PR rtl-optimization/68330
>   * simplify-rtx.c (simplify_unary_operation_1): Simplify SIGN_EXTEND
>   of LSHIFTRT by a non-zero constant integer.

OK, thanks.

-- 
Eric Botcazou


Re: [PATCH] Fix obvious typo that breaks boot strap in delayed folding

2015-11-16 Thread Ville Voutilainen
>> On Sun, Nov 15, 2015 at 11:09:18PM +0100, Andreas Schwab wrote:
>>> Andi Kleen  writes:
>>>
>>> > Fix the obivous typos. To to commit?
>>>
>>> They are not typos.
>>
>> Ok. What do you suggest to fix the error then?
>Someone will need to fix the regression introduced by the C++ delayed
>folding.

The proposed patch makes no sense. The assignment is intentional,
the relevant flags of the element type are reflected onto the array type.
Doing a comparison there instead of assignment will not have the
desired effect.

Maybe I'm dense, but why is there a warning that suggests adding
parens around the assignment when it already has parens around it?
Is that diagnostic over-eager?


Re: [RFC] Device-specific OpenMP target arguments

2015-11-16 Thread Jakub Jelinek
On Fri, Nov 13, 2015 at 07:40:27PM +0100, Martin Jambor wrote:
> the patch below is is an untested trunk-only implementation of device
> specific GOMP_target_ext arguments, which was proposed to me by Jakub
> today on IRC.  I'm sending this patch to make sure I understood the
> details well.  Nevertheless, I will be committing a tested and working
> version to the hsa branch shortly.

Well, the details aren't really in the patch though.

> --- a/gcc/builtin-types.def
> +++ b/gcc/builtin-types.def
> @@ -556,9 +556,9 @@ DEF_FUNCTION_TYPE_9 
> (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
>BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
>BT_BOOL, BT_UINT, BT_PTR, BT_INT)
>  
> -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
> -   BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> -   BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
> +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
> +  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> +  BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)

A _9 entry should be grouped together with other _9 entries, so no empty
line in between.

> diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
> index a37e856..279d055 100644
> --- a/gcc/fortran/types.def
> +++ b/gcc/fortran/types.def
> @@ -221,9 +221,9 @@ DEF_FUNCTION_TYPE_9 
> (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
>BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
>BT_BOOL, BT_UINT, BT_PTR, BT_INT)
>  
> -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
> -   BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> -   BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
> +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
> +  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> +  BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)

Likewise.

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -876,7 +876,7 @@ struct gomp_device_descr
>void *(*dev2host_func) (int, void *, const void *, size_t);
>void *(*host2dev_func) (int, void *, const void *, size_t);
>void *(*dev2dev_func) (int, void *, const void *, size_t);
> -  void (*run_func) (int, void *, void *);
> +  void (*run_func) (int, void *, void *, void **);

There is now async_run_func too, which needs similar treatment.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1373,7 +1373,8 @@ GOMP_target (int device, void (*fn) (void *), const 
> void *unused,
>thr->place = old_thr.place;
>thr->ts.place_partition_len = gomp_places_list_len;
>  }
> -  devicep->run_func (devicep->target_id, fn_addr, (void *) 
> tgt_vars->tgt_start);
> +  devicep->run_func (devicep->target_id, fn_addr, (void *) 
> tgt_vars->tgt_start,
> +  NULL);
>gomp_free_thread (thr);
>*thr = old_thr;
>gomp_unmap_vars (tgt_vars, true);
> @@ -1383,6 +1384,11 @@ GOMP_target (int device, void (*fn) (void *), const 
> void *unused,
> and several arguments have been added:
> FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
> DEPEND is array of dependencies, see GOMP_task for details.
> +   ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and a
> +   variable number of device-specific arguments, which always take two 
> elements
> +   where the first specifies the type and the second the actual value.  The
> +   last element of the array is a single NULL.

I wouldn't document that it always has to be a pair; rather, each new
pointer in the array encodes:
a) a few bits describing what offloading target this is for
   (enum offload_target_type or some other constant, but needs to be defined
   in gomp-constants.h; offload_target_type has the advantage that it exists
   already, and the disadvantage that it already contains gaps; for now
   we really need some value describing that it applies to all offloading
   targets, then host fallback, XeonPhi, NVidia, HSA (but with enough room
   for future growth))
b) a few bits describing the opcode
c) the remaining bits could be used to hold shorter values; say if you
   have a bool value (or some set of bools), they could go inline, or
   say some integer with limited range (say 16 bits at most), or perhaps
   on 64-bit hosts even full integer (and on 32-bit hosts the argument
   would go in the next pointer instead)
And this pointer would be followed by optional arguments, as specific to the
opcode.
Though, if we want other plugins to be able to easily skip opcodes for
other offloading targets, perhaps there should be also bits describing
how many arguments the opcode has.

> --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
> @@ -501,7 +501,8 @@ GOMP_OFFLOAD_d

Re: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-11-16 Thread Jonathan Wakely

On 15/11/15 16:38 -0500, Jennifer Yao wrote:

I just finished running the testsuite on x86_64-pc-cygwin for
Jonathan's latest patch and compared the results against an older
(about two months old) run, and so far I'm not seeing any regressions.
Granted, this is strictly preliminary; I'm currently re-running the
testsuite on the up-to-date, unpatched build tree to see if there are
any differences. I'll update when I have more information.


Finally finished testing the patch that was committed to trunk on Nov.
13. No regressions to report for x86_64-pc-cygwin.


Great - thanks for testing.



Re: [RFC] [PATCH V2]: RE: [RFC] [Patch] Relax tree-if-conv.c trap assumptions.

2015-11-16 Thread Richard Biener
On Sun, Nov 15, 2015 at 12:14 PM, Kumar, Venkataramanan
 wrote:
> Hi Richard  and Bernhard.
>
>> -Original Message-
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Tuesday, November 10, 2015 5:33 PM
>> To: Kumar, Venkataramanan
>> Cc: Andrew Pinski; gcc-patches@gcc.gnu.org
>> Subject: Re: [RFC] [PATCH V2]: RE: [RFC] [Patch] Relax tree-if-conv.c trap
>> assumptions.
>>
>> On Sat, Nov 7, 2015 at 12:41 PM, Kumar, Venkataramanan
>>  wrote:
>> > Hi Richard,
>> >
>> > I have now implemented storing of DR and references using hash maps.
>> > Please find attached patch.
>> >
>> > As discussed, I am now storing the ref, DR  and baseref, DR pairs along 
>> > with
>> unconditional read/write information  in  hash tables while iterating over DR
>> during its initialization.
>> > Then during checking for possible traps for if-converting,  just check if 
>> > the
>> memory reference for a gimple statement is read/written unconditionally by
>> querying the hash table instead of quadratic walk.
>> >
>> > Boot strapped and regression tested on x86_64.
>>
>> @@ -592,137 +598,153 @@ struct ifc_dr {
>>
>>/* -1 when not initialized, 0 when false, 1 when true.  */
>>int rw_unconditionally;
>> +
>> +  tree ored_result;
>> +
>>
>> excess vertical space at the end.  A better name would be simply "predicate".
>>
>> +  if (!exsist1)
>> +{
>> +  if (is_true_predicate (ca))
>> +   {
>> + DR_RW_UNCONDITIONALLY (a) = 1;
>> +   }
>>
>> -while (TREE_CODE (ref_base_a) == COMPONENT_REF
>> -   || TREE_CODE (ref_base_a) == IMAGPART_EXPR
>> -   || TREE_CODE (ref_base_a) == REALPART_EXPR)
>> -  ref_base_a = TREE_OPERAND (ref_base_a, 0);
>> +  IFC_DR (a)->ored_result = ca;
>> +  *master_dr = a;
>> + }
>> +  else
>> +{
>> +  IFC_DR (*master_dr)->ored_result
>> += fold_or_predicates
>> +   (EXPR_LOCATION (IFC_DR (*master_dr)->ored_result),
>> +ca, IFC_DR (*master_dr)->ored_result);
>>
>> -while (TREE_CODE (ref_base_b) == COMPONENT_REF
>> -   || TREE_CODE (ref_base_b) == IMAGPART_EXPR
>> -   || TREE_CODE (ref_base_b) == REALPART_EXPR)
>> -  ref_base_b = TREE_OPERAND (ref_base_b, 0);
>> +  if (is_true_predicate (ca)
>> + || is_true_predicate (IFC_DR (*master_dr)->ored_result))
>> +   {
>> + DR_RW_UNCONDITIONALLY (*master_dr) = 1;
>> +   }
>> +  }
>>
>> please common the predicate handling from both cases, that is, do
>>
>>if (is_true_predicate (IFC_DR (*master_dr)->ored_result))
>> DR_RW_...
>>
>> after the if.  Note no {}s around single stmts.
>>
>> Likewise for the base_master_dr code.
>>
>> +  if (!found)
>> +   {
>> + DR_WRITTEN_AT_LEAST_ONCE (a) =0;
>> + DR_RW_UNCONDITIONALLY (a) = 0;
>> + return false;
>>
>> no need to update the flags on non-"master" DRs.
>>
>> Please remove the 'found' variable and simply return 'true'
>> whenever found.
>>
>>  static bool
>>  ifcvt_memrefs_wont_trap (gimple *stmt, vec refs)  {
>> -  return write_memrefs_written_at_least_once (stmt, refs)
>> -&& memrefs_read_or_written_unconditionally (stmt, refs);
>> +  return memrefs_read_or_written_unconditionally (stmt, refs);
>>
>> please simply inline memrefs_read_or_written_unconditionally here.
>>
>> +  if (ref_DR_map)
>> +delete ref_DR_map;
>> +  ref_DR_map = NULL;
>> +
>> +  if (baseref_DR_map)
>> +delete baseref_DR_map;
>> + baseref_DR_map = NULL;
>>
>> at this point the pointers will never be NULL.
>>
>> Ok with those changes.
>
> The attached patch addresses all the review comments and is rebased to 
> current trunk.
> GCC regression test and bootstrap passed.
>
> Wanted to check with you once before committing it to trunk.
> Ok for trunk?
>
> gcc/ChangeLog
>
> 2015-11-15  Venkataramanan  
> * tree-if-conv.c  (ref_DR_map): Define.
> (baseref_DR_map): Likewise.
> (struct ifc_dr): Add new tree predicate field.
> (hash_memrefs_baserefs_and_store_DRs_read_written_info): New function.
> (memrefs_read_or_written_unconditionally):  Use hash maps to query
> unconditional read/written information.
> (write_memrefs_written_at_least_once): Remove.
> (ifcvt_memrefs_wont_trap): Remove call to
> write_memrefs_written_at_least_once.
> (if_convertible_loop_p_1):  Initialize hash maps and predicates
> before hashing data references.
> * tree-data-ref.h  (decl_binds_to_current_def_p): Declare.

It's already declared in varasm.h which you need to include.

You are correct in that we don't handle multiple data references in a
single stmt in ifcvt_memrefs_wont_trap, but that's a situation that
cannot happen.

So it would be nice to make this clearer by changing the function to
not loop over
all DRs but instead just do

  data_reference_p a = drs[gimple_uid (stmt) - 1];
  gcc_as

[PATCH] [ARC] Add support for atomic memory built-in.

2015-11-16 Thread Claudiu Zissulescu
This patch adds support for the atomic memory built-ins for ARCHS and ARC700.  Tested 
with dg.exp.

OK to apply?

Thanks,
Claudiu

ChangeLogs:
gcc/

2015-11-12  Claudiu Zissulescu  

* config/arc/arc-protos.h (arc_expand_atomic_op): Prototype.
(arc_split_compare_and_swap): Likewise.
(arc_expand_compare_and_swap): Likewise.
* config/arc/arc.c (arc_init): Check usage atomic option.
(arc_pre_atomic_barrier): New function.
(arc_post_atomic_barrier): Likewise.
(emit_unlikely_jump): Likewise.
(arc_expand_compare_and_swap_qh): Likewise.
(arc_expand_compare_and_swap): Likewise.
(arc_split_compare_and_swap): Likewise.
(arc_expand_atomic_op): Likewise.
* config/arc/arc.h (TARGET_CPU_CPP_BUILTINS): New C macro.
(ASM_SPEC): Enable mlock option when matomic is used.
* config/arc/arc.md (UNSPEC_ARC_MEMBAR): Define.
(VUNSPEC_ARC_CAS): Likewise.
(VUNSPEC_ARC_LL): Likewise.
(VUNSPEC_ARC_SC): Likewise.
(VUNSPEC_ARC_EX): Likewise.
* config/arc/arc.opt (matomic): New option.
* config/arc/constraints.md (ATO): New constraint.
* config/arc/predicates.md (mem_noofs_operand): New predicate.
* doc/invoke.texi: Document -matomic.
* config/arc/atomic.md: New file.

gcc/testsuite

2015-11-12  Claudiu Zissulescu  

* lib/target-supports.exp (check_effective_target_arc_atomic): New
function.
(check_effective_target_sync_int_long): Add checks for ARC atomic
feature.
(check_effective_target_sync_char_short): Likewise.
---
 gcc/config/arc/arc-protos.h   |   4 +
 gcc/config/arc/arc.c  | 391 ++
 gcc/config/arc/arc.h  |   6 +-
 gcc/config/arc/arc.md |   9 +
 gcc/config/arc/arc.opt|   3 +
 gcc/config/arc/atomic.md  | 235 
 gcc/config/arc/constraints.md |   6 +
 gcc/config/arc/predicates.md  |   4 +
 gcc/doc/invoke.texi   |   8 +-
 gcc/testsuite/lib/target-supports.exp |  11 +
 10 files changed, 675 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/arc/atomic.md

diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
index 6e04351..3581bb0 100644
--- a/gcc/config/arc/arc-protos.h
+++ b/gcc/config/arc/arc-protos.h
@@ -41,6 +41,10 @@ extern int arc_output_commutative_cond_exec (rtx *operands, 
bool);
 extern bool arc_expand_movmem (rtx *operands);
 extern bool prepare_move_operands (rtx *operands, machine_mode mode);
 extern void emit_shift (enum rtx_code, rtx, rtx, rtx);
+extern void arc_expand_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
+extern void arc_split_compare_and_swap (rtx *);
+extern void arc_expand_compare_and_swap (rtx *);
+
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 8bb0969..d47bbe4 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "alias.h"
 
 /* Which cpu we're compiling for (ARC600, ARC601, ARC700).  */
 static const char *arc_cpu_string = "";
@@ -884,6 +885,9 @@ arc_init (void)
   flag_pic = 0;
 }
 
+  if (TARGET_ATOMIC && !(TARGET_ARC700 || TARGET_HS))
+error ("-matomic is only supported for ARC700 or ARC HS cores");
+
   arc_init_reg_tables ();
 
   /* Initialize array for PRINT_OPERAND_PUNCT_VALID_P.  */
@@ -9650,6 +9654,393 @@ arc_use_by_pieces_infrastructure_p (unsigned 
HOST_WIDE_INT size,
   return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
 }
 
+/* Emit a (pre) memory barrier around an atomic sequence according to
+   MODEL.  */
+
+static void
+arc_pre_atomic_barrier (enum memmodel model)
+{
+ switch (model & MEMMODEL_MASK)
+{
+case MEMMODEL_RELAXED:
+case MEMMODEL_CONSUME:
+case MEMMODEL_ACQUIRE:
+case MEMMODEL_SYNC_ACQUIRE:
+  break;
+case MEMMODEL_RELEASE:
+case MEMMODEL_ACQ_REL:
+case MEMMODEL_SYNC_RELEASE:
+  emit_insn (gen_membar (const0_rtx));
+  break;
+case MEMMODEL_SEQ_CST:
+case MEMMODEL_SYNC_SEQ_CST:
+  emit_insn (gen_sync (const1_rtx));
+  break;
+default:
+  gcc_unreachable ();
+}
+}
+
+/* Emit a (post) memory barrier around an atomic sequence according to
+   MODEL.  */
+
+static void
+arc_post_atomic_barrier (enum memmodel model)
+{
+ switch (model & MEMMODEL_MASK)
+{
+case MEMMODEL_RELAXED:
+case MEMMODEL_CONSUME:
+case MEMMODEL_RELEASE:
+case MEMMODEL_SYNC_RELEASE:
+  break;
+case MEMMODEL_ACQUIRE:
+case MEMMODEL_ACQ_REL:
+case MEMMODEL_SYNC_ACQUIRE:
+  emit_insn (gen_membar (const0_rtx));
+  break;
+case MEMMODEL_SEQ_CST:
+case MEMMODEL_SYNC_SEQ_CST:
+  emit_insn (gen_sync (const1_rtx));
+  break;
+def

Re: [AArch64] Cortex-A57 Choose some new branch costs.

2015-11-16 Thread Richard Earnshaw
On 13/11/15 10:13, James Greenhalgh wrote:
> 
> Hi,
> 
> With all the work that has recently gone in to ifcvt, I thought I'd revisit
> the branch cost settings for Cortex-A57. After a run of experiments [1],
> I found {1, 3} to be the sweet spot, giving a small set of performance
> improvements across some popular benchmarks.
> 
> I'd therefore like to propose changing the branch cost to those numbers.
> 
> Patch bootstrapped tuning for Cortex-A57 with no issues. I'll revisit
> the same for Cortex-A53.
> 
> OK?
> 
> Thanks,
> James
> 
> [1]: {2, 2}, {2, 3}, {2, 4},
>  {1, 2}, {1, 3}, {1, 4},
>  {3, 3}, {3, 4},
>  {4, 4}, {4, 5},
>  {5, 5}, {5, 6}
> 
> ---
> 2015-11-12  James Greenhalgh  
> 
>   * config/aarch64/aarch64.c (cortexa57_branch_costs): New.
>   (cortexa57_tunings): Use it.
> 

OK.

R.



Re: vector lightweight debug mode

2015-11-16 Thread Jonathan Wakely

On 15/11/15 22:12 +0100, François Dumont wrote:

Here is a last version I think.

   I completed the debug light mode by adding some check on iterator
ranges.

   Even if the checks are light, I made some changes to make sure that
internally vector is not using methods instrumented with those checks.
This is to make sure checks are not done several times. Doing so also
simplifies the normal mode, especially for insert from a range: there is
no need to check whether the parameters are integers or not.


Yes, I'd also observed that those improvements could be made, to avoid
dispatching when we already know we have iterators not integers.


   I also introduced some __builtin_expect hints to make sure the compiler
will prefer the expected path.

   I didn't manage to check the result on the generated code. I am pretty
sure there will be an impact; you can't run more code without impact. But
that is a known drawback of debug mode, light or not; we just need to
minimize it, mostly by making sure that checks are done only once.


Not doing the checks is also an option. That minimizes the cost :-)

For the full debug mode we want to check everything we can, and accept
that it has a cost.

For the lightweight one we need to evaluate the relative benefits. Is
it worth adding checks for errors that only happen rarely? Does the
benefit outweigh the cost?

I'm still not convinced that's the case for the "valid range" checks.
I'm willing to be convinced, but am not convinced yet.


   It would be great to have it for gcc 6.0. I am working on the same
for other containers.


Please don't do the valid range checks for std::deque, the checks are
undefined for iterators into different containers and will not give a
reliable answer.



[PATCH][GCC][ARM] Disable neon testing for armv7-m

2015-11-16 Thread Andre Vieira

Hi,

  This patch changes the target support mechanism to make it recognize 
any ARM 'M' profile as a non-neon supporting target. The current check 
only tests for armv6 architectures and earlier, and does not account for 
armv7-m.


  This is correct because there is no 'M' profile that supports neon 
and the current test is not sufficient to exclude armv7-m.


  Tested by running regressions for this testcase for various ARM targets.

  Is this OK to commit?

  Thanks,
  Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

* gcc/testsuite/lib/target-supports.exp
  (check_effective_target_arm_neon_ok_nocache): Added check
  for M profile.
From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 13 Nov 2015 11:16:34 +
Subject: [PATCH] Disable neon testing for armv7-m

---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } {
 		int dummy;
 		/* Avoid the case where a test adds -mfpu=neon, but the toolchain is
 		   configured for -mcpu=arm926ej-s, for example.  */
-		#if __ARM_ARCH < 7
+		#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
 		#error Architecture too old for NEON.
 		#endif
 	} "$flags"] } {
-- 
1.9.1



Re: [PATCH] Fix obvious typo that breaks bootstrap in delayed folding

2015-11-16 Thread Andi Kleen
Ville Voutilainen  writes:
>
> The proposed patch makes no sense. The assignment is intentional,
> the relevant flags of the element type are reflected onto the array type.
> Doing a comparison there instead of assignment will not have the
> desired effect.

Ok I changed it to an assignment.

> Maybe I'm dense, but why is there a warning that suggests adding
> parens around the assignment when it already has parens around it?
> Is that diagnostic over-eager?

Yes it seems broken.

I'm just going to use --disable-werror now, as apparently everybody else
does.

-Andi



Re: [PATCH] Fix inconsistent use of integer types in gcc/lto-streamer-out.c

2015-11-16 Thread Richard Biener
On Sun, Nov 15, 2015 at 9:21 AM, Andris Pavenis  wrote:
> This fixes the use of pointers to different unsigned integer types as a
> function parameter.
> Function prototype is (see gcc/tree-streamer.h):
>
> bool streamer_tree_cache_lookup (struct streamer_tree_cache_d *, tree,
>  unsigned *);
>
> gcc/lto-streamer-out.c passes uint32_t * as a parameter to this method in
> 2 places.
> The type unsigned is used elsewhere in the same file.
>
> uint32_t is not guaranteed to be the same as unsigned (for DJGPP uint32_t
> is actually unsigned long).  That causes a compile failure for a DJGPP
> native build.

Ok.

Thanks,
Richard.

> Andris
>
> 2015-11-15 Andris Pavenis 
>
> * gcc/lto-streamer-out.c (write_global_references): Adjust integer type
>   (lto_output_decl_state_refs): Likewise
>


Re: [PATCH] Fix obvious typo that breaks bootstrap in delayed folding

2015-11-16 Thread Eric Botcazou
> I'm just going to use --disable-werror now, as apparently everybody else
> does.

I personally don't, and bootstrap works fine, probably because the bug is fixed.
Are you sure that you are bootstrapping the compiler and not just building it?
-Werror is supposed to be disabled during stage #1 of a bootstrap.

-- 
Eric Botcazou


[PATCH] Fix PR68117 (hopefully)

2015-11-16 Thread Richard Biener

In r229405 I removed a call to redirect_edge_var_map_destroy from
delete_tree_ssa which I thought cannot be necessary because that
map isn't GCed and thus stale data in it should have caused quite
some havoc otherwise.  Turns out I was wrong ;)  We were lucky
instead.  The following patch amends the code in remove_edge
which clears data for removed edges (when in gimple!) in that it
now destroys the map at RTL expansion time.

Now there is still a latent issue, I believe, as somebody is making
use of the map for an edge it didn't previously push to
(pushing always first clears the entry for the edge).

The comment in redirect_edge_var_map_vector is also revealing in
that context:

redirect_edge_var_map_vector (edge e)
{
  /* Hey, what kind of idiot would... you'd be surprised.  */
  if (!edge_var_maps)
return NULL;

In particular we probably have (very many?) redirect_edge_and_branch
calls not paired with a redirect_edge_var_map_clear call.

All in all, that SSA edge redirect hook machinery looks very fragile.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.  Will
apply if that succeeds and hope for the best.

Richard.

2015-11-16  Richard Biener  

PR middle-end/68117
* cfgexpand.c (pass_expand::execute): Destroy the edge
redirection var map before setting RTL CFG hooks.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c (revision 230404)
+++ gcc/cfgexpand.c (working copy)
@@ -6275,6 +6278,9 @@ pass_expand::execute (function *fun)
 
   expand_phi_nodes (&SA);
 
+  /* Release any stale SSA redirection data.  */
+  redirect_edge_var_map_destroy ();
+
   /* Register rtl specific functions for cfg.  */
   rtl_register_cfg_hooks ();
 


Re: [PATCH] Fix obvious typo that breaks bootstrap in delayed folding

2015-11-16 Thread Andreas Schwab
Eric Botcazou  writes:

> I personally don't and bootstrap works fine probably because the bug is fixed.

No, the bug still exists.  Try --enable-checking=release.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH, 5/16] Add in_oacc_kernels_region in struct loop

2015-11-16 Thread Tom de Vries

On 11/11/15 11:55, Richard Biener wrote:

On Mon, 9 Nov 2015, Tom de Vries wrote:


On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

   1Insert new exit block only when needed in
  transform_to_exit_first_loop_alt
   2Make create_parallel_loop return void
   3Ignore reduction clause on kernels directive
   4Implement -foffload-alias
   5Add in_oacc_kernels_region in struct loop
   6Add pass_oacc_kernels
   7Add pass_dominator_oacc_kernels
   8Add pass_ch_oacc_kernels
   9Add pass_parallelize_loops_oacc_kernels
  10Add pass_oacc_kernels pass group in passes.def
  11Update testcases after adding kernels pass group
  12Handle acc loop directive
  13Add c-c++-common/goacc/kernels-*.c
  14Add gfortran.dg/goacc/kernels-*.f95
  15Add libgomp.oacc-c-c++-common/kernels-*.c
  16Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


this patch adds and initializes the field in_oacc_kernels_region field in
struct loop.

The field is used to signal to subsequent passes that we're dealing with a
loop in a kernels region that we're trying to parallelize.

Note that we do not parallelize kernels regions with more than one loop nest.
[ In general, kernels regions with more than one loop nest should be split up
into separate kernels regions, but that's not supported atm. ]


I think mark_loops_in_oacc_kernels_region can be greatly simplified.

Both region entry and exit should have the same ->loop_father (a SESE
region).  Then you can just walk that loops inner (and their sibling)
loops checking their header domination relation with the region entry
exit (only necessary for direct inner loops).


Updated patch to use the loops structure.  Atm I'm also skipping loops 
containing sibling loops, since I have no test-cases for that yet.


Thanks,
- Tom

Add in_oacc_kernels_region in struct loop

2015-11-09  Tom de Vries  

	* cfgloop.h (struct loop): Add in_oacc_kernels_region field.
	* omp-low.c (mark_loops_in_oacc_kernels_region): New function.
	(expand_omp_target): Call mark_loops_in_oacc_kernels_region.

---
 gcc/cfgloop.h |  3 +++
 gcc/omp-low.c | 43 +++
 2 files changed, 46 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 6af6893..ee73bf9 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -191,6 +191,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if we should try harder to vectorize this loop.  */
   bool force_vectorize;
 
+  /* True if the loop is part of an oacc kernels region.  */
+  bool in_oacc_kernels_region;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
  by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
  builtins.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5f76434..fba7bbd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12450,6 +12450,46 @@ get_oacc_ifn_dim_arg (const gimple *stmt)
   return (int) axis;
 }
 
+/* Mark the loops inside the kernels region starting at REGION_ENTRY and ending
+   at REGION_EXIT.  */
+
+static void
+mark_loops_in_oacc_kernels_region (basic_block region_entry,
+   basic_block region_exit)
+{
+  struct loop *outer = region_entry->loop_father;
+  gcc_assert (region_exit == NULL || outer == region_exit->loop_father);
+
+  /* Don't parallelize the kernels region if it contains more than one outer
+ loop.  */
+  unsigned int nr_outer_loops = 0;
+  struct loop *single_outer;
+  for (struct loop *loop = outer->inner; loop != NULL; loop = loop->next)
+{
+  gcc_assert (loop_outer (loop) == outer);
+
+  if (!dominated_by_p (CDI_DOMINATORS, loop->header, region_entry))
+	continue;
+
+  if (region_exit != NULL
+	  && dominated_by_p (CDI_DOMINATORS, loop->header, region_exit))
+	continue;
+
+  nr_outer_loops++;
+  single_outer = loop;
+}
+  if (nr_outer_loops != 1)
+return;
+
+  for (struct loop *loop = single_outer->inner; loop != NULL; loop = loop->inner)
+if (loop->next)
+  return;
+
+  /* Mark the loops in the region.  */
+  for (struct loop *loop = single_outer; loop != NULL; loop = loop->inner)
+loop->in_oacc_kernels_region = true;
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -12505,6 +12545,9 @@ expand_omp_target (struct omp_region *region)
   entry_bb = region->entr


Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-16 Thread Andre Vieira

On 13/11/15 10:34, Richard Biener wrote:

On Thu, Nov 12, 2015 at 4:07 PM, Andre Vieira
 wrote:

Hi,

   This patch changes this testcase to make sure LTO will not optimize away
the assignment of the local array to a global variable, an assignment that was
introduced to make sure stack space was made available for the test to work.

   This is correct because LTO is supposed to optimize this global away, as at
link time it knows this global will never be read.  By adding a read of the
global, LTO will no longer optimize it away.


But that's only because we can't see that bar doesn't clobber it, else
we would optimize away the check and get here again.  Much better
to mark 'dummy' with __attribute__((used)) and go away with 'g' entirely.

Richard.


   Tested by running regressions for this testcase for various ARM targets.

   Is this OK to commit?

   Thanks,
   Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

 * gcc.dg/torture/stackalign/builtin-return-1.c: Added read
   to global such that a write is not optimized away by LTO.



Hi Richard,

That would be great, but __attribute__ ((used)) can only be used for
static variables, and making dummy static would defeat the purpose here.


Cheers,
Andre



Re: [PATCH] Fix obvious typo that breaks bootstrap in delayed folding

2015-11-16 Thread Eric Botcazou
> No, the bug still exists.  Try --enable-checking=release.

Ah, OK, so tree checking somehow works around it I presume.

-- 
Eric Botcazou


Re: [1/2] i386 ROP mitigation: make more of regrename callable

2015-11-16 Thread Eric Botcazou
> Bootstrapped and tested on x86_64-linux. Ok?

OK once the 2/2 patch is also approved.

-- 
Eric Botcazou


Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-16 Thread Tom de Vries

On 11/11/15 12:02, Richard Biener wrote:

On Mon, 9 Nov 2015, Tom de Vries wrote:


On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

   1Insert new exit block only when needed in
  transform_to_exit_first_loop_alt
   2Make create_parallel_loop return void
   3Ignore reduction clause on kernels directive
   4Implement -foffload-alias
   5Add in_oacc_kernels_region in struct loop
   6Add pass_oacc_kernels
   7Add pass_dominator_oacc_kernels
   8Add pass_ch_oacc_kernels
   9Add pass_parallelize_loops_oacc_kernels
  10Add pass_oacc_kernels pass group in passes.def
  11Update testcases after adding kernels pass group
  12Handle acc loop directive
  13Add c-c++-common/goacc/kernels-*.c
  14Add gfortran.dg/goacc/kernels-*.f95
  15Add libgomp.oacc-c-c++-common/kernels-*.c
  16Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.



This patch adds the pass_oacc_kernels pass group to the pass list in
passes.def.

Note the repetition of pass_lim/pass_copy_prop. The first pair is for an inner
loop in a loop nest, the second for an outer loop in a loop nest.


@@ -86,6 +86,27 @@ along with GCC; see the file COPYING3.  If not see
   /* pass_build_ealias is a dummy pass that ensures that we
  execute TODO_rebuild_alias at this point.  */
   NEXT_PASS (pass_build_ealias);
+ /* Pass group that runs when there are oacc kernels in the
+function.  */
+ NEXT_PASS (pass_oacc_kernels);
+ PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+ NEXT_PASS (pass_dominator_oacc_kernels);
+ NEXT_PASS (pass_ch_oacc_kernels);
+ NEXT_PASS (pass_dominator_oacc_kernels);
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_lim);
+ NEXT_PASS (pass_copy_prop);
+ NEXT_PASS (pass_lim);
+ NEXT_PASS (pass_copy_prop);

iterate lim/copyprop twice?!  Why's that needed?



I've managed to eliminate the last pass_copy_prop, but not pass_lim. 
I've added a comment:

...
  /* We use pass_lim to rewrite in-memory iteration and reduction
     variable accesses in loops into local variable accesses.
     However, a single pass instantiation manages to do this only for
     one loop level, so we use pass_lim twice to at least be able to
     handle a loop nest with a depth of two.  */
  NEXT_PASS (pass_lim);
  NEXT_PASS (pass_copy_prop);
  NEXT_PASS (pass_lim);
...


+ NEXT_PASS (pass_scev_cprop);

What's that for?  It's supposed to help removing loops - I don't
expect kernels to vanish.


I'm using pass_scev_cprop for the "final value replacement" 
functionality. Added comment.




+ NEXT_PASS (pass_tree_loop_done);
+ NEXT_PASS (pass_dominator_oacc_kernels);

Three times DOM?  No please.  I wonder why you don't run oacc_kernels
after FRE and drop the initial DOM(s).



Done. There's just one pass_dominator_oacc_kernels left now.


+ NEXT_PASS (pass_dce);
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+ NEXT_PASS (pass_expand_omp_ssa);
+ NEXT_PASS (pass_tree_loop_done);

The switches into/out of tree_loop also look odd to me, but well
(they'll be controlled by -ftree-loop-optimize).



I've eliminated all the uses for pass_tree_loop_init/pass_tree_loop_done 
in the pass group. Instead, I've added conditional loop optimizer setup in:

- pass_lim and pass_scev_cprop (added in this patch), and
- pass_parallelize_loops_oacc_kernels (added in patch "Add
  pass_parallelize_loops_oacc_kernels").

Thanks,
- Tom

Add pass_oacc_kernels pass group in passes.def

2015-11-09  Tom de Vries  

	* omp-low.c (pass_expand_omp_ssa::clone): New function.
	* passes.def: Add pass_oacc_kernels pass group.
	* tree-ssa-loop-ch.c (pass_ch::clone): New function.
	* tree-ssa-loop-im.c (tree_ssa_lim): Allow to run outside
	pass_tree_loop.
	* tree-ssa-loop.c (pass_scev_cprop::clone): New function.
	(pass_scev_cprop::execute): Allow to run outside pass_tree_loop.

---
 gcc/omp-low.c  |  1 +
 gcc/passes.def | 25 +
 gcc/tree-ssa-loop-ch.c |  2 ++
 gcc/tree-ssa-loop-im.c | 14 ++
 gcc/tree-ssa-loop.c| 22 +-
 5 files changed, 63

Re: [PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels

2015-11-16 Thread Tom de Vries

On 09/11/15 20:52, Tom de Vries wrote:

On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

  1Insert new exit block only when needed in
 transform_to_exit_first_loop_alt
  2Make create_parallel_loop return void
  3Ignore reduction clause on kernels directive
  4Implement -foffload-alias
  5Add in_oacc_kernels_region in struct loop
  6Add pass_oacc_kernels
  7Add pass_dominator_oacc_kernels
  8Add pass_ch_oacc_kernels
  9Add pass_parallelize_loops_oacc_kernels
 10Add pass_oacc_kernels pass group in passes.def
 11Update testcases after adding kernels pass group
 12Handle acc loop directive
 13Add c-c++-common/goacc/kernels-*.c
 14Add gfortran.dg/goacc/kernels-*.f95
 15Add libgomp.oacc-c-c++-common/kernels-*.c
 16Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


This patch adds pass_parallelize_loops_oacc_kernels.

There's a number of things we do differently in parloops for oacc kernels:
- in normal parloops, we generate code to choose between a parallel
   version of the loop, and a sequential (low iteration count) version.
   Since the code in oacc kernels region is supposed to run on the
   accelerator anyway, we skip this check, and don't add a low iteration
   count loop.
- in normal parloops, we generate a #pragma omp parallel /
   GIMPLE_OMP_RETURN pair to delimit the region which we will split off
   into a thread function. Since the oacc kernels region is already
   split off, we don't add this pair.
- we indicate the parallelization factor by setting the oacc function
   attributes
- we generate a #pragma oacc loop instead of a #pragma omp for, and
   we add the gang clause
- in normal parloops, we rewrite the variable accesses in the loop
   into accesses relative to a thread function parameter. For the
   oacc kernels region, that rewrite has already been done at omp-lower,
   so we skip this.
- we need to ensure that the entire kernels region can be run in
   parallel. The loop independence check is already present, so for oacc
   kernels we add a check between blocks outside the loop and the entire
   region.
- we guard stores in the blocks outside the loop with gang_pos == 0.
   There's no need for every gang to write to a single location; we can
   do this in just one gang. (Typically this is the write of the final
   value of the iteration variable, if that one is copied back to the
   host).



Reposting with loop optimizer init added in 
pass_parallelize_loops_oacc_kernels::execute.


Thanks,
- Tom
Add pass_parallelize_loops_oacc_kernels

2015-11-09  Tom de Vries  

	* omp-low.c (set_oacc_fn_attrib): Make extern.
	* omp-low.c (expand_omp_atomic_fetch_op):  Release defs of update stmt.
	* omp-low.h (set_oacc_fn_attrib): Declare.
	* tree-parloops.c (struct reduction_info): Add reduc_addr field.
(create_call_for_reduction_1): Handle case that reduc_addr is non-NULL.
	(create_parallel_loop, gen_parallel_loop, try_create_reduction_list):
	Add and handle function parameter oacc_kernels_p.
	(get_omp_data_i_param): New function.
	(ref_conflicts_with_region, oacc_entry_exit_ok_1)
	(oacc_entry_exit_single_gang, oacc_entry_exit_ok): New function.
	(parallelize_loops): Add and handle function parameter oacc_kernels_p.
	Calculate dominance info.  Skip loops that are not in a kernels region
	in oacc_kernels_p mode.  Skip inner loops of parallelized loops.
	(pass_parallelize_loops::execute): Call parallelize_loops with false
	argument.
	(pass_data_parallelize_loops_oacc_kernels): New pass_data.
	(class pass_parallelize_loops_oacc_kernels): New pass.
	(pass_parallelize_loops_oacc_kernels::execute)
	(make_pass_parallelize_loops_oacc_kernels): New function.
	* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.

---
 gcc/omp-low.c   |   8 +-
 gcc/omp-low.h   |   1 +
 gcc/tree-parloops.c | 693 +++-
 gcc/tree-pass.h |   2 +
 4 files changed, 640 insertions(+), 64 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fba7bbd..9eae09a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11944,10 +11944,14 @@ expand_omp_atomic_fetch_op (basic_block load_bb,
   gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_ATOMIC_STORE);
   gsi_remove (&gsi, true);
   gsi = gsi_last_bb (store_bb);
+  stmt = gsi_st

Re: [PATCH, 7/16] Add pass_dominator_oacc_kernels

2015-11-16 Thread Tom de Vries

On 11/11/15 12:05, Richard Biener wrote:

On Mon, 9 Nov 2015, Tom de Vries wrote:


On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

   1Insert new exit block only when needed in
  transform_to_exit_first_loop_alt
   2Make create_parallel_loop return void
   3Ignore reduction clause on kernels directive
   4Implement -foffload-alias
   5Add in_oacc_kernels_region in struct loop
   6Add pass_oacc_kernels
   7Add pass_dominator_oacc_kernels
   8Add pass_ch_oacc_kernels
   9Add pass_parallelize_loops_oacc_kernels
  10Add pass_oacc_kernels pass group in passes.def
  11Update testcases after adding kernels pass group
  12Handle acc loop directive
  13Add c-c++-common/goacc/kernels-*.c
  14Add gfortran.dg/goacc/kernels-*.f95
  15Add libgomp.oacc-c-c++-common/kernels-*.c
  16Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


this patch adds pass_dominator_oacc_kernels (which we might as well call
pass_dominator_no_peel_loop_headers; it doesn't do anything
oacc-kernels-specific), to be used in the kernels pass group.

The reason I'm adding a new pass instead of using pass_dominator is that
pass_dominator uses first_pass_instance. So adding a pass_dominator instance A
before a pass_dominator instance B has the unexpected consequence that it may
change the behaviour of instance B. I've filed PR68247 - "Remove
first_pass_instance" - to note this issue.


This looks ok (minus my comments to patch #10)



AFAIU, if "Remove first_pass_instance from pass_dominator" get approved 
and committed, we can drop this patch, and use this pass instantiation 
instead in the oacc_kernels pass group:

...
  NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
...

Thanks,
- Tom


Re: [1/2] i386 ROP mitigation: make more of regrename callable

2015-11-16 Thread Bernd Schmidt

On 11/16/2015 12:54 PM, Eric Botcazou wrote:

Bootstrapped and tested on x86_64-linux. Ok?


OK once the 2/2 patch is also approved.


Thank you. The 2/2 has a small regrename piece to add target_data 
fields, are you OK with that as well? I'm assuming Uros will take a look 
at the rest.



Bernd



Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m

2015-11-16 Thread James Greenhalgh
On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote:
> Hi,
> 
>   This patch changes the target support mechanism to make it
> recognize any ARM 'M' profile as a non-neon supporting target. The
> current check only tests for armv6 architectures and earlier, and
> does not account for armv7-m.
> 
>   This is correct because there is no 'M' profile that supports neon
> and the current test is not sufficient to exclude armv7-m.
> 
>   Tested by running regressions for this testcase for various ARM targets.
> 
>   Is this OK to commit?
> 
>   Thanks,
>   Andre Vieira
> 
> gcc/testsuite/ChangeLog:
> 2015-11-06  Andre Vieira  
> 
> * gcc/testsuite/lib/target-supports.exp
>   (check_effective_target_arm_neon_ok_nocache): Added check
>   for M profile.

> From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001
> From: Andre Simoes Dias Vieira 
> Date: Fri, 13 Nov 2015 11:16:34 +
> Subject: [PATCH] Disable neon testing for armv7-m
> 
> ---
>  gcc/testsuite/lib/target-supports.exp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 
> 75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6
>  100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } {
>   int dummy;
>   /* Avoid the case where a test adds -mfpu=neon, but the 
> toolchain is
>  configured for -mcpu=arm926ej-s, for example.  */
> - #if __ARM_ARCH < 7
> + #if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
>   #error Architecture too old for NEON.

Could you fix this #error message while you're here?

Why can't we change this test to look for the __ARM_NEON macro from ACLE:

#if __ARM_NEON < 1
  #error NEON is not enabled
#endif

Thanks,
James


Re: [PATCH, 0/6] Remove first_pass_instance

2015-11-16 Thread Bernd Schmidt

On 11/15/2015 11:55 AM, Tom de Vries wrote:

[ was: Re: [PATCH] Remove first_pass_instance from pass_vrp ]

This patch series removes first_pass_instance.

  1Remove first_pass_instance from pass_vrp
  2Remove first_pass_instance from pass_reassoc
  3Remove first_pass_instance from pass_dominator
  4Remove first_pass_instance from pass_object_sizes
  5Remove first_pass_instance from pass_ccp
  6Remove first_pass_instance


In 5/6 please retain the comment about memory usage. Otherwise all ok.


Bernd



Re: [1/2] i386 ROP mitigation: make more of regrename callable

2015-11-16 Thread Eric Botcazou
> Thank you. The 2/2 has a small regrename piece to add target_data
> fields, are you OK with that as well?

Sure.

-- 
Eric Botcazou


Re: [PATCH][i386]Migrate reduction optabs to reduc__scal

2015-11-16 Thread Alan Lawrence

On 03/11/15 14:27, Alan Lawrence wrote:

This migrates the various reduction optabs in sse.md to use the reduce-to-scalar
form. I took the straightforward approach (equivalent to the migration code in
expr.c/optabs.c) of generating a vector temporary, using the existing code to
reduce to that, and extracting lane 0, in each pattern.

Bootstrapped + check-gcc + check-g++.

Ok for trunk?

gcc/ChangeLog:

* config/i386/sse.md (reduc_splus_v8df): Rename to...
(reduc_plus_scal_v8df): ...here; reduce to temp and extract scalar.

(reduc_splus_v4df): Rename to...
(reduc_plus_scal_v4df): ...here; reduce to temp and extract scalar.

(reduc_splus_v2df): Rename to...
(reduc_plus_scal_v2df): ...here; reduce to temp and extract scalar.

(reduc_splus_v16sf): Rename to...
(reduc_plus_scal_v16sf): ...here; reduce to temp and extract scalar.

(reduc_splus_v8sf): Rename to...
(reduc_plus_scal_v8sf): ...here; reduce to temp and extract scalar.

(reduc_splus_v4sf): Rename to...
(reduc_plus_scal_v4sf): ...here; reduce to temp and extract scalar.

(reduc_<code>_<mode>, all 3 variants): Rename each to...
(reduc_<code>_scal_<mode>): ...here; reduce to temp and extract scalar.

(reduc_umin_v8hf): Rename to...
(reduc_umin_scal_v8hf): ...here; reduce to temp and extract scalar.
---
  gcc/config/i386/sse.md | 82 +++---
  1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 43dcc6a..041e514 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2424,73 +2424,85 @@
 (set_attr "prefix_rep" "1,*")
 (set_attr "mode" "V4SF")])

-(define_expand "reduc_splus_v8df"
-  [(match_operand:V8DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v8df"
+  [(match_operand:DF 0 "register_operand")
 (match_operand:V8DF 1 "register_operand")]
"TARGET_AVX512F"
  {
-  ix86_expand_reduc (gen_addv8df3, operands[0], operands[1]);
+  rtx tmp = gen_reg_rtx (V8DFmode);
+  ix86_expand_reduc (gen_addv8df3, tmp, operands[1]);
+  emit_insn (gen_vec_extractv8df (operands[0], tmp, const0_rtx));
DONE;
  })

-(define_expand "reduc_splus_v4df"
-  [(match_operand:V4DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v4df"
+  [(match_operand:DF 0 "register_operand")
 (match_operand:V4DF 1 "register_operand")]
"TARGET_AVX"
  {
rtx tmp = gen_reg_rtx (V4DFmode);
rtx tmp2 = gen_reg_rtx (V4DFmode);
+  rtx vec_res = gen_reg_rtx (V4DFmode);
emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
-  emit_insn (gen_addv4df3 (operands[0], tmp, tmp2));
+  emit_insn (gen_addv4df3 (vec_res, tmp, tmp2));
+  emit_insn (gen_vec_extractv4df (operands[0], vec_res, const0_rtx));
DONE;
  })

-(define_expand "reduc_splus_v2df"
-  [(match_operand:V2DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v2df"
+  [(match_operand:DF 0 "register_operand")
 (match_operand:V2DF 1 "register_operand")]
"TARGET_SSE3"
  {
-  emit_insn (gen_sse3_haddv2df3 (operands[0], operands[1], operands[1]));
+  rtx tmp = gen_reg_rtx (V2DFmode);
+  emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
+  emit_insn (gen_vec_extractv2df (operands[0], tmp, const0_rtx));
DONE;
  })

-(define_expand "reduc_splus_v16sf"
-  [(match_operand:V16SF 0 "register_operand")
+(define_expand "reduc_plus_scal_v16sf"
+  [(match_operand:SF 0 "register_operand")
 (match_operand:V16SF 1 "register_operand")]
"TARGET_AVX512F"
  {
-  ix86_expand_reduc (gen_addv16sf3, operands[0], operands[1]);
+  rtx tmp = gen_reg_rtx (V16SFmode);
+  ix86_expand_reduc (gen_addv16sf3, tmp, operands[1]);
+  emit_insn (gen_vec_extractv16sf (operands[0], tmp, const0_rtx));
DONE;
  })

-(define_expand "reduc_splus_v8sf"
-  [(match_operand:V8SF 0 "register_operand")
+(define_expand "reduc_plus_scal_v8sf"
+  [(match_operand:SF 0 "register_operand")
 (match_operand:V8SF 1 "register_operand")]
"TARGET_AVX"
  {
rtx tmp = gen_reg_rtx (V8SFmode);
rtx tmp2 = gen_reg_rtx (V8SFmode);
+  rtx vec_res = gen_reg_rtx (V8SFmode);
emit_insn (gen_avx_haddv8sf3 (tmp, operands[1], operands[1]));
emit_insn (gen_avx_haddv8sf3 (tmp2, tmp, tmp));
emit_insn (gen_avx_vperm2f128v8sf3 (tmp, tmp2, tmp2, GEN_INT (1)));
-  emit_insn (gen_addv8sf3 (operands[0], tmp, tmp2));
+  emit_insn (gen_addv8sf3 (vec_res, tmp, tmp2));
+  emit_insn (gen_vec_extractv8sf (operands[0], vec_res, const0_rtx));
DONE;
  })

-(define_expand "reduc_splus_v4sf"
-  [(match_operand:V4SF 0 "register_operand")
+(define_expand "reduc_plus_scal_v4sf"
+  [(match_operand:SF 0 "register_operand")
 (match_operand:V4SF 1 "register_operand")]
"TARGET_SSE"
  {
+  rtx vec_res = gen_reg_rtx (V4SFmode);
if (TARGET_SSE3)
  {
rtx tmp = gen_reg_rtx (V4SFmode);
emit_insn (gen_sse3_hadd

Re: [PATCH, 5/16] Add in_oacc_kernels_region in struct loop

2015-11-16 Thread Richard Biener
On Mon, 16 Nov 2015, Tom de Vries wrote:

> On 11/11/15 11:55, Richard Biener wrote:
> > On Mon, 9 Nov 2015, Tom de Vries wrote:
> > 
> > > On 09/11/15 16:35, Tom de Vries wrote:
> > > > Hi,
> > > > 
> > > > this patch series for stage1 trunk adds support to:
> > > > - parallelize oacc kernels regions using parloops, and
> > > > - map the loops onto the oacc gang dimension.
> > > > 
> > > > The patch series contains these patches:
> > > > 
> > > >1  Insert new exit block only when needed in
> > > >   transform_to_exit_first_loop_alt
> > > >2  Make create_parallel_loop return void
> > > >3  Ignore reduction clause on kernels directive
> > > >4  Implement -foffload-alias
> > > >5  Add in_oacc_kernels_region in struct loop
> > > >6  Add pass_oacc_kernels
> > > >7  Add pass_dominator_oacc_kernels
> > > >8  Add pass_ch_oacc_kernels
> > > >9  Add pass_parallelize_loops_oacc_kernels
> > > >   10  Add pass_oacc_kernels pass group in passes.def
> > > >   11  Update testcases after adding kernels pass group
> > > >   12  Handle acc loop directive
> > > >   13  Add c-c++-common/goacc/kernels-*.c
> > > >   14  Add gfortran.dg/goacc/kernels-*.f95
> > > >   15  Add libgomp.oacc-c-c++-common/kernels-*.c
> > > >   16  Add libgomp.oacc-fortran/kernels-*.f95
> > > > 
> > > > The first 9 patches are more or less independent, but patches 10-16 are
> > > > intended to be committed at the same time.
> > > > 
> > > > Bootstrapped and reg-tested on x86_64.
> > > > 
> > > > Build and reg-tested with nvidia accelerator, in combination with a
> > > > patch that enables accelerator testing (which is submitted at
> > > > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > > > 
> > > > I'll post the individual patches in reply to this message.
> > > 
> > > this patch adds and initializes the in_oacc_kernels_region field in
> > > struct loop.
> > > 
> > > The field is used to signal to subsequent passes that we're dealing with a
> > > loop in a kernels region that we're trying parallelize.
> > > 
> > > Note that we do not parallelize kernels regions with more than one loop
> > > nest.
> > > [ In general, kernels regions with more than one loop nest should be split
> > > up
> > > into separate kernels regions, but that's not supported atm. ]
> > 
> > I think mark_loops_in_oacc_kernels_region can be greatly simplified.
> > 
> > Both region entry and exit should have the same ->loop_father (a SESE
> > region).  Then you can just walk that loops inner (and their sibling)
> > loops checking their header domination relation with the region entry
> > exit (only necessary for direct inner loops).
> 
> Updated patch to use the loops structure.  Atm I'm also skipping loops
> containing sibling loops, since I have no test-cases for that yet.

Looks ok to me now.  You want to update copy_loop_info btw.

Richard.

> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[Obvious][AArch64] Fix gcc.target/aarch64/vclz.c

2015-11-16 Thread Alan Lawrence
The scan-assembler vclz2s 34 test in here has been failing since r230091:

* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
cost favor vectorized version.

which transforms a load of piecewise array assignments (in tree) into
2-element-vector writes to MEMs, and the first RUN_TEST for the V2SI cases
then constant-propagates because the INHIB_OPTIMIZATION (which clobbers the
data arrays) comes after the load from those arrays.

Hence, put the INHIB_OPTIMIZATION in the right place.

Moreover, the memory clobber does not affect the registers now holding the
values loaded anyway, so we can drop the second one altogether.

Tested on aarch64-none-linux-gnu, committed as r230421.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vclz.c: Correctly place INHIB_OPTIMIZATION.
---
 gcc/testsuite/gcc.target/aarch64/vclz.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/vclz.c b/gcc/testsuite/gcc.target/aarch64/vclz.c
index 455ba63..60494a8 100644
--- a/gcc/testsuite/gcc.target/aarch64/vclz.c
+++ b/gcc/testsuite/gcc.target/aarch64/vclz.c
@@ -67,18 +67,13 @@ extern void abort (void);
   CONCAT1 (vclz, POSTFIX (reg_len, data_len, is_signed))
 
 #define RUN_TEST(test_set, answ_set, reg_len, data_len, is_signed, n)  \
+  INHIB_OPTIMIZATION;  \
   a = LOAD_INST (reg_len, data_len, is_signed) (test_set); \
   b = LOAD_INST (reg_len, data_len, is_signed) (answ_set); \
-  INHIB_OPTIMIZATION;  \
   a = CLZ_INST (reg_len, data_len, is_signed) (a); \
   for (i = 0; i < n; i++)  \
-{  \
-  INHIB_OPTIMIZATION;  \
-  if (a [i] != b [i])  \
-{  \
-  return 1;\
-}  \
-}
+if (a [i] != b [i])  \
+  return 1;
 
 int
 test_vclz_s8 ()
-- 
1.9.1



Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-16 Thread Richard Biener
On Mon, 16 Nov 2015, Tom de Vries wrote:

> On 11/11/15 12:02, Richard Biener wrote:
> > On Mon, 9 Nov 2015, Tom de Vries wrote:
> > 
> > > On 09/11/15 16:35, Tom de Vries wrote:
> > > > Hi,
> > > > 
> > > > this patch series for stage1 trunk adds support to:
> > > > - parallelize oacc kernels regions using parloops, and
> > > > - map the loops onto the oacc gang dimension.
> > > > 
> > > > The patch series contains these patches:
> > > > 
> > > >1  Insert new exit block only when needed in
> > > >   transform_to_exit_first_loop_alt
> > > >2  Make create_parallel_loop return void
> > > >3  Ignore reduction clause on kernels directive
> > > >4  Implement -foffload-alias
> > > >5  Add in_oacc_kernels_region in struct loop
> > > >6  Add pass_oacc_kernels
> > > >7  Add pass_dominator_oacc_kernels
> > > >8  Add pass_ch_oacc_kernels
> > > >9  Add pass_parallelize_loops_oacc_kernels
> > > >   10  Add pass_oacc_kernels pass group in passes.def
> > > >   11  Update testcases after adding kernels pass group
> > > >   12  Handle acc loop directive
> > > >   13  Add c-c++-common/goacc/kernels-*.c
> > > >   14  Add gfortran.dg/goacc/kernels-*.f95
> > > >   15  Add libgomp.oacc-c-c++-common/kernels-*.c
> > > >   16  Add libgomp.oacc-fortran/kernels-*.f95
> > > > 
> > > > The first 9 patches are more or less independent, but patches 10-16 are
> > > > intended to be committed at the same time.
> > > > 
> > > > Bootstrapped and reg-tested on x86_64.
> > > > 
> > > > Build and reg-tested with nvidia accelerator, in combination with a
> > > > patch that enables accelerator testing (which is submitted at
> > > > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > > > 
> > > > I'll post the individual patches in reply to this message.
> > > > 
> > > 
> > > This patch adds the pass_oacc_kernels pass group to the pass list in
> > > passes.def.
> > > 
> > > Note the repetition of pass_lim/pass_copy_prop. The first pair is for an
> > > inner
> > > loop in a loop nest, the second for an outer loop in a loop nest.
> > 
> > @@ -86,6 +86,27 @@ along with GCC; see the file COPYING3.  If not see
> >/* pass_build_ealias is a dummy pass that ensures that we
> >   execute TODO_rebuild_alias at this point.  */
> >NEXT_PASS (pass_build_ealias);
> > + /* Pass group that runs when there are oacc kernels in the
> > +function.  */
> > + NEXT_PASS (pass_oacc_kernels);
> > + PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> > + NEXT_PASS (pass_dominator_oacc_kernels);
> > + NEXT_PASS (pass_ch_oacc_kernels);
> > + NEXT_PASS (pass_dominator_oacc_kernels);
> > + NEXT_PASS (pass_tree_loop_init);
> > + NEXT_PASS (pass_lim);
> > + NEXT_PASS (pass_copy_prop);
> > + NEXT_PASS (pass_lim);
> > + NEXT_PASS (pass_copy_prop);
> > 
> > iterate lim/copyprop twice?!  Why's that needed?
> > 
> 
> I've managed to eliminate the last pass_copy_prop, but not pass_lim. I've
> added a comment:
> ...
>   /* We use pass_lim to rewrite in-memory iteration and reduction
>  variable accesses in loops into local variable accesses.
>  However, a single pass instantiation manages to do this only for
>  one loop level, so we use pass_lim twice to at least be able to
>  handle a loop nest with a depth of two.  */
>   NEXT_PASS (pass_lim);
>   NEXT_PASS (pass_copy_prop);
>   NEXT_PASS (pass_lim);
> ...

Huh.  Testcase?  LIM is perfectly able to handle nests.

> > + NEXT_PASS (pass_scev_cprop);
> > 
> > What's that for?  It's supposed to help removing loops - I don't
> > expect kernels to vanish.
> 
> I'm using pass_scev_cprop for the "final value replacement" functionality.
> Added comment.

That functionality is intended to enable loop removal.

> > 
> > + NEXT_PASS (pass_tree_loop_done);
> > + NEXT_PASS (pass_dominator_oacc_kernels);
> > 
> > Three times DOM?  No please.  I wonder why you don't run oacc_kernels
> > after FRE and drop the initial DOM(s).
> > 
> 
> Done. There's just one pass_dominator_oacc_kernels left now.
> 
> > + NEXT_PASS (pass_dce);
> > + NEXT_PASS (pass_tree_loop_init);
> > + NEXT_PASS (pass_parallelize_loops_oacc_kernels);
> > + NEXT_PASS (pass_expand_omp_ssa);
> > + NEXT_PASS (pass_tree_loop_done);
> > 
> > The switches into/outof tree_loop also look odd to me, but well
> > (they'll be controlled by -ftree-loop-optimize)).
> > 
> 
> I've eliminated all the uses for pass_tree_loop_init/pass_tree_loop_done in
> the pass group. Instead, I've added conditional loop optimizer setup in:
> -  pass_lim and pass_scev_cprop (added in this patch), and
> - pass_parallelize_loops_oacc_kernels (added in patch "Add
>   pass_parallelize_loops_oacc_ker

Re: [PATCH] Allow embedded timestamps by C/C++ macros to be set externally (2)

2015-11-16 Thread Bernd Schmidt

On 11/15/2015 11:14 PM, Dhole wrote:

gcc/c-family/ChangeLog:

2015-10-10  Eduard Sanou


I can't find a previous change from you in the sources, so the first 
question would be whether you've gone through the copyright assignment 
process.



Bernd


Re: [PATCH, 0/6] Remove first_pass_instance

2015-11-16 Thread Tom de Vries

On 16/11/15 13:09, Bernd Schmidt wrote:

On 11/15/2015 11:55 AM, Tom de Vries wrote:

[ was: Re: [PATCH] Remove first_pass_instance from pass_vrp ]

This patch series removes first_pass_instance.

  1  Remove first_pass_instance from pass_vrp
  2  Remove first_pass_instance from pass_reassoc
  3  Remove first_pass_instance from pass_dominator
  4  Remove first_pass_instance from pass_object_sizes
  5  Remove first_pass_instance from pass_ccp
  6  Remove first_pass_instance


In 5/6 please retain the comment about memory usage.


[ FWIW, I moved that comment to passes.def, since I thought it was more 
appropriate there. ]


Done, comitted as attached.

Thanks,
- Tom
Remove first_pass_instance from pass_ccp

2015-11-15  Tom de Vries  

	* passes.def: Add arg to pass_ccp pass instantiation.
	* tree-ssa-ccp.c (ccp_finalize): Add param nonzero_p.  Use nonzero_p
	instead of first_pass_instance.
	(do_ssa_ccp): Add and handle param nonzero_p.
	(pass_ccp::pass_ccp): Initialize nonzero_p.
	(pass_ccp::set_pass_param): New member function.  Set nonzero_p.
	(pass_ccp::execute): Call do_ssa_ccp with extra arg.
	(pass_ccp::nonzero_p): New private member.

---
 gcc/passes.def | 10 ++
 gcc/tree-ssa-ccp.c | 25 +
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 64883a7..17027786 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -78,7 +78,9 @@ along with GCC; see the file COPYING3.  If not see
   PUSH_INSERT_PASSES_WITHIN (pass_all_early_optimizations)
 	  NEXT_PASS (pass_remove_cgraph_callee_edges);
 	  NEXT_PASS (pass_object_sizes, true /* insert_min_max_p */);
-	  NEXT_PASS (pass_ccp);
+	  /* Don't record nonzero bits before IPA to avoid
+	 using too much memory.  */
+	  NEXT_PASS (pass_ccp, false /* nonzero_p */);
 	  /* After CCP we rewrite no longer addressed locals into SSA
 	 form if possible.  */
 	  NEXT_PASS (pass_forwprop);
@@ -157,7 +159,7 @@ along with GCC; see the file COPYING3.  If not see
   /* Initial scalar cleanups before alias computation.
 	 They ensure memory accesses are not indirect wherever possible.  */
   NEXT_PASS (pass_strip_predict_hints);
-  NEXT_PASS (pass_ccp);
+  NEXT_PASS (pass_ccp, true /* nonzero_p */);
   /* After CCP we rewrite no longer addressed locals into SSA
 	 form if possible.  */
   NEXT_PASS (pass_complete_unrolli);
@@ -209,7 +211,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt);
-  NEXT_PASS (pass_ccp);
+  NEXT_PASS (pass_ccp, true /* nonzero_p */);
   /* After CCP we rewrite no longer addressed locals into SSA
 	 form if possible.  */
   NEXT_PASS (pass_cse_sincos);
@@ -319,7 +321,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_complex);
   NEXT_PASS (pass_lower_vector_ssa);
   /* Perform simple scalar cleanup which is constant/copy propagation.  */
-  NEXT_PASS (pass_ccp);
+  NEXT_PASS (pass_ccp, true /* nonzero_p */);
   NEXT_PASS (pass_object_sizes);
   /* Fold remaining builtins.  */
   NEXT_PASS (pass_fold_builtins);
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index d09fab1..7b6b451 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -886,12 +886,12 @@ do_dbg_cnt (void)
 
 
 /* Do final substitution of propagated values, cleanup the flowgraph and
-   free allocated storage.
+   free allocated storage.  If NONZERO_P, record nonzero bits.
 
Return TRUE when something was optimized.  */
 
 static bool
-ccp_finalize (void)
+ccp_finalize (bool nonzero_p)
 {
   bool something_changed;
   unsigned i;
@@ -912,7 +912,7 @@ ccp_finalize (void)
 	  && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
 		  /* Don't record nonzero bits before IPA to avoid
 		 using too much memory.  */
-		  || first_pass_instance)))
+		  || !nonzero_p)))
 	continue;
 
   val = get_value (name);
@@ -2394,16 +2394,17 @@ ccp_visit_stmt (gimple *stmt, edge *taken_edge_p, tree *output_p)
 }
 
 
-/* Main entry point for SSA Conditional Constant Propagation.  */
+/* Main entry point for SSA Conditional Constant Propagation.  If NONZERO_P,
+   record nonzero bits.  */
 
 static unsigned int
-do_ssa_ccp (void)
+do_ssa_ccp (bool nonzero_p)
 {
   unsigned int todo = 0;
   calculate_dominance_info (CDI_DOMINATORS);
   ccp_initialize ();
   ssa_propagate (ccp_visit_stmt, ccp_visit_phi_node);
-  if (ccp_finalize ())
+  if (ccp_finalize (nonzero_p))
 todo = (TODO_cleanup_cfg | TODO_update_ssa);
   free_dominance_info (CDI_DOMINATORS);
   return todo;
@@ -2429,14 +2430,22 @@ class pass_ccp : public gimple_opt_pass
 {
 public:
   pass_ccp (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_ccp, ctxt)
+: gimple_opt_pass (pass_data_ccp, ctxt), nonzero_p (false)
   {}
 
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_ccp (m_ctxt)

Re: [PATCH][i386]Migrate reduction optabs to reduc__scal

2015-11-16 Thread Kirill Yukhin
Hello Alan,
On 16 Nov 12:23, Alan Lawrence wrote:
> On 03/11/15 14:27, Alan Lawrence wrote:
> >This migrates the various reduction optabs in sse.md to use the 
> >reduce-to-scalar
> >form. I took the straightforward approach (equivalent to the migration code 
> >in
> >expr.c/optabs.c) of generating a vector temporary, using the existing code to
> >reduce to that, and extracting lane 0, in each pattern.
> >
> >Bootstrapped + check-gcc + check-g++.
> >
> >Ok for trunk?
Your patch is OK for trunk.

Sorry for missing the patch.

--
Thanks, K
> >
> >gcc/ChangeLog:
> >
> > * config/i386/sse.md (reduc_splus_v8df): Rename to...
> > (reduc_plus_scal_v8df): ...here; reduce to temp and extract scalar.
> >
> > (reduc_splus_v4df): Rename to...
> > (reduc_plus_scal_v4df): ...here; reduce to temp and extract scalar.
> >
> > (reduc_splus_v2df): Rename to...
> > (reduc_plus_scal_v2df): ...here; reduce to temp and extract scalar.
> >
> > (reduc_splus_v16sf): Rename to...
> > (reduc_plus_scal_v16sf): ...here; reduce to temp and extract scalar.
> >
> > (reduc_splus_v8sf): Rename to...
> > (reduc_plus_scal_v8sf): ...here; reduce to temp and extract scalar.
> >
> > (reduc_splus_v4sf): Rename to...
> > (reduc_plus_scal_v4sf): ...here; reduce to temp and extract scalar.
> >
> > (reduc__, all 3 variants): Rename each to...
> > (reduc__scal_): ...here; reduce to temp and extract scalar.
> >
> > (reduc_umin_v8hf): Rename to...
> > (reduc_umin_scal_v8hf): ...here; reduce to temp and extract scalar.
> >---
> >  gcc/config/i386/sse.md | 82 
> > +++---
> >  1 file changed, 51 insertions(+), 31 deletions(-)
> >


Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m

2015-11-16 Thread Andre Vieira

On 16/11/15 12:07, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote:

Hi,

   This patch changes the target support mechanism to make it
recognize any ARM 'M' profile as a non-neon supporting target. The
current check only tests for armv6 architectures and earlier, and
does not account for armv7-m.

   This is correct because there is no 'M' profile that supports neon
and the current test is not sufficient to exclude armv7-m.

   Tested by running regressions for this testcase for various ARM targets.

   Is this OK to commit?

   Thanks,
   Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

 * gcc/testsuite/lib/target-supports.exp
   (check_effective_target_arm_neon_ok_nocache): Added check
   for M profile.



 From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 13 Nov 2015 11:16:34 +
Subject: [PATCH] Disable neon testing for armv7-m

---
  gcc/testsuite/lib/target-supports.exp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } {
int dummy;
/* Avoid the case where a test adds -mfpu=neon, but the 
toolchain is
   configured for -mcpu=arm926ej-s, for example.  */
-   #if __ARM_ARCH < 7
+   #if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
#error Architecture too old for NEON.


Could you fix this #error message while you're here?

Why can't we change this test to look for the __ARM_NEON macro from ACLE:

#if __ARM_NEON < 1
   #error NEON is not enabled
#endif

Thanks,
James



There is a check for this already: 'check_effective_target_arm_neon'. I 
think the idea behind arm_neon_ok is to check whether the hardware would 
support neon, whereas arm_neon is to check whether neon was enabled, 
i.e. -mfpu=neon was used or an -mcpu was passed that has neon enabled by 
default.


The comments for 'check_effective_target_arm_neon_ok_nocache' highlight 
this, though maybe the comments for check_effective_target_arm_neon 
could be better.


# Return 1 if this is an ARM target supporting -mfpu=neon
# -mfloat-abi=softfp or equivalent options.  Some multilibs may be
# incompatible with these options.  Also set et_arm_neon_flags to the
# best options to add.

proc check_effective_target_arm_neon_ok_nocache
...
/* Avoid the case where a test adds -mfpu=neon, but the toolchain is
   configured for -mcpu=arm926ej-s, for example.  */
...


and

# Return 1 if this is a ARM target with NEON enabled.

proc check_effective_target_arm_neon
...
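
For illustration only, a hypothetical test skeleton (not an existing testcase) combining the two: arm_neon_ok gates whether the test runs at all, and the arm_neon option set supplies the flags it recorded.

```c
/* { dg-do run } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-add-options arm_neon } */
/* Hypothetical skeleton: arm_neon_ok answers "could this toolchain
   target NEON at all?", while the arm_neon add-options actually turns
   it on (e.g. -mfpu=neon -mfloat-abi=softfp) for this one test.  */
```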

Cheers,
Andre



[PATCH][AArch64] PR target/68363 Check that argument is real INSN in aarch64_madd_needs_nop

2015-11-16 Thread Kyrill Tkachov

Hi all,

This RTL checking ICE occurs when processing an rtx_insn* in 
aarch64_madd_needs_nop that apparently
holds JUMP_TABLE_DATA. This shouldn't be passed to recog. So instead reject it 
with the INSN_P check.
Such rtx_insns are not relevant to the code in aarch64_madd_needs_nop anyway.

Bootstrapped and tested on trunk, GCC 5 and 4.9 configured with 
--enable-fix-cortex-a53-835769
--enable-checking=yes,rtl.

Ok for all branches? (the testcase passes on 4.8, presumably before the 
conversion to rtx_insn)

Thanks,
Kyrill

2015-11-16  Kyrylo Tkachov  

PR target/68363
* config/aarch64/aarch64.c (aarch64_madd_needs_nop): Reject arguments
that are not INSN_P.

2015-11-16  Kyrylo Tkachov  

PR target/68363
* gcc.target/aarch64/pr68363_1.c: New test.
commit a2fa69050c8a0dc824f5892f01385acdf35c54cc
Author: Kyrylo Tkachov 
Date:   Mon Nov 16 09:18:14 2015 +

[AArch64] PR target/68363 Check that argument is real INSN in aarch64_madd_needs_nop

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 19dfc51..0b02f1e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10105,7 +10105,7 @@ aarch64_madd_needs_nop (rtx_insn* insn)
   if (!TARGET_FIX_ERR_A53_835769)
 return false;
 
-  if (recog_memoized (insn) < 0)
+  if (!INSN_P (insn) || recog_memoized (insn) < 0)
 return false;
 
   attr_type = get_attr_type (insn);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr68363_1.c b/gcc/testsuite/gcc.target/aarch64/pr68363_1.c
new file mode 100644
index 000..bb294b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr68363_1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mfix-cortex-a53-835769" } */
+
+int
+foo (int i)
+{
+  switch (i)
+{
+case 0:
+case 2:
+case 5:
+  return 0;
+case 7:
+case 11:
+case 13:
+  return 1;
+}
+  return -1;
+}


Re: [PATCH] g++.dg/init/vbase1.C and g++.dg/cpp/ucn-1.C

2015-11-16 Thread David Edelsohn
On Mon, Nov 16, 2015 at 4:15 AM, Eric Botcazou  wrote:
>> No RISC architecture can store directly to MEM, so the expected RTL in
>> g++.dg/init/vbase1.C is wrong.  I am adding XFAIL for PowerPC.  This
>> probably should be disabled for ARM and other RISC architectures.
>
> Some of them can store 0 directly to MEM though, for example SPARC.

As Mike said, this testcase isn't portable and needs to be limited to
the targets that support the particular idiom used in this testcase.

Thanks, David


Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-16 Thread Richard Biener
On Mon, Nov 16, 2015 at 12:43 PM, Andre Vieira
 wrote:
> On 13/11/15 10:34, Richard Biener wrote:
>>
>> On Thu, Nov 12, 2015 at 4:07 PM, Andre Vieira
>>  wrote:
>>>
>>> Hi,
>>>
>>>This patch changes this testcase to make sure LTO will not optimize
>>> away
>>> the assignment of the local array to a global variable which was
>>> introduced
>>> to make sure stack space was made available for the test to work.
>>>
>>>This is correct because LTO is supposed to optimize this global away
>>> as at
>>> link time it knows this global will never be read. By adding a read of
>>> the
>>> global, LTO will no longer optimize it away.
>>
>>
>> But that's only because we can't see that bar doesn't clobber it, else
>> we would optimize away the check and get here again.  Much better
>> to mark 'dummy' with __attribute__((used)) and go away with 'g' entirely.
>>
>> Richard.
>>
>>>Tested by running regressions for this testcase for various ARM
>>> targets.
>>>
>>>Is this OK to commit?
>>>
>>>Thanks,
>>>Andre Vieira
>>>
>>> gcc/testsuite/ChangeLog:
>>> 2015-11-06  Andre Vieira  
>>>
>>>  * gcc.dg/torture/stackalign/builtin-return-1.c: Added read
>>>to global such that a write is not optimized away by LTO.
>>
>>
> Hi Richard,
>
> That would be great but __attribute__ ((used)) can only be used for static
> variables and making dummy static would defeat the purpose here.

I see.  What about volatile?

> Cheers,
> Andre
>


Re: [PATCH, x86] Fix posix_memalign declaration in mm_malloc.h

2015-11-16 Thread Bernd Schmidt

On 11/13/2015 11:30 PM, Marc Glisse wrote:

+__asm__("posix_memalign");


Can't say I like the __asm__ after the #if/#else/#endif block.


It might also cause trouble if some systems like to prepend an
underscore, maybe?


Yeah, that's one of the things I had in mind when I suggested moving 
this code to libgcc.a instead. Referring to a library symbol in this way 
makes me nervous.



Bernd


Re: [PATCH 6/6] Make SRA replace constant-pool loads

2015-11-16 Thread Richard Biener
On Thu, Nov 12, 2015 at 7:35 PM, Alan Lawrence  wrote:
> On 06/11/15 16:29, Richard Biener wrote:
 2) You should be able to use fold_ctor_reference directly (in place
>>> of
 all your code
 in case offset and size are readily available - don't remember
>>> exactly how
 complete scalarization "walks" elements).  Alternatively use
 fold_const_aggregate_ref.
>
> Well, yes I was able to remove subst_constant_pool_initial by using
> fold_ctor_reference, with some fairly strict checks about the access 
> expressions
> found while scanning the input. I still needed to either deep-scan the 
> constant
> value itself, or allow completely_scalarize to fail, due to Ada c460010 where
> a variable has DECL_INITIAL: {VIEW_CONVERT_EXPR({1, 2, 3})}
> (that is, a CONSTRUCTOR whose only element is a V.C.E. of another 
> CONSTRUCTOR).
>
> However:
>
>> I tried to suggest creating piecewise loads from the constant pool as 
>> opposed to the aggregate one.  But I suppose the difficulty is to determine 
>> 'pieces', not their values (that followup passes and folding can handle 
>> quite efficiently).
>
> I've now got this working, and it simplifies the patch massively :).
>
> We now replace e.g. a constant-pool load a = *.LC0, with a series of loads 
> e.g.
> SR_a1 = SRlc0_1
> SR_a2 = SRlc0_2
> etc. etc. (well those wouldn't be quite the names, but you get the idea.)
>
> And then at the start of the function we initialise
> SRlc0_1 = *.LC0[0]
> SRlc0_2 = *.LC0[1]
> etc. etc.
> - I needed to use SRlc0_1 etc. variables because some of the constant-pool 
> loads
> coming from Ada are rather more complicated than 'a = *.LC0' and substituting
> the access expression (e.g. '*.LC0[0].foo'), rather than the replacement_decl,
> into those led to just too many verify_gimple failures.
>
> However, the initialization seems to work well enough, if I put a check into
> sra_modify_expr to avoid changing 'SRlc0_1 = *.LC0[0]' into 'SRlc0_1 = 
> SRlc0_1'
> (!). Slightly hacky perhaps but harmless as there cannot be a statement 
> writing
> to a scalar replacement prior to sra_modify_expr.
>
> Also I had to prevent writing scalar replacements back to the constant pool
> (gnat opt31.adb).
>
> Finally - I've left the disqualified_constants code in, but am now only using 
> it
> for an assert...so I guess it'd be reasonable to take that out. In an ideal
> world we could do those checks only in checking builds but allocating bitmaps
> only for checking builds feels like it would make checking vs non-checking
> diverge too much.
>
> This is now passing bootstrap+check-{gcc,g++,fortran} on AArch64, and the same
> plus Ada on x64_64 and ARM, except for some additional guality failures in
> pr54970-1.c on AArch64 (tests which are already failing on ARM) - I haven't 
> had
> any success in figuring those out yet, suggestions welcome.
>
> However, without the DOM patches, this does not fix the oiginal 
> ssa-dom-cse-2.c
> testcase, even though the load is scalarized in esra and the individual 
> element
> loads resolved in ccp2. I don't think I'll be able to respin those between now
> and stage 1 ending on Saturday, so hence I've had to drop the testsuite 
> change,
> and can only / will merely describe the remaining problem in PR/63679. I'm
> now benchmarking on AArch64 to see what difference this patch makes on its 
> own.
>
> Thoughts?

The new function needs a comment.  Otherwise, given there are no testcases,
I wonder what this does to nested constructs like multi-dimensional arrays
in a struct.

The patch looks _much_ better now though.  But I also wonder what the effects
on code-generation are for the non-reg-type load of part case.

I'd say the patch is ok with the comment added and a testcase (or two) verifying
it works as desired.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR target/63679
> * tree-sra.c (disqualified_constants): New.
> (sra_initialize): Allocate disqualified_constants.
> (sra_deinitialize): Free disqualified_constants.
> (disqualify_candidate): Update disqualified_constants when 
> appropriate.
> (create_access): Scan for constant-pool entries as we go along.
> (scalarizable_type_p): Add check against type_contains_placeholder_p.
> (maybe_add_sra_candidate): Allow constant-pool entries.
> (initialize_constant_pool_replacements): New.
> (sra_modify_expr): Avoid mangling assignments created by previous.
> (sra_modify_function_body): Call 
> initialize_constant_pool_replacements.
> ---
>  gcc/tree-sra.c | 93 
> --
>  1 file changed, 90 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 1d4a632..aa60f06 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -325,6 +325,10 @@ candidate (unsigned uid)
> those which cannot be (because they are and need be used as a whole).  */
>  static bitmap should_scalarize_away_bitmap, cannot_scala

Re: Use combined_fn in tree-vrp.c

2015-11-16 Thread Richard Biener
On Fri, Nov 13, 2015 at 12:27 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Tue, Nov 10, 2015 at 1:09 AM, Bernd Schmidt  wrote:
>>> On 11/07/2015 01:46 PM, Richard Sandiford wrote:

 @@ -3814,8 +3817,8 @@ extract_range_basic (value_range *vr, gimple *stmt)
   break;
   /* Both __builtin_ffs* and __builtin_popcount return
  [0, prec].  */
 -   CASE_INT_FN (BUILT_IN_FFS):
 -   CASE_INT_FN (BUILT_IN_POPCOUNT):
 +   CASE_CFN_FFS:
 +   CASE_CFN_POPCOUNT:
   arg = gimple_call_arg (stmt, 0);
   prec = TYPE_PRECISION (TREE_TYPE (arg));
   mini = 0;
>>>
>>>
>>> So let me see if I understood this. From what we discussed the purpose of
>>> these new internal functions is that they can have vector types. If so,
>>> isn't this code (here and elsewhere) which expects integers potentially
>>> going to be confused?
>>
>> We indeed need to add additional checks to most users of CASE_CFN_* to cover
>> the bigger freedom that exists with respect to types.
>
> The code above is OK because it's only handling integral types.
> A vector popcount or vector ffs must return a vector result.
>
>> Richard, please audit all the cases you change for that.
>
> I had another look and the only problematical uses I could see are the
> match.pd ones.  E.g.:
>
>  /* Optimize logN(func()) for various exponential functions.  We
> want to determine the value "x" and the power "exponent" in
> order to transform logN(x**exponent) into exponent*logN(x).  */
>  (for logs (LOG  LOG   LOG   LOG2 LOG2  LOG2  LOG10 LOG10)
>   exps (EXP2 EXP10 POW10 EXP  EXP10 POW10 EXP   EXP2)
>   (simplify
>(logs (exps @0))
>(with {
>  tree x;
>  switch (exps)
>{
>CASE_CFN_EXP:
>  /* Prepare to do logN(exp(exponent)) -> exponent*logN(e).  */
>  x = build_real_truncate (type, dconst_e ());
>  break;
>CASE_CFN_EXP2:
>  /* Prepare to do logN(exp2(exponent)) -> exponent*logN(2).  */
>  x = build_real (type, dconst2);
>  break;
>CASE_CFN_EXP10:
>CASE_CFN_POW10:
>  /* Prepare to do logN(exp10(exponent)) -> exponent*logN(10).  */
>  {
>REAL_VALUE_TYPE dconst10;
>real_from_integer (&dconst10, VOIDmode, 10, SIGNED);
>x = build_real (type, dconst10);
>  }
>  break;
>default:
>  gcc_unreachable ();
>}
>  }
> (mult (logs { x; }) @0
>
> Here we could either add a SCALAR_FLOAT_TYPE_P check or extend
> build_real to handle vector types.  Which do you think would be best?

I don't like extending build_real.  For now add a SCALAR_FLOAT_TYPE_P.

Or maybe extend build_real ...

Well.  Go with SCALAR_FLOAT_TYPE_P.

Thanks,
Richard.

>
> Thanks,
> Richard
>
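As an aside, the logN(expN(x)) identities that the quoted match.pd pattern
relies on can be spot-checked numerically. This is an illustrative Python
sketch only (real-valued, finite-math assumptions; it has nothing to do with
GCC's internal representation):

```python
import math

# Illustrative only: numerically spot-check the identities behind the
# match.pd pattern quoted above, logN(exps(x)) == x * logN(base).
x = 0.75
checks = [
    (math.log(math.exp(x)),   x * math.log(math.e)),     # log(exp(x))
    (math.log2(2.0 ** x),     x * math.log2(2.0)),       # log2(exp2(x))
    (math.log10(10.0 ** x),   x * math.log10(10.0)),     # log10(exp10(x))
    (math.log(10.0 ** x),     x * math.log(10.0)),       # log(exp10(x))
]
for lhs, rhs in checks:
    assert math.isclose(lhs, rhs), (lhs, rhs)
print("ok")
```

The 10**x cases model both exp10 and pow10, which the pattern treats
identically.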


Re: [PATCH 2/6] Make builtin_vectorized_function take a combined_fn

2015-11-16 Thread Richard Biener
On Fri, Nov 13, 2015 at 1:27 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Nov 9, 2015 at 5:25 PM, Richard Sandiford
>>  wrote:
>>> This patch replaces the fndecl argument to builtin_vectorized_function
>>> with a combined_fn and gets the vectoriser to call it for internal
>>> functions too.  The patch also moves vectorisation of machine-specific
>>> built-ins to a new hook, builtin_md_vectorized_function.
>>>
>>> I've attached a -b version too since that's easier to read.
>>
>> @@ -42095,8 +42018,7 @@ ix86_builtin_vectorized_function (tree fndecl,
>> tree type_out,
>>
>>/* Dispatch to a handler for a vectorization library.  */
>>if (ix86_veclib_handler)
>> -return ix86_veclib_handler ((enum built_in_function) fn, type_out,
>> -   type_in);
>> +return ix86_veclib_handler (combined_fn (fn), type_out, type_in);
>>
>>return NULL_TREE;
>>  }
>>
>> fn is already a combined_fn?  Why does the builtin_vectorized_function
>> not take one but an unsigned int?
>
> Not everything that includes the target headers includes tree.h.
> This is like builtin_conversion taking a tree_code as an unsigned int.
>
>> @@ -42176,11 +42077,12 @@ ix86_veclibabi_svml (enum built_in_function
>> fn, tree type_out, tree type_in)
>>return NULL_TREE;
>>  }
>>
>> -  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
>> +  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
>> +  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>>
>> -  if (fn == BUILT_IN_LOGF)
>> +  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF)
>>
>> with 'fn' now a combined_fn how is this going to work with IFNs?
>
> By this point we already know that the function has one of the
> supported modes.  A previous patch extended matchfn_built_in
> to handle combined_fns.  E.g.
>
>   mathfn_built_in (float_type_node, IFN_SQRT)
>
> returns BUILT_IN_SQRTF.
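The lookup being described can be modelled roughly as follows. This is a
hypothetical Python sketch, not the real mathfn_built_in, which operates on
GCC tree types rather than strings:

```python
# Hypothetical sketch (not GCC's mathfn_built_in): model the lookup from a
# (scalar float type, combined function name) pair to the type-suffixed
# builtin enumerator name.
SUFFIX = {"float": "F", "double": "", "long double": "L"}

def mathfn_built_in(type_name, combined_fn):
    base = combined_fn.split("_", 1)[1]        # "IFN_SQRT" -> "SQRT"
    return "BUILT_IN_" + base + SUFFIX[type_name]

print(mathfn_built_in("float", "IFN_SQRT"))    # BUILT_IN_SQRTF
```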

Ah, I missed that I suppose.

>> +/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
>> +
>> +static tree
>> +rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
>> +  tree type_in)
>> +{
>>
>> any reason you are using a fndecl for this hook instead of the function code?
>
> It just seems more helpful to pass the fndecl when we have it.
> It's cheap to go from the decl to the code but it's less cheap
> to go the other way.

Ok, but for the other hook you changed it ...

>> @@ -1639,20 +1639,20 @@ vect_finish_stmt_generation (gimple *stmt,
>> gimple *vec_stmt,
>>  tree
>>  vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
>>  {
>> -  tree fndecl = gimple_call_fndecl (call);
>> -
>> -  /* We only handle functions that do not read or clobber memory -- i.e.
>> - const or novops ones.  */
>> -  if (!(gimple_call_flags (call) & (ECF_CONST | ECF_NOVOPS)))
>> +  /* We only handle functions that do not read or clobber memory.  */
>> +  if (gimple_vuse (call))
>>  return NULL_TREE;
>>
>> -  if (!fndecl
>> -  || TREE_CODE (fndecl) != FUNCTION_DECL
>> -  || !DECL_BUILT_IN (fndecl))
>> -return NULL_TREE;
>> +  combined_fn fn = gimple_call_combined_fn (call);
>> +  if (fn != CFN_LAST)
>> +return targetm.vectorize.builtin_vectorized_function
>> +  (fn, vectype_out, vectype_in);
>>
>> -  return targetm.vectorize.builtin_vectorized_function (fndecl, vectype_out,
>> -   vectype_in);
>> +  if (gimple_call_builtin_p (call, BUILT_IN_MD))
>> +return targetm.vectorize.builtin_md_vectorized_function
>> +  (gimple_call_fndecl (call), vectype_out, vectype_in);
>> +
>> +  return NULL_TREE;
>>
>> Looking at this and the issues above wouldn't it be easier to simply
>> pass the call stmt to the hook (which then can again handle
>> both normal and target builtins)?  And it has context available
>> (actual arguments and number of arguments for IFN calls).
>
> I'd rather not do that, since it means we have to construct a gcall *
> in cases where we're not asking about a straight-forward vectorisation
> of a preexisting scalar statement.
>
> The number of arguments is an inherent property of the function,
> it doesn't require access to a particular call.
>  The hook tells
> us what vector types we're using (and by extension what types
> the scalar op would have).

... so merging the hooks by passing both the combined fn code
and the decl would be possible?  The decl can be NULL if
the fn code is not CFN_LAST and if it is CFN_LAST then the decl
may be a target builtin?

Maybe I'm just too worried about that clean separation...  so decide
for yourselves here.

Thus, ok.

Thanks,
Richard.

> Thanks,
> Richard
>


Re: Extend tree-call-cdce to calls whose result is used

2015-11-16 Thread Richard Biener
On Fri, Nov 13, 2015 at 2:12 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz  wrote:
>>> Hi,
>>>
>>> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>>>
 +static bool
 +can_use_internal_fn (gcall *call)
 +{
 +  /* Only replace calls that set errno.  */
 +  if (!gimple_vdef (call))
 +return false;
>>>
>>> Oh, I managed to confuse this in my head while reading the patch.  So,
>>> hmm, you don't actually replace the builtin with an internal function
>>> (without the condition) under no-errno-math?  Does something else do that?
>>> Because otherwise that seems an unnecessary restriction?
>>>
 >> r229916 fixed that for the non-EH case.
 >
 > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
 > original dominator of the EH destination was the call block it moves,
 > otherwise it remains unchanged.

 The target of the edge is easy in itself, I agree, but that isn't
 necessarily the only affected block, if the EH handler doesn't
 exit or rethrow.
>>>
>>> You're worried the non-EH and the EH regions merge again, right?  Like so:
>>>
>>> before change:
>>>
>>> BB1: throwing-call
>>>  fallthru/   \EH
>>> BB2   BBeh
>>>  |   /\ (stuff in EH-region)
>>>  | /some path out of EH region
>>>  | /--/
>>> BB3
>>>
>>> Here, BB3 must at least be dominated by BB1 (the throwing block), or by
>>> something further up (when there are other side-entries to the path
>>> BB2->BB3 or into the EH region).  When further up, nothing changes, when
>>> it's BB1, then it's afterwards dominated by the BB containing the
>>> condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
>>> which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
>>> BBcall might still be BB1 and doesn't lead to changes in the dom tree.
>>>
 > Currently we have quite some of such passes (reassoc, forwprop,
 > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
 > and others), but they are all handling only special situations in one
 > way or the other.  pass_fold_builtins is another one, but it seems
 > most related to what you want (replacing a call with something else),
 > so I thought that'd be the natural choice.

 Well, to be pedantic, it's not really replacing the call.  Except for
 the special case of targets that support direct assignments to errno,
 it keeps the original call but ensures that it isn't usually executed.
 From that point of view it doesn't really seem like a fold.

 But I suppose that's just naming again :-).  And it's easily solved with
 s/fold/rewrite/.
>>>
>>> Exactly, in my mind pass_fold_builtin (like many of the others I
>>> mentioned) doesn't do folding but rewriting :)
>>
>> So I am replying here to the issue of where to do the transform call_cdce
>> does and the one Richard wants to add.  For example we "lower"
>> posix_memalign as early as GIMPLE lowering (that's before CFG construction).
>> We also lower sincos to cexpi during GENERIC folding (or if that is dropped
>> either GIMPLE lowering or GIMPLE folding during gimplification would be
>> appropriate).
>>
>> Now, with offloading we have to avoid creating target dependencies before
>> LTO stream-out (thus no IFN replacements before that - not sure if
>> Richards patches have an issue there already).
>
> No, this patch was the earliest point at which we converted to internal
> functions.  The idea was to make code treat ECF_PURE built-in functions
> and internal functions as being basically equivalent.  There's therefore
> not much benefit to doing a straight replacement of one with the other
> during either GENERIC or gimple.  Instead the series only used internal
> functions for things that built-in functions couldn't do, specifically:
>
> - the case used in this patch, to optimise part of a non-pure built-in
>   function using a pure equivalent.
>
> - vector versions of built-in functions.
>
> The cfgexpand patch makes sure that pure built-in functions are expanded
> like internal functions where possible.
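The optimisation being discussed (keep the errno-setting call, but guard it
so the common in-range case takes a pure fast path) can be sketched in
Python, with ValueError standing in for errno/EDOM behaviour. This is
illustrative only, not GCC's implementation:

```python
import math

def guarded_sqrt(x):
    # Fast path: pure computation, no errno side effect to preserve.
    if x >= 0.0:
        return x ** 0.5          # stands in for the pure internal function
    # Slow path: the original errno-setting call survives here, so the
    # domain-error behaviour is unchanged (ValueError models errno/EDOM).
    return math.sqrt(x)

print(guarded_sqrt(4.0))   # prints: 2.0
```

The point is that the library call is not replaced; it is merely made
unlikely to execute.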
>
>> Which would leave us with a lowering stage early in the main
>> optimization pipeline - I think fold_builtins pass is way too late but
>> any "folding" pass will do (like forwprop or backprop where the latter
>> might be better because it might end up computing FP "ranges" to
>> improve the initial lowering code).
>
> This isn't at all related to what backprop is doing though.
> backprop is about optimising definitions based on information
> about all uses.
>
> Does fold_builtins need to be as late as it is?

Not sure.  It folds remaining __builtin_constant_p stuff for example.

>> Of course call_cdce is as good as long as it still exists.
>
> Does this meann that you're not against the patch in principle
> (i.e. keeping 

Re: [PATCH] g++.dg/init/vbase1.C and g++.dg/cpp/ucn-1.C

2015-11-16 Thread Renlin Li

Hi David,

On 14/11/15 00:33, David Edelsohn wrote:

No RISC architecture can store directly to MEM, so the expected RTL in
g++.dg/init/vbase1.C is wrong.  I am adding XFAIL for PowerPC.  This
probably should be disabled for ARM and other RISC architectures.


I observed the same problem on arm.

This passes for aarch64 and mips as they have a zero register to do that.
However, other RISC targets might not have that feature, for example arm
and RS6000 in this case.


https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03239.html

Regards,
Renlin Li


Dollar sign is not a valid identifier on AIX, so g++.dg/cpp/ucn-1.C
will produce an additional error on AIX.

* g++.dg/init/vbase1.C: XFAIL powerpc*-*-*.
* g++.dg/cpp/ucn-1.C: Expect error for dollar sign identifier on AIX.

Thanks, David

Index: init/vbase1.C
===
--- init/vbase1.C   (revision 230366)
+++ init/vbase1.C   (working copy)
@@ -42,4 +42,4 @@
  // Verify that the SubB() mem-initializer is storing 0 directly into
  // this->D.whatever rather than into a stack temp that is then copied into the
  // base field.
-// { dg-final { scan-rtl-dump "set
\[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" } }
+// { dg-final { scan-rtl-dump "set
\[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" { xfail
{ powerpc*-*-* } } } }
Index: cpp/ucn-1.C
===
--- cpp/ucn-1.C (revision 230366)
+++ cpp/ucn-1.C (working copy)
@@ -7,8 +7,9 @@
"\u0041";// 'A' UCN is OK in string literal
'\u0041';// also OK in character literal

-  int c\u0041c;  // { dg-error "not valid in an
identifier" }
-  int c\u0024c;  // $ is OK; not part of basic
source char set
+  int c\u0041c;// { dg-error "not valid in an identifier" }
+   // $ is OK on most targets; not part of basic source char set
+  int c\u0024c;// { dg-error "not valid in an identifier" {
target { powerpc-ibm-aix* } } }

U"\uD800"; // { dg-error "not a valid universal character" }
  }





[PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

2015-11-16 Thread Kyrill Tkachov

Hi all,

In this PR we encounter a wrong-code bug on x86_64. It started with an RTL 
if-conversion patch of mine
which I believe exposed a latent issue in the ree (redundant extension 
elimination) pass.
The different if-conversion behaviour enabled a new cse opportunity
which then produced the RTL that triggered the bad ree behaviour.
The relevant RTL insns before the ree pass look like:

Basic Block A:
(set (reg:HI 0 ax)
 (mem:HI (symbol_ref:DI ("f"
...
(set (reg:SI 3 bx)
 (if_then_else:SI (eq (reg:CCZ 17 flags)
(const_int 0))
(reg:SI 0 ax)
(reg:SI 3 bx)))

(set (reg:SI 4 si)
 (sign_extend:SI (reg:HI 3 bx)))
...
Basic block B (dominated by basic block A):
(set (reg:SI 4 si)
 (sign_extend:SI (reg:QI 0 ax))) /* ax contains symbol "f". */


ree changes that into the broken:
Basic block A:
(set (reg:SI 4 si)
 (sign_extend:SI (mem:HI (symbol_ref:DI ("f")

(set (reg:SI 3 bx)
 (reg:SI 4 si))

...
(set (reg:SI 3 bx)
 (if_then_else:SI (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(reg:SI 0 ax)
(reg:SI 3 bx)))
...
Basic block B:
(set (reg:SI 4 si)
 (sign_extend:SI (reg:QI 0 ax))) /* Insn unchanged by ree, but ax now 
undefined.  */


Note that after ree, register ax is now used uninitialised in basic block A and
the insn that previously set it to symbol "f" has now been transformed into:
(set (reg:SI 4 si)
 (sign_extend:SI (mem:HI (symbol_ref:DI ("f")

I've explained in the comments in the patch what's going on, but the short
version is that trying to change the destination of a defining insn that
feeds into an extend insn is not valid if the defining insn doesn't feed
directly into the extend insn.
In the ree pass the only way this can happen is if there is an intermediate 
conditional
move that the pass tries to handle in a special way.  An equivalent fix would 
have been
to check on that path (when copy_needed in combine_reaching_defs is true) that 
the
state->copies_list vector (that contains the conditional move insns feeding 
into the
extend insn) is empty.
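
For illustration, the effect of the broken transformation can be modelled
outside of RTL. This is a hypothetical Python sketch, not GCC code; register
names mirror the RTL above, and load_value stands in for the
(mem:HI (symbol_ref "f")) load:

```python
# Hypothetical sketch, not GCC code: model the two insn orderings to show
# why the transformed sequence is wrong.

def sext16(v):
    """Sign-extend a 16-bit value."""
    v &= 0xffff
    return v - 0x10000 if v & 0x8000 else v

def original(load_value, flag_z, bx_in):
    regs = {"bx": bx_in}
    regs["ax"] = load_value                   # ax <- mem "f"
    if flag_z:                                # conditional move
        regs["bx"] = regs["ax"]
    regs["si"] = sext16(regs["bx"])           # si <- sext(bx)
    return regs

def transformed(load_value, flag_z, bx_in):
    regs = {"bx": bx_in, "ax": None}          # ax is never written
    regs["si"] = sext16(load_value)           # si <- sext(mem "f")
    regs["bx"] = regs["si"]
    if flag_z:                                # now reads undefined ax
        regs["bx"] = regs["ax"]
    return regs

good = original(0x1234, True, 7)
bad = transformed(0x1234, True, 7)
print(good["bx"], bad["bx"])    # prints: 4660 None
```

In the transformed ordering the conditional move reads ax, which no insn
sets any more; that is the uninitialised use described above.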

This patch is the minimal fix that I could do for this PR and the cases it 
rejects are
cases where we're pretty much guaranteed to miscompile the code, so it doesn't 
reject
any legitimate extension elimination opportunities. I checked that the code 
generation
doesn't change on aarch64 for the whole of SPEC2006 (I did see codegen 
regressions with
previous versions of the patch that were less targeted).

Bootstrapped and tested on x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf,
aarch64-none-linux-gnu.

I've marked some other PRs that are dups of this one in the ChangeLog. If anyone
has any other PRs that are dups of this one please let me know.

Ok for trunk?

Thanks,
Kyrill

2015-11-16  Kyrylo Tkachov  

PR rtl-optimization/68194
PR rtl-optimization/68328
PR rtl-optimization/68185
* ree.c (combine_reaching_defs): Reject copy_needed case if defining
insn does not feed directly into candidate extension insn.

2015-11-16  Kyrylo Tkachov  

PR rtl-optimization/68194
* gcc.dg/pr68194.c: New test.
commit 33131f774705b936afc1a26c145e1214b388771f
Author: Kyrylo Tkachov 
Date:   Fri Nov 13 15:01:47 2015 +

[RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

diff --git a/gcc/ree.c b/gcc/ree.c
index b8436f2..e91d164 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -814,7 +814,30 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 
   rtx tmp_reg = gen_rtx_REG (GET_MODE (SET_DEST (PATTERN (cand->insn))),
  REGNO (SET_DEST (*dest_sub_rtx)));
-  if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn
+
+  /* When transforming:
+	(set (reg1) (expression))
+	 ...
+	(set (reg2) (any_extend (reg1)))
+
+	into
+
+	(set (reg2) (any_extend (expression)))
+	(set (reg1) (reg2))
+	make sure that reg1 from the first set feeds directly into the extend.
+	This may not hold in a situation with an intermediate
+	conditional copy i.e.
+	I1: (set (reg3) (expression))
+	I2: (set (reg1) (cond ? reg3 : reg1))
+	I3: (set (reg2) (any_extend (reg1)))
+
+	where I3 is cand, I1 is def_insn and I2 is a conditional copy.
+	We want to avoid transforming that into:
+	(set (reg2) (any_extend (expression)))
+	(set (reg1) (reg2))
+	(set (reg1) (cond ? reg3 : reg1)).  */
+  if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn)))
+	  || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn
 	return false;
 
   /* The destination register of the extension insn must not be
diff --git a/gcc/testsuite/gcc.dg/pr68194.c b/gcc/testsuite/gcc.dg/pr68194.c
new file mode 100644
index 000..b4855ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68194.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int a, c, d, e, g, h;
+short f;
+
+short
+fn1 (void)
+{
+  int j[2];
+  for (; e; e++)
+if (j[0])
+  

Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-16 Thread Andre Vieira

On 16/11/15 13:33, Richard Biener wrote:

On Mon, Nov 16, 2015 at 12:43 PM, Andre Vieira
 wrote:

On 13/11/15 10:34, Richard Biener wrote:


On Thu, Nov 12, 2015 at 4:07 PM, Andre Vieira
 wrote:


Hi,

This patch changes this testcase to make sure LTO will not optimize
away
the assignment of the local array to a global variable which was
introduced
to make sure stack space was made available for the test to work.

This is correct because LTO is supposed to optimize this global away
as at
link time it knows this global will never be read. By adding a read of
the
global, LTO will no longer optimize it away.



But that's only because we can't see that bar doesn't clobber it, else
we would optimize away the check and get here again.  Much better
to mark 'dummy' with __attribute__((used)) and go away with 'g' entirely.

Richard.


Tested by running regressions for this testcase for various ARM
targets.

Is this OK to commit?

Thanks,
Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira  

  * gcc.dg/torture/stackalign/builtin-return-1.c: Added read
to global such that a write is not optimized away by LTO.




Hi Richard,

That would be great but __attribute__ ((used)) can only be used for static
variables and making dummy static would defeat the purpose here.


I see.  What about volatile?


Cheers,
Andre



Yeah, a 'volatile char dummy[64];' with a read afterwards leads to the
stack still being reserved after LTO and we won't need the global variable.


If you prefer that solution I'll respin the patch.

Cheers,
Andre



[SH][committed] Fix PR 68277

2015-11-16 Thread Oleg Endo
Hi,

The attached patch fixes PR 68277.
Tested by Kaz on trunk on sh4-linux.  I've also done a sanity check on
GCC 5 branch with "make all" on sh-elf.

Committed to trunk as r230425 and to GCC 5 branch as r230426.

Cheers,
Oleg

gcc/ChangeLog:
PR target/68277
* config/sh/sh.md (addsi3_scr): Handle reg overlap of operands[0]
and operands[2].
(*addsi3): Add another insn_and_split variant for reload.Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 230158)
+++ gcc/config/sh/sh.md	(working copy)
@@ -2232,11 +2232,51 @@
 	}
 }
   else if (!reg_overlap_mentioned_p (operands[0], operands[1]))
-emit_move_insn (operands[0], operands[1]);
+{
+  if (!reg_overlap_mentioned_p (operands[0], operands[2]))
+	emit_move_insn (operands[0], operands[1]);
+  else
+	operands[2] = operands[1];
+}
 }
   [(set_attr "type" "arith")])
 
+;; Old reload might generate add insns directly (not through the expander) for
+;; the memory address of complex insns like atomic insns when reloading.
 (define_insn_and_split "*addsi3"
+  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
+	(plus:SI (match_operand:SI 1 "arith_reg_operand" "r")
+		 (match_operand:SI 2 "arith_or_int_operand" "rn")))]
+  "TARGET_SH1 && !sh_lra_p ()
+   && reload_completed
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (plus:SI (match_dup 0) (match_dup 2)))]
+{
+  if (operands[2] == const0_rtx)
+{
+  emit_move_insn (operands[0], operands[1]);
+  DONE;
+}
+
+  if (CONST_INT_P (operands[2]))
+{
+  if (satisfies_constraint_I08 (operands[2]))
+	emit_move_insn (operands[0], operands[1]);
+  else
+	{
+	  emit_move_insn (operands[0], operands[2]);
+	  operands[2] = operands[1];
+	}
+}
+  else if (!reg_overlap_mentioned_p (operands[0], operands[2]))
+emit_move_insn (operands[0], operands[1]);
+  else
+operands[2] = operands[1];
+})
+
+(define_insn_and_split "*addsi3"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
 	(plus:SI (match_operand:SI 1 "arith_reg_operand" "%0,r")
 		 (match_operand:SI 2 "arith_operand" "rI08,Z")))]


Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-16 Thread Richard Biener
On Mon, Nov 16, 2015 at 3:08 PM, Andre Vieira
 wrote:
> On 16/11/15 13:33, Richard Biener wrote:
>>
>> On Mon, Nov 16, 2015 at 12:43 PM, Andre Vieira
>>  wrote:
>>>
>>> On 13/11/15 10:34, Richard Biener wrote:


 On Thu, Nov 12, 2015 at 4:07 PM, Andre Vieira
  wrote:
>
>
> Hi,
>
> This patch changes this testcase to make sure LTO will not optimize
> away
> the assignment of the local array to a global variable which was
> introduced
> to make sure stack space was made available for the test to work.
>
> This is correct because LTO is supposed to optimize this global
> away
> as at
> link time it knows this global will never be read. By adding a read of
> the
> global, LTO will no longer optimize it away.



 But that's only because we can't see that bar doesn't clobber it, else
 we would optimize away the check and get here again.  Much better
 to mark 'dummy' with __attribute__((used)) and go away with 'g'
 entirely.

 Richard.

> Tested by running regressions for this testcase for various ARM
> targets.
>
> Is this OK to commit?
>
> Thanks,
> Andre Vieira
>
> gcc/testsuite/ChangeLog:
> 2015-11-06  Andre Vieira  
>
>   * gcc.dg/torture/stackalign/builtin-return-1.c: Added read
> to global such that a write is not optimized away by LTO.



>>> Hi Richard,
>>>
>>> That would be great but __attribute__ ((used)) can only be used for
>>> static
>>> variables and making dummy static would defeat the purpose here.
>>
>>
>> I see.  What about volatile?
>>
>>> Cheers,
>>> Andre
>>>
>>
> Yeah, a 'volatile char dummy[64];' with a read afterwards leads to the stack
> still being reserved after LTO and we won't need the global variable.
>
> If you prefer that solution I'll respin the patch.

Yes please.

Richard.

> Cheers,
> Andre
>


[Patch ARM] Add support for Cortex-A35

2015-11-16 Thread James Greenhalgh

Hi,

This patch adds support to the ARM back-end for the Cortex-A35
processor, as recently announced by ARM. The ARM Cortex-A35 provides
full support for the ARMv8-A architecture, including the CRC extension,
with optional Advanced-SIMD and Floating-Point support. We therefore set
feature flags for this CPU to FL_FOR_ARCH8A and FL_CRC32 and FL_LDSCHED,
in the same fashion as Cortex-A53 and Cortex-A57. While the Cortex-A35
has dual issue capabilities, we model it with an issue rate of one, with
the expectation that this will give better schedules when using the
Cortex-A53 pipeline model.

Bootstrapped with --with-tune=cortex-a35 with no issues.

I'm sorry to have this upstream a little late for the close of Stage 1,
I wanted to wait for binutils support to be committed. This happened
on Thursday [1]. If it is OK with the ARM maintainers, I'd like to get
this in to GCC 6.

OK?

Thanks,
James

[1]: https://sourceware.org/ml/binutils-cvs/2015-11/msg00065.html

---
2015-11-16  James Greenhalgh  

* config/arm/arm-cores.def (cortex-a35): New.
* config/arm/arm.c (arm_cortex_a35_tune): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add cortex-a35.
* config/arm/t-aprofile: Likewise.
* doc/invoke.texi (-mcpu): Likewise.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 86ed0cb..d09707b 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -165,6 +165,7 @@ ARM_CORE("cortex-a15.cortex-a7", cortexa15cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_
 ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV | FL_FOR_ARCH7A), cortex_a12)
 
 /* V8 Architecture Processors */
+ARM_CORE("cortex-a35",	cortexa35, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53)
 ARM_CORE("cortex-a57",	cortexa57, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("cortex-a72",	cortexa72, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 41bf1ff..48aac41 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -304,6 +304,9 @@ EnumValue
 Enum(processor_type) String(cortex-a17.cortex-a7) Value(cortexa17cortexa7)
 
 EnumValue
+Enum(processor_type) String(cortex-a35) Value(cortexa35)
+
+EnumValue
 Enum(processor_type) String(cortex-a53) Value(cortexa53)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index e56b5ad..1c84218 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -32,7 +32,7 @@
 	cortexr4f,cortexr5,cortexr7,
 	cortexm7,cortexm4,cortexm3,
 	marvell_pj4,cortexa15cortexa7,cortexa17cortexa7,
-	cortexa53,cortexa57,cortexa72,
-	exynosm1,qdf24xx,xgene1,
-	cortexa57cortexa53,cortexa72cortexa53"
+	cortexa35,cortexa53,cortexa57,
+	cortexa72,exynosm1,qdf24xx,
+	xgene1,cortexa57cortexa53,cortexa72cortexa53"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e31be67..2c8de40 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1940,6 +1940,29 @@ const struct tune_params arm_cortex_a15_tune =
   tune_params::SCHED_AUTOPREF_FULL
 };
 
+const struct tune_params arm_cortex_a35_tune =
+{
+  arm_9e_rtx_costs,
+  &cortexa53_extra_costs,
+  NULL,	/* Sched adj cost.  */
+  arm_default_branch_cost,
+  &arm_default_vec_cost,
+  1,		/* Constant limit.  */
+  5,		/* Max cond insns.  */
+  8,		/* Memset max inline.  */
+  1,		/* Issue rate.  */
+  ARM_PREFETCH_NOT_BENEFICIAL,
+  tune_params::PREF_CONST_POOL_FALSE,
+  tune_params::PREF_LDRD_FALSE,
+  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE,		/* Thumb.  */
+  tune_params::LOG_OP_NON_SHORT_CIRCUIT_TRUE,		/* ARM.  */
+  tune_params::DISPARAGE_FLAGS_NEITHER,
+  tune_params::PREF_NEON_64_FALSE,
+  tune_params::PREF_NEON_STRINGOPS_TRUE,
+  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
+  tune_params::SCHED_AUTOPREF_OFF
+};
+
 const struct tune_params arm_cortex_a53_tune =
 {
   arm_9e_rtx_costs,
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 8af4605..e522064 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -68,6 +68,7 @@
|mcpu=cortex-a15.cortex-a7\
|mcpu=cortex-a17.cortex-a7\
|mcpu=marvell-pj4	\
+   |mcpu=cortex-a35	\
|mcpu=cortex-a53	\
|mcpu=cortex-a57	\
|mcpu=cortex-a57.cortex-a53\
@@ -94,6 +95,7 @@
|mcpu=cortex-a12|mcpu=cortex-a17			\
|mcpu=cortex-a15.cortex-a7\
|mcpu=cortex-a17.cortex-a7\
+   |mcpu=cortex-a35	\
|mcpu=cortex-a53	\
|mcpu=cortex-a57	\
|mcp

Re: [PATCH] PR/67682, break SLP groups up if only some elements match

2015-11-16 Thread Christophe Lyon
On 13 November 2015 at 17:18, Alan Lawrence  wrote:
> On 10/11/15 12:51, Richard Biener wrote:
>>>
>>> Just noticing this... if we have a vectorization factor of 4 and matches
>>> is 1, 1, 1, 1,  1, 1, 0, 0, 0, 0, 0, 0 then this will split into 1, 1, 1, 1 
>>> and
>>> 1, 1, 0, 0, 0, ... where we know from the matches that it will again fail?
>>>
>>> Thus shouldn't we split either only if i % vectorization_factor is 0 or
>>> if not, split "twice", dropping the intermediate surely non-matching
>>> group of vectorization_factor size?  After all if we split like with the
>>> patch then the upper half will _not_ be splitted again with the
>>> simplified patch (result will be 1, 1, 0, 0, 0, 0, 0, 0 again).
>>>
>>> So I expect that the statistics will be the same if we restrict splitting
>>> to the i % vectorization_factor == 0 case, or rather split where we do
>>> now but only re-analyze group2 if i % vectorization_factor == 0 holds?
>>>
>>> Ok with that change.  Improvements on that incrementally.
>>
>> Btw, it just occurs to me that the whole thing is equivalent to splitting
>> the store-group into vector-size pieces up-front?  That way we do
>> the maximum splitting up-frond and avoid any redundant work?
>>
>> The patch is still ok as said, just the above may be a simple thing
>> to explore.
>
> I'd refrained from splitting in vect_analyze_group_access_1 as my 
> understanding
> was that we only did that once, whereas we would retry the
> vect_analyze_slp_instance path each time we decreased the
> vectorization_factor...however, I did try putting code at the beginning of
> vect_analyze_slp_instance to split up any groups > vf. Unfortunately this 
> loses
> us some previously-successful SLPs, as some bigger groups cannot be SLPed if 
> we
> split them as they require 'unrolling'...so not addressing that here.
>
> However your suggestion of splitting twice when we know the boundary is in the
> middle of a vector is a nice compromise; it nets us a good number more
> successes in SPEC2000 and SPEC2006, about 7% more than without the patch.
>
> Hence, here's the patch I've committed, as r230330, after regstrap on x86_64
> and AArch64. (I dropped the previous bb-slp-subgroups-2 and renamed the others
> up as we don't do that one anymore.)
>
> Cheers, Alan
>
> gcc/ChangeLog:
>
> PR tree-optimization/67682
> * tree-vect-slp.c (vect_split_slp_store_group): New.
> (vect_analyze_slp_instance): During basic block SLP, recurse on
> subgroups if vect_build_slp_tree fails after 1st vector.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/67682
> * gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic.
> * gcc.dg/vect/bb-slp-subgroups-1.c: New.
> * gcc.dg/vect/bb-slp-subgroups-2.c: New.
> * gcc.dg/vect/bb-slp-subgroups-3.c: New.

Hi Alan,

I've noticed that this new test (gcc.dg/vect/bb-slp-subgroups-3.c)
fails for armeb targets.
I haven't had time to look at more details yet, but I guess you can
reproduce it quickly enough.



> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-7.c   | 10 +--
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 41 +
>  gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 +
>  gcc/tree-vect-slp.c| 85 
> +-
>  5 files changed, 215 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> index ab54a48..b8bef8c 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> @@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y)
>unsigned int *pout = &out[0];
>unsigned int a0, a1, a2, a3;
>
> -  /* Non isomorphic.  */
> +  /* Non isomorphic, even 64-bit subgroups.  */
>a0 = *pin++ + 23;
> -  a1 = *pin++ + 142;
> +  a1 = *pin++ * 142;
>a2 = *pin++ + 2;
>a3 = *pin++ * 31;
> -
> +
>*pout++ = a0 * x;
>*pout++ = a1 * y;
>*pout++ = a2 * x;
> @@ -29,7 +29,7 @@ main1 (unsigned int x, unsigned int y)
>
>/* Check results.  */
>if (out[0] != (in[0] + 23) * x
> -  || out[1] != (in[1] + 142) * y
> +  || out[1] != (in[1] * 142) * y
>|| out[2] != (in[2] + 2) * x
>|| out[3] != (in[3] * 31) * y)
>  abort();
> @@ -47,4 +47,4 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
> -
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
> new file mode 100644
> index 000..39c23c3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
> @@ -0,0 +1,44 @@
> +/* { dg-require-effective-

Re: [PATCH] Allow embedded timestamps by C/C++ macros to be set externally (2)

2015-11-16 Thread Dhole
On 11/16/2015 02:05 PM, Bernd Schmidt wrote:
> On 11/15/2015 11:14 PM, Dhole wrote:
>> gcc/c-family/ChangeLog:
>>
>> 2015-10-10  Eduard Sanou
> 
> I can't find a previous change from you in the sources, so the first
> question would be whether you've gone through the copyright assignment
> process.

I haven't gone through the copyright assignment process. Nevertheless
Matthias Klose (co-author of this patch) has.

In any case, if required, let me know how to proceed to obtain the
relevant forms so that I can file the copyright assignment.

Best regards,
Dhole


[Patch AArch64] Add support for Cortex-A35

2015-11-16 Thread James Greenhalgh

Hi,

This patch adds support to the AArch64 back-end for the Cortex-A35
processor, as recently announced by ARM. The ARM Cortex-A35 provides
full support for the ARMv8-A architecture, including the CRC extension,
with optional Advanced-SIMD and Floating-Point support. We therefore set
feature flags for this CPU to AARCH64_FL_FOR_ARCH8 and AARCH64_FL_CRC, in
the same fashion as Cortex-A53 and Cortex-A57. While the Cortex-A35
supports dual-issue, we model it as single issue with the expectation that
this will give better schedules when sharing the Cortex-A53 pipeline model.

Bootstrapped with --with-cpu=cortex-a35 with no issues.

I'm sorry to have got this upstream a little late for the end of Stage 1,
but if it is OK with the AArch64 maintainers I'd like to get it in for
GCC 6.

OK?

Thanks,
James

---
2015-11-16  James Greenhalgh  

* config/aarch64/aarch64-cores.def (cortex-a35): New.
* config/aarch64/aarch64.c (cortexa35_tunings): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (-mcpu): Add Cortex-A35

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 4af70ca..f8fab28 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -40,6 +40,7 @@
 
 /* V8 Architecture Processors.  */
 
+AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, "0x41", "0xd04")
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
 AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index c65a124..cbc6f48 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
+	"cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5ec7f08..8569385 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -362,6 +362,31 @@ static const struct tune_params generic_tunings =
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
+static const struct tune_params cortexa35_tunings =
+{
+  &cortexa53_extra_costs,
+  &generic_addrcost_table,
+  &cortexa53_regmove_cost,
+  &generic_vector_cost,
+  &generic_branch_cost,
+  4, /* memmov_cost  */
+  1, /* issue_rate  */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  8,	/* function_align.  */
+  8,	/* jump_align.  */
+  4,	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
+};
+
 static const struct tune_params cortexa53_tunings =
 {
   &cortexa53_extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c18df98..d782ab2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12576,8 +12576,9 @@ processors implementing the target architecture.
 @opindex mtune
 Specify the name of the target processor for which GCC should tune the
 performance of the code.  Permissible values for this option are:
-@samp{generic}, @samp{cortex-a53}, @samp{cortex-a57}, @samp{cortex-a72},
-@samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx}, @samp{xgene1}.
+@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
+@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx},
+@samp{xgene1}.
 
 Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible values for this


Re: [PATCH v4] SH FDPIC backend support

2015-11-16 Thread Oleg Endo
On Sun, 2015-11-15 at 15:39 -0500, Rich Felker wrote:

> > This is basically the same as above ... it's not possible to
> > conditionally construct/modify pattern descriptions in the .md. 
> >  However, it's possible to modify the CALL_INSN_FUNCTION_USAGE
> > field of
> > call insns -- for some examples see 'grep -r
> > CALL_INSN_FUNCTION_USAGE
> > gcc/config/*'.  Also, it seems the SH backend doesn't make use of
> > some
> > existing libcall related parameters and target hooks/macros.  Maybe
> > those could be helpful.
> 
> I'll take a look at this. Let me know if you turn up anything
> interesting.

I'm currently working on other things, sorry.


> > 
> > Maybe TARGET_USE_PSEUDO_PIC_REG could be useful?
> 
> Yes. Is there any documentation on using it? I came across that but
> couldn't figure out how it compares to just doing the pseudo yourself
> in the target files. Is non-target-specific code affected by this?

Yes, non-target-specific code seems to be affected by this in some way,
although I don't know any details.  Due to lack of documentation you'll
have to grep yourself through it by looking for "USE_PSEUDO_PIC_REG"
and "use_pseudo_pic_reg" to find the places where it's used.

Cheers,
Oleg


[PATCH] Fix PR68306 (and 68367)

2015-11-16 Thread Richard Biener

I made a copy&paste error in the previous revision.

Committed as obvious.

Richard.

2015-11-16  Richard Biener  

* tree-vect-data-refs.c (vect_verify_datarefs_alignment): Fix
bogus copying from verify_data_ref_alignment and use continue
instead of return.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c   (revision 230421)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -967,13 +967,13 @@ vect_verify_datarefs_alignment (loop_vec
   /* For interleaving, only the alignment of the first access matters.   */
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
  && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
-   return true;
+   continue;
 
   /* Strided accesses perform only component accesses, alignment is
 irrelevant for them.  */
   if (STMT_VINFO_STRIDED_P (stmt_info)
  && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
-   return true;
+   continue;
 
   if (! verify_data_ref_alignment (dr))
return false;


Re: [PATCH] [ARM, Callgraph] Fix PR67280: function incorrectly marked as nothrow

2015-11-16 Thread Charles Baylis
Backported r227407 to gcc-5-branch following approval on IRC. The
patch applied without conflicts.

2015-11-16  Charles Baylis  

Backport from mainline r227407
PR ipa/67280
* cgraphunit.c (cgraph_node::create_wrapper): Set can_throw_external
in new callgraph edge.



On 20 September 2015 at 23:53, Charles Baylis  wrote:
> On 7 September 2015 at 09:35, Charles Baylis  
> wrote:
 >gcc/ChangeLog:
 >
 >2015-08-28  Charles Baylis  
 >
 > * cgraphunit.c (cgraph_node::create_wrapper): Set 
 > can_throw_external
 > in new callgraph edge.
>>
>> Committed to trunk as r227407.
>>
>> Are you happy for me to backport to gcc-5-branch?
>
> Hi Jan,
>
> I'd still like to backport this patch to gcc 5. Is that OK
>
> Thanks
> Charles


[PATCH][install.texi] Add note against GNAT 4.8 on ARM targets.

2015-11-16 Thread Alan Lawrence
This follows from the discussion here: 
https://gcc.gnu.org/ml/gcc/2015-10/msg00082.html .

OK for trunk?

--Alan

gcc/ChangeLog:

doc/install.texi: Add note against GNAT 4.8 on ARM targets.
---
 gcc/doc/install.texi | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 1fd773e..1ce93d4 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3481,6 +3481,8 @@ require GNU binutils 2.13 or newer.  Such subtargets 
include:
 @code{arm-*-netbsdelf}, @code{arm-*-*linux-*}
 and @code{arm-*-rtemseabi}.
 
+Building the Ada frontend commonly fails (an infinite loop executing @code{xsinfo}) if the host compiler is GNAT 4.8.  Host compilers built from the GNAT 4.6, 4.9 or 5 release branches are known to succeed.
+
 @html
 
 @end html
-- 
1.9.1



Re: ping [aPATCH] Fix c++/67337 (segfault in mangle.c)

2015-11-16 Thread Markus Trippelsdorf
On 2015.08.31 at 11:31 +0200, Markus Trippelsdorf wrote:
> On 2015.08.24 at 13:44 +0200, Markus Trippelsdorf wrote:

another ping.

> ping
> 
> > decl_mangling_context() in mangle.c returns a NULL_TREE in case of
> > template type parameters. write_template_prefix() needs to handle this
> > situation.
> > 
> > Tested on ppc64le.
> > 
> > This is a regression from gcc-4.8.
> > OK for trunk, gcc-5 and gcc-4.9?
> > 
> > Thanks.
> > 
> > PR c++/67337
> > * mangle.c (write_template_prefix): Guard against context==NULL.
> > 
> > diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
> > index 342cb93e68b3..a9993f40b94d 100644
> > --- a/gcc/cp/mangle.c
> > +++ b/gcc/cp/mangle.c
> > @@ -1149,7 +1149,7 @@ write_template_prefix (const tree node)
> >   So, for the example above, `Outer::Inner' is represented as a
> >   substitution candidate by a TREE_LIST whose purpose is `Outer'
> >   and whose value is `Outer::Inner'.  */
> > -  if (TYPE_P (context))
> > +  if (context && TYPE_P (context))
> >  substitution = build_tree_list (context, templ);
> >else
> >  substitution = templ;
> > diff --git a/gcc/testsuite/g++.dg/template/pr67337.C b/gcc/testsuite/g++.dg/template/pr67337.C
> > new file mode 100644
> > index ..df2651bc9a57
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/template/pr67337.C
> > @@ -0,0 +1,25 @@
> > +template  class A
> > +{
> > +  void m_fn1 (int *, int);
> > +};
> > +
> > +template  class B
> > +{
> > +public:
> > +  typedef int Type;
> > +};
> > +
> > +template  class C
> > +{
> > +public:
> > +  C (int);
> > +  template  class T> void m_fn2 (typename T::Type);
> > +};
> > +
> > +template <>
> > +void
> > +A::m_fn1 (int *, int)
> > +{
> > +  C a (0);
> > +  a.m_fn2 (0);
> > +}
> > -- 
> > Markus
> > 
> 
> -- 
> Markus

-- 
Markus


Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-16 Thread Joern Wolfgang Rennecke

I just happened to stumble on this problem with another port.
The volatile & test solution doesn't work, though.

What does work, however, is:

__asm__ ("" : : "" (dummy));


[gomp4.5] Handle #pragma omp declare target link

2015-11-16 Thread Ilya Verbin
Hi!

On Mon, Oct 26, 2015 at 20:49:40 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 10:39:04PM +0300, Ilya Verbin wrote:
> > > Without declare target link or to, you can't use the global variables
> > > in orphaned accelerated routines (unless you e.g. take the address of the
> > > mapped variable in the region and pass it around).
> > > The to variables (non-deferred) are always mapped and are initialized with
> > > the original initializer, refcount is infinity.  link (deferred) work more
> > > like the normal mapping, referencing those vars when they aren't 
> > > explicitly
> > > (or implicitly) mapped is unspecified behavior, if it is e.g. mapped 
> > > freshly
> > > with to kind, it gets the current value of the host var rather than the
> > > original one.  But, beyond the mapping the compiler needs to ensure that
> > > all uses of the link global var (or perhaps just all uses of the link 
> > > global
> > > var outside of the target construct body where it is mapped, because you
> > > could use there the pointer you got from GOMP_target) are replaced by
> > > dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
> > > &*a_tmp, and that the runtime library during registration of the tables is
> > > told about the address of this artificial pointer.  During registration,
> > > I'd expect it would stick an entry for this range into the table, with 
> > > some
> > > special flag or something similar, indicating that it is deferred mapping
> > > and where the offloading device pointer is.  During mapping, it would map 
> > > it
> > > as any other not yet mapped object, but additionally would also set this
> > > device pointer to the device address of the mapped object.  We also need 
> > > to
> > > ensure that when we drop the refcount of that mapping back to 0, we get it
> > > back to the state where it is described as a range with registered 
> > > deferred
> > > mapping and where the device pointer is.
> > 
> > Ok, got it, I'll try implement this...
> 
> Thanks.
> 
> > > > > we actually replace the variables with pointers to variables, then 
> > > > > need
> > > > > to somehow also mark those in the offloading tables, so that the 
> > > > > library
> > > > 
> > > > I see 2 possible options: use the MSB of the size, or introduce the 
> > > > third field
> > > > for flags.
> > > 
> > > Well, it can be either recorded in the host variable tables (which contain
> > > address and size pair, right), or in corresponding offloading device table
> > > (which contains the pointer, something else?).
> > 
> > It contains a size too, which is checked in libgomp:
> >   gomp_fatal ("Can't map target variables (size mismatch)");
> > Yes, we can remove this check, and use second field in device table for 
> > flags.
> 
> Yeah, or e.g. just use MSB of that size (so check that either the size is
> the same (then it is target to) or it is MSB | size (then it is target link).
> Objects larger than half of the address space aren't really supportable
> anyway.

Here is WIP patch, not for check-in.  There are still many FIXMEs, which I am
going to resolve, however target-link-1.c testcase pass.
Is this approach correct?  Any comments on FIXMEs?


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 23d0107..58771c0 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15895,7 +15895,10 @@ c_parser_omp_declare_target (c_parser *parser)
  g->have_offload = true;
 if (is_a <varpool_node *> (node))
{
- vec_safe_push (offload_vars, t);
+ omp_offload_var var;
+ var.decl = t;
+ var.link_ptr_decl = NULL_TREE;
+ vec_safe_push (offload_vars, var);
  node->force_output = 1;
}
 #endif
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d1f4970..b890f6d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -34999,7 +34999,10 @@ cp_parser_omp_declare_target (cp_parser *parser, 
cp_token *pragma_tok)
  g->have_offload = true;
 if (is_a <varpool_node *> (node))
{
- vec_safe_push (offload_vars, t);
+ omp_offload_var var;
+ var.decl = t;
+ var.link_ptr_decl = NULL_TREE;
+ vec_safe_push (offload_vars, var);
  node->force_output = 1;
}
 #endif
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 67a9024..878a9c5 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1106,7 +1106,7 @@ output_offload_tables (void)
   streamer_write_enum (ob->main_stream, LTO_symtab_tags,
   LTO_symtab_last_tag, LTO_symtab_variable);
   lto_output_var_decl_index (ob->decl_state, ob->main_stream,
-(*offload_vars)[i]);
+(*offload_vars)[i].decl);
 }
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
@@ -1902,7 +1902,10 @@ input_offload_tables (void)
 

Re: [PATCH][GCC] Make stackalign test LTO proof

2015-11-16 Thread Andre Vieira

On 16/11/15 15:34, Joern Wolfgang Rennecke wrote:

I just happened to stumble on this problem with another port.
The volatile & test solution doesn't work, though.

What does work, however, is:

__asm__ ("" : : "" (dummy));


I can confirm that Joern's solution works for me too.



[PATCH] ctype functions and signedness

2015-11-16 Thread Michael McConville
Hi, everyone.

While it often (usually?) isn't an issue in practice, passing a char with a
negative value (i.e. a plain or signed char holding a byte >= 0x80) to the
ctype functions is undefined behavior. Here's the CERT entry:

https://www.securecoding.cert.org/confluence/x/fAs

This means that we need to cast chars to unsigned char before passing
them to one of these functions.

The below patch, generated by a Coccinelle script, fixes instances of
this. It may need some minor tweaks to add line breaks and conform with
local style conventions.

Thanks,
Michael


--- boehm-gc/cord/de_win.c
+++ /tmp/cocci-output-25514-78f80b-de_win.c
@@ -86,7 +86,7 @@ int APIENTRY WinMain (HINSTANCE hInstanc
} else {
 char *p = command_line;
 
-while (*p != 0 && !isspace(*p)) p++;
+while (*p != 0 && !isspace((unsigned char)*p)) p++;
arg_file_name = CORD_to_char_star(
CORD_substr(command_line, 0, p - command_line));
}
@@ -129,7 +129,7 @@ char * plain_chars(char * text, size_t l
 register size_t i;
 
 for (i = 0; i < len; i++) {
-   if (iscntrl(text[i])) {
+   if (iscntrl((unsigned char)text[i])) {
result[i] = ' ';
} else {
result[i] = text[i];
@@ -147,7 +147,7 @@ char * control_chars(char * text, size_t
 register size_t i;
 
 for (i = 0; i < len; i++) {
-   if (iscntrl(text[i])) {
+   if (iscntrl((unsigned char)text[i])) {
result[i] = text[i] + 0x40;
} else {
result[i] = ' ';
--- boehm-gc/os_dep.c
+++ /tmp/cocci-output-25514-d4c0bc-os_dep.c
@@ -271,31 +271,31 @@ char *GC_parse_map_entry(char *buf_ptr,
 }
 
 p = buf_ptr;
-while (isspace(*p)) ++p;
+while (isspace((unsigned char)*p)) ++p;
 start_start = p;
-GC_ASSERT(isxdigit(*start_start));
+GC_ASSERT(isxdigit((unsigned char)*start_start));
 *start = strtoul(start_start, &endp, 16); p = endp;
 GC_ASSERT(*p=='-');
 
 ++p;
 end_start = p;
-GC_ASSERT(isxdigit(*end_start));
+GC_ASSERT(isxdigit((unsigned char)*end_start));
 *end = strtoul(end_start, &endp, 16); p = endp;
-GC_ASSERT(isspace(*p));
+GC_ASSERT(isspace((unsigned char)*p));
 
-while (isspace(*p)) ++p;
+while (isspace((unsigned char)*p)) ++p;
 prot_start = p;
 GC_ASSERT(*prot_start == 'r' || *prot_start == '-');
 memcpy(prot_buf, prot_start, 4);
 prot_buf[4] = '\0';
 if (prot_buf[1] == 'w') {/* we can skip the rest if it's not writable. */
/* Skip past protection field to offset field */
-  while (!isspace(*p)) ++p; while (isspace(*p)) ++p;
-  GC_ASSERT(isxdigit(*p));
+  while (!isspace((unsigned char)*p)) ++p; while (isspace((unsigned char)*p)) ++p;
+  GC_ASSERT(isxdigit((unsigned char)*p));
/* Skip past offset field, which we ignore */
-  while (!isspace(*p)) ++p; while (isspace(*p)) ++p;
+  while (!isspace((unsigned char)*p)) ++p; while (isspace((unsigned char)*p)) ++p;
maj_dev_start = p;
-GC_ASSERT(isxdigit(*maj_dev_start));
+GC_ASSERT(isxdigit((unsigned char)*maj_dev_start));
 *maj_dev = strtoul(maj_dev_start, NULL, 16);
 }
 
@@ -969,11 +969,11 @@ ptr_t GC_get_stack_base()
 /* Skip the required number of fields.  This number is hopefully   */
 /* constant across all Linux implementations.  */
   for (i = 0; i < STAT_SKIP; ++i) {
-   while (isspace(c)) c = stat_buf[buf_offset++];
-   while (!isspace(c)) c = stat_buf[buf_offset++];
+   while (isspace((unsigned char)c)) c = stat_buf[buf_offset++];
+   while (!isspace((unsigned char)c)) c = stat_buf[buf_offset++];
   }
-while (isspace(c)) c = stat_buf[buf_offset++];
-while (isdigit(c)) {
+while (isspace((unsigned char)c)) c = stat_buf[buf_offset++];
+while (isdigit((unsigned char)c)) {
   result *= 10;
   result += c - '0';
   c = stat_buf[buf_offset++];
--- gcc/testsuite/gcc.dg/charset/builtin1.c
+++ /tmp/cocci-output-20442-6d79bd-builtin1.c
@@ -14,9 +14,9 @@ static int strA(void) { return 'A'; }
 int
 main(void)
 {
-  if (!isdigit('1'))
+  if (!isdigit((unsigned char)'1'))
 abort();
-  if (isdigit('A'))
+  if (isdigit((unsigned char)'A'))
 abort();
   if (!isdigit(str1()))
 abort();
--- gcc/testsuite/gcc.dg/torture/pr67821.c
+++ /tmp/cocci-output-18483-79f7b5-pr67821.c
@@ -7,9 +7,9 @@ foo (const char *s)
 {
   int success = 1;
   const char *p = s + 2;
-  if (!isdigit (*p))
+  if (!isdigit ((unsigned char)*p))
 success = 0;
-  while (isdigit (*p))
+  while (isdigit ((unsigned char)*p))
 ++p;
   return success;
 }
--- gcc/ada/adaint.c
+++ /tmp/cocci-output-21151-00798e-adaint.c
@@ -1628,7 +1628,7 @@ __gnat_is_absolute_path (char *name, int
 {
   if (name[index] == ':' &&
   ((name[index + 1] == '/') ||
-   (isalpha (name[index + 1]) && index + 2 <= length &&
+   (isalpha ((unsigned char)name[index + 1]) && index + 2 <= length &&
 name[index + 2]

Re: Gimple loop splitting

2015-11-16 Thread Michael Matz
Hi,

On Thu, 12 Nov 2015, Jeff Law wrote:

> > this new pass implements loop iteration space splitting for loops that
> > contain a conditional that's always true for one part of the iteration
> > space and false for the other, i.e. such situations:
> FWIW, Ajit suggested the same transformation earlier this year.  During that
> discussion Richi indicated that for hmmer this transformation would enable
> vectorization.

It's a prerequisite indeed, but not enough in itself.  The next problem 
will be that only parts of access chains inside the hot loop are 
vectorizable, but for that those parts need to be disambiguated.  ICC is 
doing that by a massive chain of conditionals testing non-overlapping of 
the respective arrays at runtime.  Our vectorizer could also do that 
(possibly by increasing the allowed number of conditionals), but the next 
problem then is that one of these (then indeed separated) parts is not 
vectorizable by our vectorizer: it's a 'a[i] = f(a[i-1])' dependency that 
can't yet be handled by us.  If the separation of parts would be done by 
loop distribution that would be fine (we'd have separate loops for the 
parts, some of them vectorizable), but our loop distribution can't do 
runtime disambiguation, only our vectorizer.

hmmer is actually quite interesting because it's a fairly isolated hot 
loop posing quite challenging problems for us :)

> 
>   It does increase code size, when the loop body contains
> > also unconditional code (that one is duplicated), so we only transform hot
> > loops.
> 
> Probably ought to be disabled when we're not optimizing for speed as well.

That should be dealt with by '!optimize_loop_for_size_p (loop)'.

> > I've regstrapped this pass enabled with -O2 on x86-64-linux, without
> > regressions.  I've also checked cpu2006 (the non-fortran part) for
> > correctness, not yet for performance.  In the end it should probably only
> > be enabled for -O3+ (although if the whole loop body is conditional it
> > makes sense to also have it with -O2 because code growth is very small
> > then).
> 
> Very curious on the performance side, so if you could get some #s on that,
> it'd be greatly appreciated.

My test machine misbehaved over the weekend, but as soon as I have them 
I'll update here.

> > testsuite/
> > * gcc.dg/loop-split.c: New test.
> 
> Please clean up the #if 0/#if 1 code in the new tests.

Actually I'd prefer if that test contains the by-hand code and the TRACE 
stuff as well, I'd only change the #if 0 into some #if BYHAND or so ...

> You might also want to clean out the TRACE stuff.  Essentially the tests 
> look like you just dropped in a test you'd been running by hand until 
> now :-)

... the reason being, that bugs in the splitter are somewhat unwieldy to 
debug by just staring at the dumps, you only get a checksum mismatch, so 
TRACE=1 is for finding out which of the params and loops is actually 
miscompiled, TRACE=2 for finding the specific iteration that's broken, and 
the #if0 code for putting that situation into a non-macroized and smaller 
function than dotest.  (That's actually how I've run the testcase after I 
had it basically working, extending dotest with a couple more lines, aka 
example loop situations, adjusting the checksum, and then making a face and 
scratching my head and mucking with the TRACE and #if0 macros :) ).

> I don't see any negative tests -- ie tests that should not be split due 
> to boundary conditions.  Do you have any from development?

Good point, I had some but only ones where I was able to extend the 
splitters to cover them.  I'll think of some that really shouldn't be 
split.

> > Index: tree-ssa-loop-unswitch.c
> > ===================================================================
> > --- tree-ssa-loop-unswitch.c(revision 229763)
> > +++ tree-ssa-loop-unswitch.c(working copy)
> Given the amount of new code, unless there's a strong need, I'd prefer this
> transformation to be implemented in its own file.

Okay.

> > +/* Give an induction variable GUARD_IV, and its affine descriptor IV,
> > +   find the loop phi node in LOOP defining it directly, or create
> > +   such phi node.  Return that phi node.  */
> > +
> > +static gphi *
> > +find_or_create_guard_phi (struct loop *loop, tree guard_iv, affine_iv *
> > /*iv*/)
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (guard_iv);
> > +  gphi *phi;
> > +  if ((phi = dyn_cast  (def))
> > +  && gimple_bb (phi) == loop->header)
> > +return phi;
> > +
> > +  /* XXX Create the PHI instead.  */
> > +  return NULL;
> 
> So right now we just punt if we need to create the PHI?  Does that 
> happen with any kind of regularity in practice?

Only with such situations:

  for (int i = start; i < end; i++) {
if (i + offset < bound)
  ...
  }

Here the condition-IV is not directly defined by a PHI node.  If it 
happens often I don't know, I guess the usual situation is testing the 
control IV directly.  The deficiency is not hard to fix.

> > 

Re: Extend tree-call-cdce to calls whose result is used

2015-11-16 Thread Michael Matz
Hi,

On Mon, 16 Nov 2015, Richard Biener wrote:

> >> Which would leave us with a lowering stage early in the main 
> >> optimization pipeline - I think fold_builtins pass is way too late 
> >> but any "folding" pass will do (like forwprop or backprop where the 
> >> latter might be better because it might end up computing FP "ranges" 
> >> to improve the initial lowering code).
> >
> > This isn't at all related to what backprop is doing though. backprop 
> > is about optimising definitions based on information about all uses.

Right, I think backprop would be even worse than call_cdce, that pass has 
a completely different structure.

> >> Of course call_cdce is as good as long as it still exists.
> >
> > Does this meann that you're not against the patch in principle (i.e. 
> > keeping call_cdce for now and extending it in the way that this patch 
> > does)?
> 
> Yes, I'm fine with extending call_cdce.  Of course I'd happily approve a 
> patch dissolving it into somewhere where it makes more sense.  But this 
> shouldn't block this patch.

Okay, I like merging passes, so I'll try to do that, once the stuff is in 
:)


Ciao,
Michael.


Re: [PATCH] Fix inconsistent use of integer types in gcc/lto-streamer-out.c

2015-11-16 Thread Andris Pavenis

On 11/16/2015 01:12 PM, Richard Biener wrote:

On Sun, Nov 15, 2015 at 9:21 AM, Andris Pavenis  wrote:

This fixes use of pointers different unsigned integer types as function
parameter.
Function prototype is (see gcc/tree-streamer.h):

bool streamer_tree_cache_lookup (struct streamer_tree_cache_d *, tree,
  unsigned *);

gcc/lto-streamer-out.c passes uint32_t * as a parameter to this function in
two places, while the type unsigned is used elsewhere in the same file.

uint32_t is not guaranteed to be the same as unsigned (for DJGPP, uint32_t is
actually unsigned long). That causes a compile failure for a DJGPP native
build.

Ok.

Thanks,
Richard.


Could somebody apply it as I do not have SVN write access.

Andris




Andris

2015-11-15 Andris Pavenis 

* gcc/lto-streamer-out.c (write_global_references): Adjust integer type
   (lto_output_decl_state_refs): Likewise





[AArch64] Rework ARMv8.1 command line options.

2015-11-16 Thread Matthew Wahab

Hello,

The command line options for target selection allow ARMv8.1 extensions
to be individually enabled/disabled. They also allow the extensions to
be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
architecture, which requires all of its extensions to be enabled, and which
doesn't make them available for ARMv8.

This patch removes the options for the individual ARMv8.1 extensions
except for +lse. This means that setting -march=armv8.1-a will enable
all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
be used with -march=armv8-a.

The exception to this is +lse since there may be existing code expecting
to be built with -march=armv8-a+lse. Note that +crc, which is enabled by
-march=armv8.1-a, is still an option for -march=armv8-a.

This patch depends on the patch series
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02429.html.

Tested aarch64-none-elf with cross-compiled check-gcc and
aarch64-none-linux-gnu with native bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-16  Matthew Wahab  

* config/aarch64/aarch64-options-extensions.def: Remove
AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
"rdma".
* config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
(AARCH64_FL_LOR): Remove.
(AARCH64_FL_RDMA): Remove.
(AARCH64_FL_V8_1): New.
(AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
(AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
* doc/invoke.texi (AArch64 - Feature Modifiers): Remove "pan",
"lor" and "rdma".
From bc4ea389754127ec639ea2de085a7c82aebae117 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 30 Oct 2015 10:32:59 +
Subject: [PATCH] [AArch64] Rework ARMv8.1 command line options.

Change-Id: Ib9053719f45980255a3d7727e226a53d9f214049
---
 gcc/config/aarch64/aarch64-option-extensions.def | 9 -
 gcc/config/aarch64/aarch64.h | 9 +++--
 gcc/doc/invoke.texi  | 7 ---
 3 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f..4f1d535 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 68c006f..06345f0 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -134,9 +134,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC(1 << 3)	/* Has CRC.  */
 /* ARMv8.1 architecture extensions.  */
 #define AARCH64_FL_LSE	  (1 << 4)  /* Has Large System Extensions.  */
-#define AARCH64_FL_PAN	  (1 << 5)  /* Has Privileged Access Never.  */
-#define AARCH64_FL_LOR	  (1 << 6)  /* Has Limited Ordering regions.  */
-#define AARCH64_FL_RDMA	  (1 << 7)  /* Has ARMv8.1 Adv.SIMD.  */
+#define AARCH64_FL_V8_1	  (1 << 5)  /* Has ARMv8.1 extensions.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -147,8 +145,7 @@ extern unsigned aarch64_architecture_version;
 /* Architecture flags that effect instruction selection.  */
 #define AARCH64_FL_FOR_ARCH8   (AARCH64_FL_FPSIMD)
 #define AARCH64_FL_FOR_ARCH8_1			   \
-  (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_PAN \
-   | AARCH64_FL_LOR | AARCH64_FL_RDMA)
+  (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_V8_1)
 
 /* Macros to test ISA flags.  */
 
@@ -157,7 +154,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_FP (aarch64_isa_flags & AARCH64_FL_FP)
 #define AARCH64_ISA_SIMD   

Re: [PATCH, x86] Fix posix_memalign declaration in mm_malloc.h

2015-11-16 Thread Szabolcs Nagy

On 16/11/15 13:42, Bernd Schmidt wrote:

On 11/13/2015 11:30 PM, Marc Glisse wrote:

+__asm__("posix_memalign");


Can't say I like the __asm__ after the #if/#else/#endif block.


It might also cause trouble if some systems like to prepend an
underscore, maybe?


Yeah, that's one of the things I had in mind when I suggested moving this code 
to libgcc.a instead. Referring
to a library symbol in this way makes me nervous.



An alternative is to leave the posix_memalign declaration there for C
as is, but remove it for C++.

(The namespace issue, I think, is mostly relevant for C, and even there
it should not cause problems in practice; g++ defines _GNU_SOURCE, so
stdlib.h does not provide a clean namespace anyway.

But the incompatible exception specifier can easily break in C++ with
-pedantic-errors, and removing the declaration should work in practice
because _GNU_SOURCE makes posix_memalign visible in stdlib.h.)




Re: [AArch64] Rework ARMv8.1 command line options.

2015-11-16 Thread Andrew Pinski
On Mon, Nov 16, 2015 at 8:31 AM, Matthew Wahab
 wrote:
> Hello,
>
> The command line options for target selection allow ARMv8.1 extensions
> to be individually enabled/disabled. They also allow the extensions to
> be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
> architecture which requires all extensions to be enabled and doesn't make
> them available for ARMv8.
>
> This patch removes the options for the individual ARMv8.1 extensions
> except for +lse. This means that setting -march=armv8.1-a will enable
> all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
> be used with -march=armv8.
>
> The exception to this is +lse since there may be existing code expecting
> to be built with -march=armv8-a+lse. Note that +crc, which is enabled by
> -march=armv8.1-a, is still an option for -march=armv8-a.
>
> This patch depends on the patch series
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02429.html.
>
> Tested aarch64-none-elf with cross-compiled check-gcc and
> aarch64-none-linux-gnu with native bootstrap and make check.

I like this patch.  Note I was soon going to submit patches to change
thunderx over to 8.1-a and add thunderxt88pass1, which is 8-a.
But I need a few more patches before I get to that.

Thanks,
Andrew


>
> Ok for trunk?
> Matthew
>
> gcc/
> 2015-11-16  Matthew Wahab  
>
> * config/aarch64/aarch64-option-extensions.def: Remove
> AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
> "rdma".
> * config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
> (AARCH64_FL_LOR): Remove.
> (AARCH64_FL_RDMA): Remove.
> (AARCH64_FL_V8_1): New.
> (AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
> and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
> (AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
> * doc/invoke.texi (AArch64 - Feature Modifiers): Remove "pan",
> "lor" and "rdma".


Re: [Patch ARM] Add support for Cortex-A35

2015-11-16 Thread Kyrill Tkachov


On 16/11/15 14:42, James Greenhalgh wrote:

Hi,

This patch adds support to the ARM back-end for the Cortex-A35
processor, as recently announced by ARM. The ARM Cortex-A35 provides
full support for the ARMv8-A architecture, including the CRC extension,
with optional Advanced-SIMD and Floating-Point support. We therefore set
feature flags for this CPU to FL_FOR_ARCH8A and FL_CRC32 and FL_LDSCHED,
in the same fashion as Cortex-A53 and Cortex-A57. While the Cortex-A35
has dual issue capabilities, we model it with an issue rate of one, with
the expectation that this will give better schedules when using the
Cortex-A53 pipeline model.

Bootstrapped with --with-tune=cortex-a35 with no issues.

I'm sorry to have this upstream a little late for the close of Stage 1,
I wanted to wait for binutils support to be committed. This happened
on Thursday [1]. If it is OK with the ARM maintainers, I'd like to get
this in to GCC 6.

OK?

Thanks,
James

[1]: https://sourceware.org/ml/binutils-cvs/2015-11/msg00065.html

---
2015-11-16  James Greenhalgh  

* config/arm/arm-cores.def (cortex-a35): New.
* config/arm/arm.c (arm_cortex_a35_tune): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add cortex-a35.
* config/arm/t-aprofile: Likewise.
* doc/invoke.texi (-mcpu): Likewise.



Ok.
Thanks,
Kyrill



Re: [PATCH] Implement GOMP_OFFLOAD_unload_image in intelmic plugin

2015-11-16 Thread Ilya Verbin
On Tue, Sep 08, 2015 at 22:41:17 +0300, Ilya Verbin wrote:
> This patch supports unloading of target images from the device.
> Unfortunately __offload_unregister_image requires the whole descriptor for
> unloading, which must contain target code inside, for this reason the plugin
> keeps descriptors for all offloaded images in memory.
> Also the patch removes useless variable names, intended for debug purposes.
> Regtested with make check-target-libgomp and using a dlopen/dlclose test.
> OK for trunk?
> 
> liboffloadmic/
>   * plugin/libgomp-plugin-intelmic.cpp (struct TargetImageDesc): New.
>   (ImgDescMap): New typedef.
>   (image_descriptors): New static var.
>   (init): Allocate image_descriptors.
>   (offload): Remove vars2 argument.  Pass NULL to __offload_offload1
>   instead of vars2.
>   (unregister_main_image): New static function.
>   (register_main_image): Call unregister_main_image at exit.
>   (GOMP_OFFLOAD_init_device): Print device number, fix offload args.
>   (GOMP_OFFLOAD_fini_device): Likewise.
>   (get_target_table): Remove vd1g and vd2g, don't pass them to offload.
>   (offload_image): Remove declaration of the struct TargetImage.
>   Free table.  Insert new descriptor into image_descriptors.
>   (GOMP_OFFLOAD_unload_image): Call __offload_unregister_image, free
>   the corresponding descriptor, and remove it from address_table and
>   image_descriptors.
>   (GOMP_OFFLOAD_alloc): Print device number, remove vd1g.
>   (GOMP_OFFLOAD_free): Likewise.
>   (GOMP_OFFLOAD_host2dev): Print device number, remove vd1g and vd2g.
>   (GOMP_OFFLOAD_dev2host): Likewise.
>   (GOMP_OFFLOAD_run): Print device number, remove vd1g.
>   * plugin/offload_target_main.cpp (__offload_target_table_p1): Remove
>   vd2, don't pass it to __offload_target_enter.
>   (__offload_target_table_p2): Likewise.
>   (__offload_target_alloc): Likewise.
>   (__offload_target_free): Likewise.
>   (__offload_target_host2tgt_p1): Likewise.
>   (__offload_target_host2tgt_p2): Likewise.
>   (__offload_target_tgt2host_p1): Likewise.
>   (__offload_target_tgt2host_p2): Likewise.
>   (__offload_target_run): Likewise.

Ping?  Rebased and retested.


diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index 772e198..6ee585e 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -65,6 +65,17 @@ typedef std::vector DevAddrVect;
 /* Addresses for all images and all devices.  */
 typedef std::map ImgDevAddrMap;
 
+/* Image descriptor needed by __offload_[un]register_image.  */
+struct TargetImageDesc {
+  int64_t size;
+  /* 10 characters is enough for max int value.  */
+  char name[sizeof ("lib00.so")];
+  char data[];
+} __attribute__ ((packed));
+
+/* Image descriptors, indexed by a pointer obtained from libgomp.  */
+typedef std::map ImgDescMap;
+
 
 /* Total number of available devices.  */
 static int num_devices;
@@ -76,6 +87,9 @@ static int num_images;
second key is number of device.  Contains a vector of pointer pairs.  */
 static ImgDevAddrMap *address_table;
 
+/* Descriptors of all images, registered in liboffloadmic.  */
+static ImgDescMap *image_descriptors;
+
 /* Thread-safe registration of the main image.  */
 static pthread_once_t main_image_is_registered = PTHREAD_ONCE_INIT;
 
@@ -156,6 +170,7 @@ init (void)
 
 out:
   address_table = new ImgDevAddrMap;
+  image_descriptors = new ImgDescMap;
   num_devices = _Offload_number_of_devices ();
 }
 
@@ -192,14 +207,13 @@ GOMP_OFFLOAD_get_num_devices (void)
 
 static void
 offload (const char *file, uint64_t line, int device, const char *name,
-int num_vars, VarDesc *vars, VarDesc2 *vars2, const void **async_data)
+int num_vars, VarDesc *vars, const void **async_data)
 {
   OFFLOAD ofld = __offload_target_acquire1 (&device, file, line);
   if (ofld)
 {
   if (async_data == NULL)
-   __offload_offload1 (ofld, name, 0, num_vars, vars, vars2, 0, NULL,
-   NULL);
+   __offload_offload1 (ofld, name, 0, num_vars, vars, NULL, 0, NULL, NULL);
   else
{
  OffloadFlags flags;
@@ -217,13 +231,27 @@ offload (const char *file, uint64_t line, int device, 
const char *name,
 }
 
 static void
+unregister_main_image ()
+{
+  __offload_unregister_image (&main_target_image);
+}
+
+static void
 register_main_image ()
 {
+  /* Do not check the return value, because old versions of liboffloadmic did
+ not have return values.  */
   __offload_register_image (&main_target_image);
 
   /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
  asynchronous task on target is completed.  */
   __offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
+
+  if (atexit (unregister_main_image) != 0)
+{
+  fprintf (stderr, "%s: atexit 

RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-16 Thread Ajit Kumar Agarwal


-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00 
> 2001
> From: Ajit Kumar Agarwal
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The 
> current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop.
> The register used is the number of live variables at the exit of the 
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only 
> the number of phi nodes at the loop header.  Changes are made to 
> represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal
>
>   * loop-invariant.c (compute_loop_liveness): New.
>   (determine_regs_used): New.
>   (find_invariants_to_move): Use of determine_regs_used.
>   * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>   arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
>>loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit agarwalajit...@xilinx.com
> ---
>   gcc/loop-invariant.c   | 72 
> +-
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>   }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +(reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>   }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.
> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +if (loop->aux == NULL)
> +  {
> +loop->aux = xcalloc (1, sizeof (struct loop_data));
> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> + }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops 
>>in the loop structure.  Order isn't particularly important AFAICT for 
>>this code.

Iterating over the loop structure is enough; we don't need to iterate over
the basic blocks.

> +   {
> + int  i;
> + edge e;
> + vec<edge> edges;
> + edges = get_loop_exit_edges (loop);
> + FOR_EACH_VEC_ELT (edges, i, e)
> + {
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines
>>DF_LR_OUT (e->src) and FD_LR_INT (e->dest))

I will incorporate this.

> + }
> +  }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
>>Looks like you introduced a typo.

>>I'd like to see testcases which show the change in # regs used 
>>computation helping generate better code. 

We need to measure this with a test case where the new variable created
for the loop invariant increases the register pressure, so that the cost
with respect to reg_used and new_regs leads to a spill and fetch and the
invariant movement is dropped.

Reproducing that environment in a test case seems to be a little
difficult, but I will try to identify test cases, extracted from the
benchmarks, where we got gains by the above method.


>>And  I'd also like to see some background information on why you think 
>>this is a more accurate measure for the number of registers used in the 
>>loop.  regs_used AFAICT is supposed to be an estimate of the registers 
>>

[committed] Use RECORD_OR_UNION_TYPE_P in c-family/

2015-11-16 Thread Marek Polacek
While waiting for a build to finish I figured I might as well do this.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2015-11-16  Marek Polacek  

* c-ada-spec.c (dump_ada_template): Use RECORD_OR_UNION_TYPE_P.
* c-common.c (c_common_get_alias_set): Likewise.
(handle_visibility_attribute): Likewise.

diff --git gcc/c-family/c-ada-spec.c gcc/c-family/c-ada-spec.c
index 216bd6f..e85c1a9 100644
--- gcc/c-family/c-ada-spec.c
+++ gcc/c-family/c-ada-spec.c
@@ -1758,8 +1758,7 @@ dump_ada_template (pretty_printer *buffer, tree t, int 
spc)
 
   /* We are interested in concrete template instantiations only: skip
 partially specialized nodes.  */
-  if ((TREE_CODE (instance) == RECORD_TYPE
-  || TREE_CODE (instance) == UNION_TYPE)
+  if (RECORD_OR_UNION_TYPE_P (instance)
  && cpp_check && cpp_check (instance, HAS_DEPENDENT_TEMPLATE_ARGS))
continue;
 
diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index f8ccb6d..a062f81 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -5344,19 +5344,15 @@ c_common_get_alias_set (tree t)
   TREE_CODE (t2) == POINTER_TYPE;
   t2 = TREE_TYPE (t2))
;
-  if (TREE_CODE (t2) != RECORD_TYPE
- && TREE_CODE (t2) != ENUMERAL_TYPE
- && TREE_CODE (t2) != QUAL_UNION_TYPE
- && TREE_CODE (t2) != UNION_TYPE)
+  if (!RECORD_OR_UNION_TYPE_P (t2)
+ && TREE_CODE (t2) != ENUMERAL_TYPE)
return -1;
   if (TYPE_SIZE (t2) == 0)
return -1;
 }
   /* These are the only cases that need special handling.  */
-  if (TREE_CODE (t) != RECORD_TYPE
+  if (!RECORD_OR_UNION_TYPE_P (t)
   && TREE_CODE (t) != ENUMERAL_TYPE
-  && TREE_CODE (t) != QUAL_UNION_TYPE
-  && TREE_CODE (t) != UNION_TYPE
   && TREE_CODE (t) != POINTER_TYPE)
 return -1;
   /* Undefined? */
@@ -8644,7 +8640,7 @@ handle_visibility_attribute (tree *node, tree name, tree 
args,
 {
   if (TREE_CODE (*node) == ENUMERAL_TYPE)
/* OK */;
-  else if (TREE_CODE (*node) != RECORD_TYPE && TREE_CODE (*node) != 
UNION_TYPE)
+  else if (!RECORD_OR_UNION_TYPE_P (*node))
{
  warning (OPT_Wattributes, "%qE attribute ignored on non-class types",
   name);

Marek


RE: [RFC, Patch]: Optimized changes in the register used inside loop for LICM and IVOPTS.

2015-11-16 Thread Ajit Kumar Agarwal

Sorry I missed out some of the points in earlier mail which is given below.

-Original Message-
From: Ajit Kumar Agarwal 
Sent: Monday, November 16, 2015 11:07 PM
To: 'Jeff Law'; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.



-Original Message-
From: Jeff Law [mailto:l...@redhat.com]
Sent: Friday, November 13, 2015 11:44 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC, Patch]: Optimized changes in the register used inside loop 
for LICM and IVOPTS.

On 10/07/2015 10:32 PM, Ajit Kumar Agarwal wrote:

>
> 0001-RFC-Patch-Optimized-changes-in-the-register-used-ins.patch
>
>
>  From f164fd80953f3cffd96a492c8424c83290cd43cc Mon Sep 17 00:00:00
> 2001
> From: Ajit Kumar Agarwal
> Date: Wed, 7 Oct 2015 20:50:40 +0200
> Subject: [PATCH] [RFC, Patch]: Optimized changes in the register used inside
>   loop for LICM and IVOPTS.
>
> Changes are done in the Loop Invariant(LICM) at RTL level and also the 
> Induction variable optimization based on SSA representation. The 
> current logic used in LICM for register used inside the loops is 
> changed. The Live Out of the loop latch node and the Live in of the 
> destination of the exit nodes is used to set the Loops Liveness at the exit 
> of the Loop.
> The register used is the number of live variables at the exit of the 
> Loop calculated above.
>
> For Induction variable optimization on tree SSA representation, the 
> register used logic is based on the number of phi nodes at the loop 
> header to represent the liveness at the loop.  Current Logic used only 
> the number of phi nodes at the loop header.  Changes are made to 
> represent the phi operands also live at the loop. Thus number of phi 
> operands also gets incremented in the number of registers used.
>
> ChangeLog:
> 2015-10-09  Ajit Agarwal
>
>   * loop-invariant.c (compute_loop_liveness): New.
>   (determine_regs_used): New.
>   (find_invariants_to_move): Use of determine_regs_used.
>   * tree-ssa-loop-ivopts.c (determine_set_costs): Consider the phi
>   arguments for register used.
>>I think Bin rejected the tree-ssa-loop-ivopts change.  However, the 
>>loop-invariant change is still pending, right?


>
> Signed-off-by:Ajit agarwalajit...@xilinx.com
> ---
>   gcc/loop-invariant.c   | 72 
> +-
>   gcc/tree-ssa-loop-ivopts.c |  4 +--
>   2 files changed, 60 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 
> 52c8ae8..e4291c9 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1413,6 +1413,19 @@ set_move_mark (unsigned invno, int gain)
>   }
>   }
>
> +static int
> +determine_regs_used()
> +{
> +  unsigned int j;
> +  unsigned int reg_used = 2;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (curr_loop)->regs_live, 0, j, bi)
> +(reg_used) ++;
> +
> +  return reg_used;
> +}
>>Isn't this just bitmap_count_bits (regs_live) + 2?


> @@ -2055,9 +2057,43 @@ calculate_loop_reg_pressure (void)
>   }
>   }
>
> -
> +static void
> +calculate_loop_liveness (void)
>>Needs a function comment.

I will incorporate the above comments.
> +{
> +  basic_block bb;
> +  struct loop *loop;
>
> -/* Move the invariants out of the loops.  */
> +  FOR_EACH_LOOP (loop, 0)
> +if (loop->aux == NULL)
> +  {
> +loop->aux = xcalloc (1, sizeof (struct loop_data));
> +bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
> + }
> +
> +  FOR_EACH_BB_FN (bb, cfun)
>>Why loop over blocks here?  Why not just iterate through all the loops 
>>in the loop structure.  Order isn't particularly important AFAICT for 
>>this code.

Iterating over the loop structure is enough; we don't need to iterate over
the basic blocks.

> +   {
> + int  i;
> + edge e;
> + vec<edge> edges;
> + edges = get_loop_exit_edges (loop);
> + FOR_EACH_VEC_ELT (edges, i, e)
> + {
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT(e->src));
> +   bitmap_ior_into (&LOOP_DATA (loop)->regs_live, 
> + DF_LR_IN(e->dest));
>>Space before the open-paren in the previous two lines DF_LR_OUT 
>>(e->src) and FD_LR_INT (e->dest))

I will incorporate this.

> + }
> +  }
> +  }
> +}
> +
> +/* Move the invariants  ut of the loops.  */
>>Looks like you introduced a typo.

>>I'd like to see testcases which show the change in # regs used 
>>computation helping generate better code.

We need to measure this with a test case where the new variable created
for the loop invariant increases the register pressure, so that the cost
with respect to reg_used and new_regs leads to a spill and fetch and the
invariant movement is dropped.

Reproducing that environment in a test case seems to be a little
difficult, but I will try to identify test cases, extracted from the
benchmarks, where we got gains by the above method.

Re: [PATCH 00/16] Unit tests framework (v3)

2015-11-16 Thread Bernd Schmidt
So Jeff and I just had a chat, and we came up with some thoughts about 
how to proceed. I think we both agree that it would be good to have a 
special testing backend, along with frontends designed to be able to 
read in gimple or rtl that can be operated on. That's more of a 
long-term thing.


For some of the simpler infrastructure tests such as the ones in this 
patch kit (bitmap, vec or wide-int functionality testing and such), we 
had the idea of putting these into every ENABLE_CHECKING compiler, and 
run them after building stage1, controlled by a -fself-test flag. It's 
better to detect such basic failures early rather than complete a full 
bootstrap and test cycle. It also keeps the tests alongside the rest of 
the implementation, which I consider desirable for such relatively 
simple data structures.


Thoughts?


Bernd


Re: [2/2] i386 ROP mitigation

2015-11-16 Thread Uros Bizjak
On Fri, Nov 13, 2015 at 9:47 PM, Bernd Schmidt  wrote:
> This adds a new -mmitigate-rop option to the i386 port. The idea is to
> mitigate against certain forms of attack called "return oriented
> programming" that some of our security folks are concerned about. The basic
> idea is that the stack gets smashed and then, just by chaining function
> returns and some preceding instructions, you can have a Turing-complete
> program to perform an attack. The function returns can be either normal,
> intended ones that are part of the program, or parts of the instruction
> encoding of other sequences.
>
> This patch is a small step towards preventing this kind of attack. I have a
> few more steps queued (not quite ready for stage 1), but additional work
> will be necessary to give reasonable protection. Here, I'm only concerned
> with modr/m bytes, and avoiding certain specific opcodes that encode a
> "return" instruction. Two strategies are available: rename entire chains of
> registers, or insert extra reg-reg copies if there is a free scratch
> register.
>
> The modrm byte computation is not a full one, it is only intended to be able
> to tell whether a value is risky or not.
>
> This was bootstrapped and tested on x86_64-linux. I thought I'd also done a
> full test with -mmitigate-rop forced always on, but a typo thwarted that. An
> earlier set of test results looked reasonable but I did not have a baseline
> to compare against, so I'll be retesting this.

> * regrename.h (struct du_head): Add target_data_1 and target_data_2
> fields.
> * regrename.c (create_new_chain): Clear entire struct after allocating.
>
> * config/i386/i386.opt (mmitigate-rop): New option.
> * doc/invoke.texi (mmitigate-rop): Document.
> * config/i386/i386.c: Include "regrename.h".
> (ix86_rop_should_change_byte_p, reg_encoded_number,
> ix86_get_modrm_for_rop, set_rop_modrm_reg_bits, ix86_mitigate_rop): New
> static functions.
> (ix86_reorg): Call ix86_mitigate_rop if -fmitigate-rop.
> * config/i386/i386.md (attr "modrm_class"): New.
> (cmp_ccno_1, mov_xor, movstrict_xor,
> x86_movcc_0_m1, x86_movcc_0_m1_se,
> x86_movcc_0_m1_neg): Override modrm_class attribute.

LGTM, and since the whole thing is protected by a -mmitigate-rop it
looks safe for mainline SVN.

Thanks,
Uros.


Re: [PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

2015-11-16 Thread Bernd Schmidt

On 11/16/2015 03:07 PM, Kyrill Tkachov wrote:


I've explained in the comments in the patch what's going on, but the
short version is that trying to change the destination of a defining
insn that feeds into an extend insn is not valid if the defining insn
doesn't feed directly into the extend insn. In the ree pass the only
way this can happen is if there is an intermediate conditional move
that the pass tries to handle in a special way. An equivalent fix
would have been to check on that path (when copy_needed in
combine_reaching_defs is true) that the state->copies_list vector
(which contains the conditional move insns feeding into the extend
insn) is empty.


I ran this through gdb, and I think I see what's going on. For 
reference, here's a comment from the source:


  /* Considering transformation of
 (set (reg1) (expression))
 ...
 (set (reg2) (any_extend (reg1)))

 into

 (set (reg2) (any_extend (expression)))
 (set (reg1) (reg2))
 ...  */

I was thinking that another possible fix would be to also check 
!reg_used_between_p for reg1 to ensure it's not used. I'm thinking this 
might be a little clearer - what is your opinion?


The added comment could lead to some confusion since it's placed in 
front of an existing if statement that also tests a different condition. 
Also, if we go with your fix,



+ || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn


Shouldn't this really be !rtx_equal_p?


Bernd


Re: [PATCH] Improve BB vectorization dependence analysis

2015-11-16 Thread Alan Lawrence

On 09/11/15 12:55, Richard Biener wrote:


Currently BB vectorization computes all dependences inside a BB
region and fails all vectorization if it cannot handle some of them.

This is obviously not needed - BB vectorization can restrict the
dependence tests to those that are needed to apply the load/store
motion effectively performed by the vectorization (sinking all
participating loads/stores to the place of the last one).

With restructuring it that way it's also easy to not give up completely
but only for the SLP instance we cannot vectorize (this gives
a slight bump in my SPEC CPU 2006 testing to 756 vectorized basic
block regions).

But first and foremost this patch is to reduce the dependence analysis
cost and somewhat mitigate the compile-time effects of the first patch.

For fixing PR56118 only a cost model issue remains.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-09  Richard Biener  

PR tree-optimization/56118
* tree-vectorizer.h (vect_find_last_scalar_stmt_in_slp): Declare.
* tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Export.
* tree-vect-data-refs.c (vect_slp_analyze_node_dependences): New
function.
(vect_slp_analyze_data_ref_dependences): Instead of computing
all dependences of the region DRs just analyze the code motions
SLP vectorization will perform.  Remove SLP instances that
cannot have their store/load motions applied.
(vect_analyze_data_refs): Allow DRs without a vectype
in BB vectorization.

* gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Adjust.


Since this, I've been seeing an ICE on gfortran.dg/vect/vect-9.f90 on both
aarch64-none-linux-gnu and arm-none-linux-gnueabihf:


spawn /home/alalaw01/build/gcc/testsuite/gfortran4/../../gfortran 
-B/home/alalaw01/build/gcc/testsuite/gfortran4/../../ 
-B/home/alalaw01/build/aarch64-unknown-linux-gnu/./libgfortran/ 
/home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O -O2 -ftree-vectorize 
-fvect-cost-model=unlimited -fdump-tree-vect-details -Ofast -S -o vect-9.s
/home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90:5:0: Error: 
definition in block 13 follows the use for SSA_NAME: _339 in statement:

vectp.156_387 = &*cc_36(D)[_339];
/home/alalaw01/gcc/gcc/testsuite/gfortran.dg/vect/vect-9.f90:5:0: internal 
compiler error: verify_ssa failed

0xcfc61b verify_ssa(bool, bool)
../../gcc-fsf/gcc/tree-ssa.c:1039
0xa2fc0b execute_function_todo
../../gcc-fsf/gcc/passes.c:1952
0xa30393 do_per_function
../../gcc-fsf/gcc/passes.c:1632
0xa3058f execute_todo
../../gcc-fsf/gcc/passes.c:2000
Please submit a full bug report...
FAIL: gfortran.dg/vect/vect-9.f90   -O  (internal compiler error)
FAIL: gfortran.dg/vect/vect-9.f90   -O  (test for excess errors)

Still there (on aarch64) at r230329.

--Alan



Re: [PATCH 00/16] Unit tests framework (v3)

2015-11-16 Thread David Malcolm
On Mon, 2015-11-16 at 19:17 +0100, Bernd Schmidt wrote:
> So Jeff and I just had a chat, and we came up with some thoughts about 
> how to proceed. I think we both agree that it would be good to have a 
> special testing backend, along with frontends designed to be able to 
> read in gimple or rtl that can be operated on. That's more of a 
> long-term thing.

(nods)  FWIW, I'm interested in trying to get the gimple FE into gcc 7.

> For some of the simpler infrastructure tests such as the ones in this 
> patch kit (bitmap, vec or wide-int functionality testing and such), we 
> had the idea of putting these into every ENABLE_CHECKING compiler, and 
> run them after building stage1, controlled by a -fself-test flag. It's 
> better to detect such basic failures early rather than complete a full 
> bootstrap and test cycle. It also keeps the tests alongside the rest of 
> the implementation, which I consider desirable for such relatively 
> simple data structures.

Would it be reasonable to run them at each stage?  My hope is that they
will be fast.

If we're building the tests into the compiler itself, guarded by
  #if ENABLE_CHECKING
then presumably it makes sense to put the tests directly into the
pertinent source files? (rather than in a "foo-tests.c" file).  Possibly
even to interleave them, to be next to the code in question.

Given that this patch kit has seen a fair amount of discussion, and
parts of it are already approved, and that it's designed to improve our
test coverage, is it reasonable to continue pursuing this within stage
3?  (I hope so)  Should I attempt a patch for the above? (once I've
fixed the AIX bootstrap issue, of course)

Any thoughts on embedded gtest vs external gtest vs building our own?
If we're embedding, it may make most sense to build our own minimal
implementation, with a similar API (to avoid relying on the C++ stdlib,
which gtest does; that was the most awkward part of dealing with it,
iirc); this would be simpler, I suspect.


Dave




Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-16 Thread Pedro Alves
On 11/10/2015 01:10 PM, Jonathan Wakely wrote:
> On 06/11/15 09:59 +, Pedro Alves wrote:
>> On 11/06/2015 01:56 AM, Jonathan Wakely wrote:
>>> On 5 November 2015 at 23:31, Daniel Gutson
>>
 The issue is, as I understand it, to do the actual work of operator
 new, i.e. allocate memory. It would force
 us to copy most of the original code of operator new,
 which may change in new versions of the
 STL, forcing us to keep it updated.
>>>
>>> It can just call malloc, and the replacement operator delete can call free.
>>>
>>> That is very unlikely to need to change (which is corroborated by the
>>> fact that the default definitions in libsupc++ change very rarely).
>>
>> Or perhaps libsupc++ could provide the default operator new under
>> a __default_operator_new alias or some such, so that the user-defined
>> replacement can fallback to calling it.  Likewise for op delete.
> 
> That could be useful, please file an enhancement request in bugzilla
> if you'd like that done.
> 

I'll leave that to Daniel/Aurelio.

Thanks,
Pedro Alves



[C++ PATCH] Fix ICE with NOP_EXPRs in check_case_bounds (PR c++/68362)

2015-11-16 Thread Marek Polacek
This patch ought to fix a regression introduced by the C++ delayed folding
merge.  What happens here for this testcase is that in c_add_case_label we
convert "0" to NOP_EXPR: "(const A) 0" in convert_and_check.  We pass this
to check_case_bounds which only expects INTEGER_CSTs.  Since we can't use
maybe_constant_value or fold_simple, I decided to try to use just fold (but
just STRIP_SIGN_NOPS should work as well).

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-11-16  Marek Polacek  

PR c++/68362
* c-common.c (check_case_bounds): Fold low and high cases.

* g++.dg/delayedfold/switch-1.C: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index f8ccb6d..03c90f7 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -3769,6 +3769,10 @@ check_case_bounds (location_t loc, tree type, tree orig_type,
   min_value = TYPE_MIN_VALUE (orig_type);
   max_value = TYPE_MAX_VALUE (orig_type);
 
+  /* We'll really need integer constants here.  */
+  case_low = fold (case_low);
+  case_high = fold (case_high);
+
   /* Case label is less than minimum for type.  */
   if (tree_int_cst_compare (case_low, min_value) < 0
   && tree_int_cst_compare (case_high, min_value) < 0)
diff --git gcc/testsuite/g++.dg/delayedfold/switch-1.C gcc/testsuite/g++.dg/delayedfold/switch-1.C
index e69de29..302da23 100644
--- gcc/testsuite/g++.dg/delayedfold/switch-1.C
+++ gcc/testsuite/g++.dg/delayedfold/switch-1.C
@@ -0,0 +1,19 @@
+// PR c++/68362
+// { dg-do compile { target c++11 } }
+
+enum class A { foo };
+enum E { bar };
+
+void
+fn1 (const A a)
+{
+  switch (a)
+  case A::foo:;
+}
+
+void
+fn2 (E e)
+{
+  switch (e)
+  case bar:;
+}

Marek


Re: [ARM] Fix PR middle-end/65958

2015-11-16 Thread Eric Botcazou
> More comments inline.

Revised version attached, which addresses all your comments and in particular 
removes the

+#if PROBE_INTERVAL > 4096
+#error Cannot use indexed addressing mode for stack probing
+#endif

compile-time assertion.  It generates the same code for PROBE_INTERVAL == 4096 
as before and it generates code that can be assembled for 8192.

Tested on Aarch64/Linux, OK for the mainline?


2015-11-16  Tristan Gingold  
Eric Botcazou  

PR middle-end/65958
* config/aarch64/aarch64-protos.h (aarch64_output_probe_stack_range):
Declare.
* config/aarch64/aarch64.md: Declare UNSPECV_BLOCKAGE and
UNSPEC_PROBE_STACK_RANGE.
(blockage): New instruction.
(probe_stack_range): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_probe_stack_range): New
function.
(aarch64_output_probe_stack_range): Likewise.
(aarch64_expand_prologue): Invoke aarch64_emit_probe_stack_range if
static builtin stack checking is enabled.
* config/aarch64/aarch64-linux.h (STACK_CHECK_STATIC_BUILTIN):
Define.


2015-11-16  Eric Botcazou  

* gcc.target/aarch64/stack-checking.c: New test.

-- 
Eric Botcazou/* { dg-do run { target { *-*-linux* } } } */
/* { dg-options "-fstack-check" } */

int main(void)
{
  char *p;
  if (1)
{
  char i[48];
  p = __builtin_alloca(8);
  p[0] = 1;
}

  if (1)
{
  char i[48], j[64];
  j[32] = 0;
}

  return !p[0];
}
Index: config/aarch64/aarch64-linux.h
===
--- config/aarch64/aarch64-linux.h	(revision 230397)
+++ config/aarch64/aarch64-linux.h	(working copy)
@@ -88,4 +88,7 @@
 #undef TARGET_BINDS_LOCAL_P
 #define TARGET_BINDS_LOCAL_P default_binds_local_p_2
 
+/* Define this to be nonzero if static stack checking is supported.  */
+#define STACK_CHECK_STATIC_BUILTIN 1
+
 #endif  /* GCC_AARCH64_LINUX_H */
Index: config/aarch64/aarch64-protos.h
===
--- config/aarch64/aarch64-protos.h	(revision 230397)
+++ config/aarch64/aarch64-protos.h	(working copy)
@@ -340,6 +340,7 @@ void aarch64_asm_output_labelref (FILE *
 void aarch64_cpu_cpp_builtins (cpp_reader *);
 void aarch64_elf_asm_named_section (const char *, unsigned, tree);
 const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
+const char * aarch64_output_probe_stack_range (rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode, const char *);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c	(revision 230397)
+++ config/aarch64/aarch64.c	(working copy)
@@ -62,6 +62,7 @@
 #include "sched-int.h"
 #include "cortex-a57-fma-steering.h"
 #include "target-globals.h"
+#include "common/common-target.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2151,6 +2152,169 @@ aarch64_libgcc_cmp_return_mode (void)
   return SImode;
 }
 
+#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
+
+/* We use the 12-bit shifted immediate arithmetic instructions so values
+   must be multiple of (1 << 12), i.e. 4096.  */
+#if (PROBE_INTERVAL % 4096) != 0
+#error Cannot use simple address calculation for stack probing
+#endif
+
+/* The pair of scratch registers used for stack probing.  */
+#define PROBE_STACK_FIRST_REG  9
+#define PROBE_STACK_SECOND_REG 10
+
+/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
+   inclusive.  These are offsets from the current stack pointer.  */
+
+static void
+aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
+{
+  rtx reg1 = gen_rtx_REG (Pmode, PROBE_STACK_FIRST_REG);
+
+  /* See the same assertion on PROBE_INTERVAL above.  */
+  gcc_assert ((first % 4096) == 0);
+
+  /* See if we have a constant small number of probes to generate.  If so,
+ that's the easy case.  */
+  if (size <= PROBE_INTERVAL)
+{
+  const HOST_WIDE_INT base = ROUND_UP (size, 4096);
+  emit_set_insn (reg1,
+		 plus_constant (Pmode, stack_pointer_rtx,
+	   -(first + base)));
+  emit_stack_probe (plus_constant (Pmode, reg1, base - size));
+}
+
+  /* The run-time loop is made up of 8 insns in the generic case while the
+ compile-time loop is made up of 4+2*(n-2) insns for n # of intervals.  */
+  else if (size <= 4 * PROBE_INTERVAL)
+{
+  HOST_WIDE_INT i, rem;
+
+  emit_set_insn (reg1,
+		 plus_constant (Pmode, stack_pointer_rtx,
+	   -(first + PROBE_INTERVAL)));
+  emit_stack_probe (reg1);
+
+  /* Probe at FIRST + N * PROBE_INTERVAL for values of N from 2 until
+	 it exceeds SIZE.  If only two probes are needed, this will not
+	 generate any code.  Then probe at FIRST + SIZE.  */
+  for (i = 2 * PROBE_INTERVAL; i < size; i += PROBE_INTERVAL)
+	{
+	  emit

Re: [C++ PATCH] Fix ICE with NOP_EXPRs in check_case_bounds (PR c++/68362)

2015-11-16 Thread Jason Merrill

OK.

Jason


Re: Add null identifiers to genmatch

2015-11-16 Thread Pedro Alves
Hi Jeff,

(Sorry I didn't reply sooner, I was OOO.)

On 11/08/2015 11:17 PM, Jeff Law wrote:
> On 11/07/2015 07:31 AM, Pedro Alves wrote:
>> Hi Richard,
>>
>> Passerby comment below.
>>
>> On 11/07/2015 01:21 PM, Richard Sandiford wrote:
>>> -/* Lookup the identifier ID.  */
>>> +/* Lookup the identifier ID.  Allow "null" if ALLOW_NULL.  */
>>>
>>>   id_base *
>>> -get_operator (const char *id)
>>> +get_operator (const char *id, bool allow_null = false)
>>>   {
>>> +  if (allow_null && strcmp (id, "null") == 0)
>>> +return null_id;
>>> +
>>> id_base tem (id_base::CODE, id);
>>
>> Boolean params are best avoided if possible, IMO.  In this case,
>> it seems this could instead be a new wrapper function, like:
> This hasn't been something we've required for GCC.  I've come across 
> this recommendation a few times over the last several months as I 
> continue to look at refactoring and best practices for codebases such as 
> GCC.

FWIW, GDB is in progress of converting to C++ and we're pulling in
GCC's C++ conventions as reference, so I thought I'd see what the GCC
community thought here.

> 
> By encoding the boolean in the function's signature, it (IMHO) does make 
> the code a bit easier to read, primarily because you don't have to go 
> lookup the tense of the boolean).  The problem is when the boolean is 
> telling us some property an argument, but there's more than one argument 
> and other similar situations.

There's more than one way to avoid boolean parameters.
If you have more than one of those, the boolean trap is even
worse IMO.  In such cases, enums can help make things clearer for
the reader.  E.g.:

 foo (true, false);
 foo (false, true);

vs:

 foo (whatever::value1, bar::in_style);

I think that internal helper functions defined close to
their usage end up being OK to use booleans, while if you have
a API consumed by other modules it pays off more to try to
avoid the boolean trap.

Anyway, sorry for the noise.  I know there are bigger fish to fry.

Thanks,
Pedro Alves



Re: Add null identifiers to genmatch

2015-11-16 Thread Jeff Law

On 11/16/2015 01:15 PM, Pedro Alves wrote:

Hi Jeff,

(Sorry I didn't reply sooner, I was OOO.)

No worries.



Boolean params are best avoided if possible, IMO.  In this case,
it seems this could instead be a new wrapper function, like:

This hasn't been something we've required for GCC.  I've come across
this recommendation a few times over the last several months as I
continue to look at refactoring and best practices for codebases such as
GCC.


FWIW, GDB is in progress of converting to C++ and we're pulling in
GCC's C++ conventions as reference, so I thought I'd see what the GCC
community thought here.
FWIW, I often look at GCC's conventions, google's conventions, then 
start doing google searches around this kind of stuff.  And as always, 
the quality of the latter can vary wildly :-)






By encoding the boolean in the function's signature, it (IMHO) does make
the code a bit easier to read, primarily because you don't have to go
lookup the tense of the boolean).  The problem is when the boolean is
telling us some property an argument, but there's more than one argument
and other similar situations.


There's more than one way to avoid boolean parameters.
If you have more than one of those, the boolean trap is even
worse IMO.  In such cases, enums can help make things clearer for
the reader.  E.g.:

  foo (true, false);
  foo (false, true);

vs:

  foo (whatever::value1, bar::in_style);
Yea, I saw the latter at some point as well.  In general I don't think 
we've used enums as well as we could/should have in GCC through the years.





I think that internal helper functions defined close to
their usage end up being OK to use booleans, while if you have
a API consumed by other modules it pays off more to try to
avoid the boolean trap.

Anyway, sorry for the noise.  I know there are bigger fish to fry.

Not noise at all -- no need to apologize.

jeff


[PATCH] Fix uninitialized src_range within c_expr (Re: libcpp/C FE source range patch committed (r230331))

2015-11-16 Thread David Malcolm
On Sat, 2015-11-14 at 23:32 -0500, David Malcolm wrote:
> On Sat, 2015-11-14 at 09:50 -0500, David Edelsohn wrote:
> > This patch causes numerous new testsuite failure on AIX caused by the
> > compiler crashing during compilation, e.g.
> > 
> > gcc.c-torture/execute/20020206-1.c
> > 
> > in GCC libcpp
> > 
> > 991   linemap_assert (line >= LINEMAPS_MACRO_LOWEST_LOCATION (set));
> > 
> > (gdb) where
> > #0  _Z11fancy_abortPKciS0_ (
> > file=0x11296dc0
> > <_GLOBAL__F__ZN9text_info9set_rangeEj12source_rangeb+3056>
> > "/nasfarm/edelsohn/src/src/libcpp/line-map.c", line=991,
> > function=0x11296f30
> > <_GLOBAL__F__ZN9text_info9set_rangeEj12source_rangeb+3424>
> > "linemap_macro_map_lookup")
> > at /nasfarm/edelsohn/src/src/gcc/diagnostic.c:1332
> > #1  0x100169b4 in _Z14linemap_lookupP9line_mapsj (set=0x7000, line=991)
> > at /nasfarm/edelsohn/src/src/libcpp/line-map.c:991
> > #2  0x100188f8 in
> > _Z40linemap_unwind_to_first_non_reserved_locP9line_mapsjPPK8line_map
> > (set=0x7000, loc=991, map=0x0)
> > at /nasfarm/edelsohn/src/src/libcpp/line-map.c:1629
> > #3  0x100753c8 in _ZL17expand_location_1jb (loc=889323520,
> > expansion_point_p=false) at /nasfarm/edelsohn/src/src/gcc/input.c:158
> > #4  0x10076488 in _Z48linemap_client_expand_location_to_spelling_pointj (
> > loc=991) at /nasfarm/edelsohn/src/src/gcc/input.c:751
> > #5  0x10019928 in _ZN13rich_location9add_rangeEjjb (this=0x2ff21cd8,
> > start=991, finish=889323520, show_caret_p=true)
> > at /nasfarm/edelsohn/src/src/libcpp/line-map.c:2012
> > #6  0x10019a54 in _ZN13rich_locationC2EP9line_mapsj (this=0x2ff21cd8,
> > set=0x3df, loc=287928112)
> > at /nasfarm/edelsohn/src/src/libcpp/line-map.c:2024
> > #7  0x1000ed84 in _Z7warningiPKcz (opt=164,
> > gmsgid=0x11488d18
> > <_GLOBAL__F__Z20prepare_call_addressP9tree_nodeP7rtx_defS2-
> > _PS2_ii+3752> "function call has aggregate value")
> > at /nasfarm/edelsohn/src/src/gcc/diagnostic.c:1003
> > #8  0x1067ebac in _Z11expand_callP9tree_nodeP7rtx_defi (exp=0x700dcf20,
> > target=0x700ec080, ignore=0) at 
> > /nasfarm/edelsohn/src/src/gcc/calls.c:2476
> > #9  0x10406858 in
> > _Z18expand_expr_real_1P9tree_nodeP7rtx_def12machine_mode15expand_modifierPS2_b
> > (exp=0x700dcf20, target=0x700ec080, tmode=BLKmode,
> > modifier=EXPAND_NORMAL, alt_rtl=0x17, inner_reference_p=false)
> > at /nasfarm/edelsohn/src/src/gcc/expr.c:10581
> > #10 0x104158c0 in _Z22store_expr_with_boundsP9tree_nodeP7rtx_defibbS0_ (
> > exp=0x700dcf20, target=0x700ec080, call_param_p=0, nontemporal=false,
> > reverse=false, btarget=0x700df058)
> > at /nasfarm/edelsohn/src/src/gcc/expr.c:5405
> > #11 0x104178fc in _Z17expand_assignmentP9tree_nodeS0_b (to=0x700df058,
> > from=0x700dcf20, nontemporal=false)
> > at /nasfarm/edelsohn/src/src/gcc/expr.c:5174
> > #12 0x106f67b4 in _ZL18expand_gimple_stmtP6gimple (stmt=0x7000e240)
> > at /nasfarm/edelsohn/src/src/gcc/cfgexpand.c:6278
> > #13 0x106f87d8 in _ZL25expand_gimple_basic_blockP15basic_block_defb (
> > bb=0x700c7740, disable_tail_calls=false)
> > at /nasfarm/edelsohn/src/src/gcc/cfgexpand.c:5679
> > #14 0x106ffbf4 in _ZN12_GLOBAL__N_111pass_expand7executeEP8function (
> > this=0x11296dc0
> > <_GLOBAL__F__ZN9text_info9set_rangeEj12source_rangeb+3056>,
> > fun=0x70009138) at /nasfarm/edelsohn/src/src/gcc/cfgexpand.c:6291
> 
> I attempted to reproduce this on gcc111 (powerpc-ibm-aix7.1.3.0)
>   ../src/configure --disable-bootstrap --with-gmp=/opt/cfarm/gmp-latest
> --with-mpfr=/opt/cfarm/mpfr-latest --with-mpc=/opt/cfarm/mpc-latest
> with latest trunk (r230384).
> 
> I saw only one ICE within "make check-gcc", when running
> gcc.c-torture/execute/scal-to-vec1.c; specifically:
> 
>   /home/dmalcolm/gcc-build/build/gcc/xgcc \
> -B/home/dmalcolm/gcc-build/build/gcc/ \
> 
> /home/dmalcolm/gcc-build/src/gcc/testsuite/gcc.c-torture/execute/scal-to-vec1.c
>  \
> -fno-diagnostics-show-caret -fdiagnostics-color=never \
>-O0  -w  -lm-o ./scal-to-vec1.exe
> 
> and this shows the same assertion failure as your report.
> 
> 
> I was able to reproduce that ICE at will under gdb; from what I could
> tell from gdb, a seemingly valid location is passed in to warning_at,
> but at warning_at, the 32-bit value is seemingly corrupt, and this
> eventually leads to an assertion failure in the new code.  The warning
> is then discarded (OPT_Wvector_operation_performance).  I can't tell yet
> if the data is corrupt, or if gdb is somehow getting confused about the
> values as I go up and down the callstack (or indeed if I am), but it's
> getting late here.

The root cause is uninitialized data.  Specifically, the C parser's
struct c_expr gained a "src_range" field, and it turns out there are a
few places where I wasn't initializing this when returning c_expr
instances on the stack, and in some cases the values could get used.  I
was able to reproduce it on x86_64 using valgrind; in each ca

Re: [PATCH] PR 68366 - include emit-rtl.h in sdbout.c

2015-11-16 Thread Jeff Law

On 11/15/2015 07:27 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

Some of the pa target macros rely on macros in emit-rtl.h and sdbout.c
uses some of those macros, which means that sdbout.c needs to include
emit-rtl.h.

this seems obvious, bootstrapped on x86_64-linux-gnu, and checked that a cross
to hppa-linux now builds so committing to trunk.

I noticed that gcc-order-headers already wanted to reorder includes, so I didn't
worry about the order here; we can clean that up later easily anyway.

Trev


gcc/ChangeLog:

2015-11-15  Trevor Saunders  

PR middle-end/68366
* sdbout.c: Include emit-rtl.h and function.h.
Which PA targets?  I'm regularly building PA crosses and haven't 
stumbled over this.


jeff



[PATCH] libstdc++: Fix libstdc++/67440: pretty-printing of a const set fails

2015-11-16 Thread Doug Evans

Hi.

Apologies for the delay.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67440

Tested with current trunk.

2015-11-16  Doug Evans  

PR libstdc++/67440
* python/libstdcxx/v6/printers.py (find_type): Handle "const" in
type name.
* testsuite/libstdc++-prettyprinters/debug.cc: Add test for
const set.
* testsuite/libstdc++-prettyprinters/simple.cc: Ditto.
* testsuite/libstdc++-prettyprinters/simple11.cc: Ditto.

Index: python/libstdcxx/v6/printers.py
===
--- python/libstdcxx/v6/printers.py (revision 227421)
+++ python/libstdcxx/v6/printers.py (working copy)
@@ -85,7 +85,9 @@
 def find_type(orig, name):
 typ = orig.strip_typedefs()
 while True:
-search = str(typ) + '::' + name
+# Use typ.name here instead of str(typ) to discard any const,etc.
+# qualifiers.  PR 67440.
+search = typ.name + '::' + name
 try:
 return gdb.lookup_type(search)
 except RuntimeError:
Index: testsuite/libstdc++-prettyprinters/debug.cc
===
--- testsuite/libstdc++-prettyprinters/debug.cc (revision 227421)
+++ testsuite/libstdc++-prettyprinters/debug.cc (working copy)
@@ -70,6 +70,10 @@
   std::map::iterator mpiter = mp.begin();
 // { dg-final { note-test mpiter {{first = "zardoz", second = 23}} } }

+  // PR 67440
+  const std::set<int> const_intset = {2, 3};
+// { dg-final { note-test const_intset {std::__debug::set with 2 elements = {[0] = 2, [1] = 3}} } }

+
   std::set sp;
   sp.insert("clownfish");
   sp.insert("barrel");
Index: testsuite/libstdc++-prettyprinters/simple.cc
===
--- testsuite/libstdc++-prettyprinters/simple.cc(revision 227421)
+++ testsuite/libstdc++-prettyprinters/simple.cc(working copy)
@@ -73,6 +73,10 @@
   std::map::iterator mpiter = mp.begin();
 // { dg-final { note-test mpiter {{first = "zardoz", second = 23}} } }

+  // PR 67440
+  const std::set<int> const_intset = {2, 3};
+// { dg-final { note-test const_intset {std::set with 2 elements = {[0] = 2, [1] = 3}} } }

+
   std::set sp;
   sp.insert("clownfish");
   sp.insert("barrel");
Index: testsuite/libstdc++-prettyprinters/simple11.cc
===
--- testsuite/libstdc++-prettyprinters/simple11.cc  (revision 227421)
+++ testsuite/libstdc++-prettyprinters/simple11.cc  (working copy)
@@ -73,6 +73,10 @@
   std::map::iterator mpiter = mp.begin();
 // { dg-final { note-test mpiter {{first = "zardoz", second = 23}} } }

+  // PR 67440
+  const std::set<int> const_intset = {2, 3};
+// { dg-final { note-test const_intset {std::set with 2 elements = {[0] = 2, [1] = 3}} } }

+
   std::set sp;
   sp.insert("clownfish");
   sp.insert("barrel");
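A toy mock (deliberately not using the real gdb module) of the behavior the one-line fix relies on; per the patch comment, str(typ) keeps qualifiers such as const while typ.name does not, so the old lookup string never matches for a const container.  The MockType class and lookup table below are invented for illustration; in real use these are gdb.Type and gdb.lookup_type:

```python
# Mock of the relevant gdb.Type behavior: .name is the unqualified type
# name, while str() includes cv-qualifiers.
class MockType:
    def __init__(self, name, const=False):
        self.name = name          # unqualified, like gdb.Type.name
        self._const = const

    def __str__(self):            # qualified spelling, like str(gdb.Type)
        return ('const ' + self.name) if self._const else self.name


# Stand-in for gdb.lookup_type's symbol table.
known_types = {'std::set<int>::iterator': '<iterator type>'}

typ = MockType('std::set<int>', const=True)

# Old code: the 'const' qualifier leaks into the lookup string, so the
# lookup fails for a const set (PR 67440).
assert (str(typ) + '::iterator') not in known_types

# New code: typ.name drops the qualifier and the lookup succeeds.
assert known_types[typ.name + '::iterator'] == '<iterator type>'
print('lookup ok')
```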


Re: [PATCH] Fix inconsistent use of integer types in gcc/lto-streamer-out.c

2015-11-16 Thread Jeff Law

On 11/16/2015 09:22 AM, Andris Pavenis wrote:

On 11/16/2015 01:12 PM, Richard Biener wrote:

On Sun, Nov 15, 2015 at 9:21 AM, Andris Pavenis
 wrote:

This fixes the use of pointers to different unsigned integer types as a
function parameter.
Function prototype is (see gcc/tree-streamer.h):

bool streamer_tree_cache_lookup (struct streamer_tree_cache_d *, tree,
  unsigned *);

gcc/lto-streamer-out.c passes uint32_t * as the parameter to this function
in 2 places.  Currently the type unsigned is used elsewhere in the same file.

uint32_t is not guaranteed to be the same as unsigned (for DJGPP uint32_t is
actually unsigned long).  That causes a compile failure for a DJGPP native
build.

Ok.

Thanks,
Richard.


Could somebody apply it as I do not have SVN write access.

Done.
jeff



Re: [PATCH 00/16] Unit tests framework (v3)

2015-11-16 Thread Bernd Schmidt



For some of the simpler infrastructure tests such as the ones in this
patch kit (bitmap, vec or wide-int functionality testing and such), we
had the idea of putting these into every ENABLE_CHECKING compiler, and
run them after building stage1, controlled by a -fself-test flag. It's
better to detect such basic failures early rather than complete a full
bootstrap and test cycle. It also keeps the tests alongside the rest of
the implementation, which I consider desirable for such relatively
simple data structures.


Would it be reasonable to run them at each stage?  My hope is that they
will be fast.


Depends on how fast, I guess. I don't think testing them more than once 
gains very much; if there's a suspicion that stage3 was miscompiled one 
could still run -fself-test manually.



If we're building the tests into the compiler itself, guarded by
   #if ENABLE_CHECKING
then presumably it makes sense to put the tests directly into the
pertinent source files? (rather than in a "foo-tests.c" file).  Possibly
even to interleave them, to be next to the code in question.


Yes, I was thinking same source file for the most part. I don't think 
there has to be any kind of rule, we just do whatever makes sense.



Given that this patch kit has seen a fair amount of discussion, and
parts of it are already approved, and that it's designed to improve our
test coverage, is it reasonable to continue pursuing this within stage
3?  (I hope so)  Should I attempt a patch for the above? (once I've
fixed the AIX bootstrap issue, of course)


As far as I'm concerned this can still proceed given that it was 
submitted well in advance of stage 3 (unless someone objects).



Any thoughts on embedded gtest vs external gtest vs building our own?


I think for -fself-test we can mostly operate with gcc_assert, IMO 
there's no need to use an elaborate framework. We can revisit this issue 
when we get to more extensive tests that require multiple compiler 
invocations.



Bernd


Re: [PATCH] libstdc++: Fix libstdc++/67440: pretty-printing of a const set fails

2015-11-16 Thread Jonathan Wakely

On 16/11/15 21:04 +, Doug Evans wrote:

Hi.

Apologies for the delay.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67440

Tested with current trunk.

2015-11-16  Doug Evans  

PR libstdc++/67440
* python/libstdcxx/v6/printers.py (find_type): Handle "const" in
type name.
* testsuite/libstdc++-prettyprinters/debug.cc: Add test for
const set.
* testsuite/libstdc++-prettyprinters/simple.cc: Ditto.
* testsuite/libstdc++-prettyprinters/simple11.cc: Ditto.

Index: python/libstdcxx/v6/printers.py
===
--- python/libstdcxx/v6/printers.py (revision 227421)
+++ python/libstdcxx/v6/printers.py (working copy)
@@ -85,7 +85,9 @@
def find_type(orig, name):
typ = orig.strip_typedefs()
while True:
-search = str(typ) + '::' + name
+# Use typ.name here instead of str(typ) to discard any const,etc.
+# qualifiers.  PR 67440.
+search = typ.name + '::' + name


Oh, that's surprisingly simple! :-)

This only affects the printers, so although we're in stage 3 this is
OK for trunk and gcc-5-branch, thanks.




[PATCH] Clear LOOP_CLOSED_SSA after pass_ccp

2015-11-16 Thread Tom de Vries

Hi,

while playing around with inserting pass_ccp here and there in the pass 
list, I put it after a pass where the loops state contained LOOP_CLOSED_SSA.


And apparently pass_ccp does not preserve loop-closed ssa.

As a consequence, while executing the pass_ccp todos, 
verify_loop_closed_ssa fails.


This patch fixes that by noting in pass_ccp that it does not preserve 
loop-closed ssa.


OK for trunk if bootstrap and reg-test succeeds?

Thanks,
- Tom

Clear LOOP_CLOSED_SSA after pass_ccp

2015-11-16  Tom de Vries  

	* tree-ssa-ccp.c (do_ssa_ccp): Clear LOOP_CLOSED_SSA in loops state if
	something changed.

---
 gcc/tree-ssa-ccp.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 7b6b451..7e8bc52 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -139,6 +139,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "builtins.h"
 #include "tree-chkp.h"
+#include "cfgloop.h"
 
 
 /* Possible lattice values.  */
@@ -2402,10 +2403,17 @@ do_ssa_ccp (bool nonzero_p)
 {
   unsigned int todo = 0;
   calculate_dominance_info (CDI_DOMINATORS);
+
   ccp_initialize ();
   ssa_propagate (ccp_visit_stmt, ccp_visit_phi_node);
   if (ccp_finalize (nonzero_p))
-todo = (TODO_cleanup_cfg | TODO_update_ssa);
+{
+  todo = (TODO_cleanup_cfg | TODO_update_ssa);
+
+  /* ccp_finalize does not preserve loop-closed ssa.  */
+  loops_state_clear (LOOP_CLOSED_SSA);
+}
+
   free_dominance_info (CDI_DOMINATORS);
   return todo;
 }


Re: [PATCH] Fix uninitialized src_range within c_expr (Re: libcpp/C FE source range patch committed (r230331))

2015-11-16 Thread Bernd Schmidt

On 11/16/2015 09:50 PM, David Malcolm wrote:

The root cause is uninitialized data.  Specifically, the C parser's
struct c_expr gained a "src_range" field, and it turns out there are a
few places where I wasn't initializing this when returning c_expr
instances on the stack, and in some cases the values could get used.



I'm working on a followup to fix the remaining places I identified via
review of the source.


The patch is mostly OK IMO and should be installed to fix the problems, 
but I think there are a few more things to consider.


Should c_expr perhaps acquire a constructor so that this problem is 
avoided in the future? The whole thing seems somewhat error-prone.



@@ -4278,9 +4278,11 @@ c_parser_braced_init (c_parser *parser, tree type, bool nested_p)
obstack_free (&braced_init_obstack, NULL);
return ret;
  }
+  location_t close_loc = c_parser_peek_token (parser)->location;


It looks like we're peeking the token twice here (via a 
c_parser_token_is_not call above the quoted code). Probably not too 
expensive but maybe we can avoid it.



case RID_VA_ARG:
- c_parser_consume_token (parser);
+ {
+   location_t start_loc = loc;


Does this really have to be indented in an extra braced block? Please 
fix if not.



Bernd


[PATCH] Make fdump-tree-sccp-details more complete

2015-11-16 Thread Tom de Vries

Hi,

pass_scev_cprop contains code that replaces uses of an ssa-name 
with a constant.  This is currently not noted in the dump-file, even with 
TDF_DETAILS.


This patch adds that information in the dump-file, in this format:
...
Replacing uses of: D__lsm.10_34 with: 1
...

OK for trunk if bootstrap and reg-test succeeds?

Thanks,
- Tom
Make fdump-tree-sccp-details more complete

2015-11-16  Tom de Vries  

	* tree-scalar-evolution.c (scev_const_prop): Dump details if replacing
	uses of ssa_name with constant.

---
 gcc/tree-scalar-evolution.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index e90aafb..27630f0 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -3465,7 +3465,17 @@ scev_const_prop (void)
 
 	  /* Replace the uses of the name.  */
 	  if (name != ev)
-	replace_uses_by (name, ev);
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Replacing uses of: ");
+		  print_generic_expr (dump_file, name, 0);
+		  fprintf (dump_file, " with: ");
+		  print_generic_expr (dump_file, ev, 0);
+		  fprintf (dump_file, "\n");
+		}
+	  replace_uses_by (name, ev);
+	}
 
 	  if (!ssa_names_to_remove)
 	ssa_names_to_remove = BITMAP_ALLOC (NULL);


Re: Add null identifiers to genmatch

2015-11-16 Thread Richard Sandiford
Jeff Law  writes:
 Boolean params are best avoided if possible, IMO.  In this case,
 it seems this could instead be a new wrapper function, like:
>>> This hasn't been something we've required for GCC.  I've come across
>>> this recommendation a few times over the last several months as I
>>> continue to look at refactoring and best practices for codebases such as
>>> GCC.
>>
>> FWIW, GDB is in progress of converting to C++ and we're pulling in
>> GCC's C++ conventions as reference, so I thought I'd see what the GCC
>> community thought here.
> FWIW, I often look at GCC's conventions, google's conventions, then 
> start doing google searches around this kind of stuff.  And as always, 
> the quality of the latter can vary wildly :-)
>
>>
>>>
>>> By encoding the boolean in the function's signature, it (IMHO) does make
>>> the code a bit easier to read, primarily because you don't have to go
>>> lookup the tense of the boolean).  The problem is when the boolean is
>>> telling us some property an argument, but there's more than one argument
>>> and other similar situations.
>>
>> There's more than one way to avoid boolean parameters.
>> If you have more than one of those, the boolean trap is even
>> worse IMO.  In such cases, enums can help make things clearer for
>> the reader.  E.g.:
>>
>>   foo (true, false);
>>   foo (false, true);
>>
>> vs:
>>
>>   foo (whatever::value1, bar::in_style);
> Yea, I saw the latter at some point as well.  In general I don't think 
> we've used enums as well as we could/should have in GCC through the years.

Yeah.  Kenny was adamant that for wide-int we should have an UNSIGNED/SIGNED
enum rather than a boolean flag.  And I think that does make things clearer.
I always have to remind myself whether "true" means "unsigned" or "signed",
especially for RTL functions.

I certainly prefer the enum to separate functions though.  They can get
messy if a new call site is added that needs a variable parameter.

Thanks,
Richard

