Re: [PATCH] Fix reassoc range test vs. value ranges (PR tree-optimization/68671)

2015-12-04 Thread Richard Biener
On Fri, 4 Dec 2015, Jakub Jelinek wrote:

> On Fri, Dec 04, 2015 at 09:15:25AM +0100, Richard Biener wrote:
> > > +  modified one, up to and including last_bb, to be executed even if
> > > +  they would not be in the original program.  If the value ranges of
> > > +  assignment lhs' in those bbs were dependent on the conditions
> > > +  guarding those basic blocks which now can change, the VRs might
> > > +  be incorrect.  As no_side_effect_bb should ensure those SSA_NAMEs
> > > +  are only used within the same bb, it should be not a big deal if
> > > +  we just reset all the VRs in those bbs.  See PR68671.  */
> > > +  for (bb = last_bb, idx = 0; idx < max_idx; bb = single_pred (bb), 
> > > idx++)
> > > + {
> > > +   gimple_stmt_iterator gsi;
> > > +   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> > > + {
> > > +   gimple *g = gsi_stmt (gsi);
> > > +   if (!is_gimple_assign (g))
> > > + continue;
> > > +   tree lhs = gimple_assign_lhs (g);
> > > +   if (TREE_CODE (lhs) != SSA_NAME)
> > > + continue;
> > > +   if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
> > > + SSA_NAME_RANGE_INFO (lhs) = NULL;
> > 
> > Please use
> > 
> >  reset_flow_sensitive_info (lhs);
> 
> So maybe better then replace the whole inner loop with
>   reset_flow_sensitive_info_in_bb (bb);
> ?

Yeah, indeed.  I was confused about the max_idx stuff and read it as you
handling some blocks only partially.

Richard.


[PATCH] Fix PR68636

2015-12-04 Thread Richard Biener

When we compute an excessive byte alignment, get_pointer_alignment_1
may end up returning an alignment of zero due to overflow when
multiplying by BITS_PER_UNIT.  This in turn causes get_object_alignment
to return a too conservative (byte) alignment.
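
A minimal standalone sketch of the overflow (illustrative only, not GCC code;
the particular alignment value and a 32-bit unsigned int are assumptions):

#include <stdio.h>

/* ptr_align is a byte alignment tracked as an unsigned power of two.  When
   it is large enough, multiplying by BITS_PER_UNIT (8) wraps around to 0 in
   32-bit unsigned arithmetic - the case the patch below guards against.  */
#define BITS_PER_UNIT 8u

int
main (void)
{
  unsigned int ptr_align = 1u << 29;                       /* huge byte alignment */
  unsigned int align_in_bits = ptr_align * BITS_PER_UNIT;  /* wraps to 0 */

  if (align_in_bits == 0)
    /* Mirror of the fix: fall back to the largest representable power of two.  */
    align_in_bits = 1u << (sizeof (int) * 8 - 1);

  printf ("%u\n", align_in_bits);                          /* prints 2147483648 */
  return 0;
}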

Fixed as follows.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-04  Richard Biener  

PR middle-end/68636
* builtins.c (get_pointer_alignment_1): Take care of byte to
bit alignment computation overflow.

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 231058)
+++ gcc/builtins.c  (working copy)
@@ -497,6 +497,10 @@ get_pointer_alignment_1 (tree exp, unsig
{
  *bitposp = ptr_misalign * BITS_PER_UNIT;
  *alignp = ptr_align * BITS_PER_UNIT;
+ /* Make sure to return a sensible alignment when the multiplication
+by BITS_PER_UNIT overflowed.  */
+ if (*alignp == 0)
+   *alignp = 1u << (HOST_BITS_PER_INT - 1);
  /* We cannot really tell whether this result is an approximation.  */
  return true;
}


Re: [PATCH] Fix reassoc range test vs. value ranges (PR tree-optimization/68671)

2015-12-04 Thread Jakub Jelinek
On Fri, Dec 04, 2015 at 09:15:25AM +0100, Richard Biener wrote:
> > +modified one, up to and including last_bb, to be executed even if
> > +they would not be in the original program.  If the value ranges of
> > +assignment lhs' in those bbs were dependent on the conditions
> > +guarding those basic blocks which now can change, the VRs might
> > +be incorrect.  As no_side_effect_bb should ensure those SSA_NAMEs
> > +are only used within the same bb, it should be not a big deal if
> > +we just reset all the VRs in those bbs.  See PR68671.  */
> > +  for (bb = last_bb, idx = 0; idx < max_idx; bb = single_pred (bb), 
> > idx++)
> > +   {
> > + gimple_stmt_iterator gsi;
> > + for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> > +   {
> > + gimple *g = gsi_stmt (gsi);
> > + if (!is_gimple_assign (g))
> > +   continue;
> > + tree lhs = gimple_assign_lhs (g);
> > + if (TREE_CODE (lhs) != SSA_NAME)
> > +   continue;
> > + if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
> > +   SSA_NAME_RANGE_INFO (lhs) = NULL;
> 
> Please use
> 
>  reset_flow_sensitive_info (lhs);

So maybe better then replace the whole inner loop with
reset_flow_sensitive_info_in_bb (bb);
?

Jakub


[PATCH] Fix PR67438

2015-12-04 Thread Richard Biener

The following is the only way I currently see to fix PR67438 - a register
pressure increase over a conditional caused by a match pattern applying
to ops with multiple uses.
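
A minimal, hypothetical C example of the situation the single_use guard is
about (made up for illustration, not taken from the PR):

/* Without the single_use guard, the ~X cmp ~Y -> Y cmp X fold would rewrite
   the comparison below as (y < x) even though the complements are needed
   again in both arms, keeping x, y, nx and ny all live across the branch
   and increasing register pressure.  */
int
f (int x, int y)
{
  int nx = ~x, ny = ~y;   /* the complements have multiple uses */
  if (nx < ny)            /* candidate for the ~X cmp ~Y fold */
    return nx + ny;
  return nx - ny;
}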

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-04  Richard Biener  

PR middle-end/67438
* match.pd: Guard ~X cmp ~Y -> Y cmp X and the variant with
a constant with single_use.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 231221)
+++ gcc/match.pd(working copy)
@@ -1855,15 +1879,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Fold ~X op ~Y as Y op X.  */
 (for cmp (simple_comparison)
  (simplify
-  (cmp (bit_not @0) (bit_not @1))
-  (cmp @1 @0)))
+  (cmp (bit_not@2 @0) (bit_not@3 @1))
+  (if (single_use (@2) && single_use (@3))
+   (cmp @1 @0
 
 /* Fold ~X op C as X op' ~C, where op' is the swapped comparison.  */
 (for cmp (simple_comparison)
  scmp (swapped_simple_comparison)
  (simplify
-  (cmp (bit_not @0) CONSTANT_CLASS_P@1)
-  (if (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST)
+  (cmp (bit_not@2 @0) CONSTANT_CLASS_P@1)
+  (if (single_use (@2)
+   && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST))
(scmp @0 (bit_not @1)
 
 (for cmp (simple_comparison)



Re: [PATCH] Fix reassoc range test vs. value ranges (PR tree-optimization/68671)

2015-12-04 Thread Richard Biener
On Thu, 3 Dec 2015, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, maybe_optimize_range_tests considers basic blocks
> with not just the final GIMPLE_COND (or for last_bb store feeding into PHI),
> but also assign stmts that don't trap, don't have side-effects and where
> the SSA_NAMEs they set are used only in their own bb.
> Now, if we decide to optimize some range test, we can change some conditions
> on previous bbs and that means we could execute some basic blocks that
> wouldn't be executed in the original program.  As the stmts don't set
> anything used in other bbs, they are most likely dead after the
> optimization, but the problem on the testcase is that because of the
> condition changes in previous bb we end up with incorrect value range
> for some SSA_NAME(s).  That can result in the miscompilation of the testcase
> on certain targets.
> 
> Fixed by resetting the value range info of such SSA_NAMEs.  I believe it
> shouldn't be a big deal, they will be mostly dead anyway.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2015-12-03  Jakub Jelinek  
> 
>   PR tree-optimization/68671
>   * tree-ssa-reassoc.c (maybe_optimize_range_tests): For basic
>   blocks starting with the successor of first bb we've modified
>   and ending with last_bb, reset value ranges of all integral
>   SSA_NAMEs set in those basic blocks.
> 
>   * gcc.dg/pr68671.c: New test.
> 
> --- gcc/tree-ssa-reassoc.c.jj 2015-11-18 11:22:51.0 +0100
> +++ gcc/tree-ssa-reassoc.c2015-12-03 18:12:08.915210122 +0100
> @@ -3204,7 +3204,7 @@ maybe_optimize_range_tests (gimple *stmt
> >  any_changes = optimize_range_tests (ERROR_MARK, &ops);
>if (any_changes)
>  {
> -  unsigned int idx;
> +  unsigned int idx, max_idx = 0;
>/* update_ops relies on has_single_use predicates returning the
>same values as it did during get_ops earlier.  Additionally it
>never removes statements, only adds new ones and it should walk
> @@ -3220,6 +3220,7 @@ maybe_optimize_range_tests (gimple *stmt
>   {
> tree new_op;
>  
> +   max_idx = idx;
> stmt = last_stmt (bb);
> new_op = update_ops (bbinfo[idx].op,
>  (enum tree_code)
> @@ -3289,6 +3290,10 @@ maybe_optimize_range_tests (gimple *stmt
> && ops[bbinfo[idx].first_idx]->op != NULL_TREE)
>   {
> > gcond *cond_stmt = as_a <gcond *> (last_stmt (bb));
> +
> +   if (idx > max_idx)
> + max_idx = idx;
> +
> if (integer_zerop (ops[bbinfo[idx].first_idx]->op))
>   gimple_cond_make_false (cond_stmt);
> else if (integer_onep (ops[bbinfo[idx].first_idx]->op))
> @@ -3305,6 +3310,30 @@ maybe_optimize_range_tests (gimple *stmt
> if (bb == first_bb)
>   break;
>   }
> +
> +  /* The above changes could result in basic blocks after the first
> +  modified one, up to and including last_bb, to be executed even if
> +  they would not be in the original program.  If the value ranges of
> +  assignment lhs' in those bbs were dependent on the conditions
> +  guarding those basic blocks which now can change, the VRs might
> +  be incorrect.  As no_side_effect_bb should ensure those SSA_NAMEs
> +  are only used within the same bb, it should be not a big deal if
> +  we just reset all the VRs in those bbs.  See PR68671.  */
> +  for (bb = last_bb, idx = 0; idx < max_idx; bb = single_pred (bb), 
> idx++)
> + {
> +   gimple_stmt_iterator gsi;
> > +   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> + {
> +   gimple *g = gsi_stmt (gsi);
> +   if (!is_gimple_assign (g))
> + continue;
> +   tree lhs = gimple_assign_lhs (g);
> +   if (TREE_CODE (lhs) != SSA_NAME)
> + continue;
> +   if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
> + SSA_NAME_RANGE_INFO (lhs) = NULL;

Please use

 reset_flow_sensitive_info (lhs);

Ok with that change.

Thanks,
Richard.

> + }
> + }
>  }
>  }
>  
> --- gcc/testsuite/gcc.dg/pr68671.c.jj 2015-12-03 18:19:24.769104484 +0100
> +++ gcc/testsuite/gcc.dg/pr68671.c2015-12-03 18:19:07.0 +0100
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/68671 */
> +/* { dg-do run } */
> +/* { dg-options " -O2 -fno-tree-dce" } */
> +
> +volatile int a = -1;
> +volatile int b;
> +
> +static inline int
> +fn1 (signed char p1, int p2)
> +{
> +  return (p1 < 0) || (p1 > (1 >> p2)) ? 0 : (p1 << 1);
> +}
> +
> +int
> +main ()
> +{
> +  signed char c = a;
> +  b = fn1 (c, 1);
> +  c = ((128 | c) < 0 ? 1 : 0);
> +  if (c != 1)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH, i386, PR68627] Prohibit AVX-512VL broadcasts generation on KNL.

2015-12-04 Thread Kirill Yukhin
Hello,
The patch at the bottom fixes illegal insn generation for spec2k6/437.leslie3d.
The problem is that for AVX-512F, broadcasts are allowed to 512-bit registers only;
the [x|y]mm variants require AVX-512VL.

Bootstrapped and regtested.

I'll commit it into GCC main trunk on Monday if no objections.

gcc/
PR target/68627
* config/i386/sse.md: Make 'v' alternative work on 'avx512f' ISA only.
Force destination to 512 bits register.

gcc/testsuite/
PR target/68627
* gfortran.dg/pr68627.f: New test.

--
Thanks, K

commit ff93d08d61d58c28707b224f5b84ab30628b34a3
Author: Kirill Yukhin 
Date:   Tue Dec 1 10:28:17 2015 +0300

AVX-512. Make broadcast from SSE reg AVX-512 only. Force to zmm.

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e7b517a..0286e6b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17377,20 +17377,21 @@
(set_attr "mode" "")])
 
 (define_insn "vec_dup"
-  [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,x,v,x")
+  [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,x,x,v,x")
(vec_duplicate:AVX_VEC_DUP_MODE
- (match_operand: 1 "nonimmediate_operand" "m,m,v,?x")))]
+ (match_operand: 1 "nonimmediate_operand" 
"m,m,x,v,?x")))]
   "TARGET_AVX"
   "@
vbroadcast\t{%1, %0|%0, %1}
vbroadcast\t{%1, %0|%0, %1}
vbroadcast\t{%x1, %0|%0, %x1}
+   vbroadcast\t{%x1, %g0|%g0, %x1}
#"
   [(set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "maybe_evex")
-   (set_attr "isa" "avx2,noavx2,avx2,noavx2")
-   (set_attr "mode" ",V8SF,,V8SF")])
+   (set_attr "isa" "avx2,noavx2,avx2,avx512f,noavx2")
+   (set_attr "mode" ",V8SF,,,V8SF")])
 
 (define_split
   [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand")
diff --git a/gcc/testsuite/gfortran.dg/pr68627.f 
b/gcc/testsuite/gfortran.dg/pr68627.f
new file mode 100755
index 000..32ff4a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr68627.f
@@ -0,0 +1,18 @@
+! { dg-do compile { target lp64 } }
+
+! { dg-options "-Ofast -mavx512f -ffixed-xmm1 -ffixed-xmm2 -ffixed-xmm3 
-ffixed-xmm4 -ffixed-xmm5 -ffixed-xmm6 -ffixed-xmm7 -ffixed-xmm8 -ffixed-xmm9 
-ffixed-xmm10 -ffixed-xmm11 -ffixed-xmm12 -ffixed-xmm13 -ffixed-xmm14 
-ffixed-xmm15" }
+
+  IMPLICIT REAL*8(A-H,O-Z)
+  ALLOCATABLE DD1(:), DD2(:), WY(:,:)
+  ALLOCATE( DD1(MAX), DD2(MAX), WY(MAX,MAX))
+ DO J = J1,J2
+DO I = I1, I2
+   DD1(I) = D1 * (WY(I-2,J) - WY(I+2,J) +
+ >  (WY(I+1,J) - WY(I-1,J)))
+END DO
+DO I = I1, INT(D2 * D3(I))
+END DO
+ END DO
+  END
+
+! { dg-final { scan-assembler-not "vbroadcastsd\[ \\t\]+%xmm\[0-9\]+, 
%ymm\[0-9\]+" } }


Re: [ARM] Fix PR middle-end/65958

2015-12-04 Thread Marcus Shawcroft
On 3 December 2015 at 12:17, Eric Botcazou  wrote:
>> I can understand this restriction, but...
>>
>> > +  /* See the same assertion on PROBE_INTERVAL above.  */
>> > +  gcc_assert ((first % 4096) == 0);
>>
>> ... why isn't this a test that FIRST is aligned to PROBE_INTERVAL?
>
> Because that isn't guaranteed, FIRST is related to the size of the protection
> area while PROBE_INTERVAL is related to the page size.
>
>> blank line between declarations and code. Also, can we come up with a
>> suitable define for 4096 here that expresses the context and then use
>> that consistently through the remainder of this function?
>
> OK, let's use ARITH_BASE.
>
>> > +(define_insn "probe_stack_range"
>> > +  [(set (match_operand:DI 0 "register_operand" "=r")
>> > +   (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
>> > +(match_operand:DI 2 "register_operand" "r")]
>> > +UNSPEC_PROBE_STACK_RANGE))]
>>
>> I think this should really use PTRmode, so that it's ILP32 ready (I'm
>> not going to ask you to make sure that works though, since I suspect
>> there are still other issues to resolve with ILP32 at this time).
>
> Done.  Manually tested for now, I'll fully test it if approved.

Looks ok to me.  OK /Marcus


Re: [PATCH 2/2] [graphite] fix invalid bounds on array refs

2015-12-04 Thread Richard Biener
On Thu, Dec 3, 2015 at 10:14 PM, Sebastian Pop  wrote:
> Richard Biener wrote:
>> On Wed, Dec 2, 2015 at 10:36 PM, Sebastian Paul Pop  
>> wrote:
>> > Do you recommend that we add a gcc_assert that min is always lower than 
>> > max?
>>
>> No, min can be one less than max if the array has size zero.
>
> Maybe a typo: do you mean max can be one less than min?

Err, yes.

> If the array has size zero, then I think ISL is correct in saying that there 
> are
> no dependences.  As we miscompiled the testcase, I think that the bug is in 
> the
> Fortran front-end.

That was my analysis as well.

Richard.


Re: -fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-04 Thread Richard Biener
On Wed, 2 Dec 2015, Jakub Jelinek wrote:

> On Wed, Dec 02, 2015 at 09:16:10PM +0100, Jan Hubicka wrote:
> > * cfgexpand.c: Include alias.h
> 
> Missing full stop at the end.
> 
> > (expand_call_stmt, expand_debug_expr): Set no_new_alias_sets;
> > do not fiddle with flag_strict_aliasing
> 
> Likewise.
> 
> > * varasm.c: Include alias.h
> 
> Ditto.
> 
> > @@ -1034,6 +1037,10 @@ get_alias_set (tree t)
> >   gcc_checking_assert (p == TYPE_MAIN_VARIANT (p));
> >   if (TYPE_ALIAS_SET_KNOWN_P (p))
> > set = TYPE_ALIAS_SET (p);
> > + /* During debug statement expanding we can not allocate new alias sets
> 
> s/expanding/expansion/ ?
> 
> > +  /* During debug statement expanding we can not allocate new alias sets
> 
> Ditto.
> 
> Otherwise, it looks reasonable to me, but please wait for richi's feedback
> on it.

Looks good to me.

Thanks,
Richard.


[PATCH] Speedup bitmap_find_bit

2015-12-04 Thread Richard Biener

This properly guards the bitmap_mem_desc.get_descriptor_for_instance
call with GATHER_STATISTICS - the hash map is globally initialized
and thus the query isn't inlined and optimized away even if the map
is empty.
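
A self-contained sketch of the guard pattern (names are made up; this is not
the GCC code): because the statistics switch is a compile-time constant, the
guarded call is dead code when statistics are off and the optimizer drops it
from the fast path.

#include <stdio.h>

#define GATHER_STATISTICS 0

static unsigned long n_searches;

static void
count_search (void)
{
  n_searches++;
}

unsigned
find_bit (unsigned word, unsigned bit)
{
  if (GATHER_STATISTICS)
    count_search ();   /* compiled away entirely when statistics are off */
  return (word >> bit) & 1u;
}

int
main (void)
{
  printf ("%u\n", find_bit (0x10u, 4));
  return 0;
}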

Committed as obvious.

Richard.

2015-12-04  Richard Biener  

* bitmap.c (bitmap_find_bit): Guard the bitmap descriptor
query with GATHER_STATISTICS.

Index: gcc/bitmap.c
===
--- gcc/bitmap.c(revision 231256)
+++ gcc/bitmap.c(working copy)
@@ -487,9 +487,11 @@ bitmap_find_bit (bitmap head, unsigned i
   && head->first->next == NULL)
 return NULL;
 
-   /* Usage can be NULL due to allocated bitmaps for which we do not
-  call initialize function.  */
-   bitmap_usage *usage = bitmap_mem_desc.get_descriptor_for_instance (head);
+  /* Usage can be NULL due to allocated bitmaps for which we do not
+ call initialize function.  */
+  bitmap_usage *usage = NULL;
+  if (GATHER_STATISTICS)
+usage = bitmap_mem_desc.get_descriptor_for_instance (head);
 
   /* This bitmap has more than one element, and we're going to look
  through the elements list.  Count that as a search.  */


Re: [PATCH AArch64]Use aarch64_sync_memory_operand in atomic_store pattern

2015-12-04 Thread Marcus Shawcroft
On 4 December 2015 at 03:34, Bin Cheng  wrote:

> 2015-12-01  Bin Cheng  
>
> * config/aarch64/atomics.md (atomic_store): Use predicate
> aarch64_sync_memory_operand.
>

OK /Marcus


Re: [PATCH] Handle OBJ_TYPE_REF in FRE

2015-12-04 Thread Jan Hubicka
> Indeed we don't do code hoisting yet.  Maybe one could trick PPRE
> into doing it.
> 
> Note that for OBJ_TYPE_REFs in calls you probably should better use
> gimple_call_fntype instead of the type of the OBJ_TYPE_REF anyway
> (well, fntype will be the method-type, not pointer-to-method-type).
> 
> Not sure if you need OBJ_TYPE_REFs type in non-call contexts?

Well, to optimize speculative call sequences

if (funptr == thismethod)
  inlined this method body
else
  funptr ();

Here you want to devirtualize the conditional, not the call, in order
to get the inlined method unconditionally.

In general I think OBJ_TYPE_REF is misplaced - it should be on vtable load
instead of the call/conditional. It is a property of the vtable lookup.
Then it would work for method pointers too.
> 
>   if (fn
>   && (!POINTER_TYPE_P (TREE_TYPE (fn))
>   || (TREE_CODE (TREE_TYPE (TREE_TYPE (fn))) != FUNCTION_TYPE
>   && TREE_CODE (TREE_TYPE (TREE_TYPE (fn))) != METHOD_TYPE)))
> {
>   error ("non-function in gimple call");
>   return true;
> }
> 
> and in useless_type_conversion_p:
> 
>   /* Do not lose casts to function pointer types.  */
>   if ((TREE_CODE (TREE_TYPE (outer_type)) == FUNCTION_TYPE
>|| TREE_CODE (TREE_TYPE (outer_type)) == METHOD_TYPE)
>   && !(TREE_CODE (TREE_TYPE (inner_type)) == FUNCTION_TYPE
>|| TREE_CODE (TREE_TYPE (inner_type)) == METHOD_TYPE))
> return false;

Yeah, this does not make much sense to me anymore.  Something to track next
stage1.
> 
> probably from the times we didn't have gimple_call_fntype.  So if I
> paper over the ICE (in the verifier) then the libreoffice testcase
> gets optimized to
> 
>   :
>   _3 = this_2(D)->D.2399.D.2325._vptr.B;
>   _4 = *_3;
>   PROF_6 = OBJ_TYPE_REF(_4;(struct 
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_6 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   PROF_6 (this_2(D));
>   goto ;
> 
>   :
>   PROF_6 (this_2(D));
> 
> by FRE2 and either VRP or DOM will propagate the equivalency to
> 
>   :
>   _3 = this_2(D)->D.2399.D.2325._vptr.B;
>   _4 = *_3;
>   PROF_6 = OBJ_TYPE_REF(_4;(struct 
> WindowListenerMultiplexer)this_2(D)->0);
>   if (PROF_6 == acquire)
> goto ;
>   else
> goto ;
> 
>   :
>   WindowListenerMultiplexer::acquire (this_2(D));
>   goto ;
> 
>   :
>   PROF_6 (this_2(D));
> 
> Richard.

LGTM.
Honza
> 
> 2015-12-03  Richard Biener  
> 
>   PR tree-optimization/64812
>   * tree-ssa-sccvn.c (vn_get_stmt_kind): Handle OBJ_TYPE_REF.
>   (vn_nary_length_from_stmt): Likewise.
>   (init_vn_nary_op_from_stmt): Likewise.
>   * gimple-match-head.c (maybe_build_generic_op): Likewise.
>   * gimple-pretty-print.c (dump_unary_rhs): Likewise.
>   * gimple-fold.c (gimple_build): Likewise.
>   * gimple.h (gimple_expr_type): Likewise.
> 
>   * g++.dg/tree-ssa/ssa-fre-1.C: New testcase.
> 
> Index: gcc/tree-ssa-sccvn.c
> ===
> *** gcc/tree-ssa-sccvn.c  (revision 231221)
> --- gcc/tree-ssa-sccvn.c  (working copy)
> *** vn_get_stmt_kind (gimple *stmt)
> *** 460,465 
> --- 460,467 
> ? VN_CONSTANT : VN_REFERENCE);
>   else if (code == CONSTRUCTOR)
> return VN_NARY;
> + else if (code == OBJ_TYPE_REF)
> +   return VN_NARY;
>   return VN_NONE;
> }
> default:
> *** vn_nary_length_from_stmt (gimple *stmt)
> *** 2479,2484 
> --- 2481,2487 
> return 1;
>   
>   case BIT_FIELD_REF:
> + case OBJ_TYPE_REF:
> return 3;
>   
>   case CONSTRUCTOR:
> *** init_vn_nary_op_from_stmt (vn_nary_op_t
> *** 2508,2513 
> --- 2511,2517 
> break;
>   
>   case BIT_FIELD_REF:
> + case OBJ_TYPE_REF:
> vno->length = 3;
> vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
> vno->op[1] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 1);
> Index: gcc/gimple-match-head.c
> ===
> *** gcc/gimple-match-head.c   (revision 231221)
> --- gcc/gimple-match-head.c   (working copy)
> *** maybe_build_generic_op (enum tree_code c
> *** 243,248 
> --- 243,249 
> *op0 = build1 (code, type, *op0);
> break;
>   case BIT_FIELD_REF:
> + case OBJ_TYPE_REF:
> *op0 = build3 (code, type, *op0, op1, op2);
> break;
>   default:;
> Index: gcc/gimple-pretty-print.c
> ===
> *** gcc/gimple-pretty-print.c (revision 231221)
> --- gcc/gimple-pretty-print.c (working copy)
> *** dump_unary_rhs (pretty_printer *buffer,
> *** 302,308 
> || TREE_CODE_CLASS (rhs_code) == tcc_reference
> || rhs_code == SSA_NAME
> || rhs_code == 

[PATCH][AArch64] Properly cost zero_extend+ashift forms of ubfi[xz]

2015-12-04 Thread Kyrill Tkachov

Hi all,

We don't properly handle the patterns for the [us]bfiz and [us]bfx instructions
when they have an extend+ashift form, for example the *_ashl pattern.
This leads to the rtx costs recursing into the extend and assigning a cost to
these patterns that is too large.

This patch fixes that oversight.
I stumbled across this when working on a different combine patch and ended up 
matching the above
pattern, only to have it rejected for -mcpu=cortex-a53 due to the erroneous 
cost.
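
As an illustration (an example made up for this note, not taken from the patch),
source like the following maps to the zero_extend+ashift form in question,
which AArch64 can implement with a single UBFIZ rather than a separate extend
and shift:

/* Zero-extend a byte and shift it left; with the corrected costs the
   combined zero_extend+ashift pattern (e.g. something like
   "ubfiz x0, x0, 3, 8") is no longer rejected as too expensive.  */
unsigned long long
scale_index (unsigned char i)
{
  return (unsigned long long) i << 3;
}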

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-12-04  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_extend_bitfield_pattern_p):
New function.
(aarch64_rtx_costs, ZERO_EXTEND, SIGN_EXTEND cases): Use the above
to handle extend+shift rtxes.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c97ecdc0859e0a24792a57aeb18b2e4ea35918f4..d180f6f2d37a280ad77f34caad8496ddaa6e01b2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5833,6 +5833,50 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, int *cost, bool speed)
   return false;
 }
 
+/* Check whether X is a bitfield operation of the form shift + extend that
+   maps down to a UBFIZ/SBFIZ/UBFX/SBFX instruction.  If so, return the
+   operand to which the bitfield operation is applied to.  Otherwise return
+   NULL_RTX.  */
+
+static rtx
+aarch64_extend_bitfield_pattern_p (rtx x)
+{
+  rtx_code outer_code = GET_CODE (x);
+  machine_mode outer_mode = GET_MODE (x);
+
+  if (outer_code != ZERO_EXTEND && outer_code != SIGN_EXTEND
+  && outer_mode != SImode && outer_mode != DImode)
+return NULL_RTX;
+
+  rtx inner = XEXP (x, 0);
+  rtx_code inner_code = GET_CODE (inner);
+  machine_mode inner_mode = GET_MODE (inner);
+  rtx op = NULL_RTX;
+
+  switch (inner_code)
+{
+  case ASHIFT:
+	if (CONST_INT_P (XEXP (inner, 1))
+	&& (inner_mode == QImode || inner_mode == HImode))
+	  op = XEXP (inner, 0);
+	break;
+  case LSHIFTRT:
+	if (outer_code == ZERO_EXTEND && CONST_INT_P (XEXP (inner, 1))
+	&& (inner_mode == QImode || inner_mode == HImode))
+	  op = XEXP (inner, 0);
+	break;
+  case ASHIFTRT:
+	if (outer_code == SIGN_EXTEND && CONST_INT_P (XEXP (inner, 1))
+	&& (inner_mode == QImode || inner_mode == HImode))
+	  op = XEXP (inner, 0);
+	break;
+  default:
+	break;
+}
+
+  return op;
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
@@ -6521,6 +6565,14 @@ cost_plus:
 	  return true;
 	}
 
+  op0 = aarch64_extend_bitfield_pattern_p (x);
+  if (op0)
+	{
+	  *cost += rtx_cost (op0, mode, ZERO_EXTEND, 0, speed);
+	  if (speed)
+	*cost += extra_cost->alu.bfx;
+	  return true;
+	}
   if (speed)
 	{
 	  if (VECTOR_MODE_P (mode))
@@ -6552,6 +6604,14 @@ cost_plus:
 	  return true;
 	}
 
+  op0 = aarch64_extend_bitfield_pattern_p (x);
+  if (op0)
+	{
+	  *cost += rtx_cost (op0, mode, SIGN_EXTEND, 0, speed);
+	  if (speed)
+	*cost += extra_cost->alu.bfx;
+	  return true;
+	}
   if (speed)
 	{
 	  if (VECTOR_MODE_P (mode))


Re: [PATCH] Use ECF_MAY_BE_ALLOCA for __builtin_alloca_with_align (PR tree-optimization/68680)

2015-12-04 Thread Richard Biener
On Thu, 3 Dec 2015, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, GCC 4.7+ seems to have regressed for
> -fstack-protector*, functions containing VLAs and no other arrays are not
> protected anymore.  Before 4.7, VLAs were gimplified as __builtin_alloca
> call, which sets ECF_MAY_BE_ALLOCA and in turn cfun->calls_alloca.
> These two are used in various places:
> 1) for stack protector purposes (this issue), early during expansion
> 2) in the inliner
> 3) for tail call optimization
> 4) for some non-NULL optimizations
> and tons of places in RTL.  As 4.7+ emits __builtin_alloca_with_align
> instead and special_function_p has not been adjusted, this does not happen
> any longer, though cfun->calls_alloca gets set during the expansion of
> __builtin_alloca_with_align, so for RTL optimizers it is already set.
> 
> The following patch restores the previous behavior, making VLAs be
> ECF_MAY_BE_ALLOCA and cfun->calls_alloca already during GIMPLE passes.
> It could be also done by testing the name, but I thought that it would be
> too ugly (would need another case anyway, as the current tests are for
> names with length <= 16).
> 
> 1) and 4) surely want to treat the VLAs like the patch does, I'm not 100%
> sure about 2) and 3), as VLAs are slightly different, they release
> the stack afterwards at the end of scope of the VLA var.  If we wanted to
> treat the two differently, maybe we'd need another ECF* flag and another
> cfun bitfield for VLAs.
> 
> The following patch has been bootstrapped/regtested on x86_64-linux and
> i686-linux.

The patch is ok - it looks like you could have removed the
__builtin_alloca strcmp with it though.

Does the patch mean we inlined __builtin_alloca_with_align ()
functions?  We might run into the issue Eric fixed lately with
mixing alloca and VLAs (don't see the patch being committed though).

Richard.

> 2015-12-03  Jakub Jelinek  
> 
>   PR tree-optimization/68680
>   * calls.c (special_function_p): Return ECF_MAY_BE_ALLOCA for
>   BUILT_IN_ALLOCA{,_WITH_ALIGN}.
> 
>   * gcc.target/i386/pr68680.c: New test.
> 
> --- gcc/calls.c.jj2015-11-26 11:17:25.0 +0100
> +++ gcc/calls.c   2015-12-03 19:03:59.342306457 +0100
> @@ -553,6 +553,17 @@ special_function_p (const_tree fndecl, i
>   flags |= ECF_NORETURN;
>  }
>  
> +  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> +switch (DECL_FUNCTION_CODE (fndecl))
> +  {
> +  case BUILT_IN_ALLOCA:
> +  case BUILT_IN_ALLOCA_WITH_ALIGN:
> + flags |= ECF_MAY_BE_ALLOCA;
> + break;
> +  default:
> + break;
> +  }
> +
>return flags;
>  }
>  
> --- gcc/testsuite/gcc.target/i386/pr68680.c.jj2015-12-03 
> 19:10:14.836037923 +0100
> +++ gcc/testsuite/gcc.target/i386/pr68680.c   2015-12-03 19:09:57.0 
> +0100
> @@ -0,0 +1,15 @@
> +/* PR tree-optimization/68680 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fstack-protector-strong" } */
> +
> +int foo (char *);
> +
> +int
> +bar (unsigned long x)
> +{
> +  char a[x];
> +  return foo (a);
> +}
> +
> +/* Verify that this function is stack protected.  */
> +/* { dg-final { scan-assembler "stack_chk_fail" } } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Fix PR68681

2015-12-04 Thread Richard Biener

Writing reliable vectorizer testcases is hard - the following factors
out target-dependent defaults for tree-reassoc-width in
gcc.dg/vect/pr45752.c.

Committed.

Richard.

2015-12-04  Richard Biener  

PR testsuite/68681
* gcc.dg/vect/pr45752.c: Add --param tree-reassoc-width=1.

Index: gcc/testsuite/gcc.dg/vect/pr45752.c
===
--- gcc/testsuite/gcc.dg/vect/pr45752.c (revision 231250)
+++ gcc/testsuite/gcc.dg/vect/pr45752.c (working copy)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param tree-reassoc-width=1" } */
 
 #include 
 #include "tree-vect.h"


Re: PR c/68657 - Add missing 'Warning' flags to c-family/c.opt + java/lang.opt

2015-12-04 Thread Tobias Burnus
And now with attached patch ...

On Fri, Dec 04, 2015 at 10:59:29AM +0100, Tobias Burnus wrote:
> A few warning options lack the 'Warning' flag, which since r228094 
> (2015-09-24)
> has the effect that -W(no-)error= doesn't work for them. Additionally,
> --help=warnings doesn't work for them either.
> 
> Successfully bootstrapped with c,c++,fortran,lto,go,objc,obj-c++,java on
> x86-64-gnu-linux & checked whether the issue in the PR was fixed.
> 
> I intend to commit the patch tomorrow as obvious unless someone has
> objections.
> 
> Tobias
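
An illustrative example (not part of the original mail) of what the missing
'Warning' flag breaks for one of these options:

/* Hypothetical test, compiled with
     gcc -c -Werror=sign-conversion example.c
   The implicit int -> unsigned conversion below triggers -Wsign-conversion.
   With the 'Warning' flag present in c.opt, the -Werror= mapping promotes
   the diagnostic to an error; without it (the bug fixed here) the mapping
   is ignored and only a plain warning is emitted.  */
unsigned int
to_unsigned (int x)
{
  return x;
}
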
gcc/c-family/
	PR c/68657
	* c.opt (Wpsabi, Wfloat-conversion, Wsign-conversion):
	Add 'Warning' flag.

gcc/java/
	PR c/68657
	* lang.opt (Wdeprecated, Wextraneous-semicolon, Wout-of-date,
	Wredundant-modifiers): Add 'Warning' flag.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index aafd802..758f88f 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -267,3 +267,3 @@ Warn if a subobject has an abi_tag attribute that the complete object type does
 Wpsabi
-C ObjC C++ ObjC++ LTO Var(warn_psabi) Init(1) Undocumented LangEnabledBy(C ObjC C++ ObjC++,Wabi)
+C ObjC C++ ObjC++ LTO Var(warn_psabi) Init(1) Undocumented Warning LangEnabledBy(C ObjC C++ ObjC++,Wabi)
 
@@ -437,3 +437,3 @@ This switch is deprecated; use -Werror=implicit-function-declaration instead.
 Wfloat-conversion
-C ObjC C++ ObjC++ Var(warn_float_conversion) LangEnabledBy(C ObjC C++ ObjC++,Wconversion)
+C ObjC C++ ObjC++ Var(warn_float_conversion) Warning LangEnabledBy(C ObjC C++ ObjC++,Wconversion)
 Warn for implicit type conversions that cause loss of floating point precision.
@@ -839,3 +839,3 @@ C ObjC C++ ObjC++ EnabledBy(Wextra)
 Wsign-conversion
-C ObjC C++ ObjC++ Var(warn_sign_conversion) LangEnabledBy(C ObjC,Wconversion)
+C ObjC C++ ObjC++ Var(warn_sign_conversion) Warning LangEnabledBy(C ObjC,Wconversion)
 Warn for implicit type conversions between signed and unsigned integers.
diff --git a/gcc/java/lang.opt b/gcc/java/lang.opt
index 7399d57..9fd2580 100644
--- a/gcc/java/lang.opt
+++ b/gcc/java/lang.opt
@@ -116,3 +116,3 @@ Java
 Wdeprecated
-Java Var(warn_deprecated)
+Java Var(warn_deprecated) Warning
 ; Documented for C
@@ -120,3 +120,3 @@ Java Var(warn_deprecated)
 Wextraneous-semicolon
-Java Var(flag_extraneous_semicolon)
+Java Var(flag_extraneous_semicolon) Warning
 Warn if deprecated empty statements are found.
@@ -124,3 +124,3 @@ Warn if deprecated empty statements are found.
 Wout-of-date
-Java Var(flag_newer) Init(1)
+Java Var(flag_newer) Init(1) Warning
 Warn if .class files are out of date.
@@ -128,3 +128,3 @@ Warn if .class files are out of date.
 Wredundant-modifiers
-Java Var(flag_redundant)
+Java Var(flag_redundant) Warning
 Warn if modifiers are specified when not necessary.


Re: [PATCH] Use ECF_MAY_BE_ALLOCA for __builtin_alloca_with_align (PR tree-optimization/68680)

2015-12-04 Thread Eric Botcazou
> Does the patch mean we inlined __builtin_alloca_with_align ()
> functions?  We might run into the issue Eric fixed lately with
> mixing alloca and VLAs (don't see the patch being committed though).

But I'm about to do it (I was waiting for the approval of the aarch64 
specific part).  Note that I'm not really sure if we want it for 5.x too.

-- 
Eric Botcazou


Re: [PATCH] Use ECF_MAY_BE_ALLOCA for __builtin_alloca_with_align (PR tree-optimization/68680)

2015-12-04 Thread Richard Biener
On Fri, 4 Dec 2015, Jakub Jelinek wrote:

> On Fri, Dec 04, 2015 at 10:30:38AM +0100, Richard Biener wrote:
> > > The following patch has been bootstrapped/regtested on x86_64-linux and
> > > i686-linux.
> > 
> > The patch is ok - it looks like you could have removed the
> > __builtin_alloca strcmp with it though.
> 
> Ok, will remove the strcmp then.
> 
> > Does the patch mean we inlined __builtin_alloca_with_align ()
> > functions?  We might run into the issue Eric fixed lately with
> 
> Yes, see the testcase below.  4.7+ inlines it.  As for tail call optimization,
> it seems we are just lucky there (f4), as the fab pass, which runs quite late,
> turns the __builtin_stack_restore into a GIMPLE_NOP and the tailc pass does not
> ignore nops.  Shall I commit the following patch to trunk to fix that up
> (after committing this VLA fix of course)?

Yes please.

Thanks,
Richard.

> int f1 (char *);
> 
> static inline void
> f2 (int x)
> {
>   char a[x];
>   f1 (a);
> }
> 
> void
> f3 (int x)
> {
>   f2 (x);
>   f2 (x);
>   f2 (x);
>   f2 (x);
> }
> 
> int
> f4 (int x)
> {
>   char a[x];
>   return f1 (a);
> }
> 
> 2015-12-04  Jakub Jelinek  
> 
>   * tree-tailcall.c (find_tail_calls): Ignore GIMPLE_NOPs.
> 
> --- gcc/tree-tailcall.c.jj2015-11-04 11:12:17.0 +0100
> +++ gcc/tree-tailcall.c   2015-12-04 11:43:01.296110941 +0100
> @@ -412,9 +412,10 @@ find_tail_calls (basic_block bb, struct
>  {
>stmt = gsi_stmt (gsi);
>  
> -  /* Ignore labels, returns, clobbers and debug stmts.  */
> +  /* Ignore labels, returns, nops, clobbers and debug stmts.  */
>if (gimple_code (stmt) == GIMPLE_LABEL
> || gimple_code (stmt) == GIMPLE_RETURN
> +   || gimple_code (stmt) == GIMPLE_NOP
> || gimple_clobber_p (stmt)
> || is_gimple_debug (stmt))
>   continue;
> @@ -532,7 +533,8 @@ find_tail_calls (basic_block bb, struct
>  
>stmt = gsi_stmt (agsi);
>  
> -  if (gimple_code (stmt) == GIMPLE_LABEL)
> +  if (gimple_code (stmt) == GIMPLE_LABEL
> +   || gimple_code (stmt) == GIMPLE_NOP)
>   continue;
>  
>if (gimple_code (stmt) == GIMPLE_RETURN)
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFA][PATCH] Run CFG cleanups after reassociation as needed

2015-12-04 Thread Richard Biener
On Thu, Dec 3, 2015 at 6:54 PM, Jeff Law  wrote:
> This is something I noticed while working on fixing 67816.
>
> Essentially I was seeing trivially true or trivially false conditionals left
> in the IL for DOM to clean up.
>
> While DOM can and will clean that crud up, a trivially true or trivially
> false conditional ought to be detected and cleaned up by cleanup_cfg.
>
> It turns out the reassociation pass does not schedule a CFG cleanup even in
> cases where it optimizes a conditional to TRUE or FALSE.
>
> Bubbling up an indicator that we optimized away a conditional and using that
> to trigger a CFG cleanup is trivial.
>
> While I have a slight preference to see this fix in GCC 6, if folks object
> and want this to wait for GCC 7 stage1, I'd understand.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.
>
> OK for the trunk?

Ok.  [I always hoped we could at some point assert that we don't have trivially
optimizable control flow in the IL.]

Richard.

> Thanks,
> Jeff
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 04dbcb0..61a5e54 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2015-12-03  Jeff Law  
> +
> +   * tree-ssa-reassoc.c (maybe_optimize_range_tests): Return boolean
> +   indicating if a gimple conditional was optimized to true/false.
> +   (reassociate_bb): Bubble up return value from
> +   maybe_optimize_range_tests.
> +   (do_reassoc): Similarly, but for reassociate_bb.
> +   (execute_reassoc): Return TODO_cleanup_cfg as needed.
> +
>  2015-11-27  Jiri Engelthaler  
>
> PR driver/68029
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 4e62a06..893aab1 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2015-12-02  Jeff Law  
> +
> +   * gcc.dg/tree-ssa/reassoc-43.c: New test.
> +
>  2015-12-02  Andreas Krebbel  
>
> * gcc.dg/optimize-bswapdi-1.c: Force using -mzarch on s390 and
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-43.c
> b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-43.c
> new file mode 100644
> index 000..ea44f30
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-43.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-reassoc -w" } */
> +
> +typedef union tree_node *tree;
> +enum cpp_ttype { CPP_COLON, CPP_SEMICOLON, CPP_CLOSE_BRACE, CPP_COMMA };
> +enum rid { RID_STATIC = 0, RID_ATTRIBUTE, };
> +typedef struct c_token
> +{
> +  enum cpp_ttype type:8;
> +}
> +c_token;
> +typedef struct c_parser
> +{
> +  c_token tokens[2];
> +  short tokens_avail;
> +}
> +c_parser;
> +__inline__ c_token *
> +c_parser_peek_token (c_parser * parser)
> +{
> +  if (parser->tokens_avail == 0)
> +{
> +  parser->tokens_avail = 1;
> +}
> +  return &parser->tokens[0];
> +}
> +
> +__inline__ unsigned char
> +c_parser_next_token_is (c_parser * parser, enum cpp_ttype type)
> +{
> +  return c_parser_peek_token (parser)->type == type;
> +}
> +
> +void
> +c_parser_translation_unit (c_parser * parser)
> +{
> +  tree prefix_attrs;
> +  tree all_prefix_attrs;
> +  while (1)
> +{
> +  if (c_parser_next_token_is (parser, CPP_COLON)
> + || c_parser_next_token_is (parser, CPP_COMMA)
> + || c_parser_next_token_is (parser, CPP_SEMICOLON)
> + || c_parser_next_token_is (parser, CPP_CLOSE_BRACE)
> + || c_parser_next_token_is_keyword (parser, RID_ATTRIBUTE))
> +   {
> + if (c_parser_next_token_is_keyword (parser, RID_ATTRIBUTE))
> +   all_prefix_attrs =
> + chainon (c_parser_attributes (parser), prefix_attrs);
> +   }
> +}
> +}
> +/* { dg-final { scan-tree-dump-not "0 != 0" "reassoc2"} } */
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index dfd0da1..315b0bf 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -2976,9 +2976,15 @@ struct inter_bb_range_test_entry
>unsigned int first_idx, last_idx;
>  };
>
> -/* Inter-bb range test optimization.  */
> +/* Inter-bb range test optimization.
>
> -static void
> +   Returns TRUE if a gimple conditional is optimized to a true/false,
> +   otherwise return FALSE.
> +
> +   This indicates to the caller that it should run a CFG cleanup pass
> +   once reassociation is completed.  */
> +
> +static bool
>  maybe_optimize_range_tests (gimple *stmt)
>  {
>basic_block first_bb = gimple_bb (stmt);
> @@ -2990,6 +2996,7 @@ maybe_optimize_range_tests (gimple *stmt)
>auto_vec ops;
>auto_vec bbinfo;
>bool any_changes = false;
> +  bool cfg_cleanup_needed = false;
>
>/* Consider only basic blocks that end with GIMPLE_COND or
>   a cast statement satisfying final_range_test_p.  All
> @@ -2998,15 +3005,15 @@ maybe_optimize_range_tests (gimple *stmt)
>if (gimple_code (stmt) == GIMPLE_COND)
>  {
>if (EDGE_COUNT (first_bb->succs) != 2)
> -   

Re: [PATCH PR68542]

2015-12-04 Thread Richard Biener
On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is a patch for the 481.wrf performance regression for avx2, which is a
> slightly modified mask store optimization.  This transformation performs
> unpredication for a semi-hammock containing masked stores; in other
> words, if we have a loop like
> for (i=0; i<n; i++)
>   if (c[i]) {
> p1[i] += 1;
> p2[i] = p3[i] +2;
>   }
>
> then it will be transformed to
>if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>  vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>  vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>  MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>  vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>  vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>  MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>}
> i.e. it will put all computations related to masked stores to semi-hammock.
>
> Bootstrapping and regression testing did not show any new failures.

Can you please split out the middle-end support for vector equality compares?

@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
   if (TREE_CODE (op0_type) == VECTOR_TYPE
  || TREE_CODE (op1_type) == VECTOR_TYPE)
 {
-  error ("vector comparison returning a boolean");
-  debug_generic_expr (op0_type);
-  debug_generic_expr (op1_type);
-  return true;
+ /* Allow vector comparison returning boolean if operand types
+are equal and CODE is EQ/NE.  */
+ if ((code != EQ_EXPR && code != NE_EXPR)
+ || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
+  || VECTOR_INTEGER_TYPE_P (op0_type)))
+   {
+ error ("type mismatch for vector comparison returning a boolean");
+ debug_generic_expr (op0_type);
+ debug_generic_expr (op1_type);
+ return true;
+   }
 }
 }

please merge the conditions with a &&

@@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
tree type, tree op0, tree op1)

   if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
 {
+  if (INTEGRAL_TYPE_P (type)
+ && (TREE_CODE (type) == BOOLEAN_TYPE
+ || TYPE_PRECISION (type) == 1))
+   {
+ /* Have vector comparison with scalar boolean result.  */
+ bool result = true;
+ gcc_assert (code == EQ_EXPR || code == NE_EXPR);
+ gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+ for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+   {
+ tree elem0 = VECTOR_CST_ELT (op0, i);
+ tree elem1 = VECTOR_CST_ELT (op1, i);
+ tree tmp = fold_relational_const (code, type, elem0, elem1);
+ result &= integer_onep (tmp);
+ if (code == NE_EXPR)
+   result = !result;
+ return constant_boolean_node (result, type);

... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
to change the
guarding condition to just

   if (! VECTOR_TYPE_P (type))

and assert the boolean/precision.  Please also merge the asserts into
one with &&

diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index b82ae3c..73ee3be 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
tree_code code, tree type,

   gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);

+  /* Do not perform combining it types are not compatible.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+  && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0
+return NULL_TREE;
+

again, how does this happen?

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e67048e..1605520c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
gimple_stmt_iterator si,
_code, ))
 return;

+  /* Use of vector comparison in gcond is very restricted and used to check
+ that the mask in masked store is zero, so assert for such comparison
+ is not implemented yet.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+return;
+

VECTOR_TYPE_P

I believe the comment should simply say that VRP doesn't track ranges for
vector types.

In the previous review I suggested you should make sure that RTL expansion
ends up using a well-defined optab for these compares.  To make sure
this happens across targets I suggest you make these comparisons available
via the GCC vector extension.  Thus allow

typedef int v4si __attribute__((vector_size(16)));

int foo (v4si a, v4si b)
{
  if (a == b)
return 4;
}

and != and also using floating point vectors.

Otherwise it's hard to see the impact of this change.  Obvious choices
are the 

Re: [Patch, Contrib] Download ISL 0.15 by download_prerequisites

2015-12-04 Thread Tobias Burnus
If there are no objections, I intend to commit the following patch
tomorrow as obvious.

Tobias

On Tue, Oct 27, 2015 at 11:27:41AM +0100, Tobias Burnus wrote:
> recently, support for ISL 0.15 was added to GCC and also
> ftp://gcc.gnu.org/pub/gcc/infrastructure/ now contains ISL 0.15.
> 
> Hence, there is no reason not to download the newest version by
> download_prerequisites.
> 
> OK for the trunk? (One could also add it to GCC 5 as the ISL 0.15
> patches landed there as well on 2015-10-12.)
> 
> 
> Side remark: I think one could could consider to also put newer versions
> of the other prerequisites on the FTP server, which currently has quite
> old versions:
> - GMP:  4.3.2 of January 2010 - current is 6.0.0a of March 2014 (6.1RC is Oct 
> 2015)
> - MPFR: 2.4.2 of mid 2009 - current is 3.1.3  of June 2015 (the web page 
> has additionally 3 post-relase bug-fix patches)
> - MPC:  0.8.1 of end of 2009  - current is 1.0.2  of February 2015
> 
> Cheers,
> 
> Tobias
> 
> 
> contrib/
>   * download_prerequisites: Download ISL 0.15.
> 
> diff --git a/contrib/download_prerequisites b/contrib/download_prerequisites
> index 6940330..a685a1d 100755
> --- a/contrib/download_prerequisites
> +++ b/contrib/download_prerequisites
> @@ -48,7 +48,7 @@ ln -sf $MPC mpc || exit 1
>  
>  # Necessary to build GCC with the Graphite loop optimizations.
>  if [ "$GRAPHITE_LOOP_OPT" = "yes" ] ; then
> -  ISL=isl-0.14
> +  ISL=isl-0.15
>  
>wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1
>tar xjf $ISL.tar.bz2  || exit 1


Re: [ARM] Fix PR middle-end/65958

2015-12-04 Thread Eric Botcazou
> Looks ok to me.  OK /Marcus

Thanks.  Testing was successful so I have installed it with a small change 
(s/ARITH_BASE/ARITH_FACTOR/, it's a bit more mathematically correct).

-- 
Eric Botcazou


Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-04 Thread Richard Biener
On Thu, Dec 3, 2015 at 9:29 PM, Jeff Law  wrote:
> On 12/02/2015 08:35 AM, Richard Biener wrote:
>
>>>
>>> The most interesting side effect, and one I haven't fully analyzed yet is
>>> an
>>> unexpected jump thread -- which I've traced back to differences in what
>>> the
>>> alias oracle is able to find when we walk unaliased vuses. Which makes
>>> totally no sense that it's unable to find the unaliased vuse in the
>>> simplified CFG, but finds it when we don't remove the unexecutable edge.
>>> As
>>> I said, it makes no sense to me yet and I'm still digging.
>>
>>
>> The walking of PHI nodes is quite simplistic to avoid doing too much work
>> so
>> an extra (not executable) edge may confuse it enough.  So this might be
>> "expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
>> trusted would be an option (also helping SCCVN).
>
> Found it.  In the CFG with the unexecutable edges _not_ removed there is a
> PHI associated with that edge which provides a dominating unaliased vuse.
> Once that edge is removed, the PHI arg gets removed and thus we can't easily
> see the unaliased vuse.
>
> So all is working as expected.  It wasn't ever a big issue, I just wanted to
> make sure I thoroughly understood the somewhat counter-intuitive result.

Good.  Btw, I remembered that with the unreachable tracking in SCCVN I
ran into cases where it didn't correctly track all unreachable blocks due
to the way the dominator walk works (which doesn't ensure we've visited
all predecessors).  Like for a CFG where bb5 has a predecessor bb4 that the
dominator walk has not yet visited when bb5 is reached.

DOM order visits the earlier blocks first and then DOM children like bb5.
So we fail to detect bb5 as unreachable (we didn't visit bb4 yet to mark its
outgoing edges unreachable).

The fix is easy (in testing right now).  Simply track if the current block
was unreachable when visiting DOM children.

Didn't manage to produce a missed-optimization testcase (but only
tried for a few minutes), the cases I've seen it were involving
unreachable loops, but I don't have them anymore.

Sorry if that makes extracting the machinery harder ;)

Richard.

2015-12-04  Richard Biener  

* tree-ssa-sccvn.c (sccvn_dom_walker): Add unreachable_dom
member and initialize it.
(sccvn_dom_walker::after_dom_children): Reset unreachable_dom
if necessary.
(sccvn_dom_walker::before_dom_children): If unreachable_dom
is set BB is not reachable either.  Set unreachable_dom
if not set and BB is unreachable.



> Jeff




Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1

2015-12-04 Thread Kyrill Tkachov

Hi Evandro,

On 03/12/15 20:58, Evandro Menezes wrote:

On 11/20/2015 11:17 AM, James Greenhalgh wrote:

On Tue, Nov 10, 2015 at 11:54:00AM -0600, Evandro Menezes wrote:

2015-11-10  Evandro Menezes 

gcc/

* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
* config/aarch64/aarch64.md: Include "exynos-m1.md".
* config/arm/arm-cores.def: Use the Exynos M1 sched model.
* config/arm/arm.md: Include "exynos-m1.md".
* config/arm/arm-tune.md: Regenerated.
* config/arm/exynos-m1.md: New file.

This patch adds the scheduling model for Exynos M1.  It depends on
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html

Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.

Please, commit if it's alright.



 From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:18:52 -0600
Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1

gcc/
* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
* config/aarch64/aarch64.md: Include "exynos-m1.md".

These changes are fine.


* config/arm/arm-cores.def: Use the Exynos M1 sched model.
* config/arm/arm.md: Include "exynos-m1.md".
* config/arm/arm-tune.md: Regenerated.

These changes need an ack from an ARM reviewer.


* config/arm/exynos-m1.md: New file.

I have a few comments on this model.


+;; The Exynos M1 core is modeled as a triple issue pipeline that has
+;; the following functional units.
+
+(define_automaton "exynos_m1_gp")
+(define_automaton "exynos_m1_ls")
+(define_automaton "exynos_m1_fp")
+
+;; 1.  Two pipelines for simple integer operations: A, B
+;; 2.  One pipeline for simple or complex integer operations: C
+
+(define_cpu_unit "em1_xa, em1_xb, em1_xc" "exynos_m1_gp")
+
+(define_reservation "em1_alu" "(em1_xa | em1_xb | em1_xc)")
+(define_reservation "em1_c" "em1_xc")

Is this extra reservation useful, can we not just use em1_xc directly?


+;; 3.  Two asymmetric pipelines for Neon and FP operations: F0, F1
+
+(define_cpu_unit "em1_f0, em1_f1" "exynos_m1_fp")
+
+(define_reservation "em1_fmac" "em1_f0")
+(define_reservation "em1_fcvt" "em1_f0")
+(define_reservation "em1_nalu" "(em1_f0 | em1_f1)")
+(define_reservation "em1_nalu0" "em1_f0")
+(define_reservation "em1_nalu1" "em1_f1")
+(define_reservation "em1_nmisc" "em1_f0")
+(define_reservation "em1_ncrypt" "em1_f0")
+(define_reservation "em1_fadd" "em1_f1")
+(define_reservation "em1_fvar" "em1_f1")
+(define_reservation "em1_fst" "em1_f1")

Same comment here, does this not just obfuscate the interaction between
instruction classes in the description. I'm not against doing it this way
if you prefer, but it would seem to reduce readability to me. I think there
is also an argument that this increases readability, so it is your choice.


+
+;; 4.  One pipeline for branch operations: BX
+
+(define_cpu_unit "em1_bx" "exynos_m1_gp")
+
+(define_reservation "em1_br" "em1_bx")
+

And again?


+;; 5.  One AGU for loads: L
+;; One AGU for stores and one pipeline for stores: S, SD
+
+(define_cpu_unit "em1_lx" "exynos_m1_ls")
+(define_cpu_unit "em1_sx, em1_sd" "exynos_m1_ls")
+
+(define_reservation "em1_ld" "em1_lx")
+(define_reservation "em1_st" "(em1_sx + em1_sd)")
+
+;; Common occurrences
+(define_reservation "em1_sfst" "(em1_fst + em1_st)")
+(define_reservation "em1_lfst" "(em1_fst + em1_ld)")
+
+;; Branches
+;;
+;; No latency as there is no result
+;; TODO: Unconditional branches use no units;
+;; conditional branches add the BX unit;
+;; indirect branches add the C unit.
+(define_insn_reservation "exynos_m1_branch" 0
+  (and (eq_attr "tune" "exynosm1")
+   (eq_attr "type" "branch"))
+  "em1_br")
+
+(define_insn_reservation "exynos_m1_call" 1
+  (and (eq_attr "tune" "exynosm1")
+   (eq_attr "type" "call"))
+  "em1_alu")
+
+;; Basic ALU
+;;
+;; Simple ALU without shift, non-predicated
+(define_insn_reservation "exynos_m1_alu" 1
+  (and (eq_attr "tune" "exynosm1")
+   (and (not (eq_attr "predicated" "yes"))

(and (eq_attr "predicated" "no")) ?

Likewise throughout the file? Again this is your choice.

This is OK from the AArch64 side, let me know if you plan to change any
of the above, otherwise I'll commit it (or someone else can commit it)
after I see an OK from an ARM reviewer.


ARM ping.



This is ok arm-wise, sorry for the delay.
Make sure to regenerate and commit the updated config/arm/arm-tune.md hunk
when committing the patch.

Thanks,
Kyrill



Re: [PATCH] Handle OBJ_TYPE_REF in FRE

2015-12-04 Thread Richard Biener
On Fri, 4 Dec 2015, Jan Hubicka wrote:

> > Indeed we don't do code hoisting yet.  Maybe one could trick PPRE
> > into doing it.
> > 
> > Note that for OBJ_TYPE_REFs in calls you probably should better use
> > gimple_call_fntype instead of the type of the OBJ_TYPE_REF anyway
> > (well, fntype will be the method-type, not pointer-to-method-type).
> > 
> > Not sure if you need OBJ_TYPE_REFs type in non-call contexts?
> 
> Well, to optimize speculative call sequences
> 
> if (funptr == thismethod)
>   inlined this method body
> else
>   funptr ();
> 
> Here you want to devirtualize the conditional, not the call, in order
> to get the inlined method unconditionally.
> 
> In general I think OBJ_TYPE_REF is misplaced - it should be on vtable load
> instead of the call/conditional. It is a property of the vtable lookup.
> Then it would work for method pointers too.

Even better.  Make it a tcc_reference tree then.

> > 
> >   if (fn
> >   && (!POINTER_TYPE_P (TREE_TYPE (fn))
> >   || (TREE_CODE (TREE_TYPE (TREE_TYPE (fn))) != FUNCTION_TYPE
> >   && TREE_CODE (TREE_TYPE (TREE_TYPE (fn))) != METHOD_TYPE)))
> > {
> >   error ("non-function in gimple call");
> >   return true;
> > }
> > 
> > and in useless_type_conversion_p:
> > 
> >   /* Do not lose casts to function pointer types.  */
> >   if ((TREE_CODE (TREE_TYPE (outer_type)) == FUNCTION_TYPE
> >|| TREE_CODE (TREE_TYPE (outer_type)) == METHOD_TYPE)
> >   && !(TREE_CODE (TREE_TYPE (inner_type)) == FUNCTION_TYPE
> >|| TREE_CODE (TREE_TYPE (inner_type)) == METHOD_TYPE))
> > return false;
> 
> Yeah, this does not make much sense to me anymore.  Something to track next
> stage1.

Btw, might be necessary for targets with function descriptors - not sure
though.

Richard.

> > 
> > probably from the times we didn't have gimple_call_fntype.  So if I
> > paper over the ICE (in the verifier) then the libreoffice testcase
> > gets optimized to
> > 
> >   :
> >   _3 = this_2(D)->D.2399.D.2325._vptr.B;
> >   _4 = *_3;
> >   PROF_6 = OBJ_TYPE_REF(_4;(struct 
> > WindowListenerMultiplexer)this_2(D)->0);
> >   if (PROF_6 == acquire)
> > goto ;
> >   else
> > goto ;
> > 
> >   :
> >   PROF_6 (this_2(D));
> >   goto ;
> > 
> >   :
> >   PROF_6 (this_2(D));
> > 
> > by FRE2 and either VRP or DOM will propagate the equivalency to
> > 
> >   :
> >   _3 = this_2(D)->D.2399.D.2325._vptr.B;
> >   _4 = *_3;
> >   PROF_6 = OBJ_TYPE_REF(_4;(struct 
> > WindowListenerMultiplexer)this_2(D)->0);
> >   if (PROF_6 == acquire)
> > goto ;
> >   else
> > goto ;
> > 
> >   :
> >   WindowListenerMultiplexer::acquire (this_2(D));
> >   goto ;
> > 
> >   :
> >   PROF_6 (this_2(D));
> > 
> > Richard.
> 
> LGTM.
> Honza
> > 
> > 2015-12-03  Richard Biener  
> > 
> > PR tree-optimization/64812
> > * tree-ssa-sccvn.c (vn_get_stmt_kind): Handle OBJ_TYPE_REF.
> > (vn_nary_length_from_stmt): Likewise.
> > (init_vn_nary_op_from_stmt): Likewise.
> > * gimple-match-head.c (maybe_build_generic_op): Likewise.
> > * gimple-pretty-print.c (dump_unary_rhs): Likewise.
> > * gimple-fold.c (gimple_build): Likewise.
> > * gimple.h (gimple_expr_type): Likewise.
> > 
> > * g++.dg/tree-ssa/ssa-fre-1.C: New testcase.
> > 
> > Index: gcc/tree-ssa-sccvn.c
> > ===
> > *** gcc/tree-ssa-sccvn.c(revision 231221)
> > --- gcc/tree-ssa-sccvn.c(working copy)
> > *** vn_get_stmt_kind (gimple *stmt)
> > *** 460,465 
> > --- 460,467 
> >   ? VN_CONSTANT : VN_REFERENCE);
> > else if (code == CONSTRUCTOR)
> >   return VN_NARY;
> > +   else if (code == OBJ_TYPE_REF)
> > + return VN_NARY;
> > return VN_NONE;
> >   }
> >   default:
> > *** vn_nary_length_from_stmt (gimple *stmt)
> > *** 2479,2484 
> > --- 2481,2487 
> > return 1;
> >   
> >   case BIT_FIELD_REF:
> > + case OBJ_TYPE_REF:
> > return 3;
> >   
> >   case CONSTRUCTOR:
> > *** init_vn_nary_op_from_stmt (vn_nary_op_t
> > *** 2508,2513 
> > --- 2511,2517 
> > break;
> >   
> >   case BIT_FIELD_REF:
> > + case OBJ_TYPE_REF:
> > vno->length = 3;
> > vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
> > vno->op[1] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 1);
> > Index: gcc/gimple-match-head.c
> > ===
> > *** gcc/gimple-match-head.c (revision 231221)
> > --- gcc/gimple-match-head.c (working copy)
> > *** maybe_build_generic_op (enum tree_code c
> > *** 243,248 
> > --- 243,249 
> > *op0 = build1 (code, type, *op0);
> > break;
> >   case BIT_FIELD_REF:
> > + case OBJ_TYPE_REF:
> > *op0 = build3 (code, type, *op0, op1, 

Re: [PATCH][ARM] PR target/68214: Delete IP-reg-clobbering call-through-mem patterns

2015-12-04 Thread Ramana Radhakrishnan
On Fri, Dec 4, 2015 at 9:27 AM, Kyrill Tkachov  wrote:
> Hi all,
>
> The wrong-code in this PR occurs for pre-ARMv5 architectures with Thumb
> interworking when trying
> to use a static chain. Our output_call_mem function that outputs the
> assembly for the call explicitly
> clobbers the IP register, which is also used as the static chain register.
>
> Richard suggested offline that we can just remove the *call_mem and
> *call_value_mem patterns as they
> are of no use anymore and just cause us trouble such as this.  The midend
> does a good enough job of
> figuring out it has to load the address to which we should branch.
>
> So this patch does that. It's an entirely negative diffstat :)
> For the failing testcase gcc.dg/cwsc1.c the bad code before this patch in
> the main function is:
> mov     ip, r4
> ldr     r3, .L6
> ldr     ip, [r3]
> mov     lr, pc
> bx      ip
>
> and with this patch it is:
> ldr     r3, .L6
> ldr     r3, [r3]
> mov     ip, r4
> mov     lr, pc
> bx      r3
>
> As you can see it's correct and no less efficient than before.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf and a test run with
> -mcpu=arm7tdmi didn't show any problems.
>
> Ok for trunk?
>

Ok - thanks.

Ramana

> Thanks,
> Kyrill
>
> 2015-12-04  Kyrylo Tkachov  
>
> PR target/68214
> * config/arm/arm.md (*call_mem): Delete pattern.
> (*call_value_mem): Likewise.
> * config/arm/arm.c (output_call_mem): Delete.
> * config/arm/arm-protos.h (output_call_mem): Delete prototype.


Re: [PATCH] Fix missing range information for "%q+D" format code

2015-12-04 Thread Bernd Schmidt

On 12/03/2015 09:33 PM, David Malcolm wrote:

The attached patch updates the handling of %q+D, simplifying
the implementation, and ensuring that it retains the range
information of the decl, giving:

diagnostic-ranges-1.c:6:7: warning: unused variable ‘redundant’ 
[-Wunused-variable]
int redundant;
^


For most of it I've convinced myself that it looks OK.


  void
-rich_location::set_range (unsigned int idx, source_range src_range,
- bool show_caret_p, bool overwrite_loc_p)
+rich_location::set_range (line_maps *set, unsigned int idx,
+ source_location loc, bool show_caret_p)
  {


Here you need to update the function comment.

This is a bit of a strange function. As far as I can tell, it's called 
from only two places, one in c-common.c (which overwrites idx 0, i.e. 
the entire range, and therefore is uninteresting), and one in 
text_info::set_location. I wonder about the use of "idx" in the latter. 
As far as I can tell, that's used by the Fortran frontend to keep track 
of two separate ranges for their diagnostics - correct? Is that really 
related to the normal function of rich_location and how it keeps track 
of multiple ranges, or would that be better expressed by keeping two 
rich_locations in text_info?



Bernd


Re: -fstrict-aliasing fixes 6/6: permit inlining of comdats

2015-12-04 Thread Richard Biener
On Fri, 4 Dec 2015, Jan Hubicka wrote:

> Hi,
> this is the last patch of the series.  It makes operand_equal_p to compare
> alias sets even in !flag_strict_aliasing before inlining so inlining 
> !flag_strict_aliasing to flag_strict_aliasing is possible when callee is
> merged comdat.  I tried to explain it in greater detail in the comment
> in ipa-inline-tranform.
> 
> While working on the code I noticed that I managed to overload merged with
> two meanings. One is that the function had bodies defined in multiple units
> (and thus its inlining should not be considered cross-modulo) and other is
> that it used to be comdat.  This is usually the same, but not always - one
> can manually define weak functions where the bypass for OPTIMIZATION_NODE
> checks can not apply.
> 
> Since the first only affects heuristics and I do not think I need to care
> about weaks much, I dropped it and renamed the flag to merged_comdat to make
> it more obvious what it means.

I wonder if you can split out the re-naming at this stage.  Further
comments below.

> Bootstrapped/regtested x86_64-linux, OK?
> 
> I will work on some testcases for the ICF and fold-const that would lead
> to wrong code if alias sets was ignored early.

Would be nice to have a wrong-code testcase go with the commit.

> Honza
>   * fold-const.c (operand_equal_p): Before inlining do not permit
>   transformations that would break with strict aliasing.
>   * ipa-inline.c (can_inline_edge_p) Use merged_comdat.
>   * ipa-inline-transform.c (inline_call): When inlining merged comdat do
>   not drop strict_aliasing flag of caller.
>   * cgraphclones.c (cgraph_node::create_clone): Use merged_comdat.
>   * cgraph.c (cgraph_node::dump): Dump merged_comdat.
>   * ipa-icf.c (sem_function::merge): Drop merged_comdat when merging
>   comdat and non-comdat.
>   * cgraph.h (cgraph_node): Rename merged to merged_comdat.
>   * ipa-inline-analysis.c (simple_edge_hints): Check both merged_comdat
>   and icf_merged.
> 
>   * lto-symtab.c (lto_cgraph_replace_node): Update code computing
>   merged_comdat.
> Index: fold-const.c
> ===
> --- fold-const.c  (revision 231239)
> +++ fold-const.c  (working copy)
> @@ -2987,7 +2987,7 @@ operand_equal_p (const_tree arg0, const_
>  flags)))
>   return 0;
> /* Verify that accesses are TBAA compatible.  */
> -   if (flag_strict_aliasing
> +   if ((flag_strict_aliasing || !cfun->after_inlining)
> && (!alias_ptr_types_compatible_p
>   (TREE_TYPE (TREE_OPERAND (arg0, 1)),
>TREE_TYPE (TREE_OPERAND (arg1, 1)))

Sooo  first of all the code is broken anyway as it guards
the restrict checking (MR_DEPENDENCE_*) stuff with flag_strict_aliasing
(ick).  Second, I wouldn't mind if we drop the flag_strict_aliasing
check alltogether, a cfun->after_inlining checks makes me just too
nervous.

> Index: ipa-inline.c
> ===
> --- ipa-inline.c  (revision 231239)
> +++ ipa-inline.c  (working copy)
> @@ -466,7 +466,7 @@ can_inline_edge_p (struct cgraph_edge *e
>   optimized with the optimization flags of module they are used in.
>Also do not care about mixing up size/speed optimization when
>DECL_DISREGARD_INLINE_LIMITS is set.  */
> -  else if ((callee->merged
> +  else if ((callee->merged_comdat
>   && !lookup_attribute ("optimize",
> DECL_ATTRIBUTES (caller->decl)))
>  || DECL_DISREGARD_INLINE_LIMITS (callee->decl))
> Index: ipa-inline-transform.c
> ===
> --- ipa-inline-transform.c(revision 231239)
> +++ ipa-inline-transform.c(working copy)
> @@ -322,11 +322,26 @@ inline_call (struct cgraph_edge *e, bool
>if (DECL_FUNCTION_PERSONALITY (callee->decl))
>  DECL_FUNCTION_PERSONALITY (to->decl)
>= DECL_FUNCTION_PERSONALITY (callee->decl);
> +
> +  /* merged_comdat indicate that function was originally COMDAT and merged
> + from multiple units.  Because every unit using COMDAT must also define 
> it,
> + we know that the function is safe to build with each of the optimization
> + flags used to compile them.
> +
> + If one unit is compiled with -fstrict-aliasing and
> + other with -fno-strict-aliasing we may bypass dropping the
> + flag_strict_aliasing because we know it would be valid to inline
> + -fstrict-aliasing variant of the callee, too.  Unless optimization
> + attribute was used, the caller and COMDAT callee must have been
> + compiled with the same flags.  */

So your logic relies on the fact that the -fno-strict-aliasing was
not necessary on copy A if copy B was compiled without that flag
because 

Re: Ping [PATCH] c++/42121 - diagnose invalid flexible array members

2015-12-04 Thread Bernd Schmidt

> The patch should bring C++ support for flexible array members closer
> to C (most of the same constructs should be accepted and rejected).
> The only C change in this patch is to include the size of excessively
> large types in diagnostics (I found knowing the size helpful when
> adding tests and I think it might be helpful to others as well).

Can't really comment on the C++ parts, but I spotted some formatting issues.


+  && TREE_CODE (type) != ERROR_MARK
+  && (DECL_NAME (fld) || RECORD_OR_UNION_TYPE_P (type)))
+{
+
+  return TYPE_SIZE (type)
+   && (TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST
+   || !tree_int_cst_equal (size_zero_node, TYPE_SIZE (type)));


Unnecessary blank line. Multi-line expressions should be wrapped in 
parentheses for indentation. Lose braces around single statements.




+   a flexible array member or a zero-size array.
+*/


Comment terminators go at the end of the line.


+  tree size_max_node =
+int_const_binop (MINUS_EXPR, size_zero_node, size_one_node);


The = operator should start the line.


+ tree flexarray =
+   check_flexarrays (t, TYPE_FIELDS (fldtype), seen_field);


Here too.


+  {
+bool dummy = false;
+check_flexarrays (t, TYPE_FIELDS (t), &dummy);
+  }


No reason to wrap this in braces.



+  if (NULL_TREE == size)
+return build_index_type (NULL_TREE);


Don't know whether the conventions in cp/ are different, but usually 
this is "size == NULL_TREE".



+   pedwarn (input_location, OPT_Wpedantic,
+"ISO C++ forbids zero-size arrays");
}
+
 }


Extra blank line.


@@ -11082,6 +11094,10 @@ grokdeclarator (const cp_declarator *declarator,
 || !COMPLETE_TYPE_P (TREE_TYPE (type))
 || initialized == 0))
  {
+   if (TREE_CODE (type) != ARRAY_TYPE
+   || !COMPLETE_TYPE_P (TREE_TYPE (type)))
+ {
+
if (unqualified_id)


Here too.


Bernd
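
For context, a minimal C example of the construct being discussed (my own
illustration, not taken from the patch); the intent is that C++ accepts and
rejects the same shapes as C:

#include <stdlib.h>
#include <string.h>

/* Valid C99/C11: the flexible array member must be the last member and
   must not be the only named member of the struct.  */
struct str {
  int len;
  char data[];   /* flexible array member */
};

int main (void)
{
  struct str *p = malloc (sizeof (struct str) + 6);  /* room for data[] */
  p->len = 5;
  memcpy (p->data, "hello", 6);
  free (p);
  return 0;
}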


[PATCH][ARM] PR target/68214: Delete IP-reg-clobbering call-through-mem patterns

2015-12-04 Thread Kyrill Tkachov

Hi all,

The wrong-code in this PR occurs for pre-ARMv5 architectures with Thumb 
interworking when trying
to use a static chain. Our output_call_mem function that outputs the assembly 
for the call explicitly
clobbers the IP register, which is also used as the static chain register.

Richard suggested offline that we can just remove the *call_mem and 
*call_value_mem patterns as they
are of no use anymore and just cause us trouble such as this.  The midend does 
a good enough job of
figuring out it has to load the address to which we should branch.

So this patch does that. It's an entirely negative diffstat :)
For the failing testcase gcc.dg/cwsc1.c the bad code before this patch in the 
main function is:
mov     ip, r4
ldr     r3, .L6
ldr     ip, [r3]
mov     lr, pc
bx      ip

and with this patch it is:
ldr     r3, .L6
ldr     r3, [r3]
mov     ip, r4
mov     lr, pc
bx      r3

As you can see it's correct and no less efficient than before.

Bootstrapped and tested on arm-none-linux-gnueabihf and a test run with 
-mcpu=arm7tdmi didn't show any problems.

Ok for trunk?

Thanks,
Kyrill

2015-12-04  Kyrylo Tkachov  

PR target/68214
* config/arm/arm.md (*call_mem): Delete pattern.
(*call_value_mem): Likewise.
* config/arm/arm.c (output_call_mem): Delete.
* config/arm/arm-protos.h (output_call_mem): Delete prototype.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e4b8fb3feda74d60e7f6628bb51b9d6d6a431e54..e7328e79650739fca1c3e21b10c194feaa697465 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -132,7 +132,6 @@ extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
 extern void arm_emit_call_insn (rtx, rtx, bool);
 extern const char *output_call (rtx *);
-extern const char *output_call_mem (rtx *);
 void arm_emit_movpair (rtx, rtx);
 extern const char *output_mov_long_double_arm_from_arm (rtx *);
 extern const char *output_move_double (rtx *, bool, int *count);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c1cfe83d3f0716837fd3f59399a0f6a92f33c67c..f822a92d2684030c60271cfd74a81772b096e151 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -18057,41 +18057,6 @@ output_call (rtx *operands)
   return "";
 }
 
-/* Output a 'call' insn that is a reference in memory. This is
-   disabled for ARMv5 and we prefer a blx instead because otherwise
-   there's a significant performance overhead.  */
-const char *
-output_call_mem (rtx *operands)
-{
-  gcc_assert (!arm_arch5);
-  if (TARGET_INTERWORK)
-{
-  output_asm_insn ("ldr%?\t%|ip, %0", operands);
-  output_asm_insn ("mov%?\t%|lr, %|pc", operands);
-  output_asm_insn ("bx%?\t%|ip", operands);
-}
-  else if (regno_use_in (LR_REGNUM, operands[0]))
-{
-  /* LR is used in the memory address.  We load the address in the
-	 first instruction.  It's safe to use IP as the target of the
-	 load since the call will kill it anyway.  */
-  output_asm_insn ("ldr%?\t%|ip, %0", operands);
-  output_asm_insn ("mov%?\t%|lr, %|pc", operands);
-  if (arm_arch4t)
-	output_asm_insn ("bx%?\t%|ip", operands);
-  else
-	output_asm_insn ("mov%?\t%|pc, %|ip", operands);
-}
-  else
-{
-  output_asm_insn ("mov%?\t%|lr, %|pc", operands);
-  output_asm_insn ("ldr%?\t%|pc, %0", operands);
-}
-
-  return "";
-}
-
-
 /* Output a move from arm registers to arm registers of a long double
OPERANDS[0] is the destination.
OPERANDS[1] is the source.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2dcd2ccf6dbc476dedeaacdd5bb906a040d1617c..2b48bbaf034b286d723536ec2aa6fe0f9b312911 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -7709,23 +7709,6 @@ (define_insn "*call_reg_arm"
 )
 
 
-;; Note: not used for armv5+ because the sequence used (ldr pc, ...) is not
-;; considered a function call by the branch predictor of some cores (PR40887).
-;; Falls back to blx rN (*call_reg_armv5).
-
-(define_insn "*call_mem"
-  [(call (mem:SI (match_operand:SI 0 "call_memory_operand" "m"))
-	 (match_operand 1 "" ""))
-   (use (match_operand 2 "" ""))
-   (clobber (reg:SI LR_REGNUM))]
-  "TARGET_ARM && !arm_arch5 && !SIBLING_CALL_P (insn)"
-  "*
-  return output_call_mem (operands);
-  "
-  [(set_attr "length" "12")
-   (set_attr "type" "call")]
-)
-
 (define_expand "call_value"
   [(parallel [(set (match_operand   0 "" "")
 	   (call (match_operand 1 "memory_operand" "")
@@ -7789,23 +7772,6 @@ (define_insn "*call_value_reg_arm"
(set_attr "type" "call")]
 )
 
-;; Note: see *call_mem
-
-(define_insn "*call_value_mem"
-  [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:SI 1 "call_memory_operand" "m"))
-	  (match_operand 2 "" "")))
-   (use (match_operand 3 "" ""))
-   (clobber (reg:SI LR_REGNUM))]
-  "TARGET_ARM && !arm_arch5 && (!CONSTANT_ADDRESS_P (XEXP (operands[1], 0)))
-   && 

[DOC,PATCH] Mention clog10, clog10f and clog10l in Builtins section.

2015-12-04 Thread Martin Liška
Hello.

I noticed that the Builtins section of the documentation does not mention the
clog10{,f,l} functions.
I've tried to write a patch; however, I'm not sure how these functions should
be described.

Thanks,
Martin

From 2cb8dfd30ac14e1de00f3788a0b6e55c6a7fa8b9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 4 Dec 2015 10:35:52 +0100
Subject: [PATCH] Mention clog10{,f,l} in documentation (Builtins section)

gcc/ChangeLog:

2015-12-04  Martin Liska  

	* doc/extend.texi: Mention clog10, clog10f and clog10l
	in Builtins section.
---
 gcc/doc/extend.texi | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 63fce0f..e9d08f5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10505,6 +10505,10 @@ that are recognized in any mode since ISO C90 reserves these names for
 the purpose to which ISO C99 puts them.  All these functions have
 corresponding versions prefixed with @code{__builtin_}.
 
+There are also GNU extension functions @code{clog10}, @code{clog10f} and
+@code{clog10l}, whose names are reserved by ISO C99 for future use.
+All these functions have versions prefixed with @code{__builtin_}.
+
 The ISO C94 functions
 @code{iswalnum}, @code{iswalpha}, @code{iswcntrl}, @code{iswdigit},
 @code{iswgraph}, @code{iswlower}, @code{iswprint}, @code{iswpunct},
-- 
2.6.3
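
For reference, a small usage sketch of the functions being documented
(assuming glibc, where clog10/clog10f/clog10l are declared in <complex.h>
under _GNU_SOURCE; link with -lm):

#define _GNU_SOURCE
#include <complex.h>
#include <stdio.h>

int main (void)
{
  double complex z = 100.0 + 0.0 * I;
  double complex r = clog10 (z);            /* GNU extension in libm */
  double complex b = __builtin_clog10 (z);  /* GCC builtin form */
  printf ("%f %+fi\n", creal (r), cimag (r));  /* 2.000000 +0.000000i */
  printf ("%f %+fi\n", creal (b), cimag (b));
  return 0;
}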



PR c/68657 - Add missing 'Warning' flags to c-family/c.opt + java/lang.opt

2015-12-04 Thread Tobias Burnus
A few warning options lack the 'Warning' flag, which since r228094 (2015-09-24)
has the effect that -W(no-)error= doesn't work for them. Additionally,
--help=warnings doesn't work for them either.

Successfully bootstrapped with c,c++,fortran,lto,go,objc,obj-c++,java on
x86-64-gnu-linux & checked whether the issue in the PR was fixed.

I intend to commit the patch tomorrow as obvious unless someone has objections.

Tobias


Re: [PATCH] Use ECF_MAY_BE_ALLOCA for __builtin_alloca_with_align (PR tree-optimization/68680)

2015-12-04 Thread Jakub Jelinek
On Fri, Dec 04, 2015 at 10:30:38AM +0100, Richard Biener wrote:
> > The following patch has been bootstrapped/regtested on x86_64-linux and
> > i686-linux.
> 
> The patch is ok - it looks like you could have removed the
> __builtin_alloca strcmp with it though.

Ok, will remove the strcmp then.

> Does the patch mean we inlined __builtin_alloca_with_align ()
> functions?  We might run into the issue Eric fixed lately with

Yes, see the testcase below.  4.7+ inlines it.  As for tail call optimization,
it seems we are just lucky there (f4): the fab pass, which runs quite late,
turns the __builtin_stack_restore into a GIMPLE_NOP, and the tailc pass does not
ignore nops.  Shall I commit the following patch to trunk to fix that up
(after committing this VLA fix of course)?

int f1 (char *);

static inline void
f2 (int x)
{
  char a[x];
  f1 (a);
}

void
f3 (int x)
{
  f2 (x);
  f2 (x);
  f2 (x);
  f2 (x);
}

int
f4 (int x)
{
  char a[x];
  return f1 (a);
}

2015-12-04  Jakub Jelinek  

* tree-tailcall.c (find_tail_calls): Ignore GIMPLE_NOPs.

--- gcc/tree-tailcall.c.jj  2015-11-04 11:12:17.0 +0100
+++ gcc/tree-tailcall.c 2015-12-04 11:43:01.296110941 +0100
@@ -412,9 +412,10 @@ find_tail_calls (basic_block bb, struct
 {
   stmt = gsi_stmt (gsi);
 
-  /* Ignore labels, returns, clobbers and debug stmts.  */
+  /* Ignore labels, returns, nops, clobbers and debug stmts.  */
   if (gimple_code (stmt) == GIMPLE_LABEL
  || gimple_code (stmt) == GIMPLE_RETURN
+ || gimple_code (stmt) == GIMPLE_NOP
  || gimple_clobber_p (stmt)
  || is_gimple_debug (stmt))
continue;
@@ -532,7 +533,8 @@ find_tail_calls (basic_block bb, struct
 
   stmt = gsi_stmt (agsi);
 
-  if (gimple_code (stmt) == GIMPLE_LABEL)
+  if (gimple_code (stmt) == GIMPLE_LABEL
+ || gimple_code (stmt) == GIMPLE_NOP)
continue;
 
   if (gimple_code (stmt) == GIMPLE_RETURN)


Jakub


Re: [PATCH] Handle OBJ_TYPE_REF in FRE

2015-12-04 Thread Richard Biener
On Thu, 3 Dec 2015, Jan Hubicka wrote:

> > >may lead to wrong code.
> > 
> > Can you try generating a testcase?
> >  Because with equal vptr and voffset I can't see how that can happen 
> > unless some pass extracts information from the pointer types without 
> > sanity checking with the pointers and offsets.
> 
> I am not sure I can get a wrong code with current mainline, because for now 
> you
> only substitute for the lookup done for speculative devirt and if we wrongly
> predict the thing to be __builtin_unreachable, we dispatch to usual virtual
> call.  Once you get movement on calls it will be easier to do.
> 
> OBJ_TYPE_REF is a wrapper around OBJ_TYPE_EXPR adding three extra parameters:
>  - OBJ_TYPE_REF_OBJECT
>  - OBJ_TYPE_REF_TOKEN
>  - obj_type_ref_class which is computed from TREE_TYPE (obj_type_ref) itself.
> 
> While two OBJ_TYPE_REFs with equivalent OBJ_TYPE_EXPR are kind of the same
> expressions, they are optimized differently (just as if they were in different
> alias sets).  For that reason you need to match the type of obj_type_ref_class,
> because that one is not matched by useless_type_conversion (it is a pointer to
> method of the corresponding class type we are looking up).
> 
> The following testcase:
> struct foo {virtual void bar(void) __attribute__ ((const));};
> struct foobar {virtual void bar(void) __attribute__ ((const));};
> void
> dojob(void *ptr, int t)
> {
>   if (t)
>((struct foo*)ptr)->bar();
>   else
>((struct foobar*)ptr)->bar();
> }
> 
> produces
> void dojob(void*, int) (void * ptr, int t)
> {
>   int (*__vtbl_ptr_type) () * _5;
>   int (*__vtbl_ptr_type) () _6;
>   int (*__vtbl_ptr_type) () * _8;
>   int (*__vtbl_ptr_type) () _9;
> 
>   :
>   if (t_2(D) != 0)
> goto ;
>   else
> goto ;
> 
>   :
>   _5 = MEM[(struct foo *)ptr_4(D)]._vptr.foo;
>   _6 = *_5;
>   OBJ_TYPE_REF(_6;(struct foo)ptr_4(D)->0) (ptr_4(D));
>   goto ;
> 
>   :
>   _8 = MEM[(struct foobar *)ptr_4(D)]._vptr.foobar;
>   _9 = *_8;
>   OBJ_TYPE_REF(_9;(struct foobar)ptr_4(D)->0) (ptr_4(D));
> 
>   :
>   return;
> 
> }
> 
> Now I would need to get some code movement done to get _5 and _6
> moved and unified with _8 and _9 that we currently don't do.  
> Still would feel safer if the equivalence predicate also checked
> that the type is the same.

Indeed we don't do code hoisting yet.  Maybe one could trick PPRE
into doing it.

Note that for OBJ_TYPE_REFs in calls you probably should better use
gimple_call_fntype instead of the type of the OBJ_TYPE_REF anyway
(well, fntype will be the method-type, not pointer-to-method-type).

Not sure if you need OBJ_TYPE_REFs type in non-call contexts?

> > >Or do you just substitute the operands of OBJ_TYPE_REF? 
> > 
> > No, I value number them.  But yes, the type issue also crossed my 
> > mind.  Meanwhile testing revealed that I need to adjust 
> > gimple_expr_type to preserve the type of the obj-type-ref, otherwise 
> > the devirt machinery ICEs (receiving void *). That's also a reason we 
> > can't make obj-type-ref a ternary RHS.
> 
> Yep, type of OBJ_TYPE_REF matters...

See above.

> > 
> > >> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > >> 
> > >> Note that this does not (yet) substitute OBJ_TYPE_REFs in calls
> > >> with SSA names that have the same value - not sure if that would
> > >> be desired generally (does the devirt machinery cope with that?).
> > >
> > >This should work fine.
> > 
> > OK. So with that substituting the direct call later should work as well.
> Great!

For the above reasons I'm defering all this to stage1.

Below is the patch that actually passed bootstrap & regtest on 
x86_64-unknown-linux-gnu, just in case you want to play with it.
It doesn't do the propagation into calls yet though, the following
does (untested)

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 231244)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4334,6 +4334,22 @@ eliminate_dom_walker::before_dom_childre
  maybe_remove_unused_call_args (cfun, call_stmt);
  gimple_set_modified (stmt, true);
}
+
+ else
+   {
+ /* Lookup the OBJ_TYPE_REF.  */
+ tree sprime
+   = vn_nary_op_lookup_pieces (3, OBJ_TYPE_REF,
+   TREE_TYPE (fn),
+   &TREE_OPERAND (fn, 0), NULL);
+ if (sprime)
+   sprime = eliminate_avail (sprime);
+ if (sprime)
+   {
+ gimple_call_set_fn (call_stmt, sprime);
+ gimple_set_modified (stmt, true);
+   }
+   }
}
}
 
but it ICEs because we decided (tree-cfg.c, verify_gimple_call):

  if (fn
  && (!POINTER_TYPE_P (TREE_TYPE (fn))
  || (TREE_CODE (TREE_TYPE (TREE_TYPE (fn))) != FUNCTION_TYPE

RE: Fix 61441 [ 1/5] Add REAL_VALUE_ISSIGNALING_NAN

2015-12-04 Thread Saraswati, Sujoy (OSTL)
Hi,

> If you haven't set up write-access to the repository, please go ahead and get
> that process started:
> 
> https://www.gnu.org/software/gcc/svnwrite.html
> 
> You can list me as your sponsor on the form.
> 
> Once your account is set up, you can commit patches which have been
> approved.
> 
> I'll go ahead and approve #1, #2 and #4.  Richi has approved #3.

Thank you. I have filled the form for 
https://sourceware.org/cgi-bin/pdw/ps_form.cgi and mentioned you as the 
sponsor. I will commit the changes once the sourceware.org account request 
comes through.

> I'm still looking at #5.

I received your comments on this. I will correct the spelling mistakes as well 
as the space-tab usage and post it.

Regards,
Sujoy

> Jeff



Re: [PATCH 1/2] rs6000: Implement cstore for signed Pmode register compares

2015-12-04 Thread David Edelsohn
On Fri, Dec 4, 2015 at 9:34 AM, Segher Boessenkool
 wrote:
> This implements cstore for the last case we do not yet handle, using
> the superopt algo from the venerable CWG.  The only integer cases we
> do still not handle after this are for -m32 -mpowerpc64.  Those cases
> now generate a branch sequence, which is better than what we had
> before.
>
> Tested on powerpc64-linux; okay for mainline?
>
>
> Segher
>
>
> 2015-12-04  Segher Boessenkool  
>
> * (cstore4_signed): New expander.
> (cstore4): Call it.  FAIL instead of calling rs6000_emit_sCOND.

The new expander is okay as well as calling it.

Please do not remove the fallback to sCOND without performance testing.

Thanks, David


Re: [PATCH][GCC] Make stackalign test LTO proof

2015-12-04 Thread Andre Vieira

On 17/11/15 16:30, Andre Vieira wrote:

On 17/11/15 12:29, Bernd Schmidt wrote:

On 11/16/2015 04:48 PM, Andre Vieira wrote:

On 16/11/15 15:34, Joern Wolfgang Rennecke wrote:

I just happened to stumble on this problem with another port.
The volatile & test solution doesn't work, though.

What does work, however, is:

__asm__ ("" : : "" (dummy));


I can confirm that Joern's solution works for me too.


Ok to make that change.


Bernd


OK, Joern will you submit a patch for this or shall I?

Cheers,
Andre


Hi,

Reworked following Joern's suggestion.

Is this OK?

Cheers,
Andre

gcc/testsuite/ChangeLog:
2015-12-04  Andre Vieira  
Joern Rennecke  

* gcc.dg/torture/stackalign/builtin-return-1.c: Add an
  inline assembly read to make sure dummy is not optimized
  away by LTO.
From 9cee0251e69c1061a60caaf28da598eab2a1fbee Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Tue, 24 Nov 2015 13:50:15 +
Subject: [PATCH] Fixed test to be LTO proof.

---
 gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
index af017532aeb3878ef7ad717a2743661a87a56b7d..ec4fd8a9ef33a5e755bdb33e4faa41cab0f16a60 100644
--- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-1.c
@@ -26,15 +26,13 @@ int bar(int n)
    STACK_ARGUMENTS_SIZE));
 }
 
-char *g;
-
 int main(void)
 {
   /* Allocate 64 bytes on the stack to make sure that __builtin_apply
  can read at least 64 bytes above the return address.  */
   char dummy[64];
 
-  g = dummy;
+  __asm__ ("" : : "" (dummy));
 
   if (bar(1) != 2)
 abort();
-- 
1.9.1



Re: [PATCH][AArch64] Don't allow -mgeneral-regs-only to change the .arch assembler directives

2015-12-04 Thread Marcus Shawcroft

On 04/12/15 14:40, Kyrill Tkachov wrote:

Ping.
This almost fell through the cracks.
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00055.html

Thanks,
Kyrill

On 01/10/15 14:00, Kyrill Tkachov wrote:

Hi all,

As part of the SWITCHABLE_TARGET work I inadvertently changed the
behaviour of -mgeneral-regs-only with respect to the .arch directives
that we emit.
The behaviour of -mgeneral-regs-only in GCC 5 and earlier is such that
it disallows the usage of FP/SIMD registers but does *not* stop the
compiler from
emitting the +fp,+simd etc extensions in the .arch directive of the
generated assembly. This is to accommodate users who may want to write
inline assembly
in a file compiled with -mgeneral-regs-only.
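
A small sketch of the kind of code this accommodates (my example, not from
the patch): the C code itself uses no FP/SIMD registers, but the inline
assembly still needs the assembler to accept SIMD instructions, which is
what the unchanged .arch directive provides:

/* Compile with: gcc -mgeneral-regs-only -S test.c  */
void
clear_v0 (void)
{
  /* The compiler never allocates SIMD registers here, yet the assembler
     must still understand this instruction.  */
  __asm__ volatile ("movi v0.4s, #0");
}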

This patch restores the trunk behaviour in that respect to that of GCC
5 and the documentation for the option is tweaked a bit to reflect that.
Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-01  Kyrylo Tkachov  

  * config/aarch64/aarch64.c (aarch64_override_options_internal):
  Do not alter target_flags due to TARGET_GENERAL_REGS_ONLY_P.
  * doc/invoke.texi (AArch64 options): Mention that
-mgeneral-regs-only
  does not affect the assembler directives.

2015-10-01  Kyrylo Tkachov  

  * gcc.target/aarch64/mgeneral-regs_4.c: New test.




OK /M



Re: [PATCH] Avoid false vector mask conversion

2015-12-04 Thread Ilya Enkovich
On 02 Dec 16:27, Richard Biener wrote:
> On Wed, Dec 2, 2015 at 4:24 PM, Ilya Enkovich  wrote:
> >
> > The problem is that conversion is supposed to be handled by
> > vectorizable_conversion,
> > but it fails to because it is not actually a conversion. I suppose it
> > may be handled
> > in vectorizable_assignment but I chose this pattern because it's meant
> > to handle mask
> > conversion issues.
> 
> I think it's always better to avoid patterns if you can.
> 
> Richard.
> 

Here is a variant with vectorizable_assignment change.  Bootstrapped and 
regtested on x86_64-unknown-linux-gnu.  Does it look better?

Thanks,
Ilya
--
gcc/

2015-12-04  Ilya Enkovich  

* tree-vect-stmts.c (vectorizable_assignment): Support
useless boolean conversion.


diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3b078da..2cdbb04 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4229,7 +4229,12 @@ vectorizable_assignment (gimple *stmt, 
gimple_stmt_iterator *gsi,
   /* But a conversion that does not change the bit-pattern is ok.  */
   && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
> TYPE_PRECISION (TREE_TYPE (op)))
-  && TYPE_UNSIGNED (TREE_TYPE (op
+  && TYPE_UNSIGNED (TREE_TYPE (op)))
+  /* Conversion between boolean types of different sizes is
+a simple assignment in case their vectypes are the same
+boolean vectors.  */
+  && (!VECTOR_BOOLEAN_TYPE_P (vectype)
+ || !VECTOR_BOOLEAN_TYPE_P (vectype_in)))
 {
   if (dump_enabled_p ())
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,


Re: [PATCH][1/2] Fix PR68553

2015-12-04 Thread Alan Lawrence

On 27/11/15 08:30, Richard Biener wrote:


This is part 1 of a fix for PR68553 which shows that some targets
cannot can_vec_perm_p on an identity permutation.  I chose to fix
this in the vectorizer by detecting the identity itself but with
the current structure of vect_transform_slp_perm_load this is
somewhat awkward.  Thus the following no-op patch simplifies it
greatly (from the times it was restricted to do interleaving-kind
of permutes).  It turned out to not be 100% no-op as we now can
handle non-adjacent source operands so I split it out from the
actual fix.

The two adjusted testcases no longer fail to vectorize because
of "need three vectors" but unadjusted would fail because there
are simply not enough scalar iterations in the loop.  I adjusted
that and now we vectorize it just fine (running into PR68559
which I filed).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-27  Richard Biener  

PR tree-optimization/68553
* tree-vect-slp.c (vect_get_mask_element): Remove.
(vect_transform_slp_perm_load): Implement in a simpler way.

* gcc.dg/vect/pr45752.c: Adjust.
* gcc.dg/vect/slp-perm-4.c: Likewise.


On aarch64 and ARM targets, this causes

PASS->FAIL: gcc.dg/vect/O3-pr36098.c scan-tree-dump-times vect "vectorizing 
stmts using SLP" 0


That is, we now vectorize using SLP, when previously we did not.

On aarch64 (and I expect ARM too), previously we used a VEC_LOAD_LANES, without 
unrolling, but now we unroll * 4, and vectorize using 3 loads and permutes:


../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt: 
vect__31.15_94 = VEC_PERM_EXPR ;
../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt: 
vect__31.16_95 = VEC_PERM_EXPR ;
../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt: 
vect__31.17_96 = VEC_PERM_EXPR 


which *is* a valid vectorization strategy...


--Alan



Re: [PATCH 07/10] Fix g++.dg/template/ref3.C

2015-12-04 Thread Jason Merrill

On 12/03/2015 05:08 PM, David Malcolm wrote:

On Thu, 2015-12-03 at 15:38 -0500, Jason Merrill wrote:

On 12/03/2015 09:55 AM, David Malcolm wrote:

Testcase g++.dg/template/ref3.C:

   1// PR c++/28341
   2
   3template struct A {};
   4
   5template struct B
   6{
   7  A<(T)0> b; // { dg-error "constant|not a valid" }
   8  A a; // { dg-error "constant|not a valid" }
   9};
  10
  11B b;

The output of this test for both c++11 and c++14 is unaffected
by the patch kit:
   g++.dg/template/ref3.C: In instantiation of 'struct B':
   g++.dg/template/ref3.C:11:15:   required from here
   g++.dg/template/ref3.C:7:11: error: '0' is not a valid template argument for 
type 'const int&' because it is not an lvalue
   g++.dg/template/ref3.C:8:11: error: '0' is not a valid template argument for 
type 'const int&' because it is not an lvalue

However, the c++98 output is changed:

Status quo for c++98:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:11: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression
g++.dg/template/ref3.C:8:11: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression

(line 7 and 8 are at the closing semicolon for fields b and a)

With the patchkit for c++98:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:5: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression
g++.dg/template/ref3.C:7:5: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression

So the 2nd:
"error: a cast to a type other than an integral or enumeration type cannot 
appear in a constant-expression"
moves from line 8 to line 7 (and moves them to earlier, having ranges)

What's happening is that cp_parser_enclosed_template_argument_list
builds a CAST_EXPR, the first time from cp_parser_cast_expression,
the second time from cp_parser_functional_cast; these have locations
representing the correct respective caret, i.e.:

 A<(T)0> b;
   ^~~~

and:

 A a;
   ^~~~

Eventually finish_template_type is called for each, to build a RECORD_TYPE,
and we get a cache hit the 2nd time through here in pt.c:
8281  hash = spec_hasher::hash ();
8282  entry = type_specializations->find_with_hash (, hash);
8283
8284  if (entry)
8285return entry->spec;

due to:
template_args_equal (ot=, nt=) at ../../src/gcc/cp/pt.c:7778
which calls:
cp_tree_equal (t1=, t2=) at ../../src/gcc/cp/tree.c:2833
and returns equality.

Hence we get a single RECORD_TYPE for the type A<(T)(0)>, and hence
when issuing the errors it uses the TREE_VEC for the first one,
using the location of the first line.


Why does the type sharing affect where the parser gives the error?


I believe what's happening is that the patchkit is setting location_t
values for more expressions than before, including the expression for
the template param.  pt.c:tsubst_expr has this:

   if (EXPR_HAS_LOCATION (t))
 input_location = EXPR_LOCATION (t);

I believe that before (in the status quo), the substituted types didn't
have location_t values, and hence the above conditional didn't fire;
input_location was coming from a *token* where the expansion happened,
hence we got an error message on the relevant line for each expansion.

With the patch, the substituted types have location_t values within
their params, hence the conditional above fires: input_location is
updated to use the EXPR_LOCATION, which comes from that of the param
within the type - but with type-sharing it's using the first place where
the type is created.

Perhaps a better fix is for cp_parser_non_integral_constant_expression
to take a location_t, rather than have it rely on input_location?


Ah, I see, the error is coming from tsubst_copy_and_build, not 
cp_parser_non_integral_constant_expression.  So indeed this is an effect 
of the canonicalization of template instances, and we aren't going to 
fix it in the context of this patchset.  But this is still a bug, so I'd 
rather have an xfail and a PR than change the expected output.


Jason


I'm not sure what the ideal fix for this is; for now I've worked
around it by updating the dg directives to reflect the new output.

gcc/testsuite/ChangeLog:
* g++.dg/template/ref3.C: Update locations of dg directives.
---
   gcc/testsuite/g++.dg/template/ref3.C | 6 --
   1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/template/ref3.C 
b/gcc/testsuite/g++.dg/template/ref3.C
index 976c093..6e568c3 100644
--- a/gcc/testsuite/g++.dg/template/ref3.C
+++ b/gcc/testsuite/g++.dg/template/ref3.C
@@ -4,8 

Re: [PATCH] S/390: Add -mbackchain options to fix test failure.

2015-12-04 Thread Andreas Krebbel
On 12/04/2015 02:23 AM, Dominik Vogt wrote:
gcc/testsuite/ChangeLog

* gcc.dg/Wframe-address.c: S/390 requires the -mbackchain option to
access arbitrary stack frames.
* gcc.dg/Wno-frame-address.c: Likewise.

Applied. Thanks!

-Andreas-



[PATCH 1/2] rs6000: Implement cstore for signed Pmode register compares

2015-12-04 Thread Segher Boessenkool
This implements cstore for the last case we do not yet handle, using
the superopt algo from the venerable CWG.  The only integer cases we
do still not handle after this are for -m32 -mpowerpc64.  Those cases
now generate a branch sequence, which is better than what we had
before.

Tested on powerpc64-linux; okay for mainline?


Segher


2015-12-04  Segher Boessenkool  

* (cstore4_signed): New expander.
(cstore4): Call it.  FAIL instead of calling rs6000_emit_sCOND.

---
 gcc/config/rs6000/rs6000.md | 50 +++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 26b0962..98abdb2 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10525,6 +10525,47 @@ (define_expand "cbranch4"
   DONE;
 }")
 
+(define_expand "cstore4_signed"
+  [(use (match_operator 1 "signed_comparison_operator"
+ [(match_operand:P 2 "gpc_reg_operand")
+  (match_operand:P 3 "gpc_reg_operand")]))
+   (clobber (match_operand:P 0 "gpc_reg_operand"))]
+  ""
+{
+  enum rtx_code cond_code = GET_CODE (operands[1]);
+
+  rtx op0 = operands[0];
+  rtx op1 = operands[2];
+  rtx op2 = operands[3];
+
+  if (cond_code == GE || cond_code == LT)
+{
+  cond_code = swap_condition (cond_code);
+  std::swap (op1, op2);
+}
+
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx tmp3 = gen_reg_rtx (mode);
+
+  int sh = GET_MODE_BITSIZE (mode) - 1;
+  emit_insn (gen_lshr3 (tmp1, op1, GEN_INT (sh)));
+  emit_insn (gen_ashr3 (tmp2, op2, GEN_INT (sh)));
+
+  emit_insn (gen_subf3_carry (tmp3, op1, op2));
+
+  if (cond_code == LE)
+emit_insn (gen_add3_carry_in (op0, tmp1, tmp2));
+  else
+{
+  rtx tmp4 = gen_reg_rtx (mode);
+  emit_insn (gen_add3_carry_in (tmp4, tmp1, tmp2));
+  emit_insn (gen_xor3 (op0, tmp4, const1_rtx));
+}
+
+  DONE;
+})
+
 (define_expand "cstore4_unsigned"
   [(use (match_operator 1 "unsigned_comparison_operator"
  [(match_operand:P 2 "gpc_reg_operand" "")
@@ -10751,9 +10792,14 @@ (define_expand "cstore4"
 emit_insn (gen_cstore4_unsigned_imm (operands[0], operands[1],
   operands[2], operands[3]));
 
-  /* Everything else, use the mfcr brute force.  */
+  /* We also do not want to use mfcr for signed comparisons.  */
+  else if (mode == Pmode
+  && signed_comparison_operator (operands[1], VOIDmode))
+emit_insn (gen_cstore4_signed (operands[0], operands[1],
+operands[2], operands[3]));
+
   else
-rs6000_emit_sCOND (mode, operands);
+FAIL;
 
   DONE;
 })
-- 
1.9.3



Re: [PATCH PR68542]

2015-12-04 Thread Yuri Rumyantsev
Hi Richard.

Thanks a lot for your review.
Below are my answers.

You asked why I inserted additional check to
++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
tree_code code, tree type,

   gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);

+  /* Do not perform combining if types are not compatible.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+  && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0
+return NULL_TREE;
+

again, how does this happen?

This is because without it I hit an assert in fold_convert_loc:
  gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
 && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));

since it tries to convert vector of bool to scalar bool.
Here is essential part of call-stack:

#0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
at ../../gcc/diagnostic.c:1259
#1  0x01743ada in fancy_abort (
file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
function=0x184b9d0  "fold_convert_loc") at
../../gcc/diagnostic.c:1332
#2  0x009c8330 in fold_convert_loc (loc=0, type=0x718a9d20,
arg=0x71a7f488) at ../../gcc/fold-const.c:2216
#3  0x009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
op2=0x718c2030) at ../../gcc/fold-const.c:11453
#4  0x009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
op2=0x718c2030) at ../../gcc/fold-const.c:12394
#5  0x009d870c in fold_binary_op_with_conditional_arg (loc=0,
code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
op1=0x71a48780, cond=0x71a7f460, arg=0x71a48780,
cond_first_p=1) at ../../gcc/fold-const.c:6465
#6  0x009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
type=0x718a9d20, op0=0x71a7f460, op1=0x71a48780)
at ../../gcc/fold-const.c:9211
#7  0x00ecb8fa in combine_cond_expr_cond (stmt=0x71a487d0,
code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
op1=0x71a48780, invariant_only=true)
at ../../gcc/tree-ssa-forwprop.c:382


Secondly, I did not catch your idea to implement the GCC vector extension
for vector comparisons with a bool result, since such an extension completely
depends on the comparison context.  E.g. in your example the result type of
the comparison depends on how it is used: for an if-comparison it is scalar,
but for c = (a==b) the result type is a vector.  I don't think that this is
reasonable for the current release.
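
A minimal sketch of the context dependence being described, using the GCC
vector extension (my illustration, not part of the patch):

typedef int v4si __attribute__ ((vector_size (16)));

int
main (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 1, 0, 3, 0 };

  /* Element-wise form: each lane of m is 0 or -1, so the result type is
     again a vector - this is the existing vector extension semantics.  */
  v4si m = (a == b);

  /* The scalar-result form (EQ_EXPR/NE_EXPR on two whole vectors yielding
     a single boolean) cannot be spelled directly in C source; it is what
     the vectorizer emits internally, e.g. "if (mask != { 0, 0, 0, 0 })",
     and is what the verify_gimple_comparison change quoted below permits.  */
  return (m[0] == -1 && m[1] == 0) ? 0 : 1;
}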

And finally, about AMD performance: I checked that this transformation
works with the "-march=bdver4" option, so the regression for 481.wrf should
disappear too.

Thanks.
Yuri.

2015-12-04 15:18 GMT+03:00 Richard Biener :
> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> Here is a patch for the 481.wrf performance regression for avx2, which is
>> a slightly modified mask store optimization.  This transformation allows
>> us to perform unpredication for a semi-hammock containing masked stores;
>> in other words, if we have a loop like
>> for (i=0; i<n; i++)
>>   if (c[i]) {
>> p1[i] += 1;
>> p2[i] = p3[i] +2;
>>   }
>>
>> then it will be transformed to
>>if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>  vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>>  vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>  MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>>  vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>>  vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>  MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>>}
>> i.e. it will put all computations related to masked stores to semi-hammock.
>>
>> Bootstrapping and regression testing did not show any new failures.
>
> Can you please split out the middle-end support for vector equality compares?
>
> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree 
> op1)
>if (TREE_CODE (op0_type) == VECTOR_TYPE
>   || TREE_CODE (op1_type) == VECTOR_TYPE)
>  {
> -  error ("vector comparison returning a boolean");
> -  debug_generic_expr (op0_type);
> -  debug_generic_expr (op1_type);
> -  return true;
> + /* Allow vector comparison returning boolean if operand types
> +are equal and CODE is EQ/NE.  */
> + if ((code != EQ_EXPR && code != NE_EXPR)
> + || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
> +  || VECTOR_INTEGER_TYPE_P (op0_type)))
> +   {
> + error ("type mismatch for vector comparison returning a 
> boolean");
> + debug_generic_expr (op0_type);
> + debug_generic_expr (op1_type);
> +

Re: [PATCH][1/2] Fix PR68553

2015-12-04 Thread Richard Biener
On December 4, 2015 4:32:33 PM GMT+01:00, Alan Lawrence  
wrote:
>On 27/11/15 08:30, Richard Biener wrote:
>>
>> This is part 1 of a fix for PR68553 which shows that some targets
>> cannot can_vec_perm_p on an identity permutation.  I chose to fix
>> this in the vectorizer by detecting the identity itself but with
>> the current structure of vect_transform_slp_perm_load this is
>> somewhat awkward.  Thus the following no-op patch simplifies it
>> greatly (from the times it was restricted to do interleaving-kind
>> of permutes).  It turned out to not be 100% no-op as we now can
>> handle non-adjacent source operands so I split it out from the
>> actual fix.
>>
>> The two adjusted testcases no longer fail to vectorize because
>> of "need three vectors" but unadjusted would fail because there
>> are simply not enough scalar iterations in the loop.  I adjusted
>> that and now we vectorize it just fine (running into PR68559
>> which I filed).
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>
>> Richard.
>>
>> 2015-11-27  Richard Biener  
>>
>>  PR tree-optimization/68553
>>  * tree-vect-slp.c (vect_get_mask_element): Remove.
>>  (vect_transform_slp_perm_load): Implement in a simpler way.
>>
>>  * gcc.dg/vect/pr45752.c: Adjust.
>>  * gcc.dg/vect/slp-perm-4.c: Likewise.
>
>On aarch64 and ARM targets, this causes
>
>PASS->FAIL: gcc.dg/vect/O3-pr36098.c scan-tree-dump-times vect
>"vectorizing 
>stmts using SLP" 0
>
>That is, we now vectorize using SLP, when previously we did not.
>
>On aarch64 (and I expect ARM too), previously we used a VEC_LOAD_LANES,
>without unrolling, but now we unroll * 4, and vectorize using 3 loads and
>permutes:

Happens on x86_64 as well with at least SSE4.1.  Unfortunately we'll have to 
start introducing much more fine-grained target-supports for vect_perm to 
reliably guard all targets.

Richard.

>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>
>vect__31.15_94 = VEC_PERM_EXPR 2, 4 }>;
>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>
>vect__31.16_95 = VEC_PERM_EXPR 4, 5 }>;
>../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>
>vect__31.17_96 = VEC_PERM_EXPR 5, 6 }>
>
>which *is* a valid vectorization strategy...
>
>
>--Alan




[PATCH, PR67627][RFC] broken libatomic multilib parallel build

2015-12-04 Thread Szabolcs Nagy

As described in pr other/67627, the all-multi target can be
built in parallel with the %_.lo targets which generate make
dependencies that are parsed during the build of all-multi.

gcc -MD does not generate the makefile dependencies in an
atomic way so make can fail if it concurrently parses those
half-written files.
(not observed on x86, but happens on arm native builds.)

This workaround forces all-multi to only run after the *_.lo
targets are done, but there might be a better solution using
automake properly. (automake should know about the generated
make dependency files that are included into the makefile so
no manual tinkering is needed to get the right build order,
but I don't know how to do that.)

2015-12-04  Szabolcs Nagy  

PR other/67627
* Makefile.am (all-multi): Add dependency.
* Makefile.in: Regenerate.
diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index bd0ab29..38c635f 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -139,3 +139,10 @@ endif
 
 libatomic_convenience_la_SOURCES = $(libatomic_la_SOURCES)
 libatomic_convenience_la_LIBADD = $(libatomic_la_LIBADD)
+
+# Override the automake generated all-multi rule to guarantee that all-multi
+# is not run in parallel with the %_.lo rules which generate $(DEPDIR)/*.Ppo
+# makefile fragments to avoid broken *.Ppo getting included into the Makefile
+# when it is reloaded during the build of all-multi.
+all-multi: $(libatomic_la_LIBADD)
+	$(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index b696d55..a083d87 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -496,12 +496,6 @@ clean-libtool:
 
 distclean-libtool:
 	-rm -f libtool config.lt
-
-# GNU Make needs to see an explicit $(MAKE) variable in the command it
-# runs to enable its job server during parallel builds.  Hence the
-# comments below.
-all-multi:
-	$(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
 install-multi:
 	$(MULTIDO) $(AM_MAKEFLAGS) DO=install multi-do # $(MAKE)
 
@@ -800,6 +794,13 @@ vpath % $(strip $(search_path))
 %_.lo: Makefile
 	$(LTCOMPILE) $(M_DEPS) $(M_SIZE) $(M_IFUNC) -c -o $@ $(M_SRC)
 
+# Override the automake generated all-multi rule to guarantee that all-multi
+# is not run in parallel with the %_.lo rules which generate $(DEPDIR)/*.Ppo
+# makefile fragments to avoid broken *.Ppo getting included into the Makefile
+# when it is reloaded during the build of all-multi.
+all-multi: $(libatomic_la_LIBADD)
+	$(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
+
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.
 .NOEXPORT:


[Patch, Fortran] PR45859 - Permit array elements to coarray dummy arguments

2015-12-04 Thread Tobias Burnus
This patch permits

   interface
  subroutine sub (x)
 real x(10)[*]
  end subroutine
   end interface
   real :: x(100)[*]
   call sub (x(10))
   end

where one passes an array element ("x(10)") of a contiguous array to a
coarray dummy argument. That's permitted per interpretation request
F08/0048, which ended up in Fortran 2008's Corrigendum 2 - and is also
in the current Fortran 2015 drafts:

"If the dummy argument is an array coarray that has the CONTIGUOUS attribute
 or is not of assumed shape, the corresponding actual argument shall be
 simply contiguous or an element of a simply contiguous array."

the "or ..." of the last line was added in the corrigendum.


I hope and think that I got the true/false of the other users correct - in
most cases, it probably doesn't matter as the caller is only reached for
expr->rank > 0.

Build and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias
gcc/fortran
	PR fortran/45859
	* expr.c (gfc_is_simply_contiguous): Optionally permit array elements.
	(gfc_check_pointer_assign): Update call.
	* interface.c (compare_parameter): Ditto.
	* trans-array.c (gfc_conv_array_parameter): Ditto.
	* trans-intrinsic.c (gfc_conv_intrinsic_transfer,
	conv_isocbinding_function): Ditto.
	* gfortran.h (gfc_is_simply_contiguous):

gcc/testsuite/
	PR fortran/45859
	* gfortran.dg/coarray_argument_1.f90: New.

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 2aeb0b5..5dd90ef 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -3683,7 +3683,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue)
 	 and F2008 must be allowed.  */
   if (rvalue->rank != 1)
 	{
-	  if (!gfc_is_simply_contiguous (rvalue, true))
+	  if (!gfc_is_simply_contiguous (rvalue, true, false))
 	{
 	  gfc_error ("Rank remapping target must be rank 1 or"
 			 " simply contiguous at %L", >where);
@@ -4601,7 +4601,7 @@ gfc_has_ultimate_pointer (gfc_expr *e)
a "(::1)" is accepted.  */
 
 bool
-gfc_is_simply_contiguous (gfc_expr *expr, bool strict)
+gfc_is_simply_contiguous (gfc_expr *expr, bool strict, bool permit_element)
 {
   bool colon;
   int i;
@@ -4615,7 +4615,7 @@ gfc_is_simply_contiguous (gfc_expr *expr, bool strict)
   else if (expr->expr_type != EXPR_VARIABLE)
 return false;
 
-  if (expr->rank == 0)
+  if (!permit_element && expr->rank == 0)
 return false;
 
   for (ref = expr->ref; ref; ref = ref->next)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 9f61e45..d203c32 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2982,7 +2982,7 @@ void gfc_free_actual_arglist (gfc_actual_arglist *);
 gfc_actual_arglist *gfc_copy_actual_arglist (gfc_actual_arglist *);
 const char *gfc_extract_int (gfc_expr *, int *);
 bool is_subref_array (gfc_expr *);
-bool gfc_is_simply_contiguous (gfc_expr *, bool);
+bool gfc_is_simply_contiguous (gfc_expr *, bool, bool);
 bool gfc_check_init_expr (gfc_expr *);
 
 gfc_expr *gfc_build_conversion (gfc_expr *);
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index f74239d..bfd5d36 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -2020,7 +2020,7 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
 
   /* F2008, C1241.  */
   if (formal->attr.pointer && formal->attr.contiguous
-  && !gfc_is_simply_contiguous (actual, true))
+  && !gfc_is_simply_contiguous (actual, true, false))
 {
   if (where)
 	gfc_error ("Actual argument to contiguous pointer dummy %qs at %L "
@@ -2131,15 +2131,17 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
 
   if (formal->attr.codimension)
 {
-  /* F2008, 12.5.2.8.  */
+  /* F2008, 12.5.2.8 + Corrig 2 (IR F08/0048).  */
+  /* F2015, 12.5.2.8.  */
   if (formal->attr.dimension
 	  && (formal->attr.contiguous || formal->as->type != AS_ASSUMED_SHAPE)
 	  && gfc_expr_attr (actual).dimension
-	  && !gfc_is_simply_contiguous (actual, true))
+	  && !gfc_is_simply_contiguous (actual, true, true))
 	{
 	  if (where)
 	gfc_error ("Actual argument to %qs at %L must be simply "
-		   "contiguous", formal->name, >where);
+		   "contiguous or an element of such an array",
+		   formal->name, >where);
 	  return 0;
 	}
 
@@ -2179,7 +2181,8 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
   && (actual->symtree->n.sym->attr.asynchronous
  || actual->symtree->n.sym->attr.volatile_)
   &&  (formal->attr.asynchronous || formal->attr.volatile_)
-  && actual->rank && formal->as && !gfc_is_simply_contiguous (actual, true)
+  && actual->rank && formal->as
+  && !gfc_is_simply_contiguous (actual, true, false)
   && ((formal->as->type != AS_ASSUMED_SHAPE
 	   && formal->as->type != AS_ASSUMED_RANK && !formal->attr.pointer)
 	  || formal->attr.contiguous))
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 69f6e19..6e24e2e 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7386,7 +7386,7 @@ 

PING^2: [PATCH] X86: Optimize access to globals in PIE with copy reloc

2015-12-04 Thread H.J. Lu
PING.


-- Forwarded message --
From: H.J. Lu 
Date: Mon, Oct 19, 2015 at 1:04 PM
Subject: PING: [PATCH] X86: Optimize access to globals in PIE with copy reloc
To: GCC Patches , Richard Biener
, Jakub Jelinek ,
Richard Henderson 


PING.


-- Forwarded message --
From: H.J. Lu 
Date: Wed, Jul 1, 2015 at 5:11 AM
Subject: [PATCH] X86: Optimize access to globals in PIE with copy reloc
To: gcc-patches@gcc.gnu.org


Normally, with PIE, GCC accesses globals that are extern to the module
using the GOT.  This takes two instructions: one to get the address of the
global from the GOT and the other to get the value.  Examples:

---
extern int a_glob;
int
main ()
{
  return a_glob;
}
---

With PIE, the generated code accesses the global via the GOT using two memory
loads:

movq    a_glob@GOTPCREL(%rip), %rax
movl    (%rax), %eax

for 64-bit or

movl    a_glob@GOT(%ecx), %eax
movl    (%eax), %eax

for 32-bit.

Some experiments on Google and SPEC CPU benchmarks show that the extra
instruction affects performance by 1% to 5%.

Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that
the global will be defined in the executable.  For globals that are
truly extern (come from shared objects), the linker will create copy
relocations and have them defined in the executable.  The result is that
no global access needs to go through the GOT, which improves performance.
We can generate

movl    a_glob(%rip), %eax

for 64-bit and

movl    a_glob@GOTOFF(%eax), %eax

for 32-bit.  This optimization only applies to undefined non-weak
non-TLS global data.  Undefined weak global or TLS data access still
must go through GOT.
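
A hedged illustration of that distinction (my own example, not taken from
the patch):

/* Build as part of a PIE, e.g.: gcc -O2 -fpie -c globals.c  */
extern int a_glob;                        /* non-weak: can use a copy reloc,
                                             so direct PC-relative access */
extern int w_glob __attribute__ ((weak)); /* weak: may stay undefined, so it
                                             must still go through the GOT */

int
sum (void)
{
  int s = a_glob;
  if (&w_glob)
    s += w_glob;
  return s;
}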

This patch reverts legitimate_pic_address_disp_p change made in revision
218397, which only applies to x86-64.  Instead, this patch updates
targetm.binds_local_p to indicate if undefined non-weak non-TLS global
data is defined locally in PIE.  It also introduces a new target hook,
binds_tls_local_p to distinguish TLS variable from non-TLS variable.  By
default, binds_tls_local_p is the same as binds_local_p which assumes
TLS variable.

This patch checks if 32-bit and 64-bit linkers support PIE with copy
reloc at configure time.  64-bit linker is enabled in binutils 2.25
and 32-bit linker is enabled in binutils 2.26.  This optimization
is enabled only if the linker support is available.

Since copy relocation in PIE is incompatible with DSO created by
-Wl,-Bsymbolic, this patch also adds a new option, -fsymbolic, which
controls how references to global symbols are bound.  The -fsymbolic
option binds references to global symbols to the local definitions
and external references globally.  It avoids copy relocations in PIE
and optimizes global symbol references in shared library created
by -Wl,-Bsymbolic.

gcc/

PR target/65846
PR target/65886
* configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ...
(HAVE_LD_X86_64_PIE_COPYRELOC): This.
(HAVE_LD_386_PIE_COPYRELOC): New.   Defined to 1 if Linux/ia32
linker supports PIE with copy reloc.
* output.h (default_binds_tls_local_p): New.
(default_binds_local_p_3): Add 2 bool arguments.
* target.def (binds_tls_local_p): New target hook.
* varasm.c (decl_default_tls_model): Replace targetm.binds_local_p
with targetm.binds_tls_local_p.
(default_binds_local_p_3): Add a bool argument to indicate TLS
variable and a bool argument to indicate if an undefined non-TLS
non-weak data is local.  Double check TLS variable.  If an
undefined non-TLS non-weak data is local, treat it as defined
locally.
(default_binds_local_p): Pass true and false to
default_binds_local_p_3.
(default_binds_local_p_2): Likewise.
(default_binds_local_p_1): Likewise.
(default_binds_tls_local_p): New.
* config.in: Regenerated.
* configure: Likewise.
* doc/tm.texi: Likewise.
* config/i386/i386.c (legitimate_pic_address_disp_p): Don't
check HAVE_LD_PIE_COPYRELOC here.
(ix86_binds_local): New.
(ix86_binds_tls_local_p): Likewise.
(ix86_binds_local_p): Use it.
(TARGET_BINDS_TLS_LOCAL_P): New.
* doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook.

gcc/testsuite/

PR target/65846
PR target/65886
* gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32.
* gcc.target/i386/pie-copyrelocs-2.c: Likewise.
* gcc.target/i386/pie-copyrelocs-3.c: Likewise.
* gcc.target/i386/pie-copyrelocs-4.c: Likewise.
* gcc.target/i386/pr32219-9.c: Likewise.
* gcc.target/i386/pr32219-10.c: New file.
* gcc.target/i386/pr65886-1.c: Likewise.
* gcc.target/i386/pr65886-2.c: Likewise.
* 

Re: Add fuzzing coverage support

2015-12-04 Thread Yury Gribov

On 12/04/2015 04:41 PM, Jakub Jelinek wrote:

Hi!

While this has been posted after stage1 closed and I'm not really happy
that it missed the deadline, I'm willing to grant an exception, the patch
is small enough that it is ok at this point of stage3.  That said, next time
please try to submit new features in time.

Are there any plans for GCC 7 for the other -fsanitize-coverage= options,
or are those just LLVM alternatives to GCC's gcov/-fprofile-generate etc.?

On Thu, Dec 03, 2015 at 08:17:06PM +0100, Dmitry Vyukov wrote:

+unsigned sancov_pass (function *fun)


Formatting:
unsigned
sancov_pass (function *fun)


+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  gimple *stmt, *f;
+  static bool inited;
+
+  if (!inited)
+{
+  inited = true;
+  initialize_sanitizer_builtins ();
+}


You can call this unconditionally, it will return as the first thing
if it is already initialized, no need for another guard.


+
+  /* Insert callback into beginning of every BB. */
+  FOR_EACH_BB_FN (bb, fun)
+{
+  gsi = gsi_after_labels (bb);
+  if (gsi_end_p (gsi))
+continue;
+  stmt = gsi_stmt (gsi);
+  f = gimple_build_call (builtin_decl_implicit (
+ BUILT_IN_SANITIZER_COV_TRACE_PC), 0);


I (personally) prefer no ( at the end of line unless really needed.
In this case you can just do:
   tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
   gimple *g = gimple_build_call (fndecl, 0);
which is same number of lines, but looks nicer.
Also, please move also the gsi, stmt and f (better g or gcall)
declarations to the first assignment to them, they aren't used outside of
the loop.


Also FYI clang-format config has been recently added to contrib/ 
(https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02214.html).





--- testsuite/gcc.dg/sancov/asan.c  (revision 0)
+++ testsuite/gcc.dg/sancov/asan.c  (working copy)
@@ -0,0 +1,21 @@
+/* Test coverage/asan interaction:
+ - coverage instruments __asan_init ctor (thus 4 covarage callbacks)
+ - coverage does not instrument asan-emitted basic blocks
+ - asan considers coverage callback as "nonfreeing" (thus 1 asan store
+   callback.  */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize-coverage=trace-pc -fsanitize=address" } */
+
+void notailcall ();
+
+void foo(volatile int *a, int *b)
+{
+  *a = 1;
+  if (*b)
+*a = 2;
+  notailcall ();
+}
+
+/* { dg-final { scan-assembler-times "call   __sanitizer_cov_trace_pc" 4 } } */
+/* { dg-final { scan-assembler-times "call   __asan_report_load4" 1 } } */
+/* { dg-final { scan-assembler-times "call   __asan_report_store4" 1 } } */


I don't like these, we have lots of targets, and different targets have
different instructions for making calls, different whitespace in between
the insn name and called function, sometimes some extra decoration on the fn
name, (say sometimes an extra _ prefix), etc.  IMHO much better to add
-fdump-tree-optimized and scan-tree-dump-times instead for the calls in the
optimized dump.  Affects all tests.

Please repost a patch with these changes fixed, it will be hopefully ackable
then.

Jakub






Re: [testsuite][ARM target attributes] Fix effective_target tests

2015-12-04 Thread Christophe Lyon
Ping?


On 27 November 2015 at 14:00, Christophe Lyon
 wrote:
> Hi,
>
> After the recent commits from Christian adding target attributes
> support for ARM FPU settings,  I've noticed that some of the tests
> were failing because of incorrect assumptions wrt to the default
> cpu/fpu/float-abi of the compiler.
>
> This patch fixes the problems I've noticed in the following way:
> - do not force -mfloat-abi=softfp in dg-options, to avoid conflicts
> when gcc is configured --with-float=hard
>
> - change arm_vfp_ok such that it tries several -mfpu/-mfloat-abi
> flags, checks that __ARM_FP is defined and __ARM_NEON_FP is not
> defined
>
> - introduce arm_fp_ok, which is similar but does not enforce fpu setting
>
> - add a new effective_target: arm_crypto_pragma_ok to check that
> setting this fpu via a pragma is actually supported by the current
> "multilib". This is different from checking the command-line option
> because the pragma might conflict with the command-line options in
> use.
>
> The updates in the testcases are as follows:
> - attr-crypto.c, we have to make sure that the defaut fpu does not
> conflict with the one forced by pragma. That's why I use the arm_vfp
> options/effective_target. This is needed if gcc has been configured
> --with-fpu=neon-fp16, as the pragma fpu=crypto-neon-fp-armv8 would
> conflict.
>
> - attr-neon-builtin-fail.c: use arm_fp to force the appropriate
> float-abi setting. Enforcing fpu is not needed here.
>
> - attr-neon-fp16.c: similar, I also removed arm_neon_ok since it was
> not necessary to make the test pass in my testing. On second thought,
> I'm wondering whether I should leave it and make the test unsupported
> in more cases (such as when forcing -march=armv5t, although it does
> pass with this patch)
>
> - attr-neon2.c: use arm_vfp to force the appropriate float-abi
> setting. Enforcing mfpu=vfp is needed to avoid conflict with the
> pragma target fpu=neon (for instance if the toolchain default is
> neon-fp16)
>
> - attr-neon3.c: similar
>
> Tested on a variety of configurations, see:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/230929-target-attr/report-build-info.html
>
> Note that the regressions reported fall into 3 categories:
> - when forcing march=armv5t: tests are now unsupported because I
> modified arm_crypto_ok to require arm_v8_neon_ok instead of arm32.
>
> - the warning reported by attr-neon-builtin-fail.c moved from line 12
> to 14 and is thus seen as a regression + one improvement
>
> - finally, attr-neon-fp16.c causes an ICE on armeb compilers, for
> which I need to post a bugzilla.
>
I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68620 for this.

>
> TBH, I'm a bit concerned by the complexity of all these multilib-like
> conditions. I'm confident that I'm still missing some combinations :-)
>
> And with new target attributes coming, new architectures etc... all
> this logic is likely to become even more complex.
>
> That being said, OK for trunk?
>
> Christophe
>
>
> 2015-11-27  Christophe Lyon  
>
> * lib/target-supports.exp
> (check_effective_target_arm_vfp_ok_nocache): New.
> (check_effective_target_arm_vfp_ok): Call the new
> check_effective_target_arm_vfp_ok_nocache function.
> (check_effective_target_arm_fp_ok_nocache): New.
> (check_effective_target_arm_fp_ok): New.
> (add_options_for_arm_fp): New.
> (check_effective_target_arm_crypto_ok_nocache): Require
> target_arm_v8_neon_ok instead of arm32.
> (check_effective_target_arm_crypto_pragma_ok_nocache): New.
> (check_effective_target_arm_crypto_pragma_ok): New.
> (add_options_for_arm_vfp): New.
> * gcc.target/arm/attr-crypto.c: Use arm_crypto_pragma_ok effective
> target. Do not force -mfloat-abi=softfp, use arm_vfp effective
> target instead.
> * gcc.target/arm/attr-neon-builtin-fail.c: Do not force
> -mfloat-abi=softfp, use arm_fp effective target instead.
> * gcc.target/arm/attr-neon-fp16.c: Likewise. Remove arm_neon_ok
> dependency.
> * gcc.target/arm/attr-neon2.c: Do not force -mfloat-abi=softfp,
> use arm_vfp effective target instead.
> * gcc.target/arm/attr-neon3.c: Likewise.


Re: [PATCH, i386] Fix alignment check for AVX-512 masked store

2015-12-04 Thread Kirill Yukhin
Hi Ilya,
On 02 Dec 16:51, Ilya Enkovich wrote:
> Hi,
> 
> This patch fixes wrong alignment check in _store_mask
> pattern.  Currently we check a register operand instead of a memory
> one.  This fixes segfault on 481.wrf compiled at -O3 for KNL target.
> I bootstrapped and tested this patch on x86_64-unknown-linux-gnu.
> 
> I got a bunch of new failures:
> 
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
> 
> With the patch applied, the test generates vmovup[sd] because the memory
> references don't have proper alignment set.  Since this is
> another bug, and actually a performance one, I think
> this patch should go to trunk.

Your patch definitely fixes the stability issue.
The new fails are performance issues.

OK for main trunk.  Please file a regression bug in Bugzilla against the new fails.

--
Thanks, K


> 
> Thanks,
> Ilya
> --
> gcc/
> 
> 2015-12-02  Ilya Enkovich  
> 
>   * config/i386/sse.md (_store_mask): Fix
>   operand checked for alignment.
> 
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index e7b517a..d65ed0c 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1051,7 +1051,7 @@
>sse_suffix = "";
>  }
>  
> -  if (misaligned_operand (operands[1], mode))
> +  if (misaligned_operand (operands[0], mode))
>  align = "u";
>else
>  align = "a";


[PATCH 02/10 v2] Fix g++.dg/cpp0x/nsdmi-template14.C (v2)

2015-12-04 Thread David Malcolm
On Thu, 2015-12-03 at 17:17 -0500, Jason Merrill wrote:
> On 12/03/2015 04:43 PM, David Malcolm wrote:
> > On Thu, 2015-12-03 at 15:33 -0500, Jason Merrill wrote:
> >> On 12/03/2015 09:55 AM, David Malcolm wrote:
> >>> This patch adds bulletproofing to detect purged tokens, and avoid using
> >>> them.
> >>>
> >>> Alternatively, is it OK to access purged tokens for this kind of thing?
> >>> If so, would it make more sense to instead leave their locations untouched
> >>> when purging them?
> >>
> >> I think cp_lexer_previous_token should skip past purged tokens.
> >
> > Sorry if this is a silly question, but should I limit the iteration e.g.
> > by detecting a sentinel value?  e.g.
> >parser->lexer->buffer->address () ?
> >
> > Or is there guaranteed to be an unpurged token somewhere beforehand?
> 
> There should always be an unpurged token.

Thanks.

> > Out of interest, the prior tokens here are:
> >
> > (gdb) p end_tok[0]
> > $25 = {type = CPP_GREATER, keyword = RID_MAX, flags = 0 '\000',
> > pragma_kind = PRAGMA_NONE, implicit_extern_c = 0,
> >error_reported = 0, purged_p = 1, location = 0, u = {tree_check_value
> > = 0x0, value = }}
> >
> > (gdb) p end_tok[-1]
> > $26 = {type = CPP_NAME, keyword = RID_MAX, flags = 0 '\000', pragma_kind
> > = PRAGMA_NONE, implicit_extern_c = 0,
> >error_reported = 0, purged_p = 1, location = 0, u = {tree_check_value
> > = 0x0, value = }}
> >
> > (gdb) p end_tok[-2]
> > $27 = {type = CPP_LESS, keyword = RID_MAX, flags = 0 '\000', pragma_kind
> > = PRAGMA_NONE, implicit_extern_c = 0,
> >error_reported = 0, purged_p = 1, location = 0, u = {tree_check_value
> > = 0x0, value = }}
> >
> > (gdb) p end_tok[-3]
> > $28 = {type = 86, keyword = RID_MAX, flags = 1 '\001', pragma_kind =
> > PRAGMA_NONE, implicit_extern_c = 0, error_reported = 0,
> >purged_p = 0, location = 202016, u = {tree_check_value =
> > 0x719dfd98, value = }}
> >
> > (gdb) p end_tok[-4]
> > $29 = {type = CPP_KEYWORD, keyword = RID_NEW, flags = 1 '\001',
> > pragma_kind = PRAGMA_NONE, implicit_extern_c = 0,
> >error_reported = 0, purged_p = 0, location = 201890, u =
> > {tree_check_value = 0x718a8318,
> >  value = }}
> >
> > where the previous unpurged token is:
> >
> > (gdb) p end_tok[-3].purged_p
> > $31 = 0
> >
> > (gdb) call inform (end_tok[-3].location, "")
> > ../../src/gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C:11:14: note:
> > B* p = new B;
> >^
> >
> > which would give a range of:
> >
> > B* p = new B;
> >^
> >
> > for the erroneous new expression, rather than:
> >
> >
> > B* p = new B;
> >^~~~
> >
> > if we used the location of the purged token (the CPP_GREATER).
> > I prefer the latter, hence my suggestion about not zero-ing out the
> > locations of tokens when purging them.
> 
> The unpurged token you're finding is the artificial CPP_TEMPLATE_ID 
> token, which seems to need to have its location adjusted to reflect the 
> full range of the template-id.

Aha!  Thanks.

Here's an updated fix for g++.dg/cpp0x/nsdmi-template14.C,
which generates meaningful locations/ranges when converting tokens
to CPP_TEMPLATE_ID:

(gdb) call inform (end_tok[-3].location, "")
../../src/gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C:11:14: note:
   B* p = new B; // { dg-error "recursive instantiation of non-static data" }
  ^~~~

I added a test case for this to
g++.dg/plugin/diagnostic-test-expressions-1.C.

The updated patch also updates cp_lexer_previous_token to skip
past purged tokens, so that we use the above token when determining
the end of the new-expression, giving:

../../src/gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C:11:10: error: recursive 
instantiation of non-static data member initializer for ‘B<1>::p’
   B* p = new B; // { dg-error "recursive instantiation of non-static data" }
  ^~~~

and hence we no longer need the "bulletproofing" from the previous
iteration of the patch.
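
For reference, a minimal sketch of the kind of change made to
cp_lexer_previous_token (the real hunk is in the patch below; the details
here are an approximation, not copied from it):

static cp_token *
cp_lexer_previous_token (cp_lexer *lexer)
{
  cp_token_position tp = cp_lexer_previous_token_position (lexer);

  /* Skip past purged tokens; as discussed above, there is always an
     unpurged token somewhere before them.  */
  while (tp->purged_p)
    tp--;

  return cp_lexer_token_at (lexer, tp);
}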

As before, the patch also updates the location of a dg-error directive
in the testcase to reflect improved location information.

Successfully bootstrapped on x86_64-pc-linux-gnu (in
combination with the other patches from the kit).

gcc/cp/ChangeLog:
* parser.c (cp_lexer_previous_token): Skip past purged tokens.
(cp_parser_template_id): When converting a token to
CPP_TEMPLATE_ID, update the location.

gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/nsdmi-template14.C: Move dg-error directive.
* g++.dg/plugin/diagnostic-test-expressions-1.C
(test_template_id): New function.
---
 gcc/cp/parser.c   | 19 +++
 gcc/testsuite/g++.dg/cpp0x/nsdmi-template14.C |  4 ++--
 .../g++.dg/plugin/diagnostic-test-expressions-1.C |  9 +
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d859a89..1e3ada5 100644
--- 

Re: [PATCH] enable loop fusion on isl-15

2015-12-04 Thread Alan Lawrence

On 05/11/15 21:43, Sebastian Pop wrote:

* graphite-optimize-isl.c (optimize_isl): Call
isl_options_set_schedule_maximize_band_depth.

* gcc.dg/graphite/fuse-1.c: New.
* gcc.dg/graphite/fuse-2.c: New.
* gcc.dg/graphite/interchange-13.c: Remove bogus check.


I note that the test

scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1

FAILs under isl-0.14, with which GCC can still be built and generally claims to 
work.


Is it worth trying to detect this in the testsuite, so we can XFAIL it? By which 
I mean, is there a reasonable testsuite mechanism by which we could do that?


Cheers, Alan





GCC 5.4 Status report (2015-12-04)

2015-12-04 Thread Richard Biener

Status
======

The GCC 5 branch is open again for regression and documentation fixes.
If nothing unusual happens you can expect GCC 5.4 somewhen closely
before GCC 6 is released.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              109   -  12
P3               28   +   8
P4               85   -   2
P5               32
--------        ---   -----------------------
Total P1-P3     137   +   4
Total           254   -   6


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2015-11/msg00113.html


[HSA] Handle __builtin_{bzero,mempcpy}

2015-12-04 Thread Martin Liška
Hello.

The patch handles builtins mentioned in the email subject and was installed
to the HSA branch.

Thanks,
Martin
>From b56ba10d46c03cadfda16c6658dd3134f5da09f8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 3 Dec 2015 13:31:28 +0100
Subject: [PATCH 2/3] HSA: implement __builtin_bzero

gcc/ChangeLog:

2015-12-03  Martin Liska  

	* hsa-gen.c (build_memset_value): Provide special case
	for zero value.
	(gen_hsa_insns_for_call): Handle BUILT_IN_BZERO.
---
 gcc/hsa-gen.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 85ba148..503f1fc 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -2649,6 +2649,9 @@ gen_hsa_memory_copy (hsa_bb *hbb, hsa_op_address *target, hsa_op_address *src,
 static unsigned HOST_WIDE_INT
 build_memset_value (unsigned HOST_WIDE_INT constant, unsigned byte_size)
 {
+  if (constant == 0)
+return 0;
+
   HOST_WIDE_INT v = constant;
 
   for (unsigned i = 1; i < byte_size; i++)
@@ -5434,6 +5437,32 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 
 	break;
   }
+case BUILT_IN_BZERO:
+  {
+	tree dst = gimple_call_arg (stmt, 0);
+	tree byte_size = gimple_call_arg (stmt, 1);
+
+	if (!tree_fits_uhwi_p (byte_size))
+	  {
+	gen_hsa_insns_for_direct_call (stmt, hbb);
+	return;
+	  }
+
+	unsigned n = tree_to_uhwi (byte_size);
+
+	if (n > HSA_MEMORY_BUILTINS_LIMIT)
+	  {
+	gen_hsa_insns_for_direct_call (stmt, hbb);
+	return;
+	  }
+
+	hsa_op_address *dst_addr;
+	dst_addr = get_address_from_value (dst, hbb);
+
+	gen_hsa_memory_set (hbb, dst_addr, 0, n);
+
+	break;
+  }
 case BUILT_IN_ALLOCA:
 case BUILT_IN_ALLOCA_WITH_ALIGN:
   {
-- 
2.6.3

>From cf61d78b048df8ce68c7d7586304faebce1c339e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 3 Dec 2015 15:10:14 +0100
Subject: [PATCH 3/3] HSA: implement __builtin_mempcpy

---
 gcc/hsa-gen.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 503f1fc..25ef914 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -1180,7 +1180,6 @@ hsa_op_address::operator new (size_t)
   return hsa_allocp_operand_address->allocate_raw ();
 }
 
-
 /* Constructor of an operand referring to HSAIL code.  */
 
 hsa_op_code_ref::hsa_op_code_ref () : hsa_op_base (BRIG_KIND_OPERAND_CODE_REF),
@@ -5047,7 +5046,8 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 }
 
   tree fndecl = gimple_call_fndecl (stmt);
-  switch (DECL_FUNCTION_CODE (fndecl))
+  enum built_in_function builtin = DECL_FUNCTION_CODE (fndecl);
+  switch (builtin)
 {
 case BUILT_IN_FABS:
 case BUILT_IN_FABSF:
@@ -5366,6 +5366,7 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	break;
   }
 case BUILT_IN_MEMCPY:
+case BUILT_IN_MEMPCPY:
   {
 	tree byte_size = gimple_call_arg (stmt, 2);
 
@@ -5393,7 +5394,25 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 
 	tree lhs = gimple_call_lhs (stmt);
 	if (lhs)
-	  gen_hsa_insns_for_single_assignment (lhs, dst, hbb);
+	  {
+	hsa_op_reg *lhs_reg = hsa_cfun->reg_for_gimple_ssa (lhs);
+	hsa_op_with_type *dst_reg = hsa_reg_or_immed_for_gimple_op (dst,
+	hbb);
+	hsa_op_with_type *tmp;
+
+	if (builtin == BUILT_IN_MEMPCPY)
+	  {
+		tmp = new hsa_op_reg (dst_reg->m_type);
+		hsa_insn_basic *add = new hsa_insn_basic
+		  (3, BRIG_OPCODE_ADD, tmp->m_type,
+		   tmp, dst_reg, new hsa_op_immed (n, dst_reg->m_type));
+		hbb->append_insn (add);
+	  }
+	else
+	  tmp = dst_reg;
+
+	hsa_build_append_simple_mov (lhs_reg, tmp, hbb);
+	  }
 
 	break;
   }
-- 
2.6.3



[HSA] Fix emission of internal functions

2015-12-04 Thread Martin Liška
Hello.

The following patch handles just the floating-point internal functions;
for the rest it displays a warning message.

Patch has been installed to the HSA branch.

Thanks,
Martin
>From 5c6581d913c754e4e0a197e073bfa3c17c20b31f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 4 Dec 2015 09:56:11 +0100
Subject: [PATCH 1/3] HSA: fix emission of non-math internal functions

gcc/ChangeLog:

2015-12-04  Martin Liska  

	* hsa-gen.c (gen_hsa_insn_for_internal_fn_call): Explicitly
	enumerate all internal functions that can be emitted as function
	call instruction.
---
 gcc/hsa-gen.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 3fbafb5..85ba148 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -4974,8 +4974,34 @@ gen_hsa_insn_for_internal_fn_call (gcall *stmt, hsa_bb *hbb)
   gen_hsa_popcount (stmt, hbb);
   break;
 
-default:
+case IFN_ACOS:
+case IFN_ASIN:
+case IFN_ATAN:
+case IFN_EXP:
+case IFN_EXP10:
+case IFN_EXPM1:
+case IFN_LOG:
+case IFN_LOG10:
+case IFN_LOG1P:
+case IFN_LOGB:
+case IFN_SIGNIFICAND:
+case IFN_TAN:
+case IFN_NEARBYINT:
+case IFN_ROUND:
+case IFN_ATAN2:
+case IFN_COPYSIGN:
+case IFN_FMOD:
+case IFN_POW:
+case IFN_REMAINDER:
+case IFN_SCALB:
+case IFN_FMIN:
+case IFN_FMAX:
   gen_hsa_insns_for_call_of_internal_fn (stmt, hbb);
+
+default:
+  HSA_SORRY_ATV (gimple_location (stmt),
+		 "support for HSA does not implement internal function: %s",
+		 internal_fn_name (fn));
   break;
 }
 }
-- 
2.6.3



[PATCH 2/2] rs6000: Clean up the cstore code a bit

2015-12-04 Thread Segher Boessenkool
"register_operand" was a bit confusing.  Also some other minor cleanups.

Tested on powerpc64-linux; okay for mainline?


Segher


2015-12-04  Segher Boessenkool  

* (cstore4_unsigned): Use gpc_reg_operand instead of
register_operand.  Remove empty constraints.  Use std::swap.
(cstore_si_as_di, cstore4_signed_imm,
cstore4_unsigned_imm, cstore4 for GPR): Use
gpc_reg_operand instead of register_operand.
(cstore4 for FP): Use gpc_reg_operand instead of
register_operand.  Remove empty constraints.

---
 gcc/config/rs6000/rs6000.md | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 98abdb2..80f4161 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10568,9 +10568,9 @@ (define_expand "cstore4_signed"
 
 (define_expand "cstore4_unsigned"
   [(use (match_operator 1 "unsigned_comparison_operator"
- [(match_operand:P 2 "gpc_reg_operand" "")
-  (match_operand:P 3 "reg_or_short_operand" "")]))
-   (clobber (match_operand:P 0 "register_operand"))]
+ [(match_operand:P 2 "gpc_reg_operand")
+  (match_operand:P 3 "reg_or_short_operand")]))
+   (clobber (match_operand:P 0 "gpc_reg_operand"))]
   ""
 {
   enum rtx_code cond_code = GET_CODE (operands[1]);
@@ -10582,8 +10582,7 @@ (define_expand "cstore4_unsigned"
   if (cond_code == GEU || cond_code == LTU)
 {
   cond_code = swap_condition (cond_code);
-  op1 = operands[3];
-  op2 = operands[2];
+  std::swap (op1, op2);
 }
 
   if (!gpc_reg_operand (op1, mode))
@@ -10609,7 +10608,7 @@ (define_expand "cstore_si_as_di"
   [(use (match_operator 1 "unsigned_comparison_operator"
  [(match_operand:SI 2 "gpc_reg_operand")
   (match_operand:SI 3 "reg_or_short_operand")]))
-   (clobber (match_operand:SI 0 "register_operand"))]
+   (clobber (match_operand:SI 0 "gpc_reg_operand"))]
   ""
 {
   int uns_flag = unsigned_comparison_operator (operands[1], VOIDmode) ? 1 : 0;
@@ -10654,7 +10653,7 @@ (define_expand "cstore4_signed_imm"
   [(use (match_operator 1 "signed_comparison_operator"
  [(match_operand:GPR 2 "gpc_reg_operand")
   (match_operand:GPR 3 "immediate_operand")]))
-   (clobber (match_operand:GPR 0 "register_operand"))]
+   (clobber (match_operand:GPR 0 "gpc_reg_operand"))]
   ""
 {
   bool invert = false;
@@ -10699,7 +10698,7 @@ (define_expand "cstore4_unsigned_imm"
   [(use (match_operator 1 "unsigned_comparison_operator"
  [(match_operand:GPR 2 "gpc_reg_operand")
   (match_operand:GPR 3 "immediate_operand")]))
-   (clobber (match_operand:GPR 0 "register_operand"))]
+   (clobber (match_operand:GPR 0 "gpc_reg_operand"))]
   ""
 {
   bool invert = false;
@@ -10746,7 +10745,7 @@ (define_expand "cstore4"
   [(use (match_operator 1 "rs6000_cbranch_operator"
  [(match_operand:GPR 2 "gpc_reg_operand")
   (match_operand:GPR 3 "reg_or_short_operand")]))
-   (clobber (match_operand:GPR 0 "register_operand"))]
+   (clobber (match_operand:GPR 0 "gpc_reg_operand"))]
   ""
 {
   /* Use ISEL if the user asked for it.  */
@@ -10806,9 +10805,9 @@ (define_expand "cstore4"
 
 (define_expand "cstore4"
   [(use (match_operator 1 "rs6000_cbranch_operator"
- [(match_operand:FP 2 "gpc_reg_operand" "")
-  (match_operand:FP 3 "gpc_reg_operand" "")]))
-   (clobber (match_operand:SI 0 "register_operand"))]
+ [(match_operand:FP 2 "gpc_reg_operand")
+  (match_operand:FP 3 "gpc_reg_operand")]))
+   (clobber (match_operand:SI 0 "gpc_reg_operand"))]
   ""
 {
   rs6000_emit_sCOND (mode, operands);
-- 
1.9.3



Re: [PATCH 1/2] s/390: Implement "target" attribute.

2015-12-04 Thread Andreas Krebbel
On 12/04/2015 03:14 PM, Dominik Vogt wrote:
> Next version of the patch with the changes discussed internally.
> (Sorry for the version number; I now use my internal branch
> numbers for the sake of clarity.)

Applied. Thanks!

-Andreas-




Re: [PR64164] drop copyrename, integrate into expand

2015-12-04 Thread Dominik Vogt
On Fri, Mar 27, 2015 at 03:04:05PM -0300, Alexandre Oliva wrote:
> This patch reworks the out-of-ssa expander to enable coalescing of SSA
> partitions that don't share the same base name.  This is done only when
> optimizing.
> 
> The test we use to tell whether two partitions can be merged no longer
> demands them to have the same base variable when optimizing, so they
> become eligible for coalescing, as they would after copyrename.  We then
> compute the partitioning we'd get if all coalescible partitions were
> coalesced, using this partition assignment to assign base vars numbers.
> These base var numbers are then used to identify conflicts, which used
> to be based on shared base vars or base types.
> 
> We now propagate base var names during coalescing proper, only towards
> the leader variable.  I'm no longer sure this is still needed, but
> something about handling variables and results led me this way and I
> didn't revisit it.  I might rework that with a later patch, or a later
> revision of this patch; it would require other means to identify
> partitions holding result_decls during merging, or allow that and deal
> with param and result decls in a different way during expand proper.
> 
> I had to fix two lingering bugs in order for the whole thing to work: we
> perform conflict detection after abnormal coalescing, but we computed
> live ranges involving only the partition leaders, so conflicts with
> other names already coalesced wouldn't be detected.  The other problem
> was that we didn't track default defs for parms as live at entry, so
> they might end up coalesced.  I guess none of these problems would have
> been exercised in practice, because we wouldn't even consider merging
> ssa names associated with different variables.
> 
> In the end, I verified that this fixed the codegen regression in the
> PR64164 testcase, that failed to merge two partitions that could in
> theory be merged, but that wasn't even considered due to differences in
> the SSA var names.
> 
> I'd agree that disregarding the var names and dropping 4 passes is too
> much of a change to fix this one problem, but...  it's something we
> should have long tackled, and it gets this and other jobs done, so...
> 
> Regstrapped on x86_64-linux-gnu native and on i686-pc-linux-gnu native
> on x86_64, so without lto.  Is this ok to install?

The patch that got committed as a result of this discussion causes
a performance regression on s390[x].  Bug report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: Add fuzzing coverage support

2015-12-04 Thread Jakub Jelinek
Hi!

While this has been posted after stage1 closed and I'm not really happy
that it missed the deadline, I'm willing to grant an exception, the patch
is small enough that it is ok at this point of stage3.  That said, next time
please try to submit new features in time.

Are there any plans for GCC 7 for the other -fsanitize-coverage= options,
or are those just LLVM alternatives to GCC's gcov/-fprofile-generate etc.?

On Thu, Dec 03, 2015 at 08:17:06PM +0100, Dmitry Vyukov wrote:
> +unsigned sancov_pass (function *fun)

Formatting:
unsigned
sancov_pass (function *fun)

> +{
> +  basic_block bb;
> +  gimple_stmt_iterator gsi;
> +  gimple *stmt, *f;
> +  static bool inited;
> +
> +  if (!inited)
> +{
> +  inited = true;
> +  initialize_sanitizer_builtins ();
> +}

You can call this unconditionally, it will return as the first thing
if it is already initialized, no need for another guard.

> +
> +  /* Insert callback into beginning of every BB. */
> +  FOR_EACH_BB_FN (bb, fun)
> +{
> +  gsi = gsi_after_labels (bb);
> +  if (gsi_end_p (gsi))
> +continue;
> +  stmt = gsi_stmt (gsi);
> +  f = gimple_build_call (builtin_decl_implicit (
> + BUILT_IN_SANITIZER_COV_TRACE_PC), 0);

I (personally) prefer no ( at the end of line unless really needed.
In this case you can just do:
  tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
  gimple *g = gimple_build_call (fndecl, 0);
which is same number of lines, but looks nicer.
Also, please move also the gsi, stmt and f (better g or gcall)
declarations to the first assignment to them, they aren't used outside of
the loop.
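
Putting these comments together, a minimal sketch of the reworked pass
could look like the following (a sketch only, reusing the names from the
posted patch; the gimple_set_location call is an assumption, not a
requirement):

unsigned
sancov_pass (function *fun)
{
  initialize_sanitizer_builtins ();

  /* Insert a callback at the beginning of every BB.  */
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    {
      gimple_stmt_iterator gsi = gsi_after_labels (bb);
      if (gsi_end_p (gsi))
        continue;
      gimple *stmt = gsi_stmt (gsi);
      tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
      gcall *g = gimple_build_call (fndecl, 0);
      gimple_set_location (g, gimple_location (stmt));
      gsi_insert_before (&gsi, g, GSI_SAME_STMT);
    }
  return 0;
}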

> --- testsuite/gcc.dg/sancov/asan.c(revision 0)
> +++ testsuite/gcc.dg/sancov/asan.c(working copy)
> @@ -0,0 +1,21 @@
> +/* Test coverage/asan interaction:
> + - coverage instruments __asan_init ctor (thus 4 covarage callbacks)
> + - coverage does not instrument asan-emitted basic blocks
> + - asan considers coverage callback as "nonfreeing" (thus 1 asan store
> +   callback.  */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize-coverage=trace-pc -fsanitize=address" } */
> +
> +void notailcall ();
> +
> +void foo(volatile int *a, int *b)
> +{
> +  *a = 1;
> +  if (*b)
> +*a = 2;
> +  notailcall ();
> +}
> +
> +/* { dg-final { scan-assembler-times "call   __sanitizer_cov_trace_pc" 4 } } */
> +/* { dg-final { scan-assembler-times "call   __asan_report_load4" 1 } } */
> +/* { dg-final { scan-assembler-times "call   __asan_report_store4" 1 } } */

I don't like these, we have lots of targets, and different targets have
different instructions for making calls, different whitespace in between
the insn name and called function, sometimes some extra decoration on the fn
name, (say sometimes an extra _ prefix), etc.  IMHO much better to add
-fdump-tree-optimized and scan-tree-dump-times instead for the calls in the
optimized dump.  Affects all tests.
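
For example, something along these lines would be target independent
(hypothetical test; the exact spelling of the call in the optimized dump,
assumed here to be __builtin___sanitizer_cov_trace_pc, would need to be
checked):

/* { dg-do compile } */
/* { dg-options "-fsanitize-coverage=trace-pc -fdump-tree-optimized" } */

void
foo (volatile int *a)
{
  /* A single-block function should get exactly one callback.  */
  *a = 1;
}

/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_pc" 1 "optimized" } } */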

Please repost a patch with these changes fixed, it will be hopefully ackable
then.

Jakub


Re: [ARM] Fix PR middle-end/65958

2015-12-04 Thread Richard Earnshaw
> + (unspec_volatile:PTR [(match_operand:PTR 1 "register_operand" "0")
> +   (match_operand:PTR 2 "register_operand" "r")]
> +UNSPEC_PROBE_STACK_RANGE))]

Minor nit.  Since this is used in an unspec_volatile, the name should be
UNSPECV_ and defined in the unspecv enum.

Otherwise OK once testing is complete.

R.


On 03/12/15 12:17, Eric Botcazou wrote:
>> I can understand this restriction, but...
>>
>>> +  /* See the same assertion on PROBE_INTERVAL above.  */
>>> +  gcc_assert ((first % 4096) == 0);
>>
>> ... why isn't this a test that FIRST is aligned to PROBE_INTERVAL?
> 
> Because that isn't guaranteed, FIRST is related to the size of the protection 
> area while PROBE_INTERVAL is related to the page size.
> 
>> blank line between declarations and code. Also, can we come up with a
>> suitable define for 4096 here that expresses the context and then use
>> that consistently through the remainder of this function?
> 
> OK, let's use ARITH_BASE.
> 
>>> +(define_insn "probe_stack_range"
>>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>>> +   (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
>>> +(match_operand:DI 2 "register_operand" "r")]
>>> +UNSPEC_PROBE_STACK_RANGE))]
>>
>> I think this should really use PTRmode, so that it's ILP32 ready (I'm
>> not going to ask you to make sure that works though, since I suspect
>> there are still other issues to resolve with ILP32 at this time).
> 
> Done.  Manually tested for now, I'll fully test it if approved.
> 
> 
> PR middle-end/65958
> * config/aarch64/aarch64-protos.h (aarch64_output_probe_stack_range):
> Declare.
> * config/aarch64/aarch64.md: Declare UNSPECV_BLOCKAGE and
> UNSPEC_PROBE_STACK_RANGE.
> (blockage): New instruction.
> (probe_stack_range_): Likewise.
> * config/aarch64/aarch64.c (aarch64_emit_probe_stack_range): New
> function.
> (aarch64_output_probe_stack_range): Likewise.
> (aarch64_expand_prologue): Invoke aarch64_emit_probe_stack_range if
> static builtin stack checking is enabled.
> * config/aarch64/aarch64-linux.h (STACK_CHECK_STATIC_BUILTIN):
> Define.
> 
> 
> pr65958-2c.diff
> 
> 
> Index: config/aarch64/aarch64-linux.h
> ===
> --- config/aarch64/aarch64-linux.h(revision 231206)
> +++ config/aarch64/aarch64-linux.h(working copy)
> @@ -88,4 +88,7 @@
>  #undef TARGET_BINDS_LOCAL_P
>  #define TARGET_BINDS_LOCAL_P default_binds_local_p_2
>  
> +/* Define this to be nonzero if static stack checking is supported.  */
> +#define STACK_CHECK_STATIC_BUILTIN 1
> +
>  #endif  /* GCC_AARCH64_LINUX_H */
> Index: config/aarch64/aarch64-protos.h
> ===
> --- config/aarch64/aarch64-protos.h   (revision 231206)
> +++ config/aarch64/aarch64-protos.h   (working copy)
> @@ -340,6 +340,7 @@ void aarch64_asm_output_labelref (FILE *
>  void aarch64_cpu_cpp_builtins (cpp_reader *);
>  void aarch64_elf_asm_named_section (const char *, unsigned, tree);
>  const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
> +const char * aarch64_output_probe_stack_range (rtx, rtx);
>  void aarch64_err_no_fpadvsimd (machine_mode, const char *);
>  void aarch64_expand_epilogue (bool);
>  void aarch64_expand_mov_immediate (rtx, rtx);
> Index: config/aarch64/aarch64.c
> ===
> --- config/aarch64/aarch64.c  (revision 231206)
> +++ config/aarch64/aarch64.c  (working copy)
> @@ -62,6 +62,7 @@
>  #include "sched-int.h"
>  #include "cortex-a57-fma-steering.h"
>  #include "target-globals.h"
> +#include "common/common-target.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2183,6 +2184,179 @@ aarch64_libgcc_cmp_return_mode (void)
>return SImode;
>  }
>  
> +#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
> +
> +/* We use the 12-bit shifted immediate arithmetic instructions so values
> +   must be multiple of (1 << 12), i.e. 4096.  */
> +#define ARITH_BASE 4096
> +
> +#if (PROBE_INTERVAL % ARITH_BASE) != 0
> +#error Cannot use simple address calculation for stack probing
> +#endif
> +
> +/* The pair of scratch registers used for stack probing.  */
> +#define PROBE_STACK_FIRST_REG  9
> +#define PROBE_STACK_SECOND_REG 10
> +
> +/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
> +   inclusive.  These are offsets from the current stack pointer.  */
> +
> +static void
> +aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
> +{
> +  rtx reg1 = gen_rtx_REG (ptr_mode, PROBE_STACK_FIRST_REG);
> +
> +  /* See the same assertion on PROBE_INTERVAL above.  */
> +  gcc_assert ((first % ARITH_BASE) == 0);
> +
> +  /* See if we have a constant small 

[PATCH, i386, AVX-512, PR68633] Fix order of operands in kunpck[bw,wd,dq] patterns.

2015-12-04 Thread Kirill Yukhin
Hello,
The patch below fixes a miscompare issue on Spec2k6/434.zeus.
Bootstrapped & reg-tested.

If no objections - I'll commit it into GCC main trunk on Monday.

gcc/
PR target/68633
* config/i386/sse.md (define_insn "kunpckhi"): Fix operands order.
(define_insn "kunpcksi"): Ditto.
(define_insn "kunpckdi"): Ditto.
gcc/testsuite
PR target/68633
* gcc.target/i386/pr68633.c: New test.

--
Thanks, K

commit 2379ca2e6a65c6373dde7c3f0b778216293f229d
Author: Kirill Yukhin 
Date:   Tue Dec 1 14:38:22 2015 +0300

AVX-512. Fix operands order in kunpck[bw,wd,dq].

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index cbb9ffd..5a79d04 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8800,7 +8800,7 @@
(const_int 8))
  (zero_extend:HI (match_operand:QI 2 "register_operand" "k"]
   "TARGET_AVX512F"
-  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  "kunpckbw\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "mode" "HI")
(set_attr "type" "msklog")
(set_attr "prefix" "vex")])
@@ -8813,7 +8813,7 @@
(const_int 16))
  (zero_extend:SI (match_operand:HI 2 "register_operand" "k"]
   "TARGET_AVX512BW"
-  "kunpckwd\t{%2, %1, %0|%0, %1, %2}"
+  "kunpckwd\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "mode" "SI")])
 
 (define_insn "kunpckdi"
@@ -8824,7 +8824,7 @@
(const_int 32))
  (zero_extend:DI (match_operand:SI 2 "register_operand" "k"]
   "TARGET_AVX512BW"
-  "kunpckdq\t{%2, %1, %0|%0, %1, %2}"
+  "kunpckdq\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "mode" "DI")])
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
diff --git a/gcc/testsuite/gcc.target/i386/pr68633.c b/gcc/testsuite/gcc.target/i386/pr68633.c
new file mode 100755
index 000..d7f513d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr68633.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-options "-Ofast -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+#include 
+
+#define AVX512F
+
+#include "avx512f-helper.h"
+
+void abort ();
+
+void
+TEST ()
+{
+  __mmask16 k1, k2, k3;
+
+  __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) );
+  __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) );
+
+  k3 = _mm512_kunpackb (k1, k2);
+  if (k3 != 0x201)
+abort ();
+}


[graphite] test needs pthreads

2015-12-04 Thread Nathan Sidwell
The recently added test graphite/id-28.c requires pthreads due to its use of 
-fcilkplus.  Committed as obvious.


nathan
2015-12-04  Nathan Sidwell  

	* gcc.dg/graphite/id-28.c: Requires pthreads.

Index: testsuite/gcc.dg/graphite/id-28.c
===
--- testsuite/gcc.dg/graphite/id-28.c	(revision 231265)
+++ testsuite/gcc.dg/graphite/id-28.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-options "-fcilkplus -floop-nest-optimize -O3" } */
+/* { dg-require-effective-target pthread } */
 
 #if HAVE_IO
 #include 


Re: [PATCH 1/2] s/390: Implement "target" attribute.

2015-12-04 Thread Dominik Vogt
Next version of the patch with the changes discussed internally.
(Sorry for the version number; I now use my internal branch
numbers for the sake of clarity.)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.opt (s390_arch_string): Remove.
(s390_tune_string): Likewise.
(s390_cost_pointer): Add Variable.
(s390_tune_flags): Add TargetVariable.
(s390_arch_flags, march=, mbackchain, mdebug, mesa, mhard-dfp),
(mhard-float, mlong-double-128, mlong-double-64, mhtm, mvx),
(mpacked-stack, msmall-exec, msoft-float, mstack-guard=, mstack-size=),
(mtune=, mmvcle, mzvector, mzarch, mbranch-cost=, mwarn-dynamicstack),
(mwarn-framesize=): Save option.
(mno-stack-guard, mno-stack-guard): New option.
(mwarn-dynamicstack): Allow mno-warn-dynamicstack.
(mwarn-framesize=): Convert to UInteger (negative values are rejected
now).
* config/s390/s390-c.c (s390_cpu_cpp_builtins_internal): Split setting
macros changeable through the GCC target pragma into a separate
function.
(s390_cpu_cpp_builtins): Likewise.
(s390_pragma_target_parse): New function, implement GCC target pragma
if enabled.
(s390_register_target_pragmas): Register s390_pragma_target_parse if
available.
* common/config/s390/s390-common.c (s390_handle_option):
Export.
Move setting s390_arch_flags to s390.c.
Remove s390_tune_flags.
Allow 0 as argument to -mstack-size (switch to default value).
Allow 0 as argument to -mstack-guard (switch off).
Remove now unnecessary explicit parsing code for -mwarn-framesize.
* config/s390/s390-protos.h (s390_handle_option): Export.
(s390_valid_target_attribute_tree): Export.
(s390_reset_previous_fndecl): Export.
* config/s390/s390-builtins.def: Use new macro B_GROUP to mark the start
and end of HTM and VX builtins.
(s390_asm_output_function_prefix): Declare hook.
(s390_asm_declare_function_size): Likewise.
* config/s390/s390-builtins.h (B_GROUP): Use macro.
* config/s390/s390-opts.h: Add comment about processor_type usage.
* config/s390/s390.h (TARGET_CPU_IEEE_FLOAT_P, TARGET_CPU_ZARCH_P),
(TARGET_CPU_LONG_DISPLACEMENT_P, TARGET_CPU_EXTIMM_P, TARGET_CPU_DFP_P),
(TARGET_CPU_Z10_P, TARGET_CPU_Z196_P, TARGET_CPU_ZEC12_P),
(TARGET_CPU_HTM_P, TARGET_CPU_Z13_P, TARGET_CPU_VX_P),
(TARGET_HARD_FLOAT_P, TARGET_LONG_DISPLACEMENT_P, TARGET_EXTIMM_P),
(TARGET_DFP_P, TARGET_Z10_P, TARGET_Z196_P, TARGET_ZEC12_P),
(TARGET_HTM_P, TARGET_Z13_P, TARGET_VX_P, TARGET_CPU_EXTIMM),
(TARGET_CPU_DFP, TARGET_CPU_Z10, TARGET_CPU_Z196, TARGET_CPU_ZEC12),
(TARGET_CPU_HTM, TARGET_CPU_Z13, TARGET_LONG_DISPLACEMENT),
(TARGET_EXTIMM, TARGET_DFP, TARGET_Z10, TARGET_Z196, TARGET_ZEC12),
(TARGET_Z13, TARGET_VX, S390_USE_TARGET_ATTRIBUTE),
(S390_USE_ARCHITECTURE_MODIFIERS, SWITCHABLE_TARGET),
(ASM_DECLARE_FUNCTION_SIZE, ASM_OUTPUT_FUNCTION_PREFIX): Likewise.
* config/s390/vecintrin.h: Use vector definitions even if __VEC__ is
undefined.
(vec_all_nan): Rewrite as macro using statement expressions to avoid
that the vector keyword needs to be defined when including the file.
(vec_all_numeric): Likewise.
(vec_any_nan): Likewise.
(vec_any_numeric):  Likewise.
* config/s390/s390.c (s390_previous_fndecl): New static variable.
(s390_set_current_function): New function.
(s390_cost): Wrapper macro to allow defining the cost table pointer in
the options file.
(processor_table): Table for march= and mtune= parsing.
(s390_init_builtins): Enable all builtins and types unconditionally.
(s390_expand_builtin): Generate an error message if builtin is not
supported by current options.
Correct an error message.
(s390_function_specific_restore): New function to set s390_cost.
(s390_asm_output_machine_for_arch): New function for emitting .machine
and .machinmode directives to the assembler file.
(s390_asm_output_function_prefix): Likewise.
(s390_asm_declare_function_size):  Likewise.
(s390_asm_output_function_label): Add mdebug output for feature testing.
(s390_option_override): Move implementation into internal function.
(s390_option_override_internal): Likewise.
Implement option overriding based on current options.
(s390_valid_target_attribute_inner_p): New function implementing target
attribute logic.
(s390_valid_target_attribute_tree): Likewise.
(s390_valid_target_attribute_p): Likewise.
(s390_reset_previous_fndecl): Likewise.
(s390_set_current_function): Likewise.
(TARGET_SET_CURRENT_FUNCTION): Provide 

Re: [PATCH 2/2] rs6000: Clean up the cstore code a bit

2015-12-04 Thread David Edelsohn
On Fri, Dec 4, 2015 at 9:34 AM, Segher Boessenkool
 wrote:
> "register_operand" was a bit confusing.  Also some other minor cleanups.
>
> Tested on powerpc64-linux; okay for mainline?
>
>
> Segher
>
>
> 2015-12-04  Segher Boessenkool  
>
> * (cstore4_unsigned): Use gpc_reg_operand instead of
> register_operand.  Remove empty constraints.  Use std::swap.
> (cstore_si_as_di, cstore4_signed_imm,
> cstore4_unsigned_imm, cstore4 for GPR): Use
> gpc_reg_operand instead of register_operand.
> (cstore4 for FP): Use gpc_reg_operand instead of
> register_operand.  Remove empty constraints.

Okay.

Thanks, David


Re: [Patch, Fortran] PR45859 - Permit array elements to coarray dummy arguments

2015-12-04 Thread Tobias Burnus
I pressed "Send" too early - as the testsuite fails unless the following
patch is applied. I think I will just use this test case (with patch)
instead of adding a new test-suite file. Required patch:

--- a/gcc/testsuite/gfortran.dg/coarray_args_2.f90
+++ b/gcc/testsuite/gfortran.dg/coarray_args_2.f90
@@ -40,8 +40,7 @@ program rank_mismatch_02
   sync all

   call subr(ndim, a(1:1,2)) ! OK
-  call subr(ndim, a(1,2)) ! { dg-error "must be simply contiguous" }
-  ! See also F08/0048 and PR 45859 about the validity
+  call subr(ndim, a(1,2)) ! See also F08/0048 and PR 45859 about the validity
   if (this_image() == 1) then
  write(*, *) 'OK'
   end if


Tobias

On Fri, Dec 04, 2015 at 01:39:22PM +0100, Tobias Burnus wrote:
> This patch permits
> 
>interface
>   subroutine sub (x)
>  real x(10)[*]
>   end subroutine
>end interface
>real :: x(100)[*]
>call sub (x(10))
>end
> 
> where one passes an array element ("x(10)") of a contiguous array to a
> coarray dummy argument. That's permitted per interpretation request
> F08/0048, which ended up in Fortran 2008's Corrigendum 2 - and is also
> in the current Fortran 2015 drafts:
> 
> "If the dummy argument is an array coarray that has the CONTIGUOUS attribute
>  or is not of assumed shape, the corresponding actual argument shall be
>  simply contiguous or an element of a simply contiguous array."
> 
> the "or ..." of the last line was added in the corrigendum.
> 
> 
> I hope and think that I got the true/false of the other users correct - in
> most cases, it probably doesn't matter as the caller is only reached for
> expr->rank > 0.
> 
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk?
> 
> Tobias

> gcc/fortran
>   PR fortran/45859
>   * expr.c (gfc_is_simply_contiguous): Optionally permit array elements.
>   (gfc_check_pointer_assign): Update call.
>   * interface.c (compare_parameter): Ditto.
>   * trans-array.c (gfc_conv_array_parameter): Ditto.
>   * trans-intrinsic.c (gfc_conv_intrinsic_transfer,
>   conv_isocbinding_function): Ditto.
>   * gfortran.h (gfc_is_simply_contiguous):
> 
> gcc/testsuite/
>   PR fortran/45859
>   * gfortran.dg/coarray_argument_1.f90: New.
> 
> diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
> index 2aeb0b5..5dd90ef 100644
> --- a/gcc/fortran/expr.c
> +++ b/gcc/fortran/expr.c
> @@ -3683,7 +3683,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr 
> *rvalue)
>and F2008 must be allowed.  */
>if (rvalue->rank != 1)
>   {
> -   if (!gfc_is_simply_contiguous (rvalue, true))
> +   if (!gfc_is_simply_contiguous (rvalue, true, false))
>   {
> gfc_error ("Rank remapping target must be rank 1 or"
>" simply contiguous at %L", >where);
> @@ -4601,7 +4601,7 @@ gfc_has_ultimate_pointer (gfc_expr *e)
> a "(::1)" is accepted.  */
>  
>  bool
> -gfc_is_simply_contiguous (gfc_expr *expr, bool strict)
> +gfc_is_simply_contiguous (gfc_expr *expr, bool strict, bool permit_element)
>  {
>bool colon;
>int i;
> @@ -4615,7 +4615,7 @@ gfc_is_simply_contiguous (gfc_expr *expr, bool strict)
>else if (expr->expr_type != EXPR_VARIABLE)
>  return false;
>  
> -  if (expr->rank == 0)
> +  if (!permit_element && expr->rank == 0)
>  return false;
>  
>for (ref = expr->ref; ref; ref = ref->next)
> diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
> index 9f61e45..d203c32 100644
> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -2982,7 +2982,7 @@ void gfc_free_actual_arglist (gfc_actual_arglist *);
>  gfc_actual_arglist *gfc_copy_actual_arglist (gfc_actual_arglist *);
>  const char *gfc_extract_int (gfc_expr *, int *);
>  bool is_subref_array (gfc_expr *);
> -bool gfc_is_simply_contiguous (gfc_expr *, bool);
> +bool gfc_is_simply_contiguous (gfc_expr *, bool, bool);
>  bool gfc_check_init_expr (gfc_expr *);
>  
>  gfc_expr *gfc_build_conversion (gfc_expr *);
> diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
> index f74239d..bfd5d36 100644
> --- a/gcc/fortran/interface.c
> +++ b/gcc/fortran/interface.c
> @@ -2020,7 +2020,7 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual,
>  
>/* F2008, C1241.  */
>if (formal->attr.pointer && formal->attr.contiguous
> -  && !gfc_is_simply_contiguous (actual, true))
> +  && !gfc_is_simply_contiguous (actual, true, false))
>  {
>if (where)
>   gfc_error ("Actual argument to contiguous pointer dummy %qs at %L "
> @@ -2131,15 +2131,17 @@ compare_parameter (gfc_symbol *formal, gfc_expr 
> *actual,
>  
>if (formal->attr.codimension)
>  {
> -  /* F2008, 12.5.2.8.  */
> +  /* F2008, 12.5.2.8 + Corrig 2 (IR F08/0048).  */
> +  /* F2015, 12.5.2.8.  */
>if (formal->attr.dimension
> && (formal->attr.contiguous || formal->as->type != AS_ASSUMED_SHAPE)
> && gfc_expr_attr (actual).dimension
> -   

[PTX] prologue emission

2015-12-04 Thread Nathan Sidwell
The PTX backend has to emit PTX function prototypes and prologues in addition to
the regular argument marshalling machinery.  And of course these all have to
agree with each other.  The first two are currently done by two different pieces
of code.


This patch changes things so that prototype and prologue argument emission is
done by the same piece of code.  A slight wart there is that prototype emission
uses a C++ std::stringstream whereas prologue emission (mainly) used ye olde
FILE *.  The patch extends the use of the stringstream a little further.


You'll notice that argument promotion is slightly different in the two paths. 
I've not changed that behaviour, but I suspect it may be an error.  Will relook 
at that when I'm done reducing the duplication in the function calling machinery.


nathan
2015-12-04  Nathan Sidwell  

	* config/nvptx/nvptx.c (write_one_arg): Deal with prologue
	emission too. Change 'no_arg_types' to 'prototyped'.
	(write_fn_proto):  Use write_one_arg for stdarg, static chain &
	main.
	(nvptx_declare_function_name): Use write_one_arg for prologue copies.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231265)
+++ config/nvptx/nvptx.c	(working copy)
@@ -389,38 +389,67 @@ arg_promotion (machine_mode mode)
   return mode;
 }
 
-/* Write the declaration of a function arg of TYPE to S.  I is the index
-   of the argument, MODE its mode.  NO_ARG_TYPES is true if this is for
-   a decl with zero TYPE_ARG_TYPES, i.e. an old-style C decl.  */
+/* Process function parameter TYPE, either emitting in a prototype
+   argument, or as a copy a in a function prologue.  ARGNO is the
+   index of this argument in the PTX function.  FOR_REG is negative,
+   if we're emitting the PTX prototype.  It is zero if we're copying
+   to an argument register and it is greater than zero if we're
+   copying to a specific hard register.  PROTOTYPED is true, if this
+   is a prototyped function, rather than an old-style C declaration.
+
+   The behaviour here must match the regular GCC function parameter
+   marshalling machinery.  */
 
 static int
-write_one_arg (std::stringstream , const char *sep, int i,
-	   tree type, machine_mode mode, bool no_arg_types)
+write_one_arg (std::stringstream , int for_reg, int argno,
+	   tree type, bool prototyped)
 {
+  machine_mode mode = TYPE_MODE (type);
+
   if (!PASS_IN_REG_P (mode, type))
 mode = Pmode;
 
   machine_mode split = maybe_split_mode (mode);
   if (split != VOIDmode)
 {
-  i = write_one_arg (s, sep, i, TREE_TYPE (type), split, false);
-  sep = ", ";
   mode = split;
+  argno = write_one_arg (s, for_reg, argno,
+			 TREE_TYPE (type), prototyped);
 }
 
-  if (no_arg_types && !AGGREGATE_TYPE_P (type))
+  if (!prototyped && !AGGREGATE_TYPE_P (type))
 {
   if (mode == SFmode)
 	mode = DFmode;
   mode = arg_promotion (mode);
 }
 
-  s << sep;
-  s << ".param" << nvptx_ptx_type_from_mode (mode, false) << " %in_ar"
-<< i << (mode == QImode || mode == HImode ? "[1]" : "");
-  if (mode == BLKmode)
-s << "[" << int_size_in_bytes (type) << "]";
-  return i + 1;
+  if (for_reg < 0)
+{
+  /* Writing PTX prototype.  */
+  s << (argno ? ", " : " (");
+  s << ".param" << nvptx_ptx_type_from_mode (mode, false)
+	<< " %in_ar" << argno;
+  if (mode == QImode || mode == HImode)
+	s << "[1]";
+}
+  else
+{
+  mode = arg_promotion (mode);
+  s << "\t.reg" << nvptx_ptx_type_from_mode (mode, false) << " ";
+  if (for_reg)
+	s << reg_names[for_reg];
+  else
+	s << "%ar" << argno;
+  s << ";\n";
+  s << "\tld.param" << nvptx_ptx_type_from_mode (mode, false) << " ";
+  if (for_reg)
+	s << reg_names[for_reg];
+  else
+	s << "%ar" << argno;
+  s<< ", [%in_ar" << argno << "];\n";
+}
+  return argno + 1;
 }
 
 /* Look for attributes in ATTRS that would indicate we must write a function
@@ -507,16 +536,11 @@ write_fn_proto (std::stringstream , bo
 
   s << name;
 
-  const char *sep = " (";
-  int i = 0;
+  int argno = 0;
 
   /* Emit argument list.  */
   if (return_in_mem)
-{
-  s << sep << ".param.u" << GET_MODE_BITSIZE (Pmode) << " %in_ar0";
-  sep  = ", ";
-  i++;
-}
+argno = write_one_arg (s, -1, argno, ptr_type_node, true);
 
   /* We get:
  NULL in TYPE_ARG_TYPES, for old-style functions
@@ -524,46 +548,34 @@ write_fn_proto (std::stringstream , bo
declaration.
  So we have to pick the best one we have.  */
   tree args = TYPE_ARG_TYPES (fntype);
-  bool null_type_args = !args;
-  if (null_type_args)
-args = DECL_ARGUMENTS (decl);
+  bool prototyped = true;
+  if (!args)
+{
+  args = DECL_ARGUMENTS (decl);
+  prototyped = false;
+}
 
   for (; args; args = TREE_CHAIN (args))
 {
-  tree type = null_type_args ? TREE_TYPE (args) : TREE_VALUE (args);
-  machine_mode mode = TYPE_MODE (type);
+ 

Re: [PATCH][AArch64] Don't allow -mgeneral-regs-only to change the .arch assembler directives

2015-12-04 Thread Kyrill Tkachov

Ping.
This almost fell through the cracks.
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00055.html

Thanks,
Kyrill

On 01/10/15 14:00, Kyrill Tkachov wrote:

Hi all,

As part of the SWITCHABLE_TARGET work I inadvertently changed the behaviour of 
-mgeneral-regs-only with respect to the .arch directives that we emit.
The behaviour of -mgeneral-regs-only in GCC 5 and earlier is such that it 
disallows the usage of FP/SIMD registers but does *not* stop the compiler from
emitting the +fp,+simd etc extensions in the .arch directive of the generated 
assembly. This is to accommodate users who may want to write inline assembly
in a file compiled with -mgeneral-regs-only.

This patch restores the trunk behaviour in that respect to that of GCC 5 and 
the documentation for the option is tweaked a bit to reflect that.
Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-01  Kyrylo Tkachov  

  * config/aarch64/aarch64.c (aarch64_override_options_internal):
  Do not alter target_flags due to TARGET_GENERAL_REGS_ONLY_P.
  * doc/invoke.texi (AArch64 options): Mention that -mgeneral-regs-only
  does not affect the assembler directives.

2015-10-01  Kyrylo Tkachov  

  * gcc.target/aarch64/mgeneral-regs_4.c: New test.




Re: [PATCH 2/2] s/390: Implement "target" attribute.

2015-12-04 Thread Andreas Krebbel
On 09/25/2015 04:02 PM, Dominik Vogt wrote:
> On Fri, Sep 25, 2015 at 02:59:41PM +0100, Dominik Vogt wrote:
>> The following set of two patches implements the function
>> __attribute__ ((target("..."))) and the corresponding #pragma GCC
>> target("...") on S/390.  It comes with certain limitations:
>>
>>  * It is not possible to change any options that affect the ABI or
>>the definition of target macros by using the attribute (vx,
>>htm, zarch and others).  Some of them are still supported but
>>unable to change the definition of the corresponding target macros.
>>In these cases, the pragma has to be used.  One reason for this
>>is that it is not possible to change the definition of the target
>>macros with the attribute, but the implementation of some features
>>relies on them.
>>
>>  * Even with the pragma it is not possible to switch between zarch
>>and esa architecture because internal data types would have to be
>>changed at GCC run time.
>>
>> The second patch contains a long term change in the interface with
>> the assembler.  Currently, the compiler wrapper passes the same
>> -march= and -mtune= options to the compiler and the assembler.
>> The patch makes this obsolete by emitting ".machine" and
>> ".machinemode" directives to the top of the assembly language file.
>> The old way is still supported but may be removed once the
>> ".machine" feature is supported by all as versions in the field.
>>
>> The second patch depends on the first one, and both require the
>> (latest) change proposed in this thread:
>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01546.html
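
The general shape of the feature being applied looks like this (the option
string is purely illustrative of the syntax; see the quoted limitations above
for what the attribute form can and cannot change):

  /* Per-function override via the attribute.  */
  __attribute__ ((target ("arch=z13")))
  int
  use_newer_insns (int *p)
  {
    return *p + 1;
  }

  /* File-wide override via the pragma, which is the form needed for the
     options that affect the ABI or the target macros.  */
  #pragma GCC target ("arch=z13")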

Applied. Thanks!

-Andreas-




Re: [PATCH 01/10] C++ FE: expression ranges v4

2015-12-04 Thread Jason Merrill

On 12/03/2015 09:55 AM, David Malcolm wrote:

@@ -362,10 +362,11 @@ convert_to_real_1 (tree type, tree expr, bool fold_p)
  case REAL_TYPE:
/* Ignore the conversion if we don't need to store intermediate
 results and neither type is a decimal float.  */
-  return build1 ((flag_float_store
-|| DECIMAL_FLOAT_TYPE_P (type)
-|| DECIMAL_FLOAT_TYPE_P (itype))
-? CONVERT_EXPR : NOP_EXPR, type, expr);
+  return build1_loc (loc,
+(flag_float_store
+ || DECIMAL_FLOAT_TYPE_P (type)
+ || DECIMAL_FLOAT_TYPE_P (itype))
+? CONVERT_EXPR : NOP_EXPR, type, expr);



@@ -5438,7 +5438,7 @@ build_nop (tree type, tree expr)
 {
   if (type == error_mark_node || error_operand_p (expr))
 return expr;
-  return build1 (NOP_EXPR, type, expr);
+  return build1_loc (EXPR_LOCATION (expr), NOP_EXPR, type, expr);


Hmm, I'm uneasy about assigning a location to a conversion or other 
expression that doesn't correspond to particular text; it could be 
associated with the location of the operand or the enclosing expression 
that prompted the conversion.  I think we've been deliberately leaving 
the location unset.  But that causes problems with code that only looks 
at the top-level EXPR_LOCATION.  Arguably such code should be fixed to 
look at the pre-conversion expression tree for a location, but I guess 
this is reasonable.


Past GCC 6 I think we definitely want to use a new tree code rather than 
cp_expr; as Jakub pointed out, cp_expr doesn't do anything for templates 
or language-independent code.


The current patchset is OK for GCC 6.

Jason



Re: [PATCH] Add XFAIL to g++.dg/template/ref3.C (PR c++/68699)

2015-12-04 Thread Jason Merrill

On 12/04/2015 11:45 AM, David Malcolm wrote:

On Fri, 2015-12-04 at 11:01 -0500, Jason Merrill wrote:

On 12/03/2015 05:08 PM, David Malcolm wrote:

On Thu, 2015-12-03 at 15:38 -0500, Jason Merrill wrote:

On 12/03/2015 09:55 AM, David Malcolm wrote:

Testcase g++.dg/template/ref3.C:

1   // PR c++/28341
2
3   template struct A {};
4
5   template struct B
6   {
7 A<(T)0> b; // { dg-error "constant|not a valid" }
8 A a; // { dg-error "constant|not a valid" }
9   };
   10
   11   B b;

The output of this test for both c++11 and c++14 is unaffected
by the patch kit:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:11: error: '0' is not a valid template argument for 
type 'const int&' because it is not an lvalue
g++.dg/template/ref3.C:8:11: error: '0' is not a valid template argument for 
type 'const int&' because it is not an lvalue

However, the c++98 output is changed:

Status quo for c++98:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:11: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression
g++.dg/template/ref3.C:8:11: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression

(line 7 and 8 are at the closing semicolon for fields b and a)

With the patchkit for c++98:
g++.dg/template/ref3.C: In instantiation of 'struct B':
g++.dg/template/ref3.C:11:15:   required from here
g++.dg/template/ref3.C:7:5: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression
g++.dg/template/ref3.C:7:5: error: a cast to a type other than an integral or 
enumeration type cannot appear in a constant-expression

So the 2nd:
 "error: a cast to a type other than an integral or enumeration type cannot 
appear in a constant-expression"
moves from line 8 to line 7 (and the locations move earlier in the line, now that they have ranges)

What's happening is that cp_parser_enclosed_template_argument_list
builds a CAST_EXPR, the first time from cp_parser_cast_expression,
the second time from cp_parser_functional_cast; these have locations
representing the correct respective caret, i.e.:

  A<(T)0> b;
^~~~

and:

  A a;
^~~~

Eventually finish_template_type is called for each, to build a RECORD_TYPE,
and we get a cache hit the 2nd time through here in pt.c:
8281  hash = spec_hasher::hash ();
8282  entry = type_specializations->find_with_hash (, hash);
8283
8284  if (entry)
8285return entry->spec;

due to:
 template_args_equal (ot=, nt=) at ../../src/gcc/cp/pt.c:7778
which calls:
 cp_tree_equal (t1=, t2=) at ../../src/gcc/cp/tree.c:2833
and returns equality.

Hence we get a single RECORD_TYPE for the type A<(T)(0)>, and hence
when issuing the errors it uses the TREE_VEC for the first one,
using the location of the first line.


Why does the type sharing affect where the parser gives the error?


I believe what's happening is that the patchkit is setting location_t
values for more expressions than before, including the expression for
the template param.  pt.c:tsubst_expr has this:

if (EXPR_HAS_LOCATION (t))
  input_location = EXPR_LOCATION (t);

I believe that before (in the status quo), the substituted types didn't
have location_t values, and hence the above conditional didn't fire;
input_location was coming from a *token* where the expansion happened,
hence we got an error message on the relevant line for each expansion.

With the patch, the substituted types have location_t values within
their params, hence the conditional above fires: input_location is
updated to use the EXPR_LOCATION, which comes from that of the param
within the type - but with type-sharing it's using the first place where
the type is created.

Perhaps a better fix is for cp_parser_non_integral_constant_expression
to take a location_t, rather than have it rely on input_location?


Ah, I see, the error is coming from tsubst_copy_and_build, not
cp_parser_non_integral_constant_expression.  So indeed this is an effect
of the canonicalization of template instances, and we aren't going to
fix it in the context of this patchset.  But this is still a bug, so I'd
rather have an xfail and a PR than change the expected output.


Is the following what you had in mind?


Yes, thanks.

Jason




Re: [PATCH] Use explicit UNKNOWN_LOCATION instead of input_location (which is line 1) for process_options diagnostics (PR c/68656)

2015-12-04 Thread Bernd Schmidt

On 12/04/2015 05:45 PM, Jakub Jelinek wrote:

This patch fixes it to use explicit UNKNOWN_LOCATION, instead of
explicit or implicit input_location, which for most of process_options
is somewhere on line 1 of the main source file.


Ok.


Bernd



Re: [RFA][PATCH] Run CFG cleanups after reassociation as needed

2015-12-04 Thread Richard Biener
On December 4, 2015 5:20:52 PM GMT+01:00, Jeff Law  wrote:
>On 12/04/2015 03:19 AM, Richard Biener wrote:
>> On Thu, Dec 3, 2015 at 6:54 PM, Jeff Law  wrote:
>>> This is something I noticed while working on fixing 67816.
>>>
>>> Essentially I was seeing trivially true or trivially false
>conditionals left
>>> in the IL for DOM to clean up.
>>>
>>> While DOM can and will clean that crud up, but a trivially true or
>trivially
>>> false conditional ought to be detected and cleaned up by
>cleanup_cfg.
>>>
>>> It turns out the reassociation pass does not schedule a CFG cleanup
>even in
>>> cases where it optimizes a conditional to TRUE or FALSE.
>>>
>>> Bubbling up an indicator that we optimized away a conditional and
>using that
>>> to trigger a CFG cleanup is trivial.
>>>
>>> While I have a slight preference to see this fix in GCC 6, if folks
>object
>>> and want this to wait for GCC 7 stage1, I'd understand.
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu.
>>>
>>> OK for the trunk?
>>
>> Ok.  [I always hoped we can at some point assert we don't have
>trivially
>> optimizable control flow in the IL]
>Presumably verification when a pass has completed, but not set 
>TODO_cleanup_cfg?

Well, kind-of, yes.

Richard.

>Jeff




Re: [PATCH][GCC] Make stackalign test LTO proof

2015-12-04 Thread Bernd Schmidt

On 12/04/2015 04:18 PM, Andre Vieira wrote:

Reworked following Joern's suggestion.

Is this OK?


Yes.


Bernd



[patch] libstdc++/57060 cope with invalid thread IDs

2015-12-04 Thread Jonathan Wakely

This patch ensures that this_thread::get_id() returns a value that is
distinct from the "not a thread" value, and avoids undefined behaviour
in pthread_equal.

Previously programs using glibc but not linking to libpthread would
get the "not a thread" value for std::this_thread::get_id() in main.
We were also using pthread_equal() with the "not a thread" value,
which is undefined.

This change assumes that pthread_t is EqualityComparable, but we
already have to assume it's LessThanComparable and apparently that
works everywhere that std::thread is supported, so this should too.

If it fails then we can do something smarter like dispatch based on
whether __gthread_t is an EqualityComparable scalar type, but I'd
prefer to avoid that complexity if this simpler version works.

Tested powerpc64le-linux, committed to trunk.

commit 64e418cb8e285bfa6079f176105a38ded456a976
Author: Jonathan Wakely 
Date:   Fri Dec 4 13:58:56 2015 +

PR libstdc++/57060 cope with invalid thread IDs

	PR libstdc++/57060
	* include/std/thread (operator==(thread::id, thread::id)): Do not use
	__gthread_equal.
	(operator<(thread::id, thread::id)): Add comment.
	(this_thread::get_id()): Do not use __gthread_self for single-threaded
	programs using glibc.
	* testsuite/30_threads/this_thread/57060.cc: New.

diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 8c01feb..efdd83e 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -89,11 +89,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   friend bool
   operator==(thread::id __x, thread::id __y) noexcept
-  { return __gthread_equal(__x._M_thread, __y._M_thread); }
+  {
+	// pthread_equal is undefined if either thread ID is not valid, so we
+	// can't safely use __gthread_equal on default-constructed values (nor
+	// the non-zero value returned by this_thread::get_id() for
+	// single-threaded programs using GNU libc). Assume EqualityComparable.
+	return __x._M_thread == __y._M_thread;
+  }
 
   friend bool
   operator<(thread::id __x, thread::id __y) noexcept
-  { return __x._M_thread < __y._M_thread; }
+  {
+	// Pthreads doesn't define any way to do this, so we just have to
+	// assume native_handle_type is LessThanComparable.
+	return __x._M_thread < __y._M_thread;
+  }
 
   template
 	friend basic_ostream<_CharT, _Traits>&
@@ -269,7 +279,18 @@ _GLIBCXX_END_NAMESPACE_VERSION
 
 /// get_id
 inline thread::id
-get_id() noexcept { return thread::id(__gthread_self()); }
+get_id() noexcept
+{
+#ifdef __GLIBC__
+  // For the GNU C library pthread_self() is usable without linking to
+  // libpthread.so but returns 0, so we cannot use it in single-threaded
+  // programs, because this_thread::get_id() != thread::id{} must be true.
+  // We know that pthread_t is an integral type in the GNU C library.
+  if (!__gthread_active_p())
+	return thread::id(1);
+#endif
+  return thread::id(__gthread_self());
+}
 
 /// yield
 inline void
diff --git a/libstdc++-v3/testsuite/30_threads/this_thread/57060.cc b/libstdc++-v3/testsuite/30_threads/this_thread/57060.cc
new file mode 100644
index 000..c932719
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/this_thread/57060.cc
@@ -0,0 +1,37 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target *-*-gnu* } }
+// { dg-options "-std=gnu++11" }
+// { dg-require-gthreads "" }
+
+// N.B. this test intentionally does *not* use -pthread
+
+#include 
+#include 
+
+void
+test01()
+{
+  VERIFY( std::this_thread::get_id() != std::thread::id() );
+}
+
+int
+main()
+{
+  test01();
+}


Re: Add fuzzing coverage support

2015-12-04 Thread Jakub Jelinek
On Fri, Dec 04, 2015 at 06:32:38PM +0100, Dmitry Vyukov wrote:
> +2015-12-04  Dmitry Vyukov  
> +
> + * sancov.c: New file.
> + * Makefile.in (OBJS): Add sancov.o.
> + * invoke.texi (-fsanitize-coverage=trace-pc): Describe.
> + * passes.def (sancov_pass): Add.
> + * tree-pass.h  (sancov_pass): Add.
> + * common.opt (-fsanitize-coverage=trace-pc): Add.
> + * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_PC): Add.
> + * builtins.def (DEF_SANITIZER_BUILTIN): Enable for
> + flag_sanitize_coverage.

This is ok for trunk.

Jakub


Re: Ping [PATCH] c++/42121 - diagnose invalid flexible array members

2015-12-04 Thread Jason Merrill

On 12/03/2015 11:42 PM, Martin Sebor wrote:

+ if (next && TREE_CODE (next) == FIELD_DECL)


This will break if there's a non-field between the array and the next field.


@@ -4114,7 +4115,10 @@ walk_subobject_offsets (tree type,

   /* Avoid recursing into objects that are not interesting.  */
   if (!CLASS_TYPE_P (element_type)
- || !CLASSTYPE_CONTAINS_EMPTY_CLASS_P (element_type))
+ || !CLASSTYPE_CONTAINS_EMPTY_CLASS_P (element_type)
+ || !domain
+ /* Flexible array members have no upper bound.  */
+ || !TYPE_MAX_VALUE (domain))


Why is this desirable?  We do want to avoid empty bases at the same 
address as a flexible array of the same type.



+   && (TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST
+   || !tree_int_cst_equal (size_zero_node, TYPE_SIZE (type)));


This can be integer_zerop.


+   *seen_field = *seen_field || field_nonempty_p (f), fld = next)


Please add parens around the || expression.


+ && !tree_int_cst_equal (size_max_node, TYPE_MAX_VALUE (dom)))


This can be integer_minus_onep or integer_all_onesp.


+its fields.  The recursive call to the function will
+either return 0 or the flexible array member whose


Let's say NULL_TREE here rather than 0.


+  {
+bool dummy = false;
+check_flexarrays (t, TYPE_FIELDS (t), );
+  }


This should be called from check_bases_and_members, or even integrated 
into check_field_decls.



- else if (name)
-   pedwarn (input_location, OPT_Wpedantic, "ISO C++ forbids zero-size array 
%qD", name);


Why?


@@ -10912,11 +10916,19 @@ grokdeclarator (const cp_declarator *declarator,
if (!staticp && TREE_CODE (type) == ARRAY_TYPE
&& TYPE_DOMAIN (type) == NULL_TREE)
  {
-   tree itype = compute_array_index_type (dname, integer_zero_node,
+   if (TREE_CODE (ctype) == UNION_TYPE
+   || TREE_CODE (ctype) == QUAL_UNION_TYPE)
+ {
+   error ("flexible array member in union");
+   type = error_mark_node;
+ }
+   else
+ {
+   tree itype = compute_array_index_type (dname, NULL_TREE,
   tf_warning_or_error);
type = build_cplus_array_type (TREE_TYPE (type), itype);
  }


Can we leave TYPE_DOMAIN null for flexible arrays so you don't need to 
add special new handling all over the place?



-tree decl;
+tree decl = NULL_TREE;


Why?


+++ b/gcc/testsuite/g++.dg/cpp0x/bad_array_new2.C
@@ -1,7 +1,16 @@
 // Test for throwing bad_array_new_length on invalid array length
 // { dg-do run { target c++11 } }

-#include 
+// #include 
+
+namespace std {
+struct exception {
+virtual ~exception () { }
+};
+
+struct bad_alloc: exception { };
+struct bad_array_new_length { };
+}


Why?

Jason



Re: [PATCH] Fix -Werror= handling for Joined warnings, add a few missing Warning keywords (PRs c/48088, c/68657)

2015-12-04 Thread Jakub Jelinek
On Fri, Dec 04, 2015 at 06:01:58PM +0100, Bernd Schmidt wrote:
> I think marking stuff with Warning as appropriate qualifies as obvious.
> 
> On 12/04/2015 05:37 PM, Jakub Jelinek wrote:
> >+  /* If the switch takes an integer, convert it.  */
> >+  if (arg && cl_options[opt_index].cl_uinteger)
> >+{
> >+  value = integral_argument (arg);
> >+  if (value == -1)
> >+return;
> >+}
> 
> So does this issue an error message anywhere or just silently drop the
> option on the floor if the argument is invalid?

Silently accepted.  Following updated patch accepts e.g.:
-Wlarger-than=5
-Werror=larger-than=5
-Wnormalized=none
-Werror=normalized=none
and rejects (with the same diagnostics between -WXXX=YYY and -Werror=XXX=YYY):
-Wlarger-than=none
-Werror=larger-than=none
-Wlarger-than=
-Werror=larger-than=
-Wnormalized=all
-Werror=normalized=all
-Wnormalized=
-Werror=normalized=
(as examples of Warning UInteger Joined and Warning Enum Joined options).

> >+  /* If the switch takes an enumerated argument, convert it.  */
> >+  if (arg && (cl_options[opt_index].var_type == CLVC_ENUM))
> 
> Unnecessary parens.

Fixed.

2015-12-04  Jakub Jelinek  

PR c/48088
PR c/68657
* common.opt (Wframe-larger-than=): Add Warning.
* opts.h (control_warning_option): Add ARG argument.
* opts-common.c (control_warning_option): Likewise.
If non-NULL, decode it if needed and pass through
to handle_generated_option.  Handle CLVC_ENUM like
CLVC_BOOLEAN.
* opts.c (common_handle_option): Adjust control_warning_option
caller.
(enable_warning_as_error): Likewise.
c-family/
* c.opt (Wfloat-conversion, Wsign-conversion): Add Warning.
* c-pragma.c (handle_pragma_diagnostic): Adjust
control_warning_option caller.
ada/
* gcc-interface/trans.c (Pragma_to_gnu): Adjust
control_warning_option caller.
testsuite/
* c-c++-common/pr68657-1.c: New test.
* c-c++-common/pr68657-2.c: New test.
* c-c++-common/pr68657-3.c: New test.

--- gcc/common.opt.jj   2015-12-04 17:19:01.873180339 +0100
+++ gcc/common.opt  2015-12-04 18:07:31.901544973 +0100
@@ -576,7 +576,7 @@ Common Var(flag_fatal_errors)
 Exit on the first error occurred.
 
 Wframe-larger-than=
-Common RejectNegative Joined UInteger
+Common RejectNegative Joined UInteger Warning
 -Wframe-larger-than=   Warn if a function's stack frame requires more 
than  bytes.
 
 Wfree-nonheap-object
--- gcc/opts.h.jj   2015-12-04 17:19:01.939179455 +0100
+++ gcc/opts.h  2015-12-04 18:07:31.901544973 +0100
@@ -363,7 +363,7 @@ extern void read_cmdline_option (struct
 const struct cl_option_handlers *handlers,
 diagnostic_context *dc);
 extern void control_warning_option (unsigned int opt_index, int kind,
-   bool imply, location_t loc,
+   const char *arg, bool imply, location_t loc,
unsigned int lang_mask,
const struct cl_option_handlers *handlers,
struct gcc_options *opts,
--- gcc/opts-common.c.jj2015-12-04 17:19:01.854180594 +0100
+++ gcc/opts-common.c   2015-12-04 18:44:15.956606061 +0100
@@ -1332,8 +1332,8 @@ get_option_state (struct gcc_options *op
used by -Werror= and #pragma GCC diagnostic.  */
 
 void
-control_warning_option (unsigned int opt_index, int kind, bool imply,
-   location_t loc, unsigned int lang_mask,
+control_warning_option (unsigned int opt_index, int kind, const char *arg,
+   bool imply, location_t loc, unsigned int lang_mask,
const struct cl_option_handlers *handlers,
struct gcc_options *opts,
struct gcc_options *opts_set,
@@ -1347,10 +1347,89 @@ control_warning_option (unsigned int opt
 diagnostic_classify_diagnostic (dc, opt_index, (diagnostic_t) kind, loc);
   if (imply)
 {
+  const struct cl_option *option = _options[opt_index];
+
   /* -Werror=foo implies -Wfoo.  */
-  if (cl_options[opt_index].var_type == CLVC_BOOLEAN)
-   handle_generated_option (opts, opts_set,
-opt_index, NULL, 1, lang_mask,
-kind, loc, handlers, dc);
+  if (option->var_type == CLVC_BOOLEAN || option->var_type == CLVC_ENUM)
+   {
+ int value = 1;
+
+ if (arg && *arg == '\0' && !option->cl_missing_ok)
+   arg = NULL;
+
+ if ((option->flags & CL_JOINED) && arg == NULL)
+   {
+ if (option->missing_argument_error)
+   error_at (loc, option->missing_argument_error,
+ option->opt_text);
+ else
+   error_at (loc, "missing argument to 

Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-12-04 Thread Dominik Vogt
Version 6 with another fix.  This should work now.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.md ("movstr", "*movstr"): Fix warning.
("movstr"): New indirect expanders used by "movstr".
gcc/testsuite/ChangeLog

* gcc.target/s390/md/movstr-1.c: New test.
* gcc.target/s390/s390.exp: Add subdir md.
Do not run hotpatch tests twice.
>From 3bbf6dd63bfd290848f8445f4309c5fcda92f18b Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Tue, 3 Nov 2015 18:03:02 +0100
Subject: [PATCH] S/390: Fix warning in "*movstr" pattern.

---
 gcc/config/s390/s390.md | 19 ---
 gcc/testsuite/gcc.target/s390/md/movstr-1.c | 24 
 gcc/testsuite/gcc.target/s390/s390.exp  | 25 -
 3 files changed, 60 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/movstr-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index e5db537..bc24a36 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2910,13 +2910,26 @@
 ;
 
 (define_expand "movstr"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "memory_operand" "")
+   (match_operand 2 "memory_operand" "")]
+  ""
+{
+  if (TARGET_64BIT)
+emit_insn (gen_movstrdi (operands[0], operands[1], operands[2]));
+  else
+emit_insn (gen_movstrsi (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "movstr"
   [(set (reg:SI 0) (const_int 0))
(parallel
 [(clobber (match_dup 3))
  (set (match_operand:BLK 1 "memory_operand" "")
 	  (match_operand:BLK 2 "memory_operand" ""))
- (set (match_operand 0 "register_operand" "")
-	  (unspec [(match_dup 1)
+ (set (match_operand:P 0 "register_operand" "")
+	  (unspec:P [(match_dup 1)
 		   (match_dup 2)
 		   (reg:SI 0)] UNSPEC_MVST))
  (clobber (reg:CC CC_REGNUM))])]
@@ -2937,7 +2950,7 @@
(set (mem:BLK (match_operand:P 1 "register_operand" "0"))
 	(mem:BLK (match_operand:P 3 "register_operand" "2")))
(set (match_operand:P 0 "register_operand" "=d")
-	(unspec [(mem:BLK (match_dup 1))
+	(unspec:P [(mem:BLK (match_dup 1))
 		 (mem:BLK (match_dup 3))
 		 (reg:SI 0)] UNSPEC_MVST))
(clobber (reg:CC CC_REGNUM))]
diff --git a/gcc/testsuite/gcc.target/s390/md/movstr-1.c b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
new file mode 100644
index 000..7da749b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
@@ -0,0 +1,24 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do run } */
+/* { dg-options "-dP -save-temps" } */
+
+__attribute__ ((noinline))
+void test(char *dest, const char *src)
+{
+  __builtin_stpcpy (dest, src);
+}
+
+/* { dg-final { scan-assembler-times {{[*]movstr}} 1 } } */
+
+#define LEN 200
+char buf[LEN];
+
+int main(void)
+{
+  __builtin_memset(buf, 0, LEN);
+  test(buf, "hello world!");
+  if (__builtin_strcmp(buf, "hello world!") != 0)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index 0b8f80ed..0d7a7eb 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -61,20 +61,35 @@ if ![info exists DEFAULT_CFLAGS] then {
 # Initialize `dg'.
 dg-init
 
-set hotpatch_tests $srcdir/$subdir/hotpatch-\[0-9\]*.c
+set md_tests $srcdir/$subdir/md/*.c
 
 # Main loop.
 dg-runtest [lsort [prune [glob -nocomplain $srcdir/$subdir/*.\[cS\]] \
-			 $hotpatch_tests]] "" $DEFAULT_CFLAGS
+			 $md_tests]] "" $DEFAULT_CFLAGS
 
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*vector*/*.\[cS\]]] \
 	"" $DEFAULT_CFLAGS
 
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/md/*.\[cS\]]] \
+	"" $DEFAULT_CFLAGS
+
 # Additional hotpatch torture tests.
 torture-init
-set HOTPATCH_TEST_OPTS [list -Os -O0 -O1 -O2 -O3]
-set-torture-options $HOTPATCH_TEST_OPTS
-gcc-dg-runtest [lsort [glob -nocomplain $hotpatch_tests]] "" $DEFAULT_CFLAGS
+set-torture-options [list -Os -O0 -O1 -O2 -O3]
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/hotpatch-\[0-9\]*.c]] \
+	"" $DEFAULT_CFLAGS
+torture-finish
+
+# Additional md torture tests.
+torture-init
+set MD_TEST_OPTS [list \
+	{-Os -march=z900} {-Os -march=z13} \
+	{-O0 -march=z900} {-O0 -march=z13} \
+	{-O1 -march=z900} {-O1 -march=z13} \
+	{-O2 -march=z900} {-O2 -march=z13} \
+	{-O3 -march=z900} {-O3 -march=z13}]
+set-torture-options $MD_TEST_OPTS
+gcc-dg-runtest [lsort [glob -nocomplain $md_tests]] "" $DEFAULT_CFLAGS
 torture-finish
 
 # All done.
-- 
2.3.0



Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-04 Thread Dominik Vogt
Version 5 with the latest requested changes.  Seems to work now.
I've dropped the extra patch and rather marked the failing tests
as "xfail".

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.c (s390_expand_setmem): Use new expanders.
* config/s390/s390.md ("*setmem_long")
("*setmem_long_and", "*setmem_long_31z"): Fix warnings.
("*setmem_long_and_31z"): New define_insn.
("setmem_long_"): New expanders.
* (): New mode attribute
gcc/testsuite/ChangeLog

* gcc.target/s390/md/setmem_long-1.c: New test.
>From 74780dc2756ed1c2052f0d6b836799dcad1217e7 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 4 Nov 2015 03:16:24 +0100
Subject: [PATCH] S/390: Fix warnings in "*setmem_long..." patterns.

---
 gcc/config/s390/s390.c   |  7 ++-
 gcc/config/s390/s390.md  | 50 -
 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c | 56 
 3 files changed, 102 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 7e7ed45..1a77437 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -5203,7 +5203,12 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
   else if (TARGET_MVCLE)
 {
   val = force_not_mem (convert_modes (Pmode, QImode, val, 1));
-  emit_insn (gen_setmem_long (dst, convert_to_mode (Pmode, len, 1), val));
+  if (TARGET_64BIT)
+	emit_insn (gen_setmem_long_di (dst, convert_to_mode (Pmode, len, 1),
+   val));
+  else
+	emit_insn (gen_setmem_long_si (dst, convert_to_mode (Pmode, len, 1),
+   val));
 }
 
   else
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index bc24a36..ef1ec92 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -70,6 +70,9 @@
; Copy CC as is into the lower 2 bits of an integer register
UNSPEC_CC_TO_INT
 
+   ; Convert Pmode to BLKmode
+   UNSPEC_REPLICATE_BYTE
+
; GOT/PLT and lt-relative accesses
UNSPEC_LTREL_OFFSET
UNSPEC_LTREL_BASE
@@ -727,6 +730,9 @@
 ;; In place of GET_MODE_BITSIZE (mode)
 (define_mode_attr bitsize [(DI "64") (SI "32") (HI "16") (QI "8")])
 
+;; In place of GET_MODE_SIZE (mode)
+(define_mode_attr modesize [(DI "8") (SI "4")])
+
 ;; Allow return and simple_return to be defined from a single template.
 (define_code_iterator ANY_RETURN [return simple_return])
 
@@ -3280,12 +3286,12 @@
 
 ; Initialize a block of arbitrary length with (operands[2] % 256).
 
-(define_expand "setmem_long"
+(define_expand "setmem_long_"
   [(parallel
 [(clobber (match_dup 1))
  (set (match_operand:BLK 0 "memory_operand" "")
-  (match_operand 2 "shift_count_or_setmem_operand" ""))
- (use (match_operand 1 "general_operand" ""))
+	  (unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
+		  (match_dup 4)] UNSPEC_REPLICATE_BYTE))
  (use (match_dup 3))
  (clobber (reg:CC CC_REGNUM))])]
   ""
@@ -3306,13 +3312,17 @@
   operands[0] = replace_equiv_address_nv (operands[0], addr0);
   operands[1] = reg0;
   operands[3] = reg1;
+  operands[4] = gen_lowpart (Pmode, operands[1]);
 })
 
+; Patterns for 31 bit + Esa and 64 bit + Zarch.
+
 (define_insn "*setmem_long"
   [(clobber (match_operand: 0 "register_operand" "=d"))
(set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
-(match_operand 2 "shift_count_or_setmem_operand" "Y"))
-   (use (match_dup 3))
+(unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
+		 (subreg:P (match_dup 3) )]
+		 UNSPEC_REPLICATE_BYTE))
(use (match_operand: 1 "register_operand" "d"))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_64BIT || !TARGET_ZARCH"
@@ -3323,9 +,11 @@
 (define_insn "*setmem_long_and"
   [(clobber (match_operand: 0 "register_operand" "=d"))
(set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
-(and (match_operand 2 "shift_count_or_setmem_operand" "Y")
-	 (match_operand 4 "const_int_operand" "n")))
-   (use (match_dup 3))
+(unspec:BLK [(and:P
+		  (match_operand:P 2 "shift_count_or_setmem_operand" "Y")
+		  (match_operand:P 4 "const_int_operand" "n"))
+		(subreg:P (match_dup 3) )]
+		UNSPEC_REPLICATE_BYTE))
(use (match_operand: 1 "register_operand" "d"))
(clobber (reg:CC CC_REGNUM))]
   "(TARGET_64BIT || !TARGET_ZARCH) &&
@@ -3334,11 +3346,14 @@
   [(set_attr "length" "8")
(set_attr "type" "vs")])
 
+; Variants for 31 bit + Zarch, necessary because of the odd in-register offsets
+; of the SImode subregs.
+
 (define_insn "*setmem_long_31z"
   [(clobber (match_operand:TI 0 "register_operand" "=d"))
(set (mem:BLK (subreg:SI (match_operand:TI 3 "register_operand" "0") 4))
-(match_operand 2 "shift_count_or_setmem_operand" "Y"))
-   (use 

Re: Add fuzzing coverage support

2015-12-04 Thread Dmitry Vyukov
On Fri, Dec 4, 2015 at 2:41 PM, Jakub Jelinek  wrote:
> Hi!
>
> While this has been posted after stage1 closed and I'm not really happy
> that it missed the deadline, I'm willing to grant an exception, the patch
> is small enough that it is ok at this point of stage3.  That said, next time
> please try to submit new features in time.

Sorry.
Thanks!

> Are there any plans for GCC 7 for the other -fsanitize-coverage= options,
> or are those just LLVM alternatives to GCC's gcov/-fprofile-generate etc.?

No, they are not alternatives to gcov. Other coverage modes are backed
by sanitizer_common runtime and libFuzzer, which together allow to do
efficient in-process fuzzing.
We don't have plans to port other options at the moment per se.
Though, that's doable and we could sync sanitizer library for runtime
support.
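
The contract on the runtime side is deliberately tiny: the pass only inserts
a no-argument call at the start of every basic block, and the fuzzer (or the
kernel, for KCOV-style use) supplies the callback.  A bare-bones sketch, in
which everything except the callback name is made up:

  static unsigned long covered_pcs[1 << 16];
  static unsigned long n_covered;

  void
  __sanitizer_cov_trace_pc (void)
  {
    /* The return address identifies the instrumented basic block.  */
    if (n_covered < sizeof covered_pcs / sizeof covered_pcs[0])
      covered_pcs[n_covered++]
        = (unsigned long) __builtin_return_address (0);
  }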


> On Thu, Dec 03, 2015 at 08:17:06PM +0100, Dmitry Vyukov wrote:
>> +unsigned sancov_pass (function *fun)
>
> Formatting:
> unsigned
> sancov_pass (function *fun)

Missed this. Done.

>> +{
>> +  basic_block bb;
>> +  gimple_stmt_iterator gsi;
>> +  gimple *stmt, *f;
>> +  static bool inited;
>> +
>> +  if (!inited)
>> +{
>> +  inited = true;
>> +  initialize_sanitizer_builtins ();
>> +}
>
> You can call this unconditionally, it will return as the first thing
> if it is already initialized, no need for another guard.

Done

>> +
>> +  /* Insert callback into beginning of every BB. */
>> +  FOR_EACH_BB_FN (bb, fun)
>> +{
>> +  gsi = gsi_after_labels (bb);
>> +  if (gsi_end_p (gsi))
>> +continue;
>> +  stmt = gsi_stmt (gsi);
>> +  f = gimple_build_call (builtin_decl_implicit (
>> + BUILT_IN_SANITIZER_COV_TRACE_PC), 0);
>
> I (personally) prefer no ( at the end of line unless really needed.
> In this case you can just do:
>   tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
>   gimple *g = gimple_build_call (fndecl, 0);
> which is same number of lines, but looks nicer.
> Also, please move also the gsi, stmt and f (better g or gcall)
> declarations to the first assignment to them, they aren't used outside of
> the loop.

Done

>> --- testsuite/gcc.dg/sancov/asan.c(revision 0)
>> +++ testsuite/gcc.dg/sancov/asan.c(working copy)
>> @@ -0,0 +1,21 @@
>> +/* Test coverage/asan interaction:
>> + - coverage instruments __asan_init ctor (thus 4 coverage callbacks)
>> + - coverage does not instrument asan-emitted basic blocks
>> + - asan considers coverage callback as "nonfreeing" (thus 1 asan store
>> +   callback).  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-fsanitize-coverage=trace-pc -fsanitize=address" } */
>> +
>> +void notailcall ();
>> +
>> +void foo(volatile int *a, int *b)
>> +{
>> +  *a = 1;
>> +  if (*b)
>> +*a = 2;
>> +  notailcall ();
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "call   __sanitizer_cov_trace_pc" 4 } 
>> } */
>> +/* { dg-final { scan-assembler-times "call   __asan_report_load4" 1 } } */
>> +/* { dg-final { scan-assembler-times "call   __asan_report_store4" 1 } } */
>
> I don't like these, we have lots of targets, and different targets have
> different instructions for making calls, different whitespace in between
> the insn name and called function, sometimes some extra decoration on the fn
> name, (say sometimes an extra _ prefix), etc.  IMHO much better to add
> -fdump-tree-optimized and scan-tree-dump-times instead for the calls in the
> optimized dump.  Affects all tests.

Done
Much better now.

> Please repost a patch with these changes fixed, it will be hopefully ackable
> then.


New patch is attached.
Code review is updated:
https://codereview.appspot.com/280140043
Test are passing. Also compiled and booted kernel with this.
Also applied clang-format to sancov.c.

Please take another look.
Index: ChangeLog
===
--- ChangeLog	(revision 231277)
+++ ChangeLog	(working copy)
@@ -1,3 +1,15 @@
+2015-12-04  Dmitry Vyukov  
+
+	* sancov.c: New file.
+	* Makefile.in (OBJS): Add sancov.o.
+	* invoke.texi (-fsanitize-coverage=trace-pc): Describe.
+	* passes.def (sancov_pass): Add.
+	* tree-pass.h  (sancov_pass): Add.
+	* common.opt (-fsanitize-coverage=trace-pc): Add.
+	* sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_PC): Add.
+	* builtins.def (DEF_SANITIZER_BUILTIN): Enable for
+	flag_sanitize_coverage.
+
 2015-12-04  Jeff Law  
 
 	* tree-ssa-reassoc.c (maybe_optimize_range_tests): Return boolean
@@ -593,7 +605,6 @@
 	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call)
 	(find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL.
 
->>> .r231221
 2015-12-02  Segher Boessenkool  
 
 	* config/rs6000/rs6000.md (cstore_si_as_di): New expander.
Index: Makefile.in
===
--- Makefile.in	(revision 231277)
+++ 

Re: Add fuzzing coverage support

2015-12-04 Thread Dmitry Vyukov
On Fri, Dec 4, 2015 at 2:45 PM, Yury Gribov  wrote:
> On 12/04/2015 04:41 PM, Jakub Jelinek wrote:
>>
>> Hi!
>>
>> While this has been posted after stage1 closed and I'm not really happy
>> that it missed the deadline, I'm willing to grant an exception, the patch
>> is small enough that it is ok at this point of stage3.  That said, next
>> time
>> please try to submit new features in time.
>>
>> Are there any plans for GCC 7 for the other -fsanitize-coverage= options,
>> or are those just LLVM alternatives to GCC's gcov/-fprofile-generate etc.?
>>
>> On Thu, Dec 03, 2015 at 08:17:06PM +0100, Dmitry Vyukov wrote:
>>>
>>> +unsigned sancov_pass (function *fun)
>>
>>
>> Formatting:
>> unsigned
>> sancov_pass (function *fun)
>>
>>> +{
>>> +  basic_block bb;
>>> +  gimple_stmt_iterator gsi;
>>> +  gimple *stmt, *f;
>>> +  static bool inited;
>>> +
>>> +  if (!inited)
>>> +{
>>> +  inited = true;
>>> +  initialize_sanitizer_builtins ();
>>> +}
>>
>>
>> You can call this unconditionally, it will return as the first thing
>> if it is already initialized, no need for another guard.
>>
>>> +
>>> +  /* Insert callback into beginning of every BB. */
>>> +  FOR_EACH_BB_FN (bb, fun)
>>> +{
>>> +  gsi = gsi_after_labels (bb);
>>> +  if (gsi_end_p (gsi))
>>> +continue;
>>> +  stmt = gsi_stmt (gsi);
>>> +  f = gimple_build_call (builtin_decl_implicit (
>>> + BUILT_IN_SANITIZER_COV_TRACE_PC), 0);
>>
>>
>> I (personally) prefer no ( at the end of line unless really needed.
>> In this case you can just do:
>>tree fndecl = builtin_decl_implicit
>> (BUILT_IN_SANITIZER_COV_TRACE_PC);
>>gimple *g = gimple_build_call (fndecl, 0);
>> which is same number of lines, but looks nicer.
>> Also, please move also the gsi, stmt and f (better g or gcall)
>> declarations to the first assignment to them, they aren't used outside of
>> the loop.
>
>
> Also FYI clang-format config has been recently added to contrib/
> (https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02214.html).

Rock-n-roll!

Applied to sancov.c


Re: [PATCH][1/2] Fix PR68553

2015-12-04 Thread Ramana Radhakrishnan


On 04/12/15 16:04, Richard Biener wrote:
> On December 4, 2015 4:32:33 PM GMT+01:00, Alan Lawrence 
>  wrote:
>> On 27/11/15 08:30, Richard Biener wrote:
>>>
>>> This is part 1 of a fix for PR68533 which shows that some targets
>>> cannot can_vec_perm_p on an identity permutation.  I chose to fix
>>> this in the vectorizer by detecting the identity itself but with
>>> the current structure of vect_transform_slp_perm_load this is
>>> somewhat awkward.  Thus the following no-op patch simplifies it
>>> greatly (from the times it was restricted to do interleaving-kind
>>> of permutes).  It turned out to not be 100% no-op as we now can
>>> handle non-adjacent source operands so I split it out from the
>>> actual fix.
>>>
>>> The two adjusted testcases no longer fail to vectorize because
>>> of "need three vectors" but unadjusted would fail because there
>>> are simply not enough scalar iterations in the loop.  I adjusted
>>> that and now we vectorize it just fine (running into PR68559
>>> which I filed).
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>
>>> Richard.
>>>
>>> 2015-11-27  Richard Biener  
>>>
>>> PR tree-optimization/68553
>>> * tree-vect-slp.c (vect_get_mask_element): Remove.
>>> (vect_transform_slp_perm_load): Implement in a simpler way.
>>>
>>> * gcc.dg/vect/pr45752.c: Adjust.
>>> * gcc.dg/vect/slp-perm-4.c: Likewise.
>>
>> On aarch64 and ARM targets, this causes
>>
>> PASS->FAIL: gcc.dg/vect/O3-pr36098.c scan-tree-dump-times vect
>> "vectorizing 
>> stmts using SLP" 0
>>
>> That is, we now vectorize using SLP, when previously we did not.
>>
>> On aarch64 (and I expect ARM too), previously we used a VEC_LOAD_LANES,
>> without 
>> unrolling, 
> but now we unroll * 4, and vectorize using 3 loads and
>> permutes:
> 
> Happens on x86_64 as well with at least Sse4.1.  Unfortunately we'll have to 
> start introducing much more fine-grained target-supports for vect_perm to 
> reliably guard all targets.

I don't know enough about SSE4.1 to know whether it's a problem there or not. 
This is an actual regression on AArch64 and ARM and not just a testism, you now 
get :

.L5:
ldr q0, [x5, 16]
add x4, x4, 48
ldr q1, [x5, 32]
add w6, w6, 1
ldr q4, [x5, 48]
cmp w3, w6
ldr q2, [x5], 64
orr v3.16b, v0.16b, v0.16b
orr v5.16b, v4.16b, v4.16b
orr v4.16b, v1.16b, v1.16b
tbl v0.16b, {v0.16b - v1.16b}, v6.16b
tbl v2.16b, {v2.16b - v3.16b}, v7.16b
tbl v4.16b, {v4.16b - v5.16b}, v16.16b
str q0, [x4, -32]
str q2, [x4, -48]
str q4, [x4, -16]
bhi .L5

instead of 

.L5:
ld4 {v4.4s - v7.4s}, [x7], 64
add w4, w4, 1
cmp w3, w4
orr v1.16b, v4.16b, v4.16b
orr v2.16b, v5.16b, v5.16b
orr v3.16b, v6.16b, v6.16b
st3 {v1.4s - v3.4s}, [x6], 48
bhi .L5

LD4 and ST3 do all the permutes without needing actual permute instructions - a 
strategy that favours generic permutes while avoiding the load_lanes case is
likely to be more expensive on most implementations. I think this is worth a PR
at least.

regards
Ramana








> 
> Richard.
> 
>> ../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>>
>> vect__31.15_94 = VEC_PERM_EXPR > 2, 4 }>;
>> ../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>>
>> vect__31.16_95 = VEC_PERM_EXPR > 4, 5 }>;
>> ../gcc/gcc/testsuite/gcc.dg/vect/O3-pr36098.c:15:2: note: add new stmt:
>>
>> vect__31.17_96 = VEC_PERM_EXPR > 5, 6 }>
>>
>> which *is* a valid vectorization strategy...
>>
>>
>> --Alan
> 
> 


Re: [PATCH] Do not sanitize left shifts for -fwrapv (PR68418)

2015-12-04 Thread Paolo Bonzini


On 25/11/2015 14:55, Paolo Bonzini wrote:
> Left shifts into the sign bit is a kind of overflow, and the
> standard chooses to treat left shifts of negative values the
> same way.
> 
> However, the -fwrapv option modifies the language to one where
> integers are defined as two's complement---which also defines
> entirely the behavior of shifts.  Disable sanitization of left
> shifts when -fwrapv is in effect.  The same change was proposed
> for LLVM at https://llvm.org/bugs/show_bug.cgi?id=25552.
> 
> Bootstrapped/regtested x86_64-pc-linux-gnu.  Ok for trunk, and for
> GCC 5 branch after 5.3 is released?
> 
> Thanks,
> 
> Paolo
> 
> gcc:
>   PR sanitizer/68418
>   * c-family/c-ubsan.c (ubsan_instrument_shift): Disable
>   sanitization of left shifts for wrapping signed types as well.
> 
> gcc/testsuite:
>   PR sanitizer/68418
>   * gcc.dg/ubsan/c99-wrapv-shift-1.c,
>   gcc.dg/ubsan/c99-wrapv-shift-2.c: New testcases.
> 
> Index: c-family/c-ubsan.c
> ===
> --- c-family/c-ubsan.c(revision 230466)
> +++ c-family/c-ubsan.c(working copy)
> @@ -128,7 +128,7 @@
>   (unsigned) x >> (uprecm1 - y)
>   if non-zero, is undefined.  */
>if (code == LSHIFT_EXPR
> -  && !TYPE_UNSIGNED (type0)
> +  && !TYPE_OVERFLOW_WRAPS (type0)
>&& flag_isoc99)
>  {
>tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1,
> @@ -143,7 +143,7 @@
>   x < 0 || ((unsigned) x >> (uprecm1 - y))
>   if > 1, is undefined.  */
>if (code == LSHIFT_EXPR
> -  && !TYPE_UNSIGNED (type0)
> +  && !TYPE_OVERFLOW_WRAPS (type0)
>&& (cxx_dialect >= cxx11))
>  {
>tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1,
> Index: testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c
> ===
> --- testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c(revision 0)
> +++ testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c(working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=shift -fwrapv -w -std=c99" } */
> +
> +int
> +main (void)
> +{
> +  int a = -42;
> +  a << 1;
> +}
> Index: testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c
> ===
> --- testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c(revision 0)
> +++ testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c(working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=shift -fwrapv -w -std=c99" } */
> +
> +int
> +main (void)
> +{
> +  int a = 1;
> +  a <<= 31;
> +}
> 

Ping?

Paolo


Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-04 Thread Jeff Law

On 12/04/2015 03:12 AM, Richard Biener wrote:

On Thu, Dec 3, 2015 at 9:29 PM, Jeff Law  wrote:

On 12/02/2015 08:35 AM, Richard Biener wrote:



The most interesting side effect, and one I haven't fully analyzed yet is
an
unexpected jump thread -- which I've traced back to differences in what
the
alias oracle is able to find when we walk unaliased vuses. Which makes
totally no sense that it's unable to find the unaliased vuse in the
simplified CFG, but finds it when we don't remove the unexecutable edge.
As
I said, it makes no sense to me yet and I'm still digging.



The walking of PHI nodes is quite simplistic to avoid doing too much work
so
an extra (not executable) edge may confuse it enough.  So this might be
"expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
trusted would be an option (also helping SCCVN).


Found it.  In the CFG with the unexectuable edges _not_ removed there is a
PHI associated with that edge which provides a dominating unaliased vuse.
Once that edge is removed, the PHI arg gets removed and thus we can't easily
see the unaliased vuse.

So all is working as expected.  It wasn't ever a big issue, I just wanted to
make sure I thoroughly understood the somewhat counter-intuitive result.


Good.  Btw, I remembered that with the unreachable tracking in SCCVN I
ran into cases where it didn't correctly track all unreachable blocks due
to the way the dominator walk works (which doesn't ensure we've visited
all predecessors).  Like for

   
   [CFG sketch (basic-block labels lost in the archive): bb5 can only be
    proven unreachable once bb4's outgoing edges have been marked, but the
    dominator walk visits bb5 before bb4.]

DOM order visits , , ,  and then
the DOM children like .  So we fail to detect bb5 as unreachable
(didn't visit bb4 to mark outgoing edges unreachable yet).

The fix is easy (in testing right now).  Simply track if the current block
was unreachable when visiting DOM children.

Didn't manage to produce a missed-optimization testcase (but only
tried for a few minutes), the cases I've seen it were involving
unreachable loops, but I don't have them anymore.

Sorry if that makes extracting the machinery harder ;)

I'll get it sorted out.

I found that I actually wanted those bits extracted and split into two 
functions.  One for the test and one for the propagation.  The hardest 
part is actually choosing meaningful names.


jeff


Commit: MSP430: Reduce number of multilibs

2015-12-04 Thread Nick Clifton
Hi Guys,

  I am applying the attached patch to reduce the number of multilibs for
  the MSP430 target.  This is at the request of TI, on behalf of their
  customers, who complained that the toolchain was too large.  The patch
  only affects MSP430 specific files, and parts of files.  It does not
  touch anything else.

  The patch removes the multilibs based upon the version of hardware
  multiply support that is used, and instead creates a separate set of
  libraries just containing the multiply routines. All normal libraries,
  and user created object files, now call the software multiply 
  routines by default[*].  At link time however one of the specific
  hardware multiply libraries can be linked in to provide hardware based
  alternatives to the software multiply functions.

  As a side effect the patch also fixes a couple of unexpected failures
  in the gcc testsuite (gcc.dg/cleanup-[12|13|5].c).  It also adds some
  new tests to the MSP430 specific section of the gcc testsuite that check
  the behaviour of the multiply functions for all possible variations of
  hardware multiply support.

Cheers
  Nick

[*] There is an exception to the all-files-call-software-multiply rule.
If a file is compiled at -O3 or above, with a hardware multiply type
specified as well, then the hardware functions will be used inline when
appropriate.  Hence no library should ever be compiled in this way,
unless the builder is sure that it will only ever be used on the
appropriate hardware.


gcc/ChangeLog
2015-11-25  Nick Clifton  

* config.gcc (extra_gcc_objs): Define for MSP430.
* common/config/msp430/msp430-common.c (msp430_handle_option):
Pass both -mmcu and -mcpu on to the back end if they are both
defined.
* config/msp430/msp430.c (hwmult_name): New function.
(msp430_option_override): If an unrecognised MCU name is
detected only warn if the user has not provided suitable
-mhwmult and -mcpu options.  Use msp430_warn_mcu to control
warning messages.  Generate warnings about conflicts between
-mmcu and -mcpu and -mhwmult options. 
If neither -mcpu nor -mmcu have been specified but -mhwmult=
f5series has, then select the 430X isa.
(msp430_no_hwmult): If -mmcu has not been specified and
msp430_hwmult_type is AUTO then return true.
* config/msp430/msp430.h (EXTRA_SPEC_FUNCTIONS): Define.
(LIB_SPEC): Add hardware multiply library selection.
* config/msp430/t-msp430: Delete hardware multiply multilibs.
Add rule to build driver-msp430.o
* config/msp430/driver-msp430.c: New file.
* config/msp430/msp430.opt (warn-mcu): New option.
* doc/invoke.texi: Update description of -mhwmult=auto.
Document -mwarn-mcu option.

gcc/testsuite/ChangeLog
2015-11-25  Nick Clifton  

* gcc.target/msp430/msp_abi_div_funcs.c: New test.
* gcc.target/msp430/mul_main.h: New test support file.
* gcc.target/msp430/mul_none.c: New test.
* gcc.target/msp430/mul_16bit.c: New test.
* gcc.target/msp430/mul_32bit.c: New test.
* gcc.target/msp430/mul_f5.c: New test.

libgcc/ChangeLog
2015-12-04  Nick Clifton  

* config/msp430/mpy.c (__mulhi3): Use a faster algorithm.
Allow for the second argument being negative.
* config.host (extra_parts): Define for MSP430.  Create separate
libraries for each of the hardware multiply formats.
* config/msp430/lib2hw_mul.S: Build only the multiply routines
that are needed.
* config/msp430/lib2mul.c: Likewise.
* config/msp430/t-msp430 (LIB2ADD): Remove lib2hw_mul.S.
Add rules to build hardware multiply libraries.
* config/msp430/lib2divSI.c: (__mspabi_divlu): Alias for
__mspabi_divul function.
(__mspabi_divllu): New stub function.



msp430.hwmul.patch.2.xz
Description: application/xz


[PATCH] PPC sqrtf using rsqrtes (PR 68609)

2015-12-04 Thread David Edelsohn
The PowerPC port provides reciprocal sqrt but doesn't implement the
extra incantation to utilize it for sqrtf.

The current implementation re-associates terms in the N-R iteration to
utilize one constant instead of two, but does not provide a
pre-computed estimate multiplied by the source, which requires an
extra multiply at the end.  The cost of the extra load of an FP
constant through the LSU and a register to hold it seems to balance
against the cost of the extra multiply in the VSU, so it's not clear
that re-arranging the computation is beneficial.
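
In scalar C terms, the expansion amounts to roughly the following sketch
(rsqrte_estimate is a made-up stand-in for the hardware reciprocal square
root estimate; the constants, the two refinement steps and the zero guard
only illustrate the shape of the sequence, not the exact rs6000 expansion):

  /* Hypothetical stand-in for the estimate instruction.  */
  extern float rsqrte_estimate (float);

  float
  sqrtf_via_rsqrte (float x)
  {
    float y = rsqrte_estimate (x);
    if (!(x > 0.0f))
      y = 0.0f;               /* filter 0.0 so that sqrtf (0.0) is not NaN */
    /* Newton-Raphson for 1/sqrt(x): y <- y * (1.5 - 0.5 * x * y * y).  */
    y = y * (1.5f - 0.5f * x * y * y);
    y = y * (1.5f - 0.5f * x * y * y);
    return x * y;             /* the extra multiply mentioned above */
  }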

Thanks, David

PR target/68609
* config/rs6000/rs6000-protos.h (rs6000_emit_swsqrt): Rename and add
bool argument.
* config/rs6000/rs6000.c (rs6000_emit_swsqrt): Rename. Add non-reciprocal path.
* config/rs6000/rs6000.md (rsqrt2): Call new function name.
(sqrt2): Replace define_insn with define_expand that can call
rs6000_emit_swsqrt.

Index: rs6000-protos.h
===
--- rs6000-protos.h (revision 231169)
+++ rs6000-protos.h (working copy)
@@ -137,7 +137,7 @@
 extern void rs6000_expand_atomic_exchange (rtx op[]);
 extern void rs6000_expand_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern void rs6000_emit_swdiv (rtx, rtx, rtx, bool);
-extern void rs6000_emit_swrsqrt (rtx, rtx);
+extern void rs6000_emit_swsqrt (rtx, rtx, bool);
 extern void output_toc (FILE *, rtx, int, machine_mode);
 extern rtx rs6000_longcall_ref (rtx);
 extern void rs6000_fatal_bad_address (rtx);
Index: rs6000.c
===
--- rs6000.c(revision 231169)
+++ rs6000.c(working copy)
@@ -32889,7 +32889,7 @@ rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool not
rsqrt.  Assumes no trapping math and finite arguments.  */

 void
-rs6000_emit_swrsqrt (rtx dst, rtx src)
+rs6000_emit_swsqrt (rtx dst, rtx src, bool recip)
 {
   machine_mode mode = GET_MODE (src);
   rtx x0 = gen_reg_rtx (mode);
@@ -32922,6 +32922,16 @@ void
   emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, src),
  UNSPEC_RSQRT)));

+  /* If (src == 0.0) filter infinity to prevent NaN for sqrt(0.0).  */
+  if (!recip)
+{
+  rtx zero = force_reg (mode, CONST0_RTX (mode));
+  rtx target = emit_conditional_move (x0, GT, src, zero, mode,
+ x0, zero, mode, 0);
+  if (target != x0)
+   emit_move_insn (x0, target);
+}
+
   /* y = 0.5 * src = 1.5 * src - src -> fewer constants */
   rs6000_emit_msub (y, src, halfthree, src);

@@ -32938,7 +32948,11 @@ void
   x0 = x1;
 }

-  emit_move_insn (dst, x0);
+  if (!recip)
+emit_insn (gen_mul (dst, src, x0));
+  else
+emit_move_insn (dst, x0);
+
   return;
 }
Index: rs6000.md
===
--- rs6000.md   (revision 231169)
+++ rs6000.md   (working copy)
@@ -4301,7 +4301,7 @@
(match_operand:RECIPF 1 "gpc_reg_operand" "")]
   "RS6000_RECIP_HAVE_RSQRTE_P (mode)"
 {
-  rs6000_emit_swrsqrt (operands[0], operands[1]);
+  rs6000_emit_swsqrt (operands[0], operands[1], 1);
   DONE;
 })
 ^L
@@ -4426,7 +4426,7 @@
   [(set_attr "type" "div")
(set_attr "fp_type" "fp_div_")])

-(define_insn "sqrt2"
+(define_insn "*sqrt2_internal"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,")
(sqrt:SFDF (match_operand:SFDF 1 "gpc_reg_operand" ",")))]
   "TARGET__FPR && !TARGET_SIMPLE_FPU
@@ -4437,6 +4437,23 @@
   [(set_attr "type" "sqrt")
(set_attr "fp_type" "fp_sqrt_")])

+(define_expand "sqrt2"
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "")
+   (sqrt:SFDF (match_operand:SFDF 1 "gpc_reg_operand" "")))]
+  "TARGET__FPR && !TARGET_SIMPLE_FPU
+   && (TARGET_PPC_GPOPT || (mode == SFmode && TARGET_XILINX_FPU))"
+{
+  if (mode == SFmode
+  && RS6000_RECIP_HAVE_RSQRTE_P (mode)
+  && !optimize_function_for_size_p (cfun)
+  && flag_finite_math_only && !flag_trapping_math
+  && flag_unsafe_math_optimizations)
+{
+  rs6000_emit_swsqrt (operands[0], operands[1], 0);
+  DONE;
+}
+})
+
 ;; Floating point reciprocal approximation
 (define_insn "fre"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,")


Re: [PATCH] Handle OBJ_TYPE_REF in FRE

2015-12-04 Thread Jan Hubicka
> On Fri, 4 Dec 2015, Jan Hubicka wrote:
> 
> > > Indeed we don't do code hoisting yet.  Maybe one could trick PPRE
> > > into doing it.
> > > 
> > > Note that for OBJ_TYPE_REFs in calls you probably should better use
> > > gimple_call_fntype instead of the type of the OBJ_TYPE_REF anyway
> > > (well, fntype will be the method-type, not pointer-to-method-type).
> > > 
> > > Not sure if you need OBJ_TYPE_REFs type in non-call contexts?
> > 
> > Well, to optimize speculative call sequences
> > 
> > if (funptr == thismethod)
> >   inlined this method body
> > else
> >   funptr ();
> > 
> > Here you want to devirtualize the conditional, not the call in order
> > to get the inlined method unconditonally.
> > 
> > In general I think OBJ_TYPE_REF is misplaced - it should be on vtable load
> > instead of the call/conditional. It is a property of the vtable lookup.
> > Then it would work for method pointers too.
> 
> Even better.  Make it a tcc_reference tree then.

makes sense to me - it is indeed more similar to MEM_REF than to an expression.
I will look into that next stage1.

Honza


[PATCH] Use explicit UNKNOWN_LOCATION instead of input_location (which is line 1) for process_options diagnostics (PR c/68656)

2015-12-04 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the process_options diagnostics is about errors
on the command line (incompatible options, unsupported options etc.),
which aren't really related to any source code location.
This patch fixes it to use explicit UNKNOWN_LOCATION, instead of
explicit or implicit input_location, which for most of process_options
is somewhere on line 1 of the main source file.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Quick grep seems to suggest some other spots that should be using
UNKNOWN_LOCATION, will leave that to backend maintainers.

gcc/testsuite/gcc.target/s390/hotpatch-compile-4.c:/* { dg-error "arguments to 
.-mhotpatch=n,m. should be non-negative integers" "" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/s390/hotpatch-compile-1.c:/* { dg-error "arguments to 
.-mhotpatch=n,m. should be non-negative integers" "" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/s390/hotpatch-compile-2.c:/* { dg-error "arguments to 
.-mhotpatch=n,m. should be non-negative integers" "" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/s390/hotpatch-compile-6.c:/* { dg-error "arguments to 
.-mhotpatch=n,m. should be non-negative integers" "" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/s390/hotpatch-compile-3.c:/* { dg-error "arguments to 
.-mhotpatch=n,m. should be non-negative integers" "" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/s390/hotpatch-compile-14.c:/* { dg-error "argument to 
.-mhotpatch=n,m. is too large" "" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/s390/hotpatch-compile-5.c:/* { dg-error "arguments to 
.-mhotpatch=n,m. should be non-negative integers" "" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/m68k/stack-limit-1.c:/* { dg-warning "not supported" 
"" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/powerpc/warn-2.c:/* { dg-warning "-mno-altivec 
disables vsx" "" { target *-*-* } 1 } */
gcc/testsuite/gcc.target/powerpc/warn-1.c:/* { dg-warning "-mvsx and 
-mno-altivec are incompatible" "" { target *-*-* } 1 } */

2015-12-04  Jakub Jelinek  

PR c/68656
* toplev.c (init_asm_output): Pass UNKNOWN_LOCATION instead of
input_location to inform.
(process_options): Use warning_at (UNKNOWN_LOCATION instead of
warning ( and error_at (UNKNOWN_LOCATION instead of error (.
Pass UNKNOWN_LOCATION instead of input_location to fatal_error.

* gcc.target/i386/pr65044.c: Expect error on line 0 rather than
line 1.
* g++.dg/opt/pr34036.C: Expect warning on line 0 rather than line 1.
* gcc.dg/tree-ssa/pr23109.c: Likewise.
* gcc.dg/tree-ssa/recip-5.c: Likewise.
* gcc.dg/pr33007.c: Likewise.

--- gcc/toplev.c.jj 2015-12-02 20:26:56.0 +0100
+++ gcc/toplev.c2015-12-04 15:04:24.503513214 +0100
@@ -904,7 +904,9 @@ init_asm_output (const char *name)
   NULL);
}
  else
-   inform (input_location, "-frecord-gcc-switches is not supported by 
the current target");
+   inform (UNKNOWN_LOCATION,
+   "-frecord-gcc-switches is not supported by "
+   "the current target");
}
 
   if (flag_verbose_asm)
@@ -1214,8 +1216,9 @@ process_options (void)
 
   if (flag_section_anchors && !target_supports_section_anchors_p ())
 {
-  warning (OPT_fsection_anchors,
-  "this target does not support %qs", "-fsection-anchors");
+  warning_at (UNKNOWN_LOCATION, OPT_fsection_anchors,
+ "this target does not support %qs",
+ "-fsection-anchors");
   flag_section_anchors = 0;
 }
 
@@ -1250,14 +1253,16 @@ process_options (void)
 {
   if (targetm.chkp_bound_mode () == VOIDmode)
{
- error ("-fcheck-pointer-bounds is not supported for this target");
+ error_at (UNKNOWN_LOCATION,
+   "-fcheck-pointer-bounds is not supported for this target");
  flag_check_pointer_bounds = 0;
}
 
   if (flag_sanitize & SANITIZE_ADDRESS)
{
- error ("-fcheck-pointer-bounds is not supported with "
-"Address Sanitizer");
+ error_at (UNKNOWN_LOCATION,
+   "-fcheck-pointer-bounds is not supported with "
+   "Address Sanitizer");
  flag_check_pointer_bounds = 0;
}
 }
@@ -1270,7 +1275,8 @@ process_options (void)
   if (!abi_version_at_least (2))
 {
   /* -fabi-version=1 support was removed after GCC 4.9.  */
-  error ("%<-fabi-version=1%> is no longer supported");
+  error_at (UNKNOWN_LOCATION,
+   "%<-fabi-version=1%> is no longer supported");
   flag_abi_version = 2;
 }
 
@@ -1297,10 +1303,12 @@ process_options (void)
   /* Warn about options that are not supported on this machine.  */
 #ifndef INSN_SCHEDULING
   if (flag_schedule_insns || flag_schedule_insns_after_reload)
-warning (0, "instruction scheduling 

Re: [PATCH] Fix -Werror= handling for Joined warnings, add a few missing Warning keywords (PRs c/48088, c/68657)

2015-12-04 Thread Bernd Schmidt

I think marking stuff with Warning as appropriate qualifies as obvious.

On 12/04/2015 05:37 PM, Jakub Jelinek wrote:

+ /* If the switch takes an integer, convert it.  */
+ if (arg && cl_options[opt_index].cl_uinteger)
+   {
+ value = integral_argument (arg);
+ if (value == -1)
+   return;
+   }


So does this issue an error message anywhere or just silently drop the 
option on the floor if the argument is invalid?
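One way to avoid dropping it silently (just a sketch of what I mean, and
assuming error_at and the surrounding loc/opt_index parameters are usable
at this point) would be:

  value = integral_argument (arg);
  if (value == -1)
    {
      error_at (loc, "invalid argument %qs to %qs",
                arg, cl_options[opt_index].opt_text);
      return;
    }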



+ /* If the switch takes an enumerated argument, convert it.  */
+ if (arg && (cl_options[opt_index].var_type == CLVC_ENUM))


Unnecessary parens.


Bernd


[PATCH, committed] Fix PR fortran/68684

2015-12-04 Thread Steve Kargl
I've committed the obvious patch after confirmation from
the original author that it is correct.
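For the record, the nature of the faux pas (a reduced illustration in plain
C, not the gfortran sources): a single enum value can never compare equal to
two different enumerators at once, so with "&&" the whole EVENT_POST/
EVENT_WAIT branch was dead code and its checks were silently skipped.

  enum op_kind { EXEC_EVENT_POST, EXEC_EVENT_WAIT, EXEC_LOCK };

  static int
  event_checks_apply (enum op_kind op)
  {
    /* Old, always false:  op == EXEC_EVENT_POST && op == EXEC_EVENT_WAIT  */
    return op == EXEC_EVENT_POST || op == EXEC_EVENT_WAIT;
  }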

2015-12-04  Steven G. Kargl  

PR fortran/68684
* resolve.c (resolve_lock_unlock_event): Fix logic faux pas.

Index: resolve.c
===
--- resolve.c   (revision 231243)
+++ resolve.c   (working copy)
@@ -8745,7 +8745,7 @@ resolve_lock_unlock_event (gfc_code *cod
  !gfc_is_coindexed (code->expr1
 gfc_error ("Lock variable at %L must be a scalar of type LOCK_TYPE",
   &code->expr1->where);
-  else if ((code->op == EXEC_EVENT_POST && code->op == EXEC_EVENT_WAIT)
+  else if ((code->op == EXEC_EVENT_POST || code->op == EXEC_EVENT_WAIT)
   && (code->expr1->ts.type != BT_DERIVED
   || code->expr1->expr_type != EXPR_VARIABLE
   || code->expr1->ts.u.derived->from_intmod
-- 
Steve


Re: Ping [PATCH] c++/42121 - diagnose invalid flexible array members

2015-12-04 Thread Joseph Myers
On Thu, 3 Dec 2015, Martin Sebor wrote:

> The only C change in this patch is to include the size of excessively
> large types in diagnostics (I found knowing the size helpful when
> adding tests and I think it might be helpful to others as well).

I don't see what that C change has to do with flexible array members.  
Could you post it separately with its own testcases, or explain why the C 
and C++ parts depend on each other?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFA][PATCH] Run CFG cleanups after reassociation as needed

2015-12-04 Thread Jeff Law

On 12/04/2015 03:19 AM, Richard Biener wrote:

On Thu, Dec 3, 2015 at 6:54 PM, Jeff Law  wrote:

This is something I noticed while working on fixing 67816.

Essentially I was seeing trivially true or trivially false conditionals left
in the IL for DOM to clean up.

While DOM can and will clean that crud up, a trivially true or trivially
false conditional ought to be detected and cleaned up by cleanup_cfg.

It turns out the reassociation pass does not schedule a CFG cleanup even in
cases where it optimizes a conditional to TRUE or FALSE.

Bubbling up an indicator that we optimized away a conditional and using that
to trigger a CFG cleanup is trivial.
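The shape of the change, roughly (identifiers here are illustrative rather
than the exact patch): the folding code records that it removed a
conditional, and the pass's return value requests the cleanup.

  static bool cfg_cleanup_needed;

  static unsigned int
  execute_reassoc (void)
  {
    cfg_cleanup_needed = false;
    init_reassoc ();
    do_reassoc ();  /* sets cfg_cleanup_needed when a cond is folded away */
    fini_reassoc ();
    return cfg_cleanup_needed ? TODO_cleanup_cfg : 0;
  }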

While I have a slight preference to see this fix in GCC 6, if folks object
and want this to wait for GCC 7 stage1, I'd understand.

Bootstrapped and regression tested on x86_64-linux-gnu.

OK for the trunk?


Ok.  [I always hoped we could at some point assert we don't have trivially
optimizable control flow in the IL]
Presumably via verification that runs when a pass has completed but has not
set TODO_cleanup_cfg?


Jeff



[PATCH] Add XFAIL to g++.dg/template/ref3.C (PR c++/68699)

2015-12-04 Thread David Malcolm
On Fri, 2015-12-04 at 11:01 -0500, Jason Merrill wrote:
> On 12/03/2015 05:08 PM, David Malcolm wrote:
> > On Thu, 2015-12-03 at 15:38 -0500, Jason Merrill wrote:
> >> On 12/03/2015 09:55 AM, David Malcolm wrote:
> >>> Testcase g++.dg/template/ref3.C:
> >>>
> >>>1  // PR c++/28341
> >>>2
> >>>3  template<const int&> struct A {};
> >>>4
> >>>5  template<typename T> struct B
> >>>6  {
> >>>7  A<(T)0> b; // { dg-error "constant|not a valid" }
> >>>8  A<T(0)> a; // { dg-error "constant|not a valid" }
> >>>9  };
> >>>   10
> >>>   11  B<const int&> b;
> >>>
> >>> The output of this test for both c++11 and c++14 is unaffected
> >>> by the patch kit:
> >>>g++.dg/template/ref3.C: In instantiation of 'struct B<const int&>':
> >>>g++.dg/template/ref3.C:11:15:   required from here
> >>>g++.dg/template/ref3.C:7:11: error: '0' is not a valid template 
> >>> argument for type 'const int&' because it is not an lvalue
> >>>g++.dg/template/ref3.C:8:11: error: '0' is not a valid template 
> >>> argument for type 'const int&' because it is not an lvalue
> >>>
> >>> However, the c++98 output is changed:
> >>>
> >>> Status quo for c++98:
> >>> g++.dg/template/ref3.C: In instantiation of 'struct B<const int&>':
> >>> g++.dg/template/ref3.C:11:15:   required from here
> >>> g++.dg/template/ref3.C:7:11: error: a cast to a type other than an 
> >>> integral or enumeration type cannot appear in a constant-expression
> >>> g++.dg/template/ref3.C:8:11: error: a cast to a type other than an 
> >>> integral or enumeration type cannot appear in a constant-expression
> >>>
> >>> (line 7 and 8 are at the closing semicolon for fields b and a)
> >>>
> >>> With the patchkit for c++98:
> >>> g++.dg/template/ref3.C: In instantiation of 'struct B<const int&>':
> >>> g++.dg/template/ref3.C:11:15:   required from here
> >>> g++.dg/template/ref3.C:7:5: error: a cast to a type other than an 
> >>> integral or enumeration type cannot appear in a constant-expression
> >>> g++.dg/template/ref3.C:7:5: error: a cast to a type other than an 
> >>> integral or enumeration type cannot appear in a constant-expression
> >>>
> >>> So the 2nd:
> >>> "error: a cast to a type other than an integral or enumeration type 
> >>> cannot appear in a constant-expression"
> >>> moves from line 8 to line 7 (and moves them to earlier, having ranges)
> >>>
> >>> What's happening is that cp_parser_enclosed_template_argument_list
> >>> builds a CAST_EXPR, the first time from cp_parser_cast_expression,
> >>> the second time from cp_parser_functional_cast; these have locations
> >>> representing the correct respective caret, i.e.:
> >>>
> >>>  A<(T)0> b;
> >>>^~~~
> >>>
> >>> and:
> >>>
> >>>  A a;
> >>>^~~~
> >>>
> >>> Eventually finish_template_type is called for each, to build a 
> >>> RECORD_TYPE,
> >>> and we get a cache hit the 2nd time through here in pt.c:
> >>> 8281        hash = spec_hasher::hash (&elt);
> >>> 8282        entry = type_specializations->find_with_hash (&elt, hash);
> >>> 8283
> >>> 8284        if (entry)
> >>> 8285          return entry->spec;
> >>>
> >>> due to:
> >>> template_args_equal (ot=, nt= >>> 0x719bc480>) at ../../src/gcc/cp/pt.c:7778
> >>> which calls:
> >>> cp_tree_equal (t1=, t2= >>> 0x719bc480>) at ../../src/gcc/cp/tree.c:2833
> >>> and returns equality.
> >>>
> >>> Hence we get a single RECORD_TYPE for the type A<(T)(0)>, and hence
> >>> when issuing the errors it uses the TREE_VEC for the first one,
> >>> using the location of the first line.
> >>
> >> Why does the type sharing affect where the parser gives the error?
> >
> > I believe what's happening is that the patchkit is setting location_t
> > values for more expressions than before, including the expression for
> > the template param.  pt.c:tsubst_expr has this:
> >
> >if (EXPR_HAS_LOCATION (t))
> >  input_location = EXPR_LOCATION (t);
> >
> > I believe that before (in the status quo), the substituted types didn't
> > have location_t values, and hence the above conditional didn't fire;
> > input_location was coming from a *token* where the expansion happened,
> > hence we got an error message on the relevant line for each expansion.
> >
> > With the patch, the substituted types have location_t values within
> > their params, hence the conditional above fires: input_location is
> > updated to use the EXPR_LOCATION, which comes from that of the param
> > within the type - but with type-sharing it's using the first place where
> > the type is created.
> >
> > Perhaps a better fix is for cp_parser_non_integral_constant_expression
> > to take a location_t, rather than have it rely on input_location?
>
> Ah, I see, the error is coming from tsubst_copy_and_build, not
> cp_parser_non_integral_constant_expression.  So indeed this is an effect
> of the canonicalization of template instances, and we aren't going to
> fix it in the context of this patchset.  But this is still a bug, so I'd
> rather 

[PATCH] Fix -Werror= handling for Joined warnings, add a few missing Warning keywords (PRs c/48088, c/68657)

2015-12-04 Thread Jakub Jelinek
Hi!

GCC changed recently to disallow -Werror= on W options that don't have
the Warning keyword set on them; the following patch fixes 3 spots where
Warning has been unintentionally omitted.  The PR also mentions -Wpsabi,
but I believe -Werror=psabi is not appropriate, because -Wpsabi is not
really a warning, but just a set of inform calls, which can't be turned
into an error anyway.

While writing a testcase for this, I have noticed that
-Werror=frame-larger-than=65536 is broken: we don't pass the argument
through, and it acts as -Werror=frame-larger-than=1.
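A testcase of roughly this shape shows it (a sketch; the new pr68657-*
tests may differ in detail): the frame here is far below 65536 bytes, yet
before the fix it is rejected as if the limit were 1.

  /* { dg-do compile } */
  /* { dg-options "-Werror=frame-larger-than=65536" } */

  int
  foo (void)
  {
    char buf[1024];  /* small frame, must compile without any error */
    __builtin_memset (buf, 0, sizeof buf);
    return buf[10];
  }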

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2015-12-04  Jakub Jelinek  

PR c/48088
PR c/68657
* common.opt (Wframe-larger-than=): Add Warning.
* opts.h (control_warning_option): Add ARG argument.
* opts-common.c (control_warning_option): Likewise.
If non-NULL, decode it if needed and pass through
to handle_generated_option.
* opts.c (common_handle_option): Adjust control_warning_option
caller.
(enable_warning_as_error): Likewise.
c-family/
* c.opt (Wfloat-conversion, Wsign-conversion): Add Warning.
* c-pragma.c (handle_pragma_diagnostic): Adjust
control_warning_option caller.
ada/
* gcc-interface/trans.c (Pragma_to_gnu): Adjust
control_warning_option caller.
testsuite/
* c-c++-common/pr68657-1.c: New test.
* c-c++-common/pr68657-2.c: New test.
* c-c++-common/pr68657-3.c: New test.

--- gcc/common.opt.jj   2015-12-02 20:26:59.0 +0100
+++ gcc/common.opt  2015-12-04 09:59:10.050125152 +0100
@@ -576,7 +576,7 @@ Common Var(flag_fatal_errors)
 Exit on the first error occurred.
 
 Wframe-larger-than=
-Common RejectNegative Joined UInteger
+Common RejectNegative Joined UInteger Warning
-Wframe-larger-than=<number>   Warn if a function's stack frame requires more
than <number> bytes.
 
 Wfree-nonheap-object
--- gcc/opts.h.jj   2015-11-14 19:35:57.0 +0100
+++ gcc/opts.h  2015-12-04 13:29:43.945655448 +0100
@@ -363,7 +363,7 @@ extern void read_cmdline_option (struct
 const struct cl_option_handlers *handlers,
 diagnostic_context *dc);
 extern void control_warning_option (unsigned int opt_index, int kind,
-   bool imply, location_t loc,
+   const char *arg, bool imply, location_t loc,
unsigned int lang_mask,
const struct cl_option_handlers *handlers,
struct gcc_options *opts,
--- gcc/opts-common.c.jj2015-12-02 20:26:54.0 +0100
+++ gcc/opts-common.c   2015-12-04 13:52:56.664401522 +0100
@@ -1332,8 +1332,8 @@ get_option_state (struct gcc_options *op
used by -Werror= and #pragma GCC diagnostic.  */
 
 void
-control_warning_option (unsigned int opt_index, int kind, bool imply,
-   location_t loc, unsigned int lang_mask,
+control_warning_option (unsigned int opt_index, int kind, const char *arg,
+   bool imply, location_t loc, unsigned int lang_mask,
const struct cl_option_handlers *handlers,
struct gcc_options *opts,
struct gcc_options *opts_set,
@@ -1349,8 +1349,38 @@ control_warning_option (unsigned int opt
 {
   /* -Werror=foo implies -Wfoo.  */
   if (cl_options[opt_index].var_type == CLVC_BOOLEAN)
-   handle_generated_option (opts, opts_set,
-opt_index, NULL, 1, lang_mask,
-kind, loc, handlers, dc);
+   {
+ int value = 1;
+
+ /* If the switch takes an integer, convert it.  */
+ if (arg && cl_options[opt_index].cl_uinteger)
+   {
+ value = integral_argument (arg);
+ if (value == -1)
+   return;
+   }
+
+ /* If the switch takes an enumerated argument, convert it.  */
+ if (arg && (cl_options[opt_index].var_type == CLVC_ENUM))
+   {
+ const struct cl_enum *e
+   = &cl_enums[cl_options[opt_index].var_enum];
+
+ if (enum_arg_to_value (e->values, arg, &value, lang_mask))
+   {
+ const char *carg = NULL;
+
+   if (enum_value_to_arg (e->values, &carg, value, lang_mask))
+   arg = carg;
+ gcc_assert (carg != NULL);
+   }
+ else
+   return;
+   }
+
+ handle_generated_option (opts, opts_set,
+  opt_index, arg, value, lang_mask,
+  kind, loc, handlers, dc);
+   }
 }
 }
--- gcc/opts.c.jj   2015-11-23 13:29:49.0 +0100
+++ gcc/opts.c  2015-12-04 13:39:57.986390792 +0100
@@ -2114,7 +2114,7 @@ 

Re: -fstrict-aliasing fixes 6/6: permit inlining of comdats

2015-12-04 Thread Jan Hubicka
> 
> I wonder if you can split out the re-naming at this stage.  Further
> comments below.

OK, I will commit the renaming and ipa-icf fix separately.
> 
> > Bootstrapped/regtested x86_64-linux, OK?
> > 
> > I will work on some testcases for the ICF and fold-const that would lead
> > to wrong code if alias sets was ignored early.
> 
> Would be nice to have a wrong-code testcase go with the commit.
> 
> > Honza
> > * fold-const.c (operand_equal_p): Before inlining do not permit
> > transformations that would break with strict aliasing.
> > * ipa-inline.c (can_inline_edge_p) Use merged_comdat.
> > * ipa-inline-transform.c (inline_call): When inlining merged comdat do
> > not drop strict_aliasing flag of caller.
> > * cgraphclones.c (cgraph_node::create_clone): Use merged_comdat.
> > * cgraph.c (cgraph_node::dump): Dump merged_comdat.
> > * ipa-icf.c (sem_function::merge): Drop merged_comdat when merging
> > comdat and non-comdat.
> > * cgraph.h (cgraph_node): Rename merged to merged_comdat.
> > * ipa-inline-analysis.c (simple_edge_hints): Check both merged_comdat
> > and icf_merged.
> > 
> > * lto-symtab.c (lto_cgraph_replace_node): Update code computing
> > merged_comdat.
> > Index: fold-const.c
> > ===
> > --- fold-const.c(revision 231239)
> > +++ fold-const.c(working copy)
> > @@ -2987,7 +2987,7 @@ operand_equal_p (const_tree arg0, const_
> >flags)))
> > return 0;
> >   /* Verify that accesses are TBAA compatible.  */
> > - if (flag_strict_aliasing
> > + if ((flag_strict_aliasing || !cfun->after_inlining)
> >   && (!alias_ptr_types_compatible_p
> > (TREE_TYPE (TREE_OPERAND (arg0, 1)),
> >  TREE_TYPE (TREE_OPERAND (arg1, 1)))
> 
> Sooo  first of all the code is broken anyway as it guards
> the restrict checking (MR_DEPENDENCE_*) stuff with flag_strict_aliasing
> (ick).  Second, I wouldn't mind if we drop the flag_strict_aliasing
> check alltogether, a cfun->after_inlining checks makes me just too
> nervous.

OK, I will drop the check separately, too.  
In next stage1 we need to look into code merging across alias classes; ipa-icf
scores on Firefox are currently 40% down compared to GCC 5.
> 
> So your logic relies on the fact that the -fno-strict-aliasing was
> not necessary on copy A if copy B was compiled without that flag
> because otherwise copy B would invoke undefined behavior?

Yes.
> 
> This menans it's a language semantics thing but you simply look at
> whether it's "comdat"?  Shouldn't this use some ODR thing instead?

It is the definition of COMDAT.  COMDAT functions are output in every unit
that uses them, and no matter which body wins, the linking is correct.  Only
C++ produces comdat functions, so they all comply with the ODR, and we can
rely on the fact that all function bodies should be equivalent at the source
level.
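A concrete picture of what I mean (file and function names made up for the
example): an inline function defined in a header is emitted as a comdat in
every translation unit that uses it, and the linker keeps a single copy.

  // point.h
  inline int norm2 (int x, int y) { return x * x + y * y; }

  // a.C is built with -fstrict-aliasing, b.C with -fno-strict-aliasing;
  // both include point.h and both emit a comdat copy of norm2.  Whichever
  // copy the linker keeps has to work for both users, which is what lets
  // us keep the caller's flags when inlining a merged comdat.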
> 
> Also as undefined behavior only applies at runtime consider copy A
> (with -fno-strict-aliasing) is used in contexts where undefined
> behavior would occur while copy B not.  Say,
> 
> int foo (int *p, short *q)
> {
>   *p = 1;
>   return *q;
> }
> 
> and the copy A use is foo (, ) while the copy B use foo (, ).
> 
> Yes, the case is lame here as we'd miscompile this in copy B and
> comdat makes us eventually use that copy for A.  But if we don't
> manage to miscompile this without inlining there isn't any undefined
> behavior (at runtime) you can rely on.

Well, it is an ODR violation in this case :)
> 
> Just want to know whether you thought about the above cases, I would
> declare them invalid but I am not sure the C++ standard agrees here.

Well, not exactly of the case mentioned above, but I still think that this is
safe (ugly, too).  An alternative is to keep the bodies around until after
inlining.  I have infrastructure for that in my tree, but it is hard to tune:
first, the alternative function body may have different symbol references
(as a result of different early inlining) which may not be resolved in the
current binary, so we cannot use it at all.  Second, keeping many alternatives
of every body around makes the code size estimates in the inliner go crazy.

Honza


[AArch64] Add register constraints to add<mode>3_pluslong

2015-12-04 Thread James Greenhalgh

Hi,

This patch fixes a bug I spotted in the add<mode>3_pluslong insn_and_split
pattern.  We need to give register constraints, otherwise the register
allocator can do whatever it likes. This manifests as an ICE on AArch64
with -mabi=ilp32:

gcc foo.c -O2 -mabi=ilp32

  error: could not split insn
   }
   ^

  (insn:TI 85 95 7 (set (mem/c:DI (plus:DI (reg/f:DI 29 x29)
  (const_int 40 [0x28])) [1 %sfp+-65528 S8 A64])
  (plus:DI (plus:DI (reg/f:DI 29 x29)
  (const_int 16 [0x10]))
  (const_int 65552 [0x10010]))) foo.c:7 95 {*adddi3_pluslong}
   (nil))

The patch simply constrains the pattern to use w/x registers.

Bootstrapped on aarch64-none-linux-gnu and cross-tested on aarch64-none-elf
with no issues.

OK?

Thanks,
James

---
gcc/

2015-12-04  James Greenhalgh  <james.greenha...@arm.com>

* config/aarch64/aarch64.md (add<mode>3_pluslong): Add register
constraints.

gcc/testsuite/

2015-12-04  James Greenhalgh  <james.greenha...@arm.com>

* gcc.c-torture/compile/20151204.c: New.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 765df6a..79d1414 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1613,9 +1613,9 @@
 
(define_insn_and_split "*add<mode>3_pluslong"
   [(set
-(match_operand:GPI 0 "register_operand" "")
-(plus:GPI (match_operand:GPI 1 "register_operand" "")
-	  (match_operand:GPI 2 "aarch64_pluslong_operand" "")))]
+(match_operand:GPI 0 "register_operand" "=r")
+(plus:GPI (match_operand:GPI 1 "register_operand" "r")
+	  (match_operand:GPI 2 "aarch64_pluslong_immediate" "i")))]
   "!aarch64_plus_operand (operands[2], VOIDmode)
&& !aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)"
   "#"
diff --git a/gcc/testsuite/gcc.c-torture/compile/20151204.c b/gcc/testsuite/gcc.c-torture/compile/20151204.c
new file mode 100644
index 000..4a05671
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/20151204.c
@@ -0,0 +1,19 @@
+typedef __SIZE_TYPE__ size_t;
+
+int strcmp (const char*, const char*);
+void *memchr (const void *, int, size_t);
+char* strncpy (char *, const char *, size_t);
+
+int
+main(int argc, char** argv)
+{
+  char target[32753] = "A";
+  char buffer[32753];
+  char *x;
+  x = buffer;
+
+  if (strcmp (target, "A")
+  || memchr (target, 'A', 0) != ((void *) 0))
+if (strncpy (x, "", 4) != x);
+  return 0;
+}


Re: [AArch64] Add register constraints to add<mode>3_pluslong

2015-12-04 Thread Marcus Shawcroft
On 4 December 2015 at 19:42, James Greenhalgh <james.greenha...@arm.com> wrote:
>
> Hi,
>
> This patch fixes a bug I spotted in the add<mode>3_pluslong insn_and_split
> pattern. We need to give register constraints, otherwise the register
> allocator can do whatever it likes. This manifests as an ICE on AArch64
> with -mabi=ilp32:
>
> gcc foo.c -O2 -mabi=ilp32
>
>   error: could not split insn
>}
>^
>
>   (insn:TI 85 95 7 (set (mem/c:DI (plus:DI (reg/f:DI 29 x29)
>   (const_int 40 [0x28])) [1 %sfp+-65528 S8 A64])
>   (plus:DI (plus:DI (reg/f:DI 29 x29)
>   (const_int 16 [0x10]))
>   (const_int 65552 [0x10010]))) foo.c:7 95 {*adddi3_pluslong}
>(nil))
>
> The patch simply constrains the pattern to use w/x registers.
>
> Bootstrapped on aarch64-none-linux-gnu and cross-tested on aarch64-none-elf
> with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-12-04  James Greenhalgh  <james.greenha...@arm.com>
>
> * config/aarch64/aarch64.md (add<mode>3_pluslong): Add register
> constraints.
>
> gcc/testsuite/
>
> 2015-12-04  James Greenhalgh  <james.greenha...@arm.com>
>
> * gcc.c-torture/compile/20151204.c: New.
>

+main(int argc, char** argv)

Space before (.

OK
/M


[PATCH] Adjust vect-widen-mult-const-[su]16.c for r226675

2015-12-04 Thread Bill Schmidt
Since r226675, we have been seeing these failures:

FAIL: gcc.dg/vect/vect-widen-mult-const-s16.c -flto -ffat-lto-objects
scan-tree-dump-times vect "pattern recognized" 2
FAIL: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect
"pattern recognized" 2
FAIL: gcc.dg/vect/vect-widen-mult-const-u16.c -flto -ffat-lto-objects
scan-tree-dump-times vect "pattern recognized" 2
FAIL: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect
"pattern recognized" 2

Comparing the vect-details dumps from r226674 to r226675, I see these as
the reason:

63a64,66
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>  note: vect_recog_mult_pattern: detected:
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>  note: patt_47 = _6 << 2;
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>  note: pattern recognized: patt_47 = _6 << 2;
70a74,76
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>  note: vect_recog_mult_pattern: detected:
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>  note: patt_40 = _6 << 1;
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>  note: pattern recognized: patt_40 = _6 << 1;

747a754,756
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>  note: vect_recog_mult_pattern: detected:
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>  note: patt_47 = _6 << 2;
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>  note: pattern recognized: patt_47 = _6 << 2;
754a764,766
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>  note: vect_recog_mult_pattern: detected:
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>  note: patt_40 = _6 << 1;
> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>  note: pattern recognized: patt_40 = _6 << 1;

These seem precisely what's expected, given the nature of the patch,
which is looking for these opportunities.  So it's likely that we should
just change

/* { dg-final { scan-tree-dump-times "pattern recognized" 2
"vect" { target vect_widen_mult_hi_to_si_pattern } } } */

to 

/* { dg-final { scan-tree-dump-times "pattern recognized" 6
"vect" { target vect_widen_mult_hi_to_si_pattern } } } */

and similarly for the unsigned case.  The following patch does this.
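For reference, the shape of loop the test exercises (a reduced example, not
the testcase itself): multiplying the widened value by a suitable constant
is what vect_recog_mult_pattern now rewrites as shifts, producing the extra
"pattern recognized" lines counted above.

  void
  array_mult (short *x, int *y, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      y[i] = x[i] * 4;  /* recognizable as (int) x[i] << 2 */
  }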
However, I wanted to run this by Venkat since this was apparently not
detected when his patch went in.  This doesn't appear to be a
target-specific issue, and most targets support
vect_widen_mult_hi_to_si_pattern, so I'm not sure why this wasn't fixed
with the original patch.  Will this change break on any other targets
for some reason?

Tested on powerpc64le-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


[gcc/testsuite]

2015-12-04  Bill Schmidt  

* gcc.dg/vect/vect-widen-mult-const-s16.c: Change number of
occurrences of "pattern recognized" to 6.
* gcc.dg/vect/vect-widen-mult-const-u16.c: Likewise.


Index: gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
===
--- gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c   (revision 
231278)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c   (working copy)
@@ -56,5 +56,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target 
vect_widen_mult_hi_to_si } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 
2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
-/* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target 
vect_widen_mult_hi_to_si_pattern } } } */
+/* { dg-final { scan-tree-dump-times "pattern recognized" 6 "vect" { target 
vect_widen_mult_hi_to_si_pattern } } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
===
--- gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c   (revision 
231278)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c   (working copy)
@@ -73,4 +73,4 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect" { target 
vect_widen_mult_hi_to_si } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 
2 "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
-/* { dg-final { scan-tree-dump-times "pattern recognized" 2 "vect" { target 
vect_widen_mult_hi_to_si_pattern } } } */
+/* { dg-final { scan-tree-dump-times "pattern recognized" 8 

RE: [PATCH] enable loop fusion on isl-15

2015-12-04 Thread Sebastian Paul Pop
I would highly recommend updating the required version of ISL to isl-0.15:
that would simplify the existing code, removing a lot of code under
"#ifdef old ISL", and allow us to fully transition to schedule_trees instead
of dealing with the antiquated union_maps in the scheduler.  The result is
faster compilation time.

Thanks,
Sebastian

-Original Message-
From: Mike Stump [mailto:mikest...@comcast.net] 
Sent: Friday, December 04, 2015 12:03 PM
To: Alan Lawrence
Cc: Sebastian Pop; seb...@gmail.com; gcc-patches@gcc.gnu.org;
hiradi...@msn.com
Subject: Re: [PATCH] enable loop fusion on isl-15

On Dec 4, 2015, at 5:13 AM, Alan Lawrence  wrote:
> On 05/11/15 21:43, Sebastian Pop wrote:
>>* graphite-optimize-isl.c (optimize_isl): Call
>>isl_options_set_schedule_maximize_band_depth.
>> 
>>* gcc.dg/graphite/fuse-1.c: New.
>>* gcc.dg/graphite/fuse-2.c: New.
>>* gcc.dg/graphite/interchange-13.c: Remove bogus check.
> 
> I note that the test
> 
> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1
> 
> FAILs under isl-0.14, with which GCC can still be built and generally
claims to work.
> 
> Is it worth trying to detect this in the testsuite, so we can XFAIL it? By
which I mean, is there a reasonable testsuite mechanism by which we could do
that?

You can permanently ignore it by updating to 0.15?  I don't see the
advantage of bothering to finesse this too much.  I don't know of a way to
detect 14 v 15 other than this test case, but, if you do that, you can't use
that result to gate this test case.  If one wanted to engineer in a way, one
would expose the isl version via a preprocessor symbol (built in), and then
the test case would use that to gate it.  If we had to fix it, I think I'd
prefer we just raise the isl version to 15 or later and be done with it.
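To sketch that last idea (the macro name below is invented; GCC does not
define anything like it today): if the isl version GCC was configured
against were exported as a built-in preprocessor symbol, the test case
itself could gate on it, e.g.

  #if defined (__GCC_ISL_VERSION__) && __GCC_ISL_VERSION__ >= 15
  /* code whose dump lines the fusion test expects */
  #endif

but simply requiring isl 0.15 or later avoids the extra machinery.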


