Re: [PATCH] Fix x86_64 fix_debug_reg_uses (PR rtl-optimization/78575)

2016-12-01 Thread Uros Bizjak
On Tue, Nov 29, 2016 at 8:41 PM, Jakub Jelinek  wrote:
> Hi!
>
> The x86_64 stv pass uses PUT_MODE to change REGs and MEMs in place to affect
> all setters and users, but that is undesirable in debug insns which are
> intentionally ignored during the analysis and we should keep using correct
> modes (TImode) instead of the new one (V1TImode).
>
> The current fix_debug_reg_uses implementation just assumes such a pseudo
> can appear only directly in the VAR_LOCATION's second operand, but it can of
> course appear anywhere in the expression, the whole expression doesn't have
> to be TImode either (e.g. on the testcase it is a QImode comparison of
> originally TImode pseudo with CONST_INT, which stv incorrectly changes into
> comparison of V1TImode with CONST_INT).
>
> The following patch fixes that and also fixes an issue if the pseudo appears
> multiple times in the debug info that the rescan could break traversal of
> further uses.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-11-29  Jakub Jelinek  
>
> PR rtl-optimization/78575
> * config/i386/i386.c (timode_scalar_chain::fix_debug_reg_uses): Use
> DF infrastructure to wrap all V1TImode reg uses into TImode subreg
> if not already wrapped in a subreg.  Make sure df_insn_rescan does not
> affect further iterations.
>
> * gcc.dg/pr78575.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2016-11-28 10:59:08.0 +0100
> +++ gcc/config/i386/i386.c  2016-11-29 08:31:58.061278522 +0100
> @@ -3831,30 +3831,32 @@ timode_scalar_chain::fix_debug_reg_uses
>if (!flag_var_tracking)
>  return;
>
> -  df_ref ref;
> -  for (ref = DF_REG_USE_CHAIN (REGNO (reg));
> -   ref;
> -   ref = DF_REF_NEXT_REG (ref))
> +  df_ref ref, next;
> +  for (ref = DF_REG_USE_CHAIN (REGNO (reg)); ref; ref = next)
>  {
>rtx_insn *insn = DF_REF_INSN (ref);
> +  /* Make sure the next ref is for a different instruction,
> + so that we're not affected by the rescan.  */
> +  next = DF_REF_NEXT_REG (ref);
> +  while (next && DF_REF_INSN (next) == insn)
> +   next = DF_REF_NEXT_REG (next);
> +
>if (DEBUG_INSN_P (insn))
> {
>   /* It may be a debug insn with a TImode variable in
>  register.  */
> - rtx val = PATTERN (insn);
> - if (GET_MODE (val) != TImode)
> -   continue;
> - gcc_assert (GET_CODE (val) == VAR_LOCATION);
> - rtx loc = PAT_VAR_LOCATION_LOC (val);
> - /* It may have been converted to TImode already.  */
> - if (GET_MODE (loc) == TImode)
> -   continue;
> - gcc_assert (REG_P (loc)
> - && GET_MODE (loc) == V1TImode);
> - /* Convert V1TImode register, which has been updated by a SET
> -insn before, to SUBREG TImode.  */
> - PAT_VAR_LOCATION_LOC (val) = gen_rtx_SUBREG (TImode, loc, 0);
> - df_insn_rescan (insn);
> + bool changed = false;
> + for (; ref != next; ref = DF_REF_NEXT_REG (ref))
> +   {
> + rtx *loc = DF_REF_LOC (ref);
> + if (REG_P (*loc) && GET_MODE (*loc) == V1TImode)
> +   {
> + *loc = gen_rtx_SUBREG (TImode, *loc, 0);
> + changed = true;
> +   }
> +   }
> + if (changed)
> +   df_insn_rescan (insn);
> }
>  }
>  }
> --- gcc/testsuite/gcc.dg/pr78575.c.jj   2016-11-29 08:36:25.821932436 +0100
> +++ gcc/testsuite/gcc.dg/pr78575.c  2016-11-29 08:35:35.0 +0100
> @@ -0,0 +1,16 @@
> +/* PR rtl-optimization/78575 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -g -Wno-psabi" } */
> +
> +typedef unsigned __int128 V __attribute__((vector_size(64)));
> +
> +V g;
> +
> +void
> +foo (V v)
> +{
> +  unsigned __int128 x = 1;
> +  int c = v[1] <= ~x;
> +  v &= v[1];
> +  g = v;
> +}
>
> Jakub


Re: [PATCH] Another debug info stv fix (PR rtl-optimization/78547)

2016-12-01 Thread Uros Bizjak
On Thu, Dec 1, 2016 at 11:58 PM, Jeff Law  wrote:
> On 11/30/2016 11:59 AM, Jakub Jelinek wrote:
>>
>> On Wed, Nov 30, 2016 at 08:01:11AM +0100, Uros Bizjak wrote:
>>>
>>> On Tue, Nov 29, 2016 at 8:44 PM, Jakub Jelinek  wrote:

 Hi!

 The following testcase ICEs because DECL_RTL/DECL_INCOMING_RTL are
 adjusted
 by the stv pass through the PUT_MODE modifications, which means that for
 var-tracking.c they contain a bogus mode.

 Fixed by wrapping those into TImode subreg or adjusting the MEMs to have
 the
 correct mode.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

 2016-11-29  Jakub Jelinek  

 PR rtl-optimization/78547
 * config/i386/i386.c (convert_scalars_to_vectors): If any
 insns have been converted, adjust all parameter's DEC_RTL and
 DECL_INCOMING_RTL back from V1TImode to TImode if the parameters
 have
 TImode.
>>>
>>>
>>> LGTM.
>>
>>
>> This patch actually has been working around IMHO broken rtl sharing of MEM
>> between DECL_INCOMING_RTL and some REG_EQUAL note.
>>
>> Here is an updated patch that avoids this sharing (the middle-end part)
>> and
>> thus can remove the MEM handling and just keep REG handling in
>> convert_scalars_to_vectors.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> 2016-11-29  Jakub Jelinek  
>>
>> PR rtl-optimization/78547
>> * emit-rtl.c (unshare_all_rtl): Make sure DECL_RTL and
>> DECL_INCOMING_RTL is not shared.
>> * config/i386/i386.c (convert_scalars_to_vectors): If any
>> insns have been converted, adjust all parameter's DEC_RTL and
>> DECL_INCOMING_RTL back from V1TImode to TImode if the parameters
>> have
>> TImode.
>
> emit-rtl bits are OK.  Uros has the call on the x86 bits.

Yes, still OK.

Uros.


Avoid alloca(0) when temporarily propagating operands during threading

2016-12-01 Thread Jeff Law


Martin's alloca work flagged this code as problematical.  Essentially if 
we had a statement with no operands and the statement was not in the 
hash table, then we could end up performing alloca (0), which is 
inadvisable.


We can safely just avoid the whole block of code if there are no 
operands.  Bootstrapped and regression tested on x86_64-linux-gnu.


Installed on the trunk.

Jeff
commit 790bfbe1974f0b8d1cb23f73635221f4ccb4dafe
Author: law 
Date:   Fri Dec 2 06:40:57 2016 +

* tree-ssa-threadedge.c
(record_temporary_equivalences_from_stmts_at_dest): Avoid temporary
propagation of operands if there are no operands.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@243152 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c3170c0..75881ee 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2016-12-01  Jeff Law  
+
+   * tree-ssa-threadedge.c
+   (record_temporary_equivalences_from_stmts_at_dest): Avoid temporary
+   propagation of operands if there are no operands.
+
 2016-12-02  Jakub Jelinek  
 
PR tree-optimization/78586
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 534292c..3fdd59e 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -328,9 +328,10 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
 SSA_NAME_VALUE in addition to its own lattice.  */
  cached_lhs = gimple_fold_stmt_to_constant_1 (stmt,
   threadedge_valueize);
-  if (!cached_lhs
-  || (TREE_CODE (cached_lhs) != SSA_NAME
-  && !is_gimple_min_invariant (cached_lhs)))
+  if (NUM_SSA_OPERANDS (stmt, SSA_OP_ALL_USES) != 0
+ && (!cached_lhs
+  || (TREE_CODE (cached_lhs) != SSA_NAME
+  && !is_gimple_min_invariant (cached_lhs
{
  /* We're going to temporarily copy propagate the operands
 and see if that allows us to simplify this statement.  */


Re: [tree-tailcall] Check if function returns it's argument

2016-12-01 Thread Prathamesh Kulkarni
On 2 December 2016 at 03:57, Jeff Law  wrote:
> On 12/01/2016 06:22 AM, Richard Biener wrote:
>>>
>>> Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
>>> differently:
>>>
>>> tail-padding1.C:13:1: error: RESULT_DECL should be read only when
>>> DECL_BY_REFERENCE is set
>>>  }
>>>  ^
>>> while verifying SSA_NAME nrvo_4 in statement
>>> # .MEM_3 = VDEF <.MEM_1(D)>
>>> nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
>>> tail-padding1.C:13:1: internal compiler error: verify_ssa failed
>>
>>
>> Hmm, ok.  Not sure why we enforce this.
>
> I don't know either.  But I would start by looking at tree-nrv.c since it
> looks (based on the variable names) that the named-value-return optimization
> kicked in.
Um, the name nrv0 was in the test-case itself. The transform takes
place in tailr1 pass,
which appears to be before nrv, so possibly this is not related to nrv ?

The verify check seems to be added in r161898 by Honza to fix PR 44813
based on Richard's following suggestion from
https://gcc.gnu.org/ml/gcc-patches/2010-07/msg00358.html:

"We should never see a defintion of a RESULT_DECL SSA name for
DECL_BY_REFERENCE RESULT_DECLs (that would
be a bug - we should add verification to the SSA verifier, can you
do add that?)."

The attached patch moves && ret_stmt together with !ass_var,
and keeps the !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
check, and adjusts tailcall-9.c testcase to scan _\[0-9\]* = __builtin_memcpy
in tailr1 dump since that's where the transform takes place.
Is this version OK ?

Thanks,
Prathamesh
>
>>
>> Note that in the end this patch looks fishy -- iff we really need
>> the LHS on the assignment for correctness if we have the tailcall
>> flag set then what guarantees that later passes do not remove it
>> again?  So anybody removing a LHS would need to unset the tailcall flag?
>>
>> Saying again that I don't know enough about the RTL part of tailcall
>> expansion.
>
> The LHS on the assignment makes it easier to identify when a tail call is
> possible.  It's not needed for correctness.  Not having the LHS on the
> assignment just means we won't get an optimized tail call.
>
> Under what circumstances would the LHS possibly be removed?  We know the
> return statement references the LHS, so it's not going to be something that
> DCE will do.
>
> jeff
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 000..9c482f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailr-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump "_\[0-9\]* = __builtin_memcpy" "tailr1" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..64f624f 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
 return;
@@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev ())
 {
   stmt = gsi_stmt (gsi);
+  if (!ret_stmt)
+   ret_stmt = dyn_cast (stmt);
 
   /* Ignore labels, returns, nops, clobbers and debug stmts.  */
   if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,35 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
{
  call = as_a  (stmt);
  ass_var = gimple_call_lhs (call);
+ if (!ass_var && ret_stmt)
+   {
+ /* Check if function returns one if it's arguments
+and that argument is used as return value.
+In that case create an artificial lhs to call_stmt,
+and set it as the return value.  */
+
+ unsigned rf = gimple_call_return_flags (call);
+ if (rf & ERF_RETURNS_ARG)
+   {
+ unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+ if (argnum < gimple_call_num_args (call))
+   {
+ tree arg = gimple_call_arg (call, argnum);
+ tree retval = gimple_return_retval (ret_stmt);
+ if (retval
+ && TREE_CODE (retval) == SSA_NAME
+ && operand_equal_p (retval, arg, 0)
+ && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)))
+   {
+ ass_var = copy_ssa_name (arg);
+ gimple_call_set_lhs (call, ass_var);
+ update_stmt (call);
+ gimple_return_set_retval (ret_stmt, ass_var);
+ update_stmt (ret_stmt);
+   }
+   }
+   }
+   }
 

[PATCH] fix PR71721

2016-12-01 Thread Waldemar Brodkorb
Hi,

it would be nice if uclinux targets are allowed to enable posix threads.
Together with uClibc-ng/uClibc you can build m68k-nommu toolchain and enable
old Linuxthreads instead of NPTL/TLS. With following change it is possible to 
build
boost, which checks if gcc is build with threads enabled.

Tested with a simple boost application on qemu-system-m68k emulating a coldfire
board without MMU. Other noMMU targets as cortex-m3/cortex-m4 will benefit from
this change, too.

The patch is used in Buildroot for a while without causing issues.

2016-12-02  Waldemar Brodkorb 

   gcc/
   * gcc/config.gcc: Enable posix threads.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 98267d8..6a95009 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -831,6 +831,9 @@ case ${target} in
 *-*-uclinux*)
   extra_options="$extra_options gnu-user.opt"
   use_gcc_stdint=wrap
+  case ${enable_threads} in
+"" | yes | posix) thread_file='posix' ;;
+  esac
   tm_defines="$tm_defines DEFAULT_LIBC=LIBC_UCLIBC SINGLE_LIBC"
   ;;
 *-*-rdos*)


Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-01 Thread Martin Sebor

On 12/01/2016 02:15 AM, Jakub Jelinek wrote:

On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote:

Isn't this too simplistic?  I mean, if you have say dirtype of signed char
and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg
has range 32, 64, while I think your routine will yield -128, 127 (well,
0 as min and -128 as max as that is what you return for signed type).

Can't you subtract argmax - argmin (best just in wide_int, no need to build
trees), and use what you have just for the case where that number doesn't
fit into the narrower precision, otherwise if argmin - (dirtype) argmin
== argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax
as the range, and in case that it crosses a boundary figure if you can do
anything than the above?  Guess all cases of signed/unsigned dirtype and/or
argtype need to be considered.


Richard noted that you could have a look at CONVERT_EXPR_CODE_P
handling in extract_range_from_unary_expr.  I think it is the
  || (vr0.type == VR_RANGE
  && integer_zerop (int_const_binop (RSHIFT_EXPR,
   int_const_binop (MINUS_EXPR, vr0.max, vr0.min),
 size_int (TYPE_PRECISION (outer_type)))
part that is important here for the narrowing conversion.


Thanks, that was a helpful suggestion!  Attached is an update that
follows the vrp approach.  I assume the infinity stuff is specific
to the VRP pass and not something I need to worry about here.

Martin

PS To your earlier question, argmin and argmax have the same meaning
in the added helper function as in its caller.
PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping

gcc/ChangeLog:

	PR middle-end/78622
	* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
	rather than res.bounded.
	(adjust_range_for_overflow): New function.
	(format_integer): Always set res.bounded to true unless either
	precision or width is specified and unknown.
	Call adjust_range_for_overflow.
	(format_directive): Remove vestigial quoting.
	(add_bytes): Correct the computation of boundrange used to
	decide whether a warning is of a "maybe" or "defnitely" kind.

gcc/testsuite/ChangeLog:

	PR middle-end/78622
	* gcc.c-torture/execute/pr78622.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined
	behavior inadvertently introduced in a previous commit.  Tighten
	up final checking.
	* gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity.
	Add test cases.
	* gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and
	add a final optimization check.
	* gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 99a635a..95a8710 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -637,7 +637,7 @@ min_bytes_remaining (unsigned HOST_WIDE_INT navail, const format_result )
   if (HOST_WIDE_INT_MAX <= navail)
 return navail;
 
-  if (1 < warn_format_length || res.bounded)
+  if (1 < warn_format_length || res.knownrange)
 {
   /* At level 2, or when all directives output an exact number
 	 of bytes or when their arguments were bounded by known
@@ -795,6 +795,59 @@ get_width_and_precision (const conversion_spec ,
   *pprec = prec;
 }
 
+/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
+   argument, due to the conversion from either *ARGMIN or *ARGMAX to
+   the type of the directive's formal argument it's possible for both
+   to result in the same number of bytes or a range of bytes that's
+   less than the number of bytes that would result from formatting
+   some other value in the range [*ARGMIN, *ARGMAX].  This can be
+   determined by checking for the actual argument being in the range
+   of the type of the directive.  If it isn't it must be assumed to
+   take on the full range of the directive's type.
+   Return true when the range has been adjusted to the range of
+   DIRTYPE, false otherwise.  */
+
+static bool
+adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
+{
+  unsigned argprec = TYPE_PRECISION (TREE_TYPE (*argmin));
+  unsigned dirprec = TYPE_PRECISION (dirtype);
+
+  /* The logic here was inspired/lifted from  the CONVERT_EXPR_CODE_P
+ branch in the extract_range_from_unary_expr function in tree-vrp.c.
+  */
+
+  if (TREE_CODE (*argmin) == INTEGER_CST
+  && TREE_CODE (*argmax) == INTEGER_CST
+  && (dirprec >= argprec
+	  || integer_zerop (int_const_binop (RSHIFT_EXPR,
+	 int_const_binop (MINUS_EXPR,
+			  *argmax,
+			  *argmin),
+	 size_int (dirprec)
+  {
+*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0, false);
+*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0, false);
+return false;
+  }
+
+  tree dirmin = 

Re: [PATCH 1/9] print_rtx: implement support for reuse IDs (v2)

2016-12-01 Thread David Malcolm
On Thu, 2016-12-01 at 16:05 -0700, Jeff Law wrote:
> On 11/11/2016 02:15 PM, David Malcolm wrote:
> > Posted earlier here:
> >   https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00651.html
> > Link to earlier discussion:
> >   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01801.html
> > 
> > This version:
> > - eliminates the rtx_reuse_manager singleton
> > - eliminates print-rtl-reuse.h, moving class rtx_reuse_manager into
> > print-rtl.h and print-rtl.c
> > - fixes the formatting issues you noted in the earlier patch
> Can you repost just those pieces that still need review?  I know
> various 
> bits have been approved and various bits have iterated.  Just want to
> see what's really still outstanding.

Here's the current status of the kit:

"[PATCH 1/9] print_rtx: implement support for reuse IDs (v2)"
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01188.html
Still needs review


Patches 2-4 are approved.


"[PATCH 5/9] Introduce selftest::locate_file (v4)"
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01186.html
Mostly approved, but some discussion ongoing


"[PATCH 6/9] Split class rtx_reader into md_reader vs rtx_reader"
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01184.html
Approved (by Richard Sandiford)


"[PATCH 7/9] Add RTL-error-handling to host"
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01181.html
Bernd and I are still discussing how best to do this.


"[PATCH 8/9] Introduce class function_reader (v4)"
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01185.html
Bernd and I are discussing this; this has now split into:
  * [PATCH 8a/9] Introduce class function_reader (v6)
* https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00149.html
  * [PATCH 8b/9] Add target-independent selftests of RTL function
reader
* https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00147.html
  * [PATCH 8c/9] Add aarch64-specific selftests for RTL function reader
* https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00148.html
  * [PATCH 8d/9] Add x86_64-specific selftests for RTL function reader
* https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00150.html


"[PATCH 9/9] Add "__RTL" to cc1 (v6)"
  https://gcc.gnu.org/ml/gcc-patches/2016-12/msg00151.html
Needs review.


So patch 1 plus patches 7-9 are still outstanding.

Hope this answers your question.
Dave


[PATCH] PR fortran/78618 -- RANK() should not ICE

2016-12-01 Thread Steve Kargl
The attached patch fixes an ICE, a nearby whitespace issue, and
removed the ATTRIBUTE_UNUSED tag.  THe change has passed regression
testing on x86_64-*-freebsd.  Ok to commit?

2016-12-01  Steven G. Kargl  

PR fortran/78618
* check.c (gfc_check_rank): Remove ATTRIBUTE_UNUSED.  Fix whitespace.
Fix ICE where a NULL pointer is dereferenced.
* simplify.c (gfc_convert_char_constant):  Do not buffer error.

2016-12-01  Steven G. Kargl  

PR fortran/78618
* gfortran.dg/char_conversion.f90: New test.

-- 
Steve
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c	(revision 243142)
+++ gcc/fortran/check.c	(working copy)
@@ -3667,7 +3667,7 @@ gfc_check_range (gfc_expr *x)
 
 
 bool
-gfc_check_rank (gfc_expr *a ATTRIBUTE_UNUSED)
+gfc_check_rank (gfc_expr *a)
 {
   /* Any data object is allowed; a "data object" is a "constant (4.1.3),
  variable (6), or subobject of a constant (2.4.3.2.3)" (F2008, 1.3.45).  */
@@ -3676,9 +3676,16 @@ gfc_check_rank (gfc_expr *a ATTRIBUTE_UN
 
   /* Functions returning pointers are regarded as variable, cf. F2008, R602.  */
   if (a->expr_type == EXPR_FUNCTION)
-is_variable = a->value.function.esym
-		  ? a->value.function.esym->result->attr.pointer
-		  : a->symtree->n.sym->result->attr.pointer;
+{
+  if (a->value.function.esym)
+	is_variable = a->value.function.esym->result->attr.pointer;
+  else if (a->symtree->n.sym->result)
+	is_variable = a->symtree->n.sym->result->attr.pointer;
+  else if (a->symtree->n.sym->value)
+	is_variable = true;
+  else
+	gfc_internal_error ("gfc_check_rank(): invalid function result");
+}
 
   if (a->expr_type == EXPR_OP || a->expr_type == EXPR_NULL
   || a->expr_type == EXPR_COMPCALL|| a->expr_type == EXPR_PPC
Index: gcc/fortran/simplify.c
===
--- gcc/fortran/simplify.c	(revision 243142)
+++ gcc/fortran/simplify.c	(working copy)
@@ -7148,7 +7148,7 @@ gfc_convert_char_constant (gfc_expr *e, 
 	if (!gfc_check_character_range (result->value.character.string[i],
 	kind))
 	  {
-	gfc_error ("Character %qs in string at %L cannot be converted "
+	gfc_error_now ("Character %qs in string at %L cannot be converted "
 		   "into character kind %d",
 		   gfc_print_wide_char (result->value.character.string[i]),
 		   >where, kind);
Index: gcc/testsuite/gfortran.dg/char_conversion.f90
===
--- gcc/testsuite/gfortran.dg/char_conversion.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/char_conversion.f90	(working copy)
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! PR fortran/78618
+program p
+   character, parameter :: c = char(256,4) ! { dg-error "cannot be converted" }
+   if (rank(c) /= 0) call abort
+end


[PATCH 9/9] Add "__RTL" to cc1 (v6)

2016-12-01 Thread David Malcolm
Changed in v6:
- Handle EOF in c_parser_parse_rtl_body
- Various fixups from review

Changed in v5:
- rebased from r242065 to r242392.  In particular, this
  brings in the gimple FE; rewrote the "startwith" pass-skipping
  mechanism to work with both __GIMPLE and __RTL.

- rewrote the "__RTL" parser so that it shares code with the
  "__GIMPLE" parser, within c_parser_declspecs.
  Updated all the test cases to use the 'startwith' syntax.

  The gimple FE has a "flag_gimple" to enable __GIMPLE; should
  I add an equivalent for __RTL?

- eliminated the new "native_RTL" field, instead just using
  curr_properties & PROP_rtl.

- added original_copy_tables_initialized_p and reinstate:
gcc_assert (original_copy_bb_pool);
  within free_original_copy_tables.

- added a test of multiple __RTL-marked functions within one test
  case.

Links to previous versions:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00263.html
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00500.html



gcc/ChangeLog:
* Makefile.in (OBJS): Add run-rtl-passes.o.

gcc/c-family/ChangeLog:
* c-common.c (c_common_reswords): Add "__RTL".
* c-common.h (enum rid): Add RID_RTL.

gcc/c/ChangeLog:
* c-parser.c: Include "read-rtl-function.h" and
"run-rtl-passes.h".
(c_parser_declaration_or_fndef): Rename "gimple-pass-list" in
grammar to gimple-or-rtl-pass-list.  Add rtl-function-definition
production.  Update for renaming of field "gimple_pass" to
"gimple_or_rtl_pass".  If __RTL was seen, call
c_parser_parse_rtl_body.  Convert a timevar_push/pop pair
to an auto_timevar, to cope with early exit.
(c_parser_declspecs): Update RID_GIMPLE handling for renaming of
field "gimple_pass" to "gimple_or_rtl_pass", and for renaming of
c_parser_gimple_pass_list to c_parser_gimple_or_rtl_pass_list.
Handle RID_RTL.
(c_parser_parse_rtl_body): New function.
* c-tree.h (enum c_declspec_word): Add cdw_rtl.
(struct c_declspecs): Rename field "gimple_pass" to
"gimple_or_rtl_pass".  Add field "rtl_p".
* gimple-parser.c (c_parser_gimple_pass_list): Rename to...
(c_parser_gimple_or_rtl_pass_list): ...this, updating accordingly.
* gimple-parser.h (c_parser_gimple_pass_list): Rename to...
(c_parser_gimple_or_rtl_pass_list): ...this.

gcc/ChangeLog:
* cfg.c (original_copy_tables_initialized_p): New function.
* cfg.h (original_copy_tables_initialized_p): New decl.
* cfgrtl.c (relink_block_chain): Guard the call to
free_original_copy_tables with a call to
original_copy_tables_initialized_p.
* cgraph.h (symtab_node::native_rtl_p): New decl.
* cgraphunit.c (symtab_node::native_rtl_p): New function.
(symtab_node::needed_p): Don't assert for early assembly output
for __RTL functions.
(cgraph_node::finalize_function): Set "force_output" for __RTL
functions.
(cgraph_node::analyze): Bail out early for __RTL functions.
(analyze_functions): Update assertion to support __RTL functions.
(cgraph_node::expand): Bail out early for __RTL functions.
* final.c (rest_of_clean_state): Don't call delete_tree_ssa for
_RTL functions.
* gimple-expr.c: Include "tree-pass.h".
(gimple_has_body_p): Return false for __RTL functions.
* pass_manager.h (gcc::pass_manager::get_rest_of_compilation): New
accessor.
(gcc::pass_manager::get_clean_slate): New accessor.
* passes.c: Include "insn-addr.h".
(execute_one_pass): Split out logic for skipping passes into...
(determine_pass_name_match): ...this new function, ...
(should_skip_pass_p): ...and this new function, adding logging
and special-casing dfinit.
(skip_pass): New function.
* read-md.c (md_reader::read_char): Support filtering
the input to a subset of line numbers.
(md_reader::md_reader): Initialize fields
m_first_line and m_last_line.
(md_reader::read_file_fragment): New function.
* read-md.h (md_reader::read_file_fragment): New decl.
(md_reader::m_first_line): New field.
(md_reader::int m_last_line): New field.
* read-rtl-function.c (function_reader::create_function): Only
create cfun if it doesn't already exist.  Set PROP_rtl on cfun's
curr_properties.  Set DECL_INITIAL to a dummy block.
(read_rtl_function_body_from_file_range): New function.
* read-rtl-function.h (read_rtl_function_body_from_file_range):
New decl.
* run-rtl-passes.c: New file.
* run-rtl-passes.h: New file.

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/aarch64/asr_div1.c: New file.
* gcc.dg/rtl/aarch64/pr71779.c: New file.
* gcc.dg/rtl/rtl.exp: New file.
* 

[PATCH 8a/9] Introduce class function_reader (v6)

2016-12-01 Thread David Malcolm
Changed in v6:
- split out dump-reading selftests into followup patches
  (target-independent, and target-specific)
- removes unneeded headers from read-rtl-function.c
- numerous other cleanups identified in review

Changed in v5:
- updated for change to emit_status::ensure_regno_capacity

Changed in v4:
- error-handling changes split out into a separate patch
- rewritten the loader to use the new "compact" dump format
- support for reuse_rtx in loader
- handling of params, DECL_RTL and DECL_RTL_INCOMING
- moved target-dependent selftests to target-specific code
(aarch64.c and i386.c)

Link to earlier version of the patch:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00267.html

gcc/ChangeLog:
* Makefile.in (OBJS): Add read-md.o, read-rtl.o,
read-rtl-function.o, and selftest-rtl.o.
* emit-rtl.c (maybe_set_max_label_num): New function.
* emit-rtl.h (maybe_set_max_label_num): New decl.
* function.c (instantiate_decls): Guard call to
instantiate_decls_1 with if (DECL_INITIAL (fndecl)).
* gensupport.c (gen_reader::gen_reader): Pass "false"
for new "compact" param of rtx_reader.
* print-rtl.c (rtx_writer::print_rtx_operand): Print "(nil)"
rather than an empty string for NULL strings.
* read-md.c: Potentially include config.h rather than bconfig.h.
(md_reader::read_name): Rename to...
(md_reader::read_name_1): ...this, adding "out_loc" param,
and converting "missing name or number" to returning false, rather
than failing.
(md_reader::read_name): Reimplement in terms of read_name_1.
(md_reader::read_name_or_nil): New function.
(md_reader::read_string): Handle "(nil)" by returning NULL.
(md_reader::md_reader): Add new param "compact".
* read-md.h (md_reader::md_reader): Add new param "compact".
(md_reader::is_compact): New accessor.
(md_reader::read_name): Convert return type from void to
file_location.
(md_reader::read_name_or_nil): New decl.
(md_reader::read_name_1): New decl.
(md_reader::m_compact): New field.
(noop_reader::noop_reader): Pass "false" for new "compact" param
of rtx_reader.
(rtx_reader::rtx_reader): Add new "compact" param.
(rtx_reader::read_rtx_operand): Make virtual and convert return
type from void to rtx.
(rtx_reader::read_until): New decl.
(rtx_reader::handle_any_trailing_information): New virtual
function.
(rtx_reader::postprocess): New virtual function.
(rtx_reader::finalize_string): New virtual function.
(rtx_reader::m_in_call_function_usage): New field.
(rtx_reader::m_reuse_rtx_by_id): New field.
* read-rtl-function.c: New file.
* read-rtl-function.h: New file.
* read-rtl.c: Potentially include config.h rather than bconfig.h.
For host, include function.h, memmodel.h, and emit-rtl.h.
(one_time_initialization): New function.
(struct compact_insn_name): New struct.
(compact_insn_names): New array.
(find_code): Handle insn codes in compact dumps.
(apply_subst_iterator): Wrap with #ifdef GENERATOR_FILE.
(bind_subst_iter_and_attr): Likewise.
(add_condition_to_string): Likewise.
(add_condition_to_rtx): Likewise.
(apply_attribute_uses): Likewise.
(add_current_iterators): Likewise.
(apply_iterators): Likewise.
(initialize_iterators): Guard usage of apply_subst_iterator with
#ifdef GENERATOR_FILE.
(read_conditions): Wrap with #ifdef GENERATOR_FILE.
(md_reader::read_mapping): Likewise.
(add_define_attr_for_define_subst): Likewise.
(add_define_subst_attr): Likewise.
(read_subst_mapping): Likewise.
(check_code_iterator): Likewise.
(rtx_reader::read_rtx): Likewise.  Move one-time initialization
logic to...
(one_time_initialization): New function.
(rtx_reader::read_until): New method.
(read_flags): New function.
(parse_reg_note_name): New function.
(rtx_reader::read_rtx_code): Initialize "iterator" to NULL.
Call one_time_initialization.  Handle reuse_rtx ids.
Wrap iterator lookup within #ifdef GENERATOR_FILE.
Add parsing support for RTL dumps, mirroring the special-cases in
print_rtx, by calling read_flags, reading REG_NOTE names, INSN_UID
values, and calling handle_any_trailing_information.
(rtx_reader::read_rtx_operand): Convert return type from void
to rtx, returning return_rtx.  Handle case 'e'.  Call
finalize_string on XSTR and XTMPL fields.
(rtx_reader::read_nested_rtx):  Handle dumps in which trailing
 "(nil)" values were omitted.  Call the postprocess vfunc on the
return_rtx.
(rtx_reader::rtx_reader): Add new "compact" param and pass to base
class ctor.  

[PATCH 8d/9] Add x86_64-specific selftests for RTL function reader

2016-12-01 Thread David Malcolm
This patch adds more selftests for class function_reader, where
the dumps to be read contain x86_64-specific features.

In an earlier version of the patch kit, these were handled using
This version instead runs them via a target hook for running
target-specific selftests, thus putting them within i386.c.

gcc/ChangeLog:
* config/i386/i386.c
(selftest::ix86_test_loading_dump_fragment_1): New function.
(selftest::ix86_test_loading_call_insn): New function.
(selftest::ix86_test_loading_full_dump): New function.
(selftest::ix86_test_loading_unspec): New function.
(selftest::ix86_run_selftests): Call the new functions.

gcc/testsuite/ChangeLog:
* selftests/x86_64: New subdirectory.
* selftests/x86_64/call-insn.rtl: New file.
* selftests/x86_64/copy-hard-reg-into-frame.rtl: New file.
* selftests/x86_64/times-two.rtl: New file.
* selftests/x86_64/unspec.rtl: New file.
---
 gcc/config/i386/i386.c | 207 +
 gcc/testsuite/selftests/x86_64/call-insn.rtl   |  17 ++
 .../selftests/x86_64/copy-hard-reg-into-frame.rtl  |  15 ++
 gcc/testsuite/selftests/x86_64/times-two.rtl   |  51 +
 gcc/testsuite/selftests/x86_64/unspec.rtl  |  20 ++
 5 files changed, 310 insertions(+)
 create mode 100644 gcc/testsuite/selftests/x86_64/call-insn.rtl
 create mode 100644 gcc/testsuite/selftests/x86_64/copy-hard-reg-into-frame.rtl
 create mode 100644 gcc/testsuite/selftests/x86_64/times-two.rtl
 create mode 100644 gcc/testsuite/selftests/x86_64/unspec.rtl

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6c608e0..0dda786 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -50672,6 +50672,206 @@ ix86_test_dumping_memory_blockage ()
  "] UNSPEC_MEMORY_BLOCKAGE)))\n", pat, );
 }
 
+/* Verify loading an RTL dump; specifically a dump of copying
+   a param on x86_64 from a hard reg into the frame.
+   This test is target-specific since the dump contains target-specific
+   hard reg names.  */
+
+static void
+ix86_test_loading_dump_fragment_1 ()
+{
+  rtl_dump_test t (SELFTEST_LOCATION,
+  locate_file ("x86_64/copy-hard-reg-into-frame.rtl"));
+
+  rtx_insn *insn = get_insn_by_uid (1);
+
+  /* The block structure and indentation here is purely for
+ readability; it mirrors the structure of the rtx.  */
+  tree mem_expr;
+  {
+rtx pat = PATTERN (insn);
+ASSERT_EQ (SET, GET_CODE (pat));
+{
+  rtx dest = SET_DEST (pat);
+  ASSERT_EQ (MEM, GET_CODE (dest));
+  /* Verify the "/c" was parsed.  */
+  ASSERT_TRUE (RTX_FLAG (dest, call));
+  ASSERT_EQ (SImode, GET_MODE (dest));
+  {
+   rtx addr = XEXP (dest, 0);
+   ASSERT_EQ (PLUS, GET_CODE (addr));
+   ASSERT_EQ (DImode, GET_MODE (addr));
+   {
+ rtx lhs = XEXP (addr, 0);
+ ASSERT_EQ (REG, GET_CODE (lhs));
+ /* Verify the "/f" was parsed.  */
+ ASSERT_TRUE (RTX_FLAG (lhs, frame_related));
+ ASSERT_EQ (DImode, GET_MODE (lhs));
+   }
+   {
+ rtx rhs = XEXP (addr, 1);
+ ASSERT_EQ (CONST_INT, GET_CODE (rhs));
+ ASSERT_EQ (-4, INTVAL (rhs));
+   }
+  }
+  /* Verify the "[1 i+0 S4 A32]" was parsed.  */
+  ASSERT_EQ (1, MEM_ALIAS_SET (dest));
+  /* "i" should have been handled by synthesizing a global int
+variable named "i".  */
+  mem_expr = MEM_EXPR (dest);
+  ASSERT_NE (mem_expr, NULL);
+  ASSERT_EQ (VAR_DECL, TREE_CODE (mem_expr));
+  ASSERT_EQ (integer_type_node, TREE_TYPE (mem_expr));
+  ASSERT_EQ (IDENTIFIER_NODE, TREE_CODE (DECL_NAME (mem_expr)));
+  ASSERT_STREQ ("i", IDENTIFIER_POINTER (DECL_NAME (mem_expr)));
+  /* "+0".  */
+  ASSERT_TRUE (MEM_OFFSET_KNOWN_P (dest));
+  ASSERT_EQ (0, MEM_OFFSET (dest));
+  /* "S4".  */
+  ASSERT_EQ (4, MEM_SIZE (dest));
+  /* "A32.  */
+  ASSERT_EQ (32, MEM_ALIGN (dest));
+}
+{
+  rtx src = SET_SRC (pat);
+  ASSERT_EQ (REG, GET_CODE (src));
+  ASSERT_EQ (SImode, GET_MODE (src));
+  ASSERT_EQ (5, REGNO (src));
+  tree reg_expr = REG_EXPR (src);
+  /* "i" here should point to the same var as for the MEM_EXPR.  */
+  ASSERT_EQ (reg_expr, mem_expr);
+}
+  }
+}
+
+/* Verify that the RTL loader copes with a call_insn dump.
+   This test is target-specific since the dump contains a target-specific
+   hard reg name.  */
+
+static void
+ix86_test_loading_call_insn ()
+{
+  rtl_dump_test t (SELFTEST_LOCATION, locate_file ("x86_64/call-insn.rtl"));
+
+  rtx_insn *insn = get_insns ();
+  ASSERT_EQ (CALL_INSN, GET_CODE (insn));
+
+  /* "/j".  */
+  ASSERT_TRUE (RTX_FLAG (insn, jump));
+
+  rtx pat = PATTERN (insn);
+  ASSERT_EQ (CALL, GET_CODE (SET_SRC (pat)));
+
+  /* Verify REG_NOTES.  */
+  {
+/* "(expr_list:REG_CALL_DECL".   */
+ASSERT_EQ (EXPR_LIST, GET_CODE (REG_NOTES (insn)));
+rtx_expr_list 

[PATCH 8c/9] Add aarch64-specific selftests for RTL function reader

2016-12-01 Thread David Malcolm
This patch adds more selftests for class function_reader, where
the dumps to be read contain aarch64-specific features.

In an earlier version of the patch kit, these were handled using
#ifndef GCC_AARCH64_H to conditionalize running them.
This version instead runs them via a target hook for running
target-specific selftests, thus putting them within aarch64.c.

gcc/ChangeLog:
* config/aarch64/aarch64.c: Include selftest.h and
selftest-rtl.h.
(selftest::aarch64_test_loading_full_dump): New function.
(selftest::aarch64_run_selftests): New function.
(TARGET_RUN_TARGET_SELFTESTS): Wire it up to
selftest::aarch64_run_selftests.

gcc/testsuite/ChangeLog:
* selftests/aarch64: New subdirectory.
* selftests/aarch64/times-two.rtl: New file.
---
 gcc/config/aarch64/aarch64.c  | 49 +++
 gcc/testsuite/selftests/aarch64/times-two.rtl | 36 
 2 files changed, 85 insertions(+)
 create mode 100644 gcc/testsuite/selftests/aarch64/times-two.rtl

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bd97c5b..d19bee3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -64,6 +64,8 @@
 #include "sched-int.h"
 #include "target-globals.h"
 #include "common/common-target.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -14168,6 +14170,48 @@ aarch64_optab_supported_p (int op, machine_mode mode1, 
machine_mode,
 }
 }
 
+/* Target-specific selftests.  */
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Selftest for the RTL loader.  This test is target-specific and thus
+   here since the dump contains target-specific hard reg names.
+   Verify that the RTL loader copes with a dump from print_rtx_function.  */
+
+static void
+aarch64_test_loading_full_dump ()
+{
+  rtl_dump_test t (SELFTEST_LOCATION, locate_file ("aarch64/times-two.rtl"));
+
+  ASSERT_STREQ ("times_two", IDENTIFIER_POINTER (DECL_NAME (cfun->decl)));
+
+  rtx_insn *insn_1 = get_insn_by_uid (1);
+  ASSERT_EQ (NOTE, GET_CODE (insn_1));
+
+  rtx_insn *insn_15 = get_insn_by_uid (15);
+  ASSERT_EQ (INSN, GET_CODE (insn_15));
+  ASSERT_EQ (USE, GET_CODE (PATTERN (insn_15)));
+
+  /* Verify crtl->return_rtx.  */
+  ASSERT_EQ (REG, GET_CODE (crtl->return_rtx));
+  ASSERT_EQ (0, REGNO (crtl->return_rtx));
+  ASSERT_EQ (SImode, GET_MODE (crtl->return_rtx));
+}
+
+/* Run all target-specific selftests.  */
+
+static void
+aarch64_run_selftests (void)
+{
+  aarch64_test_loading_full_dump ();
+}
+
+} // namespace selftest
+
+#endif /* #if CHECKING_P */
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
@@ -14502,6 +14546,11 @@ aarch64_optab_supported_p (int op, machine_mode mode1, 
machine_mode,
 #undef TARGET_OMIT_STRUCT_RETURN_REG
 #define TARGET_OMIT_STRUCT_RETURN_REG true
 
+#if CHECKING_P
+#undef TARGET_RUN_TARGET_SELFTESTS
+#define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
+#endif /* #if CHECKING_P */
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/testsuite/selftests/aarch64/times-two.rtl 
b/gcc/testsuite/selftests/aarch64/times-two.rtl
new file mode 100644
index 000..dbce67e
--- /dev/null
+++ b/gcc/testsuite/selftests/aarch64/times-two.rtl
@@ -0,0 +1,36 @@
+(function "times_two"
+  (insn-chain
+(cnote 1 NOTE_INSN_DELETED)
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 4 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 2 (set (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
+(const_int -4)) [1 i+0 S4 A32])
+(reg:SI x0 [ i ])) "../../src/times-two.c":2
+ (nil))
+  (cnote 3 NOTE_INSN_FUNCTION_BEG)
+  (cinsn 6 (set (reg:SI %2)
+(mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars)
+(const_int -4)) [1 i+0 S4 A32])) 
"../../src/times-two.c":3
+ (nil))
+  (cinsn 7 (set (reg:SI %0 [ _2 ])
+(ashift:SI (reg:SI %2)
+(const_int 1))) "../../src/times-two.c":3
+ (nil))
+  (cinsn 10 (set (reg:SI %1 [  ])
+(reg:SI %0 [ _2 ])) "../../src/times-two.c":3
+ (nil))
+  (cinsn 14 (set (reg/i:SI x0)
+(reg:SI %1 [  ])) "../../src/times-two.c":4
+ (nil))
+  (cinsn 15 (use (reg/i:SI x0)) "../../src/times-two.c":4
+ (nil))
+  (edge-to exit (flags "FALLTHRU"))
+) ;; block 2
+  ) ;; insn-chain
+  (crtl
+(return_rtx 
+  (reg/i:SI x0)
+) ;; return_rtx
+  ) ;; crtl
+) ;; function "times_two"
-- 
1.8.5.3



[PATCH 8b/9] Add target-independent selftests of RTL function reader

2016-12-01 Thread David Malcolm
gcc/ChangeLog:
* function-tests.c (selftest::verify_three_block_rtl_cfg): Remove
"static".
* read-rtl-function.c (selftest::assert_edge_at): New function.
(ASSERT_EDGE): New macro.
(selftest::test_loading_dump_fragment_1): New function.
(selftest::test_loading_dump_fragment_2): New function.
(selftest::test_loading_labels): New function.
(selftest::test_loading_insn_with_mode): New function.
(selftest::test_loading_jump_to_label_ref): New function.
(selftest::test_loading_jump_to_return): New function.
(selftest::test_loading_jump_to_simple_return): New function.
(selftest::test_loading_note_insn_basic_block): New function.
(selftest::test_loading_note_insn_deleted): New function.
(selftest::test_loading_const_int): New function.
(selftest::test_loading_symbol_ref): New function.
(selftest::test_loading_cfg): New function.
(selftest::test_loading_bb_index): New function.
(selftest::read_rtl_function_c_tests): Call the new functions.
* selftest-rtl.h (selftest::verify_three_block_rtl_cfg): New decl.

gcc/testsuite/ChangeLog:
* selftests/asr_div1.rtl: New file.
* selftests/bb-index.rtl: New file.
* selftests/cfg-test.rtl: New file.
* selftests/const-int.rtl: New file.
* selftests/example-labels.rtl: New file.
* selftests/insn-with-mode.rtl: New file.
* selftests/jump-to-label-ref.rtl: New file.
* selftests/jump-to-return.rtl: New file.
* selftests/jump-to-simple-return.rtl: New file.
* selftests/note-insn-deleted.rtl: New file.
* selftests/note_insn_basic_block.rtl: New file.
* selftests/simple-cse.rtl: New file.
* selftests/symbol-ref.rtl: New file.
---
 gcc/function-tests.c  |   2 +-
 gcc/read-rtl-function.c   | 434 ++
 gcc/selftest-rtl.h|   2 +
 gcc/testsuite/selftests/asr_div1.rtl  |  24 ++
 gcc/testsuite/selftests/bb-index.rtl  |   8 +
 gcc/testsuite/selftests/cfg-test.rtl  |  37 ++
 gcc/testsuite/selftests/const-int.rtl |  20 +
 gcc/testsuite/selftests/example-labels.rtl|   8 +
 gcc/testsuite/selftests/insn-with-mode.rtl|   7 +
 gcc/testsuite/selftests/jump-to-label-ref.rtl |  17 +
 gcc/testsuite/selftests/jump-to-return.rtl|  11 +
 gcc/testsuite/selftests/jump-to-simple-return.rtl |  11 +
 gcc/testsuite/selftests/note-insn-deleted.rtl |   5 +
 gcc/testsuite/selftests/note_insn_basic_block.rtl |   9 +
 gcc/testsuite/selftests/simple-cse.rtl|  16 +
 gcc/testsuite/selftests/symbol-ref.rtl|  13 +
 16 files changed, 623 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/selftests/asr_div1.rtl
 create mode 100644 gcc/testsuite/selftests/bb-index.rtl
 create mode 100644 gcc/testsuite/selftests/cfg-test.rtl
 create mode 100644 gcc/testsuite/selftests/const-int.rtl
 create mode 100644 gcc/testsuite/selftests/example-labels.rtl
 create mode 100644 gcc/testsuite/selftests/insn-with-mode.rtl
 create mode 100644 gcc/testsuite/selftests/jump-to-label-ref.rtl
 create mode 100644 gcc/testsuite/selftests/jump-to-return.rtl
 create mode 100644 gcc/testsuite/selftests/jump-to-simple-return.rtl
 create mode 100644 gcc/testsuite/selftests/note-insn-deleted.rtl
 create mode 100644 gcc/testsuite/selftests/note_insn_basic_block.rtl
 create mode 100644 gcc/testsuite/selftests/simple-cse.rtl
 create mode 100644 gcc/testsuite/selftests/symbol-ref.rtl

diff --git a/gcc/function-tests.c b/gcc/function-tests.c
index b0c44cf..90fec6d 100644
--- a/gcc/function-tests.c
+++ b/gcc/function-tests.c
@@ -421,7 +421,7 @@ verify_three_block_gimple_cfg (function *fun)
 
 /* As above, but additionally verify the RTL insns are sane.  */
 
-static void
+void
 verify_three_block_rtl_cfg (function *fun)
 {
   verify_three_block_cfg (fun);
diff --git a/gcc/read-rtl-function.c b/gcc/read-rtl-function.c
index cef834e..07c0e7a 100644
--- a/gcc/read-rtl-function.c
+++ b/gcc/read-rtl-function.c
@@ -1643,6 +1643,427 @@ test_parsing_regnos ()
   ASSERT_EQ (LAST_VIRTUAL_REGISTER + 2, lookup_reg_by_dump_name ("%1"));
 }
 
+/* Verify that edge E is as expected, with the src and dest basic blocks
+   having indices EXPECTED_SRC_IDX and EXPECTED_DEST_IDX respectively, and
+   the edge having flags equal to EXPECTED_FLAGS.
+   Use LOC as the effective location when reporting failures.  */
+
+static void
+assert_edge_at (const location , edge e, int expected_src_idx,
+   int expected_dest_idx, int expected_flags)
+{
+  ASSERT_EQ_AT (loc, expected_src_idx, e->src->index);
+  ASSERT_EQ_AT (loc, expected_dest_idx, e->dest->index);
+  ASSERT_EQ_AT (loc, expected_flags, e->flags);
+}
+
+/* Verify that edge EDGE is as expected, with the src and dest basic blocks
+   having indices 

Re: [PATCH 5/9] Introduce selftest::locate_file (v4)

2016-12-01 Thread David Malcolm
On Thu, 2016-12-01 at 14:29 +0100, Bernd Schmidt wrote:
> On 11/11/2016 10:15 PM, David Malcolm wrote:
> > +  /* Makefile.in has -fself-test=$(srcdir)/testsuite/selftests, so
> > that
> > + flag_self_test contains the path to the selftest subdirectory
> > of the
> > + source tree (without a trailing slash).  Copy it up to
> > + path_to_selftest_files, to avoid selftest.c depending on
> > + option-handling.  */
> > +  path_to_selftest_files = flag_self_test;
> > +
> 
> What kind of dependency are you avoiding? If it's just one include
> I'd 
> prefer to get rid of the extraneous variable.

I was thinking more about keeping selftest.h/c modularized; I didn't
want them knowing anything about gcc option-handling, in case we want
to move the core of the selftests into some other support directory
(libiberty or similar).

Perhaps a better approach is to add the path as a param to locate_file,
to use it from callers.  That would support multiple subprojects using
selftest::locate_file (avoiding a global variable), at the slight cost
of requiring the use of flag_self_test (actually a macro) everywhere in
gcc that we use locate_file.

Should I update the patches to reflect that?

> Otherwise ok.
> 
> 
> Bernd
> 


[PATCH] PR target/78639, Fix code generation on Power9

2016-12-01 Thread Michael Meissner
I discovered that my change in subversion id 242679, used a 'Y' constraint for
movdi, when it should be using a 'wY'.  'Y' is for gpr load/stores, and it
allows register+register address, while 'wY' is for the new register+offset
instructions added in ISA 3.0, which does not include register+register
addressing.

The previous code before 242679 used 'wY', but I deleted the 'w' by accident.

This bug showed up in compiling PUGHReduce/Reduction.c in the cactusADM Spec
2006 benchmark suite.  I have verified that this fixes the problem.

Assuming the bootstrap and make check that I'm running right now don't cause
any regressions, can I check the patch into the trunk?

2016-12-01  Michael Meissner  

PR target/78639
* config/rs6000/rs6000.md (movdi_internal64): Fix typo in
subversion id 242679 that causes the wrong store instruction to be
generated if a DImode is in an Altivec register using REG+REG
addressing.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 243133)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -8239,7 +8239,7 @@ (define_split
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
"=Y,r, r, r, r,  r,
-^m,^d,^d,^Y,$Z, $wb,
+^m,^d,^d,^wY,   $Z, $wb,
 $wv,   ^wi,   *wo,   *wo,   *wv,*wi,
 *wi,   *wv,   *wv,   r, *h, *h,
 ?*r,   ?*wg,  ?*r,   ?*wj")


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: Pretty printers for versioned namespace

2016-12-01 Thread Jonathan Wakely

On 01/12/16 22:51 +0100, François Dumont wrote:

On 29/11/2016 21:17, Jonathan Wakely wrote:

On 28/11/16 22:19 +0100, François Dumont wrote:
  I am not fully happy with the replication in printers.py of 
StdRbtreeIteratorPrinter and 
StdExpAnyPrinter(SingleObjContainerPrinter in respectively 
StdVersionedRbtreeIteratorPrinter and 
StdExpVerAnyPrinter(SingleObjContainerPrinter just to adapt 2 
lines where regex is not an option. We could surely keep only one 
and pass it '' or '__7'. But as I said I am not a python expert so 
any help would be appreciated.


We definitely want to avoid that duplication. For
StdRbtreeIteratorPrinter you can just look at 'typename' and see
whether it starts with "std::__7" or not. If it does, you need to lookup
std::__7::_Rb_tree_node<...>, otherwise you need to lookup
std::_Rb_tree_node<...> instead.

For StdExpAnyPrinter just do two replacements: first replace
std::string with the result of gdb.lookup_type('std::string') and then
replace std::__7::string with the result of looking that up. Are you
sure that's even needed though? Does std::__7::string actually appear
in the manager function's name? I would expect it to appear as
std::__7::basic_string >

which doesn't need to be expanded anyway. So I think you can just
remove your StdExpVerAnyPrinter.


We needed the StdExpVerAnyPrinter just because of the loopkup for 
'std::string' which has to be 'std::__7::string'. But I used similar 
technique exposed previously to get rid of it.


But I don't see any std::__7::string in the relevant symbols.  Here's
the manager function for an std:experimental::__7::any storing a
std::__7::string:

std::experimental::fundamentals_v1::__7::any::_Manager_internal 
>::_S_manage(std::experimental::fundamentals_v1::__7::any::_Op, 
>std::experimental::fundamentals_v1::__7::any const*, 
>std::experimental::fundamentals_v1::__7::any::_Arg*)

Since this has no std::__7::string it doesn't need to be substituted.

Do any tests fail without the change to StdExpAnyPrinter? Which ones?



Re: [PATCH] Fix minor nits in gimple-ssa-sprintf.c (PR tree-optimization/78586)

2016-12-01 Thread Martin Sebor

On 12/01/2016 09:24 AM, Martin Sebor wrote:

So, let's use another testcase, -O2 -W -Wall -fno-tree-vrp -fno-tree-ccp
and again UB in it:
volatile bool e;
volatile int x;

int
main ()
{
  x = 123;
  *(char *) = x;
  bool f = e;
  x = __builtin_snprintf (0, 0, "%d", f);
}

This will store 1 into x, while without -fprintf-return-value it would
store
3.


Great, that's what I was looking for.  I turned it into the following
test case.  Let me try to massage it into a compile-only test suitable
for the test suite and commit it later today.


I hadn't looked at the test case carefully enough when I said
the above.  Now that I have I see it fails even with your patch
with VRP enabled) and I don't think that's a problem in GCC but
rather in the test (the undefined behavior you mentioned).  So
I won't be adding it after all.  I don't think it's appropriate
to exercise behavior that we cannot or will not (or should not)
guarantee.  If you have a different test case does show
the problem under these constraints I'm still interested.

Martin


volatile bool e;
volatile int x;

#define FMT "%d"
const char *fmt = FMT;

int
main ()
{
  x = 123;
  *(char *) = x;
  bool f = e;

  int n1 = __builtin_snprintf (0, 0, FMT, f);
  int n2 = __builtin_snprintf (0, 0, fmt, f);

  __builtin_printf ("%i == %i\n", n1, n2);
  if (n1 != n2)
__builtin_abort ();
}

Martin




Re: [PATCH] Do not simplify "(and (reg) (const bit))" to if_then_else.

2016-12-01 Thread Jeff Law

On 11/21/2016 05:36 AM, Dominik Vogt wrote:

On Fri, Nov 11, 2016 at 12:10:28PM +0100, Dominik Vogt wrote:

> On Mon, Nov 07, 2016 at 09:29:26PM +0100, Bernd Schmidt wrote:

> > On 10/31/2016 08:56 PM, Dominik Vogt wrote:
> >

> > >combine_simplify_rtx() tries to replace rtx expressions with just two
> > >possible values with an experession that uses if_then_else:
> > >
> > >  (if_then_else (condition) (value1) (value2))
> > >
> > >If the original expression is e.g.
> > >
> > >  (and (reg) (const_int 2))

> >
> > I'm not convinced that if_then_else_cond is the right place to do
> > this. That function is designed to answer the question of whether an
> > rtx has exactly one of two values and under which condition; I feel
> > it should continue to work this way.
> >
> > Maybe simplify_ternary_expression needs to be taught to deal with this case?

>
> But simplify_ternary_expression isn't called with the following
> test program (only tried it on s390x):
>
>   void bar(int, int);
>   int foo(int a, int *b)
>   {
> if (a)
>   bar(0, *b & 2);
> return *b;
>   }
>
> combine_simplify_rtx() is called with
>
>   (sign_extend:DI (and:SI (reg:SI 61) (const_int 2)))
>
> In the switch it calls simplify_unary_operation(), which return
> NULL.  The next thing it does is call if_then_else_cond(), and
> that calls itself with the sign_extend peeled off:
>
>   (and:SI (reg:SI 61) (const_int 2))
>
> takes the "BINARY_P (x)" path and returns false.  The problem
> exists only if the (and ...) is wrapped in ..._extend, i.e. the
> ondition dealing with (and ...) directly can be removed from the
> patch.
>
> So, all recursive calls to if_then_els_cond() return false, and
> finally the condition in
>
> else if (HWI_COMPUTABLE_MODE_P (mode)
>&& pow2p_hwi (nz = nonzero_bits (x, mode))
>
> is true.
>
> Thus, if if_then_else_cond should remain unchanged, the only place
> to fix this would be after the call to if_then_else_cond() in
> combine_simplify_rtx().  Actually, there already is some special
> case handling to override the return code of if_then_else_cond():
>
>   cond = if_then_else_cond (x, _rtx, _rtx);
>   if (cond != 0
>   /* If everything is a comparison, what we have is highly unlikely
>  to be simpler, so don't use it.  */
> --->  && ! (COMPARISON_P (x)
> && (COMPARISON_P (true_rtx) || COMPARISON_P (false_rtx
> {
>   rtx cop1 = const0_rtx;
>   enum rtx_code cond_code = simplify_comparison (NE, , );
>
> --->  if (cond_code == NE && COMPARISON_P (cond))
> return x;
>   ...
>
> Should be easy to duplicate the test in the if-body, if that is
> what you prefer:
>
>   ...
>   if (HWI_COMPUTABLE_MODE_P (GET_MODE (x))
>   && pow2p_hwi (nz = nonzero_bits (x, GET_MODE (x)))
>   && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND)
> && GET_CODE (XEXP (x, 0)) == AND
> && CONST_INT_P (XEXP (XEXP (x, 0), 0))
> && UINTVAL (XEXP (XEXP (x, 0), 0)) == nz))
> return x;
>
> (untested)

Updated and tested version of the patch attached.  The extra logic
is now in combine_simplify_rtx.

Ciao

Dominik ^_^  ^_^

-- Dominik Vogt IBM Germany


0001-v2-ChangeLog


gcc/ChangeLog

* combine.c (combine_simplify_rtx):  Suppress replacement of
"(and (reg) (const_int bit))" with "if_then_else".
I'd agree with Bernd that if_then_else_cond would be the wrong place to 
do this.


The PA could implement something like:


(if_then_else (ne (zero_extract (reg) (1) (31))) (X) (0))

For any many constants very efficiently.  But it's a dead architecture 
and we won't have any define_insns in place to do that :-)


Anyway, the patch is OK for the trunk.  It's hard to see how a simple 
(and (X) (C)) -> (if_then_else (condition) (val1) (val2)) is a good 
transformation to make :-)


Jeff


Re: [PATCH] Unset used bit in simplify_replace_* on newly copied rtxs (PR target/78614)

2016-12-01 Thread Jakub Jelinek
On Thu, Dec 01, 2016 at 11:48:03PM +0100, Bernd Schmidt wrote:
> On 12/01/2016 11:43 PM, Jakub Jelinek wrote:
> >
> >so we'd need to slow shallow_copy_rtx down by adding switch (code)
> >in there or something similar.
> 
> Is there reason to assume it's time-critical? IMO eliminating the potential
> for error by not having callers need to do this outweighs that concern.

Well, it isn't an error not to clear it, just missed optimization if some caller
cares, and most of the shallow_copy_rtx users really don't care, e.g.
config/i386/i386.md and config/i386/i386.c callers.  cfgexpand.c doesn't
either, at that point the used bit isn't set, the calls in cselib.c don't care
either.  In emit-rtl.c one place explicitly sets used flag afterwards, another
clears it (that one could be removed if the logic is moved down).  The rs6000.c
use indeed could get rid of the explicit used bit clearing.  Most of the
other shallow_copy_rtx callers in the middle-end just don't care.
I think it is really just simplify_replace_fn_rtx where (at least in the
rs6000 case) it is useful to clear it, similar trick is not used in many
places after initial unsharing during expansion before which the bit should
be clear on everything shareable, I saw it in ifcvt.c (set_used_flags
+ some copying + unshare_*_chain afterwards).
After expansion when everything is unshared, gradually the used bits are no
longer relevant, they are set on stuff that has been originally unshared and
clear on newly added rtxes.  Then the reload/postreload
unshare_all_rtx_again clears it and unshares the shared stuff.
So it is really just about spots that do the trick of set_used_flags,
call some functions where clearing of used bit on new stuff is important,
and finally call copy_rtx_if_shared.

Jakub


Re: [PATCH] PR fortran/77505 -- Treat negative character length as LEN=0

2016-12-01 Thread Steve Kargl
On Wed, Nov 30, 2016 at 05:13:28AM +, Punnoose, Elizebeth wrote:
> Please excuse the messy formatting in my initial mail.  Resending
> with proper formatting.
> 
> This patch checks for negative character length in the array
> constructor, and treats it as LEN=0. 
> 
> A warning message is also printed if bounds checking is enabled.
> 
> Bootstrapped and regression tested the patch on x86_64-linux-gnu
> and aarch64-linux-gnu.
> 

Thanks.  After regression testing on x86_64-*-freebsd, I committed
the attached patch.  Not sure if the whitespace got messed up by 
my email agent, but I needed to reformat your testcases.  I took the
opportunity to rename and improve the testcases.  The improvements
check that in fact len=0 and that a warning is issued. 

Hopefully, you're inclined to submit additional patches in the future.
A few recommendations are to include the text of your ChangeLog 
entry in body of the email, for example,

2016-12-01  Elizebeth Punnoose  

PR fortran/77505
* trans-array.c (trans_array_constructor): Treat negative character
length as LEN = 0.

2016-12-01  Elizebeth Punnoose  

PR fortran/77505
* gfortran.dg/char_length_20.f90: New test.
* gfortran.dg/char_length_21.f90: Ditto.

(Note, 2 spaces before and after your name.)  Then attach the 
patch to the email.  This hopefully will prevent formatting
issues with various email clients/servers. 

-- 
Steve
Index: gcc/fortran/trans-array.c
===
--- gcc/fortran/trans-array.c	(revision 243134)
+++ gcc/fortran/trans-array.c	(working copy)
@@ -2226,6 +2226,8 @@ trans_array_constructor (gfc_ss * ss, lo
   gfc_ss_info *ss_info;
   gfc_expr *expr;
   gfc_ss *s;
+  tree neg_len;
+  char *msg;
 
   /* Save the old values for nested checking.  */
   old_first_len = first_len;
@@ -2271,6 +2273,29 @@ trans_array_constructor (gfc_ss * ss, lo
 	  gfc_conv_expr_type (_se, expr->ts.u.cl->length,
 			  gfc_charlen_type_node);
 	  ss_info->string_length = length_se.expr;
+
+	  /* Check if the character length is negative.  If it is, then
+	 set LEN = 0.  */
+	  neg_len = fold_build2_loc (input_location, LT_EXPR,
+ boolean_type_node, ss_info->string_length,
+ build_int_cst (gfc_charlen_type_node, 0));
+	  /* Print a warning if bounds checking is enabled.  */
+	  if (gfc_option.rtcheck & GFC_RTCHECK_BOUNDS)
+	{
+	  msg = xasprintf ("Negative character length treated as LEN = 0");
+	  gfc_trans_runtime_check (false, true, neg_len, _se.pre,
+   where, msg);
+	  free (msg);
+	}
+
+	  ss_info->string_length
+	= fold_build3_loc (input_location, COND_EXPR,
+			   gfc_charlen_type_node, neg_len,
+			   build_int_cst (gfc_charlen_type_node, 0),
+			   ss_info->string_length);
+	  ss_info->string_length = gfc_evaluate_now (ss_info->string_length,
+		 _se.pre);
+
 	  gfc_add_block_to_block (_loop->pre, _se.pre);
 	  gfc_add_block_to_block (_loop->post, _se.post);
 	}
Index: gcc/testsuite/gfortran.dg/char_length_20.f90
===
--- gcc/testsuite/gfortran.dg/char_length_20.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/char_length_20.f90	(working copy)
@@ -0,0 +1,13 @@
+! { dg-do run }
+! { dg-options "-fcheck=bounds" }
+program rabbithole
+   implicit none
+   character(len=:), allocatable :: text_block(:)
+   integer i, ii
+   character(len=10) :: cten='abcdefghij'
+   character(len=20) :: ctwenty='abcdefghijabcdefghij'
+   ii = -6
+   text_block=[ character(len=ii) :: cten, ctwenty ]
+   if (any(len_trim(text_block) /= 0)) call abort
+end program rabbithole
+! { dg-output "At line 10 of file .*char_length_20.f90.*Fortran runtime warning: Negative character length treated as LEN = 0" }
Index: gcc/testsuite/gfortran.dg/char_length_21.f90
===
--- gcc/testsuite/gfortran.dg/char_length_21.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/char_length_21.f90	(working copy)
@@ -0,0 +1,11 @@
+! { dg-do run }
+program rabbithole
+   implicit none
+   character(len=:), allocatable :: text_block(:)
+   integer i, ii
+   character(len=10) :: cten='abcdefghij'
+   character(len=20) :: ctwenty='abcdefghijabcdefghij'
+   ii = -6
+   text_block = [character(len=ii) :: cten, ctwenty]
+   if (any(len_trim(text_block) /= 0)) call abort
+end program rabbithole


Re: [PATCH] Fix x86_64 fix_debug_reg_uses (PR rtl-optimization/78575)

2016-12-01 Thread Jakub Jelinek
On Thu, Dec 01, 2016 at 03:55:58PM -0700, Jeff Law wrote:
> On 11/30/2016 11:57 AM, Jakub Jelinek wrote:
> >On Tue, Nov 29, 2016 at 03:20:08PM -0700, Jeff Law wrote:
> >>On 11/29/2016 12:41 PM, Jakub Jelinek wrote:
> >>>Hi!
> >>>
> >>>The x86_64 stv pass uses PUT_MODE to change REGs and MEMs in place to 
> >>>affect
> >>>all setters and users, but that is undesirable in debug insns which are
> >>>intentionally ignored during the analysis and we should keep using correct
> >>>modes (TImode) instead of the new one (V1TImode).
> >>Note that MEMs are not shared, so twiddling the mode on any given MEM
> >>changes one and only one object.
> >
> >Note that this patch isn't trying to workaround any wrong MEM sharing,
> >while the other patch has been.  So, is the PR78575 ok as is?
> >I'll post the other updated patch in the corresponding thread.
> It was an x86 patch, right?  So it's Uros's call.

Yeah, it is config/i386/ only.

Jakub


Re: [PATCH] Fix minor nits in gimple-ssa-sprintf.c (PR tree-optimization/78586)

2016-12-01 Thread Jeff Law

On 11/30/2016 12:01 PM, Jakub Jelinek wrote:

Hi!

This patch fixes some minor nits I've raised in the PR, more severe issues
left unresolved there.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-30  Jakub Jelinek  

PR tree-optimization/78586
* gimple-ssa-sprintf.c (format_integer): Don't handle NOP_EXPR,
CONVERT_EXPR or COMPONENT_REF here.  Formatting fix.  For
SSA_NAME_DEF_STMT with NOP_EXPR only change argtype if the rhs1's
type is INTEGER_TYPE or POINTER_TYPE.

OK.
jeff



Re: [PATCH 1/9] print_rtx: implement support for reuse IDs (v2)

2016-12-01 Thread Jeff Law

On 11/11/2016 02:15 PM, David Malcolm wrote:

Posted earlier here:
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00651.html
Link to earlier discussion:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01801.html

This version:
- eliminates the rtx_reuse_manager singleton
- eliminates print-rtl-reuse.h, moving class rtx_reuse_manager into
print-rtl.h and print-rtl.c
- fixes the formatting issues you noted in the earlier patch
Can you repost just those pieces that still need review?  I know various 
bits have been approved and various bits have iterated.  Just want to 
see what's really still outstanding.


Thanks,
jeff



Re: [PATCH] remove invalid "tail +16c"

2016-12-01 Thread Jeff Law

On 11/22/2016 11:22 PM, ma.ji...@zte.com.cn wrote:

Hi all,
  In "config/acx.m4", there are still some "tail +16c"  which are invalid
on POSIX systems.
  In my opinion, all "tail +16c" should be changed to "tail -c +16"
directly, as most systems has accept the latter.
  And, to skip first 16 bytes, we should use "tail -c +17" instead of
"tail -c +16".

 * config/acx.m4:Change "tail +16c" to "tail -c +17".
 * configure: Regenerate.
Thanks.  I've installed this on the trunk after bootstrap & regression 
testing on x86_64-linux-gnu.


jeff



Re: [PATCH] Another debug info stv fix (PR rtl-optimization/78547)

2016-12-01 Thread Jeff Law

On 11/30/2016 11:59 AM, Jakub Jelinek wrote:

On Wed, Nov 30, 2016 at 08:01:11AM +0100, Uros Bizjak wrote:

On Tue, Nov 29, 2016 at 8:44 PM, Jakub Jelinek  wrote:

Hi!

The following testcase ICEs because DECL_RTL/DECL_INCOMING_RTL are adjusted
by the stv pass through the PUT_MODE modifications, which means that for
var-tracking.c they contain a bogus mode.

Fixed by wrapping those into TImode subreg or adjusting the MEMs to have the
correct mode.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-29  Jakub Jelinek  

PR rtl-optimization/78547
* config/i386/i386.c (convert_scalars_to_vectors): If any
insns have been converted, adjust all parameter's DEC_RTL and
DECL_INCOMING_RTL back from V1TImode to TImode if the parameters have
TImode.


LGTM.


This patch actually has been working around IMHO broken rtl sharing of MEM
between DECL_INCOMING_RTL and some REG_EQUAL note.

Here is an updated patch that avoids this sharing (the middle-end part) and
thus can remove the MEM handling and just keep REG handling in
convert_scalars_to_vectors.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-29  Jakub Jelinek  

PR rtl-optimization/78547
* emit-rtl.c (unshare_all_rtl): Make sure DECL_RTL and
DECL_INCOMING_RTL is not shared.
* config/i386/i386.c (convert_scalars_to_vectors): If any
insns have been converted, adjust all parameter's DEC_RTL and
DECL_INCOMING_RTL back from V1TImode to TImode if the parameters have
TImode.

emit-rtl bits are OK.  Uros has the call on the x86 bits.

jeff




Re: [PATCH] Fix x86_64 fix_debug_reg_uses (PR rtl-optimization/78575)

2016-12-01 Thread Jeff Law

On 11/30/2016 11:57 AM, Jakub Jelinek wrote:

On Tue, Nov 29, 2016 at 03:20:08PM -0700, Jeff Law wrote:

On 11/29/2016 12:41 PM, Jakub Jelinek wrote:

Hi!

The x86_64 stv pass uses PUT_MODE to change REGs and MEMs in place to affect
all setters and users, but that is undesirable in debug insns which are
intentionally ignored during the analysis and we should keep using correct
modes (TImode) instead of the new one (V1TImode).

Note that MEMs are not shared, so twiddling the mode on any given MEM
changes one and only one object.


Note that this patch isn't trying to workaround any wrong MEM sharing,
while the other patch has been.  So, is the PR78575 ok as is?
I'll post the other updated patch in the corresponding thread.

It was an x86 patch, right?  So it's Uros's call.

jeff


Re: [PATCH v2] improve folding of expressions that move a single bit around

2016-12-01 Thread Jeff Law

On 12/01/2016 04:21 AM, Paolo Bonzini wrote:

In code like the following from KVM:

/* it is a read fault? */
error_code = (exit_qualification << 2) & PFERR_FETCH_MASK;

it would be nicer to write

/* it is a read fault? */
error_code = (exit_qualification & VMX_EPT_READ_FAULT_MASK) ? 
PFERR_FETCH_MASK : 0;

instead of having to know the difference between the positions of the
source and destination bits.  LLVM catches the latter just fine (which
is why I am sending this in stage 3...), but GCC does not, so this
patch adds two patterns to catch it.

The combine.c hunk of v1 has been committed already.

Bootstrapped/regtested x86_64-pc-linux-gnu, ok?

Paolo

2016-11-26  Paolo Bonzini  

* match.pd: Simplify X ? C : 0 where C is a power of 2 and
X tests a single bit.

2016-11-26  Paolo Bonzini  

* gcc.dg/fold-and-lshift.c, gcc.dg/fold-and-rshift-1.c,
gcc.dg/fold-and-rshift-2.c: New testcases.

OK.
jeff



Re: [PATCH] Unset used bit in simplify_replace_* on newly copied rtxs (PR target/78614)

2016-12-01 Thread Bernd Schmidt

On 12/01/2016 11:43 PM, Jakub Jelinek wrote:


so we'd need to slow shallow_copy_rtx down by adding switch (code)
in there or something similar.


Is there reason to assume it's time-critical? IMO eliminating the 
potential for error by not having callers need to do this outweighs that 
concern.



Bernd


Re: [PATCH] Unset used bit in simplify_replace_* on newly copied rtxs (PR target/78614)

2016-12-01 Thread Jakub Jelinek
On Thu, Dec 01, 2016 at 01:19:16PM +0100, Bernd Schmidt wrote:
> On 11/30/2016 11:11 PM, Jakub Jelinek wrote:
> >I run into the problem that while simplify_replace_rtx if it actually does
> >copy_rtx (for y) or if it simplifies something through
> >simplify_gen_{unary,binary,relational,ternary}, the used bits on the newly
> >created rtxes are cleared, when we fall through into the fallback
> >simplify_replace_fn_rtx handling, it calls shallow_copy_rtx which copies the
> >set used bit and thus copy_rtx_if_shared copies it again.
> 
> Shouldn't the bit be cleared in shallow_copy_rtx then?

Not very easily.  The problem is that the used bit means lots of things for
various rtls:

  /* At the end of RTL generation, 1 if this rtx is used.  This is used for
 copying shared structure.  See `unshare_all_rtl'.
 In a REG, this is not needed for that purpose, and used instead
 in `leaf_renumber_regs_insn'.
 1 in a SYMBOL_REF, means that emit_library_call
 has used it as the function.
 1 in a CONCAT is VAL_HOLDS_TRACK_EXPR in var-tracking.c.
 1 in a VALUE or DEBUG_EXPR is VALUE_RECURSED_INTO in var-tracking.c. 
 */
  unsigned int used : 1;

so we'd need to slow shallow_copy_rtx down by adding switch (code)
in there or something similar.  The callers of it have the advantage that
they know they call shallow_copy_rtx already on something that shouldn't be
shared, and many of them take care of the used bit already (some clear it,
other callers set it).  So clearing it in shallow_copy_rtx for the callers
that set it would be just waste of time.  But if you strongly prefer that
I can try to implement it.

That said, I found a bug in my patch, ADDR_DIFF_VEC has "eEee0" format, thus
if simplify_replace_fn_rtx replaces anything in the first operand, then
either it wouldn't properly duplicate the vector operand if no changes are
needed there, or it would ICE with the previous version of the patch.
So here is an updated patch that fixes this.

2016-12-01  Jakub Jelinek  

PR target/78614
* simplify-rtx.c (simplify_replace_fn_rtx): When copying rtx with
'E' in format, copy all vectors.  Clear used flag after
shallow_copy_rtx.

--- gcc/simplify-rtx.c.jj   2016-12-01 08:53:37.134284466 +0100
+++ gcc/simplify-rtx.c  2016-12-01 23:32:45.074152770 +0100
@@ -547,13 +547,20 @@ simplify_replace_fn_rtx (rtx x, const_rt
  old_rtx, fn, data);
if (op != RTVEC_ELT (vec, j))
  {
-   if (newvec == vec)
+   if (x == newx)
  {
-   newvec = shallow_copy_rtvec (vec);
-   if (x == newx)
- newx = shallow_copy_rtx (x);
-   XVEC (newx, i) = newvec;
+   newx = shallow_copy_rtx (x);
+   RTX_FLAG (newx, used) = 0;
+   /* If we copy X, we need to copy also all
+  vectors in it, rather than copy only
+  a subset of them and share the rest.  */
+   for (int k = 0; fmt[k]; k++)
+ if (fmt[k] == 'E')
+   XVEC (newx, k) = shallow_copy_rtvec (XVEC (x, k));
+   newvec = XVEC (newx, i);
  }
+   else
+ gcc_checking_assert (vec != newvec);
RTVEC_ELT (newvec, j) = op;
  }
  }
@@ -566,7 +573,16 @@ simplify_replace_fn_rtx (rtx x, const_rt
if (op != XEXP (x, i))
  {
if (x == newx)
- newx = shallow_copy_rtx (x);
+ {
+   newx = shallow_copy_rtx (x);
+   RTX_FLAG (newx, used) = 0;
+   /* If we copy X, we need to copy also all
+  vectors in it, rather than copy only
+  a subset of them and share the rest.  */
+   for (int k = 0; fmt[k]; k++)
+ if (fmt[k] == 'E')
+   XVEC (newx, k) = shallow_copy_rtvec (XVEC (x, k));
+ }
XEXP (newx, i) = op;
  }
  }


Jakub


Re: [PATCH] Fix runtime error: left shift of negative value (PR, ipa/78555).

2016-12-01 Thread Jeff Law

On 12/01/2016 02:54 AM, Martin Liška wrote:

As described in the PR, we do couple of shifts of a negative value.
Fixed in the patch
and couple of new unit tests are added.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

0001-Fix-runtime-error-left-shift-of-negative-value-PR-ip.patch


From 61a6b5e0c973bd77341a1053609c7ad331691a9e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 30 Nov 2016 15:24:48 +0100
Subject: [PATCH] Fix runtime error: left shift of negative value (PR
 ipa/78555).

gcc/ChangeLog:

2016-11-30  Martin Liska  

PR ipa/78555
* sreal.c (sreal::to_int): Make absolute value before shifting.
(sreal::operator/): Likewise.
(sreal_verify_negative_division): New test.
(void sreal_c_tests): Call the new test.
* sreal.h (sreal::normalize_up): Use new SREAL_ABS and
SREAL_SIGN macros.
(sreal::normalize_down): Likewise.

OK.
jeff



New Spanish PO file for 'gcc' (version 6.2.0)

2016-12-01 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Spanish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/es.po

(This file, 'gcc-6.2.0.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [tree-tailcall] Check if function returns it's argument

2016-12-01 Thread Jeff Law

On 12/01/2016 06:22 AM, Richard Biener wrote:

Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
differently:

tail-padding1.C:13:1: error: RESULT_DECL should be read only when
DECL_BY_REFERENCE is set
 }
 ^
while verifying SSA_NAME nrvo_4 in statement
# .MEM_3 = VDEF <.MEM_1(D)>
nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
tail-padding1.C:13:1: internal compiler error: verify_ssa failed


Hmm, ok.  Not sure why we enforce this.
I don't know either.  But I would start by looking at tree-nrv.c since 
it looks (based on the variable names) that the named-value-return 
optimization kicked in.




Note that in the end this patch looks fishy -- iff we really need
the LHS on the assignment for correctness if we have the tailcall
flag set then what guarantees that later passes do not remove it
again?  So anybody removing a LHS would need to unset the tailcall flag?

Saying again that I don't know enough about the RTL part of tailcall
expansion.
The LHS on the assignment makes it easier to identify when a tail call 
is possible.  It's not needed for correctness.  Not having the LHS on 
the assignment just means we won't get an optimized tail call.


Under what circumstances would the LHS possibly be removed?  We know the 
return statement references the LHS, so it's not going to be something 
that DCE will do.


jeff


Re: [PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread Jakub Jelinek
On Thu, Dec 01, 2016 at 01:41:30PM -0500, David Edelsohn wrote:
> TEMPLATE_DECL is defined in cp/cp-tree.def, which is included in
> all-tree.def, which is included in tree-core.h, which is included in
> tree.h, which is included in dbxout.c.

That is clearly a bug, if TEMPLATE_DECL is used in the generic code, it
should be moved into generic tree.def IMHO.
But of course it can be done incrementally.

Jakub


[committed] dwarf2out.c: fix jit issue with early_dwarf_finished

2016-12-01 Thread David Malcolm
All of the jit testcases that generate debuginfo appear to have been
failing since r240228 on their 2nd in-process iteration on this
assertion in set_early_dwarf's ctor:

  gcc_assert (! early_dwarf_finished);

Root cause is that the global is never reset at the end of compilation,
which this patch fixes in the obvious way.

Successfully bootstrapped on x86_64-pc-linux-gnu; fixes
55 FAILs in jit.sum.

Committed to trunk as r243136 (under the "obvious" rule).

gcc/ChangeLog:
* dwarf2out.c (dwarf2out_c_finalize): Reset early_dwarf and
early_dwarf_finished.
---
 gcc/dwarf2out.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 66a4919..d87ba21 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -29829,6 +29829,9 @@ dwarf2out_c_finalize (void)
   cold_text_section = NULL;
   current_unit_personality = NULL;
 
+  early_dwarf = false;
+  early_dwarf_finished = false;
+
   next_die_offset = 0;
   single_comp_unit_die = NULL;
   comdat_type_list = NULL;
-- 
1.8.5.3



Re: Pretty printers for versioned namespace

2016-12-01 Thread François Dumont

On 29/11/2016 21:17, Jonathan Wakely wrote:

On 28/11/16 22:19 +0100, François Dumont wrote:

Hi

   Here is a patch to fix pretty printers when versioned namespace is 
activated.


   You will see that I have hesitated in making the fix independant 
of the version being used. In source files you will find (__7::)? 
patterns while in xmethods.py I chose (__\d+::)? making it ready for 
__8 and forward. Do you want to generalize one option ? If so which 
one ?


I don't really mind, but I note that the point of the path
libstdcxx/v6/printers.py was that we'd have different printers for v7,
v8 etc. ... I think it's simpler to keep everything in one place
though.


Ok, I think the folder v6 depends more on a potential gdb api change.

   At the moment version namespace is visible within gdb, it displays 
for instance 'std::__7::string'. I am pretty sure we could hide it, 
is it preferable ? I would need some time to do so as I am neither a 
python nor regex expert.


It's fine to display it.

   I am not fully happy with the replication in printers.py of 
StdRbtreeIteratorPrinter and 
StdExpAnyPrinter(SingleObjContainerPrinter in respectively 
StdVersionedRbtreeIteratorPrinter and 
StdExpVerAnyPrinter(SingleObjContainerPrinter just to adapt 2 lines 
where regex is not an option. We could surely keep only one and pass 
it '' or '__7'. But as I said I am not a python expert so any help 
would be appreciated.


We definitely want to avoid that duplication. For
StdRbtreeIteratorPrinter you can just look at 'typename' and see
whether it starts with "std::__7" or not. If it does, you need to lookup
std::__7::_Rb_tree_node<...>, otherwise you need to lookup
std::_Rb_tree_node<...> instead.

For StdExpAnyPrinter just do two replacements: first replace
std::string with the result of gdb.lookup_type('std::string') and then
replace std::__7::string with the result of looking that up. Are you
sure that's even needed though? Does std::__7::string actually appear
in the manager function's name? I would expect it to appear as
std::__7::basic_string >

which doesn't need to be expanded anyway. So I think you can just
remove your StdExpVerAnyPrinter.


We needed the StdExpVerAnyPrinter just because of the loopkup for 
'std::string' which has to be 'std::__7::string'. But I used similar 
technique exposed previously to get rid of it.


So here is the simplified version I plan to test without versioned 
namespace.


François

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index bad42b4..a7c92a6 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -36,6 +36,8 @@ import sys
 # We probably can't do much about this until this GDB PR is addressed:
 # 
 
+vers_nsp = '__7::'
+
 if sys.version_info[0] > 2:
 ### Python 3 stuff
 Iterator = object
@@ -127,9 +129,9 @@ class UniquePointerPrinter:
 
 def to_string (self):
 impl_type = self.val.type.fields()[0].type.tag
-if impl_type.startswith('std::__uniq_ptr_impl<'): # New implementation
+if re.match('^std::(' + vers_nsp + ')?__uniq_ptr_impl<.*>$', impl_type): # New implementation
 v = self.val['_M_t']['_M_t']['_M_head_impl']
-elif impl_type.startswith('std::tuple<'):
+elif re.match('^std::(' + vers_nsp + ')?tuple<.*>$', impl_type):
 v = self.val['_M_t']['_M_head_impl']
 else:
 raise ValueError("Unsupported implementation for unique_ptr: %s" % self.val.type.fields()[0].type.tag)
@@ -485,7 +487,10 @@ class StdRbtreeIteratorPrinter:
 def __init__ (self, typename, val):
 self.val = val
 valtype = self.val.type.template_argument(0).strip_typedefs()
-nodetype = gdb.lookup_type('std::_Rb_tree_node<' + str(valtype) + '>')
+if typename.startswith('std::' + vers_nsp):
+nodetype = gdb.lookup_type('std::' + vers_nsp + '_Rb_tree_node<' + str(valtype) + '>')
+else:
+nodetype = gdb.lookup_type('std::_Rb_tree_node<' + str(valtype) + '>')
 self.link_type = nodetype.strip_typedefs().pointer()
 
 def to_string (self):
@@ -927,7 +932,6 @@ class SingleObjContainerPrinter(object):
 return self.visualizer.display_hint ()
 return self.hint
 
-
 class StdExpAnyPrinter(SingleObjContainerPrinter):
 "Print a std::any or std::experimental::any"
 
@@ -948,7 +952,11 @@ class StdExpAnyPrinter(SingleObjContainerPrinter):
 raise ValueError("Unknown manager function in %s" % self.typename)
 
 # FIXME need to expand 'std::string' so that gdb.lookup_type works
-mgrname = re.sub("std::string(?!\w)", str(gdb.lookup_type('std::string').strip_typedefs()), m.group(1))
+if re.match('std::(experimental::)?' + vers_nsp + '.*$', self.typename):
+strt 

Re: [PATCH 8/9] Introduce class function_reader (v4)

2016-12-01 Thread David Malcolm
On Thu, 2016-12-01 at 15:40 +0100, Bernd Schmidt wrote:

Thanks for looking at this.

> On 11/11/2016 10:15 PM, David Malcolm wrote:
> >  #include "gt-aarch64.h"
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 6c608e0..0dda786 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> 
> I think we should separate out the target specific tests so as to
> give 
> port maintainers a chance to comment on them separately.

Done.

> > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > index 50cd388..179a91f 100644
> > --- a/gcc/emit-rtl.c
> > +++ b/gcc/emit-rtl.c
> > @@ -1371,6 +1371,19 @@ maybe_set_first_label_num (rtx_code_label
> > *x)
> >if (CODE_LABEL_NUMBER (x) < first_label_num)
> >  first_label_num = CODE_LABEL_NUMBER (x);
> >  }
> > +
> > +/* For use by the RTL function loader, when mingling with normal
> > +   functions.
> 
> Not sure what this means.

This is for patch 9 which adds __RTL support to cc1, so that we can
have testcases in which some function bodies are in RTL form, and
others are in normal C form.

The issue is that the CODE_LABEL_NUMBER values in the RTL dump make it
into the generated assembler, and must be unique within the .s file. 
 Hence we need to update first_label_num so that any new labels created
(e.g. when expanding pure C functions) don't clash with the ones from
the dump.

I can simply drop that first sentence; the followup probably is
sufficient:

   /* Ensure that label_num is greater than the label num of X, to
  avoid duplicate labels in the generated assembler.  */

> 
> >if (str == 0)
> > -   fputs (" \"\"", m_outfile);
> > +   fputs (" (nil)", m_outfile);
> >else
> > fprintf (m_outfile, " (\"%s\")", str);
> >m_sawclose = 1;
> 
> What does this affect?

This affects things like the LABEL_NAME of CODE_LABEL instances.

The hunk means that we preserve NULL string values as NULL, rather than
them becoming the empty string on a round trip.

> >  /* Global singleton; constrast with md_reader_ptr above.  */
> > diff --git a/gcc/read-rtl-function.c b/gcc/read-rtl-function.c
> > new file mode 100644
> > index 000..ff6c808
> > --- /dev/null
> > +++ b/gcc/read-rtl-function.c
> > @@ -0,0 +1,2124 @@
> > +/* read-rtl-function.c - Reader for RTL function dumps
> > +   Copyright (C) 2016 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it
> > under
> > +the terms of the GNU General Public License as published by the
> > Free
> > +#include 
> 
> Please double-check all these includes whether they are necessary.

Thanks; yes, quite a few were unnecessary.  Fixed.

> > +
> > +/* Fix up a NOTE_INSN_BASIC_BLOCK based on an integer block ID. 
> >  */
> > +
> > +void
> > +fixup_note_insn_basic_block::apply (function_reader */*reader*/)
> > const
> 
> Lose the /*reader*/, probably.

Done.

> > +
> > +/* Implementation of rtx_reader::handle_unknown_directive.
> > +
> > +   Require a top-level "function" elements, as emitted by
> > +   print_rtx_function, and parse it.  */
> 
> "element"?

Converted to "directive" (I've been trying to use "directive" rather
than "element" throughout).

> > +void
> > +function_reader::create_function ()
> > +{
> > +  /* Currently we assume cfgrtl mode, rather than cfglayout mode. 
> >  */
> > +  if (0)
> > +cfg_layout_rtl_register_cfg_hooks ();
> > +  else
> > +rtl_register_cfg_hooks ();
> 
> Do we expect to change this? I'd just get rid of the if (0), at least
> for now.

Patch 9 adds some logic for switching the hooks as we go through the
various passes, so that "startwith" works properly, but here we just
need it to reflect the state of the hooks coming out of "expand", so
I've changed it to:

  /* We start in cfgrtl mode, rather than cfglayout mode.  */
  rtl_register_cfg_hooks ();

> > +/* cgraph_node::add_new_function does additional processing
> > +   based on symtab->state.  We need to avoid it attempting to
> > gimplify
> > +   things.  Temporarily putting it in the PARSING state
> > appears to
> > +   achieve this.  */
> > +enum symtab_state old_state = symtab->state;
> > +symtab->state = PARSING;
> > +cgraph_node::add_new_function (fndecl, true /*lowered*/);
> > +/* Reset the state.  */
> > +symtab->state = old_state;
> > +  }
> 
> Does it do anything beside call finalize_function in that state? If 
> that's all you need, just call it directly.

There's some logic after the switch on state for calling
lang_hooks.eh_personality and setting DECL_FUNCTION_PERSONALITY, but I
think if we want to support that, we'd add it as additional dumped
data.

I've updated things to call finalize_function directly, removing the
messing about with the symtab state.

> > +
> > +  /* Parse DECL_RTL.  */
> > +  {
> > +require_char_ws ('(');
> > +require_word_ws ("DECL_RTL");
> > +DECL_WRTL_CHECK (t_param)->decl_with_rtl.rtl = parse_rtx ();
> 

[SPARC] Preparatory work for switch to LRA

2016-12-01 Thread Eric Botcazou
The switch will very likely not happen for GCC 7, but an experimental support 
can probably be introduced.  The attached patch adds the famous -mlra option 
and rescues straightforward changes by DaveM.

Tested on SPARC/Solaris, applied on the mainline.


2016-12-01  Eric Botcazou  
David S. Miller  

* config/sparc/sparc.opt (mlra): New target option.
* config/sparc/sparc.c (TARGET_LRA_P): Define to...
(sparc_lra_p): ...this.  New function.
(D_MODES, DF_MODES): Add missing cast.
* config/sparc/sparc.md (*movsi_lo_sum, *movsi_high): Do not
provide these insns when flag_pic.
(sethi_di_medlow, losum_di_medlow, seth44, setm44, setl44, sethh,
setlm, sethm, setlo, embmedany_sethi, embmedany_losum,
embmedany_brsum, embmedany_textuhi, embmedany_texthi,
embmedany_textulo, embmedany_textlo): Likewise.
(sethi_di_medlow_embmedany_pic): Provide it only when flag_pic.

-- 
Eric BotcazouIndex: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 243096)
+++ config/sparc/sparc.c	(working copy)
@@ -639,6 +639,7 @@ static const char *sparc_mangle_type (co
 static void sparc_trampoline_init (rtx, tree, rtx);
 static machine_mode sparc_preferred_simd_mode (machine_mode);
 static reg_class_t sparc_preferred_reload_class (rtx x, reg_class_t rclass);
+static bool sparc_lra_p (void);
 static bool sparc_print_operand_punct_valid_p (unsigned char);
 static void sparc_print_operand (FILE *, rtx, int);
 static void sparc_print_operand_address (FILE *, machine_mode, rtx);
@@ -836,7 +837,7 @@ char sparc_hard_reg_printed[8];
 #endif
 
 #undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P sparc_lra_p
 
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P sparc_legitimate_address_p
@@ -4787,7 +4788,7 @@ enum sparc_mode_class {
   ((1 << (int) H_MODE) | (1 << (int) S_MODE) | (1 << (int) SF_MODE))
 
 /* Modes for double-word and smaller quantities.  */
-#define D_MODES (S_MODES | (1 << (int) D_MODE) | (1 << DF_MODE))
+#define D_MODES (S_MODES | (1 << (int) D_MODE) | (1 << (int) DF_MODE))
 
 /* Modes for quad-word and smaller quantities.  */
 #define T_MODES (D_MODES | (1 << (int) T_MODE) | (1 << (int) TF_MODE))
@@ -4799,7 +4800,7 @@ enum sparc_mode_class {
 #define SF_MODES ((1 << (int) S_MODE) | (1 << (int) SF_MODE))
 
 /* Modes for double-float and smaller quantities.  */
-#define DF_MODES (SF_MODES | (1 << (int) D_MODE) | (1 << DF_MODE))
+#define DF_MODES (SF_MODES | (1 << (int) D_MODE) | (1 << (int) DF_MODE))
 
 /* Modes for quad-float and smaller quantities.  */
 #define TF_MODES (DF_MODES | (1 << (int) TF_MODE))
@@ -12248,6 +12249,14 @@ sparc_preferred_reload_class (rtx x, reg
   return rclass;
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+
+static bool
+sparc_lra_p (void)
+{
+  return TARGET_LRA;
+}
+
 /* Output a wide multiply instruction in V8+ mode.  INSN is the instruction,
OPERANDS are its operands and OPCODE is the mnemonic to be used.  */
 
Index: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 243096)
+++ config/sparc/sparc.md	(working copy)
@@ -1568,13 +1568,13 @@ (define_insn "*movsi_lo_sum"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(lo_sum:SI (match_operand:SI 1 "register_operand" "r")
(match_operand:SI 2 "immediate_operand" "in")))]
-  ""
+  "!flag_pic"
   "or\t%1, %%lo(%a2), %0")
 
 (define_insn "*movsi_high"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(high:SI (match_operand:SI 1 "immediate_operand" "in")))]
-  ""
+  "!flag_pic"
   "sethi\t%%hi(%a1), %0")
 
 ;; The next two patterns must wrap the SYMBOL_REF in an UNSPEC
@@ -1846,27 +1846,27 @@ (define_insn "movdi_pic_gotdata_op"
 (define_insn "*sethi_di_medlow_embmedany_pic"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (high:DI (match_operand:DI 1 "medium_pic_operand" "")))]
-  "(TARGET_CM_MEDLOW || TARGET_CM_EMBMEDANY) && check_pic (1)"
+  "(TARGET_CM_MEDLOW || TARGET_CM_EMBMEDANY) && flag_pic && check_pic (1)"
   "sethi\t%%hi(%a1), %0")
 
 (define_insn "*sethi_di_medlow"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (high:DI (match_operand:DI 1 "symbolic_operand" "")))]
-  "TARGET_CM_MEDLOW && check_pic (1)"
+  "TARGET_CM_MEDLOW && !flag_pic"
   "sethi\t%%hi(%a1), %0")
 
 (define_insn "*losum_di_medlow"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (lo_sum:DI (match_operand:DI 1 "register_operand" "r")
(match_operand:DI 2 "symbolic_operand" "")))]
-  "TARGET_CM_MEDLOW"
+  "TARGET_CM_MEDLOW && !flag_pic"
   "or\t%1, %%lo(%a2), %0")
 
 (define_insn "seth44"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (high:DI (unspec:DI [(match_operand:DI 1 "symbolic_operand" "")]
 			

Re: [PATCH] Dump probability for edges a frequency for BBs

2016-12-01 Thread Martin Liška

On 12/01/2016 05:49 PM, Martin Sebor wrote:

Okay, thanks for the clarification.  One other question though.
Why would the probability be near zero?  In the absence of any
hints the expression 2 != sprintf(d, "%i", 12) should have
a very high probability of being true, near 100% in fact.

I ask because the test I referenced tries to verify the absence
of the sprintf return value optimization.  Without it the
likelihood of each of the calls to abort should in the EQL kind
of test cases like the one above should be nearly 100% (but not
quite).  When it's the opposite it suggests that the sprintf
optimization is providing some hint (in the form of a range of
return values) that changes the odds.  I have verified that
the optimization is not performed so something else must be
setting the probability or the value isn't correct.


I think I see what's going on.  It's the call to abort that GCC
uses for the probability (more precisely its attribute noreturn).
When I replace the abort with a function of my own that GCC knows
nothing about the probability goes up to just over 52%.  Still it
seems very low given that 2 is just one of UINT_MAX values the
function can possibly return.

Martin


Hi Martin.

Situation is more complicated:
$ gcc predictor.c -fdump-tree-profile_estimate=/dev/stdout -O2 -c

;; Function bar (bar, funcdef_no=0, decl_uid=1797, cgraph_uid=0, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4
;; 2 succs { 3 4 }
;; 3 succs { 4 }
;; 4 succs { 1 }
Predictions for bb 2
  DS theory heuristics: 52.9%
  combined heuristics: 52.9%
  opcode values nonequal (on trees) heuristics of edge 2->3: 66.0%
  early return (on trees) heuristics of edge 2->4: 46.0%
  call heuristics of edge 2->3: 33.0%
Predictions for bb 3
1 edges in bb 3 predicted to even probabilities
Predictions for bb 4
1 edges in bb 4 predicted to even probabilities
bar ()
{
  char d[2];
  int _1;

   [100.0%]:
  _1 = __builtin_sprintf (, "%i", 12);
  if (_1 != 2)
goto ; [52.9%]
  else
goto ; [47.1%]

   [52.9%]:
  foo ();

   [100.0%]:
  d ={v} {CLOBBER};
  return;

}

As you can see, multiple predictors do match to edge 2->3:
  opcode values nonequal (on trees) heuristics of edge 2->3: 66.0% // this is 
the real probability that _x != a_value
  early return (on trees) heuristics of edge 2->4: 46.0% // 2->3 would have 
54.0%
  call heuristics of edge 2->3: 33.0% // performing condition that triggers a 
call is unlikely

These 3 combined together would result in 52.9%. Real values for all of these 
predictors are received from
SPEC benchmarks (https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00511.html), 
where we use contrib/analyze_brprob.py
script which is capable of parsing of the tree profile_estimate dumps.

Back to your question about probability that a given value (from UINT_MAX 
range) is equal to 2. As the predictor
is only driven by opcode and it's quite common that a comparison in a source 
code is to a 'special' value, that's
why the tuned value of the predictor is equal to 66%.

Martin




$ cat a.c && gcc -O2 -S -w -fdump-tree-optimized=/dev/stdout a.c
void foo (void);

void bar (void)
{
  char d [2];
  if (2 != __builtin_sprintf (d, "%i", 12))
foo ();
}

;; Function bar (bar, funcdef_no=0, decl_uid=1797, cgraph_uid=0, symbol_order=0)

Removing basic block 5
bar ()
{
  char d[2];
  int _1;

   [100.0%]:
  _1 = __builtin_sprintf (, "%i", 12);
  if (_1 != 2)
goto ; [52.9%]
  else
goto ; [47.1%]

   [52.9%]:
  foo ();

   [100.0%]:
  d ={v} {CLOBBER};
  return;

}






Re: [Patch 1/2 PR78561] Rename get_pool_size to get_pool_size_upper_bound

2016-12-01 Thread Jeff Law

On 12/01/2016 08:29 AM, James Greenhalgh wrote:


Hi,

There's no functional change in this patch, just a rename.

The size recorded in "offset" is only ever incremented as we add new items
to the constant pool. But this information can become stale where those
constant pool entries would not get emitted.

Thus, it is only ever an upper bound on the size of the constant pool.

The only uses of get_pool_size are in rs6000 and there it is only used to
check whether a constant pool might be output - but explicitly renaming the
function to make it clear that you're getting an upper bound rather than the
real size can only be good for programmers using the interface.

OK?

Thanks,
James

---
2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p) Rename
get_pool_size to get_pool_size_upper_bound.
(rs6000_stack_info): Likewise.
(rs6000_emit_prologue): Likewise.
(rs6000_elf_declare_function_name): Likewise.
(rs6000_set_up_by_prologue): Likewise.
(rs6000_can_eliminate): Likewise, reformat spaces to tabs.
* output.h (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.
* varasm.c (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.


Both parts of the fix for 78561 are OK.

Thanks,
Jeff


[Committed] PR fortran/78279 -- convert gcc_assert to internal error

2016-12-01 Thread Steve Kargl
I've committed the attached patch, which converts a gcc_assert()
to a conditional express tha may call gfc_internal_error().o

2016-12-01  Steven G. Kargl  

PR fortran/78279
* dependency.c (identical_array_ref): Convert gcc_assert to conditional
and gfc_internal_error.

2016-12-01  Steven G. Kargl  

PR fortran/78279
* gfortran.dg/pr78279.f90: New test.

-- 
Steve
Index: gcc/fortran/dependency.c
===
--- gcc/fortran/dependency.c	(revision 242789)
+++ gcc/fortran/dependency.c	(working copy)
@@ -101,7 +101,9 @@ identical_array_ref (gfc_array_ref *a1, 
 
   if (a1->type == AR_ELEMENT && a2->type == AR_ELEMENT)
 {
-  gcc_assert (a1->dimen == a2->dimen);
+  if (a1->dimen != a2->dimen)
+	gfc_internal_error ("identical_array_ref(): inconsistent dimensions");
+
   for (i = 0; i < a1->dimen; i++)
 	{
 	  if (gfc_dep_compare_expr (a1->start[i], a2->start[i]) != 0)
Index: gcc/testsuite/gfortran.dg/pr78279.f90
===
--- gcc/testsuite/gfortran.dg/pr78279.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr78279.f90	(working copy)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-options "-Ofast" }
+program p
+   integer :: i
+   real :: z(2,4)
+   z = 0.0
+   do i = 1, 3
+  if ( z(i) > z(1,i+1) ) print *, i   ! { dg-error "mismatch in array reference" }
+   end do
+end


Go patch committed: add slice initializers to the GC root list

2016-12-01 Thread Ian Lance Taylor
As of https://golang.org/cl/32917 we can put slice initializers in the
.data section.  The program can still change the values in those
slices.  That means that if the slice elements can contain pointers,
we need to register the entire initializer as a GC root.

This would be straightforward except that we only have a Bexpression
for the slice initializer, not an Expression.  So introduce a
Backend_expression type that wraps a Bexpression as an Expression.

The test case for this is https://golang.org/cl/33790.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 243094)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-97b949f249515a61d3c09e9e06f08c8af189e967
+b7bad96ce0af50a1129eaab9aa110d68a601917b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 243084)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -4295,6 +4295,20 @@ Unary_expression::do_get_backend(Transla
  true, copy_to_heap, false,
  bexpr);
  bexpr = gogo->backend()->var_expression(implicit, loc);
+
+ // If we are not copying a slice initializer to the heap,
+ // then it can be changed by the program, so if it can
+ // contain pointers we must register it as a GC root.
+ if (this->is_slice_init_
+ && !copy_to_heap
+ && this->expr_->type()->has_pointer())
+   {
+ Bexpression* root =
+   gogo->backend()->var_expression(implicit, loc);
+ root = gogo->backend()->address_expression(root, loc);
+ Type* type = Type::make_pointer_type(this->expr_->type());
+ gogo->add_gc_root(Expression::make_backend(root, type, loc));
+   }
}
   else if ((this->expr_->is_composite_literal()
|| this->expr_->string_expression() != NULL)
@@ -15433,6 +15447,28 @@ Expression::make_compound(Expression* in
   return new Compound_expression(init, expr, location);
 }
 
+// Class Backend_expression.
+
+int
+Backend_expression::do_traverse(Traverse*)
+{
+  return TRAVERSE_CONTINUE;
+}
+
+void
+Backend_expression::do_dump_expression(Ast_dump_context* ast_dump_context) 
const
+{
+  ast_dump_context->ostream() << "backend_expression<";
+  ast_dump_context->dump_type(this->type_);
+  ast_dump_context->ostream() << ">";
+}
+
+Expression*
+Expression::make_backend(Bexpression* bexpr, Type* type, Location location)
+{
+  return new Backend_expression(bexpr, type, location);
+}
+
 // Import an expression.  This comes at the end in order to see the
 // various class definitions.
 
Index: gcc/go/gofrontend/expressions.h
===
--- gcc/go/gofrontend/expressions.h (revision 243084)
+++ gcc/go/gofrontend/expressions.h (working copy)
@@ -137,7 +137,8 @@ class Expression
 EXPRESSION_STRUCT_FIELD_OFFSET,
 EXPRESSION_LABEL_ADDR,
 EXPRESSION_CONDITIONAL,
-EXPRESSION_COMPOUND
+EXPRESSION_COMPOUND,
+EXPRESSION_BACKEND
   };
 
   Expression(Expression_classification, Location);
@@ -485,6 +486,10 @@ class Expression
   static Expression*
   make_compound(Expression*, Expression*, Location);
 
+  // Make a backend expression.
+  static Expression*
+  make_backend(Bexpression*, Type*, Location);
+
   // Return the expression classification.
   Expression_classification
   classification() const
@@ -3825,6 +3830,54 @@ class Compound_expression : public Expre
   Expression* expr_;
 };
 
+// A backend expression.  This is a backend expression wrapped in an
+// Expression, for convenience during backend generation.
+
+class Backend_expression : public Expression
+{
+ public:
+  Backend_expression(Bexpression* bexpr, Type* type, Location location)
+: Expression(EXPRESSION_BACKEND, location), bexpr_(bexpr), type_(type)
+  {}
+
+ protected:
+  int
+  do_traverse(Traverse*);
+
+  // For now these are always valid static initializers.  If that
+  // changes we can change this.
+  bool
+  do_is_static_initializer() const
+  { return true; }
+
+  Type*
+  do_type()
+  { return this->type_; }
+
+  void
+  do_determine_type(const Type_context*)
+  { }
+
+  Expression*
+  do_copy()
+  {
+return new Backend_expression(this->bexpr_, this->type_, this->location());
+  }
+
+  Bexpression*
+  do_get_backend(Translate_context*)
+  { return this->bexpr_; }
+
+  void
+  do_dump_expression(Ast_dump_context*) const;
+
+ private:
+  // The backend expression we are wrapping.
+  Bexpression* bexpr_;
+  // The type of the expression;
+  Type* 

Re: [Fortran, Patch, PR{43366, 57117, 61337, 61376}, v1] Assign to polymorphic objects.

2016-12-01 Thread David Edelsohn
Dump sent privately.

Yes, I meant "x".

AIX defaults to 32 bit.

- David

On Thu, Dec 1, 2016 at 1:31 PM, Andre Vehreschild  wrote:
> Hi all,
>
> I am sorry, but the initial mail as well as Dominique answer puzzles me:
>
> David: I do expect to
>
> write (*,*) any
>
> not being compilable at all, because "any" is an intrinsic function and I
> suppose that gfortran is not able to print it. At best it gives an address. So
> am I right to assume that it should have been:
>
> write (*,*) x
>
> ?
>
> Which is a bit strange. Furthermore is it difficult for me to debug, because I
> do not have access to an AIX machine. What address size does the machine have
> 32/48/64-bit? Is there a chance you send me the file that is generated
> additionally by gfortran when called with -fdump-tree-original ? The file is
> named alloc_comp_class_5.f03.003t.original usually.
>
> Dominique: How did you get that? Do you have access to an AIX machine? What
> kind of instrumentation was active in the compiler you mentioned?
>
> - Andre
>
> On Wed, 30 Nov 2016 21:51:30 +0100
> Dominique d'Humières  wrote:
>
>> If I compile the test with an instrumented  gfortran , I get
>>
>> ../../work/gcc/fortran/interface.c:2948:33: runtime error: load of value
>> 1818451807, which is not a valid value for type ‘expr_t'
>>
>> Dominique
>>
>> > Le 30 nov. 2016 à 21:06, David Edelsohn  a écrit :
>> >
>> > Hi, Andre
>> >
>> > I have noticed that the alloc_comp_class_5.f03 testcase fails on AIX.
>> > Annotating the testcase a little, shows that the failure is at
>> >
>> >  if (any(x /= ["foo", "bar", "baz"])) call abort()
>> >
>> > write (*,*) any
>> >
>> > at the point of failure produces
>> >
>> > "foobarba"
>> >
>> > - David
>>
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread Jeff Law

On 12/01/2016 11:41 AM, David Edelsohn wrote:

On Thu, Dec 1, 2016 at 1:25 PM, Jeff Law  wrote:

On 12/01/2016 09:15 AM, David Edelsohn wrote:


A number of the "variant" testcases fail to build on AIX and targets
that use stabs.  The failure looks like:

/tmp/GCC/powerpc-ibm-aix7.2.0.0/libstdc++-v3/include/variant:956:
internal compiler error: tree check: expected field_decl, have
template_decl in int_bit_position, at tree.h:5396

which occurs in dbxout_type_fields()

  /* Output the name, type, position (in bits), size (in bits) of each
 field that we can support.  */
  for (tem = TYPE_FIELDS (type); tem; tem = DECL_CHAIN (tem))
 ...
  if (VAR_P (tem))
{
 ...
 }
  else
{
  stabstr_C (',');
  stabstr_D (int_bit_position (tem));
  stabstr_C (',');
  stabstr_D (tree_to_uhwi (DECL_SIZE (tem)));
  stabstr_C (';');
}

where tem is a TEMPLATE_DECL.  The dbxout code currently skips
TYPE_DECL, nameless fields, and CONST_DECL.

dbxout_type_methods() explicitly skips TEMPLATE_DECLs with the comment
"The debugger doesn't know what to do with such entities anyhow", so
this proposed patch skips them in dbxout_type_fields() as well.

Okay?

Thanks, David


PR debug/66419
PR c++/78235
* dbxout.c (dbxout_type_fields): Skip TEMPLATE_DECLs.


From the looks of things, it appears we skip them in the dwarf2 code as
well.  But I don't think we can use TEMPLATE_DECL here as that's defined by
the C++ front end.


TEMPLATE_DECL is defined in cp/cp-tree.def, which is included in
all-tree.def, which is included in tree-core.h, which is included in
tree.h, which is included in dbxout.c.

It also is referenced in common code in gcc/tree.c.

In that case, do ahead with checking TEMPLATE_DECL.





I think instead if you test something like:
  (int)TREE_CODE (decl) > NUM_TREE_CODES

You'll filter out any _DECL nodes coming out of the front-ends.


No other DECLs seem to escape.

Good :-)

jeff



Re: [PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread David Edelsohn
On Thu, Dec 1, 2016 at 1:25 PM, Jeff Law  wrote:
> On 12/01/2016 09:15 AM, David Edelsohn wrote:
>>
>> A number of the "variant" testcases fail to build on AIX and targets
>> that use stabs.  The failure looks like:
>>
>> /tmp/GCC/powerpc-ibm-aix7.2.0.0/libstdc++-v3/include/variant:956:
>> internal compiler error: tree check: expected field_decl, have
>> template_decl in int_bit_position, at tree.h:5396
>>
>> which occurs in dbxout_type_fields()
>>
>>   /* Output the name, type, position (in bits), size (in bits) of each
>>  field that we can support.  */
>>   for (tem = TYPE_FIELDS (type); tem; tem = DECL_CHAIN (tem))
>>  ...
>>   if (VAR_P (tem))
>> {
>>  ...
>>  }
>>   else
>> {
>>   stabstr_C (',');
>>   stabstr_D (int_bit_position (tem));
>>   stabstr_C (',');
>>   stabstr_D (tree_to_uhwi (DECL_SIZE (tem)));
>>   stabstr_C (';');
>> }
>>
>> where tem is a TEMPLATE_DECL.  The dbxout code currently skips
>> TYPE_DECL, nameless fields, and CONST_DECL.
>>
>> dbxout_type_methods() explicitly skips TEMPLATE_DECLs with the comment
>> "The debugger doesn't know what to do with such entities anyhow", so
>> this proposed patch skips them in dbxout_type_fields() as well.
>>
>> Okay?
>>
>> Thanks, David
>>
>>
>> PR debug/66419
>> PR c++/78235
>> * dbxout.c (dbxout_type_fields): Skip TEMPLATE_DECLs.
>
> From the looks of things, it appears we skip them in the dwarf2 code as
> well.  But I don't think we can use TEMPLATE_DECL here as that's defined by
> the C++ front end.

TEMPLATE_DECL is defined in cp/cp-tree.def, which is included in
all-tree.def, which is included in tree-core.h, which is included in
tree.h, which is included in dbxout.c.

It also is referenced in common code in gcc/tree.c.

> I think instead if you test something like:
>   (int)TREE_CODE (decl) > NUM_TREE_CODES
>
> You'll filter out any _DECL nodes coming out of the front-ends.

No other DECLs seem to escape.

- David


Re: [PATCH] avoid calling alloca(0)

2016-12-01 Thread Jeff Law

On 11/30/2016 09:09 PM, Martin Sebor wrote:

What I think this tells us is that we're not at a place where we're
clean.  But we can incrementally get there.  The warning is only
catching a fairly small subset of the cases AFAICT.  That's not unusual
and analyzing why it didn't trigger on those cases might be useful as
well.


The warning has no smarts.  It relies on constant propagation and
won't find a call unless it sees it's being made with a constant
zero.  Looking at the top two on the list the calls are in extern
functions not called from the same source file, so it probably just
doesn't see that the functions are being called from another file
with a zero.  Building GCC with LTO might perhaps help.
Right.  This is consistent with the limitations of other similar 
warnings such as null pointer dereferences.





So where does this leave us for gcc-7?  I'm wondering if we drop the
warning in, but not enable it by default anywhere.  We fix the cases we
can (such as reg-stack,c tree-ssa-threadedge.c, maybe others) before
stage3 closes, and shoot for the rest in gcc-8, including improvign the
warning (if there's something we can clearly improve), and enabling the
warning in -Wall or -Wextra.


I'm fine with deferring the GCC fixes and working on the cleanup
over time but I don't think that needs to gate enabling the option
with -Wextra.  The warnings can be suppressed or prevented from
causing errors during a GCC build either via a command line option
or by pragma in the code.  AFAICT, from the other warnings I see
go by, this is what has been done for -Wno-implicit-fallthrough
while those warnings are being cleaned up.  Why not take the same
approach here?
The difference vs implicit fallthrough is that new instances of implicit 
fallthrus aren't likely to be exposed by changes in IL that occur due to 
transformations in the optimizer pipeline.


Given the number of runtime triggers vs warnings, we know there's many 
instances of passing 0 to the allocators that we're not diagnosing. I 
can easily see differences in the early IL (such as those due to 
BRANCH_COST differing for targets) exposing/hiding cases where 0 flows 
into the allocator argument.  Similarly for changes in inlining 
decisions, jump threading, etc for profiled bootstraps.  I'd like to 
avoid playing wack-a-mole right now.


So I'm being a bit more conservative here.  Maybe it'd be appropriate in 
Wextra since that's not enabled by default for GCC builds.





As much as I would like to improve the warning itself I'm also not
sure I see much of an opportunity for it.  It's not prone to high
rates of false positives (hardly any in fact) and the cases it
misses are those where it simply doesn't see the argument value
because it's not been made available by constant propagation.
There's always ways :-)   For example, I wouldn't be at all surprised if 
you found PHIs that feed the allocation where one or more of the PHI 
arguments are 0.





That said, I consider the -Walloc-size-larger-than warning to be
the more important part of the patch by far.  I'd hate a lack of
consensus on how to deal with GCC's handful of instances of
alloca(0) to stall the rest of the patch.

Agreed on not wanting alloca(0) handling to stall the rest of the patch.

Jeff


Re: [Fortran, Patch, PR{43366, 57117, 61337, 61376}, v1] Assign to polymorphic objects.

2016-12-01 Thread Andre Vehreschild
Hi all,

I am sorry, but the initial mail as well as Dominique answer puzzles me:

David: I do expect to 

write (*,*) any 

not being compilable at all, because "any" is an intrinsic function and I
suppose that gfortran is not able to print it. At best it gives an address. So
am I right to assume that it should have been:

write (*,*) x

?

Which is a bit strange. Furthermore is it difficult for me to debug, because I
do not have access to an AIX machine. What address size does the machine have
32/48/64-bit? Is there a chance you send me the file that is generated
additionally by gfortran when called with -fdump-tree-original ? The file is
named alloc_comp_class_5.f03.003t.original usually.

Dominique: How did you get that? Do you have access to an AIX machine? What
kind of instrumentation was active in the compiler you mentioned?

- Andre

On Wed, 30 Nov 2016 21:51:30 +0100
Dominique d'Humières  wrote:

> If I compile the test with an instrumented  gfortran , I get 
> 
> ../../work/gcc/fortran/interface.c:2948:33: runtime error: load of value
> 1818451807, which is not a valid value for type ‘expr_t'
> 
> Dominique
> 
> > Le 30 nov. 2016 à 21:06, David Edelsohn  a écrit :
> > 
> > Hi, Andre
> > 
> > I have noticed that the alloc_comp_class_5.f03 testcase fails on AIX.
> > Annotating the testcase a little, shows that the failure is at
> > 
> >  if (any(x /= ["foo", "bar", "baz"])) call abort()
> > 
> > write (*,*) any
> > 
> > at the point of failure produces
> > 
> > "foobarba"
> > 
> > - David  
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread Jeff Law

On 12/01/2016 09:15 AM, David Edelsohn wrote:

A number of the "variant" testcases fail to build on AIX and targets
that use stabs.  The failure looks like:

/tmp/GCC/powerpc-ibm-aix7.2.0.0/libstdc++-v3/include/variant:956:
internal compiler error: tree check: expected field_decl, have
template_decl in int_bit_position, at tree.h:5396

which occurs in dbxout_type_fields()

  /* Output the name, type, position (in bits), size (in bits) of each
 field that we can support.  */
  for (tem = TYPE_FIELDS (type); tem; tem = DECL_CHAIN (tem))
 ...
  if (VAR_P (tem))
{
 ...
 }
  else
{
  stabstr_C (',');
  stabstr_D (int_bit_position (tem));
  stabstr_C (',');
  stabstr_D (tree_to_uhwi (DECL_SIZE (tem)));
  stabstr_C (';');
}

where tem is a TEMPLATE_DECL.  The dbxout code currently skips
TYPE_DECL, nameless fields, and CONST_DECL.

dbxout_type_methods() explicitly skips TEMPLATE_DECLs with the comment
"The debugger doesn't know what to do with such entities anyhow", so
this proposed patch skips them in dbxout_type_fields() as well.

Okay?

Thanks, David


PR debug/66419
PR c++/78235
* dbxout.c (dbxout_type_fields): Skip TEMPLATE_DECLs.
From the looks of things, it appears we skip them in the dwarf2 code as 
well.  But I don't think we can use TEMPLATE_DECL here as that's defined 
by the C++ front end.


I think instead if you test something like:
  (int)TREE_CODE (decl) > NUM_TREE_CODES

You'll filter out any _DECL nodes coming out of the front-ends.


jeff



Re: [PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset (pr 77608)

2016-12-01 Thread Martin Sebor

Sure - but then you maybe instead want to check for op being in
range [0, max-of-signed-type-of-op] instead?  So similar to
expr_not_equal_to add a expr_in_range helper?

Your function returns true for sizetype vars even if it might be
effectively signed, like for

 sizetype i_1 = -4;
 i_2 = i_1 + 1;

operand_unsigned_p (i) returns true.  I suppose you may have
meant

+static bool
+operand_unsigned_p (tree op)
+{
+  if (TREE_CODE (op) == SSA_NAME)
+{
+  gimple *def = SSA_NAME_DEF_STMT (op);
+  if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == NOP_EXPR
   && TYPE_PRECISION (TREE_TYPE (op)) > TYPE_PRECISION
(TREE_TYPE (gimple_assign_rhs1 (def
  return tree_expr_nonnegative_p (gimple_assign_rhs1 (def)));
+   }
+}
+
+  return false;
+}

?  because only if you do see a cast and that cast is widening from an
nonnegative number
the final one will be unsigned (if interpreted as signed number).


I don't think this is what I want.  Here's a test case that works
with my function but not with the suggested modification:

   char d[4];
   long f (unsigned long i)
   {
 return __builtin_object_size (d + i + 1, 0);
   }

Here, the size I'm looking for is (at most) 3 no matter what value
i has.  Am I missing a case where my function will do the wrong
thing?


You might want to use offset_ints here (see mem_ref_offset for example)


Okay, I'll see if I can switch to offset_int.





+ gimple *def = SSA_NAME_DEF_STMT (off);
+ if (is_gimple_assign (def))
+   {
+ tree_code code = gimple_assign_rhs_code (def);
+ if (code == PLUS_EXPR)
+   {
+ /* Handle offset in the form VAR + CST where VAR's type
+is unsigned so the offset must be the greater of
+OFFRANGE[0] and CST.  This assumes the PLUS_EXPR
+is in a canonical form with CST second.  */
+ tree rhs2 = gimple_assign_rhs2 (def);

err, what?  What about overflow?  Aren't you just trying to decompose
'off' into a variable and a constant part here and somehow extracting a
range for the variable part?  So why not just do that?



Sorry, what about overflow?

The purpose of this code is to handle cases of the form

   & PTR [range (MIN, MAX)] + CST


what if MAX + CST overflows?


The code doesn't look at MAX, only MIN is considered.  It extracts
both but only actually uses MAX to see if it's dealing with a range
or a constant.  Does that resolve your concern?


   char d[7];

   #define bos(p, t) __builtin_object_size (p, t)

   long f (unsigned i)
   {
 if (2 < i) i = 2;

 char *p = [i] + 3;

 return bos (p, 0);
   }


I'm sure that doesn't work as you match for PLUS_EXPR.


Sorry, I'm not sure what you mean.  The above evaluates to 4 with
the patch because i cannot be less than zero (otherwise [i] would
be invalid/undefined) so the type-0 size we want (the maximum) is
[7] - ([0] + 3) or 4.


Maybe simply ignore VR_ANTI_RANGEs for now then?


Yes, that makes sense.


The code above is based on the observation that an anti-range is
often used to represent the full subrange of a narrower signed type
like signed char (as ~[128, -129]).  I haven't been able to create
an anti-range like ~[5, 9]. When/how would a range like that come
about (so I can test it and implement the above correctly)?


if (a < 4)
  if (a > 8)
b = a;

then b should have ~[5, 9]


Right :)  I have figured out by know by know how to create an anti
range in general.  What I meant is that I haven't had luck creating
them in a way that the tree-object-size pass could see (I'm guessing
because EVRP doesn't understand relational expressions).  So given
this modified example from above:

char d[9];

#define bos(p, t) __builtin_object_size (p, t)

long f (unsigned a)
{
   unsigned b = 0;

   if (a < 4)
 if (a > 8)
   b = a;

   char *p = [b];
   return bos (p, 0);
}

The value ranges after Early VRP are:

_1: VARYING
b_2: VARYING
a_4(D): VARYING
p_6: ~[0B, 0B]
_8: VARYING

But with the removal of the anti-range code this will be a moot
point.


Maybe the poor range info i a consequence of the pass only benefiting
from EVRP and not VRP?


The range of 'p' is indeed not known (we only represent integer bound ranges).
You seem to want the range of p - [0] here, something that is not present
in the IL.


Yes, that's effectively what this patch does.  Approximate pointer
ranges.


It's just something I haven't had time to work on yet and with the
close of stage 1 approaching I wanted to put out this version for
review.  Do you view this enhancement as prerequisite for approving
the patch or is it something that you'd be fine with adding later?


I find the patch adds quite some ad-hoc ugliness to a pass that is
already complex and nearly impossible to understand.


I'm sorry it looks ugly to you.  I'm afraid I'm not yet familiar

Re: [Patch 0/2 PR78561] Recalculate constant pool size before emitting it

2016-12-01 Thread David Edelsohn
> James Greenhalgh writes:

> The patch set has been bootstrapped and tested on aarch64-none-linux-gnu and
> x86-64-none-linux-gnu without any issues. I've also cross-tested it for
> aarch64-none-elf and build-tested it for rs6000 (though I couldn't run the
> testsuite as I don't have a test environment).

There are PPC64 Linux and AIX systems in the GNU Compile Farm.  All
have DejaGNU installed.

- David


Re: [PATCH] Fix minor nits in gimple-ssa-sprintf.c (PR tree-optimization/78586)

2016-12-01 Thread Jeff Law

On 12/01/2016 12:51 AM, Jakub Jelinek wrote:

On Wed, Nov 30, 2016 at 06:14:14PM -0700, Martin Sebor wrote:

On 11/30/2016 12:01 PM, Jakub Jelinek wrote:

Hi!

This patch fixes some minor nits I've raised in the PR, more severe issues
left unresolved there.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


Thank you.  One comment below.


@@ -1059,7 +1048,12 @@ format_integer (const conversion_spec 
}

  if (code == NOP_EXPR)
-   argtype = TREE_TYPE (gimple_assign_rhs1 (def));
+   {
+ tree type = TREE_TYPE (gimple_assign_rhs1 (def));
+ if (TREE_CODE (type) == INTEGER_TYPE
+ || TREE_CODE (type) == POINTER_TYPE)
+   argtype = type;


As I replied in my comment #6 on the bug, I'm not sure I see what
is wrong with the original code, and I haven't been able to come
up with a test case that demonstrates a problem because of it with
any of the types you mentioned (bool, enum, or floating).


I think for floating we don't emit NOP_EXPR, but FIX_TRUNC_EXPR;
Correct.  The way to think about this stuff is whether or not the type 
change is *likely* to lead to code generated.  If the type change is not 
likely to generate code, then NOP_EXPR is appropriate.  Else a suitable 
conversion is necessary.  This distinction has long been something we'd 
like to fix, but haven't gotten around it it.


It is almost always the case that a floating point conversion will 
generate code.  So NOP_EXPR is not appropriate for a floating point 
conversion.


A NOP_EXPR will be used for a large number of integer conversions though.


perhaps bool/enum is fine, but in the UB case where one loads arbitrary
values into bool or enum the precision might be too small for those
(for enums only with -fstrict-enums).
Note that loading an out-of-range value into an enum object is, sadly, 
not UB as long as there's enough storage space in the object to hold the 
value.  That's why enums often don't constrain ranges.



Jeff


[PATCH, GCC/LRA] Fix PR78617: Fix conflict detection in rematerialization

2016-12-01 Thread Thomas Preudhomme

Hi,

When considering a candidate for rematerialization, LRA verifies if the 
candidate clobbers a live register before going forward with the 
rematerialization (see code starting with comment "Check clobbers do not kill 
something living."). To do this check, the set of live registers at any given 
instruction needs to be maintained. This is done by initializing the set of live 
registers when starting the forward scan of instruction in a basic block and 
updating the set by looking at REG_DEAD notes and destination register at the 
end of an iteration of the scan loop.


However the initialization suffers from 2 issues:

1) it is done from the live out set rather than live in (uses df_get_live_out 
(bb))
2) it ignores pseudo registers that have already been allocated a hard register 
(uses REG_SET_TO_HARD_REG_SET that only looks at hard register and does not look 
at reg_renumber for pseudo registers)


This patch changes the code to use df_get_live_in (bb) to initialize the 
live_hard_regs variable using a loop to check reg_renumber for pseudo registers. 
Please let me know if there is a macro to do that, I failed to find one.


ChangeLog entries are as follow:

gcc/testsuite/ChangeLog:

2016-12-01  Thomas Preud'homme  

PR rtl-optimization/78617
* gcc.c-torture/execute/pr78617.c: New test.


gcc/ChangeLog:

2016-12-01  Thomas Preud'homme  

PR rtl-optimization/78617
* lra-remat.c (do_remat): Initialize live_hard_regs from live in
registers, also setting hard registers mapped to pseudo registers.


Note however that as explained in the problem report, the testcase does not 
trigger the bug on GCC 7 due to better optimization before LRA rematerialization 
is reached.


Testing: testsuite shows no regression when run using:
 + an arm-none-eabi GCC cross-compiler targeting Cortex-M0 and Cortex-M3
 + a bootstrapped arm-linux-gnueabihf GCC native compiler
 + a bootstrapped x86_64-linux-gnu GCC native compiler

Is this ok for stage3?

Best regards,

Thomas
diff --git a/gcc/lra-remat.c b/gcc/lra-remat.c
index f01c6644c428fd9b5efdf6cc98788e5f6fadba62..cdd7057f602098d33ec3acfdaaac66556640bd82 100644
--- a/gcc/lra-remat.c
+++ b/gcc/lra-remat.c
@@ -1047,6 +1047,7 @@ update_scratch_ops (rtx_insn *remat_insn)
 static bool
 do_remat (void)
 {
+  unsigned regno;
   rtx_insn *insn;
   basic_block bb;
   bitmap_head avail_cands;
@@ -1054,12 +1055,21 @@ do_remat (void)
   bool changed_p = false;
   /* Living hard regs and hard registers of living pseudos.  */
   HARD_REG_SET live_hard_regs;
+  bitmap_iterator bi;
 
   bitmap_initialize (_cands, _obstack);
   bitmap_initialize (_cands, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  REG_SET_TO_HARD_REG_SET (live_hard_regs, df_get_live_out (bb));
+  CLEAR_HARD_REG_SET (live_hard_regs);
+  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), 0, regno, bi)
+	{
+	  int hard_regno = regno < FIRST_PSEUDO_REGISTER
+			   ? regno
+			   : reg_renumber[regno];
+	  if (hard_regno >= 0)
+	SET_HARD_REG_BIT (live_hard_regs, hard_regno);
+	}
   bitmap_and (_cands, _remat_bb_data (bb)->avin_cands,
 		  _remat_bb_data (bb)->livein_cands);
   /* Activating insns are always in the same block as their corresponding
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78617.c b/gcc/testsuite/gcc.c-torture/execute/pr78617.c
new file mode 100644
index ..89c4f6dea8cb507b963f91debb94cbe16eb1db90
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr78617.c
@@ -0,0 +1,25 @@
+int a = 0;
+int d = 1;
+int f = 1;
+
+int fn1() {
+  return a || 1 >> a;
+}
+
+int fn2(int p1, int p2) {
+  return p2 >= 2 ? p1 : p1 >> 1;
+}
+
+int fn3(int p1) {
+  return d ^ p1;
+}
+
+int fn4(int p1, int p2) {
+  return fn3(!d > fn2((f = fn1() - 1000) || p2, p1));
+}
+
+int main() {
+  if (fn4(0, 0) != 1)
+__builtin_abort ();
+  return 0;
+}


Re: [PATCH] Dump probability for edges a frequency for BBs

2016-12-01 Thread Jeff Law

On 12/01/2016 09:49 AM, Martin Sebor wrote:

Okay, thanks for the clarification.  One other question though.
Why would the probability be near zero?  In the absence of any
hints the expression 2 != sprintf(d, "%i", 12) should have
a very high probability of being true, near 100% in fact.

I ask because the test I referenced tries to verify the absence
of the sprintf return value optimization.  Without it the
likelihood of each of the calls to abort should in the EQL kind
of test cases like the one above should be nearly 100% (but not
quite).  When it's the opposite it suggests that the sprintf
optimization is providing some hint (in the form of a range of
return values) that changes the odds.  I have verified that
the optimization is not performed so something else must be
setting the probability or the value isn't correct.


I think I see what's going on.  It's the call to abort that GCC
uses for the probability (more precisely its attribute noreturn).
Right.  An edge which leads to a noreturn call is predicated as 
esentially "never taken".  It's a good strong predictor.




When I replace the abort with a function of my own that GCC knows
nothing about the probability goes up to just over 52%.  Still it
seems very low given that 2 is just one of UINT_MAX values the
function can possibly return.
Jan would be the one to know the predictors work.  It's reasonably 
complex stuff.

Jeff


Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

2016-12-01 Thread Jeff Law

On 12/01/2016 05:04 AM, Segher Boessenkool wrote:

On Thu, Dec 01, 2016 at 10:19:42AM +0100, Richard Biener wrote:

Thinking about this again maybe targets w/o insn-length should simply
always use the 'simple' algorithm instead of the STV one?  At least that
might be what your change effectively does in some way?


From reading the comments I don't think STC will collapse down into the
simple algorithm if block copying is disabled.  But Segher would know for
sure.

WRT the choice of simple vs STC, I doubt it matters much for the processors
in question.


I guess STC doesn't make much sense if we can't say anything about BB sizes.


STC tries to make as large as possible consecutive "traces", mainly to
help with instruction cache utilization and hit rate etc.  It cannot do
a very good job if it isn't allowed to copy blocks.

"simple" tries to (dynamically) have as many fall-throughs as possible,
i.e. as few jumps as possible.  It never copies code; if that means it
has to jump every second insn, so be it.  It provably is within a factor
three of optimal (optimal is NP-hard), under a really weak assumption
within a factor two, and it does better than that in practice.

STC without block copying makes longer traces which is not a good idea
for most architectures, only for those that have a short jump that is
much shorter than longer jumps (and thus, cannot cover many jump
targets).

I do not know how STC behaves when it does not know the insn lengths.
mn103 &  m68k are definitely sensitive to jump distances, the former 
more so than the latter.  Some of the others probably are as well.


I think we've probably discussed this more than is really necessary.  We 
just need to pick an alternative and go with it, I think either 
alternative is reasonable (avoid copying when STC has no length 
information or fall back to simple when there is no length information).




jeff


Re: [PATCH 2/6][ARM] Move CRC builtins to refactored framework

2016-12-01 Thread Andre Vieira (lists)
On 09/11/16 10:11, Andre Vieira (lists) wrote:
> Hi,
> 
> This patch refactors the implementation of the ARM ACLE CRC builtins to
> use the builtin framework.
> 
> Is this OK for trunk?
> 
> Regards,
> Andre
> 
> gcc/ChangeLog
> 2016-11-09  Andre Vieira  
> 
>   * config/arm/arm-builtins.c (arm_unsigned_binop_qualifiers): New.
>   (UBINOP_QUALIFIERS): New.
>   (si_UP): Define.
>   (acle_builtin_data): New. Change comment.
>   (arm_builtins): Remove ARM_BUILTIN_CRC32B, ARM_BUILTIN_CRC32H,
>   ARM_BUILTIN_CRC32W, ARM_BUILTIN_CRC32CB, ARM_BUILTIN_CRC32CH,
>   ARM_BUILTIN_CRC32CW. Add ARM_BUILTIN_ACLE_BASE and include
>   arm_acle_builtins.def.
>   (ARM_BUILTIN_ACLE_PATTERN_START): Define.
>   (arm_init_acle_builtins): New.
>   (CRC32_BUILTIN): Remove.
>   (bdesc_2arg): Remove entries for crc32b, crc32h, crc32w,
>   crc32cb, crc32ch and crc32cw.
>   (arm_init_crc32_builtins): Remove.
>   (arm_init_builtins): Use arm_init_acle_builtins rather
>   than arm_init_crc32_builtins.
>   (arm_expand_acle_builtin): New.
>   (arm_expand_builtin): Use 'arm_expand_acle_builtin'.
>   * config/arm/arm_acle_builtins.def: New.
> 
Hi,

Reworked this patch based on the changes made in [1/6]. No changes to
ChangeLog.

Is this OK?

Cheers,
Andre
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
da6331fdc729461adeb81d84c0c425bc45b80b8c..b47e255c962239a73b62f5743273d12f07bb237b
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -157,6 +157,13 @@ arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_none, qualifier_struct_load_store_lane_index };
 #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers)
 
+/* unsigned T (unsigned T, unsigned T, unsigned T).  */
+static enum arm_type_qualifiers
+arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+  qualifier_unsigned };
+#define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
which we represent with qualifier_void.  Their first operand will be
a DImode pointer to the location to store to, so we must use
@@ -242,17 +249,16 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The builtin data can be found in arm_neon_builtins.def,
-   arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
-   TARGET_NEON to be true.  The feature tests are checked when the
-   builtins are expanded.
+/* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def
+   and arm_acle_builtins.def.  The entries in arm_neon_builtins.def require
+   TARGET_NEON to be true.  The feature tests are checked when the builtins are
+   expanded.
 
-   The mode entries in the following table correspond to the "key"
-   type of the instruction variant, i.e. equivalent to that which
-   would be specified after the assembler mnemonic, which usually
-   refers to the last vector operand.  The modes listed per
-   instruction should be the same as those defined for that
-   instruction's pattern, for instance in neon.md.  */
+   The mode entries in the following table correspond to the "key" type of the
+   instruction variant, i.e. equivalent to that which would be specified after
+   the assembler mnemonic for neon instructions, which usually refers to the
+   last vector operand.  The modes listed per instruction should be the same as
+   those defined for that instruction's pattern, for instance in neon.md.  */
 
 static arm_builtin_datum vfp_builtin_data[] =
 {
@@ -266,6 +272,15 @@ static arm_builtin_datum neon_builtin_data[] =
 
 #undef CF
 #undef VAR1
+#define VAR1(T, N, A) \
+  {#N, UP (A), CODE_FOR_##N, 0, T##_QUALIFIERS},
+
+static arm_builtin_datum acle_builtin_data[] =
+{
+#include "arm_acle_builtins.def"
+};
+
+#undef VAR1
 
 #define VAR1(T, N, X) \
   ARM_BUILTIN_NEON_##N##X,
@@ -518,13 +533,6 @@ enum arm_builtins
 
   ARM_BUILTIN_WMERGE,
 
-  ARM_BUILTIN_CRC32B,
-  ARM_BUILTIN_CRC32H,
-  ARM_BUILTIN_CRC32W,
-  ARM_BUILTIN_CRC32CB,
-  ARM_BUILTIN_CRC32CH,
-  ARM_BUILTIN_CRC32CW,
-
   ARM_BUILTIN_GET_FPSCR,
   ARM_BUILTIN_SET_FPSCR,
 
@@ -556,6 +564,14 @@ enum arm_builtins
 
 #include "arm_neon_builtins.def"
 
+#undef VAR1
+#define VAR1(T, N, X) \
+  ARM_BUILTIN_##N,
+
+  ARM_BUILTIN_ACLE_BASE,
+
+#include "arm_acle_builtins.def"
+
   ARM_BUILTIN_MAX
 };
 
@@ -565,6 +581,9 @@ enum arm_builtins
 #define ARM_BUILTIN_NEON_PATTERN_START \
   (ARM_BUILTIN_NEON_BASE + 1)
 
+#define ARM_BUILTIN_ACLE_PATTERN_START \
+  (ARM_BUILTIN_ACLE_BASE + 1)
+
 #undef CF
 #undef VAR1
 #undef VAR2
@@ -1013,7 +1032,7 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum 
*d,
   gcc_assert (ftype != NULL);
 
   if (print_type_signature_p
-  && IN_RANGE (fcode, ARM_BUILTIN_VFP_BASE, ARM_BUILTIN_MAX - 1))
+  && IN_RANGE (fcode, ARM_BUILTIN_VFP_BASE, ARM_BUILTIN_ACLE_MAX - 1))
 snprintf 

Re: [PATCH 1/6][ARM] Refactor NEON builtin framework to work for other builtins

2016-12-01 Thread Andre Vieira (lists)
On 17/11/16 10:42, Kyrill Tkachov wrote:
> Hi Andre,
> 
> On 09/11/16 10:11, Andre Vieira (lists) wrote:
>> Hi,
>>
>> Refactor NEON builtin framework such that it can be used to implement
>> other builtins.
>>
>> Is this OK for trunk?
>>
>> Regards,
>> Andre
>>
>> gcc/ChangeLog
>> 2016-11-09  Andre Vieira  
>>
>>  * config/arm/arm-builtins.c (neon_builtin_datum): Rename to ..
>>  (arm_builtin_datum): ... this.
>>  (arm_init_neon_builtin): Rename to ...
>>  (arm_init_builtin): ... this. Add a new parameters PREFIX
>>  and USE_SIG_IN_NAME.
>>  (arm_init_neon_builtins): Replace 'arm_init_neon_builtin' with
>>  'arm_init_builtin'. Replace type 'neon_builtin_datum' with
>>  'arm_builtin_datum'.
>>  (arm_init_vfp_builtins): Likewise.
>>  (builtin_arg): Rename enum's replacing 'NEON_ARG' with
>>  'ARG_BUILTIN' and add a 'ARG_BUILTIN_NEON_MEMORY.
>>  (arm_expand_neon_args): Rename to ...
>>  (arm_expand_builtin_args): ... this. Rename builtin_arg
>>  enum values and differentiate between ARG_BUILTIN_MEMORY
>>  and ARG_BUILTIN_NEON_MEMORY.
>>  (arm_expand_neon_builtin_1): Rename to ...
>>  (arm_expand_builtin_1): ... this. Rename builtin_arg enum
>>  values, arm_expand_builtin_args and add bool parameter NEON.
>>  (arm_expand_neon_builtin): Use arm_expand_builtin_1.
>>  (arm_expand_vfp_builtin): Likewise.
>>  (NEON_MAX_BUILTIN_ARGS): Remove, it was unused.
> 
>  /* Expand a neon builtin.  This is also used for vfp builtins, which
> behave in
> the same way.  These builtins are "special" because they don't have
> symbolic
> constants defined per-instruction or per instruction-variant. 
> Instead, the
> -   required info is looked up in the NEON_BUILTIN_DATA record that is
> passed
> +   required info is looked up in the ARM_BUILTIN_DATA record that is
> passed
> into the function.  */
>  
> 
> The comment should be updated now that it's not just NEON builtins that
> are expanded through this function.
> 
>  static rtx
> -arm_expand_neon_builtin_1 (int fcode, tree exp, rtx target,
> -   neon_builtin_datum *d)
> +arm_expand_builtin_1 (int fcode, tree exp, rtx target,
> +   arm_builtin_datum *d, bool neon)
>  {
> 
> I'm not a fan of this 'neon' boolean as it can cause confusion among the
> users of the function
> (see long thread at https://gcc.gnu.org/ml/gcc/2016-10/msg4.html).
> Whether the builtin is a NEON/VFP builtin
> can be distinguished from FCODE, so lets just make that bool neon a
> local variable and initialise it accordingly
> from FCODE.
> 
> Same for:
> +/* Set up a builtin.  It will use information stored in the argument
> struct D to
> +   derive the builtin's type signature and name.  It will append the
> name in D
> +   to the PREFIX passed and use these to create a builtin declaration
> that is
> +   then stored in 'arm_builtin_decls' under index FCODE.  This FCODE is
> also
> +   written back to D for future use.  If USE_SIG_IN_NAME is true the
> builtin's
> +   name is appended with type signature information to distinguish between
> +   signedness and poly.  */
>  
>  static void
> -arm_init_neon_builtin (unsigned int fcode,
> -   neon_builtin_datum *d)
> +arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
> +  const char * prefix, bool use_sig_in_name)
> 
> use_sig_in_name is dependent on FCODE so just deduce it from that
> locally in arm_init_builtin.
> 
> This is ok otherwise.
> Thanks,
> Kyrill
> 
> 

Hi,

Reworked patch according to comments. No changes to ChangeLog.

Is this OK?

Cheers,
Andre
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
5ed38d1608cfbfbd1248d76705fcf675bc36c2b2..da6331fdc729461adeb81d84c0c425bc45b80b8c
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -202,7 +202,7 @@ typedef struct {
   const enum insn_code code;
   unsigned int fcode;
   enum arm_type_qualifiers *qualifiers;
-} neon_builtin_datum;
+} arm_builtin_datum;
 
 #define CF(N,X) CODE_FOR_neon_##N##X
 
@@ -242,7 +242,7 @@ typedef struct {
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
 
-/* The NEON builtin data can be found in arm_neon_builtins.def and
+/* The builtin data can be found in arm_neon_builtins.def,
arm_vfp_builtins.def.  The entries in arm_neon_builtins.def require
TARGET_NEON to be true.  The feature tests are checked when the
builtins are expanded.
@@ -252,14 +252,14 @@ typedef struct {
would be specified after the assembler mnemonic, which usually
refers to the last vector operand.  The modes listed per
instruction should be the same as those defined for that
-   instruction's pattern in neon.md.  */
+   instruction's pattern, for instance in neon.md.  */
 
-static neon_builtin_datum vfp_builtin_data[] =
+static arm_builtin_datum vfp_builtin_data[] =
 {
 #include "arm_vfp_builtins.def"
 };
 
-static 

Re: [patch,avr] Clean up n_flash field from MCU information.

2016-12-01 Thread Denis Chertykov
2016-12-01 12:26 GMT+03:00 Georg-Johann Lay :
> The introduction of the flash_size field in avr_mcu_t rendered the n_flash
> field redundant.  This patch computes the value of n_flash as needed from
> flash_size and cleans up n_flash.
>
> Ok for trunk?
>
> Johann
>
> gcc/
> * config/avr/avr-arch.h (avr_mcu_t) [n_flash]: Remove field.
> * config/avr/avr-devices.c (AVR_MCU): Remove N_FLASH macro argument.
> * config/avr/avr-mcus.def (AVR_MCU): Remove initializer for n_flash.
> * config/avr/avr.c (avr_set_core_architecture) [avr_n_flash]: Use
> avr_mcu_types.flash_size to compute default value.
> * config/avr/gen-avr-mmcu-specs.c (print_mcu) [cc1_n_flash]: Use
> mcu->flash_size to compute value for spec.
>
>

Approved.


Re: [PATCH] Dump probability for edges a frequency for BBs

2016-12-01 Thread Martin Sebor

Okay, thanks for the clarification.  One other question though.
Why would the probability be near zero?  In the absence of any
hints the expression 2 != sprintf(d, "%i", 12) should have
a very high probability of being true, near 100% in fact.

I ask because the test I referenced tries to verify the absence
of the sprintf return value optimization.  Without it the
likelihood of each of the calls to abort should in the EQL kind
of test cases like the one above should be nearly 100% (but not
quite).  When it's the opposite it suggests that the sprintf
optimization is providing some hint (in the form of a range of
return values) that changes the odds.  I have verified that
the optimization is not performed so something else must be
setting the probability or the value isn't correct.


I think I see what's going on.  It's the call to abort that GCC
uses for the probability (more precisely its attribute noreturn).
When I replace the abort with a function of my own that GCC knows
nothing about the probability goes up to just over 52%.  Still it
seems very low given that 2 is just one of UINT_MAX values the
function can possibly return.

Martin

$ cat a.c && gcc -O2 -S -w -fdump-tree-optimized=/dev/stdout a.c
void foo (void);

void bar (void)
{
  char d [2];
  if (2 != __builtin_sprintf (d, "%i", 12))
foo ();
}

;; Function bar (bar, funcdef_no=0, decl_uid=1797, cgraph_uid=0, 
symbol_order=0)


Removing basic block 5
bar ()
{
  char d[2];
  int _1;

   [100.0%]:
  _1 = __builtin_sprintf (, "%i", 12);
  if (_1 != 2)
goto ; [52.9%]
  else
goto ; [47.1%]

   [52.9%]:
  foo ();

   [100.0%]:
  d ={v} {CLOBBER};
  return;

}




Re: [patch,avr] Document how to avoid progmem on AVR_TINY.

2016-12-01 Thread Denis Chertykov
2016-12-01 17:28 GMT+03:00 Georg-Johann Lay :
> This adds to the documentation a hint how to set up a linker description
> file that avoids progmem altogether any without the usual overhead of
> locating read-only data in RAM.  The proposed linker description file is
> completely transparent to the compiler, and no start-up code has to be
> adjusted.
>
> IIUC there are currently no plans to fix this in the default linker
> description file avrtiny.x, cf. http://sourceware.org/PR20849
>
> Also, link between -mabsdata option and absdata variable attribute.
>
> Ok for trunk?
>
>
> Johann
>
>
> gcc/
> * doc/invoke.texi (AVR Options) [-mabsdata]: Point to absdata.
> * doc/extend.texi (AVR Variable Attributes) [progmem]: Hint
> about linker description to avoid progmem altogether.
> [absdata]: Point to -mabsdata option.
>

Approved.


Re: [patch,testsuite,avr]: Filter-out -mmcu= from options for tests that set -mmcu=

2016-12-01 Thread Mike Stump
On Dec 1, 2016, at 3:54 AM, Georg-Johann Lay  wrote:
> 
> This patch moves the compile tests that have a hard coded -mmcu=MCU in their 
> dg-options to a new folder.
> 
> The exp driver filters out -mmcu= from the command line options that are 
> provided by, say, board description files or --tool-opts.
> 
> This is needed because otherwise conflicting -mmcu= will FAIL respective test 
> cases because of "specified option '-mmcu' more than once" errors from 
> avr-gcc.
> 
> Ok for trunk?

So, it would be nice if different ports can use roughly similar schemes to 
handle the same problems.  I think arm is one of the more complex ports at this 
point in this regard with a lot of people and a lot of years time to 
contemplate and implement solutions to the problem.  They in particular don't 
have to move test cases around to handle the difference like this, I think it 
would be best to avoid that requirement if possible.

Glancing around, two starting points for how the arm achieves what it does:

  lappend dg_runtest_extra_prunes "warning: switch -m(cpu|arch)=.* conflicts 
with -m(cpu|arch)=.* switch"

in arm.exp, and they use something like:

/* { dg-require-effective-target arm_crypto_ok } */ 
|crypto-vsha256hq_u32.c:2:/* { dg-require-effective-target 
arm_crypto_ok } */
/* { dg-add-options arm_crypto } */ 
|crypto-vsha256su0q_u32.c:2:/* { 
dg-require-effective-target arm_crypto_ok } */

to validate the requirements of the test case, and to ensure that optional 
things are selected.  Nice, simple, extensible, handles multilibs, dejagnu 
arguments and different cpu defaults as I recall.

You won't need all the hair the arm folks have, but if you stub in support in 
that direction, you then have simple, easy expansion room to match all 
complexities that the arm folks have already hit and solved.



Re: [PATCH] Dump probability for edges a frequency for BBs

2016-12-01 Thread Martin Sebor

On 12/01/2016 02:48 AM, Martin Liška wrote:

On 11/30/2016 11:46 PM, Martin Sebor wrote:

On 11/24/2016 05:59 AM, Martin Liška wrote:

On 11/24/2016 09:29 AM, Richard Biener wrote:

Please guard with ! TDF_GIMPLE, otherwise the output will not be
parseable
with the GIMPLE FE.

RIchard.


Done and verified that and it provides equal dumps for -fdump*-gimple.
Installed as r242837.


Hi Martin,

I'm trying to understand how to interpret the probabilities (to
make sure one of my tests, builtin-sprintf-2.c, is testing what
it's supposed to be testing).

With this example:

  char d2[2];

  void f (void)
  {
if (2 != __builtin_sprintf (d2, "%i", 12))
  __builtin_abort ();
  }

the probability of the branch to abort is 0%:

  f1 ()
  {
int _1;

 [100.0%]:
_1 = __builtin_sprintf (, "%i", 12);
if (_1 != 2)
  goto ; [0.0%]
else
  goto ; [100.0%]

 [0.0%]:
__builtin_abort ();

 [100.0%]:
return;
  }


Hello Martin.

Looks I did a small error. I use only only one digit after decimal
point, which is unable to
display noreturn predictor (defined as PROB_VERY_UNLIKELY):

#define PROB_VERY_UNLIKELY(REG_BR_PROB_BASE / 2000 - 1) // this is 4

I would suggest to use following patch to display at least 2 digits,
that would distinguish
between real zero and PROB_VERY_UNLIKELY:

x.c.046t.profile_estimate:

f ()
{
  int _1;

   [100.00%]:
  _1 = __builtin_sprintf (, "%i", 12);
  if (_1 != 2)
goto ; [0.04%]
  else
goto ; [99.96%]

   [0.04%]:
  __builtin_abort ();

   [99.96%]:
  return;

}



Yet the call to abort is in the assembly so I would expect its
probability to be more than zero.  So my question is: it it safe
to be testing for calls to abort in the optimized dump as a way
of verifying that the call has not been eliminated from the program
regardless of their probabilities?


I think so, otherwise the call would be removed.


Okay, thanks for the clarification.  One other question though.
Why would the probability be near zero?  In the absence of any
hints the expression 2 != sprintf(d, "%i", 12) should have
a very high probability of being true, near 100% in fact.

I ask because the test I referenced tries to verify the absence
of the sprintf return value optimization.  Without it the
likelihood of each of the calls to abort should in the EQL kind
of test cases like the one above should be nearly 100% (but not
quite).  When it's the opposite it suggests that the sprintf
optimization is providing some hint (in the form of a range of
return values) that changes the odds.  I have verified that
the optimization is not performed so something else must be
setting the probability or the value isn't correct.

Martin



I'm going to test the patch (and eventually update scanned patterns).

Martin

Patch candidate:

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index b5e866d..de57e89 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -72,12 +72,17 @@ debug_gimple_stmt (gimple *gs)
   print_gimple_stmt (stderr, gs, 0, TDF_VOPS|TDF_MEMSYMS);
 }

+/* Print format used for displaying probability of an edge or frequency
+   of a basic block.  */
+
+#define PROBABILITY_FORMAT "[%.2f%%]"
+
 /* Dump E probability to BUFFER.  */

 static void
 dump_edge_probability (pretty_printer *buffer, edge e)
 {
-  pp_scalar (buffer, " [%.1f%%]",
+  pp_scalar (buffer, " " PROBABILITY_FORMAT,
  e->probability * 100.0 / REG_BR_PROB_BASE);
 }

@@ -1023,7 +1028,7 @@ dump_gimple_label (pretty_printer *buffer, glabel
*gs, int spc, int flags)
   dump_generic_node (buffer, label, spc, flags, false);
   basic_block bb = gimple_bb (gs);
   if (bb && !(flags & TDF_GIMPLE))
-pp_scalar (buffer, " [%.1f%%]",
+pp_scalar (buffer, " " PROBABILITY_FORMAT,
bb->frequency * 100.0 / REG_BR_PROB_BASE);
   pp_colon (buffer);
 }
@@ -2590,7 +2595,8 @@ dump_gimple_bb_header (FILE *outf, basic_block bb,
int indent, int flags)
   if (flags & TDF_GIMPLE)
 fprintf (outf, "%*sbb_%d:\n", indent, "", bb->index);
   else
-fprintf (outf, "%*s [%.1f%%]:\n", indent, "", bb->index,
+fprintf (outf, "%*s " PROBABILITY_FORMAT ":\n",
+ indent, "", bb->index,
  bb->frequency * 100.0 / REG_BR_PROB_BASE);
 }
 }



For reference, the directive the test uses since this change was
committed looks like this:

{ dg-final { scan-tree-dump-times "> \\\[\[0-9.\]+%\\\]:\n
*__builtin_abort" 114 "optimized" }

If I'm reading the heavily escaped regex right it matches any
percentage, even 0.0% (and the test passes).

Thanks
Martin






Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-12-01 Thread Martin Liška
On 11/23/2016 03:13 PM, Jakub Jelinek wrote:
> On Wed, Nov 23, 2016 at 02:57:07PM +0100, Martin Liška wrote:
>> I started review process in libsanitizer: https://reviews.llvm.org/D26965
>> And I have a question that was asked in the review: can we distinguish 
>> between load and store
>> in case of having usage of ASAN_POISON?
> 
> I think with ASAN_POISON it is indeed just loads from after scope that can
> be caught, a store overwrites the variable with a new value and when turning
> the store after we make the var no longer addressable into SSA form, we
> loose information about the out of scope store.  Furthermore, if there is
> first a store and then a read, like:
>   if (argc != 12312)
> {
>   char my_char;
>   ptr = _char;
> }
>   *ptr = i + 26;
>   return *ptr;
> we don't notice even the read.  Not sure what could be done against that
> though.  I think we'd need to hook into the into-ssa framework, there it
> should know the current value of the variable at the point of the store is
> result of ASAN_POISON and be able to instead of turning that
>   my_char = _23;
> into
>   my_char_35 = _23;
> turn it into:
>   my_char_35 = ASAN_POISON (_23);
> which would represent after scope store into my_char.
> 
> Not really familiar with into-ssa though to know where to do it.
> 
>   Jakub
> 

Richi, may I ask you for help with this question?

Thanks,
Martin



Re: [PATCH 0/2] S/390: New patterns for extzv, risbg and r[ox]sbg.

2016-12-01 Thread Dominik Vogt
On Thu, Dec 01, 2016 at 05:26:16PM +0100, Dominik Vogt wrote:
> The following patch series adds some patterns for enhanced use of
> the r[ixo]sbg instructions on S/390.
> 
>  - 0001-* fixes some test regressions with the existing risbg
>patterns that are broken because of recent trunkt changes.
> 
>  - 0002-* adds new patterns for the r[xo]sbg instructions and an
>SI mode variant of "extzv".
> 
> For details, please chech the commit comments of the patches.  All
> patches have been bootstrapped on s390x biarch and regression
> tested on s390x biarch and s390.

r[xo]sbg patch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.md ("extzv"): Allow GPR mode and rename
expander from "extzv" to "extzv".
* ("*extzvdisi"): New zero_extract pattern.
* ("*__ior_and_ze"): New pattern.
with a plain (zero_extract:SI).  Allow GPR mode.
* ("*extract1bit")
("*extract1bitdi"): Rename pattern and switch to GPR
mode.
* ("*rsbg__ze"): New pattern.
gcc/testsuite/ChangeLog

* gcc.target/s390/risbg-ll-1.c (f1, f2, f23, f34, f35, f41): Updated
tests.
* (g1, g2): New tests.
* gcc.target/s390/risbg-ll-2.c (f3, f4): Updated tests.
* gcc.target/s390/risbg-ll-3.c (g1, g2): New tests.
* gcc.target/s390/rosbg-1.c: Add tests for rosbg and rxsbg.
>From 9874c8afb7a61fb98af5b302df9866d25df16b30 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 17 Oct 2016 10:06:16 +0100
Subject: [PATCH 2/2] S/390: New patterns for extzv, risbg and r[ox]sbg.

The new extzv-patterns are necessary for the new r[ox]sbg patterns.
The new risbg patterns are necessary for the new etzv patterns.
---
 gcc/config/s390/s390.md| 74 +--
 gcc/testsuite/gcc.target/s390/risbg-ll-1.c | 36 +++---
 gcc/testsuite/gcc.target/s390/risbg-ll-2.c |  6 +--
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c |  2 +
 gcc/testsuite/gcc.target/s390/rosbg-1.c| 80 ++
 5 files changed, 175 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/rosbg-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 43b9371..15f0a41 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3728,26 +3728,24 @@
 ; extv instruction patterns
 ;
 
-; FIXME: This expander needs to be converted from DI to GPR as well
-; after resolving some issues with it.
-
-(define_expand "extzv"
+(define_expand "extzv"
   [(parallel
-[(set (match_operand:DI 0 "register_operand" "=d")
-(zero_extract:DI
- (match_operand:DI 1 "register_operand" "d")
+[(set (match_operand:GPR 0 "register_operand" "=d")
+(zero_extract:GPR
+ (match_operand:GPR 1 "register_operand" "d")
  (match_operand 2 "const_int_operand" "")   ; size
  (match_operand 3 "const_int_operand" ""))) ; start
  (clobber (reg:CC CC_REGNUM))])]
   "TARGET_Z10"
 {
-  if (! EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 64))
+  if (! EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]),
+  GET_MODE_BITSIZE (mode)))
 FAIL;
   /* Starting with zEC12 there is risbgn not clobbering CC.  */
   if (TARGET_ZEC12)
 {
   emit_move_insn (operands[0],
-gen_rtx_ZERO_EXTRACT (DImode,
+gen_rtx_ZERO_EXTRACT (mode,
   operands[1],
   operands[2],
   operands[3]));
@@ -3787,6 +3785,19 @@
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
 
+(define_insn "*extzvdisi"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+  (zero_extract:DI
+(match_operand:SI 1 "register_operand" "d")
+(match_operand 2 "const_int_operand" "")   ; size
+(match_operand 3 "const_int_operand" ""))) ; start
+  ]
+  "
+   && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 32)"
+  "\t%0,%1,64-%2,128+63,32+%3+%2" ; dst, src, start, end, shift
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
 ; 64 bit: (a & -16) | ((b >> 8) & 15)
 (define_insn "*extzvdi_lshiftrt"
   [(set (zero_extract:DI (match_operand:DI 0 "register_operand" "+d")
@@ -3820,17 +3831,36 @@
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
 
+(define_insn "*__ior_and_ze"
+  [(set (match_operand:GPR 0 "register_operand" "=d")
+   (ior:GPR (and:GPR
+(match_operand:GPR 1 "register_operand" "0")
+(match_operand:GPR 2 "const_int_operand" ""))
+   (zero_extract:GPR
+(match_operand:GPR 3 "register_operand" "d")
+(match_operand 4 "const_int_operand" "") ; size
+(match_operand 5 "const_int_operand" "")) ; start
+   ))]
+  

Re: [PATCH 1/2] S/390: New patterns for extzv, risbg and r[ox]sbg.

2016-12-01 Thread Dominik Vogt
On Thu, Dec 01, 2016 at 05:26:16PM +0100, Dominik Vogt wrote:
> The following patch series adds some patterns for enhanced use of
> the r[ixo]sbg instructions on S/390.
> 
>  - 0001-* fixes some test regressions with the existing risbg
>patterns that are broken because of recent trunkt changes.
> 
>  - 0002-* adds new patterns for the r[xo]sbg instructions and an
>SI mode variant of "extzv".
> 
> For details, please chech the commit comments of the patches.  All
> patches have been bootstrapped on s390x biarch and regression
> tested on s390x biarch and s390.

Risbg patch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog-fix-risbg-tests

* config/s390/s390.md ("*trunc_sidi_and_subreg_ze")
("*extzvdi_top"): New patterns.
gcc/testsuite/ChangeLog-fix-risbg-tests

* gcc.target/s390/risbg-ll-1.c (f43, f44): Adapt regexps.
* gcc.target/s390/risbg-ll-2.c (f9): Ditto.
>From 59c63c47602d0f32948758b8ce9f36d55b8f8f39 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 25 Nov 2016 10:33:03 +0100
Subject: [PATCH 1/2] S/390: Fix risbg pattern tests.

With r242812 combine generates zero_extract instead of lshiftrt in some case
The test cases are updated to reflect this, but the pattern
"*trunc_sidi_and_subreg_lshrt" is not used anymore.  Add
new pattern "*trunc_sidi_and_subreg_ze" to take over the
one's work.

Add a variant "*extzv_top" that deals with zero_extracts that can be
expressed as a simple right shift, which has the advantage of not clobbering
the condition code.
---
 gcc/config/s390/s390.md| 33 ++
 gcc/testsuite/gcc.target/s390/risbg-ll-1.c |  8 
 gcc/testsuite/gcc.target/s390/risbg-ll-2.c |  2 +-
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index aaf8427..43b9371 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3755,6 +3755,24 @@
 }
 })
 
+; Special case where zero_extract can be written as a right-shift.
+(define_insn_and_split "*extzvdi_top"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+   (zero_extract:DI
+(match_operand:DI 1 "register_operand" "d")
+(match_operand 2 "const_int_operand" "") ; size
+(const_int 0))) ; start
+  ]
+  "EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), 0, 64)"
+  "#"
+  ""
+  [(set (match_dup 0)
+   (lshiftrt:DI (match_dup 1) (match_dup 2)))]
+{
+  operands[2] = GEN_INT (64 - INTVAL (operands[2]));
+})
+
+; In all other cases try risbg.
 (define_insn "*extzv"
   [(set (match_operand:GPR 0 "register_operand" "=d")
   (zero_extract:GPR
@@ -4045,6 +4063,21 @@
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
 
+(define_insn "*trunc_sidi_and_subreg_ze"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+   (and:SI
+(subreg:SI (zero_extract:DI
+(match_operand:DI 1 "register_operand" "d")
+(match_operand 2 "const_int_operand" "")  ; size
+(match_operand 3 "const_int_operand" "")) ; pos
+   4)
+(match_operand:SI 4 "contiguous_bitmask_nowrap_operand" "")))]
+  "
+   && EXTRACT_ARGS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 64)"
+  "\t%0,%1,%t4,128+%f4,%2+%3"
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
 ; z = (x << c) | (y >> d) with (x << c) and (y >> d) not overlapping after 
shifting
 ;  -> z = y >> d; z = (x << c) | (z & ((1 << c) - 1))
 ;  -> z = y >> d; z = risbg;
diff --git a/gcc/testsuite/gcc.target/s390/risbg-ll-1.c 
b/gcc/testsuite/gcc.target/s390/risbg-ll-1.c
index 30350d0..17a9000 100644
--- a/gcc/testsuite/gcc.target/s390/risbg-ll-1.c
+++ b/gcc/testsuite/gcc.target/s390/risbg-ll-1.c
@@ -478,8 +478,8 @@ i64 f42 (t42 v_x)
 // Check that we get the case where a 64-bit shift is used by a 32-bit and.
 i32 f43 (i64 v_x)
 {
-  /* { dg-final { scan-assembler "f43:\n\trisbg\t%r2,%r2,32,128\\\+61,64-12" { 
target { lp64 } } } } */
-  /* { dg-final { scan-assembler 
"f43:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r2,%r3,32,128\\\+61,64-12"
 { target { ! lp64 } } } } */
+  /* { dg-final { scan-assembler 
"f43:\n\trisbg\t%r2,%r2,32,128\\\+61,32\\\+20" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler 
"f43:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r2,%r3,32,128\\\+61,32\\\+20"
 { target { ! lp64 } } } } */
   i64 v_shr3 = ((ui64)v_x) >> 12;
   i32 v_shr3_tr = (ui32)v_shr3;
   i32 v_conv = v_shr3_tr & -4;
@@ -489,8 +489,8 @@ i32 f43 (i64 v_x)
 // Check that we don't get the case where the 32-bit and mask is not contiguous
 i32 f44 (i64 v_x)
 {
-  /* { dg-final { scan-assembler "f44:\n\tsrlg\t%r2,%r2,12" { target { lp64 } 
} } } */
-  /* { dg-final { scan-assembler "f44:\n\tsrlg\t%r2,%r3,12\n\tnilf\t%r2,10" { 
target { ! lp64 } } } } */
+  /* { dg-final { scan-assembler "f44:\n\(\t.*\n\)*\tngr\t" { target { lp64 } 
} } } */
+ 

[PATCH 0/2] S/390: New patterns for extzv, risbg and r[ox]sbg.

2016-12-01 Thread Dominik Vogt
The following patch series adds some patterns for enhanced use of
the r[ixo]sbg instructions on S/390.

 - 0001-* fixes some test regressions with the existing risbg
   patterns that are broken because of recent trunkt changes.

 - 0002-* adds new patterns for the r[xo]sbg instructions and an
   SI mode variant of "extzv".

For details, please chech the commit comments of the patches.  All
patches have been bootstrapped on s390x biarch and regression
tested on s390x biarch and s390.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] Fix minor nits in gimple-ssa-sprintf.c (PR tree-optimization/78586)

2016-12-01 Thread Martin Sebor

So, let's use another testcase, -O2 -W -Wall -fno-tree-vrp -fno-tree-ccp
and again UB in it:
volatile bool e;
volatile int x;

int
main ()
{
  x = 123;
  *(char *) = x;
  bool f = e;
  x = __builtin_snprintf (0, 0, "%d", f);
}

This will store 1 into x, while without -fprintf-return-value it would store
3.


Great, that's what I was looking for.  I turned it into the following
test case.  Let me try to massage it into a compile-only test suitable
for the test suite and commit it later today.

volatile bool e;
volatile int x;

#define FMT "%d"
const char *fmt = FMT;

int
main ()
{
  x = 123;
  *(char *) = x;
  bool f = e;

  int n1 = __builtin_snprintf (0, 0, FMT, f);
  int n2 = __builtin_snprintf (0, 0, fmt, f);

  __builtin_printf ("%i == %i\n", n1, n2);
  if (n1 != n2)
__builtin_abort ();
}

Martin


[PATCH] PR 66149 & PR78235 dbxout_type_fields

2016-12-01 Thread David Edelsohn
A number of the "variant" testcases fail to build on AIX and targets
that use stabs.  The failure looks like:

/tmp/GCC/powerpc-ibm-aix7.2.0.0/libstdc++-v3/include/variant:956:
internal compiler error: tree check: expected field_decl, have
template_decl in int_bit_position, at tree.h:5396

which occurs in dbxout_type_fields()

  /* Output the name, type, position (in bits), size (in bits) of each
 field that we can support.  */
  for (tem = TYPE_FIELDS (type); tem; tem = DECL_CHAIN (tem))
 ...
  if (VAR_P (tem))
{
 ...
 }
  else
{
  stabstr_C (',');
  stabstr_D (int_bit_position (tem));
  stabstr_C (',');
  stabstr_D (tree_to_uhwi (DECL_SIZE (tem)));
  stabstr_C (';');
}

where tem is a TEMPLATE_DECL.  The dbxout code currently skips
TYPE_DECL, nameless fields, and CONST_DECL.

dbxout_type_methods() explicitly skips TEMPLATE_DECLs with the comment
"The debugger doesn't know what to do with such entities anyhow", so
this proposed patch skips them in dbxout_type_fields() as well.

Okay?

Thanks, David


PR debug/66419
PR c++/78235
* dbxout.c (dbxout_type_fields): Skip TEMPLATE_DECLs.

Index: dbxout.c
===
--- dbxout.c(revision 243118)
+++ dbxout.c(working copy)
@@ -1479,6 +1479,7 @@ dbxout_type_fields (tree type)

   /* Omit here local type decls until we know how to support them.  */
   if (TREE_CODE (tem) == TYPE_DECL
+ || TREE_CODE (tem) == TEMPLATE_DECL
  /* Omit here the nameless fields that are used to skip bits.  */
  || DECL_IGNORED_P (tem)
  /* Omit fields whose position or size are variable or too large to


Re: [PATCH] avoid calling alloca(0)

2016-12-01 Thread Martin Sebor

On 11/30/2016 09:09 PM, Martin Sebor wrote:

What I think this tells us is that we're not at a place where we're
clean.  But we can incrementally get there.  The warning is only
catching a fairly small subset of the cases AFAICT.  That's not unusual
and analyzing why it didn't trigger on those cases might be useful as
well.


The warning has no smarts.  It relies on constant propagation and
won't find a call unless it sees it's being made with a constant
zero.  Looking at the top two on the list the calls are in extern
functions not called from the same source file, so it probably just
doesn't see that the functions are being called from another file
with a zero.  Building GCC with LTO might perhaps help.


I should also add that for GCC abd provided the main concern is
non-unique pointers, the warning find just the right subset of
calls,  (other concerns are portability to non-GCC compilers or
to library implementations).

GCC makes sure the size is a multiple of stack alignment. When
the argument is constant and a multiple of stack size GCC does
nothing, and so when the size is zero it just returns the top
of stack, resulting in non-unique pointers.  When it's not
constant, GCC emits code to round up the size to a multiple
of stack alignment, which makes each pointer unique.




So where does this leave us for gcc-7?  I'm wondering if we drop the
warning in, but not enable it by default anywhere.  We fix the cases we
can (such as reg-stack,c tree-ssa-threadedge.c, maybe others) before
stage3 closes, and shoot for the rest in gcc-8, including improvign the
warning (if there's something we can clearly improve), and enabling the
warning in -Wall or -Wextra.


I'm fine with deferring the GCC fixes and working on the cleanup
over time but I don't think that needs to gate enabling the option
with -Wextra.  The warnings can be suppressed or prevented from
causing errors during a GCC build either via a command line option
or by pragma in the code.  AFAICT, from the other warnings I see
go by, this is what has been done for -Wno-implicit-fallthrough
while those warnings are being cleaned up.  Why not take the same
approach here?

As much as I would like to improve the warning itself I'm also not
sure I see much of an opportunity for it.  It's not prone to high
rates of false positives (hardly any in fact) and the cases it
misses are those where it simply doesn't see the argument value
because it's not been made available by constant propagation.

That said, I consider the -Walloc-size-larger-than warning to be
the more important part of the patch by far.  I'd hate a lack of
consensus on how to deal with GCC's handful of instances of
alloca(0) to stall the rest of the patch.

Thanks
Martin




Re: [RFC] Assert DECL_ABSTRACT_ORIGIN is different from the decl itself

2016-12-01 Thread Martin Jambor
Hello,

On Wed, Nov 30, 2016 at 02:09:19PM +0100, Martin Jambor wrote:
> On Tue, Nov 29, 2016 at 10:17:02AM -0700, Jeff Law wrote:
> >
> > ...
> >
> > So it seems that rather than an assert that we should just not walk down a
> > self-referencing DECL_ABSTRACT_ORIGIN.
> > 
> 
> ...
> 
> So I wonder what the options are... perhaps it seems that we can call
> dump_function_name which starts with code handling
> !DECL_LANG_SPECIFIC(t) cases, even instead of the weird 
> thing?

The following patch does that, it works as expected on my small
testcases, brings g++ in line with what gcc does with clones when it
comes to OpenMP outline functions and obviously prevents the infinite
recursion.

It passes bootstrap and testing on x86_64-linux.  OK for trunk?

Thanks,


2016-11-30  Martin Jambor  

PR c++/78589
* error.c (dump_decl): Use dump_function_name to dump
!DECL_LANG_SPECIFIC function decls with no or self-referencing
abstract origin.
---
 gcc/cp/error.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 7bf07c3..5f8fb2a 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1216,10 +1216,11 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags)
 case FUNCTION_DECL:
   if (! DECL_LANG_SPECIFIC (t))
{
- if (DECL_ABSTRACT_ORIGIN (t))
+ if (DECL_ABSTRACT_ORIGIN (t)
+ && DECL_ABSTRACT_ORIGIN (t) != t)
dump_decl (pp, DECL_ABSTRACT_ORIGIN (t), flags);
  else
-   pp_string (pp, M_(""));
+   dump_function_name (pp, t, flags);
}
   else if (DECL_GLOBAL_CTOR_P (t) || DECL_GLOBAL_DTOR_P (t))
dump_global_iord (pp, t);
-- 
2.10.2




[Patch testsuite obvious] Use setjmp, not sigsetjmp in gcc.dg/pr78582.c

2016-12-01 Thread James Greenhalgh

As subject.

Newlib doesn't have sigsetjmp, so the test fails for our newlib-based
testruns. Confirmed that this adjusted test would still have ICE'd
without r242958.

Committed as obvious as revision 243116.

Thanks,
James

---
2016-12-01  James Greenhalgh  

* gcc.dg/pr78582.c (main): Call setjmp, not sigsetjmp.

diff --git a/gcc/testsuite/gcc.dg/pr78582.c b/gcc/testsuite/gcc.dg/pr78582.c
index 3084e3b..5284e3f 100644
--- a/gcc/testsuite/gcc.dg/pr78582.c
+++ b/gcc/testsuite/gcc.dg/pr78582.c
@@ -10,7 +10,7 @@ int
 main (int argc, char argv, char env)
 {
   int a;
-  sigsetjmp (0, 0);
+  setjmp (0);
   argc = a = argc;
   reader_loop ();
 


[PATCH][AARCH64]Simplify call, call_value, sibcall, sibcall_value patterns.

2016-12-01 Thread Renlin Li

Hi all,

This patch refactors the code used in call, call_value, sibcall,
sibcall_value expanders.

Before the change, the logic is following:

call expander  --> call_internal  --> call_reg/call_symbol
call_vlaue expander--> call_value_internal--> 
call_value_reg/call_value_symbol

sibcall expander   --> sibcall_internal   --> sibcall_insn
sibcall_value expander --> sibcall_value_internal --> sibcall_value_insn

After the change, the logic is simplified into:

call expander  --> aarch64_expand_call() --> call_insn
call_value expander--> aarch64_expand_call() --> call_value_insn

sibcall expander   --> aarch64_expand_call() --> sibcall_insn
sibcall_value expander --> aarch64_expand_call() --> sibcall_value_insn

The code are factored out from each expander into aarch64_expand_call ().

This also fixes the two issues Richard Henderson suggests in comments 8:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64971

aarch64-none-elf regression test Okay, aarch64-linux bootstrap Okay.
Okay for trunk?

Regards,
Renlin Li


gcc/ChangeLog:

2016-12-01  Renlin Li  

* config/aarch64/aarch64-protos.h (aarch64_expand_call): Declare.
* config/aarch64/aarch64.c (aarch64_expand_call): Define.
* config/aarch64/constraints.md (Usf): Add long call check.
* config/aarch64/aarch64.md (call): Use aarch64_expand_call.
(call_value): Likewise.
(sibcall): Likewise.
(sibcall_value): Likewise.
(call_insn): New.
(call_value_insn): New.
(sibcall_insn): Update rtx pattern.
(sibcall_value_insn): Likewise.
(call_internal): Remove.
(call_value_internal): Likewise.
(sibcall_internal): Likewise.
(sibcall_value_internal): Likewise.
(call_reg): Likewise.
(call_symbol): Likewise.
(call_value_reg): Likewise.
(call_value_symbol): Likewise.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 7f67f14..3a5babb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -305,6 +305,7 @@ bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
+void aarch64_expand_call (rtx, rtx, bool);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 68a3380..c313cf5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4343,6 +4343,51 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
   return true;
 }
 
+/* This function is used by the call expanders of the machine description.
+   RESULT is the register in which the result is returned.  It's NULL for
+   "call" and "sibcall".
+   MEM is the location of the function call.
+   SIBCALL indicates whether this function call is normal call or sibling call.
+   It will generate different pattern accordingly.  */
+
+void
+aarch64_expand_call (rtx result, rtx mem, bool sibcall)
+{
+  rtx call, callee, tmp;
+  rtvec vec;
+  machine_mode mode;
+
+  gcc_assert (MEM_P (mem));
+  callee = XEXP (mem, 0);
+  mode = GET_MODE (callee);
+  gcc_assert (mode == Pmode);
+
+  /* Decide if we should generate indirect calls by loading the
+ 64-bit address of the callee into a register before performing
+ the branch-and-link.  */
+
+  if (GET_CODE (callee) == SYMBOL_REF
+  ? (aarch64_is_long_call_p (callee)
+	 || aarch64_is_noplt_call_p (callee))
+  : !REG_P (callee))
+  XEXP (mem, 0) = force_reg (mode, callee);
+
+  call = gen_rtx_CALL (VOIDmode, mem, const0_rtx);
+
+  if (result != NULL_RTX)
+call = gen_rtx_SET (result, call);
+
+  if (sibcall)
+tmp = ret_rtx;
+  else
+tmp = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNUM));
+
+  vec = gen_rtvec (2, call, tmp);
+  call = gen_rtx_PARALLEL (VOIDmode, vec);
+
+  aarch64_emit_call_insn (call);
+}
+
 /* Emit call insn with PAT and do aarch64-specific handling.  */
 
 void
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bc6d8a2..5682686 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -718,12 +718,6 @@
 ;; Subroutine calls and sibcalls
 ;; ---
 
-(define_expand "call_internal"
-  [(parallel [(call (match_operand 0 "memory_operand" "")
-		(match_operand 1 "general_operand" ""))
-	  (use (match_operand 2 "" ""))
-	  (clobber (reg:DI LR_REGNUM))])])
-
 (define_expand "call"
   [(parallel [(call (match_operand 0 "memory_operand" "")
 		(match_operand 1 "general_operand" ""))
@@ -732,57 +726,22 @@
   ""
   "
   {
-rtx callee, pat;
-
-/* In an untyped call, we can get NULL for 

Re: [PATCH v3] Do not simplify "(and (reg) (const bit))" to if_then_else.

2016-12-01 Thread Dominik Vogt
On Thu, Dec 01, 2016 at 01:33:17PM +0100, Bernd Schmidt wrote:
> On 11/21/2016 01:36 PM, Dominik Vogt wrote:
> >diff --git a/gcc/combine.c b/gcc/combine.c
> >index b22a274..457fe8a 100644
> >--- a/gcc/combine.c
> >+++ b/gcc/combine.c
> >@@ -5575,10 +5575,23 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, 
> >int in_dest,
> > {
> >   rtx cop1 = const0_rtx;
> >   enum rtx_code cond_code = simplify_comparison (NE, , );
> >+  unsigned HOST_WIDE_INT nz;
> >
> >   if (cond_code == NE && COMPARISON_P (cond))
> > return x;
> >
> >+  /* If the operation is an AND wrapped in a SIGN_EXTEND or ZERO_EXTEND
> >+ with either operand being just a constant single bit value, do
> >+ nothing since IF_THEN_ELSE is likely to increase the expression's
> >+ complexity.  */
> >+  if (HWI_COMPUTABLE_MODE_P (mode)
> >+  && pow2p_hwi (nz = nonzero_bits (x, mode))
> >+  && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND)
> >+&& GET_CODE (XEXP (x, 0)) == AND
> >+&& CONST_INT_P (XEXP (XEXP (x, 0), 0))
> >+&& UINTVAL (XEXP (XEXP (x, 0), 0)) == nz))
> >+return x;
> 
> It looks like this doesn't actually use cond or true/false_rtx. So
> this could be placed just above the call to if_then_else_cond to
> avoid unnecessary work. Ok if that works.

It does.  Version 3 attached, bootstrapped on s390x and regression
tested on s390x biarch and s390.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* combine.c (combine_simplify_rtx):  Suppress replacement of
"(and (reg) (const_int bit))" with "if_then_else".
>From 9202cab6332ce5dcfa740bbae3bcf07f3acc8705 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 31 Oct 2016 09:00:31 +0100
Subject: [PATCH] Do not simplify "(and (reg) (const bit)" to if_then_else.

combine_simplify_rtx() tries to replace rtx expressions with just two
possible values with an experession that uses if_then_else:

  (if_then_else (condition) (value1) (value2))

If the original expression is e.g.

  (and (reg) (const_int 2))

where the constant is the mask for a single bit, the replacement results
in a more complex expression than before:

  (if_then_else (ne (zero_extract (reg) (1) (31))) (2) (0))

Similar replacements are done for

  (signextend (and ...))
  (zeroextend (and ...))

Suppress the replacement this special case in if_then_else_cond().
---
 gcc/combine.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index a8dae89..52bde9e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5600,6 +5600,18 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int 
in_dest,
 && OBJECT_P (SUBREG_REG (XEXP (x, 0)))
 {
   rtx cond, true_rtx, false_rtx;
+  unsigned HOST_WIDE_INT nz;
+
+  /* If the operation is an AND wrapped in a SIGN_EXTEND or ZERO_EXTEND 
with
+either operand being just a constant single bit value, do nothing since
+IF_THEN_ELSE is likely to increase the expression's complexity.  */
+  if (HWI_COMPUTABLE_MODE_P (mode)
+ && pow2p_hwi (nz = nonzero_bits (x, mode))
+ && ! ((code == SIGN_EXTEND || code == ZERO_EXTEND)
+   && GET_CODE (XEXP (x, 0)) == AND
+   && CONST_INT_P (XEXP (XEXP (x, 0), 0))
+   && UINTVAL (XEXP (XEXP (x, 0), 0)) == nz))
+ return x;
 
   cond = if_then_else_cond (x, _rtx, _rtx);
   if (cond != 0
-- 
2.3.0



[Patch 2/2 PR78561] Recalculate constant pool size before emitting it

2016-12-01 Thread James Greenhalgh

Hi,

In PR78561, we try to make use of stale constant pool offset data when
making decisions as to whether to output an alignment directive after
the AArch64 constant pool. The offset data goes stale as we only ever
increment it when adding new constants to the pool (it represents an
upper bound on the size of the pool).

To fix that, we should recompute the offset values shortly after
sweeping through insns looking for valid constant.

I'm not totally sure about this code so I'd appreciate comments on whether
this is a sensible idea.

Bootstrapped on aarch64-none-linux-gnu and x86-64-none-linux-gnu and
checked with aarch64-none-elf with no issues.

OK?

Thanks,
James

gcc/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* varasm.c (recompute_pool_offsets): New.
(output_constant_pool): Call it.

gcc/testsuite/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* gcc.target/aarch64/pr78561.c: New.

diff --git a/gcc/testsuite/gcc.target/aarch64/pr78561.c b/gcc/testsuite/gcc.target/aarch64/pr78561.c
new file mode 100644
index 000..048d2d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr78561.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -O3 -mcmodel=tiny" } */
+
+int
+main (__fp16 x)
+{
+  __fp16 a = 6.5504e4;
+  return (x <= a);
+}
diff --git a/gcc/varasm.c b/gcc/varasm.c
index f8af0c1..f3cd70a 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -3942,6 +3942,29 @@ output_constant_pool_1 (struct constant_descriptor_rtx *desc,
   return;
 }
 
+/* Recompute the offsets of entries in POOL, and the overall size of
+   POOL.  Do this after calling mark_constant_pool to ensure that we
+   are computing the offset values for the pool which we will actually
+   emit.  */
+
+static void
+recompute_pool_offsets (struct rtx_constant_pool *pool)
+{
+  struct constant_descriptor_rtx *desc;
+  pool->offset = 0;
+
+  for (desc = pool->first; desc ; desc = desc->next)
+if (desc->mark)
+  {
+	  /* Recalculate offset.  */
+	unsigned int align = desc->align;
+	pool->offset += (align / BITS_PER_UNIT) - 1;
+	pool->offset &= ~ ((align / BITS_PER_UNIT) - 1);
+	desc->offset = pool->offset;
+	pool->offset += GET_MODE_SIZE (desc->mode);
+  }
+}
+
 /* Mark all constants that are referenced by SYMBOL_REFs in X.
Emit referenced deferred strings.  */
 
@@ -4060,6 +4083,11 @@ output_constant_pool (const char *fnname ATTRIBUTE_UNUSED,
  case we do not need to output the constant.  */
   mark_constant_pool ();
 
+  /* Having marked the constant pool entries we'll actually emit, we
+ now need to rebuild the offset information, which may have become
+ stale.  */
+  recompute_pool_offsets (pool);
+
 #ifdef ASM_OUTPUT_POOL_PROLOGUE
   ASM_OUTPUT_POOL_PROLOGUE (asm_out_file, fnname, fndecl, pool->offset);
 #endif


[Patch 1/2 PR78561] Rename get_pool_size to get_pool_size_upper_bound

2016-12-01 Thread James Greenhalgh

Hi,

There's no functional change in this patch, just a rename.

The size recorded in "offset" is only ever incremented as we add new items
to the constant pool. But this information can become stale where those
constant pool entries would not get emitted.

Thus, it is only ever an upper bound on the size of the constant pool.

The only uses of get_pool_size are in rs6000 and there it is only used to
check whether a constant pool might be output - but explicitly renaming the
function to make it clear that you're getting an upper bound rather than the
real size can only be good for programmers using the interface.

OK?

Thanks,
James

---
2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p) Rename
get_pool_size to get_pool_size_upper_bound.
(rs6000_stack_info): Likewise.
(rs6000_emit_prologue): Likewise.
(rs6000_elf_declare_function_name): Likewise.
(rs6000_set_up_by_prologue): Likewise.
(rs6000_can_eliminate): Likewise, reformat spaces to tabs.
* output.h (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.
* varasm.c (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0a6a784..7e965f9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25456,7 +25456,7 @@ rs6000_reg_live_or_pic_offset_p (int reg)
   if (TARGET_TOC && TARGET_MINIMAL_TOC
 	  && (crtl->calls_eh_return
 	  || df_regs_ever_live_p (reg)
-	  || get_pool_size ()))
+	  || get_pool_size_upper_bound ()))
 	return true;
 
   if ((DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)
@@ -26262,7 +26262,7 @@ rs6000_stack_info (void)
 #ifdef TARGET_RELOCATABLE
   || (DEFAULT_ABI == ABI_V4
 	  && (TARGET_RELOCATABLE || flag_pic > 1)
-	  && get_pool_size () != 0)
+	  && get_pool_size_upper_bound () != 0)
 #endif
   || rs6000_ra_ever_killed ())
 info->lr_save_p = 1;
@@ -28039,7 +28039,8 @@ rs6000_emit_prologue (void)
   cfun->machine->r2_setup_needed = df_regs_ever_live_p (TOC_REGNUM);
 
   /* With -mminimal-toc we may generate an extra use of r2 below.  */
-  if (TARGET_TOC && TARGET_MINIMAL_TOC && get_pool_size () != 0)
+  if (TARGET_TOC && TARGET_MINIMAL_TOC
+	  && get_pool_size_upper_bound () != 0)
 	cfun->machine->r2_setup_needed = true;
 }
 
@@ -28894,7 +28895,8 @@ rs6000_emit_prologue (void)
 
   /* If we are using RS6000_PIC_OFFSET_TABLE_REGNUM, we need to set it up.  */
   if (!TARGET_SINGLE_PIC_BASE
-  && ((TARGET_TOC && TARGET_MINIMAL_TOC && get_pool_size () != 0)
+  && ((TARGET_TOC && TARGET_MINIMAL_TOC
+	   && get_pool_size_upper_bound () != 0)
 	  || (DEFAULT_ABI == ABI_V4
 	  && (flag_pic == 1 || (flag_pic && TARGET_SECURE_PLT))
 	  && df_regs_ever_live_p (RS6000_PIC_OFFSET_TABLE_REGNUM
@@ -34961,7 +34963,7 @@ rs6000_elf_declare_function_name (FILE *file, const char *name, tree decl)
   if (DEFAULT_ABI == ABI_V4
   && (TARGET_RELOCATABLE || flag_pic > 1)
   && !TARGET_SECURE_PLT
-  && (get_pool_size () != 0 || crtl->profile)
+  && (get_pool_size_upper_bound () != 0 || crtl->profile)
   && uses_TOC ())
 {
   char buf[256];
@@ -37444,10 +37446,11 @@ static bool
 rs6000_can_eliminate (const int from, const int to)
 {
   return (from == ARG_POINTER_REGNUM && to == STACK_POINTER_REGNUM
-  ? ! frame_pointer_needed
-  : from == RS6000_PIC_OFFSET_TABLE_REGNUM
-? ! TARGET_MINIMAL_TOC || TARGET_NO_TOC || get_pool_size () == 0
-: true);
+	  ? ! frame_pointer_needed
+	  : from == RS6000_PIC_OFFSET_TABLE_REGNUM
+	? ! TARGET_MINIMAL_TOC || TARGET_NO_TOC
+		|| get_pool_size_upper_bound () == 0
+	: true);
 }
 
 /* Define the offset between two registers, FROM to be eliminated and its
@@ -38983,7 +38986,7 @@ rs6000_set_up_by_prologue (struct hard_reg_set_container *set)
   if (!TARGET_SINGLE_PIC_BASE
   && TARGET_TOC
   && TARGET_MINIMAL_TOC
-  && get_pool_size () != 0)
+  && get_pool_size_upper_bound () != 0)
 add_to_hard_reg_set (>set, Pmode, RS6000_PIC_OFFSET_TABLE_REGNUM);
   if (cfun->machine->split_stack_argp_used)
 add_to_hard_reg_set (>set, Pmode, 12);
diff --git a/gcc/output.h b/gcc/output.h
index 0924499..7186dc1 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -287,8 +287,11 @@ extern void assemble_real (REAL_VALUE_TYPE, machine_mode, unsigned,
 /* Write the address of the entity given by SYMBOL to SEC.  */
 extern void assemble_addr_to_section (rtx, section *);
 
-/* Return the size of the constant pool.  */
-extern int get_pool_size (void);
+/* Return the maximum size of the constant pool.  This may be larger
+   than the final size of the constant pool, as entries may be added to
+   the constant pool which become unreferenced, or otherwise not need
+   output by 

[Patch 0/2 PR78561] Recalculate constant pool size before emitting it

2016-12-01 Thread James Greenhalgh
Hi,

In PR78561, we try to make use of stale constant pool offset data when
making decisions as to whether to output an alignment directive after
the AArch64 constant pool. The offset data goes stale as we only ever
increment it when adding new constants to the pool (it represents an
upper bound on the size of the pool).

To fix that, we should recompute the offset values shortly after
sweeping through insns looking for valid constant.

That's easy enough to do (see patch 2/2) and patch 1/2 is just a simple
rename of the get_pool_size function to reflect that it is not providing
an accurate size, just an upper bound on what the size might be after
optimisation.

Technically, patch 1/2 isn't neccessary to fix the PR but cleaning up the
name seems like a useful to do.

The patch set has been bootstrapped and tested on aarch64-none-linux-gnu and
x86-64-none-linux-gnu without any issues. I've also cross-tested it for
aarch64-none-elf and build-tested it for rs6000 (though I couldn't run the
testsuite as I don't have a test environment).

OK?

Thanks,
James

---

[Patch 1/2 PR78561] Rename get_pool_size to get_pool_size_upper_bound

gcc/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* config/rs6000/rs6000.c (rs6000_reg_live_or_pic_offset_p) Rename
get_pool_size to get_pool_size_upper_bound.
(rs6000_stack_info): Likewise.
(rs6000_emit_prologue): Likewise.
(rs6000_elf_declare_function_name): Likewise.
(rs6000_set_up_by_prologue): Likewise.
(rs6000_can_eliminate): Likewise, reformat spaces to tabs.
* output.h (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.
* varasm.c (get_pool_size): Rename to...
(get_pool_size_upper_bound): ...This.

[Patch 2/2 PR78561] Recalculate constant pool size before emitting it

gcc/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* varasm.c (recompute_pool_offsets): New.
(output_constant_pool): Call it.

gcc/testsuite/

2016-12-01  James Greenhalgh  

PR rtl-optimization/78561
* gcc.target/aarch64/pr78561.c: New.

---

 gcc/config/rs6000/rs6000.c | 23 +--
 gcc/output.h   |  7 +--
 gcc/testsuite/gcc.target/aarch64/pr78561.c |  9 +
 gcc/varasm.c   | 30 +-
 4 files changed, 56 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr78561.c



Re: Import libcilkrts Build 4467 (PR target/68945)

2016-12-01 Thread Rainer Orth
Hi Jeff,

>> The following patch has passed x86_64-pc-linux-gnu bootstrap without
>> regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps
>> are currently running.
>>
>> Ok for mainline if they pass?
> Yes.  Sorry for not getting back to you sooner.

no worries; I've been on vacation for a week anyway.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] S/390: Fix setmem-long test.

2016-12-01 Thread Dominik Vogt
The attached patch fixes the setmem_long-1.c S/390 backend test.

Adding a " in the scan-assembler pattern is necessary because of a
recent change in print-rtl.c.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/testsuite/ChangeLog-setmem-long-test

* gcc.target/s390/md/setmem_long-1.c: Fix test.
>From 6582cbb17262b8559f632fdb9bdc30ef8e9db1c3 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 25 Nov 2016 17:44:12 +0100
Subject: [PATCH] S/390: Fix setmem-long test.

The test needs to take care of extra quotes around the file name that have been
introduced recently.
---
 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c 
b/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
index 933a698..bd0c594 100644
--- a/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
+++ b/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
@@ -16,8 +16,8 @@ void test2(char *p, int c, int len)
 }
 
 /* Check that the right patterns are used.  */
-/* { dg-final { scan-assembler-times {c:9 .*{[*]setmem_long_?3?1?z?}} 1 } } */
-/* { dg-final { scan-assembler-times {c:15 .*{[*]setmem_long_and_?3?1?z?}} 1 { 
xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {c"?:9 .*{[*]setmem_long_?3?1?z?}} 1 } } 
*/
+/* { dg-final { scan-assembler-times {c"?:15 .*{[*]setmem_long_and_?3?1?z?}} 1 
{ xfail *-*-* } } } */
 
 #define LEN 500
 char buf[LEN + 2];
-- 
2.3.0



Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Yuri Rumyantsev wrote:

> Thanks Richard for your comments.
> 
> You asked me about possible performance improvements for AVX2 machines
> - we did not see any visible speed-up for spec2k with any method of

Spec 2000?  Can you check with SPEC 2006 or CPUv6?

Did you see performance degradation?  What about compile-time and
binary size effects?

> masking, including epilogue masking and combining, only on AVX512
> machine aka knl.

I see.

Note that as said in the initial review patch the cost model I
saw therein looked flawed.  In the end I'd expect a sensible
approach would be to do

 if (n < scalar-most-profitable-niter)
   {
 no vectorization
   }
 else if (n < masking-more-profitable-than-not-masking-plus-epilogue)
   {
 do masked vectorization
   }
 else
   {
 do unmasked vectorization (with epilogue, eventually vectorized)
   }

where for short trip loops the else path would never be taken
(statically).

And yes, that means masking will only be useful for short-trip loops
which in the end means an overall performance benfit is unlikely
unless we have a lot of short-trip loops that are slow because of
the overhead of main unmasked loop plus epilogue.

Richard.

> I will answer on your question later.
> 
> Best regards.
> Yuri
> 
> 2016-12-01 14:33 GMT+03:00 Richard Biener :
> > On Mon, 28 Nov 2016, Yuri Rumyantsev wrote:
> >
> >> Richard!
> >>
> >> I attached vect dump for hte part of attached test-case which
> >> illustrated how vectorization of epilogues works through masking:
> >> #define SIZE 1023
> >> #define ALIGN 64
> >>
> >> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
> >> __SIZE_TYPE__ size) __attribute__((weak));
> >> extern void free (void *);
> >>
> >> void __attribute__((noinline))
> >> test_citer (int * __restrict__ a,
> >>int * __restrict__ b,
> >>int * __restrict__ c)
> >> {
> >>   int i;
> >>
> >>   a = (int *)__builtin_assume_aligned (a, ALIGN);
> >>   b = (int *)__builtin_assume_aligned (b, ALIGN);
> >>   c = (int *)__builtin_assume_aligned (c, ALIGN);
> >>
> >>   for (i = 0; i < SIZE; i++)
> >> c[i] = a[i] + b[i];
> >> }
> >>
> >> It was compiled with -mavx2 --param vect-epilogues-mask=1 options.
> >>
> >> I did not include in this patch vectorization of low trip-count loops
> >> since in the original patch additional parameter was introduced:
> >> +DEFPARAM (PARAM_VECT_SHORT_LOOPS,
> >> +  "vect-short-loops",
> >> +  "Enable vectorization of low trip count loops using masking.",
> >> +  0, 0, 1)
> >>
> >> I assume that this ability can be included very quickly but it
> >> requires cost model enhancements also.
> >
> > Comments on the patch itself (as I'm having a closer look again,
> > I know how it vectorizes the above but I wondered why epilogue
> > and short-trip loops are not basically the same code path).
> >
> > Btw, I don't like that the features are behind a --param paywall.
> > That just means a) nobody will use it, b) it will bit-rot quickly,
> > c) bugs are well-hidden.
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +  && integer_zerop (nested_in_vect_loop
> > +   ? STMT_VINFO_DR_STEP (stmt_info)
> > +   : DR_STEP (dr)))
> > +{
> > +  if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_NOTE, vect_location,
> > +"allow invariant load for masked loop.\n");
> > +}
> >
> > this can test memory_access_type == VMAT_INVARIANT.  Please put
> > all the checks in a common
> >
> >   if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> > {
> >if (memory_access_type == VMAT_INVARIANT)
> >  {
> >  }
> >else if (...)
> >  {
> > LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> >  }
> >else if (..)
> > ...
> > }
> >
> > @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt,
> > gimple_stmt_iterator *gsi, gimple **vec_stmt,
> >gcc_assert (!nested_in_vect_loop);
> >gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info));
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> > +   {
> > + if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"cannot be masked: grouped access is not"
> > +" supported.");
> > + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> > +  }
> > +
> >
> > isn't this already handled by the above?  Or rather the general
> > disallowance of SLP?
> >
> > @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt,
> > gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > _access_type, _info))
> >  return false;
> >
> > +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> > +  && memory_access_type != VMAT_CONTIGUOUS)
> > +{
> > +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> > +  if 

Re: [PATCH 8/9] Introduce class function_reader (v4)

2016-12-01 Thread Bernd Schmidt

On 11/11/2016 10:15 PM, David Malcolm wrote:

 #include "gt-aarch64.h"
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6c608e0..0dda786 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c


I think we should separate out the target specific tests so as to give 
port maintainers a chance to comment on them separately.



diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 50cd388..179a91f 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1371,6 +1371,19 @@ maybe_set_first_label_num (rtx_code_label *x)
   if (CODE_LABEL_NUMBER (x) < first_label_num)
 first_label_num = CODE_LABEL_NUMBER (x);
 }
+
+/* For use by the RTL function loader, when mingling with normal
+   functions.


Not sure what this means.



   if (str == 0)
-   fputs (" \"\"", m_outfile);
+   fputs (" (nil)", m_outfile);
   else
fprintf (m_outfile, " (\"%s\")", str);
   m_sawclose = 1;


What does this affect?


 /* Global singleton; constrast with md_reader_ptr above.  */
diff --git a/gcc/read-rtl-function.c b/gcc/read-rtl-function.c
new file mode 100644
index 000..ff6c808
--- /dev/null
+++ b/gcc/read-rtl-function.c
@@ -0,0 +1,2124 @@
+/* read-rtl-function.c - Reader for RTL function dumps
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+#include 


Please double-check all these includes whether they are necessary.


+
+/* Fix up a NOTE_INSN_BASIC_BLOCK based on an integer block ID.  */
+
+void
+fixup_note_insn_basic_block::apply (function_reader */*reader*/) const


Lose the /*reader*/, probably.


+
+/* Implementation of rtx_reader::handle_unknown_directive.
+
+   Require a top-level "function" elements, as emitted by
+   print_rtx_function, and parse it.  */


"element"?


+void
+function_reader::create_function ()
+{
+  /* Currently we assume cfgrtl mode, rather than cfglayout mode.  */
+  if (0)
+cfg_layout_rtl_register_cfg_hooks ();
+  else
+rtl_register_cfg_hooks ();


Do we expect to change this? I'd just get rid of the if (0), at least 
for now.



+/* cgraph_node::add_new_function does additional processing
+   based on symtab->state.  We need to avoid it attempting to gimplify
+   things.  Temporarily putting it in the PARSING state appears to
+   achieve this.  */
+enum symtab_state old_state = symtab->state;
+symtab->state = PARSING;
+cgraph_node::add_new_function (fndecl, true /*lowered*/);
+/* Reset the state.  */
+symtab->state = old_state;
+  }


Does it do anything beside call finalize_function in that state? If 
that's all you need, just call it directly.



+
+  /* Parse DECL_RTL.  */
+  {
+require_char_ws ('(');
+require_word_ws ("DECL_RTL");
+DECL_WRTL_CHECK (t_param)->decl_with_rtl.rtl = parse_rtx ();
+require_char_ws (')');
+  }


Spurious { } blocks.


+  if (0)
+fprintf (stderr, "parse_edge: %i flags 0x%x \n",
+other_bb_idx, flags);


Remove this.

+  /* For now, only process the (edge-from) to this BB, and (edge-to)
+ that go to the exit block; we don't yet verify that the edge-from
+ and edge-to directives are consistent.  */


That's probably worth a FIXME.


+  if (rtx_code_label *label = dyn_cast  (insn))
+maybe_set_max_label_num (label);


I keep forgetting why dyn_cast instead of as_a?


+case 'e':
+  {
+   if (idx == 7 && CALL_P (return_rtx))
+ {
+   m_in_call_function_usage = true;
+   return rtx_reader::read_rtx_operand (return_rtx, idx);
+   m_in_call_function_usage = false;
+ }
+   else
+ return rtx_reader::read_rtx_operand (return_rtx, idx);
+  }
+  break;


Unnecessary { } blocks in several places.


+
+case 'w':
+  {
+   if (!is_compact ())
+ {
+   /* Strip away the redundant hex dump of the value.  */
+   require_char_ws ('[');
+   read_name ();
+   require_char_ws (']');
+ }
+  }
+  break;


Here too.


+
+/* Special-cased handling of codes 'i' and 'n' for reading function
+   dumps.  */
+
+void
+function_reader::read_rtx_operand_i_or_n (rtx return_rtx, int idx,
+ char format_char)


Document arguments (everywhere). I think return_rtx (throughout these 
functions) is a poor name that can cause confusion because it seems to 
imply a (return).



+
+  /* Possibly wrote:
+print_node_brief (outfile, "", SYMBOL_REF_DECL (in_rtx),
+  dump_flags);  */


???


+ /* Skip the content for now.  */


Does this relate to the above? Please clarify the comments.


+  case CODE_LABEL:
+   {
+ /* Assume that LABEL_NUSES was not dumped.  */
+ /* TODO: parse LABEL_KIND.  */


Unnecessary { }.


+  if (0 == strcmp (desc, ""))
+{
+  return 

Re: [PATCH] Fix PR tree-optimization/78598 - tree-ssa-loop-prefetch.c:835:16: runtime error: signed integer overflow

2016-12-01 Thread Richard Biener
On Thu, Dec 1, 2016 at 1:49 PM, Markus Trippelsdorf
 wrote:
> Using bootstrap-ubsan gcc to build mplayer shows:
>
> tree-ssa-loop-prefetch.c:835:16: runtime error: signed integer overflow:
> 288230376151711743 * 64 cannot be represented in type 'long int'
>
> Here signed und unsigned integers are mixed in a division resulting in
> bogus results: (-83 + 64ULL -1) / 64ULL) == 288230376151711743
>
> Fixed by casting the unsigned parameter to signed.
>
> Tested on ppc64le.
> OK for trunk?

Ok.

Richard.

> Thanks.
>
> PR tree-optimization/78598
> * tree-ssa-loop-prefetch.c (ddown): Cast to signed to avoid
> overflows.
>
>
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 0a2ee5ea25fd..ead2543ada46 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -700,9 +700,9 @@ ddown (HOST_WIDE_INT x, unsigned HOST_WIDE_INT by)
>gcc_assert (by > 0);
>
>if (x >= 0)
> -return x / by;
> +return x / (HOST_WIDE_INT) by;
>else
> -return (x + by - 1) / by;
> +return (x + (HOST_WIDE_INT) by - 1) / (HOST_WIDE_INT) by;
>  }
>
>  /* Given a CACHE_LINE_SIZE and two inductive memory references
> --
> Markus


Re: [RS6000] fix rtl checking internal compiler error

2016-12-01 Thread Bill Schmidt
Good catch, Alan, this one is my fault.  I'll handle the backports to the 5 and 
6 branches.

Bill

> On Dec 1, 2016, at 12:34 AM, Alan Modra  wrote:
> 
> I'm committing this one as obvious once my powerpc64le-linux bootstrap
> and regression check completes.  It fixes hundreds of rtl checking
> testsuite errors like the following:
> 
> gcc.c-torture/compile/pr39943.c:6:1: internal compiler error: RTL check: 
> expected elt 0 type 'e' or 'u', have 'E' (rtx unspec) in insn_is_swappable_p, 
> at config/rs6000/rs6000.c:40678
> 
>   * gcc/config/rs6000/rs6000.c (insn_is_swappable_p): Properly
>   look inside UNSPEC_VSX_XXSPLTW vec.
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 9fe98b7..7f307b1 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -40675,7 +40675,7 @@ insn_is_swappable_p (swap_web_entry *insn_entry, rtx 
> insn,
>   if (GET_CODE (use_body) != SET
>   || GET_CODE (SET_SRC (use_body)) != UNSPEC
>   || XINT (SET_SRC (use_body), 1) != UNSPEC_VSX_XXSPLTW
> - || XEXP (XEXP (SET_SRC (use_body), 0), 1) != const0_rtx)
> + || XVECEXP (SET_SRC (use_body), 0, 1) != const0_rtx)
> return 0;
> }
>   }
> 
> -- 
> Alan Modra
> Australia Development Lab, IBM
> 



[patch,avr] Document how to avoid progmem on AVR_TINY.

2016-12-01 Thread Georg-Johann Lay
This adds to the documentation a hint how to set up a linker description 
file that avoids progmem altogether any without the usual overhead of 
locating read-only data in RAM.  The proposed linker description file is 
completely transparent to the compiler, and no start-up code has to be 
adjusted.


IIUC there are currently no plans to fix this in the default linker 
description file avrtiny.x, cf. http://sourceware.org/PR20849


Also, link between -mabsdata option and absdata variable attribute.

Ok for trunk?


Johann


gcc/
* doc/invoke.texi (AVR Options) [-mabsdata]: Point to absdata.
* doc/extend.texi (AVR Variable Attributes) [progmem]: Hint
about linker description to avoid progmem altogether.
[absdata]: Point to -mabsdata option.

Index: doc/extend.texi
===
--- doc/extend.texi	(revision 243111)
+++ doc/extend.texi	(working copy)
@@ -5929,6 +5929,30 @@ int read_var (int i)
 @}
 @end smallexample
 
+Please notice that on these devices, there is no need for @code{progmem}
+at all.  Just use an appropriate linker description file like outlined below.
+
+@smallexample
+  .text :
+  @{ ...
+  @} > text
+  /* Leave .rodata in flash and add an offset of 0x4000 to all
+ addresses so that respective objects can be accessed by LD
+ instructions and open coded C/C++.  This means there is no
+ need for progmem in the source and no overhead by read-only
+ data in RAM.  */
+  .rodata ADDR(.text) + SIZEOF (.text) + 0x4000 :
+  @{
+*(.rodata)
+*(.rodata*)
+*(.gnu.linkonce.r*)
+  @} AT> text
+  /* No more need to put .rodata into .data:
+ Removed all .rodata entries from .data.  */
+  .data :
+  @{ ...
+@end smallexample
+
 @end table
 
 @item io
@@ -6001,6 +6025,8 @@ warning like
 
 @end itemize
 
+See also the @option{-mabsdata} @ref{AVR Options,command-line option}.
+
 @end table
 
 @node Blackfin Variable Attributes
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 243111)
+++ doc/invoke.texi	(working copy)
@@ -15402,7 +15402,8 @@ GCC supports the following AVR devices a
 
 Assume that all data in static storage can be accessed by LDS / STS
 instructions.  This option has only an effect on reduced Tiny devices like
-ATtiny40.
+ATtiny40.  See also the @code{absdata}
+@ref{AVR Variable Attributes,variable attribute}.
 
 @item -maccumulate-args
 @opindex maccumulate-args


Re: [ARM] PR 78253 do not resolve weak ref locally

2016-12-01 Thread Christophe Lyon
Hi,


On 10 November 2016 at 15:10, Christophe Lyon
 wrote:
> On 10 November 2016 at 11:05, Richard Earnshaw
>  wrote:
>> On 09/11/16 21:29, Christophe Lyon wrote:
>>> Hi,
>>>
>>> PR 78253 shows that the handling of weak references has changed for
>>> ARM with gcc-5.
>>>
>>> When r220674 was committed, default_binds_local_p_2 gained a new
>>> parameter (weak_dominate), which, when true, implies that a reference
>>> to a weak symbol defined locally will be resolved locally, even though
>>> it could be overridden by a strong definition in another object file.
>>>
>>> With r220674, default_binds_local_p forces weak_dominate=true,
>>> effectively changing the previous behavior.
>>>
>>> The attached patch introduces default_binds_local_p_4 which is a copy
>>> of default_binds_local_p_2, but using weak_dominate=false, and updates
>>> the ARM target to call default_binds_local_p_4 instead of
>>> default_binds_local_p_2.
>>>
>>> I ran cross-tests on various arm* configurations with no regression,
>>> and checked that the test attached to the original bugzilla now works
>>> as expected.
>>>
>>> I am not sure why weak_dominate defaults to true, and I couldn't
>>> really understand why by reading the threads related to r220674 and
>>> following updates to default_binds_local_p_* which all deal with other
>>> corner cases and do not discuss the weak_dominate parameter.
>>>
>>> Or should this patch be made more generic?
>>>
>>
>> I certainly don't think it should be ARM specific.
> That was my feeling too.
>
>>
>> The questions I have are:
>>
>> 1) What do other targets do today.  Are they the same, or different?
>
> arm, aarch64, s390 use default_binds_local_p_2 since PR 65780, and
> default_binds_local_p before that. Both have weak_dominate=true
> i386 has its own version, calling default_binds_local_p_3 with true
> for weak_dominate
>
> But the behaviour of default_binds_local_p changed with r220674 as I said 
> above.
> See https://gcc.gnu.org/viewcvs/gcc?view=revision=220674 and
> notice how weak_dominate was introduced
>
> The original bug report is about a different case:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32219
>
> The original patch submission is
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00410.html
> and the 1st version with weak_dominate is in:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00469.html
> but it's not clear to me why this was introduced
>
>> 2) If different why?
> on aarch64, although binds_local_p returns true, the relocations used when
> building the function pointer is still the same (still via the GOT).
>
> aarch64 has different logic than arm when accessing a symbol
> (eg aarch64_classify_symbol)
>
>> 3) Is the current behaviour really what was intended by the patch?  ie.
>> Was the old behaviour actually wrong?
>>
> That's what I was wondering.
> Before r220674, calling a weak function directly or via a function
> pointer had the same effect (in other words, the function pointer
> points to the actual implementation: the strong one if any, the weak
> one otherwise).
>
> After r220674, on arm the function pointer points to the weak
> definition, which seems wrong to me, it should leave the actual
> resolution to the linker.
>
>

After looking at the aarch64 port, I think that references to weak symbols
have to be handled carefully, to make sure they cannot be resolved
by the assembler, since the weak symbol can be overridden by a strong
definition at link-time.

Here is a new patch which does that.
Validated on arm* targets with no regression, and I checked that the
original testcase now executes as expected.

Christophe


>> R.
>>> Thanks,
>>>
>>> Christophe
>>>
>>
gcc/ChangeLog:

2016-12-01  Christophe Lyon  

PR target/78253
* config/arm/arm.c (legitimize_pic_address): Handle reference to
weak symbol.
(arm_assemble_integer): Likewise.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 74cb64c..258ceb1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6923,10 +6923,13 @@ legitimize_pic_address (rtx orig, machine_mode mode, 
rtx reg)
 same segment as the GOT.  Unfortunately, the flexibility of linker
 scripts means that we can't be sure of that in general, so assume
 that GOTOFF is never valid on VxWorks.  */
+  /* References to weak symbols cannot be resolved locally: they
+may be overridden by a strong definition at link time.  */
   rtx_insn *insn;
   if ((GET_CODE (orig) == LABEL_REF
-  || (GET_CODE (orig) == SYMBOL_REF &&
-  SYMBOL_REF_LOCAL_P (orig)))
+  || (GET_CODE (orig) == SYMBOL_REF
+  && SYMBOL_REF_LOCAL_P (orig)
+  && (SYMBOL_REF_DECL(orig) ? !DECL_WEAK(SYMBOL_REF_DECL(orig)) : 
1)))
  && NEED_GOT_RELOC
  && arm_pic_data_is_text_relative)
insn = arm_pic_static_addr (orig, reg);
@@ -21583,8 

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-12-01 Thread Yuri Rumyantsev
Thanks Richard for your comments.

You asked me about possible performance improvements for AVX2 machines
- we did not see any visible speed-up for spec2k with any method of
masking, including epilogue masking and combining, only on AVX512
machine aka knl.

I will answer on your question later.

Best regards.
Yuri

2016-12-01 14:33 GMT+03:00 Richard Biener :
> On Mon, 28 Nov 2016, Yuri Rumyantsev wrote:
>
>> Richard!
>>
>> I attached vect dump for hte part of attached test-case which
>> illustrated how vectorization of epilogues works through masking:
>> #define SIZE 1023
>> #define ALIGN 64
>>
>> extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment,
>> __SIZE_TYPE__ size) __attribute__((weak));
>> extern void free (void *);
>>
>> void __attribute__((noinline))
>> test_citer (int * __restrict__ a,
>>int * __restrict__ b,
>>int * __restrict__ c)
>> {
>>   int i;
>>
>>   a = (int *)__builtin_assume_aligned (a, ALIGN);
>>   b = (int *)__builtin_assume_aligned (b, ALIGN);
>>   c = (int *)__builtin_assume_aligned (c, ALIGN);
>>
>>   for (i = 0; i < SIZE; i++)
>> c[i] = a[i] + b[i];
>> }
>>
>> It was compiled with -mavx2 --param vect-epilogues-mask=1 options.
>>
>> I did not include in this patch vectorization of low trip-count loops
>> since in the original patch additional parameter was introduced:
>> +DEFPARAM (PARAM_VECT_SHORT_LOOPS,
>> +  "vect-short-loops",
>> +  "Enable vectorization of low trip count loops using masking.",
>> +  0, 0, 1)
>>
>> I assume that this ability can be included very quickly but it
>> requires cost model enhancements also.
>
> Comments on the patch itself (as I'm having a closer look again,
> I know how it vectorizes the above but I wondered why epilogue
> and short-trip loops are not basically the same code path).
>
> Btw, I don't like that the features are behind a --param paywall.
> That just means a) nobody will use it, b) it will bit-rot quickly,
> c) bugs are well-hidden.
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && integer_zerop (nested_in_vect_loop
> +   ? STMT_VINFO_DR_STEP (stmt_info)
> +   : DR_STEP (dr)))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"allow invariant load for masked loop.\n");
> +}
>
> this can test memory_access_type == VMAT_INVARIANT.  Please put
> all the checks in a common
>
>   if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> {
>if (memory_access_type == VMAT_INVARIANT)
>  {
>  }
>else if (...)
>  {
> LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>  }
>else if (..)
> ...
> }
>
> @@ -6667,6 +6756,15 @@ vectorizable_load (gimple *stmt,
> gimple_stmt_iterator *gsi, gimple **vec_stmt,
>gcc_assert (!nested_in_vect_loop);
>gcc_assert (!STMT_VINFO_GATHER_SCATTER_P (stmt_info));
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: grouped access is not"
> +" supported.");
> + LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  }
> +
>
> isn't this already handled by the above?  Or rather the general
> disallowance of SLP?
>
> @@ -5730,6 +5792,24 @@ vectorizable_store (gimple *stmt,
> gimple_stmt_iterator *gsi, gimple **vec_stmt,
> _access_type, _info))
>  return false;
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && memory_access_type != VMAT_CONTIGUOUS)
> +{
> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: unsupported memory access
> type.\n");
> +}
> +
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +  && !can_mask_load_store (stmt))
> +{
> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"cannot be masked: unsupported masked store.\n");
> +}
> +
>
> likewise please combine the ifs.
>
> @@ -2354,7 +2401,10 @@ vectorizable_mask_load_store (gimple *stmt,
> gimple_stmt_iterator *gsi,
>   ptr, vec_mask, vec_rhs);
>   vect_finish_stmt_generation (stmt, new_stmt, gsi);
>   if (i == 0)
> -   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> +   {
> + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> + STMT_VINFO_FIRST_COPY_P (vinfo_for_stmt (new_stmt)) = true;
> +   }
>   else
> STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
>   

Re: Calling 'abort' on bounds violations in libmpx

2016-12-01 Thread Alexander Ivchenko
Should changing minor version of the library be enough?

diff --git a/libmpx/mpxrt/libtool-version b/libmpx/mpxrt/libtool-version
index 7d99255..736d763 100644
--- a/libmpx/mpxrt/libtool-version
+++ b/libmpx/mpxrt/libtool-version
@@ -3,4 +3,4 @@
 # a separate file so that version updates don't involve re-running
 # automake.
 # CURRENT:REVISION:AGE
-2:0:0
+2:1:0

(otherwise - no difference).

I've run make check on a non-mpx-enabled machine (no new regressions)
and manually tested newly added environment variable on the mpx
machine. It looks like there is no explicit tests for libmpx, so I'm
not sure what tests should I add. What do you think would be the right
testing process here?

2016-11-29 20:22 GMT+03:00 Ilya Enkovich :
> 2016-11-29 17:43 GMT+03:00 Alexander Ivchenko :
>> Hi,
>>
>> Attached patch is addressing PR67520. Would that approach work for the
>> problem? Should I also change the version of the library?
>
> Hi!
>
> Overall patch is OK. But you need to change version because you
> change default behavior. How did you test it? Did you check default
> behavior change doesn't affect existing runtime MPX tests? Can we
> add new ones?
>
> Thanks,
> Ilya
>
>>
>> 2016-11-29  Alexander Ivchenko  
>>
>> * mpxrt/mpxrt-utils.c (set_mpx_rt_stop_handler): New function.
>> (print_help): Add help for CHKP_RT_STOP_HANDLER environment
>> variable.
>> (__mpxrt_init_env_vars): Add initialization of stop_handler.
>> (__mpxrt_stop_handler): New function.
>> (__mpxrt_stop): Ditto.
>> * mpxrt/mpxrt-utils.h (mpx_rt_stop_mode_handler_t): New enum.
>>
>>
>>
>> diff --git a/libmpx/mpxrt/mpxrt-utils.c b/libmpx/mpxrt/mpxrt-utils.c
>> index 057a355..63ee7c6 100644
>> --- a/libmpx/mpxrt/mpxrt-utils.c
>> +++ b/libmpx/mpxrt/mpxrt-utils.c
>> @@ -60,6 +60,9 @@
>>  #define MPX_RT_MODE "CHKP_RT_MODE"
>>  #define MPX_RT_MODE_DEFAULT MPX_RT_COUNT
>>  #define MPX_RT_MODE_DEFAULT_STR "count"
>> +#define MPX_RT_STOP_HANDLER "CHKP_RT_STOP_HANDLER"
>> +#define MPX_RT_STOP_HANDLER_DEFAULT MPX_RT_STOP_HANDLER_ABORT
>> +#define MPX_RT_STOP_HANDLER_DEFAULT_STR "abort"
>>  #define MPX_RT_HELP "CHKP_RT_HELP"
>>  #define MPX_RT_ADDPID "CHKP_RT_ADDPID"
>>  #define MPX_RT_BNDPRESERVE "CHKP_RT_BNDPRESERVE"
>> @@ -84,6 +87,7 @@ typedef struct {
>>  static int summary;
>>  static int add_pid;
>>  static mpx_rt_mode_t mode;
>> +static mpx_rt_stop_mode_handler_t stop_handler;
>>  static env_var_list_t env_var_list;
>>  static verbose_type verbose_val;
>>  static FILE *out;
>> @@ -226,6 +230,23 @@ set_mpx_rt_mode (const char *env)
>>}
>>  }
>>
>> +static mpx_rt_stop_mode_handler_t
>> +set_mpx_rt_stop_handler (const char *env)
>> +{
>> +  if (env == 0)
>> +return MPX_RT_STOP_HANDLER_DEFAULT;
>> +  else if (strcmp (env, "abort") == 0)
>> +return MPX_RT_STOP_HANDLER_ABORT;
>> +  else if (strcmp (env, "exit") == 0)
>> +return MPX_RT_STOP_HANDLER_EXIT;
>> +  {
>> +__mpxrt_print (VERB_ERROR, "Illegal value '%s' for %s. Legal values are"
>> +   "[abort | exit]\nUsing default value %s\n",
>> +   env, MPX_RT_STOP_HANDLER, MPX_RT_STOP_HANDLER_DEFAULT);
>> +return MPX_RT_STOP_HANDLER_DEFAULT;
>> +  }
>> +}
>> +
>>  static void
>>  print_help (void)
>>  {
>> @@ -244,6 +265,11 @@ print_help (void)
>>fprintf (out, "%s \t\t set MPX runtime behavior on #BR exception."
>> " [stop | count]\n"
>> "\t\t\t [default: %s]\n", MPX_RT_MODE, MPX_RT_MODE_DEFAULT_STR);
>> +  fprintf (out, "%s \t set the handler function MPX runtime will call\n"
>> +   "\t\t\t on #BR exception when %s is set to \'stop\'."
>> +   " [abort | exit]\n"
>> +   "\t\t\t [default: %s]\n", MPX_RT_STOP_HANDLER, MPX_RT_MODE,
>> +   MPX_RT_STOP_HANDLER_DEFAULT_STR);
>>fprintf (out, "%s \t\t generate out,err file for each process.\n"
>> "\t\t\t generated file will be MPX_RT_{OUT,ERR}_FILE.pid\n"
>> "\t\t\t [default: no]\n", MPX_RT_ADDPID);
>> @@ -357,6 +383,10 @@ __mpxrt_init_env_vars (int* bndpreserve)
>>env_var_list_add (MPX_RT_MODE, env);
>>mode = set_mpx_rt_mode (env);
>>
>> +  env = secure_getenv (MPX_RT_STOP_HANDLER);
>> +  env_var_list_add (MPX_RT_STOP_HANDLER, env);
>> +  stop_handler = set_mpx_rt_stop_handler (env);
>> +
>>env = secure_getenv (MPX_RT_BNDPRESERVE);
>>env_var_list_add (MPX_RT_BNDPRESERVE, env);
>>validate_bndpreserve (env, bndpreserve);
>> @@ -487,6 +517,22 @@ __mpxrt_mode (void)
>>return mode;
>>  }
>>
>> +mpx_rt_mode_t
>> +__mpxrt_stop_handler (void)
>> +{
>> +  return stop_handler;
>> +}
>> +
>> +void __attribute__ ((noreturn))
>> +__mpxrt_stop (void)
>> +{
>> +  if (__mpxrt_stop_handler () == MPX_RT_STOP_HANDLER_ABORT)
>> +abort ();
>> +  else if (__mpxrt_stop_handler () == MPX_RT_STOP_HANDLER_EXIT)
>> +exit (255);
>> +  __builtin_unreachable ();
>> +}
>> +
>>  void
>>  __mpxrt_print_summary (uint64_t num_brs, uint64_t l1_size)
>>  {
>> diff --git a/libmpx/mpxrt/mpxrt-utils.h b/libmpx/mpxrt/mpxrt-utils.h
>> index 

Re: [PATCH] PR rtl-optimization/78596 - combine.c:12561:14: runtime error: left shift of negative value

2016-12-01 Thread Segher Boessenkool
On Thu, Dec 01, 2016 at 01:34:29PM +0100, Markus Trippelsdorf wrote:
> Hopefully one last patch for UB in combine.c:
> 
>  combine.c:12561:14: runtime error: left shift of negative value -9
> 
> Fixed by casting to unsigned, as usual.
> 
> Tested on ppc64le.
> OK for trunk?

Sure, but please fix the indentation of that last new line (and of the
changelog, too, while you're at it ;-) )


Segher


> -   const_op <<= INTVAL (XEXP (op0, 1));
> +   const_op = (unsigned HOST_WIDE_INT) const_op
> +   << INTVAL (XEXP (op0, 1));


Re: [PATCH 5/9] Introduce selftest::locate_file (v4)

2016-12-01 Thread Bernd Schmidt

On 11/11/2016 10:15 PM, David Malcolm wrote:

+  /* Makefile.in has -fself-test=$(srcdir)/testsuite/selftests, so that
+ flag_self_test contains the path to the selftest subdirectory of the
+ source tree (without a trailing slash).  Copy it up to
+ path_to_selftest_files, to avoid selftest.c depending on
+ option-handling.  */
+  path_to_selftest_files = flag_self_test;
+


What kind of dependency are you avoiding? If it's just one include I'd 
prefer to get rid of the extraneous variable.


Otherwise ok.


Bernd



Re: PR78629

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> Hi Richard,
> I tested your fix for the patch with ubsan stage-1 built gcc, and it
> fixes the error.
> Is it OK to commit if bootstrap+test passes on x86_64-unknown-linux-gnu ?

Ok.

Richard.


PR78629

2016-12-01 Thread Prathamesh Kulkarni
Hi Richard,
I tested your fix for the patch with ubsan stage-1 built gcc, and it
fixes the error.
Is it OK to commit if bootstrap+test passes on x86_64-unknown-linux-gnu ?

Thanks,
Prathamesh
2016-12-01  Richard Biener  
Prathamesh Kulkarni  

PR middle-end/78629
* vec.h (vec::quick_grow_cleared): Guard call to
memset if len-oldlen != 0.
(vec::safe_grow_cleared): Likewise.

diff --git a/gcc/vec.h b/gcc/vec.h
index 14fb2a6..aa93411 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1092,8 +1092,10 @@ inline void
 vec::quick_grow_cleared (unsigned len)
 {
   unsigned oldlen = length ();
+  size_t sz = sizeof (T) * (len - oldlen);
   quick_grow (len);
-  memset (&(address ()[oldlen]), 0, sizeof (T) * (len - oldlen));
+  if (sz != 0)
+memset (&(address ()[oldlen]), 0, sz);
 }
 
 
@@ -1605,8 +1607,10 @@ inline void
 vec::safe_grow_cleared (unsigned len MEM_STAT_DECL)
 {
   unsigned oldlen = length ();
+  size_t sz = sizeof (T) * (len - oldlen);
   safe_grow (len PASS_MEM_STAT);
-  memset (&(address ()[oldlen]), 0, sizeof (T) * (len - oldlen));
+  if (sz != 0)
+memset (&(address ()[oldlen]), 0, sz);
 }
 
 


Re: [tree-tailcall] Check if function returns it's argument

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:38, Richard Biener  wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 18:26, Richard Biener  wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 1 December 2016 at 17:40, Richard Biener  wrote:
> >> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
> >> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >> >
> >> >> >> >>> For the tail-call, issue should we artificially create a lhs and 
> >> >> >> >>> use that
> >> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >> >>>
> >> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >> >>> return a1;
> >> >> >> >>>
> >> >> >> >>> gets transformed to:
> >> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >> >>> return _1;
> >> >> >> >>>
> >> >> >> >>> So tail-call optimization pass would see the IL in it's expected 
> >> >> >> >>> form.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> >> itself should do this rewrite.  But if this form is required to 
> >> >> >> >> make
> >> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> >> verified again there?
> >> >> >> >
> >> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> >> > Essentially it's an expand-time decision as we use information 
> >> >> >> > from trees as
> >> >> >> > well as low level target information.
> >> >> >> >
> >> >> >> > I would not expect the former sequence to tail call.  The tail 
> >> >> >> > calling code
> >> >> >> > does not know that the return value from memcpy will be a1.  Thus 
> >> >> >> > the tail
> >> >> >> > calling code has to assume that it'll have to copy a1 into the 
> >> >> >> > return
> >> >> >> > register after returning from memcpy, which obviously can't be 
> >> >> >> > done if we
> >> >> >> > tail called memcpy.
> >> >> >> >
> >> >> >> > The second form is much more likely to turn into a tail call 
> >> >> >> > sequence
> >> >> >> > because the return value from memcpy will be sitting in the proper 
> >> >> >> > register.
> >> >> >> > This form out to work for most calling conventions that allow tail 
> >> >> >> > calls.
> >> >> >> >
> >> >> >> > We could (in theory) try and exploit the fact that memcpy returns 
> >> >> >> > its first
> >> >> >> > argument as a return value, but that would only be helpful on a 
> >> >> >> > target where
> >> >> >> > the first argument and return value use the same register. So I'd 
> >> >> >> > have a
> >> >> >> > slight preference to rewriting per Prathamesh's suggestion above 
> >> >> >> > since it's
> >> >> >> > more general.
> >> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> >> >> and returns it if the function returns it's argument and that 
> >> >> >> argument
> >> >> >> is used as return-value.
> >> >> >>
> >> >> >> eg:
> >> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> >> {
> >> >> >>[0.0%]:
> >> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >> >>   # VUSE <.MEM_5>
> >> >> >>   return a1_2(D);
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> is transformed to:
> >> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> >> {
> >> >> >>   void * _6;
> >> >> >>
> >> >> >>[0.0%]:
> >> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >> >>   # VUSE <.MEM_5>
> >> >> >>   return _6;
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> While testing, I came across an issue with function f() defined
> >> >> >> intail-padding1.C:
> >> >> >> struct X
> >> >> >> {
> >> >> >>   ~X() {}
> >> >> >>   int n;
> >> >> >>   char d;
> >> >> >> };
> >> >> >>
> >> >> >> X f()
> >> >> >> {
> >> >> >>   X nrvo;
> >> >> >>   __builtin_memset (, 0, sizeof(X));
> >> >> >>   return nrvo;
> >> >> >> }
> >> >> >>
> >> >> >> input to the pass:
> >> >> >> X f() ()
> >> >> >> {
> >> >> >>[0.0%]:
> >> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >> >>   # VUSE <.MEM_3>
> >> >> >>   return nrvo_2(D);
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> verify_gimple_return failed with:
> >> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >> >>  }
> >> >> >>  ^
> >> >> >> struct X
> >> >> >>
> >> >> >> struct X &
> >> >> >>
> >> >> >> # VUSE <.MEM_3>
> >> >> >> return _4;
> >> >> >>
> >> >> >> It seems the return type of function (struct X) 

Re: [tree-tailcall] Check if function returns it's argument

2016-12-01 Thread Prathamesh Kulkarni
On 1 December 2016 at 18:38, Richard Biener  wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 18:26, Richard Biener  wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 1 December 2016 at 17:40, Richard Biener  wrote:
>> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
>> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >> >
>> >> >> >>> For the tail-call, issue should we artificially create a lhs and 
>> >> >> >>> use that
>> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >> >>>
>> >> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >> >>> return a1;
>> >> >> >>>
>> >> >> >>> gets transformed to:
>> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >> >>> return _1;
>> >> >> >>>
>> >> >> >>> So tail-call optimization pass would see the IL in it's expected 
>> >> >> >>> form.
>> >> >> >>
>> >> >> >>
>> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> >> verified again there?
>> >> >> >
>> >> >> > So tail calling actually sits on the border between trees and RTL.
>> >> >> > Essentially it's an expand-time decision as we use information from 
>> >> >> > trees as
>> >> >> > well as low level target information.
>> >> >> >
>> >> >> > I would not expect the former sequence to tail call.  The tail 
>> >> >> > calling code
>> >> >> > does not know that the return value from memcpy will be a1.  Thus 
>> >> >> > the tail
>> >> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> >> > register after returning from memcpy, which obviously can't be done 
>> >> >> > if we
>> >> >> > tail called memcpy.
>> >> >> >
>> >> >> > The second form is much more likely to turn into a tail call sequence
>> >> >> > because the return value from memcpy will be sitting in the proper 
>> >> >> > register.
>> >> >> > This form out to work for most calling conventions that allow tail 
>> >> >> > calls.
>> >> >> >
>> >> >> > We could (in theory) try and exploit the fact that memcpy returns 
>> >> >> > its first
>> >> >> > argument as a return value, but that would only be helpful on a 
>> >> >> > target where
>> >> >> > the first argument and return value use the same register. So I'd 
>> >> >> > have a
>> >> >> > slight preference to rewriting per Prathamesh's suggestion above 
>> >> >> > since it's
>> >> >> > more general.
>> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
>> >> >> and returns it if the function returns it's argument and that argument
>> >> >> is used as return-value.
>> >> >>
>> >> >> eg:
>> >> >> f (void * a1, void * a2, long unsigned int a3)
>> >> >> {
>> >> >>[0.0%]:
>> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >> >>   # VUSE <.MEM_5>
>> >> >>   return a1_2(D);
>> >> >>
>> >> >> }
>> >> >>
>> >> >> is transformed to:
>> >> >> f (void * a1, void * a2, long unsigned int a3)
>> >> >> {
>> >> >>   void * _6;
>> >> >>
>> >> >>[0.0%]:
>> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >> >>   # VUSE <.MEM_5>
>> >> >>   return _6;
>> >> >>
>> >> >> }
>> >> >>
>> >> >> While testing, I came across an issue with function f() defined
>> >> >> intail-padding1.C:
>> >> >> struct X
>> >> >> {
>> >> >>   ~X() {}
>> >> >>   int n;
>> >> >>   char d;
>> >> >> };
>> >> >>
>> >> >> X f()
>> >> >> {
>> >> >>   X nrvo;
>> >> >>   __builtin_memset (, 0, sizeof(X));
>> >> >>   return nrvo;
>> >> >> }
>> >> >>
>> >> >> input to the pass:
>> >> >> X f() ()
>> >> >> {
>> >> >>[0.0%]:
>> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >> >>   # VUSE <.MEM_3>
>> >> >>   return nrvo_2(D);
>> >> >>
>> >> >> }
>> >> >>
>> >> >> verify_gimple_return failed with:
>> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >> >>  }
>> >> >>  ^
>> >> >> struct X
>> >> >>
>> >> >> struct X &
>> >> >>
>> >> >> # VUSE <.MEM_3>
>> >> >> return _4;
>> >> >>
>> >> >> It seems the return type of function (struct X) differs with the type
>> >> >> of return value (struct X&).
>> >> >> Not sure how this is possible ?
>> >> >
>> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> >> resolved the error.
>> >> Does the attached version look OK ?
>> >
>> > + ass_var = make_ssa_name 

Re: [PATCH] ira: Don't substitute into TRAP_IF insns (PR78610)

2016-12-01 Thread Segher Boessenkool
On Thu, Dec 01, 2016 at 12:24:37PM +0100, Paolo Bonzini wrote:
> 
> 
> On 30/11/2016 13:46, Segher Boessenkool wrote:
> >if (JUMP_P (use_insn))
> > continue;
> >  
> > +  /* Also don't substitute into a conditional trap insn -- it can 
> > become
> > +an unconditional trap, and that is a flow control insn.  */
> > +  if (GET_CODE (PATTERN (use_insn)) == TRAP_IF)
> > +   continue;
> 
> Should there be a predicate that catches JUMP_Ps but also TRAP_IF?

Maybe.  A conditional TRAP_IF is quite unlike a JUMP, and having two
separate statements here is handy because I need to put that comment
somewhere ;-)


Segher


Re: [tree-tailcall] Check if function returns it's argument

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:26, Richard Biener  wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 17:40, Richard Biener  wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >
> >> >> >>> For the tail-call, issue should we artificially create a lhs and 
> >> >> >>> use that
> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >>>
> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >>> return a1;
> >> >> >>>
> >> >> >>> gets transformed to:
> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >>> return _1;
> >> >> >>>
> >> >> >>> So tail-call optimization pass would see the IL in it's expected 
> >> >> >>> form.
> >> >> >>
> >> >> >>
> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> verified again there?
> >> >> >
> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> > Essentially it's an expand-time decision as we use information from 
> >> >> > trees as
> >> >> > well as low level target information.
> >> >> >
> >> >> > I would not expect the former sequence to tail call.  The tail 
> >> >> > calling code
> >> >> > does not know that the return value from memcpy will be a1.  Thus the 
> >> >> > tail
> >> >> > calling code has to assume that it'll have to copy a1 into the return
> >> >> > register after returning from memcpy, which obviously can't be done 
> >> >> > if we
> >> >> > tail called memcpy.
> >> >> >
> >> >> > The second form is much more likely to turn into a tail call sequence
> >> >> > because the return value from memcpy will be sitting in the proper 
> >> >> > register.
> >> >> > This form out to work for most calling conventions that allow tail 
> >> >> > calls.
> >> >> >
> >> >> > We could (in theory) try and exploit the fact that memcpy returns its 
> >> >> > first
> >> >> > argument as a return value, but that would only be helpful on a 
> >> >> > target where
> >> >> > the first argument and return value use the same register. So I'd 
> >> >> > have a
> >> >> > slight preference to rewriting per Prathamesh's suggestion above 
> >> >> > since it's
> >> >> > more general.
> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> >> and returns it if the function returns it's argument and that argument
> >> >> is used as return-value.
> >> >>
> >> >> eg:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>[0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return a1_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> is transformed to:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>   void * _6;
> >> >>
> >> >>[0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return _6;
> >> >>
> >> >> }
> >> >>
> >> >> While testing, I came across an issue with function f() defined
> >> >> intail-padding1.C:
> >> >> struct X
> >> >> {
> >> >>   ~X() {}
> >> >>   int n;
> >> >>   char d;
> >> >> };
> >> >>
> >> >> X f()
> >> >> {
> >> >>   X nrvo;
> >> >>   __builtin_memset (, 0, sizeof(X));
> >> >>   return nrvo;
> >> >> }
> >> >>
> >> >> input to the pass:
> >> >> X f() ()
> >> >> {
> >> >>[0.0%]:
> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >>   # VUSE <.MEM_3>
> >> >>   return nrvo_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> verify_gimple_return failed with:
> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >>  }
> >> >>  ^
> >> >> struct X
> >> >>
> >> >> struct X &
> >> >>
> >> >> # VUSE <.MEM_3>
> >> >> return _4;
> >> >>
> >> >> It seems the return type of function (struct X) differs with the type
> >> >> of return value (struct X&).
> >> >> Not sure how this is possible ?
> >> >
> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> >> resolved the error.
> >> Does the attached version look OK ?
> >
> > + ass_var = make_ssa_name (TREE_TYPE (arg));
> >
> > can you try
> >
> >   ass_var = copy_ssa_name (arg);
> >
> > instead?  That way the underlying decl should make sure the
> > DECL_BY_REFERENCE check in the IL verification works.
> 

Re: [PATCH PR78559][RFC]Proposed fix

2016-12-01 Thread Segher Boessenkool
Hi!

On Thu, Dec 01, 2016 at 09:47:51AM +, Bin Cheng wrote:
> After investigation, I believe PR78559 is a combine issue revealed by tree 
> level change.  Root causes is after replacing CC register use in 
> undobuf.other_insn, its REG_EQUAL/REG_EQUIV notes are no longer valid because 
> meaning of CC register has changed in i2/i3 instructions by combine.  For 
> following combine sequence, GCC would try to use the note and result in wrong 
> code.  This is a proposed patch discarding all REG_EQUAL/REG_EQUIV notes for 
> other_insn.  It might be a over-kill, but it's difficult to analyze whether 
> registers have been changed or not?  Bootstrap and test on x86_64 and 
> AArch64, any suggestion on how to fix this?
> 

Why is distribute_notes not called on this?  (Search for "We now know",
25 lines down).  Ah, it is only called on the *new* notes, and it only
deletes existing REG_DEAD/REG_UNUSED notes.  You probably should delete
a REG_EQ* here if the reg it refers to is not the same anymore / does
not have the same contents.  Well, except you cannot see the latter here,
so you'll have to kill the note where the SET of the CC is changed.

I'll have a look later; maybe this already helps though.


Segher


Re: [tree-tailcall] Check if function returns it's argument

2016-12-01 Thread Prathamesh Kulkarni
On 1 December 2016 at 18:26, Richard Biener  wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 17:40, Richard Biener  wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
>> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >
>> >> >>> For the tail-call, issue should we artificially create a lhs and use 
>> >> >>> that
>> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >>>
>> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >>> return a1;
>> >> >>>
>> >> >>> gets transformed to:
>> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >>> return _1;
>> >> >>>
>> >> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >> >>
>> >> >>
>> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> verified again there?
>> >> >
>> >> > So tail calling actually sits on the border between trees and RTL.
>> >> > Essentially it's an expand-time decision as we use information from 
>> >> > trees as
>> >> > well as low level target information.
>> >> >
>> >> > I would not expect the former sequence to tail call.  The tail calling 
>> >> > code
>> >> > does not know that the return value from memcpy will be a1.  Thus the 
>> >> > tail
>> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> > register after returning from memcpy, which obviously can't be done if 
>> >> > we
>> >> > tail called memcpy.
>> >> >
>> >> > The second form is much more likely to turn into a tail call sequence
>> >> > because the return value from memcpy will be sitting in the proper 
>> >> > register.
>> >> > This form out to work for most calling conventions that allow tail 
>> >> > calls.
>> >> >
>> >> > We could (in theory) try and exploit the fact that memcpy returns its 
>> >> > first
>> >> > argument as a return value, but that would only be helpful on a target 
>> >> > where
>> >> > the first argument and return value use the same register. So I'd have a
>> >> > slight preference to rewriting per Prathamesh's suggestion above since 
>> >> > it's
>> >> > more general.
>> >> Thanks for the suggestion. The attached patch creates artificial lhs,
>> >> and returns it if the function returns it's argument and that argument
>> >> is used as return-value.
>> >>
>> >> eg:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>[0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return a1_2(D);
>> >>
>> >> }
>> >>
>> >> is transformed to:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>   void * _6;
>> >>
>> >>[0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return _6;
>> >>
>> >> }
>> >>
>> >> While testing, I came across an issue with function f() defined
>> >> intail-padding1.C:
>> >> struct X
>> >> {
>> >>   ~X() {}
>> >>   int n;
>> >>   char d;
>> >> };
>> >>
>> >> X f()
>> >> {
>> >>   X nrvo;
>> >>   __builtin_memset (, 0, sizeof(X));
>> >>   return nrvo;
>> >> }
>> >>
>> >> input to the pass:
>> >> X f() ()
>> >> {
>> >>[0.0%]:
>> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >>   # VUSE <.MEM_3>
>> >>   return nrvo_2(D);
>> >>
>> >> }
>> >>
>> >> verify_gimple_return failed with:
>> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >>  }
>> >>  ^
>> >> struct X
>> >>
>> >> struct X &
>> >>
>> >> # VUSE <.MEM_3>
>> >> return _4;
>> >>
>> >> It seems the return type of function (struct X) differs with the type
>> >> of return value (struct X&).
>> >> Not sure how this is possible ?
>> >
>> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> resolved the error.
>> Does the attached version look OK ?
>
> + ass_var = make_ssa_name (TREE_TYPE (arg));
>
> can you try
>
>   ass_var = copy_ssa_name (arg);
>
> instead?  That way the underlying decl should make sure the
> DECL_BY_REFERENCE check in the IL verification works.
Done in the attached version and verified tail-padding1.C passes with
the change.
Does it look OK ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>
>> Validation in progress.
>>
>> Thanks,
>> Prathamesh
>> >
>> >> To work around that, I guarded the transform on:
>> >> useless_type_conversion_p 

Re: [tree-tailcall] Check if function returns it's argument

2016-12-01 Thread Richard Biener
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 17:40, Richard Biener  wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 25 November 2016 at 21:17, Jeff Law  wrote:
> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >
> >> >>> For the tail-call, issue should we artificially create a lhs and use 
> >> >>> that
> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >>>
> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >>> return a1;
> >> >>>
> >> >>> gets transformed to:
> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >>> return _1;
> >> >>>
> >> >>> So tail-call optimization pass would see the IL in it's expected form.
> >> >>
> >> >>
> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> verified again there?
> >> >
> >> > So tail calling actually sits on the border between trees and RTL.
> >> > Essentially it's an expand-time decision as we use information from 
> >> > trees as
> >> > well as low level target information.
> >> >
> >> > I would not expect the former sequence to tail call.  The tail calling 
> >> > code
> >> > does not know that the return value from memcpy will be a1.  Thus the 
> >> > tail
> >> > calling code has to assume that it'll have to copy a1 into the return
> >> > register after returning from memcpy, which obviously can't be done if we
> >> > tail called memcpy.
> >> >
> >> > The second form is much more likely to turn into a tail call sequence
> >> > because the return value from memcpy will be sitting in the proper 
> >> > register.
> >> > This form out to work for most calling conventions that allow tail calls.
> >> >
> >> > We could (in theory) try and exploit the fact that memcpy returns its 
> >> > first
> >> > argument as a return value, but that would only be helpful on a target 
> >> > where
> >> > the first argument and return value use the same register. So I'd have a
> >> > slight preference to rewriting per Prathamesh's suggestion above since 
> >> > it's
> >> > more general.
> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> and returns it if the function returns it's argument and that argument
> >> is used as return-value.
> >>
> >> eg:
> >> f (void * a1, void * a2, long unsigned int a3)
> >> {
> >>[0.0%]:
> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >>   # VUSE <.MEM_5>
> >>   return a1_2(D);
> >>
> >> }
> >>
> >> is transformed to:
> >> f (void * a1, void * a2, long unsigned int a3)
> >> {
> >>   void * _6;
> >>
> >>[0.0%]:
> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >>   # VUSE <.MEM_5>
> >>   return _6;
> >>
> >> }
> >>
> >> While testing, I came across an issue with function f() defined
> >> intail-padding1.C:
> >> struct X
> >> {
> >>   ~X() {}
> >>   int n;
> >>   char d;
> >> };
> >>
> >> X f()
> >> {
> >>   X nrvo;
> >>   __builtin_memset (, 0, sizeof(X));
> >>   return nrvo;
> >> }
> >>
> >> input to the pass:
> >> X f() ()
> >> {
> >>[0.0%]:
> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >>   # VUSE <.MEM_3>
> >>   return nrvo_2(D);
> >>
> >> }
> >>
> >> verify_gimple_return failed with:
> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >>  }
> >>  ^
> >> struct X
> >>
> >> struct X &
> >>
> >> # VUSE <.MEM_3>
> >> return _4;
> >>
> >> It seems the return type of function (struct X) differs with the type
> >> of return value (struct X&).
> >> Not sure how this is possible ?
> >
> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> resolved the error.
> Does the attached version look OK ?

+ ass_var = make_ssa_name (TREE_TYPE (arg));

can you try

  ass_var = copy_ssa_name (arg);

instead?  That way the underlying decl should make sure the
DECL_BY_REFERENCE check in the IL verification works.

Thanks,
Richard.


> Validation in progress.
> 
> Thanks,
> Prathamesh
> >
> >> To work around that, I guarded the transform on:
> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
> >>  TREE_TYPE (retval)))
> >>
> >> in the patch. Does that look OK ?
> >>
> >> Bootstrap+tested on x86_64-unknown-linux-gnu with 
> >> --enable-languages=all,ada.
> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> >
> >> > Jeff
> >>
> >
> > --
> > Richard Biener 
> > SUSE LINUX GmbH, GF: 

Re: [tree-tailcall] Check if function returns it's argument

2016-12-01 Thread Prathamesh Kulkarni
On 1 December 2016 at 17:40, Richard Biener  wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 25 November 2016 at 21:17, Jeff Law  wrote:
>> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >
>> >>> For the tail-call, issue should we artificially create a lhs and use that
>> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >>>
>> >>> __builtin_memcpy (a1, a2, a3);
>> >>> return a1;
>> >>>
>> >>> gets transformed to:
>> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >>> return _1;
>> >>>
>> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >>
>> >>
>> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> itself should do this rewrite.  But if this form is required to make
>> >> things work (I suppose you checked it _does_ actually work?) then
>> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> verified again there?
>> >
>> > So tail calling actually sits on the border between trees and RTL.
>> > Essentially it's an expand-time decision as we use information from trees 
>> > as
>> > well as low level target information.
>> >
>> > I would not expect the former sequence to tail call.  The tail calling code
>> > does not know that the return value from memcpy will be a1.  Thus the tail
>> > calling code has to assume that it'll have to copy a1 into the return
>> > register after returning from memcpy, which obviously can't be done if we
>> > tail called memcpy.
>> >
>> > The second form is much more likely to turn into a tail call sequence
>> > because the return value from memcpy will be sitting in the proper 
>> > register.
>> > This form out to work for most calling conventions that allow tail calls.
>> >
>> > We could (in theory) try and exploit the fact that memcpy returns its first
>> > argument as a return value, but that would only be helpful on a target 
>> > where
>> > the first argument and return value use the same register. So I'd have a
>> > slight preference to rewriting per Prathamesh's suggestion above since it's
>> > more general.
>> Thanks for the suggestion. The attached patch creates artificial lhs,
>> and returns it if the function returns it's argument and that argument
>> is used as return-value.
>>
>> eg:
>> f (void * a1, void * a2, long unsigned int a3)
>> {
>>[0.0%]:
>>   # .MEM_5 = VDEF <.MEM_1(D)>
>>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>>   # VUSE <.MEM_5>
>>   return a1_2(D);
>>
>> }
>>
>> is transformed to:
>> f (void * a1, void * a2, long unsigned int a3)
>> {
>>   void * _6;
>>
>>[0.0%]:
>>   # .MEM_5 = VDEF <.MEM_1(D)>
>>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>>   # VUSE <.MEM_5>
>>   return _6;
>>
>> }
>>
>> While testing, I came across an issue with function f() defined
>> intail-padding1.C:
>> struct X
>> {
>>   ~X() {}
>>   int n;
>>   char d;
>> };
>>
>> X f()
>> {
>>   X nrvo;
>>   __builtin_memset (, 0, sizeof(X));
>>   return nrvo;
>> }
>>
>> input to the pass:
>> X f() ()
>> {
>>[0.0%]:
>>   # .MEM_3 = VDEF <.MEM_1(D)>
>>   __builtin_memset (nrvo_2(D), 0, 8);
>>   # VUSE <.MEM_3>
>>   return nrvo_2(D);
>>
>> }
>>
>> verify_gimple_return failed with:
>> tail-padding1.C:13:1: error: invalid conversion in return statement
>>  }
>>  ^
>> struct X
>>
>> struct X &
>>
>> # VUSE <.MEM_3>
>> return _4;
>>
>> It seems the return type of function (struct X) differs with the type
>> of return value (struct X&).
>> Not sure how this is possible ?
>
> You need to honor DECL_BY_REFERENCE of DECL_RESULT.
Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
resolved the error.
Does the attached version look OK ?
Validation in progress.

Thanks,
Prathamesh
>
>> To work around that, I guarded the transform on:
>> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>>  TREE_TYPE (retval)))
>>
>> in the patch. Does that look OK ?
>>
>> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
>> Cross-tested on arm*-*-*, aarch64*-*-*.
>>
>> Thanks,
>> Prathamesh
>> >
>> >
>> > Jeff
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..a1c8bd7 100644

[PATCH] Fix PR tree-optimization/78598 - tree-ssa-loop-prefetch.c:835:16: runtime error: signed integer overflow

2016-12-01 Thread Markus Trippelsdorf
Using bootstrap-ubsan gcc to build mplayer shows:

tree-ssa-loop-prefetch.c:835:16: runtime error: signed integer overflow:
288230376151711743 * 64 cannot be represented in type 'long int'

Here signed und unsigned integers are mixed in a division resulting in
bogus results: (-83 + 64ULL -1) / 64ULL) == 288230376151711743

Fixed by casting the unsigned parameter to signed.

Tested on ppc64le.
OK for trunk?

Thanks.

PR tree-optimization/78598
* tree-ssa-loop-prefetch.c (ddown): Cast to signed to avoid
overflows.


diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 0a2ee5ea25fd..ead2543ada46 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -700,9 +700,9 @@ ddown (HOST_WIDE_INT x, unsigned HOST_WIDE_INT by)
   gcc_assert (by > 0);
 
   if (x >= 0)
-return x / by;
+return x / (HOST_WIDE_INT) by;
   else
-return (x + by - 1) / by;
+return (x + (HOST_WIDE_INT) by - 1) / (HOST_WIDE_INT) by;
 }
 
 /* Given a CACHE_LINE_SIZE and two inductive memory references
-- 
Markus


Re: PR78599

2016-12-01 Thread Richard Biener
On Thu, Dec 1, 2016 at 11:07 AM, Prathamesh Kulkarni
 wrote:
> Hi,
> As mentioned in PR, the issue seems to be that in
> propagate_bits_accross_jump_functions(),
> ipa_get_type() returns record_type during WPA and hence we pass
> invalid precision to
> ipcp_bits_lattice::meet_with (value, mask, precision) which eventually
> leads to runtime error.
> The attached patch tries to fix that, by bailing out if type of param
> is not integral or pointer type.
> This happens for the edge from deque_test -> _Z4copyIPd1BEvT_S2_T0_.isra.0/9.

Feels more like a DECL_BY_REFERENCE mishandling and should be fixed elsewhere.

> However I am not sure how ipcp_bits_lattice::meet_with (value, mask,
> precision) gets called for this case. In
> ipa_compute_jump_functions_for_edge(), we set jfunc->bits.known to
> true only
> if parm's type satisfies INTEGRAL_TYPE_P or POINTER_TYPE_P.
> And ipcp_bits_lattice::meet_with (value, mask, precision) is called
> only if jfunc->bits.known
> is set to true. So I suppose it shouldn't really happen that
> ipcp_bits_lattice::meet_with(value, mask, precision) gets called when
> callee parameter's type is record_type, since the corresponding
> argument's type would also need to be record_type and
> jfunc->bits.known would be set to false.
>
> Without -flto, parm_type is reference_type so that satisfies POINTER_TYPE_P,
> but with -flto it's appearing to be record_type. Is this possibly the
> same issue of TYPE_ARG_TYPES returning bogus types during WPA ?
>
> I verified the attached patch fixes the runtime error with ubsan-built gcc.
> Bootstrap+tested on x86_64-unknown-linux-gnu.
> Cross-tested on arm*-*-*, aarch64*-*-*.
> LTO bootstrap on x86_64-unknown-linux-gnu in progress.
> Is it OK to commit if it succeeds ?
>
> Thanks,
> Prathamesh


Re: [PATCH] Minimal reimplementation of errors.c within read-md.c

2016-12-01 Thread Bernd Schmidt

On 11/30/2016 09:24 PM, David Malcolm wrote:


gcc/ChangeLog:
* read-md.c (have_error): New global, copied from errors.c.
(fatal): New function, copied from errors.c.


I would have expected the function to go into diagnostic.c, but actually 
there are already various functions of this sort in read-md. I'd request 
you place it near fatal_at, and maybe add this to errors.h:


inline bool seen_error ()
{
  return have_error;
}

and replace explicit uses of have_error with that to unify things a bit.


Bernd


[PATCH] PR rtl-optimization/78596 - combine.c:12561:14: runtime error: left shift of negative value

2016-12-01 Thread Markus Trippelsdorf
Hopefully one last patch for UB in combine.c:

 combine.c:12561:14: runtime error: left shift of negative value -9

Fixed by casting to unsigned, as usual.

Tested on ppc64le.
OK for trunk?

Thanks.

PR rtl-optimization/78596
* combine.c (simplify_comparison): Cast to unsigned to avoid
left shifting of negative value.

diff --git a/gcc/combine.c b/gcc/combine.c
index faafcb741f41..e32c02b06810 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -12561,7 +12561,8 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx 
*pop1)
  if (GET_CODE (op0) == LSHIFTRT)
code = unsigned_condition (code);

- const_op <<= INTVAL (XEXP (op0, 1));
+ const_op = (unsigned HOST_WIDE_INT) const_op
+ << INTVAL (XEXP (op0, 1));
  if (low_bits != 0
  && (code == GT || code == GTU
  || code == LE || code == LEU))

--
Markus


  1   2   >