RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-15 Thread Ajit Kumar Agarwal
The patch is modified based on the review comments from Richard, Bernd and 
David. 

The following changes incorporate the comments received on the previous 
mail.

1. With this patch the liveness of the loop is no longer stored in the
LOOPDATA structures.  Liveness is calculated for the current loop for
which the invariant is checked, and regs_used is derived from it.
2. Memory leaks are fixed.
3. Reworked the comments section based on Bernd's comments.

Bootstrapped and regtested for the i386 and MicroBlaze targets.

SPEC CPU 2000 benchmarks were run on the i386 target; the following is a
summary of the results.

SPEC CPU 2000 INT benchmarks.

(Geomean score without the change vs. geomean score with the reg pressure
change = 3745.193 vs. 3745.328)

SPEC CPU 2000 FP benchmarks.

(Geomean score without the change vs. geomean score with the reg pressure
change = 4741.825 vs. 4748.364)


[Patch,rtl Optimization]: Better register pressure estimate for loop 
invariant code motion

Calculate the loop liveness of the registers used, for computing the
register pressure in the cost estimation.  Loop liveness is based on the
following property: we only need to find the set of objects that are live
at the birth, or header, of the loop.  We don't need to calculate liveness
through the loop by considering the live-in and live-out sets of all of
its basic blocks, because the set of objects that are live-in at the
header of the loop will be live-in at every node in the loop.

If a variable v is live-in at the header of the loop, then it is live-in
at every node in the loop.  To prove this, consider a loop L with header h
such that the variable v defined at d is live-in at h.  Since v is live at
h, d is not part of L.  This follows from the dominance property, i.e. h
is strictly dominated by d.  Furthermore, there exists a path from h to a
use of v which does not go through d.  For every node p in the loop, since
the loop is a strongly connected component of the CFG, there exists a
path, consisting only of nodes of L, from p to h.  Concatenating these two
paths proves that v is live-in and live-out of p.

Calculate the live-out and live-in sets for the exit edges of the loop.
This patch considers liveness not only at the loop latch but also outside
the loop.

ChangeLog:
2015-12-15  Ajit Agarwal  

* loop-invariant.c
(find_invariants_to_move): Add the logic of regs_used based
on liveness.
* cfgloopanal.c
(estimate_reg_pressure_cost): Update the heuristics in presence
of call_p.

Signed-off-by: Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit

-Original Message-
From: Bernd Schmidt [mailto:bschm...@redhat.com] 
Sent: Wednesday, December 09, 2015 7:34 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for 
Loop Invariant Code Motion.

On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote:
>
> This is because the available_regs = 6 and the regs_needed = 1 and
> new_regs = 0 and the regs_used = 10.  As the regs_used based on the
> liveness given above is greater than the available_regs, it's a
> candidate for spill and estimate_register_pressure calculates the
> spill cost.  This spill cost is greater than inv_cost and the gain
> comes out negative.  This disables the loop invariant for the above
> testcase.

As far as I can tell this loop does not lead to a spill. Hence, failure of this 
testcase would suggest there is something wrong with your idea.

>> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)  {

Formatting.

>> +/* Loop Liveness is based on the following proprties.

"properties"

>> +   we only require to calculate the set of objects that are live at
>> +   the birth or the header of the loop.
>> +   We don't need to calculate the live through the Loop considering
>> +   Live in and Live out of all the basic blocks of the Loop. This is
>> +   because the set of objects. That are live-in at the birth or header
>> +   of the loop will be live-in at every node in the Loop.
>> +   If a v live out at the header of the loop then the variable is 
>> live-in
>> +   at every node in the Loop. To prove this, Consider a Loop L with 
>> header
>> +   h such that The variable v defined at d is live-in at h. Since v is 
>> live
>> +   at h, d is not part of L. This follows from the dominance property, 
>> i.e.
>> +   h is strictly dominated by d. Furthermore, there exists a path from 
>> h to
>> +   a use of v which does not go through d. For every node of the loop, 
>> p,
>> +   since the loop is strongly connected Component of the CFG, there 
>> exists
>> +   a path, consisting only of n

[i386] Enable -mstackrealign with SSE on 32-bit Windows

2015-12-15 Thread Eric Botcazou
Hi,

even the latest versions of Windows still guarantee only a 4-byte alignment of 
the stack in 32-bit mode, which doesn't play nice with some SSE instructions.
That's why some projects enable -mstackrealign by default on 32-bit Windows:
  https://bugzilla.mozilla.org/show_bug.cgi?id=631252
This eliminates an entire class of bugs which are sometimes hard to reproduce.

The attached patch automatically enables it when SSE instructions are used.
That's a good compromise IMO because the default configuration of the compiler 
on this platform doesn't enable SSE, so its behavior should presumably be 
unaffected.

Tested on i686-pc-mingw32, OK for the mainline?


2015-12-15  Eric Botcazou  

* config/i386/cygming.h (STACK_REALIGN_DEFAULT): Define.


2015-12-15  Eric Botcazou  

* gcc.target/i386/stack-realign-win.c: New test.


-- 
Eric Botcazou

Index: config/i386/cygming.h
===
--- config/i386/cygming.h	(revision 231605)
+++ config/i386/cygming.h	(working copy)
@@ -39,6 +39,11 @@ along with GCC; see the file COPYING3.
 #undef MAX_STACK_ALIGNMENT
 #define MAX_STACK_ALIGNMENT  (TARGET_SEH ? 128 : MAX_OFILE_ALIGNMENT)
 
+/* 32-bit Windows aligns the stack on a 4-byte boundary but SSE instructions
+   may require 16-byte alignment.  */
+#undef STACK_REALIGN_DEFAULT
+#define STACK_REALIGN_DEFAULT TARGET_SSE
+
 /* Support hooks for SEH.  */
 #undef  TARGET_ASM_UNWIND_EMIT
 #define TARGET_ASM_UNWIND_EMIT  i386_pe_seh_unwind_emit

Index: gcc.target/i386/stack-realign-win.c
===
/* { dg-do compile { target *-*-mingw* *-*-cygwin* } } */
/* { dg-require-effective-target ia32 } */
/* { dg-options "-msse -O" } */

extern void abort (void);

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

static __m128
load_m128 (float *e)
{
  return * (__m128 *) e;
}

typedef union
{
  __m128  x;
  float a[4];
} union128;

void test (void)
{
  union128 u;
  float e[4] __attribute__ ((aligned (16)))
= {2134.3343, 1234.635654, 1.2234, 876.8976};
  int i;

  u.x = load_m128 (e);

  for (i = 0; i < 4; i++)
if (u.a[i] != e[i])
  abort ();
}

/* { dg-final { scan-assembler "andl\\t\\$-16, %esp" } } */


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-15 Thread Jakub Jelinek
On Mon, Dec 14, 2015 at 08:17:33PM +0300, Ilya Verbin wrote:
> Here is an updated patch.  Now MSB is set in both tables, and
> gomp_unload_image_from_device is changed.  I've verified using simple DSO
> testcase, that memory on target is freed after dlclose.
> bootstrap and make check on x86_64-linux passed.
> 
> gcc/c-family/
>   * c-common.c (c_common_attribute_table): Handle "omp declare target
>   link" attribute.
> gcc/
>   * cgraphunit.c (output_in_order): Do not assemble "omp declare target
>   link" variables in ACCEL_COMPILER.
>   * gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
>   "omp declare target link" variables.
>   * lto/lto.c: Include stringpool.h and fold-const.h.
>   (offload_handle_link_vars): New static function.
>   (lto_main): Call offload_handle_link_vars.

lto/ has its own ChangeLog file, so please move the entry there and remove
the lto/ prefix.

Ok with that change, thanks.

Jakub


Re: [PATCH 1/2] mark *-knetbsd-* as obsolete

2015-12-15 Thread Andreas Schwab
tbsaunde+...@tbsaunde.org writes:

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 882e413..59f77da 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -237,7 +237,7 @@ md_file=
>  # Obsolete configurations.
>  case ${target} in
>  # Currently there are no obsolete targets.
> - nothing \
> + *-knetbsd-* \

The comment is obsolete.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[Patch] S/390: Simplify vector conditionals

2015-12-15 Thread Robin Dapp
Hi,

the attached patch simplifies vector conditional statements like
v < 0 ? -1 : 0 into v >> 31.  The code is largely based on the x86
implementation of this feature by Jakub Jelinek.  In the future (and if
useful for more backends) it could make sense to implement this directly
at tree level.

Bootstrapped and regression-tested on s390.

Regards
 Robin

gcc/ChangeLog:

2015-12-15  Robin Dapp  

* config/s390/s390.c (s390_expand_vcond): Convert vector
conditional into shift.
* config/s390/vector.md: Change operand predicate.

gcc/testsuite/ChangeLog:

2015-12-15  Robin Dapp  

* gcc.target/s390/vcond-shift.c: New test to check vcond
simplification.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 67639bc..a72c9e1 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6108,19 +6108,60 @@ s390_expand_vcond (rtx target, rtx then, rtx els,
   machine_mode result_mode;
   rtx result_target;
 
+  machine_mode target_mode = GET_MODE (target);
+  machine_mode cmp_mode = GET_MODE (cmp_op1);
+  rtx op = (cond == LT) ? els : then;
+
+  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+ and x < 0 ? 1 : 0 into (unsigned) x >> 31.  Likewise
+ for short and byte (x >> 15 and x >> 7 respectively).  */
+  if ((cond == LT || cond == GE)
+  && target_mode == cmp_mode
+  && cmp_op2 == CONST0_RTX (cmp_mode)
+  && op == CONST0_RTX (target_mode)
+  && s390_vector_mode_supported_p (target_mode)
+  && GET_MODE_CLASS (target_mode) == MODE_VECTOR_INT)
+{
+  rtx negop = (cond == LT) ? then : els;
+
+  int shift = GET_MODE_BITSIZE (GET_MODE_INNER (target_mode)) - 1;
+
+  /* if x < 0 ? 1 : 0 or if x >= 0 ? 0 : 1 */
+  if (negop == CONST1_RTX (target_mode))
+	{
+	  rtx res = expand_simple_binop (cmp_mode, LSHIFTRT, cmp_op1,
+	 GEN_INT (shift), target,
+	 1, OPTAB_DIRECT);
+	  if (res != target)
+	emit_move_insn (target, res);
+	  return;
+	}
+
+  /* if x < 0 ? -1 : 0 or if x >= 0 ? 0 : -1 */
+  else if (constm1_operand (negop, target_mode))
+	{
+	  rtx res = expand_simple_binop (cmp_mode, ASHIFTRT, cmp_op1,
+	 GEN_INT (shift), target,
+	 0, OPTAB_DIRECT);
+	  if (res != target)
+	emit_move_insn (target, res);
+	  return;
+	}
+}
+
   /* We always use an integral type vector to hold the comparison
  result.  */
-  result_mode = GET_MODE (cmp_op1) == V2DFmode ? V2DImode : GET_MODE (cmp_op1);
+  result_mode = cmp_mode == V2DFmode ? V2DImode : cmp_mode;
   result_target = gen_reg_rtx (result_mode);
 
-  /* Alternatively this could be done by reload by lowering the cmp*
- predicates.  But it appears to be better for scheduling etc. to
- have that in early.  */
+  /* We allow vector immediates as comparison operands that
+ can be handled by the optimization above but not by the
+ following code.  Hence, force them into registers here.  */
   if (!REG_P (cmp_op1))
-cmp_op1 = force_reg (GET_MODE (target), cmp_op1);
+cmp_op1 = force_reg (target_mode, cmp_op1);
 
   if (!REG_P (cmp_op2))
-cmp_op2 = force_reg (GET_MODE (target), cmp_op2);
+cmp_op2 = force_reg (target_mode, cmp_op2);
 
   s390_expand_vec_compare (result_target, cond,
 			   cmp_op1, cmp_op2);
@@ -6130,7 +6171,7 @@ s390_expand_vcond (rtx target, rtx then, rtx els,
   if (constm1_operand (then, GET_MODE (then))
   && const0_operand (els, GET_MODE (els)))
 {
-  emit_move_insn (target, gen_rtx_SUBREG (GET_MODE (target),
+  emit_move_insn (target, gen_rtx_SUBREG (target_mode,
 	  result_target, 0));
   return;
 }
@@ -6139,10 +6180,10 @@ s390_expand_vcond (rtx target, rtx then, rtx els,
   /* This gets triggered e.g.
  with gcc.c-torture/compile/pr53410-1.c */
   if (!REG_P (then))
-then = force_reg (GET_MODE (target), then);
+then = force_reg (target_mode, then);
 
   if (!REG_P (els))
-els = force_reg (GET_MODE (target), els);
+els = force_reg (target_mode, els);
 
   tmp = gen_rtx_fmt_ee (EQ, VOIDmode,
 			result_target,
@@ -6150,9 +6191,9 @@ s390_expand_vcond (rtx target, rtx then, rtx els,
 
   /* We compared the result against zero above so we have to swap then
  and els here.  */
-  tmp = gen_rtx_IF_THEN_ELSE (GET_MODE (target), tmp, els, then);
+  tmp = gen_rtx_IF_THEN_ELSE (target_mode, tmp, els, then);
 
-  gcc_assert (GET_MODE (target) == GET_MODE (then));
+  gcc_assert (target_mode == GET_MODE (then));
   emit_insn (gen_rtx_SET (target, tmp));
 }
 
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c9f5890..f6a85c8 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -403,7 +403,7 @@
 	(if_then_else:V_HW
 	 (match_operator 3 "comparison_operator"
 			 [(match_operand:V_HW2 4 "register_operand" "")
-			  (match_operand:V_HW2 5 "register_operand" "")])
+			  (match_operand:V_HW2 5 "nonmemory_operand" "")])
 	 (match_operand:V_HW 1 "nonmemory_operand" "")
 	 (match_operand:V_HW 2 "nonmemory

Re: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-12-15 Thread Wilco Dijkstra
ping

This patch series generalizes CCMP by adding FCCMP support and enabling more
optimizations.
The first patch simplifies the representation of CCMP patterns by using
if-then-else, which closely matches real instruction semantics.  As a result,
the existing special CC modes and functions are no longer required.  The
condition of the CCMP is the if-condition, which compares the previously set
CC register.  The then-part does the compare like a normal compare.  The
else-part contains the integer value of the AArch64 condition that must be
set if the if-condition is false.

ChangeLog:
2015-11-12  Wilco Dijkstra  

* gcc/target.def (gen_ccmp_first): Update documentation.
(gen_ccmp_next): Likewise.
* gcc/doc/tm.texi (gen_ccmp_first): Update documentation.
(gen_ccmp_next): Likewise.
* gcc/ccmp.c (expand_ccmp_expr): Extract cmp_code from return value of 
expand_ccmp_expr_1.  Improve comments.
* gcc/config/aarch64/aarch64.md (ccmp_and): Use if_then_else for ccmp.
(ccmp_ior): Remove pattern.
(cmp): Remove expand.
(cmp): Globalize pattern.
(cstorecc4): Use cc_register.
(movcc): Remove ccmp_cc_register check.
* gcc/config/aarch64/aarch64.c (aarch64_get_condition_code_1):
Simplify after removal of CC_DNE/* modes.
(aarch64_ccmp_mode_to_code): Remove.
(aarch64_print_operand): Remove 'K' case.  Merge 'm' and 'M' cases.
In 'k' case use integer as condition.
(aarch64_nzcv_codes): Remove inverted cases.
(aarch64_code_to_ccmode): Remove.
(aarch64_gen_ccmp_first): Use cmp pattern directly.  Return the correct 
comparison with CC register to be used in following CCMP/branch/CSEL.
(aarch64_gen_ccmp_next): Use previous comparison and mode in CCMP
pattern.  Return the comparison with CC register.  Invert conditions
when bitcode is OR.
* gcc/config/aarch64/aarch64-modes.def: Remove CC_DNE/* modes.
* gcc/config/aarch64/predicates.md (ccmp_cc_register): Remove.


---
 gcc/ccmp.c   |  21 ++-
 gcc/config/aarch64/aarch64-modes.def |  10 --
 gcc/config/aarch64/aarch64.c | 305 ---
 gcc/config/aarch64/aarch64.md|  68 ++--
 gcc/config/aarch64/predicates.md |  17 --
 gcc/doc/tm.texi  |  36 ++---
 gcc/target.def   |  36 ++---
 7 files changed, 128 insertions(+), 365 deletions(-)

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index 20348d9..58ac126 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -65,6 +65,10 @@ along with GCC; see the file COPYING3.  If not see
 - gen_ccmp_first expands the first compare in CCMP.
 - gen_ccmp_next expands the following compares.
 
+   Both hooks return a comparison with the CC register that is equivalent
+   to the value of the gimple comparison.  This is used by the next CCMP
+   and in the final conditional store.
+
  * We use cstorecc4 pattern to convert the CCmode intermediate to
the integer mode result that expand_normal is expecting.
 
@@ -130,10 +134,12 @@ ccmp_candidate_p (gimple *g)
   return false;
 }
 
-/* PREV is the CC flag from precvious compares.  The function expands the
-   next compare based on G which ops previous compare with CODE.
+/* PREV is a comparison with the CC register which represents the
+   result of the previous CMP or CCMP.  The function expands the
+   next compare based on G which is ANDed/ORed with the previous
+   compare depending on CODE.
PREP_SEQ returns all insns to prepare opearands for compare.
-   GEN_SEQ returnss all compare insns.  */
+   GEN_SEQ returns all compare insns.  */
 static rtx
 expand_ccmp_next (gimple *g, enum tree_code code, rtx prev,
  rtx *prep_seq, rtx *gen_seq)
@@ -226,7 +232,7 @@ expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx *gen_seq)
   return NULL_RTX;
 }
 
-/* Main entry to expand conditional compare statement G. 
+/* Main entry to expand conditional compare statement G.
Return NULL_RTX if G is not a legal candidate or expand fail.
Otherwise return the target.  */
 rtx
@@ -249,9 +255,10 @@ expand_ccmp_expr (gimple *g)
   enum insn_code icode;
   enum machine_mode cc_mode = CCmode;
   tree lhs = gimple_assign_lhs (g);
+  rtx_code cmp_code = GET_CODE (tmp);
 
 #ifdef SELECT_CC_MODE
-  cc_mode = SELECT_CC_MODE (NE, tmp, const0_rtx);
+  cc_mode = SELECT_CC_MODE (cmp_code, XEXP (tmp, 0), const0_rtx);
 #endif
   icode = optab_handler (cstore_optab, cc_mode);
   if (icode != CODE_FOR_nothing)
@@ -262,8 +269,8 @@ expand_ccmp_expr (gimple *g)
  emit_insn (prep_seq);
  emit_insn (gen_seq);
 
- tmp = emit_cstore (target, icode, NE, cc_mode, cc_mode,
-0, tmp, const0_rtx, 1, mode);
+ tmp = emit_cstore (target, icode, cmp_code, cc_mode, cc_mode,
+0, XEXP (

RE: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-12-15 Thread Wilco Dijkstra
ping

> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 17 November 2015 18:36
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 2/4 v2][AArch64] Add support for FCCMP
> 
> (v2 version removes 4 enums)
> 
> This patch adds support for FCCMP. This is trivial with the new CCMP 
> representation - remove the restriction of FP in ccmp.c and add
> FCCMP patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.
> 
> OK for commit?
> 
> ChangeLog:
> 2015-11-18  Wilco Dijkstra  
> 
>   * gcc/ccmp.c (ccmp_candidate_p): Remove integer-only restriction.
>   * gcc/config/aarch64/aarch64.md (fccmp): New pattern.
>   (fccmpe): Likewise.
>   (fcmp): Rename to fcmp and globalize pattern.
>   (fcmpe): Likewise.
>   * gcc/config/aarch64/aarch64.c (aarch64_gen_ccmp_first): Add FP support.
>   (aarch64_gen_ccmp_next): Add FP support.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/ccmp_1.c: New testcase.
> 
> 
> ---
>  gcc/ccmp.c|  6 ---
>  gcc/config/aarch64/aarch64.c  | 24 +
>  gcc/config/aarch64/aarch64.md | 34 -
>  gcc/testsuite/gcc.target/aarch64/ccmp_1.c | 84 
> +++
>  4 files changed, 140 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_1.c
> 
> diff --git a/gcc/ccmp.c b/gcc/ccmp.c
> index 58ac126..3698a7d 100644
> --- a/gcc/ccmp.c
> +++ b/gcc/ccmp.c
> @@ -112,12 +112,6 @@ ccmp_candidate_p (gimple *g)
>|| gimple_bb (gs0) != gimple_bb (g))
>  return false;
> 
> -  if (!(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0)))
> -   || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0
> -  || !(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))
> -|| POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)
> -return false;
> -
>tcode0 = gimple_assign_rhs_code (gs0);
>tcode1 = gimple_assign_rhs_code (gs1);
>if (TREE_CODE_CLASS (tcode0) == tcc_comparison
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c8bee3b..db4d190 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12398,6 +12398,18 @@ aarch64_gen_ccmp_first (rtx *prep_seq, rtx *gen_seq,
>icode = CODE_FOR_cmpdi;
>break;
> 
> +case SFmode:
> +  cmp_mode = SFmode;
> +  cc_mode = aarch64_select_cc_mode ((rtx_code) code, op0, op1);
> +  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpesf : CODE_FOR_fcmpsf;
> +  break;
> +
> +case DFmode:
> +  cmp_mode = DFmode;
> +  cc_mode = aarch64_select_cc_mode ((rtx_code) code, op0, op1);
> +  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpedf : CODE_FOR_fcmpdf;
> +  break;
> +
>  default:
>end_sequence ();
>return NULL_RTX;
> @@ -12461,6 +12473,18 @@ aarch64_gen_ccmp_next (rtx *prep_seq, rtx *gen_seq, 
> rtx prev, int cmp_code,
>icode = CODE_FOR_ccmpdi;
>break;
> 
> +case SFmode:
> +  cmp_mode = SFmode;
> +  cc_mode = aarch64_select_cc_mode ((rtx_code) cmp_code, op0, op1);
> +  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpesf : CODE_FOR_fccmpsf;
> +  break;
> +
> +case DFmode:
> +  cmp_mode = DFmode;
> +  cc_mode = aarch64_select_cc_mode ((rtx_code) cmp_code, op0, op1);
> +  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpedf : CODE_FOR_fccmpdf;
> +  break;
> +
>  default:
>end_sequence ();
>return NULL_RTX;
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index fab65c6..7d728b5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -279,6 +279,36 @@
>[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>  )
> 
> +(define_insn "fccmp"
> +  [(set (match_operand:CCFP 1 "cc_register" "")
> + (if_then_else:CCFP
> +   (match_operator 4 "aarch64_comparison_operator"
> +[(match_operand 0 "cc_register" "")
> + (const_int 0)])
> +   (compare:CCFP
> + (match_operand:GPF 2 "register_operand" "w")
> + (match_operand:GPF 3 "register_operand" "w"))
> +   (match_operand 5 "immediate_operand")))]
> +  "TARGET_FLOAT"
> +  "fccmp\\t%2, %3, %k5, %m4"
> +  [(set_attr "type" "fcmp")]
> +)
> +
> +(define_insn "fccmpe"
> +  [(set (match_operand:CCFPE 1 "cc_register" "")
> +  (if_then_else:CCFPE
> +   (match_operator 4 "aarch64_comparison_operator"
> +[(match_operand 0 "cc_register" "")
> +   (const_int 0)])
> +(compare:CCFPE
> + (match_operand:GPF 2 "register_operand" "w")
> + (match_operand:GPF 3 "register_operand" "w"))
> +   (match_operand 5 "immediate_operand")))]
> +  "TARGET_FLOAT"
> +  "fccmpe\\t%2, %3, %k5, %m4"
> +  [(set_attr "type" "fcmp")]
> +)
> +
>  ;; Expansion of signed mod by a power of 2 using CSNEG.
>  ;; For x0 % n where n is a power of 2 produce:
>  ;; negs   x1, x0
> @@ -2794,7 +2824,7 @@
>[(set_

RE: [PATCH 3/4][AArch64] Add CCMP to rtx costs

2015-12-15 Thread Wilco Dijkstra
ping

> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 13 November 2015 16:03
> To: 'gcc-patches@gcc.gnu.org'
> Subject: [PATCH 3/4][AArch64] Add CCMP to rtx costs
> 
> This patch adds support for rtx costing of CCMP. The cost is the same as 
> int/FP compare, however comparisons with zero get a slightly
> larger cost. This means we prefer emitting compares with zero so they can be 
> merged with ALU operations.
> 
> OK for commit?
> 
> ChangeLog:
> 2015-11-13  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c (aarch64_if_then_else_costs):
>   Add support for CCMP costing.
> 
> ---
>  gcc/config/aarch64/aarch64.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a224982..b789841 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5638,6 +5638,26 @@ aarch64_if_then_else_costs (rtx op0, rtx op1, rtx op2, 
> int *cost, bool speed)
>  }
>else if (GET_MODE_CLASS (GET_MODE (inner)) == MODE_CC)
>  {
> +  /* CCMP.  */
> +  if ((GET_CODE (op1) == COMPARE) && CONST_INT_P (op2))
> + {
> +   /* Increase cost of CCMP reg, 0, imm, CC to prefer CMP reg, 0.  */
> +   if (XEXP (op1, 1) == const0_rtx)
> + *cost += 1;
> +   if (speed)
> + {
> +   machine_mode mode = GET_MODE (XEXP (op1, 0));
> +   const struct cpu_cost_table *extra_cost
> + = aarch64_tune_params.insn_extra_cost;
> +
> +   if (GET_MODE_CLASS (mode) == MODE_INT)
> + *cost += extra_cost->alu.arith;
> +   else
> + *cost += extra_cost->fp[mode == DFmode].compare;
> + }
> +   return true;
> + }
> +
>/* It's a conditional operation based on the status flags,
>so it must be some flavor of CSEL.  */
> 
> --
> 1.9.1



RE: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order

2015-12-15 Thread Wilco Dijkstra
ping

> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 13 November 2015 16:03
> To: 'gcc-patches@gcc.gnu.org'
> Subject: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose 
> better expand order
> 
> This patch adds CCMP selection based on rtx costs. This is based on Jiong's 
> already approved patch https://gcc.gnu.org/ml/gcc-
> patches/2015-09/msg01434.html with some minor refactoring and the tests 
> updated.
> 
> OK for commit?
> 
> ChangeLog:
> 2015-11-13  Jiong Wang  
> 
> gcc/
>   * ccmp.c (expand_ccmp_expr_1): Cost the instruction sequences
>   generated from different expand order.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/ccmp_1.c: Update test.
> 
> ---
>  gcc/ccmp.c| 47 
> +++
>  gcc/testsuite/gcc.target/aarch64/ccmp_1.c | 15 --
>  2 files changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/ccmp.c b/gcc/ccmp.c
> index cbdbd6d..95a41a6 100644
> --- a/gcc/ccmp.c
> +++ b/gcc/ccmp.c
> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-outof-ssa.h"
>  #include "cfgexpand.h"
>  #include "ccmp.h"
> +#include "predict.h"
> 
>  /* The following functions expand conditional compare (CCMP) instructions.
> Here is a short description about the over all algorithm:
> @@ -159,6 +160,8 @@ expand_ccmp_next (gimple *g, enum tree_code code, rtx 
> prev,
>  static rtx
>  expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx *gen_seq)
>  {
> +  rtx prep_seq_1, gen_seq_1;
> +  rtx prep_seq_2, gen_seq_2;
>tree exp = gimple_assign_rhs_to_tree (g);
>enum tree_code code = TREE_CODE (exp);
>gimple *gs0 = get_gimple_for_ssa_name (TREE_OPERAND (exp, 0));
> @@ -174,19 +177,53 @@ expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx 
> *gen_seq)
>  {
>if (TREE_CODE_CLASS (code1) == tcc_comparison)
>   {
> -   int unsignedp0;
> -   enum rtx_code rcode0;
> +   int unsignedp0, unsignedp1;
> +   enum rtx_code rcode0, rcode1;
> +   int speed_p = optimize_insn_for_speed_p ();
> +   rtx tmp2, ret, ret2;
> +   unsigned cost1 = MAX_COST;
> +   unsigned cost2 = MAX_COST;
> 
> unsignedp0 = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (gs0)));
> +   unsignedp1 = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (gs1)));
> rcode0 = get_rtx_code (code0, unsignedp0);
> +   rcode1 = get_rtx_code (code1, unsignedp1);
> 
> -   tmp = targetm.gen_ccmp_first (prep_seq, gen_seq, rcode0,
> +   tmp = targetm.gen_ccmp_first (&prep_seq_1, &gen_seq_1, rcode0,
>   gimple_assign_rhs1 (gs0),
>   gimple_assign_rhs2 (gs0));
> -   if (!tmp)
> +
> +   tmp2 = targetm.gen_ccmp_first (&prep_seq_2, &gen_seq_2, rcode1,
> +  gimple_assign_rhs1 (gs1),
> +  gimple_assign_rhs2 (gs1));
> +
> +   if (!tmp && !tmp2)
>   return NULL_RTX;
> 
> -   return expand_ccmp_next (gs1, code, tmp, prep_seq, gen_seq);
> +   if (tmp != NULL)
> + {
> +   ret = expand_ccmp_next (gs1, code, tmp, &prep_seq_1, &gen_seq_1);
> +   cost1 = seq_cost (safe_as_a  (prep_seq_1), speed_p);
> +   cost1 += seq_cost (safe_as_a  (gen_seq_1), speed_p);
> + }
> +   if (tmp2 != NULL)
> + {
> +   ret2 = expand_ccmp_next (gs0, code, tmp2, &prep_seq_2,
> +&gen_seq_2);
> +   cost2 = seq_cost (safe_as_a  (prep_seq_2), speed_p);
> +   cost2 += seq_cost (safe_as_a  (gen_seq_2), speed_p);
> + }
> +
> +   if (cost2 < cost1)
> + {
> +   *prep_seq = prep_seq_2;
> +   *gen_seq = gen_seq_2;
> +   return ret2;
> + }
> +
> +   *prep_seq = prep_seq_1;
> +   *gen_seq = gen_seq_1;
> +   return ret;
>   }
>else
>   {
> diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_1.c 
> b/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
> index ef077e0..7c39b61 100644
> --- a/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
> @@ -80,5 +80,16 @@ f13 (int a, int b)
>return a == 3 || a == 0;
>  }
> 
> -/* { dg-final { scan-assembler "fccmp\t" } } */
> -/* { dg-final { scan-assembler "fccmpe\t" } } */
> +/* { dg-final { scan-assembler "cmp\t(.)+32" } } */
> +/* { dg-final { scan-assembler "cmp\t(.)+33" } } */
> +/* { dg-final { scan-assembler "cmp\t(.)+34" } } */
> +/* { dg-final { scan-assembler "cmp\t(.)+35" } } */
> +
> +/* { dg-final { scan-assembler-times "\tcmp\tw\[0-9\]+, 0" 4 } } */
> +/* { dg-final { scan-assembler-times "fcmpe\t(.)+0\\.0" 2 } } */
> +/* { dg-final { scan-assembler-times "fcmp\t(.)+0\\.0" 2 } } */
> +
> +/* { dg-final { scan-assembler "adds\t" } } */
> +/* { dg-final { scan-assembler-times "\tccmp\t" 11 } } */
> +/* { dg-final { scan-assembler-times "fc

RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-12-15 Thread Wilco Dijkstra
ping

> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 06 November 2015 20:06
> To: 'gcc-patches@gcc.gnu.org'
> Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> 
> This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS hook. 
> When the cost of GENERAL_REGS and
> FP_REGS is identical, the register allocator always uses ALL_REGS even when 
> it has a much higher cost. The hook changes the class to
> either FP_REGS or GENERAL_REGS depending on the mode of the register. This 
> results in better register allocation overall, fewer spills
> and reduced codesize - particularly in SPEC2006 gamess.
> 
> GCC regression passes with several minor fixes.
> 
> OK for commit?
> 
> ChangeLog:
> 2015-11-06  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c
>   (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
>   (aarch64_ira_change_pseudo_allocno_class): New function.
>   * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
>   * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
>   (test_corners_sisd_di): Improve force to SIMD register.
>   (test_corners_sisd_si): Likewise.
>   * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with -O2.
>   * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
>   Remove scan-assembler check for ldr.
> 
> --
>  gcc/config/aarch64/aarch64.c   | 22 
> ++
>  gcc/testsuite/gcc.target/aarch64/cvtf_1.c  |  2 +-
>  gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c  |  4 ++--
>  gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c |  2 +-
>  .../gcc.target/aarch64/vect-ld1r-compile-fp.c  |  1 -
>  5 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 6da7245..9b60666 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -597,6 +597,24 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char 
> *msg)
>  error ("%qs feature modifier is incompatible with %s %s", "+nofp", mc, 
> msg);
>  }
> 
> +/* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
> +   The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have
> +   the same cost even if ALL_REGS has a much larger cost.  This results in 
> bad
> +   allocations and spilling.  To avoid this we force the class to 
> GENERAL_REGS
> +   if the mode is integer.  */
> +
> +static reg_class_t
> +aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t 
> allocno_class)
> +{
> +  enum machine_mode mode;
> +
> +  if (allocno_class != ALL_REGS)
> +return allocno_class;
> +
> +  mode = PSEUDO_REGNO_MODE (regno);
> +  return FLOAT_MODE_P (mode) || VECTOR_MODE_P (mode) ? FP_REGS : 
> GENERAL_REGS;
> +}
> +
>  static unsigned int
>  aarch64_min_divisions_for_recip_mul (enum machine_mode mode)
>  {
> @@ -13113,6 +13131,10 @@ aarch64_promoted_type (const_tree t)
>  #undef  TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS  aarch64_init_builtins
> 
> +#undef TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> +#define TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS \
> +  aarch64_ira_change_pseudo_allocno_class
> +
>  #undef TARGET_LEGITIMATE_ADDRESS_P
>  #define TARGET_LEGITIMATE_ADDRESS_P aarch64_legitimate_address_hook_p
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c 
> b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> index 5f2ff81..96501db 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-save-temps -fno-inline -O1" } */
> +/* { dg-options "-save-temps -fno-inline -O2" } */
> 
>  #define FCVTDEF(ftype,itype) \
>  void \
> diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c 
> b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> index 363f554..8465c89 100644
> --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
>  {
>force_simd_di (b);
>b = b >> 63;
> +  force_simd_di (b);
>b = b >> 0;
>b += b >> 65; /* { dg-warning "right shift count >= width of type" } */
> -  force_simd_di (b);
> 
>return b;
>  }
> @@ -199,9 +199,9 @@ test_corners_sisd_si (Int32x1 b)
>  {
>force_simd_si (b);
>b = b >> 31;
> +  force_simd_si (b);
>b = b >> 0;
>b += b >> 33; /* { dg-warning "right shift count >= width of type" } */
> -  force_simd_si (b);
> 
>return b;
>  }
> diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c 
> b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> index a49db3e..c5a9c52 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
> @@ -1,6 +1,6 @@
>  /* Test vdup_lane intrinsics work correctly.  */
>  /* { dg-do run } */
> -/* { dg-options "-O1 --save-temps" } */
> +/* { dg-options "-O

RE: [PATCH][ARM] Enable fusion of AES instructions

2015-12-15 Thread Wilco Dijkstra
ping

> -Original Message-
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 19 November 2015 18:12
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH][ARM] Enable fusion of AES instructions
> 
> Enable instruction fusion of AES instructions on ARM for Cortex-A53 and 
> Cortex-A57.
> 
> OK for commit?
> 
> ChangeLog:
> 2015-11-20  Wilco Dijkstra  
> 
>   * gcc/config/arm/arm.c (arm_cortex_a53_tune): Add AES fusion.
>   (arm_cortex_a57_tune): Likewise.
>   (aarch_macro_fusion_pair_p): Add support for AES fusion.
>   * gcc/config/arm/arm-protos.h (fuse_ops): Add FUSE_AES_AESMC.
> 
> ---
>  gcc/config/arm/arm-protos.h | 5 +++--
>  gcc/config/arm/arm.c| 9 +++--
>  2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index f9b1276..4801bb8 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -302,8 +302,9 @@ struct tune_params
>enum fuse_ops
>{
>  FUSE_NOTHING   = 0,
> -FUSE_MOVW_MOVT = 1 << 0
> -  } fusible_ops: 1;
> +FUSE_MOVW_MOVT = 1 << 0,
> +FUSE_AES_AESMC = 1 << 1
> +  } fusible_ops: 2;
>/* Depth of scheduling queue to check for L2 autoprefetcher.  */
>enum {SCHED_AUTOPREF_OFF, SCHED_AUTOPREF_RANK, SCHED_AUTOPREF_FULL}
>  sched_autopref: 2;
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 02f5dc3..7077199 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1969,7 +1969,7 @@ const struct tune_params arm_cortex_a53_tune =
>tune_params::DISPARAGE_FLAGS_NEITHER,
>tune_params::PREF_NEON_64_FALSE,
>tune_params::PREF_NEON_STRINGOPS_TRUE,
> -  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
> +  FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC),
>tune_params::SCHED_AUTOPREF_OFF
>  };
> 
> @@ -1992,7 +1992,7 @@ const struct tune_params arm_cortex_a57_tune =
>tune_params::DISPARAGE_FLAGS_ALL,
>tune_params::PREF_NEON_64_FALSE,
>tune_params::PREF_NEON_STRINGOPS_TRUE,
> -  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
> +  FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC),
>tune_params::SCHED_AUTOPREF_FULL
>  };
> 
> @@ -29668,6 +29668,11 @@ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* 
> curr)
>  && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
>return true;
>  }
> +
> +  if (current_tune->fusible_ops & tune_params::FUSE_AES_AESMC
> +  && aarch_crypto_can_dual_issue (prev, curr))
> +return true;
> +
>return false;
>  }
> 
> --
> 1.9.1



RE: [PATCH][AArch64] Enable fusion of AES instructions

2015-12-15 Thread Wilco Dijkstra
Kyrill Tkachov wrote:
> On 14/10/15 13:30, Wilco Dijkstra wrote:
> > Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. 
> > This can give up to 2x
> > speedup on many AArch64 implementations. Also model the crypto instructions 
> > on Cortex-A57 according
> > to the Optimization Guide.
> >
> > Passes regression tests.
> 
> arm-wise this is ok, but I'd like a follow up patch to enable this fusion
> for the arm port as well. It should be fairly simple.
> Just add a new enum value to fuse_ops inside tune_params in arm-protos.h
> and update the arm implementation in aarch_macro_fusion_pair_p similar
> to your aarch64 implementation.

I sent out a patch for AArch32 as well. Assuming you're still OK, could you 
commit this please?

Wilco

> > ChangeLog:
> > 2015-10-14  Wilco Dijkstra  
> >
> > * gcc/config/aarch64/aarch64.c (cortexa53_tunings): Add AES fusion.
> > (cortexa57_tunings): Likewise.
> > (cortexa72_tunings): Likewise.
> > (arch_macro_fusion_pair_p): Add support for AES fusion.
> > * gcc/config/aarch64/aarch64-fusion-pairs.def: Add AES_AESMC entry.
> > * gcc/config/arm/aarch-common.c (aarch_crypto_can_dual_issue):
> > Allow virtual registers before reload so early scheduling works.
> > * gcc/config/arm/cortex-a57.md (cortex_a57_crypto_simple): Use
> > correct latency and pipeline.
> > (cortex_a57_crypto_complex): Likewise.
> > (cortex_a57_crypto_xor): Likewise.
> > (define_bypass): Add AES bypass.
> >
> >
> > ---
> >   gcc/config/aarch64/aarch64-fusion-pairs.def |  1 +
> >   gcc/config/aarch64/aarch64.c| 10 +++---
> >   gcc/config/arm/aarch-common.c   |  7 +--
> >   gcc/config/arm/cortex-a57.md| 17 +++--
> >   4 files changed, 24 insertions(+), 11 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
> > b/gcc/config/aarch64/aarch64-fusion-pairs.def
> > index 53bbef4..fea79fc 100644
> > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
> > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
> > @@ -33,4 +33,5 @@ AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD)
> >   AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
> >   AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
> >   AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
> > +AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
> >
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 230902d..96368c6 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -376,7 +376,7 @@ static const struct tune_params cortexa53_tunings =
> > &generic_branch_cost,
> > 4, /* memmov_cost  */
> > 2, /* issue_rate  */
> > -  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> > +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> >  | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
> > 8,  /* function_align.  */
> > 8,  /* jump_align.  */
> > @@ -398,7 +398,7 @@ static const struct tune_params cortexa57_tunings =
> > &generic_branch_cost,
> > 4, /* memmov_cost  */
> > 3, /* issue_rate  */
> > -  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> > +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> >  | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
> > 16, /* function_align.  */
> > 8,  /* jump_align.  */
> > @@ -420,7 +420,7 @@ static const struct tune_params cortexa72_tunings =
> > &generic_branch_cost,
> > 4, /* memmov_cost  */
> > 3, /* issue_rate  */
> > -  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> > +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
> >  | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
> > 16, /* function_align.  */
> > 8,  /* jump_align.  */
> > @@ -12843,6 +12843,10 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, 
> > rtx_insn *curr)
> >   }
> >   }
> >
> > +  if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_AES_AESMC)
> > +   && aarch_crypto_can_dual_issue (prev, curr))
> > +return true;
> > +
> > if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_CMP_BRANCH)
> > && any_condjump_p (curr))
> >   {
> > diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
> > index 5dd8222..e191ab6 100644
> > --- a/gcc/config/arm/aarch-common.c
> > +++ b/gcc/config/arm/aarch-common.c
> > @@ -63,8 +63,11 @@ aarch_crypto_can_dual_issue (rtx_insn *producer_insn, 
> > rtx_insn *consumer_insn)
> > {
> >   unsigned int regno = REGNO (SET_DEST (producer_set));
> >
> > -return REGNO (SET_DEST (consumer_set)) == regno
> > -   && REGNO (XVECEXP (consumer_src, 0, 0)) == regno;
> > +/* Before reload the registers are virtual, so the destination of
> > +   consumer_set doesn't need to match.  */
> > +
> > +return (REGNO (SET_DEST (consumer_set)) == regno || !reload_completed)
> > +   && REGNO (XVECEXP 

RE: [PATCH][AArch64] Avoid emitting zero immediate as zero register

2015-12-15 Thread Wilco Dijkstra
ping

> -Original Message-
> From: Wilco Dijkstra [mailto:wdijk...@arm.com]
> Sent: 28 October 2015 17:33
> To: GCC Patches
> Subject: [PATCH][AArch64] Avoid emitting zero immediate as zero register
> 
> Several instructions accidentally emit wzr/xzr even when the pattern 
> specifies an immediate. Fix this by removing the register
> specifier in patterns that emit immediates.
> 
> Passes regression tests. OK for commit?
> 
> ChangeLog:
> 2015-10-28  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.md (ccmp_and): Emit
>   immediate as %1.
>   (ccmp_ior): Likewise.
>   (add3_compare0): Likewise.
>   (addsi3_compare0_uxtw): Likewise.
>   (add3nr_compare0): Likewise.
>   (compare_neg): Likewise.
>   (3): Likewise.
> 
> ---
>  gcc/config/aarch64/aarch64.md | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index f90b821..d262102 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -275,7 +275,7 @@
>"aarch64_ccmp_mode_to_code (GET_MODE (operands[1])) == GET_CODE 
> (operands[5])"
>"@
> ccmp\\t%2, %3, %k5, %m4
> -   ccmp\\t%2, %3, %k5, %m4
> +   ccmp\\t%2, %3, %k5, %m4
> ccmn\\t%2, %n3, %k5, %m4"
>[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>  )
> @@ -294,7 +294,7 @@
>"aarch64_ccmp_mode_to_code (GET_MODE (operands[1])) == GET_CODE 
> (operands[5])"
>"@
> ccmp\\t%2, %3, %K5, %M4
> -   ccmp\\t%2, %3, %K5, %M4
> +   ccmp\\t%2, %3, %K5, %M4
> ccmn\\t%2, %n3, %K5, %M4"
>[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>  )
> @@ -1647,7 +1647,7 @@
>""
>"@
>adds\\t%0, %1, %2
> -  adds\\t%0, %1, %2
> +  adds\\t%0, %1, %2
>subs\\t%0, %1, %n2"
>[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>  )
> @@ -1664,7 +1664,7 @@
>""
>"@
>adds\\t%w0, %w1, %w2
> -  adds\\t%w0, %w1, %w2
> +  adds\\t%w0, %w1, %2
>subs\\t%w0, %w1, %n2"
>[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>  )
> @@ -1846,7 +1846,7 @@
>""
>"@
>cmn\\t%0, %1
> -  cmn\\t%0, %1
> +  cmn\\t%0, %1
>cmp\\t%0, %n1"
>[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>  )
> @@ -2792,7 +2792,7 @@
>""
>"@
> cmp\\t%0, %1
> -   cmp\\t%0, %1
> +   cmp\\t%0, %1
> cmn\\t%0, %n1"
>[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
>  )
> @@ -3178,7 +3178,7 @@
>""
>"@
>\\t%0, %1, %2
> -  \\t%0, %1, %2
> +  \\t%0, %1, %2
>\\t%0., %1., %2."
>[(set_attr "type" "logic_reg,logic_imm,neon_logic")
> (set_attr "simd" "*,*,yes")]
> --
> 1.9.1



[PATCH][AArch64] Add vector permute cost

2015-12-15 Thread Wilco Dijkstra

Add support for vector permute cost since various permutes can expand into a 
complex
sequence of instructions.  This fixes major performance regressions due to 
recent changes
in the SLP vectorizer (which now vectorizes more aggressively and emits many 
complex 
permutes).

Set the cost to > 1 for all microarchitectures so that the number of permutes 
is usually zero
and regressions disappear.  An example of the kind of code that might be 
emitted for
VEC_PERM_EXPR {0, 3} where registers happen to be in the wrong order:

adrp    x4, .LC16
ldr     q5, [x4, #:lo12:.LC16]
eor     v1.16b, v1.16b, v0.16b
eor     v0.16b, v1.16b, v0.16b
eor     v1.16b, v1.16b, v0.16b
tbl     v0.16b, {v0.16b - v1.16b}, v5.16b

Regression tests pass. This fixes regressions that were introduced recently, so OK for
commit?


ChangeLog:
2015-12-15  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (generic_vector_cost):
Set vec_permute_cost.
(cortexa57_vector_cost): Likewise.
(exynosm1_vector_cost): Likewise.
(xgene1_vector_cost): Likewise.
(aarch64_builtin_vectorization_cost): Use vec_permute_cost.
* gcc/config/aarch64/aarch64-protos.h (cpu_vector_cost):
Add vec_permute_cost entry.


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 549a89d1f691b32efbc74359f045b5df74765f0e..1bc812a4d01e8b9895c11cefde3148429397e95a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -156,9 +156,10 @@ struct cpu_vector_cost
   const int scalar_load_cost;   /* Cost of scalar load.  */
   const int scalar_store_cost;  /* Cost of scalar store.  */
   const int vec_stmt_cost;  /* Cost of any vector operation,
-   excluding load, store,
+   excluding load, store, permute,
vector-to-scalar and
scalar-to-vector operation.  */
+  const int vec_permute_cost;   /* Cost of permute operation.  */
   const int vec_to_scalar_cost; /* Cost of vec-to-scalar 
operation.  */
   const int scalar_to_vec_cost; /* Cost of scalar-to-vector
operation.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 10754c88c0973d8ef3c847195b727f02b193bbd8..2584f16d345b3d015d577dd28c08a73ee3e0b0fb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -314,6 +314,7 @@ static const struct cpu_vector_cost generic_vector_cost =
   1, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
   1, /* vec_stmt_cost  */
+  2, /* vec_permute_cost  */
   1, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
   1, /* vec_align_load_cost  */
@@ -331,6 +332,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
   4, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
   3, /* vec_stmt_cost  */
+  3, /* vec_permute_cost  */
   8, /* vec_to_scalar_cost  */
   8, /* scalar_to_vec_cost  */
   5, /* vec_align_load_cost  */
@@ -347,6 +349,7 @@ static const struct cpu_vector_cost exynosm1_vector_cost =
   5, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
   3, /* vec_stmt_cost  */
+  3, /* vec_permute_cost  */
   3, /* vec_to_scalar_cost  */
   3, /* scalar_to_vec_cost  */
   5, /* vec_align_load_cost  */
@@ -364,6 +367,7 @@ static const struct cpu_vector_cost xgene1_vector_cost =
   5, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
   2, /* vec_stmt_cost  */
+  2, /* vec_permute_cost  */
   4, /* vec_to_scalar_cost  */
   4, /* scalar_to_vec_cost  */
   10, /* vec_align_load_cost  */
@@ -7555,6 +7559,8 @@ aarch64_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
return aarch64_tune_params.vec_costs->cond_not_taken_branch_cost;
 
   case vec_perm:
+   return aarch64_tune_params.vec_costs->vec_permute_cost;
+
   case vec_promote_demote:
return aarch64_tune_params.vec_costs->vec_stmt_cost;




Re: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-12-15 Thread James Greenhalgh
On Tue, Dec 15, 2015 at 10:32:08AM +, Wilco Dijkstra wrote:
> ping
> 
> This patch series generalizes CCMP by adding FCCMP support and enabling more 
> optimizations. 
> The first patch simplifies the representation of CCMP patterns by using 
> if-then-else which closely 
> matches real instruction semantics. As a result the existing special CC modes 
> and functions are no 
> longer required. The condition of the CCMP is the if condition which compares 
> the previously set
> CC register. The then part does the compare like a normal compare. The else 
> part contains the
> integer value of the AArch64 condition that must be set if the if condition 
> is false.
> 
> ChangeLog:
> 2015-11-12  Wilco Dijkstra  
> 
>   * gcc/target.def (gen_ccmp_first): Update documentation.
>   (gen_ccmp_next): Likewise.
>   * gcc/doc/tm.texi (gen_ccmp_first): Update documentation.
>   (gen_ccmp_next): Likewise.
>   * gcc/ccmp.c (expand_ccmp_expr): Extract cmp_code from return value of 
>   expand_ccmp_expr_1.  Improve comments.
>   * gcc/config/aarch64/aarch64.md (ccmp_and): Use if_then_else for ccmp.
>   (ccmp_ior): Remove pattern.
>   (cmp): Remove expand.
>   (cmp): Globalize pattern.
>   (cstorecc4): Use cc_register.
>   (movcc): Remove ccmp_cc_register check.
>   * gcc/config/aarch64/aarch64.c (aarch64_get_condition_code_1):
>   Simplify after removal of CC_DNE/* modes.
>   (aarch64_ccmp_mode_to_code): Remove.
>   (aarch64_print_operand): Remove 'K' case.  Merge 'm' and 'M' cases.
>   In 'k' case use integer as condition.
>   (aarch64_nzcv_codes): Remove inverted cases.
>   (aarch64_code_to_ccmode): Remove.
>   (aarch64_gen_ccmp_first): Use cmp pattern directly.  Return the correct 
>   comparison with CC register to be used in following CCMP/branch/CSEL.
>   (aarch64_gen_ccmp_next): Use previous comparison and mode in CCMP
>   pattern.  Return the comparison with CC register.  Invert conditions
>   when bitcode is OR.
>   * gcc/config/aarch64/aarch64-modes.def: Remove CC_DNE/* modes.
>   * gcc/config/aarch64/predicates.md (ccmp_cc_register): Remove.

The AArch64 parts of this are OK.

Thanks,
James




Re: [PATCH] gcc: read -fdebug-prefix-map OLD from environment (improved reproducibility)

2015-12-15 Thread Bernd Schmidt

On 12/11/2015 08:14 PM, Daniel Kahn Gillmor wrote:

I think you mean so that we would just ignore -fdebug-prefix-map
entirely when writing DW_AT_producer, so that you could build
reproducibly with (for example):

  gcc -fdebug-prefix-map=$(pwd)=/usr/src

We'd considered and discarded this approach in the past out of concern
for possible build systems that can easily vary the environment, but
work with a static list of CFLAGS to pass to the compiler.  On further
inspection, it's not clear that anyone has a concrete example of a build
system with this constraint.

Here's a one-liner patch for this approach (also at
https://gcc.gnu.org/bugzilla/attachment.cgi?id=37007):


I think that one-liner is fine, even for now.


Bernd



[COMMITTED] Add myself to MAINTAINERS (Write After Approval)

2015-12-15 Thread Alessandro Fanfarillo
Dear all,

I've added myself to Write After Approval maintainers.

Committed revision 231647.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 231646)
+++ MAINTAINERS (working copy)
@@ -388,6 +388,7 @@
 Ansgar Esztermann
 Doug Evans
 Chris Fairles
+Alessandro Fanfarillo
 Changpeng Fang
 Li Feng
 Max Filippov
Index: ChangeLog
===
--- ChangeLog (revision 231646)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2015-12-15  Alessandro Fanfarillo 
+
+* MAINTAINERS (Write After Approval): Add myself.
+
 2015-12-02  Ian Lance Taylor  

 PR go/66147


Re: [COMMITTED] Add myself to MAINTAINERS (Write After Approval)

2015-12-15 Thread Jakub Jelinek
On Tue, Dec 15, 2015 at 02:07:40PM +0100, Alessandro Fanfarillo wrote:
> I've added myself to Write After Approval maintainers.

> --- ChangeLog (revision 231646)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,7 @@
> +2015-12-15  Alessandro Fanfarillo 

Two spaces before < instead of one.

Jakub


Re: [PATCH] IRA: Fix % constraint modifier handling on disabled alternatives.

2015-12-15 Thread Bernd Schmidt

On 12/14/2015 02:05 PM, Andreas Krebbel wrote:

the constraint modifier % applies to all the alternatives of a pattern
and hence is mostly added to the first constraint of an operand.  IRA
currently ignores it if the alternative with the % gets disabled by
using the `enabled' attribute or if it is not among the preferred
alternatives.

Fixed with the attached patch by moving the % check to the first loop
which walks unconditionally over all the constraints.

Ok for mainline?


Ok assuming normal testing was done.


Bernd



Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition

2015-12-15 Thread Bernd Schmidt

On 12/14/2015 01:25 PM, Kyrill Tkachov wrote:

For this PR I want to teach combine to deal with unrecognisable patterns
that contain a sub-expression like
(x + x) by transforming it into (x << 1) and trying to match the result.
This is because some instruction
sets like arm and aarch64 can combine shifts with other arithmetic
operations or have shifts in their RTL representation
of more complex operations (like the aarch64 UBFIZ instruction which can
be expressed as a zero_extend+ashift pattern).

Due to a change in rtx costs for -mcpu=cortex-a53 in GCC 5 we no longer
expand an expression like x * 2 as x << 1
but rather as x + x, which hurts combination opportunities due to this
deficiency.

This patch addresses the issue in the recog_for_combine function in
combine.c in a similar way to the change_zero_ext
trick. That is, if recog_for_combine fails to match a pattern, it
replaces all instances of x + x in the
rtx with x << 1 and tries again.

This way I've been able to get combine to more aggressively generate the
arithmetic+shift forms of instructions for
-mcpu=cortex-a53 on aarch64 as well as instructions like ubfiz and sbfiz
that contain shift-by-immediate sub-expressions.

This patch shouldn't affect rtxes that already match, so it should have
no fallout on other cases.


I'm somewhat undecided on this. If we keep adding cases to this 
mechanism, the run time costs will eventually add up (we'll iterate over 
the pattern over and over again if it doesn't match, which is the normal 
case in combine), and we're still not testing combinations of these 
replacements.


I wonder if it would be possible to have genrecog write a special 
recognizer that can identify cases where a pattern would match if it was 
changed. Something along the lines of


recog_for_combine (..., vec<..> *replacements)
{

  /* Trying to recognize a shift.  */
  if (GET_CODE (x) == PLUS && rtx_equal_p (XEXP (x, 0), XEXP (x, 1)))
replacements->safe_push (...)
}

Seems like it would be more efficient and more flexible.


Bernd


Re: [Fortran, Patch} Fix ICE for coarray Critical inside module procedure

2015-12-15 Thread Alessandro Fanfarillo
Committed as revision 231649 on trunk and as revision 231650 on gcc-5-branch.

Thanks.

2015-12-14 20:02 GMT+01:00 Tobias Burnus :
> Dear Alessandro,
>
> Alessandro Fanfarillo wrote:
>>
>> the compiler returns an ICE when a coarray critical section is used
>> inside a module procedure.
>> The symbols related with the lock variables were left uncommitted
>> inside resolve_critical(). A gfc_commit_symbol after each symbol or a
>> gfc_commit_symbols at the end of resolve_critical() fixed the issue.
>>
>> The latter solution is proposed in the attached patch.
>> Built and regtested on x86_64-pc-linux-gnu
>
>
> Looks good to me.
>
>> PS: This patch should be also included in GCC 5.
>
>
> Yes, that's fine with me.
>
> Tobias
>
> PS: I saw that you now have a GCC account, which you can use to commit to
> both the trunk and gcc-5-branch. See https://gcc.gnu.org/svnwrite.html.
> Additionally, you should update MAINTAINERS (trunk only) by adding yourself
> under "Write After Approval"; you can simply commit this patch yourself, but
> you should write an email to gcc-patches with the patch - like Alan did at
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02390.html


[PTX] remove unnecessary promotions

2015-12-15 Thread Nathan Sidwell
This patch removes unnecessary mode promotions in argument and return 
handling.  We also move the handling of the return mode directly into 
nvptx_function_value, rather than have that function return the mode and then 
detect emission of the move that uses it.


nathan
2015-12-15  Nathan Sidwell  

	* config/nvptx/nvptx.h (HARD_REGNO_NREGS): Reformat.
	(CANNOT_CHANGE_MODE_CLASS): Always return true.
	(HARD_REGNO_MODE_OK): Reformat.
	* config/nvptx/nvptx.md (define_expand mov): No
	RETURN_REGNUM handling here.
	* config/nvptx/nvptx.c (nvptx_function_value): Set ret_reg_mode
	here.
	(write_one_arg): No QI or HI mode args.
	(write_fn_proto_from_insn): No argument promotion here.
	(nvptx_output_return_insn): No return promotion here.
	(nvptx_output_mov_insn): No RETURN_REGNUM handling needed.
	(nvptx_output_call_insn): No return promotion here.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 231639)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -516,7 +516,10 @@ nvptx_function_value (const_tree type, c
   machine_mode mode = promote_return (TYPE_MODE (type));
 
   if (outgoing)
-return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+{
+  cfun->machine->ret_reg_mode = mode;
+  return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+}
 
   return nvptx_libcall_value (mode, NULL_RTX);
 }
@@ -586,8 +589,6 @@ write_one_arg (std::stringstream &s, int
   /* Writing PTX prototype.  */
   s << (argno ? ", " : " (");
   s << ".param" << ptx_type << " %in_ar" << argno;
-  if (mode == QImode || mode == HImode)
-	s << "[1]";
 }
   else
 {
@@ -674,6 +675,7 @@ write_return (std::stringstream &s, bool
 	 this data, but more importantly for us, we must ensure it
 	 doesn't change the PTX prototype.  */
   mode = (machine_mode) cfun->machine->ret_reg_mode;
+
   if (mode == VOIDmode)
 	return return_in_mem;
 
@@ -834,7 +836,7 @@ write_fn_proto_from_insn (std::stringstr
 
   if (result != NULL_RTX)
 s << "(.param"
-  << nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)), false)
+  << nvptx_ptx_type_from_mode (GET_MODE (result), false)
   << " %rval) ";
 
   s << name;
@@ -1049,11 +1051,8 @@ nvptx_output_return (void)
   machine_mode mode = (machine_mode)cfun->machine->ret_reg_mode;
 
   if (mode != VOIDmode)
-{
-  mode = arg_promotion (mode);
-  fprintf (asm_out_file, "\tst.param%s\t[%%out_retval], %%retval;\n",
-	   nvptx_ptx_type_from_mode (mode, false));
-}
+fprintf (asm_out_file, "\tst.param%s\t[%%out_retval], %%retval;\n",
+	 nvptx_ptx_type_from_mode (mode, false));
 
   return "ret;";
 }
@@ -1804,12 +1803,6 @@ nvptx_output_mov_insn (rtx dst, rtx src)
   machine_mode src_inner = (GET_CODE (src) == SUBREG
 			? GET_MODE (XEXP (src, 0)) : dst_mode);
 
-  if (REG_P (dst) && REGNO (dst) == NVPTX_RETURN_REGNUM && dst_mode == HImode)
-/* Special handling for the return register.  It's never really an
-   HI object, and only occurs as the destination of a move
-   insn.  */
-dst_inner = SImode;
-
   if (src_inner == dst_inner)
 return "%.\tmov%t0\t%0, %1;";
 
@@ -1841,8 +1834,7 @@ nvptx_output_call_insn (rtx_insn *insn,
   fprintf (asm_out_file, "\t{\n");
   if (result != NULL)
 fprintf (asm_out_file, "\t\t.param%s %%retval_in;\n",
-	 nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
-   false));
+	 nvptx_ptx_type_from_mode (GET_MODE (result), false));
 
   /* Ensure we have a ptx declaration in the output if necessary.  */
   if (GET_CODE (callee) == SYMBOL_REF)
Index: gcc/config/nvptx/nvptx.h
===
--- gcc/config/nvptx/nvptx.h	(revision 231639)
+++ gcc/config/nvptx/nvptx.h	(working copy)
@@ -90,9 +90,12 @@
 #define CALL_USED_REGISTERS\
   { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
 
-#define HARD_REGNO_NREGS(regno, mode)	((void)(regno), (void)(mode), 1)
-#define CANNOT_CHANGE_MODE_CLASS(M1, M2, CLS) ((CLS) == RETURN_REG)
-#define HARD_REGNO_MODE_OK(REG, MODE) nvptx_hard_regno_mode_ok (REG, MODE)
+#define HARD_REGNO_NREGS(REG, MODE)		\
+  ((void)(REG), (void)(MODE), 1)
+#define CANNOT_CHANGE_MODE_CLASS(M1, M2, CLS)	\
+  ((void)(M1), (void)(M2), (void)(CLS), true)
+#define HARD_REGNO_MODE_OK(REG, MODE)		\
+ ((void)(REG), (void)(MODE), true)
 
 /* Register Classes.  */
 
Index: gcc/config/nvptx/nvptx.md
===
--- gcc/config/nvptx/nvptx.md	(revision 231639)
+++ gcc/config/nvptx/nvptx.md	(working copy)
@@ -280,16 +280,6 @@
   ""
 {
   operands[1] = nvptx_maybe_convert_symbolic_operand (operands[1]);
-  /* Record the mode of the return register so that we can prevent
- later optimization passes from changing it.  */
-  if (REG_P (operands[0]) && REGNO (operands[0]) == NVPTX_RETURN_REGNUM
-  && cfun)
-{
-  if (cfun->machine->ret_reg_mode == VOIDmode)

Re: [build] Only support -gstabs on Mac OS X if assember supports it (PR target/67973)

2015-12-15 Thread Rainer Orth
Mike Stump  writes:

> On Dec 14, 2015, at 2:40 AM, Rainer Orth  
> wrote:
>> As described in PR PR target/67973, newer assemblers on Mac OS X, which
>> are based on LLVM instead of gas, don't support .stab* directives any
>> longer.  The following patch detects this situation and tries to fall
>> back to the older gas-based as if it is still accessible via as -Q.
>> 
>> Tested on x86_64-apple-darwin15.2.0 and as expected the -gstabs* tests
>> now pass.
>> 
>> However, I'm not really comfortable with this solution.
>
> When I proposed automagically adding -Q, it sounded like a good idea.  :-(
>
> Yeah, hard to disagree with your intuition.  If a future assembler had or
> added stabs that had or added all these features, it would come first on
> the path, and it would all just work out nicely with just a configure check
> to disable stabs if it didn’t work.  That simple check should be reliable
> and work well.

Right: I'm effectively keeping just the first configure test for .stabs
support in the assembler to enable or disable
DBX_DEBUG/DBX_DEBUGGING_INFO.  I'll post it later since ...

>> Initially, I
>> forgot to wrap the -Q option to as in %{gstabs*:...}, which lead to a
>> bootstrap failure: the gas- and LLVM-based assemblers differ in a
>> number of other ways
>
> Yeah, having the feature set be a dynamic property when our software
> decides on a static basis is bound to hurt.  Seems that the most likely patch
> would be to just turn off stabs in a way that the test suite disables the
> tests by itself, or to just quiet the test suite.

... testing revealed another instance of static assumptions which hurts
us now: while support for -gstabs* is checked for dynamically in
lib/gcc-dg.exp and lib/gfortran-dg.exp for the debug.exp tests, there
are a couple of testcases that use -gstabs* unconditionally, but have a
hardcoded list of targets that support those options.  I'll introduce a
new effective-target keyword (simply checking if -gstabs is accepted
should be enough) to also perform this test dynamically and repost once
it's tested.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] gcc: read -fdebug-prefix-map OLD from environment (improved reproducibility)

2015-12-15 Thread Daniel Kahn Gillmor
On Tue 2015-12-15 07:19:30 -0500, Bernd Schmidt wrote:
> On 12/11/2015 08:14 PM, Daniel Kahn Gillmor wrote:
>> Here's a one-liner patch for this approach (also at
>> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37007):
>
> I think that one-liner is fine, even for now.

great!  what would be the next steps for getting this applied upstream?

Thanks for your review,

--dkg


Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition

2015-12-15 Thread Bernd Schmidt

On 12/14/2015 01:25 PM, Kyrill Tkachov wrote:

PR 68651 is a code quality regression for GCC 5 and GCC 6 that was
introduced due to updated rtx costs
for -mcpu=cortex-a53 that affected expansion.  The costs changes were
correct (to the extent that rtx
costs have any meaning) and I think this is a deficiency in combine that
should be fixed.


Thinking a bit more about this, I'm actually not sure that this isn't a 
backend problem. IMO the costs could and maybe very well should be 
represented such that a left shift by 1 and an add have the same cost, 
and the insn pattern for the shift should emit the add if it is cheaper. 
If there are multiple ways of expressing an operation, then how it is 
represented in RTL is essentially irrelevant to the question of how much 
it costs.



Bernd


Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition

2015-12-15 Thread Richard Earnshaw
On 15/12/15 14:22, Bernd Schmidt wrote:
> On 12/14/2015 01:25 PM, Kyrill Tkachov wrote:
>> PR 68651 is a code quality regression for GCC 5 and GCC 6 that was
>> introduced due to updated rtx costs
>> for -mcpu=cortex-a53 that affected expansion.  The costs changes were
>> correct (to the extent that rtx
>> costs have any meaning) and I think this is a deficiency in combine that
>> should be fixed.
> 
> Thinking a bit more about this, I'm actually not sure that this isn't a
> backend problem. IMO the costs could and maybe very well should be
> represented such that a left shift by 1 and an add have the same cost,
> and the insn pattern for the shift should emit the add if it is cheaper.
> If there are multiple ways of expressing an operation, then how it is
> represented in RTL is essentially irrelevant to the question of how much
> it costs.
> 
> 
> Bernd

That might be OK if we didn't have to have standard canonicalization,
but I think handling all these special cases would make it incredibly
complex and fragile to work out the accurate costs of all these patterns.

It's also possible that this would explicitly break some other
optimization passes (such as the way in which multiplies are synthesised
with shift/add operations).

R.


Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 03:33 PM, Richard Earnshaw wrote:


It's also possible that this would explicitly break some other
optimization passes (such as the way in which multiplies are synthesised
with shift/add operations).


How so? IMO such a change would make cost calculations more accurate and 
should help rather than break anything.



Bernd



Re: [PR 66616] Check for thunks when adding extra constants to clones

2015-12-15 Thread H.J. Lu
On Fri, Dec 11, 2015 at 7:27 AM, Martin Jambor  wrote:
> Hi,
>
> PR 66616 happens because in find_more_scalar_values_for_callers_subset
> we do not do the same thunk checks like we do in
> propagate_constants_accross_call.  I am in the process of
> bootstrapping and testing the following patch to fix it.  OK if it
> passes?
>
> Thanks,
>
> Martin
>
>
> 2015-12-11  Martin Jambor  
>
> PR ipa/66616
> * ipa-cp.c (propagate_constants_accross_call): Move thunk check...
> (call_passes_through_thunk_p): ...here.
> (find_more_scalar_values_for_callers_subset): Perform thunk checks
> like propagate_constants_accross_call does.
>
> testsuite/
> * g++.dg/ipa/pr66616.C: New test.
> ---

I got

FAIL: g++.dg/ipa/pr66616.C  -std=gnu++11 execution test
FAIL: g++.dg/ipa/pr66616.C  -std=gnu++14 execution test
FAIL: g++.dg/ipa/pr66616.C  -std=gnu++98 execution test

on x86-64.


-- 
H.J.


C PATCH for c/68907 (bogus warning with _Atomic predecrement)

2015-12-15 Thread Marek Polacek
Here, missing TREE_NO_WARNING on an artificial variable caused bogus
-Wunused-value warning.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-15  Marek Polacek  

PR c/68907
* c-typeck.c (build_atomic_assign): Set TREE_NO_WARNING on an
artificial decl.

* gcc.dg/pr68907.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index b691072..9d6c604 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -3814,6 +3814,7 @@ build_atomic_assign (location_t loc, tree lhs, enum 
tree_code modifycode,
   newval = create_tmp_var_raw (nonatomic_lhs_type);
   newval_addr = build_unary_op (loc, ADDR_EXPR, newval, 0);
   TREE_ADDRESSABLE (newval) = 1;
+  TREE_NO_WARNING (newval) = 1;
 
   loop_decl = create_artificial_label (loc);
   loop_label = build1 (LABEL_EXPR, void_type_node, loop_decl);
diff --git gcc/testsuite/gcc.dg/pr68907.c gcc/testsuite/gcc.dg/pr68907.c
index e69de29..de1c237 100644
--- gcc/testsuite/gcc.dg/pr68907.c
+++ gcc/testsuite/gcc.dg/pr68907.c
@@ -0,0 +1,14 @@
+/* PR c/68907 */
+/* { dg-do compile } */
+/* { dg-options "-std=c11 -Wpedantic -Wall" } */
+
+_Atomic int a;
+
+void
+fn (void)
+{
+  ++a;
+  a++;
+  --a;
+  a--;
+}

Marek


Re: C PATCH for c/68907 (bogus warning with _Atomic predecrement)

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 03:59 PM, Marek Polacek wrote:

Here, missing TREE_NO_WARNING on an artificial variable caused bogus
-Wunused-value warning.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-15  Marek Polacek  

PR c/68907
* c-typeck.c (build_atomic_assign): Set TREE_NO_WARNING on an
artificial decl.

* gcc.dg/pr68907.c: New test.


This looks ok.


Bernd



RE: [Patch] Fix for MIPS PR target/65604

2015-12-15 Thread Moore, Catherine


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Steve Ellcey
> Sent: Wednesday, December 09, 2015 1:34 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Moore, Catherine; matthew.fort...@imgtec.com
> Subject: [Patch] Fix for MIPS PR target/65604
> 
> This is a MIPS patch to make mips_output_division obey the -fno-delayed-
> branch flag.  Right now, with mips1 and -mcheck-zero-division, the division
> instruction is put into the bne delay slot even when -fno-delayed-branch is
> specified.  This change uses a similar strategy to MIPS16 where we do the
> division first and then do the zero test while the division is being 
> calculated.
> Tested with mips1 runs and by inspecting the code that is output.
> 
> OK to checkin?
> 
> Steve Ellcey
> sell...@imgtec.com
> 
> 
> 2015-12-09  Steve Ellcey  
> 
>   PR target/65604
>   * config/mips/mips.c (mips_output_division): Check
> flag_delayed_branch.
> 

Hi Steve, the patch is OK.  Will you please add a test case and repost?
Thanks,
Catherine



Re: [PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-12-15 Thread Matthew Wahab

On 10/12/15 10:45, Ramana Radhakrishnan wrote:

On Tue, Dec 8, 2015 at 7:45 AM, Christian Bruel  wrote:

Hi Matthew,


On 26/11/15 16:01, Matthew Wahab wrote:


Hello,

This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
defined when the instructions are available, as it is when
-march=armv8.1-a is enabled with suitable fpu options.

gcc/
2015-11-26  Matthew Wahab  

   * config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_QRDMX.





+  if (TARGET_NEON_RDMA)
+builtin_define ("__ARM_FEATURE_QRDMX");
+

Since it depends on TARGET_NEON, could you please use

   def_or_undef_macro (pfile, "__ARM_FEATURE_QRDMX", TARGET_NEON_RDMA);

instead ?


I think that's what it should be -

OK with that fixed.


Attached an updated patch using the def_or_undef macro. It also removes some trailing 
whitespace in that part of the code.


Still ok?
Matthew

gcc/
2015-12-14  Matthew Wahab  

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_QRDMX.  Clean up some trailing whitespace.


From 8cce5cd7b6d89c49dcf694a5c72ab0ed7c26fe20 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 13:25:09 +0100
Subject: [PATCH 4/7] [ARM] Add __ARM_FEATURE_QRDMX

---
 gcc/config/arm/arm-c.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 7dee28e..a980ed8 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -62,19 +62,21 @@ static void
 arm_cpu_builtins (struct cpp_reader* pfile)
 {
   def_or_undef_macro (pfile, "__ARM_FEATURE_DSP", TARGET_DSP_MULTIPLY);
-  def_or_undef_macro (pfile, "__ARM_FEATURE_QBIT", TARGET_ARM_QBIT); 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_QBIT", TARGET_ARM_QBIT);
   def_or_undef_macro (pfile, "__ARM_FEATURE_SAT", TARGET_ARM_SAT);
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRYPTO", TARGET_CRYPTO);
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_UNALIGNED", unaligned_access);
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_QRDMX", TARGET_NEON_RDMA);
+
   if (TARGET_CRC32)
 builtin_define ("__ARM_FEATURE_CRC32");
 
-  def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT); 
+  def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
   if (TARGET_ARM_FEATURE_LDREX)
-builtin_define_with_int_value ("__ARM_FEATURE_LDREX", 
+builtin_define_with_int_value ("__ARM_FEATURE_LDREX",
    TARGET_ARM_FEATURE_LDREX);
   else
 cpp_undef (pfile, "__ARM_FEATURE_LDREX");
-- 
2.1.4



Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-12-15 Thread Matthew Wahab

On 10/12/15 10:49, Ramana Radhakrishnan wrote:

On Mon, Dec 7, 2015 at 4:10 PM, Matthew Wahab  
wrote:

On 27/11/15 17:11, Matthew Wahab wrote:

On 27/11/15 13:44, Christophe Lyon wrote:

On 26/11/15 16:02, Matthew Wahab wrote



This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM tests to
specify targets and to set up command line options. It builds on the
ARMv8.1 target support added for AArch64 tests, partly reworking that
support to take into account the different configurations that tests may
be run under.

[..]

# Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.  Record the command
+# line options that needed.


s/that//


Fixed in attached patch.


Can you also make sure doc/sourcebuild.texi is updated for this helper function 
?
If not documented, it would be good to add the documentation for the same while 
you
are here.


Done, I've listed them as ARM attributes based on their names.

Tested this and the other update patch (#4/7) for arm-none-eabi with 
cross-compiled
check-gcc by running the gcc.target/aarch64/advsimd-intrinsics with and without 
ARMv8.1 enabled as a test target.


Ok?
Matthew

testsuite/
2015-12-14  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
the command line options.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
test to allow ARM targets.  Select and record a working set of
command line options.
(check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
targets.

gcc/
2015-12-14  Matthew Wahab  

* doc/sourcebuild.texi (ARM-specific attributes): Add
"arm_v8_1a_neon_ok" and "arm_v8_1a_neon_hw".

From d6a4dfd89cfb29aeaa0e2d58ac9d8271b31879c1 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 9 Oct 2015 17:38:12 +0100
Subject: [PATCH 5/7] [Testsuite] Support ARMv8.1 NEON on ARM.

---
 gcc/doc/sourcebuild.texi  |  9 ++
 gcc/testsuite/lib/target-supports.exp | 60 ++-
 2 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 61de4a5..cd49e6d8 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1582,6 +1582,15 @@ Some multilibs may be incompatible with these options.
 ARM target supports @code{-mfpu=neon-fp-armv8 -mfloat-abi=softfp}.
 Some multilibs may be incompatible with these options.
 
+@item arm_v8_1a_neon_ok
+ARM target supports options to generate ARMv8.1 Adv.SIMD instructions.
+Some multilibs may be incompatible with these options.
+
+@item arm_v8_1a_neon_hw
+ARM target supports executing ARMv8.1 Adv.SIMD instructions.  Some
+multilibs may be incompatible with the options needed.  Implies
+arm_v8_1a_neon_ok.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8d28b23..a0de314 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2825,14 +2825,15 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
-# Add the options needed for ARMv8.1 Adv.SIMD.
+# Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
+# options for AArch64 and for ARM.
 
 proc add_options_for_arm_v8_1a_neon { flags } {
-if { [istarget aarch64*-*-*] } {
-	return "$flags -march=armv8.1-a"
-} else {
+if { ! [check_effective_target_arm_v8_1a_neon_ok] } {
 	return "$flags"
 }
+global et_arm_v8_1a_neon_flags
+return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
 proc add_options_for_arm_crc { flags } {
@@ -3280,17 +3281,33 @@ proc check_effective_target_arm_neonv2_hw { } {
 }
 
 # Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.  Record the command
+# line options needed.
 
 proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
-if { ![istarget aarch64*-*-*] } {
-	return 0
+global et_arm_v8_1a_neon_flags
+set et_arm_v8_1a_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+	return 0;
 }
-return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
-	#if !defined (__ARM_FEATURE_QRDMX)
-	#error "__ARM_FEATURE_QRDMX not defined"
-	#endif
-} [add_options_for_arm_v8_1a_neon ""]]
+
+# Iterate through sets of options to find the compiler flags that
+# need to be added to the -march option.  Start with the empty set
+# since AArch64 only needs the -march setting.
+foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \

Re: [PATCH] gcc: read -fdebug-prefix-map OLD from environment (improved reproducibility)

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 03:14 PM, Daniel Kahn Gillmor wrote:

On Tue 2015-12-15 07:19:30 -0500, Bernd Schmidt wrote:

On 12/11/2015 08:14 PM, Daniel Kahn Gillmor wrote:

Here's a one-liner patch for this approach (also at
https://gcc.gnu.org/bugzilla/attachment.cgi?id=37007):


I think that one-liner is fine, even for now.


great!  what would be the next steps for getting this applied upstream?


I'm guessing you don't have an account so I'll bootstrap and test it and 
then commit. (with an extra testcase, as below - adapted from another 
testcase in the debug/dwarf2 directory).



Bernd

* gcc.dg/debug/dwarf2/prod-options.c: New file.

/* Verify that the DW_AT_producer does not contain certain compiler options
   such as -fdebug-prefix-map=; this is undesirable since path names make
   the build not reproducible.  Other skipped options could be tested here
   as well.  */
/* { dg-do compile } */
/* { dg-options "-O2 -gdwarf -dA -fdebug-prefix-map=a=b" } */
/* { dg-final { scan-assembler "DW_AT_producer: \"GNU C" } } */
/* { dg-final { scan-assembler-not "debug-prefix-map" } } */

void func (void)
{
}


Re: [COMMITTED] Add myself to MAINTAINERS (Write After Approval)

2015-12-15 Thread Alessandro Fanfarillo
Sorry about that.

Committed revision 231657

Index: ChangeLog
===
--- ChangeLog(revision 231656)
+++ ChangeLog(working copy)
@@ -1,4 +1,4 @@
-2015-12-15  Alessandro Fanfarillo 
+2015-12-15  Alessandro Fanfarillo  

 * MAINTAINERS (Write After Approval): Add myself.

2015-12-15 14:09 GMT+01:00 Jakub Jelinek :
> On Tue, Dec 15, 2015 at 02:07:40PM +0100, Alessandro Fanfarillo wrote:
>> I've added myself to Write After Approval maintainers.
>
>> --- ChangeLog(revision 231646)
>> +++ ChangeLog(working copy)
>> @@ -1,3 +1,7 @@
>> +2015-12-15  Alessandro Fanfarillo 
>
> Two spaces before < instead of one.
>
> Jakub


Re: [PATCH][LTO,ARM] Fix vector TYPE_MODE in streaming-out

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 04:09 PM, Christian Bruel wrote:

in "normal" mode, the TYPE_MODE for the vector_type __simd64_int8_t is set
to V8QImode by arm_vector_mode_supported_p during the builtins type
initializations, thanks to TARGET_NEON being set by the global flags.

Now, in LTO mode the streamer writes the information for this
vector_type as a scalar DImode, causing ICEs during arm_expand_builtin.
The root cause is that the streamer-out uses TYPE_MODE in a
context where the target_flags are not known, so TARGET_NEON returns false.

The streamer-in will then read the wrong mode, which propagates to
the back-end.



 static void
 pack_ts_type_common_value_fields (struct bitpack_d *bp, tree expr)
 {
-  bp_pack_machine_mode (bp, TYPE_MODE (expr));
+  bp_pack_machine_mode (bp, expr->type_common.mode);


This looks sensible given that tree-streamer-in uses SET_TYPE_MODE, 
which just writes expr->type_common.mode.


Make a new macro TYPE_MODE_RAW for this and I think the patch is ok 
(although there's precedent for direct access in vector_type_mode, but I 
think that's just bad).



Bernd


Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition

2015-12-15 Thread Kyrill Tkachov

Hi Bernd,

On 15/12/15 14:22, Bernd Schmidt wrote:

On 12/14/2015 01:25 PM, Kyrill Tkachov wrote:

PR 68651 is a code quality regression for GCC 5 and GCC 6 that was
introduced due to updated rtx costs
for -mcpu=cortex-a53 that affected expansion.  The costs changes were
correct (to the extent that rtx
costs have any meaning) and I think this is a deficiency in combine that
should be fixed.


Thinking a bit more about this, I'm actually not sure that this isn't a backend problem. IMO the costs could and maybe very well should be represented such that a left shift by 1 and an add have the same cost, and the insn pattern for the 
shift should emit the add if it is cheaper. If there are multiple ways of expressing an operation, then how it is represented in RTL is essentially irrelevant to the question of how much it costs.




Then for the shift pattern in the MD file we'd have to dynamically select the 
scheduling type depending on whether or not
the shift amount is 1 and the costs line up?
Sounds like going out of our way to work around the fact that either combine or 
recog does not understand equivalent forms
of instructions.

I'd be more inclined to follow your suggestion in your first reply (teaching 
genrecog about equivalent patterns).
However, I think that would require a bit more involved surgery in 
recog/combine whereas this approach
reuses the existing change_zero_ext approach which I feel is less intrusive at 
this stage.

The price we pay when trying these substitutions is an iteration over the rtx 
with FOR_EACH_SUBRTX_PTR.
recog gets called only if that iteration actually performed a substitution of x + x 
into x << 1.
Is that too high a price to pay? (I'm not familiar with the performance 
characteristics of
the FOR_EACH_SUBRTX machinery)

Thanks,
Kyrill



Bernd





Re: RFA (hash-*): PATCH for c++/68309

2015-12-15 Thread Jason Merrill

On 12/14/2015 07:49 PM, Trevor Saunders wrote:


+  hash_map (const hash_map &h, bool ggc = false,
+   bool gather_mem_stats = true CXX_MEM_STAT_INFO)


sorry about the late response, but wouldn't it be better to make this
and the hash_table constructor explicit?  It's probably less important
than other constructors, but is there a reasonable use for taking or
returning them by value?


Makes sense; done.

Jason




[PATCH] C FE: use correct location range for static assertions

2015-12-15 Thread David Malcolm
When issuing diagnostics for _Static_assert, we currently ignore the
location/range of the asserted expression, and instead use the
location/range of the first token within it, which can be
incorrect for compound expressions:

error: expression in static assertion is not constant
   _Static_assert (param > 0, "message");
   ^

This patch changes things to use EXPR_LOC_OR_LOC, so we use the
location/range of the expression if it has one, falling back to the old
behavior if it doesn't, giving:

error: expression in static assertion is not constant
   _Static_assert (param > 0, "message");
   ~~^~~

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu

OK for trunk in stage 3?

[a much earlier version of this was posted as part of:
"[PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics"
  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00745.html
but this patch bears no resemblance apart from the testcase, due to
changes in representation]

gcc/c/ChangeLog:
* c-parser.c (c_parser_static_assert_declaration_no_semi): Use the
expression location, falling back on the first token location,
rather than always using the latter.

gcc/testsuite/ChangeLog:
* gcc.dg/diagnostic-range-static-assert.c: New test case.
---
 gcc/c/c-parser.c   |  3 ++-
 .../gcc.dg/diagnostic-range-static-assert.c| 24 ++
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/diagnostic-range-static-assert.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 124c30b..5c32f45 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -2097,8 +2097,9 @@ c_parser_static_assert_declaration_no_semi (c_parser 
*parser)
   c_parser_consume_token (parser);
   if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
 return;
-  value_loc = c_parser_peek_token (parser)->location;
+  location_t value_tok_loc = c_parser_peek_token (parser)->location;
   value = c_parser_expr_no_commas (parser, NULL).value;
+  value_loc = EXPR_LOC_OR_LOC (value, value_tok_loc);
   parser->lex_untranslated_string = true;
   if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
 {
diff --git a/gcc/testsuite/gcc.dg/diagnostic-range-static-assert.c 
b/gcc/testsuite/gcc.dg/diagnostic-range-static-assert.c
new file mode 100644
index 000..6f75476
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/diagnostic-range-static-assert.c
@@ -0,0 +1,24 @@
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+void test_nonconst_static_assert (int param)
+{
+  int local = 0;
+
+  _Static_assert (param > 0, "message"); /* { dg-error "expression in static 
assertion is not constant" } */
+/* { dg-begin-multiline-output "" }
+   _Static_assert (param > 0, "message");
+   ~~^~~
+{ dg-end-multiline-output "" } */
+
+  _Static_assert (param, "message"); /* { dg-error "expression in static 
assertion is not constant" } */
+/* { dg-begin-multiline-output "" }
+   _Static_assert (param, "message");
+   ^
+{ dg-end-multiline-output "" } */
+
+  _Static_assert (local, "message"); /* { dg-error "expression in static 
assertion is not constant" } */
+/* { dg-begin-multiline-output "" }
+   _Static_assert (local, "message");
+   ^
+{ dg-end-multiline-output "" } */
+}
-- 
1.8.5.3



[PATCH] C++ FE: Show both locations in string literal concatenation error

2015-12-15 Thread David Malcolm
We can use rich_location and the new diagnostic_show_locus to print
*both* locations when complaining about a bogus string concatenation
in the C++ FE, giving e.g.:

test.C:3:24: error: unsupported non-standard concatenation of string literals
 const void *s = u8"a"  u"b";
 ~  ^~~~

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu

OK for trunk for gcc 6?

(an earlier version of this was posted as part of:
"[PATCH 10/22] C++ FE: Use token ranges for various diagnostics"
  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00730.html
though the implementation has changed slightly)

gcc/cp/ChangeLog:
* parser.c (cp_parser_string_literal): Convert non-standard
concatenation error to directly use a rich_location, and
use that to add the location of the first literal to the
diagnostic.

gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/string-literal-concat.C: New test case.
---
 gcc/cp/parser.c | 16 +++-
 gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C | 13 +
 2 files changed, 24 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a420cf1..5247a5e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -3815,13 +3815,12 @@ cp_parser_string_literal (cp_parser *parser, bool 
translate, bool wide_ok,
 }
   else
 {
-  location_t last_tok_loc;
+  location_t last_tok_loc = tok->location;
   gcc_obstack_init (&str_ob);
   count = 0;
 
   do
{
- last_tok_loc = tok->location;
  cp_lexer_consume_token (parser->lexer);
  count++;
  str.text = (const unsigned char *)TREE_STRING_POINTER (string_tree);
@@ -3853,13 +3852,20 @@ cp_parser_string_literal (cp_parser *parser, bool 
translate, bool wide_ok,
  if (type == CPP_STRING)
type = curr_type;
  else if (curr_type != CPP_STRING)
-   error_at (tok->location,
- "unsupported non-standard concatenation "
- "of string literals");
+{
+  rich_location rich_loc (line_table, tok->location);
+  rich_loc.add_range (last_tok_loc, get_finish (last_tok_loc),
+  false);
+  error_at_rich_loc (&rich_loc,
+ "unsupported non-standard concatenation "
+ "of string literals");
+}
}
 
  obstack_grow (&str_ob, &str, sizeof (cpp_string));
 
+ last_tok_loc = tok->location;
+
  tok = cp_lexer_peek_token (parser->lexer);
  if (cpp_userdef_string_p (tok->type))
{
diff --git a/gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C 
b/gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C
new file mode 100644
index 000..2819b25
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/string-literal-concat.C
@@ -0,0 +1,13 @@
+/* { dg-options "-fdiagnostics-show-caret -std=c++11" } */
+
+const void *s = u8"a"  u"b";  // { dg-error "non-standard concatenation" }
+/* { dg-begin-multiline-output "" }
+ const void *s = u8"a"  u"b";
+ ~  ^~~~
+   { dg-end-multiline-output "" } */
+
+const void *s2 = u"a"  u"b"  u8"c";  // { dg-error "non-standard 
concatenation" }
+/* { dg-begin-multiline-output "" }
+ const void *s2 = u"a"  u"b"  u8"c";
+  ^
+  { dg-end-multiline-output "" } */
-- 
1.8.5.3



Re: [PATCH] gcc: read -fdebug-prefix-map OLD from environment (improved reproducibility)

2015-12-15 Thread Daniel Kahn Gillmor
On Tue 2015-12-15 11:08:23 -0500, Bernd Schmidt wrote:
> I'm guessing you don't have an account

I don't have an account (though i'd be happy to set one up if you point
me toward the process).

> so I'll bootstrap and test it and then commit.

fwiw, I've tested it myself, and can confirm that it does work.

> (with an extra testcase, as below - adapted from another testcase in
> the debug/dwarf2 directory).

Thanks also for this example, it points me in the right direction for
future work.

Regards,

   --dkg


Re: [RFC] Combine vectorized loops with its scalar remainder.

2015-12-15 Thread Yuri Rumyantsev
Hi Richard,

I re-designed the patch to determine the ability of loop masking on the fly
during vectorization analysis and to invoke it after the loop transformation.
A test-case is also provided.

What is your opinion?

Thanks.
Yuri.

ChangeLog::
2015-12-15  Yuri Rumyantsev  

* config/i386/i386.c (ix86_builtin_vectorization_cost): Add handling
of new cases.
* config/i386/i386.h (TARGET_INCREASE_MASK_STORE_COST): Add new target
macros.
* config/i386/x86-tune.def (X86_TUNE_INCREASE_MASK_STORE_COST): New
tuner.
* params.def (PARAM_VECT_COST_INCREASE_THRESHOLD): New parameter.
* target.h (enum vect_cost_for_stmt): Add new elements.
* targhooks.c (default_builtin_vectorization_cost): Extend switch for
new enum elements.
* tree-vect-loop.c : Include 3 header files.
(vect_analyze_loop_operations): Add new fields initialization and
resetting, add computation of profitability for masking loop for
epilog.
(vectorizable_reduction): Determine ability of reduction masking
and compute its cost.
(vect_can_build_vector_iv): New function.
(vect_generate_tmps_on_preheader): Adjust computation of ratio depending
on epilogue generation.
(gen_vec_iv_for_masking): New function.
(gen_vec_mask_for_loop): Likewise.
(mask_vect_load_store): Likewise.
(convert_reductions_for_masking): Likewise.
(fix_mask_for_masked_ld_st): Likewise.
(mask_loop_for_epilogue): Likewise.
(vect_transform_loop): Do not perform loop masking if it requires
peeling for gaps, add check on ability of masking the loop, turn off
loop peeling if loop masking is performed, save recomputed NITERS to the
corresponding field of loop_vec_info, invoke mask_loop_for_epilogue
after vectorization if masking is possible.
* tree-vect-stmts.c : Include tree-ssa-loop-ivopts.h.
(can_mask_load_store): New function.
(vectorizable_mask_load_store): Determine ability of load/store
masking and compute its cost.
(vectorizable_load):  Likewise.
* tree-vectorizer.h (additional_loop_body_cost): New field of
loop_vec_info.
(mask_main_loop_for_epilogue): Likewise.
(LOOP_VINFO_ADDITIONAL_LOOP_BODY_COST): New macros.
(LOOP_VINFO_MASK_MAIN_LOOP_FOR_EPILOGUE): Likewise.

gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-mask-loop_for_epilogue1.c: New test.

2015-11-30 18:03 GMT+03:00 Yuri Rumyantsev :
> Richard,
>
> Thanks a lot for your detailed comments!
>
> Few words about 436.cactusADM gain. The loop which was transformed for
> avx2 is very huge and this is the last inner-most loop in routine
> Bench_StaggeredLeapfrog2 (StaggeredLeapfrog2.F #366). If you don't
> have sources, let me know.
>
> Yuri.
>
> 2015-11-27 16:45 GMT+03:00 Richard Biener :
>> On Fri, Nov 13, 2015 at 11:35 AM, Yuri Rumyantsev  wrote:
>>> Hi Richard,
>>>
>>> Here is an updated version of the patch which (1) is in sync with trunk
>>> compiler and (2) contains a simple cost model to estimate profitability
>>> of scalar epilogue elimination. The part related to vectorization of
>>> loops with small trip count is in the process of being developed. Note that
>>> the implemented cost model was not tuned well for HASWELL and KNL but we
>>> got ~6% speed-up on 436.cactusADM from the spec2006 suite for HASWELL.
>>
>> Ok, so I don't know where to start with this.
>>
>> First of all while I wanted to have the actual stmt processing to be
>> as post-processing
>> on the vectorized loop body I didn't want to have this competely separated 
>> from
>> vectorizing.
>>
>> So, do combine_vect_loop_remainder () from vect_transform_loop, not by 
>> iterating
>> over all (vectorized) loops at the end.
>>
>> Second, all the adjustments of the number of iterations for the vector
>> loop should
>> be integrated into the main vectorization scheme as should determining the
>> cost of the predication.  So you'll end up adding a
>> LOOP_VINFO_MASK_MAIN_LOOP_FOR_EPILOGUE flag, determined during
>> cost analysis and during code generation adjust vector iteration computation
>> accordingly and _not_ generate the epilogue loop (or wire it up correctly in
>> the first place).
>>
>> The actual stmt processing should then still happen in a similar way as you 
>> do.
>>
>> So I'm going to comment on that part only as I expect the rest will look a 
>> lot
>> different.
>>
>> +/* Generate induction_vector which will be used to mask evaluation.  */
>> +
>> +static tree
>> +gen_vec_induction (loop_vec_info loop_vinfo, unsigned elem_size, unsigned 
>> size)
>> +{
>>
>> please make use of create_iv.  Add more comments.  I reverse-engineered
>> that you add a { { 0, ..., vf }, +, {vf, ... vf } } IV which you use
>> in gen_mask_for_remainder
>> by comparing it against { niter, ..., niter }.
>>
>> +  gsi = gsi_after_labels (loop->header);
>> +  niters = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
>> +  ? LOOP_VINFO_NITERS (loop_vinfo)
>> +  : LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo);
>>
>> that's either wrong or unnecessary.  if ! peeling for alignment
>> loop-vinfo-niters
>> is equal to loop-vinfo-niters-unchanged.
>>
>> +  ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
>> 

Re: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets

2015-12-15 Thread Thomas Schwinge
Hi!

On Wed, 9 Dec 2015 17:56:13 +0800, "Thomas Preud'homme" 
 wrote:
> c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets due to 
> -fcilkplus needing -pthread, which is not available for those targets. This 
> patch solves the issue by adding a condition to the cilkplus effective 
> target that compiling with -fcilkplus succeeds, and by requiring cilkplus as 
> an effective target for the attr-simd-3.c testcase.

> PR testsuite/68629
> * lib/target-supports.exp (check_effective_target_cilkplus): Also
> check that compiling with -fcilkplus does not give an error.
> * c-c++-common/attr-simd-3.c: Require cilkplus effective target.

> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } {
>  if { [istarget avr-*-*] } {
>   return 0;
>  }
> -return 1
> +return [ check_no_compiler_messages_nocache fcilkplus_available 
> executable {
> + #ifdef __cplusplus
> + extern "C"
> + #endif
> + int dummy;
> + } "-fcilkplus" ]
>  }
>  
>  proc check_linker_plugin_available { } {
> 
> 
> Testsuite shows no regression when run with
>   + an arm-none-eabi GCC cross-compiler targeting Cortex-M3
>   + a bootstrapped x86_64-linux-gnu GCC native compiler

With this committed in r231605, I now see all gcc/testsuite/ Cilk+
testing disappear for "configure && make && make check", because of:

Executing on host: [...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ 
fcilkplus_available14337.c  -fno-diagnostics-show-caret 
-fdiagnostics-color=never  -fcilkplus  -lm -o fcilkplus_available14337.exe   
 (timeout = 300)
spawn [...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ 
fcilkplus_available14337.c -fno-diagnostics-show-caret 
-fdiagnostics-color=never -fcilkplus -lm -o fcilkplus_available14337.exe
xgcc: error: libcilkrts.spec: No such file or directory
compiler exited with status 1

Can you confirm that in your build/test tree, the compiler is picking up
the build-tree libcilkrts, and not the one from /usr/lib/ (or similar)?

Long ago, in r208889, a similar problem was diagnosed and fixed by
Rainer and Tobias (CCed just in case), so I wonder what broke now?


Grüße
 Thomas




Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-12-15 Thread James Greenhalgh
On Tue, Dec 15, 2015 at 10:32:50AM +, Wilco Dijkstra wrote:
> ping
> 
> > -Original Message-
> > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > Sent: 17 November 2015 18:36
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH 2/4 v2][AArch64] Add support for FCCMP
> > 
> > (v2 version removes 4 enums)
> > 
> > This patch adds support for FCCMP. This is trivial with the new CCMP 
> > representation - remove the restriction of FP in ccmp.c and add
> > FCCMP patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.
> > 
> > OK for commit?

The AArch64 code-generation parts of this are OK, though please wait for
an OK on the ccmp.c changes before committing, and please revisit the
testcase.

Sorry that this took a long time to get to.

> > 
> > ChangeLog:
> > 2015-11-18  Wilco Dijkstra  
> > 
> > * gcc/ccmp.c (ccmp_candidate_p): Remove integer-only restriction.

Drop the gcc/ from the paths here and below.

> > * gcc/config/aarch64/aarch64.md (fccmp): New pattern.
> > (fccmpe): Likewise.
> > (fcmp): Rename to fcmp and globalize pattern.
> > (fcmpe): Likewise.
> > * gcc/config/aarch64/aarch64.c (aarch64_gen_ccmp_first): Add FP support.
> > (aarch64_gen_ccmp_next): Add FP support.
> > 
> > gcc/testsuite/
> > * gcc.target/aarch64/ccmp_1.c: New testcase.

This testcase doesn't look very helpful to me. You only end up checking if
*any* of the tests compile to fccmp/fccmpe rather than *all* the tests. Should
this use a scan-assembler-times directive to count the number of times a
particular instruction appears?
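By way of illustration, a hypothetical testcase fragment (not the actual ccmp_1.c; the options, the function body, and the count of 1 are invented) contrasting the two directives:

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */

int
f1 (float a, float b)
{
  return a > 1.0f && b < 2.0f;
}

/* Passes if fccmp appears anywhere in the assembly output, even once
   across all functions:  */
/* { dg-final { scan-assembler "fccmp\t" } } */

/* Passes only if fccmp appears exactly this many times, so a regression
   in any one function is caught (the count here is hypothetical):  */
/* { dg-final { scan-assembler-times "fccmp\t" 1 } } */
```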

Thanks,
James

> > ---
> >  gcc/ccmp.c|  6 ---
> >  gcc/config/aarch64/aarch64.c  | 24 +
> >  gcc/config/aarch64/aarch64.md | 34 -
> >  gcc/testsuite/gcc.target/aarch64/ccmp_1.c | 84 
> > +++
> >  4 files changed, 140 insertions(+), 8 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_1.c
> > 
> > diff --git a/gcc/ccmp.c b/gcc/ccmp.c
> > index 58ac126..3698a7d 100644
> > --- a/gcc/ccmp.c
> > +++ b/gcc/ccmp.c
> > @@ -112,12 +112,6 @@ ccmp_candidate_p (gimple *g)
> >|| gimple_bb (gs0) != gimple_bb (g))
> >  return false;
> > 
> > -  if (!(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0)))
> > -   || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0
> > -  || !(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))
> > -  || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)
> > -return false;
> > -
> >tcode0 = gimple_assign_rhs_code (gs0);
> >tcode1 = gimple_assign_rhs_code (gs1);
> >if (TREE_CODE_CLASS (tcode0) == tcc_comparison
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index c8bee3b..db4d190 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -12398,6 +12398,18 @@ aarch64_gen_ccmp_first (rtx *prep_seq, rtx 
> > *gen_seq,
> >icode = CODE_FOR_cmpdi;
> >break;
> > 
> > +case SFmode:
> > +  cmp_mode = SFmode;
> > +  cc_mode = aarch64_select_cc_mode ((rtx_code) code, op0, op1);
> > +  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpesf : CODE_FOR_fcmpsf;
> > +  break;
> > +
> > +case DFmode:
> > +  cmp_mode = DFmode;
> > +  cc_mode = aarch64_select_cc_mode ((rtx_code) code, op0, op1);
> > +  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpedf : CODE_FOR_fcmpdf;
> > +  break;
> > +
> >  default:
> >end_sequence ();
> >return NULL_RTX;
> > @@ -12461,6 +12473,18 @@ aarch64_gen_ccmp_next (rtx *prep_seq, rtx 
> > *gen_seq, rtx prev, int cmp_code,
> >icode = CODE_FOR_ccmpdi;
> >break;
> > 
> > +case SFmode:
> > +  cmp_mode = SFmode;
> > +  cc_mode = aarch64_select_cc_mode ((rtx_code) cmp_code, op0, op1);
> > +  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpesf : CODE_FOR_fccmpsf;
> > +  break;
> > +
> > +case DFmode:
> > +  cmp_mode = DFmode;
> > +  cc_mode = aarch64_select_cc_mode ((rtx_code) cmp_code, op0, op1);
> > +  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpedf : CODE_FOR_fccmpdf;
> > +  break;
> > +
> >  default:
> >end_sequence ();
> >return NULL_RTX;
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index fab65c6..7d728b5 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -279,6 +279,36 @@
> >[(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
> >  )
> > 
> > +(define_insn "fccmp"
> > +  [(set (match_operand:CCFP 1 "cc_register" "")
> > +   (if_then_else:CCFP
> > + (match_operator 4 "aarch64_comparison_operator"
> > +  [(match_operand 0 "cc_register" "")
> > +   (const_int 0)])
> > + (compare:CCFP
> > +   (match_operand:GPF 2 "register_operand" "w")
> > +   (match_operand:GPF 3 "register_operand" "w"))
> > + (match_operand 5 "immediate_operand")

Re: [PATCH 3/4][AArch64] Add CCMP to rtx costs

2015-12-15 Thread James Greenhalgh
On Tue, Dec 15, 2015 at 10:33:22AM +, Wilco Dijkstra wrote:
> ping
> 
> > -Original Message-
> > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > Sent: 13 November 2015 16:03
> > To: 'gcc-patches@gcc.gnu.org'
> > Subject: [PATCH 3/4][AArch64] Add CCMP to rtx costs
> > 
> > This patch adds support for rtx costing of CCMP. The cost is the same as 
> > int/FP compare, however comparisons with zero get a slightly
> > larger cost. This means we prefer emitting compares with zero so they can 
> > be merged with ALU operations.
> > 
> > OK for commit?
> > 
> > ChangeLog:
> > 2015-11-13  Wilco Dijkstra  
> > 
> > * gcc/config/aarch64/aarch64.c (aarch64_if_then_else_costs):
> > Add support for CCMP costing.

Drop the gcc/ from here.

This is OK.

Thanks,
James



[PATCH][combine] Check WORD_REGISTER_OPERATIONS normally rather than through preprocessor

2015-12-15 Thread Kyrill Tkachov

Hi all,

As part of the war on conditional compilation here's an #if check on 
WORD_REGISTER_OPERATIONS that
seems to have been missed out.

Bootstrapped and tested on arm, aarch64, x86_64.

Is it still ok to commit these kinds of conditional compilation conversions?

Thanks,
Kyrill

2015-12-15  Kyrylo Tkachov  

* combine.c (simplify_comparison): Convert preprocessor check of
WORD_REGISTER_OPERATIONS into runtime check.
diff --git a/gcc/combine.c b/gcc/combine.c
index 8601d8983ce345e2129dd047b3520d98c0582842..0658a6dbc6df6862df662bc7842c13ed06b36b04 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11488,10 +11488,10 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
   /* Try a few ways of applying the same transformation to both operands.  */
   while (1)
 {
-#if !WORD_REGISTER_OPERATIONS
   /* The test below this one won't handle SIGN_EXTENDs on these machines,
 	 so check specially.  */
-  if (code != GTU && code != GEU && code != LTU && code != LEU
+  if (!WORD_REGISTER_OPERATIONS && code != GTU && code != GEU
+	  && code != LTU && code != LEU
 	  && GET_CODE (op0) == ASHIFTRT && GET_CODE (op1) == ASHIFTRT
 	  && GET_CODE (XEXP (op0, 0)) == ASHIFT
 	  && GET_CODE (XEXP (op1, 0)) == ASHIFT
@@ -11511,7 +11511,6 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	  op0 = SUBREG_REG (XEXP (XEXP (op0, 0), 0));
 	  op1 = SUBREG_REG (XEXP (XEXP (op1, 0), 0));
 	}
-#endif
 
   /* If both operands are the same constant shift, see if we can ignore the
 	 shift.  We can if the shift is a rotate or if the bits shifted out of


[patch] libstdc++/68912 Fix cv-qualifiers in std::bind invocation

2015-12-15 Thread Jonathan Wakely

The first patch fixes an inconsistency between the return type and the
function body, as described in the PR.

The second patch removes the TR1 return type support from _Mu, because
it isn't necessary in C++11. The third patch is because the second one
accidentally removed a "volatile" (removing that is fine, because we
never create volatile _Mu, or even a const one, but no point removing
it on only one specialization and leaving it elsewhere).

Tested powerpc64le-linux.

I've committed both to trunk and will apply the first patch (but not
the second and third) to the branches too.
commit 430066b306ddcd35982a748a34bf3603261f9050
Author: Jonathan Wakely 
Date:   Tue Dec 15 12:20:01 2015 +

Fix cv-qualifiers in std::bind invocation

PR libstdc++/68912
* include/std/functional (_Bind::operator()): Use lvalue functor to
deduce return type.
* testsuite/20_util/bind/68912.cc: New.

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index 19caa96..8d39d62 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -1034,7 +1034,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
 
       // Call unqualified
       template<typename... _Args, typename _Result
-	= decltype( std::declval<_Functor>()(
+	= decltype( std::declval<_Functor&>()(
 	      _Mu<_Bound_args>()( std::declval<_Bound_args&>(),
 				  std::declval<tuple<_Args...>&>() )... ) )>
 	_Result
@@ -1048,7 +1048,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
       // Call as const
       template<typename... _Args, typename _Result
 	= decltype( std::declval<typename enable_if<(sizeof...(_Args) >= 0),
-		      typename add_const<_Functor>::type>::type>()(
+		      typename add_const<_Functor>::type&>::type>()(
 	      _Mu<_Bound_args>()( std::declval<const _Bound_args&>(),
 				  std::declval<tuple<_Args...>&>() )... ) )>
 	_Result
@@ -1062,7 +1062,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
       // Call as volatile
       template<typename... _Args, typename _Result
 	= decltype( std::declval<typename enable_if<(sizeof...(_Args) >= 0),
-		       typename add_volatile<_Functor>::type>::type>()(
+		       typename add_volatile<_Functor>::type&>::type>()(
 	      _Mu<_Bound_args>()( std::declval<volatile _Bound_args&>(),
 				  std::declval<tuple<_Args...>&>() )... ) )>
 	_Result
@@ -1076,7 +1076,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
       // Call as const volatile
       template<typename... _Args, typename _Result
 	= decltype( std::declval<typename enable_if<(sizeof...(_Args) >= 0),
-		       typename add_cv<_Functor>::type>::type>()(
+		       typename add_cv<_Functor>::type&>::type>()(
 	      _Mu<_Bound_args>()( std::declval<const volatile _Bound_args&>(),
 				  std::declval<tuple<_Args...>&>() )... ) )>
 	_Result
diff --git a/libstdc++-v3/testsuite/20_util/bind/68912.cc 
b/libstdc++-v3/testsuite/20_util/bind/68912.cc
new file mode 100644
index 000..7a00b75
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/bind/68912.cc
@@ -0,0 +1,53 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+#include <functional>
+
+struct Wrong {};
+struct A {};
+struct B {};
+struct C{};
+struct D{};
+
+struct X {
+  A operator()(int, double) & { return {}; }
+  Wrong operator()(int, double) && {return {}; }
+
+  B operator()(int, double) const & { return {}; }
+  Wrong operator()(int, double) const && {return {}; }
+
+  C operator()(int, double) volatile & { return {}; }
+  Wrong operator()(int, double) volatile && {return {}; }
+
+  D operator()(int, double) const volatile & { return {}; }
+  Wrong operator()(int, double) const volatile && {return {}; }
+};
+
+void test01()
+{
+  auto bound = std::bind(X{}, 5, std::placeholders::_1);
+  A res = bound(1.0);
+  const auto bound_c = bound;
+  B res_c = bound_c(1.0);
+  volatile auto bound_v = bound;
+  C res_v = bound_v(1.0);
+  volatile const auto bound_cv = bound;
+  D res_cv = bound_cv(1.0);
+}
commit 44698303952f8e1215756adf53f72ca72cf0c59d
Author: Jonathan Wakely 
Date:   Tue Dec 15 13:16:58 2015 +

Remove vestigial traces of std::tr1::bind

* include/std/functional (is_placeholder, is_bind_expression): Update
comments.
(_Safe_tuple_element): Replace with _Safe_tuple_element_t alias
template.
(_Mu): Remove vestigial TR1 return types and update coments.

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc

[PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2015-12-15 Thread Kyrill Tkachov

Hi all,

This converts the preprocessor check for WORD_REGISTER_OPERATIONS into a 
runtime one
in rtlanal.c.

Since this one was in combination with an "#if defined" and used to guard an 
if-statement
I'd appreciate it if someone gave it a double-check that I didn't screw up the 
intended
behaviour.

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

2015-12-15  Kyrylo Tkachov  

* rtlanal.c (nonzero_bits1): Convert preprocessor check
for WORD_REGISTER_OPERATIONS to runtime check.
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index f893bca0d0a17498c1b234492e4acff02cec6e84..ab49602b72984e336f01a6b13f94993af9e4b8f9 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4540,13 +4540,14 @@ nonzero_bits1 (const_rtx x, machine_mode mode, const_rtx known_x,
 	  nonzero &= cached_nonzero_bits (SUBREG_REG (x), mode,
 	  known_x, known_mode, known_ret);
 
-#if WORD_REGISTER_OPERATIONS && defined (LOAD_EXTEND_OP)
+#ifdef LOAD_EXTEND_OP
 	  /* If this is a typical RISC machine, we only have to worry
 	 about the way loads are extended.  */
-	  if ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
-	   ? val_signbit_known_set_p (inner_mode, nonzero)
-	   : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
-	  || !MEM_P (SUBREG_REG (x)))
+	  if (WORD_REGISTER_OPERATIONS
+	  && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
+		 ? val_signbit_known_set_p (inner_mode, nonzero)
+		 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
+		   || !MEM_P (SUBREG_REG (x))))
 #endif
 	{
 	  /* On many CISC machines, accessing an object in a wider mode


RE: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-12-15 Thread Wilco Dijkstra
Adding Bernd - would you mind reviewing the ccmp.c change please?

> -Original Message-
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: 15 December 2015 16:42
> To: Wilco Dijkstra
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP
> 
> On Tue, Dec 15, 2015 at 10:32:50AM +, Wilco Dijkstra wrote:
> > ping
> >
> > > -Original Message-
> > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > Sent: 17 November 2015 18:36
> > > To: gcc-patches@gcc.gnu.org
> > > Subject: [PATCH 2/4 v2][AArch64] Add support for FCCMP
> > >
> > > (v2 version removes 4 enums)
> > >
> > > This patch adds support for FCCMP. This is trivial with the new CCMP 
> > > representation - remove the restriction of FP in ccmp.c and
> add
> > > FCCMP patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected.
> > >
> > > OK for commit?
> 
> The AArch64 code-generation parts of this are OK, though please wait for
> an OK on the ccmp.c changes before committing, and please revisit the
> testcase.
> 
> Sorry that this took a long time to get to.

No problem.

> > > ChangeLog:
> > > 2015-11-18  Wilco Dijkstra  
> > >
> > >   * gcc/ccmp.c (ccmp_candidate_p): Remove integer-only restriction.
> 
> Drop the gcc/ from the paths here and below.
> 
> > >   * gcc/config/aarch64/aarch64.md (fccmp): New pattern.
> > >   (fccmpe): Likewise.
> > >   (fcmp): Rename to fcmp and globalize pattern.
> > >   (fcmpe): Likewise.
> > >   * gcc/config/aarch64/aarch64.c (aarch64_gen_ccmp_first): Add FP support.
> > >   (aarch64_gen_ccmp_next): Add FP support.
> > >
> > > gcc/testsuite/
> > >   * gcc.target/aarch64/ccmp_1.c: New testcase.
> 
> This testcase doesn't look very helpful to me. You only end up checking if
> *any* of the tests compile to fccmp/fccmpe rather than *all* the tests. Should
> this use a scan-assembler-times directive to count the number of times a
> particular instruction appears?

There are no costs involved so there is no guarantee which CCMPs we will see
generated. After patch 3 and 4, the order is better defined and the testcase is
updated to reflect what we expect to be generated.

The alternative would be to not add the testcase here, but in part 4. However in
internal review it was requested to add it to this part of the patch...

Wilco

> > > ---
> > >  gcc/ccmp.c|  6 ---
> > >  gcc/config/aarch64/aarch64.c  | 24 +
> > >  gcc/config/aarch64/aarch64.md | 34 -
> > >  gcc/testsuite/gcc.target/aarch64/ccmp_1.c | 84 
> > > +++
> > >  4 files changed, 140 insertions(+), 8 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_1.c
> > >
> > > diff --git a/gcc/ccmp.c b/gcc/ccmp.c
> > > index 58ac126..3698a7d 100644
> > > --- a/gcc/ccmp.c
> > > +++ b/gcc/ccmp.c
> > > @@ -112,12 +112,6 @@ ccmp_candidate_p (gimple *g)
> > >|| gimple_bb (gs0) != gimple_bb (g))
> > >  return false;
> > >
> > > -  if (!(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0)))
> > > -   || POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs0
> > > -  || !(INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)))
> > > -|| POINTER_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (gs1)
> > > -return false;
> > > -
> > >tcode0 = gimple_assign_rhs_code (gs0);
> > >tcode1 = gimple_assign_rhs_code (gs1);
> > >if (TREE_CODE_CLASS (tcode0) == tcc_comparison
> > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > index c8bee3b..db4d190 100644
> > > --- a/gcc/config/aarch64/aarch64.c
> > > +++ b/gcc/config/aarch64/aarch64.c
> > > @@ -12398,6 +12398,18 @@ aarch64_gen_ccmp_first (rtx *prep_seq, rtx 
> > > *gen_seq,
> > >icode = CODE_FOR_cmpdi;
> > >break;
> > >
> > > +case SFmode:
> > > +  cmp_mode = SFmode;
> > > +  cc_mode = aarch64_select_cc_mode ((rtx_code) code, op0, op1);
> > > +  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpesf : CODE_FOR_fcmpsf;
> > > +  break;
> > > +
> > > +case DFmode:
> > > +  cmp_mode = DFmode;
> > > +  cc_mode = aarch64_select_cc_mode ((rtx_code) code, op0, op1);
> > > +  icode = cc_mode == CCFPEmode ? CODE_FOR_fcmpedf : CODE_FOR_fcmpdf;
> > > +  break;
> > > +
> > >  default:
> > >end_sequence ();
> > >return NULL_RTX;
> > > @@ -12461,6 +12473,18 @@ aarch64_gen_ccmp_next (rtx *prep_seq, rtx 
> > > *gen_seq, rtx prev, int cmp_code,
> > >icode = CODE_FOR_ccmpdi;
> > >break;
> > >
> > > +case SFmode:
> > > +  cmp_mode = SFmode;
> > > +  cc_mode = aarch64_select_cc_mode ((rtx_code) cmp_code, op0, op1);
> > > +  icode = cc_mode == CCFPEmode ? CODE_FOR_fccmpesf : 
> > > CODE_FOR_fccmpsf;
> > > +  break;
> > > +
> > > +case DFmode:
> > > +  cmp_mode = DFmode;
> > > +  cc_mode = aarch64_select_cc_mode ((rtx_code) cmp_code, op0, op1);
> > > +  icode = cc_mode ==

[PATCH][reload.c] Convert conditional compilation of WORD_REGISTER_OPERATIONS

2015-12-15 Thread Kyrill Tkachov

Hi all,

This converts the preprocessor checks for WORD_REGISTER_OPERATIONS into runtime 
checks
in reload.c.

Since this one is used to guard part of a large condition, I'd appreciate it if 
someone
double-checks that the logic is still equivalent.

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

2015-12-15  Kyrylo Tkachov  

* reload.c (push_reload): Convert preprocessor checks
for WORD_REGISTER_OPERATIONS to runtime checks.
diff --git a/gcc/reload.c b/gcc/reload.c
index 1e96dfc99c48e2aeb76ed7a0b5280cbe7c9cf34c..4ceb64312d777d730908cf7d9de0c8fd6c82b9dc 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -1074,14 +1074,12 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 		  && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (in)))
 		  && LOAD_EXTEND_OP (GET_MODE (SUBREG_REG (in))) != UNKNOWN)
 #endif
-#if WORD_REGISTER_OPERATIONS
-		  || ((GET_MODE_PRECISION (inmode)
-		   < GET_MODE_PRECISION (GET_MODE (SUBREG_REG (in
-		  && ((GET_MODE_SIZE (inmode) - 1) / UNITS_PER_WORD ==
-			  ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (in))) - 1)
-			   / UNITS_PER_WORD)))
-#endif
-		  ))
+		  || (WORD_REGISTER_OPERATIONS
+			&& ((GET_MODE_PRECISION (inmode)
+			 < GET_MODE_PRECISION (GET_MODE (SUBREG_REG (in
+			&& ((GET_MODE_SIZE (inmode) - 1) / UNITS_PER_WORD ==
+			 ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (in))) - 1)
+			   / UNITS_PER_WORD))
 	  || (REG_P (SUBREG_REG (in))
 	  && REGNO (SUBREG_REG (in)) < FIRST_PSEUDO_REGISTER
 	  /* The case where out is nonzero
@@ -1175,13 +1173,13 @@ push_reload (rtx in, rtx out, rtx *inloc, rtx *outloc,
 	   || MEM_P (SUBREG_REG (out)))
 	  && ((GET_MODE_PRECISION (outmode)
 		   > GET_MODE_PRECISION (GET_MODE (SUBREG_REG (out
-#if WORD_REGISTER_OPERATIONS
-		  || ((GET_MODE_PRECISION (outmode)
-		   < GET_MODE_PRECISION (GET_MODE (SUBREG_REG (out
-		  && ((GET_MODE_SIZE (outmode) - 1) / UNITS_PER_WORD ==
-			  ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (out))) - 1)
-			   / UNITS_PER_WORD)))
-#endif
+		  || (WORD_REGISTER_OPERATIONS
+		  && ((GET_MODE_PRECISION (outmode)
+			< GET_MODE_PRECISION (GET_MODE (SUBREG_REG (out
+			  && ((GET_MODE_SIZE (outmode) - 1) / UNITS_PER_WORD
+			   == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (out)))
+  - 1)
+/ UNITS_PER_WORD
 		  ))
 	  || (REG_P (SUBREG_REG (out))
 	  && REGNO (SUBREG_REG (out)) < FIRST_PSEUDO_REGISTER


[PATCH][reload1.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2015-12-15 Thread Kyrill Tkachov

Hi all,

This converts the preprocessor check for WORD_REGISTER_OPERATIONS into a 
runtime check
in reload1.c.

Since this one is used to guard part of a condition, I'd appreciate it if 
someone
double-checks that the logic is still equivalent.

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

2015-12-15  Kyrylo Tkachov  

* reload1.c (eliminate_regs_1): Convert preprocessor check
for WORD_REGISTER_OPERATIONS to runtime check.
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 4f1910b95cae33418e7bf3f1e19a564b1e43614d..1a1a591b3777fa2a0a4a12fe0b2d763acee453ad 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -2851,20 +2851,17 @@ eliminate_regs_1 (rtx x, machine_mode mem_mode, rtx insn,
 
 	  if (MEM_P (new_rtx)
 	  && ((x_size < new_size
-#if WORD_REGISTER_OPERATIONS
-		   /* On these machines, combine can create rtl of the form
-		  (set (subreg:m1 (reg:m2 R) 0) ...)
-		  where m1 < m2, and expects something interesting to
-		  happen to the entire word.  Moreover, it will use the
-		  (reg:m2 R) later, expecting all bits to be preserved.
-		  So if the number of words is the same, preserve the
-		  subreg so that push_reload can see it.  */
-		   && ! ((x_size - 1) / UNITS_PER_WORD
-			 == (new_size -1 ) / UNITS_PER_WORD)
-#endif
-		   )
-		  || x_size == new_size)
-	  )
+	/* On machines with WORD_REGISTER_OPERATIONS, combine can create
+	   rtl of the form (set (subreg:m1 (reg:m2 R) 0) ...)
+	   where m1 < m2, and expects something interesting to
+	   happen to the entire word.  Moreover, it will use the
+	   (reg:m2 R) later, expecting all bits to be preserved.
+	   So if the number of words is the same, preserve the
+	   subreg so that push_reload can see it.  */
+		   && !(WORD_REGISTER_OPERATIONS
+			 && ((x_size - 1) / UNITS_PER_WORD
+			  == (new_size - 1) / UNITS_PER_WORD)))
+	  || x_size == new_size))
 	return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
 	  else
 	return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));


Re: [build] Only support -gstabs on Mac OS X if assember supports it (PR target/67973)

2015-12-15 Thread Mike Stump
On Dec 15, 2015, at 5:35 AM, Rainer Orth  wrote:
> Right: I'm effectively keeping just the first configure test for .stabs
> support in the assembler to enable or disable
> DBX_DEBUG/DBX_DEBUGGING_INFO.  I'll post it later since …

> ... testing revealed another instance of static assumptions which hurts
> us now: while support for -gstabs* is checked for dynamically in
> lib/gcc-dg.exp and lib/gfortran-dg.exp for the debug.exp tests, there
> are a couple of testcases that use -gstabs* unconditionally, but have a
> hardcoded list of targets that support those options.  I'll introduce a
> new effective-target keyword (simply checking if -gstabs is accepted
> should be enough) to also perform this test dynamically and repost once
> it's tested.

Sounds good.

Re: [BUILDROBOT] "error: null argument where non-null required" on multiple targets

2015-12-15 Thread Jeff Law

On 12/14/2015 01:07 PM, Jan-Benedict Glaw wrote:

On Mon, 2015-12-14 18:54:28 +, Moore, Catherine 
 wrote:

avr-rtems   
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478544
mipsel-elf  
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478844
mipsisa64r2-sde-elf 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478855
mipsisa64sb1-elf
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478865
mips-rtems  
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478877
powerpc-eabialtivec 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478922
powerpc-eabispe 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478932
powerpc-rtems   
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478956
ppc-elf 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478968
sh-superh-elf   
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=479077


Is there an easy way to reproduce the MIPS problems that you
reported?  I don't seem to be able to do it with a cross-compiler
targeting mipsel-elf.


What's your build compiler? For these builds, where it showed up, I'm
using a freshly compiled HEAD/master version. So basically, compile a
current GCC for your build machine:
Right.  This is something that only shows up when using the trunk to 
build the crosses.


When I looked, I thought I bisected it to the delayed folding work.

jeff



Re: [PATCH 4/4] Cost CCMP instruction sequences to choose better expand order

2015-12-15 Thread Jiong Wang

On 15/12/15 10:33, Wilco Dijkstra wrote:

-Original Message-
From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
Sent: 13 November 2015 16:03
To: 'gcc-patches@gcc.gnu.org'
Subject: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better 
expand order

This patch adds CCMP selection based on rtx costs. This is based on Jiong's 
already approved patch https://gcc.gnu.org/ml/gcc-
patches/2015-09/msg01434.html with some minor refactoring and the tests updated.

OK for commit?

ChangeLog:
2015-11-13  Jiong Wang  

gcc/
* ccmp.c (expand_ccmp_expr_1): Cost the instruction sequences
generated from different expand order.

gcc/testsuite/
* gcc.target/aarch64/ccmp_1.c: Update test.


Hi Bernd,

  You approved this patch at

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01722.html

  under the condition that AArch64 cost on ccmp instruction should be
  fixed first.

  Wilco has fixed the cost issue in this patch set [3/4], and the 
"XFAIL" removed also.


  I just want to confirm that this patch is still OK to commit after
  bootstrap and regression testing pass, right?

  Thanks.




---
  gcc/ccmp.c| 47 +++
  gcc/testsuite/gcc.target/aarch64/ccmp_1.c | 15 --
  2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index cbdbd6d..95a41a6 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "tree-outof-ssa.h"
  #include "cfgexpand.h"
  #include "ccmp.h"
+#include "predict.h"

  /* The following functions expand conditional compare (CCMP) instructions.
 Here is a short description about the over all algorithm:
@@ -159,6 +160,8 @@ expand_ccmp_next (gimple *g, enum tree_code code, rtx prev,
  static rtx
  expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx *gen_seq)
  {
+  rtx prep_seq_1, gen_seq_1;
+  rtx prep_seq_2, gen_seq_2;
tree exp = gimple_assign_rhs_to_tree (g);
enum tree_code code = TREE_CODE (exp);
gimple *gs0 = get_gimple_for_ssa_name (TREE_OPERAND (exp, 0));
@@ -174,19 +177,53 @@ expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx 
*gen_seq)
  {
if (TREE_CODE_CLASS (code1) == tcc_comparison)
{
- int unsignedp0;
- enum rtx_code rcode0;
+ int unsignedp0, unsignedp1;
+ enum rtx_code rcode0, rcode1;
+ int speed_p = optimize_insn_for_speed_p ();
+ rtx tmp2, ret, ret2;
+ unsigned cost1 = MAX_COST;
+ unsigned cost2 = MAX_COST;

  unsignedp0 = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (gs0)));
+ unsignedp1 = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (gs1)));
  rcode0 = get_rtx_code (code0, unsignedp0);
+ rcode1 = get_rtx_code (code1, unsignedp1);

- tmp = targetm.gen_ccmp_first (prep_seq, gen_seq, rcode0,
+ tmp = targetm.gen_ccmp_first (&prep_seq_1, &gen_seq_1, rcode0,
gimple_assign_rhs1 (gs0),
gimple_assign_rhs2 (gs0));
- if (!tmp)
+
+ tmp2 = targetm.gen_ccmp_first (&prep_seq_2, &gen_seq_2, rcode1,
+gimple_assign_rhs1 (gs1),
+gimple_assign_rhs2 (gs1));
+
+ if (!tmp && !tmp2)
return NULL_RTX;

- return expand_ccmp_next (gs1, code, tmp, prep_seq, gen_seq);
+ if (tmp != NULL)
+   {
+ ret = expand_ccmp_next (gs1, code, tmp, &prep_seq_1, &gen_seq_1);
+ cost1 = seq_cost (safe_as_a <rtx_insn *> (prep_seq_1), speed_p);
+ cost1 += seq_cost (safe_as_a <rtx_insn *> (gen_seq_1), speed_p);
+   }
+ if (tmp2 != NULL)
+   {
+ ret2 = expand_ccmp_next (gs0, code, tmp2, &prep_seq_2,
+  &gen_seq_2);
+ cost2 = seq_cost (safe_as_a <rtx_insn *> (prep_seq_2), speed_p);
+ cost2 += seq_cost (safe_as_a <rtx_insn *> (gen_seq_2), speed_p);
+   }
+
+ if (cost2 < cost1)
+   {
+ *prep_seq = prep_seq_2;
+ *gen_seq = gen_seq_2;
+ return ret2;
+   }
+
+ *prep_seq = prep_seq_1;
+ *gen_seq = gen_seq_1;
+ return ret;
}
else
{
diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_1.c 
b/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
index ef077e0..7c39b61 100644
--- a/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/ccmp_1.c
@@ -80,5 +80,16 @@ f13 (int a, int b)
return a == 3 || a == 0;
  }

-/* { dg-final { scan-assembler "fccmp\t" } } */
-/* { dg-final { scan-assembler "fccmpe\t" } } */
+/* { dg-final { scan-assembler "cmp\t(.)+32" } } */
+/* { dg-final { scan-assembler "cmp\t(.)+33" } } */
+/* { dg-final { scan-assembler "cmp\t(.)+34" } } */
+/* { dg-final { scan-assembler "cmp\t(.)+35" } } */
+
+/* { dg-final { scan-assembler-times "\tcmp\tw\[0-9\]+, 0" 4

Re: Splitting up gcc/omp-low.c?

2015-12-15 Thread Nathan Sidwell

On 12/10/15 06:34, Jakub Jelinek wrote:


I'm aware of some duplication in expand_omp_for_* functions, and some of the
obvious duplications were already moved to helper functions.  But in these
cases the number of differences is even significantly bigger too, so having
just one function that would handle all the different schedules would be far
less readable.  Perhaps we can add some small helpers to handle some little
pieces that repeat between the functions.


I agree.  For instance, earlier openacc's loop expansion piggybacked onto the 
two omp loop expanders.  I found it much cleaner to create a separate 
openacc loop expander.  There's so much stuff to juggle in each case, that 
combining all the variants into one function can lead to cognitive overload.


nathan


[patch] libstdc++/68921 add timeout argument to futex(2)

2015-12-15 Thread Jonathan Wakely

This fixes a missing argument to the futex syscall.

Tested powerpc64le-linux. This needs to be fixed for gcc-5 and trunk.

commit ea5726fb770a1e89f6102c903f828ec6d71a1e3d
Author: Jonathan Wakely 
Date:   Tue Dec 15 18:34:52 2015 +

libstdc++/68921 add timeout argument to futex(2)

	PR libstdc++/68921
	* src/c++11/futex.cc
	(__atomic_futex_unsigned_base::_M_futex_wait_until): Use null pointer
	as timeout argument.

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index e04dba8..e723364 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -52,7 +52,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// we will fall back to spin-waiting.  The only thing we could do
 	// here on errors is abort.
 	int ret __attribute__((unused));
-	ret = syscall (SYS_futex, __addr, futex_wait_op, __val);
+	ret = syscall (SYS_futex, __addr, futex_wait_op, __val, nullptr);
 	_GLIBCXX_DEBUG_ASSERT(ret == 0 || errno == EINTR || errno == EAGAIN);
 	return true;
   }
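For reference, FUTEX_WAIT takes a `const struct timespec *` as its fourth argument: omitting it lets the kernel read whatever happens to occupy that argument slot, while an explicit null pointer requests an unbounded wait. A minimal sketch of the fixed call shape (a hypothetical wrapper, not the libstdc++ code; Linux-only):

```c
#include <errno.h>
#include <time.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Wait until *addr changes away from VAL.  The explicit null timeout
   is the point of the fix above: it means "block indefinitely"
   instead of passing an indeterminate timeout pointer.  */
static long
futex_wait_forever (int *addr, int val)
{
  return syscall (SYS_futex, addr, FUTEX_WAIT, val,
                  (const struct timespec *) 0);
}
```

If `*addr != val` on entry, the kernel returns immediately with EAGAIN, which is why the caller in futex.cc treats EAGAIN (and EINTR) like a normal wakeup.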


Re: fix scheduling antideps

2015-12-15 Thread Mike Stump
On Dec 11, 2015, at 6:09 AM, Jeff Law  wrote:
> On 12/11/2015 02:22 AM, Eric Botcazou wrote:
>>> This patch allows a target to increase the cost of anti-deps to better
>>> reflect the actual cost on the machine.
>> 
>> But it can already do it via the TARGET_SCHED_ADJUST_COST hook, can't it?
> And can't this be done with define_bypass as well?

  if (dep_type == REG_DEP_ANTI)
cost = 0;
  else if (dep_type == REG_DEP_OUTPUT)
{
  cost = (insn_default_latency (insn)
  - insn_default_latency (used));
  if (cost <= 0)
cost = 1;
}
  else if (bypass_p (insn))
cost = insn_latency (insn, used);

I don’t see how, if the first case is true, one gets into the third without code 
mods.  I opted for the adjust_cost_2 hook.

Re: fix scheduling antideps

2015-12-15 Thread Mike Stump
On Dec 11, 2015, at 1:22 AM, Eric Botcazou  wrote:
>> This patch allows a target to increase the cost of anti-deps to better
>> reflect the actual cost on the machine.
> 
> But it can already do it via the TARGET_SCHED_ADJUST_COST hook, can't it?

The undocumented TARGET_SCHED_ADJUST_COST_2 seems a better fit.  Yes, that 
works, I can use it.  I’m assuming the lack of documentation is a simple error.


[PATCH] Better error recovery for merge-conflict markers (v5)

2015-12-15 Thread David Malcolm
On Wed, 2015-12-09 at 18:44 +0100, Bernd Schmidt wrote:
> On 12/09/2015 05:58 PM, David Malcolm wrote:
> > On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:
> >>
> >> This seems like fairly low impact but also low cost, so I'm fine with it
> >> in principle. I wonder whether the length of the marker is the same
> >> across all versions of patch (and VC tools)?
> >
> > It's hardcoded for GNU patch:
> [...]
> > From what I can tell, Perforce is the outlier here.
>
> Thanks for checking all that.
>
> >> Just thinking out loud - I guess it would be too much to hope for to
> >> share lexers between frontends so that we need only one copy of this?
> >
> > Probably :(
>
> Someone slap sense into me, I just thought of deriving C and C++ parsers
> from a common base class... (no this is not a suggestion for this patch).
>
> > Would a better wording be:
> >
> > extern short some_var; /* This line would lead to a warning due to the
> >duplicate name, but it is skipped when handling
> >the conflict marker.  */
>
> I think so, yes.
>
> > That said, it's not clear they're always at the beginning of a line;
> > this bazaar bug indicates that CVS (and bazaar) can emit them
> > mid-line:
> >https://bugs.launchpad.net/bzr/+bug/36399
>
> Ok. CVS I think we shouldn't worry about, and it looks like this is one
> particular bug/corner case where the conflict end marker is the last
> thing in the file. I think on the whole it's best to check for beginning
> of the line as you've done.
>
> > Wording-wise, should it be "merge conflict marker", rather
> > than "patch conflict marker"?
> >
> > Clang spells it:
> > "error: version control conflict marker in file"
> > http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts
>
> Yeah, if another compiler has a similar/identical diagnostic I think we
> should just copy that unless there's a very good reason not to.
>
> > Rebased on top of r231445 (from yesterday).
> > Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> > Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum.
> >
> > OK for trunk?
>
> I'm inclined to say yes since it was originally submitted in time and
> it's hard to imagine how the change could be risky (I'll point out right
> away that there are one or two other patches in the queue that were also
> submitted in time which I feel should not be considered for gcc-6 at
> this point due to risk).
>
> Let's wait until the end of the week for objections, commit then.

I got thinking about what we'd have to do to support Perforce-style
markers, and began to find my token-matching approach to be a little
clunky (in conjunction with reading Martin's observations on
c_parser_peek_nth_token).

Here's a reimplementation of the patch which takes a much simpler
approach, and avoids the need to touch the C lexer: check that we're
not in a macro expansion and then read in the source line, and
textually compare against the various possible conflict markers.
This adds the requirement that the source file be readable, so it
won't detect conflict markers in a .i file from -save-temps, but
that seems no great loss compared to the simpler, more flexible
implementation.  We're about to emit an error at the line, so
this shouldn't add any extra file access for the default case
of printing the source line after the error.

Is this approach preferable, or should I just go with the
v4 approach?
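As a rough illustration of the textual approach (the names and exact marker set here are illustrative, not the patch's): the markers emitted by GNU patch, git and svn are runs of seven '<', '=' or '>' characters (plus '|' for diff3-style output) at the start of a line, so a simple prefix check on the read-in source line suffices:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: does LINE begin with a version-control
   conflict marker?  Perforce's differing style would need extra
   table entries, which is the extensibility the line-based textual
   comparison buys over token matching.  */
static bool
starts_with_conflict_marker (const char *line)
{
  static const char *const markers[]
    = { "<<<<<<< ", "||||||| ", "=======", ">>>>>>> " };
  for (size_t i = 0; i < sizeof markers / sizeof markers[0]; i++)
    if (strncmp (line, markers[i], strlen (markers[i])) == 0)
      return true;
  return false;
}
```

Checking only at the beginning of the line matches the decision above to ignore the CVS/bazaar mid-line corner case.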

Other changes:
- Updated wording to match clang's
  "error: version control conflict marker in file"
and replaced "patch conflict marker" with "conflict marker" in the code
and names of test cases.  (I have a version of the v4 patch
with those changes).
- Renamed local "loc" to "marker_loc" to clarify things.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

Sorry to prolong this; thanks for your patience.

gcc/c-family/ChangeLog:
* c-common.h (conflict_marker_at_location_p): New prototype.
* c-lex.c (conflict_marker_within_line_p): New function.
(conflict_marker_at_location_p): New function.

gcc/c/ChangeLog:
* c-parser.c (c_parser_error): Detect conflict markers and report
them as such.

gcc/cp/ChangeLog:
* parser.c (cp_parser_error): Detect conflict markers and report
them as such.

gcc/testsuite/ChangeLog:
* c-c++-common/conflict-markers-1.c: New testcase.
* c-c++-common/conflict-markers-2.c: Likewise.
* c-c++-common/conflict-markers-3.c: Likewise.
* c-c++-common/conflict-markers-4.c: Likewise.
* c-c++-common/conflict-markers-5.c: Likewise.
* c-c++-common/conflict-markers-6.c: Likewise.
* c-c++-common/conflict-markers-7.c: Likewise.
* c-c++-common/conflict-markers-8.c: Likewise.
* c-c++-common/conflict-markers-9.c: Likewise.
* c-c++-common/conflict-markers-10.c: Likewise.
* c-c++-common/conflict-markers-11.c: Likewise.
* g++.dg/con

Re: [PATCH] S/390: Wide int support.

2015-12-15 Thread Richard Sandiford
"Ulrich Weigand"  writes:
> Dominik Vogt wrote:
>
>> +; Note: Although CONST_INT and CONST_DOUBLE are not handled in this
>> predicate,
>> +; at least one of them needs to appear or otherwise safe_predicate_mode will
>> +; assume that a DImode LABEL_REF is not accepted either (see genrecog.c).
>
> The problem is not DImode LABEL_REFs, but rather VOIDmode LABEL_REFs when
> matched against a match_operand:DI.

It'd be good to fix this in a more direct way though, rather than
hack around it.  It's possible that the trick will stop working
if genrecog.c gets smarter.

When do label_refs have VOIDmode?  Is this an m31-ism?

Thanks,
Richard


Re: [patch] libstdc++/68921 add timeout argument to futex(2)

2015-12-15 Thread Torvald Riegel
On Tue, 2015-12-15 at 18:46 +, Jonathan Wakely wrote:
> This fixes a missing argument to the futex syscall.
> 
> Tested powerpc64le-linux. This needs to be fixed for gcc-5 and trunk.
> 

OK.  Thanks!



[PTX] more register cleanups

2015-12-15 Thread Nathan Sidwell
this patch uses reg_names array to emit register names, rather than have 
knowledge scattered throughout the PTX backend.  Also, converted 
write_fn_proto_from_insn to use (renamed) write_arg_mode and (new) 
write_return_mode.  I also noticed we can use liveness information to determine 
whether an outgoing static chain needs declaring.


nathan
2015-12-15  Nathan Sidwell  

	* config/nvptx/nvptx.c (write_one_arg): Rename to ...
	(write_arg_mode): ... here.  Update callers.
	(write_arg): Rename to ...
	(write_arg_type): ... here.  Update callers.
	(write_return_mode): New fn, broken out of ...
	(write_return): ... here.  Rename to ...
	(write_return_type): ... here.  Call it. Update callers.
	(write_fn_proto_from_insn): Use write_arg_mode and
	write_return_mode.
	(init_frame): New fn.
	(nvptx_declare_function_name): Call it for frame and varargs. Only
	emit outgoing static chain, if it's live.
	(nvptx_output_return): Use reg_names for return reg name.
	(nvptx_output_call_insn): Likewise.
	(nvptx_reorg): Mark unused hard regs too.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231662)
+++ config/nvptx/nvptx.c	(working copy)
@@ -569,7 +569,8 @@ nvptx_static_chain (const_tree fndecl, b
copying to a specific hard register.  */
 
 static int
-write_one_arg (std::stringstream &s, int for_reg, int argno, machine_mode mode)
+write_arg_mode (std::stringstream &s, int for_reg, int argno,
+		machine_mode mode)
 {
   const char *ptx_type = nvptx_ptx_type_from_mode (mode, false);
 
@@ -598,7 +599,7 @@ write_one_arg (std::stringstream &s, int
 }
 
 /* Process function parameter TYPE to emit one or more PTX
-   arguments. S, FOR_REG and ARGNO as for write_one_arg.  PROTOTYPED
+   arguments. S, FOR_REG and ARGNO as for write_arg_mode.  PROTOTYPED
is true, if this is a prototyped function, rather than an old-style
C declaration.  Returns the next argument number to use.
 
@@ -606,8 +607,8 @@ write_one_arg (std::stringstream &s, int
parameter marshalling machinery.  */
 
 static int
-write_arg (std::stringstream &s, int for_reg, int argno,
-	   tree type, bool prototyped)
+write_arg_type (std::stringstream &s, int for_reg, int argno,
+		tree type, bool prototyped)
 {
   machine_mode mode = TYPE_MODE (type);
 
@@ -630,21 +631,35 @@ write_arg (std::stringstream &s, int for
 
   mode = promote_arg (mode, prototyped);
   if (split)
-	argno = write_one_arg (s, for_reg, argno, mode);
+	argno = write_arg_mode (s, for_reg, argno, mode);
 }
 
-  return write_one_arg (s, for_reg, argno, mode);
+  return write_arg_mode (s, for_reg, argno, mode);
+}
+
+/* Emit a PTX return as a prototype or function prologue declaration
+   for MODE.  */
+
+static void
+write_return_mode (std::stringstream &s, bool for_proto, machine_mode mode)
+{
+  const char *ptx_type = nvptx_ptx_type_from_mode (mode, false);
+  const char *pfx = "\t.reg";
+  const char *sfx = ";\n";
+  
+  if (for_proto)
+pfx = "(.param", sfx = "_out) ";
+  
+  s << pfx << ptx_type << " " << reg_names[NVPTX_RETURN_REGNUM] << sfx;
 }
 
 /* Process a function return TYPE to emit a PTX return as a prototype
-   or function prologue declaration.  DECL_RESULT is the decl result
-   of the function and needed for determining named result
-   behaviour. Returns true if return is via an additional pointer
-   parameter.  The promotion behaviour here must match the regular GCC
-   function return mashalling.  */
+   or function prologue declaration.  Returns true if return is via an
+   additional pointer parameter.  The promotion behaviour here must
+   match the regular GCC function return mashalling.  */
 
 static bool
-write_return (std::stringstream &s, bool for_proto, tree type)
+write_return_type (std::stringstream &s, bool for_proto, tree type)
 {
   machine_mode mode = TYPE_MODE (type);
 
@@ -675,11 +690,7 @@ write_return (std::stringstream &s, bool
   else
 mode = promote_return (mode);
 
-  const char *ptx_type  = nvptx_ptx_type_from_mode (mode, false);
-  if (for_proto)
-s << "(.param" << ptx_type << " %out_retval) ";
-  else
-s << "\t.reg" << ptx_type << " %retval;\n";
+  write_return_mode (s, for_proto, mode);
 
   return return_in_mem;
 }
@@ -752,7 +763,7 @@ write_fn_proto (std::stringstream &s, bo
   tree result_type = TREE_TYPE (fntype);
 
   /* Declare the result.  */
-  bool return_in_mem = write_return (s, true, result_type);
+  bool return_in_mem = write_return_type (s, true, result_type);
 
   s << name;
 
@@ -760,7 +771,7 @@ write_fn_proto (std::stringstream &s, bo
 
   /* Emit argument list.  */
   if (return_in_mem)
-argno = write_arg (s, -1, argno, ptr_type_node, true);
+argno = write_arg_type (s, -1, argno, ptr_type_node, true);
 
   /* We get:
  NULL in TYPE_ARG_TYPES, for old-style functions
@@ -779,19 +790,19 @@ write_fn_proto (std::stringstream &s, bo
 {
   tree type = prototyped ? TREE_VALUE (args) : TREE_TYPE (

Re: [PATCH] Fix -fcompare-debug issue in cross-jumping (PR rtl-optimization/65980)

2015-12-15 Thread Eric Botcazou
> rtx_renumbered_equal_p considers two LABEL_REFs equivalent if they
> have the same next_real_insn, unfortunately next_real_insn doesn't ignore
> debug insns.  It ignores BARRIERs/JUMP_TABLE_DATA insns too, which is IMHO
> not desirable either, so this patch uses next_nonnote_nondebug_insn instead
> (which stops at CODE_LABEL) and keeps iterating if CODE_LABELs are found.

next_active_insn would have done the job, modulo the BARRIER thing, but do we 
really need to care about BARRIER here?

-- 
Eric Botcazou


Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-12-15 Thread Ramana Radhakrishnan
On Tue, Dec 15, 2015 at 4:07 PM, Matthew Wahab
 wrote:
> On 10/12/15 10:49, Ramana Radhakrishnan wrote:
>>
>> On Mon, Dec 7, 2015 at 4:10 PM, Matthew Wahab 
>> wrote:
>>>
>>> On 27/11/15 17:11, Matthew Wahab wrote:

 On 27/11/15 13:44, Christophe Lyon wrote:
>>
>> On 26/11/15 16:02, Matthew Wahab wrote


>>> This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM tests to
>>> specify targest and to set up command line options. It builds on the
>>> ARMv8.1 target support added for AArch64 tests, partly reworking that
>>> support to take into account the different configurations that tests
>>> may
>>> be run under.
>
> [..]
>>>
>>> # Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0 -#
>>> otherwise.  The test is valid for AArch64. +# otherwise.  The test is
>>> valid for
>>> AArch64 and ARM.  Record the command +# line options that needed.
>>
>>
>> s/that//
>
>
> Fixed in attached patch.
>
>> Can you also make sure doc/sourcebuild.texi is updated for this helper
>> function ?
>> If not documented,it would be good to add the documentation for the same
>> while you
>> are here.
>
>
> Done, I've listed them as ARM attributes based on their names.
>
> Tested this and the other update patch (#4/7) for arm-none-eabi with
> cross-compiled
> check-gcc by running the gcc.target/aarch64/advsimd-intrinsics with and
> without ARMv8.1 enabled as a test target.
>
> Ok?

Ok - thanks for dealing with this.


Ramana


> Matthew
>
> testsuite/
> 2015-12-14  Matthew Wahab  
>
> * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
> comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
> the command line options.
> (check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
> test to allow ARM targets.  Select and record a working set of
> command line options.
> (check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
> targets.
>
> gcc/
> 2015-12-14  Matthew Wahab  
>
> * doc/sourcebuild.texi (ARM-specific attributes): Add
> "arm_v8_1a_neon_ok" and "arm_v8_1a_neon_hw".
>


RE: [Patch] Fix for MIPS PR target/65604

2015-12-15 Thread Steve Ellcey
On Tue, 2015-12-15 at 15:13 +, Moore, Catherine wrote:

> 
> HI Steve, The patch is OK.  Will you please add a test case and repost?
> Thanks,
> Catherine

Here is the patch with a test case.

2015-12-15  Steve Ellcey  

PR target/65604
* config/mips/mips.c (mips_output_division): Check flag_delayed_branch.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 6145944..8444a91 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -13687,9 +13687,17 @@ mips_output_division (const char *division, rtx *operands)
}
   else
{
- output_asm_insn ("%(bne\t%2,%.,1f", operands);
- output_asm_insn (s, operands);
- s = "break\t7%)\n1:";
+ if (flag_delayed_branch)
+   {
+ output_asm_insn ("%(bne\t%2,%.,1f", operands);
+ output_asm_insn (s, operands);
+ s = "break\t7%)\n1:";
+   }
+ else
+   {
+ output_asm_insn (s, operands);
+ s = "bne\t%2,%.,1f\n\tnop\n\tbreak\t7\n1:";
+   }
}
 }
   return s;



2015-12-15  Steve Ellcey  

PR target/65604
* gcc.target/mips/div-delay.c: New test.


diff --git a/gcc/testsuite/gcc.target/mips/div-delay.c b/gcc/testsuite/gcc.target/mips/div-delay.c
index e69de29..bdeb125 100644
--- a/gcc/testsuite/gcc.target/mips/div-delay.c
+++ b/gcc/testsuite/gcc.target/mips/div-delay.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=mips1 -fno-delayed-branch" } */
+/* { dg-final { scan-assembler "\tbne\t.*\tnop" } } */
+
+/* Ensure that mips1 does not put anything in the delay slot of the bne
+   instruction when checking for divide by zero.  mips2+ systems use teq
+   instead of bne and teq has no delay slot.  */
+
+NOCOMPRESSION int
+foo (int a, int b)
+{
+  return a / b;
+}




Re: [PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-12-15 Thread Ramana Radhakrishnan
On Tue, Dec 15, 2015 at 4:03 PM, Matthew Wahab
 wrote:
> On 10/12/15 10:45, Ramana Radhakrishnan wrote:
>>
>> On Tue, Dec 8, 2015 at 7:45 AM, Christian Bruel 
>> wrote:
>>>
>>> Hi Matthew,


 On 26/11/15 16:01, Matthew Wahab wrote:
>
>
> Hello,
>
> This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
> presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
> defined when the instructions are available, as it is when
> -march=armv8.1-a is enabled with suitable fpu options.
>
> gcc/
> 2015-11-26  Matthew Wahab  
>
>* config/arm/arm-c.c (arm_cpu_builtins): Define
> __ARM_FEATURE_QRDMX.
>

>>>
>>> +  if (TARGET_NEON_RDMA)
>>> +builtin_define ("__ARM_FEATURE_QRDMX");
>>> +
>>>
>>> Since it depends on TARGET_NEON, could you please use
>>>
>>>def_or_undef_macro (pfile, "__ARM_FEATURE_QRDMX", TARGET_NEON_RDMA);
>>>
>>> instead ?
>>
>>
>> I think that's what it should be -
>>
>> OK with that fixed.
>
>
> Attached an updated patch using the def_or_undef macro. It also removes some
> trailing whitespace in that part of the code.
>
> Still ok?

Yep, OK.


regards
Ramana

> Matthew
>
> gcc/
> 2015-12-14  Matthew Wahab  
>
> * config/arm/arm-c.c (arm_cpu_builtins): Define
> __ARM_FEATURE_QRDMX.  Clean up some trailing whitespace.
>
>


[PATCH, IA64] Fix building a bare-metal ia64 compiler

2015-12-15 Thread Bernd Edlinger
Hi,

due to recent discussion on the basic asm, and the special handling of 
ASM_INPUT in ia64, I tried to build a bare-metal cross-compiler for ia64, but 
that did not work, because it seems to be impossible to build it without having 
a stdlib.h.

With the attached patch, I was finally able to build the cross compiler, by 
declaring abort in the way as it is done already in many other places at libgcc.

In case someone wants to know, working configure options are as follows:

../binutils-2.25.1/configure --prefix=../ia64-elf --target=ia64-unknown-elf

../gcc-trunk/configure --prefix=../ia64-elf --target=ia64-unknown-elf 
--enable-languages=c --with-gnu-as --disable-threads --disable-sjlj-exceptions 
--disable-libssp --disable-libquadmath


I have successfully built a bare-metal cross compiler with this patch.
Is it OK for trunk?


Thanks
Bernd.2015-12-15  Bernd Edlinger  

	* unwind-generic.h: Don't include stdlib.h.
	Add a prototype for abort.


Index: libgcc/unwind-generic.h
===
--- libgcc/unwind-generic.h	(Revision 231598)
+++ libgcc/unwind-generic.h	(Arbeitskopie)
@@ -221,7 +221,9 @@ _Unwind_SjLj_Resume_or_Rethrow (struct _Unwind_Exc
compatible with the standard ABI for IA-64, we inline these.  */
 
 #ifdef __ia64__
-#include 
+/* We add a prototype for abort here to avoid creating a dependency on
+   target headers.  */
+extern void abort (void);
 
 static inline _Unwind_Ptr
 _Unwind_GetDataRelBase (struct _Unwind_Context *_C)


Re: [PATCH] Fix -fcompare-debug issue in cross-jumping (PR rtl-optimization/65980)

2015-12-15 Thread Jakub Jelinek
On Tue, Dec 15, 2015 at 09:51:15PM +0100, Eric Botcazou wrote:
> > rtx_renumbered_equal_p considers two LABEL_REFs equivalent if they
> > have the same next_real_insn, unfortunately next_real_insn doesn't ignore
> > debug insns.  It ignores BARRIERs/JUMP_TABLE_DATA insns too, which is IMHO
> > not desirable either, so this patch uses next_nonnote_nondebug_insn instead
> > (which stops at CODE_LABEL) and keeps iterating if CODE_LABELs are found.
> 
> next_active_insn would have done the job, modulo the BARRIER thing, but do we 
> really need to care about BARRIER here?

I don't know.  For void foo (void) { lab: __builtin_unreachable (); }
we have a BARRIER ending a bb with no control flow insns in there, say with:
void bar (int);
void
foo (int x, int y)
{
  if (x == 46)
goto lab1;
  if (x == 47)
goto lab2;
  if (x > 23)
{
lab1:
  if (y) goto lab3;
  bar (x);
lab3:
  __builtin_unreachable ();
}
  bar (5);
lab2:
  if (y) goto lab4;
  bar (x);
lab4:
  __builtin_unreachable ();
}
with -O0 we have in e.g. the *.reload jump:
(code_label 54 79 55 11 10 ("lab4") [1 uses])
(note 55 54 56 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
;;  succ:
;; lr  out   7 [sp] 16 [argp] 20 [frame]
  
(barrier 56 55 80)
(note 80 56 0 NOTE_INSN_DELETED)

Now, sure, at -O0 cross-jumping is hopefully not going to be performed,
but I'm worrying about say -O1 -fno-* for a bunch of optimizations that
optimize this at the gimple or RTL level, where cross-jumping could see this.

Jakub


[PATCH] Fix some blockers of PR c++/24666 (arrays decay to pointers too early)

2015-12-15 Thread Patrick Palka
This patch just makes convert_for_initialization() to avoid eagerly
decaying an array, function, etc if the type we're converting to is a
class type, so that the correct conversion constructor could be later be
selected in perform_implicit_conversion_flags().

Bootstrap + regtest in progress on x86_64-pc-linux-gnu, OK to commit if
no new regressions?

gcc/cp/ChangeLog:

PR c++/16333
PR c++/41426
PR c++/59878
PR c++/66895
* typeck.c (convert_for_initialization): Don't perform an early
decaying conversion if converting to a class type.

gcc/testsuite/ChangeLog:

PR c++/16333
PR c++/41426
PR c++/59878
PR c++/66895
* g++.dg/conversion/pr16333.C: New test.
* g++.dg/conversion/pr41426.C: New test.
* g++.dg/conversion/pr59878.C: New test.
* g++.dg/conversion/pr66895.C: New test.
---
 gcc/cp/typeck.c   | 16 +---
 gcc/testsuite/g++.dg/conversion/pr16333.C |  8 
 gcc/testsuite/g++.dg/conversion/pr41426.C | 28 
 gcc/testsuite/g++.dg/conversion/pr59878.C | 19 +++
 gcc/testsuite/g++.dg/conversion/pr66895.C | 11 +++
 5 files changed, 75 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr16333.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr41426.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr59878.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr66895.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 39c1af2..1059b26 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -8479,13 +8479,15 @@ convert_for_initialization (tree exp, tree type, tree rhs, int flags,
   || (TREE_CODE (rhs) == TREE_LIST && TREE_VALUE (rhs) == error_mark_node))
 return error_mark_node;
 
-  if ((TREE_CODE (TREE_TYPE (rhs)) == ARRAY_TYPE
-   && TREE_CODE (type) != ARRAY_TYPE
-   && (TREE_CODE (type) != REFERENCE_TYPE
-  || TREE_CODE (TREE_TYPE (type)) != ARRAY_TYPE))
-  || (TREE_CODE (TREE_TYPE (rhs)) == FUNCTION_TYPE
- && !TYPE_REFFN_P (type))
-  || TREE_CODE (TREE_TYPE (rhs)) == METHOD_TYPE)
+  if (MAYBE_CLASS_TYPE_P (type))
+;
+  else if ((TREE_CODE (TREE_TYPE (rhs)) == ARRAY_TYPE
+   && TREE_CODE (type) != ARRAY_TYPE
+   && (TREE_CODE (type) != REFERENCE_TYPE
+   || TREE_CODE (TREE_TYPE (type)) != ARRAY_TYPE))
+  || (TREE_CODE (TREE_TYPE (rhs)) == FUNCTION_TYPE
+  && !TYPE_REFFN_P (type))
+  || TREE_CODE (TREE_TYPE (rhs)) == METHOD_TYPE)
 rhs = decay_conversion (rhs, complain);
 
   rhstype = TREE_TYPE (rhs);
diff --git a/gcc/testsuite/g++.dg/conversion/pr16333.C b/gcc/testsuite/g++.dg/conversion/pr16333.C
new file mode 100644
index 000..979e0ac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr16333.C
@@ -0,0 +1,8 @@
+// PR c++/16333
+
+struct X {
+   X (const int (&)[3]);
+};
+
+int a[3];
+X foo () { return a; }
diff --git a/gcc/testsuite/g++.dg/conversion/pr41426.C b/gcc/testsuite/g++.dg/conversion/pr41426.C
new file mode 100644
index 000..cc86dd5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr41426.C
@@ -0,0 +1,28 @@
+// PR c++/41426
+
+template 
+struct A
+{
+   template 
+   A(_T (&V)[_N]);
+   A();
+};
+
+A g1()
+{
+   float f[] = {1.1f, 2.3f};
+   return f;
+}
+
+struct B
+{
+   B (int (&v)[10]);
+   B();
+};
+
+B g2()
+{
+   int c[10];
+   return c;
+}
+
diff --git a/gcc/testsuite/g++.dg/conversion/pr59878.C b/gcc/testsuite/g++.dg/conversion/pr59878.C
new file mode 100644
index 000..15f3a37
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr59878.C
@@ -0,0 +1,19 @@
+struct Test {
+ template <int N>
+ Test(const char (&array)[N]) {}
+};
+
+Test test() {
+ return "test1";
+}
+
+void test2(Test arg = "test12") {}
+
+template <typename T>
+void test3(T arg = "test123") {}
+
+int main() {
+ test();
+ test2();
+ test3();
+}
diff --git a/gcc/testsuite/g++.dg/conversion/pr66895.C b/gcc/testsuite/g++.dg/conversion/pr66895.C
new file mode 100644
index 000..a4bf651
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr66895.C
@@ -0,0 +1,11 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+#include 
+
+struct S {
+template <int N> S(char const (&)[N]);
+};
+struct T { S s; };
+void f(std::initializer_list<T>);
+void g() { f({{""}}); }
-- 
2.7.0.rc0.50.g1470d8f.dirty



Re: [PATCH 0/2] obsolete some old targets

2015-12-15 Thread Jeff Law

On 12/14/2015 08:55 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

Hi,

http://gcc.gnu.org/ml/gcc-patches/2015-12/msg00365.html reminded me I hadn't
gotten around to marking *-knetbsd and openbsd 2/3 obsolete as I offered to do
back in the spring.

I tested I could still build on x86_64-linux-gnu, and could only cross compile
to i386-openbsd2, i386-openbsd3 and x86_64-knetbsd-gnu with
--enable-obsolete.  Given how late in the cycle we are I'm not sure if we
should remove these targets as soon as stage 1 opens, but we might as well
obsolete them I guess, ok to commit?

Trev


Trevor Saunders (2):
   mark *-knetbsd-* as obsolete
   obsolete openbsd 2.0 and 3.X
With the fixes pointed out by Mike and Andreas fixed, this is fine for 
the trunk.


Can you mark interix as obsolete?  It hasn't even built for a long time.

jeff


Re: [PATCH] Fix -fcompare-debug issue in cross-jumping (PR rtl-optimization/65980)

2015-12-15 Thread Jeff Law

On 12/15/2015 02:13 PM, Jakub Jelinek wrote:

On Tue, Dec 15, 2015 at 09:51:15PM +0100, Eric Botcazou wrote:

rtx_renumbered_equal_p considers two LABEL_REFs equivalent if they
have the same next_real_insn, unfortunately next_real_insn doesn't ignore
debug insns.  It ignores BARRIERs/JUMP_TABLE_DATA insns too, which is IMHO
not desirable either, so this patch uses next_nonnote_nondebug_insn instead
(which stops at CODE_LABEL) and keeps iterating if CODE_LABELs are found.


next_active_insn would have done the job, modulo the BARRIER thing, but do we
really need to care about BARRIER here?


I don't know.  For void foo (void) { lab: __builtin_unreachable (); }
we have a BARRIER ending a bb with no control flow insns in there, say with:
Given the ill-formed nature of __builtin_unreachable, I think the 
sensible thing is to require BARRIERs to match as well.  If we don't, 
then we could consider paths which differ only in the existence of a 
BARRIER.  If we cross jump them, then we either lose the 
BARRIER/__builtin_unreachable property or we add it where it wasn't before.


Jeff


Re: [PATCH] Fix -fcompare-debug issue in cross-jumping (PR rtl-optimization/65980)

2015-12-15 Thread Jeff Law

On 12/14/2015 01:14 PM, Jakub Jelinek wrote:

Hi!

rtx_renumbered_equal_p considers two LABEL_REFs equivalent if they
have the same next_real_insn, unfortunately next_real_insn doesn't ignore
debug insns.  It ignores BARRIERs/JUMP_TABLE_DATA insns too, which is IMHO
not desirable either, so this patch uses next_nonnote_nondebug_insn instead
(which stops at CODE_LABEL) and keeps iterating if CODE_LABELs are found.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-14  Jakub Jelinek  

PR rtl-optimization/65980
* jump.c (rtx_renumbered_equal_p) : Use
next_nonnote_nondebug_insn instead of next_real_insn and
skip over CODE_LABELs too.

* gcc.dg/pr65980.c: New test.

OK.
jeff



Re: [PATCH, IA64] Fix building a bare-metal ia64 compiler

2015-12-15 Thread Jeff Law

On 12/15/2015 02:13 PM, Bernd Edlinger wrote:

Hi,

due to recent discussion on the basic asm, and the special handling of 
ASM_INPUT in ia64, I tried to build a bare-metal cross-compiler for ia64, but 
that did not work, because it seems to be impossible to build it without having 
a stdlib.h.

With the attached patch, I was finally able to build the cross compiler, by 
declaring abort in the way as it is done already in many other places at libgcc.

In case someone wants to know, working configure options are as follows:

../binutils-2.25.1/configure --prefix=../ia64-elf --target=ia64-unknown-elf

../gcc-trunk/configure --prefix=../ia64-elf --target=ia64-unknown-elf 
--enable-languages=c --with-gnu-as --disable-threads --disable-sjlj-exceptions 
--disable-libssp --disable-libquadmath


I have successfully built a bare-metal cross compiler with this patch.
Is it OK for trunk?
I wouldn't call it "many" places in libgcc.  I just see one -- fp-bit.c, 
and that one is only to give "sane" handling to extended mode floating point.


For ia64-elf ISTM this header should be coming from newlib.


jeff


Re: [PATCH 0/2] obsolete some old targets

2015-12-15 Thread Trevor Saunders
On Tue, Dec 15, 2015 at 02:32:47PM -0700, Jeff Law wrote:
> On 12/14/2015 08:55 PM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders 
> >
> >Hi,
> >
> >http://gcc.gnu.org/ml/gcc-patches/2015-12/msg00365.html reminded me I hadn't
> >gotten around to marking *-knetbsd and openbsd 2/3 obsolete as I offered to 
> >do
> >back in the spring.
> >
> >I tested I could still build on x86_64-linux-gnu, and could only cross 
> >compile
> >to i386-openbsd2, i386-openbsd3 and x86_64-knetbsd-gnu with
> >--enable-obsolete.  Given how late in the cycle we are I'm not sure if we
> >should remove these targets as soon as stage 1 opens, but we might as well
> >obsolete them I guess, ok to commit?
> >
> >Trev
> >
> >
> >Trevor Saunders (2):
> >   mark *-knetbsd-* as obsolete
> >   obsolete openbsd 2.0 and 3.X
> With the fixes pointed out by Mike and Andreas fixed, this is fine for the
> trunk.
> 
> Can you mark interix as obsolete?  It hasn't even built for a long time.

 Sure, I can do that if you want, I just wasn't sure before you wanted
 to.

 Trev

> 
> jeff


Re: [PATCH 0/2] obsolete some old targets

2015-12-15 Thread Jeff Law

On 12/15/2015 03:02 PM, Trevor Saunders wrote:


Can you mark interix as obsolete?  It hasn't even built for a long time.


  Sure, I can do that if you want, I just wasn't sure before you wanted
  to.
Please do.  I know we've been round and round on that one before, but 
given it hasn't been building since 2012, I think obsoleting is appropriate.


Fixing it wouldn't be hard, it just doesn't seem worth the effort.

jeff


Re: [PATCH] c/68868 - atomic_init emits an unnecessary fence

2015-12-15 Thread Jeff Law

On 12/14/2015 05:53 PM, Martin Sebor wrote:

The C atomic_init macro is implemented in terms of simple assignment
to the atomic variable pointed to by its first argument.  That's
inefficient since the variable under initialization must not be
accessed by other threads and assignment provides sequentially
consistent semantics.  The inefficiency is apparent in the generated
dumps (e.g. the gimple dump contains calls to __atomic_store (...,
memory_order_seq_cst), and the assembly dump contains the fence
instruction).

The attached patch changes the macro to use atomic_store with relaxed
consistency semantics and adds a test verifying that invocations of
the atomic_init macro emit __atomic_store_N with a zero last argument
(memory_order_relaxed).

This brings GCC on par with Clang.

Tested on powerpc64le and x86_64.

Martin

gcc-68868.patch


gcc/ChangeLog
2015-12-14  Martin Sebor

PR c/68868
* ginclude/stdatomic.h (atomic_init): Use atomic_store instead
of plain assignment.

gcc/testsuite/ChangeLog
2015-12-14  Martin Sebor

PR c/68868
* testsuite/gcc.dg/atomic/stdatomic-init.c: New test.

OK.
jeff



Re: [PATCH] C FE: use correct location range for static assertions

2015-12-15 Thread Marek Polacek
On Tue, Dec 15, 2015 at 11:51:49AM -0500, David Malcolm wrote:
> When issuing diagnostics for _Static_assert, we currently ignore the
> location/range of the asserted expression, and instead use the
> location/range of the first token within it, which can be
> incorrect for compound expressions:
> 
> error: expression in static assertion is not constant
>_Static_assert (param > 0, "message");
>^
> 
> This patch changes things to use EXPR_LOC_OR_LOC, so we use the
> location/range of the expression if it has one, falling back to the old
> behavior if it doesn't, giving:
> 
> error: expression in static assertion is not constant
>_Static_assert (param > 0, "message");
>~~^~~
> 
> Successfully bootstrapped and regtested on x86_64-pc-linux-gnu
> 
> OK for trunk in stage 3?
> 
> [a much earlier version of this was posted as part of:
> "[PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics"
>   https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00745.html
> but this patch bears no resemblance apart from the testcase, due to
> changes in representation]
> 
> gcc/c/ChangeLog:
>   * c-parser.c (c_parser_static_assert_declaration_no_semi): Use the
>   expression location, falling back on the first token location,
>   rather than always using the latter.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/diagnostic-range-static-assert.c: New test case.

Looks ok to me.

Marek


Re: [PATCH] C FE: use correct location range for static assertions

2015-12-15 Thread Jeff Law

On 12/15/2015 09:51 AM, David Malcolm wrote:

When issuing diagnostics for _Static_assert, we currently ignore the
location/range of the asserted expression, and instead use the
location/range of the first token within it, which can be
incorrect for compound expressions:

error: expression in static assertion is not constant
_Static_assert (param > 0, "message");
^

This patch changes things to use EXPR_LOC_OR_LOC, so we use the
location/range of the expression if it has one, falling back to the old
behavior if it doesn't, giving:

error: expression in static assertion is not constant
_Static_assert (param > 0, "message");
~~^~~

Successfully bootstrapped and regtested on x86_64-pc-linux-gnu

OK for trunk in stage 3?

[a much earlier version of this was posted as part of:
"[PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics"
   https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00745.html
but this patch bears no resemblance apart from the testcase, due to
changes in representation]

gcc/c/ChangeLog:
* c-parser.c (c_parser_static_assert_declaration_no_semi): Use the
expression location, falling back on the first token location,
rather than always using the latter.

gcc/testsuite/ChangeLog:
* gcc.dg/diagnostic-range-static-assert.c: New test case.

IMHO, this is a bugfix and thus entirely appropriate at this stage.

OK for the trunk.
Jeff



Re: extend shift count warnings to vector types

2015-12-15 Thread Jeff Law

On 12/14/2015 01:29 AM, Jan Beulich wrote:

On 11.12.15 at 21:40,  wrote:

On 12/11/2015 12:28 AM, Jan Beulich wrote:

gcc/c/
2015-12-10  Jan Beulich  

* c-fold.c (c_fully_fold_internal): Also emit shift count
warnings for vector types.
* c-typeck.c (build_binary_op): Likewise.

Needs testcases for the added warnings.

My additional concern here would be that in build_binary_op, after your
change, we'll be setting doing_shift to true.  That in turn will enable
ubsan instrumentation of the shift.  Does ubsan work properly for vector
shifts?


You say that it may be safe with that other patch you replied to a
little later. I have no idea myself.

I think Paolo's change makes yours safe.

Essentially Paolo's change avoids the overflow sanitization if the type 
is not an integral type.  Specifically in ubsan_instrument_shift tt will 
always be NULL, which in turn causes ubsan_instrument_shift to return 
NULL without generating any instrumentation.


So I think you just need to build some testcases for your change.
jeff


Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2015-12-15 Thread Evandro Menezes

On 12/14/2015 05:26 AM, James Greenhalgh wrote:

On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote:

On 11/20/2015 05:53 AM, James Greenhalgh wrote:

On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:

On 11/05/2015 02:51 PM, Evandro Menezes wrote:

2015-11-05  Evandro Menezes 

   gcc/

   * config/aarch64/aarch64.c (aarch64_override_options_internal):
   Increase loop peeling limit.

This patch increases the limit for the number of peeled insns.
With this change, I noticed no major regression in either
Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
ones, improved significantly.

I tested this tuning on Exynos M1 and on A57.  ThunderX seems to
benefit from this tuning too.  However, I'd appreciate comments

from other stakeholders.

Ping.

I'd like to leave this for a call from the port maintainers. I can see why
this leads to more opportunities for vectorization, but I'm concerned about
the wider impact on code size. Certainly I wouldn't expect this to be our
default at -O2 and below.

My gut feeling is that this doesn't really belong in the back-end (there are
presumably good reasons why the default for this parameter across GCC has
fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd
like Marcus or Richard to make the call as to whether or not we take this
patch.

Please, correct me if I'm wrong, but loop peeling is enabled only
with loop unrolling (and with PGO).  If so, then extra code size is
not a concern, for this heuristic is only active when unrolling
loops, when code size is already of secondary importance.

My understanding was that loop peeling is enabled from -O2 upwards, and
is also used to partially peel unaligned loops for vectorization (allowing
the vector code to be well aligned), or to completely peel inner loops which
may then become amenable to SLP vectorization.

If I'm wrong then I take back these objections. But I was sure this
parameter was used in a number of situations outside of just
-funroll-loops/-funroll-all-loops . Certainly I remember seeing performance
sensitivities to this parameter at -O3 in some internal workloads I was
analysing.


Vectorization, including SLP, is only enabled at -O3, isn't it?  It 
seems to me that peeling is only used by optimizations which already 
lead to potential increase in code size.


For instance, with "-Ofast -funroll-all-loops", the total text size for 
the SPEC CPU2000 suite is 26.9MB with this proposed change and 26.8MB 
without it; with just "-O2", it is the same at 23.1MB regardless of this 
setting.


So it seems to me that this proposal should be neutral for up to -O2.

Thank you,

--
Evandro Menezes



Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 06:25 PM, Kyrill Tkachov wrote:

Bootstrapped and tested on arm, aarch64, x86_64.


I'd say let's wait. Some of the changes look misindented btw.


Bernd



Re: [PATCH 4/4] Cost CCMP instruction sequences to choose better expand order

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 06:30 PM, Jiong Wang wrote:

   You approved this patch at

 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01722.html

   under the condition that AArch64 cost on ccmp instruction should be
   fixed first.

   Wilco has fixed the cost issue in this patch set [3/4], and the
"XFAIL" has been removed as well.

   I just want to confirm that this patch is still OK to commit after
bootstrap and
   regression OK, right?


Sure.


Bernd


Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-15 Thread Jeff Law

On 12/11/2015 03:05 AM, Richard Biener wrote:

On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law  wrote:

On 12/03/2015 07:38 AM, Richard Biener wrote:


This pass is now enabled by default with -Os but has no limits on the
amount of
stmts it copies.


The more statements it copies, the more likely it is that the path splitting
will turn out to be useful!  It's counter-intuitive.


Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer is enabled
with -fprofile-use (but it is also properly driven to only trace hot paths)
and otherwise not by default at any optimization level.
Definitely not appropriate for -Os.  But as I mentioned, I really want 
to look at the tracer code as it may totally subsume path splitting.




Don't see how this would work for the CFG pattern it operates on unless you
duplicate the exit condition into that new block creating an even more
obfuscated CFG.
Agreed, I don't see any way to fix the multiple exit problem.  Then 
again, this all runs after the tree loop optimizer, so I'm not sure how 
big of an issue it is in practice.




It was only after I approved this code after twiddling it for Ajit that I
came across Honza's tracer implementation, which may in fact be
retargettable to these loops and do a better job.  I haven't experimented
with that.


Well, I originally suggested to merge this with the tracer pass...

I missed that, or it didn't sink into my brain.


Again, the more statements it copies the more likely it is to be profitable.
Think superblocks to expose CSE, DCE and the like.


Ok, so similar to tracer (where I think the main benefit is actually increasing
scheduling opportunities for architectures where it matters).
Right.  They're both building superblocks, which has the effect of 
larger windows for scheduling, DCE, CSE, etc.





Note that both passes are placed quite late and thus won't see much
of the GIMPLE optimizations (DOM mainly).  I wonder why they were
not placed adjacent to each other.
Ajit had it fairly early, but that didn't play well with if-conversion. 
 I just pushed it past if-conversion and vectorization, but before the 
last DOM pass.  That turns out to be where tracer lives too as you noted.




I wouldn't lose any sleep if we disabled by default or removed, particularly
if we can repurpose Honza's code.  In fact, I might strongly support the
former until we hear back from Ajit on performance data.


See above for what we do with -ftracer.  path-splitting should at _least_
restrict itself to operate on optimize_loop_for_speed_p () loops.
I think we need to decide if we want the code at all, particularly given 
the multiple-exit problem.


The difficulty is I think Ajit posted some recent data that shows it's 
helping.  So maybe the thing to do is ask Ajit to try the tracer 
independent of path splitting and take the obvious actions based on 
Ajit's data.





It should also (even if counter-intuitive) limit the amount of stmt copying
it does - after all there is sth like an instruction cache size which exceeding
for loops will never be a good idea (and even smaller special loop caches on
some archs).

Yup.



Note that a better heuristic than "at least more than one stmt" would be
to have at least one PHI in the merger block.  Otherwise I don't see how
CSE opportunities could exist we don't see without the duplication.
And yes, more PHIs -> more possible CSE.  I wouldn't say so for
the number of stmts.  So please limit the number of stmt copies!
(after all we do limit the number of stmts we copy during jump threading!)
Let's get some more data before we try to tune path splitting.  In an 
ideal world, the tracer can handle this for us and we just remove path 
splitting completely.


Jeff


Re: [PATCH] Better error recovery for merge-conflict markers (v5)

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 08:30 PM, David Malcolm wrote:


I got thinking about what we'd have to do to support Perforce-style
markers, and began to find my token-matching approach to be a little
clunky (in conjunction with reading Martin's observations on
c_parser_peek_nth_token).

Here's a reimplementation of the patch which takes a much simpler
approach, and avoids the need to touch the C lexer: check that we're
not in a macro expansion and then read in the source line, and
textually compare against the various possible conflict markers.
This adds the requirement that the source file be readable, so it
won't detect conflict markers in a .i file from -save-temps,


How come? Is source file defined as the one before preprocessing?

And I do think this is an unfortunate limitation (given that we often 
load .i files into cc1 for debugging and we'd ideally like that to be 
consistent with normal compilation as much as possible). I'd rather go 
with the original patch based on this.



Bernd


Re: [PATCH, IA64] Fix building a bare-metal ia64 compiler

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 10:13 PM, Bernd Edlinger wrote:

due to recent discussion on the basic asm, and the special handling
of ASM_INPUT in ia64, I tried to build a bare-metal cross-compiler
for ia64, but that did not work, because it seems to be impossible to
build it without having a stdlib.h.


Actually David Howells has complained to me about this as well, it seems 
to be a problem when building a toolchain for kernel compilation.



With the attached patch, I was finally able to build the cross
compiler, by declaring abort the way it is already done in many
other places in libgcc.


Can you just use __builtin_abort ()? Ok with that change.


Bernd


Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-12-15 Thread Bernd Schmidt

On 12/15/2015 06:20 PM, Wilco Dijkstra wrote:

Adding Bernd - would you mind reviewing the ccmp.c change please?


Oh sorry, didn't realize there was one in here as well. Looks ok.


Bernd


[PATCH 0/4] [ARC] Collection Of Bug Fixes

2015-12-15 Thread Andrew Burgess
This is a collection of 4 bug fix patches for arc.  All 4 patches are
really stand-alone, I've only grouped them together as they all only
affect arc.

I don't have write access to the GCC repository, so if they get
approved could they also be applied please.

Thanks,
Andrew

--

Andrew Burgess (4):
  gcc/arc: Fix warning in test
  gcc/arc: Remove load_update_operand predicate
  gcc/arc: Remove store_update_operand predicate
  gcc/arc: Avoid JUMP_LABEL_AS_INSN for possible return jumps

 gcc/ChangeLog   | 27 ++
 gcc/config/arc/arc.c| 15 +++---
 gcc/config/arc/arc.md   | 72 -
 gcc/config/arc/predicates.md| 36 -
 gcc/testsuite/ChangeLog | 12 +
 gcc/testsuite/gcc.target/arc/jump-around-jump.c |  2 +-
 gcc/testsuite/gcc.target/arc/load-update.c  | 20 +++
 gcc/testsuite/gcc.target/arc/loop-hazard-1.c| 16 ++
 8 files changed, 120 insertions(+), 80 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/load-update.c
 create mode 100644 gcc/testsuite/gcc.target/arc/loop-hazard-1.c

-- 
2.5.1



[PATCH 3/4] gcc/arc: Remove store_update_operand predicate

2015-12-15 Thread Andrew Burgess
The use of the arc specific predicate store_update_operand is broken;
this commit fixes the error, and in the process removes the need for
store_update_operand altogether.

Currently store_update_operand is used with match_operator; the
store_update_operand checks that the operand is a MEM operand, with an
operand that is a plus, the plus in turn has operands that are a
register and an immediate.

However, the match_operator already checks the structure of the rtl
tree, only in this case a different rtl pattern is checked for: the
operand must have two child operands, one a register operand and one
an immediate operand.

The mistake here is that the plus part of the rtl tree has been missed
from the define_insn rtl pattern.  The consequence of this mistake is
that a MEM operand will match the store_update_operand predicate; the
second operand of the MEM insn will then be passed to the
nonmemory_operand predicate, which assumes it will be passed an
rtl_insn.  However, the second operand of a MEM insn is the alias set
for the address, not an rtl_insn.

When fixing the rtl pattern within the define_insn it becomes obvious
that all of the checks currently contained within the
store_update_operand predicate are now contains within the rtl pattern,
if the use of store_update_operand is replaced with the memory_operand
predicate.

As with the previous patch in this series, once this patch is applied
I see almost all of these instructions being used in the wider GCC
testsuite.  As with the previous patch, if anyone knows a good way to
trigger the generation of specific instructions, that would be great.

gcc/ChangeLog:

* config/arc/arc.md (*storeqi_update): Use 'memory_operand' and
fix RTL pattern to include the plus.
(*storehi_update): Likewise.
(*storesi_update): Likewise.
(*storesf_update): Likewise.
* config/arc/predicates.md (store_update_operand): Delete.
---
 gcc/ChangeLog|  9 +
 gcc/config/arc/arc.md| 24 
 gcc/config/arc/predicates.md | 18 --
 3 files changed, 21 insertions(+), 30 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 705d4e9..dcc0930 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,14 @@
 2015-12-09  Andrew Burgess  
 
+   * config/arc/arc.md (*storeqi_update): Use 'memory_operand' and
+   fix RTL pattern to include the plus.
+   (*storehi_update): Likewise.
+   (*storesi_update): Likewise.
+   (*storesf_update): Likewise.
+   * config/arc/predicates.md (store_update_operand): Delete.
+
+2015-12-09  Andrew Burgess  
+
* config/arc/arc.md (*loadqi_update): Use 'memory_operand' and fix
RTL pattern to include the plus.
(*load_zeroextendqisi_update): Likewise.
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index ef82007..2ca4d1d 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -1149,9 +1149,9 @@
(set_attr "length" "4,8")])
 
 (define_insn "*storeqi_update"
-  [(set (match_operator:QI 4 "store_update_operand"
-[(match_operand:SI 1 "register_operand" "0")
- (match_operand:SI 2 "short_immediate_operand" "I")])
+  [(set (match_operator:QI 4 "memory_operand"
+[(plus:SI (match_operand:SI 1 "register_operand" "0")
+  (match_operand:SI 2 "short_immediate_operand" "I"))])
(match_operand:QI 3 "register_operand" "c"))
(set (match_operand:SI 0 "dest_reg_operand" "=w")
(plus:SI (match_dup 1) (match_dup 2)))]
@@ -1200,9 +1200,9 @@
(set_attr "length" "4,8")])
 
 (define_insn "*storehi_update"
-  [(set (match_operator:HI 4 "store_update_operand"
-[(match_operand:SI 1 "register_operand" "0")
- (match_operand:SI 2 "short_immediate_operand" "I")])
+  [(set (match_operator:HI 4 "memory_operand"
+[(plus:SI (match_operand:SI 1 "register_operand" "0")
+  (match_operand:SI 2 "short_immediate_operand" "I"))])
(match_operand:HI 3 "register_operand" "c"))
(set (match_operand:SI 0 "dest_reg_operand" "=w")
(plus:SI (match_dup 1) (match_dup 2)))]
@@ -1225,9 +1225,9 @@
(set_attr "length" "4,8")])
 
 (define_insn "*storesi_update"
-  [(set (match_operator:SI 4 "store_update_operand"
-[(match_operand:SI 1 "register_operand" "0")
- (match_operand:SI 2 "short_immediate_operand" "I")])
+  [(set (match_operator:SI 4 "memory_operand"
+[(plus:SI (match_operand:SI 1 "register_operand" "0")
+  (match_operand:SI 2 "short_immediate_operand" "I"))])
(match_operand:SI 3 "register_operand" "c"))
(set (match_operand:SI 0 "dest_reg_operand" "=w")
(plus:SI (match_dup 1) (match_dup 2)))]
@@ -1249,9 +1249,9 @@
(set_attr "length" "4,8")])
 
 (define_insn "*storesf_update"
-  [(set (match_operator:SF 4 "store_update_operand"
-[(match_operand:SI 1 "register_operand" "0")
- (match_operand:SI 2 "short_immediate_

[PATCH 2/4] gcc/arc: Remove load_update_operand predicate

2015-12-15 Thread Andrew Burgess
The use of the arc specific predicate load_update_operand is broken;
this commit fixes the error, and in the process removes the need for
load_update_operand altogether.

Currently load_update_operand is used with match_operator; the
load_update_operand checks that the operand is a MEM operand, with an
operand that is a plus, the plus in turn has operands that are a
register and an immediate.

However, the match_operator already checks the structure of the rtl
tree, only in this case a different rtl pattern is checked for: the
operand must have two child operands, one a register operand and one
an immediate operand.

The mistake here is that the plus part of the rtl tree has been missed
from the define_insn rtl pattern.  The consequence of this mistake is
that a MEM operand will match the load_update_operand predicate; the
second operand of the MEM insn will then be passed to the
nonmemory_operand predicate, which assumes it will be passed an
rtl_insn.  However, the second operand of a MEM insn is the alias set
for the address, not an rtl_insn.

When fixing the rtl pattern within the define_insn it becomes obvious
that all of the checks currently contained within the
load_update_operand predicate are now contained within the rtl pattern,
if the use of load_update_operand is replaced with the memory_operand
predicate.

I added a new test that exposes the issue that originally highlighted
this bug for me.  Having fixed this, I am now seeing some (but not
all) of these instructions used within the wider GCC test suite.

It would be great to add more tests targeting the specific
instructions that are not covered in the wider GCC test suite, but so
far I've been unable to come up with any usable tests.  If anyone has
advice on how to trigger the generation of specific instructions that
would be great.


gcc/ChangeLog:

* config/arc/arc.md (*loadqi_update): Use 'memory_operand' and fix
RTL pattern to include the plus.
(*load_zeroextendqisi_update): Likewise.
(*load_signextendqisi_update): Likewise.
(*loadhi_update): Likewise.
(*load_zeroextendhisi_update): Likewise.
(*load_signextendhisi_update): Likewise.
(*loadsi_update): Likewise.
(*loadsf_update): Likewise.
* config/arc/predicates.md (load_update_operand): Delete.

gcc/testsuite/ChangeLog:

* gcc.target/arc/load-update.c: New file.
---
 gcc/ChangeLog  | 13 
 gcc/config/arc/arc.md  | 48 +++---
 gcc/config/arc/predicates.md   | 18 ---
 gcc/testsuite/ChangeLog|  4 +++
 gcc/testsuite/gcc.target/arc/load-update.c | 20 +
 5 files changed, 61 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/load-update.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 0a77807..705d4e9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2015-12-09  Andrew Burgess  
+
+   * config/arc/arc.md (*loadqi_update): Use 'memory_operand' and fix
+   RTL pattern to include the plus.
+   (*load_zeroextendqisi_update): Likewise.
+   (*load_signextendqisi_update): Likewise.
+   (*loadhi_update): Likewise.
+   (*load_zeroextendhisi_update): Likewise.
+   (*load_signextendhisi_update): Likewise.
+   (*loadsi_update): Likewise.
+   (*loadsf_update): Likewise.
+   * config/arc/predicates.md (load_update_operand): Delete.
+
 2015-12-10  Jeff Law  
 
PR tree-optimization/68619
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index ac181a9..ef82007 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -1114,9 +1114,9 @@
 ;; Note: loadqi_update has no 16-bit variant
 (define_insn "*loadqi_update"
   [(set (match_operand:QI 3 "dest_reg_operand" "=r,r")
-   (match_operator:QI 4 "load_update_operand"
-[(match_operand:SI 1 "register_operand" "0,0")
- (match_operand:SI 2 "nonmemory_operand" "rI,Cal")]))
+(match_operator:QI 4 "memory_operand"
+ [(plus:SI (match_operand:SI 1 "register_operand" "0,0")
+   (match_operand:SI 2 "nonmemory_operand" "rI,Cal"))]))
(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
@@ -1126,9 +1126,9 @@
 
 (define_insn "*load_zeroextendqisi_update"
   [(set (match_operand:SI 3 "dest_reg_operand" "=r,r")
-   (zero_extend:SI (match_operator:QI 4 "load_update_operand"
-[(match_operand:SI 1 "register_operand" "0,0")
- (match_operand:SI 2 "nonmemory_operand" "rI,Cal")])))
+   (zero_extend:SI (match_operator:QI 4 "memory_operand"
+[(plus:SI (match_operand:SI 1 "register_operand" "0,0")
+  (match_operand:SI 2 "nonmemory_operand" 
"rI,Cal"))])))
(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
(plus:SI (match_dup

[PATCH 1/4] gcc/arc: Fix warning in test

2015-12-15 Thread Andrew Burgess
Missing function declaration causes a warning that results in test
failure.

gcc/testsuite/ChangeLog:

* gcc.target/arc/jump-around-jump.c (rtc_set_time): Declare.
---
 gcc/testsuite/ChangeLog | 4 
 gcc/testsuite/gcc.target/arc/jump-around-jump.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 6bcacab..bf4d198 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-12-09  Andrew Burgess  
+
+   * gcc.target/arc/jump-around-jump.c (rtc_set_time): Declare.
+
 2015-12-10  Jeff Law  
 
PR tree-optimization/68619
diff --git a/gcc/testsuite/gcc.target/arc/jump-around-jump.c 
b/gcc/testsuite/gcc.target/arc/jump-around-jump.c
index 1b45328..338c667 100644
--- a/gcc/testsuite/gcc.target/arc/jump-around-jump.c
+++ b/gcc/testsuite/gcc.target/arc/jump-around-jump.c
@@ -97,7 +97,7 @@ struct rtc_device
 extern void rtc_time_to_tm(unsigned long time, struct rtc_time *tm);
 extern struct rtc_device *rtc_class_open(const char *name);
 extern void rtc_class_close(struct rtc_device *rtc);
-
+extern int rtc_set_time (struct rtc_device *rtc, struct rtc_time *tm);
 
 int rtc_set_ntp_time(struct timespec now)
 {
-- 
2.5.1



[PATCH 4/4] gcc/arc: Avoid JUMP_LABEL_AS_INSN for possible return jumps

2015-12-15 Thread Andrew Burgess
We currently call JUMP_LABEL_AS_INSN on a jump instruction that might
have SIMPLE_RETURN as its jump label; this triggers an assertion, as
SIMPLE_RETURN is of type rtx_extra, not rtx_insn.

This commit first calls JUMP_LABEL then uses ANY_RETURN_P to catch all
of the return style jump labels.  After this we can use the safe_as_a
cast mechanism to safely convert the jump label to an rtx_insn.

There's a test included, but this issue is also hit in the tests:
gcc.c-torture/execute/2605-2.c
gcc.dg/torture/pr68083.c

gcc/ChangeLog:

* config/arc/arc.c (arc_loop_hazard): Don't convert the jump label
rtx to an rtx_insn until we confirm it's not a return rtx.

gcc/testsuite/ChangeLog:

* gcc.target/arc/loop-hazard-1.c: New file.
---
 gcc/ChangeLog|  5 +
 gcc/config/arc/arc.c | 15 ---
 gcc/testsuite/ChangeLog  |  4 
 gcc/testsuite/gcc.target/arc/loop-hazard-1.c | 16 
 4 files changed, 33 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/loop-hazard-1.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dcc0930..bd2621d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2015-12-09  Andrew Burgess  
 
+   * config/arc/arc.c (arc_loop_hazard): Don't convert the jump label
+   rtx to an rtx_insn until we confirm it's not a return rtx.
+
+2015-12-09  Andrew Burgess  
+
* config/arc/arc.md (*storeqi_update): Use 'memory_operand' and
fix RTL pattern to include the plus.
(*storehi_update): Likewise.
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 5bc2bce..2c0f8b9 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7987,6 +7987,7 @@ static bool
 arc_loop_hazard (rtx_insn *pred, rtx_insn *succ)
 {
   rtx_insn *jump  = NULL;
+  rtx label_rtx = NULL_RTX;
   rtx_insn *label = NULL;
   basic_block succ_bb;
 
@@ -8013,22 +8014,22 @@ arc_loop_hazard (rtx_insn *pred, rtx_insn *succ)
   else
 return false;
 
-  label = JUMP_LABEL_AS_INSN (jump);
-  if (!label)
-return false;
-
   /* Phase 2b: Make sure is not a millicode jump.  */
   if ((GET_CODE (PATTERN (jump)) == PARALLEL)
   && (XVECEXP (PATTERN (jump), 0, 0) == ret_rtx))
 return false;
 
-  /* Phase 2c: Make sure is not a simple_return.  */
-  if ((GET_CODE (PATTERN (jump)) == SIMPLE_RETURN)
-  || (GET_CODE (label) == SIMPLE_RETURN))
+  label_rtx = JUMP_LABEL (jump);
+  if (!label_rtx)
+return false;
+
+  /* Phase 2c: Make sure is not a return.  */
+  if (ANY_RETURN_P (label_rtx))
 return false;
 
   /* Pahse 2d: Go to the target of the jump and check for aliveness of
  LP_COUNT register.  */
+  label = safe_as_a  (label_rtx);
   succ_bb = BLOCK_FOR_INSN (label);
   if (!succ_bb)
 {
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 6ab629a..b98706f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,9 @@
 2015-12-09  Andrew Burgess  
 
+   * gcc.target/arc/loop-hazard-1.c: New file.
+
+2015-12-09  Andrew Burgess  
+
* gcc.target/arc/load-update.c: New file.
 
 2015-12-09  Andrew Burgess  
diff --git a/gcc/testsuite/gcc.target/arc/loop-hazard-1.c 
b/gcc/testsuite/gcc.target/arc/loop-hazard-1.c
new file mode 100644
index 000..7c688bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/loop-hazard-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+/* This caused an assertion within arc_loop_hazard.  */
+
+unsigned a, b;
+
+long fn1()
+{
+  long c = 1, d = 0;
+  while (a && c && b)
+c <<= 1;
+  while (c)
+d |= c;
+  return d;
+}
-- 
2.5.1



Re: [PATCH] S/390: Wide int support.

2015-12-15 Thread Ulrich Weigand
Richard Sandiford wrote:
> "Ulrich Weigand"  writes:
> > The problem is not DImode LABEL_REFs, but rather VOIDmode LABEL_REFs when
> > matched against a match_operand:DI.
> 
> It'd be good to fix this in a more direct way though, rather than
> hack around it.  It's possible that the trick will stop working
> if genrecog.c gets smarter.
> 
> When do label_refs have VOIDmode?  Is this an m31-ism?

No, this seems to be a cross-platform issue.  For one, RTX in .md files
pretty much consistently uses (label_ref ...) without a mode.  This means
that any LABEL_REFs generated from .md file expanders or splitters will
use VOIDmode.

For LABEL_REFs generated via explicit gen_rtx_LABEL_REF, usage seems to
be mixed between using VOIDmode and Pmode in target C++ files.  Common
code does seem to be using always Pmode, as far as I can see.

Are LABEL_REFs in fact supposed to always have Pmode?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH] C FE: fix range of primary-expression in c_parser_postfix_expression

2015-12-15 Thread David Malcolm
In the C frontend,
  c_parser_postfix_expression
after parsing a primary expression passes "loc", the location of the
*first token* in that expression to
  c_parser_postfix_expression_after_primary,
which thus discards any range information we had for primary
expressions containing more than one token; we get just the range of
the first token.

An example of this can be seen in this testcase from:
  https://gcc.gnu.org/wiki/ClangDiagnosticsComparison

void foo(char **argP, char **argQ)
{
  (argP - argQ)();
  argP();
}

for which trunk currently gives these ranges:

diagnostic-range-bad-called-object.c:7:3: error: called object is not a 
function or function pointer
   (argP - argQ)();
   ^

diagnostic-range-bad-called-object.c:14:3: error: called object 'argP' is not a 
function or function pointer
   argP();
   ^~~~

The second happens to be correct, but the first is missing
range information.

The following patch is a one-liner to preserve the expression's location,
changing the first to:

diagnostic-range-bad-called-object.c:7:9: error: called object is not a 
function or function pointer
   (argP - argQ)();
   ~~^~~

and leaving the second unchanged.

Applying this fix requires tweaking some column numbers for expected
locations in gcc.dg/cast-function-1.c; the output of trunk was of the
form:

cast-function-1.c:21:7: warning: function called through a non-compatible type
   d = ((double (*) (int)) foo1) (i);
   ^

which the patch changes to:

cast-function-1.c:21:8: warning: function called through a non-compatible type
   d = ((double (*) (int)) foo1) (i);
   ~^~~~

which I feel is an improvement.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Adds 6 new PASS results to gcc.sum

OK for trunk in stage 3?

gcc/c/ChangeLog:
* c-parser.c (c_parser_postfix_expression): Use EXPR_LOC_OR_LOC
to preserve range information for the primary expression
in the call to c_parser_postfix_expression_after_primary.

gcc/testsuite/ChangeLog:
* gcc.dg/cast-function-1.c (bar): Update column numbers.
* gcc.dg/diagnostic-range-bad-called-object.c: New test case.
---
 gcc/c/c-parser.c   |  3 ++-
 gcc/testsuite/gcc.dg/cast-function-1.c |  8 
 .../gcc.dg/diagnostic-range-bad-called-object.c| 24 ++
 3 files changed, 30 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/diagnostic-range-bad-called-object.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 5c32f45..e149e19 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7954,7 +7954,8 @@ c_parser_postfix_expression (c_parser *parser)
   expr.value = error_mark_node;
   break;
 }
-  return c_parser_postfix_expression_after_primary (parser, loc, expr);
+  return c_parser_postfix_expression_after_primary
+(parser, EXPR_LOC_OR_LOC (expr.value, loc), expr);
 }
 
 /* Parse a postfix expression after a parenthesized type name: the
diff --git a/gcc/testsuite/gcc.dg/cast-function-1.c 
b/gcc/testsuite/gcc.dg/cast-function-1.c
index ab42db1..5228b55 100644
--- a/gcc/testsuite/gcc.dg/cast-function-1.c
+++ b/gcc/testsuite/gcc.dg/cast-function-1.c
@@ -18,14 +18,14 @@ typedef struct {
 
 void bar(double d, int i, str_t s)
 {
-  d = ((double (*) (int)) foo1) (i);  /* { dg-warning "7:non-compatible|abort" 
} */
-  i = ((int (*) (double)) foo1) (d);  /* { dg-warning "7:non-compatible|abort" 
} */
-  s = ((str_t (*) (int)) foo1) (i);   /* { dg-warning "7:non-compatible|abort" 
} */
+  d = ((double (*) (int)) foo1) (i);  /* { dg-warning "8:non-compatible|abort" 
} */
+  i = ((int (*) (double)) foo1) (d);  /* { dg-warning "8:non-compatible|abort" 
} */
+  s = ((str_t (*) (int)) foo1) (i);   /* { dg-warning "8:non-compatible|abort" 
} */
   ((void (*) (int)) foo1) (d);/* { dg-warning "non-compatible|abort" } 
*/
   i = ((int (*) (int)) foo1) (i); /* { dg-bogus "non-compatible|abort" } */
   (void) foo1 (i);/* { dg-bogus "non-compatible|abort" } */
 
-  d = ((double (*) (int)) foo2) (i);  /* { dg-warning "7:non-compatible|abort" 
} */
+  d = ((double (*) (int)) foo2) (i);  /* { dg-warning "8:non-compatible|abort" 
} */
   i = ((int (*) (double)) foo2) (d);  /* { dg-bogus "non-compatible|abort" } */
   s = ((str_t (*) (int)) foo2) (i);   /* { dg-warning "non-compatible|abort" } 
*/
   ((void (*) (int)) foo2) (d);/* { dg-warning "non-compatible|abort" } 
*/
diff --git a/gcc/testsuite/gcc.dg/diagnostic-range-bad-called-object.c 
b/gcc/testsuite/gcc.dg/diagnostic-range-bad-called-object.c
new file mode 100644
index 000..95fb3e9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/diagnostic-range-bad-called-object.c
@@ -0,0 +1,24 @@
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+/* Adapted from https://gcc.gnu.org/wiki/ClangDiagnosticsComparison */
+
+void call_of_non_function_ptr (char **argP, char **argQ)
+{
+  (argP - argQ)(); /* { dg-error "called object 

Fix DECL_VIRTUAL_P of same body aliases

2015-12-15 Thread Jan Hubicka
Hi,
this patch fixes an inconsistency seen in the g++.dg/lto/20081125_0.C testcase,
where we have a same-body alias of a dtor.  The alias is DECL_VIRTUAL while
the dtor itself is not.  symtab_node::fixup_same_cpp_alias_visibility
copies DECL_VIRTUAL_P from the alias target, dropping the flag, which is wrong.

I checked, and the flag-copying code was introduced in
https://gcc.gnu.org/ml/gcc-patches/2011-06/msg00903.html
which was my initial commit implementing same-body aliases as real aliases.
I have no recollection of why the code is there, but it is wrong (basically
everything in symtab_node::fixup_same_cpp_alias_visibility is wrong and should
be done by the C++ FE instead, but this part is more wrong in the sense of
leading to wrong code).

Bootstrapped/regtested x86_64-linux, will commit it after re-testing with
Firefox.

Honza

* symtab.c (symtab_node::fixup_same_cpp_alias_visibility):
Do not copy DECL_VIRTUAL_P.
Index: symtab.c
===
--- symtab.c(revision 231581)
+++ symtab.c(working copy)
@@ -1363,7 +1363,6 @@ symtab_node::fixup_same_cpp_alias_visibi
   DECL_EXTERNAL (decl) = DECL_EXTERNAL (target->decl);
   DECL_VISIBILITY (decl) = DECL_VISIBILITY (target->decl);
 }
-  DECL_VIRTUAL_P (decl) = DECL_VIRTUAL_P (target->decl);
   if (TREE_PUBLIC (decl))
 {
   tree group;


Re: [PATCH, IA64] Fix building a bare-metal ia64 compiler

2015-12-15 Thread Bernd Edlinger
Hi,

On 15.12.2015 22:55, Jeff Law wrote:
> On 12/15/2015 02:13 PM, Bernd Edlinger wrote:
>> Hi,
>>
>> due to recent discussion on the basic asm, and the special handling 
>> of ASM_INPUT in ia64, I tried to build a bare-metal cross-compiler 
>> for ia64, but that did not work, because it seems to be impossible to 
>> build it without having a stdlib.h.
>>
>> With the attached patch, I was finally able to build the cross 
>> compiler, by declaring abort in the way as it is done already in many 
>> other places at libgcc.
>>
>> In case someone wants to know, working configure options are as follows:
>>
>> ../binutils-2.25.1/configure --prefix=../ia64-elf 
>> --target=ia64-unknown-elf
>>
>> ../gcc-trunk/configure --prefix=../ia64-elf --target=ia64-unknown-elf 
>> --enable-languages=c --with-gnu-as --disable-threads 
>> --disable-sjlj-exceptions --disable-libssp --disable-libquadmath
>>
>>
>> I have successfully built a bare-metal cross compiler with this patch.
>> Is it OK for trunk?
> I wouldn't call it "many" places in libgcc.  I just see one -- 
> fp-bit.c and that one is only to give "sane" handle to extended mode 
> floating point.

there is also unwind-arm-common.inc, which is included by 
./config/arm/unwind-arm.h
and ./config/c6x/unwind-c6x.h, so yes, not many, only a few.

> For ia64-elf ISTM this header should be coming from newlib

but I don't want to use newlib or glibc, I just wanted to see if my 
other patch breaks something.


Bernd.


[PATCH 2/2] Remove individual dependence pointers and add a scop::dependence to contain all the dependences.

2015-12-15 Thread hiraditya
Removed the member variables that are only used in scop_get_dependences.
Instead, only the overall dependence union is maintained.  Passes regtest
and bootstrap.

gcc/ChangeLog:

2015-12-15  hiraditya  

* graphite-dependences.c (scop_get_dependences): Use local pointers.
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::scop_to_isl_ast): Use scop->dependence.
* graphite-optimize-isl.c (optimize_isl): Same.
* graphite-poly.c (new_scop): Remove initialization of removed members.
(free_scop): Same.
* graphite.h (struct scop): Remove individual dependence pointers and
add a scop::dependence to contain all the dependences.

---
 gcc/graphite-dependences.c   | 56 
 gcc/graphite-isl-ast-to-gimple.c |  7 ++---
 gcc/graphite-optimize-isl.c  | 12 -
 gcc/graphite-poly.c  | 43 --
 gcc/graphite.h   |  9 ++-
 5 files changed, 55 insertions(+), 72 deletions(-)

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index bb81ae3..b34ed77 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -376,28 +376,32 @@ compute_deps (scop_p scop, vec pbbs,
 isl_union_map *
 scop_get_dependences (scop_p scop)
 {
-  isl_union_map *dependences;
-
-  if (!scop->must_raw)
-compute_deps (scop, scop->pbbs,
- &scop->must_raw, &scop->may_raw,
- &scop->must_raw_no_source, &scop->may_raw_no_source,
- &scop->must_war, &scop->may_war,
- &scop->must_war_no_source, &scop->may_war_no_source,
- &scop->must_waw, &scop->may_waw,
- &scop->must_waw_no_source, &scop->may_waw_no_source);
-
-  dependences = isl_union_map_copy (scop->must_raw);
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->must_war));
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->must_waw));
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->may_raw));
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->may_war));
-  dependences = isl_union_map_union (dependences,
-isl_union_map_copy (scop->may_waw));
+  if (scop->dependence)
+return scop->dependence;
+
+  /* The original dependence relations:
+ RAW are read after write dependences,
+ WAR are write after read dependences,
+ WAW are write after write dependences.  */
+  isl_union_map *must_raw = NULL, *may_raw = NULL, *must_raw_no_source = NULL,
+  *may_raw_no_source = NULL, *must_war = NULL, *may_war = NULL,
+  *must_war_no_source = NULL, *may_war_no_source = NULL, *must_waw = NULL,
+  *may_waw = NULL, *must_waw_no_source = NULL, *may_waw_no_source = NULL;
+
+  compute_deps (scop, scop->pbbs,
+ &must_raw, &may_raw,
+ &must_raw_no_source, &may_raw_no_source,
+ &must_war, &may_war,
+ &must_war_no_source, &may_war_no_source,
+ &must_waw, &may_waw,
+ &must_waw_no_source, &may_waw_no_source);
+
+  isl_union_map *dependences = must_raw;
+  dependences = isl_union_map_union (dependences, must_war);
+  dependences = isl_union_map_union (dependences, must_waw);
+  dependences = isl_union_map_union (dependences, may_raw);
+  dependences = isl_union_map_union (dependences, may_war);
+  dependences = isl_union_map_union (dependences, may_waw);
 
   if (dump_file)
 {
@@ -406,6 +410,14 @@ scop_get_dependences (scop_p scop)
   fprintf (dump_file, ")\n");
 }
 
+  isl_union_map_free (must_raw_no_source);
+  isl_union_map_free (may_raw_no_source);
+  isl_union_map_free (must_war_no_source);
+  isl_union_map_free (may_war_no_source);
+  isl_union_map_free (must_waw_no_source);
+  isl_union_map_free (may_waw_no_source);
+
+  scop->dependence = dependences;
   return dependences;
 }
 
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index ed2a896..af54109 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -3203,18 +3203,15 @@ translate_isl_ast_to_gimple::scop_to_isl_ast (scop_p 
scop, ivs_params &ip)
   isl_union_map *schedule_isl = generate_isl_schedule (scop);
   isl_ast_build *context_isl = generate_isl_context (scop);
   context_isl = set_options (context_isl, schedule_isl);
-  isl_union_map *dependences = NULL;
   if (flag_loop_parallelize_all)
 {
-  dependences = scop_get_dependences (scop);
+  isl_union_map *dependence = scop_get_dependences (scop);
   context_isl =
isl_ast_build_set_before_each_for (context_isl, ast_build_before_for,
-  dependences);
+ 

[PATCH 1/2] [graphite] Use refs instead of values.

2015-12-15 Thread hiraditya
Passes bootstrap and regtest.

gcc/ChangeLog:

2015-12-15  hiraditya  

* graphite-sese-to-poly.c (build_poly_sr): Use refs.

---
 gcc/graphite-sese-to-poly.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 480c552..ff45599 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -1064,8 +1064,8 @@ build_poly_sr (poly_bb_p pbb)
 {
   scop_p scop = PBB_SCOP (pbb);
   gimple_poly_bb_p gbb = PBB_BLACK_BOX (pbb);
-  vec reads = gbb->read_scalar_refs;
-  vec writes = gbb->write_scalar_refs;
+  vec &reads = gbb->read_scalar_refs;
+  vec &writes = gbb->write_scalar_refs;
 
   isl_space *dc = isl_set_get_space (pbb->domain);
   int nb_out = 1;
-- 
2.1.4



Re: [PATCH, IA64] Fix building a bare-metal ia64 compiler

2015-12-15 Thread Bernd Edlinger
Hi,

On 16.12.2015 00:55 Bernd Schmidt wrote:
> On 12/15/2015 10:13 PM, Bernd Edlinger wrote:
>> due to recent discussion on the basic asm, and the special handling
>> of ASM_INPUT in ia64, I tried to build a bare-metal cross-compiler
>> for ia64, but that did not work, because it seems to be impossible to
>> build it without having a stdlib.h.
>
> Actually David Howells has complained to me about this as well; it seems
> to be a problem when building a toolchain for kernel compilation.

yes.  I am not sure if this is also problematic when building a cross-glibc:
there I need a cross C compiler first, then build glibc, install the headers
and objects to the sysroot, and then build gcc again with all languages.

>
>> With the attached patch, I was finally able to build the cross
>> compiler, by declaring abort in the way as it is done already in many
>> other places at libgcc.
>
> Can you just use __builtin_abort ()? Ok with that change.
>

I will try, but ./config/ia64/unwind-ia64.c includes this header,
and it also uses abort () in many places.  So I'd expect warnings
there.


Thanks
Bernd.


Re: [PATCH] S/390: Allow to use r1 to r4 as literal pool base.

2015-12-15 Thread Dominik Vogt
On Mon, Dec 14, 2015 at 04:08:32PM +0100, Ulrich Weigand wrote:
> Dominik Vogt wrote:
> 
> > The attached patch enables using r1 to r4 as the literal pool base pointer 
> > if
> > one of them is unused in a leaf function.  The unpatched code supports only 
> > r5
> > and r13.
> 
> I don't think that r1 is actually safe here.  Note that it may be used
> (unconditionally) as temp register in s390_emit_prologue in certain cases;
> the upcoming split-stack code will also need to use r1 in some cases.

How about the attached patch?  It also allows using r0 as the
temp register if possible (needs more testing).  If that's too
much effort, I'm fine with limiting the original patch to r4 to
r2.

> r2 through r4 should be fine.  [ Not sure if there will be many (any?) cases
> where one of those is unused but r5 isn't, however. ]

This can happen if the function only uses register pairs
(__int128).  Actually I'm not sure whether r2 and r4 are valid
candidates.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.c (s390_init_frame_layout): Try r4 to r1 for the
literal pool pointer.
(s390_get_prologue_temp_regno): New function to choose the temp_reg for
the prologue.  Allow using r0 if that's safe.
(s390_emit_prologue): Move code choosing the temp_reg to a separate
function.  Add assertions.
>From 806973409adc48c8ca701d55fdbad897b0e31c78 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 11 Dec 2015 11:33:23 +0100
Subject: [PATCH] S/390: Allow to use r1 to r4 as literal pool base
 pointer.

The old code only considered r5 and r13.
---
 gcc/config/s390/s390.c | 61 +++---
 1 file changed, 48 insertions(+), 13 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index bc6f05b..c45b992 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -9506,6 +9506,26 @@ s390_frame_info (void)
   & ~(STACK_BOUNDARY / BITS_PER_UNIT - 1));
 }
 
+/* Returns the register number that is used as a temp register in the prologue.
+ */
+static int
+s390_get_prologue_temp_regno (void)
+{
+  if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
+  && !crtl->is_leaf
+  && !TARGET_TPF_PROFILING)
+return RETURN_REGNUM;
+  if (cfun_save_high_fprs_p)
+/* Needs an address register.  */
+return 1;
+  else if (TARGET_BACKCHAIN)
+/* Does not need an address register.  */
+return 0;
+
+  /* No temp register needed.  */
+  return -1;
+}
+
 /* Generate frame layout.  Fills in register and frame data for the current
function in cfun->machine.  This routine can be called multiple times;
it will re-do the complete frame layout every time.  */
@@ -9543,10 +9563,24 @@ s390_init_frame_layout (void)
 	 as base register to avoid save/restore overhead.  */
   if (!base_used)
 	cfun->machine->base_reg = NULL_RTX;
-  else if (crtl->is_leaf && !df_regs_ever_live_p (5))
-	cfun->machine->base_reg = gen_rtx_REG (Pmode, 5);
   else
-	cfun->machine->base_reg = gen_rtx_REG (Pmode, BASE_REGNUM);
+	{
+	  int br = 0;
+
+	  if (crtl->is_leaf)
+	{
+	  int temp_regno;
+
+	  temp_regno = s390_get_prologue_temp_regno ();
+	  /* Prefer r5 (most likely to be free).  */
+	  for (br = 5;
+		   br >= 1 && (br == temp_regno || df_regs_ever_live_p (br));
+		   br--)
+		;
+	}
+	  cfun->machine->base_reg =
+	gen_rtx_REG (Pmode, (br > 0) ? br : BASE_REGNUM);
+	}
 
   s390_register_info ();
   s390_frame_info ();
@@ -10385,19 +10419,16 @@ s390_emit_prologue (void)
 {
   rtx insn, addr;
   rtx temp_reg;
+  int temp_regno;
   int i;
   int offset;
   int next_fpr = 0;
 
-  /* Choose best register to use for temp use within prologue.
- See below for why TPF must use the register 1.  */
-
-  if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
-  && !crtl->is_leaf
-  && !TARGET_TPF_PROFILING)
-temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
-  else
-temp_reg = gen_rtx_REG (Pmode, 1);
+  /* If new uses of temp_reg are introduced into the prologue, be sure to
+ update the conditions in s390_get_prologue_temp_regno().  Otherwise the
+ prologue might overwrite the literal pool pointer in r1.  */
+  temp_regno = s390_get_prologue_temp_regno ();
+  temp_reg = (temp_regno >= 0) ? gen_rtx_REG (Pmode, temp_regno) : NULL_RTX;
 
   s390_save_gprs_to_fprs ();
 
@@ -10551,7 +10582,10 @@ s390_emit_prologue (void)
 
   /* Save incoming stack pointer into temp reg.  */
   if (TARGET_BACKCHAIN || next_fpr)
-	insn = emit_insn (gen_move_insn (temp_reg, stack_pointer_rtx));
+	{
+	  gcc_assert (temp_regno >= 0);
+	  insn = emit_insn (gen_move_insn (temp_reg, stack_pointer_rtx));
+	}
 
   /* Subtract frame size from stack pointer.  */
 
@@ -10606,6 +10640,7 @@ s390_emit_prologue (void)
 
   if (cfun_save_high_fprs_p && next_fpr)
 {
+  gcc_assert (temp_regno >= 1);
   /* If the stack might be accessed through a diff

[Patch, avr] Provide correct memory move costs

2015-12-15 Thread Senthil Kumar Selvaraj
Hi,

  When analyzing code size regressions for AVR on top-of-trunk, I
  found a few cases where aggressive inlining (by the middle-end)
  of functions containing calls to memcpy was bloating up the code.

  Turns out that the AVR backend has MOVE_MAX set to 4 (unchanged from the 
  original commit), when it really should be 1, as the AVRs can only 
  move a single byte between reg and memory in a single instruction. 
  Setting it to 4 causes the middle-end to underestimate the
  cost of memcpys with a compile-time constant length parameter, as it 
  thinks a 4-byte copy's cost is only a single instruction.

  Just setting MOVE_MAX to 1 makes the middle end too conservative
  though, and causes a bunch of regression tests to fail, as lots of
  optimizations fail to pass the code size increase threshold check,
even when not optimizing for size.

  Instead, the below patch sets MOVE_MAX_PIECES to 2, and implements a
  target hook that tells the middle-end to use load/store insns for
  memory moves up to two bytes.  Also, the patch sets MOVE_RATIO to 3 when
  optimizing for speed, so that moves up to 4 bytes will occur through
  load/store sequences, like they do now.

  With this, only a couple of regression tests fail. uninit-19.c fails
  because it thinks only non-pic code won't inline a function, but the
  cost computation prevents inlining for AVRs. The test passes if
  the optimization level is increased to -O3. 

  strlenopt-8.c has an XPASS and a FAIL because a previous pass issued
  a builtin_memcpy instead of a MEM assignment.  Execution still passes.

  I'll continue running more tests to see if there are other performance
  related consequences.

  Is this ok? If ok, could someone commit please? I don't have commit
  access.

Regards
Senthil

gcc/ChangeLog

2015-12-16  Senthil Kumar Selvaraj  

* config/avr/avr.h (MOVE_MAX): Set value to 1. 
(MOVE_MAX_PIECES): Define.
(MOVE_RATIO): Define.
* config/avr/avr.c (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P):
Provide target hook.
(avr_use_by_pieces_infrastructure_p): New function.


diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index 609a42b..9cc95db 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -2431,6 +2431,27 @@ avr_print_operand (FILE *file, rtx x, int code)
 }
 
 
+/* Implement TARGET_USE_BY_PIECES_INFRASTRUCTURE_P.  */
+
+/* Prefer a sequence of loads/stores for moves of size up to
+   two: two pairs of load/store instructions are always better
+   than the 5-instruction sequence for a loop (1 instruction
+   for loop counter setup, and 4 for the body of the loop).  */
+
+static bool
+avr_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
+                                    unsigned int align ATTRIBUTE_UNUSED,
+                                    enum by_pieces_operation op,
+                                    bool speed_p)
+{
+  if (op != MOVE_BY_PIECES || (speed_p && size > MOVE_MAX_PIECES))
+    return default_use_by_pieces_infrastructure_p (size, align, op, speed_p);
+
+  return size <= MOVE_MAX_PIECES;
+}
+
+
 /* Worker function for `NOTICE_UPDATE_CC'.  */
 /* Update the condition code in the INSN.  */
 
@@ -13763,6 +13784,10 @@ avr_fold_builtin (tree fndecl, int n_args 
ATTRIBUTE_UNUSED, tree *arg,
 #undef  TARGET_PRINT_OPERAND_PUNCT_VALID_P
 #define TARGET_PRINT_OPERAND_PUNCT_VALID_P avr_print_operand_punct_valid_p
 
+#undef TARGET_USE_BY_PIECES_INFRASTRUCTURE_P
+#define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
+  avr_use_by_pieces_infrastructure_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 7439964..ebfb8ed 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -453,7 +453,22 @@ typedef struct avr_args
 
 #undef WORD_REGISTER_OPERATIONS
 
-#define MOVE_MAX 4
+/* Can move only a single byte from memory to reg in a
+   single instruction. */
+
+#define MOVE_MAX 1
+
+/* Allow moves of up to two bytes to occur using the by_pieces
+   infrastructure.  */
+
+#define MOVE_MAX_PIECES 2
+
+/* Set MOVE_RATIO to 3 to allow memory moves of up to 4 bytes to happen
+   by pieces when optimizing for speed, like it did when MOVE_MAX_PIECES
+   was 4.  When optimizing for size, allow memory moves of up to 2 bytes.
+   Also see avr_use_by_pieces_infrastructure_p.  */
+
+#define MOVE_RATIO(speed) ((speed) ? 3 : 2)
 
 #define TRULY_NOOP_TRUNCATION(OUTPREC, INPREC) 1
 

