Re: [PATCH/AARCH64] Add vulcan -mcpu support

2016-06-07 Thread Virendra Pathak
Please find gcc ChangeLog in the correct format.

ChangeLog:

Virendra Pathak  
* config/aarch64/aarch64-cores.def (vulcan): New core
* config/aarch64/aarch64-tune.md: Regenerate
* doc/invoke.texi (AARCH64/mtune): Document vulcan as an
available option.

Thanks.

with regards,
Virendra Pathak


On Wed, Jun 8, 2016 at 10:52 AM, Virendra Pathak
 wrote:
> Hi gcc-patches group,
>
> ping,
>
> updated the ChangeLog with author's entry.
>
> ChangeLog:
> Virendra Pathak (virendra.pat...@broadcom.com)
> * config/aarch64/aarch64-cores.def (vulcan): New core
> * config/aarch64/aarch64-tune.md: Regenerate
> * doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option.
>
> Thanks.
>
> with regards,
> Virendra Pathak
>
>
> On Sat, Jun 4, 2016 at 12:05 PM, Virendra Pathak
>  wrote:
>> Hi gcc-patches group,
>>
>> Please find the basic patch for adding -mcpu=vulcan support in the gcc.
>> Broadcom's vulcan is an armv8-a aarch64 based server processor.
>>
>> At present we are using schedule model of cortex-a57 but soon we will
>> be submitting a schedule model for vulcan.
>>
>> Please review the patch (attached with this mail) and kindly merge it
>> in the gcc-6-branch.
>>
>> Tested the patch with aarch64-linux-gnu cross build,
>> aarch64-unknown-linux-gnu native build and make check.
>> We have also obtained company wide agreement with FSF for contributing
>> to gcc project.
>>
>> Thanks.
>>
>>
>> ChangeLog:
>> * config/aarch64/aarch64-cores.def (vulcan): New core
>> * config/aarch64/aarch64-tune.md: Regenerate
>> * doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option.
>>
>>
>>
>> with regards,
>> Virendra Pathak
From 8d065016856606740a3928518ed6a3f9933fb130 Mon Sep 17 00:00:00 2001
From: Virendra Pathak 
Date: Wed, 1 Jun 2016 03:15:33 -0700
Subject: [PATCH] [AArch64] Add -mcpu vulcan support

---
 gcc/config/aarch64/aarch64-cores.def | 1 +
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 251a3eb..b0acad9 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -46,6 +46,7 @@ AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  
AARCH64_FL_FOR_ARCH8 | AA
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
 AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
 AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
+AARCH64_CORE("vulcan",  vulcan,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x42", "0x516")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, "0x50", "0x000")
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index cbc6f48..c758a5f 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,vulcan,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 821f8fd..146042d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12955,8 +12955,8 @@ processors implementing the target architecture.
 Specify the name of the target processor for which GCC should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
-@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx},
-@samp{xgene1}.
+@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{vulcan},
+@samp{thunderx}, @samp{xgene1}.
 
 Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible values for this
-- 
2.1.4



Re: [PATCH/AARCH64] Add vulcan -mcpu support

2016-06-07 Thread Virendra Pathak
Hi gcc-patches group,

ping,

updated the ChangeLog with author's entry.

ChangeLog:
Virendra Pathak (virendra.pat...@broadcom.com)
* config/aarch64/aarch64-cores.def (vulcan): New core
* config/aarch64/aarch64-tune.md: Regenerate
* doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option.

Thanks.

with regards,
Virendra Pathak


On Sat, Jun 4, 2016 at 12:05 PM, Virendra Pathak
 wrote:
> Hi gcc-patches group,
>
> Please find the basic patch for adding -mcpu=vulcan support in the gcc.
> Broadcom's vulcan is an armv8-a aarch64 based server processor.
>
> At present we are using schedule model of cortex-a57 but soon we will
> be submitting a schedule model for vulcan.
>
> Please review the patch (attached with this mail) and kindly merge it
> in the gcc-6-branch.
>
> Tested the patch with aarch64-linux-gnu cross build,
> aarch64-unknown-linux-gnu native build and make check.
> We have also obtained company wide agreement with FSF for contributing
> to gcc project.
>
> Thanks.
>
>
> ChangeLog:
> * config/aarch64/aarch64-cores.def (vulcan): New core
> * config/aarch64/aarch64-tune.md: Regenerate
> * doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option.
>
>
>
> with regards,
> Virendra Pathak
From 8d065016856606740a3928518ed6a3f9933fb130 Mon Sep 17 00:00:00 2001
From: Virendra Pathak 
Date: Wed, 1 Jun 2016 03:15:33 -0700
Subject: [PATCH] [AArch64] Add -mcpu vulcan support

---
 gcc/config/aarch64/aarch64-cores.def | 1 +
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 251a3eb..b0acad9 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -46,6 +46,7 @@ AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  
AARCH64_FL_FOR_ARCH8 | AA
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
 AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
 AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
+AARCH64_CORE("vulcan",  vulcan,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x42", "0x516")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, "0x50", "0x000")
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index cbc6f48..c758a5f 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,vulcan,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 821f8fd..146042d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12955,8 +12955,8 @@ processors implementing the target architecture.
 Specify the name of the target processor for which GCC should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
-@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx},
-@samp{xgene1}.
+@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{vulcan},
+@samp{thunderx}, @samp{xgene1}.
 
 Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible values for this
-- 
2.1.4



Re: [PATCH] c/70883 - inconsistent error message for calls to __builtin_add_overflow with too few arguments

2016-06-07 Thread Martin Sebor

Ping: This is a trivial patch to make the text of an error message
for an insufficient number of arguments to a built-in consistent
across different built-ins.  Rather that sometimes saying "not
enough arguments" and others "too few arguments" the patch makes
GCC print "too few" in all cases.

  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00093.html

On 06/01/2016 03:09 PM, Martin Sebor wrote:

Jakub,

As you requested in the discussion of my arithmetic overflow
built-in patch, attached is the subset of the patch to make
consistent with other such diagnostics the text of the error
message issued for insufficient numbers of arguments in calls
to built-in functions such as __builtin_add_overflow,
__builtin_constant_p, and others.

Thanks
Martin




[PATCH 9/9] rs6000: Separate shrink-wrapping

2016-06-07 Thread Segher Boessenkool
This implements the hooks for separate shrink-wrapping for rs6000.
It handles GPRs and LR.  The GPRs get a concern number corresponding
to their register number; LR gets concern number 0.

This improves specint by 0.9%, specfp by 0.8%, some separate benchmarks
much more (on POWER8).  It improves the hot path in various interpreters,
and e.g. in glibc's malloc.


2016-06-07  Segher Boessenkool  

* config/rs6000/rs6000.c (machine_function): Add new fields
gpr_is_wrapped_separately and lr_is_wrapped_separately.
(TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS,
TARGET_SHRINK_WRAP_CONCERNS_FOR_BB,
TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS,
TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS,
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS,
TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS): Define.
(rs6000_get_separate_concerns): New function.
(rs6000_concerns_for_bb): New function.
(rs6000_disqualify_concerns): New function.
(rs6000_emit_prologue_concerns): New function.
(rs6000_emit_epilogue_concerns): New function.
(rs6000_set_handled_concerns): New function.
(rs6000_emit_prologue): Don't emit LR save if lr_is_wrapped_separately.
Don't emit GPR saves if gpr_is_wrapped_separately for that register.
(restore_saved_lr): Don't restore LR if lr_is_wrapped_separately.
(rs6000_emit_epilogue): Don't emit GPR restores if
gpr_is_wrapped_separately for that register.  Don't make a
REG_CFA_RESTORE note for registers we did not restore, either.

---
 gcc/config/rs6000/rs6000.c | 257 ++---
 1 file changed, 242 insertions(+), 15 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c6b2b6a..af56d8e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -152,6 +152,10 @@ typedef struct GTY(()) machine_function
   bool split_stack_argp_used;
   /* Flag if r2 setup is needed with ELFv2 ABI.  */
   bool r2_setup_needed;
+  /* The concerns already handled by separate shrink-wrapping, which should
+ not be considered by the prologue and epilogue.  */
+  bool gpr_is_wrapped_separately[32];
+  bool lr_is_wrapped_separately;
 } machine_function;
 
 /* Support targetm.vectorize.builtin_mask_for_load.  */
@@ -1511,6 +1515,19 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE rs6000_set_up_by_prologue
 
+#undef TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS
+#define TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS rs6000_get_separate_concerns
+#undef TARGET_SHRINK_WRAP_CONCERNS_FOR_BB
+#define TARGET_SHRINK_WRAP_CONCERNS_FOR_BB rs6000_concerns_for_bb
+#undef TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS
+#define TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS rs6000_disqualify_concerns
+#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS
+#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS rs6000_emit_prologue_concerns
+#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS
+#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS rs6000_emit_epilogue_concerns
+#undef TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS
+#define TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS rs6000_set_handled_concerns
+
 #undef TARGET_EXTRA_LIVE_ON_ENTRY
 #define TARGET_EXTRA_LIVE_ON_ENTRY rs6000_live_on_entry
 
@@ -26111,6 +26128,201 @@ rs6000_global_entry_point_needed_p (void)
   return cfun->machine->r2_setup_needed;
 }
 
+/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS.  */
+static sbitmap
+rs6000_get_separate_concerns (void)
+{
+  rs6000_stack_t *info = rs6000_stack_info ();
+
+  if (!(info->savres_strategy & SAVE_INLINE_GPRS)
+  || !(info->savres_strategy & REST_INLINE_GPRS)
+  || WORLD_SAVE_P (info))
+return NULL;
+
+  sbitmap concerns = sbitmap_alloc (32);
+  bitmap_clear (concerns);
+
+  /* The GPRs we need saved to the frame.  */
+  int reg_size = TARGET_32BIT ? 4 : 8;
+  int offset = info->gp_save_offset;
+  if (info->push_p)
+offset += info->total_size;
+
+  for (unsigned regno = info->first_gp_reg_save; regno < 32; regno++)
+{
+  if (IN_RANGE (offset, -0x8000, 0x7fff)
+ && rs6000_reg_live_or_pic_offset_p (regno))
+   bitmap_set_bit (concerns, regno);
+
+  offset += reg_size;
+}
+
+  /* Don't mess with the hard frame pointer.  */
+  if (frame_pointer_needed)
+bitmap_clear_bit (concerns, HARD_FRAME_POINTER_REGNUM);
+
+  /* Don't mess with the fixed TOC register.  */
+  if ((TARGET_TOC && TARGET_MINIMAL_TOC)
+  || (flag_pic == 1 && DEFAULT_ABI == ABI_V4)
+  || (flag_pic && DEFAULT_ABI == ABI_DARWIN))
+bitmap_clear_bit (concerns, RS6000_PIC_OFFSET_TABLE_REGNUM);
+
+  /* Optimize LR save and restore if we can.  This is concern 0.  */
+  if (info->lr_save_p
+  && !(flag_pic && (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)))
+{
+  offset = info->lr_save_offset;
+  if (info->push_p)
+   offset += info->total_size;
+  

[PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns

2016-06-07 Thread Segher Boessenkool
This is the main substance of this patch series.  The big comment before
the new code says:

/* Separate shrink-wrapping

   Instead of putting all of the prologue and epilogue in one spot, we
   can put parts of it in places that are executed less frequently.  The
   following code does this, for concerns that can have more than one
   prologue and epilogue, and where those pro- and epilogues can be
   executed more than once.

   What exactly is a concern is target-dependent.  The more usual ones
   are simple saves of callee-saved registers to the frame.  This code
   treats it abstractly (as an sbitmap of concerns), letting the target
   handle all details.

   Prologue concerns are placed in such a way that they are executed as
   infrequently as possible.  Epilogue concerns are put everywhere where
   there is an edge from a bb dominated by such a prologue concern to a
   bb not dominated by one.

   Prologues and epilogues are preferably placed into a block, either at
   the beginning or end of it, if it is needed for all predecessor resp.
   successor edges; or placed on the edge otherwise.

   If the placement of any prologue/epilogue leads to a situation we cannot
   handle (for example, an abnormal edge would need to be split, or some
   targets want to use some specific registers that may not be available
   where we want to put them), separate shrink-wrapping for that concern
   is aborted.  */


2016-06-07  Segher Boessenkool  

* function.c (thread_prologue_and_epilogue_insns): Recompute the
live info.  Call try_shrink_wrapping_separate.  Compute the
prologue_seq afterwards, if it has possibly changed.  Compute the
split_prologue_seq and epilogue_seq later, too.
* shrink-wrap.c: #include cfgbuild.h.
(dump_concern): New function.
(struct sw): New struct.
(SW): New function.
(init_separate_shrink_wrap): New function.
(fini_separate_shrink_wrap): New function.
(place_prologue_for_one_concern): New function.
(spread_concerns): New function.
(disqualify_problematic_concerns): New function.
(do_common_heads_for_concerns): New function.
(do_common_tails_for_concerns): New function.
(insert_prologue_epilogue_for_concerns): New function.
(try_shrink_wrapping_separate): New function.
* shrink-wrap.h: Declare try_shrink_wrapping_separate.

---
 gcc/function.c|  15 +-
 gcc/shrink-wrap.c | 647 ++
 gcc/shrink-wrap.h |   1 +
 3 files changed, 661 insertions(+), 2 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index c15d47d..cc5fa6b 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5909,6 +5909,12 @@ make_epilogue_seq (void)
 void
 thread_prologue_and_epilogue_insns (void)
 {
+  if (optimize > 1)
+{
+  df_live_add_problem ();
+  df_live_set_all_dirty ();
+}
+
   df_analyze ();
 
   /* Can't deal with multiple successors of the entry block at the
@@ -5919,9 +5925,7 @@ thread_prologue_and_epilogue_insns (void)
   edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   edge orig_entry_edge = entry_edge;
 
-  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
   rtx_insn *prologue_seq = make_prologue_seq ();
-  rtx_insn *epilogue_seq = make_epilogue_seq ();
 
   /* Try to perform a kind of shrink-wrapping, making sure the
  prologue/epilogue is emitted only around those parts of the
@@ -5929,6 +5933,13 @@ thread_prologue_and_epilogue_insns (void)
 
   try_shrink_wrapping (&entry_edge, prologue_seq);
 
+  df_analyze ();
+  try_shrink_wrapping_separate (entry_edge->dest);
+  if (crtl->shrink_wrapped_separate)
+prologue_seq = make_prologue_seq ();
+
+  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
+  rtx_insn *epilogue_seq = make_epilogue_seq ();
 
   rtl_profile_for_bb (EXIT_BLOCK_PTR_FOR_FN (cfun));
 
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b85b1c3..6e157d0 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "tree-pass.h"
 #include "cfgrtl.h"
+#include "cfgbuild.h"
 #include "params.h"
 #include "bb-reorder.h"
 #include "shrink-wrap.h"
@@ -1006,3 +1007,649 @@ try_shrink_wrapping (edge *entry_edge, rtx_insn 
*prologue_seq)
   BITMAP_FREE (bb_with);
   free_dominance_info (CDI_DOMINATORS);
 }
+
+/* Separate shrink-wrapping
+
+   Instead of putting all of the prologue and epilogue in one spot, we
+   can put parts of it in places that are executed less frequently.  The
+   following code does this, for concerns that can have more than one
+   prologue and epilogue, and where those pro- and epilogues can be
+   executed more than once.
+
+   What exactly is a concern is target-dependent.  The more usual ones
+   are simple saves of callee-saved registers to the frame.  This code
+   treats it abstractly (as an sbitmap of concer

[PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped

2016-06-07 Thread Segher Boessenkool
Unfortunately even after the previous patch there are still a few cases
where regrename creates invalid code when used together with separate
shrink-wrapping.  I haven't found the cause of those.  This patch disables
regrename for functions that were separately shrink-wrapped.


2016-06-07  Segher Boessenkool  

* regrename.c (gate): Disable regrename if shrink_wrapped_separate
is set in crtl.

---
 gcc/regrename.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 00a5d02..5740b1a 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1956,7 +1956,11 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
 {
-  return (optimize > 0 && (flag_rename_registers));
+  /* regrename creates wrong code for exception handling, if used together
+ with separate shrink-wrapping.  Disable for now, until we have
+figured out what exactly is going on.  */
+  return (optimize > 0 && flag_rename_registers
+ && !crtl->shrink_wrapped_separate);
 }
 
   virtual unsigned int execute (function *) { return regrename_optimize (); }
-- 
1.9.3



[PATCH 6/9] sel-sched: Don't mess with register restores

2016-06-07 Thread Segher Boessenkool
If selective scheduling copies register restores it confuses dwarf2cfi.


2016-06-07  Segher Boessenkool  

* sel-sched-ir.c (init_global_and_expr_for_insn): Don't copy
instructions with a REG_CFA_RESTORE note.

---
 gcc/sel-sched-ir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 83f813a..4a3984a 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3015,6 +3015,7 @@ init_global_and_expr_for_insn (insn_t insn)
   /* TRAP_IF though have an INSN code is control_flow_insn_p ().  */
   || control_flow_insn_p (insn)
   || volatile_insn_p (PATTERN (insn))
+ || find_reg_note (insn, REG_CFA_RESTORE, NULL)
   || (targetm.cannot_copy_insn_p
   && targetm.cannot_copy_insn_p (insn)))
 force_unique_p = true;
-- 
1.9.3



[PATCH 7/9] cprop: Leave RTX_FRAME_RELATED_P instructions alone

2016-06-07 Thread Segher Boessenkool
Doing cprop on frame-related instructions blows up spectacularly.
So don't.


2016-06-07  Segher Boessenkool  

* regcprop.c (copyprop_hardreg_forward_1): Don't change
RTX_FRAME_RELATED_P instructions.

---
 gcc/regcprop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 933cc8a..0c01aab 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -829,6 +829,10 @@ copyprop_hardreg_forward_1 (basic_block bb, struct 
value_data *vd)
}
}
 
+  /* Don't change prologue instructions.  */
+  if (RTX_FRAME_RELATED_P (insn))
+   set = NULL;
+
   /* Special-case plain move instructions, since we may well
 be able to do the move from a different register class.  */
   if (set && REG_P (SET_SRC (set)))
-- 
1.9.3



[PATCH 4/9] regrename: Don't rename restores

2016-06-07 Thread Segher Boessenkool
A restore is supposed to restore some certain register.  Restoring it
into some other register will not work.  Don't.


2016-06-07  Segher Boessenkool  

* regrename.c (build_def_use): Invalidate chains that have a
REG_CFA_RESTORE on some instruction.

---
 gcc/regrename.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 54c7768..00a5d02 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1867,6 +1867,12 @@ build_def_use (basic_block bb)
scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead,
  OP_IN);
  }
+
+ /* Step 8: Kill the chains involving register restores.  Those
+should restore _that_ register.  */
+ for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
+   if (REG_NOTE_KIND (note) == REG_CFA_RESTORE)
+ scan_rtx (insn, &XEXP (note, 0), NO_REGS, mark_all_read, OP_IN);
}
   else if (DEBUG_INSN_P (insn)
   && !VAR_LOC_UNKNOWN_P (INSN_VAR_LOCATION_LOC (insn)))
-- 
1.9.3



[PATCH 1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc

2016-06-07 Thread Segher Boessenkool
This patch adds a new command-line flag "-fshrink-wrap-separate", a status
flag "shrink_wrapped_separate", hooks for abstracting the target concerns,
and documentation for all those.


2016-06-07  Segher Boessenkool  

* common.opt (-fshrink-wrap-separate): New flag.
* doc/invoke.texi: Document it.
* doc/tm.texi.in (Shrink-wrapping separate concerns): New subsection.
* doc/tm.texi: Regenerate.
* emit-rtl.h (struct rtl_data): New field shrink_wrapped_separate.
* target.def (shrink_wrap): New hook vector.
(get_separate_concerns, concerns_for_bb, disqualify_concerns,
emit_prologue_concerns, emit_epilogue_concerns, set_handled_concerns):
New hooks.

---
 gcc/common.opt  |  4 
 gcc/doc/invoke.texi | 11 ++-
 gcc/doc/tm.texi | 53 ++
 gcc/doc/tm.texi.in  | 29 +++
 gcc/emit-rtl.h  |  4 
 gcc/target.def  | 56 +
 6 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 2bb576c..b00f538 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2153,6 +2153,10 @@ Common Report Var(flag_shrink_wrap) Optimization
 Emit function prologues only before parts of the function that need it,
 rather than at the top of the function.
 
+fshrink-wrap-separate
+Common Report Var(flag_shrink_wrap_separate) Init(1) Optimization
+Shrink-wrap parts of the prologue and epilogue separately.
+
 fsignaling-nans
 Common Report Var(flag_signaling_nans) Optimization SetByCombined
 Disable optimizations observable by IEEE signaling NaNs.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e3a2399..21df5c5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -394,7 +394,8 @@ Objective-C and Objective-C++ Dialects}.
 -fschedule-insns -fschedule-insns2 -fsection-anchors @gol
 -fselective-scheduling -fselective-scheduling2 @gol
 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
--fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
+-fsemantic-interposition -fshrink-wrap -fshrink-wrap-separate @gol
+-fsignaling-nans @gol
 -fsingle-precision-constant -fsplit-ivs-in-unroller @gol
 -fsplit-paths @gol
 -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
@@ -6263,6 +6264,7 @@ compilation time.
 -fmove-loop-invariants @gol
 -freorder-blocks @gol
 -fshrink-wrap @gol
+-fshrink-wrap-separate @gol
 -fsplit-wide-types @gol
 -fssa-backprop @gol
 -fssa-phiopt @gol
@@ -7180,6 +7182,13 @@ Emit function prologues only before parts of the 
function that need it,
 rather than at the top of the function.  This flag is enabled by default at
 @option{-O} and higher.
 
+@item -fshrink-wrap-separate
+@opindex fshrink-wrap-separate
+Shrink-wrap separate parts of the prologue and epilogue separately, so that
+those parts are only executed when needed.
+This option is on by default, but has no effect unless @option{-fshrink-wrap}
+is also turned on.
+
 @item -fcaller-saves
 @opindex fcaller-saves
 Enable allocation of values to registers that are clobbered by
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b318615..c326599 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2924,6 +2924,7 @@ This describes the stack layout and calling conventions.
 * Function Entry::
 * Profiling::
 * Tail Calls::
+* Shrink-wrapping separate concerns::
 * Stack Smashing Protection::
 * Miscellaneous Register Hooks::
 @end menu
@@ -4852,6 +4853,58 @@ This hook should add additional registers that are 
computed by the prologue to t
 True if a function's return statements should be checked for matching the 
function's return type.  This includes checking for falling off the end of a 
non-void function.  Return false if no such check should be made.
 @end deftypefn
 
+@node Shrink-wrapping separate concerns
+@subsection Shrink-wrapping separate concerns
+@cindex shrink-wrapping separate concerns
+
+The prologue does a lot of separate things: save callee-saved registers,
+do whatever needs to be done to be able to call things (save the return
+address, align the stack, whatever; different for each target), set up a
+stack frame, do whatever needs to be done for the static chain (if anything),
+set up registers for PIC, etc.  Using the following hooks those concerns
+can be shrink-wrapped separately, so that the initialization (and possibly
+teardown) for those concerns is not done on execution paths where it is
+unnecessary.
+
+What exactly those concerns are is up to the target code; the generic
+code treats them abstractly.
+
+@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS 
(void)
+This hook should return an @code{sbitmap} with the bits set for those
+concerns that can be separately shrink-wrapped in the current function.
+Return @code{NULL} if the current function should not get any separate
+shrink-wrapping.
+Don't define this hook if it would always retur

[PATCH 3/9] dce: Don't dead-code delete separately wrapped restores

2016-06-07 Thread Segher Boessenkool
Deleting restores (before a noreturn) that are dead confuses dwarf2cfi.


2016-06-07  Segher Boessenkool  

* dce.c (delete_unmarked_insns): Don't delete instructions with
a REG_CFA_RESTORE note.

---
 gcc/dce.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/dce.c b/gcc/dce.c
index ea3fb00..d510287 100644
--- a/gcc/dce.c
+++ b/gcc/dce.c
@@ -587,6 +587,15 @@ delete_unmarked_insns (void)
  if (!dbg_cnt (dce))
continue;
 
+ if (crtl->shrink_wrapped_separate
+ && find_reg_note (insn, REG_CFA_RESTORE, NULL))
+   {
+ if (dump_file)
+   fprintf (dump_file, "DCE: NOT deleting insn %d, it's a "
+   "callee-save restore\n", INSN_UID (insn));
+ continue;
+   }
+
  if (dump_file)
fprintf (dump_file, "DCE: Deleting insn %d\n", INSN_UID (insn));
 
-- 
1.9.3



[PATCH 2/9] cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate

2016-06-07 Thread Segher Boessenkool
cfgcleanup would try to join noreturn paths with different concerns handled.
This then fails in dwarf2cfi.


2016-06-07  Segher Boessenkool  

* cfgcleanup.c (outgoing_edges_match): Don't join noreturn calls
if shrink_wrapped_separate.

---
 gcc/cfgcleanup.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 023b9d2..f08e4ee 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -1824,6 +1824,11 @@ outgoing_edges_match (int mode, basic_block bb1, 
basic_block bb2)
   || !find_reg_note (last1, REG_ARGS_SIZE, NULL)))
 return false;
 
+  /* If shrink-wrapping separate concerns, joining noreturn calls that
+ have different concerns set up will confuse dwarf2cfi.  */
+  if (!nonfakeedges && crtl->shrink_wrapped_separate)
+return false;
+
   /* fallthru edges must be forwarded to the same destination.  */
   if (fallthru1)
 {
-- 
1.9.3



[PATCH 0/9] separate shrink-wrapping

2016-06-07 Thread Segher Boessenkool
This patch series introduces separate shrink-wrapping.

There are many things the prologue/epilogue of a function do, and most of
those things can be done independently.  For example, most of the time,
for many targets, the save of callee-saved registers can be done later
than the "main" prologue.

Doing so helps quite a bit because the prologue is expensive for functions
that do not need everything it does done for every path through the
function; often, the hot paths do not need much at all, e.g. not those
things the prologue needs to do for the function to call other functions.

The first patch creates a command-line flag, some hooks, a status flag
("is this function wrapped separately", used by later passes), and
documentation for these things.

The next six patches are to prevent later passes from mishandling the
epilogue instructions that now appear before the epilogue: mostly, you
cannot do much to instructions with a REG_CFA_RESTORE note without
confusing dwarf2cfi.  The cprop one is for prologue instructions.

Then, the main patch.  And finally a patch for PowerPC that implements
separate wrapping for GPRs and LR.

Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on
powerpc64le-linux.  Previous versions of this series also tested on
x86_64-linux.

Is this okay for trunk?


Segher


Segher Boessenkool (9):
  separate shrink-wrap: New command-line flag, status flag, hooks, and doc
  cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
  dce: Don't dead-code delete separately wrapped restores
  regrename: Don't rename restores
  regrename: Don't run if function was separately shrink-wrapped
  sel-sched: Don't mess with register restores
  cprop: Leave RTX_FRAME_RELATED_P instructions alone
  shrink-wrap: shrink-wrapping for separate concerns
  rs6000: Separate shrink-wrapping

 gcc/cfgcleanup.c   |   5 +
 gcc/common.opt |   4 +
 gcc/config/rs6000/rs6000.c | 257 --
 gcc/dce.c  |   9 +
 gcc/doc/invoke.texi|  11 +-
 gcc/doc/tm.texi|  53 
 gcc/doc/tm.texi.in |  29 ++
 gcc/emit-rtl.h |   4 +
 gcc/function.c |  15 +-
 gcc/regcprop.c |   3 +
 gcc/regrename.c|  12 +-
 gcc/sel-sched-ir.c |   1 +
 gcc/shrink-wrap.c  | 647 +
 gcc/shrink-wrap.h  |   1 +
 gcc/target.def |  56 
 15 files changed, 1088 insertions(+), 19 deletions(-)

-- 
1.9.3



[PATCH] Add selftest for pretty-print.c (v2)

2016-06-07 Thread David Malcolm
On Tue, 2016-06-07 at 12:02 +0200, Bernd Schmidt wrote:
> On 06/06/2016 11:28 PM, David Malcolm wrote:
> > +  assert_pp_format ("0xcafebabe", "%wx",
> > (HOST_WIDE_INT)0xcafebabe);
> 
> More interesting tests would be to have multiple arguments to test
> that
> we really used the right size for the varargs. Maybe append a single
> %d
> arg with a unique bit pattern to all of them?
> 
> Otherwise ok.

Good idea.  In the following I did it by adding 0x12345678 as a
successor argument to each test.  I chose that bit pattern on the
grounds that each nybble is unique and non-zero.
I printed them with %x to make it easier (I hope) to track down
problems.

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* pretty-print.c: Include "selftest.h".
(pp_format): Fix comment.
(identifier_to_locale): Likewise.
(selftest::test_basic_printing): New function.
(selftest::assert_pp_format): New function.
(selftest::test_pp_format): New function.
(selftest::pretty_print_c_tests): New function.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::pretty_print_c_tests.
* selftest.h (pretty_print_c_tests): New declaration.
---
 gcc/pretty-print.c   | 164 ++-
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 3 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index cc2b8cc..d805da4 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "pretty-print.h"
 #include "diagnostic-color.h"
+#include "selftest.h"
 
 #if HAVE_ICONV
 #include 
@@ -304,7 +305,7 @@ pp_indent (pretty_printer *pp)
 
 /* Formatting phases 1 and 2: render TEXT->format_spec plus
TEXT->args_ptr into a series of chunks in pp_buffer (PP)->args[].
-   Phase 3 is in pp_format_text.  */
+   Phase 3 is in pp_output_formatted_text.  */
 
 void
 pp_format (pretty_printer *pp, text_info *text)
@@ -1203,3 +1204,164 @@ identifier_to_locale (const char *ident)
 return ret;
   }
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Smoketest for pretty_printer.  */
+
+static void
+test_basic_printing ()
+{
+  pretty_printer pp;
+  pp_string (&pp, "hello");
+  pp_space (&pp);
+  pp_string (&pp, "world");
+
+  ASSERT_STREQ ("hello world", pp_formatted_text (&pp));
+}
+
+/* Helper function for testing pp_format.
+   Verify that pp_format (FMT, ...) followed by pp_output_formatted_text
+   prints EXPECTED, assuming that pp_show_color is SHOW_COLOR.  */
+
+static void
+assert_pp_format_va (const char *expected, bool show_color, const char *fmt,
+va_list *ap)
+{
+  pretty_printer pp;
+  text_info ti;
+  rich_location rich_loc (line_table, UNKNOWN_LOCATION);
+
+  ti.format_spec = fmt;
+  ti.args_ptr = ap;
+  ti.err_no = 0;
+  ti.x_data = NULL;
+  ti.m_richloc = &rich_loc;
+
+  pp_show_color (&pp) = show_color;
+  pp_format (&pp, &ti);
+  pp_output_formatted_text (&pp);
+  ASSERT_STREQ (expected, pp_formatted_text (&pp));
+}
+
+/* Verify that pp_format (FMT, ...) followed by pp_output_formatted_text
+   prints EXPECTED, with show_color disabled.  */
+
+static void
+assert_pp_format (const char *expected, const char *fmt, ...)
+{
+  va_list ap;
+
+  va_start (ap, fmt);
+  assert_pp_format_va (expected, false, fmt, &ap);
+  va_end (ap);
+}
+
+/* As above, but with colorization enabled.  */
+
+static void
+assert_pp_format_colored (const char *expected, const char *fmt, ...)
+{
+  va_list ap;
+
+  va_start (ap, fmt);
+  assert_pp_format_va (expected, true, fmt, &ap);
+  va_end (ap);
+}
+
+/* Verify that pp_format works, for various format codes.  */
+
+static void
+test_pp_format ()
+{
+  /* Avoid introducing locale-specific differences in the results
+ by hardcoding open_quote and close_quote.  */
+  const char *old_open_quote = open_quote;
+  const char *old_close_quote = close_quote;
+  open_quote = "`";
+  close_quote = "'";
+
+  /* Verify that plain text is passed through unchanged.  */
+  assert_pp_format ("unformatted", "unformatted");
+
+  /* Verify various individual format codes, in the order listed in the
+ comment for pp_format above.  For each code, we append a second
+ argument with a known bit pattern (0x12345678), to ensure that we
+ are consuming arguments correctly.  */
+  assert_pp_format ("-27 12345678", "%d %x", -27, 0x12345678);
+  assert_pp_format ("-5 12345678", "%i %x", -5, 0x12345678);
+  assert_pp_format ("10 12345678", "%u %x", 10, 0x12345678);
+  assert_pp_format ("17 12345678", "%o %x", 15, 0x12345678);
+  assert_pp_format ("cafebabe 12345678", "%x %x", 0xcafebabe, 0x12345678);
+  assert_pp_format ("-27 12345678", "%ld %x", (long)-27, 0x12345678);
+  assert_pp_format ("-5 12345678", "%li %x", (long)-5, 0x12345678);
+  assert_pp_format ("10 12345678", "%lu %x", (long)10, 0x123456

Re: [PATCH] Selftest framework (v7)

2016-06-07 Thread Trevor Saunders
On Tue, Jun 07, 2016 at 10:18:32AM -0400, David Malcolm wrote:
> On Mon, 2016-06-06 at 22:14 -0400, Trevor Saunders wrote:
> > On Mon, Jun 06, 2016 at 11:57:49PM +0200, Jakub Jelinek wrote:
> > > On Mon, Jun 06, 2016 at 05:53:50PM -0400, Trevor Saunders wrote:
> > > > > > As far as I can 
> > > > > > tell this just involves moving the start of namespace
> > > > > > selftest
> > > > > > upwards a 
> > > > > > bit in the files where we have tests.
> > > > > 
> > > > > Yes, and it does seem cleaner to have all of the selftest code
> > > > > start
> > > > > like this:
> > > > > 
> > > > >   #if CHECKING_P
> > > > 
> > > > What are we gaining by ifdefing this? I would think on reasonable
> > > > systems the compiler would optimize out the call to the selftests
> > > > in
> > > > release builds and then the linker would gc all the unused
> > > > functions.
> > > > Do we really care about code size in places that doesn't happen
> > > > enough
> > > > to go through this?
> > > 
> > > Not everyone is building the compiler with LTO, and if you don't,
> > > then
> > > how would you optimize that away?
> > 
> > -ffunction-sections -Wl,--gc-sections should be enough I think.  I
> > guess
> > we don't use those at the moment though.
> > 
> > > And yes, not having the self-tests, especially if they are going to
> > > grow
> > > further, in release compilers is desirable, especially if it would
> > > be
> > > intermixed with hot code.
> > 
> > That's fair, though turning on --gc-sections where we can should
> > further
> > help with that, and that should be more effective with
> > -ffunction-sections -fdata-sections, so its seems to me like the
> > right
> > thing to do is add configure tests to enable those?  And then its
> > more
> > of a non issue?
> 
> I appreciate that you'd done a lot of work on eliminating preprocessor
> use in gcc, and that we'd prefer to minimize the amount of #if code we
> have - though it's relatively easy to test the with/without #if
> CHECKING_P case (compared to all of the various target-specific
> macros).

yeah, I certainly agree CHECKING_P is one of the more defensable macros.

> Historically gcc testing has largely been "black box" testing: run the
> built programs with specific inputs and look for specific outputs. My
> hope with -fself-tests is that we can build up our "white box" test
> coverage to complement the above, with unit tests.  My favorite example
> is the testing for gengtype that I posted as a followup, which gives us
> some immediate test coverage that gengtype is working as expected,
> which is hard to do using our traditional approach to testing.

I'd probably say testing hash tables / vec is more useful, but whatever
;)

> I hope that we can add a lot more tests to -fself-test, in particular,
> I want us to have unit tests for hot code, including code that's hidden
> deep inside the implementation and that might normally get inlined
> away.  To play Devil's Advocate: if we find we're able to do a release
> build with the selftests enabled and that it isn't slowing down the
> release build, does that imply we need more unit tests for our hottest
> code?  (the counterargument being that the checked build still needs to
> run in a bearable amount of time).

by enabled do you mean we build the code or actually run the tests on
every compile?  I would think basically any tests are two slow for the
latter.  For the former I would say building the self test code should
have very little effect on performance since its all separate functions.
There's some of course because of call sites effecting inlining etc, but
we can probably minimize that with -ffunction-sections --gc-sections and
attribute cold?

> That said, I want -fself-test to always run quickly (e.g. less than a
> second); let's not put anything slow in there.  Also, any unit tests
> involving analyzing several gimple or RTL statements at once seem to be
> easier to do via the gimple and RTL frontends that Prasad and I are
> working on respectively.  (I think we can use -fself-test for unit
> -testing implementation details of passes that involve one statement at
> a time, but as soon as we start dealing with multiple statements and
> control flow, that it's probably better to express it using a
> gimple/RTL dump in DejaGnu form).

That makes sense.

Trev

> 
> 
> Hope the above sounds sane
> Dave


Re: [C++ Patch] Fix some simple location issues

2016-06-07 Thread Paolo Carlini

Hi,

On 07/06/2016 21:40, Jason Merrill wrote:
For diagnostics about an initializer, I think it would be better to 
give the location of the initializer rather than the declaration.  
Maybe use


location_t loc = EXPR_LOC_OR_LOC (init, DECL_SOURCE_LOCATION (decl));
Now that you are telling me this, I think we already briefly discussed 
it in the past... Anyway, I investigated it a little bit a couple of 
days ago and noticed that, in fact, other front-ends differ about many 
of those details, even when normally they have accurate locations. In 
some cases, for example, clang outputs the primary caret under the 
declaration and only the secondary wiggly line under the initializer, 
but only in some cases (see, for example, g++.dg/init/brace6.C, both in 
c++98 and c++11).


Anyway, in practice, for this first round of changes, I tried your 
suggestion above in 4 places, in maybe_deduce_size_from_array_init and 
in check_initializer, but in practice doesn't currently make a 
difference, at least for the testcases, for lack of usable locations. 
Certainly could in the future! The below still passes testing...


Thanks,
Paolo.

/
Index: cp/decl.c
===
--- cp/decl.c   (revision 237171)
+++ cp/decl.c   (working copy)
@@ -1393,7 +1393,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 {
   if (DECL_INITIAL (olddecl))
inform (DECL_SOURCE_LOCATION (olddecl),
-   "previous definition of %q+D was here", olddecl);
+   "previous definition of %qD was here", olddecl);
   else
inform (DECL_SOURCE_LOCATION (olddecl),
"previous declaration of %qD was here", olddecl);
@@ -5266,13 +5266,16 @@ maybe_deduce_size_from_array_init (tree decl, tree
do_default);
  if (failure == 1)
{
- error ("initializer fails to determine size of %qD", decl);
+ error_at (EXPR_LOC_OR_LOC (initializer,
+DECL_SOURCE_LOCATION (decl)),
+   "initializer fails to determine size of %qD", decl);
}
  else if (failure == 2)
{
  if (do_default)
{
- error ("array size missing in %qD", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "array size missing in %qD", decl);
}
  /* If a `static' var's size isn't known, make it extern as
 well as static, so it does not get allocated.  If it's not
@@ -5283,7 +5286,8 @@ maybe_deduce_size_from_array_init (tree decl, tree
}
  else if (failure == 3)
{
- error ("zero-size array %qD", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "zero-size array %qD", decl);
}
}
 
@@ -5322,7 +5326,8 @@ layout_var_decl (tree decl)
   /* An automatic variable with an incomplete type: that is an error.
 Don't talk about array types here, since we took care of that
 message in grokdeclarator.  */
-  error ("storage size of %qD isn%'t known", decl);
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "storage size of %qD isn%'t known", decl);
   TREE_TYPE (decl) = error_mark_node;
 }
 #if 0
@@ -5345,7 +5350,8 @@ layout_var_decl (tree decl)
constant_expression_warning (DECL_SIZE (decl));
   else
{
- error ("storage size of %qD isn%'t constant", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "storage size of %qD isn%'t constant", decl);
  TREE_TYPE (decl) = error_mark_node;
}
 }
@@ -5954,7 +5960,8 @@ check_array_initializer (tree decl, tree type, tre
   if (!COMPLETE_TYPE_P (complete_type (element_type)))
 {
   if (decl)
-   error ("elements of array %q#D have incomplete type", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "elements of array %q#D have incomplete type", decl);
   else
error ("elements of array %q#T have incomplete type", type);
   return true;
@@ -6018,7 +6025,8 @@ check_initializer (tree decl, tree init, int flags
 }
   else if (!COMPLETE_TYPE_P (type))
 {
-  error ("%q#D has incomplete type", decl);
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "%q#D has incomplete type", decl);
   TREE_TYPE (decl) = error_mark_node;
   return NULL_TREE;
 }
@@ -6038,8 +6046,9 @@ check_initializer (tree decl, tree init, int flags
}
  else if (init_len != 1 && TREE_CODE (type) != COMPLEX_TYPE)
{
- error ("scalar object %qD requires one element in initializer",
-decl);
+ error_at (EXPR_LOC_OR_LOC (init, DECL_SOURCE_LOCATION (decl)),
+   "scalar object %qD requires one element in 

Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-07 Thread Gerald Pfeifer
On Tue, 7 Jun 2016, Alan Hayward wrote:
> testsuite/
>   * gcc.dg/vect/vect-live-2.c : Likewise
>   * gcc.dg/vect/vect-live-slp-3.c : Update effective target
>   * lib/target-supports.exp : Add aarch64 to vect_long

"Likewise" entries usually follow those they refer to (i.e., 
while ChangeLog files are "latest first", any individual entry
is read "top-down").

And no space after the filename, and a full stop at the end of
each entry starting with a "*".

Gerald


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Martin Sebor

+The built-in functions promote the first two operands into infinite precision 
signed type
+and perform addition on those promoted operands. The result is then
+cast to the type the third argument.


The above is missing an "of" (it should read "type of the third
argument".)

Martin



Re: [PATCH] Document POWER's -mhtm and -mno-htm options.

2016-06-07 Thread Peter Bergner

On 6/7/16 12:49 PM, David Edelsohn wrote:

On Tue, Jun 7, 2016 at 1:18 PM, Peter Bergner  wrote:

It seems we (ok, me) forgot to document the -mhtm option for POWER.
This bootstrapped fine and the generated docs looked good.
Is this ok for trunk and the open release branches?

Peter


* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mhtm and
-mno-htm.


Okay.


Ok, committed everywhere.  Thanks!

Peter





Re: C PATCH for c/71418 (ICE with bogus array dimension and _Alignas)

2016-06-07 Thread Joseph Myers
On Tue, 7 Jun 2016, Marek Polacek wrote:

> Another error-recovery bug.  This time TYPE was error_mark_node but
> min_align_of_type accessed it via TYPE_ALIGN.  So we better check that we're
> operating on a type.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C PATCH for c/71426 (ICE with bogus parameter)

2016-06-07 Thread Joseph Myers
On Tue, 7 Jun 2016, Marek Polacek wrote:

> cc1 crashed on the following invalid code, because it tripped this assert.
> The reason why b->nested was false for the FUNCTION_DECL 'x' was that when
> doing pushdecl, we found an incompatible duplicate, so pushdecl just bound
> 'x' to FUNCTION_DECL 'x', replacing the old binding, without the TREE_PUBLIC
> test that sets 'nested'.
> 
> At first I thought about removing the assert but in the end I decided to add
> || seen_error -- for invalid code we might do the same as for ERROR_MARK.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 02:30:06PM -0600, Martin Sebor wrote:
> I've also since discovered that the handling of (at least)
> BUILTIN_LINE is important in C++ 98 mode and if I had run
> the c-c++-common tests instead of just the strict C++ subset
> I would have seen the c-c++-common/builtin_location.c test
> fail with the same error.
> 
> I suspect I added the other built-ins to the function to get
> the Clang-compatibility typed built-ins to work in C++ 98
> (with the default argument).  When it was decided that
> the Clang built-ins shouldn't change I removed the tests
> but not this change.

Here is the updated patch, with the extra testcase (the previous C++ only
testcases were only c++11 and c++14 effective targets) and
the BUILTIN_{ADD,SUB,MUL}_OVERFLOW_P kept in cp/tree.c.

Tested on x86_64-linux, ok for trunk?

2016-06-07  Martin Sebor  
Jakub Jelinek  

PR c++/70507
PR c/68120
* builtins.def (BUILT_IN_ADD_OVERFLOW_P, BUILT_IN_SUB_OVERFLOW_P,
BUILT_IN_MUL_OVERFLOW_P): New builtins.
* builtins.c: Include gimple-fold.h.
(fold_builtin_arith_overflow): Handle
BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
(fold_builtin_3): Likewise.
* doc/extend.texi (Integer Overflow Builtins): Document
__builtin_{add,sub,mul}_overflow_p.
gcc/c/
* c-typeck.c (convert_arguments): Don't promote last argument
of BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
gcc/cp/
* constexpr.c: Include gimple-fold.h.
(cxx_eval_internal_function): New function.
(cxx_eval_call_expression): Call it.
(potential_constant_expression_1): Handle integer arithmetic
overflow built-ins.
* tree.c (builtin_valid_in_constant_expr_p): Handle
BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
gcc/c-family/
* c-common.c (check_builtin_function_arguments): Handle
BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
gcc/testsuite/
* c-c++-common/builtin-arith-overflow-1.c: Add test cases.
* c-c++-common/builtin-arith-overflow-2.c: New test.
* g++.dg/ext/builtin-arith-overflow-1.C: New test.
* g++.dg/cpp0x/constexpr-arith-overflow.C: New test.
* g++.dg/cpp1y/constexpr-arith-overflow.C: New test.

--- gcc/builtins.def.jj 2016-06-06 14:40:35.619347198 +0200
+++ gcc/builtins.def2016-06-07 17:46:52.034206039 +0200
@@ -710,6 +710,9 @@ DEF_C94_BUILTIN(BUILT_IN_TOWUPPE
 DEF_GCC_BUILTIN(BUILT_IN_ADD_OVERFLOW, "add_overflow", BT_FN_BOOL_VAR, 
ATTR_NOTHROW_TYPEGENERIC_LEAF)
 DEF_GCC_BUILTIN(BUILT_IN_SUB_OVERFLOW, "sub_overflow", BT_FN_BOOL_VAR, 
ATTR_NOTHROW_TYPEGENERIC_LEAF)
 DEF_GCC_BUILTIN(BUILT_IN_MUL_OVERFLOW, "mul_overflow", BT_FN_BOOL_VAR, 
ATTR_NOTHROW_TYPEGENERIC_LEAF)
+DEF_GCC_BUILTIN(BUILT_IN_ADD_OVERFLOW_P, "add_overflow_p", 
BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF)
+DEF_GCC_BUILTIN(BUILT_IN_SUB_OVERFLOW_P, "sub_overflow_p", 
BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF)
+DEF_GCC_BUILTIN(BUILT_IN_MUL_OVERFLOW_P, "mul_overflow_p", 
BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF)
 /* Clang compatibility.  */
 DEF_GCC_BUILTIN(BUILT_IN_SADD_OVERFLOW, "sadd_overflow", 
BT_FN_BOOL_INT_INT_INTPTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_SADDL_OVERFLOW, "saddl_overflow", 
BT_FN_BOOL_LONG_LONG_LONGPTR, ATTR_NOTHROW_LEAF_LIST)
--- gcc/builtins.c.jj   2016-06-06 14:40:35.601347427 +0200
+++ gcc/builtins.c  2016-06-07 17:46:51.958207014 +0200
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.
 #include "rtl-chkp.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "gimple-fold.h"
 
 
 struct target_builtins default_target_builtins;
@@ -7943,18 +7944,28 @@ fold_builtin_unordered_cmp (location_t l
 /* Fold __builtin_{,s,u}{add,sub,mul}{,l,ll}_overflow, either into normal
arithmetics if it can never overflow, or into internal functions that
return both result of arithmetics and overflowed boolean flag in
-   a complex integer result, or some other check for overflow.  */
+   a complex integer result, or some other check for overflow.
+   Similarly fold __builtin_{add,sub,mul}_overflow_p to just the overflow
+   checking part of that.  */
 
 static tree
 fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
 tree arg0, tree arg1, tree arg2)
 {
   enum internal_fn ifn = IFN_LAST;
-  tree type = TREE_TYPE (TREE_TYPE (arg2));
-  tree mem_arg2 = build_fold_indirect_ref_loc (loc, arg2);
+  /* The code of the expression corresponding to the type-generic
+ built-in, or ERROR_MARK for the type-specific ones.  */
+  enum tree_code opcode = ERROR_MARK;
+  bool ovf_only = false;
+
   switch (fcode)
 {
+case BUILT_IN_ADD_OVERFLOW_P:
+  ovf_only = true;
+  /* FALLTHRU */
 case BUILT_IN_ADD_OVERFLOW:
+  opcode = PLUS_EXPR;
+  /* FALLTHRU */
 case BUILT_IN_SADD_OVERFLOW:
 case BUILT_IN_SADDL_OVERFLOW:
 case BU

Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Martin Sebor

On 06/07/2016 01:42 PM, Jakub Jelinek wrote:

On Tue, Jun 07, 2016 at 03:35:28PM -0400, Jason Merrill wrote:

Indeed, confirming removal of the cp/tree.c hunk doesn't affect anything in
the testsuite, nor e.g.
enum A {
  B = 1,
  C = 2,
  D = __builtin_add_overflow_p (B, C, C)
};
int e[__builtin_add_overflow_p (B, C, C) + 1];
template  int foo (int);
void
bar ()
{
  foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
}
That said, builtin_valid_in_constant_expr_p is used in lots of spots, and in
various of them I see that without the change it actually sets
*non_integral_constant_expression_p = true;
or something similar, but I have no idea why it still works despite of that.


Ah, that's the C++98 constant expression handling, which is a lot more
restricted.  C++11 and up don't care about that flag.


Oops, actually, it seems even the cp/tree.c hunks are significant:

I've only compiled the above testcase with the default -std=c++14, with
-std=c++98 without the cp/tree.c bits I get:

a.C:4:7: error: ‘bool __builtin_add_overflow_p(...)’ cannot appear in a
constant-expression
D = __builtin_add_overflow_p (B, C, C)
^~~~
a.C:4:40: error: a function call cannot appear in a constant-expression
D = __builtin_add_overflow_p (B, C, C)
 ^
a.C:6:45: error: array bound is not an integer constant before ‘]’ token
  int e[__builtin_add_overflow_p (B, C, C) + 1];
  ^
a.C: In function ‘void bar()’:
a.C:11:8: error: ‘bool __builtin_add_overflow_p(...)’ cannot appear in a
constant-expression
foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
 ^~~~
a.C:11:41: error: a function call cannot appear in a constant-expression
foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
  ^
a.C:11:50: error: no matching function for call to ‘foo(int)’
foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
   ^
a.C:7:22: note: candidate: template int foo(int)
  template  int foo (int);
   ^~~
a.C:7:22: note:   template argument deduction/substitution failed:
a.C:11:50: error: template argument 1 is invalid
foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
   ^

Though, maybe it is only worth supporting the __builtin_*_overflow_p
builtins for C++98 integer constant expression contexts and
thus only handle there the 3 builtins instead of all the others.

I guess I should add this testcase to the testsuite then.


I've also since discovered that the handling of (at least)
BUILTIN_LINE is important in C++ 98 mode and if I had run
the c-c++-common tests instead of just the strict C++ subset
I would have seen the c-c++-common/builtin_location.c test
fail with the same error.

I suspect I added the other built-ins to the function to get
the Clang-compatibility typed built-ins to work in C++ 98
(with the default argument).  When it was decided that
the Clang built-ins shouldn't change I removed the tests
but not this change.

Martin


Re: tuple code simplification patch

2016-06-07 Thread François Dumont

Fully tested and committed to trunk.

François


On 06/06/2016 12:17, Jonathan Wakely wrote:

On 02/06/16 23:00 +0200, François Dumont wrote:

Hi

   I was trying to play with tuple implementation and was annoyed by 
repetition of _Head type when instantiating _Head_base so I thought 
about this patch.


   How do you like it ?

   I still need to run tests, will do before commit, ok ?


Looks good, thanks. OK for trunk when testing passes.





Re: JIT patch: add gcc_jit_magic_int

2016-06-07 Thread Andrew Pinski
On Tue, Jun 7, 2016 at 12:22 PM, Andrew Pinski  wrote:
> On Tue, Jun 7, 2016 at 12:19 PM, Andrew Pinski  wrote:
>> On Mon, May 23, 2016 at 5:26 AM, Basile Starynkevitch
>>  wrote:
>>> Hello All,
>>>
>>> As I explained in https://gcc.gnu.org/ml/jit/2016-q2/msg00042.html it is
>>> difficult (or tricky without using dirty tricks involving the GCC plugin
>>> headers) to use GCCJIT to emit code equivalent to the following C file:
>>>
>>>extern int a;
>>>int get_atomic_a (void) {
>>>  return __atomic_load_n (&a, __ATOMIC_SEQ_CST);
>>>}
>>>
>>> The issue is that __ATOMIC_SEQ_CST is a magic preprocessor (but
>>> non-standard!) symbol which might not be available
>>> (or might have a different value) in the C code for GCCJIT building such an
>>> AST.
>>
>>>
>>> So we need a function to retrieve some magic integral value from the GCCJIT
>>> compiler.
>>
>> Huh?Why can't you just use the enum:
>> enum memmodel
>> {
>>   MEMMODEL_RELAXED = 0,
>>   MEMMODEL_CONSUME = 1,
>>   MEMMODEL_ACQUIRE = 2,
>>   MEMMODEL_RELEASE = 3,
>>   MEMMODEL_ACQ_REL = 4,
>>   MEMMODEL_SEQ_CST = 5,
>>   MEMMODEL_LAST = 6,
>>   MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC,
>>   MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC,
>>   MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC
>> };
>>
>>
>> For the values?  I don't understand why you need to have a direct
>> relationship to preprocessed macros to these values here.
>> Have your own lookup table inside your own JIT rather than a generic
>> way of doing it.
>> Having a generic way seems more incorrect way of doing it and even
>> says that the JIT is supposed to JITting C code but it is not JITing C
>> code.
>
>
> That is move enum memmodel to be included in the JIT API too.

Or rather have an API which the __atomic_* for you and all you need to
supply is the the "JIT" memory model and the locations.

Thanks,
Andrew

>
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>> The attached patch (relative to trunk svn 236583) is a first attempt to
>>> solve that issue
>>>  (and also give ability to query some other magic numbers).
>>>
>>> Proposed ChangeLog entry (in gcc/jit/)
>>>
>>> 2016-05-23  Basile Starynkevitch  
>>> * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_magic_int): New macro.
>>> (gcc_jit_magic_int): New public function declaration.
>>>
>>> * libgccjit.c: Include "cppbuiltin.h", "options.h", "flag-types.h"
>>> (gcc_jit_magic_int): New function.
>>>
>>> * libgccjit.map: Add gcc_jit_magic_int to LIBGCCJIT_ABI_6.
>>>
>>> Comments (or an ok to commit) are welcome. (I am not sure that
>>> __SANITIZE_ADDRESS__ is correctly handled,
>>> because I would believe that optimization flags are not globals in GCCJIT)
>>>
>>> Regards.
>>>
>>> --
>>> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
>>> email: basilestarynkevitchnet mobile: +33 6 8501 2359
>>> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
>>> *** opinions {are only mine, sont seulement les miennes} ***
>>>


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 03:35:28PM -0400, Jason Merrill wrote:
> >Indeed, confirming removal of the cp/tree.c hunk doesn't affect anything in
> >the testsuite, nor e.g.
> >enum A {
> >  B = 1,
> >  C = 2,
> >  D = __builtin_add_overflow_p (B, C, C)
> >};
> >int e[__builtin_add_overflow_p (B, C, C) + 1];
> >template  int foo (int);
> >void
> >bar ()
> >{
> >  foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
> >}
> >That said, builtin_valid_in_constant_expr_p is used in lots of spots, and in
> >various of them I see that without the change it actually sets
> >*non_integral_constant_expression_p = true;
> >or something similar, but I have no idea why it still works despite of that.
> 
> Ah, that's the C++98 constant expression handling, which is a lot more
> restricted.  C++11 and up don't care about that flag.

Oops, actually, it seems even the cp/tree.c hunks are significant:

I've only compiled the above testcase with the default -std=c++14, with
-std=c++98 without the cp/tree.c bits I get:

a.C:4:7: error: ‘bool __builtin_add_overflow_p(...)’ cannot appear in a
constant-expression
   D = __builtin_add_overflow_p (B, C, C)
   ^~~~
a.C:4:40: error: a function call cannot appear in a constant-expression
   D = __builtin_add_overflow_p (B, C, C)
^
a.C:6:45: error: array bound is not an integer constant before ‘]’ token
 int e[__builtin_add_overflow_p (B, C, C) + 1];
 ^
a.C: In function ‘void bar()’:
a.C:11:8: error: ‘bool __builtin_add_overflow_p(...)’ cannot appear in a
constant-expression
   foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
^~~~
a.C:11:41: error: a function call cannot appear in a constant-expression
   foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
 ^
a.C:11:50: error: no matching function for call to ‘foo(int)’
   foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
  ^
a.C:7:22: note: candidate: template int foo(int)
 template  int foo (int);
  ^~~
a.C:7:22: note:   template argument deduction/substitution failed:
a.C:11:50: error: template argument 1 is invalid
   foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
  ^

Though, maybe it is only worth supporting the __builtin_*_overflow_p
builtins for C++98 integer constant expression contexts and
thus only handle there the 3 builtins instead of all the others.

I guess I should add this testcase to the testsuite then.

Jakub


Re: [C++ Patch] Fix some simple location issues

2016-06-07 Thread Jason Merrill
For diagnostics about an initializer, I think it would be better to give 
the location of the initializer rather than the declaration.  Maybe use


location_t loc = EXPR_LOC_OR_LOC (init, DECL_SOURCE_LOCATION (decl));

?

Jason


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Jason Merrill

On 06/07/2016 11:51 AM, Martin Sebor wrote:

Unless I hear otherwise I'll submit a separate patch removing
the BUILTIN_FILE, FUNCTION, and LINE labels (and adjust the
comment to more closely reflect the function's purpose).


Since Jakub pointed out that this function is used by the parser for 
C++98 constant-expressions, I think we can leave these labels alone.  A 
comment tweak wouldn't hurt, though.


Jason



Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

2016-06-07 Thread Christophe Lyon
On 26 May 2016 at 11:53, Kyrill Tkachov  wrote:
> Hi all,
>
> In this PR we want to optimise:
> int foo (int i)
> {
>   return (i == 0) ? N : __builtin_clz (i);
> }
>
> on targets where CLZ is defined at zero to the constant 'N'.
> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
> macro.
> The obvious place to implement this would be in combine through simplify-rtx
> where we'd
> recognise an IF_THEN_ELSE of the form:
> (set (reg:SI r1)
>  (if_then_else:SI (ne (reg:SI r2)
>   (const_int 0 [0]))
>(clz:SI (reg:SI r2))
>(const_int 32)))
>
> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify
> it into
> just (clz:SI (reg:SI r2)).
> However, I found this doesn't quite happen for a couple of reasons:
> 1) This depends on ifcvt or some other pass to have created a conditional
> move of the
> two branches that provide the IF_THEN_ELSE to propagate the const_int and
> clz operation into.
>
> 2) Combine will refuse to propagate r2 from the above example into both the
> condition and the
> CLZ at the same time, so the most we see is:
> (set (reg:SI r1)
>  (if_then_else:SI (ne (reg:CC cc)
> (const_int 0))
>(clz:SI (reg:SI r2))
>(const_int 32)))
>
> which is not enough information to perform the simplification.
>
> This patch implements the optimisation in ce1 using the noce ifcvt
> framework.
> During ifcvt noce_process_if_block can see that we're trying to optimise
> something
> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all
> the information
> needed to perform the transformation.
>
> The transformation is performed by adding a new noce_try* function that
> tries to put the
> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to
> simplify that
> using the simplify-rtx machinery. That way, we can implement the
> simplification logic in
> simplify-rtx.c where it belongs.
>
> A similar transformation for CTZ is implemented as well.
> So for code:
> int foo (int i)
> {
>   return (i == 0) ? 32 : __builtin_clz (i);
> }
>
> On aarch64 we now emit:
> foo:
> clz w0, w0
> ret
>
> instead of:
> foo:
> mov w1, 32
> clz w2, w0
> cmp w0, 0
> cselw0, w2, w1, ne
> ret
>
> and for arm similarly we generate:
> foo:
> clz r0, r0
> bx  lr
>
> instead of:
> foo:
> cmp r0, #0
> clzne   r0, r0
> moveq   r0, #32
> bx  lr
>
>
> and for x86_64 with -O2 -mlzcnt we generate:
> foo:
> xorl%eax, %eax
> lzcntl  %edi, %eax
> ret
>
> instead of:
> foo:
> xorl%eax, %eax
> movl$32, %edx
> lzcntl  %edi, %eax
> testl   %edi, %edi
> cmove   %edx, %eax
> ret
>
>
> I tried getting this to work on other targets as well, but encountered
> difficulties.
> For example on powerpc the two arms of the condition seen during ifcvt are:
>
> (insn 4 22 11 4 (set (reg:DI 156 [  ])
> (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>  (nil))
> and
> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [  ]) 0)
> (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
>  (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
> (nil)))
>
> So the setup code in noce_process_if_block sees that the set destination is
> not the same
> ((reg:DI 156 [  ]) and (subreg/s/u:SI (reg:DI 156 [  ]) 0))
> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
> I suppose that's a consequence of how SImode operations are represented in
> early RTL
> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be
> taught to handle
> destinations that are subregs of one another, but that would be a separate
> patch.
>
> Anyway, is this patch ok for trunk?
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu.
>
> Thanks,
> Kyrill
>
> 2016-05-26  Kyrylo Tkachov  
>
> PR middle-end/37780
> * ifcvt.c (noce_try_ifelse_collapse): New function.
> Declare prototype.
> (noce_process_if_block): Call noce_try_ifelse_collapse.
> * simplify-rtx.c (simplify_cond_clz_ctz): New function.
> (simplify_ternary_operation): Use the above to simplify
> conditional CLZ/CTZ expressions.
>
> 2016-05-26  Kyrylo Tkachov  
>
> PR middle-end/37780
> * gcc.c-torture/execute/pr37780.c: New test.
> * gcc.target/aarch64/pr37780_1.c: Likewise.
> * gcc.target/arm/pr37780_1.c: Likewise.

Hi Kyrylo,

I've noticed that gcc.target/arm/pr37780_1.c fails on
arm if arch < v6.
I first tried to fix the effective-target guard (IIRC, the doc
says clz is available starting with v5t), but that isn't sufficient.

When compiling for armv5-t, the scan-assembler directives
fail. It seems to work with v6t2, so I am wondering whether
it's just a matter of increasing the effective-target ar

Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Jason Merrill

On 06/07/2016 12:34 PM, Jakub Jelinek wrote:

On Tue, Jun 07, 2016 at 09:51:05AM -0600, Martin Sebor wrote:

On 06/07/2016 08:32 AM, Jason Merrill wrote:

On 06/06/2016 08:36 AM, Jakub Jelinek wrote:

@@ -352,6 +352,35 @@ builtin_valid_in_constant_expr_p (const_
case BUILT_IN_FUNCTION:
case BUILT_IN_LINE:

+  /* The following built-ins are valid in constant expressions
+ when their arguments are.  */
+case BUILT_IN_ADD_OVERFLOW:
+case BUILT_IN_ADD_OVERFLOW_P:



Why is this necessary?  builtin_valid_in_constant_expr_p is only needed
for builtins that can have constant values even if their operands are
non-constant, which doesn't apply here.

For that matter, I don't see why we needed to add _FUNCTION et al, either.


I don't remember what prompted me to change the function in either
patch.  Perhaps it was the comment above the function that says
this:

/* Test whether DECL is a builtin that may appear in a
   constant-expression. */

But removing the change doesn't seem to affect the C++ test results
for the two features so it looks like the change may not be needed.


Indeed, confirming removal of the cp/tree.c hunk doesn't affect anything in
the testsuite, nor e.g.
enum A {
  B = 1,
  C = 2,
  D = __builtin_add_overflow_p (B, C, C)
};
int e[__builtin_add_overflow_p (B, C, C) + 1];
template  int foo (int);
void
bar ()
{
  foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
}
That said, builtin_valid_in_constant_expr_p is used in lots of spots, and in
various of them I see that without the change it actually sets
*non_integral_constant_expression_p = true;
or something similar, but I have no idea why it still works despite of that.


Ah, that's the C++98 constant expression handling, which is a lot more 
restricted.  C++11 and up don't care about that flag.



Is the patch ok for trunk without the cp/tree.c hunk?


Yes.

Jason



Re: [PATCH, rs6000] Add support for char, short, and int versions of vec_mul

2016-06-07 Thread Segher Boessenkool
On Tue, Jun 07, 2016 at 12:47:52PM -0500, Bill Seurer wrote:
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu and
> powerpc64-unknown-linux-gnu with no regressions. Is this ok for trunk?

Okay, thanks!


Segher


Update probabilities in predict.def to match reality

2016-06-07 Thread Jan Hubicka
Hello,
Maritn Liska measured branch predictor hitrates on current tree and SPEC2006.

CPU2006
HEURISTICS   BRANCHES  (REL)  HITRATE
COVERAGE COVERAGE  (REL)
loop iv compare33   0.1%  20.27% /  86.24%   
30630826   30.63M   0.0%
no prediction   10406  19.5%  33.41% /  84.76%   
139755242456  139.76G  14.1%
early return (on trees)  6328  11.9%  54.20% /  86.48%
33569991740   33.57G   3.4%
guessed loop iterations   112   0.2%  62.06% /  64.49%  
958458522  958.46M   0.1%
fail alloc595   1.1%  62.18% / 100.00%  
  595   595.00   0.0%
opcode values positive (on trees)4266   8.0%  64.30% /  91.28%
16931889792   16.93G   1.7%
opcode values nonequal (on trees)6600  12.4%  66.23% /  80.60%
71483051282   71.48G   7.2%
continue  507   0.9%  66.66% /  82.85%
10086808016   10.09G   1.0%
call11351  21.3%  67.16% /  92.24%
34680666103   34.68G   3.5%
loop iterations  2689   5.0%  67.99% /  67.99%   
408309517405  408.31G  41.3%
DS theory   26385  49.4%  68.62% /  85.44%   
146974369890  146.97G  14.9%
const return  271   0.5%  69.39% /  87.09%  
301566712  301.57M   0.0%
pointer (on trees)   6230  11.7%  69.59% /  87.18%
16667735314   16.67G   1.7%
combined53398 100.0%  70.31% /  80.36%   
989164856862  989.16G 100.0%
goto   78   0.1%  70.36% /  96.96%  
951041538  951.04M   0.1%
first match 16607  31.1%  78.00% /  78.42%   
702435244516  702.44G  71.0%
extra loop exit   141   0.3%  82.80% /  88.17% 
16969469421.70G   0.2%
null return   393   0.7%  91.47% /  93.08% 
32686781973.27G   0.3%
loop exit9909  18.6%  91.80% /  92.81%   
282927773783  282.93G  28.6%
guess loop iv compare 178   0.3%  97.81% /  97.85% 
43750864534.38G   0.4%
negative return   277   0.5%  97.94% /  99.23% 
10621190281.06G   0.1%
noreturn call2372   4.4% 100.00% / 100.00% 
83565623238.36G   0.8%
overflow 1282   2.4% 100.00% / 100.00%  
175074177  175.07M   0.0%
zero-sized array  677   1.3% 100.00% / 100.00%  
112723803  112.72M   0.0%
unconditional jump103   0.2% 100.00% / 100.00% 
491001  491.00K   0.0%

We used to track SPEC2000 until 2008 but then the infrastructure broke. The
numbers show some differences to 2008 results:

HEURISTICS BRANCHES  (REL)  HITRATE  COVERAGE  (REL)
DS theory 42611  57.1%  74.54% /  89.71%   9237799352  28.7%
combined  74578 100.0%  72.88% /  90.59%  32201983315 100.0%
opcode values nonequal (on trees)14544  19.5%  72.03% /  88.64%   
3387233627  10.5%
early return (on trees)   11078  14.9%  61.23% /  89.25%   2349499033   7.3%
first match   13249  17.8%  89.11% /  93.08%  15876522911  49.3%
guessed loop iterations2722   3.6%  86.50% /  90.76%   7308035517  22.7%
no prediction 18718  25.1%  34.36% /  86.14%   7087661052  22.0%
call  23937  32.1%  71.38% /  93.08%   3829002205  11.9%
opcode values positive (on trees) 2515   3.4%  72.77% /  86.49%
927995806   2.9%
loop branch 378   0.5%  87.61% /  95.54%   1491510452   4.6%
loop exit  8833  11.8%  91.43% /  94.52%   6538486043  20.3%
loop iterations 912   1.2%  99.11% /  99.11%396451321   1.2%
noreturn call   890   1.2%  99.99% /  99.99%205957905   0.6%
pointer (on trees) 8394  11.3%  85.09% /  94.80%   1315262058   4.1%
negative return 272   0.4%  96.47% /  99.74% 49156319   0.1%
const return551   0.7%  67.92% /  68.97% 96082001   0.3%
__builtin_expect 20   0.0%  0% /  0%0   0.0%
null return 566   0.8%  96.58% /  98.77% 87555632   0.3%

There is some degradation in the combined heuristicshitrate (72.8->70) which 
may be caused
simply by fact that new spec is harder to guess. Main decrease seems to be in 
opcode_positive/nonequal
which may be also attributed to the fact that early opts now optimize out more 
code before
we do the statistics.

There are bugs in few predictors - goto predictor is dead because the FE code 
was dropped,
return predictor is bit random because CFG is optimized (it should probably be 
done in FE),
loop iv compare seems bogus and fortran fail alloc

Re: JIT patch: add gcc_jit_magic_int

2016-06-07 Thread Andrew Pinski
On Tue, Jun 7, 2016 at 12:19 PM, Andrew Pinski  wrote:
> On Mon, May 23, 2016 at 5:26 AM, Basile Starynkevitch
>  wrote:
>> Hello All,
>>
>> As I explained in https://gcc.gnu.org/ml/jit/2016-q2/msg00042.html it is
>> difficult (or tricky without using dirty tricks involving the GCC plugin
>> headers) to use GCCJIT to emit code equivalent to the following C file:
>>
>>extern int a;
>>int get_atomic_a (void) {
>>  return __atomic_load_n (&a, __ATOMIC_SEQ_CST);
>>}
>>
>> The issue is that __ATOMIC_SEQ_CST is a magic preprocessor (but
>> non-standard!) symbol which might not be available
>> (or might have a different value) in the C code for GCCJIT building such an
>> AST.
>
>>
>> So we need a function to retrieve some magic integral value from the GCCJIT
>> compiler.
>
> Huh?Why can't you just use the enum:
> enum memmodel
> {
>   MEMMODEL_RELAXED = 0,
>   MEMMODEL_CONSUME = 1,
>   MEMMODEL_ACQUIRE = 2,
>   MEMMODEL_RELEASE = 3,
>   MEMMODEL_ACQ_REL = 4,
>   MEMMODEL_SEQ_CST = 5,
>   MEMMODEL_LAST = 6,
>   MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC,
>   MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC,
>   MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC
> };
>
>
> For the values?  I don't understand why you need to have a direct
> relationship to preprocessed macros to these values here.
> Have your own lookup table inside your own JIT rather than a generic
> way of doing it.
> Having a generic way seems more incorrect way of doing it and even
> says that the JIT is supposed to JITting C code but it is not JITing C
> code.


That is move enum memmodel to be included in the JIT API too.

>
> Thanks,
> Andrew Pinski
>
>>
>> The attached patch (relative to trunk svn 236583) is a first attempt to
>> solve that issue
>>  (and also give ability to query some other magic numbers).
>>
>> Proposed ChangeLog entry (in gcc/jit/)
>>
>> 2016-05-23  Basile Starynkevitch  
>> * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_magic_int): New macro.
>> (gcc_jit_magic_int): New public function declaration.
>>
>> * libgccjit.c: Include "cppbuiltin.h", "options.h", "flag-types.h"
>> (gcc_jit_magic_int): New function.
>>
>> * libgccjit.map: Add gcc_jit_magic_int to LIBGCCJIT_ABI_6.
>>
>> Comments (or an ok to commit) are welcome. (I am not sure that
>> __SANITIZE_ADDRESS__ is correctly handled,
>> because I would believe that optimization flags are not globals in GCCJIT)
>>
>> Regards.
>>
>> --
>> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
>> email: basilestarynkevitchnet mobile: +33 6 8501 2359
>> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
>> *** opinions {are only mine, sont seulement les miennes} ***
>>


Re: JIT patch: add gcc_jit_magic_int

2016-06-07 Thread Andrew Pinski
On Mon, May 23, 2016 at 5:26 AM, Basile Starynkevitch
 wrote:
> Hello All,
>
> As I explained in https://gcc.gnu.org/ml/jit/2016-q2/msg00042.html it is
> difficult (or tricky without using dirty tricks involving the GCC plugin
> headers) to use GCCJIT to emit code equivalent to the following C file:
>
>extern int a;
>int get_atomic_a (void) {
>  return __atomic_load_n (&a, __ATOMIC_SEQ_CST);
>}
>
> The issue is that __ATOMIC_SEQ_CST is a magic preprocessor (but
> non-standard!) symbol which might not be available
> (or might have a different value) in the C code for GCCJIT building such an
> AST.

>
> So we need a function to retrieve some magic integral value from the GCCJIT
> compiler.

Huh?Why can't you just use the enum:
enum memmodel
{
  MEMMODEL_RELAXED = 0,
  MEMMODEL_CONSUME = 1,
  MEMMODEL_ACQUIRE = 2,
  MEMMODEL_RELEASE = 3,
  MEMMODEL_ACQ_REL = 4,
  MEMMODEL_SEQ_CST = 5,
  MEMMODEL_LAST = 6,
  MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC,
  MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC,
  MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC
};


For the values?  I don't understand why you need to have a direct
relationship to preprocessed macros to these values here.
Have your own lookup table inside your own JIT rather than a generic
way of doing it.
Having a generic way seems more incorrect way of doing it and even
says that the JIT is supposed to JITting C code but it is not JITing C
code.

Thanks,
Andrew Pinski

>
> The attached patch (relative to trunk svn 236583) is a first attempt to
> solve that issue
>  (and also give ability to query some other magic numbers).
>
> Proposed ChangeLog entry (in gcc/jit/)
>
> 2016-05-23  Basile Starynkevitch  
> * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_magic_int): New macro.
> (gcc_jit_magic_int): New public function declaration.
>
> * libgccjit.c: Include "cppbuiltin.h", "options.h", "flag-types.h"
> (gcc_jit_magic_int): New function.
>
> * libgccjit.map: Add gcc_jit_magic_int to LIBGCCJIT_ABI_6.
>
> Comments (or an ok to commit) are welcome. (I am not sure that
> __SANITIZE_ADDRESS__ is correctly handled,
> because I would believe that optimization flags are not globals in GCCJIT)
>
> Regards.
>
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basilestarynkevitchnet mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mine, sont seulement les miennes} ***
>


Re: JIT patch: add gcc_jit_magic_int

2016-06-07 Thread David Malcolm
On Mon, 2016-05-23 at 14:26 +0200, Basile Starynkevitch wrote:
> Hello All,
> 
> As I explained in https://gcc.gnu.org/ml/jit/2016-q2/msg00042.html it
> is 
> difficult (or tricky without using dirty tricks involving the GCC
> plugin 
> headers) to use GCCJIT to emit code equivalent to the following C
> file:
> 
> extern int a;
> int get_atomic_a (void) {
>   return __atomic_load_n (&a, __ATOMIC_SEQ_CST);
> }
> 
> The issue is that __ATOMIC_SEQ_CST is a magic preprocessor (but non
> -standard!) symbol which might not be available
> (or might have a different value) in the C code for GCCJIT building
> such an AST.
> 
> So we need a function to retrieve some magic integral value from the
> GCCJIT compiler.
> 
> The attached patch (relative to trunk svn 236583) is a first attempt
> to solve that issue
>   (and also give ability to query some other magic numbers).
> 
> Proposed ChangeLog entry (in gcc/jit/)
> 
> 2016-05-23  Basile Starynkevitch  
>   * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_magic_int): New macro.
>   (gcc_jit_magic_int): New public function declaration.
> 
>   * libgccjit.c: Include "cppbuiltin.h", "options.h", "flag
> -types.h"
>   (gcc_jit_magic_int): New function.
> 
>   * libgccjit.map: Add gcc_jit_magic_int to LIBGCCJIT_ABI_6.
> 
> Comments (or an ok to commit) are welcome. (I am not sure that
> __SANITIZE_ADDRESS__ is correctly handled,
> because I would believe that optimization flags are not globals in
> GCCJIT)

Sorry about the delay in responding to this.

> Index: gcc/jit/libgccjit.h
> ===
> --- gcc/jit/libgccjit.h   (revision 236583)
> +++ gcc/jit/libgccjit.h   (working copy)
> @@ -1387,6 +1387,27 @@
>  gcc_jit_rvalue_set_bool_require_tail_call (gcc_jit_rvalue *call,
>  int require_tail_call);
>  
> +
> +  /* Magical integer values useful in the compiler; similar to
> + predefined C macros like __GNUC__, __GNUC_MINOR__,
> + __GNUC_PATCHLEVEL__, __ATOMIC_RELAXED, __ATOMIC_SEQ_CST,
> + __ATOMIC_ACQUIRE, __ATOMIC_RELEASE, __ATOMIC_ACQ_REL,
> + __ATOMIC_CONSUME, __PIC__, __PIE__, etc.  Typical usage would
be:
> +
> +bool err=false;
> +int mypic = gcc_jit_magic_int("__PIC__", &err);
> +if (err) somethinggotwrong();
> +
> +This function is expected to be rarely called, typically once at
> +initialization time. 
> +
> +   This API entrypoint was added in LIBGCCJIT_ABI_6; you can test
for its
> +   presence using
> + #ifdef LIBGCCJIT_HAVE_gcc_jit_magic_int
> +  */
> +#define LIBGCCJIT_HAVE_gcc_jit_magic_int
> +extern int gcc_jit_magic_int(const char*name, bool*errp);

Does the client code need to be able to check to see if a name is
defined, or can they assume it always is?

If the latter, how about:

  extern int
  gcc_jit_context_get_predefined_int (gcc_jit_context *ctxt,
  const char *name);

and have it set an error on the context if the name isn't
recognized?

If they need to be able to check it, I prefer:

  extern int
  gcc_jit_context_get_predefined_int (gcc_jit_context *ctxt,
  const char *name,
  int *out);

i.e. have the return value contain the "was name recognized" flag
and *out be written to (I'd use "bool", but we don't require its
availability).

I'm not fond of "magic int", but I'm not fond of my proposed name
either.

>  #ifdef __cplusplus
>  }
>  #endif /* __cplusplus */
> Index: gcc/jit/libgccjit.c
> ===
> --- gcc/jit/libgccjit.c   (revision 236583)
> +++ gcc/jit/libgccjit.c   (working copy)
> @@ -23,6 +23,9 @@
>  #include "coretypes.h"
>  #include "timevar.h"
>  #include "typed-splay-tree.h"
> +#include "cppbuiltin.h"
> +#include "options.h"
> +#include "flag-types.h"
>  
>  #include "libgccjit.h"
>  #include "jit-recording.h"
> @@ -2970,3 +2973,44 @@
>  
>call->set_require_tail_call (require_tail_call);
>  }
> +
> +
> +/* Public entrypoint. See description in libgccjit.h. */
> +
> +int gcc_jit_magic_int(const char*name, bool*errp)
> +{
> +  static int major, minor, patchlevel;
> +  if (!major) /* call once: */
> +parse_basever (&major, &minor, &patchlevel);

These vars are global state, and they've not guarded by a mutex.
Can they be moved to the gcc_jit_context instead?  (actually,
to gcc::jit::recording::context).  That way they'd be covered by
the general policies on thread-safety here:
  https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#thread-safety

That said, see the question below about the "__GNUC_*" vars.

> +  
> +  RETURN_VAL_IF_FAIL (name,
> +   errp?((*errp=true),0):0,

The above line is too terse for my taste.

> +   NULL, NULL,
> +   "NULL name");
> +  RETURN_VAL_IF_FAIL (name[0] == '_' && name[1] == '_',
> +   errp?(

Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-07 Thread Bernd Edlinger
On 06/06/16 20:08, Jakub Jelinek wrote:
> On Mon, Jun 06, 2016 at 12:04:04PM -0600, Jeff Law wrote:
>> On 06/06/2016 12:01 PM, Jakub Jelinek wrote:
>>> On Mon, Jun 06, 2016 at 11:54:04AM -0600, Jeff Law wrote:
> As for recog.c, I can not approve this as I am not a maintainer of it.
> I only can say that the code looks questionable to me.
 I think the question on the recog part is a matter of how we choose to
 interpret what the "X" constraint means.

 Does it literally mean accept anything, or accept some subset expressions?

 I tend to think the former, which means that things like
 reg_overlap_mentioned_p or its callers have to be bullet-proofed.

AFACT this is not the only place where overly complex RTL trees can
cause an ICE.

>>>
>>> I think it is a bad idea to accept really anything, even for debug insns,
>>> which initially accepted arbitrarily large RTL expressions (and still accept
>>> stuff like subregs otherwise considered invalid etc.) we found it is highly
>>> undesirable, as it is not very good idea for the compile time complexity
>>> etc., so now we are trying to limit the complexity of the expressions there
>>> by splitting up more complex expressions into smaller ones using 
>>> temporaries.
>>> So, even accept anything should always be accept anything reasonable,
>>> because most of the RTL passes don't really expect arbitrarily deep
>>> expressions, or expressions where the same reg can appear thousands of times
>>> etc.
>> The problem is how do you define this subset of expressions you're going to
>> accept and those which you are going to reject.
>>
>> I first pondered accepting RTL leaf nodes (reg, subreg, constants) and
>> rejecting everything else.  But I couldn't convince myself that some port
>> might reasonably expect (plus (x) (y)) to match the "X" constraint.
>
> It is always going to be arbitrary.
> Perhaps RTL leaf nodes (if including MEM, then perhaps with valid address
> only), and unary/binary/ternary RTL expressions with RTL leaf node operands?
> Or union of what is accepted by any other constraint?
> Or "g" plus any constants?

Yes. That is what I think too, "X" is probably not used often in
practice, but if it is allowed, it should at least not result in an ICE.

"X" should allow to feed "whatsoever" valid C expressions to the asm
input, and using the %-expression in the assembler string should not
cause an ICE.

And "X" really means different "whatsoever" things in asm statements
and in .md files, even md.texi has different text for internally used
"X" constraint and assembler "X" constraint, so that should be OK.

According to md.texi
"g" should already mean:

@item @samp{g}
Any register, memory or immediate integer operand is allowed, except for
registers that are not general registers.

Which is the same as general_operand(op, VOIDmode). And in other words,
that should be everything that is in "r", "m" and "i" constraints.


However if I compare commond.md ...

(define_constraint "i"
   "Matches a general integer constant."
   (and (match_test "CONSTANT_P (op)")
(match_test "!flag_pic || LEGITIMATE_PIC_OPERAND_P (op)")))

... against recog.c, general_operand seems to allow less i.e. checks
targetm.legitimate_constant_p additionally.  So that is something
I don't really understand, I mean if a constant is not a "legitimate"
constant by the target's definition, why should it be safe to use
in later in targetm.asm_out.print_operand?



Bernd.


Re: [PATCH] Document POWER's -mhtm and -mno-htm options.

2016-06-07 Thread David Edelsohn
On Tue, Jun 7, 2016 at 1:18 PM, Peter Bergner  wrote:
> It seems we (ok, me) forgot to document the -mhtm option for POWER.
> This bootstrapped fine and the generated docs looked good.
> Is this ok for trunk and the open release branches?
>
> Peter
>
>
> * doc/invoke.texi (RS/6000 and PowerPC Options): Document -mhtm and
> -mno-htm.

Okay.

Thanks, David


[PATCH, rs6000] Add support for char, short, and int versions of vec_mul

2016-06-07 Thread Bill Seurer
This patch adds support for the missing versions of the vec_mul altivec
builtins from the Power Architecture 64-Bit ELF V2 ABI OpenPOWER ABI for
Linux Supplement (16 July 2015 Version 1.1). There are many of the builtins
that are missing and this is part of a series of patches to add them.

There aren't instructions for the {un}signed char, {un}signed short, and
{un}signed int versions of vec_mul so the output code is built from other
built-ins and operations that do have instructions.

The new test case is an executable test which verifies that the generated
code produces expected values. C macros were used so that the same
test case could be used for all the various supported types.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu and
powerpc64-unknown-linux-gnu with no regressions. Is this ok for trunk?

[gcc]

2016-06-07  Bill Seurer  

* config/rs6000/altivec.h: Add __builtin_vec_mul.
* config/rs6000/rs6000-builtin.def (vec_mul): Change vec_mul to a
special case Altivec builtin.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
VSX_BUILTIN_VEC_MUL (replaced with special case code).
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
code for ALTIVEC_BUILTIN_VEC_MUL.
* config/rs6000/rs6000.c (altivec_init_builtins): Add definition
for __builtin_vec_mul.

[gcc/testsuite]

2016-06-07  Bill Seurer  

* gcc.target/powerpc/vec-mul.c: New test.


Index: gcc/config/rs6000/altivec.h
===
--- gcc/config/rs6000/altivec.h (revision 237175)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -229,6 +229,7 @@
 #define vec_mladd __builtin_vec_mladd
 #define vec_msum __builtin_vec_msum
 #define vec_msums __builtin_vec_msums
+#define vec_mul __builtin_vec_mul
 #define vec_mule __builtin_vec_mule
 #define vec_mulo __builtin_vec_mulo
 #define vec_nor __builtin_vec_nor
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 237175)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1300,6 +1300,7 @@ BU_ALTIVEC_OVERLOAD_X (LVRX, "lvrx")
 BU_ALTIVEC_OVERLOAD_X (LVRXL, "lvrxl")
 BU_ALTIVEC_OVERLOAD_X (LVSL,  "lvsl")
 BU_ALTIVEC_OVERLOAD_X (LVSR,  "lvsr")
+BU_ALTIVEC_OVERLOAD_X (MUL,   "mul")
 BU_ALTIVEC_OVERLOAD_X (PROMOTE,   "promote")
 BU_ALTIVEC_OVERLOAD_X (SLD,   "sld")
 BU_ALTIVEC_OVERLOAD_X (SPLAT, "splat")
@@ -1600,7 +1601,6 @@ BU_VSX_OVERLOAD_3V (XXPERMDI, "xxpermdi")
 BU_VSX_OVERLOAD_3V (XXSLDWI,  "xxsldwi")
 
 /* 2 argument VSX overloaded builtin functions.  */
-BU_VSX_OVERLOAD_2 (MUL, "mul")
 BU_VSX_OVERLOAD_2 (DIV, "div")
 BU_VSX_OVERLOAD_2 (XXMRGHW,  "xxmrghw")
 BU_VSX_OVERLOAD_2 (XXMRGLW,  "xxmrglw")
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 237175)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -1941,14 +1941,6 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 
RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB,
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 
RS6000_BTI_bool_V16QI, 0 },
-  { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_XVMULSP,
-RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
-  { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_XVMULDP,
-RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
-  { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_MUL_V2DI,
-RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
-  { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_MUL_V2DI,
-RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 
RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB,
 RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, 
RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB,
@@ -4683,7 +4675,58 @@ assignment for unaligned loads and stores");
 warning (OPT_Wdeprecated, "vec_lvsr is deprecated for little endian; use \
 assignment for unaligned loads and stores");
 
+  if (fcode == ALTIVEC_BUILTIN_VEC_MUL)
+{
+  /* vec_mul needs to be special cased because there are no instructions
+for it for the {un}signed char, {un}signed short, and {un}signed int
+types.  */
+  if (nargs != 2)
+   {
+ error ("vec_mul only accepts 2 arguments");
+ return error_mark_node;
+   }
 
+  tree arg0 = (*arglist)[0];
+  tree arg0_type = TREE_TYPE (arg0);
+  tree arg1 = (*arglist)[1];
+  tree arg1_type = TREE_TYPE (arg1);
+
+  /* Both arguments must be vectors and the types must match.  */
+  if (arg0_type != arg1_type)
+   goto bad;
+  if (TREE_CODE (arg0_type) != VECTOR_TYPE)
+   goto

[PATCH] Document POWER's -mhtm and -mno-htm options.

2016-06-07 Thread Peter Bergner
It seems we (ok, me) forgot to document the -mhtm option for POWER.
This bootstrapped fine and the generated docs looked good.
Is this ok for trunk and the open release branches?

Peter


* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mhtm and
-mno-htm.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 237175)
+++ gcc/doc/invoke.texi (working copy)
@@ -1000,7 +1000,7 @@ See RS/6000 and PowerPC Options.
 -mpointers-to-nested-functions -mno-pointers-to-nested-functions @gol
 -msave-toc-indirect -mno-save-toc-indirect @gol
 -mpower8-fusion -mno-mpower8-fusion -mpower8-vector -mno-power8-vector @gol
--mcrypto -mno-crypto -mdirect-move -mno-direct-move @gol
+-mcrypto -mno-crypto -mhtm -mno-htm -mdirect-move -mno-direct-move @gol
 -mquad-memory -mno-quad-memory @gol
 -mquad-memory-atomic -mno-quad-memory-atomic @gol
 -mcompat-align-parm -mno-compat-align-parm @gol
@@ -19991,7 +19991,7 @@ following options:
 -mpopcntb -mpopcntd  -mpowerpc64 @gol
 -mpowerpc-gpopt  -mpowerpc-gfxopt  -msingle-float -mdouble-float @gol
 -msimple-fpu -mstring  -mmulhw  -mdlmzb  -mmfpgpr -mvsx @gol
--mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol
+-mcrypto -mdirect-move -mhtm -mpower8-fusion -mpower8-vector @gol
 -mquad-memory -mquad-memory-atomic -mmodulo -mfloat128 -mfloat128-hardware @gol
 -mpower9-fusion -mpower9-vector -mpower9-dform}
 
@@ -20165,6 +20165,14 @@ Generate code that uses (does not use) t
 between the general purpose registers and the vector/scalar (VSX)
 registers that were added in version 2.07 of the PowerPC ISA.
 
+@item -mhtm
+@itemx -mno-htm
+@opindex mhtm
+@opindex mno-htm
+Enable (disable) the use of the built-in functions that allow direct
+access to the Hardware Transactional Memory (HTM) instructions that
+were added in version 2.07 of the PowerPC ISA.
+
 @item -mpower8-fusion
 @itemx -mno-power8-fusion
 @opindex mpower8-fusion



[PATCH][AArch64][2/2] (Re)Implement vcopy_lane intrinsics

2016-06-07 Thread Kyrill Tkachov

Hi all,

This is the second part of James's patch from:
https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html
separated out. It reimplements the vcopyq_lane* intrinsics in C and
adds implementations of the other missing vcopy_lane_ intrinsics.

The differences from that patch are in the use of __aarch64_vset_lane_any and
__aarch64_vget_lane_any rather than the typed variants of these that were used
back in 2013 (and don't exist anymore).

The testcase is also adjusted for the ABI change in GCC 5 where integer x1 
vectors
are now passed and returned in SIMD registers.

The vcopy_laneq_f64 test in the testcase is currently XFAILed because it 
currently
doesn't generate the optimal DUP instruction but instead emits a UMOV to a 
scalar
register and then an FMOV. This is a GCC 7 regression tracked by PR 71307 and I 
think
unrelated to this patch.

Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.

Ok for trunk?

Thanks,
Kyrill

2016-06-07  Kyrylo Tkachov  
James Greenhalgh  

* config/aarch64/arm_neon.h (vcopyq_lane_f32, vcopyq_lane_f64,
vcopyq_lane_p8, vcopyq_lane_p16, vcopyq_lane_s8, vcopyq_lane_s16,
vcopyq_lane_s32, vcopyq_lane_s64, vcopyq_lane_u8, vcopyq_lane_u16,
vcopyq_lane_u32, vcopyq_lane_u64): Reimplement in C.
(vcopy_lane_f32, vcopy_lane_f64, vcopy_lane_p8, vcopy_lane_p16,
vcopy_lane_s8, vcopy_lane_s16, vcopy_lane_s32, vcopy_lane_s64,
vcopy_lane_u8, vcopy_lane_u16, vcopy_lane_u32, vcopy_lane_u64,
vcopy_laneq_f32, vcopy_laneq_f64, vcopy_laneq_p8, vcopy_laneq_p16,
vcopy_laneq_s8, vcopy_laneq_s16, vcopy_laneq_s32, vcopy_laneq_s64,
vcopy_laneq_u8, vcopy_laneq_u16, vcopy_laneq_u32, vcopy_laneq_u64,
vcopyq_laneq_f32, vcopyq_laneq_f64, vcopyq_laneq_p8, vcopyq_laneq_p16,
vcopyq_laneq_s8, vcopyq_laneq_s16, vcopyq_laneq_s32, vcopyq_laneq_s64,
vcopyq_laneq_u8, vcopyq_laneq_u16, vcopyq_laneq_u32, vcopyq_laneq_u64):
New intrinsics.

2016-06-07  Kyrylo Tkachov  
James Greenhalgh  

* gcc.target/aarch64/vect_copy_lane_1.c: New test.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index d20caf0919356eb7a87e7c7a9cd336d8408db35b..e33ecdbb72c8e8a483f82adc6f0e28edb3e5f4e6 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5869,162 +5869,6 @@ vaddlvq_u32 (uint32x4_t a)
   return result;
 }
 
-#define vcopyq_lane_f32(a, b, c, d) \
-  __extension__ \
-({  \
-   float32x4_t c_ = (c);\
-   float32x4_t a_ = (a);\
-   float32x4_t result;  \
-   __asm__ ("ins %0.s[%2], %3.s[%4]"\
-: "=w"(result)  \
-: "0"(a_), "i"(b), "w"(c_), "i"(d)  \
-: /* No clobbers */);   \
-   result;  \
- })
-
-#define vcopyq_lane_f64(a, b, c, d) \
-  __extension__ \
-({  \
-   float64x2_t c_ = (c);\
-   float64x2_t a_ = (a);\
-   float64x2_t result;  \
-   __asm__ ("ins %0.d[%2], %3.d[%4]"\
-: "=w"(result)  \
-: "0"(a_), "i"(b), "w"(c_), "i"(d)  \
-: /* No clobbers */);   \
-   result;  \
- })
-
-#define vcopyq_lane_p8(a, b, c, d)  \
-  __extension__ \
-({  \
-   poly8x16_t c_ = (c); \
-   poly8x16_t a_ = (a); \
-   poly8x16_t result;   \
-   __asm__ ("ins %0.b[%2], %3.b[%4]"\
-: "=w"(result)  \
-: "0"(a_), "i"(b), "w"(c_), "i"(d)  \
-: /* No clobbers */);   \
-   result;  \
- })
-
-#define vcopyq_lane_p16(a, b, c, d) \
-  __extension__ 

[PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors

2016-06-07 Thread Kyrill Tkachov

Hi all,

This patch addresses an deficiency we have in handling vector lane-to-lane 
moves in the AArch64 backend.
Generally we can use the INS (element) instruction but, as a user complains in 
https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
we don't. James had a patch adding an appropriate combine pattern some time ago 
(https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html)
but it never got applied.

This patch is a rebase of that patch that adds necessary 
vec_merge+vec_duplicate+vec_select combine pattern.
I chose to use a define_insn rather than the define_insn_and_split in that 
patch that just deletes the instruction when
the source and destination registers are the same, as I think that's not he 
combine patterns job to delete the redundant instruction
but rather some other passes job. Also, I was not able to create a testcase 
where it would make a difference.

Also, this patch doesn't reimplement that vcopy*lane* intrinsics from inline 
assembly to a vget_lane+vset_lane combo.
This can be done as a separate patch on top of this one.

Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.

Ok for trunk?

Thanks,
Kyrill

2016-06-07  James Greenhalgh  
Kyrylo Tkachov  

* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane):
New define_insn.
 (*aarch64_simd_vec_copy_lane_): Likewise.

2016-06-07  James Greenhalgh  
Kyrylo Tkachov  

* gcc.target/aarch64/vget_set_lane_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..5600e5bd0a94fd7efd704a4b13d95d993fd5b62f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -555,6 +555,49 @@ (define_insn "aarch64_simd_vec_set"
   [(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")]
 )
 
+(define_insn "*aarch64_simd_vec_copy_lane"
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+	(vec_merge:VALL
+	(vec_duplicate:VALL
+	  (vec_select:
+		(match_operand:VALL 3 "register_operand" "w")
+		(parallel
+		  [(match_operand:SI 4 "immediate_operand" "i")])))
+	(match_operand:VALL 1 "register_operand" "0")
+	(match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+  {
+int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2])));
+operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
+operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4])));
+
+return "ins\t%0.[%p2], %3.[%4]";
+  }
+  [(set_attr "type" "neon_ins")]
+)
+
+(define_insn "*aarch64_simd_vec_copy_lane_"
+  [(set (match_operand:VALL 0 "register_operand" "=w")
+	(vec_merge:VALL
+	(vec_duplicate:VALL
+	  (vec_select:
+		(match_operand: 3 "register_operand" "w")
+		(parallel
+		  [(match_operand:SI 4 "immediate_operand" "i")])))
+	(match_operand:VALL 1 "register_operand" "0")
+	(match_operand:SI 2 "immediate_operand" "i")))]
+  "TARGET_SIMD"
+  {
+int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2])));
+operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
+operands[4] = GEN_INT (ENDIAN_LANE_N (mode,
+			   INTVAL (operands[4])));
+
+return "ins\t%0.[%p2], %3.[%4]";
+  }
+  [(set_attr "type" "neon_ins")]
+)
+
 (define_insn "aarch64_simd_lshr"
  [(set (match_operand:VDQ_I 0 "register_operand" "=w")
(lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
new file mode 100644
index ..07a77de319206c5c6dad1c0d2d9bcc998583f9c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2)	\
+TYPE1 __attribute__((noinline,noclone))\
+test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b)			\
+{	\
+  return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\
+a, INDEX1);\
+}
+
+BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6)
+BUILD_TEST (int8x8_t,  int8x8_t,  , , s8, 7, 6)
+BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[6\\\]" 3 } } */
+BUILD_TEST (poly16x4_t, poly16x4_t, , , p16, 3, 2)
+BUILD_TEST (int16x4_t,  int16x4_t,  , , s16, 3, 2)
+BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[2\\\]" 3 } } */
+BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
+BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
+BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
+
+BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15)
+BUILD_TEST (int8x8_t,  int8x16_t,  , q, s8, 7, 15)
+BUILD_TEST (uint8x8_t, uint8x16_t, , q, u8, 7, 15

Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

2016-06-07 Thread Kyrill Tkachov


On 07/06/16 17:47, Kyrill Tkachov wrote:


On 07/06/16 08:37, Andreas Schwab wrote:

Kyrill Tkachov  writes:


+static rtx
+simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
+{
+  if (cmp_code != EQ && cmp_code != NE)
+return NULL_RTX;
+
+  /* Result on X == 0 and X !=0 respectively.  */
+  rtx on_zero, on_nonzero;
+  if (cmp_code == EQ)
+{
+  on_zero = true_val;
+  on_nonzero = false_val;
+}
+  else
+{
+  on_zero = false_val;
+  on_nonzero = true_val;
+}
+
+  rtx_code op_code = GET_CODE (on_nonzero);
+  if ((op_code != CLZ && op_code != CTZ)
+  || !rtx_equal_p (XEXP (on_nonzero, 0), x)
+  || !CONST_INT_P (on_zero))
+return NULL_RTX;
+
+  HOST_WIDE_INT op_val;
+  machine_mode mode = GET_MODE (on_nonzero);
+  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
+|| (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
+  && op_val == INTVAL (on_zero))
+return on_nonzero;
+
+  return NULL_RTX;
+}

../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, 
rtx_code, rtx, rtx)':
../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' 
[-Werror=unused-variable]
machine_mode mode = GET_MODE (on_nonzero);
 ^~~~

Andreas.


That must be on targets that don't define CLZ_DEFINED_VALUE_AT_ZERO or 
CTZ_DEFINED_VALUE_AT_ZERO.
defaults.h then defines them to just "0".
I think GCC shouldn't warn in these cases as 'mode' is used in the function
it's defined, it's just the macro where it's used as an argument discards it.
Anyway, this patch removes that variable and fixes the warning.

Tested on arm-none-eabi after manually undefining CLZ_DEFINED_VALUE_AT_ZERO and 
CTZ_DEFINED_VALUE_AT_ZERO to reproduce
the warning and checking that it's fixed.
Committing as obvious in the interest of fixing bootstrap on the relevant 
targets.



With the ChangeLog entry:

2016-06-07  Kyrylo Tkachov  

* simplify-rtx.c (simplify_cond_clz_ctz): Delete 'mode' local
variable.


Thanks,
Kyrill





Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

2016-06-07 Thread Kyrill Tkachov


On 07/06/16 08:37, Andreas Schwab wrote:

Kyrill Tkachov  writes:


+static rtx
+simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
+{
+  if (cmp_code != EQ && cmp_code != NE)
+return NULL_RTX;
+
+  /* Result on X == 0 and X !=0 respectively.  */
+  rtx on_zero, on_nonzero;
+  if (cmp_code == EQ)
+{
+  on_zero = true_val;
+  on_nonzero = false_val;
+}
+  else
+{
+  on_zero = false_val;
+  on_nonzero = true_val;
+}
+
+  rtx_code op_code = GET_CODE (on_nonzero);
+  if ((op_code != CLZ && op_code != CTZ)
+  || !rtx_equal_p (XEXP (on_nonzero, 0), x)
+  || !CONST_INT_P (on_zero))
+return NULL_RTX;
+
+  HOST_WIDE_INT op_val;
+  machine_mode mode = GET_MODE (on_nonzero);
+  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
+   || (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
+  && op_val == INTVAL (on_zero))
+return on_nonzero;
+
+  return NULL_RTX;
+}

../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, 
rtx_code, rtx, rtx)':
../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' 
[-Werror=unused-variable]
machine_mode mode = GET_MODE (on_nonzero);
 ^~~~

Andreas.


That must be on targets that don't define CLZ_DEFINED_VALUE_AT_ZERO or 
CTZ_DEFINED_VALUE_AT_ZERO.
defaults.h then defines them to just "0".
I think GCC shouldn't warn in these cases as 'mode' is used in the function
it's defined, it's just the macro where it's used as an argument discards it.
Anyway, this patch removes that variable and fixes the warning.

Tested on arm-none-eabi after manually undefining CLZ_DEFINED_VALUE_AT_ZERO and 
CTZ_DEFINED_VALUE_AT_ZERO to reproduce
the warning and checking that it's fixed.
Committing as obvious in the interest of fixing bootstrap on the relevant 
targets.

Thanks,
Kyrill

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 0f750bb08e817ba630d92566c6a991f81e095304..21adfcb531f3f3379653c67145ab0023c978fd82 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5314,9 +5314,10 @@ simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
 return NULL_RTX;
 
   HOST_WIDE_INT op_val;
-  machine_mode mode = GET_MODE (on_nonzero);
-  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
-	|| (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
+  if (((op_code == CLZ
+	&& CLZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val))
+  || (op_code == CTZ
+	  && CTZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val)))
   && op_val == INTVAL (on_zero))
 return on_nonzero;
 


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 09:51:05AM -0600, Martin Sebor wrote:
> On 06/07/2016 08:32 AM, Jason Merrill wrote:
> >On 06/06/2016 08:36 AM, Jakub Jelinek wrote:
> >>@@ -352,6 +352,35 @@ builtin_valid_in_constant_expr_p (const_
> >> case BUILT_IN_FUNCTION:
> >> case BUILT_IN_LINE:
> >>
> >>+  /* The following built-ins are valid in constant expressions
> >>+ when their arguments are.  */
> >>+case BUILT_IN_ADD_OVERFLOW:
> >>+case BUILT_IN_ADD_OVERFLOW_P:
> >
> >
> >Why is this necessary?  builtin_valid_in_constant_expr_p is only needed
> >for builtins that can have constant values even if their operands are
> >non-constant, which doesn't apply here.
> >
> >For that matter, I don't see why we needed to add _FUNCTION et al, either.
> 
> I don't remember what prompted me to change the function in either
> patch.  Perhaps it was the comment above the function that says
> this:
> 
> /* Test whether DECL is a builtin that may appear in a
>constant-expression. */
> 
> But removing the change doesn't seem to affect the C++ test results
> for the two features so it looks like the change may not be needed.

Indeed, confirming removal of the cp/tree.c hunk doesn't affect anything in
the testsuite, nor e.g.
enum A {
  B = 1,
  C = 2,
  D = __builtin_add_overflow_p (B, C, C)
};
int e[__builtin_add_overflow_p (B, C, C) + 1];
template  int foo (int);
void
bar ()
{
  foo <__builtin_add_overflow_p (B, C, C) + 1> (0);
}
That said, builtin_valid_in_constant_expr_p is used in lots of spots, and in
various of them I see that without the change it actually sets
*non_integral_constant_expression_p = true;
or something similar, but I have no idea why it still works despite of that.

Is the patch ok for trunk without the cp/tree.c hunk?

Jakub


PING^2: [PATCH] Load external function address via GOT slot

2016-06-07 Thread H.J. Lu
On Sun, May 29, 2016 at 3:13 PM, H.J. Lu  wrote:
> On Mon, May 9, 2016 at 10:37 AM, H.J. Lu  wrote:
>> On Fri, Apr 22, 2016 at 6:03 AM, Uros Bizjak  wrote:
>>> On Fri, Apr 22, 2016 at 2:54 PM, H.J. Lu  wrote:
 For -fno-plt, we load the external function address via the GOT slot
 so that linker won't create an PLT entry for extern function address.

 Tested on x86-64. I also built GCC with -fno-plt.  It removes 99% PLT
 entries.  OK for trunk?

 H.J.
 --
 gcc/

 PR target/pr67400
 * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
 * config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
 (ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL for
 ix86_force_load_from_GOT_p returns true.
 (ix86_print_operand_address): Support UNSPEC_GOTPCREL if
 ix86_force_load_from_GOT_p returns true.
 (ix86_expand_move): Load the external function address via the
 GOT slot if ix86_force_load_from_GOT_p returns true.
 * config/i386/predicates.md (x86_64_immediate_operand): Return
 false if ix86_force_load_from_GOT_p returns true.

 gcc/testsuite/

 PR target/pr67400
 * gcc.target/i386/pr67400-1.c: New test.
 * gcc.target/i386/pr67400-2.c: Likewise.
 * gcc.target/i386/pr67400-3.c: Likewise.
 * gcc.target/i386/pr67400-4.c: Likewise.
>>>
>>> Please get someone that knows this linker magic to review the
>>> functionality first. Maybe Jakub can help?
>>>
>>
>> Hi Jakub,
>>
>> Can you review this patch?
>>
>> Thanks.
>
> PING.
>

PING.


-- 
H.J.


[C++ Patch] Fix some simple location issues

2016-06-07 Thread Paolo Carlini

Hi,

in parallel with other work, I'm auditing the front-end for simple 
location issues which can be fixed immediately. These I noticed in 
decl.c over the last couple of days. Slightly more interesting, for some 
error messages we didn't seem to have testcases, certainly not in 
g++.dg, thus I added some elementary ones covering the locations too. 
Tested x86_64-linux.


Thanks,
Paolo.

///
/cp
2016-06-07  Paolo Carlini  

* decl.c (maybe_deduce_size_from_array_init): Use
DECL_SOURCE_LOCATION in error_at.
(layout_var_decl): Likewise.
(check_array_initializer): Likewise.
(check_initializer): Likewise.
(duplicate_decls, check_elaborated_type_specifier): Tidy.

/testsuite
2016-06-07  Paolo Carlini  

* g++.dg/init/array42.C: New.
* g++.dg/init/array43.C: Likewise.
* g++.dg/init/array44.C: Likewise.
* g++.dg/init/array45.C: Likewise.
* g++.dg/cpp0x/constexpr-ice10.C: Test column number too.
* g++.dg/cpp0x/constexpr-incomplete1.C: Likewise.
* g++.dg/cpp1y/auto-fn27.C: Likewise.
* g++.dg/gomp/pr35751.C: Likewise.
* g++.dg/init/array23.C: Likewise.
* g++.dg/init/brace2.C: Likewise.
* g++.dg/init/brace6.C: Likewise.
Index: cp/decl.c
===
--- cp/decl.c   (revision 237171)
+++ cp/decl.c   (working copy)
@@ -1393,7 +1393,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool
 {
   if (DECL_INITIAL (olddecl))
inform (DECL_SOURCE_LOCATION (olddecl),
-   "previous definition of %q+D was here", olddecl);
+   "previous definition of %qD was here", olddecl);
   else
inform (DECL_SOURCE_LOCATION (olddecl),
"previous declaration of %qD was here", olddecl);
@@ -5266,13 +5266,15 @@ maybe_deduce_size_from_array_init (tree decl, tree
do_default);
  if (failure == 1)
{
- error ("initializer fails to determine size of %qD", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "initializer fails to determine size of %qD", decl);
}
  else if (failure == 2)
{
  if (do_default)
{
- error ("array size missing in %qD", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "array size missing in %qD", decl);
}
  /* If a `static' var's size isn't known, make it extern as
 well as static, so it does not get allocated.  If it's not
@@ -5283,7 +5285,8 @@ maybe_deduce_size_from_array_init (tree decl, tree
}
  else if (failure == 3)
{
- error ("zero-size array %qD", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "zero-size array %qD", decl);
}
}
 
@@ -5322,7 +5325,8 @@ layout_var_decl (tree decl)
   /* An automatic variable with an incomplete type: that is an error.
 Don't talk about array types here, since we took care of that
 message in grokdeclarator.  */
-  error ("storage size of %qD isn%'t known", decl);
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "storage size of %qD isn%'t known", decl);
   TREE_TYPE (decl) = error_mark_node;
 }
 #if 0
@@ -5345,7 +5349,8 @@ layout_var_decl (tree decl)
constant_expression_warning (DECL_SIZE (decl));
   else
{
- error ("storage size of %qD isn%'t constant", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "storage size of %qD isn%'t constant", decl);
  TREE_TYPE (decl) = error_mark_node;
}
 }
@@ -5954,7 +5959,8 @@ check_array_initializer (tree decl, tree type, tre
   if (!COMPLETE_TYPE_P (complete_type (element_type)))
 {
   if (decl)
-   error ("elements of array %q#D have incomplete type", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "elements of array %q#D have incomplete type", decl);
   else
error ("elements of array %q#T have incomplete type", type);
   return true;
@@ -6018,7 +6024,8 @@ check_initializer (tree decl, tree init, int flags
 }
   else if (!COMPLETE_TYPE_P (type))
 {
-  error ("%q#D has incomplete type", decl);
+  error_at (DECL_SOURCE_LOCATION (decl),
+   "%q#D has incomplete type", decl);
   TREE_TYPE (decl) = error_mark_node;
   return NULL_TREE;
 }
@@ -6038,8 +6045,9 @@ check_initializer (tree decl, tree init, int flags
}
  else if (init_len != 1 && TREE_CODE (type) != COMPLEX_TYPE)
{
- error ("scalar object %qD requires one element in initializer",
-decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "scalar object %qD req

PING^5: [PATCH] PR target/70454: Build x86 libgomp with -march=i486 or better

2016-06-07 Thread H.J. Lu
On Sun, May 29, 2016 at 3:14 PM, H.J. Lu  wrote:
> On Fri, May 20, 2016 at 8:04 AM, H.J. Lu  wrote:
>> On Mon, May 9, 2016 at 5:52 AM, H.J. Lu  wrote:
>>> On Mon, May 2, 2016 at 6:46 AM, H.J. Lu  wrote:
 On Mon, Apr 25, 2016 at 1:36 PM, H.J. Lu  wrote:
> If x86 libgomp isn't compiled with -march=i486 or better, append
> -march=i486 XCFLAGS for x86 libgomp build.
>
> Tested on i686 with and without --with-arch=i386.  Tested on
> x86-64 with and without --with-arch_32=i386.  OK for trunk?
>
>
> H.J.
> ---
> PR target/70454
> * configure.tgt (XCFLAGS): Append -march=i486 to compile x86
> libgomp if needed.
> ---
>  libgomp/configure.tgt | 36 
>  1 file changed, 16 insertions(+), 20 deletions(-)
>
> diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
> index 77e73f0..c876e80 100644
> --- a/libgomp/configure.tgt
> +++ b/libgomp/configure.tgt
> @@ -67,28 +67,24 @@ if test x$enable_linux_futex = xyes; then
> ;;
>
>  # Note that bare i386 is not included here.  We need cmpxchg.
> -i[456]86-*-linux*)
> +i[456]86-*-linux* | x86_64-*-linux*)
> config_path="linux/x86 linux posix"
> -   case " ${CC} ${CFLAGS} " in
> - *" -m64 "*|*" -mx32 "*)
> -   ;;
> - *)
> -   if test -z "$with_arch"; then
> - XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
> +   # Need i486 or better.
> +   cat > conftestx.c < +#if defined __x86_64__ || defined __i486__ || defined __pentium__ \
> +  || defined __pentiumpro__ || defined __pentium4__ \
> +  || defined __geode__ || defined __SSE__
> +# error Need i486 or better
> +#endif
> +EOF
> +   if ${CC} ${CFLAGS} -c -o conftestx.o conftestx.c > /dev/null 
> 2>&1; then
> +   if test "${target_cpu}" = x86_64; then
> +   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
> +   else
> +   XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
> fi
> -   esac
> -   ;;
> -
> -# Similar jiggery-pokery for x86_64 multilibs, except here we
> -# can't rely on the --with-arch configure option, since that
> -# applies to the 64-bit side.
> -x86_64-*-linux*)
> -   config_path="linux/x86 linux posix"
> -   case " ${CC} ${CFLAGS} " in
> - *" -m32 "*)
> -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
> -   ;;
> -   esac
> +   fi
> +   rm -f conftestx.c conftestx.o
> ;;
>
>  # Note that sparcv7 and sparcv8 is not included here.  We need cas.
> --
> 2.5.5
>

 PING.

>>>
>>> PING.
>>>
>>
>> PING.
>>
>
> PING.

PING.


-- 
H.J.


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Martin Sebor

On 06/07/2016 08:32 AM, Jason Merrill wrote:

On 06/06/2016 08:36 AM, Jakub Jelinek wrote:

@@ -352,6 +352,35 @@ builtin_valid_in_constant_expr_p (const_
 case BUILT_IN_FUNCTION:
 case BUILT_IN_LINE:

+  /* The following built-ins are valid in constant expressions
+ when their arguments are.  */
+case BUILT_IN_ADD_OVERFLOW:
+case BUILT_IN_ADD_OVERFLOW_P:



Why is this necessary?  builtin_valid_in_constant_expr_p is only needed
for builtins that can have constant values even if their operands are
non-constant, which doesn't apply here.

For that matter, I don't see why we needed to add _FUNCTION et al, either.


I don't remember what prompted me to change the function in either
patch.  Perhaps it was the comment above the function that says
this:

/* Test whether DECL is a builtin that may appear in a
   constant-expression. */

But removing the change doesn't seem to affect the C++ test results
for the two features so it looks like the change may not be needed.

Unless I hear otherwise I'll submit a separate patch removing
the BUILTIN_FILE, FUNCTION, and LINE labels (and adjust the
comment to more closely reflect the function's purpose).  I'll
leave it to Jakub to tweak this patch.

Martin


[C++ PATCH] Fix -Wunused-* regression (PR c++/71442)

2016-06-07 Thread Jakub Jelinek
Hi!

Marek has recently added code to set TREE_USED bits on the elements of
TREE_VEC referenced in SIZEOF_EXPR.
But, as the testcase shows, it can be used on various parameter/argument
packs, some of them have types as elements, others decls.
And IMHO we want to set TREE_USED only on the decls listed in those,
for types TREE_USED should be a property of the type regardless of whether
the type is mentioned in sizeof... or not, otherwise we suddenly stop
diagnosing any unused vars with those types.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6.2/5.5?

2016-06-07  Jakub Jelinek  

PR c++/71442
* pt.c (tsubst_copy): Only set TREE_USED on DECLs.

* g++.dg/cpp0x/Wunused-variable-1.C: New test.

--- gcc/cp/pt.c.jj  2016-06-01 14:17:12.0 +0200
+++ gcc/cp/pt.c 2016-06-07 14:29:16.608041125 +0200
@@ -14160,7 +14160,8 @@ tsubst_copy (tree t, tree args, tsubst_f
  len = TREE_VEC_LENGTH (expanded);
  /* Set TREE_USED for the benefit of -Wunused.  */
  for (int i = 0; i < len; i++)
-   TREE_USED (TREE_VEC_ELT (expanded, i)) = true;
+   if (DECL_P (TREE_VEC_ELT (expanded, i)))
+ TREE_USED (TREE_VEC_ELT (expanded, i)) = true;
}
 
  if (expanded == error_mark_node)
--- gcc/testsuite/g++.dg/cpp0x/Wunused-variable-1.C.jj  2016-06-07 
14:31:15.514486508 +0200
+++ gcc/testsuite/g++.dg/cpp0x/Wunused-variable-1.C 2016-06-07 
14:32:13.526730026 +0200
@@ -0,0 +1,25 @@
+// PR c++/71442
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-variable" }
+
+struct C
+{
+  template
+  int operator()(Ts &&...)
+  {
+return sizeof...(Ts);
+  }
+};
+
+int
+main ()
+{
+  C {} (1, 1L, 1LL, 1.0);
+  char a;  // { dg-warning "unused variable" }
+  short b; // { dg-warning "unused variable" }
+  int c;   // { dg-warning "unused variable" }
+  long d;  // { dg-warning "unused variable" }
+  long long e; // { dg-warning "unused variable" }
+  float f; // { dg-warning "unused variable" }
+  double g;// { dg-warning "unused variable" }
+}

Jakub


Ignore debug insns in memcmp optimization

2016-06-07 Thread Bernd Schmidt
This fixes a few PRs from the last few days. Fully tested on 
x86_64-linux. Ok?



Bernd
	PR debug/71432
	PR ada/71413
	* tree-ssa-strlen.c (handle_builtin_memcmp): Ignore debug insns.

	PR debug/71432
	PR ada/71413
	* g++.dg/debug/pr71432.C: New test.

Index: gcc/testsuite/g++.dg/debug/pr71432.C
===
--- gcc/testsuite/g++.dg/debug/pr71432.C	(revision 0)
+++ gcc/testsuite/g++.dg/debug/pr71432.C	(working copy)
@@ -0,0 +1,140 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcompare-debug" } */
+
+namespace std
+{
+  typedef long unsigned int size_t;
+  inline namespace __cxx11
+  {
+  } }
+
+extern "C++"
+{
+  namespace std
+  {
+template < typename _Tp > struct __is_char
+{
+};
+template <> struct __is_char 
+{
+  enum
+	{ __value = 1 };
+};
+  } namespace __gnu_cxx
+  {
+template < bool, typename > struct __enable_if
+{
+};
+template < typename _Tp > struct __enable_if 
+{
+  typedef _Tp __type;
+};
+  }
+}
+
+namespace __gnu_cxx
+{
+  template < typename _Tp > class new_allocator
+  {
+  };
+}
+
+namespace std
+{
+  template < typename _Tp > using __allocator_base =
+__gnu_cxx::new_allocator < _Tp >;
+template < typename _Tp > class allocator:public __allocator_base < _Tp >
+  {
+  };
+  template < typename _Alloc > struct allocator_traits
+  {
+  };
+  template < typename _Tp > struct allocator_traits >
+  {
+using size_type = std::size_t;
+template < typename _Up > using rebind_alloc = allocator < _Up >;
+  };
+}
+
+namespace __gnu_cxx
+{
+  template < typename _Alloc > struct __alloc_traits:std::allocator_traits <_Alloc >
+  {
+typedef std::allocator_traits < _Alloc > _Base_type;
+ template < typename _Tp > struct rebind
+{
+  typedef typename _Base_type::template rebind_alloc < _Tp > other;
+ };
+   };
+}
+
+namespace std
+{
+  template < class _CharT > struct char_traits;
+  namespace __cxx11
+  {
+template < typename _CharT, typename _Traits =
+  char_traits < _CharT >, typename _Alloc =
+  allocator < _CharT > >class basic_string;
+typedef basic_string < char >string;
+  }
+}
+
+namespace std
+{
+  template <> struct char_traits 
+  {
+typedef char char_type;
+ static int compare (const char_type * __s1, const char_type * __s2,
+			  size_t __n)
+{
+  return __builtin_memcmp (__s1, __s2, __n);
+}
+  };
+
+  namespace __cxx11
+  {
+template < typename _CharT, typename _Traits, typename _Alloc >
+class basic_string
+{
+  typedef typename __gnu_cxx::__alloc_traits <_Alloc >::template rebind < _CharT >::other _Char_alloc_type;
+  typedef __gnu_cxx::__alloc_traits < _Char_alloc_type > _Alloc_traits;
+  typedef typename _Alloc_traits::size_type size_type;
+
+public:
+  size_type size ()const noexcept
+  {
+  }
+  const _CharT *data () const noexcept
+  {
+  }
+};
+  }
+
+  template < typename _CharT > inline typename __gnu_cxx::__enable_if <
+__is_char < _CharT >::__value,
+bool >::__type operator== (const basic_string < _CharT > &__lhs,
+			   const basic_string < _CharT > &__rhs) noexcept
+  {
+return !std::char_traits < _CharT >::compare (__lhs.data (),
+		   __rhs.data (),
+		   __lhs.size ());
+  }
+};
+
+class CLIParameterType
+{
+  const std::string & getSwitchOption (unsigned int i) const
+  {
+  } unsigned int getSwitchOptionCount () const
+  {
+  } int checkSwitched (const std::string & value) const;
+};
+
+int
+CLIParameterType::checkSwitched (const std::string & value) const const
+{
+  int contains = false;
+  for (unsigned int i = 0; !contains && i < getSwitchOptionCount () ;)
+contains = getSwitchOption (i) == value;
+}
Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c	(revision 237069)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -1867,6 +1867,8 @@ handle_builtin_memcmp (gimple_stmt_itera
 {
   gimple *ustmt = USE_STMT (use_p);
 
+  if (is_gimple_debug (ustmt))
+	continue;
   if (gimple_code (ustmt) == GIMPLE_ASSIGN)
 	{
 	  gassign *asgn = as_a  (ustmt);


C PATCH for c/71418 (ICE with bogus array dimension and _Alignas)

2016-06-07 Thread Marek Polacek
Another error-recovery bug.  This time TYPE was error_mark_node but
min_align_of_type accessed it via TYPE_ALIGN.  So we better check that we're
operating on a type.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-06-07  Marek Polacek  

PR c/71418
* c-decl.c (grokdeclarator): Check TYPE_P.

* gcc.dg/noncompile/pr71418.c: New test.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index d79802e..ac83e2f 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -6313,7 +6313,7 @@ grokdeclarator (const struct c_declarator *declarator,
}
   else if (TREE_CODE (type) == FUNCTION_TYPE)
error_at (loc, "alignment specified for function %qE", name);
-  else if (declspecs->align_log != -1)
+  else if (declspecs->align_log != -1 && TYPE_P (type))
{
  alignas_align = 1U << declspecs->align_log;
  if (alignas_align < min_align_of_type (type))
diff --git gcc/testsuite/gcc.dg/noncompile/pr71418.c 
gcc/testsuite/gcc.dg/noncompile/pr71418.c
index e69de29..e3243f3 100644
--- gcc/testsuite/gcc.dg/noncompile/pr71418.c
+++ gcc/testsuite/gcc.dg/noncompile/pr71418.c
@@ -0,0 +1,4 @@
+/* PR c/71418 */
+/* { dg-do compile } */
+
+_Alignas (int) int a[7++]; /* { dg-error "lvalue required" } */

Marek


Re: OpenACC wait clause

2016-06-07 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 08:01:10AM -0700, Cesar Philippidis wrote:
> On 06/07/2016 04:13 AM, Jakub Jelinek wrote:
> 
> > I've noticed
> >   if ((mask & OMP_CLAUSE_WAIT)
> >   && !c->wait
> >   && gfc_match ("wait") == MATCH_YES)
> > {
> >   c->wait = true; 
> >   match_oacc_expr_list (" (", &c->wait_list, false);
> >   continue;
> > }
> > which looks just weird and confusing.  Why isn't this instead:
> >   if ((mask & OMP_CLAUSE_WAIT)
> >   && !c->wait
> >   && (match_oacc_expr_list ("wait (", &c->wait_list, false)
> >   == MATCH_YES))
> > {
> >   c->wait = true; 
> >   continue;
> > }
> > ?  Otherwise you happily accept wait without following (, perhaps even
> > combined with another clause without any space in between etc.
> 
> Both acc wait and async accept optional parenthesis arguments. E.g.,
> 
>   #pragma acc wait
> 
> blocks for all of the async streams to complete before proceeding, whereas
> 
>   #pragma acc wait (1, 5)
> 
> only blocks for async streams 1 and 5.

But then you need to set need_space = true; if it doesn't have the ( after
it.

Jakub


Re: OpenACC wait clause

2016-06-07 Thread Cesar Philippidis
On 06/07/2016 04:13 AM, Jakub Jelinek wrote:

> I've noticed
>   if ((mask & OMP_CLAUSE_WAIT)
>   && !c->wait
>   && gfc_match ("wait") == MATCH_YES)
> {
>   c->wait = true; 
>   match_oacc_expr_list (" (", &c->wait_list, false);
>   continue;
> }
> which looks just weird and confusing.  Why isn't this instead:
>   if ((mask & OMP_CLAUSE_WAIT)
>   && !c->wait
> && (match_oacc_expr_list ("wait (", &c->wait_list, false)
> == MATCH_YES))
> {
>   c->wait = true; 
>   continue;
> }
> ?  Otherwise you happily accept wait without following (, perhaps even
> combined with another clause without any space in between etc.

Both acc wait and async accept optional parenthesis arguments. E.g.,

  #pragma acc wait

blocks for all of the async streams to complete before proceeding, whereas

  #pragma acc wait (1, 5)

only blocks for async streams 1 and 5.

Cesar


Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-07 Thread Alan Hayward


On 07/06/2016 10:28, "Rainer Orth"  wrote:

>Alan Hayward  writes:
>
>> On 05/06/2016 12:00, "Andreas Schwab"  wrote:
>>
>>>Alan Hayward  writes:
>>>
* gcc.dg/vect/vect-live-2.c: New test.
>>>
>>>This test fails on powerpc64 (with -m64, but not with -m32):
>>>
>>>$ grep 'vectorized.*loops' ./vect-live-2.c.149t.vect
>>>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:10:1: note: vectorized 0
>>>loops
>>>in function.
>>>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:29:1: note: vectorized 0
>>>loops
>>>in function.
>>>
>>>
>>
>> "note: not vectorized: relevant stmt not supported: _1 = (long unsigned
>> int) j_24;"
>>
>>
>> This is failing because power does not support vectorising a cast from
>>int
>> to long.
>> (It works on power 32bit because longs are 32bit and therefore no need
>>to
>> cast).
>>
>> Can someone please suggest a target-supports define (or another method)
>>I
>> can use to
>> disable this test for power 64bit (but not 32bit) ?
>> I tried using vect_multiple_sizes, but that will also disable the test
>>on
>> x86 without
>> avx.
>
>I'm also seeing new FAILs on Solaris/SPARC:
>
>+FAIL: gcc.dg/vect/vect-live-2.c -flto -ffat-lto-objects
>scan-tree-dump-times v
>ect "vectorized 1 loops" 1
>+FAIL: gcc.dg/vect/vect-live-2.c scan-tree-dump-times vect "vectorized 1
>loops" 
>1
>
>32- and 64-bit:
>
>vect-live-2.c:16:3: note: not vectorized: relevant stmt not supported: _2
>= j.0_1 * 4;
>vect-live-2.c:48:7: note: not vectorized: control flow in loop.
>vect-live-2.c:35:3: note: not vectorized: loop contains function calls or
>data references that cannot be analyzed
>
>and
>
>+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects
>scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not
>relevant" 4
>+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects
>scan-tree-dump-times vect "vectorized 1 loops" 4
>+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects
>scan-tree-dump-times vect "vectorizing stmts using SLP" 4
>+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect
>"vec_stmt_relevant_p: stmt live but not relevant" 4
>+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect
>"vectorized 1 loops" 4
>+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect
>"vectorizing stmts using SLP" 4
>
>vect-live-slp-3.c:29:1: note: not vectorized: no vectype for stmt: n0_29
>= *_4;
>vect-live-slp-3.c:30:1: note: not vectorized: no vectype for stmt: n0_29
>= *_4;
>vect-live-slp-3.c:31:1: note: not vectorized: no vectype for stmt: n0_29
>= *_4;
>vect-live-slp-3.c:32:1: note: not vectorized: no vectype for stmt: n0_29
>= *_4;
>vect-live-slp-3.c:62:4: note: not vectorized: control flow in loop.
>vect-live-slp-3.c:45:3: note: not vectorized: loop contains function
>calls or data references that cannot be analyzed
>


I’ve been trying both these tests on x86,aarch64,power,sparc

vect-live-slp-3.c
Fails on  power 64 (altivec & vsx), sparc 64 (vis 2 & 3)
  - due to long int unsupported
Pass on x86, aarch64, power 32 (altivec & vsx), sparc 32 (vis 2 & 3)

vect-live-2.c
Fails on power 64 (altivec & vsx), sparc 64 (vis 2 & 3)
  - due to long int unsupported
Fails on sparc 32 (vis 2)
  - due to multiply/shift not supported
Pass on x86, aarch64, power 32 (altivec & vsx), sparc 32 (vis 3)


Therefore I think both tests should be gated on “vect_long”.
In addition, vect-live-2.c should also be gated on “vect_shift”

“vect_long” is not currently enabled for aarch64, but should be.

Also “vect_shift” is not currently enabled for sparc 32 (vis 3), but
probably
should be. I leave this for a task for a sparc maintainer to add (as I’m
unable to test).




This patch fixes the targets for vect-live-slp-3.c and vect-live-2.c.
It also adds aarch64 to vect_long.

As a side consequence, the following vector tests are now enabled for
aarch64:
pr18425.c, pr30843.c, pr36493.c, pr42193.c and pr60656.c

Tested on aarch64 and x86.
Tested by inspection on power and sparc

Ok to commit?

testsuite/
* gcc.dg/vect/vect-live-2.c : Likewise
* gcc.dg/vect/vect-live-slp-3.c : Update effective target
* lib/target-supports.exp : Add aarch64 to vect_long




diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-2.c
b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
index 
53adc3fee006e0577a4cf2f9ba8fe091d2a09353..9460624a515945bdd72f98a0b1a6751fd
c7a75de 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
@@ -1,4 +1,5 @@
-/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_long } */
+/* { dg-require-effective-target vect_shift } */
 /* { dg-additional-options "-fno-tree-scev-cprop" } */

 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
index 
aacf5cb98071f6fec1f4b522eeefeb6696787334..70947afcc45d30987fc6df2db634d002d
33b360f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
@@ -1,4 +1,

Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Jason Merrill

On 06/06/2016 08:36 AM, Jakub Jelinek wrote:

@@ -352,6 +352,35 @@ builtin_valid_in_constant_expr_p (const_
 case BUILT_IN_FUNCTION:
 case BUILT_IN_LINE:

+  /* The following built-ins are valid in constant expressions
+when their arguments are.  */
+case BUILT_IN_ADD_OVERFLOW:
+case BUILT_IN_ADD_OVERFLOW_P:



Why is this necessary?  builtin_valid_in_constant_expr_p is only needed 
for builtins that can have constant values even if their operands are 
non-constant, which doesn't apply here.


For that matter, I don't see why we needed to add _FUNCTION et al, either.

Jason



Re: [PATCH,rs6000] Add built-in function support for new Power9 vector absolute difference unsigned instructions

2016-06-07 Thread Segher Boessenkool
Hi Kelvin,

On Mon, Jun 06, 2016 at 05:58:00PM -0600, Kelvin Nilsen wrote:
> Index: gcc/config/rs6000/altivec.h
> ===
> --- gcc/config/rs6000/altivec.h   (revision 237045)
> +++ gcc/config/rs6000/altivec.h   (working copy)
> @@ -401,6 +401,11 @@
>  #define vec_vprtybq __builtin_vec_vprtybq
>  #endif
>  
> +#define vec_adu __builtin_vec_vadu
> +#define vec_adub __builtin_vec_vadub
> +#define vec_aduh __builtin_vec_vaduh
> +#define vec_aduw __builtin_vec_vaduw

I think the API says vec_absd?   Please double-check.

> +;; Vector absolute difference unsigned
> +(define_expand "vadu3"
> +  [(set (match_operand:VI 0 "register_operand" "")
> +(unspec:VI [(match_operand:VI 1 "register_operand" "")
> + (match_operand:VI 2 "register_operand" "")]
> + UNSPEC_VADU))]
> +  "TARGET_P9_VECTOR")

You can leave off those constraints "", they are default.

> +;; Vector absolute difference unsigned
> +(define_insn "*p9_vadu3"
> +  [(set (match_operand:VI 0 "register_operand" "=v")
> +(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> + (match_operand:VI 2 "register_operand" "v")]
> + UNSPEC_VADU))]
> +  "TARGET_P9_VECTOR"
> +  "vabsdu %0, %1, %2"

Don't use spaces after comma in asm.

> +  [(set_attr "type" "add")

That's the wrong attr type; it should be something vector.  "add" is
for GPRs.

> +   (set_attr "length" "4")])

You don't need to say the default length.


Segher


Re: [PATCH] Selftest framework (v7)

2016-06-07 Thread David Malcolm
On Mon, 2016-06-06 at 22:14 -0400, Trevor Saunders wrote:
> On Mon, Jun 06, 2016 at 11:57:49PM +0200, Jakub Jelinek wrote:
> > On Mon, Jun 06, 2016 at 05:53:50PM -0400, Trevor Saunders wrote:
> > > > > As far as I can 
> > > > > tell this just involves moving the start of namespace
> > > > > selftest
> > > > > upwards a 
> > > > > bit in the files where we have tests.
> > > > 
> > > > Yes, and it does seem cleaner to have all of the selftest code
> > > > start
> > > > like this:
> > > > 
> > > >   #if CHECKING_P
> > > 
> > > What are we gaining by ifdefing this? I would think on reasonable
> > > systems the compiler would optimize out the call to the selftests
> > > in
> > > release builds and then the linker would gc all the unused
> > > functions.
> > > Do we really care about code size in places that doesn't happen
> > > enough
> > > to go through this?
> > 
> > Not everyone is building the compiler with LTO, and if you don't,
> > then
> > how would you optimize that away?
> 
> -ffunction-sections -Wl,--gc-sections should be enough I think.  I
> guess
> we don't use those at the moment though.
> 
> > And yes, not having the self-tests, especially if they are going to
> > grow
> > further, in release compilers is desirable, especially if it would
> > be
> > intermixed with hot code.
> 
> That's fair, though turning on --gc-sections where we can should
> further
> help with that, and that should be more effective with
> -ffunction-sections -fdata-sections, so its seems to me like the
> right
> thing to do is add configure tests to enable those?  And then its
> more
> of a non issue?

I appreciate that you'd done a lot of work on eliminating preprocessor
use in gcc, and that we'd prefer to minimize the amount of #if code we
have - though it's relatively easy to test the with/without #if
CHECKING_P case (compared to all of the various target-specific
macros).

Historically gcc testing has largely been "black box" testing: run the
built programs with specific inputs and look for specific outputs. My
hope with -fself-tests is that we can build up our "white box" test
coverage to complement the above, with unit tests.  My favorite example
is the testing for gengtype that I posted as a followup, which gives us
some immediate test coverage that gengtype is working as expected,
which is hard to do using our traditional approach to testing.

I hope that we can add a lot more tests to -fself-test, in particular,
I want us to have unit tests for hot code, including code that's hidden
deep inside the implementation and that might normally get inlined
away.  To play Devil's Advocate: if we find we're able to do a release
build with the selftests enabled and that it isn't slowing down the
release build, does that imply we need more unit tests for our hottest
code?  (the counterargument being that the checked build still needs to
run in a bearable amount of time).

That said, I want -fself-test to always run quickly (e.g. less than a
second); let's not put anything slow in there.  Also, any unit tests
involving analyzing several gimple or RTL statements at once seem to be
easier to do via the gimple and RTL frontends that Prasad and I are
working on respectively.  (I think we can use -fself-test for unit
-testing implementation details of passes that involve one statement at
a time, but as soon as we start dealing with multiple statements and
control flow, that it's probably better to express it using a
gimple/RTL dump in DejaGnu form).


Hope the above sounds sane
Dave


[PATCH] Use __USER_LABEL_PREFIX__ in asm statement

2016-06-07 Thread H.J. Lu
A target may have a prefix in function symbol.  Update interrrupt tests
to use __USER_LABEL_PREFIX__ for function symbol in asm statement.

Tested on Linux and Darwin.  OK for trunk?

Thanks.

H.J.

* gcc.dg/guality/pr68037-1.c (ASMNAME): New.
(ASMNAME2): Likewise.
(main): Replace fn in asm statement with ASMNAME ("fn").
* gcc.dg/guality/pr68037-2.c: Likewise.
* gcc.dg/guality/pr68037-3.c: Likewise.
* gcc.dg/torture/pr68037-1.c: Likewise.
* gcc.dg/torture/pr68037-2.c: Likewise.
* gcc.dg/torture/pr68037-3.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/guality/pr68037-1.c 
b/gcc/testsuite/gcc.dg/guality/pr68037-1.c
index c3ea645..74f61ec 100644
--- a/gcc/testsuite/gcc.dg/guality/pr68037-1.c
+++ b/gcc/testsuite/gcc.dg/guality/pr68037-1.c
@@ -14,6 +14,8 @@ typedef unsigned int uword_t __attribute__ ((mode 
(__word__)));
 
 #define STRING(x)  XSTRING(x)
 #define XSTRING(x) #x
+#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
+#define ASMNAME2(prefix, cname) XSTRING (prefix) cname
 
 struct interrupt_frame
 {
@@ -53,7 +55,7 @@ main ()
push$" STRING (CS) ";   \
push$" STRING (IP) ";   \
push$" STRING (ERROR) ";\
-   jmp fn");
+   jmp  " ASMNAME ("fn"));
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.dg/guality/pr68037-2.c 
b/gcc/testsuite/gcc.dg/guality/pr68037-2.c
index 6f7e920..c3cd73d 100644
--- a/gcc/testsuite/gcc.dg/guality/pr68037-2.c
+++ b/gcc/testsuite/gcc.dg/guality/pr68037-2.c
@@ -13,6 +13,8 @@ typedef unsigned int uword_t __attribute__ ((mode 
(__word__)));
 
 #define STRING(x)  XSTRING(x)
 #define XSTRING(x) #x
+#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
+#define ASMNAME2(prefix, cname) XSTRING (prefix) cname
 
 struct interrupt_frame
 {
@@ -49,7 +51,7 @@ main ()
push$" STRING (FLAGS) ";\
push$" STRING (CS) ";   \
push$" STRING (IP) ";   \
-   jmp fn");
+   jmp  " ASMNAME ("fn"));
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.dg/guality/pr68037-3.c 
b/gcc/testsuite/gcc.dg/guality/pr68037-3.c
index 504a931..6e05472 100644
--- a/gcc/testsuite/gcc.dg/guality/pr68037-3.c
+++ b/gcc/testsuite/gcc.dg/guality/pr68037-3.c
@@ -16,6 +16,8 @@ typedef int aligned __attribute__((aligned(64)));
 
 #define STRING(x)  XSTRING(x)
 #define XSTRING(x) #x
+#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
+#define ASMNAME2(prefix, cname) XSTRING (prefix) cname
 
 struct interrupt_frame
 {
@@ -65,7 +67,7 @@ main ()
push$" STRING (FLAGS) ";\
push$" STRING (CS) ";   \
push$" STRING (IP) ";   \
-   jmp fn");
+   jmp  " ASMNAME ("fn"));
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.dg/torture/pr68037-1.c 
b/gcc/testsuite/gcc.dg/torture/pr68037-1.c
index 15fe08c..23d7c6f 100644
--- a/gcc/testsuite/gcc.dg/torture/pr68037-1.c
+++ b/gcc/testsuite/gcc.dg/torture/pr68037-1.c
@@ -14,6 +14,8 @@ typedef unsigned int uword_t __attribute__ ((mode 
(__word__)));
 
 #define STRING(x)  XSTRING(x)
 #define XSTRING(x) #x
+#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
+#define ASMNAME2(prefix, cname) XSTRING (prefix) cname
 
 struct interrupt_frame
 {
@@ -53,6 +55,6 @@ main ()
push$" STRING (CS) ";   \
push$" STRING (IP) ";   \
push$" STRING (ERROR) ";\
-   jmp fn");
+   jmp  " ASMNAME ("fn"));
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/torture/pr68037-2.c 
b/gcc/testsuite/gcc.dg/torture/pr68037-2.c
index 00ba7d4..18f9844 100644
--- a/gcc/testsuite/gcc.dg/torture/pr68037-2.c
+++ b/gcc/testsuite/gcc.dg/torture/pr68037-2.c
@@ -13,6 +13,8 @@ typedef unsigned int uword_t __attribute__ ((mode 
(__word__)));
 
 #define STRING(x)  XSTRING(x)
 #define XSTRING(x) #x
+#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
+#define ASMNAME2(prefix, cname) XSTRING (prefix) cname
 
 struct interrupt_frame
 {
@@ -49,6 +51,6 @@ main ()
push$" STRING (FLAGS) ";\
push$" STRING (CS) ";   \
push$" STRING (IP) ";   \
-   jmp fn");
+   jmp  " ASMNAME ("fn"));
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/torture/pr68037-3.c 
b/gcc/testsuite/gcc.dg/torture/pr68037-3.c
index abf8adb..86324f1 100644
--- a/gcc/testsuite/gcc.dg/torture/pr68037-3.c
+++ b/gcc/testsuite/gcc.dg/torture/pr68037-3.c
@@ -16,6 +16,8 @@ typedef int aligned __attribute__((aligned(64)));
 
 #define STRING(x)  XSTRING(x)
 #define XSTRING(x) #x
+#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
+#define ASMNAME2(prefix, cname) XSTRING (prefix) cname
 
 struct interrupt_frame
 {
@@ -65,6 +67,6 @@ main ()
push$" STRING (FLAGS) "; 

Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-07 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 02:43:37PM +0200, Christophe Lyon wrote:
> > No, why?  This test doesn't test whether the function has been vectorized.
> > It only tests whether it works.
> > And the check_vect () is supposed to exit early if some extra flags were
> > passed by vect.exp (like e.g. on i?86-linux -msse2) and the HW doesn't
> > support those.
> >
> But this makes the tests fails, rather than be unsupported, right?

check_vect is supposed to exit (0) if the HW doesn't support vectorization,
so the test would PASS.

Jakub


C PATCH for c/71426 (ICE with bogus parameter)

2016-06-07 Thread Marek Polacek
cc1 crashed on the following invalid code, because it tripped this assert.
The reason why b->nested was false for the FUNCTION_DECL 'x' was that when
doing pushdecl, we found an incompatible duplicate, so pushdecl just bound
'x' to FUNCTION_DECL 'x', replacing the old binding, without the TREE_PUBLIC
test that sets 'nested'.

At first I thought about removing the assert but in the end I decided to add
|| seen_error -- for invalid code we might do the same as for ERROR_MARK.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-06-07  Marek Polacek  

PR c/71426
* c-decl.c (get_parm_info): Don't crash on an assert on invalid
code.

* gcc.dg/noncompile/pr71426.c: New test.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index d79802e..8ceb8ba 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -7054,9 +7054,9 @@ get_parm_info (bool ellipsis, tree expr)
  break;
 
case FUNCTION_DECL:
- /*  FUNCTION_DECLs appear when there is an implicit function
- declaration in the parameter list.  */
- gcc_assert (b->nested);
+ /* FUNCTION_DECLs appear when there is an implicit function
+declaration in the parameter list.  */
+ gcc_assert (b->nested || seen_error ());
  goto set_shadowed;
 
case CONST_DECL:
diff --git gcc/testsuite/gcc.dg/noncompile/pr71426.c 
gcc/testsuite/gcc.dg/noncompile/pr71426.c
index e69de29..874e189 100644
--- gcc/testsuite/gcc.dg/noncompile/pr71426.c
+++ gcc/testsuite/gcc.dg/noncompile/pr71426.c
@@ -0,0 +1,5 @@
+/* PR c/71426 */
+/* { dg-do compile } */
+/* { dg-options "-w" } */
+
+int f (int x[x - x ()]); /* { dg-error "undeclared" } */

Marek


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-07 Thread Christophe Lyon
On 7 June 2016 at 11:42, Jakub Jelinek  wrote:
> On Tue, Jun 07, 2016 at 10:36:25AM +0100, Ramana Radhakrishnan wrote:
>> On Tue, Jun 7, 2016 at 10:28 AM, Jakub Jelinek  wrote:
>> > On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote:
>> >> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj  2016-06-03 
>> >> > 17:05:37.693475438 +0200
>> >> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 
>> >> > +0200
>> >> > @@ -0,0 +1,28 @@
>> >> > +/* PR tree-optimization/71259 */
>> >> > +/* { dg-do run } */
>> >> > +/* { dg-options "-O3" } */
>> >
>> > Would changing this from dg-options to dg-additional-options help for the
>> > ARM issues?
>> > check_vect () is the standard way for testing for HW vectorization support
>> > and hundreds of tests use it.
>>
>>
>> all tests in gcc.dg/vect have some form of dg-require-effective-target
>
> No, at least 170+ tests don't.
>
>> - so I think this test should just have dg-require-effective-target
>> "vect_int" .
>
> No, why?  This test doesn't test whether the function has been vectorized.
> It only tests whether it works.
> And the check_vect () is supposed to exit early if some extra flags were
> passed by vect.exp (like e.g. on i?86-linux -msse2) and the HW doesn't
> support those.
>
But this makes the tests fails, rather than be unsupported, right?

> Jakub


Re: [PATCH][vectorizer] Remove blank debug lines after dump_gimple_stmt

2016-06-07 Thread Alan Hayward

On 06/06/2016 18:56, "Jeff Law"  wrote:

>On 06/06/2016 06:46 AM, Alan Hayward wrote:
>> Lots of code calls dump_gimple_stmt then print a newline, however
>> dump_gimple_stmt will print a newline itself. This makes the vectorizer
>> debug
>> file messy. I think the confusion is because dump_generic_expr does NOT
>> print a
>> newline. This patch removes all prints of a newline direcly after a
>> dump_gimple_stmt.
>>
>> Tested by examining a selection of vect dump files.
>>
>>  gcc/
>>  \* tree-vect-data-refs.c (vect_analyze_data_refs): Remove debug
>>newline.
>>  \* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): likewise.
>>  (vect_can_advance_ivs_p): likewise.
>>  (vect_update_ivs_after_vectorizer): likewise.
>>  \* tree-vect-loop.c (vect_determine_vectorization_factor): likewise.
>>  (vect_analyze_scalar_cycles_1): likewise.
>>  (vect_analyze_loop_operations): likewise.
>>  (report_vect_op): likewise.
>>  (vect_is_slp_reduction): likewise.
>>  (vect_is_simple_reduction): likewise.
>>  (get_initial_def_for_induction): likewise.
>>  (vect_transform_loop): likewise.
>>  \* tree-vect-patterns.c (vect_recog_dot_prod_pattern): likewise.
>>  (vect_recog_sad_pattern): likewise.
>>  (vect_recog_widen_sum_pattern): likewise.
>>  (vect_recog_widening_pattern): likewise.
>>  (vect_recog_divmod_pattern): likewise.
>>  \* tree-vect-slp.c (vect-build-slp_tree_1): likewise.
>>  (vect_analyze_slp_instance): likewise.
>>  (vect_transform_slp_perm_load): likewise.
>>  (vect_schedule_slp_instance): likewise.
>Wouldn't you also need to verify that the testsuite passes since it
>could potentially have tests which assume the extra newlines in the
>test's regexps?
>
>So I'm fine with this patch once the testsuite has been verified to run
>without regressions on a target which exercises the vectorizer.
>

Sorry, I should have said that I have done a check run on x86 and aarch64,
and there were no regressions.


Alan.




Re: [PATCH] Fix PR61564 - optimize attribute/pragma accepting any option

2016-06-07 Thread Richard Biener
On Tue, 7 Jun 2016, Jakub Jelinek wrote:

> On Tue, Jun 07, 2016 at 10:15:39AM +0200, Richard Biener wrote:
> > 
> > This fixes PR61564 by diagnosing (and ignoring) options not marked with
> > 'Optimization' being applied to #pragma GCC optimize or via the
> > optimize attribute.
> > 
> > The reason is that while we save/restore option state for 'Optimize'
> > marked options we don't do that for other options.  Thus while such
> > options do not end up in the per-function optimize state applying them
> > still clobbers the global state.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > Ok for trunk?
> 
> Ok (though it surprises me we haven't done that from the beginning).

It exposed that gcc.dg/ipa/pr70646.c uses a non-Optimization option #pragma
(fixed) and that -ffast-math is not marked Optimization (but its
"components" are, likewise -funsafe-math-optimizations is).

Fixed both, bootstrapped / tested on x86_64-unknown-linux-gnu and applied.

Richard.

2016-06-07  Richard Biener  

PR c/61564
* c-common.c (parse_optimize_options): Only apply CL_OPTIMIZATION
options and warn about others.
* common.opt (ffast-math): Make Optimization.

* gcc.dg/Wpragmas-1.c: New testcase.
* gcc.dg/Wattributes-4.c: Likewise.
* gcc.dg/ipa/pr70646.c: Drop optimize pragma in favor of dg-option
entry.

Index: gcc/c-family/c-common.c
===
*** gcc/c-family/c-common.c (revision 237167)
--- gcc/c-family/c-common.c (working copy)
*** parse_optimize_options (tree args, bool
*** 9580,9585 
--- 9580,9608 
decode_cmdline_options_to_array_default_mask (opt_argc, opt_argv,
&decoded_options,
&decoded_options_count);
+   /* Drop non-Optimization options.  */
+   unsigned j = 1;
+   for (i = 1; i < decoded_options_count; ++i)
+ {
+   if (! (cl_options[decoded_options[i].opt_index].flags & 
CL_OPTIMIZATION))
+   {
+ ret = false;
+ if (attr_p)
+   warning (OPT_Wattributes,
+"bad option %s to optimize attribute",
+decoded_options[i].orig_option_with_args_text);
+ else
+   warning (OPT_Wpragmas,
+"bad option %s to pragma attribute",
+decoded_options[i].orig_option_with_args_text);
+ continue;
+   }
+   if (i != j)
+   decoded_options[j] = decoded_options[i];
+   j++;
+ }
+   decoded_options_count = j;
+   /* And apply them.  */
decode_options (&global_options, &global_options_set,
  decoded_options, decoded_options_count,
  input_location, global_dc);
Index: gcc/testsuite/gcc.dg/Wpragmas-1.c
===
*** gcc/testsuite/gcc.dg/Wpragmas-1.c   (revision 0)
--- gcc/testsuite/gcc.dg/Wpragmas-1.c   (working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do compile } */
+ 
+ #pragma GCC push_options
+ #pragma GCC optimize ("-fno-lto") /* { dg-warning "bad option" } */
+ int main(void){return 0;}
+ #pragma GCC pop_options
+ 
+ /* ???  The FEs still apply the pragma string as optimize attribute to
+all functions thus the diagnostic will be repeated for each function
+affected.  */
+ /* { dg-message "bad option" "" { target *-*-* } 0 } */
Index: gcc/testsuite/gcc.dg/Wattributes-4.c
===
*** gcc/testsuite/gcc.dg/Wattributes-4.c(revision 0)
--- gcc/testsuite/gcc.dg/Wattributes-4.c(working copy)
***
*** 0 
--- 1,3 
+ /* { dg-do compile } */
+ 
+ int __attribute__((optimize("no-lto"))) main(void){return 0;} /* { dg-warning 
"bad option" } */
Index: gcc/testsuite/gcc.dg/ipa/pr70646.c
===
*** gcc/testsuite/gcc.dg/ipa/pr70646.c  (revision 237165)
--- gcc/testsuite/gcc.dg/ipa/pr70646.c  (working copy)
***
*** 1,7 
  /* { dg-do run } */
! /* { dg-options "-O2" } */
! 
! #pragma GCC optimize("no-unit-at-a-time")
  
  typedef unsigned char u8;
  typedef unsigned long long u64;
--- 1,5 
  /* { dg-do run } */
! /* { dg-options "-O2 -fno-unit-at-a-time" } */
  
  typedef unsigned char u8;
  typedef unsigned long long u64;
Index: gcc/common.opt
===
*** gcc/common.opt  (revision 237165)
--- gcc/common.opt  (working copy)
*** EnumValue
*** 1287,1293 
  Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD)
  
  ffast-math
! Common
  
  ffat-lto-objects
  Common Var(flag_fat_lto_objects)
--- 1287,1293 
  Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD)
  
  ffast-math
! Common Optimization
  
  ffat-lto-objects
  Comm

Re: [PATCH, Fortran, OpenACC] Fix PR70598, Fortran host_data ICE

2016-06-07 Thread Chung-Lin Tang
Ping.

On 2016/5/11 02:57 AM, Bernhard Reutner-Fischer wrote:
> On May 9, 2016 4:26:50 PM GMT+02:00, Chung-Lin Tang  
> wrote:
>> Hi, this patch resolves an ICE for Fortran when using the OpenACC
>> host_data directive.  Actually, rather than say resolve, it's more like
>> adjusting the front-end to same middle-end restrictions as C/C++,
>> namely that we only support pointers or arrays for host_data right now.
>>
>> This patch contains a little bit of adjustments in
>> fortran/openmp.c:resolve_omp_clauses(),
>> and some testcase adjustments. This has been tested without regressions
>> for Fortran.
>>
>> Is this okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> 2015-05-09  Chung-Lin Tang  
>>
>>  gcc/
>>  * fortran/openmp.c (resolve_omp_clauses): Adjust use_device clause
>>  handling to only allow pointers and arrays.
> 
> Fortran has it's own ChangeLog. The patch itself looks somewhat plausible to 
> me, fwiw, but Jakub or a FE maintainer has the say.
> 



Re: [PATCH] Improve avx_vec_concat

2016-06-07 Thread Kirill Yukhin
Hi Jakub,
On 23 May 19:26, Jakub Jelinek wrote:
> Hi!
> 
> Not sure how to easily test these.
> In any case, for the vinsert* case, we don't have vinserti128 nor
> vinsertf128 in evex, so need to use vinsert[if]{64x4,32x4} or
> for DQ {64x2,32x8}.  For the case with zero in the other half,
> we need AVX512VL and it isn't guaranteed for the output operand,
> because it can be 512-bit mode too.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2016-05-23  Jakub Jelinek  
> 
>   * config/i386/sse.md (avx_vec_concat): Add v=v,vm and
>   Yv=Yv,C alternatives.
Patch is OK for trunk.

--
Thanks, K
> 
> --- gcc/config/i386/sse.md.jj 2016-05-23 15:42:49.0 +0200
> +++ gcc/config/i386/sse.md2016-05-23 16:25:58.434925572 +0200
> @@ -18178,10 +18178,10 @@ (define_insn "_
> (set_attr "mode" "")])
>  
>  (define_insn "avx_vec_concat"
> -  [(set (match_operand:V_256_512 0 "register_operand" "=x,x")
> +  [(set (match_operand:V_256_512 0 "register_operand" "=x,v,x,Yv")
>   (vec_concat:V_256_512
> -   (match_operand: 1 "register_operand" "x,x")
> -   (match_operand: 2 "vector_move_operand" "xm,C")))]
> +   (match_operand: 1 "register_operand" "x,v,x,v")
> +   (match_operand: 2 "vector_move_operand" 
> "xm,vm,C,C")))]
>"TARGET_AVX"
>  {
>switch (which_alternative)
> @@ -18189,6 +18189,22 @@ (define_insn "avx_vec_concat"
>  case 0:
>return "vinsert\t{$0x1, %2, %1, %0|%0, 
> %1, %2, 0x1}";
>  case 1:
> +  if ( == 64)
> + {
> +   if (TARGET_AVX512DQ && GET_MODE_SIZE (mode) == 4)
> + return "vinsert32x8\t{$0x1, %2, %1, 
> %0|%0, %1, %2, 0x1}";
> +   else
> + return "vinsert64x4\t{$0x1, %2, %1, 
> %0|%0, %1, %2, 0x1}";
> + }
> +  else
> + {
> +   if (TARGET_AVX512DQ && GET_MODE_SIZE (mode) == 8)
> + return "vinsert64x2\t{$0x1, %2, %1, 
> %0|%0, %1, %2, 0x1}";
> +   else
> + return "vinsert32x4\t{$0x1, %2, %1, 
> %0|%0, %1, %2, 0x1}";
> + }
> +case 2:
> +case 3:
>switch (get_attr_mode (insn))
>   {
>   case MODE_V16SF:
> @@ -18200,9 +18216,19 @@ (define_insn "avx_vec_concat"
>   case MODE_V4DF:
> return "vmovapd\t{%1, %x0|%x0, %1}";
>   case MODE_XI:
> -   return "vmovdqa\t{%1, %t0|%t0, %1}";
> +   if (which_alternative == 2)
> + return "vmovdqa\t{%1, %t0|%t0, %1}";
> +   else if (GET_MODE_SIZE (mode) == 8)
> + return "vmovdqa64\t{%1, %t0|%t0, %1}";
> +   else
> + return "vmovdqa32\t{%1, %t0|%t0, %1}";
>   case MODE_OI:
> -   return "vmovdqa\t{%1, %x0|%x0, %1}";
> +   if (which_alternative == 2)
> + return "vmovdqa\t{%1, %x0|%x0, %1}";
> +   else if (GET_MODE_SIZE (mode) == 8)
> + return "vmovdqa64\t{%1, %x0|%x0, %1}";
> +   else
> + return "vmovdqa32\t{%1, %x0|%x0, %1}";
>   default:
> gcc_unreachable ();
>   }
> @@ -18210,9 +18236,9 @@ (define_insn "avx_vec_concat"
>gcc_unreachable ();
>  }
>  }
> -  [(set_attr "type" "sselog,ssemov")
> -   (set_attr "prefix_extra" "1,*")
> -   (set_attr "length_immediate" "1,*")
> +  [(set_attr "type" "sselog,sselog,ssemov,ssemov")
> +   (set_attr "prefix_extra" "1,1,*,*")
> +   (set_attr "length_immediate" "1,1,*,*")
> (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
>  
> 
>   Jakub


RE: [PATCH][MIPS] P5600 scheduler fix

2016-06-07 Thread Robert Suchanek
Hi,

> > gcc/
> > * config/mips/p5600.md (p5600_fpu_fadd): Remove checking for
> > `fabs' and `fneg' type attributes.
> > (p5600_fpu_fabs): Add `fmove' to the comment.
> 
> OK.
> 
> Thanks,
> Matthew

Committed as r237173.

Regards,
Robert


Re: [Patch, avr] Fix PR 71151

2016-06-07 Thread Senthil Kumar Selvaraj

Georg-Johann Lay writes:

> Senthil Kumar Selvaraj schrieb:
>> Hi,
>> 
>>   This patch fixes PR 71151 by eliminating the
>>   TARGET_ASM_FUNCTION_RODATA_SECTION hook and setting
>>   JUMP_TABLES_IN_TEXT_SECTION to 1.
>> 
>>   As described in the bugzilla entry, this hook assumed it will get
>>   called only for jumptable rodata for functions. This was true until
>>   6.1, when a commit in varasm.c started calling the hook for mergeable
>>   string/constant data as well.
>> 
>>   This resulted in string constants ending up in a section intended for
>>   jumptables (flash), and broke code using those constants, which
>>   expects them to be present in rodata (SRAM).
>> 
>>   Given that the original reason for placing jumptables in a section was
>>   fixed by Johann in PR 63323, this patch restores the original
>>   behavior. Reg testing on both gcc-6-branch and trunk showed no regressions.
>
> Just for the record:
>
> The intention for jump-tables in function-rodata-section was to get 
> fine-grained section for the tables so that --gc-sections and 
> -ffunction-sections not only gc unused functions but also unused 
> jump-tables.  As these tables had to reside in the lowest 64KiB of flash 
> (.progmem section) neither .rodata nor .text was a correct placement, 
> hence the hacking in TARGET_ASM_FUNCTION_RODATA_SECTION.
>
> Before using TARGET_ASM_FUNCTION_RODATA_SECTION, all jump tables were 
> put into .progmem.gcc_sw_table by ASM_OUTPUT_BEFORE_CASE_LABEL switching 
> to that section.
>
> We actually never had fump-tables in .text before...

JUMP_TABLES_IN_TEXT_SECTION was 1 before r37465 - that was when the
progmem.gcc_sw_table section was introduced. But yes, I understand that
the target hook for FUNCTION_RODATA_SECTION was done to get them gc'ed
along with the code.

>
> The purpose of PR63323 was to have more generic jump-table 
> implementation that also works if the table does NOT reside in the lower 
> 64KiB.  This happens when moving whole whole TEXT section around like 
> for a bootloader.

Understood.
>
>>   As pointed out by Johann, this may end up increasing code
>>   size if there are lots of branches that cross the jump tables. I
>>   intend to propose a separate patch that gives additional information
>>   to the target hook (SECCAT_RODATA_{STRING,JUMPTABLE}) so it can know
>>   what type of function rodata is coming on. Johann also suggested
>>   handling jump table generation ourselves - I'll experiment with that
>>   some more.
>> 
>>   If ok, could someone commit please? Could you also backport to
>>   gcc-6-branch?
>> 
>> Regards
>> Senthil
>> 
>> gcc/ChangeLog
>> 
>> 2016-06-03  Senthil Kumar Selvaraj  
>> 
>>  * config/avr/avr.c (avr_asm_function_rodata_section): Remove.
>>  * config/avr/avr.c (TARGET_ASM_FUNCTION_RODATA_SECTION): Remove.
>> 
>> gcc/testsuite/ChangeLog
>> 
>> 2016-06-03  Senthil Kumar Selvaraj  
>> 
>>  * gcc/testsuite/gcc.target/avr/pr71151-1.c: New.
>>  * gcc/testsuite/gcc.target/avr/pr71151-2.c: New.
>> 
>> diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
>> index ba5cd91..3cb8cb7 100644
>> --- gcc/config/avr/avr.c
>> +++ gcc/config/avr/avr.c
>> @@ -9488,65 +9488,6 @@ avr_asm_init_sections (void)
>>  }
>>  
>>  
>> -/* Implement `TARGET_ASM_FUNCTION_RODATA_SECTION'.  */
>> -
>> -static section*
>> -avr_asm_function_rodata_section (tree decl)
>> -{
>> -  /* If a function is unused and optimized out by -ffunction-sections
>> - and --gc-sections, ensure that the same will happen for its jump
>> - tables by putting them into individual sections.  */
>> -
>> -  unsigned int flags;
>> -  section * frodata;
>> -
>> -  /* Get the frodata section from the default function in varasm.c
>> - but treat function-associated data-like jump tables as code
>> - rather than as user defined data.  AVR has no constant pools.  */
>> -  {
>> -int fdata = flag_data_sections;
>> -
>> -flag_data_sections = flag_function_sections;
>> -frodata = default_function_rodata_section (decl);
>> -flag_data_sections = fdata;
>> -flags = frodata->common.flags;
>> -  }
>> -
>> -  if (frodata != readonly_data_section
>> -  && flags & SECTION_NAMED)
>> -{
>> -  /* Adjust section flags and replace section name prefix.  */
>> -
>> -  unsigned int i;
>> -
>> -  static const char* const prefix[] =
>> -{
>> -  ".rodata",  ".progmem.gcc_sw_table",
>> -  ".gnu.linkonce.r.", ".gnu.linkonce.t."
>> -};
>> -
>> -  for (i = 0; i < sizeof (prefix) / sizeof (*prefix); i += 2)
>> -{
>> -  const char * old_prefix = prefix[i];
>> -  const char * new_prefix = prefix[i+1];
>> -  const char * name = frodata->named.name;
>> -
>> -  if (STR_PREFIX_P (name, old_prefix))
>> -{
>> -  const char *rname = ACONCAT ((new_prefix,
>> -name + strlen (old_prefix), 
>> NULL));
>> -  flags &= ~SECTION_C

Re: [PATCH, i386, AVX-512] Add vectorizer support builtins

2016-06-07 Thread Kirill Yukhin
On 02 Jun 17:17, Ilya Verbin wrote:
> On Mon, May 23, 2016 at 19:11:53 +0300, Ilya Verbin wrote:
> > This patch adds missed 512-bit rounding builtins for vectorization.
> > Regtested on x86_64-linux and i686-linux.  OK for trunk?
> > 
> > gcc/
> > * config/i386/i386-builtin-types.def: Add V16SI_FTYPE_V16SF,
> > V8DF_FTYPE_V8DF_ROUND, V16SF_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF_ROUND.
> > * config/i386/i386.c (enum ix86_builtins): Add
> > IX86_BUILTIN_CVTPS2DQ512_MASK, IX86_BUILTIN_FLOORPS512,
> > IX86_BUILTIN_FLOORPD512, IX86_BUILTIN_CEILPS512, IX86_BUILTIN_CEILPD512,
> > IX86_BUILTIN_TRUNCPS512, IX86_BUILTIN_TRUNCPD512,
> > IX86_BUILTIN_CVTPS2DQ512, IX86_BUILTIN_VEC_PACK_SFIX512,
> > IX86_BUILTIN_FLOORPS_SFIX512, IX86_BUILTIN_CEILPS_SFIX512,
> > IX86_BUILTIN_ROUNDPS_AZ_SFIX512.
> > (builtin_description bdesc_args): Add __builtin_ia32_floorps512,
> > __builtin_ia32_ceilps512, __builtin_ia32_truncps512,
> > __builtin_ia32_floorpd512, __builtin_ia32_ceilpd512,
> > __builtin_ia32_truncpd512, __builtin_ia32_cvtps2dq512,
> > __builtin_ia32_vec_pack_sfix512, __builtin_ia32_roundps_az_sfix512,
> > __builtin_ia32_floorps_sfix512, __builtin_ia32_ceilps_sfix512.
> > Change IX86_BUILTIN_CVTPS2DQ512 to IX86_BUILTIN_CVTPS2DQ512_MASK for
> > __builtin_ia32_cvtps2dq512_mask.
> > (ix86_expand_args_builtin): Handle V8DF_FTYPE_V8DF_ROUND,
> > V16SF_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF.
> > (ix86_builtin_vectorized_function): Handle builtins mentioned above.
> > * config/i386/sse.md
> > (avx512f_fix_notruncv16sfv16si):
> > Rename to ...
> > (avx512f_fix_notruncv16sfv16si): ... this.
> > (avx512f_cvtpd2dq512): Rename
> > to ...
> > (avx512f_cvtpd2dq512): ... this.
> > (avx512f_vec_pack_sfix_v8df): New define_expand.
> > (avx512f_roundpd512): Rename to ...
> > (avx512f_round512): ... this.  Change iterator.
> > (avx512f_roundps512_sfix): New define_expand.
> > (round2_sfix): Change iterator.
> > gcc/testsuite/
> > * gcc.target/i386/avx512f-ceil-vec-1.c: New test.
> > * gcc.target/i386/avx512f-ceil-vec-2.c: New test.
> > * gcc.target/i386/avx512f-ceilf-sfix-vec-1.c: New test.
> > * gcc.target/i386/avx512f-ceilf-sfix-vec-2.c: New test.
> > * gcc.target/i386/avx512f-ceilf-vec-1.c: New test.
> > * gcc.target/i386/avx512f-ceilf-vec-2.c: New test.
> > * gcc.target/i386/avx512f-floor-vec-1.c: New test.
> > * gcc.target/i386/avx512f-floor-vec-2.c: New test.
> > * gcc.target/i386/avx512f-floorf-sfix-vec-1.c: New test.
> > * gcc.target/i386/avx512f-floorf-sfix-vec-2.c: New test.
> > * gcc.target/i386/avx512f-floorf-vec-1.c: New test.
> > * gcc.target/i386/avx512f-floorf-vec-2.c: New test.
> > * gcc.target/i386/avx512f-rint-sfix-vec-1.c: New test.
> > * gcc.target/i386/avx512f-rint-sfix-vec-2.c: New test.
> > * gcc.target/i386/avx512f-rintf-sfix-vec-1.c: New test.
> > * gcc.target/i386/avx512f-rintf-sfix-vec-2.c: New test.
> > * gcc.target/i386/avx512f-round-sfix-vec-1.c: New test.
> > * gcc.target/i386/avx512f-round-sfix-vec-2.c: New test.
> > * gcc.target/i386/avx512f-roundf-sfix-vec-1.c: New test.
> > * gcc.target/i386/avx512f-roundf-sfix-vec-2.c: New test.
> > * gcc.target/i386/avx512f-trunc-vec-1.c: New test.
> > * gcc.target/i386/avx512f-trunc-vec-2.c: New test.
> > * gcc.target/i386/avx512f-truncf-vec-1.c: New test.
> > * gcc.target/i386/avx512f-truncf-vec-2.c: New test.
> 
> Is it OK for gcc-6-branch?
OK.
> 
>   -- Ilya
--
Thanks, K


OpenACC wait clause

2016-06-07 Thread Jakub Jelinek
Hi!

I've noticed
  if ((mask & OMP_CLAUSE_WAIT)
  && !c->wait
  && gfc_match ("wait") == MATCH_YES)
{
  c->wait = true; 
  match_oacc_expr_list (" (", &c->wait_list, false);
  continue;
}
which looks just weird and confusing.  Why isn't this instead:
  if ((mask & OMP_CLAUSE_WAIT)
  && !c->wait
  && (match_oacc_expr_list ("wait (", &c->wait_list, false)
  == MATCH_YES))
{
  c->wait = true; 
  continue;
}
?  Otherwise you happily accept wait without following (, perhaps even
combined with another clause without any space in between etc.

Jakub


Re: [PATCH 2/2][AArch64] Tests of AAPCS64 updates for alignment attribute

2016-06-07 Thread James Greenhalgh
On Fri, Jan 15, 2016 at 02:17:43PM +, Alan Lawrence wrote:
> Here I've added both tests using the abitest.h framework(which verifies values
> are passed in the correct registers as specified by the AAPCS64), and separate
> tests which verify that called functions read arguments from the same 
> locations
> as they are passed. Hence, each test_align-N.c corresponds to rec_align-N.c.
> 
> I've tried to stay consistent with the existing naming of e.g. test_10.c,
> test_align-1.c, va_arg-10.c, but would be happy to change to another scheme
> if preferred! (e.g. func-ret-1.c ?)
> 
> Cheers, Alan

These tests are OK.

I'll commit them alongside patch 1/2 from this series tomorrow.

Thanks,
James

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/aapcs64/aapcs64.exp: Also execute rec_*.c
>   * gcc.target//aarch64/aapcs64/rec_align-5.c: New.
>   * gcc.target//aarch64/aapcs64/rec_align-6.c: New.
>   * gcc.target//aarch64/aapcs64/rec_align-7.c: New.
>   * gcc.target//aarch64/aapcs64/rec_align-8.c: New.
>   * gcc.target//aarch64/aapcs64/rec_align-9.c: New.
>   * gcc.target//aarch64/aapcs64/test_align-5.c: New.
>   * gcc.target//aarch64/aapcs64/test_align-6.c: New.
>   * gcc.target//aarch64/aapcs64/test_align-7.c: New.
>   * gcc.target//aarch64/aapcs64/test_align-8.c: New.
>   * gcc.target//aarch64/aapcs64/test_align-9.c: New.
>   * gcc.target//aarch64/aapcs64/rec_vaarg-1.c: New.
>   * gcc.target//aarch64/aapcs64/rec_vaarg-2.c: New.



Re: [PATCH 1/2][AArch64] Implement AAPCS64 updates for alignment attribute

2016-06-07 Thread James Greenhalgh
On Fri, Jan 22, 2016 at 05:16:00PM +, Alan Lawrence wrote:
> 
> On 21/01/16 17:23, Alan Lawrence wrote:
> > On 18/01/16 17:10, Eric Botcazou wrote:
> >>
> >> Could you post the list of files that differ?  How do they differ exactly?
> >
> > Hmmm. Well, I definitely had this failing to bootstrap once. I repeated 
> > that, to
> > try to identify exactly what the differences wereand it succeeded even 
> > with
> > my pre-AAPCS64-update host compiler. So, this is probably a false alarm; I'm
> > bootstrapping again, after a rebase, to make sure...
> >
> > --Alan
> 
> Ok, rebased onto a more recent build, and bootstrapping with Ada posed no
> problems. Sorry for the noise.
> 
> However, I had to drop the assert that TYPE_FIELDS was non-null because of 
> some
> C++ testcases.
> 
> Is this version OK for trunk?

Now that we're in GCC7, this version of the patch is OK for trunk.

From my reading of Richard's AAPCS update, this patch implements the
rules as required.

I'll give this a day for any last minute comments from Richard/Marcus,
then commit this on your behalf tomorrow.

Thanks,
James

> gcc/ChangeLog:
> 
>   * gcc/config/aarch64/aarch64.c (aarch64_function_arg_alignment):
>   Rewrite, looking one level down for records and arrays.
> ---
>  gcc/config/aarch64/aarch64.c | 31 ---
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9142ac0..b084f83 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1925,22 +1925,23 @@ aarch64_vfp_is_call_candidate (cumulative_args_t 
> pcum_v, machine_mode mode,
>  static unsigned int
>  aarch64_function_arg_alignment (machine_mode mode, const_tree type)
>  {
> -  unsigned int alignment;
> +  if (!type)
> +return GET_MODE_ALIGNMENT (mode);
> +  if (integer_zerop (TYPE_SIZE (type)))
> +return 0;
>  
> -  if (type)
> -{
> -  if (!integer_zerop (TYPE_SIZE (type)))
> - {
> -   if (TYPE_MODE (type) == mode)
> - alignment = TYPE_ALIGN (type);
> -   else
> - alignment = GET_MODE_ALIGNMENT (mode);
> - }
> -  else
> - alignment = 0;
> -}
> -  else
> -alignment = GET_MODE_ALIGNMENT (mode);
> +  gcc_assert (TYPE_MODE (type) == mode);
> +
> +  if (!AGGREGATE_TYPE_P (type))
> +return TYPE_ALIGN (TYPE_MAIN_VARIANT (type));
> +
> +  if (TREE_CODE (type) == ARRAY_TYPE)
> +return TYPE_ALIGN (TREE_TYPE (type));
> +
> +  unsigned int alignment = 0;
> +
> +  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> +alignment = std::max (alignment, DECL_ALIGN (field));
>  
>return alignment;
>  }
> -- 
> 1.9.1
> 



Re: [PATCH, aarch64] Fix 70048

2016-06-07 Thread Renlin Li

Hi Richard,

On 16/03/16 21:25, Richard Henderson wrote:

This fixes only the regression described in the PR.

There was quite a bit of follow-up that points to new work that ought to
be done during the gcc7 cycle, but isn't really appropriate now.

Tested on aarch64-linux; committed as reviewed in the PR.


r~



@@ -4953,74 +4963,43 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
machine_mode mode)

   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
 {
-  HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
-  HOST_WIDE_INT base_offset;
+  rtx base = XEXP (x, 0);
+  rtx offset_rtx XEXP (x, 1);




I recently read the aarch64_legitimize_address function, and find a 
suspicious line of code in the above change.


>> + rtx offset_rtx XEXP (x, 1);

It's committed by you. It looks like a typo, and an assignment seems 
missing?


James suggests me this is c++ initialization. Ah, yes it is! But I 
believe this is an coincident?

As you have different initialization code above.

I made an obvious patch to make it looks more intuitive, is it Okay?


Regards,
Renlin Li




gcc/changelog:

2016-06-06  renlin li  

* config/aarch64/aarch64.c (aarch64_legitimize_address): Add assignment.
commit 1fd77baf4ca918ed25dbce4678d7be7b7cd51be2
Author: Renlin Li 
Date:   Mon Jun 6 11:24:39 2016 +0100

fix type

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ad07fe1..54e6813 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4949,7 +4949,7 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
 {
   rtx base = XEXP (x, 0);
-  rtx offset_rtx XEXP (x, 1);
+  rtx offset_rtx = XEXP (x, 1);
   HOST_WIDE_INT offset = INTVAL (offset_rtx);
 
   if (GET_CODE (base) == PLUS)


Re: [PATCH] spellcheck.c: add test_find_closest_string

2016-06-07 Thread Bernd Schmidt

On 06/06/2016 11:03 PM, David Malcolm wrote:

This adds another test case to -fself-test.


Ok.


Bernd



Re: [PATCH] Add selftest for pretty-print.c

2016-06-07 Thread Bernd Schmidt

On 06/06/2016 11:28 PM, David Malcolm wrote:

+  assert_pp_format ("0xcafebabe", "%wx", (HOST_WIDE_INT)0xcafebabe);


More interesting tests would be to have multiple arguments to test that 
we really used the right size for the varargs. Maybe append a single %d 
arg with a unique bit pattern to all of them?


Otherwise ok.


Bernd


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-07 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 10:36:25AM +0100, Ramana Radhakrishnan wrote:
> On Tue, Jun 7, 2016 at 10:28 AM, Jakub Jelinek  wrote:
> > On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote:
> >> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj  2016-06-03 
> >> > 17:05:37.693475438 +0200
> >> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 
> >> > +0200
> >> > @@ -0,0 +1,28 @@
> >> > +/* PR tree-optimization/71259 */
> >> > +/* { dg-do run } */
> >> > +/* { dg-options "-O3" } */
> >
> > Would changing this from dg-options to dg-additional-options help for the
> > ARM issues?
> > check_vect () is the standard way for testing for HW vectorization support
> > and hundreds of tests use it.
> 
> 
> all tests in gcc.dg/vect have some form of dg-require-effective-target

No, at least 170+ tests don't.

> - so I think this test should just have dg-require-effective-target
> "vect_int" .

No, why?  This test doesn't test whether the function has been vectorized.
It only tests whether it works.
And the check_vect () is supposed to exit early if some extra flags were
passed by vect.exp (like e.g. on i?86-linux -msse2) and the HW doesn't
support those.

Jakub


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-07 Thread Ramana Radhakrishnan
On Tue, Jun 7, 2016 at 10:28 AM, Jakub Jelinek  wrote:
> On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote:
>> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj  2016-06-03 
>> > 17:05:37.693475438 +0200
>> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 +0200
>> > @@ -0,0 +1,28 @@
>> > +/* PR tree-optimization/71259 */
>> > +/* { dg-do run } */
>> > +/* { dg-options "-O3" } */
>
> Would changing this from dg-options to dg-additional-options help for the
> ARM issues?
> check_vect () is the standard way for testing for HW vectorization support
> and hundreds of tests use it.


all tests in gcc.dg/vect have some form of dg-require-effective-target
- so I think this test should just have dg-require-effective-target
"vect_int" .


Ramana

>
>> > +/* { dg-additional-options "-mavx" { target avx_runtime } } */
>> > +
>> > +#include "tree-vect.h"
>> > +
>> > +long a, b[1][44][2];
>> > +long long c[44][17][2];
>> > +
>> > +int
>> > +main ()
>> > +{
>> > +  int i, j, k;
>> > +  check_vect ();
>> > +  asm volatile ("" : : : "memory");
>> > +  for (i = 0; i < 44; i++)
>> > +for (j = 0; j < 17; j++)
>> > +  for (k = 0; k < 2; k++)
>> > +   c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - 
>> > 5105075050047261684;
>> > +  asm volatile ("" : : : "memory");
>> > +  for (i = 0; i < 44; i++)
>> > +for (j = 0; j < 17; j++)
>> > +  for (k = 0; k < 2; k++)
>> > +   if (c[i][j][k] != -5105075050047261684)
>> > + __builtin_abort ();
>> > +  return 0;
>> > +}
>> >
>>
>> This new test fails on ARM targets where the default FPU is not Neon like.
>> The error message I'm seeing is:
>> In file included from
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr71259.c:6:0:
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:
>> In function 'check_vect':
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:65:5:
>> error: inconsistent operand constraints in an 'asm'
>>
>> Well, the same error message actually appears with other tests, I did
>> notice this one because
>> it is a new one.
>>
>> The arm code is:
>> /* On some processors without NEON support, this instruction may
>>be a no-op, on others it may trap, so check that it executes
>>correctly.  */
>> long long a = 0, b = 1;
>> asm ("vorr %P0, %P1, %P2"
>>  : "=w" (a)
>>  : "0" (a), "w" (b));
>>
>> ... which has been here since 2007 :(
>>
>> IIUC, its purpose is to check Neon availability, but this makes the
>> tests fail instead of
>> being unsupported.
>>
>> Why not use an effective-target check instead?
>
> Jakub


Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-07 Thread Rainer Orth
Alan Hayward  writes:

> On 05/06/2016 12:00, "Andreas Schwab"  wrote:
>
>>Alan Hayward  writes:
>>
>>> * gcc.dg/vect/vect-live-2.c: New test.
>>
>>This test fails on powerpc64 (with -m64, but not with -m32):
>>
>>$ grep 'vectorized.*loops' ./vect-live-2.c.149t.vect
>>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:10:1: note: vectorized 0 loops
>>in function.
>>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:29:1: note: vectorized 0 loops
>>in function.
>>
>>
>
> "note: not vectorized: relevant stmt not supported: _1 = (long unsigned
> int) j_24;"
>
>
> This is failing because power does not support vectorising a cast from int
> to long.
> (It works on power 32bit because longs are 32bit and therefore no need to
> cast).
>
> Can someone please suggest a target-supports define (or another method) I
> can use to
> disable this test for power 64bit (but not 32bit) ?
> I tried using vect_multiple_sizes, but that will also disable the test on
> x86 without
> avx.

I'm also seeing new FAILs on Solaris/SPARC:

+FAIL: gcc.dg/vect/vect-live-2.c -flto -ffat-lto-objects  scan-tree-dump-times v
ect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-live-2.c scan-tree-dump-times vect "vectorized 1 loops" 
1

32- and 64-bit:

vect-live-2.c:16:3: note: not vectorized: relevant stmt not supported: _2 = 
j.0_1 * 4;
vect-live-2.c:48:7: note: not vectorized: control flow in loop.
vect-live-2.c:35:3: note: not vectorized: loop contains function calls or data 
references that cannot be analyzed

and

+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 4
+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 1 loops" 4
+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorizing stmts using SLP" 4
+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect 
"vec_stmt_relevant_p: stmt live but not relevant" 4
+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect "vectorized 1 
loops" 4
+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect "vectorizing 
stmts using SLP" 4

vect-live-slp-3.c:29:1: note: not vectorized: no vectype for stmt: n0_29 = *_4;
vect-live-slp-3.c:30:1: note: not vectorized: no vectype for stmt: n0_29 = *_4;
vect-live-slp-3.c:31:1: note: not vectorized: no vectype for stmt: n0_29 = *_4;
vect-live-slp-3.c:32:1: note: not vectorized: no vectype for stmt: n0_29 = *_4;
vect-live-slp-3.c:62:4: note: not vectorized: control flow in loop.
vect-live-slp-3.c:45:3: note: not vectorized: loop contains function calls or 
data references that cannot be analyzed

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-07 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote:
> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj  2016-06-03 
> > 17:05:37.693475438 +0200
> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 +0200
> > @@ -0,0 +1,28 @@
> > +/* PR tree-optimization/71259 */
> > +/* { dg-do run } */
> > +/* { dg-options "-O3" } */

Would changing this from dg-options to dg-additional-options help for the
ARM issues?
check_vect () is the standard way for testing for HW vectorization support
and hundreds of tests use it.

> > +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> > +
> > +#include "tree-vect.h"
> > +
> > +long a, b[1][44][2];
> > +long long c[44][17][2];
> > +
> > +int
> > +main ()
> > +{
> > +  int i, j, k;
> > +  check_vect ();
> > +  asm volatile ("" : : : "memory");
> > +  for (i = 0; i < 44; i++)
> > +for (j = 0; j < 17; j++)
> > +  for (k = 0; k < 2; k++)
> > +   c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - 
> > 5105075050047261684;
> > +  asm volatile ("" : : : "memory");
> > +  for (i = 0; i < 44; i++)
> > +for (j = 0; j < 17; j++)
> > +  for (k = 0; k < 2; k++)
> > +   if (c[i][j][k] != -5105075050047261684)
> > + __builtin_abort ();
> > +  return 0;
> > +}
> >
> 
> This new test fails on ARM targets where the default FPU is not Neon like.
> The error message I'm seeing is:
> In file included from
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr71259.c:6:0:
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:
> In function 'check_vect':
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:65:5:
> error: inconsistent operand constraints in an 'asm'
> 
> Well, the same error message actually appears with other tests, I did
> notice this one because
> it is a new one.
> 
> The arm code is:
> /* On some processors without NEON support, this instruction may
>be a no-op, on others it may trap, so check that it executes
>correctly.  */
> long long a = 0, b = 1;
> asm ("vorr %P0, %P1, %P2"
>  : "=w" (a)
>  : "0" (a), "w" (b));
> 
> ... which has been here since 2007 :(
> 
> IIUC, its purpose is to check Neon availability, but this makes the
> tests fail instead of
> being unsupported.
> 
> Why not use an effective-target check instead?

Jakub


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-07 Thread Christophe Lyon
Hi Jakub,


On 3 June 2016 at 19:33, Jakub Jelinek  wrote:
> On Tue, Jan 12, 2016 at 05:21:37PM +0300, Ilya Enkovich wrote:
>> > --- gcc/tree-vect-slp.c.jj  2016-01-08 21:45:57.0 +0100
>> > +++ gcc/tree-vect-slp.c 2016-01-11 12:07:19.633366712 +0100
>> > @@ -2999,12 +2999,9 @@ vect_get_constant_vectors (tree op, slp_
>> >   gimple *init_stmt;
>> >   if (VECTOR_BOOLEAN_TYPE_P (vector_type))
>> > {
>> > - gcc_assert (fold_convertible_p (TREE_TYPE 
>> > (vector_type),
>> > - op));
>> > + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op)));
>> >   init_stmt = gimple_build_assign (new_temp, NOP_EXPR, 
>> > op);
>>
>> In vect_init_vector we had to introduce COND_EXPR to choose between 0 and -1 
>> for
>> boolean vectors.  Shouldn't we do similar in SLP?
>
> Apparently the answer to this is YES.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6.2?
>
> 2016-06-03  Jakub Jelinek  
>
> PR tree-optimization/71259
> * tree-vect-slp.c (vect_get_constant_vectors): For
> VECTOR_BOOLEAN_TYPE_P, return all ones constant instead of
> one for constant op, and use COND_EXPR for non-constant.
>
> * gcc.dg/vect/pr71259.c: New test.
>
> --- gcc/tree-vect-slp.c.jj  2016-05-24 10:56:02.0 +0200
> +++ gcc/tree-vect-slp.c 2016-06-03 17:01:12.740955935 +0200
> @@ -3056,7 +3056,7 @@ vect_get_constant_vectors (tree op, slp_
>   if (integer_zerop (op))
> op = build_int_cst (TREE_TYPE (vector_type), 0);
>   else if (integer_onep (op))
> -   op = build_int_cst (TREE_TYPE (vector_type), 1);
> +   op = build_all_ones_cst (TREE_TYPE (vector_type));
>   else
> gcc_unreachable ();
> }
> @@ -3071,8 +3071,14 @@ vect_get_constant_vectors (tree op, slp_
>   gimple *init_stmt;
>   if (VECTOR_BOOLEAN_TYPE_P (vector_type))
> {
> + tree true_val
> +   = build_all_ones_cst (TREE_TYPE (vector_type));
> + tree false_val
> +   = build_zero_cst (TREE_TYPE (vector_type));
>   gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op)));
> - init_stmt = gimple_build_assign (new_temp, NOP_EXPR, 
> op);
> + init_stmt = gimple_build_assign (new_temp, COND_EXPR,
> +  op, true_val,
> +  false_val);
> }
>   else
> {
> --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj  2016-06-03 17:05:37.693475438 
> +0200
> +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 +0200
> @@ -0,0 +1,28 @@
> +/* PR tree-optimization/71259 */
> +/* { dg-do run } */
> +/* { dg-options "-O3" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +long a, b[1][44][2];
> +long long c[44][17][2];
> +
> +int
> +main ()
> +{
> +  int i, j, k;
> +  check_vect ();
> +  asm volatile ("" : : : "memory");
> +  for (i = 0; i < 44; i++)
> +for (j = 0; j < 17; j++)
> +  for (k = 0; k < 2; k++)
> +   c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - 
> 5105075050047261684;
> +  asm volatile ("" : : : "memory");
> +  for (i = 0; i < 44; i++)
> +for (j = 0; j < 17; j++)
> +  for (k = 0; k < 2; k++)
> +   if (c[i][j][k] != -5105075050047261684)
> + __builtin_abort ();
> +  return 0;
> +}
>

This new test fails on ARM targets where the default FPU is not Neon like.
The error message I'm seeing is:
In file included from
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr71259.c:6:0:
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:
In function 'check_vect':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:65:5:
error: inconsistent operand constraints in an 'asm'

Well, the same error message actually appears with other tests, I did
notice this one because
it is a new one.

The arm code is:
/* On some processors without NEON support, this instruction may
   be a no-op, on others it may trap, so check that it executes
   correctly.  */
long long a = 0, b = 1;
asm ("vorr %P0, %P1, %P2"
 : "=w" (a)
 : "0" (a), "w" (b));

... which has been here since 2007 :(

IIUC, its purpose is to check Neon availability, but this makes the
tests fail instead of
being unsupported.

Why not use an effective-target check instead?

Christophe.


>
> Jakub


Re: Unreviewed patches

2016-06-07 Thread Rainer Orth
Hi Gerald,

> On Mon, 6 Jun 2016, Rainer Orth wrote:
>> The following patches have remained unreviewed for a week:
>> 
>>  [gotools, libcc1] Update copyright dates
>> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02307.html
>> 
>> Richard already approved the update-copyright.py changes, but the actual
>> effects on gotools and libcc1 require either maintainer or release
>> manager approval, I believe.
>
> I think applying those updates is a direct consequence of updating
> the update-copyright.py script and you can just go ahead.

done now.  I'd just hoped to get word from the RMs what to do about
default_dirs in the script so the same doesn't happen again next year.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] integer overflow checking builtins in constant expressions

2016-06-07 Thread Marek Polacek
On Mon, Jun 06, 2016 at 09:44:08PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> On Mon, Jun 06, 2016 at 02:36:17PM +0200, Jakub Jelinek wrote:
> > 2016-06-06  Martin Sebor  
> > Jakub Jelinek  
> > 
> > PR c++/70507
> > PR c/68120
> > * builtins.def (BUILT_IN_ADD_OVERFLOW_P, BUILT_IN_SUB_OVERFLOW_P,
> > BUILT_IN_MUL_OVERFLOW_P): New builtins.
> > * builtins.c: Include gimple-fold.h.
> > (fold_builtin_arith_overflow): Handle
> > BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
> > (fold_builtin_3): Likewise.
> > * doc/extend.texi (Integer Overflow Builtins): Document
> > __builtin_{add,sub,mul}_overflow_p.
> > gcc/c/
> > * c-typeck.c (convert_arguments): Don't promote last argument
> > of BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
> > gcc/cp/
> > * constexpr.c: Include gimple-fold.h.
> > (cxx_eval_internal_function): New function.
> > (cxx_eval_call_expression): Call it.
> > (potential_constant_expression_1): Handle integer arithmetic
> > overflow built-ins.
> > * tree.c (builtin_valid_in_constant_expr_p): Likewise.
> > gcc/c-family/
> > * c-common.c (check_builtin_function_arguments): Handle
> > BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P.
> > gcc/testsuite/
> > * c-c++-common/builtin-arith-overflow-1.c: Add test cases.
> > * c-c++-common/builtin-arith-overflow-2.c: New test.
> > * g++.dg/cpp0x/constexpr-arith-overflow.C: New test.
> > * g++.dg/cpp1y/constexpr-arith-overflow.C: New test.
> 
> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok
> for trunk?

I just played with this a bit -- nice.  The c/ and c-family/ parts are OK.

Marek


Re: [PATCH][2/3] Vectorize inductions that are live after the loop

2016-06-07 Thread Alan Hayward


On 05/06/2016 12:00, "Andreas Schwab"  wrote:

>Alan Hayward  writes:
>
>>  * gcc.dg/vect/vect-live-2.c: New test.
>
>This test fails on powerpc64 (with -m64, but not with -m32):
>
>$ grep 'vectorized.*loops' ./vect-live-2.c.149t.vect
>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:10:1: note: vectorized 0 loops
>in function.
>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:29:1: note: vectorized 0 loops
>in function.
>
>

"note: not vectorized: relevant stmt not supported: _1 = (long unsigned
int) j_24;"


This is failing because power does not support vectorising a cast from int
to long.
(It works on power 32bit because longs are 32bit and therefore no need to
cast).

Can someone please suggest a target-supports define (or another method) I
can use to
disable this test for power 64bit (but not 32bit) ?
I tried using vect_multiple_sizes, but that will also disable the test on
x86 without
avx.


Thanks,
Alan.




[AArch64] ARMv8.2 command line and feature macros support

2016-06-07 Thread Jiong Wang

ARMv8.2-A extends the ARMv8.1-A architecture, adding features that
include an optional extension to support FP16 in floating-point and
Adv.SIMD instructions. This patch adds the command line and feature
flags for ARMv8.2-A and the FP16 support.

The new command option -march=armv8.2-a will enable the new ARMv8.2-A
feature macro, which is a superset of the ARMv8.1 features.

The new feature "fp16" can be enabled by -march=armv8.2-a+fp16 which
will enable ARMv8.2 FP16 extension support.

The feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC is defined to
be 1 if the scalar FP16 instructions are available, it is otherwise
undefined.

The feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined to
be 1 if the vector FP16 instructions are available, it is otherwise
undefined.

OK for trunk?

Thanks.

2016-06-07  Matthew Wahab
Jiong Wang

* config/aarch64/aarch64-arches.def: Add "armv8.2-a".
* config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
(AARCH64_FL_F16): New.
(AARCH64_FL_FOR_ARCH8_2): New.
(AARCH64_ISA_8_2): New.
(AARCH64_ISA_F16): New.
(TARGET_FP_F16INST): New.
(TARGET_SIMD_F16INST): New.
* config/aarch64/aarch64-option-extensions.def: New entry for "fp16".
* config/aarch64/aarch64-c.c (arch64_update_cpp_builtins): 
Conditionally define
__ARM_FEATURE_FP16_SCALAR_ARITHMETIC and 
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
* doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and "fp16".

diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 1e9d90b1b66a21ebbe4c238ce844c6fd3a192201..7dcf140411f6eb95504d9b92df9dadce50529a28 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -32,4 +32,5 @@
 
 AARCH64_ARCH("armv8-a",	  generic,	 8A,	8,  AARCH64_FL_FOR_ARCH8)
 AARCH64_ARCH("armv8.1-a", generic,	 8_1A,	8,  AARCH64_FL_FOR_ARCH8_1)
+AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
 
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index e64dc7676ccae10e87ade3946904408fe425730d..3380ed6f2cd0ae35fd6a4e53177604256875e6de 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -95,6 +95,11 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   else
 cpp_undef (pfile, "__ARM_FP");
 
+  aarch64_def_or_undef (TARGET_FP_F16INST,
+			"__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", pfile);
+  aarch64_def_or_undef (TARGET_SIMD_F16INST,
+			"__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", pfile);
+
   aarch64_def_or_undef (TARGET_SIMD, "__ARM_FEATURE_NUMERIC_MAXMIN", pfile);
   aarch64_def_or_undef (TARGET_SIMD, "__ARM_NEON", pfile);
 
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index e8706d1c2e798872b8028cce1d9d193df8fef0be..ff59dceabffe780adc84d953823800ba2abca5f3 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -55,3 +55,6 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32")
 
 /* Enabling or disabling "lse" only changes "lse".  */
 AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
+
+/* Enabling or disabling "fp16" only changes "fp16".  */
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, 0, 0, "fp16")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 15d7e4019adf207d5127ebba31af35a8b3437c5b..a71420382069b66e2faef15c238f07374683b4fc 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -135,6 +135,8 @@ extern unsigned aarch64_architecture_version;
 /* ARMv8.1 architecture extensions.  */
 #define AARCH64_FL_LSE	  (1 << 4)  /* Has Large System Extensions.  */
 #define AARCH64_FL_V8_1	  (1 << 5)  /* Has ARMv8.1 extensions.  */
+#define AARCH64_FL_V8_2	  (1 << 8)  /* Has ARMv8.2 features.  */
+#define AARCH64_FL_F16	  (1 << 9)  /* Has ARMv8.2 FP16 extensions.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -146,6 +148,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8   (AARCH64_FL_FPSIMD)
 #define AARCH64_FL_FOR_ARCH8_1			   \
   (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1)
+#define AARCH64_FL_FOR_ARCH8_2			\
+  (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2)
 
 /* Macros to test ISA flags.  */
 
@@ -155,6 +159,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_SIMD   (aarch64_isa_flags & AARCH64_FL_SIMD)
 #define AARCH64_ISA_LSE		   (aarch64_isa_flags & AARCH64_FL_LSE)
 #define AARCH64_ISA_RDMA	   (aarch64_isa_flags & AARCH64_FL_V8_1)
+#define AARCH64_ISA_V8_2	   (aarch64_isa_flags & AARCH64_FL_V8_2)
+#define AARCH64_ISA_F16		   (aarch64_isa_flags & AARCH64_FL_F16)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPT

Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations

2016-06-07 Thread Ramana Radhakrishnan
>> Please find the updated patch attached.
>> It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and
>> arm-none-eabi.
>> However the test-case added in the patch (neon-vect-div-1.c) fails to
>> get vectorized at -O2
>> for armeb-none-linux-gnueabihf.
>> Charles suggested me to try with -O3, which worked.
>> It appears the test-case fails to get vectorized with
>> -fvect-cost-model=cheap (which is default enabled at -O2)
>> and passes for -fno-vect-cost-model / -fvect-cost-model=dynamic
>>
>> I can't figure out why it fails -fvect-cost-model=cheap.
>> From the vect dump (attached):
>> neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
>> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9
> Hi,
> I think I have some idea why the test-case fails attached with patch
> fail to get vectorized on armeb with -O2.
>
> Issue with big endian vectorizer:
> The patch does not cause regressions on big endian vectorizer but
> fails to vectorize the test-cases attached with the patch, while they
> get vectorized on
> litttle-endian.
> Fails with armeb with the following message in dump:
> note: not vectorized: unsupported unaligned load.*_9
>
> The behavior of big and little endian vectorizer seems to be different
> in arm_builtin_support_vector_misalignment() which overrides the hook
> targetm.vectorize.support_vector_misalignment().
>
> targetm.vectorize.support_vector_misalignment is called by
> vect_supportable_dr_alignment () which in turn is called
> by verify_data_refs_alignment ().
>
> Execution upto following condition is common between arm and armeb
> in vect_supportable_dr_alignment():
>
> if ((TYPE_USER_ALIGN (type) && !is_packed)
>   || targetm.vectorize.support_vector_misalignment (mode, type,
> DR_MISALIGNMENT (dr), is_packed))
> /* Can't software pipeline the loads, but can at least do them.  */
> return dr_unaligned_supported;
>
> For little endian case:
> arm_builtin_support_vector_misalignment() is called with
> V2SF mode and misalignment == -1, and the following condition
> becomes true:
> /* If the misalignment is unknown, we should be able to handle the access
>  so long as it is not to a member of a packed data structure.  */
>   if (misalignment == -1)
> return true;
>
> Since the hook returned true we enter the condition above in
> vect_supportable_dr_alignment() and return dr_unaligned_supported;
>
> For big-endian:
> arm_builtin_support_vector_misalignment() is called with V2SF mode.
> The following condition that gates the entire function body fails:
>  if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
> and the default hook gets called with V2SF mode and the default hook
> returns false because
> movmisalign_optab does not exist for V2SF mode.
>
> So the condition above in vect_supportable_dr_alignment() fails
> and we come here:
>  /* Unsupported.  */
> return dr_unaligned_unsupported;
>
> And hence we get the unaligned load not supported message in the dump
> for armeb in verify_data_ref_alignment ():
>
> static bool
> verify_data_ref_alignment (data_reference_p dr)
> {
>   enum dr_alignment_support supportable_dr_alignment
> = vect_supportable_dr_alignment (dr, false);
>   if (!supportable_dr_alignment)
> {
>   if (dump_enabled_p ())
> {
>   if (DR_IS_READ (dr))
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "not vectorized: unsupported unaligned load.");
>
> With -O3, the test-cases vectorize for armeb, because loop peeling for 
> alignment
> is turned on.
> The above behavior is also reproducible with test-case which is
> irrelevant to the patch.
> for instance, we get the same unsupported unaligned load for following
> test-case (replaced / with +)
>
> void
> foo (int len, float * __restrict p, float *__restrict x)
> {
>   len = len & ~31;
>   for (int i = 0; i < len; i++)
> p[i] = p[i] + x[i];
> }
> Is the patch OK to commit after bootstrap+test ?


Thanks for the analysis - all the test needs is an additional marker
to skip it on armeb (is there a helper for misaligned loads from the
vectorizer ? ) - Ah probably vect_hw_misalign is sufficient for your
usecase and you want to appropriately fix it for little endian arm
with neon support enabled.

From the patch.

>>+   && flag_unsafe_math_optimizations && flag_reciprocal_math"

Why do we need flag_unsafe_math_optimizations && flag_reciprocal_math
? flag_unsafe_math_optimizations should be sufficient since it enables
flag_reciprocal_math - the reason for flag_unsafe_math_optimizations
is to prevent loss of precision and the fact that on neon denormalized
numbers are flushed to zero.

Ok with that change and a quick test with vect_hw_misalign added to
your testcase.

Sorry about the delay in reviewing.

Ramana


>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
>>>
>>> Thanks,
>>> Ramana
>>>

 Thanks,

Re: move increase_alignment from simple to regular ipa pass

2016-06-07 Thread Prathamesh Kulkarni
On 3 June 2016 at 13:35, Jan Hubicka  wrote:
>> > fsection-anchors
>> > Common Report Var(flag_section_anchors)
>> > Access data in the same section from shared anchor points.
>>
>> Funny.  I see the following on trunk:
>>
>> fsection-anchors
>> Common Report Var(flag_section_anchors) Optimization
>> Access data in the same section from shared anchor points.
>
> Aha, my local change from last year still inmy tree. Sorry.
> Yep, having it as Optimization makes sense, but we need to be sure it works 
> as intended.
>>
>> > flag_section_anchors is not declared as Optimiation, so it can't be 
>> > function
>> > specific right now. It probably should because it is an optimization.  This
>> > makes me wonder what happens when one function have anchors enabled and 
>> > other
>> > doesn't?  Probably anchroing or not anchoring the var will then depend on 
>> > what
>> > function comes first in the compilation order and then we will need to make
>> > backend grok the case where static var is anchored but 
>> > flag_section_anchors is
>> > off.
>>
>> This is because we represent the anchor with DECL_RTL, right?  Maybe
>> DECL_RTL of globals needs to be re-computed for each function...
>
> I would rather anchor variable if it is used by at least one function that is 
> compiled
> with anchors.  Accessing anchors is IMO no slower than accessing symbols. But 
> I am not
> that familiar witht his code...
>>
>> > I dunno what is the desired behaviour for LTOint together different code
>> > models.
>>
>> Good question.  There's always the choice to remove 'Optimization' and
>> enforce same setting for all TUs we LTO in lto-wrapper.
>
> Yep. Not sure what is better - I did not really think of targets that use both
> models.
Um I am not really sure what to do next to convert increase_alignment
to regular pass, I would be grateful
for suggestions.

Thanks,
Prathamesh
>
> Honza
>>
>> Richard.


Re: [PATCH][AArch64] Model CSEL instruction in Cortex-A57 scheduling model

2016-06-07 Thread James Greenhalgh
On Tue, Jun 07, 2016 at 09:22:05AM +0100, Ramana Radhakrishnan wrote:
> 
> 
> On 06/06/16 17:10, Kyrill Tkachov wrote:
> > Hi all,
> > 
> > This small patch adds handling of the CSEL-type instructions to the 
> > Cortex-A57 scheduling model.
> > It is treated the same as simple ALU instructions.
> > 
> > With this patch I didn't see any overall differences in SPEC2006.
> > 
> > Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-linux-gnu.
> > 
> > Ok for trunk?
> > 
> > The patch is very simple and the csel value isn't used in any arm 
> > instructions so I think
> > just an aarch64 approval for this should be enough.
> > 
> > Thanks,
> > Kyrill
> > 
 
> 
> This is ok by me - in these sorts of situations I think we should let the
> aarch64 approvals be sufficient.

I agree. I don't think the folder in which the file resides is the most
important thing, it is the intent and risk of the patch that matters.

This patch is OK for AArch64.

Thanks,
James

> > 2016-06-06  Kyrylo Tkachov  
> > 
> > * config/arm/cortex-a57.md (cortex_a57_alu):
> > Handle csel type.
>



Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases

2016-06-07 Thread Prathamesh Kulkarni
ping https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html

Thanks,
Prathamesh

On 25 May 2016 at 18:19, Prathamesh Kulkarni
 wrote:
> On 23 May 2016 at 14:28, Prathamesh Kulkarni
>  wrote:
>> Hi,
>> This patch overrides expand_divmod_libfunc for ARM port and adds test-cases.
>> I separated the SImode tests into separate file from DImode tests
>> because certain arm configs (cortex-15) have hardware div insn for
>> SImode but not for DImode,
>> and for that config we want SImode tests to be disabled but not DImode tests.
>> The patch therefore has two target-effective checks: divmod and 
>> divmod_simode.
>> Cross-tested on arm*-*-*.
>> Bootstrap+test on arm-linux-gnueabihf in progress.
>> Does this patch look OK ?
> Hi,
> This version adds couple of more test-cases and fixes typo in
> divmod-3-simode.c, divmod-4-simode.c
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh


Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations

2016-06-07 Thread Prathamesh Kulkarni
On 30 May 2016 at 13:52, Prathamesh Kulkarni
 wrote:
> On 23 May 2016 at 14:59, Prathamesh Kulkarni
>  wrote:
>> On 5 February 2016 at 18:40, Prathamesh Kulkarni
>>  wrote:
>>> On 4 February 2016 at 16:31, Ramana Radhakrishnan
>>>  wrote:
 On Sun, Jan 17, 2016 at 9:06 AM, Prathamesh Kulkarni
  wrote:
> On 31 July 2015 at 15:04, Ramana Radhakrishnan
>  wrote:
>>
>>
>> On 29/07/15 11:09, Prathamesh Kulkarni wrote:
>>> Hi,
>>> This patch tries to implement division with multiplication by
>>> reciprocal using vrecpe/vrecps
>>> with -funsafe-math-optimizations and -freciprocal-math enabled.
>>> Tested on arm-none-linux-gnueabihf using qemu.
>>> OK for trunk ?
>>>
>>> Thank you,
>>> Prathamesh
>>>
>>
>> I've tried this in the past and never been convinced that 2 iterations 
>> are enough to get to stability with this given that the results are only 
>> precise for 8 bits / iteration. Thus I've always believed you need 3 
>> iterations rather than 2 at which point I've never been sure that it's 
>> worth it. So the testing that you've done with this currently is not 
>> enough for this to go into the tree.
>>
>> I'd like this to be tested on a couple of different AArch32 
>> implementations with a wider range of inputs to verify that the results 
>> are acceptable as well as running something like SPEC2k(6) with atleast 
>> one iteration to ensure correctness.
> Hi,
> I got results of SPEC2k6 fp benchmarks:
> a15: +0.64% overall, 481.wrf: +6.46%
> a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
> a57: +0.35% overall, 481.wrf: +3.84%
> The other benchmarks had (almost) identical results.

 Thanks for the benchmarking results -  Please repost the patch with
 the changes that I had requested in my previous review - given it is
 now stage4 , I would rather queue changes like this for stage1 now.
>>> Hi,
>>> Please find the updated patch attached.
>>> It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and
>>> arm-none-eabi.
>>> However the test-case added in the patch (neon-vect-div-1.c) fails to
>>> get vectorized at -O2
>>> for armeb-none-linux-gnueabihf.
>>> Charles suggested me to try with -O3, which worked.
>>> It appears the test-case fails to get vectorized with
>>> -fvect-cost-model=cheap (which is default enabled at -O2)
>>> and passes for -fno-vect-cost-model / -fvect-cost-model=dynamic
>>>
>>> I can't figure out why it fails -fvect-cost-model=cheap.
>>> From the vect dump (attached):
>>> neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
>>> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9
>> Hi,
>> I think I have some idea why the test-case fails attached with patch
>> fail to get vectorized on armeb with -O2.
>>
>> Issue with big endian vectorizer:
>> The patch does not cause regressions on big endian vectorizer but
>> fails to vectorize the test-cases attached with the patch, while they
>> get vectorized on
>> litttle-endian.
>> Fails with armeb with the following message in dump:
>> note: not vectorized: unsupported unaligned load.*_9
>>
>> The behavior of big and little endian vectorizer seems to be different
>> in arm_builtin_support_vector_misalignment() which overrides the hook
>> targetm.vectorize.support_vector_misalignment().
>>
>> targetm.vectorize.support_vector_misalignment is called by
>> vect_supportable_dr_alignment () which in turn is called
>> by verify_data_refs_alignment ().
>>
>> Execution upto following condition is common between arm and armeb
>> in vect_supportable_dr_alignment():
>>
>> if ((TYPE_USER_ALIGN (type) && !is_packed)
>>   || targetm.vectorize.support_vector_misalignment (mode, type,
>> DR_MISALIGNMENT (dr), is_packed))
>> /* Can't software pipeline the loads, but can at least do them.  */
>> return dr_unaligned_supported;
>>
>> For little endian case:
>> arm_builtin_support_vector_misalignment() is called with
>> V2SF mode and misalignment == -1, and the following condition
>> becomes true:
>> /* If the misalignment is unknown, we should be able to handle the access
>>  so long as it is not to a member of a packed data structure.  */
>>   if (misalignment == -1)
>> return true;
>>
>> Since the hook returned true we enter the condition above in
>> vect_supportable_dr_alignment() and return dr_unaligned_supported;
>>
>> For big-endian:
>> arm_builtin_support_vector_misalignment() is called with V2SF mode.
>> The following condition that gates the entire function body fails:
>>  if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access)
>> and the default hook gets called with V2SF mode and the default hook
>> returns false because
>> movmisalign_optab does not exist for V2SF mode.
>>
>> So the condition above in vect_supportable_dr_alignment() fails
>> and we come here:
>>  /* Unsu

Re: [PATCH] Fix PR61564 - optimize attribute/pragma accepting any option

2016-06-07 Thread Richard Biener
On Tue, 7 Jun 2016, Jakub Jelinek wrote:

> On Tue, Jun 07, 2016 at 10:15:39AM +0200, Richard Biener wrote:
> > 
> > This fixes PR61564 by diagnosing (and ignoring) options not marked with
> > 'Optimization' being applied to #pragma GCC optimize or via the
> > optimize attribute.
> > 
> > The reason is that while we save/restore option state for 'Optimize'
> > marked options we don't do that for other options.  Thus while such
> > options do not end up in the per-function optimize state applying them
> > still clobbers the global state.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > Ok for trunk?
> 
> Ok (though it surprises me we haven't done that from the beginning).

Yes, I was surprised by that as well...

Richard.

> > 2016-06-07  Richard Biener  
> > 
> > PR c/61564
> > * c-common.c (parse_optimize_options): Only apply CL_OPTIMIZATION
> > options and warn about others.
> > 
> > * gcc.dg/Wpragmas-1.c: New testcase.
> > * gcc.dg/Wattributes-4.c: Likewise.


Re: [PATCH] Fix PR61564 - optimize attribute/pragma accepting any option

2016-06-07 Thread Jakub Jelinek
On Tue, Jun 07, 2016 at 10:15:39AM +0200, Richard Biener wrote:
> 
> This fixes PR61564 by diagnosing (and ignoring) options not marked with
> 'Optimization' being applied to #pragma GCC optimize or via the
> optimize attribute.
> 
> The reason is that while we save/restore option state for 'Optimize'
> marked options we don't do that for other options.  Thus while such
> options do not end up in the per-function optimize state applying them
> still clobbers the global state.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Ok for trunk?

Ok (though it surprises me we haven't done that from the beginning).

> 2016-06-07  Richard Biener  
> 
>   PR c/61564
>   * c-common.c (parse_optimize_options): Only apply CL_OPTIMIZATION
>   options and warn about others.
> 
>   * gcc.dg/Wpragmas-1.c: New testcase.
>   * gcc.dg/Wattributes-4.c: Likewise.

Jakub


Re: [PATCH][AArch64] Model CSEL instruction in Cortex-A57 scheduling model

2016-06-07 Thread Ramana Radhakrishnan


On 06/06/16 17:10, Kyrill Tkachov wrote:
> Hi all,
> 
> This small patch adds handling of the CSEL-type instructions to the 
> Cortex-A57 scheduling model.
> It is treated the same as simple ALU instructions.
> 
> With this patch I didn't see any overall differences in SPEC2006.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-linux-gnu.
> 
> Ok for trunk?
> 
> The patch is very simple and the csel value isn't used in any arm 
> instructions so I think
> just an aarch64 approval for this should be enough.
> 
> Thanks,
> Kyrill
> 
> 2016-06-06  Kyrylo Tkachov  
> 
> * config/arm/cortex-a57.md (cortex_a57_alu):
> Handle csel type.


This is ok by me - in these sorts of situations I think we should let the 
aarch64 approvals be sufficient.

regards
Ramana


Re: [PATCH][ARM] Add initial support for Cortex-A73

2016-06-07 Thread Ramana Radhakrishnan


On 06/06/16 17:18, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch adds initial support for the Cortex-A73 processor through the
> cortex-a73, cortex-a73.cortex-a35 and cortex-a73.cortex-a53 arguments to 
> -mcpu and -mtune.
> 
> The Cortex-A73 is an ARMv8-A processor.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf with an appropriately
> patched binutils that understands the relevant -mcpu argument.
> 
> Ok for trunk?
> 

Ok. Any reason why we can't move the ARM port also to a world where we work 
with .arch directives instead of .cpu directives like the aarch64 port.?

Thanks,
Ramana
> Thanks,
> Kyrill
> 
> 2016-06-06  Kyrylo Tkachov  
> 
> * config/arm/arm.c (arm_cortex_a73_tune): New struct.
> * config/arm/arm-cores.def (cortex-a73): New entry.
> (cortex-a73.cortex-a35): Likewise.
> (cortex-a73.cortex-a53): Likewise.
> * config/arm/arm-tables.opt: Regenerate.
> * config/arm/arm-tune.md: Likewise.
> * config/arm/bpabi.h (BE8_LINK_SPEC): Handle mcpu=cortex-a73,
> mcpu=cortex-a73.cortex-a35 and mcpu=cortex-a73.cortex-a53.
> * config/arm/t-aprofile: Handle mcpu=cortex-a73,
> mcpu=cortex-a73.cortex-a35 and mcpu=cortex-a73.cortex-a53.
> * doc/invoke.texi (ARM Options): Document cortex-a73,
> cortex-a73.cortex-a35 and cortex-a73.cortex-a53.


[PATCH] Fix PR61564 - optimize attribute/pragma accepting any option

2016-06-07 Thread Richard Biener

This fixes PR61564 by diagnosing (and ignoring) options not marked with
'Optimization' being applied to #pragma GCC optimize or via the
optimize attribute.

The reason is that while we save/restore option state for 'Optimize'
marked options we don't do that for other options.  Thus while such
options do not end up in the per-function optimize state applying them
still clobbers the global state.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok for trunk?

Note this also points to the unfortunate ways the FEs apply
the pragma to functions which seems to be via adding optimize
attributes literally rather than copying a tree optimization
node constructable by parse_optimize_options.  That results
in excessive diagnostics also reproducable without the patch
if you use sth like #pragma GCC optimize ("-Xhost").

Thanks,
Richard.

2016-06-07  Richard Biener  

PR c/61564
* c-common.c (parse_optimize_options): Only apply CL_OPTIMIZATION
options and warn about others.

* gcc.dg/Wpragmas-1.c: New testcase.
* gcc.dg/Wattributes-4.c: Likewise.

Index: gcc/c-family/c-common.c
===
*** gcc/c-family/c-common.c (revision 237167)
--- gcc/c-family/c-common.c (working copy)
*** parse_optimize_options (tree args, bool
*** 9580,9585 
--- 9580,9608 
decode_cmdline_options_to_array_default_mask (opt_argc, opt_argv,
&decoded_options,
&decoded_options_count);
+   /* Drop non-Optimization options.  */
+   unsigned j = 1;
+   for (i = 1; i < decoded_options_count; ++i)
+ {
+   if (! (cl_options[decoded_options[i].opt_index].flags & 
CL_OPTIMIZATION))
+   {
+ ret = false;
+ if (attr_p)
+   warning (OPT_Wattributes,
+"bad option %s to optimize attribute",
+decoded_options[i].orig_option_with_args_text);
+ else
+   warning (OPT_Wpragmas,
+"bad option %s to pragma attribute",
+decoded_options[i].orig_option_with_args_text);
+ continue;
+   }
+   if (i != j)
+   decoded_options[j] = decoded_options[i];
+   j++;
+ }
+   decoded_options_count = j;
+   /* And apply them.  */
decode_options (&global_options, &global_options_set,
  decoded_options, decoded_options_count,
  input_location, global_dc);
Index: gcc/testsuite/gcc.dg/Wpragmas-1.c
===
*** gcc/testsuite/gcc.dg/Wpragmas-1.c   (revision 0)
--- gcc/testsuite/gcc.dg/Wpragmas-1.c   (working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do compile } */
+ 
+ #pragma GCC push_options
+ #pragma GCC optimize ("-fno-lto") /* { dg-warning "bad option" } */
+ int main(void){return 0;}
+ #pragma GCC pop_options
+ 
+ /* ???  The FEs still apply the pragma string as optimize attribute to
+all functions thus the diagnostic will be repeated for each function
+affected.  */
+ /* { dg-message "bad option" "" { target *-*-* } 0 } */
Index: gcc/testsuite/gcc.dg/Wattributes-4.c
===
*** gcc/testsuite/gcc.dg/Wattributes-4.c(revision 0)
--- gcc/testsuite/gcc.dg/Wattributes-4.c(working copy)
***
*** 0 
--- 1,3 
+ /* { dg-do compile } */
+ 
+ int __attribute__((optimize("no-lto"))) main(void){return 0;} /* { dg-warning 
"bad option" } */


PING: [PATCH] PR33661 Fix problem with register asm in templates

2016-06-07 Thread Andreas Krebbel
Hi Jason,

could you please have a look?

https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00904.html

Thanks!

-Andreas-



Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

2016-06-07 Thread Andreas Schwab
Kyrill Tkachov  writes:

> +static rtx
> +simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
> +{
> +  if (cmp_code != EQ && cmp_code != NE)
> +return NULL_RTX;
> +
> +  /* Result on X == 0 and X !=0 respectively.  */
> +  rtx on_zero, on_nonzero;
> +  if (cmp_code == EQ)
> +{
> +  on_zero = true_val;
> +  on_nonzero = false_val;
> +}
> +  else
> +{
> +  on_zero = false_val;
> +  on_nonzero = true_val;
> +}
> +
> +  rtx_code op_code = GET_CODE (on_nonzero);
> +  if ((op_code != CLZ && op_code != CTZ)
> +  || !rtx_equal_p (XEXP (on_nonzero, 0), x)
> +  || !CONST_INT_P (on_zero))
> +return NULL_RTX;
> +
> +  HOST_WIDE_INT op_val;
> +  machine_mode mode = GET_MODE (on_nonzero);
> +  if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val))
> + || (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val)))
> +  && op_val == INTVAL (on_zero))
> +return on_nonzero;
> +
> +  return NULL_RTX;
> +}

../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, 
rtx_code, rtx, rtx)':
../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' 
[-Werror=unused-variable]
   machine_mode mode = GET_MODE (on_nonzero);
^~~~

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH,rs6000] Add built-in function support for new Power9 vector absolute difference unsigned instructions

2016-06-07 Thread Richard Biener
On Tue, Jun 7, 2016 at 1:58 AM, Kelvin Nilsen
 wrote:
>
> This patch adds built-in function support for the ISA 3.0 vabsub,
> vabsduh, and vabsduw instructions.
>
> I have bootstrapped and tested on powerpc64le-unkonwn-linux-gnu with no
> regressions.  Is this ok for the trunk?
>
> I have also tested against the gcc-6 branch without regressions.  Is
> this ok for backporting to gcc6 after a few days of burn-in time on the
> trunk?

It sounds like these match SAD_EXPR and thus should allow vectorizing
gcc.dg/vect/slp-reduc-sad.c and gcc.dg/vect/vect-reduc-sad.c using SAD?

Richard.

> gcc/testsuite/ChangeLog:
>
> 2016-06-06  Kelvin Nilsen  
>
> * gcc.target/powerpc/vadsdu-0.c: New test.
> * gcc.target/powerpc/vadsdu-1.c: New test.
> * gcc.target/powerpc/vadsdu-2.c: New test.
> * gcc.target/powerpc/vadsdu-3.c: New test.
> * gcc.target/powerpc/vadsdu-4.c: New test.
> * gcc.target/powerpc/vadsdu-5.c: New test.
> * gcc.target/powerpc/vadsdub-1.c: New test.
> * gcc.target/powerpc/vadsdub-2.c: New test.
> * gcc.target/powerpc/vadsduh-1.c: New test.
> * gcc.target/powerpc/vadsduh-2.c: New test.
> * gcc.target/powerpc/vadsduw-1.c: New test.
> * gcc.target/powerpc/vadsduw-2.c: New test.
>
>
> gcc/ChangeLog:
>
> 2016-06-06  Kelvin Nilsen  
>
> * config/rs6000/altivec.h (vec_adu): New macro for vector absolute
> difference unsigned.
> (vec_adub): New macro for vector absolute difference unsigned
> byte.
> (vec_aduh): New macro for vector absolute difference unsigned
> half-word.
> (vec_aduw): New macro for vector absolute difference unsigned word.
> * config/rs6000/altivec.md (UNSPEC_VADU): New value.
> (vadu3): New insn.
> (*p9_vadu3): New insn.
> * config/rs6000/rs6000-builtin.def (vadub): New built-in
> definition.
> (vaduh): New built-in definition.
> (vaduw): New built-in definition.
> (vadu): New overloaded built-in definition.
> (vadub): New overloaded built-in definition.
> (vaduh): New overloaded built-in definition.
> (vaduw): New overloaded built-in definition.
> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
> overloaded vector absolute difference unsigned functions.
> * doc/extend.texi (PowerPC AltiVec Built-in Functions): Document
> the ISA 3.0 vector absolute difference unsigned built-in functions.
>
> Index: gcc/config/rs6000/altivec.h
> ===
> --- gcc/config/rs6000/altivec.h (revision 237045)
> +++ gcc/config/rs6000/altivec.h (working copy)
> @@ -401,6 +401,11 @@
>  #define vec_vprtybq __builtin_vec_vprtybq
>  #endif
>
> +#define vec_adu __builtin_vec_vadu
> +#define vec_adub __builtin_vec_vadub
> +#define vec_aduh __builtin_vec_vaduh
> +#define vec_aduw __builtin_vec_vaduw
> +
>  #define vec_slv __builtin_vec_vslv
>  #define vec_srv __builtin_vec_vsrv
>  #endif
> Index: gcc/config/rs6000/altivec.md
> ===
> --- gcc/config/rs6000/altivec.md(revision 237045)
> +++ gcc/config/rs6000/altivec.md(working copy)
> @@ -114,6 +114,7 @@
> UNSPEC_STVLXL
> UNSPEC_STVRX
> UNSPEC_STVRXL
> +   UNSPEC_VADU
> UNSPEC_VSLV
> UNSPEC_VSRV
> UNSPEC_VMULWHUB
> @@ -3464,6 +3465,25 @@
>[(set_attr "length" "4")
> (set_attr "type" "vecsimple")])
>
> +;; Vector absolute difference unsigned
> +(define_expand "vadu3"
> +  [(set (match_operand:VI 0 "register_operand" "")
> +(unspec:VI [(match_operand:VI 1 "register_operand" "")
> +   (match_operand:VI 2 "register_operand" "")]
> + UNSPEC_VADU))]
> +  "TARGET_P9_VECTOR")
> +
> +;; Vector absolute difference unsigned
> +(define_insn "*p9_vadu3"
> +  [(set (match_operand:VI 0 "register_operand" "=v")
> +(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> +   (match_operand:VI 2 "register_operand" "v")]
> + UNSPEC_VADU))]
> +  "TARGET_P9_VECTOR"
> +  "vabsdu %0, %1, %2"
> +  [(set_attr "type" "add")
> +   (set_attr "length" "4")])
> +
>  ;; Vector count trailing zeros
>  (define_insn "*p9v_ctz2"
>[(set (match_operand:VI2 0 "register_operand" "=v")
> Index: gcc/config/rs6000/rs6000-builtin.def
> ===
> --- gcc/config/rs6000/rs6000-builtin.def(revision 237045)
> +++ gcc/config/rs6000/rs6000-builtin.def(working copy)
> @@ -1757,6 +1757,17 @@ BU_P9V_AV_2 (VSRV,   "vsrv",   
>   CONST, vsrv)
>  BU_P9V_OVERLOAD_2 (VSLV,   "vslv")
>  BU_P9V_OVERLOAD_2 (VSRV,   "vsrv")
>
> +/* 2 argument vector functions added in ISA 3.0 (power9). */
> +BU_P9V_AV_2 (VADUB,"vadub",CONST,  vaduv16qi3)
> +BU_P9V_AV_2 (VADUH,"vaduh",

[PATCH] Fix PR71428

2016-06-07 Thread Richard Biener

This fixes BIT_FIELD_REF handling in bswap pass where we need to 
distinguish an "operation" vs. a load properly.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-06-07  Richard Biener  

PR tree-optimization/71428
* tree-ssa-math-opts.c (perform_symbolic_merge): Properly distinguish
BIT_FIELD_REF op vs. load.

* gcc.dg/torture/pr71428.c: New testcase.

Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c(revision 237117)
--- gcc/tree-ssa-math-opts.c(working copy)
*** perform_symbolic_merge (gimple *source_s
*** 2164,2173 
struct symbolic_number *n_start;
  
tree rhs1 = gimple_assign_rhs1 (source_stmt1);
!   if (TREE_CODE (rhs1) == BIT_FIELD_REF)
  rhs1 = TREE_OPERAND (rhs1, 0);
tree rhs2 = gimple_assign_rhs1 (source_stmt2);
!   if (TREE_CODE (rhs2) == BIT_FIELD_REF)
  rhs2 = TREE_OPERAND (rhs2, 0);
  
/* Sources are different, cancel bswap if they are not memory location with
--- 2164,2175 
struct symbolic_number *n_start;
  
tree rhs1 = gimple_assign_rhs1 (source_stmt1);
!   if (TREE_CODE (rhs1) == BIT_FIELD_REF
!   && TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME)
  rhs1 = TREE_OPERAND (rhs1, 0);
tree rhs2 = gimple_assign_rhs1 (source_stmt2);
!   if (TREE_CODE (rhs2) == BIT_FIELD_REF
!   && TREE_CODE (TREE_OPERAND (rhs2, 0)) == SSA_NAME)
  rhs2 = TREE_OPERAND (rhs2, 0);
  
/* Sources are different, cancel bswap if they are not memory location with
Index: gcc/testsuite/gcc.dg/torture/pr71428.c
===
*** gcc/testsuite/gcc.dg/torture/pr71428.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71428.c  (working copy)
***
*** 0 
--- 1,20 
+ /* { dg-do run } */
+ /* { dg-additional-options "-fno-tree-forwprop -Wno-psabi -w" } */
+ 
+ typedef unsigned short v64u16 __attribute__ ((vector_size (64)));
+ 
+ v64u16
+ foo (v64u16 p1)
+ {
+   p1[31] |= p1[1];
+   return p1;
+ }
+ 
+ int
+ main ()
+ {
+   v64u16 x = foo ((v64u16){ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 });
+   if (x[31] != 1)
+ __builtin_abort();
+   return 0;
+ }


[PATCH] Fix PR71423

2016-06-07 Thread Richard Biener

This fixes an omission when moving tree-ssa-forwprop.c code to match.pd
in GCC 5 times.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-06-07  Richard Biener  

PR middle-end/71423
* match.pd ((X | ~Y) -> Y <= X): Properly invert the comparison
for signed ops.

* gcc.dg/torture/pr71423.c: New testcase.

Index: gcc/match.pd
===
*** gcc/match.pd(revision 237117)
--- gcc/match.pd(working copy)
*** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
*** 900,911 
(ne (bit_and:c (bit_not @0) @1) integer_zerop)
(if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
 && TYPE_PRECISION (TREE_TYPE (@1)) == 1)
!(lt @0 @1)))
  (simplify
(ne (bit_ior:c (bit_not @0) @1) integer_zerop)
(if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
 && TYPE_PRECISION (TREE_TYPE (@1)) == 1)
!(le @0 @1)))
  
  /* ~~x -> x */
  (simplify
--- 900,915 
(ne (bit_and:c (bit_not @0) @1) integer_zerop)
(if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
 && TYPE_PRECISION (TREE_TYPE (@1)) == 1)
!(if (TYPE_UNSIGNED (TREE_TYPE (@1)))
! (lt @0 @1)
! (gt @0 @1
  (simplify
(ne (bit_ior:c (bit_not @0) @1) integer_zerop)
(if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
 && TYPE_PRECISION (TREE_TYPE (@1)) == 1)
!(if (TYPE_UNSIGNED (TREE_TYPE (@1)))
! (le @0 @1)
! (ge @0 @1
  
  /* ~~x -> x */
  (simplify
Index: gcc/testsuite/gcc.dg/torture/pr71423.c
===
*** gcc/testsuite/gcc.dg/torture/pr71423.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71423.c  (working copy)
***
*** 0 
--- 1,20 
+ /* { dg-do run } */
+ 
+ struct S1
+ {
+   int f1:1;
+ };
+ 
+ volatile struct S1 b = { 0 };
+ 
+ int
+ main ()
+ {
+   char c = b.f1;
+   b.f1 = 1; 
+ 
+   if (b.f1 > -1 || c)
+ __builtin_abort (); 
+ 
+   return 0; 
+ }


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-07 Thread Richard Biener
On Mon, 6 Jun 2016, Jakub Jelinek wrote:

> On Mon, Jun 06, 2016 at 10:05:57AM +0200, Richard Biener wrote:
> > So this ends up generating { a ? -1 : 0, b ? -1 : 0, ... }.  That
> 
> Yes, that is already what we do now for loop vectorization.
> 
> > might be less optimal than doing { a, b, ... } ? { -1, -1 ... } : { 0, 0, 
> > .. }
> 
> Well, it would need to be
> { a, b, ... } != { 0, 0, ... } ? { -1, -1, ... } : { 0, 0, ... }
> then, doesn't VEC_COND_EXPR assume the condition is in canonical
> VECTOR_BOOLEAN_TYPE_P form?
> 
> Anyway, if something like the above would be faster, perhaps generic vector
> lowering or some similar pass could detect that case post-vectorization and
> optimize?

Yeah.  OTOH an "ideal" vectorizer would already consider the smaller
prologue cost for its cost modeling.

Richard.


Re: loop-ch tweek

2016-06-07 Thread Richard Biener
On Mon, 6 Jun 2016, Jan Hubicka wrote:

> Hi,
> does this look better?

Can you make the argument of gimple_inexpensive_call_p a gcall * please?

Ok with that change.

Thanks,
Richard.

> Honza
> 
>   * gimple.c: Include builtins.h
>   (gimple_inexpensive_call_p): New function.
>   * gimple.h (gimple_inexpensive_call_p): Declare.
>   * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Use it.
>   * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Likewise.
> Index: gimple.c
> ===
> --- gimple.c  (revision 237101)
> +++ gimple.c  (working copy)
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
>  #include "gimple-walk.h"
>  #include "gimplify.h"
>  #include "target.h"
> +#include "builtins.h"
>  
>  
>  /* All the tuples have their operand vector (if present) at the very bottom
> @@ -3018,3 +3019,16 @@ maybe_remove_unused_call_args (struct fu
>update_stmt_fn (fn, stmt);
>  }
>  }
> +
> +/* Return false if STMT will likely expand to real function call.  */
> +
> +bool
> +gimple_inexpensive_call_p (gimple *stmt)
> +{
> +  if (gimple_call_internal_p (stmt))
> +return true;
> +  tree decl = gimple_call_fndecl (stmt);
> +  if (decl && is_inexpensive_builtin (decl))
> +return true;
> +  return false;
> +}
> Index: gimple.h
> ===
> --- gimple.h  (revision 237101)
> +++ gimple.h  (working copy)
> @@ -1525,6 +1525,7 @@ extern void preprocess_case_label_vec_fo
>  extern void gimple_seq_set_location (gimple_seq, location_t);
>  extern void gimple_seq_discard (gimple_seq);
>  extern void maybe_remove_unused_call_args (struct function *, gimple *);
> +extern bool gimple_inexpensive_call_p (gimple *);
>  
>  /* Formal (expression) temporary table handling: multiple occurrences of
> the same scalar expression are evaluated into the same temporary.  */
> Index: tree-ssa-loop-ch.c
> ===
> --- tree-ssa-loop-ch.c(revision 237101)
> +++ tree-ssa-loop-ch.c(working copy)
> @@ -118,7 +118,8 @@ should_duplicate_loop_header_p (basic_bl
>if (is_gimple_debug (last))
>   continue;
>  
> -  if (is_gimple_call (last))
> +  if (gimple_code (last) == GIMPLE_CALL
> +   && !gimple_inexpensive_call_p (last))
>   {
> if (dump_file && (dump_flags & TDF_DETAILS))
>   fprintf (dump_file,
> Index: tree-ssa-loop-ivcanon.c
> ===
> --- tree-ssa-loop-ivcanon.c   (revision 237101)
> +++ tree-ssa-loop-ivcanon.c   (working copy)
> @@ -339,15 +339,11 @@ tree_estimate_loop_size (struct loop *lo
>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>   {
> gimple *stmt = gsi_stmt (gsi);
> -   if (gimple_code (stmt) == GIMPLE_CALL)
> +   if (gimple_code (stmt) == GIMPLE_CALL
> +   && !gimple_inexpensive_call_p (stmt))
>   {
> int flags = gimple_call_flags (stmt);
> -   tree decl = gimple_call_fndecl (stmt);
> -
> -   if (decl && DECL_IS_BUILTIN (decl)
> -   && is_inexpensive_builtin (decl))
> - ;
> -   else if (flags & (ECF_PURE | ECF_CONST))
> +   if (flags & (ECF_PURE | ECF_CONST))
>   size->num_pure_calls_on_hot_path++;
> else
>   size->num_non_pure_calls_on_hot_path++;
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


  1   2   >