Re: Patch ping

2017-11-21 Thread Jakub Jelinek
On Mon, Nov 20, 2017 at 02:58:22PM -0800, Jim Wilson wrote:
> On 11/19/2017 11:55 PM, Jakub Jelinek wrote:
> > I'd like to ping the following patches:
> > 
> >http://gcc.gnu.org/ml/gcc-patches/2017-10/msg01895.html
> >PR debug/82718
> >Fix DWARF5 .debug_loclist handling with hot/cold partitioning
> 
> I already responded to this one.
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01268.html

Sorry for missing that.

> The dwarf2out.c patch looks good to me, though the testcase does not fail on
> unpatched mainline anymore.  I had to go back to the 2017-10-22 snapshot to
> see the failure.  There was a followup from Mark Wielaard mentioning that
> elfutils still fails on mainline, so maybe we can get a testcase from there.

Committed both with the old and your new testcase after verifying your new
one still fails without the patch.

Thanks.

2017-11-21  Jakub Jelinek  

PR debug/82718
* dwarf2out.c (dw_loc_list): If crtl->has_bb_partition, temporarily
set in_cold_section_p to the partition containing loc_list->first.
When seeing loc_list->last_before_switch node, update secname and
perform range_across_switch second partition handling only after that.

* gcc.dg/debug/dwarf2/pr82718-1.c: New test.
* gcc.dg/debug/dwarf2/pr82718-2.c: New test.

--- gcc/dwarf2out.c.jj  2017-10-23 22:39:27.0 +0200
+++ gcc/dwarf2out.c 2017-10-25 21:01:13.237929750 +0200
@@ -16333,92 +16333,111 @@ dw_loc_list (var_loc_list *loc_list, tre
  This means we have to special case the last node, and generate
  a range of [last location start, end of function label].  */
 
-  secname = secname_for_decl (decl);
+  if (cfun && crtl->has_bb_partition)
+{
+  bool save_in_cold_section_p = in_cold_section_p;
+  in_cold_section_p = first_function_block_is_cold;
+  if (loc_list->last_before_switch == NULL)
+   in_cold_section_p = !in_cold_section_p;
+  secname = secname_for_decl (decl);
+  in_cold_section_p = save_in_cold_section_p;
+}
+  else
+secname = secname_for_decl (decl);
 
   for (node = loc_list->first; node; node = node->next)
-if (GET_CODE (node->loc) == EXPR_LIST
-   || NOTE_VAR_LOCATION_LOC (node->loc) != NULL_RTX)
-  {
-   if (GET_CODE (node->loc) == EXPR_LIST)
- {
-   /* This requires DW_OP_{,bit_}piece, which is not usable
-  inside DWARF expressions.  */
-   if (want_address != 2)
- continue;
+{
+  bool range_across_switch = false;
+  if (GET_CODE (node->loc) == EXPR_LIST
+ || NOTE_VAR_LOCATION_LOC (node->loc) != NULL_RTX)
+   {
+ if (GET_CODE (node->loc) == EXPR_LIST)
+   {
+ descr = NULL;
+ /* This requires DW_OP_{,bit_}piece, which is not usable
+inside DWARF expressions.  */
+ if (want_address == 2)
+   descr = dw_sra_loc_expr (decl, node->loc);
+   }
+ else
+   {
+ initialized = NOTE_VAR_LOCATION_STATUS (node->loc);
+ varloc = NOTE_VAR_LOCATION (node->loc);
+ descr = dw_loc_list_1 (decl, varloc, want_address, initialized);
+   }
+ if (descr)
+   {
+ /* If section switch happens in between node->label
+and node->next->label (or end of function) and
+we can't emit it as a single entry list,
+emit two ranges, first one ending at the end
+of first partition and second one starting at the
+beginning of second partition.  */
+ if (node == loc_list->last_before_switch
+ && (node != loc_list->first || loc_list->first->next)
+ && current_function_decl)
+   {
+ endname = cfun->fde->dw_fde_end;
+ range_across_switch = true;
+   }
+ /* The variable has a location between NODE->LABEL and
+NODE->NEXT->LABEL.  */
+ else if (node->next)
+   endname = node->next->label;
+ /* If the variable has a location at the last label
+it keeps its location until the end of function.  */
+ else if (!current_function_decl)
+   endname = text_end_label;
+ else
+   {
+ ASM_GENERATE_INTERNAL_LABEL (label_id, FUNC_END_LABEL,
+  current_function_funcdef_no);
+ endname = ggc_strdup (label_id);
+   }
+
+ *listp = new_loc_list (descr, node->label, endname, secname);
+ if (TREE_CODE (decl) == PARM_DECL
+ && node == loc_list->first
+ && NOTE_P (node->loc)
+ && strcmp (node->label, endname) == 0)
+   (*listp)->force = true;
+ listp = &(*listp)->dw_loc_next;
+   }
+   }
+
+  if (cfun
+ 
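The comment in the patch about emitting two ranges when a section switch falls between two labels can be pictured with a small standalone sketch. This is not GCC code: `Range`, `ranges_for_node` and the label strings are hypothetical names chosen for illustration, and the real logic lives in `dw_loc_list`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one location-list entry: [begin, end) in a section.
struct Range {
  std::string begin;
  std::string end;
  std::string section;
};

// If the location's lifetime spans the hot/cold section switch, produce two
// ranges: one ending at the end of the first partition and one starting at
// the beginning of the second partition.  Otherwise produce a single range.
std::vector<Range> ranges_for_node(const Range &whole, bool spans_switch,
                                   const std::string &first_part_end,
                                   const std::string &second_part_begin,
                                   const std::string &second_section) {
  std::vector<Range> out;
  if (!spans_switch) {
    out.push_back(whole);
    return out;
  }
  out.push_back(Range{whole.begin, first_part_end, whole.section});
  out.push_back(Range{second_part_begin, whole.end, second_section});
  return out;
}
```

A single range that crossed the switch would describe addresses in two different sections with one begin/end pair, which is why the split is needed.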

Re: [PATCH] Fix load_gsi computation in store-merging (PR tree-optimization/83047)

2017-11-21 Thread Richard Biener
On Mon, 20 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> This is something the bswap pass has been already doing, but not the
> new load handling code in store-merging.  If all the loads have the same
> vuse, it still doesn't mean they are necessarily in the same basic block.
> If they are in multiple bbs, previously we've chosen randomly (well, from
> the load corresponding to the first store in the group), now we pick at
> least the last basic block.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

> It might not be good enough, if some program has
>   int x = q[0]; foo (x); int y = q[1]; p[0] = x; p[1] = y; and
> foo conditionally exits or aborts or externally throws or loops forever
> in case q[1] might not be mapped, then even when both loads are in the
> same bb we might put the larger load on the first load rather than the
> second.  I think we'd need to compute uids (perhaps lazily) and compare
> which stmt comes last.  Thoughts on that?

What we need to compute is whether we can hoist/sink loads to the
insert location.  Ideally we'd start by hoisting loads upwards
and if we hit a road-block try sinking the other loads (might work
in case of EH / looping forever).

One slight complication is that AFAIK an externally throwing or
endlessly looping (or memory unmapping) function doesn't necessarily
have a VDEF, not even a VUSE.  So we might not catch all stmts that
serve as a barrier by walking the VUSE -> VDEF chain.

We _might_ want to change rules here when to force a VDEF for
simplicity.

With -fnon-call-exceptions even a division might throw externally
so this rule adjustment might not hold.  But OTOH I don't see
easily how that serves as a barrier (unless the combined load
is unaligned in which case later loads might access not mapped
memory?)

Richard.
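
The selection loop in the patch quoted below can be sketched outside GCC as follows. This is a toy model under stated assumptions: `DomTree`, `dominated_by` and `pick_load` are hypothetical names, blocks are plain integers, and the entry block is its own immediate dominator. It only mirrors the idea of moving the candidate to any load whose block the current choice dominates.

```cpp
#include <cstddef>
#include <vector>

// Toy dominator tree: idom[b] is the immediate dominator of block b,
// and the entry block is its own immediate dominator.
struct DomTree {
  std::vector<int> idom;

  // True if block a is dominated by block b (every block dominates itself).
  bool dominated_by(int a, int b) const {
    for (;;) {
      if (a == b)
        return true;
      if (idom[a] == a)  // reached the entry block without meeting b
        return false;
      a = idom[a];
    }
  }
};

// load_bb[i] is the block holding the i-th load of the group.  Starting
// from load `start`, switch to every load whose block is dominated by the
// block of the current choice, and return the index that remains.
std::size_t pick_load(const DomTree &dt, const std::vector<int> &load_bb,
                      std::size_t start) {
  std::size_t best = start;
  for (std::size_t i = 0; i < load_bb.size(); ++i)
    if (dt.dominated_by(load_bb[i], load_bb[best]))
      best = i;
  return best;
}
```

As the discussion above notes, this only orders loads across blocks; it says nothing about barriers between two loads inside the same block.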

> 2017-11-20  Jakub Jelinek  
> 
>   PR tree-optimization/83047
>   * gimple-ssa-store-merging.c
>   (imm_store_chain_info::output_merged_store): If the loads with the
>   same vuse are in different basic blocks, for load_gsi pick a load
>   location that is dominated by the other loads.
> 
>   * gcc.dg/pr83047.c: New test.
> 
> --- gcc/gimple-ssa-store-merging.c.jj 2017-11-17 08:40:25.0 +0100
> +++ gcc/gimple-ssa-store-merging.c2017-11-20 10:37:40.429859947 +0100
> @@ -1857,7 +1857,30 @@ imm_store_chain_info::output_merged_stor
>store_immediate_info *infol = group->stores.last ();
>if (gimple_vuse (op.stmt) == gimple_vuse (infol->ops[j].stmt))
>   {
> -   load_gsi[j] = gsi_for_stmt (op.stmt);
> +   /* We can't pick the location randomly; while we've verified
> +  all the loads have the same vuse, they can be still in different
> +  basic blocks and we need to pick the one from the last bb:
> +  int x = q[0];
> +  if (x == N) return;
> +  int y = q[1];
> +  p[0] = x;
> +  p[1] = y;
> +  otherwise if we put the wider load at the q[0] load, we might
> +  segfault if q[1] is not mapped.  */
> +   basic_block bb = gimple_bb (op.stmt);
> +   gimple *ostmt = op.stmt;
> +   store_immediate_info *info;
> +   FOR_EACH_VEC_ELT (group->stores, i, info)
> + {
> +   gimple *tstmt = info->ops[j].stmt;
> +   basic_block tbb = gimple_bb (tstmt);
> +   if (dominated_by_p (CDI_DOMINATORS, tbb, bb))
> + {
> +   ostmt = tstmt;
> +   bb = tbb;
> + }
> + }
> +   load_gsi[j] = gsi_for_stmt (ostmt);
> load_addr[j]
>   = force_gimple_operand_1 (unshare_expr (op.base_addr),
> &load_seq[j], is_gimple_mem_ref_addr,
> --- gcc/testsuite/gcc.dg/pr83047.c.jj 2017-11-20 10:19:26.612657065 +0100
> +++ gcc/testsuite/gcc.dg/pr83047.c2017-11-20 10:24:15.0 +0100
> @@ -0,0 +1,58 @@
> +/* PR tree-optimization/83047 */
> +/* { dg-do run { target mmap } } */
> +/* { dg-options "-O2" } */
> +
> +#include 
> +#include 
> +#include 
> +#ifndef MAP_ANONYMOUS
> +#define MAP_ANONYMOUS MAP_ANON
> +#endif
> +#ifndef MAP_ANON
> +#define MAP_ANON 0
> +#endif
> +#ifndef MAP_FAILED
> +#define MAP_FAILED ((void *)-1)
> +#endif
> +#include 
> +
> +__attribute__((noipa)) void
> +foo (char *p, char *q, int r)
> +{
> +  char a = q[0];
> +  if (r || a == '\0')
> +return;
> +  char b = q[1];
> +  p[0] = a;
> +  p[1] = b;
> +}
> +
> +int
> +main ()
> +{
> +  char *p = mmap (NULL, 131072, PROT_READ | PROT_WRITE,
> +   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +  if (p == MAP_FAILED)
> +return 0;
> +  if (munmap (p + 65536, 65536) < 0)
> +return 0;
> +  p[0] = 'c';
> +  p[1] = 'd';
> +  p[65536 - 2] = 'a';
> +  p[65536 - 1] = 'b';
> +  volatile int r = 1;
> +  foo (p, p + 65536 - 2, r);
> +  if (p[0] != 'c' || p[1] != 'd')
> +abort ();
> +  r = 0;
> +  foo (p, p + 65536 - 2, r);
> +  if (p[0] != 'a' || p[1] != 'b')
> +abort ();
> +  p[0] = 'e';
> +  
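
The test above relies on placing data right before an unmapped region, so that a load widened past the end is expected to fault. A minimal POSIX-only sketch of that setup, assuming `half` is a multiple of the page size; `map_with_hole` is a hypothetical helper name, not part of the testcase:

```cpp
#include <cstddef>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

// Map 2 * half bytes, then unmap the upper half, so the last mapped byte
// is p[half - 1] and the address right after it is no longer mapped by us.
// Returns nullptr on failure.
char *map_with_hole(std::size_t half) {
  void *m = mmap(nullptr, 2 * half, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (m == MAP_FAILED)
    return nullptr;
  char *p = static_cast<char *>(m);
  if (munmap(p + half, half) < 0) {
    munmap(p, 2 * half);  // give up and release the whole mapping
    return nullptr;
  }
  return p;
}
```

With `half` set to 65536, the two bytes at `p + half - 2` play the role of `q[0]` and `q[1]` in the testcase.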

Re: [PATCH] Fix ICEs from expand_mul_overflow (PR target/82981)

2017-11-21 Thread Richard Biener
On Mon, 20 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> Apparently ARM can do the widening SImode multiply only using a libcall,
> so the recently changed expand_mul_overflow ignores it at first, then
> sees possibility to use the HImode code, but uses expand_simple_binop
> with OPTAB_DIRECT which requires actual HImode optabs, which ARM doesn't
> have, it needs to use widening to SImode.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok.

Richard.

> 2017-11-20  Jakub Jelinek  
> 
>   PR target/82981
>   * internal-fn.c (expand_mul_overflow): Use OPTAB_WIDEN instead of
>   OPTAB_DIRECT in calls to expand_simple_binop.
> 
> --- gcc/internal-fn.c.jj  2017-11-15 09:54:30.0 +0100
> +++ gcc/internal-fn.c 2017-11-20 16:38:55.185145957 +0100
> @@ -1760,7 +1760,7 @@ expand_mul_overflow (location_t loc, tre
> tem = convert_modes (mode, hmode, lopart, 1);
> tem = expand_shift (LSHIFT_EXPR, mode, tem, hprec, NULL_RTX, 1);
> tem = expand_simple_binop (mode, MINUS, loxhi, tem, NULL_RTX,
> -  1, OPTAB_DIRECT);
> +  1, OPTAB_WIDEN);
> emit_move_insn (loxhi, tem);
>  
> emit_label (after_hipart_neg);
> @@ -1774,7 +1774,7 @@ expand_mul_overflow (location_t loc, tre
>profile_probability::even ());
>  
> tem = expand_simple_binop (mode, MINUS, loxhi, larger, NULL_RTX,
> -  1, OPTAB_DIRECT);
> +  1, OPTAB_WIDEN);
> emit_move_insn (loxhi, tem);
>  
> emit_label (after_lopart_neg);
> @@ -1783,7 +1783,7 @@ expand_mul_overflow (location_t loc, tre
> /* loxhi += (uns) lo0xlo1 >> (bitsize / 2);  */
> tem = expand_shift (RSHIFT_EXPR, mode, lo0xlo1, hprec, NULL_RTX, 1);
> tem = expand_simple_binop (mode, PLUS, loxhi, tem, NULL_RTX,
> -  1, OPTAB_DIRECT);
> +  1, OPTAB_WIDEN);
> emit_move_insn (loxhi, tem);
>  
> /* if (loxhi >> (bitsize / 2)
> @@ -1810,7 +1810,7 @@ expand_mul_overflow (location_t loc, tre
>  convert_modes (hmode, mode, lo0xlo1, 1), 1);
>  
> tem = expand_simple_binop (mode, IOR, loxhishifted, tem, res,
> -  1, OPTAB_DIRECT);
> +  1, OPTAB_WIDEN);
> if (tem != res)
>   emit_move_insn (res, tem);
> emit_jump (done_label);
> @@ -1835,7 +1835,7 @@ expand_mul_overflow (location_t loc, tre
> if (!op0_medium_p)
>   {
> tem = expand_simple_binop (hmode, PLUS, hipart0, const1_rtx,
> -  NULL_RTX, 1, OPTAB_DIRECT);
> +  NULL_RTX, 1, OPTAB_WIDEN);
> do_compare_rtx_and_jump (tem, const1_rtx, GTU, true, hmode,
>  NULL_RTX, NULL, do_error,
>  profile_probability::very_unlikely ());
> @@ -1844,7 +1844,7 @@ expand_mul_overflow (location_t loc, tre
> if (!op1_medium_p)
>   {
> tem = expand_simple_binop (hmode, PLUS, hipart1, const1_rtx,
> -  NULL_RTX, 1, OPTAB_DIRECT);
> +  NULL_RTX, 1, OPTAB_WIDEN);
> do_compare_rtx_and_jump (tem, const1_rtx, GTU, true, hmode,
>  NULL_RTX, NULL, do_error,
>  profile_probability::very_unlikely ());
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
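
What "widening" means here can be shown with plain integers. The sketch below is only a conceptual analogue, not the optabs machinery: when an operation is unavailable in the narrow mode, it is carried out in a wider mode and the result is truncated. `sub16_via_widening` is a hypothetical name.

```cpp
#include <cstdint>

// Conceptual analogue of allowing a wider mode: do a 16-bit subtraction
// with 32-bit arithmetic and keep only the low 16 bits of the result.
std::uint16_t sub16_via_widening(std::uint16_t a, std::uint16_t b) {
  std::uint32_t wide =
      static_cast<std::uint32_t>(a) - static_cast<std::uint32_t>(b);
  return static_cast<std::uint16_t>(wide);  // truncate back to 16 bits
}
```

For addition, subtraction and bitwise OR the low bits of the wide result match what a native narrow operation would give, which is why the substitution is safe for the operations touched by the patch.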


Re: [PATCH] Fix ICE with __RTL and -g (PR debug/82933)

2017-11-21 Thread Richard Biener
On Mon, 20 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> The dwarf2out.c code relies on the assembly_start debug hook being
> invoked before any RTL is processed from final.c (and needs it to be
> done just once).  Normally it is called from cgraphunit.c, but when
> __RTL is seen with a starting pass, run_rtl_passes is called already
> from the FE.  While it would be better to defer the rtl finalization
> until cgraph says so, it might be quite hard, so instead this patch
> hacks dwarf2out_assembly_start so that it can be invoked multiple times
> (and does nothing on the 2nd+ call) and invokes it from the run_rtl_passes
> function too.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

works for me.

Richard.

> 2017-11-20  Jakub Jelinek  
> 
>   PR debug/82933
>   * run-rtl-passes.c: Include debug.h.
>   (run_rtl_passes): Call debug_hooks->assembly_start.
>   * dwarf2out.c (dwarf2out_assembly_start): Return early if invoked
>   multiple times.
> 
>   * gcc.dg/rtl/x86_64/pr82933.c: New test.
> 
> --- gcc/run-rtl-passes.c.jj   2017-01-24 23:29:09.0 +0100
> +++ gcc/run-rtl-passes.c  2017-11-20 17:36:31.320854900 +0100
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
>  #include "bitmap.h"
>  #include "df.h"
>  #include "regs.h"
> +#include "debug.h" /* for debug_hooks.  */
>  #include "insn-attr-common.h" /* for INSN_SCHEDULING.  */
>  #include "insn-attr.h" /* for init_sched_attrs.  */
>  #include "run-rtl-passes.h"
> @@ -43,6 +44,9 @@ run_rtl_passes (char *initial_pass_name)
>cfun->pass_startwith = initial_pass_name;
>max_regno = max_reg_num ();
>  
> +  /* cgraphunit.c normally handles this.  */
> +  (*debug_hooks->assembly_start) ();
> +
>/* Pass "expand" normally sets this up.  */
>  #ifdef INSN_SCHEDULING
>init_sched_attrs ();
> --- gcc/dwarf2out.c.jj2017-11-15 09:38:26.0 +0100
> +++ gcc/dwarf2out.c   2017-11-20 17:31:48.222394813 +0100
> @@ -27507,6 +27507,9 @@ dwarf2out_init (const char *filename ATT
>  static void
>  dwarf2out_assembly_start (void)
>  {
> +  if (text_section_line_info)
> +return;
> +
>  #ifndef DWARF2_LINENO_DEBUGGING_INFO
>ASM_GENERATE_INTERNAL_LABEL (text_section_label, TEXT_SECTION_LABEL, 0);
>ASM_GENERATE_INTERNAL_LABEL (text_end_label, TEXT_END_LABEL, 0);
> --- gcc/testsuite/gcc.dg/rtl/x86_64/pr82933.c.jj  2017-11-20 17:34:54.680063313 +0100
> +++ gcc/testsuite/gcc.dg/rtl/x86_64/pr82933.c 2017-11-20 17:35:26.361667161 +0100
> @@ -0,0 +1,4 @@
> +/* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-g" } */
> +
> +#include "into-cfglayout.c"
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
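
The early return added to `dwarf2out_assembly_start` is the usual way to make a one-time hook safe to call repeatedly. A minimal sketch of the pattern, assuming the guarded state is set later in the same function; `DebugState` and `assembly_start` are hypothetical stand-ins, not the GCC definitions:

```cpp
#include <memory>

// Hypothetical stand-ins: `line_info` plays the role of state that the
// first call allocates, `starts` counts how often the real work ran.
struct DebugState {
  std::unique_ptr<int> line_info;
  int starts = 0;
};

void assembly_start(DebugState &st) {
  if (st.line_info)  // already initialized by an earlier call
    return;
  st.line_info.reset(new int(0));
  ++st.starts;
}
```

Both the normal caller and the early `__RTL` path can then invoke the hook without coordinating with each other.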


Re: [PATCH] Avoid static initialization in the strlen pass

2017-11-21 Thread Richard Biener
On Mon, 20 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> All the hash_maps in tree-ssa-strlen.c except for the newly added one
> were pointers to hash maps, which were constructed either lazily or during
> the pass.  But strlen_to_stridx is now constructed at the compiler start,
> which is something I'd prefer to avoid, it affects even -O0 that way and
> empty/small file compilation, something e.g. the kernel folks care so much
> about.
> 
> Apparently the hash map is only needed when one of the two warnings
> is enabled, so this patch initializes it only in that case and otherwise
> doesn't fill it or query it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> 2017-11-20  Jakub Jelinek  
> 
>   * tree-ssa-strlen.c (strlen_to_stridx): Change into a pointer to
>   hash_map.
>   (handle_builtin_strlen, strlen_optimize_stmt): Only access it
>   if non-NULL, instead of . use ->.
>   (handle_builtin_stxncpy): Return early if strlen_to_stridx
>   is NULL.  Spelling fix.  Instead of . use ->.
>   (pass_strlen::execute): Allocate strlen_to_stridx if
>   warn_stringop_{truncation,overflow}.  Instead of calling empty on it
>   delete it and clear it at the end of the pass.
> 
> --- gcc/tree-ssa-strlen.c.jj  2017-11-15 09:40:03.0 +0100
> +++ gcc/tree-ssa-strlen.c 2017-11-20 18:10:42.565458585 +0100
> @@ -153,7 +153,7 @@ struct decl_stridxlist_map
>  static hash_map *decl_to_stridxlist_htab;
>  
>  typedef std::pair stridx_strlenloc;
> -static hash_map strlen_to_stridx;
> +static hash_map *strlen_to_stridx;
>  
>  /* Obstack for struct stridxlist and struct decl_stridxlist_map.  */
>  static struct obstack stridx_obstack;
> @@ -1207,8 +1207,11 @@ handle_builtin_strlen (gimple_stmt_itera
> gcc_assert (si->full_string_p);
>   }
>  
> -   location_t loc = gimple_location (stmt);
> -   strlen_to_stridx.put (lhs, stridx_strlenloc (idx, loc));
> +   if (strlen_to_stridx)
> + {
> +   location_t loc = gimple_location (stmt);
> +   strlen_to_stridx->put (lhs, stridx_strlenloc (idx, loc));
> + }
> return;
>   }
>  }
> @@ -1253,8 +1256,11 @@ handle_builtin_strlen (gimple_stmt_itera
>set_strinfo (idx, si);
>find_equal_ptrs (src, idx);
>  
> -  location_t loc = gimple_location (stmt);
> -  strlen_to_stridx.put (lhs, stridx_strlenloc (idx, loc));
> +  if (strlen_to_stridx)
> + {
> +   location_t loc = gimple_location (stmt);
> +   strlen_to_stridx->put (lhs, stridx_strlenloc (idx, loc));
> + }
>  }
>  }
>  
> @@ -1909,6 +1915,9 @@ maybe_diag_stxncpy_trunc (gimple_stmt_it
>  static void
>  handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *gsi)
>  {
> +  if (strlen_to_stridx == NULL)
> +return;
> +
>gimple *stmt = gsi_stmt (*gsi);
>  
>bool with_bounds = gimple_call_with_bounds_p (stmt);
> @@ -1917,9 +1926,9 @@ handle_builtin_stxncpy (built_in_functio
>tree len = gimple_call_arg (stmt, with_bounds ? 3 : 2);
>  
>/* If the length argument was computed from strlen(S) for some string
> - S retrieve the strinfo index for the string (PSS->FIRST) alonng with
> + S retrieve the strinfo index for the string (PSS->FIRST) along with
>   the location of the strlen() call (PSS->SECOND).  */
> -  stridx_strlenloc *pss = strlen_to_stridx.get (len);
> +  stridx_strlenloc *pss = strlen_to_stridx->get (len);
>if (!pss || pss->first <= 0)
>  {
>if (maybe_diag_stxncpy_trunc (*gsi, src, len))
> @@ -2966,9 +2975,12 @@ strlen_optimize_stmt (gimple_stmt_iterat
> fold_strstr_to_strncmp (gimple_assign_rhs1 (stmt),
> gimple_assign_rhs2 (stmt), stmt);
>  
> - tree rhs1 = gimple_assign_rhs1 (stmt);
> - if (stridx_strlenloc *ps = strlen_to_stridx.get (rhs1))
> -   strlen_to_stridx.put (lhs, stridx_strlenloc (*ps));
> + if (strlen_to_stridx)
> +   {
> + tree rhs1 = gimple_assign_rhs1 (stmt);
> + if (stridx_strlenloc *ps = strlen_to_stridx->get (rhs1))
> +   strlen_to_stridx->put (lhs, stridx_strlenloc (*ps));
> +   }
>}
>  else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
>   {
> @@ -3202,6 +3214,9 @@ pass_strlen::execute (function *fun)
>ssa_ver_to_stridx.safe_grow_cleared (num_ssa_names);
>max_stridx = 1;
>  
> +  if (warn_stringop_truncation || warn_stringop_overflow)
> +strlen_to_stridx = new hash_map (64);
> +
>calculate_dominance_info (CDI_DOMINATORS);
>  
>/* String length optimization is implemented as a walk of the dominator
> @@ -3220,7 +3235,11 @@ pass_strlen::execute (function *fun)
>laststmt.len = NULL_TREE;
>laststmt.stridx = 0;
>  
> -  strlen_to_stridx.empty ();
> +  if (strlen_to_stridx)
> +{
> +  delete strlen_to_stridx;
> +  strlen_to_stridx = NULL;
> +}
>  
>return 0;
>  }
> 
>   Jakub
> 
> 

-- 
Richa
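
The shape of the change, a map that exists only while the pass runs and only when a consumer is enabled, can be sketched with the standard library. This is an illustration under assumptions: `len_to_idx`, `record` and `run_pass` are hypothetical names, and `std::unordered_map` stands in for GCC's `hash_map`.

```cpp
#include <unordered_map>

// A plain pointer needs no constructor to run at program start.
static std::unordered_map<int, int> *len_to_idx;

void record(int len, int idx) {
  if (len_to_idx)  // skip the bookkeeping when the map is absent
    (*len_to_idx)[len] = idx;
}

// Returns how many entries were recorded during this run.
int run_pass(bool warnings_enabled) {
  if (warnings_enabled)
    len_to_idx = new std::unordered_map<int, int>();
  record(1, 42);
  int recorded = len_to_idx ? static_cast<int>(len_to_idx->size()) : 0;
  delete len_to_idx;  // deleting a null pointer is a no-op
  len_to_idx = nullptr;
  return recorded;
}
```

When no warning needs the data, the pass neither allocates nor fills the map, which is the point made in the description above.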

Re: Patch ping

2017-11-21 Thread Jakub Jelinek
On Mon, Nov 20, 2017 at 01:31:39PM -0500, Nathan Sidwell wrote:
> On 11/20/2017 02:55 AM, Jakub Jelinek wrote:
> 
> >http://gcc.gnu.org/ml/gcc-patches/2017-11/msg00851.html
> >C++2A P0428R2 - familiar template syntax for generic lambdas
> 
> Are there existing testcases checking this is permitted w/o warning under
> the appropriate circumstances?  That seems to be the only thing missing from
> the patch itself.

At least for check-c++-all I believe it is mostly covered.  E.g.
g++.dg/cpp1y/lambda-generic-x.C has:
// { dg-warning "lambda templates are only available with" "" { target c++17_down } }
which checks that with -Wpedantic there is a warning emitted in -std=gnu++14
and -std=gnu++17, but not with -std=gnu++2a.  Similarly several other
testcases.
And g++.dg/cpp2a/lambda-generic1.C has:
// { dg-error "lambda templates are only available with" "" { target c++17_down } }
that verifies that with -pedantic-errors in -std=c++14 and -std=c++17
an error is emitted and in -std=c++2a no error/warning is diagnosed.

So, what isn't tested is: 1) behavior for C++98/C++11 2) -Wno-pedantic
behavior 3) at least one test that would catch it even without check-c++-all
for -std=c++2a

Therefore, I've added 3 new small tests that are compiled by all language
variants and check -pedantic-errors, -Wpedantic and -Wno-pedantic behavior
and one that forces -std=c++2a into dg-options and thus works even without
check-c++-all and committed.  Thanks.
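
For reference, a minimal example of the syntax P0428R2 adds, written as a sketch rather than taken from the new tests; `count_elements` is a hypothetical name. In C++14 and C++17 modes a compiler that accepts this as an extension may diagnose it under -Wpedantic or -pedantic-errors, as described above.

```cpp
#include <cstddef>
#include <vector>

// An explicit template parameter list on a lambda lets the body name T
// directly and constrains the argument to be some std::vector<T>.
std::size_t count_elements(const std::vector<int> &v) {
  auto size_of = []<typename T>(const std::vector<T> &c) { return c.size(); };
  return size_of(v);
}
```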

2017-11-21  Jakub Jelinek  

P0428R2 - familiar template syntax for generic lambdas
* parser.c (cp_parser_lambda_declarator_opt): Don't pedwarn
for cxx2a and above, reword pedwarn for C++14/C++17.

* g++.dg/cpp1y/lambda-generic-x.C: Adjust warnings and limit
to c++17_down target.
* g++.dg/cpp1y/lambda-generic-dep.C: Likewise.
* g++.dg/cpp1y/lambda-generic-77914.C: Adjust error and limit
to c++17_down target.
* g++.dg/cpp2a/lambda-generic1.C: New test.
* g++.dg/cpp2a/lambda-generic2.C: New test.
* g++.dg/cpp2a/lambda-generic3.C: New test.
* g++.dg/cpp2a/lambda-generic4.C: New test.
* g++.dg/cpp2a/lambda-generic5.C: New test.

--- gcc/cp/parser.c.jj  2017-11-20 19:56:04.787470779 +0100
+++ gcc/cp/parser.c 2017-11-21 09:15:30.127267088 +0100
@@ -10512,9 +10512,10 @@ cp_parser_lambda_declarator_opt (cp_pars
pedwarn (parser->lexer->next_token->location, 0,
 "lambda templates are only available with "
 "-std=c++14 or -std=gnu++14");
-  else
+  else if (cxx_dialect < cxx2a)
pedwarn (parser->lexer->next_token->location, OPT_Wpedantic,
-"ISO C++ does not support lambda templates");
+"lambda templates are only available with "
+"-std=c++2a or -std=gnu++2a");
 
   cp_lexer_consume_token (parser->lexer);
 
--- gcc/testsuite/g++.dg/cpp1y/lambda-generic-x.C.jj    2017-11-10 15:42:02.371152517 +0100
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-x.C   2017-11-21 09:15:30.147266840 +0100
@@ -6,17 +6,17 @@
 
 int main()
 {
-   auto glambda = []  (A a, B&& b) { return a < b; };  // { dg-warning "does not support lambda templates" }
+   auto glambda = []  (A a, B&& b) { return a < b; };  // { dg-warning "lambda templates are only available with" "" { target c++17_down } }
bool b = glambda(3, 3.14); // OK
-   auto vglambda = []  (P printer) {   // { dg-warning "does not support lambda templates" }
+   auto vglambda = []  (P printer) {   // { dg-warning "lambda templates are only available with" "" { target c++17_down } }
  return [=]  (T&& ... ts) { // OK: ts is a function parameter pack
-   printer(std::forward(ts)...); // { dg-warning "does not support lambda templates" "" { target *-*-* } .-1 }
+   printer(std::forward(ts)...); // { dg-warning "lambda templates are only available with" "" { target c++17_down } .-1 }
return [=]() {
  printer(ts ...);
};
  };
};
-   auto p = vglambda( []  (A v1, B v2, C v3)
  { std::cout << v1 << v2 << v3; } );
--- gcc/testsuite/g++.dg/cpp1y/lambda-generic-dep.C.jj  2017-11-10 15:42:02.394152236 +0100
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-dep.C 2017-11-21 09:15:30.16126 +0100
@@ -27,7 +27,7 @@ struct S {
 
 int main()
 {
-  auto f = []  (T const& s) mutable {  // { dg-warning "does not support lambda templates" }
+  auto f = []  (T const& s) mutable {  // { dg-warning "lambda templates are only available with" "" { target c++17_down } }
 typename T::N x;
 return x.test ();
   };
--- gcc/testsuite/g++.dg/cpp1y/lambda-generic-77914.C.jj    2017-11-10 15:42:02.336152944 +0100
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-77914.C   2017-11-21 09:15:30.169266567 +0100
@@ -4,6 +4,6 @@
 int
 main ()
 {
-  auto l = []  () {};  // { dg

[Ping][PATCH v3] Fix Incorrect ASan global variables alignment on arm (PR sanitizer/81697)

2017-11-21 Thread Maxim Ostapenko

Hi,

I would like to ping the following patch:
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02288.html

Thanks,
-Maxim
gcc/ChangeLog:

2017-11-21  Maxim Ostapenko  

	PR sanitizer/81697
	* asan.c (asan_protect_global): Add new ignore_decl_rtl_set_p
	parameter. Return true if ignore_decl_rtl_set_p is true and other
	conditions are satisfied.
	* asan.h (asan_protect_global): Add new parameter.
	* varasm.c (categorize_decl_for_section): Pass true as second parameter
	to asan_protect_global calls.

gcc/testsuite/ChangeLog:

2017-11-21  Maxim Ostapenko  

	PR sanitizer/81697
	* g++.dg/asan/global-alignment.C: New test.

diff --git a/gcc/asan.c b/gcc/asan.c
index d5128aa..78c3b60 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1605,7 +1605,7 @@ is_odr_indicator (tree decl)
ASAN_RED_ZONE_SIZE bytes.  */
 
 bool
-asan_protect_global (tree decl)
+asan_protect_global (tree decl, bool ignore_decl_rtl_set_p)
 {
   if (!ASAN_GLOBALS)
 return false;
@@ -1627,7 +1627,13 @@ asan_protect_global (tree decl)
   || DECL_THREAD_LOCAL_P (decl)
   /* Externs will be protected elsewhere.  */
   || DECL_EXTERNAL (decl)
-  || !DECL_RTL_SET_P (decl)
+  /* PR sanitizer/81697: For architectures that use section anchors first
+	 call to asan_protect_global may occur before DECL_RTL (decl) is set.
+	 We should ignore DECL_RTL_SET_P then, because otherwise the first call
+	 to asan_protect_global will return FALSE and the following calls on the
+	 same decl after setting DECL_RTL (decl) will return TRUE and we'll end
+	 up with inconsistency at runtime.  */
+  || (!DECL_RTL_SET_P (decl) && !ignore_decl_rtl_set_p)
   /* Comdat vars pose an ABI problem, we can't know if
 	 the var that is selected by the linker will have
 	 padding or not.  */
@@ -1651,6 +1657,9 @@ asan_protect_global (tree decl)
   || is_odr_indicator (decl))
 return false;
 
+  if (ignore_decl_rtl_set_p)
+return true;
+
   rtl = DECL_RTL (decl);
   if (!MEM_P (rtl) || GET_CODE (XEXP (rtl, 0)) != SYMBOL_REF)
 return false;
diff --git a/gcc/asan.h b/gcc/asan.h
index c82d4d9..885b47e 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -26,7 +26,7 @@ extern void asan_finish_file (void);
 extern rtx_insn *asan_emit_stack_protection (rtx, rtx, unsigned int,
 	 HOST_WIDE_INT *, tree *, int);
 extern rtx_insn *asan_emit_allocas_unpoison (rtx, rtx, rtx_insn *);
-extern bool asan_protect_global (tree);
+extern bool asan_protect_global (tree, bool ignore_decl_rtl_set_p = false);
 extern void initialize_sanitizer_builtins (void);
 extern tree asan_dynamic_init_call (bool);
 extern bool asan_expand_check_ifn (gimple_stmt_iterator *, bool);
diff --git a/gcc/testsuite/g++.dg/asan/global-alignment.C b/gcc/testsuite/g++.dg/asan/global-alignment.C
new file mode 100644
index 000..84dac37
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/global-alignment.C
@@ -0,0 +1,18 @@
+/* { dg-options "-fmerge-all-constants" } */
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+#include 
+#include 
+
+const char kRecoveryInstallString[] = "NEW";
+const char kRecoveryUpdateString[] = "UPDATE";
+const char kRecoveryUninstallationString[] = "UNINSTALL";
+
+const std::map kStringToRequestMap = {
+  {kRecoveryInstallString, 0},
+  {kRecoveryUpdateString, 0},
+  {kRecoveryUninstallationString, 0},
+};
+
+/* { dg-final { scan-assembler-times {\.section\s+\.rodata\n(?:(?!\.section).)*\.\w+\s+"NEW} 1 } } */
diff --git a/gcc/varasm.c b/gcc/varasm.c
index a139151..849eae0 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6508,7 +6508,7 @@ categorize_decl_for_section (const_tree decl, int reloc)
   else if (TREE_CODE (decl) == STRING_CST)
 {
   if ((flag_sanitize & SANITIZE_ADDRESS)
-	  && asan_protect_global (CONST_CAST_TREE (decl)))
+	  && asan_protect_global (CONST_CAST_TREE (decl), true))
   /* or !flag_merge_constants */
 return SECCAT_RODATA;
   else
@@ -6536,7 +6536,7 @@ categorize_decl_for_section (const_tree decl, int reloc)
 	ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL : SECCAT_DATA_REL_RO;
   else if (reloc || flag_merge_constants < 2
 	   || ((flag_sanitize & SANITIZE_ADDRESS)
-		   && asan_protect_global (CONST_CAST_TREE (decl))))
+		   && asan_protect_global (CONST_CAST_TREE (decl), true)))
 	/* C and C++ don't allow different variables to share the same
 	   location.  -fmerge-all-constants allows even that (at the
 	   expense of not conforming).  */
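
The inconsistency described in the asan.c comment comes from a predicate whose answer depends on state that is filled in between two calls. A simplified model, assuming all other checks are folded into one flag; `Decl` and `protect_global` are hypothetical names, and the real function performs further checks on the RTL that this sketch omits:

```cpp
// Hypothetical model of a declaration: `rtl_set` becomes true at some
// point during compilation, `eligible` stands for all the other checks.
struct Decl {
  bool eligible;
  bool rtl_set;
};

// Without the flag, the answer for the same decl can change from false to
// true once rtl_set flips.  With the flag, an early caller gets the answer
// that later calls will also give.
bool protect_global(const Decl &d, bool ignore_rtl_set = false) {
  if (!d.eligible)
    return false;
  if (!d.rtl_set && !ignore_rtl_set)
    return false;
  return true;
}
```

In the patch only the section-categorization callers pass `true`, so existing callers keep the old behavior through the default argument.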


Re: [PATCH v3 1/14] D: The front-end (DMD) language implementation and license.

2017-11-21 Thread Iain Buclaw
On 13 November 2017 at 00:20, Andrei Alexandrescu  wrote:
> On 11/06/2017 01:46 PM, Iain Buclaw wrote:
>>
>> On 25 October 2017 at 03:06, Jeff Law  wrote:
>>>
>>> On 10/18/2017 01:33 AM, Iain Buclaw wrote:

>>>> On 6 October 2017 at 14:51, Ian Lance Taylor wrote:
>>>>>
>>>>> On Fri, Oct 6, 2017 at 1:34 AM, Iain Buclaw wrote:
>>>>>>
>>>>>>
>>>>>> Out of curiosity, I did have a look at some of the tops of gofrontend
>>>>>> sources this morning.  They are all copyright the Go Authors, and are
>>>>>> licensed as BSD.  So I'm not sure if having copyright FSF and
>>>>>> distributing under GPL is strictly required.  And from a maintenance
>>>>>> point of view, it would be easier to merge in upstream changes as-is
>>>>>> without some diff/merging tool.
>>>>>
>>>>>
>>>>> The GCC steering committee accepted the gofrontend code under a
>>>>> non-GPL license with the understanding that the master code would live
>>>>> in a separate repository that would be mirrored into the GCC repo (the
>>>>> master repository for gofrontend is currently at
>>>>> https://go.googlesource.com/gofrontend/).  Personally I don't see a
>>>>> problem with doing the same for the D frontend.
>>>>>
>>>>> Ian
>>>>
>>>>
>>>> Should I request that maybe Donald from FSF chime in here?  I'd rather
>>>> avoid another stalemate on this.
>>>
>>> Absolutely, though RMS should probably be included on any discussion
>>> with Donald.  I think the FSF needs to chime in and I think the steering
>>> committee needs to chime in once we've got guidance from the FSF.
>>>
>>> The first and most important question that needs to be answered is
>>> whether or not the FSF would be OK including the DMD bits with the
>>> license (boost) as-is into GCC.
>>>
>>> If that's not acceptable, then we'd have to look at some kind of script
>>> to fix the copyrights.
>>> Jeff
>>>
>>
>> Assuming then, that we'll ship with all copyright notices amended to
>> be copyright FSF and GPL licensed - that can be fixed up in a later
>> patch - is there anything further needed to push this review process
>> further?
>>
>> Iain.
>
>
> Hi Jeff, Ian, Joseph: thanks for your consideration. Is there anything we
> can do on our side to move things forward? Please advise, thanks!
>
> Andrei
>

Ping?

I was recently made aware that upstream DMD has a pending patch to
switch copyright ownership of all its sources to "The D Language
Foundation", however it now seems blocked pending on the outcome here.

Iain.


Re: [PATCH] rs6000: Don't touch below the stack pointer (PR77687)

2017-11-21 Thread Olivier Hainque

> On Nov 20, 2017, at 21:16 , Segher Boessenkool wrote:
> 
> I backported this to GCC 7 now.

Nice, thanks Segher!



Re: Adjust empty class parameter passing ABI (PR c++/60336)

2017-11-21 Thread Uros Bizjak
On Mon, Nov 20, 2017 at 4:51 PM, Marek Polacek  wrote:
> On Thu, Nov 16, 2017 at 02:20:59PM -0500, Jason Merrill wrote:
>> On Thu, Nov 16, 2017 at 12:41 PM, Marek Polacek  wrote:
>> > On Tue, Nov 14, 2017 at 07:34:54AM +0100, Richard Biener wrote:
>> >> On November 14, 2017 6:21:41 AM GMT+01:00, Jason Merrill wrote:
>> >> >On Mon, Nov 13, 2017 at 1:02 PM, Marek Polacek wrote:
>> >> >> In the end I did two bootstraps with the patch, but modified one of them
>> >> >> to always return false for ix86_is_empty_record.  Then I compared all the
>> >> >> *.o in both dirs.  The result is attached.  Then I looked at DW_AT_producer
>> >> >> for all these .o that differ; all of them are C++.  Is this enough to
>> >> >> clear our concerns?
>> >> >
>> >> >Hmm, a bunch of these are right at the beginning, bytes 41 and 65, in
>> >> >the header.
>> >> >
>> >> >Did you build them in the different trunk/trunk2 directories?  I think
>> >> >Jakub was suggesting building them in the same directory.
>> >> >> And I also ran a bootstrap with --enable-cxx-flags=-Wabi=11, and
>> >> >didn't
>> >> >> see any warnings.
>> >> >
>> >> >If there's a codegen change, there ought to be a warning to go along
>> >> >with it.
>> >>
>> >> The question was of course also for unintended changes but yes (I was 
>> >> mainly concerned by libstdc++ ABI changes).
>> >
>> > Ok, I did two bootstraps in the same dir, one with ix86_is_empty_record 
>> > always
>> > returning false.  There were a few object files that differ in their 
>> > assembly
>> > between those two bootstraps.  Previously I didn't see any warnings because
>> > I hadn't thought of -Wsystem-headers.  Also, we intentionally don't warn if
>> > the empty parameter is the last one:
>> >
>> > +  bool seen_empty_type = false;
>> > +  FOREACH_FUNCTION_ARGS (fntype, argtype, iter)
>> > +   {
>> > + if (VOID_TYPE_P (argtype))
>> > +   break;
>> > + if (TYPE_EMPTY_P (argtype))
>> > +   seen_empty_type = true;
>> > + else if (seen_empty_type)
>> > +   {
>> > + cum->warn_empty = true;
>> > + break;
>> > +   }
>> > +   }
>> >
>> > After enabling -Wsystem-headers and tweaking the code above so that we warn
>> > even if the empty parameter is trailing I can see the warnings that 
>> > correspond
>> > to the assembly changes.  Below is a summary of what I found.  TL;DR: I 
>> > don't
>> > see any unintended changes.
>>
>> Looks good to me.
>
> Thanks!
>
> Richi, are you ok with the patch now?
> Honza/Uros, are the config/i386/* changes ok?
>
> The last version of the patch is
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00969.html

LGTM for x86 part.

Thanks,
Uros.
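
For readers skimming the thread, the warning condition in the loop Marek quoted can be modelled outside of GCC. The following is only a sketch under my own assumptions: the function name and the vector-of-flags input are invented, and each flag stands in for what TYPE_EMPTY_P would report for one parameter. It shows why a trailing empty parameter does not trigger the warning.

```cpp
#include <cassert>
#include <vector>

// Simplified model of the quoted loop (not GCC code).  Each entry says
// whether the corresponding parameter has an empty type.  Returns true
// only when a non-empty parameter follows an empty one, which is the
// case where the quoted code sets cum->warn_empty.
bool should_warn_empty(const std::vector<bool>& param_is_empty)
{
  bool seen_empty_type = false;
  for (bool is_empty : param_is_empty)
    {
      if (is_empty)
        seen_empty_type = true;
      else if (seen_empty_type)
        return true;   // a later non-empty argument follows an empty one
    }
  return false;        // no empty parameter, or only trailing ones
}
```

With this model, (empty, int) warns while (int, empty) does not, matching the "we intentionally don't warn if the empty parameter is the last one" remark.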


Re: [PATCH] Fix ICEs from expand_mul_overflow (PR target/82981)

2017-11-21 Thread Christophe Lyon
On 21 November 2017 at 09:14, Richard Biener  wrote:
> On Mon, 20 Nov 2017, Jakub Jelinek wrote:
>
>> Hi!
>>
>> Apparently ARM can do the widening SImode multiply only using a libcall,
>> so the recently changed expand_mul_overflow ignores it at first, then
>> sees possibility to use the HImode code, but uses expand_simple_binop
>> with OPTAB_DIRECT which requires actual HImode optabs, which ARM doesn't
>> have, it needs to use widening to SImode.
>>
>> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>> trunk?
>
> Ok.
>

Thanks Jakub, I had run validations with the patch you attached in
bugzilla, and it does fix the problem I reported.

Thanks,

Christophe

> Richard.
>
>> 2017-11-20  Jakub Jelinek  
>>
>>   PR target/82981
>>   * internal-fn.c (expand_mul_overflow): Use OPTAB_WIDEN instead of
>>   OPTAB_DIRECT in calls to expand_simple_binop.
>>
>> --- gcc/internal-fn.c.jj  2017-11-15 09:54:30.0 +0100
>> +++ gcc/internal-fn.c 2017-11-20 16:38:55.185145957 +0100
>> @@ -1760,7 +1760,7 @@ expand_mul_overflow (location_t loc, tre
>> tem = convert_modes (mode, hmode, lopart, 1);
>> tem = expand_shift (LSHIFT_EXPR, mode, tem, hprec, NULL_RTX, 1);
>> tem = expand_simple_binop (mode, MINUS, loxhi, tem, NULL_RTX,
>> -  1, OPTAB_DIRECT);
>> +  1, OPTAB_WIDEN);
>> emit_move_insn (loxhi, tem);
>>
>> emit_label (after_hipart_neg);
>> @@ -1774,7 +1774,7 @@ expand_mul_overflow (location_t loc, tre
>>profile_probability::even ());
>>
>> tem = expand_simple_binop (mode, MINUS, loxhi, larger, NULL_RTX,
>> -  1, OPTAB_DIRECT);
>> +  1, OPTAB_WIDEN);
>> emit_move_insn (loxhi, tem);
>>
>> emit_label (after_lopart_neg);
>> @@ -1783,7 +1783,7 @@ expand_mul_overflow (location_t loc, tre
>> /* loxhi += (uns) lo0xlo1 >> (bitsize / 2);  */
>> tem = expand_shift (RSHIFT_EXPR, mode, lo0xlo1, hprec, NULL_RTX, 1);
>> tem = expand_simple_binop (mode, PLUS, loxhi, tem, NULL_RTX,
>> -  1, OPTAB_DIRECT);
>> +  1, OPTAB_WIDEN);
>> emit_move_insn (loxhi, tem);
>>
>> /* if (loxhi >> (bitsize / 2)
>> @@ -1810,7 +1810,7 @@ expand_mul_overflow (location_t loc, tre
>>  convert_modes (hmode, mode, lo0xlo1, 1), 1);
>>
>> tem = expand_simple_binop (mode, IOR, loxhishifted, tem, res,
>> -  1, OPTAB_DIRECT);
>> +  1, OPTAB_WIDEN);
>> if (tem != res)
>>   emit_move_insn (res, tem);
>> emit_jump (done_label);
>> @@ -1835,7 +1835,7 @@ expand_mul_overflow (location_t loc, tre
>> if (!op0_medium_p)
>>   {
>> tem = expand_simple_binop (hmode, PLUS, hipart0, const1_rtx,
>> -  NULL_RTX, 1, OPTAB_DIRECT);
>> +  NULL_RTX, 1, OPTAB_WIDEN);
>> do_compare_rtx_and_jump (tem, const1_rtx, GTU, true, hmode,
>>  NULL_RTX, NULL, do_error,
>>  profile_probability::very_unlikely 
>> ());
>> @@ -1844,7 +1844,7 @@ expand_mul_overflow (location_t loc, tre
>> if (!op1_medium_p)
>>   {
>> tem = expand_simple_binop (hmode, PLUS, hipart1, const1_rtx,
>> -  NULL_RTX, 1, OPTAB_DIRECT);
>> +  NULL_RTX, 1, OPTAB_WIDEN);
>> do_compare_rtx_and_jump (tem, const1_rtx, GTU, true, hmode,
>>  NULL_RTX, NULL, do_error,
>>  profile_probability::very_unlikely 
>> ());
>>
>>   Jakub
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
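
As I understand the change above, OPTAB_DIRECT requires an instruction pattern in exactly the requested mode, whereas OPTAB_WIDEN also allows the operation to be carried out in a wider mode and the result narrowed again. That is safe for the PLUS, MINUS and IOR operations in the patch because the low bits of the wide result do not depend on the high bits of the inputs. A rough model in plain C++ follows; the function names are invented and this is not how GCC implements it internally.

```cpp
#include <cassert>
#include <cstdint>

// Compute a 16-bit subtraction by widening both operands to 32 bits,
// subtracting there, and truncating back to 16 bits.
std::uint16_t sub16_via_widening(std::uint16_t a, std::uint16_t b)
{
  std::uint32_t wide = static_cast<std::uint32_t>(a)
                       - static_cast<std::uint32_t>(b);
  return static_cast<std::uint16_t>(wide);
}

// Same idea for addition: the carry out of bit 15 is simply dropped.
std::uint16_t add16_via_widening(std::uint16_t a, std::uint16_t b)
{
  std::uint32_t wide = static_cast<std::uint32_t>(a)
                       + static_cast<std::uint32_t>(b);
  return static_cast<std::uint16_t>(wide);
}
```

The truncated results wrap exactly as a native 16-bit operation would, for example 1 - 2 gives 0xFFFF.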


Re: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.

2017-11-21 Thread Charles Baylis
On 20 November 2017 at 21:09, Charles Baylis  wrote:

> I have modified this patch accordingly. Patch 1 is no longer needed.
>
> Passes "make check" (with patch 3) on arm-linux-gnueabihf with no
> regressions. Bootstrap is in progress.

Bootstrap built successfully using qemu host.

> Can I still get this in during stage 3?
>
> gcc/ChangeLog:
>
>   Charles Baylis  
>
> * config/arm/arm.c (arm_mem_costs): New function.
> (arm_rtx_costs_internal): Use arm_mem_costs.
>
> gcc/testsuite/ChangeLog:
>
>   Charles Baylis  
>
> * gcc.target/arm/addr-modes-float.c: New test.
> * gcc.target/arm/addr-modes-int.c: New test.
> * gcc.target/arm/addr-modes.h: New header.


Re: [patch] Add support for #pragma GCC unroll

2017-11-21 Thread Eric Botcazou
> First of all, the structuring in this section is screwed up.  The
> discussion and examples for the previous item (#pragma ivdep) should be
> moved inside the @table so that you don't have to introduce another
> @table here, just insert another entry into the existing one.

That's also the case for the entire subsection just above, namely "Function 
Specific Option Pragmas".  I presume the tables must be merged there too?

> Second, we shouldn't be talking about "the programmer" in the third
> person; programmers are "you", the readers of the manual.  The paragraph
> structure and phrasing seem awkward as well.  How about something like
> this instead?
> 
> You can use this pragma to control how many times a loop should be
> unrolled.  It must be placed immediately before a @code{for},
> @code{while} or @code{do} loop or a @samp{#pragma ivdep}, and applies
> only to the loop that follows.  @var{n} is an integer constant
> expression; a value of 0 or 1 disables unrolling of the loop.

Thanks, integrated into the patch.
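
A small usage sketch of the pragma as described in the proposed wording, assuming a compiler that implements it (others will typically just warn about an unknown pragma). The function name and the factor 4 are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Sum N ints.  The pragma is placed immediately before the loop and
// applies only to that loop; it asks for an unroll factor of 4.
long sum_unrolled(const int* data, std::size_t n)
{
  assert(data != nullptr || n == 0);
  long total = 0;
#pragma GCC unroll 4
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

Per the wording above, replacing 4 with 0 or 1 would disable unrolling of that loop; the result of the function is the same either way.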

-- 
Eric Botcazou


Re: [patch] Add support for #pragma GCC unroll

2017-11-21 Thread Eric Botcazou
> The documentation for the directive is missing indeed. We can fix this
> during stage3.

Someone who speaks Fortran will have to write it down...

> Currently the directive works on the whole function (see
> gfc_cfun_has_unroll()) and instructs the loop-optimizers to run on
> that function.

gfc_cfun_has_unroll is superfluous and has already been dropped because the 
flag will be set by the middle-end, but this doesn't change the behavior.

> The loop-optimizers will discover the ANNOTATE_EXPR and act accordingly.
> Richard B. already noted that the RTL unroller might do more than
> intended, see https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01468.html
> I expect updates to the C and C++ in this area to be reflected to Fortran
> too.

Yes, it's a generic issue.

-- 
Eric Botcazou


Re: [PING][patch] PR81794: have "would be stringified in traditional C" warning in libcpp/macro.c be controlled by -Wtraditional

2017-11-21 Thread Bernhard Reutner-Fischer
On Mon, Nov 20, 2017 at 08:03:23PM -0500, David Malcolm wrote:
 
> [1] FWIW the script I use for this is here:
>   https://github.com/davidmalcolm/gcc-build

NUM_CORES=$(getconf _NPROCESSORS_ONLN || echo 1)

You usually don't want to count offline processors.
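
The same distinction between online and configured processors is visible from C/C++ through sysconf. This is only a sketch: _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF are common extensions (glibc, the BSDs) rather than something every system guarantees, and the helper names are mine.

```cpp
#include <cassert>
#include <unistd.h>

// Processors currently online, falling back to 1 if the query fails,
// mirroring the "|| echo 1" in the shell line above.
long online_cpus()
{
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  return n > 0 ? n : 1;
}

// Processors configured in the system, which may include offline ones.
long configured_cpus()
{
  long n = sysconf(_SC_NPROCESSORS_CONF);
  return n > 0 ? n : 1;
}
```

For choosing a parallel job count, online_cpus() is the one that corresponds to the suggestion above.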

and in create_src, instead of fiddling with your user settings per tree copy I'd
cat >> ~/.gitconfig 

Re: [RFC][PATCH] Change default to -fcommon

2017-11-21 Thread Eric Botcazou
> There is one use in Ada which looks like an optimization for specific
> targets:
> 
>   /* Ada doesn't feature Fortran-like COMMON variables so we shouldn't
>  try to fiddle with DECL_COMMON.  However, on platforms that don't
>  support global BSS sections, uninitialized global variables would
>  go in DATA instead, thus increasing the size of the executable.  */
>   if (!flag_no_common
>   && TREE_CODE (var_decl) == VAR_DECL
>   && TREE_PUBLIC (var_decl)
>   && !have_global_bss_p ())
> DECL_COMMON (var_decl) = 1;

It's for Darwin - you need to evaluate your patch on Darwin.

> I don't understand how this works - if there is no bss support in the
> linker, wouldn't common variables still end up in the data section?

There is, it's essentially a syntactic issue in the assembler IIRC.

-- 
Eric Botcazou


Re: [patch] implement generic debug() for vectors and hash sets

2017-11-21 Thread Gerald Pfeifer
On Mon, 20 Nov 2017, Aldy Hernandez wrote:
> Minor oversight...

Actually, there appears to be another issue related to this when
bootstrapping with clang 3.4.1 (FreeBSD 10.4):


/scratch/tmp/gerald/GCC-HEAD/gcc/print-rtl.c:982:1: error: explicit 
instantiation cannot have a storage class
DEFINE_DEBUG_VEC (rtx_def *)
^
/scratch/tmp/gerald/GCC-HEAD/gcc/vec.h:456:24: note: expanded from macro 
'DEFINE_DEBUG_VEC'
  template static void debug_helper (vec &); \
   ^


The first failing bootstrap seems to have been yesterday at 16:40 UTC, 
24 hours before it still worked.

Gerald
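
The clang diagnostic reflects a language rule: an explicit instantiation may not carry a storage-class specifier such as static. A minimal reproduction is sketched below, with std::vector standing in for GCC's vec and a made-up helper name; it is not the actual vec.h code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for debug_helper: returns the element count so
// that the instantiation can be exercised.
template<typename T>
std::size_t debug_helper_sketch(const std::vector<T>& v)
{
  return v.size();
}

// Rejected by clang with "explicit instantiation cannot have a storage
// class":
//   template static std::size_t debug_helper_sketch(const std::vector<int>&);

// Accepted: the same explicit instantiation without the specifier.
template std::size_t debug_helper_sketch(const std::vector<int>&);
```

Dropping the specifier from the instantiation line is the smallest change that compiles here; whether that is the right fix for vec.h is for the patch author to decide.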


[RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-21 Thread Dominik Inführ
Hi,

this patch tries to extend tree-ssa-dce.c to remove unnecessary 
new/delete-pairs (it already does that for malloc/free). Clang does it too and 
it seems to be allowed by 
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html. I’ve 
bootstrapped/regtested on aarch64-linux and x86_64-linux.

Best,
Dominik
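
To make the target of the optimization concrete, here is the kind of code it is about, as I read the description. The function name is invented, and whether a given compiler and optimization level actually drops the pair is not something this sketch demonstrates.

```cpp
#include <cassert>

// The allocated object is never observed by the caller, so a compiler
// that treats new/delete like malloc/free could remove both calls and
// reduce the function to "return 1".
int unused_allocation()
{
  int* p = new int(42);
  delete p;
  return 1;
}
```

The observable result is the same with or without the allocation, which is the property the transformation relies on.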



dce-new-delete.diff
Description: Binary data


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH libstdc++/66689] comp_ellint_3 and ellint_3 return garbage values

2017-11-21 Thread Florian Weimer

On 11/18/2017 05:49 PM, Ed Smith-Rowland wrote:
> I feel that distros are likely to pick up gcc-7 soon and I'd like to do 
> *something*.  This would be something of a transition path.


Historically, in glibc, we would have used symbol versioning for this, 
so that existing binaries retain the old behavior.  The downside is that 
blind recompilation will give you the change in behavior, so it 
essentially benefits proprietary software vendors only, which is why I 
think it's usually not appropriate to do this because either you want 
the fix for all applications, recompiled or not, or you don't.


In addition, in Fedora and downstream, we cannot backport new symbol 
versions unless the symbol version is unique to the feature/bug fix 
being added, due to the way RPM dependencies are generated.


Thanks,
Florian


[PATCH] Fix result for conditional reductions matching at index 0

2017-11-21 Thread Kilian Verhetsel

Hi,

When translating conditional reductions based on integer induction, the
compiler uses the value zero to indicate the absence of any matches: if
the index of the last match is still zero at the end of the loop, the
default value is returned. The problem with this approach is that this
default value is returned not only when there were no matches at all,
but also when the last match occurred at index 0. This causes the test
gcc.dg/vect/pr65947-14.c to fail.

This patch corrects this by reusing the vector of indices used for
COND_REDUCTION, which starts at 1. If the 1-based index of the last
match is non-zero, 1 is subtracted from it, otherwise the initial value
is returned.
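
The problem and the fix can be modelled with scalar code. This is a simplified sketch, not the vectorizer's output: the function names, the "a[i] < limit" condition and the default value are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Old scheme: 0 means "no match", but 0 is also a valid index, so a
// last match at index 0 is indistinguishable from no match at all.
int last_match_zero_sentinel(const int* a, std::size_t n, int limit,
                             int default_value)
{
  std::size_t last = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] < limit)
      last = i;
  return last != 0 ? static_cast<int>(last) : default_value;
}

// Fixed scheme: record 1-based indices so that 0 is free to mean "no
// match", then subtract 1 at the end.
int last_match_one_based(const int* a, std::size_t n, int limit,
                         int default_value)
{
  std::size_t last = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] < limit)
      last = i + 1;
  return last != 0 ? static_cast<int>(last - 1) : default_value;
}
```

For an input whose only match is at index 0, the first function returns the default value and the second returns 0, which is the difference the patch addresses.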

I tested this patch on x86_64-pc-linux-gnu (with both SSE and AVX2, so
that both paths through the reduc_code != ERROR_MARK branch are
taken).

2017-11-21  Kilian Verhetsel 

* tree-vect-loop.c
(vect_create_epilog_for_reduction): Fix the returned value for
INTEGER_INDUC_COND_REDUCTION whose last match occurred at
index 0.
(vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
pass the PHI statement that sets the induction variable to the
code generating the epilogue.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c	(revision 254913)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -4316,7 +4316,7 @@ get_initial_defs_for_reduction (slp_tree slp_node,
 
 static void
 vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt,
-  gimple *reduc_def_stmt,
+  gimple *reduc_def_stmt, gimple *induct_stmt,
   int ncopies, enum tree_code reduc_code,
   vec reduction_phis,
   bool double_reduc, 
@@ -4477,7 +4477,9 @@ vect_create_epilog_for_reduction (vec vect_d
  The first match will be a 1 to allow 0 to be used for non-matching
  indexes.  If there are no matches at all then the vector will be all
  zeroes.  */
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+  == INTEGER_INDUC_COND_REDUCTION)
 {
   tree indx_before_incr, indx_after_incr;
   int nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
@@ -4754,7 +4756,9 @@ vect_create_epilog_for_reduction (vec vect_d
   else
 new_phi_result = PHI_RESULT (new_phis[0]);
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+   || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+   == INTEGER_INDUC_COND_REDUCTION)
   && reduc_code != ERROR_MARK)
 {
   /* For condition reductions, we have a vector (NEW_PHI_RESULT) containing
@@ -4797,76 +4801,118 @@ vect_create_epilog_for_reduction (vec vect_d
 		induction_index);
   gsi_insert_before (&exit_gsi, max_index_stmt, GSI_SAME_STMT);
 
-  /* Vector of {max_index, max_index, max_index,...}.  */
-  tree max_index_vec = make_ssa_name (index_vec_type);
-  tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
-		  max_index);
-  gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec,
-			max_index_vec_rhs);
-  gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  /* Vector of {max_index, max_index, max_index,...}.  */
+	  tree max_index_vec = make_ssa_name (index_vec_type);
+	  tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
+			  max_index);
+	  gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec,
+			max_index_vec_rhs);
+	  gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
 
-  /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
-	 with the vector (INDUCTION_INDEX) of found indexes, choosing values
-	 from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
-	 otherwise.  Only one value should match, resulting in a vector
-	 (VEC_COND) with one data value and the rest zeros.
-	 In the case where the loop never made any matches, every index will
-	 match, resulting in a vector with all data values (which will all be
-	 the default value).  */
+	  /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
+	 with the vector (INDUCTION_INDEX) of found indexes, choosing values
+	 from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
+	 otherwise.  Only one value should match, resulting in a vector
+	 (VEC_COND) with one data value and the rest zeros.  In the case
+	 where the loop never made any matches, every index will match,
+	 resulting in a vector with all data values (which will all be the
+	 default value).  */
 
-  /* Compare the max index vector to the vector of found indexes to find
-	 the position of the max value.  */
-   

RE: [PATCH] Don't split call from its call arg location

2017-11-21 Thread Claudiu Zissulescu
> > gcc/
> > 2017-11-20  Claudiu Zissulescu  
> >
> > * cfgrtl.c (force_nonfallthru_and_redirect): Don't split a call
> > and its corresponding call arg location note.
> OK, but please add the test from the original message you posted to
> gcc.target/arc with suitable dg-options.
> 

Committed with the additional arc specific test. Thank you for your review,
Claudiu


RE: [PATCH] [ARC] Reimplement exception handling support.

2017-11-21 Thread Claudiu Zissulescu
> Mostly OK, my only comment is about the comments!  There's a couple of
> places where the comments are phrased in terms of "this commit" and
> "see changes".  Comments in the code should (I think) be phrased in
> terms of how the code is right now (or at least after the changes is
> merged), so:
> 
> (see changes to arc_frame_pointer_required)
> 
> becomes
> 
> (see arc_frame_pointer_required)
> 
> while:
> 
> This issue is fixed in this commit too.
> 
> can probably be deleted completely.
> 

Committed with the additional suggestions. Thank you for your review,
Claudiu


[committed] fix typo in find_jump_threads_backwards()

2017-11-21 Thread Aldy Hernandez

One typo I caused, and one that was already there :).

Fixed both.
commit eecc0d1efb049a057af214764bea6256db07828e
Author: aldyh 
Date:   Tue Nov 21 11:39:51 2017 +

* tree-ssa-threadbackward.c (find_jump_threads_backwards): Fix
typo in comment.

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 6fdbc9039f9..f3f55cf4b44 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -732,7 +732,7 @@ thread_jumps::fsm_find_control_statement_thread_paths (tree name)
 
It is assumed that BB ends with a control statement and that by
finding a path where NAME is a constant, we can thread the path.
-   SPEED_P_ indicate that we could increase code size to improve the
+   SPEED_P indicates that we could increase code size to improve the
code path.  */
 
 void


Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

2017-11-21 Thread James Greenhalgh
I've finally built up enough courage to start getting my head around this...

I see one outstanding issue sitting on this patch version:

On Sat, Oct 28, 2017 at 05:08:54AM +0100, Jeff Law wrote:
> On 10/13/2017 02:26 PM, Wilco Dijkstra wrote:
> > --param=stack-clash-protection-probe-interval=13
> > --param=stack-clash-protection-guard-size=12
> > 
> > So if there is a good reason to continue with 2 separate values, we must
> > force probe interval <= guard size!
> The param code really isn't designed to enforce values that are
> inter-dependent.  It has a min, max & default values.  No more, no less.
>  If you set up something inconsistent with the params, it's simply not
> going to work.
> 
> 
> > 
> > Also on AArch64 --param=stack-clash-protection-probe-interval=16 causes
> > crashes due to the offsets used in the probes - we don't need large offsets
> > as we want to probe close to the bottom of the stack.
> Not a surprise.  While I tried to handle larger intervals, I certainly
> didn't test them.  Given the ISA I wouldn't expect an interval > 12 to
> be useful or necessarily even work correctly.

Understood - weird behaviour with weird params doesn't concern me.

> > Functions with a large stack emit like alloca a lot of code, here I used
> > --param=stack-clash-protection-probe-interval=15:
> > 
> > int f1(int x)
> > {
> >   char arr[128*1024];
> >   return arr[x];
> > }
> > 
> > f1:
> > mov x16, 64512
> > sub sp, sp, x16
> > .cfi_def_cfa_offset 64512
> > mov x16, -32768
> > add sp, sp, x16
> > .cfi_def_cfa_offset -1024
> > str xzr, [sp, 32760]
> > add sp, sp, x16
> > .cfi_def_cfa_offset -66560
> > str xzr, [sp, 32760]
> > sub sp, sp, #1024
> > .cfi_def_cfa_offset -65536
> > str xzr, [sp, 1016]
> > ldrb w0, [sp, w0, sxtw]
> > .cfi_def_cfa_offset 131072
> > add sp, sp, 131072
> > .cfi_def_cfa_offset 0
> > ret
> > 
> > Note the cfa offsets are wrong.
> Yes.  They definitely look wrong.  There's a clear logic error in
> setting up the ADJUST_CFA note when the probing interval is larger than
> 2**12.  That should be easily fixed.  Let me poke at it.

This one does concern me, how did you get on? Did it respond well to
prodding?
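
For reference, the offsets I would expect in the f1 sequence are just the running total of the stack adjustments. The sketch below assumes the CFA is the value of sp on entry, so that the offset after each step equals the number of bytes allocated so far; the helper name is invented.

```cpp
#include <cassert>
#include <vector>

// Given the amounts subtracted from sp, in order, return the CFA offset
// expected after each adjustment (the running total).
std::vector<long> expected_cfa_offsets(const std::vector<long>& adjustments)
{
  std::vector<long> offsets;
  long total = 0;
  for (long adj : adjustments)
    {
      total += adj;
      offsets.push_back(total);
    }
  return offsets;
}
```

For the adjustments in f1 (64512, then 32768 twice, then 1024) this gives 64512, 97280, 130048 and 131072, rather than the -1024, -66560 and -65536 shown in the dump.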

> > There is an odd mix of a big initial adjustment, then some 
> > probes+adjustments and
> > then a final adjustment and probe for the remainder. I can't see the point 
> > of having
> > both an initial and remainder adjustment. I would expect this:
> > 
> > sub sp, sp, 65536
> > str xzr, [sp, 1024]
> > sub sp, sp, 65536
> > str xzr, [sp, 1024]
> > > ldrb w0, [sp, w0, sxtw]
> > add sp, sp, 131072
> > ret
> I'm really not able to justify spending further time optimizing the
> aarch64 implementation.  I've done the best I can.  You can take the
> work as-is or improve it, but I really can't justify further time
> investment on that architecture.

Makes sense. Understood. And certainly not required to land this patch.

> > int f2(int x)
> > {
> >   char arr[128*1024];
> >   return arr[x];
> > }
> > 
> > f2:
> > mov x16, 64512
> > sub sp, sp, x16
> > mov x16, -65536
> > movk x16, 0xfffd, lsl 16
> > add x16, sp, x16
> > .LPSRL0:
> > sub sp, sp, 4096
> > str xzr, [sp, 4088]
> > cmp sp, x16
> > b.ne .LPSRL0
> > sub sp, sp, #1024
> > str xzr, [sp, 1016]
> > ldrb w0, [sp, w0, sxtw]
> > add sp, sp, 262144
> > ret
> > 
> > The cfa entries are OK for this case. There is a mix of positive/negative 
> > offsets which
> > makes things confusing. Again there are 3 kinds of adjustments when for 
> > this size we
> > only need the loop.
> > 
> > Reusing the existing gen_probe_stack_range code appears a bad idea since
> > it ignores the probe interval and just defaults to 4KB. I don't see why it 
> > should be
> > any more complex than this:
> > 
> > sub x16, sp, 262144  // only need temporary if > 1MB
> > .LPSRL0:
> > sub sp, sp, 65536
> > str xzr, [sp, 1024]
> > cmp sp, x16
> > b.ne .LPSRL0
> > ldrb w0, [sp, w0, sxtw]
> > add sp, sp, 262144
> > ret
> > 
> > Probe insertion if final adjustment >= 1024 also generates a lot of 
> > redundant
> > code - although this is more a theoretical issue given this is so rare.
> Again, if ARM wants this optimized, then ARM's engineers are going to
> have to take the lead here.  I've invested all I can reasonably invest
> in terms of trying to optimize the probing for this target.

Likewise here - thanks for your work so far, I have no expectation of this
being fully optimized before I OK it to land.

Sorry for the big delay getting round to this patch, I hope to get serious
time to put in to it later this week, and it would be helpful to close out
the few remaining issues before I do.

Thanks,
James



RE: [PATCH] [ARC] update GLIBC_DYNAMIC_LINKER per glibc upstreaming review comments

2017-11-21 Thread Claudiu Zissulescu
> Please also consider this for backporting as that is the gcc version in use by
> glibc test harnesses !
> 

Backported to gcc7 branch.

Cheers,
Claudiu


Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-11-21 Thread Martin Liška
On 10/16/2017 10:39 PM, Martin Liška wrote:
> Hi.
> 
> All nits included in mainline review request I've just done:
> https://reviews.llvm.org/D38971
> 
> Martin

Hi.

Here is an updated version of the patch, in which I added new test cases and
rebased onto the latest libsanitizer changes. It is subject to the libsanitizer
review process.

Martin
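
As background for reviewers, the check concerns pointer subtraction between pointers that do not point into the same object, which is undefined behaviour in C and C++. The sketch below only shows the well-defined case; the function name is invented, and the exact diagnostics produced by -fsanitize=pointer-subtract depend on the patch and the libsanitizer side, which I am not asserting here.

```cpp
#include <cassert>
#include <cstddef>

// Well-defined: both pointers refer to elements of the same array, so
// the difference is the number of elements between them.
std::ptrdiff_t distance_within(const int* first, const int* last)
{
  return last - first;
}

// Not well-defined (and the kind of code the new option is meant to
// report): subtracting pointers into two unrelated objects, e.g.
//   int a[4], b[4];
//   std::ptrdiff_t d = &b[0] - &a[0];
```

Calling distance_within(arr + 2, arr + 5) on a single array yields 3 and should never be reported.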
From 100b723b9b7fb10dedb2154f30e1ebd6ef885ab4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 8 Nov 2017 13:16:17 +0100
Subject: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

gcc/ChangeLog:

2017-11-21  Martin Liska  

	* doc/invoke.texi: Document the options.
	* flag-types.h (enum sanitize_code): Add
	SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
	* ipa-inline.c (sanitize_attrs_match_for_inline_p): Add handling
	of SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
	* opts.c: Define new sanitizer options.
	* sanitizer.def (BUILT_IN_ASAN_POINTER_COMPARE): Likewise.
	(BUILT_IN_ASAN_POINTER_SUBTRACT): Likewise.

gcc/c/ChangeLog:

2017-11-21  Martin Liska  

	* c-typeck.c (pointer_diff): Add new argument and instrument
	pointer subtraction.
	(build_binary_op): Similar for pointer comparison.

gcc/cp/ChangeLog:

2017-11-21  Martin Liska  

	* typeck.c (pointer_diff): Add new argument and instrument
	pointer subtraction.
	(cp_build_binary_op): Create compound expression if doing an
	instrumentation.

gcc/testsuite/ChangeLog:

2017-11-21  Martin Liska  

	* c-c++-common/asan/pointer-compare-1.c: New test.
	* c-c++-common/asan/pointer-compare-2.c: New test.
	* c-c++-common/asan/pointer-subtract-1.c: New test.
	* c-c++-common/asan/pointer-subtract-2.c: New test.
	* c-c++-common/asan/pointer-subtract-3.c: New test.
	* c-c++-common/asan/pointer-subtract-4.c: New test.
---
 gcc/c/c-typeck.c   | 31 ++--
 gcc/cp/typeck.c| 39 --
 gcc/doc/invoke.texi| 22 ++
 gcc/flag-types.h   |  2 +
 gcc/ipa-inline.c   |  8 ++-
 gcc/opts.c | 15 
 gcc/sanitizer.def  |  4 ++
 .../c-c++-common/asan/pointer-compare-1.c  | 83 ++
 .../c-c++-common/asan/pointer-compare-2.c  | 76 
 .../c-c++-common/asan/pointer-subtract-1.c | 41 +++
 .../c-c++-common/asan/pointer-subtract-2.c | 33 +
 .../c-c++-common/asan/pointer-subtract-3.c | 40 +++
 .../c-c++-common/asan/pointer-subtract-4.c | 40 +++
 libsanitizer/asan/asan_descriptions.cc | 20 ++
 libsanitizer/asan/asan_descriptions.h  |  4 ++
 libsanitizer/asan/asan_report.cc   | 53 --
 libsanitizer/asan/asan_thread.cc   | 25 ++-
 libsanitizer/asan/asan_thread.h|  3 +
 18 files changed, 521 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/pointer-compare-1.c
 create mode 100644 gcc/testsuite/c-c++-common/asan/pointer-compare-2.c
 create mode 100644 gcc/testsuite/c-c++-common/asan/pointer-subtract-1.c
 create mode 100644 gcc/testsuite/c-c++-common/asan/pointer-subtract-2.c
 create mode 100644 gcc/testsuite/c-c++-common/asan/pointer-subtract-3.c
 create mode 100644 gcc/testsuite/c-c++-common/asan/pointer-subtract-4.c

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 4bdc48a9ea3..5dac9bdf08b 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -96,7 +96,7 @@ static tree lookup_field (tree, tree);
 static int convert_arguments (location_t, vec, tree,
 			  vec *, vec *, tree,
 			  tree);
-static tree pointer_diff (location_t, tree, tree);
+static tree pointer_diff (location_t, tree, tree, tree *);
 static tree convert_for_assignment (location_t, location_t, tree, tree, tree,
 enum impl_conv, bool, tree, tree, int);
 static tree valid_compound_expr_initializer (tree, tree);
@@ -3778,10 +3778,11 @@ parser_build_binary_op (location_t location, enum tree_code code,
 }
 
 /* Return a tree for the difference of pointers OP0 and OP1.
-   The resulting tree has type int.  */
+   The resulting tree has type int.  If POINTER_SUBTRACT sanitization is
+   enabled, assign to INSTRUMENT_EXPR call to libsanitizer.  */
 
 static tree
-pointer_diff (location_t loc, tree op0, tree op1)
+pointer_diff (location_t loc, tree op0, tree op1, tree *instrument_expr)
 {
   tree restype = ptrdiff_type_node;
   tree result, inttype;
@@ -3825,6 +3826,17 @@ pointer_diff (location_t loc, tree op0, tree op1)
 pedwarn (loc, OPT_Wpointer_arith,
 	 "pointer to a function used in subtraction");
 
+  if (sanitize_flags_p (SANITIZE_POINTER_SUBTRACT))
+{
+  gcc_assert (current_function_decl != NULL_TREE);
+
+  op0 = save_expr (op0);
+  op1 = save_expr (op1);
+
+  tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_SUBTRACT);
+   

Re: [patch] implement generic debug() for vectors and hash sets

2017-11-21 Thread Aldy Hernandez



On 11/21/2017 05:59 AM, Gerald Pfeifer wrote:

> On Mon, 20 Nov 2017, Aldy Hernandez wrote:
>> Minor oversight...
>
> Actually, there appears to be another issue related to this when
> bootstrapping with clang 3.4.1 (FreeBSD 10.4):


I have no idea how to bootstrap with clang :).  Perhaps someone can 
throw a hint.


I found a machine with clang, which seemed to compile hello worlds both 
for C and C++:


$ clang -v
clang version 3.8.1 (tags/RELEASE_381/final)
Target: x86_64-unknown-linux-gnu
...

Then I tried either this:

CC=clang CXX=clang++ /blah/configure

or this:

/blah/configure CC=clang CXX=clang++

...with numerous problems building stage1, among which are:

make[3]: Entering directory '/opt/notnfs/aldyh/bld/trunk-with-clang/gcc'
clang++ -std=gnu++98 -c   -g -DIN_GCC-fno-strict-aliasing 
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format 
-Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common 
-DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild 
-I/home/cygnus/aldyh/src/gcc-pristine/gcc 
-I/home/cygnus/aldyh/src/gcc-pristine/gcc/build 
-I/home/cygnus/aldyh/src/gcc-pristine/gcc/../include 
-I/home/cygnus/aldyh/src/gcc-pristine/gcc/../libcpp/include  \
-o build/print-rtl.o 
/home/cygnus/aldyh/src/gcc-pristine/gcc/print-rtl.c
clang-3.8: warning: treating 'c' input as 'c++' when in C++ mode, this 
behavior is deprecated
In file included from 
/home/cygnus/aldyh/src/gcc-pristine/gcc/print-rtl.c:29:
/home/cygnus/aldyh/src/gcc-pristine/gcc/coretypes.h:73:1: warning: class 
'rtx_def' was previously declared as a struct [-Wmismatched-tags]

class rtx_def;
^
/home/cygnus/aldyh/src/gcc-pristine/gcc/coretypes.h:55:8: note: previous 
use is here

struct rtx_def;
   ^
In file included from 
/home/cygnus/aldyh/src/gcc-pristine/gcc/print-rtl.c:29:
In file included from 
/home/cygnus/aldyh/src/gcc-pristine/gcc/coretypes.h:400:
/home/cygnus/aldyh/src/gcc-pristine/gcc/machmode.h:313:1: warning: 
'pod_mode' defined as a struct template here but previously declared as 
a class template [-Wmismatched-tags]

struct pod_mode
^
/home/cygnus/aldyh/src/gcc-pristine/gcc/coretypes.h:66:20: note: did you 
mean struct here?

template class pod_mode;
   ^
   struct

Is there a magic set of flags I should use?

Aldy


Re: [patch] implement generic debug() for vectors and hash sets

2017-11-21 Thread Gerald Pfeifer
On Tue, 21 Nov 2017, Aldy Hernandez wrote:
> I have no idea how to bootstrap with clang :).  Perhaps someone can 
> throw a hint.

It just works.  Usually. :-)

I run two testers, one nightly, one weekly, on a FreeBSD.org cluster,
one with clang 3.4 the other with clang 4.0 right now, and while both
issue tons of warnings -- what you shared looks pretty familiar -- apart 
from the breakage I now reported it just works.

No magic flags or anything.  In fact, at one point one machine was 
upgraded from a GCC 4.x system compiler to clang 3.4 and things just 
kept working.

> ...with numerous problems building stage1, among which are:

You can safely ignore those.  Just make sure there's no -Werror or 
similar in your environment.

Gerald


[patch][i386, AVX] Adding missing mask[z]_sqrt_round_s[d,s] intrinsics

2017-11-21 Thread Makhotina, Olga
Hi,

This patch adds missing intrinsics for _mm_mask[z]_sqrt_round_[sd,ss].

21.11.2017 Olga Makhotina  

gcc/
  * config/i386/avx512fintrin.h (_mm_mask_sqrt_round_sd,
  _mm_maskz_sqrt_round_sd, _mm_mask_sqrt_round_ss,
  _mm_maskz_sqrt_round_ss): New intrinsics.
  (__builtin_ia32_sqrtsd_round, __builtin_ia32_sqrtss_round): 
Remove.
  (__builtin_ia32_sqrtsd_mask_round,
  __builtin_ia32_sqrtss_mask_round): New builtins.
  * config/i386/i386-builtin.def (__builtin_ia32_sqrtsd_round,
  __builtin_ia32_sqrtss_round): Remove.
  (__builtin_ia32_sqrtsd_mask_round,
  __builtin_ia32_sqrtss_mask_round): New builtins.
  * config/i386/sse.md (vmsqrt2): Renamed to ...
  (vmsqrt2): ... this.
  ((match_operand:VF_128 1 "vector_operand" 
  "xBm,")): Changed to ...
  ((match_operand:VF_128 1 "vector_operand" 
  "xBm,")): ... this.
  (vsqrt\t{%1, %2, %0|
  %0, %2, %1}): Changed to ...
  (vsqrt\t{%1, %2, 
  %0|%0, %2, 
  %1}): ... this.
  ((set_attr "prefix" "")): Changed to ...
  ((set_attr "prefix" "")): ... this.

21.11.2017 Olga Makhotina  

gcc/testsuite/
  * gcc.target/i386/avx512f-vsqrtsd-1.c (_mm_mask_sqrt_round_sd,
  _mm_maskz_sqrt_round_sd): Test new intrinsics.
  * gcc.target/i386/avx512f-vsqrtsd-2.c (_mm_sqrt_round_sd,
  _mm_mask_sqrt_round_sd, _mm_maskz_sqrt_round_sd): Test new intrinsics.
  * gcc.target/i386/avx512f-vsqrtss-1.c (_mm_mask_sqrt_round_ss,
  _mm_maskz_sqrt_round_ss): Test new intrinsics.
  * gcc.target/i386/avx512f-vsqrtss-2.c (_mm_sqrt_round_ss,
  _mm_mask_sqrt_round_ss, _mm_maskz_sqrt_round_ss): Test new intrinsics.
  * gcc.target/i386/avx-1.c (__builtin_ia32_sqrtsd_round,
  __builtin_ia32_sqrtss_round): Remove builtins.
  (__builtin_ia32_sqrtsd_mask_round,
  __builtin_ia32_sqrtss_mask_round): Test new builtins.
  * gcc.target/i386/sse-13.c: Ditto.
  * gcc.target/i386/sse-23.c: Ditto.

Is it ok for trunk?

Thanks,
Olga



0001-sqrt.patch
Description: 0001-sqrt.patch


Re: [PATCH libstdc++/66689] comp_ellint_3 and ellint_3 return garbage values

2017-11-21 Thread Jonathan Wakely

On 21/11/17 12:33 +0100, Florian Weimer wrote:
> On 11/18/2017 05:49 PM, Ed Smith-Rowland wrote:
>> I feel that distros are likely to pick up gcc-7 soon and I'd like to
>> do *something*.  This would be something of a transition path.
>
> Historically, in glibc, we would have used symbol versioning for this,
> so that existing binaries retain the old behavior.  The downside is
> that blind recompilation will give you the change in behavior, so it
> essentially benefits proprietary software vendors only, which is why I
> think it's usually not appropriate to do this because either you want
> the fix for all applications, recompiled or not, or you don't.
>
> In addition, in Fedora and downstream, we cannot backport new symbol
> versions unless the symbol version is unique to the feature/bug fix
> being added, due to the way RPM dependencies are generated.


None of these functions is exported from the library, they're all
header-only inline functions or function templates.

So the good news is existing binaries retain the old behaviour, but
the bad news is we can't version it easily, and so if you link
together objects built with old and new versions of GCC you have a
one-definition rule violation and the linker will just pick one of the
symbols to be kept.

We could put them in an inline namespace, so they mangle differently,
so then the old and new objects would call different versions of the
functions, with different results.
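
A minimal sketch of the inline-namespace idea, assuming made-up names (demo, v1, v2, ellint_like) and placeholder arithmetic rather than the real libstdc++ functions. Because the enclosing namespace is part of the mangled name, the two definitions are distinct symbols, while callers that write demo::ellint_like pick up the inline one.

```cpp
#include <cassert>

namespace demo
{
  // "Old" behaviour, kept under its own (non-inline) namespace.
  namespace v1
  {
    inline double ellint_like (double k) { return k; }        // placeholder math
  }

  // "New" behaviour; 'inline' makes demo::ellint_like resolve here.
  inline namespace v2
  {
    inline double ellint_like (double k) { return 2.0 * k; }  // placeholder math
  }
}

// Callers that do not name a version get the inline namespace's function.
inline double
call_current (double k)
{
  return demo::ellint_like (k);
}

// Small self-check: both versions coexist without an ODR clash.
inline void
check_inline_namespace_demo ()
{
  assert (call_current (1.5) == 3.0);
  assert (demo::v1::ellint_like (1.5) == 1.5);
}
```

In this sketch both versions live in one translation unit only to keep it self-contained; in the scenario discussed above they would come from objects built with different compiler releases.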



Re: [RFC][PATCH] Change default to -fcommon

2017-11-21 Thread Richard Biener
On Mon, Nov 20, 2017 at 11:00 PM, Michael Matz  wrote:
> Hi,
>
> On Mon, 20 Nov 2017, Richard Biener wrote:
>
>> Also we cannot raise alignment of commons and thus vectorization is
>> pessimized (all vectorizer testcases use -fno-common).
>
> That would be a simple oversight then.  That's one of the nice things with
> common symbols, they contain their own alignment which you can freely
> adjust, you don't have to care for something like section alignment.

You can't because of the need for merging a real definition with lower alignment
with the adjusted tentative one.

Richard.

>> IIRC LTO promotes commons to locals.
>
> Might be, but if so, probably for no good reason.
>
>
> Ciao,
> Michael.


Re: [RFC][PATCH] Change default to -fcommon

2017-11-21 Thread Michael Matz
Hi,

On Tue, 21 Nov 2017, Richard Biener wrote:

> > That would be a simple oversight then.  That's one of the nice things 
> > with common symbols, they contain their own alignment which you can 
> > freely adjust, you don't have to care for something like section 
> > alignment.
> 
> You can't because of the need for merging a real definition with lower 
> alignment with the adjusted tentative one.

Yes, that is the real neck breaker, and I hadn't considered this 
initially.  Objections withdrawn :)


Ciao,
Michael.


[PATCH] Fix up -Wreturn-type (PR c++/83045)

2017-11-21 Thread Jakub Jelinek
Hi!

The C++ FE now emits __builtin_unreachable () with BUILTINS_LOCATION
on spots that return from functions/methods returning non-void without
proper return.  This breaks the -Wreturn-type warning, because we then
don't see any return stmt without argument on the edges to exit, instead
we see those __builtin_unreachable () calls at the end of blocks without
successors.

I wonder if the C++ FE addition of __builtin_unreachable () shouldn't be
done only if (optimize).

Anyway, this patch tweaks tree-cfg.c so that it recognizes those
__builtin_unreachable () calls and reports the -Wreturn-type warning
in those cases too (warning in the FE would be too early, we need to
optimize away unreachable code).
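
As an illustration of the kind of code -Wreturn-type is meant to catch, here is a minimal sketch; sign_demo is a hypothetical function, not one of the testcases. The pragma mirrors the one the ChangeLog below adds to some tests and is there only so the sketch builds quietly; removing it is expected to produce "control reaches end of non-void function".

```cpp
#include <cassert>

// Silence the diagnostic for this sketch only.
#pragma GCC diagnostic ignored "-Wreturn-type"

// Hypothetical example: there is no return statement on the x == 0 path.
int
sign_demo (int x)
{
  if (x > 0)
    return 1;
  if (x < 0)
    return -1;
  // Flowing off the end of a non-void function is undefined behaviour
  // in C++, so this path must never be taken at run time.
}

// Only the well-defined paths are exercised here.
inline void
check_sign_demo ()
{
  assert (sign_demo (5) == 1);
  assert (sign_demo (-3) == -1);
}
```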

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The patch regresses g++.dg/gomp/declare-simd-1.C, but given that it revealed
a real bug, I'm not trying to work around it in the patch and will fix it up
incrementally instead.

2017-11-21  Jakub Jelinek  

PR c++/83045
* tree-cfg.c (pass_warn_function_return::execute): Formatting fix.
Also warn if seen __builtin_unreachable () call with BUILTINS_LOCATION.
Use LOCATION_LOCUS when comparing against UNKNOWN_LOCATION.

* c-c++-common/pr61405.c (fn0, fn1): Add return stmts.
* c-c++-common/Wlogical-op-2.c (fn): Likewise.
* g++.dg/debug/pr53466.C: Add -Wno-return-type to dg-options.
* g++.dg/opt/combine.C: Likewise.
* g++.dg/ubsan/return-3.C: Likewise.
* g++.dg/pr59445.C: Likewise.
* g++.dg/pr49847.C: Likewise.
* g++.dg/ipa/pr61800.C: Likewise.
* g++.dg/ipa/pr63470.C: Likewise.
* g++.dg/ipa/pr68672-1.C: Likewise.
* g++.dg/pr58438.C: Likewise.
* g++.dg/torture/pr59265.C: Likewise.
* g++.dg/tree-ssa/ssa-dse-2.C: Likewise.
* g++.old-deja/g++.eh/catch13.C: Likewise.
* g++.old-deja/g++.eh/crash1.C: Likewise.
* g++.dg/tm/pr60004.C: Expect -Wreturn-type warning.
* g++.dg/torture/pr55740.C: Likewise.
* g++.dg/torture/pr43257.C: Likewise.
* g++.dg/torture/pr64280.C: Likewise.
* g++.dg/torture/pr54684.C: Likewise.
* g++.dg/torture/pr56694.C: Likewise.
* g++.dg/torture/pr68470.C: Likewise.
* g++.dg/torture/pr60648.C: Likewise.
* g++.dg/torture/pr71281.C: Likewise.
* g++.dg/torture/pr52772.C: Add -Wno-return-type dg-additional-options.
* g++.dg/torture/pr64669.C: Likewise.
* g++.dg/torture/pr58369.C: Likewise.
* g++.dg/torture/pr33627.C: Likewise.
* g++.dg/torture/predcom-1.C: Add
#pragma GCC diagnostic ignored "-Wreturn-type".
* g++.dg/lto/20090221_0.C: Likewise.
* g++.dg/lto/20091026-1_1.C: Likewise.
* g++.dg/lto/pr54625-1_1.C: Likewise.
* g++.dg/warn/pr83045.C: New test.

--- gcc/tree-cfg.c.jj   2017-11-20 19:55:36.723814204 +0100
+++ gcc/tree-cfg.c  2017-11-21 11:04:35.594567992 +0100
@@ -9049,7 +9049,8 @@ pass_warn_function_return::execute (func
  if ((gimple_code (last) == GIMPLE_RETURN
   || gimple_call_builtin_p (last, BUILT_IN_RETURN))
  && location == UNKNOWN_LOCATION
- && (location = gimple_location (last)) != UNKNOWN_LOCATION
+ && ((location = LOCATION_LOCUS (gimple_location (last)))
+ != UNKNOWN_LOCATION)
  && !optimize)
break;
  /* When optimizing, replace return stmts in noreturn functions
@@ -9075,7 +9076,6 @@ pass_warn_function_return::execute (func
  without returning a value.  */
   else if (warn_return_type > 0
   && !TREE_NO_WARNING (fun->decl)
-  && EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (fun)->preds) > 0
   && !VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fun->decl))))
 {
   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (fun)->preds)
@@ -9087,13 +9087,43 @@ pass_warn_function_return::execute (func
  && !gimple_no_warning_p (last))
{
  location = gimple_location (last);
- if (location == UNKNOWN_LOCATION)
+ if (LOCATION_LOCUS (location) == UNKNOWN_LOCATION)
location = fun->function_end_locus;
- warning_at (location, OPT_Wreturn_type, "control reaches end of non-void function");
+ warning_at (location, OPT_Wreturn_type,
+ "control reaches end of non-void function");
  TREE_NO_WARNING (fun->decl) = 1;
  break;
}
}
+  /* The C++ FE turns fallthrough from the end of non-void function
+into __builtin_unreachable () call with BUILTINS_LOCATION.
+Recognize those too.  */
+  basic_block bb;
+  if (!TREE_NO_WARNING (fun->decl))
+   FOR_EACH_BB_FN (bb, fun)
+ if (EDGE_COUNT (bb->succs) == 0)
+   {
+ gimple *last = last_stmt (bb);
+ if (last
+ && (LOCATION_LOCUS (gimple_lo

[committed] Fix a buglet in store merging (PR tree-optimization/83086)

2017-11-21 Thread Jakub Jelinek
Hi!

During the first loop iteration, n is uninitialized, so testing
n.base_addr is wrong.  Testing
(i == first ? this_n.base_addr : n.base_addr) would be overkill, because
perform_symbolic_merge will fail if some iterations have base_addr
set and others don't.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as
obvious.

2017-11-21  Jakub Jelinek  

PR tree-optimization/83086
* gimple-ssa-store-merging.c
(imm_store_chain_info::try_coalesce_bswap): Test this_n.base_addr
rather than n.base_addr.

--- gcc/gimple-ssa-store-merging.c.jj   2017-11-21 09:41:00.0 +0100
+++ gcc/gimple-ssa-store-merging.c  2017-11-21 12:13:23.346947756 +0100
@@ -2390,7 +2390,7 @@ imm_store_chain_info::try_coalesce_bswap
? try_size - info->bitsize - bitpos
: bitpos))
return false;
-  if (n.base_addr && vuse_store)
+  if (this_n.base_addr && vuse_store)
{
  unsigned int j;
  for (j = first; j <= last; ++j)

Jakub


Re: Adjust empty class parameter passing ABI (PR c++/60336)

2017-11-21 Thread Marek Polacek
On Tue, Nov 21, 2017 at 10:02:34AM +0100, Uros Bizjak wrote:
> LGTM for x86 part.

Thanks a lot.  I'll commit the patch later today.

Marek


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-21 Thread Richard Biener
On Tue, Nov 21, 2017 at 12:35 PM, Kilian Verhetsel
 wrote:
>
> Hi,
>
> When translating conditional reductions based on integer induction, the
> compiler uses the value zero to indicate the absence of any matches: if
> the index of the last match is still zero at the end of the loop, the
> default value is returned. The problem with this approach is that this
> default value is returned not only when there were no matches at all,
> but also when the last match occurred at index 0. This causes the test
> gcc.dg/vect/pr65947-14.c to fail.
>
> This patch corrects this by reusing the vector of indices used for
> COND_REDUCTION, which starts at 1. If the 1-based index of the last
> match is non-zero, 1 is subtracted from it, otherwise the initial value
> is returned.
>
> I tested this patch on x86_64-pc-linux-gnu (both with SSE and AVX2,
> causing both paths through the reduc_code != ERROR_MARK branch being
> taken).

This is PR81179 I think, please mention that in the changelog.

This unconditionally pessimizes code even if there is no valid index
zero, right?

The issue with the COND_REDUCTION index vector is overflow IIRC.

Alan, can you please comment on the patch?

Thanks,
Richard.

> 2017-11-21  Kilian Verhetsel 
>
> * tree-vect-loop.c
> (vect_create_epilog_for_reduction): Fix the returned value for
> INTEGER_INDUC_COND_REDUCTION whose last match occurred at
> index 0.
> (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
> pass the PHI statement that sets the induction variable to the
> code generating the epilogue.
>
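
The index-0 ambiguity described in the quoted message can be modelled in scalar code. This is only a sketch of the idea, not the vectorizer's implementation; the function names, the comparison and the default value are assumptions chosen for illustration.

```cpp
#include <cassert>

// Scalar model of a conditional reduction on the loop index: return the
// index of the last element below 'limit', or 'dflt' if nothing matches.

// Ambiguous scheme: 0 doubles as the "no match" marker.
int
last_index_zero_sentinel (const int *a, int n, int limit, int dflt)
{
  int best = 0;
  for (int i = 0; i < n; i++)
    if (a[i] < limit)
      best = i;
  return best != 0 ? best : dflt;  // wrong when the last match is at index 0
}

// 1-based scheme: 0 means "no match", k means "last match at index k - 1".
int
last_index_one_based (const int *a, int n, int limit, int dflt)
{
  int best = 0;
  for (int i = 0; i < n; i++)
    if (a[i] < limit)
      best = i + 1;
  return best != 0 ? best - 1 : dflt;
}

// The only match is at index 0: the two schemes disagree.
inline void
check_cond_reduction_demo ()
{
  const int a[4] = { 1, 9, 9, 9 };
  assert (last_index_zero_sentinel (a, 4, 5, -72) == -72);
  assert (last_index_one_based (a, 4, 5, -72) == 0);
}
```

The 1-based variant costs an extra subtraction on every path, which is the pessimization Richard asks about, and a 1-based index needs one more representable value than a 0-based one, which relates to the overflow concern.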


[committed] One c-family TREE_INT_CST_LOW -> tree_to_uhwi change

2017-11-21 Thread Jakub Jelinek
Hi!

When tweaking c-common.c last time, I was looking at TREE_INT_CST_LOW
uses to see which are justified and which aren't.  This case caught my eye;
while it isn't wrong, pretty much everywhere else in the compiler we use
tree_to_uhwi after tree_fits_uhwi_p.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed as obvious.

2017-11-21  Jakub Jelinek  

* c-common.c (get_nonnull_operand): Use tree_to_uhwi.

--- gcc/c-family/c-common.c.jj  2017-11-21 09:05:08.0 +0100
+++ gcc/c-family/c-common.c 2017-11-21 12:17:19.300064345 +0100
@@ -5359,7 +5359,7 @@ get_nonnull_operand (tree arg_num_expr,
   /* Verify the arg number is a small constant.  */
   if (tree_fits_uhwi_p (arg_num_expr))
 {
-  *valp = TREE_INT_CST_LOW (arg_num_expr);
+  *valp = tree_to_uhwi (arg_num_expr);
   return true;
 }
   else

Jakub


[PATCH] Fix mips hang with --help=target --help=optimizers (PR target/82880)

2017-11-21 Thread Jakub Jelinek
Hi!

This is a patch from James that has been sitting in bugzilla for a few
weeks.  The bug is that mips_register_frame_header_opt registers
the pass from a static variable which has an automatic variable in the
initializer, which means that the first time it is called it is registered
properly, but if it is called multiple times (e.g. possible with gccjit
or multiple --help options), then the second time it creates a new pass,
but registers with the old pass (since the var isn't initialized again).
register_pass doesn't store the address it is called with anywhere, just
uses the fields of the struct and stores the pass (first field).
All other spots that call register_pass in all backends do it properly
it seems.
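
The pitfall is plain C++ semantics: a function-local static is initialized only the first time control passes through its declaration. A minimal sketch with made-up stand-ins (pass_info_demo, register_pass_demo and friends are hypothetical, not the GCC pass manager API):

```cpp
#include <cassert>

struct pass_info_demo { int *pass; };   // stand-in for register_pass_info

int *last_registered_demo = nullptr;
int pass_a_demo, pass_b_demo;           // two distinct "passes"

// Stand-in for register_pass: it reads the struct's fields and does not
// keep the struct's address.
void
register_pass_demo (pass_info_demo *info)
{
  last_registered_demo = info->pass;
}

// Buggy shape: 'f' is initialized on the first call only, so later calls
// keep registering the pointer captured the first time.
void
register_buggy (int *p)
{
  static pass_info_demo f = { p };
  register_pass_demo (&f);
}

// Fixed shape: 'f' is automatic and re-initialized on every call.
void
register_fixed (int *p)
{
  pass_info_demo f = { p };
  register_pass_demo (&f);
}

void
check_static_initializer_demo ()
{
  register_buggy (&pass_a_demo);
  register_buggy (&pass_b_demo);
  assert (last_registered_demo == &pass_a_demo);  // stale: still the first pass
  register_fixed (&pass_b_demo);
  assert (last_registered_demo == &pass_b_demo);
}
```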

I've just fixed up formatting and added a testcase, tested with cross to
mips and tested the testcase on x86_64-linux and i686-linux.

Ok for trunk?

2017-11-21  James Cowgill  
Jakub Jelinek  

PR target/82880
* config/mips/frame-header-opt.c (mips_register_frame_header_opt):
Remove static keyword from f variable.

* gcc.dg/opts-8.c: New test.

--- gcc/config/mips/frame-header-opt.c.jj   2017-06-07 10:45:51.0 +0200
+++ gcc/config/mips/frame-header-opt.c  2017-11-21 12:25:54.498746712 +0100
@@ -99,8 +99,7 @@ void
 mips_register_frame_header_opt (void)
 {
   opt_pass *p = make_pass_ipa_frame_header_opt (g);
-  static struct register_pass_info f =
-{p, "comdats", 1, PASS_POS_INSERT_AFTER };
+  struct register_pass_info f = { p, "comdats", 1, PASS_POS_INSERT_AFTER };
   register_pass (&f);
 }
 
--- gcc/testsuite/gcc.dg/opts-8.c.jj   2017-11-21 12:24:23.051868081 +0100
+++ gcc/testsuite/gcc.dg/opts-8.c   2017-11-21 12:32:47.023688118 +0100
@@ -0,0 +1,6 @@
+/* PR target/82880 */
+/* Test we don't ICE or hang.  */
+/* { dg-do compile } */
+/* { dg-options "--help=target --help=optimizers" } */
+/* { dg-allow-blank-lines-in-output 1 } */
+/* { dg-prune-output ".*" } */

Jakub


Re: Make ivopts handle calls to internal functions

2017-11-21 Thread Richard Biener
On Mon, Nov 20, 2017 at 12:31 PM, Bin.Cheng  wrote:
> On Fri, Nov 17, 2017 at 3:03 PM, Richard Sandiford
>  wrote:
>> ivopts previously treated pointer arguments to internal functions
>> like IFN_MASK_LOAD and IFN_MASK_STORE as normal gimple values.
>> This patch makes it treat them as addresses instead.  This makes
>> a significant difference to the code quality for SVE loops,
>> since we can then use loads and stores with scaled indices.
> Thanks for working on this.  This can be extended to other internal
> functions which eventually are expanded into memory references.  I
> believe (at least) both x86 and AArch64 have such a requirement.

In addition to Bin's comments I only have a single one (the rest of the
middle-end changes look OK).  The alias type of MEM_REFs and
TARGET_MEM_REFs in ADDR_EXPR context is meaningless, so you don't need to
jump through hoops to get at it or preserve it in any way; likewise for
CLIQUE/BASE if it were present.

Maybe you can simplify code with this.  As you're introducing
&TARGET_MEM_REF as a valid construct (it wasn't before) you'll eventually
run into missing / misguided foldings.  So be prepared to fix up fallout.

Thanks,
Richard.

>>
>> The patch also adds support for ADDR_EXPRs of TARGET_MEM_REFs,
>> which are the natural way of representing the result of the
>> ivopts transformation.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2017-11-17  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * expr.c (expand_expr_addr_expr_1): Handle ADDR_EXPRs of
>> TARGET_MEM_REFs.
>> * gimple-expr.h (is_gimple_addressable): Likewise.
>> * gimple-expr.c (is_gimple_address): Likewise.
>> * internal-fn.c (expand_call_mem_ref): New function.
>> (expand_mask_load_optab_fn): Use it.
>> (expand_mask_store_optab_fn): Likewise.
>> * tree-ssa-loop-ivopts.c (USE_ADDRESS): Split into...
>> (USE_REF_ADDRESS, USE_PTR_ADDRESS): ...these new use types.
>> (dump_groups): Update accordingly.
>> (iv_use::mem_type): New member variable.
>> (address_p): New function.
>> (record_use): Add a mem_type argument and initialize the new
>> mem_type field.
>> (record_group_use): Add a mem_type argument.  Use address_p.
>> Update call to record_use.
>> (find_interesting_uses_op): Update call to record_group_use.
>> (find_interesting_uses_cond): Likewise.
>> (find_interesting_uses_address): Likewise.
>> (get_mem_type_for_internal_fn): New function.
>> (find_address_like_use): Likewise.
>> (find_interesting_uses_stmt): Try find_address_like_use before
>> calling find_interesting_uses_op.
>> (addr_offset_valid_p): Use the iv mem_type field as the type
>> of the addressed memory.
>> (add_autoinc_candidates): Likewise.
>> (get_address_cost): Likewise.
>> (split_small_address_groups_p): Use address_p.
>> (split_address_groups): Likewise.
>> (add_iv_candidate_for_use): Likewise.
>> (autoinc_possible_for_pair): Likewise.
>> (rewrite_groups): Likewise.
>> (get_use_type): Check for USE_REF_ADDRESS instead of USE_ADDRESS.
>> (determine_group_iv_cost): Update after split of USE_ADDRESS.
>> (get_alias_ptr_type_for_ptr_address): New function.
>> (rewrite_use_address): Rewrite address uses in calls that were
>> identified by find_address_like_use.
>>
>> gcc/testsuite/
>> * gcc.dg/tree-ssa/scev-9.c: Expected REFERENCE ADDRESS
>> instead of just ADDRESS.
>> * gcc.dg/tree-ssa/scev-10.c: Likewise.
>> * gcc.dg/tree-ssa/scev-11.c: Likewise.
>> * gcc.dg/tree-ssa/scev-12.c: Likewise.
>> * gcc.target/aarch64/sve_index_offset_1.c: New test.
>> * gcc.target/aarch64/sve_index_offset_1_run.c: Likewise.
>> * gcc.target/aarch64/sve_loop_add_2.c: Likewise.
>> * gcc.target/aarch64/sve_loop_add_3.c: Likewise.
>> * gcc.target/aarch64/sve_while_1.c: Check for indexed addressing 
>> modes.
>> * gcc.target/aarch64/sve_while_2.c: Likewise.
>> * gcc.target/aarch64/sve_while_3.c: Likewise.
>> * gcc.target/aarch64/sve_while_4.c: Likewise.
>>
>> Index: gcc/expr.c
>> ===
>> --- gcc/expr.c  2017-11-17 09:49:36.191354637 +
>> +++ gcc/expr.c  2017-11-17 15:02:12.868132458 +
>> @@ -7814,6 +7814,9 @@ expand_expr_addr_expr_1 (tree exp, rtx t
>> return expand_expr (tem, target, tmode, modifier);
>>}
>>
>> +case TARGET_MEM_REF:
>> +  return addr_for_mem_ref (exp, as, true);
>> +
>>  case CONST_DECL:
>>/* Expand the initializer like constants above.  */
>>result = XEXP (expand_expr_constant (DECL_INITIAL (exp),

Re: [RFC][PATCH] Change default to -fcommon

2017-11-21 Thread Wilco Dijkstra
Michael Matz wrote:

> bss _sections_ != bss-like segments in the executable.  Targets might not 
> have a bss section that could be named in the asm file, or no way to 
> switch to it without disrupting surrounding code, but they might have 
> common symbols, which ultimately might or might not be collected in some 
> bss-like segment.  In that case you want to use them instead of symbols in 
> .data.

OK, thanks for the explanation. For large symbols it obviously makes sense
to keep the executable size reasonable.

Is this really Ada specific and related to -fcommon? We could do this
optimization without checking flag_no_common.

> What's your rationale for changing this?  In your initial mail you said:
>
> "On many targets this means global variable accesses having an unnecessary 
> codesize and performance penalty in C code (the same source generates 
> better code when built as C++)."
>
> I have a hard time imagining that, so can you give details?  FWIW I've 
> personally always considered using common symbols nicer.

Basically -fcommon disables the anchor optimization on RISC targets,
here is a simple example on AArch64:

int a, b, c;
int f (void) { return a + b + c; }

With -fcommon:

f:
        adrp    x1, a
        adrp    x0, b
        adrp    x2, c
        ldr     w1, [x1, #:lo12:a]
        ldr     w0, [x0, #:lo12:b]
        ldr     w2, [x2, #:lo12:c]
        add     w0, w1, w0
        add     w0, w0, w2
        ret

With -fno-common:

f:
        adrp    x0, .LANCHOR0
        add     x2, x0, :lo12:.LANCHOR0
        ldr     w1, [x0, #:lo12:.LANCHOR0]
        ldp     w0, w2, [x2, 4]
        add     w0, w1, w0
        add     w0, w0, w2
        ret

The anchor pointer is set up once and cached across the function, making
accesses to multiple globals cheap and enabling other optimizations.  On
various targets (eg. PPC, Arm) creating the address of a global takes 2
instructions so the savings are larger.

Wilco

Re: Add optabs for common types of permutation

2017-11-21 Thread Richard Biener
On Mon, Nov 20, 2017 at 1:35 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Nov 20, 2017 at 12:56 AM, Jeff Law  wrote:
>>> On 11/09/2017 06:24 AM, Richard Sandiford wrote:
 ...so that we can use them for variable-length vectors.  For now
 constant-length vectors continue to use VEC_PERM_EXPR and the
 vec_perm_const optab even for cases that the new optabs could
 handle.

 The vector optabs are inconsistent about whether there should be
 an underscore before the mode part of the name, but the other lo/hi
 optabs have one.

 Doing this means that we're able to optimise some SLP tests using
 non-SLP (for now) on targets with variable-length vectors, so the
 patch needs to add a few XFAILs.  Most of these go away with later
 patches.

 Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
 and powerpc64le-linux-gnu.  OK to install?

 Richard


 2017-11-09  Richard Sandiford  
   Alan Hayward  
   David Sherwood  

 gcc/
   * doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
   (vec_extract_even, vec_extract_odd): Document new optabs.
   * internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
   (VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
   functions.
   * optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
   (vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
   New optabs.
   * tree-vect-data-refs.c: Include internal-fn.h.
   (vect_grouped_store_supported): Try using IFN_VEC_INTERLEAVE_{LO,HI}.
   (vect_permute_store_chain): Use them here too.
   (vect_grouped_load_supported): Try using IFN_VEC_EXTRACT_{EVEN,ODD}.
   (vect_permute_load_chain): Use them here too.
   * tree-vect-stmts.c (can_reverse_vector_p): New function.
   (get_negative_load_store_type): Use it.
   (reverse_vector): New function.
   (vectorizable_store, vectorizable_load): Use it.
   * config/aarch64/iterators.md (perm_optab): New iterator.
   * config/aarch64/aarch64-sve.md (_): New expander.
   (vec_reverse_): Likewise.

 gcc/testsuite/
   * gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
   * gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
   * gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
   * gcc.dg/vect/pr68445.c: Likewise.
   * gcc.dg/vect/slp-12a.c: Likewise.
   * gcc.dg/vect/slp-13-big-array.c: Likewise.
   * gcc.dg/vect/slp-13.c: Likewise.
   * gcc.dg/vect/slp-14.c: Likewise.
   * gcc.dg/vect/slp-15.c: Likewise.
   * gcc.dg/vect/slp-42.c: Likewise.
   * gcc.dg/vect/slp-multitypes-2.c: Likewise.
   * gcc.dg/vect/slp-multitypes-4.c: Likewise.
   * gcc.dg/vect/slp-multitypes-5.c: Likewise.
   * gcc.dg/vect/slp-reduc-4.c: Likewise.
   * gcc.dg/vect/slp-reduc-7.c: Likewise.
   * gcc.target/aarch64/sve_vec_perm_2.c: New test.
   * gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
   * gcc.target/aarch64/sve_vec_perm_3.c: New test.
   * gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
   * gcc.target/aarch64/sve_vec_perm_4.c: New test.
   * gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.
>>> OK.
>>
>> It's really a step backwards - we had those optabs and a tree code in
>> the past and canonicalizing things to VEC_PERM_EXPR made things simpler.
>>
>> Why doesn't VEC_PERM  not work?
>
> The problems with that are:
>
> - It doesn't work for vectors with 256-bit elements because the indices
>   wrap round.

That's a general issue that would need to be addressed for larger
vectors (GCN?).  I presume the requirement that the permutation
vector have the same size needs to be relaxed.

> - Supporting a fake VEC_PERM_EXPR  for a few
>   special cases would be hard, especially since v256hi isn't a normal
>   vector mode.  I imagine everything dealing with VEC_PERM_EXPR would
>   then have to worry about that special case.

I think it's not really a special case - any code here should just
expect the same number of vector elements and not a particular size.
You already dealt with using a char[] vector for permutations I think.

> - VEC_SERIES_CST only copes naturally with EXTRACT_EVEN, EXTRACT_ODD
>   and REVERSE.  INTERLEAVE_LO is { 0, N/2, 1, N/2+1, ... }.
>   I guess it's possible to represent that using a combination of
>   shifts, masks, and additions, but then:
>
>   1) when generating them, we'd need to make sure that we cost the
>  operation as a single permute, rather than costing all the shifts,
>  masks and additions
>
>   2) we'd need to make sure that all gimple optimisations that run
>  afterwards don't perturb the sequence, otherwise we'll end up
>  with someth
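
The selector patterns named in the quoted list can be written out explicitly. The sketch below takes the index conventions (including what N counts) at face value from the mail; the function names are hypothetical and no GCC data structure is modelled. It shows why a linear base + i * step series covers even/odd extraction and reversal but not the interleave pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// { 0, 2, 4, ... }: linear series, base 0, step 2.
std::vector<int>
extract_even_sel (int n)
{
  std::vector<int> sel (n);
  for (int i = 0; i < n; i++)
    sel[i] = 2 * i;
  return sel;
}

// { N-1, N-2, ..., 0 }: linear series, base N-1, step -1.
std::vector<int>
reverse_sel (int n)
{
  std::vector<int> sel (n);
  for (int i = 0; i < n; i++)
    sel[i] = n - 1 - i;
  return sel;
}

// { 0, N/2, 1, N/2+1, ... } as written in the mail: not a single series.
std::vector<int>
interleave_lo_sel (int n)
{
  std::vector<int> sel (n);
  for (int i = 0; i < n; i++)
    sel[i] = (i % 2 == 0) ? i / 2 : n / 2 + i / 2;
  return sel;
}

// True if every consecutive difference equals the first one.
bool
is_linear_series (const std::vector<int> &sel)
{
  for (std::size_t i = 2; i < sel.size (); i++)
    if (sel[i] - sel[i - 1] != sel[1] - sel[0])
      return false;
  return true;
}

inline void
check_selector_demo ()
{
  assert (is_linear_series (extract_even_sel (8)));
  assert (is_linear_series (reverse_sel (8)));
  assert (!is_linear_series (interleave_lo_sel (8)));
}
```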

RE: [PATCH] Fix mips hang with --help=target --help=optimizers (PR target/82880)

2017-11-21 Thread Moore, Catherine


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Tuesday, November 21, 2017 9:07 AM
> To: Moore, Catherine ; Matthew
> Fortune 
> Cc: gcc-patches@gcc.gnu.org; James Cowgill
> 
> Subject: [PATCH] Fix mips hang with --help=target --help=optimizers (PR
> target/82880)
> 
> Hi!
> 
> This is a patch from James that has been sitting in bugzilla for a few
> weeks.  The bug is that mips_register_frame_header_opt registers
> the pass from a static variable which has an automatic variable in the
> initializer, which means that the first time it is called it is registered
> properly, but if it is called multiple times (e.g. possible with gccjit
> or multiple --help options), then the second time it creates a new pass,
> but registers with the old pass (since the var isn't initialized again).
> register_pass doesn't store the address it is called with anywhere, just
> uses the fields of the struct and stores the pass (first field).
> All other spots that call register_pass in all backends do it properly
> it seems.
> 
> I've just fixed up formatting and added a testcase, tested with cross to
> mips and tested the testcase on x86_64-linux and i686-linux.
> 
> Ok for trunk?
> 
> 2017-11-21  James Cowgill  
>   Jakub Jelinek  
> 
>   PR target/82880
>   * config/mips/frame-header-opt.c
> (mips_register_frame_header_opt):
>   Remove static keyword from f variable.
> 
>   * gcc.dg/opts-8.c: New test.
> 
Yes, this is OK.  Thanks for fixing.


Re: Add support for in-order addition reduction using SVE FADDA

2017-11-21 Thread Richard Biener
On Mon, Nov 20, 2017 at 1:54 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Nov 17, 2017 at 5:53 PM, Richard Sandiford
>>  wrote:
>>> This patch adds support for in-order floating-point addition reductions,
>>> which are suitable even in strict IEEE mode.
>>>
>>> Previously vect_is_simple_reduction would reject any cases that forbid
>>> reassociation.  The idea is instead to tentatively accept them as
>>> "FOLD_LEFT_REDUCTIONs" and only fail later if there is no target
>>> support for them.  Although this patch only handles the particular
>>> case of plus and minus on floating-point types, there's no reason in
>>> principle why targets couldn't handle other cases.
>>>
>>> The vect_force_simple_reduction change makes it simpler for parloops
>>> to read the type of reduction.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.  OK to install?
>>
>> I don't like that you add a new tree code for this.  A new IFN looks more
>> suitable to me.
>
> OK.

Thanks.  I'd like to eventually get rid of other vectorizer tree codes as well,
like the REDUC_*_EXPR, DOT_PROD_EXPR and SAD_EXPR.  IFNs
are now really the way to go for "target instructions on GIMPLE".

>> Also I think if there's a way to handle this correctly with target support
>> you can also implement a fallback if there is no such support increasing
>> test coverage.  It would basically boil down to extracting all scalars from
>> the non-reduction operand vector and performing a series of reduction
>> ops, keeping the reduction PHI scalar.  This would also support any
>> reduction operator.
>
> Yeah, but without target support, that's probably going to be expensive.
> It's a bit like how we can implement element-by-element loads and stores
> for cases that don't have target support, but had to explicitly disable
> that in many cases, since the cost model was too optimistic.

I expect that for V2DF or even V4DF it might be profitable in quite a number
of cases.  V2DF definitely.

> I can give it a go anyway if you think it's worth it.

I think it is.

Richard.

> As far as testing coverage goes: I think the SVE port is just going
> to have to take the hit of being the only port that uses this stuff
> for now.  The AArch64 testsuite patches test SVE assembly generation
> for non-SVE targets, so it does get at least some coverage on normal
> AArch64 test runs.  But obviously assembly tests only go so far...
>
> Thanks,
> Richard
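
A scalar sketch of the fallback discussed above, assuming a two-lane "vector" to mirror V2DF; the function names and the lane count are illustrative assumptions, not vectorizer code. The fold-left shape consumes one chunk per iteration but adds its lanes to a single scalar accumulator in order, so it matches the strict scalar sum, whereas per-lane partial sums reassociate the additions.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t LANES_DEMO = 2;   // models a V2DF vector

// Reference: strict left-to-right scalar sum.
double
sum_in_order (const double *a, std::size_t n)
{
  double acc = 0.0;
  for (std::size_t i = 0; i < n; i++)
    acc += a[i];
  return acc;
}

// Fold-left shape: one chunk per iteration, lanes added in order to a
// scalar accumulator (the "reduction PHI stays scalar" idea).
double
sum_fold_left_chunked (const double *a, std::size_t n)
{
  double acc = 0.0;
  std::size_t i = 0;
  for (; i + LANES_DEMO <= n; i += LANES_DEMO)
    for (std::size_t lane = 0; lane < LANES_DEMO; lane++)
      acc += a[i + lane];
  for (; i < n; i++)               // scalar tail
    acc += a[i];
  return acc;
}

// Reassociated shape: independent per-lane partial sums combined at the end.
double
sum_lanewise (const double *a, std::size_t n)
{
  double part[LANES_DEMO] = {};
  std::size_t i = 0;
  for (; i + LANES_DEMO <= n; i += LANES_DEMO)
    for (std::size_t lane = 0; lane < LANES_DEMO; lane++)
      part[lane] += a[i + lane];
  double acc = 0.0;
  for (std::size_t lane = 0; lane < LANES_DEMO; lane++)
    acc += part[lane];
  for (; i < n; i++)               // scalar tail
    acc += a[i];
  return acc;
}

inline void
check_fold_left_demo ()
{
  // 1.0 is absorbed when added to 1e20, so the summation order matters.
  const double v[4] = { 1e20, 1.0, -1e20, 1.0 };
  assert (sum_fold_left_chunked (v, 4) == sum_in_order (v, 4));
  assert (sum_lanewise (v, 4) != sum_in_order (v, 4));
}
```

The sketch assumes IEEE double arithmetic without reassociating options such as -ffast-math; under those options the compiler is free to make the three functions agree or disagree differently.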


Re: [committed][PATCH] Fix bogus propagation in DOM

2017-11-21 Thread Richard Biener
On Mon, Nov 20, 2017 at 7:33 PM, Jeff Law  wrote:
> On 11/20/2017 03:25 AM, Richard Biener wrote:
>> On Sun, Nov 19, 2017 at 9:16 PM, Jeff Law  wrote:
>>> On my local branch gcc.dg/torture/pr56349.c fails by sending GCC into an
>>> infinite loop trying to simplify a self-referring statement. ie
>>> something like
>>>
>>> x_1 = x_1 + 10;
>>>
>>> That, of course, shouldn't be happening in SSA form.  After some digging
>>> I've found the culprit.
>>>
>>> Let's say we've got a PHI.
>>>
>>> a_1 = PHI (a_0, a_2)
>>>
>>> If DOM decides that the edge associated with a_2 is not executable, then
>>> DOM will consider the PHI a degenerate and enter a_1 = a_0 into its
>>> equivalence table.
>>>
>>> That in turn will result in propagation of a_0 into uses of a_1.
>>>
>>> That, of course, isn't right.  There's nothing that guarantees that the
>>> definition of a_0 dominates the uses of a_1.  In the testcase that bogus
>>> propagation cascades and eventually results in a self-referring node
>>> like I showed above.
>>>
>>> The solution here is to note whether or not we ignored any PHI
>>> arguments.  If we do and the equivalence we want to enter is SSA_NAME =
>>> SSA_NAME, then we must reject the equivalence.  Obviously if we wanted
>>> to enter SSA_NAME = CONST, then we can still do so.
>>>
>>> Bootstrapped and regression tested on x86_64.  Installing on the trunk.
>>
>> Hmm, but if the edge isn't executable then it will be removed and thus the
>> definition _will_ dominate.  So I think the error happens elsewhere.  With
>> the change you remove one of the advantages of tracking unexecutable
>> edges, namely that we can treat those merges optimistically resulting in
>> more CSE.
>>
>> You didn't add a testcase so I can't have a quick look myself.
>>
>> Short: I think you're papering over an issue elsewhere.
> Depends on your point of view :-)
>
> It's not something I think you can trigger on the trunk right now.  But
> the testcase is pr56349.  You need the embedded vrp bits installed into
> DOM and for DOM to also use those bits to detect branches that have a
> static destination.
>
> So I'll just walk you through it...
>
> At the start of the second DOM pass we have this:
>
> f ()
> {
>   int k__lsm.8;
>   int * k;
>   int a;
>   int _2;
>   int b.0_3;
>   int _4;
>   unsigned int ivtmp_5;
>   int _6;
>   short int iftmp.3_14;
>   int _15;
>   int b.4_25;
>   short int iftmp.3_27;
>   short int c.2_33;
>   unsigned int ivtmp_38;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   a_9 = 1;
>   ivtmp_38 = 1;
>   a_31 = a_9 + 1;
>   ivtmp_5 = ivtmp_38 + 4294967295;
>   _2 = 1;
>   b.0_3 = b;
>   _4 = b.0_3 | _2;
>   b = _4;
>   if (_4 == 0)
> goto ; [66.00%]
>   else
> goto ; [34.00%]
> ;;succ:   12
> ;;10
>
> ;;   basic block 3, loop depth 1
> ;;pred:   5
> ;;3
>   goto ; [100.00%]
> ;;succ:   3
>
> ;;   basic block 4, loop depth 0
> ;;pred:   11
> ;;12
> lbl1:
>   c.2_33 = c;
> ;;succ:   5
>
> ;;   basic block 5, loop depth 1
> ;;pred:   4
> ;;5
>   if (c.2_33 != 0)
> goto ; [85.00%]
>   else
> goto ; [15.00%]
> ;;succ:   3
> ;;5
>
> ;;   basic block 6, loop depth 0
> ;;pred:   11
>   b.4_25 = b;
>   if (b.4_25 != 0)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   7
> ;;8
>
> ;;   basic block 7, loop depth 0
> ;;pred:   6
>   iftmp.3_27 = (short int) b.4_25;
>   goto ; [100.00%]
> ;;succ:   9
>
> ;;   basic block 8, loop depth 0
> ;;pred:   6
> ;;12
>   # k_37 = PHI 
> ;;succ:   9
>
> ;;   basic block 9, loop depth 0
> ;;pred:   7
> ;;8
>   # iftmp.3_14 = PHI 
>   # k_44 = PHI 
>   c = iftmp.3_14;
>   if (iftmp.3_14 != 0)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   10
> ;;11
>
> ;;   basic block 10, loop depth 0
> ;;pred:   2
> ;;9
>   # k_10 = PHI <0B(2), k_44(9)>
> lbl2:
>   b = 0;
> ;;succ:   11
>
> ;;   basic block 11, loop depth 0
> ;;pred:   10
> ;;9
>   # k_11 = PHI 
>   k_30 = k_11 + 4;
>   _6 = MEM[(int *)k_11 + 4B];
>   if (_6 != 0)
> goto ; [66.00%]
>   else
> goto ; [34.00%]
> ;;succ:   6
> ;;4
>
> ;;   basic block 12, loop depth 0
> ;;pred:   2
>   _15 = MEM[(int *)0B];
>   if (_15 != 0)
> goto ; [66.00%]
>   else
> goto ; [34.00%]
> ;;succ:   8
> ;;4
>
> }
>
> The first tidbit of interest is we can statically determine that bb2
> will always transfer control to bb10.  The edge 2->12 is marked as not
> executable.  That also means that 12->4 and 12->8 are not executable as
> well.
>
> The PHI in BB8 is critical:
>
>   # k_37 = PHI 
>
> With the edge 12->8 being unexecutable the PHI is essentially k_37 =
> k_30 and we'll try to propagate k_30 into the u

Re: [14/nn] Add helpers for shift count modes

2017-11-21 Thread Richard Biener
On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>>  wrote:
>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>  wrote:
>>>> This patch adds a stub helper routine to provide the mode
>>>> of a scalar shift amount, given the mode of the values
>>>> being shifted.
>>>>
>>>> One long-standing problem has been to decide what this mode
>>>> should be for arbitrary rtxes (as opposed to those directly
>>>> tied to a target pattern).  Is it the mode of the shifted
>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>> the corresponding target pattern says?  (In which case what
>>>> should the mode be when the target doesn't have a pattern?)
>>>>
>>>> For now the patch picks word_mode, which should be safe on
>>>> all targets but could perhaps become suboptimal if the helper
>>>> routine is used more often than it is in this patch.  As it
>>>> stands the patch does not change the generated code.
>>>>
>>>> The patch also adds a helper function that constructs rtxes
>>>> for constant shift amounts, again given the mode of the value
>>>> being shifted.  As well as helping with the SVE patches, this
>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>
>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>> constant shift amount RTX generation into a gen_int_shift_amount
>>> looks good to me I'd rather have that ??? in this function (and
>>> I'd use the mode of the RTX shifted, not word_mode...).
>
> OK.  I'd gone for word_mode because that's what expand_binop uses
> for CONST_INTs:
>
>   op1_mode = (GET_MODE (op1) != VOIDmode
>   ? as_a  (GET_MODE (op1))
>   : word_mode);
>
> But using the inner mode should be fine too.  The patch below does that.
>
>>> In the end it's up to insn recognizing to convert the op to the
>>> expected mode and for generic RTL it's us that should decide
>>> on the mode -- on GENERIC the shift amount has to be an
>>> integer so why not simply use a mode that is large enough to
>>> make the constant fit?
>
> ...but I can do that instead if you think it's better.
>
>>> Just throwing in some comments here, RTL isn't my primary
>>> expertise.
>>
>> To add a little bit - shift amounts is maybe the only(?) place
>> where a modeless CONST_INT makes sense!  So "fixing"
>> that first sounds backwards.
>
> But even here they have a mode conceptually, since out-of-range shift
> amounts are target-defined rather than undefined.  E.g. if the target
> interprets the shift amount as unsigned, then for a shift amount
> (const_int -1) it matters whether the mode is QImode (and so we're
> shifting by 255) or HImode (and so we're shifting by 65535).

I think RTL is well-defined (at least I hope so ...) and machine constraints
need to be modeled explicitly (like embedding an implicit bit_and in
shift patterns).
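
To make the QImode-versus-HImode point quoted above concrete, here is a minimal C sketch. It is not GCC code: it assumes a hypothetical target that reads the shift amount as an unsigned value of a given bit width, and the helper name `shift_amount_as_unsigned` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not a GCC function: interpret a sign-extended
   constant as an unsigned shift amount that is WIDTH_BITS wide, the way
   a target treating the amount as unsigned in that mode would.
   WIDTH_BITS must be between 1 and 63.  */
static uint64_t
shift_amount_as_unsigned (int64_t value, unsigned width_bits)
{
  uint64_t mask = (UINT64_C (1) << width_bits) - 1;
  return (uint64_t) value & mask;
}
```

Under that assumption, a constant of -1 reads as 255 when the amount is 8 bits wide (QImode-like) and as 65535 when it is 16 bits wide (HImode-like), which is why the mode of the constant matters even for shift amounts.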

> OK, so shifts by 65535 make no sense in practice, but *conceptually*... :-)
>
> Jeff Law  writes:
>> On 10/26/2017 06:06 AM, Richard Biener wrote:
>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>  wrote:
>>>> This patch adds a stub helper routine to provide the mode
>>>> of a scalar shift amount, given the mode of the values
>>>> being shifted.
>>>>
>>>> One long-standing problem has been to decide what this mode
>>>> should be for arbitrary rtxes (as opposed to those directly
>>>> tied to a target pattern).  Is it the mode of the shifted
>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>> the corresponding target pattern says?  (In which case what
>>>> should the mode be when the target doesn't have a pattern?)
>>>>
>>>> For now the patch picks word_mode, which should be safe on
>>>> all targets but could perhaps become suboptimal if the helper
>>>> routine is used more often than it is in this patch.  As it
>>>> stands the patch does not change the generated code.
>>>>
>>>> The patch also adds a helper function that constructs rtxes
>>>> for constant shift amounts, again given the mode of the value
>>>> being shifted.  As well as helping with the SVE patches, this
>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>
>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>> constant shift amount RTX generation into a gen_int_shift_amount
>>> looks good to me I'd rather have that ??? in this function (and
>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>
>>> In the end it's up to insn recognizing to convert the op to the
>>> expected mode and for generic RTL it's us that should decide
>>> on the mode -- on GENERIC the shift amount has to be an
>>> integer so why not simply use a mode that is large enough to
>>> make the constant fit?
>>>
>>> Just throwing in some comments here, RTL isn't my primary
>>> expertise.
>> I wonder if encapsulation + a target hook to specify the mode would be
>> better?  We'd then have to argue over word_mode, vs QImode vs so

Re: Cleanup predict.c

2017-11-21 Thread Martin Liška
On 11/14/2017 05:07 PM, Martin Liška wrote:
> On 11/14/2017 10:20 AM, Jan Hubicka wrote:
>> @@ -4670,11 +4671,12 @@ expand_call_inline (basic_block bb, gimp
>>       if (dump_file && (dump_flags & TDF_DETAILS))
>>   {
>> -  fprintf (dump_file, "Inlining ");
>> -  print_generic_expr (dump_file, id->src_fn);
>> -  fprintf (dump_file, " to ");
>> -  print_generic_expr (dump_file, id->dst_fn);
>> -  fprintf (dump_file, " with frequency %i\n", cg_edge->frequency ());
>> +  fprintf (dump_file, "Inlining %s to %s with frequency %4.2f\n",
>> +   xstrdup_for_dump (id->src_node->dump_name ()),
>> +   xstrdup_for_dump (id->dst_node->dump_name ()),
>> +   cg_edge->sreal_frequency ().to_double ());
>> +  id->src_node->dump (dump_file);
>> +  id->dst_node->dump (dump_file);
>>   }
> 
> Hi.
> 
> You don't have to call xstrdup_for_dump for functions symtab_node::dump_name and
> symtab_node::dump_asm_name. Allocation is done from GGC memory.
> 
> Martin

Tested and installed as r255005.

Martin


Re: [PATCH][RFC] Add quotes for constexpr keyword.

2017-11-21 Thread Martin Liška
Hi.

I'm sending v2 of the patch where I fixed test-suite fallout.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 3195b1b71c387b1359c90f6e752e1c312120cd69 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 15 Nov 2017 08:41:12 +0100
Subject: [PATCH] Add quotes for constexpr keyword.

gcc/cp/ChangeLog:

2017-11-15  Martin Liska  

	* class.c (finalize_literal_type_property): Add quotes for
	constexpr keyword.
	(explain_non_literal_class): Likewise.
	* constexpr.c (ensure_literal_type_for_constexpr_object): Likewise.
	(is_valid_constexpr_fn): Likewise.
	(check_constexpr_ctor_body): Likewise.
	(register_constexpr_fundef): Likewise.
	(explain_invalid_constexpr_fn): Likewise.
	(cxx_eval_builtin_function_call): Likewise.
	(cxx_eval_call_expression): Likewise.
	(cxx_eval_loop_expr): Likewise.
	(potential_constant_expression_1): Likewise.
	* decl.c (check_previous_goto_1): Likewise.
	(check_goto): Likewise.
	(grokfndecl): Likewise.
	(grokdeclarator): Likewise.
	* error.c (maybe_print_constexpr_context): Likewise.
	* method.c (process_subob_fn): Likewise.
	(defaulted_late_check): Likewise.
	* parser.c (cp_parser_compound_statement): Likewise.

gcc/testsuite/ChangeLog:

2017-11-16  Martin Liska  

	* g++.dg/cpp0x/constexpr-48089.C: Add quotes for constexpr
	keyword; add dg-message for 'in .constexpr. expansion of '.
	* g++.dg/cpp0x/constexpr-50060.C: Likewise.
	* g++.dg/cpp0x/constexpr-60049.C: Likewise.
	* g++.dg/cpp0x/constexpr-70323.C: Likewise.
	* g++.dg/cpp0x/constexpr-70323a.C: Likewise.
	* g++.dg/cpp0x/constexpr-cast.C: Likewise.
	* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
	* g++.dg/cpp0x/constexpr-ex1.C: Likewise.
	* g++.dg/cpp0x/constexpr-generated1.C: Likewise.
	* g++.dg/cpp0x/constexpr-ice16.C: Likewise.
	* g++.dg/cpp0x/constexpr-ice5.C: Likewise.
	* g++.dg/cpp0x/constexpr-incomplete2.C: Likewise.
	* g++.dg/cpp0x/constexpr-neg1.C: Likewise.
	* g++.dg/cpp0x/constexpr-recursion.C: Likewise.
	* g++.dg/cpp0x/constexpr-shift1.C: Likewise.
	* g++.dg/cpp1y/constexpr-70265-1.C: Likewise.
	* g++.dg/cpp1y/constexpr-70265-2.C: Likewise.
	* g++.dg/cpp1y/constexpr-79655.C: Likewise.
	* g++.dg/cpp1y/constexpr-new.C: Likewise.
	* g++.dg/cpp1y/constexpr-return2.C: Likewise.
	* g++.dg/cpp1y/constexpr-shift1.C: Likewise.
	* g++.dg/cpp1y/constexpr-throw.C: Likewise.
	* g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
	* g++.dg/ext/constexpr-vla1.C: Likewise.
	* g++.dg/ext/constexpr-vla2.C: Likewise.
	* g++.dg/ext/constexpr-vla3.C: Likewise.
	* g++.dg/cpp0x/static_assert10.C: Likewise.
	* g++.dg/cpp1y/pr63996.C: Likewise.
	* g++.dg/cpp1y/pr68180.C: Likewise.
	* g++.dg/cpp1y/pr77830.C: Likewise.
	* g++.dg/ubsan/pr63956.C: Likewise.
---
 gcc/cp/class.c |  4 +--
 gcc/cp/constexpr.c | 35 ++--
 gcc/cp/decl.c  | 10 +++---
 gcc/cp/error.c |  4 +--
 gcc/cp/method.c|  6 ++--
 gcc/cp/parser.c|  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-50060.C   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-60049.C   | 10 +++---
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C   |  4 +--
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C  |  4 +--
 gcc/testsuite/g++.dg/cpp0x/constexpr-cast.C|  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C   | 10 +++---
 gcc/testsuite/g++.dg/cpp0x/constexpr-ex1.C |  6 ++--
 gcc/testsuite/g++.dg/cpp0x/constexpr-generated1.C  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ice16.C   |  4 +--
 gcc/testsuite/g++.dg/cpp0x/constexpr-ice5.C|  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-incomplete2.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-neg1.C|  6 ++--
 gcc/testsuite/g++.dg/cpp0x/constexpr-recursion.C   |  4 +--
 gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C  | 14 
 gcc/testsuite/g++.dg/cpp0x/static_assert10.C   |  2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-70265-1.C |  2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-70265-2.C |  2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-79655.C   |  8 ++---
 gcc/testsuite/g++.dg/cpp1y/constexpr-new.C |  4 +--
 gcc/testsuite/g++.dg/cpp1y/constexpr-return2.C |  2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-shift1.C  |  2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-throw.C   |  2 +-
 gcc/testsuite/g++.dg/cpp1y/pr63996.C   |  3 +-
 gcc/testsuite/g++.dg/cpp1y/pr68180.C   |  2 +-
 gcc/testsuite/g++.dg/cpp1y/pr77830.C   |  4 +--
 gcc/testsuite/g++.dg/cpp1z/constexpr-lambda6.C |  2 +-
 gcc/testsuite/g++.dg/ext/constexpr-vla1.C  |  2 +-
 gcc/testsuite/g++.dg/ext/constexpr-vla2.C  |  4 +--
 gcc/testsuite/g++.dg/ext/constexpr-vla3.C  |  2 +-
 gcc/testsuite/g++.dg/ubsan/pr63956.C   | 38 +++

[libstdc++-,doc] Mislocated

2017-11-21 Thread Przemyslaw Wirkus
Hello,

Wrong  element position causes libstdc++v3 make doc-pdf-docbook
docs generation procedure to fail.

PDF documentation generation for libstdc++v3 is broken for the make doc-pdf-docbook
rule. Pdflatex compilation fails because LaTeX is not correctly generated from
wrongly formatted variablelist elements in the XML docs: one varlist entry is
outside the section of variablelist elements:


  
  
  

  

Commit that caused the regression is:

  Author: redi
  Date: Fri Jul 21 16:05:10 2017
  New Revision: 250430
  URL: https://gcc.gnu.org/viewcvs?rev=250430&root=gcc&view=rev

Tested by regenerating libstdc++v3 docs with 'make doc-pdf-docbook'.

OK for trunk?

Kind regards,
Przemyslaw Wirkus

libstdc++-v3/ChangeLog:

2017-11-08  Przemyslaw Wirkus  

* doc/xml/manual/using.xml (manual.intro.using.macros): Move
variablelist element at the end of its list.
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index 6ce29fd30be74fcc5273ec0971a3b72115aaba73..fdbaa5730072189b828ca2ca120ff752f58da2d3 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -989,7 +989,6 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe
enables support for ISO/IEC 29124 Special Math Functions.
   
 
-
 
 _GLIBCXX_SANITIZE_VECTOR
 
@@ -1008,6 +1007,7 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe
 destroy or modify vectors.
   
 
+
 
   
 


Re: [libstdc++-,doc] Mislocated

2017-11-21 Thread Jonathan Wakely

On 21/11/17 15:22 +, Przemyslaw Wirkus wrote:

Hello,

Wrong  element position causes libstdc++v3 make doc-pdf-docbook
docs generation procedure to fail.

PDF documentation generation for libstdc++v3 is broken for the make doc-pdf-docbook
rule. Pdflatex compilation fails because LaTeX is not correctly generated from
wrongly formatted variablelist elements in the XML docs: one varlist entry is
outside the section of variablelist elements:


 
 
 

 

Commit that caused the regression is:

 Author: redi
 Date: Fri Jul 21 16:05:10 2017
 New Revision: 250430
 URL: https://gcc.gnu.org/viewcvs?rev=250430&root=gcc&view=rev

Tested by regenerating libstdc++v3 docs with 'make doc-pdf-docbook'.

OK for trunk?


OK, thanks.



Re: [PATCH][RFC] Add quotes for constexpr keyword.

2017-11-21 Thread Martin Sebor

On 11/21/2017 08:00 AM, Martin Liška wrote:

Hi.

I'm sending v2 of the patch where I fixed test-suite fallout.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.


The changes look good to me.  Thanks for the nice cleanup!

Martin

PS While going through the patch I noticed the -fconstexpr-depth=
option also isn't quoted in a few instances.  That would be nice
to fix as well at some point (this can of course be done
independently):

   if (!ctx->quiet)
-   error ("constexpr evaluation depth exceeds maximum of %d (use "
+   error ("%<constexpr%> evaluation depth exceeds maximum of %d (use "
   "-fconstexpr-depth= to increase the maximum)",
   max_constexpr_depth);



[PATCH] C: don't suggest names that came from earlier failures (PR c/83056)

2017-11-21 Thread David Malcolm
PR c/83056 reports an issue affecting trunk and gcc-7 in which
the C frontend's implementation of lookup_name_fuzzy uses undeclared
identifiers as suggestions when encountering subsequent undeclared
identifiers.

The fix is to filter out the names bound to error_mark_node
in lookup_name_fuzzy.

The C++ frontend is unaffected, as it already does this.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk and for gcc-7-branch?

gcc/c/ChangeLog:
PR c/83056
* c-decl.c (lookup_name_fuzzy): Don't suggest names that came from
earlier failed lookups.

gcc/testsuite/ChangeLog:
PR c/83056
* gcc.dg/spellcheck-pr83056.c: New test case.
---
 gcc/c/c-decl.c|  2 ++
 gcc/testsuite/gcc.dg/spellcheck-pr83056.c | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-pr83056.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index e0a4dd1..9c3beab 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -4035,6 +4035,8 @@ lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind kind, location_t loc)
   {
if (!binding->id || binding->invisible)
  continue;
+   if (binding->decl == error_mark_node)
+ continue;
/* Don't use bindings from implicitly declared functions,
   as they were likely misspellings themselves.  */
if (TREE_CODE (binding->decl) == FUNCTION_DECL)
diff --git a/gcc/testsuite/gcc.dg/spellcheck-pr83056.c b/gcc/testsuite/gcc.dg/spellcheck-pr83056.c
new file mode 100644
index 000..8b90887
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck-pr83056.c
@@ -0,0 +1,11 @@
+enum { TYPE_A };
+
+/* Verify that the incorrect "TYPE_B" etc don't get re-used for
+   suggestions for the later incorrect values.  */
+
+void pr83056(void)
+{
+  int b = TYPE_B; /* { dg-error "did you mean 'TYPE_A'" } */
+  int c = TYPE_C; /* { dg-error "did you mean 'TYPE_A'" } */
+  int d = TYPE_D; /* { dg-error "did you mean 'TYPE_A'" } */
+}
-- 
1.8.5.3
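
The idea behind the fix can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: `struct candidate`, its `is_error` flag (standing in for a binding whose decl is error_mark_node), `edit_distance` and `suggest_name` are hypothetical names, and unlike the real lookup_name_fuzzy the sketch applies no distance cutoff before offering a suggestion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scope binding; IS_ERROR models a name that
   was bound to error_mark_node after an earlier failed lookup.  */
struct candidate
{
  const char *name;
  int is_error;
};

#define MAX_NAME_LEN 63

/* Classic Levenshtein edit distance using one rolling row.  Both
   strings must be at most MAX_NAME_LEN characters long.  */
static size_t
edit_distance (const char *a, const char *b)
{
  size_t len_a = strlen (a), len_b = strlen (b);
  size_t row[MAX_NAME_LEN + 1];

  for (size_t j = 0; j <= len_b; j++)
    row[j] = j;
  for (size_t i = 1; i <= len_a; i++)
    {
      size_t diag = row[0];	/* Previous row, column j - 1.  */
      row[0] = i;
      for (size_t j = 1; j <= len_b; j++)
	{
	  size_t up = row[j];	/* Previous row, column j.  */
	  size_t best = diag + (a[i - 1] != b[j - 1]);
	  if (up + 1 < best)
	    best = up + 1;
	  if (row[j - 1] + 1 < best)
	    best = row[j - 1] + 1;
	  row[j] = best;
	  diag = up;
	}
    }
  return row[len_b];
}

/* Return the closest candidate to NAME that did not come from a failed
   lookup, or NULL if there is none.  */
static const char *
suggest_name (const char *name, const struct candidate *cands, size_t n)
{
  const char *best = NULL;
  size_t best_dist = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (cands[i].is_error)
	continue;		/* Skip names from earlier failures.  */
      size_t d = edit_distance (name, cands[i].name);
      if (best == NULL || d < best_dist)
	{
	  best = cands[i].name;
	  best_dist = d;
	}
    }
  return best;
}
```

With candidates TYPE_A (valid) and TYPE_B (marked as an error), a lookup of TYPE_C suggests TYPE_A, which mirrors what the new test case expects.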



Re: [PATCH] C: don't suggest names that came from earlier failures (PR c/83056)

2017-11-21 Thread Marek Polacek
On Tue, Nov 21, 2017 at 10:45:56AM -0500, David Malcolm wrote:
> PR c/83056 reports an issue affecting trunk and gcc-7 in which
> the C frontend's implementation of lookup_name_fuzzy uses undeclared
> identifiers as suggestions when encountering subsequent undeclared
> identifiers.
> 
> The fix is to filter out the names bound to error_mark_node
> in lookup_name_fuzzy.
> 
> The C++ frontend is unaffected, as it already does this.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk and for gcc-7-branch?

Ok, thanks.

Marek


RE: [PATCH, i386] Refactor -mprefer-avx[128|256] options into common -mprefer-vector-width=[none|128|256|512]

2017-11-21 Thread Shalnov, Sergey
Uros,
I prepared a new patch with all comments addressed as proposed.
1. old option -mprefer-avx128 is Alias(mprefer-vector-width=, 128, none)
2. Simplified default initialization (as Bernhard proposed)
3. Fixed documentation (proposed by Sandra)
4. Several tests are changed to use the new style of the option, but many are left
with -mprefer-avx128 (one test uses the new style -mprefer-vector-width=128)


2017-11-21  Sergey Shalnov  

gcc/
* config/i386/i386-opts.h (enum prefer_vector_width): Added new enum
for the new option -mprefer-vector-width=[none|128|256|512].
* config/i386/i386.c (ix86_target_string): remove old style options
-mprefer-avx256 and make -mprefer-avx128 as alias.
(ix86_option_override_internal):  Apply defaults for the
-mprefer-vector-width=[128|256] option.
* config/i386/i386.h (TARGET_PREFER_AVX128, TARGET_PREFER_AVX256):
Implement macros to work with -mprefer-vector-width=.
* config/i386/i386.opt: Implemented option
-mprefer-vector-width=[none|128|256|512].
* doc/invoke.texi: Documentation for
-mprefer-vector-width=[none|128|256|512].

gcc/testsuite/
* g++.dg/ext/pr57362.C (__attribute__): Apply new option syntax.
* g++.dg/torture/pr81249.C: Ditto.
* gcc.target/i386/avx512f-constant-float-return.c: Ditto.
* gcc.target/i386/avx512f-prefer.c: Ditto.
* gcc.target/i386/pr82460-2.c: Ditto.

Please merge this patch if you think it is acceptable.
Thank you
Sergey


-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Tuesday, November 14, 2017 7:57 AM
To: Joseph Myers 
Cc: Shalnov, Sergey ; gcc-patches@gcc.gnu.org; 
kirill.yuk...@gmail.com; Koval, Julia ; Senkevich, 
Andrew ; Peryt, Sebastian 
; Ivchenko, Alexander 
Subject: Re: [PATCH, i386] Refactor -mprefer-avx[128|256] options into common 
-mprefer-vector-width=[none|128|256|512]

On Tue, Nov 14, 2017 at 12:14 AM, Joseph Myers  wrote:
> On Mon, 13 Nov 2017, Uros Bizjak wrote:
>
>> [BTW: --mprefer-avx128 should be marked RejectNegative from the 
>> beginning; let's just assume nobody uses it in its (somehow weird) 
>> negative "-mno-prefer-avx128" form.]
>
> It's used in that form in various testcases that otherwise fail when 
> GCC is configured --with-arch= some CPU that defaults to -mprefer-avx128.

In this case, an even better choice would be:

Alias(mprefer-vector-width=, 128, none)

So, -mno-prefer-avx128 would just set the default back to none.

Uros.


0004-Refactoring-options-mprefer-avx-128-256-into-one-mpr.patch
Description: 0004-Refactoring-options-mprefer-avx-128-256-into-one-mpr.patch


Re: Backports for GCC 7 branch

2017-11-21 Thread Martin Liška
Hi.

There's another bunch of backports to GCC 7 branch I've just tested
and bootstrapped.

Martin
From 6a918e72d251dd4e2aa8b2c9643f857ceef3997d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 31 Oct 2017 11:55:19 +
Subject: [PATCH 4/7] Backport r254257

gcc/ChangeLog:

2017-10-31  Martin Liska  

	PR gcov-profile/82633
	* doc/gcov.texi: Document -fkeep-{static,inline}-functions and
	their interaction with GCOV infrastructure.
---
 gcc/configure | 4 ++--
 gcc/configure.ac  | 4 ++--
 gcc/doc/gcov.texi | 7 +++
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 706aa6cf0b0..88b8d6d9071 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -328,6 +328,13 @@ handlers, respectively. Given @samp{-a} option, unexecuted blocks are
 marked @samp{$} or @samp{%}, depending on whether a basic block
 is reachable via non-exceptional or exceptional paths.
 
+Note that GCC can completely remove the bodies of functions that are
+not needed -- for instance if they are inlined everywhere.  Such functions
+are marked with @samp{-}, which can be confusing.
+Use the @option{-fkeep-inline-functions} and @option{-fkeep-static-functions}
+options to retain these functions and
+allow gcov to properly show their @var{execution_count}.
+
 Some lines of information at the start have @var{line_number} of zero.
 These preamble lines are of the form
 
-- 
2.14.3

From 1ffe6cb8c9e9b6474616b63bc19e14f57a3bcf04 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 21 Nov 2017 13:39:14 +
Subject: [PATCH 7/7] Backport r255001

gcc/ChangeLog:

2017-11-21  Martin Liska  

	PR rtl-optimization/82044
	PR tree-optimization/82042
	* dse.c (check_mem_read_rtx): Check for overflow.
---
 gcc/dse.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/dse.c b/gcc/dse.c
index f87dd50024e..6cd1b83d802 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -1978,6 +1978,12 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
   else
 width = GET_MODE_SIZE (GET_MODE (mem));
 
+  if (offset > HOST_WIDE_INT_MAX - width)
+{
+  clear_rhs_from_active_local_stores ();
+  return;
+}
+
   read_info = read_info_type_pool.allocate ();
   read_info->group_id = group_id;
   read_info->mem = mem;
-- 
2.14.3
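
The check added to dse.c follows a common pattern for guarding a signed addition. A minimal standalone sketch, assuming int64_t stands in for HOST_WIDE_INT and that the width is non-negative (the function name is hypothetical, not taken from GCC):

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if OFFSET + WIDTH would exceed INT64_MAX.  WIDTH is
   assumed to be non-negative, so INT64_MAX - WIDTH cannot itself
   overflow, and the addition is never actually performed.  */
static bool
offset_plus_width_overflows (int64_t offset, int64_t width)
{
  return offset > INT64_MAX - width;
}
```

Rearranging the comparison this way avoids evaluating offset + width at all, so the test itself cannot trigger signed overflow.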

From d9b29ea9aedb3ccc88ee66d71999ab12bf6ee498 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 8 Nov 2017 11:45:35 +
Subject: [PATCH 6/7] Backport r254524

gcc/ChangeLog:

2017-11-08  Martin Liska  

	* gimplify.c (expand_FALLTHROUGH_r): Simplify usage
	of gimple_call_internal_p.
---
 gcc/gimplify.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c0ba76aefdc..5264a4f3d40 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2203,8 +2203,7 @@ expand_FALLTHROUGH_r (gimple_stmt_iterator *gsi_p, bool *handled_ops_p,
 	  while (!gsi_end_p (gsi2))
 	{
 	  stmt = gsi_stmt (gsi2);
-	  enum gimple_code gc = gimple_code (stmt);
-	  if (gc == GIMPLE_LABEL)
+	  if (gimple_code (stmt) == GIMPLE_LABEL)
 		{
 		  tree label = gimple_label_label (as_a <glabel *> (stmt));
 		  if (gimple_has_location (stmt) && DECL_ARTIFICIAL (label))
@@ -2213,8 +2212,7 @@ expand_FALLTHROUGH_r (gimple_stmt_iterator *gsi_p, bool *handled_ops_p,
 		  break;
 		}
 		}
-	  else if (gc == GIMPLE_CALL
-		   && gimple_call_internal_p (stmt, IFN_ASAN_MARK))
+	  else if (gimple_call_internal_p (stmt, IFN_ASAN_MARK))
 		;
 	  else
 		/* Something other is not expected.  */
-- 
2.14.3

From 0a31d813c6b12b7530affb9ad937c36d3b0b48ec Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 8 Nov 2017 08:17:30 +
Subject: [PATCH 5/7] Backport r254519

gcc/ChangeLog:

2017-11-08  Martin Liska  

	PR sanitizer/82792
	* gimplify.c (expand_FALLTHROUGH_r): Skip IFN_ASAN_MARK.

gcc/testsuite/ChangeLog:

2017-11-08  Martin Liska  

	PR sanitizer/82792
	* g++.dg/asan/pr82792.C: New test.
---
 gcc/gimplify.c  |  8 ++--
 gcc/testsuite/g++.dg/asan/pr82792.C | 32 
 2 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/asan/pr82792.C

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e23aae91094..c0ba76aefdc 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2203,7 +2203,8 @@ expand_FALLTHROUGH_r (gimple_stmt_iterator *gsi_p, bool *handled_ops_p,
 	  while (!gsi_end_p (gsi2))
 	{
 	  stmt = gsi_stmt (gsi2);
-	  if (gimple_code (stmt) == GIMPLE_LABEL)
+	  enum gimple_code gc = gimple_code (stmt);
+	  if (gc == GIMPLE_LABEL)
 		{
 		  tree label = gimple_label_label (as_a <glabel *> (stmt));
 		  if (gimple_has_location (stmt) && DECL_ARTIFICIAL (label))
@@ -2212,8 +2213,11 @@ expand_FALLTHROUGH_r (gimple_stmt_iterator *gsi_p, bool *handled_ops_p,
 		  break;
 		}
 		}
+	  else if (gc == GIMPLE_CALL
+		   && gimple_call_internal_p (stmt, IFN_ASAN_MARK))
+		;
 	  else
-		/* Something other than a label.  That

Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-21 Thread Janne Blomqvist
On Mon, Nov 20, 2017 at 8:29 PM, Thomas Koenig  wrote:
> Am 20.11.2017 um 09:30 schrieb Janne Blomqvist:
>>
>> On Sun, Nov 19, 2017 at 11:11 PM, Thomas Koenig 
>> wrote:
>>>
>>> There is one question regarding the ABI. Apparently, the string length
>>> is passed as an int even on a 64-bit system. I verified that this
>>> is indeed the case by doing the actual work on a
>>> powerpc64-unknown-linux-gnu box (gcc110 on the gcc compile farm),
>>> which is big-endian. If we were actually passing an eight-byte
>>> quantity, and only getting the upper bytes, we would crash & burn.
>>>
>>> Now, I _thought_ we were passing string lengths as size_t now (Janne?),
>>> but maybe something was missing in that change.
>>
>>
>> Unfortunately I had to revert the charlen->size_t patch since it
>> caused regressions on aix/power (presumably due to endianness issues).
>
>
> Ah, that explains it. I had forgotten the reversion part.
>
>> There's apparently some other process for getting compile farm
>> accounts nowadays, and we have broken the ABI again for gcc 8, so
>> maybe I should dust off the patch and try again. Or what do you think?
>
>
> You can apply at https://cfarm.tetaneutral.net/ . These machines are
> indeed quite nice to work on, especially because of the different
> architectures (and because there are some very powerful machines
> there).
>
> So, any other comments about my patch? OK for trunk?

- In many cases the copyright notice has "This file is part of the GNU
Fortran 95 runtime library (libgfortran)." It's a while since we've
called ourselves "GNU Fortran 95", so just remove the "95".

- It seems in the library you're using int for string lengths? Please
use gfc_charlen_type instead (in the frontend, gfc_charlen_type_node).
(Most of the charlen->size_t patch is fixing up places where we're
accidentally using int instead of gfc_charlen_type..).

- Why are you using GFC_INTEGER_1 / GFC_INTEGER_4 to loop over the
arrays rather than char/gfc_char4_t? Not sure if it makes any
difference in practice, but it sure seems confusing..

- Not really related to your patch, but memcmp_char4 sure looks
redundant. Isn't it the same as memcmp(a, b, size*4), in which case we
could use optimized memcmp implementations?



-- 
Janne Blomqvist


Re: [PATCH] [MSP430] [PR78554] Prevent SUBREG from referencing a SYMBOL_REF

2017-11-21 Thread Jeff Law
On 08/24/2017 07:18 AM, Jozef Lawrynowicz wrote:
> As reported in PR78554, attempting to store an __int20 address in memory
> causes an ICE due to an invalid insn. This only occurs at optimisation
> levels higher than -O0 because these optimisation levels pass
> -ftree-ter, which causes the compiler to try and do the store in
> one instruction.
> The issue in the insn is that a SUBREG references a SYMBOL_REF.
> 
> I guess the compiler gets into this situation because it assumes that
> it can execute a move instruction where both src and dst are in memory,
> but this isn't possible with __int20.
GCC doesn't really make this assumption, but it does rely on the
target's movXX expanders to handle generating correct code for a
mem->mem move.

Commonly on RISC targets what you'll see is a movXX expander like this:

(define_expand "movsi"
  [(set (match_operand:SI 0 "nonimmediate_operand")
        (match_operand:SI 1 "general_operand"))]
  ""
{
  /* One of the ops has to be in a register.  */
  if (!register_operand (operand1, SImode)
      && !register_operand (operand0, SImode))
    operands[1] = force_reg (SImode, operand1);



> 
> The attached patch prevents an instance of SUBREG being created where the
> subword is a SYMBOL_REF.
> 
> If the patch is acceptable, I would appreciate if someone could commit
> it for me, as I do not have write access.
I was going to look deeper at this, but can't as the trunk currently
aborts in init_derived_machine_modes when compiling the testcase.
Presumably something about Richard S's work is busted when it comes to
dealing with partial-word stuff.

I'll ping Richard S. to take a looksie.

Jeff


Re: [1/4] Give the target more control over ARRAY_TYPE modes

2017-11-21 Thread Jeff Law
On 11/08/2017 08:12 AM, Richard Sandiford wrote:
> So far we've used integer modes for LD[234] and ST[234] arrays.
> That doesn't scale well to SVE, since the sizes aren't fixed at
> compile time (and even if they were, we wouldn't want integers
> to be so wide).
> 
> This patch lets the target use double-, triple- and quadruple-length
> vectors instead.
> 
> 
> 2017-11-08  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * target.def (array_mode): New target hook.
>   * doc/tm.texi.in (TARGET_ARRAY_MODE): New hook.
>   * doc/tm.texi: Regenerate.
>   * hooks.h (hook_optmode_mode_uhwi_none): Declare.
>   * hooks.c (hook_optmode_mode_uhwi_none): New function.
>   * tree-vect-data-refs.c (vect_lanes_optab_supported_p): Use
>   targetm.array_mode.
>   * stor-layout.c (mode_for_array): Likewise.  Support polynomial
>   type sizes.
> 
Whoops.  I'd started, but not completed review on this one a few days ago.

OK.  I think this covers the target independent bits from the series, right?

jeff


Re: [libstdc++-,doc] Mislocated

2017-11-21 Thread Przemyslaw Wirkus
On 21/11/17 15:27 +, Jonathan Wakely wrote:
>>OK for trunk?

>OK, thanks.

I don't have privileges to commit. Could you please commit it on my behalf?





Re: Add support for in-order addition reduction using SVE FADDA

2017-11-21 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Nov 20, 2017 at 1:54 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Fri, Nov 17, 2017 at 5:53 PM, Richard Sandiford
>>>  wrote:
>>>> This patch adds support for in-order floating-point addition reductions,
>>>> which are suitable even in strict IEEE mode.
>>>>
>>>> Previously vect_is_simple_reduction would reject any cases that forbid
>>>> reassociation.  The idea is instead to tentatively accept them as
>>>> "FOLD_LEFT_REDUCTIONs" and only fail later if there is no target
>>>> support for them.  Although this patch only handles the particular
>>>> case of plus and minus on floating-point types, there's no reason in
>>>> principle why targets couldn't handle other cases.
>>>>
>>>> The vect_force_simple_reduction change makes it simpler for parloops
>>>> to read the type of reduction.
>>>>
>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>> and powerpc64le-linux-gnu.  OK to install?
>>>
>>> I don't like that you add a new tree code for this.  A new IFN looks more
>>> suitable to me.
>>
>> OK.
>
> Thanks.  I'd like to eventually get rid of other vectorizer tree codes as well,
> like the REDUC_*_EXPR, DOT_PROD_EXPR and SAD_EXPR.  IFNs
> are now really the way to go for "target instructions on GIMPLE".

Glad you said that.  I ended up having to convert REDUC_*_EXPRs too,
since it was too ugly trying to support some reductions based on tree
codes and some on internal functions.  (I did try using code_helper,
but even then...)

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Thanks,
Richard

PS. This applies at the same point in the series as the FADDA patch.
I can rejig it to apply onto current trunk if that seems better.


2017-11-21  Richard Sandiford  

gcc/
* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR)
(REDUC_AND_EXPR, REDUC_IOR_EXPR, REDUC_XOR_EXPR): Delete.
* doc/generic.texi (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR)
(REDUC_AND_EXPR, REDUC_IOR_EXPR, REDUC_XOR_EXPR): Delete.
* cfgexpand.c (expand_debug_expr): Remove handling for them.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (const_unop): Likewise.
* optabs-tree.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_gimple_assign_unary): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
(op_code_prio): Likewise.
(op_symbol_code): Likewise.
* internal-fn.def (DEF_INTERNAL_SIGNED_OPTAB_FN): Define.
(IFN_REDUC_PLUS, IFN_REDUC_MAX, IFN_REDUC_MIN, IFN_REDUC_AND)
(IFN_REDUC_IOR, IFN_REDUC_XOR): New internal functions.
* internal-fn.c (direct_internal_fn_optab): New function.
(direct_internal_fn_array, direct_internal_fn_supported_p)
(internal_fn_expanders): Handle DEF_INTERNAL_SIGNED_OPTAB_FN.
* fold-const-call.c (fold_const_reduction): New function.
(fold_const_call): Handle CFN_REDUC_PLUS, CFN_REDUC_MAX, CFN_REDUC_MIN,
CFN_REDUC_AND, CFN_REDUC_IOR and CFN_REDUC_XOR.
* tree-vect-loop.c (reduction_code_for_scalar_code): Rename to...
(reduction_fn_for_scalar_code): ...this and return an internal
function.
(vect_model_reduction_cost): Take an internal_fn rather than
a tree_code.
(vect_create_epilog_for_reduction): Likewise.  Build calls rather
than assignments.
(vectorizable_reduction): Use internal functions rather than tree
codes for the reduction operation.  Update calls to the functions
above.
* config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin):
Use calls to internal functions rather than REDUC tree codes.
* config/aarch64/aarch64-simd.md: Update comment accordingly.

Index: gcc/tree.def
===
--- gcc/tree.def2017-11-21 16:31:28.695326387 +
+++ gcc/tree.def2017-11-21 16:31:49.729927809 +
@@ -1287,21 +1287,6 @@ DEFTREECODE (OMP_CLAUSE, "omp_clause", t
Operand 0: BODY: contains body of the transaction.  */
 DEFTREECODE (TRANSACTION_EXPR, "transaction_expr", tcc_expression, 1)
 
-/* Reduction operations.
-   Operations that take a vector of elements and "reduce" it to a scalar
-   result (e.g. summing the elements of the vector, finding the minimum over
-   the vector elements, etc).
-   Operand 0 is a vector.
-   The expression returns a scalar, with type the same as the elements of the
-   vector, holding the result of the reduction of all elements of the operand.
-   */
-DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_AND_EXPR, "reduc_and_expr", tcc_unary, 1)
-DEFTREECODE (REDUC_IOR_EXPR, "reduc_ior_expr"

Re: [PATCH] Simplify floating point comparisons

2017-11-21 Thread Jeff Law
On 11/15/2017 08:36 AM, Wilco Dijkstra wrote:
> Richard Biener wrote:
>> On Tue, Oct 17, 2017 at 6:28 PM, Wilco Dijkstra  
>> wrote:
> 
>>> +(if (flag_unsafe_math_optimizations)
>>> +  /* Simplify (C / x op 0.0) to x op 0.0 for C > 0.  */
>>> +  (for op (lt le gt ge)
>>> +   neg_op (gt ge lt le)
>>> +    (simplify
>>> +  (op (rdiv REAL_CST@0 @1) real_zerop@2)
>>> +  (switch
>>> +   (if (real_less (&dconst0, TREE_REAL_CST_PTR (@0)))
>>
>> Note that real_less (0., +Inf) is true, so I think you either need to
>> check that C is 'normal' or ! HONOR_INFINITIES.
> 
> Yes, it was missing an explicit check for infinity, now added.
> 
>> There's also the underflow issue; I guess this is what
>> -funsafe-math-optimizations is for.  I think ignoring underflows is
>> dangerous though.
> 
> We could change C / x > 0 to x >= 0 so the underflow case is included.
> However that still means x == 0.0 would behave differently - so the question
> is what exactly does -funsafe-math-optimizations allow?
Well, we largely define what it means.  I believe we have in the past
stated that it can't break SPEC.  It's as good a rule as anything,
though it is underspecified (what version, what other flags, what
targets, etc).

With that bit of background, I believe that -funsafe-math-optimizations
has been allowed to inhibit or introduce underflows (reassociation in
particular I think does this and is enabled by
-funsafe-math-optimizations).  So I don't think ignoring underflow
should inherently block this patch.
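
To make the corner cases in this sub-thread concrete, here is a small
sketch in C.  It assumes IEEE 754 doubles with default (non-trapping)
behaviour; the function names are made up for illustration and are not
part of the patch.  It compares the original form (C / x) > 0.0 with the
simplified form x > 0.0 for an infinite C, an underflowing quotient and
x == 0.0.

```c
#include <math.h>
#include <stdbool.h>

/* Original form of the comparison: (C / x) > 0.0.  The volatile store
   keeps the division from being folded away by the compiler.  */
bool
div_form_gt (double c, double x)
{
  volatile double q = c / x;
  return q > 0.0;
}

/* Simplified form the match.pd pattern would produce for C > 0.  */
bool
simplified_gt (double x)
{
  return x > 0.0;
}
```

For an ordinary case such as C = 2.0, x = 3.0 both forms agree.  They
differ for C = +Inf, x = +Inf (the quotient is a NaN), for C = 1e-300,
x = 1e300 (the quotient underflows to zero) and for x = 0.0 (the
quotient is +Inf, which is greater than zero although x is not).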



> 
> 
>>> + (for cmp (lt le gt ge)
>>> +  neg_cmp (gt ge lt le)
>>> +  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
>>> +  (simplify
>>> +   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
>>> +   (with
>>> +    { tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
>>> +    (if (tem)
>>> + (switch
>>> +  (if (real_less (&dconst0, TREE_REAL_CST_PTR (@1)))
>>> +   (cmp @0 { tem; }))
>>> +  (if (real_less (TREE_REAL_CST_PTR (@1), &dconst0))
>>> +   (neg_cmp @0 { tem; })))
>>
>>
>> Drops possible overflow/underflow in x * C1 and may create underflow
>> or overflow with C2/C1 which you should detect here at least.
> 
> I've added checks for this, however I thought -funsafe-math-optimizations is
> allowed to insert/remove underflow/overflow, like in these cases:
> 
> (x * 1e20f) * 1e20f and (x * 1e40f) * 1e-30f.
Right.  That's my understanding as well.
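
As an illustration of the first case Wilco mentions, the sketch below
(assuming IEEE 754 single precision; the helper names are hypothetical)
shows how folding the two constants together introduces an overflow
that the in-order evaluation does not have.

```c
#include <math.h>

/* (x * 1e20f) * 1e20f evaluated in source order.  Volatile temporaries
   force each intermediate result to be rounded to float.  */
float
mul_in_order (float x)
{
  volatile float t = x * 1e20f;
  volatile float r = t * 1e20f;
  return r;
}

/* The same expression after reassociation to x * (1e20f * 1e20f).
   The combined constant does not fit in a float and becomes +Inf.  */
float
mul_reassociated (float x)
{
  volatile float k = 1e20f;
  volatile float c = k * k;
  volatile float r = x * c;
  return r;
}
```

With x = 1e-10f the in-order version yields a finite value of about
1e30, whereas the reassociated version yields +Inf.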


> 
>> Existing overflows may be guarded against with a HONOR_INFINITIES check.
> 
> Not sure what you mean with this?
> 
>> When overflow/underflow can be disregarded is there any reason remaining to
>> make this guarded by flag_unsafe_math_optimizations?  Are there any cases
>> where rounding issues can flip the comparison result?
> 
> I think it needs to remain under -funsafe-math-optimizations. Here is the
> updated version:
Richi has taken the lead on this review and should probably own it.
Just thought it was worth chiming in on the underflow issue.

jeff


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-21 Thread Kilian Verhetsel

> This is PR81179 I think, please mention that in the changelog.

Correct, my bad for missing that.

> This unconditionally pessimizes code even if there is no valid index
> zero, right?

Almost, since for a loop such as:

  #define OFFSET 1
  unsigned int find(const unsigned int *a, unsigned int v) {
    unsigned int result = 120;
    for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
      if (a[i-OFFSET] == v) result = i;
    }
    return result;
  }

the index i will match the contents of the index vector used here ---
but this does indeed pessimize the code generated for, say, OFFSET
= 2. It is probably more sensible to use the existing code path in those
situations.
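
One way to picture the index-0 problem is the scalar model below.  It is
only a sketch under the assumption that the reduction is tracked as a
running maximum of matching indices, with 0 doubling as "no match"; it
is not the vectoriser's actual code, and the function names are invented
for the example.

```c
/* Reference semantics: the last i with a[i] == v, or dflt if none.  */
unsigned int
find_last_reference (const unsigned int *a, unsigned int n,
                     unsigned int v, unsigned int dflt)
{
  unsigned int result = dflt;
  for (unsigned int i = 0; i < n; i++)
    if (a[i] == v)
      result = i;
  return result;
}

/* Hypothetical model of a max-index reduction in which 0 means
   "nothing matched".  A genuine match at i == 0 cannot be told apart
   from no match, so the default value is returned instead.  */
unsigned int
find_last_model (const unsigned int *a, unsigned int n,
                 unsigned int v, unsigned int dflt)
{
  unsigned int max_idx = 0;
  for (unsigned int i = 0; i < n; i++)
    if (a[i] == v && i > max_idx)
      max_idx = i;
  return max_idx == 0 ? dflt : max_idx;
}
```

When the only match is at index 0 the reference version returns 0 and
the model returns the default; for matches at any later index the two
agree, which is why starting the loop at OFFSET = 1 hides the problem.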

> The issue with the COND_REDUCTION index vector is overflow IIRC.

Does that mean such overflows can already manifest themselves for
regular COND_REDUCTION? I had assumed sufficient checks were already in
place because of the presence of the is_nonwrapping_integer_induction
test.


Re: Add support for in-order addition reduction using SVE FADDA

2017-11-21 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Nov 20, 2017 at 1:54 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Fri, Nov 17, 2017 at 5:53 PM, Richard Sandiford
>>>  wrote:
>>>> This patch adds support for in-order floating-point addition reductions,
>>>> which are suitable even in strict IEEE mode.
>>>>
>>>> Previously vect_is_simple_reduction would reject any cases that forbid
>>>> reassociation.  The idea is instead to tentatively accept them as
>>>> "FOLD_LEFT_REDUCTIONs" and only fail later if there is no target
>>>> support for them.  Although this patch only handles the particular
>>>> case of plus and minus on floating-point types, there's no reason in
>>>> principle why targets couldn't handle other cases.
>>>>
>>>> The vect_force_simple_reduction change makes it simpler for parloops
>>>> to read the type of reduction.
>>>>
>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>> and powerpc64le-linux-gnu.  OK to install?
>>>
>>> I don't like that you add a new tree code for this.  A new IFN looks more
>>> suitable to me.
>>
>> OK.
>
> Thanks.  I'd like to eventually get rid of other vectorizer tree codes as well,
> like the REDUC_*_EXPR, DOT_PROD_EXPR and SAD_EXPR.  IFNs
> are now really the way to go for "target instructions on GIMPLE".
>
>>> Also I think if there's a way to handle this correctly with target support
>>> you can also implement a fallback if there is no such support increasing
>>> test coverage.  It would basically boil down to extracting all scalars from
>>> the non-reduction operand vector and performing a series of reduction
>>> ops, keeping the reduction PHI scalar.  This would also support any
>>> reduction operator.
>>
>> Yeah, but without target support, that's probably going to be expensive.
>> It's a bit like how we can implement element-by-element loads and stores
>> for cases that don't have target support, but had to explicitly disable
>> that in many cases, since the cost model was too optimistic.
>
> I expect that for V2DF or even V4DF it might be profitable in quite a number
> of cases.  V2DF definitely.
>
>> I can give it a go anyway if you think it's worth it.
>
> I think it is.

OK, here's 2/3.  It just splits out some code for reuse in 3/3.

Tested as before.
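
For readers following along, the difference between an in-order (fold
left) reduction and a reassociated one can be sketched in plain C.  This
is a scalar model with made-up helper names, assuming IEEE 754 single
precision; it is not the code the patch generates.

```c
#include <stddef.h>

/* Strict left-to-right sum, i.e. ((0 + a[0]) + a[1]) + ...  This is the
   order a fold-left reduction has to preserve.  Volatile accumulators
   force rounding to float after every addition.  */
float
sum_in_order (const float *a, size_t n)
{
  volatile float acc = 0.0f;
  for (size_t i = 0; i < n; i++)
    acc = acc + a[i];
  return acc;
}

/* Reassociated sum modelling a two-lane vector accumulator: even and
   odd elements are summed separately and combined at the end.  */
float
sum_two_lanes (const float *a, size_t n)
{
  volatile float lane0 = 0.0f, lane1 = 0.0f;
  size_t i;
  for (i = 0; i + 1 < n; i += 2)
    {
      lane0 = lane0 + a[i];
      lane1 = lane1 + a[i + 1];
    }
  if (i < n)
    lane0 = lane0 + a[i];
  volatile float total = lane0 + lane1;
  return total;
}
```

For the input { 1e8f, 1.0f, -1e8f, 1.0f } the in-order sum is 1.0f (the
first 1.0f is absorbed by 1e8f) while the two-lane sum is 2.0f, so the
two orders are observably different without -ffast-math.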

Thanks,
Richard


2017-11-21  Richard Sandiford  

gcc/
* tree-vect-loop.c (vect_extract_elements, vect_expand_fold_left): New
functions, split out from...
(vect_create_epilog_for_reduction): ...here.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2017-11-21 16:31:49.728927972 +
+++ gcc/tree-vect-loop.c2017-11-21 16:43:13.061221251 +
@@ -4566,6 +4566,65 @@ get_initial_defs_for_reduction (slp_tree
 }
 }
 
+/* Extract all the elements of VECTOR into SCALAR_RESULTS, inserting
+   the extraction statements before GSI.  Associate the new scalar
+   SSA names with variable SCALAR_DEST.  */
+
+static void
+vect_extract_elements (gimple_stmt_iterator *gsi, vec<tree> *scalar_results,
+  tree scalar_dest, tree vector)
+{
+  tree vectype = TREE_TYPE (vector);
+  tree scalar_type = TREE_TYPE (vectype);
+  tree bitsize = TYPE_SIZE (scalar_type);
+  unsigned HOST_WIDE_INT vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
+  unsigned HOST_WIDE_INT element_bitsize = tree_to_uhwi (bitsize);
+
+  for (unsigned HOST_WIDE_INT bit_offset = 0;
+   bit_offset < vec_size_in_bits;
+   bit_offset += element_bitsize)
+{
+  tree bitpos = bitsize_int (bit_offset);
+  tree rhs = build3 (BIT_FIELD_REF, scalar_type, vector, bitsize, bitpos);
+
+  gassign *stmt = gimple_build_assign (scalar_dest, rhs);
+  tree new_name = make_ssa_name (scalar_dest, stmt);
+  gimple_assign_set_lhs (stmt, new_name);
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+
+  scalar_results->safe_push (new_name);
+}
+}
+
+/* Successively apply CODE to each element of VECTOR_RHS, in left-to-right
+   order.  Start with LHS if LHS is nonnull, otherwise start with the first
+   element of VECTOR_RHS.  Insert the extraction statements before GSI and
+   associate the new scalar SSA names with variable SCALAR_DEST.
+   Return the SSA name for the result.  */
+
+static tree
+vect_expand_fold_left (gimple_stmt_iterator *gsi, tree scalar_dest,
+  tree_code code, tree lhs, tree vector_rhs)
+{
+  auto_vec<tree> scalar_results;
+  vect_extract_elements (gsi, &scalar_results, scalar_dest, vector_rhs);
+  tree rhs;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (scalar_results, i, rhs)
+{
+  if (lhs == NULL_TREE)
+   lhs = rhs;
+  else
+   {
+ gassign *stmt = gimple_build_assign (scalar_dest, code, lhs, rhs);
+ tree new_name = make_ssa_name (scalar_dest, stmt);
+ gimple_assign_set_lhs (stmt, new_name);
+ gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+ lhs = new_name;
+   }
+}
+  return 

Re: [DWARF] mark partial fn versions and OMP frags as partial in dwarf2+ debug info

2017-11-21 Thread Jeff Law
On 11/15/2017 12:05 AM, Alexandre Oliva wrote:
> debug info: partial noentry functions: infra
> 
> This is the first patch of a set that addresses two different but
> somewhat related issues.
> 
> On the one hand, after partial inlining, the non-inlined function
> fragment is output in a way that debug info consumers can't distinguish
> from the entire function: debug info lists the entire function as
> abstract origin for the fragment, but nothing that indicates the
> fragment does not stand for the entire function.  So, if a debugger is
> asked to set a breakpoint at the entry point of the function, it might
> very well set one at the entry point of the fragment, which is likely
> not where users expect to stop.
> 
> On the other hand, OpenMP blocks are split out into artificial functions
> that do not indicate their executable code is part of another function.
> The artificial functions are nested within the original function, but
> that's hardly enough: ideally, debug info consumers should be able to
> tell that, if they stop within one of these functions, they're
> abstractly within the original function.
> 
> This patch introduces a new DWARF attribute to indicate that a function
> is a partial copy of its abstract origin, specifically, that its entry
> point does not correspond to the entry point of the abstract origin.
> This attribute can then be used to mark the out-of-line portion of
> partial inlines, and OpenMP blocks split out into artificial functions.
> 
> 
> This patchset was regstrapped on x86_64-linux-gnu and i686-linux-gnu.
> 
> Ok to install the first patch? (infrastructure)
> 
> Ok to install the second patch? (function versioning)
> 
> Ok to install the third patch? (OpenMP fragments)
These look generally OK to me, but I'd like Jakub to chime in -- he's
got some state on the issues around OMP debugging and how it ought to be
structured.

Jakub, care to chime in?

jeff


Re: Add support for in-order addition reduction using SVE FADDA

2017-11-21 Thread Jeff Law
On 11/21/2017 09:45 AM, Richard Sandiford wrote:
> Richard Biener  writes:
>> On Mon, Nov 20, 2017 at 1:54 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
>>>> On Fri, Nov 17, 2017 at 5:53 PM, Richard Sandiford
>>>>  wrote:
>>>>> This patch adds support for in-order floating-point addition reductions,
>>>>> which are suitable even in strict IEEE mode.
>>>>>
>>>>> Previously vect_is_simple_reduction would reject any cases that forbid
>>>>> reassociation.  The idea is instead to tentatively accept them as
>>>>> "FOLD_LEFT_REDUCTIONs" and only fail later if there is no target
>>>>> support for them.  Although this patch only handles the particular
>>>>> case of plus and minus on floating-point types, there's no reason in
>>>>> principle why targets couldn't handle other cases.
>>>>>
>>>>> The vect_force_simple_reduction change makes it simpler for parloops
>>>>> to read the type of reduction.
>>>>>
>>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>>> and powerpc64le-linux-gnu.  OK to install?
>>>>
>>>> I don't like that you add a new tree code for this.  A new IFN looks more
>>>> suitable to me.
>>>
>>> OK.
>>
>> Thanks.  I'd like to eventually get rid of other vectorizer tree codes as well,
>> like the REDUC_*_EXPR, DOT_PROD_EXPR and SAD_EXPR.  IFNs
>> are now really the way to go for "target instructions on GIMPLE".
>>
>>>> Also I think if there's a way to handle this correctly with target support
>>>> you can also implement a fallback if there is no such support increasing
>>>> test coverage.  It would basically boil down to extracting all scalars from
>>>> the non-reduction operand vector and performing a series of reduction
>>>> ops, keeping the reduction PHI scalar.  This would also support any
>>>> reduction operator.
>>>
>>> Yeah, but without target support, that's probably going to be expensive.
>>> It's a bit like how we can implement element-by-element loads and stores
>>> for cases that don't have target support, but had to explicitly disable
>>> that in many cases, since the cost model was too optimistic.
>>
>> I expect that for V2DF or even V4DF it might be profitable in quite a number
>> of cases.  V2DF definitely.
>>
>>> I can give it a go anyway if you think it's worth it.
>>
>> I think it is.
> 
> OK, here's 2/3.  It just splits out some code for reuse in 3/3.
[ ... ]
Is this going to obsolete any of the stuff posted to date?  I'm thinking
specifically about "Add support for bitwise reductions", but perhaps
there are others.

Jeff


Re: [PATCH] handle non-constant offsets in -Wstringop-overflow (PR 77608)

2017-11-21 Thread Jeff Law
On 11/19/2017 04:28 PM, Martin Sebor wrote:
> On 11/18/2017 12:53 AM, Jeff Law wrote:
>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>> The attached patch enhances -Wstringop-overflow to detect more
>>> instances of buffer overflow at compile time by handling non-
>>> constant offsets into the destination object that are known to
>>> be in some range.  The solution could be improved by handling
>>> even more cases (e.g., anti-ranges or offsets relative to
>>> pointers beyond the beginning of an object) but it's a start.
>>>
>>> In addition to bootstrapping/regtesting GCC, also tested with
>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>>> regressions.
>>>
>>> Martin
>>>
>>> The top of GDB fails to compile at the moment so the validation
>>> there was incomplete.
>>>
>>> gcc-77608.diff
>>>
>>>
>>> PR middle-end/77608 - missing protection on trivially detectable
>>> runtime buffer overflow
>>>
>>> gcc/ChangeLog:
>>>
>>> PR middle-end/77608
>>> * builtins.c (compute_objsize): Handle non-constant offsets.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR middle-end/77608
>>> * gcc.dg/Wstringop-overflow.c: New test.
>> The recursive call into compute_objsize passing in the ostype avoids
>> having to think about the whole object vs nearest containing object
>> issues.  Right?
>>
>> What's left to worry about is maximum or minimum remaining bytes in the
>> object.  At least that's my understanding of how ostype works here.
>>
>> So we get the amount remaining, ignoring the variable offset, from the
>> recursive call (SIZE).  The space left after we account for the variable
>> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
>> return SIZE-MIN (which you do) and for type 2/3 you have to return
>> SIZE-MAX which I think you get wrong (and you have to account for the
>> possibility that MAX or MIN is greater than SIZE and thus there's
>> nothing left).
> 
> Subtracting the upper bound of the offset from the size instead
> of the lower bound when the caller is asking for the minimum
> object size would make the result essentially meaningless in
> common cases where the offset is smaller than size_t, as in:
> 
>   char a[7];
> 
>   void f (const char *s, unsigned i)
>   {
>     __builtin_strcpy (a + i, s);
>   }
> 
> Here, i's range is [0, UINT_MAX].
> 
> IMO, it's only useful to use the lower bound here, otherwise
> the result would only rarely be non-zero.
But when we're asking for the minimum left, aren't we essentially asking
for "how much space am I absolutely sure I can write"?  And if that is
the question, then the only conservatively correct answer is to subtract
the high bound.
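
The interval arithmetic under discussion can be written down as a small
sketch.  The struct and function names here are invented for
illustration and do not correspond to compute_objsize itself; the only
thing taken from the thread is that the space left after a variable
offset in [MIN, MAX] is [SIZE - MAX, SIZE - MIN], clamped at zero.

```c
#include <stdint.h>

/* Bounds on the bytes remaining after a variable offset.  */
struct space_range
{
  uint64_t least;  /* space guaranteed to remain: SIZE - MAX */
  uint64_t most;   /* space that may remain:      SIZE - MIN */
};

/* Compute [SIZE - MAX, SIZE - MIN], treating an offset at or past the
   end of the object as leaving nothing.  */
struct space_range
space_after_offset (uint64_t size, uint64_t off_min, uint64_t off_max)
{
  struct space_range r;
  r.least = off_max >= size ? 0 : size - off_max;
  r.most = off_min >= size ? 0 : size - off_min;
  return r;
}
```

For Martin's example, char a[7] with an offset in [0, UINT_MAX], this
gives least = 0 and most = 7, which is exactly the tension above: the
conservative lower bound is almost always zero, while the upper bound
is the value that makes the warning useful.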


> 
> This is also what other warnings that deal with ranges do.  For
> example, -Warray-bounds only considers the lower bound (unless it's
> negative) when deciding whether or not to warn for
> 
>   int g (unsigned i)
>   {
>     return a[i];
>   }
>
> It would be too noisy to be of practical use otherwise (even at
> level 2).
Which argues that:

1. Our ranges suck
2. We're currently avoiding dealing with that by giving answers that are
not conservatively correct.

Right?


Jeff


Re: [RFTesting] New POINTER_DIFF_EXPR

2017-11-21 Thread Jeff Law
On 11/19/2017 04:54 PM, Marc Glisse wrote:
> Hello,
> 
> new version, regtested on powerpc64le-unknown-linux-gnu. The front-end
> parts are up for review.
> 
> 2017-10-28  Marc Glisse  
> 
> gcc/c/
> * c-fold.c (c_fully_fold_internal): Handle POINTER_DIFF_EXPR.
> * c-typeck.c (pointer_diff): Use POINTER_DIFF_EXPR.
> 
> gcc/c-family/
> * c-pretty-print.c (pp_c_additive_expression,
> c_pretty_printer::expression): Handle POINTER_DIFF_EXPR.
> 
> gcc/cp/
> * constexpr.c (cxx_eval_constant_expression,
> potential_constant_expression_1): Handle POINTER_DIFF_EXPR.
> * cp-gimplify.c (cp_fold): Likewise.
> * error.c (dump_expr): Likewise.
> * typeck.c (pointer_diff): Use POINTER_DIFF_EXPR.
> 
> gcc/
> * doc/generic.texi: Document POINTER_DIFF_EXPR, update
> POINTER_PLUS_EXPR.
> * cfgexpand.c (expand_debug_expr): Handle POINTER_DIFF_EXPR.
> * expr.c (expand_expr_real_2): Likewise.
> * fold-const.c (const_binop, fold_addr_of_array_ref_difference,
> fold_binary_loc): Likewise.
> * match.pd (X-X, P+(Q-P), &D-P, (P+N)-P, P-(P+N), (P+M)-(P+N),
> P-Q==0, -(A-B), X-Z (A-B)+(C-A)): New transformations for POINTER_DIFF_EXPR, based on
> MINUS_EXPR transformations.
> * optabs-tree.c (optab_for_tree_code): Handle POINTER_DIFF_EXPR.
> * tree-cfg.c (verify_expr, verify_gimple_assign_binary): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * tree-pretty-print.c (dump_generic_node, op_code_prio,
> op_symbol_code): Likewise.
> * tree-vect-stmts.c (vectorizable_operation): Likewise.
> * vr-values.c (extract_range_from_binary_expr): Likewise.
> * varasm.c (initializer_constant_valid_p_1): Likewise.
> * tree.def: New tree code POINTER_DIFF_EXPR.
> 
The front-end bits seem very reasonable to me.  If the rest is ACK'd
then you should consider the full patch ack'd.
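
As a reminder of what the new match.pd entries are about, here are two
of the listed source-level shapes, (P+N)-P and (P+M)-(P+N), written as
C functions.  The function names are hypothetical, and the statement
that these reduce to N and M - N is the ordinary pointer-arithmetic
identity rather than something verified against the patch.

```c
#include <stddef.h>

/* (P + N) - P: the pointer difference is expected to simplify to N.  */
ptrdiff_t
diff_after_add (const int *p, ptrdiff_t n)
{
  return (p + n) - p;
}

/* (P + M) - (P + N): expected to simplify to M - N.  */
ptrdiff_t
diff_of_two_adds (const int *p, ptrdiff_t m, ptrdiff_t n)
{
  return (p + m) - (p + n);
}
```

Both pointers must stay within (or one past the end of) the same array
for the subtraction to be defined, which is what lets the result be
expressed in terms of the element counts alone.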

jeff


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-21 Thread Alan Hayward

> On 21 Nov 2017, at 16:43, Kilian Verhetsel  
> wrote:
> 
> 
>> This is PR81179 I think, please mention that in the changelog.
> 
> Correct, my bad for missing that.
> 
>> This unconditionally pessimizes code even if there is no valid index
>> zero, right?
> 
> Almost, since for a loop such as:
> 
>  #define OFFSET 1
>  unsigned int find(const unsigned int *a, unsigned int v) {
>    unsigned int result = 120;
>    for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>      if (a[i-OFFSET] == v) result = i;
>    }
>    return result;
>  }
> 
> the index i will match the contents of the index vector used here ---
> but this does indeed pessimize the code generated for, say, OFFSET
> = 2. It is probably more sensible to use the existing code path in those
> situations.
> 

Looking at the final asm on aarch64 for -14.c, the code has only grown
by a single instruction in the epilogue, which is good, given that the
vector pass dump for this patch is quite a bit longer.

In the vector dump, a vector of 0's
_57 = { 0, 0, 0, 0 };
is created which is then never used (and later gets optimised away).
It would be nice if that could avoid getting created.
I've not had a chance to scrutinise the patch yet to see where that is created.


>> The issue with the COND_REDUCTION index vector is overflow IIRC.
> 
> Does that mean such overflows can already manifest themselves for
> regular COND_REDUCTION? I had assumed sufficient checks were already in
> place because of the presence of the is_nonwrapping_integer_induction
> test.

As an aside, it’s possibly worth mentioning that this issue will go away for
aarch64-sve when Richard Sandiford’s SVE patch goes in, as that’ll support
CLASTB reduction.



Re: [PATCH] make canonicalize_condition keep its promise

2017-11-21 Thread Jeff Law
On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
>> On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
>>> So, the story of this very small patch starts with me adding patterns
>>> for ppc instructions bdz[tf] and bdnz[tf] such as this:
>>>
>>>   [(set (pc)
>>> (if_then_else
>>>   (and
>>>  (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
>>>  (const_int 1))
>>>  (match_operator 3 "branch_comparison_operator"
>>>   [(match_operand 4 "cc_reg_operand" "y,y,y,y")
>>>    (const_int 0)]))
>>>   (label_ref (match_operand 0))
>>>   (pc)))
>>>    (set (match_operand:P 2 "nonimmediate_operand"
>>> "=1,*r,m,*d*wi*c*l")
>>> (plus:P (match_dup 1)
>>> (const_int -1)))
>>>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>>>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>>>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
>>>
>>> However when this gets to the loop_doloop pass, we get an assert fail
>>> in iv_number_of_iterations():
>>>
>>>   gcc_assert (COMPARISON_P (condition));
>>>
>>> This is happening because this branch insn tests two things ANDed
>>> together so the and is at the top of the expression, not a comparison.
>>
>> Is this something you've created for an existing loop?  Presumably an
>> existing loop that previously was a simple loop?
> 
> The rtl to use this instruction is generated by new code I'm working on
> to do a builtin expansion of memcmp using a loop. I call
> gen_bdnztf_di() to generate rtl for the insn. It would be nice to be
> able to generate this instruction from doloop conversion but that is
> beyond the scope of what I'm working on presently.
Understood.

So what I think (and I'm hoping you can confirm one way or the other) is
that by generating this instruction you're taking a loop which
previously was considered a simple loop by the IV code and turning it
into something the IV bits no longer think is a simple loop.

I think that's problematical as when the loop is thought to be a simple
loop, it has to have a small number of forms for its loop back/exit loop
tests and whether or not a loop is a simple loop is cached in the loop
structure.

I think we need to dig into that first.  If my suspicion is correct then
this patch is really just papering over that deeper problem.  So I think
you need to dig a bit deeper into why you're getting into the code in
question (canonicalize_condition) and whether or not the call chain
makes any sense given the changes you've made to the loop.


Jeff


Re: Add support for in-order addition reduction using SVE FADDA

2017-11-21 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Nov 20, 2017 at 1:54 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Fri, Nov 17, 2017 at 5:53 PM, Richard Sandiford
>>>  wrote:
>>>> This patch adds support for in-order floating-point addition reductions,
>>>> which are suitable even in strict IEEE mode.
>>>>
>>>> Previously vect_is_simple_reduction would reject any cases that forbid
>>>> reassociation.  The idea is instead to tentatively accept them as
>>>> "FOLD_LEFT_REDUCTIONs" and only fail later if there is no target
>>>> support for them.  Although this patch only handles the particular
>>>> case of plus and minus on floating-point types, there's no reason in
>>>> principle why targets couldn't handle other cases.
>>>>
>>>> The vect_force_simple_reduction change makes it simpler for parloops
>>>> to read the type of reduction.
>>>>
>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>> and powerpc64le-linux-gnu.  OK to install?
>>>
>>> I don't like that you add a new tree code for this.  A new IFN looks more
>>> suitable to me.
>>
>> OK.
>
> Thanks.  I'd like to eventually get rid of other vectorizer tree codes as well,
> like the REDUC_*_EXPR, DOT_PROD_EXPR and SAD_EXPR.  IFNs
> are now really the way to go for "target instructions on GIMPLE".
>
>>> Also I think if there's a way to handle this correctly with target support
>>> you can also implement a fallback if there is no such support increasing
>>> test coverage.  It would basically boil down to extracting all scalars from
>>> the non-reduction operand vector and performing a series of reduction
>>> ops, keeping the reduction PHI scalar.  This would also support any
>>> reduction operator.
>>
>> Yeah, but without target support, that's probably going to be expensive.
>> It's a bit like how we can implement element-by-element loads and stores
>> for cases that don't have target support, but had to explicitly disable
>> that in many cases, since the cost model was too optimistic.
>
> I expect that for V2DF or even V4DF it might be profitable in quite a number
> of cases.  V2DF definitely.
>
>> I can give it a go anyway if you think it's worth it.
>
> I think it is.

OK, done in the patch below.  Tested as before.

Thanks,
Richard


2017-11-21  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* optabs.def (fold_left_plus_optab): New optab.
* doc/md.texi (fold_left_plus_@var{m}): Document.
* internal-fn.def (IFN_FOLD_LEFT_PLUS): New internal function.
* internal-fn.c (fold_left_direct): Define.
(expand_fold_left_optab_fn): Likewise.
(direct_fold_left_optab_supported_p): Likewise.
* fold-const-call.c (fold_const_fold_left): New function.
(fold_const_call): Use it to fold CFN_FOLD_LEFT_PLUS.
* tree-parloops.c (valid_reduction_p): New function.
(gather_scalar_reductions): Use it.
* tree-vectorizer.h (FOLD_LEFT_REDUCTION): New vect_reduction_type.
(vect_finish_replace_stmt): Declare.
* tree-vect-loop.c (fold_left_reduction_code): New function.
(needs_fold_left_reduction_p): New function, split out from...
(vect_is_simple_reduction): ...here.  Accept reductions that
forbid reassociation, but give them type FOLD_LEFT_REDUCTION.
(vect_force_simple_reduction): Also store the reduction type in
the assignment's STMT_VINFO_REDUC_TYPE.
(vect_model_reduction_cost): Handle FOLD_LEFT_REDUCTION.
(merge_with_identity): New function.
(vectorize_fold_left_reduction): Likewise.
(vectorizable_reduction): Handle FOLD_LEFT_REDUCTION.  Leave the
scalar phi in place for it.  Check for target support and reject
cases that would reassociate the operation.  Defer the transform
phase to vectorize_fold_left_reduction.
* config/aarch64/aarch64.md (UNSPEC_FADDA): New unspec.
* config/aarch64/aarch64-sve.md (fold_left_plus_<mode>): New expander.
(*fold_left_plus_<mode>, *pred_fold_left_plus_<mode>): New insns.

gcc/testsuite/
* gcc.dg/vect/no-fast-math-vect16.c: Expect the test to pass and
check for a message about using in-order reductions.
* gcc.dg/vect/pr79920.c: Expect both loops to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/trapv-vect-reduc-4.c: Expect all three loops to be
vectorized and check for a message about using in-order reductions.
* gcc.dg/vect/vect-reduc-6.c: Expect the loop to be vectorized and
check for a message about using in-order reductions.
* gcc.dg/vect/vect-reduc-in-order-1.c: New test.
* gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
* gcc.target/aarch64/sve_reduc_strict_1.c: New test.
* gcc.target/aarch64/sve_reduc_strict_1_run.c: Likewise.
   

Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-21 Thread Jeff Law
On 11/21/2017 04:14 AM, Dominik Inführ wrote:
> Hi,
> 
> this patch tries to extend tree-ssa-dce.c to remove unnecessary 
> new/delete-pairs (it already does that for malloc/free). Clang does it too 
> and it seems to be allowed by 
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html. I’ve 
> bootstrapped/regtested on aarch64-linux and x86_64-linux.
Just a note, we've transitioned into stage3 in preparation for the
upcoming gcc-8 release in the spring.  During stage3 we're addressing
bugfixes, not further enhancements (with the exception of enhancements
that were posted prior to stage1 close).

So it's unlikely anyone will dig into this right now, unless there's an
existing bugzilla around this missed optimization.

Just wanted to let you know where things stood so you don't interpret
silence as "we don't care".

jeff


Re: Add support for in-order addition reduction using SVE FADDA

2017-11-21 Thread Richard Sandiford
Jeff Law  writes:
> On 11/21/2017 09:45 AM, Richard Sandiford wrote:
>> Richard Biener  writes:
>>> On Mon, Nov 20, 2017 at 1:54 PM, Richard Sandiford
>>>  wrote:
>>>> Richard Biener  writes:
>>>>> On Fri, Nov 17, 2017 at 5:53 PM, Richard Sandiford
>>>>>  wrote:
>>>>>> This patch adds support for in-order floating-point addition reductions,
>>>>>> which are suitable even in strict IEEE mode.
>>>>>>
>>>>>> Previously vect_is_simple_reduction would reject any cases that forbid
>>>>>> reassociation.  The idea is instead to tentatively accept them as
>>>>>> "FOLD_LEFT_REDUCTIONs" and only fail later if there is no target
>>>>>> support for them.  Although this patch only handles the particular
>>>>>> case of plus and minus on floating-point types, there's no reason in
>>>>>> principle why targets couldn't handle other cases.
>>>>>>
>>>>>> The vect_force_simple_reduction change makes it simpler for parloops
>>>>>> to read the type of reduction.
>>>>>>
>>>>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>>>>> and powerpc64le-linux-gnu.  OK to install?
>>>>>
>>>>> I don't like that you add a new tree code for this.  A new IFN looks more
>>>>> suitable to me.
>>>>
>>>> OK.
>>>
>>> Thanks.  I'd like to eventually get rid of other vectorizer tree
>>> codes as well,
>>> like the REDUC_*_EXPR, DOT_PROD_EXPR and SAD_EXPR.  IFNs
>>> are now really the way to go for "target instructions on GIMPLE".
>>>
>>>>> Also I think if there's a way to handle this correctly with target support
>>>>> you can also implement a fallback if there is no such support increasing
>>>>> test coverage.  It would basically boil down to extracting all scalars from
>>>>> the non-reduction operand vector and performing a series of reduction
>>>>> ops, keeping the reduction PHI scalar.  This would also support any
>>>>> reduction operator.
>>>>
>>>> Yeah, but without target support, that's probably going to be expensive.
>>>> It's a bit like how we can implement element-by-element loads and stores
>>>> for cases that don't have target support, but had to explicitly disable
>>>> that in many cases, since the cost model was too optimistic.
>>>
>>> I expect that for V2DF or even V4DF it might be profitable in quite a number
>>> of cases.  V2DF definitely.
>>>
>>>> I can give it a go anyway if you think it's worth it.
>>>
>>> I think it is.
>> 
>> OK, here's 2/3.  It just splits out some code for reuse in 3/3.
> [ ... ]
> Is this going to obsolete any of the stuff posted to date?  I'm thinking
> specifically about "Add support for bitwise reductions", but perhaps
> there are others.

It means that those codes go in and then come out again, yeah, although
the end result is the same.

OK, I'll redo it so that this goes first and then repost a new bitwise
patch too.

Thanks,
Richard
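
For readers following the thread, here is a minimal C sketch of the fallback Richard Biener describes: load a chunk of elements, then add each lane to a scalar accumulator in order. The function names and the LANES constant are hypothetical and are not taken from the patch; the sketch only illustrates why an in-order (fold-left) reduction matches the scalar loop while a reassociated one may not.

```c
#include <stddef.h>

/* Hypothetical vector width for this sketch (e.g. four floats per vector). */
#define LANES 4

/* Strict left-to-right sum, as the original scalar loop computes it. */
float sum_scalar(const float *a, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Fold-left reduction: the accumulator stays scalar and every lane of each
   chunk is added to it in element order, so the rounding sequence is the
   same as in sum_scalar.  */
float sum_fold_left(const float *a, size_t n)
{
    float acc = 0.0f;
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        float vec[LANES];
        for (int l = 0; l < LANES; l++)   /* stands in for a vector load */
            vec[l] = a[i + l];
        for (int l = 0; l < LANES; l++)   /* extract lanes, add in order */
            acc += vec[l];
    }
    for (; i < n; i++)                    /* scalar tail */
        acc += a[i];
    return acc;
}

/* Reassociated reduction: one partial sum per lane, combined at the end.
   This changes the order of the additions, so in strict IEEE mode the
   result may differ from sum_scalar.  */
float sum_reassoc(const float *a, size_t n)
{
    float part[LANES] = { 0.0f };
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; l++)
            part[l] += a[i + l];
    float acc = 0.0f;
    for (int l = 0; l < LANES; l++)
        acc += part[l];
    for (; i < n; i++)
        acc += a[i];
    return acc;
}
```

With inputs that mix very large and very small magnitudes, sum_fold_left should agree with sum_scalar, whereas sum_reassoc can round differently; that difference is the reason reassociating reductions are rejected without -ffast-math.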



Re: [PATCH][RFC] Add quotes for constexpr keyword.

2017-11-21 Thread Jeff Law
On 11/21/2017 08:00 AM, Martin Liška wrote:
> Hi.
> 
> I'm sending v2 of the patch where I fixed test-suite fallout.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> 
> 0001-Add-quotes-for-constexpr-keyword.patch
> 
> 
> From 3195b1b71c387b1359c90f6e752e1c312120cd69 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Wed, 15 Nov 2017 08:41:12 +0100
> Subject: [PATCH] Add quotes for constexpr keyword.
> 
> gcc/cp/ChangeLog:
> 
> 2017-11-15  Martin Liska  
> 
>   * class.c (finalize_literal_type_property): Add quotes for
>   constexpr keyword.
>   (explain_non_literal_class): Likewise.
>   * constexpr.c (ensure_literal_type_for_constexpr_object): Likewise.
>   (is_valid_constexpr_fn): Likewise.
>   (check_constexpr_ctor_body): Likewise.
>   (register_constexpr_fundef): Likewise.
>   (explain_invalid_constexpr_fn): Likewise.
>   (cxx_eval_builtin_function_call): Likewise.
>   (cxx_eval_call_expression): Likewise.
>   (cxx_eval_loop_expr): Likewise.
>   (potential_constant_expression_1): Likewise.
>   * decl.c (check_previous_goto_1): Likewise.
>   (check_goto): Likewise.
>   (grokfndecl): Likewise.
>   (grokdeclarator): Likewise.
>   * error.c (maybe_print_constexpr_context): Likewise.
>   * method.c (process_subob_fn): Likewise.
>   (defaulted_late_check): Likewise.
>   * parser.c (cp_parser_compound_statement): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-11-16  Martin Liska  
> 
>   * g++.dg/cpp0x/constexpr-48089.C: Add quotes for constexpr
>   keyword; add dg-message for 'in .constexpr. expansion of '.
>   * g++.dg/cpp0x/constexpr-50060.C: Likewise.
>   * g++.dg/cpp0x/constexpr-60049.C: Likewise.
>   * g++.dg/cpp0x/constexpr-70323.C: Likewise.
>   * g++.dg/cpp0x/constexpr-70323a.C: Likewise.
>   * g++.dg/cpp0x/constexpr-cast.C: Likewise.
>   * g++.dg/cpp0x/constexpr-diag3.C: Likewise.
>   * g++.dg/cpp0x/constexpr-ex1.C: Likewise.
>   * g++.dg/cpp0x/constexpr-generated1.C: Likewise.
>   * g++.dg/cpp0x/constexpr-ice16.C: Likewise.
>   * g++.dg/cpp0x/constexpr-ice5.C: Likewise.
>   * g++.dg/cpp0x/constexpr-incomplete2.C: Likewise.
>   * g++.dg/cpp0x/constexpr-neg1.C: Likewise.
>   * g++.dg/cpp0x/constexpr-recursion.C: Likewise.
>   * g++.dg/cpp0x/constexpr-shift1.C: Likewise.
>   * g++.dg/cpp1y/constexpr-70265-1.C: Likewise.
>   * g++.dg/cpp1y/constexpr-70265-2.C: Likewise.
>   * g++.dg/cpp1y/constexpr-79655.C: Likewise.
>   * g++.dg/cpp1y/constexpr-new.C: Likewise.
>   * g++.dg/cpp1y/constexpr-return2.C: Likewise.
>   * g++.dg/cpp1y/constexpr-shift1.C: Likewise.
>   * g++.dg/cpp1y/constexpr-throw.C: Likewise.
>   * g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
>   * g++.dg/ext/constexpr-vla1.C: Likewise.
>   * g++.dg/ext/constexpr-vla2.C: Likewise.
>   * g++.dg/ext/constexpr-vla3.C: Likewise.
>   * g++.dg/cpp0x/static_assert10.C: Likewise.
>   * g++.dg/cpp1y/pr63996.C: Likewise.
>   * g++.dg/cpp1y/pr68180.C: Likewise.
>   * g++.dg/cpp1y/pr77830.C: Likewise.
>   * g++.dg/ubsan/pr63956.C: Likewise.
OK.  And ISTM that other patches of a similar nature ought to just be
considered OK without the need to review.

Jeff


Re: [PATCH, i386] Refactor -mprefer-avx[128|256] options into common -mprefer-vector-width=[none|128|256|512]

2017-11-21 Thread Uros Bizjak
On Tue, Nov 21, 2017 at 4:50 PM, Shalnov, Sergey
 wrote:
> Uros,
> I made a new patch with all comments addressed as proposed.
> 1. The old option -mprefer-avx128 is Alias(mprefer-vector-width=, 128, none).
> 2. Simplified default initialization (as Bernhard proposed).
> 3. Fixed documentation (proposed by Sandra).
> 4. Several tests are changed to use the new style of the option, but many are
> left with -mprefer-avx128 (one test with the new style -mprefer-vector-width=128).
>
>
> 2017-11-21  Sergey Shalnov  
>
> gcc/
> * config/i386/i386-opts.h (enum prefer_vector_width): Added new enum
> for the new option -mprefer-vector-width=[none|128|256|512].
> * config/i386/i386.c (ix86_target_string): remove old style options
> -mprefer-avx256 and make -mprefer-avx128 as alias.
> (ix86_option_override_internal):  Apply defaults for the
> -mprefer-vector-width=[128|256] option.
> * config/i386/i386.h (TARGET_PREFER_AVX128, TARGET_PREFER_AVX256):
> Implement macros to work with -mprefer-vector-width=.
> * config/i386/i386.opt: Implemented option
> -mprefer-vector-width=[none|128|256|512].
> * doc/invoke.texi: Documentation for
> -mprefer-vector-width=[none|128|256|512].
>
> gcc/testsuite/
> * g++.dg/ext/pr57362.C (__attribute__): Apply new option syntax.
> * g++.dg/torture/pr81249.C: Ditto.
> * gcc.target/i386/avx512f-constant-float-return.c: Ditto.
> * gcc.target/i386/avx512f-prefer.c: Ditto.
> * gcc.target/i386/pr82460-2.c: Ditto.
>
> Please merge this patch if you think it is acceptable.
> Thank you
> Sergey

 mprefer-avx128
-Target Report Mask(PREFER_AVX128) Save
-Use 128-bit AVX instructions instead of 256-bit AVX instructions in
the auto-vectorizer.
+Target Undocumented Alias(mprefer-vector-width=, 128, none)

For compatibility, I'd rather leave this option documented with:

+Target Alias(mprefer-vector-width=, 128, 256)

This would mean that in addition to -mprefer-avx128 switching to
128-bit AVX, -mno-prefer-avx128 would switch to 256-bit AVX, as
documented for the option.

The patch is OK, and if you agree, I can commit the patch with the above change.

Thanks,
Uros.


RE: [PATCH, i386] Refactor -mprefer-avx[128|256] options into common -mprefer-vector-width=[none|128|256|512]

2017-11-21 Thread Shalnov, Sergey
Uros,
Yes, please. Thank you for your proposals and comments.
Please commit as you proposed.
Sergey

-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Tuesday, November 21, 2017 6:13 PM
To: Shalnov, Sergey 
Cc: gcc-patches@gcc.gnu.org; kirill.yuk...@gmail.com; Koval, Julia 
; Senkevich, Andrew ; Peryt, 
Sebastian ; Ivchenko, Alexander 
; Joseph Myers 
Subject: Re: [PATCH, i386] Refactor -mprefer-avx[128|256] options into common 
-mprefer-vector-width=[none|128|256|512]

On Tue, Nov 21, 2017 at 4:50 PM, Shalnov, Sergey  
wrote:
> Uros,
> I made a new patch with all comments addressed as proposed.
> 1. The old option -mprefer-avx128 is Alias(mprefer-vector-width=, 128, none).
> 2. Simplified default initialization (as Bernhard proposed).
> 3. Fixed documentation (proposed by Sandra).
> 4. Several tests are changed to use the new style of the option, but many are
> left with -mprefer-avx128 (one test with the new style -mprefer-vector-width=128).
>
>
> 2017-11-21  Sergey Shalnov  
>
> gcc/
> * config/i386/i386-opts.h (enum prefer_vector_width): Added new enum
> for the new option -mprefer-vector-width=[none|128|256|512].
> * config/i386/i386.c (ix86_target_string): remove old style options
> -mprefer-avx256 and make -mprefer-avx128 as alias.
> (ix86_option_override_internal):  Apply defaults for the
> -mprefer-vector-width=[128|256] option.
> * config/i386/i386.h (TARGET_PREFER_AVX128, TARGET_PREFER_AVX256):
> Implement macros to work with -mprefer-vector-width=.
> * config/i386/i386.opt: Implemented option
> -mprefer-vector-width=[none|128|256|512].
> * doc/invoke.texi: Documentation for
> -mprefer-vector-width=[none|128|256|512].
>
> gcc/testsuite/
> * g++.dg/ext/pr57362.C (__attribute__): Apply new option syntax.
> * g++.dg/torture/pr81249.C: Ditto.
> * gcc.target/i386/avx512f-constant-float-return.c: Ditto.
> * gcc.target/i386/avx512f-prefer.c: Ditto.
> * gcc.target/i386/pr82460-2.c: Ditto.
>
> Please merge this patch if you think it is acceptable.
> Thank you
> Sergey

 mprefer-avx128
-Target Report Mask(PREFER_AVX128) Save
-Use 128-bit AVX instructions instead of 256-bit AVX instructions in the 
auto-vectorizer.
+Target Undocumented Alias(mprefer-vector-width=, 128, none)

For compatibility, I'd rather leave this option documented with:

+Target Alias(mprefer-vector-width=, 128, 256)

This would mean that in addition to -mprefer-avx128 switching to 128-bit AVX, 
-mno-prefer-avx128 would switch to 256-bit AVX, as documented for the option.

The patch is OK, and if you agree, I can commit the patch with the above change.

Thanks,
Uros.


Re: [PATCH] Fix up -Wreturn-type (PR c++/83045)

2017-11-21 Thread Jeff Law
On 11/21/2017 06:52 AM, Jakub Jelinek wrote:
> Hi!
> 
> The C++ FE now emits __builtin_unreachable () with BUILTINS_LOCATION
> on spots that return from functions/methods returning non-void without
> proper return.  This breaks the -Wreturn-type warning, because we then
> don't see any return stmt without argument on the edges to exit, instead
> we see those __builtin_unreachable () calls at the end of blocks without
> successors.
> 
> I wonder if the C++ FE addition of __builtin_unreachable () shouldn't be
> done only if (optimize).
> 
> Anyway, this patch tweaks tree-cfg.c so that it recognizes those
> __builtin_unreachable () calls and reports the -Wreturn-type warning
> in those cases too (warning in the FE would be too early, we need to
> optimize away unreachable code).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> The patch regresses g++.dg/gomp/declare-simd-1.C, but given that it revealed
> a real bug, I'm not trying to work around it in the patch and will fix it up
> incrementally instead.
> 
> 2017-11-21  Jakub Jelinek  
> 
>   PR c++/83045
>   * tree-cfg.c (pass_warn_function_return::execute): Formatting fix.
>   Also warn if seen __builtin_unreachable () call with BUILTINS_LOCATION.
>   Use LOCATION_LOCUS when comparing against UNKNOWN_LOCATION.
> 
>   * c-c++-common/pr61405.c (fn0, fn1): Add return stmts.
>   * c-c++-common/Wlogical-op-2.c (fn): Likewise.
>   * g++.dg/debug/pr53466.C: Add -Wno-return-type to dg-options.
>   * g++.dg/opt/combine.C: Likewise.
>   * g++.dg/ubsan/return-3.C: Likewise.
>   * g++.dg/pr59445.C: Likewise.
>   * g++.dg/pr49847.C: Likewise.
>   * g++.dg/ipa/pr61800.C: Likewise.
>   * g++.dg/ipa/pr63470.C: Likewise.
>   * g++.dg/ipa/pr68672-1.C: Likewise.
>   * g++.dg/pr58438.C: Likewise.
>   * g++.dg/torture/pr59265.C: Likewise.
>   * g++.dg/tree-ssa/ssa-dse-2.C: Likewise.
>   * g++.old-deja/g++.eh/catch13.C: Likewise.
>   * g++.old-deja/g++.eh/crash1.C: Likewise.
>   * g++.dg/tm/pr60004.C: Expect -Wreturn-type warning.
>   * g++.dg/torture/pr55740.C: Likewise.
>   * g++.dg/torture/pr43257.C: Likewise.
>   * g++.dg/torture/pr64280.C: Likewise.
>   * g++.dg/torture/pr54684.C: Likewise.
>   * g++.dg/torture/pr56694.C: Likewise.
>   * g++.dg/torture/pr68470.C: Likewise.
>   * g++.dg/torture/pr60648.C: Likewise.
>   * g++.dg/torture/pr71281.C: Likewise.
>   * g++.dg/torture/pr52772.C: Add -Wno-return-type dg-additional-options.
>   * g++.dg/torture/pr64669.C: Likewise.
>   * g++.dg/torture/pr58369.C: Likewise.
>   * g++.dg/torture/pr33627.C: Likewise.
>   * g++.dg/torture/predcom-1.C: Add
>   #pragma GCC diagnostic ignored "-Wreturn-type".
>   * g++.dg/lto/20090221_0.C: Likewise.
>   * g++.dg/lto/20091026-1_1.C: Likewise.
>   * g++.dg/lto/pr54625-1_1.C: Likewise.
>   * g++.dg/warn/pr83045.C: New test.
OK.
jeff


RE: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]

2017-11-21 Thread Tamar Christina
Ping

> -Original Message-
> From: Tamar Christina [mailto:tamar.christ...@arm.com]
> Sent: Monday, November 6, 2017 16:54
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH][GCC][ARM] Dot Product NEON intrinsics [Patch (3/8)]
> 
> Hi All,
> 
> This patch adds the NEON intrinsics for Dot product.
> 
> Dot product is available from ARMv8.2-a and onwards.
> 
> Regtested on arm-none-eabi, armeb-none-eabi, aarch64-none-elf and
> aarch64_be-none-elf with no issues found.
> 
> Ok for trunk?
> 
> gcc/
> 2017-11-06  Tamar Christina  
> 
>   * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32)
>   (vdot_s32, vdotq_s32): New.
>   (vdot_lane_u32, vdotq_lane_u32): New.
>   (vdot_lane_s32, vdotq_lane_s32): New.
> 
> 
> gcc/testsuite/
> 2017-11-06  Tamar Christina  
> 
>   * gcc.target/arm/simd/vdot-compile.c: New.
>   * gcc.target/arm/simd/vect-dot-qi.h: New.
>   * gcc.target/arm/simd/vect-dot-s8.c: New.
>   * gcc.target/arm/simd/vect-dot-u8.c: New.
> 
> --


RE: [PATCH][GCC][ARM] Generate .arch and .arch_extensions for each function if required. [Patch (3/3)]

2017-11-21 Thread Tamar Christina
Ping

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Tamar Christina
> Sent: Monday, November 6, 2017 16:52
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH][GCC][ARM] Generate .arch and .arch_extensions for each
> function if required. [Patch (3/3)]
> 
> Hi All,
> 
> This patch adds the needed machinery to generate the appropriate .arch
> and .arch_extension directives per function.
> 
> Borrowing from AArch64 this is only done when it's required (i.e. when the
> directives to be set differ from the currently set one).
> 
> As part of this the .fpu directive has also been cleaned up to follow the same
> logic.
> 
> Regtested on arm-none-eabi and no regressions.
> 
> Ok for trunk?
> 
> gcc/
> 2017-11-06  Tamar Christina  
> 
>   PR target/82641
>   * config/arm/arm.c (INCLUDE_STRING): Define.
>   (arm_last_printed_arch_string, arm_last_printed_fpu_string): New.
>   (arm_declare_function_name): Conservatively emit .arch,
>   .arch_extensions and .fpu.
> 
> gcc/testsuite/
> 2017-11-06  Tamar Christina  
> 
>   PR target/82641
>   * gcc.target/arm/pragma_arch_attribute_2.c: New.
>   * gcc.target/arm/pragma_arch_attribute_2.c: New.
>   * gcc.target/arm/pragma_arch_attribute_3.c: New.
>   * gcc.target/arm/pragma_fpu_attribute.c: New.
>   * gcc.target/arm/pragma_fpu_attribute_2.c: New.
> 
> --


RE: [PATCH][GCC][DOCS][AArch64][ARM] Documentation updates adding -A extensions.

2017-11-21 Thread Tamar Christina
Ping

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Tamar Christina
> Sent: Wednesday, November 15, 2017 11:51
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: [PATCH][GCC][DOCS][AArch64][ARM] Documentation updates
> adding -A extensions.
> 
> Hi All,
> 
> This patch updates the documentation for AArch64 and ARM correcting the
> use of the architecture namings by adding the -A suffix in appropriate places.
> 
> Build done on aarch64-none-elf and arm-none-eabi and no issues.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2017-11-15  Tamar Christina  
> 
>   * doc/extend.texi: Add -A suffix (ARMv8*-A, ARMv7-A).
>   * doc/invoke.texi: Add -A suffix (ARMv8*-A, ARMv7-A).
>   * doc/sourcebuild.texi: Add -A suffix (ARMv8*-A, ARMv7-A).
> 
> --


Re: Fix PR82849 (ICE in modulo scheduling)

2017-11-21 Thread Jeff Law
On 11/19/2017 07:46 AM, Jan Hubicka wrote:
> Hi,
> my understanding of the code is that trip_count is trying to estimate number
> of iterations when counts are available and disable modulo scheduler for
> loops with too few iterations.
That's my reading as well.

> We now have get_estimated_loop_iterations_int and get_max_loop_iterations_int
> which keep information detected upstream and so we should use it there.
> 
> I tested the patch very lightly only on the testcase in question as I am not
> sure when and how modulo sched is used.
I don't think it's enabled anywhere by default.  IIRC it was primarily
used on ia64, though my recollection is also that the group that did
the work also saw good gains on ppc.

I still think we need to do some kind of evaluation about whether or not
to keep that code.  If it's not being used, then it's just a maintenance
burden.

> 
> Seems sane? What are the code size impacts of modulo scheduling?
Given it's a software pipelining algorithm I'd expect it expands code,
consistently :-)  But it's probably less than loop unrolling in general.


> 
> Honza
> 
>   PR 82849
>   * modulo-sched.c (sms_schedule): Use get_estimated_loop_iterations_int
>   and get_max_loop_iterations_int.
Seems reasonable.  I'd go with it.

jeff
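
As a rough illustration of the trip-count check discussed above, the following C sketch gates modulo scheduling on the two iteration estimates. It is not the modulo-sched.c code: the function name, the threshold parameter, and the convention that -1 means "no estimate available" are assumptions made for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an iteration count; -1 is assumed to mean
   that no estimate is available.  */
typedef long long iter_count;

/* Return true if the loop looks worth software pipelining: reject it when
   either the known upper bound or the profile-based estimate says the
   loop runs fewer than THRESHOLD iterations.  */
bool worth_modulo_scheduling(iter_count estimated, iter_count max_iters,
                             iter_count threshold)
{
    if (max_iters >= 0 && max_iters < threshold)
        return false;   /* provably too few iterations */
    if (estimated >= 0 && estimated < threshold)
        return false;   /* estimated to be too few iterations */
    return true;        /* unknown or large enough: let the pass try */
}
```

The design point is only that an unknown estimate should not by itself disable the pass, while a small known bound should.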


Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-21 Thread Dominik Inführ
Thanks for the reply, I know that it’s too late for GCC 8. I wanted to get some 
feedback on this patch, so I could address all issues until GCC 9 development 
starts. But I suppose it is better to just post it again later.

Dominik

> On 21 Nov 2017, at 18:09, Jeff Law  wrote:
> 
> On 11/21/2017 04:14 AM, Dominik Inführ wrote:
>> Hi,
>> 
>> this patch tries to extend tree-ssa-dce.c to remove unnecessary 
>> new/delete-pairs (it already does that for malloc/free). Clang does it too 
>> and it seems to be allowed by 
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3664.html. I’ve 
>> bootstrapped/regtested on aarch64-linux and x86_64-linux.
> Just a note, we've transitioned into stage3 in preparation for the
> upcoming gcc-8 release in the spring.  During stage3 we're addressing
> bugfixes, not further enhancements (with the exception of enhancements
> that were posted prior to stage1 close).
> 
> So it's unlikely anyone will dig into this right now, unless there's an
> existing bugzilla around this missed optimization.
> 
> Just wanted to let you know where things stood so you don't interpret
> silence as "we don't care".
> 
> jeff





Re: [Patch][aarch64] Use IFUNCs to enable LSE instructions in libatomic on aarch64

2017-11-21 Thread James Greenhalgh
On Mon, Nov 20, 2017 at 07:22:15PM +, Steve Ellcey wrote:
> On Mon, 2017-11-20 at 18:27 +, James Greenhalgh wrote:
> > 
> > If you have the time, would you mind giving me a quick run-down of what
> > design decisions went in to this patch, and why they are the right thing
> > to do? Sorry to offload that, but it will be the most efficient route
> > to a review.
> 
> The main design decision was to use the existing IFUNC infrastructure
> that is used on ARM32 to enable atomic instructions that were added
> with armv7-a, on i386 to enable instructions added with i586, and on
> x86_64 to enable instructions added with cx16.
> 
> The basic idea for all these is to allow users who create programs that
> use the atomic_* functions to use new instructions on machines that
> support them while also working on older machines that do not support
> them and to not have to create two separate executables.
> 
> Some atomic_* functions get inlined into programs, and those will
> either use or not use LSE instructions based on the compiler arguments
> used during compilations.  If you want your program to work on all
> machines you have to not compile for LSE instructions.  But other
> functions (or all functions if -fno-inline-atomics is used) will call
> the libatomic library.  Currently those functions do not use LSE
> instructions but with this patch we can use the IFUNC infrastructure to
> check for LSE support and use LSE in libatomic on machines where it is
> supported or not use it on machines where it is not supported.
> 
> As an example of what this change does, __atomic_compare_exchange_8 will
> turn into a call to libat_compare_exchange_8_i1 on a machine that supports
> LSE:
> 
> 0000000000000000 <libat_compare_exchange_8_i1>:
>    0: f9400023    ldr   x3, [x1]
>    4: aa0303e4    mov   x4, x3
>    8: c8e4fc02    casal x4, x2, [x0]
>    c: eb03009f    cmp   x4, x3
>   10: 1a9f17e0    cset  w0, eq
>   14: 35000040    cbnz  w0, 1c <libat_compare_exchange_8_i1+0x1c>
>   18: f9000024    str   x4, [x1]
>   1c: d65f03c0    ret
> 
> But call libat_compare_exchange_8 on a machine without LSE:
> 
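> 0000000000000000 <libat_compare_exchange_8>:
>    0: f9400023    ldr   x3, [x1]
>    4: c85ffc04    ldaxr x4, [x0]
>    8: eb03009f    cmp   x4, x3
>    c: 54000061    b.ne  18 <libat_compare_exchange_8+0x18>
>   10: c805fc02    stlxr w5, x2, [x0]
>   14: 35ffff85    cbnz  w5, 4 <libat_compare_exchange_8+0x4>
>   18: 1a9f17e0    cset  w0, eq
>   1c: 34000040    cbz   w0, 24 <libat_compare_exchange_8+0x24>
>   20: d65f03c0    ret
>   24: f9000024    str   x4, [x1]
>   28: d65f03c0    ret
>   2c: d503201f    nop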

Thanks for the detailed explanation. I understood this, and my opinion is
that the AArch64 parts of this patch are OK (and I don't know who needs to
Ack the small generic changes you require).

Let's give Richard/Marcus 48 hours to object while we wait for an OK on the
generic bits, and then OK for AArch64.

Thanks,
James

Reviewed-By: James Greenhalgh 
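
The dispatch idea Steve describes can be sketched in portable C with a function pointer that is resolved once. This is not the libatomic implementation and does not use the real IFUNC mechanism: cpu_has_lse, resolve_cas8 and my_compare_exchange_8 are hypothetical names, both variants share the same C11 body here, and the lazy initialisation is not thread-safe.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*cas8_fn)(_Atomic uint64_t *ptr, uint64_t *expected,
                        uint64_t desired);

/* Baseline variant: in a real library this would be built without LSE and
   would compile to a load-exclusive/store-exclusive loop.  */
static bool cas8_generic(_Atomic uint64_t *ptr, uint64_t *expected,
                         uint64_t desired)
{
    return atomic_compare_exchange_strong(ptr, expected, desired);
}

/* LSE variant: in a real library this would be built with LSE enabled so
   that it compiles to a single CASAL.  Same C body in this sketch.  */
static bool cas8_lse(_Atomic uint64_t *ptr, uint64_t *expected,
                     uint64_t desired)
{
    return atomic_compare_exchange_strong(ptr, expected, desired);
}

/* Hypothetical CPU probe; a real resolver would query the hardware
   capabilities reported by the OS.  Always false in this sketch.  */
static bool cpu_has_lse(void)
{
    return false;
}

/* Pick the implementation once, the way an IFUNC resolver would.  */
static cas8_fn resolve_cas8(void)
{
    return cpu_has_lse() ? cas8_lse : cas8_generic;
}

/* Public entry point: callers never need to know which variant runs.
   On failure *expected is updated with the value actually found.  */
bool my_compare_exchange_8(_Atomic uint64_t *ptr, uint64_t *expected,
                           uint64_t desired)
{
    static cas8_fn impl;          /* resolved lazily; not thread-safe here */
    if (!impl)
        impl = resolve_cas8();
    return impl(ptr, expected, desired);
}
```

With a real IFUNC the dynamic loader performs the resolution at symbol binding time, so the per-call pointer check shown here disappears.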



Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-21 Thread Jeff Law
On 11/21/2017 10:30 AM, Dominik Inführ wrote:
> Thanks for the reply, I know that it’s too late for GCC 8. I wanted to get 
> some feedback on this patch, so I could address all issues until GCC 9 
> development starts. But I suppose it is better to just post it again later.
The problem is most folks' attention will be on bugfixing for the next
few months.

I did quickly scan the patch and didn't see anything terribly concerning
other than extending the one structure which folks will want to look at
more closely.  We always look closely at extending any core data
structures as that impacts memory usage.

I'll probably have to refer to the C++ experts on any specific language
issues that come into play when we open the trunk up again for new
development and come back to the patch.

Jeff


[PATCH, i386]: Improve movbe insn a bit

2017-11-21 Thread Uros Bizjak
2017-11-21  Uros Bizjak  

* config/i386/i386.md (*bswap<mode>2_movbe): Add
integer suffix to movbe mnemonic.
(*bswaphi2_movbe): Ditto.
(bswaphi_lowpart): Merge with *bswaphi_lowpart_1.

testsuite/ChangeLog:

2017-11-21  Uros Bizjak  

* gcc.target/i386/movbe-1.c: Update scan string for movbe
with integer suffix.
* gcc.target/i386/movbe-2.c: Ditto.
* gcc.target/i386/movbe-3.c: Ditto.
* gcc.target/i386/movbe-4.c: Ditto.
* gcc.target/i386/movbe-5.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
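
A short C sketch of what the merged bswaphi_lowpart pattern computes: both alternatives (xchg of the two low byte registers, or a 16-bit rotate by 8) swap the bytes of the low half and leave the upper bits alone. The helper names are made up for this illustration.

```c
#include <stdint.h>

/* Rotate a 16-bit value left by 8 bits, the effect of "rol{w} $8".
   For a 16-bit value this is the same as swapping its two bytes.  */
uint16_t rotl16_by_8(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Byte-swap only the low 16 bits of a 32-bit value and keep the upper
   half unchanged, mirroring the strict_low_part semantics.  */
uint32_t bswap_lowpart(uint32_t x)
{
    uint16_t lo = rotl16_by_8((uint16_t)x);
    return (x & 0xFFFF0000u) | lo;
}
```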
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 0eaa1f244f3..b8715902c35 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14074,8 +14074,8 @@
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
 bswap\t%0
-movbe\t{%1, %0|%0, %1}
-movbe\t{%1, %0|%0, %1}"
+movbe{<imodesuffix>}\t{%1, %0|%0, %1}
+movbe{<imodesuffix>}\t{%1, %0|%0, %1}"
   [(set_attr "type" "bitmanip,imov,imov")
(set_attr "modrm" "0,1,1")
(set_attr "prefix_0f" "*,1,1")
@@ -14103,8 +14103,8 @@
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
 xchg{b}\t{%h0, %b0|%b0, %h0}
-movbe\t{%1, %0|%0, %1}
-movbe\t{%1, %0|%0, %1}"
+movbe{w}\t{%1, %0|%0, %1}
+movbe{w}\t{%1, %0|%0, %1}"
   [(set_attr "type" "imov")
(set_attr "modrm" "*,1,1")
(set_attr "prefix_0f" "*,1,1")
@@ -14124,26 +14124,25 @@
   [(parallel [(set (match_dup 0) (rotate:HI (match_dup 0) (const_int 8)))
  (clobber (reg:CC FLAGS_REG))])])
 
-(define_insn "*bswaphi_lowpart_1"
+(define_insn "bswaphi_lowpart"
   [(set (strict_low_part (match_operand:HI 0 "register_operand" "+Q,r"))
(bswap:HI (match_dup 0)))
(clobber (reg:CC FLAGS_REG))]
-  "TARGET_USE_XCHGB || optimize_function_for_size_p (cfun)"
+  ""
   "@
 xchg{b}\t{%h0, %b0|%b0, %h0}
 rol{w}\t{$8, %0|%0, 8}"
-  [(set_attr "length" "2,4")
+  [(set (attr "preferred_for_size")
+ (cond [(eq_attr "alternative" "0")
+ (symbol_ref "true")]
+  (symbol_ref "false")))
+   (set (attr "preferred_for_speed")
+ (cond [(eq_attr "alternative" "0")
+ (symbol_ref "TARGET_USE_XCHGB")]
+  (symbol_ref "!TARGET_USE_XCHGB")))
+   (set_attr "length" "2,4")
(set_attr "mode" "QI,HI")])
 
-(define_insn "bswaphi_lowpart"
-  [(set (strict_low_part (match_operand:HI 0 "register_operand" "+r"))
-   (bswap:HI (match_dup 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  ""
-  "rol{w}\t{$8, %0|%0, 8}"
-  [(set_attr "length" "4")
-   (set_attr "mode" "HI")])
-
 (define_expand "paritydi2"
   [(set (match_operand:DI 0 "register_operand")
(parity:DI (match_operand:DI 1 "register_operand")))]
diff --git a/gcc/testsuite/gcc.target/i386/movbe-1.c b/gcc/testsuite/gcc.target/i386/movbe-1.c
index 391d4ad9814..053095ca691 100644
--- a/gcc/testsuite/gcc.target/i386/movbe-1.c
+++ b/gcc/testsuite/gcc.target/i386/movbe-1.c
@@ -15,4 +15,4 @@ bar ()
   return __builtin_bswap32 (x);
 }
 
-/* { dg-final { scan-assembler-times "movbe\[ \t\]" 2 } } */
+/* { dg-final { scan-assembler-times "movbel\[ \t\]" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/movbe-2.c b/gcc/testsuite/gcc.target/i386/movbe-2.c
index c198609b722..af3b3ca9ae0 100644
--- a/gcc/testsuite/gcc.target/i386/movbe-2.c
+++ b/gcc/testsuite/gcc.target/i386/movbe-2.c
@@ -15,5 +15,5 @@ bar ()
   return __builtin_bswap64 (x);
 }
 
-/* { dg-final { scan-assembler-times "movbe\[ \t\]" 4 { target ia32 } } } */
-/* { dg-final { scan-assembler-times "movbe\[ \t\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movbel\[ \t\]" 4 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movbeq\[ \t\]" 2 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/movbe-3.c b/gcc/testsuite/gcc.target/i386/movbe-3.c
index 985bc1eb29e..e02a3016ef1 100644
--- a/gcc/testsuite/gcc.target/i386/movbe-3.c
+++ b/gcc/testsuite/gcc.target/i386/movbe-3.c
@@ -16,4 +16,4 @@ void set (struct S *s, int i)
   s->i = i;
 }
 
-/* { dg-final { scan-assembler-times "movbe\[ \t\]" 2 } } */
+/* { dg-final { scan-assembler-times "movbel\[ \t\]" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/movbe-4.c b/gcc/testsuite/gcc.target/i386/movbe-4.c
index 906709167e1..966bdf52ba0 100644
--- a/gcc/testsuite/gcc.target/i386/movbe-4.c
+++ b/gcc/testsuite/gcc.target/i386/movbe-4.c
@@ -17,4 +17,4 @@ bar ()
   return __builtin_bswap32 (x);
 }
 
-/* { dg-final { scan-assembler-times "movbe\[ \t\]" 2 } } */
+/* { dg-final { scan-assembler-times "movbel\[ \t\]" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/movbe-5.c b/gcc/testsuite/gcc.target/i386/movbe-5.c
index 72e4165391d..b1ab1215268 100644
--- a/gcc/testsuite/gcc.target/i386/movbe-5.c
+++ b/gcc/testsuite/gcc.target/i386/movbe-5.c
@@ -1,7 +1,6 @@
 /* PR tree-optimization/78821 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mmovbe" } */
-/* { dg-final { scan-assembler-times "movbe\[ \t\]" 2 } } */
 
 unsigned s

Re: [PATCH] make canonicalize_condition keep its promise

2017-11-21 Thread Aaron Sawdey
On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
> On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> > On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
> > > On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> > > > > So, the story of this very small patch starts with me adding patterns
> > > > > for ppc instructions bdz[tf] and bdnz[tf] such as this:
> > > > > 
> > > > >   [(set (pc)
> > > > > (if_then_else
> > > > >   (and
> > > > >  (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
> > > > >  (const_int 1))
> > > > >  (match_operator 3 "branch_comparison_operator"
> > > > >   [(match_operand 4 "cc_reg_operand" "y,y,y,y")
> > > > >    (const_int 0)]))
> > > > >   (label_ref (match_operand 0))
> > > > >   (pc)))
> > > > >    (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
> > > > > (plus:P (match_dup 1)
> > > > > (const_int -1)))
> > > > >    (clobber (match_scratch:P 5 "=X,X,&r,r"))
> > > > >    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
> > > > >    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> > > > > 
> > > > > However when this gets to the loop_doloop pass, we get an assert fail
> > > > > in iv_number_of_iterations():
> > > > > 
> > > > >   gcc_assert (COMPARISON_P (condition));
> > > > > 
> > > > > This is happening because this branch insn tests two things ANDed
> > > > > together so the and is at the top of the expression, not a comparison.
> > > > 
> > > > Is this something you've created for an existing loop?  Presumably an
> > > > existing loop that previously was a simple loop?
> > > 
> > > The rtl to use this instruction is generated by new code I'm working on
> > > to do a builtin expansion of memcmp using a loop. I call
> > > gen_bdnztf_di() to generate rtl for the insn. It would be nice to be
> > > able to generate this instruction from doloop conversion but that is
> > > beyond the scope of what I'm working on presently.
> 
> Understood.
> 
> So what I think (and I'm hoping you can confirm one way or the other) is
> that by generating this instruction you're taking a loop which
> previously was considered a simple loop by the IV code and turning it
> into something the IV bits no longer think is a simple loop.
> 
> I think that's problematical as when the loop is thought to be a simple
> loop, it has to have a small number of forms for its loop back/exit loop
> tests and whether or not a loop is a simple loop is cached in the loop
> structure.
> 
> I think we need to dig into that first.  If my suspicion is correct then
> this patch is really just papering over that deeper problem.  So I think
> you need to dig a bit deeper into why you're getting into the code in
> question (canonicalize_condition) and whether or not the call chain
> makes any sense given the changes you've made to the loop.
> 
> 

Jeff,
  There is no existing loop structure. This starts with a memcmp() call
and then goes down through the builtin expansion mechanism, which is
ultimately expanding the pattern cmpmemsi which is where my code is
generating a loop that finishes with bdnzt. The code that's ultimately
generated looks like this:

srdi 9,10,4
li 6,0
mtctr 9
li 4,8
.L9:
ldbrx 8,11,6
ldbrx 9,7,6
ldbrx 5,11,4
ldbrx 3,7,4
addi 6,6,16
addi 4,4,16
subfc. 9,9,8
bne 0,.L4
subfc. 9,3,5
bdnzt 2,.L9

So it is a loop with a branch out, and then the branch decrement nz
true back to the top. The iv's here (regs 4 and 6) were generated by my
expansion code. 

I really think the ultimate problem here is that both
canonicalize_condition and get_condition promise in their documenting
comments that they will return something that has a cond at the root of
the rtx, or 0 if they don't understand what they're given. In this case
they do not understand the rtx of bdnzt and are returning rtx rooted
with an and, not a cond. This may seem like papering over the problem,
but I think it is legitimate for these functions to return 0 when the
branch insn in question does not have a simple cond at the heart of it.
And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
Ultimately, yes something better ought to be done here.
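
The contract argued for above can be illustrated with a toy model. This is only a sketch: the types and names below (toy_rtx, toy_comparison_p, toy_get_condition) are made up for illustration and are not GCC's real rtx or COMPARISON_P. It shows a lookup that hands back the condition only when a comparison sits at the root, and a null result otherwise, so a caller never sees an expression rooted with an and.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes; these are NOT GCC's real definitions.  */
enum toy_code { TOY_EQ, TOY_NE, TOY_LT, TOY_GE, TOY_AND, TOY_REG, TOY_CONST };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0, *op1;
};

/* Hypothetical analogue of a "is this a comparison?" predicate.  */
static int
toy_comparison_p (const struct toy_rtx *x)
{
  assert (x != NULL);
  return (x->code == TOY_EQ || x->code == TOY_NE
          || x->code == TOY_LT || x->code == TOY_GE);
}

/* Return COND when it is rooted at a comparison, otherwise NULL, so
   that callers can rely on the result being a comparison or nothing.  */
static const struct toy_rtx *
toy_get_condition (const struct toy_rtx *cond)
{
  if (cond == NULL || !toy_comparison_p (cond))
    return NULL;
  return cond;
}
```

With this shape, a condition such as (and (ne ...) (eq ...)) yields NULL, and a caller that asserts on the root being a comparison is never reached with it.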

Thanks,
   Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [DWARF] mark partial fn versions and OMP frags as partial in dwarf2+ debug info

2017-11-21 Thread Jakub Jelinek
On Wed, Nov 15, 2017 at 05:05:36AM -0200, Alexandre Oliva wrote:
> debug info: partial noentry functions: infra
> 
> This is the first patch of a set that addresses two different but
> somewhat related issues.
> 
> On the one hand, after partial inlining, the non-inlined function
> fragment is output in a way that debug info consumers can't distinguish
> from the entire function: debug info lists the entire function as
> abstract origin for the fragment, but nothing that indicates the
> fragment does not stand for the entire function.  So, if a debugger is
> asked to set a breakpoint at the entry point of the function, it might
> very well set one at the entry point of the fragment, which is likely
> not where users expect to stop.
> 
> On the other hand, OpenMP blocks are split out into artificial functions
> that do not indicate their executable code is part of another function.
> The artificial functions are nested within the original function, but
> that's hardly enough: ideally, debug info consumers should be able to
> tell that, if they stop within one of these functions, they're
> abstractly within the original function.
> 
> This patch introduces a new DWARF attribute to indicate that a function
> is a partial copy of its abstract origin, specifically, that its entry
> point does not correspond to the entry point of the abstract origin.
> This attribute can then be used to mark the out-of-line portion of
> partial inlines, and OpenMP blocks split out into artificial functions.

I'm not sure I like the attribute name too much, and more importantly,
as I said before, I think the attribute should not be a flag, but a number
which tells not just that it is an outlined portion of the original
subprogram, but also what kind of outlining it is and its purpose.

For the name, I wonder if instead of
DW_AT_GNU_partial_noentry
it wouldn't be better to use e.g. one of:
DW_AT_GNU_partial
DW_AT_GNU_partial_subprogram
DW_AT_GNU_fragment
DW_AT_GNU_subprogram_fragment
As for the values I'd like to see (see e.g. DW_AT_calling_convention
and corresponding DW_CC_* values and many other examples):
1) 0 value representing a default false, that the DW_TAG_subprogram
is not any kind of subprogram fragment
2) some value for partial inlining, perhaps two if we want to mark
both fragments of the inline created by partial inlining - the
entry fragment and the outlined rest of the function
3) OpenMP outlined parallel region
4) OpenMP outlined task region
5) OpenMP outlined target region
6) OpenACC outlined kernels region
7) OpenACC outlined parallel region
Thus we would have DW_GNU_PARTIAL_* constants in some enum that we'd use
here.  Of course, the single DECL_FUNCTION_PARTIAL_COPY bit wouldn't
be enough to cover these cases, but I guess we could add an attribute
with a space in the name if this bit is set to say which of those it is
(or just use the attribute unconditionally and don't reserve a bit for
that)?

The advantage of having more details is that the debug info consumer can
then decide how to handle, say talk to OMPD to find out the parent thread,
or look it up inside of libgomp (say through infinity notes), whatever.

And we could in the future add other kinds if we start outlining for other
reasons.
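
As a rough sketch of what such an enumeration could look like (all constant names and numeric values here are hypothetical, chosen only to mirror the list above, and are not part of any DWARF or GCC header):

```c
#include <assert.h>

/* Hypothetical constants: the names and numbering are illustrative
   only and are not defined by DWARF or by GCC.  */
enum dw_gnu_partial
{
  DW_GNU_PARTIAL_none = 0,         /* not a subprogram fragment */
  DW_GNU_PARTIAL_inline_entry,     /* entry fragment of a partial inline */
  DW_GNU_PARTIAL_inline_outlined,  /* outlined rest of the function */
  DW_GNU_PARTIAL_omp_parallel,     /* OpenMP outlined parallel region */
  DW_GNU_PARTIAL_omp_task,         /* OpenMP outlined task region */
  DW_GNU_PARTIAL_omp_target,       /* OpenMP outlined target region */
  DW_GNU_PARTIAL_oacc_kernels,     /* OpenACC outlined kernels region */
  DW_GNU_PARTIAL_oacc_parallel,    /* OpenACC outlined parallel region */
  DW_GNU_PARTIAL_count             /* number of kinds, not a kind itself */
};

/* Map a kind to a short description a consumer might print.  */
static const char *
dw_gnu_partial_describe (enum dw_gnu_partial kind)
{
  static const char *const text[DW_GNU_PARTIAL_count] = {
    "not a fragment",
    "partial inline, entry fragment",
    "partial inline, outlined rest",
    "OpenMP parallel region",
    "OpenMP task region",
    "OpenMP target region",
    "OpenACC kernels region",
    "OpenACC parallel region"
  };
  assert ((unsigned) kind < (unsigned) DW_GNU_PARTIAL_count);
  return text[kind];
}

/* A consumer could use this to decide whether to consult an OpenMP
   runtime for the parent thread.  */
static int
dw_gnu_partial_is_openmp (enum dw_gnu_partial kind)
{
  return (kind == DW_GNU_PARTIAL_omp_parallel
          || kind == DW_GNU_PARTIAL_omp_task
          || kind == DW_GNU_PARTIAL_omp_target);
}
```

Keeping zero as the "not a fragment" value means an absent attribute and an explicit zero can be treated the same way by a consumer.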

Jakub


Re: [PATCH, GCC/ARM] Fix cmse_nonsecure_entry return insn size

2017-11-21 Thread Thomas Preudhomme

Hi Kyrill,

On 09/11/17 14:26, Kyrill Tkachov wrote:

Hi Thomas,

On 08/11/17 09:50, Thomas Preudhomme wrote:

Hi,

A number of instructions are output in assembler form by
output_return_instruction () when compiling a function with the
cmse_nonsecure_entry attribute for Armv8-M Mainline with hardfloat float
ABI. However, the corresponding thumb2_cmse_entry_return insn pattern
does not account for all these instructions in its computing of the
length of the instruction.

This may lead GCC to use the wrong branching instruction due to
incorrect computation of the offset between the branch instruction's
address and the target address.

This commit fixes the mismatch between what output_return_instruction ()
does and what the pattern thinks it does and adds a note warning about
mismatch in the affected functions' heading comments to ensure code does
not get out of sync again.

Note: no test is provided because the C testcase is fragile (only works
on GCC 6) and the extracted RTL test fails to compile due to bugs in the
RTL frontend (PR82815 and PR82817)

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2017-10-30  Thomas Preud'homme 

* config/arm/arm.c (output_return_instruction): Add comments to
indicate requirement for cmse_nonsecure_entry return to account
for the size of clearing instruction output here.
(thumb_exit): Likewise.
* config/arm/thumb2.md (thumb2_cmse_entry_return): Fix length for
return in hardfloat mode.

Testing: Bootstrapped on arm-linux-gnueabihf and testsuite shows no
regression.

Is this ok for trunk?



Ok for trunk and for the branches after a few days.


I've committed the patch to gcc-7-branch (see attached) after another round of
testing, since nobody has reported a regression. Thanks.


Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 989957f048e3c757ef4665d0387ecdc66d26a7dd..7b3f4c1011dc37cb01654f70cfbffadd57d382ec 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19316,7 +19316,12 @@ arm_get_vfp_saved_size (void)
 
 /* Generate a function exit sequence.  If REALLY_RETURN is false, then do
everything bar the final return instruction.  If simple_return is true,
-   then do not output epilogue, because it has already been emitted in RTL.  */
+   then do not output epilogue, because it has already been emitted in RTL.
+
+   Note: do not forget to update length attribute of corresponding insn pattern
+   when changing assembly output (eg. length attribute of
+   thumb2_cmse_entry_return when updating Armv8-M Mainline Security Extensions
+   register clearing sequences).  */
 const char *
 output_return_instruction (rtx operand, bool really_return, bool reverse,
bool simple_return)
@@ -23809,7 +23814,12 @@ thumb_pop (FILE *f, unsigned long mask)
 
 /* Generate code to return from a thumb function.
If 'reg_containing_return_addr' is -1, then the return address is
-   actually on the stack, at the stack pointer.  */
+   actually on the stack, at the stack pointer.
+
+   Note: do not forget to update length attribute of corresponding insn pattern
+   when changing assembly output (eg. length attribute of epilogue_insns when
+   updating Armv8-M Baseline Security Extensions register clearing
+   sequences).  */
 static void
 thumb_exit (FILE *f, int reg_containing_return_addr)
 {
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 2e7580f220eae1524fef69719b1796f50f5cf27c..35f8e9bbf24058c129cbb117c74d1a4bebbf9f38 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1132,7 +1132,7 @@
; we adapt the length accordingly.
(set (attr "length")
  (if_then_else (match_test "TARGET_HARD_FLOAT")
-  (const_int 12)
+  (const_int 34)
   (const_int 8)))
; We do not support predicate execution of returns from cmse_nonsecure_entry
; functions because we need to clear the APSR.  Since predicable has to be


Re: [libstdc++-,doc] Mislocated

2017-11-21 Thread Jonathan Wakely

On 21/11/17 16:36 +, Przemyslaw Wirkus wrote:

On 21/11/17 15:27 +, Jonathan Wakely wrote:

OK for trunk?



OK, thanks.


I don't have privileges to commit. Could you please commit it on my behalf?


Done - thanks again for the patch.



[Patch, rs6000] Fix register values in ppc-asm.h

2017-11-21 Thread Pat Haugen
The following patch fixes a couple typos in ppc-asm.h. Committed as
obvious. Will also backport to GCC 6/7 branches.

-Pat


2017-11-21  Pat Haugen  

* config/rs6000/ppc-asm.h (f50, vs50): Fix values.


Index: gcc/config/rs6000/ppc-asm.h
===
--- gcc/config/rs6000/ppc-asm.h (revision 255022)
+++ gcc/config/rs6000/ppc-asm.h (working copy)
@@ -120,7 +120,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define f47    47
 #define f48    48
 #define f49    49
-#define f50    30
+#define f50    50
 #define f51    51
 #define f52    52
 #define f53    53
@@ -222,7 +222,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define vs47   47
 #define vs48   48
 #define vs49   49
-#define vs50   30
+#define vs50   50
 #define vs51   51
 #define vs52   52
 #define vs53   53



Re: [PATCH][RFC] Add quotes for constexpr keyword.

2017-11-21 Thread Martin Liška

On 11/21/2017 06:13 PM, Jeff Law wrote:

On 11/21/2017 08:00 AM, Martin Liška wrote:

Hi.

I'm sending v2 of the patch where I fixed test-suite fallout.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin


0001-Add-quotes-for-constexpr-keyword.patch


 From 3195b1b71c387b1359c90f6e752e1c312120cd69 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 15 Nov 2017 08:41:12 +0100
Subject: [PATCH] Add quotes for constexpr keyword.

gcc/cp/ChangeLog:

2017-11-15  Martin Liska  

* class.c (finalize_literal_type_property): Add quotes for
constexpr keyword.
(explain_non_literal_class): Likewise.
* constexpr.c (ensure_literal_type_for_constexpr_object): Likewise.
(is_valid_constexpr_fn): Likewise.
(check_constexpr_ctor_body): Likewise.
(register_constexpr_fundef): Likewise.
(explain_invalid_constexpr_fn): Likewise.
(cxx_eval_builtin_function_call): Likewise.
(cxx_eval_call_expression): Likewise.
(cxx_eval_loop_expr): Likewise.
(potential_constant_expression_1): Likewise.
* decl.c (check_previous_goto_1): Likewise.
(check_goto): Likewise.
(grokfndecl): Likewise.
(grokdeclarator): Likewise.
* error.c (maybe_print_constexpr_context): Likewise.
* method.c (process_subob_fn): Likewise.
(defaulted_late_check): Likewise.
* parser.c (cp_parser_compound_statement): Likewise.

gcc/testsuite/ChangeLog:

2017-11-16  Martin Liska  

* g++.dg/cpp0x/constexpr-48089.C: Add quotes for constexpr
keyword; add dg-message for 'in .constexpr. expansion of '.
* g++.dg/cpp0x/constexpr-50060.C: Likewise.
* g++.dg/cpp0x/constexpr-60049.C: Likewise.
* g++.dg/cpp0x/constexpr-70323.C: Likewise.
* g++.dg/cpp0x/constexpr-70323a.C: Likewise.
* g++.dg/cpp0x/constexpr-cast.C: Likewise.
* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
* g++.dg/cpp0x/constexpr-ex1.C: Likewise.
* g++.dg/cpp0x/constexpr-generated1.C: Likewise.
* g++.dg/cpp0x/constexpr-ice16.C: Likewise.
* g++.dg/cpp0x/constexpr-ice5.C: Likewise.
* g++.dg/cpp0x/constexpr-incomplete2.C: Likewise.
* g++.dg/cpp0x/constexpr-neg1.C: Likewise.
* g++.dg/cpp0x/constexpr-recursion.C: Likewise.
* g++.dg/cpp0x/constexpr-shift1.C: Likewise.
* g++.dg/cpp1y/constexpr-70265-1.C: Likewise.
* g++.dg/cpp1y/constexpr-70265-2.C: Likewise.
* g++.dg/cpp1y/constexpr-79655.C: Likewise.
* g++.dg/cpp1y/constexpr-new.C: Likewise.
* g++.dg/cpp1y/constexpr-return2.C: Likewise.
* g++.dg/cpp1y/constexpr-shift1.C: Likewise.
* g++.dg/cpp1y/constexpr-throw.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
* g++.dg/ext/constexpr-vla1.C: Likewise.
* g++.dg/ext/constexpr-vla2.C: Likewise.
* g++.dg/ext/constexpr-vla3.C: Likewise.
* g++.dg/cpp0x/static_assert10.C: Likewise.
* g++.dg/cpp1y/pr63996.C: Likewise.
* g++.dg/cpp1y/pr68180.C: Likewise.
* g++.dg/cpp1y/pr77830.C: Likewise.
* g++.dg/ubsan/pr63956.C: Likewise.

OK.  And ISTM that other patches of a similar nature ought to just be
considered OK without the need to review.

Jeff



Thanks for review, installed as r255025.

Martin


Re: Backports for GCC 7 branch

2017-11-21 Thread Martin Liška

I'm going to install one more patch.

Martin
From e58dd9f5b28468e2afb928c767041e5a3fef057f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 27 Oct 2017 08:34:56 +
Subject: [PATCH] Backport r254137

gcc/ChangeLog:

2017-10-27  Martin Liska  

	PR gcov-profile/82457
	* doc/invoke.texi: Document that one needs a non-strict ISO mode
	for fork-like functions to be properly instrumented.
---
 gcc/doc/invoke.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a0fb09eb9e1..6d0283298c6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10648,9 +10648,9 @@ Link your object files with @option{-lgcov} or @option{-fprofile-arcs}
 Run the program on a representative workload to generate the arc profile
 information.  This may be repeated any number of times.  You can run
 concurrent instances of your program, and provided that the file system
-supports locking, the data files will be correctly updated.  Also
-@code{fork} calls are detected and correctly handled (double counting
-will not happen).
+supports locking, the data files will be correctly updated.  Unless
+a strict ISO C dialect option is in effect, @code{fork} calls are
+detected and correctly handled without double counting.
 
 @item
 For profile-directed optimizations, compile the source files again with
-- 
2.14.3



Re: [PATCH] handle non-constant offsets in -Wstringop-overflow (PR 77608)

2017-11-21 Thread Martin Sebor

On 11/21/2017 09:55 AM, Jeff Law wrote:

On 11/19/2017 04:28 PM, Martin Sebor wrote:

On 11/18/2017 12:53 AM, Jeff Law wrote:

On 11/17/2017 12:36 PM, Martin Sebor wrote:

The attached patch enhances -Wstringop-overflow to detect more
instances of buffer overflow at compile time by handling non-
constant offsets into the destination object that are known to
be in some range.  The solution could be improved by handling
even more cases (e.g., anti-ranges or offsets relative to
pointers beyond the beginning of an object) but it's a start.

In addition to bootstrapping/regtesting GCC, also tested with
Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
regressions.

Martin

The top of GDB fails to compile at the moment so the validation
there was incomplete.

gcc-77608.diff


PR middle-end/77608 - missing protection on trivially detectable
runtime buffer overflow

gcc/ChangeLog:

PR middle-end/77608
* builtins.c (compute_objsize): Handle non-constant offsets.

gcc/testsuite/ChangeLog:

PR middle-end/77608
* gcc.dg/Wstringop-overflow.c: New test.

The recursive call into compute_objsize passing in the ostype avoids
having to think about the whole object vs nearest containing object
issues.  Right?

What's left to worry about is maximum or minimum remaining bytes in the
object.  At least that's my understanding of how ostype works here.

So we get the amount remaining, ignoring the variable offset, from the
recursive call (SIZE).  The space left after we account for the variable
offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
return SIZE-MIN (which you do) and for type 2/3 you have to return
SIZE-MAX which I think you get wrong (and you have to account for the
possibility that MAX or MIN is greater than SIZE and thus there's
nothing left).
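
The arithmetic described above can be sketched as follows. This is a standalone illustration, not the code in builtins.c: the function name remaining_bytes and its want_minimum flag are assumptions made for the example, with want_minimum standing in for object size types 2/3 and its absence for types 0/1.

```c
#include <assert.h>
#include <stddef.h>

/* SIZE is the space left in the object when the variable offset is
   ignored; the offset is known to lie in [MIN_OFF, MAX_OFF].
   The space left once the offset is applied lies in
   [SIZE - MAX_OFF, SIZE - MIN_OFF], clamped at zero.
   WANT_MINIMUM nonzero asks for the low end of that range,
   zero asks for the high end.  */
static size_t
remaining_bytes (size_t size, size_t min_off, size_t max_off,
                 int want_minimum)
{
  assert (min_off <= max_off);
  size_t off = want_minimum ? max_off : min_off;
  /* An offset at or past the end leaves nothing.  */
  return off >= size ? 0 : size - off;
}
```

For a 7-byte object and an offset in [0, 3] this gives 7 as the maximum and 4 as the minimum; for an offset in [2, 100] it gives 5 and 0.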


Subtracting the upper bound of the offset from the size instead
of the lower bound when the caller is asking for the minimum
object size would make the result essentially meaningless in
common cases where the offset is smaller than size_t, as in:

  char a[7];

  void f (const char *s, unsigned i)
  {
    __builtin_strcpy (a + i, s);
  }

Here, i's range is [0, UINT_MAX].

IMO, it's only useful to use the lower bound here, otherwise
the result would only rarely be non-zero.

But when we're asking for the minimum left, aren't we essentially asking
for "how much space am I absolutely sure I can write"?  And if that is
the question, then the only conservatively correct answer is to subtract
the high bound.


I suppose you could look at it that way but IME with this work
(now, and also last year when I submitted a patch actually
changing the built-in), using the upper bound is just not that
useful because it's too often way too big.  There's no way to
distinguish an out-of-range upper bound that's the result of
an inadequate attempt to constrain a value from an out-of-range
upper bound that is sufficiently constrained but in a way GCC
doesn't see.

There are no clients of this API that would be affected by
the decision one way or the other (unless the user specifies
a -Wstringop-overflow= argument greater than the default 2)
so I don't think what we do now matters much, if at all.
Perhaps in the future with some of the range improvements
that you, Aldy and Andrew have been working on.

That said, if it helps us move forward with this enhancement
I'll use the upper bound -- let me know.  In the future, when
it is actually used, we'll adjust it as necessary.


This is also what other warnings that deal with ranges do.  For example,
-Warray-bounds only considers the lower bound (unless it's negative)
when deciding whether or not to warn for

  int g (unsigned i)
  {
    return a[i];
  }

It would be too noisy to be of practical use otherwise (even at
level 2).

Which argues that:

1. Our ranges suck
2. We're currently avoiding dealing with that by giving answers that are
not conservatively correct.

Right?


I don't feel quite as strongly.  Modulo the pesky bugs we get
every so often for some of the corner cases or known limitations
they seem actually reasonably accurate in most cases, and the
warnings, for the most part, strike a reasonable balance between
false positives and negatives.  But I certainly agree that there
is room to improve and I look forward to taking some of the range
enhancement out for a spin.

Martin


Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-21 Thread Thomas Koenig

Hi Janne,



So, any other comments about my patch? OK for trunk?


- In many cases the copyright notice has "This file is part of the GNU
Fortran 95 runtime library (libgfortran)." It's a while since we've
called ourselves "GNU Fortran 95", so just remove the "95".


That's what I got for copying over an old version of maxloc
(when it still didn't have NAN handling) as a basis for my own
patch. This also meant that I had an old copyright notice. Fixed.



- It seems in the library you're using int for string lengths? Please
use gfc_charlen_type instead (in the frontend, gfc_charlen_type_node).
(Most of the charlen->size_t patch is fixing up places where we're
accidentally using int instead of gfc_charlen_type..).


Fixed.


- Why are you using GFC_INTEGER_1 / GFC_INTEGER_4 to loop over the
arrays rather than char/gfc_char4_t? Not sure if it makes any
difference in practice, but it sure seems confusing..


The reason has to do with evil m4 magic. I used a macro
from iparm.m4, atype_code. Changing m4 code should mostly
be restricted to those cases where it is _really_ necessary
(people say, not without justification, that m4 is a write-only
language).


- Not really related to your patch, but memcmp_char4 sure looks
redundant. Isn't it the same as memcmp(a, b, size*4), in which case we
could use optimized memcmp implementations?


Big/little endian issues.

Consider the following short program:

#include <stdio.h>
#include <string.h>

char a[4] = { 1, 2, 3, 4};
char b[4] = { 4, 3, 2, 1};

int main()
{
  int i, j;
  memcpy (&i, a, sizeof(i));
  memcpy (&j, b, sizeof(j));
  printf("memcmp   : ");
  if (memcmp (&i,&j,sizeof(i)))
    printf("larger\n");
  else
    printf("smaller\n");

  printf("Direct comparison: ");
  if (i > j)
    printf("larger\n");
  else
    printf("smaller\n");

  return 0;
}

On my x86_64 system, this prints

memcmp   : larger
Direct comparison: larger

On a big-endian system, this prints

memcmp   : larger
Direct comparison: smaller

However, I just learned about the __BYTE_ORDER__ macro.
We could use that (and in places where we currently have
a runtime check, for example in replacing the big_endian
global variable in libgfortran). But that is for another day.
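
A minimal sketch of how the macro could be used for this, assuming a compiler that predefines __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ (GCC does). The helper name compare_char4 is made up for the example and is not libgfortran's memcmp_char4; it only shows the compile-time choice between memcmp and an element-wise loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare two arrays of N four-byte characters by value.
   Returns -1, 0 or 1.  */
static int
compare_char4 (const uint32_t *a, const uint32_t *b, size_t n)
{
  assert (a != NULL && b != NULL);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  /* Big-endian: the most significant byte comes first in memory, so
     the byte-wise order seen by memcmp matches the numeric order.  */
  int r = memcmp (a, b, n * sizeof *a);
  return (r > 0) - (r < 0);
#else
  /* Little-endian or unknown: memcmp would look at the least
     significant byte first, so compare element by element.  */
  for (size_t i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
#endif
}
```

For example, comparing the single elements 0x100 and 0x2 gives 1 here on any host, whereas a plain memcmp on a little-endian host would see the bytes 00 01 00 00 and 02 00 00 00 and report the opposite order.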

So, attached is a new version of the patch. No update
on the ChangeLog. OK for trunk?

Regards

Thomas


p8.diff.gz
Description: application/gzip


Re: [PATCH, i386] Refactor -mprefer-avx[128|256] options into common -mprefer-vector-width=[none|128|256|512]

2017-11-21 Thread Uros Bizjak
I have committed the attached patch.

Uros.

On Tue, Nov 21, 2017 at 6:18 PM, Shalnov, Sergey
 wrote:
> Uros,
> Yes, please. Thank you for your proposals and comments.
> Please commit as you proposed.
> Sergey
>
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Tuesday, November 21, 2017 6:13 PM
> To: Shalnov, Sergey 
> Cc: gcc-patches@gcc.gnu.org; kirill.yuk...@gmail.com; Koval, Julia 
> ; Senkevich, Andrew ; 
> Peryt, Sebastian ; Ivchenko, Alexander 
> ; Joseph Myers 
> Subject: Re: [PATCH, i386] Refactor -mprefer-avx[128|256] options into common 
> -mprefer-vector-width=[none|128|256|512]
>
> On Tue, Nov 21, 2017 at 4:50 PM, Shalnov, Sergey  
> wrote:
>> Uros,
>> I made a new patch with all comments addressed as proposed.
>> 1. The old option -mprefer-avx128 is Alias(mprefer-vector-width=, 128, none)
>> 2. Simplified default initialization (as Bernhard proposed)
>> 3. Fixed documentation (proposed by Sandra)
>> 4. Several tests are changed to use the new style of the option, but many
>> are left with -mprefer-avx128 (one test with new style
>> -mprefer-vector-width=128)
>>
>>
>> 2017-11-21  Sergey Shalnov  
>>
>> gcc/
>> * config/i386/i386-opts.h (enum prefer_vector_width): Added new enum
>> for the new option -mprefer-vector-width=[none|128|256|512].
>> * config/i386/i386.c (ix86_target_string): remove old style options
>> -mprefer-avx256 and make -mprefer-avx128 as alias.
>> (ix86_option_override_internal):  Apply defaults for the
>> -mprefer-vector-width=[128|256] option.
>> * config/i386/i386.h (TARGET_PREFER_AVX128, TARGET_PREFER_AVX256):
>> Implement macros to work with -mprefer-vector-width=.
>> * config/i386/i386.opt: Implemented option
>> -mprefer-vector-width=[none|128|256|512].
>> * doc/invoke.texi: Documentation for
>> -mprefer-vector-width=[none|128|256|512].
>>
>> gcc/testsuite/
>> * g++.dg/ext/pr57362.C (__attribute__): Apply new option syntax.
>> * g++.dg/torture/pr81249.C: Ditto.
>> * gcc.target/i386/avx512f-constant-float-return.c: Ditto.
>> * gcc.target/i386/avx512f-prefer.c: Ditto.
>> * gcc.target/i386/pr82460-2.c: Ditto.
>>
>> Please merge this patch if you think it is acceptable.
>> Thank you
>> Sergey
>
>  mprefer-avx128
> -Target Report Mask(PREFER_AVX128) Save
> -Use 128-bit AVX instructions instead of 256-bit AVX instructions in the 
> auto-vectorizer.
> +Target Undocumented Alias(mprefer-vector-width=, 128, none)
>
> For compatibility, I'd rather leave this option documented with:
>
> +Target Alias(mprefer-vector-width=, 128, 256)
>
> This would mean that in addition to -mprefer-avx128 switching to 128-bit AVX, 
> -mno-prefer-avx128 would switch to 256-bit AVX, as documented for the option.
>
> The patch is OK, and if you agree, I can commit the patch with the above 
> change.
>
> Thanks,
> Uros.
Index: config/i386/i386-opts.h
===
--- config/i386/i386-opts.h (revision 255016)
+++ config/i386/i386-opts.h (working copy)
@@ -99,4 +99,11 @@ enum stack_protector_guard {
   SSP_GLOBAL/* global canary */
 };
 
+enum prefer_vector_width {
+PVW_NONE,
+PVW_AVX128,
+PVW_AVX256,
+PVW_AVX512
+};
+
 #endif
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 255016)
+++ config/i386/i386.c  (working copy)
@@ -2847,7 +2847,6 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_I
 { "-mstv", MASK_STV },
 { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD },
 { "-mavx256-split-unaligned-store",
MASK_AVX256_SPLIT_UNALIGNED_STORE },
-{ "-mprefer-avx128",   MASK_PREFER_AVX128 },
 { "-mcall-ms2sysv-xlogues",MASK_CALL_MS2SYSV_XLOGUES }
   };
 
@@ -2854,8 +2853,7 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_I
   /* Additional flag options.  */
   static struct ix86_target_opts flag2_opts[] =
   {
-{ "-mgeneral-regs-only",   OPTION_MASK_GENERAL_REGS_ONLY },
-{ "-mprefer-avx256",   OPTION_MASK_PREFER_AVX256 },
+{ "-mgeneral-regs-only",   OPTION_MASK_GENERAL_REGS_ONLY }
   };
 
   const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (isa2_opts)
@@ -4686,16 +4684,18 @@ ix86_option_override_internal (bool main_args_p,
   if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL]
   && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 opts->x_target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
+
   /* Enable 128-bit AVX instruction generation
  for the auto-vectorizer.  */
   if (TARGET_AVX128_OPTIMAL
-  && !(opts_set->x_target_flags & MASK_PREFER_AVX128))
-opts->x_target_flags |= MASK_PREFER_AVX128;
-  /* Use 256-bit AVX instructions instead of 512-bit AVX instructions
+  && (opts_set->x_prefer_vector_width_

Re: libgo patch committed: Fix Makefile bug setting LD_LIBRARY_PATH

2017-11-21 Thread Eric Botcazou
> This patch by Than McIntosh fixes a small bug in the libgo Makefile
> recipe that constructs the directory from which to pick up
> libgcc_s.so; the gccgo invocation with -print-libgcc-file-name was
> missing the flags, which meant that for -m32 builds we'd see the
> 64-bit libgcc dir.  Bootstrapped and ran Go testsuite on
> x86_64-pc-linux-gnu.  Committed to mainline.

Thanks, this helps on Solaris.  I have attached another fixlet: the -q option 
of grep is rejected on Solaris.  Tested on Linux and Solaris.

-- 
Eric Botcazou
Index: mksigtab.sh
===
--- mksigtab.sh	(revision 255000)
+++ mksigtab.sh	(working copy)
@@ -29,7 +29,7 @@ addsig() {
 echo "	$1: $2,"
 # Get the signal number and add it to SIGLIST
 signum=`grep "const $1 = " gen-sysinfo.go | sed -e 's/.* = //'`
-if echo "$signum" | grep -q '^_SIG[A-Z0-9_]*$'; then
+if echo "$signum" | grep '^_SIG[A-Z0-9_]*$' >/dev/null 2>&1; then
 # Recurse once to obtain signal number
 # This is needed for some MIPS signals defined as aliases of other signals
 signum=`grep "const $signum = " gen-sysinfo.go | sed -e 's/.* = //'`


Re: [PATCH] Avoid static initialization in the strlen pass

2017-11-21 Thread Martin Sebor

On 11/20/2017 02:57 PM, Jakub Jelinek wrote:

On Mon, Nov 20, 2017 at 02:25:35PM -0700, Martin Sebor wrote:

On 11/20/2017 02:14 PM, Jakub Jelinek wrote:

Hi!

All the hash_maps in tree-ssa-strlen.c except for the newly added one
were pointers to hash maps, which were constructed either lazily or during
the pass.  But strlen_to_stridx is now constructed at the compiler start,
which is something I'd prefer to avoid, it affects even -O0 that way and
empty/small file compilation, something e.g. the kernel folks care so much
about.

Apparently the hash map is only needed when one of the two warnings
is enabled, so this patch initializes it only in that case and otherwise
doesn't fill it or query it.
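
The idea can be sketched in plain C. This is not the tree-ssa-strlen.c code: toy_map stands in for the real hash_map, and warn_enabled, get_map, record_entry and release_map are hypothetical names used only to show allocation on first use, guarded by whether the warning is enabled.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the real hash map; it only counts entries.  */
struct toy_map
{
  size_t nentries;
};

static struct toy_map *lazy_map;  /* stays NULL until first needed */
static int warn_enabled;          /* hypothetical "warning is on" flag */

/* Return the map, creating it on first use, or NULL when the
   warning that needs it is disabled.  */
static struct toy_map *
get_map (void)
{
  if (!warn_enabled)
    return NULL;
  if (lazy_map == NULL)
    {
      lazy_map = calloc (1, sizeof *lazy_map);
      assert (lazy_map != NULL);
    }
  return lazy_map;
}

/* Record one entry; does nothing at all when the warning is off.  */
static void
record_entry (void)
{
  struct toy_map *map = get_map ();
  if (map != NULL)
    map->nentries++;
}

/* Free the map at the end of the pass.  */
static void
release_map (void)
{
  free (lazy_map);
  lazy_map = NULL;
}
```

With the flag off, no allocation happens and nothing is constructed at start-up, which is the property wanted for -O0 and small-file compilation.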


This same change is also in the latest patch I posted for 82945
just yesterday:

  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01670.html


Oops, sorry for missing that.  But you still initialize it unconditionally
in the pass and compute it even when it is only needed for the warning.


No problem.  I've committed r255031 with the lazy initialization.

Martin


Re: [patch, fortran] Implement maxloc and minloc for character

2017-11-21 Thread Janne Blomqvist
On Tue, Nov 21, 2017 at 9:50 PM, Thomas Koenig  wrote:
> Hi Janne,
>
>
>>> So, any other comments about my patch? OK for trunk?
>>
>>
>> - In many cases the copyright notice has "This file is part of the GNU
>> Fortran 95 runtime library (libgfortran)." It's a while since we've
>> called ourselves "GNU Fortran 95", so just remove the "95".
>
>
> That's what I got for copying over an old version of maxloc
> (when it still didn't have NAN handling) as a basis for my own
> patch. This also meant that I had an old copyright notice. Fixed.

Uh, it seems the patch you posted didn't actually fix this?

>> - It seems in the library you're using int for string lengths? Please
>> use gfc_charlen_type instead (in the frontend, gfc_charlen_type_node).
>> (Most of the charlen->size_t patch is fixing up places where we're
>> accidentally using int instead of gfc_charlen_type..).
>
>
> Fixed.

Not everywhere? At least

zgrep "int len" p8.diff.gz

turns up some cases?

>> - Why are you using GFC_INTEGER_1 / GFC_INTEGER_4 to loop over the
>> arrays rather than char/gfc_char4_t? Not sure if it makes any
>> difference in practice, but it sure seems confusing..
>
>
> The reason has to do with evil m4 magic. I used a macro
> from iparm.m4, atype_code. Changing m4 code should mostly
> be restricted to those cases where it is _really_ necessary
> (people say, not without justification, that m4 is a write-only
> language).

Fair enough. :)

>> - Not really related to your patch, but memcmp_char4 sure looks
>> redundant. Isn't it the same as memcmp(a, b, size*4), in which case we
>> could use optimized memcmp implementations?
>
>
> Big/little endian issues.
>
> Consider the following short program:
>
> #include <stdio.h>
> #include <string.h>
>
> char a[4] = { 1, 2, 3, 4};
> char b[4] = { 4, 3, 2, 1};
>
> int main()
> {
>   int i, j;
>   memcpy (&i, a, sizeof(i));
>   memcpy (&j, b, sizeof(j));
>   printf("memcmp   : ");
>   if (memcmp (&i,&j,sizeof(i)))
>     printf("larger\n");
>   else
>     printf("smaller\n");
>
>   printf("Direct comparison: ");
>   if (i > j)
>     printf("larger\n");
>   else
>     printf("smaller\n");
>
>   return 0;
> }
>
> On my x86_64 system, this prints
>
> memcmp   : larger
> Direct comparison: larger
>
> On a big-endian system, this prints
>
> memcmp   : larger
> Direct comparison: smaller

Ooh, indeed.

> However, I just learned about the __BYTE_ORDER__ macro.
> We could use that (and in places where we currently have
> a runtime check, for example in replacing the big_endian
> global variable in libgfortran). But that is for another day.

Yup.

> So, attached is a new version of the patch. No update
> on the ChangeLog. OK for trunk?

Yup, just really fix the copyright and string length stuff first. Thanks!


-- 
Janne Blomqvist


[PATCH] PR libstdc++/48101 improve errors for invalid container specializations

2017-11-21 Thread Jonathan Wakely

This uses static_assert to improve the errors when attempting to
instantiate invalid specializations of containers, e.g. set,
or unordered_set, hash> (which mixes up the
order of the hasher and equality predicate arguments).

This means instead of more than 100 lines of confusing errors for
https://wandbox.org/permlink/kL1YVNVOzrAsLPyS we get only this:

In file included from /home/jwakely/gcc/8/include/c++/8.0.0/set:61:0,
from s.cc:2:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h: In instantiation of ‘class 
std::set’:
s.cc:8:18:   required from here
/home/jwakely/gcc/8/include/c++/8.0.0/bits/stl_set.h:108:7: error: static 
assertion failed: std::set must have a non-const, non-volatile value_type
  static_assert(is_same::type, _Key>::value,
  ^
In file included from /home/jwakely/gcc/8/include/c++/8.0.0/unordered_set:46:0,
from s.cc:1:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h: In instantiation of ‘class std::_Hashtable, std::__detail::_Identity, std::hash, std::equal_to, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, 
std::__detail::_Hashtable_traits >’:
/home/jwakely/gcc/8/include/c++/8.0.0/bits/unordered_set.h:898:18:   required from ‘class 
std::unordered_multiset, std::hash >’
s.cc:10:53:   required from here
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:195:7: error: static 
assertion failed: hash function must be invocable with an argument of key type
  static_assert(__is_invocable{},
  ^
/home/jwakely/gcc/8/include/c++/8.0.0/bits/hashtable.h:197:7: error: static 
assertion failed: key equality predicate must be invocable with two arguments 
of key type
  static_assert(__is_invocable{},
  ^

Tested powerpc64le-linux, committed to trunk.
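
As an illustration of the technique only (this is not the libstdc++ code),
the following sketch puts the same kind of up-front checks on a toy class
template. The names is_callable_with and toy_unordered_set are hypothetical;
the trait is a plain C++11 stand-in written for this example, whereas the
diagnostics above show the real headers using the internal __is_invocable.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// True if calling a const Fn with Args... is well-formed (C++11 SFINAE).
template<typename Fn, typename... Args>
struct is_callable_with
{
  template<typename F>
  static auto test(int)
    -> decltype(std::declval<const F&>()(std::declval<Args>()...),
                std::true_type());
  template<typename>
  static std::false_type test(...);

  static constexpr bool value = decltype(test<Fn>(0))::value;
};

// Hypothetical toy container; only the up-front checks matter here.
template<typename Key, typename Hash = std::hash<Key>,
         typename Pred = std::equal_to<Key>>
class toy_unordered_set
{
  static_assert(std::is_same<typename std::remove_cv<Key>::type, Key>::value,
                "toy_unordered_set must have a non-const, non-volatile key type");
  static_assert(is_callable_with<Hash, const Key&>::value,
                "hash function must be invocable with an argument of key type");
  static_assert(is_callable_with<Pred, const Key&, const Key&>::value,
                "key equality predicate must be invocable with two arguments"
                " of key type");
public:
  bool empty() const { return true; }
};
```

Instantiating toy_unordered_set<int> compiles, while
toy_unordered_set<int, std::equal_to<int>, std::hash<int>> (hasher and
predicate swapped) should stop at the two static assertions instead of
failing somewhere deep inside the implementation.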


PR libstdc++/48101
* include/bits/allocator.h (allocator)
(allocator, allocator): Add partial
specializations.
* include/bits/forward_list.h (forward_list): Add static assertions.
* include/bits/hashtable.h (__cache_default): Use
__is_nothrow_invocable instead of __is_noexcept_hash.
(_Hashtable): Add static assertions.
* include/bits/hashtable_policy.h (__is_noexcept_hash): Remove.
* include/bits/stl_deque.h (deque): Add static assertions.
* include/bits/stl_function.h (_Identity): Add partial
specialization.
* include/bits/stl_list.h (list): Add static assertions.
* include/bits/stl_map.h (map): Likewise.
* include/bits/stl_multimap.h (multimap): Likewise.
* include/bits/stl_multiset.h (multiset): Likewise.
* include/bits/stl_set.h (set): Likewise.
* include/bits/stl_tree.h (_Rb_tree): Likewise.
* include/bits/stl_vector.h (vector): Likewise.
* include/bits/unordered_map.h (unordered_map, unordered_multimap):
Use typename instead of class in template-parameter-list and remove
spaces.
* include/bits/unordered_set.h (unordered_set, unordered_multiset):
Likewise.
* testsuite/23_containers/deque/48101-2_neg.cc: New test.
* testsuite/23_containers/deque/48101_neg.cc: New test.
* testsuite/23_containers/forward_list/48101-2_neg.cc: New test.
* testsuite/23_containers/forward_list/48101_neg.cc: New test.
* testsuite/23_containers/list/48101-2_neg.cc: New test.
* testsuite/23_containers/list/48101_neg.cc: New test.
* testsuite/23_containers/map/48101-2_neg.cc: New test.
* testsuite/23_containers/map/48101_neg.cc: New test.
* testsuite/23_containers/multimap/48101-2_neg.cc: New test.
* testsuite/23_containers/multimap/48101_neg.cc: New test.
* testsuite/23_containers/multiset/48101-2_neg.cc: New test.
* testsuite/23_containers/multiset/48101_neg.cc: New test.
* testsuite/23_containers/set/48101-2_neg.cc: New test.
* testsuite/23_containers/set/48101_neg.cc: New test.
* testsuite/23_containers/unordered_map/48101-2_neg.cc: New test.
* testsuite/23_containers/unordered_map/48101_neg.cc: New test.
* testsuite/23_containers/unordered_multimap/48101-2_neg.cc: New test.
* testsuite/23_containers/unordered_multimap/48101_neg.cc: New test.
* testsuite/23_containers/unordered_multiset/48101-2_neg.cc: New test.
* testsuite/23_containers/unordered_multiset/48101_neg.cc: New test.
* testsuite/23_containers/unordered_set/48101-2_neg.cc: New test.
* testsuite/23_containers/unordered_set/48101_neg.cc: New test.
* testsuite/23_containers/vector/48101-2_neg.cc: New test.
* testsuite/23_containers/vector/48101_neg.cc: New test.
commit be65a3f6c9738c646b37fdfe1508b15e15c093f5
Author: Jonathan Wakely 
Date:   Tue Nov 21 18:36:26 2017 +

PR libstdc++/48101 improve errors for invalid container specializations

committed: remove unused function (PR 83099)

2017-11-21 Thread Martin Sebor

My last commit included an unused static function that caused
a -Wunused-function warning/error.  I removed the function in
r255036 to unblock the bootstrap and am retesting everything in
the meantime.

Martin


Re: Add optabs for common types of permutation

2017-11-21 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Nov 20, 2017 at 1:35 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Mon, Nov 20, 2017 at 12:56 AM, Jeff Law  wrote:
>>>> On 11/09/2017 06:24 AM, Richard Sandiford wrote:
> ...so that we can use them for variable-length vectors.  For now
> constant-length vectors continue to use VEC_PERM_EXPR and the
> vec_perm_const optab even for cases that the new optabs could
> handle.
>
> The vector optabs are inconsistent about whether there should be
> an underscore before the mode part of the name, but the other lo/hi
> optabs have one.
>
> Doing this means that we're able to optimise some SLP tests using
> non-SLP (for now) on targets with variable-length vectors, so the
> patch needs to add a few XFAILs.  Most of these go away with later
> patches.
>
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.  OK to install?
>
> Richard
>
>
> 2017-11-09  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
>
> gcc/
>   * doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
>   (vec_extract_even, vec_extract_odd): Document new optabs.
>   * internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
>   (VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
>   functions.
>   * optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
>   (vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
>   New optabs.
>   * tree-vect-data-refs.c: Include internal-fn.h.
>   (vect_grouped_store_supported): Try using
>   IFN_VEC_INTERLEAVE_{LO,HI}.
>   (vect_permute_store_chain): Use them here too.
>   (vect_grouped_load_supported): Try using IFN_VEC_EXTRACT_{EVEN,ODD}.
>   (vect_permute_load_chain): Use them here too.
>   * tree-vect-stmts.c (can_reverse_vector_p): New function.
>   (get_negative_load_store_type): Use it.
>   (reverse_vector): New function.
>   (vectorizable_store, vectorizable_load): Use it.
>   * config/aarch64/iterators.md (perm_optab): New iterator.
>   * config/aarch64/aarch64-sve.md (_): New expander.
>   (vec_reverse_): Likewise.
>
> gcc/testsuite/
>   * gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
>   * gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
>   * gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
>   * gcc.dg/vect/pr68445.c: Likewise.
>   * gcc.dg/vect/slp-12a.c: Likewise.
>   * gcc.dg/vect/slp-13-big-array.c: Likewise.
>   * gcc.dg/vect/slp-13.c: Likewise.
>   * gcc.dg/vect/slp-14.c: Likewise.
>   * gcc.dg/vect/slp-15.c: Likewise.
>   * gcc.dg/vect/slp-42.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-2.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-4.c: Likewise.
>   * gcc.dg/vect/slp-multitypes-5.c: Likewise.
>   * gcc.dg/vect/slp-reduc-4.c: Likewise.
>   * gcc.dg/vect/slp-reduc-7.c: Likewise.
>   * gcc.target/aarch64/sve_vec_perm_2.c: New test.
>   * gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
>   * gcc.target/aarch64/sve_vec_perm_3.c: New test.
>   * gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
>   * gcc.target/aarch64/sve_vec_perm_4.c: New test.
>   * gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.
>>>> OK.
>>>
>>> It's really a step backwards - we had those optabs and a tree code in
>>> the past and canonicalizing things to VEC_PERM_EXPR made things simpler.
>>>
>>> Why doesn't VEC_PERM  not work?
>>
>> The problems with that are:
>>
>> - It doesn't work for vectors with 256-bit elements because the indices
>>   wrap round.
>
> That's a general issue that would need to be addressed for larger
> vectors (GCN?).
> I presume the requirement that the permutation vector have the same size
> needs to be relaxed.
>
>> - Supporting a fake VEC_PERM_EXPR  for a few
>>   special cases would be hard, especially since v256hi isn't a normal
>>   vector mode.  I imagine everything dealing with VEC_PERM_EXPR would
>>   then have to worry about that special case.
>
> I think it's not really a special case - any code here should just
> expect the same number of vector elements and not a particular size.
> You already dealt with using a char[] vector for permutations I think.

It sounds like you're talking about the case in which the permutation
vector is a VECTOR_CST.  We still use VEC_PERM_EXPRs for constant-length
vectors, so that doesn't change.  (And yes, that probably means that it
does break for *fixed-length* 2048-bit vectors.)
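
For readers unfamiliar with the permutations the patch names (interleave
lo/hi, extract even/odd, reverse), here is a scalar C++ sketch of what they
select. It is a model, not GCC code: the function names are hypothetical,
and "lo" is assumed to mean the first half in array order, whereas the real
optabs may define lo/hi relative to the target's element numbering. The
point of the sketch is that each operation is described by a rule rather
than by a list of indices, so nothing depends on knowing the vector length
at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using vec = std::vector<int>;

// Interleave the first halves of a and b: a0 b0 a1 b1 ...
vec interleave_lo(const vec &a, const vec &b)
{
  assert(a.size() == b.size() && a.size() % 2 == 0);
  vec r;
  for (std::size_t i = 0; i < a.size() / 2; ++i)
    {
      r.push_back(a[i]);
      r.push_back(b[i]);
    }
  return r;
}

// Interleave the second halves of a and b.
vec interleave_hi(const vec &a, const vec &b)
{
  assert(a.size() == b.size() && a.size() % 2 == 0);
  vec r;
  for (std::size_t i = a.size() / 2; i < a.size(); ++i)
    {
      r.push_back(a[i]);
      r.push_back(b[i]);
    }
  return r;
}

// Select the elements at even (parity 0) or odd (parity 1) positions of
// the concatenation of a and b.
vec extract_parity(const vec &a, const vec &b, std::size_t parity)
{
  assert(a.size() == b.size() && a.size() % 2 == 0 && parity < 2);
  vec r;
  for (std::size_t i = parity; i < 2 * a.size(); i += 2)
    r.push_back(i < a.size() ? a[i] : b[i - a.size()]);
  return r;
}

// Reverse the order of the elements.
vec reverse_elems(const vec &a)
{
  return vec(a.rbegin(), a.rend());
}
```

With a = {0, 1, 2, 3} and b = {4, 5, 6, 7}, interleave_lo gives {0, 4, 1, 5},
interleave_hi gives {2, 6, 3, 7}, extract_parity(a, b, 0) gives {0, 2, 4, 6}
and reverse_elems(a) gives {3, 2, 1, 0}.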

But this patch is about the variable-length case, in which the
permutation vector is never a VECTOR_CST, and couldn't get converted
to a vec_perm_indices array.  As far

[PATCH] Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin

2017-11-21 Thread Tom de Vries

On 11/07/2017 03:54 PM, Jakub Jelinek wrote:

On Tue, Nov 07, 2017 at 06:48:25AM -0800, Cesar Philippidis wrote:

Changes in the patch series:
- removed OPENACC_ from environment variable names
- made temp files use gomp-nvptx prefix.
- fixed build error due to missing _GNU_SOURCE in libgomp-nvptx.c.
- merged the three GOMP_NVPTX_JIT patches into one
- rewrote GOMP_NVPTX_JIT to add no extra flags to the JIT compiler
   invocation if GOMP_NVPTX_JIT is not defined, removing the need for
   hardcoding default values
- added CU_JIT_TARGET to plugin/cuda/cuda.h

Built on x86_64 with nvptx offloading enabled (using plugin/cuda/cuda.h).

The patch series now looks like:
1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx
    plugin

I'll repost the patch series in reply to this email.


Ping.

Can we get this patch series into trunk and og7? The ability to easily
modify PTX code, via GOMP_NVPTX_PTXRW, is extremely helpful. It helped
me isolate one problem already.


It can be helpful for debugging, but I'm afraid about having such code in
production, I think something like this would be very easy to exploit.
Sure, running a suid or sgid program with offloading is probably very
dangerous anyway, but it could be just some minor privilege escalation
in the app (SELinux, ACLs, whatever else) and this stuff would allow anyone
to run anything else.
So, IMNSHO if it should be added, only enabled by non-default configure
option.


Hi,

I've made the GOMP_NVPTX_PTXRW patch stand-alone, and added an 
off-by-default libgomp configure option 
--enable-libgomp-plugin-developer-only-options, which sets a config.h 
macro LIBGOMP_PLUGIN_DEVELOPER_ONLY_OPTIONS, which is used to 
enable/disable the GOMP_NVPTX_PTXRW functionality.


I've built this on x86_64 for the nvptx accelerator, both with and without 
the configure option, and confirmed that in one case using 
GOMP_NVPTX_PTXRW=w generates a gomp-nvptx.0.ptx file, and in the other 
case it doesn't.
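
The gating idea can be sketched in a few lines of C++. This is not the code
from the patch: ptxrw_mode, parse_ptxrw and ptxrw_from_environment are
hypothetical names, and the accepted values ("r", "w", "rw") are an
assumption based on the read/write function names in the ChangeLog. Only the
macro LIBGOMP_PLUGIN_DEVELOPER_ONLY_OPTIONS and the variable
GOMP_NVPTX_PTXRW come from the description above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical flags; the real patch keeps its state in gomp_nvptx_ptxrw.
struct ptxrw_mode
{
  bool read;
  bool write;
};

// Parse a value such as "r", "w" or "rw". Unknown letters are ignored,
// which is an assumption of this sketch.
ptxrw_mode parse_ptxrw(const char *value)
{
  ptxrw_mode m = { false, false };
#ifdef LIBGOMP_PLUGIN_DEVELOPER_ONLY_OPTIONS
  if (value != nullptr)
    {
      m.read = std::strchr(value, 'r') != nullptr;
      m.write = std::strchr(value, 'w') != nullptr;
    }
#else
  (void) value;  // feature compiled out: the variable has no effect
#endif
  return m;
}

ptxrw_mode ptxrw_from_environment()
{
  ptxrw_mode m = parse_ptxrw(std::getenv("GOMP_NVPTX_PTXRW"));
#ifndef LIBGOMP_PLUGIN_DEVELOPER_ONLY_OPTIONS
  // Default builds must never act on the variable.
  assert(!m.read && !m.write);
#endif
  return m;
}
```

Because the check is made at compile time, a default build contains no code
path that reads or writes PTX files on behalf of the environment, which is
what addresses the security concern raised earlier in the thread.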


OK for trunk if x86_64 bootstrap and reg-test succeeds?

Thanks,
- Tom
Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin

2017-11-21  Tom de Vries  

	* plugin/plugin-nvptx.c (_GNU_SOURCE): Define.
	(gomp_nvptx_ptxrw): New static variable.
	(parse_gomp_nvptx_ptxrw, post_process_ptx_write, post_process_ptx_read)
	(post_process_ptx): New functions.
	(link_ptx): Call post_process_ptx.
	* configure.ac: Add configure option
	--enable-libgomp-plugin-developer-only-options.
	* config.h.in: Regenerate.
	* configure: Same.

---
 libgomp/config.h.in   |   3 +
 libgomp/configure |  32 -
 libgomp/configure.ac  |  11 +++
 libgomp/plugin/plugin-nvptx.c | 160 +-
 4 files changed, 202 insertions(+), 4 deletions(-)

diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index e7bc4d97374..68cccea4186 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -118,6 +118,9 @@
 /* Define to 1 if building libgomp for an accelerator-only target. */
 #undef LIBGOMP_OFFLOADED_ONLY
 
+/* Define to 1 if libgomp plugin developer-only options are enabled. */
+#undef LIBGOMP_PLUGIN_DEVELOPER_ONLY_OPTIONS
+
 /* Define to 1 if libgomp should use POSIX threads. */
 #undef LIBGOMP_USE_PTHREADS
 
diff --git a/libgomp/configure b/libgomp/configure
index e7842b5519f..14e39d7fbec 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -780,6 +780,7 @@ ac_subst_files=''
 ac_user_opts='
 enable_option_checking
 enable_version_specific_runtime_libs
+enable_libgomp_plugin_developer_only_options
 enable_generated_files_in_srcdir
 enable_multilib
 enable_dependency_tracking
@@ -1434,6 +1435,9 @@ Optional Features:
   --enable-version-specific-runtime-libs
   Specify that runtime libraries should be installed
   in a compiler-specific directory [default=no]
+  --enable-libgomp-plugin-developer-only-options
+  Specify that libgomp plugins should be build with
+  additional developer-only options [default=no]
   --enable-generated-files-in-srcdir
   put copies of generated files in source dir intended
   for creating source tarballs for users without
@@ -2627,6 +2631,30 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $enable_version_specific_runtime_libs" >&5
 $as_echo "$enable_version_specific_runtime_libs" >&6; }
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for --enable-libgomp-plugin-developer-only-options" >&5
+$as_echo_n "checking for --enable-libgomp-plugin-developer-only-options... " >&6; }
+ # Check whether --enable-libgomp-plugin-developer-only-options was given.
+if test "${enable_libgomp_plugin_developer_only_options+set}" = set; then :
+  enableval=$enable_libgomp_plugin_developer_only_options;
+  case "$enableval" in
+   yes|no) ;;
+   *) as_f
