introduce -fcallgraph-info option

2019-10-25 Thread Alexandre Oliva
This was first submitted many years ago
https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html

The command line option -fcallgraph-info is added and makes the
compiler generate another output file (xxx.ci) for each compilation
unit, which is a valid VCG file (you can launch your favorite VCG
viewer on it unmodified) and contains the "final" callgraph of the
unit.  "final" is a bit of a misnomer as this is actually the
callgraph at RTL expansion time, but since most high-level
optimizations are done at the Tree level and RTL doesn't usually
fiddle with calls, it's final in almost all cases.  Moreover, the
nodes can be decorated with additional info: -fcallgraph-info=su adds
stack usage info and -fcallgraph-info=da dynamic allocation info.
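For readers unfamiliar with VCG, the generated xxx.ci file would contain a plain-text graph description roughly along these lines (the node and edge contents here are illustrative, not actual compiler output):

```
graph: { title: "unit.c"
  node: { title: "main" label: "main\n16 bytes (static)" }
  node: { title: "foo" label: "foo" }
  edge: { sourcename: "main" targetname: "foo" label: "unit.c:5" }
}
```

With -fcallgraph-info=su, the stack-usage figures are folded into the node labels, which is what tools like gnatstack consume.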

Compared with the earlier version, this patch does not modify cgraph,
and instead adds the required information next to the stack usage
function data structure, so we only hold one of those at a time.  I've
switched to vecs from linked lists, for more compact edges and dynamic
allocation annotations, and arranged for them to be released as soon as
we've printed out the information.  I have NOT changed the file format,
because existing tools such as gnatstack consume the current format.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog
From: Eric Botcazou, Alexandre Oliva

* common.opt (-fcallgraph-info[=]): New option.
* doc/invoke.texi (Debugging options): Document it.
* opts.c (common_handle_option): Handle it.
* builtins.c (expand_builtin_alloca): Record allocation if
-fcallgraph-info=da.
* calls.c (expand_call): If -fcallgraph-info, record the call.
(emit_library_call_value_1): Likewise.
* flag-types.h (enum callgraph_info_type): New type.
* explow.c: Include stringpool.h.
(set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol.
* function.c (allocate_stack_usage_info): New.
(allocate_struct_function): Call it for -fcallgraph-info.
(prepare_function_start): Call it otherwise.
(rest_of_handle_thread_prologue_and_epilogue): Release callees
and dallocs after output_stack_usage.
(record_final_call, record_dynamic_alloc): New.
* function.h (struct callee, struct dalloc): New.
(struct stack_usage): Add callees and dallocs.
(record_final_call, record_dynamic_alloc): Declare.
* gimplify.c (gimplify_decl_expr): Record dynamically-allocated
object if -fcallgraph-info=da.
* optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL.
* print-tree.h (print_decl_identifier): Declare.
(PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New.
* print-tree.c: Include print-tree.h.
(print_decl_identifier): New function.
* toplev.c: Include print-tree.h.
(callgraph_info_file): New global variable.
(callgraph_info_indirect_emitted): Likewise.
(output_stack_usage): Rename to...
(output_stack_usage_1): ... this.  Make it static, add cf
parameter.  If -fcallgraph-info=su, print stack usage to cf.
If -fstack-usage, use print_decl_identifier for
pretty-printing.
(INDIRECT_CALL_NAME): New.
(dump_final_indirect_call_node_vcg): New.
(dump_final_callee_vcg, dump_final_node_vcg): New.
(output_stack_usage): New.
(lang_dependent_init): Open and start file if
-fcallgraph-info.
(finalize): If callgraph_info_file is not null, finish it,
close it, and reset callgraph info state.

for  gcc/ada/ChangeLog

* gcc-interface/misc.c (callgraph_info_file): Delete.
---
 gcc/ada/gcc-interface/misc.c |3 -
 gcc/builtins.c   |4 +
 gcc/calls.c  |6 +
 gcc/common.opt   |8 ++
 gcc/doc/invoke.texi  |   17 
 gcc/explow.c |5 +
 gcc/flag-types.h |   16 
 gcc/function.c   |   63 ++--
 gcc/function.h   |   25 ++
 gcc/gimplify.c   |4 +
 gcc/optabs-libfuncs.c|4 -
 gcc/opts.c   |   26 ++
 gcc/output.h |2 
 gcc/print-tree.c |   76 +++
 gcc/print-tree.h |4 +
 gcc/toplev.c |  169 ++
 16 files changed, 381 insertions(+), 51 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 4abd4d5708a54..d68b37384ff7f 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -54,9 +54,6 @@
 #include "ada-tree.h"
 #include "gigi.h"
 
-/* This symbol needs to be defined for the front-end.  */
-void *callgraph_info_file = NULL;
-
 /* Command-line argc and argv.  These variables are global since they are
imported in back_end.adb.  */
 unsigned int save_argc;
diff --git a/gcc/builtins.c 

[FYI] fix cgraph comment

2019-10-25 Thread Alexandre Oliva
This fix for a cut-and-paste error in a comment was split out of another
patch I'm about to contribute, as the current version of that patch no
longer touches cgraph data structures.

I'm checking it in as obvious.


From: Eric Botcazou 

for  gcc/ChangeLog

* cgraph.c (cgraph_node::rtl_info): Fix cut in comment.
* cgraph.h (cgraph_node::rtl_info): Likewise.
---
 gcc/cgraph.c |2 +-
 gcc/cgraph.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 8b752d8380981..e83bf2001e257 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1846,7 +1846,7 @@ cgraph_node::local_info (tree decl)
   return >ultimate_alias_target ()->local;
 }
 
-/* Return local info for the compiled function.  */
+/* Return RTL info for the compiled function.  */
 
 cgraph_rtl_info *
 cgraph_node::rtl_info (const_tree decl)
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 733d616fb8c3a..a7f357f0b5c07 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1381,7 +1381,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
public symtab_node
   /* Return local info for the compiled function.  */
   static cgraph_local_info *local_info (tree decl);
 
-  /* Return local info for the compiled function.  */
+  /* Return RTL info for the compiled function.  */
   static struct cgraph_rtl_info *rtl_info (const_tree);
 
   /* Return the cgraph node that has ASMNAME for its DECL_ASSEMBLER_NAME.

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-10-25 Thread Kelvin Nilsen


This patch adds a new optimization pass for rs6000 targets.

This new pass scans existing rtl expressions and replaces X-form loads and 
stores with rtl expressions that favor selection of the D-form instructions in 
contexts for which the D-form instructions are preferred.  The new pass runs 
after the RTL loop optimizations since loop unrolling often introduces 
opportunities for beneficial replacements of X-form addressing instructions.

For each of the new tests, multiple X-form instructions are replaced with 
D-form instructions, some addi instructions are replaced with add instructions, 
and some addi instructions are eliminated.  The typical improvement for the 
included tests is a decrease of 4.28% to 12.12% in the number of instructions 
executed on each iteration of the loop.  The optimization has not shown 
measurable improvement on specmark tests, presumably because the typical loops 
that benefit from this optimization are memory bound and this 
optimization does not eliminate memory loads or stores.  However, it is 
anticipated that multi-threaded workloads and measurements of total power and 
cooling costs for heavy server workloads would benefit.

This version 4 patch responds to feedback and numerous suggestions by Segher:

  1. Further improvements to comments and discussion of computational 
complexity.

  2. Changed the name of insn_sequence_no to luid.

  3. Fixed some typos in comments.

  4. Added macro-defined constants to enforce upper bounds on the sizes (and 
number of required iterations) for certain data structures.  The intent is to 
bound compile time for programs that represent large numbers of opportunities 
for D-form replacements.  This optimization pass ignores parts of a source 
program that exceed these macro-defined size limits.

In a separate mail, I have sent a discussion of the behavior of preceding 
passes and how that behavior relates to this new pass.

I have built and regression tested this patch on the powerpc64le-unknown-linux 
target with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2019-10-25  Kelvin Nilsen  

* config/rs6000/rs6000-p9dform.c: New file.
* config/rs6000/rs6000-passes.def: Add pass_insert_dform.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New function prototype.
(make_pass_insert_dform): Likewise.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target.
* config.gcc: Add rs6000-p9dform.o object file.

gcc/testsuite/ChangeLog:

2019-10-25  Kelvin Nilsen  

* gcc.target/powerpc/p9-dform-0.c: New test.
* gcc.target/powerpc/p9-dform-1.c: New test.
* gcc.target/powerpc/p9-dform-10.c: New test.
* gcc.target/powerpc/p9-dform-11.c: New test.
* gcc.target/powerpc/p9-dform-12.c: New test.
* gcc.target/powerpc/p9-dform-13.c: New test.
* gcc.target/powerpc/p9-dform-14.c: New test.
* gcc.target/powerpc/p9-dform-15.c: New test.
* gcc.target/powerpc/p9-dform-2.c: New test.
* gcc.target/powerpc/p9-dform-3.c: New test.
* gcc.target/powerpc/p9-dform-4.c: New test.
* gcc.target/powerpc/p9-dform-5.c: New test.
* gcc.target/powerpc/p9-dform-6.c: New test.
* gcc.target/powerpc/p9-dform-7.c: New test.
* gcc.target/powerpc/p9-dform-8.c: New test.
* gcc.target/powerpc/p9-dform-9.c: New test.
* gcc.target/powerpc/p9-dform-generic.h: New test.

Index: gcc/config/rs6000/rs6000-p9dform.c
===================================================================
--- gcc/config/rs6000/rs6000-p9dform.c  (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c  (working copy)
@@ -0,0 +1,1763 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"

Re: [PING^3][PATCH 3/3][DejaGNU] target: Wrap linker flags into `-largs'/`-margs' for Ada

2019-10-25 Thread Jacob Bachmeyer

Maciej W. Rozycki wrote:

On Tue, 21 May 2019, Jacob Bachmeyer wrote:
  
 IOW I don't discourage you from developing a comprehensive solution, 
however applying my proposal right away will help at least some people and 
will not block you in any way.
  
  
Correct, although, considering how long my FSF paperwork took, I might 
be able to finish a comprehensive patch before your paperwork is 
completed.  :-)



 So by now the FSF paperwork has been long completed actually.
  


Yes, that turned out to be optimistic, for a few reasons.


 You are welcome to go ahead with your effort as far as I am concerned.
  
  

I am working on it.  :-)



 Hopefully you have made good progress now.


Progress is stalled because the DejaGNU maintainer has not been seen on 
this list since March and I am unsure about working too far ahead of the 
"accepted" line.



Otherwise this is a ping for:



Once complete your change can go on top of mine and meanwhile we'll have 
a working GCC test suite.
  


There might be a merge conflict, but that will be easy to resolve by 
overwriting your patch with mine.  I will make sure to include the 
functionality in the rewrite.


I have just sent a patch to the list that has been waiting in my local 
repository since June.  It adds unit tests for the 
default_target_compile procedure, but currently verifies the broken Ada 
handling.  Would you be willing to supply a patch to update those tests 
to the correct behavior?  If so, I will also merge your code on my local 
branch and we might even avoid the merge conflict down the line.


While you are doing that, could you also explain what the various -?args 
GNU Ada driver options do and if any others are needed or could be 
needed?  I will ensure that the rewrite handles all cases if I can get a 
solid description of what those cases actually are.  The rewrite will 
group compiler/linker/etc. options in separate lists internally, so 
using those options will be easy without adding more hacks to a 
procedure that has already become a tissue of hacks.


-- Jacob


Merge from trunk to gccgo branch

2019-10-25 Thread Ian Lance Taylor
I merged trunk revision 277462 to the gccgo branch.

Ian


Re: [PATCH] rs6000: Enable limited unrolling at -O2

2019-10-25 Thread Segher Boessenkool
Hi Jiufu,

On Fri, Oct 25, 2019 at 10:44:39PM +0800, Jiufu Guo wrote:
> In PR88760, there are a few disscussion about improve or tune unroller for
> targets. And we would agree to enable unroller for small loops at O2 first.

[ snip ]

>   PR tree-optimization/88760
>   * config/rs6000/rs6000-common.c (rs6000_option_optimization_table): for
>   O2, enable funroll-loops.

* config/rs6000/rs6000-common.c (rs6000_option_optimization_table):
Enable -funroll-loops for -O2 and above.

>   * config/rs6000/rs6000.c (rs6000_option_override_internal): if unroller
>   is enabled throught O2, set constrains to PARAM_MAX_UNROLL_TIMES=2 and
>   PARAM_MAX_UNROLLED_INSNS=20 for small loops.

* config/rs6000/rs6000.c (rs6000_option_override_internal): Set
PARAM_MAX_UNROLL_TIMES to 2 and PARAM_MAX_UNROLLED_INSNS to 20 if the
unroller is not explicitly enabled.

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4540,6 +4540,26 @@ rs6000_option_override_internal (bool global_init_p)
>global_options.x_param_values,
>global_options_set.x_param_values);
>  
> +  /* unroll very small loops 2 time if no -funroll-loops.  */
> +  if (!global_options_set.x_flag_unroll_loops
> +   && !global_options_set.x_flag_unroll_all_loops)
> + {
> +   maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
> +  global_options.x_param_values,
> +  global_options_set.x_param_values);
> +
> +   maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
> +  global_options.x_param_values,
> +  global_options_set.x_param_values);
> +
> +   /* If fweb or frename-registers are not specificed in command-line,

(specified)

> +  do not turn them on implicitly.  */
> +   if (!global_options_set.x_flag_web)
> + global_options.x_flag_web = 0;
> +   if (!global_options_set.x_flag_rename_registers)
> + global_options.x_flag_rename_registers = 0;
> + }

This web and rnreg thing needs to be in the changelog, too.

> diff --git a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c 
> b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> index c9b8046..17aa5c6 100644
> --- a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> +++ b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
> @@ -1,4 +1,5 @@
>  /* { dg-shouldfail "tsan" } */
> +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc-*-* 
> powerpc64-*-* powerpc64le-*-* } } } */

You can write that as just  { target { powerpc*-*-* } }

Could you explain why this is needed here?  In the test file itself,
preferably.

> --- a/gcc/testsuite/gcc.dg/pr59643.c
> +++ b/gcc/testsuite/gcc.dg/pr59643.c
> @@ -1,6 +1,7 @@
>  /* PR tree-optimization/59643 */
>  /* { dg-do compile } */
>  /* { dg-options "-O3 -fdump-tree-pcom-details" } */
> +/* { dg-additional-options "-fno-unroll-loops" { target { powerpc-*-* 
> powerpc64-*-* powerpc64le-*-* } } } */

Same here.  How does this patch change behaviour here, anyway?  The test
uses -O3?

> --- a/gcc/testsuite/gcc.target/powerpc/loop_align.c
> +++ b/gcc/testsuite/gcc.target/powerpc/loop_align.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
> -/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16" } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power7 -falign-functions=16 
> -fno-unroll-loops" } */
>  /* { dg-final { scan-assembler ".p2align 5" } } */

For this one it is pretty much obvious.

> --- a/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
> -/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math" } */
> +/* { dg-options "-O3 -ftree-vectorize -mdejagnu-cpu=power7 -ffast-math 
> -fno-unroll-loops" } */

And here, and all other powerpc-specific tests (they count the generated
machine instructions).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-loop2_unroll" } */
> +
> +void __attribute__ ((noinline)) foo(int n, int *arr)
> +{
> +  int i;
> +  for (i = 0; i < n; i++)
> +arr[i] = arr[i] - 10;
> +}
> +/* { dg-final { scan-rtl-dump-times "Unrolled loop 1 times" 1 "loop2_unroll" 
> } } */
> +/* { dg-final { scan-assembler-times "lwz" 3 } } */
> +/* { dg-final { scan-assembler-times "stw" 3 } } */

/* { dg-final { scan-assembler-times {\mlwz\M} 3 } } */
/* { dg-final { scan-assembler-times {\mstw\M} 3 } } */

so that it doesn't count when lwz or stw is only part of the word.  This
doesn't so easily matter for 

Re: PR92163

2019-10-25 Thread Prathamesh Kulkarni
On Fri, 25 Oct 2019 at 13:19, Richard Biener  wrote:
>
> On Wed, Oct 23, 2019 at 11:45 PM Prathamesh Kulkarni
>  wrote:
> >
> > Hi,
> > The attached patch tries to fix PR92163 by calling
> > gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh cleanup.
> > Does it look OK ?
>
> Hmm.  I think it shows an issue with the return value of 
> remove_stmt_from_eh_lp
> which is true if the LP index is -1 (externally throwing).  We don't
> need to purge
> any edges in that case.  That is, if-conversion should never need to
> do EH purging
> since that would be wrong-code.
>
> As of the segfault can you please instead either pass down need_eh_cleanup
> as function parameter (and NULL from ifcvt) or use the return value in DSE
> to set the bit in the caller.
Hi Richard,
Thanks for the suggestions, does the attached patch look OK ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
> > Thanks,
> > Prathamesh
2019-10-25  Prathamesh Kulkarni  

PR tree-optimization/92163
* tree-ssa-dse.c (delete_dead_or_redundant_assignment): New param
need_eh_cleanup with default value NULL. Gate on need_eh_cleanup
before calling bitmap_set_bit.
(dse_optimize_redundant_stores): Pass global need_eh_cleanup to
delete_dead_or_redundant_assignment.
(dse_dom_walker::dse_optimize_stmt): Likewise.
* tree-ssa-dse.h (delete_dead_or_redundant_assignment): Adjust 
prototype.

testsuite/
* gcc.dg/tree-ssa/pr92163.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
new file mode 100644
index 000..58f548fe76b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
@@ -0,0 +1,16 @@
+/* { dg-do "compile" } */
+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fopenacc" } */
+
+void
+xr (int *k7)
+{
+  int qa;
+
+#pragma acc parallel
+#pragma acc loop vector
+  for (qa = 0; qa < 3; ++qa)
+if (qa % 2 != 0)
+  k7[qa] = 0;
+else
+  k7[qa] = 1;
+}
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 25cd4709b31..21a15eef690 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -77,7 +77,6 @@ along with GCC; see the file COPYING3.  If not see
fact, they are the same transformation applied to different views of
the CFG.  */
 
-void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char 
*);
 static void delete_dead_or_redundant_call (gimple_stmt_iterator *, const char 
*);
 
 /* Bitmap of blocks that have had EH statements cleaned.  We should
@@ -639,7 +638,8 @@ dse_optimize_redundant_stores (gimple *stmt)
{
  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
  if (is_gimple_assign (use_stmt))
-   delete_dead_or_redundant_assignment (&gsi, "redundant");
+   delete_dead_or_redundant_assignment (&gsi, "redundant",
+need_eh_cleanup);
  else if (is_gimple_call (use_stmt))
delete_dead_or_redundant_call (&gsi, "redundant");
  else
@@ -900,7 +900,8 @@ delete_dead_or_redundant_call (gimple_stmt_iterator *gsi, 
const char *type)
 /* Delete a dead store at GSI, which is a gimple assignment. */
 
 void
-delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char 
*type)
+delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char 
*type,
+bitmap need_eh_cleanup)
 {
   gimple *stmt = gsi_stmt (*gsi);
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -915,7 +916,7 @@ delete_dead_or_redundant_assignment (gimple_stmt_iterator 
*gsi, const char *type
 
   /* Remove the dead store.  */
   basic_block bb = gimple_bb (stmt);
-  if (gsi_remove (gsi, true))
+  if (gsi_remove (gsi, true) && need_eh_cleanup)
 bitmap_set_bit (need_eh_cleanup, bb->index);
 
   /* And release any SSA_NAMEs set in this statement back to the
@@ -1059,7 +1060,7 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator 
*gsi)
  && !by_clobber_p)
return;
 
-  delete_dead_or_redundant_assignment (gsi, "dead");
+  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
 }
 }
 
diff --git a/gcc/tree-ssa-dse.h b/gcc/tree-ssa-dse.h
index a5eccbd746d..2658f92b1bb 100644
--- a/gcc/tree-ssa-dse.h
+++ b/gcc/tree-ssa-dse.h
@@ -31,6 +31,7 @@ enum dse_store_status
 dse_store_status dse_classify_store (ao_ref *, gimple *, bool, sbitmap,
 bool * = NULL, tree = NULL);
 
-void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char 
*);
+void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *,
+ bitmap = NULL);
 
 #endif   /* GCC_TREE_SSA_DSE_H  */


Re: [PATCH] PR85678: Change default to -fno-common

2019-10-25 Thread Segher Boessenkool
On Fri, Oct 25, 2019 at 03:47:10PM +, Wilco Dijkstra wrote:
> GCC currently defaults to -fcommon.  As discussed in the PR, this is an 
> ancient
> C feature which is not conforming with the latest C standards.  On many 
> targets
> this means global variable accesses have a codesize and performance penalty.
> This applies to C code only, C++ code is not affected by -fcommon.  It is 
> about
> time to change the default.

Does this actually work on all older OSes (etc.) we support?  Only one
way to find out, of course :-)


Segher


PSA: Nasty lurking bug causing string comparison to be eliminated incorrectly

2019-10-25 Thread Jeff Law
Ugh.  I should have caught this earlier.

My Fedora tester failed recently on the "flatbuffers" package.  It
worked on Oct 6th GCC snapshot, but was failing by Oct 13th snapshot.

Bisection ultimately landed on:

> commit d9d534895b775a453b8d8d291ef72d6dfa5f9e52 (HEAD, refs/bisect/bad)
> Author: msebor 
> Date:   Wed Oct 9 21:35:11 2019 +
> 
> PR tree-optimization/90879 - fold zero-equality of strcmp between a 
> longer string and a smaller array
> 
> gcc/c-family/ChangeLog:
> 
> PR tree-optimization/90879
> * c.opt (-Wstring-compare): New option.
> 
> gcc/testsuite/ChangeLog:
> 
> PR tree-optimization/90879
> * gcc.dg/Wstring-compare-2.c: New test.
> * gcc.dg/Wstring-compare.c: New test.
> * gcc.dg/strcmpopt_3.c: Scan the optmized dump instead of strlen.
> * gcc.dg/strcmpopt_6.c: New test.
> * gcc.dg/strlenopt-65.c: Remove uinnecessary declarations, add
> test cases.
> * gcc.dg/strlenopt-66.c: Run it.
> * gcc.dg/strlenopt-68.c: New test.
> 
> gcc/ChangeLog:
> 
> PR tree-optimization/90879
> * builtins.c (check_access): Avoid using maxbound when null.
> * calls.c (maybe_warn_nonstring_arg): Adjust to get_range_strlen 
> change.
> * doc/invoke.texi (-Wstring-compare): Document new warning option.
> * gimple-fold.c (get_range_strlen_tree): Make setting maxbound
> conditional.
> (get_range_strlen): Overwrite initial maxbound when non-null.
> * gimple-ssa-sprintf.c (get_string_length): Adjust to 
> get_range_strlen
> changes.
> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Same.
> (used_only_for_zero_equality): New function.
> (handle_builtin_memcmp): Call it.
> (determine_min_objsize): Return an integer instead of tree.
> (get_len_or_size, strxcmp_eqz_result): New functions.
> (maybe_warn_pointless_strcmp): New function.
> (handle_builtin_string_cmp): Call it.  Fold zero-equality of 
> strcmp
> between a longer string and a smaller array.
> (get_range_strlen_dynamic): Overwrite initial maxbound when 
> non-null.


But I think the root of the problem is actually here:

> commit 72dbc21dbbdd08cd4047d68bce3154dc485a4255
> Author: qinzhao 
> Date:   Thu May 31 20:01:42 2018 +
> 
> 2nd Patch for PR78009
> Patch for PR83026
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809
> Inline strcmp with small constant strings
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83026
> missing strlen optimization for strcmp of unequal strings
> 
> The design doc for PR78809 is at:
> https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html
> 
> this patch is for the second part of change of PR78809 and PR83026:
> 
> B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0
> 
>B.1. (PR83026) When the lengths of both arguments are constant and
> it's a strcmp:
>   * if the lengths are NOT equal, we can safely fold the call
> to a non-zero value.
>   * otherwise, do nothing now.
> 
>B.2. (PR78809) When the length of one argument is constant, try to 
> replace
>the call with a __builtin_str(n)cmp_eq call where possible, i.e:
> 
>strncmp (s, STR, C) (!)= 0 in which, s is a pointer to a string, STR 
> is a
>string with constant length, C is a constant.
>  if (C <= strlen(STR) && sizeof_array(s) > C)
>{
>  replace this call with
>  __builtin_strncmp_eq (s, STR, C) (!)= 0
>}
>  if (C > strlen(STR)
>{
>  it can be safely treated as a call to strcmp (s, STR) (!)= 0
>  can handled by the following strcmp.
>}
> 
>strcmp (s, STR) (!)= 0 in which, s is a pointer to a string, STR is a
>string with constant length.
>  if  (sizeof_array(s) > strlen(STR))
>{
>  replace this call with
>  __builtin_strcmp_eq (s, STR, strlen(STR)+1) (!)= 0
>}
> 
>later when expanding the new __builtin_str(n)cmp_eq calls, first 
> expand them
>as __builtin_memcmp_eq, if the expansion does not succeed, change them 
> back
>to call to __builtin_str(n)cmp.
> 
> adding test case strcmpopt_2.c and strcmpopt_4.c into gcc.dg for part B of
> PR78809 adding test case strcmpopt_3.c into gcc.dg for PR83026

Qing's patch introduced determine_min_objsize, which has this chunk of code:

>  /* Try to determine the size of the object from its type.  */
>   if (TREE_CODE (dest) != ADDR_EXPR)
> return HOST_WIDE_INT_M1U;
> 
>   tree type = TREE_TYPE (dest);
>   if (TREE_CODE (type) == POINTER_TYPE)
> 

Re: [PATCH] Adjust predicates and constraints of scalar insns

2019-10-25 Thread Uros Bizjak
On Fri, Oct 25, 2019 at 9:20 PM Hongtao Liu  wrote:
>
> > Looking into sse.md, there is a lot of inconsistencies in existing *vm
> > patterns w.r.t. operand constraints. Unfortunately, these were copied
> > into proposed patterns. One example is existing
> >
> > (define_insn "_vmsqrt2"
> >   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> > (vec_merge:VF_128
> >   (sqrt:VF_128
> > (match_operand:VF_128 1 "vector_operand"
> > "xBm,"))
> >   (match_operand:VF_128 2 "register_operand" "0,v")
> >   (const_int 1)))]
> >   "TARGET_SSE"
> >   "@
> >sqrt\t{%1, %0|%0, %1}
> >
> > Due to combine benefits, *vm operands to be merged is described in
> > vector mode. Since the insn operates in scalar mode, there is no need
> > for "vector_operand" and Bm constraint that impose more strict
> > alignment requirements. However, iptr modifier is needed here to
> > override VF_128 vector mode (e.g. V4SFmode) to generate scalar
> > (SFmode, DWORD PTR) memory access prefix.
> >
> > Someone should fix these existing inconsistencies in a follow-up patch.
>
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01867.html
> This patch is to fix these.
>
> Bootstrap and regression test on i386/x86-64 is ok.
>
> Ok for trunk?
>
> Changelog
>
> cc/
> * config/i386/sse.md
> (_vm3,
> 
> _vm3,
> _vmsqrt2,
> _vm3,
> _vmmaskcmp3):
> Change predicates from vector_operand to nonimmediate_operand,
> constraints xBm to xm, since scalar operations don't need
> memory address alignment.
> (avx512f_vmcmp3,
> avx512f_vmcmp3_mask): Replace
> round_saeonly_nimm_predicate with
> round_saeonly_nimm_scalar_predicate.
> (fmai_vmfmadd_, fmai_vmfmsub_,
> fmai_vmfnmadd_,fmai_vmfnmsub_,
> *fmai_fmadd_, *fmai_fmsub_,
> *fmai_fnmadd_, *fmai_fnmsub_,
> avx512f_vmfmadd__mask3,
> avx512f_vmfmadd__maskz_1,
> *avx512f_vmfmsub__mask,
> avx512f_vmfmsub__mask3,
> *avx512f_vmfmsub__maskz_1,
> *avx512f_vmfnmadd__mask,
> *avx512f_vmfnmadd__mask3,
> *avx512f_vmfnmadd__maskz_1,
> *avx512f_vmfnmsub__mask,
> *avx512f_vmfnmsub__mask3,
> *avx512f_vmfnmsub__maskz_1,
> cvtusi232,
> cvtusi264, ): Replace
> round_nimm_predicate instead of round_nimm_scalr_predicate.
> (avx512f_sfixupimm,
> avx512f_sfixupimm_mask,
> avx512er_vmrcp28,
> avx512er_vmrsqrt28,
> ): Replace round_saeonly_nimm_predicate with
> round_saeonly_nimm_scalar_predicate.
> (avx512dq_vmfpclass, ): Replace
> vector_operand with nonimmediate_operand.
> * config/i386/subst.md (round_scalar_nimm_predicate,
> round_saeonly_scalar_nimm_predicate): Replace
> vector_operand with nonimmediate_operand.

LGTM, although this patch is very hard to review.

BTW: Please also note that there is no need to use  or operand
mode override in scalar insn templates for intel asm dialect when
operand already has a scalar mode.

Thanks,
Uros.


Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-25 Thread Uros Bizjak
On Fri, Oct 25, 2019 at 9:13 PM Hongtao Liu  wrote:
>
> Update patch.
>
> On Fri, Oct 25, 2019 at 4:01 PM Uros Bizjak  wrote:
> >
> > On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu  wrote:
> > >
> > > On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu  wrote:
> > > >
> > > > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak  wrote:
> > > > >
> > > > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > Update patch:
> > > > > > Add m constraint to define_insn (sse_1_round > > > > > *sse_1_round > > > > > error
> > > > > > when under sse4 but not avx512f.
> > > > >
> > > > > It looks to me that the original insn is incompletely defined. It
> > > > > should use nonimmediate_operand, "m" constraint and  pointer
> > > > > size modifier. Something like:
> > > > >
> > > > > (define_insn "sse4_1_round"
> > > > >   [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > > > > (vec_merge:VF_128
> > > > >   (unspec:VF_128
> > > > > [(match_operand:VF_128 2 "nonimmediate_operand" 
> > > > > "Yrm,*xm,xm,vm")
> > > > >  (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> > > > > UNSPEC_ROUND)
> > > > >   (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > > > >   (const_int 1)))]
> > > > >   "TARGET_SSE4_1"
> > > > >   "@
> > > > >round\t{%3, %2, %0|%0, %2, %3}
> > > > >round\t{%3, %2, %0|%0, %2, %3}
> > > > >vround\t{%3, %2, %1, %0|%0, %1, %2, %3}
> > > > >vrndscale\t{%3, %2, %1, %0|%0, %1, %2, 
> > > > > %3}"
> > > > >
> > > > > >
> > > > > > Changelog:
> > > > > > gcc/
> > > > > > * config/i386/sse.md:  (sse4_1_round):
> > > > > > Change constraint x to xm
> > > > > > since vround supports memory operand.
> > > > > > * (*sse4_1_round): Ditto.
> > > > > >
> > > > > > Bootstrap and regression test ok.
> > > > > >
> > > > > > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu  
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi uros:
> > > > > > >   This patch fixes false dependence of scalar operations
> > > > > > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > > > > >   Bootstrap ok, regression test on i386/x86 ok.
> > > > > > >
> > > > > > >   It does something like this:
> > > > > > > -
> > > > > > > For scalar instructions with both xmm operands:
> > > > > > >
> > > > > > > op %xmmN, %xmmQ, %xmmQ  --->  op %xmmN, %xmmN, %xmmQ
> > > > > > >
> > > > > > > for scalar instructions with one mem  or gpr operand:
> > > > > > >
> > > > > > > op mem/gpr, %xmmQ, %xmmQ
> > > > > > >
> > > > > > > --->  using pass rpad  --->
> > > > > > >
> > > > > > > xorps %xmmN, %xmmN, %xmmN
> > > > > > > op mem/gpr, %xmmN, %xmmQ
> > > > > > >
> > > > > > > Performance influence of SPEC2017 fprate which is tested on SKX
> > > > > > >
> > > > > > > 503.bwaves_r -0.03%
> > > > > > > 507.cactuBSSN_r -0.22%
> > > > > > > 508.namd_r -0.02%
> > > > > > > 510.parest_r 0.37%
> > > > > > > 511.povray_r 0.74%
> > > > > > > 519.lbm_r 0.24%
> > > > > > > 521.wrf_r 2.35%
> > > > > > > 526.blender_r 0.71%
> > > > > > > 527.cam4_r 0.65%
> > > > > > > 538.imagick_r 0.95%
> > > > > > > 544.nab_r -0.37%
> > > > > > > 549.fotonik3d_r 0.24%
> > > > > > > 554.roms_r 0.90%
> > > > > > > fprate geomean 0.50%
> > > > > > > -
> > > > > > >
> > > > > > > Changelog
> > > > > > > gcc/
> > > > > > > * config/i386/i386.md (*rcpsf2_sse): Add
> > > > > > > avx_partial_xmm_update, prefer m constraint for 
> > > > > > > TARGET_AVX.
> > > > > > > (*rsqrtsf2_sse): Ditto.
> > > > > > > (*sqrt2_sse): Ditto.
> > > > > > > (sse4_1_round2): Separate constraint vm, add
> > > > > > > avx_partial_xmm_update, prefer m constraint for 
> > > > > > > TARGET_AVX.
> > > > > > > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn 
> > > > > > > used
> > > > > > > by pass rpad.
> > > > > > > 
> > > > > > > (*_vmsqrt2*):
> > > > > > > Ditto.
> > > > > > > (*sse_vmrsqrtv4sf2): Ditto.
> > > > > > > (*avx512f_rndscale): Ditto.
> > > > > > > (*sse4_1_round): Ditto.
> > > > > > >
> > > > > > > gcc/testsuite
> > > > > > > * gcc.target/i386/pr87007-4.c: New test.
> > > > > > > * gcc.target/i386/pr87007-5.c: Ditto.

OK.

Thanks,
Uros.

> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > BR,
> > > > > > > Hongtao
> > > > >
> > > > > (set (attr "preferred_for_speed")
> > > > >   (cond [(eq_attr "alternative" "1")
> > > > >(symbol_ref "TARGET_AVX || 
> > > > > !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > > (eq_attr "alternative" "2")
> > > > > -  (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > > +  (symbol_ref "TARGET_AVX || 
> > > > > !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > > ]
> > > > > (symbol_ref "true")))])
> > > > >
> > > > > This can be written as:
> > > > >
> > > > > (set (attr "preferred_for_speed")
> > > > >   (cond [(match_test "TARGET_AVX")
> > > > >(symbol_ref 

Re: [build] Properly track GCC language configure fragments

2019-10-25 Thread Joseph Myers
This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [SVE] PR91272

2019-10-25 Thread Prathamesh Kulkarni
On Fri, 25 Oct 2019 at 14:18, Richard Sandiford
 wrote:
>
> Hi Prathamesh,
>
> I've just committed a patch that fixes a large number of SVE
> reduction-related failures.  Could you rebase and retest on top of that?
> Sorry for messing you around, but regression testing based on the state
> before the patch wouldn't have been that meaningful.  In particular...
>
> Prathamesh Kulkarni  writes:
> > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > index a70d52eb2ca..82814e2c2af 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -6428,6 +6428,7 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
> > slp_tree slp_node,
> >if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
> >  {
> >if (reduction_type != FOLD_LEFT_REDUCTION
> > +   && reduction_type != EXTRACT_LAST_REDUCTION
> > && !mask_by_cond_expr
> > && (cond_fn == IFN_LAST
> > || !direct_internal_fn_supported_p (cond_fn, vectype_in,
>
> ...after today's patch, it's instead necessary to remove:
>
>   if (loop_vinfo
>   && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
>   && reduction_type == EXTRACT_LAST_REDUCTION)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "can't yet use a fully-masked loop for"
>  " EXTRACT_LAST_REDUCTION.\n");
>   LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
> }
>
> from vectorizable_condition.  We no longer need any changes to
> vectorizable_reduction itself.
>
> > @@ -10180,18 +10181,29 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > gimple_stmt_iterator *gsi,
> >vec != { 0, ... } (masked in the MASK_LOAD,
> >unmasked in the VEC_COND_EXPR).  */
> >
> > -   if (loop_mask)
> > +   if (masks)
> >   {
> > -   if (COMPARISON_CLASS_P (vec_compare))
> > +   unsigned vec_num = vec_oprnds0.length ();
> > +   loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > +   vectype, vec_num * j + i);
>
> Ah... now that the two cases are merged (good!), just "if (masks)" isn't
> right after all, sorry for the misleading comment.  I think this should
> instead be:
>
>   /* Force vec_compare to be an SSA_NAME rather than a comparison,
>  in cases where that's necessary.  */
>   if (masks || reduction_type == EXTRACT_LAST_REDUCTION)
> {
>
> Not doing that would break unmasked EXTRACT_LAST_REDUCTIONs.
Ah right, thanks for pointing out!
>
> Then make the existing:
>
>   tree tmp2 = make_ssa_name (vec_cmp_type);
>   gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
> vec_compare, loop_mask);
>   vect_finish_stmt_generation (stmt_info, g, gsi);
>   vec_compare = tmp2;
>
> conditional on "if (masks)" only, and defer the calculation of loop_mask
> to this point too.
>
> [ It would be good to spot-check that aarch64-sve.exp passes after making
>   the changes to the stmt-generation part of vectorizable_condition,
>   but before removing the:
>
> LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
>
>   quoted above.  That would show that unmasked fold-left reductions
>   still work after the changes.
>
>   There are still some lingering cases in which we can test unmasked
>   SVE loops directly, but they're becoming rarer and should eventually
>   go away altogether.  So I don't think it's worth trying to construct
>   an unmasked test for the testsuite. ]
>
> > +
> > +  if (!is_gimple_val (vec_compare))
> > +{
> > +  tree vec_compare_name = make_ssa_name (vec_cmp_type);
> > +  gassign *new_stmt = gimple_build_assign 
> > (vec_compare_name,
> > +   vec_compare);
> > +  vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
> > +  vec_compare = vec_compare_name;
> > +}
>
> Should use tab-based indentation.
Thanks for the suggestions, does the attached version look OK ?
Comparing aarch64-sve.exp before/after patch shows no regressions,
bootstrap+test in progress.

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c
index d4f9b0b6a94..d3ea52dea47 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble { target aarch64_asm_sve_ok } } */
-/* { dg-options "-O2 -ftree-vectorize --save-temps" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
 
 #define N 32
 
@@ -17,4 +17,5 @@ condition_reduction (int *a, int min_v)
   return last;
 }
 
-/* { dg-final 

[PATCH] Adjust predicates and constraints of scalar insns

2019-10-25 Thread Hongtao Liu
> Looking into sse.md, there is a lot of inconsistencies in existing *vm
> patterns w.r.t. operand constraints. Unfortunately, these were copied
> into proposed patterns. One example is existing
>
> (define_insn "_vmsqrt2"
>   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> (vec_merge:VF_128
>   (sqrt:VF_128
> (match_operand:VF_128 1 "vector_operand"
> "xBm,"))
>   (match_operand:VF_128 2 "register_operand" "0,v")
>   (const_int 1)))]
>   "TARGET_SSE"
>   "@
>sqrt\t{%1, %0|%0, %1}
>
> Due to combine benefits, *vm operands to be merged is described in
> vector mode. Since the insn operates in scalar mode, there is no need
> for "vector_operand" and Bm constraint that impose more strict
> alignment requirements. However, iptr modifier is needed here to
> override VF_128 vector mode (e.g. V4SFmode) to generate scalar
> (SFmode, DWORD PTR) memory access prefix.
>
> Someone should fix these existing inconsistencies in a follow-up patch.

https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01867.html
This patch is to fix these.

Bootstrap and regression test on i386/x86-64 is ok.

Ok for trunk?

Changelog

gcc/
* config/i386/sse.md
(_vm3,
_vm3,
_vmsqrt2,
_vm3,
_vmmaskcmp3):
Change predicates from vector_operand to nonimmediate_operand,
constraints xBm to xm, since scalar operations don't need
memory address alignment.
(avx512f_vmcmp3,
avx512f_vmcmp3_mask): Replace
round_saeonly_nimm_predicate with
round_saeonly_nimm_scalar_predicate.
(fmai_vmfmadd_, fmai_vmfmsub_,
fmai_vmfnmadd_,fmai_vmfnmsub_,
*fmai_fmadd_, *fmai_fmsub_,
*fmai_fnmadd_, *fmai_fnmsub_,
avx512f_vmfmadd__mask3,
avx512f_vmfmadd__maskz_1,
*avx512f_vmfmsub__mask,
avx512f_vmfmsub__mask3,
*avx512f_vmfmsub__maskz_1,
*avx512f_vmfnmadd__mask,
*avx512f_vmfnmadd__mask3,
*avx512f_vmfnmadd__maskz_1,
*avx512f_vmfnmsub__mask,
*avx512f_vmfnmsub__mask3,
*avx512f_vmfnmsub__maskz_1,
cvtusi232,
cvtusi264, ): Replace
round_nimm_predicate with round_nimm_scalar_predicate.
(avx512f_sfixupimm,
avx512f_sfixupimm_mask,
avx512er_vmrcp28,
avx512er_vmrsqrt28,
): Replace round_saeonly_nimm_predicate with
round_saeonly_nimm_scalar_predicate.
(avx512dq_vmfpclass, ): Replace
vector_operand with nonimmediate_operand.
* config/i386/subst.md (round_scalar_nimm_predicate,
round_saeonly_scalar_nimm_predicate): Replace
vector_operand with nonimmediate_operand.

-- 
BR,
Hongtao


0001-Adjust-predicates-and-constraints-of-scalar-insns.patch
Description: Binary data


Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-25 Thread Hongtao Liu
Update patch.

On Fri, Oct 25, 2019 at 4:01 PM Uros Bizjak  wrote:
>
> On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu  wrote:
> >
> > On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu  wrote:
> > >
> > > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak  wrote:
> > > >
> > > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu  wrote:
> > > > >
> > > > > Update patch:
> > > > > Add m constraint to define_insn (sse_1_round and *sse_1_round) to avoid error
> > > > > when under sse4 but not avx512f.
> > > >
> > > > It looks to me that the original insn is incompletely defined. It
> > > > should use nonimmediate_operand, "m" constraint and  pointer
> > > > size modifier. Something like:
> > > >
> > > > (define_insn "sse4_1_round"
> > > >   [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > > > (vec_merge:VF_128
> > > >   (unspec:VF_128
> > > > [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > >  (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> > > > UNSPEC_ROUND)
> > > >   (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > > >   (const_int 1)))]
> > > >   "TARGET_SSE4_1"
> > > >   "@
> > > >round\t{%3, %2, %0|%0, %2, %3}
> > > >round\t{%3, %2, %0|%0, %2, %3}
> > > >vround\t{%3, %2, %1, %0|%0, %1, %2, %3}
> > > >vrndscale\t{%3, %2, %1, %0|%0, %1, %2, 
> > > > %3}"
> > > >
> > > > >
> > > > > Changelog:
> > > > > gcc/
> > > > > * config/i386/sse.md:  (sse4_1_round):
> > > > > Change constraint x to xm
> > > > > since vround supports memory operand.
> > > > > * (*sse4_1_round): Ditto.
> > > > >
> > > > > Bootstrap and regression test ok.
> > > > >
> > > > > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu  
> > > > > wrote:
> > > > > >
> > > > > > Hi uros:
> > > > > >   This patch fixes false dependence of scalar operations
> > > > > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > > > >   Bootstrap ok, regression test on i386/x86 ok.
> > > > > >
> > > > > >   It does something like this:
> > > > > > -
> > > > > > For scalar instructions with both xmm operands:
> > > > > >
> > > > > > op %xmmN, %xmmQ, %xmmQ  --->  op %xmmN, %xmmN, %xmmQ
> > > > > >
> > > > > > for scalar instructions with one mem  or gpr operand:
> > > > > >
> > > > > > op mem/gpr, %xmmQ, %xmmQ
> > > > > >
> > > > > > --->  using pass rpad  --->
> > > > > >
> > > > > > xorps %xmmN, %xmmN, %xmmN
> > > > > > op mem/gpr, %xmmN, %xmmQ
> > > > > >
> > > > > > Performance influence of SPEC2017 fprate which is tested on SKX
> > > > > >
> > > > > > 503.bwaves_r -0.03%
> > > > > > 507.cactuBSSN_r -0.22%
> > > > > > 508.namd_r -0.02%
> > > > > > 510.parest_r 0.37%
> > > > > > 511.povray_r 0.74%
> > > > > > 519.lbm_r 0.24%
> > > > > > 521.wrf_r 2.35%
> > > > > > 526.blender_r 0.71%
> > > > > > 527.cam4_r 0.65%
> > > > > > 538.imagick_r 0.95%
> > > > > > 544.nab_r -0.37%
> > > > > > 549.fotonik3d_r 0.24%
> > > > > > 554.roms_r 0.90%
> > > > > > fprate geomean 0.50%
> > > > > > -
> > > > > >
> > > > > > Changelog
> > > > > > gcc/
> > > > > > * config/i386/i386.md (*rcpsf2_sse): Add
> > > > > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > > (*rsqrtsf2_sse): Ditto.
> > > > > > (*sqrt2_sse): Ditto.
> > > > > > (sse4_1_round2): Separate constraint vm, add
> > > > > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn 
> > > > > > used
> > > > > > by pass rpad.
> > > > > > 
> > > > > > (*_vmsqrt2*):
> > > > > > Ditto.
> > > > > > (*sse_vmrsqrtv4sf2): Ditto.
> > > > > > (*avx512f_rndscale): Ditto.
> > > > > > (*sse4_1_round): Ditto.
> > > > > >
> > > > > > gcc/testsuite
> > > > > > * gcc.target/i386/pr87007-4.c: New test.
> > > > > > * gcc.target/i386/pr87007-5.c: Ditto.
> > > > > >
> > > > > >
> > > > > > --
> > > > > > BR,
> > > > > > Hongtao
> > > >
> > > > (set (attr "preferred_for_speed")
> > > >   (cond [(eq_attr "alternative" "1")
> > > >(symbol_ref "TARGET_AVX || 
> > > > !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > (eq_attr "alternative" "2")
> > > > -  (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > +  (symbol_ref "TARGET_AVX || 
> > > > !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > ]
> > > > (symbol_ref "true")))])
> > > >
> > > > This can be written as:
> > > >
> > > > (set (attr "preferred_for_speed")
> > > >   (cond [(match_test "TARGET_AVX")
> > > >(symbol_ref "true")
> > > > (eq_attr "alternative" "1,2")
> > > >   (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > ]
> > > > (symbol_ref "true")))])
> > > >
> > > > Uros.
> > >
> > > Yes, after these are fixed, I'll upstream to trunk, OK?
> > Update patch.
>
> +(sqrt:
> +  (match_operand: 1 "vector_operand"
> "xBm,")))
> +  (match_operand:VF_128 2 

C++ PATCH to add test for c++/91581

2019-10-25 Thread Marek Polacek
Fixed by r277351, applying to trunk.

2019-10-25  Marek Polacek  

PR c++/91581 - ICE in exception-specification of defaulted ctor.
* g++.dg/cpp0x/noexcept55.C: New test.

diff --git gcc/testsuite/g++.dg/cpp0x/noexcept55.C 
gcc/testsuite/g++.dg/cpp0x/noexcept55.C
new file mode 100644
index 000..f46320db211
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/noexcept55.C
@@ -0,0 +1,8 @@
+// PR c++/91581 - ICE in exception-specification of defaulted ctor.
+// { dg-do compile { target c++11 } }
+
+struct A {
+A() noexcept(sizeof(A)) = default;
+};
+
+A a;



GCC 9 Patch committed: Don't inline numeric expressions with named types

2019-10-25 Thread Ian Lance Taylor
In the gofrontend we've encountered problems with numeric expressions
that have named types, as shown at https://golang.org/issue/34577.
Those problems are fixed on trunk, but the fixes there rely on other
machinery that has been added since the GCC 9 branch.  This patch
fixes the same problems on GCC 9 branch, but in this case by simply
not inlining functions that use this case.  This fixes
https://golang.org/issue/35154.  Bootstrapped and ran Go testsuite.
Committed to GCC 9 branch.

Ian
Index: expressions.cc
===
--- expressions.cc  (revision 271636)
+++ expressions.cc  (working copy)
@@ -2036,7 +2036,11 @@ class Integer_expression : public Expres
 
   int
   do_inlining_cost() const
-  { return 1; }
+  {
+if (this->type_ != NULL && this->type_->named_type() != NULL)
+  return 0x10;
+return 1; 
+  }
 
   void
   do_export(Export_function_body*) const;
@@ -2451,7 +2455,11 @@ class Float_expression : public Expressi
 
   int
   do_inlining_cost() const
-  { return 1; }
+  {
+if (this->type_ != NULL && this->type_->named_type() != NULL)
+  return 0x10;
+return 1;
+  }
 
   void
   do_export(Export_function_body*) const;
@@ -2664,7 +2672,11 @@ class Complex_expression : public Expres
 
   int
   do_inlining_cost() const
-  { return 2; }
+  {
+if (this->type_ != NULL && this->type_->named_type() != NULL)
+  return 0x10;
+return 2;
+  }
 
   void
   do_export(Export_function_body*) const;


Re: [PATCH] PR85678: Change default to -fno-common

2019-10-25 Thread Georg-Johann Lay

Wilco Dijkstra schrieb:

GCC currently defaults to -fcommon.  As discussed in the PR, this is an ancient
C feature which is not conforming with the latest C standards.  On many targets
this means global variable accesses have a codesize and performance penalty.
This applies to C code only, C++ code is not affected by -fcommon.  It is about
time to change the default.

OK for commit?


IIRC using -fno-common might lead to some testsuite fallout because
some optimizations / test cases are sensitive to -f[no-]common.
So I am surprised that no adjustments to test cases are needed.


ChangeLog
2019-10-25  Wilco Dijkstra  

PR85678
* common.opt (fcommon): Change init to 1.

doc/
* invoke.texi (-fcommon): Update documentation.
---

diff --git a/gcc/common.opt b/gcc/common.opt
index 
0195b0cb85a06dd043fd0412b42dfffddfa2495b..b0840f41a5e480f4428bd62724b0dc3d54c68c0b
 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1131,7 +1131,7 @@ Common Report Var(flag_combine_stack_adjustments) 
Optimization
 Looks for opportunities to reduce stack adjustments and stack references.
 
 fcommon

-Common Report Var(flag_no_common,0)
+Common Report Var(flag_no_common,0) Init(1)
 Put uninitialized globals in the common section.
 
 fcompare-debug

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
857d9692729e503657d0d0f44f1f6252ec90d49a..5b4ff66015f5f94a5bd89e4dc3d2d53553cc091e
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -568,7 +568,7 @@ Objective-C and Objective-C++ Dialects}.
 -fnon-call-exceptions  -fdelete-dead-exceptions  -funwind-tables @gol
 -fasynchronous-unwind-tables @gol
 -fno-gnu-unique @gol
--finhibit-size-directive  -fno-common  -fno-ident @gol
+-finhibit-size-directive  -fcommon  -fno-ident @gol
 -fpcc-struct-return  -fpic  -fPIC  -fpie  -fPIE  -fno-plt @gol
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
@@ -14050,35 +14050,27 @@ useful for building programs to run under WINE@.
 code that is not binary compatible with code generated without that switch.
 Use it to conform to a non-default application binary interface.
 
-@item -fno-common

-@opindex fno-common
+@item -fcommon
 @opindex fcommon
+@opindex fno-common
 @cindex tentative definitions
-In C code, this option controls the placement of global variables 
-defined without an initializer, known as @dfn{tentative definitions} 
-in the C standard.  Tentative definitions are distinct from declarations 
+In C code, this option controls the placement of global variables

+defined without an initializer, known as @dfn{tentative definitions}
+in the C standard.  Tentative definitions are distinct from declarations
 of a variable with the @code{extern} keyword, which do not allocate storage.
 
-Unix C compilers have traditionally allocated storage for

-uninitialized global variables in a common block.  This allows the
-linker to resolve all tentative definitions of the same variable
+The default is @option{-fno-common}, which specifies that the compiler places
+uninitialized global variables in the BSS section of the object file.


IMO "uninitialized" is confusing because the variables actually
*are* initialized: with zero.  It's just that the variables don't have
explicit initializers.  Ditto for "uninitialized" in the --help message.

Johann



+This inhibits the merging of tentative definitions by the linker so you get a
+multiple-definition error if the same variable is accidentally defined in more
+than one compilation unit.
+
+The @option{-fcommon} places uninitialized global variables in a common block.
+This allows the linker to resolve all tentative definitions of the same 
variable
 in different compilation units to the same object, or to a non-tentative
-definition.  
-This is the behavior specified by @option{-fcommon}, and is the default for 
-GCC on most targets.  
-On the other hand, this behavior is not required by ISO

-C, and on some targets may carry a speed or code size penalty on
-variable references.
-
-The @option{-fno-common} option specifies that the compiler should instead
-place uninitialized global variables in the BSS section of the object file.
-This inhibits the merging of tentative definitions by the linker so
-you get a multiple-definition error if the same 
-variable is defined in more than one compilation unit.

-Compiling with @option{-fno-common} is useful on targets for which
-it provides better performance, or if you wish to verify that the
-program will work on other systems that always treat uninitialized
-variable definitions this way.
+definition.  This behavior does not conform to ISO C, is inconsistent with C++,
+and on many targets implies a speed and code size penalty on global variable
+references.  It is mainly useful to enable legacy code to link without errors.
 
 @item -fno-ident

 @opindex fno-ident





Re: Argument-mismatch fallout

2019-10-25 Thread Thomas Koenig





However, passing a scalar instead of an array/array
element worked/works with (nearly?) all compilers. Hence, passing
a scalar is seemingly a common pattern. Thus, I wonder whether we
should do something about this.


Maybe we could mention -fallow-argument-mismatch in the error message.

(Really, the changes needed are quite trivial, and the code is in fact
invalid.  I actually wrote the patch for LAPACK which removed all the
fallout from their TESTING routines; it didn't take long).

Regards

Thomas


C++ PATCH for c++/90998 - ICE with copy elision in init by ctor and -Wconversion

2019-10-25 Thread Marek Polacek
After r269667 which introduced joust_maybe_elide_copy, in C++17 we can elide
a constructor if it uses a conversion function that returns a prvalue, and
use the conversion function in its stead.

This eliding means that if we have a candidate that previously didn't have
->second_conv, it can have it after the elision.  This confused the
-Wconversion warning because it was assuming that if cand1->second_conv is
non-null, so is cand2->second_conv.  Here cand1->second_conv was non-null
but cand2->second_conv remained null, so it crashed in compare_ics.

I checked with clang that both compilers call A::operator B() in C++17 and
B::B(A const &) otherwise.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 9?

2019-10-25  Marek Polacek  

PR c++/90998 - ICE with copy elision in init by ctor and -Wconversion.
* call.c (joust): Don't attempt to warn if ->second_conv is null.

* g++.dg/cpp0x/overload-conv-4.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index cbd1fe8a0a4..b0c6370107d 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -10870,7 +10870,9 @@ joust (struct z_candidate *cand1, struct z_candidate 
*cand2, bool warn,
  either between a constructor and a conversion op, or between two
  conversion ops.  */
   if ((complain & tf_warning)
-  && winner && warn_conversion && cand1->second_conv
+  /* In C++17, the constructor might have been elided, which means that
+an originally null ->second_conv could become non-null.  */
+  && winner && warn_conversion && cand1->second_conv && cand2->second_conv
   && (!DECL_CONSTRUCTOR_P (cand1->fn) || !DECL_CONSTRUCTOR_P (cand2->fn))
   && winner != compare_ics (cand1->second_conv, cand2->second_conv))
 {
diff --git gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C 
gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C
new file mode 100644
index 000..6fcdbbaa6a4
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C
@@ -0,0 +1,23 @@
+// PR c++/90998 - ICE with copy elision in init by ctor and -Wconversion.
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wconversion" }
+
+struct B;
+
+struct A {
+operator B();
+};
+
+struct B {
+B(A const &);
+B(B const &);
+};
+
+B
+f (A x)
+{
+  // C++14: we call B::B(A const &)
+  // C++17: we call A::operator B()
+  return B(x); // { dg-warning "choosing .A::operator B\\(\\). over 
.B::B\\(const A&\\)" "" { target c++17 } }
+  // { dg-warning "for conversion from .A. to .B." "" { target c++17 } .-1 }
+}



Argument-mismatch fallout (was: Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval)

2019-10-25 Thread Tobias Burnus

Hi Jeff, hi Thomas, hi all,

I had a look at Wannier90. (Fedora uses Version 2.0.0 of 2013, 2.0.1 was 
released in 2015; current is 3.0 of Feb 2019 and does build.) I think the 
problem in Wannier90 is typical of all failing code, although there are 
likely a few other failures. Namely:


That code uses something like:
  subroutine sub(array, n)
     integer array(*)
     integer N
  end subroutine

with the call:
integer :: scalar
call sub(scalar, 1)

That fails with:
Error: Rank mismatch in argument ‘array’ at (1) (rank-1 and scalar)

That is, a scalar actual argument is passed to an array dummy argument.


Technically, the error is correct and the code invalid:

F2018, 15.5.2.4, paragraph 14 has:
"If the actual argument is a … scalar, the corresponding dummy
argument shall be scalar unless … the actual argument …
is an element … of an array."

Fortran 66, https://wg5-fortran.org/ARCHIVE/Fortran66.pdf , [8.4.2, ll. 33-35]
"If a dummy argument of an external function is an array name,
the corresponding actual argument must be an array name
or array element name (10.1.3)"

Fortran IV, p. 104,  
http://web.eah-jena.de/~kleine/history/languages/GC28-6515-10-FORTRAN-IV-Language.pdf
"If a dummy argument is an array, the corresponding actual argument
must be (1) an array, or (2) an array element."


[Fortran IV only permitted literal constants in
'dimension(123)' and 'dimension(N)' for dummy arguments,
i.e. the size is variable and provided by the callee.
[Number of ranks not specified, the linked specification
supports rank-7 arrays.]
Fortran 66 seems to be likewise, and specifies only rank-4 arrays.
Fortran 77 finally has assumed-size arrays (which seems backwards
to me: 'dimension(N)' I regard as better than 'dimension(*)'),
supports lower bounds other than 1, has rank-7 arrays and a few
additional array extensions over 66.]


Current result:
* Error with -std=gnu (the default) or -std=f95/f2003/f…
* Downgraded to a warning with either -std=legacy or -fallow-argument-mismatch


However, passing a scalar instead of an array/array
element worked/works with (nearly?) all compilers. Hence, passing
a scalar is seemingly a common pattern. Thus, I wonder whether we
should do something about this.

Ideas/possibilities:
* Use warning instead of an error with -std=gnu (and only error
  for std=f…)
* Keep the error for most mismatches but downgrade to a warning
  if -std=gnu for (only) the scalar-to-array case.
  (a) do this always or (b) only for assumed-size dummy args.
* (a) Keep the status quo and require -std=legacy/
-fallow-argument-mismatch to get it compiled.
  (b) modify the error message such that the user knows about
  the arg-mismatch flag?
* Do something else?

Ideas? Comments? Opinions?
  


Cheers,

Tobias


On 10/25/19 4:48 PM, Jeff Law wrote:

[…]the function argument stuff broke 30-40 packages, many of which still
don't build without -fallow-argument-mismatch.



Do you know whether those 30–40 packages have code bugs or could there
be gfortran bugs (too strict checking) lurking?

I'm not familiar enough with the issue & packages to know if they're
cases of source bugs or gfortran being too strict.

My plan has always been to extract a few cases and pass them along for
that kind of analysis.  I've just been too busy lately with other
regressions :(

A partial list of the affected packages:


R-deldir
R
atlas
cgnslib
cp2k
elk
elpa
exciting
ga
getdata
grib_api
hdf
libccp4
mpich
nwchem
psblas3
qrmumps
qrupdate
quantum-espresso
scalapack
scipy
scorep
wannier90
wsjtx
xfoil
xrotor

There's certainly more, that list just represents those I've locally
worked around with -fallow-argument-mismatch.  Several more trigger the
mismatch error, but I haven't bothered working around yet.

That list comes from _after_ the  Oct 14 patch to correct issues in the
argument mismatch testing.



Regarding the BOZ: One difference to the argument mismatch is that the
latter has an option to still accept it (-fallow-argument-mismatch) and
potentially generates wrong code – depending what the ME does with the
mismatches – while the former, once parsed, causes no potential ME
problems and as there is no flag, it always requires code changes. (On
the other hand, fixing the BOZ issue is straight forward; argument
changes are trickier.)

Absolutely.  That's the primary reason why I haven't contacted the
affected package maintainers yet -- I don't want them blindly adding
-fallow-argument-mismatch to their flags.

Jeff



[PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-25 Thread Craig Blackmore
This patch aims to allow more load/store instructions to be compressed by
replacing a load/store of 'base register + large offset' with a new load/store
of 'new base + small offset'. If the new base gets stored in a compressed
register, then the new load/store can be compressed. Since there is an overhead
in creating the new base, this change is only attempted when 'base register' is
referenced in at least 4 load/stores in a basic block.

The optimization is implemented in a new RISC-V specific pass called
shorten_memrefs which is enabled for RVC targets. It has been developed for the
32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.

Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
rv64imafdc via QEMU. No regressions.

gcc/ChangeLog:

* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
* config/riscv/riscv-passes.def: New file.
* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
* config/riscv/riscv-shorten-memrefs.c: New file.
* config/riscv/riscv.c (tree-pass.h): New include.
(riscv_compressed_reg_p): New function.
(riscv_compressed_lw_offset_p): Likewise.
(riscv_compressed_lw_address_p): Likewise.
(riscv_shorten_lw_offset): Likewise.
(riscv_legitimize_address): Attempt to convert base + large_offset
to compressible new_base + small_offset.
(riscv_address_cost): Make anticipated compressed load/stores
cheaper for code size than uncompressed load/stores.
(riscv_register_priority): Move compressed register check to
riscv_compressed_reg_p.
* config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET): Define.
* config/riscv/riscv.opt (mshorten-memrefs): New option.
* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
(PASSES_EXTRA): Add riscv-passes.def.
* doc/invoke.texi: Document -mshorten-memrefs.
---
 gcc/config.gcc                           |   2 +-
 gcc/config/riscv/riscv-passes.def        |  20 +++
 gcc/config/riscv/riscv-protos.h          |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c | 188 +++
 gcc/config/riscv/riscv.c                 |  86 ++-
 gcc/config/riscv/riscv.h                 |   5 +
 gcc/config/riscv/riscv.opt               |   6 +
 gcc/config/riscv/t-riscv                 |   5 +
 gcc/doc/invoke.texi                      |  10 ++
 9 files changed, 318 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index bdc2253f8ef..e617215314b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -523,7 +523,7 @@ pru-*-*)
;;
 riscv*)
cpu_type=riscv
-   extra_objs="riscv-builtins.o riscv-c.o"
+   extra_objs="riscv-builtins.o riscv-c.o riscv-shorten-memrefs.o"
d_target_objs="riscv-d.o"
;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-passes.def 
b/gcc/config/riscv/riscv-passes.def
new file mode 100644
index 000..8a4ea0918db
--- /dev/null
+++ b/gcc/config/riscv/riscv-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for RISC-V.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5092294803c..78008c28b75 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -89,4 +89,6 @@ extern void riscv_init_builtins (void);
 /* Routines implemented in riscv-common.c.  */
 extern std::string riscv_arch_str ();
 
+rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c 
b/gcc/config/riscv/riscv-shorten-memrefs.c
new file mode 100644
index 000..aed7ddb792e
--- /dev/null
+++ b/gcc/config/riscv/riscv-shorten-memrefs.c
@@ -0,0 +1,188 @@
+/* Shorten memrefs pass for RISC-V.
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published 

[PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost

2019-10-25 Thread Craig Blackmore
The sched2 pass undoes some of the addresses generated by the RISC-V
shorten_memrefs code size optimization (patch 1/2) and consequently increases
code size. This patch prevents sched-deps.c from changing an address if it is
expected to increase address cost.

Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
rv64imafdc via QEMU. Bootstrapped and tested on x86_64-linux-gnu. No
regressions.

gcc/ChangeLog:

* sched-deps.c (attempt_change): Use old address if it is
cheaper than new address.
---
 gcc/sched-deps.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 308db4e3ca0..c7d0ca550df 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "cselib.h"
 #include "function-abi.h"
+#include "predict.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -4694,6 +4695,14 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
   rtx mem = *mii->mem_loc;
   rtx new_mem;
 
+  /* Prefer old address if it is less expensive.  */
+  addr_space_t as = MEM_ADDR_SPACE (mem);
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
+  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
+  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
+  if (new_cost > old_cost)
+    return NULL_RTX;
+
   /* Jump through a lot of hoops to keep the attributes up to date.  We
  do not want to call one of the change address variants that take
  an offset even though we know the offset in many cases.  These
-- 
2.17.1



[PATCH v2 0/2] RISC-V: Allow more load/stores to be compressed

2019-10-25 Thread Craig Blackmore
Hi Kito,

Thank you for taking the time to review my patch. I am posting an updated
patchset taking into account your comments.

On 18/09/2019 11:01, Kito Cheng wrote:
> Hi Craig:
>
> Some general review comment:
> - Split new pass into new file.
> - Add new option to enable/disable this pass.
> - Could you extend this patch to support lw/sw/ld/sd/flw/fsw/fld/fsd?
>   I think there is lots of common logic for supporting other types
> compressed load/store
>   instruction, but I'd like to see those support at once.

I agree the patch could be extended to other load/store instructions but
unfortunately I don't have the time to do this at the moment. Can the lw/sw
support be merged and the others added later?

> - Do you have experimental data about doing that after register
> allocation/reload,

I don't think it is feasible to move the pass after reload, because the pass
requires a new register to be allocated for the new base.

>   I'd prefer doing such optimization after RA, because we can
> accurately estimate
>   how many byte we can gain, I guess it because RA didn't assign fit
> src/dest reg
>   or base reg so that increase code size?
>

Before reload, we do not know whether the base reg will be a compressed register
or not.


>
> On Fri, Sep 13, 2019 at 12:20 AM Craig Blackmore
>  wrote:
>>
>> This patch aims to allow more load/store instructions to be compressed by
>> replacing a load/store of 'base register + large offset' with a new 
>> load/store
>> of 'new base + small offset'. If the new base gets stored in a compressed
>> register, then the new load/store can be compressed. Since there is an 
>> overhead
>> in creating the new base, this change is only attempted when 'base register' 
>> is
>> referenced in at least 4 load/stores in a basic block.
>>
>> The optimization is implemented in a new RISC-V specific pass called
>> shorten_memrefs which is enabled for RVC targets. It has been developed for 
>> the
>> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in 
>> future.
>>
>> The patch saves 164 bytes (0.3%) on a proprietary application (59450 bytes
>> compared to 59286 bytes) compiled for rv32imc bare metal with -Os. On the
>> Embench benchmark suite (https://www.embench.org/) we see code size 
>> reductions
>> of up to 18 bytes (0.7%) and only two cases where code size is increased
>> slightly, by 2 bytes each:
>>
>> Embench results (.text size in bytes, excluding .rodata)
>>
>> Benchmark       Without patch  With patch  Diff
>> aha-mont64       1052           1052        0
>> crc32            232            232         0
>> cubic            2446           2448        2
>> edn              1454           1450        -4
>> huffbench        1642           1642        0
>> matmult-int      420            420         0
>> minver           1056           1056        0
>> nbody            714            714         0
>> nettle-aes       2888           2884        -4
>> nettle-sha256    5566           5564        -2
>> nsichneu         15052          15052       0
>> picojpeg         8078           8078        0
>> qrduino          6140           6140        0
>> sglib-combined   2444           2444        0
>> slre             2438           2420        -18
>> st               880            880         0
>> statemate        3842           3842        0
>> ud               702            702         0
>> wikisort         4278           4280        2
>> -----------------------------------------------
>> Total            61324          61300       -24
>>
>> The patch has been tested on the following bare metal targets using QEMU
>> and there were no regressions:
>>
>>   rv32i
>>   rv32iac
>>   rv32im
>>   rv32imac
>>   rv32imafc
>>   rv64imac
>>   rv64imafdc
>>
>> We noticed that sched2 undoes some of the addresses generated by this
>> optimization and consequently increases code size, therefore this patch adds 
>> a
>> check in sched-deps.c to avoid changes that are expected to increase code 
>> size
>> when not optimizing for speed. Since this change touches target-independent
>> code, the patch has been bootstrapped and tested on x86 with no regressions.
>>


>> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
>> index 52db3cc..92a0893 100644
>> --- a/gcc/sched-deps.c
>> +++ b/gcc/sched-deps.c
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "sched-int.h"
>>  #include "params.h"
>>  #include "cselib.h"
>> +#include "predict.h"
>>
>>  #ifdef INSN_SCHEDULING
>>
>> @@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx 
>> new_addr)
>>rtx mem = *mii->mem_loc;
>>rtx new_mem;
>>
>> +  /* When not optimizing for speed, avoid changes that are expected to make 
>> code
>> + size larger.  */
>> +  addr_space_t as = MEM_ADDR_SPACE (mem);
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
>> +  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
>> +  int new_cost = address_cost (new_addr, GET_MODE 

[PING^3][PATCH 3/3][DejaGNU] target: Wrap linker flags into `-largs'/`-margs' for Ada

2019-10-25 Thread Maciej W. Rozycki
On Tue, 21 May 2019, Jacob Bachmeyer wrote:

> >  IOW I don't discourage you from developing a comprehensive solution, 
> > however applying my proposal right away will help at least some people and 
> > will not block you in any way.
> >   
> 
> Correct, although, considering how long my FSF paperwork took, I might 
> be able to finish a comprehensive patch before your paperwork is 
> completed.  :-)

 So by now the FSF paperwork has been long completed actually.

> >  You are welcome to go ahead with your effort as far as I am concerned.
> >   
> 
> I am working on it.  :-)

 Hopefully you have made good progress now.  Otherwise this is a ping for:



Once complete your change can go on top of mine and meanwhile we'll have 
a working GCC test suite.

  Maciej


[PATCH] Use implicitly-defined copy operations for test iterators

2019-10-25 Thread Jonathan Wakely

All of these special member functions do exactly what the compiler would
do anyway. By defining them as defaulted for C++11 and later we prevent
move constructors and move assignment operators being defined (which is
consistent with the previous semantics).

Also move default init of the input_iterator_wrapper members from the
derived constructor to the protected base constructor.

* testsuite/util/testsuite_iterators.h (output_iterator_wrapper)
(input_iterator_wrapper, forward_iterator_wrapper)
(bidirectional_iterator_wrapper, random_access_iterator_wrapper): Remove
user-provided copy constructors and copy assignment operators so they
are defined implicitly.
(input_iterator_wrapper): Initialize members in default constructor.
(forward_iterator_wrapper): Remove assignments to members of base.

Tested powerpc64le-linux, committed to trunk.

commit b44b241cab8af8af5d5ff6299c0409f4be2e7bbc
Author: Jonathan Wakely 
Date:   Thu Oct 24 22:00:27 2019 +0100

Use implicitly-defined copy operations for test iterators

All of these special member functions do exactly what the compiler would
do anyway. By defining them as defaulted for C++11 and later we prevent
move constructors and move assignment operators being defined (which is
consistent with the previous semantics).

Also move default init of the input_iterator_wrapper members from the
derived constructor to the protected base constructor.

* testsuite/util/testsuite_iterators.h (output_iterator_wrapper)
(input_iterator_wrapper, forward_iterator_wrapper)
(bidirectional_iterator_wrapper, random_access_iterator_wrapper): Remove
user-provided copy constructors and copy assignment operators so they
are defined implicitly.
(input_iterator_wrapper): Initialize members in default constructor.
(forward_iterator_wrapper): Remove assignments to members of base.

diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h 
b/libstdc++-v3/testsuite/util/testsuite_iterators.h
index 42e42740df9..d9a35622fb7 100644
--- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
+++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
@@ -132,9 +132,14 @@ namespace __gnu_test
   ITERATOR_VERIFY(ptr >= SharedInfo->first && ptr <= SharedInfo->last);
 }
 
-output_iterator_wrapper(const output_iterator_wrapper& in)
-: ptr(in.ptr), SharedInfo(in.SharedInfo)
-{ }
+#if __cplusplus >= 201103L
+output_iterator_wrapper() = delete;
+
+output_iterator_wrapper(const output_iterator_wrapper&) = default;
+
+output_iterator_wrapper&
+operator=(const output_iterator_wrapper&) = default;
+#endif
 
 WritableObject
 operator*() const
@@ -144,14 +149,6 @@ namespace __gnu_test
   return WritableObject(ptr, SharedInfo);
 }
 
-output_iterator_wrapper&
-operator=(const output_iterator_wrapper& in)
-{
-  ptr = in.ptr;
-  SharedInfo = in.SharedInfo;
-  return *this;
-}
-
 output_iterator_wrapper&
 operator++()
 {
@@ -203,7 +200,7 @@ namespace __gnu_test
 std::ptrdiff_t, T*, T&>
   {
   protected:
-input_iterator_wrapper()
+input_iterator_wrapper() : ptr(0), SharedInfo(0)
 { }
 
   public:
@@ -215,9 +212,12 @@ namespace __gnu_test
 : ptr(_ptr), SharedInfo(SharedInfo_in)
 { ITERATOR_VERIFY(ptr >= SharedInfo->first && ptr <= SharedInfo->last); }
 
-input_iterator_wrapper(const input_iterator_wrapper& in)
-: ptr(in.ptr), SharedInfo(in.SharedInfo)
-{ }
+#if __cplusplus >= 201103L
+input_iterator_wrapper(const input_iterator_wrapper&) = default;
+
+input_iterator_wrapper&
+operator=(const input_iterator_wrapper&) = default;
+#endif
 
 bool
 operator==(const input_iterator_wrapper& in) const
@@ -247,14 +247,6 @@ namespace __gnu_test
   return &**this;
 }
 
-input_iterator_wrapper&
-operator=(const input_iterator_wrapper& in)
-{
-  ptr = in.ptr;
-  SharedInfo = in.SharedInfo;
-  return *this;
-}
-
 input_iterator_wrapper&
 operator++()
 {
@@ -298,19 +290,20 @@ namespace __gnu_test
   {
 typedef BoundsContainer ContainerType;
 typedef std::forward_iterator_tag iterator_category;
+
 forward_iterator_wrapper(T* _ptr, ContainerType* SharedInfo_in)
 : input_iterator_wrapper(_ptr, SharedInfo_in)
 { }
 
-forward_iterator_wrapper(const forward_iterator_wrapper& in)
-: input_iterator_wrapper(in)
+forward_iterator_wrapper()
 { }
 
-forward_iterator_wrapper()
-{
-  this->ptr = 0;
-  this->SharedInfo = 0;
-}
+#if __cplusplus >= 201103L
+forward_iterator_wrapper(const forward_iterator_wrapper&) = default;
+
+forward_iterator_wrapper&
+operator=(const forward_iterator_wrapper&) = default;
+#endif
 
 T&
 operator*() const
@@ -352,24 +345,22 @@ namespace __gnu_test
   

Re: [Patch, GCC]Backporting r269039 to gcc8

2019-10-25 Thread Delia Burduv
Hello Jeff,

Yes, it is a backport to gcc-8. No, I don't have commit access. Could you 
please commit it for me?

Thanks,
Delia

From: Jeff Law 
Sent: 04 October 2019 22:27
To: Delia Burduv ; gcc-patches@gcc.gnu.org 

Cc: nd ; i...@airs.com ; rguent...@suse.de 

Subject: Re: [Patch, GCC]Backporting r269039 to gcc8

On 10/4/19 9:11 AM, Delia Burduv wrote:
> Ping. Has anyone had a look at the patch? Please let me know if it is fine.
I think it's fine to backport to the gcc-8 branch.  Do you have commit
access?

jeff


Re: [Patch][Fortran] OpenMP – libgomp/testsuite – use 'stop' and 'dg-do run'

2019-10-25 Thread Steve Kargl
On Fri, Oct 25, 2019 at 06:17:26PM +0200, Tobias Burnus wrote:
> This patch is about: libgomp/testsuite/libgomp.fortran/, only
> 
> The two test cases I added recently use 'call abort()', which is 
> nowadays frowned on as that's a vendor extension. Hence, I change it to
> 'stop'.
> 
> Additionally, the 'fortran.exp' in the directory states: "For Fortran 
> we're doing torture testing, as Fortran has far more tests with arrays 
> etc. that testing just -O0 or -O2 is insufficient, that is typically not 
> the case for C/C++."
> 
> The torture testing is only done if there is a "dg-do run"; without 
> dg-do, it also does run, but only with a single compile flag setting.
> 
> I have only selectively added it; I think 'dg-do run' does not make 
> sense for:
> * condinc*.f – only uses '!$ include'
> * omp_cond*.f* – only tests '!$' code, including comments.
> Hence, I excluded those and only changed the others. (However, one can 
> still argue about the remaining ones – such as 'omp_hello.f' or tabs*.f*.)
> 
> OK for the trunk? Add/remove for 'dg-do run' from additional test cases?
> 
> Tobias
> 

I think this patch and the openacc patch are fine.

With openmp and openacc, I usually defer to Jakub, Thomas,
and Cesar for their expertise with these standards.  I do,
however, recognize that everyone has limited amounts of time.
If you have openmp/openacc patches I can cast an eye over
them for formatting issues; otherwise, I'll leave it to your
discretion on what a reasonable review timeout is prior to
committing a change. 



Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-10-25 Thread Andre Vieira (lists)

Hi,

This is the reworked patch after your comments.

I have moved the epilogue check into the analysis, disguised under
'!epilogue_vinfos.is_empty ()'.  This is because I realized that I am doing
the "lowest threshold" check there.


The only place where we may reject an epilogue_vinfo is when we know the
number of scalar iterations and realize that the number of iterations left
after the main loop is not enough to enter the vectorized epilogue, so
we optimize away that code-gen.  The only way we know this to be true is
if both the number of scalar iterations and the peeling for alignment are
known.  In that case we know we will enter the main loop regardless, so
whether the threshold we use is for a lower VF or not shouldn't matter
much.  I would even like to think that check isn't done, but I am not
sure... Might be worth checking as an optimization.



Is this OK for trunk?

gcc/ChangeLog:
2019-10-25  Andre Vieira  

PR 88915
* tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration.
* tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter
and make the valueize function pointer also take a void pointer.
* tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap
around vn_valueize, to call it without a context.
(process_bb): Use vn_valueize_wrapper instead of vn_valueize.
* tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos.
(~_loop_vec_info): Release epilogue_vinfos.
(vect_analyze_loop_costing): Use knowledge of main VF to estimate
number of iterations of epilogue.
(vect_analyze_loop_2): Adapt to analyse main loop for all supported
vector sizes when vect-epilogues-nomask=1.  Also keep track of lowest
versioning threshold needed for main loop.
(vect_analyze_loop): Likewise.
(find_in_mapping): New helper function.
(update_epilogue_loop_vinfo): New function.
(vect_transform_loop): When vectorizing epilogues re-use analysis done
on main loop and call update_epilogue_loop_vinfo to update it.
* tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert
stmts on loop preheader edge.
(vect_do_peeling): Enable skip-vectors when doing loop versioning if
we decided to vectorize epilogues.  Update epilogues NITERS and
construct ADVANCE to update epilogues data references where needed.
* tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos.
(vect_do_peeling, vect_update_inits_of_drs, determine_peel_for_niter,
vect_analyze_loop): Add or update declarations.

* tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
created loop_vec_info's for epilogues when available.  Otherwise
analyse the epilogue separately.



Cheers,
Andre
diff --git a/gcc/tree-ssa-loop-niter.h b/gcc/tree-ssa-loop-niter.h
index 4454c1ac78e02228047511a9e0214c82946855b8..aec6225125ce42ab0e4dbc930fc1a93862e6e267 100644
--- a/gcc/tree-ssa-loop-niter.h
+++ b/gcc/tree-ssa-loop-niter.h
@@ -53,7 +53,9 @@ extern bool scev_probably_wraps_p (tree, tree, tree, gimple *,
    class loop *, bool);
 extern void free_numbers_of_iterations_estimates (class loop *);
 extern void free_numbers_of_iterations_estimates (function *);
-extern tree simplify_replace_tree (tree, tree, tree, tree (*)(tree) = NULL);
+extern tree simplify_replace_tree (tree, tree,
+   tree, tree (*)(tree, void *) = NULL,
+   void * = NULL);
 extern void substitute_in_loop_info (class loop *, tree, tree);
 
 #endif /* GCC_TREE_SSA_LOOP_NITER_H */
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index cd2ced369719c37afd4aac08ff360719d7702e42..db666f019808850ed3a4aeef1a454a7ae2c65ef2 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1935,7 +1935,7 @@ number_of_iterations_cond (class loop *loop,
 
 tree
 simplify_replace_tree (tree expr, tree old, tree new_tree,
-		   tree (*valueize) (tree))
+		   tree (*valueize) (tree, void*), void *context)
 {
   unsigned i, n;
   tree ret = NULL_TREE, e, se;
@@ -1951,7 +1951,7 @@ simplify_replace_tree (tree expr, tree old, tree new_tree,
 {
   if (TREE_CODE (expr) == SSA_NAME)
 	{
-	  new_tree = valueize (expr);
+	  new_tree = valueize (expr, context);
 	  if (new_tree != expr)
 	return new_tree;
 	}
@@ -1967,7 +1967,7 @@ simplify_replace_tree (tree expr, tree old, tree new_tree,
   for (i = 0; i < n; i++)
 {
   e = TREE_OPERAND (expr, i);
-  se = simplify_replace_tree (e, old, new_tree, valueize);
+  se = simplify_replace_tree (e, old, new_tree, valueize, context);
   if (e == se)
 	continue;
 
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 57331ab44dc78c16d97065cd28e8c4cdcbf8d96e..0abe3fb8453ecf2e25ff55c5c9846663f68f7c8c 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -309,6 +309,10 @@ static vn_tables_t valid_info;
 /* Valueization hook.  Valueize NAME if it is an SSA name, otherwise
just return it.  */
 tree (*vn_valueize) (tree);

Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-10-25 Thread Andre Vieira (lists)




On 22/10/2019 14:56, Richard Biener wrote:

On Tue, 22 Oct 2019, Andre Vieira (lists) wrote:


Hi Richi,

See inline responses to your comments.

On 11/10/2019 13:57, Richard Biener wrote:

On Thu, 10 Oct 2019, Andre Vieira (lists) wrote:


Hi,





+
+  /* Keep track of vector sizes we know we can vectorize the epilogue
with.  */
+  vector_sizes epilogue_vsizes;
   };

please don't enlarge struct loop, instead track this somewhere
in the vectorizer (in loop_vinfo?  I see you already have
epilogue_vinfos there - so the loop_vinfo simply lacks
convenient access to the vector_size?)  I don't see any
use that could be trivially adjusted to look at a loop_vinfo
member instead.


Done.


For the vect_update_inits_of_drs this means that we'd possibly
do less CSE.  Not sure if really an issue.


CSE of what exactly? You are afraid we are repeating a calculation here we
have done elsewhere before?


All uses of those inits now possibly get the expression instead of
just the SSA name we inserted code for once.  But as said, we'll see.



This code changed after some comments from Richard Sandiford.


+  /* We are done vectorizing the main loop, so now we update the
epilogues
+ stmt_vec_info's.  At the same time we set the gimple UID of each
+ statement in the epilogue, as these are used to look them up in the
+ epilogues loop_vec_info later.  We also keep track of what
+ stmt_vec_info's have PATTERN_DEF_SEQ's and RELATED_STMT's that might
+ need updating and we construct a mapping between variables defined
in
+ the main loop and their corresponding names in epilogue.  */
+  for (unsigned i = 0; i < epilogue->num_nodes; ++i)

so for the following code I wonder if you can make use of the
fact that loop copying also copies UIDs, so you should be able
to match stmts via their UIDs and get at the other loop infos
stmt_info by the copy loop stmt UID.

I wonder why you need no modification for the SLP tree?

I checked with Tamar and the SLP tree works with the position of 
operands and not SSA_NAMES.  So we should be fine.


Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-10-25 Thread Andre Vieira (lists)




On 22/10/2019 18:52, Richard Sandiford wrote:

Thanks for doing this.  Hope this message doesn't cover too much old
ground or duplicate too much...

"Andre Vieira (lists)"  writes:

@@ -2466,15 +2476,65 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
else
  niters_prolog = build_int_cst (type, 0);
  
+  loop_vec_info epilogue_vinfo = NULL;

+  if (vect_epilogues)
+{
+  /* Take the next epilogue_vinfo to vectorize for.  */
+  epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
+  loop_vinfo->epilogue_vinfos.ordered_remove (0);
+
+  /* Don't vectorize epilogues if this is not the most inner loop or if
+the epilogue may need peeling for alignment as the vectorizer doesn't
+know how to handle these situations properly yet.  */
+  if (loop->inner != NULL
+ || LOOP_VINFO_PEELING_FOR_ALIGNMENT (epilogue_vinfo))
+   vect_epilogues = false;
+
+}


Nit: excess blank line before "}".  Sorry if this was discussed before,
but what's the reason for delaying the check for "loop->inner" to
this point, rather than doing it in vect_analyze_loop?


Done.



+
+  tree niters_vector_mult_vf;
+  unsigned int lowest_vf = constant_lower_bound (vf);
+  /* Note LOOP_VINFO_NITERS_KNOWN_P and LOOP_VINFO_INT_NITERS work
+ on niters already ajusted for the iterations of the prologue.  */


Pre-existing typo: adjusted.  But...


+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+  && known_eq (vf, lowest_vf))
+{
+  loop_vec_info orig_loop_vinfo;
+  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
+   orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
+  else
+   orig_loop_vinfo = loop_vinfo;
+  vector_sizes vector_sizes = LOOP_VINFO_EPILOGUE_SIZES (orig_loop_vinfo);
+  unsigned next_size = 0;
+  unsigned HOST_WIDE_INT eiters
+   = (LOOP_VINFO_INT_NITERS (loop_vinfo)
+  - LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
+
+  if (prolog_peeling > 0)
+   eiters -= prolog_peeling;


...is that comment still true?  We're now subtracting the peeling
amount here.


It is not, "adjusted" the comment ;)


Might be worth asserting prolog_peeling >= 0, just to emphasise
that we can't get here for variable peeling amounts, and then subtract
prolog_peeling unconditionally (assuming that's the right thing to do).

Can't assert as LOOP_VINFO_NITERS_KNOWN_P can be true even with 
prolog_peeling < 0, since we still know the constant number of scalar 
iterations, we just don't know how many vector iterations will be 
performed due to the runtime peeling. I will however, not reject 
vectorizing the epilogue, when we don't know how much we are peeling.

+  eiters
+   = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
+
+  unsigned int ratio;
+  while (next_size < vector_sizes.length ()
+&& !(constant_multiple_p (current_vector_size,
+  vector_sizes[next_size], )
+ && eiters >= lowest_vf / ratio))
+   next_size += 1;
+
+  if (next_size == vector_sizes.length ())
+   vect_epilogues = false;
+}
+
/* Prolog loop may be skipped.  */
bool skip_prolog = (prolog_peeling != 0);
/* Skip to epilog if scalar loop may be preferred.  It's only needed
- when we peel for epilog loop and when it hasn't been checked with
- loop versioning.  */
+ when we peel for epilog loop or when we loop version.  */
bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
  ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
  bound_prolog + bound_epilog)
- : !LOOP_REQUIRES_VERSIONING (loop_vinfo));
+ : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
+|| vect_epilogues));


The comment update looks wrong here: without epilogues, we don't need
the skip when loop versioning, because loop versioning ensures that we
have at least one vector iteration.

(I think "it" was supposed to mean "skipping to the epilogue" rather
than the epilogue loop itself, in case that's the confusion.)

It'd be good to mention the epilogue condition in the comment too.



Rewrote comment, hopefully this now better reflects reality.


+
+  if (vect_epilogues)
+{
+  epilog->aux = epilogue_vinfo;
+  LOOP_VINFO_LOOP (epilogue_vinfo) = epilog;
+
+  loop_constraint_clear (epilog, LOOP_C_INFINITE);
+
+  /* We now must calculate the number of iterations for our epilogue.  */
+  tree cond_niters, niters;
+
+  /* Depending on whether we peel for gaps we take niters or niters - 1,
+we will refer to this as N - G, where N and G are the NITERS and
+GAP for the original loop.  */
+  niters = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+   ? LOOP_VINFO_NITERSM1 (loop_vinfo)
+   : LOOP_VINFO_NITERS (loop_vinfo);
+
+  /* Here we build a vector factorization mask:
+vf_mask = ~(VF - 1), 

[Patch][Fortran] OpenMP – libgomp/testsuite – use 'stop' and 'dg-do run'

2019-10-25 Thread Tobias Burnus

This patch is about: libgomp/testsuite/libgomp.fortran/, only

The two test cases I added recently use 'call abort()', which is 
nowadays frowned on as that's a vendor extension. Hence, I change it to
'stop'.


Additionally, the 'fortran.exp' in the directory states: "For Fortran 
we're doing torture testing, as Fortran has far more tests with arrays 
etc. that testing just -O0 or -O2 is insufficient, that is typically not 
the case for C/C++."


The torture testing is only done if there is a "dg-do run"; without 
dg-do, it also does run, but only with a single compile flag setting.


I have only selectively added it; I think 'dg-do run' does not make 
sense for:

* condinc*.f – only uses '!$ include'
* omp_cond*.f* – only tests '!$' code, including comments.
Hence, I excluded those and only changed the others. (However, one can 
still argue about the remaining ones – such as 'omp_hello.f' or tabs*.f*.)


OK for the trunk? Add/remove for 'dg-do run' from additional test cases?

Tobias

	libgomp/
	* testsuite/libgomp.fortran/target-simd.f90: Use stop not abort.
	* testsuite/libgomp.fortran/use_device_ptr-optional-1.f90:
	Ditto; add 'dg-do run' for torture testing.
	* testsuite/libgomp.fortran/display-affinity-1.f90: Add 'dg-do run'.
	* testsuite/libgomp.fortran/lastprivate1.f90: Ditto.
	* testsuite/libgomp.fortran/lastprivate2.f90: Ditto.
	* testsuite/libgomp.fortran/nestedfn4.f90: Ditto.
	* testsuite/libgomp.fortran/omp_hello.f: Ditto.
	* testsuite/libgomp.fortran/omp_orphan.f: Ditto.
	* testsuite/libgomp.fortran/omp_reduction.f: Ditto.
	* testsuite/libgomp.fortran/omp_workshare1.f: Ditto.
	* testsuite/libgomp.fortran/omp_workshare2.f: Ditto.
	* testsuite/libgomp.fortran/pr25219.f90: Ditto.
	* testsuite/libgomp.fortran/pr28390.f: Ditto.
	* testsuite/libgomp.fortran/pr35130.f90: Ditto.
	* testsuite/libgomp.fortran/pr90779.f90: Ditto.
	* testsuite/libgomp.fortran/strassen.f90: Ditto.
	* testsuite/libgomp.fortran/task2.f90: Ditto.
	* testsuite/libgomp.fortran/taskgroup1.f90: Ditto.
	* testsuite/libgomp.fortran/taskloop1.f90: Ditto.
	* testsuite/libgomp.fortran/use_device_addr-1.f90: Ditto.
	* testsuite/libgomp.fortran/use_device_addr-2.f90: Ditto.
	* testsuite/libgomp.fortran/workshare1.f90: Ditto.
	* testsuite/libgomp.fortran/workshare2.f90: Ditto.

diff --git a/libgomp/testsuite/libgomp.fortran/display-affinity-1.f90 b/libgomp/testsuite/libgomp.fortran/display-affinity-1.f90
index 4811b6f801b..4e732e5427f 100644
--- a/libgomp/testsuite/libgomp.fortran/display-affinity-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/display-affinity-1.f90
@@ -0,0 +1 @@
+! { dg-do run }
diff --git a/libgomp/testsuite/libgomp.fortran/lastprivate1.f90 b/libgomp/testsuite/libgomp.fortran/lastprivate1.f90
index 132617b5c27..62a5ef9d96c 100644
--- a/libgomp/testsuite/libgomp.fortran/lastprivate1.f90
+++ b/libgomp/testsuite/libgomp.fortran/lastprivate1.f90
@@ -0,0 +1 @@
+! { dg-do run }
diff --git a/libgomp/testsuite/libgomp.fortran/lastprivate2.f90 b/libgomp/testsuite/libgomp.fortran/lastprivate2.f90
index 6cd5760206c..97b6007e1ef 100644
--- a/libgomp/testsuite/libgomp.fortran/lastprivate2.f90
+++ b/libgomp/testsuite/libgomp.fortran/lastprivate2.f90
@@ -0,0 +1 @@
+! { dg-do run }
diff --git a/libgomp/testsuite/libgomp.fortran/nestedfn4.f90 b/libgomp/testsuite/libgomp.fortran/nestedfn4.f90
index bc8614a340a..6143bfb283c 100644
--- a/libgomp/testsuite/libgomp.fortran/nestedfn4.f90
+++ b/libgomp/testsuite/libgomp.fortran/nestedfn4.f90
@@ -0,0 +1 @@
+! { dg-do run }
diff --git a/libgomp/testsuite/libgomp.fortran/omp_hello.f b/libgomp/testsuite/libgomp.fortran/omp_hello.f
index ba445312625..0e795482dcf 100644
--- a/libgomp/testsuite/libgomp.fortran/omp_hello.f
+++ b/libgomp/testsuite/libgomp.fortran/omp_hello.f
@@ -0,0 +1 @@
+C { dg-do run }
diff --git a/libgomp/testsuite/libgomp.fortran/omp_orphan.f b/libgomp/testsuite/libgomp.fortran/omp_orphan.f
index 7653c78d2e4..306ed4eb64d 100644
--- a/libgomp/testsuite/libgomp.fortran/omp_orphan.f
+++ b/libgomp/testsuite/libgomp.fortran/omp_orphan.f
@@ -0,0 +1 @@
+C { dg-do run }
diff --git a/libgomp/testsuite/libgomp.fortran/omp_reduction.f b/libgomp/testsuite/libgomp.fortran/omp_reduction.f
index 0560bd8963d..222f20d010a 100644
--- a/libgomp/testsuite/libgomp.fortran/omp_reduction.f
+++ b/libgomp/testsuite/libgomp.fortran/omp_reduction.f
@@ -0,0 +1 @@
+C { dg-do run }
diff --git a/libgomp/testsuite/libgomp.fortran/omp_workshare1.f b/libgomp/testsuite/libgomp.fortran/omp_workshare1.f
index 8aef69406de..6fb8aad3ad0 100644
--- a/libgomp/testsuite/libgomp.fortran/omp_workshare1.f
+++ b/libgomp/testsuite/libgomp.fortran/omp_workshare1.f
@@ -0,0 +1 @@
+C { dg-do run }
diff --git a/libgomp/testsuite/libgomp.fortran/omp_workshare2.f b/libgomp/testsuite/libgomp.fortran/omp_workshare2.f
index 9e61da91e9b..3280dca9e6a 100644
--- a/libgomp/testsuite/libgomp.fortran/omp_workshare2.f
+++ b/libgomp/testsuite/libgomp.fortran/omp_workshare2.f
@@ -0,0 +1 @@
+C { dg-do run }
diff --git 

Re: [PATCH,Fortran] Taking a BYTE out of type-spec

2019-10-25 Thread Steve Kargl
On Fri, Oct 25, 2019 at 09:05:03AM +0200, Tobias Burnus wrote:
> On 10/24/19 10:43 PM, Steve Kargl wrote:
> > The patch moves the matching of the nonstandard type-spec
> > BYTE to its own matching function.  During this move, a
> > check for invalid matching in free-form source code was
> > detected (see byte_4.f90).  OK to commit?
> 
> OK with a nit.
> 
> > +  if (gfc_current_form == FORM_FREE)
> > +   {
> > + char c = gfc_peek_ascii_char ();
> > + if (!gfc_is_whitespace (c) && c != ',')
> > +   return MATCH_NO;
> 
> You also want to permit "byte::var", hence: c == ':' is also okay – you 
> can also add this as a variant to the test case.
> 
> Cheers,

Thanks and good catch.  I tend to think of BYTE as a legacy 
extension, which likely was used prior to the double-colon
notation, so 'byte ::' won't appear.  But it doesn't hurt to
allow it.
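The free-form lookahead under discussion can be sketched outside the compiler; gfc_peek_ascii_char and the rest of the matcher are simplified away, and byte_typespec_ok is a hypothetical name, not gfortran's:

```cpp
#include <cctype>

// Sketch of the free-form check discussed above (names hypothetical):
// after matching the keyword BYTE, only whitespace, ',' or, per the
// review, ':' (for "byte::var") may follow; anything else means the
// matcher should return MATCH_NO.
bool byte_typespec_ok(char next) {
  return std::isspace(static_cast<unsigned char>(next))
         || next == ',' || next == ':';
}
```

The ':' case is exactly the nit from the review: without it, valid code like `byte::var` would be rejected.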

-- 
Steve


[Patch][Fortran] OpenACC – libgomp/testsuite – use 'stop' and 'dg-do run'

2019-10-25 Thread Tobias Burnus
This patch is for libgomp/testsuite/libgomp.oacc-fortran/ and does two 
things:


* It changes 'call abort' to 'stop 1' (etc.) to make Bernhard R.-F. 
happy, as the tests then don't use a vendor extension.


[Note: I kept the abort call in abort-{1,2}.f90. [NB: stop*.f also 
exists.] And: 'call abort()' calls libc's 'abort()' function, while 
'stop' prints 'STOP ' plus the number to stderr and uses the number as 
the exit() code. (On some systems, such as Windows, certain exit codes 
still imply aborting.)]
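The runtime difference described in the note can be modelled in a few lines (a simplified sketch; fortran_stop_code is an illustrative name, not part of any runtime library):

```cpp
#include <string>

// 'call abort' invokes libc abort() and raises SIGABRT, producing a
// core dump on many systems.  'stop <n>' instead prints "STOP <n>" to
// stderr and uses <n> as the process exit status.  This models only
// the stop side.
int fortran_stop_code(int n, std::string& message) {
  message = "STOP " + std::to_string(n);  // what reaches stderr
  return n;                               // value handed to exit()
}
```

This is why 'stop 1' is the portable way for a test to signal failure: the harness sees a nonzero exit status without relying on the vendor abort extension.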


* It adds '{ dg-do run }'. While the default in this directory is to _run_ 
the program, the explicit 'dg-do run' turns it into a torture test.


Quoting ./fortran.exp: "For Fortran we're doing torture testing, as 
Fortran has far more tests with arrays etc. that testing just -O0 or -O2 
is insufficient, that is typically not the case for C/C++."



Regtested on x86-64-gnu-linux without failures.
I intend to commit it as obvious in the next few days, but I am happy to get 
an explicit approval – or to get some comments.


Cheers,

Tobias

	libgomp/
	* testsuite/libgomp.oacc-fortran/abort-1.f90: Add 'dg-do run'.
	* testsuite/libgomp.oacc-fortran/abort-2.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/lib-1.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/common-block-1.f90:
	Use 'stop' not abort().
	* testsuite/libgomp.oacc-fortran/common-block-2.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/common-block-3.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/data-1.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/data-2.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/data-5.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/dummy-array.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/gemm-2.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/gemm.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/host_data-2.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/host_data-3.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/host_data-4.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-collapse-3.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-collapse-4.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-independent.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-map-1.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-vector-1.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-vector-2.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-worker-1.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-worker-2.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-worker-3.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-worker-4.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-worker-5.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-worker-6.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-worker-7.f90:
	Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/lib-12.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/lib-13.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/lib-14.f90: Ditto.
	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
	Likewise and also add 'dg-do run'.
	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90:
	Ditto.

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
index fc0af7ff7d8..70c05d7d3c1 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
@@ -0,0 +1,2 @@
+! { dg-do run }
+!
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
index 97a692ba667..6671d46d8b8 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
@@ -0,0 +1,2 @@
+! { dg-do run }
+!
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
index d6c67a0c31a..1a8432cfa86 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
@@ -0,0 +1 @@
+! { dg-do run }
@@ -1,0 +3 @@
+!
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f 

[PATCH] PR85678: Change default to -fno-common

2019-10-25 Thread Wilco Dijkstra
GCC currently defaults to -fcommon.  As discussed in the PR, this is an ancient
C feature which does not conform to the latest C standards.  On many targets
this means global variable accesses carry a codesize and performance penalty.
This applies to C code only; C++ code is not affected by -fcommon.  It is about
time to change the default.

OK for commit?
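For contrast, the discipline that -fno-common enforces on C globals is the one C++ always requires: exactly one definition, everything else extern (a sketch; shared_counter and bump are illustrative names):

```cpp
// C++ has no tentative definitions: storage is allocated exactly where
// the single definition appears, and every other translation unit must
// use 'extern'.  -fno-common makes C behave the same way; with
// -fcommon, a C global spelled 'int shared_counter;' in several TUs
// would instead be merged into one object by the linker.
extern int shared_counter;   // declaration only: no storage allocated
int shared_counter = 0;      // the one definition: storage lives here

int bump() { return ++shared_counter; }
```

Under -fno-common, accidentally writing the defining form in two C files becomes a multiple-definition link error instead of silent merging, which is the diagnostic benefit the patch text describes.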

ChangeLog
2019-10-25  Wilco Dijkstra  

PR85678
* common.opt (fcommon): Change init to 1.

doc/
* invoke.texi (-fcommon): Update documentation.
---

diff --git a/gcc/common.opt b/gcc/common.opt
index 
0195b0cb85a06dd043fd0412b42dfffddfa2495b..b0840f41a5e480f4428bd62724b0dc3d54c68c0b
 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1131,7 +1131,7 @@ Common Report Var(flag_combine_stack_adjustments) 
Optimization
 Looks for opportunities to reduce stack adjustments and stack references.
 
 fcommon
-Common Report Var(flag_no_common,0)
+Common Report Var(flag_no_common,0) Init(1)
 Put uninitialized globals in the common section.
 
 fcompare-debug
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
857d9692729e503657d0d0f44f1f6252ec90d49a..5b4ff66015f5f94a5bd89e4dc3d2d53553cc091e
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -568,7 +568,7 @@ Objective-C and Objective-C++ Dialects}.
 -fnon-call-exceptions  -fdelete-dead-exceptions  -funwind-tables @gol
 -fasynchronous-unwind-tables @gol
 -fno-gnu-unique @gol
--finhibit-size-directive  -fno-common  -fno-ident @gol
+-finhibit-size-directive  -fcommon  -fno-ident @gol
 -fpcc-struct-return  -fpic  -fPIC  -fpie  -fPIE  -fno-plt @gol
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
@@ -14050,35 +14050,27 @@ useful for building programs to run under WINE@.
 code that is not binary compatible with code generated without that switch.
 Use it to conform to a non-default application binary interface.
 
-@item -fno-common
-@opindex fno-common
+@item -fcommon
 @opindex fcommon
+@opindex fno-common
 @cindex tentative definitions
-In C code, this option controls the placement of global variables 
-defined without an initializer, known as @dfn{tentative definitions} 
-in the C standard.  Tentative definitions are distinct from declarations 
+In C code, this option controls the placement of global variables
+defined without an initializer, known as @dfn{tentative definitions}
+in the C standard.  Tentative definitions are distinct from declarations
 of a variable with the @code{extern} keyword, which do not allocate storage.
 
-Unix C compilers have traditionally allocated storage for
-uninitialized global variables in a common block.  This allows the
-linker to resolve all tentative definitions of the same variable
+The default is @option{-fno-common}, which specifies that the compiler places
+uninitialized global variables in the BSS section of the object file.
+This inhibits the merging of tentative definitions by the linker so you get a
+multiple-definition error if the same variable is accidentally defined in more
+than one compilation unit.
+
+The @option{-fcommon} option places uninitialized global variables in a common block.
+This allows the linker to resolve all tentative definitions of the same 
variable
 in different compilation units to the same object, or to a non-tentative
-definition.  
-This is the behavior specified by @option{-fcommon}, and is the default for 
-GCC on most targets.  
-On the other hand, this behavior is not required by ISO
-C, and on some targets may carry a speed or code size penalty on
-variable references.
-
-The @option{-fno-common} option specifies that the compiler should instead
-place uninitialized global variables in the BSS section of the object file.
-This inhibits the merging of tentative definitions by the linker so
-you get a multiple-definition error if the same 
-variable is defined in more than one compilation unit.
-Compiling with @option{-fno-common} is useful on targets for which
-it provides better performance, or if you wish to verify that the
-program will work on other systems that always treat uninitialized
-variable definitions this way.
+definition.  This behavior does not conform to ISO C, is inconsistent with C++,
+and on many targets implies a speed and code size penalty on global variable
+references.  It is mainly useful to enable legacy code to link without errors.
 
 @item -fno-ident
 @opindex fno-ident


Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval

2019-10-25 Thread Tobias Burnus

Hi Steve,

On 10/25/19 4:17 PM, Steve Kargl wrote:
My BOZ patch brought gfortran closer to an actual conforming Fortran 
compiler while providing an option that would allow quite a few 
documented and undocumented extensions. If the patch broke some of 
your code, and -fallow-invalid-boz did not allow the code to compile
Actually, it does allow it to compile – but I missed the option; 
-std=legacy used to be sufficient.


For the code at hand, it was actually better to nag the author to change 
the code to valid Fortran than to fiddle around with some options.


But I think that points to a problem: some compiler flags do exist, but 
they are hard to find (especially if one does not know that they 
exist). … I am not claiming that it should be done in this case, but 
sometimes something like:


"Error: BOZ literal constant at (1) is neither a DATA statement value 
nor an actual argument of INT/REAL/DBLE/CMPLX intrinsic subprogram; _for 
legacy code, -fallow-invalid-boz might be an option"_


would have surely helped.**

Side note: to me, "the … intrinsic function" is clearer than "… 
intrinsic subprogram".




and you were forced to use INT(, kind=) to get it to compile


Well, using INT() was the way in order to get the code standard 
conforming :-)


As you quoted:


C410  (R411) A boz-literal-constant shall appear only as a
   data-stmt-constant in a DATA statement, as the actual
   argument associated with the dummy argument A of the
   numeric intrinsic functions DBLE, REAL or INT, or as
   the actual argument associated with the X or Y dummy
   argument of the intrinsic function CMPLX.


Cheers,

Tobias

**This does not mean that users will always read this. (I get this 
error, what shall I do?) Nor that fixing the code would usually be 
the better option. (As in my case.)




Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval

2019-10-25 Thread Jeff Law
On 10/25/19 7:54 AM, Tobias Burnus wrote:
> Hi Jeff,
> 
> On 10/25/19 3:22 PM, Jeff Law wrote:
>> So across Fedora the BOZ stuff tripped 2-3 packages. In comparison the
>> function argument stuff broke 30-40 packages, many of which still
>> don't build without -fallow-argument-mismatch.
> 
> Regarding the latter:
> The initial patch was too strict – and also rejected valid code
> (according to the Fortran 2018 standard).
That was my understanding from loosely following the threads.


 That has been fixed.* – Thus,
> either some valid cases were missed (gfortran bug) or all those packages
> indeed have an argument mismatch.
> 
> *That fix is: 2019-10-14 / r276972 / PR fortran/92004 /
> https://gcc.gnu.org/ml/fortran/2019-10/msg00128.html
Yea.  That patch certainly helped lapack and others.

> 
> Do you know whether those 30–40 packages have code bugs or could there
> be gfortran bugs (too strict checking) lurking?
I'm not familiar enough with the issue & packages to know if they're
cases of source bugs or gfortran being too strict.

My plan has always been to extract a few cases and pass them along for
that kind of analysis.  I've just been too busy lately with other
regressions :(

A partial list of the affected packages:


R-deldir
R
atlas
cgnslib
cp2k
elk
elpa
exciting
ga
getdata
grib_api
hdf
libccp4
mpich
nwchem
psblas3
qrmumps
qrupdate
quantum-espresso
scalapack
scipy
scorep
wannier90
wsjtx
xfoil
xrotor

There's certainly more, that list just represents those I've locally
worked around with -fallow-argument-mismatch.  Several more trigger the
mismatch error, but I haven't bothered working around yet.

That list comes from _after_ the  Oct 14 patch to correct issues in the
argument mismatch testing.

> 
> 
> Regarding the BOZ: One difference to the argument mismatch is that the
> latter has an option to still accept it (-fallow-argument-mismatch) and
> potentially generates wrong code – depending what the ME does with the
> mismatches – while the former, once parsed, causes no potential ME
> problems and as there is no flag, it always requires code changes. (On
> the other hand, fixing the BOZ issue is straight forward; argument
> changes are trickier.)
Absolutely.  That's the primary reason why I haven't contacted the
affected package maintainers yet -- I don't want them blindly adding
-fallow-argument-mismatch to their flags.

Jeff



[PATCH] rs6000: Enable limited unrolling at -O2

2019-10-25 Thread Jiufu Guo
Hi,

In PR88760, there was some discussion about improving or tuning the unroller
for various targets, and we agreed to enable the unroller for small loops at
-O2 first.  We see a ~10% performance improvement for the code below:
```
  subroutine foo (i, i1, block)
integer :: i, i1
integer :: block(9, 9, 9)
block(i:9,1,i1) = block(i:9,1,i1) - 10
  end subroutine foo

```
This kind of code occurs a few times in the exchange2 benchmark.

Similar C code:
```
  for (i = 0; i < n; i++)
arr[i] = arr[i] - 10;
```
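What the unroller does to such a loop with max-unroll-times=2 can be sketched by hand (illustrative only; the compiler performs this on RTL, not in source):

```cpp
#include <cstddef>
#include <vector>

// The small loop above, as written.
void sub10(std::vector<int>& arr) {
  for (std::size_t i = 0; i < arr.size(); ++i)
    arr[i] -= 10;
}

// Roughly what unrolling by 2 produces: two iterations per pass
// (fewer branches per element), plus a scalar epilogue that handles
// a possible leftover element when the trip count is odd.
void sub10_unrolled2(std::vector<int>& arr) {
  std::size_t i = 0, n = arr.size();
  for (; i + 2 <= n; i += 2) {
    arr[i]     -= 10;
    arr[i + 1] -= 10;
  }
  for (; i < n; ++i)  // epilogue: at most one iteration here
    arr[i] -= 10;
}
```

Capping PARAM_MAX_UNROLLED_INSNS keeps this transformation restricted to small bodies like the one shown, limiting code growth at -O2.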

On powerpc64le at -O2, enabling -funroll-loops and limiting
PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20 gives a >2%
overall improvement on SPEC2017.

This patch is for rs6000 only, where we see a visible performance improvement.

Bootstrapped on powerpc64le, and the test cases are updated.  Is this OK for trunk?

Jiufu Guo
BR


gcc/
2019-10-25  Jiufu Guo   

PR tree-optimization/88760
* common/config/rs6000/rs6000-common.c (rs6000_option_optimization_table):
Enable -funroll-loops at -O2.
* config/rs6000/rs6000.c (rs6000_option_override_internal): If the
unroller is enabled through -O2, constrain PARAM_MAX_UNROLL_TIMES to 2
and PARAM_MAX_UNROLLED_INSNS to 20 for small loops.

gcc.testsuite/
2019-10-25  Jiufu Guo  

PR tree-optimization/88760
* gcc.target/powerpc/small-loop-unroll.c: New test.
* c-c++-common/tsan/thread_leak2.c: Update test.
* gcc.dg/pr59643.c: Update test.
* gcc.target/powerpc/loop_align.c: Update test.
* gcc.target/powerpc/ppc-fma-1.c: Update test.
* gcc.target/powerpc/ppc-fma-2.c: Update test.
* gcc.target/powerpc/ppc-fma-3.c: Update test.
* gcc.target/powerpc/ppc-fma-4.c: Update test.
* gcc.target/powerpc/pr78604.c: Update test.

---
 gcc/common/config/rs6000/rs6000-common.c |  1 +
 gcc/config/rs6000/rs6000.c   | 20 
 gcc/testsuite/c-c++-common/tsan/thread_leak2.c   |  1 +
 gcc/testsuite/gcc.dg/pr59643.c   |  1 +
 gcc/testsuite/gcc.target/powerpc/loop_align.c|  2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-fma-1.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-fma-2.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-fma-3.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-fma-4.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr78604.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c | 13 +
 11 files changed, 42 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/small-loop-unroll.c

diff --git a/gcc/common/config/rs6000/rs6000-common.c 
b/gcc/common/config/rs6000/rs6000-common.c
index 4b0c205..b947196 100644
--- a/gcc/common/config/rs6000/rs6000-common.c
+++ b/gcc/common/config/rs6000/rs6000-common.c
@@ -35,6 +35,7 @@ static const struct default_options 
rs6000_option_optimization_table[] =
 { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
 /* Enable -fsched-pressure for first pass instruction scheduling.  */
 { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a129137d..9a8ff9a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4540,6 +4540,26 @@ rs6000_option_override_internal (bool global_init_p)
 global_options.x_param_values,
 global_options_set.x_param_values);
 
+  /* Unroll very small loops 2 times if -funroll-loops was not given.  */
+  if (!global_options_set.x_flag_unroll_loops
+ && !global_options_set.x_flag_unroll_all_loops)
+   {
+ maybe_set_param_value (PARAM_MAX_UNROLL_TIMES, 2,
+global_options.x_param_values,
+global_options_set.x_param_values);
+
+ maybe_set_param_value (PARAM_MAX_UNROLLED_INSNS, 20,
+global_options.x_param_values,
+global_options_set.x_param_values);
+
+ /* If -fweb or -frename-registers are not specified on the command line,
+do not turn them on implicitly.  */
+ if (!global_options_set.x_flag_web)
+   global_options.x_flag_web = 0;
+ if (!global_options_set.x_flag_rename_registers)
+   global_options.x_flag_rename_registers = 0;
+   }
+
   /* If using typedef char *va_list, signal that
 __builtin_va_start (, 0) can be optimized to
 ap = __builtin_next_arg (0).  */
diff --git a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c 
b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
index c9b8046..17aa5c6 100644
--- a/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
+++ b/gcc/testsuite/c-c++-common/tsan/thread_leak2.c
@@ -1,4 +1,5 @@
 /* { dg-shouldfail "tsan" } 

Re: [C++ PATCH] Fix up decl_in_std_namespace_p handling of --enable-symvers=gnu-versioned-namespace

2019-10-25 Thread Marek Polacek
On Fri, Oct 25, 2019 at 12:39:58AM +0200, Jakub Jelinek wrote:
> Hi!
> 
> When looking into the constexpr new issues and adding is_std_construct_at
> function, I've noticed that with --enable-symvers=gnu-versioned-namespace

That is... sneaky.  I guess I/we need to test with
--enable-symvers=gnu-versioned-namespace every now and then.

> all of that fails, because construct_at (but for other things
> forward or move etc.) aren't directly in std namespace, but in inline
> namespace inside of it (std::_8::{construct_at,forward,move,...}).
> 
> The following patch changes the function all of those calls use to look
> through inline namespaces.
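The namespace layout in question can be reproduced in a few lines (fakestd and construct_at_name are stand-in names, not the real libstdc++ symbols):

```cpp
#include <string>

// Stand-in for std with a versioned inline namespace, mirroring the
// --enable-symvers=gnu-versioned-namespace layout (std::_8::...).
// All names here are hypothetical.
namespace fakestd {
  inline namespace _8 {
    std::string construct_at_name() { return "construct_at"; }
  }
}

// The inline namespace makes the function reachable through the
// enclosing namespace, which is why a check like
// decl_in_std_namespace_p must walk outward through _8 to fakestd
// rather than looking only at the immediate namespace context.
bool same_entity() {
  return &fakestd::construct_at_name == &fakestd::_8::construct_at_name;
}
```

With the unversioned configuration the declaration sits directly in std, so the old single-step check happened to work; the versioned layout above is what exposed the bug.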
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-10-24  Jakub Jelinek  
> 
>   * typeck.c (decl_in_std_namespace_p): Return true also for decls
>   in inline namespaces inside of std namespace.
> 
>   * g++.dg/cpp0x/Wpessimizing-move6.C: New test.
> 
> --- gcc/cp/typeck.c.jj2019-10-23 20:38:00.022871653 +0200
> +++ gcc/cp/typeck.c   2019-10-24 11:36:14.982981481 +0200
> @@ -9395,8 +9395,16 @@ maybe_warn_about_returning_address_of_lo
>  bool
>  decl_in_std_namespace_p (tree decl)
>  {
> -  return (decl != NULL_TREE
> -   && DECL_NAMESPACE_STD_P (decl_namespace_context (decl)));
> +  while (decl)
> +{
> +  decl = decl_namespace_context (decl);
> +  if (DECL_NAMESPACE_STD_P (decl))
> + return true;
> +  if (!DECL_NAMESPACE_INLINE_P (decl))
> + return false;
> +  decl = CP_DECL_CONTEXT (decl);
> +}
> +  return false;

Probably deserves a comment.

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [Patch][Fortran] OpenACC – permit common blocks in some clauses

2019-10-25 Thread Tobias Burnus

Hi Thomas,

On 10/25/19 10:43 AM, Thomas Schwinge wrote:

OK for trunk, with the following few small items considered.


Committed as Rev. 277451 – after a fresh bootstrap and regtesting.

Changes:
* I now have a new test case 
libgomp/testsuite/libgomp.oacc-fortran/common-block-3.f90 which looks at 
the omplower dump.
* In the compile-time *{2,3} test cases, there are now also an 'enter data' 
and an 'update host/self/device' test.

* the libgomp tests have a 'dg-do run'.
* I modified the code in gimplify.c as proposed.


Regarding the new test case: Without the gcc/gimplify.c changes, one has 
(see last item before child fn):


    #pragma omp target oacc_parallel map(tofrom:a [len: 400]) 
map(tofrom:b [len: 400]) map(tofrom:c [len: 4]) map(tofrom:block [len: 
812]) [child fn …
    #pragma omp target oacc_kernels map(force_tofrom:i [len: 4]) 
map(tofrom:y [len: 400]) map(tofrom:x [len: 400]) 
map(tofrom:kernel_block [len: 804]) map(force_tofrom:c [len: 4]) 
map(tofrom:block [len: 812])  [child fn …


With the changes of gcc/gimplify.c, one has:

    #pragma omp target oacc_parallel map(tofrom:a [len: 400]) 
map(tofrom:b [len: 400]) map(tofrom:c [len: 4]) [child fn …
    #pragma omp target oacc_kernels map(force_tofrom:i [len: 4]) 
map(tofrom:y [len: 400]) map(tofrom:x [len: 400]) map(force_tofrom:c 
[len: 4])  [child fn …



And without the gimplify.c changes, the added run tests indeed fail with:
libgomp: Trying to map into device [0x407100..0x407294) object when 
[0x407100..0x407290) is already mapped



Tobias

PS:

Or, would it be easy to add an OpenACC 'kernels' test case that otherwise
failed (at run time, say, with the aforementioned duplicate mapping errors, or
would contain "strange"/duplicate/conflicting mapping items in the
'-fdump-tree-gimple' dump)?


See new test case and result for the current tests.

Additionally, I have applied:


Wouldn't it be clearer if that latter one were written as follows:
 if (DECL_HAS_VALUE_EXPR_P (decl))
   {
 if (ctx->region_type & ORT_ACC)
    /* For OpenACC, defer expansion of value to avoid transferring
   privatized common block data instead of im-/explicitly transferred
   variables which are in common blocks.  */
   ;
 else
   {
 tree value = get_base_address (DECL_VALUE_EXPR (decl));
 
 if (value && DECL_P (value) && DECL_THREAD_LOCAL_P (value))

   return omp_notice_threadprivate_variable (ctx, decl, value);
   }
   }


@@ -7353,7 +7374,9 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
decl, bool in_code)
n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
if ((ctx->region_type & ORT_TARGET) != 0)
  {
-  ret = lang_hooks.decls.omp_disregard_value_expr (decl, true);
+  /* For OpenACC, as remarked above, defer expansion.  */
+  shared = !(ctx->region_type & ORT_ACC);
+  ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);

Also more explicit, easier to read:

 if (ctx->region_type & ORT_ACC)
   /* For OpenACC, as remarked above, defer expansion.  */
   shared = false;
 else
   shared = true;


@@ -7521,6 +7544,9 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
decl, bool in_code)
  }
  
shared = ((flags | n->value) & GOVD_SHARED) != 0;

+  /* For OpenACC, cf. remark above regarding common blocks.  */
+  if (ctx->region_type & ORT_ACC)
+shared = false;
ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);

And again:

 if (ctx->region_type & ORT_ACC)
    /* For OpenACC, cf. remark above regarding common blocks.  */
   shared = false;
 else
   shared = ((flags | n->value) & GOVD_SHARED) != 0;

(In all three cases, using an easy 'if (ctx->region_type & ORT_ACC)' to
point out the special case.)

It's still some kind of voodoo to me -- but at least, you've now also
reviewed this, and it's now documented what's going on.



And changed the test case based on:


+  !$acc exit data delete(/blockA/, /blockB/, e, v)

I note there is one single 'exit data' test, but no 'enter data'.

Also, 'update' is missing, to test the 'device' and 'self'/'host' clauses.


+  !$acc data deviceptr(/blockA/, /blockB/, e, v) ! { dg-error "Syntax error in 
OpenMP variable list" }
+  !$acc end data ! { dg-error "Unexpected ..ACC END DATA statement" }
+
+  !$acc data deviceptr(/blockA/, /blockB/, e, v) ! { dg-error "Syntax error in 
OpenMP variable list" }
+  !$acc end data ! { dg-error "Unexpected ..ACC END DATA statement" }
+end program test

Is there a reason for the duplicated 'deviceptr' testing?

Move 'data deviceptr' up a little bit, next to the other 'data' construct
testing?


--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/common-block-2.f90

Similarly.
commit 96d1e6235a5b7c81df7940c1c8727f87dc1b577a
Author: burnus 
Date:   Fri Oct 25 14:28:40 2019 +

[Fortran] OpenACC – permit common blocks in some clauses

2019-10-25  Cesar 

Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval

2019-10-25 Thread Steve Kargl
On Fri, Oct 25, 2019 at 10:35:24AM +0200, Tobias Burnus wrote:
> On 9/26/19 10:45 AM, Mark Eggleston wrote:
> 
> PS: I was also not that happy about the BOZ changes by Steve, which 
> broke code here – but, fortunately, adding int( ,kind=) around it was 
> sufficient and that code was supposed to be F2003 standard conforming.  
> I pinged the authors and it is now fixed. Still, I wonder how much code 
> broke due to that change; code is not that simple to fix. – But, in 
> general, I am very much in favour in having valid Fortran 2018 code (can 
> be fixed form, old and use old features, that's fine).
> 

My BOZ patch brought gfortran closer to an actual conforming
Fortran compiler while providing an option that would allow
quite a few documented and undocumented extensions.  If the
patch broke some of your code, and -fallow-invalid-boz did not
allow the code to compile, and you were forced to use INT(, kind=)
to get it to compile, then, no, the code was not conforming.

And since you mention F2003, for the record

C410  (R411) A boz-literal-constant shall appear only as a
  data-stmt-constant in a DATA statement, as the actual
  argument associated with the dummy argument A of the
  numeric intrinsic functions DBLE, REAL or INT, or as
  the actual argument associated with the X or Y dummy
  argument of the intrinsic function CMPLX.

-- 
Steve


[C++] Fix interaction between aka changes and DR1558 (PR92206)

2019-10-25 Thread Richard Sandiford
One of the changes in r277281 was to make the typedef variant
handling in strip_typedefs pass the raw DECL_ORIGINAL_TYPE to the
recursive call, instead of applying TYPE_MAIN_VARIANT first.
This PR shows that that interacts badly with the implementation
of DR1558, because we then refuse to strip aliases with dependent
template parameters and trip:

  gcc_assert (!typedef_variant_p (result)
  || ((flags & STF_USER_VISIBLE)
  && !user_facing_original_type_p (result)));

Keeping the current behaviour but suppressing the ICE leads to a
duplicate error (the dg-bogus in the first test), so that didn't
seem like a good fix.
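DR 1558's effect is easiest to see with the classic void_t detection idiom: even though the alias discards its argument, substitution into the dependent template-id still applies, which is exactly the behaviour strip_typedefs must preserve (a sketch independent of the patch, not code from it):

```cpp
#include <type_traits>

// An alias that ignores its template argument entirely.
template <typename> using void_t = void;

// DR 1558: substituting into void_t<typename T::type> still requires
// T::type to exist, even though the alias never uses it, so the
// partial specialization below participates in SFINAE as intended.
template <typename, typename = void>
struct has_type_member : std::false_type {};

template <typename T>
struct has_type_member<T, void_t<typename T::type>> : std::true_type {};

struct WithType { using type = int; };
struct WithoutType {};

static_assert(has_type_member<WithType>::value, "type member detected");
static_assert(!has_type_member<WithoutType>::value, "correctly rejected");
```

If strip_typedefs replaced `void_t<typename T::type>` with plain `void` while T is still dependent, the two partial specializations would collapse and the detection idiom would break; that is why dependent alias template specializations normally must not be stripped.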

I assume keeping the alias should never actually be necessary for
DECL_ORIGINAL_TYPEs, because it will already have been checked
somewhere, even for implicit TYPE_DECLs.  This patch therefore
passes a flag to say that we can assume the type is validated
elsewhere.

It seems a rather clunky fix, sorry, but restoring the
TYPE_MAIN_VARIANT (...) isn't compatible with the aka stuff.

Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?

Richard


2019-10-25  Richard Sandiford  

gcc/cp/
PR c++/92206
* cp-tree.h (STF_ASSUME_VALID): New constant.
* tree.c (strip_typedefs): Add STF_ASSUME_VALID to the flags
when calling strip_typedefs recursively on a DECL_ORIGINAL_TYPE.
Don't apply the fix for DR1558 in that case; allow aliases with
dependent template parameters to be stripped instead.

gcc/testsuite/
PR c++/92206
* g++.dg/pr92206-1.C: New test.
* g++.dg/pr92206-2.C: Likewise.
* g++.dg/pr92206-3.C: Likewise.

Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h2019-10-25 09:21:28.098331304 +0100
+++ gcc/cp/cp-tree.h2019-10-25 14:51:22.618009886 +0100
@@ -5728,8 +5728,12 @@ #define TFF_POINTER  (1
 
STF_USER_VISIBLE: use heuristics to try to avoid stripping user-facing
aliases of internal details.  This is intended for diagnostics,
-   where it should (for example) give more useful "aka" types.  */
+   where it should (for example) give more useful "aka" types.
+
+   STF_ASSUME_VALID: assume where possible that the given type is valid,
+   relying on code elsewhere to report any appropriate diagnostics.  */
 const unsigned int STF_USER_VISIBLE = 1U;
+const unsigned int STF_ASSUME_VALID = 1U << 1;
 
 /* Returns the TEMPLATE_DECL associated to a TEMPLATE_TEMPLATE_PARM
node.  */
Index: gcc/cp/tree.c
===
--- gcc/cp/tree.c   2019-10-22 08:47:01.255327861 +0100
+++ gcc/cp/tree.c   2019-10-25 14:51:22.618009886 +0100
@@ -1489,7 +1489,8 @@ strip_typedefs (tree t, bool *remove_att
   if (t == TYPE_CANONICAL (t))
 return t;
 
-  if (dependent_alias_template_spec_p (t))
+  if (!(flags & STF_ASSUME_VALID)
+  && dependent_alias_template_spec_p (t))
 /* DR 1558: However, if the template-id is dependent, subsequent
template argument substitution still applies to the template-id.  */
 return t;
@@ -1674,7 +1675,8 @@ strip_typedefs (tree t, bool *remove_att
  && !user_facing_original_type_p (t))
return t;
  result = strip_typedefs (DECL_ORIGINAL_TYPE (TYPE_NAME (t)),
-  remove_attributes, flags);
+  remove_attributes,
+  flags | STF_ASSUME_VALID);
}
   else
result = TYPE_MAIN_VARIANT (t);
Index: gcc/testsuite/g++.dg/pr92206-1.C
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/g++.dg/pr92206-1.C   2019-10-25 14:51:22.618009886 +0100
@@ -0,0 +1,9 @@
+// { dg-require-effective-target c++11 }
+
+template struct A {};
+template using alias1 = A;
+template class B {
+  using alias2 = alias1>; // { dg-error {no type named 'value'} }
+  A a; // { dg-bogus {no type named 'value'} }
+};
+B b;
Index: gcc/testsuite/g++.dg/pr92206-2.C
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/g++.dg/pr92206-2.C   2019-10-25 14:51:22.618009886 +0100
@@ -0,0 +1,14 @@
+// { dg-require-effective-target c++11 }
+
+template  struct A;
+class Vector {
+  template  struct TypeIsGCThing {
+template ::Type> using Vector = Vector;
+struct B;
+template  class ContainerIter {
+  using Action = B;
+  using ActionVector = Vector;
+  ContainerIter a;
+};
+  };
+};
Index: gcc/testsuite/g++.dg/pr92206-3.C
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/g++.dg/pr92206-3.C   2019-10-25 14:51:22.618009886 +0100
@@ -0,0 +1,8 @@
+// { dg-require-effective-target c++11 }
+
+template  

Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval

2019-10-25 Thread Tobias Burnus

Hi Jeff,

On 10/25/19 3:22 PM, Jeff Law wrote:
So across Fedora the BOZ stuff tripped 2-3 packages. In comparison the 
function argument stuff broke 30-40 packages, many of which still 
don't build without -fallow-argument-mismatch.


Regarding the latter:
The initial patch was too strict and also rejected valid code 
(according to the Fortran 2018 standard). That has been fixed.* – Thus, 
either some valid cases were missed (gfortran bug) or all those packages 
indeed have an argument mismatch.


*That fix is: 2019-10-14 / r276972 / PR fortran/92004 / 
https://gcc.gnu.org/ml/fortran/2019-10/msg00128.html


Do you know whether those 30–40 packages have code bugs or could there 
be gfortran bugs (too strict checking) lurking?



Regarding the BOZ: one difference from the argument mismatch is that the 
latter has an option to still accept it (-fallow-argument-mismatch) and 
can potentially generate wrong code, depending on what the ME does with 
the mismatches, while the former, once parsed, causes no potential ME 
problems and, as there is no flag, always requires code changes. (On 
the other hand, fixing the BOZ issue is straightforward; argument 
changes are trickier.)


Cheers,

Tobias



Re: [PATCH] Define [range.cmp] comparisons for C++20

2019-10-25 Thread Jonathan Wakely

On 25/10/19 14:34 +0100, Jonathan Wakely wrote:

On 23/10/19 08:08 +0100, Jonathan Wakely wrote:

On Wed, 23 Oct 2019 at 00:33, Tam S. B.  wrote:

The use of concepts is causing `#include ` to break on clang.


OK, thanks, I'll guard it with #if.


Fixed on trunk with this patch. My Clang 7.0.1 still can't compile current
trunk though, because we now have a constexpr destructor on
std::allocator.


Which should be fixed by this, which I'll commit after testing
finishes.

commit 063495404380ebe3d28429c6ce718cd05c8d5451
Author: Jonathan Wakely 
Date:   Fri Oct 25 14:43:23 2019 +0100

Fix compilation with Clang

The new constexpr destructor on std::allocator breaks compilation with
Clang in C++2a mode. This only makes it constexpr if the compiler
supports the P0784R7 features.

* include/bits/allocator.h: Check __cpp_constexpr_dynamic_alloc
before making the std::allocator destructor constexpr.
* testsuite/20_util/allocator/requirements/constexpr.cc: New test.

diff --git a/libstdc++-v3/include/bits/allocator.h b/libstdc++-v3/include/bits/allocator.h
index 2559c57b12e..00d7461e42e 100644
--- a/libstdc++-v3/include/bits/allocator.h
+++ b/libstdc++-v3/include/bits/allocator.h
@@ -160,7 +160,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_GLIBCXX20_CONSTEXPR
 	allocator(const allocator<_Tp1>&) _GLIBCXX_NOTHROW { }
 
-  _GLIBCXX20_CONSTEXPR
+#if __cpp_constexpr_dynamic_alloc
+  constexpr
+#endif
   ~allocator() _GLIBCXX_NOTHROW { }
 
 #if __cplusplus > 201703L
diff --git a/libstdc++-v3/testsuite/20_util/allocator/requirements/constexpr.cc b/libstdc++-v3/testsuite/20_util/allocator/requirements/constexpr.cc
new file mode 100644
index 000..6a6dbf1833f
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/allocator/requirements/constexpr.cc
@@ -0,0 +1,28 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+
+constexpr bool f()
+{
+  std::allocator a;
+  return std::allocator_traits>::max_size(a) > 0;
+}
+static_assert( f() );


Re: [PATCH] Define [range.cmp] comparisons for C++20

2019-10-25 Thread Jonathan Wakely

On 23/10/19 08:08 +0100, Jonathan Wakely wrote:

On Wed, 23 Oct 2019 at 00:33, Tam S. B.  wrote:

The use of concepts is causing `#include ` to break on clang.


OK, thanks, I'll guard it with #if.


Fixed on trunk with this patch. My Clang 7.0.1 still can't compile current
trunk though, because we now have a constexpr destructor on
std::allocator.

Thanks for the report.

commit 27274f130d7c3cf8a6ccbf27c24368fbbf2fb3fb
Author: Jonathan Wakely 
Date:   Fri Oct 25 14:26:54 2019 +0100

Guard use of concepts with feature test macro

This fixes a regression when using Clang.

* include/bits/range_cmp.h: Check __cpp_lib_concepts before defining
concepts. Fix comment.

diff --git a/libstdc++-v3/include/bits/range_cmp.h b/libstdc++-v3/include/bits/range_cmp.h
index a77fd5274b9..870eb3a8ee5 100644
--- a/libstdc++-v3/include/bits/range_cmp.h
+++ b/libstdc++-v3/include/bits/range_cmp.h
@@ -22,7 +22,7 @@
 // see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 // .
 
-/** @file bits/ranges_function.h
+/** @file bits/range_cmp.h
  *  This is an internal header file, included by other library headers.
  *  Do not attempt to use it directly. @headername{functional}
  */
@@ -54,6 +54,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 using is_transparent = __is_transparent;
   };
 
+#ifdef __cpp_lib_concepts
 namespace ranges
 {
   namespace __detail
@@ -182,6 +183,7 @@ namespace ranges
   };
 
 } // namespace ranges
+#endif // library concepts
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 #endif // C++20


Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval

2019-10-25 Thread Jeff Law
On 10/25/19 2:35 AM, Tobias Burnus wrote:
> On 9/26/19 10:45 AM, Mark Eggleston wrote:
>> Original thread starts here
>> https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01185.html
>> OK to commit?
> 
> As Steve, I am not really happy about adding yet another option and
> especially not about legacy features. On the other hand, I see that
> legacy code is still used.
> 
> Having said this, the patch is OK from my side.
> 
> Tobias
> 
> PS: I was also not that happy about the BOZ changes by Steve, which
> broke code here – but, fortunately, adding int( ,kind=) around it was
> sufficient, and that code was supposed to be F2003 standard conforming.
> I pinged the authors and it is now fixed. Still, I wonder how much code
> broke due to that change; such code is not that simple to fix. – But, in
> general, I am very much in favour of having valid Fortran 2018 code (it
> can be fixed form, be old, and use old features; that's fine).
So across Fedora the BOZ stuff tripped 2-3 packages.  In comparison the
function argument stuff broke 30-40 packages, many of which still don't
build without -fallow-argument-mismatch.

jeff



[PATCH 4/N] Fix unsigned type overflow in memory report.

2019-10-25 Thread Martin Liška
Hi.

And there's one more integer overflow fix.

Martin
From 35c22704dc705508672f19b09e6d1b94bd956535 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 25 Oct 2019 15:09:32 +0200
Subject: [PATCH] Fix unsigned type overflow in memory report.

gcc/ChangeLog:

2019-10-25  Martin Liska  

	* ggc-common.c: Do not subtract unsigned types in the
	compare function.
---
 gcc/ggc-common.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c
index 37d3c5df9e1..48db4208599 100644
--- a/gcc/ggc-common.c
+++ b/gcc/ggc-common.c
@@ -928,10 +928,13 @@ public:
   static int
   compare (const void *first, const void *second)
   {
-const mem_pair_t f = *(const mem_pair_t *)first;
-const mem_pair_t s = *(const mem_pair_t *)second;
+const mem_pair_t mem1 = *(const mem_pair_t *) first;
+const mem_pair_t mem2 = *(const mem_pair_t *) second;
 
-return s.second->get_balance () - f.second->get_balance ();
+size_t balance1 = mem1.second->get_balance ();
+size_t balance2 = mem2.second->get_balance ();
+
+return balance1 == balance2 ? 0 : (balance1 < balance2 ? 1 : -1);
   }
 
   /* Dump header with NAME.  */
-- 
2.23.0



Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-25 Thread Christophe Lyon
On Fri, 25 Oct 2019 at 12:08, Richard Earnshaw (lists)
 wrote:
>
> On 24/10/2019 17:10, Richard Earnshaw (lists) wrote:
> > On 24/10/2019 11:16, Christophe Lyon wrote:
> >> On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:
> >>> On 23/10/2019 09:28, Christophe Lyon wrote:
>  On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:
> > On 21/10/2019 12:51, Christophe Lyon wrote:
> >> On 18/10/2019 21:48, Richard Earnshaw wrote:
> >>> Each patch should produce a working compiler (it did when it was
> >>> originally written), though since the patch set has been re-ordered
> >>> slightly there is a possibility that some of the intermediate steps
> >>> may have missing test updates that are only cleaned up later.
> >>> However, only the end of the series should be considered complete.
> >>> I've kept the patch as a series to permit easier regression hunting
> >>> should that prove necessary.
> >>
> >> Thanks for this information: my validation system was designed in
> >> such a way that it will run the GCC testsuite after each of your
> >> patches, so I'll keep in mind not to report regressions (I've
> >> noticed several already).
> >>
> >>
> >> I can perform a manual validation taking your 29 patches as a
> >> single one and compare the results with those of the revision
> >> preceding the one were you committed patch #1. Do you think it
> >> would be useful?
> >>
> >>
> >> Christophe
> >>
> >>
> >
> > I think if you can filter out any that are removed by later patches
> > and then report against the patch that caused the regression itself
> > then that would be the best.  But I realise that would be more work
> > for you, so a round-up against the combined set would be OK.
> >
> > BTW, I'm aware of an issue with the compiler now generating
> >
> >   reg, reg, shift 
> >
> > in Thumb2; no need to report that again.
> >
> > Thanks,
> > R.
> > .
> >
> 
> 
>  Hi Richard,
> 
>  The validation of the whole set shows 1 regression, which was also
>  reported by the validation of r277179 (early split most DImode
>  comparison operations)
> 
>  When GCC is configured as:
>  --target arm-none-eabi
>  --with-mode default
>  --with-cpu default
>  --with-fpu default
>  (that is, no --with-mode, --with-cpu, --with-fpu option)
>  I'm using binutils-2.28 and newlib-3.1.0
> 
>  I can see:
>  FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
>  (whatever -std=gnu++XX option)
> >>>
> >>> That's strange.  The assembler code generated for that test is
> >>> unchanged from before the patch series, so it must be a problem in
> >>> the test itself.  What's more, I can't seem to
> >>> reproduce this myself.
> >>
> >> As you have noticed, I have created PR92207 to help understand this.
> >>
> >>>
> >>> Similarly, in my build the code for _Znwj, malloc, malloc_r and
> >>> free_r are also unchanged, while the malloc_[un]lock functions are
> >>> empty stubs (not surprising as we aren't multi-threaded).
> >>>
> >>> So the only thing that looks to have really changed are the linker
> >>> offsets (some of the library code has changed, but I don't think it's
> >>> really reached in practice, so shouldn't be relevant).
> >>>
> 
>  I'm executing the tests using qemu-4.1.0 -cpu arm926
>  The qemu traces shows that code enters main, then _Znwj (operator
>  new), then _malloc_r
>  The qemu traces end with:
> >>>
> >>> What do you mean by 'end with'?  What's the failure mode of the test?
> >>> A crash, or the test exiting with a failure code?
> >>>
> >> qemu complains with:
> >> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
> >> Segmentation fault (core dumped)
> >>
> >> 'end with' because my automated validation builds do not keep the full
> >> execution traces (that would need too much disk space)
> >>
> >
> > As I've said in the PR, this looks like a bug in the qemu+newlib code.
> > We call sbrk() which says, OK, but then the page isn't mapped by qemu
> > into the process and it then faults.
> >
> > So I think these changes are off the hook, it's just bad luck that they
> > expose the issue at this point in time.
> >
> > R.
> >
>
> I've closed the PR as invalid, because this is a newlib bug that is
> fixed on trunk.  https://sourceware.org/ml/newlib/2019/msg00413.html
>
Thanks for the analysis.
It looks like I have to upgrade the newlib version I'm using for validations.

Christophe

> R.


[14/n] Vectorise conversions between differently-sized integer vectors

2019-10-25 Thread Richard Sandiford
This patch adds AArch64 patterns for converting between 64-bit and
128-bit integer vectors, and makes the vectoriser and expand pass
use them.


2019-10-24  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vectorizable_conversion): Extend the
non-widening and non-narrowing path to handle standard
conversion codes, if the target supports them.
* expr.c (convert_move): Try using the extend and truncate optabs
for vectors.
* optabs-tree.c (supportable_convert_operation): Likewise.
* config/aarch64/iterators.md (Vnarroqw): New iterator.
* config/aarch64/aarch64-simd.md (2)
(trunc2): New patterns.

gcc/testsuite/
* gcc.dg/vect/no-scevccp-outer-12.c: Expect the test to pass
on aarch64 targets.
* gcc.dg/vect/vect-double-reduc-5.c: Likewise.
* gcc.dg/vect/vect-outer-4e.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_5.c: New test.
* gcc.target/aarch64/vect_mixed_sizes_6.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_7.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_8.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_11.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_12.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_13.c: Likewise.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-10-25 13:27:32.877640367 +0100
+++ gcc/tree-vect-stmts.c   2019-10-25 13:27:36.197616908 +0100
@@ -4861,7 +4861,9 @@ vectorizable_conversion (stmt_vec_info s
   switch (modifier)
 {
 case NONE:
-  if (code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
+  if (code != FIX_TRUNC_EXPR
+ && code != FLOAT_EXPR
+ && !CONVERT_EXPR_CODE_P (code))
return false;
   if (supportable_convert_operation (code, vectype_out, vectype_in,
 , ))
Index: gcc/expr.c
===
--- gcc/expr.c  2019-10-22 08:46:57.359355939 +0100
+++ gcc/expr.c  2019-10-25 13:27:36.193616936 +0100
@@ -250,6 +250,31 @@ convert_move (rtx to, rtx from, int unsi
 
   if (VECTOR_MODE_P (to_mode) || VECTOR_MODE_P (from_mode))
 {
+  if (GET_MODE_UNIT_PRECISION (to_mode)
+ > GET_MODE_UNIT_PRECISION (from_mode))
+   {
+ optab op = unsignedp ? zext_optab : sext_optab;
+ insn_code icode = convert_optab_handler (op, to_mode, from_mode);
+ if (icode != CODE_FOR_nothing)
+   {
+ emit_unop_insn (icode, to, from,
+ unsignedp ? ZERO_EXTEND : SIGN_EXTEND);
+ return;
+   }
+   }
+
+  if (GET_MODE_UNIT_PRECISION (to_mode)
+ < GET_MODE_UNIT_PRECISION (from_mode))
+   {
+ insn_code icode = convert_optab_handler (trunc_optab,
+  to_mode, from_mode);
+ if (icode != CODE_FOR_nothing)
+   {
+ emit_unop_insn (icode, to, from, TRUNCATE);
+ return;
+   }
+   }
+
   gcc_assert (known_eq (GET_MODE_BITSIZE (from_mode),
GET_MODE_BITSIZE (to_mode)));
 
Index: gcc/optabs-tree.c
===
--- gcc/optabs-tree.c   2019-10-08 09:23:31.894529571 +0100
+++ gcc/optabs-tree.c   2019-10-25 13:27:36.193616936 +0100
@@ -303,6 +303,20 @@ supportable_convert_operation (enum tree
   return true;
 }
 
+  if (GET_MODE_UNIT_PRECISION (m1) > GET_MODE_UNIT_PRECISION (m2)
+  && can_extend_p (m1, m2, TYPE_UNSIGNED (vectype_in)))
+{
+  *code1 = code;
+  return true;
+}
+
+  if (GET_MODE_UNIT_PRECISION (m1) < GET_MODE_UNIT_PRECISION (m2)
+  && convert_optab_handler (trunc_optab, m1, m2) != CODE_FOR_nothing)
+{
+  *code1 = code;
+  return true;
+}
+
   /* Now check for builtin.  */
   if (targetm.vectorize.builtin_conversion
   && targetm.vectorize.builtin_conversion (code, vectype_out, vectype_in))
Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2019-10-17 14:23:07.71142 +0100
+++ gcc/config/aarch64/iterators.md 2019-10-25 13:27:36.189616964 +0100
@@ -860,6 +860,8 @@ (define_mode_attr VNARROWQ [(V8HI "V8QI"
(V2DI "V2SI")
(DI   "SI")   (SI   "HI")
(HI   "QI")])
+(define_mode_attr Vnarrowq [(V8HI "v8qi") (V4SI "v4hi")
+   (V2DI "v2si")])
 
 ;; Narrowed quad-modes for VQN (Used for XTN2).
 (define_mode_attr VNARROWQ2 [(V8HI "V16QI") (V4SI "V8HI")
Index: gcc/config/aarch64/aarch64-simd.md

[13/n] Allow mixed vector sizes within a single vectorised stmt

2019-10-25 Thread Richard Sandiford
Although a previous patch allowed mixed vector sizes within a vector
region, we generally still required equal vector sizes within a vector
stmt.  Specifically, vect_get_vector_types_for_stmt computes two vector
types: the vector type corresponding to STMT_VINFO_VECTYPE and the
vector type that determines the minimum vectorisation factor for the
stmt ("nunits_vectype").  It then required these two types to be
the same size.

There doesn't seem to be any need for that restriction though.  AFAICT,
all vectorizable_* functions either do their own compatibility checks
or don't need to do them (because gimple guarantees that the scalar
types are compatible).

It should always be the case that nunits_vectype has at least as many
elements as the other vectype, but that's something we can assert for.

I couldn't resist a couple of other tweaks while there:

- there's no need to compute nunits_vectype if its element type is
  the same as STMT_VINFO_VECTYPE's.

- it's useful to distinguish the nunits_vectype from the main vectype
  in dump messages

- when reusing the existing STMT_VINFO_VECTYPE, it's useful to say so
  in the dump, and say what the type is


2019-10-24  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vect_get_vector_types_for_stmt): Don't
require vectype and nunits_vectype to have the same size;
instead assert that nunits_vectype has at least as many
elements as vectype.  Don't compute a separate nunits_vectype
if the scalar type is obviously the same as vectype's.
Tweak dump messages.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-10-25 13:27:26.205687511 +0100
+++ gcc/tree-vect-stmts.c   2019-10-25 13:27:32.877640367 +0100
@@ -11973,7 +11973,12 @@ vect_get_vector_types_for_stmt (stmt_vec
   tree vectype;
   tree scalar_type = NULL_TREE;
   if (STMT_VINFO_VECTYPE (stmt_info))
-*stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
+{
+  *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"precomputed vectype: %T\n", vectype);
+}
   else
 {
   gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
@@ -12005,7 +12010,7 @@ vect_get_vector_types_for_stmt (stmt_vec
 
   if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
-"get vectype for scalar type:  %T\n", scalar_type);
+"get vectype for scalar type: %T\n", scalar_type);
   vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
   if (!vectype)
return opt_result::failure_at (stmt,
@@ -12022,42 +12027,38 @@ vect_get_vector_types_for_stmt (stmt_vec
 
   /* Don't try to compute scalar types if the stmt produces a boolean
  vector; use the existing vector type instead.  */
-  tree nunits_vectype;
-  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-nunits_vectype = vectype;
-  else
+  tree nunits_vectype = vectype;
+  if (!VECTOR_BOOLEAN_TYPE_P (vectype)
+  && *stmt_vectype_out != boolean_type_node)
 {
   /* The number of units is set according to the smallest scalar
 type (or the largest vector size, but we only support one
 vector size per vectorization).  */
-  if (*stmt_vectype_out != boolean_type_node)
+  HOST_WIDE_INT dummy;
+  scalar_type = vect_get_smallest_scalar_type (stmt_info, , );
+  if (scalar_type != TREE_TYPE (vectype))
{
- HOST_WIDE_INT dummy;
- scalar_type = vect_get_smallest_scalar_type (stmt_info,
-  , );
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"get vectype for smallest scalar type: %T\n",
+scalar_type);
+ nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
+ if (!nunits_vectype)
+   return opt_result::failure_at
+ (stmt, "not vectorized: unsupported data-type %T\n",
+  scalar_type);
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location, "nunits vectype: %T\n",
+nunits_vectype);
}
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"get vectype for scalar type:  %T\n", scalar_type);
-  nunits_vectype = get_vectype_for_scalar_type (vinfo, scalar_type);
 }
-  if (!nunits_vectype)
-return opt_result::failure_at (stmt,
-  "not vectorized: unsupported data-type %T\n",
-  scalar_type);
-
-  if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
-   GET_MODE_SIZE (TYPE_MODE (nunits_vectype
-return opt_result::failure_at (stmt,
-  "not vectorized: different 

[12/n] [AArch64] Support vectorising with multiple vector sizes

2019-10-25 Thread Richard Sandiford
This patch makes the vectoriser try mixtures of 64-bit and 128-bit
vector modes on AArch64.  It fixes some existing XFAILs and allows
kernel 24 from the Livermore Loops test to be vectorised (by using
a mixture of V2DF and V2SI).

I'll apply this if the prerequisites are approved.


2019-10-24  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_vectorize_related_mode): New
function.
(aarch64_autovectorize_vector_modes): Also add V4HImode and V2SImode.
(TARGET_VECTORIZE_RELATED_MODE): Define.

gcc/testsuite/
* gcc.dg/vect/vect-outer-4f.c: Expect the test to pass on aarch64
targets.
* gcc.dg/vect/vect-outer-4g.c: Likewise.
* gcc.dg/vect/vect-outer-4k.c: Likewise.
* gcc.dg/vect/vect-outer-4l.c: Likewise.
* gfortran.dg/vect/vect-8.f90: Expect kernel 24 to be vectorized
for aarch64.
* gcc.target/aarch64/sve/reduc_strict_3.c: Update the number of
times that "Detected double reduction" is printed.
* gcc.target/aarch64/vect_mixed_sizes_1.c: New test.
* gcc.target/aarch64/vect_mixed_sizes_2.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_3.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_4.c: Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c   2019-10-25 13:27:15.505763118 +0100
+++ gcc/config/aarch64/aarch64.c   2019-10-25 13:27:29.685662922 +0100
@@ -1767,6 +1767,30 @@ aarch64_sve_int_mode (machine_mode mode)
   return aarch64_sve_data_mode (int_mode, GET_MODE_NUNITS (mode)).require ();
 }
 
+/* Implement TARGET_VECTORIZE_RELATED_MODE.  */
+
+static opt_machine_mode
+aarch64_vectorize_related_mode (machine_mode vector_mode,
+   scalar_mode element_mode,
+   poly_uint64 nunits)
+{
+  unsigned int vec_flags = aarch64_classify_vector_mode (vector_mode);
+
+  /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
+  if ((vec_flags & VEC_ADVSIMD)
+  && known_eq (nunits, 0U)
+  && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
+  && maybe_ge (GET_MODE_BITSIZE (element_mode)
+  * GET_MODE_NUNITS (vector_mode), 128U))
+{
+  machine_mode res = aarch64_simd_container_mode (element_mode, 128);
+  if (VECTOR_MODE_P (res))
+   return res;
+}
+
+  return default_vectorize_related_mode (vector_mode, element_mode, nunits);
+}
+
 /* Implement TARGET_PREFERRED_ELSE_VALUE.  For binary operations,
prefer to use the first arithmetic operand as the else value if
the else value doesn't matter, since that exactly matches the SVE
@@ -15207,8 +15231,27 @@ aarch64_autovectorize_vector_modes (vect
 {
   if (TARGET_SVE)
 modes->safe_push (VNx16QImode);
+
+  /* Try using 128-bit vectors for all element types.  */
   modes->safe_push (V16QImode);
+
+  /* Try using 64-bit vectors for 8-bit elements and 128-bit vectors
+ for wider elements.  */
   modes->safe_push (V8QImode);
+
+  /* Try using 64-bit vectors for 16-bit elements and 128-bit vectors
+ for wider elements.
+
+ TODO: We could support a limited form of V4QImode too, so that
+ we use 32-bit vectors for 8-bit elements.  */
+  modes->safe_push (V4HImode);
+
+  /* Try using 64-bit vectors for 32-bit elements and 128-bit vectors
+ for 64-bit elements.
+
+ TODO: We could similarly support limited forms of V2QImode and V2HImode
+ for this case.  */
+  modes->safe_push (V2SImode);
 }
 
 /* Implement TARGET_MANGLE_TYPE.  */
@@ -20950,6 +20993,8 @@ #define TARGET_VECTORIZE_VECTOR_ALIGNMEN
 #define TARGET_VECTORIZE_VEC_PERM_CONST \
   aarch64_vectorize_vec_perm_const
 
+#undef TARGET_VECTORIZE_RELATED_MODE
+#define TARGET_VECTORIZE_RELATED_MODE aarch64_vectorize_related_mode
 #undef TARGET_VECTORIZE_GET_MASK_MODE
 #define TARGET_VECTORIZE_GET_MASK_MODE aarch64_get_mask_mode
 #undef TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
Index: gcc/testsuite/gcc.dg/vect/vect-outer-4f.c
===
--- gcc/testsuite/gcc.dg/vect/vect-outer-4f.c   2019-03-08 18:15:02.304871094 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-outer-4f.c   2019-10-25 13:27:29.685662922 +0100
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail { ! aarch64*-*-* } } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-outer-4g.c
===
--- gcc/testsuite/gcc.dg/vect/vect-outer-4g.c   2019-03-08 18:15:02.268871230 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-outer-4g.c   2019-10-25 13:27:29.685662922 +0100
@@ -65,4 +65,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+/* { 

[11/n] Support vectorisation with mixed vector sizes

2019-10-25 Thread Richard Sandiford
After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vector region, using
related_vector_mode to pick the right vector mode for a given
element mode.  No port yet takes advantage of this, but I have
a follow-on patch for AArch64.

This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.


2019-10-24  Richard Sandiford  

gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Delete.
(get_vectype_for_scalar_type_and_size): Declare.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode.  Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2019-10-25 13:26:59.053879364 +0100
+++ gcc/machmode.h  2019-10-25 13:27:26.201687539 +0100
@@ -258,6 +258,9 @@ #define CLASS_HAS_WIDER_MODES_P(CLASS)
   bool exists () const;
   template bool exists (U *) const;
 
+  bool operator== (const T ) const { return m_mode == m; }
+  bool operator!= (const T ) const { return m_mode != m; }
+
 private:
   machine_mode m_mode;
 };
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2019-10-25 13:27:19.317736181 +0100
+++ gcc/tree-vectorizer.h   2019-10-25 13:27:26.209687483 +0100
@@ -329,8 +329,9 @@ typedef std::pair vec_object
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
-  /* If we've chosen a vector size for this vectorization region,
- this is one mode that has such a size, otherwise it is VOIDmode.  */
+  /* The argument we should pass to related_vector_mode when looking up
+ the vector mode for a scalar mode, or VOIDmode if we haven't yet
+ made any decisions about which vector modes to use.  */
   machine_mode vector_mode;
 
 private:
@@ -1595,8 +1596,9 @@ extern dump_user_location_t find_loop_lo
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 
 /* In tree-vect-stmts.c.  */
+extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
+poly_uint64 = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree);
-extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
 extern tree get_mask_type_for_scalar_type (vec_info *, tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_get_loop_mask_type (loop_vec_info);
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2019-10-25 13:27:19.313736209 +0100
+++ gcc/tree-vect-slp.c 2019-10-25 13:27:26.205687511 +0100
@@ -3118,7 +3118,12 @@ vect_slp_bb_region (gimple_stmt_iterator
  && dbg_cnt (vect_slp))
{
  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "* Analysis succeeded with vector mode"
+  " %s\n", GET_MODE_NAME (bb_vinfo->vector_mode));
+ dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+   }
 
  bb_vinfo->shared->check_datarefs ();
  vect_schedule_slp (bb_vinfo);
@@ -3138,6 +3143,13 @@ vect_slp_bb_region (gimple_stmt_iterator
 
  vectorized = true;
}
+  else
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"* Analysis failed with 

Re: [PATCH] Refactor rust-demangle to be independent of C++ demangling.

2019-10-25 Thread Eduard-Mihai Burtescu
> This can be further optimized by using memcmp in place of strncmp, since from
> the length check you know that you won't see the null terminator among the 
> three
> chars you're checking.

Fair enough, here's the combined changelog/diff, with memcmp:

2019-10-22  Eduard-Mihai Burtescu  
include/ChangeLog:
* demangle.h (rust_demangle_callback): Add.
libiberty/ChangeLog:
* cplus-dem.c (cplus_demangle): Use rust_demangle directly.
(rust_demangle): Remove.
* rust-demangle.c (is_prefixed_hash): Rename to is_legacy_prefixed_hash.
(parse_lower_hex_nibble): Rename to decode_lower_hex_nibble.
(parse_legacy_escape): Rename to decode_legacy_escape.
(rust_is_mangled): Remove.
(struct rust_demangler): Add.
(ERROR_AND): Add.
(CHECK_OR): Add.
(peek): Add.
(next): Add.
(struct rust_mangled_ident): Add.
(parse_ident): Add.
(rust_demangle_sym): Remove.
(print_str): Add.
(PRINT): Add.
(print_ident): Add.
(rust_demangle_callback): Add.
(struct str_buf): Add.
(str_buf_reserve): Add.
(str_buf_append): Add.
(str_buf_demangle_callback): Add.
(rust_demangle): Add.
* rust-demangle.h: Remove.

diff --git a/include/demangle.h b/include/demangle.h
index 06c32571d5c..ce7235d13f3 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -159,6 +159,11 @@ ada_demangle (const char *mangled, int options);
 extern char *
 dlang_demangle (const char *mangled, int options);
 
+extern int
+rust_demangle_callback (const char *mangled, int options,
+demangle_callbackref callback, void *opaque);
+
+
 extern char *
 rust_demangle (const char *mangled, int options);
 
diff --git a/libiberty/cplus-dem.c b/libiberty/cplus-dem.c
index a39e2bf2ed4..735a61d7a82 100644
--- a/libiberty/cplus-dem.c
+++ b/libiberty/cplus-dem.c
@@ -52,7 +52,6 @@ void * realloc ();
 #define CURRENT_DEMANGLING_STYLE options
 
 #include "libiberty.h"
-#include "rust-demangle.h"
 
 enum demangling_styles current_demangling_style = auto_demangling;
 
@@ -160,27 +159,20 @@ cplus_demangle (const char *mangled, int options)
   if ((options & DMGL_STYLE_MASK) == 0)
 options |= (int) current_demangling_style & DMGL_STYLE_MASK;
 
+  /* The Rust demangling is implemented elsewhere.
+ Legacy Rust symbols overlap with GNU_V3, so try Rust first.  */
+  if (RUST_DEMANGLING || AUTO_DEMANGLING)
+{
+  ret = rust_demangle (mangled, options);
+  if (ret || RUST_DEMANGLING)
+return ret;
+}
+
   /* The V3 ABI demangling is implemented elsewhere.  */
-  if (GNU_V3_DEMANGLING || RUST_DEMANGLING || AUTO_DEMANGLING)
+  if (GNU_V3_DEMANGLING || AUTO_DEMANGLING)
 {
   ret = cplus_demangle_v3 (mangled, options);
-  if (GNU_V3_DEMANGLING)
-   return ret;
-
-  if (ret)
-   {
- /* Rust symbols are GNU_V3 mangled plus some extra subtitutions.
-The subtitutions are always smaller, so do in place changes.  */
- if (rust_is_mangled (ret))
-   rust_demangle_sym (ret);
- else if (RUST_DEMANGLING)
-   {
- free (ret);
- ret = NULL;
-   }
-   }
-
-  if (ret || RUST_DEMANGLING)
+  if (ret || GNU_V3_DEMANGLING)
return ret;
 }
 
@@ -204,27 +196,6 @@ cplus_demangle (const char *mangled, int options)
   return (ret);
 }
 
-char *
-rust_demangle (const char *mangled, int options)
-{
-  /* Rust symbols are GNU_V3 mangled plus some extra subtitutions.  */
-  char *ret = cplus_demangle_v3 (mangled, options);
-
-  /* The Rust subtitutions are always smaller, so do in place changes.  */
-  if (ret != NULL)
-{
-  if (rust_is_mangled (ret))
-   rust_demangle_sym (ret);
-  else
-   {
- free (ret);
- ret = NULL;
-   }
-}
-
-  return ret;
-}
-
 /* Demangle ada names.  The encoding is documented in gcc/ada/exp_dbug.ads.  */
 
 char *
diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index 6b62e6dbd80..95255d0b601 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -33,9 +33,11 @@ If not, see .  */
 
 #include "safe-ctype.h"
 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 #ifdef HAVE_STRING_H
 #include 
@@ -47,207 +49,115 @@ extern void *memset(void *s, int c, size_t n);
 
 #include 
 #include "libiberty.h"
-#include "rust-demangle.h"
 
-
-/* Mangled (legacy) Rust symbols look like this:
- 
_$LT$std..sys..fd..FileDesc$u20$as$u20$core..ops..Drop$GT$::drop::hc68340e1baa4987a
-
-   The original symbol is:
- ::drop
-
-   The last component of the path is a 64-bit hash in lowercase hex,
-   prefixed with "h". Rust does not have a global namespace between
-   crates, an illusion which Rust maintains by using the hash to
-   distinguish things that would otherwise have the same symbol.
-
-   Any path component not starting 

[10/n] Make less use of get_same_sized_vectype

2019-10-25 Thread Richard Sandiford
Some callers of get_same_sized_vectype were dealing with operands that
are constant or defined externally, and so have no STMT_VINFO_VECTYPE
available.  Under the current model, using get_same_sized_vectype for
that case is equivalent to using get_vectype_for_scalar_type, since
get_vectype_for_scalar_type always returns vectors of the same size,
once a size is fixed.

Using get_vectype_for_scalar_type is arguably more obvious though:
if we're using the same scalar type as we would for internal
definitions, we should use the same vector type too.  (Constant and
external definitions sometimes let us change the original scalar type
to a "nicer" scalar type, but that isn't what's happening here.)

This is a prerequisite to supporting multiple vector sizes in the same
vec_info.


2019-10-24  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vectorizable_call): If an operand is
constant or external, use get_vectype_for_scalar_type
rather than get_same_sized_vectype to get its vector type.
(vectorizable_conversion, vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-10-25 13:27:19.313736209 +0100
+++ gcc/tree-vect-stmts.c   2019-10-25 13:27:22.985710263 +0100
@@ -3308,10 +3308,10 @@ vectorizable_call (stmt_vec_info stmt_in
  return false;
}
 }
-  /* If all arguments are external or constant defs use a vector type with
- the same size as the output vector type.  */
+  /* If all arguments are external or constant defs, infer the vector type
+ from the scalar type.  */
   if (!vectype_in)
-vectype_in = get_same_sized_vectype (rhs_type, vectype_out);
+vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
   if (vec_stmt)
 gcc_assert (vectype_in);
   if (!vectype_in)
@@ -4800,10 +4800,10 @@ vectorizable_conversion (stmt_vec_info s
}
 }
 
-  /* If op0 is an external or constant defs use a vector type of
- the same size as the output vector type.  */
+  /* If op0 is an external or constant def, infer the vector type
+ from the scalar type.  */
   if (!vectype_in)
-vectype_in = get_same_sized_vectype (rhs_type, vectype_out);
+vectype_in = get_vectype_for_scalar_type (vinfo, rhs_type);
   if (vec_stmt)
 gcc_assert (vectype_in);
   if (!vectype_in)
@@ -5564,10 +5564,10 @@ vectorizable_shift (stmt_vec_info stmt_i
  "use not simple.\n");
   return false;
 }
-  /* If op0 is an external or constant def use a vector type with
- the same size as the output vector type.  */
+  /* If op0 is an external or constant def, infer the vector type
+ from the scalar type.  */
   if (!vectype)
-vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
+vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
   if (vec_stmt)
 gcc_assert (vectype);
   if (!vectype)
@@ -5666,7 +5666,7 @@ vectorizable_shift (stmt_vec_info stmt_i
  "vector/vector shift/rotate found.\n");
 
   if (!op1_vectype)
-   op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out);
+   op1_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op1));
   incompatible_op1_vectype_p
= (op1_vectype == NULL_TREE
   || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
@@ -5997,8 +5997,8 @@ vectorizable_operation (stmt_vec_info st
  "use not simple.\n");
   return false;
 }
-  /* If op0 is an external or constant def use a vector type with
- the same size as the output vector type.  */
+  /* If op0 is an external or constant def, infer the vector type
+ from the scalar type.  */
   if (!vectype)
 {
   /* For boolean type we cannot determine vectype by
@@ -6018,7 +6018,7 @@ vectorizable_operation (stmt_vec_info st
  vectype = vectype_out;
}
   else
-   vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
+   vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op0));
 }
   if (vec_stmt)
 gcc_assert (vectype);


[9/n] Replace vec_info::vector_size with vec_info::vector_mode

2019-10-25 Thread Richard Sandiford
This patch replaces vec_info::vector_size with vec_info::vector_mode,
but for now continues to use it as a way of specifying a single
vector size.  This makes it easier for later patches to use
related_vector_mode instead.


2019-10-24  Richard Sandiford  

gcc/
* tree-vectorizer.h (vec_info::vector_size): Replace with...
(vec_info::vector_mode): ...this new field.
* tree-vect-loop.c (vect_update_vf_for_slp): Update accordingly.
(vect_analyze_loop, vect_transform_loop): Likewise.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
(vect_make_slp_decision, vect_slp_bb_region): Likewise.
* tree-vect-stmts.c (get_vectype_for_scalar_type): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Likewise.

gcc/testsuite/
* gcc.dg/vect/vect-tail-nomask-1.c: Update expected epilogue
vectorization message.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2019-10-25 13:26:59.093879082 +0100
+++ gcc/tree-vectorizer.h   2019-10-25 13:27:19.317736181 +0100
@@ -329,9 +329,9 @@ typedef std::pair vec_object
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
-  /* The vector size for this loop in bytes, or 0 if we haven't picked
- a size yet.  */
-  poly_uint64 vector_size;
+  /* If we've chosen a vector size for this vectorization region,
+ this is one mode that has such a size, otherwise it is VOIDmode.  */
+  machine_mode vector_mode;
 
 private:
   stmt_vec_info new_stmt_vec_info (gimple *stmt);
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2019-10-25 13:27:15.525762975 +0100
+++ gcc/tree-vect-loop.c2019-10-25 13:27:19.309736237 +0100
@@ -1414,8 +1414,8 @@ vect_update_vf_for_slp (loop_vec_info lo
dump_printf_loc (MSG_NOTE, vect_location,
 "Loop contains SLP and non-SLP stmts\n");
   /* Both the vectorization factor and unroll factor have the form
-loop_vinfo->vector_size * X for some rational X, so they must have
-a common multiple.  */
+GET_MODE_SIZE (loop_vinfo->vector_mode) * X for some rational X,
+so they must have a common multiple.  */
   vectorization_factor
= force_common_multiple (vectorization_factor,
 LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo));
@@ -2341,7 +2341,7 @@ vect_analyze_loop (class loop *loop, loo
" loops cannot be vectorized\n");
 
   unsigned n_stmts = 0;
-  poly_uint64 autodetected_vector_size = 0;
+  machine_mode autodetected_vector_mode = VOIDmode;
   opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
   machine_mode next_vector_mode = VOIDmode;
   while (1)
@@ -2357,7 +2357,7 @@ vect_analyze_loop (class loop *loop, loo
  gcc_checking_assert (first_loop_vinfo == NULL);
  return loop_vinfo;
}
-  loop_vinfo->vector_size = GET_MODE_SIZE (next_vector_mode);
+  loop_vinfo->vector_mode = next_vector_mode;
 
   bool fatal = false;
 
@@ -2366,7 +2366,7 @@ vect_analyze_loop (class loop *loop, loo
 
   opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
   if (mode_i == 0)
-   autodetected_vector_size = loop_vinfo->vector_size;
+   autodetected_vector_mode = loop_vinfo->vector_mode;
 
   if (res)
{
@@ -2401,21 +2401,21 @@ vect_analyze_loop (class loop *loop, loo
 
   if (mode_i < vector_modes.length ()
  && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
-  autodetected_vector_size))
+  GET_MODE_SIZE (autodetected_vector_mode)))
mode_i += 1;
 
   if (mode_i == vector_modes.length ()
- || known_eq (autodetected_vector_size, 0U))
+ || autodetected_vector_mode == VOIDmode)
{
  if (first_loop_vinfo)
{
  loop->aux = (loop_vec_info) first_loop_vinfo;
  if (dump_enabled_p ())
{
+ machine_mode mode = first_loop_vinfo->vector_mode;
  dump_printf_loc (MSG_NOTE, vect_location,
-  "* Choosing vector size ");
- dump_dec (MSG_NOTE, first_loop_vinfo->vector_size);
- dump_printf (MSG_NOTE, "\n");
+  "* Choosing vector mode %s\n",
+  GET_MODE_NAME (mode));
}
  return first_loop_vinfo;
}
@@ -8238,12 +8238,9 @@ vect_transform_loop (loop_vec_info loop_
  dump_printf (MSG_NOTE, "\n");
}
   else
-   {
- dump_printf_loc (MSG_NOTE, vect_location,
-  "LOOP EPILOGUE VECTORIZED (VS=");
- dump_dec (MSG_NOTE, loop_vinfo->vector_size);
- dump_printf (MSG_NOTE, ")\n");
-   }
+   

[8/n] Replace autovectorize_vector_sizes with autovectorize_vector_modes

2019-10-25 Thread Richard Sandiford
This is another patch in the series to remove the assumption that
all modes involved in vectorisation have to be the same size.
Rather than have the target provide a list of vector sizes,
it makes the target provide a list of vector "approaches",
with each approach represented by a mode.

A later patch will pass this mode to targetm.vectorize.related_mode
to get the vector mode for a given element mode.  Until then, the modes
simply act as an alternative way of specifying the vector size.


2019-10-24  Richard Sandiford  

gcc/
* target.h (vector_sizes, auto_vector_sizes): Delete.
(vector_modes, auto_vector_modes): New typedefs.
* target.def (autovectorize_vector_sizes): Replace with...
(autovectorize_vector_modes): ...this new hook.
* doc/tm.texi.in (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES):
Replace with...
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): ...this new hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_sizes): Delete.
(default_autovectorize_vector_modes): New function.
* targhooks.c (default_autovectorize_vector_sizes): Delete.
(default_autovectorize_vector_modes): New function.
* omp-general.c (omp_max_vf): Use autovectorize_vector_modes instead
of autovectorize_vector_sizes.  Use the number of units in the mode
to calculate the maximum VF.
* omp-low.c (omp_clause_aligned_alignment): Use
autovectorize_vector_modes instead of autovectorize_vector_sizes.
Use a loop based on related_mode to iterate through all supported
vector modes for a given scalar mode.
* optabs-query.c (can_vec_mask_load_store_p): Use
autovectorize_vector_modes instead of autovectorize_vector_sizes.
* tree-vect-loop.c (vect_analyze_loop, vect_transform_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb_region): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
Replace with...
(aarch64_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/arc/arc.c (arc_autovectorize_vector_sizes): Replace with...
(arc_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/arm/arm.c (arm_autovectorize_vector_sizes): Replace with...
(arm_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Replace with...
(ix86_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/mips/mips.c (mips_autovectorize_vector_sizes): Replace with...
(mips_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.

Index: gcc/target.h
===
--- gcc/target.h2019-09-30 17:19:39.843166118 +0100
+++ gcc/target.h2019-10-25 13:27:15.525762975 +0100
@@ -205,11 +205,11 @@ enum vect_cost_model_location {
 class vec_perm_indices;
 
 /* The type to use for lists of vector sizes.  */
-typedef vec<poly_uint64> vector_sizes;
+typedef vec<machine_mode> vector_modes;
 
 /* Same, but can be used to construct local lists that are
automatically freed.  */
-typedef auto_vec<poly_uint64, 8> auto_vector_sizes;
+typedef auto_vec<machine_mode, 8> auto_vector_modes;
 
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
Index: gcc/target.def
===
--- gcc/target.def  2019-10-25 13:26:59.309877555 +0100
+++ gcc/target.def  2019-10-25 13:27:15.525762975 +0100
@@ -1894,20 +1894,28 @@ reached.  The default is @var{mode} whic
 /* Returns a mask of vector sizes to iterate over when auto-vectorizing
after processing the preferred one derived from preferred_simd_mode.  */
 DEFHOOK
-(autovectorize_vector_sizes,
- "If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not\n\
-the only one that is worth considering, this hook should add all suitable\n\
-vector sizes to @var{sizes}, in order of decreasing preference.  The first\n\
-one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
-If @var{all} is true, add suitable vector sizes even when they are generally\n\
+(autovectorize_vector_modes,
+ "If using the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}\n\
+is not the only approach worth 

[PING] [PATCH] add __has_builtin (PR 66970)

2019-10-25 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00062.html

I was privately pointed at the Clang tests.  The implementation
passes most of them.  The one difference I noticed is that GCC
expands macros in the __has_builtin argument list while Clang
doesn't.  Since this is in line with other similar built-ins
(e.g., __has_attribute) and since Clang has at least one bug
report about this, I left it as is and just raised PR 91961
for the record.

On 10/11/2019 09:23 AM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00062.html

On 10/1/19 11:16 AM, Martin Sebor wrote:

Attached is an implementation of the __has_builtin special
preprocessor operator/macro analogous to __has_attribute and
(hopefully) compatible with the synonymous Clang feature (I
couldn't actually find tests for it in the Clang test suite
but if someone points me at them I'll verify it).

Tested on x86_64-linux.

Martin

PS I couldn't find an existing API to test whether a reserved
symbol like __builtin_offsetof is a function-like built-in so
I hardwired the tests for C and C++ into the new names_builtin_p
functions.  I don't like this very much because the next time
such an operator is added there is nothing to remind us to update
the functions.  Adding a flag to the c_common_reswords array would
solve the problem but at the expense of a linear search through
it.  Does anyone have a suggestion for how to do this better?




[7/n] Use consistent compatibility checks in vectorizable_shift

2019-10-25 Thread Richard Sandiford
The validation phase of vectorizable_shift used TYPE_MODE to check
whether the shift amount vector was compatible with the shifted vector:

  if ((op1_vectype == NULL_TREE
   || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype))
  && (!slp_node
  || SLP_TREE_DEF_TYPE
   (SLP_TREE_CHILDREN (slp_node)[1]) != vect_constant_def))

But the generation phase was stricter and required the element types to
be equivalent:

   && !useless_type_conversion_p (TREE_TYPE (vectype),
  TREE_TYPE (op1)))

This difference led to an ICE with a later patch.

The first condition seems a bit too lax given that the function
supports vect_worthwhile_without_simd_p, where two different vector
types could have the same integer mode.  But it seems too strict
to reject signed shifts by unsigned amounts or unsigned shifts by
signed amounts; verify_gimple_assign_binary is happy with those.

This patch therefore goes for a middle ground of checking both TYPE_MODE
and TYPE_VECTOR_SUBPARTS, using the same condition in both places.


2019-10-24  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vectorizable_shift): Check the number
of vector elements as well as the type mode when deciding
whether an op1_vectype is compatible.  Reuse the result of
this check when generating vector statements.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-10-25 13:27:08.653811531 +0100
+++ gcc/tree-vect-stmts.c   2019-10-25 13:27:12.121787027 +0100
@@ -5522,6 +5522,7 @@ vectorizable_shift (stmt_vec_info stmt_i
   bool scalar_shift_arg = true;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   vec_info *vinfo = stmt_info->vinfo;
+  bool incompatible_op1_vectype_p = false;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
 return false;
@@ -5666,8 +5667,12 @@ vectorizable_shift (stmt_vec_info stmt_i
 
   if (!op1_vectype)
op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out);
-  if ((op1_vectype == NULL_TREE
-  || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype))
+  incompatible_op1_vectype_p
+   = (op1_vectype == NULL_TREE
+  || maybe_ne (TYPE_VECTOR_SUBPARTS (op1_vectype),
+   TYPE_VECTOR_SUBPARTS (vectype))
+  || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype));
+  if (incompatible_op1_vectype_p
  && (!slp_node
  || SLP_TREE_DEF_TYPE
   (SLP_TREE_CHILDREN (slp_node)[1]) != vect_constant_def))
@@ -5813,9 +5818,7 @@ vectorizable_shift (stmt_vec_info stmt_i
 }
 }
 }
- else if (slp_node
-  && !useless_type_conversion_p (TREE_TYPE (vectype),
- TREE_TYPE (op1)))
+ else if (slp_node && incompatible_op1_vectype_p)
{
  if (was_scalar_shift_arg)
{


[6/n] Use build_vector_type_for_mode in get_vectype_for_scalar_type_and_size

2019-10-25 Thread Richard Sandiford
Except for one case, get_vectype_for_scalar_type_and_size calculates
what the vector mode should be and then calls build_vector_type,
which recomputes the mode from scratch.  This patch makes it use
build_vector_type_for_mode instead.

The exception mentioned above is when preferred_simd_mode returns
an integer mode, which it does if no appropriate vector mode exists.
The integer mode in question is usually word_mode, although epiphany
can return a doubleword mode in some cases.

There's no guarantee that this integer mode is appropriate, since for
example the scalar type could be a float.  The traditional behaviour is
therefore to use the integer mode to determine a size only, and leave
mode_for_vector to pick the TYPE_MODE.  (Note that it can actually end
up picking a vector mode if the target defines a disabled vector mode.
We therefore still need to check TYPE_MODE after building the type.)


2019-10-24  Richard Sandiford  

gcc/
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): If
targetm.vectorize.preferred_simd_mode returns an integer mode,
use mode_for_vector to decide what the vector type's mode
should actually be.  Use build_vector_type_for_mode instead
of build_vector_type.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-10-25 13:26:59.309877555 +0100
+++ gcc/tree-vect-stmts.c   2019-10-25 13:27:08.653811531 +0100
@@ -11162,16 +11162,31 @@ get_vectype_for_scalar_type_and_size (tr
   /* If no size was supplied use the mode the target prefers.   Otherwise
  lookup a vector mode of the specified size.  */
   if (known_eq (size, 0U))
-simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
+{
+  simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
+  if (SCALAR_INT_MODE_P (simd_mode))
+   {
+ /* Traditional behavior is not to take the integer mode
+literally, but simply to use it as a way of determining
+the vector size.  It is up to mode_for_vector to decide
+what the TYPE_MODE should be.
+
+Note that nunits == 1 is allowed in order to support single
+element vector types.  */
+ if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits)
+ || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+   return NULL_TREE;
+   }
+}
   else if (!multiple_p (size, nbytes, &nunits)
   || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
 return NULL_TREE;
-  /* NOTE: nunits == 1 is allowed to support single element vector types.  */
-  if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits))
-return NULL_TREE;
 
-  vectype = build_vector_type (scalar_type, nunits);
+  vectype = build_vector_type_for_mode (scalar_type, simd_mode);
 
+  /* In cases where the mode was chosen by mode_for_vector, check that
+ the target actually supports the chosen mode, or that it at least
+ allows the vector mode to be replaced by a like-sized integer.  */
   if (!VECTOR_MODE_P (TYPE_MODE (vectype))
   && !INTEGRAL_MODE_P (TYPE_MODE (vectype)))
 return NULL_TREE;


[0/n] Support multiple vector sizes for vectorisation

2019-10-25 Thread Richard Sandiford
This is a continuation of the patch series I started on Wednesday
this time posted under a covering message.  Parts 1-5 were:

[1/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01634.html
[2/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01637.html
[3/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01638.html
[4/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01639.html
[5/n] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01641.html

Some parts of the series will conflict with Andre's patches,
so I'll hold off applying anything that gets approved until those
patches have gone in.  The conflicts should only be minor though,
and won't change the approach, so I thought it was worth posting
for comments now anyway.

I tested each patch individually on aarch64-linux-gnu and the series as
a whole on x86_64-linux-gnu.  I also tried building at least one target
per CPU directory and spot-checked that they were behaving sensibly after
the patch.

Thanks,
Richard


[PATCH 1/3] Remove misleading sorting function in ggc memory report.

2019-10-25 Thread Martin Liska

gcc/ChangeLog:

2019-10-25  Martin Liska  

* cgraphunit.c (symbol_table::compile): Remove argument
for dump_memory_report.
* ggc-common.c (dump_ggc_loc_statistics): Likewise.
(compare_final): Remove in order to make report
better readable.
* ggc.h (dump_ggc_loc_statistics):  Remove argument.
* mem-stats.h (mem_alloc_description::get_list):
Do not pass cmp.
(mem_alloc_description::dump): Likewise here.
* toplev.c (dump_memory_report): Remove final
argument.
(finalize): Likewise.
* toplev.h (dump_memory_report): Remove argument.

gcc/lto/ChangeLog:

2019-10-25  Martin Liska  

* lto.c (do_whole_program_analysis): Remove argument.
---
 gcc/cgraphunit.c |  4 ++--
 gcc/ggc-common.c | 19 ++-
 gcc/ggc.h|  2 +-
 gcc/lto/lto.c|  6 +++---
 gcc/mem-stats.h  | 19 ++-
 gcc/toplev.c |  6 +++---
 gcc/toplev.h |  2 +-
 7 files changed, 18 insertions(+), 40 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 3f751fa1044..9873b9b7aac 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2604,7 +2604,7 @@ symbol_table::compile (void)
   if (pre_ipa_mem_report)
 {
   fprintf (stderr, "Memory consumption before IPA\n");
-  dump_memory_report (false);
+  dump_memory_report ();
 }
   if (!quiet_flag)
 fprintf (stderr, "Performing interprocedural optimizations\n");
@@ -2639,7 +2639,7 @@ symbol_table::compile (void)
   if (post_ipa_mem_report)
 {
   fprintf (stderr, "Memory consumption after IPA\n");
-  dump_memory_report (false);
+  dump_memory_report ();
 }
   timevar_pop (TV_CGRAPHOPT);
 
diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c
index 0968d9769fa..8bc77a0a036 100644
--- a/gcc/ggc-common.c
+++ b/gcc/ggc-common.c
@@ -933,21 +933,6 @@ public:
 return s.second->get_balance () - f.second->get_balance ();
   }
 
-  /* Compare rows in final GGC summary dump.  */
-  static int
-  compare_final (const void *first, const void *second)
-  {
-typedef std::pair<mem_location *, ggc_usage *> mem_pair_t;
-
-const ggc_usage *f = ((const mem_pair_t *)first)->second;
-const ggc_usage *s = ((const mem_pair_t *)second)->second;
-
-size_t a = f->m_allocated + f->m_overhead - f->m_freed;
-size_t b = s->m_allocated + s->m_overhead - s->m_freed;
-
-return a == b ? 0 : (a < b ? 1 : -1);
-  }
-
   /* Dump header with NAME.  */
   static inline void
   dump_header (const char *name)
@@ -970,7 +955,7 @@ static mem_alloc_description ggc_mem_desc;
 /* Dump per-site memory statistics.  */
 
 void
-dump_ggc_loc_statistics (bool final)
+dump_ggc_loc_statistics ()
 {
   if (! GATHER_STATISTICS)
 return;
@@ -978,7 +963,7 @@ dump_ggc_loc_statistics (bool final)
   ggc_force_collect = true;
   ggc_collect ();
 
-  ggc_mem_desc.dump (GGC_ORIGIN, final ? ggc_usage::compare_final : NULL);
+  ggc_mem_desc.dump (GGC_ORIGIN);
 
   ggc_force_collect = false;
 }
diff --git a/gcc/ggc.h b/gcc/ggc.h
index 31606dc843f..64d1f188eb0 100644
--- a/gcc/ggc.h
+++ b/gcc/ggc.h
@@ -149,7 +149,7 @@ extern void *ggc_realloc (void *, size_t CXX_MEM_STAT_INFO);
 /* Free a block.  To be used when known for certain it's not reachable.  */
 extern void ggc_free (void *);
 
-extern void dump_ggc_loc_statistics (bool);
+extern void dump_ggc_loc_statistics ();
 
 /* Reallocator.  */
 #define GGC_RESIZEVEC(T, P, N) \
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9b8c3272977..5dca73ffdb3 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -457,7 +457,7 @@ do_whole_program_analysis (void)
   if (pre_ipa_mem_report)
 {
   fprintf (stderr, "Memory consumption before IPA\n");
-  dump_memory_report (false);
+  dump_memory_report ();
 }
 
   symtab->function_flags_ready = true;
@@ -539,14 +539,14 @@ do_whole_program_analysis (void)
   if (post_ipa_mem_report)
 {
   fprintf (stderr, "Memory consumption after IPA\n");
-  dump_memory_report (false);
+  dump_memory_report ();
 }
 
   /* Show the LTO report before launching LTRANS.  */
   if (flag_lto_report || (flag_wpa && flag_lto_report_wpa))
 print_lto_report_1 ();
   if (mem_report_wpa)
-dump_memory_report (true);
+dump_memory_report ();
 }
 
 /* Create artificial pointers for "omp declare target link" vars.  */
diff --git a/gcc/mem-stats.h b/gcc/mem-stats.h
index 9ceb9ccc55b..c2329c2b14d 100644
--- a/gcc/mem-stats.h
+++ b/gcc/mem-stats.h
@@ -361,14 +361,11 @@ public:
  are filtered by ORIGIN type, LENGTH is return value where we register
  the number of elements in the list. If we want to process custom order,
  CMP comparator can be provided.  */
-  mem_list_t *get_list (mem_alloc_origin origin, unsigned *length,
-			int (*cmp) (const void *first,
-const void *second) = NULL);
+  mem_list_t *get_list (mem_alloc_origin origin, unsigned *length);
 
   /* Dump all tracked instances of type ORIGIN. If we want to process custom
  order, CMP 

[PATCH 3/3] Print header in dump_memory_report.

2019-10-25 Thread Martin Liska

gcc/ChangeLog:

2019-10-25  Martin Liska  

* cgraphunit.c (symbol_table::compile): Pass
title as dump_memory_report argument.
* toplev.c (dump_memory_report):  New argument.
(finalize): Pass new argument.
* toplev.h (dump_memory_report): Add argument.

gcc/lto/ChangeLog:

2019-10-25  Martin Liska  

* lto.c (do_whole_program_analysis): Pass
title as dump_memory_report argument.
---
 gcc/cgraphunit.c | 10 ++
 gcc/lto/lto.c| 12 +++-
 gcc/toplev.c | 13 +++--
 gcc/toplev.h |  3 ++-
 4 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 9873b9b7aac..6ec24432351 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2602,10 +2602,7 @@ symbol_table::compile (void)
 
   timevar_push (TV_CGRAPHOPT);
   if (pre_ipa_mem_report)
-{
-  fprintf (stderr, "Memory consumption before IPA\n");
-  dump_memory_report ();
-}
+dump_memory_report ("Memory consumption before IPA");
   if (!quiet_flag)
 fprintf (stderr, "Performing interprocedural optimizations\n");
   state = IPA;
@@ -2637,10 +2634,7 @@ symbol_table::compile (void)
   symtab->dump (dump_file);
 }
   if (post_ipa_mem_report)
-{
-  fprintf (stderr, "Memory consumption after IPA\n");
-  dump_memory_report ();
-}
+dump_memory_report ("Memory consumption after IPA");
   timevar_pop (TV_CGRAPHOPT);
 
   /* Output everything.  */
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 5dca73ffdb3..27ea341e04c 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -455,10 +455,7 @@ do_whole_program_analysis (void)
   timevar_push (TV_WHOPR_WPA);
 
   if (pre_ipa_mem_report)
-{
-  fprintf (stderr, "Memory consumption before IPA\n");
-  dump_memory_report ();
-}
+dump_memory_report ("Memory consumption before IPA");
 
   symtab->function_flags_ready = true;
 
@@ -537,16 +534,13 @@ do_whole_program_analysis (void)
   timevar_stop (TV_PHASE_STREAM_OUT);
 
   if (post_ipa_mem_report)
-{
-  fprintf (stderr, "Memory consumption after IPA\n");
-  dump_memory_report ();
-}
+dump_memory_report ("Memory consumption after IPA");
 
   /* Show the LTO report before launching LTRANS.  */
   if (flag_lto_report || (flag_wpa && flag_lto_report_wpa))
 print_lto_report_1 ();
   if (mem_report_wpa)
-dump_memory_report ();
+dump_memory_report ("Final");
 }
 
 /* Create artificial pointers for "omp declare target link" vars.  */
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 8a152b8e3b1..00a5e832126 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1994,8 +1994,17 @@ target_reinit (void)
 }
 
 void
-dump_memory_report ()
+dump_memory_report (const char *header)
 {
+  /* Print significant header.  */
+  fputc ('\n', stderr);
+  for (unsigned i = 0; i < 80; i++)
+fputc ('#', stderr);
+  fprintf (stderr, "\n# %-77s#\n", header);
+  for (unsigned i = 0; i < 80; i++)
+fputc ('#', stderr);
+  fputs ("\n\n", stderr);
+
   dump_line_table_statistics ();
   ggc_print_statistics ();
   stringpool_statistics ();
@@ -2058,7 +2067,7 @@ finalize (bool no_backend)
 }
 
   if (mem_report)
-dump_memory_report ();
+dump_memory_report ("Final");
 
   if (profile_report)
 dump_profile_report ();
diff --git a/gcc/toplev.h b/gcc/toplev.h
index 91e346570db..8814a5e11f8 100644
--- a/gcc/toplev.h
+++ b/gcc/toplev.h
@@ -66,7 +66,8 @@ extern bool wrapup_global_declarations (tree *, int);
 
 extern void global_decl_processing (void);
 
-extern void dump_memory_report ();
+extern void
+dump_memory_report (const char *);
 extern void dump_profile_report (void);
 
 extern void target_reinit (void);


[PATCH 0/3] -fmem-report tweaks

2019-10-25 Thread Martin Liska
Hi.

The patches fix 3 issues I spotted today during an LTO debugging
session:

1) Removal of two different sortings of GGC memory; having both is confusing.
2) Move Leak to the first column for GGC memory, similarly to the
   other reports.
3) Print a significant header/title that distinguishes between
   -fpre-ipa-mem-report and the other options.

Thanks,
Martin

Martin Liska (3):
  Remove misleading sorting function in ggc memory report.
  Move Leak in GCC memory report to the first column.
  Print header in dump_memory_report.

 gcc/cgraphunit.c | 10 ++
 gcc/ggc-common.c | 28 +++-
 gcc/ggc.h|  2 +-
 gcc/lto/lto.c| 12 +++-
 gcc/mem-stats.h  | 19 ++-
 gcc/toplev.c | 15 ---
 gcc/toplev.h |  3 ++-
 7 files changed, 33 insertions(+), 56 deletions(-)

-- 
2.23.0



[PATCH 2/3] Move Leak in GCC memory report to the first column.

2019-10-25 Thread Martin Liska

gcc/ChangeLog:

2019-10-25  Martin Liska  

* ggc-common.c: Move Leak to the first column.
---
 gcc/ggc-common.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c
index 8bc77a0a036..37d3c5df9e1 100644
--- a/gcc/ggc-common.c
+++ b/gcc/ggc-common.c
@@ -887,10 +887,11 @@ public:
 fprintf (stderr,
 	 "%-48s " PRsa (9) ":%5.1f%%" PRsa (9) ":%5.1f%%"
 	 PRsa (9) ":%5.1f%%" PRsa (9) ":%5.1f%%" PRsa (9) "\n",
-	 prefix, SIZE_AMOUNT (m_collected),
+	 prefix,
+	 SIZE_AMOUNT (balance), get_percent (balance, total.get_balance ()),
+	 SIZE_AMOUNT (m_collected),
 	 get_percent (m_collected, total.m_collected),
 	 SIZE_AMOUNT (m_freed), get_percent (m_freed, total.m_freed),
-	 SIZE_AMOUNT (balance), get_percent (balance, total.get_balance ()),
 	 SIZE_AMOUNT (m_overhead),
 	 get_percent (m_overhead, total.m_overhead),
 	 SIZE_AMOUNT (m_times));
@@ -937,8 +938,8 @@ public:
   static inline void
   dump_header (const char *name)
   {
-fprintf (stderr, "%-48s %11s%17s%17s%16s%17s\n", name, "Garbage", "Freed",
-	 "Leak", "Overhead", "Times");
+fprintf (stderr, "%-48s %11s%17s%17s%16s%17s\n", name, "Leak", "Garbage",
+	 "Freed", "Overhead", "Times");
   }
 
   /* Freed memory in bytes.  */


[PATCH][MSP430] Use 430 insns in the large memory model for more patterns

2019-10-25 Thread Jozef Lawrynowicz
Where possible, it is desirable to use 430-format instructions when
compiling for the 430X ISA and the large memory model: 430 instructions
have smaller code size and faster execution time.

This patch recognizes a couple of new patterns in which we can use 430 insns in
the large memory model.

Successfully regtested on trunk.

Ok to apply?
>From ba3a8eafeba08dc034e219f892f2784c16f94c40 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Thu, 24 Oct 2019 15:17:29 +0100
Subject: [PATCH] MSP430: Use 430 insns in the large memory model for more
 patterns

gcc/ChangeLog:

2019-10-25  Jozef Lawrynowicz  

	* config/msp430/msp430.c (msp430_check_index_not_high_mem): New.
	(msp430_check_plus_not_high_mem): New.
	(msp430_op_not_in_high_mem): Use new functions to check if the operand
	might be in low memory.
	Indicate that a 16-bit absolute address is in lower memory.

gcc/testsuite/ChangeLog:

2019-10-25  Jozef Lawrynowicz  

	* gcc.target/msp430/mlarge-use-430-insn.c: New test.

---
 gcc/config/msp430/msp430.c| 43 ---
 .../gcc.target/msp430/mlarge-use-430-insn.c   | 33 ++
 2 files changed, 70 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/mlarge-use-430-insn.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index fe1fcc0db43..a3d0d9cf64b 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -3232,10 +3232,37 @@ msp430_print_operand_addr (FILE * file, machine_mode /*mode*/, rtx addr)
   msp430_print_operand_raw (file, addr);
 }
 
+/* We can only allow signed 15-bit indexes i.e. +/-32K.  */
+static bool
+msp430_check_index_not_high_mem (rtx op)
+{
+  if (CONST_INT_P (op)
+  && IN_RANGE (INTVAL (op), HOST_WIDE_INT_M1U << 15, (1 << 15) - 1))
+return true;
+  return false;
+}
+
+/* If this returns true, we don't need a 430X insn.  */
+static bool
+msp430_check_plus_not_high_mem (rtx op)
+{
+  if (GET_CODE (op) != PLUS)
+return false;
+  rtx op0 = XEXP (op, 0);
+  rtx op1 = XEXP (op, 1);
+  if (SYMBOL_REF_P (op0)
+  && (SYMBOL_REF_FLAGS (op0) & SYMBOL_FLAG_LOW_MEM)
+  && msp430_check_index_not_high_mem (op1))
+return true;
+  return false;
+}
+
 /* Determine whether an RTX is definitely not a MEM referencing an address in
the upper memory region.  Returns true if we've decided the address will be
in the lower memory region, or the RTX is not a MEM.  Returns false
-   otherwise.  */
+   otherwise.
+   The Ys constraint will catch (mem (plus (const/reg))) but we catch cases
+   involving a symbol_ref here.  */
 bool
 msp430_op_not_in_high_mem (rtx op)
 {
@@ -3251,11 +3278,15 @@ msp430_op_not_in_high_mem (rtx op)
memory.  */
 return true;
 
-  /* Catch (mem (const (plus ((symbol_ref) (const_int)))) e.g. +2.  */
-  if ((GET_CODE (op0) == CONST)
-  && (GET_CODE (XEXP (op0, 0)) == PLUS)
-  && (SYMBOL_REF_P (XEXP (XEXP (op0, 0), 0)))
-  && (SYMBOL_REF_FLAGS (XEXP (XEXP (op0, 0), 0)) & SYMBOL_FLAG_LOW_MEM))
+  /* Check possibilities for (mem (plus)).
+ e.g. (mem (const (plus ((symbol_ref) (const_int)))) : +2.  */
+  if (msp430_check_plus_not_high_mem (op0)
+  || ((GET_CODE (op0) == CONST)
+	  && msp430_check_plus_not_high_mem (XEXP (op0, 0))))
+return true;
+
+  /* An absolute 16-bit address is allowed.  */
+  if ((CONST_INT_P (op0) && (IN_RANGE (INTVAL (op0), 0, (1 << 16) - 1))))
 return true;
 
   /* Return false when undecided.  */
diff --git a/gcc/testsuite/gcc.target/msp430/mlarge-use-430-insn.c b/gcc/testsuite/gcc.target/msp430/mlarge-use-430-insn.c
new file mode 100644
index 000..efa598be685
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/mlarge-use-430-insn.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-mcpu=msp430" "-mcpu=430" "-msmall" } { "" } } */
+/* { dg-options "-mlarge -O1" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/* Test to verify cases where we can use a 430 insn even in the large memory
+   model.  */
+
+int foo[2];
+
+/*
+** func:  { target msp430_region_lower }
+** ...
+**	MOV.W	#-4088, 
+**	MOV.W	#-8531, &40960
+**	MOVX.W	#-16657, &106496
+** ...
+*/
+/*
+** func:  { target msp430_region_not_lower }
+** ...
+**	MOVX.W	#-4088, 
+**	MOV.W	#-8531, &40960
+**	MOVX.W	#-16657, &106496
+** ...
+*/
+void
+func (void)
+{
+  foo[0] = 0xF008;
+  (*(int *)0xA000) = 0xDEAD;
+  (*(int *)0x1A000) = 0xBEEF;
+}
-- 
2.17.1



[build] Properly track GCC language configure fragments

2019-10-25 Thread Thomas Schwinge
Hi!

I'm aware that incremental builds aren't really supported, but during
day-to-day development, they're still useful -- as long as they work.
;-)


A recent change (adding '\$(srcdir)/cp/logic.cc' to 'gtfiles' in
'gcc/cp/config-lang.in') broke things:

[...]/source-gcc/gcc/cp/logic.cc:907:25: fatal error: gt-cp-logic.h: No such file or directory
compilation terminated.
Makefile:1117: recipe for target 'cp/logic.o' failed
make[2]: *** [cp/logic.o] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: Leaving directory '[...]/build-gcc-offload-nvptx-none/gcc'
Makefile:4336: recipe for target 'all-gcc' failed
make[1]: *** [all-gcc] Error 2
make[1]: Leaving directory '[...]/build-gcc-offload-nvptx-none'
Makefile:936: recipe for target 'all' failed
make: *** [all] Error 2

That's because the build machinery didn't notice the
'gcc/cp/config-lang.in' change, and thus files didn't get regenerated;
'gtfiles' are encoded in 'gcc/Makefile', and that's how/where the missing
'gt-cp-logic.h' would be generated.


Please find attached a patch with rationale.  OK to commit?  If approving
this patch, please respond with "Reviewed-by: NAME " so that your
effort will be recorded in the commit log, see
.


Regards
 Thomas


From dae93e7c2ed195fb3d1d6c4ccf0ddb5ef54bf8ee Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 25 Oct 2019 12:37:05 +0200
Subject: [PATCH] [build] Properly track GCC language configure fragments

The 'gcc/configure' script sources all 'gcc/*/config-lang.in' files, but fails
to emit such dependency information into the build machinery.  That means,
currently, when something gets changed in a 'gcc/*/config-lang.in' file, this
is not noticed, and doesn't propagate through the build machinery.

Handling of configure fragments is modelled on the existing handling
of Makefile fragments.
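The dependency being added can be reduced to a minimal sketch (hypothetical fragment paths; the real rule is the one in gcc/Makefile.in):

```make
# Hypothetical minimal analogue of the fix: config.status must be
# re-run whenever any language's configure fragment changes, exactly
# as it already is for configure and config.gcc themselves.
LANG_CONFIGUREFRAGS = $(srcdir)/cp/config-lang.in $(srcdir)/fortran/config-lang.in

config.status: $(srcdir)/configure $(srcdir)/config.gcc $(LANG_CONFIGUREFRAGS)
	$(SHELL) ./config.status --recheck
```

With the fragments listed as prerequisites, editing a config-lang.in (e.g. its gtfiles list) makes the next `make` regenerate config.status and hence the gengtype outputs such as gt-cp-logic.h.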

	gcc/
	* Makefile.in (LANG_CONFIGUREFRAGS): Define.
	(config.status): Use/depend on it.
	* configure.ac (all_lang_configurefrags): Track, 'AC_SUBST'.
	* configure: Regenerate.
---
 gcc/Makefile.in  |  3 ++-
 gcc/configure| 10 +++---
 gcc/configure.ac |  5 -
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c82858fa93e..bec5d8cf431 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1122,6 +1122,7 @@ endif
 # Support for additional languages (other than C).
 # C can be supported this way too (leave for later).
 
+LANG_CONFIGUREFRAGS  = @all_lang_configurefrags@
 LANG_MAKEFRAGS = @all_lang_makefrags@
 
 # Used by gcc/jit/Make-lang.in
@@ -1902,7 +1903,7 @@ cstamp-h: config.in config.status
 # Really, really stupid make features, such as SUN's KEEP_STATE, may force
 # a target to build even if it is up-to-date.  So we must verify that
 # config.status does not exist before failing.
-config.status: $(srcdir)/configure $(srcdir)/config.gcc
+config.status: $(srcdir)/configure $(srcdir)/config.gcc $(LANG_CONFIGUREFRAGS)
 	@if [ ! -f config.status ] ; then \
 	  echo You must configure gcc.  Look at http://gcc.gnu.org/install/ for details.; \
 	  false; \
diff --git a/gcc/configure b/gcc/configure
index 9de9ef85f24..84b3578fb2b 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -701,6 +701,7 @@ build_exeext
 all_selected_languages
 all_languages
 all_lang_makefrags
+all_lang_configurefrags
 all_gtfiles
 all_compilers
 srcdir
@@ -18851,7 +18852,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18854 "configure"
+#line 18855 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18957,7 +18958,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18960 "configure"
+#line 18961 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -29824,7 +29825,8 @@ lang_tree_files=
 all_languages=
 all_compilers=
 all_outputs='Makefile'
-# List of language makefile fragments.
+# List of language configure and makefile fragments.
+all_lang_configurefrags=
 all_lang_makefrags=
 # Additional files for gengtype
 all_gtfiles="$target_gtfiles"
@@ -29910,6 +29912,7 @@ do
 	esac
 	$ok || continue
 
+	all_lang_configurefrags="$all_lang_configurefrags \$(srcdir)/$gcc_subdir/config-lang.in"
 	all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in"
 	if test -f $srcdir/$gcc_subdir/lang.opt; then
 	lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt"
@@ -30061,6 +30064,7 @@ fi
 
 
 
+
 
 
 
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 62f4b2651cc..f89bb43d19c 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6284,7 +6284,8 @@ lang_tree_files=
 all_languages=
 all_compilers=
 all_outputs='Makefile'
-# List of language makefile fragments.
+# List of language configure and makefile fragments.
+all_lang_configurefrags=
 all_lang_makefrags=
# Additional files for gengtype

[PATCH] Use STMT_VINFO_REDUC_IDX instead of recomputing it

2019-10-25 Thread Richard Biener


This is a cleanup.  The cond-reduction restriction can go; the
fold-left one stays (it cannot handle more than one stmt in the
cycle - in the future, when we get partial loop vectorization,
generic code would handle duplicating the scalar code parts; they'd
simply stay single-lane SLP graph parts).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-10-25  Richard Biener  

* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
STMT_VINFO_REDUC_IDX from the actual stmt.
(vect_transform_reduction): Likewise.
(vectorizable_reduction): Compute the reduction chain length,
do not recompute the reduction operand index.  Remove no longer
necessary restriction for condition reduction chains.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277441)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4263,9 +4263,9 @@ vect_create_epilog_for_reduction (stmt_v
 (CCOMPARE).  The then and else values mirror the main VEC_COND_EXPR:
 the reduction phi corresponds to NEW_PHI_TREE and the new values
 correspond to INDEX_BEFORE_INCR.  */
-  gcc_assert (STMT_VINFO_REDUC_IDX (reduc_info) >= 1);
+  gcc_assert (STMT_VINFO_REDUC_IDX (stmt_info) >= 1);
   tree index_cond_expr;
-  if (STMT_VINFO_REDUC_IDX (reduc_info) == 2)
+  if (STMT_VINFO_REDUC_IDX (stmt_info) == 2)
index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
  ccompare, indx_before_incr, new_phi_tree);
   else
@@ -5720,19 +5720,21 @@ vectorizable_reduction (stmt_vec_info st
   gcc_assert (!STMT_VINFO_RELATED_STMT (phi_info));
   gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
 
-  /* Verify following REDUC_IDX from the latch def leads us back to the PHI.  */
+  /* Verify following REDUC_IDX from the latch def leads us back to the PHI
+ and compute the reduction chain length.  */
   tree reduc_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
  loop_latch_edge (loop));
+  unsigned reduc_chain_length = 0;
   while (reduc_def != PHI_RESULT (reduc_def_phi))
 {
   stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
   def = vect_stmt_to_vectorize (def);
   gcc_assert (STMT_VINFO_REDUC_IDX (def) != -1);
   reduc_def = gimple_op (def->stmt, 1 + STMT_VINFO_REDUC_IDX (def));
+  reduc_chain_length++;
 }
 
   reduc_def = PHI_RESULT (reduc_def_phi);
-  int reduc_index = -1;
   for (i = 0; i < op_type; i++)
 {
   tree op = gimple_op (stmt, i + 1);
@@ -5753,7 +5755,6 @@ vectorizable_reduction (stmt_vec_info st
   if ((dt == vect_reduction_def || dt == vect_nested_cycle)
  && op == reduc_def)
{
- reduc_index = i;
  continue;
}
 
@@ -5792,10 +5793,6 @@ vectorizable_reduction (stmt_vec_info st
   if (!vectype_in)
 vectype_in = vectype_out;
   STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
-  /* For the SSA cycle we store on each participating stmt the operand index
- where the cycle continues.  Store the one relevant for the actual
- operation in the reduction meta.  */
-  STMT_VINFO_REDUC_IDX (reduc_info) = reduc_index;
 
   enum vect_reduction_type v_reduc_type = STMT_VINFO_REDUC_TYPE (phi_info);
   STMT_VINFO_REDUC_TYPE (reduc_info) = v_reduc_type;
@@ -5805,28 +5802,8 @@ vectorizable_reduction (stmt_vec_info st
   if (slp_node)
return false;
 
-  /* TODO: We can't yet handle reduction chains, since we need to treat
-each COND_EXPR in the chain specially, not just the last one.
-E.g. for:
-
-   x_1 = PHI 
-   x_2 = a_2 ? ... : x_1;
-   x_3 = a_3 ? ... : x_2;
-
-we're interested in the last element in x_3 for which a_2 || a_3
-is true, whereas the current reduction chain handling would
-vectorize x_2 as a normal VEC_COND_EXPR and only treat x_3
-as a reduction operation.  */
-  if (reduc_index == -1)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"conditional reduction chains not supported\n");
- return false;
-   }
-
   /* When the condition uses the reduction value in the condition, fail.  */
-  if (reduc_index == 0)
+  if (STMT_VINFO_REDUC_IDX (stmt_info) == 0)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5995,17 +5972,17 @@ vectorizable_reduction (stmt_vec_info st
 outer-loop vectorization is safe.  */
   if (needs_fold_left_reduction_p (scalar_type, orig_code))
{
- STMT_VINFO_REDUC_TYPE (reduc_info)
-   = reduction_type = FOLD_LEFT_REDUCTION;
- /* When vectorizing a reduction chain w/o SLP the reduction PHI is not
-directy used in stmt.  */
- 

[PATCH] Relax SLP operand swapping

2019-10-25 Thread Richard Biener


When I removed the swapping of operands in the IL, I didn't actually
relax the swapping restrictions.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-10-25  Richard Biener  

* tree-vect-slp.c (vect_get_and_check_slp_defs): Only fail
swapping if we actually have to modify the IL on a shared stmt.
(vect_build_slp_tree_2): Never fail swapping on shared stmts
because we no longer modify the IL.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 277441)
+++ gcc/tree-vect-slp.c (working copy)
@@ -537,19 +537,19 @@ again:
   /* Swap operands.  */
   if (swapped)
 {
-  /* If there are already uses of this stmt in a SLP instance then
- we've committed to the operand order and can't swap it.  */
-  if (STMT_VINFO_NUM_SLP_USES (stmt_info) != 0)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"Build SLP failed: cannot swap operands of "
-"shared stmt %G", stmt_info->stmt);
- return -1;
-   }
-
   if (first_op_cond)
{
+ /* If there are already uses of this stmt in a SLP instance then
+we've committed to the operand order and can't swap it.  */
+ if (STMT_VINFO_NUM_SLP_USES (stmt_info) != 0)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Build SLP failed: cannot swap operands of "
+"shared stmt %G", stmt_info->stmt);
+ return -1;
+   }
+
  /* To get rid of this swapping we have to move the stmt code
 to the SLP tree as well (and gather it here per stmt).  */
	  gassign *stmt = as_a <gassign *> (stmt_info->stmt);
@@ -1413,28 +1413,6 @@ vect_build_slp_tree_2 (vec_info *vinfo,
  swap_not_matching = false;
  break;
}
- /* Verify if we can safely swap or if we committed to a
-specific operand order already.
-???  Instead of modifying GIMPLE stmts here we could
-record whether we want to swap operands in the SLP
-node and temporarily do that when processing it
-(or wrap operand accessors in a helper).  */
- else if (swap[j] != 0
-  || STMT_VINFO_NUM_SLP_USES (stmt_info))
-   {
- if (!swap_not_matching)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION,
-vect_location,
-"Build SLP failed: cannot swap "
-"operands of shared stmt %G",
-stmts[j]->stmt);
- goto fail;
-   }
- swap_not_matching = false;
- break;
-   }
}
}
  while (j != group_size);


[PATCH] Fix PR92222

2019-10-25 Thread Richard Biener


We have to check each operand, not just the first, for being in a
pattern when avoiding building a node from scalars (we could
possibly handle the special case of some of them being the pattern
stmt root, but that would be a follow-up improvement).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-10-25  Richard Biener  

PR tree-optimization/92222
* tree-vect-slp.c (_slp_oprnd_info::first_pattern): Remove.
(_slp_oprnd_info::second_pattern): Likewise.
(_slp_oprnd_info::any_pattern): New.
(vect_create_oprnd_info): Adjust.
(vect_get_and_check_slp_defs): Compute whether any stmt is
in a pattern.
(vect_build_slp_tree_2): Avoid building up a node from scalars
if any of the operand defs, not just the first, is in a pattern.

* gcc.dg/torture/pr92222.c: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 277441)
+++ gcc/tree-vect-slp.c (working copy)
@@ -177,8 +177,7 @@ typedef struct _slp_oprnd_info
  stmt.  */
   tree first_op_type;
   enum vect_def_type first_dt;
-  bool first_pattern;
-  bool second_pattern;
+  bool any_pattern;
 } *slp_oprnd_info;
 
 
@@ -199,8 +198,7 @@ vect_create_oprnd_info (int nops, int gr
   oprnd_info->ops.create (group_size);
   oprnd_info->first_dt = vect_uninitialized_def;
   oprnd_info->first_op_type = NULL_TREE;
-  oprnd_info->first_pattern = false;
-  oprnd_info->second_pattern = false;
+  oprnd_info->any_pattern = false;
   oprnds_info.quick_push (oprnd_info);
 }
 
@@ -339,13 +337,11 @@ vect_get_and_check_slp_defs (vec_info *v
   tree oprnd;
   unsigned int i, number_of_oprnds;
   enum vect_def_type dt = vect_uninitialized_def;
-  bool pattern = false;
   slp_oprnd_info oprnd_info;
   int first_op_idx = 1;
   unsigned int commutative_op = -1U;
   bool first_op_cond = false;
   bool first = stmt_num == 0;
-  bool second = stmt_num == 1;
 
   if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
 {
@@ -418,13 +414,12 @@ again:
  return -1;
}
 
-  if (second)
-   oprnd_info->second_pattern = pattern;
+  if (def_stmt_info && is_pattern_stmt_p (def_stmt_info))
+   oprnd_info->any_pattern = true;
 
   if (first)
{
  oprnd_info->first_dt = dt;
- oprnd_info->first_pattern = pattern;
  oprnd_info->first_op_type = TREE_TYPE (oprnd);
}
   else
@@ -1311,7 +1306,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
  /* ???  Rejecting patterns this way doesn't work.  We'd have to
 do extra work to cancel the pattern so the uses see the
 scalar version.  */
- && !is_pattern_stmt_p (SLP_TREE_SCALAR_STMTS (child)[0]))
+ && !oprnd_info->any_pattern)
{
  slp_tree grandchild;
 
@@ -1358,18 +1353,16 @@ vect_build_slp_tree_2 (vec_info *vinfo,
  /* ???  Rejecting patterns this way doesn't work.  We'd have to
 do extra work to cancel the pattern so the uses see the
 scalar version.  */
- && !is_pattern_stmt_p (stmt_info))
+ && !is_pattern_stmt_p (stmt_info)
+ && !oprnd_info->any_pattern)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "Building vector operands from scalars\n");
  this_tree_size++;
- child = vect_create_new_slp_node (oprnd_info->def_stmts);
- SLP_TREE_DEF_TYPE (child) = vect_external_def;
- SLP_TREE_SCALAR_OPS (child) = oprnd_info->ops;
+ child = vect_create_new_slp_node (oprnd_info->ops);
  children.safe_push (child);
  oprnd_info->ops = vNULL;
- oprnd_info->def_stmts = vNULL;
  continue;
}
 
@@ -1469,7 +1440,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
  /* ???  Rejecting patterns this way doesn't work.  We'd have
 to do extra work to cancel the pattern so the uses see the
 scalar version.  */
- && !is_pattern_stmt_p (SLP_TREE_SCALAR_STMTS (child)[0]))
+ && !oprnd_info->any_pattern)
{
  unsigned int j;
  slp_tree grandchild;
Index: gcc/testsuite/gcc.dg/torture/pr92222.c
===
--- gcc/testsuite/gcc.dg/torture/pr92222.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr92222.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+unsigned char *a;
+int b;
+void f();
+void c()
+{
+  char *d;
+  int e;
+  for (; b; b++) {
+  e = 7;
+  for (; e >= 0; e--)
+   *d++ = a[b] & 1 << e ? '1' : '0';
+  }
+  f();
+}


Re: [PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-25 Thread Richard Earnshaw (lists)

On 24/10/2019 17:10, Richard Earnshaw (lists) wrote:

On 24/10/2019 11:16, Christophe Lyon wrote:

On 23/10/2019 15:21, Richard Earnshaw (lists) wrote:

On 23/10/2019 09:28, Christophe Lyon wrote:

On 21/10/2019 14:24, Richard Earnshaw (lists) wrote:

On 21/10/2019 12:51, Christophe Lyon wrote:

On 18/10/2019 21:48, Richard Earnshaw wrote:

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.


Thanks for this information: my validation system was designed in 
such a way that it will run the GCC testsuite after each of your 
patches, so I'll keep in mind not to report regressions (I've 
noticed several already).



I can perform a manual validation taking your 29 patches as a 
single one and compare the results with those of the revision 
preceding the one where you committed patch #1. Do you think it
would be useful?



Christophe




I think if you can filter out any that are removed by later patches 
and then report against the patch that caused the regression itself 
then that would be the best.  But I realise that would be more work 
for you, so a round-up against the combined set would be OK.


BTW, I'm aware of an issue with the compiler now generating

  reg, reg, shift 

in Thumb2; no need to report that again.

Thanks,
R.
.




Hi Richard,

The validation of the whole set shows 1 regression, which was also 
reported by the validation of r277179 (early split most DImode 
comparison operations)


When GCC is configured as:
--target arm-none-eabi
--with-mode default
--with-cpu default
--with-fpu default
(that is, no --with-mode, --with-cpu, --with-fpu option)
I'm using binutils-2.28 and newlib-3.1.0

I can see:
FAIL: g++.dg/opt/pr36449.C  -std=gnu++14 execution test
(whatever -std=gnu++XX option)


That's strange.  The assembler code generated for that test is 
unchanged from before the patch series, so I can't see how it can't 
be a problem in the test itself.  What's more, I can't seem to 
reproduce this myself.


As you have noticed, I have created PR92207 to help understand this.



Similarly, in my build the code for _Znwj, malloc, malloc_r and 
free_r is also unchanged, while the malloc_[un]lock functions are
empty stubs (not surprising as we aren't multi-threaded).


So the only thing that looks to have really changed are the linker 
offsets (some of the library code has changed, but I don't think it's 
really reached in practice, so shouldn't be relevant).




I'm executing the tests using qemu-4.1.0 -cpu arm926
The qemu traces shows that code enters main, then _Znwj (operator 
new), then _malloc_r

The qemu traces end with:


What do you mean by 'end with'?  What's the failure mode of the test? 
A crash, or the test exiting with a failure code?



qemu complains with:
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

'end with' because my automated validation builds do not keep the full 
execution traces (that would need too much disk space)




As I've said in the PR, this looks like a bug in the qemu+newlib code. 
We call sbrk(), which says OK, but then the page isn't mapped by qemu
into the process and it then faults.


So I think these changes are off the hook, it's just bad luck that they 
expose the issue at this point in time.


R.



I've closed the PR as invalid, because this is a newlib bug that is 
fixed on trunk.  https://sourceware.org/ml/newlib/2019/msg00413.html


R.


[PATCH rs6000]Fix PR92132

2019-10-25 Thread Kewen.Lin
Hi,

To support full condition reduction vectorization, we have to define
vec_cmp_* and vcond_mask_*.  This patch adds the related expands, and
also adds vector_{ungt,unge,unlt,unle} for uniform vector_*
interface support.

Regression testing just launched.

gcc/ChangeLog

2019-10-25  Kewen Lin  

PR target/92132
* config/rs6000/rs6000.md (one_cmpl<mode>3_internal): Expose name.
* config/rs6000/vector.md (fpcmpun): New code_iterator.
(vcond_mask_<mode><mode>): New expand.
(vcond_mask_<mode><VEC_int>): Likewise.
(vec_cmp<mode><mode>): Likewise.
(vec_cmpu<mode><mode>): Likewise.
(vec_cmp<mode><VEC_int>): Likewise.
(vector_{ungt,unge,unlt,unle}<mode>): Likewise.
(vector_uneq<mode>): Expose name.
(vector_ltgt<mode>): Likewise.
(vector_unordered<mode>): Likewise.
(vector_ordered<mode>): Likewise.

gcc/testsuite/ChangeLog

2019-10-25  Kewen Lin  

PR target/92132
* gcc.target/powerpc/pr92132-fp-1.c: New test.
* gcc.target/powerpc/pr92132-fp-2.c: New test.
* gcc.target/powerpc/pr92132-int-1.c: New test.
* gcc.target/powerpc/pr92132-int-2.c: New test.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d0cca1e..2a68548 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6800,7 +6800,7 @@
 (const_string "16")))])
 
 ;; 128-bit one's complement
-(define_insn_and_split "*one_cmpl<mode>3_internal"
+(define_insn_and_split "one_cmpl<mode>3_internal"
   [(set (match_operand:BOOL_128 0 "vlogical_operand" "=<BOOL_REGS_OUTPUT>")
	(not:BOOL_128
	  (match_operand:BOOL_128 1 "vlogical_operand" "<BOOL_REGS_UNARY>")))]
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 886cbad..64c3c60 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -107,6 +107,8 @@
 (smin "smin")
 (smax "smax")])
 
+(define_code_iterator fpcmpun [ungt unge unlt unle])
+
 
 ;; Vector move instructions.  Little-endian VSX loads and stores require
 ;; special handling to circumvent "element endianness."
@@ -493,6 +495,241 @@
 FAIL;
 })
 
+;; To support vector condition vectorization, define vcond_mask and vec_cmp.
+
+;; Same mode for condition true/false values and predicate operand.
+(define_expand "vcond_mask_<mode><mode>"
+  [(match_operand:VEC_I 0 "vint_operand")
+   (match_operand:VEC_I 1 "vint_operand")
+   (match_operand:VEC_I 2 "vint_operand")
+   (match_operand:VEC_I 3 "vint_operand")]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  emit_insn (gen_vector_select_<mode> (operands[0], operands[2], operands[1],
+				       operands[3]));
+  DONE;
+})
+
+;; Condition true/false values are float but predicate operand is of
+;; type integer vector with same element size.
+(define_expand "vcond_mask_<mode><VEC_int>"
+  [(match_operand:VEC_F 0 "vfloat_operand")
+   (match_operand:VEC_F 1 "vfloat_operand")
+   (match_operand:VEC_F 2 "vfloat_operand")
+   (match_operand:<VEC_INT> 3 "vint_operand")]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  emit_insn (gen_vector_select_<mode> (operands[0], operands[2], operands[1],
+				       operands[3]));
+  DONE;
+})
+
+;; For signed integer vectors comparison.
+(define_expand "vec_cmp"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+   (match_operator 1 "comparison_operator"
+ [(match_operand:VEC_I 2 "vint_operand")
+  (match_operand:VEC_I 3 "vint_operand")]))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
+{
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx tmp = gen_reg_rtx (mode);
+  switch (code)
+{
+case NE:
+  emit_insn (gen_vector_eq (operands[0], operands[2], operands[3]));
+  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
+  break;
+case EQ:
+  emit_insn (gen_vector_eq (operands[0], operands[2], operands[3]));
+  break;
+case GE:
+  emit_insn (
+   gen_vector_nlt (operands[0], operands[2], operands[3], tmp));
+  break;
+case GT:
+  emit_insn (gen_vector_gt (operands[0], operands[2], operands[3]));
+  break;
+case LE:
+  emit_insn (
+   gen_vector_ngt (operands[0], operands[2], operands[3], tmp));
+  break;
+case LT:
+  emit_insn (gen_vector_gt (operands[0], operands[3], operands[2]));
+  break;
+case GEU:
+  emit_insn (
+   gen_vector_nltu (operands[0], operands[2], operands[3], tmp));
+  break;
+case GTU:
+  emit_insn (gen_vector_gtu (operands[0], operands[2], operands[3]));
+  break;
+case LEU:
+  emit_insn (
+   gen_vector_ngtu (operands[0], operands[2], operands[3], tmp));
+  break;
+case LTU:
+  emit_insn (gen_vector_gtu (operands[0], operands[3], operands[2]));
+  break;
+default:
+  gcc_unreachable ();
+  break;
+}
+  DONE;
+})
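The expander above synthesizes the comparisons the hardware lacks from the native EQ/GT patterns: NE is the one's complement of EQ, GE and LE are the complements of GT with operands in "not-less-than"/"not-greater-than" form, and LT is GT with operands swapped. A minimal scalar C sketch of those identities (helper names invented, one lane of a -1/0 mask vector):

```c
#include <assert.h>
#include <stdint.h>

/* Native compares: all-ones (-1) for true, all-zeros for false,
   modelling one lane of a vector comparison result.  */
static int32_t mask_eq (int32_t a, int32_t b) { return a == b ? -1 : 0; }
static int32_t mask_gt (int32_t a, int32_t b) { return a > b ? -1 : 0; }

/* Derived compares, mirroring the switch in the expander.  */
static int32_t mask_ne (int32_t a, int32_t b) { return ~mask_eq (a, b); } /* NE = ~EQ */
static int32_t mask_ge (int32_t a, int32_t b) { return ~mask_gt (b, a); } /* GE = "nlt" = ~(b > a) */
static int32_t mask_le (int32_t a, int32_t b) { return ~mask_gt (a, b); } /* LE = "ngt" = ~(a > b) */
static int32_t mask_lt (int32_t a, int32_t b) { return mask_gt (b, a);  } /* LT = GT, operands swapped */
```

The unsigned cases (GEU/GTU/LEU/LTU) follow the same scheme with the unsigned GT pattern.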
+
+;; For unsigned integer vectors comparison.
+(define_expand "vec_cmpu"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+   (match_operator 1 "comparison_operator"
+ [(match_operand:VEC_I 2 "vint_operand")
+  

[committed] Fix failure in gcc.target/sve/reduc_strict_3.c

2019-10-25 Thread Richard Sandiford
Unwanted unrolling meant that we had more single-precision FADDAs
than expected.

Tested on aarch64-linux-gnu (with and without SVE) and applied as r277442.

Richard


2019-10-25  Richard Sandiford  

gcc/testsuite/
* gcc.target/aarch64/sve/reduc_strict_3.c (double_reduc1): Prevent
the loop from being unrolled.

Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c   2019-10-24 
08:29:08.0 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_3.c   2019-10-25 
10:16:36.130802245 +0100
@@ -82,6 +82,7 @@ double_reduc1 (float (*restrict i)[16])
 {
   float l = 0;
 
+#pragma GCC unroll 0
   for (int a = 0; a < 8; a++)
 for (int b = 0; b < 8; b++)
   l += i[b][a];


[committed] Update SVE tests for recent XPASSes

2019-10-25 Thread Richard Sandiford
Recent target-independent patches mean that several SVE tests
now produce the code that we'd originally wanted them to produce.
Really nice to see :-)

This patch therefore updates the expected baseline, so that hopefully
we don't regress from this point in future.

Tested on aarch64-linux-gnu (with and without SVE) and applied as r277441.

Richard


2019-10-25  Richard Sandiford  

gcc/testsuite/
* gcc.target/aarch64/sve/loop_add_5.c: Remove XFAILs for tests
that now pass.
* gcc.target/aarch64/sve/reduc_1.c: Likewise.
* gcc.target/aarch64/sve/reduc_2.c: Likewise.
* gcc.target/aarch64/sve/reduc_5.c: Likewise.
* gcc.target/aarch64/sve/reduc_8.c: Likewise.
* gcc.target/aarch64/sve/slp_13.c: Likewise.
* gcc.target/aarch64/sve/slp_5.c: Likewise.  Update expected
WHILELO counts.
* gcc.target/aarch64/sve/slp_7.c: Likewise.

Index: gcc/testsuite/gcc.target/aarch64/sve/loop_add_5.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/loop_add_5.c   2019-03-08 
18:14:29.784994721 +
+++ gcc/testsuite/gcc.target/aarch64/sve/loop_add_5.c   2019-10-25 
10:13:06.144292748 +0100
@@ -3,11 +3,11 @@
 
 #include "loop_add_4.c"
 
-/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-16\n} 1 
{ xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-15\n} 1 
{ xfail *-*-* }  } } */
+/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-16\n} 1 
} } */
+/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #-15\n} 1 
} } */
 /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #1\n} 1 } 
} */
-/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #15\n} 1 { 
xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, w[0-9]+\n} 
3 { xfail *-*-* }  } } */
+/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, #15\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.b, w[0-9]+, w[0-9]+\n} 
3 } } */
 /* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]+/z, 
\[x[0-9]+, x[0-9]+\]} 8 } } */
 /* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7]+, \[x[0-9]+, 
x[0-9]+\]} 8 } } */
 
@@ -16,11 +16,11 @@
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, #} 6 } } 
*/
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, 
z[0-9]+\.b\n} 8 } } */
 
-/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-16\n} 1 
{ xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-15\n} 1 
{ xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-16\n} 1 
} } */
+/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #-15\n} 1 
} } */
 /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #1\n} 1 } 
} */
 /* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, #15\n} 1 } 
} */
-/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, w[0-9]+\n} 
3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tindex\tz[0-9]+\.h, w[0-9]+, w[0-9]+\n} 
3 } } */
 /* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]+/z, 
\[x[0-9]+, x[0-9]+, lsl 1\]} 8 } } */
 /* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7]+, \[x[0-9]+, 
x[0-9]+, lsl 1\]} 8 } } */
 
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_1.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/reduc_1.c  2019-03-08 
18:14:29.776994751 +
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_1.c  2019-10-25 
10:13:06.144292748 +0100
@@ -105,8 +105,8 @@ #define TEST_BITWISE(T) \
 
 TEST_BITWISE (DEF_REDUC_BITWISE)
 
-/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b, 
z[0-9]+\.b\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, 
z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, p[0-7]/m, z[0-9]+\.b, 
z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, p[0-7]/m, z[0-9]+\.h, 
z[0-9]+\.h\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, 
z[0-9]+\.s\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, 
z[0-9]+\.d\n} 2 } } */
 
@@ -157,8 +157,8 @@ TEST_BITWISE (DEF_REDUC_BITWISE)
 /* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, 
z[0-9]+\.s\n} 2 } } */
 /* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, 
z[0-9]+\.d\n} 2 } } */
 
-/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 
1 } } */
-/* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 
1 } } */
+/* { dg-final { 

[PATCH][OBVIOUS] Fix typo in dump_tree_statistics.

2019-10-25 Thread Martin Liška
Hi.

I'm fixing one obvious issue in dump_tree_statistics.

I'm going to install the patch.
Martin

gcc/ChangeLog:

2019-10-25  Martin Liska  

* tree.c (dump_tree_statistics): Use sorted index 'j' and not 'i'.
---
 gcc/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/tree.c b/gcc/tree.c
index 2bee1d255ff..23fe5bffd37 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -9673,7 +9673,7 @@ dump_tree_statistics (void)
 	  {
 	unsigned j = indices[i];
 	fprintf (stderr, "%-20s %6" PRIu64 "%c %9" PRIu64 "%c\n",
-		 tree_node_kind_names[i], SIZE_AMOUNT (tree_node_counts[j]),
+		 tree_node_kind_names[j], SIZE_AMOUNT (tree_node_counts[j]),
 		 SIZE_AMOUNT (tree_node_sizes[j]));
 	total_nodes += tree_node_counts[j];
 	total_bytes += tree_node_sizes[j];

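The fix is the classic indirect-sort pitfall: once rows are visited through a sorted index array, every per-row array, the names included, must be subscripted with the permuted index 'j', not the loop counter 'i'. A tiny C illustration of the correct pairing (sample data invented):

```c
#include <assert.h>
#include <string.h>

/* Invented sample data: three node kinds and their counts.  */
static const char *kind_names[] = { "decls", "types", "exprs" };
static const unsigned counts[]  = { 30, 10, 20 };

/* Index permutation that visits counts in ascending order.  */
static const unsigned sorted_idx[] = { 1, 2, 0 };

/* Name of the kind with the RANK-th smallest count.  The bug fixed
   above is equivalent to returning kind_names[rank] here.  */
static const char *name_for_rank (unsigned rank)
{
  unsigned j = sorted_idx[rank];
  return kind_names[j];
}
```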


Re: [SVE] PR91272

2019-10-25 Thread Richard Sandiford
Hi Prathamesh,

I've just committed a patch that fixes a large number of SVE
reduction-related failures.  Could you rebase and retest on top of that?
Sorry for messing you around, but regression testing based on the state
before the patch wouldn't have been that meaningful.  In particular...

Prathamesh Kulkarni  writes:
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index a70d52eb2ca..82814e2c2af 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6428,6 +6428,7 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
> slp_tree slp_node,
>if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
>  {
>if (reduction_type != FOLD_LEFT_REDUCTION
> +   && reduction_type != EXTRACT_LAST_REDUCTION
> && !mask_by_cond_expr
> && (cond_fn == IFN_LAST
> || !direct_internal_fn_supported_p (cond_fn, vectype_in,

...after today's patch, it's instead necessary to remove:

  if (loop_vinfo
  && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
  && reduction_type == EXTRACT_LAST_REDUCTION)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "can't yet use a fully-masked loop for"
 " EXTRACT_LAST_REDUCTION.\n");
  LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
}

from vectorizable_condition.  We no longer need any changes to
vectorizable_reduction itself.

> @@ -10180,18 +10181,29 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> gimple_stmt_iterator *gsi,
>vec != { 0, ... } (masked in the MASK_LOAD,
>unmasked in the VEC_COND_EXPR).  */
>  
> -   if (loop_mask)
> +   if (masks)
>   {
> -   if (COMPARISON_CLASS_P (vec_compare))
> +   unsigned vec_num = vec_oprnds0.length ();
> +   loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> +   vectype, vec_num * j + i);

Ah... now that the two cases are merged (good!), just "if (masks)" isn't
right after all, sorry for the misleading comment.  I think this should
instead be:

  /* Force vec_compare to be an SSA_NAME rather than a comparison,
 in cases where that's necessary.  */
  if (masks || reduction_type == EXTRACT_LAST_REDUCTION)
{

Not doing that would break unmasked EXTRACT_LAST_REDUCTIONs.

Then make the existing:

  tree tmp2 = make_ssa_name (vec_cmp_type);
  gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
vec_compare, loop_mask);
  vect_finish_stmt_generation (stmt_info, g, gsi);
  vec_compare = tmp2;

conditional on "if (masks)" only, and defer the calculation of loop_mask
to this point too.

[ It would be good to spot-check that aarch64-sve.exp passes after making
  the changes to the stmt-generation part of vectorizable_condition,
  but before removing the:

LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;

  quoted above.  That would show that unmasked fold-left reductions
  still work after the changes.

  There are still some lingering cases in which we can test unmasked
  SVE loops directly, but they're becoming rarer and should eventually
  go away altogether.  So I don't think it's worth trying to construct
  an unmasked test for the testsuite. ]

> +
> +  if (!is_gimple_val (vec_compare))
> +{
> +  tree vec_compare_name = make_ssa_name (vec_cmp_type);
> +  gassign *new_stmt = gimple_build_assign (vec_compare_name,
> +   vec_compare);
> +  vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
> +  vec_compare = vec_compare_name;
> +}

Should use tab-based indentation.

Thanks,
Richard


Re: [Patch][Fortran] OpenACC – permit common blocks in some clauses

2019-10-25 Thread Thomas Schwinge
Hi Tobias!

On 2019-10-23T22:34:42+0200, Tobias Burnus  wrote:
> Updated version attached. Changes:
> * Use "true" instead of "openacc" for the OpenACC-only "copy()" clause 
> (as not shared w/ OpenMP)
> * Add some documentation to gimplify.c
> * Use GOVD_FIRSTPRIVATE also for "kernel"

Thanks!

> The patch survived bootstrapping + regtesting on my laptop (no 
> offloading) and on a build server (with nvptx offloading).

OK for trunk, with the following few small items considered.  To record
the review effort, please include "Reviewed-by: Thomas Schwinge
" in the commit log, see
.


> On 10/18/19 3:26 PM, Thomas Schwinge wrote:
>> I'll be quick to note that I don't have any first-hand experience with 
>> Fortran common blocks. :-P 
>
> To quote you from below: "So, please do study that closer. ;-P"

Haha!  ;-P (You don't want to know how long is my list of items that I
might/want/could look into...)

> I also do not have first-hand experience (as I started with Fortran 95 + 
> some of F2003), but common blocks were a nice idea of the early 1960s to 
> provide access to global memory, avoiding having to pass all data as 
> arguments (which also has stack issues). They have been replaced by derived types 
> (which also has stack issues). They have been replaced by derived types 
> and variables declared at module level since Fortran 90. See 
> https://j3-fortran.org/doc/year/18/18-007r1.pdf or 
> https://web.stanford.edu/class/me200c/tutorial_77/13_common.html
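For readers coming from C, a common block is roughly a named chunk of global storage shared by every subprogram that declares it. The analogy below (names invented) uses a file-scope struct; note that real COMMON matches members by storage sequence, not by name, so this is only a sketch:

```c
#include <assert.h>

/* Rough C analogue of "COMMON /work/ area, counter": one named global
   blob that every routine referencing the block reads and writes.  */
static struct { double area; int counter; } work_common;

static void record_area (double a)
{
  work_common.area = a;        /* like assigning to a COMMON variable */
  work_common.counter++;       /* visible to every other "subprogram" */
}
```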

..., and didn't "They have been replaced by [...]" go as far as that
they've actually been deprecated in recent Fortran standard revisions?
(I may be misremembering.)


Anyway:

> On 10/18/19 3:26 PM, Thomas Schwinge wrote:
>>> For OpenACC, gfortran already supports common blocks for 
>>> device_resident/usedevice/cache/flush/link.

>> I'll defer to your judgement there, but just one comment: I noticed 
>> that OpenACC 2.7 in 2.7. "Data Clauses" states that "For all clauses 
>> except 'deviceptr' and 'present', the list argument may include a 
>> Fortran_common block_ name enclosed within slashes, if that _common 
>> block_ name also appears in a 'declare' directive 'link' clause".
>>
>> Are we already properly handling the aspect that requires that the 
>> "that _common block_ name also appears in a 'declare' directive 'link' 
>> clause"? 
>
> I don't know neither the OpenACC spec nor the GCC implementation well 
> enough to claim proper (!) handling. However, as stated above: 
> device_resident/usedevice/cache/flush/link do support common block 
> arguments.

(... in the front end at least.)

> Looking at the testsuite, link and device_resident are tested in 
> gfortran.dg/goacc/declare-2.f95. (list.f95 and reduction.f95 also use 
> come common blocks.) – And gfortran.dg/goacc/common-block-1.f90 has been 
> added.

(..., again, that's all front end testing, so not sufficient to claim it
actually works for executing user code.)  ;-\

>> The libgomp execution test cases you're adding all state that "This test 
>> does not exercise ACC DECLARE", yet they supposedly already do work fine. Or 
>> am I understading the OpenACC specification wrongly here?
>
> You need to ask Cesar, who wrote the test case and that comment, why he 
> added it.

Well, Cesar is not working on GCC anymore, thus you've been asked to
adopt his patch, and fix it up, change it as necessary.

> The patch does not touch 'link'/'device_resident' clauses of 'declare', 
> hence, I think he didn't see a reason to add a run-time test case for 
> it.

(Or such testing didn't work, but there was no time/interest at that
point to make it work.)

> – That's independent from whether it is supported by the OpenACC 
> spec and whether it is "properly" implemented in GCC/gfortran.
>
>> I'm certainly aware of (big) deficiencies in the OpenACC 'declare' handling 
>> so I guess my question here may be whether these test cases are valid after 
>> all?
>
> Well, you are the OpenACC specialist – both spec wise and 
> GCC-implementation wise.

Sure, I do know some things, but I'm certainly not all-knowing -- that's
why I needed you to look into this in more detail.

> However, as the test cases are currently 
> parsing-only test cases, I think they should be fine.

OK, and everything else we're thus delaying for later.  That's OK -- what
we got here now is certainly an improvement on its own.  I just wanted to
make sure that we're not missing something obvious.


>>> gcc/gimplify.c: oacc_default_clause contains some changes; there are 
>>> additionally two lines which only differ for ORT_ACC – Hence, it is an 
>>> OpenACC-only change!
>>> The ME change is about privatizing common blocks (I haven't studied this 
>>> part closer.)
>> So, please do study that closer.  ;-P
>>
>> In
>> I raised some questions, got a bit of an answer, and in
>> 
>> asked further, didn't get an answer.

By the way, in the mean time 

Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval

2019-10-25 Thread Tobias Burnus

On 9/26/19 10:45 AM, Mark Eggleston wrote:
Original thread starts here 
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01185.html

OK to commit?


Like Steve, I am not really happy about adding yet another option and 
especially not about legacy features. On the other hand, I see that 
legacy code is still used.


Having said this, the patch is OK from my side.

Tobias

PS: I was also not that happy about the BOZ changes by Steve, which 
broke code here – but, fortunately, adding int( ,kind=) around it was 
sufficient and that code was supposed to be F2003 standard conforming.  
I could ping the authors and it is now fixed. Still, I wonder how much code 
broke due to that change; code is not that simple to fix. – But, in 
general, I am very much in favour of having valid Fortran 2018 code (can 
be fixed form, old and use old features, that's fine).




gcc/fortran/ChangeLog

    Mark Eggleston  

    * invoke.texi: Add -Wno-overwrite-recursive to list of options. Add
    description of -Wno-overwrite-recursive. Fix typo in description
    of -Winteger-division.
    * lang.opt: Add option -Woverwrite-recursive initialised as on.
    * option.c (gfc_post_options): Output warning only if it is enabled.

gcc/testsuite/ChangeLog

    Mark Eggleston 

    * gfortran.dg/no_overwrite_recursive_1.f90: New test.
    * gfortran.dg/no_overwrite_recursive_2.f90: New test.



Re: Remove build_{same_sized_,}truth_vector_type

2019-10-25 Thread Richard Biener
On Wed, Oct 23, 2019 at 1:13 PM Richard Sandiford
 wrote:
>
> build_same_sized_truth_vector_type was confusingly named, since for
> SVE and AVX512 the returned vector isn't the same byte size (although
> it does have the same number of elements).  What it really returns
> is the "truth" vector type for a given data vector type.
>
> The more general truth_type_for provides the same thing when passed
> a vector and IMO has a more descriptive name, so this patch replaces
> all uses of build_same_sized_truth_vector_type with that.  It does
> the same for a call to build_truth_vector_type, leaving truth_type_for
> itself as the only remaining caller.
>
> It's then more natural to pass build_truth_vector_type the original
> vector type rather than its size and nunits, especially since the
> given size isn't the size of the returned vector.  This in turn allows
> a future patch to simplify the interface of get_mask_mode.  Doing this
> also fixes a bug in which truth_type_for would pass a size of zero for
> BLKmode vector types.
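As background, the "truth" vector type is the type of an element-wise comparison result: one boolean lane (all-ones or all-zeros) per data lane. With GNU C vector extensions the idea can be sketched as:

```c
#include <assert.h>

/* A comparison of two data vectors yields a "truth" vector: same lane
   count, each lane -1 where the predicate holds and 0 where it does
   not.  Uses GNU C vector extensions.  */
typedef int v4si __attribute__ ((vector_size (16)));

static v4si truth_of_lt (v4si a, v4si b)
{
  return a < b;   /* lane i is -1 if a[i] < b[i], else 0 */
}
```

On SVE and AVX-512 the truth type is narrower than the data type (one bit or byte per lane), which is why "same_sized" was a misnomer.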
>
> Tested individually on aarch64-linux-gnu and as a series on
> x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> 2019-10-23  Richard Sandiford  
>
> gcc/
> * tree.h (build_truth_vector_type): Delete.
> (build_same_sized_truth_vector_type): Likewise.
> * tree.c (build_truth_vector_type): Rename to...
> (build_truth_vector_type_for): ...this.  Make static and take
> a vector type as argument.
> (truth_type_for): Update accordingly.
> (build_same_sized_truth_vector_type): Delete.
> * tree-vect-generic.c (expand_vector_divmod): Use truth_type_for
> instead of build_same_sized_truth_vector_type.
> * tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
> (vect_record_loop_mask, vect_get_loop_mask): Likewise.
> * tree-vect-patterns.c (build_mask_conversion): Likeise.
> * tree-vect-slp.c (vect_get_constant_vectors): Likewise.
> * tree-vect-stmts.c (vect_get_vec_def_for_operand): Likewise.
> (vect_build_gather_load_calls, vectorizable_call): Likewise.
> (scan_store_can_perm_p, vectorizable_scan_store): Likewise.
> (vectorizable_store, vectorizable_condition): Likewise.
> (get_mask_type_for_scalar_type, get_same_sized_vectype): Likewise.
> (vect_get_mask_type_for_stmt): Use truth_type_for instead of
> build_truth_vector_type.
> * config/rs6000/rs6000-call.c (fold_build_vec_cmp): Use truth_type_for
> instead of build_same_sized_truth_vector_type.
>
> gcc/c/
> * c-typeck.c (build_conditional_expr): Use truth_type_for instead
> of build_same_sized_truth_vector_type.
> (build_vec_cmp): Likewise.
>
> gcc/cp/
> * call.c (build_conditional_expr_1): Use truth_type_for instead
> of build_same_sized_truth_vector_type.
> * typeck.c (build_vec_cmp): Likewise.
>
> gcc/d/
> * d-codegen.cc (build_boolop): Use truth_type_for instead of
> build_same_sized_truth_vector_type.
>
> Index: gcc/tree.h
> ===
> --- gcc/tree.h  2019-10-23 12:07:54.505663970 +0100
> +++ gcc/tree.h  2019-10-23 12:10:58.116366179 +0100
> @@ -4438,8 +4438,6 @@ extern tree build_reference_type (tree);
>  extern tree build_vector_type_for_mode (tree, machine_mode);
>  extern tree build_vector_type (tree, poly_int64);
>  extern tree build_truth_vector_type_for_mode (poly_uint64, machine_mode);
> -extern tree build_truth_vector_type (poly_uint64, poly_uint64);
> -extern tree build_same_sized_truth_vector_type (tree vectype);
>  extern tree build_opaque_vector_type (tree, poly_int64);
>  extern tree build_index_type (tree);
>  extern tree build_array_type (tree, tree, bool = false);
> Index: gcc/tree.c
> ===
> --- gcc/tree.c  2019-10-23 12:07:54.501663998 +0100
> +++ gcc/tree.c  2019-10-23 12:10:58.116366179 +0100
> @@ -11127,11 +11127,16 @@ build_truth_vector_type_for_mode (poly_u
>return make_vector_type (bool_type, nunits, mask_mode);
>  }
>
> -/* Build truth vector with specified length and number of units.  */
> +/* Build a vector type that holds one boolean result for each element of
> +   vector type VECTYPE.  The public interface for this operation is
> +   truth_type_for.  */
>
> -tree
> -build_truth_vector_type (poly_uint64 nunits, poly_uint64 vector_size)
> +static tree
> +build_truth_vector_type_for (tree vectype)
>  {
> +  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  poly_uint64 vector_size = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
> +
>machine_mode mask_mode;
>if (targetm.vectorize.get_mask_mode (nunits,
>vector_size).exists (&mask_mode))
> @@ -11144,22 +11149,6 @@ build_truth_vector_type (poly_uint64 nun
>return make_vector_type (bool_type, nunits, BLKmode);
>  }
>
> 

Re: Add build_truth_vector_type_for_mode

2019-10-25 Thread Richard Biener
On Wed, Oct 23, 2019 at 1:10 PM Richard Sandiford
 wrote:
>
> Callers of vect_halve_mask_nunits and vect_double_mask_nunits
> already know what mode the resulting vector type should have,
> so we might as well create the vector type directly with that mode,
> just like build_vector_type_for_mode lets us build normal vectors
> with a known mode.  This avoids the current awkwardness of having
> to recompute the mode starting from vec_info::vector_size, which
> hard-codes the assumption that all vectors have to be the same size.
>
> A later patch gets rid of build_truth_vector_type and
> build_same_sized_truth_vector_type, so the net effect of the
> series is to reduce the number of type functions by one.
>
> Tested individually on aarch64-linux-gnu and as a series on
> x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> 2019-10-23  Richard Sandiford  
>
> gcc/
> * tree.h (build_truth_vector_type_for_mode): Declare.
> * tree.c (build_truth_vector_type_for_mode): New function,
> split out from...
> (build_truth_vector_type): ...here.
> (build_opaque_vector_type): Fix head comment.
> * tree-vectorizer.h (supportable_narrowing_operation): Remove
> vec_info parameter.
> (vect_halve_mask_nunits): Replace vec_info parameter with the
> mode of the new vector.
> (vect_double_mask_nunits): Likewise.
> * tree-vect-loop.c (vect_halve_mask_nunits): Likewise.
> (vect_double_mask_nunits): Likewise.
> * tree-vect-loop-manip.c: Include insn-config.h, rtl.h and recog.h.
> (vect_maybe_permute_loop_masks): Remove vinfo parameter.  Update call
> to vect_halve_mask_nunits, getting the required mode from the unpack
> patterns.
> (vect_set_loop_condition_masked): Update call accordingly.
> * tree-vect-stmts.c (supportable_narrowing_operation): Remove vec_info
> parameter and update call to vect_double_mask_nunits.
> (vectorizable_conversion): Update call accordingly.
> (simple_integer_narrowing): Likewise.  Remove vec_info parameter.
> (vectorizable_call): Update call accordingly.
> (supportable_widening_operation): Update call to
> vect_halve_mask_nunits.
>
> Index: gcc/tree.h
> ===
> --- gcc/tree.h  2019-09-21 13:56:07.519944842 +0100
> +++ gcc/tree.h  2019-10-23 12:07:54.505663970 +0100
> @@ -4437,6 +4437,7 @@ extern tree build_reference_type_for_mod
>  extern tree build_reference_type (tree);
>  extern tree build_vector_type_for_mode (tree, machine_mode);
>  extern tree build_vector_type (tree, poly_int64);
> +extern tree build_truth_vector_type_for_mode (poly_uint64, machine_mode);
>  extern tree build_truth_vector_type (poly_uint64, poly_uint64);
>  extern tree build_same_sized_truth_vector_type (tree vectype);
>  extern tree build_opaque_vector_type (tree, poly_int64);
> Index: gcc/tree.c
> ===
> --- gcc/tree.c  2019-10-20 13:58:01.679637360 +0100
> +++ gcc/tree.c  2019-10-23 12:07:54.501663998 +0100
> @@ -3,25 +3,35 @@ build_vector_type (tree innertype, poly_
>return make_vector_type (innertype, nunits, VOIDmode);
>  }
>
> +/* Build a truth vector with NUNITS units, giving it mode MASK_MODE.  */
> +
> +tree
> +build_truth_vector_type_for_mode (poly_uint64 nunits, machine_mode mask_mode)
> +{
> +  gcc_assert (mask_mode != BLKmode);
> +
> +  poly_uint64 vsize = GET_MODE_BITSIZE (mask_mode);
> +  unsigned HOST_WIDE_INT esize = vector_element_size (vsize, nunits);
> +  tree bool_type = build_nonstandard_boolean_type (esize);
> +
> +  return make_vector_type (bool_type, nunits, mask_mode);
> +}
> +
>  /* Build truth vector with specified length and number of units.  */
>
>  tree
>  build_truth_vector_type (poly_uint64 nunits, poly_uint64 vector_size)
>  {
> -  machine_mode mask_mode
> -= targetm.vectorize.get_mask_mode (nunits, vector_size).else_blk ();
> -
> -  poly_uint64 vsize;
> -  if (mask_mode == BLKmode)
> -vsize = vector_size * BITS_PER_UNIT;
> -  else
> -vsize = GET_MODE_BITSIZE (mask_mode);
> +  machine_mode mask_mode;
> +  if (targetm.vectorize.get_mask_mode (nunits,
> +  vector_size).exists (&mask_mode))
> +return build_truth_vector_type_for_mode (nunits, mask_mode);
>
> +  poly_uint64 vsize = vector_size * BITS_PER_UNIT;
>unsigned HOST_WIDE_INT esize = vector_element_size (vsize, nunits);
> -
>tree bool_type = build_nonstandard_boolean_type (esize);
>
> -  return make_vector_type (bool_type, nunits, mask_mode);
> +  return make_vector_type (bool_type, nunits, BLKmode);
>  }
>
>  /* Returns a vector type corresponding to a comparison of VECTYPE.  */
> @@ -11150,7 +11160,8 @@ build_same_sized_truth_vector_type (tree
>return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype), size);
>  }
>
> -/* Similarly, 

Re: [wwwdocs] readings.html - http://www.idris.fr/data/publications/F95/test_F95_english.html is gone

2019-10-25 Thread Tobias Burnus

On 10/25/19 8:04 AM, Gerald Pfeifer wrote:

I looked for a replacement, and there does not appear to be one, so I
remove the link.


Well, like always there is the web archive:
https://web.archive.org/web/20190419071502/http://www.idris.fr/data/publications/F95/test_F95_english.html

It does have the tar file and the README. Hence, it would be an option 
to link there. – I don't know how worthwhile the test suite is, however.


Cheers,

Tobias




Committed.

Gerald

- Log -
commit 61592c09663a83809c5115cb7dfddeb3bd606418
Author: Gerald Pfeifer 
Date:   Fri Oct 25 07:55:49 2019 +0200

 http://www.idris.fr/data/publications/F95/test_F95_english.html is gone.

diff --git a/htdocs/readings.html b/htdocs/readings.html
index 203b590..5c30391 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -435,12 +435,6 @@ names.
  contains legal and operational Fortran 77 code.


-<li>IDRIS
-<a href="http://www.idris.fr/data/publications/F95/test_F95_english.html">
-Low level bench tests of Fortran 95</a>. It tests some Fortran 95
-intrinsics.
-</li>
-
  The g77 testsuite (which is part of GCC).




Re: Pass the data vector mode to get_mask_mode

2019-10-25 Thread Richard Biener
On Thu, Oct 24, 2019 at 9:45 AM Richard Sandiford
 wrote:
>
> Bernhard Reutner-Fischer  writes:
> > On 23 October 2019 13:16:19 CEST, Richard Sandiford 
> >  wrote:
> >
> >>+++ gcc/config/gcn/gcn.c  2019-10-23 12:13:54.091122156 +0100
> >>@@ -3786,8 +3786,7 @@ gcn_expand_builtin (tree exp, rtx target
> >>a vector.  */
> >>
> >> opt_machine_mode
> >>-gcn_vectorize_get_mask_mode (poly_uint64 ARG_UNUSED (nunits),
> >>-  poly_uint64 ARG_UNUSED (length))
> >>+gcn_vectorize_get_mask_mode (nachine_mode)
> >
> > nachine?
> >
> > If that really compiles someone should fix that preexisting typo, I 
> > suppose. Didn't look though.
>
> Gah, had a nasty feeling there was some extra testing I'd forgotten to do.
>
> Thanks for spotting that.  Consider it fixed in the obvious way.

OK.

Thanks,
Richard.

> Richard


Re: Fix reductions for fully-masked loops

2019-10-25 Thread Richard Biener
On Thu, Oct 24, 2019 at 9:29 AM Richard Sandiford
 wrote:
>
> Now that vectorizable_operation vectorises most loop stmts involved
> in a reduction, it needs to be aware of reductions in fully-masked loops.
> The LOOP_VINFO_CAN_FULLY_MASK_P parts of vectorizable_reduction now only
> apply to cases that use vect_transform_reduction.
>
> This new way of doing things is definitely an improvement for SVE though,
> since it means we can lift the old restriction of not using fully-masked
> loops for reduction chains.
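The idea behind a fully-masked reduction: every lane's contribution is predicated, with inactive tail lanes contributing the operation's neutral value, so no scalar epilogue loop is needed. A scalar C model of a WHILELO-style masked sum (a sketch of the concept, not the vectorizer's output):

```c
#include <assert.h>

/* Scalar model of a fully-masked vector sum: VL lanes per iteration,
   a predicate deactivates lanes at or past N, and inactive lanes add
   the neutral element 0 (the COND_ADD "else" value).  */
static int masked_sum (const int *a, int n, int vl)
{
  int acc = 0;
  for (int base = 0; base < n; base += vl)
    for (int lane = 0; lane < vl; lane++)
      {
        int active = base + lane < n;        /* WHILELO-style predicate */
        acc += active ? a[base + lane] : 0;  /* predicated accumulate */
      }
  return acc;
}
```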

Yeah, I was wondering about that myself - I tried to change as little
as possible
in the area I'm not set up for testing.

> Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu.
> OK to install?

I see you are using STMT_VINFO_REDUC_IDX to check if the stmt is part
of a reduction.  I've committed a patch that makes this true for pattern stmts
as well.

Thus, OK.

Thanks,
Richard.

>
> Richard
>
>
> 2019-10-24  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (vectorizable_reduction): Restrict the
> LOOP_VINFO_CAN_FULLY_MASK_P handling to cases that will be
> handled by vect_transform_reduction.  Allow fully-masked loops
> to be used with reduction chains.
> * tree-vect-stmts.c (vectorizable_operation): Handle reduction
> operations in fully-masked loops.
> (vectorizable_condition): Reject EXTRACT_LAST_REDUCTION
> operations in fully-masked loops.
>
> gcc/testsuite/
> * gcc.dg/vect/pr65947-1.c: No longer expect doubled dump lines
> for FOLD_EXTRACT_LAST reductions.
> * gcc.dg/vect/pr65947-2.c: Likewise.
> * gcc.dg/vect/pr65947-3.c: Likewise.
> * gcc.dg/vect/pr65947-4.c: Likewise.
> * gcc.dg/vect/pr65947-5.c: Likewise.
> * gcc.dg/vect/pr65947-6.c: Likewise.
> * gcc.dg/vect/pr65947-9.c: Likewise.
> * gcc.dg/vect/pr65947-10.c: Likewise.
> * gcc.dg/vect/pr65947-12.c: Likewise.
> * gcc.dg/vect/pr65947-13.c: Likewise.
> * gcc.dg/vect/pr65947-14.c: Likewise.
> * gcc.dg/vect/pr80631-1.c: Likewise.
> * gcc.dg/vect/pr80631-2.c: Likewise.
> * gcc.dg/vect/vect-cond-reduc-3.c: Likewise.
> * gcc.dg/vect/vect-cond-reduc-4.c: Likewise.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-10-24 08:28:45.0 +0100
> +++ gcc/tree-vect-loop.c2019-10-24 08:29:09.177742864 +0100
> @@ -6313,38 +6313,8 @@ vectorizable_reduction (stmt_vec_info st
>else
>  vec_num = 1;
>
> -  internal_fn cond_fn = get_conditional_internal_fn (code);
> -  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> -  bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, 
> vectype_in);
> -
>vect_model_reduction_cost (stmt_info, reduc_fn, reduction_type, ncopies,
>  cost_vec);
> -  if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
> -{
> -  if (reduction_type != FOLD_LEFT_REDUCTION
> - && !mask_by_cond_expr
> - && (cond_fn == IFN_LAST
> - || !direct_internal_fn_supported_p (cond_fn, vectype_in,
> - OPTIMIZE_FOR_SPEED)))
> -   {
> - if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"can't use a fully-masked loop because no"
> -" conditional operation is available.\n");
> - LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
> -   }
> -  else if (reduc_index == -1)
> -   {
> - if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"can't use a fully-masked loop for chained"
> -" reductions.\n");
> - LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
> -   }
> -  else
> -   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
> -  vectype_in, NULL);
> -}
>if (dump_enabled_p ()
>&& reduction_type == FOLD_LEFT_REDUCTION)
>  dump_printf_loc (MSG_NOTE, vect_location,
> @@ -6361,6 +6331,27 @@ vectorizable_reduction (stmt_vec_info st
>STMT_VINFO_DEF_TYPE (stmt_info) = vect_internal_def;
>STMT_VINFO_DEF_TYPE (vect_orig_stmt (stmt_info)) = vect_internal_def;
>  }
> +  else if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
> +{
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  internal_fn cond_fn = get_conditional_internal_fn (code);
> +
> +  if (reduction_type != FOLD_LEFT_REDUCTION
> + && !use_mask_by_cond_expr_p (code, cond_fn, vectype_in)
> + && (cond_fn == IFN_LAST
> + || !direct_internal_fn_supported_p (cond_fn, vectype_in,
> + OPTIMIZE_FOR_SPEED)))
> +   {
> +   

[PATCH] Transfer STMT_VINFO_REDUC_IDX to patterns

2019-10-25 Thread Richard Biener


Reduction discovery nicely computes STMT_VINFO_REDUC_IDX for all
stmts involved in the reduction but that has been useless somewhat
since pattern recog later will wreck the info.  The following is
an attempt to fix that during pattern recog.  I may very well miss
some cases but hope to fix them...

Seeing Richard using STMT_VINFO_REDUC_IDX more I'm committing this now.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-25  Richard Biener  

* tree-vect-loop.c (vectorizable_reduction): Verify
STMT_VINFO_REDUC_IDX on the to be vectorized stmts is set up
correctly.
* tree-vect-patterns.c (vect_mark_pattern_stmts): Transfer
STMT_VINFO_REDUC_IDX from the original stmts to the pattern
stmts.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c(revision 277414)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5719,7 +5719,19 @@ vectorizable_reduction (stmt_vec_info st
   /* PHIs should not participate in patterns.  */
   gcc_assert (!STMT_VINFO_RELATED_STMT (phi_info));
   gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
-  tree reduc_def = PHI_RESULT (reduc_def_phi);
+
+  /* Verify following REDUC_IDX from the latch def leads us back to the PHI.  */
+  tree reduc_def = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
+ loop_latch_edge (loop));
+  while (reduc_def != PHI_RESULT (reduc_def_phi))
+{
+  stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
+  def = vect_stmt_to_vectorize (def);
+  gcc_assert (STMT_VINFO_REDUC_IDX (def) != -1);
+  reduc_def = gimple_op (def->stmt, 1 + STMT_VINFO_REDUC_IDX (def));
+}
+
+  reduc_def = PHI_RESULT (reduc_def_phi);
   int reduc_index = -1;
   for (i = 0; i < op_type; i++)
 {
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c(revision 277414)
+++ gcc/tree-vect-patterns.c(working copy)
@@ -5075,6 +5075,7 @@ static inline void
 vect_mark_pattern_stmts (stmt_vec_info orig_stmt_info, gimple *pattern_stmt,
  tree pattern_vectype)
 {
+  stmt_vec_info orig_stmt_info_saved = orig_stmt_info;
   gimple *def_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
 
   gimple *orig_pattern_stmt = NULL;
@@ -5134,6 +5135,57 @@ vect_mark_pattern_stmts (stmt_vec_info o
 }
   else
 vect_set_pattern_stmt (pattern_stmt, orig_stmt_info, pattern_vectype);
+
+  /* Transfer reduction path info to the pattern.  */
+  if (STMT_VINFO_REDUC_IDX (orig_stmt_info_saved) != -1)
+{
+  vec_info *vinfo = orig_stmt_info_saved->vinfo;
+  tree lookfor = gimple_op (orig_stmt_info_saved->stmt,
+   1 + STMT_VINFO_REDUC_IDX (orig_stmt_info));
+  /* Search the pattern def sequence and the main pattern stmt.  Note
+ we may have inserted all into a containing pattern def sequence
+so the following is a bit awkward.  */
+  gimple_stmt_iterator si;
+  gimple *s;
+  if (def_seq)
+   {
+ si = gsi_start (def_seq);
+ s = gsi_stmt (si);
+ gsi_next (&si);
+   }
+  else
+   {
+ si = gsi_none ();
+ s = pattern_stmt;
+   }
+  do
+   {
+ bool found = false;
+ for (unsigned i = 1; i < gimple_num_ops (s); ++i)
+   if (gimple_op (s, i) == lookfor)
+ {
+   STMT_VINFO_REDUC_IDX (vinfo->lookup_stmt (s)) = i - 1;
+   lookfor = gimple_get_lhs (s);
+   found = true;
+   break;
+ }
+ if (found && s == pattern_stmt)
+   break;
+ if (s == pattern_stmt)
+   gcc_unreachable ();
+ if (gsi_end_p (si))
+   s = pattern_stmt;
+ else
+   {
+ s = gsi_stmt (si);
+ if (s == pattern_stmt)
+   /* Found the end inside a bigger pattern def seq.  */
+   si = gsi_none ();
+ else
+   gsi_next (&si);
+   }
+   } while (1);
+}
 }
 
 /* Function vect_pattern_recog_1


Re: [PATCH, Fortran] Allow CHARACTER literals in assignments and DATA statements - for review

2019-10-25 Thread Tobias Burnus

Hello Mark, hi all,

On 10/21/19 4:40 PM, Mark Eggleston wrote:
This is an extension to support a legacy feature supported by other
compilers such as flang and the Sun compiler.  As I understand it, this
feature is associated with DEC, so it is enabled using
-fdec-char-conversions and by -fdec.


It allows character literals to be assigned to numeric (INTEGER, REAL, 
COMPLEX) and LOGICAL variables by direct assignment or in DATA 
statements.


    * arith.c (hollerith2representation): Use 
OPT_Wcharacter_truncation in

    call to gfc_warning.


This has two effects: first, it permits toggling the warning on and
off; secondly, it disables the warning by default. It is enabled by
-Wall, however. – I think that's acceptable: while Holleriths are less
transparent than normal strings, for normal strings the result is identical.




+  result->representation.string[result_len] = '\0'; /* For debugger  */


Tiny nit: full stop after 'debugger'.



+/* Convert character to integer. The constant will be padded or truncated. */


And here an extra space before '*/'.



+Allowing character literals to be used in a similar way to Hollerith constants
+is a non-standard extension.
+
+Character literals can be used in @code{DATA} statements and assignments with


I wonder whether one should mention here explicitly that only 
default-kind (i.e. kind=1) character strings are permitted. 
Additionally, I wonder whether -fdec-char-conversion should be mentioned 
here – without, it is not supported and the error message doesn't point 
to this option.




+
+  /* Flang allows character conversions similar to Hollerith conversions
+ - the first characters will be turned into ascii values. */


Is this Flang or DEC or …? I thought we talk about legacy support and 
Flang is not really legacy.




--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
  
+  if ((gfc_numeric_ts (&lhs->ts) || lhs->ts.type == BT_LOGICAL)
+  && rhs->ts.type == BT_CHARACTER
+  && rhs->expr_type != EXPR_CONSTANT)
+{
+  gfc_error ("Cannot convert %s to %s at %L", gfc_typename (rhs),
+gfc_typename (lhs), &rhs->where);
+  return false;
+}


Maybe add a comment like:
/* Happens with flag_dec_char_conversions for nonconstant strings.  */
might help casual readers to understand where this if comes from.



@@ -331,8 +332,9 @@ gfc_conv_constant_to_tree (gfc_expr * expr)
gfc_build_string_const (expr->representation.length,
expr->representation.string));
  if (!integer_zerop (tmp) && !integer_onep (tmp))
-   gfc_warning (0, "Assigning value other than 0 or 1 to LOGICAL"
-" has undefined result at %L", &expr->where);
+   gfc_warning (OPT_Wsurprising, "Assigning value other than 0 or 1 "
+"to LOGICAL has undefined result at %L",
+&expr->where);


I am not happy with this. We had odd issues with combining code 
generated by gfortran and ifort and Booleans types ("logical"). Namely, 
gfortran uses 0 and 1 – while ifort uses -1 and 0. When using ".not. 
var", it is sufficient to flip a single bit – either the first or the 
last bit – and it is sufficient to look only a single bit.


Hence, one can get ".not. var .eqv. var".

The same result one can get when assigning "-1" to logical. Hence, a 
default warning makes more sense than -Wsurprising. At least, 
-Wsurprising is enabled by default.


Hence, I wonder whether your 'OPT_Wsurprising' or 
'flag_dec_char_conversions ? OPT_Wsurprising : 0' makes more sense.



Actually, I don't quickly see whether   4_'string'  (i.e. kind=4) 
strings are rejected or not. The gfc_character2* functions all assume 
kind=1 characters – while code like gfc_convert_constant or the 
resolve.c code only looks at BT_CHARACTER.
On the other hand, the add_conv calls in intrinsic.c's 
add_conversions are only added for the default-character kind.


In any case, can you add a test which checks that – even with 
-fdec-char-conversion – assigning a 2_'string' and 4_'string' to a 
integer/real/complex/logical will be rejected at compile time?


Otherwise, it looks okay to me.

Tobias



Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-25 Thread Uros Bizjak
On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu  wrote:
>
> On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu  wrote:
> >
> > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak  wrote:
> > >
> > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu  wrote:
> > > >
> > > > Update patch:
> > > > Add m constraint to define_insn (sse4_1_round and *sse4_1_round) when under sse4 but not avx512f.
> > >
> > > It looks to me that the original insn is incompletely defined. It
> > > should use nonimmediate_operand, "m" constraint and  pointer
> > > size modifier. Something like:
> > >
> > > (define_insn "sse4_1_round"
> > >   [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > > (vec_merge:VF_128
> > >   (unspec:VF_128
> > > [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > >  (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> > > UNSPEC_ROUND)
> > >   (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > >   (const_int 1)))]
> > >   "TARGET_SSE4_1"
> > >   "@
> > >round\t{%3, %2, %0|%0, %2, %3}
> > >round\t{%3, %2, %0|%0, %2, %3}
> > >vround\t{%3, %2, %1, %0|%0, %1, %2, %3}
> > >vrndscale\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> > >
> > > >
> > > > Changelog:
> > > > gcc/
> > > > * config/i386/sse.md:  (sse4_1_round):
> > > > Change constraint x to xm
> > > > since vround support memory operand.
> > > > * (*sse4_1_round): Ditto.
> > > >
> > > > Bootstrap and regression test ok.
> > > >
> > > > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu  wrote:
> > > > >
> > > > > Hi uros:
> > > > >   This patch fixes false dependence of scalar operations
> > > > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > > >   Bootstrap ok, regression test on i386/x86 ok.
> > > > >
> > > > >   It does something like this:
> > > > > -
> > > > > For scalar instructions with both xmm operands:
> > > > >
> > > > > op %xmmN,%xmmQ,%xmmQ --> op %xmmN, %xmmN, %xmmQ
> > > > >
> > > > > for scalar instructions with one mem  or gpr operand:
> > > > >
> > > > > op mem/gpr, %xmmQ, %xmmQ
> > > > >
> > > > > --->  using pass rpad
> > > > >
> > > > > xorps %xmmN, %xmmN, %xmmN
> > > > > op mem/gpr, %xmmN, %xmmQ
> > > > >
> > > > > Performance influence of SPEC2017 fprate which is tested on SKX
> > > > >
> > > > > 503.bwaves_r -0.03%
> > > > > 507.cactuBSSN_r -0.22%
> > > > > 508.namd_r -0.02%
> > > > > 510.parest_r 0.37%
> > > > > 511.povray_r 0.74%
> > > > > 519.lbm_r 0.24%
> > > > > 521.wrf_r 2.35%
> > > > > 526.blender_r 0.71%
> > > > > 527.cam4_r 0.65%
> > > > > 538.imagick_r 0.95%
> > > > > 544.nab_r -0.37%
> > > > > 549.fotonik3d_r 0.24%
> > > > > 554.roms_r 0.90%
> > > > > fprate geomean 0.50%
> > > > > -
> > > > >
> > > > > Changelog
> > > > > gcc/
> > > > > * config/i386/i386.md (*rcpsf2_sse): Add
> > > > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > (*rsqrtsf2_sse): Ditto.
> > > > > (*sqrt2_sse): Ditto.
> > > > > (sse4_1_round2): Separate constraint vm, add
> > > > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> > > > > by pass rpad.
> > > > > (*_vmsqrt2*):
> > > > > Ditto.
> > > > > (*sse_vmrsqrtv4sf2): Ditto.
> > > > > (*avx512f_rndscale): Ditto.
> > > > > (*sse4_1_round): Ditto.
> > > > >
> > > > > gcc/testsuite
> > > > > * gcc.target/i386/pr87007-4.c: New test.
> > > > > * gcc.target/i386/pr87007-5.c: Ditto.
> > > > >
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > >
> > > (set (attr "preferred_for_speed")
> > >   (cond [(eq_attr "alternative" "1")
> > >(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > (eq_attr "alternative" "2")
> > > -  (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > +  (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > ]
> > > (symbol_ref "true")))])
> > >
> > > This can be written as:
> > >
> > > (set (attr "preferred_for_speed")
> > >   (cond [(match_test "TARGET_AVX")
> > >(symbol_ref "true")
> > > (eq_attr "alternative" "1,2")
> > >   (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > ]
> > > (symbol_ref "true")))])
> > >
> > > Uros.
> >
> > Yes, after these fixed, i'll upstream to trunk, ok?
> Update patch.

+(sqrt:
+  (match_operand: 1 "vector_operand" "xBm,")))
+  (match_operand:VF_128 2 "register_operand" "0,v")
+  (const_int 1)))]

vector_operand and Bm are needed for vector mode operands. This is in
effect scalar operand, so nonimmediate_operand and simple "xm" should
be used here.

+(define_insn "*sse_vmrsqrtv4sf2"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+(vec_merge:V4SF
+  (vec_duplicate:V4SF
+(unspec:SF [(match_operand:SF 1 "nonimmediate_operand" 

Re: RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-25 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Oct 23, 2019 at 2:12 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Wed, Oct 23, 2019 at 1:51 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> Richard Biener  writes:
>> >> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
>> >> >  wrote:
>> >> >>
>> >> >> This patch is the first of a series that tries to remove two
>> >> >> assumptions:
>> >> >>
>> >> >> (1) that all vectors involved in vectorisation must be the same size
>> >> >>
>> >> >> (2) that there is only one vector mode for a given element mode and
>> >> >> number of elements
>> >> >>
>> >> >> Relaxing (1) helps with targets that support multiple vector sizes or
>> >> >> that require the number of elements to stay the same.  E.g. if we're
>> >> >> vectorising code that operates on narrow and wide elements, and the
>> >> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
>> >> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
>> >> >> for the wide elements.
>> >> >>
>> >> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
>> >> >> fixed-length code for SVE.  It also allows unpacked/half-size SVE
>> >> >> vectors to work with -msve-vector-bits=256.
>> >> >>
>> >> >> The patch adds a new hook that targets can use to control how we
>> >> >> move from one vector mode to another.  The hook takes a starting vector
>> >> >> mode, a new element mode, and (optionally) a new number of elements.
>> >> >> The flexibility needed for (1) comes in when the number of elements
>> >> >> isn't specified.
>> >> >>
>> >> >> All callers in this patch specify the number of elements, but a later
>> >> >> vectoriser patch doesn't.  I won't be posting the vectoriser patch
>> >> >> for a few days, hence the RFC/A tag.
>> >> >>
>> >> >> Tested individually on aarch64-linux-gnu and as a series on
>> >> >> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
>> >> >> look OK?
>> >> >
>> >> > In isolation the idea looks good but maybe a bit limited?  I see
>> >> > how it works for the same-size case but if you consider x86
>> >> > where we have SSE, AVX256 and AVX512 what would it return
>> >> > for related_vector_mode (V4SImode, SImode, 0)?  Or is this
>> >> > kind of query not intended (where the component modes match
>> >> > but nunits is zero)?
>> >>
>> >> In that case we'd normally get V4SImode back.  It's an allowed
>> >> combination, but not very useful.
>> >>
>> >> > How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
>> >> > it just used to stay in the same register set for different component
>> >> > modes?
>> >>
>> >> Yeah, the idea is to use the original vector mode as essentially
>> >> a base architecture.
>> >>
>> >> The follow-on patches replace vec_info::vector_size with
>> >> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
>> >> with targetm.vectorize.autovectorize_vector_modes.  These are the
>> >> starting modes that would be passed to the hook in the nunits==0 case.
>> >>
>> >> E.g. for Advanced SIMD on AArch64, it would make more sense for
>> >> related_mode (V4HImode, SImode, 0) to be V4SImode rather than V2SImode.
>> >> I think things would work in a similar way for the x86_64 vector archs.
>> >>
>> >> For SVE we'd add both VNx16QImode (the SVE mode) and V16QImode (the
>> >> Advanced SIMD mode) to autovectorize_vector_modes, even though they
>> >> happen to be the same size for 128-bit SVE.  We can then compare
>> >> 128-bit SVE with 128-bit Advanced SIMD, with related_mode ensuring
>> >> that we consistently use all-SVE modes or all-Advanced SIMD modes
>> >> for each attempt.
>> >>
>> >> The plan for SVE is to add 4(!) modes to autovectorize_vector_modes:
>> >>
>> >> - VNx16QImode (full vector)
>> >> - VNx8QImode (half vector)
>> >> - VNx4QImode (quarter vector)
>> >> - VNx2QImode (eighth vector)
>> >>
>> >> and then pick the one with the lowest cost.  related_mode would
>> >> keep the number of units the same for nunits==0, within the limit
>> >> of the vector size.  E.g.:
>> >>
>> >> - related_mode (VNx16QImode, HImode, 0) == VNx8HImode (full vector)
>> >> - related_mode (VNx8QImode, HImode, 0) == VNx8HImode (full vector)
>> >> - related_mode (VNx4QImode, HImode, 0) == VNx4HImode (half vector)
>> >> - related_mode (VNx2QImode, HImode, 0) == VNx2HImode (quarter vector)
>> >>
>> >> and:
>> >>
>> >> - related_mode (VNx16QImode, SImode, 0) == VNx4SImode (full vector)
>> >> - related_mode (VNx8QImode, SImode, 0) == VNx4SImode (full vector)
>> >> - related_mode (VNx4QImode, SImode, 0) == VNx4SImode (full vector)
>> >> - related_mode (VNx2QImode, SImode, 0) == VNx2SImode (half vector)
>> >>
>> >> So when operating on multiple element sizes, the tradeoff is between
>> >> trying to make full use of the vector size (higher base nunits) vs.
>> >> trying to remove packs and unpacks between multiple vector copies
>> >> (lower base nunits).  The latter is useful because extending within
>> >> 

Re: PR92163

2019-10-25 Thread Richard Biener
On Wed, Oct 23, 2019 at 11:45 PM Prathamesh Kulkarni
 wrote:
>
> Hi,
> The attached patch tries to fix PR92163 by calling
> gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh cleanup.
> Does it look OK ?

Hmm.  I think it shows an issue with the return value of
remove_stmt_from_eh_lp, which is true if the LP index is -1 (externally
throwing).  We don't need to purge any edges in that case.  That is,
if-conversion should never need to do EH purging since that would be
wrong-code.

As for the segfault, can you please instead either pass down need_eh_cleanup
as a function parameter (and NULL from ifcvt) or use the return value in DSE
to set the bit in the caller.

Thanks,
Richard.

> Thanks,
> Prathamesh


Re: Type representation in CTF and DWARF

2019-10-25 Thread Richard Biener
On Fri, Oct 25, 2019 at 1:52 AM Indu Bhagat  wrote:
>
>
>
> On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
> > On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
> >>> (coreutils-0.22)
> >>>         .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> >>> ls      30616           | 1136              | 21098          | 26240               | 0.62
> >>> pwd     10734           |  788              | 10433          | 13929               | 0.83
> >>> groups  10706           |  811              | 10249          | 13378               | 0.80
> >>>
> >>> (emacs-26.3)
> >>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> >>> emacs-26.3.1 674657         | 6402              | 273963         | 273910              | 0.33
> >>>
> >>> I chose to account for 50% of .debug_str because at this point, it will be
> >>> unfair to not account for them. Actually, one could even argue that up to
> >>> 70%
> >>> of the .debug_str are names of entities. CTF section sizes do include the 
> >>> CTF
> >>> string tables.
> >>>
> >>> Across coreutils, I see a geomean of 0.73 (ratio of
> >>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> >>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> >>> footprint than CTF (with 50% of .debug_str accounted for).
> >> I'm not convinced this "improvement" in size is worth maintainig another
> >> debug-info format much less since it lacks desirable features right now
> >> and thus evaluation is tricky.
> >>
> >> At least you can improve dwarf size considerably with a low amount of work.
> >>
> >> I suspect another factor where dwarf is bigger compared to CTF is that 
> >> dwarf
> >> is recording typedef names as well as qualified type variants.  But maybe
> >> CTF just has a more compact representation for the bits it actually 
> >> implements.
> > Does CTF record automatic variables in functions, or just global variables?
> > If only the latter, it would be fair to also disable addition of local
> > variable DIEs, lexical blocks.  Does CTF record inline functions?  Again, if
> > not, it would be fair to not emit that either in .debug_info.
> > -gno-record-gcc-switches so that the compiler command line is not encoded in
> > the debug info (unless it is in CTF).
>
> CTF includes file-scope and global-scope entities. So, CTF for a function
> defined/declared at these scopes is available in .ctf section, even if it is
> inlined.
>
> To not generate DWARF for function-local entities, I made a tweak in the
> gen_decl_die API to have an early exit when TREE_CODE (DECL_CONTEXT (decl))
> is FUNCTION_DECL.
>
> @@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
> if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
>   return NULL;
>
> +  /* Do not generate info for function local decl when -gdwarf-like-ctf is
> + enabled.  */
> +  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
> +  && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
> +return NULL;
> +
> switch (TREE_CODE (decl_or_origin))
>   {
>   case ERROR_MARK:

A better place is probably in gen_subprogram_die, returning early before

  /* Output Dwarf info for all of the stuff within the body of the function
 (if it has one - it may be just a declaration).

note we also emit DIEs for [optionally also unused, if requested] function
declarations without actual definitions, I would guess CTF doesn't since
there's no symbol table entry for those.  Plus we by default prune types
that are not used.  So

struct S { int i; };
extern void foo (struct S *);
void bar()
{
  struct S s;
  foo (&s);
}

would have DIEs for S and foo in addition to that for bar.  To me it seems
those are not relevant for function entry point inspection (eventually both
S and foo have CTF info in the defining unit).  Correct?

Richard.

>
> For the numbers in the email today:
> 1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is used on
> generated binaries.
> 2. At this time, I wanted to account for .debug_str entities appropriately 
> (not
> 50% as done previously). Using a small script to count chars for
> accounting the "path-like" strings, specifically those strings that start
> with a ".", I gathered the data in column named D5.
>
> (coreutils-0.22)
>         .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
> ls      14100           | 994               | 16945          | 1328              | 26240               | 0.85
> pwd      6341           | 632               |  9311          |  596              | 13929               | 0.88
> groups   6410           | 714               |  9218          |  667              | 13378               | 0.85
> Average geomean across coreutils = 0.84
>
> 

Re: [Patch, fortran] PR91926 - assumed rank optional

2019-10-25 Thread Tobias Burnus

On 10/21/19 7:28 PM, Paul Richard Thomas wrote:

Please find attached a patch to keep 9-branch up to speed with trunk
as far as the ISO_Fortran_binding feature is concerned.

It bootstraps and regtests on 9-branch and incorporates the correction
for PR92027, which caused problems for trunk on certain platforms.

OK to commit?



OK. Thanks for the patch.

Tobias



2019-10-21  Paul Thomas

 Backport from trunk
 PR fortran/91926
 * trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Correct the
 assignment of the attribute field to account correctly for an
 assumed shape dummy. Assign separately to the gfc and cfi
 descriptors since the attribute can be different. Add branch to
 correctly handle missing optional dummies.

2019-10-21  Paul Thomas

 Backport from trunk
 PR fortran/91926
 * gfortran.dg/ISO_Fortran_binding_13.f90 : New test.
 * gfortran.dg/ISO_Fortran_binding_13.c : Additional source.
 * gfortran.dg/ISO_Fortran_binding_14.f90 : New test.


Re: RFC/A: Add a targetm.vectorize.related_mode hook

2019-10-25 Thread Richard Biener
On Wed, Oct 23, 2019 at 2:12 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Oct 23, 2019 at 1:51 PM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener  writes:
> >> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> This patch is the first of a series that tries to remove two
> >> >> assumptions:
> >> >>
> >> >> (1) that all vectors involved in vectorisation must be the same size
> >> >>
> >> >> (2) that there is only one vector mode for a given element mode and
> >> >> number of elements
> >> >>
> >> >> Relaxing (1) helps with targets that support multiple vector sizes or
> >> >> that require the number of elements to stay the same.  E.g. if we're
> >> >> vectorising code that operates on narrow and wide elements, and the
> >> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
> >> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> >> >> for the wide elements.
> >> >>
> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> >> >> fixed-length code for SVE.  It also allows unpacked/half-size SVE
> >> >> vectors to work with -msve-vector-bits=256.
> >> >>
> >> >> The patch adds a new hook that targets can use to control how we
> >> >> move from one vector mode to another.  The hook takes a starting vector
> >> >> mode, a new element mode, and (optionally) a new number of elements.
> >> >> The flexibility needed for (1) comes in when the number of elements
> >> >> isn't specified.
> >> >>
> >> >> All callers in this patch specify the number of elements, but a later
> >> >> vectoriser patch doesn't.  I won't be posting the vectoriser patch
> >> >> for a few days, hence the RFC/A tag.
> >> >>
> >> >> Tested individually on aarch64-linux-gnu and as a series on
> >> >> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
> >> >> look OK?
> >> >
> >> > In isolation the idea looks good but maybe a bit limited?  I see
> >> > how it works for the same-size case but if you consider x86
> >> > where we have SSE, AVX256 and AVX512 what would it return
> >> > for related_vector_mode (V4SImode, SImode, 0)?  Or is this
> >> > kind of query not intended (where the component modes match
> >> > but nunits is zero)?
> >>
> >> In that case we'd normally get V4SImode back.  It's an allowed
> >> combination, but not very useful.
> >>
> >> > How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
> >> > it just used to stay in the same register set for different component
> >> > modes?
> >>
> >> Yeah, the idea is to use the original vector mode as essentially
> >> a base architecture.
> >>
> >> The follow-on patches replace vec_info::vector_size with
> >> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
> >> with targetm.vectorize.autovectorize_vector_modes.  These are the
> >> starting modes that would be passed to the hook in the nunits==0 case.
> >>
> >> E.g. for Advanced SIMD on AArch64, it would make more sense for
> >> related_mode (V4HImode, SImode, 0) to be V4SImode rather than V2SImode.
> >> I think things would work in a similar way for the x86_64 vector archs.
> >>
> >> For SVE we'd add both VNx16QImode (the SVE mode) and V16QImode (the
> >> Advanced SIMD mode) to autovectorize_vector_modes, even though they
> >> happen to be the same size for 128-bit SVE.  We can then compare
> >> 128-bit SVE with 128-bit Advanced SIMD, with related_mode ensuring
> >> that we consistently use all-SVE modes or all-Advanced SIMD modes
> >> for each attempt.
> >>
> >> The plan for SVE is to add 4(!) modes to autovectorize_vector_modes:
> >>
> >> - VNx16QImode (full vector)
> >> - VNx8QImode (half vector)
> >> - VNx4QImode (quarter vector)
> >> - VNx2QImode (eighth vector)
> >>
> >> and then pick the one with the lowest cost.  related_mode would
> >> keep the number of units the same for nunits==0, within the limit
> >> of the vector size.  E.g.:
> >>
> >> - related_mode (VNx16QImode, HImode, 0) == VNx8HImode (full vector)
> >> - related_mode (VNx8QImode, HImode, 0) == VNx8HImode (full vector)
> >> - related_mode (VNx4QImode, HImode, 0) == VNx4HImode (half vector)
> >> - related_mode (VNx2QImode, HImode, 0) == VNx2HImode (quarter vector)
> >>
> >> and:
> >>
> >> - related_mode (VNx16QImode, SImode, 0) == VNx4SImode (full vector)
> >> - related_mode (VNx8QImode, SImode, 0) == VNx4SImode (full vector)
> >> - related_mode (VNx4QImode, SImode, 0) == VNx4SImode (full vector)
> >> - related_mode (VNx2QImode, SImode, 0) == VNx2SImode (half vector)
> >>
> >> So when operating on multiple element sizes, the tradeoff is between
> >> trying to make full use of the vector size (higher base nunits) vs.
> >> trying to remove packs and unpacks between multiple vector copies
> >> (lower base nunits).  The latter is useful because extending within
> >> a vector is an in-lane rather than cross-lane operation and truncating
> >> within a vector is a no-op.
> >>
> >> With a couple of 

Re: [PATCH,Fortran] Taking a BYTE out of type-spec

2019-10-25 Thread Tobias Burnus

On 10/24/19 10:43 PM, Steve Kargl wrote:

The patch moves the matching of the nonstandard type-spec
BYTE to its own matching function.  During this move, a
check for invalid matching in free-form source code was
detected (see byte_4.f90).  OK to commit?


OK with a nit.


+  if (gfc_current_form == FORM_FREE)
+   {
+ char c = gfc_peek_ascii_char ();
+ if (!gfc_is_whitespace (c) && c != ',')
+   return MATCH_NO;


You also want to permit "byte::var", hence c == ':' is also okay – you
can also add this as a variant to the test case.


Cheers,

Tobias



[wwwdocs] readings.html - http://www.idris.fr/data/publications/F95/test_F95_english.html is gone

2019-10-25 Thread Gerald Pfeifer
I looked for a replacement, and there does not appear to be one, so I
remove the link.

Committed.

Gerald

- Log -
commit 61592c09663a83809c5115cb7dfddeb3bd606418
Author: Gerald Pfeifer 
Date:   Fri Oct 25 07:55:49 2019 +0200

http://www.idris.fr/data/publications/F95/test_F95_english.html is gone.

diff --git a/htdocs/readings.html b/htdocs/readings.html
index 203b590..5c30391 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -435,12 +435,6 @@ names.
 contains legal and operational Fortran 77 code.
   
   
-    IDRIS
-    <a href="http://www.idris.fr/data/publications/F95/test_F95_english.html">
-    Low level bench tests of Fortran 95</a>. It tests some Fortran 95
-    intrinsics.
-  </li>
-  <li>
 The g77 testsuite (which is part of GCC).