Re: [PATCH v3 2/3] Add predict_doloop_p target hook

2019-05-16 Thread Segher Boessenkool
On Fri, May 17, 2019 at 03:30:41PM +1000, Kugan Vivekanandarajah wrote:
> On Fri, 17 May 2019 at 13:37,  wrote:
> > +  if (GET_CODE (body) == SET)
> > +   {
> > + rtx set_val = XEXP (body, 1);
> > + enum rtx_code code = GET_CODE (set_val);
> > + enum rtx_class cls = GET_RTX_CLASS (code);
> > + /* For now, we only consider these two RTX classes, to match what we
> > +get in doloop_optimize, excluding operations like zero/sign extend.  */
> > + if (cls == RTX_BIN_ARITH || cls == RTX_COMM_ARITH)
> > +   cost += set_src_cost (set_val, GET_MODE (set_val), speed);
> Can't you have PARALLEL with SET here?

So it should use single_set and SET_SRC?  Yeah I guess.

For Power, we don't have many PARALLELs in freshly expanded code, so it
doesn't make much difference for us.

> > +  if (cost > max_cost)
> > +return true;
> Maybe it is better to bail out early if the limit is reached instead of
> doing it outside the loop?

That won't make the code here more complicated, so yes, let's do that.
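
A minimal sketch of the early-bailout variant discussed above, using plain
integers in place of RTL insn costs.  The function name and the
array-of-costs interface are hypothetical stand-ins for the loop over the
expanded insn sequence, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cost loop: INSN_COSTS models the cost of
   each relevant insn.  Return true as soon as the running total exceeds
   MAX_COST instead of comparing once after the loop.  */
bool
cost_exceeds_limit (const unsigned *insn_costs, size_t n, unsigned max_cost)
{
  assert (insn_costs != NULL || n == 0);

  unsigned cost = 0;
  for (size_t i = 0; i < n; i++)
    {
      cost += insn_costs[i];
      if (cost > max_cost)
        return true;    /* Bail out early; later insns cannot lower it.  */
    }
  return false;
}
```

The trade-off raised later in the thread still applies: the comparison now
runs once per insn, which only pays off when sequences often exceed the limit.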


Segher


Re: [PATCH v3 2/3] Add predict_doloop_p target hook

2019-05-16 Thread Segher Boessenkool
Hi Kewen,

On Thu, May 16, 2019 at 10:35:30PM -0500, li...@linux.ibm.com wrote:
> 2) For the other part of target invalid stmt check, as the
> hook invalid_within_doloop grep data shows, not all targets
> need to check whether invalid instructions exist in doloop.
> If we scan all stmts as generic, it can waste time for those
> targets which don't need to check.

So make the default version of the hook NULL, and only run the hook if
non-null?  There are many examples of this.
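
A rough sketch of that pattern, assuming a hook table with a nullable
function pointer.  The struct and function names below are hypothetical
illustrations and do not correspond to GCC's real targetm layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loop; not GCC's struct loop.  */
struct loop_info { int num_calls; };

typedef bool (*invalid_stmt_hook) (const struct loop_info *);

/* Hypothetical hook table: a NULL entry means the target has no check.  */
struct target_hooks { invalid_stmt_hook invalid_within_loop_p; };

/* Example target hook: treat any call inside the loop as invalid.  */
bool
reject_calls (const struct loop_info *loop)
{
  return loop->num_calls > 0;
}

/* Generic side: pay for the scan only when a target installed a hook.  */
bool
loop_ok_for_doloop_p (const struct target_hooks *hooks,
                      const struct loop_info *loop)
{
  assert (hooks != NULL && loop != NULL);
  if (hooks->invalid_within_loop_p == NULL)
    return true;
  return !hooks->invalid_within_loop_p (loop);
}
```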

>  Besides, the scope of
> the current check on SWITCH in rs6000 hook is wide, later
> if we want it more exact, we may need to check more stmts
> instead of single.  To let target hook scan the BBs/stmts
> by itself is also more flexible.

If we'll need that flexibility, okay.

> +static bool
> +invalid_insn_for_doloop_p (struct loop *loop)
> +{
> +  basic_block *body = get_loop_body (loop);
> +  gimple_stmt_iterator gsi;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
> +  {
> + gimple *stmt = gsi_stmt (gsi);
> + if (is_gimple_call (stmt) && !gimple_call_internal_p (stmt)
> + && !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
> +   {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file,
> +"predict doloop failure due to finding call.\n");

Should this really be for -all dumps only?  "X failed because Y" is often
very interesting info -- and it is not much output.

(Please start the line with a capital if you end it with a period :-) )

> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "predict doloop failure due to"
> + "no innermost.\n");

If you paste strings (which is fine for debug output), you still need a
space between words ;-)
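
A small illustration of the problem: adjacent string literals are
concatenated with nothing in between, so the quoted message reads
"due tono innermost".  The helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The message as written in the patch: no space at the join.  */
const char *
broken_message (void)
{
  return "predict doloop failure due to"
         "no innermost.\n";
}

/* With a trailing space before the join (and a leading capital).  */
const char *
fixed_message (void)
{
  return "Predict doloop failure due to "
         "no innermost.\n";
}

/* Return nonzero if MSG contains the run-together text "tono".  */
int
has_run_together_words (const char *msg)
{
  assert (msg != NULL);
  return strstr (msg, "tono") != NULL;
}
```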

> +@deftypefn {Target Hook} bool TARGET_PREDICT_DOLOOP_P (struct loop *@var{loop})
> +Return true if we can predict it is possible to use low-overhead loops
> +for a particular loop.  The parameter @var{loop} is a pointer to the loop

"... use a low-overhead loop ..."

> +which is going to be checked.  This target hook is required only when the

Just remove the whole "which is going to be checked" part?

> +target supports low-overhead loops, and will help some earlier middle-end
> +passes to make some decisions.

Is it *required* when the target has doloops?  And what will happen if you
do not define this hook, either if you have doloops or if you don't?

Hook documentation often ends "The default version of this hook returns..."
which neatly answers all this.

> +   /* For now, we only consider these two RTX classes, to match what we
> +  get in doloop_optimize, excluding operations like zero/sign extend.  */

The indentation is broken here.

> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "predict doloop failure due to"
> + "target specific checks.\n");

Missing space as well (and more later, please check all).


Segher


Re: [PATCH][PR90106] Builtin call transformation changes in cdce pass

2019-05-16 Thread Jakub Jelinek
On Fri, May 17, 2019 at 02:24:22PM +0800, JunMa wrote:
> 2019-05-17  Jun Ma 

Two spaces before < rather than one.

>     PR tree-optimization/90106
>     * gcc.dg/cdce3.c: New test.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/cdce3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do  compile } */

Just use one space instead of two.

> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized  -lm" } */

For compile time test, no need to add "  -lm" (well, no need to add it even
for link/run tests).

> +/* { dg-final { scan-tree-dump "cdce3.c:10: .* function call is shrink-wrapped into error conditions\." "cdce" } } */

Please use \[^\n\r]* instead of .*, you don't want newlines matched in
there.
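
To illustrate the difference, here is a sketch using POSIX regcomp as a
stand-in.  The testsuite itself uses Tcl regular expressions, which differ
in details, so treat this only as an analogy: in both, `.` can match a
newline by default.  The helper name and sample dump text are invented.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if TEXT contains a match for the POSIX extended regex
   PATTERN.  REG_NEWLINE is deliberately not set, so "." and ".*" are
   allowed to match across line breaks.  */
bool
dump_matches (const char *pattern, const char *text)
{
  assert (pattern != NULL && text != NULL);

  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With a two-line dump where the location and the message sit on different
lines, the `.*` form matches across the line break while the `[^\n\r]*`
form does not.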

> +/* { dg-final { scan-tree-dump "sqrtf \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> +
> +#include <math.h>

Wouldn't it be better to just declare it yourself:
float sqrtf (float);
?
You really don't know what the target math.h includes.

> +
> +float foo ( float x )
> +{
> +  return sqrtf( x );
> +}
> +
> -- 
> 1.8.3.1
> 

Jakub


Re: [PATCH][PR90106] Builtin call transformation changes in cdce pass

2019-05-16 Thread JunMa

On 2019/5/17 at 11:09 AM, JunMa wrote:

On 2019/5/17 at 6:04 AM, Jakub Jelinek wrote:

On Thu, May 16, 2019 at 11:39:38PM +0200, Jakub Jelinek wrote:

One possibility is to add -fdump-tree-optimized and scan for
/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
resp.
/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */

Here it is in patch form.

That said, I'm not convinced your patch does what you wanted, because
comparing a month old trunk with today's trunk generates the same assembly
except for .ident, generates as many [tail call] lines in *.optimized dump
as before, emits the same number of jmp\tpow and jmp\tlog instructions as
before (one in a separate routine).



Thanks for pointing out the mistake and fixing it.

For these two tests, the cdce pass doesn't transform the builtin math
functions in foo with or without the patch because they cannot use
internal functions.

I'll add another testcase to verify the patch.



Here is the new testcase.

The sqrtf function call remains a tail call with the patch
but not without it. For more details, you can read
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

Regards
Jun


gcc/testsuite/ChangeLog

2019-05-17  Jun Ma 

    PR tree-optimization/90106
    * gcc.dg/cdce3.c: New test.




Regards
Jun



But at least the tests aren't UNSUPPORTED anymore.

2019-05-16  Jakub Jelinek  

PR tree-optimization/90106
* gcc.dg/cdce1.c: Don't scan-assembler, instead -fdump-tree-optimized
and scan-tree-dump for tail call.
* gcc.dg/cdce2.c: Likewise.

--- gcc/testsuite/gcc.dg/cdce1.c.jj    2019-05-16 11:28:22.750177582 +0200
+++ gcc/testsuite/gcc.dg/cdce1.c    2019-05-16 23:50:23.618450891 +0200
@@ -1,9 +1,9 @@
-/* { dg-do  run  } */
-/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -lm" } */
+/* { dg-do run } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
 /* { dg-require-effective-target int32plus } */
-/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
-/* { dg-final { scan-assembler "jmp pow" } } */
 /* { dg-require-effective-target large_double } */
+/* { dg-final { scan-tree-dump "cdce1.c:17: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
 
 #include 
 #include 
--- gcc/testsuite/gcc.dg/cdce2.c.jj    2019-05-16 11:28:22.781177075 +0200
+++ gcc/testsuite/gcc.dg/cdce2.c    2019-05-16 23:50:58.505880845 +0200
@@ -1,8 +1,8 @@
-/* { dg-do  run  } */
+/* { dg-do run } */
 /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
-/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -lm" } */
-/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
-/* { dg-final { scan-assembler "jmp log" } } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
+/* { dg-final { scan-tree-dump "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
 
 #include 
 #include 


Jakub




---
 gcc/testsuite/gcc.dg/cdce3.c | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/cdce3.c

diff --git a/gcc/testsuite/gcc.dg/cdce3.c b/gcc/testsuite/gcc.dg/cdce3.c
new file mode 100644
index 000..0062c4f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cdce3.c
@@ -0,0 +1,12 @@
+/* { dg-do  compile } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized  -lm" } */
+/* { dg-final { scan-tree-dump "cdce3.c:10: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "sqrtf \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
+
+#include <math.h>
+
+float foo ( float x )
+{
+  return sqrtf( x );
+}
+
-- 
1.8.3.1



Re: [PATCH] i386: Enable MMX intrinsics without SSE/SSE2/SSSE3

2019-05-16 Thread Uros Bizjak
On Thu, May 16, 2019 at 11:59 PM H.J. Lu  wrote:
>
> Since MMX intrinsics are marked with SSE/SSE2/SSSE3 for SSE emulation,
> enable them without SSE/SSE2/SSSE3 if MMX is enabled.
>
> Restore TARGET_3DNOW check, which was changed to TARGET_3DNOW_A by
> revision 271235.
>
> gcc/
>
> PR target/90497
> * config/i386/i386-expand.c (ix86_expand_builtin): Enable MMX
> intrinsics without SSE/SSE2/SSSE3.
> * config/i386/mmx.md (mmx_uavgv8qi3): Restore TARGET_3DNOW
> check.
> (*mmx_uavgv8qi3): Likewise.

OK with a small nit.

Thanks,
Uros.

>
> gcc/testsuite/
>
> PR target/90497
> * gcc.target/i386/pr90497-1.c: New test.
> * gcc.target/i386/pr90497-2.c: Likewise.
> ---
>  gcc/config/i386/i386-expand.c |  6 --
>  gcc/config/i386/mmx.md|  4 ++--
>  gcc/testsuite/gcc.target/i386/pr90497-1.c | 12 
>  gcc/testsuite/gcc.target/i386/pr90497-2.c | 11 +++
>  4 files changed, 29 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90497-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90497-2.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index df035607fa7..35aadefdef3 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -10937,8 +10937,10 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
>&& (isa & (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4)) != 0)
>  isa |= (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4);
>/* Use SSE/SSE2/SSSE3 to emulate MMX intrinsics in 64-bit mode when
> - MMX is disabled.  */
> -  if (TARGET_MMX_WITH_SSE)
> + MMX is disabled.  NB: Since MMX intrinsics are marked with
> + SSE/SSE2/SSSE3, enable them without SSE/SSE2/SSSE3 if MMX is
> + enabled.  */
> +  if (TARGET_MMX_WITH_SSE || TARGET_MMX)

Please use "TARGET_MMX || TARGET_MMX_WITH_SSE", as is the case with
all these conditions.

>  {
>if (((bisa & (OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_MMX))
>== (OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_MMX))
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 29bcf931836..adad950fa04 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1745,7 +1745,7 @@
>   (const_int 1) (const_int 1)]))
> (const_int 1]
>"(TARGET_MMX || TARGET_MMX_WITH_SSE)
> -   && (TARGET_SSE || TARGET_3DNOW_A)"
> +   && (TARGET_SSE || TARGET_3DNOW)"
>"ix86_fixup_binary_operands_no_copy (PLUS, V8QImode, operands);")
>
>  (define_insn "*mmx_uavgv8qi3"
> @@ -1764,7 +1764,7 @@
>   (const_int 1) (const_int 1)]))
> (const_int 1]
>"(TARGET_MMX || TARGET_MMX_WITH_SSE)
> -   && (TARGET_SSE || TARGET_3DNOW_A)
> +   && (TARGET_SSE || TARGET_3DNOW)
> && ix86_binary_operator_ok (PLUS, V8QImode, operands)"
>  {
>/* These two instructions have the same operation, but their encoding
> diff --git a/gcc/testsuite/gcc.target/i386/pr90497-1.c b/gcc/testsuite/gcc.target/i386/pr90497-1.c
> new file mode 100644
> index 000..ed6ded7efbc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90497-1.c
> @@ -0,0 +1,12 @@
> +/* PR target/90497 */
> +/* { dg-do compile } */
> +/* { dg-options "-mno-sse -mmmx" { target ia32 } } */
> +/* { dg-options "-mno-mmx" { target { ! ia32 } } } */
> +
> +typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> +
> +__v8qi
> +foo (__v8qi x, __v8qi y)
> +{
> +  return __builtin_ia32_pcmpeqb (x, y);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr90497-2.c b/gcc/testsuite/gcc.target/i386/pr90497-2.c
> new file mode 100644
> index 000..99ee5756b76
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90497-2.c
> @@ -0,0 +1,11 @@
> +/* PR target/90497 */
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-mno-sse -m3dnow" } */
> +
> +typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> +
> +__v8qi
> +foo (__v8qi x, __v8qi y)
> +{
> +  return __builtin_ia32_pavgusb (x, y);
> +}
> --
> 2.20.1
>


Re: [PATCH v3 2/3] Add predict_doloop_p target hook

2019-05-16 Thread Kewen.Lin
on 2019/5/17 1:30 PM, Kugan Vivekanandarajah wrote:
> Hi,
> 
> On Fri, 17 May 2019 at 13:37,  wrote:
>>
>> From: Kewen Lin 
>>
>> +/* Check whether number of iteration computation is too costly for doloop
>> +   transformation.  It expands the gimple sequence to equivalent RTL insn
>> +   sequence, then evaluate the cost.
>> +
>> +   Return true if it's costly, otherwise return false.  */
>> +
>> +static bool
>> +costly_iter_for_doloop_p (struct loop *loop, tree niters)
>> +{
>> +  tree type = TREE_TYPE (niters);
>> +  unsigned cost = 0;
>> +  bool speed = optimize_loop_for_speed_p (loop);
>> +  int regno = LAST_VIRTUAL_REGISTER + 1;
>> +  walk_tree (&niters, prepare_decl_rtl, &regno, NULL);
>> +  start_sequence ();
>> +  expand_expr (niters, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL);
>> +  rtx_insn *seq = get_insns ();
>> +  end_sequence ();
>> +
>> +  for (; seq; seq = NEXT_INSN (seq))
>> +{
>> +  if (!INSN_P (seq))
>> +   continue;
>> +  rtx body = PATTERN (seq);
>> +  if (GET_CODE (body) == SET)
>> +   {
>> + rtx set_val = XEXP (body, 1);
>> + enum rtx_code code = GET_CODE (set_val);
>> + enum rtx_class cls = GET_RTX_CLASS (code);
>> + /* For now, we only consider these two RTX classes, to match what we
>> +get in doloop_optimize, excluding operations like zero/sign extend.  */
>> + if (cls == RTX_BIN_ARITH || cls == RTX_COMM_ARITH)
>> +   cost += set_src_cost (set_val, GET_MODE (set_val), speed);
> Can't you have PARALLEL with SET here?
> 

Thanks for catching that; I've updated it to use single_set to handle PARALLEL.

-  if (!INSN_P (seq))
-   continue;
-  rtx body = PATTERN (seq);
-  if (GET_CODE (body) == SET)
+  rtx set = single_set (seq);
+  if (set != NULL_RTX)
{
- rtx set_val = XEXP (body, 1);
+ rtx set_val = XEXP (set, 1);


>> +   }
>> +}
>> +  unsigned max_cost
>> += COSTS_N_INSNS (PARAM_VALUE (PARAM_MAX_ITERATIONS_COMPUTATION_COST));
>> +  if (cost > max_cost)
>> +return true;
> Maybe it is better to bail out early if the limit is reached instead of
> doing it outside the loop?
> 

Good point.  In the cases I've checked so far, most costs are below the
maximum, so it looks like most cases won't return early.  Checking on every
iteration seems somewhat inefficient.  Does that make sense?
We'd have to collect some statistics to be sure. :)


Thanks,
Kewen

> Thanks,
> Kugan
> 
>> +
>> +  return false;
>> +}
>> +
>> +/* Predict whether the given loop will be transformed in the RTL
>> +   doloop_optimize pass.  Attempt to duplicate as many doloop_optimize checks
>> +   as possible.  This is only for target independent checks, see
>> +   targetm.predict_doloop_p for the target dependent ones.
>> +
>> +   Some RTL specific checks seems unable to be checked in gimple, if any new
>> +   checks or easy checks _are_ missing here, please add them.  */
>> +
>> +static bool
>> +generic_predict_doloop_p (struct ivopts_data *data)
>> +{
>> +  struct loop *loop = data->current_loop;
>> +
>> +  /* Call target hook for target dependent checks.  */
>> +  if (!targetm.predict_doloop_p (loop))
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +   fprintf (dump_file, "predict doloop failure due to"
>> +   "target specific checks.\n");
>> +  return false;
>> +}
>> +
>> +  /* Similar to doloop_optimize, check iteration description to know it's
>> + suitable or not.  */
>> +  edge exit = loop_latch_edge (loop);
>> +  struct tree_niter_desc *niter_desc = niter_for_exit (data, exit);
>> +  if (niter_desc == NULL)
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +   fprintf (dump_file, "predict doloop failure due to"
>> +   "unexpected niters.\n");
>> +  return false;
>> +}
>> +
>> +  /* Similar to doloop_optimize, check whether iteration count too small
>> + and not profitable.  */
>> +  HOST_WIDE_INT est_niter = get_estimated_loop_iterations_int (loop);
>> +  if (est_niter == -1)
>> +est_niter = get_likely_max_loop_iterations_int (loop);
>> +  if (est_niter >= 0 && est_niter < 3)
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +   fprintf (dump_file,
>> +"predict doloop failure due to"
>> +"too few iterations (%u).\n",
>> +(unsigned int) est_niter);
>> +  return false;
>> +}
>> +
>> +  /* Similar to doloop_optimize, check whether number of iterations too costly
>> + to compute.  */
>> +  if (costly_iter_for_doloop_p (loop, niter_desc->niter))
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +   fprintf (dump_file, "predict doloop failure due to"
>> +   "costly niter computation.\n");
>> +  return false;
>> +}
>> +
>> +  return true;
>> +}
>> +
>>  /* Determines cost of the computation of EXPR.  */
>>
>>  static unsigned
>> --

Go patch committed: Make value method of direct iface take pointer

2019-05-16 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang makes value methods of
direct interface types take a pointer argument.

Currently, a value method of a direct interface type takes the value
of the receiver, which is pointer shaped, as the first parameter.
When this method is called through an interface, we actually pass the
interface data as a pointer.  On most platforms this is OK, as the
underlying calling convention is the same, but on SPARC32 the
calling convention is actually different.

This patch changes the method function so that it actually takes a
pointer.  The function will convert the pointer to the pointer-shaped
receiver type (a no-op conversion from the machine code's point of view).
For a direct call, in the caller we convert the receiver to a pointer
(also a no-op conversion) before invoking the method.  For an interface
call, we pass the pointer as before.  This way, we consistently
pass a pointer.
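
A loose C analogy of the scheme, assuming a platform where a struct
holding one pointer has the same size and representation as a plain
pointer.  All names are hypothetical; this is not how the Go frontend
spells any of it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "pointer-shaped" receiver type: a struct wrapping exactly
   one pointer.  */
struct boxed { int *p; };

_Static_assert (sizeof (struct boxed) == sizeof (void *),
                "sketch assumes a pointer-shaped struct");

/* The method always takes a plain pointer and converts it back to the
   receiver type; the copy is a bit-for-bit reinterpretation.  */
int
boxed_get (void *recv)
{
  struct boxed b;
  memcpy (&b, &recv, sizeof b);
  assert (b.p != NULL);
  return *b.p;
}

/* Direct call: the caller converts the receiver to a pointer first.  */
int
call_direct (struct boxed b)
{
  void *recv;
  memcpy (&recv, &b, sizeof recv);
  return boxed_get (recv);
}

/* Interface call: the interface data word already is that pointer.  */
int
call_via_interface (void *iface_data)
{
  return boxed_get (iface_data);
}
```

Both call paths hand the method the same kind of argument, so an ABI that
passes a one-pointer struct differently from a pointer no longer matters.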

Hopefully this fixes the SPARC32 build and https://gcc.gnu.org/PR90482.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 271308)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-f8a3668cbcfa3f8cd6c26c62bce416714cd401fc
+b5ab7b419d6328f5126ba8d6795280129eaf6e79
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 271308)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -11263,6 +11263,16 @@ Call_expression::do_get_backend(Translat
   else
 has_closure_arg = true;
 
+  Expression* first_arg = NULL;
+  if (!is_interface_method && fntype->is_method())
+{
+  first_arg = this->args_->front();
+  if (first_arg->type()->points_to() == NULL
+  && first_arg->type()->is_direct_iface_type())
+first_arg = Expression::unpack_direct_iface(first_arg,
+first_arg->location());
+}
+
   int nargs;
   std::vector fn_args;
   if (this->args_ == NULL || this->args_->empty())
@@ -11279,7 +11289,7 @@ Call_expression::do_get_backend(Translat
&& this->args_->size() == 1);
   nargs = 1;
   fn_args.resize(1);
-  fn_args[0] = this->args_->front()->get_backend(context);
+  fn_args[0] = first_arg->get_backend(context);
 }
   else
 {
@@ -11294,7 +11304,7 @@ Call_expression::do_get_backend(Translat
   Expression_list::const_iterator pe = this->args_->begin();
   if (!is_interface_method && fntype->is_method())
{
-  fn_args[i] = (*pe)->get_backend(context);
+  fn_args[i] = first_arg->get_backend(context);
  ++pe;
  ++i;
}
Index: gcc/go/gofrontend/expressions.h
===
--- gcc/go/gofrontend/expressions.h (revision 271303)
+++ gcc/go/gofrontend/expressions.h (working copy)
@@ -1063,6 +1063,11 @@ class Expression
   static Expression*
   pack_direct_iface(Type*, Expression*, Location);
 
+  // Return an expression of the underlying pointer for a direct interface
+  // type (the opposite of pack_direct_iface).
+  static Expression*
+  unpack_direct_iface(Expression*, Location);
+
   // Dump an expression to a dump constext.
   void
   dump_expression(Ast_dump_context*) const;
@@ -1231,9 +1236,6 @@ class Expression
   }
 
   static Expression*
-  unpack_direct_iface(Expression*, Location);
-
-  static Expression*
   get_interface_type_descriptor(Expression*);
 
   static Expression*
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 271303)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -6052,9 +6052,10 @@ Function::build(Gogo* gogo, Named_object
 
  // We always pass the receiver to a method as a pointer.  If
  // the receiver is declared as a non-pointer type, then we
- // copy the value into a local variable.
+ // copy the value into a local variable.  For direct interface
+  // type we pack the pointer into the type.
  if ((*p)->var_value()->is_receiver()
- && !(*p)->var_value()->type()->is_direct_iface_type())
+  && (*p)->var_value()->type()->points_to() == NULL)
{
  std::string name = (*p)->name() + ".pointer";
  Type* var_type = (*p)->var_value()->type();
@@ -6066,14 +6067,19 @@ Function::build(Gogo* gogo, Named_object
   parm_bvar = parm_no->get_backend_variable(gogo, named_function);
 
   vars.push_back(bvar);
- Expression* parm_ref =
+
+  Expression* parm_ref =
   Expression::make_var_reference(parm_no, 

Strengthen aliasing_component_refs_p

2019-05-16 Thread Jan Hubicka
Hi,
this patch cuts walks in aliasing_component_refs_p if the type we look for
cannot fit into a given type, by comparing their sizes. Similar logic
already exists in indirect_ref_may_alias_decl_p.

When we walk reference a.b.c.d.e looking for type x we only need to do
it if sizeof(a)>=sizeof(x) and continue walking from e until
sizeof(e)<=sizeof(x). We do not need to compare types where sizes are
known to be different.

This saves some work (by not walking refs and not comparing their types
if they cannot match) but also increases the number of disambiguations
quite noticeably. This is because same_type_for_tbaa often returns -1 and
makes aliasing_component_refs_p give up.  Using the simple size check
prevents hitting such problematic type pairs in many common cases.
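
A simplified model of the size-based pruning, assuming every type has a
known constant size and that sizes never shrink when moving from the
innermost component outward.  The real code also has to handle arrays,
poly-int sizes and the "cannot decide" result; the struct and function
names are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one step in an access path such as a.b.c.d.e.  */
struct component { int type_id; unsigned size; };

/* PATH is ordered from the innermost component (e) to the base (a).
   Look for a component of type TARGET_ID with size TARGET_SIZE, counting
   in *COMPARES how many type comparisons were actually needed.  Return
   its index, or -1 if no component can match.  */
int
find_type_in_path (const struct component *path, size_t n,
                   int target_id, unsigned target_size, unsigned *compares)
{
  assert (path != NULL || n == 0);
  assert (compares != NULL);
  *compares = 0;

  /* If even the base is too small, the target cannot occur anywhere.  */
  if (n == 0 || path[n - 1].size < target_size)
    return -1;

  for (size_t i = 0; i < n; i++)
    {
      if (path[i].size > target_size)
        break;          /* Enclosing types only get larger: stop.  */
      if (path[i].size < target_size)
        continue;       /* Sizes differ: skip the type comparison.  */
      ++*compares;
      if (path[i].type_id == target_id)
        return (int) i;
    }
  return -1;
}
```

Only components whose size equals the target's reach the comparison, which
is where the real same_type_for_tbaa may otherwise give up.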

Stats on the tramp3d LTO build change from:

Alias oracle query stats:
  refs_may_alias_p: 0 disambiguations, 0 queries
  ref_maybe_used_by_call_p: 6451 disambiguations, 25578 queries
  call_may_clobber_ref_p: 817 disambiguations, 817 queries
  aliasing_component_ref_p: 14 disambiguations, 12528 queries
  TBAA oracle: 1468347 disambiguations 3010562 queries
   550690 are in alias set 0
   614235 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   260937 are dependent in the DAG
   116353 are aritificially in conflict with void *

to:

Alias oracle query stats:
  refs_may_alias_p: 0 disambiguations, 0 queries
  ref_maybe_used_by_call_p: 6451 disambiguations, 25580 queries
  call_may_clobber_ref_p: 817 disambiguations, 817 queries
  aliasing_component_ref_p: 118 disambiguations, 12552 queries
  TBAA oracle: 1468413 disambiguations 3010714 queries
   550719 are in alias set 0
   614247 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   260970 are dependent in the DAG
   116365 are aritificially in conflict with void *

So disambiguations are up from 14 to 118 which is still quite low.

A follow-up patch that makes same_type_for_tbaa not give up for
TYPE_STRUCTURAL_EQUALITY pointers and arrays improves the hit rate to 2723.

Bootstrapped/regtested x86_64-linux, OK?

* tree-ssa-alias.c (type_big_enough_for_type_p): New function.
(aliasing_component_refs_p): Use it.
Index: tree-ssa-alias.c
===
--- tree-ssa-alias.c(revision 271292)
+++ tree-ssa-alias.c(working copy)
@@ -735,6 +735,27 @@ ao_ref_init_from_ptr_and_size (ao_ref *r
   ref->volatile_p = false;
 }
 
+/* Return true if TYPE1 may contain TYPE2 by its size.  */
+
+static bool
+type_big_enough_for_type_p (tree type1, tree type2)
+{
+  if (!TYPE_SIZE (type1) || !TYPE_SIZE (type2))
+return true;
+  /* Be conservative for arrays and vectors.  We want to support partial
+ overlap on int[3] and int[3] as tested in gcc.dg/torture/alias-2.c.  */
+  while (TREE_CODE (type2) == ARRAY_TYPE
+|| TREE_CODE (type2) == VECTOR_TYPE)
+type2 = TREE_TYPE (type2);
+  if (!poly_int_tree_p (TYPE_SIZE (type1))
+  || !poly_int_tree_p (TYPE_SIZE (type2)))
+return true;
+  if (known_lt (wi::to_poly_widest (TYPE_SIZE (type1)),
+   wi::to_poly_widest (TYPE_SIZE (type2))))
+return false;
+  return true;
+}
+
 /* Return 1 if TYPE1 and TYPE2 are to be considered equivalent for the
purpose of TBAA.  Return 0 if they are distinct and -1 if we cannot
decide.  */
@@ -803,7 +824,7 @@ aliasing_component_refs_p (tree ref1,
   tree base1, base2;
   tree type1, type2;
   tree *refp;
-  int same_p, same_p2;
+  int same_p1 = 0, same_p2 = 0;
 
   /* Choose bases and base types to search for.  */
   base1 = ref1;
@@ -816,65 +837,88 @@ aliasing_component_refs_p (tree ref1,
   type2 = TREE_TYPE (base2);
 
   /* Now search for the type1 in the access path of ref2.  This
- would be a common base for doing offset based disambiguation on.  */
-  refp = &ref2;
-  while (handled_component_p (*refp)
-&& same_type_for_tbaa (TREE_TYPE (*refp), type1) == 0)
-refp = &TREE_OPERAND (*refp, 0);
-  same_p = same_type_for_tbaa (TREE_TYPE (*refp), type1);
-  if (same_p == 1)
+ would be a common base for doing offset based disambiguation on.
+ This however only makes sense if type2 is big enough to hold type1.  */
+  if (type_big_enough_for_type_p (type2, type1))
 {
-  poly_int64 offadj, sztmp, msztmp;
-  bool reverse;
-  get_ref_base_and_extent (*refp, &offadj, &sztmp, &msztmp, &reverse);
-  offset2 -= offadj;
-  get_ref_base_and_extent (base1, &offadj, &sztmp, &msztmp, &reverse);
-  offset1 -= offadj;
-  if (ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2))
+  refp = &ref2;
+  while (true)
{
- ++alias_stats.aliasing_component_refs_p_may_alias;
- return true;
+ /* We walk fro

Re: [PATCH v3 2/3] Add predict_doloop_p target hook

2019-05-16 Thread Kugan Vivekanandarajah
Hi,

On Fri, 17 May 2019 at 13:37,  wrote:
>
> From: Kewen Lin 
>
> Hi,
>
> Previous version link:
> https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00654.html
>
> Comparing with the previous version, I moved the generic
> parts of rs6000 target hook to IVOPTs.  But I still kept
> the target hook as previous which checks some target
> specific criteria like innermost, max iteration counts
> etc, and checks for invalid stmt in loop.  The reason
> I decided not to move this part to generic is they are
> not generic enough.
>
> 1) For the part of target specific criteria, if we want
> to put it in generic, we can call the hook
> targetm.can_use_doloop_p, which requires us to
> prepare those four parameters, but each target only needs
> one or two parameters, it means we will evaluate some
> things which aren't required for that target.  So I'd like
> to leave this part to target hook.
> 2) For the other part of target invalid stmt check, as the
> hook invalid_within_doloop grep data shows, not all targets
> need to check whether invalid instructions exist in doloop.
> If we scan all stmts as generic, it can waste time for those
> targets which don't need to check.  Besides, the scope of
> the current check on SWITCH in rs6000 hook is wide, later
> if we want it more exact, we may need to check more stmts
> instead of single.  To let target hook scan the BBs/stmts
> by itself is also more flexible.
>
> Bootstrapped and regression testing ongoing on powerpc64le.
>
> Any more comments?
>
> gcc/ChangeLog
>
> 2019-05-17  Kewen Lin  
>
> PR middle-end/80791
> * target.def (predict_doloop_p): New hook.
> * targhooks.h (default_predict_doloop_p): New declaration.
> * targhooks.c (default_predict_doloop_p): New function.
> * doc/tm.texi.in (TARGET_PREDICT_DOLOOP_P): New hook.
> * doc/tm.texi: Regenerate.
> * config/rs6000/rs6000.c (invalid_insn_for_doloop_p): New function.
> (rs6000_predict_doloop_p): Likewise.
> (TARGET_PREDICT_DOLOOP_P): New macro.
> * tree-ssa-loop-ivopts.c (generic_predict_doloop_p): New function.
> (costly_iter_for_doloop_p): Likewise.
>
> ---
>  gcc/config/rs6000/rs6000.c |  79 +-
>  gcc/doc/tm.texi|   8 
>  gcc/doc/tm.texi.in |   2 +
>  gcc/target.def |   9 
>  gcc/targhooks.c|  13 ++
>  gcc/targhooks.h|   1 +
>  gcc/tree-ssa-loop-ivopts.c | 105 
> +
>  7 files changed, 216 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index a21f4f7..2fd52d7 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -83,6 +83,7 @@
>  #include "tree-ssa-propagate.h"
>  #include "tree-vrp.h"
>  #include "tree-ssanames.h"
> +#include "tree-cfg.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -1914,6 +1915,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
>
> +#undef TARGET_PREDICT_DOLOOP_P
> +#define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
> +
>  #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
>  #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
>
> @@ -39436,7 +39440,80 @@ rs6000_mangle_decl_assembler_name (tree decl, tree id)
>return id;
>  }
>
> -
> +/* Check whether there are some instructions preventing doloop transformation
> +   inside loop body, mainly for instructions which are possible to kill CTR.
> +
> +   Return true if some invalid insn exits, otherwise return false.  */
> +
> +static bool
> +invalid_insn_for_doloop_p (struct loop *loop)
> +{
> +  basic_block *body = get_loop_body (loop);
> +  gimple_stmt_iterator gsi;
> +
> +  for (unsigned i = 0; i < loop->num_nodes; i++)
> +for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
> +  {
> +   gimple *stmt = gsi_stmt (gsi);
> +   if (is_gimple_call (stmt) && !gimple_call_internal_p (stmt)
> +   && !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> +  "predict doloop failure due to finding call.\n");
> +   return true;
> + }
> +   if (computed_goto_p (stmt))
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "predict doloop failure due to"
> + "finding computed jump.\n");
> +   return true;
> + }
> +
> +   /* TODO: Now this hook is expected to be called in ivopts, which is
> +  before switchlower1/switchlower2.  Checking for SWITCH at this point
> +  will eliminate some good candidates.  But since there are only a few
> +  cases having _a_ swit

Re: Deque code cleanup and optimizations

2019-05-16 Thread François Dumont
Here is the simplified patch. I put back the _M_map checks, we'll see 
later if those can be removed.


    * include/bits/stl_deque.h
    (_Deque_iterator<>::__ptr_to): Remove, use std::__ptr_rebind.
    (_Deque_base(_Deque_base&&, const allocator_type&)): New.
    (_Deque_base::_Deque_impl_data): New.
    (_Deque_base::_Deque_impl): Inherit latter.
    (_Deque_base::_Deque_impl::_M_swap_data): Move...
    (_Deque_base::_Deque_impl_data::_M_swap_data): ... here.
    (_Deque_base::_Deque_impl()): Add noexcept qualification.
    (_Deque_base::_Deque_impl(_Deque_impl&&, _Tp_alloc_type&&)): New.
    (_Deque_base::_Deque_impl::_M_get_Tp_allocator()): Remove static_cast.
    (deque<>::deque()): Default.
    (deque<>::deque(deque&&)): Default.
    (deque<>::deque(deque&&, const allocator_type&, false_type)): New.
    (deque<>::deque(deque&&, const allocator_type&, true_type)): New.
    (deque<>::deque(deque&&, const allocator_type&)): Delegate to the latter.
    (deque<>::deque<_It>(_It, _It, const allocator_type&)): Use
    _M_range_initialize.
    (deque<>::assign<_It>(_It, _It)): Use _M_assign_aux.
    (deque<>::resize(size_type, const value_type&)): Share a single
    implementation.
    (deque<>::insert<_It>(const_iterator, _It, _It)): Use
    _M_range_insert_aux.
    [__cplusplus >= 201103L](_M_initialize_dispatch): Remove.
    [__cplusplus >= 201103L](_M_assign_dispatch): Remove.
    [__cplusplus >= 201103L](_M_insert_dispatch): Remove.
    * testsuite/23_containers/deque/allocator/default_init.cc: New.

Tested under Linux x86_64.

Ok to commit ?

François

On 5/10/19 3:38 PM, Jonathan Wakely wrote:

This seems generally OK, but ...

On Fri, 10 May 2019, 05:59 François Dumont wrote:

  I removed several _M_map != nullptr checks because in the current
implementation it can't be null. I have several patches following this
one to support it, but in that case we will be using a different code path.


You can't remove those checks. If _M_map can ever be null now or in
the future, then we need the checks. Otherwise code compiled today
would break if passed a deque compiled with a future GCC that allows
the map to be null.

I'm curious how you plan to support it though, I don't think it's
possible without an ABI break.


  (_Deque_base::_Deque_impl::_M_move_impl()): Remove _M_impl._M_map
check.

_M_move_impl and the constructor that calls it can be removed
completely, because https://cplusplus.github.io/LWG/issue2593 means
that the same allocator can still be used after moving from it. That
function only exists to handle the case where an allocator changes
value after being moved from.
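
The LWG 2593 point above can be illustrated with a small sketch. It is not
libstdc++ code: tagged_alloc, its tag member and the helper function are
hypothetical names made up for this example. It only shows the property
being relied on, namely that a moved-from allocator keeps its value and
still compares equal to the allocator constructed from it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stateful allocator, for illustration only.
template<typename T>
struct tagged_alloc
{
  using value_type = T;

  int tag;  // the allocator's "value"

  explicit tagged_alloc(int t) noexcept : tag(t) { }

  // Rebinding constructor required of allocators.
  template<typename U>
    tagged_alloc(const tagged_alloc<U>& other) noexcept : tag(other.tag) { }

  // The implicit copy and move constructors are used: moving an int
  // copies it, so the source keeps its tag.

  T* allocate(std::size_t n)
  { return std::allocator<T>().allocate(n); }

  void deallocate(T* p, std::size_t n)
  { std::allocator<T>().deallocate(p, n); }
};

template<typename T, typename U>
  bool operator==(const tagged_alloc<T>& a, const tagged_alloc<U>& b) noexcept
  { return a.tag == b.tag; }

template<typename T, typename U>
  bool operator!=(const tagged_alloc<T>& a, const tagged_alloc<U>& b) noexcept
  { return !(a == b); }

// Returns true if the moved-from allocator is unchanged and still equal
// to the allocator that was move-constructed from it.
bool moved_from_allocator_is_preserved()
{
  tagged_alloc<int> a(42);
  tagged_alloc<int> b(std::move(a));
  return a == b && a.tag == 42;
}
```

With that guarantee, a container's move constructor has no need for a
separate code path that copes with the allocator changing value.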



diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index 358bbda3902..22a7ac8da2e 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -115,15 +115,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   typedef _Tp**	   _Map_pointer;
 #else
 private:
-  template<typename _Up>
-	using __ptr_to = typename pointer_traits<_Ptr>::template rebind<_Up>;
   template<typename _CvTp>
-	using __iter = _Deque_iterator<_Tp, _CvTp&, __ptr_to<_CvTp>>;
+	using __iter = _Deque_iterator<_Tp, _CvTp&, __ptr_rebind<_Ptr, _CvTp>>;
 public:
   typedef __iter<_Tp>   iterator;
   typedef __iter<const _Tp>   const_iterator;
-  typedef __ptr_to<_Tp>		_Elt_pointer;
-  typedef __ptr_to<_Elt_pointer>	_Map_pointer;
+  typedef __ptr_rebind<_Ptr, _Tp>			   _Elt_pointer;
+  typedef __ptr_rebind<_Ptr, _Elt_pointer>		   _Map_pointer;
 #endif
 
   static size_t _S_buffer_size() _GLIBCXX_NOEXCEPT
@@ -401,7 +399,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	_Map_alloc_type;
   typedef __gnu_cxx::__alloc_traits<_Map_alloc_type> _Map_alloc_traits;
 
-public:
   typedef _Alloc		  allocator_type;
 
   allocator_type
@@ -436,6 +433,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	  this->_M_impl._M_swap_data(__x._M_impl);
   }
 
+  _Deque_base(_Deque_base&& __x, const allocator_type& __a)
+  : _M_impl(std::move(__x._M_impl), _Tp_alloc_type(__a))
+  { __x._M_initialize_map(0); }
+
   _Deque_base(_Deque_base&& __x, const allocator_type& __a, size_t __n)
   : _M_impl(__a)
   {
@@ -456,56 +457,73 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   ~_Deque_base() _GLIBCXX_NOEXCEPT;
 
-protected:
   typedef typename iterator::_Map_pointer _Map_pointer;
 
-  //This struct encapsulates the implementation of the std::deque
-  //standard container and at the same time makes use of the EBO
-  //for empty allocators.
-  struct _Deque_impl
-  : public _Tp_alloc_type
+  struct _Deque_impl_data
   {
 	_Map_pointer _M_map;
 	size_t _M_map_size;
 	iterator _M_start;
 	iterator _M_finish;
 
-	_Deque_impl()
-	: _Tp_alloc_type(), _M_map(), _M_map_size(0),
-	  _M_start(), _M_finish()
+	_Deque_impl_data() _GLIBCXX_NOEXCEPT
+	: _M_map(), _M_map_size(), _M_start(), _M_finish()
+	{ }
+
+#if __cplusplus >= 201103L
+	_Deque_impl_data(const _Dequ

Re: LWG2593 Move from allocator state is preserved

2019-05-16 Thread François Dumont

2 other tests needed to be adapted in 21_strings. Attached patch applied.

2019-05-17  François Dumont 

    Move from state of allocators (LWG2593)
    * include/bits/stl_deque.h
    (_Deque_base(_Deque_base&&, false_type)): Remove.
    (_Deque_base(_Deque_base&&, true_type)): Remove.
    (_Deque_base(_Deque_base&&)): Adapt.
    (_Deque_base::_M_move_impl()): Remove.
    * testsuite/util/testsuite_allocator.h
    (propagating_allocator(propagating_allocator&&)): Preserve move from
    state.
    * testsuite/23_containers/deque/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/forward_list/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/list/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/map/allocator/move_assign.cc (test02): Adapt.
    * testsuite/23_containers/multimap/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/multiset/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/set/allocator/move_assign.cc (test02): Adapt.
    * testsuite/23_containers/unordered_map/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/unordered_multimap/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/unordered_multiset/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/unordered_set/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/vector/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/vector/bool/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/21_strings/basic_string/allocator/char/move_assign.cc
    (test02): Adapt.
    * testsuite/21_strings/basic_string/allocator/wchar_t/move_assign.cc
    (test02): Adapt.


On 5/16/19 11:05 AM, Jonathan Wakely wrote:

On 16/05/19 07:48 +0200, François Dumont wrote:

Hi

    Let's apply this resolution first before moving forward with the 
std::deque implementation.


    Move from state of allocators (LWG2593)
    * include/bits/stl_deque.h
    (_Deque_base(_Deque_base&&, false_type)): Remove.
    (_Deque_base(_Deque_base&&, true_type)): Remove.
    (_Deque_base(_Deque_base&&)): Adapt.
    (_Deque_base::_M_move_impl()): Remove.
    * testsuite/util/testsuite_allocator.h
    (propagating_allocator(propagating_allocator&&)): Preserve move from
    state.
    * testsuite/23_containers/deque/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/forward_list/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/list/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/map/allocator/move_assign.cc (test02): Adapt.
    * testsuite/23_containers/multimap/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/multiset/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/set/allocator/move_assign.cc (test02): Adapt.
    * testsuite/23_containers/unordered_map/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/unordered_multimap/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/unordered_multiset/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/unordered_set/allocator/move_assign.cc
    (test02): Adapt.
    * testsuite/23_containers/vector/allocator/move_assign.cc (test02):
    Adapt.
    * testsuite/23_containers/vector/bool/allocator/move_assign.cc
    (test02): Adapt.

So far I have only run the 23_containers tests, successfully; I'll complete
the run before committing.


Nice, thanks for doing this.


Ok to commit ?


Yes, although I'd like one change ...

diff --git a/libstdc++-v3/testsuite/util/testsuite_allocator.h b/libstdc++-v3/testsuite/util/testsuite_allocator.h
index d817ac4e838..a98869ed14f 100644
--- a/libstdc++-v3/testsuite/util/testsuite_allocator.h
+++ b/libstdc++-v3/testsuite/util/testsuite_allocator.h
@@ -465,12 +465,12 @@ namespace __gnu_test
  return *this;
  }

-  // postcondition: a.get_personality() == 0
+  // postcondition: LWG2593 a.get_personality() un-changed.
  propagating_allocator(propagating_allocator&& a) noexcept
-  : base_alloc()
-  { swap_base(a); }
+  : base_alloc(std::move(a.base()))
+  { /*swap_base(a);*/ }


I don't think we should keep the /*swap_base(a);*/ comment. It's just
confusing to have commented-out code that implements an old
behaviour.

OK for trunk with that /*...*/ comment removed.

Thanks again.





diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index c050d1bf023..358bbda3902 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -428,11 +428,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   { /* Caller must initialize map. */ }
 
 #if __cplusplus >= 201103L
-  _Deque_base(_Deque_base&& __x, false_type)
-  : _M_impl(__x._M_move_impl())
-  { }
-
- 

[PATCH] Remove empty loop with assumed finiteness (PR tree-optimization/89713)

2019-05-16 Thread Feng Xue OS
This patch gives the user a way to optimize away empty loops whose
finiteness the compiler cannot prove, such as a loop over a C++ STL
container:

void f (std::map &m)
    {
    for (auto it = m.begin (); it != m.end (); ++it);
    }
 
An option "-ffinite-loop" is added to tell the compiler to assume that such
loops are finite, so that it can apply the optimization.
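
For reference, a compilable form of the example above might look like the
sketch below. The template arguments of std::map were lost from the snippet,
so std::map<int, int> is an assumption chosen only for illustration, and
walk_only is a made-up name. The sketch makes no claim about what any
particular compiler does with the loop; it only shows the shape of loop the
option is aimed at: a body with no side effects whose exit condition depends
on following container-internal pointers.

```cpp
#include <cassert>
#include <map>

// Assumption: key and mapped types are illustrative; the original
// snippet did not preserve them.
void walk_only(std::map<int, int>& m)
{
  // Empty loop: the iterator is advanced but nothing observable happens.
  // Proving that the traversal terminates requires reasoning about the
  // tree's node links, which is why the loop is hard to remove without
  // an explicit finiteness assumption.
  for (auto it = m.begin(); it != m.end(); ++it)
    ;
}
```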

Feng

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d8bed3a..c55f2e6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2019-05-16  Feng Xue  
+
+PR tree-optimization/89713
+* doc/invoke.texi (-ffinite-loop): Document new option.
+* common.opt (-ffinite-loop): New option.
+* passes.def: Replace pass_cd_dce with pass_cd_dce2.
+* tree-pass.h (pass_cd_dce2): New declaration.
+* tree-ssa-dce.c (loop_has_true_exits): New function.
+(find_obviously_necessary_stmts): Add aggressive_loop_removal
+parameter.
+(perform_tree_ssa_dce, tree_ssa_cd_dce): Likewise.
+(class pass_cd_dce): Add new member aggressive_loop_removal.
+(pass_cd_dce::pass_cd_dce): Add alr parameter.
+(make_pass_cd_dce2): New function.
+
 2019-05-16  Jakub Jelinek  
 
PR c++/90484
diff --git a/gcc/common.opt b/gcc/common.opt
index d342c4f..e98a34d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1437,6 +1437,10 @@ ffinite-math-only
 Common Report Var(flag_finite_math_only) Optimization SetByCombined
 Assume no NaNs or infinities are generated.
 
+ffinite-loop
+Common Report Var(flag_finite_loop) Optimization
+Assume that a loop is finite if its finiteness cannot be analytically determined.
+
 ffixed-
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -ffixed-<register>	Mark <register> as being unavailable to the compiler.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5e3e887..9a3882c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -412,6 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fdevirtualize-at-ltrans  -fdse @gol
 -fearly-inlining  -fipa-sra  -fexpensive-optimizations  -ffat-lto-objects @gol
 -ffast-math  -ffinite-math-only  -ffloat-store  -fexcess-precision=@var{style} @gol
+-ffinite-loop @gol
 -fforward-propagate  -ffp-contract=@var{style}  -ffunction-sections @gol
 -fgcse  -fgcse-after-reload  -fgcse-las  -fgcse-lm  -fgraphite-identity @gol
 -fgcse-sm  -fhoist-adjacent-loads  -fif-conversion @gol
@@ -9501,6 +9502,20 @@ that may set @code{errno} but are otherwise free of side effects.  This flag is
 enabled by default at @option{-O2} and higher if @option{-Os} is not also
 specified.
 
+@item -ffinite-loop
+@opindex ffinite-loop
+@opindex fno-finite-loop
+Allow the compiler to assume that a loop is finite when its finiteness
+cannot be analytically determined.  With this assumption, some aggressive
+transformations become possible, such as removal of an otherwise empty
+loop by dead code elimination (DCE).
+
+This option is not turned on by any @option{-O} option since it might result
+in incorrect behaviour for programs that contain seemingly finite, but
+actually infinite, loops.
+
+The default is @option{-fno-finite-loop}.
+
 @item -ftree-dominator-opts
 @opindex ftree-dominator-opts
 Perform a variety of simple scalar cleanups (constant/copy
diff --git a/gcc/passes.def b/gcc/passes.def
index ad2efab..b84ee34 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -322,7 +322,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_warn_restrict);
   NEXT_PASS (pass_dse);
-  NEXT_PASS (pass_cd_dce);
+  NEXT_PASS (pass_cd_dce2);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt, false /* early_p */);
   NEXT_PASS (pass_fold_builtins);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 3a0b380..2392bc5 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -395,6 +395,7 @@ extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_cd_dce2 (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_thread_jumps (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 2478219..d4659df 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -356,6 +356,27 @@ mark_control_dependent_edges_necessary (basic_block bb, bool ignore_self)
 bitmap_set_bit (visited_control_parents, bb->index);
 }
 
+/* Check whether a loop has any non-EH exit. */
+
+static bool
+loop_has_true_exits (const struct loop *loop)
+{
+  vec<edge> exits = get_loop_exit_edges (loop);
+  bool found = false;
+  edge e;
+  unsigned i;
+
+  FOR_EACH_VEC_ELT (exits, i, e)
+if (!(e->

[PATCH v3 2/3] Add predict_doloop_p target hook

2019-05-16 Thread linkw
From: Kewen Lin 

Hi,

Previous version link:
https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00654.html

Compared with the previous version, I moved the generic
parts of the rs6000 target hook to IVOPTs.  But I kept
the target hook as before for checking target-specific
criteria such as innermost loop and max iteration count,
and for checking for invalid stmts in the loop.  I decided
not to move this part to generic code because it is
not generic enough.

1) For the target-specific criteria, if we wanted
to put them in generic code, we could call the hook
targetm.can_use_doloop_p, which requires us to
prepare its four parameters.  But each target only needs
one or two of them, so we would evaluate some
things which aren't required for that target.  So I'd like
to leave this part to the target hook.
2) For the target invalid stmt check, grepping for the
hook invalid_within_doloop shows that not all targets
need to check whether invalid instructions exist in a doloop.
If we scanned all stmts in generic code, it would waste time for
those targets which don't need the check.  Besides, the scope of
the current check on SWITCH in the rs6000 hook is wide; later,
if we want it to be more exact, we may need to check several stmts
instead of a single one.  Letting the target hook scan the BBs/stmts
by itself is also more flexible.
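
The split described in 1) and 2) can be sketched in isolation as follows.
This is not GCC code: loop_info, target_hooks and every function name here
are hypothetical stand-ins, and the assumption that the default hook simply
answers "no" is mine. The point is only that generic code pays nothing for
targets that keep the default, while a target that cares does its own scan.

```cpp
#include <cassert>

// Hypothetical summary of a loop; real code would walk BBs and stmts.
struct loop_info
{
  bool innermost;
  bool has_call;
};

using predict_hook = bool (*)(const loop_info&);

// Assumed default: a target that does not define the hook never
// predicts a doloop, and no statement scan is performed for it.
bool default_predict(const loop_info&)
{ return false; }

// A target that cares checks its own criteria and scans on its own.
bool example_target_predict(const loop_info& l)
{
  if (!l.innermost)
    return false;
  if (l.has_call)   // stands in for the target's invalid-stmt scan
    return false;
  return true;
}

struct target_hooks
{
  predict_hook predict_doloop_p = default_predict;
};

// Generic side: just ask the target.
bool generic_predict(const target_hooks& t, const loop_info& l)
{ return t.predict_doloop_p(l); }
```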

Bootstrapped and regression testing ongoing on powerpc64le.

Any more comments?

gcc/ChangeLog

2019-05-17  Kewen Lin  

PR middle-end/80791
* target.def (predict_doloop_p): New hook.
* targhooks.h (default_predict_doloop_p): New declaration.
* targhooks.c (default_predict_doloop_p): New function.
* doc/tm.texi.in (TARGET_PREDICT_DOLOOP_P): New hook.
* doc/tm.texi: Regenerate.
* config/rs6000/rs6000.c (invalid_insn_for_doloop_p): New function.
(rs6000_predict_doloop_p): Likewise.
(TARGET_PREDICT_DOLOOP_P): New macro.
* tree-ssa-loop-ivopts.c (generic_predict_doloop_p): New function. 
(costly_iter_for_doloop_p): Likewise.

---
 gcc/config/rs6000/rs6000.c |  79 +-
 gcc/doc/tm.texi            |   8 
 gcc/doc/tm.texi.in         |   2 +
 gcc/target.def             |   9 
 gcc/targhooks.c            |  13 ++
 gcc/targhooks.h            |   1 +
 gcc/tree-ssa-loop-ivopts.c | 105 +
 7 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a21f4f7..2fd52d7 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -83,6 +83,7 @@
 #include "tree-ssa-propagate.h"
 #include "tree-vrp.h"
 #include "tree-ssanames.h"
+#include "tree-cfg.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1914,6 +1915,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
 
+#undef TARGET_PREDICT_DOLOOP_P
+#define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
@@ -39436,7 +39440,80 @@ rs6000_mangle_decl_assembler_name (tree decl, tree id)
   return id;
 }
 
-
+/* Check whether there are some instructions preventing doloop transformation
+   inside loop body, mainly for instructions which are possible to kill CTR.
+
+   Return true if some invalid insn exists, otherwise return false.  */
+
+static bool
+invalid_insn_for_doloop_p (struct loop *loop)
+{
+  basic_block *body = get_loop_body (loop);
+  gimple_stmt_iterator gsi;
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
+  {
+   gimple *stmt = gsi_stmt (gsi);
+   if (is_gimple_call (stmt) && !gimple_call_internal_p (stmt)
+   && !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+  "predict doloop failure due to finding call.\n");
+   return true;
+ }
+   if (computed_goto_p (stmt))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "predict doloop failure due to "
+ "finding computed jump.\n");
+   return true;
+ }
+
+   /* TODO: Now this hook is expected to be called in ivopts, which is
+  before switchlower1/switchlower2.  Checking for SWITCH at this point
+  will eliminate some good candidates.  But since there are only a few
+  cases having _a_ switch statement in the innermost loop, it can be a
+  low priority enhancement.  */
+   if (gimple_code (stmt) == GIMPLE_SWITCH)
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+  "predict doloop failur

New Spanish PO file for 'gcc' (version 9.1.0)

2019-05-16 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Spanish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/es.po

(This file, 'gcc-9.1.0.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH][PR90106] Builtin call transformation changes in cdce pass

2019-05-16 Thread JunMa

On 2019/5/17 6:04 AM, Jakub Jelinek wrote:

On Thu, May 16, 2019 at 11:39:38PM +0200, Jakub Jelinek wrote:

One possibility is to add -fdump-tree-optimized and scan for
/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
resp.
/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */

Here it is in patch form.

That said, I'm not convinced your patch does what you wanted, because
comparing a month old trunk with today's trunk generates the same assembly
except for .ident, generates as many [tail call] lines in *.optimized dump
as before, emits the same number of jmp\tpow and jmp\tlog instructions as
before (one in a separate routine).
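
To make the scan-tree-dump patterns above easier to read, here is a
stand-alone sketch of the same match using std::regex instead of the
DejaGnu/Tcl machinery. The helper name has_tail_call and the sample dump
lines in the usage note are assumptions for illustration; real *.optimized
output may be formatted differently. Stripped of Tcl quoting, the pattern is
the function name, a space, a parenthesized argument list that stays on one
line, a semicolon, and the literal text [tail call].

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if DUMP_LINE contains a call to FN marked "[tail call]".
// FN is assumed to be a plain identifier with no regex metacharacters.
bool has_tail_call(const std::string& dump_line, const std::string& fn)
{
  // Same shape as the dg-final pattern: FN \( [^\n\r]* \); \[tail call\]
  const std::regex re(fn + R"( \([^\n\r]*\); \[tail call\])");
  return std::regex_search(dump_line, re);
}
```

For example, has_tail_call applied to a line such as
"_7 = pow (x_3(D), y_4(D)); [tail call]" with fn "pow" matches, while the
same line without the trailing marker does not.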



Thanks for pointing out the mistake and fixing it.

For these two tests, the cdce pass doesn't transform the builtin math
functions in foo, with or without the patch, because they cannot use
internal functions.

I'll add another testcase to verify the patch.

Regards
Jun



But at least the tests aren't UNSUPPORTED anymore.

2019-05-16  Jakub Jelinek  

PR tree-optimization/90106
* gcc.dg/cdce1.c: Don't scan-assembler, instead -fdump-tree-optimized
and scan-tree-dump for tail call.
* gcc.dg/cdce2.c: Likewise.

--- gcc/testsuite/gcc.dg/cdce1.c.jj 2019-05-16 11:28:22.750177582 +0200
+++ gcc/testsuite/gcc.dg/cdce1.c 2019-05-16 23:50:23.618450891 +0200
@@ -1,9 +1,9 @@
-/* { dg-do  run  } */
-/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
+/* { dg-do run } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
  /* { dg-require-effective-target int32plus } */
-/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
-/* { dg-final { scan-assembler "jmp pow" } } */
  /* { dg-require-effective-target large_double } */
+/* { dg-final { scan-tree-dump "cdce1.c:17: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
  
  #include 

  #include 
--- gcc/testsuite/gcc.dg/cdce2.c.jj 2019-05-16 11:28:22.781177075 +0200
+++ gcc/testsuite/gcc.dg/cdce2.c 2019-05-16 23:50:58.505880845 +0200
@@ -1,8 +1,8 @@
-/* { dg-do  run  } */
+/* { dg-do run } */
  /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
-/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
-/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
-/* { dg-final { scan-assembler "jmp log" } } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
+/* { dg-final { scan-tree-dump "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
   
  #include 

  #include 


Jakub





Re: [PATCH 1/2] Add support for IVOPT

2019-05-16 Thread Kugan Vivekanandarajah
Hi Richard,

On Thu, 16 May 2019 at 21:14, Richard Biener  wrote:
>
> On Wed, May 15, 2019 at 4:40 AM  wrote:
> >
> > From: Kugan Vivekanandarajah 
> >
> > gcc/ChangeLog:
> >
> > 2019-05-15  Kugan Vivekanandarajah  
> >
> > PR target/88834
> > * tree-ssa-loop-ivopts.c (get_mem_type_for_internal_fn): Handle
> > IFN_MASK_LOAD_LANES and IFN_MASK_STORE_LANES.
> > (find_interesting_uses_stmt): Likewise.
> > (get_alias_ptr_type_for_ptr_address): Likewise.
> > (add_iv_candidate_for_use): Add scaled index candidate if useful.
> >
> > Change-Id: I8e8151fe2dde2845dedf38b090103694da6fc9d1
> > ---
> >  gcc/tree-ssa-loop-ivopts.c | 60 +-
> >  1 file changed, 59 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> > index 9864b59..115a70c 100644
> > --- a/gcc/tree-ssa-loop-ivopts.c
> > +++ b/gcc/tree-ssa-loop-ivopts.c
> > @@ -2451,11 +2451,13 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
> >switch (gimple_call_internal_fn (call))
> >  {
> >  case IFN_MASK_LOAD:
> > +case IFN_MASK_LOAD_LANES:
> >if (op_p == gimple_call_arg_ptr (call, 0))
> > return TREE_TYPE (gimple_call_lhs (call));
> >return NULL_TREE;
> >
> >  case IFN_MASK_STORE:
> > +case IFN_MASK_STORE_LANES:
> >if (op_p == gimple_call_arg_ptr (call, 0))
> > return TREE_TYPE (gimple_call_arg (call, 3));
> >return NULL_TREE;
> > @@ -2545,7 +2547,7 @@ find_interesting_uses_stmt (struct ivopts_data *data, gimple *stmt)
> >   return;
> > }
> >
> > -  /* TODO -- we should also handle address uses of type
> > +  /* TODO -- we should also handle all address uses of type
> >
> >  memory = call (whatever);
> >
> > @@ -2553,6 +2555,27 @@ find_interesting_uses_stmt (struct ivopts_data *data, gimple *stmt)
> >
> >  call (memory).  */
> >  }
> > +  else if (is_gimple_call (stmt))
> > +{
> > +  gcall *call = dyn_cast <gcall *> (stmt);
> > +  if (call
>
> that's testing things twice, just do
>
>else if (gcall *call = dyn_cast <gcall *> (stmt))
>  {
> ...
>
> no other comments besides why do you need _LANES handling here where
> the w/o _LANES handling didn't need anything.
Right,  I have now changed this in the revised patch.
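
The idiom Richard suggests, declaring the variable inside the condition so
that the type test and the cast happen once, can be shown outside GCC with
standard dynamic_cast. The class and function names below are hypothetical
stand-ins; GCC's own dyn_cast and gimple statement hierarchy work
differently in detail, so this is a sketch of the pattern only.

```cpp
#include <cassert>

// Hypothetical statement hierarchy, for illustration only.
struct stmt
{
  virtual ~stmt() = default;
};

struct call_stmt : stmt
{
  int nargs = 0;
};

struct assign_stmt : stmt
{
};

// Returns the argument count for calls, or -1 for any other statement.
int count_call_args(stmt* s)
{
  // One test: the cast yields null for non-calls, and CALL is only in
  // scope where it is known to be valid.
  if (call_stmt* call = dynamic_cast<call_stmt*>(s))
    return call->nargs;
  return -1;
}
```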

Thanks,
Kugan

>
> > + && gimple_call_internal_p (call)
> > + && (gimple_call_internal_fn (call) == IFN_MASK_LOAD_LANES
> > + || gimple_call_internal_fn (call) == IFN_MASK_STORE_LANES))
> > +   {
> > + tree *arg = gimple_call_arg_ptr (call, 0);
> > + struct iv *civ = get_iv (data, *arg);
> > + tree mem_type = get_mem_type_for_internal_fn (call, arg);
> > + if (civ && mem_type)
> > +   {
> > + civ = alloc_iv (data, civ->base, civ->step);
> > + record_group_use (data, arg, civ, stmt, USE_PTR_ADDRESS,
> > +   mem_type);
> > + return;
> > +   }
> > +   }
> > +}
> > +
> >
> >if (gimple_code (stmt) == GIMPLE_PHI
> >&& gimple_bb (stmt) == data->current_loop->header)
> > @@ -3500,6 +3523,39 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
> >  basetype = sizetype;
> >record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> >
> > +  /* Compare the cost of an address with an unscaled index with the cost of
> > +an address with a scaled index and add candidate if useful. */
> > +  if (use != NULL && use->type == USE_PTR_ADDRESS)
> > +{
> > +  struct mem_address parts = {NULL_TREE, integer_one_node,
> > + NULL_TREE, NULL_TREE, NULL_TREE};
> > +  poly_uint64 temp;
> > +  poly_int64 fact;
> > +  bool speed = optimize_loop_for_speed_p (data->current_loop);
> > +  poly_int64 poly_step = tree_to_poly_int64 (iv->step);
> > +  machine_mode mem_mode = TYPE_MODE (use->mem_type);
> > +  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base));
> > +
> > +  fact = GET_MODE_SIZE (GET_MODE_INNER (TYPE_MODE (use->mem_type)));
> > +  parts.index = integer_one_node;
> > +
> > +  if (fact.is_constant ()
> > + && can_div_trunc_p (poly_step, fact, &temp))
> > +   {
> > + /* Addressing mode "base + index".  */
> > + rtx addr = addr_for_mem_ref (&parts, as, false);
> > + unsigned cost = address_cost (addr, mem_mode, as, speed);
> > + tree step = wide_int_to_tree (sizetype,
> > +   exact_div (poly_step, fact));
> > + parts.step = wide_int_to_tree (sizetype, fact);
> > + /* Addressing mode "base + index << scale".  */
> > + addr = addr_for_mem_ref (&parts, as, false);
> > + unsigned new_cost = address_cost (addr, mem_mode, as, speed);
> > + if (new_cost < cost)
> > +  

Re: [PATCH 5/12] fix diagnostic quoting/spelling in c-family

2019-05-16 Thread Martin Sebor

On 5/16/19 5:22 PM, Joseph Myers wrote:

On Tue, 14 May 2019, Martin Sebor wrote:


The attached patch fixes quoting, spelling, and other formatting
issues in diagnostics issued from files in the c-family/ directory
and pointed out by the -Wformat-diag warning.


Some of the changes in this patch are questionable.  The diagnostics for
attribute scalar_storage_order and visibility arguments use \" because the
argument is a string constant not an identifier.  So making those use %qs
makes the diagnostics misleading, by suggesting an attribute argument is
used that is not in fact valid for that attribute.


Hmm, yes.  I introduced it elsewhere as well in some of my prior
changes, and it existed even before then in handle_visibility_attribute:

error ("%qD was declared %qs which implies default visibility",
   decl, "dllexport");

There is a way to highlight a string without enclosing it in both
single and double quotes:

error ("attribute %qE argument must be one of %r%s%R or %r%s%R",
   name, "locus", "\"big-endian\"",
   "locus", "\"little-endian\"");

It's not pretty but it does the job.  Unless you know of some other
trick I'll go with it and fix up the existing mistakes the same way
in a followup commit.

Martin


Go patch committed: Intrinsify runtime/internal/atomic functions

2019-05-16 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang intrinsifies the
runtime/internal/atomic functions.  Currently the
runtime/internal/atomic functions are implemented in C using C
compiler intrinsics.  This patch lets the Go frontend recognize these
functions and turn them into intrinsics directly.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
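
For readers unfamiliar with the compiler intrinsics in question, the sketch
below uses the generic GCC/Clang __atomic builtins from C++ for the 32-bit
operations the patch covers. The wrapper names and the choice of
__ATOMIC_SEQ_CST everywhere are assumptions for illustration; they are not
taken from the Go runtime or from the frontend patch.

```cpp
#include <cassert>
#include <cstdint>

// Thin wrappers over the GCC/Clang __atomic builtins (32-bit variants).

std::uint32_t load32(std::uint32_t* p)
{ return __atomic_load_n(p, __ATOMIC_SEQ_CST); }

void store32(std::uint32_t* p, std::uint32_t v)
{ __atomic_store_n(p, v, __ATOMIC_SEQ_CST); }

// Stores V and returns the previous value.
std::uint32_t xchg32(std::uint32_t* p, std::uint32_t v)
{ return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST); }

// Strong compare-and-swap: stores DESIRED only if *P equals EXPECTED.
bool cas32(std::uint32_t* p, std::uint32_t expected, std::uint32_t desired)
{
  return __atomic_compare_exchange_n(p, &expected, desired,
                                     /*weak=*/false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

// Adds DELTA and returns the new value.
std::uint32_t add32(std::uint32_t* p, std::uint32_t delta)
{ return __atomic_add_fetch(p, delta, __ATOMIC_SEQ_CST); }
```

The argument lists mirror the function types built in the patch: a pointer,
an optional value, and one or two integer memory-order arguments.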

Ian

2019-05-16  Cherry Zhang  

* go-gcc.cc (Gcc_backend::Gcc_backend): Define atomic builtins.
Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc	(revision 271182)
+++ gcc/go/go-gcc.cc	(working copy)
@@ -776,6 +776,109 @@ Gcc_backend::Gcc_backend()
   this->define_builtin(BUILT_IN_UNREACHABLE, "__builtin_unreachable", NULL,
   build_function_type(void_type_node, void_list_node),
   true, true);
+
+  // We provide some atomic functions.
+  t = build_function_type_list(uint32_type_node,
+   ptr_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_LOAD_4, "__atomic_load_4", NULL,
+   t, false, false);
+
+  t = build_function_type_list(uint64_type_node,
+   ptr_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_LOAD_8, "__atomic_load_8", NULL,
+   t, false, false);
+
+  t = build_function_type_list(void_type_node,
+   ptr_type_node,
+   uint32_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_STORE_4, "__atomic_store_4", NULL,
+   t, false, false);
+
+  t = build_function_type_list(void_type_node,
+   ptr_type_node,
+   uint64_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_STORE_8, "__atomic_store_8", NULL,
+   t, false, false);
+
+  t = build_function_type_list(uint32_type_node,
+   ptr_type_node,
+   uint32_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_EXCHANGE_4, "__atomic_exchange_4", NULL,
+   t, false, false);
+
+  t = build_function_type_list(uint64_type_node,
+   ptr_type_node,
+   uint64_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_EXCHANGE_8, "__atomic_exchange_8", NULL,
+   t, false, false);
+
+  t = build_function_type_list(boolean_type_node,
+   ptr_type_node,
+   ptr_type_node,
+   uint32_type_node,
+   boolean_type_node,
+   integer_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_COMPARE_EXCHANGE_4,
+   "__atomic_compare_exchange_4", NULL,
+   t, false, false);
+
+  t = build_function_type_list(boolean_type_node,
+   ptr_type_node,
+   ptr_type_node,
+   uint64_type_node,
+   boolean_type_node,
+   integer_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_COMPARE_EXCHANGE_8,
+   "__atomic_compare_exchange_8", NULL,
+   t, false, false);
+
+  t = build_function_type_list(uint32_type_node,
+   ptr_type_node,
+   uint32_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_ADD_FETCH_4, "__atomic_add_fetch_4", NULL,
+   t, false, false);
+
+  t = build_function_type_list(uint64_type_node,
+   ptr_type_node,
+   uint64_type_node,
+   integer_type_node,
+   NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_ADD_FETCH_8, "__atomic_add_fetch_8", NULL,
+   t, false, false);
+
+  t = build_function_type_list(unsigned_char_type_node,
+   ptr_type_node,
+   unsigned_
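
The sized builtins declared in this hunk (`__atomic_load_4`, `__atomic_store_4`, `__atomic_exchange_4`, `__atomic_compare_exchange_4`, `__atomic_add_fetch_4` and their 8-byte counterparts) take a pointer, an optional value, and an integer memory-order argument. As a rough illustration of the semantics only, and not of how the Go backend uses them, the sketch below exercises the generic GCC/Clang `__atomic_*_n` forms on a 32-bit word. The `AtomicDemoResult` struct and `atomic_demo` function are hypothetical names introduced for this example.

```cpp
#include <cstdint>

// Illustrative only: exercises the generic GCC/Clang __atomic builtins whose
// sized variants are declared in the patch.  Names here are hypothetical.
struct AtomicDemoResult {
  uint32_t loaded;
  uint32_t previous;
  bool swapped;
  uint32_t after_add;
};

AtomicDemoResult atomic_demo() {
  uint32_t word = 0;
  AtomicDemoResult r{};

  // Comparable to __atomic_store_4 (ptr, value, memorder).
  __atomic_store_n(&word, 5u, __ATOMIC_SEQ_CST);
  // Comparable to __atomic_load_4 (ptr, memorder).
  r.loaded = __atomic_load_n(&word, __ATOMIC_SEQ_CST);
  // Comparable to __atomic_exchange_4: stores 7, returns the old value.
  r.previous = __atomic_exchange_n(&word, 7u, __ATOMIC_SEQ_CST);

  uint32_t expected = 7;
  // Strong compare-and-swap (weak == false): succeeds because word == 7.
  r.swapped = __atomic_compare_exchange_n(&word, &expected, 9u, false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  // Comparable to __atomic_add_fetch_4: adds 1 and returns the new value.
  r.after_add = __atomic_add_fetch(&word, 1u, __ATOMIC_SEQ_CST);
  return r;
}
```

The argument order mirrors the function types built in the patch: pointer first, value next, memory order last, with the compare-exchange form taking two memory orders (success and failure).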

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Joseph Myers
On Thu, 16 May 2019, Maxim Kuvyrkov wrote:

> Let's avoid mixing the two discussions: (1) converting svn repo to git 
> (and getting community consensus to switch to git) and (2) deciding on 
> which branches to keep in the new repo.
> 
> With git, we can always split away unneeded history by removing 
> unnecessary branches and tags and re-packing the repo.  We can equally 
> easily bring that history back if we change our minds.

A prerequisite of a move to git is to have policies on branch deletion / 
force-pushes, and hook implementations that ensure those policies are 
followed (as well as implementing what's agreed on commit messages, 
Bugzilla updates, etc.).  There has of course been a lot of past 
discussion of those that someone will need to find, read and describe the 
issues and conclusions from.  I think there was a view that branch 
deletion and force-pushes should be limited to a particular namespace for 
user branches.

(I support a move to git, but not one using git-svn, and only one that 
properly takes into account the large amount of work previously done on 
author maps, understanding the repository peculiarities and how to 
correctly identify exactly which directories are branches or tags, fixing 
cases where there are both a branch and tag of the same name, identifying 
which tags to remove and which to keep, etc.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] soft-fp: Update soft-fp from glibc

2019-05-16 Thread Joseph Myers
On Wed, 15 May 2019, H.J. Lu wrote:

> This patch is updating all soft-fp from glibc, most changes are
> copyright years update, and changes other than years update are
> 
>   * soft-fp/extenddftf2.c: Use "_FP_W_TYPE_SIZE < 64" to check if
>   4 _FP_W_TYPEs are used for IEEE quad precision.
>   * soft-fp/extendhftf2.c: Likewise.
>   * soft-fp/extendsftf2.c: Likewise.
>   * soft-fp/extendxftf2.c: Likewise.
>   * soft-fp/trunctfdf2.c: Likewise.
>   * soft-fp/trunctfhf2.c: Likewise.
>   * soft-fp/trunctfsf2.c: Likewise.
>   * soft-fp/trunctfxf2.c: Likewise.
>   * config/rs6000/ibm-ldouble.c: Likewise.
> 
> OK for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 3/3][DejaGNU] target: Wrap linker flags into `-largs'/`-margs' for Ada

2019-05-16 Thread Jacob Bachmeyer

Maciej W. Rozycki wrote:
> On Wed, 15 May 2019, Jacob Bachmeyer wrote:
>
> > This patch really exposes a significant deficiency in our current
> > implementation of default_target_compile:  the order of various flags
> > can be significant, but we only have that order implicitly expressed in
> > the code, which goes all the way back to (of course) the "Initial
> > revision" that is probably from a time before Tcl had the features that
> > will allow significant cleanup in here.
>
>  I suspect the origins may be different, however as valuable as your
> observation is functional problems have precedence over issues with code
> structuring, so we need to fix the problem at hand first.  I'm sure
> DejaGNU maintainers will be happy to review your implementation of code
> restructuring afterwards.


My concern is that your patch may only solve a small part of the problem 
-- enough to make your specific case work, yes, but then someone else 
will hit other parts of the problem later and we spiral towards "tissue 
of hacks" unmaintainability.


The biggest hint to me that your patch is incomplete is that it only 
adds -largs/-margs to wrap LDFLAGS.  I suspect that there are other 
-?args options that should be used also with other flag sets, but those 
do not appear in this patch.  Do we know what the GNU Ada frontend 
actually expects?
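
For context, gnatmake documents four mode switches of this kind: `-cargs` (compiler), `-bargs` (binder), `-largs` (linker), and `-margs` (back to gnatmake's own switches). The sketch below shows only the wrapping this patch is described as doing, bracketing linker flags with `-largs` ... `-margs`. It is written in C++ for illustration and is not DejaGnu code; `wrap_linker_flags` is a hypothetical helper, and whether the other flag sets need similar treatment is exactly the open question above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not DejaGnu code: builds a gnatmake-style argument
// list in which linker flags are bracketed by -largs ... -margs.
std::vector<std::string> wrap_linker_flags(
    const std::vector<std::string>& compile_flags,
    const std::vector<std::string>& ld_flags) {
  std::vector<std::string> args(compile_flags);
  if (!ld_flags.empty()) {
    args.push_back("-largs");  // switches that follow go to the linker
    args.insert(args.end(), ld_flags.begin(), ld_flags.end());
    args.push_back("-margs");  // switch back to gnatmake's own switches
  }
  return args;
}
```

Emitting the closing `-margs` means anything appended later is again interpreted by gnatmake itself, so the wrapped group does not depend on where it ends up in the final command line.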


> > Some of these could probably be combined and I may have combined
> > categories that should be separate in the above list.  The GNU toolchain
> > has always been a kind of "magic box that just works" (until it doesn't
> > and the manual explains the problem) for me, so I am uncertain what the
> > ordering rules for combining these categories should be.  Anyone know
> > the traditional rules and, perhaps more importantly, what systems need
> > which rules?
>
>  The ordering rules are system-specific I'm afraid and we have to be
> careful not to break working systems out there.  People may be forced to a
> DejaGNU upgrade, due to a newer version of a program being tested having
> such a requirement, and can legitimately expect their system to continue
> working.


We can start by simply preserving the existing ordering until we know 
something should change, but the main goal of my previous message was to 
collect the requirements for a specification for default_target_compile 
so I can write regression tests (some of which will fail due to known 
bugs like broken Ada support in our current implementation) before 
embarking on extensive changes to that procedure.  Improving 
"target.test" was already on my local TODO list.


>  NB I have been repeatedly observing cases where a forced upgrade of a
> system component I neither care nor I am competent about, triggered by an
> upgrade of a component I do care about, caused the system to malfunction
> in a way that I find both unacceptable and extremely hard to debug.  It
> seems to have become more frequent in the recent years, and I find this
> both very frustrating and have wasted lots of time trying to fix the
> damage caused.  I would therefore suggest to take all the measures
> possible to save people from going through such an experience.


Yes, I have also noticed an attitude that can be summed up as "Who cares 
about backwards compatibility?  New!  Shiny!" usually from people who 
have no clue and no business being anywhere near a source editor.  
(Surprise!  Their code has lots of bugs, usually severe, too.)  The 
problem is not new -- jwz called it out as the "Cascade of 
Attention-Deficit Teenagers" model, noting that it seemed to 
particularly plague GNOME, long ago.
Unfortunately, people with that particular attitude seem to have 
acquired outsize influence over the last few years.  I would suspect an 
organized attack if I were more conspiracy-oriented, but Hanlon's razor 
strongly suggests that this is simply a consequence of lowering barriers 
to entry.


-- Jacob



Re: [PATCH 1/2] Add support for IVOPT

2019-05-16 Thread Kugan Vivekanandarajah
Hi Richard,

On Wed, 15 May 2019 at 16:57, Richard Sandiford wrote:
>
> Thanks for doing this.
>
> kugan.vivekanandara...@linaro.org writes:
> > From: Kugan Vivekanandarajah 
> >
> > gcc/ChangeLog:
> >
> > 2019-05-15  Kugan Vivekanandarajah  
> >
> >   PR target/88834
> >   * tree-ssa-loop-ivopts.c (get_mem_type_for_internal_fn): Handle
> >   IFN_MASK_LOAD_LANES and IFN_MASK_STORE_LANES.
> >   (find_interesting_uses_stmt): Likewise.
> >   (get_alias_ptr_type_for_ptr_address): Likewise.
> >   (add_iv_candidate_for_use): Add scaled index candidate if useful.
> >
> > Change-Id: I8e8151fe2dde2845dedf38b090103694da6fc9d1
> > ---
> >  gcc/tree-ssa-loop-ivopts.c | 60 
> > +-
> >  1 file changed, 59 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> > index 9864b59..115a70c 100644
> > --- a/gcc/tree-ssa-loop-ivopts.c
> > +++ b/gcc/tree-ssa-loop-ivopts.c
> > @@ -2451,11 +2451,13 @@ get_mem_type_for_internal_fn (gcall *call, tree *op_p)
> >switch (gimple_call_internal_fn (call))
> >  {
> >  case IFN_MASK_LOAD:
> > +case IFN_MASK_LOAD_LANES:
> >if (op_p == gimple_call_arg_ptr (call, 0))
> >   return TREE_TYPE (gimple_call_lhs (call));
> >return NULL_TREE;
> >
> >  case IFN_MASK_STORE:
> > +case IFN_MASK_STORE_LANES:
> >if (op_p == gimple_call_arg_ptr (call, 0))
> >   return TREE_TYPE (gimple_call_arg (call, 3));
> >return NULL_TREE;
> > @@ -2545,7 +2547,7 @@ find_interesting_uses_stmt (struct ivopts_data *data, gimple *stmt)
> > return;
> >   }
> >
> > -  /* TODO -- we should also handle address uses of type
> > +  /* TODO -- we should also handle all address uses of type
> >
> >memory = call (whatever);
> >
> > @@ -2553,6 +2555,27 @@ find_interesting_uses_stmt (struct ivopts_data *data, gimple *stmt)
> >
> >call (memory).  */
> >  }
> > +  else if (is_gimple_call (stmt))
> > +{
> > +  gcall *call = dyn_cast <gcall *> (stmt);
> > +  if (call
> > +   && gimple_call_internal_p (call)
> > +   && (gimple_call_internal_fn (call) == IFN_MASK_LOAD_LANES
> > +   || gimple_call_internal_fn (call) == IFN_MASK_STORE_LANES))
> > + {
> > +   tree *arg = gimple_call_arg_ptr (call, 0);
> > +   struct iv *civ = get_iv (data, *arg);
> > +   tree mem_type = get_mem_type_for_internal_fn (call, arg);
> > +   if (civ && mem_type)
> > + {
> > +   civ = alloc_iv (data, civ->base, civ->step);
> > +   record_group_use (data, arg, civ, stmt, USE_PTR_ADDRESS,
> > + mem_type);
> > +   return;
> > + }
> > + }
> > +}
> > +
>
> Why do you need to handle this specially?  Does:
>
>   FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
> {
>   op = USE_FROM_PTR (use_p);
>
>   if (TREE_CODE (op) != SSA_NAME)
> continue;
>
>   iv = get_iv (data, op);
>   if (!iv)
> continue;
>
>   if (!find_address_like_use (data, stmt, use_p->use, iv))
> find_interesting_uses_op (data, op);
> }
>
> not do the right thing for the load/store lane case?
Right, I initially thought load lanes should be handled differently,
but it turned out they can be handled the same way.  I should have
removed this.  Done now.

>
> > @@ -3500,6 +3523,39 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
> >  basetype = sizetype;
> >record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> >
> > +  /* Compare the cost of an address with an unscaled index with the cost of
> > +an address with a scaled index and add candidate if useful. */
> > +  if (use != NULL && use->type == USE_PTR_ADDRESS)
>
> I think we want this for all address uses.  E.g. for SVE, masked and
> unmasked accesses would both benefit.
OK.

>
> > +{
> > +  struct mem_address parts = {NULL_TREE, integer_one_node,
> > +   NULL_TREE, NULL_TREE, NULL_TREE};
>
> Might be better to use "= {}" and initialise the fields that matter by
> assignment.  As it stands this uses integer_one_node as the base, but I
> couldn't tell if that was deliberate.

I just copied this part from get_address_cost, so it matches what is done
there.  I have now changed it the way you suggested, but kept the values
used in get_address_cost.
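
To make the review point concrete, here is a small standalone sketch of the two initialisation styles. It does not use GCC's `tree` type; `address_parts` is a hypothetical five-pointer aggregate whose field names are assumed to mirror `struct mem_address`, and the string pointer merely stands in for `integer_one_node`.

```cpp
// Hypothetical stand-in for a five-pointer aggregate such as mem_address;
// the field names and pointer type are illustrative assumptions.
struct address_parts {
  const char* symbol;
  const char* base;
  const char* index;
  const char* step;
  const char* offset;
};

// Positional initialisation: the reader has to count fields to see that the
// second initialiser lands in `base`.
address_parts positional(const char* one) {
  address_parts parts = {nullptr, one, nullptr, nullptr, nullptr};
  return parts;
}

// Value-initialise every field to null, then assign only what matters.
address_parts by_assignment(const char* one) {
  address_parts parts = {};
  parts.base = one;
  return parts;
}
```

Both functions produce the same object; the second form states which field is being set, which is the ambiguity the review comment raises about the base.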
>
> > +  poly_uint64 temp;
> > +  poly_int64 fact;
> > +  bool speed = optimize_loop_for_speed_p (data->current_loop);
> > +  poly_int64 poly_step = tree_to_poly_int64 (iv->step);
>
> The step could be variable, so we should check whether this holds
> using poly_int_tree_p.
OK.

>
> > +  machine_mode mem_mode = TYPE_MODE (use->mem_type);
> > +  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (use->iv->base));
> > +
> > +  fact = GET_MODE_SIZE (GET_MODE_INNER (TYPE_MODE (use->mem_type)));
>
> This is simpler a

Re: [PATCH 5/12] fix diagnostic quoting/spelling in c-family

2019-05-16 Thread Joseph Myers
On Tue, 14 May 2019, Martin Sebor wrote:

> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued from files in the c-family/ directory
> and pointed out by the -Wformat-diag warning.

Some of the changes in this patch are questionable.  The diagnostics for 
attribute scalar_storage_order and visibility arguments use \" because the 
argument is a string constant not an identifier.  So making those use %qs 
makes the diagnostics misleading, by suggesting an attribute argument is 
used that is not in fact valid for that attribute.

-- 
Joseph S. Myers
jos...@codesourcery.com


Go patch committed: Add intrinsics for runtime/internal/sys functions

2019-05-16 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang adds intrinsics for
runtime/internal/sys functions.

runtime/internal/sys.Ctz32/64 and Bswap32/64 are currently implemented
with compiler builtin functions.  But if they are called from another
package, the compiler does not know and therefore cannot turn them
into compiler intrinsics.  This patch makes the compiler recognize
these functions and turn them into intrinsics directly, as the gc
compiler does.
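
As a rough picture of what such intrinsics lower to, the sketch below wraps the GCC/Clang builtins for count-trailing-zeros and byte swap. It is written in C++ for illustration and is not the gofrontend code; the function names merely echo the Go ones, and the explicit zero check is an assumption based on `__builtin_ctz` being undefined for a zero argument.

```cpp
#include <cstdint>

// Illustrative wrappers over GCC/Clang builtins; not the gofrontend
// implementation.
uint32_t ctz32(uint32_t x) {
  // __builtin_ctz is undefined for 0, so handle that case explicitly.
  return x == 0 ? 32u : static_cast<uint32_t>(__builtin_ctz(x));
}

uint32_t ctz64(uint64_t x) {
  return x == 0 ? 64u : static_cast<uint32_t>(__builtin_ctzll(x));
}

uint32_t bswap32(uint32_t x) { return __builtin_bswap32(x); }
uint64_t bswap64(uint64_t x) { return __builtin_bswap64(x); }
```

When the compiler recognises the call site directly, each of these typically becomes one or two machine instructions rather than a cross-package function call.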

This patch sets up a way for adding intrinsics in the compiler.  More
intrinsics will be added in later patches.

Also move the handling of runtime.getcallerpc/sp to the new way of
generating intrinsics.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 271276)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2df0879e7880057293c0a59be6868a3e6ea5105b
+c0c8ad50627e3a59267e6e3de233a0b30cf64150
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 271276)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -10252,42 +10252,6 @@ Call_expression::do_lower(Gogo* gogo, Na
  bme->location());
 }
 
-  // Handle a couple of special runtime functions.  In the runtime
-  // package, getcallerpc returns the PC of the caller, and
-  // getcallersp returns the frame pointer of the caller.  Implement
-  // these by turning them into calls to GCC builtin functions.  We
-  // could implement them in normal code, but then we would have to
-  // explicitly unwind the stack.  These functions are intended to be
-  // efficient.  Note that this technique obviously only works for
-  // direct calls, but that is the only way they are used.
-  if (gogo->compiling_runtime() && gogo->package_name() == "runtime")
-{
-  Func_expression* fe = this->fn_->func_expression();
-  if (fe != NULL
- && fe->named_object()->is_function_declaration()
- && fe->named_object()->package() == NULL)
-   {
- std::string n = Gogo::unpack_hidden_name(fe->named_object()->name());
- if ((this->args_ == NULL || this->args_->size() == 0)
- && n == "getcallerpc")
-   {
- static Named_object* builtin_return_address;
-  int arg = 0;
- return this->lower_to_builtin(&builtin_return_address,
-   "__builtin_return_address",
-   &arg);
-   }
- else if ((this->args_ == NULL || this->args_->size() == 0)
-  && n == "getcallersp")
-   {
- static Named_object* builtin_dwarf_cfa;
- return this->lower_to_builtin(&builtin_dwarf_cfa,
-   "__builtin_dwarf_cfa",
-   NULL);
-   }
-   }
-}
-
   // If this is a call to an imported function for which we have an
   // inlinable function body, add it to the list of functions to give
   // to the backend as inlining opportunities.
@@ -10401,31 +10365,6 @@ Call_expression::lower_varargs(Gogo* gog
   this->varargs_are_lowered_ = true;
 }
 
-// Return a call to __builtin_return_address or __builtin_dwarf_cfa.
-
-Expression*
-Call_expression::lower_to_builtin(Named_object** pno, const char* name,
- int* arg)
-{
-  if (*pno == NULL)
-*pno = Gogo::declare_builtin_rf_address(name, arg != NULL);
-
-  Location loc = this->location();
-
-  Expression* fn = Expression::make_func_reference(*pno, NULL, loc);
-  Expression_list *args = new Expression_list();
-  if (arg != NULL)
-{
-  Expression* a = Expression::make_integer_ul(*arg, NULL, loc);
-  args->push_back(a);
-}
-  Expression* call = Expression::make_call(fn, args, false, loc);
-
-  // The builtin functions return void*, but the Go functions return uintptr.
-  Type* uintptr_type = Type::lookup_integer_type("uintptr");
-  return Expression::make_cast(uintptr_type, call, loc);
-}
-
 // Flatten a call with multiple results into a temporary.
 
 Expression*
@@ -10491,9 +10430,125 @@ Call_expression::do_flatten(Gogo* gogo,
   this->args_ = args;
 }
 
+  // Lower to compiler intrinsic if possible.
+  Func_expression* fe = this->fn_->func_expression();
+  if (fe != NULL
+  && (fe->named_object()->is_function_declaration()
+  || fe->named_object()->is_function()))
+{
+  Expression* ret = this->intrinsify(gogo, inserter);
+  if (ret != NULL)
+return ret;
+}
+
   return this;
 }
 
+// Lower a call to a compiler intrinsic if possible.
+// Returns NULL if it is n

[C++ Patch] PR 67184 ("Missed optimization with C++11 final specifier")

2019-05-16 Thread Paolo Carlini

Hi,

when Roberto Agostino and I implemented the front-end devirtualization 
of final overriders we missed this case, where it comes from the base. 
It seems to me that by way of access_path the existing approach can be 
neatly extended. Tested x86_64-linux.
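
A minimal runnable illustration of the case in question, modelled on the shape of the new testcases but with return values added so the behaviour can be checked. This is a sketch for readers, not part of the patch.

```cpp
// Illustration only: mirrors the shape of the new testcases.
struct V {
  virtual int foo() { return 1; }
  virtual ~V() = default;
};

// wV is final but does not override foo(); the final overrider comes from V.
struct wV final : V {};

// oV is final and overrides foo() itself.
struct oV final : V {
  int foo() override { return 2; }
};

// No class can derive from wV or oV, so both calls have a statically known
// target and need not go through the vtable.
int call(wV& x) { return x.foo(); }
int call(oV& x) { return x.foo(); }
```

The `wV` call is the one previously missed: the function found by lookup belongs to `V`, which is not final, while the class through which it is accessed is.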


Thanks, Paolo.

///

/cp
2019-05-17  Paolo Carlini  

PR c++/67184
PR c++/69445
* call.c (build_over_call): Devirtualize when the final overrider
comes from the base.

/testsuite
2019-05-17  Paolo Carlini  

PR c++/67184
PR c++/69445
* g++.dg/other/final3.C: New.
* g++.dg/other/final4.C: Likewise.
* g++.dg/other/final5.C: Likewise.
Index: cp/call.c
===
--- cp/call.c   (revision 271296)
+++ cp/call.c   (working copy)
@@ -8241,7 +8241,7 @@ build_over_call (struct z_candidate *cand, int fla
   /* See if the function member or the whole class type is declared
 final and the call can be devirtualized.  */
   if (DECL_FINAL_P (fn)
- || CLASSTYPE_FINAL (TYPE_METHOD_BASETYPE (TREE_TYPE (fn))))
+ || CLASSTYPE_FINAL (TREE_TYPE (cand->access_path)))
flags |= LOOKUP_NONVIRTUAL;
 
   /* [class.mfct.nonstatic]: If a nonstatic member function of a class
Index: testsuite/g++.dg/other/final3.C
===
--- testsuite/g++.dg/other/final3.C (nonexistent)
+++ testsuite/g++.dg/other/final3.C (working copy)
@@ -0,0 +1,26 @@
+// PR c++/67184
+// { dg-do compile { target c++11 } }
+// { dg-options "-fdump-tree-original"  }
+
+struct V {
+ virtual void foo(); 
+};
+
+struct wV final : V {
+};
+
+struct oV final : V {
+  void foo();
+};
+
+void call(wV& x)
+{
+  x.foo();
+}
+
+void call(oV& x)
+{
+  x.foo();
+}
+
+// { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "original" } }
Index: testsuite/g++.dg/other/final4.C
===
--- testsuite/g++.dg/other/final4.C (nonexistent)
+++ testsuite/g++.dg/other/final4.C (working copy)
@@ -0,0 +1,16 @@
+// PR c++/67184
+// { dg-do compile { target c++11 } }
+// { dg-options "-fdump-tree-original"  }
+
+struct B
+{
+  virtual void operator()();
+  virtual operator int();
+  virtual int operator++();
+};
+
+struct D final : B { };
+
+void foo(D& d) { d(); int t = d; ++d; }
+
+// { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "original" } }
Index: testsuite/g++.dg/other/final5.C
===
--- testsuite/g++.dg/other/final5.C (nonexistent)
+++ testsuite/g++.dg/other/final5.C (working copy)
@@ -0,0 +1,19 @@
+// PR c++/69445
+// { dg-do compile { target c++11 } }
+// { dg-options "-fdump-tree-original"  }
+
+struct Base {
+  virtual void foo() const = 0;
+  virtual void bar() const {}
+};
+
+struct C final : Base {
+  void foo() const { }
+};
+
+void func(const C & c) {
+  c.bar();
+  c.foo();
+}
+
+// { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "original" } }


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Joseph Myers
On Tue, 14 May 2019, Maxim Kuvyrkov wrote:

> The scripts convert svn history branch by branch.  They rely on git-svn 
> to convert individual branches.  Git-svn is a good tool for converting 
> individual branches.  It is, however, either very slow at converting the 
> entire GCC repo, or goes into infinite loop.

I think git-svn is in fact a bad tool for repository conversion when the 
history is nontrivial (for the reasons that have been discussed at length 
in the past), and we should convert with reposurgeon.

ESR, can you give an update on the status of the conversion with 
reposurgeon?  You said "another serious attack on the repository 
conversion is probably about two months out" in 
.  Is it on target to be 
done by the time of the GNU Tools Cauldron in Montreal in September?

And, could you bring git://thyrsus.com/repositories/gcc-conversion.git up 
to date with changes since Jan 2018, or push the latest version of that 
repository to some other public hosting location?  That repository 
represents what I consider the collaboratively built consensus on such 
things as the desired author map (including handling of the ambiguous 
author name), which directories represent branches and tags, and what tags 
should be kept or removed - but building up such a consensus and keeping 
it up to date over time (for new committers etc.) requires that the public 
repository actually reflects the latest version of the conversion 
machinery, day by day as the consensus develops.  Review of that 
repository will be important for reviewing the details of whether the 
conversion is being done as desired - the details of the machinery will 
help suggest things to spot-check in a converted repository.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/3][GCC] GNAT/testsuite: Pass the `ada' option to target compilation

2019-05-16 Thread Jacob Bachmeyer

Maciej W. Rozycki wrote:
> On Wed, 15 May 2019, Jacob Bachmeyer wrote:
[...]
>  We are not consistent here in `gnat_target_compile' anyway, as you can
> see from the two existing `concat' invocations, and also the `timeout=300'
> element.

That is the GCC testsuite rather than DejaGnu itself, so it is less of a 
concern to me.

> > Perhaps {lappend options ada} might be simpler?  Is placing ada at the
> > beginning of the list important?
>
>  It can't be last because we override the default compiler otherwise
> selected by this option in `default_target_compile', and then options
> passed in may override it further.  Overall I felt it to be safer if we
> placed the compiler type selection first rather than somewhere in the
> middle.

This is probably a bug in DejaGnu, (those options should set defaults 
rather than override whatever else has been given) but you will still 
need to work around it for the installed base.

>  I hope it clears your concerns.


As far as the patch to GCC goes, I am not worried.

-- Jacob



[PATCH] Remove incorrect assertion from filesystem::absolute

2019-05-16 Thread Jonathan Wakely

The assertion is wrong, it should be *s.end() == 0, but that's not
allowed. Just remove it, but keep the comment.
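
To spell out why `s.back() == 0` was the wrong check, the standalone sketch below (not the libstdc++ code) shows that a view over a whole `std::wstring` ends before the terminator, and that the terminator can only be read validly through the owning string. `ViewCheck` and `check_view` are hypothetical names for this example.

```cpp
#include <string>
#include <string_view>

// Illustration only: a view over a whole std::wstring does not include the
// terminating null character.
struct ViewCheck {
  wchar_t last_in_view;    // s.back()
  wchar_t just_past_view;  // the character the end of the view points at
  bool same_end;           // view end coincides with the string's terminator
};

ViewCheck check_view() {
  std::wstring native = L"C:\\tmp";
  std::wstring_view s = native;
  s.remove_prefix(2);  // dropping a prefix leaves the end of the view unchanged

  ViewCheck r{};
  r.last_in_view = s.back();
  // Reading the terminator is valid through the owning string, where
  // operator[](size()) yields the null character; dereferencing s.end()
  // through the view itself is not permitted.
  r.just_past_view = native[native.size()];
  r.same_end = (s.data() + s.size() == native.c_str() + native.size());
  return r;
}
```

This is why the property survives only as a comment: it holds as long as `s` is only ever shortened from the front, but it cannot be asserted through the view.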

* src/c++17/fs_ops.cc (absolute(const path&, error_code&))
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Remove bogus assertion.

Tested x86_64-w64-mingw32, committed to trunk.

commit 6f4fd5fec3488ecc977a7d29f8538a934e6e35ac
Author: Jonathan Wakely 
Date:   Thu May 16 23:31:30 2019 +0100

Remove incorrect assertion from filesystem::absolute

The assertion is wrong, it should be *s.end() == 0, but that's not
allowed. Just remove it, but keep the comment.

* src/c++17/fs_ops.cc (absolute(const path&, error_code&))
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Remove bogus assertion.

diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 2d13b172d69..274ee7f0834 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -96,6 +96,7 @@ fs::absolute(const path& p, error_code& ec)
 }
 
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  // s must remain null-terminated
   wstring_view s = p.native();
 
   if (p.has_root_directory()) // implies !p.has_root_name()
@@ -108,9 +109,6 @@ fs::absolute(const path& p, error_code& ec)
   s.remove_prefix(std::min(s.length(), pos) - 1);
 }
 
-  // s must be null-terminated
-  __glibcxx_assert(!s.empty() && s.back() == 0);
-
   uint32_t len = 1024;
   wstring buf;
   do


Re: [PATCH, Darwin, PowerPC, testsuite] Exclude Darwin from VSX, Power8 and Power9 tests.

2019-05-16 Thread Segher Boessenkool
On Thu, May 16, 2019 at 12:03:14PM +0100, Iain Sandoe wrote:
> I did a quick check...
> 
> dfp.exp most (all?) fail despite
> 
> /* { dg-require-effective-target powerpc_p9vector_ok } */
> 
> with errors like this…
> 
> error: decimal floating point not supported for this target

Okay, so the test should probably work if the test would actually try to
use DFP, not just do a machine insn in asm (maybe just using a DFP constant
or variable as input to the asm is enough).

> a number (large enough)  of the bfp.exp tests fail despite
> 
> /* { dg-require-effective-target powerpc_p9vector_ok } */
> 
> with things like ...
> .../gcc.target/powerpc/bfp/scalar-extract-exp-5.c:13:15: error: unknown type 
> name '__ieee128'; did you mean '__int128’?
> 
> It could be that these are missing a require-effective-target-float128.

Yeah, something like that.  It looks like we have no selector for exactly
__ieee128 yet.

> Other differences are rather spread around the testsuite, so I’ve not 
> re-checked.
> 
> note that circa 1000 tests are attempted with the new assembler that were 
> unsupported with cctools.
> at least half of those fail.

Oh wow.

> > (I am motivated to have
> > parity between the cctools and newer assemblers in coverage on
> > Darwin for now, and then to try expanding the horizons when the basics
> > are working well).
> 
> 
> It’s helpful to me right now that tests that are UNSUPPORTED with the cctools 
> assembler are not attempted with the newer one.
> That makes a<->b comparisons easier, and helps highlight the cases where 
> tests fail because the new assembler has better error checking rather than 
> spurious attempts to do things that Darwin can't.

If you can make the selector fail on Darwin (rather than specific testcases),
this will all be fine, not too much churn.

> In the longer term, when the testsuite noise is manageable - we can try 
> backing these things out one at a time and see what new fails we get - and 
> either fix the cases individually (or put the blanket provision back, of 
> course).

Sounds perfect.  Thanks!


Segher


Re: [PATCH] Fix PR 81721: ICE with PCH and Pragma warning and C++ operator

2019-05-16 Thread Joseph Myers
On Mon, 1 Apr 2019, apin...@marvell.com wrote:

> From: Andrew Pinski 
> 
> Hi,
>   The problem here is the token->val.node is not saved over
> a precompiled header for C++ operator.  This can cause an
> internal compiler error as we tried to print out the spelling
> of the token as we assumed it was valid.
> The fix is to have cpp_token_val_index return CPP_TOKEN_FLD_NODE
> for operator tokens that have NAMED_OP set.
> 
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][PR90106] Builtin call transformation changes in cdce pass

2019-05-16 Thread Jakub Jelinek
On Thu, May 16, 2019 at 11:39:38PM +0200, Jakub Jelinek wrote:
> One possibility is to add -fdump-tree-optimized and scan for
> /* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> resp.
> /* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */

Here it is in patch form.

That said, I'm not convinced your patch does what you wanted, because
comparing a month old trunk with today's trunk generates the same assembly
except for .ident, generates as many [tail call] lines in *.optimized dump
as before, emits the same number of jmp\tpow and jmp\tlog instructions as
before (one in a separate routine).

But at least the tests aren't UNSUPPORTED anymore.
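
For readers unfamiliar with the Tcl quoting in those directives: once Tcl has processed the backslashes, the pattern is `pow \([^\n\r]*\); \[tail call\]`, that is, the callee name, a parenthesised argument list on one line, and a literal `[tail call]` marker. The sketch below checks an equivalent ECMAScript regex against sample lines. The sample dump lines in the usage are assumed for illustration rather than copied from a real `*.optimized` dump, and `has_tail_call` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Illustration only: an ECMAScript equivalent of the Tcl pattern used in the
// dg-final directives.  `fn` is inserted unescaped, so it must be a plain
// identifier such as "pow" or "log".
bool has_tail_call(const std::string& dump_line, const std::string& fn) {
  const std::regex pattern(fn + R"( \([^\n\r]*\); \[tail call\])");
  return std::regex_search(dump_line, pattern);
}
```

For example, `has_tail_call("  _5 = pow (x_2(D), y_3(D)); [tail call]", "pow")` matches, while the same line without the trailing marker does not.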

2019-05-16  Jakub Jelinek  

PR tree-optimization/90106
* gcc.dg/cdce1.c: Don't scan-assembler, instead -fdump-tree-optimized
and scan-tree-dump for tail call.
* gcc.dg/cdce2.c: Likewise.

--- gcc/testsuite/gcc.dg/cdce1.c.jj 2019-05-16 11:28:22.750177582 +0200
+++ gcc/testsuite/gcc.dg/cdce1.c2019-05-16 23:50:23.618450891 +0200
@@ -1,9 +1,9 @@
-/* { dg-do  run  } */
-/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
+/* { dg-do run } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
 /* { dg-require-effective-target int32plus } */
-/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
-/* { dg-final { scan-assembler "jmp pow" } } */
 /* { dg-require-effective-target large_double } */
+/* { dg-final { scan-tree-dump "cdce1.c:17: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
 
 #include 
 #include 
--- gcc/testsuite/gcc.dg/cdce2.c.jj 2019-05-16 11:28:22.781177075 +0200
+++ gcc/testsuite/gcc.dg/cdce2.c2019-05-16 23:50:58.505880845 +0200
@@ -1,8 +1,8 @@
-/* { dg-do  run  } */
+/* { dg-do run } */
 /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
-/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
-/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
-/* { dg-final { scan-assembler "jmp log" } } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
+/* { dg-final { scan-tree-dump "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
  
 #include 
 #include 


Jakub


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Jonathan Wakely

On 16/05/19 13:07 -0600, Jeff Law wrote:
> On 5/16/19 12:36 PM, Ramana Radhakrishnan wrote:
> > On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov wrote:
> > > > On May 16, 2019, at 7:22 PM, Jeff Law wrote:
> > > >
> > > > On 5/15/19 5:19 AM, Richard Biener wrote:
> > > > > For the official converted repo do we really want all (old)
> > > > > development branches to be in the
> > > > > main git repo?  I suppose we could create a readonly git from the
> > > > > state of the whole repository
> > > > > at the point of conversion (and also keep the SVN in readonly mode),
> > > > > just to make migration
> > > > > of content we want easy in the future?
> > > >
> > > > I've always assumed we'd keep the old SVN tree read-only for historical
> > > > purposes.  I strongly suspect that, ignoring release branches, that
> > > > non-active branches just aren't terribly interesting.
> > >
> > > Let's avoid mixing the two discussions: (1) converting svn repo to git (and
> > > getting community consensus to switch to git) and (2) deciding on which
> > > branches to keep in the new repo.
> >
> > I'm hoping that there is still community consensus to switch to git.
> >
> > Personally speaking, a +1 to switch to git.
>
> Absolutely +1 for converting as well.

Yes please!

Thanks for working on this, Maxim.




[PATCH] i386: Enable MMX intrinsics without SSE/SSE2/SSSE3

2019-05-16 Thread H.J. Lu
Since MMX intrinsics are marked with SSE/SSE2/SSSE3 for SSE emulation,
enable them without SSE/SSE2/SSSE3 if MMX is enabled.

Restore TARGET_3DNOW check, which was changed to TARGET_3DNOW_A by
revision 271235.

gcc/

PR target/90497
* config/i386/i386-expand.c (ix86_expand_builtin): Enable MMX
intrinsics without SSE/SSE2/SSSE3.
* config/i386/mmx.md (mmx_uavgv8qi3): Restore TARGET_3DNOW
check.
(*mmx_uavgv8qi3): Likewise.

gcc/testsuite/

PR target/90497
* gcc.target/i386/pr90497-1.c: New test.
* gcc.target/i386/pr90497-2.c: Likewise.
---
 gcc/config/i386/i386-expand.c |  6 --
 gcc/config/i386/mmx.md|  4 ++--
 gcc/testsuite/gcc.target/i386/pr90497-1.c | 12 
 gcc/testsuite/gcc.target/i386/pr90497-2.c | 11 +++
 4 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90497-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90497-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index df035607fa7..35aadefdef3 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10937,8 +10937,10 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
subtarget,
   && (isa & (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4)) != 0)
 isa |= (OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4);
   /* Use SSE/SSE2/SSSE3 to emulate MMX intrinsics in 64-bit mode when
- MMX is disabled.  */
-  if (TARGET_MMX_WITH_SSE)
+ MMX is disabled.  NB: Since MMX intrinsics are marked with
+ SSE/SSE2/SSSE3, enable them without SSE/SSE2/SSSE3 if MMX is
+ enabled.  */
+  if (TARGET_MMX_WITH_SSE || TARGET_MMX)
 {
   if (((bisa & (OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_MMX))
   == (OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_MMX))
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 29bcf931836..adad950fa04 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1745,7 +1745,7 @@
  (const_int 1) (const_int 1)]))
(const_int 1]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
-   && (TARGET_SSE || TARGET_3DNOW_A)"
+   && (TARGET_SSE || TARGET_3DNOW)"
   "ix86_fixup_binary_operands_no_copy (PLUS, V8QImode, operands);")
 
 (define_insn "*mmx_uavgv8qi3"
@@ -1764,7 +1764,7 @@
  (const_int 1) (const_int 1)]))
(const_int 1]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
-   && (TARGET_SSE || TARGET_3DNOW_A)
+   && (TARGET_SSE || TARGET_3DNOW)
&& ix86_binary_operator_ok (PLUS, V8QImode, operands)"
 {
   /* These two instructions have the same operation, but their encoding
diff --git a/gcc/testsuite/gcc.target/i386/pr90497-1.c 
b/gcc/testsuite/gcc.target/i386/pr90497-1.c
new file mode 100644
index 000..ed6ded7efbc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90497-1.c
@@ -0,0 +1,12 @@
+/* PR target/90497 */
+/* { dg-do compile } */
+/* { dg-options "-mno-sse -mmmx" { target ia32 } } */
+/* { dg-options "-mno-mmx" { target { ! ia32 } } } */
+
+typedef char __v8qi __attribute__ ((__vector_size__ (8)));
+
+__v8qi
+foo (__v8qi x, __v8qi y)
+{
+  return __builtin_ia32_pcmpeqb (x, y);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr90497-2.c 
b/gcc/testsuite/gcc.target/i386/pr90497-2.c
new file mode 100644
index 000..99ee5756b76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90497-2.c
@@ -0,0 +1,11 @@
+/* PR target/90497 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-mno-sse -m3dnow" } */
+
+typedef char __v8qi __attribute__ ((__vector_size__ (8)));
+
+__v8qi
+foo (__v8qi x, __v8qi y)
+{
+  return __builtin_ia32_pavgusb (x, y);
+}
-- 
2.20.1



[committed] Fix ICE in equal_mem_array_ref_p (PR c++/90484)

2019-05-16 Thread Jakub Jelinek
Hi!

As mentioned in the PR, if we are very unlucky and have a hash collision
not just when hash % hash table size is equal, but when the whole 32-bit
hash is equal, we can actually end up with compatible types (bool vs.
unsigned : 1 on the testcase), but sz0 != sz1 (one is 1-bit, the other
8-bit), as one expression is a COMPONENT_REF with a bitfield and the other
is a bool MEM_REF.
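
The general point can be sketched outside GCC: an equality callback used by a hash table has to tolerate a full-hash collision and answer "not equal" on a size mismatch instead of asserting that it cannot happen. The `MemRef` struct and `equal_mem_ref` below are hypothetical stand-ins for illustration only, not the real hashable_expr code.

```cpp
// Hypothetical stand-in for a decomposed memory reference; the real
// code works on trees and poly_int sizes/offsets.
struct MemRef
{
  const void *base;   // base object of the access
  long offset;        // bit offset from the base
  long size;          // access size in bits
  bool reverse;       // reverse storage order flag
};

// Two references are equal only if every component matches.  A size
// mismatch is an ordinary "not equal" result, not an internal error.
bool
equal_mem_ref (const MemRef &a, const MemRef &b)
{
  if (a.reverse != b.reverse || a.size != b.size || a.offset != b.offset)
    return false;
  return a.base == b.base;
}
```

With a 1-bit bitfield access and an 8-bit bool access at the same base and offset, this comparison simply reports a mismatch.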

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
preapproved by Richard on IRC, committed to trunk.

Testcase is not included, as it is not reducible.

2019-05-16  Jakub Jelinek  

PR c++/90484
* tree-ssa-scopedtables.c (equal_mem_array_ref_p): Don't assert that
sz0 is equal to sz1, instead return false in that case.

--- gcc/tree-ssa-scopedtables.c.jj  2019-05-07 13:56:54.342933630 +0200
+++ gcc/tree-ssa-scopedtables.c 2019-05-16 14:35:23.353145695 +0200
@@ -537,13 +537,10 @@ equal_mem_array_ref_p (tree t0, tree t1)
   || maybe_ne (sz1, max1))
 return false;
 
-  if (rev0 != rev1)
+  if (rev0 != rev1 || maybe_ne (sz0, sz1) || maybe_ne (off0, off1))
 return false;
 
-  /* Types were compatible, so this is a sanity check.  */
-  gcc_assert (known_eq (sz0, sz1));
-
-  return known_eq (off0, off1) && operand_equal_p (base0, base1, 0);
+  return operand_equal_p (base0, base1, 0);
 }
 
 /* Compare two hashable_expr structures for equivalence.  They are

Jakub


Re: [PATCH v2 3/3] Consider doloop cmp use in ivopts

2019-05-16 Thread Segher Boessenkool
Hi Jeff,

On Thu, May 16, 2019 at 12:41:16PM -0600, Jeff Law wrote:
> For architectures like PPC, we probably don't want to use the loop count
> for anything else as it's likely expensive to get data in/out of the the
> loop count register.

That is part of it.  Another part is that it costs extra code, negating
one of the advantages of using these instructions.  And a third reason
we do not want this is that on some implementations you have to load the
count register early enough to get the loop predicted correctly.

> So at least part of the problem is cost modeling of this.  It's all
> pretty low level, so not really a good match for the goals of gimple.
> But we may ultimately have no choice here but to be pragmatic like we've
> done with stuff like vector widths and allow some target properties to
> bleed in.

*All* of ivopts is low level in this sense: *all* of it is about finding
out what IVs to use such that it is lowest cost on the target.

Other than costs it doesn't use many target attributes.  For doloop it
would also ask the target whether some loop can be a doloop at all.  So
everything it does is quite high level still, but it *does* have to know
about some very machine-specific things.

Maybe two hooks for that: one, taking a struct loop, to decide if that
loop should be considered for a doloop at all; and another taking a
gimple statement, and returning whether that statement prevents the
loop it is in from being a doloop.  That way we do not have to pass
a lot of gimple data and work to the backends.  Most can just look at
some of the simple loop properties ("is this an inner loop?"), and
allow all statements or just disallow some particular types.
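
A rough sketch of how such a pair of hooks could fit together is shown below. Everything in it is hypothetical: `Loop`, `StmtKind`, `DoloopHooks` and the example target functions are simplified stand-ins invented for illustration, not GCC's struct loop, gimple, or any existing target hook, and the choice of which statement kind to reject is arbitrary.

```cpp
#include <vector>

// Hypothetical, simplified statement classification.
enum class StmtKind { Assign, Call, Switch };

// Hypothetical, simplified loop description.
struct Loop
{
  bool innermost;
  std::vector<StmtKind> stmts;
};

// The two hypothetical hooks: one per loop, one per statement.
struct DoloopHooks
{
  bool (*loop_candidate_p) (const Loop &);
  bool (*stmt_prevents_doloop_p) (StmtKind);
};

// Generic side: ask the target about the loop first, then about each
// statement, so the target never has to walk the loop body itself.
bool
doloop_possible_p (const Loop &loop, const DoloopHooks &hooks)
{
  if (!hooks.loop_candidate_p (loop))
    return false;
  for (StmtKind kind : loop.stmts)
    if (hooks.stmt_prevents_doloop_p (kind))
      return false;
  return true;
}

// Example target: only inner loops, and (arbitrarily) no switch statements.
bool
example_loop_candidate_p (const Loop &loop)
{
  return loop.innermost;
}

bool
example_stmt_prevents_doloop_p (StmtKind kind)
{
  return kind == StmtKind::Switch;
}
```

The split keeps the per-target code down to two small predicates, which is the point made above about not passing a lot of gimple data to the backends.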

> > Otherwise I understand that IVOPTs doesn't properly cost
> > the doloop IV update and conditional branch.  That's clearly
> > something we should fix (maybe even indepenently on other
> > changes). 
> It feels independent to me.

It cannot cost things properly if nothing has yet decided whether some
loop could (or should) be a doloop :-)


Segher


Re: [PATCH][PR90106] Builtin call transformation changes in cdce pass

2019-05-16 Thread Jakub Jelinek
On Wed, May 08, 2019 at 06:09:06PM +0800, JunMa wrote:
> 2019-05-07  Jun Ma 
> 
>     PR tree-optimization/90106
>     * gcc.dg/cdce1.c: Check tailcall code generation after cdce pass.
>     * gcc.dg/cdce2.c: Likewise.

This is wrong and results in UNSUPPORTED failures.
Both tests are dg-do run, so you can't scan-assembler for them, assembly
isn't generated at all (it could be if -save-temps is in dg-options).
Furthermore, I don't think any target actually spells the tail call in the
assembly as jmp space pow/log; x86_64/i686, say, use jmp tab pow/log, and
most other targets spell it completely differently (and some targets don't
support tail calls at all, or have various limitations for them etc.).

One possibility is to add -fdump-tree-optimized and scan for
/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
resp.
/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */

> diff --git a/gcc/testsuite/gcc.dg/cdce1.c b/gcc/testsuite/gcc.dg/cdce1.c
> index b23ad63..424d80f 100644
> --- a/gcc/testsuite/gcc.dg/cdce1.c
> +++ b/gcc/testsuite/gcc.dg/cdce1.c
> @@ -1,7 +1,8 @@
>  /* { dg-do  run  } */
>  /* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
>  /* { dg-require-effective-target int32plus } */
> -/* { dg-final { scan-tree-dump  "cdce1.c:16: .* function call is 
> shrink-wrapped into error conditions\."  "cdce" } } */
> +/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is 
> shrink-wrapped into error conditions\."  "cdce" } } */
> +/* { dg-final { scan-assembler "jmp pow" } } */
>  /* { dg-require-effective-target large_double } */
>  
>  #include 
> diff --git a/gcc/testsuite/gcc.dg/cdce2.c b/gcc/testsuite/gcc.dg/cdce2.c
> index 30e7cb1..2af2893 100644
> --- a/gcc/testsuite/gcc.dg/cdce2.c
> +++ b/gcc/testsuite/gcc.dg/cdce2.c
> @@ -1,7 +1,8 @@
>  /* { dg-do  run  } */
>  /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
>  /* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
> -/* { dg-final { scan-tree-dump  "cdce2.c:15: .* function call is 
> shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is 
> shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-assembler "jmp log" } } */
>   
>  #include 
>  #include 

Jakub


[RFC PATCH] Enhancements to profiledbootstrap

2019-05-16 Thread Arvind Sankar
Hi, I've been playing some with the PGO build infrastructure and have a
few changes I thought I'd share and get feedback on whether they're
completely crazy or not. I'm not terribly familiar with the innards of
the build infra, so would appreciate any comments and suggestions.

First, a recap of the current PGO build process -- please let me know if
I'm wrong about anything:
For a profiledbootstrap build, we replace normal stage{2,3,4}
respectively with an instrumented stageprofile, a non-instrumented
stagetrain and finally a stagefeedback using the profile created when
building stagetrain.

I had two main goals in doing these changes:
1. profiledbootstrap does not do any comparison build, unlike the
regular bootstrap, so it is possible that the end product is actually
broken. Goal 1: try to incorporate this.
2. The profiling data comes from the stageprofile -> stagetrain build
and that run does not include many optimization passes (at least by
default at -O2) because those would only get enabled when profiling data
is available. Goal 2: try to create a bootstrap target that would
incorporate data from these passes.

Goal 1: Comparison stage
I started on goal 1 with the idea that if we built stagetrain with
instrumentation as well, we could just compare stageprofile and
stagetrain like we do with stage2/3.

This runs into a few roadblocks, however, on which I would appreciate
comments:
a) profiling data starts getting generated while building stageprofile,
since parts of that process involve running newly compiled executables.
In the current build that doesn't cause any issues, we just throw that
away and only use profiling data generated during the profile->train
build, which should be extensive enough.
This will not work if stagetrain is built with instrumentation,
as it will be appending profiling data to the same files that it is
using, during the train->feedback build. To resolve this, I changed the
build to save profiling data in an external directory, so the two stages
write profiling data into different places. Unfortunately, this results
in the path to that location getting saved into each object file, which
makes it impossible to compare them. It might be possible to compare
just the .text sections, or to pass different GCOV_PREFIX overrides
for build vs host tools. Instead, I decided to add the possibility of
rebuilding stagefeedback a second time using the profile->train data
and using that as the comparison. That should be the right comparison
anyway, as it is a comparison of the final build product.

It may also be possible to solve this by saving the profile data in the
same place for the two stages and making a copy of it to use for the
train->feedback run, but I haven't explored this yet. It would result in
profiling data that is a mix of the stageprofile and stagetrain
compilers, but that might be okay given that they should be identical in
control flow.

b) I do get a few differences that are somewhat random: it looks like in
some cases the second run arranges functions in a different order from
the first run even though it is using the same profile data. Is this
known/is there a way to prevent it?

Goal 2: Second feedback stage
Nothing special here, it builds a new stagefeedbackfull using the
train->feedback profile. It does produce a different compiler so there's
some effect but I haven't benchmarked improvements to see if it's
measurably better.

Testing was done on x86_64-pc-linux-gnu, with default configure settings
except for --enable-languages=c,c++ --disable-werror. I've bootstrapped
PGO with/without --with-build-config=bootstrap-lto.

Summary of changes:
a) Add three new stages -- feedbackcompare, feedbackfull,
feedbackfullcompare with the two *compare stages to be used for
comparing with the previous ones. Question about gcc/*/Make-lang.in: I
see that these have rules at the end for, e.g., c.stage*. Are these
necessary or vestigial? stagetrain is not there currently and I didn't
add any of the new ones either.
b) Modify stagetrain to be built instrumented, and change profiling
output directories. Note that this is currently wasteful of build time
if you're going to stop with profiledbootstrap, so perhaps this should
be controlled via a build-config so it is enabled only for the *full
bootstraps.
c) Cleaned up bootstrap-lto{-lean}.mk a bit. It appears unnecessary to
set all the individual stage flags -- if someone wants to customize them
they can just override STAGE{2,3,4}_FLAGS to get the same effect. I also
added STAGE4_CFLAGS in there, and added -frandom-seed=1 and do-compare3
in bootstrap-lto-lean in case the user wants to do a bootstrap4. For
bootstrap-lto-noplugin.mk I noticed that the profiling stages were added
but without -ffat-lto-objects, that should get fixed by the patch
although it appears unlikely someone would be doing such a build.
d) If one does a non-LTO PGO build currently, the LTO frontend doesn't
get profiled. I modified the main Makefi

Re: [PATCH] libfortran/90038: Use posix_spawn instead of fork

2019-05-16 Thread Thomas Koenig

Hi Janne,


I differ there.


A longer explanation:

fork() is standard POSIX. Not all systems have posix_spawn.  For
those systems which do not have it, we would cause a regression
by simply removing that functionality for this.

The patch is OK from my side if you add fork() as a fallback option.
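
A minimal sketch of what "posix_spawn with fork() as a fallback" could look like is given below. It is not the libgfortran patch: `start_shell_command`, `wait_for_exit_status` and the `HAVE_POSIX_SPAWN` macro are hypothetical names chosen for illustration, and the sketch assumes a POSIX system with /bin/sh.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern "C" char **environ;

// Hypothetical configure-style macro; define it to 0 to use the fork() path.
#ifndef HAVE_POSIX_SPAWN
#define HAVE_POSIX_SPAWN 1
#endif

// Start "sh -c CMD" without waiting for it.  Returns the child's pid,
// or -1 if the child could not be started.
pid_t
start_shell_command (const char *cmd)
{
  char *const argv[] = { const_cast<char *> ("sh"),
                         const_cast<char *> ("-c"),
                         const_cast<char *> (cmd),
                         nullptr };
#if HAVE_POSIX_SPAWN
  pid_t pid;
  if (posix_spawn (&pid, "/bin/sh", nullptr, nullptr, argv, environ) != 0)
    return -1;
  return pid;
#else
  // Fallback: classic fork + exec.
  pid_t pid = fork ();
  if (pid == 0)
    {
      execv ("/bin/sh", argv);
      _exit (127);  // Only reached if execv failed.
    }
  return pid;
#endif
}

// Wait for the child and return its exit status, or -1 on error or if
// the child did not exit normally.
int
wait_for_exit_status (pid_t pid)
{
  int status;
  if (pid < 0 || waitpid (pid, &status, 0) < 0)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Both paths return a pid that the caller can wait on later, so the rest of the code does not need to know which mechanism was used.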

Regards

Thomas


Re: [PATCH] libfortran/90038: Use posix_spawn instead of fork

2019-05-16 Thread Thomas Koenig

Am 16.05.19 um 22:10 schrieb Janne Blomqvist:

On Thu, May 16, 2019 at 10:59 PM Thomas Koenig  wrote:


Hi Janne,


fork() semantics can be problematic.  Most unix style OS'es have
posix_spawn which can be used to replace fork + exec in many cases.
For more information see
e.g. 
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

This replaces the one use of fork in libgfortran with posix_spawn.


Do I understand the patch correctly that we would no longer use fork()
if posix_spawn is not available?  I think we should leave that in as
a fallback option.


Yes. But there is already a fallback in case posix_spawn (or
previously, fork) is not available, namely falling back to synchronous
behavior. Since this is anyway somewhat of a corner case (namely, with
wait=.false.), and posix_spawn is supported on all (well, at least
Linux, macOS, *BSD, cygwin, Solaris, AIX) remotely modern unix type
systems, a further fallback to fork() is IMHO not warranted.


I differ there.

Regards

Thomas



Re: [PATCH] Changes to std::variant to reduce code size

2019-05-16 Thread Ville Voutilainen
On Thu, 16 May 2019 at 23:28, Jonathan Wakely  wrote:
> Here's what I've tested and am about to commit.

Looks good to me.


Re: [PATCH] Changes to std::variant to reduce code size

2019-05-16 Thread Jonathan Wakely

On 16/05/19 12:43 +0100, Jonathan Wakely wrote:

On 16/05/19 12:29 +0100, Jonathan Wakely wrote:

These two changes both result in smaller code for std::variant.

The first one means smaller tables of function pointers, because we
don't generate an instantiation for the valueless state. Instead we do
a runtime branch, marked [[unlikely]] to make _M_reset() a no-op if
it's already valueless. In a microbenchmark I couldn't measure any
performance difference due to the extra branch, so the code size
reduction seems worthwhile.

The second one removes a branch from the index() member by relying on
unsigned arithmetic. That also results in smaller code and I can't see
any downside.
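
The unsigned-arithmetic trick can be illustrated in isolation. The sketch below assumes the index is stored in a narrow unsigned type whose all-ones value marks the valueless state; `widen_index` is a hypothetical free function written for this illustration, not the libstdc++ member.

```cpp
#include <cstddef>
#include <type_traits>

// Widen a narrow stored index to size_t without a branch.
// Adding 1 and converting back to IndexType makes the narrow "npos"
// (all ones) wrap to 0; subtracting 1 after widening then yields
// size_t(-1).  Any other value v becomes (v + 1) - 1 == v.
template <typename IndexType>
constexpr std::size_t
widen_index (IndexType stored)
{
  static_assert (std::is_unsigned<IndexType>::value,
                 "the trick relies on unsigned wrap-around");
  return std::size_t (IndexType (stored + 1)) - 1;
}

static_assert (widen_index<unsigned char> (3) == 3, "ordinary index");
static_assert (widen_index<unsigned char> (255) == static_cast<std::size_t> (-1),
               "narrow npos widens to the full-width npos");
```

The same expression handles both the valid and the valueless case, which is why the explicit comparison against the narrow npos value is no longer needed.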

* include/std/variant (_Variant_storage::_M_reset):
Replace raw visitation with a runtime check for the valueless state
and a non-raw visitor.
(_Variant_storage::_M_reset_impl): Remove.
(variant::index()): Remove branch.


We might also want:

--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1503,7 +1503,7 @@ namespace __variant
}
  else
{
-   if (this->index() != variant_npos)
+   if (!this->valueless_by_exception()) [[__likely__]]
{
  auto __tmp(std::move(__rhs_mem));
  __rhs = std::move(*this);
@@ -1520,7 +1520,7 @@ namespace __variant
}
  else
{
-   if (this->index() != variant_npos)
+   if (!this->valueless_by_exception()) [[__likely__]]
{
  __rhs = std::move(*this);
  this->_M_reset();


This results in smaller code too, because for some specializations
valueless_by_exception() always returns false, so the branch can be
removed.

(This suggests that it's generally better to ask the yes/no question
"are you valid?" rather than "what is your index, and does it equal
this magic number?")

For specializations where a valueless state is possible we still
expect it to be very unlikely in practice, so the attribute should
help there.


Here's what I've tested and am about to commit.


commit d4e4bd9e53d81f410a8ae289d3b67d0295f8da96
Author: Jonathan Wakely 
Date:   Wed May 15 22:33:31 2019 +0100

Changes to std::variant to reduce code size

* include/std/variant (_Variant_storage::_M_reset):
Replace raw visitation with a runtime check for the valueless state
and a non-raw visitor.
(_Variant_storage::_M_reset_impl): Remove.
(variant::index()): Remove branch.
(variant::swap(variant&)): Use valueless_by_exception() instead of
comparing the index to variant_npos, and add likelihood attribute.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 8c710c30de5..101b8945943 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -396,19 +396,16 @@ namespace __variant
 	_M_index(_Np)
 	{ }
 
-  constexpr void _M_reset_impl()
-  {
-	__variant::__raw_visit([](auto&& __this_mem) mutable
-	  {
-	if constexpr (!is_same_v,
-			  __variant_cookie>)
-	  std::_Destroy(std::__addressof(__this_mem));
-	  }, __variant_cast<_Types...>(*this));
-  }
-
   void _M_reset()
   {
-	_M_reset_impl();
+	if (!_M_valid()) [[unlikely]]
+	  return;
+
+	std::__do_visit([](auto&& __this_mem) mutable
+	  {
+	std::_Destroy(std::__addressof(__this_mem));
+	  }, __variant_cast<_Types...>(*this));
+
 	_M_index = variant_npos;
   }
 
@@ -1485,12 +1482,7 @@ namespace __variant
   { return !this->_M_valid(); }
 
   constexpr size_t index() const noexcept
-  {
-	if (this->_M_index ==
-	typename _Base::__index_type(variant_npos))
-	  return variant_npos;
-	return this->_M_index;
-  }
+  { return size_t(typename _Base::__index_type(this->_M_index + 1)) - 1; }
 
   void
   swap(variant& __rhs)
@@ -1511,7 +1503,7 @@ namespace __variant
 		  }
 		else
 		  {
-		if (this->index() != variant_npos)
+		if (!this->valueless_by_exception()) [[__likely__]]
 		  {
 			auto __tmp(std::move(__rhs_mem));
 			__rhs = std::move(*this);
@@ -1528,7 +1520,7 @@ namespace __variant
 	  }
 	else
 	  {
-		if (this->index() != variant_npos)
+		if (!this->valueless_by_exception()) [[__likely__]]
 		  {
 		__rhs = std::move(*this);
 		this->_M_reset();


[PATCH] Implement sane variant converting constructor (P0608R3)

2019-05-16 Thread Jonathan Wakely

* include/std/variant (__overload_set): Remove.
(_Arr): New helper.
(_Build_FUN): New class template to define a single FUN overload,
with specializations to prevent unwanted conversions, as per P0608R3.
(_Build_FUNs): New class template to build an overload set of FUN.
(_FUN_type): New alias template to perform overload resolution.
(__accepted_type): Use integer_constant base for failure case. Use
_FUN_type for successful case.
(variant::__accepted_index): Use _Tp instead of _Tp&&.
(variant::variant(_Tp&&)): Likewise.
(variant::operator=(_Tp&&)): Likewise.

Tested powerpc64le-linux, committed to trunk.
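
The narrowing check behind the new helper can be tried on its own. The sketch below is a simplified illustration, not the libstdc++ code: `Arr` mirrors the idea of the one-element array helper, while `converts_without_narrowing` is a hypothetical trait name. It relies on list-initialization rejecting narrowing conversions, and needs C++17 for std::void_t.

```cpp
#include <type_traits>
#include <utility>

// One-element array wrapper: Arr<Ti>{{expr}} list-initializes a Ti,
// which is ill-formed if the conversion from expr is narrowing.
template <typename Ti>
struct Arr { Ti x[1]; };

// Primary template: the conversion is rejected.
template <typename Ti, typename T, typename = void>
struct converts_without_narrowing : std::false_type { };

// Chosen only when Ti x[] = {std::declval<T>()} is well-formed.
template <typename Ti, typename T>
struct converts_without_narrowing<
  Ti, T, std::void_t<decltype (Arr<Ti>{{std::declval<T> ()}})>>
: std::true_type { };

static_assert (converts_without_narrowing<double, float>::value,
               "float to double does not narrow");
static_assert (!converts_without_narrowing<float, double>::value,
               "double to float narrows");
static_assert (!converts_without_narrowing<int, long long>::value,
               "long long to int narrows");
static_assert (converts_without_narrowing<long long, int>::value,
               "int to long long does not narrow");
```

An alternative whose conversion would narrow drops out of the candidate set this way, instead of being selected by ordinary overload resolution.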


commit 0d7f794880e6d7ee6452555e4cd7308f324e402f
Author: Jonathan Wakely 
Date:   Thu May 16 20:41:16 2019 +0100

Implement sane variant converting constructor (P0608R3)

* include/std/variant (__overload_set): Remove.
(_Arr): New helper.
(_Build_FUN): New class template to define a single FUN overload,
with specializations to prevent unwanted conversions, as per P0608R3.
(_Build_FUNs): New class template to build an overload set of FUN.
(_FUN_type): New alias template to perform overload resolution.
(__accepted_type): Use integer_constant base for failure case. Use
_FUN_type for successful case.
(variant::__accepted_index): Use _Tp instead of _Tp&&.
(variant::variant(_Tp&&)): Likewise.
(variant::operator=(_Tp&&)): Likewise.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 101b8945943..eec41750da7 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -161,7 +161,7 @@ namespace __detail
 {
 namespace __variant
 {
-  // Returns the first apparence of _Tp in _Types.
+  // Returns the first appearence of _Tp in _Types.
   // Returns sizeof...(_Types) if _Tp is not in _Types.
   template
 struct __index_of : std::integral_constant {};
@@ -727,41 +727,65 @@ namespace __variant
 inline constexpr bool __exactly_once =
   __tuple_count_v<_Tp, tuple<_Types...>> == 1;
 
-  // Takes _Types and create an overloaded _S_fun for each type.
-  // If a type appears more than once in _Types, create only one overload.
-  template
-struct __overload_set
-{ static void _S_fun(); };
+  // Helper used to check for valid conversions that don't involve narrowing.
+  template struct _Arr { _Ti _M_x[1]; };
 
-  template
-struct __overload_set<_First, _Rest...> : __overload_set<_Rest...>
+  // Build an imaginary function FUN(Ti) for each alternative type Ti
+  template, bool>,
+	   typename = void>
+struct _Build_FUN
 {
-  using __overload_set<_Rest...>::_S_fun;
-  static integral_constant _S_fun(_First);
+  // This function means 'using _Build_FUN::_S_fun;' is valid,
+  // but only static functions will be considered in the call below.
+  void _S_fun();
 };
 
-  template
-struct __overload_set : __overload_set<_Rest...>
+  // ... for which Ti x[] = {std::forward(t)}; is well-formed,
+  template
+struct _Build_FUN<_Ind, _Tp, _Ti, false,
+		  void_t{{std::declval<_Tp>()}})>>
 {
-  using __overload_set<_Rest...>::_S_fun;
+  // This is the FUN function for type _Ti, with index _Ind
+  static integral_constant _S_fun(_Ti);
 };
 
-  // Helper for variant(_Tp&&) and variant::operator=(_Tp&&).
-  // __accepted_index maps an arbitrary _Tp to an alternative type in _Variant
-  // (or to variant_npos).
+  // ... and if Ti is cv bool, remove_cvref_t is bool.
+  template
+struct _Build_FUN<_Ind, _Tp, _Ti, true,
+		  enable_if_t, bool>>>
+{
+  // This is the FUN function for when _Ti is cv bool, with index _Ind
+  static integral_constant _S_fun(_Ti);
+};
+
+  template>>
+struct _Build_FUNs;
+
+  template
+struct _Build_FUNs<_Tp, variant<_Ti...>, index_sequence<_Ind...>>
+: _Build_FUN<_Ind, _Tp, _Ti>...
+{
+  using _Build_FUN<_Ind, _Tp, _Ti>::_S_fun...;
+};
+
+  // The index j of the overload FUN(Tj) selected by overload resolution
+  // for FUN(std::forward<_Tp>(t))
+  template
+using _FUN_type
+  = decltype(_Build_FUNs<_Tp, _Variant>::_S_fun(std::declval<_Tp>()));
+
+  // The index selected for FUN(std::forward(t)), or variant_npos if none.
   template
 struct __accepted_index
-{ static constexpr size_t value = variant_npos; };
+: integral_constant
+{ };
 
-  template
-struct __accepted_index<
-  _Tp, variant<_Types...>,
-  void_t::_S_fun(std::declval<_Tp>()))>>
-{
-  static constexpr size_t value = sizeof...(_Types) - 1
-	- decltype(__overload_set<_Types...>::
-		   _S_fun(std::declval<_Tp>()))::value;
-};
+  template
+struct __accepted_index<_Tp, _Variant, void_t<_FUN_type<_Tp, _Variant>>>
+: _FUN_type<_Tp, _Variant>
+{ };
 
   // Returns the raw storage 

Re: patches to detect GCC diagnostics

2019-05-16 Thread Martin Sebor

On 5/16/19 8:58 AM, Roland Illig wrote:

Hi Martin,

I'm impressed how much work you have put into the patches for detecting
nonoptimal diagnostics. It takes a long time to read through the
patches, but it's worth it, knowing that it took much longer for you to
prepare the patch, and that I won't have to submit i18n bug reports in
the foreseeable future. :)


Thanks.  That's the idea :)





+  /* Diagnose "arg" (short for "argument" when lazy).  */
+  if (!strncmp (format_chars, "arg", 3)
+ && (!format_chars[3]
+ || format_chars[3] == 's'
+ || ISSPACE (format_chars[3])))

Wouldn't it be sufficient to just check for !ISALNUM(format_chars[3])?
This would also catch "specify args, return type and so on".


I've improved the test in the revised patch.
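
For illustration only, one way to read the suggestion is "arg" or "args" followed by anything that is not a letter or digit. The function below is a hypothetical sketch of that reading using the standard std::isalnum; it is not the code from the revised patch.

```cpp
#include <cctype>
#include <cstring>

// Return true if TEXT starts with the abbreviation "arg" or "args"
// as a whole word, i.e. followed by end of string, whitespace,
// punctuation, or any other non-alphanumeric character.
bool
is_arg_abbreviation (const char *text)
{
  if (std::strncmp (text, "arg", 3) != 0)
    return false;
  const char *end = text + 3;
  if (*end == 's')   // Allow the plural form.
    ++end;
  return !std::isalnum (static_cast<unsigned char> (*end));
}
```

Under this reading "args, return type" is flagged while "argument" is not.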



I didn't like the magic "n == 3", but after experimenting a bit, I came
to the conclusion that the code you wrote is the best possible.

typo: ponters should be pointers

typo: drective should be directive


Fixed, thanks.



Since the expression strncmp(str, sub, sizeof sub - 1) occurs quite often
in your code, I was wondering whether it would be useful to
declare str_startswith, which expresses the actual intent more directly.


Good idea, thanks!
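
A possible shape for such a helper is sketched below. The name str_startswith comes from the suggestion above; the template signature is an assumption made for this illustration, not the helper from the revised patch. Taking the prefix as an array reference lets the compiler supply the length that sizeof sub - 1 computed by hand.

```cpp
#include <cstddef>
#include <cstring>

// Return true if STR begins with the string literal PREFIX.
// N counts the terminating NUL, so N - 1 characters are compared.
template <std::size_t N>
bool
str_startswith (const char *str, const char (&prefix)[N])
{
  return std::strncmp (str, prefix, N - 1) == 0;
}
```

A call then reads as `str_startswith (format_chars, "arg")`, with no separate length argument to keep in sync with the literal.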




nchars > 1


Better use ngettext in these 7 cases, to account for multiple plural
forms in Arabic, Polish and Russian. :)


Ah, right.  That's exactly why a checker for this would be so
handy here! :)  I've reworded the warning to avoid the plural
vs singular difference.




+  /* Diagnose a backtick (grave accent).  */


This diagnostic should explain how to fix this one since it might be
non-obvious.


Done (I assume we prefer an apostrophe instead).



typo: /* Likewise for gimple.  */ -- should be cgraph_node

typo: be  cdiagnosed -- spurious whitespace? ;)

possible typo: arn't


Thanks.  As might be apparent from all these typos, I probably
need a checker like this more than anyone.



there is a FIXME after "you can%'t do that"


I've added the detection.



"ignoring %<asm%>-specifier for non-static local " might be wrong, as
the word "asm-specifier" might come from the C or C++ grammar. Should
this be "%<asm%> specifier", with a space?


Yes, I think the dash probably shouldn't be there.  It isn't
in references to "%<asm%> qualifiers."  Let me remove it.



Oh no. "%qE is not an %<asm%> qualifier" might destroy my hopes of
merging diagnostics with the same pattern.


How about rephrasing as:

  "%qE is not a valid qualifier for the %<asm%> keyword"


If some of them need to be
prefixed with "a" and some others with "an", they cannot be merged. Or I
need to make an exception when the "before" string ends in "a" or "an".
Luckily, for "the" and "the" only the pronunciation differs but not the
spelling.

-   "%qE attribute argument %E is not in the range [0, %E)",
-   name, val, align);
+   "%qE attribute argument %E is not in the range %s, %E%c",
+   name, val, "[0", align, ')');

I don't like this one as it is asymmetrical. Either both characters
should be passed as %c, or none of them. I prefer passing none of them
to make the string easier to read for translators.


I don't like this one either and was tempted to change it to
the more common [min, max] form.  I can do that, it just means
extracting the integer from align:

  "%qE attribute argument %E is not in the range [0, %wu]",
   name, val, tree_to_uhwi (align));


> +   "unsuffixed floating constant");

I'd rather write "unsuffixed floating point constant". (appears multiple
times)


This came from the C standard that refers to "floating constants."
To be fair, C also refers to both "floating types" and "floating
point types" but not to "floating point constant."  I'm inclined
to leave it as is just because it's shorter and means less work,
but I'm not dead set against changing it either if it's important.



-  warning (OPT_Wpragmas, "#pragma redefine_extname ignored due to "
-  "conflict with __asm__ declaration");
+  warning (OPT_Wpragmas, "%<#pragma redefine_extname%> ignored "
+  "due to conflict with %<asm%> declaration");

Are you sure that you want to remove the underscores? Just asking, I
haven't checked the surrounding code.


They should be interchangeable.



- error ("#pragma GCC target string... is badly formed");
+ error ("%<#pragma GCC target%> string is badly formed");
- error ("#pragma GCC optimize string... is badly formed");
+ error ("%<#pragma GCC optimize%> string is badly formed");

I think the "string..." was supposed to go inside the quotes.


The "string" refers to the pragma argument and quoting it would
imply that the word string should be verbatim.  So I think it's
correct as is, despite the call above:

  GCC_BAD ("%<#pragma GCC optimize (string [,string]...)%> does "
   "not have a final %<)%>");

where the string is part of the grammar being shown.



+  warning (0, "%s:tag %qx is invalid", filename, tag);

I think there should be a space after the 

Re: [PATCH] libfortran/90038: Use posix_spawn instead of fork

2019-05-16 Thread Janne Blomqvist
On Thu, May 16, 2019 at 10:59 PM Thomas Koenig  wrote:
>
> Hi Janne,
>
> > fork() semantics can be problematic.  Most unix style OS'es have
> > posix_spawn which can be used to replace fork + exec in many cases.
> > For more information see
> > e.g. 
> > https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf
> >
> > This replaces the one use of fork in libgfortran with posix_spawn.
>
> Do I understand the patch correctly that we would no longer use fork()
> if posix_spawn is not available?  I think we should leave that in as
> a fallback option.

Yes. But there is already a fallback in case posix_spawn (or
previously, fork) is not available, namely falling back to synchronous
behavior. Since this is anyway somewhat of a corner case (namely, with
wait=.false.), and posix_spawn is supported on all (well, at least
Linux, macOS, *BSD, cygwin, Solaris, AIX) remotely modern unix type
systems, a further fallback to fork() is IMHO not warranted.


-- 
Janne Blomqvist


[PATCH] x86-64: Add vararg ABI tests

2019-05-16 Thread H.J. Lu
We can scan the stack for the return address to locate vector arguments
passed on the stack.

* gcc.target/x86_64/abi/test_varargs-m128.c: New file.
* gcc.target/x86_64/abi/avx/test_varargs-m256.c: Likewise.
* gcc.target/x86_64/abi/avx512f/test_varargs-m512.c: Likewise.
---
 .../x86_64/abi/avx/test_varargs-m256.c| 102 +
 .../x86_64/abi/avx512f/test_varargs-m512.c| 102 +
 .../gcc.target/x86_64/abi/test_varargs-m128.c | 108 ++
 3 files changed, 312 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/avx/test_varargs-m256.c
 create mode 100644 
gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_varargs-m512.c
 create mode 100644 gcc/testsuite/gcc.target/x86_64/abi/test_varargs-m128.c

diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx/test_varargs-m256.c 
b/gcc/testsuite/gcc.target/x86_64/abi/avx/test_varargs-m256.c
new file mode 100644
index 000..d1bcf865487
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx/test_varargs-m256.c
@@ -0,0 +1,102 @@
+/* Test variable number of 256-bit vector arguments passed to functions.  */
+
+#include 
+#include "avx-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+
+/* This struct holds values for argument checking.  */
+struct 
+{
+  YMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m256_varargs (__m256 i0, __m256 i1, __m256 i2,
+   __m256 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_address (0);
+  __m256 *argp;
+
+  compare (values.i0, i0, __m256);
+  compare (values.i1, i1, __m256);
+  compare (values.i2, i2, __m256);
+  compare (values.i3, i3, __m256);
+
+  /* Get the pointer to the return address on stack.  */
+  while (*fp != ra)
+fp++;
+
+  /* Check __m256 arguments passed on stack.  */
+  argp = (__m256 *) (((char *) fp) + 8);
+  compare (values.i4, argp[0], __m256);
+  compare (values.i5, argp[1], __m256);
+  compare (values.i6, argp[2], __m256);
+  compare (values.i7, argp[3], __m256);
+  compare (values.i8, argp[4], __m256);
+  compare (values.i9, argp[5], __m256);
+
+  /* Check register contents.  */
+  compare (fregs.ymm0, ymm_regs[0], __m256);
+  compare (fregs.ymm1, ymm_regs[1], __m256);
+  compare (fregs.ymm2, ymm_regs[2], __m256);
+  compare (fregs.ymm3, ymm_regs[3], __m256);
+}
+
+#define def_check_int_passing_varargs(_i0, _i1, _i2, _i3, _i4, _i5, \
+ _i6, _i7, _i8, _i9, \
+ _func, TYPE) \
+  values.i0.TYPE[0] = _i0; \
+  values.i1.TYPE[0] = _i1; \
+  values.i2.TYPE[0] = _i2; \
+  values.i3.TYPE[0] = _i3; \
+  values.i4.TYPE[0] = _i4; \
+  values.i5.TYPE[0] = _i5; \
+  values.i6.TYPE[0] = _i6; \
+  values.i7.TYPE[0] = _i7; \
+  values.i8.TYPE[0] = _i8; \
+  values.i9.TYPE[0] = _i9; \
+  clear_struct_registers; \
+  fregs.F0.TYPE[0] = _i0; \
+  fregs.F1.TYPE[0] = _i1; \
+  fregs.F2.TYPE[0] = _i2; \
+  fregs.F3.TYPE[0] = _i3; \
+  WRAP_CALL(_func) (_i0, _i1, _i2, _i3, _i4, _i5, _i6, _i7, _i8, _i9);
+
+void
+test_m256_varargs (void)
+{
+  __m256 x[10];
+  int i;
+  for (i = 0; i < 10; i++)
+x[i] = (__m256){32+i, 0, 0, 0, 0, 0, 0, 0};
+  pass = "m256-varargs";
+  def_check_int_passing_varargs (x[0], x[1], x[2], x[3], x[4], x[5],
+x[6], x[7], x[8], x[9],
+fun_check_passing_m256_varargs,
+_m256);
+}
+
+void
+avx_test (void)
+{
+  test_m256_varargs ();
+  if (failed)
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_varargs-m512.c b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_varargs-m512.c
new file mode 100644
index 000..328f76de3df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/x86_64/abi/avx512f/test_varargs-m512.c
@@ -0,0 +1,102 @@
+/* Test variable number of 512-bit vector arguments passed to functions.  */
+
+#include 
+#include "avx512f-check.h"
+#include "args.h"
+
+struct IntegerRegisters iregs;
+struct FloatRegisters fregs;
+
+/* This struct holds values for argument checking.  */
+struct 
+{
+  ZMM_T i0, i1, i2, i3, i4, i5, i6, i7, i8, i9;
+} values;
+
+char *pass;
+int failed = 0;
+
+#undef assert
+#define assert(c) do { \
+  if (!(c)) {failed++; printf ("failed %s\n", pass); } \
+} while (0)
+
+#define compare(X1,X2,T) do { \
+  assert (memcmp (&X1, &X2, sizeof (T)) == 0); \
+} while (0)
+
+void
+fun_check_passing_m512_varargs (__m512 i0, __m512 i1, __m512 i2,
+   __m512 i3, ...)
+{
+  /* Check argument values.  */
+  void **fp = __builtin_frame_address (0);
+  void *ra = __builtin_return_a

Re: [PATCH] libfortran/90038: Use posix_spawn instead of fork

2019-05-16 Thread Thomas Koenig

Hi Janne,


> fork() semantics can be problematic.  Most unix style OS'es have
> posix_spawn which can be used to replace fork + exec in many cases.
> For more information see
> e.g.
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf
>
> This replaces the one use of fork in libgfortran with posix_spawn.


Do I understand the patch correctly that we would no longer use fork()
if posix_spawn is not available?  I think we should leave that in as
a fallback option.

Regards

Thomas


[PATCH] libfortran/90038: Use posix_spawn instead of fork

2019-05-16 Thread Janne Blomqvist
fork() semantics can be problematic.  Most unix style OS'es have
posix_spawn which can be used to replace fork + exec in many cases.
For more information see
e.g. 
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

This replaces the one use of fork in libgfortran with posix_spawn.

2019-05-16  Janne Blomqvist  

PR libfortran/90038
* configure.ac (AC_CHECK_FUNCS_ONCE): Check for posix_spawn
instead of fork.
* intrinsics/execute_command_line.c (execute_command_line): Use
posix_spawn instead of fork.
* Makefile.in: Regenerated.
* config.h.in: Regenerated.
* configure: Regenerated.

Regtested on x86_64-pc-linux-gnu, Ok for trunk?
---
 libgfortran/configure.ac  |  2 +-
 libgfortran/intrinsics/execute_command_line.c | 19 +--
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index c06db7b1a78..66af512a292 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -315,7 +315,7 @@ else
AC_CHECK_FUNCS_ONCE(getrusage times mkstemp strtof strtold snprintf \
ftruncate chsize chdir getentropy getlogin gethostname kill link symlink \
sleep ttyname \
-   alarm access fork setmode fcntl writev \
+   alarm access posix_spawn setmode fcntl writev \
gettimeofday stat fstat lstat getpwuid vsnprintf dup \
getcwd localtime_r gmtime_r getpwuid_r ttyname_r clock_gettime \
getgid getpid getuid geteuid umask getegid \
diff --git a/libgfortran/intrinsics/execute_command_line.c b/libgfortran/intrinsics/execute_command_line.c
index a234bc328b5..3df99c10678 100644
--- a/libgfortran/intrinsics/execute_command_line.c
+++ b/libgfortran/intrinsics/execute_command_line.c
@@ -32,7 +32,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #ifdef  HAVE_SYS_WAIT_H
 #include <sys/wait.h>
 #endif
-
+#ifdef HAVE_POSIX_SPAWN
+#include <spawn.h>
+extern char **environ;
+#endif
 
 enum { EXEC_SYNCHRONOUS = -2, EXEC_NOERROR = 0, EXEC_SYSTEMFAILED,
EXEC_CHILDFAILED, EXEC_INVALIDCOMMAND };
@@ -71,7 +74,7 @@ execute_command_line (const char *command, bool wait, int *exitstat,
   /* Flush all I/O units before executing the command.  */
   flush_all_units();
 
-#if defined(HAVE_FORK)
+#if defined(HAVE_POSIX_SPAWN)
   if (!wait)
 {
   /* Asynchronous execution.  */
@@ -79,14 +82,10 @@ execute_command_line (const char *command, bool wait, int *exitstat,
 
   set_cmdstat (cmdstat, EXEC_NOERROR);
 
-  if ((pid = fork()) < 0)
+  const char * const argv[] = {"sh", "-c", cmd, NULL};
+  if (posix_spawn (&pid, "/bin/sh", NULL, NULL,
+  (char * const* restrict) argv, environ))
set_cmdstat (cmdstat, EXEC_CHILDFAILED);
-  else if (pid == 0)
-   {
- /* Child process.  */
- int res = system (cmd);
- _exit (WIFEXITED(res) ? WEXITSTATUS(res) : res);
-   }
 }
   else
 #endif
@@ -96,7 +95,7 @@ execute_command_line (const char *command, bool wait, int *exitstat,
 
   if (res == -1)
set_cmdstat (cmdstat, EXEC_SYSTEMFAILED);
-#ifndef HAVE_FORK
+#ifndef HAVE_POSIX_SPAWN
   else if (!wait)
set_cmdstat (cmdstat, EXEC_SYNCHRONOUS);
 #endif
-- 
2.17.1



Re: OpenACC Profiling Interface: 'acc_register_library'

2019-05-16 Thread Thomas Schwinge
Hi Jakub!

On Thu, 16 May 2019 17:54:23 +0200, Jakub Jelinek  wrote:
> On Thu, May 16, 2019 at 05:21:56PM +0200, Thomas Schwinge wrote:
> > > Jakub, would you please especially review the non-OpenACC-specific
> > > changes here, including the libgomp ABI changes?
> > 
> > Given a baseline that I've not yet posted ;-) would you please anyway
> > have a look at the following changes?  Is it OK to add/handle the
> > 'acc_register_library' symbol in this way?  The idea behind that one is
> > that you dynamically (including via 'LD_PRELOAD') link your code against
> > a "library" providing an implementation of 'acc_register_library', or
> > even define it in your user code (see the test case below), and then upon
> > initialization, "The OpenACC runtime will invoke 'acc_register_library',
> > passing [...]".
> 
> Ugh, it is a mess

;-P Hah, I was very sure that you'd say something like that!

> (but then, seems OMPT has the same mess with
> ompt_start_tool symbol).

..., but at least it's not OpenACC alone.  ;-)

> It is nasty to call acc_register_library from initialization of the OpenMP
> library, similarly to nastiness of calling ompt_start_tool from
> initialization of the OpenACC library, neither of those symbols is reserved
> to the implementation generally.
> Can't we not do anything for -fopenacc or -fopenmp and have
> -fopenacc-profile or -fopenmpt options that would link in another shared
> library which just provides that symbol and calls it from its
> initialization?

At least for OpenACC, I don't think we'll want an additional/separate
command-line flag, but yes, a separate library that only gets linked in
for explicit '-fopenacc' would've been my next idea, too.  This should be
easy to do GCC spec-wise, and also in the libgomp Automake build system.

> The dummy implementation would be __attribute__((weak))
> and would dlsym (RTLD_NEXT, "...") and call that if it returns non-NULL,
> so it works even if that library happens to be linked before whatever
> library implements the user symbol.
> Looking at what libomp does for ompt_start_tool, for Darwin they don't use
> a weak symbol and instead just dlsym(RTLD_DEFAULT, "...") in the
> library ctor, for Linux they have a weak definition that does dlsym
> (RTLD_NEXT, "...") and for Windows use something yet different.

Will that work for the case of static linking, though?  OpenACC for
"Statically-Linked Library Initialization" describes that "A tools
library can be compiled and linked directly into the application. If the
library provides an external routine 'acc_register_library' [...], the
runtime will invoke that routine to initialize the library".

If the proposed scheme won't work, we'll probably have to make the
runtime (libgomp) aware whether an explicit compile-time '-fopenacc' flag
had been specified, and only if yes, at run-time then invoke
'acc_register_library'?
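
For reference, one conventional way for a runtime to call an optional
routine only when the application or a directly linked object defines it
is a weak reference.  This is a different technique from the
weak-definition-plus-dlsym scheme quoted above, and only a minimal sketch
under stated assumptions: tool_register_library and runtime_initialize are
hypothetical names, and the real 'acc_register_library' takes registration
callbacks as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook standing in for a routine like acc_register_library.
   It is declared weak, so the reference resolves to NULL when nothing in
   the link defines it.  */
extern void tool_register_library (void) __attribute__ ((weak));

static bool runtime_initialized;

/* Hypothetical runtime start-up.  Returns true if a tool hook was found
   and called, false if no definition was linked in.  */
bool
runtime_initialize (void)
{
  assert (!runtime_initialized);	/* Only initialize once.  */
  runtime_initialized = true;

  if (tool_register_library == NULL)
    return false;			/* No tool library linked in.  */

  tool_register_library ();
  return true;
}
```

Note that a weak reference alone does not make the linker pull a defining
member out of a static archive, so this only covers definitions in user
code or in objects that are linked in anyway.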

Anyway, I'll defer the actual implementation for later.

But I'll still now include in the commit that I'm preparing the
'acc_register_library' prototype in , and also its symbol
version, because these things apply no matter whether we now call that
function from 'goacc_profiling_initialize' or not.

Does the 'acc_register_library' symbol version need to be backed by a
(stub) function definition?  It builds without, but it doesn't appear in
'readelf --dyn-syms x86_64-pc-linux-gnu/libgomp/.libs/libgomp.so'; is
that OK or not?


> > --- libgomp/libgomp.map
> > +++ libgomp/libgomp.map
> > @@ -469,6 +469,7 @@ OACC_2.5 {
> > acc_prof_lookup;
> > acc_prof_register;
> > acc_prof_unregister;
> > +   acc_register_library;
> > acc_update_device_async;
> > acc_update_device_async_32_h_;
> > acc_update_device_async_64_h_;
> 
> You certainly never want to add something to a symbol version
> that has been shipped in a release compiler already.

Thanks, fixed.


Grüße
 Thomas


Re: [PATCH 9/12] adjust tests to quoting/spelling diagnostics fixes

2019-05-16 Thread Jeff Law
On 5/14/19 3:32 PM, Martin Sebor wrote:
> The attached patch adjusts the expected test output to the quoting,
> spelling and other formatting changes in diagnostics to fix issues
> pointed out by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-tests.diff
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/Wbool-operation-1.c: Adjust text of expected diagnostics.
>   * c-c++-common/Wvarargs-2.c: Same.
>   * c-c++-common/Wvarargs.c: Same.
>   * c-c++-common/pr51768.c: Same.
>   * c-c++-common/tm/inline-asm.c: Same.
>   * c-c++-common/tm/safe-1.c: Same.
>   * g++.dg/asm-qual-1.C: Same.
>   * g++.dg/asm-qual-3.C: Same.
>   * g++.dg/conversion/dynamic1.C: Same.
>   * g++.dg/cpp0x/constexpr-89599.C: Same.
>   * g++.dg/cpp0x/constexpr-cast.C: Same.
>   * g++.dg/cpp0x/constexpr-shift1.C: Same.
>   * g++.dg/cpp0x/lambda/lambda-conv11.C: Same.
>   * g++.dg/cpp0x/nullptr04.C: Same.
>   * g++.dg/cpp0x/static_assert12.C: Same.
>   * g++.dg/cpp0x/static_assert8.C: Same.
>   * g++.dg/cpp1y/lambda-conv1.C: Same.
>   * g++.dg/cpp1y/pr79393-3.C: Same.
>   * g++.dg/cpp1y/static_assert1.C: Same.
>   * g++.dg/cpp1z/constexpr-if4.C: Same.
>   * g++.dg/cpp1z/constexpr-if5.C: Same.
>   * g++.dg/cpp1z/constexpr-if9.C: Same.
>   * g++.dg/eh/goto2.C: Same.
>   * g++.dg/eh/goto3.C: Same.
>   * g++.dg/expr/static_cast8.C: Same.
>   * g++.dg/ext/flexary5.C: Same.
>   * g++.dg/ext/utf-array-short-wchar.C: Same.
>   * g++.dg/ext/utf-array.C: Same.
>   * g++.dg/ext/utf8-2.C: Same.
>   * g++.dg/gomp/loop-4.C: Same.
>   * g++.dg/gomp/macro-4.C: Same.
>   * g++.dg/gomp/udr-1.C: Same.
>   * g++.dg/init/initializer-string-too-long.C: Same.
>   * g++.dg/other/offsetof9.C: Same.
>   * g++.dg/ubsan/pr63956.C: Same.
>   * g++.dg/warn/Wbool-operation-1.C: Same.
>   * g++.dg/warn/Wtype-limits-Wextra.C: Same.
>   * g++.dg/warn/Wtype-limits.C: Same.
>   * g++.dg/wrappers/pr88680.C: Same.
>   * g++.old-deja/g++.mike/eh55.C: Same.
>   * gcc.dg/Wsign-compare-1.c: Same.
>   * gcc.dg/Wtype-limits-Wextra.c: Same.
>   * gcc.dg/Wtype-limits.c: Same.
>   * gcc.dg/Wunknownprag.c: Same.
>   * gcc.dg/Wunsuffixed-float-constants-1.c: Same.
>   * gcc.dg/asm-6.c: Same.
>   * gcc.dg/asm-qual-1.c: Same.
>   * gcc.dg/cast-1.c: Same.
>   * gcc.dg/cast-2.c: Same.
>   * gcc.dg/cast-3.c: Same.
>   * gcc.dg/cpp/source_date_epoch-2.c: Same.
>   * gcc.dg/debug/pr85252.c: Same.
>   * gcc.dg/dfp/cast-bad.c: Same.
>   * gcc.dg/format/gcc_diag-1.c: Same.
>   * gcc.dg/format/gcc_diag-11.c: Same.  New test.
>   * gcc.dg/gcc_diag-11.c: Same.  New test.
>   * gcc.dg/gnu-cond-expr-2.c: Same.
>   * gcc.dg/gnu-cond-expr-3.c: Same.
>   * gcc.dg/gomp/macro-4.c: Same.
>   * gcc.dg/init-bad-1.c: Same.
>   * gcc.dg/init-bad-2.c: Same.
>   * gcc.dg/init-bad-3.c: Same.
>   * gcc.dg/pr27528.c: Same.
>   * gcc.dg/pr48552-1.c: Same.
>   * gcc.dg/pr48552-2.c: Same.
>   * gcc.dg/pr59846.c: Same.
>   * gcc.dg/pr61096-1.c: Same.
>   * gcc.dg/pr8788-1.c: Same.
>   * gcc.dg/pr90082.c: Same.
>   * gcc.dg/simd-2.c: Same.
>   * gcc.dg/spellcheck-params-2.c: Same.
>   * gcc.dg/spellcheck-params.c: Same.
>   * gcc.dg/strlenopt-49.c: Same.
>   * gcc.dg/tm/pr52141.c: Same.
>   * gcc.dg/torture/pr51106-1.c: Same.
>   * gcc.dg/torture/pr51106-2.c: Same.
>   * gcc.dg/utf-array-short-wchar.c: Same.
>   * gcc.dg/utf-array.c: Same.
>   * gcc.dg/utf8-2.c: Same.
>   * gcc.dg/warn-sprintf-no-nul.c: Same.
>   * gcc.target/i386/asm-flag-0.c: Same.
>   * gcc.target/i386/inline_error.c: Same.
>   * gcc.target/i386/pr30848.c: Same.
>   * gcc.target/i386/pr39082-1.c: Same.
>   * gcc.target/i386/pr39678.c: Same.
>   * gcc.target/i386/pr57756.c: Same.
>   * gcc.target/i386/pr68843-1.c: Same.
>   * gcc.target/i386/pr79804.c: Same.
>   * gcc.target/i386/pr82673.c: Same.
>   * obj-c++.dg/class-protocol-1.mm: Same.
>   * obj-c++.dg/exceptions-3.mm: Same.
>   * obj-c++.dg/exceptions-4.mm: Same.
>   * obj-c++.dg/exceptions-5.mm: Same.
>   * obj-c++.dg/exceptions-6.mm: Same.
>   * obj-c++.dg/method-12.mm: Same.
>   * obj-c++.dg/method-13.mm: Same.
>   * obj-c++.dg/method-6.mm: Same.
>   * obj-c++.dg/method-7.mm: Same.
>   * obj-c++.dg/method-9.mm: Same.
>   * obj-c++.dg/method-lookup-1.mm: Same.
>   * obj-c++.dg/proto-lossage-4.mm: Same.
>   * obj-c++.dg/protocol-qualifier-2.mm: Same.
>   * objc.dg/call-super-2.m: Same.
>   * objc.dg/class-protocol-1.m: Same.
>   * objc.dg/desig-init-1.m: Same.
>   * objc.dg/exceptions-3.m: Same.
>   * objc.dg/exceptions-4.m: Same.
>   * objc.dg/exceptions-5.m: Same.
>   * objc.dg/exceptions-6.m: Same.
>   * objc.dg/method-19.m: Same.
>   * objc.dg/method

Re: patches to detect GCC diagnostics

2019-05-16 Thread Jakub Jelinek
On Thu, May 16, 2019 at 04:58:08PM +0200, Roland Illig wrote:
> -   error ("#pragma GCC target string... is badly formed");
> +   error ("%<#pragma GCC target%> string is badly formed");
> -   error ("#pragma GCC optimize string... is badly formed");
> +   error ("%<#pragma GCC optimize%> string is badly formed");
> 
> I think the "string..." was supposed to go inside the quotes.

No, string is not a keyword here.

Jakub


Re: [PATCH 8/12] fix diagnostic quoting/spelling in the middle-end

2019-05-16 Thread Jeff Law
On 5/14/19 3:32 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued from middle-end files and
> pointed out by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-midend.diff
> 
> gcc/ChangeLog:
> 
>   * builtins.c (expand_builtin_atomic_always_lock_free): Quote
>   identifiers, keywords, operators, and types in diagnostics.  Correct
>   quoting, spelling, and sentence capitalization issues.
>   (expand_builtin_atomic_is_lock_free): Same.
>   (fold_builtin_next_arg): Same.
>   * cfgexpand.c (expand_one_var): Same.
>   (tree_conflicts_with_clobbers_p): Same.
>   (expand_asm_stmt): Same.
>   (verify_loop_structure): Same.
>   * cgraphunit.c (process_function_and_variable_attributes): Same.
>   * collect-utils.c (collect_execute): Same.
>   * collect2.c (maybe_run_lto_and_relink): Same.
>   (is_lto_object_file): Same.
>   (scan_prog_file): Same.
>   * convert.c (convert_to_real_1): Same.
>   * diagnostic-core.h (GCC_DIAG_STYLE): Adjust.
>   (GCC_DIAG_RAW_STYLE): New macro.
>   * dwarf2out.c (dwarf2out_begin_prologue): Same.
>   * except.c (verify_eh_tree): Same.
>   * gcc.c (execute): Same.
>   (eval_spec_function): Same.
>   (run_attempt): Same.
>   (driver::set_up_specs): Same.
>   (compare_debug_auxbase_opt_spec_function): Same.
>   * gcov-tool.c (unlink_gcda_file): Same.
>   (do_merge): Same.
>   (do_rewrite): Same.
>   * gcse.c (gcse_or_cprop_is_too_expensive): Same.
>   * gimplify.c (gimplify_asm_expr): Same.
>   (gimplify_adjust_omp_clauses): Same.
>   * hsa-gen.c (gen_hsa_addr_insns): Same.
>   (gen_hsa_insns_for_load): Same.
>   (gen_hsa_cmp_insn_from_gimple): Same.
>   (gen_hsa_insns_for_operation_assignment): Same.
>   (gen_get_level): Same.
>   (gen_hsa_alloca): Same.
>   (omp_simple_builtin::generate): Same.
>   (gen_hsa_atomic_for_builtin): Same.
>   (gen_hsa_insns_for_call): Same.
>   * input.c (dump_location_info): Same.
>   * ipa-devirt.c (compare_virtual_tables): Same.
>   * ira.c (ira_setup_eliminable_regset): Same.
>   * lra-assigns.c (lra_assign): Same.
>   * lra-constraints.c (lra_constraints): Same.
>   * lto-streamer-in.c (lto_input_mode_table): Same.
>   * lto-wrapper.c (get_options_from_collect_gcc_options): Same.
>   (merge_and_complain): Same.
>   (compile_offload_image): Same.
>   (compile_images_for_offload_targets): Same.
>   (debug_objcopy): Same.
>   (run_gcc): Same.
>   (main): Same.
>   * opts.c (print_specific_help): Same.
>   (parse_no_sanitize_attribute): Same.
>   (print_help): Same.
>   (handle_param): Same.
>   * passes.c (opt_pass::clone): Same.
>   * plugin.c (add_new_plugin): Same.
>   (parse_plugin_arg_opt): Same.
>   (try_init_one_plugin): Same.
>   * pretty-print.h (GCC_DIAG_STYLE): Adjust.
>   (GCC_DIAG_RAW_STYLE): New macro.
>   * print-rtl.c (debug_bb_n_slim): Quote identifiers, keywords,
>   operators, and types in diagnostics.  Correct quoting and spelling
>   issues.
>   * read-rtl-function.c (parse_edge_flag_token): Same.
>   (function_reader::parse_enum_value): Same.
>   * reg-stack.c (check_asm_stack_operands): Same.
>   * regcprop.c (validate_value_data): Same.
>   * sched-rgn.c (make_pass_sched_fusion): Same.
>   * stmt.c (check_unique_operand_names): Same.
>   * targhooks.c (default_target_option_pragma_parse): Same.
>   * tlink.c (recompile_files): Same.
>   * toplev.c (process_options): Same.
>   (do_compile): Same.
>   * trans-mem.c (diagnose_tm_1): Same.
>   (ipa_tm_scan_irr_block): Same.
>   (ipa_tm_diagnose_transaction): Same.
>   * tree-cfg.c (verify_address): Same.  Use get_tree_code_name to
>   format a tree code name in a diagnostic.
>   (verify_types_in_gimple_min_lval): Same.
>   (verify_types_in_gimple_reference): Same.
>   (verify_gimple_call): Same.
>   (verify_gimple_assign_unary): Same.
>   (verify_gimple_assign_binary): Same.
>   (verify_gimple_assign_ternary): Same.
>   (verify_gimple_assign_single): Same.
>   (verify_gimple_switch): Same.
>   (verify_gimple_label): Same.
>   (verify_gimple_phi): Same.
>   (verify_gimple_in_seq): Same.
>   (verify_eh_throw_stmt_node): Same.
>   (collect_subblocks): Same.
>   (gimple_verify_flow_info): Same.
>   (do_warn_unused_result): Same.
>   * tree-inline.c (expand_call_inline): Same.
>   * tree-into-ssa.c (update_ssa): Same.
>   * tree.c (tree_int_cst_elt_check_failed): Same.
>   (tree_vec_elt_check_failed): Same.
>   (omp_clause_operand_check_failed): Same.
>   (verify_type_variant): Same.
>   (verify_type): Same.
>   * value-prof.c (verify_histograms): Same.
>   * varasm.c (assemble_start_funct

Re: [PATCH 6/12] fix diagnostic quoting/spelling in C++

2019-05-16 Thread Jeff Law
On 5/14/19 3:32 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued by the C++ front-end and pointed out
> by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-cp.diff
> 
> gcc/cp/ChangeLog:
> 
>   * call.c (print_z_candidate): Wrap diagnostic text in a gettext
>   macro.  Adjust.
>   (print_z_candidates): Same.
>   (build_conditional_expr_1): Quote keywords, operators, and types
>   in diagnostics.
>   (build_op_delete_call): Same.
>   (maybe_print_user_conv_context): Wrap diagnostic text in a gettext
>   macro.
>   (convert_like_real): Same.
>   (convert_arg_to_ellipsis): Quote keywords, operators, and types
>   in diagnostics.
>   (build_over_call): Same.
>   (joust): Break up an overlong line.  Wrap diagnostic text in a gettext
>   macro.
>   * constexpr.c (cxx_eval_check_shift_p): Spell out >= in English.
>   (cxx_eval_constant_expression): Quote keywords, operators, and types
>   in diagnostics.
>   (potential_constant_expression_1): Same.
>   * cp-gimplify.c (cp_genericize_r): Same.
>   * cp-tree.h (GCC_DIAG_STYLE): Adjust.
>   (GCC_DIAG_RAW_STYLE): New macro.
>   * cvt.c (maybe_warn_nodiscard): Quote keywords, operators, and types
>   in diagnostics.
>   (type_promotes_to): Same.
>   * decl.c (check_previous_goto_1): Same.
>   (check_goto): Same.
>   (start_decl): Same.
>   (cp_finish_decl): Avoid parenthesizing a sentence for consistency.
>   (grok_op_properties): Quote keywords, operators, and types
>   in diagnostics.
>   * decl2.c (grokfield): Same.
>   (coerce_delete_type): Same.
>   * except.c (is_admissible_throw_operand_or_catch_parameter): Same.
>   * friend.c (do_friend): Quote C++ tokens.
>   * init.c (build_new_1): Quote keywords, operators, and types
>   in diagnostics.
>   (build_vec_delete_1): Same.
>   (build_delete): Same.
>   * lex.c (parse_strconst_pragma): Same.
>   (handle_pragma_implementation): Same.
>   (unqualified_fn_lookup_error): Same.
>   * mangle.c (write_type): Same.
>   * method.c (defaulted_late_check): Avoid two consecutive punctuators.
>   * name-lookup.c (cp_binding_level_debug): Remove a trailing newline.
>   (pop_everything): Same.
>   * parser.c (cp_lexer_start_debugging): Quote a macro name
>   in a diagnostic.
>   (cp_lexer_stop_debugging): Same.
>   (cp_parser_userdef_numeric_literal): Quote a C++ header name
>   in a diagnostic.
>   (cp_parser_nested_name_specifier_opt): Quote keywords, operators,
>   and types in diagnostics.
>   (cp_parser_question_colon_clause): Same.
>   (cp_parser_asm_definition): Same.
>   (cp_parser_init_declarator): Same.
>   (cp_parser_template_declaration_after_parameters): Avoid capitalizing
>   a sentence in a diagnostic.
>   (cp_parser_omp_declare_reduction): Quote keywords, operators, and types
>   in diagnostics.
>   (cp_parser_transaction): Same.
>   * pt.c (maybe_process_partial_specialization): Replace second call
>   to permerror with inform for consistency with other uses.
>   (expand_integer_pack): Quote keywords, operators, and types
>   in diagnostics.
>   * rtti.c (get_typeid): Quote keywords, operators, and types
>   in diagnostics.
>   (build_dynamic_cast_1): Same.
>   * semantics.c (finish_asm_stmt): Same.
>   (finish_label_decl): Same.
>   (finish_bases): Same.
>   (finish_offsetof): Same.
>   (cp_check_omp_declare_reduction): Same.
>   (finish_decltype_type): Same.
>   * tree.c (handle_init_priority_attribute): Same.  Add detail
>   to diagnostics.
>   (maybe_warn_zero_as_null_pointer_constant): Same.
>   * typeck.c (cp_build_binary_op): Quote keywords, operators, and types
>   in diagnostics.
>   (cp_build_unary_op): Same.
>   (check_for_casting_away_constness): Same.
>   (build_static_cast): Same.
>   (build_const_cast_1): Same.
>   (maybe_warn_about_returning_address_of_local): Same.
>   (check_return_expr): Same.
>   * typeck2.c (abstract_virtuals_error_sfinae): Same.
>   (digest_init_r): Replace a tab with spaces in a diagnostic.
>   (build_functional_cast): Quote keywords, operators, and types
>   in diagnostics.
OK.

Jeff



Re: [PATCH 5/12] fix diagnostic quoting/spelling in c-family

2019-05-16 Thread Jeff Law
On 5/14/19 3:32 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued from files in the c-family/ directory
> and pointed out by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-c-family.diff
> 
> gcc/c-family/ChangeLog:
> 
>   * c-attribs.c (handle_no_sanitize_attribute): Quote identifiers,
>   keywords, operators, and types in diagnostics.
>   (handle_scalar_storage_order_attribute): Same.
>   (handle_mode_attribute): Same.
>   (handle_visibility_attribute): Same.
>   (handle_assume_aligned_attribute): Same.
>   (handle_no_split_stack_attribute): Same.
>   * c-common.c (shorten_compare): Same.
>   (c_common_truthvalue_conversion): Same.
>   (cb_get_source_date_epoch): Same.
>   * c-common.h (GCC_DIAG_STYLE): Adjust.
>   (GCC_DIAG_RAW_STYLE): New macro.
>   * c-lex.c (cb_def_pragma): Quote keywords, operators, and types
>   in diagnostics.
>   (interpret_float): Same.
>   * c-omp.c (c_finish_omp_for): Same.
>   * c-opts.c (c_common_post_options): Same.
>   * c-pch.c (c_common_pch_pragma): Same.
>   * c-pragma.c (pop_alignment): Same.
>   (handle_pragma_pack): Same.
>   (apply_pragma_weak): Same.
>   (handle_pragma_weak): Same.
>   (handle_pragma_scalar_storage_order): Same.
>   (handle_pragma_redefine_extname): Same.
>   (add_to_renaming_pragma_list): Same.
>   (maybe_apply_renaming_pragma): Same.
>   (push_visibility): Same.
>   (handle_pragma_visibility): Same.
>   (handle_pragma_optimize): Same.
>   (handle_pragma_message): Same.
>   * c-warn.c (warn_for_omitted_condop): Same.
>   (lvalue_error): Same.
OK
jeff


Re: [PATCH 11/12] fix diagnostic quoting/spelling issues in i386 back-end

2019-05-16 Thread Jeff Law
On 5/14/19 3:33 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued by the i386 back-end and pointed out
> by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-i386.diff
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386-expand.c (get_element_number): Quote keywords
>   and other internal names in diagnostics.  Adjust other diagnostic
>   formatting issues noted by -Wformat-diag.
>   * config/i386/i386-features.c
>   (ix86_mangle_function_version_assembler_name): Same.
>   * config/i386/i386-options.c (ix86_handle_abi_attribute): Same.
>   * config/i386/i386.c (ix86_function_type_abi): Same.
>   (ix86_function_ms_hook_prologue): Same.
>   (classify_argument): Same.
>   (ix86_expand_prologue): Same.
>   (ix86_md_asm_adjust): Same.
>   (ix86_memmodel_check): Same.
OK
jeff


Re: [PATCH 10/12] fix diagnostic quoting/spelling in D

2019-05-16 Thread Jeff Law
On 5/14/19 3:33 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued by the D front end and pointed out
> by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-d.diff
> 
> gcc/d/ChangeLog:
> 
>   * d/d-builtins.cc (d_init_builtins): Quote keywords, operators,
>   and types in diagnostics.
>   * d/d-codegen.cc (get_array_length): Same.  Replace can't with cannot.
>   * d/d-convert.cc (convert_expr): Same.
>   * d/d-frontend.cc (getTypeInfoType): Quote an option name in
>   a diagnostic.
>   * d/d-lang.cc (d_handle_option): Same.
>   (d_parse_file): Same.
>   * d/decl.cc: Remove a trailing period from a diagnostic.
>   * d/expr.cc: Use a directive for an apostrophe.
>   * d/toir.cc: Quote keywords, operators, and types in diagnostics.
>   * d/typeinfo.cc (build_typeinfo): Quote an option name in a diagnostic.
> 
> diff --git a/gcc/d/d-builtins.cc b/gcc/d/d-builtins.cc
OK
jeff


Re: [PATCH 3/12] fix diagnostic quoting/spelling in Brig

2019-05-16 Thread Jeff Law
On 5/14/19 3:32 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued by the Brig front end and pointed
> out by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-brig.diff
> 
> gcc/brig/ChangeLog:
> 
>   * brigfrontend/brig-control-handler.cc
>   (brig_directive_control_handler::operator): Remove trailing newline
>   from a diagnostic.
>   * brigfrontend/brig-module-handler.cc
>   (brig_directive_module_handler::operator): Remove a duplicated space
>   from a diagnostic.
OK.
Jeff


Re: [PATCH 7/12] fix diagnostic quoting/spelling in libgcc

2019-05-16 Thread Jeff Law
On 5/14/19 3:32 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued from files in the libgcc directory
> and pointed out by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-libgcc.diff
> 
> libgcc/ChangeLog:
> 
>   * libgcov-util.c (read_gcda_file): Remove trailing newline.
OK
jeff



Re: [PATCH 4/12] fix diagnostic quoting/spelling in the C front-end

2019-05-16 Thread Jeff Law
On 5/14/19 3:32 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued by the C front-end and pointed out
> by the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-c.diff
> 
> gcc/c/ChangeLog:
> 
>   * c-decl.c (start_decl): Quote keywords, operators, and
>   types in diagnostics.
>   (finish_decl): Same.
>   * c-parser.c (c_parser_asm_statement): Same.
>   (c_parser_conditional_expression): Same.
>   (c_parser_transaction_cancel): Same.
>   * c-typeck.c (c_common_type): Same.
>   (build_conditional_expr): Same.
>   (digest_init): Same.
>   (process_init_element): Same.
>   (build_binary_op): Same.
OK
jeff


Re: [PATCH 2/12] fix diagnostic quoting/spelling in ada

2019-05-16 Thread Jeff Law
On 5/14/19 3:31 PM, Martin Sebor wrote:
> The attached patch fixes quoting, spelling, and other formatting
> issues in diagnostics issued by the Ada front end and pointed out by
> the -Wformat-diag warning.
> 
> Martin
> 
> gcc-wformat-diag-ada.diff
> 
> gcc/ada/ChangeLog:
> 
>   * gcc-interface/trans.c (check_inlining_for_nested_subprog): Quote
>   reserved names.
OK.

jeff


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Jeff Law
On 5/16/19 12:36 PM, Ramana Radhakrishnan wrote:
> On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
>  wrote:
>>
>>> On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
>>>
>>> On 5/15/19 5:19 AM, Richard Biener wrote:

>>>> For the official converted repo do we really want all (old)
>>>> development branches to be in the
>>>> main git repo?  I suppose we could create a readonly git from the
>>>> state of the whole repository
>>>> at the point of conversion (and also keep the SVN in readonly mode),
>>>> just to make migration
>>>> of content we want easy in the future?
>>> I've always assumed we'd keep the old SVN tree read-only for historical
>>> purposes.  I strongly suspect that, ignoring release branches,
>>> non-active branches just aren't terribly interesting.
>>
>> Let's avoid mixing the two discussions: (1) converting svn repo to git (and 
>> getting community consensus to switch to git) and (2) deciding on which 
>> branches to keep in the new repo.
>>
> 
> I'm hoping that there is still community consensus to switch to git.
> 
> Personally speaking, a +1 to switch to git.
Absolutely +1 for converting as well.

jeff


[PATCH] Refactor tree-affine.c to not build GENERIC trees

2019-05-16 Thread Richard Biener


The following picks up the patch from last December, refactoring
aff_combination_expand to not use gimple_assign_rhs_to_tree
but analyze GIMPLE stmts directly.

Last December I was stuck at

FAIL: gcc.dg/tree-ssa/ivopts-lt-2.c scan-tree-dump-times ivopts "PHI" 1
FAIL: gcc.dg/tree-ssa/ivopts-lt-2.c scan-tree-dump-times ivopts "p_[0-9]* 
<" 1

where IVOPTs relied on expanding _1 as (long unsigned) i_6(D) which
doesn't add anything interesting affine-combination wise.  I now
chose to simply retain that particular "expansion" and mark it
with ???, likewise the other point I made about hiding conversions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Leaves us with two necessary and one stupid caller of 
gimple_assign_rhs_to_tree (will fix that).
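
For readers who do not have tree-affine.c in their head, here is a minimal
standalone sketch of the arithmetic the new expr_to_aff_combination switch
performs.  It is not GCC code: Affine, var, constant, scale, add and
expr_to_affine are hypothetical toy names, operands are assumed to be
already-converted combinations rather than trees, and pointer arithmetic,
types and overflow are ignored.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for an affine combination: cst + sum (coef * variable).
// All names here are hypothetical; this only mirrors the arithmetic.
struct Affine {
  long cst = 0;
  std::map<std::string, long> elts;  // variable name -> coefficient
};

Affine var(const std::string &name) {
  Affine a;
  a.elts[name] = 1;
  return a;
}

Affine constant(long c) {
  Affine a;
  a.cst = c;
  return a;
}

// Multiply every part of the combination by K.
void scale(Affine &a, long k) {
  a.cst *= k;
  for (auto &e : a.elts) e.second *= k;
}

// Add combination B into A.
void add(Affine &a, const Affine &b) {
  a.cst += b.cst;
  for (const auto &e : b.elts) a.elts[e.first] += e.second;
}

enum Code { PLUS, MINUS, MULT, NEGATE, BIT_NOT };

// Tries to handle OP0 CODE OP1; returns true and fills COMB on success.
bool expr_to_affine(Affine &comb, Code code, const Affine &op0,
                    const Affine *op1 = nullptr) {
  switch (code) {
    case PLUS:
    case MINUS: {
      assert(op1 && "binary code needs a second operand");
      Affine tmp = *op1;
      if (code == MINUS) scale(tmp, -1);
      comb = op0;
      add(comb, tmp);
      return true;
    }
    case MULT:
      // Only multiplication by a constant stays affine.
      if (!op1 || !op1->elts.empty()) return false;
      comb = op0;
      scale(comb, op1->cst);
      return true;
    case NEGATE:
      comb = op0;
      scale(comb, -1);
      return true;
    case BIT_NOT:
      // ~x = -x - 1
      comb = op0;
      scale(comb, -1);
      comb.cst += -1;
      return true;
  }
  return false;
}
```

The bool return plays the same role as in the patch: a false result means
the caller has to fall back to treating the expression as an opaque element.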

Richard.

2019-05-16  Richard Biener  

* tree-affine.c (expr_to_aff_combination): New function split
out from...
(tree_to_aff_combination): ... here.
(aff_combination_expand): Avoid building a GENERIC tree.

Index: gcc/tree-affine.c
===
--- gcc/tree-affine.c   (revision 271282)
+++ gcc/tree-affine.c   (working copy)
@@ -259,104 +259,66 @@ aff_combination_convert (aff_tree *comb,
 }
 }
 
-/* Splits EXPR into an affine combination of parts.  */
+/* Tries to handle OP0 CODE OP1 as affine combination of parts.  Returns
+   true when that was successful and returns the combination in COMB.  */
 
-void
-tree_to_aff_combination (tree expr, tree type, aff_tree *comb)
+static bool
+expr_to_aff_combination (aff_tree *comb, tree_code code, tree type,
+tree op0, tree op1 = NULL_TREE)
 {
   aff_tree tmp;
-  enum tree_code code;
-  tree cst, core, toffset;
   poly_int64 bitpos, bitsize, bytepos;
-  machine_mode mode;
-  int unsignedp, reversep, volatilep;
-
-  STRIP_NOPS (expr);
 
-  code = TREE_CODE (expr);
   switch (code)
 {
 case POINTER_PLUS_EXPR:
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
-  tree_to_aff_combination (TREE_OPERAND (expr, 1), sizetype, &tmp);
+  tree_to_aff_combination (op0, type, comb);
+  tree_to_aff_combination (op1, sizetype, &tmp);
   aff_combination_add (comb, &tmp);
-  return;
+  return true;
 
 case PLUS_EXPR:
 case MINUS_EXPR:
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
-  tree_to_aff_combination (TREE_OPERAND (expr, 1), type, &tmp);
+  tree_to_aff_combination (op0, type, comb);
+  tree_to_aff_combination (op1, type, &tmp);
   if (code == MINUS_EXPR)
aff_combination_scale (&tmp, -1);
   aff_combination_add (comb, &tmp);
-  return;
+  return true;
 
 case MULT_EXPR:
-  cst = TREE_OPERAND (expr, 1);
-  if (TREE_CODE (cst) != INTEGER_CST)
+  if (TREE_CODE (op1) != INTEGER_CST)
break;
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
-  aff_combination_scale (comb, wi::to_widest (cst));
-  return;
+  tree_to_aff_combination (op0, type, comb);
+  aff_combination_scale (comb, wi::to_widest (op1));
+  return true;
 
 case NEGATE_EXPR:
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
+  tree_to_aff_combination (op0, type, comb);
   aff_combination_scale (comb, -1);
-  return;
+  return true;
 
 case BIT_NOT_EXPR:
   /* ~x = -x - 1 */
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
+  tree_to_aff_combination (op0, type, comb);
   aff_combination_scale (comb, -1);
   aff_combination_add_cst (comb, -1);
-  return;
-
-case ADDR_EXPR:
-  /* Handle &MEM[ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
-  if (TREE_CODE (TREE_OPERAND (expr, 0)) == MEM_REF)
-   {
- expr = TREE_OPERAND (expr, 0);
- tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
- tree_to_aff_combination (TREE_OPERAND (expr, 1), sizetype, &tmp);
- aff_combination_add (comb, &tmp);
- return;
-   }
-  core = get_inner_reference (TREE_OPERAND (expr, 0), &bitsize, &bitpos,
- &toffset, &mode, &unsignedp, &reversep,
- &volatilep);
-  if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos))
-   break;
-  aff_combination_const (comb, type, bytepos);
-  if (TREE_CODE (core) == MEM_REF)
-   {
- tree mem_offset = TREE_OPERAND (core, 1);
- aff_combination_add_cst (comb, wi::to_poly_widest (mem_offset));
- core = TREE_OPERAND (core, 0);
-   }
-  else
-   core = build_fold_addr_expr (core);
-
-  if (TREE_CODE (core) == ADDR_EXPR)
-   aff_combination_add_elt (comb, core, 1);
-  else
-   {
- tree_to_aff_combination (core, type, &tmp);
- aff_combination_add (comb, &tmp);
-   }
-  if (toffset)
-   {
- tree_to_aff_combination (toffset, type, &tmp);
- 

Re: preserve more debug stmts in gimple jump threading

2019-05-16 Thread Jeff Law
On 5/16/19 12:46 PM, Richard Biener wrote:
> On Thu, May 16, 2019 at 6:14 PM Jeff Law  wrote:
>>
>> On 5/15/19 3:03 PM, Alexandre Oliva wrote:
>>> On May 15, 2019, Richard Biener  wrote:
>>>
>>>> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva  wrote:
>>>>>
>>>>> Gimple jump threading does not duplicate forwarder blocks that might
>>>>> be present before or after the copied block.
>>>
>>>> Empty forwarder blocks I presume?
>>>
>>> Empty except for debug stmts and possibly a final conditional jump that,
>>> in the threading path, becomes unconditional.
>> Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier
>> which is a *much* better way to handle duplicating the region.
>>
>> Initially we allowed only one block with side effects in the jump
>> threading path.  That's all we really knew how to do correctly.  We
>> extended that to ignore forwarders at some point since they didn't need
>> to be copied -- you just need to know where the forwarder block will go.
>>
>> We later added the ability to copy a second block with side effects in
>> the jump threading path.
>>
>> But I'd really like to just remove tree-ssa-threadupdate.c.  It's
>> horribly convoluted due to old requirements.  I'm confident we could use
>> the SEME copier to handle all the existing cases in a much simpler fashion.
> 
> Not sure if that's the best infrastructure to use (it cannot copy a path
> crossing a backedge).  tracer does the duplicating incrementally
> for example.  Technically the duplication isn't difficult but some
> simplification on the fly would be nice (like actually merging the
> blocks and propagating out PHIs and constants).
We don't need to worry about backedges in this code anymore.

And yes, some simplification would be helpful.  In fact using the SEME
copier actually helps with that because it requires a bit more structure.

The biggest downside I see with moving to the SEME copier here would be
that when we have multiple incoming edges that thread to the same
outgoing edge, the current copier will create a single duplicate.  We'd
likely end up with multiple duplicates using the SEME copier.

Jeff


Re: preserve more debug stmts in gimple jump threading

2019-05-16 Thread Richard Biener
On Thu, May 16, 2019 at 6:14 PM Jeff Law  wrote:
>
> On 5/15/19 3:03 PM, Alexandre Oliva wrote:
> > On May 15, 2019, Richard Biener  wrote:
> >
> >> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva  wrote:
> >>>
> >>> Gimple jump threading does not duplicate forwarder blocks that might
> >>> be present before or after the copied block.
> >
> >> Empty forwarder blocks I presume?
> >
> > Empty except for debug stmts and possibly a final conditional jump that,
> > in the threading path, becomes unconditional.
> Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier
> which is a *much* better way to handle duplicating the region.
>
> Initially we allowed only one block with side effects in the jump
> threading path.  That's all we really knew how to do correctly.  We
> extended that to ignore forwarders at some point since they didn't need
> to be copied -- you just need to know where the forwarder block will go.
>
> We later added the ability to copy a second block with side effects in
> the jump threading path.
>
> But I'd really like to just remove tree-ssa-threadupdate.c.  It's
> horribly convoluted due to old requirements.  I'm confident we could use
> the SEME copier to handle all the existing cases in a much simpler fashion.

Not sure if that's the best infrastructure to use (it cannot copy a path
crossing a backedge).  tracer does the duplicating incrementally
for example.  Technically the duplication isn't difficult but some
simplification on the fly would be nice (like actually merging the
blocks and propagating out PHIs and constants).

Eventually I'll find cycles to implement sth like this in a greedy
fashion for loop unrolling.

Richard.

>
>
> Jeff


Re: [PATCH v2 3/3] Consider doloop cmp use in ivopts

2019-05-16 Thread Jeff Law
On 5/15/19 2:47 AM, Richard Biener wrote:
> On Wed, 15 May 2019, Kewen.Lin wrote:
> 
>> on 2019/5/14 3:26 PM, Richard Biener wrote:
>>> On Tue, May 14, 2019 at 5:10 AM  wrote:

>>>> From: Kewen Lin 
>>>>
>>>> Previous version link for background:
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00912.html
>>>>
>>>> Firstly, it's to call predict_doloop_p hook to check this
>>>> loop will be transformed to doloop or not, if yes, find
>>>> the expected comp iv use and its dependent original iv,
>>>> set the iv candidate as bind_cand of the group.
>>>> In following candidate selection process, we will bypass
>>>> the group with bind_cand, since we don't want to affect
>>>> global decision making for an iv use which will be
>>>> eliminated eventually.  At the time of iv candidate set
>>>> finalization, we will fill the cost pair for the group
>>>> with bind_cand.  If the bind_cand is already in the final
>>>> set, then just use it. Otherwise, we can check whether one
>>>> of existing final set is better and fill with that if so.
>>>>
>>>> Bootstrapped and regression testing passed on powerpc64le.
>>>>
>>>> Is it ok for trunk?
>>>
>>> I wonder what prevents IVOPTs to consider a counter IV
>>> (eventually such candidate needs to be added if that's not
>>> already done) to be the most profitable variant w/o any
>>> of the other changes?  I guess that would be costing of
>>> the IV adjust plus branch which we would need to lower
>>> in case there's nothing inside the loop that would make
>>> later doloop transform fail?
>>>
>>> Richard.
>>>
>>
>> If the question is for "w/o this patch", I think IVOPTs
>> can find counter IV as the most profitable one for the cmp
>> use in most time.  But the key issue is that it may stop
>> us to bring in more iv cands.  We have to add on iv cost
>> of new cand desirable for some iv use, it's probably more
>> than the cost of just using counter IV for the interest
>> use.  
>>
>> If the question is for "w/i this patch", since we bypass
>> the doloop cmp use in candidate determination algorithm, 
>> it's possible that some other iv cands are preferred for 
>> the remaining uses rather than the counter IV. For example,
>> for some address type iv use, iv cand with memory based is
>> mostly better.
> 
> Ah, so the key issue is that the doloop IV is "free"?  That
> is, it doesn't consume a general register and whatnot?  That
> is allocating this IV doesn't really interfere with other IVs?
> But can other uses be based on the doloop IV easily (if the
> IV doesn't reside in a general reg?)?
That's my understanding of how at least some of the low overhead looping
instructions work on some ISAs (ppc included).  There's a special loop
count register and the low overhead looping insns handle the decrement
and branch.

This is similar, but different than something like m68k dbcc where the
counter is a GPR.

For architectures like PPC, we probably don't want to use the loop count
for anything else as it's likely expensive to get data in/out of the
loop count register.

For architectures where the counter is stored in a GPR, then we have
more flexibility in how we use it.

So at least part of the problem is cost modeling of this.  It's all
pretty low level, so not really a good match for the goals of gimple.
But we may ultimately have no choice here but to be pragmatic like we've
done with stuff like vector widths and allow some target properties to
bleed in.

The only saving grace is the existence of low overhead loops is static
-- the target either has them or it doesn't.  Similarly whether or not
the counter is a GPR or not is a static property of the target.
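
To make the distinction above concrete at the source level, here is a small
sketch of the two loop shapes being discussed.  The function names are made
up for illustration, and nothing here claims that any particular target's
doloop pass will or will not pick up either form; it only shows a trip
counter used purely for loop control versus one that also feeds address
computation.

```cpp
#include <cassert>
#include <cstddef>

// Counter used for control only: it counts down to zero and is never read
// in the body, so it could live in a dedicated loop count register.
// Addresses come from a separate pointer induction variable.
long sum_counter_only(const int *p, std::size_t n) {
  assert(p != nullptr || n == 0);
  long sum = 0;
  for (std::size_t count = n; count != 0; --count)
    sum += *p++;
  return sum;
}

// Counter also used to form addresses: the same value must be available
// to the addressing logic, which is cheap when the counter is a GPR and
// potentially costly when it is not.
long sum_counter_indexed(const int *p, std::size_t n) {
  assert(p != nullptr || n == 0);
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += p[i];
  return sum;
}
```

Both functions compute the same result; the open question in this thread is
how IVOPTs should cost the two shapes on a given target.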

> Otherwise I understand that IVOPTs doesn't properly cost
> the doloop IV update and conditional branch.  That's clearly
> something we should fix (maybe even independently of other
> changes). 
It feels independent to me.

> One important thing is that we need to base costs
> on a common base to not compare apples and oranges, didn't
> dig into your patch in detail enough to see whether it
> fits into the general cost model or whether it is a hack
> on top of everything.
Agreed.

jeff



Re: [PATCH] True IPA reimplementation of IPA-SRA

2019-05-16 Thread Richard Biener
On Thu, May 16, 2019 at 4:04 PM Martin Jambor  wrote:
>
> Hi Richi,
>
> On Thu, May 16 2019, Richard Biener wrote:
> > On Fri, May 10, 2019 at 10:31 AM Martin Jambor  wrote:
> >>
> >> Hello,
> >>
> >> this is a follow-up from a WIP patch I sent here in late December:
> >> https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01765.html
> >>
> >> Just like the last time, the patch below is a reimplementation of
> >> IPA-SRA to make it a full IPA pass that can handle strongly connected
> >> components in the call-graph, can take advantage of LTO and does not
> >> weirdly switch functions in pass-pipeline like our current quasi-IPA SRA
> >> does.  Unlike the current IPA-SRA it can also remove return values, even
> >> in SCCs.  On the other hand, it is less powerful when it comes to
> >> structures passed by reference.  By design it will not create references
> >> to bits of an aggregate because that turned out to be just obfuscation
> >> in practice.  However, it also cannot usually split aggregates passed by
> >> reference that are just passed to another function (where splitting
> >> would be useful) because it cannot perform the same TBAA analysis like
> >> the current implementation which already knows what types it should look
> >> at because it has access to bodies of all functions it attempts to modify.
> >
> > So that's just because the analysis is imperfect?  I mean if we can handle
> >
> >  foo (X *p) { do_something (p->a); }
> >  X a; a.a = 1; foo (&a);
> >
> > then we should be able to handle
> >
> >  bar (X *p) { foo (p); }
> >  X a; a.a = 1; bar (&a);
>
> So because the call to foo dominates EXIT and uses default definition
> MEM SSA, this example would be handled fine by the patch.  But it cannot
> handle for example (assuming p->a is an int):
>
>bar (X *p) { *global_double_ptr = 0.0; foo (p); }
>
> The current IPA-SRA can, because at the time it looks at foo, bar has
> been already processed, and so it knows the load is of integer type.  If
> necessary, we could try TBAA for fields in X if there is a reasonable
> number of them and at IPA level just check a flag saying that bar does
> not engage in some type-punning.

Ah, I see.  It is of course sth we could analyze locally and propagate,
like forming an access tree for each parameter much like SRA
collects it (eventually marking sub-accesses that are always performed).

> Another example would be something like:
>
>bar (X *p) { if (cond) bar (p); else do_something_else (p->a); }
>
> The problem here is that the check if p is sure to be dereferenced when
> bar is called is also done at the intra-procedural level.  Well, it is
> not actually a test if p is dereferenced but if the offset from p which
> is known to be dereferenced covers p->a.  We could do it symbolically,
> arrive at some expression of the form
>
>   MIN(offsetof(a), known_dereference_offset_in (bar))
>
> store that to the IPA summary and then evaluate at IPA level.  If we
> think that it is worth it.
>
> Still, I don't think the situation is that much worse in practice
> because IPA-SRA can only handle fairly simple cases anyway, and those
> are actually often taken care of by indirect inlining.

Agreed.  I suppose the new pass is OK as-is feature wise and we can
always enhance it later if we figure out it is worth it.
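
As a reading aid, the foo/bar example quoted above can be written out as a
hand-made before/after pair.  This is only a sketch of the kind of interface
change under discussion, not output from the pass: the struct layout, the
return values (added so the two versions can be compared) and the _isra
names are assumptions made for illustration.

```cpp
#include <cassert>

struct X {
  int a;
  int b;
};

// Hypothetical consumer of the single field that is actually used.
int do_something(int v) { return v * 2; }

// Before: the aggregate is passed by reference although only p->a is read.
int foo(const X *p) {
  assert(p != nullptr);
  return do_something(p->a);
}

// bar only forwards the pointer; it never looks at *p itself.
int bar(const X *p) { return foo(p); }

// After (written by hand): the used scalar is passed by value, so callers
// perform the load and the callee no longer needs the aggregate at all.
int foo_isra(int a) { return do_something(a); }
int bar_isra(int a) { return foo_isra(a); }
```

A caller that used to write bar (&x) would become bar_isra (x.a).  The case
Martin describes as not handled is the one where bar stores through an
unrelated pointer before calling foo, so moving the load of p->a up into
bar's callers would need the alias analysis he mentions.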

> > Thanks for doing this.  I wonder how difficult it is to split the
> > patch into a) old IPA-SRA removal, b) refactoring c) IPA-SRA add
> > (probably easiest in that order).  It's quite a large number of
> > changes, a) being mostly uninteresting (and pre-approved hereby, just
> > not independently, of course), b) is uninteresting to me, but I would
> > like to look at c), not sure if that's really only the new file,
> > probably not since IPA modifications have infrastructure bits.
>
> The analysis parts of the new IPA-SRA, both the intra-procedural and
> inter-procedural are entirely in the new file ipa-sra.c so if you want
> to review that, just grab that file from
> https://gcc.gnu.org/git/?p=gcc.git;a=tree;h=refs/heads/jamborm/ipa-sra;hb=refs/heads/jamborm/ipa-sra
>
> The transformation parts, however, are what the "refactoring" is really
> about because it is not the pass but rather the cgraph cloning
> infrastructure that performs the actual transformation.  This is so on
> purpose because not only the bodies of changed functions need to be
> adjusted but also all calls to them, and you cannot register a
> pass-specific transformation function for that - and I need to actually
> pass information from the body transformation to the call transformation
> anyway.
>
> So yes, this split would be possible and perhaps even easy but it would
> not make much of a difference.

OK, thanks for explaining - I will look at the new file then.

Richard.

> Thanks,
>
> Martin
>
>
> >
> > Sorry for not mentioning earlier.  Maybe just splitting out a) already
> > helps (you seem to remove code not in tree-sra.c).
> >
> > Thanks,
> > Richard.
> >
> >> Martin
> >>
> >>
> 

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Ramana Radhakrishnan
On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
 wrote:
>
> > On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
> >
> > On 5/15/19 5:19 AM, Richard Biener wrote:
> >>
> >> For the official converted repo do we really want all (old)
> >> development branches to be in the
> >> main git repo?  I suppose we could create a readonly git from the
> >> state of the whole repository
> >> at the point of conversion (and also keep the SVN in readonly mode),
> >> just to make migration
> >> of content we want easy in the future?
> > I've always assumed we'd keep the old SVN tree read-only for historical
> > purposes.  I strongly suspect that, ignoring release branches,
> > non-active branches just aren't terribly interesting.
>
> Let's avoid mixing the two discussions: (1) converting svn repo to git (and 
> getting community consensus to switch to git) and (2) deciding on which 
> branches to keep in the new repo.
>

I'm hoping that there is still community consensus to switch to git.

Personally speaking, a +1 to switch to git.

regards
Ramana

> With git, we can always split away unneeded history by removing unnecessary 
> branches and tags and re-packing the repo.  We can equally easily bring that 
> history back if we change our minds.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>


Re: [PATCH 2/2] [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one

2019-05-16 Thread Jeff Law
On 5/15/19 8:08 PM, kugan.vivekanandara...@linaro.org wrote:
> From: Kugan Vivekanandarajah 
> 
> This patch changes cse_insn to process parallel rtx one by one such that
> any destination rtx in cse list is invalidated before processing the
> next.
> 
> gcc/ChangeLog:
> 
> 2019-05-16  Kugan Vivekanandarajah  
> 
>   PR target/88834
>   * cse.c (safe_hash): Handle VEC_DUPLICATE.
>   (exp_equiv_p): Likewise.
>   (hash_rtx_cb): Change to accept const_rtx.
>   (struct set): Add field to record if uses of dest is invalidated.
>   (cse_insn): For parallel rtx, invalidate register set by first rtx
>   before processing the next.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-05-16  Kugan Vivekanandarajah  
> 
>   PR target/88834
>   * gcc.target/aarch64/pr88834.c: New test.
I haven't dug into the code, so if my concerns are off base, just say so.


> (insn 19 18 20 3 (parallel [
> (set (reg:VNx4BI 93 [ next_mask_18 ])
> (unspec:VNx4BI [
> (const_int 0 [0])
> (reg:DI 95 [ _33 ])
> ] UNSPEC_WHILE_LO))
> (set (reg:CC 66 cc)
> (compare:CC (unspec:SI [
> (vec_duplicate:VNx4BI (const_int 1 [0x1]))
> (reg:VNx4BI 93 [ next_mask_18 ])
> ] UNSPEC_PTEST_PTRUE)
> (const_int 0 [0])))
> ]) 4244 {while_ultdivnx4bi}
RTL semantics in case of a PARALLEL are that all the inputs are
consumed, then all outputs are generated.  So for the example insn reg93
is read in the second set before it's set in the output of the first set.


So the ordering you're using for processing/invalidating seems unexpected.
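
A tiny simulation of that rule, for anyone who wants to see the difference
between "read everything, then write everything" and one-by-one processing.
This is a toy model, not CSE code: Regs, Set, reg, exec_parallel,
exec_sequential and example_insn are hypothetical names, and the two lambdas
are simple stand-ins for the unspec and the compare in the insn above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Regs = std::map<int, long>;

// One SET: a destination register and a source computed from the registers.
struct Set {
  int dest;
  std::function<long(const Regs &)> src;
};

// Read a register that must already have a value.
long reg(const Regs &r, int n) {
  auto it = r.find(n);
  assert(it != r.end());
  return it->second;
}

// PARALLEL semantics: evaluate every source against the incoming state,
// and only then perform the writes.
Regs exec_parallel(Regs regs, const std::vector<Set> &sets) {
  std::vector<std::pair<int, long>> results;
  for (const Set &s : sets) results.emplace_back(s.dest, s.src(regs));
  for (const auto &r : results) regs[r.first] = r.second;
  return regs;
}

// One-by-one processing: a later SET sees the result of an earlier one.
// This is NOT what a PARALLEL means; it is here only for contrast.
Regs exec_sequential(Regs regs, const std::vector<Set> &sets) {
  for (const Set &s : sets) regs[s.dest] = s.src(regs);
  return regs;
}

// Shape of the example insn: the first SET writes reg 93 from reg 95, the
// second SET reads reg 93 and writes the condition register 66.
std::vector<Set> example_insn() {
  return {
      {93, [](const Regs &r) { return reg(r, 95) + 1; }},
      {66, [](const Regs &r) { return reg(r, 93) == 0 ? 0L : 1L; }},
  };
}
```

With reg 93 initially 0 and reg 95 equal to 4, the parallel execution leaves
the condition register computed from the old reg 93, while the sequential
one computes it from the new value.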

jeff



Re: [PATCH] Handle a location with NULL as a file (PR driver/90495)

2019-05-16 Thread Jeff Law
On 5/16/19 5:19 AM, Martin Liška wrote:
> Hi.
> 
> With LTO and -fsanitize we end up with a static ctor
> (_GLOBAL__sub_I_00099_0_main) that has no source location.
> With that stack usage will print '(artificial)' as a location
> of the function.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-05-16  Martin Liska  
> 
>   PR driver/90495
>   * toplev.c (output_stack_usage): With LTO and sanitizer it
>   happens that a global ctor (_GLOBAL__sub_I_00099_0_main)
>   has no file location.
> ---
>  gcc/toplev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
OK
jeff


Re: [PATCH PR57534]Support strength reduction for MEM_REF in SLSR

2019-05-16 Thread Jeff Law
On 5/14/19 10:29 PM, bin.cheng wrote:
> Hi,
> As noted in PR57534 comment #33, SLSR currently doesn't strength reduce memory
> references in reported cases, which conflicts with its comment at the 
> beginning of file.
> The main reason is in functions slsr_process_ref and restructure_reference 
> which
> rejects MEM_REF by handled_component_p in the first place.  This patch 
> identifies
> and creates CAND_REF for MEM_REF by restructuring base/offset.
> 
> Note the patch only affects cases with multiple reducible MEM_REF.
> 
> Also note, with this patch, [base + cst_offset] addressing mode would be 
> generally
> preferred.  I need to adjust three existing tests:
> * gcc.dg/tree-ssa/isolate-5.c
> * gcc.dg/tree-ssa/ssa-hoist-4.c
> Though address computation is reduced out of memory reference, the generated 
> assembly is not worsened.
> 
> * gcc.dg/tree-ssa/slsr-3.c
> The generated assembly has two more instructions:
> <   movslq  %edx, %rcx
> <   movl    (%rsi,%rcx,4), %r9d
> <   leaq    0(,%rcx,4), %rax
> <   leal    2(%r9), %r8d
> <   movl    %r8d, (%rdi,%rcx,4)
> <   movl    4(%rsi,%rax), %ecx
> <   addl    $2, %ecx
> <   movl    %ecx, 4(%rdi,%rax)
> <   movl    8(%rsi,%rax), %ecx
> <   addl    $2, %ecx
> <   movl    %ecx, 8(%rdi,%rax)
> <   movl    12(%rsi,%rax), %ecx
> <   addl    $2, %ecx
> <   movl    %ecx, 12(%rdi,%rax)
> ---
> >   movslq  %edx, %rax
> >   salq    $2, %rax
> >   addq    %rax, %rsi
> >   addq    %rax, %rdi
> >   movl    (%rsi), %eax
> >   addl    $2, %eax
> >   movl    %eax, (%rdi)
> >   movl    4(%rsi), %eax
> >   addl    $2, %eax
> >   movl    %eax, 4(%rdi)
> >   movl    8(%rsi), %eax
> >   addl    $2, %eax
> >   movl    %eax, 8(%rdi)
> >   movl    12(%rsi), %eax
> >   addl    $2, %eax
> >   movl    %eax, 12(%rdi)
> 
> Seems to me this is not deteriorating and "salq" can be saved by two forward 
> propagation.
> 
> Bootstrap and test on x86_64, any comments?
> 
> Thanks,
> bin
> 
> 2019-05-15  Bin Cheng  
> 
> PR tree-optimization/57534
> * gimple-ssa-strength-reduction.c (restructure_base_offset): New.
> (restructure_reference): Call restructure_base_offset when offset is
> NULL.
> (slsr_process_ref): Handle MEM_REF.
> 
> 2018-05-15  Bin Cheng  
> 
> PR tree-optimization/57534
> * gcc.dg/tree-ssa/pr57534.c: New test.
> * gcc.dg/tree-ssa/isolate-5.c: Adjust checking strings.
> * gcc.dg/tree-ssa/slsr-3.c: Ditto.
> * gcc.dg/tree-ssa/ssa-hoist-4.c: Ditto.
I'll let Bill comment on the actual SLSR changes.  WRT the isolate-5.c
changes, I don't think your change impacts the purpose behind the test
at all.  So no objections from me.

jeff
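
For context on the quoted assembly, here is a source-level sketch of the two
addressing shapes being compared.  The function bodies are an assumption
reconstructed from the register usage in the listing (loads through the
second argument, stores through the first, index in the third); they are not
copied from slsr-3.c, and the names are made up.

```cpp
#include <cassert>

// Indexed form: each access is expressed as base + (i + k) * sizeof (int),
// so the index takes part in every address.
void add2_indexed(int *a, const int *b, int i) {
  assert(a != nullptr && b != nullptr);
  a[i] = b[i] + 2;
  a[i + 1] = b[i + 1] + 2;
  a[i + 2] = b[i + 2] + 2;
  a[i + 3] = b[i + 3] + 2;
}

// Restructured form: the variable part of the address is computed once per
// array, and every access becomes [base + constant offset].
void add2_restructured(int *a, const int *b, int i) {
  assert(a != nullptr && b != nullptr);
  int *pa = a + i;
  const int *pb = b + i;
  pa[0] = pb[0] + 2;
  pa[1] = pb[1] + 2;
  pa[2] = pb[2] + 2;
  pa[3] = pb[3] + 2;
}
```

The second form corresponds to the salq/addq prologue followed by plain
displacement addressing in the newer listing.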
> 



Re: [PATCH] gcc: move assemble_start_function / assemble_end_function to output_mi_thunk

2019-05-16 Thread Max Filippov
On Wed, May 15, 2019 at 2:46 PM Richard Sandiford
 wrote:
>
> Max Filippov  writes:
> > Let backends call assemble_start_function after they have generated
> > thunk function body so that a constant pool could be output if it is
> > required. This may help backends to avoid implementing custom constant
> > loading code specifically for the MI thunk and reuse existing
> > functionality.
> >
> > gcc/
> > 2019-01-08  Max Filippov  
> >
> >   * cgraphunit.c (cgraph_node::expand_thunk): Remove
> >   assemble_start_function and assemble_end_function calls.
> >   * config/alpha/alpha.c (alpha_output_mi_thunk_osf): Call
> >   assemble_start_function and assemble_end_function.
> >   * config/arc/arc.c (arc_output_mi_thunk): Likewise.
> >   * config/arm/arm.c (arm_output_mi_thunk): Likewise.
> >   * config/bfin/bfin.c (bfin_output_mi_thunk): Likewise.
> >   * config/c6x/c6x.c (c6x_output_mi_thunk): Likewise.
> >   * config/cris/cris.c (cris_asm_output_mi_thunk): Likewise.
> >   * config/csky/csky.c (csky_output_mi_thunk): Likewise.
> >   * config/epiphany/epiphany.c (epiphany_output_mi_thunk): Likewise.
> >   * config/frv/frv.c (frv_asm_output_mi_thunk): Likewise.
> >   * config/i386/i386.c (x86_output_mi_thunk): Likewise.
> >   * config/ia64/ia64.c (ia64_output_mi_thunk): Likewise.
> >   * config/m68k/m68k.c (m68k_output_mi_thunk): Likewise.
> >   * config/microblaze/microblaze.c (microblaze_asm_output_mi_thunk):
> >   Likewise.
> >   * config/mips/mips.c (mips_output_mi_thunk): Likewise.
> >   * config/mmix/mmix.c (mmix_asm_output_mi_thunk): Likewise.
> >   * config/mn10300/mn10300.c (mn10300_asm_output_mi_thunk): Likewise.
> >   * config/nds32/nds32.c (nds32_asm_output_mi_thunk): Likewise.
> >   * config/nios2/nios2.c (nios2_asm_output_mi_thunk): Likewise.
> >   * config/or1k/or1k.c (or1k_output_mi_thunk): Likewise.
> >   * config/pa/pa.c (pa_asm_output_mi_thunk): Likewise.
> >   * config/riscv/riscv.c (riscv_output_mi_thunk): Likewise.
> >   * config/rs6000/rs6000.c (rs6000_output_mi_thunk): Likewise.
> >   * config/s390/s390.c (s390_output_mi_thunk): Likewise.
> >   * config/sh/sh.c (sh_output_mi_thunk): Likewise.
> >   * config/sparc/sparc.c (sparc_output_mi_thunk): Likewise.
> >   * config/spu/spu.c (spu_output_mi_thunk): Likewise.
> >   * config/stormy16/stormy16.c (xstormy16_asm_output_mi_thunk):
> >   Likewise.
> >   * config/tilegx/tilegx.c (tilegx_output_mi_thunk): Likewise.
> >   * config/tilepro/tilepro.c (tilepro_asm_output_mi_thunk): Likewise.
> >   * config/vax/vax.c (vax_output_mi_thunk): Likewise.
>
> OK, thanks.  The new placement of assemble_start_function after
> shorten_branches certainly makes more conceptual sense than what
> we had before.

Thanks. Applied to trunk.

-- Max


Re: [PATCH v2 3/3] Consider doloop cmp use in ivopts

2019-05-16 Thread Segher Boessenkool
On Thu, May 16, 2019 at 09:25:49AM +0200, Richard Biener wrote:
> On Wed, 15 May 2019, Segher Boessenkool wrote:
> > > Otherwise I understand that IVOPTs doesn't properly cost
> > > the doloop IV update and conditional branch.
> > 
> > Currently it doesn't even *know* something is or isn't a doloop.
> > And yeah that matters a lot for proper costing, on all targets that
> > have a doloop.
> 
> Ah, OK.  So for general handling IVOPTs would add a new
> candidate kind (doloop kind) which is costed differently
> at the various uses.

That sounds good.

> The "guessed" RTL we create for
> costing also needs to properly create a proper counter reg
> (IIRC it always creates pseudos right now, but here it would
> need to be a hard reg so costing can properly pessimize uses
> in addresses/memory?).

We always use a pseudo currently; it is not turned into a hard register
until after RA.  Expanding as hard registers directly works really well,
*if* you can put *all* uses of that hard reg into the RTL at expand time
already.  Indirect jumps and switch tables want to use the count
register as well; this complicates things enormously.  Also, sometimes
a loop is mangled enough (in RTL) that it is better to use a GPR as IV.

> > The different cost for a doloop is pretty easy...  Might have to
> > be a target hook though; on Power the decrement + compare-to-zero
> > are "free", while on some other targets only the "compare" is.
> > The cost for using the IV...  For us we could just disallow it
> > being used at all (except for the looping itself of course), but
> > not sure what is optimal in general.  Another hook?
> 
> Indeed the easiest thing is to simply disallow uses of the doloop
> IV outside of the increment, compare and branch (maybe have a
> target hook that says whether a particular IV may be used for
> a particular USE).

But is that generic enough?

> We'd still need to cost the spilling thing around calls of course,
> but this can maybe be done incrementally.  It's still RTL doloop
> that ultimatively decides on the doloop use.

We cannot have doloops with calls, on rs6000.  This differs per target
of course.

We really need to get a good overview of what our various targets need.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Maxim Kuvyrkov
> On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
> 
> On 5/15/19 5:19 AM, Richard Biener wrote:
>> 
>> For the official converted repo do we really want all (old)
>> development branches to be in the
>> main git repo?  I suppose we could create a readonly git from the
>> state of the whole repository
>> at the point of conversion (and also keep the SVN in readonly mode),
>> just to make migration
>> of content we want easy in the future?
> I've always assumed we'd keep the old SVN tree read-only for historical
> purposes.  I strongly suspect that, ignoring release branches,
> non-active branches just aren't terribly interesting.

Let's avoid mixing the two discussions: (1) converting svn repo to git (and 
getting community consensus to switch to git) and (2) deciding on which 
branches to keep in the new repo.

With git, we can always split away unneeded history by removing unnecessary 
branches and tags and re-packing the repo.  We can equally easily bring that 
history back if we change our minds.

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Jeff Law
On 5/15/19 5:19 AM, Richard Biener wrote:
> 
> For the official converted repo do we really want all (old)
> development branches to be in the
> main git repo?  I suppose we could create a readonly git from the
> state of the whole repository
> at the point of conversion (and also keep the SVN in readonly mode),
> just to make migration
> of content we want easy in the future?
I've always assumed we'd keep the old SVN tree read-only for historical
purposes.  I strongly suspect that, ignoring release branches,
non-active branches just aren't terribly interesting.


Jeff


Re: [PATCH] Remove a test-case that leads to a huge stack (and file) allocation (PR middle-end/90478).

2019-05-16 Thread Jeff Law
On 5/16/19 5:42 AM, Martin Liška wrote:
> Hi.
> 
> I'm going to remove the test as it leads to huge .s files and stack
> allocation at gcc/stmt.c:777
> 
> Ready for trunk?
> Martin
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-05-16  Martin Liska  
> 
>   PR middle-end/90478
>   * gcc.dg/tree-ssa/pr90478-2.c: Remove.
Given you just added it and as a result of doing so we've realized it's
not a good idea, I'd think this qualifies as a no-brainer :-)

OK
jeff


Re: preserve more debug stmts in gimple jump threading

2019-05-16 Thread Jeff Law
On 5/15/19 3:03 PM, Alexandre Oliva wrote:
> On May 15, 2019, Richard Biener  wrote:
> 
>> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva  wrote:
>>>
>>> Gimple jump threading does not duplicate forwarder blocks that might
>>> be present before or after the copied block.
> 
>> Empty forwarder blocks I presume?
> 
> Empty except for debug stmts and possibly a final conditional jump that,
> in the threading path, becomes unconditional.
Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier
which is a *much* better way to handle duplicating the region.

Initially we allowed only one block with side effects in the jump
threading path.  That's all we really knew how to do correctly.  We
extended that to ignore forwarders at some point since they didn't need
to be copied -- you just need to know where the forwarder block will go.

We later added the ability to copy a second block with side effects in
the jump threading path.

But I'd really like to just remove tree-ssa-threadupdate.c.  It's
horribly convoluted due to old requirements.  I'm confident we could use
the SEME copier to handle all the existing cases in a much simpler fashion.



Jeff


Re: [PATCH PR57534]Support strength reduction for MEM_REF in slsr

2019-05-16 Thread Bin.Cheng
On Thu, May 16, 2019 at 11:50 PM Bill Schmidt  wrote:
>
> Thanks, Bin and Richard -- I am out of the office until Tuesday, so will 
> review
> when I get back.  Bin, please CC me on SLSR patches as otherwise I might miss
> them.  Thanks!
Thanks for helping.  Will do it next time.

Thanks,
bin
>
> Bill
>
>
> On 5/16/19 6:37 AM, Richard Biener wrote:
> > On Wed, May 15, 2019 at 6:30 AM bin.cheng  
> > wrote:
> >> Hi,
> >> As noted in PR57534 comment #33, SLSR currently doesn't strength reduce 
> >> memory
> >> references in reported cases, which conflicts with its comment at the 
> >> beginning of file.
> >> The main reason is in functions slsr_process_ref and restructure_reference 
> >> which
> >> rejects MEM_REF by handled_component_p in the first place.  This patch 
> >> identifies
> >> and creates CAND_REF for MEM_REF by restructuring base/offset.
> >>
> >> Note the patch only affects cases with multiple reducible MEM_REF.
> >>
> >> Also note, with this patch, [base + cst_offset] addressing mode would be 
> >> generally
> >> preferred.  I need to adjust three existing tests:
> >> * gcc.dg/tree-ssa/isolate-5.c
> >> * gcc.dg/tree-ssa/ssa-hoist-4.c
> >> Though address computation is reduced out of memory reference, the 
> >> generated
> >> assembly is not worsened.
> >>
> >> * gcc.dg/tree-ssa/slsr-3.c
> >> The generated assembly has two more instructions:
> >> <   movslq  %edx, %rcx
> >> <   movl    (%rsi,%rcx,4), %r9d
> >> <   leaq    0(,%rcx,4), %rax
> >> <   leal    2(%r9), %r8d
> >> <   movl    %r8d, (%rdi,%rcx,4)
> >> <   movl    4(%rsi,%rax), %ecx
> >> <   addl    $2, %ecx
> >> <   movl    %ecx, 4(%rdi,%rax)
> >> <   movl    8(%rsi,%rax), %ecx
> >> <   addl    $2, %ecx
> >> <   movl    %ecx, 8(%rdi,%rax)
> >> <   movl    12(%rsi,%rax), %ecx
> >> <   addl    $2, %ecx
> >> <   movl    %ecx, 12(%rdi,%rax)
> >> ---
> >>>   movslq  %edx, %rax
> >>>   salq    $2, %rax
> >>>   addq    %rax, %rsi
> >>>   addq    %rax, %rdi
> >>>   movl    (%rsi), %eax
> >>>   addl    $2, %eax
> >>>   movl    %eax, (%rdi)
> >>>   movl    4(%rsi), %eax
> >>>   addl    $2, %eax
> >>>   movl    %eax, 4(%rdi)
> >>>   movl    8(%rsi), %eax
> >>>   addl    $2, %eax
> >>>   movl    %eax, 8(%rdi)
> >>>   movl    12(%rsi), %eax
> >>>   addl    $2, %eax
> >>>   movl    %eax, 12(%rdi)
> >> Seems to me this is not deteriorating and "salq" can be saved by two 
> >> forward propagation.
> >>
> >> Bootstrap and test on x86_64, any comments?
> > The idea is good I think and the result above as well.  Leaving for Bill
> > to have a look as well, otherwise OK.
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> bin
> >>
> >> 2019-05-15  Bin Cheng  
> >>
> >> PR tree-optimization/57534
> >> * gimple-ssa-strength-reduction.c (restructure_base_offset): New.
> >> (restructure_reference): Call restructure_base_offset when offset 
> >> is
> >> NULL.
> >> (slsr_process_ref): Handle MEM_REF.
> >>
> >> 2018-05-15  Bin Cheng  
> >>
> >> PR tree-optimization/57534
> >> * gcc.dg/tree-ssa/pr57534.c: New test.
> >> * gcc.dg/tree-ssa/isolate-5.c: Adjust checking strings.
> >> * gcc.dg/tree-ssa/slsr-3.c: Ditto.
> >> * gcc.dg/tree-ssa/ssa-hoist-4.c: Ditto.
>


Re: OpenACC Profiling Interface: 'acc_register_library' (was: OpenACC 2.5 Profiling Interface)

2019-05-16 Thread Jakub Jelinek
On Thu, May 16, 2019 at 05:21:56PM +0200, Thomas Schwinge wrote:
> > Jakub, would you please especially review the non-OpenACC-specific
> > changes here, including the libgomp ABI changes?
> 
> Given a baseline that I've not yet posted ;-) would you please anyway
> have a look at the following changes?  Is it OK to add/handle the
> 'acc_register_library' symbol in this way?  The idea behind that one is
> that you dynamically (including via 'LD_PRELOAD') link your code against
> a "library" providing an implementation of 'acc_register_library', or
> even define it in your user code (see the test case below), and then upon
> initialization, "The OpenACC runtime will invoke 'acc_register_library',
> passing [...]".

Ugh, it is a mess (but then, seems OMPT has the same mess with
ompt_start_tool symbol).

It is nasty to call acc_register_library from initialization of the OpenMP
library, similarly to the nastiness of calling ompt_start_tool from
initialization of the OpenACC library; neither of those symbols is reserved
to the implementation generally.
Can't we not do anything for -fopenacc or -fopenmp and have
-fopenacc-profile or -fopenmpt options that would link in another shared
library which just provides that symbol and calls it from its
initialization?  The dummy implementation would be __attribute__((weak))
and would dlsym (RTLD_NEXT, "...") and call that if it returns non-NULL,
so it works even if that library happens to be linked before whatever library
implements the user symbol.
Looking at what libomp does for ompt_start_tool, for Darwin they don't use
a weak symbol and instead just dlsym(RTLD_DEFAULT, "...") in the
library ctor, for Linux they have a weak definition that does dlsym
(RTLD_NEXT, "...") and for Windows use something yet different.

> --- libgomp/libgomp.map
> +++ libgomp/libgomp.map
> @@ -469,6 +469,7 @@ OACC_2.5 {
>   acc_prof_lookup;
>   acc_prof_register;
>   acc_prof_unregister;
> + acc_register_library;
>   acc_update_device_async;
>   acc_update_device_async_32_h_;
>   acc_update_device_async_64_h_;

You certainly never want to add something to a symbol version
that has been shipped in a release compiler already.

Jakub


Re: [PATCH PR57534]Support strength reduction for MEM_REF in slsr

2019-05-16 Thread Bill Schmidt
Thanks, Bin and Richard -- I am out of the office until Tuesday, so will review
when I get back.  Bin, please CC me on SLSR patches as otherwise I might miss
them.  Thanks!

Bill


On 5/16/19 6:37 AM, Richard Biener wrote:
> On Wed, May 15, 2019 at 6:30 AM bin.cheng  wrote:
>> Hi,
>> As noted in PR57534 comment #33, SLSR currently doesn't strength reduce 
>> memory
>> references in reported cases, which conflicts with its comment at the 
>> beginning of file.
>> The main reason is in functions slsr_process_ref and restructure_reference 
>> which
>> rejects MEM_REF by handled_component_p in the first place.  This patch 
>> identifies
>> and creates CAND_REF for MEM_REF by restructuring base/offset.
>>
>> Note the patch only affects cases with multiple reducible MEM_REF.
>>
>> Also note, with this patch, [base + cst_offset] addressing mode would be 
>> generally
>> preferred.  I need to adjust three existing tests:
>> * gcc.dg/tree-ssa/isolate-5.c
>> * gcc.dg/tree-ssa/ssa-hoist-4.c
>> Though address computation is reduced out of memory reference, the generated
>> assembly is not worsened.
>>
>> * gcc.dg/tree-ssa/slsr-3.c
>> The generated assembly has two more instructions:
>> <   movslq  %edx, %rcx
>> <   movl    (%rsi,%rcx,4), %r9d
>> <   leaq    0(,%rcx,4), %rax
>> <   leal    2(%r9), %r8d
>> <   movl    %r8d, (%rdi,%rcx,4)
>> <   movl    4(%rsi,%rax), %ecx
>> <   addl    $2, %ecx
>> <   movl    %ecx, 4(%rdi,%rax)
>> <   movl    8(%rsi,%rax), %ecx
>> <   addl    $2, %ecx
>> <   movl    %ecx, 8(%rdi,%rax)
>> <   movl    12(%rsi,%rax), %ecx
>> <   addl    $2, %ecx
>> <   movl    %ecx, 12(%rdi,%rax)
>> ---
>>>   movslq  %edx, %rax
>>>   salq    $2, %rax
>>>   addq    %rax, %rsi
>>>   addq    %rax, %rdi
>>>   movl    (%rsi), %eax
>>>   addl    $2, %eax
>>>   movl    %eax, (%rdi)
>>>   movl    4(%rsi), %eax
>>>   addl    $2, %eax
>>>   movl    %eax, 4(%rdi)
>>>   movl    8(%rsi), %eax
>>>   addl    $2, %eax
>>>   movl    %eax, 8(%rdi)
>>>   movl    12(%rsi), %eax
>>>   addl    $2, %eax
>>>   movl    %eax, 12(%rdi)
>> Seems to me this is not deteriorating and "salq" can be saved by two forward 
>> propagation.
>>
>> Bootstrap and test on x86_64, any comments?
> The idea is good I think and the result above as well.  Leaving for Bill
> to have a look as well, otherwise OK.
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>
>> 2019-05-15  Bin Cheng  
>>
>> PR tree-optimization/57534
>> * gimple-ssa-strength-reduction.c (restructure_base_offset): New.
>> (restructure_reference): Call restructure_base_offset when offset is
>> NULL.
>> (slsr_process_ref): Handle MEM_REF.
>>
>> 2018-05-15  Bin Cheng  
>>
>> PR tree-optimization/57534
>> * gcc.dg/tree-ssa/pr57534.c: New test.
>> * gcc.dg/tree-ssa/isolate-5.c: Adjust checking strings.
>> * gcc.dg/tree-ssa/slsr-3.c: Ditto.
>> * gcc.dg/tree-ssa/ssa-hoist-4.c: Ditto.
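
For readers following the thread, the effect discussed above can be pictured at the source level.  The sketch below is only an assumption about the shape of the code involved, inferred from the quoted assembly (four loads from one array, an add of 2, four stores to another array at consecutive offsets); it is not the actual slsr-3.c test, and all function names are made up for illustration.  The second function is a hand-written equivalent that computes both bases once and then uses [base + constant offset] accesses, which is roughly what the new assembly does.

```c
#include <string.h>

/* Hypothetical source shape: each access is written as base + index.  */
void
copy_plus2_indexed (int *a, const int *b, int i)
{
  a[i] = b[i] + 2;
  a[i + 1] = b[i + 1] + 2;
  a[i + 2] = b[i + 2] + 2;
  a[i + 3] = b[i + 3] + 2;
}

/* Hand-written strength-reduced form: the two bases are computed once,
   every later access only adds a constant offset.  */
void
copy_plus2_reduced (int *a, const int *b, int i)
{
  int *pa = a + i;
  const int *pb = b + i;

  pa[0] = pb[0] + 2;
  pa[1] = pb[1] + 2;
  pa[2] = pb[2] + 2;
  pa[3] = pb[3] + 2;
}

/* Return 1 if both forms write the same values, 0 otherwise.  */
int
copy_plus2_forms_agree (void)
{
  const int src[8] = { 10, 20, 30, 40, 50, 60, 70, 80 };
  int out1[8] = { 0 };
  int out2[8] = { 0 };

  copy_plus2_indexed (out1, src, 3);
  copy_plus2_reduced (out2, src, 3);
  return memcmp (out1, out2, sizeof out1) == 0;
}
```

Both functions are semantically identical; the only point is which address arithmetic is spelled out once and which is repeated per access.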



Add stats for aliasing_component_refs_p

2019-05-16 Thread Jan Hubicka
Hi,
this patch adds stats for the aliasing_component_refs_p oracle into
the alias oracle report.  Bootstrapped/regtested x86_64-linux, committed
as obvious.

This is the dump for LTO optimizing tramp3d.

Compile time:
  refs_may_alias_p: 0 disambiguations, 0 queries
  ref_maybe_used_by_call_p: 248 disambiguations, 10137 queries
  call_may_clobber_ref_p: 108 disambiguations, 108 queries
  aliasing_component_ref_p: 49 disambiguations, 2461 queries
  TBAA oracle: 53460 disambiguations 226175 queries
   119554 are in alias set 0
   25206 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   27787 are dependent in the DAG
   168 are aritificially in conflict with void *

LTO time:
  refs_may_alias_p: 0 disambiguations, 0 queries
  ref_maybe_used_by_call_p: 6451 disambiguations, 25578 queries
  call_may_clobber_ref_p: 817 disambiguations, 817 queries
  aliasing_component_ref_p: 14 disambiguations, 12528 queries
  TBAA oracle: 1468347 disambiguations 3010562 queries
   550690 are in alias set 0
   614235 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   260937 are dependent in the DAG
   116353 are aritificially in conflict with void *

Compared to non-lto build:
  refs_may_alias_p: 0 disambiguations, 0 queries
  ref_maybe_used_by_call_p: 5929 disambiguations, 37709 queries
  call_may_clobber_ref_p: 691 disambiguations, 691 queries
  aliasing_component_ref_p: 184 disambiguations, 23792 queries
  TBAA oracle: 1733770 disambiguations 3465988 queries
   803609 are in alias set 0
   697488 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   230953 are dependent in the DAG
   168 are aritificially in conflict with void *

While this is not a very exhaustive study of TBAA oracle quality, there is
clearly room for improvement with LTO.

Honza

* tree-ssa-alias.c (alias_stats): Add
aliasing_component_refs_p_may_alias and
aliasing_component_refs_p_no_alias.
(dump_alias_stats): Print aliasing_component_refs_p stats.
(aliasing_component_refs_p): Update stats.
Index: tree-ssa-alias.c
===
--- tree-ssa-alias.c	(revision 271291)
+++ tree-ssa-alias.c	(working copy)
@@ -98,6 +98,8 @@ static struct {
   unsigned HOST_WIDE_INT ref_maybe_used_by_call_p_no_alias;
   unsigned HOST_WIDE_INT call_may_clobber_ref_p_may_alias;
   unsigned HOST_WIDE_INT call_may_clobber_ref_p_no_alias;
+  unsigned HOST_WIDE_INT aliasing_component_refs_p_may_alias;
+  unsigned HOST_WIDE_INT aliasing_component_refs_p_no_alias;
 } alias_stats;
 
 void
@@ -122,6 +124,12 @@ dump_alias_stats (FILE *s)
   alias_stats.call_may_clobber_ref_p_no_alias,
   alias_stats.call_may_clobber_ref_p_no_alias
   + alias_stats.call_may_clobber_ref_p_may_alias);
+  fprintf (s, "  aliasing_component_ref_p: "
+  HOST_WIDE_INT_PRINT_DEC" disambiguations, "
+  HOST_WIDE_INT_PRINT_DEC" queries\n",
+  alias_stats.aliasing_component_refs_p_no_alias,
+  alias_stats.aliasing_component_refs_p_no_alias
+  + alias_stats.aliasing_component_refs_p_may_alias);
   dump_alias_stats_in_alias_c (s);
 }
 
@@ -822,7 +830,16 @@ aliasing_component_refs_p (tree ref1,
   offset2 -= offadj;
   get_ref_base_and_extent (base1, &offadj, &sztmp, &msztmp, &reverse);
   offset1 -= offadj;
-  return ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2);
+  if (ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2))
+   {
+ ++alias_stats.aliasing_component_refs_p_may_alias;
+ return true;
+   }
+  else
+   {
+ ++alias_stats.aliasing_component_refs_p_no_alias;
+ return false;
+   }
 }
 
   /* If we didn't find a common base, try the other way around.  */
@@ -840,14 +857,25 @@ aliasing_component_refs_p (tree ref1,
   offset1 -= offadj;
   get_ref_base_and_extent (base2, &offadj, &sztmp, &msztmp, &reverse);
   offset2 -= offadj;
-  return ranges_maybe_overlap_p (offset1, max_size1,
-offset2, max_size2);
+  if (ranges_maybe_overlap_p (offset1, max_size1, offset2, max_size2))
+   {
+ ++alias_stats.aliasing_component_refs_p_may_alias;
+ return true;
+   }
+  else
+   {
+ ++alias_stats.aliasing_component_refs_p_no_alias;
+ return false;
+   }
 }
 
   /* In the remaining test we assume that there is no overlapping type
  at all.  So if we are unsure, we need to give up.  */
   if (same_p == -1 || same_p2 == -1)
-return true;
+{
+  ++alias_stats.aliasing_component_refs_p_may_ali
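
The counting scheme the patch adds (one counter for "may alias" answers, one for "no alias" answers, with the dump printing disambiguations and total queries) can be sketched in isolation.  This is a minimal stand-alone illustration, not GCC code: the `demo_` names and the plain `long` range check are assumptions standing in for `alias_stats` and `ranges_maybe_overlap_p`.

```c
#include <stdio.h>

/* Hypothetical counterpart of the two new alias_stats fields.  */
static struct
{
  unsigned long may_alias;
  unsigned long no_alias;
} demo_stats;

/* Do the half-open ranges [off1, off1+size1) and [off2, off2+size2)
   overlap?  A simplified stand-in for the real range check.  */
static int
demo_ranges_overlap_p (long off1, long size1, long off2, long size2)
{
  return off1 < off2 + size2 && off2 < off1 + size1;
}

/* Answer the query and bump exactly one counter per query.  */
int
demo_may_alias_p (long off1, long size1, long off2, long size2)
{
  if (demo_ranges_overlap_p (off1, size1, off2, size2))
    {
      ++demo_stats.may_alias;
      return 1;
    }
  ++demo_stats.no_alias;
  return 0;
}

unsigned long
demo_disambiguations (void)
{
  return demo_stats.no_alias;
}

unsigned long
demo_queries (void)
{
  return demo_stats.no_alias + demo_stats.may_alias;
}

/* Print in the same "disambiguations, queries" style as the report.  */
void
demo_dump_stats (FILE *s)
{
  fprintf (s, "  demo_may_alias_p: %lu disambiguations, %lu queries\n",
	   demo_disambiguations (), demo_queries ());
}
```

A "disambiguation" here is a query answered with "no alias", so the ratio of the two printed numbers is the fraction of queries the oracle could resolve.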

OpenACC Profiling Interface: 'acc_register_library' (was: OpenACC 2.5 Profiling Interface)

2019-05-16 Thread Thomas Schwinge
Hi Jakub!

On Sun, 11 Nov 2018 22:31:42 -0600, I wrote:
> On Tue, 28 Feb 2017 18:43:36 +0100, I wrote:
> > Version 2.5 of the OpenACC standard added a new chapter "Profiling
> > Interface".
> 
> I'd like to get that into trunk.  It's not yet complete (that is, doesn't
> provide all the information specified), but it's very useful already, and
> the missing pieces can later be added incrementally.
> 
> Jakub, would you please especially review the non-OpenACC-specific
> changes here, including the libgomp ABI changes?

Given a baseline that I've not yet posted ;-) would you please anyway
have a look at the following changes?  Is it OK to add/handle the
'acc_register_library' symbol in this way?  The idea behind that one is
that you dynamically (including via 'LD_PRELOAD') link your code against
a "library" providing an implementation of 'acc_register_library', or
even define it in your user code (see the test case below), and then upon
initialization, "The OpenACC runtime will invoke 'acc_register_library',
passing [...]".

As far as I can tell, it was never a concern (neither to us internally, nor
did anybody external ever complain) that 'acc_*' and 'GOACC_*' symbols
are visible when building with '-fopenmp' but (default) '-fno-openacc',
and vice versa, 'omp_*' and 'GOMP_*' symbols are visible when building
with '-fopenacc' but (default) '-fno-openmp'.  But,
'acc_register_library' is special in that the runtime (libgomp) will
unconditionally call it, also for '-fopenmp' but (default)
'-fno-openacc'.  So, when OpenMP user code happens to contain an
(unrelated) 'acc_register_library' symbol, strange things will happen.

OpenACC states that "Typically, the OpenACC runtime will include a _weak_
definition of 'acc_register_library', which does nothing and which will
be called when there is no tools library".  I'm not sure if that's "weak"
specifically in the ELF linking sense, or just generally "weak"
semantics.  But it seemed easy enough to just provide a regular symbol in
its own '*.o' file, to be overridden in both the dynamic and static
linking cases, so that's what I've done.  Any comments to that aspect?
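
To make the "weak definition" reading concrete, here is a minimal sketch of a default registration stub that a tools library could override.  It assumes GCC or Clang on an ELF target for `__attribute__ ((weak))`, and every `demo_` name is hypothetical; it is not the libgomp implementation and does not use the real `acc_prof_reg` types.

```c
#include <stdio.h>

/* Hypothetical callback type standing in for the registration hooks.  */
typedef void (*demo_reg_fn) (int event);

/* Counts how often the default stub ran (for illustration only).  */
static int demo_stub_calls;

/* Weak default: does nothing useful.  A strong definition of the same
   symbol elsewhere in the link would be used instead of this one.  */
__attribute__ ((weak)) void
demo_register_library (demo_reg_fn reg, demo_reg_fn unreg)
{
  (void) reg;
  (void) unreg;
  ++demo_stub_calls;
}

static void
demo_register (int event)
{
  printf ("register event %d\n", event);
}

static void
demo_unregister (int event)
{
  printf ("unregister event %d\n", event);
}

/* The runtime calls the hook unconditionally during initialization and
   returns how many times the default stub was reached so far.  */
int
demo_runtime_initialize (void)
{
  demo_register_library (demo_register, demo_unregister);
  return demo_stub_calls;
}
```

With nothing else in the link defining `demo_register_library`, the stub is what gets called; that unconditional call is also why an unrelated user symbol of the same name would be picked up.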

--- libgomp/Makefile.am
+++ libgomp/Makefile.am
@@ -66,7 +66,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c oacc-init.c \
oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c \
-   oacc-profiling.c
+   oacc-profiling.c oacc-profiling-acc_register_library.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
--- libgomp/acc_prof.h
+++ libgomp/acc_prof.h
@@ -235,6 +235,9 @@ extern void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t)
 typedef void (*acc_query_fn) ();
 typedef acc_query_fn (*acc_prof_lookup_func) (const char *);
 extern acc_query_fn acc_prof_lookup (const char *) __GOACC_NOTHROW;
+/* Don't tag 'acc_register_library' as '__GOACC_NOTHROW': this function can be
+   overridden by the application, and must be expected to do anything.  */
+extern void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);
 
 
 #ifdef __cplusplus
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -469,6 +469,7 @@ OACC_2.5 {
acc_prof_lookup;
acc_prof_register;
acc_prof_unregister;
+   acc_register_library;
acc_update_device_async;
acc_update_device_async_32_h_;
acc_update_device_async_64_h_;
--- /dev/null
+++ libgomp/oacc-profiling-acc_register_library.c
@@ -0,0 +1,40 @@
+/* OpenACC Profiling Interface: stub 'acc_register_library' function
+[...]
+
+#include "libgomp.h"
+#include "acc_prof.h"
+
+/* This is in its own file so that this function definition can be overridden
+   when linking statically.  */
+
+void
+acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
+ acc_prof_lookup_func lookup)
+{
+  gomp_debug (0, "dummy %s\n", __FUNCTION__);
+}
--- libgomp/oacc-profiling.c
+++ libgomp/oacc-profiling.c
@@ -107,6 +107,12 @@ goacc_profiling_initialize (void)
   /* ..., but profiling is still disabled.  */
   __atomic_store_n (&goacc_prof_enabled, false, MEMMODEL_RELAXED);
 
+  /* We are to invoke an external acc_register_library routine, defaulting to
+ our stub oacc-profiling-acc_register_library.c:acc_register_library
+ implementation.  */
+  gomp_debug (0, "%s: calling acc_register_library\n", __FUNCTION__);
+  acc_register_library (acc_prof_register, acc_prof_unregister,
+   acc_prof_lookup);
 #ifdef PLUGIN_SUPPORT
   char *acc_proflibs = secure_getenv ("ACC_PROFLIB");
   while (acc_proflibs != NULL && acc_proflibs[0] != '\0')
@@ -139,16 +145,24 @@ goacc_profiling_initialize (void)
  void 

Re: [PATCH] fortran: C++ support for generating C prototypes

2019-05-16 Thread Jakub Jelinek
On Wed, May 15, 2019 at 10:41:06PM +0300, Janne Blomqvist wrote:
> > 2) I don't think float _Complex is
> >    passed the same as std::complex<float> and similar for others;
> >    std::complex<float> is in libstdc++ a C++ class with
> >    __complex__ float as its sole non-static data member and with non-trivial
> >    constructors; which means it is passed/returned via a hidden reference;
> >    when the argument is actually FLOAT_COMPLEX * or FLOAT_COMPLEX &,
> >    you don't have to care that much, except for aliasing, but if
> >    that complex argument has VALUE attribute in Fortran and so the
> >    C prototype would be FLOAT_COMPLEX, then std::complex<float> is
> >    passed in the end as std::complex<float> &, while float _Complex
> >    the same as __complex__ float; and ditto for functions returning
> >    complex
> 
> Ugh, I guess that's right. Any good way around it? Except print a
> warning in the header that passing std::complex<> by value doesn't
> work?

Perhaps we can use different macros for the two cases, define
__GFORTRAN_FLOAT_COMPLEX to std::complex<float> and
__GFORTRAN_FLOAT_COMPLEX_VALUE to __complex__ float and use the
former in the __GFORTRAN_FLOAT_COMPLEX * arguments and
the latter for VALUE dummy args and return value.

For the return value case, guess it will be fine, when one does
__complex__ float fortran_fn_ (void);
std::complex<float> ret = fortran_fn_ ();
it will work just fine, but for the arguments trying to
void fortran_sub_ (__complex__ float);
std::complex<float> f = 2.0f + 4.0fi;
fortran_sub_ (f);
will not work (but one will get at least errors).  One can use non-standard
fortran_sub_ (f.__rep ());

Jakub


Re: [PATCH 0/12] detect quoting and punctuation problems in diagnostics

2019-05-16 Thread Martin Sebor

On 5/16/19 4:02 AM, Martin Liška wrote:

Hi.

Maybe I've installed the patches wrongly, but I see the following error on
ppc64le during bootstrap in stage2:


I also noticed it yesterday on x86_64.  The %qk was a vestige of
an earlier attempt to use the pretty-printer to format TREE_CODEs.
I have this in my tree that fixes it but let me post an updated
patch.

--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -10009,8 +10009,10 @@ void
 omp_clause_check_failed (const_tree node, const char *file, int line,
  const char *function, enum omp_clause_code code)
 {
-  internal_error ("tree check: expected omp_clause %s, have %s in %s, at %s:%d",
- omp_clause_code_name[code], get_tree_code_name (TREE_CODE (node)),
+  internal_error ("tree check: expected %, have %qs "
+ "in %s, at %s:%d",
+ omp_clause_code_name[code],
+ get_tree_code_name (TREE_CODE (node)),
  function, trim_filename (file), line);
 }


As a heads up, my latest log still shows a few testsuite failures
that I need to clean up. Those I've looked at are all missing
adjustments to expected dg-warning output.

!  FAIL: 20_util/any/misc/any_cast_neg.cc (3: +3)
!  FAIL: gcc.dg/gcc_diag-11.c (1: +1)
!  FAIL: g++.dg/ubsan/pr63956.C (21: +21)
!  FAIL: gnat.dg/inline3.adb (2: +2)
!  FAIL: gnat.dg/inline5.adb (2: +2)
!  FAIL: gnat.dg/inline7.adb (2: +2)
!  FAIL: gnat.dg/inline9.adb (2: +2)
!  FAIL: objc.dg/method-19.m (2: +2)
!  FAIL: objc.dg/protocol-qualifier-2.m (2: +2)
!  FAIL: obj-c++.dg/protocol-qualifier-2.mm (2: +2)

Martin



/home/marxin/Programming/gcc/objdir/./prev-gcc/xg++ 
-B/home/marxin/Programming/gcc/objdir/./prev-gcc/ 
-B/usr/local/powerpc64le-unknown-linux-gnu/bin/ -nostdinc++ 
-B/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs
 
-B/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
  
-I/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu
  
-I/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/include
  -I/home/marxin/Programming/gcc/libstdc++-v3/libsupc++ 
-L/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/src/.libs
 
-L/home/marxin/Programming/gcc/objdir/prev-powerpc64le-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
 -fno-PIE -c   -g -O2 -fno-checking -gtoggle -DIN_GCC -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. 
-I../../gcc -I../../gcc/. -I../../gcc/../include -I../../gcc/../libcpp/include  
-I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/dpd -I../libdecnumber 
-I../../gcc/../libbacktrace   -o tree.o -MT tree.o -MMD -MP -MF 
./.deps/tree.TPo ../../gcc/tree.c
../../gcc/tree.c: In function ‘void omp_clause_check_failed(const_tree, const 
char*, int, const char*, omp_clause_code)’:
../../gcc/tree.c:10012:67: error: unknown conversion type character ‘k’ in 
format [-Werror=format=]
10012 |   internal_error ("tree check: expected %, have %qk "
   |   ^
../../gcc/tree.c:10013:10: error: format ‘%s’ expects argument of type ‘char*’, 
but argument 3 has type ‘int’ [-Werror=format=]
10013 | "in %s, at %s:%d",
   | ~^
   |  |
   |  char*
   | %d
../../gcc/tree.c:10013:20: error: format ‘%d’ expects argument of type ‘int’, 
but argument 5 has type ‘const char*’ [-Werror=format=]
10013 | "in %s, at %s:%d",
   |   ~^
   ||
   |int
   |   %s
10014 | omp_clause_code_name[code], TREE_CODE (node),
10015 | function, trim_filename (file), line);
   |   
   | |
   | const char*
../../gcc/tree.c:10012:19: error: too many arguments for format 
[-Werror=format-extra-args]
10012 |   internal_error ("tree check: expected %, have %qk "
   |   ^~~
10013 | "in %s, at %s:%d",
   | ~

Martin





patches to detect GCC diagnostics

2019-05-16 Thread Roland Illig
Hi Martin,

I'm impressed how much work you have put into the patches for detecting
nonoptimal diagnostics. It takes a long time to read through the
patches, but it's worth it, knowing that it took much longer for you to
prepare the patch, and that I won't have to submit i18n bug reports in
the foreseeable future. :)



+  /* Diagnose "arg" (short for "argument" when lazy).  */
+  if (!strncmp (format_chars, "arg", 3)
+ && (!format_chars[3]
+ || format_chars[3] == 's'
+ || ISSPACE (format_chars[3])))

Wouldn't it be sufficient to just check for !ISALNUM(format_chars[3])?
This would also catch "specify args, return type and so on".
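
To compare the two conditions, here is a small stand-alone sketch.  It uses the standard `isalnum`/`isspace` from `<ctype.h>` in place of GCC's `ISALNUM`/`ISSPACE` macros, and the function names are invented for the comparison.  If I read the conditions correctly, the plain not-alphanumeric test additionally catches "arg," but, taken alone, no longer catches "args", because the 's' at index 3 is itself alphanumeric.

```c
#include <ctype.h>
#include <string.h>

/* Condition as in the quoted hunk: "arg" followed by end of string,
   an 's', or whitespace.  */
int
arg_check_patch (const char *s)
{
  return strncmp (s, "arg", 3) == 0
	 && (s[3] == '\0' || s[3] == 's' || isspace ((unsigned char) s[3]));
}

/* Suggested alternative: "arg" followed by anything that is not a
   letter or digit (this includes the terminating NUL).  */
int
arg_check_not_alnum (const char *s)
{
  return strncmp (s, "arg", 3) == 0 && !isalnum ((unsigned char) s[3]);
}
```

Keeping the explicit 's' case next to the not-alphanumeric test would cover both "arg," and "args,".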

I didn't like the magic "n == 3", but after experimenting a bit, I came
to the conclusion that the code you wrote is the best possible.

typo: ponters should be pointers

typo: drective should be directive

Since the expression strncmp(str, sub, sizeof sub - 1) occurs quite often
in your code, I was wondering whether it would be useful to
declare str_startswith, which expresses the actual intent more directly.
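
A minimal sketch of such a helper, assuming a plain function rather than a macro; the name is the one proposed above, the signature is my guess.  It uses `strlen` on the prefix instead of `sizeof sub - 1`, so it also works when the prefix is not a string literal.

```c
#include <string.h>

/* Return nonzero if STR begins with PREFIX.  An empty PREFIX matches
   every string.  */
int
str_startswith (const char *str, const char *prefix)
{
  return strncmp (str, prefix, strlen (prefix)) == 0;
}
```

A call then reads `str_startswith (format_chars, "arg")`, with no length constant to keep in sync with the literal.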

> nchars > 1

Better use ngettext in these 7 cases, to account for multiple plural
forms in Arabic, Polish and Russian. :)
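
As an illustration of the plural-form point, here is a small sketch using the plain `ngettext` from `<libintl.h>` (glibc assumed); GCC's diagnostics go through their own wrappers, so treat the function name and messages as hypothetical.  With no translation catalog loaded, `ngettext` returns the first string for a count of 1 and the second otherwise; with a catalog, the translation's own plural rule decides.

```c
#include <libintl.h>
#include <stdio.h>

/* Format "N character(s)" into BUF, letting ngettext choose the form
   for the given count.  Returns what snprintf returns.  */
int
format_char_count (char *buf, size_t size, unsigned long nchars)
{
  const char *fmt = ngettext ("%lu character", "%lu characters", nchars);
  return snprintf (buf, size, fmt, nchars);
}
```

The `nchars > 1` test in the caller disappears; the count is passed both to select the form and to fill the directive.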

> +  /* Diagnose a backtick (grave accent).  */

This diagnostic should explain how to fix this one since it might be
non-obvious.

typo: /* Likewise for gimple.  */ -- should be cgraph_node

typo: be  cdiagnosed -- spurious whitespace? ;)

possible typo: arn't

there is a FIXME after "you can%'t do that"

"ignoring %-specifier for non-static local " might be wrong, as
the word "asm-specifier" might come from the C or C++ grammar. Should
this be "% specifier", with a space?

Oh no. "%qE is not an % qualifier" might destroy my hopes of
merging diagnostics with the same pattern. If some of them need to be
prefixed with "a" and some others with "an", they cannot be merged. Or I
need to make an exception when the "before" string ends in "a" or "an".
Luckily, for "the" and "the" only the pronunciation differs but not the
spelling.

-   "%qE attribute argument %E is not in the range [0, %E)",
-   name, val, align);
+   "%qE attribute argument %E is not in the range %s, %E%c",
+   name, val, "[0", align, ')');

I don't like this one as it is asymmetrical. Either both characters
should be passed as %c, or none of them. I prefer passing none of them
to make the string easier to read for translators.

+   "unsuffixed floating constant");

I'd rather write "unsuffixed floating point constant". (appears multiple
times)

-  warning (OPT_Wpragmas, "#pragma redefine_extname ignored due to "
-  "conflict with __asm__ declaration");
+  warning (OPT_Wpragmas, "%<#pragma redefine_extname%> ignored "
+  "due to conflict with % declaration");

Are you sure that you want to remove the underscores? Just asking, I
haven't checked the surrounding code.

- error ("#pragma GCC target string... is badly formed");
+ error ("%<#pragma GCC target%> string is badly formed");
- error ("#pragma GCC optimize string... is badly formed");
+ error ("%<#pragma GCC optimize%> string is badly formed");

I think the "string..." was supposed to go inside the quotes.

+  warning (0, "%s:tag %qx is invalid", filename, tag);

I think there should be a space after the colon, but that should be in
another commit. This one is already big enough.

-inform (cloc, "%s%#qD ", msg, fn);
+inform (cloc, "%s%#qD %s", msg, fn, "");

This change and the similar ones around this will prevent the "" string from being translated at all. These strings should stay
in the format string, there needs to be a different solution for them.

-  error_at (loc, "typeid-expression is not a constant expression "
+  error_at (loc, "% is not a constant expression "

This sounds like a term from the C++ grammar.

+  inform (loc, "in C++11 destructors default to %");

grammar: "defaulting to"? A few lines below there is "because
destructors default", where this word is correct, so it may be a
copy-and-paste mistake.

+  inform (input_location, "  enters % statement");

According to https://en.cppreference.com/w/cpp/language/if#Constexpr_If,
the term "constexpr if statement" is a single word.

   error ("explicitly defaulted function %q+D cannot be declared "
 "as % because the implicit declaration is not "
-"%:", fn);
+"%qs:", fn, "constexpr");

I don't understand why you extracted one of the % but left
the other one in the message.

+  "an %-specification is not allowed "

See somewhere above.

+  error ("requested % %i is out of range [0, %i]",
+ pri, MAX_INIT_PRIORITY);

The other places use %u for MAX_INIT_PRIORITY since its value is 65535. I
don't know whether GCC would work at all on platforms where %d is 16
bit, as it would be hard to address hundreds of megabytes of heap in
such an environment.

+  (0, "requested % %i is reserved for internal use",
+  pri);

Same.

+for real_option in -Wstr

Re: [PATCH] True IPA reimplementation of IPA-SRA

2019-05-16 Thread Martin Jambor
Hi Richi,

On Thu, May 16 2019, Richard Biener wrote:
> On Fri, May 10, 2019 at 10:31 AM Martin Jambor  wrote:
>>
>> Hello,
>>
>> this is a follow-up from a WIP patch I sent here in late December:
>> https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01765.html
>>
>> Just like the last time, the patch below is a reimplementation of
>> IPA-SRA to make it a full IPA pass that can handle strongly connected
>> components in the call-graph, can take advantage of LTO and does not
>> weirdly switch functions in pass-pipeline like our current quasi-IPA SRA
>> does.  Unlike the current IPA-SRA it can also remove return values, even
>> in SCCs.  On the other hand, it is less powerful when it comes to
>> structures passed by reference.  By design it will not create references
>> to bits of an aggregate because that turned out to be just obfuscation
>> in practice.  However, it also cannot usually split aggregates passed by
>> reference that are just passed to another function (where splitting
>> would be useful) because it cannot perform the same TBAA analysis like
>> the current implementation which already knows what types it should look
>> at because it has access to bodies of all functions it attempts to modify.
>
> So that's just because the analysis is imperfect?  I mean if we can handle
>
>  foo (X *p) { do_something (p->a); }
>  X a; a.a = 1; foo (&a);
>
> then we should be able to handle
>
>  bar (X *p) { foo (p); }
>  X a; a.a = 1; bar (&a);

So because the call to foo dominates EXIT and uses default definition
MEM SSA, this example would be handled fine by the patch.  But it cannot
handle for example (assuming p->a is an int):

   bar (X *p) { *global_double_ptr = 0.0; foo (p); }

The current IPA-SRA can, because at the time it looks at foo, bar has
been already processed, and so it knows the load is of integer type.  If
necessary, we could try TBAA for fields in X if there is a reasonable
number of them and at IPA level just check a flag saying that bar does
not engage in some type-punning.
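
For readers less familiar with the pass, the basic transformation on the first example can be written out by hand.  This is a conceptual sketch only: the struct layout, the `_isra` name and the return values are assumptions added so the two versions can be compared, and the code is not output of the patch.

```c
/* Hypothetical aggregate; only field 'a' is read by foo.  */
typedef struct
{
  int a;
  int b;
} X;

static int
do_something (int v)
{
  return v * 2;
}

/* Before: the aggregate is passed by reference, one field is loaded.  */
static int
foo (X *p)
{
  return do_something (p->a);
}

int
caller_before (void)
{
  X x;
  x.a = 1;
  x.b = 0;
  return foo (&x);
}

/* After (hand-written): the callee takes the scalar directly and the
   load of x.a has moved into the caller.  */
static int
foo_isra (int a)
{
  return do_something (a);
}

int
caller_after (void)
{
  X x;
  x.a = 1;
  x.b = 0;
  return foo_isra (x.a);
}
```

Moving the load into the caller is only valid if nothing between function entry and the original load could have changed `p->a`, which is where the TBAA question in the `global_double_ptr` example comes from.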


Another example would be something like:

   bar (X *p) { if (cond) bar (p); else do_something_else (p->a); }

The problem here is that the check if p is sure to be dereferenced when
bar is called is also done at the intra-procedural level.  Well, it is
not actually a test if p is dereferenced but if the offset from p which
is known to be dereferenced covers p->a.  We could do it symbolically,
arrive at some expression of the form

  MIN(offsetof(a), known_dereference_offset_in (bar))

store that to the IPA summary and then evaluate at IPA level.  If we
think that it is worth it.

Still, I don't think the situation is that much worse in practice
because IPA-SRA can only handle fairly simple cases anyway, and those
are actually often taken care of by indirect inlining.

>
> Thanks for doing this.  I wonder how difficult it is to split the
> patch into a) old IPA-SRA removal, b) refactoring c) IPA-SRA add
> (probably easiest in that order).  It's quite a large number of
> changes, a) being mostly uninteresting (and pre-approved hereby, just
> not independently, of course), b) is uninteresting to me, but I would
> like to look at c), not sure if that's really only the new file,
> probably not since IPA modifications have infrastructure bits.

The analysis parts of the new IPA-SRA, both the intra-procedural and
inter-procedural are entirely in the new file ipa-sra.c so if you want
to review that, just grab that file from
https://gcc.gnu.org/git/?p=gcc.git;a=tree;h=refs/heads/jamborm/ipa-sra;hb=refs/heads/jamborm/ipa-sra

The transformation parts, however, are what the "refactoring" is really
about because it is not the pass but rather the cgraph cloning
infrastructure that performs the actual transformation.  This is so on
purpose because not only the bodies of changed functions need to be
adjusted but also all calls to them, and you cannot register a
pass-specific transformation function for that - and I need to actually
pass information from the body transformation to the call transformation
anyway.

So yes, this split would be possible and perhaps even easy but it would
not make much of a difference.

Thanks,

Martin


>
> Sorry for not mentioning earlier.  Maybe just splitting out a) already
> helps (you seem to remove code not in tree-sra.c).
>
> Thanks,
> Richard.
>
>> Martin
>>
>>
>>
>> 2019-05-09  Martin Jambor  
>>
>> * coretypes.h (cgraph_edge): Declare.
>> * ipa-param-manipulation.c: Rewrite.
>> * ipa-param-manipulation.h: Likewise.
>> * Makefile.in (GTFILES): Added ipa-param-manipulation.h and ipa-sra.c.
>> (OBJS): Added ipa-sra.o.
>> * cgraph.h (ipa_replace_map): Removed fields old_tree, replace_p
>> and ref_p, added fields param_adjustments and performed_splits.
>> (struct cgraph_clone_info): Remove args_to_skip and
>> combined_args_to_skip, new field param_adjustments.
>> (cgraph_node::create_clone): Changed parameters to use
>>  

Re: [PATCH] OpenMP simd if clause support with runtime determined argument (take 2)

2019-05-16 Thread Richard Biener
On Thu, 16 May 2019, Jakub Jelinek wrote:

> On Thu, May 16, 2019 at 11:30:38AM +0200, Richard Biener wrote:
> > > note_simd_array_uses indeed does walk the IL and does look at the calls,
> > > > but I'd need some data structure where to store the argument; we don't have
> > > > loop_vinfo yet (we don't have it even before the loop over vector sizes),
> > > > adding another tree to struct loop seems undesirable from memory usage POV,
> > > > we'd need it just for the duration between note_simd_array_uses and
> > > > the actual loop_vinfo creation; so would you prefer some extra hash_map for
> > > > that?
> > 
> > Maybe that or move it to the _loop_vec_info constructor which also
> > walks over the loop body for setting UIDs and creating stmt infos?
> 
> Good idea.  So, here is an updated patch that does that and does a fatal punt
> in vect_analyze_loop_2 for if (0) which means we don't even try other vector
> sizes in that case.
> 
> > OK, I see.  Indeed in theory something could sink the def which I'd
> > call a bug - so maybe a gcc_checking_assert that this doesn't
> > happen would be nice.
> 
> I need to compute the bb, so I've used flag_checking guarded gcc_assert if
> it is ok.
> 
> > > > build_zero_cst (TREE_TYPE (version_simd_if_cond))
> > > 
> > > Is that better (build_zero_cst is a wrapper that will call build_int_cst
> > > with 0)?  A lot of code calls build_int_cst directly.  Don't care much
> > > though.
> > 
> > it's just shorter... ;)
> 
> And this too.
> 
> Ok for trunk if it passes bootstrap/regtest?

OK with a slight adjustment below to the dominator test

Thanks,
Richard.

> 2019-05-16  Jakub Jelinek  
> 
>   * omp-low.c (lower_rec_input_clauses): If OMP_CLAUSE_IF
>   has non-constant expression, force sctx.lane and use two
>   argument IFN_GOMP_SIMD_LANE instead of single argument.
>   * tree-ssa-dce.c (eliminate_unnecessary_stmts): Don't DCE
>   two argument IFN_GOMP_SIMD_LANE without lhs.
>   * tree-vectorizer.h (struct _loop_vec_info): Add simd_if_cond
>   member.
>   (LOOP_VINFO_SIMD_IF_COND, LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND):
>   Define.
>   (LOOP_REQUIRES_VERSIONING): Or in
>   LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND.
>   * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>   simd_if_cond.
>   (vect_analyze_loop_2): Punt if LOOP_VINFO_SIMD_IF_COND is constant 0.
>   * tree-vect-loop-manip.c (vect_loop_versioning): Add runtime check
>   from simd if clause if needed.
> 
>   * gcc.dg/vect/vect-simd-1.c: New test.
>   * gcc.dg/vect/vect-simd-2.c: New test.
>   * gcc.dg/vect/vect-simd-3.c: New test.
>   * gcc.dg/vect/vect-simd-4.c: New test.
> 
> --- gcc/omp-low.c.jj  2019-05-15 23:42:16.046859954 +0200
> +++ gcc/omp-low.c 2019-05-16 15:04:41.785179634 +0200
> @@ -3783,6 +3783,7 @@ lower_rec_input_clauses (tree clauses, g
>tree simt_lane = NULL_TREE, simtrec = NULL_TREE;
>tree ivar = NULL_TREE, lvar = NULL_TREE, uid = NULL_TREE;
>gimple_seq llist[3] = { };
> +  tree nonconst_simd_if = NULL_TREE;
>  
>copyin_seq = NULL;
>sctx.is_simt = is_simd && omp_find_clause (clauses, OMP_CLAUSE__SIMT_);
> @@ -3814,6 +3815,8 @@ lower_rec_input_clauses (tree clauses, g
>   case OMP_CLAUSE_IF:
> if (integer_zerop (OMP_CLAUSE_IF_EXPR (c)))
>   sctx.max_vf = 1;
> +   else if (TREE_CODE (OMP_CLAUSE_IF_EXPR (c)) != INTEGER_CST)
> + nonconst_simd_if = OMP_CLAUSE_IF_EXPR (c);
> break;
>  case OMP_CLAUSE_SIMDLEN:
> if (integer_onep (OMP_CLAUSE_SIMDLEN_EXPR (c)))
> @@ -5190,6 +5193,17 @@ lower_rec_input_clauses (tree clauses, g
>if (known_eq (sctx.max_vf, 1U))
>  sctx.is_simt = false;
>  
> +  if (nonconst_simd_if)
> +{
> +  if (sctx.lane == NULL_TREE)
> + {
> +   sctx.idx = create_tmp_var (unsigned_type_node);
> +   sctx.lane = create_tmp_var (unsigned_type_node);
> + }
> +  /* FIXME: For now.  */
> +  sctx.is_simt = false;
> +}
> +
>if (sctx.lane || sctx.is_simt)
>  {
>uid = create_tmp_var (ptr_type_node, "simduid");
> @@ -5219,8 +5233,9 @@ lower_rec_input_clauses (tree clauses, g
>  }
>if (sctx.lane)
>  {
> -  gimple *g
> - = gimple_build_call_internal (IFN_GOMP_SIMD_LANE, 1, uid);
> +  gimple *g = gimple_build_call_internal (IFN_GOMP_SIMD_LANE,
> +   1 + (nonconst_simd_if != NULL),
> +   uid, nonconst_simd_if);
>gimple_call_set_lhs (g, sctx.lane);
>gimple_stmt_iterator gsi = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
>gsi_insert_before_without_update (&gsi, g, GSI_SAME_STMT);
> --- gcc/tree-ssa-dce.c.jj 2019-05-15 23:36:35.696258741 +0200
> +++ gcc/tree-ssa-dce.c2019-05-16 15:04:41.786179618 +0200
> @@ -1328,12 +1328,16 @@ eliminate_unnecessary_stmts (void)
> update_stmt (stmt);
> 

[PATCH] OpenMP simd if clause support with runtime determined argument (take 2)

2019-05-16 Thread Jakub Jelinek
On Thu, May 16, 2019 at 11:30:38AM +0200, Richard Biener wrote:
> > note_simd_array_uses indeed does walk the IL and does look at the calls,
> > but I'd need some data structure where to store the argument; we don't have
> > loop_vinfo yet (we don't have it even before the loop over vector sizes),
> > adding another tree to struct loop seems undesirable from memory usage POV,
> > we'd need it just for the duration between note_simd_array_uses and
> > the actual loop_vinfo creation; so would you prefer some extra hash_map for
> > that?
> 
> Maybe that or move it to the _loop_vec_info constructor which also
> walks over the loop body for setting UIDs and creating stmt infos?

Good idea.  So, here is an updated patch that does that and does a fatal punt
in vect_analyze_loop_2 for if (0) which means we don't even try other vector
sizes in that case.
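
For readers who have not followed the earlier thread, a rough source-level picture of what a runtime-determined simd if clause asks for is sketched below.  The function names are hypothetical, and the second function is only a hand-written approximation of loop versioning, not what the vectorizer actually generates.  Without OpenMP flags the pragmas are simply ignored and both functions compute the same result.

```cpp
#include <cassert>

// The user-facing form: vectorize only when use_simd is true at run time.
void scale_add(float *y, const float *x, float a, int n, bool use_simd)
{
  #pragma omp simd if(use_simd)
  for (int i = 0; i < n; ++i)
    y[i] += a * x[i];
}

// Approximate hand-versioned equivalent: one copy of the loop that may be
// vectorized, one scalar copy, selected by a runtime check.
void scale_add_versioned(float *y, const float *x, float a, int n,
                         bool use_simd)
{
  if (use_simd)
    {
      #pragma omp simd
      for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
    }
  else
    {
      for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
    }
}
```

When the condition folds to constant 0 there is nothing to version, which is presumably why the patch punts in that case rather than trying other vector sizes.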

> OK, I see.  Indeed in theory something could sink the def which I'd
> call a bug - so maybe a gcc_checking_assert that this doesn't
> happen would be nice.

I need to compute the bb, so I've used flag_checking guarded gcc_assert if
it is ok.

> > > build_zero_cst (TREE_TYPE (version_simd_if_cond))
> > 
> > Is that better (build_zero_cst is a wrapper that will call build_int_cst
> > with 0)?  A lot of code calls build_int_cst directly.  Don't care much
> > though.
> 
> it's just shorter... ;)

And this too.

Ok for trunk if it passes bootstrap/regtest?

2019-05-16  Jakub Jelinek  

* omp-low.c (lower_rec_input_clauses): If OMP_CLAUSE_IF
has non-constant expression, force sctx.lane and use two
argument IFN_GOMP_SIMD_LANE instead of single argument.
* tree-ssa-dce.c (eliminate_unnecessary_stmts): Don't DCE
two argument IFN_GOMP_SIMD_LANE without lhs.
* tree-vectorizer.h (struct _loop_vec_info): Add simd_if_cond
member.
(LOOP_VINFO_SIMD_IF_COND, LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND):
Define.
(LOOP_REQUIRES_VERSIONING): Or in
LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
simd_if_cond.
(vect_analyze_loop_2): Punt if LOOP_VINFO_SIMD_IF_COND is constant 0.
* tree-vect-loop-manip.c (vect_loop_versioning): Add runtime check
from simd if clause if needed.

* gcc.dg/vect/vect-simd-1.c: New test.
* gcc.dg/vect/vect-simd-2.c: New test.
* gcc.dg/vect/vect-simd-3.c: New test.
* gcc.dg/vect/vect-simd-4.c: New test.

--- gcc/omp-low.c.jj2019-05-15 23:42:16.046859954 +0200
+++ gcc/omp-low.c   2019-05-16 15:04:41.785179634 +0200
@@ -3783,6 +3783,7 @@ lower_rec_input_clauses (tree clauses, g
   tree simt_lane = NULL_TREE, simtrec = NULL_TREE;
   tree ivar = NULL_TREE, lvar = NULL_TREE, uid = NULL_TREE;
   gimple_seq llist[3] = { };
+  tree nonconst_simd_if = NULL_TREE;
 
   copyin_seq = NULL;
   sctx.is_simt = is_simd && omp_find_clause (clauses, OMP_CLAUSE__SIMT_);
@@ -3814,6 +3815,8 @@ lower_rec_input_clauses (tree clauses, g
case OMP_CLAUSE_IF:
  if (integer_zerop (OMP_CLAUSE_IF_EXPR (c)))
sctx.max_vf = 1;
+ else if (TREE_CODE (OMP_CLAUSE_IF_EXPR (c)) != INTEGER_CST)
+   nonconst_simd_if = OMP_CLAUSE_IF_EXPR (c);
  break;
 case OMP_CLAUSE_SIMDLEN:
  if (integer_onep (OMP_CLAUSE_SIMDLEN_EXPR (c)))
@@ -5190,6 +5193,17 @@ lower_rec_input_clauses (tree clauses, g
   if (known_eq (sctx.max_vf, 1U))
 sctx.is_simt = false;
 
+  if (nonconst_simd_if)
+{
+  if (sctx.lane == NULL_TREE)
+   {
+ sctx.idx = create_tmp_var (unsigned_type_node);
+ sctx.lane = create_tmp_var (unsigned_type_node);
+   }
+  /* FIXME: For now.  */
+  sctx.is_simt = false;
+}
+
   if (sctx.lane || sctx.is_simt)
 {
   uid = create_tmp_var (ptr_type_node, "simduid");
@@ -5219,8 +5233,9 @@ lower_rec_input_clauses (tree clauses, g
 }
   if (sctx.lane)
 {
-  gimple *g
-   = gimple_build_call_internal (IFN_GOMP_SIMD_LANE, 1, uid);
+  gimple *g = gimple_build_call_internal (IFN_GOMP_SIMD_LANE,
+ 1 + (nonconst_simd_if != NULL),
+ uid, nonconst_simd_if);
   gimple_call_set_lhs (g, sctx.lane);
   gimple_stmt_iterator gsi = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
   gsi_insert_before_without_update (&gsi, g, GSI_SAME_STMT);
--- gcc/tree-ssa-dce.c.jj   2019-05-15 23:36:35.696258741 +0200
+++ gcc/tree-ssa-dce.c  2019-05-16 15:04:41.786179618 +0200
@@ -1328,12 +1328,16 @@ eliminate_unnecessary_stmts (void)
  update_stmt (stmt);
  release_ssa_name (name);
 
- /* GOMP_SIMD_LANE or ASAN_POISON without lhs is not
-needed.  */
+ /* GOMP_SIMD_LANE (unless two argument) or ASAN_POISON
+without lhs is not needed.  */
  if (gimple_call_inter

Re: [PATCH] Remove redundant accessors in hash tables

2019-05-16 Thread Jonathan Wakely

On 16/05/19 13:30 +0100, Jonathan Wakely wrote:

On 16/05/19 11:05 +0100, Jonathan Wakely wrote:

On 16/05/19 07:47 +0200, François Dumont wrote:

On 5/15/19 5:37 PM, Jonathan Wakely wrote:

François,
I noticed that _Hash_code_base and _Hashtable_base have a number of
member functions which are overloaded for const and non-const:

   const _Equal&
   _M_eq() const { return _EqualEBO::_S_cget(*this); }

   _Equal&
   _M_eq() { return _EqualEBO::_S_get(*this); }

The non-const ones seem to be unnecessary. They're used in the _M_swap
member functions, but all other uses could (and probably should) call
the const overload to get a const reference to the function object.

If we make the _M_swap members use the EBO accessors directly then we
can get rid of the non-const accessors. That makes overload resolution
simpler for the compiler (as there's only one function to choose from)
and should result in slightly smaller code when inlining is not
enabled.
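
As a minimal sketch of the idea (using made-up names ebo_helper, table_base and mod_equal rather than the real libstdc++ internals), the table keeps only a const accessor for the predicate, and swap reaches the mutable object through the EBO base directly:

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an EBO helper: stores T as a base class so
// that an empty function object takes no space.
template<typename T>
struct ebo_helper : private T
{
  ebo_helper() = default;
  explicit ebo_helper(const T& t) : T(t) { }

  const T& cget() const { return static_cast<const T&>(*this); }
  T& get() { return static_cast<T&>(*this); }
};

// Hypothetical table base exposing a single, const accessor.
template<typename Equal>
class table_base : private ebo_helper<Equal>
{
  using eq_base = ebo_helper<Equal>;

public:
  explicit table_base(const Equal& pred) : eq_base(pred) { }

  // The only accessor: every ordinary use gets a const reference.
  const Equal& eq() const { return eq_base::cget(); }

  bool equals(int a, int b) const { return eq()(a, b); }

  // swap is the one place needing mutable access, so it goes to the
  // EBO base directly instead of through a non-const accessor.
  void swap(table_base& other)
  {
    using std::swap;
    swap(eq_base::get(), static_cast<eq_base&>(other).get());
  }
};

// A stateful predicate so that the effect of swap is observable.
struct mod_equal
{
  int modulus;
  bool operator()(int a, int b) const { return a % modulus == b % modulus; }
};
```

With table_base<mod_equal> objects built from mod_equal{2} and mod_equal{3}, swapping them exchanges which one treats 1 and 3 as equal.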

Do you see any problem with this patch?


I think it is more a Pavlov behavior, always providing const and 
non-const no matter what.


No problem to simplify this.


OK, tested powerpc64le-linux, committed to trunk.


I don't see a need for the _Hashtable_ebo_helper member functions to
be static. They return a member of *this, but are static so *this has
to be passed as a parameter. We could just make them non-static.

It seems to have always been that way since the first version of the
patch that added the helpers:
https://gcc.gnu.org/ml/libstdc++/2011-12/msg00139.html

This patch passes all tests. I plan to commit this to trunk too.


And another one (which I've actually had sitting in a local branch for
a few years!)


commit 4cf9940963e326d09080177d7404f5aaedae67df
Author: Jonathan Wakely 
Date:   Fri Dec 5 10:42:54 2014 +

Replace _Equal_helper with simpler class template

By defining the new helper inside _Hashtable_base it doesn't need all
the template parameters to be provided, and by making it only
responsible for checking a possibly-cached hash code it only has to do
one thing.  The caller can use the equality predicate itself instead of
duplicating that in the helper template.
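
A stripped-down sketch of that design follows, with hypothetical names (node, equal_hash_code, node_equals) standing in for the libstdc++ ones.  The helper only answers "does the cached hash code match, if there is one?", and the caller applies the equality predicate itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical node types: one caches the hash code, one does not.
template<typename Value, bool CacheHash>
struct node { Value value; };

template<typename Value>
struct node<Value, true> { Value value; std::size_t hash_code; };

// Primary template: nothing cached, so the hash code check always passes.
template<typename NodeT>
struct equal_hash_code
{
  static bool equals(std::size_t, const NodeT&) { return true; }
};

// Specialization for nodes with a cached hash code: compare it first.
template<typename Value>
struct equal_hash_code<node<Value, true>>
{
  static bool equals(std::size_t c, const node<Value, true>& n)
  { return c == n.hash_code; }
};

// The caller combines the cheap hash check with the real predicate.
template<typename NodeT, typename Eq>
bool node_equals(const Eq& eq, const std::string& key, std::size_t code,
                 const NodeT& n)
{ return equal_hash_code<NodeT>::equals(code, n) && eq(key, n.value); }
```

Because the helper is selected by node type alone, it needs a single template parameter instead of the six that the removed _Equal_helper specializations spell out.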

* include/bits/hashtable_policy.h (_Equal_helper): Remove.
(_Hashtable_base::_Equal_hash_code): Define new class template.
(_Hashtable_base::_M_equals): Use _Equal_hash_code instead of
_Equal_helper.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index f7db7628c69..86589e9a2d6 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1403,38 +1403,6 @@ namespace __detail
   _M_h2() const { return __ebo_h2::_M_cget(); }
 };
 
-  /**
-   *  Primary class template _Equal_helper.
-   *
-   */
-  template 
-  struct _Equal_helper;
-
-  /// Specialization.
-  template
-  struct _Equal_helper<_Key, _Value, _ExtractKey, _Equal, _HashCodeType, true>
-  {
-static bool
-_S_equals(const _Equal& __eq, const _ExtractKey& __extract,
-	  const _Key& __k, _HashCodeType __c, _Hash_node<_Value, true>* __n)
-{ return __c == __n->_M_hash_code && __eq(__k, __extract(__n->_M_v())); }
-  };
-
-  /// Specialization.
-  template
-  struct _Equal_helper<_Key, _Value, _ExtractKey, _Equal, _HashCodeType, false>
-  {
-static bool
-_S_equals(const _Equal& __eq, const _ExtractKey& __extract,
-	  const _Key& __k, _HashCodeType, _Hash_node<_Value, false>* __n)
-{ return __eq(__k, __extract(__n->_M_v())); }
-  };
-
-
   /// Partial specialization used when nodes contain a cached hash code.
   template
@@ -1788,8 +1756,22 @@ namespace __detail
 		 iterator>::type;
   private:
 using _EqualEBO = _Hashtable_ebo_helper<0, _Equal>;
-using _EqualHelper =  _Equal_helper<_Key, _Value, _ExtractKey, _Equal,
-	__hash_code, __hash_cached::value>;
+
+template
+  struct _Equal_hash_code
+  {
+   static bool
+   _S_equals(__hash_code, const _NodeT&)
+   { return true; }
+  };
+
+template
+  struct _Equal_hash_code<_Hash_node<_Ptr2, true>>
+  {
+   static bool
+   _S_equals(__hash_code __c, const _Hash_node<_Ptr2, true>& __n)
+   { return __c == __n._M_hash_code; }
+  };
 
   protected:
 _Hashtable_base() = default;
@@ -1801,8 +1783,8 @@ namespace __detail
 bool
 _M_equals(const _Key& __k, __hash_code __c, __node_type* __n) const
 {
-  return _EqualHelper::_S_equals(_M_eq(), this->_M_extract(),
- __k, __c, __n);
+  return _Equal_hash_code<__node_type>::_S_equals(__c, *__n)
+	&& _M_eq()(__k, this->_M_extract()(__n->_M_v()));
 }
 
 void


Re: [PATCH] Fix __builtin_init_dwarf_reg_size_table when built with -mfpxx

2019-05-16 Thread Dragan Mladjenovic
Ping.



From: Dragan Mladjenovic
Sent: Thursday, May 9, 2019 12:29 PM
To: gcc-patches@gcc.gnu.org
Cc: Dragan Mladjenovic; Jakub Jelinek; Matthew Fortune
Subject: [PATCH] Fix __builtin_init_dwarf_reg_size_table when built with -mfpxx

From: "Dragan Mladjenovic" 


Hi all,

For TARGET_FLOATXX the odd-numbered FP registers in SFmode are
HARD_REGNO_CALL_PART_CLOBBERED. This causes dwarf_frame_reg_mode to fall
back to VOIDmode and for __builtin_init_dwarf_reg_size_table to fill them
as zero sized.

This prevents libgcc's unwinder from ever restoring high parts of
callee-saved double precision registers.

This patch fixes the issue by forcing dwarf_frame_reg_mode to use SImode
for FP registers.
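
The change in the predicate can be modelled in isolation as below.  This is a toy model with made-up enums standing in for the -mfp32/-mfpxx/-mfp64 settings and the ABI; it only mirrors the condition touched by the patch, not the rest of mips_dwarf_frame_reg_mode.

```cpp
#include <cassert>

// Hypothetical model of the three FP register modes and of the ABI.
enum class FloatMode { fp32, fpxx, fp64 };
enum class Abi { o32, n32, n64 };

// Old condition: SImode is forced only for O32 with 64-bit FP registers
// (the TARGET_FLOAT64 test).
bool old_forces_simode(bool is_fp_reg, Abi abi, FloatMode mode)
{ return is_fp_reg && abi == Abi::o32 && mode == FloatMode::fp64; }

// New condition: anything that is not fp32 qualifies (the !TARGET_FLOAT32
// test), so fpxx is covered as well.
bool new_forces_simode(bool is_fp_reg, Abi abi, FloatMode mode)
{ return is_fp_reg && abi == Abi::o32 && mode != FloatMode::fp32; }
```

The only input on which the two predicates disagree is an FP register under O32 with fpxx, which is exactly the case described above.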

Bootstrapped and done regression tests on mipsel-unknown-linux-gnu -
no new failures found.


Best regards,
Dragan


gcc/ChangeLog:

2019-04-23  Dragan Mladjenovic  

  * gcc/config/mips/mips.c (mips_dwarf_frame_reg_mode): Replace TARGET_FLOAT64
  with !TARGET_FLOAT32, thus handling both fp64 and fpxx modes.

gcc/testsuite/ChangeLog:

2019-04-23  Dragan Mladjenovic  

  * g++.dg/eh/o32-fp.C: New.
  * gcc.target/mips/dwarfregtable-1.c: New.
  * gcc.target/mips/dwarfregtable-2.c: New.
  * gcc.target/mips/dwarfregtable-3.c: New.
  * gcc.target/mips/dwarfregtable-4.c: New.
  * gcc.target/mips/dwarfregtable.h: New.

---
 gcc/config/mips/mips.c  |  2 +-
 gcc/testsuite/g++.dg/eh/o32-fp.C| 47 +
 gcc/testsuite/gcc.target/mips/dwarfregtable-1.c |  5 +++
 gcc/testsuite/gcc.target/mips/dwarfregtable-2.c |  5 +++
 gcc/testsuite/gcc.target/mips/dwarfregtable-3.c |  5 +++
 gcc/testsuite/gcc.target/mips/dwarfregtable-4.c |  5 +++
 gcc/testsuite/gcc.target/mips/dwarfregtable.h   | 22 
 7 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/eh/o32-fp.C
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable-1.c
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable-2.c
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable-3.c
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable-4.c
 create mode 100644 gcc/testsuite/gcc.target/mips/dwarfregtable.h

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 1de33b2..c0c995a 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -9577,7 +9577,7 @@ mips_dwarf_frame_reg_mode (int regno)
 {
   machine_mode mode = default_dwarf_frame_reg_mode (regno);

-  if (FP_REG_P (regno) && mips_abi == ABI_32 && TARGET_FLOAT64)
+  if (FP_REG_P (regno) && mips_abi == ABI_32 && !TARGET_FLOAT32)
 mode = SImode;

   return mode;
diff --git a/gcc/testsuite/g++.dg/eh/o32-fp.C b/gcc/testsuite/g++.dg/eh/o32-fp.C
new file mode 100644
index 000..08fa51b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/eh/o32-fp.C
@@ -0,0 +1,47 @@
+// Test whether call saved float are restored properly for O32 ABI
+// { dg-do run { target { { { mips*-*-linux* } && hard_float } && { ! mips64 } } } }
+// { dg-options "-O2" }
+
+void __attribute__((noinline))
+bar (void)
+{
+  throw 1;
+}
+
+void __attribute__((noinline))
+foo (void)
+{
+  register double f20 __asm__ ("f20") = 0.0;
+  register double f22 __asm__ ("f22") = 0.0;
+  register double f24 __asm__ ("f24") = 0.0;
+  register double f26 __asm__ ("f26") = 0.0;
+  register double f28 __asm__ ("f28") = 0.0;
+  register double f30 __asm__ ("f30") = 0.0;
+  __asm__ __volatile__("":"+f"(f20),"+f"(f22),"+f"(f24),"+f"(f26),"+f"(f30));
+  bar ();
+}
+
+int
+main (void)
+{
+  register double f20 __asm__ ("f20") = 12.0;
+  register double f22 __asm__ ("f22") = 13.0;
+  register double f24 __asm__ ("f24") = 14.0;
+  register double f26 __asm__ ("f26") = 15.0;
+  register double f28 __asm__ ("f28") = 16.0;
+  register double f30 __asm__ ("f30") = 17.0;
+
+  try
+{
+  foo ();
+}
+  catch (...)
+{
+  __asm__ ("":"+f"(f20),"+f"(f22),"+f"(f24),"+f"(f26),"+f"(f30));
+}
+
+  if (f20 != 12.0 || f22 != 13.0 || f24 != 14.0
+  || f26 != 15.0 || f28 != 16.0 || f30 != 17.0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/mips/dwarfregtable-1.c b/gcc/testsuite/gcc.target/mips/dwarfregtable-1.c
new file mode 100644
index 000..93d0844
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/dwarfregtable-1.c
@@ -0,0 +1,5 @@
+/* Check if content of dwarf reg size table matches the expected.  */
+/* { dg-do run } */
+/* { dg-options "-mabi=32 -mfp32" } */
+
+#include "dwarfregtable.h"
diff --git a/gcc/testsuite/gcc.target/mips/dwarfregtable-2.c b/gcc/testsuite/gcc.target/mips/dwarfregtable-2.c
new file mode 100644
index 000..c6dea94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/dwarfregtable-2.c
@@ -0,0 +1,5 @@
+/* Check if content of dwarf reg size table matches the expected.  */
+/* { dg-do run } */
+/* { dg-options "-mabi=32 -mfpxx" } */
+
+#include "dwarfregtable.h"
diff --git a/gcc/testsuite/gcc.target/mips/dwarfregtable-3.c 
b/gcc/testsuite

Re: [PATCH] Do not allow target_clones with alias attr (PR lto/90500).

2019-05-16 Thread Martin Liška
On 5/16/19 2:52 PM, Richard Biener wrote:
> On Thu, May 16, 2019 at 1:53 PM Martin Liška  wrote:
>>
>> On 5/16/19 1:42 PM, Richard Biener wrote:
>>> On Thu, May 16, 2019 at 1:38 PM Martin Liška  wrote:

 On 5/16/19 1:24 PM, Richard Biener wrote:
> On Thu, May 16, 2019 at 1:18 PM Martin Liška  wrote:
>>
>> Hi.
>>
>> We should not allow target_clones being combined with alias attribute.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>
> So that's because an alias cannot be turned into a dispatcher and still
> be an alias, correct?  So a way around this would be to turn the
> alias into the dispatcher and clone the alias target, leaving the
> plain alias target as default variant not going through the dispatcher?

 We do allow having an alias to a target clone symbol:

 __attribute__((target_clones("arch=haswell", "default"))) int __tanh() {}
 __typeof(__tanh) tanhf64 __attribute__((alias("__tanh")));

 Having that tanhf64 points to the resolver, which I believe is correct.
>>>
>>> In this case yes.  I think the case in the PR wants to have an alias
>>> to the default variant instead and that's not possible so it tries to
>>> do the cloning on an alias (basically tell cloning to use an alternate
>>> symbol name for the resolver, leaving the default in place).  IMHO
>>> a reasonable feature, not sure if a reasonable way to achieve.
>>
>> I see. Agree with you that it can be handy. On the other hand, one can use
>> target attribute:
>>
>> __attribute__((target("avx","arch=core-avx2")))
>> int
>> bar ()
>> {
>>   return 2;
>> }
>>
>> __attribute__((target("default")))
>> int
>> bar ()
>> {
>>   return 2;
>> }
>>
>> int barrr () __attribute__((alias("_Z3barv")));
>>
>> Which directly identifies a concrete implementation.
> 
> Hmm.  You reference the dispatcher here via barrr?
> Who knows the mangling of the default version?

Yep, that's a bit cumbersome, but we've been using that for quite some time :)

> 
> I guess having
> 
> __attribute__((target("avx" (bar_avx),"default" (bar_default))))
> int bar() { ... }
> 
> would be nice, creating external symbols for the individual clones?
> 
> But maybe
> 
> int bar() { ... }
> __attribute__((target("avx"))) int bar();
> __attribute__((target("default"), alias("bar_default"))) int bar();
> 
> could already work for this.

I prefer the latter one. I'll maybe implement that in the future.

> 
> Meanwhile your patch is OK I think.

Thanks,
Martin

> 
> Richard.
> 
>>
>> Martin
>>
>>>
 The PR is about an alias that has itself target_clone attribute, which
 does not make sense.

>
> Of course in the testcase the body of the alias target isn't available
> but that's a general issue, not special to aliases?

 We do the target_clone expansion just for node->definition which is true
 for node->alias == true symbols.
>>>
>>> I see.  I guess it should be done for node->analyzed only, but yes,
>>> w/o considering aliases or thunks all definitions have bodies.
>>>
>>> Richard.
>>>
 Martin

>
> Richard.
>
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2019-05-16  Martin Liska  
>>
>> PR lto/90500
>> * multiple_target.c (expand_target_clones): Do not allow
>> target_clones being used with a symbol that is an alias.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-05-16  Martin Liska  
>>
>> PR lto/90500
>> * gcc.target/i386/pr90500-1.c: New test.
>> * gcc.target/i386/pr90500-2.c: New test.
>> ---
>>  gcc/multiple_target.c | 5 -
>>  gcc/testsuite/gcc.target/i386/pr90500-1.c | 8 
>>  gcc/testsuite/gcc.target/i386/pr90500-2.c | 7 +++
>>  3 files changed, 19 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-2.c
>>
>>

>>



Re: [PATCH 3/3][DejaGNU] target: Wrap linker flags into `-largs'/`-margs' for Ada

2019-05-16 Thread Maciej W. Rozycki
On Wed, 15 May 2019, Jacob Bachmeyer wrote:

> This patch really exposes a significant deficiency in our current 
> implementation of default_target_compile:  the order of various flags 
> can be significant, but we only have that order implicitly expressed in 
> the code, which goes all the way back to (of course) the "Initial 
> revision" that is probably from a time before Tcl had the features that 
> will allow significant cleanup in here.

 I suspect the origins may be different; however, as valuable as your
observation is, functional problems have precedence over issues with code
structuring, so we need to fix the problem at hand first.  I'm sure
DejaGNU maintainers will be happy to review your implementation of code
restructuring afterwards.

> Some of these could probably be combined and I may have combined 
> categories that should be separate in the above list.  The GNU toolchain 
> has always been a kind of "magic box that just works" (until it doesn't 
> and the manual explains the problem) for me, so I am uncertain what the 
> ordering rules for combining these categories should be.  Anyone know 
> the traditional rules and, perhaps more importantly, what systems need 
> which rules?

 The ordering rules are system-specific I'm afraid and we have to be 
careful not to break working systems out there.  People may be forced to a 
DejaGNU upgrade, due to a newer version of a program being tested having 
such a requirement, and can legitimately expect their system to continue 
working.

 NB I have repeatedly observed cases where a forced upgrade of a system
component that I neither care about nor am competent in, triggered by an
upgrade of a component I do care about, caused the system to malfunction
in a way that I find both unacceptable and extremely hard to debug.  It
seems to have become more frequent in recent years; I find this very
frustrating and have wasted lots of time trying to fix the damage
caused.  I would therefore suggest taking all possible measures to save
people from going through such an experience.

 FWIW,

  Maciej

Re: [PATCH] Do not allow target_clones with alias attr (PR lto/90500).

2019-05-16 Thread Richard Biener
On Thu, May 16, 2019 at 1:53 PM Martin Liška  wrote:
>
> On 5/16/19 1:42 PM, Richard Biener wrote:
> > On Thu, May 16, 2019 at 1:38 PM Martin Liška  wrote:
> >>
> >> On 5/16/19 1:24 PM, Richard Biener wrote:
> >>> On Thu, May 16, 2019 at 1:18 PM Martin Liška  wrote:
> 
>  Hi.
> 
>  We should not allow target_clones being combined with alias attribute.
> 
>  Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
>  Ready to be installed?
> >>>
> >>> So that's because an alias cannot be turned into a dispatcher and still
> >>> be an alias, correct?  So a way around this would be to turn the
> >>> alias into the dispatcher and clone the alias target, leaving the
> >>> plain alias target as default variant not going through the dispatcher?
> >>
> >> We do allow having an alias to a target clone symbol:
> >>
> >> __attribute__((target_clones("arch=haswell", "default"))) int __tanh() {}
> >> __typeof(__tanh) tanhf64 __attribute__((alias("__tanh")));
> >>
> >> Having that tanhf64 points to the resolver, which I believe is correct.
> >
> > In this case yes.  I think the case in the PR wants to have an alias
> > to the default variant instead and that's not possible so it tries to
> > do the cloning on an alias (basically tell cloning to use an alternate
> > symbol name for the resolver, leaving the default in place).  IMHO
> > a reasonable feature, not sure if a reasonable way to achieve.
>
> I see. Agree with you that it can be handy. On the other hand, one can use
> target attribute:
>
> __attribute__((target("avx","arch=core-avx2")))
> int
> bar ()
> {
>   return 2;
> }
>
> __attribute__((target("default")))
> int
> bar ()
> {
>   return 2;
> }
>
> int barrr () __attribute__((alias("_Z3barv")));
>
> Which directly identifies a concrete implementation.

Hmm.  You reference the dispatcher here via barrr?
Who knows the mangling of the default version?

I guess having

__attribute__((target("avx" (bar_avx),"default" (bar_default))))
int bar() { ... }

would be nice, creating external symbols for the individual clones?

But maybe

int bar() { ... }
__attribute__((target("avx"))) int bar();
__attribute__((target("default"), alias("bar_default"))) int bar();

could already work for this.

Meanwhile your patch is OK I think.

Richard.

>
> Martin
>
> >
> >> The PR is about an alias that has itself target_clone attribute, which
> >> does not make sense.
> >>
> >>>
> >>> Of course in the testcase the body of the alias target isn't available
> >>> but that's a general issue, not special to aliases?
> >>
> >> We do the target_clone expansion just for node->definition which is true
> >> for node->alias == true symbols.
> >
> > I see.  I guess it should be done for node->analyzed only, but yes,
> > w/o considering aliases or thunks all definitions have bodies.
> >
> > Richard.
> >
> >> Martin
> >>
> >>>
> >>> Richard.
> >>>
>  Thanks,
>  Martin
> 
>  gcc/ChangeLog:
> 
>  2019-05-16  Martin Liska  
> 
>  PR lto/90500
>  * multiple_target.c (expand_target_clones): Do not allow
>  target_clones being used with a symbol that is an alias.
> 
>  gcc/testsuite/ChangeLog:
> 
>  2019-05-16  Martin Liska  
> 
>  PR lto/90500
>  * gcc.target/i386/pr90500-1.c: New test.
>  * gcc.target/i386/pr90500-2.c: New test.
>  ---
>   gcc/multiple_target.c | 5 -
>   gcc/testsuite/gcc.target/i386/pr90500-1.c | 8 
>   gcc/testsuite/gcc.target/i386/pr90500-2.c | 7 +++
>   3 files changed, 19 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-1.c
>   create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-2.c
> 
> 
> >>
>


Re: [PATCH] debug: make -feliminate-unused-debug-symbols the default [PR debug/86964]

2019-05-16 Thread Richard Biener
On Thu, May 16, 2019 at 11:20 AM Thomas De Schampheleire
 wrote:
>
> From: Thomas De Schampheleire 
>
> In addition to making -feliminate-unused-debug-symbols work for the DWARF
> format (see [1]), make this option the default. This behavior was the case
> before, e.g. under gcc 4.9.x.
> [1] https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=269925

I have tested this patch and it causes a few FAILs, possibly hinting
at implementation issues:

=== g++ tests ===


Running target unix
FAIL: g++.dg/debug/enum-2.C -gstabs -O2  scan-assembler JTI_MAX
FAIL: g++.dg/debug/enum-2.C -gstabs -O3  scan-assembler JTI_MAX
FAIL: g++.dg/debug/enum-2.C -gstabs+ -O2  scan-assembler JTI_MAX
FAIL: g++.dg/debug/enum-2.C -gstabs+ -O3  scan-assembler JTI_MAX
FAIL: g++.dg/debug/enum-2.C -gstabs+3 -O2  scan-assembler JTI_MAX
FAIL: g++.dg/debug/enum-2.C -gstabs+3 -O3  scan-assembler JTI_MAX
FAIL: g++.dg/debug/enum-2.C -gstabs3 -O2  scan-assembler JTI_MAX
FAIL: g++.dg/debug/enum-2.C -gstabs3 -O3  scan-assembler JTI_MAX

maybe expected (stabs)

FAIL: g++.dg/debug/dwarf2/fesd-any.C  -std=gnu++14  scan-assembler field_head_ordy_defn_fld_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-any.C  -std=gnu++14  scan-assembler field_head_ordy_defn_ptr_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-any.C  -std=gnu++14  scan-assembler field_head_ordy_defn_ref_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-any.C  -std=gnu++14  scan-assembler field_head_ordy_defn_var_head_fld.*DW_AT_name
... more ...
FAIL: g++.dg/debug/dwarf2/fesd-baseonly.C  -std=gnu++14  scan-assembler gstruct_head_ordy_defn_var_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-baseonly.C  -std=gnu++14  scan-assembler gstruct_head_tmpl_defn_var_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-baseonly.C  -std=gnu++17  scan-assembler gstruct_head_ordy_defn_var_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-baseonly.C  -std=gnu++17  scan-assembler gstruct_head_tmpl_defn_var_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-baseonly.C  -std=gnu++98  scan-assembler gstruct_head_ordy_defn_var_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-baseonly.C  -std=gnu++98  scan-assembler gstruct_head_tmpl_defn_var_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-none.C  -std=gnu++14  scan-assembler gstruct_head_ordy_defn_var_head.*DW_AT_name
FAIL: g++.dg/debug/dwarf2/fesd-none.C  -std=gnu++14  scan-assembler gstruct_head_tmpl_defn_var_head.*DW_AT_name
... more fesd-* testcases FAIL ...
FAIL: g++.dg/debug/dwarf2/inline-var-1.C  -std=gnu++17  scan-assembler-times  DW_AT_[^\\n\\r]*linkage_name 7
FAIL: g++.dg/debug/dwarf2/inline-var-1.C  -std=gnu++17  scan-assembler-times  DW_AT_specification 6
FAIL: g++.dg/debug/dwarf2/inline-var-1.C  -std=gnu++17  scan-assembler-times 0x3[^\\n\\r]* DW_AT_inline 6

C variants of the fesd-* testcases also FAIL.  Those testcases are
huge; a quick look didn't reveal whether those are expected FAILs or not.

Richard.

> gcc/ChangeLog:
>
> 2019-05-16  Thomas De Schampheleire  
>
> PR debug/86964
> * common.opt (feliminate-unused-debug-symbols): Enable by default.
> * doc/invoke.texi (Debugging Options): Document new default of
> -feliminate-unused-debug-symbols and remove restriction to 'stabs'.
> ---
>  gcc/common.opt  | 2 +-
>  gcc/doc/invoke.texi | 9 +
>  2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d342c4f3749..0e72fd08ec4 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1379,7 +1379,7 @@ Common Report Var(flag_ipa_sra) Init(0) Optimization
>  Perform interprocedural reduction of aggregates.
>
>  feliminate-unused-debug-symbols
> -Common Report Var(flag_debug_only_used_symbols)
> +Common Report Var(flag_debug_only_used_symbols) Init(1)
>  Perform unused symbol elimination in debug info.
>
>  feliminate-unused-debug-types
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 5e3e8873d35..06c8c60f19e 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -388,7 +388,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fno-eliminate-unused-debug-types @gol
>  -femit-struct-debug-baseonly  -femit-struct-debug-reduced @gol
>  -femit-struct-debug-detailed@r{[}=@var{spec-list}@r{]} @gol
> --feliminate-unused-debug-symbols  -femit-class-debug-always @gol
> +-fno-eliminate-unused-debug-symbols  -femit-class-debug-always @gol
>  -fno-merge-debug-strings  -fno-dwarf2-cfi-asm @gol
>  -fvar-tracking  -fvar-tracking-assignments}
>
> @@ -7827,10 +7827,11 @@ confusion with @option{-gdwarf-@var{level}}.
>  Instead use an additional @option{-g@var{level}} option to change the
>  debug level for DWARF.
>
> -@item -feliminate-unused-debug-symbols
> +@item -fno-eliminate-unused-debug-symbols
>  @opindex feliminate-unused-debug-symbols
> -Produce debugging information in stabs format (if that is supported),
> -for only symbols that are actually used.
> +@opindex fno-eliminate-unused-debug-symbols
> +By default, no debug in

Re: [PATCH 2/3][GCC] GNAT/testsuite: Pass the `ada' option to target compilation

2019-05-16 Thread Maciej W. Rozycki
On Wed, 15 May 2019, Jacob Bachmeyer wrote:

> > Index: gcc/gcc/testsuite/lib/gnat.exp
> > ===
> > --- gcc.orig/gcc/testsuite/lib/gnat.exp
> > +++ gcc/gcc/testsuite/lib/gnat.exp
> > @@ -167,6 +167,8 @@ proc gnat_target_compile { source dest t
> > set options [concat "additional_flags=$TOOL_OPTIONS" $options]
> >  }
> >  
> > +set options [concat "{ada}" $options]
> > +
> >  return [target_compile $source $dest $type $options]
> >  }
> >   
> Your Tcl syntax looks suspicious to me.  Is there a reason for "ada" to 
> be in both double quotes and braces?

 Most existing `options' elements are lists, as shown by:

clone_output "options: $options\n"

placed at the top of `default_target_compile' (leading paths stripped 
here):

options: {ada} {additional_flags=-fno-diagnostics-show-caret 
-fno-diagnostics-show-line-numbers -fdiagnostics-color=never 
--sysroot=.../sysroot} {additional_flags=-gnatJ -c -u} 
{compiler=.../gcc/gnatmake --GCC=.../gcc/xgcc --GNATBIND=.../gcc/gnatbind 
--GNATLINK=.../gcc/gnatlink -cargs -B.../gcc -largs --GCC=.../gcc/xgcc\ 
-B.../gcc\ -march=rv64imafdc\ -mabi=lp64d -margs 
--RTS=.../riscv64-linux-gnu/lib64/lp64d/libada -q -f} timeout=300

so I did this for consistency, although in reality it doesn't matter, at 
least not for `default_target_compile', and either approach would work.

 We are not consistent here in `gnat_target_compile' anyway, as you can 
see from the two existing `concat' invocations, and also the `timeout=300' 
element.

> Perhaps {lappend options ada} might be simpler?  Is placing ada at the 
> beginning of the list important?

 It can't be last because we override the default compiler otherwise
selected by this option in `default_target_compile', and then options 
passed in may override it further.  Overall I felt it to be safer if we 
placed the compiler type selection first rather than somewhere in the 
middle.

 I hope this clears up your concerns.

  Maciej

Re: [PATCH] Remove redundant accessors in hash tables

2019-05-16 Thread Jonathan Wakely

On 16/05/19 11:05 +0100, Jonathan Wakely wrote:
> On 16/05/19 07:47 +0200, François Dumont wrote:
>> On 5/15/19 5:37 PM, Jonathan Wakely wrote:
>>> François,
>>> I noticed that _Hash_code_base and _Hashtable_base have a number of
>>> member functions which are overloaded for const and non-const:
>>>
>>>   const _Equal&
>>>   _M_eq() const { return _EqualEBO::_S_cget(*this); }
>>>
>>>   _Equal&
>>>   _M_eq() { return _EqualEBO::_S_get(*this); }
>>>
>>> The non-const ones seem to be unnecessary. They're used in the _M_swap
>>> member functions, but all other uses could (and probably should) call
>>> the const overload to get a const reference to the function object.
>>>
>>> If we make the _M_swap members use the EBO accessors directly then we
>>> can get rid of the non-const accessors. That makes overload resolution
>>> simpler for the compiler (as there's only one function to choose from)
>>> and should result in slightly smaller code when inlining is not
>>> enabled.
>>>
>>> Do you see any problem with this patch?
>>
>> I think it is more a Pavlov behavior, always providing const and
>> non-const no matter what.
>>
>> No problem to simplify this.
>
> OK, tested powerpc64le-linux, committed to trunk.


I don't see a need for the _Hashtable_ebo_helper member functions to
be static. They return a member of *this, but are static so *this has
to be passed as a parameter. We could just make them non-static.

It seems to have always been that way since the first version of the
patch that added the helpers:
https://gcc.gnu.org/ml/libstdc++/2011-12/msg00139.html

This patch passes all tests. I plan to commit this to trunk too.

commit 73b18aaa3a42f446b3bf295ef5bcd2e3af45004f
Author: Jonathan Wakely 
Date:   Thu May 16 13:01:34 2019 +0100

Change EBO accessors from static to non-static member functions

* include/bits/hashtable_policy.h (_Hashtable_ebo_helper::_S_get):
Replace with _M_get non-static member function.
(_Hashtable_ebo_helper::_S_cget): Replace with _M_cget non-static
member function.
(_Hash_code_base, _Local_iterator_base, _Hashtable_base):
(_Hashtable_alloc): Adjust to use non-static members of EBO helper.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index b417a7d442c..f7db7628c69 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1112,13 +1112,8 @@ namespace __detail
 	  : _Tp(std::forward<_OtherTp>(__tp))
 	{ }
 
-  static const _Tp&
-  _S_cget(const _Hashtable_ebo_helper& __eboh)
-  { return static_cast<const _Tp&>(__eboh); }
-
-  static _Tp&
-  _S_get(_Hashtable_ebo_helper& __eboh)
-  { return static_cast<_Tp&>(__eboh); }
+  const _Tp& _M_cget() const { return static_cast<const _Tp&>(*this); }
+  _Tp& _M_get() { return static_cast<_Tp&>(*this); }
 };
 
   /// Specialization not using EBO.
@@ -1132,13 +1127,8 @@ namespace __detail
 	  : _M_tp(std::forward<_OtherTp>(__tp))
 	{ }
 
-  static const _Tp&
-  _S_cget(const _Hashtable_ebo_helper& __eboh)
-  { return __eboh._M_tp; }
-
-  static _Tp&
-  _S_get(_Hashtable_ebo_helper& __eboh)
-  { return __eboh._M_tp; }
+  const _Tp& _M_cget() const { return _M_tp; }
+  _Tp& _M_get() { return _M_tp; }
 
 private:
   _Tp _M_tp;
@@ -1229,16 +1219,16 @@ namespace __detail
   void
   _M_swap(_Hash_code_base& __x)
   {
-	std::swap(__ebo_extract_key::_S_get(*this),
-		  __ebo_extract_key::_S_get(__x));
-	std::swap(__ebo_hash::_S_get(*this), __ebo_hash::_S_get(__x));
+	std::swap(__ebo_extract_key::_M_get(),
+		  __x.__ebo_extract_key::_M_get());
+	std::swap(__ebo_hash::_M_get(), __x.__ebo_hash::_M_get());
   }
 
   const _ExtractKey&
-  _M_extract() const { return __ebo_extract_key::_S_cget(*this); }
+  _M_extract() const { return __ebo_extract_key::_M_cget(); }
 
   const _Hash&
-  _M_ranged_hash() const { return __ebo_hash::_S_cget(*this); }
+  _M_ranged_hash() const { return __ebo_hash::_M_cget(); }
 };
 
   // No specialization for ranged hash function while caching hash codes.
@@ -1317,20 +1307,20 @@ namespace __detail
   void
   _M_swap(_Hash_code_base& __x)
   {
-	std::swap(__ebo_extract_key::_S_get(*this),
-		  __ebo_extract_key::_S_get(__x));
-	std::swap(__ebo_h1::_S_get(*this), __ebo_h1::_S_get(__x));
-	std::swap(__ebo_h2::_S_get(*this), __ebo_h2::_S_get(__x));
+	std::swap(__ebo_extract_key::_M_get(),
+		  __x.__ebo_extract_key::_M_get());
+	std::swap(__ebo_h1::_M_get(), __x.__ebo_h1::_M_get());
+	std::swap(__ebo_h2::_M_get(), __x.__ebo_h2::_M_get());
   }
 
   const _ExtractKey&
-  _M_extract() const { return __ebo_extract_key::_S_cget(*this); }
+  _M_extract() const { return __ebo_extract_key::_M_cget(); }
 
   const _H1&
-  _M_h1() const { return __ebo_h1::_S_cget(*this); }
+  _M_h1() const { return __ebo_h1::_M_cget(); }
 
   const _H2&
-  _M_h2() const { return __ebo_h2::_S_cget(*this); }
+  _

Re: [PATCH] True IPA reimplementation of IPA-SRA

2019-05-16 Thread Richard Biener
On Fri, May 10, 2019 at 10:31 AM Martin Jambor  wrote:
>
> Hello,
>
> this is a follow-up from a WIP patch I sent here in late December:
> https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01765.html
>
> Just like the last time, the patch below is a reimplementation of
> IPA-SRA to make it a full IPA pass that can handle strictly connected
> components in the call-graph, can take advantage of LTO and does not
> weirdly switch functions in pass-pipeline like our current quasi-IPA SRA
> does.  Unlike the current IPA-SRA it can also remove return values, even
> in SCCs.  On the other hand, it is less powerful when it comes to
> structures passed by reference.  By design it will not create references
> to bits of an aggregate because that turned out to be just obfuscation
> in practice.  However, it also cannot usually split aggregates passed by
> reference that are just passed to another function (where splitting
> would be useful) because it cannot perform the same TBAA analysis like
> the current implementation which already knows what types it should look
> at because it has access to bodies of all functions it attempts to modify.

So that's just because the analysis is imperfect?  I mean if we can handle

 foo (X *p) { do_something (p->a); }
 X a; a.a = 1; foo (&a);

then we should be able to handle

 bar (X *p) { foo (p); }
 X a; a.a = 1; bar (&a);

by just applying local analysis of foo when looking at what to do for bar?
So isn't that what a IPA propagation step should do?
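
For readers following along, a hand-written sketch of what splitting the by-reference aggregate would amount to in the two cases above. This is not output of the pass; `foo_split` and `bar_split` are hypothetical names, and the functions return `int` only so the two shapes can be compared.

```cpp
#include <cassert>

struct X { int a; int b; };

// Stand-in for do_something from the example above.
static int do_something(int v) { return v + 1; }

// Original shape: the aggregate is passed by reference, only p->a is read.
static int foo(const X* p) { return do_something(p->a); }

// bar merely forwards the pointer; on its own it reveals nothing about
// which fields are used, so that knowledge has to come from foo.
static int bar(const X* p) { return foo(p); }

// Hand-written equivalent after a by-reference split (hypothetical names):
// the one field that is read gets passed by value through both levels.
static int foo_split(int a) { return do_something(a); }
static int bar_split(int a) { return foo_split(a); }

int call_original()
{
  X a; a.a = 1; a.b = 0;
  return bar(&a);
}

int call_split()
{
  X a; a.a = 1; a.b = 0;
  return bar_split(a.a);
}
```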

> Since the last time I have fixed a number of bugs that Martin Liška
> found when compiling a portion of openSUSE with the patch, removed all
> the FIXMEs, made long living memory structures more compact and
> self-reviewed the entire patch once.
>
> Therefore, I would like to ask for a review and eventually for an
> approval to commit the patch to the trunk.  The patch survives
> bootstrap, LTO bootstrap and LTO profiledbootstrap on x86_64-linux.  In
> the testsuite, it "fixes" 24 guality passes (all LTO ones) but breaks 12
> other ones (one is non-LTO).  I would welcome any help with addressing
> these.  Because the patch removes the old IPA-SRA, the input to the IPA
> pipeline looks different and so I could not just try to make it "process
> the debug statements like before."
>
> Because the patch is big I had to compress it to get it through to
> gcc-patches.  Because of its size and because it contains a completely
> new file ipa-sra.c and total reimplementations of
> ipa-param-manipulation.[ch], and so I pushed my development branch to
> branch jamborm/ipa-sra on GCC git server
> (https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/jamborm/ipa-sra).
> It may be more convenient to check it out and review it that way.
>
> Thanks in advance for any questions, comments and suggestions,

Thanks for doing this.  I wonder how difficult it is to split the patch
into a) old IPA-SRA removal, b) refactoring c) IPA-SRA add (probably
easiest in that order).  It's quite a large number of changes, a)
being mostly uninteresting (and pre-approved hereby, just not
independently, of course), b) is uninteresting to me, but I would like
to look at c), not sure if that's really only the new file, probably not
since IPA modifications have infrastructure bits.

Sorry for not mentioning earlier.  Maybe just splitting out a) already
helps (you seem to remove code not in tree-sra.c).

Thanks,
Richard.

> Martin
>
>
>
> 2019-05-09  Martin Jambor  
>
> * coretypes.h (cgraph_edge): Declare.
> * ipa-param-manipulation.c: Rewrite.
> * ipa-param-manipulation.h: Likewise.
> * Makefile.in (GTFILES): Added ipa-param-manipulation.h and ipa-sra.c.
> (OBJS): Added ipa-sra.o.
> * cgraph.h (ipa_replace_map): Removed fields old_tree, replace_p
> and ref_p, added fields param_adjustments and performed_splits.
> (struct cgraph_clone_info): Remove args_to_skip and
> combined_args_to_skip, new field param_adjustments.
> (cgraph_node::create_clone): Changed parameters to use
> ipa_param_adjustments.
> (cgraph_node::create_virtual_clone): Likewise.
> (cgraph_node::create_virtual_clone_with_body): Likewise.
> (tree_function_versioning): Likewise.
> (cgraph_build_function_type_skip_args): Removed.
> * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Convert to
> using ipa_param_adjustments.
> (clone_of_p): Likewise.
> * cgraphclones.c (cgraph_build_function_type_skip_args): Removed.
> (build_function_decl_skip_args): Likewise.
> (duplicate_thunk_for_node): Adjust parameters using
> ipa_param_body_adjustments, copy param_adjustments instead of
> args_to_skip.
> (cgraph_node::create_clone): Convert to using ipa_param_adjustments.
> (cgraph_node::create_virtual_clone): Likewise.
> (cgraph_node::create_version_clone_with_body): Likewise.
> (cgraph_materialize_

Re: [PATCH] Do not allow target_clones with alias attr (PR lto/90500).

2019-05-16 Thread Martin Liška
On 5/16/19 1:42 PM, Richard Biener wrote:
> On Thu, May 16, 2019 at 1:38 PM Martin Liška  wrote:
>>
>> On 5/16/19 1:24 PM, Richard Biener wrote:
>>> On Thu, May 16, 2019 at 1:18 PM Martin Liška  wrote:

>>>> Hi.
>>>>
>>>> We should not allow target_clones being combined with alias attribute.
>>>>
>>>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>>>
>>>> Ready to be installed?
>>>
>>> So that's because an alias cannot be turned into a dispatcher and still
>>> be an alias, correct?  So a way around this would be to turn the
>>> alias into the dispatcher and clone the alias target, leaving the
>>> plain alias target as default variant not going through the dispatcher?
>>
>> We do allow having an alias to a target clone symbol:
>>
>> __attribute__((target_clones("arch=haswell", "default"))) int __tanh() {}
>> __typeof(__tanh) tanhf64 __attribute__((alias("__tanh")));
>>
>> Having that tanhf64 points to the resolver, which I believe is correct.
> 
> In this case yes.  I think the case in the PR wants to have an alias
> to the default variant instead and that's not possible so it tries to
> do the cloning on an alias (basically tell cloning to use an alternate
> symbol name for the resolver, leaving the default in place).  IMHO
> a reasonable feature, not sure if a reasonable way to achieve.

I see. Agree with you that it can be handy. On the other hand, one can use
the target attribute:

__attribute__((target("avx","arch=core-avx2")))
int
bar ()
{
  return 2;
}

__attribute__((target("default")))
int
bar ()
{
  return 2;
}

int barrr () __attribute__((alias("_Z3barv")));

Which directly identifies a concrete implementation.

Martin
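
As a rough illustration of the distinction being discussed (a name that goes through the dispatcher versus a name bound to one concrete variant), here is a plain C++ model. It deliberately does not use `target_clones`, `alias` or ifunc, so it is portable; `resolve_bar`, `bar_default` and `bar_avx2` are hypothetical names, and the variants return different values only so the two paths can be told apart.

```cpp
#include <cassert>

// Two hand-written "variants" of one function (hypothetical names).
static int bar_default() { return 1; }
static int bar_avx2() { return 2; }

using bar_fn = int (*)();

// Stand-in for a resolver: a real one would query the CPU, here the
// feature flag is an ordinary parameter so the sketch stays portable.
static bar_fn resolve_bar(bool cpu_has_avx2)
{ return cpu_has_avx2 ? bar_avx2 : bar_default; }

// Models an alias to the multiversioned symbol: every call is dispatched.
int call_via_dispatcher(bool cpu_has_avx2)
{ return resolve_bar(cpu_has_avx2)(); }

// Models an alias to one concrete implementation: no dispatch happens.
int call_default_directly()
{ return bar_default(); }
```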

> 
>> The PR is about an alias that has itself target_clone attribute, which
>> does not make sense.
>>
>>>
>>> Of course in the testcase the body of the alias target isn't available
>>> but that's a general issue, not special to aliases?
>>
>> We do the target_clone expansion just for node->definition which is true
>> for node->alias == true symbols.
> 
> I see.  I guess it should be done for node->analyzed only, but yes,
> w/o considering aliases or thunks all definitions have bodies.
> 
> Richard.
> 
>> Martin
>>
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Martin
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-05-16  Martin Liska  
>>>>
>>>> PR lto/90500
>>>> * multiple_target.c (expand_target_clones): Do not allow
>>>> target_clones being used with a symbol that is an alias.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-05-16  Martin Liska  
>>>>
>>>> PR lto/90500
>>>> * gcc.target/i386/pr90500-1.c: New test.
>>>> * gcc.target/i386/pr90500-2.c: New test.
>>>> ---
>>>>  gcc/multiple_target.c | 5 -
>>>>  gcc/testsuite/gcc.target/i386/pr90500-1.c | 8 
>>>>  gcc/testsuite/gcc.target/i386/pr90500-2.c | 7 +++
>>>>  3 files changed, 19 insertions(+), 1 deletion(-)
>>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-1.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-2.c


>>



Re: [PATCH] Do not allow target_clones with alias attr (PR lto/90500).

2019-05-16 Thread Richard Biener
On Thu, May 16, 2019 at 1:38 PM Martin Liška  wrote:
>
> On 5/16/19 1:24 PM, Richard Biener wrote:
> > On Thu, May 16, 2019 at 1:18 PM Martin Liška  wrote:
> >>
> >> Hi.
> >>
> >> We should not allow target_clones being combined with alias attribute.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > So that's because an alias cannot be turned into a dispatcher and still
> > be an alias, correct?  So a way around this would be to turn the
> > alias into the dispatcher and clone the alias target, leaving the
> > plain alias target as default variant not going through the dispatcher?
>
> We do allow having an alias to a target clone symbol:
>
> __attribute__((target_clones("arch=haswell", "default"))) int __tanh() {}
> __typeof(__tanh) tanhf64 __attribute__((alias("__tanh")));
>
> Having that tanhf64 points to the resolver, which I believe is correct.

In this case yes.  I think the case in the PR wants to have an alias
to the default variant instead and that's not possible so it tries to
do the cloning on an alias (basically tell cloning to use an alternate
symbol name for the resolver, leaving the default in place).  IMHO
a reasonable feature, not sure if a reasonable way to achieve.

> The PR is about an alias that has itself target_clone attribute, which
> does not make sense.
>
> >
> > Of course in the testcase the body of the alias target isn't available
> > but that's a general issue, not special to aliases?
>
> We do the target_clone expansion just for node->definition which is true
> for node->alias == true symbols.

I see.  I guess it should be done for node->analyzed only, but yes,
w/o considering aliases or thunks all definitions have bodies.

Richard.

> Martin
>
> >
> > Richard.
> >
> >> Thanks,
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2019-05-16  Martin Liska  
> >>
> >> PR lto/90500
> >> * multiple_target.c (expand_target_clones): Do not allow
> >> target_clones being used with a symbol that is an alias.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-05-16  Martin Liska  
> >>
> >> PR lto/90500
> >> * gcc.target/i386/pr90500-1.c: New test.
> >> * gcc.target/i386/pr90500-2.c: New test.
> >> ---
> >>  gcc/multiple_target.c | 5 -
> >>  gcc/testsuite/gcc.target/i386/pr90500-1.c | 8 
> >>  gcc/testsuite/gcc.target/i386/pr90500-2.c | 7 +++
> >>  3 files changed, 19 insertions(+), 1 deletion(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-1.c
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-2.c
> >>
> >>
>


Re: [PATCH] Changes to std::variant to reduce code size

2019-05-16 Thread Jonathan Wakely

On 16/05/19 12:29 +0100, Jonathan Wakely wrote:

These two changes both result in smaller code for std::variant.

The first one means smaller tables of function pointers, because we
don't generate an instantiation for the valueless state. Instead we do
a runtime branch, marked [[unlikely]] to make _M_reset() a no-op if
it's already valueless. In a microbenchmark I couldn't measure any
performance difference due to the extra branch, so the code size
reduction seems worthwhile.

The second one removes a branch from the index() member by relying on
unsigned arithmetic. That also results in smaller code and I can't see
any downside.
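
The index() change itself is not quoted in this message, so the following is only a sketch of the general trick, under the assumption that the index is stored in a narrow unsigned type whose all-ones value marks the valueless state. `index_type`, `stored_npos` and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Assumption: the index lives in an unsigned char, 0xFF meaning "valueless".
using index_type = unsigned char;
constexpr index_type stored_npos = std::numeric_limits<index_type>::max();

// Mirrors the value of std::variant_npos.
constexpr std::size_t npos = static_cast<std::size_t>(-1);

// Straightforward version with a branch.
constexpr std::size_t index_with_branch(index_type stored)
{ return stored == stored_npos ? npos : std::size_t(stored); }

// Branch-free version: adding one in the narrow type wraps 0xFF to 0,
// and subtracting one in the wide type wraps 0 to npos; every other
// value comes back unchanged.
constexpr std::size_t index_branch_free(index_type stored)
{ return std::size_t(index_type(stored + 1)) - 1; }
```

Both functions agree for every possible stored value, which is what makes the branch removable.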

* include/std/variant (_Variant_storage::_M_reset):
Replace raw visitation with a runtime check for the valueless state
and a non-raw visitor.
(_Variant_storage::_M_reset_impl): Remove.
(variant::index()): Remove branch.


We might also want:

--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1503,7 +1503,7 @@ namespace __variant
 }
   else
 {
-   if (this->index() != variant_npos)
+   if (!this->valueless_by_exception()) [[__likely__]]
 {
   auto __tmp(std::move(__rhs_mem));
   __rhs = std::move(*this);
@@ -1520,7 +1520,7 @@ namespace __variant
 }
   else
 {
-   if (this->index() != variant_npos)
+   if (!this->valueless_by_exception()) [[__likely__]]
 {
   __rhs = std::move(*this);
   this->_M_reset();


This results in smaller code too, because for some specializations
valueless_by_exception() always returns false, so the branch can be
removed.

(This suggests that it's generally better to ask the yes/no question
"are you valid?" rather than "what is your index, and does it equal
this magic number?")

For specializations where a valueless state is possible we still
expect it to be very unlikely in practice, so the attribute should
help there.
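
A small C++17 sketch of the two ways of asking the question. `Fragile` is a hypothetical alternative type whose construction throws; whether the failed emplace really leaves the variant valueless is up to the implementation, so the sketch only relies on the two queries agreeing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <variant>

// Hypothetical alternative: constructing it from an int always throws.
// The user-provided copy operations keep it from being trivially copyable.
struct Fragile
{
  Fragile() = default;
  explicit Fragile(int) { throw std::runtime_error("construction failed"); }
  Fragile(const Fragile&) { }
  Fragile& operator=(const Fragile&) { return *this; }
};

using V = std::variant<int, Fragile>;

// Tries to switch alternatives; the failed construction may leave the
// variant valueless (the standard allows, but does not require, this).
V make_possibly_valueless()
{
  V v{1};
  try { v.emplace<Fragile>(42); }
  catch (const std::runtime_error&) { }
  return v;
}

// "What is your index, and does it equal this magic number?"
bool has_value_by_index(const V& v)
{ return v.index() != std::variant_npos; }

// "Are you valid?"
bool has_value_by_query(const V& v)
{ return !v.valueless_by_exception(); }
```

The second form gives the library room to answer with a compile-time constant for specializations that can never become valueless.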



[PATCH] Remove a test-case that leads to a huge stack (and file) allocation (PR middle-end/90478).

2019-05-16 Thread Martin Liška
Hi.

I'm going to remove the test as it leads to a huge .s file and a huge
stack allocation at gcc/stmt.c:777.

Ready for trunk?
Martin

gcc/testsuite/ChangeLog:

2019-05-16  Martin Liska  

PR middle-end/90478
* gcc.dg/tree-ssa/pr90478-2.c: Remove.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr90478-2.c | 17 -
 1 file changed, 17 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr90478-2.c


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr90478-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr90478-2.c
deleted file mode 100644
index f0fc103a888..000
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr90478-2.c
+++ /dev/null
@@ -1,17 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-Os --param jump-table-max-growth-ratio-for-size=2147483647" } */
-
-long
-foo (long x, long y)
-{
-  x = x & y;
-  switch (y)
-{
-case 63L: x >>= 0; break;
-case 4032L: x >>= 6; break;
-case 258048L: x >>= 12; break;
-case 16515072L: x >>= 18; break;
-default: __builtin_unreachable ();
-}
-  return x;
-}
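
For a sense of scale, a back-of-the-envelope sketch of how large a single dense jump table covering the case labels of the removed test would be. The 8-byte entry size is an assumption (one pointer-sized entry on a 64-bit target), and whether GCC really emits one dense table here depends on internals not shown in this message.

```cpp
#include <cassert>
#include <cstdint>

// Number of slots a dense table needs to cover [lowest, highest].
constexpr std::uint64_t dense_table_slots(std::uint64_t lowest,
                                          std::uint64_t highest)
{ return highest - lowest + 1; }

// Smallest and largest case labels from the removed test.
constexpr std::uint64_t lowest_case = 63;
constexpr std::uint64_t highest_case = 16515072;

constexpr std::uint64_t slots = dense_table_slots(lowest_case, highest_case);

// Assumption: 8 bytes per table entry.
constexpr std::uint64_t table_bytes = slots * 8;
```

Only four of those slots would correspond to real case labels; the rest would all point at the default label.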



Re: [PATCH] Do not allow target_clones with alias attr (PR lto/90500).

2019-05-16 Thread Martin Liška
On 5/16/19 1:24 PM, Richard Biener wrote:
> On Thu, May 16, 2019 at 1:18 PM Martin Liška  wrote:
>>
>> Hi.
>>
>> We should not allow target_clones being combined with alias attribute.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> 
> So that's because an alias cannot be turned into a dispatcher and still
> be an alias, correct?  So a way around this would be to turn the
> alias into the dispatcher and clone the alias target, leaving the
> plain alias target as default variant not going through the dispatcher?

We do allow having an alias to a target clone symbol:

__attribute__((target_clones("arch=haswell", "default"))) int __tanh() {}
__typeof(__tanh) tanhf64 __attribute__((alias("__tanh")));

Having that tanhf64 points to the resolver, which I believe is correct.

The PR is about an alias that has itself target_clone attribute, which
does not make sense.

> 
> Of course in the testcase the body of the alias target isn't available
> but that's a general issue, not special to aliases?

We do the target_clone expansion just for node->definition which is true
for node->alias == true symbols.

Martin

> 
> Richard.
> 
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2019-05-16  Martin Liska  
>>
>> PR lto/90500
>> * multiple_target.c (expand_target_clones): Do not allow
>> target_clones being used with a symbol that is an alias.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-05-16  Martin Liska  
>>
>> PR lto/90500
>> * gcc.target/i386/pr90500-1.c: New test.
>> * gcc.target/i386/pr90500-2.c: New test.
>> ---
>>  gcc/multiple_target.c | 5 -
>>  gcc/testsuite/gcc.target/i386/pr90500-1.c | 8 
>>  gcc/testsuite/gcc.target/i386/pr90500-2.c | 7 +++
>>  3 files changed, 19 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90500-2.c
>>
>>


