Re: [PATCH] V6, #4 of 17: Add prefixed instruction support to stack protect insns

2019-11-09 Thread Segher Boessenkool
On Sun, Nov 10, 2019 at 12:38:56AM -0600, Segher Boessenkool wrote:
> It emits a literal ';' into the assembler code.

Or actually, huh, it doesn't.  Sorry.  See read_braced_string in
read-md.c.  Your code is fine.


Segher


Re: [PATCH] V6, #4 of 17: Add prefixed instruction support to stack protect insns

2019-11-09 Thread Segher Boessenkool
On Sun, Nov 10, 2019 at 01:32:29AM -0500, Michael Meissner wrote:
> On Fri, Nov 01, 2019 at 10:22:03PM -0500, Segher Boessenkool wrote:
> > On Wed, Oct 16, 2019 at 09:47:41AM -0400, Michael Meissner wrote:
> > We could make %pN mean 'p' for prefixed, for memory as operands[N]?  Are
> > there more places than this that could use that?  How about inline asm?
> 
> Right now, the only two places that do this are the two stack protect insns.
> Everything else that I'm aware of that generates multiple loads or stores will
> do a split before final.

How about inline asm?

> > > +  if (which_alternative == 0)
> > > +output_asm_insn ("xor. %3,%3,%4", operands);
> > > +  else
> > > +output_asm_insn ("cmpld %0,%3,%4\;li %3,0", operands);
> > 
> > That doesn't work: the backslash is treated like the escape character, in
> > a C block.  I think doubling it will work?  Check the generated 
> > insn-output.c,
> > it should be translated to \t\n in there.
> 
> Yes it does work.  I just checked.

It emits a literal ';' into the assembler code.


Segher


Re: [PATCH] V6, #4 of 17: Add prefixed instruction support to stack protect insns

2019-11-09 Thread Michael Meissner
On Fri, Nov 01, 2019 at 10:22:03PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Oct 16, 2019 at 09:47:41AM -0400, Michael Meissner wrote:
> > This patch fixes the stack protection insns to support stacks larger than
> > 16 bits on the 'future' system using prefixed loads and stores.
> 
> > +;; We can't use the prefixed attribute here because there are two memory
> > +;; instructions.  We can't split the insn due to the fact that this 
> > operation
> > +;; needs to be done in one piece.
> >  (define_insn "stack_protect_setdi"
> >[(set (match_operand:DI 0 "memory_operand" "=Y")
> > (unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
> > (set (match_scratch:DI 2 "=&r") (const_int 0))]
> >"TARGET_64BIT"
> > -  "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0"
> > +{
> > +  if (prefixed_memory (operands[1], DImode))
> > +output_asm_insn ("pld %2,%1", operands);
> > +  else
> > +output_asm_insn ("ld%U1%X1 %2,%1", operands);
> > +
> > +  if (prefixed_memory (operands[0], DImode))
> > +output_asm_insn ("pstd %2,%0", operands);
> > +  else
> > +output_asm_insn ("std%U0%X0 %2,%0", operands);
> 
> We could make %pN mean 'p' for prefixed, for memory as operands[N]?  Are
> there more places than this that could use that?  How about inline asm?

Right now, the only two places that do this are the two stack protect insns.
Everything else that I'm aware of that generates multiple loads or stores will
do a split before final.

> > +   (set (attr "length")
> > +   (cond [(and (match_operand 0 "prefixed_memory")
> > +   (match_operand 1 "prefixed_memory"))
> > +  (const_string "24")
> > +
> > +  (ior (match_operand 0 "prefixed_memory")
> > +   (match_operand 1 "prefixed_memory"))
> > +  (const_string "20")]
> > +
> > + (const_string "12")))])
> 
> You can use const_int instead of const_string here, I think?  Please do
> that if it works.

I'll try it out on Monday.

> Quite a simple expression, phew :-)
> 
> > +  if (which_alternative == 0)
> > +output_asm_insn ("xor. %3,%3,%4", operands);
> > +  else
> > +output_asm_insn ("cmpld %0,%3,%4\;li %3,0", operands);
> 
> That doesn't work: the backslash is treated like the escape character, in
> a C block.  I think doubling it will work?  Check the generated insn-output.c,
> it should be translated to \t\n in there.

Yes it does work.  I just checked.

> Okay for trunk with those things taken care of.  Thanks!

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [committed][AArch64] Add main SVE ACLE tests

2019-11-09 Thread Andreas Schwab
I'm getting massive testsuite failures with -mabi=ilp32.  There are
14922 failures for g++ and another 13594 failures for gcc.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Fix IPA_NODE_REF call in evaluate_properties_for_edge

2019-11-09 Thread Jan Hubicka
Hi,
evaluate_properties_for_edge may call IPA_NODE_REF on an alias, which makes
no sense.  Fixed thus.

Bootstrapped/regtested x86_64-linux. Committed.

* ipa-fnsummary.c (evaluate_properties_for_edge): Call IPA_NODE_REF
on function symbol.
* gcc.dg/tree-ssa/pr46076.c: Make tested code hot.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 278007)
+++ ipa-fnsummary.c (working copy)
@@ -474,7 +474,7 @@ evaluate_properties_for_edge (struct cgr
caller_parms_info = IPA_NODE_REF (e->caller->inlined_to);
   else
caller_parms_info = IPA_NODE_REF (e->caller);
-  callee_pi = IPA_NODE_REF (e->callee);
+  callee_pi = IPA_NODE_REF (callee);
 
   if (count && (info->conds || known_vals_ptr))
known_vals.safe_grow_cleared (count);
Index: testsuite/gcc.dg/tree-ssa/pr46076.c
===
--- testsuite/gcc.dg/tree-ssa/pr46076.c (revision 278007)
+++ testsuite/gcc.dg/tree-ssa/pr46076.c (working copy)
@@ -19,9 +19,12 @@ main()
 {
   /* Make sure we perform indirect inlining of one and two and optimize
  the result to a constant.  */
-  if (print(one) != 3)
-link_error ();
-  if (print(two) != 5)
-link_error ();
+  for (int i = 0; i < 100; i++)
+{
+  if (print(one) != 3)
+   link_error ();
+  if (print(two) != 5)
+   link_error ();
+}
   return 0;
 }


[PATCH] Bump minimum MPFR version to 3.1.0

2019-11-09 Thread Janne Blomqvist
Bump the minimum MPFR version to 3.1.0, released 2011-10-03. With this
requirement one can still build GCC with the operating system provided
MPFR on old but still supported operating systems like SLES 12 (MPFR
3.1.2) or RHEL/CentOS 7.x (MPFR 3.1.1).

This allows removing some code in the Fortran frontend, as well as
fixing PR 91828.

ChangeLog:

2019-11-09  Janne Blomqvist  

PR fortran/91828
* configure.ac: Bump minimum MPFR to 3.1.0, recommended to 3.1.6+.
* configure: Regenerated.

gcc/ChangeLog:

2019-11-09  Janne Blomqvist  

PR fortran/91828
* doc/install.texi: Document that the minimum MPFR version is
3.1.0.

gcc/fortran/ChangeLog:

2019-11-09  Janne Blomqvist  

PR fortran/91828
* simplify.c (gfc_simplify_fraction): Remove fallback path for
MPFR < 3.1.0.
---
 configure.ac   |  6 +++---
 gcc/doc/install.texi   |  2 +-
 gcc/fortran/simplify.c | 37 -
 3 files changed, 4 insertions(+), 41 deletions(-)

diff --git a/configure.ac b/configure.ac
index b8ce2ad20b9..d63a8bae940 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1601,12 +1601,12 @@ if test -d ${srcdir}/gcc && test "x$have_gmp" = xno; then
 AC_MSG_CHECKING([for the correct version of mpfr.h])
 AC_TRY_COMPILE([#include 
 #include ],[
-#if MPFR_VERSION < MPFR_VERSION_NUM(2,4,0)
+#if MPFR_VERSION < MPFR_VERSION_NUM(3,1,0)
 choke me
 #endif
 ], [AC_TRY_COMPILE([#include 
 #include ],[
-#if MPFR_VERSION < MPFR_VERSION_NUM(2,4,2)
+#if MPFR_VERSION < MPFR_VERSION_NUM(3,1,6)
 choke me
 #endif
 ], [AC_MSG_RESULT([yes])], [AC_MSG_RESULT([buggy but acceptable])])],
@@ -1661,7 +1661,7 @@ if test -d ${srcdir}/gcc && test "x$have_gmp" = xno; then
 # The library versions listed in the error message below should match
 # the HARD-minimums enforced above.
   if test x$have_gmp != xyes; then
-AC_MSG_ERROR([Building GCC requires GMP 4.2+, MPFR 2.4.0+ and MPC 0.8.0+.
+AC_MSG_ERROR([Building GCC requires GMP 4.2+, MPFR 3.1.0+ and MPC 0.8.0+.
 Try the --with-gmp, --with-mpfr and/or --with-mpc options to specify
 their locations.  Source code for these libraries can be found at
 their respective hosting sites as well as at
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 2cb8a342a2c..93b01ff7971 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -384,7 +384,7 @@ and @option{--with-gmp-include}.
 The in-tree build is only supported with the GMP version that
 download_prerequisites installs.
 
-@item MPFR Library version 2.4.2 (or later)
+@item MPFR Library version 3.1.0 (or later)
 
 Necessary to build GCC@.  It can be downloaded from
 @uref{https://www.mpfr.org}.  If an MPFR source distribution is found
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 2eb1943c3ee..0461d31cd88 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -3076,12 +3076,7 @@ gfc_expr *
 gfc_simplify_fraction (gfc_expr *x)
 {
   gfc_expr *result;
-
-#if MPFR_VERSION < MPFR_VERSION_NUM(3,1,0)
-  mpfr_t absv, exp, pow2;
-#else
   mpfr_exp_t e;
-#endif
 
   if (x->expr_type != EXPR_CONSTANT)
 return NULL;
@@ -3095,41 +3090,9 @@ gfc_simplify_fraction (gfc_expr *x)
   return result;
 }
 
-#if MPFR_VERSION < MPFR_VERSION_NUM(3,1,0)
-
-  /* MPFR versions before 3.1.0 do not include mpfr_frexp.
- TODO: remove the kludge when MPFR 3.1.0 or newer will be required */
-
-  if (mpfr_sgn (x->value.real) == 0)
-{
-  mpfr_set (result->value.real, x->value.real, GFC_RND_MODE);
-  return result;
-}
-
-  gfc_set_model_kind (x->ts.kind);
-  mpfr_init (exp);
-  mpfr_init (absv);
-  mpfr_init (pow2);
-
-  mpfr_abs (absv, x->value.real, GFC_RND_MODE);
-  mpfr_log2 (exp, absv, GFC_RND_MODE);
-
-  mpfr_trunc (exp, exp);
-  mpfr_add_ui (exp, exp, 1, GFC_RND_MODE);
-
-  mpfr_ui_pow (pow2, 2, exp, GFC_RND_MODE);
-
-  mpfr_div (result->value.real, x->value.real, pow2, GFC_RND_MODE);
-
-  mpfr_clears (exp, absv, pow2, NULL);
-
-#else
-
   /* mpfr_frexp() correctly handles zeros and NaNs.  */
   mpfr_frexp (&e, result->value.real, x->value.real, GFC_RND_MODE);
 
-#endif
-
   return range_check (result, "FRACTION");
 }
 
-- 
2.17.1



Speed up inlining functions called once

2019-11-09 Thread Jan Hubicka
Hi,
it turns out that we spend a lot of time inlining functions called
once.  This is because we in fact inline functions where inlining into
all callers and eliminating the offline copy leads to smaller code.
For that we always compute the growth of all edges in the callgraph,
which is expensive.  For most functions the first few calls are enough
to prove that inlining will make the code size grow.  This patch makes
growth_likely_positive reliable and implements the cap.

Bootstrapped/regtested x86_64-linux, committed.

* ipa-inline-analysis.c (do_estimate_growth_1): Add support for
capping the growth cumulated.
(offline_size): Break out from ...
(estimate_growth): ... here.
(check_callers): Add N, OFFLINE and MIN_SIZE and KNOWN_EDGE
parameters.
(growth_likely_positive): Turn to ...
(growth_positive_p): Re-implement.
* ipa-inline.h (growth_likely_positive): Remove.
(growth_positive_p): Declare.
* ipa-inline.c (want_inline_small_function_p): Use
growth_positive_p.
(want_inline_function_to_all_callers_p): Likewise.

Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 278006)
+++ ipa-inline-analysis.c   (working copy)
@@ -387,6 +387,7 @@ struct growth_data
   bool self_recursive;
   bool uninlinable;
   int growth;
+  int cap;
 };
 
 
@@ -406,29 +407,57 @@ do_estimate_growth_1 (struct cgraph_node
  || !opt_for_fn (e->caller->decl, optimize))
{
  d->uninlinable = true;
+ if (d->cap < INT_MAX)
+   return true;
   continue;
}
 
   if (e->recursive_p ())
{
  d->self_recursive = true;
+ if (d->cap < INT_MAX)
+   return true;
  continue;
}
   d->growth += estimate_edge_growth (e);
+  if (d->growth > d->cap)
+   return true;
 }
   return false;
 }
 
+/* Return estimated savings for eliminating offline copy of NODE by inlining
+   it everywhere.  */
+
+static int
+offline_size (struct cgraph_node *node, ipa_size_summary *info)
+{
+  if (!DECL_EXTERNAL (node->decl))
+{
+  if (node->will_be_removed_from_program_if_no_direct_calls_p ())
+   return info->size;
+  /* COMDAT functions are very often not shared across multiple units
+ since they come from various template instantiations.
+ Take this into account.  */
+  else if (DECL_COMDAT (node->decl)
+  && node->can_remove_if_no_direct_calls_p ())
+   return (info->size
+   * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY))
+   + 50) / 100;
+}
+  return 0;
+}
 
 /* Estimate the growth caused by inlining NODE into all callees.  */
 
 int
 estimate_growth (struct cgraph_node *node)
 {
-  struct growth_data d = { node, false, false, 0 };
-  class ipa_size_summary *info = ipa_size_summaries->get (node);
+  struct growth_data d = { node, false, false, 0, INT_MAX };
+  ipa_size_summary *info = ipa_size_summaries->get (node);
 
-  node->call_for_symbol_and_aliases (do_estimate_growth_1, &d, true);
+  if (node->call_for_symbol_and_aliases (do_estimate_growth_1, &d, true))
+return 1;
 
   /* For self recursive functions the growth estimation really should be
  infinity.  We don't want to return very large values because the growth
@@ -436,21 +465,8 @@ estimate_growth (struct cgraph_node *nod
  return zero or negative growths. */
   if (d.self_recursive)
 d.growth = d.growth < info->size ? info->size : d.growth;
-  else if (DECL_EXTERNAL (node->decl) || d.uninlinable)
-;
-  else
-{
-  if (node->will_be_removed_from_program_if_no_direct_calls_p ())
-   d.growth -= info->size;
-  /* COMDAT functions are very often not shared across multiple units
- since they come from various template instantiations.
- Take this into account.  */
-  else if (DECL_COMDAT (node->decl)
-  && node->can_remove_if_no_direct_calls_p ())
-   d.growth -= (info->size
-* (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY))
-+ 50) / 100;
-}
+  else if (!d.uninlinable)
+d.growth -= offline_size (node, info);
 
   return d.growth;
 }
@@ -458,7 +474,8 @@ estimate_growth (struct cgraph_node *nod
 /* Verify if there are fewer than MAX_CALLERS.  */
 
 static bool
-check_callers (cgraph_node *node, int *max_callers)
+check_callers (cgraph_node *node, int *growth, int *n, int offline,
+  int min_size, struct cgraph_edge *known_edge)
 {
   ipa_ref *ref;
 
@@ -467,70 +484,96 @@ check_callers (cgraph_node *node, int *m
 
   for (cgraph_edge *e = node->callers; e; e = e->next_caller)
 {
-  (*max_callers)--;
-  if (!*max_callers
- || cgraph_inline_failed_type (e->inline_failed) == CIF_FINAL_ERROR)
+  edge_growth_cache_entry *entry;
+
+  if (e == known_edge)
+   continue;
+

Fix computation of min_size in ipa-fnsummary.c

2019-11-09 Thread Jan Hubicka
Hi,
this patch fixes two different bugs in the computation of min_size in
ipa-fnsummary.c.  min_size is the minimal size of any clone or version
of a given function and is used to avoid some harder work computing the
actual size for a given context in the inlining heuristics.

In estimate_size_and_time we mistakenly reset min_size to the size of
the first entry of the size/time table (which is the only unconditional
entry), forgetting that it had already accumulated the sizes of calls.

In update_overall_fn_summary we forgot to account for the size/time
table and then divide the result by the size scale.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-fnsummary.c (ipa_call_context::estimate_size_and_time): Fix
calculation of min_size.
(ipa_update_overall_fn_summary): Likewise.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 278005)
+++ ipa-fnsummary.c (working copy)
@@ -3243,6 +3243,7 @@ ipa_call_context::estimate_size_and_time
 
   sreal nonspecialized_time = time;
 
+  min_size += (*info->size_time_table)[0].size;
   for (i = 0; vec_safe_iterate (info->size_time_table, i, &e); i++)
 {
   bool exec = e->exec_predicate.evaluate (m_nonspec_possible_truths);
@@ -3288,7 +3289,7 @@ ipa_call_context::estimate_size_and_time
  }
   gcc_checking_assert ((*info->size_time_table)[0].exec_predicate == true);
   gcc_checking_assert ((*info->size_time_table)[0].nonconst_predicate == true);
-  min_size = (*info->size_time_table)[0].size;
+  gcc_checking_assert (min_size >= 0);
   gcc_checking_assert (size >= 0);
   gcc_checking_assert (time >= 0);
   /* nonspecialized_time should be always bigger than specialized time.
@@ -3685,12 +3686,13 @@ ipa_update_overall_fn_summary (struct cg
   size_info->size += e->size;
   info->time += e->time;
 }
+  info->min_size = (*info->size_time_table)[0].size;
   estimate_calls_size_and_time (node, &size_info->size, &info->min_size,
&info->time, NULL,
~(clause_t) (1 << predicate::false_condition),
vNULL, vNULL, vNULL);
-  size_info->size = (size_info->size + ipa_fn_summary::size_scale / 2)
-   / ipa_fn_summary::size_scale;
+  size_info->size = RDIV (size_info->size, ipa_fn_summary::size_scale);
+  info->min_size = RDIV (info->min_size, ipa_fn_summary::size_scale);
 }
 
 


Some minor ipa-fnsummary speedups

2019-11-09 Thread Jan Hubicka
Hi,
this patch prevents ipa-fnsummary from calculating things that are not
going to be used.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-fnsummary.c (estimate_edge_size_and_time): Do not call
estimate_edge_devirt_benefit when not computing hints;
do not compute time when not asked for.
(estimate_calls_size_and_time): Pass NULL hints and time when
these are not computed; do not evaluate hint predicates when these are
not computed.
(ipa_merge_fn_summary_after_inlining): Do not re-evaluate edge
frequency.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 278004)
+++ ipa-fnsummary.c (working copy)
@@ -2894,16 +2894,17 @@ estimate_edge_size_and_time (struct cgra
   int call_size = es->call_stmt_size;
   int call_time = es->call_stmt_time;
   int cur_size;
-  if (!e->callee
+  if (!e->callee && hints && e->maybe_hot_p ()
   && estimate_edge_devirt_benefit (e, &call_size, &call_time,
-  known_vals, known_contexts, known_aggs)
-  && hints && e->maybe_hot_p ())
+  known_vals, known_contexts, known_aggs))
 *hints |= INLINE_HINT_indirect_call;
   cur_size = call_size * ipa_fn_summary::size_scale;
   *size += cur_size;
   if (min_size)
 *min_size += cur_size;
-  if (prob == REG_BR_PROB_BASE)
+  if (!time)
+;
+  else if (prob == REG_BR_PROB_BASE)
 *time += ((sreal)call_time) * e->sreal_frequency ();
   else
 *time += ((sreal)call_time * prob) * e->sreal_frequency ();
@@ -3235,8 +3236,11 @@ ipa_call_context::estimate_size_and_time
  }
 }
 
-  estimate_calls_size_and_time (m_node, &size, &min_size, &time, &hints, 
m_possible_truths,
+  estimate_calls_size_and_time (m_node, &size, &min_size,
+   ret_time ? &time : NULL,
+   ret_hints ? &hints : NULL, m_possible_truths,
m_known_vals, m_known_contexts, m_known_aggs);
+
   sreal nonspecialized_time = time;
 
   for (i = 0; vec_safe_iterate (info->size_time_table, i, &e); i++)
@@ -3260,6 +3264,8 @@ ipa_call_context::estimate_size_and_time
 known to be constant in a specialized setting.  */
  if (nonconst)
size += e->size;
+ if (!ret_time)
+   continue;
  nonspecialized_time += e->time;
  if (!nonconst)
;
@@ -3295,16 +3301,19 @@ ipa_call_context::estimate_size_and_time
   if (time > nonspecialized_time)
 time = nonspecialized_time;
 
-  if (info->loop_iterations
-  && !info->loop_iterations->evaluate (m_possible_truths))
-hints |= INLINE_HINT_loop_iterations;
-  if (info->loop_stride
-  && !info->loop_stride->evaluate (m_possible_truths))
-hints |= INLINE_HINT_loop_stride;
-  if (info->scc_no)
-hints |= INLINE_HINT_in_scc;
-  if (DECL_DECLARED_INLINE_P (m_node->decl))
-hints |= INLINE_HINT_declared_inline;
+  if (ret_hints)
+{
+  if (info->loop_iterations
+ && !info->loop_iterations->evaluate (m_possible_truths))
+   hints |= INLINE_HINT_loop_iterations;
+  if (info->loop_stride
+ && !info->loop_stride->evaluate (m_possible_truths))
+   hints |= INLINE_HINT_loop_stride;
+  if (info->scc_no)
+   hints |= INLINE_HINT_in_scc;
+  if (DECL_DECLARED_INLINE_P (m_node->decl))
+   hints |= INLINE_HINT_declared_inline;
+}
 
   size = RDIV (size, ipa_fn_summary::size_scale);
   min_size = RDIV (min_size, ipa_fn_summary::size_scale);
@@ -3604,6 +3613,7 @@ ipa_merge_fn_summary_after_inlining (str
  gcc_assert (map < ipa_get_param_count (params_summary));
}
 }
+  sreal freq =  edge->sreal_frequency ();
   for (i = 0; vec_safe_iterate (callee_info->size_time_table, i, &e); i++)
 {
   predicate p;
@@ -3620,7 +3630,7 @@ ipa_merge_fn_summary_after_inlining (str
  toplev_predicate);
   if (p != false && nonconstp != false)
{
- sreal add_time = ((sreal)e->time * edge->sreal_frequency ());
+ sreal add_time = ((sreal)e->time * freq);
  int prob = e->nonconst_predicate.probability (callee_info->conds,
clause, es->param);
  add_time = add_time * prob / REG_BR_PROB_BASE;


[patch, fortran, committed] Commit symbol for external BLAS routine when translating MATMUL to *GEMM.

2019-11-09 Thread Thomas Koenig

Hi,

I just committed the patch below as obvious to fix a 9/10 regression
when directly calling BLAS routines for matmul.  Will backport
to gcc-9 soon.

Regards

Thomas

Commit symbol for external BLAS routine when translating MATMUL to *GEMM.

2019-11-09  Thomas Koenig  

PR fortran/92321
* frontend-passes.c (call_external_blas): Commit symbol for
external BLAS routine.

2019-11-09  Thomas Koenig  

PR fortran/92321
* gfortran.dg/matmul_blas_2.f90: New test.
! { dg-do compile }
! { dg-options "-O3 -fdump-tree-original -fexternal-blas" }
! PR fortran/92321 - this used to cause an ICE.  Original test case
! by Nathan Wukie.

module mod_badmatmul
implicit none
contains

subroutine test(c)
real, intent(inout) :: c(3,3)
real :: a(3,3), b(3,3)
c = matmul(a, b)
end subroutine test

end module mod_badmatmul

program main
use mod_badmatmul, only: test
implicit none

real :: a(3,3)
call test(a)

end program main
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 277999)
+++ frontend-passes.c	(Arbeitskopie)
@@ -4635,6 +4635,7 @@ call_external_blas (gfc_code **c, int *walk_subtre
   call->symtree->n.sym->attr.procedure = 1;
   call->symtree->n.sym->attr.flavor = FL_PROCEDURE;
   call->resolved_sym = call->symtree->n.sym;
+  gfc_commit_symbol (call->resolved_sym);
 
   /* Argument TRANSA.  */
   next = gfc_get_actual_arglist ();


[PATCH] Refactor tree-loop-distribution for thread safety

2019-11-09 Thread Giuliano Belinassi
Hi all,

This patch refactors tree-loop-distribution.c for thread safety without
using the C11 __thread feature.  All global variables were moved to a
struct which is initialized at ::execute time.

I can install this patch myself in trunk if it's OK.

gcc/ChangeLog
2019-11-09  Giuliano Belinassi  

* cfgloop.c (get_loop_body_in_custom_order): New.
* cfgloop.h (get_loop_body_in_custom_order): New prototype.
* tree-loop-distribution.c (struct priv_pass_vars): New.
(bb_top_order_cmp_r): New.
(create_rdg_vertices): Update prototype.
(stmts_from_loop): Same as above.
(update_for_merge): Same as above.
(partition_merge_into): Same as above.
(get_data_dependence): Same as above.
(data_dep_in_cycle_p): Same as above.
(update_type_for_merge): Same as above.
(build_rdg_partition_for_vertex): Same as above.
(classify_builtin_ldst): Same as above.
(classify_partition): Same as above.
(share_memory_accesses): Same as above.
(rdg_build_partitions): Same as above.
(pg_add_dependence_edges): Same as above.
(build_partition_graph): Same as above.
(merge_dep_scc_partitions): Same as above.
(break_alias_scc_partitions): Same as above.
(finalize_partitions): Same as above.
(distribute_loop): Same as above.
(bb_top_order_init): New function.
(bb_top_order_destroy): New function.
(pass_loop_distribution::execute): Initialize struct priv.

Thank you,
Giuliano.
diff --git gcc/cfgloop.c gcc/cfgloop.c
index f18d2b3f24b..db0066ea859 100644
--- gcc/cfgloop.c
+++ gcc/cfgloop.c
@@ -980,6 +980,19 @@ get_loop_body_in_custom_order (const class loop *loop,
   return bbs;
 }
 
+/* Same as above, but use gcc_sort_r instead of qsort.  */
+
+basic_block *
+get_loop_body_in_custom_order (const class loop *loop, void *data,
+			   int (*bb_comparator) (const void *, const void *, void *))
+{
+  basic_block *bbs = get_loop_body (loop);
+
+  gcc_sort_r (bbs, loop->num_nodes, sizeof (basic_block), bb_comparator, data);
+
+  return bbs;
+}
+
 /* Get body of a LOOP in breadth first sort order.  */
 
 basic_block *
diff --git gcc/cfgloop.h gcc/cfgloop.h
index 0b0154ffd7b..6256cc01ff4 100644
--- gcc/cfgloop.h
+++ gcc/cfgloop.h
@@ -376,6 +376,8 @@ extern basic_block *get_loop_body_in_dom_order (const class loop *);
 extern basic_block *get_loop_body_in_bfs_order (const class loop *);
 extern basic_block *get_loop_body_in_custom_order (const class loop *,
 			   int (*) (const void *, const void *));
+extern basic_block *get_loop_body_in_custom_order (const class loop *, void *,
+			   int (*) (const void *, const void *, void *));
 
extern vec<edge> get_loop_exit_edges (const class loop *);
 extern edge single_exit (const class loop *);
diff --git gcc/tree-loop-distribution.c gcc/tree-loop-distribution.c
index 81784866ad1..9edac891666 100644
--- gcc/tree-loop-distribution.c
+++ gcc/tree-loop-distribution.c
@@ -155,20 +155,55 @@ ddr_hasher::equal (const data_dependence_relation *ddr1,
   return (DDR_A (ddr1) == DDR_A (ddr2) && DDR_B (ddr1) == DDR_B (ddr2));
 }
 
-/* The loop (nest) to be distributed.  */
-static vec<loop_p> loop_nest;
+struct priv_pass_vars
+{
+  /* The loop (nest) to be distributed.  */
+  vec<loop_p> loop_nest;
 
-/* Vector of data references in the loop to be distributed.  */
-static vec<data_reference_p> datarefs_vec;
+  /* Vector of data references in the loop to be distributed.  */
+  vec<data_reference_p> datarefs_vec;
 
-/* If there is nonaddressable data reference in above vector.  */
-static bool has_nonaddressable_dataref_p;
+  /* If there is nonaddressable data reference in above vector.  */
+  bool has_nonaddressable_dataref_p;
 
-/* Store index of data reference in aux field.  */
-#define DR_INDEX(dr)  ((uintptr_t) (dr)->aux)
+  /* Store index of data reference in aux field.  */
+
+  /* Hash table for data dependence relation in the loop to be distributed.  */
+  hash_table<ddr_hasher> *ddrs_table;
+
+  /* Array mapping basic block's index to its topological order.  */
+  int *bb_top_order_index;
+  /* And size of the array.  */
+  int bb_top_order_index_size;
+
+};
+
+
+/* If X has a smaller topological sort number than Y, returns -1;
+   if greater, returns 1.  */
+static int
+bb_top_order_cmp_r (const void *x, const void *y, void *priv)
+{
+  const struct priv_pass_vars *_priv =
+(const struct priv_pass_vars *) priv;
 
-/* Hash table for data dependence relation in the loop to be distributed.  */
-static hash_table<ddr_hasher> *ddrs_table;
+  basic_block bb1 = *(const basic_block *) x;
+  basic_block bb2 = *(const basic_block *) y;
+
+  int *bb_top_order_index = _priv->bb_top_order_index;
+  int bb_top_order_index_size = _priv->bb_top_order_index_size;
+
+  gcc_assert (bb1->index < bb_top_order_index_size
+	  && bb2->index < bb_top_order_index_size);
+  gcc_assert (bb1 == bb2
+	  || bb_top_order_index[bb1->index]
+		 != bb_top_order_index[bb2->index]);
+
+  return (bb_top_order_i

[Darwin, machopic 10/n] Rework X86 mcount stub code.

2019-11-09 Thread Iain Sandoe
More machopic cleanups…

When a stub is used to call the mcount function, the code is already
marking it as used unconditionally; this is the only use of the so-
called validation outside darwin.{h,c}.  This moves the 'validation'
into darwin.c, which is a step towards making validation routine-local.

tested on x86_64-darwin16 (m32/m64)
applied to mainline,
thanks
Iain

gcc/

2019-11-09  Iain Sandoe  

* config/darwin.c (machopic_mcount_stub_name): Validate the
symbol stub name when it is created.
* config/i386/darwin.h (FUNCTION_PROFILER): Remove the symbol
stub validation.

diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
index c0fafed..f34be22 100644
--- a/gcc/config/darwin.c
+++ b/gcc/config/darwin.c
@@ -601,15 +601,6 @@ machopic_indirection_name (rtx sym_ref, bool stub_p)
   return p->ptr_name;
 }
 
-/* Return the name of the stub for the mcount function.  */
-
-const char*
-machopic_mcount_stub_name (void)
-{
-  rtx symbol = gen_rtx_SYMBOL_REF (Pmode, "*mcount");
-  return machopic_indirection_name (symbol, /*stub_p=*/true);
-}
-
 /* If NAME is the name of a stub or a non-lazy pointer , mark the stub
or non-lazy pointer as used -- and mark the object to which the
pointer/stub refers as used as well, since the pointer/stub will
@@ -2155,6 +2146,20 @@ darwin_emit_except_table_label (FILE *file)
   except_table_label_num++);
   ASM_OUTPUT_LABEL (file, section_start_label);
 }
+
+/* Return, and mark as used, the name of the stub for the mcount function.
+   Currently, this is only called by X86 code in the expansion of the
+   FUNCTION_PROFILER macro, when stubs are enabled.  */
+
+const char*
+machopic_mcount_stub_name (void)
+{
+  rtx symbol = gen_rtx_SYMBOL_REF (Pmode, "*mcount");
+  const char *name = machopic_indirection_name (symbol, /*stub_p=*/true);
+  machopic_validate_stub_or_non_lazy_ptr (name);
+  return name;
+}
+
 /* Generate a PC-relative reference to a Mach-O non-lazy-symbol.  */
 
 void
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index bdb36f0..d1e53ef 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -239,7 +239,8 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_ASM_OUTPUT_IDENT
 #define TARGET_ASM_OUTPUT_IDENT default_asm_output_ident_directive
 
-/* Darwin profiling -- call mcount.  */
+/* Darwin profiling -- call mcount.
+   If we need a stub, then we unconditionally mark it as used.  */
 #undef FUNCTION_PROFILER
 #define FUNCTION_PROFILER(FILE, LABELNO)   \
   do { \
@@ -248,7 +249,6 @@ along with GCC; see the file COPYING3.  If not see
   {
\
const char *name = machopic_mcount_stub_name ();\
fprintf (FILE, "\tcall %s\n", name+1);  /*  skip '&'  */\
-   machopic_validate_stub_or_non_lazy_ptr (name);  \
   }
\
 else fprintf (FILE, "\tcall mcount\n");\
   } while (0)



Re: [PATCH][arm][1/X] Add initial support for saturation intrinsics

2019-11-09 Thread Richard Henderson
> +;; define_subst and associated attributes
> +
> +(define_subst "add_setq"
> +  [(set (match_operand:SI 0 "" "")
> +(match_operand:SI 1 "" ""))]
> +  ""
> +  [(set (match_dup 0)
> +(match_dup 1))
> +   (set (reg:CC APSRQ_REGNUM)
> + (unspec:CC [(reg:CC APSRQ_REGNUM)] UNSPEC_Q_SET))])
> +
> +(define_subst_attr "add_clobber_q_name" "add_setq" "" "_setq")
> +(define_subst_attr "add_clobber_q_pred" "add_setq" "!ARM_Q_BIT_READ"
> +"ARM_Q_BIT_READ")

Is there a good reason to use CCmode for the Q bit instead of BImode?

Is there a good reason not to represent the clobber of the Q bit when we're not
representing the set?

Although it probably doesn't matter, because of the unspec, the update of the Q
bit here in the subst isn't right.  Better would be

  (set (reg:BI APSRQ_REGNUM)
   (ior:BI (unspec:BI [(match_dup 1)] UNSPEC_Q_SET)
   (reg:BI APSRQ_REGNUM)))


> +/* Implement TARGET_CHECK_BUILTIN_CALL.  Record a read of the Q bit through
> +   intrinsics in the machine function.  */
> +bool
> +arm_check_builtin_call (location_t, vec<location_t>, tree fndecl,
> + tree, unsigned int, tree *)
> +{
> +  int fcode = DECL_MD_FUNCTION_CODE (fndecl);
> +  if (fcode == ARM_BUILTIN_saturation_occurred
> +  || fcode == ARM_BUILTIN_set_saturation)
> +{
> +  if (cfun && cfun->decl)
> + DECL_ATTRIBUTES (cfun->decl)
> +   = tree_cons (get_identifier ("acle qbit"), NULL_TREE,
> +DECL_ATTRIBUTES (cfun->decl));
> +}
> +  return true;
> +}

Where does this attribute affect inlining?  Or get merged into the signature of
a calling function during inlining?

This check happens way way early in the c front end, while doing semantic
checks.  I think this is the wrong hook to use, especially since this is not a
semantic check of the arguments to the builtin.

I think you want to record the use of such a builtin during rtl expansion,
within the define_expand for the buitin, setting a bool in struct
machine_function.  Then use that bool in the arm.md predicates.

(I'll note that there are 5 "int" fields in the arm machine_function that
should be bool, now that we're not using C89.  Probably best to re-order all of
them to the end and add your new bool next to them.)

> +(define_insn "arm_set_apsr"
> +  [(set (reg:CC APSRQ_REGNUM)
> + (unspec_volatile:CC
> +   [(match_operand:SI 0 "s_register_operand" "r")] VUNSPEC_APSR_WRITE))]
> +  "TARGET_ARM_QBIT"
> +  "msr%?\tAPSR_nzcvq, %0"
> +  [(set_attr "predicable" "yes")
> +   (set_attr "conds" "set")]
> +)

This is missing a clobber (or set) of CC_REGNUM.
Why unspec_volatile and not unspec?
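One possible shape for the pattern with the condition-register side effect made explicit (a sketch only, untested: MSR APSR_nzcvq also writes the NZCV flags, hence the clobber; UNSPEC_APSR_WRITE here assumes a plain unspec is sufficient, which depends on whether a dead APSR write may legitimately be deleted):

```lisp
;; Sketch: arm_set_apsr with the NZCV write represented as a clobber
;; of CC_REGNUM, per the review comment above.
(define_insn "arm_set_apsr"
  [(set (reg:CC APSRQ_REGNUM)
	(unspec:CC
	  [(match_operand:SI 0 "s_register_operand" "r")] UNSPEC_APSR_WRITE))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_ARM_QBIT"
  "msr%?\tAPSR_nzcvq, %0"
  [(set_attr "predicable" "yes")
   (set_attr "conds" "set")]
)
```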

> +;; Read the APSR and set the Q bit (bit position 27) according to operand 0
> +(define_expand "arm_set_saturation"
> +  [(match_operand:SI 0 "reg_or_int_operand")]
> +  "TARGET_ARM_QBIT"
> +  {
> +rtx apsr = gen_reg_rtx (SImode);
> +emit_insn (gen_arm_get_apsr (apsr));
> +rtx to_insert = gen_reg_rtx (SImode);
> +if (CONST_INT_P (operands[0]))
> +  emit_move_insn (to_insert, operands[0] == CONST0_RTX (SImode)
> +  ? CONST0_RTX (SImode) : CONST1_RTX (SImode));
> +else
> +  {
> +rtx cmp = gen_rtx_NE (SImode, operands[0], CONST0_RTX (SImode));
> +emit_insn (gen_cstoresi4 (to_insert, cmp, operands[0],
> +   CONST0_RTX (SImode)));
> +  }
> +emit_insn (gen_insv (apsr, CONST1_RTX (SImode),
> +gen_int_mode (27, SImode), to_insert));
> +emit_insn (gen_arm_set_apsr (apsr));
> +DONE;
> +  }
> +)

Why are you preserving APSR.NZCV across this operation?  It should not be live
during the builtin that expands this.


r~


Re: [PING**3] [PATCH] Fix dwarf-lineinfo inconsistency of inlined subroutines

2019-11-09 Thread Bernd Edlinger
Ping...

On 11/2/19 7:49 AM, Bernd Edlinger wrote:
> Ping...
> 
> On 10/27/19 9:14 AM, Bernd Edlinger wrote:
>> Ping...
>>
>> I'd like to ping for this patch:
>> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01459.html
>>
>>
>> Thanks
>> Bernd.
>>
>> On 10/20/19 9:58 PM, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this fixes an issue with the gdb step-over, a.k.a. the "n" command.
>>>
>>> It can be seen when you debug an optimized stage-3 cc1;
>>> it does not affect -O0 code, though.
>>>
>>> This example debug session will explain the effect.
>>>
>>> (gdb) b get_alias_set
>>> Breakpoint 5 at 0xa099f0: file ../../gcc-trunk/gcc/alias.c, line 837.
>>> (gdb) r
>>> Breakpoint 5, get_alias_set (t=t@entry=0x77ff7ab0) at 
>>> ../../gcc-trunk/gcc/alias.c:837
>>> 837   if (t == error_mark_node
>>> (gdb) n
>>> 839   && (TREE_TYPE (t) == 0 || TREE_TYPE (t) == error_mark_node)))
>>> (gdb) n
>>> 3382  return __t;  <-- now we have a problem: wrong line info here
>>> (gdb) bt
>>> #0  get_alias_set (t=t@entry=0x77ff7ab0) at 
>>> ../../gcc-trunk/gcc/tree.h:3382
>>> #1  0x00b25dfe in set_mem_attributes_minus_bitpos 
>>> (ref=0x7746f990, t=0x77ff7ab0, objectp=1, bitpos=...)
>>> at ../../gcc-trunk/gcc/emit-rtl.c:1957
>>> #2  0x01137a55 in make_decl_rtl (decl=0x77ff7ab0) at 
>>> ../../gcc-trunk/gcc/varasm.c:1518
>>> #3  0x0113b6e8 in assemble_variable (decl=0x77ff7ab0, 
>>> top_level=, at_end=, 
>>> dont_output_data=0) at ../../gcc-trunk/gcc/varasm.c:2246
>>> #4  0x0113f0ea in varpool_node::assemble_decl (this=0x7745b000) 
>>> at ../../gcc-trunk/gcc/varpool.c:584
>>> #5  0x0113fa17 in varpool_node::assemble_decl (this=0x7745b000) 
>>> at ../../gcc-trunk/gcc/varpool.c:750
>>>
>>>
>>> There are at least two problems here:
>>>
>>> First you did not want to step into the TREE_TYPE, but it happens all
>>> the time, even if you use "n" to step over it.
>>>
>>> And secondly, from the call stack, you don't know where you are in 
>>> get_alias_set.
>>> But the code that is executing at this point is actually the
>>> x == 0 || x == error_mark_node from alias.c, line 839, which contains the
>>> inlined body of the TREE_TYPE but also the rest of the if.  So there is an
>>> inconsistency in the debug info:
>>>
>>> Contents of the .debug_info section:
>>>
>>>  <2><4f686>: Abbrev Number: 12 (DW_TAG_inlined_subroutine)
>>> <4f687>   DW_AT_abstract_origin: <0x53d4e>
>>> <4f68b>   DW_AT_entry_pc: 0x7280
>>> <4f693>   DW_AT_GNU_entry_view: 1
>>> <4f695>   DW_AT_ranges  : 0xb480
>>> <4f699>   DW_AT_call_file   : 8  <- alias.c
>>> <4f69a>   DW_AT_call_line   : 839
>>> <4f69c>   DW_AT_call_column : 8
>>> <4f69d>   DW_AT_sibling : <0x4f717>
>>>
>>>  The File Name Table (offset 0x253):
>>>   8   2   0   0   alias.c
>>>   10  2   0   0   tree.h
>>>
>>> Contents of the .debug_ranges section:
>>>
>>> b480 7280 7291 
>>> b480 2764 277e 
>>> b480 
>>>
>>> The problem is at pc=0x7291 in the Line Number Section:
>>>
>>>  Line Number Statements:
>>>
>>>   [0x8826]  Special opcode 61: advance Address by 4 to 0x7284 and Line 
>>> by 0 to 3380
>>>   [0x8827]  Set is_stmt to 1
>>>   [0x8828]  Special opcode 189: advance Address by 13 to 0x7291 and 
>>> Line by 2 to 3382 (*)
>>>   [0x8829]  Set is_stmt to 0 (**)
>>>   [0x882a]  Copy (view 1)
>>>   [0x882b]  Set File Name to entry 8 in the File Name Table <- back to 
>>> alias.c
>>>   [0x882d]  Set column to 8
>>>   [0x882f]  Advance Line by -2543 to 839
>>>   [0x8832]  Copy (view 2)
>>>   [0x8833]  Set column to 27
>>>   [0x8835]  Special opcode 61: advance Address by 4 to 0x7295 and Line 
>>> by 0 to 839
>>>   [0x8836]  Set column to 3
>>>   [0x8838]  Set is_stmt to 1 <-- next line info counts: alias.c:847
>>>   [0x8839]  Special opcode 153: advance Address by 10 to 0x729f and 
>>> Line by 8 to 847
>>>   [0x883a]  Set column to 7
>>>
>>> (*) this line is tree.h:3382, but the program counter is *not* within the
>>> subroutine; it is exactly at the first instruction *after* the subroutine
>>> according to the debug_ranges.
>>>
>>> What makes it worse is that (**) makes gdb ignore the new location info
>>> alias.c:839, which means the n command would normally have continued to
>>> pc=0x729f, which is at alias.c:847.
>>>
>>>
>>> The problem happens due to a block with only var
>>> This patch fixes this problem by moving (**) to the first statement with a 
>>> different line number.
>>>
>>> In alias.c.316r.final this looks like that:
>>>
>>> (note 2884 2883 1995 31 0x7f903a931ba0 NOTE_INSN_BLOCK_BEG)
>>> (note 1995 2884 2885 31 ../../gcc-trunk/gcc/tree.h:3377 
>>> NOTE_INSN_INLINE_ENTRY)
>>> (note 2885 1995 1996 31 0x7f903a931c00 NOTE_INSN_BLOCK_BEG)
>>> [...]
>>> (note 50 39 59 32 [bb 32] NOTE_INSN_BASIC_BLOCK)
>>> (note 59 50 60 32

Re: [PATCH 1/5] Libsanitizer: merge from trunk with merge.sh.

2019-11-09 Thread Iain Sandoe
Eric Gallager  wrote:

> On 11/5/19, Jakub Jelinek  wrote:
>> On Mon, Nov 04, 2019 at 04:10:27PM +0100, Martin Liska wrote:
>>> libsanitizer/ChangeLog:
>>> 
>>> 2019-11-05  Martin Liska  
>>> 
>>> * all source files: Merge from upstream r375507.
>>> ---
>>> libsanitizer/BlocksRuntime/Block.h|   59 +
>>> libsanitizer/BlocksRuntime/Block_private.h|  179 ++
>> 
>> Do we really need this?

probably not, 

I’ve been tied up with WG21, so I have not properly reviewed the libsanitizer
merge yet.  However, I have a note to myself to check the *existing*
interface that GCC is emitting for the facility on Darwin, since it differs
from the one emitted by clang (that might or might not be important, it’s not
yet analysed).

> So, maybe we don't need this for the sanitizer itself, but if the
> sanitizers now come with their own copy of the Blocks Runtime...
> couldn't that be the solution as to where to take our blocks support
> library for GCC proper from?

No, this issue is not the absence of a blocks runtime (at least, not on
Darwin) the issue is that the compiler support is missing.

>> I mean, couldn't we wrap this Block.h include with #ifdef __BLOCKS__ or so
>> as a local patch (at least for now)?
> 
> This is bug 78352 btw: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78352

I haven’t got anything mature enough to post for inclusion in GCC-10; the last
6 months have been “playing catch up” and just didn’t leave enough time for
this, sadly.

thanks
Iain