date:20210827

Re: fix latent bootstrap-debug issue (modref, tree-inline, tree jump-threading)

2021-08-27 Thread Alexandre Oliva

On Aug 22, 2021, Jan Hubicka  wrote:

> OK, thanks for looking into this issue!

Thanks, I've finally installed it in the trunk.

> It seems that analye_stmt indeed does not skip debug stmts.  It is very
> odd we got so far without hitting build difference.

Indeed.  That got me thinking...  The comments state:

 If the statement cannot be analyzed (for any reason), the entire
 function cannot be analyzed by modref.

but the implementation also tests, for every statement:

  || ((!summary || !summary->useful_p (ecf_flags, false))
  && (!summary_lto
  || !summary_lto->useful_p (ecf_flags, false

which means that, if the first stmt of a block doesn't add useful
information to the summary, we give up.  Was this really the intent?

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about

Re: [PATCH] improve note location and refactor warn_uninit

2021-08-27 Thread Jeff Law via Gcc-patches





On 8/26/2021 1:30 PM, Martin Sebor via Gcc-patches wrote:

On 8/26/21 10:38 AM, Martin Sebor wrote:

On 8/26/21 1:00 AM, Richard Biener wrote:

On Wed, Aug 25, 2021 at 10:03 PM Martin Sebor  wrote:


Richard, some time ago you mentioned you'd had trouble getting
-Wuninitialized to print the note pointing to the uninitialized
variable.  I said I'd look into it, and so I did.  The attached
patch simplifies the warn_uninit() function to get rid of some
redundant cruft: besides a few pointless null tests and
a duplicate argument it also does away with ancient code that's
responsible for the problem you saw.  It turns out that tests
for the problem are already in the test suite but have been
xfailed since the day they were added, so the patch simply
removes the xfails.  I'll go ahead and commit this cleanup
as obvious later this week unless you have suggestions for
further changes.


I do see lots of useful refactoring.

-  if (context != NULL && gimple_has_location (context))
-    location = gimple_location (context);
-  else if (phiarg_loc != UNKNOWN_LOCATION)
-    location = phiarg_loc;
-  else
-    location = DECL_SOURCE_LOCATION (var);
+  /* Use either the location of the read statement or that of the PHI
+ argument, or that of the uninitialized variable, in that order,
+ whichever is valid.  */
+  location_t location = gimple_location (context);
+  if (location == UNKNOWN_LOCATION)
+    {
+  if (phi_arg_loc == UNKNOWN_LOCATION)
+   location = DECL_SOURCE_LOCATION (var);
+  else
+   location = phi_arg_loc;
+    }

the comment is an improvement but I think the original flow is
easier to follow ;)


It doesn't really simplify things so I can revert it.


I've done that and pushed r12-3165, and...


-  if (xloc.file != floc.file
- || linemap_location_before_p (line_table, location, cfun_loc)
- || linemap_location_before_p (line_table, 
cfun->function_end_locus,

-   location))
-   inform (DECL_SOURCE_LOCATION (var), "%qD was declared here", 
var);

...
+  inform (var_loc, "%qD was declared here", var);

and this is the actual change that "fixes" the missing diagnostics?  
Can

you explain what the reason for the original sing-and-dance was?  It
looks like it tries to print the 'was declared here' only for locations
within the current function (but then it probably intended to do that
on the DECL_SOURCE_LOCATION (var), not on whatever we are
choosing now)?


The author(s) of the logic wanted to print the note only for variables
declared in functions other than the one the uninitialized read is in.
The idea first appears in pr17506, comment #25 (and attachment 12131).
At that time GCC would print no note at all and pr17506 was about
inlining making it difficult to find the uninitialized variable.
The originally reported problem can still be reproduced on Godbolt
(with the original very large translation unit):
   https://godbolt.org/z/8WW43rxnd

I've reduced it to a small test case:
   https://godbolt.org/z/rnPEfPqPf

It looks like they didn't think it would also be a problem if
the variable was defined and read in the same function, even
if the read was far away from the declaration.



That said, I'd prefer if you can commit the refactoring independently
of this core change and can try to dig why this was added and what
it was supposed to do.


Sure, let me do that.  Please let me know if the fix for the note
is okay to commit as is on its own.


...the removal of the test guarding the note is attached.



Martin



Thanks,
Richard.


Tested on x86_64-linux.

Martin





gcc-17506.diff

Improve note location.

Related:
PR tree-optimization/17506 - warning about uninitialized variable points to 
wrong location
PR testsuite/37182 - Revision 139286 caused gcc.dg/pr17506.c and 
gcc.dg/uninit-15.c

gcc/ChangeLog:

PR tree-optimization/17506
PR testsuite/37182
* tree-ssa-uninit.c (warn_uninit): Remove conditional guarding note.

gcc/testsuite/ChangeLog:

PR tree-optimization/17506
PR testsuite/37182
* gcc.dg/diagnostic-tree-expr-ranges-2.c: Add expected output.
* gcc.dg/uninit-15-O0.c: Remove xfail.
* gcc.dg/uninit-15.c: Same.
OK if neither Joern nor Nathan object by Wednesday morning (want to give 
them a couple workdays to chime in if they feel the need).


jeff

Re: [PATCH] Only simplify TRUNCATE to SUBREG on TRULY_NOOP_TRUNCATION targets

2021-08-27 Thread Jeff Law via Gcc-patches





On 8/27/2021 3:57 PM, Roger Sayle wrote:

As recently remarked by Jeff Law, SUBREGs are the "forever chemicals"
of GCC's RTL; once created they persist in the environment.  The problem,
according to the comment on lines 5428-5438 of combine.c is that
non-tieable SUBREGs interfere with reload/register allocation, so
combine often doesn't touch/clean-up instructions containing a SUBREG.
Forever chemicals.  I like that term.   There's other places that have 
related hackery.   cse.c in particular.



Alas currently, during combine the middle TRUNCATE is converted into
a lowpart SUBREG, which subst then turns into (clobber (const_int 0)),
abandoning the attempted combination, that then never reaches recog.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures, and also on nvptx-none.
But it only affects !TRULY_NOOP_TRUNCATION targets, and the motivating
example above is derived from the behaviour of backend patches not yet
in the tree [nvptx is currently a STORE_FLAG_VALUE=-1 target].
It's unfortunate that STORE_FLAG_VALUE is static.   Many ports could 
store a variety of useful value as part of the scc insn rather than have 
the scc produce a value, then we manipulate it with shifts, negation, 
multiplication, whatever to get the value we ultimately wanted.  We've 
run into this internally repeatedly on tachyum's port.  They might be 
better modeled as an if-then-else.  But I digress





Ok for mainline?


2021-08-27  Roger Sayle  

gcc/ChangeLog
* combine.c (combine_simplify_rtx): Avoid converting an explicit
TRUNCATE into a lowpart SUBREG on !TRULY_NOOP_TRUNCATION targets.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.

OK
jeff

Re: [PATCH 19/34] rs6000: Handle overloads during program parsing

2021-08-27 Thread Segher Boessenkool

Hi!

On Thu, Jul 29, 2021 at 08:31:06AM -0500, Bill Schmidt wrote:
> Although this patch looks quite large, the changes are fairly minimal.
> Most of it is duplicating the large function that does the overload
> resolution using the automatically generated data structures instead of
> the old hand-generated ones.  This doesn't make the patch terribly easy to
> review, unfortunately.

Yeah, and it pretty important that it does the same as the old code did.

> Just be aware that generally we aren't changing
> the logic and functionality of overload handling.

Good :-)

>   * config/rs6000/rs6000-c.c (rs6000-builtins.h): New include.
>   (altivec_resolve_new_overloaded_builtin): New forward decl.
>   (rs6000_new_builtin_type_compatible): New function.
>   (altivec_resolve_overloaded_builtin): Call
>   altivec_resolve_new_overloaded_builtin.
>   (altivec_build_new_resolved_builtin): New function.
>   (altivec_resolve_new_overloaded_builtin): Likewise.
>   * config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported_p):
>   Likewise.

No "_p" please (it already has a verb in the name, and explicit ones are
much clearer anyway).

Does everything else belong in the C frontend file but this last
function not?  Maybe this could be split up better.  Maybe there should
be a separate file for just the builtin support, it probably is big
enough?

(This is all for later of course, but please think about it.  Code
rearrangement (or even refactoring) can be done at any time, there is
no time pressure on it).

> +static tree
> +altivec_resolve_new_overloaded_builtin (location_t, tree, void *);

This fits on one line, please do so (this is a declaration, not a
function definition; those are put with the name in the first column,
to make searching for them a lot easier).

> +static bool
> +rs6000_new_builtin_type_compatible (tree t, tree u)
> +{
> +  if (t == error_mark_node)
> +return false;
> +
> +  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (u))
> +return true;
> +
> +  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
> +&& is_float128_p (t) && is_float128_p (u))

Indent is wrong here?

> +  if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u))
> +{
> +  t = TREE_TYPE (t);
> +  u = TREE_TYPE (u);
> +  if (TYPE_READONLY (u))
> + t = build_qualified_type (t, TYPE_QUAL_CONST);
> +}

Just use TYPE_MAIN_VARIANT, don't make garbage here?

> +  /* The AltiVec overloading implementation is overall gross, but this
> + is particularly disgusting.  The vec_{all,any}_{ge,le} builtins
> + are completely different for floating-point vs. integer vector
> + types, because the former has vcmpgefp, but the latter should use
> + vcmpgtXX.

Yes.  The integer comparisons were reduced to just two in original VMX
(eq and gt, well signed and unsigned versions of the latter), but that
cannot be done for floating point (all of ab can be false at
the same time (for a or b NaN), so this cannot be compressed to just two
functions, we really need three at least).

We have the same thing in xscmp* btw, it's not just VMX.  Having three
ops for the majority of comparisons (and doing the rest with two or so)
is nicer than having 14 :-)

There aren't builtins for most of that, thankfully.

> +  if (TARGET_DEBUG_BUILTIN)
> +fprintf (stderr, "altivec_resolve_overloaded_builtin, code = %4d, %s\n",
> +  (int)fcode, IDENTIFIER_POINTER (DECL_NAME (fndecl)));

(space after cast, also in debug code)

> +   const char *name
> + = fcode == RS6000_OVLD_VEC_ADDE ? "vec_adde": "vec_sube";

(space before and after colon)

> +   const char *name = fcode == RS6000_OVLD_VEC_ADDEC ?
> + "vec_addec": "vec_subec";

(ditto.  also, ? cannot end a line.  maybe just
  const char *name;
  name = fcode == RS6000_OVLD_VEC_ADDEC ? "vec_addec" : "vec_subec";
)

> +  const char *name
> + = fcode == RS6000_OVLD_VEC_SPLATS ? "vec_splats": "vec_promote";

(more)

> +   case E_SFmode: type = V4SF_type_node; size = 4; break;
> +   case E_DFmode: type = V2DF_type_node; size = 2; break;

Don't put multiple statements on one line.  Put the label on its own,
too, for that matter.

> + }
> + return build_constructor (type, vec);

This is wrong indenting.  Where it started, I have no idea.  You figure
it out :-)

> + bad:
> +  {
> +const char *name = rs6000_overload_info[adj_fcode].ovld_name;
> +error ("invalid parameter combination for AltiVec intrinsic %qs", name);
> +return error_mark_node;
> +  }

A huge function with a lot of "goto bad;" just *screams* "this needs to
be factored".

> +case ENB_P5:
> +  if (!TARGET_POPCNTB)
> + return false;
> +  break;

case ENB_P5:
  return TARGET_POPCNTB;

and similar for all further cases.  It is shorter and does not have
negations, win-win!

> +  break;
> +};

Stray semicolon.  Did this not warn?

Could you please try to factor this better?

Segher

[PATCH] Only simplify TRUNCATE to SUBREG on TRULY_NOOP_TRUNCATION targets

2021-08-27 Thread Roger Sayle


As recently remarked by Jeff Law, SUBREGs are the "forever chemicals"
of GCC's RTL; once created they persist in the environment.  The problem,
according to the comment on lines 5428-5438 of combine.c is that
non-tieable SUBREGs interfere with reload/register allocation, so
combine often doesn't touch/clean-up instructions containing a SUBREG.

This is the first and simplest of two patches to tackle that problem,
by teaching combine to avoid converting explicit TRUNCATEs into
SUBREGs that it can't handle.

Consider the following (hypothetical) sequence of instructions on
a STORE_FLAG_VALUE=1 target, which stores a zero or one in an SI
register, then uselessly truncates to QImode, then extends it again.

(set (reg:SI 27) (ne:SI (reg:BI 28) (const_int 0)))
(set (reg:QI 26) (truncate:QI (reg:SI 27)))
(set (reg:SI 0) (zero_extend:SI (reg:QI 26)))

which ideally (i.e. with this patch) combine would simplify to:
(set (reg:SI 0) (ne:SI (reg:BI 28) (const_int 0)))

Alas currently, during combine the middle TRUNCATE is converted into
a lowpart SUBREG, which subst then turns into (clobber (const_int 0)),
abandoning the attempted combination, that then never reaches recog.

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures, and also on nvptx-none.
But it only affects !TRULY_NOOP_TRUNCATION targets, and the motivating
example above is derived from the behaviour of backend patches not yet
in the tree [nvptx is currently a STORE_FLAG_VALUE=-1 target].

Ok for mainline?


2021-08-27  Roger Sayle  

gcc/ChangeLog
* combine.c (combine_simplify_rtx): Avoid converting an explicit
TRUNCATE into a lowpart SUBREG on !TRULY_NOOP_TRUNCATION targets.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.

Roger
--

diff --git a/gcc/combine.c b/gcc/combine.c
index cb5fa40..290a366 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5903,7 +5903,8 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int 
in_dest,
   if (HWI_COMPUTABLE_MODE_P (mode)
  && (STORE_FLAG_VALUE & ~GET_MODE_MASK (mode)) == 0
  && (temp = get_last_value (XEXP (x, 0)))
- && COMPARISON_P (temp))
+ && COMPARISON_P (temp)
+ && TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (XEXP (x, 0
return gen_lowpart (mode, XEXP (x, 0));
   break;
 
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index c81e27e..e431e0c 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1249,7 +1249,8 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
code, machine_mode mode,
  than HOST_BITS_PER_WIDE_INT.  */
   if (HWI_COMPUTABLE_MODE_P (mode)
  && COMPARISON_P (op)
- && (STORE_FLAG_VALUE & ~GET_MODE_MASK (mode)) == 0)
+ && (STORE_FLAG_VALUE & ~GET_MODE_MASK (mode)) == 0
+ && TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)))
{
  temp = rtl_hooks.gen_lowpart_no_emit (mode, op);
  if (temp)

Re: [PATCH] improve note location and refactor warn_uninit

2021-08-27 Thread Jeff Law via Gcc-patches





On 8/26/2021 1:00 AM, Richard Biener via Gcc-patches wrote:

On Wed, Aug 25, 2021 at 10:03 PM Martin Sebor  wrote:

Richard, some time ago you mentioned you'd had trouble getting
-Wuninitialized to print the note pointing to the uninitialized
variable.  I said I'd look into it, and so I did.  The attached
patch simplifies the warn_uninit() function to get rid of some
redundant cruft: besides a few pointless null tests and
a duplicate argument it also does away with ancient code that's
responsible for the problem you saw.  It turns out that tests
for the problem are already in the test suite but have been
xfailed since the day they were added, so the patch simply
removes the xfails.  I'll go ahead and commit this cleanup
as obvious later this week unless you have suggestions for
further changes.

I do see lots of useful refactoring.

-  if (context != NULL && gimple_has_location (context))
-location = gimple_location (context);
-  else if (phiarg_loc != UNKNOWN_LOCATION)
-location = phiarg_loc;
-  else
-location = DECL_SOURCE_LOCATION (var);
+  /* Use either the location of the read statement or that of the PHI
+ argument, or that of the uninitialized variable, in that order,
+ whichever is valid.  */
+  location_t location = gimple_location (context);
+  if (location == UNKNOWN_LOCATION)
+{
+  if (phi_arg_loc == UNKNOWN_LOCATION)
+   location = DECL_SOURCE_LOCATION (var);
+  else
+   location = phi_arg_loc;
+}

the comment is an improvement but I think the original flow is
easier to follow ;)

-  if (xloc.file != floc.file
- || linemap_location_before_p (line_table, location, cfun_loc)
- || linemap_location_before_p (line_table, cfun->function_end_locus,
-   location))
-   inform (DECL_SOURCE_LOCATION (var), "%qD was declared here", var);
...
+  inform (var_loc, "%qD was declared here", var);

and this is the actual change that "fixes" the missing diagnostics?  Can
you explain what the reason for the original sing-and-dance was?  It
looks like it tries to print the 'was declared here' only for locations
within the current function (but then it probably intended to do that
on the DECL_SOURCE_LOCATION (var), not on whatever we are
choosing now)?

That said, I'd prefer if you can commit the refactoring independently
of this core change and can try to dig why this was added and what
it was supposed to do.
Thanks.  And just to be 100% clear, all I was objecting to was the 
assertion that the patch was obvious and that it could be committed 
under the obvious rule.

jeff

[pushed] c++: Set type on dependent ARROW_EXPR

2021-08-27 Thread Jason Merrill via Gcc-patches

Even if the operand of -> has dependent type, if it's a pointer we know
that the result will be the target type of that pointer.  This should avoid
some unnecessary TYPEOF_EXPR when looking up a name after ->.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* typeck2.c (build_x_arrow): Do set TREE_TYPE when operand is
a dependent pointer.
---
 gcc/cp/typeck2.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index dcfdff2f905..5e2c23c063c 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1913,11 +1913,17 @@ build_x_arrow (location_t loc, tree expr, 
tsubst_flags_t complain)
 
   if (processing_template_decl)
 {
-  if (type && TYPE_PTR_P (type)
- && !dependent_scope_p (TREE_TYPE (type)))
+  tree ttype = NULL_TREE;
+  if (type && TYPE_PTR_P (type))
+   ttype = TREE_TYPE (type);
+  if (ttype && !dependent_scope_p (ttype))
/* Pointer to current instantiation, don't treat as dependent.  */;
   else if (type_dependent_expression_p (expr))
-   return build_min_nt_loc (loc, ARROW_EXPR, expr);
+   {
+ expr = build_min_nt_loc (loc, ARROW_EXPR, expr);
+ TREE_TYPE (expr) = ttype;
+ return expr;
+   }
   expr = build_non_dependent_expr (expr);
 }
 

base-commit: ee914ec4f811243ad72aceea4748687c74f38bc6
-- 
2.27.0

Re: [PATCH] MIPS: use N64 ABI by default if the triple end with -gnuabi64

2021-08-27 Thread Jeff Law via Gcc-patches





On 8/26/2021 10:20 PM, Xi Ruoyao via Gcc-patches wrote:

On Thu, 2021-08-26 at 23:56 -0400, YunQiang Su wrote:

gcc/ChangeLog:

 PR target/102089
 * config.gcc: MIPS: use N64 ABI by default if the triple end
   with -gnuabi64, which is used by Debian since 2013.
---
  gcc/config.gcc | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 0ff5cac15..0c91be6f3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2553,16 +2553,30 @@ mips*-*-linux*) # Linux MIPS, 
either endian.
 target_cpu_default=MASK_SOFT_FLOAT_ABI
 enable_mips_multilibs="yes"
 ;;
+   mipsisa64r6*-*-linux-gnuabi64)
+   default_mips_abi=64
+   default_mips_arch=mips64r6
+   enable_mips_multilibs="yes"
+   ;;
 mipsisa64r6*-*-linux*)
 default_mips_abi=n32
 default_mips_arch=mips64r6
 enable_mips_multilibs="yes"
 ;;
+   mipsisa64r2*-*-linux-gnuabi64)
+   default_mips_abi=64
+   default_mips_arch=mips64r2
+   enable_mips_multilibs="yes"
+   ;;
 mipsisa64r2*-*-linux*)
 default_mips_abi=n32
 default_mips_arch=mips64r2
 enable_mips_multilibs="yes"
 ;;
+   mips64*-*-linux-gnuabi64 | mipsisa64*-*-linux-gnuabi64)
+   default_mips_abi=64
+   enable_mips_multilibs="yes"
+   ;;
 mips64*-*-linux* | mipsisa64*-*-linux*)
 default_mips_abi=n32
 enable_mips_multilibs="yes"

LGTM, but I don't have the privilege to approve any change so you'll
still need to wait for approval :)

Yea, but having a second pair of eyes chime in is definitely helpful.



And I think the behavior change should be announced in
https://gcc.gnu.org/gcc-12/changes.html, if it's approved.

Agreed.

The configure change is approved.  And a change to the gcc-12 
changes.html is pre-approved.


jeff

Re: [PATCH v2] [MIPS]: add .module mipsREV to all output asm file

2021-08-27 Thread Jeff Law via Gcc-patches





On 8/26/2021 11:26 PM, Xi Ruoyao via Gcc-patches wrote:

On Fri, 2021-08-27 at 13:11 +0800, YunQiang Su wrote:

在 2021/6/18 11:29, YunQiang Su 写道:

Currently, the asm output file for MIPS has no rev info.
It can make some trouble, for example:
    assembler is mips1 by default,
    gcc is fpxx by default.
To assemble the output of gcc -S, we have to pass -mips2
to assembler.

gcc/ChangeLog:

  * gcc/config/mips/mips.c (mips_module_isa_name): New.
  mips_file_start: add .module mipsREV to all asm output

ping for this patch.


---
   gcc/config/mips/mips.c | 37 +
   1 file changed, 37 insertions(+)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 1f1475cf400..51cc70e6ceb 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -9877,6 +9877,40 @@ mips_mdebug_abi_name (void)
   }
   }
   
+static const char *

+mips_module_isa_name()

GNU style: there should be a space before ().

Correct.




+{

I think it's better to add enum values like MIPS_ISA_MIPS64R2 and use a
switch statement here?
Yes, please.   It's easier when someone has to debug the code later.  
enums show up in debug output by default, while #defines do not.




switch (mips_isa)
   {
 case MIPS_ISA_MIPS1: return "mips1";
 // ...
   }

It looks better, and (maybe) generates better code.  Just my 2 cents
though.

Coding standards would have that as

switch (mips_isa)
  {
  case MIPS_ISA_MIPS_1:
    return "mips1";
  ...
  }


Presumably .module is supported by all reasonably modern versions of GAS?


Jeff

Re: [PATCH] c++: error message for dependent template members [PR70417]

2021-08-27 Thread Jason Merrill via Gcc-patches


On 8/20/21 12:56 PM, Anthony Sharp via Gcc-patches wrote:

Hi, hope everyone is well. I have a patch here for issue 70417
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70417). I'm still a GCC
noob, and this is probably the hardest thing I have ever coded in my
life, so please forgive any initial mistakes!

TLDR

This patch introduces a helpful error message when the user attempts
to put a template-id after a member access token (., -> or ::) in a
dependent context without having put the "template" keyword after the
access token.


Great, thanks!


CONTEXT

In C++, when referencing a class member (using ., -> or ::) in a
dependent context, if that member is a template, the template keyword
is required after the access token. For example:

template void MyDependentTemplate(T t)
{
   t.DoSomething(); /* Error - t is dependent. "<" is treated as a
less-than sign because DoSomething is assumed to be a non-template
identifier. */
   t.template DoSomething(); // Good.
}

PATCH EXPLANATION

In order to throw an appropriate error message we need to identify
and/or create a point in the compiler where the following conditions
are all simultaneously satisfied:

A) a class member access token (., ->, ::)
B) a dependent context
C) the thing being accessed is a template-id
D) No template keyword

A, B and D are relatively straightforward and I won't go into detail
about how that was achieved. The real challenge is C - figuring out
whether a name is a template-id.

Usually, we would look up the name and use that to determine whether
the name is a template or not. But, we cannot do that. We are in a
dependent context, so lookup cannot (in theory) find anything at the
point the access expression is parsed.

We (maybe) could defer checking until the template is actually
instantiated. But this is not the approach I have gone for, since this
to me sounded unnecessarily awkward and clunky. In my mind this also
seems like a syntax error, and IMO it seems more natural that syntax
errors should get caught as soon as they are parsed.

So, the solution I went for was to figure out whether the name was a
template by looking at the surrounding tokens. The key to this lies in
being able to differentiate between the start and end of a template
parameter list (< and >) and greater-than and less-than tokens (and
perhaps also things like << or >>). If the final > (if we find one at
all) does indeed mark the end of a class or function template, then it
will be followed by something like (, ::, or even just ;. On the other
hand, a greater-than operator would be followed by a name.

PERFORMANCE IMPACT

My concern with this approach was that checking this would actually
slow the compiler down. In the end it seemed the opposite was true. By
parsing the tokens to determine whether the name looks like a
template-id, we can cut out a lot of the template-checking code that
gets run trying to find a template when there is none, making compile
time generally faster. That being said, I'm not sure if the
improvement will be that substantial, so I did not feel it necessary
to introduce the template checking method everywhere where we could
possibly have run into a template.

I ran an ABAB test with the following code repeated enough times to
fill up about 10,000 lines:

ai = 1;
bi = 2;
ci = 3;
c.x = 4;
(&c)->x = 5;
mytemplateclass::y = 6;
c.template class_func();
(&c)->template class_func();
mytemplateclass::template class_func_static();
c2.class_func();
(&c2)->class_func();
myclass::class_func_static();
ai = 6;
bi = 5;
ci = 4;
c.x = 3;
(&c)->x = 2;
mytemplateclass::y = 1;
c.template class_func();
(&c)->template class_func();
mytemplateclass::template class_func_static();
c2.class_func();
(&c2)->class_func();
myclass::class_func_static();

It basically just contains a mixture of class access expressions plus
some regular assignments for good measure. The compiler was compiled
without any optimisations (and so slower than usual). In hindsight,
perhaps this test case assumes the worst-ish-case scenario since it
contains a lot of templates; most of the benefits of this patch occur
when parsing non-templates.

The compiler (clean) and the compiler with the patch applied (patched)
compiled the above code 20 times each in ABAB fashion. On average, the
clean compiler took 2.26295 seconds and the patched compiler took
2.25145 seconds (about 0.508% faster). Whether this improvement
remains or disappears when the compiler is built with optimisations
turned on I haven't tested. My main concern was just making sure it
didn't get any slower.

AWKWARD TEST CASES

You will see from the patch that it required the updating of a few
testcases (as well as one place within the compiler). I believe this
is because up until now, the compiler allowed the omission of the
template keyword in some places it shouldn't have. Take decltype34.C
for example:

struct impl
{
   template  static T create();
};

template()->impl::create())>
struct tester{};

GCC currently accepts this

Re: [PATCH] libffi: Fix MIPS r6 support

2021-08-27 Thread Jeff Law via Gcc-patches





On 8/26/2021 10:58 PM, YunQiang Su wrote:

for some instructions, MIPS r6 uses different encoding other than
the previous releases.

1. mips/n32.S disable .set mips4: since it casuses old insn encoding
is used.
https://github.com/libffi/libffi/pull/396
2. mips/ffi.c: the encoding for JR is hardcoded: we need to use
different value for r6 and pre-r6.
https://github.com/libffi/libffi/pull/401

libffi/
PR libffi/83636
* src/mips/n32.S: disable .set mips4
* src/mips/ffi.c: use different JR encoding for r6.
These should go to the upstream libffi project.  Once accepted there you 
can add them to GCC.


Jeff

Re: [PATCH] Fix inline versioned namespace bootstrap

2021-08-27 Thread Jonathan Wakely via Gcc-patches

On Fri, 27 Aug 2021 at 21:58, François Dumont via Libstdc++
 wrote:
>
> Since std::allocator is not specialized anymore in
> _GLIBCXX_INLINE_VERSION mode _ExtPtr_allocator specialization do
> not compile
>
> because std::allocator is incomplete.

That doesn't look right ... it should be complete. This suggests there
is a deeper problem, which I'll look into.

>
> So I think primary _ExtPtr_allocator template should also be prefered in
> _GLIBCXX_INLINE_VERSION mode.

I think it should always be preferred. The _ExtPtr_allocator
specialization is useless, it is missing the converting constructor
that would be needed to convert to/from that type to any other
_ExtPtr_allocator specialization.

[committed] Support limited setcc for H8

2021-08-27 Thread Jeff Law via Gcc-patches



One of the (few) remaining issues related to cc0 elimination from the H8 
port was the fact that I dropped all setcc support.  While we only had 
setcc on the H8/SX variant, it seems a shame to regress like that.


Meanwhile my primary motivation behind the recent work on the H8 port is 
to speed up testing, particularly the builtin-arith tests.  While I have 
some support that significantly improves things, it's still not as good 
as it could be.


"Naturally" those two issues started to intersect and I started thinking 
a lot about setcc on the H8 last weekend and cobbled together some code 
that makes another notable improvement in those tests.


This patch was extracted from the weekend's work and stands on its own 
as a nice little improvement.  In particular it allows us to support 
setcc style insns that move the state of the C bit into a GPR.


Handling the C bit is the easiest to implement.  In fact, we can 
implement C bit handling on all the H8 variants and the implementation 
is actually better than what the H8/SX was previously doing.


We have a variety of ways to get C bit information into GPRs.  We can 
insert it into an arbitrary position in an 8 bit register, we can use it 
in an 8 bit {add,subtract}-with-carry instruction, or we can use it in 
8, 16 or 32 bit rotates.


The first decision is should setcc generate -1, 0 or 1, 0.  -1, 0 is 
easiest (subx), but we're often going to need to negate the result.  -1, 
0 is also going to be painful for the other bits we're going to want to 
support (Z, N, V) as we don't have as much flexibility in getting the 
bit into a GPR.


1, 0 isn't too hard.  We can use xor to clear the target register (of 
any size), then do a bld to insert C into the low bit.  Good. We can 
also use bild to insert inverted C into the low bit.  This allows us to 
implement efficient setcc for leu/gtu.  At worst it's equivalent to a 
branchy sequence in terms of speed and space and it's often better, even 
in isolation.  It also often allows combinations with subsequent 
instructions that extend the result to wider types which further 
improves things.


While this could be easily extended to put the C bit at an arbitrary 
location in an 8 or 16 bit register or in the sign bit of a 32bit 
register, I haven't seen enough code that would significantly benefit.  
So I haven't implemented this.


While this does occasionally make things larger by twiddling register 
allocation decisions, these are relatively rare and dwarfed the regular 
and larger improvements we see in general.


Tested without regressions on the H8.  Installing onto the trunk.

Jeff
commit ee914ec4f811243ad72aceea4748687c74f38bc6
Author: Jeff Law 
Date:   Fri Aug 27 17:01:37 2021 -0400

Support limited setcc for H8

gcc/

* config/h8300/bitfield.md (cstore4): Remove expander.
* config/h8300/h8300.c (h8300_expand_branch): Remove function.
* config/h8300/h8300-protos.h (h8300_expadn_branch): Remove 
prototype.
* config/h8300/h8300.md (eqne): New code iterator.
(geultu, geultu_to_c): Similarly.
* config/h8300/testcompare.md (cstore4): Dummy expander.
(store_c_, store_c_i_): New define_insn_and_splits
(cmp_c): New pattern

diff --git a/gcc/config/h8300/bitfield.md b/gcc/config/h8300/bitfield.md
index 82cb161d126..0d28c750a6a 100644
--- a/gcc/config/h8300/bitfield.md
+++ b/gcc/config/h8300/bitfield.md
@@ -338,17 +338,6 @@
 }
   [(set_attr "length_table" "bitfield")])
 
-;;(define_expand "cstore4"
-;;  [(use (match_operator 1 "eqne_operator"
-;; [(match_operand:QHSI 2 "h8300_dst_operand" "")
-;;  (match_operand:QHSI 3 "h8300_src_operand" "")]))
-;;   (clobber (match_operand:QHSI 0 "register_operand"))]
-;;  "TARGET_H8300SX"
-;;  {
-;;h8300_expand_store (operands);
-;;DONE;
-;;  })
-
 ;;(define_insn "*bstzhireg"
 ;;  [(set (match_operand:HI 0 "register_operand" "=r")
 ;; (match_operator:HI 1 "eqne_operator" [(cc0) (const_int 0)]))]
diff --git a/gcc/config/h8300/h8300-protos.h b/gcc/config/h8300/h8300-protos.h
index 3d344018ff2..4a9624f91d6 100644
--- a/gcc/config/h8300/h8300-protos.h
+++ b/gcc/config/h8300/h8300-protos.h
@@ -45,7 +45,6 @@ extern int compute_a_shift_cc (rtx *, rtx_code);
 #ifdef HAVE_ATTR_cc
 extern enum attr_cc compute_plussi_cc (rtx *);
 #endif
-extern void h8300_expand_branch (rtx[]);
 extern void h8300_expand_store (rtx[]);
 extern bool expand_a_shift (machine_mode, enum rtx_code, rtx[]);
 extern int h8300_shift_needs_scratch_p (int, machine_mode, rtx_code);
diff --git a/gcc/config/h8300/h8300.c b/gcc/config/h8300/h8300.c
index 5f7251ab78d..a63c3220e66 100644
--- a/gcc/config/h8300/h8300.c
+++ b/gcc/config/h8300/h8300.c
@@ -3256,30 +3256,8 @@ compute_logical_op_length (machine_mode mode, rtx_code 
code, rtx *operands, rtx_
   return length;
 }
 
-
 #if 0
-/* Expand a conditional branch.  */
-
-void
-h8300_expand_branch (rtx operands[])
-{
-  enum rtx_co

[PATCH] Fix inline versioned namespace bootstrap

2021-08-27 Thread François Dumont via Gcc-patches

Since std::allocator is not specialized anymore in 
_GLIBCXX_INLINE_VERSION mode _ExtPtr_allocator specialization do 
not compile


because std::allocator is incomplete.

So I think primary _ExtPtr_allocator template should also be prefered in 
_GLIBCXX_INLINE_VERSION mode.


Ok to commit ?

François

diff --git a/libstdc++-v3/include/ext/extptr_allocator.h b/libstdc++-v3/include/ext/extptr_allocator.h
index 7d8aaac7cee..78ee1fa039f 100644
--- a/libstdc++-v3/include/ext/extptr_allocator.h
+++ b/libstdc++-v3/include/ext/extptr_allocator.h
@@ -162,6 +162,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   std::allocator<_Tp>  _M_real_alloc;
 };
 
+#if ! _GLIBCXX_INLINE_VERSION
   // _ExtPtr_allocator specialization.
   template<>
 class _ExtPtr_allocator
@@ -183,6 +184,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   std::allocator  _M_real_alloc;
 };
+#endif
 
   template
 inline void

[PATCH] PR fortran/87737 - ICE tree check: expected ssa_name, have addr_expr in remap_gimple_op_r, at tree-inline.c:923

2021-08-27 Thread Harald Anlauf via Gcc-patches

Dear all,

the ICE in the original testcase does no longer occur but leads to an
error in a later stage of compilation, and we accepted invalid code.
(Cross-checked with other compilers, such as Intel and NAG).

F2018 states:

15.6.2.6  ENTRY statement

(3) ... If the characteristics of the result of the function named in the
ENTRY statement are the same as the characteristics of the result of the
function named in the FUNCTION statement, their result names identify the same
entity, although their names need not be the same. Otherwise, they are storage
associated and shall all be nonpointer, nonallocatable scalar variables that
are default integer, default real, double precision real, default complex, or
default logical.

We thus better reject the testcase example during resolution with an
appropriate error message.  (I hope the chosen one is fine enough.)

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald


Fortran - reject function entries with mismatched characteristics

gcc/fortran/ChangeLog:

PR fortran/87737
* resolve.c (resolve_entries): For functions of type CHARACTER
tighten the checks for matching characteristics.

gcc/testsuite/ChangeLog:

PR fortran/87737
* gfortran.dg/entry_24.f90: New test.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 5b9ba43780e..f641d0d4dae 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -804,6 +804,15 @@ resolve_entries (gfc_namespace *ns)
 	 the same string length, i.e. both len=*, or both len=4.
 	 Having both len= is also possible, but difficult to
 	 check at compile time.  */
+	  else if (ts->type == BT_CHARACTER
+		   && (el->sym->result->attr.allocatable
+		   != ns->entries->sym->result->attr.allocatable))
+	{
+	  gfc_error ("Function %s at %L has entry %s with mismatched "
+			 "characteristics", ns->entries->sym->name,
+			 &ns->entries->sym->declared_at, el->sym->name);
+	  return;
+	}
 	  else if (ts->type == BT_CHARACTER && ts->u.cl && fts->u.cl
 		   && (((ts->u.cl->length && !fts->u.cl->length)
 			||(!ts->u.cl->length && fts->u.cl->length))
diff --git a/gcc/testsuite/gfortran.dg/entry_24.f90 b/gcc/testsuite/gfortran.dg/entry_24.f90
new file mode 100644
index 000..9773597f4e1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/entry_24.f90
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! PR fortran/87737 - improve check on function entry characteristics
+
+function f() ! { dg-error "mismatched characteristics" }
+  character(:), allocatable :: f
+  character(1)  :: g
+  f = 'f'
+  return
+entry g()
+  g = 'g'
+end
+
+function f2() ! { dg-error "mismatched characteristics" }
+  character(1)  :: f2
+  character(1), allocatable :: g2
+  f2 = 'f'
+  return
+entry g2()
+  g2 = 'g'
+end

Re: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.

2021-08-27 Thread Michael Meissner via Gcc-patches

On Fri, Aug 27, 2021 at 12:29:42PM -0500, Segher Boessenkool wrote:
> On Wed, Aug 25, 2021 at 06:09:44PM -0400, Michael Meissner wrote:
> > I built a compiler on a little endian power8 system where the default long
> > double was IEEE 128-bit instead of IBM 128-bit.  I discovered that on
> > power8, we would generate a lxvd2x and xxpermdi to deal with the endianess
> > instead of the Altivec lxv.
> 
> You mean lvx.  Okay.
> 
> > +/* { dg-final { scan-assembler {\mlxvd2x 34\M|\mlvx 2\M|\mp?lxvx? 34\M} } 
> > } */
> > +/* { dg-final { scan-assembler {\mstxvd2x 34\M|\mstvx 2\M|\mstxvx 34\M} } 
> > } */
> 
> "stxvx?"  as well?  For robustness.  Can add "p?" as well, or would it
> be bad if that ever is used, is this test testing it is not done?

I updated the comments and regex.  In this particular test, the d-form stores
would never be generated, but it can be useful to include them.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] improve note location and refactor warn_uninit

2021-08-27 Thread Martin Sebor via Gcc-patches


Hello Nathan & Joern,

Richard has asked me to give you a heads up that we will be enabling
an informational note for all instances of the GCC -Wuninitialized
warning.  You added the helpful note pretty much exactly 15 years ago
to the day (in http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116564)
but enabled it only to a limited extent.  So this is to let you know
of this change and give you an opportunity to raise any concerns you
might have with it.  Please see the discussion below for more details.

The change is as follows:

diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 394dbf40c9c..cb6d1145202 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -206,14 +206,7 @@ warn_uninit (opt_code opt, tree t, tree var, const 
char *gmsgid,

   if (location == var_loc)
 return;

-  location_t cfun_loc = DECL_SOURCE_LOCATION (cfun->decl);
-  expanded_location xloc = expand_location (location);
-  expanded_location floc = expand_location (cfun_loc);
-  if (xloc.file != floc.file
-  || linemap_location_before_p (line_table, location, cfun_loc)
-  || linemap_location_before_p (line_table, cfun->function_end_locus,
-   location))
-inform (var_loc, "%qD was declared here", var);
+  inform (var_loc, "%qD was declared here", var);
 }

 struct check_defs_data

Regards
Martin

On 8/27/21 11:30 AM, Richard Biener wrote:

On August 27, 2021 5:20:57 PM GMT+02:00, Martin Sebor  wrote:

On 8/27/21 12:23 AM, Richard Biener wrote:

On Thu, Aug 26, 2021 at 9:30 PM Martin Sebor  wrote:


On 8/26/21 10:38 AM, Martin Sebor wrote:

On 8/26/21 1:00 AM, Richard Biener wrote:

On Wed, Aug 25, 2021 at 10:03 PM Martin Sebor  wrote:


Richard, some time ago you mentioned you'd had trouble getting
-Wuninitialized to print the note pointing to the uninitialized
variable.  I said I'd look into it, and so I did.  The attached
patch simplifies the warn_uninit() function to get rid of some
redundant cruft: besides a few pointless null tests and
a duplicate argument it also does away with ancient code that's
responsible for the problem you saw.  It turns out that tests
for the problem are already in the test suite but have been
xfailed since the day they were added, so the patch simply
removes the xfails.  I'll go ahead and commit this cleanup
as obvious later this week unless you have suggestions for
further changes.


I do see lots of useful refactoring.

-  if (context != NULL && gimple_has_location (context))
-location = gimple_location (context);
-  else if (phiarg_loc != UNKNOWN_LOCATION)
-location = phiarg_loc;
-  else
-location = DECL_SOURCE_LOCATION (var);
+  /* Use either the location of the read statement or that of the PHI
+ argument, or that of the uninitialized variable, in that order,
+ whichever is valid.  */
+  location_t location = gimple_location (context);
+  if (location == UNKNOWN_LOCATION)
+{
+  if (phi_arg_loc == UNKNOWN_LOCATION)
+   location = DECL_SOURCE_LOCATION (var);
+  else
+   location = phi_arg_loc;
+}

the comment is an improvement but I think the original flow is
easier to follow ;)


It doesn't really simplify things so I can revert it.


I've done that and pushed r12-3165, and...


-  if (xloc.file != floc.file
- || linemap_location_before_p (line_table, location, cfun_loc)
- || linemap_location_before_p (line_table,
cfun->function_end_locus,
-   location))
-   inform (DECL_SOURCE_LOCATION (var), "%qD was declared here",
var);
...
+  inform (var_loc, "%qD was declared here", var);

and this is the actual change that "fixes" the missing diagnostics?  Can
you explain what the reason for the original sing-and-dance was?  It
looks like it tries to print the 'was declared here' only for locations
within the current function (but then it probably intended to do that
on the DECL_SOURCE_LOCATION (var), not on whatever we are
choosing now)?


The author(s) of the logic wanted to print the note only for variables
declared in functions other than the one the uninitialized read is in.
The idea first appears in pr17506, comment #25 (and attachment 12131).
At that time GCC would print no note at all and pr17506 was about
inlining making it difficult to find the uninitialized variable.
The originally reported problem can still be reproduced on Godbolt
(with the original very large translation unit):
 https://godbolt.org/z/8WW43rxnd

I've reduced it to a small test case:
 https://godbolt.org/z/rnPEfPqPf

It looks like they didn't think it would also be a problem if
the variable was defined and read in the same function, even
if the read was far away from the declaration.


Ah, OK.  I think that makes sense in principle, the test just looked
really weird - for the 'decl' it would be most natural to use sth
like decl_function_context (DECL_ORIGIN (var)) to determine
the function the variable was declared in and for the location of
th

[PATCH v5] Fix for powerpc64 long double complex divide failure

2021-08-27 Thread Patrick McGehearty via Gcc-patches

This revision (v5) adds a test in libgcc/libgcc2.c for when
"__LIBGCC_TF_MANT_DIG__ == 106" to use __LIBGCC_DF_EPSILON__ instead
of __LIBGCC_TF_EPSILON__. That is specific to IBM 128-bit format long
doubles where EPSILON is very, very small and 1/EPSILON oveflows to
infinity. This change avoids the overflow without affecting any other
platform. Discussion in the patch is adjusted to reflect this
limitation.

It does not make any changes to .../rs6000/_divkc3.c, leaving it to
use __LIBGCC_KF__*. That means the upstream gcc will not build in
older IBM environments that do not recognize the KF floating point
mode properly. Environments that do not need IBM longdouble support do
build cleanly.

- - - -
This patch addresses the failure of powerpc64 long double complex divide
in native ibm long double format after the patch "Practical improvement
to libgcc complex divide".

The new code uses the following macros which are intended to be mapped
to appropriate values according to the underlying hardware representation.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101104

RBIG a value near the maximum representation
RMIN a value near the minimum representation
 (but not in the subnormal range)
RMIN2a value moderately less than 1
RMINSCAL the inverse of RMIN2
RMAX2RBIG * RMIN2  - a value to limit scaling to not overflow

When "long double" values were not using the IEEE 128-bit format but
the traditional IBM 128-bit, the previous code used the LDBL values
which caused overflow for RMINSCAL. The new code uses the DBL values.

RBIG  LDBL_MAX = 0x1.f800p+1022
  DBL_MAX  = 0x1.f000p+1022

RMIN  LDBL_MIN = 0x1.p-969
RMIN  DBL_MIN  = 0x1.p-1022

RMIN2 LDBL_EPSILON = 0x0.1000p-1022 = 0x1.0p-1074
RMIN2 DBL_EPSILON  = 0x1.p-52

RMINSCAL 1/LDBL_EPSILON = inf (1.0p+1074 does not fit in IBM 128-bit).
 1/DBL_EPSILON  = 0x1.p+52

RMAX2 = RBIG * RMIN2 = 0x1.f800p-52
RBIG * RMIN2 = 0x1.f000p+970

The MAX and MIN values have only modest changes since the maximum and
minimum values are about the same as for double precision.  The
EPSILON field is considerably different. Due to how very small values
can be represented in the lower 64 bits of the IBM 128-bit floating
point, EPSILON is extremely small, so far beyond the desired value
that inversion of the value overflows and even without the overflow,
the RMAX2 is so small as to eliminate most usage of the test.

The change has been tested on gcc135.fsffrance.org and gains the
expected improvements in accuracy for long double complex divide.

libgcc/
PR target/101104
* libgcc2.c (RMIN2, RMINSCAL, RMAX2):
Use more correct values for native IBM 128-bit.
---
 libgcc/libgcc2.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index 38f935e..bf45576 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -1906,8 +1906,13 @@ NAME (TYPE x, int m)
 # define NOTRUNC (!__LIBGCC_TF_EXCESS_PRECISION__)
 # define RBIG  (__LIBGCC_TF_MAX__ / 2)
 # define RMIN  (__LIBGCC_TF_MIN__)
-# define RMIN2 (__LIBGCC_TF_EPSILON__)
-# define RMINSCAL (1 / __LIBGCC_TF_EPSILON__)
+# if __LIBGCC_TF_MANT_DIG__ == 106
+#  define RMIN2  (__LIBGCC_DF_EPSILON__)
+#  define RMINSCAL (1 / __LIBGCC_DF_EPSILON__)
+# else
+#  define RMIN2(__LIBGCC_TF_EPSILON__)
+#  define RMINSCAL (1 / __LIBGCC_TF_EPSILON__)
+# endif
 # define RMAX2 (RBIG * RMIN2)
 #else
 # error
-- 
1.8.3.1

[PATCH] rs6000: Disable optimizing multiple xxsetaccz instructions into one xxsetaccz

2021-08-27 Thread Peter Bergner via Gcc-patches

Fwprop will happily optimize two xxsetaccz instructions into one xxsetaccz
by propagating the results of the first to the uses of the second.
We really don't want that to happen given the late priming/depriming of
accumulators.  I fixed this by making the xxsetaccz source operand an
unspec volatile.  I also removed the mma_xxsetaccz define_expand and
define_insn_and_split and replaced it with a simple define_insn.
The expand and splitter patterns were leftovers from the pre opaque mode
code when the xxsetaccz code was part of the movpxi pattern, and we don't
need them now.

Rather than a new test case, I was able to just modify the current test case
to add another __builtin_mma_xxsetaccz call which shows the bad code gen
with unpatched compilers.

This passed bootstrap on powerpc64le-linux with no regressions.
Ok for trunk?  We'll need this for sure in GCC11.  Ok there too after
some trunk burn in time?

GCC10 suffers from the same issue, but since the code is different, I'll
have to determine a different solution which I'll post as a separate
patch.

Peter


gcc/
* config/rs6000/mma.md (unspec): Delete UNSPEC_MMA_XXSETACCZ.
(unspecv): Add UNSPECV_MMA_XXSETACCZ.
(*mma_xxsetaccz): Delete.
(mma_xxsetaccz): Change to define_insn.  Remove match_operand.
Use UNSPECV_MMA_XXSETACCZ.
* config/rs6000/rs6000.c (rs6000_rtx_costs): Use UNSPECV_MMA_XXSETACCZ.

gcc/testsuite/
* gcc.target/powerpc/mma-builtin-6.c: Add second call to xxsetacc
built-in.  Update instruction counts.

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 1f6fc03d2ac..b26ae7a5d04 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -91,7 +91,10 @@ (define_c_enum "unspec"
UNSPEC_MMA_XVI8GER4SPP
UNSPEC_MMA_XXMFACC
UNSPEC_MMA_XXMTACC
-   UNSPEC_MMA_XXSETACCZ
+  ])
+
+(define_c_enum "unspecv"
+  [UNSPECV_MMA_XXSETACCZ
   ])
 
 ;; MMA instructions with 1 accumulator argument
@@ -469,26 +472,12 @@ (define_insn "mma_"
 
 ;; We can't have integer constants in XOmode so we wrap this in an UNSPEC.
 
-(define_expand "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand")
-   (const_int 0))]
-  "TARGET_MMA"
-{
-  rtx xo0 = gen_rtx_UNSPEC (XOmode, gen_rtvec (1, const0_rtx),
-   UNSPEC_MMA_XXSETACCZ);
-  emit_insn (gen_rtx_SET (operands[0], xo0));
-  DONE;
-})
-
-(define_insn_and_split "*mma_xxsetaccz"
+(define_insn "mma_xxsetaccz"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
-   (unspec:XO [(match_operand 1 "const_0_to_1_operand" "O")]
-UNSPEC_MMA_XXSETACCZ))]
+   (unspec_volatile:XO [(const_int 0)]
+   UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
   "xxsetaccz %A0"
-  "&& reload_completed"
-  [(set (match_dup 0) (unspec:XO [(match_dup 1)] UNSPEC_MMA_XXSETACCZ))]
-  ""
   [(set_attr "type" "mma")
(set_attr "length" "4")])
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e073b26b430..40dc71c8171 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21919,7 +21919,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   break;
 
 case UNSPEC:
-  if (XINT (x, 1) == UNSPEC_MMA_XXSETACCZ)
+  if (XINT (x, 1) == UNSPECV_MMA_XXSETACCZ)
{
  *total = 0;
  return true;
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c 
b/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
index 0c6517211e3..715b28138e9 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-6.c
@@ -5,14 +5,16 @@
 void
 foo (__vector_quad *dst)
 {
-  __vector_quad acc;
-  __builtin_mma_xxsetaccz (&acc);
-  *dst = acc;
+  __vector_quad acc0, acc1;
+  __builtin_mma_xxsetaccz (&acc0);
+  __builtin_mma_xxsetaccz (&acc1);
+  dst[0] = acc0;
+  dst[1] = acc1;
 }
 
 /* { dg-final { scan-assembler-not {\mlxv\M} } } */
 /* { dg-final { scan-assembler-not {\mlxvp\M} } } */
 /* { dg-final { scan-assembler-not {\mxxmtacc\M} } } */
-/* { dg-final { scan-assembler-times {\mxxsetaccz\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mxxmfacc\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxsetaccz\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */

Re: [RFH] ME optimizes variable assignment away / Fortran bind(C) descriptor conversion

2021-08-27 Thread Richard Biener via Gcc-patches

On August 27, 2021 5:47:58 PM GMT+02:00, Tobias Burnus 
 wrote:
>Hi all,
>
>Background: gfortran has its own array descriptor – and one which is defined in
>F2018 and used/usable from C (#include ).
>On mainline, the conversion is done via a void* pointer and calls to 
>libgfortran,
>which causes all kind of issues, including alias issues but also data 
>type/bounds
>issues etc.
>
>The attached patch tries to do this inline - and defines in the FE a proper
>type for the C descriptor.  ("CFI_cdesc_t" has a 'dim[]' as last member,
>'CFI_cdesc_t01' has dim[1].)
>
>
>But but I have a ME optimization issue, which removes an crucial
>assignment - any help/suggestion is welcome!
>(Additionally, there is room for improvement regarding the debugging
>experience. Suggestions are welcome as well, but it is not as crucial.)
>
>
>Do you have any suggestion or idea what goes wrong?
>
>
>It looks really nice with "-O1 -fno-inline"   :-)
>   The callee 'rank_p()' is mostly optimized and in the
>   caller only those struct elements are set, which are used:
>
>integer(kind=4) rank_p (struct CFI_cdesc_t & _this)
>{
>   _1 = _this_11(D)->base_addr;
>   _2 = _this_11(D)->rank;
>...
>   rnk_13 = (integer(kind=4)) _2;
>   return rnk_13;
>}
>
>void selr_p ()
>{
>...
>   struct CFI_cdesc_t01 cfi.7;
>...
>[local count: 537730764]:
>   cfi.7.rank = 1;
>   cfi.7.base_addr = 0B;

You need to do the accesses above using the generic 't' type as well, otherwise 
they are non-conflicting TBAA wise. 

>   irnk_45 = rank_p (&cfi.7);
>   cfi.7 ={v} {CLOBBER};
>   if (irnk_45 != 1)
>
>
>BUT BAD RESULT with -O2 -fno-inline  :-(
>   The assignments on the caller side are gone,
>   which causes wrong code (run stops with 'stop 1'):
>
>integer(kind=4) rank_p (struct CFI_cdesc_t & _this)
>{
>...
>[local count: 1073741824]:
>   _1 = _this_3(D)->rank;
>   rnk_4 = (integer(kind=4)) _1;
>   return rnk_4;
>}
>
>void selr_p ()
>{
>...
>   struct CFI_cdesc_t01 cfi.7;
>...
>[local count: 537730764]:
>   irnk_30 = rank_p (&cfi.7);   !  ERROR: cfi.7.rank assignment missing
>   cfi.7 ={v} {CLOBBER};
>   if (irnk_30 != 1)
>
>  *  *  *
>
>Any idea / suggestion?
>
>  *  *  *
>
>* trans-type.c defines the new type
>* trans-decl.c handles the conversion from C descriptor to Fortran descriptor 
>in the callee
>* trans-expr.c handles the conversion to the C descriptor in the callee
>
>Attached:
>* Testcase 'test.f90'
>   - original dump
>   - -O1 -fno-inline optimized dump
>   - -O2 -fno-inline optimized dump
>* Full patch
>   - Testcase is lightly modified gfortran.dg/PR93963.f90
>
>Tobias
>
>  *  *  *
>
>PS: Current GCC (mainline w/o patch) generates the following.
>[-> with patch, see a-test.f90.*.original.]
>
>Namely, for the callee, casting the argument
>   (in reality pointer to a CFI descriptor,
>but TREE_TYPE (PARM_DECL) is ptr to Fortran descriptor)
>to 'void *', passing it to a library function, which creates
>a new Fortran descriptor and pointer-assigning it to
>the PARM_DECL pointer, which now points to a Fortran
>descriptor:
>
>integer(kind=4) rank_p (struct array15_integer(kind=4) & this)
>{
>gfc_desc_ptr.1 = &gfc_desc.0;
>CFI_desc_ptr.2 = (void *) this;
>_gfortran_cfi_desc_to_gfc_desc (gfc_desc_ptr.1, &CFI_desc_ptr.2);
>this = (struct array15_integer(kind=4) &) gfc_desc_ptr.1;
>rnk = (integer(kind=4)) this->dtype.rank;
>...
>void selr_p ()
>{
>   struct array01_integer(kind=4) intp;
>   integer(kind=4) irnk;
>   static integer(kind=4) rnk = 1;
>
>   intp.dtype = {.elem_len=4, .rank=1, .type=1};
>   intp.span = 0;
>...
> _gfortran_gfc_desc_to_cfi_desc (&cfi.3, &intp);
> intp.dtype.attribute = 0;
> irnk = rank_p (cfi.3);
> __builtin_free (cfi.3);
>
>  *  *  *
>
>-
>Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
>München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
>Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
>München, HRB 106955

Re: [PATCH 18/34] rs6000: Always initialize vector_pair and vector_quad nodes

2021-08-27 Thread Segher Boessenkool

On Thu, Jul 29, 2021 at 08:31:05AM -0500, Bill Schmidt wrote:
>   * config/rs6000/rs6000-call.c (rs6000_init_builtins): Remove
>   TARGET_EXTRA_BUILTINS guard.

> +  vector_pair_type_node = make_node (OPAQUE_TYPE);

> +  vector_quad_type_node = make_node (OPAQUE_TYPE);

Although those types are called "vector", they are not, so this does
work correctly even if not TARGET_EXTRA_BUILTINS.

Ideally we will that macro always enabled eventually, but that is later
work.  Okay for trunk.  Thanks!

Segher

[committed] Reduce vector comparison of uniform vectors to a scalar comparison

2021-08-27 Thread Jeff Law via Gcc-patches



I was working some aspects of our port's vector support and stumbled 
across a bit of silly code.  We were comparing two vectors that were 
both uniform.


We can simplify a comparison of uniform vectors to a comparison of a 
representative element from each.  That in turn may expose const/copy 
propagation opportunities or even jump threading opportunities.


And that's precisely what this patch does inside DOM.  At the start of 
statement processing we see if we can reduce a vector comparison to a 
scalar comparison.


You can see this in pr95308.C on x86.   As we enter dom3 we have:


;;   basic block 2, loop depth 0
;;    pred:   ENTRY
  e.1_1 = e;
  _27 = f_26(D) != 0;
  _163 = e.1_1 != 0;
  _164 = _27 & _163;
  _196 = _164 ? -1 : 0;
  vect_cst__194 = {_196, _196, _196, _196, _196, _196, _196, _196};

[ ... ]

;;   basic block 8, loop depth 1
;;    pred:   7
;;    6
  if (vect_cst__194 == { 0, 0, 0, 0, 0, 0, 0, 0 })
    goto ; [100.00%]
  else
    goto ; [20.00%]
;;    succ:   10
;;    9


There's another similar comparison later.  We transform the comparison into:

;;   basic block 8, loop depth 1
;;    pred:   7
;;    6
  if (_196 == 0)
    goto ; [100.00%]
  else
    goto ; [20.00%]
;;    succ:   13
;;    9

There's nice secondary effects that show up after the transformation as 
well.



Bootstrapped and regression tested on x86_64.   It'll go through the 
rest of the public targets as well as our internal port over the next 
24-48 hrs.



Committed to the trunk,

Jeff
commit ac6d5c9112bb78e47398e040c553ad216edf3ebb
Author: Jeff Law 
Date:   Fri Aug 27 15:27:38 2021 -0400

Reduce vector comparison of uniform vectors to a scalar comparison

gcc/
* tree-ssa-dom.c (reduce_vector_comparison_to_scalar_comparison): 
New
function.
(dom_opt_dom_walker::optimize_stmt): Use it.

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index a0df492df10..a5245b33de6 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1891,6 +1891,66 @@ dom_opt_dom_walker::test_for_singularity (gimple *stmt,
 }
 }
 
+/* If STMT is a comparison of two uniform vectors reduce it to a comparison
+   of scalar objects, otherwise leave STMT unchanged.  */
+
+static void
+reduce_vector_comparison_to_scalar_comparison (gimple *stmt)
+{
+  if (gimple_code (stmt) == GIMPLE_COND)
+{
+  tree lhs = gimple_cond_lhs (stmt);
+  tree rhs = gimple_cond_rhs (stmt);
+
+  /* We may have a vector comparison where both arms are uniform
+vectors.  If so, we can simplify the vector comparison down
+to a scalar comparison.  */
+  if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE
+ && TREE_CODE (TREE_TYPE (rhs)) == VECTOR_TYPE)
+   {
+ /* If either operand is an SSA_NAME, then look back to its
+defining statement to try and get at a suitable source.  */
+ if (TREE_CODE (rhs) == SSA_NAME)
+   {
+ gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
+ if (gimple_assign_single_p (def_stmt))
+   rhs = gimple_assign_rhs1 (def_stmt);
+   }
+
+ if (TREE_CODE (lhs) == SSA_NAME)
+   {
+ gimple *def_stmt = SSA_NAME_DEF_STMT (lhs);
+ if (gimple_assign_single_p (def_stmt))
+   lhs = gimple_assign_rhs1 (def_stmt);
+   }
+
+ /* Now see if they are both uniform vectors and if so replace
+the vector comparison with a scalar comparison.  */
+ tree rhs_elem = rhs ? uniform_vector_p (rhs) : NULL_TREE;
+ tree lhs_elem = lhs ? uniform_vector_p (lhs) : NULL_TREE;
+ if (rhs_elem && lhs_elem)
+   {
+ if (dump_file && dump_flags & TDF_DETAILS)
+   {
+ fprintf (dump_file, "Reducing vector comparison: ");
+ print_gimple_stmt (dump_file, stmt, 0);
+   }
+
+ gimple_cond_set_rhs (as_a (stmt), rhs_elem);
+ gimple_cond_set_lhs (as_a (stmt), lhs_elem);
+ gimple_set_modified (stmt, true);
+
+ if (dump_file && dump_flags & TDF_DETAILS)
+   {
+ fprintf (dump_file, "To scalar equivalent: ");
+ print_gimple_stmt (dump_file, stmt, 0);
+ fprintf (dump_file, "\n");
+   }
+   }
+   }
+}
+}
+
 /* Optimize the statement in block BB pointed to by iterator SI.
 
We try to perform some simplistic global redundancy elimination and
@@ -1933,6 +1993,11 @@ dom_opt_dom_walker::optimize_stmt (basic_block bb, 
gimple_stmt_iterator *si,
   update_stmt_if_modified (stmt);
   opt_stats.num_stmts++;
 
+  /* STMT may be a comparison of uniform vectors that we can simplify
+ down to a comparison of scalars.  Do that transformation first
+ so that all the scalar optimizations from here onward apply.  */
+  reduce_vector_comparison_to_scalar_comparison

Re: [PATCH 17/34] rs6000: Add sanity to V2DI_type_node definitions

2021-08-27 Thread Segher Boessenkool

On Thu, Jul 29, 2021 at 08:31:04AM -0500, Bill Schmidt wrote:
> It seems quite strange for these to be "vector long" for 64-bit and
> "vector long long" for 32-bit, when "vector long long" will do for both.

Yeah.  For most builtins the only thing that matters is the mode the
types map to.  Similarly we want "vector int" instead of "vector long"
on both ilp32 and lp64.

> +V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64 ? "__vector long"
> +  : "__vector long long",
> +  long_long_integer_type_node, 2);

The same as before: either ? and : on the same line, or both should
start a new line.  Or use some temporary name.  Or write an "if" around
the whole thing (this compiles to the same machine code -- source code
should be optimised for the human reader, the compiler does not care at
all, you almost never can use that as excuse :-) )

Anyway, you know what is needed :-)  Okay for trunk.  Thanks!

Segher

Re: [PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-08-27 Thread Paul A. Clarke via Gcc-patches

On Fri, Aug 27, 2021 at 10:21:35AM -0500, Bill Schmidt via Gcc-patches wrote:
> On 8/23/21 2:03 PM, Paul A. Clarke wrote:
> > Function signatures and decorations match gcc/config/i386/smmintrin.h.

> > gcc

> > * config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.

> > ---
> > v3:
> > - Add nmmintrin.h. _mm_cmpgt_epi64 is part of SSE4.2, which is
> >ostensibly defined in nmmintrin.h. Following the i386 implementation,
> >however, nmmintrin.h only includes smmintrin.h, and the actual
> >implementations appear there.

> > v2:
> > - Added "extern" to functions to maintain compatible decorations with
> >like implementations in gcc/config/i386.

> > diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
> > new file mode 100644
> > index ..20a70bee3776
> > --- /dev/null
> > +++ b/gcc/config/rs6000/nmmintrin.h
> > @@ -0,0 +1,40 @@
> > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify
> > +   it under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +   GNU General Public License for more details.
> > +
> > +   Under Section 7 of GPL version 3, you are granted additional
> > +   permissions described in the GCC Runtime Library Exception, version
> > +   3.1, as published by the Free Software Foundation.
> > +
> > +   You should have received a copy of the GNU General Public License and
> > +   a copy of the GCC Runtime Library Exception along with this program;
> > +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +   .  */
> > +
> > +#ifndef NO_WARN_X86_INTRINSICS
> > +/* This header is distributed to simplify porting x86_64 code that
> > +   makes explicit use of Intel intrinsics to powerpc64le.
> > +   It is the user's responsibility to determine if the results are
> > +   acceptable and make additional changes as necessary.
> > +   Note that much code that uses Intel intrinsics can be rewritten in
> > +   standard C or GNU C extensions, which are more portable and better
> > +   optimized across multiple targets.  */
> > +#endif
> > +
> > +#ifndef _NMMINTRIN_H_INCLUDED
> > +#define _NMMINTRIN_H_INCLUDED
> > +
> > +/* We just include SSE4.1 header file.  */
> > +#include 
> > +
> > +#endif /* _NMMINTRIN_H_INCLUDED */
> 
> Should there be something in here indicating that nmmintrin.h is for SSE
> 4.2?  Otherwise it's a bit of a head-scratcher to a new person wondering why
> this file exists.  No big deal either way.

For good or bad, I have been trying to minimize differences with the
analogous i386 files.  With the exception of the copyright and our annoying
litte warning, the only difference was this comment:

--
/* Implemented from the specification included in the Intel C++ Compiler
   User Guide and Reference, version 10.0.  */
--

I didn't find that (1) accurate, since there are no implementations therein,
or (2) particularly informative, as I imagine that document has a much
bigger scope than SSE4.2.  And keeping it would be a bit misleading, I think.
So, I intentionally removed the comment.

> This looks fine to me with or without that.  Recommend approval.

Thanks for the review!

PC

Re: [PATCH 16/34] rs6000: Darwin builtin support

2021-08-27 Thread Segher Boessenkool

On Thu, Jul 29, 2021 at 08:31:03AM -0500, Bill Schmidt wrote:
>   * config/rs6000/darwin.h (SUBTARGET_INIT_BUILTINS): Use the new
>   decl when new_builtins_are_live.
>   * config/rs6000/rs6000-builtin-new.def (__builtin_cfstring): New
>   built-in.

Okay for trunk.  Thanks!


Segher

Re: [PATCH][pushed] Add -fprofile-reproducible=parallel-runs to STAGEfeedback_CFLAGS to Makefile.tpl.

2021-08-27 Thread Gleb Fotengauer-Malinovskiy

Hi,

On Thu, Mar 11, 2021 at 04:19:51PM +0100, Martin Liška wrote:
> Pushed as obvious, the original change was done
> in g:e05a117dc4b98f3ac60851608f532ba7cee7343a.
> 
> Martin
> 
> ChangeLog:
> 
>   * Makefile.tpl: The change was done Makefile.in which
>   is generated file.
> ---
>   Makefile.tpl | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Makefile.tpl b/Makefile.tpl
> index 3b88f351d5b..6e0337fb48f 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -488,7 +488,7 @@ STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
>   STAGEtrain_CFLAGS = $(filter-out -fchecking=1,$(STAGE3_CFLAGS))
>   STAGEtrain_TFLAGS = $(filter-out -fchecking=1,$(STAGE3_TFLAGS))
>   
> -STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use
> +STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
> -fprofile-reproducible=parallel-runs

Did you mean to add -fprofile-reproducible flag to STAGEprofile_CFLAGS
instead?  I suppose this flag doesn't mean anything without
-fprofile-generate, which is passed through STAGEprofile_CFLAGS.

Thanks,
-- 
glebfm

Re: [PATCH] improve note location and refactor warn_uninit

2021-08-27 Thread Richard Biener via Gcc-patches

On August 27, 2021 5:20:57 PM GMT+02:00, Martin Sebor  wrote:
>On 8/27/21 12:23 AM, Richard Biener wrote:
>> On Thu, Aug 26, 2021 at 9:30 PM Martin Sebor  wrote:
>>>
>>> On 8/26/21 10:38 AM, Martin Sebor wrote:
 On 8/26/21 1:00 AM, Richard Biener wrote:
> On Wed, Aug 25, 2021 at 10:03 PM Martin Sebor  wrote:
>>
>> Richard, some time ago you mentioned you'd had trouble getting
>> -Wuninitialized to print the note pointing to the uninitialized
>> variable.  I said I'd look into it, and so I did.  The attached
>> patch simplifies the warn_uninit() function to get rid of some
>> redundant cruft: besides a few pointless null tests and
>> a duplicate argument it also does away with ancient code that's
>> responsible for the problem you saw.  It turns out that tests
>> for the problem are already in the test suite but have been
>> xfailed since the day they were added, so the patch simply
>> removes the xfails.  I'll go ahead and commit this cleanup
>> as obvious later this week unless you have suggestions for
>> further changes.
>
> I do see lots of useful refactoring.
>
> -  if (context != NULL && gimple_has_location (context))
> -location = gimple_location (context);
> -  else if (phiarg_loc != UNKNOWN_LOCATION)
> -location = phiarg_loc;
> -  else
> -location = DECL_SOURCE_LOCATION (var);
> +  /* Use either the location of the read statement or that of the PHI
> + argument, or that of the uninitialized variable, in that order,
> + whichever is valid.  */
> +  location_t location = gimple_location (context);
> +  if (location == UNKNOWN_LOCATION)
> +{
> +  if (phi_arg_loc == UNKNOWN_LOCATION)
> +   location = DECL_SOURCE_LOCATION (var);
> +  else
> +   location = phi_arg_loc;
> +}
>
> the comment is an improvement but I think the original flow is
> easier to follow ;)

 It doesn't really simplify things so I can revert it.
>>>
>>> I've done that and pushed r12-3165, and...
>>>
> -  if (xloc.file != floc.file
> - || linemap_location_before_p (line_table, location, cfun_loc)
> - || linemap_location_before_p (line_table,
> cfun->function_end_locus,
> -   location))
> -   inform (DECL_SOURCE_LOCATION (var), "%qD was declared here",
> var);
> ...
> +  inform (var_loc, "%qD was declared here", var);
>
> and this is the actual change that "fixes" the missing diagnostics?  Can
> you explain what the reason for the original sing-and-dance was?  It
> looks like it tries to print the 'was declared here' only for locations
> within the current function (but then it probably intended to do that
> on the DECL_SOURCE_LOCATION (var), not on whatever we are
> choosing now)?

 The author(s) of the logic wanted to print the note only for variables
 declared in functions other than the one the uninitialized read is in.
 The idea first appears in pr17506, comment #25 (and attachment 12131).
 At that time GCC would print no note at all and pr17506 was about
 inlining making it difficult to find the uninitialized variable.
 The originally reported problem can still be reproduced on Godbolt
 (with the original very large translation unit):
 https://godbolt.org/z/8WW43rxnd

 I've reduced it to a small test case:
 https://godbolt.org/z/rnPEfPqPf

 It looks like they didn't think it would also be a problem if
 the variable was defined and read in the same function, even
 if the read was far away from the declaration.
>> 
>> Ah, OK.  I think that makes sense in principle, the test just looked
>> really weird - for the 'decl' it would be most natural to use sth
>> like decl_function_context (DECL_ORIGIN (var)) to determine
>> the function the variable was declared in and for the location of
>> the uninitialized use you'd similarly use
>> 
>>tree ctx = BLOCK_ORIGIN (LOCATION_BLOCK (location));
>>while (TREE_CODE ((ctx = BLOCK_SUPERCONTEXT (ctx))) != FUNCTION_DECL)
>>   ;
>> 
>> and then you can compare both functions.  This of course requires
>> a good location of the uninitialized use.
>> 
>> So I'm not sure whether we want to increase verboseness for say
>> 
>> int foo ()
>> {
>> int i;
>> i = i + 1;
>> return i;
>> }
>> 
>> by telling the user 'i' was declared the line before the uninitialized use.
>
>In this contrived case the note may not be important but it doesn't
>hurt.  It's the more realistic cases of large functions with problem
>statements far away from the objects they operate on, often indirectly,
>via pointers, where providing the additional context is helpful.
>
>This is also why most other middle end warnings (all those I worked
>on) as well as other instances of -Wmaybe-uninitialized issue these
>(and other

Re: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.

2021-08-27 Thread Segher Boessenkool

On Wed, Aug 25, 2021 at 06:09:44PM -0400, Michael Meissner wrote:
> I built a compiler on a little endian power8 system where the default long
> double was IEEE 128-bit instead of IBM 128-bit.  I discovered that on
> power8, we would generate a lxvd2x and xxpermdi to deal with the endianess
> instead of the Altivec lxv.

You mean lvx.  Okay.

> +/* { dg-final { scan-assembler {\mlxvd2x 34\M|\mlvx 2\M|\mp?lxvx? 34\M} } } 
> */
> +/* { dg-final { scan-assembler {\mstxvd2x 34\M|\mstvx 2\M|\mstxvx 34\M} } } 
> */

"stxvx?"  as well?  For robustness.  Can add "p?" as well, or would it
be bad if that ever is used, is this test testing it is not done?


Segher

Re: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.

2021-08-27 Thread Michael Meissner via Gcc-patches

On Fri, Aug 27, 2021 at 11:41:06AM -0500, Bill Schmidt wrote:
> This amuses me, and I want to keep it this way. :-)
> >   void store (TYPE a, TYPE *p) { *p = a; }
> > -/* { dg-final { scan-assembler {\mlxvd2x 34\M} {target be} } } */
> > -/* { dg-final { scan-assembler {\mstxvd2x 34\M} {target be} } } */
> > -/* { dg-final { scan-assembler {\mlvx 2\M} {target le} } }  */
> > -/* { dg-final { scan-assembler {\mstvx 2\M} {target le} } } */
> > +/* This regexp captures the different vector load/stores that can be 
> > generated:
> > +
> > +   lxvd2x  -- big endian power7/power8, little endian power8
> > +   lvx -- Altivec
> > +   lxv -- power9
> > +   plxv-- power10
> > +   lxvx-- X-form variant.
> > +   stxvd2x -- big endian power7/power8, little endian power8
> > +   stvx-- Altivec
> 
> For symmetry, also mention stxvx as an X-form variant?

Yep, thanks.

Because of the way the test is written (load a constant and store through a
pointer), it wouldn't generate stxv and pstxv.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.

2021-08-27 Thread David Edelsohn via Gcc-patches

This patch is okay.

Thanks, David

On Fri, Aug 27, 2021 at 12:41 PM Bill Schmidt  wrote:
>
> Hi Mike,
>
> Thanks for this clean-up!
>
> On 8/25/21 5:09 PM, Michael Meissner wrote:
> >  From 327273dfeec5c000f3c33ca7b88ee0097fd33586 Mon Sep 17 00:00:00 2001
> > From: Michael Meissner 
> > Date: Wed, 25 Aug 2021 00:31:35 -0400
> > Subject: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.
> >
> > I built a compiler on a little endian power8 system where the default long
> > double was IEEE 128-bit instead of IBM 128-bit.  I discovered that on
> > power8, we would generate a lxvd2x and xxpermdi to deal with the endianess
> > instead of the Altivec lxv.
> >
> > In addition, I noticed the constant that was being loaded (1.0q) could be
> > loaded by the lxvkq instruction.
> >
> > I rewrote the test to handle all forms of vector load and store that can
> > be generated.  And I changed the constant to be one that lxvkq does not
> > support.
> >
> > I did bootstrap tests on the following systems, and the the test ran in all
> > environments (each of the systems were configured for the cpu mentioned):
> >
> > 1)Little endian power9  with IBM  128-bit long double
> > 2)Little endian power9  with IEEE 128-bit long double
> > 3)Little endian power8  with IBM  128-bit long double
> > 4)Little endian power8  with IEEE 128-bit long double
> > 5)Little endian power10 with IBM  128-bit long double
> > 6)Little endian power10 with IEEE 128-bit long double
> > 7)Big endianpower8  with IBM  128-bit long double
> >
> > Can I check this patch into the master branch and later backport it to 
> > GCC-11?
> >
> > 2021-08-25  Michael Meissner  
> >
> > gcc/testsuite/
> >   * gcc.target/powerpc/float128-call.c: Fix test for IEEE 128-bit
> >   long double and power10.
> > ---
> >   .../gcc.target/powerpc/float128-call.c| 27 +--
> >   1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/powerpc/float128-call.c 
> > b/gcc/testsuite/gcc.target/powerpc/float128-call.c
> > index b64ffc68bfa..d1cf47e4298 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/float128-call.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/float128-call.c
> > @@ -6,22 +6,33 @@
> >   #error "-mfloat128 is not supported."
> >   #endif
> >
> > +/* Pick a constant to load that cannot be generated by the power10 lxvkq
> > +   instruction.  */
> >   #ifdef __LONG_DOUBLE_IEEE128__
> >   #define TYPE long double
> > -#define ONE  1.0L
> > +#define TEN  10.0L
> >
> >   #else
> >   #define TYPE __float128
> > -#define ONE  1.0Q
> > +#define TEN  10.0Q
> >   #endif
> >
> >   /* Test to make sure vector registers are used for passing IEEE 128-bit
> >  floating point values and returning them. Also make sure the 'q' 
> > suffix is
> > -   handled.  */
> > -TYPE one (void) { return ONE; }
> > +   handled for __float128.  */
> > +TYPE one (void) { return TEN; }
>
> This amuses me, and I want to keep it this way. :-)
> >   void store (TYPE a, TYPE *p) { *p = a; }
> >
> > -/* { dg-final { scan-assembler {\mlxvd2x 34\M} {target be} } } */
> > -/* { dg-final { scan-assembler {\mstxvd2x 34\M} {target be} } } */
> > -/* { dg-final { scan-assembler {\mlvx 2\M} {target le} } }  */
> > -/* { dg-final { scan-assembler {\mstvx 2\M} {target le} } } */
> > +/* This regexp captures the different vector load/stores that can be 
> > generated:
> > +
> > + lxvd2x  -- big endian power7/power8, little endian power8
> > + lvx -- Altivec
> > + lxv -- power9
> > + plxv-- power10
> > + lxvx-- X-form variant.
> > + stxvd2x -- big endian power7/power8, little endian power8
> > + stvx-- Altivec
>
> For symmetry, also mention stxvx as an X-form variant?
>
> Looks fine to me, recommend approval.
>
> Thanks,
> Bill
>
> > + lxvx-- power9/power10.  */
> > +
> > +/* { dg-final { scan-assembler {\mlxvd2x 34\M|\mlvx 2\M|\mp?lxvx? 34\M} } 
> > } */
> > +/* { dg-final { scan-assembler {\mstxvd2x 34\M|\mstvx 2\M|\mstxvx 34\M} } 
> > } */

Re: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.

2021-08-27 Thread Bill Schmidt via Gcc-patches


Hi Mike,

Thanks for this clean-up!

On 8/25/21 5:09 PM, Michael Meissner wrote:

 From 327273dfeec5c000f3c33ca7b88ee0097fd33586 Mon Sep 17 00:00:00 2001
From: Michael Meissner 
Date: Wed, 25 Aug 2021 00:31:35 -0400
Subject: [PATCH] Fix float128-call.c test for power8 IEEE 128 and power10.

I built a compiler on a little endian power8 system where the default long
double was IEEE 128-bit instead of IBM 128-bit.  I discovered that on
power8, we would generate a lxvd2x and xxpermdi to deal with the endianess
instead of the Altivec lxv.

In addition, I noticed the constant that was being loaded (1.0q) could be
loaded by the lxvkq instruction.

I rewrote the test to handle all forms of vector load and store that can
be generated.  And I changed the constant to be one that lxvkq does not
support.

I did bootstrap tests on the following systems, and the the test ran in all
environments (each of the systems were configured for the cpu mentioned):

1)  Little endian power9  with IBM  128-bit long double
2)  Little endian power9  with IEEE 128-bit long double
3)  Little endian power8  with IBM  128-bit long double
4)  Little endian power8  with IEEE 128-bit long double
5)  Little endian power10 with IBM  128-bit long double
6)  Little endian power10 with IEEE 128-bit long double
7)  Big endianpower8  with IBM  128-bit long double

Can I check this patch into the master branch and later backport it to GCC-11?

2021-08-25  Michael Meissner  

gcc/testsuite/
* gcc.target/powerpc/float128-call.c: Fix test for IEEE 128-bit
long double and power10.
---
  .../gcc.target/powerpc/float128-call.c| 27 +--
  1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/float128-call.c 
b/gcc/testsuite/gcc.target/powerpc/float128-call.c
index b64ffc68bfa..d1cf47e4298 100644
--- a/gcc/testsuite/gcc.target/powerpc/float128-call.c
+++ b/gcc/testsuite/gcc.target/powerpc/float128-call.c
@@ -6,22 +6,33 @@
  #error "-mfloat128 is not supported."
  #endif
  
+/* Pick a constant to load that cannot be generated by the power10 lxvkq

+   instruction.  */
  #ifdef __LONG_DOUBLE_IEEE128__
  #define TYPE long double
-#define ONE  1.0L
+#define TEN  10.0L
  
  #else

  #define TYPE __float128
-#define ONE  1.0Q
+#define TEN  10.0Q
  #endif
  
  /* Test to make sure vector registers are used for passing IEEE 128-bit

 floating point values and returning them. Also make sure the 'q' suffix is
-   handled.  */
-TYPE one (void) { return ONE; }
+   handled for __float128.  */
+TYPE one (void) { return TEN; }


This amuses me, and I want to keep it this way. :-)

  void store (TYPE a, TYPE *p) { *p = a; }
  
-/* { dg-final { scan-assembler {\mlxvd2x 34\M} {target be} } } */

-/* { dg-final { scan-assembler {\mstxvd2x 34\M} {target be} } } */
-/* { dg-final { scan-assembler {\mlvx 2\M} {target le} } }  */
-/* { dg-final { scan-assembler {\mstvx 2\M} {target le} } } */
+/* This regexp captures the different vector load/stores that can be generated:
+
+   lxvd2x  -- big endian power7/power8, little endian power8
+   lvx -- Altivec
+   lxv -- power9
+   plxv-- power10
+   lxvx-- X-form variant.
+   stxvd2x -- big endian power7/power8, little endian power8
+   stvx-- Altivec


For symmetry, also mention stxvx as an X-form variant?

Looks fine to me, recommend approval.

Thanks,
Bill


+   lxvx-- power9/power10.  */
+
+/* { dg-final { scan-assembler {\mlxvd2x 34\M|\mlvx 2\M|\mp?lxvx? 34\M} } } */
+/* { dg-final { scan-assembler {\mstxvd2x 34\M|\mstvx 2\M|\mstxvx 34\M} } } */

[pushed] Darwin, X86 : Implement __cache_clear.

2021-08-27 Thread Iain Sandoe

Hi,

We had a NOP cache clear, but there is a suitable mechanism provided
by a system call.  This connects it up.

tested on i686, x86_64-darwin and x86_64-linux
pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config/i386/darwin.h (CLEAR_INSN_CACHE): New.
---
 gcc/config/i386/darwin.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index 73b06e2307d..da0ae5b3ee7 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -344,3 +344,8 @@ along with GCC; see the file COPYING3.  If not see
 #undef SUBTARGET_SHADOW_OFFSET
 #define SUBTARGET_SHADOW_OFFSET\
   (TARGET_LP64 ? HOST_WIDE_INT_1 << 44 : HOST_WIDE_INT_1 << 29)
+
+#undef CLEAR_INSN_CACHE
+#define CLEAR_INSN_CACHE(beg, end) \
+  extern void sys_icache_invalidate(void *start, size_t len);  \
+  sys_icache_invalidate ((beg), (size_t)((end)-(beg)))
-- 
2.24.3 (Apple Git-128)

Re: [RFH] ME optimizes variable assignment away / Fortran bind(C) descriptor conversion

2021-08-27 Thread Tobias Burnus


On 27.08.21 17:47, Tobias Burnus wrote at
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578271.html :


PS: Current GCC (mainline w/o patch) generates the following.
[-> with patch, see a-test.f90.*.original.]


I accidentally attached the original dump created by mainline GCC.

For the patched compiler, I did attach the optimized dumps. For
reference, this email now contains the original dump generated with my
patch applied.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
__attribute__((fn spec (". . ")))
integer(kind=4) rank_p (struct CFI_cdesc_t & _this)
{
  struct array15_integer(kind=4) this.0;
  struct array15_integer(kind=4) * this;
  signed char idx.1;
  integer(kind=4) rnk;

  this = &this.0;
  this.0.dtype = {.elem_len=4, .type=1};
  this.0.data = _this->base_addr;
  this.0.dtype.rank = _this->rank;
  if (_this->base_addr != 0B)
{
  this.0.span = _this->dim[0].sm % (integer(kind=8)) _this->elem_len != 0 ? 
_this->dim[0].sm : (integer(kind=8)) _this->elem_len;
  this.0.offset = 0;
  idx.1 = 0;
  L.1:;
  if (_this->rank <= idx.1) goto L.2;
  this.0.dim[idx.1].lbound = _this->dim[idx.1].lower_bound;
  this.0.dim[idx.1].ubound = _this->dim[idx.1].extent + 
(this.0.dim[idx.1].lbound + -1);
  this.0.dim[idx.1].stride = _this->dim[idx.1].sm / (integer(kind=8)) 
_this->elem_len;
  this.0.offset = this.0.offset - this.0.dim[idx.1].stride * 
this.0.dim[idx.1].lbound;
  idx.1 = idx.1 + 1;
  goto L.1;
  L.2:;
}
  rnk = (integer(kind=4)) this->dtype.rank;
  return rnk;
}


__attribute__((fn spec (". ")))
void selr_p ()
{
  struct array01_integer(kind=4) intp;
  integer(kind=4) irnk;
  static integer(kind=4) rnk = 1;

  intp.dtype = {.elem_len=4, .rank=1, .type=1};
  intp.span = 0;
  intp.data = 0B;
  {
struct CFI_cdesc_t01 cfi.2;
signed char idx.3;

cfi.2.version = 1;
cfi.2.rank = 1;
cfi.2.type = 1025;
cfi.2.attribute = 0;
cfi.2.base_addr = intp.data;
cfi.2.elem_len = 4;
if (cfi.2.base_addr != 0B)
  {
idx.3 = 0;
L.3:;
if (idx.3 > 0) goto L.4;
cfi.2.dim[idx.3].lower_bound = intp.dim[idx.3].lbound;
cfi.2.dim[idx.3].extent = (intp.dim[idx.3].ubound - 
intp.dim[idx.3].lbound) + 1;
cfi.2.dim[idx.3].sm = intp.dim[idx.3].stride * intp.span;
idx.3 = idx.3 + 1;
goto L.3;
L.4:;
  }
irnk = rank_p (&cfi.2);
  }
  if (irnk != rnk)
{
  _gfortran_stop_numeric (1, 0);
}
  L.5:;
  if (irnk != 1)
{
  _gfortran_stop_numeric (2, 0);
}
  L.6:;
}


__attribute__((externally_visible))
integer(kind=4) main (integer(kind=4) argc, character(kind=1) * * argv)
{
  static integer(kind=4) options.4[7] = {2116, 4095, 0, 1, 1, 0, 31};

  _gfortran_set_args (argc, argv);
  _gfortran_set_options (7, &options.4[0]);
  selr_p ();
  return 0;
}

[pushed] Darwin : Mark the mod init/term section starts with a linker-visible sym.

2021-08-27 Thread Iain Sandoe

Hi,

Some newer assemblers emit section start temp symbols for mod init and term
sections if there is no suitable symbol present already.
The temp symbols are linker visible and therefore appear in the symbol tables.
Since the temp symbol number can vary when debug is enabled, that causes
compare-debug fails.  The solution is to provide a stable linker-visible
symbol.

tested on powerpc, i686, x86_64-darwin, x8-64-linux
pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config/darwin.c (finalize_ctors): Add a section-start linker-
visible symbol.
(finalize_dtors): Likewise.
* config/darwin.h (MIN_LD64_INIT_TERM_START_LABELS): New.
---
 gcc/config/darwin.c | 37 -
 gcc/config/darwin.h |  3 +++
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
index 5d1d13c80aa..667fda79a60 100644
--- a/gcc/config/darwin.c
+++ b/gcc/config/darwin.c
@@ -109,6 +109,9 @@ static bool ld_uses_coal_sects = false;
each FDE.  */
 static bool ld_needs_eh_markers = false;
 
+/* Emit a section-start symbol for mod init and term sections.  */
+static bool ld_init_term_start_labels = false;
+
 /* Section names.  */
 section * darwin_sections[NUM_DARWIN_SECTIONS];
 
@@ -1838,6 +1841,11 @@ finalize_ctors ()
   else
 switch_to_section (darwin_sections[constructor_section]);
 
+  /* Where needed, provide a linker-visible section-start symbol so that we
+ have stable output between debug and non-debug.  */
+  if (ld_init_term_start_labels)
+fputs (MACHOPIC_INDIRECT ? "_Mod.init:\n" : "_CTOR.sect:\n", asm_out_file);
+
   if (vec_safe_length (ctors) > 1)
 ctors->qsort (sort_cdtor_records);
   FOR_EACH_VEC_SAFE_ELT (ctors, i, elt)
@@ -1858,6 +1866,11 @@ finalize_dtors ()
   else
 switch_to_section (darwin_sections[destructor_section]);
 
+  /* Where needed, provide a linker-visible section-start symbol so that we
+ have stable output between debug and non-debug.  */
+  if (ld_init_term_start_labels)
+fputs (MACHOPIC_INDIRECT ? "_Mod.term:\n" : "_DTOR.sect:\n", asm_out_file);
+
   if (vec_safe_length (dtors) > 1)
 dtors->qsort (sort_cdtor_records);
   FOR_EACH_VEC_SAFE_ELT (dtors, i, elt)
@@ -3228,11 +3241,25 @@ darwin_override_options (void)
   /* Earlier versions are not specifically accounted, until required.  */
 }
 
-  /* Older Darwin ld could not coalesce weak entities without them being
- placed in special sections.  */
-  if (darwin_target_linker
-  && (strverscmp (darwin_target_linker, MIN_LD64_NO_COAL_SECTS) < 0))
-ld_uses_coal_sects = true;
+  /* Some codegen needs to account for the capabilities of the target
+ linker.  */
+  if (darwin_target_linker)
+{
+  /* Older Darwin ld could not coalesce weak entities without them being
+placed in special sections.  */
+  if (strverscmp (darwin_target_linker, MIN_LD64_NO_COAL_SECTS) < 0)
+   ld_uses_coal_sects = true;
+
+  /* Some newer assemblers emit section start temp symbols for mod init
+and term sections if there is no suitable symbol present already.
+The temp symbols are linker visible and therefore appear in the
+symbol tables.  Since the temp symbol number can vary when debug is
+enabled, that causes compare-debug fails.  The solution is to provide
+a stable linker-visible symbol.  */
+  if (strverscmp (darwin_target_linker,
+ MIN_LD64_INIT_TERM_START_LABELS) >= 0)
+   ld_init_term_start_labels = true;
+}
 
   /* In principle, this should be c-family only.  However, we really need to
  set sensible defaults for LTO as well, since the section selection stuff
diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index b1be561e854..f1d92f87e9a 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -1104,6 +1104,9 @@ extern void darwin_driver_init (unsigned int *,struct 
cl_decoded_option **);
needed, and there is no need for the compiler to emit them.  */
 #define MIN_LD64_OMIT_STUBS "62.1"
 
+/* Emit start labels for init and term sections from this version.  */
+#define MIN_LD64_INIT_TERM_START_LABELS "136.0"
+
 /* If we have no definition for the linker version, pick the minimum version
that will bootstrap the compiler.  */
 #ifndef LD64_VERSION
--

Re: [PATCH] libgfortran : Use the libtool macro to determine libm availability.

2021-08-27 Thread Iain Sandoe




> On 22 Aug 2021, at 09:44, Iain Sandoe  wrote:
> 
> 
> 
>> On 21 Aug 2021, at 23:18, Eric Gallager  wrote:
>> 
>> On Fri, Aug 20, 2021 at 3:53 AM Tobias Burnus  
>> wrote:
>>> 
>>> On 20.08.21 09:34, Richard Biener via Fortran wrote:
>>> 
 On Thu, Aug 19, 2021 at 10:10 PM Iain Sandoe  wrote:
> libm is not needed on Darwin, and should not be added unconditionally
> even if that is (mostly) harmless since it is a symlink to libc.
> 
> tested on x86_64, i686-darwin, x86_64-linux,
> OK for master?
 OK.
 Richard.
>>> 
>>> Looks also good to me – but for completeness and as background, I also
>>> want to remark the following.
>>> 
>>> (At least as I understand it, I did not dig deeper.)
>>> 
 --- a/libgfortran/configure.ac
 +++ b/libgfortran/configure.ac
 ...
 +AC_CHECK_LIBM
>>> 
>>> This CHECK_LIBM has a hard-coded list of targets and for some (like
>>> Darwin) it simply does not set $LIBM.
>> 
>> I thought the proper macro to use was LT_LIB_M ?
> 
> you could well be right, libtool.m4 contains:
> 
> # Old name:
> AU_ALIAS([AC_CHECK_LIBM], [LT_LIB_M])
> 
> I guess the patch can be changed and then do another round of testing …
> … will add this to the TODO, and withdraw the current patch.

this is what I pushed after retesting (with the extras space removed too):

==
libgfortran: Use the libtool macro to determine libm
 availability.

We recently had a report of build failure against a Darwin branch on
the latest OS release.  This was because (temporarily) the symlink
from libm.dylib => libSystem.dylib had been removed/omitted.

libm is not needed on Darwin, and should not be added unconditionally
even if that is (mostly) harmless since it is a symlink to libc.

There could be cases where the addition was not completely harmless
because the presentation of the symlink would cause the symbols exposed
in libSystem to be considered ahead of ones presented in convenience
libraries.

libgfortran/ChangeLog:

* Makefile.am: Use configured libm availability.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use libtool macro to find libm availability.
* libgfortran.spec.in: Use configured libm availability.
---
 libgfortran/Makefile.am |   2 +-
 libgfortran/Makefile.in |   3 +-
 libgfortran/configure   | 146 +++-
 libgfortran/configure.ac|   1 +
 libgfortran/libgfortran.spec.in |   2 +-
 5 files changed, 149 insertions(+), 5 deletions(-)

diff --git a/libgfortran/Makefile.am b/libgfortran/Makefile.am
index 8d104321567..6fc4b465c6e 100644
--- a/libgfortran/Makefile.am
+++ b/libgfortran/Makefile.am
@@ -42,7 +42,7 @@ libgfortran_la_LINK = $(LINK) $(libgfortran_la_LDFLAGS)
 libgfortran_la_LDFLAGS = -version-info `grep -v '^\#' 
$(srcdir)/libtool-version` \
$(LTLDFLAGS) $(LIBQUADLIB) ../libbacktrace/libbacktrace.la \
$(HWCAP_LDFLAGS) \
-   -lm $(extra_ldflags_libgfortran) \
+   $(LIBM) $(extra_ldflags_libgfortran) \
$(version_arg) -Wc,-shared-libgcc
 libgfortran_la_DEPENDENCIES = $(version_dep) libgfortran.spec $(LIBQUADLIB_DEP)
  
 
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 523eb24bca1..a77509801e6 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -260,6 +260,7 @@ AC_PROG_INSTALL
 #AC_MSG_NOTICE([== Starting libtool configuration])
 AC_LIBTOOL_DLOPEN
 AM_PROG_LIBTOOL
+LT_LIB_M
 ACX_LT_HOST_FLAGS
 AC_SUBST(enable_shared)
 AC_SUBST(enable_static)
diff --git a/libgfortran/libgfortran.spec.in b/libgfortran/libgfortran.spec.in
index 95aa3f842a3..367d485c230 100644
--- a/libgfortran/libgfortran.spec.in
+++ b/libgfortran/libgfortran.spec.in
@@ -5,4 +5,4 @@
 #
 
 %rename lib liborig
-*lib: @LIBQUADSPEC@ -lm %(libgcc) %(liborig)
+*lib: @LIBQUADSPEC@ @LIBM@ %(libgcc) %(liborig)
-- 
2.24.3 (Apple Git-128)

[RFH] ME optimizes variable assignment away / Fortran bind(C) descriptor conversion

2021-08-27 Thread Tobias Burnus


Hi all,

Background: gfortran has its own array descriptor – and one which is defined in
F2018 and used/usable from C (#include ).
On mainline, the conversion is done via a void* pointer and calls to 
libgfortran,
which causes all kind of issues, including alias issues but also data 
type/bounds
issues etc.

The attached patch tries to do this inline - and defines in the FE a proper
type for the C descriptor.  ("CFI_cdesc_t" has a 'dim[]' as last member,
'CFI_cdesc_t01' has dim[1].)


But but I have a ME optimization issue, which removes an crucial
assignment - any help/suggestion is welcome!
(Additionally, there is room for improvement regarding the debugging
experience. Suggestions are welcome as well, but it is not as crucial.)


Do you have any suggestion or idea what goes wrong?


It looks really nice with "-O1 -fno-inline"   :-)
  The callee 'rank_p()' is mostly optimized and in the
  caller only those struct elements are set, which are used:

integer(kind=4) rank_p (struct CFI_cdesc_t & _this)
{
  _1 = _this_11(D)->base_addr;
  _2 = _this_11(D)->rank;
...
  rnk_13 = (integer(kind=4)) _2;
  return rnk_13;
}

void selr_p ()
{
...
  struct CFI_cdesc_t01 cfi.7;
...
   [local count: 537730764]:
  cfi.7.rank = 1;
  cfi.7.base_addr = 0B;
  irnk_45 = rank_p (&cfi.7);
  cfi.7 ={v} {CLOBBER};
  if (irnk_45 != 1)


BUT BAD RESULT with -O2 -fno-inline  :-(
  The assignments on the caller side are gone,
  which causes wrong code (run stops with 'stop 1'):

integer(kind=4) rank_p (struct CFI_cdesc_t & _this)
{
...
   [local count: 1073741824]:
  _1 = _this_3(D)->rank;
  rnk_4 = (integer(kind=4)) _1;
  return rnk_4;
}

void selr_p ()
{
...
  struct CFI_cdesc_t01 cfi.7;
...
   [local count: 537730764]:
  irnk_30 = rank_p (&cfi.7);   !  ERROR: cfi.7.rank assignment missing
  cfi.7 ={v} {CLOBBER};
  if (irnk_30 != 1)

 *  *  *

Any idea / suggestion?

 *  *  *

* trans-type.c defines the new type
* trans-decl.c handles the conversion from C descriptor to Fortran descriptor 
in the callee
* trans-expr.c handles the conversion to the C descriptor in the callee

Attached:
* Testcase 'test.f90'
  - original dump
  - -O1 -fno-inline optimized dump
  - -O2 -fno-inline optimized dump
* Full patch
  - Testcase is lightly modified gfortran.dg/PR93963.f90

Tobias

 *  *  *

PS: Current GCC (mainline w/o patch) generates the following.
[-> with patch, see a-test.f90.*.original.]

Namely, for the callee, casting the argument
  (in reality pointer to a CFI descriptor,
   but TREE_TYPE (PARM_DECL) is ptr to Fortran descriptor)
to 'void *', passing it to a library function, which creates
a new Fortran descriptor and pointer-assigning it to
the PARM_DECL pointer, which now points to a Fortran
descriptor:

integer(kind=4) rank_p (struct array15_integer(kind=4) & this)
{
   gfc_desc_ptr.1 = &gfc_desc.0;
   CFI_desc_ptr.2 = (void *) this;
   _gfortran_cfi_desc_to_gfc_desc (gfc_desc_ptr.1, &CFI_desc_ptr.2);
   this = (struct array15_integer(kind=4) &) gfc_desc_ptr.1;
   rnk = (integer(kind=4)) this->dtype.rank;
...
void selr_p ()
{
  struct array01_integer(kind=4) intp;
  integer(kind=4) irnk;
  static integer(kind=4) rnk = 1;

  intp.dtype = {.elem_len=4, .rank=1, .type=1};
  intp.span = 0;
...
_gfortran_gfc_desc_to_cfi_desc (&cfi.3, &intp);
intp.dtype.attribute = 0;
irnk = rank_p (cfi.3);
__builtin_free (cfi.3);

 *  *  *

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
module m
contains
function rank_p(this) result(rnk) bind(c)
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none
  integer(kind=c_int), pointer, intent(in) :: this(..)
  integer(kind=c_int)  :: rnk

  rnk = rank (this)
end function rank_p
end module m

program selr_p
  use m
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none

  integer(kind=c_int), pointer :: intp(:)
  integer(kind=c_int)  :: irnk, rnk = 1

  nullify(intp)
  irnk = rank_p(intp)
  if (irnk /= rnk)stop 1
  if (irnk /= rank(intp)) stop 2
end program selr_p
__attribute__((fn spec (". . ")))
integer(kind=4) rank_p (struct array15_integer(kind=4) & this)
{
  struct array15_integer(kind=4) gfc_desc.0;
  struct array15_integer(kind=4) * gfc_desc_ptr.1;
  void * CFI_desc_ptr.2;
  integer(kind=4) rnk;

  if (this != 0)
{
  gfc_desc_ptr.1 = &gfc_desc.0;
  CFI_desc_ptr.2 = (void *) this;
  _gfortran_cfi_desc_to_gfc_desc (gfc_desc_ptr.1, &CFI_desc_ptr.2);
  this = (struct array15_integer(kind=4) &) gfc_desc_ptr.1;
}
  rnk = (integer(kind=4)) this->dtype.rank;
  return rnk;
}


__attribute__((fn spec (". ")))
void selr_p ()
{
  struct array01_integer(kind=4) intp;
  integer(kind=4) irnk;
  static integer(kind=4) rnk = 1;

  intp.dtype = {.elem_len=4, .rank=1, .type=1};
  intp.span = 0;
  intp.dat

Re: [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations

2021-08-27 Thread Bill Schmidt via Gcc-patches


Hi Paul,

Thanks for the changes!  This looks fine to me, recommend approval.

Thanks,
Bill

On 8/23/21 2:03 PM, Paul A. Clarke wrote:

Some compatibility implementations of x86 intrinsics include
Power intrinsics which require POWER8.  Guard them.

emmintrin.h:
- _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
   but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
   The "POWER8" version works fine on pre-POWER8.
- _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
pmmintrin.h:
- _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
- _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
smmintrin.h:
- _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
- _mm_mul_epi32: vec_mule(v4si) uses vmuluwm.
- _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
tmmintrin.h:
- _mm_sign_epi8: vec_neg(v4si) uses vsububm.
- _mm_sign_epi16: vec_neg(v4si) uses vsubuhm.
- _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
   Note that the above three could actually be supported pre-POWER8,
   but current GCC does not support them before POWER8.
- _mm_sign_pi8: depends on _mm_sign_epi8.
- _mm_sign_pi16: depends on _mm_sign_epi16.
- _mm_sign_pi32: depends on _mm_sign_epi32.

2021-08-20  Paul A. Clarke  

gcc
PR target/101893
* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
* config/rs6000/pmmintrin.h: Same.
* config/rs6000/smmintrin.h: Same.
* config/rs6000/tmmintrin.h: Same.
---
v3: No change.
v2:
- Ensured that new "#ifdef _ARCH_PWR8" bracket each function so
   impacted, rather than groups of functions, per v1 review.
- Noted testing in patch series cover letter.
- Added PR number to commit message.

  gcc/config/rs6000/emmintrin.h | 12 ++--
  gcc/config/rs6000/pmmintrin.h |  4 
  gcc/config/rs6000/smmintrin.h |  4 
  gcc/config/rs6000/tmmintrin.h | 12 
  4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ce1287edf782..32ad72b4cc35 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B)
  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  _mm_cmpord_pd (__m128d __A, __m128d __B)
  {
-#if _ARCH_PWR8
__v2du c, d;
/* Compare against self will return false (0's) if NAN.  */
c = (__v2du)vec_cmpeq (__A, __A);
d = (__v2du)vec_cmpeq (__B, __B);
-#else
-  __v2du a, b;
-  __v2du c, d;
-  const __v2du double_exp_mask  = {0x7ff0, 0x7ff0};
-  a = (__v2du)vec_abs ((__v2df)__A);
-  b = (__v2du)vec_abs ((__v2df)__B);
-  c = (__v2du)vec_cmpgt (double_exp_mask, a);
-  d = (__v2du)vec_cmpgt (double_exp_mask, b);
-#endif
/* A != NAN and B != NAN.  */
return ((__m128d)vec_and(c, d));
  }
@@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B)
return ((__m64)a * (__m64)b);
  }
  
+#ifdef _ARCH_PWR8

  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  _mm_mul_epu32 (__m128i __A, __m128i __B)
  {
@@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B);
  #endif
  }
+#endif
  
  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))

  _mm_slli_epi16 (__m128i __A, int __B)
diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h
index eab712fdfa66..83dff1d85666 100644
--- a/gcc/config/rs6000/pmmintrin.h
+++ b/gcc/config/rs6000/pmmintrin.h
@@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y)
vec_mergel ((__v2df) __X, (__v2df)__Y));
  }
  
+#ifdef _ARCH_PWR8

  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  _mm_movehdup_ps (__m128 __X)
  {
return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X);
  }
+#endif
  
+#ifdef _ARCH_PWR8

  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  _mm_moveldup_ps (__m128 __X)
  {
return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X);
  }
+#endif
  
  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))

  _mm_loaddup_pd (double const *__P)
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index c04d2bb5b6d3..29719367e205 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -272,6 +272,7 @@ _mm_extract_ps (__m128 __X, const int __N)
return ((__v4si)__X)[__N & 3];
  }
  
+#ifdef _ARCH_PWR8

  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
  {
@@ -283,6 +284,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
#endif
return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
  }
+#endif
  
  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial_

Re: [PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-08-27 Thread Bill Schmidt via Gcc-patches


Hi Paul,

On 8/23/21 2:03 PM, Paul A. Clarke wrote:

Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cmpeq_epi64
- _mm_mullo_epi32, _mm_mul_epi32
- _mm_packus_epi32
- _mm_cmpgt_epi64 (SSE4.2)

from gcc/testsuite/gcc.target/i386.

2021-08-23  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
_mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.
* config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-packusdw.c: Same.
* gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
* gcc.target/powerpc/sse4_1-pmuldq.c: Same.
* gcc.target/powerpc/sse4_1-pmulld.c: Same.
* gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
* gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386,
tweak to suit.
---
v3:
- Add nmmintrin.h. _mm_cmpgt_epi64 is part of SSE4.2, which is
   ostensibly defined in nmmintrin.h. Following the i386 implementation,
   however, nmmintrin.h only includes smmintrin.h, and the actual
   implementations appear there.
- Add sse4_2-check.h, required by sse4_2-pcmpgtq.c. My testing was
   obviously inadequate.
v2:
- Added "extern" to functions to maintain compatible decorations with
   like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

  gcc/config/rs6000/nmmintrin.h | 40 ++
  gcc/config/rs6000/smmintrin.h | 41 +++
  gcc/testsuite/gcc.target/powerpc/pr78102.c| 23 ++
  .../gcc.target/powerpc/sse4_1-packusdw.c  | 73 +++
  .../gcc.target/powerpc/sse4_1-pcmpeqq.c   | 46 
  .../gcc.target/powerpc/sse4_1-pmuldq.c| 51 +
  .../gcc.target/powerpc/sse4_1-pmulld.c| 46 
  .../gcc.target/powerpc/sse4_2-check.h | 18 +
  .../gcc.target/powerpc/sse4_2-pcmpgtq.c   | 46 
  9 files changed, 384 insertions(+)
  create mode 100644 gcc/config/rs6000/nmmintrin.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-check.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c

diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
new file mode 100644
index ..20a70bee3776
--- /dev/null
+++ b/gcc/config/rs6000/nmmintrin.h
@@ -0,0 +1,40 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.  */
+#endif
+
+#ifndef _NMMINTRIN_H_INCLUDED
+#define _NMMINTRIN_H_INCLUDED
+
+/* We just include SSE4.1 header file.  */
+#include 
+
+#endif /* _NMMINTRIN_H_INCLUDED */


Should there be something in here indicating that nmmintrin.h is for SSE 
4.2?  Otherwise it's a bit of a head-scratcher to a new person wondering 
why this file exists.  No big deal either way.


This looks fine to me with or without that.  Recommend approval.

Thanks!
Bill


diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index fdef6674d16c..c04d2bb5b6d3 100644
--- a/gcc/config/rs6000/

Re: [PATCH] improve note location and refactor warn_uninit

2021-08-27 Thread Martin Sebor via Gcc-patches


On 8/27/21 12:23 AM, Richard Biener wrote:

On Thu, Aug 26, 2021 at 9:30 PM Martin Sebor  wrote:


On 8/26/21 10:38 AM, Martin Sebor wrote:

On 8/26/21 1:00 AM, Richard Biener wrote:

On Wed, Aug 25, 2021 at 10:03 PM Martin Sebor  wrote:


Richard, some time ago you mentioned you'd had trouble getting
-Wuninitialized to print the note pointing to the uninitialized
variable.  I said I'd look into it, and so I did.  The attached
patch simplifies the warn_uninit() function to get rid of some
redundant cruft: besides a few pointless null tests and
a duplicate argument it also does away with ancient code that's
responsible for the problem you saw.  It turns out that tests
for the problem are already in the test suite but have been
xfailed since the day they were added, so the patch simply
removes the xfails.  I'll go ahead and commit this cleanup
as obvious later this week unless you have suggestions for
further changes.


I do see lots of useful refactoring.

-  if (context != NULL && gimple_has_location (context))
-location = gimple_location (context);
-  else if (phiarg_loc != UNKNOWN_LOCATION)
-location = phiarg_loc;
-  else
-location = DECL_SOURCE_LOCATION (var);
+  /* Use either the location of the read statement or that of the PHI
+ argument, or that of the uninitialized variable, in that order,
+ whichever is valid.  */
+  location_t location = gimple_location (context);
+  if (location == UNKNOWN_LOCATION)
+{
+  if (phi_arg_loc == UNKNOWN_LOCATION)
+   location = DECL_SOURCE_LOCATION (var);
+  else
+   location = phi_arg_loc;
+}

the comment is an improvement but I think the original flow is
easier to follow ;)


It doesn't really simplify things so I can revert it.


I've done that and pushed r12-3165, and...


-  if (xloc.file != floc.file
- || linemap_location_before_p (line_table, location, cfun_loc)
- || linemap_location_before_p (line_table,
cfun->function_end_locus,
-   location))
-   inform (DECL_SOURCE_LOCATION (var), "%qD was declared here",
var);
...
+  inform (var_loc, "%qD was declared here", var);

and this is the actual change that "fixes" the missing diagnostics?  Can
you explain what the reason for the original sing-and-dance was?  It
looks like it tries to print the 'was declared here' only for locations
within the current function (but then it probably intended to do that
on the DECL_SOURCE_LOCATION (var), not on whatever we are
choosing now)?


The author(s) of the logic wanted to print the note only for variables
declared in functions other than the one the uninitialized read is in.
The idea first appears in pr17506, comment #25 (and attachment 12131).
At that time GCC would print no note at all and pr17506 was about
inlining making it difficult to find the uninitialized variable.
The originally reported problem can still be reproduced on Godbolt
(with the original very large translation unit):
https://godbolt.org/z/8WW43rxnd

I've reduced it to a small test case:
https://godbolt.org/z/rnPEfPqPf

It looks like they didn't think it would also be a problem if
the variable was defined and read in the same function, even
if the read was far away from the declaration.


Ah, OK.  I think that makes sense in principle, the test just looked
really weird - for the 'decl' it would be most natural to use sth
like decl_function_context (DECL_ORIGIN (var)) to determine
the function the variable was declared in and for the location of
the uninitialized use you'd similarly use

   tree ctx = BLOCK_ORIGIN (LOCATION_BLOCK (location));
   while (TREE_CODE ((ctx = BLOCK_SUPERCONTEXT (ctx))) != FUNCTION_DECL)
  ;

and then you can compare both functions.  This of course requires
a good location of the uninitialized use.

So I'm not sure whether we want to increase verboseness for say

int foo ()
{
int i;
i = i + 1;
return i;
}

by telling the user 'i' was declared the line before the uninitialized use.


In this contrived case the note may not be important but it doesn't
hurt.  It's the more realistic cases of large functions with problem
statements far away from the objects they operate on, often indirectly,
via pointers, where providing the additional context is helpful.

This is also why most other middle end warnings (all those I worked
on) as well as other instances of -Wmaybe-uninitialized issue these
(and other) notes to provide enough detail to understand the problem.
The one we're discussion is an outlier.  For example this code:

void f (const int*, ...);

void g (int i)
{
  int a[4], *p = a + i;
  f (p, p[4]);
}

results in two warnings, each with its own note(s):

z.c: In function ‘g’:
z.c:6:3: warning: ‘a’ is used uninitialized [-Wuninitialized]
6 |   f (p, p[4]);
  |   ^~~
z.c:5:7: note: ‘a’ declared here
5 |   int a[4], *p = a + i;
  |   ^
z.c:6:3: warning: array subscript 4 is outside array bounds of ‘int[4]’ 
[-Warray-bounds]

Re: Ping: [PATCH] diagnostics: Support for -finput-charset [PR93067]

2021-08-27 Thread Lewis Hyatt via Gcc-patches

On Wed, Aug 25, 2021 at 9:45 AM David Malcolm  wrote:
> > BTW, do you think it would be worthwhile to work on the other half of
> > encoding support, i.e. translating from UTF-8 to the user's locale,
> > when outputting diagnostics? I have probably 90% of a patch that does
> > this, however it complexifies things a bit and I am not sure if it is
> > really worth the trouble. What is rather manageable (that my patch in
> > progress does now) is to replace non-translatable characters with
> > something like UCN escapes. What is not so easy, is to do this and
> > preserve the alignment of carets and label lines and such... this
> > requires making the display width of a character also
> > locale-dependent, which concept doesn't exist currently. Adding that
> > feels like a lot of complication for what would be a little-used
> > feature... Anyway, if you think a patch that does the translation
> > without preserving the alignment would be useful, I could finish it
> > up
> > and send it. Otherwise I was kinda inclined to forget about it.
> > Thanks!
>
> Maybe post the patch you have so far, making clear that it's
> unfinished/work-in-progress?  I think I'll find it easier to think
> about this with a patch in hand rather than a description of a patch,
> and also, that way the list archives will have a copy of your work in
> case we do want to finish it at some later point (rather than it just
> being on your hard drive).

I created PR102099 about this
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102099) and posted the
patch there. It's pretty complete, although it has a couple issues I
mention in the PR. Thanks!

-Lewis

Re: [PATCH v3 4/6] rs6000: Support SSE4.1 "cvt" intrinsics

2021-08-27 Thread Bill Schmidt via Gcc-patches


This looks fine, recommend approval.

Thanks!
Bill

On 8/23/21 2:03 PM, Paul A. Clarke wrote:

Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cvtepi8_epi16, _mm_cvtepi8_epi32, _mm_cvtepi8_epi64
- _mm_cvtepi16_epi32, _mm_cvtepi16_epi64
- _mm_cvtepi32_epi64,
- _mm_cvtepu8_epi16, _mm_cvtepu8_epi32, _mm_cvtepu8_epi64
- _mm_cvtepu16_epi32, _mm_cvtepu16_epi64
- _mm_cvtepu32_epi64

from gcc/testsuite/gcc.target/i386.

sse4_1-pmovsxbd.c, sse4_1-pmovsxbq.c, and sse4_1-pmovsxbw.c were
modified from using "char" types to "signed char" types, because
the default is unsigned on powerpc.

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cvtepi8_epi16, _mm_cvtepi8_epi32,
_mm_cvtepi8_epi64, _mm_cvtepi16_epi32, _mm_cvtepi16_epi64,
_mm_cvtepi32_epi64, _mm_cvtepu8_epi16, _mm_cvtepu8_epi32,
_mm_cvtepu8_epi64, _mm_cvtepu16_epi32, _mm_cvtepu16_epi64,
_mm_cvtepu32_epi64): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmovsxbd.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-pmovsxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovsxwq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxbw.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxdq.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwd.c: Same.
* gcc.target/powerpc/sse4_1-pmovzxwq.c: Same.
---
v3: No change.
v2:
- Added "extern" to functions to maintain compatible decorations with
   like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

  gcc/config/rs6000/smmintrin.h | 138 ++
  .../gcc.target/powerpc/sse4_1-pmovsxbd.c  |  42 ++
  .../gcc.target/powerpc/sse4_1-pmovsxbq.c  |  42 ++
  .../gcc.target/powerpc/sse4_1-pmovsxbw.c  |  42 ++
  .../gcc.target/powerpc/sse4_1-pmovsxdq.c  |  42 ++
  .../gcc.target/powerpc/sse4_1-pmovsxwd.c  |  42 ++
  .../gcc.target/powerpc/sse4_1-pmovsxwq.c  |  42 ++
  .../gcc.target/powerpc/sse4_1-pmovzxbd.c  |  43 ++
  .../gcc.target/powerpc/sse4_1-pmovzxbq.c  |  43 ++
  .../gcc.target/powerpc/sse4_1-pmovzxbw.c  |  43 ++
  .../gcc.target/powerpc/sse4_1-pmovzxdq.c  |  43 ++
  .../gcc.target/powerpc/sse4_1-pmovzxwd.c  |  43 ++
  .../gcc.target/powerpc/sse4_1-pmovzxwq.c  |  43 ++
  13 files changed, 648 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 363534cb06a2..fdef6674d16c 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -442,6 +442,144 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
  }
  
+extern __inline __m128i

+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi16 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v16qi)__A);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi32 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi)__A);
+  return (__m128i) vec_unpackh ((__v8hi)__A);
+}
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi8_epi64 (__m128i __A)
+{
+  __A = (__m128i) vec_unpackh ((__v16qi)__A);
+  __A = (__m128i) vec_unpackh ((__v8hi)__A);
+  return (__m128i) vec_unpackh ((__v4si)__A);
+}
+#endif
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_epi32 (__m128i __A)
+{
+  return (__m128i) vec_unpackh ((__v8hi)__A);
+}
+
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtepi16_ep

Re: [PATCH v3 3/6] rs6000: Simplify some SSE4.1 "test" intrinsics

2021-08-27 Thread Bill Schmidt via Gcc-patches


This looks fine, recommend approval.

Thanks!
Bill

On 8/23/21 2:03 PM, Paul A. Clarke wrote:

Copy some simple redirections from i386 , for:
- _mm_test_all_zeros
- _mm_test_all_ones
- _mm_test_mix_ones_zeros

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_test_all_zeros,
_mm_test_all_ones, _mm_test_mix_ones_zeros): Replace.
---
v3: No change.
v2:
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

  gcc/config/rs6000/smmintrin.h | 30 --
  1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 505fe4ce22a8..363534cb06a2 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -379,34 +379,12 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
  }
  
-__inline int

-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
-}
+#define _mm_test_all_zeros(M, V) _mm_testz_si128 ((M), (V))
  
-__inline int

-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_all_ones (__m128i __A)
-{
-  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
-  return vec_all_eq ((__v16qu) __A, __ones);
-}
+#define _mm_test_all_ones(V) \
+  _mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V)))
  
-__inline int

-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
-{
-  const __v16qu __zero = {0};
-  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
-  const int any_ones = vec_any_ne (__Amasked, __zero);
-  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
-  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
-  const int any_zeros = vec_any_ne (__notAmasked, __zero);
-  return any_ones * any_zeros;
-}
+#define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
  
  extern __inline __m128i

  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))

Re: [PATCH v3 2/6] rs6000: Support SSE4.1 "min" and "max" intrinsics

2021-08-27 Thread Bill Schmidt via Gcc-patches


Hi Paul,

This looks fine to me, recommend approval.

Thanks,
Bill

On 8/23/21 2:03 PM, Paul A. Clarke wrote:

Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for _mm_min_epi8, _mm_min_epu16, _mm_min_epi32,
_mm_min_epu32, _mm_max_epi8, _mm_max_epu16, _mm_max_epi32, _mm_max_epu32
from gcc/testsuite/gcc.target/i386.

sse4_1-pmaxsb.c and sse4_1-pminsb.c were modified from using
"char" types to "signed char" types, because the default is unsigned on
powerpc.

2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_min_epi8, _mm_min_epu16,
_mm_min_epi32, _mm_min_epu32, _mm_max_epi8, _mm_max_epu16,
_mm_max_epi32, _mm_max_epu32): New.

gcc/testsuite
* gcc.target/powerpc/sse4_1-pmaxsb.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-pmaxsd.c: Same.
* gcc.target/powerpc/sse4_1-pmaxud.c: Same.
* gcc.target/powerpc/sse4_1-pmaxuw.c: Same.
* gcc.target/powerpc/sse4_1-pminsb.c: Same.
* gcc.target/powerpc/sse4_1-pminsd.c: Same.
* gcc.target/powerpc/sse4_1-pminud.c: Same.
* gcc.target/powerpc/sse4_1-pminuw.c: Same.
---
v3: No change.
v2:
- Added "extern" to functions to maintain compatible decorations with
   like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

  gcc/config/rs6000/smmintrin.h | 56 +++
  .../gcc.target/powerpc/sse4_1-pmaxsb.c| 46 +++
  .../gcc.target/powerpc/sse4_1-pmaxsd.c| 46 +++
  .../gcc.target/powerpc/sse4_1-pmaxud.c| 47 
  .../gcc.target/powerpc/sse4_1-pmaxuw.c| 47 
  .../gcc.target/powerpc/sse4_1-pminsb.c| 46 +++
  .../gcc.target/powerpc/sse4_1-pminsd.c| 46 +++
  .../gcc.target/powerpc/sse4_1-pminud.c| 47 
  .../gcc.target/powerpc/sse4_1-pminuw.c| 47 
  9 files changed, 428 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index a6b88d313ad0..505fe4ce22a8 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -408,6 +408,62 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
return any_ones * any_zeros;
  }
  
+extern __inline __m128i

+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v16qi)__X, (__v16qi)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v8hu)__X, (__v8hu)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4si)__X, (__v4si)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_min ((__v4su)__X, (__v4su)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi8 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v16qi)__X, (__v16qi)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu16 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v8hu)__X, (__v8hu)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epi32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4si)__X, (__v4si)__Y);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_epu32 (__m128i __X, __m128i __Y)
+{
+  return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
+}
+
  /* Return horizontal packed word minimum and its index in bits [15:0]
 and bits [18:16] respectively.  */
  __inline __m128i
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
new file mode 100644
index ..7a465b01dd11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_o

Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-27 Thread Bill Schmidt via Gcc-patches




On 8/27/21 8:44 AM, Bill Schmidt wrote:


Again, please specify where the patch was tested and whether this is for
trunk, backports, etc.  Thanks!  (I know you aren't asking for
backports, but in general please get in the habit of this.)



Sorry, I see that you did this in the cover letter.  Never mind, sorry 
for the noise.


Bill

Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-08-27 Thread Bill Schmidt via Gcc-patches


Hi Paul,

On 8/23/21 2:03 PM, Paul A. Clarke wrote:

Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible.

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries for floating-point
representation, positive and negative, test all of the parameterized
rounding modes as well as the C99 rounding modes and interactions
between the two.

Exceptions are not explicitly tested.


Again, please specify where the patch was tested and whether this is for 
trunk, backports, etc.  Thanks!  (I know you aren't asking for 
backports, but in general please get in the habit of this.)


2021-08-20  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.
---
v3: No change.
v2:
- Replaced clever (and broken) exception masking with more straightforward
   implementation, per v1 review and closer inspection. mtfsf was only
   writing the final nybble (1) instead of the final two nybbles (2), so
   not all of the exception-enable bits were cleared.
- Renamed some variables from cryptic "tmp" and "save" to
   "fpscr_save" and "enables_save".
- Retained use of __builtin_mffsl, since that is supported pre-POWER8
   (with an alternate instruction sequence).
- Added "extern" to functions to maintain compatible decorations with
   like implementations in gcc/config/i386.
- Added some additional text to the commit message about some of the
   (unpleasant?) implementations and decorations coming from
   like implementations in gcc/config/i386, per v1 review.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Fixed indentation and other minor formatting changes, per v1 review.
- Noted testing in patch series cover letter.

  gcc/config/rs6000/smmintrin.h | 240 +++-
  .../gcc.target/powerpc/sse4_1-round3.h|  81 ++
  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 ++
  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 +++
  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 ++
  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 ++
  6 files changed, 962 insertions(+), 64 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 3767a67eada7..a6b88d313ad0 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,182 @@
  #include 
  #include 
  
+/* Rounding mode macros. */

+#define _MM_FROUND_TO_NEAREST_INT   0x00
+#define _MM_FROUND_TO_ZERO  0x01
+#define _MM_FROUND_TO_POS_INF   0x02
+#define _MM_FROUND_TO_NEG_INF   0x03
+#define _MM_FROUND_CUR_DIRECTION0x04
+
+#define _MM_FROUND_NINT\
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR   \
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL

[PING] Re: [PATCH] configure, jit: Allow for 'make check-gcc-jit'.

2021-08-27 Thread Iain Sandoe

+Jeff

(it’s probably borderline obvious - but in the top level Makefile .. so)

> On 17 Aug 2021, at 21:53, David Malcolm  wrote:
> 
> On Tue, 2021-08-17 at 19:59 +0100, Iain Sandoe wrote:
>> Hi,
>> 
>> For those of us who habitually build Ada, it’s convenient to 
>> have a way of running individual test suites without invoking
>> the acats tests…
>> 
>> being able to do “make check-gcc-jit” from the top level is very
>> useful when debugging jit testsuite issues.
>> 
>> one can do "cd gcc ; make check-jit "- but this doesn’t seem 100%
>> identical since the invocations from the top level set the host
>> exports first.
>> 
>> … the patch itself is trivial / obvious - I am just curious as to
>> whether there was a reason for omitting it so far?
> 
> Probably just a mistake on my part; Makefile glue is not my strongest
> skill.
> 
>> 
>> If not, 
>> 
>> OK for master?
> 
> Sounds OK to me - but then again, Makefile glue is not my strongest
> skill, so not sure if I'm qualified to approve this.
> 
>> 
>> thanks
>> Iain
>> 
>> 
>> 
>> 
>> This is a convenience feature that allows the user to
>> do "make check-gcc-jit" at the top level of the build
>> to check that facility in isolation from others.
>> 
>> Signed-off-by: Iain Sandoe 
>> 
>> ChangeLog:
>> 
>> * Makefile.def: Add a jit check target for the jit
>> language.
>> * Makefile.in: Regenerate.
>> ---
>>  Makefile.def | 1 +
>>  Makefile.in  | 8 
>>  2 files changed, 9 insertions(+)
>> 
>> diff --git a/Makefile.def b/Makefile.def
>> index fbfdb6fee08..7cbeca5b181 100644
>> --- a/Makefile.def
>> +++ b/Makefile.def
>> @@ -654,6 +654,7 @@ languages = { language=go;  gcc-check-
>> target=check-go;
>> lib-check-target=check-gotools; };
>>  languages = { language=d;  gcc-check-target=check-d;
>> lib-check-target=check-target-
>> libphobos; };
>> +languages = { language=jit;gcc-check-target=check-jit; };
>>  
>>  // Toplevel bootstrap
>>  bootstrap_stage = { id=1 ; };

Re: [PATCH] analyzer: Impose recursion limit on indirect calls.

2021-08-27 Thread Martin Liška


On 8/27/21 14:44, Ankur Saini wrote:

While working on patch to convert most of the 8 whitespace characters to tabs 
in the analyzer’s source, I see this weird behaviour where at some places 
indentation looks weird when viewed in patch file ( generated via “git 
format-patch” )  but looks alright when viewed in text editor.


Yes, I can confirm the patch fixed the white spaces, you can verify that with:
$ git diff | ./contrib/check_GNU_style.py -
...

Note there are some other formatting issues reported by the script.

Martin




Here is the current patch file, Can someone look at this and see if they also 
experience the same ?

Re: [PATCH 15/34] rs6000: Execute the automatic built-in initialization code

2021-08-27 Thread Segher Boessenkool

On Fri, Aug 27, 2021 at 07:35:05AM -0500, Bill Schmidt wrote:
> On 8/26/21 6:15 PM, Segher Boessenkool wrote:
> >On Thu, Jul 29, 2021 at 08:31:02AM -0500, Bill Schmidt wrote:
> >>+  /* Execute the autogenerated initialization code for builtins.  */
> >>+  rs6000_autoinit_builtins ();
> >The name "autoinit" isn't so great (what "self" does this "auto" refer
> >to?), but perhaps some later patch fixes this up?  It is minor of
> >course, but the bigger something is, the better name that it deserves.
> >Names shape thoughts, and we should make the architecture of our code as
> >clear as possible.
> 
> Well, "autoinit" was meant to mean "automated initialization."  But I 
> take your point.

*Everything* GCC does is "automated" in some way, the term is
meaningless :-)

> What would you say to "rs6000_init_generated_builtins"?

That is fine.

Thanks!


Segher

Re: [PATCH] analyzer: Impose recursion limit on indirect calls.

2021-08-27 Thread Ankur Saini via Gcc-patches

While working on patch to convert most of the 8 whitespace characters to tabs 
in the analyzer’s source, I see this weird behaviour where at some places 
indentation looks weird when viewed in patch file ( generated via “git 
format-patch” )  but looks alright when viewed in text editor.


Here is the current patch file, Can someone look at this and see if they also 
experience the same ?




fix.patch
Description: Binary data

Re: [PATCH 15/34] rs6000: Execute the automatic built-in initialization code

2021-08-27 Thread Bill Schmidt via Gcc-patches


Hi Segher,

On 8/26/21 6:15 PM, Segher Boessenkool wrote:

Hi!

On Thu, Jul 29, 2021 at 08:31:02AM -0500, Bill Schmidt wrote:

* config/rs6000/rs6000-call.c (rs6000-builtins.h): New #include.
(rs6000_init_builtins): Call rs6000_autoinit_builtins; skip the old
initialization logic when new builtins are enabled.

s/; s/.  S/


+  /* Execute the autogenerated initialization code for builtins.  */
+  rs6000_autoinit_builtins ();

The name "autoinit" isn't so great (what "self" does this "auto" refer
to?), but perhaps some later patch fixes this up?  It is minor of
course, but the bigger something is, the better name that it deserves.
Names shape thoughts, and we should make the architecture of our code as
clear as possible.


Well, "autoinit" was meant to mean "automated initialization."  But I 
take your point.  What would you say to "rs6000_init_generated_builtins"?



+#ifdef SUBTARGET_INIT_BUILTINS
+  SUBTARGET_INIT_BUILTINS;
+#endif

Let's see how this shapes up.  Preferably we won't have an #ifdef but
an empty macro (or a "do {} while (0)"), etc.
To be clear, this isn't new code.  It's just the only part of the old 
code that isn't replaced by the generated initialization. You'll see it 
repeated at the end of rs6000_init_builtins.  This is used on our port 
for adding one builtin from Darwin.  It's a standard macro, also used in 
the aarch64, i386, and netbsd ports.


Okay for trunk, if this is revisited later.  Thanks!


Thanks for the review!  I'll commit after we agree on a name. (This will 
require minor changes to rs6000-gen-builtins.c to change the name of the 
generated function in rs6000-builtins.[ch].)


Bill



Segher

[PATCH] nvptx: Use cvt to perform sign-extension of truncation.

2021-08-27 Thread Roger Sayle


This patch introduces some new define_insn rules to the nvptx backend,
to perform sign-extension of a truncation (from and to the same mode),
using a single cvt instruction.  As an example, the following function

int foo(int x) { return (char)x; }

with -O2 currently generates:

mov.u32 %r24, %ar0;
mov.u32 %r26, %r24;
cvt.s32.s8  %value, %r26;

and with this patch, now generates:

mov.u32 %r24, %ar0;
cvt.s32.s8  %value, %r24;

This patch has been tested on nvptx-none hosted by x86_64-pc-linux-gnu
with a top-level "make" (including newlib) and a "make check" with no
new regressions.  Ok for mainline?


2021-08-27  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.md (*extend_trunc_2_qi,
*extend_trunc_2_hi, *extend_trunc_di2_si): New insns.
Use cvt to perform sign-extension of truncation in one step.

gcc/testsuite/ChangeLog
* gcc.target/nvptx/exttrunc.c: New test case.


Roger
--

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 108de1c..b7a0393 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -401,6 +401,32 @@
%.\\tst%A0.u%T0\\t%0, %1;"
   [(set_attr "subregs_ok" "true")])
 
+;; Sign-extensions of truncations
+
+(define_insn "*extend_trunc_2_qi"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+   (sign_extend:HSDIM
+(truncate:QI (match_operand:HSDIM 1 "nvptx_register_operand" "R"]
+  ""
+  "%.\\tcvt.s%T0.s8\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "*extend_trunc_2_hi"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+   (sign_extend:SDIM
+(truncate:HI (match_operand:SDIM 1 "nvptx_register_operand" "R"]
+  ""
+  "%.\\tcvt.s%T0.s16\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "*extend_trunc_di2_si"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+   (sign_extend:DI
+(truncate:SI (match_operand:DI 1 "nvptx_register_operand" "R"]
+  ""
+  "%.\\tcvt.s64.s32\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
 ;; Integer arithmetic
 
 (define_insn "add3"
/* { dg-do compile } */
/* { dg-options "-O2" } */

short exttrunc_hi2_qi(short x)
{
  return (char)x;
}

int exttrunc_si2_qi(int x)
{
  return (char)x;
}

long exttrunc_di2_qi(long x)
{
  return (char)x;
}

int exttrunc_si2_hi(int x)
{
  return (short)x;
}

long exttrunc_di2_hi(long x)
{
  return (short)x;
}

long exttrunc_di2_si(long x)
{
  return (int)x;
}

/* { dg-final { scan-assembler-not "cvt.u16.u32" } } */
/* { dg-final { scan-assembler-not "cvt.u16.u64" } } */
/* { dg-final { scan-assembler-not "cvt.u32.u64" } } */

Re: GCC documentation: porting to Sphinx

2021-08-27 Thread Martin Liška


On 8/10/21 17:43, Martin Liška wrote:

Hello.

I've just pushed the rebased branch here:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/sphinx-v4

which I force push once I rebase it. One can fetch the branch with:
$ git fetch origin refs/users/marxin/heads/sphinx-v4


Hello.

There's updated version of the patch set sitting here:
refs/users/marxin/heads/sphinx-v5



Generated output (directly made from GCC source tree) can be seen here:
https://splichal.eu/gccsphinx-final/


And can be seen here.



Changes since the previous version:

1) rebased on the current master (including addition of new target hooks, etc.)
2) two new directive/roles were added in order to not abuse 'option' directive:
    gcc-attr (function/label/... attribute) and gcc-param (--param foo=bar).
    Addition was quite straightforward and we would benefit as these attributes
    and parameters will be listed grouped in the Index:
    https://splichal.eu/gccsphinx-final/html/gcc/genindex.html
3) default syntax language was set to 'none'; made issue for e.g. chunks in 
license pages
    where a random Python keywords were highlighted


Changes since the previous version:
1) Cross manual references are working. It works surprisingly well and we have 
much more cross references now
(for things like options, ...).
2) I have a new version of update_web_docs_git that will be very simple:
   make -C doc html/pdf SOURCEDIR=... BUILDDIR=...
3) URL link creating was updated in diagnostics.c
4) I have a patch candidate that skips links in Info format:
   https://github.com/sphinx-doc/sphinx/pull/9578
5) default domain was switched to cpp and Sphinx community fixed various issues 
mentioned in:
   https://github.com/sphinx-doc/sphinx/issues/9535
6) I made one round of proof-reading of the manuals where I focused on the 
formatting issues



What needs to be done (TODO list):

1) Cross manual references are missing - we have some of them and Sphinx has 
nice support for it:
    https://docs.readthedocs.io/en/stable/guides/intersphinx.html
2) Remove `Texinfo Manuals` section in GCC internal manual
3) Document required packages for PDF, MAN, HTML, ..
4) Write How to write documentation, we can take inspiration from Kernel:
    https://www.kernel.org/doc/html/latest/doc-guide/index.html
5) Update update_web_docs_git - that will simply run Sphinx' Makefile
6) URL emission code needs to be changed in diagnostics.c
7) link emission should be ignored in Info builder
8) epub target should be added to Makefiles
9) function/struct/type attribute definition should be simplified as
    :gcc-attr: attr_name ("options")
    does not work with a reference to it: :gcc-atr:`attr_name`
    An attribute format should be placed into the attr description.
10) default domain should be switched to cpp, will lead to warnings as seen 
here:
     https://github.com/sphinx-doc/sphinx/issues/9535
11) many function and macros in gccint should promoted to '.. function::' and
     '.. macro::' directive
12) various smaller formatting issues need to be addressed


What needs to be done (TODO list):

1) Remove `Texinfo Manuals` section in GCC internal manual
2) Document required packages for PDF, MAN, HTML, ..
3) Write How to write documentation, we can take inspiration from Kernel:
https://www.kernel.org/doc/html/latest/doc-guide/index.html
4) epub target should be added to Makefiles
5) function/struct/type attribute definition should be simplified as
:gcc-attr: attr_name ("options")
does not work with a reference to it: :gcc-atr:`attr_name`
An attribute format should be placed into the attr description.
6) many function and macros in gccint should promoted to '.. function::' and
   '.. macro::' directive (partialy done)

Thoughts about current status of the migration process?

Thanks,
Martin



Known limitations:
1) chapter description (in TOC listing) is dropped in info pages
2) PDF output puts item description on the same line as item definition - 
noticed by Tamar

As previously mentioned, I'm willing to work on the majority of the points 
mentioned in the TODO list.
Now I see the current patchset in a reasonable shape and I'm asking the 
community for approval of the changes?
And I would appreciate a help with any of the items listed in the TODO list.

Thanks,
Martin

[PATCH v3] MIPS: add .module mipsREV to all output asm file

2021-08-27 Thread YunQiang Su

Currently, the asm output file for MIPS has no rev info.
It can make some trouble, for example:
  assembler is mips1 by default,
  gcc is fpxx by default.
To assemble the output of gcc -S, we have to pass -mips2
to assembler.

gcc/ChangeLog:

* gcc/config/mips/mips.c (mips_module_isa_name): New.
mips_file_start: add .module mipsREV to all asm output
---
 gcc/config/mips/mips.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 2f7ffe846..47c149eac 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -9841,6 +9841,32 @@ mips_mdebug_abi_name (void)
 }
 }
 
+static const char *
+mips_module_isa_name ()
+{
+  switch (mips_isa)
+{
+case 1:return "mips1";
+case 2:return "mips2";
+case 3:return "mips3";
+case 4:return "mips4";
+case 32:   return "mips32";
+case 33:   return "mips32r2";
+case 34:   return "mips32r3";
+/* 35:r4 is skipped */
+case 36:   return "mips32r5";
+case 37:   return "mips32r6";
+case 64:   return "mips64";
+case 65:   return "mips64r2";
+case 66:   return "mips64r3";
+case 67:   return "mips64r5";
+/* 68:r4 is skipped */
+case 69:   return "mips64r6";
+default:
+  gcc_unreachable ();
+}
+}
+
 /* Implement TARGET_ASM_FILE_START.  */
 
 static void
@@ -9872,6 +9898,9 @@ mips_file_start (void)
 fprintf (asm_out_file, "\t.nan\t%s\n",
 mips_nan == MIPS_IEEE_754_2008 ? "2008" : "legacy");
 
+  fprintf (asm_out_file, "\t.module\t%s\n",
+  mips_module_isa_name ());
+
 #ifdef HAVE_AS_DOT_MODULE
   /* Record the FP ABI.  See below for comments.  */
   if (TARGET_NO_FLOAT)
-- 
2.30.2

Re: [PATCH] Optimize macro: make it more predictable

2021-08-27 Thread Richard Biener via Gcc-patches

On Fri, Aug 27, 2021 at 10:35 AM Martin Liška  wrote:
>
> On 8/26/21 14:39, Martin Liška wrote:
> > I can investigate that.
>
> So there are statistics about #pragma GCC optimize and optimize attribute in 
> openSUSE:Factory.
> The numbers at the line beginning represent number of functions affected by 
> the pragma or attribute.
> Are we smarter in what we want?

So with ignoring darktable which seems completely insane the cases
will likely continue
to work as intended if we change from the current scheme to appending
as proposed.

Richard.

> Processed 14026 packages, attr/pragma used in 111
>
>
> --- Mesa-drivers.log ---
>
> 1x: "O1"
>
> --- Mesa.log ---
>
> 1x: "O1"
>
> --- PrusaSlicer.log ---
>
> 44x: "-fno-unsafe-math-optimizations"
>
> --- SVT-AV1.log ---
>
> 2x: "unroll-loops"
>
> --- abseil-cpp.log ---
>
> 4x: "no-optimize-sibling-calls"
>
> --- analitza.log ---
>
> 1x: "-fno-unsafe-math-optimizations"
>
> --- argon2.log ---
>
> 2x: "O0"
>
> --- arpack-ng:serial.log ---
>
> 2x: "-fno-unsafe-math-optimizations"
>
> --- avogadrolibs.log ---
>
> 106x: "-fno-unsafe-math-optimizations"
>
> --- caddy.log ---
>
> 32x: "no-tree-vectorize"
>
> --- calligra.log ---
>
> 1x: "-fno-unsafe-math-optimizations"
>
> --- ceph-test.log ---
>
> 2x: "no-tree-vectorize"
>
> --- ceph.log ---
>
> 2x: "no-tree-vectorize"
>
> --- ceres-solver.log ---
>
> 130x: "-fno-unsafe-math-optimizations"
>
> --- cmake:mini.log ---
>
> 1x: "no-tree-vectorize"
>
> --- cpustat.log ---
>
> 15x: "-O3"
>
> --- cross-aarch64-gcc11-bootstrap.log ---
>
> 32x: "no-omit-frame-pointer"
>
> --- cross-amdgcn-gcc11.log ---
>
> 12x: "O2"
>
> 10x: "-fno-tree-loop-distribute-patterns"
>
> 1x: "-fno-tree-loop-distribute-patterns""-fno-tree-loop-distribute-patterns"
>
> --- cross-nvptx-gcc11.log ---
>
> 6x: "-fno-tree-loop-distribute-patterns"
>
> 2x: "O2"
>
> --- csound.log ---
>
> 6x: "no-finite-math-only"
>
> 1x: "-fno-unsafe-math-optimizations"
>
> --- darktable.log ---
>
> 65213x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", 
> "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", 
> "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", 
> "split-ivs-in-unroller", "tree-loop-vectorize", 
> "variable-expansion-in-unroller", "split-loops", "ivopts", 
> "predictive-commoning", "tree-loop-linear", "loop-block", "loop-strip-mine", 
> "finite-math-only", "fp-contract=fast", "fast-math", "no-math-errno"
>
> 4981x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", 
> "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", 
> "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", 
> "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", 
> "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", 
> "loop-strip-mine", "finite-math-only", "fp-contract=fast", "fast-math"
>
> 1021x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", 
> "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", 
> "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", 
> "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", 
> "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", 
> "loop-strip-mine", "fp-contract=fast", "tree-vectorize"
>
> 1018x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", 
> "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", 
> "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", 
> "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", 
> "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", 
> "loop-strip-mine", "finite-math-only", "fp-contract=fast", "fast-math", 
> "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", 
> "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", 
> "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", 
> "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", 
> "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", 
> "loop-strip-mine", "finite-math-only", "fp-contract=fast", "fast-math", 
> "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", 
> "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", 
> "tree-loop-im", "unsw
>  itch-loops", "tree-loop-ivcanon", "ira-loop-pressure", 
> "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", 
> "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", 
> "loop-strip-mine", "finite-math-only", "fp-contract=fast", "fast-math", 
> "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", 
> "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", 
> "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", 
> "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", 
> "ivopts", "predictive-commoning", "tree-loop-linear",

Re: [PATCH] Optimize macro: make it more predictable

2021-08-27 Thread Martin Liška


On 8/26/21 14:39, Martin Liška wrote:

I can investigate that.


So there are statistics about #pragma GCC optimize and optimize attribute in 
openSUSE:Factory.
The numbers at the line beginning represent number of functions affected by the 
pragma or attribute.
Are we smarter in what we want?

Processed 14026 packages, attr/pragma used in 111


--- Mesa-drivers.log ---

1x: "O1"

--- Mesa.log ---

1x: "O1"

--- PrusaSlicer.log ---

44x: "-fno-unsafe-math-optimizations"

--- SVT-AV1.log ---

2x: "unroll-loops"

--- abseil-cpp.log ---

4x: "no-optimize-sibling-calls"

--- analitza.log ---

1x: "-fno-unsafe-math-optimizations"

--- argon2.log ---

2x: "O0"

--- arpack-ng:serial.log ---

2x: "-fno-unsafe-math-optimizations"

--- avogadrolibs.log ---

106x: "-fno-unsafe-math-optimizations"

--- caddy.log ---

32x: "no-tree-vectorize"

--- calligra.log ---

1x: "-fno-unsafe-math-optimizations"

--- ceph-test.log ---

2x: "no-tree-vectorize"

--- ceph.log ---

2x: "no-tree-vectorize"

--- ceres-solver.log ---

130x: "-fno-unsafe-math-optimizations"

--- cmake:mini.log ---

1x: "no-tree-vectorize"

--- cpustat.log ---

15x: "-O3"

--- cross-aarch64-gcc11-bootstrap.log ---

32x: "no-omit-frame-pointer"

--- cross-amdgcn-gcc11.log ---

12x: "O2"

10x: "-fno-tree-loop-distribute-patterns"

1x: "-fno-tree-loop-distribute-patterns""-fno-tree-loop-distribute-patterns"

--- cross-nvptx-gcc11.log ---

6x: "-fno-tree-loop-distribute-patterns"

2x: "O2"

--- csound.log ---

6x: "no-finite-math-only"

1x: "-fno-unsafe-math-optimizations"

--- darktable.log ---

65213x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", 
"ira-loop-pressure", "split-ivs-in-unroller", "tree-loop-vectorize", "variable-expansion-in-unroller", "split-loops", "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", "loop-strip-mine", 
"finite-math-only", "fp-contract=fast", "fast-math", "no-math-errno"

4981x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", 
"ira-loop-pressure", "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", "loop-strip-mine", 
"finite-math-only", "fp-contract=fast", "fast-math"

1021x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", "tree-loop-im", "unswitch-loops", 
"tree-loop-ivcanon", "ira-loop-pressure", "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", 
"loop-strip-mine", "fp-contract=fast", "tree-vectorize"

1018x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", "loop-strip-mine", 
"finite-math-only", "fp-contract=fast", "fast-math", "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", "ivopts", "predictive-commoning", 
"tree-loop-linear", "loop-block", "loop-strip-mine", "finite-math-only", "fp-contract=fast", "fast-math", "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", "tree-loop-im", "unsw
itch-loops", "tree-loop-ivcanon", "ira-loop-pressure", "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", "loop-strip-mine", "finite-math-only", "fp-contract=fast", "fast-math", "unroll-loops", 
"tree-loop-if-convert", "tree-loop-distribution", "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", "ira-loop-pressure", "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", "ivopts", 
"predictive-commoning", "tree-loop-linear", "loop-block", "loop-strip-mine", "finite-math-only", "fp-contract=fast", "fast-math", "tree-vectorize"

1011x: "unroll-loops", "tree-loop-if-convert", "tree-loop-distribution", "no-strict-aliasing", "loop-interchange", "loop-nest-optimize", "tree-loop-im", "unswitch-loops", "tree-loop-ivcanon", 
"ira-loop-pressure", "split-ivs-in-unroller", "variable-expansion-in-unroller", "split-loops", "ivopts", "predictive-commoning", "tree-loop-linear", "loop-block", "loop-strip-mine", "finite-math-only", 
"fp-contr

[PATCH] tree-optimization/45178 - DCE of dead control flow in infinite loop

2021-08-27 Thread Richard Biener via Gcc-patches

This fixes DCE to be able to elide dead control flow in an
infinite loop without an exit edge.  This special situation is
handled well by the code finding an edge to preserve since there's
no chance it will find the exit edge and make the loop finite.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-08-27  Richard Biener  

PR tree-optimization/45178
* tree-ssa-dce.c (find_obviously_necessary_stmts): For
infinite loops without exit do not mark control dependent
edges of the latch necessary.

* gcc.dg/tree-ssa/ssa-dce-3.c: Adjust testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-3.c |  9 +
 gcc/tree-ssa-dce.c| 14 +-
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-3.c
index 863aa79b4eb..fdfe37e0abd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-3.c
@@ -21,11 +21,12 @@ int main(void)
   return 0;
 }
 
-/* We now can prove the infiniteness of the loop during CCP and fail
-   to eliminate the code inside the infinite loop because we start
-   by marking the j % 7 condition as useful.  See PR45178.  */
+/* We now can prove the infiniteness of the loop during CCP but we
+   still want to eliminate the code inside the infinite loop.  See PR45178.  */
 
 /* We should eliminate the inner condition, but the loop must be preserved
-   as it is infinite.  Therefore there should be just one goto and no PHI.  */
+   as it is infinite.  Therefore there should be just one goto and no PHI
+   and no if.  */
 /* { dg-final { scan-tree-dump-times "PHI " 0 "cddce1" } } */
+/* { dg-final { scan-tree-dump-times "if " 0 "cddce1" } } */
 /* { dg-final { scan-tree-dump-times "goto" 1 "cddce1" } } */
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 0778eb9704a..c4907af923c 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -414,7 +414,9 @@ find_obviously_necessary_stmts (bool aggressive)
   if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
 return;
 
-  /* Prevent the empty possibly infinite loops from being removed.  */
+  /* Prevent the empty possibly infinite loops from being removed.  This is
+ needed to make the logic in remove_dead_stmt work to identify the
+ correct edge to keep when removing a controlling condition.  */
   if (aggressive)
 {
   if (mark_irreducible_loops ())
@@ -426,17 +428,19 @@ find_obviously_necessary_stmts (bool aggressive)
  && (e->flags & EDGE_IRREDUCIBLE_LOOP))
{
  if (dump_file)
-   fprintf (dump_file, "Marking back edge of irreducible loop 
%i->%i\n",
-e->src->index, e->dest->index);
+   fprintf (dump_file, "Marking back edge of irreducible "
+"loop %i->%i\n", e->src->index, e->dest->index);
  mark_control_dependent_edges_necessary (e->dest, false);
}
  }
 
   for (auto loop : loops_list (cfun, 0))
-   if (!finite_loop_p (loop))
+   /* For loops without an exit do not mark any condition.  */
+   if (loop->exits->next && !finite_loop_p (loop))
  {
if (dump_file)
- fprintf (dump_file, "cannot prove finiteness of loop %i\n", 
loop->num);
+ fprintf (dump_file, "cannot prove finiteness of loop %i\n",
+  loop->num);
mark_control_dependent_edges_necessary (loop->latch, false);
  }
 }
-- 
2.31.1

Re: [PATCH v3] Fix incomplete computation in fill_always_executed_in_1

2021-08-27 Thread Richard Biener via Gcc-patches

On Thu, 26 Aug 2021, Xionghu Luo wrote:

> 
> 
> On 2021/8/24 16:20, Richard Biener wrote:
> > On Tue, 24 Aug 2021, Xionghu Luo wrote:
> > 
> >>
> >>
> >> On 2021/8/19 20:11, Richard Biener wrote:
>  -  class loop *inn_loop = loop;
> 
>   if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
> {
>  @@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop, 
>  sbitmap contains_call)
>    to disprove this if possible).  */
> if (bb->flags & BB_IRREDUCIBLE_LOOP)
>   break;
>  -
>  -  if (!flow_bb_inside_loop_p (inn_loop, bb))
>  -break;
>  -
>  -  if (bb->loop_father->header == bb)
>  -{
>  -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>  -break;
>  -
>  -  /* In a loop that is always entered we may proceed anyway.
>  - But record that we entered it and stop once we leave 
>  it.  */
>  -  inn_loop = bb->loop_father;
>  -}
>   }
> 
>   while (1)
> >>> I'm not sure this will work correct (I'm not sure how the existing
> >>> code makes it so either...).  That said, I can't poke any hole
> >>> into the change.  What I see is that definitely
> >>>
> >>> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >>>   last = bb;
> >>>
> >>> if (bitmap_bit_p (contains_call, bb->index))
> >>>   break;
> >>>
> >>> doesn't work reliably since the DOM ordering will process blocks
> >>> A B and C in random order for
> >>>
> >>> for (;;)
> >>>  {
> >>> if (cond)
> >>>   {
> >>> A: foo ();
> >>>   }
> >>> else B:;
> >>> C:;
> >>>  }
> >>>
> >>> and thus we can end up setting 'last' to C_before_  processing
> >>> 'A' and thus arriving at the call foo () ...
> >>>
> >>> get_loop_body_in_dom_order does some "special sauce" but not
> >>> to address the above problem - but it might be that a subtle
> >>> issue like the above is the reason for the inner loop handling.
> >>> The inner loop block order does_not_  adhere to this "special sauce",
> >>> that is - the "Additionally, if a basic block s dominates
> >>> the latch, then only blocks dominated by s are be after it."
> >>> guarantee holds for the outer loop latch, not for the inner.
> >>>
> >>> Digging into the history of fill_always_executed_in_1 doesn't
> >>> reveal anything - the inner loop handling has been present
> >>> since introduction by Zdenek - but usually Zdenek has a reason
> >>> for doing things as he does;)
> >>
> >> Yes, this is really complicated usage, thanks for point it out. :)
> >> I constructed two cases to verify this with inner loop includes "If A; 
> >> else B; C".
> >> Finding that fill_sons_in_loop in get_loop_body_in_dom_order will also 
> >> checks
> >> whether the bb domintes outer loop’s latch, if C dominate outer loop’s 
> >> latch,
> >> C is postponed, the access order is ABC, 'last' won’t be set to C if A or 
> >> B contains call;
> > 
> > But it depends on the order of visiting ABC and that's hard to put into
> > a testcase since it depends on the order of edges and the processing
> > of the dominance computation.  ABC are simply unordered with respect
> > to a dominator walk.
> > 
> >> Otherwise if C doesn’t dominate outer loop’s latch in fill_sons_in_loop,
> >> the access order is CAB, but 'last' also won’t be updated to C in 
> >> fill_always_executed_in_1
> >> since there is also dominate check, then if A or B contains call, it could 
> >> break
> >> successfully.
> >>
> >> C won't be set to ALWAYS EXECUTED for both circumstance.
> >>
> >>>
> >>> Note it might be simply a measure against quadratic complexity,
> >>> esp. since with your patch we also dive into not always executed
> >>> subloops as you remove the
> >>>
> >>> if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >>>   break;
> >>>
> >>> check.  I suggest to evaluate behavior of the patch on a testcase
> >>> like
> >>>
> >>> void foo (int n, int **k)
> >>> {
> >>> for (int i = 0; i < n; ++i)
> >>>   if (k[0][i])
> >>> for (int j = 0; j < n; ++j)
> >>>   if (k[1][j])
> >>> for (int l = 0; l < n; ++l)
> >>>   if (k[2][l])
> >>> ...
> >>> }
> >>
> >> Theoretically the complexity is changing from L1(bbs) to 
> >> L1(bbs)+L2(bbs)+L3(bbs)+…+Ln(bbs),
> >> so fill_always_executed_in_1's execution time is supposed to be increase 
> >> from
> >> O(n) to O(n2)?  The time should depend on loop depth and bb counts.   I 
> >> also drafted a
> >> test case has 73-depth loop function with 25 no-ipa function copies each 
> >> compiled
> >> in lim2 and lim4 dependently.  Total execution time of 
> >> fill_always_executed_in_1 is
> >> increased from 32ms to 58ms, almost doubled but

57 matches

Mail list logo