Re: Fix handling of gimple_clobber in ipa_modref

2020-09-25 Thread Richard Biener via Gcc-patches
On September 26, 2020 12:04:24 AM GMT+02:00, Jan Hubicka  wrote:
>Hi,
>while adding a check for gimple_clobber I reversed the return value,
>so instead of ignoring the statement ipa-modref gives up.  Fixed thus.
>This explains the drop between the originally reported disambiguation
>stats and the ones I got later.

I don't think you can ignore clobbers. They are barriers for code motion. 

Richard. 


>Bootstrapped/regtested x86_64-linux.
>
>gcc/ChangeLog:
>
>2020-09-25  Jan Hubicka  
>
>   * ipa-modref.c (analyze_stmt): Fix return value for gimple_clobber.
>
>diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
>index aa6929ff010..44b844b90db 100644
>--- a/gcc/ipa-modref.c
>+++ b/gcc/ipa-modref.c
>@@ -658,7 +658,7 @@ analyze_stmt (modref_summary *summary, gimple
>*stmt, bool ipa)
> {
>   /* There is no need to record clobbers.  */
>   if (gimple_clobber_p (stmt))
>-return false;
>+return true;
>   /* Analyze all loads and stores in STMT.  */
>   walk_stmt_load_store_ops (stmt, summary,
>   analyze_load, analyze_store);
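
For context, a clobber marks the point where an object's storage lifetime
ends; a minimal sketch of where clobbers come from (hypothetical code):

   void g (char *);
   void f (void)
   {
     {
       char buf[64];
       g (buf);
     }  /* at the closing brace GIMPLE gets  buf ={v} {CLOBBER};
           marking the storage dead */
     {
       char buf2[64];  /* may share buf's stack slot thanks to the clobber */
       g (buf2);
     }
   }

Moving a load or store of buf across the clobber would be invalid, which is
why clobbers are barriers for code motion even though modref need not record
them as loads or stores.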



Re: [PATCH v2] c++: Implement -Wrange-loop-construct [PR94695]

2020-09-25 Thread Jason Merrill via Gcc-patches

On 9/25/20 6:01 PM, Marek Polacek wrote:

On Fri, Sep 25, 2020 at 04:09:44PM -0400, Jason Merrill via Gcc-patches wrote:

On 9/24/20 8:05 PM, Marek Polacek wrote:

This new warning can be used to prevent expensive copies inside range-based
for-loops, for instance:

struct S { char arr[128]; };
void fn () {
  S arr[5];
  for (const auto x : arr) {  }
}

where auto deduces to S and then we copy the big S in every iteration.
Using "const auto " would not incur such a copy.  With this patch the
compiler will warn:

q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' 
[-Wrange-loop-construct]
  4 |   for (const auto x : arr) {  }
|   ^
q.C:4:19: note: use reference type 'const S&' to prevent copying
  4 |   for (const auto x : arr) {  }
|   ^
|   &

As per Clang, this warning is suppressed for trivially copyable types
whose size does not exceed 64B.  The tricky part of the patch was how
to figure out if using a reference would have prevented a copy.  I've
used perform_implicit_conversion to perform the imaginary conversion.
Then if the conversion doesn't have any side-effects, I assume it does
not call any functions or create any TARGET_EXPRs, and is just a simple
assignment like this one:

const T &x = (const T &) <__for_begin>;

But it can also be a CALL_EXPR:

x = (const T &) Iterator::operator* (&__for_begin)

which is still fine -- we just use the return value and don't create
any copies.


Would conv_binds_ref_to_prvalue (implicit_conversion (...)) do what you
want?


Yes, thanks.  I played with conv_binds_ref_to_prvalue before to check
if the non-reference range-decl case creates a copy, but since the
types of the range decl and *__for_begin are the same, we only get
ck_identity for it.  But I never tried it for the & case...  Nevermind.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This new warning can be used to prevent expensive copies inside range-based
for-loops, for instance:

   struct S { char arr[128]; };
   void fn () {
 S arr[5];
 for (const auto x : arr) {  }
   }

where auto deduces to S and then we copy the big S in every iteration.
Using "const auto " would not incur such a copy.  With this patch the
compiler will warn:

q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' 
[-Wrange-loop-construct]
 4 |   for (const auto x : arr) {  }
   |   ^
q.C:4:19: note: use reference type 'const S&' to prevent copying
 4 |   for (const auto x : arr) {  }
   |   ^
   |   &

As per Clang, this warning is suppressed for trivially copyable types
whose size does not exceed 64B.  The tricky part of the patch was how
to figure out if using a reference would have prevented a copy.  I've
used perform_implicit_conversion to perform the imaginary conversion.
Then if the conversion doesn't have any side-effects, I assume it does
not call any functions or create any TARGET_EXPRs, and is just a simple
assignment like this one:

   const T &x = (const T &) <__for_begin>;

But it can also be a CALL_EXPR:

   x = (const T &) Iterator::operator* (&__for_begin)

which is still fine -- we just use the return value and don't create
any copies.
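
For contrast, a case where a reference would not help, because the
iterator's operator* returns by value, looks like this (a hedged sketch
with hypothetical types; this is the situation the
conv_binds_ref_to_prvalue check is meant to detect):

   struct S { char arr[128]; };
   struct It {
     S operator* () const;              // returns a prvalue
     It &operator++ ();
     bool operator!= (const It &) const;
   };
   struct R { It begin () const; It end () const; };

   void g (const R &r)
   {
     for (const auto &x : r) { }        // x binds to a temporary; using a
   }                                    // reference saves no copy here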

This warning is enabled by -Wall.  Further warnings of similar nature
should follow soon.

gcc/c-family/ChangeLog:

PR c++/94695
* c.opt (Wrange-loop-construct): New option.

gcc/cp/ChangeLog:

PR c++/94695
* call.c (ref_conv_binds_directly_p): New function.
* cp-tree.h (ref_conv_binds_directly_p): Declare.
* parser.c (warn_for_range_copy): New function.
(cp_convert_range_for): Call it.

gcc/ChangeLog:

PR c++/94695
* doc/invoke.texi: Document -Wrange-loop-construct.

gcc/testsuite/ChangeLog:

PR c++/94695
* g++.dg/warn/Wrange-loop-construct.C: New test.
---
  gcc/c-family/c.opt|   4 +
  gcc/cp/call.c |  13 ++
  gcc/cp/cp-tree.h  |   1 +
  gcc/cp/parser.c   |  68 +-
  gcc/doc/invoke.texi   |  21 +-
  .../g++.dg/warn/Wrange-loop-construct.C   | 207 ++
  6 files changed, 309 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wrange-loop-construct.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7761eefd203..bbf7da89658 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -800,6 +800,10 @@ Wpacked-not-aligned
  C ObjC C++ ObjC++ Var(warn_packed_not_aligned) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wall)
  Warn when fields in a struct with the packed attribute are misaligned.
  
+Wrange-loop-construct
+C++ ObjC++ Var(warn_range_loop_construct) Warning LangEnabledBy(C++ 
ObjC++,Wall)
+Warn when a range-based for-loop is creating unnecessary copies.
+

Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-09-25 Thread Jason Merrill via Gcc-patches

On 9/22/20 4:05 PM, Martin Sebor wrote:

The rebased and retested patches are attached.

On 9/21/20 3:17 PM, Martin Sebor wrote:
Ping: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html


(I'm working on rebasing the patch on top of the latest trunk, which
has changed some of the same code, but it'd be helpful to get a
go-ahead on the substance of the changes.  I don't expect the rebase to
require any substantive modifications.)

Martin

On 9/14/20 4:01 PM, Martin Sebor wrote:

On 9/4/20 11:14 AM, Jason Merrill wrote:

On 9/3/20 2:44 PM, Martin Sebor wrote:

On 9/1/20 1:22 PM, Jason Merrill wrote:

On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:

-Wplacement-new handles array indices and pointer offsets the same:
by adjusting them by the size of the element.  That's correct for
the latter but wrong for the former, causing false positives when
the element size is greater than one.

In addition, the warning doesn't even attempt to handle arrays of
arrays.  I'm not sure if I forgot or if I simply didn't think of
it.

The attached patch corrects these oversights by replacing most
of the -Wplacement-new code with a call to compute_objsize which
handles all this correctly (plus more), and is also better tested.
But even compute_objsize has bugs: it trips up while converting
wide_int to offset_int for some pointer offset ranges.  Since
handling the C++ IL required changes in this area the patch also
fixes that.

For review purposes, the patch affects just the middle end.
The C++ diff pretty much just removes code from the front end.


The C++ changes are OK.


Thank you for looking at the rest as well.




-compute_objsize (tree ptr, int ostype, access_ref *pref,
-    bitmap *visited, const vr_values *rvals /* = NULL */)
+compute_objsize (tree ptr, int ostype, access_ref *pref, bitmap *visited,
+    const vr_values *rvals)


This reformatting seems unnecessary, and I prefer to keep the 
comment about the default argument.


This overload doesn't take a default argument.  (There was a stray
declaration of a similar function at the top of the file that had
one.  I've removed it.)


Ah, true.


-  if (!size || TREE_CODE (size) != INTEGER_CST)
-   return false;

 >...

You change some failure cases in compute_objsize to return success 
with a maximum range, while others continue to return failure. 
This needs commentary about the design rationale.


This is too much for a comment in the code but the background is
this: compute_objsize initially returned the object size as a 
constant.

Recently, I have enhanced it to return a range to improve warnings for
allocated objects.  With that, a failure can be turned into success by
having the function set the range to that of the largest object.  That
should simplify the function's callers and could even improve
the detection of some invalid accesses.  Once this change is made
it might even be possible to change its return type to void.

The change that caught your eye is necessary to make the function
a drop-in replacement for the C++ front end code which makes this
same assumption.  Without it, a number of test cases that exercise
VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example:

   void f (int n)
   {
 char a[n];
 new (a - 1) int ();
   }

Changing any of the other places isn't necessary for existing tests
to pass (and I didn't want to introduce too much churn).  But I do
want to change the rest of the function along the same lines at some
point.


Please do change the other places to be consistent; better to have 
more churn than to leave the function half-updated.  That can be a 
separate patch if you prefer, but let's do it now rather than later.


I've made most of these changes in the other patch (also attached).
I'm quite happy with the result but it turned out to be a lot more
work than either of us expected, mostly due to the amount of testing.

I've left a couple of failing cases in place mainly as reminders
to handle them better (which means I also didn't change the caller
to avoid testing for failures).  I've also added TODO notes with
reminders to handle some of the new codes more completely.




+  special_array_member sam{ };


sam is always set by component_ref_size, so I don't think it's 
necessary to initialize it at the declaration.


I find initializing pass-by-pointer local variables helpful but
I don't insist on it.




@@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min)
   tree last_type = TREE_TYPE (last);
   if (TREE_CODE (last_type) != ARRAY_TYPE
   || TYPE_SIZE (last_type))
-    return size;
+    return size ? size : TYPE_SIZE_UNIT (type);


This change seems to violate the comment for the function.


By my reading (and writing) the change is covered by the first
sentence:

    Returns the size of the object designated by DECL considering
    its initializer if it either has one or if it would not affect
    its size, ...


OK, I see it now.
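
For instance, the trailing-array case at issue looks like this (a sketch;
initializing a flexible array member this way is a GNU extension):

   struct A { int n; char a[]; };
   struct A a0 = { 1 };               /* size is sizeof (struct A) */
   struct A a1 = { 1, { 1, 2, 3 } };  /* initializer extends the object
                                         past TYPE_SIZE (struct A), so
                                         here it does affect the size */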


It handles a 

[committed] analyzer: add test for placement new

2020-09-25 Thread David Malcolm via Gcc-patches
Successfully regtested on x86_64-pc-linux-gnu.
Pushed to master as r11-3472-gd4a906e7b51f3fc31f3328810f45ae4cf2e7bbc3.

gcc/testsuite/ChangeLog:
PR analyzer/94355
* g++.dg/analyzer/placement-new.C: New test.
---
 gcc/testsuite/g++.dg/analyzer/placement-new.C | 26 +++
 1 file changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/placement-new.C

diff --git a/gcc/testsuite/g++.dg/analyzer/placement-new.C 
b/gcc/testsuite/g++.dg/analyzer/placement-new.C
new file mode 100644
index 000..8250f45b9d9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/analyzer/placement-new.C
@@ -0,0 +1,26 @@
+#include <new>
+
+/* Placement new.  */
+
+void test_1 (void)
+{
+  char buf[sizeof(int)];
+  int *p = new(buf) int (42);
+}
+
+/* Placement new[].  */
+
+void test_2 (void)
+{
+  char buf[sizeof(int) * 10];
+  int *p = new(buf) int[10];
+}
+
+/* Delete of placement new.  */
+
+void test_3 (void)
+{
+  char buf[sizeof(int)];
+  int *p = new(buf) int (42);
+  delete p; // { dg-warning "memory not on the heap" }
+}
-- 
2.26.2



[committed] analyzer: fix ICEs treeifying offset_region [PR96646, PR96841]

2020-09-25 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to master as r11-3471-g29f5db8ef81fac4db8e66e5f06fdf1d469e8161c.

gcc/analyzer/ChangeLog:
PR analyzer/96646
PR analyzer/96841
* region-model.cc (region_model::get_representative_path_var):
When handling offset_region, wrap the MEM_REF's first argument in
an ADDR_EXPR of pointer type, rather than simply using the tree
for the parent region.  Require the MEM_REF's second argument to
be an integer constant.

gcc/testsuite/ChangeLog:
PR analyzer/96646
PR analyzer/96841
* gcc.dg/analyzer/pr96646.c: New test.
* gcc.dg/analyzer/pr96841.c: New test.
---
 gcc/analyzer/region-model.cc|  7 +--
 gcc/testsuite/gcc.dg/analyzer/pr96646.c | 24 
 gcc/testsuite/gcc.dg/analyzer/pr96841.c | 23 +++
 3 files changed, 52 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96646.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr96841.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 981fb779df2..a88a295a241 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2140,11 +2140,14 @@ region_model::get_representative_path_var (const region 
*reg,
path_var offset_pv
  = get_representative_path_var (offset_reg->get_byte_offset (),
 visited);
-   if (!offset_pv)
+   if (!offset_pv || TREE_CODE (offset_pv.m_tree) != INTEGER_CST)
  return path_var (NULL_TREE, 0);
+   tree addr_parent = build1 (ADDR_EXPR,
+  build_pointer_type (reg->get_type ()),
+  parent_pv.m_tree);
return path_var (build2 (MEM_REF,
 reg->get_type (),
-parent_pv.m_tree, offset_pv.m_tree),
+addr_parent, offset_pv.m_tree),
 parent_pv.m_stack_depth);
   }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96646.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96646.c
new file mode 100644
index 000..2ac5a03b0e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96646.c
@@ -0,0 +1,24 @@
+/* { dg-additional-options "-O1" } */
+
+struct zx {
+  struct zx *b4, *g0;
+};
+
+struct oo {
+  void *ph;
+  struct zx el;
+};
+
+inline void
+k7 (struct zx *xj)
+{
+  xj->b4->g0 = 0; /* { dg-warning "dereference of NULL" } */
+  xj->b4 = 0;
+}
+
+void
+n8 (struct oo *yx)
+{
+  k7 (&yx->el);
+  n8 (yx);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96841.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96841.c
new file mode 100644
index 000..d9d35f3dce8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96841.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-O1 -Wno-builtin-declaration-mismatch" } */
+
+int
+l8 (void);
+
+__SIZE_TYPE__
+malloc (__SIZE_TYPE__);
+
+void
+th (int *);
+
+void
+bv (__SIZE_TYPE__ ny)
+{
+  int ***mf;
+
+  while (l8 ())
+{
+  *mf = 0;
+  (*mf)[ny] = (int *) malloc (sizeof (int));
+  th ((*mf)[ny]); /* { dg-warning "leak" } */
+}
+}
-- 
2.26.2
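
In C terms, the representative tree the region-model change builds for an
offset_region corresponds roughly to the following (a sketch reusing the
types from pr96646.c above):

   struct zx { struct zx *b4, *g0; };
   struct oo { void *ph; struct zx el; };

   struct zx *
   el_region (struct oo *yx)
   {
     /* The region at a constant byte offset inside the parent object;
        its tree is now MEM_REF (ADDR_EXPR (parent), offset), with the
        offset required to be an INTEGER_CST.  */
     return (struct zx *) ((char *) yx + __builtin_offsetof (struct oo, el));
   }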



Re: [PATCH, rs6000] correct an erroneous BTM value in the BU_P10_MISC define

2020-09-25 Thread Segher Boessenkool
On Fri, Sep 25, 2020 at 03:34:49PM -0500, will schmidt wrote:
> On Fri, 2020-09-25 at 12:36 -0500, Segher Boessenkool wrote:
> > No, it cannot.
> > 
> > This is used for pdepd/pextd/cntlzdm/cnttzdm/cfuged, all of which do
> > need 64-bit registers to do anything sane.
> > 
> > This should really have defined some new builtin class, and I thought
> > we
> > could just be tricky and take a massive shortcut.  Bill has been hit
> > by
> > this already as well, sigh :-(
> 
> Ok.
> 
> The usage of that macro seems to be limited to those that you have
> referenced.  i.e. 
> 
> /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
> BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
> BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
> BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
> BU_P10_MISC_2 (PDEPD, "pdepd", CONST, pdepd)
> BU_P10_MISC_2 (PEXTD, "pextd", CONST, pextd)
> 
> So looking at the power7 entries that have the BTM_POWERPC64 entry..
> 
> BU_P7_MISC_2 (DIVWE,  "divwe",CONST,  dive_si)
> BU_P7_MISC_2 (DIVWEU, "divweu",   CONST,  diveu_si)
> BU_P7_POWERPC64_MISC_2 (DIVDE,"divde",CONST,  dive_di)
> BU_P7_POWERPC64_MISC_2 (DIVDEU,   "divdeu",   CONST,  diveu_di)
> 
> Would it be suitable to rename the P10 macro to 
> BU_P10_POWERPC64_MISC_2 ? 

Yes.  But that requires some more infrastructure I thought...  Maybe not
though?  And we can do that anyway of course, it's not like we do not
have way way way too much there already.

> I'd then debate whether to add a unused macro to fill the gap between
> BU_P10_MISC_1 and BU_P10_MISC_2

Nah, don't bother, those are just names, the numbers are meaningless :-)

> If you've got schemes for a deeper fix, i'd need another hint. :-)

Talk with Bill if this makes things easier for him / harder / no
difference?

Thanks,


Segher


Re: [PATCH 1/2, rs6000] int128 sign extension instructions (partial prereq)

2020-09-25 Thread Segher Boessenkool
On Fri, Sep 25, 2020 at 02:54:01PM -0500, Pat Haugen wrote:
> > +(define_expand "extendditi2"
> > +  [(set (match_operand:TI 0 "gpc_reg_operand")
> > +(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
> > +  "TARGET_POWER10"
> > +  {
> > +/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits 
> > */
> > +rtx temp = gen_reg_rtx (TImode);
> > +emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
> > +emit_insn (gen_extendditi2_vector (operands[0], temp));
> > +DONE;
> > +  }
> > +  [(set_attr "type" "exts")])
> 
> Don't need "type" attr on define_expand since the type will come from the 2 
> individual insns emitted.

Yeah good point, those attrs do not even do anything on the expand as
far as I can see (do *any* attrs, even?)


Segher


Re: [EXTERNAL] Re: [PATCH 2/2, rs6000] VSX load/store rightmost element operations

2020-09-25 Thread Segher Boessenkool
On Fri, Sep 25, 2020 at 10:41:05AM -0500, will schmidt wrote:
> On Thu, 2020-09-24 at 19:40 -0500, Segher Boessenkool wrote:
> > > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-
> > > char.c
> > > @@ -0,0 +1,168 @@
> > > +/*
> > > + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> > > + * vector element and zero/sign extend). */
> > > +
> > > +/* { dg-do compile {target power10_ok} } */
> > > +/* { dg-do run {target power10_hw} } */
> > > +/* { dg-require-effective-target power10_ok } */
> > > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> > 
> > If you dg_require it, why test it on the "dg-do compile" line?  It
> > will
> > *work* with it of course, but it is puzzling :-)
> 
> I've had both compile-time and run-time versions of the test.  In this
> case I wanted to try to handle both, so compile when I can compile it,
> and run when I can run it, etc.
> 
> If that combo doesn't work the way I expect it to, I'll need to split
> them out into separate tests.   

It works, but it does the same thing as
  /* { dg-do compile } */
  /* { dg-do run {target power10_hw} } */
  /* { dg-require-effective-target power10_ok } */
  /* { dg-options "-mdejagnu-cpu=power10 -O2" } */
(so just that first line simplified).

> > > +/* { dg-do compile {target power10_ok} } */
> > > +/* { dg-do run {target power10_hw} } */
> > > +/* { dg-require-effective-target power10_ok } */
> > > +/* { dg-options "-mdejagnu-cpu=power10 -O0" } */
> > 
> > Please comment here what that -O0 is for?  So that we still know when
> > we
> > read it decades from now ;-)
> 
> I've got it commented at least once, I'll make sure to get all the
> instances covered.

Ah I didn't see that one instance.

> > > +/* { dg-final { scan-assembler-times {\mlxvrwx\M} 2 } } */
> > > +/* { dg-final { scan-assembler-times {\mlwax\M} 0 } } */
> > 
> > Maybe all of  {\mlwa}  here?
> 
> lwax was sufficient for what I sniff-tested.  I'll double-check.

Yes, but lwa and lwaux can happen as well...  With a scan-assembler-not
(which this is equivalent to), it is a good idea to throw a wide net in
general.  If the compiler starts using (say) lwa, you won't notice
otherwise.

(The particular regex I gave also catches lwat and lwarx, but do we
actually care?  :-) )


Segher
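
The wider net would read something like (a sketch):

   /* { dg-final { scan-assembler-not {\mlwa} } } */

which catches lwa, lwax, and lwaux (and, harmlessly, lwat and lwarx).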


Re: [PATCH 1/2] rs6000: Support _mm_insert_epi{8,32,64}

2020-09-25 Thread Segher Boessenkool
Hi!

On Fri, Sep 25, 2020 at 09:07:46AM -0500, Paul A. Clarke wrote:
> On Thu, Sep 24, 2020 at 06:22:10PM -0500, Segher Boessenkool wrote:
> > > +  result [(__N & 0b1111)] = __D;
> > 
> > Hrm, GCC supports binary constants like this since 2007, so okay.  But I
> > have to wonder if this improves anything over hex (or decimal even!)
> > The parens are superfluous (and only hinder legibility), fwiw.

> > > +  result [(__N & 0b1)] = __D;
> > 
> > Especially single-digit numbers look really goofy (like 0x0, but even
> > worse for binary somehow).
> > 
> > Anyway, okay for trunk, with or without those things improved.  Thanks!
> 
> I was trying to obviously and consistently convey the sizes of the masks,
> but I really want to convey _why_ there are masks, so let me try a
> different approach, below.

> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
> +{
> +  __v16qi result = (__v16qi)__A;
> +
> +  result [__N % (sizeof result / sizeof result[0])] = __D;
> +
> +  return (__m128i) result;
> +}

I don't think this helps explain things, sorry.  Just add a comment
if you want to explain things?  Simpler and works perfectly always.

To read these files I always open the x86 ISA docs anyway; I think
everyone will have to anyway, so you do not have to explain all details
you emulate, only the very tricky ones, or implementation choices, that
kind of thing.


Segher
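
For reference, usage under the masked/modulo index semantics discussed
above would look like this (a sketch; the header name and the exact
element-selection behavior are assumed to follow the quoted implementation):

   #include <smmintrin.h>

   __m128i
   set_byte_3 (__m128i v)
   {
     /* The index is reduced to the element count (16 for epi8), so an
        out-of-range __N such as 19 selects the same lane as 3.  */
     return _mm_insert_epi8 (v, 0x7f, 3);
   }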


Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-25 Thread Segher Boessenkool
On Fri, Sep 25, 2020 at 08:58:35AM +0200, Richard Biener wrote:
> On Thu, Sep 24, 2020 at 9:38 PM Segher Boessenkool
>  wrote:
> > after which I get (-march=znver2)
> >
> > setg:
> > vmovd   %edi, %xmm1
> > vmovd   %esi, %xmm2
> > vpbroadcastd%xmm1, %ymm1
> > vpbroadcastd%xmm2, %ymm2
> > vpcmpeqd.LC0(%rip), %ymm1, %ymm1
> > vpandn  %ymm0, %ymm1, %ymm0
> > vpand   %ymm2, %ymm1, %ymm1
> > vpor%ymm0, %ymm1, %ymm0
> > ret
> 
> I get with -march=znver2 -O2
> 
> vmovd   %edi, %xmm1
> vmovd   %esi, %xmm2
> vpbroadcastd %xmm1, %ymm1
> vpbroadcastd %xmm2, %ymm2
> vpcmpeqd .LC0(%rip), %ymm1, %ymm1
> vpblendvb   %ymm1, %ymm2, %ymm0, %ymm0

Ah, maybe my x86 compiler is too old...
  x86_64-linux-gcc (GCC) 10.0.0 20190919 (experimental)
not exactly old, huh.  I wonder what I do wrong then.

> Now, with SSE4.2 the 16byte case compiles to
> 
> setg:
> .LFB0:
> .cfi_startproc
> movd    %edi, %xmm3
> movdqa  %xmm0, %xmm1
> movd    %esi, %xmm4
> pshufd  $0, %xmm3, %xmm0
> pcmpeqd .LC0(%rip), %xmm0
> movdqa  %xmm0, %xmm2
> pandn   %xmm1, %xmm2
> pshufd  $0, %xmm4, %xmm1
> pand    %xmm1, %xmm0
> por     %xmm2, %xmm0
> ret
> 
> since there's no blend with a variable mask IIRC.

PowerPC got at least *that* right since time immemorial :-)

> with aarch64 and SVE it doesn't handle the 32byte case at all,
> the 16byte case compiles to
> 
> setg:
> .LFB0:
> .cfi_startproc
> adrp    x2, .LC0
> dup     v1.4s, w0
> dup     v2.4s, w1
> ldr     q3, [x2, #:lo12:.LC0]
> cmeq    v1.4s, v1.4s, v3.4s
> bit     v0.16b, v2.16b, v1.16b
> 
> which looks equivalent to the AVX2 code.

Yes, and we can do pretty much the same on Power, too.

> For all of those varying the vector element type may also
> cause "issues" I guess.

For us, as long as it stays 16B vectors, all should be fine.  There may
be issues in the compiler, but at least the hardware has no problem with
it ;-)

> > and for powerpc (changing it to 16B vectors, -mcpu=power9) it is
> >
> > setg:
> > addis 9,2,.LC0@toc@ha
> > mtvsrws 32,5
> > mtvsrws 33,6
> > addi 9,9,.LC0@toc@l
> > lxv 45,0(9)
> > vcmpequw 0,0,13
> > xxsel 34,34,33,32
> > blr

The -mcpu=power10 code right now is just

plxv 45,.LC0@pcrel
mtvsrws 32,5
mtvsrws 33,6
vcmpequw 0,0,13
xxsel 34,34,33,32
blr

(exactly the same, but less memory address setup cost), so doing
something like this as a generic version would work quite well pretty
much everywhere I think!


Segher
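
For reference, the compare-and-select idiom in the listings above can be
written with GNU C generic vectors (a sketch of the kind of generic
version being discussed, not the actual lowering):

   typedef int v4si __attribute__ ((vector_size (16)));

   v4si
   setg (v4si v, int idx, int val)
   {
     const v4si iota = { 0, 1, 2, 3 };
     v4si idxv = { idx, idx, idx, idx };   /* splat the index */
     v4si valv = { val, val, val, val };   /* splat the value */
     v4si mask = iota == idxv;             /* all-ones in the selected lane */
     return (v & ~mask) | (valv & mask);   /* cmpeq + select (xxsel etc.) */
   }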


Re: [PATCH] c++: Implement -Wrange-loop-construct [PR94695]

2020-09-25 Thread Martin Sebor via Gcc-patches

On 9/24/20 6:05 PM, Marek Polacek via Gcc-patches wrote:

This new warning can be used to prevent expensive copies inside range-based
for-loops, for instance:

   struct S { char arr[128]; };
   void fn () {
 S arr[5];
 for (const auto x : arr) {  }
   }

where auto deduces to S and then we copy the big S in every iteration.
Using "const auto " would not incur such a copy.  With this patch the
compiler will warn:

q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' 
[-Wrange-loop-construct]
 4 |   for (const auto x : arr) {  }
   |   ^
q.C:4:19: note: use reference type 'const S&' to prevent copying
 4 |   for (const auto x : arr) {  }
   |   ^
   |   &

As per Clang, this warning is suppressed for trivially copyable types
whose size does not exceed 64B.  The tricky part of the patch was how
to figure out if using a reference would have prevented a copy.  I've
used perform_implicit_conversion to perform the imaginary conversion.
Then if the conversion doesn't have any side-effects, I assume it does
not call any functions or create any TARGET_EXPRs, and is just a simple
assignment like this one:

   const T &x = (const T &) <__for_begin>;

But it can also be a CALL_EXPR:

   x = (const T &) Iterator::operator* (&__for_begin)

which is still fine -- we just use the return value and don't create
any copies.

This warning is enabled by -Wall.  Further warnings of similar nature
should follow soon.


I've always thought a warning like this would be useful when passing
large objects to functions by value.  Is adding one for these cases
what you mean by future warnings?

For the range loop, I wonder if more could be done to elide the copy
and avoid the warning when it isn't really necessary.  For instance,
for trivially copyable types like in your example, since x is const,
modifying it would be undefined, and so when we can prove that
the original object itself isn't modified (e.g., because it's
declared const, or because it can't be accessed in the loop),
there should be no need to make a copy on each iteration.  Using
a reference to the original object should be sufficient.  Does C++
rule out such an optimization?

About the name of the option: my first thought was that it was
about the construct known as the range loop, but after reading
your description I wonder if it might actually primarily be about
constructing expensive copies and the range loop is incidental.
(It's impossible to tell from the Clang manual because its way
of documenting warning options is to show examples of their text.)
Then again, I see it's related to -Wrange-loop-analysis so that
suggests it is mainly about range loops, and that there may be
a whole series of warnings and options related to it.  Can you
please shed some light on that?  (E.g., what are some of
the "further warnings of similar nature" about?)  I think it
might also be helpful to expand the documentation a bit to help
answer common questions (I came across the following post while
looking it up: https://stackoverflow.com/questions/50066139).

Martin



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

PR c++/94695
* c.opt (Wrange-loop-construct): New option.

gcc/cp/ChangeLog:

PR c++/94695
* parser.c (warn_for_range_copy): New function.
(cp_convert_range_for): Call it.

gcc/ChangeLog:

PR c++/94695
* doc/invoke.texi: Document -Wrange-loop-construct.

gcc/testsuite/ChangeLog:

PR c++/94695
* g++.dg/warn/Wrange-loop-construct.C: New test.
---
  gcc/c-family/c.opt|   4 +
  gcc/cp/parser.c   |  77 ++-
  gcc/doc/invoke.texi   |  21 +-
  .../g++.dg/warn/Wrange-loop-construct.C   | 207 ++
  4 files changed, 304 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wrange-loop-construct.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7761eefd203..bbf7da89658 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -800,6 +800,10 @@ Wpacked-not-aligned
  C ObjC C++ ObjC++ Var(warn_packed_not_aligned) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wall)
  Warn when fields in a struct with the packed attribute are misaligned.
  
+Wrange-loop-construct
+C++ ObjC++ Var(warn_range_loop_construct) Warning LangEnabledBy(C++ 
ObjC++,Wall)
+Warn when a range-based for-loop is creating unnecessary copies.
+
  Wredundant-tags
  C++ ObjC++ Var(warn_redundant_tags) Warning
  Warn when a class or enumerated type is referenced using a redundant 
class-key.
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fba3fcc0c4c..d233279ac62 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -12646,6 +12646,73 @@ do_range_for_auto_deduction (tree decl, tree 
range_expr)
  }
  }
  
+/* Warns when the loop variable should be changed to a reference type to

+  

Disable ipa-modref with -flive-patching

2020-09-25 Thread Jan Hubicka
Hi,
ipa-modref propagates knowledge about callees to their callers.  This is
not compatible with live patching, so this patch makes -flive-patching
imply -fno-ipa-modref.
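
The hazard is the usual one with interprocedural assumptions (a sketch):

   static int g;
   void callee (void) { /* currently touches nothing */ }

   int
   caller (void)
   {
     g = 1;
     callee ();  /* modref records that callee does not clobber g...  */
     return g;   /* ...so this may be folded to 1, which becomes wrong
                    if callee is later live-patched to write to g */
   }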

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

2020-09-26  Jan Hubicka  

* doc/invoke.texi: Add -fno-ipa-modref to flags disabled by
-flive-patching.
* opts.c (control_options_for_live_patching): Disable ipa-modref.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2091e0cd23b..226b0e1dc91 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10879,7 +10879,7 @@ callers are impacted, therefore need to be patched as 
well.
 @gccoptlist{-fwhole-program  -fipa-pta  -fipa-reference  -fipa-ra @gol
 -fipa-icf  -fipa-icf-functions  -fipa-icf-variables @gol
 -fipa-bit-cp  -fipa-vrp  -fipa-pure-const  -fipa-reference-addressable @gol
--fipa-stack-alignment}
+-fipa-stack-alignment -fipa-modref}
 
 @item inline-only-static
 
diff --git a/gcc/opts.c b/gcc/opts.c
index 3c4a0b540b4..3bda59afced 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -792,6 +792,13 @@ control_options_for_live_patching (struct gcc_options 
*opts,
   else
opts->x_flag_ipa_pure_const = 0;
 
+  if (opts_set->x_flag_ipa_modref && opts->x_flag_ipa_modref)
+   error_at (loc,
+ "%<-fipa-modref%> is incompatible with "
+ "%<-flive-patching=inline-only-static|inline-clone%>");
+  else
+   opts->x_flag_ipa_modref = 0;
+
   /* FIXME: disable unreachable code removal.  */
 
   /* discovery of functions/variables with no address taken.  */


Fix handling of gimple_clobber in ipa_modref

2020-09-25 Thread Jan Hubicka
Hi,
while adding a check for gimple_clobber I reversed the return value,
so instead of ignoring the statement ipa-modref gives up.  Fixed thus.
This explains the drop between the originally reported disambiguation
stats and the ones I got later.

Bootstrapped/regtested x86_64-linux.

gcc/ChangeLog:

2020-09-25  Jan Hubicka  

* ipa-modref.c (analyze_stmt): Fix return value for gimple_clobber.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index aa6929ff010..44b844b90db 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -658,7 +658,7 @@ analyze_stmt (modref_summary *summary, gimple *stmt, bool 
ipa)
 {
   /* There is no need to record clobbers.  */
   if (gimple_clobber_p (stmt))
-return false;
+return true;
   /* Analyze all loads and stores in STMT.  */
   walk_stmt_load_store_ops (stmt, summary,
analyze_load, analyze_store);


Re: [PATCH v2] c++: Implement -Wrange-loop-construct [PR94695]

2020-09-25 Thread Marek Polacek via Gcc-patches
On Fri, Sep 25, 2020 at 04:09:44PM -0400, Jason Merrill via Gcc-patches wrote:
> On 9/24/20 8:05 PM, Marek Polacek wrote:
> > This new warning can be used to prevent expensive copies inside range-based
> > for-loops, for instance:
> > 
> >struct S { char arr[128]; };
> >void fn () {
> >  S arr[5];
> >  for (const auto x : arr) {  }
> >}
> > 
> > where auto deduces to S and then we copy the big S in every iteration.
> > Using "const auto " would not incur such a copy.  With this patch the
> > compiler will warn:
> > 
> > q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' 
> > [-Wrange-loop-construct]
> >  4 |   for (const auto x : arr) {  }
> >|   ^
> > q.C:4:19: note: use reference type 'const S&' to prevent copying
> >  4 |   for (const auto x : arr) {  }
> >|   ^
> >|   &
> > 
> > As per Clang, this warning is suppressed for trivially copyable types
> > whose size does not exceed 64B.  The tricky part of the patch was how
> > to figure out if using a reference would have prevented a copy.  I've
> > used perform_implicit_conversion to perform the imaginary conversion.
> > Then if the conversion doesn't have any side-effects, I assume it does
> > not call any functions or create any TARGET_EXPRs, and is just a simple
> > assignment like this one:
> > 
> >const T &x = (const T &) <__for_begin>;
> > 
> > But it can also be a CALL_EXPR:
> > 
> >x = (const T &) Iterator::operator* (&__for_begin)
> > 
> > which is still fine -- we just use the return value and don't create
> > any copies.
> 
> Would conv_binds_ref_to_prvalue (implicit_conversion (...)) do what you
> want?

Yes, thanks.  I played with conv_binds_ref_to_prvalue before to check
if the non-reference range-decl case creates a copy, but since the
types of the range decl and *__for_begin are the same, we only get
ck_identity for it.  But I never tried it for the & case...  Nevermind.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This new warning can be used to prevent expensive copies inside range-based
for-loops, for instance:

  struct S { char arr[128]; };
  void fn () {
S arr[5];
for (const auto x : arr) {  }
  }

where auto deduces to S and then we copy the big S in every iteration.
Using "const auto " would not incur such a copy.  With this patch the
compiler will warn:

q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' 
[-Wrange-loop-construct]
4 |   for (const auto x : arr) {  }
  |   ^
q.C:4:19: note: use reference type 'const S&' to prevent copying
4 |   for (const auto x : arr) {  }
  |   ^
  |   &

As per Clang, this warning is suppressed for trivially copyable types
whose size does not exceed 64B.  The tricky part of the patch was how
to figure out if using a reference would have prevented a copy.  I've
used perform_implicit_conversion to perform the imaginary conversion.
Then if the conversion doesn't have any side-effects, I assume it does
not call any functions or create any TARGET_EXPRs, and is just a simple
assignment like this one:

  const T &x = (const T &) <__for_begin>;

But it can also be a CALL_EXPR:

  x = (const T &) Iterator::operator* (&__for_begin)

which is still fine -- we just use the return value and don't create
any copies.

This warning is enabled by -Wall.  Further warnings of similar nature
should follow soon.

gcc/c-family/ChangeLog:

PR c++/94695
* c.opt (Wrange-loop-construct): New option.

gcc/cp/ChangeLog:

PR c++/94695
* call.c (ref_conv_binds_directly_p): New function.
* cp-tree.h (ref_conv_binds_directly_p): Declare.
* parser.c (warn_for_range_copy): New function.
(cp_convert_range_for): Call it.

gcc/ChangeLog:

PR c++/94695
* doc/invoke.texi: Document -Wrange-loop-construct.

gcc/testsuite/ChangeLog:

PR c++/94695
* g++.dg/warn/Wrange-loop-construct.C: New test.
---
 gcc/c-family/c.opt|   4 +
 gcc/cp/call.c |  13 ++
 gcc/cp/cp-tree.h  |   1 +
 gcc/cp/parser.c   |  68 +-
 gcc/doc/invoke.texi   |  21 +-
 .../g++.dg/warn/Wrange-loop-construct.C   | 207 ++
 6 files changed, 309 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wrange-loop-construct.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7761eefd203..bbf7da89658 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -800,6 +800,10 @@ Wpacked-not-aligned
 C ObjC C++ ObjC++ Var(warn_packed_not_aligned) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wall)
 Warn when fields in a struct with the packed attribute are misaligned.
 
+Wrange-loop-construct
+C++ ObjC++ Var(warn_range_loop_construct) Warning LangEnabledBy(C++ 

Re: [PATCH] powerpc, libcpp: Fix gcc build with clang on power8 [PR97163]

2020-09-25 Thread Segher Boessenkool
Hi!

On Fri, Sep 25, 2020 at 11:09:39AM +0200, Jakub Jelinek wrote:
> libcpp has two specialized altivec implementations of search_line_fast,
> one for power8+ and the other one otherwise.
> Both use __attribute__((altivec(vector))) and the GCC builtins rather than
> altivec.h and the APIs from there, which is fine, but should be restricted
> to when libcpp is built with GCC, so that it can be relied on.
> The second elif is
> #elif (GCC_VERSION >= 4005) && defined(__ALTIVEC__) && defined 
> (__BIG_ENDIAN__)
> and thus e.g. when built with clang it isn't picked, but the first one was
> just guarded with
> #elif defined(_ARCH_PWR8) && defined(__ALTIVEC__)
> and so according to the bugreporter clang fails miserably on that.

Yeah.  This could be rewritten to use the more portable, more modern
intrinsics, but that would need a lot of testing etc.  Since the only
thing this does is speed up the compiler a few percent, and nothing
changes for a bootstrapped compiler (whatever the build compiler is),
it is just fine.

>   PR bootstrap/97163
>   * lex.c (search_line_fast): Only use _ARCH_PWR8 Altivec version
>   for GCC >= 4.5.

Okay for trunk (and whatever backports you want of course, if any).
Thanks!


Segher
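
The shape of the fix is simply a stricter condition on the first #elif
(a sketch; see the committed patch for the exact form):

   #elif defined(_ARCH_PWR8) && defined(__ALTIVEC__) && (GCC_VERSION >= 4005)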


Re: [PATCH v2 9/16][docs] Add some missing test directive documentation.

2020-09-25 Thread Sandra Loosemore

On 9/25/20 8:29 AM, Tamar Christina wrote:

Hi All,

This adds some documentation for some test directives that are missing.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/sourcebuild.texi (vect_complex_rot_,
arm_v8_3a_complex_neon_ok, arm_v8_3a_complex_neon_hw): New.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 
65b2e552b74becdbc5474ba5ac387a4a0296e341..3abd8f631cb0234076641e399f6f00768b38ebee
 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1671,6 +1671,10 @@ Target supports a vector dot-product of @code{signed 
short}.
 @item vect_udot_hi
 Target supports a vector dot-product of @code{unsigned short}.
 
+@item vect_complex_rot_@var{n}
+Target supports a vector complex addition and complex fma of mode @var{N}.
+Possible values of @var{n} are @code{hf}, @code{sf}, @code{df}.
+


Well, "fma" isn't a word.  But looking at target-supports.exp, this 
description doesn't match what's in the source code anyway; there it 
says this is for "vector complex addition with rotate", not fused 
multiply-add.




+@item arm_v8_3a_complex_neon_hw
+ARM target supports executing complex arithmetic instructions from ARMv8.3-A.
+Some multilibs may be incompatible with these options.
+Implies arm_v8_3a_complex_neon_ok.
+


There should be @code markup on arm_v8_3a_complex_neon_ok at the end.  I 
noticed more existing instances of missing @code markup in similar 
language for other entries in this table; can you fix those at the same 
time, or in a separate patch?  I consider fixing markup issues like that 
to be obvious (especially in internal documentation rather than the GCC 
user manual), so you can just check in fixes like that without waiting 
for review.


-Sandra


[PATCH gcc-10] gcov: fix TOPN streaming from shared libraries

2020-09-25 Thread Sergei Trofimovich via Gcc-patches
From: Sergei Trofimovich 

Before the change, gcc did not correctly stream TOPN counters
if the counters belonged to a non-local shared object.

As a result, the zero-section optimization generated TOPN sections
in a form not recognizable by '__gcov_merge_topn'.

The problem happens because, with multiple shared objects, the
'__gcov_merge_topn' function is present in the address space multiple
times (once per object).

The fix is to never rely on the function's address, and to predicate
on the TOPN counter types instead.

libgcc/ChangeLog:

PR gcov-profile/96913
* libgcov-driver.c (write_one_data): Avoid function pointer
comparison in TOP streaming decision.

(backported commit 4ecf368f4b4223fb2df4f3887429dfbb48852e38)
---
 libgcc/libgcov-driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index fb320738e1e..37438883d37 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -242,7 +242,7 @@ prune_counters (struct gcov_info *gi)
  if (gi->merge[j] == NULL)
continue;
 
- if (gi->merge[j] == __gcov_merge_topn)
+ if (j == GCOV_COUNTER_V_TOPN || j == GCOV_COUNTER_V_INDIR)
{
  gcc_assert (!(ci->num % GCOV_TOPN_VALUES_COUNTERS));
  for (unsigned k = 0; k < (ci->num / GCOV_TOPN_VALUES_COUNTERS);
-- 
2.28.0



[PATCH] c++: Incomplete parameter mappings during normalization

2020-09-25 Thread Patrick Palka via Gcc-patches
In the testcase concepts7.C below, we currently reject the call to f1
but we accept the call to f2, even though their associated constraints
are functionally equivalent.

The reason satisfaction differs for (!!C<typename T::type>) and
C<typename T::type> is due to normalization: the former is already an
atom, and the latter is not.  Normalization of the former yields itself,
whereas normalization of the latter yields the atom 'true' with an empty
parameter mapping (since the atom uses no template parameters).  So when
building the latter atom we threw away the T::type term that would later
result in substitution failure during satisfaction.
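
A sketch of the shape of concepts7.C as described (a hypothetical
reconstruction; the actual test may differ):

   template<class R> concept C = true;  // definition uses no parameters

   template<class T> requires (!!C<typename T::type>)
   void f1 (T);  // !!C<...> is an atom; substituting T::type fails for int

   template<class T> requires C<typename T::type>
   void f2 (T);  // normalizes to the atom 'true', today with an empty mapping

   void g () { f1 (0); f2 (0); }  // f1 rejected, f2 currently accepted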

However, [temp.constr.normal]/1 says:

  - ...
  - The normal form of a concept-id C<A1, A2, ..., An> is the normal
form of the constraint-expression of C, after substituting A1, A2,
..., An for C's respective template parameters in the parameter
mappings in each atomic constraint.
  - The normal form of any other expression E is the atomic constraint
whose expression is E and whose parameter mapping is the identity
mapping.

I believe these two bullet points imply that the atom 'true' in the
normal form of C<typename T::type> should have the mapping R |-> T::type
instead of the empty mapping that we give it, because according to the
last bullet point, each atom should start out with the identity mapping
that includes all template parameters.

This patch fixes this issue by always giving the first atom in the
normal form of each concept a 'complete' parameter mapping, i.e. one
that includes all template parameters.  I think it suffices to do this
only for the first atom so that we catch substitution failures like in
concepts7.C at the right time.  For the other atoms, their mappings can
continue to include only template parameters used in the atom.

I noticed that PR92268 alludes to this issue, so this patch refers to
that PR and adds the PR's first testcase which we now accept.

Is the above interpretation of the standard correct here?  If so, does
this seem like a good approach?

gcc/cp/ChangeLog:

PR c++/92268
* constraint.cc (build_parameter_mapping): Add a bool parameter
'complete'.  When 'complete' is true, then include all in-scope
template parameters in the mapping.
(norm_info::update_context): Pass false as the 'complete'
argument to build_parameter_mapping.
(norm_info::normalized_first_atom_p): New bool data member.
(normalize_logical_operation): Set info.normalized_first_atom_p
after normalizing the left operand.
(normalize_concept_check): Reset info.normalized_first_atom_p
before normalizing this concept.
(normalize_atom): Always give the first atom of a concept
definition a complete parameter mapping.

gcc/testsuite/ChangeLog:

PR c++/92268
* g++.dg/cpp2a/concepts-pr92268.C: New test.
* g++.dg/cpp2a/concepts-return-req1.C: Don't expect an error,
as the call is no longer ambiguous.
* g++.dg/cpp2a/concepts7.C: New test.
* g++.dg/cpp2a/concepts8.C: New test.
---
 gcc/cp/constraint.cc  | 44 ---
 gcc/testsuite/g++.dg/cpp2a/concepts-pr92268.C | 42 ++
 .../g++.dg/cpp2a/concepts-return-req1.C   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts7.C|  9 
 gcc/testsuite/g++.dg/cpp2a/concepts8.C| 25 +++
 5 files changed, 116 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr92268.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts7.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts8.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index d49957a6c4a..729d02b73d7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -559,10 +559,14 @@ map_arguments (tree parms, tree args)
   return parms;
 }
 
-/* Build the parameter mapping for EXPR using ARGS.  */
+/* Build the parameter mapping for EXPR using ARGS.
+
+   If COMPLETE then return the complete parameter mapping that includes
+   all in-scope template parameters.  Otherwise include only the
+   parameters used by EXPR.  */
 
 static tree
-build_parameter_mapping (tree expr, tree args, tree decl)
+build_parameter_mapping (tree expr, tree args, tree decl, bool complete)
 {
   tree ctx_parms = NULL_TREE;
   if (decl)
@@ -579,6 +583,24 @@ build_parameter_mapping (tree expr, tree args, tree decl)
   ctx_parms = current_template_parms;
 }
 
+  if (!ctx_parms)
+return NULL_TREE;
+
+  if (complete)
+{
+  if (!processing_template_parmlist)
+   /* Search through ctx_parms to build a complete mapping.  */
+   expr = template_parms_to_args (ctx_parms);
+  else
+   /* The expression might use parameters introduced in the currently
+  open template parameter list, which ctx_parms doesn't yet have.
+  So we need to search through the expression in addition to
+  ctx_parms.  */
+   expr = tree_cons (NULL_TREE, expr,
+ 

Re: [PATCH, rs6000] correct an erroneous BTM value in the BU_P10_MISC define

2020-09-25 Thread will schmidt via Gcc-patches
On Fri, 2020-09-25 at 12:36 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 24, 2020 at 03:35:24PM -0500, will schmidt wrote:
> > We have extraneous BTM entry (RS6000_BTM_POWERPC64) in the
> > define for
> > our P10 MISC 2 builtin definition.  This does not exist for the
> > '0',
> > '1' or '3' definitions. It appears to me that this was erroneously
> > copied from the P7 version of the define which contains a version
> > of the
> > BU macro both with and without that element.  Removing the
> > RS6000_BTM_POWERPC64 portion of the define does not introduce any
> > obvious
> > failures, I believe this extra line can be safely removed.
> 
> No, it cannot.
> 
> This is used for pdepd/pextd/cntlzdm/cnttzdm/cfuged, all of which do
> need 64-bit registers to do anything sane.
> 
> This should really have defined some new builtin class, and I thought
> we
> could just be tricky and take a massive shortcut.  Bill has been hit
> by
> this already as well, sigh :-(

Ok.

The usage of that macro seems to be limited to those that you have
referenced.  i.e. 

/* Builtins for scalar instructions added in ISA 3.1 (power10).  */
BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
BU_P10_MISC_2 (CNTTZDM, "cnttzdm", CONST, cnttzdm)
BU_P10_MISC_2 (PDEPD, "pdepd", CONST, pdepd)
BU_P10_MISC_2 (PEXTD, "pextd", CONST, pextd)

So looking at the power7 entries that have the BTM_POWERPC64 entry..

BU_P7_MISC_2 (DIVWE,"divwe",CONST,  dive_si)
BU_P7_MISC_2 (DIVWEU,   "divweu",   CONST,  diveu_si)
BU_P7_POWERPC64_MISC_2 (DIVDE,  "divde",CONST,  dive_di)
BU_P7_POWERPC64_MISC_2 (DIVDEU, "divdeu",   CONST,  diveu_di)

Would it be suitable to rename the P10 macro to 
BU_P10_POWERPC64_MISC_2 ? 

I'd then debate whether to add a unused macro to fill the gap between
BU_P10_MISC_1 and BU_P10_MISC_2

If you've got schemes for a deeper fix, i'd need another hint. :-)

thanks
-Will

> 
> 
> Segher



Re: [PATCH] c++: Fix up default initialization with consteval default ctor [PR96994]

2020-09-25 Thread Jason Merrill via Gcc-patches

On 9/15/20 3:57 AM, Jakub Jelinek wrote:

Hi!

The following testcase is miscompiled (in particular the a and i
initialization).  The problem is that build_special_member_call due to
the immediate constructors (but not evaluated in constant expression mode)
doesn't create a CALL_EXPR, but returns a TARGET_EXPR with CONSTRUCTOR
as the initializer for it,


That seems like the bug; at the end of build_over_call, after you


   call = cxx_constant_value (call, obj_arg);


You need to build an INIT_EXPR if obj_arg isn't a dummy.

Jason
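
For reference, a hedged sketch of the kind of code at issue (hypothetical;
see PR96994 for the actual testcase):

   struct A { int i = 0; consteval A () { i = 1; } };

   A a;             // default init through the immediate (consteval) ctor;
                    // a.i must end up 1, not the member initializer's 0
   int j = A ().i;  // likewise; building an INIT_EXPR ensures the folded
                    // result actually initializes the object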



Re: [PATCH] c++: Implement -Wrange-loop-construct [PR94695]

2020-09-25 Thread Jason Merrill via Gcc-patches

On 9/24/20 8:05 PM, Marek Polacek wrote:

This new warning can be used to prevent expensive copies inside range-based
for-loops, for instance:

   struct S { char arr[128]; };
   void fn () {
 S arr[5];
 for (const auto x : arr) {  }
   }

where auto deduces to S and then we copy the big S in every iteration.
Using "const auto " would not incur such a copy.  With this patch the
compiler will warn:

q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' 
[-Wrange-loop-construct]
 4 |   for (const auto x : arr) {  }
   |   ^
q.C:4:19: note: use reference type 'const S&' to prevent copying
 4 |   for (const auto x : arr) {  }
   |   ^
   |   &

As per Clang, this warning is suppressed for trivially copyable types
whose size does not exceed 64B.  The tricky part of the patch was how
to figure out if using a reference would have prevented a copy.  I've
used perform_implicit_conversion to perform the imaginary conversion.
Then if the conversion doesn't have any side-effects, I assume it does
not call any functions or create any TARGET_EXPRs, and is just a simple
assignment like this one:

   const T &x = (const T &) <__for_begin>;

But it can also be a CALL_EXPR:

   x = (const T &) Iterator::operator* (&__for_begin)

which is still fine -- we just use the return value and don't create
any copies.


Would conv_binds_ref_to_prvalue (implicit_conversion (...)) do what you 
want?



This warning is enabled by -Wall.  Further warnings of similar nature
should follow soon.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

PR c++/94695
* c.opt (Wrange-loop-construct): New option.

gcc/cp/ChangeLog:

PR c++/94695
* parser.c (warn_for_range_copy): New function.
(cp_convert_range_for): Call it.

gcc/ChangeLog:

PR c++/94695
* doc/invoke.texi: Document -Wrange-loop-construct.

gcc/testsuite/ChangeLog:

PR c++/94695
* g++.dg/warn/Wrange-loop-construct.C: New test.
---
  gcc/c-family/c.opt|   4 +
  gcc/cp/parser.c   |  77 ++-
  gcc/doc/invoke.texi   |  21 +-
  .../g++.dg/warn/Wrange-loop-construct.C   | 207 ++
  4 files changed, 304 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wrange-loop-construct.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7761eefd203..bbf7da89658 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -800,6 +800,10 @@ Wpacked-not-aligned
  C ObjC C++ ObjC++ Var(warn_packed_not_aligned) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wall)
  Warn when fields in a struct with the packed attribute are misaligned.
  
+Wrange-loop-construct
+C++ ObjC++ Var(warn_range_loop_construct) Warning LangEnabledBy(C++ 
ObjC++,Wall)
+Warn when a range-based for-loop is creating unnecessary copies.
+
  Wredundant-tags
  C++ ObjC++ Var(warn_redundant_tags) Warning
  Warn when a class or enumerated type is referenced using a redundant 
class-key.
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fba3fcc0c4c..d233279ac62 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -12646,6 +12646,73 @@ do_range_for_auto_deduction (tree decl, tree 
range_expr)
  }
  }
  
+/* Warns when the loop variable should be changed to a reference type to

+   avoid unnecessary copying.  I.e., from
+
+ for (const auto x : range)
+
+   where range returns a reference, to
+
+ for (const auto &x : range)
+
+   if this version doesn't make a copy.  DECL is the RANGE_DECL; EXPR is the
+   *__for_begin expression.
+   This function is never called when processing_template_decl is on.  */
+
+static void
+warn_for_range_copy (tree decl, tree expr)
+{
+  if (!warn_range_loop_construct
+  || decl == error_mark_node)
+return;
+
+  location_t loc = DECL_SOURCE_LOCATION (decl);
+  tree type = TREE_TYPE (decl);
+
+  if (from_macro_expansion_at (loc))
+return;
+
+  if (TYPE_REF_P (type))
+{
+  /* TODO: Implement reference warnings.  */
+  return;
+}
+  else if (!CP_TYPE_CONST_P (type))
+return;
+
+  /* Since small trivially copyable types are cheap to copy, we suppress the
+ warning for them.  64B is a common size of a cache line.  */
+  if (TREE_CODE (TYPE_SIZE_UNIT (type)) != INTEGER_CST
+  || (tree_to_uhwi (TYPE_SIZE_UNIT (type)) <= 64
+ && trivially_copyable_p (type)))
+return;
+
+  tree rtype = cp_build_reference_type (type, /*rval*/false);
+  /* See what it would take to convert the expr if we used a reference.  */
+  expr = perform_implicit_conversion (rtype, expr, tf_none);
+  if (!TREE_SIDE_EFFECTS (expr))
+/* No calls/TARGET_EXPRs.  */;
+  else
+{
+  /* If we could initialize the reference directly from the call, it
+wouldn't involve any copies.  */
+  STRIP_NOPS (expr);
+  if (TREE_CODE (expr) != CALL_EXPR

Re: [PATCH] tree-optimization/97151 - improve PTA for C++ operator delete

2020-09-25 Thread Jason Merrill via Gcc-patches

On 9/25/20 2:30 AM, Richard Biener wrote:

On Thu, 24 Sep 2020, Jason Merrill wrote:


On 9/24/20 3:43 AM, Richard Biener wrote:

On Wed, 23 Sep 2020, Jason Merrill wrote:


On 9/23/20 2:42 PM, Richard Biener wrote:

On September 23, 2020 7:53:18 PM GMT+02:00, Jason Merrill wrote:

On 9/23/20 4:14 AM, Richard Biener wrote:

C++ operator delete, when DECL_IS_REPLACEABLE_OPERATOR_DELETE_P,
does not cause the deleted object to be escaped.  It also has no
other interesting side-effects for PTA so skip it like we do
for BUILT_IN_FREE.


Hmm, this is true of the default implementation, but since the function

is replaceable, we don't know what a user definition might do with the
pointer.


But can the object still be 'used' after delete? Can delete fail / throw?

What guarantee does the predicate give us?


The deallocation function is called as part of a delete expression in order
to
release the storage for an object, ending its lifetime (if it was not ended
by
a destructor), so no, the object can't be used afterward.


OK, but the delete operator can access the object contents if there
wasn't a destructor ...



A deallocation function that throws has undefined behavior.


OK, so it seems the 'replaceable' operators are the global ones
(for user-defined/class-specific placement variants I see arbitrary
extra arguments that we'd possibly need to handle).

I'm happy to revert but I'd like to have a testcase that FAILs
with the patch ;)

Now, the following aborts:

struct X {
static struct X saved;
int *p;
X() { __builtin_memcpy (this, &saved, sizeof (X)); }
};
void operator delete (void *p)
{
__builtin_memcpy (&X::saved, p, sizeof (X));
}
int main()
{
int y = 1;
X *p = new X;
p->p = &y;
delete p;
X *q = new X;
*(q->p) = 2;
if (y != 2)
  __builtin_abort ();
}

and I could fix this by not making *p but what *p points to escape.
The testcase is of course maximally awkward, but hey ... ;)

Now this would all be moot if operator delete may not access
the object (or if the object contents are undefined at that point).

Oh, and the testcase segfaults when compiled with GCC 10 because
there we elide the new X / delete p pair ... which is invalid then?
Hmm, we emit

MEM[(struct X *)_8] ={v} {CLOBBER};
operator delete (_8, 8);

so the object contents are undefined _before_ calling delete
even when I do not have a DTOR?  That is, the above,
w/o -fno-lifetime-dse, makes the PTA patch OK for the testcase.


Yes, all classes have a destructor, even if it's trivial, so the object's
lifetime definitely ends before the call to operator delete. This is less
clear for scalar objects, but treating them similarly would be consistent with
other recent changes, so I think it's fine for us to assume that scalar
objects are also invalidated before the call to operator delete.  But of
course this doesn't apply to explicit calls to operator delete outside of a
delete expression.


OK, so change the testcase main slightly to

int main()
{
   int y = 1;
   X *p = new X;
   p->p = &y;
   ::operator delete(p);
   X *q = new X;
   *(q->p) = 2;
   if (y != 2)
 __builtin_abort ();
}

in this case the lifetime of *p does not end before calling
::operator delete() and delete can stash the object contents
somewhere before ending its lifetime.  For the very same reason
we may not elide a new/delete pair like in

int main()
{
   int *p = new int;
   *p = 1;
   ::operator delete (p);
}


Correct; the permission to elide new/delete pairs are for the 
expressions, not the functions.



which we before the change did not do only because calling
operator delete made p escape.  Unfortunately points-to analysis
cannot really reconstruct whether delete was called as part of
a delete expression or directly (and thus whether object lifetime
ended already), neither can DCE.  So I guess we need to mark
the operator delete call in some way to make those transforms
safe.  At least currently any operator delete call makes the
alias guarantee of an operator new call moot by forcing the object
to be aliased with all global and escaped memory ...

Looks like there are some unallocated flags for CALL_EXPR we could
pick but I wonder if we can recycle protected_flag which is

CALL_FROM_THUNK_P and
CALL_ALLOCA_FOR_VAR_P in
CALL_EXPR

for calls to DECL_IS_OPERATOR_{NEW,DELETE}_P, thus whether
we have CALL_FROM_THUNK_P for those operators.  Guess picking
a new flag is safer.


We won't ever call those operators from a thunk, so it should be OK to 
reuse it.
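
A minimal sketch of what such a reuse could look like (the macro name
and the exact bit chosen are assumptions for illustration, not
necessarily the committed interface):

/* On a CALL_EXPR: true if the call was generated as part of a new- or
   delete-expression, rather than written as a direct call.  Reuses the
   protected_flag bit, which such calls never need for
   CALL_FROM_THUNK_P, since operator new/delete are never thunks.  */
#define CALL_FROM_NEW_OR_DELETE_P(NODE) \
  (CALL_EXPR_CHECK (NODE)->base.protected_flag)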



But, does it seem correct that we need to distinguish
delete expressions from plain calls to operator delete?


A reason for that distinction came up in the context of omitting 
new/delete pairs: we want to consider the operator first called by the 
new or delete expression, not a call from that first operator to another 
operator new/delete and exposed by inlining.


https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543404.html


In this 

Re: [PATCH 1/2, rs6000] int128 sign extention instructions (partial prereq)

2020-09-25 Thread Pat Haugen via Gcc-patches
On 9/24/20 10:59 AM, will schmidt via Gcc-patches wrote:
> +;; Move DI value from GPR to TI mode in VSX register, word 1.
> +(define_insn "mtvsrdd_diti_w1"
> +  [(set (match_operand:TI 0 "register_operand" "=wa")
> + (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
> +UNSPEC_MTVSRD_DITI_W1))]
> +  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> +  "mtvsrdd %x0,0,%1"
> +  [(set_attr "type" "vecsimple")])

"vecmove" (since I just updated the other uses).

> +
> +;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
> +(define_insn "extendditi2_vector"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> +(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
> + UNSPEC_EXTENDDITI2))]
> +  "TARGET_POWER10"
> +  "vextsd2q %0,%1"
> +  [(set_attr "type" "exts")])

"vecexts".

> +
> +(define_expand "extendditi2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand")
> +(sign_extend:TI (match_operand:DI 1 "gpc_reg_operand")))]
> +  "TARGET_POWER10"
> +  {
> +/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits */
> +rtx temp = gen_reg_rtx (TImode);
> +emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
> +emit_insn (gen_extendditi2_vector (operands[0], temp));
> +DONE;
> +  }
> +  [(set_attr "type" "exts")])

Don't need "type" attr on define_expand since the type will come from the 2 
individual insns emitted.

Thanks,
Pat


Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 9:55 AM, Martin Liška  wrote:
> 
> PING^5
> 

Thanks a lot for ping this patch again.

Hopefully it can be committed into GCC 11 very soon.

Qing
> On 7/21/20 6:24 PM, Qing Zhao wrote:
>> PING^4.
>> Our company is waiting for this patch to be committed to upstream.
>> Thanks a lot.
>> Qing
>>> On Jun 16, 2020, at 7:49 AM, Martin Liška  wrote:
>>> 
>>> PING^3
>>> 
>>> On 6/2/20 11:16 AM, Martin Liška wrote:
 PING^2
 On 5/15/20 11:58 AM, Martin Liška wrote:
> We're in stage1: PING^1
> 
> On 4/3/20 8:15 PM, Egeyar Bagcioglu wrote:
>> 
>> 
>> On 3/18/20 10:05 AM, Martin Liška wrote:
>>> On 3/17/20 7:43 PM, Egeyar Bagcioglu wrote:
 Hi Martin,
 
 I like the patch. It definitely serves our purposes at Oracle and 
 provides another way to do what my previous patches did as well.
 
 1) It keeps the backwards compatibility regarding 
 -frecord-gcc-switches; therefore, removes my related doubts about your 
 previous patch.
 
 2) It still makes use of -frecord-gcc-switches. The new option is only 
 to control the format. This addresses some previous objections to 
 having a new option doing something similar. Now the new option 
 controls the behaviour of the existing one and that behaviour can be 
 further extended.
 
 3) It uses an environment variable as Jakub suggested.
 
 The patch looks good and I confirm that it works for our purposes.
>>> 
>>> Hello.
>>> 
>>> Thank you for the support.
>>> 
 
 Having said that, I have to ask for recognition in this patch for my 
 and my company's contributions. Can you please keep my name and my 
 work email in the changelog and in the commit message?
>>> 
>>> Sure, sorry I forgot.
>> 
>> Hi Martin,
>> 
>> I noticed that some comments in the patch were still referring to 
>> --record-gcc-command-line, the option I suggested earlier. I updated 
>> those comments to mention -frecord-gcc-switches-format instead and also 
>> added my name to the patch as you agreed above. I attached the updated 
>> patch. We are starting to use this patch in the specific domain where we 
>> need its functionality.
>> 
>> Regards
>> Egeyar
>> 
>> 
>>> 
>>> Martin
>>> 
 
 Thanks
 Egeyar
 
 
 
 On 3/17/20 2:53 PM, Martin Liška wrote:
> Hi.
> 
> I'm sending enhanced patch that makes the following changes:
> - a new option -frecord-gcc-switches-format is added; the option
>   selects format (processed, driver) for all options that record
>   GCC command line
> - Dwarf gen_produce_string is now used in -fverbose-asm
> - The .s file is affected in the following way:
> 
> BEFORE:
> 
> # GNU C17 (SUSE Linux) version 9.2.1 20200128 [revision 
> 83f65674e78d97d27537361de1a9d74067ff228d] (x86_64-suse-linux)
> #compiled by GNU C version 9.2.1 20200128 [revision 
> 83f65674e78d97d27537361de1a9d74067ff228d], GMP version 6.2.0, MPFR 
> version 4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP
> 
> # GGC heuristics: --param ggc-min-expand=100 --param 
> ggc-min-heapsize=131072
> # options passed:  -fpreprocessed test.i -march=znver1 -mmmx 
> -mno-3dnow
> # -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes 
> -msha
> # -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi 
> -mno-sgx
> # -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 
> -msse4.1
> # -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed 
> -mprfchw
> # -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er 
> -mno-avx512cd
> # -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves
> # -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma 
> -mno-avx512vbmi
> # -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero 
> -mno-pku
> # -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni
> # -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri 
> -mno-movdir64b
> # -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32
> # --param l1-cache-line-size=64 --param l2-cache-size=512 
> -mtune=znver1
> # -grecord-gcc-switches -g -fverbose-asm -frecord-gcc-switches
> # options enabled:  -faggressive-loop-optimizations -fassume-phsa
> # -fasynchronous-unwind-tables -fauto-inc-dec -fcommon
> # -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
> # -feliminate-unused-debug-types 

c++: Adjust pushdecl/duplicate_decls API

2020-09-25 Thread Nathan Sidwell

The decl pushing APIs and duplicate_decls take an 'is_friend' parm,
when what they actually mean is 'hide this from name lookup'.  That
conflation has gotten more anachronistic as time moved on.  We now
have anticipated builtins, and I plan to have injected extern decls
soon.  So this patch is mainly a renaming exercise.  is_friend ->
hiding.  duplicate_decls gets an additional 'was_hidden' parm.  As
I've already said, hiddenness is a property of the symbol table, not
the decl.  Builtins are now pushed requesting hiding, and pushdecl
asserts that we don't attempt to push a thing that should be hidden
without asking for it to be hidden.

This is the final piece of groundwork to get rid of a bunch of 'this
is hidden' markers on decls and move the hiding management entirely
into name lookup.
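
To illustrate what 'hiding' means at the language level, consider a
hypothetical user-level example (not taken from the patch):

struct S {
  friend void f(S);      // declares ::f, but hidden from ordinary lookup
};
void g(S s) { f(s); }    // OK: argument-dependent lookup finds f
// void (*p)(S) = f;     // error: f is not visible to ordinary lookup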

gcc/cp/
* cp-tree.h (duplicate_decls): Replace 'is_friend' with 'hiding'
and add 'was_hidden'.
* name-lookup.h (pushdecl_namespace_level): Replace 'is_friend'
with 'hiding'.
(pushdecl): Likewise.
(pushdecl_top_level): Drop is_friend parm.
* decl.c (check_no_redeclaration_friend_default_args): Rename parm
olddecl_hidden_p.
(duplicate_decls): Replace 'is_friend' with 'hiding'
and 'was_hidden'.  Do minimal adjustments in body.
(cxx_builtin_function): Pass 'hiding' to pushdecl.
* friend.c (do_friend): Pass 'hiding' to pushdecl.
* name-lookup.c (supplement_binding_1): Drop defaulted arg to
duplicate_decls.
(update_binding): Replace 'is_friend' with 'hiding'.  Drop
defaulted arg to duplicate_decls.
(do_pushdecl): Replace 'is_friend' with 'hiding'.  Assert no
surprise hiding.  Adjust duplicate_decls calls to inform of old
decl's hiddenness.
(pushdecl): Replace 'is_friend' with 'hiding'.
(set_identifier_type_value_with_scope): Adjust update_binding
call.
(do_pushdecl_with_scope): Replace 'is_friend' with 'hiding'.
(pushdecl_outermost_localscope): Drop default arg to
do_pushdecl_with_scope.
(pushdecl_namespace_level): Replace 'is_friend' with 'hiding'.
(pushdecl_top_level): Drop is_friend parm.
* pt.c (register_specialization): Comment duplicate_decls call
args.
(push_template_decl): Comment pushdecl_namespace_level call.
(tsubst_friend_function, tsubst_friend_class): Likewise.

pushing to trunk

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 321bb959120..b7f5b6b399f 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6466,7 +6466,8 @@ extern void determine_local_discriminator	(tree);
 extern int decls_match(tree, tree, bool = true);
 extern bool maybe_version_functions		(tree, tree, bool);
 extern tree duplicate_decls			(tree, tree,
-		 bool is_friend = false);
+		 bool hiding = false,
+		 bool was_hidden = false);
 extern tree declare_local_label			(tree);
 extern tree define_label			(location_t, tree);
 extern void check_goto(tree);
diff --git i/gcc/cp/decl.c w/gcc/cp/decl.c
index b481bbd7b7d..c00b996294e 100644
--- i/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -1341,17 +1341,16 @@ check_redeclaration_no_default_args (tree decl)
 
 static void
 check_no_redeclaration_friend_default_args (tree olddecl, tree newdecl,
-	bool olddecl_hidden_friend_p)
+	bool olddecl_hidden_p)
 {
-  if (!olddecl_hidden_friend_p && !DECL_FRIEND_P (newdecl))
+  if (!olddecl_hidden_p && !DECL_FRIEND_P (newdecl))
 return;
 
-  tree t1 = FUNCTION_FIRST_USER_PARMTYPE (olddecl);
-  tree t2 = FUNCTION_FIRST_USER_PARMTYPE (newdecl);
-
-  for (; t1 && t1 != void_list_node;
+  for (tree t1 = FUNCTION_FIRST_USER_PARMTYPE (olddecl),
+	 t2 = FUNCTION_FIRST_USER_PARMTYPE (newdecl);
+   t1 && t1 != void_list_node;
t1 = TREE_CHAIN (t1), t2 = TREE_CHAIN (t2))
-if ((olddecl_hidden_friend_p && TREE_PURPOSE (t1))
+if ((olddecl_hidden_p && TREE_PURPOSE (t1))
 	|| (DECL_FRIEND_P (newdecl) && TREE_PURPOSE (t2)))
   {
 	auto_diagnostic_group d;
@@ -1435,10 +1434,14 @@ duplicate_function_template_decls (tree newdecl, tree olddecl)
If NEWDECL is not a redeclaration of OLDDECL, NULL_TREE is
returned.
 
-   NEWDECL_IS_FRIEND is true if NEWDECL was declared as a friend.  */
+   HIDING is true if the new decl is being hidden.  WAS_HIDDEN is true
+   if the old decl was hidden.
+
+   Hidden decls can be anticipated builtins, injected friends, or
+   (coming soon) injected from a local-extern decl.   */
 
 tree
-duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
+duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
 {
   unsigned olddecl_uid = DECL_UID (olddecl);
   int olddecl_friend = 0, types_match = 0, hidden_friend = 0;
@@ -1510,7 +1513,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
 	{
 	  /* Avoid warnings redeclaring built-ins which have not been
 	 

[PATCH] make handling of zero-length arrays in C++ pretty printer more robust (PR 97201)

2020-09-25 Thread Martin Sebor via Gcc-patches

The C and C++ representations of zero-length arrays are different:
C uses a null upper bound of the type's domain while C++ uses
SIZE_MAX.  This makes the middle end logic more complicated (and
prone to mistakes) because it has to be prepared for both.  A recent
change to -Warray-bounds has the middle end create a zero-length
array to print in a warning message.  I forgot about this gotcha
and, as a result, when the warning triggers under these conditions
in C++, it causes an ICE in the C++ pretty printer that in turn
isn't prepared for the C form of the domain.

In my mind, the "right fix" is to make the representation the same
between the front ends, but I'm certain that such a change would
cause more problems before it solved them.  Another solution might
be to provide APIs for creating (and querying) arrays and have them
call language hooks in cases where the representation might differ.
But that would likely be quite intrusive as well.  So with that in
mind, for the time being, the attached patch just continues to deal
with the difference by teaching the C++ pretty printer to also
recognize the C form of the zero-length domain.

While testing the one line fix I noticed that -Warray-bounds (and
therefore, I assume also all other warnings that detect out of bounds
accesses to allocated objects) triggers only for the ordinary form of
operator new and not for the nothrow overload, for instance.  That's
because the ordinary form is recognized as a built-in which has
the alloc_size attribute attached to it.  But because the other forms
are neither built-in nor declared in <new> with the same attribute,
the warning doesn't trigger.  So the patch also adds the attribute
to the declarations of these overloads in <new>.  In addition, it
adds attribute malloc to a couple of overloads of the operator that
it's missing from.
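
For concreteness, the shape of the change is roughly the following (a
sketch only; the exact attribute set and placement are in the
libsupc++/new hunk of the patch):

#include <cstddef>
#include <new>

// Hypothetical sketch: give the nothrow overload the allocation
// attributes the built-in operator new already benefits from, so
// out-of-bounds accesses to its result are diagnosed too.
void* operator new (std::size_t, const std::nothrow_t&) noexcept
  __attribute__ ((__alloc_size__ (1), __malloc__));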

Tested on x86_64-linux.

Martin
PR c++/97201 - ICE in -Warray-bounds writing to result of operator new(0)

gcc/cp/ChangeLog:

	PR c++/97201
	* error.c (dump_type_suffix): Handle both the C and C++ forms of
	zero-length arrays.

libstdc++-v3/ChangeLog:

	PR c++/97201
	* libsupc++/new (operator new): Add attribute alloc_size and malloc.

gcc/testsuite/ChangeLog:

	PR c++/97201
	* g++.dg/warn/Warray-bounds-10.C: New test.
	* g++.dg/warn/Warray-bounds-11.C: New test.
	* g++.dg/warn/Warray-bounds-12.C: New test.
	* g++.dg/warn/Warray-bounds-13.C: New test.


diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index ecb41e82d8c..11ed3aedc8d 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -951,8 +951,11 @@ dump_type_suffix (cxx_pretty_printer *pp, tree t, int flags)
   if (tree dtype = TYPE_DOMAIN (t))
 	{
 	  tree max = TYPE_MAX_VALUE (dtype);
-	  /* Zero-length arrays have an upper bound of SIZE_MAX.  */
-	  if (integer_all_onesp (max))
+	  /* Zero-length arrays have a null upper bound in C and SIZE_MAX
+	 in C++.  Handle both since the type might be constructed by
+	 the middle end and end up here as a result of a warning (see
+	 PR c++/97201).  */
+	  if (!max || integer_all_onesp (max))
 	pp_character (pp, '0');
 	  else if (tree_fits_shwi_p (max))
 	pp_wide_integer (pp, tree_to_shwi (max) + 1);
diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C
new file mode 100644
index 000..22466977b68
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C
@@ -0,0 +1,64 @@
+/* PR c++/97201 - ICE in -Warray-bounds writing to result of operator new(0)
+   Verify that out-of-bounds accesses to memory returned by default operator
+   new() are diagnosed.
+   { dg-do compile }
+   { dg-options "-O2 -Wall -Warray-bounds -ftrack-macro-expansion=0" } */
+
+typedef __INT32_TYPE__ int32_t;
+
+void sink (void*);
+
+#define OP_NEW(n)  operator new (n)
+#define T(T, n, i) do {\
+T *p = (T*) OP_NEW (n);			\
+p[i] = 0;	\
+sink (p);	\
+  } while (0)
+
+void warn_op_new ()
+{
+  T (int32_t, 0, 0);  // { dg-warning "array subscript 0 is outside array bounds of 'int32_t \\\[0]'" }
+  // { dg-message "referencing an object of size \\d allocated by 'void\\\* operator new\\\(\(long \)?unsigned int\\\)'" "note" { target *-*-* } .-1 }
+  T (int32_t, 1, 0);  // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" }
+  T (int32_t, 2, 0); //  { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" }
+  T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[3]'" }
+
+  T (int32_t, 4, 0);
+
+  T (int32_t, 0, 1);  // { dg-warning "array subscript 1 is outside array bounds of 'int32_t \\\[0]'" }
+  T (int32_t, 1, 1);  // { dg-warning "array subscript 1 is outside array bounds " }
+  T (int32_t, 2, 1);  // { dg-warning "array subscript 1 is outside array 

Re: [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move

2020-09-25 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 04:56:27PM -0400, Michael Meissner wrote:
> On Thu, Sep 24, 2020 at 10:24:52AM +0200, Florian Weimer wrote:
> > * Michael Meissner via Gcc-patches:
> > 
> > > These patches are my latest versions of the patches to add IEEE 128-bit 
> > > min,
> > > max, and conditional move to GCC.  They correspond to the earlier patches 
> > > #3
> > > and #4 (patches #1 and #2 have been installed).
> > 
> > Is this about IEEE min or IEEE minimum?  My understanding is that they
> > are not the same (or that the behavior depends on the standard version,
> > but I think min was replaced with minimum in the 2019 standard or
> > something like that).

This is about the GCC internal RTX code "smin", which returns an
undefined result if either operand is a NAN, or both are zeros (of
different sign).

> The ISA 3.0 added 2 min/max variants to add to the original variant in power7
> (ISA 2.6).

2.06, fwiw.

>   xsmaxdp   Maximum value
>   xsmaxcdp  Maximum value with "C" semantics
>   xsmaxjdp  Maximum value with "Java" semantics

xsmaxdp implements IEEE behaviour fine.  xsmaxcdp is simply the C
expression  (x > y ? x : y) (or something like that), and xsmaxjdp is
something like that for Java.
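
For instance, the "C" semantics amount to the following (illustrative
C, not the ISA pseudocode):

/* xsmaxcdp: plain C-style maximum.  If either operand is a NaN the
   comparison is false and y is returned -- unlike IEEE maxNum
   (xsmaxdp), which prefers the non-NaN operand.  */
double
maxc (double x, double y)
{
  return x > y ? x : y;
}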

> Due to the NaN rules, unless you use -ffast-math, the compiler won't generate
> these by default.

Simply because the RTL would be undefined!

> In ISA 3.1 (power10) the decision was made to only provide the "C" form on
> maximum and minimum.

... for quad precision.


Segher


Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 12:31 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> Last question, in the following code portion:
>> 
>>  /* Now we get a hard register set that need to be zeroed, pass it to
>> target to generate zeroing sequence.  */
>>  HARD_REG_SET zeroed_hardregs;
>>  start_sequence ();
>>  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
>>  rtx_insn *seq = get_insns ();
>>  end_sequence ();
>>  if (seq)
>>{
>>  /* emit the memory blockage and register clobber asm volatile.  */
>>  rtx barrier_rtx = expand_asm_reg_clobber_blockage (zeroed_hardregs);
>> 
>> /* How to insert the barrier_rtx before "seq"???.  */
>> ??
>> emit_insn_before (barrier_rtx, seq);  ??
>> 
>>  emit_insn_before (seq, ret);
>> 
>>  /* update the data flow information.  */
>> 
>>  df_set_bb_dirty (BLOCK_FOR_INSN (ret));
>>}
>> 
>> In the above, how should I insert the barrier_rtx in the beginning of “seq” 
>> ? And then insert the seq before ret?
>> Is there special thing I need to take care?
> 
> Easiest way is just to insert both of them before ret:
> 
>  emit_insn_before (barrier_rtx, ret);
>  emit_insn_before (seq, ret);
> 
Thanks. Will do that.

> Note that you shouldn't need to mark the block containing the
> return instruction as dirty: the emit machinery should do that
> for you.

Okay, I see.

>  But it might be necessary to mark the exit block
> (EXIT_BLOCK_PTR_FOR_FN (cfun)) as dirty because of the new
> liveness information -- I'm not sure.

Will study a little more here.

Thanks a lot for your help.

Qing
> 
> Thanks,
> Richard



Re: [PATCH] generalized range_query class for multiple contexts

2020-09-25 Thread Andrew MacLeod via Gcc-patches

On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote:

On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
As part of the ranger work, we have been trying to clean up and 
generalize interfaces whenever possible. This not only helps in 
reducing the maintenance burden going forward, but provides 
mechanisms for backwards compatibility between ranger and other 
providers/users of ranges throughout the compiler like evrp and VRP.


One such interface is the range_query class in vr_values.h, which 
provides a range query mechanism for use in the simplify_using_ranges 
module.  With it, simplify_using_ranges can be used with the ranger, 
or the VRP twins by providing a get_value_range() method.  This has 
helped us in comparing apples to apples while doing our work, and has 
also future proofed the interface so that asking for a range can be 
done within the context in which it appeared.  For example, 
get_value_range now takes a gimple statement which provides context.  
We are no longer tied to asking for a global SSA range, but can ask 
for the range of an SSA within a statement. Granted, this 
functionality is currently only in the ranger, but evrp/vrp could be 
adapted to pass such context.


The range_query is a good first step, but what we really want is a 
generic query mechanism that can ask for SSA ranges within an 
expression, a statement, an edge, or anything else that may come up.  
We think that a generic mechanism can be used not only for range 
producers, but consumers such as the substitute_and_fold_engine (see 
get_value virtual) and possibly the gimple folder (see valueize).


The attached patchset provides such an interface.  It is meant to be 
a replacement for range_query that can be used for vr_values, 
substitute_and_fold, the subsitute_and_fold_engine, as well as the 
ranger.  The general API is:


class value_query
{
public:
   // Return the singleton expression for NAME at a gimple statement,
   // or NULL if none found.
   virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
   // Return the singleton expression for NAME at an edge, or NULL if
   // none found.
   virtual tree value_on_edge (edge, tree name);
   // Return the singleton expression for the LHS of a gimple
   // statement, assuming an (optional) initial value of NAME. Returns
   // NULL if none found.
   //
   // Note this method calculates the range the LHS would have *after*
   // the statement has executed.
   virtual tree value_of_stmt (gimple *, tree name = NULL);
};

class range_query : public value_query
{
public:
   range_query ();
   virtual ~range_query ();

   virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
   virtual tree value_on_edge (edge, tree name) OVERRIDE;
   virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;

   // These are the range equivalents of the value_* methods. Instead
   // of returning a singleton, they calculate a range and return it in
   // R.  TRUE is returned on success or FALSE if no range was found.
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;

   virtual bool range_on_edge (irange &r, edge, tree name);
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);

   // DEPRECATED: This method is used from vr-values.  The plan is to
   // rewrite all uses of it to the above API.
   virtual const class value_range_equiv *get_value_range (const_tree,
   gimple * = NULL);
};

The duality of the API (value_* and range_*) is because some 
passes are interested in a singleton value 
(substitute_and_fold_enginge), while others are interested in ranges 
(vr_values).  Passes that are only interested in singletons can take 
a value_query, while passes that are interested in full ranges, can 
take a range_query.  Of course, for future proofing, we would 
recommend taking a range_query, since if you provide a default 
range_of_expr, sensible defaults will be provided for the others in 
terms of range_of_expr.


Note, that the absolute bare minimum that must be provided is a 
value_of_expr and a range_of_expr respectively.
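
As a hypothetical usage sketch (the method names are as quoted above,
but the concrete irange type and the surrounding variables are
assumptions for illustration):

int_range<2> r;
if (query.range_of_expr (r, name, stmt) && r.singleton_p ())
  {
    /* NAME evaluates to exactly one value on entry to STMT.  */
  }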


One piece of the API which is missing is a method  to return the 
range of an arbitrary SSA_NAME *after* a statement.  Currently 
range_of_expr calculates the range of an expression upon entry to the 
statement, whereas range_of_stmt calculates the range of *only* the 
LHS of a statement AFTER the statement has executed.


This would allow for complete representation of the ranges/values in 
something like:


 d_4 = *g_7;

Here the range of g_7 upon entry could be VARYING, but after the 
dereference we know it must be non-zero.  Well for sane targets anyhow.


Choices would be to:

   1) add a 4th method such as "range_after_stmt", or

   2) merge that functionality with the existing range_of_stmt method 
to provide "after" functionality for any ssa_name. Currently the 
SSA_NAME must be the same as the LHS if specified.  It also does not 
need to be 

Re: [PATCH, rs6000] correct an erroneous BTM value in the BU_P10_MISC define

2020-09-25 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 03:35:24PM -0500, will schmidt wrote:
> We have an extraneous BTM entry (RS6000_BTM_POWERPC64) in the define for
> our P10 MISC 2 builtin definition.  This does not exist for the '0',
> '1' or '3' definitions. It appears to me that this was erroneously
> copied from the P7 version of the define which contains a version of the
> BU macro both with and without that element.  Removing the
> RS6000_BTM_POWERPC64 portion of the define does not introduce any obvious
> failures, I believe this extra line can be safely removed.

No, it cannot.

This is used for pdepd/pextd/cntlzdm/cnttzdm/cfuged, all of which do
need 64-bit registers to do anything sane.

This should really have defined some new builtin class, and I thought we
could just be tricky and take a massive shortcut.  Bill has been hit by
this already as well, sigh :-(


Segher


Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Richard Sandiford
Qing Zhao  writes:
> Last question, in the following code portion:
>
>   /* Now we get a hard register set that need to be zeroed, pass it to
>  target to generate zeroing sequence.  */
>   HARD_REG_SET zeroed_hardregs;
>   start_sequence ();
>   zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
>   rtx_insn *seq = get_insns ();
>   end_sequence ();
>   if (seq)
> {
>   /* emit the memory blockage and register clobber asm volatile.  */
>   rtx barrier_rtx = expand_asm_reg_clobber_blockage (zeroed_hardregs);
>
>  /* How to insert the barrier_rtx before "seq"???.  */
>  ??
>  emit_insn_before (barrier_rtx, seq);  ??
>
>   emit_insn_before (seq, ret);
>   
>   /* update the data flow information.  */
>
>   df_set_bb_dirty (BLOCK_FOR_INSN (ret));
> }
>
> In the above, how should I insert the barrier_rtx in the beginning of “seq” ? 
> And then insert the seq before ret?
> Is there special thing I need to take care?

Easiest way is just to insert both of them before ret:

  emit_insn_before (barrier_rtx, ret);
  emit_insn_before (seq, ret);

Note that you shouldn't need to mark the block containing the
return instruction as dirty: the emit machinery should do that
for you.  But it might be necessary to mark the exit block
(EXIT_BLOCK_PTR_FOR_FN (cfun)) as dirty because of the new
liveness information -- I'm not sure.

Thanks,
Richard


c++: Replace tag_scope with TAG_how

2020-09-25 Thread Nathan Sidwell


I always found tag_scope confusing, as it is not a scope, but a
direction of how to look up or insert an elaborated type tag.  This
replaces it with an enum class TAG_how.  I also add a new value,
HIDDEN_FRIEND, to distinguish the two cases of innermost-non-class
insertion that we currently conflate.  Also renamed
'lookup_type_scope' to 'lookup_elaborated_type', because again, we're
not providing a scope to look up in.
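
A sketch of the resulting enumeration, reconstructed from the
description above (the authoritative definition is in the
name-lookup.h hunk of the patch):

enum class TAG_how
{
  CURRENT_ONLY,         /* Look in the current scope only.  */
  GLOBAL,               /* Ordinary lookup, then elaborated lookup.  */
  INNERMOST_NON_CLASS,  /* Look in/insert at the innermost non-class
                           scope.  */
  HIDDEN_FRIEND         /* As above, but hide the name (friends).  */
};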

gcc/cp/
* name-lookup.h (enum tag_scope): Replace with ...
(enum class TAG_how): ... this.  Add HIDDEN_FRIEND value.
(lookup_type_scope): Replace with ...
(lookup_elaborated_type): ... this.
(pushtag): Use TAG_how, not tag_scope.
* cp-tree.h (xref_tag): Parameter is TAG_how, not tag_scope.
* decl.c (lookup_and_check_tag): Likewise.  Adjust.
(xref_tag_1, xref_tag): Likewise.  Adjust.
(start_enum): Adjust lookup_and_check_tag call.
* name-lookup.c (lookup_type_scope_1): Rename to ...
(lookup_elaborated_type_1) ... here. Use TAG_how, not tag_scope.
(lookup_type_scope): Rename to ...
(lookup_elaborated_type): ... here.  Use TAG_how, not tag_scope.
(do_pushtag): Use TAG_how, not tag_scope.  Adjust.
(pushtag): Likewise.
* parser.c (cp_parser_elaborated_type_specifier): Adjust.
(cp_parser_class_head): Likewise.
gcc/objcp/
* objcp-decl.c (objcp_start_struct): Use TAG_how not tag_scope.
(objcp_xref_tag): Likewise.


pushing to trunk

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index bd78f00ba97..321bb959120 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6507,7 +6507,7 @@ extern void grok_special_member_properties	(tree);
 extern bool grok_ctor_properties		(const_tree, const_tree);
 extern bool grok_op_properties			(tree, bool);
 extern tree xref_tag(tag_types, tree,
-		 tag_scope = ts_current,
+		 TAG_how = TAG_how::CURRENT_ONLY,
 		 bool tpl_header_p = false);
 extern void xref_basetypes			(tree, tree);
 extern tree start_enum(tree, tree, tree, tree, bool, bool *);
diff --git i/gcc/cp/decl.c w/gcc/cp/decl.c
index 1709dd9a370..b481bbd7b7d 100644
--- i/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -75,7 +75,7 @@ static void record_unknown_type (tree, const char *);
 static int member_function_or_else (tree, tree, enum overload_flags);
 static tree local_variable_p_walkfn (tree *, int *, void *);
 static const char *tag_name (enum tag_types);
-static tree lookup_and_check_tag (enum tag_types, tree, tag_scope, bool);
+static tree lookup_and_check_tag (enum tag_types, tree, TAG_how, bool);
 static void maybe_deduce_size_from_array_init (tree, tree);
 static void layout_var_decl (tree);
 static tree check_initializer (tree, tree, int, vec **);
@@ -14862,11 +14862,10 @@ check_elaborated_type_specifier (enum tag_types tag_code,
 
 static tree
 lookup_and_check_tag (enum tag_types tag_code, tree name,
-		  tag_scope scope, bool template_header_p)
+		  TAG_how how, bool template_header_p)
 {
-  tree t;
   tree decl;
-  if (scope == ts_global)
+  if (how == TAG_how::GLOBAL)
 {
   /* First try ordinary name lookup, ignoring hidden class name
 	 injected via friend declaration.  */
@@ -14879,16 +14878,16 @@ lookup_and_check_tag (enum tag_types tag_code, tree name,
 	 If we find one, that name will be made visible rather than
 	 creating a new tag.  */
   if (!decl)
-	decl = lookup_type_scope (name, ts_within_enclosing_non_class);
+	decl = lookup_elaborated_type (name, TAG_how::INNERMOST_NON_CLASS);
 }
   else
-decl = lookup_type_scope (name, scope);
+decl = lookup_elaborated_type (name, how);
 
   if (decl
   && (DECL_CLASS_TEMPLATE_P (decl)
-	  /* If scope is ts_current we're defining a class, so ignore a
-	 template template parameter.  */
-	  || (scope != ts_current
+	  /* If scope is TAG_how::CURRENT_ONLY we're defining a class,
+	 so ignore a template template parameter.  */
+	  || (how != TAG_how::CURRENT_ONLY
	      && DECL_TEMPLATE_TEMPLATE_PARM_P (decl))))
 decl = DECL_TEMPLATE_RESULT (decl);
 
@@ -14898,11 +14897,10 @@ lookup_and_check_tag (enum tag_types tag_code, tree name,
 	   class C {
 	 class C {};
 	   };  */
-  if (scope == ts_current && DECL_SELF_REFERENCE_P (decl))
+  if (how == TAG_how::CURRENT_ONLY && DECL_SELF_REFERENCE_P (decl))
 	{
 	  error ("%qD has the same name as the class in which it is "
-		 "declared",
-		 decl);
+		 "declared", decl);
 	  return error_mark_node;
 	}
 
@@ -14922,10 +14920,10 @@ lookup_and_check_tag (enum tag_types tag_code, tree name,
 	 class C *c2;		// DECL_SELF_REFERENCE_P is true
 	   };  */
 
-  t = check_elaborated_type_specifier (tag_code,
-	   decl,
-	   template_header_p
-	   | DECL_SELF_REFERENCE_P (decl));
+  tree t = check_elaborated_type_specifier (tag_code,
+		decl,
+		template_header_p
+		| DECL_SELF_REFERENCE_P (decl));
   if 

Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 11:58 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
 
 
 Which data structure in GCC should be used here to hold this returned 
 value as Set of RTX ?
>>> 
>>> A HARD_REG_SET is enough.  All the caller needs to know is: which registers
>>> were clobbered?  It can then represent a clobber of R with a clobber of
>>> reg_regno_rtx[R].
>> 
>> I did not find reg_regno_rtx in the current gcc source, not sure how to use 
>> it?
> 
> Sorry, I misremembered the name, it's regno_reg_rtx (which makes
> more sense than what I wrote).

Found it!

> 
>>> 
>>> The mode isn't important for single-register clobbers: clobbering a single
>>> register in one mode is equivalent to clobbering it in another mode.
>>> None of the register contents survive the clobber.
>> 
>> Okay, I see.
>> 
>> Then is the following good enough:
>> 
>> /* Generate asm volatile("" : : : "memory") as a memory blockage, at the 
>>   same time clobbering the register set  specified by ZEROED_REGS.  */
>> 
>> void
>> expand_asm_memory_blockage_clobber_regs (HARD_REG_SET zeroed_regs)
>> {
>>  rtx asm_op, clob_mem, clob_reg;
>> 
>>  /* first get the number of registers that have been zeroed from ZEROED_REGS 
>> set.  */
>>  unsigned int  num_of_regs = ….;
>> 
>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>> rtvec_alloc (0), rtvec_alloc (0),
>> rtvec_alloc (0), UNKNOWN_LOCATION);
>>  MEM_VOLATILE_P (asm_op) = 1;
>> 
>>  rtvec v = rtvec_alloc (num_of_regs + 2);
>> 
>>  clob_mem = gen_rtx_SCRATCH (VOIDmode);
>>  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
>>  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
>> 
>>  RTVEC_ELT (v,0) = asm_op;
>>  RTVEC_ELT (v,1) = clob_mem;
>> 
>>  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>>    if (TEST_HARD_REG_BIT (zeroed_regs, i))
>>  {
>>clob_reg  = gen_rtx_CLOBBER (VOIDmode, reg_regno_rtx[i]);
>>RTVEC_ELT (v,i+2) = clob_reg;
>>  }
>> 
>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));  
>> }
> 
> Yeah, looks like it should work.

thanks.

Last question, in the following code portion:

  /* Now we get a hard register set that need to be zeroed, pass it to
 target to generate zeroing sequence.  */
  HARD_REG_SET zeroed_hardregs;
  start_sequence ();
  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
  rtx_insn *seq = get_insns ();
  end_sequence ();
  if (seq)
{
  /* emit the memory blockage and register clobber asm volatile.  */
  rtx barrier_rtx = expand_asm_reg_clobber_blockage (zeroed_hardregs);

 /* How to insert the barrier_rtx before "seq"???.  */
 ??
 emit_insn_before (barrier_rtx, seq);  ??

  emit_insn_before (seq, ret);
  
  /* update the data flow information.  */

  df_set_bb_dirty (BLOCK_FOR_INSN (ret));
}

In the above, how should I insert the barrier_rtx in the beginning of “seq” ? 
And then insert the seq before ret?
Is there special thing I need to take care?

Qing
> 
> Thanks,
> Richard



Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Richard Sandiford
Qing Zhao  writes:
>> On Sep 25, 2020, at 10:28 AM, Richard Sandiford  
>> wrote:
>> 
>> Qing Zhao  writes:
 On Sep 25, 2020, at 7:53 AM, Richard Sandiford  
 wrote:
 
 Qing Zhao  writes:
> Hi, Richard,
> 
> As you suggested, I added a default implementation of the target hook 
> “zero_cal_used_regs (HARD_REG_SET)” as following in my latest patch
> 
> 
> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
> 
> void
> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 
 FWIW, I was suggesting to return the set of registers that are actually
 cleared too.  Here you have the hook emit the asm statement, but IMO the
 way we generate the asm for a given set of registers should be entirely
 target-independent, and happen outside the hook.
 
 So the hook returning the set of cleared registers does two things:
 
 (1) It indicates which registers should be clobbered by the asm
   (which would be generated after calling the hook, but emitted
   before the sequence of instructions generated by the hook).
>>> 
>>> For this purpose, this hook should return a Set of RTX that hold the 
>>> cleared registers, a HARD_REG_SET is not enough.
>>> 
>>> Since in the ASM_OPERANDS, we will need the RTX for the register (not the 
>>> REGNO).
>>> 
>>> Which data structure in GCC should be used here to hold this returned value 
>>> as Set of RTX ?
>> 
>> A HARD_REG_SET is enough.  All the caller needs to know is: which registers
>> were clobbered?  It can then represent a clobber of R with a clobber of
>> reg_regno_rtx[R].
>
> I did not find reg_regno_rtx in the current gcc source, not sure how to use 
> it?

Sorry, I misremembered the name, it's regno_reg_rtx (which makes
more sense than what I wrote).

>> 
>> The mode isn't important for single-register clobbers: clobbering a single
>> register in one mode is equivalent to clobbering it in another mode.
>> None of the register contents survive the clobber.
>
> Okay, I see.
>
> Then is the following good enough:
>
> /* Generate asm volatile("" : : : "memory") as a memory blockage, at the 
>same time clobbering the register set  specified by ZEROED_REGS.  */
>
> void
> expand_asm_memory_blockage_clobber_regs (HARD_REG_SET zeroed_regs)
> {
>   rtx asm_op, clob_mem, clob_reg;
>
>   /* first get the number of registers that have been zeroed from ZEROED_REGS 
> set.  */
>   unsigned int  num_of_regs = ….;
>
>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>  rtvec_alloc (0), rtvec_alloc (0),
>  rtvec_alloc (0), UNKNOWN_LOCATION);
>   MEM_VOLATILE_P (asm_op) = 1;
>
>   rtvec v = rtvec_alloc (num_of_regs + 2);
>
>   clob_mem = gen_rtx_SCRATCH (VOIDmode);
>   clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
>   clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
>
>   RTVEC_ELT (v,0) = asm_op;
>   RTVEC_ELT (v,1) = clob_mem;
>
>   for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>     if (TEST_HARD_REG_BIT (zeroed_regs, i))
>   {
> clob_reg  = gen_rtx_CLOBBER (VOIDmode, reg_regno_rtx[i]);
> RTVEC_ELT (v,i+2) = clob_reg;
>   }
>   
>   emit_insn (gen_rtx_PARALLEL (VOIDmode, v));  
> }

Yeah, looks like it should work.

Thanks,
Richard


[PATCH] AArch64: Add Linux cpuinfo string for rng feature

2020-09-25 Thread Kyrylo Tkachov
Hi all,

The Linux kernel has defined the cpuinfo string for the +rng feature, so this 
patch adds that to GCC so that -march=native can pick it up.
Bootstrapped and tested on aarch64-none-linux-gnu.
Committing to trunk and later to the branches.

Thanks,
Kyrill

gcc/
* config/aarch64/aarch64-option-extensions.def (rng): Add cpuinfo 
string.




[PATCH][GCC 8] AArch64: Implement Armv8.3-a complex arithmetic intrinsics

2020-09-25 Thread Kyrylo Tkachov
Hi all,

I'd like to backport some patches from Tamar in GCC 9 to GCC 8 that implement 
the complex arithmetic intrinsics for Advanced SIMD.
These should have been present in GCC 8, which gained support for Armv8.3-a.

There were 4 follow-up fixes that I've rolled into the one commit.

Bootstrapped and tested on aarch64-none-linux-gnu and arm-none-linux-gnueabihf 
on the GCC 8 branch.
Pushing to the releases/gcc-8 branch.

Thanks,
Kyrill
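
As a quick illustration of what these intrinsics compute (hypothetical
user code, not part of the patch):

#include <arm_neon.h>

/* Complex addition with the second operand rotated by 90 degrees;
   maps to the Armv8.3-a FCADD instruction.  */
float32x2_t
rot_add (float32x2_t a, float32x2_t b)
{
  return vcadd_rot90_f32 (a, b);
}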

gcc/
PR target/71233
* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): Add 
qualifier_lane_pair_index.
(emit-rtl.h): Include.
(TYPES_QUADOP_LANE_PAIR): New.
(aarch64_simd_expand_args): Use it.
(aarch64_simd_expand_builtin): Likewise.
(AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_laneq_builtin_datum): 
New.
(FCMLA_LANEQ_BUILTIN, AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE,
AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_lane_builtin_data,
aarch64_init_fcmla_laneq_builtins, aarch64_expand_fcmla_builtin): New.
(aarch64_init_builtins): Add aarch64_init_fcmla_laneq_builtins.
(aarch64_expand_builtin): Add AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V2SF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V2SF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V2SF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V2SF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V4HF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V4HF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V4HF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V4HF.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add 
__ARM_FEATURE_COMPLEX.
* config/aarch64/aarch64-simd-builtins.def (fcadd90, fcadd270, fcmla0, 
fcmla90,
fcmla180, fcmla270, fcmla_lane0, fcmla_lane90, fcmla_lane180, 
fcmla_lane270,
fcmla_laneq0, fcmla_laneq90, fcmla_laneq180, fcmla_laneq270,
fcmlaq_lane0, fcmlaq_lane90, fcmlaq_lane180, fcmlaq_lane270): New.
* config/aarch64/aarch64-simd.md (aarch64_fcmla_lane,
aarch64_fcmla_laneqv4hf, 
aarch64_fcmlaq_lane,aarch64_fcadd,
aarch64_fcmla): New.
* config/aarch64/arm_neon.h:
(vcadd_rot90_f16): New.
(vcaddq_rot90_f16): New.
(vcadd_rot270_f16): New.
(vcaddq_rot270_f16): New.
(vcmla_f16): New.
(vcmlaq_f16): New.
(vcmla_lane_f16): New.
(vcmla_laneq_f16): New.
(vcmlaq_lane_f16): New.
(vcmlaq_rot90_lane_f16): New.
(vcmla_rot90_laneq_f16): New.
(vcmla_rot90_lane_f16): New.
(vcmlaq_rot90_f16): New.
(vcmla_rot90_f16): New.
(vcmlaq_laneq_f16): New.
(vcmla_rot180_laneq_f16): New.
(vcmla_rot180_lane_f16): New.
(vcmlaq_rot180_f16): New.
(vcmla_rot180_f16): New.
(vcmlaq_rot90_laneq_f16): New.
(vcmlaq_rot270_laneq_f16): New.
(vcmlaq_rot270_lane_f16): New.
(vcmla_rot270_laneq_f16): New.
(vcmlaq_rot270_f16): New.
(vcmla_rot270_f16): New.
(vcmlaq_rot180_laneq_f16): New.
(vcmlaq_rot180_lane_f16): New.
(vcmla_rot270_lane_f16): New.
(vcadd_rot90_f32): New.
(vcaddq_rot90_f32): New.
(vcaddq_rot90_f64): New.
(vcadd_rot270_f32): New.
(vcaddq_rot270_f32): New.
(vcaddq_rot270_f64): New.
(vcmla_f32): New.
(vcmlaq_f32): New.
(vcmlaq_f64): New.
(vcmla_lane_f32): New.
(vcmla_laneq_f32): New.
(vcmlaq_lane_f32): New.
(vcmlaq_laneq_f32): New.
(vcmla_rot90_f32): New.
(vcmlaq_rot90_f32): New.
(vcmlaq_rot90_f64): New.
(vcmla_rot90_lane_f32): New.
(vcmla_rot90_laneq_f32): New.
(vcmlaq_rot90_lane_f32): New.
(vcmlaq_rot90_laneq_f32): New.
(vcmla_rot180_f32): New.
(vcmlaq_rot180_f32): New.
(vcmlaq_rot180_f64): New.
(vcmla_rot180_lane_f32): New.
(vcmla_rot180_laneq_f32): New.
(vcmlaq_rot180_lane_f32): New.
(vcmlaq_rot180_laneq_f32): New.
(vcmla_rot270_f32): New.
(vcmlaq_rot270_f32): New.
(vcmlaq_rot270_f64): New.
(vcmla_rot270_lane_f32): New.
(vcmla_rot270_laneq_f32): New.
(vcmlaq_rot270_lane_f32): New.
(vcmlaq_rot270_laneq_f32): New.
* config/aarch64/aarch64.h (TARGET_COMPLEX): New.
* config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
(FCADD, FCMLA): New.
(rot): New.
(FCMLA_maybe_lane): New.
* config/arm/types.md (neon_fcadd, neon_fcmla): New.

gcc/testsuite/
PR target/71233
* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache,
check_effective_target_arm_v8_3a_complex_neon_ok,
add_options_for_arm_v8_3a_complex_neon,
check_effective_target_arm_v8_3a_complex_neon_hw,
check_effective_target_vect_complex_rot_N): New.

Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 10:28 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>>> On Sep 25, 2020, at 7:53 AM, Richard Sandiford  
>>> wrote:
>>> 
>>> Qing Zhao  writes:
 Hi, Richard,
 
 As you suggested, I added a default implementation of the target hook 
 “zero_cal_used_regs (HARD_REG_SET)” as following in my latest patch
 
 
 /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
 
 void
 default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>>> 
>>> FWIW, I was suggesting to return the set of registers that are actually
>>> cleared too.  Here you have the hook emit the asm statement, but IMO the
>>> way we generate the asm for a given set of registers should be entirely
>>> target-independent, and happen outside the hook.
>>> 
>>> So the hook returning the set of cleared registers does two things:
>>> 
>>> (1) It indicates which registers should be clobbered by the asm
>>>   (which would be generated after calling the hook, but emitted
>>>   before the sequence of instructions generated by the hook).
>> 
>> For this purpose, this hook should return a Set of RTX that hold the cleared 
>> registers, a HARD_REG_SET is not enough.
>> 
>> Since in the ASM_OPERANDS, we will need the RTX for the register (not the 
>> REGNO).
>> 
>> Which data structure in GCC should be used here to hold this returned value 
>> as Set of RTX ?
> 
> A HARD_REG_SET is enough.  All the caller needs to know is: which registers
> were clobbered?  It can then represent a clobber of R with a clobber of
> reg_regno_rtx[R].

I did not find reg_regno_rtx in the current gcc source, not sure how to use it?
> 
> The mode isn't important for single-register clobbers: clobbering a single
> register in one mode is equivalent to clobbering it in another mode.
> None of the register contents survive the clobber.

Okay, I see.

Then is the following good enough:

/* Generate asm volatile("" : : : "memory") as a memory blockage, at the 
   same time clobbering the register set  specified by ZEROED_REGS.  */

void
expand_asm_memory_blockage_clobber_regs (HARD_REG_SET zeroed_regs)
{
  rtx asm_op, clob_mem, clob_reg;

  /* first get the number of registers that have been zeroed from ZEROED_REGS 
set.  */
  unsigned int  num_of_regs = ….;

  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
 rtvec_alloc (0), rtvec_alloc (0),
 rtvec_alloc (0), UNKNOWN_LOCATION);
  MEM_VOLATILE_P (asm_op) = 1;

  rtvec v = rtvec_alloc (num_of_regs + 2);

  clob_mem = gen_rtx_SCRATCH (VOIDmode);
  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);

  RTVEC_ELT (v,0) = asm_op;
  RTVEC_ELT (v,1) = clob_mem;

  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
    if (TEST_HARD_REG_BIT (zeroed_regs, i))
  {
clob_reg  = gen_rtx_CLOBBER (VOIDmode, reg_regno_rtx[i]);
RTVEC_ELT (v,i+2) = clob_reg;
  }
  
  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));  
}

How do I come up with the above:

   clob_reg  = gen_rtx_CLOBBER (VOIDmode, reg_regno_rtx[i]);?

> 
>>> (2) It indicates which registers should be treated as live on return.
>>> 
>>> FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.
>> 
>> Instead of storing this info in crtl, in my current patch, I added the 
>> following in “df-scan.c":
>> +static HARD_REG_SET zeroed_reg_set;
>> 
>> And routines that manipulate this HARD_REG_SET. 
>> I think that this should serve the same purpose as storing it to crtl? 
> 
> Storing it in crtl is better for two reasons:
> 
> - Using global statics for this kind of thing makes it harder to
>  compile functions in parallel.  (Work is underway to allow that.)
> 
> - Having the information in crtl reduces the risk that information
>  from one function will get reused for another function, without the
>  variable being reinitialised inbetween.

Okay, will add a new field zeroed_reg_set into crtl.

Thanks.

Qing
> 
> Thanks,
> Richard



[Patch, fortran] PR/97045 A wrong column is selected when addressing individual elements of unlimited polymorphic dummy argument

2020-09-25 Thread Paul Richard Thomas via Gcc-patches
Hi All,

The original testcase turned out to be relatively easy to fix - the chunks
in trans-expr.c and trans-stmt.c do this. However, I tested character
actual arguments to 'write_array' in the testcase and found that the _len
component of the unlimited polymorphic dummy was not being used for the
selector and so the payloads were being treated as if they were
character(len = 1). The fix for this part of the problem further
complicates the building of array references. It looks to me as if
rationalizing this part of the trans-* part of gfortran is quite a
significant TODO, since it is now little more than a band-aid on
sticking plaster!  I will flag this up in a new PR.

Regtests on FC31/x86_64 - OK for master?

Paul

This patch fixes PR97045 - unlimited polymorphic array element selectors.

2020-09-25  Paul Thomas  

gcc/fortran
PR fortran/97045
* trans-array.c (gfc_conv_array_ref): Make sure that the class
decl is passed to build_array_ref in the case of unlimited
polymorphic entities.
* trans-expr.c (gfc_conv_derived_to_class): Ensure that array
refs do not precede the _len component. Free the _len expr.
* trans-stmt.c (trans_associate_var): Reset 'need_len_assign'
for polymorphic scalars.
* trans.c (gfc_build_array_ref): When the vptr size is used for
span, multiply by the _len field of unlimited polymorphic
entities, when non-zero.

gcc/testsuite/
PR fortran/97045
* gfortran.dg/select_type_50.f90 : New test.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 6566c47d4ae..998d4d4ed9b 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3787,7 +3787,20 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
 	decl = sym->backend_decl;
 }
   else if (sym->ts.type == BT_CLASS)
-decl = NULL_TREE;
+{
+  if (UNLIMITED_POLY (sym))
+	{
+	  gfc_expr *class_expr = gfc_find_and_cut_at_last_class_ref (expr);
+	  gfc_init_se (&tmpse, NULL);
+	  gfc_conv_expr (&tmpse, class_expr);
+	  if (!se->class_vptr)
+	se->class_vptr = gfc_class_vptr_get (tmpse.expr);
+	  gfc_free_expr (class_expr);
+	  decl = tmpse.expr;
+	}
+  else
+	decl = NULL_TREE;
+}
 
   se->expr = build_array_ref (se->expr, offset, decl, se->class_vptr);
 }
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index a690839f591..2c31ec9bf01 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -728,7 +728,7 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e,
 	  gfc_expr *len;
 	  gfc_se se;
 
-	  len = gfc_copy_expr (e);
+	  len = gfc_find_and_cut_at_last_class_ref (e);
 	  gfc_add_len_component (len);
	  gfc_init_se (&se, NULL);
	  gfc_conv_expr (&se, len);
@@ -739,6 +739,7 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e,
 	integer_zero_node));
 	  else
 	tmp = se.expr;
+	  gfc_free_expr (len);
 	}
   else
 	tmp = integer_zero_node;
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 389fec7227e..adc6b8fefb5 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -2091,6 +2091,7 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block)
 	  /* Obtain a temporary class container for the result.  */
	  gfc_conv_derived_to_class (&se, e, sym->ts, tmp, false, false);
 	  se.expr = build_fold_indirect_ref_loc (input_location, se.expr);
+	  need_len_assign = false;
 	}
   else
 	{
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index ed054261452..8caa625ab0e 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -429,7 +429,28 @@ gfc_build_array_ref (tree base, tree offset, tree decl, tree vptr)
   /* If decl or vptr are non-null, pointer arithmetic for the array reference
  is likely. Generate the 'span' for the array reference.  */
   if (vptr)
-span = gfc_vptr_size_get (vptr);
+{
+  span = gfc_vptr_size_get (vptr);
+
+  /* Check if this is an unlimited polymorphic object carrying a character
+	 payload. In this case, the 'len' field is non-zero.  */
+  if (decl && GFC_CLASS_TYPE_P (TREE_TYPE (decl)))
+	{
+	  tmp = gfc_class_len_or_zero_get (decl);
+	  if (!integer_zerop (tmp))
+	{
+	  tree cond;
+	  tree stype = TREE_TYPE (span);
+	  tmp = fold_convert (stype, tmp);
+	  cond = fold_build2_loc (input_location, EQ_EXPR,
+  logical_type_node, tmp,
+  build_int_cst (stype, 0));
+	  tmp = fold_build2 (MULT_EXPR, stype, span, tmp);
+	  span = fold_build3_loc (input_location, COND_EXPR, stype,
+  cond, span, tmp);
+	}
+	}
+}
   else if (decl)
 span = get_array_span (type, decl);
 
! { dg-do run }
!
! Test the fix for PR97045. The report was for the INTEGER version. Testing
! revealed a further bug with the character versions.
!
! Contributed by Igor Gayday  
!
program test_prg
  implicit none
  integer :: i
  integer, allocatable :: arr(:, :)
  character(kind = 1, len = 2), allocatable :: chr(:, :)
  character(kind = 4, len = 2), allocatable :: chr4(:, :)

  arr = reshape ([(i, i = 1, 

Re: [EXTERNAL] Re: [PATCH 2/2, rs6000] VSX load/store rightmost element operations

2020-09-25 Thread will schmidt via Gcc-patches
On Thu, 2020-09-24 at 19:40 -0500, Segher Boessenkool wrote:
> On Thu, Sep 24, 2020 at 11:04:38AM -0500, will schmidt wrote:
> > [PATCH 2/2, rs6000] VSX load/store rightmost element operations
> > 
> > Hi,
> >   This adds support for the VSX load/store rightmost element
> > operations.
> > This includes the instructions lxvrbx, lxvrhx, lxvrwx, lxvrdx,
> > stxvrbx, stxvrhx, stxvrwx, stxvrdx; And the builtins
> > vec_xl_sext() /* vector load sign extend */
> > vec_xl_zext() /* vector load zero extend */
> > vec_xst_trunc() /* vector store truncate */.
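
For illustration, a hypothetical use of the load-and-sign-extend form
(the element and offset types here are assumptions based on the
description; the authoritative signatures are in the patch below):

#include <altivec.h>

/* Load one word at OFFSET from P into the rightmost vector element
   and sign-extend it to 128 bits (lxvrwx plus extension).  */
vector signed __int128
load_w_sext (signed long long offset, signed int *p)
{
  return vec_xl_sext (offset, p);
}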
> > 
> > Testcase results show that the instructions added with this patch
> > show
> > up at low/no optimization (-O0), with a number of those being
> > replaced
> > with other load and store instructions at higher optimization
> > levels.
> > For consistency I've left the tests at -O0.
> > 
> > Regtested OK for Linux on power8,power9 targets.  Sniff-regtested
> > OK on
> > power10 simulator.
> > OK for trunk?
> > 
> > Thanks,
> > -Will
> > 
> > gcc/ChangeLog:
> > * config/rs6000/altivec.h (vec_xl_zest, vec_xl_sext,
> > vec_xst_trunc): New
> > defines.
> 
> vec_xl_zext (no humour there :-) ).

Lol.. one of them slipped through.. my muscle memory struggled on
typing these.. :-)


> 
> > +BU_P10V_OVERLOAD_X (SE_LXVRX,   "se_lxvrx")
> > +BU_P10V_OVERLOAD_X (ZE_LXVRX,   "ze_lxvrx")
> > +BU_P10V_OVERLOAD_X (TR_STXVRX,  "tr_stxvrx")
> 
> I'm not a fan of the cryptic names.  I guess I'll get used to them ;-)
> 
> > +  if (op0 == const0_rtx)
> > + addr = gen_rtx_MEM (blk ? BLKmode : tmode, op1);
> 
> That indent is broken.
> 
> > +  else
> > + {
> > +   op0 = copy_to_mode_reg (mode0, op0);
> 
> And so is this.  Should be two spaces, not three.
> 
> > +   addr = gen_rtx_MEM (blk ? BLKmode : smode,
> > + gen_rtx_PLUS (Pmode, op1, op0));
> 
> "gen_rtx_PLUS" should line up with "blk".
> 
> > +  if (sign_extend)
> > +{
> > +   rtx discratch = gen_reg_rtx (DImode);
> > +   rtx tiscratch = gen_reg_rtx (TImode);
> 
> More broken indentation.  (And more later.)
> 
> > +   // emit the lxvr*x insn.
> 
> Use only /* comments */ please, don't mix them.  Emit with a capital
> E.
> 
> > +   pat = GEN_FCN (icode) (tiscratch, addr);
> > +   if (! pat)
> 
> No space after "!" (or any other unary op other than casts and sizeof
> and the like).
> 
> > +   // Emit a sign extention from QI,HI,WI to double.
> 
> "extension"

willdo, thanks

> 
> > +;; Store rightmost element into store_data
> > +;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> > +(define_insn "vsx_stxvrx"
> > +   [(set
> > +  (match_operand:INT_ISA3 0 "memory_operand" "=Z")
> > +  (truncate:INT_ISA3 (match_operand:TI 1
> > "vsx_register_operand" "wa")))]
> > +  "TARGET_POWER10"
> > +  "stxvrx %1,%y0"
> 
> %x1 I think?

I'll doublecheck. 

> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-
> > char.c
> > @@ -0,0 +1,168 @@
> > +/*
> > + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> > + * vector element and zero/sign extend). */
> > +
> > +/* { dg-do compile {target power10_ok} } */
> > +/* { dg-do run {target power10_hw} } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> 
> If you dg_require it, why test it on the "dg-do compile" line?  It
> will
> *work* with it of course, but it is puzzling :-)

I've had both compile-time and run-time versions of the test.  In this
case I wanted to try to handle both, so compile when I can compile it,
and run when I can run it, etc.

If that combo doesn't work the way I expect it to, I'll need to split
them out into separate tests.   

> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-
> > int.c
> > @@ -0,0 +1,165 @@
> > +/*
> > + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> > + * vector element and zero/sign extend). */
> > +
> > +/* { dg-do compile {target power10_ok} } */
> > +/* { dg-do run {target power10_hw} } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O0" } */
> 
> Please comment here what that -O0 is for?  So that we still know when
> we
> read it decades from now ;-)

I've got it commented at least once, I'll make sure to get all the
instances covered.

> 
> > +/* { dg-final { scan-assembler-times {\mlxvrwx\M} 2 } } */
> > +/* { dg-final { scan-assembler-times {\mlwax\M} 0 } } */
> 
> Maybe all of  {\mlwa}  here?

lwax was sufficient for what I sniff-tested.  I'll double-check.

Thanks,
-Will

> 
> 
> Segher



Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Richard Sandiford
Qing Zhao  writes:
>> On Sep 25, 2020, at 7:53 AM, Richard Sandiford  
>> wrote:
>> 
>> Qing Zhao  writes:
>>> Hi, Richard,
>>> 
>>> As you suggested, I added a default implementation of the target hook 
>>> "zero_call_used_regs (HARD_REG_SET)" as follows in my latest patch
>>> 
>>> 
>>> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>>> 
>>> void
>>> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> 
>> FWIW, I was suggesting to return the set of registers that are actually
>> cleared too.  Here you have the hook emit the asm statement, but IMO the
>> way we generate the asm for a given set of registers should be entirely
>> target-independent, and happen outside the hook.
>> 
>> So the hook returning the set of cleared registers does two things:
>> 
>> (1) It indicates which registers should be clobbered by the asm
>>(which would be generated after calling the hook, but emitted
>>before the sequence of instructions generated by the hook).
>
> For this purpose, this hook should return a set of RTXs that holds the cleared
> registers; a HARD_REG_SET is not enough.
>
> Since in the ASM_OPERANDS, we will need the RTX for the register (not the 
> REGNO).
>
> Which data structure in GCC should be used here to hold this returned value
> as a set of RTXs?

A HARD_REG_SET is enough.  All the caller needs to know is: which registers
were clobbered?  It can then represent a clobber of R with a clobber of
regno_reg_rtx[R].

The mode isn't important for single-register clobbers: clobbering a single
register in one mode is equivalent to clobbering it in another mode.
None of the register contents survive the clobber.
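
Concretely, the caller might do something like this (a minimal sketch
using GCC's regno_reg_rtx[] array; the variable names are illustrative,
not taken from the patch):

  /* Emit a clobber for every register the hook reported as zeroed.  */
  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    if (TEST_HARD_REG_BIT (zeroed_hardregs, regno))
      emit_clobber (regno_reg_rtx[regno]);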

>> (2) It indicates which registers should be treated as live on return.
>> 
>> FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.
>
> Instead of storing this info in crtl, in my current patch, I added the 
> following in "df-scan.c":
> +static HARD_REG_SET zeroed_reg_set;
>
> And routines that manipulate this HARD_REG_SET. 
> I think that this should serve the same purpose as storing it to crtl? 

Storing it in crtl is better for two reasons:

- Using global statics for this kind of thing makes it harder to
  compile functions in parallel.  (Work is underway to allow that.)

- Having the information in crtl reduces the risk that information
  from one function will get reused for another function, without the
  variable being reinitialised inbetween.

Thanks,
Richard


Re: [PATCH] OpenACC: Separate enter/exit data APIs

2020-09-25 Thread Andrew Stubbs

On 30/07/2020 12:10, Andrew Stubbs wrote:

On 29/07/2020 15:05, Andrew Stubbs wrote:
This patch does not implement anything new, but simply separates 
OpenACC 'enter data' and 'exit data' into two libgomp API functions.  
The original API name is kept for backward compatibility, but no 
longer referenced by the compiler.


The previous implementation assumed that it would always be possible 
to infer which kind of pragma it was dealing with from the context, 
but there are a few exceptions, and I want to add one more: 
zero-length arrays.


By cleaning this up I will be free to add the new feature without the 
reference counting getting broken.
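
For illustration, the zero-length case in question looks like this in
user code (a sketch; OpenACC uses zero-length array sections for
pointer attachment, so no data is copied but the mapping must still be
reference counted):

  #pragma acc enter data copyin(p[0:0])
  /* ... */
  #pragma acc exit data delete(p[0:0])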


This update fixes a new conflict and updates the patterns in a number of 
testcases that were affected.


OK to commit?

Andrew
OpenACC: Separate enter/exit data APIs

Move the OpenACC enter and exit data directives from using a single builtin
to having one each.  For most purposes it was easy to tell which was which,
from the directives given, but there are some exceptions.  In particular,
zero-length array copies are indistinguishable, but we still want reference
counting to work.

gcc/ChangeLog:

	* gimple-pretty-print.c (dump_gimple_omp_target): Replace
	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA with
	GF_OMP_TARGET_KIND_OACC_ENTER_DATA and
	GF_OMP_TARGET_KIND_OACC_EXIT_DATA.
	* gimple.h (enum gf_mask): Likewise.
	(is_gimple_omp_oacc): Likewise.
	* gimplify.c (gimplify_omp_target_update): Likewise.
	* omp-builtins.def (BUILT_IN_GOACC_ENTER_EXIT_DATA): Delete.
	(BUILT_IN_GOACC_ENTER_DATA): Add new.
	(BUILT_IN_GOACC_EXIT_DATA): Add new.
	* omp-expand.c (expand_omp_target): Replace
	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA with
	GF_OMP_TARGET_KIND_OACC_ENTER_DATA and
	GF_OMP_TARGET_KIND_OACC_EXIT_DATA.
	(build_omp_regions_1): Likewise.
	(omp_make_gimple_edges): Likewise.
	* omp-low.c (check_omp_nesting_restrictions): Likewise.
	(lower_omp_target): Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc-gomp/nesting-fail-1.c: Adjust patterns.
	* c-c++-common/goacc/finalize-1.c: Adjust patterns.
	* c-c++-common/goacc/mdc-1.c: Adjust patterns.
	* c-c++-common/goacc/nesting-fail-1.c: Adjust patterns.
	* c-c++-common/goacc/struct-enter-exit-data-1.c: Adjust patterns.

libgomp/ChangeLog:

	* libgomp.map: Add GOACC_enter_data and GOACC_exit_data.
	* libgomp_g.h (GOACC_enter_exit_data): Delete.
	(GOACC_enter_data): New prototype.
	(GOACC_exit_data): New prototype.
	* oacc-mem.c (GOACC_enter_exit_data): Move most of the content ...
	(GOACC_enter_exit_data_internal): ... here.
	(GOACC_enter_data): New function.
	(GOACC_exit_data): New function.
	* oacc-parallel.c (GOACC_declare): Replace GOACC_enter_exit_data with
	  GOACC_enter_data and GOACC_exit_data.
	* testsuite/libgomp.oacc-c-c++-common/lib-26.c: Delete file.
	* testsuite/libgomp.oacc-c-c++-common/lib-36.c: Delete file.
	* testsuite/libgomp.oacc-c-c++-common/lib-40.c: Delete file.

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index a01bf901657..26978ec1ab5 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1691,8 +1691,11 @@ dump_gimple_omp_target (pretty_printer *buffer, const gomp_target *gs,
 case GF_OMP_TARGET_KIND_OACC_UPDATE:
   kind = " oacc_update";
   break;
-case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
-  kind = " oacc_enter_exit_data";
+case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
+  kind = " oacc_enter_data";
+  break;
+case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
+  kind = " oacc_exit_data";
   break;
 case GF_OMP_TARGET_KIND_OACC_DECLARE:
   kind = " oacc_declare";
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 6cc7e66059d..3f17b1c0739 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -171,9 +171,10 @@ enum gf_mask {
 GF_OMP_TARGET_KIND_OACC_SERIAL = 7,
 GF_OMP_TARGET_KIND_OACC_DATA = 8,
 GF_OMP_TARGET_KIND_OACC_UPDATE = 9,
-GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 10,
+GF_OMP_TARGET_KIND_OACC_ENTER_DATA = 10,
 GF_OMP_TARGET_KIND_OACC_DECLARE = 11,
 GF_OMP_TARGET_KIND_OACC_HOST_DATA = 12,
+GF_OMP_TARGET_KIND_OACC_EXIT_DATA = 13,
 GF_OMP_TEAMS_HOST		= 1 << 0,
 
 /* True on an GIMPLE_OMP_RETURN statement if the return does not require
@@ -6482,7 +6483,8 @@ is_gimple_omp_oacc (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_UPDATE:
-	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
+	case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
+	case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
 	case GF_OMP_TARGET_KIND_OACC_DECLARE:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 	  return true;
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2dea03cce3d..8fcba8b5b18 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12976,8 +12976,11 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   switch (TREE_CODE (expr))
 {
 case OACC_ENTER_DATA:
+  kind = GF_OMP_TARGET_KIND_OACC_ENTER_DATA;
+  ort = ORT_ACC;
+   

Re: [PATCH 1/2] rs6000: Support _mm_insert_epi{8,32,64}

2020-09-25 Thread Peter Bergner via Gcc-patches
On 9/24/20 6:22 PM, Segher Boessenkool wrote:
>> +  result [(__N & 0b)] = __D;
> 
> Hrm, GCC supports binary constants like this since 2007, so okay.  But I
> have to wonder if this improves anything over hex (or decimal even!)
> The parens are superfluous (and only hinder legibility), fwiw.

+1 for using hex constants when using them with logical ops like '&'.
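
For illustration, the two spellings side by side (the mask below is
hypothetical, since the constant in the quoted line is truncated):

  result [__N & 0b11] = __D;   /* binary constant */
  result [__N & 0x3] = __D;    /* hex equivalent */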

Peter




[PATCH] New patch for the port of gccgo to GNU/Hurd

2020-09-25 Thread Svante Signell via Gcc-patches
Hello,

Latest Debian snapshot of gcc (20200917-1) FTBFS due to a missing hurd
entry in the // +build line of libgo/go/net/fd_posix.go. Attached is a
patch for that missing entry.

With it the latest Debian snapshot has been successfully built. Test
results for libgo and go are:

=== libgo Summary ===

# of expected passes		163
# of unexpected failures	12

=== go Summary ===

# of expected passes		7469
# of unexpected failures	10
# of expected failures		1
# of untested testcases		6
# of unsupported tests		2


Thanks!
--- a/src/libgo/go/net/fd_posix.go	2020-08-03 15:12:53.0 +0200
+++ b/src/libgo/go/net/fd_posix.go	2020-09-24 16:03:50.0 +0200
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris windows
+// +build aix darwin dragonfly freebsd hurd linux netbsd openbsd solaris windows
 
 package net
 


Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 7:53 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> Hi, Richard,
>> 
>> As you suggested, I added a default implementation of the target hook 
>> "zero_call_used_regs (HARD_REG_SET)" as follows in my latest patch
>> 
>> 
>> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>> 
>> void
>> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> 
> FWIW, I was suggesting to return the set of registers that are actually
> cleared too.  Here you have the hook emit the asm statement, but IMO the
> way we generate the asm for a given set of registers should be entirely
> target-independent, and happen outside the hook.
> 
> So the hook returning the set of cleared registers does two things:
> 
> (1) It indicates which registers should be clobbered by the asm
>(which would be generated after calling the hook, but emitted
>before the sequence of instructions generated by the hook).

For this purpose, this hook should return a set of RTXs that holds the cleared
registers; a HARD_REG_SET is not enough.

Since in the ASM_OPERANDS, we will need the RTX for the register (not the 
REGNO).

Which data structure in GCC should be used here to hold this returned value as
a set of RTXs?
> 
> (2) It indicates which registers should be treated as live on return.
> 
> FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.

Instead of storing this info in crtl, in my current patch, I added the 
following in "df-scan.c":
+static HARD_REG_SET zeroed_reg_set;

And routines that manipulate this HARD_REG_SET. 
I think that this should serve the same purpose as storing it to crtl? 

> Then the wrapper around EPILOGUE_USES that we talked about would
> check two things:
> 
> - EPILOGUE_USES itself
> - the crtl HARD_REG_SET
> 
> The crtl set would start out empty and remain empty unless the
> new option is used.

Yes, I did this for zeroed_reg_set in my current patch.
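
For illustration, such a wrapper might look like this (a sketch only;
the crtl field name below is hypothetical):

  /* True if REGNO must be treated as live on return, either because of
     EPILOGUE_USES itself or because it was zeroed on return.  */
  static bool
  epilogue_uses_p (unsigned int regno)
  {
    return (EPILOGUE_USES (regno)
	    || TEST_HARD_REG_BIT (crtl->zeroed_call_used_regs, regno));
  }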
> 
>>if (zero_rtx[(int)mode] == NULL_RTX)
>>  {
>>zero_rtx[(int)mode] = reg;
>>tmp = gen_rtx_SET (reg, const0_rtx);
>>emit_insn (tmp);
>>  }
>>else
>>  emit_move_insn (reg, zero_rtx[(int)mode]);
> 
> Hmm, OK, so you're assuming that it's better to zero one register
> and reuse that register for later moves.  I guess this is my RISC
> background/bias showing, but I think it might be simpler to assume
> that zeroing is as cheap as a register move.  The danger with reusing
> earlier registers is that you might introduce a cross-bank move,
> and some targets can only do those via memory.
Okay, I will move zeroes to registers.
> 
> Or perhaps we could use insn_cost to choose between them.  But I think
> the first implementation can just zero each register individually,
> unless we already know of a specific case in which reusing registers
> is necessary.

The current x86 implementation uses a register move instead of moving zero
directly into the register; I guess the register move is cheaper on x86.
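
For reference, a minimal sketch of that per-register scheme (the
variable name follows the hook's parameter; details are illustrative):

  /* Zero each selected register directly; no reuse of earlier zeros.  */
  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
      {
	machine_mode mode = reg_raw_mode[regno];
	emit_move_insn (gen_rtx_REG (mode, regno), CONST0_RTX (mode));
      }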
> 
>> I tested this default implementation on aarch64 with a small testing case, 
>> -fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well, 
>> however, 
>> -fzero-call-used-regs=all-arg or -fzero-call-used-regs=all have an internal 
>> compiler error as follows:
>> 
>> t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
>>   15 | }
>>  | ^
>> 0xcff58b gen_highpart(machine_mode, rtx_def*)
>>  ../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
>> 0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
>>  ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
>> 0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
>>  ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394
>> 
>> As I studied today, I found the major issue for this bug is because the 
>> following statement:
>> 
>>machine_mode mode = reg_raw_mode[regno];
>> 
>> “reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector 
>> register on aarch64); as a result, the zeroing insn for this register is:
>> 
>> (insn 112 111 113 7 (set (reg:TI 32 v0)
>>(const_int 0 [0])) "t1.c":15:1 -1
>> (nil))
>> 
>> 
>> However, it looks like the above RTL has to be split into two sub-register
>> moves on aarch64, and the splitting has some issue.
>> 
>> So, I guess that on aarch64, zeroing vector registers might need other modes 
>> than the one returned by “reg_raw_mode”.  
>> 
>> My questions are:
>> 
>> 1. Is there another available utility routine that returns the proper MODE 
>> for the hard registers that can be readily used to zero the hard register?
>> 2. If not, should I add one more target hook for this purpose? i.e 
>> 
>> /* Return the proper machine mode that can be used to zero this hard 
>> register specified by REGNO.  */
>> machine_mode zero-call-used-regs-mode (unsigned int REGNO)
> 
> Thanks for testing aarch64.  I think there are two issues here,
> one in the 

Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-09-25 Thread Martin Liška

PING^5

On 7/21/20 6:24 PM, Qing Zhao wrote:

PING^4.

Our company is waiting for this patch to be committed to upstream.

Thanks a lot.

Qing


On Jun 16, 2020, at 7:49 AM, Martin Liška  wrote:

PING^3

On 6/2/20 11:16 AM, Martin Liška wrote:

PING^2
On 5/15/20 11:58 AM, Martin Liška wrote:

We're in stage1: PING^1

On 4/3/20 8:15 PM, Egeyar Bagcioglu wrote:



On 3/18/20 10:05 AM, Martin Liška wrote:

On 3/17/20 7:43 PM, Egeyar Bagcioglu wrote:

Hi Martin,

I like the patch. It definitely serves our purposes at Oracle and provides 
another way to do what my previous patches did as well.

1) It keeps the backwards compatibility regarding -frecord-gcc-switches; 
therefore, removes my related doubts about your previous patch.

2) It still makes use of -frecord-gcc-switches. The new option is only to 
control the format. This addresses some previous objections to having a new 
option doing something similar. Now the new option controls the behaviour of 
the existing one and that behaviour can be further extended.

3) It uses an environment variable as Jakub suggested.

The patch looks good and I confirm that it works for our purposes.


Hello.

Thank you for the support.



Having said that, I have to ask for recognition in this patch for my and my 
company's contributions. Can you please keep my name and my work email in the 
changelog and in the commit message?


Sure, sorry I forgot.


Hi Martin,

I noticed that some comments in the patch were still referring to 
--record-gcc-command-line, the option I suggested earlier. I updated those 
comments to mention -frecord-gcc-switches-format instead and also added my name 
to the patch as you agreed above. I attached the updated patch. We are starting 
to use this patch in the specific domain where we need its functionality.

Regards
Egeyar




Martin



Thanks
Egeyar



On 3/17/20 2:53 PM, Martin Liška wrote:

Hi.

I'm sending enhanced patch that makes the following changes:
- a new option -frecord-gcc-switches-format is added; the option
   selects format (processed, driver) for all options that record
   GCC command line
- Dwarf gen_produce_string is now used in -fverbose-asm
- The .s file is affected in the following way:

BEFORE:

# GNU C17 (SUSE Linux) version 9.2.1 20200128 [revision 
83f65674e78d97d27537361de1a9d74067ff228d] (x86_64-suse-linux)
# compiled by GNU C version 9.2.1 20200128 [revision 
83f65674e78d97d27537361de1a9d74067ff228d], GMP version 6.2.0, MPFR version 
4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP

# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -fpreprocessed test.i -march=znver1 -mmmx -mno-3dnow
# -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -msha
# -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx
# -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1
# -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw
# -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
# -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves
# -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi
# -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero -mno-pku
# -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni
# -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri -mno-movdir64b
# -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32
# --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=znver1
# -grecord-gcc-switches -g -fverbose-asm -frecord-gcc-switches
# options enabled:  -faggressive-loop-optimizations -fassume-phsa
# -fasynchronous-unwind-tables -fauto-inc-dec -fcommon
# -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
# -feliminate-unused-debug-types -ffp-int-builtin-inexact -ffunction-cse
# -fgcse-lm -fgnu-runtime -fgnu-unique -fident -finline-atomics
# -fipa-stack-alignment -fira-hoist-pressure -fira-share-save-slots
# -fira-share-spill-slots -fivopts -fkeep-static-consts
# -fleading-underscore -flifetime-dse -flto-odr-type-merging -fmath-errno
# -fmerge-debug-strings -fpeephole -fplt -fprefetch-loop-arrays
# -frecord-gcc-switches -freg-struct-return -fsched-critical-path-heuristic
# -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
# -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
# -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fschedule-fusion
# -fsemantic-interposition -fshow-column -fshrink-wrap-separate
# -fsigned-zeros -fsplit-ivs-in-unroller -fssa-backprop -fstdarg-opt
# -fstrict-volatile-bitfields -fsync-libcalls -ftrapping-math -ftree-cselim
# -ftree-forwprop -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
# -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop
# -ftree-reassoc -ftree-scev-cprop -funit-at-a-time -funwind-tables
# -fverbose-asm -fzero-initialized-in-bss -m128bit-long-double -m64 -m80387
# -mabm -madx -maes 

Re: Issue with ggc_delete and finalizers (was Re: New modref/ipa_modref optimization passes)

2020-09-25 Thread David Malcolm via Gcc-patches
On Thu, 2020-09-24 at 08:30 +0200, Jan Hubicka wrote:
> Hi,
> This patch makes ggc_delete be paired with ggc_alloc_no_dtor.
> I copy the same scheme used by Martin in ipa-fnsummary, that is,
> creating a static member function create_ggc hiding the ugly bits
> and using it in ipa-modref.c.
> 
> I also noticed that modref-tree leaked memory in its
> destruction/collapse method, and fixed that.
> 
> Bootstrapped/regtested x86_64-linux.
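
For context, the create_ggc idiom borrowed from ipa-fnsummary looks
roughly like this (a sketch based on the description above, not the
committed code):

  /* Allocate the summary in GC memory without registering a finalizer,
     so it can later be freed with ggc_delete.  */
  static modref_summaries *
  create_ggc (symbol_table *symtab)
  {
    return new (ggc_alloc_no_dtor <modref_summaries> ())
	   modref_summaries (symtab);
  }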

It looks like you committed this as
c9da53d6987af5f8ff68b58dd76a9fbc900a6a21.

This appears to fix the issues seen with the GC with jit
(PR jit/97169).

With the previous commit, jit.sum had:

# of expected passes		5751
# of unexpected failures	64
# of unresolved testcases	1

with a number of SIGSEGV showing up in the FAIL reports, whereas with
c9da53d6987af5f8ff68b58dd76a9fbc900a6a21, jit.sum is restored to:

# of expected passes		10854

Thanks!
Dave




[PATCH v2 16/16] Testsuite: Add initial tests for NEON (incomplete)

2020-09-25 Thread Tamar Christina
Hi All,

These are just initial testcases to show what the patch is testing for;
however, the set is incomplete and I am working on a better test setup
to test all targets and to add middle-end tests.

These were just included for completeness.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex-autovec.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex-autovec.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_3.c: New test.

-- 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c
new file mode 100644
index ..8f660f392153c3a6a83b31486e275be316c6ad2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c
@@ -0,0 +1,13 @@
+/* { dg-skip-if "" { *-*-* } } */
+
+#define N 200
+
+__attribute__ ((noinline))
+void calc (TYPE a[N], TYPE b[N], TYPE *c)
+{
+  for (int i=0; i < N; i+=2)
+{
+  c[i] = a[i] + b[i+1];
+  c[i+1] = a[i+1] - b[i];
+}
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c
new file mode 100644
index ..14014b9d4f2c41e75be3e253d2e47e639e4224c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c
@@ -0,0 +1,12 @@
+/* { dg-skip-if "" { *-*-* } } */
+#define N 200
+
+__attribute__ ((noinline))
+void calc (TYPE a[N], TYPE b[N], TYPE *c)
+{
+  for (int i=0; i < N; i+=2)
+{
+  c[i] = a[i] - b[i+1];
+  c[i+1] = a[i+1] + b[i];
+}
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c
new file mode 100644
index ..997d9065504a9a16d3ea1316f7ea4208b3516c55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
+/* { dg-require-effective-target vect_complex_rot_df } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define TYPE double
+#include "vcadd-arrays-autovec-90.c"
+
+extern void abort(void);
+
+int main()
+{
+  TYPE a[N] = {1.0, 2.0, 3.0, 4.0};
+  TYPE b[N] = {4.0, 2.0, 1.5, 4.5};
+  TYPE c[N] = {0};
+  calc (a, b, c);
+
+  if (c[0] != -1.0 || c[1] != 6.0)
+abort ();
+
+  if (c[2] != -1.5 || c[3] != 5.5)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {fcadd\tv[0-9]+\.2d, v[0-9]+\.2d, v[0-9]+\.2d, #90} 1 { target { aarch64*-*-* } } } } */
+/* { dg-final { scan-assembler-not {vcadd\.} { target { arm*-*-* } } } } */
+
diff --git 

[PATCH v2 15/16]Arm: Add MVE RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex operations.  With this, the
following C code:

  void f90 (int _Complex a[restrict N], int _Complex b[restrict N],
int _Complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  .L3:
  mov r3, r0
  vldrw.32  q2, [r3]
  mov r3, r1
  vldrw.32  q1, [r3]
  mov r3, r2
  vcadd.i32   q3, q2, q1, #90
  adds    r0, r0, #16
  vstrw.32  q3, [r3]
  adds    r1, r1, #16
  adds    r2, r2, #16
  le  lr, .L3
  pop {r4, r5, r6, r7, r8, pc}

which is not ideal due to register allocation and addressing mode issues with
MVE in general.  However, -frename-registers cleans up the register allocation:

  .L3:
  mov r5, r0
  mov r6, r1
  vldrw.32  q2, [r5]
  vldrw.32  q1, [r6]
  mov r7, r2
  vcadd.i32   q3, q2, q1, #90
  adds    r0, r0, #16
  vstrw.32  q3, [r7]
  adds    r1, r1, #16
  adds    r2, r2, #16
  le  lr, .L3
  pop {r4, r5, r6, r7, r8, pc}

but leaves the addressing mode problems.

Before this patch it generated a scalar loop

  .L2:
  ldr r7, [r0, r3, lsl #2]
  ldr r5, [r6, r3, lsl #2]
  ldr r4, [r1, r3, lsl #2]
  subs    r5, r7, r5
  ldr r7, [lr, r3, lsl #2]
  add r4, r4, r7
  str r5, [r2, r3, lsl #2]
  str r4, [ip, r3, lsl #2]
  adds    r3, r3, #2
  cmp r3, #200
  bne .L2
  pop {r4, r5, r6, r7, pc}



Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
Cross compiled arm-none-eabi and ran with -march=armv8.1-m.main+mve.fp
-mfloat-abi=hard -mfpu=auto and regression is on-going.

Unfortunately, auto-vectorization of floating-point values is not currently
implemented for MVE.  As such I cannot test this directly.  But since they share 90%
of the code with NEON these should just work whenever support is added so I
would still like to commit these.

To support this I had to refactor the MVE bits a bit.  This now uses the same
unspecs for both NEON and MVE and removes the unneeded different signed and
unsigned unspecs since they both point to the signed instruction.

I have tried multiple approaches to cleaning this up but I think this is the
nicest it can get given the slight ISA differences.

Ok for master if no issues?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8,
__arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8,
__arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16, __arm_vcaddq_rot90_s16,
__arm_vcaddq_rot270_s16, __arm_vcaddq_rot90_u32,
__arm_vcaddq_rot270_u32, __arm_vcaddq_rot90_s32,
__arm_vcaddq_rot270_s32, __arm_vcmulq_rot90_f16,
__arm_vcmulq_rot270_f16, __arm_vcmulq_rot180_f16,
__arm_vcmulq_f16, __arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16,
__arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32,
__arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcaddq_rot90_f32,
__arm_vcaddq_rot270_f32, __arm_vcmlaq_f16, __arm_vcmlaq_rot180_f16,
__arm_vcmlaq_rot270_f16, __arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32,
__arm_vcmlaq_rot180_f32, __arm_vcmlaq_rot270_f32,
__arm_vcmlaq_rot90_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u,
vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f,
vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f,
vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
(vcaddq_rot90, vcaddq_rot270, vcmulq, vcmulq_rot90, vcmulq_rot180,
vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270):
New.
* config/arm/constraints.md (Dz): Include MVE.
* config/arm/iterators.md (mve_rotsplit1, mve_rotsplit2): New.
* config/arm/mve.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S, VCADDQ_ROT270_U,
VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F, VCMULQ_F,
VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F, VCMLAQ_F,
VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F, VCADDQ_ROT270_S,
VCADDQ_ROT270, VCADDQ_ROT90): Removed.
(mve_rot, VCMUL): New.
(mve_vcaddq_rot270_,
mve_vcaddq_rot270_f, mve_vcaddq_rot90_f, mve_vcmulq_f, mve_vcmulq_rot270_f,
mve_vcmulq_rot90_f, mve_vcmlaq_f, mve_vcmlaq_rot180_f,
mve_vcmlaq_rot270_f, mve_vcmlaq_rot90_f): Removed.
(mve_vcmlaq, mve_vcmulq,
mve_vcaddq, cadd3, mve_vcaddq):
New.
* config/arm/neon.md (cadd3, cml4):
Moved.
(cmul3): Exclude MVE types.
* config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270): New.
* config/arm/vec-common.md 

[PATCH v2 13/16]Arm: Add support for auto-vectorization using HF mode.

2020-09-25 Thread Tamar Christina
Hi All,

This adds support to the auto-vectorizer for HFmode vectorization on
AArch32.  This is available when +fp16 is used.  I wonder if I should avoid
returning the type when the option isn't enabled.

At the moment it will be returned but the vectorizer will try and fail to use
it.  It wastes a few compile cycles but doesn't result in bad code.
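
For reference, gating the returned mode on the option might look
something like this (a sketch; TARGET_NEON_FP16INST is my assumption
for the right condition, not taken from the patch):

      case E_HFmode:
	/* Only advertise HF vectors when the fp16 arithmetic
	   instructions are actually available.  */
	if (!TARGET_NEON_FP16INST)
	  break;
	return TARGET_NEON_VECTORIZE_DOUBLE ? V4HFmode : V8HFmode;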

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm.c (arm_preferred_simd_mode): Add E_HFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/vect-half-floats.c: New test.

-- 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 022ef6c3f1d723bdf421268c81cd0c759c414d9a..8ca6b913fddb74cd6f4867efc0a7264184c59db0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28917,6 +28917,8 @@ arm_preferred_simd_mode (scalar_mode mode)
   if (TARGET_NEON)
 switch (mode)
   {
+  case E_HFmode:
+	return TARGET_NEON_VECTORIZE_DOUBLE ? V4HFmode : V8HFmode;
   case E_SFmode:
 	return TARGET_NEON_VECTORIZE_DOUBLE ? V2SFmode : V4SFmode;
   case E_SImode:
diff --git a/gcc/testsuite/gcc.target/arm/vect-half-floats.c b/gcc/testsuite/gcc.target/arm/vect-half-floats.c
new file mode 100644
index ..ebfe7f964442a09053b0cbe04bed425e36b0af96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-half-floats.c
@@ -0,0 +1,14 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target target_float16 } */ 
+/* { dg-require-effective-target arm_fp16_ok } */
+/* { dg-add-options for_float16 } */
+/* { dg-additional-options "-Ofast -ftree-vectorize -fdump-tree-vect-all -std=c11" } */
+
+void foo (_Float16 n1[], _Float16 n2[], _Float16 r[], int n)
+{
+  for (int i = 0; i < n; i++)
+   r[i] = n1[i] + n2[i];
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+



[PATCH v2 12/16]AArch64: Add SVE2 Integer RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex operations.  With this, the
following C code:

  void f90 (int _Complex a[restrict N], int _Complex b[restrict N],
int _Complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
  mov x3, 0
  mov x4, 200
  whilelo p0.s, xzr, x4
  .p2align 3,,7
  .L2:
  ld1w    z0.s, p0/z, [x0, x3, lsl 2]
  ld1w    z1.s, p0/z, [x1, x3, lsl 2]
  cadd    z0.s, z0.s, z1.s, #90
  st1w    z0.s, p0, [x2, x3, lsl 2]
  incw    x3
  whilelo p0.s, x3, x4
  b.any   .L2
  ret

instead of

  f90:
  mov x3, 0
  mov x4, 0
  mov w5, 100
  whilelo p0.s, wzr, w5
  .p2align 3,,7
  .L2:
  ld2w    {z4.s - z5.s}, p0/z, [x0, x3, lsl 2]
  ld2w    {z2.s - z3.s}, p0/z, [x1, x3, lsl 2]
  sub z0.s, z4.s, z3.s
  add z1.s, z5.s, z2.s
  st2w    {z0.s - z1.s}, p0, [x2, x3, lsl 2]
  incw    x4
  inch    x3
  whilelo p0.s, w4, w5
  b.any   .L2
  ret

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md (cadd3,
cml4, cmul3): New.
* config/aarch64/iterators.md (SVE2_INT_CMLA_OP, SVE2_INT_CMUL_OP,
SVE2_INT_CADD_OP): New.

-- 
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index e18b9fef16e72496588fb5850e362da4ae42898a..e601c6a4586e3ed1e11aedf047f56d556a99a302 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1774,6 +1774,16 @@ (define_insn "@aarch64_sve_"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+(define_expand "cadd3"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(unspec:SVE_FULL_I
+	  [(match_operand:SVE_FULL_I 1 "register_operand")
+	   (match_operand:SVE_FULL_I 2 "register_operand")]
+	  SVE2_INT_CADD_OP))]
+  "TARGET_SVE2"
+)
+
 ;; -
 ;;  [INT] Complex ternary operations
 ;; -
@@ -1813,6 +1823,47 @@ (define_insn "@aarch64__lane_"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand")
+	  (unspec:SVE_FULL_I
+	[(match_operand:SVE_FULL_I 2 "register_operand")
+	 (match_operand:SVE_FULL_I 3 "register_operand")]
+	SVE2_INT_CMLA_OP)))]
+  "TARGET_SVE2"
+{
+  emit_insn (gen_aarch64_sve_cmla (operands[0], operands[1],
+		   operands[2], operands[3]));
+  emit_insn (gen_aarch64_sve_cmla (operands[0], operands[0],
+		   operands[2], operands[3]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(unspec:SVE_FULL_I
+	  [(match_operand:SVE_FULL_I 1 "register_operand")
+	   (match_operand:SVE_FULL_I 2 "register_operand")
+	   (match_dup 3)]
+	  SVE2_INT_CMUL_OP))]
+  "TARGET_SVE2"
+{
+  operands[3] = force_reg (mode, CONST0_RTX (mode));
+  emit_insn (gen_aarch64_sve_cmla (operands[0], operands[3],
+		   operands[1], operands[2]));
+  emit_insn (gen_aarch64_sve_cmla (operands[0], operands[0],
+		   operands[1], operands[2]));
+  DONE;
+})
+
 ;; -
 ;;  [INT] Complex dot product
 ;; -
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 7662b929e2c4f6c103cc06e051eb574247320809..c11e976237d30771a7bd7c7fb56922f9c5c785de 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2583,6 +2583,23 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA
 UNSPEC_SQRDCMLAH180
 UNSPEC_SQRDCMLAH270])
 
+;; Unlike the normal CMLA instructions these represent the actual operation
+;; to be performed.  They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA
+   UNSPEC_CMLA180
+   UNSPEC_CMLS])
+
+;; Unlike the normal CMLA instructions these represent the actual operation
+;; to be performed.  They will 

[PATCH v2 14/16]Arm: Add NEON RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex additions.  With this, the
following C code:

  void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
  add r3, r2, #1600
  .L2:
  vld1.32 {q8}, [r0]!
  vld1.32 {q9}, [r1]!
  vcadd.f32   q8, q8, q9, #90
  vst1.32 {q8}, [r2]!
  cmp r3, r2
  bne .L2
  bx  lr


instead of

  f90:
  add r3, r2, #1600
  .L2:
  vld2.32 {d24-d27}, [r0]!
  vld2.32 {d20-d23}, [r1]!
  vsub.f32  q8, q12, q11
  vadd.f32  q9, q13, q10
  vst2.32 {d16-d19}, [r2]!
  cmp r3, r2
  bne .L2
  bx  lr
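
For reference, the lane arithmetic behind the #90 rotation, computing
c = a + b * I on interleaved {real, imaginary} data (a sketch):

  /* Per complex element:  */
  c_re = a_re - b_im;
  c_im = a_im + b_re;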


Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/iterators.md (rot): Add UNSPEC_VCMLS, UNSPEC_VCMUL and
UNSPEC_VCMUL180.
(rot_op, rotsplit1, rotsplit2, fcmac1, VCMLA_OP, VCMUL_OP): New.
* config/arm/neon.md (cadd3, cml4,
cmul3): New.
* config/arm/unspecs.md (UNSPEC_VCMUL, UNSPEC_VCMUL180, UNSPEC_VCMLS,
UNSPEC_VCMLS180): New.

-- 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722689aff4c1a143e952f6eb91c0cd86..f5693c0524274da1eb1c767713574c01ec6d544c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1146,10 +1146,38 @@ (define_int_attr crypto_mode [(UNSPEC_SHA1H "V4SI") (UNSPEC_AESMC "V16QI")
 
 (define_int_attr rot [(UNSPEC_VCADD90 "90")
 		  (UNSPEC_VCADD270 "270")
+		  (UNSPEC_VCMLS "0")
 		  (UNSPEC_VCMLA "0")
 		  (UNSPEC_VCMLA90 "90")
 		  (UNSPEC_VCMLA180 "180")
-		  (UNSPEC_VCMLA270 "270")])
+		  (UNSPEC_VCMLA270 "270")
+		  (UNSPEC_VCMUL "0")
+		  (UNSPEC_VCMUL180 "180")])
+
+;; A conjugate is a rotation of 180 degrees around the Argand plane, or * I.
+(define_int_attr rot_op [(UNSPEC_VCMLS "")
+			 (UNSPEC_VCMLS180 "_conj")
+			 (UNSPEC_VCMLA "")
+			 (UNSPEC_VCMLA180 "_conj")
+			 (UNSPEC_VCMUL "")
+			 (UNSPEC_VCMUL180 "_conj")])
+
+(define_int_attr rotsplit1 [(UNSPEC_VCMLA "0")
+			(UNSPEC_VCMLA180 "0")
+			(UNSPEC_VCMUL "0")
+			(UNSPEC_VCMUL180 "0")
+			(UNSPEC_VCMLS "270")
+			(UNSPEC_VCMLS180 "90")])
+
+(define_int_attr rotsplit2 [(UNSPEC_VCMLA "90")
+			(UNSPEC_VCMLA180 "270")
+			(UNSPEC_VCMUL "90")
+			(UNSPEC_VCMUL180 "270")
+			(UNSPEC_VCMLS "180")
+			(UNSPEC_VCMLS180 "180")])
+
+(define_int_attr fcmac1 [(UNSPEC_VCMLA "a") (UNSPEC_VCMLA180 "a")
+			 (UNSPEC_VCMLS "s") (UNSPEC_VCMLS180 "s")])
 
 (define_int_attr simd32_op [(UNSPEC_QADD8 "qadd8") (UNSPEC_QSUB8 "qsub8")
 			(UNSPEC_SHADD8 "shadd8") (UNSPEC_SHSUB8 "shsub8")
@@ -1256,3 +1284,12 @@ (define_int_attr bt [(UNSPEC_BFMAB "b") (UNSPEC_BFMAT "t")])
 
 ;; An iterator for CDE MVE accumulator/non-accumulator versions.
 (define_int_attr a [(UNSPEC_VCDE "") (UNSPEC_VCDEA "a")])
+
+;; Define iterators for VCMLA operations
+(define_int_iterator VCMLA_OP [UNSPEC_VCMLA
+			   UNSPEC_VCMLA180
+			   UNSPEC_VCMLS])
+
+;; Define iterators for VCMLA operations as MUL
+(define_int_iterator VCMUL_OP [UNSPEC_VCMUL
+			   UNSPEC_VCMUL180])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7b51d8ab60007901392df0ca1cb09fead4d0e9..1611bcea1ba8cb416d27368e4dc39ce15b3a4cd8 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3217,6 +3217,14 @@ (define_insn "neon_vcadd"
   [(set_attr "type" "neon_fcadd")]
 )
 
+(define_expand "cadd3"
+  [(set (match_operand:VF 0 "register_operand")
+	(unspec:VF [(match_operand:VF 1 "register_operand")
+		(match_operand:VF 2 "register_operand")]
+		VCADD))]
+  "TARGET_COMPLEX"
+)
+
 (define_insn "neon_vcmla"
   [(set (match_operand:VF 0 "register_operand" "=w")
 	(plus:VF (match_operand:VF 1 "register_operand" "0")
@@ -3274,6 +3282,43 @@ (define_insn "neon_vcmlaq_lane"
 )
 
 
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:VF 0 "register_operand")
+	(plus:VF (match_operand:VF 1 "register_operand")
+		 (unspec:VF [(match_operand:VF 2 "register_operand")
+			 (match_operand:VF 3 "register_operand")]
+			 VCMLA_OP)))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_neon_vcmla (operands[0], operands[1],
+	  operands[2], operands[3]));
+  emit_insn (gen_neon_vcmla (operands[0], operands[0],
+	  operands[2], operands[3]));
+  DONE;
+})
+
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul3"
+  

[PATCH v2 11/16]AArch64: Add SVE RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex operations.  With this, the
following C code:

  void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
  mov x3, 0
  mov x4, 400
  ptrue   p1.b, all
  whilelo p0.s, xzr, x4
  .p2align 3,,7
  .L2:
  ld1w    z0.s, p0/z, [x0, x3, lsl 2]
  ld1w    z1.s, p0/z, [x1, x3, lsl 2]
  fcadd   z0.s, p1/m, z0.s, z1.s, #90
  st1w    z0.s, p0, [x2, x3, lsl 2]
  incw    x3
  whilelo p0.s, x3, x4
  b.any   .L2
  ret

instead of

  f90:
  mov x3, 0
  mov x4, 0
  mov w5, 200
  whilelo p0.s, wzr, w5
  .p2align 3,,7
  .L2:
  ld2w    {z4.s - z5.s}, p0/z, [x0, x3, lsl 2]
  ld2w    {z2.s - z3.s}, p0/z, [x1, x3, lsl 2]
  fsub    z0.s, z4.s, z3.s
  fadd    z1.s, z2.s, z5.s
  st2w    {z0.s - z1.s}, p0, [x2, x3, lsl 2]
  incw    x4
  inch    x3
  whilelo p0.s, w4, w5
  b.any   .L2
  ret

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (cadd3,
cml4, cmul3): New.
* config/aarch64/iterators.md (sve_rot1, sve_rot2): New.

-- 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index cd79aba90ec9cdb5da9e9758495015ef36b2d869..12bc8077994f5a130ff4af6e9bfa7ca1237d0868 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -5109,6 +5109,20 @@ (define_expand "@cond_"
   "TARGET_SVE"
 )
 
+;; Predicated FCADD using ptrue for unpredicated optab for auto-vectorizer
+(define_expand "@cadd3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 3)
+	   (const_int SVE_RELAXED_GP)
+	   (match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")]
+	  SVE_COND_FCADD))]
+  "TARGET_SVE"
+{
+  operands[3] = aarch64_ptrue_reg (mode);
+})
+
 ;; Predicated FCADD, merging with the first input.
 (define_insn_and_rewrite "*cond__2"
   [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w, ?")
@@ -6554,6 +6568,62 @@ (define_insn "@aarch64_pred_"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 4)
+	   (match_dup 5)
+	   (match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")
+	   (match_operand:SVE_FULL_F 3 "register_operand")]
+	  FCMLA_OP))]
+  "TARGET_SVE"
+{
+  operands[4] = aarch64_ptrue_reg (mode);
+  operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode);
+  emit_insn (
+gen_aarch64_pred_fcmla (operands[0], operands[4],
+	operands[1], operands[2],
+	operands[3], operands[5]));
+  emit_insn (
+gen_aarch64_pred_fcmla (operands[0], operands[4],
+	operands[0], operands[2],
+	operands[3], operands[5]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 3)
+	   (match_dup 4)
+	   (match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")
+	   (match_dup 5)]
+	  FCMUL_OP))]
+  "TARGET_SVE"
+{
+  operands[3] = aarch64_ptrue_reg (mode);
+  operands[4] = gen_int_mode (SVE_RELAXED_GP, SImode);
+  operands[5] = force_reg (mode, CONST0_RTX (mode));
+  emit_insn (
+gen_aarch64_pred_fcmla (operands[0], operands[3], operands[1],
+	operands[2], operands[5], operands[4]));
+  emit_insn (
+gen_aarch64_pred_fcmla (operands[0], operands[3], operands[1],
+	operands[2], operands[0],
+	operands[4]));
+  DONE;
+})
+
 ;; Predicated FCMLA with merging.
 (define_expand "@cond_"
   [(set (match_operand:SVE_FULL_F 0 "register_operand")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 98217c9fd3ee2b6063f7564193e400e9ef71c6ac..7662b929e2c4f6c103cc06e051eb574247320809 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3443,6 +3443,35 @@ (define_int_attr rotsplit2 [(UNSPEC_FCMLA "90")
 			(UNSPEC_FCMLS "180")
 			(UNSPEC_FCMLS180 "180")])
 
+;; SVE has slightly different namings 

[PATCH v2 8/16]middle-end: add Complex Multiply and Accumulate/Subtract and Multiply and Accumulate/Subtract with Conjugate detection

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds pattern detection for the following operations:

  Complex FMLA, Conjugate FMLA of the second parameter, and FMLS.

c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)

  For the conjugate cases, under fast-math, the operand being conjugated may be
  flipped by swapping the arguments to the optab.  This allows it to support
  c = conj (a) * b and c += conj (a) * b.

  where a, b and c are complex numbers.
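
For reference, the scalar form of the detected operation on interleaved
{real, imaginary} data, e.g. c += a * b (a sketch):

  for (int i = 0; i < N; i += 2)
    {
      c[i]   += a[i] * b[i]   - a[i+1] * b[i+1];  /* real part */
      c[i+1] += a[i] * b[i+1] + a[i+1] * b[i];    /* imaginary part */
    }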

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/md.texi: Document optabs.
* internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ, COMPLEX_FMS,
COMPLEX_FMS_CONJ): New.
* optabs.def (cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab):
New.
* tree-vect-slp-patterns.c (class ComplexFMAPattern): New.
(slp_patterns): Add ComplexFMAPattern.

-- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index ddaf1abaccbd44dae11ea902ec38b474aacfb8e1..d8142f745050d963e8d15c7793fae06d9ad02020 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6143,6 +6143,50 @@ rotations @var{m} of 90 or 270.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cmla@var{m}4} instruction pattern
+@item @samp{cmla@var{m}4}
+Perform a vector floating point multiply and accumulate of complex numbers
+in operand 0, operand 1 and operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmla_conj@var{m}4} instruction pattern
+@item @samp{cmla_conj@var{m}4}
+Perform a vector floating point multiply and accumulate of complex numbers
+in operand 0, operand 1 and the conjugate of operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmls@var{m}4} instruction pattern
+@item @samp{cmls@var{m}4}
+Perform a vector floating point multiply and subtract of complex numbers
+in operand 0, operand 1 and operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmls_conj@var{m}4} instruction pattern
+@item @samp{cmls_conj@var{m}4}
+Perform a vector floating point multiply and subtract of complex numbers
+in operand 0, operand 1 and the conjugate of operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{cmul@var{m}4} instruction pattern
 @item @samp{cmul@var{m}4}
 Perform a vector floating point multiplication of complex numbers in operand 0
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 51bebf8701af262b22d66d19a29a8dafb74db1f0..cc0135cb2c1c14b593181edeaa5f896fa6c4c659 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -286,6 +286,10 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
 
 /* Ternary math functions.  */
 DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, cmla_conj, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST, cmls_conj, ternary)
 
 /* Unary integer ops.  */
 DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 9c267d422478d0011f288b1f5f62daabe3989ba7..19db9c00896cd08adfd20a01669990bbbebd79f1 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -294,6 +294,10 @@ OPTAB_D (cadd90_optab, "cadd90$a3")
 OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cmul_optab, "cmul$a3")
 OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
+OPTAB_D (cmla_optab, "cmla$a4")
+OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
+OPTAB_D (cmls_optab, "cmls$a4")
+OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index bef7cc73b21c020e4c0128df5d186a034809b103..d9554aaaf2cce14bb5b9c68e6141ea7f555a35de 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -916,6 +916,199 @@ class ComplexMulPattern : public ComplexMLAPattern
 }
 };
 
+class ComplexFMAPattern : public ComplexMLAPattern
+{
+  protected:
+ComplexFMAPattern (slp_tree node, vec_info *vinfo)
+  : ComplexMLAPattern (node, vinfo)
+{
+  this->m_arity = 2;
+  this->m_num_args = 3;
+  this->m_vects.create (0);
+  this->m_defs.create (0);
+}
+
+  public:
+

[PATCH v2 10/16]AArch64: Add NEON RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex operations.  With this, the
following C code:

  void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
  mov x3, 0
  .p2align 3,,7
  .L2:
  ldr q0, [x0, x3]
  ldr q1, [x1, x3]
  fcadd   v0.4s, v0.4s, v1.4s, #90
  str q0, [x2, x3]
  add x3, x3, 16
  cmp x3, 1600
  bne .L2
  ret

instead of

  f90:
  add x3, x1, 1600
  .p2align 3,,7
  .L2:
  ld2 {v4.4s - v5.4s}, [x0], 32
  ld2 {v2.4s - v3.4s}, [x1], 32
  fsub    v0.4s, v4.4s, v3.4s
  fadd    v1.4s, v5.4s, v2.4s
  st2 {v0.4s - v1.4s}, [x2], 32
  cmp x3, x1
  bne .L2
  ret

It defines a new iterator, VALL_ARITH, which contains the types for which we
can do general arithmetic (it excludes bfloat16).
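
For reference, the reason the cml/cmul expanders always emit two
instructions: FCMLA with rotations #0 and #90 each compute half of the
complex product (a sketch of the lane arithmetic per the Armv8.3-A
FCMLA definition):

  /* FCMLA #0  */  acc_re += a_re * b_re;  acc_im += a_re * b_im;
  /* FCMLA #90 */  acc_re -= a_im * b_im;  acc_im += a_im * b_re;
  /* Together: acc += a * b, a full complex multiply-accumulate.  */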

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cadd3,
cml4, cmul3): New.
* config/aarch64/iterators.md (VALL_ARITH, UNSPEC_FCMUL,
UNSPEC_FCMUL180, UNSPEC_FCMLS, UNSPEC_FCMLS180, UNSPEC_CMLS,
UNSPEC_CMLS180, UNSPEC_CMUL, UNSPEC_CMUL180, FCMLA_OP, FCMUL_OP, rot_op,
rotsplit1, rotsplit2, fcmac1): New.
(rot): Add UNSPEC_FCMLS, UNSPEC_FCMUL, UNSPEC_FCMUL180.

-- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba003520d2e83e91065d2a808b9c6493..c2ddef19e4e433f7ca055e42d1222d9dad6bd6c2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -449,6 +449,14 @@ (define_insn "aarch64_fcadd"
   [(set_attr "type" "neon_fcadd")]
 )
 
+(define_expand "cadd3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+		   (match_operand:VHSDF 2 "register_operand")]
+		   FCADD))]
+  "TARGET_COMPLEX"
+)
+
 (define_insn "aarch64_fcmla"
   [(set (match_operand:VHSDF 0 "register_operand" "=w")
 	(plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
@@ -508,6 +516,45 @@ (define_insn "aarch64_fcmlaq_lane"
   [(set_attr "type" "neon_fcmla")]
 )
 
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
+		(unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
+   (match_operand:VHSDF 3 "register_operand")]
+   FCMLA_OP)))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_aarch64_fcmla (operands[0], operands[1],
+		 operands[2], operands[3]));
+  emit_insn (gen_aarch64_fcmla (operands[0], operands[0],
+		 operands[2], operands[3]));
+  DONE;
+})
+
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+		   (match_operand:VHSDF 2 "register_operand")]
+		   FCMUL_OP))]
+  "TARGET_COMPLEX"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  emit_move_insn (tmp, CONST0_RTX (mode));
+  emit_insn (gen_aarch64_fcmla (operands[0], tmp,
+		 operands[1], operands[2]));
+  emit_insn (gen_aarch64_fcmla (operands[0], operands[0],
+		 operands[1], operands[2]));
+  DONE;
+})
+
+
+
 ;; These instructions map to the __builtins for the Dot Product operations.
 (define_insn "aarch64_dot"
   [(set (match_operand:VS 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 054fd8515c6ebf136da699e2993f6ebb348c3b1a..98217c9fd3ee2b6063f7564193e400e9ef71c6ac 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF])
 ;; All Advanced SIMD modes on which we support any arithmetic operations.
 (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
 
+;; All Advanced SIMD modes suitable for performing arithmetics.
+(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
+  (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST")
+  V2SF V4SF V2DF])
+
 ;; All Advanced SIMD modes suitable for moving, loading, and storing.
 (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
 V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
@@ -705,6 +710,10 @@ (define_c_enum "unspec"
 UNSPEC_FCMLA90	; Used in aarch64-simd.md.
 UNSPEC_FCMLA180	; Used in aarch64-simd.md.
 

[PATCH v2 9/16][docs] Add some missing test directive documentation.

2020-09-25 Thread Tamar Christina
Hi All,

This adds documentation for some test directives that were missing.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/sourcebuild.texi (vect_complex_rot_,
arm_v8_3a_complex_neon_ok, arm_v8_3a_complex_neon_hw): New.

-- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 65b2e552b74becdbc5474ba5ac387a4a0296e341..3abd8f631cb0234076641e399f6f00768b38ebee 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1671,6 +1671,10 @@ Target supports a vector dot-product of @code{signed short}.
 @item vect_udot_hi
 Target supports a vector dot-product of @code{unsigned short}.
 
+@item vect_complex_rot_@var{n}
+Target supports a vector complex addition and complex FMA of mode @var{n}.
+Possible values of @var{n} are @code{hf}, @code{sf}, @code{df}.
+
 @item vect_pack_trunc
 Target supports a vector demotion (packing) of @code{short} to @code{char}
 and from @code{int} to @code{short} using modulo arithmetic.
@@ -1941,6 +1945,16 @@ ARM target supports executing instructions from ARMv8.2-A with the Dot
 Product extension. Some multilibs may be incompatible with these options.
 Implies arm_v8_2a_dotprod_neon_ok.
 
+@item arm_v8_3a_complex_neon_ok
+@anchor{arm_v8_3a_complex_neon_ok}
+ARM target supports options to generate complex number arithmetic instructions
+from ARMv8.3-A.  Some multilibs may be incompatible with these options.
+
+@item arm_v8_3a_complex_neon_hw
+ARM target supports executing complex arithmetic instructions from ARMv8.3-A.
+Some multilibs may be incompatible with these options.
+Implies arm_v8_3a_complex_neon_ok.
+
 @item arm_fp16fml_neon_ok
 @anchor{arm_fp16fml_neon_ok}
 ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}



[PATCH v2 7/16]middle-end: Add Complex Multiplication and Multiplication with Conjugate detection

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds pattern detections for the following operation:

  Complex multiplication and Conjugate Complex multiplication of the second
  parameter.

c = a * b and c = a * conj (b)

  For the conjugate cases it supports, under fast-math, flipping which operand
  is conjugated by swapping the arguments to the optab.  This
  allows it to also support c = conj (a) * b and c += conj (a) * b.

  where a, b and c are complex numbers.

and provides a shared class for anything needing to recognize complex MLA
patterns.
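
As an illustration, the shape of scalar loop this detection targets looks
like the following (a minimal sketch; the function and variable names are
made up for this example and are not taken from the patch):

#define N 1024

void
cmul (double a[restrict N], double b[restrict N], double c[restrict N])
{
  /* Even indices hold the real parts, odd indices the imaginary parts.
     This computes c = a * b in complex arithmetic.  */
  for (int i = 0; i < N; i += 2)
    {
      c[i]   = a[i] * b[i]   - a[i+1] * b[i+1];
      c[i+1] = a[i] * b[i+1] + a[i+1] * b[i];
    }
}

After lowering this gives the parallel mul/sub and mul/add statement pair
that is replaced by the new COMPLEX_MUL internal function when the target
provides the cmul optab.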

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/md.texi: Document optabs.
* internal-fn.def (COMPLEX_MUL, COMPLEX_MUL_CONJ): New.
* optabs.def (cmul_optab, cmul_conj_optab): New.
* tree-vect-slp-patterns.c (class ComplexMLAPattern,
class ComplexMulPattern): New.
(slp_patterns): Add ComplexMulPattern.

-- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 71e226505b2619d10982b59a4ebbed73a70f29be..ddaf1abaccbd44dae11ea902ec38b474aacfb8e1 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6143,6 +6143,28 @@ rotations @var{m} of 90 or 270.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cmul@var{m}4} instruction pattern
+@item @samp{cmul@var{m}4}
+Perform a vector floating point multiplication of complex numbers in operand 1
+and operand 2, storing the result in operand 0.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmul_conj@var{m}4} instruction pattern
+@item @samp{cmul_conj@var{m}4}
+Perform a vector floating point multiplication of complex numbers in operand 1
+and the conjugate of operand 2, storing the result in operand 0.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 956a65a338c157b51de7e78a3fb005b5af78ef31..51bebf8701af262b22d66d19a29a8dafb74db1f0 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
+
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2bb0bf857977035bf562a77f5f6848e80edf936d..9c267d422478d0011f288b1f5f62daabe3989ba7 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -292,6 +292,8 @@ OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
 OPTAB_D (cadd90_optab, "cadd90$a3")
 OPTAB_D (cadd270_optab, "cadd270$a3")
+OPTAB_D (cmul_optab, "cmul$a3")
+OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index b2b0ac62e9a69145470f41d2bac736dd970be735..bef7cc73b21c020e4c0128df5d186a034809b103 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -743,6 +743,179 @@ class ComplexAddPattern : public ComplexPattern
 }
 };
 
+class ComplexMLAPattern : public ComplexPattern
+{
+  protected:
+ComplexMLAPattern (slp_tree node, vec_info *vinfo)
+  : ComplexPattern (node, vinfo)
+{ }
+
+  protected:
+/* Helper function of vect_match_call_complex_mla that looks up the
+   definition of LHS_0 and LHS_1 by finding the statements starting in
+   position BASE + IDX in child ROOT of NODE and tries to match the
+   definition against pair ops.
+
+   If the match is successful then ARGS will contain the operands matched
+   and the complex_operation_t type is returned.  If match is not successful
+   then CMPLX_NONE is returned and ARGS is left unmodified.  */
+
+complex_operation_t
+vect_match_call_complex_mla_1 (slp_tree node, slp_tree *res, int root,
+   int base, int idx, vec<stmt_vec_info> *args)
+{
+  gcc_assert (base >= 0 && idx >= 0 && node != NULL);
+
+  if ((unsigned)root >= SLP_TREE_CHILDREN (node).length ())
+	return CMPLX_NONE;
+
+  slp_tree data = SLP_TREE_CHILDREN (node)[root];
+
+  /* If it's a VEC_PERM_EXPR we need to look one deeper.  */
+  if (node->code == VEC_PERM_EXPR)
+	data = SLP_TREE_CHILDREN (data)[root];
+
+  int lhs_0 = base + idx;
+  int lhs_1 = base + idx + 1;
+
+  vec 

[PATCH v2 6/16]middle-end Add Complex Addition with rotation detection

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds pattern detections for the following operation:

  Addition with rotation of the second argument around the Argand plane.
Supported rotations are 90 and 270.

c = a + (b * I) and c = a + (b * I * I * I)

  where a, b and c are complex numbers.
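
As an illustration, the 90 degree rotation corresponds to a scalar loop of
the following shape (a minimal sketch mirroring the example in the comment
of the matches function below):

#define N 1024

void
cadd90 (double a[restrict N], double b[restrict N], double c[restrict N])
{
  /* c = a + b * I on interleaved real/imaginary pairs.  */
  for (int i = 0; i < N; i += 2)
    {
      c[i]   = a[i]   - b[i+1];
      c[i+1] = a[i+1] + b[i];
    }
}

The sub/add statement pair is the MINUS_PLUS case that gets replaced by the
COMPLEX_ADD_ROT90 internal function.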

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/md.texi: Document optabs.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* tree-vect-slp-patterns.c (class ComplexAddPattern): New.
(slp_patterns): Add ComplexAddPattern.

-- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6132,6 +6132,17 @@ floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cadd@var{m}@var{n}3} instruction pattern
+@item @samp{cadd@var{m}@var{n}3}
+Perform a vector addition of complex numbers in operand 1 with operand 2
+rotated by @var{m} degrees around the Argand plane, storing the result in
+operand 0.  The instruction must perform the operation on data loaded
+contiguously into the vectors.
+The operation is only supported for vector modes @var{n} and with
+rotations @var{m} of 90 or 270.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -663,12 +663,94 @@ graceful_exit:
 }
 };
 
+class ComplexAddPattern : public ComplexPattern
+{
+  protected:
+ComplexAddPattern (slp_tree node, vec_info *vinfo)
+  : ComplexPattern (node, vinfo)
+{
+  this->m_arity = 2;
+  this->m_num_args = 2;
+  this->m_vects.create (0);
+  this->m_defs.create (0);
+}
+
+  public:
+~ComplexAddPattern ()
+{
+  this->m_vects.release ();
+  this->m_defs.release ();
+}
+
+static VectPattern* create (slp_tree node, vec_info *vinfo)
+{
+   return new ComplexAddPattern (node, vinfo);
+}
+
+const char* get_name ()
+{
+  return "Complex Addition";
+}
+
+/* Pattern matcher for trying to match complex addition pattern in SLP tree
+   using the N statements found in node starting at position IDX.
+   If the operation matches then IFN is set to the operation it matched and
+   the arguments to the two replacement statements are put in VECTS.
+
+   If no match is found then IFN is set to IFN_LAST.
+
+   This function matches the patterns shaped as:
+
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+
+   If a match occurred then TRUE is returned, else FALSE.  */
+
+bool matches (stmt_vec_info *stmts, int idx)
+{
+  this->m_last_ifn = IFN_LAST;
+  int base = idx - (this->m_arity - 1);
+  this->m_last_idx = idx;
+  this->m_stmt_info = stmts[0];
+
+  complex_operation_t op
+	= vect_detect_pair_op (base, this->m_node, &this->m_vects);
+
+  /* Find the two components.  Rotation in the complex plane will modify
+	 the operations:
+
+	 * Rotation  0: + +
+	 * Rotation 90: - +
+	 * Rotation 180: - -
+	 * Rotation 270: + -
+
+	Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+	to care about them here.  */
+  if (op == MINUS_PLUS)
+	this->m_last_ifn = IFN_COMPLEX_ADD_ROT90;
+  else if (op == PLUS_MINUS)
+	

[PATCH v2 4/16]middle-end: Add dissolve code for when SLP fails and non-SLP loop vectorization is to be tried.

2020-09-25 Thread Tamar Christina
Hi All,

This adds the dissolve code to undo the patterns created by the pattern matcher
in case SLP is to be aborted.

As mentioned in the cover letter this has one issue in that the number of copies
needed can change depending on whether TWO_OPERATORS is needed or not.

Because of this I don't analyze the original statement when it's replaced by a
pattern and attempt to correct it here by analyzing it after dissolve.

This however seems too late and I would need to change the unroll factor, which
seems a bit odd.  Any advice would be appreciated.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.c (vect_dissolve_slp_only_patterns): New.
(vect_analyze_loop_2): Call vect_dissolve_slp_only_patterns.

-- 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b1a6e1508c7f00f5f369ec873f927f30d673059e..8231ad6452af6ff111911a7bfb6aab2257df9fc0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1956,6 +1956,92 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
   return opt_result::success ();
 }
 
+/* For every SLP only pattern created by the pattern matcher rooted in ROOT
+   restore the relevancy of the original statements over those of the pattern
+   and destroy the pattern relationship.  This restores the SLP tree to a state
+   where it can be used when SLP build is cancelled or re-tried.  */
+
+static opt_result
+vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo,
+ hash_set<slp_tree> *visited, slp_tree root)
+{
+  if (!root || visited->contains (root))
+return opt_result::success ();
+
+  unsigned int i;
+  slp_tree node;
+  opt_result res = opt_result::success ();
+  stmt_vec_info stmt_info;
+  stmt_vec_info related_stmt_info;
+  bool need_to_vectorize = false;
+  auto_vec<stmt_info_for_cost> cost_vec;
+
+  visited->add (root);
+
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (root), i, stmt_info)
+if (STMT_VINFO_SLP_VECT_ONLY (stmt_info)
+&& (related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info)) != NULL)
+  {
+	if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "dissolving relevancy of %G over %G",
+			   STMT_VINFO_STMT (stmt_info),
+			   STMT_VINFO_STMT (related_stmt_info));
+	STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
+	STMT_VINFO_RELEVANT (related_stmt_info) = vect_used_in_scope;
+	STMT_VINFO_IN_PATTERN_P (related_stmt_info) = false;
+	STMT_SLP_TYPE (related_stmt_info) = hybrid;
+	/* Now we have to re-analyze the statement since we skipped it in
+	   the initial analysis due to the differences in copies.  */
+	res = vect_analyze_stmt (loop_vinfo, related_stmt_info,
+				 &need_to_vectorize, NULL, NULL, &cost_vec);
+
+	if (!res)
+	  return res;
+  }
+
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, node)
+{
+  res = vect_dissolve_slp_only_patterns (loop_vinfo, visited, node);
+  if (!res)
+	return res;
+}
+
+  return res;
+}
+
+/* Lookup any SLP Only Pattern statements created by the SLP pattern matcher in
+   all slp_instances in LOOP_VINFO and undo the relevancy of statements such
+   that the original SLP tree before the pattern matching is used.  */
+
+static opt_result
+vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo)
+{
+
+  unsigned int i;
+  opt_result res = opt_result::success ();
+  hash_set<slp_tree> *visited = new hash_set<slp_tree> ();
+
+  DUMP_VECT_SCOPE ("vect_dissolve_slp_only_patterns");
+
+  /* Unmark any SLP only patterns as relevant and restore the STMT_INFO of the
+ related instruction.  */
+  slp_instance instance;
+  FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (loop_vinfo), i, instance)
+{
+  res = vect_dissolve_slp_only_patterns (loop_vinfo, visited,
+	 SLP_INSTANCE_TREE (instance));
+  if (!res)
+	{
+	  delete visited;
+	  return res;
+	}
+}
+
+  delete visited;
+  return res;
+}
+
 /* Look for SLP-only access groups and turn each individual access into its own
group.  */
 static void
@@ -2427,6 +2513,11 @@ again:
   /* Ensure that "ok" is false (with an opt_problem if dumping is enabled).  */
   gcc_assert (!ok);
 
+  /* Dissolve any SLP patterns created by the SLP pattern matcher.  */
+  opt_result dissolved = vect_dissolve_slp_only_patterns (loop_vinfo);
+  if (!dissolved)
+return dissolved;
+
   /* Try again with SLP forced off but if we didn't do any SLP there is
  no point in re-trying.  */
   if (!slp)



[PATCH v2 5/16]middle-end: Add shared machinery for matching patterns involving complex numbers.

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds shared machinery for detecting patterns having to do with
complex number operations.  The class ComplexPattern provides helpers for
matching and ultimately undoing the permutation in the tree by rebuilding the
graph.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-slp-patterns.c (complex_operation_t,class ComplexPattern):
New.

-- 
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index f605f68d2a14c4bf4941f97b7c1d57f6acb5ffb1..6453a5b1b6464dba833adc2c2a194db5e712bb79 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -134,6 +134,19 @@ along with GCC; see the file COPYING3.  If not see
   To add a new pattern, implement the VectPattern class and add the type to
   slp_patterns.  */
 
+/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can
+   be matched when looking for expressions that we are interested in matching
+   for complex number addition and MLA.  */
+
+typedef enum _complex_operation {
+  PLUS_PLUS,
+  MINUS_PLUS,
+  PLUS_MINUS,
+  MULT_MULT,
+  NEG_NEG,
+  CMPLX_NONE
+} complex_operation_t;
+
 /* VectSimplePatternMatch holds contextual information about a single match
found in the SLP tree.  The use of the class is to allow you to defer
performing any modifications to the SLP tree until they are to be done.  By
@@ -298,6 +311,358 @@ class VectSimplePatternMatch : public VectPatternMatch
 }
 };
 
+/* The ComplexPattern class contains common code for pattern matchers that work
+   on complex numbers.  These provide functionality to allow de-construction and
+   validation of sequences depicting/transforming REAL and IMAG pairs.  */
+
+class ComplexPattern : public VectPattern
+{
+  protected:
+/* Current list of arguments that were found during the current invocation
+   of the pattern matcher.  */
+    vec<stmt_vec_info> m_vects;
+
+/* Representative statement for the current match being performed.  */
+stmt_vec_info m_stmt_info;
+
+/* A list of all arguments found between all invocations of the current
+   pattern matcher.  */
+    vec<vec<stmt_vec_info>> m_defs;
+
+/* Checks to see if the expression EXPR is a gimple assign with code CODE
+   and if this is the case the two operands of EXPR are returned in OP1 and
+   OP2.
+
+   If the matching and extraction is successful TRUE is returned otherwise
+   FALSE in which case the value of OP1 and OP2 will not have been touched.
+*/
+
+bool
+vect_match_expression_p (slp_tree node, tree_code code, int base, int idx,
+			 stmt_vec_info *op1, stmt_vec_info *op2)
+{
+
+  vec<stmt_vec_info> scalar_stmts = SLP_TREE_SCALAR_STMTS (node);
+
+  /* Calculate the index of the statement in the node to inspect.  */
+  int n = base + idx;
+  if (scalar_stmts.length () <= (unsigned) n) // can use group_size
+	return false;
+
+  gimple* expr = STMT_VINFO_STMT (scalar_stmts[n]);
+  if (!is_gimple_assign (expr)
+	  || gimple_expr_code (expr) != code)
+	return false;
+
+  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
+
+  /* If it's a VEC_PERM_EXPR we need to look one deeper.  VEC_PERM_EXPR
+	 only has one entry.  So pick one.  */
+  if (node->code == VEC_PERM_EXPR)
+	children = SLP_TREE_CHILDREN (children.last ());
+
+  if (children.length () != (op2 ? 2 : 1))
+	return false;
+
+  if (op1)
+	{
+	  if (SLP_TREE_DEF_TYPE (children[0]) != vect_internal_def)
+	return false;
+	  *op1 = SLP_TREE_SCALAR_STMTS (children[0])[n];
+	}
+
+  if (op2)
+	{
+	  if (SLP_TREE_DEF_TYPE (children[1]) != vect_internal_def)
+	return false;
+	  *op2 = SLP_TREE_SCALAR_STMTS (children[1])[n];
+	}
+
+  return true;
+}
+
+/* This function matches two gimple expressions STMT_0 and STMT_1 in
+   parallel and returns the pair operation that represents the two
+   expressions in the two statements.  The statements are located in NODE1
+   and NODE2 at offset base + offset1 and base + offset2 respectively.
+
+   If match is successful then the corresponding complex_operation is
+   returned and the arguments to the two matched operations are returned in
+   OPS.
+
+   If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
+
+   e.g. the following gimple statements
+
+   stmt 0 _39 = _37 + _12;
+   stmt 1 _6 = _38 - _36;
+
+   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
+*/
+
+complex_operation_t
+vect_detect_pair_op (int base, slp_tree node1, int offset1, slp_tree node2,
+			 int offset2, vec<stmt_vec_info> *ops)
+{
+  stmt_vec_info op1 = NULL, op2 = NULL, op3 = NULL, op4 = NULL;
+  complex_operation_t result = CMPLX_NONE;
+  #define CHECK_FOR(x, y, z)\
+	(vect_match_expression_p (node1, x, base, offset1, &op1,	\
+				  z ? &op2 : NULL)			\
+	 && 

[PATCH v2 1/16]middle-end: Refactor refcnt to use SLP_TREE_REF_COUNT for consistency

2020-09-25 Thread Tamar Christina
Hi All,

This is a small refactoring which introduces SLP_TREE_REF_COUNT and replaces
the uses of refcnt with it.  This is for consistency with the other properties.

A similar patch was pre-approved last year but since there are more uses now I am
sending it for review anyway.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vectorizer.h (SLP_TREE_REF_COUNT): New.
* tree-vect-slp.c (_slp_tree::_slp_tree, _slp_tree::~_slp_tree,
vect_free_slp_tree, vect_build_slp_tree, vect_print_slp_tree,
slp_copy_subtree, vect_attempt_slp_rearrange_stmts): Use it.

-- 
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c44fd396bf0b69a4153e46026c545bebb3797551..bf8ea4326597f4211d2772e9db60aa69285b5998 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -66,7 +66,7 @@ _slp_tree::_slp_tree ()
   SLP_TREE_CODE (this) = ERROR_MARK;
   SLP_TREE_VECTYPE (this) = NULL_TREE;
   SLP_TREE_REPRESENTATIVE (this) = NULL;
-  this->refcnt = 1;
+  SLP_TREE_REF_COUNT (this) = 1;
   this->max_nunits = 1;
   this->lanes = 0;
 }
@@ -92,7 +92,7 @@ vect_free_slp_tree (slp_tree node)
   int i;
   slp_tree child;
 
-  if (--node->refcnt != 0)
+  if (--SLP_TREE_REF_COUNT (node) != 0)
 return;
 
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
@@ -1180,7 +1180,7 @@ vect_build_slp_tree (vec_info *vinfo,
 			 *leader ? "" : "failed ", *leader);
   if (*leader)
 	{
-	  (*leader)->refcnt++;
+	  SLP_TREE_REF_COUNT (*leader)++;
 	  vect_update_max_nunits (max_nunits, (*leader)->max_nunits);
 	}
   return *leader;
@@ -1194,7 +1194,7 @@ vect_build_slp_tree (vec_info *vinfo,
   res->max_nunits = this_max_nunits;
   vect_update_max_nunits (max_nunits, this_max_nunits);
   /* Keep a reference for the bst_map use.  */
-  res->refcnt++;
+  SLP_TREE_REF_COUNT (res)++;
 }
   bst_map->put (stmts.copy (), res);
   return res;
@@ -1590,7 +1590,7 @@ fail:
   SLP_TREE_CHILDREN (two).safe_splice (children);
   slp_tree child;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (two), i, child)
-	child->refcnt++;
+	SLP_TREE_REF_COUNT (child)++;
 
   /* Here we record the original defs since this
 	 node represents the final lane configuration.  */
@@ -1650,7 +1650,8 @@ vect_print_slp_tree (dump_flags_t dump_kind, dump_location_t loc,
 		   : (SLP_TREE_DEF_TYPE (node) == vect_constant_def
 		  ? " (constant)"
 		  : ""), node,
-		   estimated_poly_value (node->max_nunits), node->refcnt);
+		   estimated_poly_value (node->max_nunits),
+	 SLP_TREE_REF_COUNT (node));
   if (SLP_TREE_SCALAR_STMTS (node).exists ())
 FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
   dump_printf_loc (metadata, user_loc, "\tstmt %u %G", i, stmt_info->stmt);
@@ -1802,7 +1803,7 @@ slp_copy_subtree (slp_tree node, hash_map )
   SLP_TREE_REPRESENTATIVE (copy) = SLP_TREE_REPRESENTATIVE (node);
   SLP_TREE_LANES (copy) = SLP_TREE_LANES (node);
   copy->max_nunits = node->max_nunits;
-  copy->refcnt = 0;
+  SLP_TREE_REF_COUNT (copy) = 0;
   if (SLP_TREE_SCALAR_STMTS (node).exists ())
 SLP_TREE_SCALAR_STMTS (copy) = SLP_TREE_SCALAR_STMTS (node).copy ();
   if (SLP_TREE_SCALAR_OPS (node).exists ())
@@ -1819,7 +1820,7 @@ slp_copy_subtree (slp_tree node, hash_map )
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (copy), i, child)
 {
   SLP_TREE_CHILDREN (copy)[i] = slp_copy_subtree (child, map);
-  SLP_TREE_CHILDREN (copy)[i]->refcnt++;
+  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (copy)[i])++;
 }
   return copy;
 }
@@ -1935,7 +1936,7 @@ vect_attempt_slp_rearrange_stmts (slp_instance slp_instn)
   hash_map map;
   slp_tree unshared = slp_copy_subtree (SLP_INSTANCE_TREE (slp_instn), map);
   vect_free_slp_tree (SLP_INSTANCE_TREE (slp_instn));
-  unshared->refcnt++;
+  SLP_TREE_REF_COUNT (unshared)++;
   SLP_INSTANCE_TREE (slp_instn) = unshared;
   FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
 SLP_INSTANCE_LOADS (slp_instn)[i] = *map.get (node);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9dffc5570e51b21c2f5c02b80a9f49d25a183284..2ebcf9f9926ec7175f28391f172800499bbc59db 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -204,6 +204,7 @@ public:
 #define SLP_TREE_CHILDREN(S)             (S)->children
 #define SLP_TREE_SCALAR_STMTS(S)         (S)->stmts
 #define SLP_TREE_SCALAR_OPS(S)           (S)->ops
+#define SLP_TREE_REF_COUNT(S)            (S)->refcnt
 #define SLP_TREE_VEC_STMTS(S)            (S)->vec_stmts
 #define SLP_TREE_VEC_DEFS(S)             (S)->vec_defs
 #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)  (S)->vec_stmts_size



[PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds the basic infrastructure for doing pattern matching on SLP 
trees.
This is done immediately after the SLP tree creation because it can change the
shape of the tree in radical ways and so we would like to do it before any
analysis is performed on the tree.

A new file tree-vect-slp-patterns.c is added which contains all the code for
pattern matching on SLP trees.

This cover letter is short because the changes are heavily commented.

All pattern matchers need to implement the abstract type VectPatternMatch.
The VectSimplePatternMatch abstract class provides some default functionality
for pattern matchers that need to rebuild nodes.

The pattern matcher requires that, if a statement in a node is replaced, ALL
statements in that node be replaced.
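
As a rough sketch, a new matcher has the following shape (based on the
interface the concrete matchers later in this series use; the constructor
signature is inferred from ComplexAddPattern and may differ in detail):

class MyPattern : public VectPattern
{
  protected:
    MyPattern (slp_tree node, vec_info *vinfo)
      : VectPattern (node, vinfo)
    { }

  public:
    static VectPattern* create (slp_tree node, vec_info *vinfo)
    {
      return new MyPattern (node, vinfo);
    }

    const char* get_name ()
    {
      return "My Pattern";
    }

    /* Inspect the scalar statements of the group starting at IDX and
       either match the entire group or fail without touching the node.  */
    bool matches (stmt_vec_info *stmts, int idx)
    {
      /* ... record replacement statements for all lanes, or bail out.  */
      return false;
    }
};

The new type is then added to the slp_patterns table so that the scaffolding
invokes it right after SLP tree creation.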

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* Makefile.in (tree-vect-slp-patterns.o): New.
* doc/passes.texi: Update documentation.
* tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns):
New.
(vect_analyze_slp_instance): Call pattern matcher.
* tree-vectorizer.h (class VectPatternMatch, class VectPattern): New.
* tree-vect-slp-patterns.c: New file.

-- 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9c6c1c93b976aaf350cc1f9b3bdc538308fdf08b..936202b73696c8529b32c05b2356c7316fabc542 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1638,6 +1638,7 @@ OBJS = \
 	tree-vect-loop.o \
 	tree-vect-loop-manip.o \
 	tree-vect-slp.o \
+	tree-vect-slp-patterns.o \
 	tree-vectorizer.o \
 	tree-vector-builder.o \
 	tree-vrp.o \
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -709,7 +709,8 @@ loop.
 The pass is implemented in @file{tree-vectorizer.c} (the main driver),
 @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
 and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
-functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
+functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
+@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
 Analysis of data references is in @file{tree-data-ref.c}.
 
 SLP Vectorization.  This pass performs vectorization of straight-line code. The
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
new file mode 100644
index ..f605f68d2a14c4bf4941f97b7c1d57f6acb5ffb1
--- /dev/null
+++ b/gcc/tree-vect-slp-patterns.c
@@ -0,0 +1,310 @@
+/* SLP - Pattern matcher on SLP trees
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h"		/* FIXME: for insn_data */
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "gimple-iterator.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"
+#include "langhooks.h"
+#include "gimple-walk.h"
+#include "dbgcnt.h"
+#include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
+#include "gimple-fold.h"
+#include "internal-fn.h"
+
+/* SLP Pattern matching mechanism.
+
+  This extension to the SLP vectorizer allows one to transform the generated SLP
+  tree based on any pattern.  The difference between this and the normal vect
+  pattern matcher is that unlike the former, this matcher allows you to match
+  with instructions that do not belong to the same SSA dominator graph.
+
+  The only requirement that this pattern matcher has is that you are only
+  allowed to either match an entire group or none.
+
+  As an example, the following simple loop:
+
+double a[restrict N]; double b[restrict N]; double c[restrict N];
+
+for (int i=0; i < N; i+=2)
+{
+  c[i] = a[i] - b[i+1];
+  c[i+1] = a[i+1] + b[i];
+}
+
+  which represents a complex addition with a rotation of 90 degrees around the
+  Argand plane, i.e. if `a` and `b` were complex numbers then this would be the
+  same as `a + (b * I)`.
+
+  Here the expressions for 

[PATCH v2 2/16]middle-end: Refactor and expose some vectorizer helper functions.

2020-09-25 Thread Tamar Christina
Hi All,

This is a small refactoring which exposes some helper functions in the
vectorizer so they can be used in other places.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.c (vect_mark_pattern_stmts): Remove static.
* tree-vect-slp.c (vect_free_slp_tree,
vect_build_slp_tree): Remove static.
(struct bst_traits, bst_traits::hash, bst_traits::equal): Move...
* tree-vectorizer.h (struct bst_traits, bst_traits::hash,
bst_traits::equal): ... to here.
(vect_mark_pattern_stmts, vect_free_slp_tree,
vect_build_slp_tree): Declare.

-- 
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index db45740da3cba14a3552f9446651e8f289187fbb..3bacd5c827e1a6436c5916022c04e0d6594c316a 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -5169,7 +5169,7 @@ const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
 
 /* Mark statements that are involved in a pattern.  */
 
-static inline void
+void
 vect_mark_pattern_stmts (vec_info *vinfo,
 			 stmt_vec_info orig_stmt_info, gimple *pattern_stmt,
  tree pattern_vectype)
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index bf8ea4326597f4211d2772e9db60aa69285b5998..01189d44d892fc42b132bbb7de1c471df45518ae 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -86,7 +86,7 @@ _slp_tree::~_slp_tree ()
 
 /* Recursively free the memory allocated for the SLP tree rooted at NODE.  */
 
-static void
+void
 vect_free_slp_tree (slp_tree node)
 {
   int i;
@@ -1120,45 +1120,6 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   return true;
 }
 
-/* Traits for the hash_set to record failed SLP builds for a stmt set.
-   Note we never remove apart from at destruction time so we do not
-   need a special value for deleted that differs from empty.  */
-struct bst_traits
-{
-  typedef vec <stmt_vec_info> value_type;
-  typedef vec <stmt_vec_info> compare_type;
-  static inline hashval_t hash (value_type);
-  static inline bool equal (value_type existing, value_type candidate);
-  static inline bool is_empty (value_type x) { return !x.exists (); }
-  static inline bool is_deleted (value_type x) { return !x.exists (); }
-  static const bool empty_zero_p = true;
-  static inline void mark_empty (value_type &x) { x.release (); }
-  static inline void mark_deleted (value_type &x) { x.release (); }
-  static inline void remove (value_type &x) { x.release (); }
-};
-inline hashval_t
-bst_traits::hash (value_type x)
-{
-  inchash::hash h;
-  for (unsigned i = 0; i < x.length (); ++i)
-h.add_int (gimple_uid (x[i]->stmt));
-  return h.end ();
-}
-inline bool
-bst_traits::equal (value_type existing, value_type candidate)
-{
-  if (existing.length () != candidate.length ())
-return false;
-  for (unsigned i = 0; i < existing.length (); ++i)
-if (existing[i] != candidate[i])
-  return false;
-  return true;
-}
-
-typedef hash_map <vec <stmt_vec_info>, slp_tree,
-		  simple_hashmap_traits <bst_traits, slp_tree> >
-  scalar_stmts_to_slp_tree_map_t;
-
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
		   vec<stmt_vec_info> stmts, unsigned int group_size,
@@ -1166,7 +1127,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 		   bool *matches, unsigned *npermutes, unsigned *tree_size,
 		   scalar_stmts_to_slp_tree_map_t *bst_map);
 
-static slp_tree
+slp_tree
 vect_build_slp_tree (vec_info *vinfo,
		 vec<stmt_vec_info> stmts, unsigned int group_size,
 		 poly_uint64 *max_nunits,
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2ebcf9f9926ec7175f28391f172800499bbc59db..79926f1a43534635ddca85556a928e364022c40a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2047,6 +2047,9 @@ extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
 extern bool vect_update_shared_vectype (stmt_vec_info, tree);
 
 /* In tree-vect-patterns.c.  */
+extern void
+vect_mark_pattern_stmts (vec_info *, stmt_vec_info, gimple *, tree);
+
 /* Pattern recognition functions.
Additional pattern recognition functions can (and will) be added
in the future.  */
@@ -2058,4 +2061,51 @@ void vect_free_loop_info_assumptions (class loop *);
 gimple *vect_loop_vectorized_call (class loop *, gcond **cond = NULL);
 bool vect_stmt_dominates_stmt_p (gimple *, gimple *);
 
+/* Traits for the hash_set to record failed SLP builds for a stmt set.
+   Note we never remove apart from at destruction time so we do not
+   need a special value for deleted that differs from empty.  */
+struct bst_traits
+{
+  typedef vec <stmt_vec_info> value_type;
+  typedef vec <stmt_vec_info> compare_type;
+  static inline hashval_t hash (value_type);
+  static inline bool equal (value_type existing, value_type candidate);
+  static inline bool is_empty (value_type x) { return !x.exists (); }
+  static inline bool is_deleted (value_type x) { return !x.exists (); }
+  static const bool empty_zero_p = true;
+  static inline void mark_empty (value_type &x) { x.release (); }

[PATCH v2 0/16][RFC][AArch64/Arm/SVE/SVE2/MVE]middle-end Add support for SLP vectorization of complex number instructions.

2020-09-25 Thread Tamar Christina
Hi All,

This patch series adds support for SLP vectorization of complex instructions [1].

These instructions exist only in their vector forms and require you to recognize
two statements in parallel.  Complex operations usually require a permute due to
the fact that the real and imaginary numbers are stored intermixed, but these
vector instructions expect this and no longer need the compiler to generate a
permute.

For this reason the pass also re-orders the loads in the SLP tree such that they
become contiguous and no longer need the permutes.  The Basic Blocks are left
untouched such that the scalar loop will still correctly issue permutes.

The instructions also support rotations along the Argand plane; as such the
operands have to be re-ordered to coincide with their load group.

For now, this patch only adds support for:

  * Complex Addition with rotation of 90 and 270.
  * Complex Multiplication and Multiplication where one operand is conjugated.
  * Complex FMA and FMA where one operand is conjugated.
  * Complex FMS and FMS where one operand is conjugated.
  
Complex dot-product is not currently supported in this patch set as build_slp
fails for it.  This will be provided as a future patch.
  
These are supported for both integer and floating point and as such these don't
look for real or imaginary pairs but instead rely on the early lowering of
complex numbers by GCC and canonicalization of the operations, such that it
just recognizes any instruction sequence matching the operations requested.

To be safe, when it is not sure it can support the operation, or if it finds
something it does not understand, it backs off.

This patch is an RFC and I am looking for feedback on the approach.  In
particular, this series has one problem, which arises when it is decided that
SLP is not viable and that the normal loop vectorizer is to be used.

In this case I dissolve the changes but the compiler crashes because the use of
pattern matcher essentially undoes two_operands.  This means that the number of
copies needed when using the patterns and when not are different.  When using
the patterns the two operands become the same and so are treated as manually
unrolled loops.  The problem is that nunits has already been decided along
with the unroll factor, so when the dissolved statements are then analyzed
they fail.  This is also the reason why I cannot analyze both the pattern and
original statements initially.

The relevant places in the source code have comments describing the problem.

[1] https://developer.arm.com/documentation/ddi0487/fc/

Thanks,
Tamar

-- 


[PATCH] gcov: fix streaming of HIST_TYPE_IOR histogram type.

2020-09-25 Thread Martin Liška

Hello.

I'm going to install a quite obvious patch which allows negative values
for HIST_TYPE_IOR as it tracks pointers.
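
(As a concrete illustration with a made-up value: a high-half pointer such as
0xffff800000001000 becomes negative once stored into the signed gcov_type,
which is exactly the case the old assert rejected.)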

Martin

gcc/ChangeLog:

PR gcov-profile/64636
* value-prof.c (stream_out_histogram_value): Allow negative
values for HIST_TYPE_IOR.
---
 gcc/value-prof.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index ea1b1a8f98f..95d33c63a0c 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -331,7 +331,10 @@ stream_out_histogram_value (struct output_block *ob, 
histogram_value hist)
   /* When user uses an unsigned type with a big value, constant converted
 to gcov_type (a signed type) can be negative.  */
   gcov_type value = hist->hvalue.counters[i];
-  if (hist->type == HIST_TYPE_TOPN_VALUES)
+  if (hist->type == HIST_TYPE_TOPN_VALUES
+ || hist->type == HIST_TYPE_IOR)
+   /* Note that the IOR counter tracks pointer values and these can have
+  sign bit set.  */
;
   else
gcc_assert (value >= 0);
--
2.28.0



Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Martin Liška

On 9/25/20 3:45 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 3:32 PM Martin Liška  wrote:


On 9/25/20 3:18 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:


Hello.

All right, I came up with a rapid speed up that can allow us to remove
the introduced parameter. It contains 2 parts:
- BIT TEST: we allow at maximum a range that is smaller than GET_MODE_BITSIZE
- JT: we spent quite some time in density calculation, we can guess it first
 and it leads to a fast bail out.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Err

+  auto_vec<int> dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }


That's intentional as m_max_case_bit_tests is a very small number (3) and
I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
is a constant operation.


You're storing bb->index and formerly set bb->index bit, what's the difference?

For max 3 elements a vector is OK, of course but there should be a comment
that says this ;)  The static const is 'int' so it can in principle
hold up to two billion ;)


Sure, comment is needed.





vec::contains is linear search so no.  Was this for the length check?
Just do

   if (bitmap_set_bit (...))
{
  length++;
  if (length > ...)


I would need here bitmap_count_bits. Do you prefer it?


bitmap_set_bit returns false if the bit was already set so you can
count as you add bits, see the length++ above.


Ah, got it!



For three elements the vec will be faster though.  May I suggest
to use

   auto_vec<int, 3> dest_bbs;

then and quick_push rather than safe_push (need to guard the
push with the max_case_bit_test).


Yes.

Is the patch fine with that (and Jakub's comment)?

Martin



Richard.




Martin




Thanks,
Martin






Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Martin Liška

On 9/25/20 3:52 PM, Jakub Jelinek wrote:

On Fri, Sep 25, 2020 at 11:13:06AM +0200, Martin Liška wrote:

--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1268,6 +1268,15 @@ jump_table_cluster::can_be_handled (const vec <cluster *> &clusters,
if (range == 0)
  return false;
  
+  unsigned HOST_WIDE_INT lhs = 100 * range;

+  if (lhs < range)
+return false;


If this test is meant to detect when 100 * range has overflowed,
then I think it is insufficient.
Perhaps do
   if (range > HOST_WIDE_INT_M1U / 100)
 return false;

   unsigned HOST_WIDE_INT lhs = 100 * range;
instead?


Yes, I'll add the check.

Thanks,
Martin



Jakub





Re: [PATCH 1/2] rs6000: Support _mm_insert_epi{8,32,64}

2020-09-25 Thread Paul A. Clarke via Gcc-patches
On Thu, Sep 24, 2020 at 06:22:10PM -0500, Segher Boessenkool wrote:
> On Wed, Sep 23, 2020 at 05:12:44PM -0500, Paul A. Clarke wrote:
> > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
> > +{
> > +  __v16qi result = (__v16qi)__A;
> > +
> > +  result [(__N & 0b1111)] = __D;
> 
> Hrm, GCC supports binary constants like this since 2007, so okay.  But I
> have to wonder if this improves anything over hex (or decimal even!)
> The parens are superfluous (and only hinder legibility), fwiw.
> 
> > +_mm_insert_epi64 (__m128i const __A, long long const __D, int const __N)
> > +{
> > +  __v2di result = (__v2di)__A;
> > +
> > +  result [(__N & 0b1)] = __D;
> 
> Especially single-digit numbers look really goofy (like 0x0, but even
> worse for binary somehow).
> 
> Anyway, okay for trunk, with or without those things improved.  Thanks!

I was trying to obviously and consistently convey the sizes of the masks,
but I really want to convey _why_ there are masks, so let me try a
different approach, below.

--

Add compatibility implementations for SSE4.1 intrinsics
_mm_insert_epi8, _mm_insert_epi32, _mm_insert_epi64.

2020-09-25  Paul A. Clarke  

gcc/
* config/rs6000/smmintrin.h (_mm_insert_epi8): New.
(_mm_insert_epi32): New.
(_mm_insert_epi64): New.
---
 gcc/config/rs6000/smmintrin.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index d78ddba99d9..8128c417978 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,36 @@
 #include 
 #include 
 
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
+{
+  __v16qi result = (__v16qi)__A;
+
+  result [__N % (sizeof result / sizeof result[0])] = __D;
+
+  return (__m128i) result;
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_insert_epi32 (__m128i const __A, int const __D, int const __N)
+{
+  __v4si result = (__v4si)__A;
+
+  result [__N % (sizeof result / sizeof result[0])] = __D;
+
+  return (__m128i) result;
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_insert_epi64 (__m128i const __A, long long const __D, int const __N)
+{
+  __v2di result = (__v2di)__A;
+
+  result [__N % (sizeof result / sizeof result[0])] = __D;
+
+  return (__m128i) result;
+}
+
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_extract_epi8 (__m128i __X, const int __N)
 {
-- 
2.18.4
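
To make the effect of the modulo concrete, an out-of-range selector simply
wraps around to a valid lane (illustrative use only, not part of the patch;
on powerpc the compatibility header may need -DNO_WARN_X86_INTRINSICS):

#include <smmintrin.h>

__m128i
set_byte (__m128i v)
{
  /* __v16qi has 16 elements, so lane 20 % 16 == 4 is the one written.  */
  return _mm_insert_epi8 (v, 0x7f, 20);
}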



Re: [PATCH] Add if-chain to switch conversion pass.

2020-09-25 Thread Martin Liška

On 9/24/20 2:41 PM, Richard Biener wrote:

On Wed, Sep 2, 2020 at 1:53 PM Martin Liška  wrote:


On 9/1/20 4:50 PM, David Malcolm wrote:

Hope this is constructive
Dave


Thank you David. All of them very very useful!

There's updated version of the patch.


Hey.

What a juicy patch review!



I noticed several functions without a function-level comment.


Yep, but several of them are documented in a class declaration. Anyway, I will
improve for the next time.



-  cluster (tree case_label_expr, basic_block case_bb, profile_probability prob,
-  profile_probability subtree_prob);
+  inline cluster (tree case_label_expr, basic_block case_bb,
+ profile_probability prob, profile_probability subtree_prob);

I thought we generally leave this to the compiler ...

+@item -fconvert-if-to-switch
+@opindex fconvert-if-to-switch
+Perform conversion of an if cascade into a switch statement.
+Do so if the switch can be later transformed using a jump table
+or a bit test.  The transformation can help to produce faster code for
+the switch statement.  This flag is enabled by default
+at @option{-O2} and higher.

this mentions we do this only when we later can convert the
switch again but both passes (we still have two :/) have
independent guards.


Yes, we have the option for jump tables (-jump-tables), but we miss one for a
bit-test.
Moreover, as mentioned in the cover email, one can see it beneficial to convert
an if-chain to switch as the expansion (without any BT and JT) can benefit from
a balanced tree.



+  /* For now, just wipe the dominator information.  */
+  free_dominance_info (CDI_DOMINATORS);

could at least be conditional on the vop renaming condition...

+  if (!all_candidates.is_empty ())
+mark_virtual_operands_for_renaming (fun);


Yep.



+  if (bitmap_bit_p (*visited_bbs, bb->index))
+   break;
+  bitmap_set_bit (*visited_bbs, bb->index);

since you are using a bitmap and not a sbitmap (why?)
you can combine those into


New to me, thanks.



if (!bitmap_set_bit (*visited_bbs, bb->index))
 break;

+  /* Currently we support the following patterns (situations):
+
+1) if condition with equal operation:
+
...

did you see whether using

register_edge_assert_for (lhs, true_edge, code, lhs, rhs, asserts);

works equally well?  It fills the 'asserts' vector with relations
derived from 'lhs'.  There's also
vr_values::extract_range_for_var_from_comparison_expr
to compute the case_range


Good point! I must admit that my patch doesn't properly handle negative 
conditions:

  if (argc != 1)
  {
if (argc == 1)
  global = 222;
...
  }

which can VRP correctly identify as anti-range:
int ~[1, 1]  EQUIVALENCES: { argc_8(D) } (1 elements)$1 = void

I have question about OR and AND conditions:

   :
  _1 = aChar_8(D) == 1;
  _2 = aChar_8(D) == 10;
  _3 = _1 | _2;
  if (_3 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  _1 = aChar_8(D) != 1;
  _2 = aChar_8(D) != 10;
  _3 = _1 & _2;
  if (_3 != 0)
goto ; [INV]
  else
goto ; [INV]

Can I somehow get that from VRP (as I ask register_edge_assert_for only for LHS
of a condition)?



+  /* If it's not the first condition, then we need a BB without
+any statements.  */
+  if (!first)
+   {
+ unsigned stmt_count = 0;
+ for (gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
+  !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
+   ++stmt_count;
+
+ if (stmt_count - visited_stmt_count != 0)
+   break;

hmm, OK, this might be a bit iffy to get correct then, still it's a lot
of pattern matching code that is there elsewhere already.
ifcombine simply hoists any stmts without side-effects up the
dominator tree and thus only requires BBs without side-effects
(IIRC there's a predicate fn for that).


Yes, I completely lack support for code hoisting (except the first BB where we
put the gswitch).
If I'm correct, hoisting should be possible where the case destination is a
new BB that will contain the original statements and then jump to the case
destination block.



+  /* Prevent losing information for a PHI node where 2 edges will
+be folded into one.  Note that we must do the same also for false_edge
+(for the last BB in an if-elseif chain).  */
+  if (!chain->record_phi_arguments (true_edge)
+ || !chain->record_phi_arguments (false_edge))

I don't really get this - looking at record_phi_arguments it seems
we're requiring that all edges into the same PHI from inside the case
(irrespective of from which case label) have the same value for the
PHI arg?


I guess so, I'll refresh the functionality.



+ if (arg != *v)
+   return false;

should use operand_equal_p at least, REAL_CSTs are for example
not shared tree nodes.  I'll also notice that if record_phi_arguments
fails we still may have altered its hash-map even though the particular
edge will not participate in the 

c++: DECL_BUILTIN_P for builtins

2020-09-25 Thread Nathan Sidwell


We currently detect builtin decls via DECL_ARTIFICIAL &&
!DECL_HIDDEN_FUNCTION_P, which, besides being clunky, is a problem as
hiddenness is a property of the symbol table -- not the decl being
hidden.  This adds DECL_BUILTIN_P, which just looks at the
SOURCE_LOCATION -- we have a magic one for builtins.

One of the consequential changes is to make function-scope omp udrs
have function context (needed because otherwise duplicate-decls thinks
the types don't match at the point we check).  This is also morally
better, because that's what they are -- nested functions, stop lying.

(That's actually my plan for all DECL_LOCAL_DECL_P decls, as they are
distinct decls to the namespace-scope decl they alias.)

gcc/cp/
* cp-tree.h (DECL_BUILTIN_P): New.
* decl.c (duplicate_decls): Use it.  Do not treat omp-udr as a
builtin.
* name-lookup.c (anticipated_builtin): Use it.
(set_decl_context_in_fn): Function-scope OMP UDRs have function 
context.

(do_nonmember_using_decl): Use DECL_BUILTIN_P.
* parser.c (cp_parser_omp_declare_reduction): Function-scope OMP
UDRs have function context.  Assert we never find a valid 
duplicate.
* pt.c (tsubst_expr): Function-scope OMP UDRs have function 
context.

libcc1/
* libcp1plugin.cc (supplement_binding): Use DECL_BULTIN_P.

pushing to trunk

nathan

--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 3ae48749b3d..bd78f00ba97 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -4040,6 +4040,10 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
 #define FNDECL_USED_AUTO(NODE) \
   TREE_LANG_FLAG_2 (FUNCTION_DECL_CHECK (NODE))
 
+/* True if NODE is a builtin decl.  */
+#define DECL_BUILTIN_P(NODE) \
+  (DECL_SOURCE_LOCATION(NODE) == BUILTINS_LOCATION)
+
 /* Nonzero if NODE is a DECL which we know about but which has not
been explicitly declared, such as a built-in function or a friend
declared inside a class.  In the latter case DECL_HIDDEN_FRIEND_P
diff --git i/gcc/cp/decl.c w/gcc/cp/decl.c
index 6019051ed12..1709dd9a370 100644
--- i/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -1464,9 +1464,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
 
   /* Check for redeclaration and other discrepancies.  */
   if (TREE_CODE (olddecl) == FUNCTION_DECL
-  && DECL_ARTIFICIAL (olddecl)
-  /* A C++20 implicit friend operator== uses the normal path (94462).  */
-  && !DECL_HIDDEN_FRIEND_P (olddecl))
+  && DECL_BUILTIN_P (olddecl))
 {
   if (TREE_CODE (newdecl) != FUNCTION_DECL)
 	{
@@ -1508,15 +1506,6 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
 		  "declaration %q#D", newdecl, olddecl);
 	  return NULL_TREE;
 	}
-  else if (DECL_OMP_DECLARE_REDUCTION_P (olddecl))
-	{
-	  gcc_assert (DECL_OMP_DECLARE_REDUCTION_P (newdecl));
-	  error_at (newdecl_loc,
-		"redeclaration of %<pragma omp declare reduction%>");
-	  inform (olddecl_loc,
-		  "previous %<pragma omp declare reduction%> declaration");
-	  return error_mark_node;
-	}
   else if (!types_match)
 	{
 	  /* Avoid warnings redeclaring built-ins which have not been
@@ -1815,6 +1804,17 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
 	  return error_mark_node;
 	}
 }
+  else if (TREE_CODE (newdecl) == FUNCTION_DECL
+	   && DECL_OMP_DECLARE_REDUCTION_P (newdecl))
+{
+  /* OMP UDRs are never duplicates. */
+  gcc_assert (DECL_OMP_DECLARE_REDUCTION_P (olddecl));
+  error_at (newdecl_loc,
+		"redeclaration of %<pragma omp declare reduction%>");
+  inform (olddecl_loc,
+	  "previous %<pragma omp declare reduction%> declaration");
+  return error_mark_node;
+}
   else if (TREE_CODE (newdecl) == FUNCTION_DECL
 	&& ((DECL_TEMPLATE_SPECIALIZATION (olddecl)
 		 && (!DECL_TEMPLATE_INFO (newdecl)
diff --git i/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index e7764abff67..dbc6cc32dd8 100644
--- i/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -2119,10 +2119,10 @@ anticipated_builtin_p (tree ovl)
   tree fn = OVL_FUNCTION (ovl);
   gcc_checking_assert (DECL_ANTICIPATED (fn));
 
-  if (DECL_HIDDEN_FRIEND_P (fn))
-return false;
+  if (DECL_BUILTIN_P (fn))
+return true;
 
-  return true;
+  return false;
 }
 
 /* BINDING records an existing declaration for a name in the current scope.
@@ -2857,9 +2857,12 @@ set_decl_context_in_fn (tree ctx, tree decl)
 {
   if (TREE_CODE (decl) == FUNCTION_DECL
   || (VAR_P (decl) && DECL_EXTERNAL (decl)))
-/* Make sure local externs are marked as such.  */
+/* Make sure local externs are marked as such.  OMP UDRs really
+   are nested functions.  */
 gcc_checking_assert (DECL_LOCAL_DECL_P (decl)
-			 && DECL_NAMESPACE_SCOPE_P (decl));
+			 && (DECL_NAMESPACE_SCOPE_P (decl)
+			 || (TREE_CODE (decl) == FUNCTION_DECL
+				 && DECL_OMP_DECLARE_REDUCTION_P (decl))));
 
   if (!DECL_CONTEXT (decl)
   /* When parsing the parameter list of a function declarator,
@@ -3934,7 +3937,7 @@ do_nonmember_using_decl (name_lookup , 

Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 25, 2020 at 11:13:06AM +0200, Martin Liška wrote:
> --- a/gcc/tree-switch-conversion.c
> +++ b/gcc/tree-switch-conversion.c
> @@ -1268,6 +1268,15 @@ jump_table_cluster::can_be_handled (const vec <cluster *> &clusters,
>if (range == 0)
>  return false;
>  
> +  unsigned HOST_WIDE_INT lhs = 100 * range;
> +  if (lhs < range)
> +return false;

If this test is meant to detect when 100 * range has overflowed,
then I think it is insufficient.
Perhaps do
  if (range > HOST_WIDE_INT_M1U / 100)
return false;

  unsigned HOST_WIDE_INT lhs = 100 * range;
instead?
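
(For example, with a 64-bit unsigned HOST_WIDE_INT, a range of
0x2000000000000000 gives 100 * range == 0x8000000000000000 after wrapping,
which is still larger than range, so the lhs < range test does not fire even
though the multiplication overflowed.)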

Jakub



PING^2 [GCC 10] [PATCH] IRA: Don't make a global register eliminable

2020-09-25 Thread H.J. Lu via Gcc-patches
On Tue, Sep 22, 2020 at 10:48 AM H.J. Lu  wrote:
>
> On Fri, Sep 18, 2020 at 10:21 AM H.J. Lu  wrote:
> >
> > On Thu, Sep 17, 2020 at 3:52 PM Jeff Law  wrote:
> > >
> > >
> > > On 9/16/20 8:46 AM, Richard Sandiford wrote:
> > >
> > > "H.J. Lu"  writes:
> > >
> > > On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford
> > >  wrote:
> > >
> > > Thanks for looking at this.
> > >
> > > "H.J. Lu"  writes:
> > >
> > > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
> > > Author: Richard Sandiford 
> > > Date:   Wed Oct 2 07:37:10 2019 +
> > >
> > > [LRA] Don't make eliminable registers live (PR91957)
> > >
> > > didn't make eliminable registers live which breaks
> > >
> > > register void *cur_pro asm("reg");
> > >
> > > where "reg" is an eliminable register.  Make fixed eliminable registers
> > > live to fix it.
> > >
> > > I don't think fixedness itself is the issue here: it's usual for at
> > > least some registers involved in eliminations to be fixed registers.
> > >
> > > I think what makes this case different is instead that cur_pro/ebp
> > > is a global register.  But IMO things have already gone wrong if we
> > > think that a global register is eliminable.
> > >
> > > So I wonder if instead we should check global_regs at the beginning of:
> > >
> > >   for (i = 0; i < fp_reg_count; i++)
> > > if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
> > > HARD_FRAME_POINTER_REGNUM + i))
> > >   {
> > > SET_HARD_REG_BIT (eliminable_regset,
> > >   HARD_FRAME_POINTER_REGNUM + i);
> > > if (frame_pointer_needed)
> > >   SET_HARD_REG_BIT (ira_no_alloc_regs,
> > > HARD_FRAME_POINTER_REGNUM + i);
> > >   }
> > > else if (frame_pointer_needed)
> > >   error ("%s cannot be used in %<asm%> here",
> > >  reg_names[HARD_FRAME_POINTER_REGNUM + i]);
> > > else
> > >   df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
> > >
> > > (ira_setup_eliminable_regset), and handle the global_regs[] case in
> > > the same way as the else case, i.e. short-circuiting both of the ifs.
> > >
> > > Like this?
> > >
> > > Sorry for the delay.  I was testing this in parallel.
> > >
> > > Bootstrapped & regression-tested on x86_64-linux-gnu.
> > >
> > > Thanks,
> > > Richard
> > >
> > >
> > > 0001-ira-Fix-elimination-for-global-hard-FPs-PR91957.patch
> > >
> > > From af4499845d26fe65573b21197a79fd22fd38694e Mon Sep 17 00:00:00 2001
> > > From: "H.J. Lu" 
> > > Date: Tue, 15 Sep 2020 06:23:26 -0700
> > > Subject: [PATCH] ira: Fix elimination for global hard FPs [PR91957]
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=UTF-8
> > > Content-Transfer-Encoding: 8bit
> > >
> > > If the hard frame pointer is being used as a global register,
> > > we should skip the usual handling for eliminations.  As the
> > > comment says, the register cannot in that case be eliminated
> > > (or eliminated to) and is already marked live where appropriate.
> > >
> > > Doing this removes the duplicate error for gcc.target/i386/pr82673.c.
> > > The “cannot be used in 'asm' here” message is meant to be for asm
> > > statements rather than register asms, and the function that the
> > > error is reported against doesn't use asm.
> > >
> > > gcc/
> > > 2020-09-16  Richard Sandiford  
> > >
> > > PR middle-end/91957
> > > * ira.c (ira_setup_eliminable_regset): Skip the special elimination
> > > handling of the hard frame pointer if the hard frame pointer is fixed.
> > >
> > > gcc/testsuite/
> > > 2020-09-16  H.J. Lu  
> > >Richard Sandiford  
> > >
> > > PR middle-end/91957
> > > * g++.target/i386/pr97054.C: New test.
> > > * gcc.target/i386/pr82673.c: Remove redundant extra message.
> > >
> > > OK
> >
> > OK for GCC 10 branch?
> >
> > Thanks.
>
> PING:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554268.html
>

PING.


-- 
H.J.


Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 3:32 PM Martin Liška  wrote:
>
> On 9/25/20 3:18 PM, Richard Biener wrote:
> > On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> All right, I came up with a rapid speed-up that allows us to remove
> >> the introduced parameter. It contains 2 parts:
> >> - BIT TEST: we allow at maximum a range that is smaller than
> >> GET_MODE_BITSIZE
> >> - JT: we spend quite some time in density calculation; we can guess it
> >> first, which leads to a fast bail-out.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > Err
> >
> > +  auto_vec<int> dest_bbs;
> > -  auto_bitmap dest_bbs;
> >
> > -  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
> > +  if (!dest_bbs.contains (sc->m_case_bb->index))
> > +   {
> > + dest_bbs.safe_push (sc->m_case_bb->index);
> > + if (dest_bbs.length () > m_max_case_bit_tests)
> > +   return false;
> > +   }
>
> That's intentional as m_max_case_bit_tests is a very small number (3) and
> I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
> is a constant operation.

You're storing bb->index where you formerly set the bb->index bit; what's the difference?

For max 3 elements a vector is OK, of course, but there should be a comment
that says this ;)  The static const is 'int', so it can in principle
hold up to two billion ;)

> >
> > vec::contains is linear search so no.  Was this for the length check?
> > Just do
> >
> >   if (bitmap_set_bit (...))
> >{
> >  length++;
> >  if (length > ...)
>
> I would need here bitmap_count_bits. Do you prefer it?

bitmap_set_bit returns false if the bit was already set so you can
count as you add bits, see the length++ above.

For three elements the vec will be faster though.  May I suggest
to use

 auto_vec<int, m_max_case_bit_tests> dest_bbs;

and then quick_push rather than safe_push (the push needs to be
guarded with m_max_case_bit_tests).
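
For reference, a standalone analogue (illustrative C++, not GCC code) of
the resulting pattern: a fixed-capacity buffer of at most
m_max_case_bit_tests distinct destination indices, with the push guarded
so it can never overflow:

#include <algorithm>
#include <array>
#include <cstddef>

constexpr std::size_t max_case_bit_tests = 3;  // mirrors GCC's static const

// Returns false once one distinct destination index too many shows up.
bool
track_dest (std::array<int, max_case_bit_tests> &dests, std::size_t &len,
            int index)
{
  if (std::find (dests.begin (), dests.begin () + len, index)
      != dests.begin () + len)
    return true;                  // already tracked, nothing to do
  if (len == max_case_bit_tests)
    return false;                 // too many distinct destinations
  dests[len++] = index;           // the "quick_push": capacity is fixed
  return true;
}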

Richard.



> Martin
>
> >
> >> Thanks,
> >> Martin
>


[committed][nvptx] Fix Wimplicit-fallthrough in nvptx.c with -save-temps

2020-09-25 Thread Tom de Vries
Hi,

When compiling nvptx.c using -save-temps, I ran into Wimplicit-fallthrough
warnings.

The fallthrough locations have been marked with a fallthrough comment, but
that doesn't work with -save-temps, something that has been filed as
PR78497.

Work around this by using gcc_fallthrough () in addition to the comment.

Tested by building target nvptx, copying nvptx.c compile line and adding
-save-temps.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix Wimplicit-fallthrough in nvptx.c with -save-temps

gcc/ChangeLog:

2020-09-25  Tom de Vries  

* config/nvptx/nvptx.c (nvptx_assemble_integer, nvptx_print_operand):
Use gcc_fallthrough ().

---
 gcc/config/nvptx/nvptx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 54b1fdf669b..de82f9ab875 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2101,7 +2101,7 @@ nvptx_assemble_integer (rtx x, unsigned int size, int 
ARG_UNUSED (aligned_p))
   val = INTVAL (XEXP (x, 1));
   x = XEXP (x, 0);
   gcc_assert (GET_CODE (x) == SYMBOL_REF);
-  /* FALLTHROUGH */
+  gcc_fallthrough (); /* FALLTHROUGH */
 
 case SYMBOL_REF:
   gcc_assert (size == init_frag.size);
@@ -2603,7 +2603,7 @@ nvptx_print_operand (FILE *file, rtx x, int code)
 {
 case 'A':
   x = XEXP (x, 0);
-  /* FALLTHROUGH.  */
+  gcc_fallthrough (); /* FALLTHROUGH. */
 
 case 'D':
   if (GET_CODE (x) == CONST)


[PATCH v2] c++/97197 - support TARGET_MEM_REF in C/C++ error pretty-printing

2020-09-25 Thread Jakub Jelinek via Gcc-patches
Hi!

On Fri, Sep 25, 2020 at 01:37:16PM +0200, Richard Biener wrote:
> See my comment above for Martins attempts to improve things.  I don't
> really want to try decide what to do with those late diagnostic IL
> printing but my commit was blamed for showing target-mem-ref unsupported.
> 
> I don't have much time to spend to think what to best print and what not,
> but yes, printing only the MEM_REF part is certainly imprecise.

Here is an updated version of the patch that prints TARGET_MEM_REF the way
it should be printed - as C representation of what it actually means.
Of course it would be better to have the original expressions, but with the
late diagnostics we no longer have them.
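
For illustration (a hypothetical example, not IL from the PR): a
TARGET_MEM_REF with base p, index i, step 4 and offset 8 is now printed
roughly as

  *(int *)((char *) p + i * 4 + 8)

rather than with the generic "'target_mem_ref' not supported by
expression" placeholder.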

Ok for trunk if it passes bootstrap/regtest?

2020-09-25  Richard Biener  
Jakub Jelinek  

PR c++/97197
cp/
* error.c (dump_expr): Handle TARGET_MEM_REF.
c-family/
* c-pretty-print.c: Include langhooks.h.
(c_pretty_printer::postfix_expression): Handle TARGET_MEM_REF as
expression.
(c_pretty_printer::expression): Handle TARGET_MEM_REF as
unary_expression.
(c_pretty_printer::unary_expression): Handle TARGET_MEM_REF.

--- gcc/c-family/c-pretty-print.c.jj2020-09-21 11:15:53.600520132 +0200
+++ gcc/c-family/c-pretty-print.c   2020-09-25 15:21:26.034477251 +0200
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.
 #include "intl.h"
 #include "tree-pretty-print.h"
 #include "selftest.h"
+#include "langhooks.h"
 
 /* The pretty-printer code is primarily designed to closely follow
(GNU) C and C++ grammars.  That is to be contrasted with spaghetti
@@ -1693,6 +1694,7 @@ c_pretty_printer::postfix_expression (tr
   break;
 
 case MEM_REF:
+case TARGET_MEM_REF:
   expression (e);
   break;
 
@@ -1859,6 +1861,55 @@ c_pretty_printer::unary_expression (tree
}
   break;
 
+case TARGET_MEM_REF:
+  pp_c_star (this);
+  if (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (TMR_BASE (e)))) == NULL_TREE
+ || !integer_onep (TYPE_SIZE_UNIT
+   (TREE_TYPE (TREE_TYPE (TMR_BASE (e))))))
+   {
+ if (TYPE_SIZE_UNIT (TREE_TYPE (e))
+ && integer_onep (TYPE_SIZE_UNIT (TREE_TYPE (e))))
+   {
+ pp_c_left_paren (this);
+ pp_c_type_cast (this, build_pointer_type (TREE_TYPE (e)));
+   }
+ else
+   {
+ pp_c_type_cast (this, build_pointer_type (TREE_TYPE (e)));
+ pp_c_left_paren (this);
+ pp_c_type_cast (this, build_pointer_type (char_type_node));
+   }
+   }
+  else if (!lang_hooks.types_compatible_p
+ (TREE_TYPE (e), TREE_TYPE (TREE_TYPE (TMR_BASE (e)))))
+   {
+ pp_c_type_cast (this, build_pointer_type (TREE_TYPE (e)));
+ pp_c_left_paren (this);
+   }
+  else
+   pp_c_left_paren (this);
+  pp_c_cast_expression (this, TMR_BASE (e));
+  if (TMR_STEP (e) && TMR_INDEX (e))
+   {
+ pp_plus (this);
+ pp_c_cast_expression (this, TMR_INDEX (e));
+ pp_c_star (this);
+ pp_c_cast_expression (this, TMR_STEP (e));
+   }
+  if (TMR_INDEX2 (e))
+   {
+ pp_plus (this);
+ pp_c_cast_expression (this, TMR_INDEX2 (e));
+   }
+  if (!integer_zerop (TMR_OFFSET (e)))
+   {
+ pp_plus (this);
+ pp_c_integer_constant (this,
+fold_convert (ssizetype, TMR_OFFSET (e)));
+   }
+  pp_c_right_paren (this);
+  break;
+
 case REALPART_EXPR:
 case IMAGPART_EXPR:
   pp_c_ws_string (this, code == REALPART_EXPR ? "__real__" : "__imag__");
@@ -2295,6 +2346,7 @@ c_pretty_printer::expression (tree e)
 case ADDR_EXPR:
 case INDIRECT_REF:
 case MEM_REF:
+case TARGET_MEM_REF:
 case NEGATE_EXPR:
 case BIT_NOT_EXPR:
 case TRUTH_NOT_EXPR:
--- gcc/cp/error.c.jj   2020-07-28 15:39:09.780759362 +0200
+++ gcc/cp/error.c  2020-09-25 15:30:17.452823375 +0200
@@ -2400,6 +2400,57 @@ dump_expr (cxx_pretty_printer *pp, tree
}
   break;
 
+case TARGET_MEM_REF:
+  pp_cxx_star (pp);
+  pp_cxx_left_paren (pp);
+  if (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (TMR_BASE (t)))) == NULL_TREE
+ || !integer_onep (TYPE_SIZE_UNIT
+   (TREE_TYPE (TREE_TYPE (TMR_BASE (t))))))
+   {
+ if (TYPE_SIZE_UNIT (TREE_TYPE (t))
+ && integer_onep (TYPE_SIZE_UNIT (TREE_TYPE (t))))
+   {
+ pp_cxx_left_paren (pp);
+ dump_type (pp, build_pointer_type (TREE_TYPE (t)), flags);
+   }
+ else
+   {
+ dump_type (pp, build_pointer_type (TREE_TYPE (t)), flags);
+ pp_cxx_right_paren (pp);
+ pp_cxx_left_paren (pp);
+ pp_cxx_left_paren (pp);
+ dump_type (pp, build_pointer_type (char_type_node), flags);
+   }
+ 

[PATCH] assorted improvements for fold_truth_andor_1

2020-09-25 Thread Alexandre Oliva


This patch introduces various improvements to the logic that merges
field compares.

Before the patch, we could merge:

  (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)

into something like:

  (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)

if both of A's fields live within the same alignment boundaries, and
so do B's, at the same relative positions.  Constants may be used
instead of the object B.
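
For example (an illustrative sketch, not one of the new tests):

struct S { char x1; char y1; };

int
f (struct S *a, struct S *b)
{
  /* Both single-byte fields of each object sit in one aligned halfword,
     so the two compares can become one masked two-byte compare.  */
  return a->x1 == b->x1 && a->y1 == b->y1;
}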

The initial goal of this patch was to enable such combinations when a
field crossed alignment boundaries, e.g. for packed types.  We can't
generally access such fields with a single memory access, so when we
come across such a compare, we will attempt to combine each access
separately.

Some merging opportunities were missed because of right-shifts,
compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
narrowing conversions, especially after earlier merges.  This patch
introduces handlers for several cases involving these.
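
E.g. (again illustrative only), an equality written in the xor-and-mask
form can now be recognized and merged with a neighboring plain compare:

struct T { unsigned char x1; unsigned char y1; };

int
g (struct T *a, struct T *b)
{
  /* The first test is equivalent to (a->x1 & 0x0f) == (b->x1 & 0x0f).  */
  return ((a->x1 ^ b->x1) & 0x0f) == 0 && a->y1 == b->y1;
}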

Other merging opportunities were missed because of association.  The
existing logic would only succeed in merging a pair of consecutive
compares, or e.g. B with C in (A ANDOR B) ANDOR C, not even trying
e.g. C and D in (A ANDOR (B ANDOR C)) ANDOR D.  I've generalized the
handling of the rightmost compare in the left-hand operand, going for
the leftmost compare in the right-hand operand, and then onto trying
to merge compares pairwise, one from each operand, even if they are
not consecutive, taking care to avoid merging operations with
intervening side effects, including volatile accesses.

When it is the second of a non-consecutive pair of compares that first
accesses a word, we may merge the first compare with part of the
second compare that refers to the same word, keeping the compare of
the remaining bits at the spot where the second compare used to be.

Handling compares with non-constant fields was somewhat generalized,
now handling non-adjacent fields.  When a field of one object crosses
an alignment boundary but the other doesn't, we issue the same load in
both compares; gimple optimizers will later turn it into a single
load, without our having to handle SAVE_EXPRs at this point.

The logic for issuing split loads and compares, and ordering them, is
now shared between all cases of compares with constants and with
another object.

The -Wno-error for toplev.o on rs6000 is because of toplev.c's:

  if ((flag_sanitize & SANITIZE_ADDRESS)
  && !FRAME_GROWS_DOWNWARD)

and rs6000.h's:

#define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0   \
  || (flag_sanitize & SANITIZE_ADDRESS) != 0)

The mutually exclusive conditions involving flag_sanitize are now
noticed and reported by fold-const.c's:

  warning (0,
   "% of mutually exclusive equal-tests"
   " is always 0");

This patch enables over 12k compare-merging opportunities that we used
to miss in a GCC bootstrap.

Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* fold-const.c (prepare_xor): New.
(decode_field_reference): Handle xor, shift, and narrowing
conversions.
(all_ones_mask_p): Remove.
(compute_split_boundary_from_align): New.
(build_split_load, reuse_split_load): New.
(fold_truth_andor_1): Add recursion to combine pairs of
non-neighboring compares.  Handle xor compared with zero.
Handle fields straddling across alignment boundaries.
Generalize handling of non-constant rhs.
(fold_truth_andor): Leave sub-expression handling to the
recursion above.
* config/rs6000/t-rs6000 (toplev.o-warn): Disable errors.

for  gcc/testsuite/ChangeLog

* gcc.dg/field-merge-1.c: New.
* gcc.dg/field-merge-2.c: New.
* gcc.dg/field-merge-3.c: New.
* gcc.dg/field-merge-4.c: New.
* gcc.dg/field-merge-5.c: New.
---
 gcc/config/rs6000/t-rs6000   |4 
 gcc/fold-const.c |  818 --
 gcc/testsuite/gcc.dg/field-merge-1.c |   64 +++
 gcc/testsuite/gcc.dg/field-merge-2.c |   31 +
 gcc/testsuite/gcc.dg/field-merge-3.c |   36 +
 gcc/testsuite/gcc.dg/field-merge-4.c |   40 ++
 gcc/testsuite/gcc.dg/field-merge-5.c |   40 ++
 7 files changed, 882 insertions(+), 151 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-1.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-2.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-3.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-4.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-5.c

diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 1ddb572..516486d 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -52,6 +52,10 @@ $(srcdir)/config/rs6000/rs6000-tables.opt: 
$(srcdir)/config/rs6000/genopt.sh \
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \

Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Martin Liška

On 9/25/20 3:18 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:


Hello.

All right, I came up with a rapid speed-up that allows us to remove
the introduced parameter. It contains 2 parts:
- BIT TEST: we allow at maximum a range that is smaller than GET_MODE_BITSIZE
- JT: we spend quite some time in density calculation; we can guess it first,
which leads to a fast bail-out.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Err

+  auto_vec<int> dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }


That's intentional as m_max_case_bit_tests is a very small number (3) and
I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
is a constant operation.



vec::contains is linear search so no.  Was this for the length check?
Just do

  if (bitmap_set_bit (...))
   {
 length++;
 if (length > ...)


I would need here bitmap_count_bits. Do you prefer it?

Martin




Thanks,
Martin




Re: [PATCH v4 1/3] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-25 Thread Richard Sandiford
xionghu luo  writes:
> @@ -2658,6 +2659,45 @@ expand_vect_cond_mask_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>  
>  #define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
>  
> +/* Expand VEC_SET internal functions.  */
> +
> +static void
> +expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op0 = gimple_call_arg (stmt, 0);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rtx src = expand_normal (op0);
> +
> +  machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
> +  scalar_mode innermode = GET_MODE_INNER (outermode);
> +
> +  rtx value = expand_expr (op1, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +  rtx pos = expand_expr (op2, NULL_RTX, VOIDmode, EXPAND_NORMAL);

These two can just use expand_normal.  Might be easier to read if
they come immediately after the expand_normal (op0).

LGTM with that change for the internal-fn.c stuff, thanks.

Richard


Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Biener
On Fri, 25 Sep 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> >> What do we allow for non-boolean constructors.  E.g. for:
> >> 
> >>   v2hi = 0xf001;
> >> 
> >> do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
> >> initialiser value allowed to be arbitrarily different from the type
> >> of the elements being initialised?
> >> 
> >> Or is there requirement that (say) each constructor element is either:
> >> 
> >> - a scalar that initialises one element of the constructed vector
> >> - a vector of N elements that initialises N elements of the constructed 
> >> vector
> >> 
> >> ?
> >> 
> >> Like you say, it mostly seems like guesswork how booleans would be
> >> handled here, but personally I don't know the answer for non-booleans
> >> either :-)
> >
> > There's extensive checking in tree-cfg.c for vector CTORs meanwhile.
> > We only support uniform element CTORs with only trailing zeros elided.
> > And the elements need to either have types of the vector component
> > or be vectors with such component.
> 
> Ah, great.  So in that case, could we ditch bitsize altogether and
> just use:
> 
>   unsigned int nelts = (VECTOR_TYPE_P (val_type)
>   ? TYPE_VECTOR_SUBPARTS (val_type).to_constant () : 1);
> 
> or equivalent to work out the number of elements being initialised
> by each constructor element?

But

   store_constructor_field (target, bitsize, bitpos, 0,
 bitregion_end, value_mode,
 value, cleared, alias, reverse);

still wants the bits to initialize (for the original testcase
the vector had only the first 4 elements initialized,
at wrong bit positions and sizes - QImode).

But yes, I'm sure we can eventually simplify this further.
FYI, the following passed bootstrap, regtest is still running
(but as said, test coverage dropped to zero).

Richard.

commit d16b5975ca985cbe97698479fc38b6a636886978
Author: Richard Biener 
Date:   Fri Sep 25 11:13:13 2020 +0200

middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
with bit-precision elements correctly as the testcase shows before
the PR97085 fix.  The following makes it do the correct thing
(not 100% sure for CTOR of sub-vectors due to the lack of a testcase).

The alternative would be to assert such CTORs do not happen (and also
add IL verification for this).

The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
(thus the C FE needs that).

2020-09-25  Richard Biener  

PR middle-end/96814
* expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
CTORs correctly.

* gcc.target/i386/pr96814.c: New testcase.

diff --git a/gcc/expr.c b/gcc/expr.c
index 1a15f24b397..1c79518ee4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -6922,7 +6922,7 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
insn_code icode = CODE_FOR_nothing;
tree elt;
tree elttype = TREE_TYPE (type);
-   int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
+   int elt_size = vector_element_bits (type);
machine_mode eltmode = TYPE_MODE (elttype);
HOST_WIDE_INT bitsize;
HOST_WIDE_INT bitpos;
@@ -6987,6 +6987,15 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
  }
  }
 
+   /* Compute the size of the elements in the CTOR.  It differs
+  from the size of the vector type elements only when the
+  CTOR elements are vectors themselves.  */
+   tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
+   if (VECTOR_TYPE_P (val_type))
+ bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
+   else
+ bitsize = elt_size;
+
/* If the constructor has fewer elements than the vector,
   clear the whole array first.  Similarly if this is static
   constructor of a non-BLKmode object.  */
@@ -7001,11 +7010,7 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
 
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
  {
-   tree sz = TYPE_SIZE (TREE_TYPE (value));
-   int n_elts_here
- = tree_to_uhwi (int_const_binop (TRUNC_DIV_EXPR, sz,
-  TYPE_SIZE (elttype)));
-
+   int n_elts_here = bitsize / elt_size;
count += n_elts_here;
if (mostly_zeros_p (value))
  zero_count += n_elts_here;
@@ -7045,7 +7050,6 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
HOST_WIDE_INT eltpos;
tree value = ce->value;
 
-   bitsize = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (value)));
if (cleared && initializer_zerop (value))
   

Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:
>
> Hello.
>
> All right, I came up with a rapid speed-up that allows us to remove
> the introduced parameter. It contains 2 parts:
> - BIT TEST: we allow at maximum a range that is smaller than GET_MODE_BITSIZE
> - JT: we spend quite some time in density calculation; we can guess it first,
> which leads to a fast bail-out.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Err

+  auto_vec<int> dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }

vec::contains is linear search so no.  Was this for the length check?
Just do

 if (bitmap_set_bit (...))
  {
length++;
if (length > ...)

> Thanks,
> Martin


Re: [PATCH] generalized range_query class for multiple contexts

2020-09-25 Thread Andrew MacLeod via Gcc-patches

On 9/24/20 5:51 PM, Martin Sebor via Gcc-patches wrote:

On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:



3. Conversion of sprintf/strlen pass to class.

This is a nonfunctional change to the sprintf/strlen passes. That is, 
no effort was made to change the passes to multi-ranges.  However, 
with this patch, we are able to plug in a ranger or evrp with just a 
few lines, since the range query mechanism share a common API.


Thanks for doing all this!  There isn't anything I don't understand
in the sprintf changes so no questions from me (well, almost none).
Just some comments:

The current call statement is available in all functions that take
a directive argument, as dir->info.callstmt.  There should be no need
to also add it as a new argument to the functions that now need it.

The change adds code along these lines in a bunch of places:

+  value_range vr;
+  if (!query->range_of_expr (vr, arg, stmt))
+    vr.set_varying (TREE_TYPE (arg));

I thought under the new Ranger APIs when a range couldn't be
determined it would be automatically set to the maximum for
the type.  I like that and have been moving in that direction
with my code myself (rather than having an API fail, have it
set the max range and succeed).

Aldy will have to comment on why that is there; probably an oversight.  The
API should return VARYING if it can't calculate a better range.  The only
time the API returns FALSE for a query is when the range is
unsupported, i.e., you ask for the range of a float statement or argument.


Andrew



Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 25 Sep 2020, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> >> What do we allow for non-boolean constructors.  E.g. for:
>> >> 
>> >>   v2hi = 0xf001;
>> >> 
>> >> do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
>> >> initialiser value allowed to be arbitrarily different from the type
>> >> of the elements being initialised?
>> >> 
>> >> Or is there requirement that (say) each constructor element is either:
>> >> 
>> >> - a scalar that initialises one element of the constructed vector
>> >> - a vector of N elements that initialises N elements of the constructed 
>> >> vector
>> >> 
>> >> ?
>> >> 
>> >> Like you say, it mostly seems like guesswork how booleans would be
>> >> handled here, but personally I don't know the answer for non-booleans
>> >> either :-)
>> >
>> > There's extensive checking in tree-cfg.c for vector CTORs meanwhile.
> > We only support uniform element CTORs with only trailing zeros elided.
>> > And the elements need to either have types of the vector component
>> > or be vectors with such component.
>> 
>> Ah, great.  So in that case, could we ditch bitsize altogether and
>> just use:
>> 
>>   unsigned int nelts = (VECTOR_TYPE_P (val_type)
>>  ? TYPE_VECTOR_SUBPARTS (val_type).to_constant () : 1);
>> 
>> or equivalent to work out the number of elements being initialised
>> by each constructor element?
>
> But
>
>store_constructor_field (target, bitsize, bitpos, 0,
>  bitregion_end, value_mode,
>  value, cleared, alias, reverse);
>
> still wants the bits to initialize (for the original testcase
> the vector had only the first 4 elements initialized,
> at wrong bit positions and sizes - QImode).
>
> But yes, I'm sure we can eventually simplify this further.
> FYI, the following passed bootstrap, regtest is still running
> (but as said, test coverage dropped to zero).

LGTM FWIW.

Thanks,
Richard

>
> Richard.
>
> commit d16b5975ca985cbe97698479fc38b6a636886978
> Author: Richard Biener 
> Date:   Fri Sep 25 11:13:13 2020 +0200
>
> middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion
> 
> The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
> with bit-precision elements correctly as the testcase shows before
> the PR97085 fix.  The following makes it do the correct thing
> (not 100% sure for CTOR of sub-vectors due to the lack of a testcase).
> 
> The alternative would be to assert such CTORs do not happen (and also
> add IL verification for this).
> 
> The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
> (thus the C FE needs that).
> 
> 2020-09-25  Richard Biener  
> 
> PR middle-end/96814
> * expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
> CTORs correctly.
> 
> * gcc.target/i386/pr96814.c: New testcase.
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 1a15f24b397..1c79518ee4d 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -6922,7 +6922,7 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
>   insn_code icode = CODE_FOR_nothing;
>   tree elt;
>   tree elttype = TREE_TYPE (type);
> - int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
> + int elt_size = vector_element_bits (type);
>   machine_mode eltmode = TYPE_MODE (elttype);
>   HOST_WIDE_INT bitsize;
>   HOST_WIDE_INT bitpos;
> @@ -6987,6 +6987,15 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
> }
> }
>  
> + /* Compute the size of the elements in the CTOR.  It differs
> +from the size of the vector type elements only when the
> +CTOR elements are vectors themselves.  */
> + tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> + if (VECTOR_TYPE_P (val_type))
> +   bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
> + else
> +   bitsize = elt_size;
> +
>   /* If the constructor has fewer elements than the vector,
>  clear the whole array first.  Similarly if this is static
>  constructor of a non-BLKmode object.  */
> @@ -7001,11 +7010,7 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
>  
>   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
> {
> - tree sz = TYPE_SIZE (TREE_TYPE (value));
> - int n_elts_here
> -   = tree_to_uhwi (int_const_binop (TRUNC_DIV_EXPR, sz,
> -TYPE_SIZE (elttype)));
> -
> + int n_elts_here = bitsize / elt_size;
>   count += n_elts_here;
>   if (mostly_zeros_p (value))
> zero_count += n_elts_here;
> @@ -7045,7 +7050,6 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
>  

Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Sandiford
Richard Biener  writes:
>> What do we allow for non-boolean constructors.  E.g. for:
>> 
>>   v2hi = 0xf001;
>> 
>> do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
>> initialiser value allowed to be arbitrarily different from the type
>> of the elements being initialised?
>> 
>> Or is there requirement that (say) each constructor element is either:
>> 
>> - a scalar that initialises one element of the constructed vector
>> - a vector of N elements that initialises N elements of the constructed 
>> vector
>> 
>> ?
>> 
>> Like you say, it mostly seems like guesswork how booleans would be
>> handled here, but personally I don't know the answer for non-booleans
>> either :-)
>
> There's extensive checking in tree-cfg.c for vector CTORs meanwhile.
> We only support uniform element CTORs with only trailing zeros elided.
> And the elements need to either have types of the vector component
> or be vectors with such component.

Ah, great.  So in that case, could we ditch bitsize altogether and
just use:

  unsigned int nelts = (VECTOR_TYPE_P (val_type)
? TYPE_VECTOR_SUBPARTS (val_type).to_constant () : 1);

or equivalent to work out the number of elements being initialised
by each constructor element?

Thanks,
Richard


Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Richard Sandiford
Qing Zhao  writes:
> Hi, Richard,
>
> As you suggested, I added a default implementation of the target hook 
> “zero_call_used_regs (HARD_REG_SET)” as follows in my latest patch
>
>
> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>
> void
> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)

FWIW, I was suggesting to return the set of registers that are actually
cleared too.  Here you have the hook emit the asm statement, but IMO the
way we generate the asm for a given set of registers should be entirely
target-independent, and happen outside the hook.

So the hook returning the set of cleared registers does two things:

(1) It indicates which registers should be clobbered by the asm
(which would be generated after calling the hook, but emitted
before the sequence of instructions generated by the hook).

(2) It indicates which registers should be treated as live on return.

FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.
Then the wrapper around EPILOGUE_USES that we talked about would
check two things:

- EPILOGUE_USES itself
- the crtl HARD_REG_SET

The crtl set would start out empty and remain empty unless the
new option is used.
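
Something like this sketch; everything apart from EPILOGUE_USES and
TEST_HARD_REG_BIT is a hypothetical name, including the crtl field:

/* Hypothetical wrapper; zeroed_call_used_regs is the proposed crtl
   field, not an existing one.  */
static inline bool
epilogue_uses_p (unsigned int regno)
{
  return (EPILOGUE_USES (regno)
          || TEST_HARD_REG_BIT (crtl->zeroed_call_used_regs, regno));
}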

> if (zero_rtx[(int)mode] == NULL_RTX)
>   {
> zero_rtx[(int)mode] = reg;
> tmp = gen_rtx_SET (reg, const0_rtx);
> emit_insn (tmp);
>   }
> else
>   emit_move_insn (reg, zero_rtx[(int)mode]);

Hmm, OK, so you're assuming that it's better to zero one register
and reuse that register for later moves.  I guess this is my RISC
background/bias showing, but I think it might be simpler to assume
that zeroing is as cheap as a register move.  The danger with reusing
earlier registers is that you might introduce a cross-bank move,
and some targets can only do those via memory.

Or perhaps we could use insn_cost to choose between them.  But I think
the first implementation can just zero each register individually,
unless we already know of a specific case in which reusing registers
is necessary.
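
A minimal sketch (untested) of that first implementation, assuming the
need_zeroed_hardregs argument from the posted hook:

/* Sketch only: zero each selected hard register individually.  */
for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
    {
      machine_mode mode = reg_raw_mode[regno];
      emit_move_insn (gen_rtx_REG (mode, regno), CONST0_RTX (mode));
    }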

> I tested this default implementation on aarch64 with a small test case;
> -fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well;
> however,
> -fzero-call-used-regs=all-arg and -fzero-call-used-regs=all hit an internal
> compiler error as follows:
>
> t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
>15 | }
>   | ^
> 0xcff58b gen_highpart(machine_mode, rtx_def*)
>   ../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
> 0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
>   ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
> 0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
>   ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394
>
> Studying this today, I found that the major cause of this bug is the
> following statement:
>
> machine_mode mode = reg_raw_mode[regno];
>
> “reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector 
> register on aarch64); as a result, the zeroing insn for this register is:
>
> (insn 112 111 113 7 (set (reg:TI 32 v0)
> (const_int 0 [0])) "t1.c":15:1 -1
>  (nil))
>
>
> However, it looks like the above RTL has to be split into two sub-register
> moves on aarch64, and the splitting has an issue.
>
> So, I guess that on aarch64, zeroing vector registers might need other modes 
> than the one returned by “reg_raw_mode”.  
>
> My questions are:
>
> 1. Is there another available utility routine that returns the proper MODE 
> for a hard register that can be readily used to zero the hard register?
> 2. If not, should I add one more target hook for this purpose? i.e 
>
> /* Return the proper machine mode that can be used to zero this hard register 
> specified by REGNO.  */
> machine_mode zero-call-used-regs-mode (unsigned int REGNO)

Thanks for testing aarch64.  I think there are two issues here,
one in the patch and one in the aarch64 backend:

- the patch should use emit_move_insn rather than use gen_rtx_SET directly.

- the aarch64 backend doesn't handle zeroing TImode vector registers,
  but should.  E.g. for:

void
foo ()
{
  register __int128_t q0 asm ("q0");
  q0 = 0;
  asm volatile ("" :: "w" (q0));
}

  we generate:

mov x0, 0
mov x1, 0
fmov    d0, x0
fmov    v0.d[1], x1

  which is, er, somewhat suboptimal.

I'll try to fix the aarch64 bug for Monday next week.

Thanks,
Richard


[PATCH] middle-end/97207 - implement move assign for auto_vec<>

2020-09-25 Thread Richard Biener
This implements the missing move assignment to make std::swap work
on auto_vec<>

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2020-09-25  Richard Biener  

PR middle-end/97207
* vec.h (auto_vec::operator=(auto_vec&&)): Implement.
---
 gcc/vec.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/vec.h b/gcc/vec.h
index d73d865cff2..d8c7cdac073 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1546,7 +1546,13 @@ public:
   this->m_vec = r.m_vec;
   r.m_vec = NULL;
 }
-  void operator= (auto_vec&&) = delete;
+  auto_vec& operator= (auto_vec&& r)
+{
+  this->release ();
+  this->m_vec = r.m_vec;
+  r.m_vec = NULL;
+  return *this;
+}
 };
 
 
-- 
2.26.2
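
For context, std::swap is specified as three moves, so the previously
deleted move assignment made std::swap (a, b) ill-formed for auto_vec.
A standalone analogue (illustrative only, not GCC's vec.h):

#include <utility>

struct toy_auto_vec
{
  int *m_vec = nullptr;
  toy_auto_vec () = default;
  toy_auto_vec (toy_auto_vec &&r) : m_vec (r.m_vec) { r.m_vec = nullptr; }
  toy_auto_vec &operator= (toy_auto_vec &&r)
  {
    delete[] m_vec;       // analogue of this->release ()
    m_vec = r.m_vec;
    r.m_vec = nullptr;
    return *this;
  }
  ~toy_auto_vec () { delete[] m_vec; }
};

int
main ()
{
  toy_auto_vec a, b;
  std::swap (a, b);       // well-formed only with the move assignment
}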


Re: [PATCH] c++: Fix up default initialization with consteval default ctor [PR96994]

2020-09-25 Thread Stephan Bergmann via Gcc-patches

On 15/09/2020 09:57, Jakub Jelinek via Gcc-patches wrote:

The following testcase is miscompiled (in particular the a and i
initialization).  The problem is that build_special_member_call due to
the immediate constructors (but not evaluated in constant expression mode)
doesn't create a CALL_EXPR, but returns a TARGET_EXPR with CONSTRUCTOR
as the initializer for it, and then expand_default_init just emits
the returned statement, but this one doesn't have any side-effects and does
nothing.  There is an if to handle constexpr ctors which emits an INIT_EXPR
but constexpr ctors still show up as CALL_EXPR and need to be manually
evaluated to constant expressions (if possible).
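
A minimal sketch of the affected shape (an assumption based on the
description above, not the actual testcase):

struct A { consteval A () : i (42) {} int i; };

int
main ()
{
  A a;                        // default init via the consteval ctor
  return a.i == 42 ? 0 : 1;   // before the fix, a.i could be left
                              // uninitialized
}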

The following patch fixes that, though I'm not sure about several things.
One is that the earlier if also has expr == true_exp && in the condition,
not sure if we want it in this case or not.
Another is that for delegating constructors, we emit two separate calls
and build_if_in_charge them together.  Not sure if consteval could come into
play in that case.


(Just reporting that with this patch applied, my build of LibreOffice 
using consteval, cf. 
 
"Turn OStringLiteral into a consteval'ed, static-refcound rtl_String", 
works fine.)




Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Biener
On Fri, 25 Sep 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
> > with bit-precision elements correctly as the testcase shows before
> > the PR97085 fix.  The following makes it do the correct thing
> > (not 100% sure for CTOR of sub-vectors due to the lack of a testcase).
> >
> > The alternative would be to assert such CTORs do not happen (and also
> > add IL verification for this).
> >
> > The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
> > (thus the C FE needs that), thus test coverage is quite limited (zero)
> > now and I didn't manage to convince GCC to create such CTOR for SVE
> > VnBImode vectors.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Does this look sensible?
> >
> > Thanks,
> > Richard.
> >
> > 2020-09-25  Richard Biener  
> >
> > PR middle-end/96814
> > * expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
> > CTORs correctly.
> >
> > * gcc.target/i386/pr96814.c: New testcase.
> > ---
> >  gcc/expr.c  | 28 ++---
> >  gcc/testsuite/gcc.target/i386/pr96814.c | 19 +
> >  2 files changed, 40 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr96814.c
> >
> > diff --git a/gcc/expr.c b/gcc/expr.c
> > index 1a15f24b397..fb42e485089 100644
> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -6922,7 +6922,9 @@ store_constructor (tree exp, rtx target, int cleared, 
> > poly_int64 size,
> > insn_code icode = CODE_FOR_nothing;
> > tree elt;
> > tree elttype = TREE_TYPE (type);
> > -   int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
> > +   int elt_size
> > + = (VECTOR_BOOLEAN_TYPE_P (type) ? TYPE_PRECISION (elttype)
> > +: tree_to_uhwi (TYPE_SIZE (elttype)));
> 
> FWIW, we now have vector_element_bits for this.

ah, didn't know this

> > machine_mode eltmode = TYPE_MODE (elttype);
> > HOST_WIDE_INT bitsize;
> > HOST_WIDE_INT bitpos;
> > @@ -6987,6 +6989,23 @@ store_constructor (tree exp, rtx target, int 
> > cleared, poly_int64 size,
> >   }
> >   }
> >  
> > +   /* Compute the size of the elements in the CTOR.  */
> > +   tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> > +   if (VECTOR_BOOLEAN_TYPE_P (type))
> > + {
> > +   if (VECTOR_TYPE_P (val_type))
> > + {
> > +   /* ???  Never seen such beast, but it's not disallowed.  */
> > +   gcc_assert (VECTOR_BOOLEAN_TYPE_P (val_type));
> > +   bitsize = (TYPE_PRECISION (TREE_TYPE (val_type))
> > +  * TYPE_VECTOR_SUBPARTS (val_type).to_constant ());

but I wonder whether it is correct?  Say, for AVX512, which uses
at least 'char' as the element type, TYPE_SIZE of that will likely be 8,
so for a hypothetical 4-element mask it would need two-bit elements
to work out; but IIRC AVX512 mask registers always use 1 bit per lane.

The target hook currently does

static opt_machine_mode
ix86_get_mask_mode (machine_mode data_mode)
{
  unsigned vector_size = GET_MODE_SIZE (data_mode);
  unsigned nunits = GET_MODE_NUNITS (data_mode);
  unsigned elem_size = vector_size / nunits;

  /* Scalar mask case.  */
  if ((TARGET_AVX512F && vector_size == 64)
  || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
{ 
  if (elem_size == 4
  || elem_size == 8
  || (TARGET_AVX512BW && (elem_size == 1 || elem_size == 2)))
return smallest_int_mode_for_size (nunits);
}

and then build_truth_vector_type_for_mode will end up building
a vector with QImode I think.  So it works but I now
wonder whether it works correctly ;)

I guess I will rework the above hunk to use the computed element
size for the non-vector element case and rely on TYPE_SIZE for
the vector element case since that's what vector_element_size does.
Simplifies the beast a bit.

> > + }
> > +   else
> > + bitsize = TYPE_PRECISION (val_type);
> > + }
> > +   else
> > + bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
> > +
> 
> What do we allow for non-boolean constructors.  E.g. for:
> 
>   v2hi = 0xf001;
> 
> do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
> initialiser value allowed to be arbitrarily different from the type
> of the elements being initialised?
> 
> Or is there requirement that (say) each constructor element is either:
> 
> - a scalar that initialises one element of the constructed vector
> - a vector of N elements that initialises N elements of the constructed vector
> 
> ?
> 
> Like you say, it mostly seems like guesswork how booleans would be
> handled here, but personally I don't know the answer for non-booleans
> either :-)

There's extensive checking in tree-cfg.c for vector CTORs meanwhile.
We only support uniform element CTORs with only trailing zeros elided.
And the elements need to either have types of the vector component
or be vectors with such component.

Re: [PATCH] aarch64: Do not alter force_reg returned rtx expanding pauth builtins

2020-09-25 Thread Richard Sandiford
Andrea Corallo  writes:
> Hi Richard,
>
> thanks for reviewing
>
> Richard Sandiford  writes:
>
>> Andrea Corallo  writes:
>>> Hi all,
>>>
>>> having a look for force_reg returned rtx later on modified I've found
>>> this other case in `aarch64_general_expand_builtin` while expanding 
>>> pointer authentication builtins.
>>>
>>> Regtested and bootsraped on aarch64-linux-gnu.
>>>
>>> Okay for trunk?
>>>
>>>   Andrea
>>>
>>> From 8869ee04e3788fdec86aa7e5a13e2eb477091d0e Mon Sep 17 00:00:00 2001
>>> From: Andrea Corallo 
>>> Date: Mon, 21 Sep 2020 13:52:45 +0100
>>> Subject: [PATCH] aarch64: Do not alter force_reg returned rtx expanding 
>>> pauth
>>>  builtins
>>>
>>> 2020-09-21  Andrea Corallo  
>>>
>>> * config/aarch64/aarch64-builtins.c
>>> (aarch64_general_expand_builtin): Do not alter value on a
>>> force_reg returned rtx.
>>> ---
>>>  gcc/config/aarch64/aarch64-builtins.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
>>> b/gcc/config/aarch64/aarch64-builtins.c
>>> index b787719cf5e..a77718ccfac 100644
>>> --- a/gcc/config/aarch64/aarch64-builtins.c
>>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>>> @@ -2079,10 +2079,10 @@ aarch64_general_expand_builtin (unsigned int fcode, 
>>> tree exp, rtx target,
>>>arg0 = CALL_EXPR_ARG (exp, 0);
>>>op0 = force_reg (Pmode, expand_normal (arg0));
>>>  
>>> -  if (!target)
>>> +  if (!(target
>>> +   && REG_P (target)
>>> +   && GET_MODE (target) == Pmode))
>>> target = gen_reg_rtx (Pmode);
>>> -  else
>>> -   target = force_reg (Pmode, target);
>>>  
>>>emit_move_insn (target, op0);
>>
>> Do we actually use the result of this move?  It looked like we always
>> use op0 rather than target (good) and overwrite target with a later move.
>>
>> If so, I think we should delete the move
>
> Good point agree.
>
>> and convert the later code to use expand_insn.
>
> I'm not sure I understand the suggestion right: the xpaclri patterns
> are written with hardcoded in/out regs; is the suggestion to just use
> 'expand_insn (CODE_FOR_xpaclri, 0, NULL)' in place of GEN_FCN+emit_insn?

Oops, sorry for the bogus comment, didn't look closely enough.

So yeah, no need to use expand_insn.  Rather than generate a new target,
it should be OK to return lr and x17_reg directly.  (Hope I'm right
this time. ;-))

Thanks,
Richard


Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-25 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Sep 24, 2020 at 9:38 PM Segher Boessenkool
>  wrote:
>>
>> Hi!
>>
>> On Thu, Sep 24, 2020 at 04:55:21PM +0200, Richard Biener wrote:
>> > Btw, on x86_64 the following produces sth reasonable:
>> >
>> > #define N 32
>> > typedef int T;
>> > typedef T V __attribute__((vector_size(N)));
>> > V setg (V v, int idx, T val)
>> > {
>> >   V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>> >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
>> >   v = (v & ~mask) | (valv & mask);
>> >   return v;
>> > }
>> >
>> > vmovd   %edi, %xmm1
> >> > vpbroadcastd    %xmm1, %ymm1
> >> > vpcmpeqd        .LC0(%rip), %ymm1, %ymm2
>> > vpblendvb   %ymm2, %ymm1, %ymm0, %ymm0
>> > ret
>> >
>> > I'm quite sure you could do sth similar on power?
>>
>> This only allows inserting aligned elements.  Which is probably fine
>> of course (we don't allow elements that straddle vector boundaries
>> either, anyway).
>>
>> And yes, we can do that :-)
>>
>> That should be
>>   #define N 32
>>   typedef int T;
>>   typedef T V __attribute__((vector_size(N)));
>>   V setg (V v, int idx, T val)
>>   {
>> V valv = (V){val, val, val, val, val, val, val, val};
>> V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>> V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
>> v = (v & ~mask) | (valv & mask);
>> return v;
>>   }
>
> Whoops yeah, simplified it a bit too much ;)
>
>> after which I get (-march=znver2)
>>
>> setg:
>> vmovd   %edi, %xmm1
>> vmovd   %esi, %xmm2
> >> vpbroadcastd    %xmm1, %ymm1
> >> vpbroadcastd    %xmm2, %ymm2
> >> vpcmpeqd        .LC0(%rip), %ymm1, %ymm1
>> vpandn  %ymm0, %ymm1, %ymm0
>> vpand   %ymm2, %ymm1, %ymm1
>> vpor%ymm0, %ymm1, %ymm0
>> ret
>
> I get with -march=znver2 -O2
>
> vmovd   %edi, %xmm1
> vmovd   %esi, %xmm2
> vpbroadcastd    %xmm1, %ymm1
> vpbroadcastd    %xmm2, %ymm2
> vpcmpeqd        .LC0(%rip), %ymm1, %ymm1
> vpblendvb   %ymm1, %ymm2, %ymm0, %ymm0
>
> and with -mavx512vl
>
> vpbroadcastd    %edi, %ymm1
> vpcmpd  $0, .LC0(%rip), %ymm1, %k1
> vpbroadcastd    %esi, %ymm0{%k1}
>
> broadcast-with-mask - heh, would be interesting if we manage
> to combine v[idx1] = val; v[idx2] = val; ;)
>
> Now, with SSE4.2 the 16byte case compiles to
>
> setg:
> .LFB0:
> .cfi_startproc
> movd%edi, %xmm3
> movdqa  %xmm0, %xmm1
> movd%esi, %xmm4
> pshufd  $0, %xmm3, %xmm0
> pcmpeqd .LC0(%rip), %xmm0
> movdqa  %xmm0, %xmm2
> pandn   %xmm1, %xmm2
> pshufd  $0, %xmm4, %xmm1
> pand%xmm1, %xmm0
> por %xmm2, %xmm0
> ret
>
> since there's no blend with a variable mask IIRC.
>
> with aarch64 and SVE it doesn't handle the 32-byte case at all,

FWIW, the SVE version with -msve-vector-bits=256 is:

ptrue   p0.b, vl32
mov z1.s, w1
index   z2.s, #0, #1
ld1w    z0.s, p0/z, [x0]
cmpeq   p1.s, p0/z, z1.s, z2.s
mov z0.s, p1/m, w2
st1w    z0.s, p0, [x8]

where the ptrue, ld1w and st1w are just because generic 256-bit
vectors are passed in memory; the real operation is:

mov z1.s, w1
index   z2.s, #0, #1
cmpeq   p1.s, p0/z, z1.s, z2.s
mov z0.s, p1/m, w2

Thanks,
Richard


Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Sandiford
Richard Biener  writes:
> The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
> with bit-precision elements correctly as the testcase shows before
> the PR97085 fix.  The following makes it do the correct thing
> (not 100% sure for CTOR of sub-vectors due to the lack of a testcase).
>
> The alternative would be to assert such CTORs do not happen (and also
> add IL verification for this).
>
> The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
> (thus the C FE needs that), thus test coverage is quite limited (zero)
> now and I didn't manage to convince GCC to create such CTOR for SVE
> VnBImode vectors.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Does this look sensible?
>
> Thanks,
> Richard.
>
> 2020-09-25  Richard Biener  
>
>   PR middle-end/96814
>   * expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
>   CTORs correctly.
>
>   * gcc.target/i386/pr96814.c: New testcase.
> ---
>  gcc/expr.c  | 28 ++---
>  gcc/testsuite/gcc.target/i386/pr96814.c | 19 +
>  2 files changed, 40 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr96814.c
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 1a15f24b397..fb42e485089 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -6922,7 +6922,9 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
>   insn_code icode = CODE_FOR_nothing;
>   tree elt;
>   tree elttype = TREE_TYPE (type);
> - int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
> + int elt_size
> +   = (VECTOR_BOOLEAN_TYPE_P (type) ? TYPE_PRECISION (elttype)
> +  : tree_to_uhwi (TYPE_SIZE (elttype)));

FWIW, we now have vector_element_bits for this.

>   machine_mode eltmode = TYPE_MODE (elttype);
>   HOST_WIDE_INT bitsize;
>   HOST_WIDE_INT bitpos;
> @@ -6987,6 +6989,23 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
> }
> }
>  
> + /* Compute the size of the elements in the CTOR.  */
> + tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> + if (VECTOR_BOOLEAN_TYPE_P (type))
> +   {
> + if (VECTOR_TYPE_P (val_type))
> +   {
> + /* ???  Never seen such beast, but it's not disallowed.  */
> + gcc_assert (VECTOR_BOOLEAN_TYPE_P (val_type));
> + bitsize = (TYPE_PRECISION (TREE_TYPE (val_type))
> +* TYPE_VECTOR_SUBPARTS (val_type).to_constant ());
> +   }
> + else
> +   bitsize = TYPE_PRECISION (val_type);
> +   }
> + else
> +   bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
> +

What do we allow for non-boolean constructors.  E.g. for:

  v2hi = 0xf001;

do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
initialiser value allowed to be arbitrarily different from the type
of the elements being initialised?

Or is there requirement that (say) each constructor element is either:

- a scalar that initialises one element of the constructed vector
- a vector of N elements that initialises N elements of the constructed vector

?

Like you say, it mostly seems like guesswork how booleans would be
handled here, but personally I don't know the answer for non-booleans
either :-)

Thanks,
Richard


[committed] libstdc++: Remove redundant -std=gnu++1z flags from makefile

2020-09-25 Thread Jonathan Wakely via Gcc-patches
Now that G++ defaults to gnu++17 we don't need special rules for
compiling the C++17 allocation and deallocation functions.

libstdc++-v3/ChangeLog:

* libsupc++/Makefile.am: Remove redundant -std=gnu++1z flags.
* libsupc++/Makefile.in: Regenerate.

Tested powerpc64le-linux. Committed to trunk.

commit 473da7e22c809fda9e3b37557d6ee8c07b226ca4
Author: Jonathan Wakely 
Date:   Fri Sep 25 12:50:17 2020

libstdc++: Remove redundant -std=gnu++1z flags from makefile

Now that G++ defaults to gnu++17 we don't need special rules for
compiling the C++17 allocation and deallocation functions.

libstdc++-v3/ChangeLog:

* libsupc++/Makefile.am: Remove redundant -std=gnu++1z flags.
* libsupc++/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/libsupc++/Makefile.am 
b/libstdc++-v3/libsupc++/Makefile.am
index 35ad3ae7799..091fe159d5a 100644
--- a/libstdc++-v3/libsupc++/Makefile.am
+++ b/libstdc++-v3/libsupc++/Makefile.am
@@ -128,28 +128,6 @@ cp-demangle.o: cp-demangle.c
$(C_COMPILE) -DIN_GLIBCPP_V3 -Wno-error -c $<
 
 
-# Use special rules for the C++17 sources so that the proper flags are passed.
-new_opa.lo: new_opa.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-new_opant.lo: new_opant.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-new_opva.lo: new_opva.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-new_opvant.lo: new_opvant.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opa.lo: del_opa.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opant.lo: del_opant.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opsa.lo: del_opsa.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opva.lo: del_opva.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opvant.lo: del_opvant.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opvsa.lo: del_opvsa.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-
 # AM_CXXFLAGS needs to be in each subdirectory so that it can be
 # modified in a per-library or per-sub-library way.  Need to manually
 # set this option because CONFIG_CXXFLAGS has to be after


Patch ping

2020-09-25 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping a few patches:

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552451.html
  - allow plugins to deal with global_options layout changes

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553420.html
  - --enable-link-serialization{,=N} support

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553948.html
  - PR96994 - fix up C++ handling of default initialization with
  consteval default ctor

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553992.html
  - pass -gdwarf-5 to assembler for -gdwarf-5 if possible

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554246.html
  - PR97073 - fix wrong-code on double-word op expansion

Jakub



Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-09-25 Thread Tom de Vries
On 9/24/20 5:05 PM, Richard Biener wrote:
> On Thu, 24 Sep 2020, Jonathan Wakely wrote:
> 
>> On 24/09/20 11:11 +0200, Richard Biener wrote:
>>> On Wed, 26 Aug 2020, Richard Biener wrote:
>>>
 On Thu, 6 Aug 2020, Richard Biener wrote:

> On Thu, 6 Aug 2020, Richard Biener wrote:
>
>> This adds a move CTOR to auto_vec and makes use of a
>> auto_vec return value for get_loop_exit_edges denoting
>> that lifetime management of the vector is handed to the caller.
>>
>> The move CTOR prompted the hash_table change because it apparently
>> makes the copy CTOR implicitly deleted (good) and hash-table
>> expansion of the odr_enum_map which is
>> hash_map <nofree_string_hash, odr_enum> where odr_enum has an
>> auto_vec<odr_enum_val, 32> member triggers this.  Not sure if
>> there's a latent bug there before this (I think we're not
>> invoking DTORs, but we're invoking copy-CTORs).
>>
>> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>>
>> Does this all look sensible and is it a good change
>> (the get_loop_exit_edges one)?
>
> Regtest went OK, here's an update with a complete ChangeLog
> (how useful..) plus the move assign operator deleted, copy
> assign wouldn't work as auto-generated and at the moment
> there's no use of assigning.  I guess if we'd have functions
> that take an auto_vec<> argument, meaning they will destroy
> the vector, that will become useful and we can implement it.
>
> OK for trunk?

 Ping.
>>>
>>> Ping^2.
>>
>> Looks good to me as far as the use of C++ features goes.
> 
> Thanks, now pushed after re-testing.

Ran into a build breaker after this commit, reported here (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207 ).

Thanks,
- Tom


Re: [PATCH] c++/97197 - support TARGET_MEM_REF in C/C++ error pretty-printing

2020-09-25 Thread Richard Biener
On Fri, 25 Sep 2020, Jakub Jelinek wrote:

> On Fri, Sep 25, 2020 at 01:11:37PM +0200, Richard Biener wrote:
> > This adds rough support to avoid "'target_mem_ref' not supported by"
> > in diagnostics.  There were recent patches by Martin to sanitize
> > dumping of MEM_REF so I'm not trying to interfere with this here.
> 
> Is that correct?
> I mean, TARGET_MEM_REF encodes more than what MEM_REF encodes,
> so printing it like MEM_REF will ignore many things from there.
> I'd say we should print it like:
> *(type *)(BASE + STEP * INDEX + INDEX2 + OFFSET)
> rather than how we print MEM_REFs as
> *(type *)(BASE + OFFSET)
> (with skipping whatever is NULL in there).
> So instead of adding case MEM_REF: in the second and last hunk
> copy and edit it (perhaps kill the probably unnecessary
> part that checks for * and prints it as foo, because who would
> create TARGET_MEM_REF when MEM_REF could have been used in that case).

See my comment above for Martins attempts to improve things.  I don't
really want to try decide what to do with those late diagnostic IL
printing but my commit was blamed for showing target-mem-ref unsupported.

I don't have much time to spend to think what to best print and what not,
but yes, printing only the MEM_REF part is certainly imprecise.

I'll leave the PR to FE folks.

Thanks,
Richard.

> > 
> > Bootstrap & regtest pending.
> > 
> > OK?
> > 
> > 2020-09-25  Richard Biener  
> > 
> > PR c++/97197
> > cp/
> > * error.c (dump_expr): Handle TARGET_MEM_REF as if it
> > were MEM_REF.
> > 
> > c-family/
> > * c-pretty-print.c (c_pretty_printer::postfix_expression):
> > Handle TARGET_MEM_REF as expression.
> > (c_pretty_printer::expression): Handle TARGET_MEM_REF as
> > unary_expression.
> > (c_pretty_printer::unary_expression): Handle TARGET_MEM_REF
> > as if it were MEM_REF.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH v4 1/3] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 8:51 AM xionghu luo  wrote:
>
> Hi,
>
> On 2020/9/24 20:39, Richard Sandiford wrote:
> > xionghu luo  writes:
> >> @@ -2658,6 +2659,43 @@ expand_vect_cond_mask_optab_fn (internal_fn, gcall 
> >> *stmt, convert_optab optab)
> >>
> >>   #define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
> >>
> >> +/* Expand VEC_SET internal functions.  */
> >> +
> >> +static void
> >> +expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> >> +{
> >> +  tree lhs = gimple_call_lhs (stmt);
> >> +  tree op0 = gimple_call_arg (stmt, 0);
> >> +  tree op1 = gimple_call_arg (stmt, 1);
> >> +  tree op2 = gimple_call_arg (stmt, 2);
> >> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >> +  rtx src = expand_expr (op0, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >
> > I'm not sure about the expand_expr here.  ISTM that op0 is a normal
> > input and so should be expanded by expand_normal rather than
> > EXPAND_WRITE.  Also:
> >
> >> +
> >> +  machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
> >> +  scalar_mode innermode = GET_MODE_INNER (outermode);
> >> +
> >> +  rtx value = expand_expr (op1, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> >> +  rtx pos = expand_expr (op2, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> >> +
> >> +  class expand_operand ops[3];
> >> +  enum insn_code icode = optab_handler (optab, outermode);
> >> +
> >> +  if (icode != CODE_FOR_nothing)
> >> +{
> >> +  pos = convert_to_mode (E_SImode, pos, 0);
> >> +
> >> +  create_fixed_operand (&ops[0], src);
> >
> > ...this would mean that if SRC happens to be a MEM, the pattern
> > must also accept a MEM.
> >
> > ISTM that we're making more work for ourselves by not “fixing” the optab
> > to have a natural pure-input + pure-output interface. :-)  But if we
> > stick with the current optab interface, I think we need to:
> >
> > - create a temporary register
> > - move SRC into the temporary register before the insn
> > - use create_fixed_operand with the temporary register for operand 0
> > - move the temporary register into TARGET after the insn
> >
> >> +  create_input_operand (&ops[1], value, innermode);
> >> +  create_input_operand (&ops[2], pos, GET_MODE (pos));
> >
> > For this I think we should use convert_operand_from on the original “pos”,
> > so that the target gets to choose what the mode of the operand is.
> >
>
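
A condensed sketch of what those four review steps amount to, reusing
the src/target/ops/value/pos names from the quoted hunk (an
illustration of the suggestion, not the updated patch itself):

  rtx temp = gen_reg_rtx (outermode);    /* 1: temporary register.  */
  emit_move_insn (temp, src);            /* 2: move SRC in before the insn.  */
  create_fixed_operand (&ops[0], temp);  /* 3: fixed operand on the temp.  */
  create_input_operand (&ops[1], value, innermode);
  /* Let the target choose the index mode.  */
  create_convert_operand_from (&ops[2], pos, TYPE_MODE (TREE_TYPE (op2)),
			       TYPE_UNSIGNED (TREE_TYPE (op2)));
  expand_insn (icode, 3, ops);
  emit_move_insn (target, temp);         /* 4: move the result to TARGET.  */
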
> Thanks a lot for the nice suggestions, fixed them all and updated the patch 
> as below.
>
>
> [PATCH v4 1/3] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR
>
> This patch enables transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to
> VEC_SET internal function in gimple-isel pass if target supports
> vec_set with variable index by checking can_vec_set_var_idx_p.
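
For illustration, a made-up input along these lines (assuming the
target implements the vec_set optab for V4SImode) produces exactly the
ARRAY_REF-of-VIEW_CONVERT_EXPR form the pass looks for:

typedef int v4si __attribute__ ((vector_size (16)));

v4si
set_elem (v4si u, int i, int x)
{
  /* Gimplified to VIEW_CONVERT_EXPR<int[4]>(u)[i] = x, which the new
     isel code rewrites into a .VEC_SET internal-function call.  */
  u[i] = x;
  return u;
}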

OK with me if Richard is happy with the updated patch.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2020-09-25  Xionghu Luo  
>
> * gimple-isel.cc (gimple_expand_vec_set_expr): New function.
> (gimple_expand_vec_cond_exprs): Rename to ...
> (gimple_expand_vec_exprs): ... this and call
> gimple_expand_vec_set_expr.
> * internal-fn.c (vec_set_direct): New define.
> (expand_vec_set_optab_fn): New function.
> (direct_vec_set_optab_supported_p): New define.
> * internal-fn.def (VEC_SET): New DEF_INTERNAL_OPTAB_FN.
> * optabs.c (can_vec_set_var_idx_p): New function.
> * optabs.h (can_vec_set_var_idx_p): New declaration.
> ---
>  gcc/gimple-isel.cc  | 75 +++--
>  gcc/internal-fn.c   | 41 +
>  gcc/internal-fn.def |  2 ++
>  gcc/optabs.c| 21 +
>  gcc/optabs.h|  4 +++
>  5 files changed, 141 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index b330cf4c20e..02513e04900 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -35,6 +35,74 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-cfg.h"
>  #include "bitmap.h"
>  #include "tree-ssa-dce.h"
> +#include "memmodel.h"
> +#include "optabs.h"
> +
> +/* Expand all ARRAY_REF(VIEW_CONVERT_EXPR) gimple assignments into calls to
> +   internal function based on vector type of selected expansion.
> +   i.e.:
> + VIEW_CONVERT_EXPR<int[4]>(u)[_1] = i_4(D);
> +   =>
> + _7 = u;
> + _8 = .VEC_SET (_7, i_4(D), _1);
> + u = _8;  */
> +
> +static gimple *
> +gimple_expand_vec_set_expr (gimple_stmt_iterator *gsi)
> +{
> +  enum tree_code code;
> +  gcall *new_stmt = NULL;
> +  gassign *ass_stmt = NULL;
> +
> +  /* Only consider code == GIMPLE_ASSIGN.  */
> +  gassign *stmt = dyn_cast <gassign *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +return NULL;
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  code = TREE_CODE (lhs);
> +  if (code != ARRAY_REF)
> +return NULL;
> +
> +  tree val = gimple_assign_rhs1 (stmt);
> +  tree op0 = TREE_OPERAND (lhs, 0);
> +  if (TREE_CODE (op0) == VIEW_CONVERT_EXPR && DECL_P (TREE_OPERAND (op0, 0))
> +  

Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-25 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 23, 2020 at 05:45:12PM +0200, Tobias Burnus wrote:
> On 9/23/20 4:06 PM, Jakub Jelinek wrote:
> 
> > What I really meant was:
> I have now done something based on this.
> > > +  gcc_assert (node->alias && node->analyzed);
> 
> I believe from previous testing that node->analyzed is 0
> for the testcase at hand — and, hence, ultimate_alias_target()

That would be surprising, because if it is not node->analyzed, then
ultimate_alias_target_1 will not change node at all.

Anyway, the patch LGTM, thanks.

Jakub



Re: [PATCH] c++/97197 - support TARGET_MEM_REF in C/C++ error pretty-printing

2020-09-25 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 25, 2020 at 01:11:37PM +0200, Richard Biener wrote:
> This adds rough support to avoid "'target_mem_ref' not supported by"
> in diagnostics.  There were recent patches by Martin to sanitize
> dumping of MEM_REF so I'm not trying to interfere with this here.

Is that correct?
I mean, TARGET_MEM_REF encodes more than what MEM_REF encodes,
so printing it like MEM_REF will ignore many things from there.
I'd say we should print it like:
*(type *)(BASE + STEP * INDEX + INDEX2 + OFFSET)
rather than how we print MEM_REFs as
*(type *)(BASE + OFFSET)
(with skipping whatever is NULL in there).
So instead of adding case MEM_REF: in the second and last hunk
copy and edit it (perhaps kill the probably unnecessary
part that checks for * and prints it as foo, because who would
create TARGET_MEM_REF when MEM_REF could have been used in that case).
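
Roughly, an untested sketch of that, using the TMR_* accessors from
tree.h and skipping operands that are NULL:

/* Print E, a TARGET_MEM_REF, as
   *(type *)(BASE + STEP * INDEX + INDEX2 + OFFSET).  */
static void
pp_target_mem_ref_sketch (c_pretty_printer *pp, tree e)
{
  pp_string (pp, "*(");
  pp->type_id (TREE_TYPE (e));
  pp_string (pp, " *)(");
  pp->expression (TMR_BASE (e));
  if (TMR_INDEX (e))
    {
      pp_string (pp, " + ");
      if (TMR_STEP (e))
	{
	  pp->expression (TMR_STEP (e));
	  pp_string (pp, " * ");
	}
      pp->expression (TMR_INDEX (e));
    }
  if (TMR_INDEX2 (e))
    {
      pp_string (pp, " + ");
      pp->expression (TMR_INDEX2 (e));
    }
  if (TMR_OFFSET (e) && !integer_zerop (TMR_OFFSET (e)))
    {
      pp_string (pp, " + ");
      pp->expression (TMR_OFFSET (e));
    }
  pp_string (pp, ")");
}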
> 
> Bootstrap & regtest pending.
> 
> OK?
> 
> 2020-09-25  Richard Biener  
> 
>   PR c++/97197
> cp/
>   * error.c (dump_expr): Handle TARGET_MEM_REF as if it
>   were MEM_REF.
> 
> c-family/
>   * c-pretty-print.c (c_pretty_printer::postfix_expression):
>   Handle TARGET_MEM_REF as expression.
>   (c_pretty_printer::expression): Handle TARGET_MEM_REF as
>   unary_expression.
>   (c_pretty_printer::unary_expression): Handle TARGET_MEM_REF
>   as if it were MEM_REF.

Jakub



RE: [PATCH] arm: Add missing Neoverse V1 feature

2020-09-25 Thread Kyrylo Tkachov


> -Original Message-
> From: Alex Coplan 
> Sent: 25 September 2020 12:18
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH] arm: Add missing Neoverse V1 feature
> 
> Hello,
> 
> This simple follow-on patch adds a missing feature (FP16) to the
> Neoverse V1 description in AArch32 GCC.
> 
> OK for master?

Ok, sorry for not catching it in the original review.
Kyrill

> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-cpus.in (neoverse-v1): Add FP16.



Re: Disable modref for ipa-pta-13.c testcase

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 1:04 PM Jan Hubicka  wrote:
>
> Hi,
> parameter tracking in ipa-modref causes a failure of the ipa-pta-13
> testcase.  In particular the check for "= x;" in fre3 is failing since
> we optimize it out in fre1.  As far as I can tell this is a correct
> transform because ipa-modref propagates the fact that the call is
> passed a pointer to y.  The comment speaks of a missed optimization,
> so I guess it is OK to disable modref here so we still test whatever
> this was testing before?
>
Hmm, I guess so.  Ideally both local and local_address_taken would
be noipa but then IPA PTA wouldn't apply either ;)  So yes, OK to
disable modref.

Richard.

> Honza
>
> gcc/testsuite/ChangeLog:
>
> 2020-09-25  Jan Hubicka  
>
> * gcc.dg/ipa/ipa-pta-13.c: Disable ipa-modref.
>
> diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c b/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
> index 93dd87107cc..e7bf6d485a4 100644
> --- a/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
> +++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
> @@ -1,5 +1,5 @@
>  /* { dg-do link } */
> -/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta2-details -fdump-tree-fre3 -fno-ipa-icf" } */
> +/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta2-details -fdump-tree-fre3 -fno-ipa-icf -fno-ipa-modref" } */
>
>  static int x, y;
>


[PATCH] arm: Add missing Neoverse V1 feature

2020-09-25 Thread Alex Coplan
Hello,

This simple follow-on patch adds a missing feature (FP16) to the
Neoverse V1 description in AArch32 GCC.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

* config/arm/arm-cpus.in (neoverse-v1): Add FP16.

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index be563b7f807..bf460ddbcaf 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1494,7 +1494,7 @@ begin cpu neoverse-v1
   cname neoversev1
   tune for cortex-a57
   tune flags LDSCHED
-  architecture armv8.4-a+bf16+i8mm
+  architecture armv8.4-a+fp16+bf16+i8mm
   option crypto add FP_ARMv8 CRYPTO
   costs cortex_a57
 end cpu neoverse-v1


Track arguments pointing to local or readonly memory in ipa-fnsummary

2020-09-25 Thread Jan Hubicka
Hi,
this patch implements tracking whether an argument points to local or
readonly memory.  This is useful for ipa-modref as well as for the
inline heuristics.  It is desirable to inline functions that dereference
pointers to local variables in order to support SRA.  We have always
applied the opposite heuristic (guessing that the dereferences will be
optimized out with 50% probability), but here we can increase the
probability for cases where we can track that the argument indeed points
to local memory (or readonly memory, which is also good).
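
As a made-up example of the situation the new flag is meant to catch:

static int
deref (const int *p)
{
  return *p + 1;  /* After inlining, SRA can eliminate *p entirely.  */
}

int
caller (void)
{
  int local = 42;
  /* points_to_local_or_readonly_memory_p holds for &local, so inlining
     deref here is more profitable than the generic 50% guess.  */
  return deref (&local);
}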

Bootstrapped/regtested x86_64-linux.  I plan to commit it later today unless
there are comments.

Honza

* ipa-fnsummary.c (dump_ipa_call_summary): Dump
points_to_local_or_readonly_memory flag.
(analyze_function_body): Compute points_to_local_or_readonly_memory
flag.
(remap_edge_change_prob): Rename to ...
(remap_edge_params): ... this one; update
points_to_local_or_readonly_memory.
(remap_edge_summaries): Update.
(read_ipa_call_summary): Stream the new flag.
(write_ipa_call_summary): Likewise.
* ipa-predicate.h (struct inline_param_summary): Add
points_to_local_or_readonly_memory.
(inline_param_summary::equal_to): Update.
(inline_param_summary::useless_p): Update.
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bb703f62206..7f12b116dec 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -980,6 +980,9 @@ dump_ipa_call_summary (FILE *f, int indent, struct cgraph_node *node,
else if (prob != REG_BR_PROB_BASE)
  fprintf (f, "%*s op%i change %f%% of time\n", indent + 2, "", i,
   prob * 100.0 / REG_BR_PROB_BASE);
+   if (es->param[i].points_to_local_or_readonly_memory)
+ fprintf (f, "%*s op%i points to local or readonly memory\n",
+  indent + 2, "", i);
  }
   if (!edge->inline_failed)
{
@@ -2671,6 +2674,9 @@ analyze_function_body (struct cgraph_node *node, bool early)
  int prob = param_change_prob (&fbi, stmt, i);
  gcc_assert (prob >= 0 && prob <= REG_BR_PROB_BASE);
  es->param[i].change_prob = prob;
+ es->param[i].points_to_local_or_readonly_memory
+= points_to_local_or_readonly_memory_p
+(gimple_call_arg (stmt, i));
}
}
 
@@ -3783,15 +3789,17 @@ inline_update_callee_summaries (struct cgraph_node *node, int depth)
 ipa_call_summaries->get (e)->loop_depth += depth;
 }
 
-/* Update change_prob of EDGE after INLINED_EDGE has been inlined.
+/* Update change_prob and points_to_local_or_readonly_memory of EDGE after
+   INLINED_EDGE has been inlined.
+
When function A is inlined in B and A calls C with parameter that
changes with probability PROB1 and C is known to be passthrough
of argument if B that change with probability PROB2, the probability
of change is now PROB1*PROB2.  */
 
 static void
-remap_edge_change_prob (struct cgraph_edge *inlined_edge,
-   struct cgraph_edge *edge)
+remap_edge_params (struct cgraph_edge *inlined_edge,
+  struct cgraph_edge *edge)
 {
   if (ipa_node_params_sum)
 {
@@ -3825,7 +3833,16 @@ remap_edge_change_prob (struct cgraph_edge *inlined_edge,
prob = 1;
 
  es->param[i].change_prob = prob;
+
+ if (inlined_es
+   ->param[id].points_to_local_or_readonly_memory)
+   es->param[i].points_to_local_or_readonly_memory = true;
}
+ if (!es->param[i].points_to_local_or_readonly_memory
+ && jfunc->type == IPA_JF_CONST
+ && points_to_local_or_readonly_memory_p
+(ipa_get_jf_constant (jfunc)))
+   es->param[i].points_to_local_or_readonly_memory = true;
}
}
 }
@@ -3858,7 +3875,7 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
   if (e->inline_failed)
{
   class ipa_call_summary *es = ipa_call_summaries->get (e);
- remap_edge_change_prob (inlined_edge, e);
+ remap_edge_params (inlined_edge, e);
 
  if (es->predicate)
{
@@ -3884,7 +3901,7 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
   predicate p;
   next = e->next_callee;
 
-  remap_edge_change_prob (inlined_edge, e);
+  remap_edge_params (inlined_edge, e);
   if (es->predicate)
{
  p = es->predicate->remap_after_inlining
@@ -4210,12 +4227,19 @@ read_ipa_call_summary (class lto_input_block *ib, struct cgraph_edge *e,
 {
   es->param.safe_grow_cleared (length, true);
   for (i = 0; i < length; i++)
-   es->param[i].change_prob = streamer_read_uhwi (ib);
+   {
+ es->param[i].change_prob = streamer_read_uhwi (ib);
+ 
