Re: [PATCH] PowerPC Fix ibm128 defaults for pr70117.c test.

2020-11-19 Thread Michael Meissner via Gcc-patches
On Wed, Nov 18, 2020 at 04:29:09PM -0600, Segher Boessenkool wrote:
> On Wed, Nov 18, 2020 at 10:53:49PM +0100, Jakub Jelinek wrote:
> > On Wed, Nov 18, 2020 at 03:43:20PM -0600, Segher Boessenkool wrote:
> > > Hi!
> > > 
> > > On Sun, Nov 15, 2020 at 12:17:47PM -0500, Michael Meissner wrote:
> > > > --- a/gcc/testsuite/gcc.target/powerpc/pr70117.c
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/pr70117.c
> > > > @@ -9,9 +9,11 @@
> > > > 128-bit floating point, because the type is not enabled on those
> > > > systems.  */
> > > >  #define LDOUBLE __ibm128
> > > > +#define IBM128_MAX ((__ibm128) 
> > > > 1.79769313486231580793728971405301199e+308L)
> > > 
> > > This is the IEEE QP float number 43fef780 which
> > > I very much doubt is the maximum finite double-double?  See the 0 in the
> > 
> > Numbers without the 0 in the middle-end aren't valid, see
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95450#c6
> > for more details.  Without the 0 in the middle the double double number
> > rounded to double would require increasing the higher double, and as it is
> > the largest representable finite number, that is not possible.
> 
> Ah, in that way.  Tricky.
> 
> Mike, please add a comment, what number it represents?  Okay for trunk
> with that, thanks.
> 
> (Should those not be defined in some header though?)

When long double is IBM extended double, LDBL_MAX etc. are provided by float.h
(from the __ versions created by the compiler).  We don't have min/max defined
for the funky MD-only floating point types.  In fact, I got the number by
printing LDBL_MAX and just pasting that in.
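
As an illustration of the two points above (not part of the patch), here is a minimal sketch. `format_ldbl_max` is a hypothetical helper for the "print LDBL_MAX and paste it" step: on a target where long double is IBM extended double it should print the constant used for IBM128_MAX, while other targets print their own maximum. `rounds_back_to_dbl_max` models the "0 in the middle" using plain doubles, assuming IEEE binary64 and round-to-nearest: with DBL_MAX as the high half, the low half has to stay below half an ulp of DBL_MAX (2^970), because at exactly half an ulp the sum would round up past the largest finite double.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: format LDBL_MAX with 36 significant digits, the
// same precision as the constant in the patch.  Returns snprintf's result.
int format_ldbl_max (char *buf, std::size_t size)
{
  assert (buf != nullptr && size > 0);
  return std::snprintf (buf, size, "%.35Le", LDBL_MAX);
}

// Half an ulp of DBL_MAX.  DBL_MAX is (2 - 2^-52) * 2^1023, so its ulp
// is 2^971 and half of that is 2^970 (assumes IEEE binary64).
double half_ulp_of_dbl_max ()
{
  return std::ldexp (1.0, 970);
}

// Treat (DBL_MAX, lo) as a candidate double-double.  The pair is only
// usable if the sum, rounded to double, is still DBL_MAX.  The volatile
// forces the sum to be rounded to double precision.
bool rounds_back_to_dbl_max (double lo)
{
  volatile double sum = DBL_MAX + lo;
  return sum == DBL_MAX;
}
```

Under those assumptions, a low half of exactly 2^970 is rejected, while the next double below it is accepted; that gap is the zero bit between the two halves.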

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] PowerPC: Restrict long double test to use IBM long double.

2020-11-19 Thread Michael Meissner via Gcc-patches
On Wed, Nov 18, 2020 at 01:27:12PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Sun, Nov 15, 2020 at 12:23:50PM -0500, Michael Meissner wrote:
> > --- a/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
> > +++ b/gcc/testsuite/c-c++-common/dfp/convert-bfp-11.c
> > @@ -1,4 +1,5 @@
> >  /* { dg-skip-if "" { ! "powerpc*-*-linux*" } } */
> > +/* { dg-require-effective-target ppc_long_double_ibm } */
> 
> You can remove the dg-skip-if then (there is nothing in this test that
> requires Linux).  But you want a
> /* { dg-require-effective-target dfp } */
> (or dfprt).
> 
> So what happens if you use IEEE QP float, instead?  You didn't explain.
> (Explain in the source code, with a comment where you require it!)

The test fails because QP doesn't produce the exact value that is looked for.
Given all of the comments in the source explicitly say it is testing IBM
extended double, I just decided we could not support the test with IEEE
128-bit.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] i386: Disable *<absneg:code><mode>2_i387_1 for TARGET_SSE_MATH modes [PR97887]

2020-11-19 Thread Uros Bizjak via Gcc-patches
This pattern interferes with *<absneg:code><mode>2_1 when TARGET_SSE_MATH
modes are active. Combine pass is able to remove (use) RTXes and transforms
*<absneg:code><mode>2_1 to *<absneg:code><mode>2_i387_1 where SSE
alternatives are not available.

2020-11-19  Uroš Bizjak  

gcc/
PR target/97887
* config/i386/i386.md (*<absneg:code><mode>2_i387_1):
Disable for TARGET_SSE_MATH modes.

gcc/testsuite/
PR target/97887
* gcc.target/i386/pr97887.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to mainline and gcc-10.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 52e306de00a..29935014772 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10258,7 +10258,7 @@
(absneg:X87MODEF
  (match_operand:X87MODEF 1 "register_operand" "0,0")))
(clobber (reg:CC FLAGS_REG))]
-  "TARGET_80387"
+  "TARGET_80387 && !(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
   "#")
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/pr97887.c 
b/gcc/testsuite/gcc.target/i386/pr97887.c
new file mode 100644
index 000..b457f054bed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97887.c
@@ -0,0 +1,15 @@
+/* PR target/97887 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse" } */
+
+float f (float a)
+{
+  return -a / a;
+}
+
+double d (double a)
+{
+  return -a / a;
+}
+
+/* { dg-final { scan-assembler-not "fchs" } } */


[PATCH] tree-optimization/97897 - complex lowering on abnormal edges

2020-11-19 Thread Richard Biener
This fixes complex lowering to not put constants into abnormal
edge PHI values by making sure abnormally used SSA names are
VARYING in its propagation lattice.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.

2020-11-19  Richard Biener  

PR tree-optimization/97897
* tree-complex.c (complex_propagate::visit_stmt): Make sure
abnormally used SSA names are VARYING.
(complex_propagate::visit_phi): Likewise.
* tree-ssa.c (verify_phi_args): Verify PHI arguments on abnormal
edges are SSA names.

* gcc.dg/pr97897.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr97897.c | 13 +
 gcc/tree-complex.c |  5 -
 gcc/tree-ssa.c |  6 ++
 3 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr97897.c

diff --git a/gcc/testsuite/gcc.dg/pr97897.c b/gcc/testsuite/gcc.dg/pr97897.c
new file mode 100644
index 000..775f34ca767
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97897.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+void h ();
+void f () __attribute__ ((returns_twice));
+void g (_Complex int a)
+{
+  f ();
+  if (a != 0)
+  {
+a = 0;
+h ();
+  }
+}
diff --git a/gcc/tree-complex.c b/gcc/tree-complex.c
index f132e0f41c7..1cfb3e8d743 100644
--- a/gcc/tree-complex.c
+++ b/gcc/tree-complex.c
@@ -318,7 +318,7 @@ complex_propagate::visit_stmt (gimple *stmt, edge 
*taken_edge_p ATTRIBUTE_UNUSED
 
   lhs = gimple_get_lhs (stmt);
   /* Skip anything but GIMPLE_ASSIGN and GIMPLE_CALL with a lhs.  */
-  if (!lhs)
+  if (!lhs || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
 return SSA_PROP_VARYING;
 
   /* These conditions should be satisfied due to the initial filter
@@ -417,6 +417,9 @@ complex_propagate::visit_phi (gphi *phi)
  set up in init_dont_simulate_again.  */
   gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE);
 
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+return SSA_PROP_VARYING;
+
   /* We've set up the lattice values such that IOR neatly models PHI meet.  */
   new_l = UNINITIALIZED;
   for (i = gimple_phi_num_args (phi) - 1; i >= 0; --i)
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index c47b963bbb2..b44361f8244 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -987,6 +987,12 @@ verify_phi_args (gphi *phi, basic_block bb, basic_block 
*definition_block)
  err = true;
}
 
+  if ((e->flags & EDGE_ABNORMAL) && TREE_CODE (op) != SSA_NAME)
+   {
+ error ("PHI argument on abnormal edge is not SSA_NAME");
+ err = true;
+   }
+
   if (TREE_CODE (op) == SSA_NAME)
{
  err = verify_ssa_name (op, virtual_operand_p (gimple_phi_result 
(phi)));
-- 
2.26.2


Re: [Patch] varasm.c: Always output flags in merged .section for LLVM assembler compatibility [PR97827]

2020-11-19 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches  writes:
> On Wed, Nov 18, 2020 at 12:51:02PM +0100, Tobias Burnus wrote:
>> As noted by Matthias when bootstrapping with AMD GCN support [PR97827]:
>> Assembler source code generated by GCC might no longer assemble with
>> LLVM's 'mc' since LLVM 11.
>> 
>> The reason is that GCC generates on purpose first the section with
>> the flags, e.g. (via mergeable_constant_section)
>>    .section .rodata.cst8,"aM",@progbits,8
>> and then for subsequent uses, it does not repeat the flags:
>>    .section .rodata.cst8
>> 
>> GNU assembler warns (and with as >=2.35 gives an error) if the flags
>> do not match, but not if the attributes/flags are left out in the other
>> same-named sections (as above) – just if they are specified and different.
>> 
>> LLVM since February (in git) and released with LLVM 11 (12 Oct 2020)
>> does a similar check – but without the no-error-if-no-flag exception:
>>   strtod.s:4472:2: error: changed section flags for .rodata.cst8, expected: 0x12
>>   strtod.s:4472:2: error: changed section entsize for .rodata.cst8, expected: 8
>> 
>> 
>> The solution done by the attached patch is to emit the full flags also
>> for SECTION_MERGE.
>> 
>> Side note: For AMD GCN, we rely on LLVM as "GNU as" does not handle
>> this target, yet; still, also in general, it makes sense to be
>> compatible with llvm-mc.
>> 
>> OK?
>
> I think we shouldn't do this except when targeting the (buggy) llvm
> assembler.
> Specifying section flags just on first .section directive and not others
> is correct, there is no point repeating that and GNU as (but I think many
> other assemblers) has been supporting it that way forever.

But are there any negative effects with specifying the flags multiple
times for GNU as?  If not, then it seems simpler to generate the form
that “all” assemblers accept.

Richard


Re: [Patch] varasm.c: Always output flags in merged .section for LLVM assembler compatibility [PR97827]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 19, 2020 at 09:51:41AM +, Richard Sandiford wrote:
> > I think we shouldn't do this except when targeting the (buggy) llvm
> > assembler.
> > Specifying section flags just on first .section directive and not others
> > is correct, there is no point repeating that and GNU as (but I think many
> > other assemblers) has been supporting it that way forever.
> 
> But are there any negative effects with specifying the flags multiple
> times for GNU as?  If not, then it seems simpler to generate the form
> that “all” assemblers accept.

It makes the assembler files unnecessarily larger and harder to read.

Jakub



Re: [Patch] varasm.c: Always output flags in merged .section for LLVM assembler compatibility [PR97827]

2020-11-19 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches  writes:
> On Thu, Nov 19, 2020 at 09:51:41AM +, Richard Sandiford wrote:
>> > I think we shouldn't do this except when targeting the (buggy) llvm
>> > assembler.
>> > Specifying section flags just on first .section directive and not others
>> > is correct, there is no point repeating that and GNU as (but I think many
>> > other assemblers) has been supporting it that way forever.
>> 
>> But are there any negative effects with specifying the flags multiple
>> times for GNU as?  If not, then it seems simpler to generate the form
>> that “all” assemblers accept.
>
> It makes the assembler files unnecessarily larger and harder to read.

It certainly makes them larger, but surely not by an amount that would
cause anyone difficulty in practice.  Not sure about harder to read
though.  Personally I've never found code that follows the first switch
to a section any harder to read than code that follows subsequent
switches to the section.

The problem with switching based on what assembler we think people
are using is that we don't always know.  It can often be useful to
try something with a different tool for comparison purposes, and that
includes using llvm-mc instead of gas.  (Certainly done that myself
a few times, and I know others have.)

I'm not saying we should bend over backwards to support difficult
quirks.  But here we're talking about a choice between (a) doing
something that works “everywhere” unconditionally (and keeping things
simple) vs. (b) having both code that takes a shortcut and code that
doesn't take a shortcut and trying to predict which one we should do.
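
To make option (a) concrete, here is a minimal sketch of "always spell out the flags". `merge_section_directive` is a hypothetical helper, not the varasm.c code; it only assumes the directive layout quoted earlier in the thread for .rodata.cst8.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the .section directive for a mergeable
// section, repeating flags, type and entity size on every switch so that
// assemblers which compare the attributes see the same text each time.
std::string merge_section_directive (const std::string &name,
                                     const std::string &flags,
                                     unsigned entsize)
{
  assert (entsize > 0);   // a merge section needs a non-zero entity size
  return "\t.section\t" + name + ",\"" + flags + "\",@progbits,"
         + std::to_string (entsize);
}
```

Calling it twice for the same section yields the same line both times, which is the form that gas and llvm-mc are both reported to accept in this thread.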

Thanks,
Richard


Re: Improve handling of memory operands in ipa-icf 4/4

2020-11-19 Thread Jan Hubicka
> On 11/16/20 12:20 AM, Jan Hubicka wrote:
> > This is controlled by -fipa-icf-alias-sets
> > 
> > The patch drops too early, so we may end up processing a function twice.
> > Also, if merging is not performed we lose code quality for no win (this is
> > a rare case).
> > My original plan was to remember the mismatched parameters and apply them
> > only after merging decisions are finished, but I was not sure how to do
> > that in ipa-icf.  In particular we need to ensure transitivity: if
> > function foo is merged to bar, we also need to be sure that we dropped
> > base alias sets in functions that are called by bar even if they themselves
> > are not merged. Martin, is there an easy way to implement this on top of
> > current ICF?
> 
> Well, you will need to create a set of merged functions and then traverse all
> callers of these (via cgraph_node callers). It should not be so difficult, or?

Well, imagine you have functions A1 and A2
and calls A1->B2
and       A2->B3
and there is also B1.

Now A1 is ICF equivalent to A2
and also B1,B2,B3 are ICF equivalent if some TBAA info is dropped.

ICF merges A2 to A1.
It also considers merging B2,B3 to B1 but concludes at the very end that
it is not beneficial (because some of them have their address taken and
constructing a wrapper is too expensive).

The comparisons done are
 B1:B2
 B1:B3
 A1:A2
So after comparing we have info what to drop in B1 to make merging B2->B1
and B3->B1 valid.  We also have info what to drop in B2 to make B1->B2
valid and in B3 to make B1->B3 valid.

But we need to drop info in B2 to make B3->B2 valid to make call path
alias A2 of A1->B2 safe.
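
A toy model of the example may help; it is only a sketch and does not use any ipa-icf data structures. It assumes each function has at most one callee, and all names (`Func`, `NoCallee`, `extra_callee_pairs`) are made up for illustration. For every merge "from -> into" it reports the callee pair that becomes substituted as a side effect, which is exactly the pair that was never compared directly.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Functions from the example above.
enum Func { A1, A2, B1, B2, B3, NumFuncs };
constexpr int NoCallee = -1;

// callee[f] is the single function called by f, or NoCallee.
// merges holds (from, into) pairs: the body of 'from' is replaced by 'into'.
// Returns (old_callee, new_callee) pairs: callers of 'from' used to reach
// old_callee and now reach new_callee, so new_callee must be a valid
// replacement for old_callee even if those two were never merged.
std::vector<std::pair<int, int>>
extra_callee_pairs (const std::vector<int> &callee,
                    const std::vector<std::pair<int, int>> &merges)
{
  assert (callee.size () == static_cast<std::size_t> (NumFuncs));
  std::vector<std::pair<int, int>> out;
  for (const auto &m : merges)
    {
      int old_callee = callee[m.first];
      int new_callee = callee[m.second];
      if (old_callee != NoCallee && new_callee != NoCallee
          && old_callee != new_callee)
        out.emplace_back (old_callee, new_callee);
    }
  return out;
}
```

With A1->B2, A2->B3 and the single merge A2 -> A1, the only reported pair is (B3, B2), while the comparisons actually performed were B1:B2, B1:B3 and A1:A2.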

Honza


Re: [AArch64] Add --with-tune configure flag

2020-11-19 Thread Richard Earnshaw (lists) via Gcc-patches
On 18/11/2020 17:16, Pop, Sebastian via Gcc-patches wrote:
> Hi,
> 
> On 11/18/20, 10:17 AM, "Wilco Dijkstra"  wrote:
>>I presume you're trying to unify the --with- options across most targets?
> 
> Yes, my intention was to provide the same configure options on arm64
> as on x86, such that projects that already use those options can change
> cpu name to "neoverse-n1" and that will build a compiler with the right
> tuning for Graviton2.
> 
> Allowing arm64 users to specify all the flags available on x86 is important.
> 
> >>That would be very useful! However there are significant differences
> >>between targets in how they interpret options like --with-arch=native
> >>(or -march). So those differences also need to be looked at and fixed
> >>to avoid unexpected results.
> >>
> >>As for the first patch, I think support for --with-tune requires more
> >>changes. Without proper processing of a --with-tune, you get an
> >>incorrect architecture version (if say the CPU you tune for is newer
> >>than the --with-cpu/arch or default).
> >>
>>   I posted patches to add --with-tune and fix various issues a while back:
>>https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553865.html
>>https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553866.html
> 
> Thanks for pointing me to your patches, I was not aware of these changes.
> I see that your patches enable more use cases and fix several bugs.
> These changes would definitely be good to have in trunk and branches.
> 
> My patch was the minimal change to enable --with-tune=neoverse-n1
> 
>>As for your second patch, --with-cpu-64 could be a simple alias indeed,
>>but what is the exact definition/expected behaviour of a --with-cpu-32
>>on a target that only supports 64-bit code? The AArch64 target cannot
>>generate AArch32 code, so we shouldn't silently accept it.
> 
> IMO allowing users to specify all the flags available on x86 is important.
> 

This isn't about general users though; it's about how you configure the
compiler and that's not all the same.  I don't mind the --with-cpu-64 as
a strict alias for --with-cpu, but --with-cpu-32 is both redundant and
misleading as it might give the impression that it does something useful.

R.

> Thanks,
> Sebastian
> 



Re: Improve handling of memory operands in ipa-icf 3/4

2020-11-19 Thread Martin Liška

On 11/18/20 3:50 PM, Jan Hubicka wrote:

On 11/13/20 6:50 PM, Jan Hubicka wrote:

Bootstrapped/regtested x86_64-linux. I plan to commit it on Monday if there are
no complaints.


Hello Honza.

Thank you very much for the patch set.
It's a nice improvement and it will eventually fix the WPA slowness caused by
IPA ICF.

I made some measurements for master before the first patch and with this patch
(3/4) on the godot game engine:

BEFORE:

Equal symbols: 15690
Totally needed symbols: 17913, fraction of loaded symbols: 39.05%

2156989   false returned: '' in equals_private at 
/home/marxin/Programming/gcc2/gcc/ipa-icf.c:879
1099887   false returned: 'operand_equal_p failed' in compare_operand at 
/home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:307
1048605   false returned: 'types are not compatible' in compatible_types_p at 
/home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:210
1047679   false returned: 'GIMPLE assignment operands are different' in 
compare_gimple_assign at /home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:632
1047517   false returned: 'GIMPLE NOP LHS type mismatch' in 
compare_gimple_assign at /home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:628
   57659   false returned: 'call function types are not compatible' in 
compare_gimple_call at /home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:573
   52088   false returned: 'PHI node comparison returns false' in 
equals_private at /home/marxin/Programming/gcc2/gcc/ipa-icf.c:914
   52088   false returned: '' in compare_phi_node at 
/home/marxin/Programming/gcc2/gcc/ipa-icf.c:1552
   13565   false returned: 'decl_or_type flags are different' in equals_wpa at 
/home/marxin/Programming/gcc2/gcc/ipa-icf.c:567
9919   false returned: 'result types are different' in equals_wpa at 
/home/marxin/Programming/gcc2/gcc/ipa-icf.c:616

Time variable          usr            sys           wall            GGC
  ipa icf   :   4.31 (  7%)    0.06 (  2%)    4.38 (  7%)   6008k (  0%)
  TOTAL     :  57.57           3.49          61.11          4830M

AFTER:

Equal symbols: 17019
Totally needed symbols: 19875, fraction of loaded symbols: 70.88%

  377327   false returned: '' in equals_private at 
/home/marxin/Programming/gcc2/gcc/ipa-icf.c:886
  213086   false returned: 'operand_equal_p failed' in compare_operand at 
/home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:356
  212179   false returned: 'compare_ao_refs failed (access path difference)' in 
compare_operand at /home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:345
  159947   false returned: '' in compare_gimple_call at 
/home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:607
  147098   false returned: 'GIMPLE assignment operands are different' in 
compare_gimple_assign at /home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:699
   66123   false returned: 'GIMPLE call operands are different' in 
compare_gimple_call at /home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:656
   52088   false returned: 'PHI node comparison returns false' in 
equals_private at /home/marxin/Programming/gcc2/gcc/ipa-icf.c:921
   52088   false returned: '' in compare_phi_node at 
/home/marxin/Programming/gcc2/gcc/ipa-icf.c:1580
   12643   false returned: 'decl_or_type flags are different' in equals_wpa at 
/home/marxin/Programming/gcc2/gcc/ipa-icf.c:572
6318   false returned: 'different tree types' in compatible_types_p at 
/home/marxin/Programming/gcc2/gcc/ipa-icf-gimple.c:206

Time variable          usr            sys           wall            GGC
  ipa icf   :   3.40 (  6%)    0.09 (  3%)    3.49 (  6%)     27M (  1%)
  TOTAL     :  56.60           2.94          59.58          4478M

and I'm also sending usage-wrapper graphs.


Thanks for checking!  I also uploaded some data to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92535


Nice!



Note that you want to also note gimple in timevar since that is also
mostly ICF related.


I know about that, good reminder.



It seems that ICF performance is highly sensitive to the application: it now
behaves very well on cc1plus, seems to do quite well on godot and still
does very badly on Firefox (we still have a regression there compared to gcc
9, which itself did relatively badly).

I noticed one stupid bug in operand_equal_p on component_refs (I am just
testing a fix) and there are quite a few important things that we compare
but do not hash. Those should be easy to fix. I plan to iterate through
this on firefox.

It would be great to get chromium data.  Did you succeed in building it
recently?


It has LTO enabled, you can build it:
https://build.opensuse.org/package/show/openSUSE:Factory/chromium


I now got last year's firefox building and working and I am
looking into updating it to the current firefox tree, which will probably keep
me occupied for some time.


You can probably also test our package:
https://build.opensuse.org/package/show/openSUSE:Factory/MozillaFirefox

M

[PATCH] tree-optimization/97901 - ICE propagating out LC PHIs

2020-11-19 Thread Richard Biener
We need to fold the stmt to canonicalize MEM_REFs which means
we're back to using replace_uses_by.  Which means we need dominators
to not require a CFG cleanup upthread.

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-11-19  Richard Biener  

PR tree-optimization/97901
* tree-ssa-propagate.c (clean_up_loop_closed_phi): Compute
dominators and use replace_uses_by.

* gcc.dg/torture/pr97901.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr97901.c | 15 +++
 gcc/tree-ssa-propagate.c   | 22 +-
 2 files changed, 20 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr97901.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr97901.c 
b/gcc/testsuite/gcc.dg/torture/pr97901.c
new file mode 100644
index 000..a6a89ef1e27
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr97901.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int a[1], b, *c, *d;
+
+int main() {
+L:
+  d = c;
+  for (b = 0; b < 2; b++)
+d = &a[0];
+  if (c)
+goto L;
+  if (*d)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c
index 354057b48bf..bc656ff76b1 100644
--- a/gcc/tree-ssa-propagate.c
+++ b/gcc/tree-ssa-propagate.c
@@ -1569,6 +1569,10 @@ clean_up_loop_closed_phi (function *fun)
   if (!loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
 return 0;
 
+  /* replace_uses_by might purge dead EH edges and we want it to also
+ remove dominated blocks.  */
+  calculate_dominance_info  (CDI_DOMINATORS);
+
   /* Walk over loop in function.  */
   FOR_EACH_LOOP_FN (fun, loop, 0)
 {
@@ -1595,23 +1599,7 @@ clean_up_loop_closed_phi (function *fun)
  fprintf (dump_file, "'\n");
}
 
- use_operand_p use_p;
- imm_use_iterator iter;
- gimple *use_stmt;
- FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
-   {
- FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
-   replace_exp (use_p, rhs);
- update_stmt (use_stmt);
-
- /* Update the invariant flag for ADDR_EXPR if replacing
-a variable index with a constant.  */
- if (gimple_assign_single_p (use_stmt)
- && TREE_CODE (gimple_assign_rhs1 (use_stmt))
-  == ADDR_EXPR)
-   recompute_tree_invariant_for_addr_expr (
- gimple_assign_rhs1 (use_stmt));
-   }
+ replace_uses_by (lhs, rhs);
  remove_phi_node (&gsi, true);
}
  else
-- 
2.26.2


Re: [PATCH] [PR target/97194] [AVX2] Support variable index vec_set.

2020-11-19 Thread Richard Sandiford via Gcc-patches
Sorry for the late reply.  I somehow managed to miss this thread until now
despite being cc:ed.

> > I'm not sure what best to do here, as said accepting "any" (integer) mode as
> > input is desirable (SImode, DImode but eventually also smaller modes).  How
> > that can be best achieved I don't know.
>
> I was expecting something similar to how extvM/extzvM operands are
> handled here. We have:
>
> Operands 0 and 1 both have mode M.  Operands 2 and 3 have a
> target-specific mode.
>
> Please note operands 2 and 3 having a "target-specific" mode, handled
> in optabs-query.c as:

>   machine_mode struct_mode = data->operand[struct_op].mode;
>   if (struct_mode == VOIDmode)
> struct_mode = word_mode;
>   if (mode != struct_mode)
> return false;
>
> > Why's not specifying any mode in the patter no good?  Just make sure you
> > appropriately extend/subreg it?  We can make sure it will be an integer
> > mode in the expander itself.
>
> IIRC, having known mode, expanders can use create_convert_operand_to,
> and the middle-end will do the above by itself. Also note that at
> least two targets specify SImode, so register operands are currently
> ineffective there.

Yeah, I agree create_convert_operand_to is the right way to handle
this kind of situation.

Uros Bizjak via Gcc-patches  writes:
> On Thu, Nov 12, 2020 at 2:59 PM Richard Biener
>  wrote:
>
>> I'm not sure what best to do here, as said accepting "any" (integer) mode as
>> input is desirable (SImode, DImode but eventually also smaller modes).  How
>> that can be best achieved I don't know.
>
> FTR, attached patch *should* allow s390 and amdgcn to emit vec_set
> with SImode variable index operand, but I was not able to test the
> patch by myself.

LGTM FWIW.  I think this is preferable to a modeless operand,
which IMO should always be a last resort.

Thanks,
Richard

>
> Uros.
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 1820b91877a..02ba599c373 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -3863,12 +3863,17 @@ can_vec_set_var_idx_p (machine_mode vec_mode)
>  return false;
>  
>machine_mode inner_mode = GET_MODE_INNER (vec_mode);
> +
>rtx reg1 = alloca_raw_REG (vec_mode, LAST_VIRTUAL_REGISTER + 1);
>rtx reg2 = alloca_raw_REG (inner_mode, LAST_VIRTUAL_REGISTER + 2);
> -  rtx reg3 = alloca_raw_REG (VOIDmode, LAST_VIRTUAL_REGISTER + 3);
>  
>enum insn_code icode = optab_handler (vec_set_optab, vec_mode);
>  
> +  const struct insn_data_d *data = &insn_data[icode];
> +  machine_mode idx_mode = data->operand[2].mode;
> +
> +  rtx reg3 = alloca_raw_REG (idx_mode, LAST_VIRTUAL_REGISTER + 3);
> +
>return icode != CODE_FOR_nothing && insn_operand_matches (icode, 0, reg1)
>&& insn_operand_matches (icode, 1, reg2)
>&& insn_operand_matches (icode, 2, reg3);


[PATCH] libsanitizer: fix SIGSEGV in fopen64 interceptor

2020-11-19 Thread Slava Barinov via Gcc-patches
Null pointer in path argument leads to SIGSEGV in interceptor.

libsanitizer/ChangeLog:
* sanitizer_common/sanitizer_common_interceptors.inc: Check
path for null before dereference in fopen64 interceptor.
---

Notes:
Apparently check has been lost during merge from upstream

 libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc 
b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
index 729eead43c0..2ef23d9a50b 100644
--- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
+++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
@@ -6081,7 +6081,7 @@ INTERCEPTOR(__sanitizer_FILE *, freopen, const char 
*path, const char *mode,
 INTERCEPTOR(__sanitizer_FILE *, fopen64, const char *path, const char *mode) {
   void *ctx;
   COMMON_INTERCEPTOR_ENTER(ctx, fopen64, path, mode);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
+  if (path) COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
   COMMON_INTERCEPTOR_READ_RANGE(ctx, mode, REAL(strlen)(mode) + 1);
   __sanitizer_FILE *res = REAL(fopen64)(path, mode);
   COMMON_INTERCEPTOR_FILE_OPEN(ctx, res, path);
-- 
2.29.2
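
The guard restored by the one-line change can be sketched in isolation as follows. This is only an illustration: `readable_bytes` and `checked_bytes_for_fopen` are hypothetical names, and the real interceptor passes these lengths to COMMON_INTERCEPTOR_READ_RANGE rather than returning them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of bytes a checker may read from a C string, including the
// terminating NUL.  A null pointer contributes nothing instead of being
// passed to strlen, which is what crashed before the fix.
std::size_t readable_bytes (const char *s)
{
  return s ? std::strlen (s) + 1 : 0;
}

// Mirrors the shape of the patched interceptor: path is optional, while
// mode is still dereferenced unconditionally.
std::size_t checked_bytes_for_fopen (const char *path, const char *mode)
{
  assert (mode != nullptr);
  return readable_bytes (path) + std::strlen (mode) + 1;
}
```

With a null path only the mode string is counted; with both present, both ranges are counted.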



Re: [PATCH] Remove lambdas from _Rb_tree

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 19/11/20 07:46 +0100, François Dumont via Libstdc++ wrote:

On 18/11/20 12:50 am, Jonathan Wakely wrote:

On 17/11/20 21:51 +0100, François Dumont via Libstdc++ wrote:
This is a change that has been done to _Hashtable and that I 
forgot to propose for _Rb_tree.


The _GLIBCXX_XREF macro can be easily removed of course.

    libstdc++: _Rb_tree code cleanup, remove lambdas.

    Use an additional template parameter on the clone method to propagate
    whether the values must be copied or moved, rather than lambdas.

    libstdc++-v3/ChangeLog:

            * include/bits/move.h (_GLIBCXX_XREF): New.
            * include/bits/stl_tree.h: Adapt to use latter.
            (_Rb_tree<>::_S_fwd_value_for): New.
            (_Rb_tree<>::_M_clone_node): Add _Tree template parameter.
            Use _S_fwd_value_for.
            (_Rb_tree<>::_M_cbegin): New.
            (_Rb_tree<>::_M_begin): Use latter.
            (_Rb_tree<>::_M_copy): Add _Tree template parameter.
            (_Rb_tree<>::_M_move_data): Use rvalue reference for _Rb_tree parameter.
            (_Rb_tree<>::_M_move_assign): Likewise.

Tested under Linux x86_64.

Ok to commit ?


GCC is in stage 3 now, so this should have been posted last week
really.


Ok, no problem, it can wait.

Still, following your advice, here is what I came up with; much simpler
indeed.


Yes, this simpler patch looks promising even though it's stage 3.


I have only run a few tests for the moment, but so far so good.

Thanks




diff --git a/libstdc++-v3/include/bits/move.h 
b/libstdc++-v3/include/bits/move.h

index 5a4dbdc823c..e0d68ca9108 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -158,9 +158,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  /// @} group utilities

+#define _GLIBCXX_XREF(_Tp) _Tp&&


I think this does improve the code that uses this. But the correct
name for this is forwarding reference, so I think FWDREF would be
better than XREF. XREF doesn't tell me anything about what it's for.


#define _GLIBCXX_MOVE(__val) std::move(__val)
#define _GLIBCXX_FORWARD(_Tp, __val) std::forward<_Tp>(__val)
#else
+#define _GLIBCXX_XREF(_Tp) const _Tp&
#define _GLIBCXX_MOVE(__val) (__val)
#define _GLIBCXX_FORWARD(_Tp, __val) (__val)
#endif
diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h

index ec141ea01c7..128c7e2c892 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -478,11 +478,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

    template<typename _Arg>
      _Link_type
-#if __cplusplus < 201103L
-      operator()(const _Arg& __arg)
-#else
-      operator()(_Arg&& __arg)
-#endif
+      operator()(_GLIBCXX_XREF(_Arg) __arg)
      {
        _Link_type __node = static_cast<_Link_type>(_M_extract());
        if (__node)
@@ -544,11 +540,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

    template<typename _Arg>
      _Link_type
-#if __cplusplus < 201103L
-      operator()(const _Arg& __arg) const
-#else
-      operator()(_Arg&& __arg) const
-#endif
+      operator()(_GLIBCXX_XREF(_Arg) __arg) const
      { return _M_t._M_create_node(_GLIBCXX_FORWARD(_Arg, __arg)); }

      private:
@@ -655,11 +647,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    _M_put_node(__p);
      }

-      template<typename _NodeGen>
+#if __cplusplus >= 201103L
+      template<typename _Tree>
+    static constexpr
+    typename conditional<is_lvalue_reference<_Tree>::value,
+                 const value_type&, value_type&&>::type
+    _S_fwd_value_for(value_type& __val) noexcept
+    { return std::move(__val); }
+#else
+      template<typename _Tree>
+    static const value_type&
+    _S_fwd_value_for(value_type& __val)
+    { return __val; }
+#endif
+
+      template<typename _Tree, typename _NodeGen>
    _Link_type
-    _M_clone_node(_Const_Link_type __x, _NodeGen& __node_gen)
+    _M_clone_node(_GLIBCXX_XREF(_Tree),


Since the _Tree type is only used to decide whether to copy or move,
could it just be a bool instead?

      template<bool _Move, typename _NodeGen>
     _Link_type
    _M_clone_node(_Link_type __x, _NodeGen& __node_gen)

Then it would be called as _M_clone_node<_Move>(__x, __node_gen)
instead of _M_clone_node(_GLIBCXX_FORWARD(_Tree, __t), __x, __node_gen).
That seems easier to read.
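
As a standalone illustration of that suggestion (not the libstdc++ code), a bool template parameter can select copy or move at compile time. The names `forward_for_clone`, `clone_value` and `copy_then_check` are hypothetical, and the sketch assumes C++11 or later.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Returns an rvalue reference when Move is true and a const lvalue
// reference otherwise, so the caller's constructor picks move or copy.
template<bool Move, typename T>
typename std::conditional<Move, T&&, const T&>::type
forward_for_clone (T &value) noexcept
{
  using Result = typename std::conditional<Move, T&&, const T&>::type;
  return static_cast<Result> (value);
}

// Clone a value, copying or moving according to the template argument.
template<bool Move, typename T>
T clone_value (T &value)
{
  return forward_for_clone<Move> (value);
}

// The selected reference kinds, checked at compile time.
static_assert (std::is_same<decltype (forward_for_clone<false>
                                      (std::declval<std::string &> ())),
                            const std::string &>::value,
               "copy keeps a const lvalue reference");
static_assert (std::is_same<decltype (forward_for_clone<true>
                                      (std::declval<std::string &> ())),
                            std::string &&>::value,
               "move yields an rvalue reference");

// Usage sketch: copying leaves the source intact.
inline std::string copy_then_check (std::string source)
{
  std::string copy = clone_value<false> (source);
  assert (copy == source);
  return copy;
}
```

The call site then reads `clone_value<true>(x)` or `clone_value<false>(x)`, with no tree object passed along only to carry its value category.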


+              _Link_type __x, _NodeGen& __node_gen)
    {
-      _Link_type __tmp = __node_gen(*__x->_M_valptr());
+      _Link_type __tmp
+        = __node_gen(_S_fwd_value_for<_Tree>(*__x->_M_valptr()));


Is _S_fwd_value_for necessary? This would work:

#if __cplusplus >= 201103L
          using _Vp = 

Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Eric Botcazou
> 2020-11-18  Jakub Jelinek  
> 
> gcc/
>   * configure.ac: Add $lang.prev rules, INDEX.$lang and SERIAL_LIST and
>   SERIAL_COUNT variables to Make-hooks.
>   (--enable-link-serialization): New configure option.
>   * Makefile.in (DO_LINK_SERIALIZATION, LINK_PROGRESS): New variables.
>   * doc/install.texi (--enable-link-serialization): Document.
>   * configure: Regenerated.
> gcc/c/
>   * Make-lang.in (c.serial): New goal.
>   (.PHONY): Add c.serial c.prev.
>   (cc1$(exeext)): Call LINK_PROGRESS.
> gcc/cp/
>   * Make-lang.in (c++.serial): New goal.
>   (.PHONY): Add c++.serial c++.prev.
>   (cc1plus$(exeext)): Depend on c++.prev.  Call LINK_PROGRESS.
> gcc/fortran/
>   * Make-lang.in (fortran.serial): New goal.
>   (.PHONY): Add fortran.serial fortran.prev.
>   (f951$(exeext)): Depend on fortran.prev.  Call LINK_PROGRESS.
> gcc/lto/
>   * Make-lang.in (lto, lto1.serial, lto2.serial): New goals.
>   (.PHONY): Add lto lto1.serial lto1.prev lto2.serial lto2.prev.
>   (lto.all.cross, lto.start.encap): Remove dependencies.
>   ($(LTO_EXE)): Depend on lto1.prev.  Call LINK_PROGRESS.
>   ($(LTO_DUMP_EXE)): Depend on lto2.prev.  Call LINK_PROGRESS.
> gcc/objc/
>   * Make-lang.in (objc.serial): New goal.
>   (.PHONY): Add objc.serial objc.prev.
>   (cc1obj$(exeext)): Depend on objc.prev.  Call LINK_PROGRESS.
> gcc/objcp/
>   * Make-lang.in (obj-c++.serial): New goal.
>   (.PHONY): Add obj-c++.serial obj-c++.prev.
>   (cc1objplus$(exeext)): Depend on obj-c++.prev.  Call LINK_PROGRESS.
> gcc/ada/
>   * gcc-interface/Make-lang.in (ada.serial): New goal.
>   (.PHONY): Add ada.serial ada.prev.
>   (gnat1$(exeext)): Depend on ada.prev.  Call LINK_PROGRESS.
> gcc/brig/
>   * Make-lang.in (brig.serial): New goal.
>   (.PHONY): Add brig.serial brig.prev.
>   (brig1$(exeext)): Depend on brig.prev.  Call LINK_PROGRESS.
> gcc/go/
>   * Make-lang.in (go.serial): New goal.
>   (.PHONY): Add go.serial go.prev.
>   (go1$(exeext)): Depend on go.prev.  Call LINK_PROGRESS.
> gcc/jit/
>   * Make-lang.in (jit.serial): New goal.
>   (.PHONY): Add jit.serial jit.prev.
>   ($(LIBGCCJIT_FILENAME)): Depend on jit.prev.  Call LINK_PROGRESS.
> gcc/d/
>   * Make-lang.in (d.serial): New goal.
>   (.PHONY): Add d.serial d.prev.
>   (d21$(exeext)): Depend on d.prev.  Call LINK_PROGRESS.

This seems to cause the binaries to be always relinked, for example from the 
gcc/ directory of the build tree:

make
[relink of gnat1, brig1, cc1plus, d21, f951, go1, lto1, ...]
make
[relink of gnat1, brig1, cc1plus, d21, f951, go1, lto1, ...]

-- 
Eric Botcazou





Re: [PATCH] vect: Add a “very cheap” cost model

2020-11-19 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Mon, Nov 16, 2020 at 10:58 AM Richard Sandiford
>  wrote:
>> > Does the patch also vectorize with SVE loops that have
>> > unknown loop bound?  The documentation isn't entirely
>> > conclusive there.
>>
>> Yeah, for SVE it vectorises.  How about changing:
>>
>>   For example, if each iteration of a vectorized loop would handle
>>   exactly four iterations, …
>>
>> to:
>>
>>   For example, if each iteration of a vectorized loop could only
>>   handle exactly four iterations of the original scalar loop, …
>>
>> ?
>
> Yeah, guess that's better.
>
>>
>> > Iff the iteration count is a multiple of two and the target can
>> > vectorize the loop with both VF 2 and VF 4 but VF 4 would be better if
>> > we'd use the 'cheap' cost model, does 'very-cheap' not vectorize the
>> > loop or does it choose VF 2?
>>
>> It would choose VF 2, if that's still a win over scalar code.
>
> OK, that's what I expected.  The VF iteration is one source of
> compile-time that we might want to avoid somehow ... on
> x86_64 knowing the precise number of constant iterations
> should allow to only pick a subset of vector modes based on
> largest_pow2_factor or so?  Or maybe just use the preferred
> SIMD mode for cheap/very-cheap?  (maybe pass down
> the cost model kind to the target hook so targets can decide
> for themselves here)

On the preferred simd mode thing: TBH, I'd prefer to get rid
of that hook one day and just rely on autovectorize_vector_modes.

The difficulty with adding an early check is that we don't know ahead
of time which types of scalar element a loop operates on: we only find
that out on the fly during the first analysis of the loop.  The check
would also depend on SLP grouping: we can use a vector of 4 ints to
handle 2 iterations of the scalar loop if the ints are in an SLP
group of size 2.

So I agree it would be nice to have early-outs, but I think we'd have
to restructure things first.  E.g. maybe we could do some “cheap” initial
analysis that checks for basic vectorisability, records which scalar
elements are used by the loop, and records how big the containing SLP
groups might be (based on optimistic assumptions).  Then we can use
that to prefilter the modes we try (perhaps all the way down to no modes).
I guess that's conceptually similar to building an SLP graph though.
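
To make the prefiltering idea concrete, here is a rough sketch under strong simplifying assumptions: a known constant iteration count, power-of-two vectorization factors, and none of the element-type or SLP-group complications described above. The names largest_pow2_factor and prefilter_vfs are illustrative only and are not vectorizer functions:

```cpp
#include <vector>

// Largest power of two that divides n (n must be nonzero).
unsigned largest_pow2_factor(unsigned n)
{ return n & (~n + 1u); }

// Keep only the power-of-two VFs that divide the iteration count evenly,
// so that no scalar epilogue would be needed.
std::vector<unsigned>
prefilter_vfs(const std::vector<unsigned>& candidates, unsigned niters)
{
  std::vector<unsigned> kept;
  const unsigned limit = largest_pow2_factor(niters);
  for (unsigned vf : candidates)
    if (vf <= limit)
      kept.push_back(vf);
  return kept;
}
```

With 6 iterations and candidates 8, 4 and 2, only VF 2 survives, which matches the "would choose VF 2" case discussed earlier in the thread.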

Does the attached look OK?  I've included a version of the updated
wording above.  I also changed this condition to use “>” rather
than “>=”:

  /* If the vector loop needs multiple iterations to be beneficial then
 things are probably too close to call, and the conservative thing
 would be to stick with the scalar code.  */
  if (flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
  && min_profitable_estimate > (int) vect_vf_for_cost (loop_vinfo))

since when min_profitable_estimate == min_profitable_iters
we'll have done:

  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
  && min_profitable_iters < (assumed_vf + peel_iters_prologue))
/* We want the vectorized loop to execute at least once.  */
min_profitable_iters = assumed_vf + peel_iters_prologue;
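
A simplified model of why ">" is used rather than ">=". The function names are hypothetical, and the sketch ignores the partial-vectors condition in the quoted code:

```cpp
#include <algorithm>

// The vector loop must execute at least once, so the threshold is
// clamped up to one full vector iteration plus any prologue peeling.
int clamp_min_profitable_iters(int min_profitable_iters, int assumed_vf,
                               int peel_iters_prologue)
{
  return std::max(min_profitable_iters, assumed_vf + peel_iters_prologue);
}

// Very-cheap model: reject only when more than one vector iteration is
// needed before the vector loop pays for itself.
bool very_cheap_rejects(int min_profitable_estimate, int vf)
{
  return min_profitable_estimate > vf;
}
```

With VF 4 and no peeling, a raw threshold of 1 is clamped to 4. A ">=" test would then reject every such loop, while ">" accepts it and still rejects a threshold of 9.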

I also tried to make vect-cost-model-4.c more resilient on targets
that require alignment.

Thanks,
Richard


gcc/
* doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
* common.opt (fvect-cost-model=): Add very-cheap as a possible option.
(fsimd-cost-model=): Likewise.
(vect_cost_model): Add very-cheap.
* flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
Put the values in order of increasing aggressiveness.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
range checks when comparing against VECT_COST_MODEL_CHEAP.
(vect_prune_runtime_alias_test_list): Do not allow any alias
checks for the very-cheap cost model.
* tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
any peeling for the very-cheap cost model.  Also require one
iteration of the vector loop to pay for itself.

gcc/testsuite/
* gcc.dg/vect/vect-cost-model-1.c: New test.
* gcc.dg/vect/vect-cost-model-2.c: Likewise.
* gcc.dg/vect/vect-cost-model-3.c: Likewise.
* gcc.dg/vect/vect-cost-model-4.c: Likewise.
* gcc.dg/vect/vect-cost-model-5.c: Likewise.
* gcc.dg/vect/vect-cost-model-6.c: Likewise.
---
 gcc/common.opt|  7 +++--
 gcc/doc/invoke.texi   | 12 +++--
 gcc/flag-types.h  | 10 ---
 gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-3.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-4.c | 13 +
 gcc/testsuite/gcc.dg/vect/vect-cost-model-5.c | 11 
 gcc/testsuite/gcc.dg/vect/vect-cost-model-6.c | 12 +
 gcc/tree-vect-data-refs.c

Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 19, 2020 at 12:55:42PM +0100, Eric Botcazou wrote:
> > * Make-lang.in (d.serial): New goal.
> > (.PHONY): Add d.serial d.prev.
> > (d21$(exeext)): Depend on d.prev.  Call LINK_PROGRESS.
> 
> This seems to cause the binaries to be always relinked, for example from the 
> gcc/ directory of the build tree:
> 
> make
> [relink of gnat1, brig1, cc1plus, d21, f951, go1, lto1, ...]
> make
> [relink of gnat1, brig1, cc1plus, d21, f951, go1, lto1, ...]

Does it?
Only with --enable-link-serialization, or without that too?
Will have a look.

Jakub



Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR libstdc++/83938)#

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 16/07/19 18:40 +0200, François Dumont wrote:

Hi

    I eventually spent much more time working on the inplace_merge 
performance bench.


    And the results do not confirm the theory:

Before patch:
inplace_merge.cc    bench 1 / 1 memory    243r  227u   17s  1216mem  5pf
inplace_merge.cc    bench 1 / 4 memory    297r  278u   18s   480mem  0pf
inplace_merge.cc    bench 1 /64 memory    373r  354u   18s   480mem  0pf
inplace_merge.cc    bench 0 / 1 memory     12r   11u    0s   480mem  0pf


After the patch to reduce memory allocation:
inplace_merge.cc    bench 1 / 1 memory    245r  227u   18s  1216mem  0pf
inplace_merge.cc    bench 1 / 4 memory    292r  273u   19s   480mem  0pf
inplace_merge.cc    bench 1 /64 memory    373r  356u   17s   480mem  0pf
inplace_merge.cc    bench 0 / 1 memory     11r   11u    0s   480mem  0pf


With the __len1 > __len2 condition change:
inplace_merge.cc    bench 1 / 1 memory    244r  225u   20s  1216mem  0pf
inplace_merge.cc    bench 1 / 4 memory    379r  361u   17s   480mem  0pf
inplace_merge.cc    bench 1 /64 memory    640r  625u   16s   480mem  0pf
inplace_merge.cc    bench 0 / 1 memory     11r   11u    0s   480mem  0pf


When there is no memory limit the results are identical, of course.
Otherwise, as soon as memory is limited, performance starts to decrease
with the condition change on __len1 vs __len2.


Could you share the bench you used to demonstrate the enhancement? I
also wonder why your patch doesn't consistently change the same
condition in __merge_without_buffer.


For the moment I'd like to propose the attached patch, that is to say
the reduction in the amount of allocated memory and the new/modified
benches.


Note that as soon as I forbid any memory allocation I also reduce the
size of the range to merge, because the implementation relies on
recursion and so could end up in a stack overflow. Maybe I need to
check for simulator targets like several other tests do? Unless
simulators do not run the performance tests?


The performance tests are never run by default.

I don't think we should spend too much time caring about performance
of sorting close to Out Of Memory conditions. We don't try to optimize
std::vector or other cases to work when close to OOM.

So I think just reducing the memory usage is the right approach here.

Regarding this stack overflow issue, is there some routine to find out
how many levels of function calls can be added before reaching a
stack overflow? I could perhaps call __builtin_alloca and check the
result, but that smells. If I could find this out we could fall back on
an iterative approach to complete the merge.


No, alloca is inherently unsafe. Please don't even consider it :-)



    PR libstdc++/83938
    * include/bits/stl_algo.h:
    (__inplace_merge): Take temporary buffer length from smallest range.
    (__stable_sort): Limit temporary buffer length.
    * include/bits/stl_tempbuf.h (get_temporary_buffer): Change __len
    computation in the loop.


Please use "Change __len computation in the loop to avoid truncation."

It's a bit more descriptive.


    * testsuite/25_algorithms/inplace_merge/1.cc (test3): Test all possible
    pivot positions.
    * testsuite/performance/25_algorithms/inplace_merge.cc: New.
    * testsuite/performance/25_algorithms/stable_sort.cc: Rework to allow
    execution with different memory constraints.

Ok to commit ?

François

On 6/9/19 4:27 PM, François Dumont wrote:

On 12/21/18 9:57 PM, Jonathan Wakely wrote:

On 29/10/18 07:06 +0100, François Dumont wrote:

Hi

    Some feedback regarding this patch ?


Sorry this got missed, please resubmit during stage 1.

You haven't CC'd the original patch author (chang jc) to give them a
chance to comment on your proposed changes to the patch.

The attached PDF on PR libstdc++/83938 has extensive discussion of the
performance issue, but I don't see any for your version. Some
performance benchmarks for your version would be helpful.


Here is this patch again.

This time it is much closer to John's original one. I just kept my
change to the size of the temporary buffer, which doesn't need to be
as large as what is currently allocated, especially with John's patch.


The performance tests show good improvements; attached
are the before/after results.

Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Eric Botcazou
> Does it?
> Only with --enable-link-serialization, or without that too?

It's the default, i.e. without --enable-link-serialization.

> Will have a look.

Thanks.

-- 
Eric Botcazou




Re: [PATCH] PR 83938 Reduce memory consumption in stable_sort/inplace_merge

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 11/06/20 08:32 +0200, François Dumont via Libstdc++ wrote:

As we are patching algos, we still have this old one.

    From the original patch I only kept the memory optimization part,
as the new performance test was not showing good results for the other
part, which changes the pivot value. I also kept the small change in
get_temporary_buffer even if I don't have a strong feeling about it; it
just makes sure that we'll try to allocate 1 element as a last-chance
allocation.
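
The patch body is not shown in this message, so the following is only a sketch of one way to halve the requested length without truncation, as the ChangeLog wording elsewhere in the thread suggests. The allocator is simulated by a hypothetical limit, and all names are illustrative:

```cpp
#include <cstddef>

// Next length to try after a failed allocation, rounding up rather than
// truncating.  The special case at 1 terminates the retry loop.
std::size_t next_len_round_up(std::size_t len)
{ return len == 1 ? 0 : (len + 1) / 2; }

// Plain truncating halving, for comparison.
std::size_t next_len_truncate(std::size_t len)
{ return len / 2; }

// Simulated allocator: a request succeeds only if len <= limit.
// Returns the first length that fits, or 0 if none does.
template<typename Next>
std::size_t first_fitting_len(std::size_t len, std::size_t limit, Next next)
{
  while (len > 0)
    {
      if (len <= limit)
        return len;
      len = next(len);
    }
  return 0;
}
```

Starting from 10 with a limit of 3, rounding up tries 10, 5, 3 and gets a 3-element buffer, while truncation tries 10, 5, 2 and settles for 2.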


    Note that there is still room for improvement. If we run out of
memory on the heap we then use a recursive implementation which
relies on stack memory. I would be surprised if a system which lacks
heap memory had no problem allocating about the same amount on the
stack, so we will surely end up in a stack overflow. I still have this
on my todo list, even if I have already made several attempts with no
satisfying result in terms of performance.


    Tested under Linux x86_64.

Commit message:

    libstdc++: Limit memory allocation in stable_sort/inplace_merge (PR 83938)

    Reduce memory consumption in stable_sort/inplace_merge to what is used.

    Co-authored-by: François Dumont

    libstdc++-v3/ChangeLog:

    2020-06-11  John Chang
                François Dumont

            PR libstdc++/83938
            * include/bits/stl_tempbuf.h (get_temporary_buffer): Change __len
            computation in the loop.
            * include/bits/stl_algo.h:
            (__inplace_merge): Take temporary buffer length from smallest range.
            (__stable_sort): Limit temporary buffer length.
            * testsuite/25_algorithms/inplace_merge/1.cc (test03): Test different
            pivot positions.
            * testsuite/performance/25_algorithms/stable_sort.cc: Test stable_sort
            under different heap memory conditions.
            * testsuite/performance/25_algorithms/inplace_merge.cc: New.

Ok to commit ?


Oops, I replied to the old thread about this patch. See comments at:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559583.html

P.S. it looks to me like the "else" branch in __merge_adaptive will
rarely get used. In most cases, the temporary buffer is going to
succeed on the first allocation and have enough space.

So I think we should split out the "else" branch of __merge_adaptive
to a separate function, so that __merge_adaptive is a much smaller
function and the cold branch doesn't have to be loaded into cache.

Something like this, but I haven't benchmarked it.

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 6efc99035b7d..428c22cf5d43 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -2400,6 +2400,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return std::rotate(__first, __middle, __last);
 }
 
+  // This is a helper function for the cold branch of __merge_adaptive.
+  template
+void
+__merge_adaptive_resize(_BidirectionalIterator __first,
+			_BidirectionalIterator __middle,
+			_BidirectionalIterator __last,
+			_Distance __len1, _Distance __len2,
+			_Pointer __buffer, _Distance __buffer_size,
+			_Compare __comp);
+
   /// This is a helper function for the merge routines.
   template
@@ -2425,42 +2436,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
   else
 	{
-	  _BidirectionalIterator __first_cut = __first;
-	  _BidirectionalIterator __second_cut = __middle;
-	  _Distance __len11 = 0;
-	  _Distance __len22 = 0;
-	  if (__len1 > __len2)
-	{
-	  __len11 = __len1 / 2;
-	  std::advance(__first_cut, __len11);
-	  __second_cut
-		= std::__lower_bound(__middle, __last, *__first_cut,
- __gnu_cxx::__ops::__iter_comp_val(__comp));
-	  __len22 = std::distance(__middle, __second_cut);
-	}
-	  else
-	{
-	  __len22 = __len2 / 2;
-	  std::advance(__second_cut, __len22);
-	  __first_cut
-		= std::__upper_bound(__first, __middle, *__second_cut,
- __gnu_cxx::__ops::__val_comp_iter(__comp));
-	  __len11 = std::distance(__first, __first_cut);
-	}
-
-	  _BidirectionalIterator __new_middle
-	= std::__rotate_adaptive(__first_cut, __middle, __second_cut,
- __len1 - __len11, __len22, __buffer,
- __buffer_size);
-	  std::__merge_adaptive(__first, __first_cut, __new_middle, __len11,
-__len22, __buffer, __buffer_size, __comp);
-	  std::__merge_adaptive(__new_middle, __second_cut, __last,
-__len1 - __len11,
-__len2 - __len22, __buffer,
-__buffer_size, __comp);
+	  std::__merge_adaptive_resize(__first, __middle, __last,
+   __len1, __len2, __buffer, __buffer_size,
+   __comp);
 	}
 }
 
+  // This is a helper function for the cold branch of __merge_adaptive.
+  template
+void
+__merge_adaptive_resize(_Bidirectio

[PATCH] Fix gcc.dg/pr97897.c

2020-11-19 Thread Richard Biener
This adds dg-options "" to avoid the pedantic error on _Complex int.

2020-11-19  Richard Biener  

* gcc.dg/pr97897.c: Add dg-options.
---
 gcc/testsuite/gcc.dg/pr97897.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr97897.c b/gcc/testsuite/gcc.dg/pr97897.c
index 775f34ca767..084c1cdbfeb 100644
--- a/gcc/testsuite/gcc.dg/pr97897.c
+++ b/gcc/testsuite/gcc.dg/pr97897.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-options "" } */
 
 void h ();
 void f () __attribute__ ((returns_twice));
-- 
2.26.2


[PATCH] refactor reassocs get_rank

2020-11-19 Thread Richard Biener
This refactors things so assigned ranks are dumped and the cache
is consistently used also for PHIs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
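
As a rough illustration of the "look in the cache first, compute once, record the result" shape that the refactoring gives get_rank, here is a toy version. Expr and RankCache are hypothetical, and the rank rule (one more than the maximum operand rank) is simplified; it does not model PHIs or basic-block ranks:

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Hypothetical expression node: leaves have no operands.
struct Expr
{
  int id;
  std::vector<const Expr*> operands;
};

struct RankCache
{
  std::unordered_map<int, long> ranks;
  int computed = 0;  // number of cache misses, for illustration

  long get_rank(const Expr& e)
  {
    // If we already have a rank for this expression, use that.
    auto it = ranks.find(e.id);
    if (it != ranks.end())
      return it->second;

    // Otherwise take the maximum operand rank plus one.
    long rank = 0;
    for (const Expr* op : e.operands)
      rank = std::max(rank, get_rank(*op));
    rank += 1;

    // Record the rank so it is not recomputed.
    ++computed;
    ranks.emplace(e.id, rank);
    return rank;
  }
};
```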

2020-11-19  Richard Biener  

* tree-ssa-reassoc.c (get_rank): Refactor to consistently
use the cache and dump ranks assigned.
---
 gcc/tree-ssa-reassoc.c | 46 ++
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index a2ca1713d4b..89adafae32c 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -425,41 +425,43 @@ get_rank (tree e)
   long rank;
   tree op;
 
-  if (SSA_NAME_IS_DEFAULT_DEF (e))
-   return find_operand_rank (e);
-
-  stmt = SSA_NAME_DEF_STMT (e);
-  if (gimple_code (stmt) == GIMPLE_PHI)
-   return phi_rank (stmt);
-
-  if (!is_gimple_assign (stmt))
-   return bb_rank[gimple_bb (stmt)->index];
-
   /* If we already have a rank for this expression, use that.  */
   rank = find_operand_rank (e);
   if (rank != -1)
return rank;
 
-  /* Otherwise, find the maximum rank for the operands.  As an
-exception, remove the bias from loop-carried phis when propagating
-the rank so that dependent operations are not also biased.  */
-  /* Simply walk over all SSA uses - this takes advatage of the
- fact that non-SSA operands are is_gimple_min_invariant and
-thus have rank 0.  */
-  rank = 0;
-  FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
-   rank = propagate_rank (rank, op);
+  stmt = SSA_NAME_DEF_STMT (e);
+  if (gimple_code (stmt) == GIMPLE_PHI)
+   rank = phi_rank (stmt);
+
+  else if (!is_gimple_assign (stmt))
+   rank = bb_rank[gimple_bb (stmt)->index];
+
+  else
+   {
+ /* Otherwise, find the maximum rank for the operands.  As an
+exception, remove the bias from loop-carried phis when propagating
+the rank so that dependent operations are not also biased.  */
+ /* Simply walk over all SSA uses - this takes advatage of the
+fact that non-SSA operands are is_gimple_min_invariant and
+thus have rank 0.  */
+ rank = 0;
+ FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
+   rank = propagate_rank (rank, op);
+
+ rank += 1;
+   }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "Rank for ");
  print_generic_expr (dump_file, e);
- fprintf (dump_file, " is %ld\n", (rank + 1));
+ fprintf (dump_file, " is %ld\n", rank);
}
 
   /* Note the rank in the hashtable so we don't recompute it.  */
-  insert_operand_rank (e, (rank + 1));
-  return (rank + 1);
+  insert_operand_rank (e, rank);
+  return rank;
 }
 
   /* Constants, globals, etc., are rank 0 */
-- 
2.26.2


Fix handling of gimple_clobber in ICF

2020-11-19 Thread Jan Hubicka
Hi,
after fixing a few issues I got to a stage where 1.4M ICF mismatches are due to
comparing two gimple clobbers.  The problem is that operand_equal_p matches
clobbers with:

case CONSTRUCTOR:
 /* In GIMPLE empty constructors are allowed in initializers of
aggregates.  */
 return !CONSTRUCTOR_NELTS (arg0) && !CONSTRUCTOR_NELTS (arg1);

But this happens too late, after comparing the semantics of its type (which
are not very relevant for a memory store and fail way too often).

In the context of ipa-icf we do not really need to match the RHS of gimple
clobbers: it is enough to know that the LHS stores can be considered equivalent.

I thus added logic to hash them all the same way and compare using the
TREE_CLOBBER_P flag.  Other options I see are extending operand_equal_p
in fold-const to handle them more generously, or making the stmt hash and
compare skip comparing/hashing the RHS of gimple_clobber_p statements.

I am now down to 1453 operand_equal_p mismatches so it seems we are getting
into shape.
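
A toy model of the approach, with a hypothetical Operand type standing in for trees. It shows the property the hash and the comparison must share: any two operands that compare equal also hash equal, because every clobber hashes to the same constant:

```cpp
#include <cstddef>
#include <functional>

// Hypothetical operand: a clobber flag plus a type id standing in for
// the type semantics that the generic comparison would look at.
struct Operand
{
  bool is_clobber;
  int type_id;
};

std::size_t hash_operand(const Operand& op)
{
  // All clobbers hash alike, whatever their type.
  if (op.is_clobber)
    return 0xc10bbe5;
  return std::hash<int>()(op.type_id);
}

bool operand_equal(const Operand& a, const Operand& b)
{
  // A clobber only ever matches another clobber.
  if (a.is_clobber || b.is_clobber)
    return a.is_clobber == b.is_clobber;
  return a.type_id == b.type_id;
}
```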

Bootstrapped/regtested x86_64, looks reasonable?
Honza

* ipa-icf-gimple.c (func_checker::hash_operand): Hash gimple clobber.
(func_checker::operand_equal_p): Special case gimple clobber.
diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index ffb1baddbdb..69bc9ab5b34 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -245,6 +245,14 @@ func_checker::hash_operand (const_tree arg, inchash::hash &hstate,
   break;
 }
 
+  /* In gimple all clobbers can be considered equal: while comparing two
+ gimple clobbers we match the left hand memory accesses.  */
+  if (TREE_CLOBBER_P (arg))
+{
+  hstate.add_int (0xc10bbe5);
+  return;
+}
+
   return operand_compare::hash_operand (arg, hstate, flags);
 }
 
@@ -306,6 +314,10 @@ func_checker::operand_equal_p (const_tree t1, const_tree t2,
 default:
   break;
 }
+  /* In gimple all clobbers can be considered equal.  We match the right hand
+ memory accesses.  */
+  if (TREE_CLOBBER_P (t1) || TREE_CLOBBER_P (t2))
+return TREE_CLOBBER_P (t1) == TREE_CLOBBER_P (t2);
 
   return operand_compare::operand_equal_p (t1, t2, flags);
 }


[PATCH] Fix bootstrap

2020-11-19 Thread Richard Biener
This fixes a typo in the TREE_CODE compare which should
compare against TYPE_DECL, not TYPE_NAME.

Pushed as obvious.

2020-11-19  Richard Biener  

* fold-const.c (operand_compare::hash_operand): Fix typo.
---
 gcc/fold-const.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 397805d7779..820b08d26fd 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3871,7 +3871,7 @@ operand_compare::hash_operand (const_tree t, inchash::hash &hstate,
  c = TYPE_NAME (TYPE_MAIN_VARIANT (c));
  /* We compute mangled names only when free_lang_data is run.
 In that case we can hash precisely.  */
- if (TREE_CODE (c) == TYPE_NAME
+ if (TREE_CODE (c) == TYPE_DECL
  && DECL_ASSEMBLER_NAME_SET_P (c))
hstate.add_object
   (IDENTIFIER_HASH_VALUE
-- 
2.26.2


preprocessor: main-file cleanup

2020-11-19 Thread Nathan Sidwell


In preparing module patch 7 I realized there was a cleanup I could
make to simplify it.  This is that cleanup.  Also, when doing the
cleanup I noticed some macros had been turned into inline functions,
but not renamed to the preprocessor's internal namespace
(_cpp_$INTERNAL rather than cpp_$USER).  Thus, this renames those
functions, deletes an internal field of the file structure, and
determines whether we're in the main file by comparing to
pfile->main_file, the _cpp_file of the main file.
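
The core of the cleanup can be pictured with pared-down, hypothetical stand-ins for _cpp_file and cpp_reader. The sketch leaves out the main_search check mentioned in the ChangeLog below and only shows the pointer comparison that replaces the per-file flag:

```cpp
// Hypothetical, pared-down stand-ins for libcpp's _cpp_file and cpp_reader.
struct File
{
  const char* path;
};

struct Reader
{
  File* main_file;     // the one main file of this reader
  File* current_file;  // the file currently being lexed
};

// Instead of storing a "main_file" bit in every File, ask the reader.
inline bool in_main_source_file(const Reader& r)
{
  return r.current_file == r.main_file;
}
```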

libcpp/
* internal.h (cpp_in_system_header): Rename to ...
(_cpp_in_system_header): ... here.
(cpp_in_primary_file): Rename to ...
(_cpp_in_main_source_file): ... here.  Compare main_file equality
and check main_search value.
* lex.c (maybe_va_opt_error, _cpp_lex_direct): Adjust for rename.
* macro.c (_cpp_builtin_macro_text): Likewise.
(replace_args): Likewise.
* directives.c (do_include_next): Likewise.
(do_pragma_once, do_pragma_system_header): Likewise.
* files.c (struct _cpp_file): Delete main_file field.
(pch_open_file): Check pfile->main_file equality.
(make_cpp_file): Drop cpp_reader parm, don't set main_file.
(_cpp_find_file): Adjust.
(_cpp_stack_file): Check pfile->main_file equality.
(struct report_missing_guard_data): Add cpp_reader field.
(report_missing_guard): Check pfile->main_file equality.
(_cpp_report_missing_guards): Adjust.

pushing to trunk
--
Nathan Sidwell
diff --git i/libcpp/directives.c w/libcpp/directives.c
index c4ecb9657e5..bffdc913adb 100644
--- i/libcpp/directives.c
+++ w/libcpp/directives.c
@@ -877,7 +877,7 @@ do_include_next (cpp_reader *pfile)
 
   /* If this is the primary source file, warn and use the normal
  search logic.  */
-  if (cpp_in_primary_file (pfile))
+  if (_cpp_in_main_source_file (pfile))
 {
   cpp_error (pfile, CPP_DL_WARNING,
 		 "#include_next in primary source file");
@@ -1546,7 +1546,7 @@ do_pragma (cpp_reader *pfile)
 static void
 do_pragma_once (cpp_reader *pfile)
 {
-  if (cpp_in_primary_file (pfile))
+  if (_cpp_in_main_source_file (pfile))
 cpp_error (pfile, CPP_DL_WARNING, "#pragma once in main file");
 
   check_eol (pfile, false);
@@ -1708,7 +1708,7 @@ do_pragma_poison (cpp_reader *pfile)
 static void
 do_pragma_system_header (cpp_reader *pfile)
 {
-  if (cpp_in_primary_file (pfile))
+  if (_cpp_in_main_source_file (pfile))
 cpp_error (pfile, CPP_DL_WARNING,
 	   "#pragma system_header ignored outside include file");
   else
diff --git i/libcpp/files.c w/libcpp/files.c
index b5d9f30297e..ba52d2bf3cf 100644
--- i/libcpp/files.c
+++ w/libcpp/files.c
@@ -103,9 +103,6 @@ struct _cpp_file
   /* If read() failed before.  */
   bool dont_read : 1;
 
-  /* If this file is the main file.  */
-  bool main_file : 1;
-
   /* If BUFFER above contains the true contents of the file.  */
   bool buffer_valid : 1;
 
@@ -186,7 +183,7 @@ static void open_file_failed (cpp_reader *pfile, _cpp_file *file, int,
 			  location_t);
 static struct cpp_file_hash_entry *search_cache (struct cpp_file_hash_entry *head,
 	 const cpp_dir *start_dir);
-static _cpp_file *make_cpp_file (cpp_reader *, cpp_dir *, const char *fname);
+static _cpp_file *make_cpp_file (cpp_dir *, const char *fname);
 static void destroy_cpp_file (_cpp_file *);
 static cpp_dir *make_cpp_dir (cpp_reader *, const char *dir_name, int sysp);
 static void allocate_file_hash_entries (cpp_reader *pfile);
@@ -299,7 +296,7 @@ pch_open_file (cpp_reader *pfile, _cpp_file *file, bool *invalid_pch)
   for (_cpp_file *f = pfile->all_files; f; f = f->next_file)
 if (f->implicit_preinclude)
   continue;
-else if (f->main_file)
+else if (pfile->main_file == f)
   break;
 else
   return false;
@@ -528,7 +525,7 @@ _cpp_find_file (cpp_reader *pfile, const char *fname, cpp_dir *start_dir,
   if (entry)
 return entry->u.file;
 
-  _cpp_file *file = make_cpp_file (pfile, start_dir, fname);
+  _cpp_file *file = make_cpp_file (start_dir, fname);
   file->implicit_preinclude
 = (kind == _cpp_FFK_PRE_INCLUDE
|| (pfile->buffer && pfile->buffer->file->implicit_preinclude));
@@ -865,7 +862,7 @@ has_unique_contents (cpp_reader *pfile, _cpp_file *file, bool import,
 	{
 	  /* We already have a buffer but it is not valid, because
 		 the file is still stacked.  Make a new one.  */
-	  ref_file = make_cpp_file (pfile, f->dir, f->name);
+	  ref_file = make_cpp_file (f->dir, f->name);
 	  ref_file->path = f->path;
 	}
 	  else
@@ -951,7 +948,8 @@ _cpp_stack_file (cpp_reader *pfile, _cpp_file *file, include_type type,
   if (CPP_OPTION (pfile, deps.style) > (sysp != 0)
 	  && !file->stack_count
 	  && file->path[0]
-	  && !(file->main_file && CPP_OPTION (pfile, deps.ignore_main_file)))
+	  && !(pfile->main_file == file
+	   && CPP_OPTION (pfile, deps.ignore_main_file)))
 	deps_add_dep (pf

Re: Hash ODR name for OBJ_TYPE_REF

2020-11-19 Thread Jan Hubicka
Hi,
this is a variant of the patch I committed (with a mistake fixed by Richard, my
apologies for it).  It turned out that the hunk handling OBJ_TYPE_REF
in operand_compare::hash_operand was in the wrong spot in operand_equal_p,
which probably happened while merging the patch.

Also with Ada LTO bootstrap I noticed I need to check for TYPE_NAME to
be TYPE_DECL and not IDENTIFIER_POINTER, which Ada does before free lang
data.

lto-bootstrapped/regtested x86_64-linux.

* fold-const.c (operand_compare::operand_equal_p): Move OBJ_TYPE_REF
matching to correct place; drop OEP_ADDRESS_OF for TOKEN, OBJECT and
class.
(operand_compare::hash_operand): Hash ODR type for OBJ_TYPE_REF.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ddf18f27cb7..136f01b6b35 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3346,24 +3350,6 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
  flags &= ~OEP_ADDRESS_OF;
  return OP_SAME (1) && OP_SAME (2);
 
-   /* Virtual table call.  */
-   case OBJ_TYPE_REF:
- {
-   if (!operand_equal_p (OBJ_TYPE_REF_EXPR (arg0),
- OBJ_TYPE_REF_EXPR (arg1), flags))
- return false;
-   if (tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg0))
-   != tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg1)))
- return false;
-   if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0),
- OBJ_TYPE_REF_OBJECT (arg1), flags))
- return false;
-   if (!types_same_for_odr (obj_type_ref_class (arg0),
-obj_type_ref_class (arg1)))
- return false;
-   return true;
- }
-
default:
  return false;
}
@@ -3442,6 +3428,23 @@ operand_compare::operand_equal_p (const_tree arg0, const_tree arg1,
return OP_SAME (0);
  return false;
 
+   case OBJ_TYPE_REF:
+   /* Virtual table reference.  */
+   if (!operand_equal_p (OBJ_TYPE_REF_EXPR (arg0),
+ OBJ_TYPE_REF_EXPR (arg1), flags))
+ return false;
+   flags &= ~OEP_ADDRESS_OF;
+   if (tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg0))
+   != tree_to_uhwi (OBJ_TYPE_REF_TOKEN (arg1)))
+ return false;
+   if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0),
+ OBJ_TYPE_REF_OBJECT (arg1), flags))
+ return false;
+   if (!types_same_for_odr (obj_type_ref_class (arg0),
+obj_type_ref_class (arg1)))
+ return false;
+   return true;
+
default:
  return false;
}
@@ -3861,11 +3864,23 @@ operand_compare::hash_operand (const_tree t, inchash::hash &hstate,
  hash_operand (TARGET_EXPR_SLOT (t), hstate, flags);
  return;
 
-   /* Virtual table call.  */
case OBJ_TYPE_REF:
+   /* Virtual table reference.  */
  inchash::add_expr (OBJ_TYPE_REF_EXPR (t), hstate, flags);
+ flags &= ~OEP_ADDRESS_OF;
  inchash::add_expr (OBJ_TYPE_REF_TOKEN (t), hstate, flags);
  inchash::add_expr (OBJ_TYPE_REF_OBJECT (t), hstate, flags);
+ if (tree c = obj_type_ref_class (t))
+   {
+ c = TYPE_NAME (TYPE_MAIN_VARIANT (c));
+ /* We compute mangled names only when free_lang_data is run.
+In that case we can hash precisely.  */
+ if (TREE_CODE (c) == TYPE_DECL
+ && DECL_ASSEMBLER_NAME_SET_P (c))
+   hstate.add_object
+  (IDENTIFIER_HASH_VALUE
+  (DECL_ASSEMBLER_NAME (c)));
+   }
  return;
default:
  break;


Update: [PATCH 5/X] libsanitizer: mid-end: Introduce stack variable handling for HWASAN

2020-11-19 Thread Matthew Malcomson via Gcc-patches
Hi there,

After offline discussion with Richard I've modified the way in which the
initialisation for the hwasan base pointer is emitted.
Originally it was getting emitted during `expand_used_vars`, and
requiring `handle_builtin_alloca` to register a need for it to be
emitted so that `expand_HWASAN_CHOOSE_TAG` can use an initialised base
pointer.

Now we go through the entire expansion of the function body, and then if
`hwasan_frame_base_ptr` was used anywhere we emit the initialisation
just before `parm_birth_insn`.

This is the updated patch for the stack variable handling.
(Testing underway)

MM

---


Handling stack variables has three features.

1) Ensure HWASAN required alignment for stack variables

When tagging shadow memory, we need to ensure that each tag granule is
only used by one variable at a time.

This is done by ensuring that each tagged variable is aligned to the tag
granule representation size, and that the end of each object is aligned so
that the start of any other data stored on the stack is in a different
granule.

This patch ensures the above by forcing the stack pointer to be aligned
before and after allocating any stack objects. Since we are forcing
alignment we also use `align_local_variable` to ensure this new alignment
is advertised properly through SET_DECL_ALIGN.
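
The alignment rule above can be sketched as follows. This is only an illustration, assuming a 16-byte tag granule (in the patch the real size comes from the backend hooks); `kGranule`, `Extent`, `place_object` and `granules_disjoint` are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tag granule size in bytes; illustrative only.
constexpr std::uint64_t kGranule = 16;

// Round value up to the next multiple of the granule size.
inline std::uint64_t round_up_to_granule(std::uint64_t value)
{
  return (value + kGranule - 1) & ~(kGranule - 1);
}

// Address range [begin, end) reserved for one stack object.
struct Extent
{
  std::uint64_t begin;
  std::uint64_t end;
};

// Reserve space for an object of the given size so that both its start and
// its end fall on granule boundaries.
inline Extent place_object(std::uint64_t next_free, std::uint64_t size)
{
  assert(size > 0);
  const std::uint64_t begin = round_up_to_granule(next_free);
  const std::uint64_t end = round_up_to_granule(begin + size);
  return Extent{begin, end};
}

// True if the two extents share no granule.
inline bool granules_disjoint(const Extent &a, const Extent &b)
{
  return a.end <= b.begin || b.end <= a.begin;
}
```

Placing a 5-byte object and then a 1-byte object with `place_object` gives two extents that each own whole granules, so neither object can share a tag with the other.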

2) Put tags into each stack variable pointer

Make sure that every pointer to a stack variable includes a tag of some
sort on it.

The way tagging works is:
  1) For every new stack frame, a random tag is generated.
  2) A base register is formed from the stack pointer value and this
 random tag.
  3) References to stack variables are now formed with RTL describing an
 offset from this base in both tag and value.

The random tag generation is handled by a backend hook.  This hook
decides whether to introduce a random tag or use the stack background
based on the parameter hwasan-random-frame-tag.  Using the stack
background is necessary for testing and bootstrap.  It is necessary
during bootstrap to avoid breaking the `configure` test program for
determining stack direction.

Using the stack background means that every stack frame has the initial
tag of zero and variables are tagged with incrementing tags from 1,
which also makes debugging a bit easier.
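
As a rough model of the three steps above, the sketch below forms a tagged pointer as an offset from a tagged frame base in both address and tag. It assumes an 8-bit tag kept in bits 56..63 of a 64-bit pointer, which is an assumption made for illustration; in the patch the tag size and the insert/extract code are supplied by backend hooks. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: the tag occupies the top byte of a 64-bit pointer.
constexpr unsigned kTagShift = 56;
constexpr std::uint64_t kTagMask = 0xffULL << kTagShift;

// Replace the tag bits of ptr with tag.
inline std::uint64_t set_tag(std::uint64_t ptr, std::uint8_t tag)
{
  return (ptr & ~kTagMask) | (static_cast<std::uint64_t>(tag) << kTagShift);
}

// Extract the tag bits of ptr.
inline std::uint8_t get_tag(std::uint64_t ptr)
{
  return static_cast<std::uint8_t>(ptr >> kTagShift);
}

// Return ptr with its tag bits cleared.
inline std::uint64_t strip_tag(std::uint64_t ptr)
{
  return ptr & ~kTagMask;
}

// Form a pointer to a stack variable from the tagged frame base, an address
// offset and a tag offset.  The tag wraps around at the assumed tag size.
inline std::uint64_t variable_pointer(std::uint64_t tagged_base,
                                      std::int64_t addr_offset,
                                      std::uint8_t tag_offset)
{
  const std::uint8_t tag
    = static_cast<std::uint8_t>(get_tag(tagged_base) + tag_offset);
  const std::uint64_t addr
    = strip_tag(tagged_base) + static_cast<std::uint64_t>(addr_offset);
  // Address arithmetic must not spill into the tag bits.
  assert((addr & kTagMask) == 0);
  return set_tag(addr, tag);
}
```

With a frame base tag of zero (the "stack background" case described above), the first variable gets tag 1, the next tag 2, and so on.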

Backend hooks define the size of a tag, the layout of the HWASAN shadow
memory, and handle emitting the code that inserts and extracts tags from a
pointer.

3) For each stack variable, tag and untag the shadow stack on function
   prologue and epilogue.

On entry to each function we tag the relevant shadow stack region for
each stack variable. This stack region is tagged to match the tag added to
each pointer to that variable.

This is the first patch where we use the HWASAN shadow space, so we need
to add in the libhwasan initialisation code that creates this shadow
memory region into the binary we produce.  This instrumentation is done
in `compile_file`.

When exiting a function we need to ensure the shadow stack for this
function has no remaining tags.  Without clearing the shadow stack area
for this stack frame, later function calls could get false positives
when those later function calls check untagged areas (such as parameters
passed on the stack) against a shadow stack area with left-over tag.

Hence we ensure that the entire stack frame is cleared on function exit.

config/ChangeLog:

* bootstrap-hwasan.mk: Disable random frame tags for stack-tagging
during bootstrap.

ChangeLog:

* gcc/asan.c (struct hwasan_stack_var): New.
(hwasan_sanitize_p): New.
(hwasan_sanitize_stack_p): New.
(hwasan_sanitize_allocas_p): New.
(initialize_sanitizer_builtins): Define new builtins.
(ATTR_NOTHROW_LIST): New macro.
(hwasan_current_frame_tag): New.
(hwasan_frame_base): New.
(stack_vars_base_reg_p): New.
(hwasan_maybe_emit_frame_base_init): New.
(hwasan_record_stack_var): New.
(hwasan_get_frame_extent): New.
(hwasan_increment_frame_tag): New.
(hwasan_record_frame_init): New.
(hwasan_emit_prologue): New.
(hwasan_emit_untag_frame): New.
(hwasan_finish_file): New.
(hwasan_truncate_to_tag_size): New.
* gcc/asan.h (hwasan_record_frame_init): New declaration.
(hwasan_record_stack_var): New declaration.
(hwasan_emit_prologue): New declaration.
(hwasan_emit_untag_frame): New declaration.
(hwasan_get_frame_extent): New declaration.
(hwasan_maybe_emit_frame_base_init): New declaration.
(hwasan_frame_base): New declaration.
(stack_vars_base_reg_p): New declaration.
(hwasan_current_frame_tag): New declaration.
(hwasan_increment_frame_tag): New declaration.
(hwasan_truncate_to_tag_size): New declaration.
(hwasan_finish_file): New declaration.
(hwasan_sanitize_p): New declaration.
(hwasan_sanitize_sta

Update [PATCH 6/X] libsanitizer: Add hwasan pass and associated gimple changes

2020-11-19 Thread Matthew Malcomson via Gcc-patches

Update to match the change in initialisation for patch 5/X.

MM

---

There are three main features to this change:

1) Check pointer tags match address tags.

When sanitizing for hwasan we now put HWASAN_CHECK internal functions before
memory accesses in the `asan` pass.  This checks that the tag in the pointer
being used matches the tag stored in shadow memory for the memory region being
used.

These internal functions are expanded into actual checks in the sanopt
pass that happens just before expansion into RTL.

We use the same mechanism that currently inserts ASAN_CHECK internal
functions to insert the new HWASAN_CHECK functions.
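
What such a check compares can be modelled in a few lines. The sketch below is not the code the pass emits; it assumes one shadow tag byte per 16-byte granule and a pointer tag in bits 56..63, and `ShadowModel` is a hypothetical class written only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kGranule = 16;   // assumed granule size
constexpr unsigned kTagShift = 56;       // assumed tag position

// Toy shadow memory: one tag byte per granule, all initially zero.
class ShadowModel
{
public:
  explicit ShadowModel(std::size_t bytes) : tags_(bytes / kGranule, 0) {}

  // Set the shadow tag of every granule in [addr, addr + size).
  void tag_region(std::uint64_t addr, std::uint64_t size, std::uint8_t tag)
  {
    assert(addr % kGranule == 0 && size % kGranule == 0);
    for (std::uint64_t g = addr / kGranule; g < (addr + size) / kGranule; ++g)
      tags_.at(g) = tag;
  }

  // Return true if an access of size bytes through tagged_ptr is allowed,
  // i.e. the pointer tag equals the shadow tag of every granule touched.
  bool check(std::uint64_t tagged_ptr, std::uint64_t size) const
  {
    assert(size > 0);
    const std::uint8_t ptr_tag
      = static_cast<std::uint8_t>(tagged_ptr >> kTagShift);
    const std::uint64_t addr = tagged_ptr & ((1ULL << kTagShift) - 1);
    const std::uint64_t first = addr / kGranule;
    const std::uint64_t last = (addr + size - 1) / kGranule;
    for (std::uint64_t g = first; g <= last; ++g)
      if (tags_.at(g) != ptr_tag)
        return false;
    return true;
  }

private:
  std::vector<std::uint8_t> tags_;
};
```

An access that runs past the tagged region, or one made through a pointer carrying a stale tag, fails the comparison in this model.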

2) Instrument known builtin function calls.

Handle all builtin functions that we know use memory accesses.
This commit uses the machinery added for ASAN to identify builtin
functions that access memory.

The main differences between the approaches for HWASAN and ASAN are:
 - libhwasan intercepts far fewer builtin functions.
 - Alloca needs to be transformed differently (instead of adding
   redzones it needs to tag shadow memory and return a tagged pointer).
 - stack_restore needs to untag the shadow stack between the current
   position and where it's going.
 - `noreturn` functions can not be handled by simply unpoisoning the
   entire shadow stack -- there is no "always valid" tag.
   (exceptions and things such as longjmp need to be handled in a
   different way, usually in the runtime).

For hardware implemented checking (such as AArch64's memory tagging
extension) alloca and stack_restore will need to be handled by hooks in
the backend rather than transformation at the gimple level.  This will
allow architecture specific handling of such stack modifications.

3) Introduce HWASAN block-scope poisoning

Here we use exactly the same mechanism as ASAN_MARK to poison/unpoison
variables on entry/exit of a block.

In order to simply use the exact same machinery we're using the same
internal functions until the SANOPT pass.  This means that all handling
of ASAN_MARK is the same.
This has the negative that the naming may be a little confusing, but a
positive that handling of the internal function doesn't have to be
duplicated for a function that behaves exactly the same but has a
different name.

gcc/ChangeLog:

* asan.c (asan_instrument_reads): New.
(asan_instrument_writes): New.
(asan_memintrin): New.
(handle_builtin_stack_restore): Account for HWASAN.
(hwasan_emit_round_up): New.
(handle_builtin_alloca): Account for HWASAN.
(get_mem_refs_of_builtin_call): Special case strlen for HWASAN.
(hwasan_instrument_reads): New.
(hwasan_instrument_writes): New.
(hwasan_memintrin): New.
(report_error_func): Assert not HWASAN.
(build_check_stmt): Make HWASAN_CHECK instead of ASAN_CHECK.
(instrument_derefs): HWASAN does not tag globals.
(instrument_builtin_call): Use new helper functions.
(maybe_instrument_call): Don't instrument `noreturn` functions.
(initialize_sanitizer_builtins): Add new type.
(asan_expand_mark_ifn): Account for HWASAN.
(asan_expand_check_ifn): Assert never called by HWASAN.
(asan_expand_poison_ifn): Account for HWASAN.
(asan_instrument): Branch based on whether using HWASAN or ASAN.
(pass_asan::gate): Return true if sanitizing HWASAN.
(pass_asan_O0::gate): Return true if sanitizing HWASAN.
(hwasan_check_func): New.
(hwasan_expand_check_ifn): New.
(hwasan_expand_mark_ifn): New.
(gate_hwasan): New.
* asan.h (hwasan_expand_check_ifn): New decl.
(hwasan_expand_mark_ifn): New decl.
(gate_hwasan): New decl.
(asan_intercepted_p): Always false for hwasan.
(asan_sanitize_use_after_scope): Account for HWASAN.
* builtin-types.def (BT_FN_PTR_CONST_PTR_UINT8): New.
* gimple-pretty-print.c (dump_gimple_call_args): Account for
HWASAN.
* gimplify.c (asan_poison_variable): Account for HWASAN.
(gimplify_function_tree): Remove requirement of
SANITIZE_ADDRESS, requiring asan or hwasan is accounted for in
`asan_sanitize_use_after_scope`.
* internal-fn.c (expand_HWASAN_CHECK): New.
(expand_HWASAN_ALLOCA_UNPOISON): New.
(expand_HWASAN_CHOOSE_TAG): New.
(expand_HWASAN_MARK): New.
(expand_HWASAN_SET_TAG): New.
* internal-fn.def (HWASAN_ALLOCA_UNPOISON): New.
(HWASAN_CHOOSE_TAG): New.
(HWASAN_CHECK): New.
(HWASAN_MARK): New.
(HWASAN_SET_TAG): New.
* sanitizer.def (BUILT_IN_HWASAN_LOAD1): New.
(BUILT_IN_HWASAN_LOAD2): New.
(BUILT_IN_HWASAN_LOAD4): New.
(BUILT_IN_HWASAN_LOAD8): New.
(BUILT_IN_HWASAN_LOAD16): New.
(BUILT_IN_HWASAN_LOADN): New.
(BUILT_IN_HWASAN_STORE1): New.
(BUILT_IN_HWASAN_STORE2): New.
(BUILT_IN_HWASAN_STORE4): New.
(BUILT_IN

Re: [PATCH] Remove lambdas from _Rb_tree

2020-11-19 Thread François Dumont via Gcc-patches

On 19/11/20 12:31 pm, Jonathan Wakely wrote:

On 19/11/20 07:46 +0100, François Dumont via Libstdc++ wrote:

On 18/11/20 12:50 am, Jonathan Wakely wrote:

On 17/11/20 21:51 +0100, François Dumont via Libstdc++ wrote:
This is a change that has been done to _Hashtable and that I forgot 
to propose for _Rb_tree.


The _GLIBCXX_XREF macro can be easily removed of course.

    libstdc++: _Rb_tree code cleanup, remove lambdas.

    Use an additional template parameter on the clone method to propagate
    if the values must be copy or move rather than lambdas.

    libstdc++-v3/ChangeLog:

        * include/bits/move.h (_GLIBCXX_XREF): New.
        * include/bits/stl_tree.h: Adapt to use latter.
        (_Rb_tree<>::_S_fwd_value_for): New.
        (_Rb_tree<>::_M_clone_node): Add _Tree template parameter.
        Use _S_fwd_value_for.
        (_Rb_tree<>::_M_cbegin): New.
        (_Rb_tree<>::_M_begin): Use latter.
        (_Rb_tree<>::_M_copy): Add _Tree template parameter.
        (_Rb_tree<>::_M_move_data): Use rvalue reference for _Rb_tree
        parameter.
        (_Rb_tree<>::_M_move_assign): Likewise.


Tested under Linux x86_64.

Ok to commit ?


GCC is in stage 3 now, so this should have been posted last week
really.


Ok, no problem, it can wait.

Still, following your advice, here is what I came up with; it is much
simpler indeed.


Yes, this simpler patch looks promising even though it's stage 3.


I have only run a few tests for the moment, but so far so good.

Thanks




diff --git a/libstdc++-v3/include/bits/move.h 
b/libstdc++-v3/include/bits/move.h

index 5a4dbdc823c..e0d68ca9108 100644
--- a/libstdc++-v3/include/bits/move.h
+++ b/libstdc++-v3/include/bits/move.h
@@ -158,9 +158,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  /// @} group utilities

+#define _GLIBCXX_XREF(_Tp) _Tp&&


I think this does improve the code that uses this. But the correct
name for this is forwarding reference, so I think FWDREF would be
better than XREF. XREF doesn't tell me anything about what it's for.


#define _GLIBCXX_MOVE(__val) std::move(__val)
#define _GLIBCXX_FORWARD(_Tp, __val) std::forward<_Tp>(__val)
#else
+#define _GLIBCXX_XREF(_Tp) const _Tp&
#define _GLIBCXX_MOVE(__val) (__val)
#define _GLIBCXX_FORWARD(_Tp, __val) (__val)
#endif
diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h

index ec141ea01c7..128c7e2c892 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -478,11 +478,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

    template
      _Link_type
-#if __cplusplus < 201103L
-      operator()(const _Arg& __arg)
-#else
-      operator()(_Arg&& __arg)
-#endif
+      operator()(_GLIBCXX_XREF(_Arg) __arg)
      {
        _Link_type __node = 
static_cast<_Link_type>(_M_extract());

        if (__node)
@@ -544,11 +540,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

    template
      _Link_type
-#if __cplusplus < 201103L
-      operator()(const _Arg& __arg) const
-#else
-      operator()(_Arg&& __arg) const
-#endif
+      operator()(_GLIBCXX_XREF(_Arg) __arg) const
      { return _M_t._M_create_node(_GLIBCXX_FORWARD(_Arg, 
__arg)); }


      private:
@@ -655,11 +647,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    _M_put_node(__p);
      }

-      template
+#if __cplusplus >= 201103L
+      template
+    static constexpr
+    typename conditional::value,
+                 const value_type&, 
value_type&&>::type

+    _S_fwd_value_for(value_type& __val) noexcept
+    { return std::move(__val); }
+#else
+      template
+    static const value_type&
+    _S_fwd_value_for(value_type& __val)
+    { return __val; }
+#endif
+
+      template
    _Link_type
-    _M_clone_node(_Const_Link_type __x, _NodeGen& __node_gen)
+    _M_clone_node(_GLIBCXX_XREF(_Tree),


Since the _Tree type is only used to decide whether to copy or move,
could it just be a bool instead?

      template
     _Link_type
    _M_clone_node(_Link_type __x, _NodeGen& __node_gen)

Then it would be called as _M_clone_node<_Move>(__x, __node_gen)
instead of _M_clone_node(_GLIBCXX_FORWARD(_Tree, __t), __x, 
__node_gen).

That seems easier to read.
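
A minimal sketch of the bool-parameter idea, written against a toy node type rather than libstdc++'s internals. `Node`, `AllocNode` and `clone_node` are hypothetical stand-ins; the real `_M_clone_node` and its node generators differ in detail.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Toy node; not libstdc++'s _Rb_tree_node.
struct Node
{
  std::string value;
};

// Toy node generator: builds a node from whatever it is given, copying from
// lvalues and moving from rvalues.
struct AllocNode
{
  template<typename Arg>
  std::unique_ptr<Node> operator()(Arg&& arg) const
  { return std::unique_ptr<Node>(new Node{std::forward<Arg>(arg)}); }
};

// Clone src, moving its value when Move is true and copying it otherwise.
// The decision is a compile-time bool, so no tree object has to be passed
// just to carry its value category.
template<bool Move, typename NodeGen>
std::unique_ptr<Node> clone_node(Node* src, NodeGen& node_gen)
{
  assert(src != nullptr);
  using Ref = typename std::conditional<Move, std::string&&,
                                        const std::string&>::type;
  return node_gen(static_cast<Ref>(src->value));
}
```

A caller then writes `clone_node<false>(&n, gen)` to copy or `clone_node<true>(&n, gen)` to move, which mirrors the `_M_clone_node<_Move>(__x, __node_gen)` spelling suggested above.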


+              _Link_type __x, _NodeGen& __node_gen)
    {
-      _Link_type __tmp = __node_gen(*__x->_M_valptr());
+      _Link_type __tmp
+        = 
__node_gen(_S_fwd_value_for<_Tree>(*__x->_M_valptr()));


Is _S_fwd_value_for necessary? This would work:

#if __c

Re: [committed] libstdc++: Use custom timespec in system calls [PR 93421]

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 18/11/20 20:22 +, Jonathan Wakely wrote:

On 18/11/20 00:01 +, Jonathan Wakely wrote:

On 14/11/20 14:23 +, Jonathan Wakely wrote:

On Sat, 14 Nov 2020, 13:30 Mike Crowe wrote:

@@ -195,7 +205,7 @@ namespace
 if (__s.count() < 0) [[unlikely]]
   return false;

- struct timespec rt;
+ syscall_timespec rt;
 if (__s.count() > __int_traits::__max) [[unlikely]]
   rt.tv_sec = __int_traits::__max;


Do these now need to be __int_traits::__max in case time_t is 64-bit
yet syscall_timespec is using 32-bit long?



Ah yes. Maybe decltype(rt.tv_sec).
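
The `decltype(rt.tv_sec)` suggestion could look roughly like this. The struct below is a hypothetical stand-in that assumes 32-bit fields to make the narrowing visible; it is not the `syscall_timespec` from the patch, and `to_syscall_timespec` is an illustrative helper name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Stand-in for a kernel-facing timespec whose fields may be narrower than
// time_t.  The 32-bit width is an assumption for this sketch.
struct syscall_timespec_sketch
{
  std::int32_t tv_sec;
  std::int32_t tv_nsec;
};

// Convert a non-negative duration, saturating the seconds field at the
// maximum of the field's own type rather than at the maximum of time_t.
inline syscall_timespec_sketch
to_syscall_timespec(std::chrono::seconds s, std::chrono::nanoseconds ns)
{
  assert(s.count() >= 0);
  assert(ns.count() >= 0 && ns.count() < 1000000000);

  syscall_timespec_sketch rt;
  using sec_type = decltype(rt.tv_sec);
  constexpr sec_type max_sec = std::numeric_limits<sec_type>::max();

  if (s.count() > max_sec)
    rt.tv_sec = max_sec;
  else
    rt.tv_sec = static_cast<sec_type>(s.count());
  rt.tv_nsec = static_cast<decltype(rt.tv_nsec)>(ns.count());
  return rt;
}
```

Deriving the limit from the field type keeps the clamp correct even if the struct definition changes later.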


I'll fix that in the next patch.


And here's that next patch. I'm testing this and will commit if all
goes well.


Committed.



Re: [PATCH] libstdc++: Enable without gthreads

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 16/11/20 14:43 -0800, Thomas Rodgers wrote:

This patch looks good to me.


Committed now.


It would be great to find a way to do a similar refactoring of 
condition_variable.


Yes, probably once stage 1 opens for GCC 12.



On Nov 12, 2020, at 9:07 AM, Jonathan Wakely via Libstdc++ 
 wrote:

On 11/11/20 17:31 +, Jonathan Wakely wrote:

On 11/11/20 16:13 +, Jonathan Wakely wrote:

This makes it possible to use std::thread in single-threaded builds.
All member functions are available, but attempting to create a new
thread will throw an exception.

The main benefit for most targets is that other headers such as 
do not need to include the whole of  just to be able to create a
std::thread. That avoids including  and std::jthread where
not required.


I forgot to mention that this patch also reduces the size of the
 header, by only including  instead of the
whole of . That could be done separately from the rest of the
changes here.

It would be possible to split std::thread and this_thread::get_id()
into a new header without also making them work without gthreads.

That would still reduce the size of the  header, because it
wouldn't need the whole of . But it wouldn't get rid of
preprocessor checks for _GLIBCXX_HAS_GTHREADS in .

Allowing std::this_thread::get_id() and std::this_thread::yield() to
work without threads seems worth doing (we already make
std::this_thread::sleep_until and std::this_thread::sleep_for work
without threads).
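
For illustration, the kind of code this is meant to keep working is code that queries or yields the current thread without ever creating a new one. The helper below is a hypothetical example using only standard `std::this_thread` functions from `<thread>`.

```cpp
#include <cassert>
#include <thread>

// Uses only the calling thread: it reads the current thread id, yields, and
// checks that the id is unchanged.  No std::thread object is constructed.
inline bool same_thread_id_across_yield()
{
  const std::thread::id before = std::this_thread::get_id();
  std::this_thread::yield();
  return before == std::this_thread::get_id();
}
```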


Here's a slightly more conservative version of the patch. This moves
std::thread and this_thread::get_id() and this_thread::yield() to a
new header, and makes *most* of std::thread defined without gthreads
(because we need the nested thread::id type to be returned from
this_thread::get_id()). But it doesn't declare the std::thread
constructor that creates new threads.

That means std::thread is present, but you can't even try to create
new threads. This means we don't need to export the std::thread
symbols from libstdc++.so for a target where they are unusable and
just throw an exception.

This still has the main benefits of making  include a lot less
code, and removing some #if conditions in .

One other change from the previous patch worth mentioning is that I've
made  include  so that
std::reference_wrapper (and std::ref and std::cref) are defined by
. That isn't required, but it is a tiny header and being able
to use std::ref to pass lvalues to new threads without including
all of  seems like a kindness to users.

Both this and the previous patch require some GDB changes, because GDB
currently assumes that if std::thread is declared in  that it
is usable and multiple threads are supported. That's no longer true,
because we would declare a useless std::thread after this patch. Tom
Tromey has patches to make GDB handle this though.

Tested powerpc64le-linux, --enable-threads and --disable-threads.

Thoughts?









Re: Fix handling of gimple_clobber in ICF

2020-11-19 Thread Richard Biener
On Thu, 19 Nov 2020, Jan Hubicka wrote:

> Hi,
> after fixing a few issues I got to a stage where 1.4M icf mismatches are due to
> comparing two gimple clobbers.  The problem is that operand_equal_p matches
> clobbers in
> 
> case CONSTRUCTOR:
>  /* In GIMPLE empty constructors are allowed in initializers of
> aggregates.  */
>  return !CONSTRUCTOR_NELTS (arg0) && !CONSTRUCTOR_NELTS (arg1);
> 
> But this happens too late after comparing semantics of its type (that
> are not very relevant for memory store and fails way too often).
> 
> In the context of ipa-icf we do not really need to match RHS of gimple 
> clobbers:
> it is enough to know that the LHS stores can be considered equivalent.
> 
> I thus added logic to hash them all the same way and compare using the
> TREE_CLOBBER_P flag.  Other options I see are extending operand_equal_p
> in fold-const to handle them more generously, or making the stmt hash and
> compare skip comparing/hashing the RHS of gimple_clobber_p statements.
> 
> I am now down to 1453 operand_equal_p mismatches so it seems we are getting
> into shape.
> 
> Bootstrapped/regtested x86_64, looks reasonable?
> Honza
> 
>   * ipa-icf-gimple.c (func_checker::hash_operand): Hash gimple clobber.
>   (func_checker::operand_equal_p): Special case gimple clobber.
> diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
> index ffb1baddbdb..69bc9ab5b34 100644
> --- a/gcc/ipa-icf-gimple.c
> +++ b/gcc/ipa-icf-gimple.c
> @@ -245,6 +245,14 @@ func_checker::hash_operand (const_tree arg, 
> inchash::hash &hstate,
>break;
>  }
>  
> +  /* In gimple all clobbers can be considered equal: while comparaing two
> + gimple clobbers we match the left hand memory accesses.  */
> +  if (TREE_CLOBBER_P (arg))
> +{
> +  hstate.add_int (0xc10bbe5);
> +  return;
> +}
> +
>return operand_compare::hash_operand (arg, hstate, flags);
>  }
>  
> @@ -306,6 +314,10 @@ func_checker::operand_equal_p (const_tree t1, const_tree 
> t2,
>  default:
>break;
>  }
> +  /* In gimple all clobbers can be considered equal.  We match the right hand

left hand


otherwise yes, I guess this will work for ICF.

> + memory accesses.  */
> +  if (TREE_CLOBBER_P (t1) || TREE_CLOBBER_P (t2))
> +return TREE_CLOBBER_P (t1) == TREE_CLOBBER_P (t2);
>  
>return operand_compare::operand_equal_p (t1, t2, flags);
>  }
> 
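
The idea in the patch above, hashing every clobber to the same value and comparing clobbers only by the flag, can be modelled on a toy operand type. `Operand`, `hash_operand` and `operands_equal` below are hypothetical simplifications, not GCC's `tree` or `func_checker`; only the constant 0xc10bbe5 is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Toy operand.  For clobbers, type_name and value play no role.
struct Operand
{
  bool is_clobber;
  std::string type_name;
  long value;
};

// Hash an operand; all clobbers get the same fixed hash.
inline std::size_t hash_operand(const Operand &op)
{
  if (op.is_clobber)
    return 0xc10bbe5;
  const std::size_t h = std::hash<std::string>()(op.type_name);
  return h ^ (std::hash<long>()(op.value) + 0x9e3779b9 + (h << 6) + (h >> 2));
}

// Compare two operands; a clobber equals any other clobber and nothing else.
inline bool operands_equal(const Operand &a, const Operand &b)
{
  bool equal;
  if (a.is_clobber || b.is_clobber)
    equal = a.is_clobber == b.is_clobber;
  else
    equal = a.type_name == b.type_name && a.value == b.value;
  // Equal operands must hash alike, otherwise hash-based lookup breaks.
  assert(!equal || hash_operand(a) == hash_operand(b));
  return equal;
}
```

Keeping the hash and the comparison in step is the important part: if clobbers compared equal but still hashed by type, candidates would land in different buckets and never be compared.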

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH] PowerPC Fix ibm128 defaults for pr70117.c test.

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 03:08:05AM -0500, Michael Meissner wrote:
> On Wed, Nov 18, 2020 at 04:29:09PM -0600, Segher Boessenkool wrote:
> > Mike, please add a comment, what number it represents?  Okay for trunk
> > with that, thanks.
> > 
> > (Should those not be define in some header though?)
> 
> When long double is IBM extended double, then LDBL_MAX, etc. is set with 
> math.h
> (and the __ version created by the compiler).  We don't have min/max for the
> funky MD only floating point numbers defined.  I got the number by printing
> LDBL_MAX in fact and just pasting that in.

Sure -- I am suggesting to always define __IBM128_MAX__ and the like,
which then can be used to define LDBL_MAX, but also can be used
directly.


Segher


Re: [PATCH] vect: Add a “very cheap” cost model

2020-11-19 Thread Richard Biener via Gcc-patches
On Thu, Nov 19, 2020 at 1:04 PM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Mon, Nov 16, 2020 at 10:58 AM Richard Sandiford
> >  wrote:
> >> > Does the patch also vectorize with SVE loops that have
> >> > unknown loop bound?  The documentation isn't entirely
> >> > conclusive there.
> >>
> >> Yeah, for SVE it vectorises.  How about changing:
> >>
> >>   For example, if each iteration of a vectorized loop would handle
> >>   exactly four iterations, …
> >>
> >> to:
> >>
> >>   For example, if each iteration of a vectorized loop could only
> >>   handle exactly four iterations of the original scalar loop, …
> >>
> >> ?
> >
> > Yeah, guess that's better.
> >
> >>
> >> > Iff the iteration count is a multiple of two and the target can
> >> > vectorize the loop with both VF 2 and VF 4 but VF 4 would be better if
> >> > we'd use the 'cheap' cost model, does 'very-cheap' not vectorize the
> >> > loop or does it choose VF 2?
> >>
> >> It would choose VF 2, if that's still a win over scalar code.
> >
> > OK, that's what I expected.  The VF iteration is one source of
> > compile-time that we might want to avoid somehow ... on
> > x86_64 knowing the precise number of constant iterations
> > should allow to only pick a subset of vector modes based on
> > largest_pow2_factor or so?  Or maybe just use the preferred
> > SIMD mode for cheap/very-cheap?  (maybe pass down
> > the cost model kind to the target hook so targets can decide
> > for themselves here)
>
> On the preferred simd mode thing: TBH, I'd prefer to get rid
> of that hook one day and just rely on autovectorize_vector_modes.
>
> The difficulty with adding an early check is that we don't know ahead
> of time which types of scalar element a loop operates on: we only find
> that out on the fly during the first analysis of the loop.  The check
> would also depend on SLP grouping: we can use a vector of 4 ints to
> handle 2 iterations of the scalar loop if the ints are in an SLP
> group of size 2.
>
> So I agree it would be nice to have early-outs, but I think we'd have
> to restructure things first.  E.g. maybe we could do some “cheap” initial
> analysis that checks for basic vectorisability, records which scalar
> elements are used by the loop, and records how big the containing SLP
> groups might be (based on optimistic assumptions).  Then we can use
> that to prefilter the modes we try (perhaps all the way down to no modes).
> I guess that's conceptually similar to building an SLP graph though.
>
> Does the attached look OK?  I've included a version of the updated
> wording above.  I also changed this condition to use “>” rather
> than “>=”:

Yes, OK.

Thanks,
Richard.

>
>   /* If the vector loop needs multiple iterations to be beneficial then
>  things are probably too close to call, and the conservative thing
>  would be to stick with the scalar code.  */
>   if (flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP
>   && min_profitable_estimate > (int) vect_vf_for_cost (loop_vinfo))
>
> since when min_profitable_estimate == min_profitable_iters
> we'll have done:
>
>   if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>   && min_profitable_iters < (assumed_vf + peel_iters_prologue))
> /* We want the vectorized loop to execute at least once.  */
> min_profitable_iters = assumed_vf + peel_iters_prologue;
>
> I also tried to make vect-cost-model-4.c more resilient on targets
> that require alignment.
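
The reason for ">" rather than ">=" can be seen in a small model of the two quoted conditions. The functions below are hypothetical mirrors with plain integers, not the vectorizer's code, and they assume no prologue peeling in the very-cheap case, as the ChangeLog describes.

```cpp
#include <cassert>

// Mirror of the quoted clamping: without partial vectors, the vector loop is
// required to run at least once.
inline int clamp_min_profitable_iters(int min_profitable_iters,
                                      int assumed_vf,
                                      int peel_iters_prologue,
                                      bool using_partial_vectors)
{
  assert(assumed_vf > 0 && peel_iters_prologue >= 0);
  if (!using_partial_vectors
      && min_profitable_iters < assumed_vf + peel_iters_prologue)
    return assumed_vf + peel_iters_prologue;
  return min_profitable_iters;
}

// Mirror of the very-cheap test: reject only if more than one vector
// iteration is needed before vectorization pays off.
inline bool very_cheap_rejects(int min_profitable_estimate, int vf)
{
  return min_profitable_estimate > vf;
}
```

After clamping, an estimate can equal the vectorization factor exactly, so a ">=" test would reject a loop that already pays for itself in one vector iteration.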
>
> Thanks,
> Richard
>
>
> gcc/
> * doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
> * common.opt (fvect-cost-model=): Add very-cheap as a possible option.
> (fsimd-cost-model=): Likewise.
> (vect_cost_model): Add very-cheap.
> * flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
> Put the values in order of increasing aggressiveness.
> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
> range checks when comparing against VECT_COST_MODEL_CHEAP.
> (vect_prune_runtime_alias_test_list): Do not allow any alias
> checks for the very-cheap cost model.
> * tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
> any peeling for the very-cheap cost model.  Also require one
> iteration of the vector loop to pay for itself.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-cost-model-1.c: New test.
> * gcc.dg/vect/vect-cost-model-2.c: Likewise.
> * gcc.dg/vect/vect-cost-model-3.c: Likewise.
> * gcc.dg/vect/vect-cost-model-4.c: Likewise.
> * gcc.dg/vect/vect-cost-model-5.c: Likewise.
> * gcc.dg/vect/vect-cost-model-6.c: Likewise.
> ---
>  gcc/common.opt|  7 +++--
>  gcc/doc/invoke.texi   | 12 +++--
>  gcc/flag-types.h  | 10 ---
>  gcc/testsuite/gcc.dg/vect/vect-cost-model-1.c | 11 
>  gcc/testsuite/gcc.dg/vect/vect-cost-model-2.c | 1

Re: [PATCH] PowerPC Fix ibm128 defaults for pr70117.c test.

2020-11-19 Thread Segher Boessenkool
On Wed, Nov 18, 2020 at 07:13:04PM -0600, Paul A. Clarke wrote:
> On Wed, Nov 18, 2020 at 04:29:09PM -0600, Segher Boessenkool wrote:
> > Mike, please add a comment, what number it represents?  Okay for trunk
> > with that, thanks.
> > 
> > (Should those not be define in some header though?)
> 
> Would it be better to represent the number in hex, like with printf's '%a'
> formatting (e.g. "0x1.921fb54442d18p+0"...this is NOT the same value)?
> 
> (I always get nervous when I see a long float hardcoded in decimal.)

Yeah, good point, thanks.

That's indeed why I converted it to hex, to see what it *is* :-)

Hexadecimal float is C99 so we cannot use it in the installed headers,
but it should be fine in most testcases now.  And even in the headers
we can put it in a comment of course.
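
As a small illustration of why a hexadecimal form is easier to trust, the sketch below prints a `double` with `%a` and parses it back. It uses plain `double` rather than `__ibm128`, which is an assumption made so the example builds on any target; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Format value as a hexadecimal floating-point string using printf's %a.
inline std::string to_hex_float(double value)
{
  char buf[64];
  const int n = std::snprintf(buf, sizeof buf, "%a", value);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
  return std::string(buf, static_cast<std::size_t>(n));
}

// Parse a hexadecimal floating-point string back into a double.
inline double from_hex_float(const std::string &text)
{
  return std::strtod(text.c_str(), nullptr);
}
```

Because every hex digit maps to exactly four mantissa bits, the text form shows the bit pattern directly and survives a round trip without any decimal rounding.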


Segher


Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Martin Liška

On 11/19/20 1:10 PM, Eric Botcazou wrote:

Does it?
Only with --enable-link-serialization, or without that too?


It's the default, i.e. without --enable-link-serialization.


Will have a look.


Thanks.



I see the same, I don't use --enable-link-serialization.
Just run:
make -j16 && make -j16

Martin



Re: [AArch64] Add --with-tune configure flag

2020-11-19 Thread Wilco Dijkstra via Gcc-patches
Hi,

>>>    As for your second patch, --with-cpu-64 could be a simple alias indeed,
>>>    but what is the exact definition/expected behaviour of a --with-cpu-32
>>>    on a target that only supports 64-bit code? The AArch64 target cannot
>>>    generate AArch32 code, so we shouldn't silently accept it.
>> 
>> IMO allowing users to specify all the flags available on x86 is important.
>> 
>
> This isn't about general users though; it's about how you configure the
> compiler and that's not all the same.  I don't mind the --with-cpu-64 as
> a strict alias for --with-cpu, but --with-cpu-32 is both redundant and
> misleading as it might give the impression that it does something useful.

We could make it do something useful, for example emit a warning, an error
or default to -mabi=ilp32 (since that is similar to what other targets do).
Anything is better than being the only target that doesn't support it...

Cheers,
Wilco


Re: [PATCH v2] Add if-chain to switch conversion pass.

2020-11-19 Thread Richard Biener via Gcc-patches
On Wed, Nov 18, 2020 at 1:25 PM Martin Liška  wrote:
>
> On 11/16/20 1:21 PM, Richard Biener wrote:
> > but the most trivial thing would be to feed the pass the
> > balanced-tree generated by switch expansion where I
> > would expect us to be able to detect it as the original switch again.
>
> Well, if we want to support such matching, then please defer it to a phase 2.
> I don't see it as a common pattern that people write such code in the wild.

I didn't expect to actually support the matching, just to have the code
structured in a way that makes it easier.  I guess it's close enough to go
forward with the current scheme though.

> Right now, we have some local analysis and one can eventually build a more
> advanced algorithm on top of it. Can we please make progress for GCC 11 with
> the current approach that will cover quite some interesting if-chains?

OK, so can you send an updated patch?

Thanks,
Richard.

> Thanks,
> Martin


Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 19, 2020 at 01:10:18PM +0100, Eric Botcazou wrote:
> > Does it?
> > Only with --enable-link-serialization, or without that too?
> 
> It's the default, i.e. without --enable-link-serialization.
> 
> > Will have a look.
> 
> Thanks.

So, I think the problem is that for make .PHONY targets are just
"rebuilt" always, so it is very much undesirable for the cc1plus$(exeext)
etc. dependencies to include .PHONY targets, but I was using
them - cc1plus.prev which would depend on some *.serial and
e.g. cc1.serial depending on c and c depending on cc1$(exeext).

The following so far only very lightly tested patch rewrites this
so that *.serial and *.prev aren't .PHONY targets, but instead just
make variables.

I was worried that the order in which the language makefile fragments are
included (which is quite random, what order we get from the filesystem
matching */config-lang.in) would be a problem but it seems to work fine.

2020-11-19  Jakub Jelinek  

gcc/
* configure.ac: In SERIAL_LIST use lang words without .serial
suffix.  Change $lang.prev from a target to variable and instead
of depending on *.serial expand to the *.serial variable if
the word is in the SERIAL_LIST at all, otherwise to nothing.
* configure: Regenerated.
gcc/c/
* Make-lang.in (c.serial): Change from goal to a variable.
(.PHONY): Drop c.serial.
gcc/ada/
* gcc-interface/Make-lang.in (ada.serial): Change from goal to a
variable.
(.PHONY): Drop ada.serial and ada.prev.
(gnat1$(exeext)): Depend on $(ada.serial) rather than ada.serial.
gcc/brig/
* Make-lang.in (brig.serial): Change from goal to a variable.
(.PHONY): Drop brig.serial and brig.prev.
(brig1$(exeext)): Depend on $(brig.serial) rather than brig.serial.
gcc/cp/
* Make-lang.in (c++.serial): Change from goal to a variable.
(.PHONY): Drop c++.serial and c++.prev.
(cc1plus$(exeext)): Depend on $(c++.serial) rather than c++.serial.
gcc/d/
* Make-lang.in (d.serial): Change from goal to a variable.
(.PHONY): Drop d.serial and d.prev.
(d21$(exeext)): Depend on $(d.serial) rather than d.serial.
gcc/fortran/
* Make-lang.in (fortran.serial): Change from goal to a variable.
(.PHONY): Drop fortran.serial and fortran.prev.
(f951$(exeext)): Depend on $(fortran.serial) rather than
fortran.serial.
gcc/go/
* Make-lang.in (go.serial): Change from goal to a variable.
(.PHONY): Drop go.serial and go.prev.
(go1$(exeext)): Depend on $(go.serial) rather than go.serial.
gcc/jit/
* Make-lang.in (jit.serial): Change from goal to a
variable.
(.PHONY): Drop jit.serial and jit.prev.
($(LIBGCCJIT_FILENAME)): Depend on $(jit.serial) rather than
jit.serial.
gcc/lto/
* Make-lang.in (lto1.serial, lto2.serial): Change from goals to
variables.
(.PHONY): Drop lto1.serial, lto2.serial, lto1.prev and lto2.prev.
($(LTO_EXE)): Depend on $(lto1.serial) rather than lto1.serial.
($(LTO_DUMP_EXE)): Depend on $(lto2.serial) rather than lto2.serial.
gcc/objc/
* Make-lang.in (objc.serial): Change from goal to a variable.
(.PHONY): Drop objc.serial and objc.prev.
(cc1obj$(exeext)): Depend on $(objc.serial) rather than objc.serial.
gcc/objcp/
* Make-lang.in (obj-c++.serial): Change from goal to a variable.
(.PHONY): Drop obj-c++.serial and obj-c++.prev.
(cc1objplus$(exeext)): Depend on $(obj-c++.serial) rather than
obj-c++.serial.

--- gcc/configure.ac.jj 2020-11-19 12:34:10.763514118 +0100
+++ gcc/configure.ac	2020-11-19 15:23:24.121386881 +0100
@@ -6864,7 +6864,7 @@ echo "SERIAL_LIST =" >> Make-hooks
 echo else >> Make-hooks
 lang_cnt=0
 lang_list=
-prev=c.serial
+prev=c
 serialization_languages=c
 for lang in $all_selected_languages
 do
@@ -6880,7 +6880,7 @@ do
test $lang = c && continue
lang_cnt=`expr $lang_cnt + 1`
lang_list=" $prev$lang_list"
-   prev=${lang}.serial
+   prev=${lang}
 done
 echo "SERIAL_LIST = \$(wordlist 
\$(DO_LINK_SERIALIZATION),$lang_cnt,$lang_list)" >> Make-hooks
 echo endif >> Make-hooks
@@ -6890,7 +6890,7 @@ lang_idx=1
 for lang in $serialization_languages
 do
test $lang = c && continue
-   echo "$lang.prev: \$(word $lang_cnt,\$(SERIAL_LIST))" >> Make-hooks
+   echo "$lang.prev = \$(if \$(word $lang_cnt,\$(SERIAL_LIST)),\$(\$(word 
$lang_cnt,\$(SERIAL_LIST)).serial))" >> Make-hooks
echo "INDEX.$lang = $lang_idx" >> Make-hooks
lang_cnt=`expr $lang_cnt - 1`
lang_idx=`expr $lang_idx + 1`
--- gcc/c/Make-lang.in.jj   2020-11-19 12:34:10.752514244 +0100
+++ gcc/c/Make-lang.in  2020-11-19 14:24:32.367043794 +0100
@@ -37,10 +37,10 @@
 #
 # Define the names for selecting c in LANGUAGES.
 c: cc1$(exeext)
-c.serial: c
+c.serial = cc1$(exeext)
 
 # Tell GNU make to ignore these if they ex

preprocessor: main file searching

2020-11-19 Thread Nathan Sidwell
this patch is slightly modified from the original 07 patch, due to the 
cleanup I posted earlier today.


This adds the capability to locate the main file on the user or system
include paths.  That's extremely useful to users building header
units.  Searching has to be requested (plain header-unit compilation
will not search).  Also, to make include_next work as expected when
building a header unit, we add a mechanism to retrofit a non-searched
source file as one on the include path.
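
The prefix match used to retrofit the main file onto an include directory can be sketched in isolation.  This is only an illustration: lies_under_dir is a hypothetical name, std::string replaces the cpp_dir fields, and '/' is assumed to be the only directory separator (the patch itself uses IS_DIR_SEPARATOR and filename_ncmp).

```cpp
#include <string>

// Hypothetical helper, not part of libcpp: true if NAME lies under
// directory DIR, i.e. DIR is a proper prefix of NAME and the character
// right after the prefix is a directory separator.
// Assumption: only '/' counts as a separator here.
bool
lies_under_dir (const std::string &dir, const std::string &name)
{
  return dir.size () < name.size ()
	 && name[dir.size ()] == '/'
	 && name.compare (0, dir.size (), dir) == 0;
}
```

The separator test matters: without it, a directory such as /usr/inc would wrongly match a file under /usr/include.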

libcpp/
* include/cpplib.h (enum cpp_main_search): New.
(struct cpp_options): Add main_search field.
(cpp_main_loc): Declare.
(cpp_retrofit_as_include): Declare.
* internal.h (struct cpp_reader): Add main_loc field.
(_cpp_in_main_source_file): Not main if main is a header.
* init.c (cpp_read_main_file): Use main_search option to locate
main file.  Set main_loc
* files.c (cpp_retrofit_as_include): New.

pushing to trunk.

nathan
--
Nathan Sidwell
diff --git i/libcpp/files.c w/libcpp/files.c
index ba52d2bf3cf..301b2379a23 100644
--- i/libcpp/files.c
+++ w/libcpp/files.c
@@ -1131,6 +1131,37 @@ cpp_find_header_unit (cpp_reader *pfile, const char *name, bool angle,
   return file->path;
 }
 
+/* Retrofit the just-entered main file asif it was an include.  This
+   will permit correct include_next use, and mark it as a system
+   header if that's where it resides.  We use filesystem-appropriate
+   prefix matching of the include path to locate the main file.  */
+void
+cpp_retrofit_as_include (cpp_reader *pfile)
+{
+  /* We should be the outermost.  */
+  gcc_assert (!pfile->buffer->prev);
+
+  if (const char *name = pfile->main_file->name)
+{
+  /* Locate name on the include dir path, using a prefix match.  */
+  size_t name_len = strlen (name);
+  for (cpp_dir *dir = pfile->quote_include; dir; dir = dir->next)
+	if (dir->len < name_len
+	&& IS_DIR_SEPARATOR (name[dir->len])
+	&& !filename_ncmp (name, dir->name, dir->len))
+	  {
+	pfile->main_file->dir = dir;
+	if (dir->sysp)
+	  cpp_make_system_header (pfile, 1, 0);
+	break;
+	  }
+}
+
+  /* Initialize controlling macro state.  */
+  pfile->mi_valid = true;
+  pfile->mi_cmacro = 0;
+}
+
 /* Could not open FILE.  The complication is dependency output.  */
 static void
 open_file_failed (cpp_reader *pfile, _cpp_file *file, int angle_brackets,
diff --git i/libcpp/include/cpplib.h w/libcpp/include/cpplib.h
index 630f2e055d1..91226cfc248 100644
--- i/libcpp/include/cpplib.h
+++ w/libcpp/include/cpplib.h
@@ -308,6 +308,15 @@ enum cpp_normalize_level {
   normalized_none
 };
 
+enum cpp_main_search 
+{
+  CMS_none,/* A regular source file.  */
+  CMS_header,  /* Is a directly-specified header file (eg PCH or
+		  header-unit).  */
+  CMS_user,/* Search the user INCLUDE path.  */
+  CMS_system,  /* Search the system INCLUDE path.  */
+};
+
 /* This structure is nested inside struct cpp_reader, and
carries all the options visible to the command line.  */
 struct cpp_options
@@ -566,6 +575,8 @@ struct cpp_options
 
   /* The maximum depth of the nested #include.  */
   unsigned int max_include_depth;
+
+  cpp_main_search main_search : 8;
 };
 
 /* Diagnostic levels.  To get a diagnostic without associating a
@@ -997,6 +1008,10 @@ extern const char *cpp_find_header_unit (cpp_reader *, const char *file,
too.  If there was an error opening the file, it returns NULL.  */
 extern const char *cpp_read_main_file (cpp_reader *, const char *,
    bool injecting = false);
+extern location_t cpp_main_loc (const cpp_reader *);
+
+/* Adjust for the main file to be an include.  */
+extern void cpp_retrofit_as_include (cpp_reader *);
 
 /* Set up built-ins with special behavior.  Use cpp_init_builtins()
instead unless your know what you are doing.  */
diff --git i/libcpp/init.c w/libcpp/init.c
index fc826583d3a..f77dc26a003 100644
--- i/libcpp/init.c
+++ w/libcpp/init.c
@@ -675,8 +675,14 @@ cpp_read_main_file (cpp_reader *pfile, const char *fname, bool injecting)
 deps_add_default_target (deps, fname);
 
   pfile->main_file
-= _cpp_find_file (pfile, fname, &pfile->no_search_path, /*angle=*/0,
-		  _cpp_FFK_NORMAL, 0);
+= _cpp_find_file (pfile, fname,
+		  CPP_OPTION (pfile, preprocessed) ? &pfile->no_search_path
+		  : CPP_OPTION (pfile, main_search) == CMS_user
+		  ? pfile->quote_include
+		  : CPP_OPTION (pfile, main_search) == CMS_system
+		  ? pfile->bracket_include : &pfile->no_search_path,
+		  /*angle=*/0, _cpp_FFK_NORMAL, 0);
+
   if (_cpp_find_failed (pfile->main_file))
 return NULL;
 
@@ -698,7 +704,16 @@ cpp_read_main_file (cpp_reader *pfile, const char *fname, bool injecting)
 			 LINEMAP_LINE (last), LINEMAP_SYSP (last));
   }
 
-  return ORDINARY_MAP_FILE_NAME (LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table));
+  auto *map = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
+  pfile->main_lo

Re: Update: [PATCH 5/X] libsanitizer: mid-end: Introduce stack variable handling for HWASAN

2020-11-19 Thread Richard Sandiford via Gcc-patches
Matthew Malcomson  writes:
> […]
> +/* hwasan_frame_base_init_seq is the sequence of RTL insns that will 
> initialize
> +   the hwasan_frame_base_ptr.  When the hwasan_frame_base_ptr is requested, 
> we
> +   generate this sequence but do not emit it.  If the sequence was created it
> +   is emitted once the function body has been expanded.
> +
> +   This delay is because the frame base pointer may be needed anywhere in the
> +   function body, or needed by the expand_used_vars function.  Emitting once 
> in
> +   a known place is simpler than requiring the emition of the instructions to

s/emition/emission/

> +   be know where it should go depending on the first place the hwasan frame
> +   base is needed.  */
> +static GTY(()) rtx_insn *hwasan_frame_base_init_seq = NULL;
> […]
> +/* For stack tagging:
> +
> +   Return the 'base pointer' for this function.  If that base pointer has not
> +   yet been created then we create a register to hold it and record the insns
> +   to initialize the register in `hwasan_frame_base_init_seq` for later
> +   emission.  */
> +rtx
> +hwasan_frame_base ()
> +{
> +  if (! hwasan_frame_base_ptr)
> +{
> +  start_sequence ();
> +  hwasan_frame_base_ptr =
> + force_reg (Pmode,
> +targetm.memtag.insert_random_tag (virtual_stack_vars_rtx,
> +  NULL_RTX));

Nit: should be formatted as:

  hwasan_frame_base_ptr
= force_reg (Pmode,
 targetm.memtag.insert_random_tag (virtual_stack_vars_rtx,
   NULL_RTX));

> […]
> +  size_t length = hwasan_tagged_stack_vars.length ();
> +  hwasan_stack_var *vars = hwasan_tagged_stack_vars.address ();
> +
> +  poly_int64 bot = 0, top = 0;
> +  size_t i = 0;
> +  for (i = 0; i < length; i++)
> +{
> +  hwasan_stack_var& cur = vars[i];

Simpler as:

  poly_int64 bot = 0, top = 0;
  for (hwasan_stack_var &cur : hwasan_tagged_stack_vars)

(GCC style is to add a space before “&”, as for “*”)

> +  poly_int64 nearest = cur.nearest_offset;
> +  poly_int64 farthest = cur.farthest_offset;
> +
> +  if (known_ge (nearest, farthest))
> + {
> +   top = nearest;
> +   bot = farthest;
> + }
> +  else
> + {
> +   /* Given how these values are calculated, one must be known greater
> +  than the other.  */
> +   gcc_assert (known_le (nearest, farthest));
> +   top = farthest;
> +   bot = nearest;
> + }
> +  poly_int64 size = (top - bot);
> +
> +  /* Assert the edge of each variable is aligned to the HWASAN tag 
> granule
> +  size.  */
> +  gcc_assert (multiple_p (top, HWASAN_TAG_GRANULE_SIZE));
> +  gcc_assert (multiple_p (bot, HWASAN_TAG_GRANULE_SIZE));
> +  gcc_assert (multiple_p (size, HWASAN_TAG_GRANULE_SIZE));
> +
> +  rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +  rtx base_tag = targetm.memtag.extract_tag (cur.tagged_base, NULL_RTX);
> +  rtx tag = plus_constant (QImode, base_tag, cur.tag_offset);
> +  tag = hwasan_truncate_to_tag_size (tag, NULL_RTX);
> +
> +  rtx bottom = convert_memory_address (ptr_mode,
> +plus_constant (Pmode,
> +   cur.untagged_base,
> +   bot));
> +  emit_library_call (ret, LCT_NORMAL, VOIDmode,
> +  bottom, ptr_mode,
> +  tag, QImode,
> +  gen_int_mode (size, ptr_mode), ptr_mode);
> +}
> +  /* Clear the stack vars, we've emitted the prologue for them all now.  */
> +  hwasan_tagged_stack_vars.truncate (0);
> +}
> +
> +/* For stack tagging:
> +
> +   Return RTL insns to clear the tags between DYNAMIC and VARS pointers
> +   into the stack.  These instructions should be emitted at the end of
> +   every function.
> +
> +   If `dynamic` is NULL_RTX then no insns are returned.  */
> +rtx_insn *
> +hwasan_emit_untag_frame (rtx dynamic, rtx vars)
> +{
> +  if (! dynamic)
> +return NULL;
> +
> +  start_sequence ();
> +
> +  dynamic = convert_memory_address (ptr_mode, dynamic);
> +  vars = convert_memory_address (ptr_mode, vars);
> +
> +  rtx top_rtx;
> +  rtx bot_rtx;
> +  if (FRAME_GROWS_DOWNWARD)
> +{
> +  top_rtx = vars;
> +  bot_rtx = dynamic;
> +}
> +  else
> +{
> +  top_rtx = dynamic;
> +  bot_rtx = vars;
> +}
> +
> +  rtx size_rtx = expand_simple_binop (ptr_mode, MINUS, top_rtx, bot_rtx,
> +   NULL_RTX, /* unsignedp = */0,
> +   OPTAB_DIRECT);
> +
> +  rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +  emit_library_call (ret, LCT_NORMAL, VOIDmode,
> +  bot_rtx, ptr_mode,
> +  HWASAN_STACK_BACKGROUND, QImode,
> +  size_rtx, ptr_mode);

Nit: “ret” seems like a strange name for this variable, si

Fix PR ada/97805

2020-11-19 Thread Eric Botcazou
We need to include limits.h (or <climits>) in adaint.c because of LLONG_MIN.
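
As a rough illustration of why the header matters, here is a minimal sketch that selects the header the same way and then uses LLONG_MIN.  The function safe_negate is hypothetical and does not come from adaint.c.

```cpp
#ifdef __cplusplus
#include <climits>
#else
#include <limits.h>
#endif

/* Hypothetical helper, not from adaint.c: negate V without ever
   evaluating -LLONG_MIN, which would overflow.  LLONG_MIN and
   LLONG_MAX are only visible once the header above is included.  */
long long
safe_negate (long long v)
{
  return v == LLONG_MIN ? LLONG_MAX : -v;
}
```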

Tested on x86-64/Linux, applied on the mainline.


2020-11-19  Eric Botcazou  

PR ada/97805
* adaint.c: Include climits in C++ and limits.h otherwise.

-- 
Eric Botcazou
diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
index 560f3529442..f5432626ee6 100644
--- a/gcc/ada/adaint.c
+++ b/gcc/ada/adaint.c
@@ -145,6 +145,13 @@
 #include "version.h"
 #endif
 
+/* limits.h is needed for LLONG_MIN.  */
+#ifdef __cplusplus
+#include <climits>
+#else
+#include <limits.h>
+#endif
+
 #ifdef __cplusplus
 extern "C" {
 #endif


[patch] Plug loophole in string store merging

2020-11-19 Thread Eric Botcazou
Hi,

there is a loophole in the new string store merging support I added recently: it 
does not check that the stores are consecutive, which is obviously required if 
you want to concatenate them...  Simple fix attached, the nice thing being 
that it can fall back to the regular processing if any hole is detected in the 
series of stores, thanks to the handling of STRING_CST by native_encode_expr.
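
The consecutiveness requirement can be sketched outside the pass as follows.  This is a simplified model, not the code in gimple-ssa-store-merging.c: the struct store and the function stores_consecutive_p are hypothetical, and the stores are assumed to be sorted by bit position and non-overlapping.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for store_immediate_info: only the bit position
// and the bit size of each store are modelled.
struct store
{
  std::uint64_t bitpos;
  std::uint64_t bitsize;
};

// True if every store starts exactly where the previous one ends, so
// that the stored strings could be concatenated without any hole.
// Assumption: STORES is sorted by bitpos and the stores do not overlap.
bool
stores_consecutive_p (const std::vector<store> &stores)
{
  for (std::size_t i = 1; i < stores.size (); i++)
    if (stores[i].bitpos != stores[i - 1].bitpos + stores[i - 1].bitsize)
      return false;
  return true;
}
```

In the first test's record layout, an 8-bit store at bit 0 followed by a 96-bit string store at bit 40 would leave a 32-bit hole for the non-constant field, so this check would return false.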

Tested on x86-64/Linux, OK for the mainline?


2020-11-19  Eric Botcazou  

* gimple-ssa-store-merging.c (struct merged_store_group): Add
new 'consecutive' field.
(merged_store_group): Set it to true.
(do_merge): Set it to false if the store is not consecutive and
set string_concatenation to false in this case.
(merge_into): Call do_merge on entry.
(merge_overlapping): Likewise.


2020-11-19  Eric Botcazou  

* gnat.dg/opt90a.adb: New test.
* gnat.dg/opt90b.adb: Likewise.
* gnat.dg/opt90c.adb: Likewise.
* gnat.dg/opt90d.adb: Likewise.
* gnat.dg/opt90e.adb: Likewise.
* gnat.dg/opt90a_pkg.ads: New helper.
* gnat.dg/opt90b_pkg.ads: Likewise.
* gnat.dg/opt90c_pkg.ads: Likewise.
* gnat.dg/opt90d_pkg.ads: Likewise.
* gnat.dg/opt90e_pkg.ads: Likewise.

-- 
Eric Botcazou
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 6089faf7ac8..17a4250d77f 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -1450,6 +1450,7 @@ public:
   bool bit_insertion;
   bool string_concatenation;
   bool only_constants;
+  bool consecutive;
   unsigned int first_nonmergeable_order;
   int lp_nr;
 
@@ -1822,6 +1823,7 @@ merged_store_group::merged_store_group (store_immediate_info *info)
   bit_insertion = info->rhs_code == BIT_INSERT_EXPR;
   string_concatenation = info->rhs_code == STRING_CST;
   only_constants = info->rhs_code == INTEGER_CST;
+  consecutive = true;
   first_nonmergeable_order = ~0U;
   lp_nr = info->lp_nr;
   unsigned HOST_WIDE_INT align_bitpos = 0;
@@ -1957,6 +1959,9 @@ merged_store_group::do_merge (store_immediate_info *info)
   first_stmt = stmt;
 }
 
+  if (info->bitpos != start + width)
+consecutive = false;
+
   /* We need to use extraction if there is any bit-field.  */
   if (info->rhs_code == BIT_INSERT_EXPR)
 {
@@ -1964,13 +1969,17 @@ merged_store_group::do_merge (store_immediate_info *info)
   gcc_assert (!string_concatenation);
 }
 
-  /* We need to use concatenation if there is any string.  */
+  /* We want to use concatenation if there is any string.  */
   if (info->rhs_code == STRING_CST)
 {
   string_concatenation = true;
   gcc_assert (!bit_insertion);
 }
 
+  /* But we cannot use it if we don't have consecutive stores.  */
+  if (!consecutive)
+string_concatenation = false;
+
   if (info->rhs_code != INTEGER_CST)
 only_constants = false;
 }
@@ -1982,12 +1991,13 @@ merged_store_group::do_merge (store_immediate_info *info)
 void
 merged_store_group::merge_into (store_immediate_info *info)
 {
+  do_merge (info);
+
   /* Make sure we're inserting in the position we think we're inserting.  */
   gcc_assert (info->bitpos >= start + width
 	  && info->bitregion_start <= bitregion_end);
 
   width = info->bitpos + info->bitsize - start;
-  do_merge (info);
 }
 
 /* Merge a store described by INFO into this merged store.
@@ -1997,11 +2007,11 @@ merged_store_group::merge_into (store_immediate_info *info)
 void
 merged_store_group::merge_overlapping (store_immediate_info *info)
 {
+  do_merge (info);
+
   /* If the store extends the size of the group, extend the width.  */
   if (info->bitpos + info->bitsize > start + width)
 width = info->bitpos + info->bitsize - start;
-
-  do_merge (info);
 }
 
 /* Go through all the recorded stores in this group in program order and
package Opt90a_Pkg is

  type Rec is record
A : Short_Short_Integer;
B : Integer;
C : String (1 .. 12);
  end record;
  pragma Pack (Rec);
  for Rec'Alignment use 1;

  type Data is tagged record
R : Rec;
  end record;

end Opt90a_Pkg;
-- { dg-do run }
-- { dg-options "-O2" }

with Ada.Calendar; use Ada.Calendar;
with Opt90a_Pkg; use Opt90a_Pkg;

procedure Opt90a is
  B : constant Integer := Year (Clock);
  V : Data;

begin
  V := (R => (A => 0, B => B, C => ""));
  if V.R.B /= B then
raise Program_Error;
  end if;
end;
package Opt90b_Pkg is

  type Rec is record
A : Short_Short_Integer;
B : Integer;
C : Short_Integer;
D : String (1 .. 12);
  end record;
  pragma Pack (Rec);
  for Rec'Alignment use 1;

  type Data is tagged record
R : Rec;
  end record;

end Opt90b_Pkg;
-- { dg-do run }
-- { dg-options "-O2" }

with Ada.Calendar; use Ada.Calendar;
with Opt90c_Pkg; use Opt90c_Pkg;

procedure Opt90c is
  B : constant Integer := Year (Clock);
  V : Data;

begin
  V := (R => (A => 0, B => B, C => 0, D => ""));
  if V.R.B /= B then
raise Prog

[patch][rtl-optimization][i386][pr97777] Fix a reg-stack df maintenance bug triggered by zero-call-used-regs pass.

2020-11-19 Thread Qing Zhao via Gcc-patches
Hi, 

PR9 - ICE: in df_refs_verify, at df-scan.c:3991 with -O -ffinite-math-only 
-fzero-call-used-regs=all

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9

This is a bug triggered by the new zero-call-used-regs pass; however, it is an 
old bug in the “reg-stack” pass.
This pass does not correctly maintain the df information after its 
transformation. 

Since the transformation in the reg-stack pass is quite complicated, involving 
both instruction changes and control flow changes, I call 
“df_insn_rescan_all” after the transformation is done.

The patch has been tested with bootstrap with 
--enable-checking=yes,rtl,df,extra, no regression. 

Okay for commit?

Qing

From c2573c6c8552b7b4c2eedb0684ce48b5c11436ec Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Thu, 19 Nov 2020 16:46:50 +0100
Subject: [PATCH] rtl-optimization: Fix data flow maintenance bug in
 reg-stack.c [pr9]

reg-stack pass does not maintain the data flow information correctly.
call df_insn_rescan_all after the transformation is done.

gcc/
PR rtl-optimization/9
* reg-stack.c (rest_of_handle_stack_regs): call
df_insn_rescan_all if reg_to_stack return true.

gcc/testsuite/
PR rtl-optimization/9
* gcc.target/i386/pr9.c: New test.
---
 gcc/reg-stack.c | 3 ++-
 gcc/testsuite/gcc.target/i386/pr9.c | 9 +
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr9.c

diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c
index 8f98bd85750..3dab843f803 100644
--- a/gcc/reg-stack.c
+++ b/gcc/reg-stack.c
@@ -3426,7 +3426,8 @@ static unsigned int
 rest_of_handle_stack_regs (void)
 {
 #ifdef STACK_REGS
-  reg_to_stack ();
+  if (reg_to_stack ())
+df_insn_rescan_all ();
   regstack_completed = 1;
 #endif
   return 0;
diff --git a/gcc/testsuite/gcc.target/i386/pr9.c 
b/gcc/testsuite/gcc.target/i386/pr9.c
new file mode 100644
index 000..fcefc098637
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr9.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O -fzero-call-used-regs=used -ffinite-math-only" } */
+
+float
+foo (void)
+{
+  return __builtin_fmod (0, 0);
+}
+
-- 
2.11.0




c++: Relax new assert [PR 97905]

2020-11-19 Thread Nathan Sidwell


It turns out there are legitimate cases for the new decl to not have
lang-specific.

PR c++/97905
gcc/cp/
* decl.c (duplicate_decls): Relax new assert.
gcc/testsuite/
* g++.dg/lookup/pr97905.C: New.

pushing to trunk

--
Nathan Sidwell
diff --git c/gcc/cp/decl.c w/gcc/cp/decl.c
index d90e9840f40..f5c6f5c0d10 100644
--- c/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -2749,9 +2749,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
  with that from NEWDECL below.  */
   if (DECL_LANG_SPECIFIC (olddecl))
 {
-  gcc_checking_assert (DECL_LANG_SPECIFIC (newdecl)
-			   && (DECL_LANG_SPECIFIC (olddecl)
-			   != DECL_LANG_SPECIFIC (newdecl)));
+  gcc_checking_assert (DECL_LANG_SPECIFIC (olddecl)
+			   != DECL_LANG_SPECIFIC (newdecl));
   ggc_free (DECL_LANG_SPECIFIC (olddecl));
 }
 
diff --git c/gcc/testsuite/g++.dg/lookup/pr97905.C w/gcc/testsuite/g++.dg/lookup/pr97905.C
new file mode 100644
index 000..22a7e5cf6d4
--- /dev/null
+++ w/gcc/testsuite/g++.dg/lookup/pr97905.C
@@ -0,0 +1,7 @@
+// PR 97905
+
+
+template  void a() {
+  extern int *b; // This decl gets an (unneeded) decl-lang-specific
+}
+int *b; // this does not


Re: [PATCH 0/2] Improve MSP430 hardware multiply support

2020-11-19 Thread Jeff Law via Gcc-patches



On 11/17/20 7:47 AM, Jozef Lawrynowicz wrote:
> In addition to the default config, I would suggest:
>   msp430-sim/-mcpu=msp430
> Test the 430 ISA
>   msp430-sim/-mlarge/-mcode-region=either
> Test the large memory model with data assumed to be in the lower
> memory region (default, reduces code size penalty of using -mlarge),
> whilst shuffling code between the upper and lower memory regions to
> make the program fit.
>   msp430-sim/-mlarge/-mdata-region=either/-mcode-region=either
>Test the large memory model, shuffling code and data between upper
>and lower memory regions.
>
> I should really use -mlarge/-mcode-region=either, instead of just
> -mlarge, as well. -mcode-region=either doesn't change code gen, just
> allows the linker shuffling of text sections so more tests build and so
> we get better test coverage.
>
> With limited testing capacity, testing hwmult configs is not very useful
> unless hwmult behavior is specifically changed. There are msp430
> specific tests to verify the options basically work.
ACK.  I've added those multilibs to msp430-elf configuration.

Thanks!

jeff



[PATCH] c++: Fix array new with value-initialization [PR97523]

2020-11-19 Thread Marek Polacek via Gcc-patches
Since my r11-3092 the following is rejected with -std=c++20:

  struct T { explicit T(); };
  void fn(int n) {
new T[1]();
  }

with "would use explicit constructor 'T::T()'".  It is because since
that change we go into the P1009 block in build_new (array_p is false,
but nelts is non-null and we're in C++20).  Since we only have (), we
build a {} and continue to build_new_1, which then calls build_vec_init
and then we error because the {} isn't CONSTRUCTOR_IS_DIRECT_INIT.

For (), which is value-initializing, we want to do what we were doing
before: pass empty init and let build_value_init take care of it.
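
To make the intended behavior concrete, here is a small sketch (not one of the new tests) in which an array new with () value-initializes each element through an explicit default constructor.  The counter and the function count_value_init are hypothetical names added for illustration.

```cpp
// T has only an explicit default constructor, as in the example above.
// The static counter is an addition for illustration.
struct T
{
  static int constructed;
  explicit T () { ++constructed; }
};

int T::constructed = 0;

// Value-initialize N elements with `()' and report how many times the
// constructor ran.  No braced initializer is involved, so the explicit
// constructor is acceptable here.
int
count_value_init (int n)
{
  T::constructed = 0;
  T *p = new T[n] ();
  delete[] p;
  return T::constructed;
}
```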

For various reasons I wanted to dig a little bit deeper into this,
and as a result, I'm adding a test for [expr.new]/24 (and checked that
our current behavior matches clang++).

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/97523
* init.c (build_new): When value-initializing an array new,
leave the INIT as an empty vector.

gcc/testsuite/ChangeLog:

PR c++/97523
* g++.dg/expr/anew5.C: New test.
* g++.dg/expr/anew6.C: New test.
---
 gcc/cp/init.c |  6 +-
 gcc/testsuite/g++.dg/expr/anew5.C | 26 
 gcc/testsuite/g++.dg/expr/anew6.C | 33 +++
 3 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/expr/anew5.C
 create mode 100644 gcc/testsuite/g++.dg/expr/anew6.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index ffb84ea5b09..0b98f338feb 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3766,7 +3766,11 @@ build_new (location_t loc, vec **placement, 
tree type,
 
   /* P1009: Array size deduction in new-expressions.  */
   const bool array_p = TREE_CODE (type) == ARRAY_TYPE;
-  if (*init && (array_p || (nelts && cxx_dialect >= cxx20)))
+  if (*init
+  /* If ARRAY_P, we have to deduce the array bound.  For C++20 paren-init,
+we have to process the parenthesized-list.  But don't do it for (),
+which is value-initialization, and INIT should stay empty.  */
+  && (array_p || (cxx_dialect >= cxx20 && nelts && !(*init)->is_empty ())))
 {
   /* This means we have 'new T[]()'.  */
   if ((*init)->is_empty ())
diff --git a/gcc/testsuite/g++.dg/expr/anew5.C 
b/gcc/testsuite/g++.dg/expr/anew5.C
new file mode 100644
index 000..d597caf5483
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew5.C
@@ -0,0 +1,26 @@
+// PR c++/97523
+// { dg-do compile }
+// We were turning the () into {} which made it seem like
+// aggregate-initialization (we are dealing with arrays here), which
+// performs copy-initialization, which only accepts converting constructors.
+
+struct T {
+  explicit T();
+  T(int);
+};
+
+void
+fn (int n)
+{
+  new T[1]();
+  new T[2]();
+  new T[3]();
+  new T[n]();
+#if __cpp_aggregate_paren_init
+  new T[]();
+  new T[2](1, 2);
+  // T[2] is initialized via copy-initialization, so we can't call
+  // explicit T().
+  new T[3](1, 2); // { dg-error "explicit constructor" "" { target c++20 } }
+#endif
+}
diff --git a/gcc/testsuite/g++.dg/expr/anew6.C 
b/gcc/testsuite/g++.dg/expr/anew6.C
new file mode 100644
index 000..0542daac275
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew6.C
@@ -0,0 +1,33 @@
+// PR c++/97523
+// { dg-do compile { target c++11 } }
+
+// [expr.new]/24: If the new-expression creates an object or an array of
+// objects of class type, access and ambiguity control are done for the
+// [...] constructor selected for the initialization (if any).
+// NB: We only check for a default constructor if the array has a non-constant
+// bound, or there are insufficient initializers.  Since an array is an
+// aggregate, we perform aggregate-initialization, which performs
+// copy-initialization, so we only accept converting constructors.
+
+struct T {
+  explicit T();
+  T(int);
+};
+
+struct S {
+  S(int);
+};
+
+void
+fn (int n)
+{
+  new T[1]{}; // { dg-error "explicit constructor" }
+  new T[2]{1, 2};
+  new T[3]{1, 2}; // { dg-error "explicit constructor" }
+  new T[n]{}; // { dg-error "explicit constructor" }
+
+  new S[1]{}; // { dg-error "could not convert" }
+  new S[2]{1, 2};
+  new S[3]{1, 2}; // { dg-error "could not convert" }
+  new S[n]{}; // { dg-error "could not convert" }
+}

base-commit: 2729378d0905a04e476a8bdcaaf0288f417810ec
-- 
2.28.0



[PATCH] c++: Fix crash with broken deduction from {} [PR97895]

2020-11-19 Thread Marek Polacek via Gcc-patches
Unfortunately, the otherwise beautiful

  for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))

is not immune to an empty constructor, so we have to check
CONSTRUCTOR_ELTS first.
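
The same pitfall can be shown with standard containers.  In this sketch std::vector merely stands in for GCC's vec, and sum_elts is a hypothetical function; the point is only that a range-based for over *ptr needs ptr to be checked first.

```cpp
#include <vector>

// ELTS may be null, much as CONSTRUCTOR_ELTS is for an empty
// constructor.  Dereferencing it in the range-for without the guard
// would be undefined behavior.
int
sum_elts (const std::vector<int> *elts)
{
  int sum = 0;
  if (elts)
    for (int v : *elts)
      sum += v;
  return sum;
}
```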

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/97895
* pt.c (do_auto_deduction): Don't crash when the constructor has
zero elements.

gcc/testsuite/ChangeLog:

PR c++/97895
* g++.dg/cpp0x/auto54.C: New test.
---
 gcc/cp/pt.c | 11 +++
 gcc/testsuite/g++.dg/cpp0x/auto54.C | 10 ++
 2 files changed, 17 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto54.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 1babf833d32..a1b6631d691 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29250,10 +29250,13 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
 return error_mark_node;
 
   if (BRACE_ENCLOSED_INITIALIZER_P (init))
-/* We don't recurse here because we can't deduce from a nested
-   initializer_list.  */
-for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
-  elt.value = resolve_nondeduced_context (elt.value, complain);
+{
+  /* We don't recurse here because we can't deduce from a nested
+initializer_list.  */
+  if (CONSTRUCTOR_ELTS (init))
+   for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
+ elt.value = resolve_nondeduced_context (elt.value, complain);
+}
   else
 init = resolve_nondeduced_context (init, complain);
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto54.C 
b/gcc/testsuite/g++.dg/cpp0x/auto54.C
new file mode 100644
index 000..0c1815a99bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/auto54.C
@@ -0,0 +1,10 @@
+// PR c++/97895
+// { dg-do compile { target c++11 } }
+
+namespace std {
+  template<typename T> struct initializer_list {
+const T *ptr;
+decltype(sizeof 0) n;
+  };
+  auto a = {}; // { dg-error "unable to deduce" }
+}

base-commit: 25bb75f841c552cfd27a4344b7487efbe35b4481
-- 
2.28.0



Re: Update [PATCH 6/X] libsanitizer: Add hwasan pass and associated gimple changes

2020-11-19 Thread Richard Sandiford via Gcc-patches
Matthew Malcomson  writes:
> +/* Emit gimple statements into &stmts that take the size given in `len` and
> +   generate a size that is guaranteed to be rounded upwards to `align`.
> +
> +   This is a helper function for both handle_builtin_alloca and
> +   asan_expand_mark_ifn when using HWASAN.
> +
> +   Return the tree node representing this size, it is of TREE_TYPE
> +   size_type_node.  */
> +
> +static tree
> +hwasan_emit_round_up (gimple_seq *seq, location_t loc, tree old_size,
> +   uint8_t align)
> +{
> +  uint8_t tg_mask = align - 1;
> +  /* tree new_size = (old_size + tg_mask) & ~tg_mask;  */
> +  tree tree_mask = build_int_cst (size_type_node, tg_mask);
> +  tree oversize = gimple_build (seq, loc, PLUS_EXPR, size_type_node, 
> old_size,
> + tree_mask);
> +
> +  tree mask = build_int_cst (size_type_node, -align);
> +  return gimple_build (seq, loc, BIT_AND_EXPR, size_type_node, oversize, 
> mask);
> +}
> +

There's nothing really hwasan-specific about this, apart from the choice
“uint8_t” for the alignment and mask.  So I think we should:

- change “align” and “tg_mask” to “unsigned HOST_WIDE_INT”
- change the name to “gimple_build_round_up”
- take the type as a parameter, in the same position as other
  gimple_build_* type parameters
- move the function to gimple-fold.c, exported via gimple-fold.h
- drop:

   This is a helper function for both handle_builtin_alloca and
   asan_expand_mark_ifn when using HWASAN.
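
For reference, the arithmetic that the quoted helper builds as gimple can be written on plain integers.  This is only a sketch: round_up is a hypothetical name, the type is fixed to a 64-bit unsigned integer, and the alignment is assumed to be a nonzero power of two.

```cpp
#include <cstdint>

// Round SIZE up to the next multiple of ALIGN.
// Assumptions: ALIGN is a nonzero power of two and SIZE + ALIGN - 1
// does not overflow.  ~(ALIGN - 1) is the same mask as -ALIGN in the
// quoted code.
constexpr std::uint64_t
round_up (std::uint64_t size, std::uint64_t align)
{
  return (size + (align - 1)) & ~(align - 1);
}
```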

> […]
> @@ -690,6 +757,71 @@ handle_builtin_alloca (gcall *call, gimple_stmt_iterator 
> *iter)
>  = DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
>? 0 : tree_to_uhwi (gimple_call_arg (call, 1));
>  
> +  if (hwasan_sanitize_allocas_p ())
> +{
> +  gimple_seq stmts = NULL;
> +  location_t loc = gimple_location (gsi_stmt (*iter));
> +  /*
> +  HWASAN needs a different expansion.
> +
> +  addr = __builtin_alloca (size, align);
> +
> +  should be replaced by
> +
> +  new_size = size rounded up to HWASAN_TAG_GRANULE_SIZE byte alignment;
> +  untagged_addr = __builtin_alloca (new_size, align);
> +  tag = __hwasan_choose_alloca_tag ();
> +  addr = ifn_HWASAN_SET_TAG (untagged_addr, tag);
> +  __hwasan_tag_memory (untagged_addr, tag, new_size);
> + */
> +  /* Ensure alignment at least HWASAN_TAG_GRANULE_SIZE bytes so we start 
> on
> +  a tag granule.  */
> +  align = align > HWASAN_TAG_GRANULE_SIZE ? align : 
> HWASAN_TAG_GRANULE_SIZE;
> +
> +  tree old_size = gimple_call_arg (call, 0);
> +  tree new_size = hwasan_emit_round_up (&stmts, loc, old_size,
> + HWASAN_TAG_GRANULE_SIZE);
> +
> +  /* Make the alloca call */
> +  tree untagged_addr
> + = gimple_build (&stmts, loc,
> + as_combined_fn (BUILT_IN_ALLOCA_WITH_ALIGN), ptr_type,
> + new_size, build_int_cst (size_type_node, align));
> +
> +  /* Choose the tag.
> +  Here we use an internal function so we can choose the tag at expand
> +  time.  We need the decision to be made after stack variables have been
> +  assigned their tag (i.e. once the hwasan_frame_tag_offset variable has
> +  been set to one after the last stack variables tag).  */
> +  gcall *stmt = gimple_build_call_internal (IFN_HWASAN_CHOOSE_TAG, 0);
> +  tree tag = make_ssa_name (unsigned_char_type_node);
> +  gimple_call_set_lhs (stmt, tag);
> +  gimple_set_location (stmt, loc);
> +  gimple_seq_add_stmt_without_update (&stmts, stmt);

Even though there are currently no folds defined for argumentless
functions, I think it would be worth adding a gimple_build overload
for this instead of writing it out by hand.  I.e. have a zero-argument
version of:

tree
gimple_build (gimple_seq *seq, location_t loc, combined_fn fn,
  tree type, tree arg0)
{
  tree res = gimple_simplify (fn, type, arg0, seq, gimple_build_valueize);
  if (!res)
{
  gcall *stmt;
  if (internal_fn_p (fn))
stmt = gimple_build_call_internal (as_internal_fn (fn), 1, arg0);
  else
{
  tree decl = builtin_decl_implicit (as_builtin_fn (fn));
  stmt = gimple_build_call (decl, 1, arg0);
}
  if (!VOID_TYPE_P (type))
{
  res = create_tmp_reg_or_ssa_name (type);
  gimple_call_set_lhs (stmt, res);
}
  gimple_set_location (stmt, loc);
  gimple_seq_add_stmt_without_update (seq, stmt);
}
  return res;
}

without the gimple_simplify call.

> +
> +  /* Add tag to pointer.  */
> +  tree addr
> + = gimple_build (&stmts, loc, as_combined_fn (IFN_HWASAN_SET_TAG),

This is CFN_HWASAN_SET_TAG.

> + ptr_type, untagged_addr, tag);
> +
> +  /* Tag shadow memory.
> +  NOTE: require using `untagged_addr` here for libhwasan API.  */
> +  gimple_build (&stmts, loc, as_combined_fn (BUILT_IN_HWASAN_TAG_MEM),
> + void

[PATCH] c++, v2: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
Hi!

This is the whole __builtin_clear_padding patchset merged into a single
patch, + 2 new changes - one is that fold_builtin_1 now folds the
1 argument (meant for users) __builtin_clear_padding into an internal
2 argument form, where the second argument is NULL of the first argument's
type, such that gimplifier's stripping of useless type conversions doesn't
change behavior, and handling NULLPTR_TYPE as all padding bits, because
lvalue-to-rvalue conversions with decltype(nullptr) type don't really read
anything from the memory and so we need to clear all the bits as padding.
Here is the full description:

The following patch implements __builtin_clear_padding builtin that clears
the padding bits in object representation (but preserves value
representation).  Inside of unions it clears only those padding bits that
are padding for all the union members (so that it never alters value
representation).

It handles trailing padding, padding in the middle of structs including
bitfields (PDP11 unhandled, I've never figured out how those bitfields
work), VLAs (doesn't handle variable length structures, but I think almost
nobody uses them and it isn't worth the extra complexity).  For VLAs and
sufficiently large arrays it uses runtime clearing loop instead of emitting
straight-line code (unless arrays are inside of a union).

The way I think this can be used for atomics is e.g. if the structures
are power of two sized and small enough that we use the hw atomics
for, say, compare_exchange, __builtin_clear_padding could be called first on
the address of expected and desired arguments (for desired only if we want
to ensure that most of the time the atomic memory will have padding bits
cleared), then perform the weak cmpxchg and if that fails, we got the
value from the atomic memory; we can call __builtin_clear_padding on a copy
of that and then compare it with expected, and if it is the same with the
padding bits masked off, we can use the original with whatever random
padding bits in it as the new expected for next cmpxchg.
__builtin_clear_padding itself is not atomic and therefore it shouldn't
be called on the atomic memory itself, but compare_exchange*'s expected
argument is a reference and normally the implementation may store there
the current value from memory, so padding bits can be cleared in that,
and desired is passed by value rather than reference, so clearing is fine
too.
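The retry scheme described above could be sketched roughly as follows.  This
is only an illustration under stated assumptions: the object representation is
modelled as a std::uint32_t, value_mask is a hypothetical mask with 1 bits for
value bits and 0 bits for padding (standing in for what
__builtin_clear_padding would clear), and cas_ignoring_padding is an invented
name, not a libstdc++ or libatomic function.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: compare-exchange that treats the bits outside
// value_mask as padding.  Not part of libstdc++ or libatomic.
bool
cas_ignoring_padding (std::atomic<std::uint32_t> &mem,
                      std::uint32_t &expected, std::uint32_t desired,
                      std::uint32_t value_mask)
{
  // Clear the padding bits in expected and desired first.
  const std::uint32_t want = expected & value_mask;
  desired &= value_mask;

  std::uint32_t raw = want;
  for (;;)
    {
      if (mem.compare_exchange_weak (raw, desired))
        return true;
      // On failure raw holds the current contents of mem, padding included.
      if ((raw & value_mask) != want)
        {
          // The value really differs: report it with padding cleared.
          expected = raw & value_mask;
          return false;
        }
      // Same value but different padding (or a spurious failure): retry
      // with the raw bits as the new expected value.
    }
}
```

The atomic memory itself is never modified except through the cmpxchg, which
matches the remark that the clearing must not be done on the atomic object.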

When using libatomic, we can use it either that way, or add new libatomic
APIs that accept another argument, pointer to the padding bit bitmask,
and construct that in the template as
  alignas (_T) unsigned char _mask[sizeof (_T)];
  std::memset (_mask, ~0, sizeof (_mask));
  __builtin_clear_padding ((_T *) _mask);
which will have bits cleared for padding bits and set for bits taking part
in the value representation.  Then libatomic could internally instead
of using memcmp compare
for (i = 0; i < N; i++) if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
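
Spelled out as a complete function, the masked comparison might look like the
sketch below.  The name equal_under_mask is made up for illustration, and in
any usage the mask would have to come from somewhere; in the scheme above it
would be produced by __builtin_clear_padding, which this sketch deliberately
does not call.

```cpp
#include <cstddef>

// Compare two object representations byte by byte, ignoring every bit
// that is zero in mask (i.e. every padding bit).
bool
equal_under_mask (const unsigned char *val1, const unsigned char *val2,
                  const unsigned char *mask, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
      return false;
  return true;
}
```

With a mask byte of 0x00 at a padding position, two buffers that differ only
in that byte compare equal, which memcmp would not give.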

Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest?

2020-11-19  Jakub Jelinek  

PR libstdc++/88101
gcc/
* builtins.def (BUILT_IN_CLEAR_PADDING): New built-in function.
* builtins.c (fold_builtin_1): Handle BUILT_IN_CLEAR_PADDING.
* gimple-fold.c (clear_padding_unit, clear_padding_buf_size): New
const variables.
(struct clear_padding_struct): New type.
(clear_padding_flush, clear_padding_add_padding,
clear_padding_emit_loop, clear_padding_type,
clear_padding_union, clear_padding_real_needs_padding_p,
clear_padding_type_may_have_padding_p,
gimple_fold_builtin_clear_padding): New functions.
(gimple_fold_builtin): Handle BUILT_IN_CLEAR_PADDING.
* doc/extend.texi (__builtin_clear_padding): Document.
gcc/c-family/
* c-common.c (check_builtin_function_arguments): Handle
BUILT_IN_CLEAR_PADDING.
gcc/testsuite/
* c-c++-common/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-2.c: New test.
* c-c++-common/torture/builtin-clear-padding-3.c: New test.
* c-c++-common/torture/builtin-clear-padding-4.c: New test.
* c-c++-common/torture/builtin-clear-padding-5.c: New test.
* g++.dg/torture/builtin-clear-padding-1.C: New test.
* g++.dg/torture/builtin-clear-padding-2.C: New test.
* gcc.dg/builtin-clear-padding-1.c: New test.

--- gcc/builtins.def.jj 2020-11-18 09:38:28.481816977 +0100
+++ gcc/builtins.def 2020-11-19 16:15:50.573639579 +0100
@@ -839,6 +839,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_CLEAR_C
 /* [trans-mem]: Adjust BUILT_IN_TM_CALLOC if BUILT_IN_CALLOC is changed.  */
 DEF_LIB_BUILTIN(BUILT_IN_CALLOC, "calloc", BT_FN_PTR_SIZE_SIZE, ATTR_MALLOC_WARN_UNUSED_RESULT_SIZE_1_2_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLASSIFY_TYPE, "classify_type", BT_FN_INT_VAR, ATTR_LEAF_LIST)

Re: [PATCH] libsanitizer: fix SIGSEGV in fopen64 interceptor

2020-11-19 Thread Martin Liška

On 11/19/20 12:28 PM, Slava Barinov via Gcc-patches wrote:

Null pointer in path argument leads to SIGSEGV in interceptor.


Hello.

I can't see that we ever had the null check in master. I don't think it was
lost during a merge from master.

Why do we need the hunk?
Thanks,
Martin



libsanitizer/ChangeLog:
 * sanitizer_common/sanitizer_common_interceptors.inc: Check
path for null before dereference in fopen64 interceptor.
---

Notes:
 Apparently check has been lost during merge from upstream

  libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc 
b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
index 729eead43c0..2ef23d9a50b 100644
--- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
+++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc
@@ -6081,7 +6081,7 @@ INTERCEPTOR(__sanitizer_FILE *, freopen, const char 
*path, const char *mode,
  INTERCEPTOR(__sanitizer_FILE *, fopen64, const char *path, const char *mode) {
void *ctx;
COMMON_INTERCEPTOR_ENTER(ctx, fopen64, path, mode);
-  COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
+  if (path) COMMON_INTERCEPTOR_READ_RANGE(ctx, path, REAL(strlen)(path) + 1);
COMMON_INTERCEPTOR_READ_RANGE(ctx, mode, REAL(strlen)(mode) + 1);
__sanitizer_FILE *res = REAL(fopen64)(path, mode);
COMMON_INTERCEPTOR_FILE_OPEN(ctx, res, path);





Re: [AArch64] Add --with-tune configure flag

2020-11-19 Thread Richard Earnshaw (lists) via Gcc-patches
On 19/11/2020 14:40, Wilco Dijkstra via Gcc-patches wrote:
> Hi,
> 
>>>> As for your second patch, --with-cpu-64 could be a simple alias indeed,
>>>> but what is the exact definition/expected behaviour of a --with-cpu-32
>>>> on a target that only supports 64-bit code? The AArch64 target cannot
>>>> generate AArch32 code, so we shouldn't silently accept it.
>>>
>>> IMO allowing users to specify all the flags available on x86 is important.
>>>
>>
>> This isn't about general users though; it's about how you configure the
>> compiler and that's not all the same.  I don't mind the --with-cpu-64 as
>> a strict alias for --with-cpu, but --with-cpu-32 is both redundant and
>> misleading as it might give the impression that it does something useful.
> 
> We could make it do something useful, for example emit a warning, an error
> or default to -mabi=ilp32 (since that is similar to what other targets do).
> Anything is better than being the only target that doesn't support it...
> 
> Cheers,
> Wilco
> 

Having the same option have a completely different meaning would be even
worse than not having the option at all.  So no, that's a non-starter.

It's not like these configure options have wide-spread usage at present.

R.


Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-19 Thread Pat Haugen via Gcc-patches
On 11/4/20 10:44 AM, Carl Love via Gcc-patches wrote:
> +
> +(define_insn "vdives_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vdiveu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +  (match_operand:VIlong 2 "vsx_register_operand" "v")]
> + UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "div3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "udiv3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmods_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmodu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +  (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Since the vdiv.../vmod... instructions execute in the fixed point divide unit, 
all the above instructions should have a type of "div" instead of "vecsimple".


> +
> +(define_insn "vmulhs_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VMULHS))]
> +  "TARGET_POWER10"
> +  "vmulhs %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmulhu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VMULHU))]
> +  "TARGET_POWER10"
> +  "vmulhu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> + (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +(match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Similarly, the above 3 insns should have a "mul" instruction type.

-Pat


Re: [patch] Plug loophole in string store merging

2020-11-19 Thread Jeff Law via Gcc-patches



On 11/19/20 8:52 AM, Eric Botcazou wrote:
> Hi,
>
> there is a loophole in new string store merging support I added recently: it
> does not check that the stores are consecutive, which is obviously required if
> you want to concatenate them...  Simple fix attached, the nice thing being
> that it can fall back to the regular processing if any hole is detected in the
> series of stores, thanks to the handling of STRING_CST by native_encode_expr.
>
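
To illustrate the kind of check being described, here is a minimal sketch.  It
is not the code from gimple-ssa-store-merging.c: store_info and
stores_consecutive_p are hypothetical names, and the sketch assumes the stores
are already sorted by offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a recorded store: byte offset and byte size.
struct store_info
{
  std::uint64_t offset;
  std::uint64_t size;
};

// Return true if every store begins exactly where the previous one ends,
// i.e. there is no hole and no overlap in the series.
bool
stores_consecutive_p (const std::vector<store_info> &stores)
{
  for (std::size_t i = 1; i < stores.size (); i++)
    if (stores[i].offset != stores[i - 1].offset + stores[i - 1].size)
      return false;
  return true;
}
```

Only a series for which this holds can be treated as one concatenated string;
otherwise the group would go through the regular processing instead.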
> Tested on x86-64/Linux, OK for the mainline?
>
>
> 2020-11-19  Eric Botcazou  
>
>   * gimple-ssa-store-merging.c (struct merged_store_group): Add
>   new 'consecutive' field.
>   (merged_store_group): Set it to true.
>   (do_merge): Set it to false if the store is not consecutive and
>   set string_concatenation to false in this case.
>   (merge_into): Call do_merge on entry.
>   (merge_overlapping): Likewise.
>
>
> 2020-11-19  Eric Botcazou  
>
>   * gnat.dg/opt90a.adb: New test.
>   * gnat.dg/opt90b.adb: Likewise.
>   * gnat.dg/opt90c.adb: Likewise.
>   * gnat.dg/opt90d.adb: Likewise.
>   * gnat.dg/opt90e.adb: Likewise.
>   * gnat.dg/opt90a_pkg.ads: New helper.
>   * gnat.dg/opt90b_pkg.ads: Likewise.
>   * gnat.dg/opt90c_pkg.ads: Likewise.
>   * gnat.dg/opt90d_pkg.ads: Likewise.
>   * gnat.dg/opt90e_pkg.ads: Likewise.
OK
jeff



Re: [PATCH] pru: Add builtins for HALT and LMBD

2020-11-19 Thread Dimitar Dimitrov
On Thursday, 19 November 2020 2:07:59 EET Jeff Law wrote:
> On 11/13/20 1:07 PM, Dimitar Dimitrov wrote:
> > Add builtins for HALT and LMBD, per Texas Instruments document
> > SPRUHV7C.  Use the new LMBD pattern to define an expand for clz.
> > 
> > Binutils [1] and sim [2] support for LMBD instruction are merged now.
> > 
> > [1] https://sourceware.org/pipermail/binutils/2020-October/113901.html
> > [2] https://sourceware.org/pipermail/gdb-patches/2020-November/173141.html
> > 
> > gcc/ChangeLog:
> > * config/pru/alu-zext.md: Add lmbd patterns for zero_extend
> > variants.
> > * config/pru/pru.c (enum pru_builtin): Add HALT and LMBD.
> > (pru_init_builtins): Ditto.
> > (pru_builtin_decl): Ditto.
> > (pru_expand_builtin): Ditto.
> > * config/pru/pru.h (CLZ_DEFINED_VALUE_AT_ZERO): Define PRU
> > value for CLZ with zero value parameter.
> > * config/pru/pru.md: Add halt, lmbd and clz patterns.
> > * doc/extend.texi: Document PRU builtins.
> > 
> > gcc/testsuite/ChangeLog:
> > * gcc.target/pru/halt.c: New test.
> > * gcc.target/pru/lmbd.c: New test.
> 
> OK.  Please commit if you haven't already.

Thank you. Pushed as 5ace1776b88d4b0fc371414d0b3983015e22fead .

Regards,
Dimitar






config: Add tests for modules-desired features

2020-11-19 Thread Nathan Sidwell

This adds configure tests for features that modules can take advantage
of; if they are not present, modules have reduced or fallback functionality.

It is slightly different from the earlier posting, as the server
functionality has been moved from gcc/cp to its own toplevel directory.


 gcc/
 * configure.ac: Add tests for fstatat, sighandler_t, O_CLOEXEC,
 unix-domain and ipv6 sockets.
 * config.in: Rebuilt.
 * configure: Rebuilt.

pushing to trunk

--
Nathan Sidwell

diff --git c/gcc/configure.ac w/gcc/configure.ac
index b2732d17bf4..1cce371a9e1 100644
--- c/gcc/configure.ac
+++ w/gcc/configure.ac
@@ -1417,8 +1417,8 @@ define(gcc_UNLOCKED_FUNCS, clearerr_unlocked feof_unlocked dnl
   putchar_unlocked putc_unlocked)
 AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoq \
 	popen sysconf strsignal getrusage nl_langinfo \
-	gettimeofday mbstowcs wcswidth mmap setlocale \
-	gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2)
+	gettimeofday mbstowcs wcswidth mmap posix_fallocate setlocale \
+	gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2 fstatat)
 
 if test x$ac_cv_func_mbstowcs = xyes; then
   AC_CACHE_CHECK(whether mbstowcs works, gcc_cv_func_mbstowcs_works,
@@ -1440,6 +1440,10 @@ fi
 
 AC_CHECK_TYPE(ssize_t, int)
 AC_CHECK_TYPE(caddr_t, char *)
+AC_CHECK_TYPE(sighandler_t,
+  AC_DEFINE(HAVE_SIGHANDLER_T, 1,
+[Define if  defines sighandler_t]),
+,signal.h)
 
 GCC_AC_FUNC_MMAP_BLACKLIST
 
@@ -1585,6 +1589,72 @@ if test $ac_cv_f_setlkw = yes; then
   [Define if F_SETLKW supported by fcntl.])
 fi
 
+# Check if O_CLOEXEC is defined by fcntl
+AC_CACHE_CHECK(for O_CLOEXEC, ac_cv_o_cloexec, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <fcntl.h>]], [[
+return open ("/dev/null", O_RDONLY | O_CLOEXEC);]])],
+[ac_cv_o_cloexec=yes],[ac_cv_o_cloexec=no])])
+if test $ac_cv_o_cloexec = yes; then
+  AC_DEFINE(HOST_HAS_O_CLOEXEC, 1,
+  [Define if O_CLOEXEC supported by fcntl.])
+fi
+
+# C++ Modules would like some networking features to provide the mapping
+# server.  You can still use modules without them though.
+# The following network-related checks could probably do with some
+# Windows and other non-linux defenses and checking.
+
+# Local socket connectivity wants AF_UNIX networking
+# Check for AF_UNIX networking
+AC_CACHE_CHECK(for AF_UNIX, ac_cv_af_unix, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include 
+#include 
+#include 
+#include ]],[[
+sockaddr_un un;
+un.sun_family = AF_UNSPEC;
+int fd = socket (AF_UNIX, SOCK_STREAM, 0);
+connect (fd, (sockaddr *)&un, sizeof (un));]])],
+[ac_cv_af_unix=yes],
+[ac_cv_af_unix=no])])
+if test $ac_cv_af_unix = yes; then
+  AC_DEFINE(HAVE_AF_UNIX, 1,
+  [Define if AF_UNIX supported.])
+fi
+
+# Remote socket connectivity wants AF_INET6 networking
+# Check for AF_INET6 networking
+AC_CACHE_CHECK(for AF_INET6, ac_cv_af_inet6, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include 
+#include 
+#include 
+#include ]],[[
+sockaddr_in6 in6;
+in6.sin6_family = AF_UNSPEC;
+struct addrinfo *addrs = 0;
+struct addrinfo hints;
+hints.ai_flags = 0;
+hints.ai_family = AF_INET6;
+hints.ai_socktype = SOCK_STREAM;
+hints.ai_protocol = 0;
+hints.ai_canonname = 0;
+hints.ai_addr = 0;
+hints.ai_next = 0;
+int e = getaddrinfo ("localhost", 0, &hints, &addrs);
+const char *str = gai_strerror (e);
+freeaddrinfo (addrs);
+int fd = socket (AF_INET6, SOCK_STREAM, 0);
+connect (fd, (sockaddr *)&in6, sizeof (in6));]])],
+[ac_cv_af_inet6=yes],
+[ac_cv_af_inet6=no])])
+if test $ac_cv_af_inet6 = yes; then
+  AC_DEFINE(HAVE_AF_INET6, 1,
+  [Define if AF_INET6 supported.])
+fi
+
 # Restore CFLAGS, CXXFLAGS from before the gcc_AC_NEED_DECLARATIONS tests.
 CFLAGS="$saved_CFLAGS"
 CXXFLAGS="$saved_CXXFLAGS"



libstdc++: Avoid zero-probability events in discrete_distribution [PR61369]

2020-11-19 Thread Lewis Hyatt via Gcc-patches
Hello-

PR61369 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61369) points out
that std::discrete_distribution can return an event even if it has 0
probability, and proposes a simple fix. It seems that this fix was never
applied, because there was an expectation of redoing this code anyway to
use a more efficient algorithm (PR57925). Given that this new algorithm
has not been implemented so far, would it make sense to apply the simple
fix to address this issue? The attached patch does this.

One question about the patch, a slight annoyance is that only
std::lower_bound() is currently available in random.tcc, as this file
includes only bits/stl_algobase.h and not bits/stl_algo.h (via including
). Is there a preference between simply including stl_algo.h, or
moving upper_bound to stl_algobase.h, where lower_bound is? I noticed
that in C++20 mode,  includes stl_algo.h already, so I figured
it would be fine to just include it in random.tcc unconditionally.

bootstrap + testing were done on x86-64 GNU/Linux, all tests the same
before + after plus 2 new passes from the new test. Thanks for taking a
look!

-Lewis
From: Lewis Hyatt 
Date: Wed, 18 Nov 2020 17:12:51 -0500
Subject: [PATCH] libstdc++: Avoid zero-probability events in 
discrete_distribution [PR61369]

Fixes PR61369, as recommended by the PR's submitter, by replacing
lower_bound() with upper_bound(). Currently, if there is an initial subset of
events with probability 0, the first of them will be returned with non-zero
probability (if the underlying RNG returns exactly 0). Switching to
upper_bound() ensures that this will not happen.
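
The difference can be seen on a small table.  The sketch below assumes, as the
member name _M_cp suggests, that the distribution searches a vector of
cumulative probabilities; pick_lower and pick_upper are illustrative names and
not part of libstdc++.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Index chosen for draw p when searching cumulative probabilities cp
// with lower_bound (first element not less than p).
std::size_t
pick_lower (const std::vector<double> &cp, double p)
{
  return std::lower_bound (cp.begin (), cp.end (), p) - cp.begin ();
}

// Same, but with upper_bound (first element strictly greater than p).
std::size_t
pick_upper (const std::vector<double> &cp, double p)
{
  return std::upper_bound (cp.begin (), cp.end (), p) - cp.begin ();
}
```

For weights {0.0, 0.5, 0.5} the cumulative table is {0.0, 0.5, 1.0}.  A draw
of exactly 0 gives index 0 with pick_lower, which is the zero-probability
event, but index 1 with pick_upper.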

libstdc++-v3/ChangeLog:

PR libstdc++/61369
* include/bits/random.tcc: Include bits/stl_algo.h.
(discrete_distribution::operator()): Use upper_bound rather than
lower_bound.
* testsuite/26_numerics/random/pr60037-neg.cc: Adapt to new line
numbering in random.tcc.
* testsuite/26_numerics/random/discrete_distribution/pr61369.cc: New
test.

diff --git a/libstdc++-v3/include/bits/random.tcc 
b/libstdc++-v3/include/bits/random.tcc
index 3205442f2f6..14fe4f39c7b 100644
--- a/libstdc++-v3/include/bits/random.tcc
+++ b/libstdc++-v3/include/bits/random.tcc
@@ -31,6 +31,7 @@
 #define _RANDOM_TCC 1
 
 #include <numeric> // std::accumulate and std::partial_sum
+#include <bits/stl_algo.h> // std::upper_bound
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -2706,7 +2707,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __aurng(__urng);
 
const double __p = __aurng();
-   auto __pos = std::lower_bound(__param._M_cp.begin(),
+   auto __pos = std::upper_bound(__param._M_cp.begin(),
  __param._M_cp.end(), __p);
 
return __pos - __param._M_cp.begin();
diff --git 
a/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc 
b/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc
new file mode 100644
index 000..f8fa97e293e
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc
@@ -0,0 +1,55 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+// { dg-require-cstdint "" }
+
+#include 
+#include 
+#include 
+#include 
+
+class not_so_random
+{
+public:
+  using result_type = std::uint64_t;
+
+  static constexpr result_type
+  min()
+  { return 0u; }
+
+  static constexpr result_type
+  max()
+  { return std::numeric_limits<result_type>::max(); }
+
+  result_type
+  operator()() const
+  { return 0u; }
+};
+
+void
+test01()
+{
+  std::discrete_distribution<> u{0.0, 0.5, 0.5};
+  not_so_random rng;
+  VERIFY( u(rng) > 0 );
+}
+
+int main()
+{
+  test01();
+}
diff --git a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc 
b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
index ba252ef34fe..4d00d1846c4 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
@@ -12,4 +12,4 @@ auto x = std::generate_canonical

Re: [PATCH] openmp: Implicit 'declare target' for C++ static initializers

2020-11-19 Thread Kwok Cheung Yeung

On 29/10/2020 10:03 am, Jakub Jelinek wrote:

I'm actually not sure how this can work correctly.
Let's say we have
int foo () { return 1; }
int bar () { return 2; }
int baz () { return 3; }
int qux () { return 4; }
int a = foo ();
int b = bar ();
int c = baz ();
int *d = &c;
int e = qux ();
int f = e + 1;
int *g = &f;
#pragma omp declare target to (b, d, g)
So, for the implicit declare target discovery, a is not declare target to,
nor is foo, and everything else is; b, d, g explicitly, c because it is
referenced in initializer of d, f because it is mentioned in initializer of
g and e because it is mentioned in initializer of f.
Haven't checked if the new function you've added is called before or after
analyze_function calls omp_discover_implicit_declare_target, but I don't
really see how it can work when it is not inside of that function, so that
discovery of new static vars that are implicitly declare target to doesn't
result in marking of its dynamic initializers too.  Perhaps we need a
langhook for that.  But if it is a separate function, either it is called
before the other discovery and will ignore static initializers for vars
that will only be marked as implicit declare target to later, or it is done
afterwards, but then it would really need to duplicate everything what the
other function does, otherwise it wouldn't discover everything.



I have added a new langhook GET_DECL_INIT that by default returns the 
DECL_INITIAL of a variable declaration, but for C++ can also return the dynamic 
initializer if present. omp_discover_implicit_declare_target and 
omp_discover_declare_target_var_r have been changed to use the new langhook 
instead of using DECL_INITIAL.
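
Conceptually, the discovery in the example above is a reachability walk over
"initializer references" edges.  The sketch below models that with plain
strings; it is not the code in omp-offload.c, and ref_map and
discover_implicit_targets are hypothetical names used only for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// For each symbol, the symbols mentioned in its (static or dynamic)
// initializer.  A hypothetical model, not a GCC data structure.
using ref_map = std::map<std::string, std::vector<std::string>>;

// Starting from the explicitly marked symbols, mark everything that is
// reachable through initializers.
std::set<std::string>
discover_implicit_targets (const ref_map &refs,
                           const std::vector<std::string> &explicit_targets)
{
  std::set<std::string> marked (explicit_targets.begin (),
                                explicit_targets.end ());
  std::vector<std::string> worklist (explicit_targets);
  while (!worklist.empty ())
    {
      std::string name = worklist.back ();
      worklist.pop_back ();
      auto it = refs.find (name);
      if (it == refs.end ())
        continue;
      // Newly discovered symbols are queued so that their own
      // initializers get walked too.
      for (const std::string &ref : it->second)
        if (marked.insert (ref).second)
          worklist.push_back (ref);
    }
  return marked;
}
```

Fed with the example (a uses foo, b uses bar, c uses baz, d uses c, e uses
qux, f uses e, g uses f) and the explicit set {b, d, g}, this marks everything
except a and foo.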


The dynamic initializer information is stored in a new variable 
dynamic_initializers. The information is originally stored in static_aggregates, 
but this is nulled by calling prune_vars_needing_no_initialization in 
c_parse_final_cleanups. I copy the information into a separate variable before 
it is discarded - this avoids any potential problems that may be caused by 
trying to change the way that static_aggregates currently works.


With this, all the functions and variables in your example are marked correctly:

foo ()
...

__attribute__((omp declare target))
bar ()
...

__attribute__((omp declare target))
baz ()
...

__attribute__((omp declare target))
qux ()
...

.offload_var_table:
.quad   g
.quad   8
.quad   d
.quad   8
.quad   b
.quad   4
.quad   c
.quad   4
.quad   f
.quad   4
.quad   e
.quad   4

Your example is now a compile test in g++.dg/gomp/.


Anyway, that is one thing, the other is even if the implicit declare target
discovery handles those correctly, the question is what should we do
afterwards.  Because the C++ FE normally creates a single function that
performs the dynamic initialization of the TUs variables.  But that function
shouldn't be really declare target to, it initializes not only (explicit or
implicit) declare target to variables, but also host only variables.
So we'll probably need to create next to that host only TU constructor
also a device only constructor function that will only initialize the
declare target to variables.


Even without this patch, G++ currently accepts something like

int foo() { return 1; }
int x = foo();
#pragma omp declare target to(x)

but will not generate the device-side initializer for x, even though x is now 
present on the device. So this part of the implementation is broken with or 
without the patch.


Given that my patch doesn't make the current situation any worse, can I commit 
this portion of it to trunk for now, and leave device-side dynamic 
initialization for later?


Bootstrapped on x86_64 with no offloading, G++ testsuite ran with no 
regressions, and no regressions in the libgomp testsuite with Nvidia offloading.


Thanks,

Kwok
From 0348b149474d0922d79209705e6777e7af271e0d Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 18 Nov 2020 13:54:01 -0800
Subject: [PATCH] openmp: Implicitly add 'declare target' directives for
 dynamic initializers in C++

2020-11-18  Kwok Cheung Yeung  

gcc/
* langhooks-def.h (lhd_get_decl_init): New.
(LANG_HOOKS_GET_DECL_INIT): New.
(LANG_HOOKS_DECLS): Add LANG_HOOKS_GET_DECL_INIT.
* langhooks.h (struct lang_hooks_for_decls): Add get_decl_init.
* omp-offload.c (omp_discover_declare_target_var_r): Use
get_decl_init langhook in place of DECL_INITIAL.

gcc/cp/
* cp-lang.c (cxx_get_decl_init): New.
(LANG_HOOKS_GET_DECL_INIT): New.
* cp-tree.h (dynamic_initializers): New.
* decl.c (dynamic_initializers): New.
* decl2.c (c_parse_final_cleanups): Copy vars into
dynamic_initializers.

gcc/testsuite/
* g++.dg/gomp/declare-target-3.C: New.
---
 gcc/cp/cp-lang.c | 24 +
 

Re: [PATCH] libstdc++: Enable <thread> without gthreads

2020-11-19 Thread Tom Tromey
> "Jonathan" == Jonathan Wakely  writes:

Jonathan> Here's a slightly more conservative version of the patch. This moves
Jonathan> std::thread and this_thread::get_id() and this_thread::yield() to a
Jonathan> new header, and makes *most* of std::thread defined without gthreads
Jonathan> (because we need the nested thread::id type to be returned from
Jonathan> this_thread::get_id()). But it doesn't declare the std::thread
Jonathan> constructor that creates new threads.
...
Jonathan> Both this and the previous patch require some GDB changes, because GDB
Jonathan> currently assumes that if std::thread is declared in <thread> that it
Jonathan> is usable and multiple threads are supported. That's no longer true,
Jonathan> because we would declare a useless std::thread after this patch. Tom
Jonathan> Tromey has patches to make GDB handle this though.

It turns out that with this approach, there's nothing to do in gdb,
because luckily the configure check looks to see if the constructor is
usable:

AC_CACHE_CHECK([for std::thread],
   gdb_cv_cxx_std_thread,
   [AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
[[#include <thread>
  void callback() { }]],
[[std::thread t(callback);]])],

I will probably still check in the patch to catch system_error when
starting a thread, though.
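
For illustration, catching std::system_error around thread creation could
look like the sketch below.  This is not the gdb patch; the function name
run_on_thread_or_inline is invented, and the sketch only covers a runtime
failure to start a thread, not a library in which the constructor is not
declared at all.

```cpp
#include <functional>
#include <system_error>
#include <thread>

// Run fn on a new thread if one can be started, otherwise run it inline.
// Returns true if a separate thread was used.
bool
run_on_thread_or_inline (const std::function<void ()> &fn)
{
  try
    {
      std::thread t (fn);
      t.join ();
      return true;
    }
  catch (const std::system_error &)
    {
      // The thread could not be started; fall back to the calling thread.
      fn ();
      return false;
    }
}
```

Either way the callback runs exactly once, so callers do not need to care
whether threads are actually available.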

thanks,
Tom


Re: [PATCH] c++: Implement -Wuninitialized for mem-initializers [PR19808]

2020-11-19 Thread Jason Merrill via Gcc-patches

On 11/17/20 3:44 AM, Jan Hubicka wrote:

On Tue, Nov 17, 2020 at 01:33:48AM -0500, Jason Merrill via Gcc-patches wrote:

Why doesn't the middle-end warning work for inline functions?


It does but only when they're called (and, as usual, also unless
the uninitialized use is eliminated).


Yes, but why?  I assume because we don't bother going through all the phases
of compilation for unused inlines, but couldn't we change that when we're
asking for (certain) warnings?


CCing Richard and Honza on this.

I think we don't even gimplify unused functions, the
cgraph code just throws them away.  Even trying just to run the first few
passes (gimplification up to uninit1) would have several high costs,

Note that uninit1 is a late pass so it is not just a few passes we speak
about.  Late passes are run only on code that really lands in the .s file,
so enabling them would mean splitting the pass queue and running another
unreachable code somewhere.  That would confuse the inliner and other IPA
passes since they will have to somehow deal with dead code in their
program size estimate and also affect LTO.

Even early passes are run only on the reachable portion of the program, since
functions are analyzed by cgraphunit on demand (only if they are
analyzed by someone else). Similar logic is also done by the C++ FE to decide
what templates.  Changing this would also have quite some compile
time/memory use impact.

There is -fkeep-inline-functions.


OK, thanks for the explanation.  -fkeep-inline-functions seems like an 
acceptable answer for people who want a warning audit of their library 
header inlines.


Martin, I notice that the middle-end warning doesn't currently catch this:

struct B { int i,j; };

struct A
{
  B b;
  A(): b({b.i}) { }
};

A a;

It does warn if B only has one member; adding the second wrongly 
silences the warning.


Jason



Re: [PATCH] c++: Fix crash with broken deduction from {} [PR97895]

2020-11-19 Thread Jason Merrill via Gcc-patches

On 11/19/20 11:11 AM, Marek Polacek wrote:

Unfortunately, the otherwise beautiful

   for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))

is not immune to an empty constructor, so we have to check
CONSTRUCTOR_ELTS first.
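
The pitfall is the usual one with a range-based for over a dereferenced
pointer.  A minimal sketch with a std::vector standing in for the element
vector (node and sum_elts are hypothetical, not GCC types or functions):

```cpp
#include <vector>

// Hypothetical stand-in for a node whose element vector pointer may be
// null when there are no elements.
struct node
{
  std::vector<int> *elts;
};

int
sum_elts (const node &n)
{
  int sum = 0;
  // Dereferencing a null pointer to start the range-for would be
  // undefined behavior, so check the pointer first.
  if (n.elts)
    for (int v : *n.elts)
      sum += v;
  return sum;
}
```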

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/97895
* pt.c (do_auto_deduction): Don't crash when the constructor has
zero elements.

gcc/testsuite/ChangeLog:

PR c++/97895
* g++.dg/cpp0x/auto54.C: New test.
---
  gcc/cp/pt.c | 11 +++
  gcc/testsuite/g++.dg/cpp0x/auto54.C | 10 ++
  2 files changed, 17 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto54.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 1babf833d32..a1b6631d691 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29250,10 +29250,13 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
  return error_mark_node;
  
if (BRACE_ENCLOSED_INITIALIZER_P (init))

-/* We don't recurse here because we can't deduce from a nested
-   initializer_list.  */
-for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
-  elt.value = resolve_nondeduced_context (elt.value, complain);
+{
+  /* We don't recurse here because we can't deduce from a nested
+initializer_list.  */
+  if (CONSTRUCTOR_ELTS (init))
+   for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))
+ elt.value = resolve_nondeduced_context (elt.value, complain);
+}
else
  init = resolve_nondeduced_context (init, complain);
  
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto54.C b/gcc/testsuite/g++.dg/cpp0x/auto54.C

new file mode 100644
index 000..0c1815a99bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/auto54.C
@@ -0,0 +1,10 @@
+// PR c++/97895
+// { dg-do compile { target c++11 } }
+
+namespace std {
+  template<typename T> struct initializer_list {
+const T *ptr;
+decltype(sizeof 0) n;
+  };
+  auto a = {}; // { dg-error "unable to deduce" }
+}

base-commit: 25bb75f841c552cfd27a4344b7487efbe35b4481





Re: [PATCH] c++: Fix array new with value-initialization [PR97523]

2020-11-19 Thread Jason Merrill via Gcc-patches

On 11/19/20 11:11 AM, Marek Polacek wrote:

Since my r11-3092 the following is rejected with -std=c++20:

   struct T { explicit T(); };
   void fn(int n) {
 new T[1]();
   }

with "would use explicit constructor 'T::T()'".  It is because since
that change we go into the P1009 block in build_new (array_p is false,
but nelts is non-null and we're in C++20).  Since we only have (), we
build a {} and continue to build_new_1, which then calls build_vec_init
and then we error because the {} isn't CONSTRUCTOR_IS_DIRECT_INIT.

For (), which is value-initializing, we want to do what we were doing
before: pass empty init and let build_value_init take care of it.
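
As a stand-alone illustration of the accepted form, here is a minimal sketch.
The struct mirrors the T from the report; the member v and the function
first_value are hypothetical additions so that the result can be observed.

```cpp
#include <cassert>

struct T
{
  explicit T () : v (7) {}
  int v;
};

// new T[n]() value-initializes each element, which for a class with a
// user-provided default constructor means calling that constructor directly.
// No copy-initialization is involved, so `explicit` is not an obstacle.
int
first_value (int n)
{
  assert (n > 0);
  T *p = new T[n] ();
  int result = p[0].v;
  delete[] p;
  return result;
}
```

By contrast, new T[n]{} goes through aggregate initialization of the array,
which is where the explicit constructor gets rejected.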

For various reasons I wanted to dig a little bit deeper into this,
and as a result, I'm adding a test for [expr.new]/24 (and checked that
our current behavior matches clang++).

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

PR c++/97523
* init.c (build_new): When value-initializing an array new,
leave the INIT as an empty vector.

gcc/testsuite/ChangeLog:

PR c++/97523
* g++.dg/expr/anew5.C: New test.
* g++.dg/expr/anew6.C: New test.
---
  gcc/cp/init.c |  6 +-
  gcc/testsuite/g++.dg/expr/anew5.C | 26 
  gcc/testsuite/g++.dg/expr/anew6.C | 33 +++
  3 files changed, 64 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/expr/anew5.C
  create mode 100644 gcc/testsuite/g++.dg/expr/anew6.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index ffb84ea5b09..0b98f338feb 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3766,7 +3766,11 @@ build_new (location_t loc, vec<tree, va_gc> **placement, 
tree type,
  
/* P1009: Array size deduction in new-expressions.  */

const bool array_p = TREE_CODE (type) == ARRAY_TYPE;
-  if (*init && (array_p || (nelts && cxx_dialect >= cxx20)))
+  if (*init
+  /* If ARRAY_P, we have to deduce the array bound.  For C++20 paren-init,
+we have to process the parenthesized-list.  But don't do it for (),
+which is value-initialization, and INIT should stay empty.  */
+  && (array_p || (cxx_dialect >= cxx20 && nelts && !(*init)->is_empty ())))
  {
/* This means we have 'new T[]()'.  */
if ((*init)->is_empty ())
diff --git a/gcc/testsuite/g++.dg/expr/anew5.C 
b/gcc/testsuite/g++.dg/expr/anew5.C
new file mode 100644
index 000..d597caf5483
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew5.C
@@ -0,0 +1,26 @@
+// PR c++/97523
+// { dg-do compile }
+// We were turning the () into {} which made it seem like
+// aggregate-initialization (we are dealing with arrays here), which
+// performs copy-initialization, which only accepts converting constructors.
+
+struct T {
+  explicit T();
+  T(int);
+};
+
+void
+fn (int n)
+{
+  new T[1]();
+  new T[2]();
+  new T[3]();
+  new T[n]();
+#if __cpp_aggregate_paren_init
+  new T[]();
+  new T[2](1, 2);
+  // T[2] is initialized via copy-initialization, so we can't call
+  // explicit T().
+  new T[3](1, 2); // { dg-error "explicit constructor" "" { target c++20 } }
+#endif
+}
diff --git a/gcc/testsuite/g++.dg/expr/anew6.C 
b/gcc/testsuite/g++.dg/expr/anew6.C
new file mode 100644
index 000..0542daac275
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew6.C
@@ -0,0 +1,33 @@
+// PR c++/97523
+// { dg-do compile { target c++11 } }
+
+// [expr.new]/24: If the new-expression creates an object or an array of
+// objects of class type, access and ambiguity control are done for the
+// [...] constructor selected for the initialization (if any).
+// NB: We only check for a default constructor if the array has a non-constant
+// bound, or there are insufficient initializers.  Since an array is an
+// aggregate, we perform aggregate-initialization, which performs
+// copy-initialization, so we only accept converting constructors.
+
+struct T {
+  explicit T();
+  T(int);
+};
+
+struct S {
+  S(int);
+};
+
+void
+fn (int n)
+{
+  new T[1]{}; // { dg-error "explicit constructor" }
+  new T[2]{1, 2};
+  new T[3]{1, 2}; // { dg-error "explicit constructor" }
+  new T[n]{}; // { dg-error "explicit constructor" }
+
+  new S[1]{}; // { dg-error "could not convert" }
+  new S[2]{1, 2};
+  new S[3]{1, 2}; // { dg-error "could not convert" }
+  new S[n]{}; // { dg-error "could not convert" }
+}

base-commit: 2729378d0905a04e476a8bdcaaf0288f417810ec





Re: [PATCH] rs6000: Fix p8_mtvsrd_df's insn type

2020-11-19 Thread David Edelsohn via Gcc-patches
On Thu, Nov 19, 2020 at 1:54 AM Kewen.Lin  wrote:
>
> Hi,
>
> The insn type of p8_mtvsrd_df appears not to have been updated
> to mtvsr.  I assume all uses of mtvsrd should have the same
> insn type.
>
> This patch changes its current insn type from mfvsr to mtvsr.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.md (p8_mtvsrd_df): Fix insn type.

Good that you noticed it. Okay for trunk.

Thanks, David


[PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
Hi!

The documentation for POST_MODIFY says:
   Currently, the compiler can only handle second operands of the
   form (plus (reg) (reg)) and (plus (reg) (const_int)), where
   the first operand of the PLUS has to be the same register as
   the first operand of the *_MODIFY.
The following testcase ICEs, because combine just attempts to simplify
things and ends up with
(post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1))
but the target predicates accept it, because they only verify
that POST_MODIFY's second operand is PLUS and the second operand
of the PLUS is a REG.

The following patch fixes this by performing further verification that
the POST_MODIFY is in the form it should be.
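
To make the shape check concrete, here is a toy model of it.  This is a
sketch only: it does not use GCC's rtx API, and node, code,
valid_post_modify_by_reg and post_modify_example are all hypothetical names
chosen for the illustration.

```cpp
#include <cassert>

// Toy expression tree; NOT GCC's rtx, only a model of the shapes involved.
enum class code { reg, const_int, mult, plus, post_modify };

struct node
{
  code c;
  const node *op0;   // first operand (null for leaves)
  const node *op1;   // second operand (null for leaves)
  int value;         // register number or integer constant for leaves
};

static bool
is_reg (const node *n)
{
  return n && n->c == code::reg;
}

// True if both nodes name the same register.
static bool
same_reg (const node *a, const node *b)
{
  return is_reg (a) && is_reg (b) && a->value == b->value;
}

// Accept only (post_modify (reg A) (plus (reg A) (reg B))).
bool
valid_post_modify_by_reg (const node *ind)
{
  return ind
         && ind->c == code::post_modify
         && ind->op1 && ind->op1->c == code::plus
         && is_reg (ind->op1->op1)
         && is_reg (ind->op0)
         && same_reg (ind->op0, ind->op1->op0);
}

void
post_modify_example ()
{
  const node r1 = {code::reg, nullptr, nullptr, 1};
  const node r2 = {code::reg, nullptr, nullptr, 2};
  const node four = {code::const_int, nullptr, nullptr, 4};

  // (post_modify (reg 1) (plus (reg 1) (reg 2))): the documented form.
  const node good_plus = {code::plus, &r1, &r2, 0};
  const node good = {code::post_modify, &r1, &good_plus, 0};
  assert (valid_post_modify_by_reg (&good));

  // (post_modify (reg 1) (plus (mult (reg 2) (const_int 4)) (reg 1))):
  // the second operand of the PLUS is a REG, but the first is not reg 1.
  const node mul = {code::mult, &r2, &four, 0};
  const node bad_plus = {code::plus, &mul, &r1, 0};
  const node bad = {code::post_modify, &r1, &bad_plus, 0};
  assert (!valid_post_modify_by_reg (&bad));
}
```

A predicate that looked only at the PLUS and its second operand would accept
the second tree as well, which is the situation described above.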

Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk
and release branches after a while?

2020-11-19  Jakub Jelinek  

PR target/97528
* config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY, require
first POST_MODIFY operand is a REG and is equal to the first operand
of PLUS.

* gcc.target/arm/pr97528.c: New test.

--- gcc/config/arm/arm.c.jj 2020-11-13 19:00:46.729620560 +0100
+++ gcc/config/arm/arm.c2020-11-18 17:05:44.656867343 +0100
@@ -13429,7 +13429,9 @@ neon_vector_mem_operand (rtx op, int typ
   /* Allow post-increment by register for VLDn */
   if (type == 2 && GET_CODE (ind) == POST_MODIFY
   && GET_CODE (XEXP (ind, 1)) == PLUS
-  && REG_P (XEXP (XEXP (ind, 1), 1)))
+  && REG_P (XEXP (XEXP (ind, 1), 1))
+  && REG_P (XEXP (ind, 0))
+  && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0)))
  return true;
 
   /* Match:
--- gcc/testsuite/gcc.target/arm/pr97528.c.jj   2020-11-18 17:09:58.195053288 
+0100
+++ gcc/testsuite/gcc.target/arm/pr97528.c  2020-11-18 17:09:47.839168237 
+0100
@@ -0,0 +1,28 @@
+/* PR target/97528 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O1" }  */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+typedef __simd64_int16_t T;
+typedef __simd64_uint16_t U;
+unsigned short c;
+int d;
+U e;
+
+void
+foo (void)
+{
+  unsigned short *dst = &c;
+  int g = d, b = 4;
+  U dc = e;
+  for (int h = 0; h < b; h++)
+{
+  unsigned short *i = dst;
+  U j = dc;
+  vst1_s16 ((int16_t *) i, (T) j);
+  dst += g;
+}
+}


Jakub



Re: [C PATCH] Drop qualifiers during lvalue conversion

2020-11-19 Thread Joseph Myers
On Thu, 19 Nov 2020, Uecker, Martin wrote:

> Here is another version of the patch. The
> only difference is the additional check
> using 'tree_ssa_useless_type_conversion'.

The code changes in this one are OK.  However, in the test:

> +void f(void)
> +{
> + const int j;
> + typeof((0,j)) i10; i10 = j;;
> + typeof(+j) i11; i11 = j;;
> + typeof(-j) i12; i12 = j;;
> + typeof(1?j:0) i13; i13 = j;;
> + typeof((int)j) i14; i14 = j;;
> + typeof((const int)j) i15; i15 = j;;
> +}

This test function seems fine.

> +void g(void)
> +{
> + volatile int j;
> + typeof((0,j)) i21; i21 = j;;
> + typeof(+j) i22; i22 = j;;
> + typeof(-j) i23; i23 = j;;
> + typeof(1?j:0) i24; i24 = j;;
> + typeof((int)j) i25; i25 = j;;
> + typeof((volatile int)j) i26; i26 = j;;
> +}
> +
> +void h(void)
> +{
> + _Atomic int j;
> + typeof((0,j)) i32; i32 = j;;
> + typeof(+j) i33; i33 = j;;
> + typeof(-j) i34; i34 = j;;
> + typeof(1?j:0) i35; i35 = j;;
> + typeof((int)j) i36; i36 = j;;
> + typeof((_Atomic int)j) i37; i37 = j;;
> +}
> +
> +void e(void)
> +{
> + int* restrict j;
> + typeof((0,j)) i43; i43 = j;;
> + typeof(1?j:0) i44; i44 = j;;
> + typeof((int*)j) i45; i45 = j;;
> + typeof((int* restrict)j) i46; i46 = j;;
> +}

But these tests don't look like they do anything useful (i.e. verify that 
typeof loses the qualifier), because testing by assignment like that only 
works with const.  You could do e.g.

volatile int j;
extern int i;
extern typeof((0,j)) i;

instead to verify the qualifier is removed.
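
The same redeclaration trick can be tried in C++, with decltype standing in
for typeof.  This is only an analogue under that assumption, not the C test
itself; the names kv and k are arbitrary.  It is limited to operators that
yield prvalues in C++, since the comma case (0,kv) stays an lvalue there.

```cpp
#include <type_traits>

volatile int kv = 0;

// Redeclaring k is accepted only if both declarations name the same type,
// so this pair compiles only if decltype (+kv) is plain int.
extern int k;
extern decltype (+kv) k;
int k = 3;

// The same property spelled as a direct type comparison.
static_assert (std::is_same<decltype (+kv), int>::value,
               "unary plus yields an unqualified prvalue");
static_assert (std::is_same<decltype ((int) kv), int>::value,
               "a cast to int yields an unqualified prvalue");
```

If the qualifier were kept, the second declaration of k would be diagnosed as
a conflicting declaration, which is what makes the test self-checking.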

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] libstdc++: Enable without gthreads

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 19/11/20 13:36 +, Jonathan Wakely wrote:

On 16/11/20 14:43 -0800, Thomas Rodgers wrote:

This patch looks good to me.


Committed now.


This patch was also needed, but I don't understand why I didn't see
the FAILs on gcc135 in the cfarm.

Anyway, tested x86_64-linux, committed to trunk.



commit 5e6a43158d2e5b26616716c50badedd3400c6bea
Author: Jonathan Wakely 
Date:   Thu Nov 19 16:17:33 2020

libstdc++: Add missing header to some tests

These tests use std::this_thread::sleep_for without including <thread>.

libstdc++-v3/ChangeLog:

* testsuite/30_threads/async/async.cc: Include <thread>.
* testsuite/30_threads/future/members/93456.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 1c779bfbcad4..b06c2553c952 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -22,6 +22,7 @@
 
 
 #include 
+#include <thread>
 #include 
 
 using namespace std;
diff --git a/libstdc++-v3/testsuite/30_threads/future/members/93456.cc b/libstdc++-v3/testsuite/30_threads/future/members/93456.cc
index 8d6a5148ce3c..9d1cbcef0013 100644
--- a/libstdc++-v3/testsuite/30_threads/future/members/93456.cc
+++ b/libstdc++-v3/testsuite/30_threads/future/members/93456.cc
@@ -22,6 +22,7 @@
 
 
 #include 
+#include <thread>
 #include 
 #include 
 #include 


Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Jeff Law via Gcc-patches



On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> guojiufu  writes:
>
> Hi,
>
> In this patch, the default value of
> param=max-unrolled-average-calls-x1 is '0', which means that to unroll
> a loop, there should be no call inside the body.  Do I need to set the
> default value to a bigger value (16?) for later tuning?  A bigger value will
> keep the behavior unchanged.
>
> And is this patch ok for trunk?  Thanks a lot for your comments!
>
> BR.
> Jiufu.
>
>
>> Hi,
>>
>> When unrolling loops, calls inside the loop may have a negative
>> impact on the unrolled code.  This patch adds a param,
>> param_max_unrolled_calls, and if the number of calls inside
>> the loop is bigger than this param, the loop is prevented from unrolling.
>>
>> This patch checks the _average_ number of calls, which is the
>> sum of the call counts, each multiplied by the probability that the call
>> is executed.  The _average_ number could be a fraction; to keep the
>> precision, the param is the threshold number multiply 1.
>>
>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>
>> gcc/ChangeLog
>> 2020-08-19  Jiufu Guo   
>>
>>  * params.opt (param_max_unrolled_average_calls_x1): New param.
>>  * cfgloop.h (average_num_loop_calls): New declare.
>>  * cfgloopanal.c (average_num_loop_calls): New function.
>>  * loop-unroll.c (decide_unroll_constant_iteration,
>>  decide_unroll_runtime_iterations,
>>  decide_unroll_stupid): Check average_num_loop_calls and
>>  param_max_unrolled_average_calls_x1.
So what's the motivation behind adding a PARAM to control this
behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
tune behavior (though I've made the same lapse in judgment myself).  In
my mind a PARAM is really more about controlling pathological behavior.
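
For readers following along, the averaged call count described in the quoted
patch can be sketched as below.  This is only an illustration under stated
assumptions: call_site, scaled_average_calls and may_unroll are hypothetical
names, and the scale factor of 10000 is an assumed value for the fixed-point
encoding, not one taken from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One call site inside a loop body: how many calls it makes and the
// estimated probability that it executes on a given iteration.
struct call_site
{
  int calls;
  double probability;
};

// Assumed fixed-point scale so a fractional average fits an integer param.
constexpr long scale = 10000;

// Expected number of calls per iteration, scaled to an integer.
long
scaled_average_calls (const std::vector<call_site> &sites)
{
  double average = 0.0;
  for (const call_site &s : sites)
    {
      assert (s.probability >= 0.0 && s.probability <= 1.0);
      average += s.calls * s.probability;
    }
  return std::lround (average * scale);
}

// Unrolling is refused once the scaled average exceeds the threshold.
bool
may_unroll (const std::vector<call_site> &sites, long max_scaled_calls)
{
  return scaled_average_calls (sites) <= max_scaled_calls;
}
```

With a threshold of 0, any loop body that contains a call with non-zero
probability is refused, which matches the default described in the quoted
mail.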

jeff



Re: libstdc++: Avoid zero-probability events in discrete_distribution [PR61369]

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 19/11/20 12:57 -0500, Lewis Hyatt via Libstdc++ wrote:

Hello-

PR61369 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61369) points out
that std::discrete_distribution can return an event even if it has 0
probability, and proposes a simple fix. It seems that this fix was never
applied, because there was an expectation of redoing this code anyway to
use a more efficient algorithm (PR57925). Given that this new algorithm
has not been implemented so far, would it make sense to apply the simple
fix to address this issue? The attached patch does this.
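
The difference between the two searches shows up on a three-entry table.  The
following is a sketch with hypothetical helper names, not the libstdc++
implementation; it assumes a cumulative-probability table of the kind the
distribution searches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index chosen when searching a cumulative-probability table with
// lower_bound: the first entry that is not less than p.
std::size_t
pick_with_lower_bound (const std::vector<double> &cp, double p)
{
  return static_cast<std::size_t>
    (std::lower_bound (cp.begin (), cp.end (), p) - cp.begin ());
}

// Same search with upper_bound: the first entry strictly greater than p.
std::size_t
pick_with_upper_bound (const std::vector<double> &cp, double p)
{
  return static_cast<std::size_t>
    (std::upper_bound (cp.begin (), cp.end (), p) - cp.begin ());
}

// Weights {0.0, 0.5, 0.5} give the cumulative table {0.0, 0.5, 1.0}.
void
zero_probability_example ()
{
  const std::vector<double> cp = {0.0, 0.5, 1.0};
  // A draw of exactly 0 lands on event 0, which has probability 0.
  assert (pick_with_lower_bound (cp, 0.0) == 0);
  // upper_bound skips past every entry equal to the draw.
  assert (pick_with_upper_bound (cp, 0.0) == 1);
  // Away from the boundaries both searches agree.
  assert (pick_with_lower_bound (cp, 0.25)
          == pick_with_upper_bound (cp, 0.25));
}
```

Only draws that coincide exactly with a table entry are affected, so the
change matters for the zero-probability case and is invisible otherwise.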

One question about the patch, a slight annoyance is that only
std::lower_bound() is currently available in random.tcc, as this file
includes only bits/stl_algobase.h and not bits/stl_algo.h (via including
). Is there a preference between simply including stl_algo.h, or
moving upper_bound to stl_algobase.h, where lower_bound is? I noticed
that in C++20 mode,  includes stl_algo.h already, so I figured
it would be fine to just include it in random.tcc unconditionally.


But the increase in header sizes in C++20 is a regression:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92546

Anyway, I'll review this patch tomorrow, thanks for sending it.


bootstrap + testing were done on x86-64 GNU/Linux, all tests the same
before + after plus 2 new passes from the new test. Thanks for taking a
look!

-Lewis



From: Lewis Hyatt 
Date: Wed, 18 Nov 2020 17:12:51 -0500
Subject: [PATCH] libstdc++: Avoid zero-probability events in 
discrete_distribution [PR61369]

Fixes PR61369, as recommended by the PR's submitter, by replacing
lower_bound() with upper_bound(). Currently, if there is an initial subset of
events with probability 0, the first of them will be returned with non-zero
probability (if the underlying RNG returns exactly 0). Switching to
upper_bound() ensures that this will not happen.

libstdc++-v3/ChangeLog:

PR libstdc++/61369
* include/bits/random.tcc: Include bits/stl_algo.h.
(discrete_distribution::operator()): Use upper_bound rather than
lower_bound.
* testsuite/26_numerics/random/pr60037-neg.cc: Adapt to new line
numbering in random.tcc.
* testsuite/26_numerics/random/discrete_distribution/pr61369.cc: New
test.

diff --git a/libstdc++-v3/include/bits/random.tcc 
b/libstdc++-v3/include/bits/random.tcc
index 3205442f2f6..14fe4f39c7b 100644
--- a/libstdc++-v3/include/bits/random.tcc
+++ b/libstdc++-v3/include/bits/random.tcc
@@ -31,6 +31,7 @@
#define _RANDOM_TCC 1

#include <numeric> // std::accumulate and std::partial_sum
+#include <bits/stl_algo.h> // std::upper_bound

namespace std _GLIBCXX_VISIBILITY(default)
{
@@ -2706,7 +2707,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __aurng(__urng);

const double __p = __aurng();
-   auto __pos = std::lower_bound(__param._M_cp.begin(),
+   auto __pos = std::upper_bound(__param._M_cp.begin(),
  __param._M_cp.end(), __p);

return __pos - __param._M_cp.begin();
diff --git 
a/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc 
b/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc
new file mode 100644
index 000..f8fa97e293e
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/random/discrete_distribution/pr61369.cc
@@ -0,0 +1,55 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+// { dg-require-cstdint "" }
+
+#include 
+#include 
+#include 
+#include 
+
+class not_so_random
+{
+public:
+  using result_type = std::uint64_t;
+
+  static constexpr result_type
+  min()
+  { return 0u; }
+
+  static constexpr result_type
+  max()
+  { return std::numeric_limits::max(); }
+
+  result_type
+  operator()() const
+  { return 0u; }
+};
+
+void
+test01()
+{
+  std::discrete_distribution<> u{0.0, 0.5, 0.5};
+  not_so_random rng;
+  VERIFY( u(rng) > 0 );
+}
+
+int main()
+{
+  test01();
+}
diff --git a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc 
b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
index ba252ef34fe..4d00d1846c4 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
@@ -12,4 +12,4 @@ auto x = std::generate_canonical



Fix two issues I introduced in operand_equal_p

2020-11-19 Thread Jan Hubicka
Hi,
doing some further testing and analysis of icf miscompares I noticed that
my change for handling OEP_ADDRESS_OF of COMPONENT_REF had a last-minute
change that made it ineffective, since the flag is cleared before the
conditional.  After some experimenting it seems cleanest to just use
a temporary bool.
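
The thinko reduces to a few lines.  The sketch below uses hypothetical names
(TOY_ADDRESS_OF and the two functions) rather than the real fold-const.c
code, and only models the order of the clear and the test.

```cpp
#include <cassert>

constexpr unsigned TOY_ADDRESS_OF = 1u << 0;
constexpr unsigned TOY_OTHER_FLAG = 1u << 1;

// Buggy shape: the bit is cleared first, so the later test never succeeds.
bool
sees_address_of_buggy (unsigned flags)
{
  flags &= ~TOY_ADDRESS_OF;
  return (flags & TOY_ADDRESS_OF) != 0;
}

// Fixed shape: remember the bit in a temporary before clearing it.
bool
sees_address_of_fixed (unsigned flags)
{
  const bool compare_address = (flags & TOY_ADDRESS_OF) != 0;
  flags &= ~TOY_ADDRESS_OF;
  (void) flags;  // the cleared value is what nested comparisons would see
  return compare_address;
}

void
flag_order_example ()
{
  assert (!sees_address_of_buggy (TOY_ADDRESS_OF));
  assert (sees_address_of_fixed (TOY_ADDRESS_OF));
  assert (!sees_address_of_fixed (TOY_OTHER_FLAG));
}
```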

The other problem is that obj-C++ produces OBJ_TYPE_REFs that refer
to something other than class types. obj_type_ref_class asserts on that,
since one is expected to use the virtual_method_call_p predicate first.
It would be nice to make obj-C++ either produce standard OBJ_TYPE_REFs
or use a different code for that, but that is for another day.

I apologize for both - clearly need a break which I will do. Fortunately
it seems that ICF issues tracked in PR92535 are almost resolved.

lto-bootstrapped and regtested x86_64-linux, committed.

Honza

* fold-const.c (operand_compare::operand_equal_p): Fix thinko in
COMPONENT_REF handling and guard types_same_for_odr by
virtual_method_call_p.
(operand_compare::hash_operand): Likewise.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 820b08d26fd..1bce9e72c1d 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3314,30 +3314,34 @@ operand_compare::operand_equal_p (const_tree arg0, 
const_tree arg1,
 may be NULL when we're called to compare MEM_EXPRs.  */
  if (!OP_SAME_WITH_NULL (0))
return false;
- /* Most of time we only need to compare FIELD_DECLs for equality.
-However when determining address look into actual offsets.
-These may match for unions and unshared record types.  */
- flags &= ~OEP_ADDRESS_OF;
- if (!OP_SAME (1))
-   {
- if (flags & OEP_ADDRESS_OF)
-   {
- if (TREE_OPERAND (arg0, 2)
- || TREE_OPERAND (arg1, 2))
-   return OP_SAME_WITH_NULL (2);
- tree field0 = TREE_OPERAND (arg0, 1);
- tree field1 = TREE_OPERAND (arg1, 1);
-
- if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
-   DECL_FIELD_OFFSET (field1), flags)
- || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
-  DECL_FIELD_BIT_OFFSET (field1),
-  flags))
-   return false;
-   }
- else
-   return false;
-   }
+ {
+   bool compare_address = flags & OEP_ADDRESS_OF;
+
+   /* Most of time we only need to compare FIELD_DECLs for equality.
+  However when determining address look into actual offsets.
+  These may match for unions and unshared record types.  */
+   flags &= ~OEP_ADDRESS_OF;
+   if (!OP_SAME (1))
+ {
+   if (compare_address)
+ {
+   if (TREE_OPERAND (arg0, 2)
+   || TREE_OPERAND (arg1, 2))
+ return OP_SAME_WITH_NULL (2);
+   tree field0 = TREE_OPERAND (arg0, 1);
+   tree field1 = TREE_OPERAND (arg1, 1);
+
+   if (!operand_equal_p (DECL_FIELD_OFFSET (field0),
+ DECL_FIELD_OFFSET (field1), flags)
+   || !operand_equal_p (DECL_FIELD_BIT_OFFSET (field0),
+DECL_FIELD_BIT_OFFSET (field1),
+flags))
+ return false;
+ }
+   else
+ return false;
+ }
+ }
  return OP_SAME_WITH_NULL (2);
 
case BIT_FIELD_REF:
@@ -3436,8 +3440,11 @@ operand_compare::operand_equal_p (const_tree arg0, 
const_tree arg1,
if (!operand_equal_p (OBJ_TYPE_REF_OBJECT (arg0),
  OBJ_TYPE_REF_OBJECT (arg1), flags))
  return false;
-   if (!types_same_for_odr (obj_type_ref_class (arg0),
-obj_type_ref_class (arg1)))
+   if (virtual_method_call_p (arg0) != virtual_method_call_p (arg1))
+ return false;
+   if (virtual_method_call_p (arg0)
+   && !types_same_for_odr (obj_type_ref_class (arg0),
+   obj_type_ref_class (arg1)))
  return false;
return true;
 
@@ -3866,6 +3873,8 @@ operand_compare::hash_operand (const_tree t, 
inchash::hash &hstate,
  flags &= ~OEP_ADDRESS_OF;
  inchash::add_expr (OBJ_TYPE_REF_TOKEN (t), hstate, flags);
  inchash::add_expr (OBJ_TYPE_REF_OBJECT (t), hstate, flags);
+ if (!virtual_method_call_p (t))
+   return;
  if (tree c = obj_type_ref_class (t))
{
  c = TYPE_NAME (TYPE_MAIN_VARIANT (c));


Re: [C PATCH] Drop qualifiers during lvalue conversion

2020-11-19 Thread Uecker, Martin
Am Donnerstag, den 19.11.2020, 18:58 + schrieb Joseph Myers:
> On Thu, 19 Nov 2020, Uecker, Martin wrote:

...
> 
> > +void g(void)
> > +{
> > + volatile int j;
> > + typeof((0,j)) i21; i21 = j;;
> > + typeof(+j) i22; i22 = j;;
> > + typeof(-j) i23; i23 = j;;
> > + typeof(1?j:0) i24; i24 = j;;
> > + typeof((int)j) i25; i25 = j;;
> > + typeof((volatile int)j) i26; i26 = j;;
> > +}
> > +
> > +void h(void)
> > +{
> > + _Atomic int j;
> > + typeof((0,j)) i32; i32 = j;;
> > + typeof(+j) i33; i33 = j;;
> > + typeof(-j) i34; i34 = j;;
> > + typeof(1?j:0) i35; i35 = j;;
> > + typeof((int)j) i36; i36 = j;;
> > + typeof((_Atomic int)j) i37; i37 = j;;
> > +}
> > +
> > +void e(void)
> > +{
> > + int* restrict j;
> > + typeof((0,j)) i43; i43 = j;;
> > + typeof(1?j:0) i44; i44 = j;;
> > + typeof((int*)j) i45; i45 = j;;
> > + typeof((int* restrict)j) i46; i46 = j;;
> > +}
> 
> But these tests don't look like they do anything useful (i.e. verify that 
> typeof loses the qualifier), because testing by assignment like that only 
> works with const.  You could do e.g.
> 
> volatile int j;
> extern int i;
> extern typeof((0,j)) i;
> 
> instead to verify the qualifier is removed.

Apparently I did not have enough coffee when
generalizing this to the other qualifiers. 

Ok, with the following test?



/* test that lvalue conversions drops qualifiers, Bug 97702 */
/* { dg-do compile } */
/* { dg-options "" } */


const int jc;
extern int j;
extern typeof(0,jc) j;
extern typeof(+jc) j;
extern typeof(-jc) j;
extern typeof(1?jc:0) j;
extern typeof((int)jc) j;
extern typeof((const int)jc) j;

volatile int kv;
extern int k;
extern typeof(0,kv) k;
extern typeof(+kv) k;
extern typeof(-kv) k;
extern typeof(1?kv:0) k;
extern typeof((int)kv) k;
extern typeof((volatile int)kv) k;

_Atomic int la;
extern int l;
extern typeof(0,la) l;
extern typeof(+la) l;
extern typeof(-la) l;
extern typeof(1?la:0) l;
extern typeof((int)la) l;
extern typeof((_Atomic int)la) l;

int * restrict mr;
extern int *m;
extern typeof(0,mr) m;
extern typeof(1?mr:0) m;
extern typeof((int *)mr) m;
extern typeof((int * restrict)mr) m;



Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> > guojiufu  writes:
> >> When unroll loops, if there are calls inside the loop, those calls
> >> may raise negative impacts for unrolling.  This patch adds a param
> >> param_max_unrolled_calls, and checks if the number of calls inside
> >> the loop bigger than this param, loop is prevent from unrolling.
> >>
> >> This patch is checking the _average_ number of calls which is the
> >> summary of call numbers multiply the possibility of the call maybe
> >> executed.  The _average_ number could be a fraction, to keep the
> >> precision, the param is the threshold number multiply 1.
> >>
> >> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
> >>
> >> gcc/ChangeLog
> >> 2020-08-19  Jiufu Guo   
> >>
> >>* params.opt (param_max_unrolled_average_calls_x1): New param.
> >>* cfgloop.h (average_num_loop_calls): New declare.
> >>* cfgloopanal.c (average_num_loop_calls): New function.
> >>* loop-unroll.c (decide_unroll_constant_iteration,
> >>decide_unroll_runtime_iterations,
> >>decide_unroll_stupid): Check average_num_loop_calls and
> >>param_max_unrolled_average_calls_x1.
> So what's the motivation behind adding a PARAM to control this
> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
> tune behavior (though I've made the same lapse in judgment myself).  In
> my mind a PARAM is really more about controlling pathological behavior.

But we (Power) need very different tuning than what others apparently
need.  It is similar to inlining, in that that also differs a lot
between archs how aggressively to do that optimally.


Segher


Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Jeff Law via Gcc-patches



On 11/19/20 12:42 PM, Segher Boessenkool wrote:
> On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
>> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
>>> guojiufu  writes:
>>>> When unroll loops, if there are calls inside the loop, those calls
>>>> may raise negative impacts for unrolling.  This patch adds a param
>>>> param_max_unrolled_calls, and checks if the number of calls inside
>>>> the loop bigger than this param, loop is prevent from unrolling.
>>>>
>>>> This patch is checking the _average_ number of calls which is the
>>>> summary of call numbers multiply the possibility of the call maybe
>>>> executed.  The _average_ number could be a fraction, to keep the
>>>> precision, the param is the threshold number multiply 1.
>>>>
>>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>>>
>>>> gcc/ChangeLog
>>>> 2020-08-19  Jiufu Guo   
>>>>
>>>> * params.opt (param_max_unrolled_average_calls_x1): New param.
>>>> * cfgloop.h (average_num_loop_calls): New declare.
>>>> * cfgloopanal.c (average_num_loop_calls): New function.
>>>> * loop-unroll.c (decide_unroll_constant_iteration,
>>>> decide_unroll_runtime_iterations,
>>>> decide_unroll_stupid): Check average_num_loop_calls and
>>>> param_max_unrolled_average_calls_x1.
>> So what's the motivation behind adding a PARAM to control this
>> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
>> tune behavior (though I've made the same lapse in judgment myself).  In
>> my mind a PARAM is really more about controlling pathological behavior.
> But we (Power) need very different tuning than what others apparently
> need.  It is similar to inlining, in that that also differs a lot
> between archs how aggressively to do that optimally.
But what I think that argues is that we've got a gap in the costing
model and/or how it's being used.  Throwing PARAMS at the problem isn't
really useful for the end user.  The vast majority aren't going to use
them and of the ones that do, most are probably going to get it wrong.

In  my mind fixing things so they work with no magic arguments is best. 
PARAMS are the worst solution.  A -f flag with no arguments is somewhere
in between.  Others may clearly have different opinions here.


jeff



Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 12:53:27PM -0700, Jeff Law wrote:
> On 11/19/20 12:42 PM, Segher Boessenkool wrote:
> > On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
> >> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
> >>> guojiufu  writes:
> >>>> When unroll loops, if there are calls inside the loop, those calls
> >>>> may raise negative impacts for unrolling.  This patch adds a param
> >>>> param_max_unrolled_calls, and checks if the number of calls inside
> >>>> the loop bigger than this param, loop is prevent from unrolling.
> >>>>
> >>>> This patch is checking the _average_ number of calls which is the
> >>>> summary of call numbers multiply the possibility of the call maybe
> >>>> executed.  The _average_ number could be a fraction, to keep the
> >>>> precision, the param is the threshold number multiply 1.
> >>>>
> >>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
> >>>>
> >>>> gcc/ChangeLog
> >>>> 2020-08-19  Jiufu Guo   
> >>>>
> >>>> * params.opt (param_max_unrolled_average_calls_x1): New param.
> >>>> * cfgloop.h (average_num_loop_calls): New declare.
> >>>> * cfgloopanal.c (average_num_loop_calls): New function.
> >>>> * loop-unroll.c (decide_unroll_constant_iteration,
> >>>> decide_unroll_runtime_iterations,
> >>>> decide_unroll_stupid): Check average_num_loop_calls and
> >>>> param_max_unrolled_average_calls_x1.
> >> So what's the motivation behind adding a PARAM to control this
> >> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
> >> tune behavior (though I've made the same lapse in judgment myself).  In
> >> my mind a PARAM is really more about controlling pathological behavior.
> > But we (Power) need very different tuning than what others apparently
> > need.  It is similar to inlining, in that that also differs a lot
> > between archs how aggressively to do that optimally.
> But what I think that argues is that we've got a gap in the costing
> model and/or how it's being used.  Throwing PARAMS at the problem isn't
> really useful for the end user.  The vast majority aren't going to use
> them and of the ones that do, most are probably going to get it wrong.

No, the vast majority of people will *not* (consciously) use them,
because the target defaults will set things to useful values.

The compiler could use saner "generic" defaults perhaps, but those will
still not be satisfactory for anyone (except when they aren't generic in
fact but instead tuned for one arch ;-) ) -- unrolling is just too
important for performance.

> In  my mind fixing things so they work with no magic arguments is best. 
> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> in between.  Others may clearly have different opinions here.

There is no big difference between params and flags here, IMO -- it has
to be a -f with a value as well, for good results.

Since we have (almost) all such tunings in --param already, I'd say this
one belongs there as well?


Segher


Re: [PATCH,rs6000] Make MMA builtins use opaque modes [v2]

2020-11-19 Thread Peter Bergner via Gcc-patches
On 11/19/20 12:58 PM, acsaw...@linux.ibm.com wrote:
> +(define_expand "mma_disassemble_pair"
> +  [(match_operand:V16QI 0 "mma_disassemble_output_operand")
> +   (match_operand:OO 1 "input_operand")
> +   (match_operand 2 "const_0_to_1_operand")]

Maybe we should use vsx_register_operand instead of input_operand here?



> +(define_insn_and_split "*mma_disassemble_pair"
> +  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
> +   (unspec:V16QI [(match_operand:OO 1 "input_operand" "wa")
> +  (match_operand 2 "const_0_to_1_operand")]
> +   UNSPEC_MMA_EXTRACT))]

Likewise?



> +  "TARGET_MMA
> +   && fpr_reg_operand (operands[1], OOmode)"

pairs can be assigned to any vsx register, so I think we want
vsx_register_operand here too.




> +(define_expand "mma_disassemble_acc"
> +  [(match_operand:V16QI 0 "mma_disassemble_output_operand")
> +   (match_operand:XO 1 "input_operand")
> +   (match_operand 2 "const_0_to_3_operand")]

Likewise as above, do we want to use the fpr_reg_operand predicate here
instead of input_operand?



> +(define_insn_and_split "*mma_disassemble_acc"
> +  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
> +   (unspec:V16QI [(match_operand:XO 1 "input_operand" "d")
> +  (match_operand 2 "const_0_to_3_operand")]

Likewise?


Peter




c++: Expose constexpr hash table

2020-11-19 Thread Nathan Sidwell
Again, I noticed some cleanups on the way to preparing this exposure of 
the constexpr hash table.  Committing this to trunk


This patch exposes the constexpr hash table so that the modules
machinery can save and load constexpr bodies.  While there I noticed
that we could do a little constification of the hasher and comparator
functions.  Also combine the saving machinery to a single function
returning void -- nothing ever looked at its return value.

gcc/cp/
* cp-tree.h (struct constexpr_fundef): Moved from constexpr.c.
(maybe_save_constexpr_fundef): Declare.
(register_constexpr_fundef): Take constexpr_fundef object, return
void.
* decl.c (maybe_save_function_definition): Delete, functionality
moved to maybe_save_constexpr_fundef.
(emit_coro_helper, finish_function): Adjust.
* constexpr.c (struct constexpr_fundef): Moved to cp-tree.h.
(constexpr_fundef_hasher::equal): Constify.
(constexpr_fundef_hasher::hash): Constify.
(retrieve_constexpr_fundef): Make non-static.
(maybe_save_constexpr_fundef): Break out checking and duplication
from ...
(register_constexpr_fundef): ... here.  Just register the 
constexpr.



--
Nathan Sidwell
diff --git i/gcc/cp/constexpr.c w/gcc/cp/constexpr.c
index e6ab5eecd68..625410327b8 100644
--- i/gcc/cp/constexpr.c
+++ w/gcc/cp/constexpr.c
@@ -133,19 +133,10 @@ ensure_literal_type_for_constexpr_object (tree decl)
   return decl;
 }
 
-/* Representation of entries in the constexpr function definition table.  */
-
-struct GTY((for_user)) constexpr_fundef {
-  tree decl;
-  tree body;
-  tree parms;
-  tree result;
-};
-
 struct constexpr_fundef_hasher : ggc_ptr_hash
 {
-  static hashval_t hash (constexpr_fundef *);
-  static bool equal (constexpr_fundef *, constexpr_fundef *);
+  static hashval_t hash (const constexpr_fundef *);
+  static bool equal (const constexpr_fundef *, const constexpr_fundef *);
 };
 
 /* This table holds all constexpr function definitions seen in
@@ -158,7 +149,8 @@ static GTY (()) hash_table *constexpr_fundef_table;
same constexpr function.  */
 
 inline bool
-constexpr_fundef_hasher::equal (constexpr_fundef *lhs, constexpr_fundef *rhs)
+constexpr_fundef_hasher::equal (const constexpr_fundef *lhs,
+const constexpr_fundef *rhs)
 {
   return lhs->decl == rhs->decl;
 }
@@ -167,20 +159,20 @@ constexpr_fundef_hasher::equal (constexpr_fundef *lhs, constexpr_fundef *rhs)
Return a hash value for the entry pointed to by Q.  */
 
 inline hashval_t
-constexpr_fundef_hasher::hash (constexpr_fundef *fundef)
+constexpr_fundef_hasher::hash (const constexpr_fundef *fundef)
 {
   return DECL_UID (fundef->decl);
 }
 
 /* Return a previously saved definition of function FUN.   */
 
-static constexpr_fundef *
+constexpr_fundef *
 retrieve_constexpr_fundef (tree fun)
 {
   if (constexpr_fundef_table == NULL)
 return NULL;
 
-  constexpr_fundef fundef = { fun, NULL, NULL, NULL };
+  constexpr_fundef fundef = { fun, NULL_TREE, NULL_TREE, NULL_TREE };
   return constexpr_fundef_table->find (&fundef);
 }
 
@@ -669,7 +661,7 @@ get_function_named_in_call (tree t)
   return fun;
 }
 
-/* Subroutine of register_constexpr_fundef.  BODY is the body of a function
+/* Subroutine of check_constexpr_fundef.  BODY is the body of a function
declared to be constexpr, or a sub-statement thereof.  Returns the
return value if suitable, error_mark_node for a statement not allowed in
a constexpr function, or NULL_TREE if no return value was found.  */
@@ -738,7 +730,7 @@ constexpr_fn_retval (tree body)
 }
 }
 
-/* Subroutine of register_constexpr_fundef.  BODY is the DECL_SAVED_TREE of
+/* Subroutine of check_constexpr_fundef.  BODY is the DECL_SAVED_TREE of
FUN; do the necessary transformations to turn it into a single expression
that we can store in the hash table.  */
 
@@ -868,27 +860,28 @@ cx_check_missing_mem_inits (tree ctype, tree body, bool complain)
 }
 
 /* We are processing the definition of the constexpr function FUN.
-   Check that its BODY fulfills the propriate requirements and
-   enter it in the constexpr function definition table.
-   For constructor BODY is actually the TREE_LIST of the
-   member-initializer list.  */
+   Check that its body fulfills the apropriate requirements and
+   enter it in the constexpr function definition table.  */
 
-tree
-register_constexpr_fundef (tree fun, tree body)
+void
+maybe_save_constexpr_fundef (tree fun)
 {
-  constexpr_fundef entry;
-  constexpr_fundef **slot;
+  if (processing_template_decl
+  || !DECL_DECLARED_CONSTEXPR_P (fun)
+  || cp_function_chain->invalid_constexpr
+  || DECL_CLONED_FUNCTION_P (fun))
+return;
 
   if (!is_valid_constexpr_fn (fun, !DECL_GENERATED_P (fun)))
-return NULL;
+return;
 
-  tree massaged = massage_constexpr_body (fun, body);
+  tree massaged = massage_constexpr_body (fun, DECL_SAVED_TREE (fun));
   if (massaged == NULL_TREE 

Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 11:25:08AM -0600, Pat Haugen wrote:
> > +(define_insn "vmodu_"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +   (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> > +(match_operand:VIlong 2 "vsx_register_operand" "v")))]
> > +  "TARGET_POWER10"
> > +  "vmodu %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> 
> Since the vdiv.../vmod... instructions execute in the fixed point divide unit,

... on some implementations.  The only one currently, sure, but...

> all the above instructions should have a type of "div" instead of "vecsimple".

... it should use "vecdiv" instead (which already exists).  And set
"size" to a proper value as well, so that the scheduling models can see
the difference with e.g. xsdivqp (which should perhaps not use vecdiv at
all itself, it is a scalar div, but we do not currently have good types
for that).

> > +;; Vector multiply low double word
> > +(define_insn "mulv2di3"
> > +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> > +   (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> > +  (match_operand:V2DI 2 "vsx_register_operand" "v")))]
> > +  "TARGET_POWER10"
> > +  "vmulld %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> 
> Similarly, the above 3 insns should have a "mul" instruction type.

The existing AltiVec vmul* are type "veccomplex", because that was the
execution pipe used on original AltiVec...  This needs to be adapted as
well.  Not sure what is best.


Segher


Re: [C PATCH] Drop qualifiers during lvalue conversion

2020-11-19 Thread Joseph Myers
On Thu, 19 Nov 2020, Uecker, Martin wrote:

> Apparently I did not have enough coffee when
> generalizing this to the other qualifiers. 
> 
> Ok, with the following test?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


c++: Template hash access

2020-11-19 Thread Nathan Sidwell


This exposes the template specialization table, so the modules
machinery may access it.  The hashed entity (tmpl, args & spec) is
available, along with a hash table walker.  We also need a way of
finding or inserting entries, along with some bookkeeping fns to deal
with the instantiation and (partial) specialization lists.

This is slightly modified from the earlier posting -- one of the 
functions, used for checking, isn't needed as 
match_mergeable_specialization is modified to allow that use.


gcc/cp/
* cp-tree.h (struct spec_entry): Moved from pt.c.
(walk_specializations, match_mergeable_specialization)
(get_mergeable_specialization_flags)
(add_mergeable_specialization): Declare.
* pt.c (struct spec_entry): Moved to cp-tree.h.
(walk_specializations, match_mergeable_specialization)
(get_mergeable_specialization_flags)
(add_mergeable_specialization): New.

pushing to trunk
--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 0c4b74a8895..021de76e142 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -5403,6 +5403,14 @@ public:
   hash_map *saved;
 };
 
+/* Entry in the specialization hash table.  */
+struct GTY((for_user)) spec_entry
+{
+  tree tmpl;  /* The general template this is a specialization of.  */
+  tree args;  /* The args for this (maybe-partial) specialization.  */
+  tree spec;  /* The specialization itself.  */
+};
+
 /* in class.c */
 
 extern int current_class_depth;
@@ -6994,6 +7002,15 @@ extern bool copy_guide_p			(const_tree);
 extern bool template_guide_p			(const_tree);
 extern bool builtin_guide_p			(const_tree);
 extern void store_explicit_specifier		(tree, tree);
+extern void walk_specializations		(bool,
+		 void (*)(bool, spec_entry *,
+			  void *),
+		 void *);
+extern tree match_mergeable_specialization	(bool is_decl, tree tmpl,
+		 tree args, tree spec);
+extern unsigned get_mergeable_specialization_flags (tree tmpl, tree spec);
+extern void add_mergeable_specialization(tree tmpl, tree args,
+		 tree spec, unsigned);
 extern tree add_outermost_template_args		(tree, tree);
 extern tree add_extra_args			(tree, tree);
 extern tree build_extra_args			(tree, tree, tsubst_flags_t);
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index a1b6631d691..463b1c3a57d 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -103,13 +103,6 @@ local_specialization_stack::~local_specialization_stack ()
 /* True if we've recursed into fn_type_unification too many times.  */
 static bool excessive_deduction_depth;
 
-struct GTY((for_user)) spec_entry
-{
-  tree tmpl;
-  tree args;
-  tree spec;
-};
-
 struct spec_hasher : ggc_ptr_hash
 {
   static hashval_t hash (spec_entry *);
@@ -29625,6 +29618,101 @@ declare_integer_pack (void)
 			  CP_BUILT_IN_INTEGER_PACK);
 }
 
+/* Walk the decl or type specialization table calling FN on each
+   entry.  */
+
+void
+walk_specializations (bool decls_p,
+		  void (*fn) (bool decls_p, spec_entry *entry, void *data),
+		  void *data)
+{
+  spec_hash_table *table = decls_p ? decl_specializations
+: type_specializations;
+  spec_hash_table::iterator end (table->end ());
+  for (spec_hash_table::iterator iter (table->begin ()); iter != end; ++iter)
+fn (decls_p, *iter, data);
+}
+
+/* Lookup the specialization of TMPL, ARGS in the decl or type
+   specialization table.  Return what's there, or if SPEC is non-null,
+   add it and return NULL.  */
+
+tree
+match_mergeable_specialization (bool decl_p, tree tmpl, tree args, tree spec)
+{
+  spec_entry elt = {tmpl, args, spec};
+  hash_table *specializations
+= decl_p ? decl_specializations : type_specializations;
+  hashval_t hash = spec_hasher::hash (&elt);
+  spec_entry **slot
+= specializations->find_slot_with_hash (&elt, hash,
+	spec ? INSERT : NO_INSERT);
+  spec_entry *entry = slot ? *slot: NULL;
+  
+  if (entry)
+return entry->spec;
+
+  if (spec)
+{
+  entry = ggc_alloc ();
+  *entry = elt;
+  *slot = entry;
+}
+
+  return NULL_TREE;
+}
+
+/* Return flags encoding whether SPEC is on the instantiation and/or
+   specialization lists of TMPL.  */
+
+unsigned
+get_mergeable_specialization_flags (tree tmpl, tree decl)
+{
+  unsigned flags = 0;
+
+  for (tree inst = DECL_TEMPLATE_INSTANTIATIONS (tmpl);
+   inst; inst = TREE_CHAIN (inst))
+if (TREE_VALUE (inst) == decl)
+  {
+	flags |= 1;
+	break;
+  }
+
+  if (CLASS_TYPE_P (TREE_TYPE (decl))
+  && CLASSTYPE_TEMPLATE_INFO (TREE_TYPE (decl))
+  && CLASSTYPE_USE_TEMPLATE (TREE_TYPE (decl)) == 2)
+/* Only need to search if DECL is a partial specialization.  */
+for (tree part = DECL_TEMPLATE_SPECIALIZATIONS (tmpl);
+	 part; part = TREE_CHAIN (part))
+  if (TREE_VALUE (part) == decl)
+	{
+	  flags |= 2;
+	  break;
+	}
+
+  return flags;
+}
+
+/* Add a new specialization of TMPL.  FLAGS is as returned from
+   get_mergeable_specialization_flags.  */
+
+void
+add_merg

Re: [PATCH] libstdc++: Ensure __gthread_self doesn't call undefined weak symbol [PR 95989]

2020-11-19 Thread Jonathan Wakely via Gcc-patches

On 12/11/20 17:34 +, Jonathan Wakely wrote:

On 11/11/20 19:08 +0100, Jakub Jelinek via Libstdc++ wrote:

On Wed, Nov 11, 2020 at 05:24:42PM +, Jonathan Wakely wrote:

--- a/libgcc/gthr-posix.h
+++ b/libgcc/gthr-posix.h
@@ -684,7 +684,14 @@ __gthread_equal (__gthread_t __t1, __gthread_t __t2)
static inline __gthread_t
__gthread_self (void)
{
+#if __GLIBC_PREREQ(2, 27)


What if it is a non-glibc system where __GLIBC_PREREQ macro isn't defined?
I think you'd get then
error: missing binary operator before token "("
So I think you want
#if defined __GLIBC__ && defined __GLIBC_PREREQ
#if __GLIBC_PREREQ(2, 27)
return pthread_self ();
#else
return __gthrw_(pthread_self) ();
#endif
#else
return __gthrw_(pthread_self) ();
#endif
or similar.



Here's a fixed version of the patch.

I've moved the glibc-specific code in this_thread::get_id() into a new
macro defined in config/os/gnu-linux/os_defines.h (where we already
know we are dealing with glibc). That means we don't do the
__GLIBC_PREREQ check directly in , it's hidden away in a
target-specific header.

Tested powerpc64le-linux (glibc 2.17 and 2.32), sparc-solaris2.11 and
powerpc-aix.


I've committed this version which only fixes this_thread::get_id() in
libstdc++, and doesn't change __gthread_self in gthr-posix.h

Due to a recent change to replace other uses of __gthread_self with
calls to this_thread::get_id(), fixing it there fixes all uses in
libstdc++.

Tested x86_64-linux, powerpc-aix, sparc-solaris2.11, committed to
trunk.


commit 08b4d325711d5c6f68ac29443aba3fd7aa173ac8
Author: Jonathan Wakely 
Date:   Thu Nov 19 21:07:06 2020

libstdc++: Avoid calling undefined __gthread_self weak symbol [PR 95989]

Since glibc 2.27 the pthread_self symbol has been defined in libc rather
than libpthread. Because we only call pthread_self through a weak alias
it's possible for statically linked executables to end up without a
definition of pthread_self. This crashes when trying to call an
undefined weak symbol.

We can use the __GLIBC_PREREQ version check to detect the version of
glibc where pthread_self is no longer in libpthread, and call it
directly rather than through the weak reference.

It would be better to check for pthread_self in libc during configure
instead of hardcoding the __GLIBC_PREREQ check. That would be
complicated by the fact that prior to glibc 2.27 libc.a didn't have the
pthread_self symbol, but libc.so.6 did.  The configure checks would need
to try to link both statically and dynamically, and the result would
depend on whether the static libc.a happens to be installed during
configure (which could vary between different systems using the same
version of glibc). Doing it properly is left for a future date, as that
will be needed anyway after glibc moves all pthread symbols from
libpthread to libc. When that happens we should revisit the whole
approach of using weak symbols for pthread symbols.

For the purposes of std::this_thread::get_id() we call
pthread_self() directly when using glibc 2.27 or later. Otherwise, if
__gthread_active_p() is true then we know the libpthread symbol is
available so we call that. Otherwise, we are single-threaded and just
use ((__gthread_t)1) as the thread ID.

An undesirable consequence of this change is that code compiled prior to
the change might inline the old definition of this_thread::get_id()
which always returns (__gthread_t)1 in a program that isn't linked to
libpthread. Code compiled after the change will use pthread_self() and
so get a real TID. That could result in the main thread having different
thread::id values in different translation units. This seems acceptable,
as there are not expected to be many uses of thread::id in programs
that aren't linked to libpthread.

An earlier version of this patch also changed __gthread_self() to use
__GLIBC_PREREQ(2, 27) and only use the weak symbol for older glibc. That
might still make sense to do, but isn't needed by libstdc++ now.

libstdc++-v3/ChangeLog:

PR libstdc++/95989
* config/os/gnu-linux/os_defines.h (_GLIBCXX_NATIVE_THREAD_ID):
Define new macro to get reliable thread ID.
* include/bits/std_thread.h: (this_thread::get_id): Use new
macro if it's defined.
* testsuite/30_threads/jthread/95989.cc: New test.
* testsuite/30_threads/this_thread/95989.cc: New test.

diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h b/libstdc++-v3/config/os/gnu-linux/os_defines.h
index f821486ec8f5..01bfa9ddd4f2 100644
--- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
+++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
@@ -49,4 +49,16 @@
 // version dynamically in case it has changed since libstdc++ was configured.
 #define _GLIBCXX_NO_OBSOLETE_ISINF_ISNAN_DYNAMIC __GLIBC_PREREQ(2,23)
 
+#if __GLIBC_PREREQ(2

RE: [EXTERNAL] Re: [PATCH] [tree-optimization] Optimize two patterns with three xors.

2020-11-19 Thread Eugene Rozenfeld via Gcc-patches
Thank you for installing my patch Jeff!

Yes, I intend to contribute regularly. I'm working on getting copyright 
assignment/disclaimer paperwork approved by my employer. I'll apply for commit 
privs after that.

Eugene

-Original Message-
From: Jeff Law  
Sent: Wednesday, November 18, 2020 11:33 AM
To: Richard Biener ; Eugene Rozenfeld 

Cc: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] [tree-optimization] Optimize two patterns with 
three xors.



On 11/17/20 12:57 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Nov 17, 2020 at 3:19 AM Eugene Rozenfeld 
>  wrote:
>> Thank you for the review Richard!
>>
>> I re-worked the patch based on your suggestions (attached).
>> I made the change to reuse the first bit_xor in both patterns and I added :s 
>> to the last xor in the first pattern.
>> For the second pattern I didn't add :s because I think the simplification is 
>> beneficial even if the second or third bit_xor has more than one use since 
>> we are simplifying them to just a single operand (@2). If that is incorrect, 
>> please explain why.
> Ah, true, that's correct.
>
> The patch is OK.
I've installed this on the trunk.

Eugene, if you're going to contribute regularly you should probably go ahead 
and get commit privs so that you can commit ACK'd patches yourself.   There 
should be a link to a form from this page:

https://gcc.gnu.org/gitwrite.html


Jeff



Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Jeff Law via Gcc-patches



On 11/19/20 1:01 PM, Segher Boessenkool wrote:
> On Thu, Nov 19, 2020 at 12:53:27PM -0700, Jeff Law wrote:
>> On 11/19/20 12:42 PM, Segher Boessenkool wrote:
>>> On Thu, Nov 19, 2020 at 12:13:34PM -0700, Jeff Law wrote:
>>>> On 8/31/20 9:33 PM, Jiufu Guo via Gcc-patches wrote:
>>>>> guojiufu  writes:
>>>>>> When unroll loops, if there are calls inside the loop, those calls
>>>>>> may raise negative impacts for unrolling.  This patch adds a param
>>>>>> param_max_unrolled_calls, and checks if the number of calls inside
>>>>>> the loop bigger than this param, loop is prevent from unrolling.
>>>>>>
>>>>>> This patch is checking the _average_ number of calls which is the
>>>>>> summary of call numbers multiply the possibility of the call maybe
>>>>>> executed.  The _average_ number could be a fraction, to keep the
>>>>>> precision, the param is the threshold number multiply 1.
>>>>>>
>>>>>> Bootstrap and regtest pass on powerpc64le.  Is this ok for trunk?
>>>>>>
>>>>>> gcc/ChangeLog
>>>>>> 2020-08-19  Jiufu Guo
>>>>>>
>>>>>>  * params.opt (param_max_unrolled_average_calls_x1): New param.
>>>>>>  * cfgloop.h (average_num_loop_calls): New declare.
>>>>>>  * cfgloopanal.c (average_num_loop_calls): New function.
>>>>>>  * loop-unroll.c (decide_unroll_constant_iteration,
>>>>>>  decide_unroll_runtime_iterations,
>>>>>>  decide_unroll_stupid): Check average_num_loop_calls and
>>>>>>  param_max_unrolled_average_calls_x1.
>>>> So what's the motivation behind adding a PARAM to control this
>>>> behavior?  I'm not a big fan of exposing a lot of PARAMs for users to
>>>> tune behavior (though I've made the same lapse in judgment myself).  In
>>>> my mind a PARAM is really more about controlling pathological behavior.
>>> But we (Power) need very different tuning than what others apparently
>>> need.  It is similar to inlining, in that that also differs a lot
>>> between archs how aggressively to do that optimally.
>> But what I think that argues is that we've got a gap in the costing
>> model and/or how its being used.  Throwing PARAMS at the problem isn't
>> really useful for the end user.  The vast majority aren't going to use
>> them and of the ones that do, most are probably going to get it wrong.
> No, the vast majority of people will *not* (consciously) use them,
> because the target defaults will set things to useful values.
>
> The compiler could use saner "generic" defaults perhaps, but those will
> still not be satisfactory for anyone (except when they aren't generic in
> fact but instead tuned for one arch ;-) ) -- unrolling is just too
> important for performance.
Then fix the heuristics, don't add new PARAMS :-)

It didn't even occur to me until now that you may be pushing to have the
ppc backend have different values for the PARAMS.  I would strongly
discourage that.  It's been a huge headache in the s390 backend already.

>
>> In  my mind fixing things so they work with no magic arguments is best. 
>> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
>> in between.  Others may clearly have different opinions here.
> There is no big difference between params and flags here, IMO -- it has
> to be a -f with a value as well, for good results.
Which is a signal that we have a deeper problem.  -f with a value is no
different than a param.

>
> Since we have (almost) all such tunings in --param already, I'd say this
> one belongs there as well?
I'm not convinced at this point. 

jeff



Re: [PATCH] c++, v2: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 19, 2020 at 05:30:06PM +0100, Jakub Jelinek via Gcc-patches wrote:
> Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest?

Successfully bootstrapped/regtested on both x86_64-linux and i686-linux now.

Jakub



Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 19, 2020 at 03:50:27PM +0100, Jakub Jelinek via Gcc-patches wrote:
> So, I think the problem is that for make .PHONY targets are just
> "rebuilt" always, so it is very much undesirable for the cc1plus$(exeext)
> etc. dependencies to include .PHONY targets, but I was using
> them - cc1plus.prev which would depend on some *.serial and
> e.g. cc1.serial depending on c and c depending on cc1$(exeext).
> 
> The following so far only very lightly tested patch rewrites this
> so that *.serial and *.prev aren't .PHONY targets, but instead just
> make variables.
> 
> I was worried that the order in which the language makefile fragments are
> included (which is quite random, what order we get from the filesystem
> matching */config-lang.in) would be a problem but it seems to work fine.

Successfully bootstrapped/regtested on x86_64-linux and i686-linux,
including make install which looked problematic in PR97911.

Ok for trunk?

> 2020-11-19  Jakub Jelinek  
> 
> gcc/
>   * configure.ac: In SERIAL_LIST use lang words without .serial
>   suffix.  Change $lang.prev from a target to variable and instead
>   of depending on *.serial expand to the *.serial variable if
>   the word is in the SERIAL_LIST at all, otherwise to nothing.
>   * configure: Regenerated.
> gcc/c/
>   * Make-lang.in (c.serial): Change from goal to a variable.
>   (.PHONY): Drop c.serial.
> gcc/ada/
>   * gcc-interface/Make-lang.in (ada.serial): Change from goal to a
>   variable.
>   (.PHONY): Drop ada.serial and ada.prev.
>   (gnat1$(exeext)): Depend on $(ada.serial) rather than ada.serial.
> gcc/brig/
>   * Make-lang.in (brig.serial): Change from goal to a variable.
>   (.PHONY): Drop brig.serial and brig.prev.
>   (brig1$(exeext)): Depend on $(brig.serial) rather than brig.serial.
> gcc/cp/
>   * Make-lang.in (c++.serial): Change from goal to a variable.
>   (.PHONY): Drop c++.serial and c++.prev.
>   (cc1plus$(exeext)): Depend on $(c++.serial) rather than c++.serial.
> gcc/d/
>   * Make-lang.in (d.serial): Change from goal to a variable.
>   (.PHONY): Drop d.serial and d.prev.
>   (d21$(exeext)): Depend on $(d.serial) rather than d.serial.
> gcc/fortran/
>   * Make-lang.in (fortran.serial): Change from goal to a variable.
>   (.PHONY): Drop fortran.serial and fortran.prev.
>   (f951$(exeext)): Depend on $(fortran.serial) rather than
>   fortran.serial.
> gcc/go/
>   * Make-lang.in (go.serial): Change from goal to a variable.
>   (.PHONY): Drop go.serial and go.prev.
>   (go1$(exeext)): Depend on $(go.serial) rather than go.serial.
> gcc/jit/
>   * Make-lang.in (jit.serial): Change from goal to a
>   variable.
>   (.PHONY): Drop jit.serial and jit.prev.
>   ($(LIBGCCJIT_FILENAME)): Depend on $(jit.serial) rather than
>   jit.serial.
> gcc/lto/
>   * Make-lang.in (lto1.serial, lto2.serial): Change from goals to
>   variables.
>   (.PHONY): Drop lto1.serial, lto2.serial, lto1.prev and lto2.prev.
>   ($(LTO_EXE)): Depend on $(lto1.serial) rather than lto1.serial.
>   ($(LTO_DUMP_EXE)): Depend on $(lto2.serial) rather than lto2.serial.
> gcc/objc/
>   * Make-lang.in (objc.serial): Change from goal to a variable.
>   (.PHONY): Drop objc.serial and objc.prev.
>   (cc1obj$(exeext)): Depend on $(objc.serial) rather than objc.serial.
> gcc/objcp/
>   * Make-lang.in (obj-c++.serial): Change from goal to a variable.
>   (.PHONY): Drop obj-c++.serial and obj-c++.prev.
>   (cc1objplus$(exeext)): Depend on $(obj-c++.serial) rather than
>   obj-c++.serial.

Jakub



[PATCH] ranger: Improve a % b operand ranges [PR91029]

2020-11-19 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, the previous PR91029 patch was testing
op2 >= 0, which is unnecessary: even negative op2 values will work the same.
Furthermore, from a % b > 0 we can deduce a > 0 rather than just a >= 0
(0 % b would be 0), and this is actually valid for constants other than 0 too:
a % b > 5 means a > 5 (a % b has the same sign as a, and a in [0, 5] would
result in a % b in [0, 5]).  Also, we can deduce a range for the other
operand: if we know
a % b >= 20, then b must be (in absolute value for signed modulo) > 20,
since for a % [0, 20] the result would be in [0, 19].
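
To make the deductions above concrete, here is a minimal brute-force sketch
in C.  It is not part of the patch, and the helper name
count_counterexamples is made up for illustration.  It assumes C99
truncating division, where a % b takes the sign of a, and simply counts
violations of the two implications over a small window of values:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only.
   For every a and every nonzero b in [lo, hi], check:
     x > 0 and a % b >= x  implies  a >= x and |b| > x
     x < 0 and a % b <= x  implies  a <= x and |b| > -x
   Returns the number of violated implications found.  */
static int
count_counterexamples (int lo, int hi, int x)
{
  assert (x != 0);
  int bad = 0;
  for (int a = lo; a <= hi; a++)
    for (int b = lo; b <= hi; b++)
      {
        if (b == 0)
          continue;             /* a % 0 is undefined, skip it.  */
        int r = a % b;
        if (x > 0 && r >= x)
          {
            if (!(a >= x))
              bad++;
            if (!(abs (b) > x))
              bad++;
          }
        if (x < 0 && r <= x)
          {
            if (!(a <= x))
              bad++;
            if (!(abs (b) > -x))
              bad++;
          }
      }
  return bad;
}
```

Calling count_counterexamples (-50, 50, 20) or
count_counterexamples (-50, 50, -3) should find nothing to count, which
matches the reasoning above; it is only a sanity check on small inputs,
not a proof.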

The following patch implements all of that, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2020-11-19  Jakub Jelinek  

PR tree-optimization/91029
* range-op.cc (operator_trunc_mod::op1_range): Don't require signed
types, nor require that op2 >= 0.  Implement (a % b) >= x && x > 0
implies a >= x and (a % b) <= x && x < 0 implies a <= x.
(operator_trunc_mod::op2_range): New method.

* gcc.dg/tree-ssa/pr91029-1.c: New test.
* gcc.dg/tree-ssa/pr91029-2.c: New test.

--- gcc/range-op.cc.jj  2020-11-19 20:09:39.531862131 +0100
+++ gcc/range-op.cc 2020-11-19 20:44:24.507774154 +0100
@@ -2637,6 +2637,9 @@ public:
   virtual bool op1_range (irange &r, tree type,
  const irange &lhs,
  const irange &op2) const;
+  virtual bool op2_range (irange &r, tree type,
+ const irange &lhs,
+ const irange &op1) const;
 } op_trunc_mod;
 
 void
@@ -2686,24 +2689,58 @@ operator_trunc_mod::wi_fold (irange &r,
 bool
 operator_trunc_mod::op1_range (irange &r, tree type,
   const irange &lhs,
-  const irange &op2) const
+  const irange &) const
 {
-  // PR 91029.  Check for signed truncation with op2 >= 0.
-  if (TYPE_SIGN (type) == SIGNED && wi::ge_p (op2.lower_bound (), 0, SIGNED))
+  // PR 91029.
+  signop sign = TYPE_SIGN (type);
+  unsigned prec = TYPE_PRECISION (type);
+  // (a % b) >= x && x > 0 , then a >= x.
+  if (wi::gt_p (lhs.lower_bound (), 0, sign))
+{
+  r = value_range (type, lhs.lower_bound (), wi::max_value (prec, sign));
+  return true;
+}
+  // (a % b) <= x && x < 0 , then a <= x.
+  if (wi::lt_p (lhs.upper_bound (), 0, sign))
+{
+  r = value_range (type, wi::min_value (prec, sign), lhs.upper_bound ());
+  return true;
+}
+  return false;
+}
+
+bool
+operator_trunc_mod::op2_range (irange &r, tree type,
+  const irange &lhs,
+  const irange &) const
+{
+  // PR 91029.
+  signop sign = TYPE_SIGN (type);
+  unsigned prec = TYPE_PRECISION (type);
+  // (a % b) >= x && x > 0 , then b is in ~[-x, x] for signed
+  //  or b > x for unsigned.
+  if (wi::gt_p (lhs.lower_bound (), 0, sign))
+{
+  if (sign == SIGNED)
+   r = value_range (type, wi::neg (lhs.lower_bound ()),
+lhs.lower_bound (), VR_ANTI_RANGE);
+  else if (wi::lt_p (lhs.lower_bound (), wi::max_value (prec, sign),
+sign))
+   r = value_range (type, lhs.lower_bound () + 1,
+wi::max_value (prec, sign));
+  else
+   return false;
+  return true;
+}
+  // (a % b) <= x && x < 0 , then b is in ~[x, -x].
+  if (wi::lt_p (lhs.upper_bound (), 0, sign))
 {
-  unsigned prec = TYPE_PRECISION (type);
-  // if a % b > 0 , then a >= 0.
-  if (wi::gt_p (lhs.lower_bound (), 0, SIGNED))
-   {
- r = value_range (type, wi::zero (prec), wi::max_value (prec, SIGNED));
- return true;
-   }
-  // if a % b < 0 , then a <= 0.
-  if (wi::lt_p (lhs.upper_bound (), 0, SIGNED))
-   {
- r = value_range (type, wi::min_value (prec, SIGNED), wi::zero (prec));
- return true;
-   }
+  if (wi::gt_p (lhs.upper_bound (), wi::min_value (prec, sign), sign))
+   r = value_range (type, lhs.upper_bound (),
+wi::neg (lhs.upper_bound ()), VR_ANTI_RANGE);
+  else
+   return false;
+  return true;
 }
   return false;
 }
--- gcc/testsuite/gcc.dg/tree-ssa/pr91029-1.c.jj  2020-11-19 20:19:50.400414120 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr91029-1.c   2020-11-19 20:19:50.400414120 +0100
@@ -0,0 +1,68 @@
+/* PR tree-optimization/91029 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+void kill (void);
+int xx;
+
+void f1 (int i, int j)
+{
+  if ((i % j) == 3)
+{
+  xx = (i < 3);
+  if (xx)
+kill ();
+}
+}
+
+void f2 (int i, int j)
+{
+  if ((i % j) > 0)
+{
+  xx = (i <= 0);
+  if (xx)
+kill ();
+}
+}
+
+void f3 (int i, int j)
+{
+  if ((i % j) == -3)
+{
+  xx = (i > -3);
+  if (xx)
+kill ();
+}
+}
+
+void f4 (int i, int j)
+{
+  if ((i % j) < 0)
+  

[PATCH] Process only valid shift ranges.

2020-11-19 Thread Andrew MacLeod via Gcc-patches
When shifting outside the valid range of [0, precision-1], we can choose 
to process just the valid ones since the rest is undefined.


This allows us to produce results for x << [0,2][+INF, +INF] by 
discarding  the invalid ranges and processing just [0,2].


This is particularly important when using a value that is limited by a 
branch, as demonstrated in the testcase.


As Jakub suggested in the PR, we can mask the shift value with the full 
range of valid shift values, and use the result of that.

If that is undefined, then we fall back to our old undefined behaviour.
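
The idea can be sketched outside of the ranger as plain interval
arithmetic.  The snippet below is only an illustration under simplifying
assumptions: struct range and valid_shift_range are hypothetical names,
not the irange API, and the result is collapsed to a single interval (the
hull of the valid pieces) rather than kept as a multi-range.

```c
#include <assert.h>
#include <stdbool.h>

/* Closed interval [lo, hi]; hypothetical type for illustration.  */
struct range
{
  long lo;
  long hi;
};

/* Intersect each of the N sub-ranges in PARTS with the valid shift
   counts [0, PREC - 1].  If anything survives, store the hull of the
   surviving pieces in *OUT and return true.  Return false when every
   possible shift count is out of range, i.e. the shift is undefined.  */
static bool
valid_shift_range (const struct range *parts, int n, int prec,
                   struct range *out)
{
  assert (prec > 0);
  bool found = false;
  for (int i = 0; i < n; i++)
    {
      long lo = parts[i].lo < 0 ? 0 : parts[i].lo;
      long hi = parts[i].hi > prec - 1 ? prec - 1 : parts[i].hi;
      if (lo > hi)
        continue;               /* This piece is entirely invalid.  */
      if (!found)
        {
          out->lo = lo;
          out->hi = hi;
          found = true;
        }
      else
        {
          if (lo < out->lo)
            out->lo = lo;
          if (hi > out->hi)
            out->hi = hi;
        }
    }
  return found;
}
```

With a 32-bit type, a shift count of [0,2] plus one huge out-of-range
value trims down to [0,2], while a count that is only ever out of range
makes the helper return false, which corresponds to falling back to the
old undefined handling.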

Bootstrapped on x86_64-pc-linux-gnu, no regressions.  Pushed.

Andrew






commit d0d8b5d83614d8f0d0e40c0520d4f40ffa01f8d9
Author: Andrew MacLeod 
Date:   Thu Nov 19 17:41:30 2020 -0500

Process only valid shift ranges.

When shifting outside the valid range of [0, precision-1], we can
choose to process just the valid ones since the rest is undefined.
This allows us to produce results for x << [0,2][+INF, +INF] by discarding
the invalid ranges and processing just [0,2].

gcc/
PR tree-optimization/93781
* range-op.cc (get_shift_range): Rename from
undefined_shift_range_check and now return valid shift ranges.
(operator_lshift::fold_range): Use result from get_shift_range.
(operator_rshift::fold_range): Ditto.
gcc/testsuite/
* gcc.dg/tree-ssa/pr93781-1.c: New.
* gcc.dg/tree-ssa/pr93781-2.c: New.
* gcc.dg/tree-ssa/pr93781-3.c: New.

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 6be60073d19..5bf37e1ad82 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -80,30 +80,25 @@ empty_range_varying (irange &r, tree type,
 return false;
 }
 
-// Return TRUE if shifting by OP is undefined behavior, and set R to
-// the appropriate range.
+// Return false if shifting by OP is undefined behavior.  Otherwise, return
+// true and the range it is to be shifted by.  This allows trimming out of
+// undefined ranges, leaving only valid ranges if there are any.
 
 static inline bool
-undefined_shift_range_check (irange &r, tree type, const irange &op)
+get_shift_range (irange &r, tree type, const irange &op)
 {
   if (op.undefined_p ())
-{
-  r.set_undefined ();
-  return true;
-}
+return false;
 
-  // Shifting by any values outside [0..prec-1], gets undefined
-  // behavior from the shift operation.  We cannot even trust
-  // SHIFT_COUNT_TRUNCATED at this stage, because that applies to rtl
-  // shifts, and the operation at the tree level may be widened.
-  if (wi::lt_p (op.lower_bound (), 0, TYPE_SIGN (op.type ()))
-  || wi::ge_p (op.upper_bound (),
-  TYPE_PRECISION (type), TYPE_SIGN (op.type ())))
-{
-  r.set_varying (type);
-  return true;
-}
-  return false;
+  // Build valid range and intersect it with the shift range.
+  r = value_range (build_int_cst_type (op.type (), 0),
+  build_int_cst_type (op.type (), TYPE_PRECISION (type) - 1));
+  r.intersect (op);
+
+  // If there are no valid ranges in the shift range, returned false.
+  if (r.undefined_p ())
+return false;
+  return true;
 }
 
 // Return TRUE if 0 is within [WMIN, WMAX].
@@ -1465,13 +1460,20 @@ operator_lshift::fold_range (irange &r, tree type,
 const irange &op1,
 const irange &op2) const
 {
-  if (undefined_shift_range_check (r, type, op2))
-return true;
+  int_range_max shift_range;
+  if (!get_shift_range (shift_range, type, op2))
+{
+  if (op2.undefined_p ())
+   r.set_undefined ();
+  else
+   r.set_varying (type);
+  return true;
+}
 
   // Transform left shifts by constants into multiplies.
-  if (op2.singleton_p ())
+  if (shift_range.singleton_p ())
 {
-  unsigned shift = op2.lower_bound ().to_uhwi ();
+  unsigned shift = shift_range.lower_bound ().to_uhwi ();
   wide_int tmp = wi::set_bit_in_zero (shift, TYPE_PRECISION (type));
   int_range<1> mult (type, tmp, tmp);
 
@@ -1487,7 +1489,7 @@ operator_lshift::fold_range (irange &r, tree type,
 }
   else
 // Otherwise, invoke the generic fold routine.
-return range_operator::fold_range (r, type, op1, op2);
+return range_operator::fold_range (r, type, op1, shift_range);
 }
 
 void
@@ -1709,11 +1711,17 @@ operator_rshift::fold_range (irange &r, tree type,
 const irange &op1,
 const irange &op2) const
 {
-  // Invoke the generic fold routine if not undefined..
-  if (undefined_shift_range_check (r, type, op2))
-return true;
+  int_range_max shift;
+  if (!get_shift_range (shift, type, op2))
+{
+  if (op2.undefined_p ())
+   r.set_undefined ();
+  else
+   r.set_varying (type);
+  return true;
+}
 
-  return range_operator::fold_range (r, type, op1, op2);
+  return range_operator::fold_range (r,

Re: [PATCH] ranger: Improve a % b operand ranges [PR91029]

2020-11-19 Thread Andrew MacLeod via Gcc-patches

On 11/19/20 5:41 PM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, the previous PR91029 patch was testing
op2 >= 0 which is unnecessary, even negative op2 values will work the same,
furthermore, from if a % b > 0 we can deduce a > 0 rather than just a >= 0
(0 % b would be 0), and it actually valid even for other constants than 0,
a % b > 5 means a > 5 (a % b has the same sign as a and a in [0, 5] would
result in a % b in [0, 5].  Also, we can deduce a range for the other
operand, if we know
a % b >= 20, then b must be (in absolute value for signed modulo) > 20,
for a % [0, 20] the result would be [0, 19].
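
The deductions above are easy to sanity-check by brute force over a small window of values. The following is only a sketch for that purpose; `check_trunc_mod_deductions` is a made-up helper, and it relies on C++11 integer division truncating toward zero, so the remainder takes the sign of the dividend.

```cpp
// Returns true if, for all a, b in [-limit, limit] with b != 0, the
// implications described in the patch mail hold for truncating modulo.
bool
check_trunc_mod_deductions (int limit)
{
  for (int a = -limit; a <= limit; ++a)
    for (int b = -limit; b <= limit; ++b)
      {
        if (b == 0)
          continue;                     // a % 0 is undefined, skip it
        int rem = a % b;
        // (a % b) > 5 implies a > 5.
        if (rem > 5 && !(a > 5))
          return false;
        // (a % b) <= -3 implies a <= -3 (the mirrored negative case).
        if (rem <= -3 && !(a <= -3))
          return false;
        // (a % b) >= 20 implies |b| > 20.
        if (rem >= 20 && !(b > 20 || b < -20))
          return false;
      }
  return true;
}
```

The constants 5, -3 and 20 are just the ones used in the description and the testcase; any other bound of the same sign should behave the same way.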

The following patch implements all of that, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?



OK.

I was having a hard time keeping it all straight! The op1_range and 
op2_range calculations can be real head spinners sometimes.


Andrew




Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-19 Thread Segher Boessenkool
On Wed, Nov 04, 2020 at 08:44:03AM -0800, Carl Love wrote:
> +#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
> +#define vec_div(a, b) __builtin_vec_div (a, b)
> +#define vec_dive(a, b) __builtin_vec_dive (a, b)
> +#define vec_mod(a, b) __builtin_vec_mod (a, b)

This should be

#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))

etc...  I see we have quite a few cases in altivec.h already that do not
get that right.  Something to fix, and apparently not too important in
practice ;-)
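
For readers wondering why the parentheses matter at all, here is a generic sketch with made-up macros (`SCALE_BAD` and `SCALE_GOOD` are not from altivec.h). In the vec_* case each argument lands directly in a function-call argument slot, which is presumably why the omission rarely bites in practice.

```cpp
// Hypothetical macros for illustration only.
#define SCALE_BAD(x)  (x * 2)     // argument is pasted in unprotected
#define SCALE_GOOD(x) ((x) * 2)   // argument is parenthesized

// Expands to (a + b * 2): the multiplication binds tighter than the add.
int
scale_bad (int a, int b)
{
  return SCALE_BAD (a + b);
}

// Expands to ((a + b) * 2), which is what the caller meant.
int
scale_good (int a, int b)
{
  return SCALE_GOOD (a + b);
}
```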

>  ;; Short vec int modes
>  (define_mode_iterator VIshort [V8HI V16QI])
> -;; Longer vec int modes for rotate/mask ops
> -(define_mode_iterator VIlong [V2DI V4SI])

Hrm, you move this one to vsx.md, but leave VIshort here (instead of
moving that to altivec.md).  Oh well, something needs to be done about
this split anyway.

> +BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
> +BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
> +BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
> +BU_P10V_AV_2 (VDIVEU_V2DI, "vdiveud", CONST, vdiveu_v2di)
> +BU_P10V_AV_2 (VDIVS_V4SI, "vdivsw", CONST, divv4si3)
> +BU_P10V_AV_2 (VDIVS_V2DI, "vdivsd", CONST, divv2di3)
> +BU_P10V_AV_2 (VDIVU_V4SI, "vdivuw", CONST, udivv4si3)
> +BU_P10V_AV_2 (VDIVU_V2DI, "vdivud", CONST, udivv2di3)
> +BU_P10V_AV_2 (VMODS_V2DI, "vmodsd", CONST, vmods_v2di)
> +BU_P10V_AV_2 (VMODS_V4SI, "vmodsw", CONST, vmods_v4si)
> +BU_P10V_AV_2 (VMODU_V2DI, "vmodud", CONST, vmodu_v2di)
> +BU_P10V_AV_2 (VMODU_V4SI, "vmoduw", CONST, vmodu_v4si)
> +BU_P10V_AV_2 (VMULHS_V2DI, "vmulhsd", CONST, vmulhs_v2di)
> +BU_P10V_AV_2 (VMULHS_V4SI, "vmulhsw", CONST, vmulhs_v4si)
> +BU_P10V_AV_2 (VMULHU_V2DI, "vmulhud", CONST, vmulhu_v2di)
> +BU_P10V_AV_2 (VMULHU_V4SI, "vmulhuw", CONST, vmulhu_v4si)
> +BU_P10V_AV_2 (VMULLD_V2DI, "vmulld", CONST, mulv2di3)

So I would remove the leading "v" from all these pattern names, since
all of them have a mode in the name already.

> +(define_mode_attr VIlong_char [(V2DI "d")
> +(V4SI "w")])

This is just a subset of <wd> -- use that, instead?

; A generic w/d attribute, for things like cmpw/cmpd.
(define_mode_attr wd [(QI    "b")
                      (HI    "h")
                      (SI    "w")
                      (DI    "d")
                      (V16QI "b")
                      (V8HI  "h")
                      (V4SI  "w")
                      (V2DI  "d")
                      (V1TI  "q")
                      (TI    "q")])

(never mind the name, heh -- it still is nice and short ;-) )

> +(define_insn "vmulhs_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VMULHS))]
> +  "TARGET_POWER10"
> +  "vmulhs %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

The scalar mulh we can describe without unspecs, cannot that be done
here as well?

The type attr is problematic...  At least make it the same as the other
vector int multiplies?  That is veccomplex?

> +Vector Integer Multiply-Divide-Modulo

Use "/" instead of "-" here?  "-" normally is used for things like
"multiply-sum", not to mean "or".

> +For each integer value i from 0 to 3, do the following. The integer value in
> +word element i of a is multiplied by the integer value in word
> +element i of b. The high-order 32 bits of the 64-bit product are placed into
> +word element i of the vector returned.

I think you should quote the "i"?  @code{i} or similar.  I don't think
you need to mark up the digits, phew :-)
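
For reference, the word-element description quoted above corresponds to roughly the following scalar model of the signed case. This is only a sketch: `mulh_words` is a hypothetical helper, not the builtin, and it assumes that right-shifting a negative 64-bit value is an arithmetic shift (guaranteed from C++20 on, and what GCC does in earlier modes).

```cpp
#include <array>
#include <cstdint>

// For each word element i in 0..3, multiply a[i] by b[i] as 64-bit values
// and keep the high-order 32 bits of the product.
std::array<int32_t, 4>
mulh_words (const std::array<int32_t, 4> &a, const std::array<int32_t, 4> &b)
{
  std::array<int32_t, 4> r {};
  for (int i = 0; i < 4; ++i)
    {
      int64_t product = int64_t (a[i]) * int64_t (b[i]);
      r[i] = int32_t (product >> 32);   // assumes arithmetic right shift
    }
  return r;
}
```

This "high part of a widening multiply" form is also what the scalar mulh patterns express, which is the point of asking whether the unspec is needed.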

Please repost with those things fixed?  Thanks!


Segher


Re: [PATCH] configury: --enable-link-serialization support

2020-11-19 Thread Eric Botcazou
> Successfully bootstrapped/regtested on x86_64-linux and i686-linux,
> including make install which looked problematic in PR97911.
> 
> Ok for trunk?

I cannot really approve, but this looks like a step in the right direction.

-- 
Eric Botcazou




[PATCH] PowerPC: PR libgcc/97543, fix 64-bit long double issues

2020-11-19 Thread Michael Meissner via Gcc-patches
PowerPC: PR libgcc/97543, fix 64-bit long double issues

I meant to post this to the gcc-patches mailing list last Thursday, but I see I
posted this to an internal IBM mailing list.

This patch replaces the previous iterations of this patch:

October 22nd, 2020:
Message-ID: <2020100510.ga11...@ibm-toto.the-meissners.org>

October 28th, 2020:
Message-ID: <20201029004204.ga15...@ibm-toto.the-meissners.org>

If you use a compiler with long double defaulting to 64-bit instead of 128-bit
with IBM extended double, you get linker warnings about mis-matches in the gnu
attributes for long double (PR libgcc/97543).  Even if the compiler is
configured to have long double be 64 bit as the default with the configuration
option '--without-long-double-128' you get the warnings.

You also get the same issues if you use a compiler with long double defaulting
to IEEE 128-bit instead of IBM extended double (PR libgcc/97643).

The issue is the way libgcc.a/libgcc.so is built.  Right now, when libgcc is
built under Linux, the long double size is set to 128 bits.  However, the gnu
attributes are also set, leading to the warnings.

One feature of the current GNU attribute implementation is if you have a shared
library (such as libgcc_s.so), the GNU attributes for the shared library are an
inclusive OR of all of the modules within the library.  This means if any
module uses the -mlong-double-128 option and uses long double, the GNU
attributes for the library will indicate that it uses 128-bit IBM long
doubles.  If you have a static library, you will get the warning only if you
actually reference a module with the attribute set.

This patch does two things:

1)  All of the modules that support IBM 128-bit long doubles explicitly set
the ABI to IBM extended double.

2) I turned off GNU attributes for building the shared library or for
building the IBM 128-bit long double support.

I have discussed this patch with Alan Modra, and made several changes based on
his suggestions.

I have tested this on a little endian power9 system running Linux by building
three separate compilers, using the Advance Toolchain AT14.0 which uses GLIBC
2.32:

1)  A compiler where the long double default is IBM 128-bit double;
2)  A compiler where the long double default is IEEE 128-bit double; (and)
3)  A compiler where the long double default is 64-bit.

Note, for the IEEE build (#2), the other patches that I will be submitting are
needed to enable the full build.

For each of the 3 compilers, I then tested some code with long doubles and
verified that each of the long double options worked without generating
warnings.

In addition, I have tested this patch on a big endian power8 system running
Linux, and there were no regressions.

Can I install this patch into the master branch?  Since this is a bug for
64-bit long doubles, I would like to back port it to GCC 10, and GCC 9 after a
shake-in period.

libgcc/
2020-11-17  Michael Meissner  

PR libgcc/97543
PR libgcc/97643
* config/rs6000/t-linux (IBM128_STATIC_OBJS): New make variable.
(IBM128_SHARED_OBJS): New make variable.
(IBM128_OBJS): New make variable.  Set all objects to use the
explicit IBM format, and disable gnu attributes.
(IBM128_CFLAGS): New make variable.
(gcc_s_compile): Add -mno-gnu-attribute to all shared library
modules.
---
 libgcc/config/rs6000/t-linux | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/libgcc/config/rs6000/t-linux b/libgcc/config/rs6000/t-linux
index ed821947b66..72e9c2770a6 100644
--- a/libgcc/config/rs6000/t-linux
+++ b/libgcc/config/rs6000/t-linux
@@ -6,3 +6,25 @@ HOST_LIBGCC2_CFLAGS += -mlong-double-128
 # smaller and faster libgcc code.  Directly specifying -mcmodel=small
 # would need to take into account targets for which -mcmodel is invalid.
 HOST_LIBGCC2_CFLAGS += -mno-minimal-toc
+
+# On the modules that deal with IBM 128-bit values, make sure that TFmode uses
+# the IBM extended double format.  Also turn off gnu attributes on the static
+# modules.
+IBM128_STATIC_OBJS = ibm-ldouble$(objext) _powitf2$(objext) \
+ ppc64-fp$(objext) _divtc3$(object) _multc3$(object) \
+ _fixtfdi$(object) _fixunstfdi$(object) \
+ _floatditf$(objext) _floatunsditf$(objext)
+IBM128_SHARED_OBJS = $(IBM128_STATIC_OBJS:$(objext):_s$(objext))
+IBM128_OBJS= $(IBM128_STATIC_OBJS) $(IBM128_SHARED_OBJS)
+
+IBM128_CFLAGS  = -Wno-psabi -mabi=ibmlongdouble -mno-gnu-attribute
+
+$(IBM128_OBJS) : INTERNAL_CFLAGS += $(IBM128_CFLAGS)
+
+# Turn off gnu attributes for long double size on all of the shared library
+# modules, but leave it on for the static modules, except for the functions
+# that explicitly process IBM 128-bit floating point.  Shared libraries only
+# have one gnu attribute for the whole library, and it can lead to warnings if
+# som

Re: [PATCH] ranger: Improve a % b operand ranges [PR91029]

2020-11-19 Thread David Malcolm via Gcc-patches
On Thu, 2020-11-19 at 23:41 +0100, Jakub Jelinek via Gcc-patches wrote:
> Hi!
> 
> As mentioned in the PR, the previous PR91029 patch was testing
> op2 >= 0 which is unnecessary, even negative op2 values will work the
> same,
> furthermore, from if a % b > 0 we can deduce a > 0 rather than just a
> >= 0
> (0 % b would be 0), and it actually valid even for other constants
> than 0,
> a % b > 5 means a > 5 (a % b has the same sign as a and a in [0, 5]
> would
> result in a % b in [0, 5].  Also, we can deduce a range for the other
> operand, if we know
> a % b >= 20, then b must be (in absolute value for signed modulo) >
> 20,
> for a % [0, 20] the result would be [0, 19].
> 
> The following patch implements all of that, bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?
> 
> 2020-11-19  Jakub Jelinek  
> 
>   PR tree-optimization/91029
>   * range-op.cc (operator_trunc_mod::op1_range): Don't require
> signed
>   types, nor require that op2 >= 0.  Implement (a % b) >= x && x
> > 0
>   implies a >= x and (a % b) <= x && x < 0 implies a <= x.
>   (operator_trunc_mod::op2_range): New method.
> 
>   * gcc.dg/tree-ssa/pr91029-1.c: New test.
>   * gcc.dg/tree-ssa/pr91029-2.c: New test.
> 
> --- gcc/range-op.cc.jj2020-11-19 20:09:39.531862131 +0100
> +++ gcc/range-op.cc   2020-11-19 20:44:24.507774154 +0100
> @@ -2637,6 +2637,9 @@ public:
>virtual bool op1_range (irange &r, tree type,
> const irange &lhs,
> const irange &op2) const;
> +  virtual bool op2_range (irange &r, tree type,
> +   const irange &lhs,
> +   const irange &op1) const;
>  } op_trunc_mod;

Should these various overrides of vfuncs be labeled "OVERRIDE" rather
than "virtual", to use the override specifier?  In fact, given that we
now require C++11, presumably we can spell that as "override" and lose
the macro.
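
To make the suggestion concrete, here is a minimal sketch with stand-in classes (`range_operator_sketch` and `trunc_mod_sketch` are invented for illustration and use plain ints instead of irange):

```cpp
// Hypothetical base class with a virtual hook, loosely shaped like the
// op2_range method in the quoted hunk.
struct range_operator_sketch
{
  virtual ~range_operator_sketch () = default;
  virtual bool op2_range (int &r, int lhs, int op1) const
  {
    (void) r;
    (void) lhs;
    (void) op1;
    return false;                 // default: no information
  }
};

struct trunc_mod_sketch : range_operator_sketch
{
  // With 'override' the compiler rejects this declaration if its
  // signature ever stops matching a virtual function in the base class.
  // Re-declaring it with 'virtual' alone would silently add a new function.
  bool op2_range (int &r, int lhs, int op1) const override
  {
    r = lhs + op1;                // placeholder computation
    return true;
  }
};
```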

Dave



[PATCH] PowerPC: PR 97791: Fix gnu attributes.

2020-11-19 Thread Michael Meissner via Gcc-patches
[PATCH] PowerPC: PR 97791: Fix gnu attributes.

Note, I originally posted this to an internal IBM mailing list, not to
gcc-patches.  Sorry about that.

This patch does two things to fix setting gnu attribute #4 (long double status):

1) Only set gnu attribute #4 if long double was passed.  Passing __float128
when long double is IBM or __ibm128 when long double is IEEE no longer sets the
attribute.  This resulted in a lot of false positives, such as using __float128
and no long double support.

2) Do not set the gnu attribute if a mode used by long double (TF or DF) is
used in a move.  The moves do not differentiate between the long double type
and similar types.  Delete the three tests that tested this.

I wrote the code for the move several years ago.  I wanted to make sure that an
object that used the appropriate long double type got flagged.  Unfortunately, at the
RTL level, we have lost the type nodes, so we can't tell the difference between
two types that use the same mode (for instance if long double is 64-bit, the
attribute would be set if you used normal doubles, and not long doubles).  Alan
Modra and I discussed this, and we think this is just the right thing to do.

It has been tested on power8 big endian Linux server systems and power9 little
endian Linux server systems, and there were no regressions.

gcc/
2020-11-17  Michael Meissner  

PR gcc/97791
* config/rs6000/rs6000-call.c (init_cumulative_args): Only set
that long double was returned if the type is actually long
double.
(rs6000_function_arg_advance_1): Only set that long double was
passed if the type is actually long double.
* config/rs6000/rs6000.c (rs6000_emit_move): Delete code that sets
whether long double was passed based on the modes used in moves.

gcc/testsuite/
2020-11-17  Michael Meissner  

PR target/97791
* gcc.target/powerpc/gnuattr1.c: Delete.
* gcc.target/powerpc/gnuattr2.c: Delete.
* gcc.target/powerpc/gnuattr3.c: Delete.
---
 gcc/config/rs6000/rs6000-call.c | 13 -
 gcc/config/rs6000/rs6000.c  | 17 -
 gcc/testsuite/gcc.target/powerpc/gnuattr1.c | 15 ---
 gcc/testsuite/gcc.target/powerpc/gnuattr2.c | 17 -
 gcc/testsuite/gcc.target/powerpc/gnuattr3.c | 15 ---
 5 files changed, 4 insertions(+), 73 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/gnuattr1.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/gnuattr2.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/gnuattr3.c

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 3bd89a79bad..8294e22fb85 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -6539,11 +6539,8 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype,
{
  rs6000_passes_float = true;
  if ((HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE || TARGET_64BIT)
- && (FLOAT128_IBM_P (return_mode)
- || FLOAT128_IEEE_P (return_mode)
- || (return_type != NULL
- && (TYPE_MAIN_VARIANT (return_type)
- == long_double_type_node))))
+ && return_type != NULL
+ && TYPE_MAIN_VARIANT (return_type) == long_double_type_node)
rs6000_passes_long_double = true;
 
  /* Note if we passed or return a IEEE 128-bit type.  We changed
@@ -7001,10 +6998,8 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS *cum, machine_mode mode,
{
  rs6000_passes_float = true;
  if ((HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE || TARGET_64BIT)
- && (FLOAT128_IBM_P (mode)
- || FLOAT128_IEEE_P (mode)
- || (type != NULL
- && TYPE_MAIN_VARIANT (type) == long_double_type_node)))
+ && type != NULL
+ && TYPE_MAIN_VARIANT (type) == long_double_type_node)
rs6000_passes_long_double = true;
 
  /* Note if we passed or return a IEEE 128-bit type.  We changed the
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b6fd21a5d6f..a5188553593 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10076,23 +10076,6 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
   && GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT)
 gcc_unreachable ();
 
-#ifdef HAVE_AS_GNU_ATTRIBUTE
-  /* If we use a long double type, set the flags in .gnu_attribute that say
- what the long double type is.  This is to allow the linker's warning
- message for the wrong long double to be useful, even if the function does
- not do a call (for example, doing a 128-bit add on power9 if the long
- double type is IEEE 128-bit.  Do not set this if __ibm128 or __floa128 are
- used if they aren't the default long dobule type.  */
-  if (rs6000_gn

Re: [PATCH] Check calls before loop unrolling

2020-11-19 Thread Segher Boessenkool
On Thu, Nov 19, 2020 at 03:30:37PM -0700, Jeff Law wrote:
> > No, the vast majority of people will *not* (consciously) use them,
> > because the target defaults will set things to useful values.
> >
> > The compiler could use saner "generic" defaults perhaps, but those will
> > still not be satisfactory for anyone (except when they aren't generic in
> > fact but instead tuned for one arch ;-) ) -- unrolling is just too
> > important for performance.
> Then fix the heuristics, don't add new PARAMS :-)

I just said that cannot work?

> It didn't even occur to me until now that you may be pushing to have the
> ppc backend have different values for the PARAMS.  I would strongly
> discourage that.  It's been a huge headache in the s390 backend already.

It also makes a huge performance difference.  That the generic parts
of GCC are only tuned for x86 (or not well tuned for anything?) is a
huge roadblock for us.

I am not saying we should have six hundred different tunings.  But we
need a few (and we already *have* a few, not params but generic flags,
just like many other targets fwiw).

We *do* have a few custom param settings already, just like aarch64,
ia64, and sh, actually.

> >> In  my mind fixing things so they work with no magic arguments is best. 
> >> PARAMS are the worst solution.  A -f flag with no arguments is somewhere
> >> in between.  Others may clearly have different opinions here.
> > There is no big difference between params and flags here, IMO -- it has
> > to be a -f with a value as well, for good results.
> Which is a signal that we have a deeper problem.  -f with a value is no
> different than a param.

Yes exactly.

> > Since we have (almost) all such tunings in --param already, I'd say this
> > one belongs there as well?
> I'm not convinced at this point. 

Why not?

We have way too many params, yes.  But the first step to counteract that
would be to deprecate and get rid of many existing ones, not to block
having new ones which can be useful (while many of the existing ones are
not).

Or, we could accept that it is not really a problem at all.  You seem to
have a strong opinion that it *is*, but I don't understand that; maybe
you can explain a bit more?

Thanks,


Segher


[PATCH] PowerPC: Map IEEE 128-bit long double built-in functions

2020-11-19 Thread Michael Meissner via Gcc-patches
[PATCH] PowerPC: Map IEEE 128-bit long double built-in functions.

I posted this patch by accident to an internal IBM mailing list instead of
gcc-patches.

This patch replaces patches previously submitted:

September 24th, 2020:
Message-ID: <20200924203159.ga31...@ibm-toto.the-meissners.org>

October 9th, 2020:
Message-ID: <20201009043543.ga11...@ibm-toto.the-meissners.org>

October 24th, 2020:
Message-ID: <2020100346.ga8...@ibm-toto.the-meissners.org>

This patch maps the built-in functions that take or return long double
arguments on systems where long double is IEEE 128-bit.

This patch goes through the built-in functions and changes the name of the
math, scanf, and printf built-in functions to use the functions that GLIBC
provides when long double uses the IEEE 128-bit representation.

In addition, changing the name in GCC allows the Fortran compiler to
automatically use the correct name.

To map the math functions, typically this patch changes <name>l to
__<name>ieee128.  However there are some exceptions that are handled with this
patch.

To map the printf functions, <name> is mapped to __<name>ieee128.

To map the scanf functions, <name> is mapped to __isoc99_<name>ieee128.
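
As a rough sketch of the general renaming rule only (the real logic lives in rs6000_mangle_decl_assembler_name and keys off the built-in function code), the mapping can be modelled on strings. `builtin_kind` and `ieee128_name` are hypothetical names for this illustration, and the unusual cases such as dreml mapping to __remainderieee128 are deliberately not modelled.

```cpp
#include <string>

// Hypothetical classification of a long double built-in.
enum class builtin_kind { math, printf_family, scanf_family };

// Map a long double built-in name to the IEEE 128-bit name, following the
// general rule from the description.
std::string
ieee128_name (const std::string &name, builtin_kind kind)
{
  switch (kind)
    {
    case builtin_kind::math:
      // Drop the trailing 'l' and append "ieee128": sinl -> __sinieee128.
      if (!name.empty () && name.back () == 'l')
        return "__" + name.substr (0, name.size () - 1) + "ieee128";
      return name;
    case builtin_kind::printf_family:
      // sprintf -> __sprintfieee128.
      return "__" + name + "ieee128";
    case builtin_kind::scanf_family:
      // sscanf -> __isoc99_sscanfieee128.
      return "__isoc99_" + name + "ieee128";
    }
  return name;
}
```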

With the other IEEE long double patches, I have tested this patch by building 3
bootstrap compilers on a little endian power9 system, using the Advance
Toolchain AT14.0 library, which uses GLIBC 2.32:

1)  One compiler defaulted long double to IBM extended double;
2)  One compiler defaulted long double to IEEE 128-bit; (and)
3)  One compiler defaulted long double to 64 bit.

I was able to bootstrap each compiler and run make check.  In addition for the
compilers using the two 128-bit long double types (IBM, IEEE), I have built the
spec 2017 benchmark for both power9 and power10.

At the moment, there are some differences between the three runs for
make check.  I have some patches to fix these issues that I've done in the past,
and I will be working on resubmitting them in the future.

In addition, there are 3 Fortran benchmarks (ieee/large_2.f90,
default_format_2.f90, and default_format_denormal_2.f90) that now pass when the
long double default is IEEE 128-bit.

Can I check this into the master branch?

gcc/
2020-11-17  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_mangle_decl_assembler_name): Add
support for mapping built-in function names for long double
built-in functions if long double is IEEE 128-bit.

gcc/testsuite/
2020-11-17  Michael Meissner  

* gcc.target/powerpc/float128-longdouble-math.c: New test.
* gcc.target/powerpc/float128-longdouble-stdio.c: New test.
* gcc.target/powerpc/float128-math.c: Adjust test for new name
being generated.  Add support for running test on power10.  Add
support for running if long double defaults to 64-bits.
---
 gcc/config/rs6000/rs6000.c| 135 --
 .../powerpc/float128-longdouble-math.c| 442 ++
 .../powerpc/float128-longdouble-stdio.c   |  36 ++
 .../gcc.target/powerpc/float128-math.c|  16 +-
 4 files changed, 589 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-longdouble-math.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-longdouble-stdio.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a5188553593..35e9c844e17 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -27065,57 +27065,128 @@ rs6000_globalize_decl_name (FILE * stream, tree decl)
library before you can switch the real*16 type at compile time.
 
We use the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change this name.  We
-   only do this if the default is that long double is IBM extended double, and
-   the user asked for IEEE 128-bit.  */
+   only do this transformation if the __float128 type is enabled.  This
+   prevents us from doing the transformation on older 32-bit ports that might
+   have enabled using IEEE 128-bit floating point as the default long double
+   type.  */
 
 static tree
 rs6000_mangle_decl_assembler_name (tree decl, tree id)
 {
-  if (!TARGET_IEEEQUAD_DEFAULT && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+  if (TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
   && TREE_CODE (decl) == FUNCTION_DECL
-  && DECL_IS_UNDECLARED_BUILTIN (decl))
+  && DECL_IS_UNDECLARED_BUILTIN (decl)
+  && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
 {
   size_t len = IDENTIFIER_LENGTH (id);
   const char *name = IDENTIFIER_POINTER (id);
+  char *newname = NULL;
 
-  if (name[len - 1] == 'l')
+  /* See if it is one of the built-in functions with an unusual name.  */
+  switch (DECL_FUNCTION_CODE (decl))
{
- bool uses_ieee128_p = false;
- tree type = TREE_TYPE (decl);
- machine_mode ret_mode = TYPE_MODE (type);
+   case BUILT_IN_DREML:
+ newname = xstrdup ("__remainderieee128");
+ 

[PATCH] PowerPC: Set long double size for IBM/IEEE.

2020-11-19 Thread Michael Meissner via Gcc-patches
[PATCH] PowerPC: Set long double size for IBM/IEEE.

I originally posted this patch to an internal IBM mailing list instead of
gcc-patches.

As I was working with compilers where the long double default was 64-bit, it
became annoying to have to use two options to switch to one of the 128-bit long
double types (i.e. you need both -mlong-double-128 and the
-mabi={ieee,ibm}longdouble to switch the long double type).

I did this patch so that if you explicitly set the long double ABI via the
-mabi= option, it would automatically set the long double size if that was not
set explicitly.
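
In other words, the default size is chosen roughly as in the sketch below. This models only the decision, not the rs6000 code: `default_long_double_size` and its parameters are hypothetical, and `tfmode_precision` stands in for whatever FLOAT_PRECISION_TFmode evaluates to.

```cpp
// configured_default: the configure-time long double size (64 or 128).
// abi_option_given:   true if -mabi=ieeelongdouble or -mabi=ibmlongdouble
//                     was passed explicitly.
// tfmode_precision:   the precision value used for the 128-bit type.
int
default_long_double_size (int configured_default, bool abi_option_given,
                          int tfmode_precision)
{
  // Stay at 64 bits only when the compiler was configured that way and
  // the user did not ask for a specific 128-bit long double ABI.
  if (configured_default == 64 && !abi_option_given)
    return 64;
  return tfmode_precision;
}
```

An explicit -mlong-double-64 or -mlong-double-128 would still override this default, since the rule only applies when the size was not set explicitly.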

gcc/
2020-11-17  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_option_override_internal): If the
user explicitly used -mabi=ieeelongdouble or -mabi=ibmlongdouble,
set the long double size to 128.
* doc/invoke.texi (PowerPC options): Document that an explicit
-mabi=ieeelongdouble or -mabi=ibmlongdouble implicitly sets
-mlong-double-128.
---
 gcc/config/rs6000/rs6000.c | 9 +++--
 gcc/doc/invoke.texi| 7 ---
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 35e9c844e17..6edd17a0b69 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4131,8 +4131,13 @@ rs6000_option_override_internal (bool global_init_p)
 
   /* Use long double size to select the appropriate long double.  We use
  TYPE_PRECISION to differentiate the 3 different long double types.  We map
- 128 into the precision used for TFmode.  */
-  int default_long_double_size = (RS6000_DEFAULT_LONG_DOUBLE_SIZE == 64
+ 128 into the precision used for TFmode.
+
+ If the user explicitly used -mabi=ieeelongdouble or -mabi=ibmlongdouble,
+ but the compiler was configured for default 64-bit long doubles, set the
+ long double to be 128.  */
+  int default_long_double_size = ((RS6000_DEFAULT_LONG_DOUBLE_SIZE == 64
+  && !global_options_set.x_rs6000_ieeequad)
  ? 64
  : FLOAT_PRECISION_TFmode);
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3510a54c6c4..89d530f1d1e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27113,7 +27113,8 @@ Change the current ABI to use IBM extended-precision long double.
 This is not likely to work if your system defaults to using IEEE
 extended-precision long double.  If you change the long double type
 from IEEE extended-precision, the compiler will issue a warning unless
-you use the @option{-Wno-psabi} option.  Requires @option{-mlong-double-128}
+you use the @option{-Wno-psabi} option.  If this option is used, it
+will implicitly enable @option{-mlong-double-128}.
 to be enabled.
 
 @item -mabi=ieeelongdouble
@@ -27122,8 +27123,8 @@ Change the current ABI to use IEEE extended-precision long double.
 This is not likely to work if your system defaults to using IBM
 extended-precision long double.  If you change the long double type
 from IBM extended-precision, the compiler will issue a warning unless
-you use the @option{-Wno-psabi} option.  Requires @option{-mlong-double-128}
-to be enabled.
+you use the @option{-Wno-psabi} option.  If this option is used, it
+will implicitly enable @option{-mlong-double-128}.
 
 @item -mabi=elfv1
 @opindex mabi=elfv1
-- 
2.22.0


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] PowerPC: Add float128/Decimal conversions

2020-11-19 Thread Michael Meissner via Gcc-patches
[PATCH] PowerPC: Add float128/Decimal conversions.

I accidentally posted this patch on an internal IBM mailing list instead of
gcc-patches.

This patch replaces the following two patches:

September 24th, 2020:
Message-ID: <20200924203545.gd31...@ibm-toto.the-meissners.org>

October 22nd, 2020:
Message-ID: <2020100603.ga11...@ibm-toto.the-meissners.org>

This is a simplification of those patches.  Those patches were initially
written before I was using the final glibc 2.32 (Advance Toolchain AT14.0).
With using that glibc and with the previous IEEE patches submitted, I can
simplify the conversions to just use the long double defaults, compiling the
modules for IEEE 128-bit long double.  It works because stdio.h/gcc switches
the sprintf call to __sprintfieee128, and the strtold call to __strtof128.

While most of the Decimal <-> Long double tests now pass when long doubles are
IEEE 128-bit, there is one test that fails:

c-c++-common/dfp/convert-bfp-11.c

This test explicitly expects long double to be IBM 128-bit extended double.  A
later patch will fix this.

If the glibc is not 2.32 or later, this code just compiles to using abort.
That way the user won't get unknown reference errors due to the calls to the
glibc 2.32 functions that aren't in previous glibcs.

This patch is one of three critical patches needed to be able to build
compilers where the default is IEEE 128-bit.  The other patches were the
patches to rename the built-in functions, and the patches for PRs 97543 and
97643 that were both posted earlier.

I have tested this patch on a little endian power9 system running Linux,
building bootstrap compilers with the 3 long double flavors (long double is
128-bit IEEE, long double is 128-bit IBM, and long double is 64-bit).  There
are no regressions with long double set to 128-bit IBM.

With the exception of convert-bfp-11.c mentioned above, none of the regressions
with long double set to 128-bit IEEE affect the decimal support.

Can I check this into the master branch?

libgcc/
2020-11-17  Michael Meissner  

* config/rs6000/t-float128 (fp128_dec_funcs): New macro.
(ibm128_dec_funcs): New macro.
(fp128_ppc_funcs): Add the Decimal <-> __float128 conversions.
(fp128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(ibm128_dec_objs): Force Decimal <-> __float128 conversions to be
compiled with -mabi=ieeelongdouble.
(FP128_CFLAGS_DECIMAL): New macro.
(IBM128_CFLAGS_DECIMAL): New macro.
* config/rs6000/_dd_to_kf.c: New file.
* config/rs6000/_kf_to_dd.c: New file.
* config/rs6000/_kf_to_sd.c: New file.
* config/rs6000/_kf_to_td.c: New file.
* config/rs6000/_sd_to_kf.c: New file.
* config/rs6000/_td_to_kf.c: New file.
---
 libgcc/config/rs6000/_dd_to_kf.c | 58 +++
 libgcc/config/rs6000/_kf_to_dd.c | 57 ++
 libgcc/config/rs6000/_kf_to_sd.c | 58 +++
 libgcc/config/rs6000/_kf_to_td.c | 56 ++
 libgcc/config/rs6000/_sd_to_kf.c | 59 
 libgcc/config/rs6000/_td_to_kf.c | 58 +++
 libgcc/config/rs6000/t-float128  | 26 +-
 7 files changed, 371 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/rs6000/_dd_to_kf.c
 create mode 100644 libgcc/config/rs6000/_kf_to_dd.c
 create mode 100644 libgcc/config/rs6000/_kf_to_sd.c
 create mode 100644 libgcc/config/rs6000/_kf_to_td.c
 create mode 100644 libgcc/config/rs6000/_sd_to_kf.c
 create mode 100644 libgcc/config/rs6000/_td_to_kf.c

diff --git a/libgcc/config/rs6000/_dd_to_kf.c b/libgcc/config/rs6000/_dd_to_kf.c
new file mode 100644
index 000..93601fa280e
--- /dev/null
+++ b/libgcc/config/rs6000/_dd_to_kf.c
@@ -0,0 +1,58 @@
+/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* Decimal64 -> _Float128 conversion.  */
+
+/* FINE_GRAINED_LIBRARIES is used so we can isolate just to dd_to_tf conver

[r11-5181 Regression] FAIL: gcc.dg/vect/vect-35.c scan-tree-dump vect "can't determine dependence between" on Linux/x86_64

2020-11-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

0862d007b564eca8c9a48fca0e689dd3f90db828 is the first bad commit
commit 0862d007b564eca8c9a48fca0e689dd3f90db828
Author: Jan Hubicka 
Date:   Thu Nov 19 20:16:26 2020 +0100

Fix two bugs in operand_equal_p

caused

FAIL: gcc.dg/vect/vect-35-big-array.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-35-big-array.c -flto -ffat-lto-objects  scan-tree-dump 
vect "can't determine dependence between"
FAIL: gcc.dg/vect/vect-35-big-array.c scan-tree-dump-times vect "vectorized 1 
loops" 1
FAIL: gcc.dg/vect/vect-35-big-array.c scan-tree-dump vect "can't determine 
dependence between"
FAIL: gcc.dg/vect/vect-35.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-35.c -flto -ffat-lto-objects  scan-tree-dump vect "can't 
determine dependence between"
FAIL: gcc.dg/vect/vect-35.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-35.c scan-tree-dump vect "can't determine dependence 
between"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-5181/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35-big-array.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35-big-array.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35-big-array.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35-big-array.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-35.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH,rs6000] Make MMA builtins use opaque modes [v2]

2020-11-19 Thread Aaron Sawdey via Gcc-patches
For some reason this patch never showed up on gcc-patches.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> Begin forwarded message:
> 
> From: acsaw...@linux.ibm.com
> Subject: [PATCH,rs6000] Make MMA builtins use opaque modes [v2]
> Date: November 19, 2020 at 12:58:47 PM CST
> To: gcc-patches@gcc.gnu.org
> Cc: seg...@kernel.crashing.org, wschm...@linux.ibm.com, 
> berg...@linux.ibm.com, Aaron Sawdey 
> 
> From: Aaron Sawdey 
> 
> Segher & Bergner -
>  Thanks for the reviews, here's the updated patch after fixing those things.
> We now have an UNSPEC for xxsetaccz, and an accompanying change to
> rs6000_rtx_costs to make it be cost 0 so that CSE doesn't try to replace it
> with a bunch of register moves.
> 
> If bootstrap/regtest looks good, ok for trunk?
> 
> Thanks,
>Aaron
> 
> gcc/
>   * gcc/config/rs6000/mma.md (unspec): Add assemble/extract UNSPECs.
>   (movoi): Change to movoo.
>   (*movpoi): Change to *movoo.
>   (movxi): Change to movxo.
>   (*movpxi): Change to *movxo.
>   (mma_assemble_pair): Change to OO mode.
>   (*mma_assemble_pair): New define_insn_and_split.
>   (mma_disassemble_pair): New define_expand.
>   (*mma_disassemble_pair): New define_insn_and_split.
>   (mma_assemble_acc): Change to XO mode.
>   (*mma_assemble_acc): Change to XO mode.
>   (mma_disassemble_acc): New define_expand.
>   (*mma_disassemble_acc): New define_insn_and_split.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to OO mode.
>   (mma_): Change to XO/OO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO/OO mode.
>   (mma_): Change to XO/OO mode.
>   (mma_): Change to XO mode.
>   (mma_): Change to XO mode.
>   * gcc/config/rs6000/predicates.md (input_operand): Allow opaque.
>   (mma_disassemble_output_operand): New predicate.
>   * gcc/config/rs6000/rs6000-builtin.def:
>   Changes to disassemble builtins.
>   * gcc/config/rs6000/rs6000-call.c (rs6000_return_in_memory):
>   Disallow __vector_pair/__vector_quad as return types.
>   (rs6000_promote_function_mode): Remove function return type
>   check because we can't test it here any more.
>   (rs6000_function_arg): Do not allow __vector_pair/__vector_quad
>   as function arguments.
>   (rs6000_gimple_fold_mma_builtin):
>   Handle mma_disassemble_* builtins.
>   (rs6000_init_builtins): Create types for XO/OO modes.
>   * gcc/config/rs6000/rs6000-modes.def: Delete OI, XI,
>   POI, and PXI modes, and create XO and OO modes.
>   * gcc/config/rs6000/rs6000-string.c (expand_block_move):
>   Update to OO mode.
>   * gcc/config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok_uncached):
>   Update for XO/OO modes.
>   (rs6000_rtx_costs): Make UNSPEC_MMA_XXSETACCZ cost 0.
>   (rs6000_modes_tieable_p): Update for XO/OO modes.
>   (rs6000_debug_reg_global): Update for XO/OO modes.
>   (rs6000_setup_reg_addr_masks): Update for XO/OO modes.
>   (rs6000_init_hard_regno_mode_ok): Update for XO/OO modes.
>   (reg_offset_addressing_ok_p): Update for XO/OO modes.
>   (rs6000_emit_move): Update for XO/OO modes.
>   (rs6000_preferred_reload_class): Update for XO/OO modes.
>   (rs6000_split_multireg_move): Update for XO/OO modes.
>   (rs6000_mangle_type): Update for opaque types.
>   (rs6000_invalid_conversion): Update for XO/OO modes.
>   * gcc/config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P):
>   Update for XO/OO modes.
>   * gcc/config/rs6000/rs6000.md (RELOAD): Update for XO/OO modes.
> gcc/testsuite/
>   * gcc.target/powerpc/mma-double-test.c (main): Call abort for failure.
>   * gcc.target/powerpc/mma-single-test.c (main): Call abort for failure.
>   * gcc.target/powerpc/pr96506.c: Rename to pr96506-1.c.
>   * gcc.target/powerpc/pr96506-2.c: New test.
> ---
> gcc/config/rs6000/mma.md  | 421 ++
> gcc/config/rs6000/predicates.md   |  12 +
> gcc/config/rs6000/rs6000-builtin.def  |  14 +-
> gcc/config/rs6000/rs6000-call.c   | 142 +++---
> gcc/config/rs6000/rs6000-modes.def|  10 +-
> gcc/config/rs6000/rs6000-string.c |   6 +-
> gcc/config/rs6000/rs6000.c| 193 
> gcc/config/rs6000/rs6000.h|   3 +-
> gcc/config/rs6000/rs6000.md   |   2 +-
> .../gcc.target/powerpc/mma-double-test.c  |   3 +
> .../gcc.target/powerpc/mma-single-test.c  |   3 +
> .../powerpc/{pr96506.c => pr96506-1.c}|  24 -
> gcc/testsuite/gcc.target/powerpc/pr96506-2.c  |  38 ++
> 13 files changed, 508 insertions(+), 363 deletions(-)
> rename gcc/tes

[PATCH] Additional small changes to support opaque modes

2020-11-19 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

After building some larger codes and some C++ codes that use opaque types,
it became clear I needed to go through and look for places where opaque
types and modes needed to be handled.  The result is a whole pile of
one-liners.
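
For readers unfamiliar with typeclass.h: type_to_class in builtins.c is the
mapping behind __builtin_classify_type, so the new opaque_type_class becomes
one more value that builtin can return.  A small sketch of the existing
behaviour for ordinary types follows.  The sketch_* enumerators are a partial
copy of typeclass.h values reproduced for illustration, and the helper names
are hypothetical:

```cpp
#include <cassert>

// Partial, illustrative mirror of a few typeclass.h values.
enum sketch_type_class {
  sketch_integer_class = 1,
  sketch_pointer_class = 5,
  sketch_real_class = 8
};

// Each helper asks the compiler to classify the type of its argument.
inline int classify_int()     { return __builtin_classify_type (0); }
inline int classify_double()  { return __builtin_classify_type (0.0); }
inline int classify_pointer() { return __builtin_classify_type ((void *) 0); }

// Self-check comparing the builtin's answers with the mirrored values.
inline void check_expected_classes() {
  assert(classify_int() == sketch_integer_class);
  assert(classify_double() == sketch_real_class);
  assert(classify_pointer() == sketch_pointer_class);
}
```

Opaque types are not exercised here because they need a target that defines
them.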

If bootstrap/regtest passes for ppc64le and x86_64, ok for trunk?

gcc/
* typeclass.h: Add opaque_type_class.
* builtins.c (type_to_class): Identify opaque type class.
* c-family/c-pretty-print.c (c_pretty_printer::simple_type_specifier):
Treat opaque types like other types.
(c_pretty_printer::direct_abstract_declarator): Opaque types are
supported types.
* c/c-aux-info.c (gen_type): Support opaque types.
* cp/error.c (dump_type): Handle opaque types.
(dump_type_prefix): Handle opaque types.
(dump_type_suffix): Handle opaque types.
(dump_expr): Handle opaque types.
* cp/pt.c (tsubst): Allow opaque types in templates.
(unify): Allow opaque types in templates.
* cp/typeck.c (structural_comptypes): Handle comparison
of opaque types.
* dwarf2out.c (is_base_type): Handle opaque types.
(loc_descriptor): Handle opaque modes like VOIDmode/BLKmode.
(gen_type_die_with_usage): Handle opaque types.
* expr.c (count_type_elements): Opaque types should
never have initializers.
* ipa-devirt.c (odr_types_equivalent_p): No type-specific handling
for opaque types is needed as it eventually checks the underlying
mode which is what is important.
* tree-streamer.c (record_common_node): Handle opaque types.
* tree.c (type_contains_placeholder_1): Handle opaque types.
(type_cache_hasher::equal): No additional comparison needed for
opaque types.
---
 gcc/builtins.c| 1 +
 gcc/c-family/c-pretty-print.c | 2 ++
 gcc/c/c-aux-info.c| 4 
 gcc/cp/error.c| 4 
 gcc/cp/pt.c   | 2 ++
 gcc/cp/typeck.c   | 1 +
 gcc/dwarf2out.c   | 4 +++-
 gcc/expr.c| 1 +
 gcc/ipa-devirt.c  | 1 +
 gcc/tree-streamer.c   | 1 +
 gcc/tree.c| 2 ++
 gcc/typeclass.h   | 2 +-
 12 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 42c52a1925e..0958abcae49 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2228,6 +2228,7 @@ type_to_class (tree type)
 case ARRAY_TYPE:  return (TYPE_STRING_FLAG (type)
   ? string_type_class : array_type_class);
 case LANG_TYPE:   return lang_type_class;
+case OPAQUE_TYPE:  return opaque_type_class;
 default:  return no_type_class;
 }
 }
diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index 8953e3b678b..3027703056b 100644
--- a/gcc/c-family/c-pretty-print.c
+++ b/gcc/c-family/c-pretty-print.c
@@ -342,6 +342,7 @@ c_pretty_printer::simple_type_specifier (tree t)
   break;
 
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case REAL_TYPE:
@@ -662,6 +663,7 @@ c_pretty_printer::direct_abstract_declarator (tree t)
 
 case IDENTIFIER_NODE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case REAL_TYPE:
diff --git a/gcc/c/c-aux-info.c b/gcc/c/c-aux-info.c
index ffc8099856d..41f5598de38 100644
--- a/gcc/c/c-aux-info.c
+++ b/gcc/c/c-aux-info.c
@@ -413,6 +413,10 @@ gen_type (const char *ret_val, tree t, formals_style style)
  data_type = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (t)));
  break;
 
+   case OPAQUE_TYPE:
+ data_type = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (t)));
+ break;
+
case VOID_TYPE:
  data_type = "void";
  break;
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 396558be17f..d27545d1223 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -529,6 +529,7 @@ dump_type (cxx_pretty_printer *pp, tree t, int flags)
 case INTEGER_TYPE:
 case REAL_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -874,6 +875,7 @@ dump_type_prefix (cxx_pretty_printer *pp, tree t, int flags)
 case UNION_TYPE:
 case LANG_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case TYPENAME_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -997,6 +999,7 @@ dump_type_suffix (cxx_pretty_printer *pp, tree t, int flags)
 case UNION_TYPE:
 case LANG_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case TYPENAME_TYPE:
 case COMPLEX_TYPE:
 case VECTOR_TYPE:
@@ -2810,6 +2813,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
 case ENUMERAL_TYPE:
 case REAL_TYPE:
 case VOID_TYPE:
+case OPAQUE_TYPE:
 case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case COMPLEX_TYPE:
diff --git a/gcc/cp/pt.c b/gcc/

[PATCH] c++: Fix wrong error with constexpr destructor [PR97427]

2020-11-19 Thread Marek Polacek via Gcc-patches
When I implemented the code to detect modifying const objects in
constexpr contexts, we couldn't have constexpr destructors, so I didn't
consider them.  But now we can and that caused a bogus error in this
testcase: [class.dtor]p5 says that "const and volatile semantics are not
applied on an object under destruction.  They stop being in effect when
the destructor for the most derived object starts." so we have to clear
the TREE_READONLY flag we set on the object after the constructors have
been called to mark it as no-longer-under-construction.  In the ~Foo
call it's now an object under destruction, so don't report those errors.
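
To make the [class.dtor]p5 rule concrete, here is a small sketch.  It is not
the PR testcase; Tracker, Widget and the helper functions are hypothetical
names.  The first half is valid in any C++ standard, and the second half is
only compiled when the compiler reports C++20:

```cpp
#include <cassert>

// Runtime illustration: the destructor may modify members even though
// the complete object was declared const.
struct Tracker {
  int value;
  int *destroyed;   // counter owned by the caller
  Tracker(int v, int *d) : value(v), destroyed(d) {}
  ~Tracker() {
    value = 0;      // const semantics no longer apply here
    ++*destroyed;
  }
};

// Returns how many destructors ran for one const-qualified local.
inline int destroy_const_object() {
  int destroyed = 0;
  {
    const Tracker t(42, &destroyed);
    (void) t;
  }
  assert(destroyed == 1);   // the destructor ran exactly once
  return destroyed;
}

#if __cplusplus >= 202002L
// Compile-time illustration, assuming constexpr destructors are available.
struct Widget {
  int value;
  constexpr Widget(int v) : value(v) {}
  constexpr ~Widget() { value = 0; }
};

constexpr int use_const_widget() {
  const Widget w(7);   // const object destroyed during constant evaluation
  return w.value;
}

static_assert(use_const_widget() == 7, "destroying a const Widget is allowed");
#endif
```

The static_assert is the interesting part: evaluating it requires the
compiler to run ~Widget on a const object inside a constant expression, which
is the situation the patch is about.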

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

gcc/cp/ChangeLog:

PR c++/97427
* constexpr.c (cxx_set_object_constness): New function.
(cxx_eval_call_expression): Set new_obj for destructors too.
Call cxx_set_object_constness to set/unset TREE_READONLY of
the object under construction/destruction.

gcc/testsuite/ChangeLog:

PR c++/97427
* g++.dg/cpp2a/constexpr-dtor10.C: New test.
---
 gcc/cp/constexpr.c| 49 +--
 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor10.C | 16 ++
 2 files changed, 49 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-dtor10.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 625410327b8..ef37b3043a5 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2187,6 +2187,27 @@ cxx_eval_thunk_call (const constexpr_ctx *ctx, tree t, 
tree thunk_fndecl,
   non_constant_p, overflow_p);
 }
 
+/* If OBJECT is of const class type, evaluate it to a CONSTRUCTOR and set
+   its TREE_READONLY flag according to READONLY_P.  Used for constexpr
+   'tors to detect modifying const objects in a constexpr context.  */
+
+static void
+cxx_set_object_constness (const constexpr_ctx *ctx, tree object,
+ bool readonly_p, bool *non_constant_p,
+ bool *overflow_p)
+{
+  if (CLASS_TYPE_P (TREE_TYPE (object))
+  && CP_TYPE_CONST_P (TREE_TYPE (object)))
+{
+  /* Subobjects might not be stored in ctx->global->values but we
+can get its CONSTRUCTOR by evaluating *this.  */
+  tree e = cxx_eval_constant_expression (ctx, object, /*lval*/false,
+non_constant_p, overflow_p);
+  if (TREE_CODE (e) == CONSTRUCTOR && !*non_constant_p)
+   TREE_READONLY (e) = readonly_p;
+}
+}
+
 /* Subroutine of cxx_eval_constant_expression.
Evaluate the call expression tree T in the context of OLD_CALL expression
evaluation.  */
@@ -2515,11 +2536,11 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t,
 
   depth_ok = push_cx_call_context (t);
 
-  /* Remember the object we are constructing.  */
+  /* Remember the object we are constructing or destructing.  */
   tree new_obj = NULL_TREE;
-  if (DECL_CONSTRUCTOR_P (fun))
+  if (DECL_CONSTRUCTOR_P (fun) || DECL_DESTRUCTOR_P (fun))
 {
-  /* In a constructor, it should be the first `this' argument.
+  /* In a cdtor, it should be the first `this' argument.
 At this point it has already been evaluated in the call
 to cxx_bind_parameters_in_call.  */
   new_obj = TREE_VEC_ELT (new_call.bindings, 0);
@@ -2656,6 +2677,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  unsigned save_heap_alloc_count = ctx->global->heap_vars.length ();
  unsigned save_heap_dealloc_count = ctx->global->heap_dealloc_count;
 
+ /* If this is a constexpr destructor, the object's const and volatile
+semantics are no longer in effect; see [class.dtor]p5.  */
+ if (new_obj && DECL_DESTRUCTOR_P (fun))
+   cxx_set_object_constness (ctx, new_obj, /*readonly_p=*/false,
+ non_constant_p, overflow_p);
+
  tree jump_target = NULL_TREE;
  cxx_eval_constant_expression (&ctx_with_save_exprs, body,
lval, non_constant_p, overflow_p,
@@ -2686,19 +2713,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
 the object is no longer under construction, and its possible
 'const' semantics now apply.  Make a note of this fact by
 marking the CONSTRUCTOR TREE_READONLY.  */
- if (new_obj
- && CLASS_TYPE_P (TREE_TYPE (new_obj))
- && CP_TYPE_CONST_P (TREE_TYPE (new_obj)))
-   {
- /* Subobjects might not be stored in ctx->global->values but we
-can get its CONSTRUCTOR by evaluating *this.  */
- tree e = cxx_eval_constant_expression (ctx, new_obj,
-/*lval*/false,
-non_constant_p,
-overflow_p);
-
