[PATCH] LoongArch: Replace UNSPEC_FCOPYSIGN with copysign RTL

2023-10-02 Thread Xi Ruoyao
When I added copysign support for LoongArch (r13-3702), we did not have
a copysign RTL insn, so I had to use UNSPEC to represent the copysign
instruction. Now the copysign RTX code has been added in r14-1586, so
this patch removes those UNSPECs, and it uses the native RTL copysign
insn.
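
For readers less familiar with the operation being modelled, here is a minimal C sketch of the copysign semantics: the result takes its magnitude from the first operand and its sign from the second. It assumes IEEE 754 binary64 doubles, and `copysign_sketch` is a hypothetical name used only for illustration; it is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (double) == sizeof (uint64_t), "double must be 64 bits");

/* Illustrative only: return the magnitude of X combined with the sign
   bit of Y, assuming IEEE 754 binary64 doubles.  */
static double
copysign_sketch (double x, double y)
{
  uint64_t xb, yb;
  const uint64_t sign = UINT64_C (1) << 63;

  /* Copy the raw bit patterns out of the doubles.  */
  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);

  /* Keep everything but the sign from X, and only the sign from Y.  */
  xb = (xb & ~sign) | (yb & sign);

  memcpy (&x, &xb, sizeof x);
  return x;
}
```

Expressing this with a dedicated RTX code rather than an UNSPEC lets the RTL describe the operation itself instead of an opaque target-specific placeholder.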

Inspired by rs6000 patch "Cleanup: Replace UNSPEC_COPYSIGN with copysign
RTL" [1] from Michael Meissner.

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631701.html

gcc/ChangeLog:

* config/loongarch/loongarch.md (UNSPEC_FCOPYSIGN): Delete.
(copysign<mode>3): Use copysign RTL instead of UNSPEC.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 2b09209945b..9916c741641 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -37,7 +37,6 @@ (define_c_enum "unspec" [
   UNSPEC_FCLASS
   UNSPEC_FMAX
   UNSPEC_FMIN
-  UNSPEC_FCOPYSIGN
   UNSPEC_FTINT
   UNSPEC_FTINTRM
   UNSPEC_FTINTRP
@@ -1130,9 +1129,8 @@ (define_insn "abs<mode>2"
 
 (define_insn "copysign<mode>3"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
-   (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")
- (match_operand:ANYF 2 "register_operand" "f")]
-UNSPEC_FCOPYSIGN))]
+   (copysign:ANYF (match_operand:ANYF 1 "register_operand" "f")
+  (match_operand:ANYF 2 "register_operand" "f")))]
   "TARGET_HARD_FLOAT"
   "fcopysign.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fcopysign")
-- 
2.42.0



Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-02 Thread Kito Cheng
I've proposed a fix and verified it with "mawk" and "gawk -P" (gawk in POSIX
mode) on my Linux machine; others have reported that it also works on FreeBSD.
It is just waiting for review :)

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631785.html

On Tue, Oct 3, 2023 at 2:07 AM Jeff Law  wrote:
>
>
>
> On 10/2/23 12:03, David Edelsohn wrote:
> > On Mon, Oct 2, 2023 at 1:59 PM Jeff Law wrote:
> >
> >
> >
> > On 10/2/23 11:20, David Edelsohn wrote:
> >  > Wang,
> >  >
> >  > The AWK portions of this patch broke bootstrap on AIX.
> >  >
> >  > Also, the AWK portions are common code, not RISC-V specific.  I
> > don't
> >  > see anywhere that the common portions of the patch were reviewed or
> >  > approved by anyone with authority to approve the changes to the
> > AWK files.
> >  >
> >  > This patch should not have been committed without approval by a
> > reviewer
> >  > with authority for that portion of the compiler and should have been
> >  > tested on targets other than RISC-V if common parts of the
> > compiler were
> >  > changed.
> > I acked the generic bits.  So the lack of testing on another target is
> > on me.
> >
> >
> > Hi, Jeff
> >
> > Sorry. I didn't see a comment from a global reviewer in the V3 thread.
> NP.
>
> >
> > I am using Gawk on AIX.  After the change, I see a parse error from
> > gawk.  I'm rebuilding with a checkout just before the change to confirm
> > that it was the source of the error, and it seems to be past that
> > failure location.  I didn't keep the exact error.  Once I get past this
> > build cycle, I'll reproduce it.
> I think there's already a patch circulating which fixes this.  It broke
> at least one other platform.  Hopefully it'll all be sorted out today.
>
>
> jeff


[PATCH] RISC-V: Fix the riscv_legitimize_poly_move issue on targets where the minimal VLEN exceeds 512.

2023-10-02 Thread Kito Cheng
riscv_legitimize_poly_move was expected to ensure the poly value is at most 32
times smaller than the minimal VLEN (32 being derived from '4096 / 128').
This assumption held when our mode modeling was not so precisely defined.
However, now that we have modeled the mode size according to the correct minimal
VLEN info, the size difference between different RVV modes can be up to 64
times. For instance, comparing RVVMF64BI and RVVMF1BI, the sizes are [1, 1]
versus [64, 64] respectively.
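
As a rough illustration of the arithmetic behind the bound change, the following stand-alone C sketch mirrors the old and new loop limits from the hunk below. `exact_log2_sketch` is a simplified stand-in for GCC's `exact_log2`, and `largest_div_factor` is a hypothetical helper written only for this note; neither is the code in riscv.cc.

```c
#include <assert.h>

/* Simplified stand-in for GCC's exact_log2: return log2 (X) when X is a
   power of two, otherwise -1.  */
static int
exact_log2_sketch (unsigned int x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while (x >>= 1)
    n++;
  return n;
}

/* Return the largest divisor (1 << i) that the loop would try.
   INCLUSIVE selects the new bound "i <= max_power"; otherwise the old
   bound "i < max_power" is used.  */
static int
largest_div_factor (int max_power, int inclusive)
{
  int last = 0;

  assert (max_power >= 0);
  for (int i = 0; inclusive ? i <= max_power : i < max_power; i++)
    last = 1 << i;
  return last;
}
```

With these helpers, `largest_div_factor (exact_log2_sketch (4096 / 128), 0)` evaluates to 16, while `largest_div_factor (exact_log2_sketch (64), 1)` evaluates to 64, which is the ratio between RVVMF1BI and RVVMF64BI quoted above.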

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_poly_move): Bump
max_power to 64.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/autovec/bug-01.C: New.
* g++.target/riscv/rvv/rvv.exp: Add autovec folder.
---
 gcc/config/riscv/riscv.cc |  7 ++--
 .../g++.target/riscv/rvv/autovec/bug-01.C | 33 +++
 gcc/testsuite/g++.target/riscv/rvv/rvv.exp|  3 ++
 3 files changed, 40 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d5446b63dbf..6468702e3a3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2386,9 +2386,10 @@ riscv_legitimize_poly_move (machine_mode mode, rtx dest, rtx tmp, rtx src)
 }
   else
 {
-  /* FIXME: We currently DON'T support TARGET_MIN_VLEN > 4096.  */
-  int max_power = exact_log2 (4096 / 128);
-  for (int i = 0; i < max_power; i++)
+  /* The size difference between different RVV modes can be up to 64 times.
+e.g. RVVMF64BI vs RVVMF1BI on zvl512b, which is [1, 1] vs [64, 64].  */
+  int max_power = exact_log2 (64);
+  for (int i = 0; i <= max_power; i++)
{
  int possible_div_factor = 1 << i;
  if (factor % (vlenb / possible_div_factor) == 0)
diff --git a/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C b/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
new file mode 100644
index 000..fd10009ddbe
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/rvv/autovec/bug-01.C
@@ -0,0 +1,33 @@
+/* { dg-options "-march=rv64gcv_zvl512b -mabi=lp64d -O3" } */
+
+class c {
+public:
+  int e();
+  void j();
+};
+float *d;
+class k {
+  int f;
+
+public:
+  k(int m) : f(m) {}
+  float g;
+  float h;
+  void n(int m) {
+for (int i; i < m; i++) {
+  d[0] = d[1] = d[2] = g;
+  d[3] = h;
+  d += f;
+}
+  }
+};
+c l;
+void o() {
+  int b = l.e();
+  k a(b);
+  for (;;)
+if (b == 4) {
+  l.j();
+  a.n(2);
+}
+}
diff --git a/gcc/testsuite/g++.target/riscv/rvv/rvv.exp b/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
index 249530580d7..c30d6e93144 100644
--- a/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/g++.target/riscv/rvv/rvv.exp
@@ -40,5 +40,8 @@ set CFLAGS "-march=$gcc_march -O3"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.C]] \
"" $CFLAGS
 
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[C\]]] \
+"" $CFLAGS
+
 # All done.
 dg-finish
-- 
2.40.1



RFC: attributes documentation

2023-10-02 Thread Sandra Loosemore
When I was working on something else recently, I realized that the GCC manual 
had nothing in its attributes section saying that you could use the various 
documented GCC extension attributes with C/C++ standard attribute syntax too, 
or how (you have to use the "gnu::" prefix); I ended up finding this 
information in the release notes for GCC 10 instead.  :-(  So, minimally, I am 
going to address that.


Going beyond that, though, I think we should also document that the standard 
syntax is now the preferred way to do it, and change the examples (except for 
the parts documenting the old syntax) to use the new standard syntax.  It's 
been accepted by the default -std= setting for both C and C++ since GCC 10, and 
my understanding is that C2x will be official by the time GCC 14 is released 
(so supporting the new syntax won't be just another GNU extension any more). 
Does this sound OK to everybody?


-Sandra


Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-02 Thread Vineet Gupta




On 9/29/23 05:14, Jeff Law wrote:



On 9/28/23 21:49, Vineet Gupta wrote:


On 9/28/23 20:17, Jeff Law wrote:
I can bootstrap & regression test alpha using QEMU user mode 
emulation. So we might be able to trigger something that way. It'll 
take some time, but might prove fruitful. 


That would be awesome. It's not like this is burning or anything, 
and it's one of the things in the long tail of things we need to do in 
this area.
Absolutely true.  In fact, I doubt this particular quirk is all that 
important in the extension removal space.  But we've got enough data 
to do a bit of an experiment.  While it takes a long time to run, it 
doesn't require any kind of regular human intervention.  Better to 
fire it off now while it's still fresh in our minds.  If we wait a 
week or two I'll have forgotten everything.


Just as a data-point, the SPEC QEMU icount on RISC-V with this patch 
seems promising (+ve is better, lesser icount)


benchmark        workload #        Baseline  subreg promoted       %
                                              note preserved
500.perlbench_r  0            1222524143251    1222717055541  -0.02%
                 1             741422677286     740573609468   0.11%
                 2             694157786227     693787219643   0.05%
502.gcc_r        0             189519277112     188986824061   0.28%  <--
                 1             224763075520     224225499491   0.24%
                 2             216802546093     216426186662   0.17%
                 3             179634122120     179165923074   0.26%  <--
                 4             222757886491     222343753217   0.19%
503.bwaves_r     0             309660270217     309640234863   0.01%
                 1             488871452301     488838894844   0.01%
                 2             381243468081     381218065515   0.01%
                 3             463786236849     463756755469   0.01%
505.mcf_r        0             675010417323     675014630665   0.00%
507.cactuBSSN_r  0            2856874668987    2856874679135   0.00%
508.namd_r       0            1877527849689    1877508676900   0.00%
510.parest_r     0            3479719575579    3479104891751   0.02%
511.povray_r     0            3028749801452    3030037805612  -0.04%
519.lbm_r        0            1172421269180    1172421185445   0.00%
520.omnetpp_r    0            1014838628822    1007680353306   0.71%  <--
521.wrf_r        0            3897783090826    3897162379983   0.02%
523.xalancbmk_r  0            1062577057227    1062508198843   0.01%
525.x264_r       0             451836043634     449289002218   0.56%  <--
                 1            1761617424590    1758009904369   0.20%  <--
                 2            1686037465791    1682815274048   0.19%  <--
526.blender_r    0            1660559350538    1660534452552   0.00%
527.cam4_r       0            2141572063843    2141254936488   0.01%
531.deepsjeng_r  0            1605692153702    1603021256993   0.17%
538.imagick_r    0            3401602661013    3401602660827   0.00%
541.leela_r      0            1989286081019    1987365526160   0.10%
544.nab_r        0            1537038165879    1528865609530   0.53%  <--
548.exchange2_r  0            2050220774922    2048840906692   0.07%
549.fotonik3d_r  0            2231807908394    2231807600916   0.00%
554.roms_r       0            2612969430882    2611529873683   0.06%
557.xz_r         0             367967125022     367760820700   0.06%
                 1             961163448926     961025548415   0.01%
                 2             522156857947     521939109911   0.04%
997.specrand_fr  0                413618578        413604068   0.00%
999.specrand_ir  0                413618578        413604068   0.00%
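
The percentage column appears to be the relative reduction in instruction count, (baseline - patched) / baseline, so that positive values mean fewer instructions. A minimal C sketch under that assumption follows; `icount_improvement_pct` is a hypothetical helper, not something taken from the measurement scripts.

```c
#include <assert.h>
#include <stdint.h>

/* Relative icount reduction in percent; positive means the patched
   build executed fewer instructions than the baseline.  The formula is
   an assumption inferred from the table, not taken from the scripts.  */
static double
icount_improvement_pct (uint64_t baseline, uint64_t patched)
{
  assert (baseline != 0);
  return ((double) baseline - (double) patched) * 100.0 / (double) baseline;
}
```

For the first 502.gcc_r workload, `icount_improvement_pct (189519277112, 188986824061)` comes out at roughly 0.28, which agrees with the table.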



mvconst_internal splitter gated with !@ira_in_progress (was Re: Yet Another IRA question)

2023-10-02 Thread Vineet Gupta




On 9/28/23 12:52, Vineet Gupta wrote:


On 9/28/23 05:53, Jeff Law wrote:
Vineet -- assuming Vlad's patch goes in, the other obvious candidate 
for this would be the mvconst_internal define_insn_and_split where 
we'd probably want to reject the insn as a whole once IRA has started. 


Good point, although currently we've kind of papered over it with 
-fsched-pressure, but I'm sure there are way more cases that this will 
improve still.
I will spin up a full multilib test with that, hopefully with no 
fallout :-)


I finally have the results. This is testsuite neutral: same results 
before and after.


               = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
   rv32imac/  ilp32/ medlow |  168 /    70 |   13 /     6 |   67 /    12 |
 rv32imafdc/ ilp32d/ medlow |  168 /    70 |   13 /     6 |   24 /     4 |
   rv64imac/   lp64/ medlow |  161 /    70 |    9 /     3 |   67 /    12 |
 rv64imafdc/  lp64d/ medlow |  160 /    69 |    5 /     2 |    6 /     1 |

But the SPEC runs are not affected at all, if anything it seems to be 
way under noise for 5 workloads.

Not sure if we still want to add the gate, your call

511.povray_r
< 3028749801452
---
> 3028749851039

521.wrf_r
< 3897783090826
---
> 3897783089025

523.xalancbmk_r
< 1062577057227
---
> 1062577053403

527.cam4_r
< 2141572063843
---
> 2141572063800

Thx,
-Vineet


Re: [PATCH] libiberty: Use posix_spawn in pex-unix when available.

2023-10-02 Thread Ian Lance Taylor
On Fri, Sep 29, 2023 at 12:18 PM Brendan Shanks  wrote:
>
> +  #define ERR_ON_FAILURE(ret, func) \
> +do { if (ret) { *err = ret; *errmsg = func; goto exit; } else {} } while (0)

Thanks, but please don't use a macro that changes control flow.

Ian
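
One possible way to address the comment above is to keep the bookkeeping in a helper function and leave the jump visible at each call site. The sketch below is only an illustration: `record_failure` and `run_steps_sketch` are hypothetical names, and the step names are placeholders rather than the calls made in pex-unix.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: if RET is nonzero, store it and the name of the
   failing function, and report failure.  It never alters control flow
   in the caller.  */
static bool
record_failure (int ret, const char *func, int *err, const char **errmsg)
{
  assert (err != NULL && errmsg != NULL);
  if (ret == 0)
    return false;
  *err = ret;
  *errmsg = func;
  return true;
}

/* Hypothetical caller: RET_A and RET_B stand in for the return values
   of two setup calls.  Every jump to the cleanup label is explicit.  */
static int
run_steps_sketch (int ret_a, int ret_b, int *err, const char **errmsg)
{
  int status = -1;

  if (record_failure (ret_a, "step_a", err, errmsg))
    goto done;
  if (record_failure (ret_b, "step_b", err, errmsg))
    goto done;
  status = 0;

done:
  return status;
}
```

This is slightly more verbose than the macro, but a reader can see every exit path without expanding anything.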


Re: [committed] Require target lra in gcc.dg/pr108095.c

2023-10-02 Thread Jeff Law




On 10/2/23 14:42, John David Anglin wrote:

Committed to trunk.

Dave
---

Require target lra in gcc.dg/pr108095.c

2023-10-02  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/pr108095.c: Require target lra.

Thanks.  I already had this in my local tree.

jeff


[PATCH] libcpp: testsuite: Add test for fixed _Pragma bug [PR82335]

2023-10-02 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82335 is another
_Pragma-related bug that got fixed in GCC 12 but is still open. Before
closing it out, I thought it would be good to add the testcase from that
PR, which we don't have exactly in the testsuite already. Is it OK please?
Thanks!

-Lewis

-- >8 --

This PR was fixed by r12-4797 and r12-5454. Add test coverage from the PR
that is not represented elsewhere.

gcc/testsuite/ChangeLog:

PR preprocessor/82335
* c-c++-common/cpp/diagnostic-pragma-3.c: New test.
---
 .../c-c++-common/cpp/diagnostic-pragma-3.c| 37 +++
 1 file changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c

diff --git a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c
new file mode 100644
index 000..459dcec73b3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-3.c
@@ -0,0 +1,37 @@
+/* This is like diagnostic-pragma-2.c, but handles the case where everything
+   is wrapped inside a macro, which previously caused additional issues tracked
+   in PR preprocessor/82335.  */
+
+/* { dg-do compile } */
+/* { dg-additional-options "-save-temps -Wattributes -Wtype-limits" } */
+
+#define B _Pragma("GCC diagnostic push") \
+  _Pragma("GCC diagnostic ignored \"-Wattributes\"")
+#define E _Pragma("GCC diagnostic pop")
+
+#define X() B int __attribute((unknown_attr)) x; E
+#define Y   B int __attribute((unknown_attr)) y; E
+#define WRAP(x) x
+
+void test1(void)
+{
+  WRAP(X())
+  WRAP(Y)
+}
+
+/* Additional test provided on the PR.  */
+#define PRAGMA(...) _Pragma(#__VA_ARGS__)
+#define PUSH_IGN(X) PRAGMA(GCC diagnostic push) PRAGMA(GCC diagnostic ignored X)
+#define POP() PRAGMA(GCC diagnostic pop)
+#define TEST(X, Y) \
+  PUSH_IGN("-Wtype-limits") \
+  int Y = (__typeof(X))-1 < 0; \
+  POP()
+
+int test2()
+{
+  unsigned x;
+  TEST(x, i1);
+  WRAP(TEST(x, i2))
+  return i1 + i2;
+}


Re: [PATCH] [11/12/13/14 Regression] ABI break in _Hash_node_value_base since GCC 11 [PR 111050]

2023-10-02 Thread François Dumont

Now backported to the gcc-11/12/13 branches.

On 29/09/2023 11:53, Jonathan Wakely wrote:

On Thu, 28 Sept 2023 at 18:25, François Dumont  wrote:


On 28/09/2023 18:18, Jonathan Wakely wrote:

On Wed, 27 Sept 2023 at 05:44, François Dumont  wrote:

Still no chance to get feedback from TC ? Maybe I can commit the below
then ?

I've heard back from Tim now. Please use "Tim Song
" as the author.

You can change the commit again using git commit --amend --author "Tim
Song "

Sure :-)


OK for trunk with that change - thanks for waiting.

Committed to trunk, let me know for backports.

Please wait a few days in case of problems on the trunk (although I
don't expect any) and then OK for 11/12/13 too - thanks!



AFAICS on gcc mailing list several gcc releases were done recently, too
late.

There have been no releases this month, so the delay hasn't caused any problems.

I was confused by emails like this one:

https://gcc.gnu.org/pipermail/gcc/2023-September/242429.html

I just subscribed to gcc mailing list, I had no idea there were regular
snapshots like this.



[committed] Add hppa*-*-* to dg-error targets at line 5 in gfortran.dg/pr95690.f90

2023-10-02 Thread John David Anglin
Committed to trunk.

Dave
---

Add hppa*-*-* to dg-error targets at line 5

2023-10-02  John David Anglin  

gcc/testsuite/ChangeLog:

* gfortran.dg/pr95690.f90: Add hppa*-*-* to dg-error targets at line 5.

diff --git a/gcc/testsuite/gfortran.dg/pr95690.f90 b/gcc/testsuite/gfortran.dg/pr95690.f90
index 47a5df9e894..1432937438a 100644
--- a/gcc/testsuite/gfortran.dg/pr95690.f90
+++ b/gcc/testsuite/gfortran.dg/pr95690.f90
@@ -2,8 +2,8 @@
 module m
 contains
subroutine s
-  print *, (erfc) ! { dg-error "not a floating constant" "" { target i?86-*-* x86_64-*-* sparc*-*-* cris-*-* } }
-   end ! { dg-error "not a floating constant" "" { target { ! "i?86-*-* x86_64-*-* sparc*-*-* cris-*-*" } } }
+  print *, (erfc) ! { dg-error "not a floating constant" "" { target i?86-*-* x86_64-*-* sparc*-*-* cris-*-* hppa*-*-* } }
+   end ! { dg-error "not a floating constant" "" { target { ! "i?86-*-* x86_64-*-* sparc*-*-* cris-*-* hppa*-*-*" } } }
function erfc()
end
 end




[committed] Require target lra in gcc.dg/pr108095.c

2023-10-02 Thread John David Anglin
Committed to trunk.

Dave
---

Require target lra in gcc.dg/pr108095.c

2023-10-02  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/pr108095.c: Require target lra.

diff --git a/gcc/testsuite/gcc.dg/pr108095.c b/gcc/testsuite/gcc.dg/pr108095.c
index fb76caae72e..0a487cf614a 100644
--- a/gcc/testsuite/gcc.dg/pr108095.c
+++ b/gcc/testsuite/gcc.dg/pr108095.c
@@ -1,5 +1,5 @@
 /* PR tree-optimization/108095 */
-/* { dg-do compile } */
+/* { dg-do compile { target lra } } */
 /* { dg-options "-Os -g" } */
 
 int v;




[committed] Increase timeout factor for hppa*-*-* in gcc.dg/long_branch.c

2023-10-02 Thread John David Anglin
Committed to trunk.

Dave
---

Increase timeout factor for hppa*-*-* in gcc.dg/long_branch.c

2023-10-02  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/long_branch.c: Increase timeout factor for hppa*-*-*.

diff --git a/gcc/testsuite/gcc.dg/long_branch.c b/gcc/testsuite/gcc.dg/long_branch.c
index c1ac24f5116..ba80ab3d15b 100644
--- a/gcc/testsuite/gcc.dg/long_branch.c
+++ b/gcc/testsuite/gcc.dg/long_branch.c
@@ -1,7 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -fno-reorder-blocks" } */
 /* { dg-skip-if "limited code space" { pdp11-*-* } } */
-/* { dg-timeout-factor 2.0 { target hppa*-*-* } } */
+/* { dg-timeout-factor 4.0 { target hppa*-*-* } } */
 
 void abort ();
 




[PATCH] c++: merge tsubst_copy into tsubst_copy_and_build

2023-10-02 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

The relationship between tsubst_copy_and_build and tsubst_copy (two of
the main template argument substitution routines for expression trees)
is rather hazy.  The former is mostly a superset of the latter, with
some differences.

The main difference is that they handle many tree codes differently, but
much of the tree code handling in tsubst_copy appears to be dead code[1].
This is because tsubst_copy only gets directly called in a few places
and mostly on id-expressions.  The interesting exceptions are PARM_DECL,
VAR_DECL, BIT_NOT_EXPR, SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE:

 * for PARM_DECL and VAR_DECL, tsubst_copy_and_build calls tsubst_copy
   followed by doing some extra handling of its own
 * for BIT_NOT_EXPR tsubst_copy implicitly handles unresolved destructor
   calls (i.e. the first operand is an identifier or a type)
 * for SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE tsubst_copy
   refrains from doing name lookup of the terminal name

Other more minor differences are that tsubst_copy exits early when
'args' is null, and it calls maybe_dependent_member_ref, and finally
it dispatches to tsubst for type trees.

Thus tsubst_copy is (at this point) similar enough to tsubst_copy_and_build
that it makes sense to merge the two functions, with the main difference
being the name lookup behavior[2].  So this patch merges tsubst_copy into
tsubst_copy_and_build via a new tsubst tf_no_name_lookup which controls
name lookup and resolution of a (top-level) id-expression.

[1]: http://thrifty.mooo.com:8008/gcc-lcov/gcc/cp/pt.cc.gcov.html#17231
[2]: I don't know the history of tsubst_copy but I would guess it was
added before we settled on using processing_template_decl to control
whether our AST building routines perform semantic checking and return
non-templated trees, and so we needed a separate tsubst routine that
avoids semantic checking and always returns a templated tree for e.g.
partial substitution.

gcc/cp/ChangeLog:

* cp-tree.h (enum tsubst_flags): Add tf_no_name_lookup.
* pt.cc (tsubst_copy):
(tsubst_pack_expansion): Use tsubst for substituting BASES_TYPE.
(tsubst_decl) : Use tsubst_copy_and_build with
tf_no_name_lookup instead of tsubst_copy.
(tsubst) : Use tsubst_copy_and_build
instead of tsubst_copy for substituting
CLASS_PLACEHOLDER_TEMPLATE.
: Use tsubst_copy_and_build with
tf_no_name_lookup instead of tsubst_copy for substituting
TYPENAME_TYPE_FULLNAME.
(tsubst_qualified_id): Likewise for substituting the component
name of a SCOPE_REF.
(tsubst_copy): Remove.
(tsubst_copy_and_build): Clear tf_no_name_lookup at the start,
and remember if it was set.  Call maybe_dependent_member_ref.
: Don't do name lookup if tf_no_name_lookup
was set.
: Don't finish a template-id if
tf_no_name_lookup was set.
: Handle identifier and type operand (if
tf_no_name_lookup was set).
: Avoid trying to resolve a SCOPE_REF if
tf_no_name_lookup by calling build_qualified_name directly
instead of tsubst_qualified_id.
: Handling of sizeof...  copied from tsubst_copy.
: Use tsubst_copy_and_build with
tf_no_name_lookup instead of tsubst_copy to substitute
a TEMPLATE_ID_EXPR callee naming an unresolved template.
: Likewise to substitute the member.
: Copied from tsubst_copy and merged with ...
: ... these.  Initial handling copied
from tsubst_copy.  Optimize local variable substitution by
trying retrieve_local_specialization before checking
uses_template_parms.
: Copied from tsubst_copy.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Likewise.
: Use tsubst and tsubst_copy_and_build instead
of tsubst_copy.
: Copied from tsubst_copy.
(tsubst_initializer_list): Use tsubst and tsubst_copy_and_build
instead of tsubst_copy.
---
 gcc/cp/cp-tree.h |3 +
 gcc/cp/pt.cc | 1742 +++---
 2 files changed, 719 insertions(+), 1026 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8b9a7d58462..919eab34803 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5619,6 +5619,9 @@ enum tsubst_flags {
   tf_qualifying_scope = 1 << 14, /* Substituting the LHS of the :: operator.
Affects TYPENAME_TYPE resolution from
make_typename_type.  */
+  tf_no_name_lookup = 1 << 15, /* Don't look up the terminal name of an
+ outermost id-expression, 

[pushed] contrib: Update Darwin entries in config-list.mk

2023-10-02 Thread Iain Sandoe
Although the script itself is very unlikely to be useful (it would assume
availability of both 'binutils' and a sysroot for each target) the list
is used by at least one vendor for some testing.  So I'd encourage other
port maintainers to make sure their entries are up to date!

I tested this on x86_64-linux with a script kindly shared by Martin and
it behaved as expected with an assortment of Darwin targets.  Pushed to
trunk, thanks.
Iain

--- 8< ---

This list was out of date, and included cases that are not well-supported
for cross-compilers.

This updates the list to bracket the range of OS versions we support and
to drop one earlier case where GCC will no longer build with native tools.

contrib/ChangeLog:

* config-list.mk: Add newer Darwin versions, trim one older.
Remove cases with no OS version, which is not supported for cross-
compilers.

Signed-off-by: Iain Sandoe 
---
 contrib/config-list.mk | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index e570b13c71b..c090be13147 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -29,7 +29,8 @@ GCC_SRC_DIR=../../gcc
 # > make.out 2>&1 &
 #
 
-LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu aarch64-rtems \
+LIST = \
+  aarch64-elf aarch64-freebsd13 aarch64-linux-gnu aarch64-rtems \
   alpha-linux-gnu alpha-netbsd alpha-openbsd \
   alpha64-dec-vms alpha-dec-vms \
   amdgcn-amdhsa \
@@ -47,11 +48,11 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu aarch64-rtems \
   hppa-linux-gnuOPT-enable-sjlj-exceptions=yes hppa64-linux-gnu \
   hppa64-hpux11.3 \
   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes \
-  i686-pc-linux-gnu i686-apple-darwin i686-apple-darwin9 i686-apple-darwin10 \
+  i686-apple-darwin9 i686-apple-darwin13 i686-apple-darwin17 \
   i686-freebsd13 i686-kfreebsd-gnu \
   i686-netbsdelf9 \
   i686-openbsd i686-elf i686-kopensolaris-gnu i686-gnu \
-  i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
+  i686-pc-linux-gnu i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
   i686-rtems i686-solaris2.11 i686-wrs-vxworks \
   i686-wrs-vxworksae \
   i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elf \
@@ -75,8 +76,8 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu aarch64-rtems \
   nvptx-none \
   or1k-elf or1k-linux-uclibc or1k-linux-musl or1k-rtems \
   pdp11-aout \
-  powerpc-darwin8 \
-  powerpc-darwin7 powerpc64-darwin powerpc-freebsd13 powerpc-netbsd \
+  powerpc-apple-darwin9 powerpc64-apple-darwin9 powerpc-apple-darwin8 \
+  powerpc-freebsd13 powerpc-netbsd \
   powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
   powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
   powerpc-rtems \
@@ -96,8 +97,9 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu aarch64-rtems \
   sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux \
   sparc64-netbsd sparc64-openbsd \
   v850e1-elf v850e-elf v850-elf v850-rtems vax-linux-gnu \
-  vax-netbsdelf visium-elf x86_64-apple-darwin x86_64-gnu \
-  x86_64-pc-linux-gnuOPT-with-fpmath=avx \
+  vax-netbsdelf visium-elf \
+  x86_64-apple-darwin10 x86_64-apple-darwin15 x86_64-apple-darwin21 \
+  x86_64-gnu x86_64-pc-linux-gnuOPT-with-fpmath=avx \
   x86_64-elfOPT-with-fpmath=sse x86_64-freebsd13 x86_64-netbsd \
   x86_64-w64-mingw32 \
   x86_64-mingw32OPT-enable-sjlj-exceptions=yes x86_64-rtems \
-- 
2.39.2 (Apple Git-143)



Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-02 Thread Jeff Law




On 10/2/23 12:03, David Edelsohn wrote:
On Mon, Oct 2, 2023 at 1:59 PM Jeff Law wrote:




On 10/2/23 11:20, David Edelsohn wrote:
 > Wang,
 >
 > The AWK portions of this patch broke bootstrap on AIX.
 >
 > Also, the AWK portions are common code, not RISC-V specific.  I
don't
 > see anywhere that the common portions of the patch were reviewed or
 > approved by anyone with authority to approve the changes to the
AWK files.
 >
 > This patch should not have been committed without approval by a
reviewer
 > with authority for that portion of the compiler and should have been
 > tested on targets other than RISC-V if common parts of the
compiler were
 > changed.
I acked the generic bits.  So the lack of testing on another target is
on me.


Hi, Jeff

Sorry. I didn't see a comment from a global reviewer in the V3 thread.

NP.



I am using Gawk on AIX.  After the change, I see a parse error from 
gawk.  I'm rebuilding with a checkout just before the change to confirm 
that it was the source of the error, and it seems to be past that 
failure location.  I didn't keep the exact error.  Once I get past this 
build cycle, I'll reproduce it.
I think there's already a patch circulating which fixes this.  It broke 
at least one other platform.  Hopefully it'll all be sorted out today.



jeff


Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-02 Thread David Edelsohn
On Mon, Oct 2, 2023 at 1:59 PM Jeff Law  wrote:

>
>
> On 10/2/23 11:20, David Edelsohn wrote:
> > Wang,
> >
> > The AWK portions of this patch broke bootstrap on AIX.
> >
> > Also, the AWK portions are common code, not RISC-V specific.  I don't
> > see anywhere that the common portions of the patch were reviewed or
> > approved by anyone with authority to approve the changes to the AWK
> files.
> >
> > This patch should not have been committed without approval by a reviewer
> > with authority for that portion of the compiler and should have been
> > tested on targets other than RISC-V if common parts of the compiler were
> > changed.
> I acked the generic bits.  So the lack of testing on another target is
> on me.
>

Hi, Jeff

Sorry. I didn't see a comment from a global reviewer in the V3 thread.

I am using Gawk on AIX.  After the change, I see a parse error from gawk.
I'm rebuilding with a checkout just before the change to confirm that it
was the source of the error, and it seems to be past that failure
location.  I didn't keep the exact error.  Once I get past this build
cycle, I'll reproduce it.

Thanks, David


Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-02 Thread Jeff Law




On 10/2/23 11:20, David Edelsohn wrote:

Wang,

The AWK portions of this patch broke bootstrap on AIX.

Also, the AWK portions are common code, not RISC-V specific.  I don't 
see anywhere that the common portions of the patch were reviewed or 
approved by anyone with authority to approve the changes to the AWK files.


This patch should not have been committed without approval by a reviewer 
with authority for that portion of the compiler and should have been 
tested on targets other than RISC-V if common parts of the compiler were 
changed.
I acked the generic bits.  So the lack of testing on another target is 
on me.


jeff


Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-02 Thread David Edelsohn
Wang,

The AWK portions of this patch broke bootstrap on AIX.

Also, the AWK portions are common code, not RISC-V specific.  I don't see
anywhere that the common portions of the patch were reviewed or approved by
anyone with authority to approve the changes to the AWK files.

This patch should not have been committed without approval by a reviewer
with authority for that portion of the compiler and should have been tested
on targets other than RISC-V if common parts of the compiler were changed.

Thanks, David


Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-10-02 Thread Wilco Dijkstra
Hi Ramana,

>> I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
>> --build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
>> default armhf settings are incorrect. I shouldn't need the --with-float=hard
>> since that is obviously implied by armhf, and they should also imply armv7-a
>> with vfpv3 according to documentation. It seems to get confused and skip some
>> tests. I tried using --with-fpu=auto, but that doesn't work at all, so in the
>> end I forced it like: --with-arch=armv8-a --with-fpu=neon-fp-armv8. With this
>> it runs a few more tests.
> 
> Yeah that's a wart that I don't like.
> 
> armhf just implies the hard float ABI and came into being to help
> distinguish from the Base PCS for some of the distros at the time
> (2010s). However we didn't want to set a baseline arch at that time
> given the imminent arrival of v8-a and thus the specification of
> --with-arch , --with-fpu and --with-float became second nature to many
> of us working on it at that time.

Looking at it, the default is indeed incorrect, you get:
'-mcpu=arm10e' '-mfloat-abi=hard' '-marm' '-march=armv5te+fp'

That's like 25 years out of date!

However all the armhf distros have Armv7-a as the baseline and use Thumb-2:
'-mfloat-abi=hard' '-mthumb' '-march=armv7-a+fp'

So the issue is that dg-require-effective-target arm_arch_v7a_ok doesn't work on
armhf. It seems that if you specify an architecture even with hard-float
configured, it turns off FP and then complains because hard-float implies you
must have FP...

So in most configurations (including the one used by distro compilers) we
basically skip lots of tests for no apparent reason...

> Ok, thanks for promising to do so - I trust you to get it done. Please
> try out various combinations of -march v7ve, v7-a , v8-a with the tool
> as each of them have slightly different rules. For instance v7ve
> allows LDREXD and STREXD to be single copy atomic for 64 bit loads
> whereas v7-a did not .

You mean LDRD may be generated on CPUs with LPAE. We use LDREXD by
default since that is always atomic on v7-a.

> Ok if no regressions but as you might get nagged by the post commit CI ...

Thanks, I've committed it. Those links don't show anything concrete, however I 
do note
the CI didn't pick up v2.

Btw you're happy with backports if there are no issues reported for a few days?

Cheers,
Wilco

Re: [PATCH] Fix coroutine tests for libstdc++ gnu-version-namespace mode

2023-10-02 Thread François Dumont

Hi

Gentle reminder for this minor patch.

Thanks

On 23/09/2023 22:10, François Dumont wrote:
I'm eventually fixing those tests the same way we manage this problem
in the libstdc++ testsuite.

    testsuite: Add optional libstdc++ version namespace in expected diagnostic

    When libstdc++ is built with --enable-symvers=gnu-versioned-namespace,
    diagnostics show this namespace, currently __8.

    gcc/testsuite/ChangeLog:

    * testsuite/g++.dg/coroutines/coro-bad-alloc-00-bad-op-new.C: Add optional
    '__8' version namespace in expected diagnostic.
    * testsuite/g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C: Likewise.
    * testsuite/g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C: Likewise.
    * testsuite/g++.dg/coroutines/coro-bad-grooaf-01-grooaf-expected.C: Likewise.
    * testsuite/g++.dg/coroutines/pr97438.C: Likewise.
    * testsuite/g++.dg/coroutines/ramp-return-b.C: Likewise.

Tested under Linux x86_64.

I'm contributing to libstdc++ so I already have write access.

Ok to commit ?

François


RE: [ARC PATCH] Use rlc r0, 0 to implement scc_ltu (i.e. carry_flag ? 1 : 0)

2023-10-02 Thread Claudiu Zissulescu
Hi Roger,

Everything is good. Ok for mainline.

Thank you for your contribution,
Claudiu

-Original Message-
From: Claudiu Zissulescu 
Sent: Sunday, October 1, 2023 5:33 PM
To: Jeff Law ; Roger Sayle 
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [ARC PATCH] Use rlc r0, 0 to implement scc_ltu (i.e. carry_flag ? 
1 : 0)

I'll add it to our nightly. Just to be sure 😊 I'll let you know its status
asap.

Roger, you can always use Synopsys' free nsim simulator, which you can find on
the Synopsys website.

Thanks,
Claudiu


-Original Message-
From: Jeff Law  
Sent: Saturday, September 30, 2023 1:02 AM
To: Roger Sayle ; Claudiu Zissulescu 

Cc: gcc-patches@gcc.gnu.org
Subject: Re: [ARC PATCH] Use rlc r0, 0 to implement scc_ltu (i.e. carry_flag ? 
1 : 0)



On 9/29/23 15:11, Roger Sayle wrote:
> 
> Hi Claudiu,
>> The patch looks sane. Have you run dejagnu test suite?
> 
> I've not yet managed to set up an emulator or compile the entire 
> toolchain, so my dejagnu results are only useful for catching 
> (serious) problems in the compile only tests:
> 
>  === gcc Summary ===
> 
> # of expected passes            91875
> # of unexpected failures        23768
> # of unexpected successes       23
> # of expected failures          1038
> # of unresolved testcases       19490
> # of unsupported tests          3819
> /home/roger/GCC/arc-linux/gcc/xgcc  version 14.0.0 20230828 (experimental) (GCC)
> 
> If someone could double check there are no issues on real hardware 
> that would be great.  I'm not sure if ARC is one of the targets 
> covered by Jeff Law's compile farm?
It is :-)  Runs daily, about 4:30 am UTC.  So if the bits go in we'd have data 
within 24hrs.


Jeff



[pushed] diagnostics: group together source printing fields of diagnostic_context

2023-10-02 Thread David Malcolm
struct diagnostic_context has > 60 fields.

Try to tame some of the complexity by grouping together the 8
source-printing fields into a struct, the "m_source_printing" field.

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-4367-gc5c565eff6277a.
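
For readers skimming the ChangeLog below, the shape of the change can be
sketched roughly as follows.  This is only an illustration: the member names
under m_source_printing are taken from the ChangeLog, while the struct name
diagnostic_context_sketch, the member types, the default values, and the two
helper functions are assumptions and not GCC's actual declarations.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for GCC's diagnostic_context.  Only the
// member names inside m_source_printing come from the ChangeLog; the types
// and defaults are assumptions made for this sketch.
struct diagnostic_context_sketch
{
  struct source_printing_options
  {
    bool enabled = true;                     // was show_caret
    int max_width = 0;                       // was caret_max_width
    char caret_chars[3] = {'^', '^', '^'};   // array size is assumed
    bool colorize_source_p = false;
    bool show_labels_p = false;
    bool show_line_numbers_p = false;
    int min_margin_width = 0;
    bool show_ruler_p = false;
  } m_source_printing;
};

// Mirrors the shape of the Ada hunk: unless the user set the option
// explicitly, switch source printing off.
inline void apply_no_caret_default (diagnostic_context_sketch &dc,
                                    bool option_set_by_user)
{
  if (!option_set_by_user)
    dc.m_source_printing.enabled = false;
}

// Small usage example: the default is on, and the helper turns it off.
inline bool demo_no_caret_default ()
{
  diagnostic_context_sketch dc;
  assert (dc.m_source_printing.enabled);
  apply_no_caret_default (dc, /*option_set_by_user=*/false);
  return !dc.m_source_printing.enabled;
}
```

Call sites then change from dc->show_caret to dc->m_source_printing.enabled,
as in the gcc-interface/misc.cc hunk.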

gcc/ada/ChangeLog:
* gcc-interface/misc.cc (gnat_post_options): Update for renaming
of diagnostic_context's show_caret to m_source_printing.enabled.

gcc/analyzer/ChangeLog:
* program-point.cc: Update for grouping of source printing fields
within diagnostic_context.

gcc/c-family/ChangeLog:
* c-common.cc (maybe_add_include_fixit): Update for renaming of
diagnostic_context's show_caret to m_source_printing.enabled.
* c-opts.cc (c_common_init_options): Update for renaming of
diagnostic_context's colorize_source_p to
m_source_printing.colorize_source_p.

gcc/ChangeLog:
* diagnostic-show-locus.cc: Update for reorganization of
source-printing fields of diagnostic_context.
* diagnostic.cc (diagnostic_set_caret_max_width): Likewise.
(diagnostic_initialize): Likewise.
* diagnostic.h (diagnostic_context::show_caret): Move to...
(diagnostic_context::m_source_printing::enabled): ...here.
(diagnostic_context::caret_max_width): Move to...
(diagnostic_context::m_source_printing::max_width): ...here.
(diagnostic_context::caret_chars): Move to...
(diagnostic_context::m_source_printing::caret_chars): ...here.
(diagnostic_context::colorize_source_p): Move to...
(diagnostic_context::m_source_printing::colorize_source_p): ...here.
(diagnostic_context::show_labels_p): Move to...
(diagnostic_context::m_source_printing::show_labels_p): ...here.
(diagnostic_context::show_line_numbers_p): Move to...
(diagnostic_context::m_source_printing::show_line_numbers_p): ...here.
(diagnostic_context::min_margin_width): Move to...
(diagnostic_context::m_source_printing::min_margin_width): ...here.
(diagnostic_context::show_ruler_p): Move to...
(diagnostic_context::m_source_printing::show_ruler_p): ...here.
(diagnostic_same_line): Update for above changes.
* opts.cc (common_handle_option): Update for reorganization of
source-printing fields of diagnostic_context.
* selftest-diagnostic.cc
(test_diagnostic_context::test_diagnostic_context): Likewise.
* toplev.cc (general_init): Likewise.
* tree-diagnostic-path.cc (struct event_range): Likewise.

gcc/fortran/ChangeLog:
* error.cc (gfc_diagnostic_starter): Update for reorganization of
source-printing fields of diagnostic_context.
(gfc_diagnostics_init): Likewise.
(gfc_diagnostics_finish): Likewise.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_show_trees.c: Update for
reorganization of source-printing fields of diagnostic_context.
* gcc.dg/plugin/diagnostic_plugin_test_inlining.c: Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_paths.c: Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c: Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c:
Likewise.
---
 gcc/ada/gcc-interface/misc.cc |  2 +-
 gcc/analyzer/program-point.cc |  4 +-
 gcc/c-family/c-common.cc  |  2 +-
 gcc/c-family/c-opts.cc|  2 +-
 gcc/diagnostic-show-locus.cc  | 93 ++-
 gcc/diagnostic.cc | 16 ++--
 gcc/diagnostic.h  | 73 ---
 gcc/fortran/error.cc  | 10 +-
 gcc/opts.cc   |  8 +-
 gcc/selftest-diagnostic.cc|  8 +-
 .../plugin/diagnostic_plugin_show_trees.c |  2 +-
 .../plugin/diagnostic_plugin_test_inlining.c  |  2 +-
 .../plugin/diagnostic_plugin_test_paths.c |  2 +-
 .../diagnostic_plugin_test_show_locus.c   | 26 +++---
 .../diagnostic_plugin_test_string_literals.c  |  2 +-
 ...nostic_plugin_test_tree_expression_range.c |  2 +-
 gcc/toplev.cc |  8 +-
 gcc/tree-diagnostic-path.cc   |  2 +-
 18 files changed, 140 insertions(+), 124 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index 3b21bf5b43a..269c15e4b0d 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -269,7 +269,7 @@ gnat_post_options (const char **pfilename ATTRIBUTE_UNUSED)
 
   /* No caret by default for Ada.  */
   if (!OPTION_SET_P (flag_diagnostics_show_caret))
-global_dc->show_caret = false;
+global_dc->m_source_printing.enabled = false;
 
   /* Copy global settings to local versions.

[pushed] diagnostics: add diagnostic_output_format class

2023-10-02 Thread David Malcolm
Eliminate various global variables in the json/sarif output code by
bundling together callbacks and state into a new diagnostic_output_format
class, with per-output-format subclasses.

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-4368-g140820265d96b0.
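
As a rough illustration of the pattern (not the committed code), the
callbacks-plus-globals arrangement becomes an abstract base class with one
subclass per output format.  The hook names on_begin_group, on_end_group,
on_begin_diagnostic and on_end_diagnostic follow the ChangeLog; the class
names ending in _sketch, the simplified string arguments, and the emit_one
helper are assumptions for this sketch only.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical simplification: the hook names follow the ChangeLog, but the
// real hooks take GCC-specific arguments that are omitted here.
class output_format_sketch
{
public:
  // The destructor takes over the role of a "final" callback.
  virtual ~output_format_sketch () = default;
  virtual void on_begin_group () = 0;
  virtual void on_end_group () = 0;
  virtual void on_begin_diagnostic (const std::string &msg) = 0;
  virtual void on_end_diagnostic (const std::string &msg) = 0;
};

// State that used to live in global variables becomes a data member of the
// per-format subclass.
class json_like_format_sketch : public output_format_sketch
{
public:
  void on_begin_group () override { m_out += "["; }
  void on_end_group () override { m_out += "]"; }
  void on_begin_diagnostic (const std::string &) override {}
  void on_end_diagnostic (const std::string &msg) override
  {
    m_out += "\"" + msg + "\"";
  }
  const std::string &output () const { return m_out; }

private:
  std::string m_out;  // per-instance, not a global
};

// Drive one diagnostic through the hooks and return what was collected.
inline std::string emit_one (const std::string &msg)
{
  auto fmt = std::make_unique<json_like_format_sketch> ();
  fmt->on_begin_group ();
  fmt->on_begin_diagnostic (msg);
  fmt->on_end_diagnostic (msg);
  fmt->on_end_group ();
  assert (!fmt->output ().empty ());
  return fmt->output ();
}
```

The point of the design is that each format owns its state, so nothing
format-specific has to be reachable through file-scope variables.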

gcc/ChangeLog:
* diagnostic-format-json.cc (toplevel_array): Remove global in
favor of json_output_format::m_top_level_array.
(cur_group): Likewise, for json_output_format::m_cur_group.
(cur_children_array): Likewise, for
json_output_format::m_cur_children_array.
(class json_output_format): New.
(json_begin_diagnostic): Remove, in favor of
json_output_format::on_begin_diagnostic.
(json_end_diagnostic): Convert to...
(json_output_format::on_end_diagnostic): ...this.
(json_begin_group): Remove, in favor of
json_output_format::on_begin_group.
(json_end_group): Remove, in favor of
json_output_format::on_end_group.
(json_flush_to_file): Remove, in favor of
json_output_format::flush_to_file.
(json_stderr_final_cb): Remove, in favor of json_output_format
dtor.
(json_output_base_file_name): Remove global.
(class json_stderr_output_format): New.
(json_file_final_cb): Remove.
(class json_file_output_format): New.
(json_emit_diagram): Remove.
(diagnostic_output_format_init_json): Update.
(diagnostic_output_format_init_json_file): Update.
* diagnostic-format-sarif.cc (the_builder): Remove this global,
moving to a field of the sarif_output_format.
(sarif_builder::maybe_make_artifact_content_object): Use the
context's m_file_cache.
(get_source_lines): Convert to...
(sarif_builder::get_source_lines): ...this, using context's
m_file_cache.
(sarif_begin_diagnostic): Remove, in favor of
sarif_output_format::on_begin_diagnostic.
(sarif_end_diagnostic): Remove, in favor of
sarif_output_format::on_end_diagnostic.
(sarif_begin_group): Remove, in favor of
sarif_output_format::on_begin_group.
(sarif_end_group): Remove, in favor of
sarif_output_format::on_end_group.
(sarif_flush_to_file): Delete.
(sarif_stderr_final_cb): Delete.
(sarif_output_base_file_name): Delete.
(sarif_file_final_cb): Delete.
(class sarif_output_format): New.
(sarif_emit_diagram): Delete.
(class sarif_stream_output_format): New.
(class sarif_file_output_format): New.
(diagnostic_output_format_init_sarif): Update.
(diagnostic_output_format_init_sarif_stderr): Update.
(diagnostic_output_format_init_sarif_file): Update.
(diagnostic_output_format_init_sarif_stream): Update.
* diagnostic-show-locus.cc (diagnostic_show_locus): Update.
* diagnostic.cc (default_diagnostic_final_cb): Delete, moving to
diagnostic_text_output_format's dtor.
(diagnostic_initialize): Update, making a new instance of
diagnostic_text_output_format.
(diagnostic_finish): Delete m_output_format, rather than calling
final_cb.
(diagnostic_report_diagnostic): Assert that m_output_format is
non-NULL.  Replace call to begin_group_cb with call to
m_output_format->on_begin_group.  Replace call to
diagnostic_starter with call to
m_output_format->on_begin_diagnostic.  Replace call to
diagnostic_finalizer with call to
m_output_format->on_end_diagnostic.
(diagnostic_emit_diagram): Replace both optional call to
m_diagrams.m_emission_cb and default implementation with call to
m_output_format->on_diagram.  Move default implementation to
diagnostic_text_output_format::on_diagram.
(auto_diagnostic_group::~auto_diagnostic_group): Replace call to
end_group_cb with call to m_output_format->on_end_group.
(diagnostic_text_output_format::~diagnostic_text_output_format):
New, based on default_diagnostic_final_cb.
(diagnostic_text_output_format::on_begin_diagnostic): New, based
on code from diagnostic_report_diagnostic.
(diagnostic_text_output_format::on_end_diagnostic): Likewise.
(diagnostic_text_output_format::on_diagram): New, based on code
from diagnostic_emit_diagram.
* diagnostic.h (class diagnostic_output_format): New.
(class diagnostic_text_output_format): New.
(diagnostic_context::begin_diagnostic): Move to...
(diagnostic_context::m_text_callbacks::begin_diagnostic): ...here.
(diagnostic_context::start_span): Move to...
(diagnostic_context::m_text_callbacks::start_span): ...here.
(diagnostic_context::end_diagnostic): Move to...
(diagnostic_context::m_text_callbacks::end_diagnostic): ...here.
(diagn

[pushed] diagnostics: fix missing init of set_locations_cb

2023-10-02 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-4366-gc64693fb885f21.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): Initialize
set_locations_cb to nullptr.
---
 gcc/diagnostic.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 00183b10700..28ab74ff23e 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -245,6 +245,7 @@ diagnostic_initialize (diagnostic_context *context, int 
n_opts)
   context->begin_group_cb = NULL;
   context->end_group_cb = NULL;
   context->final_cb = default_diagnostic_final_cb;
+  context->set_locations_cb = nullptr;
   context->ice_handler_cb = NULL;
   context->includes_seen = NULL;
   context->m_client_data_hooks = NULL;
-- 
2.26.3



Re: [PATCH] RFC: Add late-combine pass [PR106594]

2023-10-02 Thread Robin Dapp
Hi Richard,

cool, thanks.  I just gave it a try with my test cases and it does what
it is supposed to do, at least if I disable the register pressure check :)
A cursory look over the test suite showed no major regressions and just
some overly specific tests.

My test case only works before split, though, as the UNSPEC predicates will
prevent further combination afterwards.

Right now the (pre-RA) code combines every instance disregarding the actual
pressure and just checking if the "new" value does not occupy more registers
than the old one.

- Shouldn't the "pressure" also depend on the number of available hard regs
(i.e. an nregs = 2 is not necessarily worse than nregs = 1 if we have 32
hard regs in the new class vs 16 in the old one)?

- I assume/hope you expected my (now obsolete) fwprop change could be re-used?
Otherwise we wouldn't want to unconditionally "propagate" into a loop for 
example?
For my test case the combination of the vec_duplicate into all insns leads
to "high" register pressure that we could avoid.
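
To make the first question concrete, here is a small sketch of what a
"relative" check could look like.  It is purely hypothetical: the function
name relative_pressure_not_worse and its parameters are invented for this
example and nothing like it is claimed to exist in the posted pass.

```cpp
#include <cassert>

// Hypothetical helper, not part of the posted pass.  Returns true if using
// NEW_NREGS out of NEW_AVAIL hard registers is proportionally no worse than
// using OLD_NREGS out of OLD_AVAIL.
inline bool relative_pressure_not_worse (unsigned old_nregs, unsigned old_avail,
                                         unsigned new_nregs, unsigned new_avail)
{
  assert (old_avail != 0 && new_avail != 0);
  // Compare new_nregs / new_avail <= old_nregs / old_avail by
  // cross-multiplying, which avoids division and rounding.
  return (unsigned long long) new_nregs * old_avail
	 <= (unsigned long long) old_nregs * new_avail;
}
```

With the numbers from the question, nregs = 2 in a class of 32 hard regs
compares equal to nregs = 1 in a class of 16, so the combination would not be
rejected on that basis alone.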

How should we continue here?  I suppose you'll first want to get this version
to the trunk before complicating it further.

Regards
 Robin


Re: [PATCH 1/5] OpenMP, NVPTX: memcpy[23]D bias correction

2023-10-02 Thread Julian Brown
On Wed, 27 Sep 2023 00:57:58 +0200
Thomas Schwinge  wrote:

> On 2023-09-06T02:34:30-0700, Julian Brown 
> wrote:
> > This patch works around behaviour of the 2D and 3D memcpy
> > operations in the CUDA driver runtime.  Particularly in Fortran,
> > the "base pointer" of an array (used for either source or
> > destination of a host/device copy) may lie outside of data that is
> > actually stored on the device.  The fix is to make sure that we use
> > the first element of data to be transferred instead, and adjust
> > parameters accordingly.  
> 
> Do you (a) have a stand-alone test case for this (that is, not
> depending on your other pending patches, so that this could go in
> directly -- together with the before-FAIL test case).

Thanks for the reply! Here's a version with a stand-alone test case.

> Do you (b)
> know if is this a bug in our use of the CUDA Driver API or rather in
> CUDA itself?  If the latter, have you reported this to Nvidia?

I don't think the CUDA behaviour is *wrong*, as such -- at least to the
C/C++ way of thinking (or indeed a graphics-oriented way of thinking),
one would normally think of an array as having a zero-based origin, and
these 2D/3D memory copies would be intended as a way of updating just a
part of an array (or texture) that has full duplicate copies on both
the host and device.  Our use-case just happens to be a bit different,
both because Fortran (internally) represents an array by a zero-based
origin but may use 1-based (or whatever-based) indices, and because we
support partial mappings of host arrays on the device in all three
supported languages -- which amounts to much the same thing, actually.

That said, it *could* be fixed in CUDA, though probably not in all the
versions currently deployed out there in the world.  So I guess we'd
still need a patch like this anyway.
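
The adjustment itself can be sketched in isolation like this.  It is a
minimal illustration for the host-source case only, assuming a made-up
struct copy2d_src_sketch that borrows the field names srcHost, srcXInBytes,
srcY and srcPitch from the patch; it does not use the CUDA headers and the
helper name fold_src_offset is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the source half of the 2D copy descriptor; the
// field names follow the patch, the struct itself is illustrative only.
struct copy2d_src_sketch
{
  const void *srcHost;
  std::size_t srcXInBytes;
  std::size_t srcY;
  std::size_t srcPitch;
};

// Fold the (x, y) offset into the base pointer, so that the pointer handed
// to the copy routine refers to the first element actually transferred.
inline void fold_src_offset (copy2d_src_sketch &d)
{
  assert (d.srcHost != nullptr);
  if (d.srcXInBytes == 0 && d.srcY == 0)
    return;
  d.srcHost = static_cast<const char *> (d.srcHost)
	      + d.srcY * d.srcPitch + d.srcXInBytes;
  d.srcXInBytes = 0;
  d.srcY = 0;
}
```

The same region is copied before and after the adjustment; only the way the
start of the region is expressed changes.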

Julian
commit f6fd3ad060bbe5c57661cd861d009dbc2b415778
Author: Julian Brown 
Date:   Wed Aug 23 23:46:29 2023 +

OpenMP, NVPTX: memcpy[23]D bias correction

This patch works around behaviour of the 2D and 3D memcpy operations in
the CUDA driver runtime.  Particularly in Fortran, the "base pointer"
of an array (used for either source or destination of a host/device copy)
may lie outside of data that is actually stored on the device.  The fix
is to make sure that we use the first element of data to be transferred
instead, and adjust parameters accordingly.

2023-10-02  Julian Brown  

libgomp/
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to
avoid out-of-bounds array checks in CUDA runtime.
(GOMP_OFFLOAD_memcpy3d): Likewise.
* testsuite/libgomp.c-c++-common/memcpyxd-bias-1.c: New test.

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 00d4241ae02..cefe288a8aa 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1827,6 +1827,35 @@ GOMP_OFFLOAD_memcpy2d (int dst_ord, int src_ord, size_t dim1_size,
   data.srcXInBytes = src_offset1_size;
   data.srcY = src_offset0_len;
 
+  if (data.srcXInBytes != 0 || data.srcY != 0)
+{
+  /* Adjust origin to the actual array data, else the CUDA 2D memory
+	 copy API calls below may fail to validate source/dest pointers
+	 correctly (especially for Fortran where the "virtual origin" of an
+	 array is often outside the stored data).  */
+  if (src_ord == -1)
+	data.srcHost = (const void *) ((const char *) data.srcHost
+  + data.srcY * data.srcPitch
+  + data.srcXInBytes);
+  else
+	data.srcDevice += data.srcY * data.srcPitch + data.srcXInBytes;
+  data.srcXInBytes = 0;
+  data.srcY = 0;
+}
+
+  if (data.dstXInBytes != 0 || data.dstY != 0)
+{
+  /* As above.  */
+  if (dst_ord == -1)
+	data.dstHost = (void *) ((char *) data.dstHost
+ + data.dstY * data.dstPitch
+ + data.dstXInBytes);
+  else
+	data.dstDevice += data.dstY * data.dstPitch + data.dstXInBytes;
+  data.dstXInBytes = 0;
+  data.dstY = 0;
+}
+
   CUresult res = CUDA_CALL_NOCHECK (cuMemcpy2D, &data);
   if (res == CUDA_ERROR_INVALID_VALUE)
 /* If pitch > CU_DEVICE_ATTRIBUTE_MAX_PITCH or for device-to-device
@@ -1895,6 +1924,44 @@ GOMP_OFFLOAD_memcpy3d (int dst_ord, int src_ord, size_t dim2_size,
   data.srcY = src_offset1_len;
   data.srcZ = src_offset0_len;
 
+  if (data.srcXInBytes != 0 || data.srcY != 0 || data.srcZ != 0)
+{
+  /* Adjust origin to the actual array data, else the CUDA 3D memory
+	 copy API call below may fail to validate source/dest pointers
+	 correctly (especially for Fortran where the "virtual origin" of an
+	 array is often outside the stored data).  */
+  if (src_ord == -1)
+	data.srcHost
+	  = (const void *) ((const char *) data.srcHost
+			+ (data.srcZ * data.srcHeight + data.srcY)
+			  * data.srcPitch
+			+ data.srcXInBytes);
+  else
+	data.srcDevice
+	  += (data.srcZ * data.srcHeight + data.s

Re: [PATCH v2] RISC-V: Implement TLS Descriptors.

2023-10-02 Thread Kito Cheng
Just one nit and one more comment for doc:

Could you add some documentation, something like the following? I mostly
grabbed it from other targets, so you can just include it in the patch.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 31f2234640f..39396668da2 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1174,6 +1174,9 @@ Specify the default TLS dialect, for systems
were there is a choice.
For ARM targets, possible values for @var{dialect} are @code{gnu} or
@code{gnu2}, which select between the original GNU dialect and the GNU TLS
descriptor-based dialect.
+For RISC-V targets, possible values for @var{dialect} are @code{trad} or
+@code{desc}, which select between the traditional GNU dialect and the GNU TLS
+descriptor-based dialect.

@item --enable-multiarch
Specify whether to enable or disable multiarch support.  The default is
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4085fc90907..459e266d426 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1239,7 +1239,8 @@ See RS/6000 and PowerPC Options.
-minline-atomics  -mno-inline-atomics
-minline-strlen  -mno-inline-strlen
-minline-strcmp  -mno-inline-strcmp
--minline-strncmp  -mno-inline-strncmp}
+-minline-strncmp  -mno-inline-strncmp
+-mtls-dialect=desc  -mtls-dialect=trad}

@emph{RL78 Options}
@gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
@@ -29538,6 +29539,17 @@ which register to use as base register for
reading the canary,
and from what offset from that base register. There is no default
register or offset as this is entirely for use within the Linux
kernel.
+
+@opindex mtls-dialect=desc
+@item -mtls-dialect=desc
+Use TLS descriptors as the thread-local storage mechanism for dynamic accesses
+of TLS variables.  This is the default.
+
+@opindex mtls-dialect=trad
+@item -mtls-dialect=trad
+Use traditional TLS as the thread-local storage mechanism for dynamic accesses
+of TLS variables.
+
@end table

@node RL78 Options




> +(define_insn "@tlsdesc"
> +  [(set (reg:P A0_REGNUM)
> +   (unspec:P
> +   [(match_operand:P 0 "symbolic_operand" "")
> +(match_operand:P 1 "const_int_operand")]
> +   UNSPEC_TLSDESC))
> +   (clobber (reg:SI T0_REGNUM))]

P rather than SI here.

> +  "TARGET_TLSDESC"
> +  {
> +return ".LT%1: auipc\ta0, %%tlsdesc_hi(%0)\;"
> +   "\tt0,%%tlsdesc_load_lo(.LT%1)(a0)\;"
> +   "addi\ta0,a0,%%tlsdesc_add_lo(.LT%1)\;"
> +   "jalr\tt0,t0,%%tlsdesc_call(.LT%1)";
> +  }
> +  [(set_attr "type" "multi")
> +   (set_attr "length" "16")
> +   (set_attr "mode" "")])
> +
>  (define_insn "auipc"
>[(set (match_operand:P   0 "register_operand" "=r")
> (unspec:P


Re: [PATCH] RISC-V: Use stdint-gcc.h in rvv testsuite

2023-10-02 Thread Kito Cheng
> > On Tue, Sep 26, 2023 at 10:59 AM Patrick O'Neill  
> > wrote:
> >> stdint.h can be replaced with stdint-gcc.h to resolve some missing
> >> system headers in non-multilib installations.
> >>
> >> Tested using glibc rv32gcv and rv64gcv on r14-4258-gc9837443075.
> >>
> >> gcc/ChangeLog:
> >>
> >>  * config/riscv/riscv_vector.h (__RISCV_VECTOR_H): Replace
> >>  stdint.h with stdint-gcc.h
> > I don't think this will work when testing an installed compiler which I do.
> >
> > Thanks,
> > Andrew
> In the riscv target testsuite (gcc.target/riscv) all occurrences of
> #include  are currently constrained to the rvv/ subdirectory.
> All non-vector tests use #include  rather than
> #include . Have you encountered any issues when testing
> installations with non-vector tests?

I think the concern is replacing stdint.h with stdint-gcc.h in riscv_vector.h:
that means users MAY end up including both stdint-gcc.h *AND* stdint.h. The
latter is generally provided by libc, while stdint-gcc.h is typically not
included directly.

Other than the changes in "riscv_vector.h", everything else looks fine to me.


RE: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-02 Thread Tamar Christina
> -Original Message-
> From: Jakub Jelinek 
> Sent: Monday, October 2, 2023 2:21 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; jwak...@redhat.com
> Subject: Re: [PATCH]middle-end: Recursively check
> is_trivially_copyable_or_pair in vec.h
> 
> On Mon, Oct 02, 2023 at 01:38:53PM +0100, Tamar Christina wrote:
> > Hi All,
> >
> > I recently committed a patch that uses a nested std::pair in the second
> argument.
> > It temporarily adds a second ranking variable for sorting and then later 
> > drops
> it.
> >
> > This hits the newly added assert in vec.h.  This assert made some
> > relaxation for std::pair but doesn't allow this case through.  The
> > patch allows a recursive std::pair in the second argument which fixes
> bootstrap.
> 
> I must say I still don't understand why using a struct ifcvt_arg_entry { tree 
> arg;
> unsigned len, occur; }; with comments describing what the members mean
> wouldn't be a better fix, in the sorting function what exactly means
> x{1,2}.second.first and x{1,2}.second.second isn't easily understandable,
> neither from the identifiers nor from any comments.
> Seems because you use 2 separate vectors, one with just tree elements and
> another with those tree elements + 2 unsigned values cached from it for the
> sorting purpose and then rewrite the original tree vector after sorting, I 
> don't
> really see why nested std::pair would be a better match for it than a named
> structure.  Furthermore, why populate args first, then compute the extra 2
> integers in another loop pushing to argsKV and finally overwrite args with
> sorted values?  Can't the first loop push tree with the 2 integers already?  
> And
> what is the point of not using this structure later on when both args and
> argsKV vectors are live until the end of the same function?
> Can't you either pass that argsKV to others, having just one vector, or at 
> least
> release the other vector when you don't really need it?
> Formatting style, swap? arg1 : arg0 isn't correctly formatted, missing space
> before ?.
> 
> Also, ArgEntry is CamelCase which we (usually) don't use in GCC and
> additionally doesn't seem to be unique enough for ODR purposes.
> Ditto argsKV.

Ok, since these will take a lot longer to test and do I've reverted the patch
for now since bootstrap was broken.

Thanks,
Tamar

> 
> > It should also still maintain the invariant that was being tested here
> > since the nested arguments should still be trivially copyable.
> >
> > Bootstrapped on aarch64-none-linux-gnu, x86_64-linux-gnu, and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > vec.h (struct is_trivially_copyable_or_pair): Check recursively in
> 
> Missing "* " above.
> 
> > second arg.
> 
>   Jakub



Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-02 Thread Jakub Jelinek
On Mon, Oct 02, 2023 at 01:38:53PM +0100, Tamar Christina wrote:
> Hi All,
> 
> I recently committed a patch that uses a nested std::pair in the second 
> argument.
> It temporarily adds a second ranking variable for sorting and then later 
> drops it.
> 
> This hits the newly added assert in vec.h.  This assert made some relaxation 
> for
> std::pair but doesn't allow this case through.  The patch allows a recursive
> std::pair in the second argument which fixes bootstrap.

I must say I still don't understand why using a
struct ifcvt_arg_entry { tree arg; unsigned len, occur; };
with comments describing what the members mean wouldn't be a better fix,
in the sorting function what exactly means x{1,2}.second.first and
x{1,2}.second.second isn't easily understandable, neither from the
identifiers nor from any comments.
Seems because you use 2 separate vectors, one with just tree elements and
another with those tree elements + 2 unsigned values cached from it for the
sorting purpose and then rewrite the original tree vector after sorting, I
don't really see why nested std::pair would be a better match for it than
a named structure.  Furthermore, why populate args first, then compute
the extra 2 integers in another loop pushing to argsKV and finally overwrite
args with sorted values?  Can't the first loop push tree with the 2 integers
already?  And what is the point of not using this structure later on when
both args and argsKV vectors are live until the end of the same function?
Can't you either pass that argsKV to others, having just one vector, or
at least release the other vector when you don't really need it?
Formatting style, swap? arg1 : arg0 isn't correctly formatted, missing space
before ?.

Also, ArgEntry is CamelCase which we (usually) don't use in GCC and
additionally doesn't seem to be unique enough for ODR purposes.
Ditto argsKV.
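
A rough sketch of the named-struct alternative, to show how the comparison
reads compared with x.second.first and x.second.second.  The struct layout is
the one quoted above; everything else is an assumption for illustration:
tree_sketch stands in for GCC's tree, the comments on len and occur and the
sort order are guesses, and entry_less and sort_entries are hypothetical
names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for GCC's `tree`, which is not available here.
using tree_sketch = const char *;

// Named entry as suggested in the review.  The meanings given in the
// comments are assumptions; the thread does not spell them out.
struct ifcvt_arg_entry
{
  tree_sketch arg;  // the argument itself
  unsigned len;     // assumed: primary sort key, smaller first
  unsigned occur;   // assumed: tie-breaker, smaller first
};

// Comparison written against named members instead of nested pairs.
inline bool entry_less (const ifcvt_arg_entry &a, const ifcvt_arg_entry &b)
{
  if (a.len != b.len)
    return a.len < b.len;
  return a.occur < b.occur;
}

// Sort a single vector of entries in place; no second vector is needed.
inline void sort_entries (std::vector<ifcvt_arg_entry> &entries)
{
  std::stable_sort (entries.begin (), entries.end (), entry_less);
  assert (std::is_sorted (entries.begin (), entries.end (), entry_less));
}
```

A plain aggregate like this is trivially copyable, so it would presumably
also pass the vec.h assert without any change to the trait.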

> It should also still maintain the invariant that was being tested here since
> the nested arguments should still be trivially copyable.
> 
> Bootstrapped on aarch64-none-linux-gnu, x86_64-linux-gnu, and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   vec.h (struct is_trivially_copyable_or_pair): Check recursively in

Missing "* " above.

>   second arg.

Jakub



[PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-02 Thread Tamar Christina
Hi All,

I recently committed a patch that uses a nested std::pair in the second 
argument.
It temporarily adds a second ranking variable for sorting and then later drops 
it.

This hits the newly added assert in vec.h.  This assert made some relaxation for
std::pair but doesn't allow this case through.  The patch allows a recursive
std::pair in the second argument which fixes bootstrap.

It should also still maintain the invariant that was being tested here since
the nested arguments should still be trivially copyable.
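
A self-contained sketch of the recursive check, for illustration only.  It
lives in a hypothetical namespace sketch rather than vec_detail, and the
primary template shown here is an assumption about how the base case is
expressed.

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace sketch
{
  // Base case: defer to the standard trait.
  template<typename T>
  struct is_trivially_copyable_or_pair : std::is_trivially_copyable<T> { };

  // Pair case: the first member must be trivially copyable; the second is
  // checked recursively, so pair<A, pair<B, C> > is accepted.
  template<typename T, typename U>
  struct is_trivially_copyable_or_pair<std::pair<T, U> >
  : std::integral_constant<bool, std::is_trivially_copyable<T>::value
      && is_trivially_copyable_or_pair<U>::value> { };
}

static_assert (sketch::is_trivially_copyable_or_pair<
		 std::pair<int, std::pair<unsigned, unsigned> > >::value,
	       "a nested pair in the second argument is accepted");
static_assert (!sketch::is_trivially_copyable_or_pair<
		 std::pair<int, std::string> >::value,
	       "a non-trivially-copyable second member is still rejected");
```

The second static_assert illustrates the invariant: recursion only relaxes
the check for pairs, and the leaves are still tested with
std::is_trivially_copyable.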

Bootstrapped on aarch64-none-linux-gnu, x86_64-linux-gnu, and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

vec.h (struct is_trivially_copyable_or_pair): Check recursively in
second arg.

--- inline copy of patch -- 
diff --git a/gcc/vec.h b/gcc/vec.h
index d509639292b..dcc18c99bfb 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1199,7 +1199,7 @@ namespace vec_detail
   template
   struct is_trivially_copyable_or_pair >
   : std::integral_constant::value
-&& std::is_trivially_copyable::value> { };
+&& is_trivially_copyable_or_pair::value> { };
 }
 #endif









[PATCH] Makefile.tpl: disable -Werror for feedback stage [PR111663]

2023-10-02 Thread Sergei Trofimovich
From: Sergei Trofimovich 

Without the change profiled bootstrap fails for various warnings on
master branch as:

$ ../gcc/configure
$ make profiledbootstrap
...
gcc/genmodes.cc: In function ‘int main(int, char**)’:
gcc/genmodes.cc:2152:1: error: ‘gcc/build/genmodes.gcda’ profile count data 
file not found [-Werror=missing-profile]
...
gcc/gengtype-parse.cc: In function ‘void parse_error(const char*, ...)’:
gcc/gengtype-parse.cc:142:21: error: ‘%s’ directive argument is null 
[-Werror=format-overflow=]

The change removes -Werror just like autofeedback does today.

/

PR bootstrap/111663
* Makefile.tpl (STAGEfeedback_CONFIGURE_FLAGS): Disable -Werror.
* Makefile.in: Regenerate.
---
 Makefile.in  | 4 
 Makefile.tpl | 4 
 2 files changed, 8 insertions(+)

diff --git a/Makefile.in b/Makefile.in
index 2f136839c35..e0e3c4c8fe8 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -638,6 +638,10 @@ STAGEtrain_TFLAGS = $(filter-out 
-fchecking=1,$(STAGE3_TFLAGS))
 
 STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
-fprofile-reproducible=parallel-runs
 STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
+# Disable warnings as errors for a few reasons:
+# - sources for gen* binaries do not have .gcda files available
+# - inlining decisions generate extra warnings
+STAGEfeedback_CONFIGURE_FLAGS = $(filter-out 
--enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
 
 STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
 STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
diff --git a/Makefile.tpl b/Makefile.tpl
index 5872dd03f2c..8b7783bb4f1 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -561,6 +561,10 @@ STAGEtrain_TFLAGS = $(filter-out 
-fchecking=1,$(STAGE3_TFLAGS))
 
 STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
-fprofile-reproducible=parallel-runs
 STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
+# Disable warnings as errors for a few reasons:
+# - sources for gen* binaries do not have .gcda files available
+# - inlining decisions generate extra warnings
+STAGEfeedback_CONFIGURE_FLAGS = $(filter-out 
--enable-werror-always,$(STAGE_CONFIGURE_FLAGS))
 
 STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
 STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
-- 
2.42.0



Re: [PATCH v6] RISC-V:Optimize the MASK opt generation

2023-10-02 Thread Gerald Pfeifer
On Mon, 2 Oct 2023, Kito Cheng wrote:
> Thanks for reporting this issue, I just realized multidimensional
> arrays are gawk extensions, could you try the attached patch to see if
> it can resolve the issue?

Yes, with that patch applied the build proceeds far beyond that point
(still running).

Thanks for the quick fix!

Gerald


Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-10-02 Thread Robin Dapp
> Conceptually the rounding mode is just a property.  The call, in
> effect, should demand a "normal" rounding mode and set the rounding
> mode to unknown if I understand how this is supposed to work.  If my
> understanding is wrong, then maybe that's where we should start --
> with a good description of the problem ;-)

That's also what I struggled with last time this was discussed.

Normally, mode switching is used to switch to a requested mode for
an insn or a call and potentially switch back afterwards.

For those riscv intrinsics that specify a variable, non-default rounding
mode we have two options:

- Save and restore before and after each mode-changing intrinsic:
 fegetround old_rounding
 fesetround new_rounding
 actual instruction
 fesetround old_rounding

- Have mode switching do it for us (lazily) to avoid most of the
storing of the old rounding mode by storing an (e.g.) function-level
rounding-mode backup value.  The backup value is used to lazily
restore the currently valid rounding mode.

The problem with this now is that whenever fesetround gets called
our backup is outdated.  Therefore we need to update our backup after
each function call (as fesetround can of course be present anywhere)
and this is where most of the complications come from.
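
To make the first option concrete, here is a minimal sketch in portable C++
using <cfenv>.  It only illustrates the save/switch/restore pattern; it is not
what the RISC-V backend emits.  The helper names with_rounding_mode and
observed_mode_inside are made up for this example, and the sketch ignores
compiler-level concerns such as -frounding-math.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: run `op` under rounding mode `mode`, then restore
// whatever rounding mode was active before the call.
template <typename Op>
auto with_rounding_mode (int mode, Op op) -> decltype (op ())
{
  const int old_mode = std::fegetround ();  // save the caller's mode
  std::fesetround (mode);                   // switch to the requested mode
  auto result = op ();                      // the "actual instruction"
  std::fesetround (old_mode);               // restore the caller's mode
  return result;
}

// Report which rounding mode is visible while the helper is active.
int observed_mode_inside (int mode)
{
  return with_rounding_mode (mode, [] { return std::fegetround (); });
}
```

The second option would replace the per-call save with a single backup per
function.  That backup goes stale as soon as some callee calls fesetround,
which is the problem described above.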

So in that case the callee _does_ impact the caller via the backup
clobbering.  That was one of my complaints about the whole procedure
last time.  Besides, I didn't see the need for those intrinsics
anyway and would much rather have explicit fesetround calls but well :)

Having said that, it looks like Pan's patch just tries to move some of
the dirty work from the backend to the mode-switching pass by making it
easier to do something after a call.  I believe I asked for that back in
one of the reviews even?

Regards
 Robin


Re: [PATCH 1/2] testsuite: Add and use thread_fence effective-target

2023-10-02 Thread Christophe Lyon
ping?

On Sun, 10 Sept 2023 at 21:31, Christophe Lyon 
wrote:

> Some targets like arm-eabi with newlib and default settings rely on
> __sync_synchronize() to ensure synchronization.  Newlib does not
> implement it by default, to make users aware they have to take special
> care.
>
> This makes a few tests fail to link.
>
> This patch adds a new thread_fence effective target (similar to the
> corresponding one in libstdc++ testsuite), and uses it in the tests
> that need it, making them UNSUPPORTED instead of FAIL and UNRESOLVED.
>
> 2023-09-10  Christophe Lyon  
>
> gcc/
> * doc/sourcebuild.texi (Other attributes): Document thread_fence
> effective-target.
>
> gcc/testsuite/
> * g++.dg/init/array54.C: Require thread_fence.
> * gcc.dg/c2x-nullptr-1.c: Likewise.
> * gcc.dg/pr103721-2.c: Likewise.
> * lib/target-supports.exp (check_effective_target_thread_fence):
> New.
> ---
>  gcc/doc/sourcebuild.texi  |  4 
>  gcc/testsuite/g++.dg/init/array54.C   |  1 +
>  gcc/testsuite/gcc.dg/c2x-nullptr-1.c  |  1 +
>  gcc/testsuite/gcc.dg/pr103721-2.c |  1 +
>  gcc/testsuite/lib/target-supports.exp | 12 
>  5 files changed, 19 insertions(+)
>
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 1a78b3c1abb..a5f61c29f3b 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2860,6 +2860,10 @@ Compiler has been configured to support link-time
> optimization (LTO).
>  Compiler and linker support link-time optimization relocatable linking
>  with @option{-r} and @option{-flto} options.
>
> +@item thread_fence
> +Target implements @code{__atomic_thread_fence} without relying on
> +non-implemented @code{__sync_synchronize()}.
> +
>  @item naked_functions
>  Target supports the @code{naked} function attribute.
>
> diff --git a/gcc/testsuite/g++.dg/init/array54.C
> b/gcc/testsuite/g++.dg/init/array54.C
> index f6be350ba72..5241e451d6d 100644
> --- a/gcc/testsuite/g++.dg/init/array54.C
> +++ b/gcc/testsuite/g++.dg/init/array54.C
> @@ -1,5 +1,6 @@
>  // PR c++/90947
>  // { dg-do run { target c++11 } }
> +// { dg-require-effective-target thread_fence }
>
>  #include 
>
> diff --git a/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
> b/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
> index 4e440234d52..97a31c27409 100644
> --- a/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
> +++ b/gcc/testsuite/gcc.dg/c2x-nullptr-1.c
> @@ -1,5 +1,6 @@
>  /* Test valid usage of C23 nullptr.  */
>  /* { dg-do run } */
> +// { dg-require-effective-target thread_fence }
>  /* { dg-options "-std=c2x -pedantic-errors -Wall -Wextra
> -Wno-unused-variable" } */
>
>  #include 
> diff --git a/gcc/testsuite/gcc.dg/pr103721-2.c
> b/gcc/testsuite/gcc.dg/pr103721-2.c
> index aefa1f0f147..e059b1cfc2d 100644
> --- a/gcc/testsuite/gcc.dg/pr103721-2.c
> +++ b/gcc/testsuite/gcc.dg/pr103721-2.c
> @@ -1,4 +1,5 @@
>  // { dg-do run }
> +// { dg-require-effective-target thread_fence }
>  // { dg-options "-O2" }
>
>  extern void abort ();
> diff --git a/gcc/testsuite/lib/target-supports.exp
> b/gcc/testsuite/lib/target-supports.exp
> index d353cc0aaf0..7ac9e7530cc 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -9107,6 +9107,18 @@ proc check_effective_target_sync_char_short { } {
>  || [check_effective_target_mips_llsc] }}]
>  }
>
> +# Return 1 if thread_fence does not rely on __sync_synchronize
> +# library function
> +
> +proc check_effective_target_thread_fence {} {
> +return [check_no_compiler_messages thread_fence executable {
> +   int main () {
> +   __atomic_thread_fence (__ATOMIC_SEQ_CST);
> +   return 0;
> +   }
> +} ""]
> +}
> +
>  # Return 1 if the target uses a ColdFire FPU.
>
>  proc check_effective_target_coldfire_fpu { } {
> --
> 2.34.1
>
>


Re: [PATCH] testsuite: Fix gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

2023-10-02 Thread Christophe Lyon
ping? maybe this counts as obvious?


On Thu, 14 Sept 2023 at 11:13, Christophe Lyon 
wrote:

> ping?
>
> On Fri, 8 Sept 2023 at 10:43, Christophe Lyon 
> wrote:
>
>> The test was declaring 'int *carry;' and wrote to '*carry' without
>> initializing 'carry' first, leading to an attempt to write at address
>> zero, and a crash.
>>
>> Fix by declaring 'int carry;' and passing '&carry' instead of 'carry'
>> as parameter.
>>
>> 2023-09-08  Christophe Lyon  
>>
>> gcc/testsuite/
>> * gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: Fix.
>> ---
>>  .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 34 +--
>>  1 file changed, 17 insertions(+), 17 deletions(-)
>>
>> diff --git
>> a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>> b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>> index a8c6cce67c8..931c9d2f30b 100644
>> --- a/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>> +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c
>> @@ -7,7 +7,7 @@
>>
>>  volatile int32x4_t c1;
>>  volatile uint32x4_t c2;
>> -int *carry;
>> +int carry;
>>
>>  int
>>  main ()
>> @@ -21,45 +21,45 @@ main ()
>>uint32x4_t inactive2 = vcreateq_u32 (0, 0);
>>
>>mve_pred16_t p = 0x;
>> -  (*carry) = 0x;
>> +  carry = 0x;
>>
>>__builtin_arm_set_fpscr_nzcvqc (0);
>> -  c1 = vadcq (a1, b1, carry);
>> +  c1 = vadcq (a1, b1, &carry);
>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>  __builtin_abort ();
>> -  (*carry) = 0x;
>> +  carry = 0x;
>>__builtin_arm_set_fpscr_nzcvqc (0);
>> -  c2 = vadcq (a2, b2, carry);
>> +  c2 = vadcq (a2, b2, &carry);
>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>  __builtin_abort ();
>> -  (*carry) = 0x;
>> +  carry = 0x;
>>__builtin_arm_set_fpscr_nzcvqc (0);
>> -  c1 = vsbcq (a1, b1, carry);
>> +  c1 = vsbcq (a1, b1, &carry);
>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>  __builtin_abort ();
>> -  (*carry) = 0x;
>> +  carry = 0x;
>>__builtin_arm_set_fpscr_nzcvqc (0);
>> -  c2 = vsbcq (a2, b2, carry);
>> +  c2 = vsbcq (a2, b2, &carry);
>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>  __builtin_abort ();
>> -  (*carry) = 0x;
>> +  carry = 0x;
>>__builtin_arm_set_fpscr_nzcvqc (0);
>> -  c1 = vadcq_m (inactive1, a1, b1, carry, p);
>> +  c1 = vadcq_m (inactive1, a1, b1, &carry, p);
>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>  __builtin_abort ();
>> -  (*carry) = 0x;
>> +  carry = 0x;
>>__builtin_arm_set_fpscr_nzcvqc (0);
>> -  c2 = vadcq_m (inactive2, a2, b2, carry, p);
>> +  c2 = vadcq_m (inactive2, a2, b2, &carry, p);
>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>  __builtin_abort ();
>> -  (*carry) = 0x;
>> +  carry = 0x;
>>__builtin_arm_set_fpscr_nzcvqc (0);
>> -  c1 = vsbcq_m (inactive1, a1, b1, carry, p);
>> +  c1 = vsbcq_m (inactive1, a1, b1, &carry, p);
>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>  __builtin_abort ();
>> -  (*carry) = 0x;
>> +  carry = 0x;
>>__builtin_arm_set_fpscr_nzcvqc (0);
>> -  c2 = vsbcq_m (inactive2, a2, b2, carry, p);
>> +  c2 = vsbcq_m (inactive2, a2, b2, &carry, p);
>>if (__builtin_arm_get_fpscr_nzcvqc () & !0x2000)
>>  __builtin_abort ();
>>
>> --
>> 2.34.1
>>
>>
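
The bug described in the quoted patch can be shown in miniature.  The sketch
below is only an illustration: add_with_carry is a hypothetical stand-in with
a carry in/out parameter, not the real MVE vadcq signature.  The point is that
the callee dereferences the pointer, so the caller has to pass the address of
real storage instead of an uninitialized global pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an intrinsic with a carry in/out parameter.
std::uint32_t add_with_carry (std::uint32_t a, std::uint32_t b,
                              std::uint32_t *carry)
{
  // Widen so that the carry out of bit 31 is not lost.
  std::uint64_t wide = static_cast<std::uint64_t> (a) + b + *carry;
  *carry = static_cast<std::uint32_t> (wide >> 32) & 1u;
  return static_cast<std::uint32_t> (wide);
}

// Mirrors the fixed test: `carry` is an object, and its address is passed.
std::uint32_t demo (std::uint32_t a, std::uint32_t b, std::uint32_t &carry_out)
{
  std::uint32_t carry = 0;                 // real storage for the carry
  std::uint32_t sum = add_with_carry (a, b, &carry);
  carry_out = carry;
  return sum;
}
```

With a zero-initialized global 'int *carry;' instead, the write through
'*carry' would go to address zero, which is the crash the patch describes.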


[PATCH] options: Prevent multidimensional arrays

2023-10-02 Thread Kito Cheng
Multidimensional arrays are a gawk extension, and we accidentally
introduced them in a recent commit [1].

[1] 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e4a4b8e983bac865eb435b11798e38d633b98942

gcc/ChangeLog:

* opt-read.awk: Drop multidimensional arrays.
* opth-gen.awk: Ditto.
---
 gcc/opt-read.awk | 4 ++--
 gcc/opth-gen.awk | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/opt-read.awk b/gcc/opt-read.awk
index fcf92853957..f74d8478f72 100644
--- a/gcc/opt-read.awk
+++ b/gcc/opt-read.awk
@@ -123,7 +123,7 @@ BEGIN {
}
else {
target_var = opt_args("Var", $0)
-if (target_var)
+   if (target_var)
{
target_var = opt_args("Var", $1)
var_index = find_index(target_var, 
target_vars, n_target_vars)
@@ -131,7 +131,7 @@ BEGIN {
{
target_vars[n_target_vars++] = 
target_var
}
-   
other_masks[var_index][n_other_mask[var_index]++] = name
+   other_masks[var_index "," 
n_other_mask[var_index]++] = name
}
else
{
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index 70ca3d37719..c4398be2f3a 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -412,9 +412,9 @@ for (i = 0; i < n_target_vars; i++)
continue
for (j = 0; j < n_other_mask[i]; j++)
{
-   print "#define MASK_" other_masks[i][j] " (1U << " 
other_masknum[i][""]++ ")"
+   print "#define MASK_" other_masks[i "," j] " (1U << " 
other_masknum[i]++ ")"
}
-   if (other_masknum[i][""] > 32)
+   if (other_masknum[i] > 32)
print "#error too many target masks for" extra_target_vars[i]
 }
 
@@ -437,8 +437,8 @@ for (i = 0; i < n_target_vars; i++)
continue
for (j = 0; j < n_other_mask[i]; j++)
{
-   print "#define TARGET_" other_masks[i][j] \
- " ((" target_vars[i] " & MASK_" other_masks[i][j] ") != 
0)"
+   print "#define TARGET_" other_masks[i "," j] \
+ " ((" target_vars[i] " & MASK_" other_masks[i "," j] ") 
!= 0)"
}
 }
 print ""
-- 
2.40.1



Re: [PATCH] Remove poly_int_pod

2023-10-02 Thread Richard Sandiford
Jan-Benedict Glaw  writes:
> Hi Richard,
>
> On Thu, 2023-09-28 10:55:46 +0100, Richard Sandiford 
>  wrote:
>> poly_int was written before the switch to C++11 and so couldn't
>> use explicit default constructors.  This led to an awkward split
>> between poly_int_pod and poly_int.  poly_int simply inherited from
>> poly_int_pod and added constructors, with the argumentless constructor
>> having an empty body.  But inheritance meant that poly_int had to
>> repeat the assignment operators from poly_int_pod (again, no C++11,
>> so no "using" to inherit base-class implementations).
> [...]
>
> I haven't bisected it, but I guess your patch caused this:
>
> [all 2023-10-02 06:59:02] 
> /var/lib/laminar/run/gcc-local/75/local-toolchain-install/bin/g++ -std=c++11  
> -fno-PIE -c   -g -O2   -DIN_GCC-fno-exceptions -fno-rtti 
> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
> -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -fno-PIE -I. -I. 
> -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include  
> -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody  
> -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/bid 
> -I../libdecnumber -I../../gcc/gcc/../libbacktrace   -o rtl-tests.o -MT 
> rtl-tests.o -MMD -MP -MF ./.deps/rtl-tests.TPo ../../gcc/gcc/rtl-tests.cc
> [all 2023-10-02 06:59:04] In file included from ../../gcc/gcc/coretypes.h:480,
> [all 2023-10-02 06:59:04]  from ../../gcc/gcc/rtl-tests.cc:22:
> [all 2023-10-02 06:59:04] ../../gcc/gcc/poly-int.h: In instantiation of 
> 'constexpr poly_int::poly_int(poly_int_full, const Cs& ...) [with Cs = 
> {int, int}; unsigned int N = 1; C = long int]':
> [all 2023-10-02 06:59:04] ../../gcc/gcc/poly-int.h:439:13:   required from 
> here
> [all 2023-10-02 06:59:04] ../../gcc/gcc/rtl-tests.cc:249:25:   in 'constexpr' 
> expansion of 'poly_int<1, long int>(1, 1)'
> [all 2023-10-02 06:59:04] ../../gcc/gcc/poly-int.h:453:5: error: too many 
> initializers for 'long int [1]'
> [all 2023-10-02 06:59:04]   453 |   : coeffs { (typename 
> poly_coeff_traits::
> [all 2023-10-02 06:59:04]   | 
> ^
> [all 2023-10-02 06:59:04]   454 |   template init_cast::type 
> (cs))... } {}
> [all 2023-10-02 06:59:04]   |   
> ~~~
> [all 2023-10-02 06:59:04] make[1]: *** [Makefile:1188: rtl-tests.o] Error 1
> [all 2023-10-02 06:59:04] make[1]: Leaving directory 
> '/var/lib/laminar/run/gcc-local/75/toolchain-build/gcc'
> [all 2023-10-02 06:59:05] make: *** [Makefile:4993: all-gcc] Error 2
>
>
> (Full build log at
> http://toolchain.lug-owl.de/laminar/jobs/gcc-local/75 .  That's in a
> Docker container on amd64-linux with the host gcc being fairly new,
> at basepoints/gcc-14-3827-g30e6ee07458)

Yeah, this was PR111642.  I pushed a fix this morning.

Thanks,
Richard


Re: [PATCH] Remove poly_int_pod

2023-10-02 Thread Jan-Benedict Glaw
Hi Richard,

On Thu, 2023-09-28 10:55:46 +0100, Richard Sandiford 
 wrote:
> poly_int was written before the switch to C++11 and so couldn't
> use explicit default constructors.  This led to an awkward split
> between poly_int_pod and poly_int.  poly_int simply inherited from
> poly_int_pod and added constructors, with the argumentless constructor
> having an empty body.  But inheritance meant that poly_int had to
> repeat the assignment operators from poly_int_pod (again, no C++11,
> so no "using" to inherit base-class implementations).
[...]

I haven't bisected it, but I guess your patch caused this:

[all 2023-10-02 06:59:02] 
/var/lib/laminar/run/gcc-local/75/local-toolchain-install/bin/g++ -std=c++11  
-fno-PIE -c   -g -O2   -DIN_GCC-fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -fno-PIE -I. -I. 
-I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody  
-I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/bid 
-I../libdecnumber -I../../gcc/gcc/../libbacktrace   -o rtl-tests.o -MT 
rtl-tests.o -MMD -MP -MF ./.deps/rtl-tests.TPo ../../gcc/gcc/rtl-tests.cc
[all 2023-10-02 06:59:04] In file included from ../../gcc/gcc/coretypes.h:480,
[all 2023-10-02 06:59:04]  from ../../gcc/gcc/rtl-tests.cc:22:
[all 2023-10-02 06:59:04] ../../gcc/gcc/poly-int.h: In instantiation of 
'constexpr poly_int::poly_int(poly_int_full, const Cs& ...) [with Cs = 
{int, int}; unsigned int N = 1; C = long int]':
[all 2023-10-02 06:59:04] ../../gcc/gcc/poly-int.h:439:13:   required from here
[all 2023-10-02 06:59:04] ../../gcc/gcc/rtl-tests.cc:249:25:   in 'constexpr' 
expansion of 'poly_int<1, long int>(1, 1)'
[all 2023-10-02 06:59:04] ../../gcc/gcc/poly-int.h:453:5: error: too many 
initializers for 'long int [1]'
[all 2023-10-02 06:59:04]   453 |   : coeffs { (typename poly_coeff_traits::
[all 2023-10-02 06:59:04]   | ^
[all 2023-10-02 06:59:04]   454 |   template init_cast::type 
(cs))... } {}
[all 2023-10-02 06:59:04]   |   
~~~
[all 2023-10-02 06:59:04] make[1]: *** [Makefile:1188: rtl-tests.o] Error 1
[all 2023-10-02 06:59:04] make[1]: Leaving directory 
'/var/lib/laminar/run/gcc-local/75/toolchain-build/gcc'
[all 2023-10-02 06:59:05] make: *** [Makefile:4993: all-gcc] Error 2


(Full build log at
http://toolchain.lug-owl.de/laminar/jobs/gcc-local/75 .  That's in a
Docker container on amd64-linux with the host gcc being fairly new,
at basepoints/gcc-14-3827-g30e6ee07458)

MfG, JBG



[PATCH 3/3]middle-end: maintain LCSSA throughout loop peeling

2023-10-02 Thread Tamar Christina
Hi All,

This final patch updates peeling to maintain LCSSA all the way through.

It's significantly easier to maintain it during peeling while we still know
where all new edges connect rather than touching it up later as is currently
being done.

This allows us to remove many of the helper functions that touch up the loops
at various parts.  The only complication is loop distribution, where we
should be able to use the same approach.  However, ldist, depending on whether
redirect_lc_phi_defs is true or not, will either try to maintain a limited LCSSA
form itself or remove all non-virtual PHIs.

The problem here is that if we maintain LCSSA then in some cases the blocks
connecting the two loops get PHIs to keep the loop IV up to date.

However there is no loop: the guard condition is rewritten as 0 != 0, so the
"loop" always exits.  Due to the PHI nodes, though, the probabilities get
completely wrong.  It seems to think that the impossible exit is the likely
edge.  This causes incorrect warnings, and the presence of the PHIs prevents the
blocks from being simplified.

While it may be possible to make ldist work with LCSSA form, doing so seems more
work than not.  For that reason the peeling code has an additional parameter
used by only ldist to not connect the two loops during peeling.

This preserves the current behaviour from ldist until I can dive into the
implementation more.  Hopefully that's ok for now.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Request no LCSSA.
* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
asserts.
(slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling.
(find_guard_arg): Look value up through explicit edge and original defs.
(vect_do_peeling): Use it.
(slpeel_update_phi_nodes_for_guard2): Take explicit exit edge.
(slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops):
Remove.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi.
* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add
optional param to turn off LCSSA mode.

--- inline copy of patch -- 
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 
902edc49ab588152a5b845f2c8a42a7e2a1d6080..14fb884d3e91d79785867debaee4956a2d5b0bb1
 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -950,7 +950,7 @@ copy_loop_before (class loop *loop, bool 
redirect_lc_phi_defs)
 
   initialize_original_copy_tables ();
   res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
-   NULL, preheader, NULL);
+   NULL, preheader, NULL, false);
   gcc_assert (res != NULL);
 
   /* When a not last partition is supposed to keep the LC PHIs computed
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
77f8e668bcc8beca99ba4052e1b12e0d17300262..0e8c0be5384aab2399ed93966e7bf4918f6c87a5
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, 
tree new_def)
 {
   tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
 
+  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
+ || orig_def != new_def);
+
   SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
 
   if (MAY_HAVE_DEBUG_BIND_STMTS)
@@ -1445,12 +1448,19 @@ slpeel_duplicate_current_defs_from_edges (edge from, 
edge to)
on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
basic blocks from SCALAR_LOOP instead of LOOP, but to either the
-   entry or exit of LOOP.  */
+   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a
+   continuation.  This is correct for cases where one loop continues from the
+   other like in the vectorizer, but not true for uses in e.g. loop 
distribution
+   where the loop is duplicated and then modified.
+
+   If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
+   dominators were updated during the peeling.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
class loop *scalar_loop,
-   edge scalar_exit, edge e, edge *new_e)
+   edge scalar_exit, edge e, edge *new_e,
+   bool flow_loops)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1481,6 +1491,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, 
edge loop_exit,
scalar_exit = exit;
break;
  }
+
+  gcc_assert (scalar_exit);
 }
 
   bbs = XNEW

[PATCH 2/3]middle-end: updated niters analysis to handle multiple exits.

2023-10-02 Thread Tamar Christina
Hi All,

This second part updates niters analysis to be able to analyze any number of
exits.  If we have multiple exits we determine the main exit by finding the
first counting IV.

The change allows the vectorizer to pass analysis for loops with multiple
exits, but we later gracefully reject them.  It does however allow us to test
whether the exit handling is using the right exit everywhere.

Additionally since we analyze all exits, we now return all conditions for them
and determine which condition belongs to the main exit.

The main condition is needed because the vectorizer needs to ignore the main IV
condition during vectorization as it will replace it during codegen.

To track versioned loops we extend the contract between ifcvt and the vectorizer
to store the exit number in aux so that we can match it up again during peeling.
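
As a rough illustration of that contract, the sketch below uses a made-up
Edge type (not GCC's edge) with an aux field.  One function numbers the exits
of a loop before they are rewritten, and another later finds the exit in the
other copy of the loop that carries the same number.  The names
record_exit_indices and find_matching_exit are assumptions for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a CFG edge; only the fields the sketch needs.
struct Edge
{
  int id;
  void *aux;
};

// Store each exit's position in its aux field, as the ifcvt hunk does.
void record_exit_indices (std::vector<Edge> &exits)
{
  std::uintptr_t idx = 0;
  for (Edge &exit : exits)
    exit.aux = reinterpret_cast<void *> (idx++);
}

// Find the exit in the other loop copy whose recorded index matches.
// Returns nullptr when no exit carries the same index.
const Edge *find_matching_exit (const std::vector<Edge> &scalar_exits,
                                const Edge &loop_exit)
{
  for (const Edge &exit : scalar_exits)
    if (exit.aux == loop_exit.aux)
      return &exit;
  return nullptr;
}
```

One property worth noting: index 0 is stored as a null aux value, so it cannot
be told apart from an edge whose aux was never set.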

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use
it.
* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
(vec_init_loop_exit_info): Extend analysis when multiple exits.
(vect_analyze_loop_form): Record conds and determine main cond.
(vect_create_loop_vinfo): Extend bookkeeping of conds.
(vect_analyze_loop): Release conds.
* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
LOOP_VINFO_LOOP_IV_COND):  New.
(struct vect_loop_form_info): Add conds, alt_loop_conds;
(struct loop_vec_info): Add conds, loop_iv_cond.

--- inline copy of patch -- 
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 
799f071965e5c41eb352b5530cf1d9c7ecf7bf25..3dc2290467797ebbfcef55903531b22829f4fdbd
 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3795,6 +3795,13 @@ tree_if_conversion (class loop *loop, vec 
*preds)
 }
   if (need_to_ifcvt)
 {
+  /* Before we rewrite edges we'll record their original position in the
+edge map such that we can map the edges between the ifcvt and the
+non-ifcvt loop during peeling.  */
+  uintptr_t idx = 0;
+  for (edge exit : get_loop_exit_edges (loop))
+   exit->aux = (void*)idx++;
+
   /* Now all statements are if-convertible.  Combine all the basic
 blocks into one huge basic block doing the if-conversion
 on-the-fly.  */
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
e06717272aafc6d31cbdcb94840ac25de616da6d..77f8e668bcc8beca99ba4052e1b12e0d17300262
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1470,6 +1470,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
   scalar_loop = loop;
   scalar_exit = loop_exit;
 }
+  else if (scalar_loop == loop)
+scalar_exit = loop_exit;
+  else
+{
+  /* Loop has been version, match exits up using the aux index.  */
+  for (edge exit : get_loop_exit_edges (scalar_loop))
+   if (exit->aux == loop_exit->aux)
+ {
+   scalar_exit = exit;
+   break;
+ }
+}
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   pbbs = bbs + 1;
@@ -1501,6 +1513,8 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, 
edge loop_exit,
   exit = loop_exit;
   basic_block new_preheader = new_bbs[0];
 
+  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
+ so we must initialize the exit information.  */
   if (new_e)
 *new_e = new_exit;
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
6e60d84143626a8e1d801bb580f4dcebc73c7ba7..f1caa5f207d3b13da58c3a313b11d1ef98374349
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -851,79 +851,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info 
loop_vinfo)
in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
niter information holds in ASSUMPTIONS.
 
-   Return the loop exit condition.  */
+   Return the loop exit conditions.  */
 
 
-static gcond *
-vect_get_loop_niters (class loop *loop, edge exit, tree *assumptions,
+static vec
+vect_get_loop_niters (class loop *loop, tree *assumptions, const_edge 
main_exit,
  tree *number_of_iterations, tree *number_of_iterationsm1)
 {
+  auto_vec exits = get_loop_exit_edges (loop);
+  vec conds;
+  conds.create (exits.length ());
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
-  gcond *cond = get_loop_exit_condition (loop);
 
   *assumptions = boolean_true_node;
   *number_of_iterationsm1 = chrec_dont_know;
   *number_of_iterations = chrec_dont_know;
+
   DUMP_VECT_SCOPE ("get_loop_niters");
 
-  if (!exit)
-return cond;
+  if (exits.is_empty ())
+return conds;
+
+  if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
+

[PATCH 1/3]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

2023-10-02 Thread Tamar Christina
Hi All,

This is extracted out of the patch series to support early break vectorization
in order to simplify the review of that patch series.

The goal of this one is to separate out the refactoring from the new
functionality.

This first patch separates out the vectorizer's definition of an exit into its
own values inside loop_vinfo.  During vectorization we can have three separate
copies for each loop: scalar, vectorized, epilogue.  The scalar loop can also be
the versioned loop before peeling.

Because of this we track 3 different exits inside loop_vinfo corresponding to
each of these loops.  Additionally, each function that uses an exit will now take
the exit explicitly as an argument when it is not obvious which exit is needed.

This is because often times the callers switch the loops being passed around.
While the caller knows which loops it is, the callee does not.

For now the loop exits are simply initialized to the same value as before, as
determined by single_exit (..).

No change in functionality is expected throughout this patch series.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly.
(loop_distribution::distribute_loop): Bail out of not single exit.
* tree-scalar-evolution.cc (get_loop_exit_condition): New.
* tree-scalar-evolution.h (get_loop_exit_condition): New.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit
explicitly.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
vect_set_loop_condition_partial_vectors_avx512,
vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly
take exit.
(slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and
return new peeled corresponding peeled exit.
(slpeel_can_duplicate_loop_p): Explicitly take exit.
(find_loop_location): Handle not knowing an explicit exit.
(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf,
find_guard_arg, slpeel_update_phi_nodes_for_loops,
slpeel_update_phi_nodes_for_guard2): Use new exits.
(vect_do_peeling): Update bookkeeping to keep track of exits.
* tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to
analyze.
(vec_init_loop_exit_info): New.
(_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv,
vec_epilogue_loop_iv, scalar_loop_iv.
(vect_analyze_loop_form): Initialize exits.
(vect_create_loop_vinfo): Set main exit.
(vect_create_epilog_for_reduction, vectorizable_live_operation,
vect_transform_loop): Use it.
(scale_profile_for_vect_loop): Explicitly take exit to scale.
* tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit.
* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT,
LOOP_VINFO_SCALAR_IV_EXIT): New.
(struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv,
scalar_loop_iv.
(vect_set_loop_condition, slpeel_can_duplicate_loop_p,
slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits.
(vec_init_loop_exit_info): New.
(struct vect_loop_form_info): Add loop_exit.

--- inline copy of patch -- 
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 
a28470b66ea935741a61fb73961ed7c927543a3d..902edc49ab588152a5b845f2c8a42a7e2a1d6080
 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -949,7 +949,8 @@ copy_loop_before (class loop *loop, bool 
redirect_lc_phi_defs)
   edge preheader = loop_preheader_edge (loop);
 
   initialize_original_copy_tables ();
-  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
+  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, single_exit (loop), NULL,
+   NULL, preheader, NULL);
   gcc_assert (res != NULL);
 
   /* When a not last partition is supposed to keep the LC PHIs computed
@@ -3043,6 +3044,24 @@ loop_distribution::distribute_loop (class loop *loop,
   return 0;
 }
 
+  /* Loop distribution only does prologue peeling but we still need to
+ initialize loop exit information.  However we only support single exits at
+ the moment.  As such, should exit information not have been provided and 
we
+ have more than one exit, bail out.  */
+  if (!single_exit (loop))
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Loop %d not distributed: too many exits.\n",
+loop->num);
+
+  free_rdg (rdg);
+  loop_nest.release ();
+  free_data_refs (datarefs_vec);
+  delete ddrs_table;
+  return 0;
+}
+
   data_reference_p dref;
   for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
 dref->aux = (void *) (uintptr_t) i;

Re: [PATCH] RISC-V: Use safe_grow_cleared for vector info [PR111469]

2023-10-02 Thread Jakub Jelinek
On Sun, Oct 01, 2023 at 07:03:51AM +0800, 钟居哲 wrote:
> LGTM.
> 
> juzhe.zh...@rivai.ai
>  
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index af8c31d873c..4b06d93e7f9 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -2417,8 +2417,8 @@ vector_infos_manager::vector_infos_manager ()
>vector_antin = nullptr;
>vector_antout = nullptr;
>vector_earliest = nullptr;
> -  vector_insn_infos.safe_grow (get_max_uid ());
> -  vector_block_infos.safe_grow (last_basic_block_for_fn (cfun));
> +  vector_insn_infos.safe_grow_cleared (get_max_uid ());
> +  vector_block_infos.safe_grow_cleared (last_basic_block_for_fn (cfun));
>if (!optimize)
>  {
>basic_block cfg_bb;

Note, while it works around the build failure, I strongly doubt it is the
right fix in this case.  The other spots which have been changed similarly
are quite different, all the vec cases have been followed
(almost) immediately by bitmap_initialize of all the elements or just 1-3
elements and their actual uses.
The above is very different.  Sizing a vector with get_max_uid ()
means it is very likely very sparse and so constructing every element in the
array seems undesirable.  While it is true that e.g. IRA or DF use vectors
indexed by INSN_UID, I think the vectors pretty much always have pointer
elements and say pool allocate what it points to upon first access.
Here, by contrast, vector_insn_info is huge (48 bytes on 64-bit hosts from
what I can see), and not much attempt has been made to make it shorter (e.g.
the vl_vtype_info member ordering of 2xpointer sized member, 1 byte member,
4 byte member, 3 1 byte members creates unnecessary padding).
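
To illustrate the padding remark, here is a minimal sketch. The two structs
are hypothetical stand-ins that only mimic the member sizes listed above; they
are not the real vl_vtype_info, and the member names are made up. It assumes
"2xpointer sized member" means one member the size of two pointers.

```cpp
#include <cstdint>

// Hypothetical layout in the order described in the mail: one member the
// size of two pointers, a 1 byte member, a 4 byte member, three 1 byte
// members.  The 4 byte member forces padding after the first 1 byte member.
struct info_as_described
{
  void *avl[2];        // 2 x pointer sized member
  std::uint8_t sew;    // 1 byte member, followed by padding
  std::uint32_t vlmul; // 4 byte member
  std::uint8_t ratio;  // three 1 byte members
  std::uint8_t ta;
  std::uint8_t ma;
};

// Same members with the 4 byte member moved ahead of all 1 byte members,
// so the four 1 byte members can share one 4 byte slot.
struct info_reordered
{
  void *avl[2];
  std::uint32_t vlmul;
  std::uint8_t sew;
  std::uint8_t ratio;
  std::uint8_t ta;
  std::uint8_t ma;
};

// The reordered layout is never larger on common ABIs.
static_assert (sizeof (info_reordered) <= sizeof (info_as_described),
               "reordering should not grow the struct");
```

On a typical LP64 host one would expect the first layout to round up to a
larger size than the second, but the exact numbers depend on the ABI, so the
sketch only asserts the ordering.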
There are 3 reasons why arrays indexed by INSN_UID are sparse:
1) we have the --param min-nondebug-insn-uid= parameter, where, albeit
   usually just for debugging, one can specify a very high start for those;
   say --param min-nondebug-insn-uid=10 means debug insns will be created
   with INSN_UID 1 and up, while non-DEBUG_INSNs only with INSN_UID 10
   and up, so even a function containing just 3-4 insns will have
   get_max_uid () of 14 or so; the above allocates in that case
   14 * 48 bytes (bad) and newly constructs all the elements in it
2) INSN_UIDs are given to all newly created insns during RTL optimizations;
   when an insn is deleted, its INSN_UID is not reused, so there can be
   large gaps; the more churn in RTL optimizations, the larger the gaps
3) INSN_UIDs are given even to BARRIERs, DEBUG_INSNs etc., plus the question
   is if the algorithm really needs to track every "normal" insn as well,
   whether it shouldn't track just those that are in some ways using vectors
   or relevant to it, many scalar instructions might be uninteresting,
   DEBUG_INSNs certainly should be uninteresting (they must not affect
   code generation), etc.
So, I think having something lazily allocated and initialized on demand
might be a compile time memory and time win.
For basic blocks we perform some block number compactions during compilation,
so those should be denser.  But the structure for each basic block is
even bigger (104 bytes, because it contains 2 x 48 plus probability).
E.g. DF uses for many purposes LUIDs instead, which need to be recomputed,
but are logical uids within each basic block.
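
A minimal sketch of the "lazily allocated and initialized on demand" idea,
using only the standard library.  Every name below (insn_info_sketch,
lazy_insn_table, get_or_create, lookup, allocated) is hypothetical and chosen
for illustration; this is not the riscv-vsetvl.cc code and not an existing GCC
API.  The table keeps one pointer per INSN_UID and only constructs the large
record the first time a uid is touched:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-insn record standing in for a large structure such as
// vector_insn_info.
struct insn_info_sketch
{
  int demand = 0;
};

// Hypothetical table indexed by INSN_UID.  Each slot costs one pointer until
// the record for that uid is first requested.
class lazy_insn_table
{
public:
  explicit lazy_insn_table (std::size_t max_uid) : m_slots (max_uid) {}

  // Return the record for UID, constructing it on first access.
  insn_info_sketch &get_or_create (std::size_t uid)
  {
    assert (uid < m_slots.size ());
    if (!m_slots[uid])
      m_slots[uid] = std::make_unique<insn_info_sketch> ();
    return *m_slots[uid];
  }

  // Return the record for UID, or nullptr if it was never created.
  const insn_info_sketch *lookup (std::size_t uid) const
  {
    return uid < m_slots.size () ? m_slots[uid].get () : nullptr;
  }

  // Number of records that have actually been constructed.
  std::size_t allocated () const
  {
    std::size_t n = 0;
    for (const auto &slot : m_slots)
      if (slot)
        ++n;
    return n;
  }

private:
  std::vector<std::unique_ptr<insn_info_sketch>> m_slots;
};
```

With such a table, a function whose uids are mostly unused pays
sizeof (pointer) per unused uid instead of the full record size.  Inside GCC
a pool allocator would presumably replace std::make_unique, as the mail
suggests for IRA and DF.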

Jakub



[pushed] testsuite, Darwin: Skip g++.dg/debug/dwarf2/pr85550.C

2023-10-02 Thread Iain Sandoe
Tested on x86_64-darwin21, pushed to trunk, thanks,
Iain

--- 8< ---

There are two problems here; first that the emitted asm for
-fdebug-types-section is ELF-specific, leading to assembler errors for
Mach-O.  If we fix this, we get a secondary fail since the debug linker
does not recognise DW_FORM_ref_sig8.  Disable this test until we get
DWARF-5 support in the external Darwin toolchain components.

gcc/testsuite/ChangeLog:

* g++.dg/debug/dwarf2/pr85550.C: Skip for Darwin.

Signed-off-by: Iain Sandoe 
---
 gcc/testsuite/g++.dg/debug/dwarf2/pr85550.C | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/pr85550.C b/gcc/testsuite/g++.dg/debug/dwarf2/pr85550.C
index 35b0f56e959..c95f75255d4 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/pr85550.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/pr85550.C
@@ -2,6 +2,7 @@
 // { dg-do link }
 // { dg-options "-O2 -g -fdebug-types-section" }
 // { dg-skip-if "AIX DWARF5" { powerpc-ibm-aix* } }
+// { dg-skip-if "No debug linker support" { *-*-darwin* } }
 
 struct A {
   int bar () const { return 0; }
-- 
2.39.2 (Apple Git-143)