Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Toru Kisuki via Gcc-patches
Hi Jeff,


Thank you for taking care of it.


Toru


From: Jeff Law 
Sent: Monday, June 19, 2023 7:55 PM
To: Richard Biener; Toru Kisuki
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with 
-fsignaling-nans



On 6/19/23 05:41, Richard Biener via Gcc-patches wrote:
> On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches wrote:
>>
>> Hi,
>>
>>
>> With -O3 -fsignaling-nans -fno-signed-zeros, the compiler should not
>> simplify 'x + 0.0' to 'x'.
>>
>
> OK if you bootstrapped / tested this change.
I suspect Toru doesn't have write access, so I went ahead and did an
x86 bootstrap & regression test, which passed.  The ChangeLog entry
needed fleshing out a bit, and I fixed a minor whitespace problem in the
patch itself.

Pushed to the trunk.
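
For anyone reading along, here is a minimal sketch of why the fold is unsafe
once signaling NaNs must be honored.  It assumes GCC or Clang (for
__builtin_nans), IEEE 754 binary64 doubles, and a target such as x86-64
(SSE2) or AArch64 where arithmetic quiets a signaling NaN.  The helper names
are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Quiet bit of a binary64 NaN (assumption: x86-64 / AArch64 encoding).  */
#define QUIET_BIT (UINT64_C (1) << 51)

/* Hypothetical helper: return the raw bit pattern of a double.  */
static uint64_t
bits_of (double d)
{
  uint64_t u;
  assert (sizeof (double) == sizeof (uint64_t));
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Hypothetical helper: nonzero if D is a NaN whose quiet bit is clear.  */
static int
is_signaling_nan (double d)
{
  uint64_t u = bits_of (d);
  int exp_all_ones = (u & UINT64_C (0x7ff0000000000000))
		     == UINT64_C (0x7ff0000000000000);
  int frac_nonzero = (u & UINT64_C (0x000fffffffffffff)) != 0;
  return exp_all_ones && frac_nonzero && (u & QUIET_BIT) == 0;
}

/* Perform x + 0.0 at run time.  The volatile accesses keep the compiler
   from folding the addition away in this illustration.  */
static double
add_zero (double x)
{
  volatile double v = x;
  volatile double r = v + 0.0;
  return r;
}
```

On such a target, is_signaling_nan (__builtin_nans ("")) should be nonzero
while is_signaling_nan (add_zero (__builtin_nans (""))) should be zero: the
addition quiets the NaN and raises the invalid-operation exception, so
replacing 'x + 0.0' with 'x' would change observable behavior.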


jeff


Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 22:52, Tamar Christina wrote:


It's a bit hackish, but could we reject the stack pointer for operand1 in the
stack-tie?  And if we do so, does it help?


Yeah, this one I had to defer until later this week to look at more closely,
because what I'm wondering about is whether the optimization should apply to
frame-related RTX as well.

Looking at the description of RTX_FRAME_RELATED_P, this optimization may end
up de-optimizing RISC targets by creating an offset that is larger than the
offset that can be used from SP, forcing reload to spill.  I.e. sometimes the
move was explicitly done.  So perhaps it should not apply to
RTX_FRAME_RELATED_P insns in find_oldest_value_reg and
copyprop_hardreg_forward_1?

Other parts of this pass already seem to bail out in similar situations.  So
I needed to write some testcases to check what would happen in these cases,
hence the deferral to later in the week.

Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably 
better in general to me.  The cases where we're looking to clean things 
up aren't really in the prologue/epilogue, but instead in the main body 
after register elimination has turned fp into sp + offset, thus making 
all kinds of things no longer valid.


jeff


RE: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Tamar Christina via Gcc-patches
> -----Original Message-----
> From: Jeff Law 
> Sent: Tuesday, June 20, 2023 3:17 AM
> To: Andrew Pinski; Thiago Jung Bauermann
> Cc: Manolis Tsamis; Philipp Tomsich; Richard Biener; Palmer Dabbelt;
> Kito Cheng; gcc-patches@gcc.gnu.org; Tamar Christina
> Subject: Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack
> pointer if possible.
> 
> 
> 
> On 6/19/23 17:48, Andrew Pinski wrote:
> > On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski wrote:
> >>
> >> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
> >> wrote:
> >>>
> >>>
> >>> Hello Manolis,
> >>>
> >>> Philipp Tomsich writes:
> >>>
> >>>> On Thu, 8 Jun 2023 at 00:18, Jeff Law wrote:
> >>>>>
> >>>>> On 5/25/23 06:35, Manolis Tsamis wrote:
> >>>>>> Propagation of the stack pointer in cprop_hardreg is currently
> >>>>>> forbidden in all cases, due to maybe_mode_change returning NULL.
> >>>>>> Relax this restriction and allow propagation when no mode change is
> >>>>>> requested.
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>>   * regcprop.cc (maybe_mode_change): Enable stack pointer
> >>>>>>   propagation.
> >>>>> Thanks for the clarification.  This is OK for the trunk.  It looks
> >>>>> generic enough to have value going forward now rather than waiting.
> >>>>
> >>>> Rebased, retested, and applied to trunk.  Thanks!
> >>>
> >>> Our CI found a couple of tests that started failing on aarch64-linux
> >>> after this commit. I was able to confirm manually that they don't
> >>> happen in the commit immediately before this one, and also that
> >>> these failures are still present in today's trunk.
> >>>
> >>> I have testsuite logs for last good commit, first bad commit and
> >>> current trunk here:
> >>>
> >>> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
> >>>
> >>> Could you please check?
> >>>
> >>> These are the new failures:
> >>>
> >>> Running gcc:gcc.target/aarch64/aarch64.exp ...
> >>> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times
> >>> mov\\tx11, sp 1
> >>
> >> So for the above before this change we had:
> >> ```
> >> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
> >>  (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65
> {*movdi_aarch64}
> >>   (nil))
> >> (insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
> >>  (unspec:BLK [
> >>  (reg:DI 11 x11)
> >>  (reg/f:DI 31 sp)
> >>  ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1
> >> 1169 {stack_tie}
> >>   (expr_list:REG_DEAD (reg:DI 11 x11)
> >>  (nil)))
> >> ```
> >>
> >> After we get:
> >> ```
> >> (insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
> >>  (unspec:BLK [
> >>  (reg:DI 31 sp [11]) repeated x2
> >>  ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1
> >> 1169 {stack_tie}
> >>   (nil))
> >> ```
> >> Which seems to be ok, except we still have:
> >> .cfi_def_cfa_register 11
> >>
> >> That is because on:
> >> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
> >>  (plus:DI (reg:DI 12 x12)
> >>  (const_int 272 [0x110])))
> >> "stack-check-prologue-16.c":16:1
> >> 153 {*adddi3_aarch64}
> >>   (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
> >>  (nil)))
> >>
> >> We record x11 but never update it though that came before the mov for
> >> x11 ... So it seems like cprop_hardreg had no idea it needed to
> >> update it.
> >>
> >> I suspect the other testcases are just propagation of sp into the
> >> stores and such and just need updating. But the above testcase seems
> >> to be getting broken CFI, though I don't know how to fix it.

Yeah, we noticed the failures internally but left them broken since we have an
upcoming AArch64 patch which requires them to be updated anyway and are
rolling up the updates into that patch. 

> >
> > The code from aarch64.cc:
> > ```
> >    /* This is done to provide unwinding information for the stack
> >       adjustments we're about to do, however to prevent the optimizers
> >       from removing the R11 move and leaving the CFA note (which would be
> >       very wrong) we tie the old and new stack pointer together.
> >       The tie will expand to nothing but the optimizers will not touch
> >       the instruction.  */
> >    rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
> >    emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
> >    emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));
> >
> >    /* We want the CFA independent of the stack pointer for the
> >       duration of the loop.  */
> >    add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
> >    RTX_FRAME_RELATED_P (insn) = 1;
> > ```
> >
> > Well except now with this change, the optimizers touch this
> > instruction. Maybe the move instruction should not be a move but an
> > unspec so optimizers don't know what 

[PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-19 Thread Di Zhao OS via Gcc-patches
This patch enables reassociation of floating-point additions on ampere1.
This brings about 1% overall benefit on spec2017 fprate cases. (There
are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)
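
As a rough illustration of what a wider reassociation width means, here is a
hand-written sketch (not compiler output) of a sum of eight products written
first as one dependent chain and then as four independent partial sums.  The
function names are hypothetical, and the compiler only reassociates
floating-point operations on its own when that is permitted, e.g. with
-ffast-math or -fassociative-math.

```c
#include <assert.h>
#include <stddef.h>

/* Serial form: every addition depends on the previous one, so the
   additions form a single dependency chain.  */
static double
dot8_serial (const double *a, const double *b)
{
  assert (a != NULL && b != NULL);
  double sum = 0.0;
  for (size_t i = 0; i < 8; ++i)
    sum += a[i] * b[i];
  return sum;
}

/* Hand-reassociated form: four partial sums that do not depend on each
   other, combined at the end.  This mimics a reassociation width of 4.  */
static double
dot8_width4 (const double *a, const double *b)
{
  assert (a != NULL && b != NULL);
  double s0 = a[0] * b[0] + a[4] * b[4];
  double s1 = a[1] * b[1] + a[5] * b[5];
  double s2 = a[2] * b[2] + a[6] * b[6];
  double s3 = a[3] * b[3] + a[7] * b[7];
  return (s0 + s1) + (s2 + s3);
}
```

The two forms can round differently for general inputs, which is why the
transformation is only legal under the relaxed floating-point options; for
small integer-valued inputs they agree exactly.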

Bootstrapped and tested on aarch64-unknown-linux-gnu. Is this OK for trunk?

Thanks,
Di Zhao

gcc/ChangeLog:

* config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1
---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d16565b5581..301c9f6c0cd 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1927,7 +1927,7 @@ static const struct tune_params ampere1_tunings =
   "32:12", /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
-  1,   /* fma_reassoc_width.  */
+  4,   /* fma_reassoc_width.  */
   2,   /* vec_reassoc_width.  */
   2,   /* min_div_recip_mul_sf.  */
   2,   /* min_div_recip_mul_df.  */
-- 
2.25.1




[Bug testsuite/110230] new test case gcc.target/powerpc/pr109932-1.c in r14-1705-g2764335bd336f2 fails for 32 bits

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110230

--- Comment #5 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:4b4a21c93406aef276fbff00d3e9491285d7b4a9

commit r13-7458-g4b4a21c93406aef276fbff00d3e9491285d7b4a9
Author: Kewen Lin 
Date:   Tue Jun 13 03:04:54 2023 -0500

testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

This patch is to make newly added test cases pr109932-{1,2}.c
check int128 effective target to avoid unsupported type error
on 32-bit.  I did hit this failure during testing and fixed
it, but made a stupid mistake not updating the local formatted
patch which was actually out of date.

PR testsuite/110230
PR target/109932

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective
target.
* gcc.target/powerpc/pr109932-2.c: Ditto.

(cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)

[Bug target/109932] ICE in in extract_insn, at recog.cc:2791 on ppc64le with -mno-vsx

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109932

--- Comment #7 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:4b4a21c93406aef276fbff00d3e9491285d7b4a9

commit r13-7458-g4b4a21c93406aef276fbff00d3e9491285d7b4a9
Author: Kewen Lin 
Date:   Tue Jun 13 03:04:54 2023 -0500

testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

This patch is to make newly added test cases pr109932-{1,2}.c
check int128 effective target to avoid unsupported type error
on 32-bit.  I did hit this failure during testing and fixed
it, but made a stupid mistake not updating the local formatted
patch which was actually out of date.

PR testsuite/110230
PR target/109932

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective
target.
* gcc.target/powerpc/pr109932-2.c: Ditto.

(cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)

[Bug target/109932] ICE in in extract_insn, at recog.cc:2791 on ppc64le with -mno-vsx

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109932

--- Comment #6 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:4e67d73ee5100c12993c79852e4ede13d2606cad

commit r13-7457-g4e67d73ee5100c12993c79852e4ede13d2606cad
Author: Kewen Lin 
Date:   Mon Jun 12 01:08:22 2023 -0500

rs6000: Guard __builtin_{un,}pack_vector_int128 with vsx [PR109932]

As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
should be guarded under vsx rather than power7, as their
corresponding bif patterns have the conditions TARGET_VSX
and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode).  This patch is to
move __builtin_{un,}pack_vector_int128 to stanza vsx to ensure
their supports.

PR target/109932

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
__builtin_unpack_vector_int128): Move from stanza power7 to vsx.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109932-1.c: New test.
* gcc.target/powerpc/pr109932-2.c: New test.

(cherry picked from commit ff83d1b47aadcdaf80a4fda84b0dc00bb2cd3641)

[Bug target/110011] -mfull-toc (-mfp-in-toc) yields incorrect _Float128 constants on power9

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110011

--- Comment #10 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:cefe925fe49af81bb4ae7a27fa2c96f0926fe22e

commit r13-7456-gcefe925fe49af81bb4ae7a27fa2c96f0926fe22e
Author: Kewen Lin 
Date:   Mon Jun 12 01:07:52 2023 -0500

rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]

As PR110011 shows, when encoding 128 bits fp constant into
toc, we adopts REAL_VALUE_TO_TARGET_LONG_DOUBLE which is
to find the first float mode with LONG_DOUBLE_TYPE_SIZE
bits of precision, it would be TFmode here.  But the 128
bits fp constant can be with mode IFmode or KFmode, which
doesn't necessarily have the same underlying float format
as the one of TFmode, like this PR exposes, with option
-mabi=ibmlongdouble TFmode has ibm_extended_format while
KFmode has ieee_quad_format, mixing up the formats (the
encoding/decoding ways) would cause unexpected results.

This patch is to make it use constant's own mode instead
of TFmode for real_to_target call.

PR target/110011

gcc/ChangeLog:

* config/rs6000/rs6000.cc (output_toc): Use the mode of the 128-bit
floating constant itself for real_to_target call.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr110011.c: New test.

(cherry picked from commit 388809f2afde874180da0669c669e241037eeba0)

[Bug testsuite/110230] new test case gcc.target/powerpc/pr109932-1.c in r14-1705-g2764335bd336f2 fails for 32 bits

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110230

--- Comment #4 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:4591c2c8a6b15ca99ba049d84e0e694f12db4f60

commit r12-9714-g4591c2c8a6b15ca99ba049d84e0e694f12db4f60
Author: Kewen Lin 
Date:   Tue Jun 13 03:04:54 2023 -0500

testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

This patch is to make newly added test cases pr109932-{1,2}.c
check int128 effective target to avoid unsupported type error
on 32-bit.  I did hit this failure during testing and fixed
it, but made a stupid mistake not updating the local formatted
patch which was actually out of date.

PR testsuite/110230
PR target/109932

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective
target.
* gcc.target/powerpc/pr109932-2.c: Ditto.

(cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)

[Bug target/109932] ICE in in extract_insn, at recog.cc:2791 on ppc64le with -mno-vsx

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109932

--- Comment #5 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:4591c2c8a6b15ca99ba049d84e0e694f12db4f60

commit r12-9714-g4591c2c8a6b15ca99ba049d84e0e694f12db4f60
Author: Kewen Lin 
Date:   Tue Jun 13 03:04:54 2023 -0500

testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

This patch is to make newly added test cases pr109932-{1,2}.c
check int128 effective target to avoid unsupported type error
on 32-bit.  I did hit this failure during testing and fixed
it, but made a stupid mistake not updating the local formatted
patch which was actually out of date.

PR testsuite/110230
PR target/109932

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective
target.
* gcc.target/powerpc/pr109932-2.c: Ditto.

(cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)

[Bug target/109932] ICE in in extract_insn, at recog.cc:2791 on ppc64le with -mno-vsx

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109932

--- Comment #4 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:31d88c795a0eb05df5a0684c34ec74116cce133f

commit r12-9713-g31d88c795a0eb05df5a0684c34ec74116cce133f
Author: Kewen Lin 
Date:   Mon Jun 12 01:08:22 2023 -0500

rs6000: Guard __builtin_{un,}pack_vector_int128 with vsx [PR109932]

As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
should be guarded under vsx rather than power7, as their
corresponding bif patterns have the conditions TARGET_VSX
and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode).  This patch is to
move __builtin_{un,}pack_vector_int128 to stanza vsx to ensure
their supports.

PR target/109932

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
__builtin_unpack_vector_int128): Move from stanza power7 to vsx.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109932-1.c: New test.
* gcc.target/powerpc/pr109932-2.c: New test.

(cherry picked from commit ff83d1b47aadcdaf80a4fda84b0dc00bb2cd3641)

[Bug target/110011] -mfull-toc (-mfp-in-toc) yields incorrect _Float128 constants on power9

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110011

--- Comment #9 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:90e1030d4c6d981c2293d89db6d1d57c057ad61d

commit r12-9712-g90e1030d4c6d981c2293d89db6d1d57c057ad61d
Author: Kewen Lin 
Date:   Mon Jun 12 01:07:52 2023 -0500

rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]

As PR110011 shows, when encoding 128 bits fp constant into
toc, we adopts REAL_VALUE_TO_TARGET_LONG_DOUBLE which is
to find the first float mode with LONG_DOUBLE_TYPE_SIZE
bits of precision, it would be TFmode here.  But the 128
bits fp constant can be with mode IFmode or KFmode, which
doesn't necessarily have the same underlying float format
as the one of TFmode, like this PR exposes, with option
-mabi=ibmlongdouble TFmode has ibm_extended_format while
KFmode has ieee_quad_format, mixing up the formats (the
encoding/decoding ways) would cause unexpected results.

This patch is to make it use constant's own mode instead
of TFmode for real_to_target call.

PR target/110011

gcc/ChangeLog:

* config/rs6000/rs6000.cc (output_toc): Use the mode of the 128-bit
floating constant itself for real_to_target call.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr110011.c: New test.

(cherry picked from commit 388809f2afde874180da0669c669e241037eeba0)

[PATCH] RISC-V: Optimize codegen of VLA SLP

2023-06-19 Thread Juzhe-Zhong
Recently, I figured out a better approach to codegen for VLA stepped
vectors.

Here is a detailed description:

Case 1:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
{
  a[i * 8] = b[i * 8 + 37] + 1;
  a[i * 8 + 1] = b[i * 8 + 37] + 2;
  a[i * 8 + 2] = b[i * 8 + 37] + 3;
  a[i * 8 + 3] = b[i * 8 + 37] + 4;
  a[i * 8 + 4] = b[i * 8 + 37] + 5;
  a[i * 8 + 5] = b[i * 8 + 37] + 6;
  a[i * 8 + 6] = b[i * 8 + 37] + 7;
  a[i * 8 + 7] = b[i * 8 + 37] + 8;
}
}

We need to generate the stepped vector:
NPATTERNS = 8.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8 }

Before this patch:
vid.v    v4         ;; {0,1,2,3,4,5,6,7,...}
vsrl.vi  v4,v4,3    ;; {0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,...}
li       a3,8       ;; {8}
vmul.vx  v4,v4,a3   ;; {0,0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}

After this patch:
vid.v    v4         ;; {0,1,2,3,4,5,6,7,...}
vand.vi  v4,v4,-8   ;; -8 is -NPATTERNS: {0,0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}
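
A scalar sketch of the per-lane arithmetic may help here.  It only models
what each vector lane computes, assuming NPATTERNS is a power of two; the
function names are hypothetical and this is not the code in riscv-v.cc.

```c
#include <assert.h>
#include <stdint.h>

/* Old per-lane computation: (vid >> log2 (npatterns)) * npatterns,
   i.e. the vsrl.vi + vmul.vx pair.  */
static uint32_t
lane_shift_mul (uint32_t vid, uint32_t npatterns, unsigned log2_npatterns)
{
  assert (npatterns == (UINT32_C (1) << log2_npatterns));
  return (vid >> log2_npatterns) * npatterns;
}

/* New per-lane computation: vid & -npatterns, i.e. a single vand.vi.
   For a power-of-two npatterns, -npatterns has every bit set except the
   low log2 (npatterns) bits, so the AND rounds vid down to a multiple
   of npatterns.  */
static uint32_t
lane_and (uint32_t vid, uint32_t npatterns)
{
  assert (npatterns != 0 && (npatterns & (npatterns - 1)) == 0);
  return vid & (0u - npatterns);
}
```

For NPATTERNS = 8, both functions map lanes 0..7 to 0, lanes 8..15 to 8, and
so on, which is the stepped vector wanted in Case 1.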

Case 2:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
{
  a[i * 8] = b[i * 8 + 3] + 1;
  a[i * 8 + 1] = b[i * 8 + 2] + 2;
  a[i * 8 + 2] = b[i * 8 + 1] + 3;
  a[i * 8 + 3] = b[i * 8 + 0] + 4;
  a[i * 8 + 4] = b[i * 8 + 7] + 5;
  a[i * 8 + 5] = b[i * 8 + 6] + 6;
  a[i * 8 + 6] = b[i * 8 + 5] + 7;
  a[i * 8 + 7] = b[i * 8 + 4] + 8;
}
} 

We need to generate the stepped vector:
NPATTERNS = 4.
{ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

Before this patch:
li       a6,134221824
slli     a6,a6,5
addi     a6,a6,3    ;; 64-bit: 0x000300020001
vmv.v.x  v6,a6      ;; {3, 2, 1, 0, ... }
vid.v    v4         ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vsrl.vi  v4,v4,2    ;; {0, 0, 0, 0, 1, 1, 1, 1, ... }
li       a3,4       ;; {4}
vmul.vx  v4,v4,a3   ;; {0, 0, 0, 0, 4, 4, 4, 4, ... }
vadd.vv  v4,v4,v6   ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

After this patch:
li       a3,-536875008
slli     a3,a3,4
addi     a3,a3,1
slli     a3,a3,16
vmv.v.x  v2,a3      ;; {3, 1, -1, -3, ... }
vid.v    v4         ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vadd.vv  v4,v4,v2   ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }
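
The same idea in scalar form, as a sketch under the assumption that the
target vector repeats with period NPATTERNS = 4: instead of building
(vid / 4) * 4 and adding the base pattern, the new sequence adds a repeating
per-lane difference directly to vid.  Names are hypothetical and the code
only models one lane at a time.

```c
#include <assert.h>

#define NPATTERNS 4

/* Old per-lane computation: (vid / NPATTERNS) * NPATTERNS plus the
   repeating base pattern { 3, 2, 1, 0 }.  */
static int
lane_old (int vid)
{
  static const int base[NPATTERNS] = { 3, 2, 1, 0 };
  assert (vid >= 0);
  return (vid / NPATTERNS) * NPATTERNS + base[vid % NPATTERNS];
}

/* New per-lane computation: vid plus the repeating difference
   { 3, 1, -1, -3 }, where delta[k] = base[k] - k.  */
static int
lane_new (int vid)
{
  static const int delta[NPATTERNS] = { 3, 1, -1, -3 };
  assert (vid >= 0);
  return vid + delta[vid % NPATTERNS];
}
```

Both functions give 3, 2, 1, 0, 7, 6, 5, 4, ... for vid = 0, 1, 2, ..., so
the shift, multiply, and extra broadcast of the old sequence are not needed.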

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Optimize codegen.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-16.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 78 ---
 .../riscv/rvv/autovec/partial/slp-1.c |  2 +
 .../riscv/rvv/autovec/partial/slp-16.c| 24 ++
 .../riscv/rvv/autovec/partial/slp_run-16.c| 66 
 4 files changed, 125 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-16.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 79c0337327d..aa143c864d6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1128,7 +1128,7 @@ expand_const_vector (rtx target, rtx src)
builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
 }
   builder.finalize ();
-  
+
   if (CONST_VECTOR_DUPLICATE_P (src))
 {
   /* Handle the case with repeating sequence that NELTS_PER_PATTERN = 1
@@ -1204,61 +1204,49 @@ expand_const_vector (rtx target, rtx src)
   if (builder.single_step_npatterns_p ())
{
  /* Describe the case by choosing NPATTERNS = 4 as an example.  */
- rtx base, step;
+ insn_code icode;
+
+ /* Step 1: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+ rtx vid = gen_reg_rtx (builder.mode ());
+ rtx vid_ops[] = {vid};
+ icode = code_for_pred_series (builder.mode ());
+ emit_vlmax_insn (icode, RVV_MISC_OP, vid_ops);
+
  if (builder.npatterns_all_equal_p ())
{
  /* Generate the variable-length vector following this rule:
 { a, a, a + step, a + step, a + step * 2, a + step * 2, ...}
   E.g. { 0, 0, 8, 8, 16, 16, ... } */
- /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
- base = expand_vector_broadcast (builder.mode (), builder.elt (0));
+ /* Step 2: VID AND -NPATTERNS:
+{ 0&-4, 1&-4, 2&-4, 3 &-4, 4 &-4, 5 &-4, 6 &-4, 7 &-4, ... }
+ */
+ rtx imm
+   = gen_int_mode (-builder.npatterns (), builder.inner_mode ());
+ rtx and_ops[] = {target, vid, imm};
+ icode = code_for_pred_scalar (AND, builder.mode ());
+ emit_vlmax_insn (icode, RVV_BINOP, and_ops);
}
  else
{
  /* Generate the 

[Bug c++/110304] __builtin_adcs missing and jakub you miss the point of builtin_adcb

2023-06-19 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110304

--- Comment #11 from cqwrteur  ---
Actually mine 

template<::std::unsigned_integral T>
inline constexpr T add_carry(T a,T b,T carryin,T& carryout) noexcept
{
[[assume(carryin==0||carryin==1)]];
a+=b;
carryout=a

Re: [PATCH, rs6000] Add two peephole2 patterns for mr. insn

2023-06-19 Thread HAO CHEN GUI via Gcc-patches
HP,
  It makes sense. I will update the patch.

Thanks
Gui Haochen

On 2023/6/20 8:07, Hans-Peter Nilsson wrote:
> On Tue, 30 May 2023, HAO CHEN GUI via Gcc-patches wrote:
> 
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7891,6 +7891,36 @@ (define_insn "*mov_internal2"
>> (set_attr "dot" "yes")
>> (set_attr "length" "4,4,8")])
>>
>> +(define_peephole2
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "")
>> +(compare:CC (match_operand:P 1 "int_reg_operand" "")
>> +(const_int 0)))
>> +   (set (match_operand:P 0 "int_reg_operand" "")
> 
> A random comment from the sideline: I'd suggest to remove the 
> (empty) constraints string from your peephole2's.
> 
> It can be a matter of port-specific-taste but it seems removing 
> them would be consistent with the other peephole2's in 
> rs6000.md.
> 
> (In this matter, I believe the examples in md.texi are bad.)
> 
> brgds, H-P


Re: [PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/6/20 02:54, Carl Love wrote:
> 
> Kewen, GCC maintainers:
> 
> Version 6, Fixed missing change log entry.  Changed builtin id names as
> requested.  Missed making the change on the last version.  Fixed
> comment in the three test cases.  Reran regression suite on Power 10,
> no regressions.
> 
> Version 5, Tested the patch on P9 BE per request.  Fixed up test case
> to get the correct expected values for BE and LE.  Fixed typos. 
> Updated the doc/extend.texi to clarify the vector arguments.  Changed
> test file names per request.  Moved builtin defs next to related
> definitions.  Renamed new mode_attr. Removed new mode_iterator, used
> existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
> Fixed up overloaded definitions per request.
> 
> Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
> cases to rs6000_expand_builtin.  Merged the new define_insn definitions
> with the existing definitions.  Renamed the builtins by removing the
> __builtin_ prefix from the names.  Fixed the documentation for the
> builtins.  Updated the test files to check the desired instructions
> were generated.  Retested patch on Power 10 with no regressions.
> 
> Version 3, was able to get the overloaded version of scalar_insert_exp
> to work and the change to xsxexpqp_f128_ define instruction to
> work with the suggestions from Kewen.  
> 
> Version 2, I have addressed the various comments from Kewen.  I had
> issues with adding an additional overloaded version of
> scalar_insert_exp with vector arguments.  The overload infrastructure
> didn't work with a mix of scalar and vector arguments.  I did rename
> the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp make
> it similar to the existing builtin.  I also wasn't able to get the
> suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
> I left the two simpler definitions.
> 
> The patch adds three new builtins to extract the significand and
> exponent of an IEEE float 128-bit value where the builtin argument is a
> vector.  Additionally, a builtin to insert the exponent into an IEEE
> float 128-bit vector argument is added.  These builtins were requested
> since there is no clean and optimal way to transfer between a vector
> and a scalar IEEE 128 bit value.
> 
> The patch has been tested on Power 9 BE and Power 10 LE with no
> regressions.  Please let me know if the patch is acceptable or not. 
> Thanks.

OK for trunk with some nits fixed in changelog (sorry that I didn't catch
all of them in previous review, but I don't think you need to post
a new version).  Thanks!

> 
>Carl
> 
> 
> rs6000: Add builtins for IEEE 128-bit floating point values
> 
> Add support for the following builtins:
> 
>  __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
>  __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
>  __ieee128 scalar_insert_exp (__vector unsigned __int128,
> __vector unsigned long long);
> 
> The instructions used in the builtins operate on vector registers.  Thus
> the result must be moved to a scalar type.  There is no clean, performant
> way to do this.  The user code typically needs the result as a vector
> anyway.
> 
> gcc/
>   * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
>   Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
>   Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.


Missing:
"Rename CODE_FOR_xsxsigqp_tf to CODE_FOR_xsxsigqp_tf_ti."
"Rename CODE_FOR_xsxsigqp_kf to CODE_FOR_xsxsigqp_kf_ti."
"Rename CODE_FOR_xsiexpqp_tf to CODE_FOR_xsiexpqp_tf_di."
"Rename CODE_FOR_xsiexpqp_kf to CODE_FOR_xsiexpqp_kf_di."

>   (CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
>   CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
>   * config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
>__builtin_extractf128_sig, __builtin_insertf128_exp): Add new
>   builtin definitions.

Should be with correct names:

(__builtin_vsx_scalar_extract_exp_to_vec,
__builtin_vsx_scalar_extract_sig_to_vec,
__builtin_vsx_scalar_insert_exp_vqp):

>   Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
>   xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
>   * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
>   Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
>   overloaded instance. Update comments.
>   * config/rs6000/rs6000-overload.def
>   (__builtin_vec_scalar_insert_exp): Add new overload definition with
>   vector arguments.
>   (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
>   overloaded definitions.
>   * config/vsx.md (V2DI_DI): New mode iterator.
>   (DI_to_TI): New mode attribute.
>   Rename xsxexpqp_ to sxexpqp__.
>   Rename xsxsigqp_ to xsxsigqp__.
>   

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 17:48, Andrew Pinski wrote:

On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski  wrote:


On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches wrote:



Hello Manolis,

Philipp Tomsich  writes:


On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:


On 5/25/23 06:35, Manolis Tsamis wrote:

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.

gcc/ChangeLog:

  * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.

Thanks for the clarification.  This is OK for the trunk.  It looks
generic enough to have value going forward now rather than waiting.


Rebased, retested, and applied to trunk.  Thanks!


Our CI found a couple of tests that started failing on aarch64-linux
after this commit. I was able to confirm manually that they don't happen
in the commit immediately before this one, and also that these failures
are still present in today's trunk.

I have testsuite logs for last good commit, first bad commit and current
trunk here:

https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/

Could you please check?

These are the new failures:

Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1


So for the above before this change we had:
```
(insn:TI 597 596 598 2 (set (reg:DI 11 x11)
 (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
  (nil))
(insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
 (unspec:BLK [
 (reg:DI 11 x11)
 (reg/f:DI 31 sp)
 ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
{stack_tie}
  (expr_list:REG_DEAD (reg:DI 11 x11)
 (nil)))
```

After we get:
```
(insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
 (unspec:BLK [
 (reg:DI 31 sp [11]) repeated x2
 ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
{stack_tie}
  (nil))
```
Which seems to be ok, except we still have:
.cfi_def_cfa_register 11

That is because on:
(insn/f 596 595 598 2 (set (reg:DI 12 x12)
 (plus:DI (reg:DI 12 x12)
 (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
153 {*adddi3_aarch64}
  (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
 (nil)))

We record x11 but never update it though that came before the mov for
x11 ... So it seems like cprop_hardreg had no idea it needed to update
it.

I suspect the other testcases are just propagation of sp into the
stores and such and just need updating. But the above testcase seems
to be getting broken CFI, though I don't know how to fix it.


The code from aarch64.cc:
```
   /* This is done to provide unwinding information for the stack
  adjustments we're about to do, however to prevent the optimizers
  from removing the R11 move and leaving the CFA note (which would be
  very wrong) we tie the old and new stack pointer together.
  The tie will expand to nothing but the optimizers will not touch
  the instruction.  */
   rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
   emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
   emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));

   /* We want the CFA independent of the stack pointer for the
  duration of the loop.  */
   add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
   RTX_FRAME_RELATED_P (insn) = 1;
```

Well, except that with this change the optimizers do touch this
instruction. Maybe the move instruction should not be a move but an
unspec, so the optimizers don't know what the move was.
Adding Tamar to the CC, who originally added this code to aarch64, for
comments on the above understanding.

It's a bit hackish, but could we reject the stack pointer for operand1
in the stack-tie?  And if we do so, does it help?


jeff


RE: Re: [PATCH] RISC-V: Fix fails of testcases

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches On Behalf Of 钟居哲
Sent: Tuesday, June 20, 2023 7:15 AM
To: Jeff Law ; gcc-patches 
Cc: kito.cheng ; palmer ; rdapp.gcc 

Subject: Re: Re: [PATCH] RISC-V: Fix fails of testcases

>> Presumably the target selector in the dg-do ensures we only build/run
>> these on the appropriate targets now and we don't need explicitly -march
>> arguments?
Yes. 

>> Assuming that's correct, this is fine for the trunk.
Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-20 07:13
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix fails of testcases
 
 
On 6/19/23 17:04, Juzhe-Zhong wrote:
> FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
> -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for 
> excess errors)
> Excess errors:
> xgcc: fatal error: Cannot find suitable multilib set for 
> '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
> compilation terminated.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: 
> Ditto.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.
Presumably the target selector in the dg-do ensures we only build/run 
these on the appropriate targets now and we don't need explicitly -march 
arguments?
 
Assuming that's correct, this is fine for the trunk.
 
jeff
 


RE: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code

2023-06-19 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Tuesday, June 20, 2023 2:04 AM
To: 钟居哲 ; 丁乐华 ; gcc-patches 

Cc: Wang, Yanzhang ; kito.cheng 
; palmer ; rdapp.gcc 

Subject: Re: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify 
code



On 6/18/23 07:16, 钟居哲 wrote:
> Thanks for cleaning up the code for the future ABI support patch.
> Let's wait for comments from Jeff or Robin.
Looks reasonable to me given the state we're in WRT psabi and vectors.

jeff


Re: [committed] libstdc++: Optimize std::to_array for trivial types [PR110167]

2023-06-19 Thread Patrick Palka via Gcc-patches
On Fri, 9 Jun 2023, Jonathan Wakely via Libstdc++ wrote:

> Tested powerpc64le-linux. Pushed to trunk.
> 
> This makes sense to backport after some soak time on trunk.
> 
> -- >8 --
> 
> As reported in PR libstdc++/110167, std::to_array compiles extremely
> slowly for very large arrays. It needs to instantiate a very large
> specialization of std::index_sequence and then create a very large
> aggregate initializer from the pack expansion. For trivial types we can
> simply default-initialize the std::array and then use memcpy to copy the
> values. For non-trivial types we need to use the existing
> implementation, despite the compilation cost.
> 
> As also noted in the PR, using a generic lambda instead of the
> __to_array helper compiles faster since gcc-13. It also produces
> slightly smaller code at -O1, due to additional inlining. The code at
> -Os, -O2 and -O3 seems to be the same. This new implementation requires
> __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
> since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/110167
>   * include/std/array (to_array): Initialize arrays of trivial
>   types using memcpy. For non-trivial types, use lambda
>   expressions instead of a separate helper function.
>   (__to_array): Remove.
>   * testsuite/23_containers/array/creation/110167.cc: New test.
> ---
>  libstdc++-v3/include/std/array| 53 +--
>  .../23_containers/array/creation/110167.cc| 14 +
>  2 files changed, 51 insertions(+), 16 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
> 
> diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
> index 70280c1beeb..b791d86ddb2 100644
> --- a/libstdc++-v3/include/std/array
> +++ b/libstdc++-v3/include/std/array
> @@ -414,19 +414,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>return std::move(std::get<_Int>(__arr));
>  }
>  
> -#if __cplusplus > 201703L
> +#if __cplusplus >= 202002L && __cpp_generic_lambdas >= 201707L
>  #define __cpp_lib_to_array 201907L
> -
> -  template
> -constexpr array, sizeof...(_Idx)>
> -__to_array(_Tp (&__a)[sizeof...(_Idx)], index_sequence<_Idx...>)
> -{
> -  if constexpr (_Move)
> - return {{std::move(__a[_Idx])...}};
> -  else
> - return {{__a[_Idx]...}};
> -}
> -
>template
>  [[nodiscard]]
>  constexpr array, _Nm>
> @@ -436,8 +425,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>static_assert(!is_array_v<_Tp>);
>static_assert(is_constructible_v<_Tp, _Tp&>);
>if constexpr (is_constructible_v<_Tp, _Tp&>)
> - return __to_array(__a, make_index_sequence<_Nm>{});
> -  __builtin_unreachable(); // FIXME: see PR c++/91388
> + {
> +   if constexpr (is_trivial_v<_Tp> && _Nm != 0)

redundant _Nm != 0 test?

> + {
> +   array, _Nm> __arr;
> +   if (!__is_constant_evaluated() && _Nm != 0)
> + __builtin_memcpy(__arr.data(), __a, sizeof(__a));
> +   else
> + for (size_t __i = 0; __i < _Nm; ++__i)
> +   __arr._M_elems[__i] = __a[__i];
> +   return __arr;
> + }
> +   else
> + return [&__a](index_sequence<_Idx...>) {
> +   return array, _Nm>{{ __a[_Idx]... }};
> + }(make_index_sequence<_Nm>{});
> + }
> +  else
> + __builtin_unreachable(); // FIXME: see PR c++/91388
>  }
>  
>template
> @@ -449,8 +454,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>static_assert(!is_array_v<_Tp>);
>static_assert(is_move_constructible_v<_Tp>);
>if constexpr (is_move_constructible_v<_Tp>)
> - return __to_array<1>(__a, make_index_sequence<_Nm>{});
> -  __builtin_unreachable(); // FIXME: see PR c++/91388
> + {
> +   if constexpr (is_trivial_v<_Tp>)
> + {
> +   array, _Nm> __arr;
> +   if (!__is_constant_evaluated() && _Nm != 0)
> + __builtin_memcpy(__arr.data(), __a, sizeof(__a));
> +   else
> + for (size_t __i = 0; __i < _Nm; ++__i)
> +   __arr._M_elems[__i] = std::move(__a[__i]);

IIUC this std::move is unnecessary for trivial arrays?

> +   return __arr;
> + }
> +   else
> + return [&__a](index_sequence<_Idx...>) {
> +   return array, _Nm>{{ std::move(__a[_Idx])... }};
> + }(make_index_sequence<_Nm>{});
> + }
> +  else
> + __builtin_unreachable(); // FIXME: see PR c++/91388
>  }
>  #endif // C++20
>  
> diff --git a/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc 
> b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
> new file mode 100644
> index 000..c2aecc911bd
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
> @@ -0,0 +1,14 @@
> +// { dg-options "-std=gnu++20" }
> +// { dg-do compile { target c++20 } }
> +
> +// PR 
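
For readers who want to try the idea outside libstdc++, here is a minimal runtime-only sketch of the split described in the commit message: a bulk memcpy for trivial element types, and an index-pack expansion for everything else. The names `to_array_sketch` and `copy_by_pack` are made up for illustration, and the sketch leaves out the constexpr handling and the generic lambda used by the patch, so it is not the libstdc++ code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

// Fallback for non-trivial element types: expand an index pack into an
// aggregate initializer (the path that is slow to compile for very large N).
template<typename T, std::size_t N, std::size_t... I>
std::array<std::remove_cv_t<T>, N>
copy_by_pack(T (&a)[N], std::index_sequence<I...>)
{
  return {{ a[I]... }};
}

// Hypothetical runtime-only stand-in for std::to_array(T (&)[N]).
template<typename T, std::size_t N>
std::array<std::remove_cv_t<T>, N>
to_array_sketch(T (&a)[N])
{
  if constexpr (std::is_trivial_v<T>)
    {
      std::array<std::remove_cv_t<T>, N> arr;  // default-initialized
      std::memcpy(arr.data(), a, sizeof(a));   // one bulk copy, no pack
      return arr;
    }
  else
    return copy_by_pack(a, std::make_index_sequence<N>{});
}
```

Calling `to_array_sketch` on an `int[3]` takes the memcpy branch, while a `std::string[2]` takes the pack-expansion branch. For an rvalue overload the trivial branch would look the same, since moving an object of trivial type is just a copy.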

Re: [PATCH v6 0/4] P1689R5 support

2023-06-19 Thread Jason Merrill via Gcc

On 6/17/23 10:43, Ben Boeckel wrote:

On Fri, Jun 16, 2023 at 23:55:53 -0400, Jason Merrill wrote:

I see the same thing with patch 4 on x86_64-pc-linux-gnu, e.g.

FAIL: g++.dg/modules/ben-1_a.C -std=c++17 (test for excess errors)
Excess errors:
/home/jason/gt/gcc/testsuite/g++.dg/modules/ben-1_a.C:9:1: internal
compiler error: Segmentation fault
0x19e2f3c crash_signal
 /home/jason/gt/gcc/toplev.cc:314
0x340f3f8 mkdeps::vec::size() const
 /home/jason/gt/libcpp/mkdeps.cc:57
0x340dc1f apply_vpath
 /home/jason/gt/libcpp/mkdeps.cc:194
0x340e08e deps_add_dep(mkdeps*, char const*)
 /home/jason/gt/libcpp/mkdeps.cc:318
0xea7b51 module_client::open_module_client(unsigned int, char const*,
mkdeps*, void (*)(char const*), char const*)
 /home/jason/gt/gcc/cp/mapper-client.cc:291
0xef2ba8 make_mapper
 /home/jason/gt/gcc/cp/module.cc:14042
0xf0896c get_mapper(unsigned int, mkdeps*)
 /home/jason/gt/gcc/cp/module.cc:3977
0xf032ac name_pending_imports
 /home/jason/gt/gcc/cp/module.cc:19623
0xf03a7d preprocessed_module(cpp_reader*)
 /home/jason/gt/gcc/cp/module.cc:19817
0xe85104 module_token_cdtor(cpp_reader*, unsigned long)
 /home/jason/gt/gcc/cp/lex.cc:548
0xf467b2 cp_lexer_new_main
 /home/jason/gt/gcc/cp/parser.cc:756
0xfc1e3a c_parse_file()
 /home/jason/gt/gcc/cp/parser.cc:49725
0x11c5bf5 c_common_parse_file()
 /home/jason/gt/gcc/c-family/c-opts.cc:1268


Thanks. I missed a `nullptr` check before calling `deps_add_dep`. I
think I got misled by `make check` returning a zero exit code even if
there are failures.
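
For readers following along, the missing check amounts to something like the sketch below. `deps_sketch` and `add_dep_if_tracking` are made-up names for illustration; the real fix would guard the `deps_add_dep` call that the backtrace shows in mapper-client.cc, and may well look different.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the dependency tracker (not libcpp's mkdeps).
struct deps_sketch {
  std::vector<std::string> names;
};

// Record NAME only when a tracker exists; returns whether it was recorded.
bool add_dep_if_tracking(deps_sketch *d, const char *name)
{
  if (d == nullptr)  // assumed: the tracker pointer may legitimately be null
    return false;
  d->names.emplace_back(name);
  return true;
}
```

With the guard in place, a null tracker is skipped instead of being dereferenced.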


Aha!

Patches 3 and 4 could also use testcases.

Jason



[Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically

2023-06-19 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

Bug ID: 110316
   Summary: [14 regression] g++.dg/ext/timevar1.C and timevar2.C
fail erratically
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

Unfortunately I do not have a clear starting point for this, but recently the
g++.dg/ext/timevar1.C and timevar2.C tests began failing on some runs and
passing on the next.  It is happening on one of our newer/faster machines, but
it did not use to fail there.

The last run where I did not see any failures (for this one or previously) was
47fa3cef59a031f1b0fdce309ff634fab717606d, r14-1906-g47fa3cef59a031

The first run with failures was 0f9bb3e7a4aab95fd449f60b5f891ed9a6e5f352,
r14-1910-g0f9bb3e7a4aab9

I don't see anything in that range that might cause this, though.

FAIL: g++.dg/ext/timevar1.C  -std=gnu++17 (internal compiler error: in
validate_phases, at timevar.cc:626)
FAIL: g++.dg/ext/timevar1.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/ext/timevar2.C  -std=gnu++20 (internal compiler error: in
validate_phases, at timevar.cc:626)
FAIL: g++.dg/ext/timevar2.C  -std=gnu++20 (test for excess errors)

spawn -ignore SIGHUP
/home/gccbuild/build/nightly/build-gcc-trunk/gcc/testsuite/g++1/../../xg++
-B/home/gccbuild/build/nightly/build-gcc-trunk/gcc/testsuite/g++1/../../
/home/gccbuild/gcc_trunk_git/gcc/gcc/testsuite/g++.dg/ext/timevar2.C
-fdiagnostics-plain-output -nostdinc++
-I/home/gccbuild/build/nightly/build-gcc-trunk/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu
-I/home/gccbuild/build/nightly/build-gcc-trunk/powerpc64le-unknown-linux-gnu/libstdc++-v3/include
-I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/libsupc++
-I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/include/backward
-I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/testsuite/util
-fmessage-length=0 -std=gnu++98 -ftime-report -S -o timevar2.s

Time variable           usr           sys          wall     GGC
 phase setup       :   0.00 (  0%)   0.00 (  0%)   0.01 (100%)  2835k ( 81%)
 phase parsing     :   0.01 (100%)   0.00 (  0%)   0.00 (  0%)   603k ( 17%)
 |name lookup      :   0.00 (  0%)   0.00 (  0%)   0.01 (100%)   174k (  5%)
 parser (global)   :   0.01 (100%)   0.00 (  0%)   0.00 (  0%)   587k ( 17%)
 TOTAL             :   0.01          0.00          0.01         3496k
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
Timing error: total of phase timers exceeds total time.
wall    1.02666800281492e-02 > 1.00150810810562e-02
internal compiler error: in validate_phases, at timevar.cc:626
0x10ff92bb toplev::~toplev()
        /home/gccbuild/gcc_trunk_git/gcc/gcc/toplev.cc:2155
xg++: internal compiler error: Segmentation fault signal terminated program cc1plus

Note that the two phase timings are both 0.01 and both report 100% while the
total time is also 0.01.  Is this maybe a rounding issue?
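
One way to read the numbers in the log: the per-phase wall timers add up to 1.02666800281492e-02, which is about 2.5% more than the total of 1.00150810810562e-02, although both print as 0.01 in the table. Below is a minimal sketch of a check of that shape. `phase_sum_exceeds_total` and the tolerance value are assumptions made for illustration; this is not the code in timevar.cc.

```cpp
#include <cassert>

// Hypothetical helper, not GCC's validate_phases: reports whether the sum of
// the per-phase timers exceeds the total by more than a relative tolerance.
constexpr bool
phase_sum_exceeds_total(double phase_sum, double total, double tolerance)
{
  return phase_sum > total * tolerance;
}

// Wall-clock values copied from the "Timing error" lines in the log.
constexpr double phase_wall = 1.02666800281492e-02;
constexpr double total_wall = 1.00150810810562e-02;

// Tolerance assumed for this sketch only.
constexpr double sketch_tolerance = 1.000005;

static_assert(phase_sum_exceeds_total(phase_wall, total_wall, sketch_tolerance),
              "the logged phase sum is above the logged total");
```

The two values differ by roughly 0.25 ms, which is invisible once each is rounded to two decimals. So the rounding in the printed table hides the discrepancy, but it does not look like the cause of it.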

Re: [PATCH, rs6000] Add two peephole2 patterns for mr. insn

2023-06-19 Thread Hans-Peter Nilsson
On Tue, 30 May 2023, HAO CHEN GUI via Gcc-patches wrote:

> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7891,6 +7891,36 @@ (define_insn "*mov_internal2"
> (set_attr "dot" "yes")
> (set_attr "length" "4,4,8")])
> 
> +(define_peephole2
> +  [(set (match_operand:CC 2 "cc_reg_operand" "")
> + (compare:CC (match_operand:P 1 "int_reg_operand" "")
> + (const_int 0)))
> +   (set (match_operand:P 0 "int_reg_operand" "")

A random comment from the sideline: I'd suggest removing the
(empty) constraint strings from your peephole2s.

It can be a matter of port-specific taste, but removing them
would seem consistent with the other peephole2s in
rs6000.md.

(In this matter, I believe the examples in md.texi are bad.)

brgds, H-P


[Bug tree-optimization/110315] [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90

2023-06-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110315

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.2
          Component|c++                         |tree-optimization

--- Comment #1 from Andrew Pinski  ---
My bet is you could reproduce it before r13-4562-g3da5ae7a347b7d74765053f4a08
and your bisect just produced one revision which just happened to undo part of
the front-end optimizations done in r13-4562 (and a few others afterwards).

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski  wrote:
>
> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
>  wrote:
> >
> >
> > Hello Manolis,
> >
> > Philipp Tomsich  writes:
> >
> > > On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> > >>
> > >> On 5/25/23 06:35, Manolis Tsamis wrote:
> > >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > >> > restriction and allow propagation when no mode change is requested.
> > >> >
> > >> > gcc/ChangeLog:
> > >> >
> > >> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> > >> > propagation.
> > >> Thanks for the clarification.  This is OK for the trunk.  It looks
> > >> generic enough to have value going forward now rather than waiting.
> > >
> > > Rebased, retested, and applied to trunk.  Thanks!
> >
> > Our CI found a couple of tests that started failing on aarch64-linux
> > after this commit. I was able to confirm manually that they don't happen
> > in the commit immediately before this one, and also that these failures
> > are still present in today's trunk.
> >
> > I have testsuite logs for last good commit, first bad commit and current
> > trunk here:
> >
> > https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
> >
> > Could you please check?
> >
> > These are the new failures:
> >
> > Running gcc:gcc.target/aarch64/aarch64.exp ...
> > FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times 
> > mov\\tx11, sp 1
>
> So for the above before this change we had:
> ```
> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
> (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
>  (nil))
> (insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
> (unspec:BLK [
> (reg:DI 11 x11)
> (reg/f:DI 31 sp)
> ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
> {stack_tie}
>  (expr_list:REG_DEAD (reg:DI 11 x11)
> (nil)))
> ```
>
> After we get:
> ```
> (insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
> (unspec:BLK [
> (reg:DI 31 sp [11]) repeated x2
> ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
> {stack_tie}
>  (nil))
> ```
> Which seems to be ok, except we still have:
> .cfi_def_cfa_register 11
>
> That is because on:
> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
> (plus:DI (reg:DI 12 x12)
> (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
> 153 {*adddi3_aarch64}
>  (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
> (nil)))
>
> We record x11 in the CFA note but never update it, even though that
> insn comes before the mov to x11 ... So it seems cprop_hardreg had no
> idea it needed to update it.
>
> I suspect the other testcases just show sp being propagated into the
> stores and such, and only need updating. But the testcase above seems
> to end up with broken CFI, though I don't know how to fix it.

The code from aarch64.cc:
```
  /* This is done to provide unwinding information for the stack
 adjustments we're about to do, however to prevent the optimizers
 from removing the R11 move and leaving the CFA note (which would be
 very wrong) we tie the old and new stack pointer together.
 The tie will expand to nothing but the optimizers will not touch
 the instruction.  */
  rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
  emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
  emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));

  /* We want the CFA independent of the stack pointer for the
 duration of the loop.  */
  add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
  RTX_FRAME_RELATED_P (insn) = 1;
```

Well, except that with this change the optimizers do touch this
instruction. Maybe the move instruction should not be a move but an
unspec, so the optimizers don't know what the move was.
Adding Tamar to the CC, who originally added this code to aarch64, for
comments on the above understanding.

Thanks,
Andrew


>
> Thanks,
> Andrew Pinski
>
>
> >
> > Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> > FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
> > -fno-stack-protector  check-function-bodies caller_pred
> > FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
> > -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
> > -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
> > -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
> > #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: 

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
 wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich  writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> >> > propagation.
> >> Thanks for the clarification.  This is OK for the trunk.  It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk.  Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, 
> sp 1

So for the above before this change we had:
```
(insn:TI 597 596 598 2 (set (reg:DI 11 x11)
(reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
 (nil))
(insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
(unspec:BLK [
(reg:DI 11 x11)
(reg/f:DI 31 sp)
] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
{stack_tie}
 (expr_list:REG_DEAD (reg:DI 11 x11)
(nil)))
```

After we get:
```
(insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
(unspec:BLK [
(reg:DI 31 sp [11]) repeated x2
] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
{stack_tie}
 (nil))
```
Which seems to be ok, except we still have:
.cfi_def_cfa_register 11

That is because on:
(insn/f 596 595 598 2 (set (reg:DI 12 x12)
(plus:DI (reg:DI 12 x12)
(const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
153 {*adddi3_aarch64}
 (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
(nil)))

We record x11 in the CFA note but never update it, even though that
insn comes before the mov to x11 ... So it seems cprop_hardreg had no
idea it needed to update it.

I suspect the other testcases just show sp being propagated into the
stores and such, and only need updating. But the testcase above seems
to end up with broken CFI, though I don't know how to fix it.
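
To make the suspected failure mode concrete, here is a toy model in plain C++. It has nothing to do with the real RTL data structures or with regcprop.cc; `ToyInsn`, `propagate_and_delete` and `has_dangling_note` are invented for illustration. It only shows how a copy can be propagated away and deleted while a note on an earlier insn still names the copy's destination.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: an optional register copy (dest <- copy_src), the
// registers it reads, and an optional register named by an attached note
// (standing in for REG_CFA_DEF_CFA).
struct ToyInsn {
  std::string dest;               // empty if nothing is defined
  std::string copy_src;           // non-empty only for plain copies
  std::vector<std::string> uses;  // registers the insn reads
  std::string note_reg;           // register named by a note, if any
  bool deleted = false;
};

// True if some live insn at index >= from still reads REG.
static bool still_read(const std::vector<ToyInsn>& insns, std::size_t from,
                       const std::string& reg)
{
  for (std::size_t j = from; j < insns.size(); ++j)
    if (!insns[j].deleted)
      for (const std::string& u : insns[j].uses)
        if (u == reg)
          return true;
  return false;
}

// Forward-propagate copies through later uses (straight-line code, no
// redefinitions), then delete copies whose destination is no longer read.
// Notes are deliberately ignored, which is the point of the sketch.
void propagate_and_delete(std::vector<ToyInsn>& insns)
{
  for (std::size_t i = 0; i < insns.size(); ++i)
    {
      if (insns[i].copy_src.empty())
        continue;
      for (std::size_t j = i + 1; j < insns.size(); ++j)
        for (std::string& u : insns[j].uses)
          if (u == insns[i].dest)
            u = insns[i].copy_src;
      if (!still_read(insns, i + 1, insns[i].dest))
        insns[i].deleted = true;
    }
}

// True if a live insn carries a note naming a register whose only
// definition (in this toy) was deleted.
bool has_dangling_note(const std::vector<ToyInsn>& insns)
{
  for (const ToyInsn& user : insns)
    if (!user.deleted && !user.note_reg.empty())
      for (const ToyInsn& def : insns)
        if (def.deleted && def.dest == user.note_reg)
          return true;
  return false;
}
```

Feeding it three insns shaped like 596 (add to x12 with a note naming x11), 597 (x11 <- sp) and 598 (a tie reading x11 and sp) leaves the tie reading sp twice, the copy deleted, and the note still naming x11, which mirrors the dumps above.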

Thanks,
Andrew Pinski


>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
> -fno-stack-protector  check-function-bodies caller_pred
> FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
> #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 

[Bug c++/110315] New: [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90

2023-06-19 Thread glebfm at altlinux dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110315

Bug ID: 110315
   Summary: [13 Regression] g++ crashes with a segmentation fault
while compiling a large const std::vector of
std::string since r13-6566-ge0324e2629e25a90
   Product: gcc
   Version: 13.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glebfm at altlinux dot org
  Target Milestone: ---

Created attachment 55366
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55366&action=edit
preprocessed source

$ ~/gcc-test/bin/g++ --version
g++ (GCC) 13.1.1 20230619
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ~/gcc-test/bin/g++ -O -c test.ii
g++: internal compiler error: Segmentation fault signal terminated program
cc1plus
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
See <https://gcc.gnu.org/bugs/> for instructions.

$ gdb -q --args
/usr/src/gcc-test/libexec/gcc/x86_64-pc-linux-gnu/13.1.1/cc1plus -quiet -O
test.cpp -o test.o
Reading symbols from
/usr/src/gcc-test/libexec/gcc/x86_64-pc-linux-gnu/13.1.1/cc1plus...
(gdb) run
Starting program:
/usr/src/gcc-test/libexec/gcc/x86_64-pc-linux-gnu/13.1.1/cc1plus -quiet -O
test.cpp -o test.o
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x01aafe51 in fold_using_range::range_of_phi
(this=this@entry=0x7bfffcaf, r=warning: RTTI symbol not found for class
'int_range<255u>'
..., phi=phi@entry=0x72d4c000, src=...) at
../../gcc/gimple-range-fold.cc:733
733 {
(gdb) bt
#0  0x01aafe51 in fold_using_range::range_of_phi
(this=this@entry=0x7bfffcaf, r=warning: RTTI symbol not found for class
'int_range<255u>'
..., phi=phi@entry=0x72d4c000, src=...)
at ../../gcc/gimple-range-fold.cc:733
#1  0x01ab2289 in fold_using_range::fold_stmt (this=0x7bfffcaf,
r=warning: RTTI symbol not found for class 'int_range<255u>'
..., s=0x72d4c000, src=..., name=0x72d4b480) at
../../gcc/gimple-range-fold.cc:491
#2  0x01aa5292 in gimple_ranger::fold_range_internal
(name=0x72d4b480, s=0x72d4c000, r=warning: RTTI symbol not found for
class 'int_range<255u>'
..., this=0x29069a0) at ../../gcc/gimple-range.cc:257
#3  gimple_ranger::range_of_stmt (this=0x29069a0, r=warning: RTTI symbol not
found for class 'int_range<255u>'
..., s=0x72d4c000, name=) at ../../gcc/gimple-range.cc:318
#4  0x01aa3fbb in gimple_ranger::range_on_entry
(this=this@entry=0x29069a0, r=warning: RTTI symbol not found for class
'int_range<255u>'
..., bb=0x734b0720, name=name@entry=0x72d4b480)
at ../../gcc/gimple-range.cc:153
#5  0x01aa6cff in gimple_ranger::range_of_expr (this=0x29069a0,
r=warning: RTTI symbol not found for class 'int_range<255u>'
..., expr=0x72d4b480, stmt=) at
../../gcc/gimple-range.cc:130
#6  0x01aa418a in gimple_ranger::range_on_exit (this=0x29069a0,
r=warning: RTTI symbol not found for class 'int_range<255u>'
..., bb=0x734b0720, name=0x72d4b480) at ../../gcc/gimple-range.cc:187
#7  0x01aa728a in gimple_ranger::range_on_edge (this=0x29069a0,
r=warning: RTTI symbol not found for class 'int_range<255u>'
..., e=0x7325a720, name=0x72d4b480) at ../../gcc/gimple-range.cc:233
#8  0x01ab012f in fold_using_range::range_of_phi
(this=this@entry=0x7c0073af, r=warning: RTTI symbol not found for class
'int_range<255u>'
..., phi=phi@entry=0x72d4c100, src=...) at ../../gcc/value-range.h:634
#9  0x01ab2289 in fold_using_range::fold_stmt (this=0x7c0073af,
r=warning: RTTI symbol not found for class 'int_range<255u>'
..., s=0x72d4c100, src=..., name=0x72d4b4c8) at
../../gcc/gimple-range-fold.cc:491
#10 0x01aa5292 in gimple_ranger::fold_range_internal
(name=0x72d4b4c8, s=0x72d4c100, r=warning: RTTI symbol not found for
class 'int_range<255u>'
#11 gimple_ranger::range_of_stmt (this=0x29069a0, r=warning: RTTI symbol not
found for class 'int_range<255u>'
..., s=0x72d4c100, name=) at ../../gcc/gimple-range.cc:318
#12 0x01aa3fbb in gimple_ranger::range_on_entry
(this=this@entry=0x29069a0, r=warning: RTTI symbol not found for class
'int_range<255u>'
..., bb=0x734b0900, name=name@entry=0x72d4b4c8)
at ../../gcc/gimple-range.cc:153
#13 0x01aa6cff in gimple_ranger::range_of_expr (this=0x29069a0,
r=warning: RTTI symbol not found for class 'int_range<255u>'
..., expr=0x72d4b4c8, stmt=) at
../../gcc/gimple-range.cc:130
#14 0x01aa418a in gimpl

Re: Re: [PATCH] RISC-V: Fix fails of testcases

2023-06-19 Thread 钟居哲
>> Presumably the target selector in the dg-do ensures we only build/run
>> these on the appropriate targets now and we don't need explicitly -march
>> arguments?
Yes. 

>> Assuming that's correct, this is fine for the trunk.
Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-20 07:13
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix fails of testcases
 
 
On 6/19/23 17:04, Juzhe-Zhong wrote:
> FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
> -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for 
> excess errors)
> Excess errors:
> xgcc: fatal error: Cannot find suitable multilib set for 
> '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
> compilation terminated.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: 
> Ditto.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.
Presumably the target selector in the dg-do ensures we only build/run 
these on the appropriate targets now and we don't need explicitly -march 
arguments?
 
Assuming that's correct, this is fine for the trunk.
 
jeff
 


Re: [PATCH] RISC-V: Fix fails of testcases

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 17:04, Juzhe-Zhong wrote:

FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
Excess errors:
xgcc: fatal error: Cannot find suitable multilib set for 
'-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
compilation terminated.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.
Presumably the target selector in the dg-do ensures we only build/run 
these on the appropriate targets now and we don't need explicitly -march 
arguments?


Assuming that's correct, this is fine for the trunk.

jeff


[PATCH] RISC-V: Fix fails of testcases

2023-06-19 Thread Juzhe-Zhong
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
Excess errors:
xgcc: fatal error: Cannot find suitable multilib set for 
'-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
compilation terminated.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.

---
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c| 2 +-
 .../riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c| 2 +-
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
index 82bf6d674ec..dd22dae5eb9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-std=c99 -march=rv64gcv -Wno-pedantic" } */
+/* { dg-additional-options "-std=c99 -Wno-pedantic" } */
 
 #include 
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c
index a0b2cf97afe..db54acc6535 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run {target { riscv_zvfh_hw } } } */
-/* { dg-additional-options "-march=rv64gcv_zvfh -Wno-pedantic" } */
+/* { dg-additional-options "-Wno-pedantic" } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
index 7e5e0e69d51..bf04a3d029e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-std=c99 -march=rv64gcv -Wno-pedantic" } */
+/* { dg-additional-options "-std=c99 -Wno-pedantic" } */
 
 #include 
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c
index bf514f9426b..df8363e0428 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_zvfh_hw } } } */
-/* { dg-additional-options "-march=rv64gcv_zvfh -Wno-pedantic" } */
+/* { dg-additional-options "-Wno-pedantic" } */
 
 #include 
 
-- 
2.36.1



[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-19 Thread matoro_gcc_bugzilla at matoro dot tk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307

--- Comment #4 from matoro  ---
(In reply to Alexander Monakov from comment #3)
> Do you have older versions of GCC to check on this testcase?

No.  It is the same reason I didn't get a complete backtrace: it takes a
while to compile on this machine.  I can go ahead and kick it off though, and
update with results as I find them.

[PR target/110201] Fix operand types for various scalar crypto insns

2023-06-19 Thread Jeff Law via Gcc-patches


A handful of the scalar crypto instructions are supposed to take a 
constant integer argument 0..3 inclusive.  A suitable constraint was 
created and used for this purpose (D03), but the operand's predicate is 
"register_operand".  That's just wrong.


This patch adds a new predicate "const_0_3_operand" and fixes the 
relevant insns to use it.  One could argue the constraint is redundant 
now (and you'd be correct).  I wouldn't lose sleep if someone wanted 
that removed, in which case I'll spin up a V2.


The testsuite was broken in a way that made it consistent with the 
compiler, so the tests passed, when they really should have been issuing 
errors all along.


This patch adjusts the existing tests so that they all expect a 
diagnostic on the invalid operand usage (including out of range 
constants).  It adds new tests with proper constants, testing the 
extremes of valid values.


OK for the trunk, or should we remove the D03 constraint?

Jeff

PR target/110201
gcc/
* config/riscv/predicates.md (const_0_3_operand): New predicate.
* config/riscv/crypto.md (riscv_aes32dsi): Use new predicate.
(riscv_aes32dsmi, riscv_aes32esi, riscvaes32esmi): Likewise.
(riscv_sm4ed_, riscv_sm4ks_"
   [(set (match_operand:X 0 "register_operand" "=r")
 (unspec:X [(match_operand:X 1 "register_operand" "r")
   (match_operand:X 2 "register_operand" "r")
-  (match_operand:SI 3 "register_operand" "D03")]
+  (match_operand:SI 3 "const_0_3_operand" "D03")]
   UNSPEC_SM4_ED))]
   "TARGET_ZKSED"
   "sm4ed\t%0,%1,%2,%3"
@@ -404,7 +404,7 @@ (define_insn "riscv_sm4ks_"
   [(set (match_operand:X 0 "register_operand" "=r")
 (unspec:X [(match_operand:X 1 "register_operand" "r")
   (match_operand:X 2 "register_operand" "r")
-  (match_operand:SI 3 "register_operand" "D03")]
+  (match_operand:SI 3 "const_0_3_operand" "D03")]
   UNSPEC_SM4_KS))]
   "TARGET_ZKSED"
   "sm4ks\t%0,%1,%2,%3"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 04ca6ceabc7..7aed71b5123 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -45,6 +45,10 @@ (define_predicate "const_csr_operand"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 31)")))
 
+(define_predicate "const_0_3_operand"
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (INTVAL (op), 0, 3)")))
+
 (define_predicate "csr_operand"
   (ior (match_operand 0 "const_csr_operand")
(match_operand 0 "register_operand")))
diff --git a/gcc/testsuite/gcc.target/riscv/zknd32-2.c 
b/gcc/testsuite/gcc.target/riscv/zknd32-2.c
new file mode 100644
index 000..f8e68c6e56b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zknd32-2.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gc_zknd -mabi=ilp32d" } */
+/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
+
+#include 
+
+int32_t foo1(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsi(rs1,rs2,0);
+}
+
+int32_t foo2(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsmi(rs1,rs2,0);
+}
+
+int32_t foo3(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsi(rs1,rs2,3);
+}
+
+int32_t foo4(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsmi(rs1,rs2,3);
+}
+
+/* { dg-final { scan-assembler-times "aes32dsi" 2 } } */
+/* { dg-final { scan-assembler-times "aes32dsmi" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zknd32.c 
b/gcc/testsuite/gcc.target/riscv/zknd32.c
index 5fcc66da901..7370a2c1812 100644
--- a/gcc/testsuite/gcc.target/riscv/zknd32.c
+++ b/gcc/testsuite/gcc.target/riscv/zknd32.c
@@ -6,13 +6,30 @@
 
 int32_t foo1(int32_t rs1, int32_t rs2, int bs)
 {
-return __builtin_riscv_aes32dsi(rs1,rs2,bs);
+return __builtin_riscv_aes32dsi(rs1,rs2,bs); /* { dg-error "invalid 
argument to built-in function" } */
 }
 
 int32_t foo2(int32_t rs1, int32_t rs2, int bs)
 {
-return __builtin_riscv_aes32dsmi(rs1,rs2,bs);
+return __builtin_riscv_aes32dsmi(rs1,rs2,bs); /* { dg-error "invalid 
argument to built-in function" } */
 }
 
-/* { dg-final { scan-assembler-times "aes32dsi" 1 } } */
-/* { dg-final { scan-assembler-times "aes32dsmi" 1 } } */
+int32_t foo3(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsi(rs1,rs2,-1); /* { dg-error "invalid 
argument to built-in function" } */
+}
+
+int32_t foo4(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsmi(rs1,rs2,-1); /* { dg-error "invalid 
argument to built-in function" } */
+}
+
+int32_t foo5(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsi(rs1,rs2,4); /* { dg-error "invalid 
argument to built-in function" } */
+}
+
+int32_t foo6(int32_t rs1, int32_t rs2)
+{
+return __builtin_riscv_aes32dsmi(rs1,rs2,4); /* { dg-error "invalid 
argument to built-in function" } */
+}
diff --git 

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307

--- Comment #3 from Alexander Monakov  ---
Do you have older versions of GCC to check on this testcase?

Re: [PATCH] RISC-V: Add VLS modes for GNU vectors

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 15:45, 钟居哲 wrote:

Hi, Jeff.
Thanks for the comment.

I added INCLUDE_ALGORITHM because I use std::min.
Compilation failed when I didn't add INCLUDE_ALGORITHM.

Is INCLUDE_ALGORITHM so expensive that you don't want it?


It just stood out as unexpected.  There are no concerns with std::min and 
the like.


Jeff



Re: [PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-19 Thread 钟居哲
Hi, this patch refactors the gimple IR generation code in tree-vect-stmts.cc.

I realize the code changes a lot, and I am not sure whether you are happy 
with that.

Originally, the code looked like this:

if (final_mask)
  {
    generate IFN_MASK_LOAD...
  }
else if (loop_len)
  {
    generate IFN_LEN_LOAD...
    handle BIAS
  }
else
  {
    NORMAL_LOAD
  }

Now I have refactored it to:

if (final_mask || loop_len)
  {
    if (get_len_load_store ().exists ())
      {
        /* LEN_MASK_LOAD or LEN_LOAD */
        get len...
        if (LEN_MASK_LOAD)
          {
            get mask...
            generate IFN_LEN_MASK_LOAD...
          }
        else
          {
            generate IFN_LEN_LOAD...
          }
        handle BIAS
      }
    else
      {
        gcc_assert (final_mask)
        /* MASK_LOAD */
      }
  }
else
  {
    NORMAL_LOAD
  }

I refactored it because I found that LEN_MASK_LOAD and LEN_LOAD share some 
common code.
Avoiding the duplication makes the code easier to follow.

Bootstrap and regression testing are in progress.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-20 00:17
To: gcc-patches
CC: rguenther; richard.sandiford; Ju-Zhe Zhong
Subject: [PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
From: Ju-Zhe Zhong 
 
This patch applies LEN_MASK_{LOAD,STORE} in the vectorizer.
I refactored the gimple IR building code to make it cleaner.
 
gcc/ChangeLog:
 
* internal-fn.cc (expand_partial_store_optab_fn): Add 
LEN_MASK_{LOAD,STORE} vectorizer support.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* optabs-query.cc (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
(vectorizable_load): Ditto.
 
---
gcc/internal-fn.cc |  35 +-
gcc/optabs-query.cc|  25 +++-
gcc/tree-vect-stmts.cc | 259 +
3 files changed, 213 insertions(+), 106 deletions(-)
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..e10c21de5f1 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
  * OPTAB.  */
static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab 
optab)
{
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   insn_code icode;
   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
 case IFN_GATHER_LOAD:
 case IFN_MASK_GATHER_LOAD:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   return true;
 default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   return true;
 default:
@@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_STORE_LANES:
   return 2;
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
+  return 3;
+
 case IFN_MASK_GATHER_LOAD:
 case IFN_MASK_SCATTER_STORE:
   return 4;
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
 case IFN_LEN_STORE:
   return 3;
+case IFN_LEN_MASK_STORE:
+  return 4;
+
 default:
   return -1;
 }
@@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, 
machine_mode mode)
{
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_argno = 3;
+  if (icode == CODE_FOR_nothing)
+{
+  machine_mode mask_mode
+ = targetm.vectorize.get_mask_mode (mode).require ();
+  if (ifn == IFN_LEN_LOAD)
+ {
+   /* Try LEN_MASK_LOAD.  */
+   optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+ }
+  else
+ {
+   /* Try LEN_MASK_STORE.  */
+   optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+ }
+  icode = convert_optab_handler (optab, mode, mask_mode);
+  bias_argno = 4;
+}
   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
-  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (0)))
return 0;
-  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (-1)))
return -1;
 }
diff --git a/gcc/optabs-query.cc 

[Bug middle-end/110282] Segmentation fault with specific optimizations

2023-06-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110282

--- Comment #5 from Andrew Pinski  ---
Note I suspect r12-248-gb58dc0b803057c0e6032e0d9b made the problem latent in
GCC 12+. But turning off DSE in GCC 12.1.0 does not reproduce the bug 

Re: Different ASM for ReLU function between GCC11 and GCC12

2023-06-19 Thread Marc Glisse via Gcc

On Mon, 19 Jun 2023, André Günther via Gcc wrote:


I noticed that a simple function like
auto relu( float x ) {
   return x > 0.f ? x : 0.f;
}
compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On
-O3 -mavx2 the former compiles above function to

relu(float):
   vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
   ret
.LC0:
   .long 0

which is what I would naively expect and what also clang essentially does
(clang actually uses an xor before the maxss to get the zero). The latter,
however, compiles the function to

relu(float):
   vxorps xmm1, xmm1, xmm1
   vcmpltss xmm2, xmm1, xmm0
   vblendvps xmm0, xmm1, xmm0, xmm2
   ret

which looks like a missed optimisation. Does anyone know if there's a
reason for the changed behaviour?


With -fno-signed-zeros -ffinite-math-only, gcc-12 still uses max instead 
of cmp+blend. So the first thing to check would be if both versions give 
the same result on negative 0 and NaN.


--
Marc Glisse


Re: Re: [PATCH] RISC-V: Add VLS modes for GNU vectors

2023-06-19 Thread 钟居哲
Hi, Jeff.
Thanks for the comment.

I added INCLUDE_ALGORITHM because I use std::min.
Compilation failed when I didn't add INCLUDE_ALGORITHM.

Is INCLUDE_ALGORITHM so expensive that you don't want it?


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-06-20 02:25
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add VLS modes for GNU vectors
 
 
On 6/18/23 17:06, Juzhe-Zhong wrote:
> This is a proposal patch that is **NOT** ready to push, since
> after this patch the total number of machine modes will exceed 255, which
> causes an ICE in LTO:
>   internal compiler error: in bp_pack_int_in_range, at data-streamer.h:290
Right.  Note that an ack from Jakub or Richi will be sufficient for the 
LTO fixes to go forward.
 
 
> 
> We need to add VLS modes for the following reasons:
> 1. Enhance GNU vectors codegen:
> For example:
>   typedef int32_t vnx8si __attribute__ ((vector_size (32)));
> 
>   __attribute__ ((noipa)) void
>   f_vnx8si (int32_t * in, int32_t * out)
>   {
> vnx8si v = *(vnx8si*)in;
> *(vnx8si *) out = v;
>   }
> 
> compile option: --param=riscv-autovec-preference=scalable
> Before this patch:
> f_vnx8si:
>  ld      a2,0(a0)
>  ld      a3,8(a0)
>  ld      a4,16(a0)
>  ld      a5,24(a0)
>  addi    sp,sp,-32
>  sd      a2,0(a1)
>  sd      a3,8(a1)
>  sd      a4,16(a1)
>  sd      a5,24(a1)
>  addi    sp,sp,32
>  jr      ra
> 
> After this patch:
> f_vnx8si:
>  vsetivli zero,8,e32,m2,ta,ma
>  vle32.v v2,0(a0)
>  vse32.v v2,0(a1)
>  ret
> 
> 2. Enhance VLA SLP:
> void
> f (uint8_t *restrict a, uint8_t *restrict b, uint8_t *restrict c)
> {
>for (int i = 0; i < 100; ++i)
>  {
>a[i * 8] = b[i * 8] + c[i * 8];
>a[i * 8 + 1] = b[i * 8] + c[i * 8 + 1];
>a[i * 8 + 2] = b[i * 8 + 2] + c[i * 8 + 2];
>a[i * 8 + 3] = b[i * 8 + 2] + c[i * 8 + 3];
>a[i * 8 + 4] = b[i * 8 + 4] + c[i * 8 + 4];
>a[i * 8 + 5] = b[i * 8 + 4] + c[i * 8 + 5];
>a[i * 8 + 6] = b[i * 8 + 6] + c[i * 8 + 6];
>a[i * 8 + 7] = b[i * 8 + 6] + c[i * 8 + 7];
>  }
> }
> 
> 
> ..
> Loop body:
>   ...
>   vrgatherei16.vv...
>   ...
> 
> Tail:
>  lbu     a4,792(a1)
>  lbu     a5,792(a2)
>  addw    a5,a5,a4
>  sb      a5,792(a0)
>  lbu     a5,793(a2)
>  addw    a5,a5,a4
>  sb      a5,793(a0)
>  lbu     a4,794(a1)
>  lbu     a5,794(a2)
>  addw    a5,a5,a4
>  sb      a5,794(a0)
>  lbu     a5,795(a2)
>  addw    a5,a5,a4
>  sb      a5,795(a0)
>  lbu     a4,796(a1)
>  lbu     a5,796(a2)
>  addw    a5,a5,a4
>  sb      a5,796(a0)
>  lbu     a5,797(a2)
>  addw    a5,a5,a4
>  sb      a5,797(a0)
>  lbu     a4,798(a1)
>  lbu     a5,798(a2)
>  addw    a5,a5,a4
>  sb      a5,798(a0)
>  lbu     a5,799(a2)
>  addw    a5,a5,a4
>  sb      a5,799(a0)
>  ret
> 
> The tail elements need VLS modes to vectorize like ARM SVE:
> 
> f:
>  mov     x3, 0
>  cntb    x5
>  mov     x4, 792
>  whilelo p7.b, xzr, x4
> .L2:
>  ld1b    z31.b, p7/z, [x1, x3]
>  ld1b    z30.b, p7/z, [x2, x3]
>  trn1    z31.b, z31.b, z31.b
>  add     z31.b, z31.b, z30.b
>  st1b    z31.b, p7, [x0, x3]
>  add     x3, x3, x5
>  whilelo p7.b, x3, x4
>  b.any   .L2
> Tail:
>  ldr b31, [x1, 792]
>  ldr b27, [x1, 794]
>  ldr b28, [x1, 796]
>  dup v31.8b, v31.b[0]
>  ldr b29, [x1, 798]
>  ldr d30, [x2, 792]
>  ins v31.b[2], v27.b[0]
>  ins v31.b[3], v27.b[0]
>  ins v31.b[4], v28.b[0]
>  ins v31.b[5], v28.b[0]
>  ins v31.b[6], v29.b[0]
>  ins v31.b[7], v29.b[0]
>  add v31.8b, v30.8b, v31.8b
>  str d31, [x0, 792]
>  ret
> 
> Notice that ARM SVE uses ADVSIMD modes (Neon) to vectorize the tail.
 
 
 
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add VLS modes for 
> GNU vectors.
>  (ADJUST_ALIGNMENT): Ditto.
>  (ADJUST_BYTESIZE): Ditto.
> 
>  (ADJUST_PRECISION): Ditto.
>  (VECTOR_MODES): Ditto.
>  * config/riscv/riscv-protos.h (riscv_v_ext_vls_mode_p): Ditto.
>  (get_regno_alignment): Ditto.
>  * config/riscv/riscv-v.cc (INCLUDE_ALGORITHM): Ditto.
>  (const_vlmax_p): Ditto.
>  (legitimize_move): Ditto.
>  (get_vlmul): Ditto.
>  (get_regno_alignment): Ditto.
>  (get_ratio): Ditto.
>  (get_vector_mode): Ditto.
>  * config/riscv/riscv-vector-switch.def (VLS_ENTRY): Ditto.
>  * config/riscv/riscv.cc 

Re: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF

2023-06-19 Thread Jason Merrill via Gcc-patches

On 6/6/23 16:50, Ben Boeckel wrote:

Unicode does not support such values because they are unrepresentable in
UTF-16.


Pushed.


libcpp/

* charset.cc: Reject encodings of codepoints above 0x10FFFF.
UTF-16 does not support such codepoints and therefore all
Unicode rejects such values.

Signed-off-by: Ben Boeckel 
---
  libcpp/charset.cc | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d7f323b2cd5..3b34d804cf1 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
int err = one_utf8_to_cppchar (, , );
if (err)
return false;
+
+  /* Additionally, Unicode declares that all codepoints above 0010FFFF are
+invalid because they cannot be represented in UTF-16.
+
+Reject such values.*/
+  if (cp >= 0x10FFFF)
+   return false;
  }
/* No problems encountered.  */
return true;




Re: [PATCH v5 3/5] p1689r5: initial support

2023-06-19 Thread Jason Merrill via Gcc-patches

On 5/12/23 10:24, Ben Boeckel wrote:

On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote:

I notice that the actual flags are all -fdep-*, though some of them are
-fdeps-* here, and the internal variables all seem to be fdeps_*.  I
lean toward harmonizing on "deps", I think.


Done.


I don't love the three separate options, but I suppose it's fine.  I'd
prefer "target" instead of "output".


Done.


It should be possible to omit both -file and -target and get reasonable
defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options.


`file` can be omitted (the `output_stream` will be used then). I *think*
I see that adding:

 %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}}


%{!fdeps-file: but yes.


would at least do for `-fdeps-file` defaults? I don't know if there's a
reasonable default for `-fdeps-target=` though given that this command
line has no information about the object file that will be used (`-o` is
used for preprocessor output since we're leaning on `-E` here).


I would think it could default to %b.o?

I had quite a few more comments on the v5 patch that you didn't respond 
to here or address in the v6 patch; did your mail client hide them from you?


Jason



[PATCH 12/14] OpenACC: "declare create" fixes wrt. "allocatable" variables

2023-06-19 Thread Julian Brown
This patch fixes a case revealed by the previous patch where a synthetic
"acc data" region created for a "declare create" variable could interact
strangely with lexical inheritance behaviour.  In fact, it doesn't seem
right to create the "acc data" region for allocatable variables at all
-- doing so means that a data region is likely to be created for an
unallocated variable.

The fix is not to add such variables to the synthetic "acc data" region
at all, and defer to the code that performs "enter data"/"exit data"
for them when allocated/deallocated on the host instead. Then, "declare
create" variables are implicitly turned into "present" clauses on in-scope
offload regions.

2023-06-16  Julian Brown  

gcc/fortran/
* trans-openmp.cc (gfc_omp_finish_clause): Handle "declare create" for
scalar allocatable variables.
(gfc_trans_omp_clauses): Don't include allocatable vars in synthetic
"acc data" region created for "declare create" variables.  Mark such
variables with the "oacc declare create" attribute instead.  Don't
create ALWAYS_POINTER mapping for target-to-host updates of declare
create variables.
(gfc_trans_oacc_declare): Handle empty clause list.

gcc/
* gimplify.cc (gimplify_adjust_omp_clauses_1): Handle "oacc declare
create" attribute.

libgomp/
* testsuite/libgomp.oacc-fortran/declare-create-1.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-create-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-create-3.f90: New test.
---
 gcc/fortran/trans-openmp.cc   | 45 ---
 gcc/gimplify.cc   |  8 
 .../libgomp.oacc-fortran/declare-create-1.f90 | 21 +
 .../libgomp.oacc-fortran/declare-create-2.f90 | 25 +++
 .../libgomp.oacc-fortran/declare-create-3.f90 | 25 +++
 5 files changed, 119 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-2.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-3.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 1a14d2bc068..819d79cda28 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1619,7 +1619,16 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   orig_decl = decl;
 
   c4 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
+  if (openacc
+ && GFC_DECL_GET_SCALAR_ALLOCATABLE (decl)
+ && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_PRESENT)
+   /* This allows "declare create" to work for scalar allocatables.  The
+  resulting mapping nodes are:
+force_present(*var) firstprivate_pointer(var)
+  which is the same as an explicit "present" clause gives.  */
+   OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_FIRSTPRIVATE_POINTER);
+  else
+   OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER);
   OMP_CLAUSE_DECL (c4) = decl;
   OMP_CLAUSE_SIZE (c4) = size_int (0);
   decl = build_fold_indirect_ref (decl);
@@ -4588,6 +4597,29 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  if (!n->sym->attr.referenced)
continue;
 
+ /* We do not want to include allocatable vars in a synthetic
+"acc data" region created for "!$acc declare create" vars.
+Such variables are handled by augmenting allocate/deallocate
+statements elsewhere (with
+"acc enter data declare_allocate(...)", etc.).  */
+ if (op == EXEC_OACC_DECLARE
+ && n->u.map_op == OMP_MAP_ALLOC
+ && n->sym->attr.allocatable
+ && n->sym->attr.oacc_declare_create)
+   {
+ tree tree_var = gfc_get_symbol_decl (n->sym);
+ if (!lookup_attribute ("oacc declare create",
+DECL_ATTRIBUTES (tree_var)))
+   DECL_ATTRIBUTES (tree_var)
+ = tree_cons (get_identifier ("oacc declare create"),
+  NULL_TREE, DECL_ATTRIBUTES (tree_var));
+ /* We might need to turn what would normally be a
+"firstprivate" mapping into a "present" mapping.  For the
+latter, we need the decl to be addressable.  */
+ TREE_ADDRESSABLE (tree_var) = 1;
+ continue;
+   }
+
  bool always_modifier = false;
  tree node = build_omp_clause (input_location, OMP_CLAUSE_MAP);
  tree node2 = NULL_TREE;
@@ -4780,7 +4812,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  tree orig_decl = decl;
 

[PATCH 11/14] OpenACC: Reimplement "inheritance" for lexically-nested offload regions

2023-06-19 Thread Julian Brown
This patch reimplements "lexical inheritance" for OpenACC offload regions
inside "data" regions, allowing e.g. this to work:

  int *ptr;
  [...]
  #pragma acc data copyin(ptr[10:2])
  {
#pragma acc parallel
{ ... }
  }

here, the "copyin" is mirrored on the inner "acc parallel" as
"present(ptr[10:2])" -- allowing code within the parallel to use that
section of the array even though the mapping is implicit.

In terms of implementation, this works by expanding mapping nodes for
"acc data" to include pointer mappings that might be needed by inner
offload regions. The resulting mapping group is then copied to the inner
offload region as needed, rewriting the first node to "force_present".
The pointer mapping nodes are then removed from the "acc data" later
during gimplification.

For OpenMP, pointer mapping nodes on equivalent "omp data" regions are
not needed, so remain suppressed during expansion.

2023-06-16  Julian Brown  

gcc/c-family/
* c-omp.cc (c_omp_address_inspector::expand_array_base): Don't omit
pointer nodes for OpenACC.

gcc/
* gimplify.cc (omp_tsort_mark, omp_mapping_group): Move before
gimplify_omp_ctx. Add constructor to omp_mapping_group.
(gimplify_omp_ctx): Add DECL_DATA_CLAUSE field.
(new_omp_context, delete_omp_context): Initialise and free above field.
(omp_gather_mapping_groups_1): Use constructor for omp_mapping_group.
(gimplify_scan_omp_clauses): Record mappings that might be lexically
inherited.  Don't remove
GOMP_MAP_FIRSTPRIVATE_POINTER/GOMP_MAP_FIRSTPRIVATE_REFERENCE yet.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Implement lexical inheritance
behaviour for OpenACC.
(gimplify_adjust_omp_clauses): Remove
GOMP_MAP_FIRSTPRIVATE_POINTER/GOMP_MAP_FIRSTPRIVATE_REFERENCE here
instead, after lexical inheritance is done.

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: Re-enable scan test.
* gfortran.dg/goacc/pr70828.f90: Likewise.
* gfortran.dg/goacc/assumed-size.f90: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: Un-XFAIL.
* testsuite/libgomp.oacc-c-c++-common/pr70828-2.c: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: Un-XFAIL.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: Un-XFAIL.
---
 gcc/c-family/c-omp.cc |  13 +-
 gcc/gimplify.cc   | 208 +-
 .../c-c++-common/goacc/acc-data-chain.c   |   4 +-
 .../gfortran.dg/goacc/assumed-size.f90|  35 +++
 gcc/testsuite/gfortran.dg/goacc/pr70828.f90   |   3 +-
 .../libgomp.oacc-c-c++-common/pr70828-2.c |   2 -
 .../libgomp.oacc-c-c++-common/pr70828.c   |   2 -
 .../libgomp.oacc-fortran/pr70828-2.f90|   2 -
 .../libgomp.oacc-fortran/pr70828-3.f90|   2 -
 .../libgomp.oacc-fortran/pr70828-4.f90|   2 -
 .../libgomp.oacc-fortran/pr70828-5.f90|   2 -
 .../libgomp.oacc-fortran/pr70828-6.f90|   2 -
 .../libgomp.oacc-fortran/pr70828.f90  |   2 -
 13 files changed, 202 insertions(+), 77 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/assumed-size.f90

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index e55b2aec920..291a26293ef 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -4313,7 +4313,8 @@ c_omp_address_inspector::expand_array_base (tree c,
/* The code handling "firstprivatize_array_bases" in gimplify.cc is
   relevant here.  What do we need to create for arrays at this
   stage?  (This condition doesn't feel quite right.  FIXME?)  */
-   if (!target
+   if (openmp
+   && !target
&& (TREE_CODE (TREE_TYPE (addr_tokens[i + 1]->expr))
== ARRAY_TYPE))
  break;
@@ -4324,7 +4325,7 @@ c_omp_address_inspector::expand_array_base (tree c,
   virtual_origin);
tree data_addr = omp_accessed_addr (addr_tokens, i + 1, expr);
c2 = build_omp_clause (loc, OMP_CLAUSE_MAP);
-   if (decl_p && target)
+   if (decl_p && (!openmp || target))
  OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
else
  {
@@ -4375,9 +4376,11 @@ c_omp_address_inspector::expand_array_base (tree c,
tree data_addr = omp_accessed_addr (addr_tokens, last_access, expr);
c2 = build_omp_clause (loc, OMP_CLAUSE_MAP);
/* For OpenACC, use FIRSTPRIVATE_POINTER for decls even on non-compute
-  regions (e.g. "acc data" constructs).  It'll be removed anyway in
-  gimplify.cc, but doing it this way 

[PATCH 10/14] OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc

2023-06-19 Thread Julian Brown
This patch has been separated out from the C++ "declare mapper"
support patch.  It contains just the gimplify.cc rearrangement
work, mostly moving gimplification from gimplify_scan_omp_clauses
to gimplify_adjust_omp_clauses for map clauses.

The motivation for doing this was that we don't know if we need to
instantiate mappers implicitly until the body of an offload region has
been scanned, i.e. in gimplify_adjust_omp_clauses, but we also need the
un-gimplified form of clauses to sort by base-pointer dependencies after
mapper instantiation has taken place.

The patch also reimplements the "present" clause sorting code to avoid
another sorting pass on mapping nodes.
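
As a rough sketch of the list handling involved (not the gimplify.cc
code itself), the segregation into "present" and other groups can be
modelled in plain C with one tail pointer per sub-list.  The enum
values, struct group and the function names below are hypothetical
stand-ins for the real mapping kinds and omp_mapping_group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real map kinds and group type.  */
enum kind { K_TO, K_FROM, K_ALLOC, K_RELEASE, K_PRESENT_TO, K_PRESENT_ALLOC };

struct group
{
  enum kind kind;
  int id;                 /* Original position, used to check ordering.  */
  struct group *next;
};

/* Split INLIST into four sub-lists and splice them back together as:
   "present" to/from, "present" alloc, other to/from, other alloc/release.
   Each sub-list keeps the relative order it had in INLIST.  */
static struct group *
segregate (struct group *inlist)
{
  struct group *ptf = NULL, *pa = NULL, *tf = NULL, *ard = NULL;
  struct group **ptf_tail = &ptf, **pa_tail = &pa;
  struct group **tf_tail = &tf, **ard_tail = &ard;

  for (struct group *w = inlist; w;)
    {
      struct group *next = w->next;
      struct group ***tail;

      switch (w->kind)
        {
        case K_PRESENT_TO:    tail = &ptf_tail; break;
        case K_PRESENT_ALLOC: tail = &pa_tail;  break;
        case K_ALLOC:
        case K_RELEASE:       tail = &ard_tail; break;
        default:              tail = &tf_tail;  break;
        }

      /* Append W to the chosen sub-list and advance that list's tail.  */
      **tail = w;
      w->next = NULL;
      *tail = &w->next;
      w = next;
    }

  /* Splice from the back, so an empty sub-list simply forwards to the
     next non-empty one.  */
  *tf_tail = ard;
  *pa_tail = tf;
  *ptf_tail = pa;
  return ptf;
}

/* Returns the ids in output order, packed as decimal digits.  */
static int
demo_order (void)
{
  struct group g[5] = {
    { K_ALLOC, 1, NULL }, { K_TO, 2, NULL }, { K_PRESENT_ALLOC, 3, NULL },
    { K_PRESENT_TO, 4, NULL }, { K_FROM, 5, NULL }
  };
  for (int i = 0; i < 4; i++)
    g[i].next = &g[i + 1];

  int out = 0;
  for (struct group *w = segregate (&g[0]); w; w = w->next)
    out = out * 10 + w->id;
  return out;
}
```

With the input order alloc, to, present-alloc, present-to, from, the
sketch yields the ids 4, 3, 2, 5, 1: a single pass over the list, with
no separate sort for "present" groups.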

2023-06-16  Julian Brown  

gcc/
* gimplify.cc (omp_segregate_mapping_groups): Handle "present" groups.
(gimplify_scan_omp_clauses): Use mapping group functionality to
iterate through mapping nodes.  Remove most gimplification of
OMP_CLAUSE_MAP nodes from here, but still populate ctx->variables
splay tree.
(gimplify_adjust_omp_clauses): Move most gimplification of
OMP_CLAUSE_MAP nodes here.

gcc/testsuite/
* gfortran.dg/gomp/map-12.f90: Adjust scan output.
---
 gcc/gimplify.cc   | 670 --
 gcc/testsuite/gfortran.dg/gomp/map-12.f90 |   2 +-
 2 files changed, 378 insertions(+), 294 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 9ce1f5b983a..e21e9d99cc9 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -9779,10 +9779,15 @@ omp_tsort_mapping_groups (vec 
*groups,
   return outlist;
 }
 
-/* Split INLIST into two parts, moving groups corresponding to
-   ALLOC/RELEASE/DELETE mappings to one list, and other mappings to another.
-   The former list is then appended to the latter.  Each sub-list retains the
-   order of the original list.
+/* Split INLIST into four parts:
+
+ - "present" to/from groups
+ - "present" alloc groups
+ - other to/from groups
+ - other alloc/release/delete groups
+
+   These sub-lists are then concatenated together to form the final list.
+   Each sub-list retains the order of the original list.
Note that ATTACH nodes are later moved to the end of the list in
gimplify_adjust_omp_clauses, for target regions.  */
 
@@ -9790,7 +9795,9 @@ static omp_mapping_group *
 omp_segregate_mapping_groups (omp_mapping_group *inlist)
 {
   omp_mapping_group *ard_groups = NULL, *tf_groups = NULL;
+  omp_mapping_group *pa_groups = NULL, *ptf_groups = NULL;
   omp_mapping_group **ard_tail = &ard_groups, **tf_tail = &tf_groups;
+  omp_mapping_group **pa_tail = &pa_groups, **ptf_tail = &ptf_groups;
 
   for (omp_mapping_group *w = inlist; w;)
 {
@@ -9809,6 +9816,20 @@ omp_segregate_mapping_groups (omp_mapping_group *inlist)
  ard_tail = &w->next;
  break;
 
+   case GOMP_MAP_PRESENT_ALLOC:
+ *pa_tail = w;
+ w->next = NULL;
+ pa_tail = &w->next;
+ break;
+
+   case GOMP_MAP_PRESENT_FROM:
+   case GOMP_MAP_PRESENT_TO:
+   case GOMP_MAP_PRESENT_TOFROM:
+ *ptf_tail = w;
+ w->next = NULL;
+ ptf_tail = &w->next;
+ break;
+
default:
  *tf_tail = w;
  w->next = NULL;
@@ -9820,8 +9841,10 @@ omp_segregate_mapping_groups (omp_mapping_group *inlist)
 
   /* Now splice the lists together...  */
   *tf_tail = ard_groups;
+  *pa_tail = tf_groups;
+  *ptf_tail = pa_groups;
 
-  return tf_groups;
+  return ptf_groups;
 }
 
 /* Given a list LIST_P containing groups of mappings given by GROUPS, reorder
@@ -11673,119 +11696,30 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
break;
   }
 
-  if (code == OMP_TARGET
-  || code == OMP_TARGET_DATA
-  || code == OMP_TARGET_ENTER_DATA
-  || code == OMP_TARGET_EXIT_DATA)
-{
-  vec *groups;
-  groups = omp_gather_mapping_groups (list_p);
-  if (groups)
-   {
- hash_map *grpmap;
- grpmap = omp_index_mapping_groups (groups);
+  vec *groups = omp_gather_mapping_groups (list_p);
+  hash_map *grpmap = NULL;
+  unsigned grpnum = 0;
+  tree *grp_start_p = NULL, grp_end = NULL_TREE;
 
- omp_resolve_clause_dependencies (code, groups, grpmap);
- omp_build_struct_sibling_lists (code, region_type, groups, &grpmap,
- list_p);
-
- omp_mapping_group *outlist = NULL;
- bool enter_exit = (code == OMP_TARGET_ENTER_DATA
-|| code == OMP_TARGET_EXIT_DATA);
-
- /* Topological sorting may fail if we have duplicate nodes, which
-we should have detected and shown an error for already.  Skip
-sorting in that case.  */
- if (seen_error ())
-   goto failure;
-
- delete grpmap;
- delete groups;
-
- /* Rebuild now we have struct sibling lists.  */
- groups = omp_gather_mapping_groups (list_p);
- grpmap = omp_index_mapping_groups (groups);
-
-

[PATCH 08/14] OpenMP: Pointers and member mappings

2023-06-19 Thread Julian Brown
This patch changes the mapping node arrangement used for array components
of derived types, e.g.:

  type T
  integer, pointer, dimension(:) :: arrptr
  end type T

  type(T) :: tvar
  [...]
  !$omp target map(tofrom: tvar%arrptr)

This will currently be mapped using three mapping nodes:

  GOMP_MAP_TO              tvar%arrptr        (the descriptor)
  GOMP_MAP_TOFROM          *tvar%arrptr%data  (the actual array data)
  GOMP_MAP_ALWAYS_POINTER  tvar%arrptr%data   (a pointer to the array data)

This follows OpenMP 5.0, 2.19.7.1 (or OpenMP 5.2, 5.8.3) "map Clause":

  "If a list item in a map clause is an associated pointer and the
   pointer is not the base pointer of another list item in a map clause
   on the same construct, then it is treated as if its pointer target
   is implicitly mapped in the same clause. For the purposes of the map
   clause, the mapped pointer target is treated as if its base pointer
   is the associated pointer."

However, we can also write this:

  map(to: tvar%arrptr) map(tofrom: tvar%arrptr(3:8))

and then instead we should follow (OpenMP 5.2, 5.8.3 "map Clause"):

  "For map clauses on map-entering constructs, if any list item has a base
   pointer for which a corresponding pointer exists in the data environment
   upon entry to the region and either a new list item or the corresponding
   pointer is created in the device data environment on entry to the region,
   then:
   1. [Fortran] The corresponding pointer variable is associated with
  a pointer target that has the same rank and bounds as the pointer
  target of the original pointer, such that the corresponding list item
  can be accessed through the pointer in a target region.
   2. The corresponding pointer variable becomes an attached pointer
  for the corresponding list item."

With this patch you can write the above mappings, and the mapping nodes
used to map pointers to array sections (with descriptors) now look
like this:

  1) map(to: tvar%arrptr)   -->
  GOMP_MAP_TO [implicit]  *tvar%arrptr%data  (the array data)
  GOMP_MAP_TO_PSET        tvar%arrptr        (the descriptor)
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

  2) map(tofrom: tvar%arrptr(3:8))   -->
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  (size 8-3+1, etc.)
  GOMP_MAP_TO_PSET        tvar%arrptr
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data      (bias 3, etc.)

In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer --
so we drop it entirely.  (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)

In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:

  GOMP_MAP_STRUCT         tvar                  (len: 1)
  GOMP_MAP_TO_PSET        tvar%arrptr           (size: 64, etc.)  <--. moved here
  [...]                                                              |
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  _____________________|
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

In another case, if we have an array of derived-type values "dtarr",
and mappings like:

  i = 1
  j = 1
  map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))

We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i)" and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):

  GOMP_MAP_STRUCT         dtarr                     (len: 2)
  GOMP_MAP_TO_PSET        dtarr(i)%arrptr
  GOMP_MAP_TO_PSET        dtarr(j)%arrptr
  [...]
  GOMP_MAP_TOFROM         *dtarr(j)%arrptr%data(3)  (size: 8-3+1)
  GOMP_MAP_ATTACH_DETACH  dtarr(j)%arrptr%data
  GOMP_MAP_TO [implicit]  *dtarr(i)%arrptr%data(1)  (size: whole array)
  GOMP_MAP_ATTACH_DETACH  dtarr(i)%arrptr%data

Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.
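
The ordering effect can be sketched in C as a stable reorder that keeps
explicit mappings first and pushes "[implicit]" ones to the end.  This
is only a toy model: gimplify.cc achieves the ordering by adjusting its
topological sort, and struct mapping, implicit_last and the fixed-size
scratch buffer are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one mapping group.  */
struct mapping
{
  int id;          /* Position in the original clause list.  */
  bool implicit;   /* Models the "[implicit]" whole-array flag.  */
};

/* Stable reorder: explicit mappings first, implicit mappings last,
   each class keeping its original relative order.  */
static void
implicit_last (struct mapping *m, size_t n)
{
  struct mapping tmp[16];
  size_t k = 0;

  assert (n <= 16);
  for (size_t i = 0; i < n; i++)
    if (!m[i].implicit)
      tmp[k++] = m[i];
  for (size_t i = 0; i < n; i++)
    if (m[i].implicit)
      tmp[k++] = m[i];
  for (size_t i = 0; i < n; i++)
    m[i] = tmp[i];
}

/* Returns the reordered ids packed as decimal digits.  */
static int
demo_implicit_last (void)
{
  struct mapping m[4] = {
    { 1, true }, { 2, false }, { 3, true }, { 4, false }
  };
  implicit_last (m, 4);
  return m[0].id * 1000 + m[1].id * 100 + m[2].id * 10 + m[3].id;
}
```

Here the explicit entries 2 and 4 come out ahead of the implicit
entries 1 and 3, so an array-section mapping is always processed
before a whole-array mapping of what may be the same array.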

The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement.  Also now if you have
mappings like this:

  !$omp target enter data map(to: dv, dv%arr(1:20))

The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:

  GOMP_MAP_TO             dv

  GOMP_MAP_TO             *dv%arr%data
  GOMP_MAP_TO_PSET        dv%arr        <-- deleted (array section mapping)
  GOMP_MAP_ATTACH_DETACH  dv%arr%data

To accommodate recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion.


[PATCH 09/14] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-06-19 Thread Julian Brown
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalents for OpenACC), which are not able
to be sorted into order at compile time.  Normally struct accesses in
such clauses are gathered together and sorted into increasing address
order after a "GOMP_MAP_STRUCT" node: if we have variable indices,
that is no longer possible.

This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so
at compile time.  Rather than silently producing the wrong result for
cases where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).

For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c).  That could perhaps be addressed with a follow-up
patch, if necessary.
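
A C analogue of the permitted variant might look like the sketch below.
It is not copied from the new map-arrayofstruct tests; struct st and
update_first_element are hypothetical names, and the exact clause
spelling is an assumption.  The two indices are only known at run time
and must be equal, as they are when the function is called with i == j.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct with a pointer component.  */
struct st
{
  int *p;
};

/* Map a pointer component and an array section of it through two
   run-time indices, then update one element in the target region.  */
static int
update_first_element (int i, int j)
{
  struct st s[2];
  int result;

  s[0].p = calloc (4, sizeof (int));
  s[1].p = calloc (4, sizeof (int));
  assert (s[0].p && s[1].p);

  /* The compiler cannot prove that i == j here, so the accesses cannot
     be sorted at compile time.  */
#pragma omp target map(s[i].p, s[j].p[0:4])
  {
    s[j].p[0] += 7;
  }

  result = s[j].p[0];
  free (s[0].p);
  free (s[1].p);
  return result;
}
```

Per the description above, calling this with different i and j on an
offload device is the case that now produces a runtime error instead of
a silently wrong result.  Built without OpenMP support the pragma is
ignored and the code simply runs on the host.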

This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support.

2023-06-16  Julian Brown  

gcc/fortran/
* trans-openmp.cc (gfc_omp_deep_map_kind_p): Add GOMP_MAP_STRUCT_UNORD.

gcc/
* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter.
(omp_get_attachment, omp_group_last, omp_group_base,
omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
(omp_accumulate_sibling_list): Update calls to extract_base_bit_offset.
Support GOMP_MAP_STRUCT_UNORD.
(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
GOMP_MAP_STRUCT_UNORD support.
* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
* tree-pretty-print.cc (dump_omp_clause): Likewise.

include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.

libgomp/
* oacc-mem.c (find_group_last, goacc_enter_data_internal,
goacc_exit_data_internal, GOACC_enter_exit_data): Add
GOMP_MAP_STRUCT_UNORD support.
* target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support.
Detect incorrect use of variable indexing of arrays of structs.
(GOMP_target_enter_exit_data, gomp_target_task_fn): Add
GOMP_MAP_STRUCT_UNORD support.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test.
* testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test.
* testsuite/libgomp.fortran/map-subarray-5.f90: New test.
---
 gcc/fortran/trans-openmp.cc   |   1 +
 gcc/gimplify.cc   | 110 ++
 gcc/omp-low.cc|   1 +
 gcc/tree-pretty-print.cc  |   3 +
 include/gomp-constants.h  |   6 +
 libgomp/oacc-mem.c|   6 +-
 libgomp/target.c  |  60 +-
 .../map-arrayofstruct-1.c |  38 ++
 .../map-arrayofstruct-2.c |  58 +
 .../map-arrayofstruct-3.c |  68 +++
 .../libgomp.fortran/map-subarray-5.f90|  54 +
 11 files changed, 378 insertions(+), 27 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index a108f718ffa..1a14d2bc068 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2961,6 +2961,7 @@ gfc_omp_deep_map_kind_p (tree clause)
 case GOMP_MAP_FORCE_TOFROM:
 case GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT:
 case GOMP_MAP_STRUCT:
+case GOMP_MAP_STRUCT_UNORD:
 case GOMP_MAP_ALWAYS_POINTER:
 case GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION:
 case GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION:
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index da81582da1c..9ce1f5b983a 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -8952,7 +8952,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree 
grp_start, tree grp_end,
 
 static tree
 extract_base_bit_offset (tree base, poly_int64 *bitposp,
-poly_offset_int *poffsetp)
+poly_offset_int *poffsetp,
+bool *variable_offset)
 {
   tree offset;
   poly_int64 bitsize, bitpos;
@@ -8970,10 +8971,13 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp,
   if (offset && poly_int_tree_p (offset))
 {
   poffset = wi::to_poly_offset (offset);
-  

[PATCH 14/14] OpenACC: Improve implicit mapping for non-lexically nested offload regions

2023-06-19 Thread Julian Brown
This patch enables use of the OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for
OpenACC.

This allows code like this to work correctly:

  int arr[100];
  [...]
  #pragma acc enter data copyin(arr[20:10])

  /* No explicit mapping of 'arr' here.  */
  #pragma acc parallel
  { /* use of arr[20:10]... */ }

  #pragma acc exit data copyout(arr[20:10])

Otherwise, the implicit "copy" ("present_or_copy") on the parallel
construct applies to the whole array, and that fails at runtime because
only the subarray arr[20:10] is mapped.
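
Filled out into a complete function, the fragment above might read as
follows.  This is a minimal sketch, not the new implicit-mapping-1.c
test: the function name sum_after_update and the loop body are
assumptions made for illustration.

```c
#include <assert.h>

static int arr[100];

/* Map only arr[20:10], use it in a compute region with no data clause,
   copy it back, and return the sum of the whole array.  */
static int
sum_after_update (void)
{
#pragma acc enter data copyin(arr[20:10])

  /* No explicit mapping of 'arr' here.  */
#pragma acc parallel loop
  for (int i = 20; i < 30; i++)
    arr[i] = 1;

#pragma acc exit data copyout(arr[20:10])

  int sum = 0;
  for (int i = 0; i < 100; i++)
    sum += arr[i];
  return sum;
}
```

Only the ten mapped elements are written, so the sum counts exactly
those elements.  Built without OpenACC support the pragmas are ignored
and the host result is the same.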

The numbering of the GOMP_MAP_IMPLICIT bit clashes with the OpenACC
"non-contiguous" dynamic array support, so the GOMP_MAP_NONCONTIG_ARRAY_P
macro has been adjusted to account for that.

This behaviour relates to upstream OpenACC issue 490 (not yet resolved).

2023-06-16  Julian Brown  

gcc/
* gimplify.cc (gimplify_adjust_omp_clauses_1): Set
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P for OpenACC also.

gcc/testsuite/
* c-c++-common/goacc/combined-reduction.c: Adjust scan output.
* c-c++-common/goacc/reduction-1.c: Likewise.
* c-c++-common/goacc/reduction-2.c: Likewise.
* c-c++-common/goacc/reduction-3.c: Likewise.
* c-c++-common/goacc/reduction-4.c: Likewise.
* c-c++-common/goacc/reduction-10.c: Likewise.
* gfortran.dg/goacc/loop-tree-1.f90: Likewise.

include/
* gomp-constants.h (GOMP_MAP_NONCONTIG_ARRAY_P): Tweak condition.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c: New test.
---
 gcc/gimplify.cc   |  5 +---
 .../c-c++-common/goacc/combined-reduction.c   |  2 +-
 .../c-c++-common/goacc/reduction-1.c  |  4 ++--
 .../c-c++-common/goacc/reduction-10.c |  9 +++
 .../c-c++-common/goacc/reduction-2.c  |  4 ++--
 .../c-c++-common/goacc/reduction-3.c  |  4 ++--
 .../c-c++-common/goacc/reduction-4.c  |  4 ++--
 .../gfortran.dg/goacc/loop-tree-1.f90 |  2 +-
 include/gomp-constants.h  |  3 ++-
 .../implicit-mapping-1.c  | 24 +++
 10 files changed, 42 insertions(+), 19 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 0706f130ebb..1e90d2ed031 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13413,10 +13413,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void 
*data)
  gcc_unreachable ();
}
   OMP_CLAUSE_SET_MAP_KIND (clause, kind);
-  /* Setting of the implicit flag for the runtime is currently disabled for
-OpenACC.  */
-  if ((gimplify_omp_ctxp->region_type & ORT_ACC) == 0)
-   OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P (clause) = 1;
+  OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P (clause) = 1;
   if (DECL_SIZE (decl)
  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
{
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-reduction.c 
b/gcc/testsuite/c-c++-common/goacc/combined-reduction.c
index ecf23f59d66..40b93acc9ea 100644
--- a/gcc/testsuite/c-c++-common/goacc/combined-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/combined-reduction.c
@@ -25,5 +25,5 @@ main ()
 
 /* { dg-final { scan-tree-dump-times "omp target oacc_parallel reduction.+:v1. 
map.tofrom:v1" 1 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "acc loop reduction.+:v1. private.i." 1 
"gimple" } } */
-/* { dg-final { scan-tree-dump-times "omp target oacc_kernels 
map.force_tofrom:n .len: 4.. map.force_tofrom:v1 .len: 4.." 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "omp target oacc_kernels 
map.force_tofrom:n .len: 4..implicit.. map.force_tofrom:v1 .len: 4..implicit.." 
1 "gimple" } } */
 /* { dg-final { scan-tree-dump-times "acc loop reduction.+:v1. private.i." 1 
"gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c 
b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
index 35bfc868708..d9e3c380b8e 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -68,5 +68,5 @@ main(void)
 }
 
 /* Check that default copy maps are generated for loop reductions.  */
-/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: 
\[0-9\]+\\\]\\)" 7 "gimple" } } */
-/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: 
\[0-9\]+\\\]\\)" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: 
\[0-9\]+\\\]\\\[implicit\\\]\\)" 7 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: 
\[0-9\]+\\\]\\\[implicit\\\]\\)" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-10.c 
b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
index 579aa561479..36c330e9267 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-10.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-10.c
@@ -87,7 +87,8 @@ main(void)
 
 /* Check that default copy maps are generated for loop reductions.  */
 /* { 

[PATCH 13/14] OpenACC: Allow implicit uses of assumed-size arrays in offload regions

2023-06-19 Thread Julian Brown
This patch reimplements the functionality of the previously-reverted
patch "Assumed-size arrays with non-lexical data mappings". The purpose
is to support implicit uses of assumed-size arrays for Fortran when those
arrays have already been mapped on the target some other way (e.g. by
"acc enter data").

This relates to upstream OpenACC issue 489 (not yet resolved).

2023-06-16  Julian Brown  

gcc/fortran/
* trans-openmp.cc (gfc_omp_finish_clause): Treat implicitly-mapped
assumed-size arrays as zero-sized for OpenACC, rather than an error.

gcc/testsuite/
* gfortran.dg/goacc/assumed-size.f90: Don't expect error.

libgomp/
* testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90: New
test.
* testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90: New
test.
---
 gcc/fortran/trans-openmp.cc   | 16 ++--
 .../gfortran.dg/goacc/assumed-size.f90|  4 +-
 .../nonlexical-assumed-size-1.f90 | 28 +
 .../nonlexical-assumed-size-2.f90 | 40 +++
 4 files changed, 82 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 819d79cda28..230cebf250b 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1587,6 +1587,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
 return;
 
   tree decl = OMP_CLAUSE_DECL (c);
+  bool assumed_size = false;
 
   /* Assumed-size arrays can't be mapped implicitly, they have to be
  mapped explicitly using array sections.  */
@@ -1597,9 +1598,14 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
 == NULL)
 {
-  error_at (OMP_CLAUSE_LOCATION (c),
-   "implicit mapping of assumed size array %qD", decl);
-  return;
+  if (openacc)
+   assumed_size = true;
+  else
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "implicit mapping of assumed size array %qD", decl);
+ return;
+   }
 }
 
   if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
@@ -1654,7 +1660,9 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   else
{
  OMP_CLAUSE_DECL (c) = decl;
- OMP_CLAUSE_SIZE (c) = NULL_TREE;
+ OMP_CLAUSE_SIZE (c) = assumed_size ? size_zero_node : NULL_TREE;
+ if (assumed_size)
+   OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION (c) = 1;
}
   if (TREE_CODE (TREE_TYPE (orig_decl)) == REFERENCE_TYPE
  && (GFC_DECL_GET_SCALAR_POINTER (orig_decl)
diff --git a/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90 
b/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
index 4fced2e70c9..12f44c4743a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90
@@ -4,7 +4,8 @@
 ! exit data, respectively.
 
 ! This does not appear to be supported by the OpenACC standard as of version
-! 3.0.  Check for an appropriate error message.
+! 3.0.  There is however real-world code that relies on this working, so we
+! make an attempt to support it.
 
 program test
   implicit none
@@ -26,7 +27,6 @@ subroutine dtest (a, n)
   !$acc enter data copyin(a(1:n))
 
   !$acc parallel loop
-! { dg-error {implicit mapping of assumed size array 'a'} "" { target *-*-* } 
.-1 }
   do i = 1, n
  a(i) = i
   end do
diff --git 
a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
new file mode 100644
index 000..4b61e1cee9b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
@@ -0,0 +1,28 @@
+! { dg-do run }
+
+program p
+implicit none
+integer :: myarr(10)
+
+myarr = 0
+
+call subr(myarr)
+
+if (myarr(5).ne.5) stop 1
+
+contains
+
+subroutine subr(arr)
+implicit none
+integer :: arr(*)
+
+!$acc enter data copyin(arr(1:10))
+
+!$acc serial
+arr(5) = 5
+!$acc end serial
+
+!$acc exit data copyout(arr(1:10))
+
+end subroutine subr
+end program p
diff --git 
a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
new file mode 100644
index 000..daf7089915f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
@@ -0,0 +1,40 @@
+! { dg-do run }
+
+program p
+implicit none
+integer :: myarr(10)
+
+myarr = 0
+
+call subr(myarr)
+
+if (myarr(5).ne.5) stop 1
+
+contains
+
+subroutine subr(arr)
+implicit none
+integer :: arr(*)
+
+! At first glance, it might not be obvious how this works.  The "enter data"
+! and "exit data" operations expand to a pair 

[PATCH 05/14] OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause

2023-06-19 Thread Julian Brown
This patch trivially adds braces and reindents the
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in
c_finish_omp_clauses and finish_omp_clauses, in preparation for the
following patch (to clarify the diff a little).

2022-09-13  Julian Brown  

gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.

gcc/cp/
* semantics.cc (finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
---
 gcc/c/c-typeck.cc   | 615 +-
 gcc/cp/semantics.cc | 788 ++--
 2 files changed, 707 insertions(+), 696 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 9591d67251e..2cfe2174bab 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15520,321 +15520,326 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
case OMP_CLAUSE_TO:
case OMP_CLAUSE_FROM:
case OMP_CLAUSE__CACHE_:
- t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST)
-   {
- grp_start_p = pc;
- grp_sentinel = OMP_CLAUSE_CHAIN (c);
+ {
+   t = OMP_CLAUSE_DECL (c);
+   if (TREE_CODE (t) == TREE_LIST)
+ {
+   grp_start_p = pc;
+   grp_sentinel = OMP_CLAUSE_CHAIN (c);
 
- if (handle_omp_array_sections (c, ort))
-   remove = true;
- else
-   {
- t = OMP_CLAUSE_DECL (c);
- if (!omp_mappable_type (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "array section does not have mappable type "
-   "in %qs clause",
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- else if (TYPE_ATOMIC (TREE_TYPE (t)))
-   {
- error_at (OMP_CLAUSE_LOCATION (c),
-   "%<_Atomic%> %qE in %qs clause", t,
-   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
- remove = true;
-   }
- while (TREE_CODE (t) == ARRAY_REF)
-   t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == COMPONENT_REF
- && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
-   {
- do
-   {
- t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == MEM_REF
- || TREE_CODE (t) == INDIRECT_REF)
-   {
- t = TREE_OPERAND (t, 0);
- STRIP_NOPS (t);
- if (TREE_CODE (t) == POINTER_PLUS_EXPR)
-   t = TREE_OPERAND (t, 0);
-   }
-   }
- while (TREE_CODE (t) == COMPONENT_REF
-|| TREE_CODE (t) == ARRAY_REF);
-
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
- && OMP_CLAUSE_MAP_IMPLICIT (c)
- && (bitmap_bit_p (&map_head, DECL_UID (t))
- || bitmap_bit_p (&map_field_head, DECL_UID (t))
- || bitmap_bit_p (&map_firstprivate_head,
-  DECL_UID (t))))
-   {
- remove = true;
- break;
-   }
- if (bitmap_bit_p (&map_field_head, DECL_UID (t)))
-   break;
- if (bitmap_bit_p (&map_head, DECL_UID (t)))
-   {
- if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in motion "
- "clauses", t);
- else if (ort == C_ORT_ACC)
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in data "
- "clauses", t);
- else
-   error_at (OMP_CLAUSE_LOCATION (c),
- "%qD appears more than once in map "
- "clauses", t);
- remove = true;
-   }
- else
-   {
- bitmap_set_bit (&map_head, DECL_UID (t));
- bitmap_set_bit (&map_field_head, DECL_UID (t));
-   }
-   }
-   }
- if 

[PATCH 07/14] OpenMP: implicitly map base pointer for array-section pointer components

2023-06-19 Thread Julian Brown
Following from discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

and:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608100.html

and also upstream OpenMP issue 342, this patch changes mapping for array
sections of pointer components on compute regions like this:

  #pragma omp target map(s.ptr[0:10])
  {
...use of 's'...
  }

so the base pointer 's.ptr' is implicitly mapped, and thus pointer
attachment happens.  This is subtly different in the "enter data"
case, e.g:

  #pragma omp target enter data map(s.ptr[0:10])

if 's.ptr' (or the whole of 's') is not present on the target before
the directive is executed, the array section is copied to the target
but pointer attachment does *not* take place, since 's' (or 's.ptr')
is not mapped implicitly for "enter data".

To get a pointer attachment with "enter data", you can do, e.g:

  #pragma omp target enter data map(s.ptr, s.ptr[0:10])

  #pragma omp target
  {
...implicit use of 's'...
  }

That is, once the attachment has happened, implicit mapping of 's'
and uses of 's.ptr[...]' work correctly in the target region.
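
Put together as one function, the "enter data" sequence above could be
sketched like this.  It is an illustration only: struct S, the function
name, the explicit "to:" and "release:" map types and the closing
"exit data" directive are assumptions added to make the example
complete.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct with a pointer component.  */
struct S
{
  int *ptr;
};

/* Map both the pointer and the array section with "enter data", then
   read the section through 's' in a later target region.  */
static int
sum_via_attached_pointer (void)
{
  struct S s;
  int sum = 0;

  s.ptr = malloc (10 * sizeof (int));
  assert (s.ptr);
  for (int i = 0; i < 10; i++)
    s.ptr[i] = i;

  /* Mapping s.ptr as well as s.ptr[0:10] is what requests the pointer
     attachment for "enter data".  */
#pragma omp target enter data map(to: s.ptr, s.ptr[0:10])

  /* 's' is used implicitly here.  */
#pragma omp target map(tofrom: sum)
  {
    for (int i = 0; i < 10; i++)
      sum += s.ptr[i];
  }

#pragma omp target exit data map(release: s.ptr, s.ptr[0:10])

  free (s.ptr);
  return sum;
}
```

The section holds 0 to 9, so the function returns their sum.  Built
without OpenMP support the pragmas are ignored and the host result is
the same.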

ChangeLog

2022-12-12  Julian Brown  

gcc/
* gimplify.cc (omp_accumulate_sibling_list): Don't require
explicitly-mapped base pointer for compute regions.

gcc/testsuite/
* c-c++-common/gomp/target-implicit-map-2.c: Update expected scan output.

libgomp/
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
"free".
* testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test.
* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
* testsuite/libgomp.c/target-22.c: Remove explicit base pointer
mappings.
---
 gcc/gimplify.cc   |  9 ++--
 .../c-c++-common/gomp/target-implicit-map-2.c |  3 +-
 .../target-implicit-map-2.c   |  2 +
 .../target-implicit-map-5.c   | 50 +++
 .../libgomp.c-c++-common/target-map-zlas-1.c  | 36 +
 libgomp/testsuite/libgomp.c/target-22.c   |  3 +-
 6 files changed, 97 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-zlas-1.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 9be5d9c5328..6a43c792450 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -10696,6 +10696,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
   poly_int64 cbitpos;
   tree ocd = OMP_CLAUSE_DECL (grp_end);
   bool openmp = !(region_type & ORT_ACC);
+  bool target = (region_type & ORT_TARGET) != 0;
   tree *continue_at = NULL;
 
   while (TREE_CODE (ocd) == ARRAY_REF)
@@ -10800,9 +10801,9 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
}
 
  /* For OpenMP semantics, we don't want to implicitly allocate
-space for the pointer here.  A FRAGILE_P node is only being
-created so that omp-low.cc is able to rewrite the struct
-properly.
+space for the pointer here for non-compute regions (e.g. "enter
+data").  A FRAGILE_P node is only being created so that
+omp-low.cc is able to rewrite the struct properly.
 For references (to pointers), we want to actually allocate the
 space for the reference itself in the sorted list following the
 struct node.
@@ -10810,6 +10811,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
 mapping of the attachment point, but not otherwise.  */
  if (*fragile_p
  || (openmp
+ && !target
  && attach_detach
  && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE
  && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end)))
@@ -11122,6 +11124,7 @@ omp_accumulate_sibling_list (enum omp_region_type 
region_type,
 
  if (*fragile_p
  || (openmp
+ && !target
  && attach_detach
  && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE
  && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end)))
diff --git a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c 
b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
index 5ba1d7efe08..72df5b1 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c
@@ -49,4 +49,5 @@ main (void)
 
 /* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(tofrom:a 
\[len: [0-9]+\]\[implicit\]\)} "gimple" } } */
 
-/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a 
\[len: 1\]\) map\(alloc:a\.ptr \[len: 0\]\) map\(tofrom:\*_[0-9]+ \[len: 
[0-9]+\]\) map\(attach:a\.ptr \[bias: 0\]\)} "gimple" } } */
+/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a 
\[len: 

[PATCH 03/14] Revert "Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)"

2023-06-19 Thread Julian Brown
This reverts commit a84b89b8f070f1efe86ea347e98d57e6bc32ae2d.

Relevant tests are temporarily disabled or XFAILed.

2023-06-16  Julian Brown  

gcc/
Revert:
* gimplify.cc (oacc_array_mapping_info): New struct.
(gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Zero-initialise above.
(delete_omp_context): Delete above if allocated.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.cc (lower_omp_target): Allow decl for bias not to be present
in OpenACC context.

gcc/fortran/
Revert:
* trans-openmp.cc: Handle implicit "present".

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: Partly disable test.
* gfortran.dg/goacc/pr70828.f90: Likewise.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: XFAIL test.
* testsuite/libgomp.oacc-c-c++-common/pr70828-2.c: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: XFAIL test.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: XFAIL test.
---
 gcc/fortran/trans-openmp.cc   |  10 +-
 gcc/gimplify.cc   | 143 +-
 gcc/omp-low.cc|  10 +-
 .../c-c++-common/goacc/acc-data-chain.c   |   4 +-
 gcc/testsuite/gfortran.dg/goacc/pr70828.f90   |   3 +-
 .../libgomp.oacc-c-c++-common/pr70828-2.c |   2 +
 .../libgomp.oacc-c-c++-common/pr70828.c   |   2 +
 .../libgomp.oacc-fortran/pr70828-2.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-3.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-4.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-5.f90|   2 +
 .../libgomp.oacc-fortran/pr70828-6.f90|   2 +
 .../libgomp.oacc-fortran/pr70828.f90  |   2 +
 13 files changed, 28 insertions(+), 158 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 96e91a3bc50..809b96bc220 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1587,13 +1587,9 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
 
   tree decl = OMP_CLAUSE_DECL (c);
 
-  /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
- explicitly using array sections.  An exception is if the array is
- mapped explicitly in an enclosing data construct for OpenACC, in which
- case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
- error.  */
-  if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
-  && TREE_CODE (decl) == PARM_DECL
+  /* Assumed-size arrays can't be mapped implicitly, they have to be
+ mapped explicitly using array sections.  */
+  if (TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
   && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 80f1f3a657f..e3384c7f65b 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -218,17 +218,6 @@ enum gimplify_defaultmap_kind
   GDMK_POINTER
 };
 
-/* Used to record clauses representing array slices on data directives that
-   may affect implicit mapping semantics on enclosed OpenACC parallel/kernels
-   regions.  PSET is used for Fortran array slices with array descriptors,
-   or NULL otherwise.  */
-struct oacc_array_mapping_info
-{
-  tree mapping;
-  tree pset;
-  tree pointer;
-};
-
 struct gimplify_omp_ctx
 {
   struct gimplify_omp_ctx *outer_context;
@@ -250,7 +239,6 @@ struct gimplify_omp_ctx
   bool in_for_exprs;
   bool ompacc;
   int defaultmap[5];
-  hash_map *decl_data_clause;
 };
 
 struct privatize_reduction
@@ -485,7 +473,6 @@ new_omp_context (enum omp_region_type region_type)
   c->defaultmap[GDMK_AGGREGATE] = GOVD_MAP;
   c->defaultmap[GDMK_ALLOCATABLE] = GOVD_MAP;
   c->defaultmap[GDMK_POINTER] = GOVD_MAP;
-  c->decl_data_clause = NULL;
 
   return c;
 }
@@ -498,8 +485,6 @@ delete_omp_context (struct gimplify_omp_ctx *c)
   splay_tree_delete (c->variables);
   delete c->privatized_types;
   c->loop_iter_var.release ();
-  if (c->decl_data_clause)
-delete c->decl_data_clause;
   XDELETE (c);
 }
 
@@ -11235,41 +11220,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
case OMP_TARGET:
  break;
case OACC_DATA:
- {
-   tree base_ptr = OMP_CLAUSE_CHAIN (c);
-   tree pset = NULL;
-   

[PATCH 01/14] Revert "Assumed-size arrays with non-lexical data mappings"

2023-06-19 Thread Julian Brown
This reverts commit 72733f6e6f6ec1bb9884fea8bfbebd3de03d9374.

2023-06-16  Julian Brown  

gcc/
Revert:
* gimplify.cc (gimplify_adjust_omp_clauses_1): Raise error for
assumed-size arrays in map clauses for Fortran/OpenMP.
* omp-low.cc (lower_omp_target): Set the size of assumed-size Fortran
arrays to one to allow use of data already mapped on the offload device.

gcc/fortran/
Revert:
* trans-openmp.cc (gfc_omp_finish_clause): Change clauses mapping
assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.
---
 gcc/fortran/trans-openmp.cc | 22 +-
 gcc/gimplify.cc | 14 --
 gcc/omp-low.cc  |  5 -
 3 files changed, 9 insertions(+), 32 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index e8f3b24e5f8..e55c8292d05 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1588,18 +1588,10 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   tree decl = OMP_CLAUSE_DECL (c);
 
   /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
- explicitly using array sections.  For OpenACC this restriction is lifted
- if the array has already been mapped:
-
-   - Using a lexically-enclosing data region: in that case we see the
- GOMP_MAP_FORCE_PRESENT mapping kind here.
-
-   - Using a non-lexical data mapping ("acc enter data").
-
- In the latter case we change the mapping type to GOMP_MAP_FORCE_PRESENT.
- This raises an error for OpenMP in the caller
- (gimplify.c:gimplify_adjust_omp_clauses_1).  OpenACC will raise a runtime
- error if the implicitly-referenced assumed-size array is not mapped.  */
+ explicitly using array sections.  An exception is if the array is
+ mapped explicitly in an enclosing data construct for OpenACC, in which
+ case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
+ error.  */
   if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
   && TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
@@ -1607,7 +1599,11 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
 == NULL)
-OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT);
+{
+  error_at (OMP_CLAUSE_LOCATION (c),
+   "implicit mapping of assumed size array %qD", decl);
+  return;
+}
 
   if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
 return;
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 09c596f026e..3729b986801 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -12828,26 +12828,12 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
void *data)
   *list_p = clause;
   struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
   gimplify_omp_ctxp = ctx->outer_context;
-  gomp_map_kind kind = (code == OMP_CLAUSE_MAP) ? OMP_CLAUSE_MAP_KIND (clause)
-   : (gomp_map_kind) GOMP_MAP_LAST;
   /* Don't call omp_finish_clause on implicitly added OMP_CLAUSE_PRIVATE
  in simd.  Those are only added for the local vars inside of simd body
  and they don't need to be e.g. default constructible.  */
   if (code != OMP_CLAUSE_PRIVATE || ctx->region_type != ORT_SIMD) 
 lang_hooks.decls.omp_finish_clause (clause, pre_p,
(ctx->region_type & ORT_ACC) != 0);
-  /* Allow OpenACC to have implicit assumed-size arrays via FORCE_PRESENT,
- which should work as long as the array has previously been mapped
- explicitly on the target (e.g. by "enter data").  Raise an error for
- OpenMP.  */
-  if (lang_GNU_Fortran ()
-  && code == OMP_CLAUSE_MAP
-  && (ctx->region_type & ORT_ACC) == 0
-  && kind == GOMP_MAP_TOFROM
-  && OMP_CLAUSE_MAP_KIND (clause) == GOMP_MAP_FORCE_PRESENT)
-error_at (OMP_CLAUSE_LOCATION (clause),
- "implicit mapping of assumed size array %qD",
- OMP_CLAUSE_DECL (clause));
   if (gimplify_omp_ctxp)
 for (; clause != chain; clause = OMP_CLAUSE_CHAIN (clause))
   if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 3424eba2217..59143d8efe5 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14353,11 +14353,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
  s = OMP_CLAUSE_SIZE (c);
if (s == NULL_TREE)
  s = TYPE_SIZE_UNIT (TREE_TYPE (ovar));
-   /* Fortran assumed-size arrays have zero size because the type is
-  incomplete.  Set the size to one to allow the runtime to remap
-  any existing data that is already present on the accelerator.  */
-   if (s == NULL_TREE && is_gimple_omp_oacc (ctx->stmt))
- s = integer_one_node;

[PATCH 02/14] Revert "Fix references declared in lexically-enclosing OpenACC data region"

2023-06-19 Thread Julian Brown
This reverts commit c9cd2bac6a5127a01c6f47e5636a926ac39b5e21.

2023-06-16  Julian Brown  

gcc/fortran/
Revert:
* trans-openmp.cc (gfc_omp_finish_clause): Guard addition of clauses for
pointers with DECL_P.

gcc/
Revert:
* gimplify.cc (oacc_array_mapping_info): Add REF field.
(gimplify_scan_omp_clauses): Initialise above field for data blocks
passed by reference.
(gomp_oacc_needs_data_present): Handle references.
(gimplify_adjust_omp_clauses_1): Handle references and optional
arguments for variables declared in lexically-enclosing OpenACC data
region.
---
 gcc/fortran/trans-openmp.cc |  2 +-
 gcc/gimplify.cc | 55 +
 2 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index e55c8292d05..96e91a3bc50 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1611,7 +1611,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool 
openacc)
   tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
   tree present = gfc_omp_check_optional_argument (decl, true);
   tree orig_decl = NULL_TREE;
-  if (DECL_P (decl) && POINTER_TYPE_P (TREE_TYPE (decl)))
+  if (POINTER_TYPE_P (TREE_TYPE (decl)))
 {
   if (!gfc_omp_privatize_by_reference (decl)
  && !GFC_DECL_GET_SCALAR_POINTER (decl)
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 3729b986801..80f1f3a657f 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -227,7 +227,6 @@ struct oacc_array_mapping_info
   tree mapping;
   tree pset;
   tree pointer;
-  tree ref;
 };
 
 struct gimplify_omp_ctx
@@ -11248,9 +11247,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
  }
if (base_ptr
&& OMP_CLAUSE_CODE (base_ptr) == OMP_CLAUSE_MAP
-   && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-&& (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALLOC
-|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER))
&& OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET
&& ((OMP_CLAUSE_MAP_KIND (base_ptr)
 == GOMP_MAP_FIRSTPRIVATE_POINTER)
@@ -11269,19 +11265,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
ai.mapping = unshare_expr (c);
ai.pset = pset ? unshare_expr (pset) : NULL;
ai.pointer = unshare_expr (base_ptr);
-   ai.ref = NULL_TREE;
-   if (TREE_CODE (base_addr) == INDIRECT_REF
-   && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base_addr, 0)))
-   == REFERENCE_TYPE))
- {
-   base_addr = TREE_OPERAND (base_addr, 0);
-   tree ref_clause = OMP_CLAUSE_CHAIN (base_ptr);
-   gcc_assert ((OMP_CLAUSE_CODE (ref_clause)
-== OMP_CLAUSE_MAP)
-   && (OMP_CLAUSE_MAP_KIND (ref_clause)
-   == GOMP_MAP_POINTER));
-   ai.ref = unshare_expr (ref_clause);
- }
ctx->decl_data_clause->put (base_addr, ai);
  }
if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
@@ -12464,15 +12447,11 @@ gomp_oacc_needs_data_present (tree decl)
   && gimplify_omp_ctxp->region_type != ORT_ACC_KERNELS)
 return NULL;
 
-  tree type = TREE_TYPE (decl);
-  if (TREE_CODE (type) == REFERENCE_TYPE)
-type = TREE_TYPE (type);
-
-  if (TREE_CODE (type) != ARRAY_TYPE
-  && TREE_CODE (type) != POINTER_TYPE
-  && TREE_CODE (type) != RECORD_TYPE
-  && (TREE_CODE (type) != POINTER_TYPE
- || TREE_CODE (TREE_TYPE (type)) != ARRAY_TYPE))
+  if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE
+  && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
+  && TREE_CODE (TREE_TYPE (decl)) != RECORD_TYPE
+  && (TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
+ || TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) != ARRAY_TYPE))
 return NULL;
 
   decl = get_base_address (decl);
@@ -12626,12 +12605,6 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void 
*data)
 {
   tree mapping = array_info->mapping;
   tree pointer = array_info->pointer;
-  gomp_map_kind presence_kind = GOMP_MAP_FORCE_PRESENT;
-  bool no_alloc = (OMP_CLAUSE_CODE (mapping) == OMP_CLAUSE_MAP
-  && OMP_CLAUSE_MAP_KIND (mapping) == GOMP_MAP_IF_PRESENT);
-
-  if (no_alloc || omp_check_optional_argument (decl, false))
-presence_kind = GOMP_MAP_IF_PRESENT;
 
   if (code == OMP_CLAUSE_FIRSTPRIVATE)
/* Oops, we have the wrong type of clause.  Rebuild it.  */
@@ -12639,15 +12612,14 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, 
void *data)
 

[PATCH 04/14] Revert "openmp: Handle C/C++ array reference base-pointers in array sections"

2023-06-19 Thread Julian Brown
This reverts commit 3385743fd2fa15a2a750a29daf6d4f97f5aad0ae.

2023-06-16  Julian Brown  

Revert:
2022-02-24  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-typeck.cc (handle_omp_array_sections): Add handling for
creating array-reference base-pointer attachment clause.

gcc/cp/ChangeLog:

* semantics.cc (handle_omp_array_sections): Add handling for
creating array-reference base-pointer attachment clause.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/target-enter-data-1.c: Adjust testcase.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test.
---
 gcc/c/c-typeck.cc | 27 +
 gcc/cp/semantics.cc   | 28 +
 .../c-c++-common/gomp/target-enter-data-1.c   |  3 +-
 .../libgomp.c-c++-common/ptr-attach-2.c   | 60 ---
 4 files changed, 3 insertions(+), 115 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.c-c++-common/ptr-attach-2.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 450214556f9..9591d67251e 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14113,10 +14113,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
   if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
maybe_zero_len = true;
 
-  struct dim { tree low_bound, length; };
-  auto_vec dims (num);
-  dims.safe_grow (num);
-
   for (i = num, t = OMP_CLAUSE_DECL (c); i > 0;
   t = TREE_CHAIN (t))
{
@@ -14238,9 +14234,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  else
size = size_binop (MULT_EXPR, size, l);
}
-
- dim d = { low_bound, length };
- dims[i] = d;
}
   if (non_contiguous)
{
@@ -14288,23 +14281,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  OMP_CLAUSE_DECL (c) = t;
  return false;
}
-
-  tree aref = t;
-  for (i = 0; i < dims.length (); i++)
-   {
- if (dims[i].length && integer_onep (dims[i].length))
-   {
- tree lb = dims[i].low_bound;
- aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb);
-   }
- else
-   {
- if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE)
-   t = aref;
- break;
-   }
-   }
-
   first = c_fully_fold (first, false, NULL);
   OMP_CLAUSE_DECL (c) = first;
   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
@@ -14339,8 +14315,7 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  break;
}
   tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
-  if (TREE_CODE (t) == COMPONENT_REF || TREE_CODE (t) == ARRAY_REF
- || TREE_CODE (t) == INDIRECT_REF)
+  if (TREE_CODE (t) == COMPONENT_REF)
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
   else
OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e7bda6fa060..93ff7cf5e1b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5605,10 +5605,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
   if (processing_template_decl && maybe_zero_len)
return false;
 
-  struct dim { tree low_bound, length; };
-  auto_vec dims (num);
-  dims.safe_grow (num);
-
   for (i = num, t = OMP_CLAUSE_DECL (c); i > 0;
   t = TREE_CHAIN (t))
{
@@ -5728,9 +5724,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type 
ort)
  else
size = size_binop (MULT_EXPR, size, l);
}
-
- dim d = { low_bound, length };
- dims[i] = d;
}
   if (!processing_template_decl)
{
@@ -5782,24 +5775,6 @@ handle_omp_array_sections (tree c, enum 
c_omp_region_type ort)
  OMP_CLAUSE_DECL (c) = t;
  return false;
}
-
- tree aref = t;
- for (i = 0; i < dims.length (); i++)
-   {
- if (dims[i].length && integer_onep (dims[i].length))
-   {
- tree lb = dims[i].low_bound;
- aref = convert_from_reference (aref);
- aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb);
-   }
- else
-   {
- if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE)
-   t = aref;
- break;
-   }
-   }
-
  OMP_CLAUSE_DECL (c) = first;
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR)
return false;
@@ -5841,8 +5816,7 @@ handle_omp_array_sections (tree c, enum c_omp_region_type 
ort)
  bool reference_always_pointer = true;
  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c),
  OMP_CLAUSE_MAP);
- 

[PATCH 00/14] [og13] OpenMP/OpenACC: map clause and OMP gimplify rework

2023-06-19 Thread Julian Brown
This series (for the og13 branch) is a rebased and merged version of
the first few patches of the series previously sent upstream for mainline:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html

The series contains patches 1-6 and the parts of patch 8 ("C++
"declare mapper" support") that pertain to reorganisation of
gimplify.cc:gimplify_{scan,adjust}_omp_clauses.

The series also contains reversions and rewrites of several patches
that needed adjustment in order to fit in with the new clause-processing
arrangements.

Tested with offloading to AMD GCN. I will apply shortly.

Thanks,

Julian

Julian Brown (14):
  Revert "Assumed-size arrays with non-lexical data mappings"
  Revert "Fix references declared in lexically-enclosing OpenACC data
region"
  Revert "Fix implicit mapping for array slices on lexically-enclosing
data constructs (PR70828)"
  Revert "openmp: Handle C/C++ array reference base-pointers in array
sections"
  OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in
{c_}finish_omp_clause
  OpenMP/OpenACC: Rework clause expansion and nested struct handling
  OpenMP: implicitly map base pointer for array-section pointer
components
  OpenMP: Pointers and member mappings
  OpenMP/OpenACC: Unordered/non-constant component offset runtime
diagnostic
  OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc
  OpenACC: Reimplement "inheritance" for lexically-nested offload
regions
  OpenACC: "declare create" fixes wrt. "allocatable" variables
  OpenACC: Allow implicit uses of assumed-size arrays in offload regions
  OpenACC: Improve implicit mapping for non-lexically nested offload
regions

 gcc/c-family/c-common.h   |   74 +-
 gcc/c-family/c-omp.cc |  837 -
 gcc/c/c-parser.cc |   17 +-
 gcc/c/c-typeck.cc |  773 ++--
 gcc/cp/parser.cc  |   17 +-
 gcc/cp/pt.cc  |4 +-
 gcc/cp/semantics.cc   | 1065 +++---
 gcc/fortran/dependency.cc |  128 +
 gcc/fortran/dependency.h  |1 +
 gcc/fortran/gfortran.h|1 +
 gcc/fortran/trans-openmp.cc   |  376 +-
 gcc/gimplify.cc   | 2239 
 gcc/omp-general.cc|  424 +++
 gcc/omp-general.h |   69 +
 gcc/omp-low.cc|   23 +-
 .../c-c++-common/goacc/acc-data-chain.c   |2 +-
 .../c-c++-common/goacc/combined-reduction.c   |2 +-
 .../c-c++-common/goacc/reduction-1.c  |4 +-
 .../c-c++-common/goacc/reduction-10.c |9 +-
 .../c-c++-common/goacc/reduction-2.c  |4 +-
 .../c-c++-common/goacc/reduction-3.c  |4 +-
 .../c-c++-common/goacc/reduction-4.c  |4 +-
 gcc/testsuite/c-c++-common/gomp/clauses-2.c   |2 +-
 gcc/testsuite/c-c++-common/gomp/target-50.c   |2 +-
 .../c-c++-common/gomp/target-enter-data-1.c   |4 +-
 .../c-c++-common/gomp/target-implicit-map-2.c |3 +-
 .../g++.dg/gomp/static-component-1.C  |   23 +
 gcc/testsuite/gcc.dg/gomp/target-3.c  |2 +-
 .../gfortran.dg/goacc/assumed-size.f90|   35 +
 .../gfortran.dg/goacc/loop-tree-1.f90 |2 +-
 gcc/testsuite/gfortran.dg/gomp/map-12.f90 |2 +-
 gcc/testsuite/gfortran.dg/gomp/map-9.f90  |2 +-
 .../gfortran.dg/gomp/map-subarray-2.f90   |   57 +
 .../gfortran.dg/gomp/map-subarray.f90 |   40 +
 gcc/tree-pretty-print.cc  |3 +
 gcc/tree.h|8 +
 include/gomp-constants.h  |9 +-
 libgomp/oacc-mem.c|6 +-
 libgomp/target.c  |   91 +-
 libgomp/testsuite/libgomp.c++/baseptrs-3.C|  275 ++
 libgomp/testsuite/libgomp.c++/baseptrs-4.C| 3154 +
 libgomp/testsuite/libgomp.c++/baseptrs-5.C|   62 +
 libgomp/testsuite/libgomp.c++/class-array-1.C |   59 +
 libgomp/testsuite/libgomp.c++/target-48.C |   32 +
 libgomp/testsuite/libgomp.c++/target-49.C |   37 +
 .../libgomp.c-c++-common/baseptrs-1.c |   50 +
 .../libgomp.c-c++-common/baseptrs-2.c |   70 +
 .../map-arrayofstruct-1.c |   38 +
 .../map-arrayofstruct-2.c |   58 +
 .../map-arrayofstruct-3.c |   68 +
 .../target-implicit-map-2.c   |2 +
 .../target-implicit-map-5.c   |   50 +
 .../libgomp.c-c++-common/target-map-zlas-1.c  |   36 +
 .../libgomp.fortran/map-subarray-2.f90|  108 +
 .../libgomp.fortran/map-subarray-3.f90|   62 +
 .../libgomp.fortran/map-subarray-4.f90|   35 +
 .../libgomp.fortran/map-subarray-5.f90|   54 +
 .../libgomp.fortran/map-subarray-6.f90|   26 +
 

[Bug debug/110308] [14 Regression] ICE on audiofile-0.3.6: RTL: vartrack: Segmentation fault in mode_to_precision(machine_mode)

2023-06-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110308

Andrew Pinski  changed:

   What|Removed |Added

  Component|rtl-optimization|debug

--- Comment #4 from Andrew Pinski  ---
So I think there are 2 bugs here: first, the loss of debugging info because of
ch, and second, the latent segfault.

[Bug rtl-optimization/110308] [14 Regression] ICE on audiofile-0.3.6: RTL: vartrack: Segmentation fault in mode_to_precision(machine_mode)

2023-06-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110308

--- Comment #3 from Andrew Pinski  ---
There is a difference in .optimized with respect to debug statements:
GCC 13:
  # DEBUG i => 0

vs
GCC trunk:
  # DEBUG i => NULL

This is in BB 5.

The change in Debug statements happened starting in ch2.

Before ch2:
   [local count: 715863673]:
  # DEBUG BEGIN_STMT
  _3 = state[i_8];
  _4 = _3->sample1;
  _5 = _4 + 1;
  _3->sample1 = _5;
  # DEBUG BEGIN_STMT
  i_23 = i_8 + 1;
  # DEBUG i => i_23

   [local count: 1073741824]:
  # i_8 = PHI <0(4), i_23(5)>
  # DEBUG i => i_8
  # DEBUG BEGIN_STMT
  if (channelCount.1_1 > i_8)
goto ; [66.67%]
  else
goto ; [33.33%]

After:
   [local count: 715863673]:
  # i_9 = PHI 
  # DEBUG BEGIN_STMT
  _3 = state[i_9];
  _4 = _3->sample1;
  _5 = _4 + 1;
  _3->sample1 = _5;
  # DEBUG BEGIN_STMT
  i_23 = i_9 + 1;
  # DEBUG i => i_23
  # DEBUG i => i_23
  # DEBUG BEGIN_STMT
  if (channelCount.1_1 > i_23)
goto ; [66.67%]
  else
goto ; [33.33%]


While in GCC 13, the IL after ch2 looks like:
   [local count: 715863673]:
  # i_9 = PHI 
  # DEBUG i => i_9
  # DEBUG BEGIN_STMT
  _3 = state[i_9];
  _4 = _3->sample1;
  _5 = _4 + 1;
  _3->sample1 = _5;
  # DEBUG BEGIN_STMT
  i_23 = i_9 + 1;
  # DEBUG i => i_23
  # DEBUG i => i_23
  # DEBUG BEGIN_STMT
  if (channelCount.1_1 > i_23)
goto ; [66.67%]
  else
goto ; [33.33%]

Notice the `i => i_9` debug statement which is now missing.
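
For readers without the audiofile sources at hand, here is a rough C sketch of the kind of loop the dumps above appear to come from, together with the shape loop header copying (ch) gives it. This is a reconstruction from the GIMPLE, not the original source: the struct name, the function names and the parameter types are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-channel state seen in the dump. */
struct channel_state { int sample1; };

/* Shape before ch2: the exit test sits in the loop header.
   Mirrors "_3 = state[i]; _3->sample1 = _3->sample1 + 1; i = i + 1".  */
void bump_samples(struct channel_state **state, int channel_count)
{
    assert(state != NULL);
    for (int i = 0; i < channel_count; i++)
        state[i]->sample1++;
}

/* Shape after ch2: the header test is copied in front of the loop and
   the loop itself tests at the bottom, so "i" is only live as the
   incremented value when the back edge is taken.  */
void bump_samples_rotated(struct channel_state **state, int channel_count)
{
    assert(state != NULL);
    int i = 0;
    if (channel_count > i) {
        do {
            state[i]->sample1++;
            i++;
        } while (channel_count > i);
    }
}
```

Both functions compute the same result; the difference that matters for this bug is only where the debug binding for `i` at loop entry ends up after the rotation.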

[Bug rtl-optimization/110305] Incorrect optimization with -O3 -fsignaling-nans -fno-signed-zeros

2023-06-19 Thread mmorrell at tachyum dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110305

--- Comment #8 from Michael Morrell  ---
Interesting information.  I still feel that perhaps both functions should use
the same logic to determine whether to make this transformation, but, for
example, the extra checking for the vector case done by
fold_real_zero_addition_p may not be needed in simplify_binary_operation_1
because of when the latter is used.

Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-19 Thread Richard Sandiford via Gcc-patches
Jeff Law  writes:
> On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:
>> IVOPTs has strip_offset which suffers from the same issues regarding
>> integer overflow that split_constant_offset did but the latter was
>> fixed quite some time ago.  The following implements strip_offset
>> in terms of split_constant_offset, removing the redundant and
>> incorrect implementation.
>> 
>> The implementations are not exactly the same, strip_offset relies
>> on ptrdiff_tree_p to fend off too large offsets while split_constant_offset
>> simply assumes those do not happen and truncates them.  By
>> the same means strip_offset also handles POLY_INT_CSTs but
>> split_constant_offset does not.  Massaging the latter to
>> behave like strip_offset in those cases might be the way to go?
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> 
>> Comments?
>> 
>> Thanks,
>> Richard.
>> 
>>  PR tree-optimization/110243
>>  * tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
>>  (strip_offset): Make it a wrapper around split_constant_offset.
>> 
>>  * gcc.dg/torture/pr110243.c: New testcase.
> Your call -- IMHO you know this code far better than I.

+1, but LGTM FWIW.  I couldn't see anything obvious (and valid)
that strip_offset_1 handles and split_constant_offset doesn't.

Thanks,
Richard
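
To illustrate the behavioural difference mentioned in the quoted description (rejecting offsets that are too large versus silently truncating them), here is a small stand-alone C sketch. It is not GCC code: the function names are made up, and a 32-bit type merely stands in for the narrower offset type so the effect is easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Checked variant: succeed only if WIDE is representable in 32 bits,
   loosely analogous to fending off too-large offsets up front.  */
bool offset_fits_checked(int64_t wide, int32_t *out)
{
    assert(out != NULL);
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;
    *out = (int32_t) wide;
    return true;
}

/* Truncating variant: keep the low 32 bits and discard the rest,
   loosely analogous to assuming large offsets do not happen.  */
int32_t offset_truncated(int64_t wide)
{
    uint32_t low = (uint32_t) ((uint64_t) wide & 0xffffffffu);
    int32_t result;
    memcpy(&result, &low, sizeof result);
    return result;
}
```

With an input of 2^32 + 7 the checked variant reports failure, while the truncating variant quietly returns 7.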


[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-19 Thread matoro_gcc_bugzilla at matoro dot tk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307

--- Comment #2 from matoro  ---
Created attachment 55365
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55365&action=edit
archive from -fdump-tree-all -fdump-rtl-all

(In reply to Alexander Monakov from comment #1)
> I tried building a cross-compiler from trunk with
> --target=alpha-unknown-linux-gnu --with-gnu-ld --with-gnu-as
> --enable-secureplt --enable-languages=c --enable-tls and got
> 
> t.c:8:1: error: unrecognizable insn:
> 8 | }
>   | ^
> (insn 23 22 24 5 (set (reg/f:DI 74)
> (symbol_ref:DI ("ruby_current_ec") [flags 0x10] <var_decl 0x7fb457a6c090 ruby_current_ec>)) "t.c":6:22 -1
>  (nil))
> during RTL pass: vregs
> 
> Would you mind compiling the testcase with -fdump-tree-all -fdump-rtl-all
> and attaching a tar.gz with the resulting dumps?

Absolutely, here you go.

[Bug target/100799] Stackoverflow in optimized code on PPC

2023-06-19 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

Peter Bergner  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|WAITING |RESOLVED

--- Comment #22 from Peter Bergner  ---
I'm closing this as not a bug in GCC; it is a bug in the source code being
compiled, which is not cognizant of the rules for calling between Fortran and
C.  Surya listed two solutions which can be used in Comment #21 below.

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #6 from palmer at gcc dot gnu.org ---
(In reply to Craig Topper from comment #3)
> I don't have a testsuite. I saw that gcc had crypto builtins and I happened
> to notice that the tests in gcc weren't passing constant arguments.
> 
> We also have a divergence in names between clang and gcc for some crypto
> builtins. We really need to define a scalar crypto intrinsic header file.

OK, let's try and get that sorted out?  We're generally not supposed to be
merging intrinsics without some sort of spec to point at, but we did a pretty
poor job at that for the V intrinsics and it looks like we've slipped a bit
here too.

Unless I'm missing something we haven't released GCC with the crypto intrinsics
yet, so we should be safe to fix bugs there as they come up.

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #5 from palmer at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #4)
> Yea, the tests aren't great.  They'll be better shortly.  They'll test
> non-constant arguments and out-of-range constants, expecting a suitable
> diagnostic.  They'll also test the extrema of valid constants.

Awesome, thanks!

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #4 from Jeffrey A. Law  ---
Yea, the tests aren't great.  They'll be better shortly.  They'll test
non-constant arguments and out-of-range constants, expecting a suitable
diagnostic.  They'll also test the extrema of valid constants.
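
As a rough illustration of what "out-of-range constants" and "the extrema of valid constants" mean for an immediate operand, here is a small C sketch. It assumes the byte-select operand of sm4ks/sm4ed is a 2-bit unsigned immediate; that width comes from my reading of the scalar crypto extension, not from this thread, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if VALUE fits an unsigned immediate field BITS wide.
   Hypothetical helper, not part of GCC or of any intrinsic header.  */
bool imm_fits_unsigned(long value, unsigned bits)
{
    assert(bits >= 1 && bits < 31);
    return value >= 0 && value < (1L << bits);
}
```

Under that assumption the extrema worth testing are 0 and 3, and the nearest out-of-range values are -1 and 4.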

[Bug rtl-optimization/110305] Incorrect optimization with -O3 -fsignaling-nans -fno-signed-zeros

2023-06-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110305

--- Comment #7 from Andrew Pinski  ---
(In reply to Michael Morrell from comment #6)
> I'm curious why this transformation is being done by both
> fold_real_zero_addition_p AND simplify_binary_operation_1.  

The answer there involves the history of GCC and the history of how
optimizations were done in GCC.
Basically fold_real_zero_addition_p (fold) would only act on a single statement
while simplify_binary_operation_1 could happen between statements (while doing CSE
and combine, etc.). That changed with the merge of tree-ssa in
r0-58166-g6de9cd9a886ea6 (2004). 

simplify_binary_operation_1 has had the optimization since the beginning of the
git history (though it moved from cse.c to simplify-rtx in
r0-24738-g0cedb36cbd7e0c, and the HONOR_SIGNED_ZEROS check was added by
r0-41258-g71925bc04f24a4 in 2002; before that it was just checking for IEEE
float format and unsafe-math).

fold has had it since the beginning of the git history too (and changed in a
similar fashion to simplify-rtx for HONOR_SIGNED_ZEROS).
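
For anyone wondering why "x + 0.0" is not simply "x" in the first place, here is a minimal C sketch of the signed-zero half of the story, which is what the HONOR_SIGNED_ZEROS check guards against. The function names are made up for illustration, and the sketch assumes IEEE 754 doubles with the default round-to-nearest mode. The signaling-NaN half (the subject of this bug) is not exercised here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of X (assumes a 64-bit IEEE 754 double).  */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Return 1 if X + 0.0 has exactly the same bit pattern as X, else 0.
   The volatile copies keep the addition from being folded away.  */
int add_zero_is_identity(double x)
{
    volatile double v = x;
    volatile double sum = v + 0.0;
    return double_bits(sum) == double_bits(x);
}
```

For ordinary values and for +0.0 the addition is an identity, but -0.0 + 0.0 yields +0.0, so the bit patterns differ. With -fno-signed-zeros the compiler is allowed to ignore that difference, which leaves signaling NaNs as the remaining reason not to fold.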

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread craig.topper at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #3 from Craig Topper  ---
I don't have a test suite. I saw that gcc had crypto builtins and happened to
notice that the tests in gcc weren't passing constant arguments.

We also have a divergence in names between clang and gcc for some crypto
builtins. We really need to define a scalar crypto intrinsic header file.

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #2 from palmer at gcc dot gnu.org ---
Do you guys have a test suite for these, or did you just happen to run into it?
 The intrinsic testing has been a bit of a blind spot in GCC land.

[Bug ada/110314] New: Gnat failed assertion and Allocators with discriminant

2023-06-19 Thread franckbehaghel_gcc at protonmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110314

Bug ID: 110314
   Summary: Gnat failed assertion and Allocators with discriminant
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ada
  Assignee: unassigned at gcc dot gnu.org
  Reporter: franckbehaghel_gcc at protonmail dot com
CC: dkm at gcc dot gnu.org
  Target Milestone: ---

With checks enabled, Gnat failed to build this file.

$ cat main_assertion_failed.adb 
with Ada.Containers.Synchronized_Queue_Interfaces;
with Ada.Containers.Unbounded_Synchronized_Queues;

procedure main is 

package  Queue_Interfaces  is new Ada.Containers.Synchronized_Queue_Interfaces
(Integer);
package Synchronized_Queues is new Ada.Containers.Unbounded_Synchronized_Queues
( Queue_Interfaces => Queue_Interfaces);
subtype Queue is Synchronized_Queues.Queue; 
type Access_Type is access all  Queue;

Q1 : Access_Type := new Queue;
Q2 : Access_Type := new Queue;

begin  
   null;
end Main;



$ gnatmake main_assertion_failed.adb 
gcc -c main_assertion_failed.adb
+===GNAT BUG DETECTED==+
| 14.0.0 20230617 (experimental) (aarch64-unknown-linux-gnu) Assert_Failure
nlists.adb:172|
| Error detected at main_assertion_failed.adb:12:21|
| Compiling main_assertion_failed.adb  |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered.  |
| Also include sources listed below.   |
+==+




It also fails with Compiler Explorer (https://godbolt.org/):
+===GNAT BUG DETECTED==+
| 14.0.0 20230619 (experimental) (x86_64-linux-gnu) Assert_Failure
nlists.adb:172|
| Error detected at example.adb:12:21  |
| Compiling|
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered.  |
| Also include sources listed below.   |
+==+


regards,

Re: Different ASM for ReLU function between GCC11 and GCC12

2023-06-19 Thread Jakub Jelinek via Gcc
On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote:
> I noticed that a simple function like
> auto relu( float x ) {
> return x > 0.f ? x : 0.f;
> }
> compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On
> -O3 -mavx2 the former compiles above function to

Such reports should go into gcc.gnu.org/bugzilla/, not to the mailing list,
if you are convinced that loading the constant from memory is faster.
Another possibility is
vxorps xmm1, xmm1, xmm1
vmaxss xmm0, xmm0, xmm1
ret
which doesn't need to wait for the memory.
This changed with https://gcc.gnu.org/r12-7693

> 
> relu(float):
> vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
> ret
> .LC0:
> .long 0
> 
> which is what I would naively expect and what also clang essentially does
> (clang actually uses an xor before the maxss to get the zero). The latter,
> however, compiles the function to
> 
> relu(float):
> vxorps xmm1, xmm1, xmm1
> vcmpltss xmm2, xmm1, xmm0
> vblendvps xmm0, xmm1, xmm0, xmm2
> ret
> 
> which looks like a missed optimisation. Does anyone know if there's a
> reason for the changed behaviour?

Jakub



[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-06-19
 Status|UNCONFIRMED |NEW

--- Comment #1 from Jeffrey A. Law  ---
It looks like some of the aes patterns have the same problem.  It may just have
been Liao not understanding the difference between an operand constraint and an
operand predicate.

[Bug go/110297] [13/14 Regression] all libgo tests fail on arm-linux-gnueabi and arm-linux-gnueabihf

2023-06-19 Thread ian at airs dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110297

--- Comment #4 from Ian Lance Taylor  ---
Thanks.  I suspect this was broken by
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604158.html.

Different ASM for ReLU function between GCC11 and GCC12

2023-06-19 Thread André Günther via Gcc
Hi,
I noticed that a simple function like
auto relu( float x ) {
return x > 0.f ? x : 0.f;
}
compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On
-O3 -mavx2 the former compiles above function to

relu(float):
vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
ret
.LC0:
.long 0

which is what I would naively expect and what also clang essentially does
(clang actually uses an xor before the maxss to get the zero). The latter,
however, compiles the function to

relu(float):
vxorps xmm1, xmm1, xmm1
vcmpltss xmm2, xmm1, xmm0
vblendvps xmm0, xmm1, xmm0, xmm2
ret

which looks like a missed optimisation. Does anyone know if there's a
reason for the changed behaviour?

Andre
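
Editorial note: the following is a minimal C sketch of the function under discussion, written as plain C rather than the C++ `auto` form in the report. It is only meant to pin down the source-level semantics that any of the instruction sequences above has to preserve. In particular, a NaN input makes the comparison false, so the function returns 0.

```c
/* C rendering of the relu from the report: returns x when x > 0,
   otherwise 0.  Because "x > 0.0f" is false for NaN, a NaN input
   also yields 0.  */
float relu(float x)
{
    return x > 0.0f ? x : 0.0f;
}
```

Compiling this with each compiler version and the options named in the report is one way to compare the generated code.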


Re: [PATCH v2] RISC-V: Save and restore FCSR in interrupt functions to avoid program errors.

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/14/23 01:57, Jin Ma wrote:

To prevent interrupt functions from clobbering the FCSR, it must be saved at
the beginning of the function and restored at the end.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame for 
FCSR.
(riscv_for_each_saved_reg): Save and restore FCSR in interrupt 
functions.
* config/riscv/riscv.md (riscv_frcsr): New patterns.
(riscv_fscsr): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/interrupt-fcsr-1.c: New test.
* gcc.target/riscv/interrupt-fcsr-2.c: New test.
* gcc.target/riscv/interrupt-fcsr-3.c: New test.

Thanks.  I pushed this to the trunk.
jeff


[PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches


Kewen, GCC maintainers:

Version 6, Fixed missing change log entry.  Changed builtin id names as
requested.  Missed making the change on the last version.  Fixed
comment in the three test cases.  Reran regression suite on Power 10,
no regressions.

Version 5, Tested the patch on P9 BE per request.  Fixed up test case
to get the correct expected values for BE and LE.  Fixed typos. 
Updated the doc/extend.texi to clarify the vector arguments.  Changed
test file names per request.  Moved builtin defs next to related
definitions.  Renamed new mode_attr. Removed new mode_iterator, used
existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
Fixed up overloaded definitions per request.

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 9 BE and Power 10 LE with no
regressions.  Please let me know if the patch is acceptable or not. 
Thanks.

   Carl


rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

The instructions used in the builtins operate on vector registers.  Thus
the result must be moved to a scalar type.  There is no clean, performant
way to do this.  The user code typically needs the result as a vector
anyway.
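
Editorial note: as a rough, portable illustration of what "extract exponent", "extract significand" and "insert exponent" mean for an IEEE 754 binary128 value, here is a minimal sketch that works on the raw bit pattern split into two 64-bit halves. The type and helper names (`binary128_bits`, `b128_extract_exp`, `b128_extract_fraction`, `b128_insert_exp`) are hypothetical and are not the builtins from the patch. The sketch only manipulates the stored fields and does not model how the hardware instructions treat the implicit leading bit or where they place results in a vector register.

```c
#include <stdint.h>

/* Raw IEEE 754 binary128 bit pattern, split into two 64-bit halves.
   This layout is an assumption made for illustration only.  */
typedef struct {
    uint64_t hi;  /* sign (1 bit), biased exponent (15 bits), top 48 fraction bits */
    uint64_t lo;  /* low 64 fraction bits */
} binary128_bits;

#define B128_EXP_SHIFT 48
#define B128_EXP_MASK  UINT64_C(0x7FFF)

/* Hypothetical helper: return the 15-bit biased exponent field.  */
uint64_t b128_extract_exp(binary128_bits v)
{
    return (v.hi >> B128_EXP_SHIFT) & B128_EXP_MASK;
}

/* Hypothetical helper: return the stored fraction bits with the sign
   and exponent fields cleared.  */
binary128_bits b128_extract_fraction(binary128_bits v)
{
    binary128_bits r = { v.hi & ((UINT64_C(1) << B128_EXP_SHIFT) - 1), v.lo };
    return r;
}

/* Hypothetical helper: replace the exponent field, keeping the sign
   and fraction bits unchanged.  */
binary128_bits b128_insert_exp(binary128_bits v, uint64_t exp)
{
    binary128_bits r = v;
    r.hi &= ~(B128_EXP_MASK << B128_EXP_SHIFT);
    r.hi |= (exp & B128_EXP_MASK) << B128_EXP_SHIFT;
    return r;
}
```

For example, 1.0 in binary128 has a biased exponent of 0x3FFF and an all-zero fraction, so its high half is 0x3FFF000000000000.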

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/rs6000/vsx.md (V2DI_DI): New mode iterator.
(DI_to_TI): New mode attribute.
Rename xsxexpqp_ to xsxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  21 +++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++-
 gcc/config/rs6000/rs6000-c.cc |  10 +-
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 gcc/config/rs6000/vsx.md  |  25 +++--
 gcc/doc/extend.texi   |  24 +++-
 

[Bug rtl-optimization/110305] Incorrect optimization with -O3 -fsignaling-nans -fno-signed-zeros

2023-06-19 Thread mmorrell at tachyum dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110305

Michael Morrell  changed:

   What|Removed |Added

 CC||mmorrell at tachyum dot com

--- Comment #6 from Michael Morrell  ---
I'm curious why this transformation is being done by both
fold_real_zero_addition_p AND simplify_binary_operation_1.  The checks in
fold_real_zero_addition_p are more complex and will leave "a + 0.0" unchanged
in more cases, yet later simplify_binary_operation_1 transforms the expression
for less complex reasons.

I also wonder if there aren't similar expressions (perhaps "a * 1.0" -> a) that
need to be looked at.
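
Editorial note: the following minimal sketch illustrates one reason "a + 0.0" cannot always be replaced by "a", using signed zeros because that case is easy to observe portably. Under the default rounding mode, -0.0 + 0.0 evaluates to +0.0, so the addition changes the sign bit. The signaling-NaN case from this PR is analogous in spirit: under IEEE 754 the addition raises the invalid exception and produces a quiet NaN, whereas the bare operand does neither, but that case is not exercised here. The helper names are hypothetical, and the `volatile` is only there to stop the compiler from folding the addition in this demonstration.

```c
#include <math.h>

/* Adds +0.0 through a volatile so the addition is really performed
   at run time instead of being folded by the compiler.  */
double add_positive_zero(double x)
{
    volatile double zero = 0.0;
    return x + zero;
}

/* Returns 1 if x is a zero with its sign bit set, otherwise 0.  */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}
```

With these helpers, `is_negative_zero(-0.0)` holds but `is_negative_zero(add_positive_zero(-0.0))` does not, which is the difference that HONOR_SIGNED_ZEROS guards against.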

[Bug fortran/92887] [F2008] Passing nullified/disassociated pointer or unalloc allocatable to OPTIONAL + VALUE dummy fails

2023-06-19 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92887

--- Comment #6 from anlauf at gcc dot gnu.org ---
(In reply to Mikael Morin from comment #5)
> (In reply to anlauf from comment #4)
> > 
> > I'll need broader feedback, so unless someone adds to this pr, I'll submit
> > the present patch - with testcases - to get attention.
> > 
> Here you go:
> 
> > diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
> > index 45a984b6bdb..d9dcc11e5bd 100644
> > --- a/gcc/fortran/trans-expr.cc
> > +++ b/gcc/fortran/trans-expr.cc
> 
> > @@ -6396,7 +6399,28 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * 
> > sym,
> > && fsym->ts.type != BT_CLASS
> > && fsym->ts.type != BT_DERIVED)
> >   {
> > -   if (e->expr_type != EXPR_VARIABLE
> > +   /* F2018:15.5.2.12 Argument presence and
> > +  restrictions on arguments not present.  */
> > +   if (e->expr_type == EXPR_VARIABLE
> > +   && (e->symtree->n.sym->attr.allocatable
> > +   || e->symtree->n.sym->attr.pointer))
> 
> Beware of expressions like derived%alloc_comp or derived%pointer_comp which
> don't match the above.

Right.  This is fixable by using


&& (gfc_expr_attr (e).allocatable
|| gfc_expr_attr (e).pointer))

instead.

> > @@ -7072,6 +7096,42 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * 
> > sym,
> > }
> > }
> >  
> > +  /* F2023:15.5.3, 15.5.4: Actual argument expressions are evaluated
> > +before they are associated and a procedure is executed.  */
> > +  if (e && e->expr_type != EXPR_VARIABLE && !gfc_is_constant_expr (e))
> > +   {
> > + /* Create temporary except for functions returning pointers that
> > +can appear in a variable definition context.  */
> 
> Maybe explain *why* we have to create a temporary, that is some data
> references may become  undefined by the procedure call (intent(out) dummies)
> so we have to evaluate values depending on them beforehand (PR 92178).

That is one reason.  Another one, also pointed out in PR92178 by Tobias' review
of Steve's draft, is the first testcase at

https://gcc.gnu.org/legacy-ml/gcc-patches/2019-10/msg01970.html

This is reminiscent of an issue reported for the MERGE intrinsic (pr107874,
fixed so far, but there is a remaining issue in pr105371).

> > + if (e->expr_type != EXPR_FUNCTION
> > + || !(gfc_expr_attr (e).pointer || gfc_expr_attr (e).proc_pointer))
> 
> Merge with the outer condition?

Yes.  The above form was intended more for proof-of-concept and readability
than for coding standards.

> > +   need_temp = true;
> > +   }
> > +
> > +  if (need_temp)
> > +   {
> > + if (cond_temp == NULL_TREE)
> > +   parmse.expr = gfc_evaluate_now (parmse.expr, );
> 
> I'm not sure about this.  The condition to set need_temp looks quite general
> (especially it matches non-scalar cases, doesn't it?), but
> gfc_conv_expr_reference should already take care of creating a variable, so
> that a temporary is missing only for value dummies, I think.  I would rather
> move this to the place specific to value dummies.

I agree in principle.  The indentation level is already awful in the specific
place, which calls for thoughts about refactoring that mega-loop over the
arguments that currently spans far more than 1000 source code lines.

> I think this PR is only about scalars with basic types, is there the same
> problem with derived types?  with classes?
> I guess arrays are different as they are always by reference?

For the current documentation of the argument passing convention see:

https://gcc.gnu.org/onlinedocs/gfortran/Argument-passing-conventions.html

"For OPTIONAL dummy arguments, an absent argument is denoted by a NULL pointer,
except for scalar dummy arguments of intrinsic type which have the VALUE
attribute. For those, a hidden Boolean argument (logical(kind=C_bool),value) is
used to indicate whether the argument is present."

My understanding is that for these scalar arguments we do need something that
can be passed by value.
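
Editorial note: to make the quoted convention concrete, here is a hypothetical C-level view of a Fortran procedure with a scalar dummy declared `integer, value, optional :: step`. The function name, the parameter names and the position of the presence flag are assumptions chosen for illustration and are not taken from the gfortran sources. The point is only that the value travels by value and presence is signalled by a separate Boolean instead of a NULL pointer.

```c
#include <stdbool.h>

/* Hypothetical C view of:
     integer function next_index(index, step)
       integer, value           :: index
       integer, value, optional :: step
   The optional VALUE argument is passed by value, and a separate
   Boolean says whether the caller supplied it.  */
int next_index(int index, int step, bool step_present)
{
    /* When the argument is absent, its value slot must not be relied on.  */
    return index + (step_present ? step : 1);
}
```

A caller that omits the argument would pass an arbitrary value together with `false`, which is why the actual argument has to be something that can be passed by value even when it refers to an unallocated or disassociated object.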

We currently do not support VALUE with array arguments (F2008+), character
of length > 1, and character actual arguments are broken unless they are
constants.  There are several open PRs.

> > + else
> 
> I would rather move the else part to the place above where cond_temp is set,
> so that the code is easier to follow.
> 
> > +   {
> > + /* "Conditional temporary" to handle variables that possibly
> > +cannot be dereferenced.  Use null value as fallback.  */
> > + tree dflt_temp;
> > + gcc_assert (e->ts.type != BT_DERIVED && e->ts.type != BT_CLASS);
> > + gcc_assert (e->rank == 0);
> > + dflt_temp = gfc_create_var (TREE_TYPE (parmse.expr), "temp");
> > + TREE_STATIC (dflt_temp) = 1;
> > + 

Re: [PATCH] Introduce hardbool attribute for C

2023-06-19 Thread Bernhard Reutner-Fischer via Gcc-patches
On 16 June 2023 07:35:27 CEST, Alexandre Oliva via Gcc-patches 
 wrote:

index 0..634feaed4deef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/hardbool-err.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+typedef _Bool __attribute__ ((__hardbool__))
+hbbl; /* { dg-error "integral types" } */
+
+typedef double __attribute__ ((__hardbool__))
+hbdbl; /* { dg-error "integral types" } */
+
+enum x;
+typedef enum x __attribute__ ((__hardbool__))
+hbenum; /* { dg-error "integral types" } */
+
+struct s;
+typedef struct s __attribute__ ((__hardbool__))
+hbstruct; /* { dg-error "integral types" } */
+
+typedef int __attribute__ ((__hardbool__ (0, 0)))
+hb00; /* { dg-error "different values" } */
+
+typedef int __attribute__ ((__hardbool__ (4, 16))) hb4x;
+struct s {
+ hb4x m:2;
+}; /* { dg-error "is a GCC extension|different values" } */
+/* { dg-warning "changes value" "warning" { target *-*-* } .-1 } */
+
+hb4x __attribute__ ((vector_size (4 * sizeof (hb4x))))
+vvar; /* { dg-error "invalid vector type" } */

Arm-chair, tinfoil hat still on, didn't look closely, hence:

I don't see explicit tests with _Complex nor __complex__. Would we want to 
check these here, or are they handled through the "underlying" tests above?

I'd welcome a Fortran interop note in the docs, as hinted previously, to cover 
out-of-the-box behavior. It's probably reasonably unlikely, but better safe 
than sorry?
cheers,


Re: [PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches
Kewen:

On Mon, 2023-06-19 at 14:08 +0800, Kewen.Lin wrote:
> > 



> Hi Carl,
> 
> on 2023/6/17 01:57, Carl Love wrote:
> > overloaded instance. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > overloaded definitions.
> > * config/vsx.md (V2DI_DI): New mode iterator.
> 
> Missing an entry for DI_to_TI.

Oops, missed that.  Sorry, fixed.

> > 



> 
> >  
> >const signed long long __builtin_vsx_scalar_extract_expq
> > (_Float128);
> > -VSEEQP xsxexpqp_kf {}
> > +VSEEQP xsxexpqp_kf_di {}
> > +
> > +  vull __builtin_vsx_scalar_extract_exp_to_vec (_Float128);
> > +VSEEXPKF xsxexpqp_kf_v2di {}
> 
> As I pointed out previously, the related id is VSEEQP, since both of
> them

Oops, I guess I forgot to change that.  Sorry.

> have kf in their names, having KF in its id doesn't look good IMHO.
> How about VSEEQPV instead of VSEEXPKF?  It's also consistent with
> what
> we use for VSIEQP.

Yup, makes sense, changed to VSEEQPV.
> 
> >  
> >const signed __int128 __builtin_vsx_scalar_extract_sigq
> > (_Float128);
> > -VSESQP xsxsigqp_kf {}
> > +VSESQP xsxsigqp_kf_ti {}
> > +
> > +  vuq __builtin_vsx_scalar_extract_sig_to_vec (_Float128);
> > +VSESIGKF xsxsigqp_kf_v1ti {}
> 
> Similar to the above, s/VSESIGKF/VSESQPV/
 
Changed to VSESQPV.
> 
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned
> > __int128, \
> >   unsigned long
> > long);
> > -VSIEQP xsiexpqp_kf {}
> > +VSIEQP xsiexpqp_kf_di {}
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, \
> >unsigned
> > long long);
> >  VSIEQPF xsiexpqpf_kf {}
> >  
> > +  const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
> > +VSIEQPV xsiexpqp_kf_v2di {}
> > +
> >const signed int __builtin_vsx_scalar_test_data_class_qp
> > (_Float128, \
> >  const
> > int<7>);
> >  VSTDCQP xststdcqp_kf {}
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index 8555174d36e..11060f697db 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1929,11 +1929,15 @@ altivec_resolve_overloaded_builtin
> > (location_t loc, tree fndecl,
> >128-bit variant of built-in function.  */
> > if (GET_MODE_PRECISION (arg1_mode) > 64)
> >   {
> > -   /* If first argument is of float variety, choose variant
> > -  that expects __ieee128 argument.  Otherwise, expect
> > -  __int128 argument.  */
> > +   /* If first argument is of float variety, choose the
> > variant that
> > +  expects __ieee128 argument.  If the first argument is
> > vector
> > +  int, choose the variant that expects vector unsigned
> > +  __int128 argument.  Otherwise, expect scalar __int128
> > argument.
> > +   */
> > if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
> >   instance_code = RS6000_BIF_VSIEQPF;
> > +   else if (GET_MODE_CLASS (arg1_mode) == MODE_VECTOR_INT)
> > + instance_code = RS6000_BIF_VSIEQPV;
> > else
> >   instance_code = RS6000_BIF_VSIEQP;
> >   }
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..05a5ca6a04d 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -4515,6 +4515,18 @@
> >  VSIEQP
> >_Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned
> > long long);
> >  VSIEQPF
> > +  _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
> > +VSIEQPV
> > +
> > +[VEC_VSEEV, scalar_extract_exp_to_vec, \
> > +__builtin_vec_scalar_extract_exp_to_vector]
> > +  vull __builtin_vec_scalar_extract_exp_to_vector (_Float128);
> > +VSEEXPKF
> > +
> 
> Need to update if the above changes.

changed 
> 
> > +[VEC_VSESV, scalar_extract_sig_to_vec, \
> > +__builtin_vec_scalar_extract_sig_to_vector]
> > +  vuq __builtin_vec_scalar_extract_sig_to_vector (_Float128);
> > +VSESIGKF
> >  
> 
> Ditto.

changed

> 



> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-
> > exp-8.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-
> > 8.c
> > new file mode 100644
> > index 000..e24e09012d9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-8.c
> > @@ -0,0 +1,58 @@
> > +/* { dg-do run { target { powerpc*-*-* } } } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p9vector_hw } */
> > +/* { dg-options "-mdejagnu-cpu=power9 -save-temps" } */
> > +
> > +#include 
> > +#include 
> > +
> > +#if 

Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/16/23 06:34, Richard Biener via Gcc-patches wrote:

IVOPTs has strip_offset which suffers from the same issues regarding
integer overflow that split_constant_offset did but the latter was
fixed quite some time ago.  The following implements strip_offset
in terms of split_constant_offset, removing the redundant and
incorrect implementation.

The implementations are not exactly the same, strip_offset relies
on ptrdiff_tree_p to fend off too large offsets while split_constant_offset
simply assumes those do not happen and truncates them.  By
the same means strip_offset also handles POLY_INT_CSTs but
split_constant_offset does not.  Massaging the latter to
behave like strip_offset in those cases might be the way to go?
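
Editorial note: the following is a minimal sketch of the kind of integer-overflow hazard involved when a constant offset is split out of an expression. It is a generic illustration and not the testcase from the PR; the function names are hypothetical. When the addition is done in a narrower unsigned type and then widened, pulling the constant outside the conversion changes the result once the narrow addition wraps.

```c
#include <stdint.h>

/* The addition happens in 32 bits and may wrap before widening.  */
uint64_t widen_after_add(uint32_t base)
{
    return (uint64_t)(uint32_t)(base + 1u);
}

/* The constant has been "split off" and is added after widening,
   which is only equivalent when the 32-bit addition cannot wrap.  */
uint64_t add_after_widen(uint32_t base)
{
    return (uint64_t)base + 1u;
}
```

The two functions agree for small inputs, but for `UINT32_MAX` the first returns 0 and the second returns 4294967296.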

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Comments?

Thanks,
Richard.

PR tree-optimization/110243
* tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
(strip_offset): Make it a wrapper around split_constant_offset.

* gcc.dg/torture/pr110243.c: New testcase.

Your call -- IMHO you know this code far better than I.

jeff


Re: [PATCH] RISC-V: Add VLS modes for GNU vectors

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 17:06, Juzhe-Zhong wrote:

This is a proposal patch that is **NOT** ready to push, since after this
patch the total number of machine modes will exceed 255, which will cause an ICE
in LTO:
   internal compiler error: in bp_pack_int_in_range, at data-streamer.h:290
Right.  Note that an ack from Jakub or Richi will be sufficient for the 
LTO fixes to go forward.





We need to add VLS modes for the following reasons:
1. Enhance GNU vectors codegen:
For example:
  typedef int32_t vnx8si __attribute__ ((vector_size (32)));

  __attribute__ ((noipa)) void
  f_vnx8si (int32_t * in, int32_t * out)
  {
vnx8si v = *(vnx8si*)in;
*(vnx8si *) out = v;
  }

compile option: --param=riscv-autovec-preference=scalable
 before this patch:
 f_vnx8si:
 ld  a2,0(a0)
 ld  a3,8(a0)
 ld  a4,16(a0)
 ld  a5,24(a0)
 addi sp,sp,-32
 sd  a2,0(a1)
 sd  a3,8(a1)
 sd  a4,16(a1)
 sd  a5,24(a1)
 addi sp,sp,32
 jr  ra

After this patch:
f_vnx8si:
 vsetivli zero,8,e32,m2,ta,ma
 vle32.v v2,0(a0)
 vse32.v v2,0(a1)
 ret

2. Enhance VLA SLP:
void
f (uint8_t *restrict a, uint8_t *restrict b, uint8_t *restrict c)
{
   for (int i = 0; i < 100; ++i)
 {
   a[i * 8] = b[i * 8] + c[i * 8];
   a[i * 8 + 1] = b[i * 8] + c[i * 8 + 1];
   a[i * 8 + 2] = b[i * 8 + 2] + c[i * 8 + 2];
   a[i * 8 + 3] = b[i * 8 + 2] + c[i * 8 + 3];
   a[i * 8 + 4] = b[i * 8 + 4] + c[i * 8 + 4];
   a[i * 8 + 5] = b[i * 8 + 4] + c[i * 8 + 5];
   a[i * 8 + 6] = b[i * 8 + 6] + c[i * 8 + 6];
   a[i * 8 + 7] = b[i * 8 + 6] + c[i * 8 + 7];
 }
}


..
Loop body:
  ...
  vrgatherei16.vv...
  ...

Tail:
 lbu a4,792(a1)
 lbu a5,792(a2)
 addw a5,a5,a4
 sb  a5,792(a0)
 lbu a5,793(a2)
 addw a5,a5,a4
 sb  a5,793(a0)
 lbu a4,794(a1)
 lbu a5,794(a2)
 addw a5,a5,a4
 sb  a5,794(a0)
 lbu a5,795(a2)
 addw a5,a5,a4
 sb  a5,795(a0)
 lbu a4,796(a1)
 lbu a5,796(a2)
 addw a5,a5,a4
 sb  a5,796(a0)
 lbu a5,797(a2)
 addw a5,a5,a4
 sb  a5,797(a0)
 lbu a4,798(a1)
 lbu a5,798(a2)
 addw a5,a5,a4
 sb  a5,798(a0)
 lbu a5,799(a2)
 addw a5,a5,a4
 sb  a5,799(a0)
 ret

The tail elements need VLS modes to be vectorized, as ARM SVE does:

f:
 mov x3, 0
 cntb x5
 mov x4, 792
 whilelo p7.b, xzr, x4
.L2:
 ld1b z31.b, p7/z, [x1, x3]
 ld1b z30.b, p7/z, [x2, x3]
 trn1 z31.b, z31.b, z31.b
 add z31.b, z31.b, z30.b
 st1b z31.b, p7, [x0, x3]
 add x3, x3, x5
 whilelo p7.b, x3, x4
 b.any   .L2
Tail:
 ldr b31, [x1, 792]
 ldr b27, [x1, 794]
 ldr b28, [x1, 796]
 dup v31.8b, v31.b[0]
 ldr b29, [x1, 798]
 ldr d30, [x2, 792]
 ins v31.b[2], v27.b[0]
 ins v31.b[3], v27.b[0]
 ins v31.b[4], v28.b[0]
 ins v31.b[5], v28.b[0]
 ins v31.b[6], v29.b[0]
 ins v31.b[7], v29.b[0]
 add v31.8b, v30.8b, v31.8b
 str d31, [x0, 792]
 ret

Notice that ARM SVE uses Advanced SIMD (Neon) modes to vectorize the tail.






gcc/ChangeLog:

 * config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add VLS modes for 
GNU vectors.
 (ADJUST_ALIGNMENT): Ditto.
 (ADJUST_BYTESIZE): Ditto.

 (ADJUST_PRECISION): Ditto.
 (VECTOR_MODES): Ditto.
 * config/riscv/riscv-protos.h (riscv_v_ext_vls_mode_p): Ditto.
 (get_regno_alignment): Ditto.
 * config/riscv/riscv-v.cc (INCLUDE_ALGORITHM): Ditto.
 (const_vlmax_p): Ditto.
 (legitimize_move): Ditto.
 (get_vlmul): Ditto.
 (get_regno_alignment): Ditto.
 (get_ratio): Ditto.
 (get_vector_mode): Ditto.
 * config/riscv/riscv-vector-switch.def (VLS_ENTRY): Ditto.
 * config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Ditto.
 (VLS_ENTRY): Ditto.
 (riscv_v_ext_mode_p): Ditto.
 (riscv_hard_regno_nregs): Ditto.
 (riscv_hard_regno_mode_ok): Ditto.
 * config/riscv/riscv.md: Ditto.
 * config/riscv/vector-iterators.md: Ditto.
 * config/riscv/vector.md: Ditto.
 * config/riscv/autovec-vls.md: New file.

---
So I expected we were going to have to define some static length 
patterns at some point.  So this isn't a huge surprise.








diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6421e933ca9..6fc1c433069 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ 

Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-19 Thread Richard Biener via Gcc-patches



> Am 19.06.2023 um 19:03 schrieb Alexander Monakov :
> 
> 
> Ping. OK for trunk?

Ok if the FE maintainers do not object within 48h.

Thanks,
Richard 

>> On Mon, 5 Jun 2023, Alexander Monakov wrote:
>> 
>> Ping for the front-end maintainers' input.
>> 
>>> On Mon, 22 May 2023, Richard Biener wrote:
>>> 
>>> On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
>>>  wrote:
 
 Implement -ffp-contract=on for C and C++ without changing default
 behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
>>> 
>>> The documentation changes mention the defaults are changed for
>>> standard modes, I suppose you want to remove that hunk.
>>> 
 gcc/c-family/ChangeLog:
 
* c-gimplify.cc (fma_supported_p): New helper.
(c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
contraction.
 
 gcc/ChangeLog:
 
* common.opt (fp_contract_mode) [on]: Remove fallback.
* config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
* doc/invoke.texi (-ffp-contract): Update.
* trans-mem.cc (diagnose_tm_1): Skip internal function calls.
 ---
 gcc/c-family/c-gimplify.cc | 78 ++
 gcc/common.opt |  3 +-
 gcc/config/sh/sh.md|  2 +-
 gcc/doc/invoke.texi|  8 ++--
 gcc/trans-mem.cc   |  3 ++
 5 files changed, 88 insertions(+), 6 deletions(-)
 
 diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
 index ef5c7d919f..f7635d3b0c 100644
 --- a/gcc/c-family/c-gimplify.cc
 +++ b/gcc/c-family/c-gimplify.cc
 @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-ubsan.h"
 #include "tree-nested.h"
 #include "context.h"
 +#include "tree-pass.h"
 +#include "internal-fn.h"
 
 /*  The gimplification pass converts the language-dependent trees
 (ld-trees) emitted by the parser into language-independent trees
 @@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree 
 body)
   return bind;
 }
 
 +/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
 +
 +static bool
 +fma_supported_p (enum internal_fn fn, tree type)
 +{
 +  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
 +}
 +
 /* Gimplification of expression trees.  */
 
 /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
 @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
 ATTRIBUTE_UNUSED,
break;
   }
 
 +case PLUS_EXPR:
 +case MINUS_EXPR:
 +  {
 +   tree type = TREE_TYPE (*expr_p);
 +   /* For -ffp-contract=on we need to attempt FMA contraction only
 +  during initial gimplification.  Late contraction across 
 statement
 +  boundaries would violate language semantics.  */
 +   if (SCALAR_FLOAT_TYPE_P (type)
 +   && flag_fp_contract_mode == FP_CONTRACT_ON
 +   && cfun && !(cfun->curr_properties & PROP_gimple_any)
 +   && fma_supported_p (IFN_FMA, type))
 + {
 +   bool neg_mul = false, neg_add = code == MINUS_EXPR;
 +
 +   tree *op0_p = &TREE_OPERAND (*expr_p, 0);
 +   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
 +
 +   /* Look for ±(x * y) ± z, swapping operands if necessary.  */
 +   if (TREE_CODE (*op0_p) == NEGATE_EXPR
 +   && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
 + /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
 +   else if (TREE_CODE (*op0_p) != MULT_EXPR)
 + {
 +   std::swap (op0_p, op1_p);
 +   std::swap (neg_mul, neg_add);
 + }
 +   if (TREE_CODE (*op0_p) == NEGATE_EXPR)
 + {
 +   op0_p = &TREE_OPERAND (*op0_p, 0);
 +   neg_mul = !neg_mul;
 + }
 +   if (TREE_CODE (*op0_p) != MULT_EXPR)
 + break;
 +   auto_vec ops (3);
 +   ops.quick_push (TREE_OPERAND (*op0_p, 0));
 +   ops.quick_push (TREE_OPERAND (*op0_p, 1));
 +   ops.quick_push (*op1_p);
 +
 +   enum internal_fn ifn = IFN_FMA;
 +   if (neg_mul)
 + {
 +   if (fma_supported_p (IFN_FNMA, type))
 + ifn = IFN_FNMA;
 +   else
 + ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
 + }
 +   if (neg_add)
 + {
 +   enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : 
 IFN_FNMS;
 +   if (fma_supported_p (ifn2, type))
 + ifn = ifn2;
 +   else
 + 

Re: Tiny phiprop compile time optimization

2023-06-19 Thread Richard Biener via Gcc-patches



> Am 19.06.2023 um 20:08 schrieb Andrew Pinski via Gcc-patches 
> :
> 
> On Mon, Jun 19, 2023 at 1:32 AM Richard Biener via Gcc-patches
>  wrote:
>> 
>>> On Mon, 19 Jun 2023, Jan Hubicka wrote:
>>> 
>>> Hi,
>>> this patch avoids unnecessary post dominator and update_ssa in phiprop.
>>> 
>>> Bootstrapped/regtested x86_64-linux, OK?
>>> 
>>> gcc/ChangeLog:
>>> 
>>>  * tree-ssa-phiprop.cc (propagate_with_phi): Add 
>>> post_dominators_computed;
>>>  compute post dominators lazily.
>>>  (const pass_data pass_data_phiprop): Remove TODO_update_ssa.
>>>  (pass_phiprop::execute): Update; return TODO_update_ssa if something
>>>  changed.
>>> 
>>> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
>>> index 3cb4900b6be..87e3a2ccf3a 100644
>>> --- a/gcc/tree-ssa-phiprop.cc
>>> +++ b/gcc/tree-ssa-phiprop.cc
>>> @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
>>> 
>>> static bool
>>> propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
>>> - size_t n)
>>> + size_t n, bool *post_dominators_computed)
>>> {
>>>   tree ptr = PHI_RESULT (phi);
>>>   gimple *use_stmt;
>>> @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
>>> phiprop_d *phivn,
>>>   gimple *def_stmt;
>>>   tree vuse;
>>> 
>>> +  if (!*post_dominators_computed)
>>> +{
>>> +   calculate_dominance_info (CDI_POST_DOMINATORS);
>>> +   *post_dominators_computed = true;
>> 
>> I think you can save the parameter by using dom_info_available_p () here
>> and ...
>> 
>>> + }
>>> +
>>>   /* Only replace loads in blocks that post-dominate the PHI node.  That
>>>  makes sure we don't end up speculating loads.  */
>>>   if (!dominated_by_p (CDI_POST_DOMINATORS,
>>> @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop =
>>>   0, /* properties_provided */
>>>   0, /* properties_destroyed */
>>>   0, /* todo_flags_start */
>>> -  TODO_update_ssa, /* todo_flags_finish */
>>> +  0, /* todo_flags_finish */
>>> };
>>> 
>>> class pass_phiprop : public gimple_opt_pass
>>> @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun)
>>>   gphi_iterator gsi;
>>>   unsigned i;
>>>   size_t n;
>>> +  bool post_dominators_computed = false;
>>> 
>>>   calculate_dominance_info (CDI_DOMINATORS);
>>> -  calculate_dominance_info (CDI_POST_DOMINATORS);
>>> 
>>>   n = num_ssa_names;
>>>   phivn = XCNEWVEC (struct phiprop_d, n);
>>> @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun)
>>>   if (bb_has_abnormal_pred (bb))
>>>  continue;
>>>   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next ())
>>> - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
>>> + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n,
>>> +  &post_dominators_computed);
>>> }
>>> 
>>>   if (did_something)
>>> @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun)
>>> 
>>>   free (phivn);
>>> 
>>> -  free_dominance_info (CDI_POST_DOMINATORS);
>>> +  if (post_dominators_computed)
>>> +free_dominance_info (CDI_POST_DOMINATORS);
>> 
>> unconditionally free_dominance_info here.
>> 
>>> -  return 0;
>>> +  return did_something ? TODO_update_ssa : 0;
>> 
>> I guess that change is following general practice and good to catch
>> undesired changes (update_ssa will exit early when there's nothing
>> to do anyway).
> 
> I wonder if TODO_update_ssa_only_virtuals should be used here rather
> than TODO_update_ssa as the code produces ssa names already and just
> adds memory loads/stores. But I could be wrong.

I guess it should be able to update virtual SSA form itself.  But it’s been 
some time since I wrote the pass …

> 
> Thanks,
> Andrew Pinski
> 
> 
>> 
>> OK with those changes.


Re: Tiny phiprop compile time optimization

2023-06-19 Thread Andrew Pinski via Gcc-patches
On Mon, Jun 19, 2023 at 1:32 AM Richard Biener via Gcc-patches
 wrote:
>
> On Mon, 19 Jun 2023, Jan Hubicka wrote:
>
> > Hi,
> > this patch avoids unnecessary post dominator and update_ssa in phiprop.
> >
> > Bootstrapped/regtested x86_64-linux, OK?
> >
> > gcc/ChangeLog:
> >
> >   * tree-ssa-phiprop.cc (propagate_with_phi): Add 
> > post_dominators_computed;
> >   compute post dominators lazily.
> >   (const pass_data pass_data_phiprop): Remove TODO_update_ssa.
> >   (pass_phiprop::execute): Update; return TODO_update_ssa if something
> >   changed.
> >
> > diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
> > index 3cb4900b6be..87e3a2ccf3a 100644
> > --- a/gcc/tree-ssa-phiprop.cc
> > +++ b/gcc/tree-ssa-phiprop.cc
> > @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
> >
> >  static bool
> >  propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
> > - size_t n)
> > + size_t n, bool *post_dominators_computed)
> >  {
> >tree ptr = PHI_RESULT (phi);
> >gimple *use_stmt;
> > @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct 
> > phiprop_d *phivn,
> >gimple *def_stmt;
> >tree vuse;
> >
> > +  if (!*post_dominators_computed)
> > +{
> > +   calculate_dominance_info (CDI_POST_DOMINATORS);
> > +   *post_dominators_computed = true;
>
> I think you can save the parameter by using dom_info_available_p () here
> and ...
>
> > + }
> > +
> >/* Only replace loads in blocks that post-dominate the PHI node.  
> > That
> >   makes sure we don't end up speculating loads.  */
> >if (!dominated_by_p (CDI_POST_DOMINATORS,
> > @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop =
> >0, /* properties_provided */
> >0, /* properties_destroyed */
> >0, /* todo_flags_start */
> > -  TODO_update_ssa, /* todo_flags_finish */
> > +  0, /* todo_flags_finish */
> >  };
> >
> >  class pass_phiprop : public gimple_opt_pass
> > @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun)
> >gphi_iterator gsi;
> >unsigned i;
> >size_t n;
> > +  bool post_dominators_computed = false;
> >
> >calculate_dominance_info (CDI_DOMINATORS);
> > -  calculate_dominance_info (CDI_POST_DOMINATORS);
> >
> >n = num_ssa_names;
> >phivn = XCNEWVEC (struct phiprop_d, n);
> > @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun)
> >if (bb_has_abnormal_pred (bb))
> >   continue;
> >for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next ())
> > - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
> > + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n,
> > +  &post_dominators_computed);
> >  }
> >
> >if (did_something)
> > @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun)
> >
> >free (phivn);
> >
> > -  free_dominance_info (CDI_POST_DOMINATORS);
> > +  if (post_dominators_computed)
> > +free_dominance_info (CDI_POST_DOMINATORS);
>
> unconditionally free_dominance_info here.
>
> > -  return 0;
> > +  return did_something ? TODO_update_ssa : 0;
>
> I guess that change is following general practice and good to catch
> undesired changes (update_ssa will exit early when there's nothing
> to do anyway).

I wonder if TODO_update_ssa_only_virtuals should be used here rather
than TODO_update_ssa as the code produces ssa names already and just
adds memory loads/stores. But I could be wrong.

Thanks,
Andrew Pinski


>
> OK with those changes.
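
The review comments in this thread (query whether the information is available instead of threading a flag through, and free unconditionally at the end) can be sketched outside GCC as follows. This is a minimal stand-alone illustration, not the phiprop code: `CachedAnalysis`, `process_candidate` and `run_pass` are hypothetical names, and the "analysis" is only a counter.

```cpp
#include <cassert>

// Hypothetical stand-in for an expensive cached analysis such as
// post-dominator information.  None of these names come from GCC.
class CachedAnalysis {
public:
    bool available() const { return available_; }
    void compute() { ++computations_; available_ = true; }
    // Releasing is a no-op when nothing was computed, so callers
    // can release unconditionally.
    void release() { available_ = false; }
    int computations() const { return computations_; }

private:
    bool available_ = false;
    int computations_ = 0;
};

// Handles one candidate.  The analysis is computed on first need only,
// and its state is queried from the cache rather than from an extra flag.
bool process_candidate(CachedAnalysis &info, bool needs_analysis) {
    if (!needs_analysis)
        return false;
    if (!info.available())
        info.compute();
    assert(info.available());
    return true;
}

// Walks all candidates and returns how many times the analysis ran.
int run_pass(const bool *needs, int n) {
    CachedAnalysis info;
    bool did_something = false;
    for (int i = 0; i < n; ++i)
        did_something |= process_candidate(info, needs[i]);
    (void) did_something;
    info.release();  // unconditional, mirroring the review suggestion
    return info.computations();
}
```

When no candidate needs the analysis it is never computed, which is the compile-time saving the patch is after.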


Re: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/18/23 07:16, 钟居哲 wrote:

Thanks for cleaning up the code for the future ABI support patch.
Let's wait for comments from Jeff or Robin.

Looks reasonable to me given the state we're in WRT psabi and vectors.

jeff


Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

2023-06-19 Thread Jeff Law via Gcc-patches




On 6/19/23 05:41, Richard Biener via Gcc-patches wrote:

On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches
 wrote:


Hi,


With -O3 -fsignaling-nans -fno-signed-zeros, compiler should not simplify 'x + 
0.0' to 'x'.



OK if you bootstrapped / tested this change.
I suspect Toru doesn't have write access.  So I went ahead and did an 
x86 bootstrap & regression test, which passed.  The ChangeLog entry 
needed fleshing out a bit, and I fixed a minor whitespace problem in the 
patch itself.


Pushed to the trunk.


jeff


[Bug rtl-optimization/110305] Incorrect optimization with -O3 -fsignaling-nans -fno-signed-zeros

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110305

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Jeff Law :

https://gcc.gnu.org/g:827b2a279fc6ad5bb76e4d2c2eb3432955b5e11c

commit r14-1952-g827b2a279fc6ad5bb76e4d2c2eb3432955b5e11c
Author: Toru Kisuki 
Date:   Mon Jun 19 11:51:09 2023 -0600

Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

gcc/
PR rtl-optimization/110305
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Handle HONOR_SNANS for x + 0.0.
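
Why the fold is unsafe can be seen in a small stand-alone program. This is a sketch under assumptions, not the testcase from the PR: it assumes an IEEE 754 target where passing a double by value does not quiet a signaling NaN (true for x86-64 SSE and AArch64), and it uses `volatile` so the addition happens at run time. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cfenv>
#include <limits>

// Returns true if computing x + 0.0 raises the IEEE "invalid" exception.
// The volatile objects keep the compiler from folding the addition away.
bool add_zero_raises_invalid(double x) {
    volatile double in = x;
    volatile double zero = 0.0;
    std::feclearexcept(FE_INVALID);
    volatile double out = in + zero;  // a signaling NaN operand raises FE_INVALID here
    (void) out;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

For a signaling NaN input the addition raises the invalid exception and produces a quiet NaN, so replacing `x + 0.0` with `x` would lose both effects. For ordinary values such as `1.0` nothing is raised.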

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-06-19 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #55329|0   |1
is obsolete||

--- Comment #66 from Jakub Jelinek  ---
Created attachment 55364
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55364&action=edit
gcc14-bitint-wip.patch

Updated patch.  This can already do some simple lowering of the large/huge
_BitInt operations, like:
void
foo (_BitInt(192) *x, _BitInt(192) *y, _BitInt(135) *z, _BitInt(135) *w)
{
  x[0] &= y[0];
  x[1] |= y[1];
  x[2] ^= y[2];
  x[3] = ~y[3];
  z[0] &= w[0];
  z[1] |= w[1];
  z[2] ^= w[2];
  z[3] = ~w[3];
}

_BitInt(517) a, b, c, d, e, f;

void
bar (void)
{
  a &= b;
  c |= b;
  d ^= b;
  e = ~f;
}

Additions/subtractions/left shift by small constant next.

[Bug c++/110312] -Wcast-align=strict warning despite alignas

2023-06-19 Thread f.heckenbach--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110312

--- Comment #2 from Frank Heckenbach  ---
(In reply to Andrew Pinski from comment #1)
> The decl has the increased alignment but the type does not in this case.
> 
> So I think the warning is still correct.

So there's no way around it other than disabling the warning, correct?

[Bug c++/110312] -Wcast-align=strict warning despite alignas

2023-06-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110312

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||diagnostic

--- Comment #1 from Andrew Pinski  ---
The decl has the increased alignment but the type does not in this case.

So I think the warning is still correct.

Re: [PATCH] debug/110295 - mixed up early/late debug for member DIEs

2023-06-19 Thread Jason Merrill via Gcc-patches

On 6/19/23 06:15, Richard Biener wrote:

When we process a scope typedef during early debug creation and
we have already created a DIE for the type when the decl is
TYPE_DECL_IS_STUB and this DIE is still in limbo we end up
just re-parenting that type DIE instead of properly creating
a DIE for the decl, eventually picking up the now completed
type and creating DIEs for the members.  Instead this is currently
deferred to the second time we come here, when we annotate the
DIEs with locations late where now the type DIE is no longer
in limbo and we fall through doing the job for the decl.

The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.

[LTO] Bootstrapped on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

PR debug/110295
* dwarf2out.cc (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.

* g++.dg/debug/pr110295.C: New testcase.
---
  gcc/dwarf2out.cc  |  3 ++-
  gcc/testsuite/g++.dg/debug/pr110295.C | 19 +++
  2 files changed, 21 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/debug/pr110295.C

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d89ffa66847..e70c47cec8d 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -26533,7 +26533,8 @@ process_scope_var (tree stmt, tree decl, tree origin, 
dw_die_ref context_die)
  
if (die != NULL && die->die_parent == NULL)

  add_child_die (context_die, die);


I wonder about reorganizing the function a bit to unify this parent 
setting with the one a bit below, which already falls through to 
gen_decl_die:



  if (decl && DECL_P (decl))
{
  die = lookup_decl_die (decl);

  /* Early created DIEs do not have a parent as the decls refer 
 to the function as DECL_CONTEXT rather than the BLOCK.  */

  if (die && die->die_parent == NULL)
{
  gcc_assert (in_lto_p);
  add_child_die (context_die, die);
}
}


OK either way.

Jason



Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Manolis Tsamis
On Mon, Jun 19, 2023 at 7:57 PM Thiago Jung Bauermann
 wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich  writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> >> > propagation.
> >> Thanks for the clarification.  This is OK for the trunk.  It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk.  Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, 
> sp 1
>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
> -fno-stack-protector  check-function-bodies caller_pred
> FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
> #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
> #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
> z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
> z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
> z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
> z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
> z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
> z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
> z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve 
> -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - 
> z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: 

Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-19 Thread Alexander Monakov via Gcc-patches


Ping. OK for trunk?

On Mon, 5 Jun 2023, Alexander Monakov wrote:

> Ping for the front-end maintainers' input.
> 
> On Mon, 22 May 2023, Richard Biener wrote:
> 
> > On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
> >  wrote:
> > >
> > > Implement -ffp-contract=on for C and C++ without changing default
> > > behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
> > 
> > The documentation changes mention the defaults are changed for
> > standard modes, I suppose you want to remove that hunk.
> > 
> > > gcc/c-family/ChangeLog:
> > >
> > > * c-gimplify.cc (fma_supported_p): New helper.
> > > (c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
> > > contraction.
> > >
> > > gcc/ChangeLog:
> > >
> > > * common.opt (fp_contract_mode) [on]: Remove fallback.
> > > * config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
> > > * doc/invoke.texi (-ffp-contract): Update.
> > > * trans-mem.cc (diagnose_tm_1): Skip internal function calls.
> > > ---
> > >  gcc/c-family/c-gimplify.cc | 78 ++
> > >  gcc/common.opt |  3 +-
> > >  gcc/config/sh/sh.md|  2 +-
> > >  gcc/doc/invoke.texi|  8 ++--
> > >  gcc/trans-mem.cc   |  3 ++
> > >  5 files changed, 88 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> > > index ef5c7d919f..f7635d3b0c 100644
> > > --- a/gcc/c-family/c-gimplify.cc
> > > +++ b/gcc/c-family/c-gimplify.cc
> > > @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "c-ubsan.h"
> > >  #include "tree-nested.h"
> > >  #include "context.h"
> > > +#include "tree-pass.h"
> > > +#include "internal-fn.h"
> > >
> > >  /*  The gimplification pass converts the language-dependent trees
> > >  (ld-trees) emitted by the parser into language-independent trees
> > > @@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree 
> > > body)
> > >return bind;
> > >  }
> > >
> > > +/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
> > > +
> > > +static bool
> > > +fma_supported_p (enum internal_fn fn, tree type)
> > > +{
> > > +  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
> > > +}
> > > +
> > >  /* Gimplification of expression trees.  */
> > >
> > >  /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
> > > @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
> > > ATTRIBUTE_UNUSED,
> > > break;
> > >}
> > >
> > > +case PLUS_EXPR:
> > > +case MINUS_EXPR:
> > > +  {
> > > +   tree type = TREE_TYPE (*expr_p);
> > > +   /* For -ffp-contract=on we need to attempt FMA contraction only
> > > +  during initial gimplification.  Late contraction across 
> > > statement
> > > +  boundaries would violate language semantics.  */
> > > +   if (SCALAR_FLOAT_TYPE_P (type)
> > > +   && flag_fp_contract_mode == FP_CONTRACT_ON
> > > +   && cfun && !(cfun->curr_properties & PROP_gimple_any)
> > > +   && fma_supported_p (IFN_FMA, type))
> > > + {
> > > +   bool neg_mul = false, neg_add = code == MINUS_EXPR;
> > > +
> > > > +   tree *op0_p = &TREE_OPERAND (*expr_p, 0);
> > > > +   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
> > > +
> > > +   /* Look for ±(x * y) ± z, swapping operands if necessary.  */
> > > +   if (TREE_CODE (*op0_p) == NEGATE_EXPR
> > > +   && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
> > > + /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
> > > +   else if (TREE_CODE (*op0_p) != MULT_EXPR)
> > > + {
> > > +   std::swap (op0_p, op1_p);
> > > +   std::swap (neg_mul, neg_add);
> > > + }
> > > +   if (TREE_CODE (*op0_p) == NEGATE_EXPR)
> > > + {
> > > > +   op0_p = &TREE_OPERAND (*op0_p, 0);
> > > +   neg_mul = !neg_mul;
> > > + }
> > > +   if (TREE_CODE (*op0_p) != MULT_EXPR)
> > > + break;
> > > > +   auto_vec<tree, 3> ops (3);
> > > +   ops.quick_push (TREE_OPERAND (*op0_p, 0));
> > > +   ops.quick_push (TREE_OPERAND (*op0_p, 1));
> > > +   ops.quick_push (*op1_p);
> > > +
> > > +   enum internal_fn ifn = IFN_FMA;
> > > +   if (neg_mul)
> > > + {
> > > +   if (fma_supported_p (IFN_FNMA, type))
> > > + ifn = IFN_FNMA;
> > > +   else
> > > + ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
> > > + }
> > > +   if (neg_add)
> > > + {
> > > +   enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : 
> > > IFN_FNMS;
> > > +   if (fma_supported_p (ifn2, type))
> > > + ifn = ifn2;
> > > +   else
> > > + ops[2] = 
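
The comment in the quoted patch says that contraction is attempted only during initial gimplification because contracting across statement boundaries would violate language semantics. The observable difference between a fused and an unfused multiply-add can be sketched as below. This is an illustration only: the helper names are hypothetical, `std::fma` stands in for a contracted expression, and `volatile` forces separately rounded intermediate results regardless of the compiler's `-ffp-contract` setting.

```cpp
#include <cassert>
#include <cmath>

// Fused form: x*y is not rounded before p is subtracted, so the result is
// the rounding error that the separately computed product p carries.
double fused_residual(double x, double y) {
    volatile double p = x * y;   // product rounded to double on its own
    return std::fma(x, y, -p);   // exact x*y minus p, rounded once
}

// Unfused form: both products are rounded before the subtraction,
// so the two roundings cancel and the difference is zero.
double unfused_residual(double x, double y) {
    volatile double p = x * y;
    volatile double q = x * y;
    return q - p;
}
```

With `x = y = 1 + 2^-27` the exact product is `1 + 2^-26 + 2^-54`, which rounds to `1 + 2^-26` in double precision. The fused form therefore returns `2^-54`, while the unfused form returns zero, so whether the compiler may fuse a given multiply and add is visible in results.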

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Thiago Jung Bauermann via Gcc-patches


Hello Manolis,

Philipp Tomsich  writes:

> On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
>>
>> On 5/25/23 06:35, Manolis Tsamis wrote:
>> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
>> > in all cases, due to maybe_mode_change returning NULL. Relax this
>> > restriction and allow propagation when no mode change is requested.
>> >
>> > gcc/ChangeLog:
>> >
>> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
>> > propagation.
>> Thanks for the clarification.  This is OK for the trunk.  It looks
>> generic enough to have value going forward now rather than waiting.
>
> Rebased, retested, and applied to trunk.  Thanks!

Our CI found a couple of tests that started failing on aarch64-linux
after this commit. I was able to confirm manually that they don't happen
in the commit immediately before this one, and also that these failures
are still present in today's trunk.

I have testsuite logs for last good commit, first bad commit and current
trunk here:

https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/

Could you please check?

These are the new failures:

Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 
1

Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve 
-fno-stack-protector  check-function-bodies caller_pred
FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
#8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), 
#8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), 
#8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - 
z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - 
z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - 
z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - 
z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - 
z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - 
z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f64.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - 
z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s16.c -march=armv8.2-a+sve 
-fno-stack-protector  scan-assembler 

Re: gcc tricore porting

2023-06-19 Thread Alexander Monakov via Gcc


On Mon, 19 Jun 2023, Mikael Pettersson via Gcc wrote:

> (Note I'm reading the gcc mailing list via the Web archives, which
> doesn't let me create "proper" replies. Oh well.)

(there's a public-inbox instance at https://inbox.sourceware.org/gcc/
but some messages are not available there)

Alexander


Re: gcc tricore porting

2023-06-19 Thread Joel Sherrill
On Mon, Jun 19, 2023, 10:36 AM Mikael Pettersson via Gcc 
wrote:

> (Note I'm reading the gcc mailing list via the Web archives, which
> doesn't let me
> create "proper" replies. Oh well.)
>
> On Sun Jun 18 09:58:56 GMT 2023,  wrote:
> > Hi, this is my first time with open source development. I worked in
> > automotive for 22 years and we (generally) were using tricore series for
> > these products. GCC doesn't compile on that platform. I left my work some
> > days ago and so I'll have some spare time in the next few months. I would
> > like to know how difficult it is to port the tricore platform on gcc and
> if
> > during this process somebody can support me as tutor and... also if the
> gcc
> > team is interested in this item...
>
> https://github.com/volumit has a port of gcc + binutils + newlib + gdb
> to Tricore,
> and it's not _that_ ancient. I have no idea where it originates from
> or how complete
> it is, but I do know the gcc-4.9.4 based one builds with some tweaks.
>


https://github.com/volumit/package_494 says there is a port in process to
gcc 9. Perhaps digging in and assessing that would be a good start.

One question is whether that code has proper assignments on file for
ultimate inclusion. That should be part of your assessment.

--joel

> I don't know anything more about it, I'm just a collector of
> cross-compilers for
> obscure / lost / forgotten / abandoned targets.
>
> /Mikael
>


[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Jan Hubicka :

https://gcc.gnu.org/g:7b34cacc5735385e7e2855d7c0a6fad60ef4a99b

commit r14-1951-g7b34cacc5735385e7e2855d7c0a6fad60ef4a99b
Author: Jan Hubicka 
Date:   Mon Jun 19 18:28:17 2023 +0200

optimize std::max early

we currently produce very bad code on loops using std::vector as a stack, since
we fail to inline push_back which in turn prevents SRA and we fail to optimize
out some store-to-load pairs.

I looked into why this function is not inlined and it is inlined by clang.  We
currently estimate it to 66 instructions and inline limits are 15 at -O2 and 30
at -O3.  Clang has similar estimate, but still decides to inline at -O2.

I looked into reason why the body is so large and one problem I spotted is the
way std::max is implemented by taking and returning reference to the values.

  const T& max( const T& a, const T& b );

This makes it necessary to store the values to memory and load them later
and max is used by code computing new size of vector on resize.

We optimize this to MAX_EXPR, but only during late optimizations.  I think this
is a common enough coding pattern and we ought to make this transparent to
early opts and IPA.  The following is easiest fix that simply adds phiprop pass
that turns the PHI of address values into PHI of values so later FRE can
propagate values across memory, phiopt discover the MAX_EXPR pattern and DSE
remove the memory stores.

gcc/ChangeLog:

PR tree-optimization/109811
PR tree-optimization/109849
* passes.def: Add phiprop to early optimization passes.
* tree-ssa-phiprop.cc: Allow cloning.

gcc/testsuite/ChangeLog:

PR tree-optimization/109811
PR tree-optimization/109849
* gcc.dg/tree-ssa/phiprop-1.c: New test.
* gcc.dg/tree-ssa/pr21463.c: Adjust template.
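
The reference-returning signature described in the commit message can be observed directly. The sketch below is an illustration, not the libstdc++ source: `larger` and `grown_capacity` are hypothetical helpers, and the growth rule is only an approximation of how a vector might pick its new size on resize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// std::max takes and returns references, so the result aliases one of the
// arguments.  Both operands therefore need an address until the optimizer
// turns the pattern into a plain value computation.
const int &larger(const int &a, const int &b) {
    return std::max(a, b);
}

// Approximate growth rule: at least double, or grow by the number of
// elements requested if that is larger.
std::size_t grown_capacity(std::size_t size, std::size_t extra) {
    return size + std::max(size, extra);
}
```

Because the returned reference is one of the inputs, comparing addresses shows which operand was selected. When the two values are equal, `std::max` returns its first argument.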

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-06-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Jan Hubicka :

https://gcc.gnu.org/g:7b34cacc5735385e7e2855d7c0a6fad60ef4a99b

commit r14-1951-g7b34cacc5735385e7e2855d7c0a6fad60ef4a99b
Author: Jan Hubicka 
Date:   Mon Jun 19 18:28:17 2023 +0200

optimize std::max early

We currently produce very bad code on loops using std::vector as a stack,
since we fail to inline push_back, which in turn prevents SRA, and we fail
to optimize out some store-to-load pairs.

I looked into why this function is not inlined by GCC while it is inlined
by clang.  We currently estimate it at 66 instructions, and the inline
limits are 15 at -O2 and 30 at -O3.  Clang has a similar estimate, but still
decides to inline at -O2.

I looked into the reason why the body is so large, and one problem I spotted
is the way std::max is implemented: it takes and returns references to the
values.

  const T& max( const T& a, const T& b );

This makes it necessary to store the values to memory and load them later,
and max is used by the code computing the new size of the vector on resize.

We optimize this to MAX_EXPR, but only during late optimizations.  I think
this is a common enough coding pattern that we ought to make it transparent
to early opts and IPA.  The following is the easiest fix: it simply adds a
phiprop pass that turns the PHI of address values into a PHI of values, so
that later FRE can propagate values across memory, phiopt can discover the
MAX_EXPR pattern, and DSE can remove the memory stores.

gcc/ChangeLog:

PR tree-optimization/109811
PR tree-optimization/109849
* passes.def: Add phiprop to early optimization passes.
* tree-ssa-phiprop.cc: Allow cloning.

gcc/testsuite/ChangeLog:

PR tree-optimization/109811
PR tree-optimization/109849
* gcc.dg/tree-ssa/phiprop-1.c: New test.
* gcc.dg/tree-ssa/pr21463.c: Adjust template.

[PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer

2023-06-19 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch applies LEN_MASK_{LOAD,STORE} in the vectorizer.
I also refactored the gimple IR building code to make it look cleaner.
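
For readers unfamiliar with these internal functions, the following is a
scalar sketch of what a combined length-and-mask load/store could mean.  It
assumes that a lane is active when its index is below len + bias and its
mask bit is set, i.e. the combination of LEN_LOAD and MASK_LOAD semantics.
The lane count, type names, function names and parameter order are all
hypothetical and are not GCC's API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // hypothetical fixed vector width

using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Assumed rule: lane i participates when i < len + bias and mask[i] is set.
bool lane_active(std::size_t i, std::size_t len, int bias, const Mask& mask) {
    const auto limit = static_cast<std::ptrdiff_t>(len) + bias;
    return static_cast<std::ptrdiff_t>(i) < limit && mask[i];
}

// Scalar model of a length-and-mask load.  Inactive lanes are zeroed here
// for determinism; this model does not claim that real hardware does so.
Vec len_mask_load_model(const int* ptr, std::size_t len, const Mask& mask,
                        int bias) {
    Vec result{};
    for (std::size_t i = 0; i < kLanes; ++i)
        if (lane_active(i, len, bias, mask))
            result[i] = ptr[i];
    return result;
}

// Scalar model of a length-and-mask store: only active lanes are written.
void len_mask_store_model(int* ptr, std::size_t len, const Mask& mask,
                          const Vec& value, int bias) {
    for (std::size_t i = 0; i < kLanes; ++i)
        if (lane_active(i, len, bias, mask))
            ptr[i] = value[i];
}

// Small self-check: with len 0, no lane is loaded.
void check_empty_length() {
    const int data[kLanes] = {1, 2, 3, 4, 5, 6, 7, 8};
    Mask all_set;
    all_set.fill(true);
    assert((len_mask_load_model(data, 0, all_set, 0) == Vec{}));
}
```

With a bias of -1 the same length argument covers one lane fewer, which is
why the patch probes the bias operand for both 0 and -1.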

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Add 
LEN_MASK_{LOAD,STORE} vectorizer support.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
(internal_len_load_store_bias): Ditto.
* optabs-query.cc (can_vec_mask_load_store_p): Ditto.
(get_len_load_store_mode): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_all_ones_mask): New function.
(vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
(vectorizable_load): Ditto.

---
 gcc/internal-fn.cc |  35 +-
 gcc/optabs-query.cc|  25 +++-
 gcc/tree-vect-stmts.cc | 259 +
 3 files changed, 213 insertions(+), 106 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..e10c21de5f1 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
  * OPTAB.  */
 
 static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
 {
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
 
@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
 case IFN_GATHER_LOAD:
 case IFN_MASK_GATHER_LOAD:
 case IFN_LEN_LOAD:
+case IFN_LEN_MASK_LOAD:
   return true;
 
 default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
 case IFN_SCATTER_STORE:
 case IFN_MASK_SCATTER_STORE:
 case IFN_LEN_STORE:
+case IFN_LEN_MASK_STORE:
   return true;
 
 default:
@@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
 case IFN_MASK_STORE_LANES:
   return 2;
 
+case IFN_LEN_MASK_LOAD:
+case IFN_LEN_MASK_STORE:
+  return 3;
+
 case IFN_MASK_GATHER_LOAD:
 case IFN_MASK_SCATTER_STORE:
   return 4;
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
 case IFN_LEN_STORE:
   return 3;
 
+case IFN_LEN_MASK_STORE:
+  return 4;
+
 default:
   return -1;
 }
@@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, machine_mode mode)
 {
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_argno = 3;
+  if (icode == CODE_FOR_nothing)
+{
+  machine_mode mask_mode
+   = targetm.vectorize.get_mask_mode (mode).require ();
+  if (ifn == IFN_LEN_LOAD)
+   {
+ /* Try LEN_MASK_LOAD.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+   }
+  else
+   {
+ /* Try LEN_MASK_STORE.  */
+ optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+   }
+  icode = convert_optab_handler (optab, mode, mask_mode);
+  bias_argno = 4;
+}
 
   if (icode != CODE_FOR_nothing)
 {
   /* For now we only support biases of 0 or -1.  Try both of them.  */
-  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (0)))
return 0;
-  if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+  if (insn_operand_matches (icode, bias_argno, GEN_INT (-1)))
return -1;
 }
 
diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 276f8408dd7..4394d391200 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -566,11 +566,14 @@ can_vec_mask_load_store_p (machine_mode mode,
   bool is_load)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
+  optab len_op = is_load ? len_maskload_optab : len_maskstore_optab;
   machine_mode vmode;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
+return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing
+  || convert_optab_handler (len_op, mode, mask_mode)
+   != CODE_FOR_nothing;
 
   /* Otherwise, return true if there is some vector mode with
  the mask load/store supported.  */
@@ -584,7 +587,9 @@ can_vec_mask_load_store_p (machine_mode mode,
   vmode = targetm.vectorize.preferred_simd_mode (smode);
   if (VECTOR_MODE_P (vmode)
   && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
-  && 
