date:20230327

Re: [PATCH, V2] PR target/105325, Make load/cmp fusion know about prefixed load

2023-03-27 Thread Michael Meissner via Gcc-patches

On Mon, Mar 27, 2023 at 03:03:17PM +0800, Kewen.Lin wrote:
> ... instead I suggested moving these three lines to below else arm for CCUNS,
> since the arm for CC already has those variables redefined, so it's something
> like:

I did those changes in the 3rd version of the patch.

| Date: Mon, 27 Mar 2023 23:19:55 -0400
| From: Michael Meissner 
| Subject: [PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads
| Message-ID: 

...

> In the previous review, I put a comment that "lp64 seems not necessary.".
> Did you try to test without it? (if yes, any fallouts?)

Yes, I tried it without the lp64, and I removed it from V3 of the patch.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

[PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads

2023-03-27 Thread Michael Meissner via Gcc-patches

I posted a version of this patch on March 21st and a second version on March
24th.  This patch makes some changes to the genfusion.pl code that were
suggested in review of the last 2 patch submissions.  The fusion.md that is
produced by genfusion.pl is the same in all 3 versions.

I changed genfusion.pl to match the suggestion for code layout.  I also
used the correct comment for each of the instructions (in the 2nd patch, when
I rewrote the comments about ld and lwa being DS format instructions, I
had put the ld comment in the section handling lwa, and vice versa).

I also removed lp64 from the new test.  When I first added the prefixed code,
it was only done for 64-bit, but now it is allowed for 32-bit.  However, the
case that shows up (lwa) would not hit in 32-bit, since it only generates lwz
and not lwa.  It also would not generate ld.  But the test does pass when it is
built with -m32.

The issue with the bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is that the constraints for the individual loads in the fusion
did not match the machine.  In particular, LWA is a ds format instruction when
it is unprefixed.  The code also did not set the prefixed attribute correctly.

This patch rewrites the genfusion.pl script so that it will have more accurate
constraints for the LWA and LD instructions (which are DS instructions).  The
updated genfusion.pl was then run to update fusion.md.  Finally, the code for
the "prefixed" attribute is modified so that it considers load + compare
immediate patterns to be like the normal load insns in checking whether
operand[1] is a prefixed instruction.
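
As a rough illustration of why the DS format matters here, the sketch below
classifies a load displacement.  It relies only on the encoding rules: a DS
format displacement is a signed 16-bit value whose low two bits are zero, and
a prefixed load takes a signed 34-bit displacement.  The helper names
(fits_d_form, fits_ds_form, fits_prefixed, classify_lwa) are hypothetical and
are not functions in GCC; the real decision is made through the constraints
and the "prefixed" attribute.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of GCC.

// D format loads (such as lwz) accept any signed 16-bit displacement.
constexpr bool fits_d_form(std::int64_t offset) {
  return offset >= -32768 && offset <= 32767;
}

// DS format loads (ld, lwa) additionally need the low two bits to be zero.
constexpr bool fits_ds_form(std::int64_t offset) {
  return fits_d_form(offset) && (offset & 3) == 0;
}

// Prefixed loads (pld, plwa) accept a signed 34-bit displacement.
constexpr bool fits_prefixed(std::int64_t offset) {
  return offset >= -(static_cast<std::int64_t>(1) << 33)
         && offset < (static_cast<std::int64_t>(1) << 33);
}

enum class LoadForm { DS, Prefixed, NeedsAddressSetup };

// Toy classification of an lwa displacement (assumes prefixed loads exist).
constexpr LoadForm classify_lwa(std::int64_t offset) {
  if (fits_ds_form(offset)) return LoadForm::DS;
  if (fits_prefixed(offset)) return LoadForm::Prefixed;
  return LoadForm::NeedsAddressSetup;
}
```

With these rules, a displacement of 6 fits the D format but not the DS format,
so a pattern that accepts any "m" operand for lwa or ld can end up with an
address the unprefixed instruction cannot encode.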

I have tested this code on a power9 little endian system (with long double
being IEEE 128-bit and IBM 128-bit), a power10 little endian system, and a
power8 big endian system, testing both 32-bit and 64-bit code generation.  Can
I put this code into the master branch, and after a waiting period, apply it to
the GCC 12 and GCC 11 branches (the bug does show up in those branches, and the
patch applies without change)?

2023-03-27   Michael Meissner  

gcc/

PR target/105325
* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
of the ld and lwa instructions which use the DS encoding instead of D.
Use the YZ constraint for these loads.  Handle prefixed loads better.
Set the sign_extend attribute as appropriate.
* gcc/config/rs6000/fusion.md: Regenerate.
* gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
instructions to the list of instructions that might have a prefixed load
instruction.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.

---
 gcc/config/rs6000/fusion.md   | 17 +
 gcc/config/rs6000/genfusion.pl| 36 ++-
 gcc/config/rs6000/rs6000.md   |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 23 
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
 5 files changed, 64 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
(match_operand:DI 3 "const_0_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1

[committed] CRIS: Correct "T" to define_memory_constraint, not define_constraint

2023-03-27 Thread Hans-Peter Nilsson via Gcc-patches

This patch has no effect on builds using reload of libgcc, newlib libc, my
own at-a-glance-testsuite and coremark.  That somewhat surprisingly
also goes for LRA builds, even with all CRIS reload_in_progress
augmented to include lra_in_progress.  I just noticed it when checking
because another port had a similar fix, where it mattered for LRA.

* config/cris/constraints.md ("T"): Correct to
define_memory_constraint.
---
 gcc/config/cris/constraints.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/cris/constraints.md b/gcc/config/cris/constraints.md
index 5efb61364f46..fa9aa19d13e3 100644
--- a/gcc/config/cris/constraints.md
+++ b/gcc/config/cris/constraints.md
@@ -100,7 +100,7 @@ (define_memory_constraint "Q"
   || reload_completed)")))
 
 ;; Extra constraints.
-(define_constraint "T"
+(define_memory_constraint "T"
   "Memory three-address operand."
   ;; All are indirect-memory:
   (and (match_code "mem")
-- 
2.30.2

[committed] CRIS: Add peephole2 to handle gcc.target/cris/rld-legit1.c for LRA

2023-03-27 Thread Hans-Peter Nilsson via Gcc-patches

The test-case gcc.target/cris/rld-legit1.c is a reduced
test-case that required defining LEGITIMIZE_RELOAD_ADDRESS
to stop the address from being decomposed into several insns
by reload.  Valid but suboptimal code was generated.

(Before implementing that hook for CRIS, the same test-case
also exposed a bug in reload, and a fix was committed to
avoid an ICE; see e.g. git r0-71992-gff0d9879ab0f30 and
related commits.  But, post-cc0, reload no longer handles
this test-case without LEGITIMIZE_RELOAD_ADDRESS helping and
there'd again be an ICE for CRIS (again: only if
LEGITIMIZE_RELOAD_ADDRESS is disabled).  There's a patch to
reload to fix that, at
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612039.html)

But, LRA also does not handle that test-case gracefully, and
like reload without LEGITIMIZE_RELOAD_ADDRESS for CRIS,
decomposes the address into a suboptimal (but valid)
sequence, about as messy as that from reload, and
gcc.target/cris/rld-legit1.c would regress for LRA.  There's
nothing equivalent to LEGITIMIZE_RELOAD_ADDRESS for LRA.
(Stepping through LRA, I can't find an obvious place where
to put such a hook.  Granted, I haven't seen this kind of
messy decomposition in other code, so I'm not insisting a
LEGITIMIZE_RELOAD_ADDRESS-like hook is a good idea.)

These new peephole2's are required to not regress
gcc.target/cris/rld-legit1.c with LRA enabled for CRIS.
They don't appear to otherwise make a difference for
libgcc, newlib libc, my own at-a-glance tests or coremark,
for either LRA or reload.

* config/cris/cris.md (BW2): New mode-iterator.
(lra_szext_decomposed, lra_szext_decomposed_indirect_with_offset): New
peephole2s.
---
 gcc/config/cris/cris.md | 50 +
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index 2bea480a0200..c3de259983c6 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -183,6 +183,10 @@ (define_mode_iterator SI_ [SI])
 
 (define_mode_iterator WD [SI HI])
 (define_mode_iterator BW [HI QI])
+
+; Another "BW" for use where an independent iteration is needed.
+(define_mode_iterator BW2 [HI QI])
+
 (define_mode_attr S [(SI "HI") (HI "QI")])
 (define_mode_attr s [(SI "hi") (HI "qi")])
 (define_mode_attr m [(SI ".d") (HI ".w") (QI ".b")])
@@ -2832,6 +2836,52 @@ (define_peephole2 ; andqu
   operands[3] = gen_rtx_ZERO_EXTEND (SImode, op1);
   operands[4] = GEN_INT (trunc_int_for_mode (INTVAL (operands[1]), QImode));
 })
+
+;; Fix a decomposed szext: fuse it with the memory operand of the
+;; load.  This is typically the sign-extension part of a decomposed
+;; "indirect offset" address.
+(define_peephole2 ; lra_szext_decomposed
+  [(parallel
+[(set (match_operand:BW 0 "register_operand")
+ (match_operand:BW 1 "memory_operand"))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])
+   (parallel
+[(set (match_operand:SI 2 "register_operand") (szext:SI (match_dup 0)))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])]
+  "REGNO (operands[0]) == REGNO (operands[2])
+   || peep2_reg_dead_p (2, operands[0])"
+  [(parallel
+[(set (match_dup 2) (szext:SI (match_dup 1)))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])])
+
+;; Re-compose a decomposed "indirect offset" address for a szext
+;; operation.  The non-clobbering "addi" is generated by LRA.
+;; This and lra_szext_decomposed is covered by cris/rld-legit1.c.
+(define_peephole2 ; lra_szext_decomposed_indirect_with_offset
+  [(parallel
+[(set (match_operand:SI 0 "register_operand")
+ (sign_extend:SI (mem:BW (match_operand:SI 1 "register_operand"))))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])
+   (set (match_dup 0)
+   (plus:SI (match_dup 0) (match_operand:SI 2 "register_operand")))
+   (parallel
+[(set (match_operand:SI 3 "register_operand")
+ (szext:SI (mem:BW2 (match_dup 0))))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])]
+  "(REGNO (operands[0]) == REGNO (operands[3])
+|| peep2_reg_dead_p (3, operands[0]))
+   && (REGNO (operands[0]) == REGNO (operands[1])
+   || peep2_reg_dead_p (3, operands[0]))"
+  [(parallel
+[(set
+  (match_dup 3)
+  (szext:SI
+   (mem:BW2 (plus:SI (szext:SI (mem:BW (match_dup 1))) (match_dup 2)))))
+ (clobber (reg:CC CRIS_CC0_REGNUM))])])
+
+;; Add operations with similar or same decomposed addresses here, when
+;; encountered - but only when covered by mentioned test-cases for at
+;; least one of the cases generalized in the pattern.
 
 ;; Local variables:
 ;; mode:emacs-lisp
-- 
2.30.2

[committed] CRIS: Improve bailing for eliminable compares for "addi" vs. "add"

2023-03-27 Thread Hans-Peter Nilsson via Gcc-patches

This patch affects a post-reload define_split for CRIS that transforms
a condition-code-clobbering addition into a non-clobbering addition.
(A "two-operand" addition between registers is the only insn that has
both a condition-code-clobbering and a non-clobbering variant for
CRIS.)  Many more "add.d":s are replaced by non-condition-code-
clobbering "addi":s after this patch, but most of the transformations
don't matter.

CRIS with LRA generated code that exposed a flaw with the original
patch: it bailed too easily, on *any* insn using the result of the
addition.  To wit, more effort than simply applying reg_mentioned_p is
needed to inspect the user, in the code to avoid munging an insn
sequence that cmpelim is supposed to handle.

With this patch coremark score for CRIS (*with reload*) improves by
less than 0.01% (a single "nop" is eliminated in
core_state_transition, in an execution path that affects ~1/20 of all
of the 10240 calls).  However, the original cause for this patch is to
not regress gcc.target/cris/pr93372-44.c for LRA, where otherwise a
needless "cmpq" is emitted.  For CRIS with LRA, the performance effect
on coremark isn't even measurable, except by reducing the size of the
executable due to affecting non-called library code.
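
The difference between the old and new bail-out conditions can be sketched
with a toy model.  Everything below (ToyInsn, old_check_bails,
new_check_bails) is hypothetical and greatly simplified: it models only a
plain compare against zero, not the cbranch and cstore cases the patch also
handles, and it is not the RTL code itself.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for an instruction; not a GCC type.
struct ToyInsn {
  std::vector<int> regs_used;  // every register the insn mentions
  bool is_compare;             // true if it sets the condition codes from a compare
  int compared_reg;            // register compared (meaningful if is_compare)
  long compared_with;          // constant compared against (meaningful if is_compare)
};

// Old behaviour as described: refuse the split if the next insn
// mentions the result register at all.
bool old_check_bails(const ToyInsn &next, int reg) {
  return std::find(next.regs_used.begin(), next.regs_used.end(), reg)
         != next.regs_used.end();
}

// New behaviour (simplified): refuse only if the next insn compares
// the result register against zero, i.e. a compare that looks eliminable.
bool new_check_bails(const ToyInsn &next, int reg) {
  return next.is_compare && next.compared_reg == reg
         && next.compared_with == 0;
}
```

In this model an ordinary use of the result, such as a store, stops the old
check but not the new one, which matches the observation that many more
additions become non-clobbering after the patch.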

* config/cris/cris.md ("*add3_addi"): Improve to bail only
for possible eliminable compares.
---
 gcc/config/cris/cris.md | 53 -
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index 2bea480a0200..30ff7e75c1bf 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -1362,12 +1362,63 @@ (define_split ;; "*add3_addi"
 {
   rtx reg = operands[0];
   rtx_insn *i = next_nonnote_nondebug_insn_bb (curr_insn);
+  rtx x, src, dest;
 
   while (i != NULL_RTX && (!INSN_P (i) || DEBUG_INSN_P (i)))
 i = next_nonnote_nondebug_insn_bb (i);
 
-  if (i == NULL_RTX || reg_mentioned_p (reg, i) || BARRIER_P (i))
+  /* We don't want to strip the clobber if the next insn possibly uses the
+ zeroness of the result.  Preferably fail only if we see a compare insn
+ that looks eliminable and with the register "reg" compared.  With some
+ effort we could also check for an equality test (EQ, NE) in the post-split
+ user, just not for now.  */
+  if (i == NULL_RTX)
 FAIL;
+
+  x = single_set (i);
+
+  /* We explicitly need to bail on a BARRIER, but that's implied by a failing
+ single_set test.  */
+  if (x == NULL_RTX)
+FAIL;
+
+  src = SET_SRC (x);
+  dest = SET_DEST (x);
+
+  /* Bail on (post-split) eliminable compares.  */
+  if (REG_P (dest) && REGNO (dest) == CRIS_CC0_REGNUM
+  && GET_CODE (src) == COMPARE)
+{
+  rtx cop0 = XEXP (src, 0);
+
+  if (REG_P (cop0) && REGNO (cop0) == REGNO (reg)
+ && XEXP (src, 1) == const0_rtx)
+   FAIL;
+}
+
+  /* Bail out if we see a (pre-split) cbranch or cstore where the comparison
+ looks eliminable and uses the destination register in this addition.  We
+ don't need to look very deep: a single_set which is a parallel clobbers
+ something, and (one of) that something, is always CRIS_CC0_REGNUM here.
+ Also, the entities we're looking for are two-element parallels.  A
+ split-up cbranch or cstore doesn't clobber CRIS_CC0_REGNUM.  A cbranch has
+ if_then_else as its source with a comparison operator as the condition,
+ and a cstore has a source with the comparison operator directly.  That
+ also matches dstep, so look for pc as destination for the if_then_else.
+ We error on the safe side if we happen to catch other conditional entities
+ and FAIL, that just means the split won't happen.  */
+  if (GET_CODE (PATTERN (i)) == PARALLEL && XVECLEN (PATTERN (i), 0) == 2)
+{
+  rtx cmp
+   = (GET_CODE (src) == IF_THEN_ELSE && dest == pc_rtx
+  ? XEXP (src, 0)
+  : (COMPARISON_P (src) ? src : NULL_RTX));
+  gcc_assert (cmp == NULL_RTX || COMPARISON_P (cmp));
+
+  if (cmp && REG_P (XEXP (cmp, 0)) && XEXP (cmp, 1) == const0_rtx
+ && REGNO (XEXP (cmp, 0)) == REGNO (reg))
+   FAIL;
+}
 })
 
 (define_insn "mul3"
-- 
2.30.2

[committed] CRIS: Remove unused constraint "R".

2023-03-27 Thread Hans-Peter Nilsson via Gcc-patches

gcc:
* config/cris/constraints.md ("R"): Remove unused constraint.
---
 gcc/config/cris/constraints.md | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config/cris/constraints.md b/gcc/config/cris/constraints.md
index 05a1d24ef5a1..5efb61364f46 100644
--- a/gcc/config/cris/constraints.md
+++ b/gcc/config/cris/constraints.md
@@ -100,16 +100,6 @@ (define_memory_constraint "Q"
   || reload_completed)")))
 
 ;; Extra constraints.
-(define_constraint "R"
-  "An operand to BDAP or BIAP."
-   ;; A BIAP; r.S?
-  (ior (match_test "cris_biap_index_p (op, reload_in_progress
-  || reload_completed)")
-   ;; A [reg] or (int) [reg], maybe with post-increment.
-   (match_test "cris_bdap_index_p (op, reload_in_progress
-  || reload_completed)")
-   (match_test "CONSTANT_P (op)")))
-
 (define_constraint "T"
   "Memory three-address operand."
   ;; All are indirect-memory:
-- 
2.30.2

[GCC14 QUEUE PATCH] RISC-V: Eliminate redundant vsetvli for duplicate AVL def

2023-03-27 Thread juzhe . zhong

From: Juzhe-Zhong 

void f (int8_t* base1,int8_t* base2,int8_t* out,int n)
{
  vint8mf4_t v = __riscv_vle8_v_i8mf4 (base1, 32);
  for (int i = 0; i < n; i++){
v = __riscv_vor_vx_i8mf4 (v, 101, 32);
v = __riscv_vle8_v_i8mf4_tu (v, base2, 32);
  }
  __riscv_vse8_v_i8mf4 (out, v, 32);
}

before this patch:
f:
li  a5,32
vsetvli zero,a5,e8,mf4,tu,ma
vle8.v  v1,0(a0)
ble a3,zero,.L2
li  t0,0
li  a0,101
.L3:
addiw   t0,t0,1
vor.vx  v1,v1,a0
vle8.v  v1,0(a1)
bne a3,t0,.L3
.L2:
vsetvli zero,zero,e8,mf4,tu,ma
vse8.v  v1,0(a2)
ret


after this patch:

f:
li  a5,32
vsetvli zero,a5,e8,mf4,tu,ma
vle8.v  v1,0(a0)
ble a3,zero,.L2
li  t0,0
li  a0,101
.L3:
addiw   t0,t0,1
vor.vx  v1,v1,a0
vle8.v  v1,0(a1)
bne a3,t0,.L3
.L2:
vse8.v  v1,0(a2)
ret
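
The idea behind dropping the second vsetvli can be sketched in isolation: if
every configuration available on entry to a block is already compatible with
what the block demands, no new vsetvli is needed there.  The types and names
below (ToyVsetvlInfo, all_avail_in_compatible) are hypothetical, and
"compatible" is reduced to plain equality, which is stricter than the real
pass.

```cpp
#include <cstddef>
#include <vector>

// Toy vector configuration; GCC's real classes carry much more state.
struct ToyVsetvlInfo {
  int avl;                // application vector length, e.g. 32
  int sew;                // element width in bits, e.g. 8
  bool tail_undisturbed;  // "tu" policy

  // Assumption for this sketch: compatible means identical.
  bool compatible_with(const ToyVsetvlInfo &other) const {
    return avl == other.avl && sew == other.sew
           && tail_undisturbed == other.tail_undisturbed;
  }
};

// True when every expression marked available-in satisfies the demand.
bool all_avail_in_compatible(const ToyVsetvlInfo &demand,
                             const std::vector<bool> &avail_in,
                             const std::vector<ToyVsetvlInfo> &exprs) {
  for (std::size_t i = 0; i < avail_in.size() && i < exprs.size(); ++i)
    if (avail_in[i] && !demand.compatible_with(exprs[i]))
      return false;
  return true;
}
```

In the example above, the only configuration reaching .L2 is the one set by
the first vsetvli (AVL 32, e8, mf4, tu), which is what the final store
demands, so the second vsetvli can be removed.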

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(vector_infos_manager::all_avail_in_compatible_p): New function.
(pass_vsetvl::refine_vsetvls): Remove redundant vsetvli.
* config/riscv/riscv-vsetvl.h: New function declare.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 67 ++-
 gcc/config/riscv/riscv-vsetvl.h   |  1 +
 .../riscv/rvv/vsetvl/avl_single-102.c | 16 +
 3 files changed, 81 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 4948e5d4c5e..58568b45010 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2376,6 +2376,23 @@ vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
   return true;
 }
 
+bool
+vector_infos_manager::all_avail_in_compatible_p (const basic_block cfg_bb) const
+{
+  const auto &info = vector_block_infos[cfg_bb->index].local_dem;
+  sbitmap avin = vector_avin[cfg_bb->index];
+  unsigned int bb_index;
+  sbitmap_iterator sbi;
+  EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi)
+  {
+const auto &avin_info
+  = static_cast (*vector_exprs[bb_index]);
+if (!info.compatible_p (avin_info))
+  return false;
+  }
+  return true;
+}
+
 bool
 vector_infos_manager::all_same_avl_p (const basic_block cfg_bb,
  sbitmap bitdata) const
@@ -3741,9 +3758,53 @@ pass_vsetvl::refine_vsetvls (void) const
  m_vector_manager->to_refine_vsetvls.add (rinsn);
  continue;
}
-  rinsn = PREV_INSN (rinsn);
-  rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
-  change_insn (rinsn, new_pat);
+
+  /* Optimize such case:
+   void f (int8_t* base1,int8_t* base2,int8_t* out,int n)
+   {
+ vint8mf4_t v = __riscv_vle8_v_i8mf4 (base1, 32);
+ for (int i = 0; i < n; i++){
+   v = __riscv_vor_vx_i8mf4 (v, 101, 32);
+   v = __riscv_vle8_v_i8mf4_tu (v, base2, 32);
+ }
+ __riscv_vse8_v_i8mf4 (out, v, 32);
+   }
+
+   f:
+   li  a5,32
+   vsetvli zero,a5,e8,mf4,tu,ma
+   vle8.v  v1,0(a0)
+   ble a3,zero,.L2
+   li  t0,0
+   li  a0,101
+   .L3:
+   addiw   t0,t0,1
+   vor.vx  v1,v1,a0
+   vle8.v  v1,0(a1)
+   bne a3,t0,.L3
+   .L2:
+   vsetvli zero,zero,e8,mf4,tu,ma
+   vse8.v  v1,0(a2)
+   ret
+
+   The second vsetvli is redundant.  */
+
+  gcc_assert (has_vtype_op (insn->rtl ()));
+  rinsn = PREV_INSN (insn->rtl ());
+  gcc_assert (vector_config_insn_p (PREV_INSN (insn->rtl ())));
+  if (m_vector_manager->all_avail_in_compatible_p (cfg_bb))
+   {
+ size_t id = m_vector_manager->get_expr_id (info);
+ if (bitmap_bit_p (m_vector_manager->vector_del[cfg_bb->index], id))
+   continue;
+ eliminate_insn (rinsn);
+   }
+  else
+   {
+ rtx new_pat
+   = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
+ change_insn (rinsn, new_pat);
+   }
 }
 }
 
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index eec03d35071..d05472c86a0 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -451,6 +451,7 @@ public:
   bool all_same_ratio_p (sbitmap) const;
 
   bool all_empty_predecessor_p (const

[PATCH] RISC-V: Eliminate redundant vsetvli for duplicate AVL def

2023-03-27 Thread juzhe . zhong

From: Juzhe-Zhong 

void f (int8_t* base1,int8_t* base2,int8_t* out,int n)
{
  vint8mf4_t v = __riscv_vle8_v_i8mf4 (base1, 32);
  for (int i = 0; i < n; i++){
v = __riscv_vor_vx_i8mf4 (v, 101, 32);
v = __riscv_vle8_v_i8mf4_tu (v, base2, 32);
  }
  __riscv_vse8_v_i8mf4 (out, v, 32);
}

before this patch:
f:
li  a5,32
vsetvli zero,a5,e8,mf4,tu,ma
vle8.v  v1,0(a0)
ble a3,zero,.L2
li  t0,0
li  a0,101
.L3:
addiw   t0,t0,1
vor.vx  v1,v1,a0
vle8.v  v1,0(a1)
bne a3,t0,.L3
.L2:
vsetvli zero,zero,e8,mf4,tu,ma
vse8.v  v1,0(a2)
ret


after this patch:

f:
li  a5,32
vsetvli zero,a5,e8,mf4,tu,ma
vle8.v  v1,0(a0)
ble a3,zero,.L2
li  t0,0
li  a0,101
.L3:
addiw   t0,t0,1
vor.vx  v1,v1,a0
vle8.v  v1,0(a1)
bne a3,t0,.L3
.L2:
vse8.v  v1,0(a2)
ret

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(vector_infos_manager::all_avail_in_compatible_p): New function.
(pass_vsetvl::refine_vsetvls): Remove redundant vsetvli.
* config/riscv/riscv-vsetvl.h: New function declare.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 67 ++-
 gcc/config/riscv/riscv-vsetvl.h   |  1 +
 .../riscv/rvv/vsetvl/avl_single-102.c | 16 +
 3 files changed, 81 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-102.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 4948e5d4c5e..58568b45010 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2376,6 +2376,23 @@ vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
   return true;
 }
 
+bool
+vector_infos_manager::all_avail_in_compatible_p (const basic_block cfg_bb) const
+{
+  const auto &info = vector_block_infos[cfg_bb->index].local_dem;
+  sbitmap avin = vector_avin[cfg_bb->index];
+  unsigned int bb_index;
+  sbitmap_iterator sbi;
+  EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi)
+  {
+const auto &avin_info
+  = static_cast (*vector_exprs[bb_index]);
+if (!info.compatible_p (avin_info))
+  return false;
+  }
+  return true;
+}
+
 bool
 vector_infos_manager::all_same_avl_p (const basic_block cfg_bb,
  sbitmap bitdata) const
@@ -3741,9 +3758,53 @@ pass_vsetvl::refine_vsetvls (void) const
  m_vector_manager->to_refine_vsetvls.add (rinsn);
  continue;
}
-  rinsn = PREV_INSN (rinsn);
-  rtx new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
-  change_insn (rinsn, new_pat);
+
+  /* Optimize such case:
+   void f (int8_t* base1,int8_t* base2,int8_t* out,int n)
+   {
+ vint8mf4_t v = __riscv_vle8_v_i8mf4 (base1, 32);
+ for (int i = 0; i < n; i++){
+   v = __riscv_vor_vx_i8mf4 (v, 101, 32);
+   v = __riscv_vle8_v_i8mf4_tu (v, base2, 32);
+ }
+ __riscv_vse8_v_i8mf4 (out, v, 32);
+   }
+
+   f:
+   li  a5,32
+   vsetvli zero,a5,e8,mf4,tu,ma
+   vle8.v  v1,0(a0)
+   ble a3,zero,.L2
+   li  t0,0
+   li  a0,101
+   .L3:
+   addiw   t0,t0,1
+   vor.vx  v1,v1,a0
+   vle8.v  v1,0(a1)
+   bne a3,t0,.L3
+   .L2:
+   vsetvli zero,zero,e8,mf4,tu,ma
+   vse8.v  v1,0(a2)
+   ret
+
+   The second vsetvli is redundant.  */
+
+  gcc_assert (has_vtype_op (insn->rtl ()));
+  rinsn = PREV_INSN (insn->rtl ());
+  gcc_assert (vector_config_insn_p (PREV_INSN (insn->rtl ())));
+  if (m_vector_manager->all_avail_in_compatible_p (cfg_bb))
+   {
+ size_t id = m_vector_manager->get_expr_id (info);
+ if (bitmap_bit_p (m_vector_manager->vector_del[cfg_bb->index], id))
+   continue;
+ eliminate_insn (rinsn);
+   }
+  else
+   {
+ rtx new_pat
+   = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, info, NULL_RTX);
+ change_insn (rinsn, new_pat);
+   }
 }
 }
 
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index eec03d35071..d05472c86a0 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -451,6 +451,7 @@ public:
   bool all_same_ratio_p (sbitmap) const;
 
   bool all_empty_predecessor_p (const

Re: [PATCH 2/2] Remove Negative(gwarf-) from gdwarf

2023-03-27 Thread Joseph Myers

On Fri, 24 Mar 2023, Richard Biener via Gcc-patches wrote:

> Prior to the removal of STABS support the gdwarf, gstabs, ... options
> formed a cycle with their Negative(..) option attribute.  But that
> didn't actually have any effect since most of the options also
> are Joined or JoinedOrMissing for which there's no pruning of options
> and so once ran into the set_debug_level diagnostics reporting
> conflicting debug formats.
> 
> The following removes the remains of that cycle, which is a
> Negative option from gdwarf to gdwarf-.  With RejectNegative
> added the expected effect of -gdwarf-4 -gdwarf would be to
> enable DWARF5 support (but this doesn't happen for some reason).
> I think the more sensible behavior is that seen and implemented
> in opts.cc, the more specific -gdwarf-4 determines the DWARF level
> and a later or earlier -gdwarf becomes a no-op.  So the
> Negative(..) annotation on gdwarf is just confusing.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 1/2] Disallow -gno-dwarf, gno-dwarf-N, -gno-gdb and -gno-vms

2023-03-27 Thread Joseph Myers

On Fri, 24 Mar 2023, Richard Biener via Gcc-patches wrote:

> The following adds RejectNegative to the gdwarf, gdwarf-, ggdb and gvms
> options since the current behavior is to treat the negative variant
> the same as the positive variant.  In particular -ggdb -gno-gdb
> do not cancel, and plain -gno-dwarf will enable (dwarf!) debug output.
> 
> Rejecting the negative forms avoids interpreting sensible behavior
> to combinations of options like -gdwarf-5 -gno-dwarf-3 and sticks to
> the behavior that later -g options simply override earlier ones and
> the only negative form is -g0.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-03-27 Thread Sami Tolvanen via Gcc-patches

On Mon, Mar 27, 2023 at 2:30 AM Peter Zijlstra  wrote:
>
> On Sat, Mar 25, 2023 at 01:54:16AM -0700, Dan Li wrote:
>
> > In the compiler part[4], most of the content is the same as Sami's
> > implementation[3], except for some minor differences, mainly including:
> >
> > 1. The function typeid is calculated differently and it is difficult
> > to be consistent.
>
> This means there is an effective ABI break between the compilers, which
> is sad :-( Is there really nothing to be done about this?

I agree, this would be unfortunate, and would also be a compatibility
issue with rustc where there's ongoing work to support
clang-compatible CFI type hashes:

https://github.com/rust-lang/rust/pull/105452

Sami

[PATCH] libstdc++/complex: Remove implicit type casts in complex

2023-03-27 Thread Weslley da Silva Pereira via Gcc-patches

Dear all,

Here follows a patch that removes implicit type casts in std::complex.

*Description:* The current implementation of `complex<_Tp>` assumes that
`int, double, long double` are explicitly convertible to `_Tp`. Moreover,
it also assumes that:

1. `int` is implicitly convertible to `_Tp`, e.g., when using
`complex<_Tp>(1)`.
2. `long double` can be assigned to a `_Tp` variable, e.g., when using
`const _Tp __pi_2 = 1.5707963267948966192313216916397514L`.

This patch transforms the implicit casts (1) and (2) into explicit type
casts. As a result, `std::complex` is now able to support more types. One
example is the type `Eigen::Half` from
https://eigen.tuxfamily.org/dox-devel/Half_8h_source.html which does not
implement implicit type conversions.
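
A small sketch of the problem, assuming a hypothetical type `ExplicitReal`
whose constructors are all explicit (a stand-in for types like the one linked
above; it is not taken from Eigen or libstdc++).  The helper templates
`half_of` and `pi_over_two` are also made up for illustration and are written
in the style the patch adopts, wrapping each literal in `T(...)`.

```cpp
// Hypothetical number type with explicit-only conversions.
struct ExplicitReal {
  double value;
  explicit ExplicitReal(int v) : value(v) {}
  explicit ExplicitReal(double v) : value(v) {}
  explicit ExplicitReal(long double v) : value(static_cast<double>(v)) {}
};

inline ExplicitReal operator/(ExplicitReal a, ExplicitReal b) {
  return ExplicitReal(a.value / b.value);
}

// `x / 2` would not compile for ExplicitReal, because the int literal
// cannot be converted implicitly; `x / T(2)` works for both types.
template <typename T>
T half_of(const T &x) {
  return x / T(2);
}

// `const T v = 1.57...L;` would need an implicit conversion from
// long double; the explicit form below does not.
template <typename T>
T pi_over_two() {
  const T v = T(1.5707963267948966192313216916397514L);
  return v;
}
```

Both templates still work unchanged for `double`, so the explicit form costs
nothing for the built-in floating-point types.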

*ChangeLog:*
libstdc++-v3/ChangeLog:

* include/std/complex:

*Patch:* fix_complex.diff. (Also at
https://github.com/gcc-mirror/gcc/pull/84)

*Note:* I didn't find a good reason for adding new tests or test results
here since this is really a small upgrade (in my view) to std::complex.

Sincerely,
  Weslley

-- 
Weslley S. Pereira
diff --git a/libstdc++-v3/include/std/complex b/libstdc++-v3/include/std/complex
index 0f5f14c3ddb..1a4ac8a2a54 100644
--- a/libstdc++-v3/include/std/complex
+++ b/libstdc++-v3/include/std/complex
@@ -80,7 +80,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 _GLIBCXX20_CONSTEXPR complex<_Tp> conj(const complex<_Tp>&);
   ///  Return complex with magnitude @a rho and angle @a theta.
-  template<typename _Tp> complex<_Tp> polar(const _Tp&, const _Tp& = 0);
+  template<typename _Tp> complex<_Tp> polar(const _Tp&, const _Tp& = _Tp(0));
 
   // Transcendentals:
   /// Return complex cosine of @a z.
@@ -961,7 +961,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline complex<_Tp>
 polar(const _Tp& __rho, const _Tp& __theta)
 {
-  __glibcxx_assert( __rho >= 0 );
+  __glibcxx_assert( __rho >= _Tp(0) );
   return complex<_Tp>(__rho * cos(__theta), __rho * sin(__theta));
 }
 
@@ -1161,13 +1161,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   if (__x == _Tp())
 {
-  _Tp __t = sqrt(abs(__y) / 2);
+  _Tp __t = sqrt(abs(__y) / _Tp(2));
   return complex<_Tp>(__t, __y < _Tp() ? -__t : __t);
 }
   else
 {
-  _Tp __t = sqrt(2 * (std::abs(__z) + abs(__x)));
-  _Tp __u = __t / 2;
+  _Tp __t = sqrt(_Tp(2) * (std::abs(__z) + abs(__x)));
+  _Tp __u = __t / _Tp(2);
   return __x > _Tp()
 ? complex<_Tp>(__u, __y / __t)
 : complex<_Tp>(abs(__y) / __t, __y < _Tp() ? -__u : __u);
@@ -1257,7 +1257,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 complex<_Tp>
 __complex_pow_unsigned(complex<_Tp> __x, unsigned __n)
 {
-  complex<_Tp> __y = __n % 2 ? __x : complex<_Tp>(1);
+  complex<_Tp> __y = __n % 2 ? __x : complex<_Tp>(_Tp(1));
 
   while (__n >>= 1)
 {
@@ -1280,7 +1280,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 pow(const complex<_Tp>& __z, int __n)
 {
   return __n < 0
-	? complex<_Tp>(1) / std::__complex_pow_unsigned(__z, -(unsigned)__n)
+	? complex<_Tp>(_Tp(1)) / std::__complex_pow_unsigned(__z, -(unsigned)__n)
 : std::__complex_pow_unsigned(__z, __n);
 }
 
@@ -2017,7 +2017,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __complex_acos(const std::complex<_Tp>& __z)
 {
   const std::complex<_Tp> __t = std::asin(__z);
-  const _Tp __pi_2 = 1.5707963267948966192313216916397514L;
+  const _Tp __pi_2 = _Tp(1.5707963267948966192313216916397514L);
   return std::complex<_Tp>(__pi_2 - __t.real(), -__t.imag());
 }

RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

2023-03-27 Thread Eugene Rozenfeld via Gcc-patches

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613974.html

Thanks,

Eugene

-Original Message-
From: Eugene Rozenfeld 
Sent: Tuesday, March 14, 2023 2:21 PM
To: Jeff Law ; gcc-patches@gcc.gnu.org; Andi Kleen 

Subject: RE: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

Hi Jeff,

I revived the profile_merger tool in http://github.com/google/autofdo and re-worked 
the patch to merge profiles for compiling the libraries.

Please take a look at the attached patch.

Thanks,

Eugene

-Original Message-
From: Jeff Law  
Sent: Tuesday, November 22, 2022 10:16 PM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org; 
Andi Kleen 
Subject: Re: [EXTERNAL] Re: [PATCH] Fix autoprofiledbootstrap build

[You don't often get email from jeffreya...@gmail.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On 11/22/22 14:20, Eugene Rozenfeld wrote:
> I took another look at this. We actually collect perf data when building the 
> libraries. So, we have ./prev-gcc/perf.data, ./prev-libcpp/perf.data, 
> ./prev-libiberty/perf.data, etc. But when creating gcov data for  
> -fauto-profile build of cc1plus or cc1 we only use ./prev-gcc/perf.data . So, 
> a better solution would be either having a single perf.data for all builds 
> (gcc and libraries) or merging perf.data files before attempting 
> autostagefeedback. What would you recommend?

ISTM that if neither approach loses data, then they're functionally equivalent 
-- meaning that we can select whichever is easier to wire into our build system.

A single perf.data might serialize the build.  So perhaps separate, then merge 
right before autostagefeedback.


But I'm willing to go with whatever you think is best.

Jeff

[PATCH] [og12] OpenMP: Constructors and destructors for "declare target" static aggregates

2023-03-27 Thread Julian Brown

This patch adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.

At present, space is allocated on the target for such aggregates, but
nothing ever constructs them properly, so they end up zero-initialised.

Tested with offloading to AMD GCN. I will apply to the og12 branch
shortly.

ChangeLog

2023-03-27  Julian Brown  

gcc/cp/
* decl2.cc (priority_info): Add omp_tgt_initializations_p and
omp_tgt_destructions_p.
(start_objects, start_static_storage_duration_function,
do_static_initialization_or_destruction,
one_static_initialization_or_destruction,
generate_ctor_or_dtor_function): Add 'omp_target' parameter.  Support
"declare target" decls. Update forward declarations.
(OMP_SSDF_IDENTIFIER): New macro.
(omp_tgt_ssdf_decls): New vec.
(get_priority_info): Initialize omp_tgt_initializations_p and
omp_tgt_destructions_p fields.
(handle_tls_init): Update call to
one_static_initialization_or_destruction.
(c_parse_final_cleanups): Support constructors/destructors on OpenMP
offload targets.

gcc/
* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): New builtin.
* tree.cc (get_file_function_name): Support names for on-target
constructor/destructor functions.

libgomp/
* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New
test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New
test.
---
 gcc/cp/decl2.cc   | 225 +++---
 gcc/omp-builtins.def  |   2 +
 gcc/tree.cc   |   6 +-
 .../static-aggr-constructor-destructor-1.C|  28 +++
 .../static-aggr-constructor-destructor-2.C|  31 +++
 5 files changed, 257 insertions(+), 35 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
 create mode 100644 
libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index f1a6df375e8..042ae4df700 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -65,16 +65,19 @@ typedef struct priority_info_s {
   /* Nonzero if there have been any destructions at this priority
  throughout the translation unit.  */
   int destructions_p;
+  /* Again, but specifically for OpenMP "declare target" initializations.  */
+  int omp_tgt_initializations_p;
+  int omp_tgt_destructions_p;
 } *priority_info;
 
-static tree start_objects (int, int);
+static tree start_objects (int, int, bool);
 static void finish_objects (int, int, tree);
-static tree start_static_storage_duration_function (unsigned);
+static tree start_static_storage_duration_function (unsigned, bool);
 static void finish_static_storage_duration_function (tree);
 static priority_info get_priority_info (int);
-static void do_static_initialization_or_destruction (tree, bool);
-static void one_static_initialization_or_destruction (tree, tree, bool);
-static void generate_ctor_or_dtor_function (bool, int, location_t *);
+static void do_static_initialization_or_destruction (tree, bool, bool);
+static void one_static_initialization_or_destruction (tree, tree, bool, bool);
+static void generate_ctor_or_dtor_function (bool, int, location_t *, bool);
 static int generate_ctor_and_dtor_functions_for_priority (splay_tree_node,
  void *);
 static tree prune_vars_needing_no_initialization (tree *);
@@ -3791,7 +3794,7 @@ generate_tls_wrapper (tree fn)
vtv_start_verification_constructor_init_function.  */
 
 static tree
-start_objects (int method_type, int initp)
+start_objects (int method_type, int initp, bool omp_target = false)
 {
   /* Make ctor or dtor function.  METHOD_TYPE may be 'I' or 'D'.  */
   int module_init = 0;
@@ -3806,7 +3809,16 @@ start_objects (int method_type, int initp)
 {
   char type[14];
 
-  unsigned len = sprintf (type, "sub_%c", method_type);
+  unsigned len;
+  if (omp_target)
+   /* Use "off_" signifying "offload" here.  The name must be distinct
+  from the non-offload case.  The format of the name is scanned in
+  tree.cc/get_file_function_name, so stick to the same length for
+  both name variants.  */
+   len = sprintf (type, "off_%c", method_type);
+  else
+   len = sprintf (type, "sub_%c", method_type);
+
   if (initp != DEFAULT_INIT_PRIORITY)
{
  char joiner = '_';
@@ -3821,6 +3833,17 @@ start_objects (int method_type, int initp)
 
   tree fntype =build_function_type (void_type_node, void_list_node);
   tree fndecl = build_lang_decl (FUNCTION_DECL, name, fntype);
+
+  if (omp_target)
+{
+  DECL_ATTRIBUTES (fndecl)
+   = tree_cons (get_identifier ("omp declare target"),

[PATCH] fixincludes: Declare memmem if it's not declared in system headers [PR109293]

2023-03-27 Thread Xi Ruoyao via Gcc-patches

memmem is not POSIX so the system may lack it.  Then libiberty will
provide an implementation, but it's a "supplemental function" and not
declared in libiberty.h.  We need to declare the prototype to use it
then.

See libiberty doc at
https://gcc.gnu.org/onlinedocs/libiberty/Supplemental-Functions.html.
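
The declaration pattern described above can be sketched roughly as follows.
This is only an illustration, not the fixincludes code: sketch_memmem,
HAVE_DECL_SKETCH_MEMMEM and contains_bytes are hypothetical names, the macro
is hard-coded to 0 instead of being set by configure, and the naive search
merely stands in for the supplemental implementation from libiberty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for HAVE_DECL_MEMMEM.  A real build gets this
   from configure; it is hard-coded to 0 here so that the guarded
   prototype below is the only declaration in scope.  */
#define HAVE_DECL_SKETCH_MEMMEM 0

#if defined (HAVE_DECL_SKETCH_MEMMEM) && !HAVE_DECL_SKETCH_MEMMEM
/* No header declared the function, so declare the prototype ourselves.  */
extern void *sketch_memmem (const void *, size_t, const void *, size_t);
#endif

/* Caller: relies on the prototype above being visible.  */
int
contains_bytes (const char *buf, size_t len, const char *pat)
{
  assert (buf != NULL && pat != NULL);
  return sketch_memmem (buf, len, pat, strlen (pat)) != NULL;
}

/* Naive stand-in for the supplemental implementation.  */
void *
sketch_memmem (const void *haystack, size_t hlen,
	       const void *needle, size_t nlen)
{
  const unsigned char *h = haystack;

  if (nlen == 0)
    return (void *) h;
  if (hlen < nlen)
    return NULL;
  for (size_t i = 0; i + nlen <= hlen; i++)
    if (memcmp (h + i, needle, nlen) == 0)
      return (void *) (h + i);
  return NULL;
}
```

Without the guarded prototype, the call in contains_bytes would be an
implicit declaration, which is the situation the patch avoids.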

Tested by bootstrapping GCC in the following container environments on
x86_64-linux-gnu:

1. "Vanilla" system with memmem in Glibc.
2. memmem removed from string.h.
3. memmem removed from both string.h and libc.so.

For 3, also verified that memmem from libiberty is linked into fixincl
executable.

Ok for trunk?

fixincludes/ChangeLog:

PR other/109293
* configure.ac (AC_CHECK_DECLS): Add memmem.
* configure: Regenerate.
* config.h.in: Regenerate.
* system.h (memmem): Declare if HAVE_DECL_MEMMEM is zero.
---
 fixincludes/config.h.in  |  4 
 fixincludes/configure| 10 ++
 fixincludes/configure.ac |  2 +-
 fixincludes/system.h |  4 
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/fixincludes/config.h.in b/fixincludes/config.h.in
index 69a67f5f116..0fd21b721b9 100644
--- a/fixincludes/config.h.in
+++ b/fixincludes/config.h.in
@@ -78,6 +78,10 @@
don't. */
 #undef HAVE_DECL_GETC_UNLOCKED
 
+/* Define to 1 if you have the declaration of `memmem', and to 0 if you don't.
+   */
+#undef HAVE_DECL_MEMMEM
+
 /* Define to 1 if you have the declaration of `putchar_unlocked', and to 0 if
you don't. */
 #undef HAVE_DECL_PUTCHAR_UNLOCKED
diff --git a/fixincludes/configure b/fixincludes/configure
index b3bca666a4d..bdcc41f6ddc 100755
--- a/fixincludes/configure
+++ b/fixincludes/configure
@@ -5043,6 +5043,16 @@ fi
 cat >>confdefs.h <<_ACEOF
 #define HAVE_DECL_VASPRINTF $ac_have_decl
 _ACEOF
+ac_fn_c_check_decl "$LINENO" "memmem" "ac_cv_have_decl_memmem" 
"$ac_includes_default"
+if test "x$ac_cv_have_decl_memmem" = xyes; then :
+  ac_have_decl=1
+else
+  ac_have_decl=0
+fi
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE_DECL_MEMMEM $ac_have_decl
+_ACEOF
 
 ac_fn_c_check_decl "$LINENO" "clearerr_unlocked" 
"ac_cv_have_decl_clearerr_unlocked" "$ac_includes_default"
 if test "x$ac_cv_have_decl_clearerr_unlocked" = xyes; then :
diff --git a/fixincludes/configure.ac b/fixincludes/configure.ac
index 14813b910f1..ef2227e3c93 100644
--- a/fixincludes/configure.ac
+++ b/fixincludes/configure.ac
@@ -88,7 +88,7 @@ define(fixincludes_UNLOCKED_FUNCS, clearerr_unlocked 
feof_unlocked dnl
   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked dnl
   putchar_unlocked putc_unlocked)
 AC_CHECK_FUNCS(fixincludes_UNLOCKED_FUNCS)
-AC_CHECK_DECLS([abort, asprintf, basename(char *), errno, vasprintf])
+AC_CHECK_DECLS([abort, asprintf, basename(char *), errno, vasprintf, memmem])
 AC_CHECK_DECLS(m4_split(m4_normalize(fixincludes_UNLOCKED_FUNCS)))
 
 # Checks for typedefs, structures, and compiler characteristics.
diff --git a/fixincludes/system.h b/fixincludes/system.h
index dca5d57b2e3..687fb2e2025 100644
--- a/fixincludes/system.h
+++ b/fixincludes/system.h
@@ -209,6 +209,10 @@ extern int errno;
 extern void abort (void);
 #endif
 
+#if defined (HAVE_DECL_MEMMEM) && !HAVE_DECL_MEMMEM
+extern void *memmem (const void *, size_t, const void *, size_t);
+#endif
+
 #if HAVE_SYS_STAT_H
 # include <sys/stat.h>
 #endif
-- 
2.40.0

Re: [PATCH] c++: NTTP constraint depending on outer args [PR109160]

2023-03-27 Thread Patrick Palka via Gcc-patches

On Fri, Mar 17, 2023 at 11:26 AM Patrick Palka  wrote:
>
> Here we're crashing during satisfaction for the NTTP 'C auto' from
> do_auto_deduction ultimately because convert_template_argument / unify
> don't pass all outer template arguments to do_auto_deduction, and during
> satisfaction we need to know all arguments.  While these callers do
> pass some outer arguments, they are only sufficient to properly
> substitute the 'auto' and are not necessarily the complete set.
>
> Fortunately it seems it's possible to obtain the full set of outer
> arguments from these callers via convert_template_argument's IN_DECL
> parameter and unify's TPARMS parameter.  So this patch adds a TMPL
> parameter to do_auto_deduction, used only during adc_unify deduction,
> which contains the (partially instantiated) template corresponding to
> this auto and from which we can obtain all outer template arguments for
> satisfaction.
>
> This patch also adjusts the IN_DECL argument passed to
> coerce_template_parms from tsubst_decl so that we could in turn safely
> assume convert_template_argument's IN_DECL is always a TEMPLATE_DECL,
> and thus could pass it as-is to do_auto_deduction.  (tsubst_decl seems
> to be the only caller that passes a non-empty non-template IN_DECL to
> coerce_template_parms.)
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/12?

Ping.

>
> PR c++/109160
>
> gcc/cp/ChangeLog:
>
> * cp-tree.h (do_auto_deduction): Add defaulted TMPL parameter.
> * pt.cc (convert_template_argument): Pass IN_DECL as TMPL to
> do_auto_deduction.
> (tsubst_decl) : Pass TMPL instead of T as
> IN_DECL to coerce_template_parms.
> (unify) : Pass the corresponding
> template as TMPL to do_auto_deduction.
> (do_auto_deduction): Document default arguments.  Use TMPL
> to obtain a full set of template arguments for satisfaction
> in the adc_unify case.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp2a/concepts-placeholder12.C: New test.
> ---
>  gcc/cp/cp-tree.h  |  3 +-
>  gcc/cp/pt.cc  | 30 ++-
>  .../g++.dg/cpp2a/concepts-placeholder12.C | 29 ++
>  3 files changed, 53 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder12.C
>
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index dfc1c845768..e7190c5cc62 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7324,7 +7324,8 @@ extern tree do_auto_deduction   (tree, 
> tree, tree,
>   auto_deduction_context
>  = adc_unspecified,
>  tree = NULL_TREE,
> -int = LOOKUP_NORMAL);
> +int = LOOKUP_NORMAL,
> +tree = NULL_TREE);
>  extern tree type_uses_auto (tree);
>  extern tree type_uses_auto_or_concept  (tree);
>  extern void append_type_to_template_for_access_check (tree, tree, tree,
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index ddbd73371b9..6400b686a58 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -8638,7 +8638,7 @@ convert_template_argument (tree parm,
>else if (tree a = type_uses_auto (t))
> {
>   t = do_auto_deduction (t, arg, a, complain, adc_unify, args,
> -LOOKUP_IMPLICIT);
> +LOOKUP_IMPLICIT, in_decl);
>   if (t == error_mark_node)
> return error_mark_node;
> }
> @@ -15243,7 +15243,7 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
> complain)
>  the template.  */
>   argvec = (coerce_template_parms
> (DECL_TEMPLATE_PARMS (gen_tmpl),
> -argvec, t, complain));
> +argvec, tmpl, complain));
> if (argvec == error_mark_node)
>   RETURN (error_mark_node);
> hash = spec_hasher::hash (gen_tmpl, argvec);
> @@ -24655,7 +24655,9 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
> int strict,
>   if (tree a = type_uses_auto (tparm))
> {
>   tparm = do_auto_deduction (tparm, arg, a,
> -complain, adc_unify, targs);
> +complain, adc_unify, targs,
> +LOOKUP_NORMAL,
> +TPARMS_PRIMARY_TEMPLATE (tparms));
>   if (tparm == error_mark_node)
> return 1;
> }
> @@ -30643,13 +30645,20 @@ unparenthesized_id_or_class_member_access_p (tree 
> init)
> adc_requirement contexts

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Qing Zhao via Gcc-patches



> On Mar 27, 2023, at 12:48 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On Mar 27, 2023, at 12:31 PM, Jakub Jelinek  wrote:
>> 
>> On Mon, Mar 27, 2023 at 04:22:25PM +, Qing Zhao via Gcc-patches wrote:
>>>> The latter IMHO.  Having a warning with completely nonsensical name will
>>>> just confuse users.
>>> 
>>> Okay. -:)
>>> How about "-Wstruct-with-fam-not-at-end”?  Or do you have any suggestion on 
>>> the name?
>> 
>> Nobody will know what fam is.
> 
> Yes, I agree -:)
> 
>> -Wflex-array-member-not-at-end ?
> 
> However, Will this name include “a structure with flexible array member is 
> not at end”?  

I cannot think of a better name than “-Wflex-array-member-not-at-end”.

I will use this one.

Let me know if you have any further comments on the documentation part.

thanks.

Qing
> Qing
> 
>> 
>>  Jakub
>> 
>

Re: Enable UTF-8 code page in driver and compiler on 64-bit mingw host [PR108865]

2023-03-27 Thread Costas Argyris via Gcc-patches

The patch attached to this email extends the UTF-8 support of the
driver and compiler processes to the 32-bit mingw host.  Initially,
only the 64-bit host got it.

About the changes in sym-mingw32.cc:

Even though the 64-bit host was building fine with the symbol being
simply declared as a char, the 32-bit host was failing to find the
symbol at link time because a leading underscore was being added
to it by the compiler.  The asm keyword ensures that the symbol
always appears with that exact name, such that the linker will
always find it.
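
As a rough illustration of the asm-label technique (not the actual
sym-mingw32.cc contents), the sketch below pins the assembler-level name of
a variable.  example_marker, EXAMPLE_MARKER_SYMBOL and
example_marker_address are hypothetical names, and __asm__ is used instead
of asm only so that the snippet also compiles in strict ISO C modes.

```c
#include <assert.h>
#include <stddef.h>

/* The asm label fixes the name that is emitted for this variable, so no
   target-specific prefix (such as a leading underscore) is added.  */
char example_marker __asm__ ("EXAMPLE_MARKER_SYMBOL");

/* Return the address of the marker; at the C level the variable is
   still referred to by its ordinary identifier.  */
char *
example_marker_address (void)
{
  char *p = &example_marker;
  assert (p != NULL);
  return p;
}
```

Only the emitted symbol name changes; C code keeps using the identifier
example_marker.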

The patch also includes Jacek's suggestion to add the .manifest file
as a prerequisite for the object file (this had already been done,
but an earlier version of the patch was pushed, so it was missed).

Tested building from master for both 32 and 64-bit mingw hosts using:

1) cross-compilation from a Debian machine using configure + make
2) native-compilation from a Windows machine using MSYS2

On Thu, 9 Mar 2023 at 15:03, Jonathan Yong <10wa...@gmail.com> wrote:

> On 3/9/23 13:33, Costas Argyris wrote:
> > Pinging the list and mingw maintainer.
> >
> > Analysis and pre-approval here:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865
> >
>
> Thanks, pushed to master branch.
>
>
>
From 0ed08739f44116eaef46b552df923959ba945afa Mon Sep 17 00:00:00 2001
From: Costas Argyris 
Date: Sun, 26 Mar 2023 11:32:13 +0100
Subject: [PATCH] Extend UTF-8 support to the 32-bit mingw host.

Prevent any name mangling in HOST_EXTRA_OBJS_SYMBOL
such that the linker always finds it by that name.

Also add the .manifest file as an explicit
dependency in the make rule such that the
object gets re-built if it changes.
---
 gcc/config.host| 5 +++--
 gcc/config/i386/sym-mingw32.cc | 4 +++-
 gcc/config/i386/x-mingw32-utf8 | 3 ++-
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/config.host b/gcc/config.host
index 4abb32ad73d..5df85752ed4 100644
--- a/gcc/config.host
+++ b/gcc/config.host
@@ -232,10 +232,11 @@ case ${host} in
 ;;
   i[34567]86-*-mingw32*)
 host_xm_file=i386/xm-mingw32.h
-host_xmake_file="${host_xmake_file} i386/x-mingw32"
+host_xmake_file="${host_xmake_file} i386/x-mingw32 i386/x-mingw32-utf8"
 host_exeext=.exe
 out_host_hook_obj=host-mingw32.o
-host_extra_gcc_objs="${host_extra_gcc_objs} driver-mingw32.o"
+host_extra_objs="${host_extra_objs} utf8-mingw32.o"
+host_extra_gcc_objs="${host_extra_gcc_objs} driver-mingw32.o utf8rc-mingw32.o"
 host_lto_plugin_soname=liblto_plugin.dll
 ;;
   x86_64-*-mingw*)
diff --git a/gcc/config/i386/sym-mingw32.cc b/gcc/config/i386/sym-mingw32.cc
index f369698abc4..2f8dee6c1ec 100644
--- a/gcc/config/i386/sym-mingw32.cc
+++ b/gcc/config/i386/sym-mingw32.cc
@@ -1 +1,3 @@
-char HOST_EXTRA_OBJS_SYMBOL;
+/* Prevent any name mangling to make sure that the linker
+   will always find the symbol. */
+char HOST_EXTRA_OBJS_SYMBOL asm ("HOST_EXTRA_OBJS_SYMBOL");
diff --git a/gcc/config/i386/x-mingw32-utf8 b/gcc/config/i386/x-mingw32-utf8
index 9de963d7965..cf5c3db3d8b 100644
--- a/gcc/config/i386/x-mingw32-utf8
+++ b/gcc/config/i386/x-mingw32-utf8
@@ -27,7 +27,8 @@
 # The resulting .o file gets added to host_extra_gcc_objs in
 # config.host for x86_64-*-mingw* host and gets linked into
 # the driver as a .o file, so it's lack of symbols is OK.
-utf8rc-mingw32.o : $(srcdir)/config/i386/utf8-mingw32.rc
+utf8rc-mingw32.o : $(srcdir)/config/i386/utf8-mingw32.rc \
+  $(srcdir)/config/i386/winnt-utf8.manifest
 	$(WINDRES) $< $@
 
 # Create an object file that just exports the global symbol
-- 
2.30.2

Re: [PATCH] In the ready lists of pipeline, put unrecog insns (such as CLOBBER, USE) at the latest to issue.

2023-03-27 Thread Richard Sandiford via Gcc-patches

Jin Ma via Gcc-patches  writes:
>   Unrecog insns (such as CLOBBER, USE) does not represent real instructions,
> but in the process of pipeline optimization, they will wait for transmission
> in ready list like other insns, without considering resource conflicts and
> cycles. This results in a multi-issue CPU architecture that can be issued at
> any time if other regular insns have resource conflicts or cannot be launched
> for other reasons. As a result, its position is advanced in the generated
> insns sequence, which will affect register allocation and often lead to more
> redundant mov instructions.

Is it the clobber rather than the use case that is causing problems?
I would expect that scheduling a use ASAP would be better for register
pressure, since it might close off the associated live range and so
reduce the number of conflicts.

I.e. is the problem that, when a live range starts with a clobber,
the current code will tend to move the clobber up and so extend
the associated live range?  If so, that sounds like something we
should address more directly, for two reasons:

(1) We should try to prevent clobbers that start a live range from being
moved up even if first_cycle_insn_p.

(2) Clobbers can also be used to close off a live range, which is useful
if a pseudo is only written to in parts.  The current behaviour is
probably better for those clobbers.

In general, if you're hitting register pressure problems with scheduling,
have you tried enabling -fsched-pressure by default, possibly with
--param=sched-pressure-algorithm=2 (but try with the default algo too)?

Thanks,
Richard

>
> gcc/ChangeLog:
>
>   * haifa-sched.cc (prune_ready_list): Consider unrecog insns(CLOBBER and 
> USE)
>   in pruning ready lists.

> ---
>  gcc/haifa-sched.cc | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
> index 48b53776fa9..72c4c44da76 100644
> --- a/gcc/haifa-sched.cc
> +++ b/gcc/haifa-sched.cc
> @@ -6318,6 +6318,14 @@ prune_ready_list (state_t temp_state, bool 
> first_cycle_insn_p,
> cost = 1;
> reason = "not a shadow";
>   }
> +   else if (recog_memoized (insn) < 0
> +   && (GET_CODE (PATTERN (insn)) == CLOBBER
> +   || GET_CODE (PATTERN (insn)) == USE))
> + {
> +   if (!first_cycle_insn_p)
> + cost = 1;
> +   reason = "unrecog insn";
> + }
> else if (recog_memoized (insn) < 0)
>   {
> if (!first_cycle_insn_p

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Qing Zhao via Gcc-patches



> On Mar 27, 2023, at 12:31 PM, Jakub Jelinek  wrote:
> 
> On Mon, Mar 27, 2023 at 04:22:25PM +, Qing Zhao via Gcc-patches wrote:
>>> The latter IMHO.  Having a warning with completely nonsensical name will
>>> just confuse users.
>> 
>> Okay. -:)
>> How about "-Wstruct-with-fam-not-at-end”?  Or do you have any suggestion on 
>> the name?
> 
> Nobody will know what fam is.

Yes, I agree -:)

> -Wflex-array-member-not-at-end ?

However, will this name also cover “a structure with a flexible array member
that is not at the end”?

Qing

> 
>   Jakub
>

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Jakub Jelinek via Gcc-patches

On Mon, Mar 27, 2023 at 04:22:25PM +, Qing Zhao via Gcc-patches wrote:
> > The latter IMHO.  Having a warning with completely nonsensical name will
> > just confuse users.
> 
> Okay. -:)
> How about "-Wstruct-with-fam-not-at-end”?  Or do you have any suggestion on 
> the name?

Nobody will know what fam is.
-Wflex-array-member-not-at-end ?

Jakub

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Qing Zhao via Gcc-patches



> On Mar 27, 2023, at 12:06 PM, Jakub Jelinek  wrote:
> 
> On Mon, Mar 27, 2023 at 03:57:58PM +, Qing Zhao wrote:
>>>> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} 
>>>> to
>>> This is certainly misnamed.
>> 
>> The name “-Wgnu-variable-sized-type-not-at-end” was just used the warning 
>> name from CLANG. -:)
>> 
>> Shall we use the same name as CLANG? Or we invent a new name?
> 
> The latter IMHO.  Having a warning with completely nonsensical name will
> just confuse users.

Okay. -:)
How about "-Wstruct-with-fam-not-at-end”?  Or do you have any suggestion on the 
name?
> 
>>> GNU variable sized type not at the end of a
>>> struct is something like
>>> void bar (void *);
>>> void foo (int n) {
>>> struct S { int a; int b[n]; int c; } s;
>>> s.a = 1;
>>> __builtin_memset (s.b, 0, sizeof (s.b));
>>> s.c = 3;
>>> bar ();
>>> }
>>> Certainly not flexible array members in the middle of structure.
>> 
>> Right now, with -Wpedantic, we have the following warning for the above 
>> small case:
>> 
>> t2.c:3:24: warning: a member of a structure or union cannot have a variably 
>> modified type [-Wpedantic]
>>3 |  struct S { int a; int b[n]; int c; } s;
>>  |^
> 
> Sure, it is a GNU C extension (not allowed in C++ BTW).
> It is documented in https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
> though just very briefly:
> As an extension, GCC accepts variable-length arrays as a member of a 
> structure or a union. For example: 
> void
> foo (int n)
> {
>  struct S { int x[n]; };
> }

Okay, I see. 
> 
>> Do we have a definition for “GNU variable sized type” now?
> 
> Naturally, variable sized type should have non-constant sizeof, because
> otherwise it is constant sized type.

Oh, for flexible array members we cannot take sizeof, so they are
considered incomplete types, right?

thanks.

Qing
>  That is not
> the case for flexible array members, there is nothing variable sized on
> them, especially if they are in the middle of a structure.
> 
>   Jakub
>

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Jakub Jelinek via Gcc-patches

On Mon, Mar 27, 2023 at 03:57:58PM +, Qing Zhao wrote:
> >> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} 
> >> to
> > This is certainly misnamed.
> 
> The name “-Wgnu-variable-sized-type-not-at-end” was just used the warning 
> name from CLANG. -:)
> 
> Shall we use the same name as CLANG? Or we invent a new name?

The latter IMHO.  Having a warning with completely nonsensical name will
just confuse users.

> >  GNU variable sized type not at the end of a
> > struct is something like
> > void bar (void *);
> > void foo (int n) {
> >  struct S { int a; int b[n]; int c; } s;
> >  s.a = 1;
> >  __builtin_memset (s.b, 0, sizeof (s.b));
> >  s.c = 3;
> >  bar ();
> > }
> > Certainly not flexible array members in the middle of structure.
> 
> Right now, with -Wpedantic, we have the following warning for the above small 
> case:
> 
> t2.c:3:24: warning: a member of a structure or union cannot have a variably 
> modified type [-Wpedantic]
> 3 |  struct S { int a; int b[n]; int c; } s;
>   |^

Sure, it is a GNU C extension (not allowed in C++ BTW).
It is documented in https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
though just very briefly:
As an extension, GCC accepts variable-length arrays as a member of a structure 
or a union. For example: 
void
foo (int n)
{
  struct S { int x[n]; };
}

> Do we have a definition for “GNU variable sized type” now?

Naturally, variable sized type should have non-constant sizeof, because
otherwise it is constant sized type.  That is not
the case for flexible array members, there is nothing variable sized on
them, especially if they are in the middle of a structure.

Jakub
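
The point about sizeof can be illustrated with a small sketch.  All names
here (struct packet, PACKET_HEADER_SIZE, packet_alloc) are hypothetical.
The sketch only shows that a structure ending in a flexible array member
has a compile-time constant size; the variable-length-array member case
from the example above is GNU C only and is deliberately not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A structure ending in a C99 flexible array member.  */
struct packet
{
  size_t len;
  int data[];
};

/* An enumerator needs an integer constant expression, so this only
   compiles because sizeof (struct packet) is a constant.  */
enum { PACKET_HEADER_SIZE = sizeof (struct packet) };

/* Allocate a packet with room for N zero-initialized elements.
   Returns NULL if the allocation fails.  */
struct packet *
packet_alloc (size_t n)
{
  /* Guard against overflow in the size computation.  */
  assert (n <= (SIZE_MAX - sizeof (struct packet)) / sizeof (int));

  struct packet *p = malloc (sizeof (struct packet) + n * sizeof (int));
  if (p == NULL)
    return NULL;
  p->len = n;
  for (size_t i = 0; i < n; i++)
    p->data[i] = 0;
  return p;
}
```

The storage for the trailing elements comes entirely from the allocation
size; the type itself stays constant sized.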

[PATCH v2][RFC] vect: Verify that GET_MODE_NUNITS is greater than one for vect_grouped_store_supported

2023-03-27 Thread Kevin Lee

This patch is a proper fix for the issue addressed by the previous patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614463.html
vect_grouped_store_supported checks whether the count is a power of 2, but
doesn't check the value of GET_MODE_NUNITS.
This should handle the RISC-V case where the mode is VNx1DI, since
nelt would then be {1, 1}.
It was tested on RISC-V and x86_64-linux-gnu. Would this be correct
for vectors with size smaller than 2?
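
To make the {1, 1} case concrete, here is a simplified model of the check,
written under stated assumptions rather than taken from GCC: struct poly2
and its helpers are hypothetical stand-ins for a two-coefficient
poly_uint64, read as c0 + c1 * x for a runtime multiplier x >= 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical two-coefficient model: value = c0 + c1 * x, x >= 0.  */
struct poly2
{
  unsigned long c0;
  unsigned long c1;
};

/* The value is a compile-time constant when it does not depend on x.  */
static bool
poly2_is_constant (struct poly2 p)
{
  return p.c1 == 0;
}

/* True if the value might be below LIMIT for some x.  With an unsigned
   c1 the smallest value is reached at x == 0.  */
static bool
poly2_maybe_lt (struct poly2 p, unsigned long limit)
{
  return p.c0 < limit;
}

/* Evaluate the model for one concrete runtime multiplier.  */
unsigned long
poly2_eval (struct poly2 p, unsigned long x)
{
  assert (p.c1 == 0 || x <= (ULONG_MAX - p.c0) / p.c1);
  return p.c0 + p.c1 * x;
}

/* Mirrors the condition added by the patch.  */
bool
rejects_grouped_store (struct poly2 nelt)
{
  return !poly2_is_constant (nelt) && poly2_maybe_lt (nelt, 2);
}
```

In this model {1, 1} is rejected because it evaluates to a single element
when x is 0, while a constant count of 1 is not rejected by the condition,
which is the case the question above asks about.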

---
 gcc/tree-vect-data-refs.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 8daf7bd7dd3..04ad12f7d04 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned 
HOST_WIDE_INT count)
  poly_uint64 nelt = GET_MODE_NUNITS (mode);
 
  /* The encoding has 2 interleaved stepped patterns.  */
+if(!nelt.is_constant() && maybe_lt(nelt, (unsigned int) 2))
+  return false;
  vec_perm_builder sel (nelt, 2, 3);
  sel.quick_grow (6);
  for (i = 0; i < 3; i++)
-- 
2.25.1

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Qing Zhao via Gcc-patches



> On Mar 27, 2023, at 11:43 AM, Jakub Jelinek  wrote:
> 
> On Mon, Mar 27, 2023 at 01:38:34PM +, Qing Zhao wrote:
>> 
>> 
>>> On Mar 23, 2023, at 4:14 PM, Joseph Myers  wrote:
>>> 
>>> On Thu, 23 Mar 2023, Qing Zhao via Gcc-patches wrote:
>>> 
>>>> +Wgnu-variable-sized-type-not-at-end
>>>> +C C++ Var(warn_variable_sized_type_not_at_end) Warning
>>>> +Warn about structures or unions with C99 flexible array members are not
>>>> +at the end of a structure.
>>> 
>>> I think there's at least one word missing here, e.g. "that" before "are".
>> 
>> Will fix it.
>>> 
>>>> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} 
>>>> to
>>>> +identify all such cases in the source code and modify them.  This 
>>>> extension
>>>> +will be deprecated from gcc in the next release.
>>> 
>>> We don't generally say "in the next release" in the manual (or "deprecated 
>>> from gcc").  Maybe it *is* deprecated, maybe it will be *removed*, or will 
>>> *start to warn by default*, in some specified version number (giving a 
>>> version number seems better than "next release"), but "will be deprecated" 
>>> is odd.
>> How about the following:
>> 
>> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} to
> This is certainly misnamed.

The name “-Wgnu-variable-sized-type-not-at-end” just reuses the warning name
from Clang. -:)

Shall we use the same name as Clang? Or shall we invent a new name?

>  GNU variable sized type not at the end of a
> struct is something like
> void bar (void *);
> void foo (int n) {
>  struct S { int a; int b[n]; int c; } s;
>  s.a = 1;
>  __builtin_memset (s.b, 0, sizeof (s.b));
>  s.c = 3;
>  bar ();
> }
> Certainly not flexible array members in the middle of structure.

Right now, with -Wpedantic, we have the following warning for the above small 
case:

t2.c:3:24: warning: a member of a structure or union cannot have a variably 
modified type [-Wpedantic]
3 |  struct S { int a; int b[n]; int c; } s;
  |^


Do we have a definition for “GNU variable sized type” now?
Shall we include “flexible array members” and “structures/unions with a
flexible array member at the end” in “GNU variable sized type”?

thanks.

Qing
> 
>> +identify all such cases in the source code and modify them.  This warning 
>> will be 
>> + on by default starting from GCC14.
> 
>   Jakub

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Jakub Jelinek via Gcc-patches

On Mon, Mar 27, 2023 at 01:38:34PM +, Qing Zhao wrote:
> 
> 
> > On Mar 23, 2023, at 4:14 PM, Joseph Myers  wrote:
> > 
> > On Thu, 23 Mar 2023, Qing Zhao via Gcc-patches wrote:
> > 
> >> +Wgnu-variable-sized-type-not-at-end
> >> +C C++ Var(warn_variable_sized_type_not_at_end) Warning
> >> +Warn about structures or unions with C99 flexible array members are not
> >> +at the end of a structure.
> > 
> > I think there's at least one word missing here, e.g. "that" before "are".
> 
> Will fix it.
> > 
> >> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} 
> >> to
> >> +identify all such cases in the source code and modify them.  This 
> >> extension
> >> +will be deprecated from gcc in the next release.
> > 
> > We don't generally say "in the next release" in the manual (or "deprecated 
> > from gcc").  Maybe it *is* deprecated, maybe it will be *removed*, or will 
> > *start to warn by default*, in some specified version number (giving a 
> > version number seems better than "next release"), but "will be deprecated" 
> > is odd.
> How about the following:
> 
> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} to

This is certainly misnamed.  GNU variable sized type not at the end of a
struct is something like
void bar (void *);
void foo (int n) {
  struct S { int a; int b[n]; int c; } s;
  s.a = 1;
  __builtin_memset (s.b, 0, sizeof (s.b));
  s.c = 3;
  bar ();
}
Certainly not flexible array members in the middle of structure.

> +identify all such cases in the source code and modify them.  This warning 
> will be 
> + on by default starting from GCC14.

Jakub

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Qing Zhao via Gcc-patches



> On Mar 27, 2023, at 10:34 AM, Xi Ruoyao  wrote:
> 
> On Mon, 2023-03-27 at 13:38 +, Qing Zhao via Gcc-patches wrote:
>> 
>> 
>>> On Mar 23, 2023, at 4:14 PM, Joseph Myers  wrote:
>>> 
>>> On Thu, 23 Mar 2023, Qing Zhao via Gcc-patches wrote:
>>> 
>>>> +Wgnu-variable-sized-type-not-at-end
>>>> +C C++ Var(warn_variable_sized_type_not_at_end) Warning
>>>> +Warn about structures or unions with C99 flexible array members are not
>>>> +at the end of a structure.
>>> 
>>> I think there's at least one word missing here, e.g. "that" before "are".
>> 
>> Will fix it.
>>> 
>>>> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} 
>>>> to
>>>> +identify all such cases in the source code and modify them.  This 
>>>> extension
>>>> +will be deprecated from gcc in the next release.
>>> 
>>> We don't generally say "in the next release" in the manual (or "deprecated 
>>> from gcc").  Maybe it *is* deprecated, maybe it will be *removed*, or will 
>>> *start to warn by default*, in some specified version number (giving a 
>>> version number seems better than "next release"), but "will be deprecated" 
>>> is odd.
>> How about the following:
>> 
>> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} to
>> +identify all such cases in the source code and modify them.  This warning 
>> will be 
>> + on by default starting from GCC14.
> 
> I'm wondering why it *was" not on by default... 

This is a new warning that will be added to gcc13. Since we are at a very late
stage before the gcc13 release, I am not comfortable turning it on by default now.
I think it might be safer to turn it on by default at the beginning of gcc14.

Qing
> 
> 
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University
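
For readers following the thread, here is a minimal sketch of the kind of
layout the proposed -Wgnu-variable-sized-type-not-at-end option is meant to
flag.  The type and function names (struct flex, struct outer, tail_offset,
flex_data_offset) are made up for illustration and do not come from the patch.

```c
#include <stddef.h>

/* A struct ending in a C99 flexible array member.  */
struct flex
{
  int n;
  char data[];
};

/* GNU extension: a struct with a flexible array member is embedded
   somewhere other than the end of the enclosing struct.  */
struct outer
{
  struct flex f;
  int tail;
};

/* Offset of the field that follows the embedded struct.  */
size_t
tail_offset (void)
{
  return offsetof (struct outer, tail);
}

/* Offset at which f.data would start inside struct outer.  */
size_t
flex_data_offset (void)
{
  return offsetof (struct outer, f) + offsetof (struct flex, data);
}
```

Since f.data starts no later than tail, anything stored through f.data lands
on the fields that follow it, which is why such code is worth finding and
modifying.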

Re: [PATCH] Changed vector size

2023-03-27 Thread Richard Biener via Gcc-patches

On Mon, Mar 27, 2023 at 12:37 PM  wrote:
>
> From: Yixuan Chen 
>
> Observed a vint type "ABS_EXPR" followed by extra 3 int type "ABS_EXPR". If 
> want to test absolute value optimization for vector, maybe don't need 4 times.

A better solution would be to scan a dump before the veclower pass?

> gcc/testsuite/ChangeLog:
>
> 2023-03-27  Yixuan Chen  
>
> * g++.dg/pr94920.C: Declare the vector size as long as int.
>
> ---
>  gcc/testsuite/g++.dg/pr94920.C | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.dg/pr94920.C b/gcc/testsuite/g++.dg/pr94920.C
> index 126b00478d2..498bef93b3a 100644
> --- a/gcc/testsuite/g++.dg/pr94920.C
> +++ b/gcc/testsuite/g++.dg/pr94920.C
> @@ -2,7 +2,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
>
> -typedef int __attribute__((vector_size(4*sizeof(int)))) vint;
> +typedef int __attribute__((vector_size(sizeof(int)))) vint;
>
>  /* Same form as PR.  */
>  __attribute__((noipa)) unsigned int foo(int x) {
> --
> 2.40.0
>

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Xi Ruoyao via Gcc-patches

On Mon, 2023-03-27 at 13:38 +, Qing Zhao via Gcc-patches wrote:
> 
> 
> > On Mar 23, 2023, at 4:14 PM, Joseph Myers  wrote:
> > 
> > On Thu, 23 Mar 2023, Qing Zhao via Gcc-patches wrote:
> > 
> > > +Wgnu-variable-sized-type-not-at-end
> > > +C C++ Var(warn_variable_sized_type_not_at_end) Warning
> > > +Warn about structures or unions with C99 flexible array members are not
> > > +at the end of a structure.
> > 
> > I think there's at least one word missing here, e.g. "that" before "are".
> 
> Will fix it.
> > 
> > > +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} to
> > > +identify all such cases in the source code and modify them.  This extension
> > > +will be deprecated from gcc in the next release.
> > 
> > We don't generally say "in the next release" in the manual (or "deprecated 
> > from gcc").  Maybe it *is* deprecated, maybe it will be *removed*, or will 
> > *start to warn by default*, in some specified version number (giving a 
> > version number seems better than "next release"), but "will be deprecated" 
> > is odd.
> How about the following:
> 
> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} to
> +identify all such cases in the source code and modify them.  This warning will be
> + on by default starting from GCC14.

I'm wondering why it *was* not on by default...


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] Modula-2: fix documentation layout

2023-03-27 Thread Gaius Mulley via Gcc-patches

Eric Botcazou  writes:

> Hi Gaius,
>
>> yes indeed and thanks for the patch!
>
> You're welcome.  The documentation was slightly broken again in the meantime, 
> but nothing really serious this time.
>
> Again tested with a modern and an old version of Makeinfo.  OK for mainline?
>
>
> 2023-03-27  Eric Botcazou  
>
>   * doc/gm2.texi: Add missing Next, Previous and Top fields to most
>   top-level sections.

Hi Eric,

yes certainly - thanks again!

regards,
Gaius

Re: [PATCH] sanitizer: missing signed integer overflow errors [PR109107]

2023-03-27 Thread Marek Polacek via Gcc-patches

Ping.

On Tue, Mar 14, 2023 at 06:50:26PM -0400, Marek Polacek via Gcc-patches wrote:
> Here we're failing to detect a signed overflow with -O because match.pd,
> since r8-1516, transforms
> 
>   c = (a + 1) - (int) (short int) b;
> 
> into
> 
>   c = (int) ((unsigned int) a + 4294946117);
> 
> wrongly eliding the overflow.  This kind of problem is usually
> avoided by using TYPE_OVERFLOW_SANITIZED in the appropriate place.
> The first match.pd hunk in the patch fixes it.  I've constructed
> a testcase for each of the surrounding cases as well.  Then I
> noticed that fold_binary_loc/associate has the same problem, so I've
> added a TYPE_OVERFLOW_SANITIZED there as well (it may be too coarse,
> sorry).  Then I found yet another problem, but instead of fixing it
> now I've opened 109134.  I could probably go on and find a dozen more.
> 
> Is this worth doing?
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
>   PR sanitizer/109107
> 
> gcc/ChangeLog:
> 
>   * fold-const.cc (fold_binary_loc): Use TYPE_OVERFLOW_SANITIZED
>   when associating.
>   * match.pd: Use TYPE_OVERFLOW_SANITIZED.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/ubsan/pr109107-2.c: New test.
>   * c-c++-common/ubsan/pr109107-3.c: New test.
>   * c-c++-common/ubsan/pr109107-4.c: New test.
>   * c-c++-common/ubsan/pr109107.c: New test.
> ---
>  gcc/fold-const.cc |  3 ++-
>  gcc/match.pd  |  6 ++---
>  gcc/testsuite/c-c++-common/ubsan/pr109107-2.c | 24 ++
>  gcc/testsuite/c-c++-common/ubsan/pr109107-3.c | 25 +++
>  gcc/testsuite/c-c++-common/ubsan/pr109107-4.c | 24 ++
>  gcc/testsuite/c-c++-common/ubsan/pr109107.c   | 23 +
>  6 files changed, 101 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/ubsan/pr109107-2.c
>  create mode 100644 gcc/testsuite/c-c++-common/ubsan/pr109107-3.c
>  create mode 100644 gcc/testsuite/c-c++-common/ubsan/pr109107-4.c
>  create mode 100644 gcc/testsuite/c-c++-common/ubsan/pr109107.c
> 
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 02a24c5fe65..8d3308a34e9 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -11319,7 +11319,8 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
>And, we need to make sure type is not saturating.  */
>  
>if ((! FLOAT_TYPE_P (type) || flag_associative_math)
> -   && !TYPE_SATURATING (type))
> +   && !TYPE_SATURATING (type)
> +   && !TYPE_OVERFLOW_SANITIZED (type))
>   {
> tree var0, minus_var0, con0, minus_con0, lit0, minus_lit0;
> tree var1, minus_var1, con1, minus_con1, lit1, minus_lit1;
> diff --git a/gcc/match.pd b/gcc/match.pd
> index e352bd422f5..98bca9ea388 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2933,7 +2933,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> /* If the constant operation overflows we cannot do the transform
> directly as we would introduce undefined overflow, for example
> with (a - 1) + INT_MIN.  */
> -   (if (types_match (type, @0))
> +   (if (types_match (type, @0) && !TYPE_OVERFLOW_SANITIZED (type))
>   (with { tree cst = const_binop (outer_op == inner_op
>   ? PLUS_EXPR : MINUS_EXPR,
>   type, @1, @2); }
> @@ -2964,7 +2964,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>(view_convert (minus (outer_op @1 (view_convert @2)) @0))
> -  (if (types_match (type, @0))
> +  (if (types_match (type, @0) && !TYPE_OVERFLOW_SANITIZED (type))
> (with { tree cst = const_binop (outer_op, type, @1, @2); }
>   (if (cst && !TREE_OVERFLOW (cst))
>(minus { cst; } @0
> @@ -2983,7 +2983,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
>|| TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>   (view_convert (plus @0 (minus (view_convert @1) @2)))
> - (if (types_match (type, @0))
> + (if (types_match (type, @0) && !TYPE_OVERFLOW_SANITIZED (type))
>(with { tree cst = const_binop (MINUS_EXPR, type, @1, @2); }
> (if (cst && !TREE_OVERFLOW (cst))
>   (plus { cst; } @0)))
> diff --git a/gcc/testsuite/c-c++-common/ubsan/pr109107-2.c b/gcc/testsuite/c-c++-common/ubsan/pr109107-2.c
> new file mode 100644
> index 000..eb440b58dd8
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/ubsan/pr109107-2.c
> @@ -0,0 +1,24 @@
> +/* PR sanitizer/109107 */
> +/* { dg-do run { target int32 } } */
> +/* { dg-options "-fsanitize=signed-integer-overflow" } */
> +
> +#define INT_MIN (-__INT_MAX__ - 1)
> +int a = INT_MIN;
> +const int b = 676540;
> +
> +__attribute__((noipa)) int
> +foo ()
> +{
> +  int c = a - 1 + (int) (short) b;
> +  return c;
> +}
> +
> +int
> +main
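
To make the quoted description concrete, here is a small sketch of the
arithmetic, assuming a 32-bit int and the GCC/Clang overflow builtins.  The
helper names are hypothetical and this is not one of the new testcases.  With
a == INT_MAX and (int) (short) b == 21180, the expression as written overflows
at a + 1, while the folded unsigned form yields an in-range value, so the
sanitizer has nothing left to report.

```c
#include <limits.h>

/* Return 1 if evaluating (a + 1) - k step by step in int, as the source
   expression is written, overflows at either step.  */
int
source_form_overflows (int a, int k)
{
  int t, r;
  if (__builtin_add_overflow (a, 1, &t))
    return 1;
  return __builtin_sub_overflow (t, k, &r);
}

/* The folded form: a + (1 - k) computed with unsigned wraparound,
   which never triggers a signed-overflow check.  */
unsigned int
folded_form (int a, int k)
{
  return (unsigned int) a + (1u - (unsigned int) k);
}
```

For k == 21180 the folded constant 1u - 21180u is 4294946117 with a 32-bit
unsigned int, which matches the constant in the transformed expression quoted
above.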

Re: [V5][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-03-27 Thread Qing Zhao via Gcc-patches

Hi, Jakub,

Could you please review the middle-end part of this patch? (The C FE changes
were already approved by Joseph.)

The major change is in tree-object-size.cc (addr_object_size), which now uses
the new TYPE_INCLUDE_FLEXARRAY info.

This patch fixes PR101832
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832) and is needed for Linux
kernel security, so it would be best to get it into GCC 13.

Thanks a lot!

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614101.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614511.html

> On Mar 23, 2023, at 9:03 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Ping…
> 
> Please let me know if you have any further comments on the patch.
> 
> thanks.
> 
> Qing
> 
> 
> Begin forwarded message:
> 
> From: Qing Zhao <qing.z...@oracle.com>
> Subject: [V5][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]
> Date: March 16, 2023 at 5:47:14 PM EDT
> To: jos...@codesourcery.com, ja...@redhat.com, san...@codesourcery.com
> Cc: rguent...@suse.de, siddh...@gotplt.org, keesc...@chromium.org,
> gcc-patches@gcc.gnu.org, Qing Zhao <qing.z...@oracle.com>
> 
> GCC extension accepts the case when a struct with a flexible array member
> is embedded into another struct or union (possibly recursively).
> __builtin_object_size should treat such struct as flexible size per
> -fstrict-flex-arrays.
> 
> gcc/c/ChangeLog:
> 
> PR tree-optimization/101832
> * c-decl.cc (finish_struct): Set TYPE_INCLUDE_FLEXARRAY for
> struct/union type.
> 
> gcc/lto/ChangeLog:
> 
> PR tree-optimization/101832
> * lto-common.cc (compare_tree_sccs_1): Compare bit
> TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDE_FLEXARRAY properly
> for its corresponding type.
> 
> gcc/ChangeLog:
> 
> PR tree-optimization/101832
> * print-tree.cc (print_node): Print new bit type_include_flexarray.
> * tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
> as type_include_flexarray for RECORD_TYPE or UNION_TYPE.
> * tree-object-size.cc (addr_object_size): Handle structure/union type
> when it has flexible size.
> * tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
> in bit no_named_args_stdarg_p properly for its corresponding type.
> * tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
> out bit no_named_args_stdarg_p properly for its corresponding type.
> * tree.h (TYPE_INCLUDE_FLEXARRAY): New macro TYPE_INCLUDE_FLEXARRAY.
> 
> gcc/testsuite/ChangeLog:
> 
> PR tree-optimization/101832
> * gcc.dg/builtin-object-size-pr101832.c: New test.
> ---
> gcc/c/c-decl.cc                               |  11 ++
> gcc/lto/lto-common.cc                         |   5 +-
> gcc/print-tree.cc                             |   5 +
> .../gcc.dg/builtin-object-size-pr101832.c     | 134 ++
> gcc/tree-core.h                               |   2 +
> gcc/tree-object-size.cc                       |  23 ++-
> gcc/tree-streamer-in.cc                       |   5 +-
> gcc/tree-streamer-out.cc                      |   5 +-
> gcc/tree.h                                    |   7 +-
> 9 files changed, 192 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
> 
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index e537d33f398..14c54809b9d 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -9258,6 +9258,17 @@ finish_struct (location_t loc, tree t, tree fieldlist, tree attributes,
>  /* Set DECL_NOT_FLEXARRAY flag for FIELD_DECL x.  */
>  DECL_NOT_FLEXARRAY (x) = !is_flexible_array_member_p (is_last_field, x);
> 
> +  /* Set TYPE_INCLUDE_FLEXARRAY for the context of x, t.
> + when x is an array and is the last field.  */
> +  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
> + TYPE_INCLUDE_FLEXARRAY (t)
> +  = is_last_field && flexible_array_member_type_p (TREE_TYPE (x));
> +  /* Recursively set TYPE_INCLUDE_FLEXARRAY for the context of x, t
> + when x is an union or record and is the last field.  */
> +  else if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
> + TYPE_INCLUDE_FLEXARRAY (t)
> +  = is_last_field && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x));
> +
>  if (DECL_NAME (x)
> || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
> saw_named_field = true;
> diff --git a/gcc/lto/lto-common.cc 
>

Re: [V5][PATCH 2/2] Update documentation to clarify a GCC extension

2023-03-27 Thread Qing Zhao via Gcc-patches




> On Mar 23, 2023, at 4:14 PM, Joseph Myers  wrote:
> 
> On Thu, 23 Mar 2023, Qing Zhao via Gcc-patches wrote:
> 
>> +Wgnu-variable-sized-type-not-at-end
>> +C C++ Var(warn_variable_sized_type_not_at_end) Warning
>> +Warn about structures or unions with C99 flexible array members are not
>> +at the end of a structure.
> 
> I think there's at least one word missing here, e.g. "that" before "are".

Will fix it.
> 
>> +Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} to
>> +identify all such cases in the source code and modify them.  This extension
>> +will be deprecated from gcc in the next release.
> 
> We don't generally say "in the next release" in the manual (or "deprecated 
> from gcc").  Maybe it *is* deprecated, maybe it will be *removed*, or will 
> *start to warn by default*, in some specified version number (giving a 
> version number seems better than "next release"), but "will be deprecated" 
> is odd.
How about the following:

+Please use warning option  @option{-Wgnu-variable-sized-type-not-at-end} to
+identify all such cases in the source code and modify them.  This warning will be
+ on by default starting from GCC14.

Thanks.

Qing

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Re: [V5][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-03-27 Thread Qing Zhao via Gcc-patches




> On Mar 23, 2023, at 2:55 PM, Joseph Myers  wrote:
> 
> On Thu, 23 Mar 2023, Qing Zhao via Gcc-patches wrote:
> 
>> gcc/c/ChangeLog:
>> 
>> PR tree-optimization/101832
>> * c-decl.cc (finish_struct): Set TYPE_INCLUDE_FLEXARRAY for
>> struct/union type.
> 
> The C front-end changes are OK (supposing the original patch has correct 
> whitespace, since it seems to be messed up here).

Thanks for your review.

I just double checked the change in gcc/c/c-decl.cc, looks like the whitespaces 
are good:

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e537d33f398..14c54809b9d 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9258,6 +9258,17 @@ finish_struct (location_t loc, tree t, tree fieldlist, tree attributes,
   /* Set DECL_NOT_FLEXARRAY flag for FIELD_DECL x.  */
   DECL_NOT_FLEXARRAY (x) = !is_flexible_array_member_p (is_last_field, x);
 
+  /* Set TYPE_INCLUDE_FLEXARRAY for the context of x, t.
+when x is an array and is the last field.  */
+  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
+   TYPE_INCLUDE_FLEXARRAY (t)
+ = is_last_field && flexible_array_member_type_p (TREE_TYPE (x));
+  /* Recursively set TYPE_INCLUDE_FLEXARRAY for the context of x, t
+when x is an union or record and is the last field.  */
+  else if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
+   TYPE_INCLUDE_FLEXARRAY (t)
+ = is_last_field && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x));
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;

I guess that git send-email might have messed them up. :-)

Qing
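
As a reading aid, the rule in the hunk above can be sketched as a toy model
in plain C.  This is only an illustration under simplifying assumptions: it
handles records only (not unions), and struct type_model, enum kind and
includes_flexarray are hypothetical names, not GCC internals.

```c
#include <stddef.h>

/* Toy stand-in for a type: a scalar, a flexible array, or a record.  */
enum kind { K_SCALAR, K_FLEX_ARRAY, K_RECORD };

struct type_model
{
  enum kind kind;
  const struct type_model *const *fields;  /* Only used for K_RECORD.  */
  int nfields;
};

/* A record "includes a flexible array" when its last field is a
   flexible array, or is itself a record that includes one.  */
int
includes_flexarray (const struct type_model *t)
{
  if (t->kind != K_RECORD || t->nfields == 0)
    return 0;

  const struct type_model *last = t->fields[t->nfields - 1];
  if (last->kind == K_FLEX_ARRAY)
    return 1;
  if (last->kind == K_RECORD)
    return includes_flexarray (last);
  return 0;
}
```

In this model a record whose flexible-array-bearing member is not the last
field does not count, mirroring the is_last_field condition in the patch.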

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Re: [PATCH 1/2] c++: improve "NTTP argument considered unused" fix [PR53164, PR105848]

2023-03-27 Thread Patrick Palka via Gcc-patches

On Thu, 23 Mar 2023, Patrick Palka wrote:

> r13-995-g733a792a2b2e16 worked around the problem of FUNCTION_DECL
> template arguments not always getting marked as odr-used by redundantly
> calling mark_used on the substituted ADDR_EXPR callee of a CALL_EXPR.
> This is just a narrow workaround however, since using a FUNCTION_DECL as
> a template argument alone should constitute an odr-use; we shouldn't
> need to subsequently e.g. call the function or take its address.
> 
> This patch fixes this in a more general way at template specialization
> time by walking the template arguments of the specialization and calling
> mark_used on all entities used within.  As before, the call to mark_used
> is at worst a no-op, but it compensates for the situation where we end up
> forming a specialization from a template context in which mark_used is
> inhibited.  Another approach would be to call mark_used whenever we
> substitute a TEMPLATE_PARM_INDEX, but that would result in many more
> redundant calls to mark_used compared to this approach.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/53164
>   PR c++/105848
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (instantiate_class_template): Call
>   mark_template_arguments_used.
>   (tsubst_copy_and_build) <case CALL_EXPR>: Revert r13-995 change.
>   (mark_template_arguments_used): Define.
>   (instantiate_template): Call mark_template_arguments_used.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/fn-ptr3a.C: New test.
>   * g++.dg/template/fn-ptr4.C: New test.
> ---
>  gcc/cp/pt.cc | 51 
>  gcc/testsuite/g++.dg/template/fn-ptr3a.C | 25 
>  gcc/testsuite/g++.dg/template/fn-ptr4.C  | 14 +++
>  3 files changed, 74 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/fn-ptr3a.C
>  create mode 100644 gcc/testsuite/g++.dg/template/fn-ptr4.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 7e4a8de0c8b..9b3cc1c 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -220,6 +220,7 @@ static tree make_argument_pack (tree);
>  static tree enclosing_instantiation_of (tree tctx);
>  static void instantiate_body (tree pattern, tree args, tree d, bool nested);
>  static tree maybe_dependent_member_ref (tree, tree, tsubst_flags_t, tree);
> +static void mark_template_arguments_used (tree);
>  
>  /* Make the current scope suitable for access checking when we are
> processing T.  T can be FUNCTION_DECL for instantiated function
> @@ -12142,6 +12143,9 @@ instantiate_class_template (tree type)
>cp_unevaluated_operand = 0;
>c_inhibit_evaluation_warnings = 0;
>  }
> +
> +  mark_template_arguments_used (INNERMOST_TEMPLATE_ARGS (args));
> +
>/* Use #pragma pack from the template context.  */
>saved_maximum_field_alignment = maximum_field_alignment;
>maximum_field_alignment = TYPE_PRECISION (pattern);
> @@ -21173,22 +21177,10 @@ tsubst_copy_and_build (tree t,
> }
>  
>   /* Remember that there was a reference to this entity.  */
> - if (function != NULL_TREE)
> -   {
> - tree inner = function;
> - if (TREE_CODE (inner) == ADDR_EXPR
> - && TREE_CODE (TREE_OPERAND (inner, 0)) == FUNCTION_DECL)
> -   /* We should already have called mark_used when taking the
> -  address of this function, but do so again anyway to make
> -  sure it's odr-used: at worst this is a no-op, but if we
> -  obtained this FUNCTION_DECL as part of ahead-of-time overload
> -  resolution then that call to mark_used wouldn't have marked it
> -  odr-used yet (53164).  */
> -   inner = TREE_OPERAND (inner, 0);
> - if (DECL_P (inner)
> - && !mark_used (inner, complain) && !(complain & tf_error))
> -   RETURN (error_mark_node);
> -   }
> + if (function != NULL_TREE
> + && DECL_P (function)
> + && !mark_used (function, complain) && !(complain & tf_error))
> +   RETURN (error_mark_node);
>  
>   if (!maybe_fold_fn_template_args (function, complain))
> return error_mark_node;
> @@ -21883,6 +21875,31 @@ check_instantiated_args (tree tmpl, tree args, tsubst_flags_t complain)
>return result;
>  }
>  
> +/* Call mark_used on each entity within the template arguments ARGS of some
> +   template specialization, to ensure that each such entity is considered
> +   odr-used regardless of whether the specialization was first formed in a
> +   template context.
> +
> +   This function assumes push_to_top_level has been called beforehand, and
> +   that processing_template_decl has been set iff the template arguments
> +   are dependent.  */
> +
> +static void
> +mark_template_arguments_used (tree args)
> +{
> +  gcc_checking_assert (TMPL_ARGS_DEPTH (args) == 1);
> +
> +  if (processing_template_decl)
> +return;
> +
> +  auto

[committed] gcov: Fix "subcomand" typos [PR109297]

2023-03-27 Thread Jonathan Wakely via Gcc-patches

Committed as obvious.

-- >8 --

gcc/ChangeLog:

PR gcov-profile/109297
* gcov-tool.cc (merge_usage): Fix "subcomand" typo.
(merge_stream_usage): Likewise.
(overlap_usage): Likewise.
---
 gcc/gcov-tool.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/gcov-tool.cc b/gcc/gcov-tool.cc
index 88b7936a4f4..0140e7f2a4a 100644
--- a/gcc/gcov-tool.cc
+++ b/gcc/gcov-tool.cc
@@ -185,7 +185,7 @@ static const struct option merge_options[] =
 static void ATTRIBUTE_NORETURN
 merge_usage (void)
 {
-  fnotice (stderr, "Merge subcomand usage:");
+  fnotice (stderr, "Merge subcommand usage:");
   print_merge_usage_message (true);
   exit (FATAL_EXIT_CODE);
 }
@@ -255,7 +255,7 @@ static const struct option merge_stream_options[] =
 static void ATTRIBUTE_NORETURN
 merge_stream_usage (void)
 {
-  fnotice (stderr, "Merge-stream subcomand usage:");
+  fnotice (stderr, "Merge-stream subcommand usage:");
   print_merge_stream_usage_message (true);
   exit (FATAL_EXIT_CODE);
 }
@@ -507,7 +507,7 @@ static const struct option overlap_options[] =
 static void ATTRIBUTE_NORETURN
 overlap_usage (void)
 {
-  fnotice (stderr, "Overlap subcomand usage:");
+  fnotice (stderr, "Overlap subcommand usage:");
   print_overlap_usage_message (true);
   exit (FATAL_EXIT_CODE);
 }
-- 
2.39.2

[PATCH] tree-optimization/54498 - testcase for the bug

2023-03-27 Thread Richard Biener via Gcc-patches

I realized I never added a testcase for the fix of this bug.  Now done
after verifying it still fails when reverting the fix.

tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/54498
* g++.dg/torture/pr54498.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr54498.C | 57 ++
 1 file changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr54498.C

diff --git a/gcc/testsuite/g++.dg/torture/pr54498.C b/gcc/testsuite/g++.dg/torture/pr54498.C
new file mode 100644
index 000..74651f9063a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr54498.C
@@ -0,0 +1,57 @@
+// { dg-do run }
+// { dg-additional-options "-fno-tree-sra" }
+
+#include <complex>
+
+using namespace std;
+
+class bar_src {
+ public:
+  bar_src() : next(0) {}
+  virtual ~bar_src() { delete next; }
+
+  bar_src *next;
+};
+
+class foo_src : public bar_src {
+ public:
+  foo_src(double f, double fwidth, double s = 5.0);
+  virtual ~foo_src() {}
+
+ private:
+  double freq, width, peak_time, cutoff;
+};
+
+
+foo_src::foo_src(double f, double fwidth, double s) {
+  freq = f; width = 1/fwidth; cutoff = s*width; peak_time = cutoff;
+}
+
+complex<double> do_ft2(int i) __attribute__ ((noinline));
+
+complex<double> do_ft2(int i) {
+  return i == 0 ? complex<double>(-491.697,887.05) : complex<double>(-491.692,887.026);
+}
+
+void foo(void) {
+  complex<double> prev_ft = 0.0, ft = 0.0;
+  for (int i=0; i < 2; i++) {
+prev_ft = ft;
+{
+  foo_src src(1.0, 1.0 / 20);
+  ft = do_ft2(i);
+}
+if (i > 0)
+  {
+double a = abs(ft - prev_ft);
+if (a < 0.024 || a > 0.025)
+  __builtin_abort ();
+  }
+  }
+}
+
+int main()
+{
+  foo();
+  return 0;
+}
-- 
2.35.3

[PATCH] tree-optimization/108357 - add testcase

2023-03-27 Thread Richard Biener via Gcc-patches

The following adds the testcase for the bug which was recently
fixed.

Pushed.

PR tree-optimization/108357
* gcc.dg/tree-ssa/pr108357.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr108357.c | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr108357.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr108357.c b/gcc/testsuite/gcc.dg/tree-ssa/pr108357.c
new file mode 100644
index 000..44c457b7a97
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr108357.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-threadfull1" } */
+
+static char b;
+static unsigned c;
+void foo();
+short(a)(short d, short e) { return d * e; }
+static short f(short d) {
+  b = 0;
+  if ((d && 0 >= c < d) ^ d)
+;
+  else
+foo();
+  return d;
+}
+int main()
+{
+  short g = a(5, b ^ 9854);
+  f(g);
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "threadfull1" } } */
-- 
2.35.3

Re: [PATCH] target/109296 - riscv: Add missing mode specifiers for XTheadMemPair

2023-03-27 Thread Philipp Tomsich

Applied to master, thanks!
Philipp.

On Mon, 27 Mar 2023 at 19:55, Kito Cheng  wrote:
>
> OK for trunk, thanks :)
>
> On Mon, Mar 27, 2023 at 7:04 PM Christoph Muellner wrote:
>>
>> From: Christoph Müllner 
>>
>> This patch adds missing mode specifiers for XTheadMemPair INSNs.
>>
>> gcc/ChangeLog:
>> PR target/109296
>> * config/riscv/thead.md: Add missing mode specifiers.
>>
>> Signed-off-by: Christoph Müllner 
>> ---
>>  gcc/config/riscv/thead.md | 16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
>> index 63c4af6f77d..0623607d3dc 100644
>> --- a/gcc/config/riscv/thead.md
>> +++ b/gcc/config/riscv/thead.md
>> @@ -321,10 +321,10 @@ (define_insn "*th_mempair_store_2"
>>
>>  ;; MEMPAIR load DI extended signed SI
>>  (define_insn "*th_mempair_load_extendsidi2"
>> -  [(set (match_operand 0 "register_operand" "=r")
>> -   (sign_extend:DI (match_operand 1 "memory_operand" "m")))
>> -   (set (match_operand 2 "register_operand" "=r")
>> -   (sign_extend:DI (match_operand 3 "memory_operand" "m")))]
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> +   (sign_extend:DI (match_operand:SI 1 "memory_operand" "m")))
>> +   (set (match_operand:DI 2 "register_operand" "=r")
>> +   (sign_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
>>"TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
>> && th_mempair_operands_p (operands, true, SImode)"
>>{ return th_mempair_output_move (operands, true, SImode, SIGN_EXTEND); }
>> @@ -334,10 +334,10 @@ (define_insn "*th_mempair_load_extendsidi2"
>>
>>  ;; MEMPAIR load DI extended unsigned SI
>>  (define_insn "*th_mempair_load_zero_extendsidi2"
>> -  [(set (match_operand 0 "register_operand" "=r")
>> -   (zero_extend:DI (match_operand 1 "memory_operand" "m")))
>> -   (set (match_operand 2 "register_operand" "=r")
>> -   (zero_extend:DI (match_operand 3 "memory_operand" "m")))]
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> +   (zero_extend:DI (match_operand:SI 1 "memory_operand" "m")))
>> +   (set (match_operand:DI 2 "register_operand" "=r")
>> +   (zero_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
>>"TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
>> && th_mempair_operands_p (operands, true, SImode)"
>>{ return th_mempair_output_move (operands, true, SImode, ZERO_EXTEND); }
>> --
>> 2.39.2
>>

Re: [PATCH v2 0/2] Series of patch to fix PR106594

2023-03-27 Thread Richard Sandiford via Gcc-patches

Ping

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613640.html

Richard Sandiford  writes:
> This series of patches fixes PR106594, an aarch64 regression in which
> we fail to combine an extension into an address.  The first patch just
> refactors code.  The second patch contains the actual fix.
>
> The cover note for the second patch describes the problem and the fix.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard

Re: [og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH] libgomp, openmp: pinned memory)

2023-03-27 Thread Andrew Stubbs


On 27/03/2023 12:26, Thomas Schwinge wrote:
> Hi!
>
> On 2023-03-27T09:27:31+, "Stubbs, Andrew"  wrote:
>>> -Original Message-
>>> From: Thomas Schwinge
>>> Sent: 24 March 2023 15:50
>>>
>>> On 2022-01-04T15:32:17+, Andrew Stubbs wrote:
>>>> This patch implements the OpenMP pinned memory trait [...]
>>>
>>> I figure it may be helpful to document the current og12 state of affairs; does
>>> the attached "libgomp: Document OpenMP 'pinned' memory" look good to
>>> you?
>>
>> I don't really know what "allocated via the device" means?
>
> Heh, you're right.
>
>> I mean, I presume you mean "via CUDA", but I don't think this is obvious to
>> the average reader.
>> Maybe "allocation is optimized for the device" or some such thing?
>
> As we're in sections that are documenting GCN vs. nvptx specifics, we
> might indeed call out which exact interfaces we're using.
>
> How's the updated "libgomp: Document OpenMP 'pinned' memory", see
> attached?

LGTM, FWIW.

Andrew

Re: [PATCH] target/109296 - riscv: Add missing mode specifiers for XTheadMemPair

2023-03-27 Thread Kito Cheng via Gcc-patches

OK for trunk, thanks :)

On Mon, Mar 27, 2023 at 7:04 PM Christoph Muellner <
christoph.muell...@vrull.eu> wrote:

> From: Christoph Müllner 
>
> This patch adds missing mode specifiers for XTheadMemPair INSNs.
>
> gcc/ChangeLog:
> PR target/109296
> * config/riscv/thead.md: Add missing mode specifiers.
>
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/config/riscv/thead.md | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
> index 63c4af6f77d..0623607d3dc 100644
> --- a/gcc/config/riscv/thead.md
> +++ b/gcc/config/riscv/thead.md
> @@ -321,10 +321,10 @@ (define_insn "*th_mempair_store_2"
>
>  ;; MEMPAIR load DI extended signed SI
>  (define_insn "*th_mempair_load_extendsidi2"
> -  [(set (match_operand 0 "register_operand" "=r")
> -   (sign_extend:DI (match_operand 1 "memory_operand" "m")))
> -   (set (match_operand 2 "register_operand" "=r")
> -   (sign_extend:DI (match_operand 3 "memory_operand" "m")))]
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +   (sign_extend:DI (match_operand:SI 1 "memory_operand" "m")))
> +   (set (match_operand:DI 2 "register_operand" "=r")
> +   (sign_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
>"TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
> && th_mempair_operands_p (operands, true, SImode)"
>{ return th_mempair_output_move (operands, true, SImode, SIGN_EXTEND); }
> @@ -334,10 +334,10 @@ (define_insn "*th_mempair_load_extendsidi2"
>
>  ;; MEMPAIR load DI extended unsigned SI
>  (define_insn "*th_mempair_load_zero_extendsidi2"
> -  [(set (match_operand 0 "register_operand" "=r")
> -   (zero_extend:DI (match_operand 1 "memory_operand" "m")))
> -   (set (match_operand 2 "register_operand" "=r")
> -   (zero_extend:DI (match_operand 3 "memory_operand" "m")))]
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +   (zero_extend:DI (match_operand:SI 1 "memory_operand" "m")))
> +   (set (match_operand:DI 2 "register_operand" "=r")
> +   (zero_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
>"TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
> && th_mempair_operands_p (operands, true, SImode)"
>{ return th_mempair_output_move (operands, true, SImode, ZERO_EXTEND); }
> --
> 2.39.2
>
>

[og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH] libgomp, openmp: pinned memory)

2023-03-27 Thread Thomas Schwinge

Hi!

On 2023-03-27T09:27:31+, "Stubbs, Andrew"  wrote:
>> -Original Message-
>> From: Thomas Schwinge 
>> Sent: 24 March 2023 15:50
>>
>> On 2022-01-04T15:32:17+, Andrew Stubbs 
>> wrote:
>> > This patch implements the OpenMP pinned memory trait [...]
>>
>> I figure it may be helpful to document the current og12 state of affairs; does
>> the attached "libgomp: Document OpenMP 'pinned' memory" look good to
>> you?
>
> I don't really know what "allocated via the device" means?

Heh, you're right.

> I mean, I presume you mean "via CUDA", but I don't think this is obvious to
> the average reader.
> Maybe "allocation is optimized for the device" or some such thing?

As we're in sections that are documenting GCN vs. nvptx specifics, we
might indeed call out which exact interfaces we're using.

How's the updated "libgomp: Document OpenMP 'pinned' memory", see
attached?


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Registration court:
München, HRB 106955
From 03e09ad4e0b4cd2232e8bb036dd2562b18ea2686 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 24 Mar 2023 15:14:57 +0100
Subject: [PATCH] libgomp: Document OpenMP 'pinned' memory

	libgomp/
	* libgomp.texi (AMD Radeon, nvptx): Document OpenMP 'pinned'
	memory.
---
 libgomp/libgomp.texi | 8 
 1 file changed, 8 insertions(+)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 288e0b3a8ea..6355ce2a37b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4456,6 +4456,9 @@ The implementation remark:
 @item OpenMP code that has a requires directive with @code{unified_address} or
   @code{unified_shared_memory} will remove any GCN device from the list of
   available devices (``host fallback'').
+@item OpenMP @emph{pinned} memory (@code{omp_atk_pinned},
+  @code{ompx_pinned_mem_alloc}, for example)
+  is allocated via @code{mmap}, @code{mlock}.
 @end itemize
 
 
@@ -4518,6 +4521,11 @@ The implementation remark:
 @item OpenMP code that has a requires directive with @code{unified_address}
   or @code{unified_shared_memory} will remove any nvptx device from the
   list of available devices (``host fallback'').
+@item OpenMP @emph{pinned} memory (@code{omp_atk_pinned},
+  @code{ompx_pinned_mem_alloc}, for example)
+  is allocated via @code{cuMemHostAlloc} (CUDA Driver API).
+  This potentially helps optimization of host <-> device data
+  transfers.
 @end itemize
 
 
-- 
2.25.1
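
For readers unfamiliar with the mmap/mlock route mentioned in the GCN item, here is a minimal sketch of page-locked ("pinned") host allocation using only POSIX calls. It is not libgomp's code; the helper names alloc_pinned and free_pinned are hypothetical, and MAP_ANONYMOUS is assumed to be available (Linux, BSDs).

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper: returns LEN bytes of page-locked memory,
// or nullptr if either the mapping or the lock fails.
void* alloc_pinned(std::size_t len) {
  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return nullptr;
  // mlock can fail, for instance when RLIMIT_MEMLOCK is exceeded.
  if (mlock(p, len) != 0) {
    munmap(p, len);
    return nullptr;
  }
  return p;
}

// Hypothetical helper: releases memory obtained from alloc_pinned.
void free_pinned(void* p, std::size_t len) {
  if (p != nullptr)
    munmap(p, len);  // unmapping the range also removes the lock
}
```

A caller has to be prepared for a nullptr result and fall back to ordinary memory, since locking limits vary between systems.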

Re: [PATCH] Modula-2: fix documentation layout

2023-03-27 Thread Eric Botcazou via Gcc-patches

Hi Gaius,

> yes indeed and thanks for the patch!

You're welcome.  The documentation was slightly broken again in the meantime, 
but nothing really serious this time.

Again tested with a modern and an old version of Makeinfo.  OK for mainline?


2023-03-27  Eric Botcazou  

* doc/gm2.texi: Add missing Next, Previous and Top fields to most
top-level sections.

-- 
Eric Botcazou
diff --git a/gcc/doc/gm2.texi b/gcc/doc/gm2.texi
index c08bb89ac68..db35f6f7e93 100644
--- a/gcc/doc/gm2.texi
+++ b/gcc/doc/gm2.texi
@@ -2761,7 +2761,7 @@ The mailing list contents can be viewed
 These exist and can be found on the frontends web page on the
 @uref{http://gcc.gnu.org/frontends.html, gcc web site}.
 
-@node License, , ,
+@node License, Copying, Using, Top
 @section License of GNU Modula-2
 
 GNU Modula-2 is free software, the compiler is held under the GPL v3
@@ -2781,10 +2781,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 More information on how these licenses work is available
 @uref{http://www.gnu.org/licenses/licenses.html} on the GNU web site.
 
-@node Copying, , ,
+@node Copying, Contributing, License, Top
 @include gpl_v3_without_node.texi
 
-@node Contributing, , ,
+@node Contributing, EBNF, Copying, Top
 @section Contributing to GNU Modula-2
 
 Please do and please read the GNU Emacs info under
@@ -2808,7 +2808,7 @@ Many thanks and enjoy your coding!
 @c This section is still being written.
 @c @include gm2-internals.texi
 
-@node EBNF, , ,
+@node EBNF, Libraries, Contributing, Top
 @chapter EBNF of GNU Modula-2
 
 This chapter contains the EBNF of GNU Modula-2.  This grammar currently
@@ -2822,14 +2822,14 @@ phase.
 
 @include m2/gm2-ebnf.texi
 
-@node Libraries, , ,
+@node Libraries, Indices, EBNF, Top
 @chapter PIM and ISO library definitions
 
 This chapter contains M2F, PIM and ISO libraries.
 
 @include m2/gm2-libs.texi
 
-@node Indices, , ,
+@node Indices, , Libraries, Top
 @section Indices
 
 @ifhtml

[PATCH] target/109296 - riscv: Add missing mode specifiers for XTheadMemPair

2023-03-27 Thread Christoph Muellner

From: Christoph Müllner 

This patch adds missing mode specifiers for XTheadMemPair INSNs.

gcc/ChangeLog:
PR target/109296
* config/riscv/thead.md: Add missing mode specifiers.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/thead.md | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index 63c4af6f77d..0623607d3dc 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -321,10 +321,10 @@ (define_insn "*th_mempair_store_2"
 
 ;; MEMPAIR load DI extended signed SI
 (define_insn "*th_mempair_load_extendsidi2"
-  [(set (match_operand 0 "register_operand" "=r")
-   (sign_extend:DI (match_operand 1 "memory_operand" "m")))
-   (set (match_operand 2 "register_operand" "=r")
-   (sign_extend:DI (match_operand 3 "memory_operand" "m")))]
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI (match_operand:SI 1 "memory_operand" "m")))
+   (set (match_operand:DI 2 "register_operand" "=r")
+   (sign_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
   "TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
&& th_mempair_operands_p (operands, true, SImode)"
   { return th_mempair_output_move (operands, true, SImode, SIGN_EXTEND); }
@@ -334,10 +334,10 @@ (define_insn "*th_mempair_load_extendsidi2"
 
 ;; MEMPAIR load DI extended unsigned SI
 (define_insn "*th_mempair_load_zero_extendsidi2"
-  [(set (match_operand 0 "register_operand" "=r")
-   (zero_extend:DI (match_operand 1 "memory_operand" "m")))
-   (set (match_operand 2 "register_operand" "=r")
-   (zero_extend:DI (match_operand 3 "memory_operand" "m")))]
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI (match_operand:SI 1 "memory_operand" "m")))
+   (set (match_operand:DI 2 "register_operand" "=r")
+   (zero_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
   "TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
&& th_mempair_operands_p (operands, true, SImode)"
   { return th_mempair_output_move (operands, true, SImode, ZERO_EXTEND); }
-- 
2.39.2

[PATCH] Changed vector size

2023-03-27 Thread chenyixuan

From: Yixuan Chen 

Observed one vint-type "ABS_EXPR" followed by three extra int-type "ABS_EXPR"s.
To test the absolute-value optimization for vectors, there is probably no need
for four of them.

gcc/testsuite/ChangeLog:

2023-03-27  Yixuan Chen  

* g++.dg/pr94920.C: Declare the vector size as the size of one int.

---
 gcc/testsuite/g++.dg/pr94920.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/pr94920.C b/gcc/testsuite/g++.dg/pr94920.C
index 126b00478d2..498bef93b3a 100644
--- a/gcc/testsuite/g++.dg/pr94920.C
+++ b/gcc/testsuite/g++.dg/pr94920.C
@@ -2,7 +2,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
 
-typedef int __attribute__((vector_size(4*sizeof(int)))) vint;
+typedef int __attribute__((vector_size(sizeof(int)))) vint;
 
 /* Same form as PR.  */
 __attribute__((noipa)) unsigned int foo(int x) {
-- 
2.40.0

Re: [PATCH] [rs6000] Correct match pattern in pr56605.c

2023-03-27 Thread Kewen.Lin via Gcc-patches

Hi Haochen,

on 2023/3/27 17:46, HAO CHEN GUI wrote:
> Kewen,
>   The case still fails with trunk.
> 

OK, thanks for checking.  The proposed patch catches the expected pattern
accurately (excluding noise), so it is okay for trunk and branches, thanks!

BR,
Kewen

> FAIL: gcc.target/powerpc/pr56605.c scan-rtl-dump-times combine "\\(compare:CC 
> \\((?:and|zero_extend):(?:[SD]I) \\((?:sub)?reg:[SD]I" 1
> 
> === gcc Summary ===
> 
> # of expected passes            1
> # of unexpected failures        1
> 
>   With the trunk, it should match the pattern.
> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
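
A minimal sketch of why a tighter pattern changes the count that scan-rtl-dump-times checks. The first sample line is taken from the message above; the second is made up for illustration and does not come from a real dump. std::regex (ECMAScript syntax) stands in here for the Tcl regexps used by the testsuite, on the assumption that both accept the constructs in these two patterns; count_matches, sample_dump and the k... names are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Old and new dg-final patterns from the patch, written as raw strings.
const char* const kOldPattern =
    R"pat(\(compare:CC \((?:and|zero_extend):(?:[SD]I) \((?:sub)?reg:[SD]I)pat";
const char* const kNewPattern =
    R"pat(\(compare:CC \(and:SI \(subreg:SI \(reg:DI)pat";

// Count non-overlapping matches of PATTERN in TEXT.
std::size_t count_matches(const std::string& text, const std::string& pattern) {
  const std::regex re(pattern);
  return static_cast<std::size_t>(
      std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                    std::sregex_iterator()));
}

// Two-line sample: the expected insn plus one invented "noise" line.
std::string sample_dump() {
  return "(compare:CC (and:SI (subreg:SI (reg:DI 207) 0)\n"
         "(compare:CC (zero_extend:DI (reg:SI 130))\n";  // invented line
}
```

On this sample the old pattern matches both lines while the new one matches only the first, which is the kind of difference a "times 1" check is sensitive to.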

Re: [PATCH] driver: Treat include path args the same way between cpp_unique_options and asm_options. [PR71850]

2023-03-27 Thread Costas Argyris via Gcc-patches

Would it be possible to make it version-dependent, then?

As in, if GNU assembler is greater than or equal to the version that
supports @FILE, then pass @FILE to it, otherwise fall back to
the current behavior.

I assume most people nowadays would have a version of
Binutils later than 2005, but if we could make it conditional on
the version then even those with earlier version wouldn't break,
they would just get the current behavior.
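
A minimal sketch of the version gate described above, assuming the assembler version is available as a dotted string such as "2.13.1". The function names are hypothetical, and the threshold is left as a parameter because the thread does not name the first Binutils release with @FILE support.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split "2.13.1" into {2, 13, 1}.  Assumes purely numeric fields;
// std::stoi throws on anything else.
std::vector<int> parse_version(const std::string& text) {
  std::vector<int> parts;
  std::istringstream in(text);
  std::string field;
  while (std::getline(in, field, '.'))
    parts.push_back(field.empty() ? 0 : std::stoi(field));
  return parts;
}

// True if HAVE is greater than or equal to NEED, comparing field by field
// and treating missing trailing fields as zero.
bool version_at_least(const std::string& have, const std::string& need) {
  std::vector<int> a = parse_version(have);
  std::vector<int> b = parse_version(need);
  const std::size_t n = std::max(a.size(), b.size());
  a.resize(n, 0);
  b.resize(n, 0);
  return !std::lexicographical_compare(a.begin(), a.end(),
                                       b.begin(), b.end());
}
```

With such a check the driver could pass @FILE when version_at_least(detected, threshold) holds and keep the current behavior otherwise.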

On Mon, 27 Mar 2023 at 11:00, Xi Ruoyao  wrote:

> On Mon, 2023-03-27 at 10:36 +0100, Costas Argyris via Gcc-patches wrote:
> > [ping^3]
> >
> > This looks like it fixes the bug and also unifies the way include paths are
> > passed from the driver to the compiler and assembler (when a @file has
> > been passed to the driver in the first place).
> >
> > That is, when @file has been passed to the driver, put the include paths
> > in a temp @file and pass them to the assembler.  Note this is already
> > happening for the compiler, so this patch merely extends this logic to the
> > assembler.
> >
> > Is there any reason not to go for it?
>
> It's not supported by all GNU assembler releases.  For example, GCC
> installation doc says we require Binutils >= 2.13.1 for i?86-*-linux*.
> Binutils 2.13.1 was released in 2002, but @FILE support was added into
> Binutils in 2005.
> > >
>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University
>

Re: [PATCH] driver: Treat include path args the same way between cpp_unique_options and asm_options. [PR71850]

2023-03-27 Thread Xi Ruoyao via Gcc-patches

On Mon, 2023-03-27 at 10:36 +0100, Costas Argyris via Gcc-patches wrote:
> [ping^3]
> 
> This looks like it fixes the bug and also unifies the way include paths are
> passed from the driver to the compiler and assembler (when a @file has
> been passed to the driver in the first place).
> 
> That is, when @file has been passed to the driver, put the include paths
> in a temp @file and pass them to the assembler.    Note this is already
> happening for the compiler, so this patch merely extends this logic to the
> assembler.
> 
> Is there any reason not to go for it?

It's not supported by all GNU assembler releases.  For example, GCC
installation doc says we require Binutils >= 2.13.1 for i?86-*-linux*.
Binutils 2.13.1 was released in 2002, but @FILE support was added into
Binutils in 2005.
> > 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] [rs6000] Correct match pattern in pr56605.c

2023-03-27 Thread HAO CHEN GUI via Gcc-patches

Kewen,
  The case still fails with trunk.

FAIL: gcc.target/powerpc/pr56605.c scan-rtl-dump-times combine "\\(compare:CC 
\\((?:and|zero_extend):(?:[SD]I) \\((?:sub)?reg:[SD]I" 1

=== gcc Summary ===

# of expected passes            1
# of unexpected failures        1

  With the trunk, it should match the pattern.
(compare:CC (and:SI (subreg:SI (reg:DI 207) 0)

Thanks
Gui Haochen


On 2023/3/27 15:41, Kewen.Lin wrote:
> Hi Alexandre and Haochen,
> 
> on 2023/3/25 16:42, Alexandre Oliva via Gcc-patches wrote:
>>
>> Ping https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590958.html
>>
>> From: Haochen Gui 
>>
>> This patch corrects the match pattern in pr56605.c. The former pattern
>> is wrong and test case fails with GCC11. It should match following
>> insn on each subtarget after mode promotion is disabled. The patch
>> need to be backported to GCC11.
> 
> Comment https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146#c21 made me
> feel that this test issue was just in branches, but this proposed patch
> seems to say it still exists on trunk, could you confirm that?
> 
> BR,
> Kewen
> 
>>
>> //gimple
>> _17 = (unsigned int) _20;
>>  prolog_loop_niters.4_23 = _17 & 3;
>>
>> //rtl
>> (insn 19 18 20 2 (parallel [
>> (set (reg:CC 208)
>> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
>> (const_int 3 [0x3]))
>> (const_int 0 [0])))
>> (set (reg:SI 129 [ prolog_loop_niters.5 ])
>> (and:SI (subreg:SI (reg:DI 207) 0)
>> (const_int 3 [0x3])))
>> ]) 197 {*andsi3_imm_mask_dot2}
>>
>> Rebased.  Regstrapped on ppc64-linux-gnu.  Also tested with
>> ppc64-vxworks7r2 (gcc-12), where it's also needed.  Ok to install?
>>
>>
>> for  gcc/testsuite/ChangeLog
>>
>>  PR target/102146
>>  * gcc.target/powerpc/pr56605.c: Correct match pattern in
>>  combine pass.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/pr56605.c |3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> index 7695f87db6f66..651a88e3cc7f9 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
>> @@ -11,5 +11,4 @@ void foo (short* __restrict sb, int* __restrict ia)
>>  ia[i] = (int) sb[i];
>>  }
>>  
>> -/* { dg-final { scan-rtl-dump-times {\(compare:CC 
>> \((?:and|zero_extend):(?:[SD]I) \((?:sub)?reg:[SD]I} 1 "combine" } } */
>> -
>> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI 
>> \(reg:DI} 1 "combine" } } */
>

Re: [PATCH] driver: Treat include path args the same way between cpp_unique_options and asm_options. [PR71850]

2023-03-27 Thread Costas Argyris via Gcc-patches

[ping^3]

This looks like it fixes the bug and also unifies the way include paths are
passed from the driver to the compiler and assembler (when a @file has
been passed to the driver in the first place).

That is, when @file has been passed to the driver, put the include paths
in a temp @file and pass them to the assembler.  Note this is already
happening for the compiler, so this patch merely extends this logic to the
assembler.

Is there any reason not to go for it?

On Mon, 20 Mar 2023 at 09:47, Costas Argyris 
wrote:

> ping
>
> On Thu, 9 Mar 2023 at 13:39, Costas Argyris 
> wrote:
>
>> Pinging list and driver reviewer.
>>
>> Details here:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71850
>>
>> On Thu, 2 Mar 2023 at 19:25, Costas Argyris 
>> wrote:
>>
>>> This is a proposal to fix PR71850 by applying the existing logic for
>>> passing include paths to cc1 to as.
>>>
>>> Thanks,
>>> Costas
>>>
>>

Re: [PATCH] libstdc++: Fix up experimental/net/timer/waitable/dest.cc testcase

2023-03-27 Thread Jonathan Wakely via Gcc-patches

On Monday, March 27, 2023, Jakub Jelinek via Libstdc++ <
libstd...@gcc.gnu.org> wrote:
> Hi!
>
> In Fedora package build I've noticed a failure
>
> /builddir/build/BUILD/gcc-13.0.1-20230324/libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc: In function 'void test01()':
> /builddir/build/BUILD/gcc-13.0.1-20230324/libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc:41: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'unsigned int' [-Wformat=]
> FAIL: experimental/net/timer/waitable/dest.cc (test for excess errors)
> Excess errors:
> /builddir/build/BUILD/gcc-13.0.1-20230324/libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc:41: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'unsigned int' [-Wformat=]
> because we build with -Wformat.
>
> The test uses %lu for size_t argument, which can be anything from unsigned
> int to unsigned long long.  As for printf I'm not sure we can use %zu
> portably and given the n == 1 assertion, I think the options are to kill
> the printf, or cast to long.
>
> Ok for trunk?


Based on the use of __builtin_printf instead of including  and
doing it properly, I suspect I didn't mean to leave that print enabled, and
should have removed it before committing. But this fix is fine, OK for
trunk, thanks!

>
> 2023-03-27  Jakub Jelinek  
>
> * testsuite/experimental/net/timer/waitable/dest.cc: Avoid -Wformat
> warning if size_t is not unsigned long.
>
> --- libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc.jj 2023-01-16 11:52:17.394714745 +0100
> +++ libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc 2023-03-25 14:35:49.046639413 +0100
> @@ -38,7 +38,7 @@ test01()
>  timer.async_wait([](std::error_code e) { ec = e; });
>}
>auto n = ctx.run();
> -  __builtin_printf("ran %lu\n", n);
> +  __builtin_printf("ran %lu\n", long(n));
>VERIFY( n == 1 );
>VERIFY( ec == std::errc::operation_canceled );
>  }
>
>
> Jakub
>
>
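
A minimal sketch of the cast approach discussed above: since size_t may be unsigned int, unsigned long or unsigned long long depending on the target, converting the value to one fixed type makes the argument agree with the conversion specifier everywhere. This is a variant for illustration, not the committed change; it casts to unsigned long to pair with %lu, and format_count is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format a count portably: the cast fixes the argument type so that
// "%lu" is correct regardless of how the target defines size_t.
std::string format_count(std::size_t n) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "ran %lu", static_cast<unsigned long>(n));
  return std::string(buf);
}
```

The cast can truncate on targets where size_t is wider than unsigned long, which is harmless for a test that asserts n == 1.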

Re: [RFC/RFT,V2] CFI: Add support for gcc CFI in aarch64

2023-03-27 Thread Peter Zijlstra

On Sat, Mar 25, 2023 at 01:54:16AM -0700, Dan Li wrote:

> In the compiler part[4], most of the content is the same as Sami's
> implementation[3], except for some minor differences, mainly including:
> 
> 1. The function typeid is calculated differently and it is difficult
> to be consistent.

This means there is an effective ABI break between the compilers, which
is sad :-( Is there really nothing to be done about this?

RE: [og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH] libgomp, openmp: pinned memory

2023-03-27 Thread Stubbs, Andrew via Gcc-patches

> -----Original Message-----
> From: Thomas Schwinge 
> Sent: 24 March 2023 15:50
> To: gcc-patches@gcc.gnu.org; Andrew Stubbs ;
> Tobias Burnus 
> Subject: [og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH]
> libgomp, openmp: pinned memory
> 
> Hi!
> 
> On 2022-01-04T15:32:17+, Andrew Stubbs 
> wrote:
> > This patch implements the OpenMP pinned memory trait [...]
> 
> I figure it may be helpful to document the current og12 state of affairs; does
> the attached "libgomp: Document OpenMP 'pinned' memory" look good to
> you?

I don't really know what "allocated via the device" means? I mean, I presume 
you mean "via CUDA", but I don't think this is obvious to the average reader.

Maybe "allocation is optimized for the device" or some such thing?

Andrew

RE: [PATCH] aarch64: update ampere1 vectorization cost

2023-03-27 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Philipp Tomsich 
> Sent: Monday, March 27, 2023 9:50 AM
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> ; Tamar Christina
> ; Manolis Tsamis 
> Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> 
> On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov 
> wrote:
> >
> > Hi Philipp,
> >
> > > -Original Message-
> > > From: Gcc-patches  > > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > > Tomsich
> > > Sent: Monday, March 27, 2023 8:47 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Sandiford ; Tamar Christina
> > > ; Philipp Tomsich
> ;
> > > Manolis Tsamis 
> > > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> > >
> > > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > > prior to exhaustive testing of vectorizable workloads against
> > > hardware.
> > >
> > > Adjust the vector costs to achieve the best results and more closely
> > > match the underlying hardware.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> > >
> > > Co-Authored-By: Manolis Tsamis 
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > ---
> > > We would like to get this into GCC 13 to avoid having to backport at
> > > the start of the next cycle.
> > >
> >
> > Given this affects only the ampere1 costs that sounds fine to me and fairly
> > low risk, you are being trusted that these costs are actually desirable and
> > properly validated on the hardware involved.
> >
> > > OK for backports?
> >
> > This is ok for trunk (GCC 13). Do you also want to backport this to other
> > branches?
> 
> Ampere1 (with the older vector costs) are in GCC12 and GCC11.
> I would like to backport to those as well.

Ok then, though you may want to run the benchmarks on the branches too, to 
make sure the costs give the expected benefit there as well.
Thanks,
Kyrill

> 
> Thanks,
> Philipp.
> 
> > Thanks,
> > Kyrill
> >
> > >
> > >  gcc/config/aarch64/aarch64.cc | 12 ++--
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index b27f4354031..661fff65cea 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > > thunderx3t110_vector_cost =
> > >
> > >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> > >  {
> > > -  3, /* int_stmt_cost  */
> > > +  1, /* int_stmt_cost  */
> > >3, /* fp_stmt_cost  */
> > >0, /* ld2_st2_permute_cost  */
> > >0, /* ld3_st3_permute_cost  */
> > > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > > ampere1_advsimd_vector_cost =
> > >8, /* store_elt_extra_cost  */
> > >6, /* vec_to_scalar_cost  */
> > >7, /* scalar_to_vec_cost  */
> > > -  5, /* align_load_cost  */
> > > -  5, /* unalign_load_cost  */
> > > -  2, /* unalign_store_cost  */
> > > -  2  /* store_cost  */
> > > +  4, /* align_load_cost  */
> > > +  4, /* unalign_load_cost  */
> > > +  1, /* unalign_store_cost  */
> > > +  1  /* store_cost  */
> > >  };
> > >
> > >  /* Ampere-1 costs for vector insn classes.  */
> > >  static const struct cpu_vector_cost ampere1_vector_cost =
> > >  {
> > >1, /* scalar_int_stmt_cost  */
> > > -  1, /* scalar_fp_stmt_cost  */
> > > +  3, /* scalar_fp_stmt_cost  */
> > >4, /* scalar_load_cost  */
> > >1, /* scalar_store_cost  */
> > >1, /* cond_taken_branch_cost  */
> > > --
> > > 2.34.1
> >

Re: [PATCH] aarch64: update ampere1 vectorization cost

2023-03-27 Thread Philipp Tomsich

Applied to master, thanks!
Philipp.

On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov  wrote:
>
> Hi Philipp,
>
> > -Original Message-
> > From: Gcc-patches  > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > Tomsich
> > Sent: Monday, March 27, 2023 8:47 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Tamar Christina
> > ; Philipp Tomsich ;
> > Manolis Tsamis 
> > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > prior to exhaustive testing of vectorizable workloads against
> > hardware.
> >
> > Adjust the vector costs to achieve the best results and more closely
> > match the underlying hardware.
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> >
> > Co-Authored-By: Manolis Tsamis 
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> > We would like to get this into GCC 13 to avoid having to backport at
> > the start of the next cycle.
> >
>
> Given this affects only the ampere1 costs that sounds fine to me and fairly 
> low risk, you are being trusted that these costs are actually desirable and 
> properly validated on the hardware involved.
>
> > OK for backports?
>
> This is ok for trunk (GCC 13). Do you also want to backport this to other 
> branches?
> Thanks,
> Kyrill
>
> >
> >  gcc/config/aarch64/aarch64.cc | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index b27f4354031..661fff65cea 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > thunderx3t110_vector_cost =
> >
> >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> >  {
> > -  3, /* int_stmt_cost  */
> > +  1, /* int_stmt_cost  */
> >3, /* fp_stmt_cost  */
> >0, /* ld2_st2_permute_cost  */
> >0, /* ld3_st3_permute_cost  */
> > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > ampere1_advsimd_vector_cost =
> >8, /* store_elt_extra_cost  */
> >6, /* vec_to_scalar_cost  */
> >7, /* scalar_to_vec_cost  */
> > -  5, /* align_load_cost  */
> > -  5, /* unalign_load_cost  */
> > -  2, /* unalign_store_cost  */
> > -  2  /* store_cost  */
> > +  4, /* align_load_cost  */
> > +  4, /* unalign_load_cost  */
> > +  1, /* unalign_store_cost  */
> > +  1  /* store_cost  */
> >  };
> >
> >  /* Ampere-1 costs for vector insn classes.  */
> >  static const struct cpu_vector_cost ampere1_vector_cost =
> >  {
> >1, /* scalar_int_stmt_cost  */
> > -  1, /* scalar_fp_stmt_cost  */
> > +  3, /* scalar_fp_stmt_cost  */
> >4, /* scalar_load_cost  */
> >1, /* scalar_store_cost  */
> >1, /* cond_taken_branch_cost  */
> > --
> > 2.34.1
>

Re: Should -ffp-contract=off the default on GCC?

2023-03-27 Thread Zeson via Gcc-patches

Is there any update on this thread?  The discussion strayed toward the option 
documentation and user-friendliness.
So does the default value of -ffp-contract=fast conform to the C/C++ language 
standards?  If it does, why does clang not do the same?  Or is this simply 
implementation-dependent behavior that the standards do not specify?


Regards,
Zeson
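
As background for the question, a minimal sketch of why contraction is observable at all: a fused multiply-add rounds once, while a separate multiply and add round twice, so the two can give different results. The function names are hypothetical, and the volatile temporary is only there to keep a compiler from fusing the first version; the sketch makes no claim about what either compiler does by default.

```cpp
#include <cmath>

// Multiply, round the product to double, then add (two roundings).
double mul_then_add(double a, double b, double c) {
  volatile double product = a * b;  // forces the intermediate rounding
  return product + c;
}

// Fused multiply-add: a * b + c computed with a single rounding.
double fused(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

For example, with a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product 1 - 2^-54 rounds to 1.0 in double, so the unfused form yields 0 while the fused form yields -2^-54.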

[RFC PATCH] ipa-visibility: Fix ICE in lto-partition caused by incorrect comdat group solving in ipa-visibility

2023-03-27 Thread Xionghu Luo via Gcc-patches

I have a case that ICEs at lto-partition.c:158 and is not easy to reduce; this
ICE was reported some time ago at:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586290.html.
I tried the patch proposed there, but it doesn't work for me.

I then did some hacking and finally got a successful LTO link to compare
with the ICEing LTO link.  It seems to me that the ICE in
add_symbol_to_partition_1, at lto/lto-partition.c:158
is caused by the comdat_group_list not being dissolved correctly in
update_visibility_by_resolution_info.

The ICEing node is a PREEMPTED_REG '__dt_del' function whose same_comdat_group
links to the '__dt_base' function, followed by the '__dt_comp' function.

The successful resolution is:

2420 f75f1945 PREVAILING_DEF 
_ZN6google8protobuf8internal16FunctionClosure1IPKNS0_15FieldDescriptorEED2Ev/81027
2422 f75f1945 PREVAILING_DEF 
_ZN6google8protobuf8internal16FunctionClosure1IPKNS0_15FieldDescriptorEED1Ev/81028
2424 f75f1945 PREEMPTED_REG 
_ZN6google8protobuf8internal16FunctionClosure1IPKNS0_15FieldDescriptorEED0Ev/81029

with FOR_EACH_FUNCTION access order:
81029(__dt_del) -> 81028(__dt_comp) -> 81027(__dt_base)

81029 is accessed first and is marked externally_visible = false;
accessing 81028 then removes all the same_comdat_groups as expected.

The ICEing resolution is:

2362 f75f1945 PREEMPTED_REG 
_ZN6google8protobuf8internal16FunctionClosure1IPKNS0_15FieldDescriptorEED0Ev/81029
2365 f75f1945 PREVAILING_DEF 
_ZN6google8protobuf8internal16FunctionClosure1IPKNS0_15FieldDescriptorEED2Ev/81027
2367 f75f1945 PREVAILING_DEF 
_ZN6google8protobuf8internal16FunctionClosure1IPKNS0_15FieldDescriptorEED1Ev/81028

with FOR_EACH_FUNCTION access order:
81028(__dt_comp) -> 81027(__dt_base) -> 81029(__dt_del)

81028 is accessed first, and node 81029's externally_visible is still
true; when update_visibility_by_resolution_info is called for 81028,
it returns early because 'same_def' is false and therefore fails to dissolve
the same_comdat_group list.
So the point here is that if the PREEMPTED_REG node is not accessed first, its
externally_visible variable is not yet updated when the PREVAILING_DEF nodes
are accessed.
This patch uses a *function check* instead of a *variable check* to eliminate
the influence of the resolution order.
I am not sure whether this patch also fixes Sandra's ICE.

gcc/ChangeLog:

* ipa-visibility.cc (update_visibility_by_resolution_info):
New parameter whole_program.  Check the node's externally_visible
state with a function instead of the variable.
(function_and_variable_visibility): Pass whole_program.

Signed-off-by: Xionghu Luo 
---
 gcc/ipa-visibility.cc | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc
index 8ec82bb333e..1ebc584ffd9 100644
--- a/gcc/ipa-visibility.cc
+++ b/gcc/ipa-visibility.cc
@@ -393,7 +393,7 @@ update_vtable_references (tree *tp, int *walk_subtrees,
resolution info.  */
 
 static void
-update_visibility_by_resolution_info (symtab_node * node)
+update_visibility_by_resolution_info (symtab_node * node, bool whole_program)
 {
   bool define;
 
@@ -412,7 +412,12 @@ update_visibility_by_resolution_info (symtab_node * node)
 for (symtab_node *next = node->same_comdat_group;
 next != node; next = next->same_comdat_group)
   {
-   if (!next->externally_visible || next->transparent_alias)
+   if ((is_a<varpool_node *> (next)
+&& !dyn_cast<varpool_node *> (next)->externally_visible_p ())
+   || (is_a<cgraph_node *> (next)
+   && !cgraph_externally_visible_p (dyn_cast<cgraph_node *> (next),
+whole_program))
+   || next->transparent_alias)
  continue;
 
bool same_def
@@ -750,7 +755,7 @@ function_and_variable_visibility (bool whole_program)
DECL_EXTERNAL (node->decl) = 1;
}
 
-  update_visibility_by_resolution_info (node);
+  update_visibility_by_resolution_info (node, whole_program);
   if (node->weakref)
optimize_weakref (node);
 }
@@ -842,7 +847,7 @@ function_and_variable_visibility (bool whole_program)
  && !DECL_EXTERNAL (vnode->decl))
localize_node (whole_program, vnode);
 
-  update_visibility_by_resolution_info (vnode);
+  update_visibility_by_resolution_info (vnode, whole_program);
 
   /* Update virtual tables to point to local aliases where possible.  */
   if (DECL_VIRTUAL_P (vnode->decl)
-- 
2.27.0

Re: [PATCH] aarch64: update ampere1 vectorization cost

2023-03-27 Thread Philipp Tomsich

On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov  wrote:
>
> Hi Philipp,
>
> > -Original Message-
> > From: Gcc-patches  > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > Tomsich
> > Sent: Monday, March 27, 2023 8:47 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Tamar Christina
> > ; Philipp Tomsich ;
> > Manolis Tsamis 
> > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > prior to exhaustive testing of vectorizable workloads against
> > hardware.
> >
> > Adjust the vector costs to achieve the best results and more closely
> > match the underlying hardware.
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> >
> > Co-Authored-By: Manolis Tsamis 
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> > We would like to get this into GCC 13 to avoid having to backport at
> > the start of the next cycle.
> >
>
> Given this affects only the ampere1 costs that sounds fine to me and fairly 
> low risk, you are being trusted that these costs are actually desirable and 
> properly validated on the hardware involved.
>
> > OK for backports?
>
> This is ok for trunk (GCC 13). Do you also want to backport this to other 
> branches?

Ampere1 (with the older vector costs) is in GCC12 and GCC11.
I would like to backport to those as well.

Thanks,
Philipp.

> Thanks,
> Kyrill
>
> >
> >  gcc/config/aarch64/aarch64.cc | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index b27f4354031..661fff65cea 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > thunderx3t110_vector_cost =
> >
> >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> >  {
> > -  3, /* int_stmt_cost  */
> > +  1, /* int_stmt_cost  */
> >3, /* fp_stmt_cost  */
> >0, /* ld2_st2_permute_cost  */
> >0, /* ld3_st3_permute_cost  */
> > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > ampere1_advsimd_vector_cost =
> >8, /* store_elt_extra_cost  */
> >6, /* vec_to_scalar_cost  */
> >7, /* scalar_to_vec_cost  */
> > -  5, /* align_load_cost  */
> > -  5, /* unalign_load_cost  */
> > -  2, /* unalign_store_cost  */
> > -  2  /* store_cost  */
> > +  4, /* align_load_cost  */
> > +  4, /* unalign_load_cost  */
> > +  1, /* unalign_store_cost  */
> > +  1  /* store_cost  */
> >  };
> >
> >  /* Ampere-1 costs for vector insn classes.  */
> >  static const struct cpu_vector_cost ampere1_vector_cost =
> >  {
> >1, /* scalar_int_stmt_cost  */
> > -  1, /* scalar_fp_stmt_cost  */
> > +  3, /* scalar_fp_stmt_cost  */
> >4, /* scalar_load_cost  */
> >1, /* scalar_store_cost  */
> >1, /* cond_taken_branch_cost  */
> > --
> > 2.34.1
>

RE: [PATCH] aarch64: update ampere1 vectorization cost

2023-03-27 Thread Kyrylo Tkachov via Gcc-patches

Hi Philipp,

> -----Original Message-----
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> Tomsich
> Sent: Monday, March 27, 2023 8:47 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Tamar Christina
> ; Philipp Tomsich ;
> Manolis Tsamis 
> Subject: [PATCH] aarch64: update ampere1 vectorization cost
> 
> The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> prior to exhaustive testing of vectorizable workloads against
> hardware.
> 
> Adjust the vector costs to achieve the best results and more closely
> match the underlying hardware.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> 
> Co-Authored-By: Manolis Tsamis 
> 
> Signed-off-by: Philipp Tomsich 
> ---
> We would like to get this into GCC 13 to avoid having to backport at
> the start of the next cycle.
> 

Given this affects only the ampere1 costs, that sounds fine to me and fairly low
risk; you are being trusted that these costs are actually desirable and
properly validated on the hardware involved.

> OK for backports?

This is ok for trunk (GCC 13). Do you also want to backport this to other 
branches?
Thanks,
Kyrill

> 
>  gcc/config/aarch64/aarch64.cc | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index b27f4354031..661fff65cea 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> thunderx3t110_vector_cost =
> 
>  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
>  {
> -  3, /* int_stmt_cost  */
> +  1, /* int_stmt_cost  */
>    3, /* fp_stmt_cost  */
>    0, /* ld2_st2_permute_cost  */
>    0, /* ld3_st3_permute_cost  */
> @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> ampere1_advsimd_vector_cost =
>    8, /* store_elt_extra_cost  */
>    6, /* vec_to_scalar_cost  */
>    7, /* scalar_to_vec_cost  */
> -  5, /* align_load_cost  */
> -  5, /* unalign_load_cost  */
> -  2, /* unalign_store_cost  */
> -  2  /* store_cost  */
> +  4, /* align_load_cost  */
> +  4, /* unalign_load_cost  */
> +  1, /* unalign_store_cost  */
> +  1  /* store_cost  */
>  };
> 
>  /* Ampere-1 costs for vector insn classes.  */
>  static const struct cpu_vector_cost ampere1_vector_cost =
>  {
>    1, /* scalar_int_stmt_cost  */
> -  1, /* scalar_fp_stmt_cost  */
> +  3, /* scalar_fp_stmt_cost  */
>    4, /* scalar_load_cost  */
>    1, /* scalar_store_cost  */
>    1, /* cond_taken_branch_cost  */
> --
> 2.34.1

Re: [wwwdocs] Add Ada's GCC13 changelog entry

2023-03-27 Thread Arnaud Charlet via Gcc-patches

OK, thanks.

> Hi all,
> 
> a bit belated but just like last year, I've made a patch for the Ada
> entry in the changelog. You can find the patch attached to this email.
> 
> If I have forgotten anything relevant or if I have done something
> incorrectly, please, say so.
> 
> Best regards,
> Fernando Oleo Blanco

> From d273bb1835c1ef23e15d422bed22ca5d333cbdae Mon Sep 17 00:00:00 2001
> From: Fernando Oleo Blanco 
> Date: Sun, 26 Mar 2023 14:20:36 +0200
> Subject: [PATCH 1/1] [PATCH] Add Ada's entry in the v13 changelog
> 
> Signed-off-by: Fernando Oleo Blanco 
> ---
>  htdocs/gcc-13/changes.html | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index ff70d2ee..2e25bcf5 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -160,7 +160,16 @@ a work-in-progress.
>  
>  New Languages and Language specific improvements
>  
> -
> +Ada
> +
> +  Traceback support added in RTEMS for the PPC ELF and ARM 
> architectures.
> +  Support for versions older than VxWorks 7 has been removed.
> +  General improvements to the contracts in the standard libraries.
> +  Addition of GNAT.Binary_Search.
> +  Further additions and fixes for the Ada 2022 specification.
> +  The Pragma SPARK_Mode=Auto is now accepted. Contract 
> analysis has been further improved.
> +  Documentation improvements.
> +
>  
>  C family
>  
> -- 
> 2.40.0
>

[PATCH] libstdc++: Fix up experimental/net/timer/waitable/dest.cc testcase

2023-03-27 Thread Jakub Jelinek via Gcc-patches

Hi!

In Fedora package build I've noticed a failure
/builddir/build/BUILD/gcc-13.0.1-20230324/libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc: In function 'void test01()':
/builddir/build/BUILD/gcc-13.0.1-20230324/libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc:41: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'unsigned int' [-Wformat=]
FAIL: experimental/net/timer/waitable/dest.cc (test for excess errors)
Excess errors:
/builddir/build/BUILD/gcc-13.0.1-20230324/libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc:41: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'unsigned int' [-Wformat=]
because we build with -Wformat.

The test uses %lu for size_t argument, which can be anything from unsigned
int to unsigned long long.  As for printf I'm not sure we can use %zu
portably and given the n == 1 assertion, I think the options are to kill
the printf, or cast to long.

Ok for trunk?

2023-03-27  Jakub Jelinek  

* testsuite/experimental/net/timer/waitable/dest.cc: Avoid -Wformat
warning if size_t is not unsigned long.

--- libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc.jj   2023-01-16 11:52:17.394714745 +0100
+++ libstdc++-v3/testsuite/experimental/net/timer/waitable/dest.cc  2023-03-25 14:35:49.046639413 +0100
@@ -38,7 +38,7 @@ test01()
 timer.async_wait([](std::error_code e) { ec = e; });
   }
   auto n = ctx.run();
-  __builtin_printf("ran %lu\n", n);
+  __builtin_printf("ran %lu\n", long(n));
   VERIFY( n == 1 );
   VERIFY( ec == std::errc::operation_canceled );
 }


Jakub
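
For reference, the "cast to a type with a known conversion" option can be sketched in plain C as below.  This is only an illustration under stated assumptions: format_ran_count is a hypothetical helper, not something in the testsuite, and it casts to unsigned long so the argument type matches %lu exactly, whereas the patch above casts to long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of libstdc++ or its testsuite).
   size_t can be unsigned int, unsigned long or unsigned long long
   depending on the target, so passing it straight to %lu can trigger
   -Wformat.  Casting to unsigned long gives the argument the exact
   type that %lu expects on every target.  Returns what snprintf
   returns: the number of characters written, excluding the NUL.  */
static int
format_ran_count (char *buf, size_t buflen, size_t n)
{
  assert (buf != NULL && buflen > 0);
  return snprintf (buf, buflen, "ran %lu\n", (unsigned long) n);
}
```

The cast is only lossless while the value fits in unsigned long, which holds trivially here given the n == 1 assertion in the test.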

Re: [PATCH] RISC-V: Add Z*inx incompatible check in gcc.

2023-03-27 Thread Kito Cheng via Gcc-patches

Hi Jiawei:

Thanks for the fix!

Two comments:
- Could you add a testcase like
https://github.com/gcc-mirror/gcc/blob/master/gcc/testsuite/gcc.target/riscv/arch-12.c
- And I would prefer that those checks happen in riscv_subset_list::parse
in gcc/common/config/riscv/riscv-common.cc

On Sun, Mar 26, 2023 at 4:36 PM Jiawei  wrote:
>
> Z*inx conflicts with the float extensions; add an incompatibility check for
> when z*inx and hard_float are both enabled.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_option_override): New check.
>
> ---
>  gcc/config/riscv/riscv.cc | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 76eee4a55e9..162ba14d3c7 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -6285,6 +6285,10 @@ riscv_option_override (void)
>&& riscv_abi != ABI_LP64 && riscv_abi != ABI_ILP32E)
>  error ("z*inx requires ABI ilp32, ilp32e or lp64");
>
> +  // Zfinx is conflict with float extensions.
> +  if (TARGET_ZFINX && TARGET_HARD_FLOAT)
> +error ("z*inx is conflict with float extensions");
> +
>/* We do not yet support ILP32 on RV64.  */
>if (BITS_PER_WORD != POINTER_SIZE)
>  error ("ABI requires %<-march=rv%d%>", POINTER_SIZE);
> --
> 2.25.1
>
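
As a rough illustration of doing the check at the extension-list level rather than on target flags, here is a minimal standalone C sketch.  It does not use riscv_subset_list or any GCC API; has_ext and zfinx_conflicts_with_f are hypothetical names, and the extension list is just an array of strings assumed to have been split out of -march already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if NAME appears in the list EXTS
   of N already-parsed extension names.  */
static bool
has_ext (const char *const *exts, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (exts[i], name) == 0)
      return true;
  return false;
}

/* Hypothetical check mirroring the condition in the patch
   (TARGET_ZFINX && TARGET_HARD_FLOAT), but expressed on extension
   names: report a conflict when both "zfinx" and "f" were requested.  */
static bool
zfinx_conflicts_with_f (const char *const *exts, size_t n)
{
  assert (exts != NULL || n == 0);
  return has_ext (exts, n, "zfinx") && has_ext (exts, n, "f");
}
```

A real check in the parser would presumably also need to consider the other z*inx extensions and extensions that imply F; the sketch deliberately covers only the single pair named above.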

[PATCH] rs6000: Fix predicate for const vector in sldoi_to_mov [PR109069]

2023-03-27 Thread Kewen.Lin via Gcc-patches

Hi,

As PR109069 shows, commit r12-6537-g080a06fcb076b3, which
introduced define_insn_and_split sldoi_to_mov, adopts
easy_vector_constant for the const vector of interest, but that
is wrong since the predicate easy_vector_constant doesn't
guarantee that each byte in the const vector is the same.  One
counterexample is the const vector in pr109069-1.c.  This patch
introduces a new predicate const_vector_each_byte_same to
ensure all bytes in the given const vector are the same,
considering both int and float.  Meanwhile, for the constants
which don't meet easy_vector_constant, we need to generate a
move instead of just a set.  The patch also uses
VECTOR_MEM_ALTIVEC_OR_VSX_P rather than
VECTOR_UNIT_ALTIVEC_OR_VSX_P for V2DImode support under VSX,
since the vector long long type of vec_sld is guarded under
stanza vsx.
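
The byte check itself can be sketched outside of GCC as below.  This is a standalone illustration, not the predicate: each_byte_same and float_bits are hypothetical names, the element is passed as raw bits plus a size in bytes, and float is assumed to be a 4-byte IEEE 754 type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return true if the low SIZE bytes of BITS are
   all equal.  SIZE is the element size in bytes (1 to 8).  */
static bool
each_byte_same (uint64_t bits, unsigned int size)
{
  assert (size >= 1 && size <= 8);
  unsigned char byte0 = bits & 0xff;
  for (unsigned int i = 1; i < size; i++)
    {
      bits >>= 8;
      if ((bits & 0xff) != byte0)
        return false;
    }
  return true;
}

/* Hypothetical helper: reinterpret a float as its 32-bit pattern so
   the same byte check works for floating-point elements.  Assumes
   float is 4 bytes wide.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```

For example, an element of 0x01010101 passes while 0x01020304 does not, even though a vector of the latter is still a duplicate of one element.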

Bootstrapped and regtested on powerpc64-linux-gnu P7/P8/P9
and powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
-
PR target/109069

gcc/ChangeLog:

* config/rs6000/altivec.md (sldoi_to_mov): Replace predicate
easy_vector_constant with const_vector_each_byte_same, add
handlings in preparation for !easy_vector_constant, and update
VECTOR_UNIT_ALTIVEC_OR_VSX_P with VECTOR_MEM_ALTIVEC_OR_VSX_P.
* config/rs6000/predicates.md (const_vector_each_byte_same): New
predicate.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr109069-1.c: New test.
* gcc.target/powerpc/pr109069-2-run.c: New test.
* gcc.target/powerpc/pr109069-2.c: New test.
* gcc.target/powerpc/pr109069-2.h: New test.
---
 gcc/config/rs6000/altivec.md  | 14 +++-
 gcc/config/rs6000/predicates.md   | 37 +
 gcc/testsuite/gcc.target/powerpc/pr109069-1.c | 25 ++
 .../gcc.target/powerpc/pr109069-2-run.c   | 50 +++
 gcc/testsuite/gcc.target/powerpc/pr109069-2.c | 12 +++
 gcc/testsuite/gcc.target/powerpc/pr109069-2.h | 83 +++
 6 files changed, 218 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr109069-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr109069-2-run.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr109069-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr109069-2.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 30606b8ab21..183c3005694 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -385,14 +385,22 @@ (define_split

 (define_insn_and_split "sldoi_to_mov"
   [(set (match_operand:VM 0 "altivec_register_operand")
-   (unspec:VM [(match_operand:VM 1 "easy_vector_constant")
+   (unspec:VM [(match_operand:VM 1 "const_vector_each_byte_same")
(match_dup 1)
(match_operand:QI 2 "u5bit_cint_operand")]
UNSPEC_VSLDOI))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && can_create_pseudo_p ()"
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode) && can_create_pseudo_p ()"
   "#"
   "&& 1"
-  [(set (match_dup 0) (match_dup 1))])
+  [(set (match_dup 0) (match_dup 1))]
+  "{
+ if (!easy_vector_constant (operands[1], <MODE>mode))
+   {
+rtx dest = gen_reg_rtx (<MODE>mode);
+emit_move_insn (dest, operands[1]);
+operands[1] = dest;
+   }
+  }")

 (define_insn "get_vrsave_internal"
   [(set (match_operand:SI 0 "register_operand" "=r")
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 52c65534e51..a16ee30f0c0 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -798,6 +798,43 @@ (define_predicate "easy_vector_constant_vsldoi"
(and (match_test "easy_altivec_constant (op, mode)")
 (match_test "vspltis_shifted (op) != 0")

+;; Return true if this is a vector constant and each byte in
+;; it is the same.
+(define_predicate "const_vector_each_byte_same"
+  (match_code "const_vector")
+{
+  rtx elt;
+  if (!const_vec_duplicate_p (op, &elt))
+return false;
+
+  machine_mode emode = GET_MODE_INNER (mode);
+  unsigned HOST_WIDE_INT eval;
+  if (CONST_INT_P (elt))
+eval = INTVAL (elt);
+  else if (CONST_DOUBLE_AS_FLOAT_P (elt))
+{
+  gcc_assert (emode == SFmode || emode == DFmode);
+  long l[2];
+  real_to_target (l, CONST_DOUBLE_REAL_VALUE (elt), emode);
+  /* real_to_target puts 32-bit pieces in each long.  */
+  eval = zext_hwi (l[0], 32);
+  eval |= zext_hwi (l[1], 32) << 32;
+}
+  else
+return false;
+
+  unsigned int esize = GET_MODE_SIZE (emode);
+  unsigned char byte0 = eval & 0xff;
+  for (unsigned int i = 1; i < esize; i++)
+{
+  eval >>= BITS_PER_UNIT;
+  if (byte0 != (eval & 0xff))
+   return false;
+}
+
+  return true;
+})
+
 ;; Return 1 if operand is a vector int register or is either a vector constant
 ;; of all 0 bits of a vector constant of all 1 bits.
 (define_predicate "vector_int_reg_or_same_bit"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr109069-1.c

[PATCH (pushed)] fix: pytest error

2023-03-27 Thread Martin Liška

Fixes:
gcc/testsuite/lib/verify-sarif-file.py:10:27: Q000 Double quotes found but 
single quotes preferred

gcc/testsuite/ChangeLog:

* lib/verify-sarif-file.py: Use apostrophes instead
of double quotes.
---
 gcc/testsuite/lib/verify-sarif-file.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/verify-sarif-file.py 
b/gcc/testsuite/lib/verify-sarif-file.py
index f1833f3016e..eb6236f564c 100644
--- a/gcc/testsuite/lib/verify-sarif-file.py
+++ b/gcc/testsuite/lib/verify-sarif-file.py
@@ -7,5 +7,5 @@ import sys
 sys.tracebacklimit = 0
 
 fname = sys.argv[1]
-with open(fname, encoding="utf-8") as f:
+with open(fname, encoding='utf-8') as f:
 json.load(f)
-- 
2.40.0

[PATCH] aarch64: update ampere1 vectorization cost

2023-03-27 Thread Philipp Tomsich

The original submission of AmpereOne (-mcpu=ampere1) costs occurred
prior to exhaustive testing of vectorizable workloads against
hardware.

Adjust the vector costs to achieve the best results and more closely
match the underlying hardware.

gcc/ChangeLog:

* config/aarch64/aarch64.cc: Update vector costs for ampere1.

Co-Authored-By: Manolis Tsamis 

Signed-off-by: Philipp Tomsich 
---
We would like to get this into GCC 13 to avoid having to backport at
the start of the next cycle.

OK for backports?

 gcc/config/aarch64/aarch64.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b27f4354031..661fff65cea 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost 
thunderx3t110_vector_cost =
 
 static const advsimd_vec_cost ampere1_advsimd_vector_cost =
 {
-  3, /* int_stmt_cost  */
+  1, /* int_stmt_cost  */
   3, /* fp_stmt_cost  */
   0, /* ld2_st2_permute_cost  */
   0, /* ld3_st3_permute_cost  */
@@ -1148,17 +1148,17 @@ static const advsimd_vec_cost 
ampere1_advsimd_vector_cost =
   8, /* store_elt_extra_cost  */
   6, /* vec_to_scalar_cost  */
   7, /* scalar_to_vec_cost  */
-  5, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  2, /* unalign_store_cost  */
-  2  /* store_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
 };
 
 /* Ampere-1 costs for vector insn classes.  */
 static const struct cpu_vector_cost ampere1_vector_cost =
 {
   1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
+  3, /* scalar_fp_stmt_cost  */
   4, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
   1, /* cond_taken_branch_cost  */
-- 
2.34.1
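
To give a feel for what these numbers do, here is a deliberately simplified toy comparison in C.  It is not how the vectorizer computes costs (the real model also accounts for prologue/epilogue, permutes, branch costs and more); struct toy_costs and the two functions are hypothetical, and only the per-statement values are taken from the patch above.

```c
#include <assert.h>

/* Toy cost table: only the entries relevant to a loop that does one
   load, one integer statement and one store per iteration.  */
struct toy_costs
{
  int scalar_load, scalar_int_stmt, scalar_store;
  int vec_load, vec_int_stmt, vec_store;
};

/* Values from the patch: scalar_load_cost 4, scalar_int_stmt_cost 1,
   scalar_store_cost 1 are unchanged context; the vector entries are
   align_load_cost, int_stmt_cost and store_cost before and after.  */
static const struct toy_costs ampere1_old = { 4, 1, 1, 5, 3, 2 };
static const struct toy_costs ampere1_new = { 4, 1, 1, 4, 1, 1 };

/* Cost of running ITERS scalar iterations.  */
static int
scalar_loop_cost (const struct toy_costs *c, int iters)
{
  return iters * (c->scalar_load + c->scalar_int_stmt + c->scalar_store);
}

/* Cost of the same work with vectorization factor VF, assuming ITERS
   is a multiple of VF so no epilogue is needed.  */
static int
vector_loop_cost (const struct toy_costs *c, int iters, int vf)
{
  assert (vf > 0 && iters % vf == 0);
  return (iters / vf) * (c->vec_load + c->vec_int_stmt + c->vec_store);
}
```

With 16 iterations and a vectorization factor of 4, the toy vector cost drops from 40 to 24 against a scalar cost of 96, i.e. integer vector code looks cheaper relative to scalar code after the update.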

Re: [PATCH] [rs6000] Correct match pattern in pr56605.c

2023-03-27 Thread Kewen.Lin via Gcc-patches

Hi Alexandre and Haochen,

on 2023/3/25 16:42, Alexandre Oliva via Gcc-patches wrote:
> 
> Ping https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590958.html
> 
> From: Haochen Gui 
> 
> This patch corrects the match pattern in pr56605.c. The former pattern
> is wrong and the test case fails with GCC 11. It should match the following
> insn on each subtarget after mode promotion is disabled. The patch
> needs to be backported to GCC 11.

Comment https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102146#c21 made me
feel that this test issue was just in branches, but this proposed patch
seems to say it still exists on trunk, could you confirm that?

BR,
Kewen

> 
> //gimple
> _17 = (unsigned int) _20;
>  prolog_loop_niters.4_23 = _17 & 3;
> 
> //rtl
> (insn 19 18 20 2 (parallel [
> (set (reg:CC 208)
> (compare:CC (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3]))
> (const_int 0 [0])))
> (set (reg:SI 129 [ prolog_loop_niters.5 ])
> (and:SI (subreg:SI (reg:DI 207) 0)
> (const_int 3 [0x3])))
> ]) 197 {*andsi3_imm_mask_dot2}
> 
> Rebased.  Regstrapped on ppc64-linux-gnu.  Also tested with
> ppc64-vxworks7r2 (gcc-12), where it's also needed.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR target/102146
>   * gcc.target/powerpc/pr56605.c: Correct match pattern in
>   combine pass.
> ---
>  gcc/testsuite/gcc.target/powerpc/pr56605.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c 
> b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> index 7695f87db6f66..651a88e3cc7f9 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c
> @@ -11,5 +11,4 @@ void foo (short* __restrict sb, int* __restrict ia)
>  ia[i] = (int) sb[i];
>  }
>  
> -/* { dg-final { scan-rtl-dump-times {\(compare:CC 
> \((?:and|zero_extend):(?:[SD]I) \((?:sub)?reg:[SD]I} 1 "combine" } } */
> -
> +/* { dg-final { scan-rtl-dump-times {\(compare:CC \(and:SI \(subreg:SI 
> \(reg:DI} 1 "combine" } } */

Re: [PATCH] c++, coroutines: Stabilize names of promoted slot vars [PR101118].

2023-03-27 Thread Iain Sandoe

Hi Richard

> On 27 Mar 2023, at 12:48, Richard Biener  wrote:
> 
> On Mon, Mar 27, 2023 at 8:58 AM Iain Sandoe  wrote:
>> 
>> Hi Richard,
>> (I’m away from my usual infrastructure, so responses could be slow and 
>> testing things
>> could take a while).
>> 
>>> On 27 Mar 2023, at 12:10, Richard Biener  wrote:
>>> 
>>> On Sun, Mar 26, 2023 at 6:55 PM Iain Sandoe via Gcc-patches
>>>  wrote:
 
>>>> Tested on x86_64-darwin21, x86-64-linux-gnu
>>>> OK for trunk?
>>>> Iain
>>>>
>>>> When we need to 'promote' a value (i.e. store it in the coroutine frame) it
>>>> is given a frame entry name.  This was based on the DECL_UID for slot vars.
>>>> However, when LTO is used, the names from multiple TUs become visible at the
>>>> same time, and the DECL_UIDs usually differ between units.  This leads to a
>>>> "ODR mismatch" warning for the frame type.
>>>>
>>>> The fix here is to use a counter instead of the DECL_UID which makes a name
>>>> that is stable between TUs for each frame layout (one per coroutine func).
>>> 
>>> I don't see how this avoids clashes across TUs?  But are those VAR_DECLs not
>>> local anyway?
>> 
>> The reported ODR issue is in the frame type (which is a structure) — it sees 
>> two
>> frame layouts with the same types for each field but a different name for 
>> the entries
>> that came from the promotion of the slot var (because I used the DECL_UID to 
>> generate
>> the field name).
> 
> Ah, I see.  If it's from the same TU then why do we generate two frame
> layouts with
> the same type in the first place?

They are different TUs.

The frames are generated for coroutine types instantiated from templates
declared in a (boost) header.

(I do not see anything in the testcase header making stuff explicitly inline).
AFAIR the rules, this is OK ODR-use-wise ….

>>> I suppose -Wodr diagnostics for DECL_ARTIFICIAL vars are a bit on the
>>> edge as well ...
>> 
>> These promoted vars get DECL_VALUE_EXPRs (and as noted above a name to
>> assist in debugging) tying them to the frame entry,
>> 
>> .. although  I do agree that reporting warnings for compiler-internal stuff 
>> is definitely
>> on the edge (ISTR seeing maybe unused reports against such too).
> 
> If the two layouts are used to access the same objects you might run
> into TBAA issues.
> But making them appear the same but still separate types won't help that issue
> (but -flto will "fix" it for you then)

… but I wonder if I should be preventing LTO from doing this (perhaps my frame
type needs a uniquing addition, and then we would not care about the 
differing).  

hmm… now I’m not sure that this patch is the right fix .. I’d welcome Jason’s 
take
on this.

>> Not sure if we have an easy way to tell that the frame type is an internal one tho.
>> Perhaps that needs a DECL_ARTIFICIAL - but would that not make it unavailable
>> for debug?
> 
> We have TYPE_ARTIFICIAL, artificial-ness and no-debug are generally separate
> (DECL_IGNORED for decls, but I don't think we have anything for types here).

OK .. I can see about adding that too - but probably not for 13.0 (unless 
that’s the
right fix for the regression, I guess).

Iain

> 
> Richard.
> 
>> 
>> Iain
>> 
>> 
>>> 
>>> Richard.
>>> 
>>>> Signed-off-by: Iain Sandoe 
>>>>
>>>>   PR c++/101118
>>>>
>>>> gcc/cp/ChangeLog:
>>>>
>>>>   * coroutines.cc: Add counter for promoted slot vars.
>>>>   (flatten_await_stmt): Use slot vars counter instead of DECL_UID
>>>>   to generate the frame entry name for promoted target expression
>>>>   slot variables.
>>>>   (morph_fn_to_coro): Reset the slot vars counter at the start of
>>>>   each coroutine function.
>>>> ---
>>>> gcc/cp/coroutines.cc | 8 +++-
>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
>>>> index a2189e43db8..359a5bf46ff 100644
>>>> --- a/gcc/cp/coroutines.cc
>>>> +++ b/gcc/cp/coroutines.cc
>>>> @@ -2726,6 +2726,11 @@ struct var_nest_node
>>>>  var_nest_node *else_cl;
>>>> };
>>>>
>>>> +/* This is used to make a stable, but unique-per-function, sequence number for
>>>> +   each TARGET_EXPR slot variable that we 'promote' to a frame entry.  It needs
>>>> +   to be stable because the frame type is visible to LTO ODR checking.  */
>>>> +static unsigned tmpno = 0;
>>>> +
>>>> /* This is called for single statements from the co-await statement walker.
>>>>   It checks to see if the statement contains any initializers for awaitables
>>>>   and if any of these capture items by reference.  */
>>>> @@ -2889,7 +2894,7 @@ flatten_await_stmt (var_nest_node *n, hash_set<tree> *promoted,
>>>> tree init = t;
>>>> temps_used->add (init);
>>>> tree var_type = TREE_TYPE (init);
>>>> - char *buf = xasprintf ("D.%d", DECL_UID (TREE_OPERAND (init, 0)));
>>>> + char *buf = xasprintf ("T%03u", tmpno++);
>>>> tree var = build_lang_decl (VAR_DECL, get_identifier (buf),

Re: [PATCH] c++, coroutines: Stabilize names of promoted slot vars [PR101118].

2023-03-27 Thread Richard Biener via Gcc-patches

On Mon, Mar 27, 2023 at 8:58 AM Iain Sandoe  wrote:
>
> Hi Richard,
> (I’m away from my usual infrastructure, so responses could be slow and 
> testing things
> could take a while).
>
> > On 27 Mar 2023, at 12:10, Richard Biener  wrote:
> >
> > On Sun, Mar 26, 2023 at 6:55 PM Iain Sandoe via Gcc-patches
> >  wrote:
> >>
> >> Tested on x86_64-darwin21, x86-64-linux-gnu
> >> OK for trunk?
> >> Iain
> >>
> >> When we need to 'promote' a value (i.e. store it in the coroutine frame) it
> >> is given a frame entry name.  This was based on the DECL_UID for slot vars.
> >> However, when LTO is used, the names from multiple TUs become visible at 
> >> the
> >> same time, and the DECL_UIDs usually differ between units.  This leads to a
> >> "ODR mismatch" warning for the frame type.
> >>
> >> The fix here is to use a counter instead of the DECL_UID which makes a name
> >> that is stable between TUs for each frame layout (one per coroutine func).
> >
> > I don't see how this avoids clashes across TUs?  But are those VAR_DECLs not
> > local anyway?
>
> The reported ODR issue is in the frame type (which is a structure) — it sees 
> two
> frame layouts with the same types for each field but a different name for the 
> entries
> that came from the promotion of the slot var (because I used the DECL_UID to 
> generate
> the field name).

Ah, I see.  If it's from the same TU then why do we generate two frame
layouts with
the same type in the first place?

> > I suppose -Wodr diagnostics for DECL_ARTIFICIAL vars are a bit on the
> > edge as well ...
>
> These promoted vars get DECL_VALUE_EXPRs (and as noted above a name to
> assist in debugging) tying them to the frame entry,
>
> .. although  I do agree that reporting warnings for compiler-internal stuff 
> is definitely
> on the edge (ISTR seeing maybe unused reports against such too).

If the two layouts are used to access the same objects you might run
into TBAA issues.
But making them appear the same but still separate types won't help that issue
(but -flto will "fix" it for you then)

> Not sure if we have an easy way to tell that the frame type is an internal one tho.
> Perhaps that needs a DECL_ARTIFICIAL - but would that not make it unavailable
> for debug?

We have TYPE_ARTIFICIAL, artificial-ness and no-debug are generally separate
(DECL_IGNORED for decls, but I don't think we have anything for types here).

Richard.

>
> Iain
>
>
> >
> > Richard.
> >
> >> Signed-off-by: Iain Sandoe 
> >>
> >>PR c++/101118
> >>
> >> gcc/cp/ChangeLog:
> >>
> >>* coroutines.cc: Add counter for promoted slot vars.
> >>(flatten_await_stmt): Use slot vars counter instead of DECL_UID
> >>to generate the frame entry name for promoted target expression
> >>slot variables.
> >>(morph_fn_to_coro): Reset the slot vars counter at the start of
> >>each coroutine function.
> >> ---
> >> gcc/cp/coroutines.cc | 8 +++-
> >> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
> >> index a2189e43db8..359a5bf46ff 100644
> >> --- a/gcc/cp/coroutines.cc
> >> +++ b/gcc/cp/coroutines.cc
> >> @@ -2726,6 +2726,11 @@ struct var_nest_node
> >>   var_nest_node *else_cl;
> >> };
> >>
> >> +/* This is used to make a stable, but unique-per-function, sequence 
> >> number for
> >> +   each TARGET_EXPR slot variable that we 'promote' to a frame entry.  It 
> >> needs
> >> +   to be stable because the frame type is visible to LTO ODR checking.  */
> >> +static unsigned tmpno = 0;
> >> +
> >> /* This is called for single statements from the co-await statement walker.
> >>It checks to see if the statement contains any initializers for 
> >> awaitables
> >>and if any of these capture items by reference.  */
> >> @@ -2889,7 +2894,7 @@ flatten_await_stmt (var_nest_node *n, hash_set<tree> *promoted,
> >>  tree init = t;
> >>  temps_used->add (init);
> >>  tree var_type = TREE_TYPE (init);
> >> - char *buf = xasprintf ("D.%d", DECL_UID (TREE_OPERAND (init, 
> >> 0)));
> >> + char *buf = xasprintf ("T%03u", tmpno++);
> >>  tree var = build_lang_decl (VAR_DECL, get_identifier (buf), 
> >> var_type);
> >>  DECL_ARTIFICIAL (var) = true;
> >>  free (buf);
> >> @@ -4374,6 +4379,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
> >> *destroyer)
> >> {
> >>   gcc_checking_assert (orig && TREE_CODE (orig) == FUNCTION_DECL);
> >>
> >> +  tmpno = 0;
> >>   *resumer = error_mark_node;
> >>   *destroyer = error_mark_node;
> >>   if (!coro_function_valid_p (orig))
> >> --
> >> 2.37.1 (Apple Git-137.1)
>
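
The naming scheme under discussion can be sketched in standalone C as follows.  This only mimics the idea from the patch (a counter that is reset per coroutine function and formatted as "T%03u"); begin_function and next_slot_name are hypothetical names, and snprintf stands in for GCC's xasprintf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sequence number for promoted slot variables, as in the patch.  */
static unsigned int tmpno = 0;

/* Hypothetical stand-in for the reset done at the start of each
   coroutine function.  */
static void
begin_function (void)
{
  tmpno = 0;
}

/* Hypothetical helper: write the next frame entry name into BUF.
   The name depends only on how many slot variables were promoted so
   far in this function, not on any per-TU identifier such as a
   DECL_UID, so two TUs processing the same function get the same
   sequence of names.  */
static void
next_slot_name (char *buf, size_t buflen)
{
  assert (buf != NULL && buflen >= 5);
  snprintf (buf, buflen, "T%03u", tmpno++);
}
```

Calling begin_function and then next_slot_name twice yields "T000" and "T001" regardless of what was compiled earlier in the translation unit, which is the stability property the frame type needs for ODR checking.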

Re: [PATCH, rs6000] rs6000: correct vector sign extend built-ins on Big Endian [PR108812]

2023-03-27 Thread Kewen.Lin via Gcc-patches

Hi Haochen,

Thanks for fixing this.

on 2023/3/27 14:16, HAO CHEN GUI wrote:
> Hi,
>   This patch removes the byte reverse operation before vector integer sign
> extension on Big Endian. These built-ins are required to sign extend the rightmost
> element, so both BE and LE should do the same operation and the byte reversal
> is not needed. This patch fixes it. Now these built-ins have the same behavior on
> all compilers. The test case is modified also.

Nice, I think this change aligns with what's in the documentation:

"Each element of the result is produced by sign-extending the element of the 
input
vector that would fall in the least significant portion of the result element. 
For
example, a sign-extension of a vector signed char to a vector signed long long 
will
sign extend the rightmost byte of each doubleword."

> 
>   The patch passed regression test on Power Linux platforms.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: correct vector sign extend builtins on Big Endian
> 
> gcc/
>   PR target/108812
>   * config/rs6000/vsx.md (vsignextend_qi_<mode>): Remove byte reverse
>   for Big Endian.
>   (vsignextend_hi_<mode>): Likewise.
>   (vsignextend_si_v2di): Remove.
>   * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsignextsw2d):
>   Set bif-pattern to vsx_sign_extend_si_v2di.
> 
> gcc/testsuite/
>   PR target/108812
>   * gcc.target/powerpc/p9-sign_extend-runnable.c: Set different expected
>   vectors for Big Endian.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f76f54793d7..059a455b388 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2699,7 +2699,7 @@
>  VSIGNEXTSH2W vsignextend_hi_v4si {}
> 
>const vsll __builtin_altivec_vsignextsw2d (vsi);
> -VSIGNEXTSW2D vsignextend_si_v2di {}
> +VSIGNEXTSW2D vsx_sign_extend_si_v2di {}
> 
>const vsc __builtin_altivec_vslv (vsc, vsc);
>  VSLV vslv {}
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 992fbc983be..9e9b33f56ab 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4941,14 +4941,7 @@ (define_expand "vsignextend_qi_<mode>"
>UNSPEC_VSX_SIGN_EXTEND))]
>"TARGET_P9_VECTOR"
>  {
> -  if (BYTES_BIG_ENDIAN)
> -{
> -  rtx tmp = gen_reg_rtx (V16QImode);
> -  emit_insn (gen_altivec_vrevev16qi2(tmp, operands[1]));
> -  emit_insn (gen_vsx_sign_extend_qi_<mode>(operands[0], tmp));
> -}
> -  else
> -emit_insn (gen_vsx_sign_extend_qi_<mode>(operands[0], operands[1]));
> +  emit_insn (gen_vsx_sign_extend_qi_<mode>(operands[0], operands[1]));
>DONE;
>  })

I think the whole define_expand can be removed, we can just use the
define_insn names vsx_sign_extend_qi_* in rs6000-builtins.def (just
like what you changed for __builtin_altivec_vsignextsw2d).

This comment also applies to vsx_sign_extend_hi_*,
vsx_sign_extend_si_* and vsx_sign_extend_v2di_*.

One interesting thing is that we used qi/hi/si in the name for
V16QI/V8HI/V4SI but used v2di for V2DI, could you also adjust the
names from vsx_sign_extend_{qi,hi,si}_* to ..._{v16qi,v8hi,v4si}_*
then make them adopt the same naming style?

BR,
Kewen
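
As a scalar illustration of the documented semantics quoted above (sign-extend whatever falls in the least significant portion of each result element), here is a small C sketch.  sign_extend_low is a hypothetical helper working on one 64-bit element at a time; it says nothing about how the vector instructions are selected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low BITS bits (8, 16 or 32) of
   DOUBLEWORD to 64 bits, ignoring everything above them.  This is
   what happens per doubleword when, for example, a vector signed char
   is sign-extended to vector signed long long: only the rightmost
   byte of each doubleword matters.  */
static int64_t
sign_extend_low (uint64_t doubleword, unsigned int bits)
{
  assert (bits == 8 || bits == 16 || bits == 32);
  uint64_t mask = (1ULL << bits) - 1;
  int64_t value = (int64_t) (doubleword & mask);
  int64_t sign = 1LL << (bits - 1);
  /* Flip the sign bit, then subtract it again: values with the sign
     bit set end up negative, others are unchanged.  */
  return (value ^ sign) - sign;
}
```

Because the result depends only on the value of the element, not on how its bytes are laid out in memory, the same operation applies on both BE and LE.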

Re: [PATCH] [PR99708] [rs6000] don't expect __ibm128 with 64-bit long double

2023-03-27 Thread Kewen.Lin via Gcc-patches

Hi Alexandre,

Thanks for fixing this.

on 2023/3/25 16:37, Alexandre Oliva via Gcc-patches wrote:
> 
> When long double is 64-bit wide, as on vxworks, the rs6000 backend
> defines neither the __ibm128 type nor the __SIZEOF_IBM128__ macro, but
> pr99708.c expected both to be always defined.  Adjust the test to
> match the implementation.

There is one patch from Mike to define the type __ibm128 even without
IEEE 128-bit floating point support; it's at the link:

https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599984.html

I would expect this issue to be gone once the adjustment to the
support of type __ibm128 lands in the future.

So maybe we can just xfail this for longdouble64?  What do you
think?

BR,
Kewen

Re: [PATCH] [testsuite] [ppc] expect vectorization in gen-vect-11c.c

2023-03-27 Thread Kewen.Lin via Gcc-patches

Hi Alexandre,

on 2023/3/25 16:35, Alexandre Oliva wrote:
> 
> The first loop in main gets stores "vectorized" on powerpc into
> full-word stores, even without any vector instruction support, so the
> test's expectation of no loop vectorization is not met.
> 

I think this test issue has gone away since r13-5771-gdc87e1391c55c6.

Could you double-check?

BR,
Kewen

> Regstrapped on ppc64-linux-gnu.  Also tested with ppc64-vxworks7r2
> (gcc-12).  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.dg/tree-ssa/gen-vect-11c.c: xfail the test for no
>   vectorization on powerpc*-*-*.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c
> index 22ff44cf66da9..116f6af233887 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-11c.c
> @@ -39,4 +39,4 @@ int main ()
>  }
> 
> 
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail 
> amdgcn*-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail 
> amdgcn*-*-* powerpc*-*-* } } } */
>

Re: [PATCH, V2] PR target/105325, Make load/cmp fusion know about prefixed load

2023-03-27 Thread Kewen.Lin via Gcc-patches

Hi Mike,

on 2023/3/25 07:06, Michael Meissner wrote:
> I posted a version of patch on March 21st.  This patch makes some code changes
> suggested in the genfusion.pl code.  The only change is in genfusion.pl.  The
> fusion.md that it makes is the same.
> 
> The issue with the bug is the power10 load GPR + cmpi -1/0/1 fusion
> optimization generates illegal assembler code.
> 
> Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
> did not handle the possibility that the load might be prefixed.
> 
> The main cause is that the constraints for the individual loads in the fusion
> did not match the machine.  In particular, LWA is a DS-format instruction when
> it is unprefixed.  The code also did not set the prefixed attribute correctly.
> 
> This patch rewrites the genfusion.pl script so that it will have more accurate
> constraints for the LWA and LD instructions (which are DS instructions).  The
> updated genfusion.pl was then run to update fusion.md.  Finally, the code for
> the "prefixed" attribute is modified so that it considers load + compare
> immediate patterns to be like the normal load insns in checking whether
> operand[1] is a prefixed instruction.
> 
> I am re-running the tests right now, but they should have the same results
> since fusion.md is the same, and only the code in genfusion.pl that generates
> fusion.md was modified.  Assuming these runs pass, can I check this into the
> master branch?
> 
> I will also need to check these same patches into GCC 11 and GCC 12 after a
> waiting period (the patch applied to those branches as well).
> 
> 2023-03-21   Michael Meissner  
> 
> gcc/
> 
>   PR target/105325
>   * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
>   of the ld and lwa instructions which use the DS encoding instead of D.
>   Use the YZ constraint for these loads.  Handle prefixed loads better.
>   Set the sign_extend attribute as appropriate.
>   * gcc/config/rs6000/fusion.md: Regenerate.
>   * gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
>   instructions to the list of instructions that might have a prefixed load
>   instruction.
> 
> gcc/testsuite/
> 
>   PR target/105325
>   * g++.target/powerpc/pr105325.C: New test.
>   * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
> ---
>  gcc/config/rs6000/fusion.md   | 17 ++
>  gcc/config/rs6000/genfusion.pl| 32 +++
>  gcc/config/rs6000/rs6000.md   |  2 +-
>  gcc/testsuite/g++.target/powerpc/pr105325.C   | 24 ++
>  .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
>  5 files changed, 62 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C
> 
> diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
> index d45fb138a70..da9953d9ad9 100644
> --- a/gcc/config/rs6000/fusion.md
> +++ b/gcc/config/rs6000/fusion.md
> @@ -22,7 +22,7 @@
>  ;; load mode is DI result mode is clobber compare mode is CC extend is none
>  (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
>[(set (match_operand:CC 2 "cc_reg_operand" "=x")
> -(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
> +(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
>  (match_operand:DI 3 "const_m1_to_1_operand" "n")))
> (clobber (match_scratch:DI 0 "=r"))]
>"(TARGET_P10_FUSION)"
> @@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
>  ;; load mode is DI result mode is clobber compare mode is CCUNS extend is 
> none
>  (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
>[(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> -(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
> +(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
> (match_operand:DI 3 "const_0_to_1_operand" "n")))
> (clobber (match_scratch:DI 0 "=r"))]
>"(TARGET_P10_FUSION)"
> @@ -64,7 +64,7 @@ (define_insn_and_split 
> "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
>  ;; load mode is DI result mode is DI compare mode is CC extend is none
>  (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
>[(set (match_operand:CC 2 "cc_reg_operand" "=x")
> -(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
> +(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
>  (match_operand:DI 3 "const_m1_to_1_operand" "n")))
> (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
>"(TARGET_P10_FUSION)"
> @@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
>  ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
>  (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
>[(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
> -(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand"

[PATCH] RISC-V: Fix PR108279

2023-03-27 Thread juzhe . zhong

From: Juzhe-Zhong 

PR 108270

Fix bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

Consider the following testcase:
void f (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
        }
    }
  }
}

Compile option: -O3

Before this patch:
mv  a7,a2
mv  a6,a0   
mv  t1,a1
mv  a2,a3
vsetivli zero,17,e8,mf8,ta,ma
...

After this patch:
mv  a7,a2
mv  a6,a0
mv  t1,a1
mv  a2,a3
ble a7,zero,.L1
ble a4,zero,.L1
ble a3,zero,.L1
add a1,a0,a4
li  a0,0
vsetivli zero,17,e8,mf8,ta,ma
...

Without the fix, this can produce a bug in a case such as:

int main ()
{
  vsetivli zero, 100,.
  f (in, out, 0,0,0)
  asm volatile ("csrr a0,vl":::"memory");

  // Before this patch the a0 is 17. (Wrong).
  // After this patch the a0 is 100. (Correct).
  ...
}
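
The predecessor check that the patch adds can be sketched in isolation.  This
is only an illustration under stated assumptions, not the code from
riscv-vsetvl.cc: BlockInfo, its two flags, and the index-based CFG are
hypothetical stand-ins for the pass's real data structures.  It shows the
idea that backward fusion is skipped when no block that can reach the current
one carries any VL/VTYPE state.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the per-block vsetvl information.
struct BlockInfo
{
  bool local_demand;               // the block itself demands a VL/VTYPE
  bool reaching_out;               // some VL/VTYPE state is live on exit
  std::vector<std::size_t> preds;  // indices of the direct predecessors
};

// Collect every block that can reach BB (transitive predecessors).
static std::set<std::size_t>
all_predecessors (const std::vector<BlockInfo> &cfg, std::size_t bb)
{
  std::set<std::size_t> seen;
  std::vector<std::size_t> work (cfg[bb].preds);
  while (!work.empty ())
    {
      std::size_t p = work.back ();
      work.pop_back ();
      if (!seen.insert (p).second)
        continue;  // already visited
      for (std::size_t q : cfg[p].preds)
        work.push_back (q);
    }
  return seen;
}

// Return true if no predecessor of BB carries any vector configuration,
// in which case there is nothing to gain from backward fusion.
static bool
all_empty_predecessor_p (const std::vector<BlockInfo> &cfg, std::size_t bb)
{
  for (std::size_t p : all_predecessors (cfg, bb))
    if (cfg[p].local_demand || cfg[p].reaching_out)
      return false;
  return true;
}
```

In this model, a caller would skip propagation for a block whenever
all_empty_predecessor_p returns true, which mirrors the early "continue" in
the hunk below.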

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(vector_infos_manager::all_empty_predecessor_p): New function.
(pass_vsetvl::backward_demand_fusion): Fix bug.
* config/riscv/riscv-vsetvl.h: New function declaration.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 24 +++
 gcc/config/riscv/riscv-vsetvl.h   |  2 ++
 .../riscv/rvv/vsetvl/imm_bb_prop-1.c  |  2 +-
 .../riscv/rvv/vsetvl/imm_conflict-3.c |  4 ++--
 .../gcc.target/riscv/rvv/vsetvl/pr108270.c| 19 +++
 5 files changed, 48 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index b5f5301ea43..4948e5d4c5e 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2361,6 +2361,21 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
   return true;
 }
 
+bool
+vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
+{
+  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
+  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
+    {
+      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
+      if (!pred_block_info.local_dem.valid_or_dirty_p ()
+          && !pred_block_info.reaching_out.valid_or_dirty_p ())
+        continue;
+      return false;
+    }
+  return true;
+}
+
 bool
 vector_infos_manager::all_same_avl_p (const basic_block cfg_bb,
  sbitmap bitdata) const
@@ -3118,6 +3133,14 @@ pass_vsetvl::backward_demand_fusion (void)
   if (!backward_propagate_worthwhile_p (cfg_bb, curr_block_info))
continue;
 
+  /* Fix PR108270:
+
+   bb 0 -> bb 1
+We don't need to backward fuse VL/VTYPE info from bb 1 to bb 0
+if bb 1 is not inside a loop and all predecessors of bb 0 are empty. */
+  if (m_vector_manager->all_empty_predecessor_p (cfg_bb))
+   continue;
+
   edge e;
   edge_iterator ei;
   /* Backward propagate to each predecessor.  */
@@ -3131,6 +3154,7 @@ pass_vsetvl::backward_demand_fusion (void)
continue;
  if (e->src->index == ENTRY_BLOCK_PTR_FOR_FN (cfun)->index)
continue;
+
  /* If prop is demand of vsetvl instruction and reaching doesn't demand
 AVL. We don't backward propagate since vsetvl instruction has no
 side effects.  */
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 237381f7026..eec03d35071 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -450,6 +450,8 @@ public:
   /* Return true if all expression set in bitmap are same ratio.  */
   bool all_same_ratio_p (sbitmap) const;
 
+  bool all_empty_predecessor_p (const basic_block) const;
+
   void release (void);
   void create_bitmap_vectors (void);
   void free_bitmap_vectors (void);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
index cd4ee7dd0d3..ed32a40f5e7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
@@ -29,4 +29,4 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, 
int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times

Re: [PATCH] c++, coroutines: Stabilize names of promoted slot vars [PR101118].

2023-03-27 Thread Iain Sandoe

Hi Richard,
(I’m away from my usual infrastructure, so responses could be slow and testing
things could take a while).

> On 27 Mar 2023, at 12:10, Richard Biener  wrote:
> 
> On Sun, Mar 26, 2023 at 6:55 PM Iain Sandoe via Gcc-patches
>  wrote:
>> 
>> Tested on x86_64-darwin21, x86-64-linux-gnu
>> OK for trunk?
>> Iain
>> 
>> When we need to 'promote' a value (i.e. store it in the coroutine frame) it
>> is given a frame entry name.  This was based on the DECL_UID for slot vars.
>> However, when LTO is used, the names from multiple TUs become visible at the
>> same time, and the DECL_UIDs usually differ between units.  This leads to a
>> "ODR mismatch" warning for the frame type.
>> 
>> The fix here is to use a counter instead of the DECL_UID which makes a name
>> that is stable between TUs for each frame layout (one per coroutine func).
> 
> I don't see how this avoids clashes across TUs?  But are those VAR_DECLs not
> local anyway?

The reported ODR issue is in the frame type (which is a structure): it sees two
frame layouts with the same types for each field but a different name for the
entries that came from the promotion of the slot var (because I used the
DECL_UID to generate the field name).
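
As a rough illustration of why the counter helps (a sketch only: uid_name,
counter_name and frame_fields_by_counter are made-up helpers, and the UID
values a caller passes in are arbitrary), compare a field name built from a
per-TU identifier with one built from a per-function sequence number:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Name derived from a per-TU identifier (hypothetical stand-in for DECL_UID).
static std::string
uid_name (unsigned uid)
{
  char buf[32];
  std::snprintf (buf, sizeof buf, "D.%u", uid);
  return buf;
}

// Name derived from a per-function counter, using the patch's "T%03u" format.
static std::string
counter_name (unsigned &tmpno)
{
  char buf[32];
  std::snprintf (buf, sizeof buf, "T%03u", tmpno++);
  return buf;
}

// Field names for one coroutine frame with N_TEMPS promoted temporaries.
static std::vector<std::string>
frame_fields_by_counter (unsigned n_temps)
{
  unsigned tmpno = 0;  // reset at the start of each coroutine function
  std::vector<std::string> names;
  for (unsigned i = 0; i < n_temps; ++i)
    names.push_back (counter_name (tmpno));
  return names;
}
```

Two TUs that compile the same coroutine will generally hand different UIDs to
uid_name, so the field names differ, whereas frame_fields_by_counter depends
only on how many temporaries are promoted and in which order.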

> I suppose -Wodr diagnostics for DECL_ARTIFICIAL vars are a bit on the
> edge as well ...

These promoted vars get DECL_VALUE_EXPRs (and, as noted above, a name to
assist in debugging) tying them to the frame entry, although I do agree that
reporting warnings for compiler-internal stuff is definitely on the edge (ISTR
seeing maybe-unused reports against such too).

Not sure if we have an easy way to tell that the frame type is an internal one,
though.  Perhaps that needs DECL_ARTIFICIAL, but would that not make it
unavailable for debug?

Iain


> 
> Richard.
> 
>> Signed-off-by: Iain Sandoe 
>> 
>>PR c++/101118
>> 
>> gcc/cp/ChangeLog:
>> 
>>* coroutines.cc: Add counter for promoted slot vars.
>>(flatten_await_stmt): Use slot vars counter instead of DECL_UID
>>to generate the frame entry name for promoted target expression
>>slot variables.
>>(morph_fn_to_coro): Reset the slot vars counter at the start of
>>each coroutine function.
>> ---
>> gcc/cp/coroutines.cc | 8 +++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
>> index a2189e43db8..359a5bf46ff 100644
>> --- a/gcc/cp/coroutines.cc
>> +++ b/gcc/cp/coroutines.cc
>> @@ -2726,6 +2726,11 @@ struct var_nest_node
>>   var_nest_node *else_cl;
>> };
>> 
>> +/* This is used to make a stable, but unique-per-function, sequence number for
>> +   each TARGET_EXPR slot variable that we 'promote' to a frame entry.  It needs
>> +   to be stable because the frame type is visible to LTO ODR checking.  */
>> +static unsigned tmpno = 0;
>> +
>> /* This is called for single statements from the co-await statement walker.
>>It checks to see if the statement contains any initializers for awaitables
>>and if any of these capture items by reference.  */
>> @@ -2889,7 +2894,7 @@ flatten_await_stmt (var_nest_node *n, hash_set 
>> *promoted,
>>  tree init = t;
>>  temps_used->add (init);
>>  tree var_type = TREE_TYPE (init);
>> - char *buf = xasprintf ("D.%d", DECL_UID (TREE_OPERAND (init, 0)));
>> + char *buf = xasprintf ("T%03u", tmpno++);
>>  tree var = build_lang_decl (VAR_DECL, get_identifier (buf), 
>> var_type);
>>  DECL_ARTIFICIAL (var) = true;
>>  free (buf);
>> @@ -4374,6 +4379,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
>> *destroyer)
>> {
>>   gcc_checking_assert (orig && TREE_CODE (orig) == FUNCTION_DECL);
>> 
>> +  tmpno = 0;
>>   *resumer = error_mark_node;
>>   *destroyer = error_mark_node;
>>   if (!coro_function_valid_p (orig))
>> --
>> 2.37.1 (Apple Git-137.1)

Re: [PATCH] lto/109263 - lto-wrapper and -g0 -ggdb

2023-03-27 Thread Richard Biener via Gcc-patches

On Thu, 23 Mar 2023, Richard Biener wrote:

> The following makes lto-wrapper deal with non-combined debug
> disabling / enabling option combinations properly.  Interestingly
> -gno-dwarf also enables debug.
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> 
> OK?  Or do we want to try harder to zap earlier -g0 when later
> -g* appear?

I pushed this to fix the regression; the patch stays valid even
if the patches rejecting negative variants of -ggdb and friends
are approved.
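
The intended option handling can be modelled roughly as follows.  This is a
simplified sketch and not the lto-wrapper code: skip_debug_p is a hypothetical
helper that works on plain option strings rather than decoded options, and
the prefix matching is an assumption made to keep the example short.

```cpp
#include <string>
#include <vector>

// Return true if debug info should be skipped after scanning OPTS in order.
// -g0 disables debug; -g, -g<level>, and the alternate debug options
// (including forms such as -ggdb3 or -gdwarf-5) enable it again.
static bool
skip_debug_p (const std::vector<std::string> &opts)
{
  static const char *const alt[]
    = { "-gbtf", "-gctf", "-gdwarf", "-ggdb", "-gvms" };
  bool skip_debug = false;
  for (const std::string &opt : opts)
    {
      bool is_alt = false;
      for (const char *a : alt)
        if (opt.rfind (a, 0) == 0)  // OPT starts with A
          is_alt = true;

      if (is_alt)
        skip_debug = false;
      else if (opt == "-g0")
        skip_debug = true;
      else if (opt == "-g"
               || (opt.size () == 3 && opt.compare (0, 2, "-g") == 0
                   && opt[2] >= '1' && opt[2] <= '9'))
        skip_debug = false;
    }
  return skip_debug;
}
```

With this model "-g0 -ggdb" ends with debug enabled, while "-ggdb -g0" ends
with it disabled, since the last relevant option wins.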

Richard.

>   PR lto/109263
>   * lto-wrapper.cc (run_gcc): Parse alternate debug options
>   as well, they always enable debug.
> ---
>  gcc/lto-wrapper.cc | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> index fe8c5f6e80d..5186d040ce0 100644
> --- a/gcc/lto-wrapper.cc
> +++ b/gcc/lto-wrapper.cc
> @@ -1564,6 +1564,16 @@ run_gcc (unsigned argc, char *argv[])
> skip_debug = option->arg && !strcmp (option->arg, "0");
> break;
>  
> + case OPT_gbtf:
> + case OPT_gctf:
> + case OPT_gdwarf:
> + case OPT_gdwarf_:
> + case OPT_ggdb:
> + case OPT_gvms:
> +   /* Negative forms, if allowed, enable debug info as well.  */
> +   skip_debug = false;
> +   break;
> +
>   case OPT_dumpdir:
> incoming_dumppfx = dumppfx = option->arg;
> break;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH] c++, coroutines: Stabilize names of promoted slot vars [PR101118].

2023-03-27 Thread Richard Biener via Gcc-patches

On Sun, Mar 26, 2023 at 6:55 PM Iain Sandoe via Gcc-patches
 wrote:
>
> Tested on x86_64-darwin21, x86-64-linux-gnu
> OK for trunk?
> Iain
>
> When we need to 'promote' a value (i.e. store it in the coroutine frame) it
> is given a frame entry name.  This was based on the DECL_UID for slot vars.
> However, when LTO is used, the names from multiple TUs become visible at the
> same time, and the DECL_UIDs usually differ between units.  This leads to a
> "ODR mismatch" warning for the frame type.
>
> The fix here is to use a counter instead of the DECL_UID which makes a name
> that is stable between TUs for each frame layout (one per coroutine func).

I don't see how this avoids clashes across TUs?  But are those VAR_DECLs not
local anyway?

I suppose -Wodr diagnostics for DECL_ARTIFICIAL vars are a bit on the
edge as well ...

Richard.

> Signed-off-by: Iain Sandoe 
>
> PR c++/101118
>
> gcc/cp/ChangeLog:
>
> * coroutines.cc: Add counter for promoted slot vars.
> (flatten_await_stmt): Use slot vars counter instead of DECL_UID
> to generate the frame entry name for promoted target expression
> slot variables.
> (morph_fn_to_coro): Reset the slot vars counter at the start of
> each coroutine function.
> ---
>  gcc/cp/coroutines.cc | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
> index a2189e43db8..359a5bf46ff 100644
> --- a/gcc/cp/coroutines.cc
> +++ b/gcc/cp/coroutines.cc
> @@ -2726,6 +2726,11 @@ struct var_nest_node
>var_nest_node *else_cl;
>  };
>
> +/* This is used to make a stable, but unique-per-function, sequence number for
> +   each TARGET_EXPR slot variable that we 'promote' to a frame entry.  It needs
> +   to be stable because the frame type is visible to LTO ODR checking.  */
> +static unsigned tmpno = 0;
> +
>  /* This is called for single statements from the co-await statement walker.
> It checks to see if the statement contains any initializers for awaitables
> and if any of these capture items by reference.  */
> @@ -2889,7 +2894,7 @@ flatten_await_stmt (var_nest_node *n, hash_set 
> *promoted,
>   tree init = t;
>   temps_used->add (init);
>   tree var_type = TREE_TYPE (init);
> - char *buf = xasprintf ("D.%d", DECL_UID (TREE_OPERAND (init, 0)));
> + char *buf = xasprintf ("T%03u", tmpno++);
>   tree var = build_lang_decl (VAR_DECL, get_identifier (buf), 
> var_type);
>   DECL_ARTIFICIAL (var) = true;
>   free (buf);
> @@ -4374,6 +4379,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
> *destroyer)
>  {
>gcc_checking_assert (orig && TREE_CODE (orig) == FUNCTION_DECL);
>
> +  tmpno = 0;
>*resumer = error_mark_node;
>*destroyer = error_mark_node;
>if (!coro_function_valid_p (orig))
> --
> 2.37.1 (Apple Git-137.1)
>

[PATCH, rs6000] rs6000: correct vector sign extend built-ins on Big Endian [PR108812]

2023-03-27 Thread HAO CHEN GUI via Gcc-patches

Hi,
  This patch removes the byte-reverse operation before vector integer sign
extension on Big Endian. These built-ins are required to sign extend the
rightmost element, so both BE and LE should perform the same operation and the
byte reversal is not needed. With this fix the built-ins behave the same on
all compilers. The test case is also updated.
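
  For illustration only, the byte-to-word case can be modelled in plain C++.
The function below is a hypothetical software model, not the built-in itself;
it assumes the vector is given in element (array) order and that the
"rightmost" byte of each word is array element 3 of each group of four on Big
Endian and element 0 on Little Endian.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Software model of sign extending the rightmost byte of each word.
// V holds the 16 byte elements in array order; BIG_ENDIAN_P selects which
// array element of each 4-byte group is the rightmost byte in the register.
static std::array<std::int32_t, 4>
sign_extend_byte_to_word (const std::array<std::int8_t, 16> &v,
                          bool big_endian_p)
{
  std::array<std::int32_t, 4> r{};
  for (std::size_t w = 0; w < 4; ++w)
    r[w] = v[w * 4 + (big_endian_p ? 3 : 0)];  // int8_t promotes with sign
  return r;
}
```

Fed with {1, 2, ..., 8, -1, -2, ..., -8}, this model yields {4, 8, -4, -8} for
Big Endian and {1, 5, -1, -5} for Little Endian, which are the expected
vectors used in the updated test.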

  The patch passed regression test on Power Linux platforms.

Thanks
Gui Haochen

ChangeLog
rs6000: correct vector sign extend builtins on Big Endian

gcc/
PR target/108812
* config/rs6000/vsx.md (vsignextend_qi_): Remove byte reverse
for Big Endian.
(vsignextend_hi_): Likewise.
(vsignextend_si_v2di): Remove.
* config/rs6000/rs6000-builtins.def (__builtin_altivec_vsignextsw2d):
Set bif-pattern to vsx_sign_extend_si_v2di.

gcc/testsuite/
PR target/108812
* gcc.target/powerpc/p9-sign_extend-runnable.c: Set different expected
vectors for Big Endian.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f76f54793d7..059a455b388 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2699,7 +2699,7 @@
 VSIGNEXTSH2W vsignextend_hi_v4si {}

   const vsll __builtin_altivec_vsignextsw2d (vsi);
-VSIGNEXTSW2D vsignextend_si_v2di {}
+VSIGNEXTSW2D vsx_sign_extend_si_v2di {}

   const vsc __builtin_altivec_vslv (vsc, vsc);
 VSLV vslv {}
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 992fbc983be..9e9b33f56ab 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4941,14 +4941,7 @@ (define_expand "vsignextend_qi_"
 UNSPEC_VSX_SIGN_EXTEND))]
   "TARGET_P9_VECTOR"
 {
-  if (BYTES_BIG_ENDIAN)
-{
-  rtx tmp = gen_reg_rtx (V16QImode);
-  emit_insn (gen_altivec_vrevev16qi2(tmp, operands[1]));
-  emit_insn (gen_vsx_sign_extend_qi_(operands[0], tmp));
-}
-  else
-emit_insn (gen_vsx_sign_extend_qi_(operands[0], operands[1]));
+  emit_insn (gen_vsx_sign_extend_qi_(operands[0], operands[1]));
   DONE;
 })

@@ -4968,14 +4961,7 @@ (define_expand "vsignextend_hi_"
 UNSPEC_VSX_SIGN_EXTEND))]
   "TARGET_P9_VECTOR"
 {
-  if (BYTES_BIG_ENDIAN)
-{
-  rtx tmp = gen_reg_rtx (V8HImode);
-  emit_insn (gen_altivec_vrevev8hi2(tmp, operands[1]));
-  emit_insn (gen_vsx_sign_extend_hi_(operands[0], tmp));
-}
-  else
- emit_insn (gen_vsx_sign_extend_hi_(operands[0], operands[1]));
+  emit_insn (gen_vsx_sign_extend_hi_(operands[0], operands[1]));
   DONE;
 })

@@ -4987,24 +4973,6 @@ (define_insn "vsx_sign_extend_si_v2di"
   "vextsw2d %0,%1"
   [(set_attr "type" "vecexts")])

-(define_expand "vsignextend_si_v2di"
-  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
-   (unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
-UNSPEC_VSX_SIGN_EXTEND))]
-  "TARGET_P9_VECTOR"
-{
-  if (BYTES_BIG_ENDIAN)
-{
-   rtx tmp = gen_reg_rtx (V4SImode);
-
-   emit_insn (gen_altivec_vrevev4si2(tmp, operands[1]));
-   emit_insn (gen_vsx_sign_extend_si_v2di(operands[0], tmp));
-}
-  else
- emit_insn (gen_vsx_sign_extend_si_v2di(operands[0], operands[1]));
-  DONE;
-})
-
 ;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
 ;; power10.  On earlier systems, the machine independent code will generate a
 ;; shift left to sign extend the 64-bit value to 128-bit.
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
index fdcad019b96..03c0f1201e4 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -34,7 +34,12 @@ int main ()
   /* test sign extend byte to word */
   vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
 -1, -2, -3, -4, -5, -6, -7, -8};
+
+#ifdef __BIG_ENDIAN__
+  vec_expected_wi = (vector signed int) {4, 8, -4, -8};
+#else
   vec_expected_wi = (vector signed int) {1, 5, -1, -5};
+#endif

   vec_result_wi = vec_signexti (vec_arg_qi);

@@ -54,7 +59,12 @@ int main ()
   /* test sign extend byte to double */
   vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
-1, -2, -3, -4, -5, -6, -7, -8};
+
+#ifdef __BIG_ENDIAN__
+  vec_expected_di = (vector signed long long int){8, -8};
+#else
   vec_expected_di = (vector signed long long int){1, -1};
+#endif

   vec_result_di = vec_signextll(vec_arg_qi);

@@ -72,7 +82,12 @@ int main ()

   /* test sign extend short to word */
   vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
+
+#ifdef __BIG_ENDIAN__
+  vec_expected_wi = (vector signed int){2, 4, -2, -4};
+#else
   vec_expected_wi = (vector signed int){1, 3, -1, -3};
+#endif

   vec_result_wi = vec_signexti(vec_arg_hi);

@@ -90,7 +105,12 @@ int main ()

   /* test sign
