Re: [r14-5930 Regression] FAIL: gcc.c-torture/compile/libcall-2.c -Os (test for excess errors) on Linux/x86_64

2023-11-28 Thread Jakub Jelinek
On Wed, Nov 29, 2023 at 07:51:15AM +0100, Jose E. Marchesi wrote:
> > FAIL: gcc.c-torture/compile/libcall-2.c   -O1  (test for excess errors)
> > FAIL: gcc.c-torture/compile/libcall-2.c -O2 -flto
> > -fno-use-linker-plugin -flto-partition=none (test for excess errors)
> > FAIL: gcc.c-torture/compile/libcall-2.c -O2 -flto -fuse-linker-plugin
> > -fno-fat-lto-objects (test for excess errors)
> > FAIL: gcc.c-torture/compile/libcall-2.c   -O2  (test for excess errors)
> > FAIL: gcc.c-torture/compile/libcall-2.c   -O3 -g  (test for excess errors)
> > FAIL: gcc.c-torture/compile/libcall-2.c   -Os  (test for excess errors)
> 
> Sorry about the regression.
> 
> I installed the patch below to skip the test if the target is not x86_64
> in lp64, as obvious.  This should fix the issue.
> 
> From 4ed0740c6e807460ce79a351094329fdeb551545 Mon Sep 17 00:00:00 2001
> From: "Jose E. Marchesi" 
> Date: Wed, 29 Nov 2023 07:44:59 +0100
> Subject: [PATCH] testsuite: fix gcc.c-torture/compile/libcall-2.c in -m32
> 
> This test relies on having __int128 in x86_64 targets, which is only
> available in -m64.
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.c-torture/compile/libcall-2.c: Skip test in -m32.
> ---
>  gcc/testsuite/gcc.c-torture/compile/libcall-2.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c 
> b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
> index b33944c83ff..9b889172025 100644
> --- a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
> +++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
> @@ -2,6 +2,8 @@
> indirect calls.  */
>  
>  /* { dg-do compile } */
> +/* __int128 is not supported in x86 -m32.  */
> +/* { dg-skip-if "" { ! { x86_64-*-* && { ! ilp32 } } } } */
>  /* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
>  /* { dg-final { scan-assembler "globl\t__divti3" } } */

This is not correct.
When a test uses __int128, it should be guarded with the int128 effective
target.
But as the test doesn't really test anything on non-x86 or ia32,
it doesn't belong in gcc.c-torture/compile/ at all; it is an x86-specific
test which should be moved to gcc.target/i386/libcall-1.c
and should have
/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -mcmodel=large" } */
/* { dg-final { scan-assembler "globl\t__divti3" } } */
I guess no need to bother with the extra guard for -mcmodel=large,
because -m32/-mx32 don't have __int128 support, and x86_64-*-*
is incorrect anyway (because with that target one can have all
of -m32/-m64/-mx32).
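
For illustration, the moved test could look roughly like the sketch below.
Only the three dg- directives come from the suggestion above; the file body
(the globals a and b and the function foo) is an assumption, chosen because
a 128-bit division is what makes GCC call __divti3 on x86-64.

```c
/* Hypothetical contents for gcc.target/i386/libcall-1.c.  Only the
   dg- directives are taken from the mail; the body is an assumption.  */
/* { dg-do compile { target int128 } } */
/* { dg-options "-O2 -mcmodel=large" } */
/* { dg-final { scan-assembler "globl\t__divti3" } } */

__int128 a, b;

/* x86-64 has no single instruction for a 128-bit signed division,
   so GCC emits a call to the __divti3 libcall here.  */
void
foo (void)
{
  a = a / b;
}
```

With the int128 effective target on dg-do, the test is skipped wherever
__int128 is unavailable, so no separate -m32/-mx32 guard is needed.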

Jakub



[Bug target/111107] i686-w64-mingw32 does not realign stack when __attribute__((aligned)) or __attribute__((vector_size)) are used

2023-11-28 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111107

--- Comment #16 from Eric Botcazou  ---
> Why use STACK_REALIGN_DEFAULT rather than PREFERRED_STACK_BOUNDARY_DEFAULT?

We know that it works since Solaris has used it for ages, so this alternate way
could be deemed riskier.  But no strong opinion, if the consensus is to use the
latter, then so be it.

Re: [PATCH] Take register pressure into account for vec_construct when the components are not loaded from memory.

2023-11-28 Thread Richard Biener
On Tue, Nov 28, 2023 at 8:54 AM liuhongt  wrote:
>
> For vec_construct, the components must be live at the same time if
> they're not loaded from memory; when the number of those components
> exceeds the available registers, spills happen. Try to account for that
> with a rough estimate.
> ??? Ideally, we should have an overall estimation of register pressure
> if we know the live range of all variables.
>
> The patch can avoid regressions due to, e.g., vec_construct with 32 chars.
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>
> Ok for trunk?

Hmm, I would suggest you put reg_needed into the class and accumulate it
over all vec_constructs; with your patch you pessimize a single v32qi
over two separate v16qi, for example.  Also, currently the whole block is
gated with INTEGRAL_TYPE_P, but register pressure would also be
a concern for floating-point vectors.  finish_cost would then apply an
adjustment.

'target_avail_regs' is for GENERAL_REGS, does that include APX regs?
I don't see anything similar for FP regs, but I guess the target should know
or maybe there's a #regs in regclass query already.

That said, this kind of adjustment looks somewhat appealing.
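
To make the accumulate-then-adjust idea concrete, here is a minimal sketch
in plain C.  It is a toy model, not the ix86_vector_costs class: the struct,
the function names and the cost numbers are all hypothetical, and it only
shows the shape of the bookkeeping (count in add_stmt_cost, penalize once in
finish_cost).

```c
/* Toy stand-in for a per-loop cost object.  All names are hypothetical;
   this is not GCC's ix86_vector_costs.  */
struct toy_costs
{
  unsigned body_cost;   /* Sum of per-statement costs.  */
  unsigned reg_needed;  /* Components live across all vec_constructs.  */
};

/* Record one vec_construct whose N_COMPONENTS inputs are not loaded
   from memory.  No spill penalty is applied at this point.  */
static void
toy_add_vec_construct (struct toy_costs *c, unsigned stmt_cost,
                       unsigned n_components)
{
  c->body_cost += stmt_cost;
  c->reg_needed += n_components;
}

/* Apply the spill adjustment once, on the accumulated total.  */
static unsigned
toy_finish_cost (const struct toy_costs *c, unsigned avail_regs,
                 unsigned spill_cost)
{
  unsigned cost = c->body_cost;
  if (c->reg_needed > avail_regs)
    cost += spill_cost * (c->reg_needed - avail_regs);
  return cost;
}
```

In this model one 32-component construct and two 16-component constructs
accumulate the same reg_needed, so neither form is penalized relative to
the other.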

Richard.

> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Take
> register pressure into account for vec_construct when the
> components are not loaded from memory.
> ---
>  gcc/config/i386/i386.cc | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 683ac643bc8..f8417555930 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -24706,6 +24706,7 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
>unsigned i;
>tree op;
> +  unsigned reg_needed = 0;
>FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
> if (TREE_CODE (op) == SSA_NAME)
>   TREE_VISITED (op) = 0;
> @@ -24737,11 +24738,30 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>   && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
>   || !VECTOR_TYPE_P (TREE_TYPE
> (TREE_OPERAND (gimple_assign_rhs1 (def), 
> 0))
> -   stmt_cost += ix86_cost->sse_to_integer;
> +   {
> + stmt_cost += ix86_cost->sse_to_integer;
> + reg_needed++;
> +   }
> }
>FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
> if (TREE_CODE (op) == SSA_NAME)
>   TREE_VISITED (op) = 0;
> +
> +  /* For vec_construct, the components must be live at the same time if
> +they're not loaded from memory, when the number of those components
> +exceeds available registers, spill happens. Try to account that with a
> +rough estimation. Currently only handle integral modes since scalar fp
> +shares sse_regs with vectors.
> +??? Ideally, we should have an overall estimation of register pressure
> +if we know the live range of all variables.  */
> +  if (!fp && kind == vec_construct
> + && reg_needed > target_avail_regs)
> +   {
> + unsigned spill_cost = ix86_builtin_vectorization_cost (scalar_store,
> +vectype,
> +misalign);
> + stmt_cost += spill_cost * (reg_needed - target_avail_regs);
> +   }
>  }
>if (stmt_cost == -1)
>  stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
> --
> 2.31.1
>


Re: Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations

2023-11-28 Thread chenglulu



On 2023/11/29 at 3:12 PM, Xi Ruoyao wrote:

On Mon, 2023-11-20 at 08:47 +0800, Xi Ruoyao wrote:

The [1/5] patch is the PR112578 fix at
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html.
It has been changed to remove the nearbyint pattern (because nearbyint
should not raise FE_INEXACT even if -ffp-int-builtin-inexact).
As other patches depending on the simd.md file introduced by this, sending
it as the first of this series.

As many LASX instructions are only differentiated from the corresponding
LSX instruction with operand length, create simd.md file to contain the
RTX templates sharable by LSX and LASX.  This makes the code cleaner and
easier to maintain.

The [2/5] and [3/5] patches make vector product highpart and rotate
shift operations for GNU vectors and auto vectorization.

The [4/5] patch is a simple code cleanup, with no function change.

The [5/5] patch uses LSX for FP scalar rounding operations if LSX is
available and -ffp-int-builtin-inexact.  We do this because the base FP
ISA does not have such instructions.  Using LSX is overkill, but still
much faster than calling libc functions.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Pushed r14-5950 .. r14-5954 with minor changes: an FSF copyright
disclaimer is added into simd.md in the 1st patch, and an unused
match_scratch is removed from 2 in the 5th
patch.


Thank you very much!:-)

Xi Ruoyao (5):
   LoongArch: Fix usage of LSX and LASX frint/ftint instructions
     [PR112578]
   LoongArch: Use standard pattern name and RTX code for LSX/LASX muh
     instructions
   LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate
     shift
   LoongArch: Remove lrint_allow_inexact
   LoongArch: Use LSX for scalar FP rounding with explicit rounding mode

  gcc/config/loongarch/lasx.md  | 283 -
  gcc/config/loongarch/loongarch-builtins.cc    |  52 ++--
  gcc/config/loongarch/loongarch.md |  12 +-
  gcc/config/loongarch/lsx.md   | 293 --
  gcc/config/loongarch/simd.md  | 268 
  .../loongarch/vect-frint-no-inexact.c |  48 +++
  .../loongarch/vect-frint-scalar-no-inexact.c  |  23 ++
  .../gcc.target/loongarch/vect-frint-scalar.c  |  43 +++
  .../gcc.target/loongarch/vect-frint.c |  85 +
  .../loongarch/vect-ftint-no-inexact.c |  44 +++
  .../gcc.target/loongarch/vect-ftint.c |  83 +
  gcc/testsuite/gcc.target/loongarch/vect-muh.c |  36 +++
  .../gcc.target/loongarch/vect-rotr.c  |  36 +++
  13 files changed, 701 insertions(+), 605 deletions(-)
  create mode 100644 gcc/config/loongarch/simd.md
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c




[Bug target/112760] New: [14 Regression] wrong code with -O2 -fno-dce -fno-guess-branch-probability -m8bit-idiv -mavx --param=max-cse-insns=0 and __builtin_add_overflow_p()

2023-11-28 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112760

Bug ID: 112760
   Summary: [14 Regression] wrong code with -O2 -fno-dce
-fno-guess-branch-probability -m8bit-idiv -mavx
--param=max-cse-insns=0 and __builtin_add_overflow_p()
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: i686-pc-linux-gnu

Created attachment 56715
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56715&action=edit
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc -m32 -O2 -fno-dce -fno-guess-branch-probability
-m8bit-idiv -mavx --param=max-cse-insns=0 testcase.c
$ ./a.out 
Aborted

The code for __builtin_add_overflow() check looks wrong:
# testcase.c:9:   u16 x = __builtin_add_overflow_p (a, g, (u16) 0);
        add     eax, ecx        # tmp110, g.0_1
        mov     eax, 1  # tmp118,
        setc    bl      #, _8
        cmovne  ebx, eax        # _8,, _8, tmp118

Comparing the code without -mavx, the breakage can be observed better:
$ diff -u a-testcase.GOOD.s a-testcase.BAD.s 
--- a-testcase.GOOD.s   2023-11-29 08:34:39.978807709 +0100
+++ a-testcase.BAD.s    2023-11-29 08:32:27.458809580 +0100
@@ -4,7 +4,7 @@
 #  compiled by GNU C version 14.0.0 20231128 (experimental), GMP version
6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version isl-0.26-GMP

 # GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
-# options passed: -m32 -m8bit-idiv -masm=intel -mtune=generic -march=x86-64
-O2 -fno-dce -fno-guess-branch-probability --param=max-cse-insns=0
+# options passed: -m32 -m8bit-idiv -mavx -masm=intel -mtune=generic
-march=x86-64 -O2 -fno-dce -fno-guess-branch-probability
--param=max-cse-insns=0
        .text
        .p2align 4
        .globl  foo0
@@ -28,10 +28,8 @@
        movzx   esi, WORD PTR [esp+16]  # b, b
 # testcase.c:9:   u16 x = __builtin_add_overflow_p (a, g, (u16) 0);
        add     eax, ecx        # tmp110, g.0_1
-       movzx   edx, ax # tmp111, tmp110
-       setc    bl      #, _8
-       cmp     eax, edx        # tmp110, tmp111
        mov     eax, 1  # tmp118,
+       setc    bl      #, _8
        cmovne  ebx, eax        # _8,, _8, tmp118
 # testcase.c:10:   g -= g / b;
        mov     eax, ecx        # tmp119, g.0_1


The "cmovne" instruction is using the Z flag from a different comparison.
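
For readers unfamiliar with the builtin, the sketch below shows what
__builtin_add_overflow_p is documented to compute for a 16-bit result type.
It is not the attached testcase (which is not reproduced here); the helper
name add_overflows_u16 is made up for the example, and the builtin is a GCC
extension.

```c
typedef unsigned short u16;

/* Returns nonzero when a + b, computed with infinite precision, does
   not fit in u16.  The third argument only supplies the result type;
   its value is ignored.  __builtin_add_overflow_p is a GCC extension.  */
static int
add_overflows_u16 (u16 a, u16 b)
{
  return __builtin_add_overflow_p (a, b, (u16) 0);
}
```

In the GOOD assembly above, the movzx/cmp pair sets the Z flag that cmovne
consumes; the BAD sequence drops that comparison.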



$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-5940-20231128183456-g3d104d93a70-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu
--with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r14-5940-20231128183456-g3d104d93a70-checking-yes-rtl-df-extra-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231128 (experimental) (GCC)

Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

2023-11-28 Thread Richard Biener
On Tue, 28 Nov 2023, Jeff Law wrote:

> 
> 
> On 11/28/23 00:50, Richard Biener wrote:
> 
> > 
> > There's no way to distinguish a partial vs. non-partial MEM on RTL and
> > while without the bogus MEM_ATTR the alias oracle pieces that
> > miscompiled the original case are fended off we still see the load/store
> > as full given they have a mode with a size - that for example means
> > that DSE can elide a previous store to a masked part.  Eventually
> > that's fended off by using an UNSPEC, but whether the RTL IL has
> > the correct semantics is questionable.
> > 
> > That said, I did propose scrapping the MEM_EXPR which I think is
> > the correct thing to do unless we want to put a CALL_EXPR into it
> > (nothing would use that at the moment) or re-do MEM_EXPR and instead
> > have an ao_ref (or sth slightly more complete) instead of the current
> > MEM_ATTRs - but that would be a lot of work.
> > 
> > This leaves the question wrt. semantics of for example x86 mask_store:
> > 
> > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
> >  (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
> >  (unspec:V4DF [
> >  (reg:V4DI 104 [ mask__16.8 ])
> >  (reg:V4DF 105 [ vect_cst__42 ])
> >  (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
> >  (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
> >  ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
> >   (nil))
> > 
> > it uses a read-modify-write which makes it safe for DSE.
> Agreed.
> 
> 
>   mask_load
> > looks like
> > 
> > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
> >  (unspec:V4DF [
> >  (reg:V4DI 114 [ mask__8.8 ])
> >  (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
> >  (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
> >  ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
> >   (nil))
> So with the mem:V4DF inside the unspec, ISTM we must treat that as a potential
> full read, but we can't rely on it being a full read.  I don't think UNSPEC
> semantics are that it must read/consume all its operands in full, just that it
> might.  That might be worth a documentation clarification.
> 
> 
> > 
> > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
> > similarly bogus MEM_EXPR) indicating the loads
> > are _not_ partial.  That means the disambiguation against a store
> > to an object that's smaller than V4DF is still possible.
> > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
> > at the mode.  As discussed using a BLKmode MEM _might_ be a way
> > out but I didn't try what will happen then (patterns would need to
> > be adjusted I guess).
> > 
> > That said, I'm happy to commit the partial fix, scrapping the
> > bogus MEM_EXPRs.
> > 
> > OK for that?
> Works for me.

I'm re-testing the change and will push.  If the UNSPEC uses are really
OK I think we're set.  We can incrementally try to restore missing
alias info.
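
As a rough scalar model of why the read-modify-write form of mask_store is
safe for DSE, consider the sketch below.  It is only an illustration in
plain C with made-up names (model_mask_store), not RTL and not GCC code: the
old contents of the destination are an input of the operation, so an earlier
store to a masked-off lane is still observable afterwards and cannot be
treated as dead.

```c
#include <stddef.h>

/* Scalar model of a masked store written as read-modify-write:
   every lane of DST is read, and lanes whose mask entry is zero keep
   their old value.  Names are illustrative only.  */
static void
model_mask_store (double *dst, const double *src,
                  const unsigned char *mask, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    {
      double old = dst[i];              /* the "read" half */
      dst[i] = mask[i] ? src[i] : old;  /* the "modify-write" half */
    }
}
```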

Richard.

> jeff
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH]middle-end: refactor vectorizable_live_operation into helper method for codegen

2023-11-28 Thread Richard Biener
On Mon, 27 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> To make code review of the updates to add multiple exit supports to
> vectorizable_live_operation easier I've extracted the refactoring part to
> its own patch.
> 
> This patch is a straight extract of the function with no functional changes.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vectorizable_live_operation_1): New.
>   (vectorizable_live_operation): Extract code to 
> vectorizable_live_operation_1.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..df5e1d28fac2ce35e71decdec0d8e31fb75557f5
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10481,6 +10481,95 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>return true;
>  }
>  
> +
> +/* Function vectorizable_live_operation_1.
> +   helper function for vectorizable_live_operation.  */
> +tree
> +vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> +stmt_vec_info stmt_info, edge exit_e,
> +tree vectype, int ncopies, slp_tree slp_node,
> +tree bitsize, tree bitstart, tree vec_lhs,
> +tree lhs_type, gimple_stmt_iterator *exit_gsi)
> +{
> +  basic_block exit_bb = exit_e->dest;
> +  gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS 
> (loop_vinfo));
> +
> +  tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> +  gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> +  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
> +SET_PHI_ARG_DEF (phi, i, vec_lhs);
> +
> +  gimple_seq stmts = NULL;
> +  tree new_tree;
> +  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> +{
> +  /* Emit:
> +  SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> +  where VEC_LHS is the vectorized live-out result and MASK is
> +  the loop mask for the final iteration.  */
> +  gcc_assert (ncopies == 1 && !slp_node);
> +  gimple_seq tem = NULL;
> +  gimple_stmt_iterator gsi = gsi_last (tem);
> +  tree len = vect_get_loop_len (loop_vinfo, &gsi,
> + &LOOP_VINFO_LENS (loop_vinfo),
> + 1, vectype, 0, 0);
> +  /* BIAS - 1.  */
> +  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +  tree bias_minus_one
> + = int_const_binop (MINUS_EXPR,
> +build_int_cst (TREE_TYPE (len), biasval),
> +build_one_cst (TREE_TYPE (len)));
> +  /* LAST_INDEX = LEN + (BIAS - 1).  */
> +  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> +  len, bias_minus_one);
> +  /* This needs to implement extraction of the first index, but not sure
> +  how the LEN stuff works.  At the moment we shouldn't get here since
> +  there's no LEN support for early breaks.  But guard this so there's
> +  no incorrect codegen.  */
> +  gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> +
> +  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
> +  tree scalar_res
> + = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> + vec_lhs_phi, last_index);
> +  /* Convert the extracted vector element to the scalar type.  */
> +  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> +}
> +  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +{
> +  /* Emit:
> +  SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> +  where VEC_LHS is the vectorized live-out result and MASK is
> +  the loop mask for the final iteration.  */
> +  gcc_assert (!slp_node);
> +  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
> +  gimple_seq tem = NULL;
> +  gimple_stmt_iterator gsi = gsi_last (tem);
> +  tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
> +   &LOOP_VINFO_MASKS (loop_vinfo),
> +   1, vectype, 0);
> +
> +  gimple_seq_add_seq (&stmts, tem);
> +   tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> +mask, vec_lhs_phi);
> +  /* Convert the extracted vector element to the scalar type.  */
> +  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> +}
> +  else
> +{
> +  tree bftype = TREE_TYPE (vectype);
> +  if (VECTOR_BOOLEAN_TYPE_P (vectype))
> + bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
> +  new_tree = build3 (BIT_FIELD_REF, bftype, vec_lhs_phi, bitsize, bitstart);
> +  new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
> +&stmts, true, NULL_TREE);
> +}
> +  *exit_gsi = gsi_after_labels (exit_bb);
> +  if (stmts)
> +gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> +  return 

Pushed: [PATCH v3 0/5] LoongArch: SIMD fixes and optimizations

2023-11-28 Thread Xi Ruoyao
On Mon, 2023-11-20 at 08:47 +0800, Xi Ruoyao wrote:
> The [1/5] patch is the PR112578 fix at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html.
> It has been changed to remove the nearbyint pattern (because nearbyint
> should not raise FE_INEXACT even if -ffp-int-builtin-inexact).
> As other patches depending on the simd.md file introduced by this, sending
> it as the first of this series.
> 
> As many LASX instructions are only differentiated from the corresponding
> LSX instruction with operand length, create simd.md file to contain the
> RTX templates sharable by LSX and LASX.  This makes the code cleaner and
> easier to maintain.
> 
> The [2/5] and [3/5] patches make vector product highpart and rotate
> shift operations for GNU vectors and auto vectorization.
> 
> The [4/5] patch is a simple code cleanup, with no function change.
> 
> The [5/5] patch uses LSX for FP scalar rounding operations if LSX is
> available and -ffp-int-builtin-inexact.  We do this because the base FP
> ISA does not have such instructions.  Using LSX is overkill, but still
> much faster than calling libc functions.
> 
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Pushed r14-5950 .. r14-5954 with minor changes: an FSF copyright
disclaimer is added into simd.md in the 1st patch, and an unused
match_scratch is removed from 2 in the 5th
patch.

> Xi Ruoyao (5):
>   LoongArch: Fix usage of LSX and LASX frint/ftint instructions
>     [PR112578]
>   LoongArch: Use standard pattern name and RTX code for LSX/LASX muh
>     instructions
>   LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate
>     shift
>   LoongArch: Remove lrint_allow_inexact
>   LoongArch: Use LSX for scalar FP rounding with explicit rounding mode
> 
>  gcc/config/loongarch/lasx.md  | 283 -
>  gcc/config/loongarch/loongarch-builtins.cc    |  52 ++--
>  gcc/config/loongarch/loongarch.md |  12 +-
>  gcc/config/loongarch/lsx.md   | 293 --
>  gcc/config/loongarch/simd.md  | 268 
>  .../loongarch/vect-frint-no-inexact.c |  48 +++
>  .../loongarch/vect-frint-scalar-no-inexact.c  |  23 ++
>  .../gcc.target/loongarch/vect-frint-scalar.c  |  43 +++
>  .../gcc.target/loongarch/vect-frint.c |  85 +
>  .../loongarch/vect-ftint-no-inexact.c |  44 +++
>  .../gcc.target/loongarch/vect-ftint.c |  83 +
>  gcc/testsuite/gcc.target/loongarch/vect-muh.c |  36 +++
>  .../gcc.target/loongarch/vect-rotr.c  |  36 +++
>  13 files changed, 701 insertions(+), 605 deletions(-)
>  create mode 100644 gcc/config/loongarch/simd.md
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
>  create mode 100644 
> gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[Bug driver/112759] [13 regression] mips -march=native detection broken with gcc 13+

2023-11-28 Thread matoro_gcc_bugzilla at matoro dot tk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112759

matoro  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matoro_gcc_bugzilla@matoro.tk

--- Comment #1 from matoro  ---
Created attachment 56714
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56714&action=edit
/proc/cpuinfo

[Bug driver/112759] New: [13 regression] mips -march=native detection broken with gcc 13+

2023-11-28 Thread matoro_gcc_bugzilla at matoro dot tk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112759

Bug ID: 112759
   Summary: [13 regression] mips -march=native detection broken
with gcc 13+
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: matoro_gcc_bugzilla at matoro dot tk
  Target Milestone: ---

Since gcc 13 - and presumably 66c48be23e0fa5ee7474b4b078e013f901c71eed since
that is the only recent change to this area - detection of -march=native on
mips is broken, defaulting to mips1.

With gcc 13:
# gcc -march=native -Q --help=target | grep "arch=" | head -n 1
  -march=ISA                            mips1
# gcc --version
gcc (Gentoo 13.2.1_p20231014 p9) 13.2.1 20231014

With gcc 12:
# gcc -march=native -Q --help=target | grep "arch=" | head -n 1
  -march=ISA                            octeon2
# gcc --version
gcc (Gentoo 12.3.1_p20230825 p2) 12.3.1 20230825

Contents of my /proc/cpuinfo are attached.  strace seems to indicate that it is
reading the data at least:

openat(AT_FDCWD, "/proc/cpuinfo", O_RDONLY) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT|AT_EMPTY_PATH,
STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0,
stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0
read(3, "system type\t\t: EBB6800 (CN6880p2"..., 1024) = 1024
close(3)= 0
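
For context, picking a CPU from /proc/cpuinfo comes down to finding one
"key : value" line in the text that was read.  The sketch below shows that
step only.  It is a hypothetical helper (cpuinfo_field), not the GCC
driver's code, and which key the driver actually inspects is not stated in
this report.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of the first "KEY<tabs>: value" line in TEXT into OUT
   (at most OUT_LEN - 1 characters).  Returns 1 on success, 0 if the
   key is absent.  Hypothetical helper, not the GCC driver's code.  */
static int
cpuinfo_field (const char *text, const char *key, char *out, size_t out_len)
{
  size_t key_len = strlen (key);
  if (out_len == 0)
    return 0;
  for (const char *line = text; line && *line; )
    {
      const char *eol = strchr (line, '\n');
      size_t len = eol ? (size_t) (eol - line) : strlen (line);
      if (len > key_len && strncmp (line, key, key_len) == 0)
        {
          const char *p = line + key_len;
          const char *end = line + len;
          while (p < end && (*p == '\t' || *p == ' '))
            p++;                        /* skip padding before ':' */
          if (p < end && *p == ':')
            {
              p++;
              while (p < end && *p == ' ')
                p++;                    /* skip padding after ':' */
              size_t n = (size_t) (end - p);
              if (n >= out_len)
                n = out_len - 1;        /* truncate to fit OUT */
              memcpy (out, p, n);
              out[n] = '\0';
              return 1;
            }
        }
      line = eol ? eol + 1 : NULL;
    }
  return 0;
}
```

A lookup like this can only succeed if the wanted line is inside the text
that was actually read.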

[Bug target/112578] LoongArch: Wrong code -with -mlsx -fno-fp-int-builtin-inexact

2023-11-28 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112578

Xi Ruoyao  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Xi Ruoyao  ---
Fixed for trunk.

[Bug target/112578] LoongArch: Wrong code -with -mlsx -fno-fp-int-builtin-inexact

2023-11-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112578

--- Comment #5 from GCC Commits  ---
The master branch has been updated by Xi Ruoyao :

https://gcc.gnu.org/g:530348c418d9ec28aac5b151c15405bfb860e5c4

commit r14-5950-g530348c418d9ec28aac5b151c15405bfb860e5c4
Author: Xi Ruoyao 
Date:   Sat Nov 18 04:48:20 2023 +0800

LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

The usage of LSX and LASX frint/ftint instructions had some problems:

1. These instructions raise FE_INEXACT, which is not allowed with
   -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
   (the only exceptions are rint, lrint, and llrint).
2. The "frint" instruction without explicit rounding mode is used for
   roundM2, this is incorrect because roundM2 is defined "rounding
   operand 1 to the *nearest* integer, rounding away from zero in the
   event of a tie".  We actually don't have such an instruction.  Our
   frintrne instruction is roundevenM2 (unfortunately, this is not
   documented).
3. These define_insn's are written in a way not so easy to hack.
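
Item 2 turns on how ties are broken: roundM2 rounds halfway cases away from
zero, while roundevenM2 rounds them to the nearest even integer.  The sketch
below illustrates the difference with plain integer arithmetic.  It is not
LoongArch code; the function names are hypothetical, and inputs are passed
as twice_x = 2 * x so that halfway cases are exactly representable.

```c
/* Round x = twice_x / 2 to an integer, ties away from zero
   (the roundM2 behaviour described above).  */
static long
round_half_away (long twice_x)
{
  if (twice_x % 2 == 0)
    return twice_x / 2;                 /* already an integer */
  return twice_x > 0 ? (twice_x + 1) / 2 : (twice_x - 1) / 2;
}

/* Round x = twice_x / 2 to an integer, ties to even
   (the roundevenM2 behaviour described above).  */
static long
round_half_even (long twice_x)
{
  if (twice_x % 2 == 0)
    return twice_x / 2;                 /* already an integer */
  long lo = (twice_x - 1) / 2;          /* floor (x); exact, twice_x - 1 is even */
  return lo % 2 == 0 ? lo : lo + 1;     /* pick the even neighbour */
}
```

For x = 2.5 (twice_x = 5) the first function gives 3 and the second gives 2,
which is why an instruction implementing one cannot stand in for the other.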

So I removed these instructions and created a "simd.md" file, then added
them and the corresponding expanders there.  The advantage of the
simd.md file is we don't need to duplicate the RTL template twice (in
lsx.md and lasx.md).

gcc/ChangeLog:

PR target/112578
* config/loongarch/lsx.md (UNSPEC_LSX_VFTINT_S,
UNSPEC_LSX_VFTINTRNE, UNSPEC_LSX_VFTINTRP,
UNSPEC_LSX_VFTINTRM, UNSPEC_LSX_VFRINTRNE_S,
UNSPEC_LSX_VFRINTRNE_D, UNSPEC_LSX_VFRINTRZ_S,
UNSPEC_LSX_VFRINTRZ_D, UNSPEC_LSX_VFRINTRP_S,
UNSPEC_LSX_VFRINTRP_D, UNSPEC_LSX_VFRINTRM_S,
UNSPEC_LSX_VFRINTRM_D): Remove.
(ILSX, FLSX): Move into ...
(VIMODE): Move into ...
(FRINT_S, FRINT_D): Remove.
(frint_pattern_s, frint_pattern_d, frint_suffix): Remove.
(lsx_vfrint_, lsx_vftint_s__,
lsx_vftintrne_w_s, lsx_vftintrne_l_d, lsx_vftintrp_w_s,
lsx_vftintrp_l_d, lsx_vftintrm_w_s, lsx_vftintrm_l_d,
lsx_vfrintrne_s, lsx_vfrintrne_d, lsx_vfrintrz_s,
lsx_vfrintrz_d, lsx_vfrintrp_s, lsx_vfrintrp_d,
lsx_vfrintrm_s, lsx_vfrintrm_d,
v4sf2,
v2df2, round2,
fix_trunc2): Remove.
* config/loongarch/lasx.md: Likewise.
* config/loongarch/simd.md: New file.
(ILSX, ILASX, FLSX, FLASX, VIMODE): ... here.
(IVEC, FVEC): New mode iterators.
(VIMODE): ... here.  Extend it to work for all LSX/LASX vector
modes.
(x, wu, simd_isa, WVEC, vimode, simdfmt, simdifmt_for_f,
elebits): New mode attributes.
(UNSPEC_SIMD_FRINTRP, UNSPEC_SIMD_FRINTRZ, UNSPEC_SIMD_FRINT,
UNSPEC_SIMD_FRINTRM, UNSPEC_SIMD_FRINTRNE): New unspecs.
(SIMD_FRINT): New int iterator.
(simd_frint_rounding, simd_frint_pattern): New int attributes.
(_vfrint_): New
define_insn template for frint instructions.
   
(_vftint__):
Likewise, but for ftint instructions.
(2): New define_expand with
flag_fp_int_builtin_inexact checked.
(l2): Likewise.
(ftrunc2): New define_expand.  It does not require
flag_fp_int_builtin_inexact.
(fix_trunc2): New define_insn_and_split.  It does
not require flag_fp_int_builtin_inexact.
(include): Add lsx.md and lasx.md.
* config/loongarch/loongarch.md (include): Include simd.md,
instead of including lsx.md and lasx.md directly.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vftint_w_s, CODE_FOR_lsx_vftint_l_d,
CODE_FOR_lasx_xvftint_w_s, CODE_FOR_lasx_xvftint_l_d):
Remove.

gcc/testsuite/ChangeLog:

PR target/112578
* gcc.target/loongarch/vect-frint.c: New test.
* gcc.target/loongarch/vect-frint-no-inexact.c: New test.
* gcc.target/loongarch/vect-ftint.c: New test.
* gcc.target/loongarch/vect-ftint-no-inexact.c: New test.

[Bug target/112753] [14 Regression] unrecognizable insn building glibc for s390x

2023-11-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112753

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
   Target Milestone|---                         |14.0

[Bug testsuite/112751] [14 regression] gcc.target/powerpc/pcrel-sibcall-1.c fails after r14-5628-g53ba8d669550d3

2023-11-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112751

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |testsuite
   Target Milestone|---                         |14.0

[Bug sanitizer/112748] memmove(ptr, ptr, n) call optimized out even at -O0 with -fsanitize=undefined

2023-11-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112748

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |documentation
                 CC|                            |dodji at gcc dot gnu.org,
                   |                            |dvyukov at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |kcc at gcc dot gnu.org,
                   |                            |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1
          Component|middle-end                  |sanitizer
   Last reconfirmed|                            |2023-11-29

--- Comment #3 from Richard Biener  ---
Confirmed.  We fold all calls early after gimplification and folding is known
to also affect -O0.  This behavior is independent of sanitizing which happens
partly before and partly only after this folding takes place.

We also simplify 1 + 1 or x + 0 with -O0 or turn printf("%s", "Hello")
into puts("Hello") for example.

Documenting this behavior might be good.  Gating some of the simplifications
on optimization might be also reasonable.
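
As a rough illustration of the kind of call this comment is about (a sketch that is not taken from the PR; the helper names `self_move` and `greet` are made up here), the two functions below contain calls that GCC may simplify early. Whether a given call actually survives depends on the compiler version and flags, so looking at the `-S` output is the only reliable check.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: moves a buffer onto itself.  The call has no
   visible effect, so the compiler may drop it; per the comment above
   this folding is not limited to optimized builds.  */
size_t
self_move (char *buf, size_t n)
{
  memmove (buf, buf, n);
  return strlen (buf);
}

/* Hypothetical helper: a printf call whose format is "%s\n" with a
   constant string argument and an unused result is a typical candidate
   for being rewritten into a puts call.  */
void
greet (void)
{
  printf ("%s\n", "Hello");
}
```

Calling `self_move` on a NUL-terminated buffer leaves the contents unchanged whether or not the `memmove` call is kept, which is why the folding is invisible to the program but visible to a sanitizer that wanted to instrument the call.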

[Bug target/112743] RISC-V: building FAIL with -march=rv64(or rv32)gc_zve32f_zvfh_zfh

2023-11-28 Thread pan2.li at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112743

--- Comment #4 from Li Pan  ---
There may be another ICE for zve32f, will double-check about the details.

Re: [r14-5930 Regression] FAIL: gcc.c-torture/compile/libcall-2.c -Os (test for excess errors) on Linux/x86_64

2023-11-28 Thread Jose E. Marchesi


> On Linux/x86_64,
>
> f31a019d1161ec78846473da743aedf49cca8c27 is the first bad commit
> commit f31a019d1161ec78846473da743aedf49cca8c27
> Author: Jose E. Marchesi 
> Date:   Fri Nov 24 06:30:28 2023 +0100
>
> Emit funcall external declarations only if actually used.
>
> caused
>
> FAIL: gcc.c-torture/compile/libcall-2.c   -O0  (test for excess errors)
> FAIL: gcc.c-torture/compile/libcall-2.c   -O1  (test for excess errors)
> FAIL: gcc.c-torture/compile/libcall-2.c -O2 -flto
> -fno-use-linker-plugin -flto-partition=none (test for excess errors)
> FAIL: gcc.c-torture/compile/libcall-2.c -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects (test for excess errors)
> FAIL: gcc.c-torture/compile/libcall-2.c   -O2  (test for excess errors)
> FAIL: gcc.c-torture/compile/libcall-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.c-torture/compile/libcall-2.c   -Os  (test for excess errors)

Sorry about the regression.

I installed the patch below to skip the test if the target is not x86_64
in lp64, as obvious.  This should fix the issue.

>From 4ed0740c6e807460ce79a351094329fdeb551545 Mon Sep 17 00:00:00 2001
From: "Jose E. Marchesi" 
Date: Wed, 29 Nov 2023 07:44:59 +0100
Subject: [PATCH] testsuite: fix gcc.c-torture/compile/libcall-2.c in -m32

This test relies on having __int128 in x86_64 targets, which is only
available in -m64.

gcc/testsuite/ChangeLog

* gcc.c-torture/compile/libcall-2.c: Skip test in -m32.
---
 gcc/testsuite/gcc.c-torture/compile/libcall-2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c 
b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
index b33944c83ff..9b889172025 100644
--- a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
@@ -2,6 +2,8 @@
indirect calls.  */
 
 /* { dg-do compile } */
+/* __int128 is not supported in x86 -m32.  */
+/* { dg-skip-if "" { ! { x86_64-*-* && { ! ilp32 } } } } */
 /* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
 /* { dg-final { scan-assembler "globl\t__divti3" } } */
 
-- 
2.30.2
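
For readers unfamiliar with why the test has to be skipped in -m32: `__int128` only exists on targets and ABIs that support it. A minimal sketch of guarding code on that, assuming GCC's predefined macro `__SIZEOF_INT128__` (the names `wide_int_t`, `wide_div` and `have_int128` are hypothetical and not part of the testsuite):

```c
/* __SIZEOF_INT128__ is predefined by GCC only when __int128 is
   available for the current target/ABI (for example x86_64 with -m64,
   but not with -m32).  */
#ifdef __SIZEOF_INT128__
typedef __int128 wide_int_t;
#define HAVE_WIDE_128 1
#else
typedef long long wide_int_t;
#define HAVE_WIDE_128 0
#endif

/* Hypothetical helper: divide using the widest integer type available.
   With __int128 this is the kind of division that may end up as a
   __divti3 libcall.  */
long long
wide_div (long long a, long long b)
{
  wide_int_t wa = a, wb = b;
  return (long long) (wa / wb);
}

/* Report whether the 128-bit path was compiled in.  */
int
have_int128 (void)
{
  return HAVE_WIDE_128;
}
```

The testsuite itself handles this with the dg-skip-if directive shown in the patch rather than with preprocessor guards.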


RE: [PATCH v1] RISC-V: Bugfix for ICE in block move when zve32f

2023-11-28 Thread Li, Pan2
Committed with the test file rename, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Wednesday, November 29, 2023 2:45 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Bugfix for ICE in block move when zve32f

pr112743-0.c -> pr112743-1.c for consistency.


Otherwise LGTM. No need to send V2.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-11-29 14:37
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for ICE in block move when zve32f
From: Pan Li

The exact_div requires the exactly multiple of the divider.
Unfortunately, the condition will be broken when zve32f in
some cases. For example,

potential_ew is 8
BYTES_PER_RISCV_VECTOR * lmul1 is [4, 4]

This patch would like to ensure the precondition of exact_div
when get_vec_mode.

PR 112743

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Add
precondition check for exact_div.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112743-0.c: New test.

Signed-off-by: Pan Li
---
gcc/config/riscv/riscv-string.cc |  1 +
.../gcc.target/riscv/rvv/base/pr112743-0.c   | 16 
2 files changed, 17 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 3b5e05e2c44..80e3b5981af 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -866,6 +866,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
if (TARGET_MIN_VLEN * lmul <= nunits * BITS_PER_UNIT
/* Avoid loosing the option of using vsetivli .  */
&& (nunits <= 31 * lmul || nunits > 31 * 8)
+ && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew)
&& (riscv_vector::get_vector_mode
(elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * lmul,
 potential_ew)).exists ()))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c
new file mode 100644
index 000..2e62e60d89b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c
@@ -0,0 +1,16 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zve32f_zvfh_zfh -mabi=lp64 -O2" } */
+
+typedef struct test_a {
+  void *x;
+  char a[10];
+  short b[2];
+  int c[1];
+} test_type_t;
+
+void
+test_copy_memory (test_type_t *out, test_type_t *in)
+{
+  *out = *in;
+}
--
2.34.1




[Bug target/112743] RISC-V: building FAIL with -march=rv64(or rv32)gc_zve32f_zvfh_zfh

2023-11-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112743

--- Comment #3 from GCC Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:25a51e98fdd504826a40775a5e5b9ffb336b5aa1

commit r14-5945-g25a51e98fdd504826a40775a5e5b9ffb336b5aa1
Author: Pan Li 
Date:   Wed Nov 29 14:31:30 2023 +0800

RISC-V: Bugfix for ICE in block move when zve32f

The exact_div requires the exactly multiple of the divider.
Unfortunately, the condition will be broken when zve32f in
some cases. For example,

potential_ew is 8
BYTES_PER_RISCV_VECTOR * lmul1 is [4, 4]

This patch would like to ensure the precondition of exact_div
when get_vec_mode.

PR target/112743

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Add
precondition check for exact_div.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112743-1.c: New test.

Signed-off-by: Pan Li 

Re: [PATCH v1] RISC-V: Bugfix for ICE in block move when zve32f

2023-11-28 Thread juzhe.zh...@rivai.ai
pr112743-0.c -> pr112743-1.c for consistency.


Otherwise LGTM. No need to send V2.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-11-29 14:37
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for ICE in block move when zve32f
From: Pan Li 
 
The exact_div requires the exactly multiple of the divider.
Unfortunately, the condition will be broken when zve32f in
some cases. For example,
 
potential_ew is 8
BYTES_PER_RISCV_VECTOR * lmul1 is [4, 4]
 
This patch would like to ensure the precondition of exact_div
when get_vec_mode.
 
PR 112743
 
gcc/ChangeLog:
 
* config/riscv/riscv-string.cc (expand_block_move): Add
precondition check for exact_div.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr112743-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-string.cc |  1 +
.../gcc.target/riscv/rvv/base/pr112743-0.c   | 16 
2 files changed, 17 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c
 
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 3b5e05e2c44..80e3b5981af 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -866,6 +866,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
if (TARGET_MIN_VLEN * lmul <= nunits * BITS_PER_UNIT
/* Avoid loosing the option of using vsetivli .  */
&& (nunits <= 31 * lmul || nunits > 31 * 8)
+ && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew)
&& (riscv_vector::get_vector_mode
(elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * lmul,
 potential_ew)).exists ()))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c
new file mode 100644
index 000..2e62e60d89b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c
@@ -0,0 +1,16 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zve32f_zvfh_zfh -mabi=lp64 -O2" } */
+
+typedef struct test_a {
+  void *x;
+  char a[10];
+  short b[2];
+  int c[1];
+} test_type_t;
+
+void
+test_copy_memory (test_type_t *out, test_type_t *in)
+{
+  *out = *in;
+}
-- 
2.34.1
 
 


[PATCH v1] RISC-V: Bugfix for ICE in block move when zve32f

2023-11-28 Thread pan2 . li
From: Pan Li 

exact_div requires the dividend to be an exact multiple of the divisor.
Unfortunately, this condition can be violated with zve32f in
some cases. For example,

potential_ew is 8
BYTES_PER_RISCV_VECTOR * lmul1 is [4, 4]

This patch ensures that the precondition of exact_div holds
before get_vector_mode is called.
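
The precondition can be sketched with plain integers. This is only an illustration under stated assumptions: GCC's multiple_p and exact_div operate on poly_int values, while `is_multiple` and `guarded_exact_div` below are hypothetical stand-ins working on `unsigned`.

```c
#include <stdbool.h>

/* Plain-integer stand-in for multiple_p: true if A is an exact
   multiple of B.  */
bool
is_multiple (unsigned a, unsigned b)
{
  return b != 0 && a % b == 0;
}

/* Hypothetical guarded division: divide only when the precondition
   holds, mirroring the "&& multiple_p (...)" test that the patch adds
   in front of the exact_div call.  */
bool
guarded_exact_div (unsigned a, unsigned b, unsigned *quotient)
{
  if (!is_multiple (a, b))
    return false;
  *quotient = a / b;
  return true;
}
```

With 4 bytes per vector and potential_ew of 8, `guarded_exact_div (4, 8, &q)` returns false instead of performing an inexact division, which corresponds to the case the patch rejects.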

PR 112743

gcc/ChangeLog:

* config/riscv/riscv-string.cc (expand_block_move): Add
precondition check for exact_div.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112743-0.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-string.cc |  1 +
 .../gcc.target/riscv/rvv/base/pr112743-0.c   | 16 
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 3b5e05e2c44..80e3b5981af 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -866,6 +866,7 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
if (TARGET_MIN_VLEN * lmul <= nunits * BITS_PER_UNIT
/* Avoid loosing the option of using vsetivli .  */
&& (nunits <= 31 * lmul || nunits > 31 * 8)
+   && multiple_p (BYTES_PER_RISCV_VECTOR * lmul, potential_ew)
&& (riscv_vector::get_vector_mode
 (elem_mode, exact_div (BYTES_PER_RISCV_VECTOR * lmul,
 potential_ew)).exists ()))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c
new file mode 100644
index 000..2e62e60d89b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112743-0.c
@@ -0,0 +1,16 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zve32f_zvfh_zfh -mabi=lp64 -O2" } */
+
+typedef struct test_a {
+  void *x;
+  char a[10];
+  short b[2];
+  int c[1];
+} test_type_t;
+
+void
+test_copy_memory (test_type_t *out, test_type_t *in)
+{
+  *out = *in;
+}
-- 
2.34.1



[Bug target/112751] [14 regression] gcc.target/powerpc/pcrel-sibcall-1.c fails after r14-5628-g53ba8d669550d3

2023-11-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112751

Kewen Lin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linkw at gcc dot gnu.org
   Last reconfirmed|                            |2023-11-29
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #2 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #1)
> This is just a testsuite issue. The functions are currently marked as
> noinline.  You can either add -fno-ipa-vrp or mark them with noipa instead.
> I am not sure if noipa here is right due to having some ipa happening due to
> localization.

Thanks for looking into this, I just tested with noipa and confirmed it worked
well.

[Bug c/112758] New: Inconsistent Bitwise AND Operation Result between int and long long int on Different Optimization Levels in GCC Trunk

2023-11-28 Thread guminb at ajou dot ac.kr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112758

Bug ID: 112758
   Summary: Inconsistent Bitwise AND Operation Result between int
and long long int on Different Optimization Levels in
GCC Trunk
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guminb at ajou dot ac.kr
  Target Milestone: ---

Dear GCC Development Team,

I would like to report an inconsistency observed in the GCC RISC-V 64 version
14.0.0 when compiling code involving bitwise AND operations between an `int`
and a `long long int` variable under different optimization levels. The issue
appears when both operands are negative, with results varying significantly
between non-optimized and optimized compilations.

The proof of concept (PoC) code provided below demonstrates this issue. The
code performs a bitwise AND operation between a 32-bit integer with its high
bit set (`globalVar`) and a 64-bit long long integer (`localVar`), both
containing negative values. The expected result of the operation seems to
differ based on the optimization level used during compilation.

PoC Code:

```c
#include <stdio.h>

int globalVar = 0x8000; // 32-bit int with high bit set

int main () {
long long int localVar = 0xffFF00ff; // 64-bit long long int
printf("localVar: 0x%llx\\n", localVar);
printf("globalVar: 0x%llx\\n", (long long int)globalVar);
printf("Result: 0x%llx\\n", ((localVar) & ((long long int) (globalVar))));
return 0;
}
```

Observed Results:

- With **`O0`** optimization, the result of the bitwise AND operation is as
expected (**`0x00ff8000`**).
- With **`O1`**, **`O2`**, **`O3`**, **`Os`**, **`Oz`** optimizations, the
result changes to **`0x8000`**.

Assembly Output:
The assembly output for -O0 and -O1 can be viewed at the following Compiler
Explorer link:

https://godbolt.org/z/fb33vWT7o

- The **`O0`** output shows the expected behavior with explicit casting and AND
operation.
- The **`O1`** output, however, omits the casting and AND operation, leading to
an unexpected result.

I suspect this might be related to how the compiler handles casting or the
bitwise operation under different optimization levels. This inconsistency could
potentially lead to unintended behavior in applications that rely on such
operations, especially when negative values are involved.
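
For reference, the conversion that C applies in such an expression can be sketched as follows. This does not attempt to reproduce the reported difference between optimization levels; it only shows what sign extension of a negative `int` does to the AND. The helper names are hypothetical, and a 32-bit `int` is assumed.

```c
#include <limits.h>

/* Hypothetical helper: AND a 64-bit value with a 32-bit int.  The int
   operand is converted to long long first, which sign-extends it, so a
   negative int contributes all-ones in the upper 32 bits.  */
long long
and_mixed (long long wide, int narrow)
{
  return wide & (long long) narrow;
}

/* Same operation, but going through unsigned int so that the upper
   32 bits of the converted operand are zero.  */
long long
and_mixed_zext (long long wide, int narrow)
{
  return wide & (long long) (unsigned int) narrow;
}
```

With `narrow == INT_MIN`, the first variant keeps the upper 32 bits of `wide` and the second clears them, so which conversion is in effect changes the printed result considerably.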

I appreciate your attention to this matter and look forward to any insights or
potential solutions you might provide.

Best regards,
[Gyumin Baek]

Re: [PATCH] Emit funcall external declarations only if actually used.

2023-11-28 Thread Jose E. Marchesi


>> "Jose E. Marchesi"  writes:
>>> There are many places in GCC where alternative local sequences are
>>> tried in order to determine what is the cheapest or best alternative
>>> to use in the current target.  When any of these sequences involve a
>>> libcall, the current implementation of emit_library_call_value_1
>>> introduce a side-effect consisting on emitting an external declaration
>>> for the funcall (such as __divdi3) which is thus emitted even if the
>>> sequence that does the libcall is not retained.
>>>
>>> This is problematic in targets such as BPF, because the kernel loader
>>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>>> object unloadable.  Note that BPF objects are not linked before being
>>> loaded.
>>>
>>> This patch changes asssemble_external_libcall to defer emitting
>>> declarations of external libcall symbols, by saving the call tree
>>> nodes in a temporary list pending_libcall_symbols and letting
>>> process_pending_assembly_externals to emit them only if they have been
>>> referenced.  Solution suggested and sketched by Richard Sandiford.
>>>
>>> Regtested in x86_64-linux-gnu.
>>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>>
>>> gcc/ChangeLog
>>>
>>> * varasm.cc (pending_libcall_symbols): New variable.
>>> (process_pending_assemble_externals): Process
>>> pending_libcall_symbols.
>>> (assemble_external_libcall): Defer emitting external libcall
>>> symbols to process_pending_assemble_externals.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>> * gcc.target/bpf/divmod-libcall-1.c: New test.
>>> * gcc.target/bpf/divmod-libcall-2.c: Likewise.
>>> * gcc.c-torture/compile/libcall-2.c: Likewise.
>>
>> OK, thanks.
>
> Thank you.
> Pushed.

I installed the following fix, since the build got broken on targets
that do not define ASM_OUTPUT_EXTERNAL.

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index deb7eab7af9..167aea87091 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -2607,7 +2607,9 @@ assemble_external_libcall (rtx fun)
   /* Declare library function name external when first used, if nec.  */
   if (! SYMBOL_REF_USED (fun))
 {
+#ifdef ASM_OUTPUT_EXTERNAL
   gcc_assert (!pending_assemble_externals_processed);
+#endif
   SYMBOL_REF_USED (fun) = 1;
   /* Make sure the libcall symbol is in the symtab so any
  reference to it will mark its tree node as referenced, via
-- 
2.30.2
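
The deferral scheme described in the quoted commit message can be sketched outside of GCC as a small pending list. Everything here is hypothetical (the struct, `defer_libcall`, `mark_referenced`, `process_pending` and the fixed-size array); the real code in varasm.cc works on tree nodes and the symbol table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16

/* Hypothetical record for one libcall symbol awaiting a decision.  */
struct pending_sym
{
  const char *name;
  bool referenced;
};

static struct pending_sym pending[MAX_PENDING];
static size_t n_pending;

/* Queue NAME instead of declaring it right away.  Returns false if the
   queue is full.  */
bool
defer_libcall (const char *name)
{
  if (n_pending == MAX_PENDING)
    return false;
  pending[n_pending].name = name;
  pending[n_pending].referenced = false;
  n_pending++;
  return true;
}

/* Record that code which was actually kept refers to NAME.  */
void
mark_referenced (const char *name)
{
  for (size_t i = 0; i < n_pending; i++)
    if (strcmp (pending[i].name, name) == 0)
      pending[i].referenced = true;
}

/* Copy the referenced names into OUT (at most MAX entries), clear the
   queue, and return how many names were copied.  */
size_t
process_pending (const char **out, size_t max)
{
  size_t n = 0;
  for (size_t i = 0; i < n_pending && n < max; i++)
    if (pending[i].referenced)
      out[n++] = pending[i].name;
  n_pending = 0;
  return n;
}
```

A symbol queued while trying a discarded sequence is never marked referenced and therefore never reaches the output, which is the property the BPF loader needs.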


>
>> Richard
>>
>>> ---
>>>  .../gcc.c-torture/compile/libcall-2.c |  8 +++
>>>  .../gcc.target/bpf/divmod-libcall-1.c | 19 
>>>  .../gcc.target/bpf/divmod-libcall-2.c | 16 ++
>>>  gcc/varasm.cc | 22 ++-
>>>  4 files changed, 64 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>>
>>> diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c 
>>> b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>> new file mode 100644
>>> index 000..b33944c83ff
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>> @@ -0,0 +1,8 @@
>>> +/* Make sure that external refences for libcalls are generated even for
>>> +   indirect calls.  */
>>> +
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
>>> +/* { dg-final { scan-assembler "globl\t__divti3" } } */
>>> +
>>> +__int128 a, b; void foo () { a = a / b; }
>>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c 
>>> b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>> new file mode 100644
>>> index 000..7481076602a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>> @@ -0,0 +1,19 @@
>>> +/* This test makes sure that no spurious external symbol declarations are
>>> +   emitted for libcalls in tried but eventually not used code sequences.  
>>> */
>>> +
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mcpu=v3" } */
>>> +/* { dg-final { scan-assembler-not "global\t__divdi3" } } */
>>> +/* { dg-final { scan-assembler-not "global\t__moddi3" } } */
>>> +
>>> +int
>>> +foo (unsigned int len)
>>> +{
>>> +  return ((unsigned long)len) * 234 / 5;
>>> +}
>>> +
>>> +int
>>> +bar (unsigned int len)
>>> +{
>>> +  return ((unsigned long)len) * 234 % 5;
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c 
>>> b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>> new file mode 100644
>>> index 000..792d689395a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>> @@ -0,0 +1,16 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mcpu=v3" } */
>>> +/* { dg-final { scan-assembler "global\t__divdi3" } } */
>>> +/* { dg-final { scan-assembler "global\t__moddi3" } } */
>>> +
>>> +int
>>> +foo (unsigned int len)
>>> +{
>>> + 

Re: [PATCH 2/4] [ifcvt] optimize x=c ? (y op z) : y by RISC-V Zicond like insns

2023-11-28 Thread Jeff Law




On 11/27/23 19:32, Fei Gao wrote:

op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]

SIGN_EXTEND, ZERO_EXTEND and SUBREG has been considered
to support SImode in 64-bit machine.

Let's defer these for now.  We're supposed to be wrapping up work that 
was posted before stage1 closed.  If these opcodes were trivial to 
support, then I would let them through, but SUBREGs for example can be 
problematical as their semantics can be complex.





Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
op rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
op rd, rs1, rd
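
The two rewrites above can be checked with a small C model, shown here for op = PLUS. This is a sketch: the function names are made up, and czero.eqz/czero.nez are modelled as described in the Zicond extension (the result is zero when the condition register is zero, respectively nonzero, and the source operand otherwise).

```c
#include <stdint.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1.  */
int64_t
czero_eqz (int64_t rs1, int64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1.  */
int64_t
czero_nez (int64_t rs1, int64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

/* rd = (rc == 0) ? (rs1 + rs2) : rs1, as czero.nez followed by add.  */
int64_t
cond_add_if_zero (int64_t rs1, int64_t rs2, int64_t rc)
{
  int64_t tmp = czero_nez (rs2, rc);
  return rs1 + tmp;
}

/* rd = (rc != 0) ? (rs1 + rs2) : rs1, as czero.eqz followed by add.  */
int64_t
cond_add_if_nonzero (int64_t rs1, int64_t rs2, int64_t rc)
{
  int64_t tmp = czero_eqz (rs2, rc);
  return rs1 + tmp;
}
```

The transformation relies on zero being a neutral second operand for the operation, which holds for the listed codes (adding, subtracting, OR-ing, XOR-ing, shifting or rotating by zero leaves rs1 unchanged).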

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

 * ifcvt.cc (noce_try_cond_zero_arith):handler for condtional zero 
based ifcvt
 (noce_emit_czero): helper for noce_try_cond_zero_arith
 (noce_cond_zero_binary_op_supported): check supported OPs for 
condtional zero based ifcvt
 (get_base_reg): get base reg of a subreg or the reg itself
 (noce_bbs_ok_for_cond_zero_arith): check if BBs are OK for condtional 
zero based ifcvt
 (noce_process_if_block): add noce_try_cond_zero_arith

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
---
  gcc/ifcvt.cc  | 210 ++
  .../gcc.target/riscv/zicond_ifcvt_opt.c   | 682 ++
  2 files changed, 892 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..8f6a0e7f5fe 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -787,6 +787,7 @@ static rtx noce_get_alt_condition (struct noce_if_info *, 
rtx, rtx_insn **);
  static bool noce_try_minmax (struct noce_if_info *);
  static bool noce_try_abs (struct noce_if_info *);
  static bool noce_try_sign_mask (struct noce_if_info *);
+static int noce_try_cond_zero_arith (struct noce_if_info *);
  
  /* Return the comparison code for reversed condition for IF_INFO,

 or UNKNOWN if reversing the condition is not possible.  */
@@ -1831,6 +1832,40 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
enum rtx_code code,
  return NULL_RTX;
  }
  
+/*  Emit a conditional zero, returning TARGET or NULL_RTX upon failure.

+IF_INFO describes the if-conversion scenario under consideration.
+CZERO_CODE selects the condition (EQ/NE).
+NON_ZERO_OP is the nonzero operand of the conditional move
+TARGET is the desired output register.  */
+
+static rtx
+noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code,
+rtx non_zero_op, rtx target)

[ ... ]
The code you wrote is safe in that it constructs a suitable if-then-else 
as a single object, starts a new sequence, then uses emit_insn to put that 
object onto the sequence.  Then you extract that one and only one insn 
from the sequence and validate it can be recognized.


In cases where you want to do something like this and know you're going 
to emit one and only one insn you can use 'make_insn_raw' without 
generating a new sequence.


I would suggest you replace all the code starting with start_sequence() 
and ending with end_sequence () (inclusive) with something like


insn = make_insn_raw (set);
if (recog_memoized (insn) >= 0)
  {
emit_insn (insn);
return target;
  }
return NULL_RTX;


Note that in the future (gcc-15) when this code is generalized to use 
the expander it will potentially generate multiple insns at which point 
we'll have to put them on a sequence and validate they all are 
recognizable.  But we'll tackle that at the appropriate time.




+
+  return false;
+}
+
+/*  Helper function to return REG itself or inner expression of a SUBREG,
+otherwise NULL_RTX for other RTX_CODE.  */
+
+static rtx
+get_base_reg (rtx exp)
+{
+  if (REG_P (exp))
+return exp;
+  else if (SUBREG_P (exp))
+return SUBREG_REG (exp);
+
+  return NULL_RTX;
+

I would advise against handling subregs at this point.  What you're 
doing is probably too simplistic given the semantics of subregs.





+
+  /* Strip sign_extend if any.  */
+  if (GET_CODE (a) == SIGN_EXTEND || GET_CODE (a) == ZERO_EXTEND)
+bin_exp = XEXP (a, 0);
+  else
+bin_exp = a;

Similarly, while I do think we're going to want to handle extensions, 
let's not try and add them at this point.  We want to get this wrapped 
up & integrated so that everyone can move their focus to bugfixing for 
gcc-14.



+
+/*  Try to covert if-then-else with conditional zero,
+returning TURE on success or FALSE on failure.
+IF_INFO describes the if-conversion scenario under consideration.  */
+
+static int
+noce_try_cond_zero_arith (struct noce_if_info *if_info)
+{
+  rtx target, a;
+  rtx_insn *seq;
+  machine_mode mode = GET_MODE (if_info->x);
+  rtx common = NULL_RTX;
+  enum rtx_code czero_code = UNKNOWN;
+  rtx non_zero_op = NULL_RTX;
+  rtx *to_replace = NULL;
+
+  if 

[Bug c++/101113] g++ thinks constructor suppressed by a requires clause is actually a bad copy constructor

2023-11-28 Thread gcc at nospam dot scs.stanford.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101113

David Mazières  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #4 from David Mazières  ---
Sorry I should have closed this bug report a while ago when I said it wasn't a
bug.

Re: [PATCH 2/4] [ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

2023-11-28 Thread Jeff Law




On 11/27/23 19:57, Fei Gao wrote:


1. In find_if_header function, I found the following piece of codes:
if (!reload_completed && noce_find_if_block(...)), and find_if_header must
be called before noce_try_cond_zero_arith().

Ah good.




2. In noce_try_store_flag_constants, new registers are also generated
without can_create_pseudo_p() check.

So I guess no need to add can_create_pseudo_p() here.

Agreed.  I could possibly make an argument that it might be nice to be 
able to look for these things after reload has completed, but I think we 
can ignore that for now.  Thanks!


Jeff


[Bug target/111107] i686-w64-mingw32 does not realign stack when __attribute__((aligned)) or __attribute__((vector_size)) are used

2023-11-28 Thread zfigura at codeweavers dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111107

--- Comment #15 from Zeb Figura  ---
(In reply to Eric Botcazou from comment #14)
> > I'd say that
> > 
> > config/i386/cygming.h:#define STACK_REALIGN_DEFAULT TARGET_SSE
> > 
> > is a non-working "fix".  The appropriate default would be
> > -mincoming-stack-boundary=2.  MIN_STACK_BOUNDARY should already be 4, so
> > that leaves PREFERRED_STACK_BOUNDARY_DEFAULT is the way to go here.
> 
> This was a minimal fix to support SSE, but Solaris was indeed more radical:
> 
> sol2.h:#undef STACK_REALIGN_DEFAULT
> sol2.h:#define STACK_REALIGN_DEFAULT (TARGET_64BIT ? 0 : 1)
> 
> so we could just mimic it for Windows.

Why use STACK_REALIGN_DEFAULT rather than PREFERRED_STACK_BOUNDARY_DEFAULT?

[PATCH] LoongArch: Fix ICE and use simplify_gen_subreg instead of gen_rtx_SUBREG directly.

2023-11-28 Thread Jiahao Xu
loongarch_expand_vec_cond_mask_expr generates 'subreg's of 'subreg's, which are
not supported in GCC and cause an ICE:

ice.c:55:1: error: unrecognizable insn:
   55 | }
  | ^
(insn 63 62 64 8 (set (reg:V4DI 278)
(subreg:V4DI (subreg:V4DF (reg:V4DI 273 [ vect__53.26 ]) 0) 0)) -1
 (nil))
during RTL pass: vregs
ice.c:55:1: internal compiler error: in extract_insn, at recog.cc:2804

Ruoyao previously fixed a similar ICE:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636156.html

This patch fixes the ICE and uses simplify_gen_subreg instead of gen_rtx_SUBREG
wherever possible, to avoid the same ICE happening again.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_try_expand_lsx_vshuf_const): 
Use
simplify_gen_subreg instead of gen_rtx_SUBREG.
(loongarch_expand_vec_perm_const_2): Ditto.
(loongarch_expand_vec_cond_expr): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr112476-3.c: New test.
* gcc.target/loongarch/pr112476-4.c: New test.

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index e8a2584ac97..69fcb0aa6fb 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8799,13 +8799,13 @@ loongarch_try_expand_lsx_vshuf_const (struct 
expand_vec_perm_d *d)
   if (d->vmode == E_V2DFmode)
{
  sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm));
- tmp = gen_rtx_SUBREG (E_V2DImode, d->target, 0);
+ tmp = simplify_gen_subreg (E_V2DImode, d->target, d->vmode, 0);
  emit_move_insn (tmp, sel);
}
   else if (d->vmode == E_V4SFmode)
{
  sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm));
- tmp = gen_rtx_SUBREG (E_V4SImode, d->target, 0);
+ tmp = simplify_gen_subreg (E_V4SImode, d->target, d->vmode, 0);
  emit_move_insn (tmp, sel);
}
   else
@@ -9584,8 +9584,8 @@ loongarch_expand_vec_perm_const_2 (struct 
expand_vec_perm_d *d)
  /* Adjust op1 for selecting correct value in high 128bit of target
 register.
 op1: E_V4DImode, { 4, 5, 6, 7 } -> { 2, 3, 4, 5 }.  */
- rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0);
- rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);
+ rtx conv_op1 = simplify_gen_subreg (E_V4DImode, op1_alt, d->vmode, 0);
+ rtx conv_op0 = simplify_gen_subreg (E_V4DImode, d->op0, d->vmode, 0);
  emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1,
  conv_op0, GEN_INT (0x21)));
 
@@ -9614,8 +9614,8 @@ loongarch_expand_vec_perm_const_2 (struct 
expand_vec_perm_d *d)
  emit_move_insn (op0_alt, d->op0);
 
  /* Generate subreg for fitting into insn gen function.  */
- rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0);
- rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0);
+ rtx conv_op1 = simplify_gen_subreg (E_V4DImode, op1_alt, d->vmode, 0);
+ rtx conv_op0 = simplify_gen_subreg (E_V4DImode, op0_alt, d->vmode, 0);
 
  /* Adjust op value in temp register.
 op0 = {0,1,2,3}, op1 = {4,5,0,1}  */
@@ -9661,9 +9661,10 @@ loongarch_expand_vec_perm_const_2 (struct 
expand_vec_perm_d *d)
  emit_move_insn (op1_alt, d->op1);
  emit_move_insn (op0_alt, d->op0);
 
- rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, op1_alt, 0);
- rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, op0_alt, 0);
- rtx conv_target = gen_rtx_SUBREG (E_V4DImode, d->target, 0);
+ rtx conv_op1 = simplify_gen_subreg (E_V4DImode, op1_alt, d->vmode, 0);
+ rtx conv_op0 = simplify_gen_subreg (E_V4DImode, op0_alt, d->vmode, 0);
+ rtx conv_target = simplify_gen_subreg (E_V4DImode, d->target,
+d->vmode, 0);
 
  emit_insn (gen_lasx_xvpermi_q_v4di (conv_op1, conv_op1,
  conv_op0, GEN_INT (0x02)));
@@ -9695,9 +9696,10 @@ loongarch_expand_vec_perm_const_2 (struct 
expand_vec_perm_d *d)
 Selector sample: E_V4DImode, { 0, 1, 4 ,5 }  */
   if (!d->testing_p)
{
- rtx conv_op1 = gen_rtx_SUBREG (E_V4DImode, d->op1, 0);
- rtx conv_op0 = gen_rtx_SUBREG (E_V4DImode, d->op0, 0);
- rtx conv_target = gen_rtx_SUBREG (E_V4DImode, d->target, 0);
+ rtx conv_op1 = simplify_gen_subreg (E_V4DImode, d->op1, d->vmode, 0);
+ rtx conv_op0 = simplify_gen_subreg (E_V4DImode, d->op0, d->vmode, 0);
+ rtx conv_target = simplify_gen_subreg (E_V4DImode, d->target,
+d->vmode, 0);
 
  /* We can achieve the expectation by using sinple xvpermi.q insn.  */
  emit_move_insn (conv_target, conv_op1);
@@ -9722,8 +9724,8 @@ loongarch_expand_vec_perm_const_2 (struct 
expand_vec_perm_d *d)
  emit_move_insn 

[PATCH] LoongArch: Fix lsx-vshuf.c and lasx-xvshuf_b.c tests fail on LA664 [PR112611]

2023-11-28 Thread Jiahao Xu
For [x]vshuf instructions, if the index value in the selector exceeds 63, it
triggers undefined behavior on LA464, but not on LA664. To ensure that these
two tests behave the same on both LA464 and LA664, both tests are modified so
that the index value in the selector does not exceed 63.
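
One way to see what "does not exceed 63" means for a byte-wise selector is to mask every selector byte down to its low six bits. This is only a sketch: the patch edits the test constants by hand, and `clamp_selector` is a hypothetical helper, not something the tests use.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 6 bits of each of the eight
   selector bytes packed in SEL, so every byte index is at most 63.  */
uint64_t
clamp_selector (uint64_t sel)
{
  return sel & UINT64_C (0x3f3f3f3f3f3f3f3f);
}
```

For example, a selector byte of 0xf8 becomes 0x38, and a constant such as 0x3f3f3f383f3f3f38 from the updated test is left unchanged because every byte is already in range.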

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c: Sure index less 
than 64.
* gcc.target/loongarch/vector/lsx/lsx-vshuf.c: Ditto.

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c
index 641ea2315ff..03c479a085c 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvshuf_b.c
@@ -44,9 +44,9 @@ main ()
   *((unsigned long *)&__m256i_op1[1]) = 0xfefefefe;
   *((unsigned long *)&__m256i_op1[0]) = 0xfefefefe;
   *((unsigned long *)&__m256i_op2[3]) = 0x;
-  *((unsigned long *)&__m256i_op2[2]) = 0xfff8fff8;
+  *((unsigned long *)&__m256i_op2[2]) = 0x3f3f3f383f3f3f38;
   *((unsigned long *)&__m256i_op2[1]) = 0x;
-  *((unsigned long *)&__m256i_op2[0]) = 0xfff8fc00;
+  *((unsigned long *)&__m256i_op2[0]) = 0x3f3f3f383c00;
   *((unsigned long *)&__m256i_result[3]) = 0xfafafafafafafafa;
   *((unsigned long *)&__m256i_result[2]) = 0x;
   *((unsigned long *)&__m256i_result[1]) = 0xfefefefefefefefe;
@@ -138,33 +138,14 @@ main ()
   *((unsigned long *)&__m256i_op1[2]) = 0x;
   *((unsigned long *)&__m256i_op1[1]) = 0x;
   *((unsigned long *)&__m256i_op1[0]) = 0x;
-  *((unsigned long *)&__m256i_op2[3]) = 0x;
-  *((unsigned long *)&__m256i_op2[2]) = 0x;
-  *((unsigned long *)&__m256i_op2[1]) = 0x;
-  *((unsigned long *)&__m256i_op2[0]) = 0x;
+  *((unsigned long *)&__m256i_op2[3]) = 0x;
+  *((unsigned long *)&__m256i_op2[2]) = 0x;
+  *((unsigned long *)&__m256i_op2[1]) = 0x;
+  *((unsigned long *)&__m256i_op2[0]) = 0x;
   *((unsigned long *)&__m256i_result[3]) = 0x;
-  *((unsigned long *)&__m256i_result[2]) = 0x;
+  *((unsigned long *)&__m256i_result[2]) = 0x;
   *((unsigned long *)&__m256i_result[1]) = 0x;
-  *((unsigned long *)&__m256i_result[0]) = 0x;
-  __m256i_out = __lasx_xvshuf_b (__m256i_op0, __m256i_op1, __m256i_op2);
-  ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
-
-  *((unsigned long *)&__m256i_op0[3]) = 0x;
-  *((unsigned long *)&__m256i_op0[2]) = 0x;
-  *((unsigned long *)&__m256i_op0[1]) = 0x;
-  *((unsigned long *)&__m256i_op0[0]) = 0x;
-  *((unsigned long *)&__m256i_op1[3]) = 0x;
-  *((unsigned long *)&__m256i_op1[2]) = 0x;
-  *((unsigned long *)&__m256i_op1[1]) = 0x;
-  *((unsigned long *)&__m256i_op1[0]) = 0x;
-  *((unsigned long *)&__m256i_op2[3]) = 0x;
-  *((unsigned long *)&__m256i_op2[2]) = 0x;
-  *((unsigned long *)&__m256i_op2[1]) = 0x;
-  *((unsigned long *)&__m256i_op2[0]) = 0x;
-  *((unsigned long *)&__m256i_result[3]) = 0x;
-  *((unsigned long *)&__m256i_result[2]) = 0x;
-  *((unsigned long *)&__m256i_result[1]) = 0x;
-  *((unsigned long *)&__m256i_result[0]) = 0x;
+  *((unsigned long *)&__m256i_result[0]) = 0x;
   __m256i_out = __lasx_xvshuf_b (__m256i_op0, __m256i_op1, __m256i_op2);
   ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
 
@@ -177,7 +158,7 @@ main ()
   *((unsigned long *)&__m256i_op1[1]) = 0x;
   *((unsigned long *)&__m256i_op1[0]) = 0x;
   *((unsigned long *)&__m256i_op2[3]) = 0x;
-  *((unsigned long *)&__m256i_op2[2]) = 0x00077fff;
+  *((unsigned long *)&__m256i_op2[2]) = 0x00032f1f;
   *((unsigned long *)&__m256i_op2[1]) = 0x;
   *((unsigned long *)&__m256i_op2[0]) = 0x;
   *((unsigned long *)&__m256i_result[3]) = 0x;
@@ -187,9 +168,9 @@ main ()
   __m256i_out = __lasx_xvshuf_b (__m256i_op0, __m256i_op1, __m256i_op2);
   ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
 
-  *((unsigned long *)&__m256i_op0[3]) = 0xfefe;
-  *((unsigned long *)&__m256i_op0[2]) = 0x0101;
-  *((unsigned long *)&__m256i_op0[1]) = 0xfefe;
+  *((unsigned long *)&__m256i_op0[3]) = 0x0011001100110011;
+  *((unsigned long *)&__m256i_op0[2]) = 0x0001;
+  *((unsigned long *)&__m256i_op0[1]) = 0x0011001100110011;
   *((unsigned long *)&__m256i_op0[0]) = 0x0101;
   *((unsigned long *)&__m256i_op1[3]) = 

[PATCH][V2] RISC-V: Nan-box the result of movhf on soft-fp16

2023-11-28 Thread KuanLin Chen
According to the spec, fmv.h checks whether its input operands are correctly
NaN-boxed.  If not, the input value is treated as an n-bit canonical NaN.
This patch fixes the issue that operands returned by soft-fp16 libgcc
(i.e., __truncdfhf2) were not correctly NaN-boxed.
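
For readers unfamiliar with NaN-boxing, here is a minimal sketch of the idea, assuming a 32-bit FP register (FLEN=32) holding a 16-bit half-precision value. The function names are hypothetical and this is a bit-level model only, not the RTL the patch emits.

```c
#include <stdint.h>

#define HF_CANONICAL_NAN 0x7e00u  /* canonical quiet NaN in binary16 */

/* Box a 16-bit half-precision pattern into a 32-bit FP register image
   by setting all upper bits to 1.  */
uint32_t
nanbox_hf (uint16_t h)
{
  return UINT32_C (0xffff0000) | h;
}

/* Model how a consumer reads the register: if the upper bits are not
   all ones, the value is treated as the canonical NaN.  */
uint16_t
unbox_hf (uint32_t reg)
{
  if ((reg & UINT32_C (0xffff0000)) != UINT32_C (0xffff0000))
    return HF_CANONICAL_NAN;
  return (uint16_t) reg;
}
```

With this model, a half-precision 1.0 (0x3c00) that arrives without the upper bits set reads back as NaN, which is the failure mode described above.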

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand movhf
with Nan-boxing value.
* config/riscv/riscv.md (*movhf_softfloat_unspec): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Float16-nanboxing.c: New test.


0001-RISC-V-Nan-box-the-result-of-movhf-on-soft-fp16.patch
Description: Binary data


Re: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-28 Thread Jiahao Xu



On 2023/11/29 at 10:33 AM, Xi Ruoyao wrote:

On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote:

On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote:

On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:

diff --git a/gcc/config/loongarch/predicates.md
b/gcc/config/loongarch/predicates.md
index f7796da10b2..9e9ce58cb53 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand"
     (ior (match_operand 0 "const_1_operand")
  (match_operand 0 "register_operand")))
   
+(define_predicate "reg_or_vecotr_1_operand"

"vector" instead of "vecotr".


+  (ior (match_operand 0 "const_vector_1_operand")
+   (match_operand 0 "register_operand")))
+@opindex mrecip
+@item -mrecip
+This option enables use of the reciprocal estimate and reciprocal square
+root estimate instructions with additional Newton-Raphson steps to increase
+precision instead of doing a divide or square root and divide for
+floating-point arguments.
+These instructions are generated only when @option{-funsafe-math-optimizations}
+is enabled together with @option{-ffinite-math-only} and
+@option{-fno-trapping-math}.
+Note that while the throughput of the sequence is higher than the throughput of
+the non-reciprocal instruction, the precision of the sequence can be decreased
+by up to 2 ulp (i.e. the inverse of 1.0 equals 0.9994).
+
+@opindex mrecip=opt

We should document that using these options requires the target CPU to
support the frecipe/frsqrte instructions.


I am currently improving this patch by adding option -mfrecipe to ensure
that the target CPU supports approximate instructions

You just need to add a line into
gcc/config/loongarch/genopts/isa-evolution.in:

2   25  frecipe Support frecipe.{s/d} and frsqrte.{s/d} instructions

Then the -mfrecipe option will be added and can be tested with
TARGET_FRECIPE in GCC code.  -march=native will also detect it properly
because the cpucfg info is included.  Then just add
OPTION_MASK_ISA_FRECIPE into ISA_BASE_LA64V110_FEATURES in
loongarch-cpu.cc.

I'm now implementing it according to the idea you mentioned. Yesterday, 
lulu informed me of this problem.

And could we have a __builtin for scalar frecipe/frsqrte too?  Then if
the approximation is not OK for the entire program, but the programmer
knows it's OK for some operations in a hot path, (s)he can code
__builtin_loongarch_frecipe_d (x) for an acceleration.


I agree with this suggestion.



Re: [PATCH] i386: Fix CPUID of USER_MSR.

2023-11-28 Thread Hongtao Liu
On Wed, Nov 29, 2023 at 9:23 AM Hu, Lin1  wrote:
>
> Hi, all
>
> This patch aims to fix the wrong CPUID of USER_MSR.  Its correct CPUID is
> (0x7, 0x1).EDX[15], but I set it as (0x7, 0x0).EDX[15].  The patch also
> modifies the testcase to give the user a better example.
>
> It has been bootstrapped and regtested on x86-64-pc-linux-gnu, OK for trunk?
Ok.
>
> BR,
> Lin
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features): Move USER_MSR
> to the correct location.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/user_msr-1.c: Correct the MSR index to give the user
> a proper example.
> ---
>  gcc/common/config/i386/cpuinfo.h   | 4 ++--
>  gcc/testsuite/gcc.target/i386/user_msr-1.c | 9 +
>  2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index f90fb4d56a2..a1eb285daed 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -861,8 +861,6 @@ get_available_features (struct __processor_model 
> *cpu_model,
> set_feature (FEATURE_IBT);
>if (edx & bit_UINTR)
> set_feature (FEATURE_UINTR);
> -  if (edx & bit_USER_MSR)
> -   set_feature (FEATURE_USER_MSR);
>if (amx_usable)
> {
>   if (edx & bit_AMX_TILE)
> @@ -921,6 +919,8 @@ get_available_features (struct __processor_model 
> *cpu_model,
> set_feature (FEATURE_PREFETCHI);
>   if (eax & bit_RAOINT)
> set_feature (FEATURE_RAOINT);
> + if (edx & bit_USER_MSR)
> +   set_feature (FEATURE_USER_MSR);
>   if (avx_usable)
> {
>   if (eax & bit_AVXVNNI)
> diff --git a/gcc/testsuite/gcc.target/i386/user_msr-1.c 
> b/gcc/testsuite/gcc.target/i386/user_msr-1.c
> index 447852306df..f315016d088 100644
> --- a/gcc/testsuite/gcc.target/i386/user_msr-1.c
> +++ b/gcc/testsuite/gcc.target/i386/user_msr-1.c
> @@ -1,9 +1,9 @@
>  /* { dg-do compile { target { ! ia32  }  }  } */
>  /* { dg-options "-musermsr -O2"  } */
>  /* { dg-final { scan-assembler-times "urdmsr\[ \\t\]\\%r\[a-z\]x, 
> \\%r\[a-z\]x" 1  }  } */
> -/* { dg-final { scan-assembler-times "urdmsr\[ \\t\]\\\$121" 1  }  } */
> +/* { dg-final { scan-assembler-times "urdmsr\[ \\t\]\\\$6912" 1  }  } */
>  /* { dg-final { scan-assembler-times "uwrmsr\[ \\t\]\\%r\[a-z\]x, 
> \\%r\[a-z\]x" 1  }  } */
> -/* { dg-final { scan-assembler-times "uwrmsr\[ \\t\]\\%r\[a-z\]x, \\\$121" 1 
>  }  } */
> +/* { dg-final { scan-assembler-times "uwrmsr\[ \\t\]\\%r\[a-z\]x, \\\$6912" 
> 1  }  } */
>
>  #include 
>
> @@ -13,8 +13,9 @@ volatile unsigned long long y;
>  void extern
>  user_msr_test (void)
>  {
> +  y = 6913;
>x = _urdmsr(y);
> -  x = _urdmsr(121);
> +  x = _urdmsr(6912);
>_uwrmsr(y, x);
> -  _uwrmsr(121, x);
> +  _uwrmsr(6912, x);
>  }
> --
> 2.31.1
>


-- 
BR,
Hongtao


[PATCH] [x86] Support sdot_prodv*qi with emulation of sdot_prodv*hi.

2023-11-28 Thread liuhongt
Currently sdot_prodv*qi is available under TARGET_AVXVNNIINT8, but it
can be emulated by

 vec_unpacks_lo_v32qi
 vec_unpacks_lo_v32qi
 vec_unpacks_hi_v32qi
 vec_unpacks_hi_v32qi
 sdot_prodv16hi
 sdot_prodv16hi
 add3v8si

which is faster than the original

  vect_patt_39.11_48 = WIDEN_MULT_LO_EXPR ;
  vect_patt_39.11_49 = WIDEN_MULT_HI_EXPR ;
  vect_patt_38.14_54 = [vec_unpack_lo_expr] vect_patt_39.11_48;
  vect_patt_38.14_55 = [vec_unpack_hi_expr] vect_patt_39.11_48;
  vect_patt_38.14_56 = [vec_unpack_lo_expr] vect_patt_39.11_49;
  vect_patt_38.14_57 = [vec_unpack_hi_expr] vect_patt_39.11_49;
  vect_sum_15.15_59 = vect_patt_38.14_54 + vect_patt_38.14_55;
  vect_sum_15.15_60 = vect_patt_38.14_56 + vect_sum_15.15_59;
  vect_sum_15.15_61 = vect_patt_38.14_57 + vect_sum_15.15_60;
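
To make the emulation sequence above easier to follow, here is a scalar sketch that checks the arithmetic only. It assumes a 32-element byte operand and collapses the per-lane vector accumulators into a single 32-bit sum; the function names are illustrative and do not correspond to anything in sse.md.

```c
#include <stdint.h>

enum { NELEM = 32 };            /* byte elements, as in a V32QI operand */

/* Reference: signed 8-bit dot product accumulated into 32 bits.  */
int32_t
sdot_i8 (const int8_t *a, const int8_t *b, int32_t acc)
{
  for (int i = 0; i < NELEM; i++)
    acc += (int32_t) a[i] * b[i];
  return acc;
}

/* 16-bit dot product over N elements, standing in for sdot_prodv16hi.  */
static int32_t
sdot_i16 (const int16_t *a, const int16_t *b, int n, int32_t acc)
{
  for (int i = 0; i < n; i++)
    acc += (int32_t) a[i] * b[i];
  return acc;
}

/* Emulation: sign-extend each half to 16 bits, run two 16-bit dot
   products, then add the partial results.  */
int32_t
sdot_i8_emulated (const int8_t *a, const int8_t *b, int32_t acc)
{
  int16_t a_lo[NELEM / 2], a_hi[NELEM / 2];
  int16_t b_lo[NELEM / 2], b_hi[NELEM / 2];

  for (int i = 0; i < NELEM / 2; i++)
    {
      a_lo[i] = a[i];                 /* vec_unpacks_lo */
      b_lo[i] = b[i];
      a_hi[i] = a[NELEM / 2 + i];     /* vec_unpacks_hi */
      b_hi[i] = b[NELEM / 2 + i];
    }

  int32_t res1 = sdot_i16 (a_lo, b_lo, NELEM / 2, 0);
  int32_t res2 = sdot_i16 (a_hi, b_hi, NELEM / 2, acc);
  return res1 + res2;                 /* add3 */
}
```

Only one of the two 16-bit dot products receives the incoming accumulator; the other starts from zero, so the accumulator is counted once.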

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/sse.md (sdot_prodv64qi): New expander.
(sseunpackmodelower): New mode attr.
(sdot_prod): Emulate sdot_prodv*qi with sdot_prodv*hi
when TARGET_AVXVNNIINT8 is not available.

gcc/testsuite/ChangeLog:

* gcc.target/i386/sdotprodint8_emulate.c: New test.
---
 gcc/config/i386/sse.md| 87 ---
 .../gcc.target/i386/sdotprodint8_emulate.c| 15 
 2 files changed, 90 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sdotprodint8_emulate.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f94a77d0b6d..e29311d83cc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1291,6 +1291,11 @@ (define_mode_attr sseunpackmode
(V32QI "V16HI") (V16HI "V8SI") (V8SI "V4DI")
(V32HI "V16SI") (V64QI "V32HI") (V16SI "V8DI")])
 
+(define_mode_attr sseunpackmodelower
+  [(V16QI "v8hi") (V8HI "v4si") (V4SI "v2di")
+   (V32QI "v16hi") (V16HI "v8si") (V8SI "v4di")
+   (V32HI "v16si") (V64QI "v32hi") (V16SI "v8di")])
+
 (define_mode_attr ssepackmode
   [(V8HI "V16QI") (V4SI "V8HI") (V2DI "V4SI")
(V16HI "V32QI") (V8SI "V16HI") (V4DI "V8SI")
@@ -30742,20 +30747,78 @@ (define_int_attr vpdotprodtype
 
 (define_expand "sdot_prod"
   [(match_operand: 0 "register_operand")
-   (match_operand:VI1 1 "register_operand")
-   (match_operand:VI1 2 "register_operand")
+   (match_operand:VI1_AVX2 1 "register_operand")
+   (match_operand:VI1_AVX2 2 "register_operand")
(match_operand: 3 "register_operand")]
-  "TARGET_AVXVNNIINT8"
+  "TARGET_SSE2"
 {
-  operands[1] = lowpart_subreg (mode,
-force_reg (mode, operands[1]),
-mode);
-  operands[2] = lowpart_subreg (mode,
-force_reg (mode, operands[2]),
-mode);
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  emit_insn (gen_vpdpbssd_ (operands[0], operands[3],
-  operands[1], operands[2]));
+  if (TARGET_AVXVNNIINT8)
+{
+  operands[1] = lowpart_subreg (mode,
+   force_reg (mode, operands[1]),
+   mode);
+  operands[2] = lowpart_subreg (mode,
+   force_reg (mode, operands[2]),
+   mode);
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+  emit_insn (gen_vpdpbssd_ (operands[0], operands[3],
+ operands[1], operands[2]));
+}
+  else
+{
+  /* Emulate with vpdpwssd.  */
+  rtx op1_lo = gen_reg_rtx (mode);
+  rtx op1_hi = gen_reg_rtx (mode);
+  rtx op2_lo = gen_reg_rtx (mode);
+  rtx op2_hi = gen_reg_rtx (mode);
+
+  emit_insn (gen_vec_unpacks_lo_ (op1_lo, operands[1]));
+  emit_insn (gen_vec_unpacks_lo_ (op2_lo, operands[2]));
+  emit_insn (gen_vec_unpacks_hi_ (op1_hi, operands[1]));
+  emit_insn (gen_vec_unpacks_hi_ (op2_hi, operands[2]));
+
+  rtx res1 = gen_reg_rtx (mode);
+  rtx res2 = gen_reg_rtx (mode);
+  rtx sum = gen_reg_rtx (mode);
+
+  emit_move_insn (sum, CONST0_RTX (mode));
+  emit_insn (gen_sdot_prod (res1, op1_lo,
+   op2_lo, sum));
+  emit_insn (gen_sdot_prod (res2, op1_hi,
+   op2_hi, operands[3]));
+  emit_insn (gen_add3 (operands[0], res1, res2));
+}
+
+  DONE;
+})
+
+(define_expand "sdot_prodv64qi"
+  [(match_operand:V16SI 0 "register_operand")
+   (match_operand:V64QI 1 "register_operand")
+   (match_operand:V64QI 2 "register_operand")
+   (match_operand:V16SI 3 "register_operand")]
+  "(TARGET_AVX512VNNI || TARGET_AVX512BW) && TARGET_EVEX512"
+{
+  /* Emulate with vpdpwssd.  */
+  rtx op1_lo = gen_reg_rtx (V32HImode);
+  rtx op1_hi = gen_reg_rtx (V32HImode);
+  rtx op2_lo = gen_reg_rtx (V32HImode);
+  rtx op2_hi = gen_reg_rtx (V32HImode);
+
+  emit_insn (gen_vec_unpacks_lo_v64qi (op1_lo, operands[1]));
+  emit_insn 

Re: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-28 Thread Xi Ruoyao
On Wed, 2023-11-29 at 10:23 +0800, Jiahao Xu wrote:
> 
> On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote:
> > On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
> > > diff --git a/gcc/config/loongarch/predicates.md
> > > b/gcc/config/loongarch/predicates.md
> > > index f7796da10b2..9e9ce58cb53 100644
> > > --- a/gcc/config/loongarch/predicates.md
> > > +++ b/gcc/config/loongarch/predicates.md
> > > @@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand"
> > >     (ior (match_operand 0 "const_1_operand")
> > >  (match_operand 0 "register_operand")))
> > >   
> > > +(define_predicate "reg_or_vecotr_1_operand"
> > "vector" instead of "vecotr".
> > 
> > > +  (ior (match_operand 0 "const_vector_1_operand")
> > > +   (match_operand 0 "register_operand")))
> > > +@opindex mrecip
> > > +@item -mrecip
> > > +This option enables use of the reciprocal estimate and reciprocal square
> > > +root estimate instructions with additional Newton-Raphson steps to 
> > > increase
> > > +precision instead of doing a divide or square root and divide for
> > > +floating-point arguments.
> > > +These instructions are generated only when 
> > > @option{-funsafe-math-optimizations}
> > > +is enabled together with @option{-ffinite-math-only} and
> > > +@option{-fno-trapping-math}.
> > > +Note that while the throughput of the sequence is higher than the 
> > > throughput of
> > > +the non-reciprocal instruction, the precision of the sequence can be 
> > > decreased
> > > +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.9994).
> > > +
> > > +@opindex mrecip=opt
> > We should document that using these options requires the target CPU to
> > support the frecipe/frsqrte instructions.
> > 
> I am currently improving this patch by adding option -mfrecipe to ensure 
> that the target CPU supports approximate instructions

You just need to add a line into
gcc/config/loongarch/genopts/isa-evolution.in:

2   25  frecipe Support frecipe.{s/d} and frsqrte.{s/d} instructions

Then the -mfrecipe option will be added and can be tested with
TARGET_FRECIPE in GCC code.  -march=native will also detect it properly
because the cpucfg info is included.  Then just add
OPTION_MASK_ISA_FRECIPE into ISA_BASE_LA64V110_FEATURES in
loongarch-cpu.cc.


And could we have a __builtin for scalar frecipe/frsqrte too?  Then if
the approximation is not OK for the entire program, but the programmer
knows it's OK for some operations in a hot path, (s)he can code
__builtin_loongarch_frecipe_d (x) for an acceleration.
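
For context on why an estimate plus Newton-Raphson steps is attractive, here is a small sketch of the refinement the quoted -mrecip text describes. The estimate function is a hypothetical stand-in (the exact value with a deliberate 1% error), not the frecipe instruction or any existing builtin.

```c
/* Hypothetical stand-in for a hardware reciprocal estimate: the exact
   value with a deliberate 1% relative error.  */
static double
crude_recip_estimate (double a)
{
  return (1.0 / a) * 1.01;
}

/* Refine the estimate of 1/a with STEPS Newton-Raphson iterations.
   Each iteration x' = x * (2 - a * x) roughly squares the relative
   error of the previous approximation.  */
double
recip_newton (double a, int steps)
{
  double x = crude_recip_estimate (a);
  for (int i = 0; i < steps; i++)
    x = x * (2.0 - a * x);
  return x;
}
```

Starting from a 1% error, one step leaves about 1e-4 and two steps about 1e-8, which is why a short fixed sequence can replace a divide when the remaining error is acceptable.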

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-28 Thread Jiahao Xu



On 2023/11/29 at 10:08 AM, Xi Ruoyao wrote:

On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:

diff --git a/gcc/config/loongarch/predicates.md
b/gcc/config/loongarch/predicates.md
index f7796da10b2..9e9ce58cb53 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand"
    (ior (match_operand 0 "const_1_operand")
     (match_operand 0 "register_operand")))
  
+(define_predicate "reg_or_vecotr_1_operand"

"vector" instead of "vecotr".


+  (ior (match_operand 0 "const_vector_1_operand")
+   (match_operand 0 "register_operand")))
+@opindex mrecip
+@item -mrecip
+This option enables use of the reciprocal estimate and reciprocal square
+root estimate instructions with additional Newton-Raphson steps to increase
+precision instead of doing a divide or square root and divide for
+floating-point arguments.
+These instructions are generated only when @option{-funsafe-math-optimizations}
+is enabled together with @option{-ffinite-math-only} and
+@option{-fno-trapping-math}.
+Note that while the throughput of the sequence is higher than the throughput of
+the non-reciprocal instruction, the precision of the sequence can be decreased
+by up to 2 ulp (i.e. the inverse of 1.0 equals 0.9994).
+
+@opindex mrecip=opt

We should document that using these options requires the target CPU to
support the frecipe/frsqrte instructions.

I am currently improving this patch by adding the option -mfrecipe, which
ensures that the target CPU supports the approximate instructions, and
documenting it in invoke.texi.  Thank you for your suggestions; I will
upload the v2 version later.




Re: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math.

2023-11-28 Thread Xi Ruoyao
On Tue, 2023-11-28 at 11:29 +0800, Jiahao Xu wrote:
> diff --git a/gcc/config/loongarch/predicates.md
> b/gcc/config/loongarch/predicates.md
> index f7796da10b2..9e9ce58cb53 100644
> --- a/gcc/config/loongarch/predicates.md
> +++ b/gcc/config/loongarch/predicates.md
> @@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand"
>    (ior (match_operand 0 "const_1_operand")
>     (match_operand 0 "register_operand")))
>  
> +(define_predicate "reg_or_vecotr_1_operand"

"vector" instead of "vecotr".

> +  (ior (match_operand 0 "const_vector_1_operand")
> +   (match_operand 0 "register_operand")))

> +@opindex mrecip
> +@item -mrecip
> +This option enables use of the reciprocal estimate and reciprocal square
> +root estimate instructions with additional Newton-Raphson steps to increase
> +precision instead of doing a divide or square root and divide for
> +floating-point arguments.
> +These instructions are generated only when 
> @option{-funsafe-math-optimizations}
> +is enabled together with @option{-ffinite-math-only} and
> +@option{-fno-trapping-math}.
> +Note that while the throughput of the sequence is higher than the throughput 
> of
> +the non-reciprocal instruction, the precision of the sequence can be 
> decreased
> +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.9994).
> +
> +@opindex mrecip=opt

We should document that using these options requires the target CPU to
support the frecipe/frsqrte instructions.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v1] LoongArch: Remove duplicate definition of CLZ_DEFINED_VALUE_AT_ZERO.

2023-11-28 Thread Xi Ruoyao
On Tue, 2023-11-28 at 15:56 +0800, Li Wei wrote:
> In the r14-5547 commit, C[LT]Z_DEFINED_VALUE_AT_ZERO were defined at
> the same time, but in fact, CLZ_DEFINED_VALUE_AT_ZERO has already been
> defined, so remove the duplicate definition.
> 
> gcc/ChangeLog:
> 
>   * config/loongarch/loongarch.h (CTZ_DEFINED_VALUE_AT_ZERO): Add
>     description.
>   (CLZ_DEFINED_VALUE_AT_ZERO): Remove duplicate definition.

LGTM.

Interestingly, the compiler does not give any warning when a macro is
redefined with exactly the same definition.
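
This matches the C standard: a function-like macro may be redefined as long as the parameter list and replacement list are identical (C11 6.10.3). A tiny standalone illustration, with a made-up macro name rather than the real loongarch.h definitions:

```c
/* Two identical definitions: the second is a benign redefinition and
   requires no diagnostic.  */
#define BITSIZE_AT_ZERO(bits) ((bits) + 0)
#define BITSIZE_AT_ZERO(bits) ((bits) + 0)

int
bitsize_at_zero (int bits)
{
  return BITSIZE_AT_ZERO (bits);
}
```

If the second replacement list differed from the first, GCC would normally diagnose the redefinition.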

> ---
>  gcc/config/loongarch/loongarch.h | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.h 
> b/gcc/config/loongarch/loongarch.h
> index 115222e70fd..fa8a3f5582f 100644
> --- a/gcc/config/loongarch/loongarch.h
> +++ b/gcc/config/loongarch/loongarch.h
> @@ -288,10 +288,12 @@ along with GCC; see the file COPYING3.  If not see
>  /* Define if loading short immediate values into registers sign extends.  */
>  #define SHORT_IMMEDIATES_SIGN_EXTEND 1
>  
> -/* The clz.{w/d} instructions have the natural values at 0.  */
> +/* The clz.{w/d}, ctz.{w/d} instructions have the natural values at 0.  */
>  
>  #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
>    ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
> +#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> +  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
>  
>  /* Standard register usage.  */
>  
> @@ -1239,8 +1241,3 @@ struct GTY (()) machine_function
>  
>  #define TARGET_EXPLICIT_RELOCS \
>    (la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS)
> -
> -#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> -  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
> -#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
> -  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Add C intrinsics for scalar crypto extension

2023-11-28 Thread Jeff Law




On 11/27/23 01:34, Liao Shihua wrote:

This patch adds C intrinsics for the scalar crypto extension.
Because riscv-c-api
(https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44/files) includes
the zbkb/zbkc/zbkx intrinsics in the bit manipulation extension, this patch
only supports the zkn*/zks* intrinsics.

gcc/ChangeLog:

 * config.gcc: Add riscv_crypto.h
 * config/riscv/riscv_crypto.h: New file.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zknd32.c: Use intrinsics instead of builtins.
 * gcc.target/riscv/zknd64.c: Likewise.
 * gcc.target/riscv/zkne32.c: Likewise.
 * gcc.target/riscv/zkne64.c: Likewise.
 * gcc.target/riscv/zknh-sha256-32.c: Likewise.
 * gcc.target/riscv/zknh-sha256-64.c: Likewise.
 * gcc.target/riscv/zknh-sha512-32.c: Likewise.
 * gcc.target/riscv/zknh-sha512-64.c: Likewise.
 * gcc.target/riscv/zksed32.c: Likewise.
 * gcc.target/riscv/zksed64.c: Likewise.
 * gcc.target/riscv/zksh32.c: Likewise.
 * gcc.target/riscv/zksh64.c: Likewise.
Last cycle we let a ton of vector intrinsics through after stage1 
closed.  I'm not keen to repeat that, but this looks pretty small and 
appears to just provide a mapping from the RV intrinsics to the builtin 
names within GCC.



I won't object to this one if Kito or Palmer want to see it go forward. 
I might object if more of these things get submitted later in 
stage3/stage4 :-)



It would be useful if future patches included "RISC-V" in the subject 
line.  Our Tuesday patchwork meeting focuses on patches with that tag in 
the subject line.  By using it you ensure it gets on the weekly agenda.


Thanks,
Jeff


[PATCH] i386: Fix CPUID of USER_MSR.

2023-11-28 Thread Hu, Lin1
Hi, all

This patch aims to fix the wrong CPUID of USER_MSR.  Its correct CPUID is
(0x7, 0x1).EDX[15], but I set it as (0x7, 0x0).EDX[15].  The patch also
modifies the testcase to give the user a better example.
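
For anyone wanting to check the bit by hand, the following is a minimal sketch, assuming GCC's <cpuid.h> and its __get_cpuid_count helper on x86. The function names are illustrative and are not part of cpuinfo.h.

```c
#include <stdint.h>

/* Bit position taken from the description above: (0x7, 0x1).EDX[15].  */
#define USER_MSR_EDX_BIT 15

/* Pure helper: test the USER_MSR bit in an EDX value that was read
   from CPUID leaf 7, subleaf 1.  */
int
edx_has_user_msr (uint32_t edx_leaf7_sub1)
{
  return (edx_leaf7_sub1 >> USER_MSR_EDX_BIT) & 1;
}

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
#include <cpuid.h>

/* Query the running CPU.  Returns 0 when leaf 7 is unavailable.
   A stricter check would first confirm that subleaf 1 exists, using
   the maximum subleaf reported in EAX of leaf 7, subleaf 0.  */
int
cpu_has_user_msr (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 1, &eax, &ebx, &ecx, &edx))
    return 0;
  return edx_has_user_msr (edx);
}
#endif
```

Reading the same bit from subleaf 0 instead of subleaf 1 tests an unrelated feature flag, which is the mistake this patch corrects.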

It has been bootstrapped and regtested on x86-64-pc-linux-gnu, OK for trunk?

BR,
Lin

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features): Move USER_MSR
to the correct location.

gcc/testsuite/ChangeLog:

* gcc.target/i386/user_msr-1.c: Correct the MSR index to give the user
a proper example.
---
 gcc/common/config/i386/cpuinfo.h   | 4 ++--
 gcc/testsuite/gcc.target/i386/user_msr-1.c | 9 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index f90fb4d56a2..a1eb285daed 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -861,8 +861,6 @@ get_available_features (struct __processor_model *cpu_model,
set_feature (FEATURE_IBT);
   if (edx & bit_UINTR)
set_feature (FEATURE_UINTR);
-  if (edx & bit_USER_MSR)
-   set_feature (FEATURE_USER_MSR);
   if (amx_usable)
{
  if (edx & bit_AMX_TILE)
@@ -921,6 +919,8 @@ get_available_features (struct __processor_model *cpu_model,
set_feature (FEATURE_PREFETCHI);
  if (eax & bit_RAOINT)
set_feature (FEATURE_RAOINT);
+ if (edx & bit_USER_MSR)
+   set_feature (FEATURE_USER_MSR);
  if (avx_usable)
{
  if (eax & bit_AVXVNNI)
diff --git a/gcc/testsuite/gcc.target/i386/user_msr-1.c 
b/gcc/testsuite/gcc.target/i386/user_msr-1.c
index 447852306df..f315016d088 100644
--- a/gcc/testsuite/gcc.target/i386/user_msr-1.c
+++ b/gcc/testsuite/gcc.target/i386/user_msr-1.c
@@ -1,9 +1,9 @@
 /* { dg-do compile { target { ! ia32  }  }  } */
 /* { dg-options "-musermsr -O2"  } */
 /* { dg-final { scan-assembler-times "urdmsr\[ \\t\]\\%r\[a-z\]x, 
\\%r\[a-z\]x" 1  }  } */
-/* { dg-final { scan-assembler-times "urdmsr\[ \\t\]\\\$121" 1  }  } */
+/* { dg-final { scan-assembler-times "urdmsr\[ \\t\]\\\$6912" 1  }  } */
 /* { dg-final { scan-assembler-times "uwrmsr\[ \\t\]\\%r\[a-z\]x, 
\\%r\[a-z\]x" 1  }  } */
-/* { dg-final { scan-assembler-times "uwrmsr\[ \\t\]\\%r\[a-z\]x, \\\$121" 1  
}  } */
+/* { dg-final { scan-assembler-times "uwrmsr\[ \\t\]\\%r\[a-z\]x, \\\$6912" 1  
}  } */
 
 #include 
 
@@ -13,8 +13,9 @@ volatile unsigned long long y;
 void extern
 user_msr_test (void)
 {
+  y = 6913;
   x = _urdmsr(y);
-  x = _urdmsr(121);
+  x = _urdmsr(6912);
   _uwrmsr(y, x);
-  _uwrmsr(121, x);
+  _uwrmsr(6912, x);
 }
-- 
2.31.1



Re: [PATCH 09/44] RISC-V: Rework branch costing model for if-conversion

2023-11-28 Thread Jeff Law




On 11/23/23 11:34, Maciej W. Rozycki wrote:

On Sun, 19 Nov 2023, Jeff Law wrote:


As I suspect you know a big part of the problem here is that BRANCH_COST and
rtx_cost don't have any common scale and thus trying to compare BRANCH_COST to
RTX_COST doesn't have well defined meaning.


  We do have preexisting places using COSTS_N_INSNS (BRANCH_COST ())
though, as documented in ifcvt.cc:

  ??? Actually, instead of the branch instruction costs we might want
  to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */

so it seems the right direction, and given that we expose this measure to
the user (and at the very least GCC developers implementing new tuning
microarchitectures) I think it's the only sane way to do branch costing:
define the measure in terms of how many ordinary ALU instructions a branch
is statistically equivalent to.
Oh, it probably is the only sane way at this point.  To do anything more 
sane we'd have to disassociate the BRANCH_COST uses during tree/gimple 
from those in RTL-land.


FWIW, I was looking at a regression with our internal tests after your 
changes.   It was quite nice to see how well twiddling -mbranch-cost 
correlated to how many instructions we would allow in a conditional move 
sequence.
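
To put that correlation in concrete terms, here is a toy model of the comparison being discussed. COSTS_N_INSNS uses the same scale as GCC's rtl.h, but the decision helper is a hypothetical simplification and is not the logic in ifcvt.cc.

```c
/* Same scale GCC uses in rtl.h: one simple instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical decision helper: allow the branchless sequence when its
   cost does not exceed the branch cost expressed in the same units.  */
int
if_conversion_profitable (int seq_cost, int branch_cost)
{
  return seq_cost <= COSTS_N_INSNS (branch_cost);
}
```

Under this model, raising -mbranch-cost by one admits one more ordinary ALU instruction in the converted sequence.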


The downside is it highlighted the gimple vs RTL use issue.  I'm 
confident that we would like to see a higher branch cost in the RTL 
phases for our uarch, but I'm much less comfortable with how that's 
going to change the decisions made in trees/gimple.  We'll have to 
investigate that at some depth.







WRT the extraneous zero-extension.  Isn't that arguably a bug in the scc
expander for risc-v?  Fixing that isn't a prerequisite here, but it probably
worth a bit of someone's time.


  I've looked at it already and it's the middle end that ends up with the
zero-extension, specifically `convert_move' invoked from `emit_cstore'
down the call to `noce_try_store_flag_mask', to widen the output from
`cstoredi4', so I don't think we can do anything in the backend to prevent
it from happening.  And neither I think we can do anything useful about
`cstoredi4' having a SImode output, as it's a pattern matched by name
rather than RTX, so we can't provide variants having a SImode and a DImode
output each both at a time, as that would cause a name clash.
We're actually tracking some of these extraneous extensions.  Do you 
happen to know if the zero-extended object happens to be (subreg:SI 
(reg:DI)) kind of construct?  That's the kind of thing we're chasing 
down right now from various points.  Vineet has already fixed one class 
of them.  Jivan and I are looking at others.


jeff


[Bug testsuite/112729] gcc.target/i386/apx-interrupt-1.c etc. FAIL

2023-11-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112729

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Hongyu Wang :

https://gcc.gnu.org/g:99fa0bfd63d97825c4221dcd3123940f1d0e6291

commit r14-5943-g99fa0bfd63d97825c4221dcd3123940f1d0e6291
Author: Hongyu Wang 
Date:   Tue Nov 28 11:24:01 2023 +0800

[i386] Fix push2pop2 test fail on non-linux target [PR112729]

On Linux x86-64, -fomit-frame-pointer is enabled by default, so the CFI
scans in the push2pop2 tests are based on it. On other targets that default
to -fno-omit-frame-pointer, the CFI scans fail because the frame pointer
is pushed first. Add -fomit-frame-pointer to the tests that rely on
CFI scans.

gcc/testsuite/ChangeLog:

PR target/112729
* gcc.target/i386/apx-interrupt-1.c: Add -fomit-frame-pointer.
* gcc.target/i386/apx-push2pop2-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.

Re: [PATCH] [i386] Fix push2pop2 test fail on non-linux target [PR112729]

2023-11-28 Thread Hongtao Liu
On Tue, Nov 28, 2023 at 9:51 PM Hongyu Wang  wrote:
>
> Hi,
>
> On Linux x86-64, -fomit-frame-pointer is enabled by default, so the CFI
> scans in the push2pop2 tests are based on it. On other targets that default
> to -fno-omit-frame-pointer, the CFI scans fail because the frame pointer
> is pushed first. Add -fomit-frame-pointer to the tests that rely on
> CFI scans.
>
> OK for master?
Ok.
>
> gcc/testsuite/ChangeLog:
>
> PR target/112729
> * gcc.target/i386/apx-interrupt-1.c: Add -fomit-frame-pointer.
> * gcc.target/i386/apx-push2pop2-1.c: Likewise.
> * gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/apx-interrupt-1.c| 2 +-
>  gcc/testsuite/gcc.target/i386/apx-push2pop2-1.c| 2 +-
>  gcc/testsuite/gcc.target/i386/apx-push2pop2_force_drap-1.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c 
> b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> index ffcb8fce71c..6844e574d00 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { ! ia32 } } } */
> -/* { dg-options "-mapx-features=egpr -m64 -O2 -mgeneral-regs-only -mno-cld 
> -mno-push-args -maccumulate-outgoing-args" } */
> +/* { dg-options "-mapx-features=egpr -m64 -O2 -mgeneral-regs-only -mno-cld 
> -mno-push-args -maccumulate-outgoing-args -fomit-frame-pointer" } */
>  /* { dg-skip-if "does not emit .cfi_xxx" "*-*-darwin*" } */
>
>  extern void foo (void *) __attribute__ ((interrupt));
> diff --git a/gcc/testsuite/gcc.target/i386/apx-push2pop2-1.c 
> b/gcc/testsuite/gcc.target/i386/apx-push2pop2-1.c
> index d78c96d36a3..5f43b42e33f 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-push2pop2-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-push2pop2-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { ! ia32 } } } */
> -/* { dg-options "-O2 -mapx-features=push2pop2" } */
> +/* { dg-options "-O2 -mapx-features=push2pop2 -fomit-frame-pointer" } */
>  /* { dg-skip-if "does not emit .cfi_xxx" "*-*-darwin*" } */
>
>  extern int bar (int);
> diff --git a/gcc/testsuite/gcc.target/i386/apx-push2pop2_force_drap-1.c 
> b/gcc/testsuite/gcc.target/i386/apx-push2pop2_force_drap-1.c
> index 3cac7b10769..4e2259f0c99 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-push2pop2_force_drap-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-push2pop2_force_drap-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { ! ia32 } } } */
> -/* { dg-options "-O2 -mapx-features=push2pop2 -mforce-drap" } */
> +/* { dg-options "-O2 -mapx-features=push2pop2 -fomit-frame-pointer 
> -mforce-drap" } */
>  /* { dg-skip-if "does not emit .cfi_xxx" "*-*-darwin*" } */
>
>  #include "apx-push2pop2-1.c"
> --
> 2.31.1
>


-- 
BR,
Hongtao


[Bug target/112757] RISC-V regression testsuite errors with rv32gcv_zvl1024b

2023-11-28 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112757

--- Comment #1 from Patrick O'Neill  ---
See also:
rv32_zvl128b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112754
rv32_zvl256b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112755
rv32_zvl512b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112756
rv32_zvl1024b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112757

[Bug target/112755] RISC-V regression testsuite errors with rv32gcv_zvl256b

2023-11-28 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112755

--- Comment #1 from Patrick O'Neill  ---
See also:
rv32_zvl128b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112754
rv32_zvl256b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112755
rv32_zvl512b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112756
rv32_zvl1024b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112757

[Bug target/112756] RISC-V regression testsuite errors with rv32gcv_zvl512b

2023-11-28 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112756

--- Comment #1 from Patrick O'Neill  ---
See also:
rv32_zvl128b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112754
rv32_zvl256b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112755
rv32_zvl512b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112756
rv32_zvl1024b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112757

[Bug target/112754] RISC-V regression testsuite errors with rv32gcv_zvl128b

2023-11-28 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112754

--- Comment #1 from Patrick O'Neill  ---
See also:
rv32_zvl128b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112754
rv32_zvl256b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112755
rv32_zvl512b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112756
rv32_zvl1024b: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112757

[Bug target/112757] New: RISC-V regression testsuite errors with rv32gcv_zvl1024b

2023-11-28 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112757

            Bug ID: 112757
           Summary: RISC-V regression testsuite errors with
                    rv32gcv_zvl1024b
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: patrick at rivosinc dot com
  Target Milestone: ---

Created attachment 56713
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56713&action=edit
rv32gcv_zvl1024b testsuite failures 2023-11-27

Current testsuite status of rv32gcv_zvl1024b on GCC
ad3e759c172272f6f2ba66631e7e7bd03fb2b436

I've started running rv32 zvl variants 128-1024b weekly on the postcommit CI.

Artifacts for this run can be downloaded here:
https://github.com/patrick-rivos/gcc-postcommit-ci/actions/runs/7012181120

This is just a tracking issue, similar to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

I've attached the current results for rv32gcv_zvl1024b with glibc v2.37 on QEMU
v8.1.2

[Bug target/112756] New: RISC-V regression testsuite errors with rv32gcv_zvl512b

2023-11-28 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112756

            Bug ID: 112756
           Summary: RISC-V regression testsuite errors with
                    rv32gcv_zvl512b
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: patrick at rivosinc dot com
  Target Milestone: ---

Created attachment 56712
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56712&action=edit
rv32gcv_zvl512b testsuite failures 2023-11-27

Current testsuite status of rv32gcv_zvl512b on GCC
ad3e759c172272f6f2ba66631e7e7bd03fb2b436

I've started running rv32 zvl variants 128-1024b weekly on the postcommit CI.

Artifacts for this run can be downloaded here:
https://github.com/patrick-rivos/gcc-postcommit-ci/actions/runs/7012181120

This is just a tracking issue, similar to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

I've attached the current results for rv32gcv_zvl512b with glibc v2.37 on QEMU
v8.1.2

[Bug target/112755] New: RISC-V regression testsuite errors with rv32gcv_zvl256b

2023-11-28 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112755

            Bug ID: 112755
           Summary: RISC-V regression testsuite errors with
                    rv32gcv_zvl256b
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: patrick at rivosinc dot com
  Target Milestone: ---

Created attachment 56711
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56711&action=edit
rv32gcv_zvl256b testsuite failures 2023-11-27

Current testsuite status of rv32gcv_zvl256b on GCC
ad3e759c172272f6f2ba66631e7e7bd03fb2b436

I've started running rv32 zvl variants 128-1024b weekly on the postcommit CI.

Artifacts for this run can be downloaded here:
https://github.com/patrick-rivos/gcc-postcommit-ci/actions/runs/7012181120

This is just a tracking issue, similar to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

I've attached the current results for rv32gcv_zvl256b with glibc v2.37 on QEMU
v8.1.2

[Bug target/112754] New: RISC-V regression testsuite errors with rv32gcv_zvl128b

2023-11-28 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112754

            Bug ID: 112754
           Summary: RISC-V regression testsuite errors with
                    rv32gcv_zvl128b
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: patrick at rivosinc dot com
  Target Milestone: ---

Created attachment 56710
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56710&action=edit
rv32gcv_zvl128b testsuite failures 2023-11-27

Current testsuite status of rv32gcv_zvl128b on GCC
ad3e759c172272f6f2ba66631e7e7bd03fb2b436

I've started running rv32 zvl variants 128-1024b weekly on the postcommit CI.
This is my first time running these so if any of the failures look odd poke me
here or via email and I can dig into/share the logs.

Artifacts for this run can be downloaded here:
https://github.com/patrick-rivos/gcc-postcommit-ci/actions/runs/7012181120

This is just a tracking issue, similar to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

I'll open separate tracking issues for zvl 256,512,1024 and link them here when
I do.

I've attached the current results for rv32gcv_zvl128b with glibc v2.37 on QEMU
v8.1.2

Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-28 Thread Jeff Law




On 11/28/23 15:18, Jivan Hakobyan wrote:

The amdgcn ICE I reported still exists:


Can you send a build command to reproduce ICE.
I built on x86-64, RV32/64, and did not get any faults.
The code is clearly wrong though.  We need to test that we have a subreg 
before we look at the subreg_byte.  I fixed one of those elsewhere, this 
may ultimately be a paste-o.  Anyway, I'll fix it for V3.

jeff



Re: [PATCH V3 6/6] aarch64: Add system register duplication check selftest

2023-11-28 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Add a build-time test to check whether system register data, as
> imported from `aarch64-sys-regs.def', has any duplicate entries.
>
> Duplicate entries are defined as any two SYSREG entries in the .def
> file which share the same encoding values (as specified by its `CPENC'
> field) and where the relationship amongst the two does not fit into
> one of the following categories:
>
>   * Simple aliasing: In some cases, it is observed that one
>   register name serves as an alias to another.  One example of
>   this is where TRCEXTINSELR aliases TRCEXTINSELR0.
>   * Expressing intent: It is possible that when a given register
>   serves two distinct functions depending on how it is used, it
>   is given two distinct names whose use should match the context
>   under which it is being used.  Example:  Debug Data Transfer
>   Register. When used to receive data, it should be accessed as
>   DBGDTRRX_EL0 while when transmitting data it should be
>   accessed via DBGDTRTX_EL0.
>   * Register deprecation: Some register names have been
>   deprecated and should no longer be used, but backwards-
>   compatibility requires that such names continue to be
>   recognized, as is the case for the SPSR_EL1 register, whose
>   access via the SPSR_SVC name is now deprecated.
>   * Same encoding different target: Some encodings are given
>   different meaning depending on the target architecture and, as
>   such, are given different names in each of these contexts.
>   We see an example of this for CPENC(3,4,2,0,0), which
>   corresponds to TTBR0_EL2 for Armv8-A targets and VSCTLR_EL2
>   in Armv8-R targets.
>
> A consequence of these observations is that `CPENC' duplication is
> acceptable iff at least one of the `properties' or `arch_reqs' fields
> of the `sysreg_t' structs associated with the two registers in
> question differ and it's this condition that is checked by the new
> `aarch64_test_sysreg_encoding_clashes' function.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc
>   (aarch64_test_sysreg_encoding_clashes): New.
>   (aarch64_run_selftests): add call to
>   aarch64_test_sysreg_encoding_clashes selftest.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64.cc | 44 +++
>  1 file changed, 44 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index eaeab0be436..c0d75f167be 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22,6 +22,7 @@
>  
>  #define INCLUDE_STRING
>  #define INCLUDE_ALGORITHM
> +#define INCLUDE_VECTOR
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -28390,6 +28391,48 @@ aarch64_test_fractional_cost ()
>ASSERT_EQ (cf (1, 2).as_double (), 0.5);
>  }
>  
> +/* Calculate whether our system register data, as imported from
> +   `aarch64-sys-regs.def', has any duplicate entries.  */
> +static void
> +aarch64_test_sysreg_encoding_clashes (void)
> +{
> +  using dup_instances_t = hash_map<nofree_string_hash,
> +                                   std::vector<const sysreg_t*>>;
> +
> +  dup_instances_t duplicate_instances;
> +
> +  /* Every time an encoding is established to come up more than once
> + we add it to a "clash-analysis queue", which is then used to extract
> + necessary information from our hash map when establishing whether
> + repeated encodings are valid.  */
> +
> +  /* 1) Collect recurrence information.  */
> +  for (unsigned i = 0; i < nsysreg; i++)
> +{
> +  const sysreg_t *reg = sysreg_structs + i;
> +
> +  std::vector<const sysreg_t*> *tmp
> + = &duplicate_instances.get_or_insert (reg->encoding);
> +
> +  tmp->push_back (reg);
> +}
> +
> +  /* 2) Carry out analysis on collected data.  */
> +  for (auto instance : duplicate_instances)
> +{
> +  unsigned nrep = instance.second.size ();
> +  if (nrep > 1)
> + for (unsigned i = 0; i < nrep; i++)
> +   for (unsigned j = i + 1; j < nrep; j++)
> + {
> +   const sysreg_t *a = instance.second[i];
> +   const sysreg_t *b = instance.second[j];
> +   ASSERT_TRUE ((a->properties != b->properties)
> +|| (a->arch_reqs != b->arch_reqs));
> + }
> +}
> +}
> +
>  /* Run all target-specific selftests.  */
>  
>  static void
> @@ -28397,6 +28440,7 @@ aarch64_run_selftests (void)
>  {
>aarch64_test_loading_full_dump ();
>aarch64_test_fractional_cost ();
> +  aarch64_test_sysreg_encoding_clashes ();
>  }
>  
>  } // namespace selftest
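
The acceptance rule from the commit message above (two entries may share an
encoding only if their `properties' or `arch_reqs' fields differ) can be
sketched in plain C outside of GCC.  This is only an illustration under
stated assumptions: struct toy_sysreg and count_encoding_clashes are
hypothetical names, and a quadratic strcmp scan stands in for the hash_map
used by the patch.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for GCC's sysreg_t.  */
struct toy_sysreg
{
  const char *name;
  const char *encoding;   /* Stringified encoding, e.g. "s3_4_c2_c0_0".  */
  unsigned properties;    /* Usage flags (read-only, alias, ...).  */
  unsigned arch_reqs;     /* Required architectural features.  */
};

/* Count the pairs of entries that share an encoding while having
   identical properties and identical arch_reqs.  Such pairs are the
   ones the selftest would reject; any other repeated encoding is
   treated as an intentional alias.  */
static unsigned
count_encoding_clashes (const struct toy_sysreg *regs, unsigned n)
{
  unsigned clashes = 0;

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = i + 1; j < n; j++)
      if (strcmp (regs[i].encoding, regs[j].encoding) == 0
          && regs[i].properties == regs[j].properties
          && regs[i].arch_reqs == regs[j].arch_reqs)
        clashes++;

  return clashes;
}
```

With this sketch, two entries that share "s3_4_c2_c0_0" but carry different
arch_reqs values produce no clash, whereas an exact duplicate of either one
is counted once.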


Re: [PATCH V3 1/6] aarch64: Sync system register information with Binutils

2023-11-28 Thread Richard Sandiford
Victor Do Nascimento  writes:
> This patch adds the `aarch64-sys-regs.def' file, originally written
> for Binutils, to GCC. In so doing, it provides GCC with the necessary
> information for teaching the compiler about system registers known to
> the assembler and how these can be used.
>
> By aligning the representation of data common to different parts of
> the toolchain we can greatly reduce the duplication of work,
> facilitating the maintenance of the aarch64 back-end across different
> parts of the toolchain.  By keeping both copies of the file in sync,
> any `SYSREG (...)' that is added in one project is automatically added
> to its counterpart.  This being the case, no change should be made in
> the GCC copy of the file.  Any modifications should first be made in
> Binutils and the resulting file copied over to GCC.
>
> GCC does not implement the full range of ISA flags present in
> Binutils.  Where this is the case, aliases must be added to aarch64.h
> with the unknown architectural extension being mapped to its
> associated base architecture, such that any flag present in Binutils
> and used in system register definitions is understood in GCC.  Again,
> this is done such that flags can be used interchangeably between
> projects making use of the aarch64-sys-regs.def file.  This is done
> in the next patch in the series.
>
> `.arch' directives missing from the emitted assembly files as a
> consequence of this aliasing are accounted for by the compiler using
> the S<op0>_<op1>_<Cn>_<Cm>_<op2> encoding of system registers when
> issuing mrs/msr instructions.  This design choice ensures the
> assembler will accept anything that was deemed acceptable by the
> compiler.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sys-regs.def: New.
> ---
>  gcc/config/aarch64/aarch64-sys-regs.def | 1064 +++
>  1 file changed, 1064 insertions(+)
>  create mode 100644 gcc/config/aarch64/aarch64-sys-regs.def

OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
> b/gcc/config/aarch64/aarch64-sys-regs.def
> new file mode 100644
> index 000..d24a2455503
> --- /dev/null
> +++ b/gcc/config/aarch64/aarch64-sys-regs.def
> @@ -0,0 +1,1064 @@
> +/* aarch64-system-regs.def -- AArch64 opcode support.
> +   Copyright (C) 2009-2023 Free Software Foundation, Inc.
> +   Contributed by ARM Ltd.
> +
> +   This file is part of the GNU opcodes library.
> +
> +   This library is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   It is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program; see the file COPYING3.  If not,
> +   see .  */
> +
> +/* Array of system registers and their associated arch features.
> +
> +   This file is also used by GCC.  Where necessary, any updates should
> +   be made in Binutils and the updated file copied across to GCC, such
> +   that the two projects are kept in sync at all times.
> +
> +   Before using #include to read this file, define a macro:
> +
> + SYSREG (name, encoding, flags, features)
> +
> +  The NAME is the system register name, as recognized by the
> +  assembler.  ENCODING provides the necessary information for the binary
> +  encoding of the system register.  The FLAGS field is a bitmask of
> +  relevant behavior information pertaining to the particular register.
> +  For example: is it read/write-only? does it alias another register?
> +  The FEATURES field maps onto ISA flags and specifies the architectural
> +  feature requirements of the system register.  */
> +
> +  SYSREG ("accdata_el1", CPENC (3,0,13,0,5), 0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("actlr_el1",   CPENC (3,0,1,0,1),  0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("actlr_el2",   CPENC (3,4,1,0,1),  0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("actlr_el3",   CPENC (3,6,1,0,1),  0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("afsr0_el1",   CPENC (3,0,5,1,0),  0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("afsr0_el12",  CPENC (3,5,5,1,0),  F_ARCHEXT,  
> AARCH64_FEATURE (V8_1A))
> +  SYSREG ("afsr0_el2",   CPENC (3,4,5,1,0),  0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("afsr0_el3",   CPENC (3,6,5,1,0),  0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("afsr1_el1",   CPENC (3,0,5,1,1),  0,  
> 
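
The "define SYSREG, then #include the file" convention described in the
header comment above is the usual X-macro pattern.  Below is a minimal,
self-contained sketch of it.  The names struct toy_sysreg, TOY_SYSREG_LIST
and TOY_FL_V8_1A are hypothetical, the list macro stands in for the
#include of the .def file, and the CPENC stringification shown here is an
assumption about how an encoding string could be built, not a quote from
either project.

```c
#include <stddef.h>

/* Toy stand-ins for definitions that live in GCC and Binutils.  */
#define F_ARCHEXT (1u << 4)
#define TOY_FL_V8_1A (1u << 0)          /* Hypothetical feature bit.  */
#define AARCH64_NO_FEATURES 0u
#define AARCH64_FEATURE(FEAT) TOY_FL_##FEAT

/* Assumed stringification of an encoding tuple: (3,0,1,0,1) becomes
   "s3_0_c1_c0_1".  */
#define CPENC(SN, OP1, CN, CM, OP2) \
  "s" #SN "_" #OP1 "_c" #CN "_c" #CM "_" #OP2

/* Stands in for the contents of the .def file; the two entries are
   taken from the listing above.  */
#define TOY_SYSREG_LIST \
  SYSREG ("actlr_el1",  CPENC (3,0,1,0,1), 0,         AARCH64_NO_FEATURES) \
  SYSREG ("afsr0_el12", CPENC (3,5,5,1,0), F_ARCHEXT, AARCH64_FEATURE (V8_1A))

struct toy_sysreg
{
  const char *name;
  const char *encoding;
  unsigned flags;
  unsigned features;
};

/* The consumer decides what each SYSREG entry expands to; here every
   entry becomes one array initializer.  */
#define SYSREG(NAME, ENC, FLAGS, FEATURES) { NAME, ENC, FLAGS, FEATURES },
static const struct toy_sysreg toy_sysregs[] = { TOY_SYSREG_LIST };
#undef SYSREG

static const size_t toy_nsysreg = sizeof toy_sysregs / sizeof toy_sysregs[0];
```

Because the list only ever mentions SYSREG, a second consumer can redefine
the macro to emit something else (for example just the names) without
touching the list, which is what lets two projects share one file.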

[r14-5930 Regression] FAIL: gcc.c-torture/compile/libcall-2.c -Os (test for excess errors) on Linux/x86_64

2023-11-28 Thread haochen.jiang
On Linux/x86_64,

f31a019d1161ec78846473da743aedf49cca8c27 is the first bad commit
commit f31a019d1161ec78846473da743aedf49cca8c27
Author: Jose E. Marchesi 
Date:   Fri Nov 24 06:30:28 2023 +0100

Emit funcall external declarations only if actually used.

caused

FAIL: gcc.c-torture/compile/libcall-2.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/compile/libcall-2.c   -O1  (test for excess errors)
FAIL: gcc.c-torture/compile/libcall-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.c-torture/compile/libcall-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.c-torture/compile/libcall-2.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/compile/libcall-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/compile/libcall-2.c   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-5930/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="compile.exp=gcc.c-torture/compile/libcall-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="compile.exp=gcc.c-torture/compile/libcall-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email.  For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH V3 5/6] aarch64: Add front-end argument type checking for target builtins

2023-11-28 Thread Richard Sandiford
Victor Do Nascimento  writes:
> In implementing the ACLE read/write system register builtins it was
> observed that leaving argument type checking to be done at expand-time
> meant that poorly-formed function calls were being "fixed" by certain
> optimization passes, meaning bad code wasn't being properly picked up
> in checking.
>
> Example:
>
>   const char *regname = "amcgcr_el0";
>   long long a = __builtin_aarch64_rsr64 (regname);
>
> is reduced by the ccp1 pass to
>
>   long long a = __builtin_aarch64_rsr64 ("amcgcr_el0");
>
> As these functions require an argument of STRING_CST type, there needs
> to be a check carried out by the front-end capable of picking this up.
>
> The introduced `check_general_builtin_call' function will be called by
> the TARGET_CHECK_BUILTIN_CALL hook whenever a call to a builtin
> belonging to the AARCH64_BUILTIN_GENERAL category is encountered,
> carrying out any appropriate checks associated with a particular
> builtin function code.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (check_general_builtin_call):
>   New.
>   * config/aarch64/aarch64-c.cc (aarch64_check_builtin_call):
>   Add check_general_builtin_call call.
>   * config/aarch64/aarch64-protos.h (check_general_builtin_call):
>   New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/acle/rwsr-3.c: New.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc| 31 +++
>  gcc/config/aarch64/aarch64-c.cc   |  4 +--
>  gcc/config/aarch64/aarch64-protos.h   |  4 +++
>  .../gcc.target/aarch64/acle/rwsr-3.c  | 18 +++
>  4 files changed, 55 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-3.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index dd76cca611b..c5f20f68bca 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -2127,6 +2127,37 @@ aarch64_general_builtin_decl (unsigned code, bool)
>return aarch64_builtin_decls[code];
>  }
>  
> +bool
> +aarch64_check_general_builtin_call (location_t location, vec<location_t>,
> + unsigned int code, tree fndecl,
> + unsigned int nargs ATTRIBUTE_UNUSED, tree *args)

I'd prefer aarch64_general_check_builtin_call, to avoid breaking up
the name of the target hook.  "aarch64_general" is kind of a prefix here,
to distinguish it from aarch64_sve::

> +{
> +  switch (code)
> +{
> +case AARCH64_RSR:
> +case AARCH64_RSRP:
> +case AARCH64_RSR64:
> +case AARCH64_RSRF:
> +case AARCH64_RSRF64:
> +case AARCH64_WSR:
> +case AARCH64_WSRP:
> +case AARCH64_WSR64:
> +case AARCH64_WSRF:
> +case AARCH64_WSRF64:
> +  if (TREE_CODE (args[0]) != NOP_EXPR

It's probably best not to require a NOP_EXPR.  Let's just accept one
if it's there.  The easiest way of doing that is:

  {
    tree addr = STRIP_NOPS (args[0]);

and then checking "addr".

> +   || TREE_CODE (TREE_TYPE (args[0])) != POINTER_TYPE
> +   || (TREE_CODE (TREE_OPERAND (TREE_OPERAND (args[0], 0) , 0))
> +   != STRING_CST))

We need to check what TREE_OPERAND (args[0], 0) is before using
TREE_OPERAND (TREE_OPERAND (args[0], 0), 0).  I assume it's checking
for an ADDR_EXPR.  (Also, minor formatting nit, but there should be
no space before ", 0".)

Looks good otherwise, thanks.

Richard

> + {
> +   error_at (location, "first argument to %qD must be a string literal",
> + fndecl);
> +   return false;
> + }
> +}
> +  /* Default behavior.  */
> +  return true;
> +}
> +
>  typedef enum
>  {
>SIMD_ARG_COPY_TO_REG,
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index ab8844f6049..be8b7236cf9 100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -339,8 +339,8 @@ aarch64_check_builtin_call (location_t loc, vec<location_t> arg_loc,
>switch (code & AARCH64_BUILTIN_CLASS)
>  {
>  case AARCH64_BUILTIN_GENERAL:
> -  return true;
> -
> +  return aarch64_check_general_builtin_call (loc, arg_loc, subcode,
> +  orig_fndecl, nargs, args);
>  case AARCH64_BUILTIN_SVE:
>return aarch64_sve::check_builtin_call (loc, arg_loc, subcode,
> orig_fndecl, nargs, args);
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 5d6a1e75700..dbd486cfea4 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -990,6 +990,10 @@ tree aarch64_general_builtin_rsqrt (unsigned int);
>  void handle_arm_acle_h (void);
>  void handle_arm_neon_h (void);
>  
> +bool aarch64_check_general_builtin_call (location_t, vec<location_t>,
> +  unsigned int, tree, unsigned int,
> + 

Re: [PATCH V3 4/6] aarch64: Implement system register r/w arm ACLE intrinsic functions

2023-11-28 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Implement the aarch64 intrinsics for reading and writing system
> registers with the following signatures:
>
>   uint32_t __arm_rsr(const char *special_register);
>   uint64_t __arm_rsr64(const char *special_register);
>   void* __arm_rsrp(const char *special_register);
>   float __arm_rsrf(const char *special_register);
>   double __arm_rsrf64(const char *special_register);
>   void __arm_wsr(const char *special_register, uint32_t value);
>   void __arm_wsr64(const char *special_register, uint64_t value);
>   void __arm_wsrp(const char *special_register, const void *value);
>   void __arm_wsrf(const char *special_register, float value);
>   void __arm_wsrf64(const char *special_register, double value);
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
>   Add enums for new builtins.
>   (aarch64_init_rwsr_builtins): New.
>   (aarch64_general_init_builtins): Call aarch64_init_rwsr_builtins.
>   (aarch64_expand_rwsr_builtin):  New.
>   (aarch64_general_expand_builtin): Call aarch64_expand_rwsr_builtin.
>   * config/aarch64/aarch64.md (read_sysregdi): New insn_and_split.
>   (write_sysregdi): Likewise.
>   * config/aarch64/arm_acle.h (__arm_rsr): New.
>   (__arm_rsrp): Likewise.
>   (__arm_rsr64): Likewise.
>   (__arm_rsrf): Likewise.
>   (__arm_rsrf64): Likewise.
>   (__arm_wsr): Likewise.
>   (__arm_wsrp): Likewise.
>   (__arm_wsr64): Likewise.
>   (__arm_wsrf): Likewise.
>   (__arm_wsrf64): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/acle/rwsr.c: New.
>   * gcc.target/aarch64/acle/rwsr-1.c: Likewise.
>   * gcc.target/aarch64/acle/rwsr-2.c: Likewise.
>   * gcc.dg/pch/rwsr-pch.c: Likewise.
>   * gcc.dg/pch/rwsr-pch.hs: Likewise.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc| 191 ++
>  gcc/config/aarch64/aarch64.md |  18 ++
>  gcc/config/aarch64/arm_acle.h |  30 +++
>  gcc/testsuite/gcc.dg/pch/rwsr-pch.c   |   7 +
>  gcc/testsuite/gcc.dg/pch/rwsr-pch.hs  |  10 +
>  .../gcc.target/aarch64/acle/rwsr-1.c  |  29 +++
>  .../gcc.target/aarch64/acle/rwsr-2.c  |  25 +++
>  gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 144 +
>  8 files changed, 454 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.c
>  create mode 100644 gcc/testsuite/gcc.dg/pch/rwsr-pch.hs
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr-2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 04f59fd9a54..dd76cca611b 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -47,6 +47,7 @@
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "gimple-fold.h"
> +#include "builtins.h"
>  
>  #define v8qi_UP  E_V8QImode
>  #define v8di_UP  E_V8DImode
> @@ -808,6 +809,17 @@ enum aarch64_builtins
>AARCH64_RBIT,
>AARCH64_RBITL,
>AARCH64_RBITLL,
> +  /* System register builtins.  */
> +  AARCH64_RSR,
> +  AARCH64_RSRP,
> +  AARCH64_RSR64,
> +  AARCH64_RSRF,
> +  AARCH64_RSRF64,
> +  AARCH64_WSR,
> +  AARCH64_WSRP,
> +  AARCH64_WSR64,
> +  AARCH64_WSRF,
> +  AARCH64_WSRF64,
>AARCH64_BUILTIN_MAX
>  };
>  
> @@ -1798,6 +1810,65 @@ aarch64_init_rng_builtins (void)
>  AARCH64_BUILTIN_RNG_RNDRRS);
>  }
>  
> +/* Add builtins for reading system register.  */
> +static void
> +aarch64_init_rwsr_builtins (void)
> +{
> +  tree fntype = NULL;
> +  tree const_char_ptr_type
> += build_pointer_type (build_type_variant (char_type_node, true, false));
> +
> +#define AARCH64_INIT_RWSR_BUILTINS_DECL(F, N, T) \
> +  aarch64_builtin_decls[AARCH64_##F] \
> += aarch64_general_add_builtin ("__builtin_aarch64_"#N, T, AARCH64_##F);
> +
> +  fntype
> += build_function_type_list (uint32_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR, rsr, fntype);
> +
> +  fntype
> += build_function_type_list (ptr_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRP, rsrp, fntype);
> +
> +  fntype
> += build_function_type_list (uint64_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR64, rsr64, fntype);
> +
> +  fntype
> += build_function_type_list (float_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF, rsrf, fntype);
> +
> +  fntype
> += build_function_type_list (double_type_node, const_char_ptr_type, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
> +
> +  fntype
> += build_function_type_list (void_type_node, const_char_ptr_type,
> + 

[Bug rtl-optimization/55757] Suboptimal interrupt prologue/epilogue for ARMv7-M (Cortex-M3)

2023-11-28 Thread yann at poupet dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55757

Yann Poupet  changed:

   What|Removed |Added

 CC||yann at poupet dot eu

--- Comment #8 from Yann Poupet  ---
Hi,

I had the same issue and modified GCC so that the prologue/epilogue do not
save/restore R4-R11 if it's not required - assuming these are caller-saved
(same effect as -fcall-used-[r4...r11]), with a new function attribute. I'm not
sure if this has a chance to be accepted upstream though.

I'm using it in 2 cases:

- ISRs, as described above
- tasks launched by my own homemade microkernel. Indeed, when the
kernel starts a task at its entry function, there's no need to save any
register; it's just a waste of stack space.

Is anyone still interested in a solution?
The patch is very small, maybe 10 lines.

Cheers
Yann

Re: [PATCH V3 2/6] aarch64: Add support for aarch64-sys-regs.def

2023-11-28 Thread Richard Sandiford
Victor Do Nascimento  writes:
> This patch defines the structure of a new .def file used for
> representing the aarch64 system registers, what information it should
> hold and the basic framework in GCC to process this file.
>
> Entries in the aarch64-sys-regs.def file should be as follows:
>
>   SYSREG (NAME, CPENC (sn,op1,cn,cm,op2), FLAG1 | ... | FLAGn, ARCH)
>
> Where the arguments to SYSREG correspond to:
>   - NAME:  The system register name, as used in the assembly language.
>   - CPENC: The system register encoding, mapping to:
>
>  s<sn>_<op1>_c<cn>_c<cm>_<op2>
>
>   - FLAG: The entries in the FLAGS field are bitwise-OR'd together to
> encode extra information required to ensure proper use of
> the system register.  For example, a read-only system
> register will have the flag F_REG_READ, while write-only
> registers will be labeled F_REG_WRITE.  Such flags are
> tested against at compile-time.
>   - ARCH: The architectural features the system register is associated
> with.  This is encoded via one of three possible macros:
> 1. When a system register is universally implemented, we say
> it has no feature requirements, so we tag it with the
> AARCH64_NO_FEATURES macro.
> 2. When a register is only implemented for a single
> architectural extension EXT, the AARCH64_FEATURE (EXT) macro is
> used.
> 3. When a given system register is made available by any of N
> possible architectural extensions, the AARCH64_FEATURES(N, ...)
> macro is used to combine them accordingly.
>
> In order to enable proper interpretation of the SYSREG entries by the
> compiler, flags defining system register behavior such as `F_REG_READ'
> and `F_REG_WRITE' are also defined here, so they can later be used for
> the validation of system register properties.
>
> Finally, any architectural feature flags from Binutils missing from GCC
> have appropriate aliases defined here so as to ensure
> cross-compatibility of SYSREG entries across the toolchain.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (sysreg_t): New.
>   (sysreg_structs): Likewise.
>   (nsysreg): Likewise.
>   (AARCH64_FEATURE): Likewise.
>   (AARCH64_FEATURES): Likewise.
>   (AARCH64_NO_FEATURES): Likewise.
>   * config/aarch64/aarch64.h (AARCH64_ISA_V8A): Add missing
>   ISA flag.
>   (AARCH64_ISA_V8_1A): Likewise.
>   (AARCH64_ISA_V8_7A): Likewise.
>   (AARCH64_ISA_V8_8A): Likewise.
>   (AARCH64_NO_FEATURES): Likewise.
>   (AARCH64_FL_RAS): New ISA flag alias.
>   (AARCH64_FL_LOR): Likewise.
>   (AARCH64_FL_PAN): Likewise.
>   (AARCH64_FL_AMU): Likewise.
>   (AARCH64_FL_SCXTNUM): Likewise.
>   (AARCH64_FL_ID_PFR2): Likewise.
>   (F_DEPRECATED): New.
>   (F_REG_READ): Likewise.
>   (F_REG_WRITE): Likewise.
>   (F_ARCHEXT): Likewise.
>   (F_REG_ALIAS): Likewise.
> ---
>  gcc/config/aarch64/aarch64.cc | 53 +++
>  gcc/config/aarch64/aarch64.h  | 22 +++
>  2 files changed, 75 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5fd7063663c..a4a9e2e51ea 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2806,6 +2806,59 @@ static const struct processor all_cores[] =
> feature_deps::V8A ().enable, _tunings},
>{NULL, aarch64_none, aarch64_none, aarch64_no_arch, 0, NULL}
>  };
> +/* Internal representation of system registers.  */
> +typedef struct {
> +  const char *name;
> +  /* Stringified sysreg encoding values, represented as
> + s<sn>_<op1>_c<cn>_c<cm>_<op2>.  */
> +  const char *encoding;
> +  /* Flags affecting sysreg usage, such as read/write-only.  */
> +  unsigned properties;
> +  /* Architectural features implied by sysreg.  */
> +  aarch64_feature_flags arch_reqs;
> +} sysreg_t;
> +
> +/* An aarch64_feature_set initializer for a single feature,
> +   AARCH64_FEATURE_<FEAT>.  */
> +#define AARCH64_FEATURE(FEAT) AARCH64_FL_##FEAT
> +
> +/* Used by AARCH64_FEATURES.  */
> +#define AARCH64_OR_FEATURES_1(X, F1) \
> +  AARCH64_FEATURE (F1)
> +#define AARCH64_OR_FEATURES_2(X, F1, F2) \
> +  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_1 (X, F2))
> +#define AARCH64_OR_FEATURES_3(X, F1, ...) \
> +  (AARCH64_FEATURE (F1) | AARCH64_OR_FEATURES_2 (X, __VA_ARGS__))
> +
> +/* An aarch64_feature_set initializer for the N features listed in "...".  */
> +#define AARCH64_FEATURES(N, ...) \
> +  AARCH64_OR_FEATURES_##N (0, __VA_ARGS__)
> +
> +#define AARCH64_NO_FEATURES 0
> +
> +/* Flags associated with the properties of system registers.  It mainly
> +   serves to mark particular registers as read or write only.  */
> +#define F_DEPRECATED (1 << 1)
> +#define F_REG_READ  (1 << 2)
> +#define F_REG_WRITE (1 << 3)
> +#define F_ARCHEXT   (1 << 4)
> +/* Flag indicating register name is alias for another system 

[Bug tree-optimization/112752] `~a - MIN<~b, ~c>` is not optimized to `MAX<b,c> - a`

2023-11-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112752

--- Comment #2 from Andrew Pinski  ---
Here is a semi-reduced testcase:
```
#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))

byte hh(byte r, byte g, byte b) {
  byte c = 255 - r;
  byte m = 255 - g;
  byte y = 255 - b;
  byte tmp = MIN(m, y);
  byte k = MIN(c, tmp);
  return m - k;
}
```
Note that matching the above directly might not fix the original testcase.
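
For reference, the equivalence for this reduced function can be checked exhaustively over all byte inputs. The sketch below is an illustration, not part of the report: `hh_max` and `hh_check_all` are hypothetical names, and `hh_max` is simply the MAX-based form that the bug title describes.

```c
#include <assert.h>

#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))
#define MAX(a, b) ((a) < (b)?(b):(a))

/* The semi-reduced testcase, unchanged.  */
static byte hh(byte r, byte g, byte b) {
  byte c = 255 - r;
  byte m = 255 - g;
  byte y = 255 - b;
  byte tmp = MIN(m, y);
  byte k = MIN(c, tmp);
  return m - k;
}

/* Hypothetical MAX-based form: ~g - MIN(~r, ~g, ~b) == MAX(r, g, b) - g.  */
static byte hh_max(byte r, byte g, byte b) {
  byte tmp = MAX(g, b);
  byte k = MAX(r, tmp);
  return k - g;
}

/* Compare both forms on all 256 * 256 * 256 inputs.  */
static void hh_check_all(void) {
  for (int r = 0; r < 256; r++)
    for (int g = 0; g < 256; g++)
      for (int b = 0; b < 256; b++)
        assert(hh((byte) r, (byte) g, (byte) b)
               == hh_max((byte) r, (byte) g, (byte) b));
}
```

Calling `hh_check_all()` once from a test driver covers every input; the MAX form needs no bitwise-not operations.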

Re: [PATCH V3 3/6] aarch64: Implement system register validation tools

2023-11-28 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Given the implementation of a mechanism of encoding system registers
> into GCC, this patch provides the mechanism of validating their use by
> the compiler.  In particular, this involves:
>
>   1. Ensuring a supplied string corresponds to a known system
>  register name.  System registers can be accessed either via their
>  name (e.g. `SPSR_EL1') or their encoding (e.g. `S3_0_C4_C0_0').
>  Register names are validated using a hash map, mapping known
>  system register names to its corresponding `sysreg_t' struct,
>  which is populated from the `aarch64_system_regs.def' file.
>  Register name validation is done via `lookup_sysreg_map', while
>  the encoding naming convention is validated via a parser
>  implemented in this patch - `is_implem_def_reg'.
>   2. Once a given register name is deemed to be valid, it is checked
>  against a further 2 criteria:
>      a. Is the referenced register implemented in the target
>         architecture?  This is achieved by comparing the ARCH field
>         in the relevant SYSREG entry from `aarch64_system_regs.def'
>         against `aarch64_feature_flags' flags set at compile-time.
>      b. Is the register being used correctly?  Check the requested
>         operation against the FLAGS specified in SYSREG.
>         This prevents operations like writing to a read-only system
>         register.
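The two-stage check described above (known name first, then feature and access checks) might look roughly like this in standalone C. Everything named `demo_*` is hypothetical, a linear search stands in for the hash map, and two points are assumptions: that F_REG_READ means "read-only" and F_REG_WRITE "write-only", and that every required feature bit must be enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define F_REG_READ  (1u << 2)  /* assumed: register can only be read */
#define F_REG_WRITE (1u << 3)  /* assumed: register can only be written */

typedef struct {
  const char *name;
  unsigned properties;  /* F_REG_* flags */
  unsigned arch_reqs;   /* required feature bits, 0 if none */
} demo_sysreg;

/* Invented entries, not taken from aarch64_system_regs.def.  */
static const demo_sysreg demo_regs[] = {
  { "demo_rw",  0,          0 },
  { "demo_ro",  F_REG_READ, 0 },
  { "demo_ext", 0,          1u << 0 },
};

/* Step 1: is NAME a known register?  */
static const demo_sysreg *demo_lookup (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof demo_regs / sizeof demo_regs[0]; i++)
    if (strcmp (demo_regs[i].name, name) == 0)
      return &demo_regs[i];
  return NULL;
}

/* Step 2: return 1 if the access is permitted, 0 otherwise.  */
static int demo_access_ok (const char *name, int write_p, unsigned enabled)
{
  const demo_sysreg *reg = demo_lookup (name);
  if (reg == NULL)
    return 0;                                   /* unknown name */
  if ((reg->arch_reqs & ~enabled) != 0)
    return 0;                                   /* 2a: feature missing */
  if (write_p && (reg->properties & F_REG_READ))
    return 0;                                   /* 2b: write to read-only */
  if (!write_p && (reg->properties & F_REG_WRITE))
    return 0;                                   /* 2b: read of write-only */
  return 1;
}
```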
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-protos.h (aarch64_valid_sysreg_name_p): New.
>   (aarch64_retrieve_sysreg): Likewise.
>   * config/aarch64/aarch64.cc (is_implem_def_reg): Likewise.
>   (aarch64_valid_sysreg_name_p): Likewise.
>   (aarch64_retrieve_sysreg): Likewise.
>   (aarch64_register_sysreg): Likewise.
>   (aarch64_init_sysregs): Likewise.
>   (aarch64_lookup_sysreg_map): Likewise.
>   * config/aarch64/predicates.md (aarch64_sysreg_string): New.
> ---
>  gcc/config/aarch64/aarch64-protos.h |   2 +
>  gcc/config/aarch64/aarch64.cc   | 147 
>  gcc/config/aarch64/predicates.md|   4 +
>  3 files changed, 153 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 60a55f4bc19..5d6a1e75700 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -830,6 +830,8 @@ bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
>  bool aarch64_sve_ptrue_svpattern_p (rtx, struct simd_immediate_info *);
>  bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
>   enum simd_immediate_check w = AARCH64_CHECK_MOV);
> +bool aarch64_valid_sysreg_name_p (const char *);
> +const char *aarch64_retrieve_sysreg (const char *, bool);
>  rtx aarch64_check_zero_based_sve_index_immediate (rtx);
>  bool aarch64_sve_index_immediate_p (rtx);
>  bool aarch64_sve_arith_immediate_p (machine_mode, rtx, bool);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index a4a9e2e51ea..eaeab0be436 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -85,6 +85,7 @@
>  #include "config/arm/aarch-common.h"
>  #include "config/arm/aarch-common-protos.h"
>  #include "ssa.h"
> +#include "hash-map.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2860,6 +2861,51 @@ const sysreg_t sysreg_structs[] =
>  
>  const unsigned nsysreg = ARRAY_SIZE (sysreg_structs);
>  
> +using sysreg_map_t = hash_map<nofree_string_hash, const sysreg_t *>;
> +static sysreg_map_t *sysreg_map = nullptr;
> +
> +/* Map system register names to their hardware metadata: encoding,
> +   feature flags and architectural feature requirements, all of which
> +   are encoded in a sysreg_t struct.  */
> +void
> +aarch64_register_sysreg (const char *name, const sysreg_t *metadata)
> +{
> +  bool dup = sysreg_map->put (name, metadata);
> +  gcc_checking_assert (!dup);
> +}
> +
> +/* Lazily initialize hash table for system register validation,
> +   checking the validity of supplied register name and returning
> +   register's associated metadata.  */
> +static void
> +aarch64_init_sysregs (void)
> +{
> +  gcc_assert (!sysreg_map);
> +  sysreg_map = new sysreg_map_t;
> +
> +  for (unsigned i = 0; i < nsysreg; i++)
> +{
> +  const sysreg_t *reg = sysreg_structs + i;
> +  aarch64_register_sysreg (reg->name, reg);
> +}
> +}
> +
> +/* No direct access to the sysreg hash-map should be made.  Doing so
> +   risks trying to access an uninitialized hash-map and dereferencing the
> +   returned double pointer without due care risks dereferencing a
> +   null-pointer.  */
> +const sysreg_t *
> +aarch64_lookup_sysreg_map (const char *regname)
> +{
> +  if (!sysreg_map)
> +aarch64_init_sysregs ();
> +
> +  const sysreg_t **sysreg_entry = sysreg_map->get (regname);
> +  if (sysreg_entry != NULL)
> +return *sysreg_entry;
> +  return NULL;
> +}
> +
>  /* The current tuning set.  */
>  struct 

[Bug target/111107] i686-w64-mingw32 does not realign stack when __attribute__((aligned)) or __attribute__((vector_size)) are used

2023-11-28 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111107

--- Comment #14 from Eric Botcazou  ---
> I'd say that
> 
> config/i386/cygming.h:#define STACK_REALIGN_DEFAULT TARGET_SSE
> 
> is a non-working "fix".  The appropriate default would be
> -mincoming-stack-boundary=2.  MIN_STACK_BOUNDARY should already be 4, so
> that leaves PREFERRED_STACK_BOUNDARY_DEFAULT is the way to go here.

This was a minimal fix to support SSE, but Solaris was indeed more radical:

sol2.h:#undef STACK_REALIGN_DEFAULT
sol2.h:#define STACK_REALIGN_DEFAULT (TARGET_64BIT ? 0 : 1)

so we could just mimic it for Windows.

Re: [PATCH v2 0/5] aarch64: Add Armv9.4-a 128-bit system-register read/write support

2023-11-28 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Changes from v1 -
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635531.html
>
>   * [PATCH 4/5] - For `error_at' message, put feature name in quotes.
>   * [PATCH 4/5] - For `aarch64_retrieve_sysreg' function, add
>   description of new parameter to comments.
>   * [PATCH 5/5] - Reduce the minimum arch requirements of the system
>   register unit tests, selectively using `#pragma GCC target' when
>   testing 128-bit sysreg r/w functions.

OK for the series, thanks, with the changelog tweak that Kyrill
asked for.  1-3 had already been approved and I agree that the new
versions of 4 and 5 address the review comments.

Richard

> ---
>
> Given the introduction of optional 128-bit page table descriptor and
> translation hardening extension support with the Armv9.4-a
> architecture, this patch series introduces the necessary changes to
> the aarch64-specific builtin code to enable the reading and writing of
> 128-bit system registers.  In so doing, the following ACLE builtins and
> feature macro are made available to the compiler:
>
>   * __uint128_t __arm_rsr128(const char *special_register);
>   * void __arm_wsr128(const char *special_register, __uint128_t value);
>   * __ARM_FEATURE_SYSREG128.
>
> Finally, in order to update the GCC system-register database bringing
> it in line with Binutils, and in so doing add the relevant 128-bit
> system registers to GCC, this patch also introduces the Guarded
> Control Stack (GCS) `+gcs' architecture modifier flag, allowing the
> inclusion of the novel GCS system registers which are now supported
> and also present in the `aarch64-sys-regs.def' system register
> database.
>
> Victor Do Nascimento (5):
>   aarch64: Add march flags for +the and +d128 arch extensions
>   aarch64: Add support for GCS system registers with the +gcs modifier
>   aarch64: Sync `aarch64-sys-regs.def' with Binutils.
>   aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins
>   aarch64: Add rsr128 and wsr128 ACLE tests
>
>  gcc/config/aarch64/aarch64-arches.def |  2 +
>  gcc/config/aarch64/aarch64-builtins.cc| 50 ---
>  gcc/config/aarch64/aarch64-c.cc   |  1 +
>  .../aarch64/aarch64-option-extensions.def |  6 +++
>  gcc/config/aarch64/aarch64-protos.h   |  2 +-
>  gcc/config/aarch64/aarch64-sys-regs.def   | 30 +++
>  gcc/config/aarch64/aarch64.cc |  9 +++-
>  gcc/config/aarch64/aarch64.h  | 21 
>  gcc/config/aarch64/aarch64.md | 18 +++
>  gcc/config/aarch64/arm_acle.h | 11 
>  gcc/doc/invoke.texi   |  8 +++
>  gcc/testsuite/gcc.target/aarch64/acle/rwsr.c  | 32 
>  12 files changed, 170 insertions(+), 20 deletions(-)


Re: [PATCH] s390: implement flags output

2023-11-28 Thread Joseph Myers
This has introduced an ICE building glibc for s390x-linux-gnu.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112753

-- 
Joseph S. Myers
jos...@codesourcery.com


[Bug target/112753] New: [14 Regression] unrecognizable insn building glibc for s390x

2023-11-28 Thread jsm28 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112753

Bug ID: 112753
   Summary: [14 Regression] unrecognizable insn building glibc for
s390x
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jsm28 at gcc dot gnu.org
CC: jchrist at linux dot ibm.com
  Target Milestone: ---
Target: s390*-*-*

Created attachment 56709
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56709&action=edit
preprocessed source

An ICE building glibc for s390x-linux-gnu was introduced by the commit

commit 466b100e5fee808d77598e0f294654deec281150
Author: Juergen Christ 
Date:   Mon Nov 20 09:13:10 2023 +0100

s390: implement flags output

Compile the attached test with -O2.

In file included from ../sysdeps/ieee754/ldbl-64-128/strtold_l.c:58:
./strtod_l.c: In function 'round_and_return':
./strtod_l.c:356:1: error: unrecognizable insn:
(insn 252 251 253 30 (set (reg:FPRX2 187)
(subreg:FPRX2 (reg:TF 186) 0)) "./strtod_l.c":309:218 discrim 1 -1
 (nil))
during RTL pass: vregs
./strtod_l.c:356:1: internal compiler error: in extract_insn, at recog.cc:2804
0x6f8bb8 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/scratch/jmyers/glibc/many14/src/gcc/gcc/rtl-error.cc:108
0x6f8bda _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/scratch/jmyers/glibc/many14/src/gcc/gcc/rtl-error.cc:116
0x6f759b extract_insn(rtx_insn*)
/scratch/jmyers/glibc/many14/src/gcc/gcc/recog.cc:2804
0xb85405 instantiate_virtual_regs_in_insn
/scratch/jmyers/glibc/many14/src/gcc/gcc/function.cc:1610
0xb85405 instantiate_virtual_regs
/scratch/jmyers/glibc/many14/src/gcc/gcc/function.cc:1993
0xb85405 execute
/scratch/jmyers/glibc/many14/src/gcc/gcc/function.cc:2040
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/111107] i686-w64-mingw32 does not realign stack when __attribute__((aligned)) or __attribute__((vector_size)) are used

2023-11-28 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111107

Eric Botcazou changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-28

[Bug tree-optimization/112752] `~a - MIN<~b, ~c>` is not optimized to `MAX<b,c> - a`

2023-11-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112752

--- Comment #1 from Andrew Pinski  ---
I made a small mistake in what it should be optimized to.
The MIN should be MAX (oops), that is:
tmp = MAX(r, g);
k = MAX(b, tmp);

[Bug libstdc++/112607] : _Normalize does not consider char_type for the basic_string_view case

2023-11-28 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112607

Jonathan Wakely changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Jonathan Wakely  ---
Fixed for 13.3 as well, thanks for the report.

[Bug tree-optimization/112752] New: `~a - MIN<~b, ~c>` is not optimized to `MAX<b,c> - a`

2023-11-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112752

Bug ID: 112752
   Summary: `~a - MIN<~b, ~c>` is not optimized to
`MAX<b,c> - a`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
CC: evstupac at gmail dot com, kirill.yukhin at intel dot com,
pinskia at gcc dot gnu.org, rguenth at gcc dot gnu.org,
unassigned at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-*-*

+++ This bug was initially created as a clone of Bug #52252 +++

This is an example of byte conversion from RGB (Red Green Blue) to CMYK (Cyan
Magenta Yellow blacK):
```
#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))

void convert_image(byte *in, byte *out, int size) {
int i;
for(i = 0; i < size; i++) {
byte r = in[0];
byte g = in[1];
byte b = in[2];
byte c, m, y, k, tmp;
c = 255 - r;
m = 255 - g;
y = 255 - b;
tmp = MIN(m, y);
k = MIN(c, tmp);
out[0] = c - k;
out[1] = m - k;
out[2] = y - k;
out[3] = k;
in += 3;
out += 4;
}
}
```

For the scalar (and vectorized versions) we should get instead:
```
#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))
#define MAX(a, b) ((a) < (b)?(b):(a))

void convert_image_1(byte *in, byte *out, int size) {
int i;
for(i = 0; i < size; i++) {
byte r = in[0];
byte g = in[1];
byte b = in[2];
byte c, m, y, k, tmp;
tmp = MIN(r, g);
k = MIN(b, tmp);
out[0] = k - r;
out[1] = k - g;
out[2] = k - b;
out[3] = 255 - k;
in += 3;
out += 4;
}
}
```

See bug 52252 comment #10, 11, and 12 also.
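
With the correction from comment #1 applied (MAX rather than MIN in the rewritten loop), the two forms can be compared directly. The sketch below is an illustration only: `convert_image_max` and `same_output` are hypothetical names, and `convert_image` is the loop from the report with `const` added to the input pointer.

```c
#include <assert.h>
#include <string.h>

#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))
#define MAX(a, b) ((a) < (b)?(b):(a))

/* Original RGB to CMYK loop from the report.  */
static void convert_image(const byte *in, byte *out, int size) {
    for (int i = 0; i < size; i++) {
        byte c = 255 - in[0];
        byte m = 255 - in[1];
        byte y = 255 - in[2];
        byte tmp = MIN(m, y);
        byte k = MIN(c, tmp);
        out[0] = c - k;
        out[1] = m - k;
        out[2] = y - k;
        out[3] = k;
        in += 3;
        out += 4;
    }
}

/* Hypothetical rewrite: k is now MAX(r, g, b), so only one
   subtraction from 255 remains per pixel.  */
static void convert_image_max(const byte *in, byte *out, int size) {
    for (int i = 0; i < size; i++) {
        byte r = in[0];
        byte g = in[1];
        byte b = in[2];
        byte tmp = MAX(r, g);
        byte k = MAX(b, tmp);
        out[0] = k - r;
        out[1] = k - g;
        out[2] = k - b;
        out[3] = 255 - k;
        in += 3;
        out += 4;
    }
}

/* Return 1 if both loops agree on SIZE pixels (at most 8).  */
static int same_output(const byte *in, int size) {
    byte out_min[4 * 8], out_max[4 * 8];
    assert(size >= 0 && size <= 8);
    convert_image(in, out_min, size);
    convert_image_max(in, out_max, size);
    return memcmp(out_min, out_max, (size_t) size * 4) == 0;
}
```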

[Bug libstdc++/112607] : _Normalize does not consider char_type for the basic_string_view case

2023-11-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112607

--- Comment #6 from GCC Commits  ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
:

https://gcc.gnu.org/g:c7c92d61d32f6fd7746e2844f68d1936e2b6f6f6

commit r13-8105-gc7c92d61d32f6fd7746e2844f68d1936e2b6f6f6
Author: Jonathan Wakely 
Date:   Sat Nov 18 20:56:35 2023 +

libstdc++: Check string value_type in std::make_format_args [PR112607]

libstdc++-v3/ChangeLog:

PR libstdc++/112607
* include/std/format (basic_format_arg::_S_to_arg_type): Check
value_type for basic_string_view and basic_string
specializations.
* testsuite/std/format/arguments/112607.cc: New test.

(cherry picked from commit 279e407a06cc676d8e6e0bb5755b0a804e05377c)

[Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)

2023-11-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

--- Comment #12 from Andrew Pinski  ---
(In reply to rguent...@suse.de from comment #11)
> We're lacking a way to say one of the bit_not should be single-used,
> one multi-use would be OK and a fair trade-off - not sure if that
> would be enough here, of course.  That would mean changing to
> a condition with single_use ().

That does not fix it though. Because in this case we have:
  c_19 = ~r_16;
  m_20 = ~g_17;
  y_21 = ~b_18;
  tmp_22 = MIN_EXPR <m_20, y_21>;
  k_23 = MIN_EXPR <c_19, tmp_22>;
  _1 = c_19 - k_23;
  _3 = m_20 - k_23;
  _5 = y_21 - k_23;
  .. = k_23;

So both bit_not are used more than once.

So we have `~a - MIN<~b, ~c>`, which is the same as `MAX<b,c> - a`.

Let me file this as a separate bug to continue the discussion there.

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-11-28 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #31 from Jonathan Wakely  ---
Bisection points to r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363 for that
remaining FAIL as well (and it isn't fixed by the new patch).

It introduced a new warning which wasn't present before:

/tmp/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:437:
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned int)'
writing between 2 and 9223372036854775806 bytes into a region of size 0
overflows the destination [-Wstringop-overflow=]

Re: T-Head Vector for GCC-14? (was Re: RISC-V: Support XTheadVector extensions)

2023-11-28 Thread Jeff Law




On 11/28/23 12:56, Philipp Tomsich wrote:


That's obviously a risky thing to do given it was sent right at the end
of the window, but it meets the rules.

Folks in the call seemed generally amenable to at least trying for 14,
so unless anyone's opposed on the lists it seems like the way to go.
IIRC we ended up with the following TODO list:

* Make sure this doesn't regress on the targets we already support.
   From the sounds of things there's been test suite runs that look fine,
   so hopefully that's all manageable.  Christoph said he'd send
   something out, we've had a bunch of test skew so there might be a bit
   lurking but it should be generally manageable.
* We agree on some sort of support lifecycle.  There seemed to be
   basically two proposals: merge for 14 with the aim of quickly
   deprecating it (maybe even for 15), or merge for 14 with the aim of
   keeping it until it ends up un-tested (ie, requiring test results are
   published for every release).


We expect real-world users, including the BeagleV-AHEAD community, to
need support for the foreseeable future.
Keeping it until it ends up untested (and test cases are reasonably
clean) sounds like a good threshold to ensure the integrity of the
codebase while giving this a clear path to stay in for its useful
life.
I can live with it being in the tree as long as it's maintained 
(measured by ongoing testing with reasonable results).


I'd proposed that it could end up deprecated quickly, but that was based 
on the assumption that once V1.0 compliant hardware was widely available 
that we'd see less and less interest in the thead extensions.


Jeff



Re: [V2] New pass for sign/zero extension elimination -- not ready for "final" review

2023-11-28 Thread Jivan Hakobyan
>
> The amdgcn ICE I reported still exists:


Can you send a build command to reproduce the ICE?
I built on x86-64 and RV32/64 and did not get any faults.

On Tue, Nov 28, 2023 at 7:08 PM Andrew Stubbs  wrote:

> On 28/11/2023 06:06, Jeff Law wrote:
> > - Verify we have a SUBREG before looking at SUBREG_BYTE.
>
> The amdgcn ICE I reported still exists:
>
> > conftest.c:16:1: internal compiler error: RTL check: expected code
> 'subreg', have 'reg' in ext_dce_process_uses, at ext-dce.cc:417
> >16 | }
> >   | ^
> > 0x8c7b21 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*,
> int, char const*)
> >>.../scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/rtl.cc:770
> > 0xa768e0 ext_dce_process_uses
>
> >>.../scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/ext-dce.cc:417
> > 0x1aed4bc ext_dce_process_bb
>
> >>.../scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/ext-dce.cc:643
> > 0x1aed4bc ext_dce
>
> >>.../scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/ext-dce.cc:794
> > 0x1aed4bc execute
>
> >>.../scratch/astubbs/omp/upA/gcnbuild/src/gcc-mainline/gcc/ext-dce.cc:862
> > Please submit a full bug report, with preprocessed source (by using
> -freport-bug).
> > Please include the complete backtrace with any bug report.
> > See <https://gcc.gnu.org/bugs/> for instructions.
> > configure:3812: $? = 1
> > configure: failed program was:
> > | /* confdefs.h */
> > | #define PACKAGE_NAME "GNU C Runtime Library"
> > | #define PACKAGE_TARNAME "libgcc"
> > | #define PACKAGE_VERSION "1.0"
> > | #define PACKAGE_STRING "GNU C Runtime Library 1.0"
> > | #define PACKAGE_BUGREPORT ""
> > | #define PACKAGE_URL "http://www.gnu.org/software/libgcc/"
> > | /* end confdefs.h.  */
> > |
> > | int
> > | main ()
> > | {
> > |
> > |   ;
> > |   return 0;
> > | }
>
> I think the test is maybe backwards?
>
>/* ?!? How much of this should mirror SET handling, potentially
>   being shared?   */
>if (SUBREG_BYTE (dst).is_constant () && SUBREG_P (dst))
>
> Andrew
>
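
The ordering problem pointed out above can be shown with a small standalone C model. This is only a sketch with hypothetical `demo_*` names: the accessor asserts on the object's code, loosely imitating an RTL-checking build, and the property tested (`== 0`) is a stand-in for the `is_constant ()` call in the real condition.

```c
#include <assert.h>

enum demo_code { DEMO_REG, DEMO_SUBREG };

struct demo_rtx {
  enum demo_code code;
  int subreg_byte;  /* only meaningful when code == DEMO_SUBREG */
};

/* Checked accessor: aborts if X is not a subreg.  */
static int demo_subreg_byte (const struct demo_rtx *x)
{
  assert (x->code == DEMO_SUBREG);
  return x->subreg_byte;
}

/* Predicate first, accessor second.  Because && short-circuits, the
   accessor is never reached for a plain register.  Swapping the two
   operands would trip the assertion for DEMO_REG.  */
static int demo_is_low_subreg (const struct demo_rtx *x)
{
  return x->code == DEMO_SUBREG && demo_subreg_byte (x) == 0;
}
```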


-- 
With the best regards
Jivan Hakobyan


[Bug target/111170] [13/14 regression] Malformed manifest does not allow to run gcc on Windows XP (Accessing a corrupted shared library) since r13-6552-gd11e088210a551

2023-11-28 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111170

Eric Botcazou changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-28
                 CC|                            |ebotcazou at gcc dot gnu.org

--- Comment #13 from Eric Botcazou  ---
Thanks for the fix.  Now it needs to be backported onto the 13 branch.

Re: RISC-V: Support XTheadVector extensions

2023-11-28 Thread Jeff Law




On 11/28/23 12:45, Palmer Dabbelt wrote:



IMO we're just stuck between a rock and a hard place here. Specifically, 
this isn't just an assembly syntax change but also comes with a bunch of 
behavioral changes to the instructions in question -- I'm specifically 
thinking of things like the register packing, which IIRC changed a ton 
between 0.7 and 0.8 (and then again more for 1.0). That's the kind of 
stuff that tends to have non-local implications on the port, and thus 
can trip people up.


So if we model this as just assembly syntax then we risk people tripping 
over the differences, but if we try to model it as a whole different 
extension then we have more code to manage.  I'd start with the assembly 
syntax approach, as it should be the option with less code which is 
always nice.  If that turns out to be a problem then we can always just 
duplicate the patterns, but it's way harder to merge them back together 
if we start out with things duplicated.
The way I think about the assembly bits is it allows us to share a good 
amount of code between the two implementations.  There's obviously going 
to be some differences that will require more extensive work and that's 
where I think most of our review effort ought to be.




During the patchwork call we also ended up talking about the P extension 
(and the likely vendor flavors).  Nothing's appeared for there yet, but 
the theory is that the RZ/Five (Renesas' line of RISC-V chips that came 
out earlier this year) has some P-related extension.  There's also some 
SIMD in CORE-V, as well as a bunch of low-hanging fruit missing from V 
that we'll probably see more vendor extensions for.
The only P bits that made the gcc-14 deadline were those from Embecosm, 
so I'd tend to want to push all the other P stuff out to gcc-15.




So I think if the goal is to have a single vector target for RISC-V then 
we've probably lost already.
That's probably not feasible.  But I think there's a good amount of 
sharable bits between the V1.0 and the thead-vector support.





But we've got time to sort this out.  I don't think the code in question
was targeted towards gcc-14.


[In case anyone else is watching: see the forked thread, it might be 
aimed for 14 now...]
It's aimed for evaluation/review given the submission occurred before 
the gcc-14 deadline.


jeff


Re: [PATCH] c++, v4: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]

2023-11-28 Thread Jason Merrill

On 11/28/23 12:52, Jakub Jelinek wrote:

On Tue, Nov 28, 2023 at 11:31:48AM -0500, Jason Merrill wrote:

+ if (len)
+   {
+ if (data)
+   msg = c_getstr (data);
+ if (msg == NULL)
+   buf = XNEWVEC (char, len);


Jonathan pointed out elsewhere that this gets leaked if error return
prevents us from getting to the XDELETEVEC.


Seems it is just one of the returns, so ok to just XDELETEVEC there,
or should I add some RAII for that?  The other error return after
this point is for !len case and so buf isn't allocated.


RAII is generally preferable, but this is sufficient.
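
The non-RAII option discussed here amounts to releasing the buffer on the early-return path as well as on the normal one. A minimal C sketch of that pattern follows; all `demo_*` names are hypothetical, the allocation counter exists only so the example can be checked, and this is not the GCC code (which uses XNEWVEC and XDELETEVEC).

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_buffers;  /* outstanding allocations, demo only */

static char *demo_alloc (size_t len)
{
  char *p = malloc (len);
  if (p != NULL)
    demo_live_buffers++;
  return p;
}

static void demo_free (char *p)
{
  if (p != NULL)
    {
      demo_live_buffers--;
      free (p);
    }
}

/* Fill a LEN-byte buffer.  Element BAD_INDEX stands in for a message
   character that is not a constant expression.  Returns 0 on success
   and -1 on error.  */
static int demo_extract_message (size_t len, size_t bad_index)
{
  assert (len > 0);
  char *buf = demo_alloc (len);
  if (buf == NULL)
    return -1;
  for (size_t i = 0; i < len; i++)
    {
      if (i == bad_index)
        {
          demo_free (buf);  /* the error return must free buf too */
          return -1;
        }
      buf[i] = 'x';
    }
  demo_free (buf);
  return 0;
}
```

With more than one error exit, a single cleanup label or, in C++, an owning wrapper scales better than repeating the free at each return.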


2023-11-28  Jakub Jelinek  

* semantics.cc (finish_static_assert): Free buf on error return.

--- gcc/cp/semantics.cc.jj  2023-11-25 10:28:27.778191561 +0100
+++ gcc/cp/semantics.cc 2023-11-28 18:50:00.094733919 +0100
@@ -11582,6 +11582,7 @@ finish_static_assert (tree condition, tr
  error_at (location,
    "%<static_assert%> message %<data()[%d]%> "
"must be a constant expression", i);
+ XDELETEVEC (buf);
  return;
}
  if (msg == NULL)


Jakub





Re: [PATCH 2/2] c++: guard more against undiagnosed error_mark_node [PR112658]

2023-11-28 Thread Jason Merrill

On 11/28/23 11:51, Patrick Palka wrote:

This adds a sanity check to cp_parser_expression_statement similar to
the one in finish_expr_stmt added by r6-6795-g0fd9d4921f7ba2, which
effectively downgrades accepts-invalid/wrong-code bugs like this one
into ice-on-invalid/ice-on-valid ones.


OK.


PR c++/112658

gcc/cp/ChangeLog:

* parser.cc (cp_parser_expression_statement): If the statement
is erroneous, make sure we've seen an error.
---
  gcc/cp/parser.cc | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 2464d1a0783..743d6517b09 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -12962,6 +12962,9 @@ cp_parser_expression_statement (cp_parser* parser, tree in_statement_expr)
if (statement == error_mark_node
  && !cp_parser_uncommitted_to_tentative_parse_p (parser))
{
+ /* If we ran into a problem, make sure we complained.  */
+ gcc_assert (seen_error ());
+
  cp_parser_skip_to_end_of_block_or_statement (parser);
  return error_mark_node;
}




Re: [PATCH 1/2] c++: casting array prvalue [PR112658, PR94264]

2023-11-28 Thread Jason Merrill

On 11/28/23 11:51, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Here we deem the array-to-pointer conversions in both calls as invalid,
but we fail to issue a diagnostic for the second call, ultimately because
cp_build_c_cast doesn't replay errors from build_const_cast_1.  This means
the second call gets silently discarded, leading to wrong/unexpected code.

This patch fixes this issue.  I'm not sure if we want to accept these
conversions in the first place (that's PR94264 or at least related to
it), but at least we're more consistent now.


I've now fixed that bug, thanks for the pointer.

The cp_build_c_cast change is OK, but the testcase won't error anymore. 
Do you have an idea for an alternate test?  If not, it's OK to apply the 
fix anyway.


Jason



[Bug c++/53220] [4.7/4.8 Regression] g++ mis-compiles compound literals

2023-11-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220

--- Comment #23 from GCC Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:305a2686c99bf9b57490419f25c79f6fb3ae0feb

commit r14-5941-g305a2686c99bf9b57490419f25c79f6fb3ae0feb
Author: Jason Merrill 
Date:   Tue Nov 28 13:54:47 2023 -0500

c++: prvalue array decay [PR94264]

My change for PR53220 made array to pointer decay for prvalue arrays
ill-formed to catch well-defined C code that produces a dangling pointer in
C++ due to the shorter lifetime of compound literals.  This wasn't really
correct, but wasn't a problem until C++17 added prvalue arrays, at which
point it started rejecting valid C++ code.

I wanted to make sure that we still diagnose the problematic code;
-Wdangling-pointer covers the array-lit.c case, but I needed to extend
-Wreturn-local-addr to handle the return case.

PR c++/94264
PR c++/53220

gcc/c/ChangeLog:

* c-typeck.cc (array_to_pointer_conversion): Adjust -Wc++-compat
diagnostic.

gcc/cp/ChangeLog:

* call.cc (convert_like_internal): Remove obsolete comment.
* typeck.cc (decay_conversion): Allow array prvalue.
(maybe_warn_about_returning_address_of_local): Check
for returning pointer to temporary.

gcc/testsuite/ChangeLog:

* c-c++-common/array-lit.c: Adjust.
* g++.dg/cpp1z/array-prvalue1.C: New test.
* g++.dg/ext/complit17.C: New test.

[Bug c++/94264] Array-to-pointer conversion not performed on array prvalues

2023-11-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94264

--- Comment #8 from GCC Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:305a2686c99bf9b57490419f25c79f6fb3ae0feb

commit r14-5941-g305a2686c99bf9b57490419f25c79f6fb3ae0feb
Author: Jason Merrill 
Date:   Tue Nov 28 13:54:47 2023 -0500

c++: prvalue array decay [PR94264]

My change for PR53220 made array to pointer decay for prvalue arrays
ill-formed to catch well-defined C code that produces a dangling pointer in
C++ due to the shorter lifetime of compound literals.  This wasn't really
correct, but wasn't a problem until C++17 added prvalue arrays, at which
point it started rejecting valid C++ code.

I wanted to make sure that we still diagnose the problematic code;
-Wdangling-pointer covers the array-lit.c case, but I needed to extend
-Wreturn-local-addr to handle the return case.

PR c++/94264
PR c++/53220

gcc/c/ChangeLog:

* c-typeck.cc (array_to_pointer_conversion): Adjust -Wc++-compat
diagnostic.

gcc/cp/ChangeLog:

* call.cc (convert_like_internal): Remove obsolete comment.
* typeck.cc (decay_conversion): Allow array prvalue.
(maybe_warn_about_returning_address_of_local): Check
for returning pointer to temporary.

gcc/testsuite/ChangeLog:

* c-c++-common/array-lit.c: Adjust.
* g++.dg/cpp1z/array-prvalue1.C: New test.
* g++.dg/ext/complit17.C: New test.

[pushed] c++: prvalue array decay [PR94264]

2023-11-28 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

My change for PR53220 made array to pointer decay for prvalue arrays
ill-formed to catch well-defined C code that produces a dangling pointer in
C++ due to the shorter lifetime of compound literals.  This wasn't really
correct, but wasn't a problem until C++17 added prvalue arrays, at which
point it started rejecting valid C++ code.

I wanted to make sure that we still diagnose the problematic code;
-Wdangling-pointer covers the array-lit.c case, but I needed to extend
-Wreturn-local-addr to handle the return case.

PR c++/94264
PR c++/53220

gcc/c/ChangeLog:

* c-typeck.cc (array_to_pointer_conversion): Adjust -Wc++-compat
diagnostic.

gcc/cp/ChangeLog:

* call.cc (convert_like_internal): Remove obsolete comment.
* typeck.cc (decay_conversion): Allow array prvalue.
(maybe_warn_about_returning_address_of_local): Check
for returning pointer to temporary.

gcc/testsuite/ChangeLog:

* c-c++-common/array-lit.c: Adjust.
* g++.dg/cpp1z/array-prvalue1.C: New test.
* g++.dg/ext/complit17.C: New test.
---
 gcc/c/c-typeck.cc   |  2 +-
 gcc/cp/call.cc  |  2 --
 gcc/cp/typeck.cc| 12 +++-
 gcc/testsuite/c-c++-common/array-lit.c  |  3 ++-
 gcc/testsuite/g++.dg/cpp1z/array-prvalue1.C |  7 +++
 gcc/testsuite/g++.dg/ext/complit17.C|  4 
 6 files changed, 17 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/array-prvalue1.C
 create mode 100644 gcc/testsuite/g++.dg/ext/complit17.C

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 1dbb4471a88..17fdc9789b4 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -1748,7 +1748,7 @@ array_to_pointer_conversion (location_t loc, tree exp)
   if (!TREE_READONLY (decl) && !TREE_STATIC (decl))
warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wc___compat,
"converting an array compound literal to a pointer "
-   "is ill-formed in C++");
+   "leads to a dangling pointer in C++");
 }
 
   adr = build_unary_op (loc, ADDR_EXPR, exp, true);
diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 81b104f4b40..ae0decd87f1 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -8578,8 +8578,6 @@ convert_like_internal (conversion *convs, tree expr, tree fn, int argnum,
array = finish_compound_literal (array, new_ctor, complain);
/* This is dubious now, should be blessed by P2752.  */
DECL_MERGEABLE (TARGET_EXPR_SLOT (array)) = true;
-   /* Take the address explicitly rather than via decay_conversion
-  to avoid the error about taking the address of a temporary.  */
array = cp_build_addr_expr (array, complain);
  }
else
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index e995fb6ddd7..0839d0a4167 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -2534,15 +2534,6 @@ decay_conversion (tree exp,
  return error_mark_node;
}
 
-  /* Don't let an array compound literal decay to a pointer.  It can
-still be used to initialize an array or bind to a reference.  */
-  if (TREE_CODE (exp) == TARGET_EXPR)
-   {
- if (complain & tf_error)
-   error_at (loc, "taking address of temporary array");
- return error_mark_node;
-   }
-
   ptrtype = build_pointer_type (TREE_TYPE (type));
 
   if (VAR_P (exp))
@@ -10535,6 +10526,9 @@ maybe_warn_about_returning_address_of_local (tree retval, location_t loc)
   if (TYPE_REF_P (valtype))
warning_at (loc, OPT_Wreturn_local_addr,
"returning reference to temporary");
+  else if (TYPE_PTR_P (valtype))
+   warning_at (loc, OPT_Wreturn_local_addr,
+   "returning pointer to temporary");
   else if (is_std_init_list (valtype))
warning_at (loc, OPT_Winit_list_lifetime,
"returning temporary %<initializer_list%> does not extend "
diff --git a/gcc/testsuite/c-c++-common/array-lit.c b/gcc/testsuite/c-c++-common/array-lit.c
index 6505c2091b4..a6b3adf7cc8 100644
--- a/gcc/testsuite/c-c++-common/array-lit.c
+++ b/gcc/testsuite/c-c++-common/array-lit.c
@@ -1,10 +1,11 @@
 /* { dg-options "-std=c99 -Wc++-compat -Werror" { target c } } */
+/* { dg-options "-Werror=dangling-pointer=1" { target c++ } } */
 /* { dg-prune-output "treated as errors" } */
 #include <stdio.h>
 
 int main()
 {
-  for (int *p = (int[]){ 1, 2, 3, 0 }; /* { dg-error "array" } */
+  for (int *p = (int[]){ 1, 2, 3, 0 }; /* { dg-error "array|temporary" } */
*p; ++p) {
 printf("%d\n", *p);
   }
diff --git a/gcc/testsuite/g++.dg/cpp1z/array-prvalue1.C b/gcc/testsuite/g++.dg/cpp1z/array-prvalue1.C
new file mode 100644
index 00000000000..e837d3253a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/array-prvalue1.C
@@ -0,0 +1,7 @@
+// PR c++/94264
+// 

[Bug target/112751] [14 regression] gcc.target/powerpc/pcrel-sibcall-1.c fails after r14-5628-g53ba8d669550d3

2023-11-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112751

--- Comment #1 from Andrew Pinski  ---
This is just a testsuite issue. The functions are currently marked as noinline.
 You can either add -fno-ipa-vrp or mark them with noipa instead. I am not sure
if noipa here is right due to having some ipa happening due to localization.
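
[Illustration, not part of the Bugzilla comment.] For readers unfamiliar with the two attributes mentioned above, a small sketch follows. The function names are hypothetical and this is not the content of pcrel-sibcall-1.c. As documented for GCC, noinline only prevents inlining, so interprocedural passes may still use what they learn about the callee; noipa additionally hides the body from interprocedural analysis.

```cpp
// GCC-specific attributes; all names below are illustrative.

// Inlining is blocked, but interprocedural passes may still observe
// that this function always returns 0 and use that fact in callers.
__attribute__((noinline)) int always_zero_noinline() { return 0; }

// noipa (GCC 8 and later) implies noinline and noclone and makes callers
// treat the function as if its body were not available.
__attribute__((noipa)) int always_zero_noipa() { return 0; }

int caller()
{
  // With interprocedural value-range propagation the first result may
  // be known at compile time; the second must be treated as unknown.
  return always_zero_noinline() + always_zero_noipa();
}
```

Whether a given test should use noipa or -fno-ipa-vrp depends on which interprocedural effects the test still wants to see, which is the doubt raised in the comment.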

[Bug target/112751] New: [14 regression] gcc.target/powerpc/pcrel-sibcall-1.c fails after r14-5628-g53ba8d669550d3

2023-11-28 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112751

Bug ID: 112751
   Summary: [14 regression] gcc.target/powerpc/pcrel-sibcall-1.c
fails after r14-5628-g53ba8d669550d3
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: seurer at gcc dot gnu.org
  Target Milestone: ---

g:53ba8d669550d3a1f809048428b97ca607f95cf5, r14-5628-g53ba8d669550d3
make  -k check-gcc
RUNTESTFLAGS="powerpc.exp=gcc.target/powerpc/pcrel-sibcall-1.c"
FAIL: gcc.target/powerpc/pcrel-sibcall-1.c scan-assembler \\mb x@notoc\\M
FAIL: gcc.target/powerpc/pcrel-sibcall-1.c scan-assembler \\mbl y\\M
FAIL: gcc.target/powerpc/pcrel-sibcall-1.c scan-assembler \\mb xx@notoc\\M
# of expected passes            2
# of unexpected failures        3


commit 53ba8d669550d3a1f809048428b97ca607f95cf5 (HEAD)
Author: Jan Hubicka 
Date:   Mon Nov 20 19:35:53 2023 +0100

inter-procedural value range propagation

[Bug c++/102419] [11/12/13/14 Regression][concepts] return-type-requirement of "Y" does not check that T::type actually exists

2023-11-28 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102419

Patrick Palka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |novulae at hotmail dot com

--- Comment #10 from Patrick Palka  ---
*** Bug 112749 has been marked as a duplicate of this bug. ***

[Bug c++/112749] GCC accepts invalid code in concepts (requires clause incorrectly satisfied)

2023-11-28 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112749

Patrick Palka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
                 CC|                            |ppalka at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Patrick Palka  ---
Thanks for the bug report, I think this is pretty much a dup of PR102419.

GCC is behaving correctly here: the normal form of the constraint C1> is
just 'true (with empty parameter mapping)' which is independent of the template
parameter, so the constraint is trivially satisfied for any choice of T.  To
get the behavior you desire, define C1 in a way that depends on its template
parameter e.g.

template<class T> concept C1 = requires { typename T; };

*** This bug has been marked as a duplicate of bug 102419 ***

Re: T-Head Vector for GCC-14? (was Re: RISC-V: Support XTheadVector extensions)

2023-11-28 Thread Philipp Tomsich
On Tue, 28 Nov 2023 at 20:31, Palmer Dabbelt  wrote:
>
> On Wed, 22 Nov 2023 14:27:50 PST (-0800), jeffreya...@gmail.com wrote:
> > ...
>
> [Trimming everything else, as this is a big change.  I'm also making it
> a new subject/thread, so folks can see.]
>
> > More generally, I think I need to soften my prior statement about
> > deferring this to gcc-15.  This code was submitted in time for the
> > gcc-14 deadline, so it should be evaluated just like we do anything else
> > that makes the deadline.  There are various criteria we use to evaluate
> > if something should get integrated and we should just work through this
> > series like we always do and not treat it specially in any way.
>
> We talked about this some in the patchwork meeting today.  There's a lot
> of moving parts here, so here's my best bet at summarizing.
>
> It seems like folks broadly agree: I think the only reason everyone was
> so quick to defer to 15 was because we thought the Vrull guys didn't even
> want to, but sounds like there's some interest in getting this into 14.

Thank you for the follow-up on this, as I had the original
conversation with Jeff in passing.
We (and the Alibaba folks and the BeagleV-AHEAD community) would
prefer to get this into 14.

> That's obviously a risky thing to do given it was sent right at the end
> of the window, but it meets the rules.
>
> Folks in the call seemed generally amenable to at least trying for 14,
> so unless anyone's opposed on the lists it seems like the way to go.
> IIRC we ended up with the following TODO list:
>
> * Make sure this doesn't regress on the targets we already support.
>   From the sounds of things there's been test suite runs that look fine,
>   so hopefully that's all manageable.  Christoph said he'd send
>   something out, we've had a bunch of test skew so there might be a bit
>   lurking but it should be generally manageable.
> * We agree on some sort of support lifecycle.  There seemed to be
>   basically two proposals: merge for 14 with the aim of quickly
>   deprecating it (maybe even for 15), or merge for 14 with the aim of
>   keeping it until it ends up un-tested (ie, requiring test results are
>   published for every release).

We expect real-world users, including the BeagleV-AHEAD community, to
need support for the foreseeable future.
Keeping it until it ends up untested (and test cases are reasonably
clean) sounds like a good threshold to ensure the integrity of the
codebase while giving this a clear path to stay in for its useful
life.

Philipp.

> * We actually find some time to sit down and do the code review.
>   That'll be a chunk of work and time is tight since most of us are
>   focusing on V-1.0, but hopefully we've got time to fit things in.
> * There's some options for testing without hardware: QEMU dropped
>   support for V-0.7.1 a while ago, but there's a patch set that's not
>   yet on the lists to bring that back.
>
> So I think unless anyone's opposed, we can at least start looking into
> getting this into GCC-14 -- there's obviously still a ton of review work
> to do and we might find something problematic, but we won't know until
> we actually sit down and do the reviews.
>
> ---
>
> Then for my opinions:
>
> The only policy worry I have is the support lifecycle: IMO merging
> something we're going to quickly deprecate is going to lead to headaches
> for users, so we should only merge this if we're going to plan on
> supporting it for the life of the hardware.  That's always hard to
> define, but we talked through the option of pushing this onto the users:
> we'd require test results published for every GCC release, and if no
> reasonably clean test results are published then we'll assume the HW is
> defunct and support for it can be deprecated.  That's sort of patterned
> on how glibc documents deprecating ports.
>
> IIRC we didn't really end up with any deprecation policy when merging
> the other vendor support, so I'd argue we should just make that the
> general plan for supporting vendor extensions.  It pushes a little more
> work to the vendors/users than we have before, but I think it's a good
> balance.  It's also a pretty easy policy for vendors to understand: if
> they want their custom stuff supported, they need to demonstrate it
> works.


Re: [PATCH] Fortran: deferred-length character optional dummy arguments [PR93762,PR100651]

2023-11-28 Thread Harald Anlauf

Hi FX,

On 11/28/23 18:07, FX Coudert wrote:
> Hi Harald,
>
> The patch looks OK to me. Probably wait a bit for another opinion, since I’m
> not that active and I may have missed something.
>
> Thanks,
> FX

thanks for having a look.

In the meantime I got an automated mail from the Linaro testers.
According to it there is a runtime failure of the testcase on
aarch64.  I couldn't see any useful traceback or anything else.

I tried the testcase on x86 with different options and found
an unexpected result only with -fsanitize=undefined and only
for the case of a rank-1 dummy when there is no actual argument
and the dummy is then passed on to another subroutine.  (valgrind is happy.)

Reduced reproducer:

! this fails with -fsanitize=undefined
program main
  call test_rank1 ()
contains
  subroutine test_rank1 (msg1)
character(:), optional, allocatable :: msg1(:)
if (present (msg1)) stop 77
call assert_rank1 ()! <- no problem here
call assert_rank1 (msg1)! <- problematic code path
  end

  subroutine assert_rank1 (msg2)
character(:), optional, allocatable :: msg2(:)
if (present (msg2)) stop 99 ! <- no problem if commented
  end
end


As far as I can tell, this could be a pre-existing (latent)
issue.  By looking at the tree-dump, the only thing that
appears fishy has been there before.  But then I am only
guessing that this is the problem observed on aarch64.

I have disabled the related call in the testcase of the
attached revised version.  As I do not see anything else,
I wonder if one could proceed with the current version
but open a PR for the reduced case above, unless someone
can pinpoint the place that is responsible for the above
failure.  (Is it the caller, or rather the function entry
code in the callee?)

Cheers,
Harald
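
[Illustration, not part of the original mail.] The ChangeLog entry in the patch that follows says that the character length of a deferred-length dummy is passed by reference so that its value can be returned. A rough C++ analogy is sketched here. It is not gfortran's generated code, and forward_length and set_message are hypothetical names. An absent optional argument is modelled as a null pointer: when forwarding, the address is selected conditionally and the pointee is never read unless the argument is present.

```cpp
#include <cstddef>
#include <cstring>

// Analogy only.  `len` plays the role of the hidden character-length
// argument, passed by address so the callee can update it.

// Forward an optional length: choose the *address* conditionally, so an
// absent argument is never dereferenced on the way through.
std::size_t *forward_length(bool present, std::size_t *len)
{
  return present ? len : nullptr;
}

// Callee: writes a message and returns its length through `len`.
bool set_message(char *buf, std::size_t cap, std::size_t *len)
{
  if (len == nullptr)           // argument not present: nothing to do
    return false;
  const char msg[] = "foo bar";
  const std::size_t n = sizeof msg - 1;
  if (n > cap)
    return false;
  std::memcpy(buf, msg, n);
  *len = n;                     // length is handed back to the caller
  return true;
}
```

The unpatched code, as far as the diff shows, selected between the length value and zero instead, which is only meaningful when the length is passed by value.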

From 63879942b491e23eefc6da4d80c5492434e42ec8 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 28 Nov 2023 20:19:14 +0100
Subject: [PATCH] Fortran: deferred-length character optional dummy arguments
 [PR93762,PR100651]

gcc/fortran/ChangeLog:

	PR fortran/93762
	PR fortran/100651
	* trans-expr.cc (gfc_conv_missing_dummy): The character length for
	deferred-length dummy arguments is passed by reference, so that its
	value can be returned.  Adjust handling for optional dummies.

gcc/testsuite/ChangeLog:

	PR fortran/93762
	PR fortran/100651
	* gfortran.dg/optional_deferred_char_1.f90: New test.
---
 gcc/fortran/trans-expr.cc |  22 +++-
 .../gfortran.dg/optional_deferred_char_1.f90  | 100 ++
 2 files changed, 118 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/optional_deferred_char_1.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index bfe9996ced6..c90c7bbf936 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -2116,10 +2116,24 @@ gfc_conv_missing_dummy (gfc_se * se, gfc_expr * arg, gfc_typespec ts, int kind)
 
   if (ts.type == BT_CHARACTER)
 {
-  tmp = build_int_cst (gfc_charlen_type_node, 0);
-  tmp = fold_build3_loc (input_location, COND_EXPR, gfc_charlen_type_node,
-			 present, se->string_length, tmp);
-  tmp = gfc_evaluate_now (tmp, &se->pre);
+  /* Handle deferred-length dummies that pass the character length by
+	 reference so that the value can be returned.  */
+  if (ts.deferred && INDIRECT_REF_P (se->string_length))
+	{
+	  tmp = gfc_build_addr_expr (NULL_TREE, se->string_length);
+	  tmp = fold_build3_loc (input_location, COND_EXPR, TREE_TYPE (tmp),
+ present, tmp, null_pointer_node);
+	  tmp = gfc_evaluate_now (tmp, &se->pre);
+	  tmp = build_fold_indirect_ref_loc (input_location, tmp);
+	}
+  else
+	{
+	  tmp = build_int_cst (gfc_charlen_type_node, 0);
+	  tmp = fold_build3_loc (input_location, COND_EXPR,
+ gfc_charlen_type_node,
+ present, se->string_length, tmp);
+	  tmp = gfc_evaluate_now (tmp, &se->pre);
+	}
   se->string_length = tmp;
 }
   return;
diff --git a/gcc/testsuite/gfortran.dg/optional_deferred_char_1.f90 b/gcc/testsuite/gfortran.dg/optional_deferred_char_1.f90
new file mode 100644
index 00000000000..0fb0fb5fea1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/optional_deferred_char_1.f90
@@ -0,0 +1,100 @@
+! { dg-do run }
+! PR fortran/93762
+! PR fortran/100651 - deferred-length character as optional dummy argument
+
+program main
+  implicit none
+  character(:), allocatable :: err_msg, msg3(:)
+  character(:), pointer :: err_msg2 => NULL()
+
+  ! Subroutines with optional arguments
+  call to_int ()
+  call to_int_p ()
+! call test_rank1 ()! this fails with -fsanitize=undefined
+  call assert_code ()
+  call assert_p ()
+  call assert_rank1 ()
+
+  ! Test passing of optional arguments
+  call to_int (err_msg)
+  if (.not. allocated (err_msg)) stop 1
+  if (len (err_msg) /= 7)stop 2
+  if (err_msg(1:7) /= "foo bar") stop 3
+
+  call to_int2 (err_msg)
+  if (.not. allocated (err_msg)) stop 4
+  if (len (err_msg) /= 7)stop 5
+  if (err_msg(1:7) /= "foo bar") stop 6
+  

Re: RISC-V: Support XTheadVector extensions

2023-11-28 Thread Palmer Dabbelt

On Fri, 17 Nov 2023 16:01:27 PST (-0800), jeffreya...@gmail.com wrote:



On 11/17/23 16:16, 钟居哲 wrote:

 >> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We

may also need to add '^' to the punct_valid_p hook.  But yes, this is
the preferred way to go when all we need to do is prefix the instruction
with "th.".


No. I don't think we need to add '^'. I don't want theadvector to touch
any code of vector.md.
Mixing up theadvector with RVV1.0 is a nightmare for RVV maintenance.
People like me don't want to touch anything related to Thead.
But anyway, I will take care of that in GCC-15.

I suspect it's going to be even worse if we have multiple patterns
with the same underlying RTL, but just different output strings.

The standard way to handle that has been with an output modifier and/or
ASSEMBLER_DIALECT.  If you look at the PA port for example, the
assembler syntax changed dramatically between the PA1.0/PA1.1 era and
the PA2.0 era.  But we support both variants trivially without
duplicating all the patterns.


IMO we're just stuck between a rock and a hard place here.  
Specifically, this isn't just an assembly syntax change but also comes 
with a bunch of behavioral changes to the instructions in question -- 
I'm specifically thinking of things like the register packing, which 
IIRC changed a ton between 0.7 and 0.8 (and then again more for 1.0).  
That's the kind of stuff that tends to have non-local implications on 
the port, and thus can trip people up.


So if we model this as just assembly syntax then we risk people tripping 
over the differences, but if we try to model it as a whole different 
extension then we have more code to manage.  I'd start with the assembly 
syntax approach, as it should be the option with less code which is 
always nice.  If that turns out to be a problem then we can always just 
duplicate the patterns, but it's way harder to merge them back together 
if we start out with things duplicated.
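
[Illustration, not part of the original mail.] As a toy model of the "assembly syntax" approach discussed above, a single output template can serve both dialects if a punctuation modifier expands to "th." only for XTheadVector. This is not GCC code: expand_template is a hypothetical helper, and treating "%^" as the prefix marker is an assumption based on the '^' modifier mentioned earlier in the thread.

```cpp
#include <string>

// Toy model, not GCC's operand printer: replace each "%^" in an output
// template with "th." for the T-Head dialect, or with nothing for
// standard RVV 1.0.  Everything else is copied through unchanged.
std::string expand_template(const std::string &tmpl, bool thead_dialect)
{
  std::string out;
  for (std::size_t i = 0; i < tmpl.size(); ++i)
    {
      if (tmpl[i] == '%' && i + 1 < tmpl.size() && tmpl[i + 1] == '^')
        {
          if (thead_dialect)
            out += "th.";
          ++i;  // skip the '^'
        }
      else
        out += tmpl[i];
    }
  return out;
}
```

This only covers the spelling of the mnemonic. The behavioral differences between the vector spec versions raised above are exactly what such a prefix cannot express.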


During the patchwork call we also ended up talking about the P extension 
(and the likely vendor flavors).  Nothing's appeared for there yet, but 
the theory is that the RZ/Five (Renesas' line of RISC-V chips that came 
out earlier this year) has some P-related extension.  There's also some 
SIMD in CORE-V, as well as a bunch of low-hanging fruit missing from V 
that we'll probably see more vendor extensions for.


So I think if the goal is to have a single vector target for RISC-V then 
we've probably lost already.



But we've got time to sort this out.  I don't think the code in question
was targeted towards gcc-14.


[In case anyone else is watching: see the forked thread, it might be 
aimed for 14 now...]





jeff


[Bug target/112606] [14 Regression] powerpc64le-linux-gnu: 'FAIL: gcc.target/powerpc/p8vector-fp.c scan-assembler xsnabsdp'

2023-11-28 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112606

seurer at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |seurer at gcc dot gnu.org

--- Comment #3 from seurer at gcc dot gnu.org ---
These tests also fail starting with g:9e9279fadbd1c673c875b9d20261d2de0473f63f,
r14-5542-g9e9279fadbd1c6

FAIL: gcc.target/powerpc/float128-hw5.c scan-assembler-not \\mxscpsgnqp\\M
FAIL: gcc.target/powerpc/float128-hw5.c scan-assembler-times \\mxsnabsqp\\M 1
FAIL: gcc.target/powerpc/float128-hw7.c scan-assembler-not \\mxscpsgnqp\\M
FAIL: gcc.target/powerpc/float128-hw7.c scan-assembler-times \\mxsnabsqp\\M 1

T-Head Vector for GCC-14? (was Re: RISC-V: Support XTheadVector extensions)

2023-11-28 Thread Palmer Dabbelt

On Wed, 22 Nov 2023 14:27:50 PST (-0800), jeffreya...@gmail.com wrote:

...


[Trimming everything else, as this is a big change.  I'm also making it 
a new subject/thread, so folks can see.]



More generally, I think I need to soften my prior statement about
deferring this to gcc-15.  This code was submitted in time for the
gcc-14 deadline, so it should be evaluated just like we do anything else
that makes the deadline.  There are various criteria we use to evaluate
if something should get integrated and we should just work through this
series like we always do and not treat it specially in any way.


We talked about this some in the patchwork meeting today.  There's a lot 
of moving parts here, so here's my best bet at summarizing.

It seems like folks broadly agree: I think the only reason everyone was 
so quick to defer to 15 was because we thought the Vrull guys didn't even 
want to, but sounds like there's some interest in getting this into 14.  
That's obviously a risky thing to do given it was sent right at the end 
of the window, but it meets the rules.


Folks in the call seemed generally amenable to at least trying for 14, 
so unless anyone's opposed on the lists it seems like the way to go.  
IIRC we ended up with the following TODO list:


* Make sure this doesn't regress on the targets we already support.  
 From the sounds of things there's been test suite runs that look fine, 
 so hopefully that's all manageable.  Christoph said he'd send 
 something out, we've had a bunch of test skew so there might be a bit 
 lurking but it should be generally manageable.
* We agree on some sort of support lifecycle.  There seemed to be 
 basically two proposals: merge for 14 with the aim of quickly 
 deprecating it (maybe even for 15), or merge for 14 with the aim of 
 keeping it until it ends up un-tested (ie, requiring test results are 
 published for every release).
* We actually find some time to sit down and do the code review.  
 That'll be a chunk of work and time is tight since most of us are 
 focusing on V-1.0, but hopefully we've got time to fit things in.
* There's some options for testing without hardware: QEMU dropped 
 support for V-0.7.1 a while ago, but there's a patch set that's not 
 yet on the lists to bring that back.


So I think unless anyone's opposed, we can at least start looking into 
getting this into GCC-14 -- there's obviously still a ton of review work 
to do and we might find something problematic, but we won't know until 
we actually sit down and do the reviews.


---

Then for my opinions:

The only policy worry I have is the support lifecycle: IMO merging 
something we're going to quickly deprecate is going to lead to headaches 
for users, so we should only merge this if we're going to plan on 
supporting it for the life of the hardware.  That's always hard to 
define, but we talked through the option of pushing this onto the users: 
we'd require test results published for every GCC release, and if no 
reasonably clean test results are published then we'll assume the HW is 
defunct and support for it can be deprecated.  That's sort of patterned 
on how glibc documents deprecating ports.


IIRC we didn't really end up with any deprecation policy when merging 
the other vendor support, so I'd argue we should just make that the 
general plan for supporting vendor extensions.  It pushes a little more 
work to the vendors/users than we have before, but I think it's a good 
balance.  It's also a pretty easy policy for vendors to understand: if 
they want their custom stuff supported, they need to demonstrate it 
works. 


[Bug c++/112744] Nested name specifier wrongly produces ambiguity in accessing static field

2023-11-28 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112744

Marek Polacek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

[Bug middle-end/112750] New: wrong code with _BitInt(256) and above and __builtin_sub_overflow_p() at -O0

2023-11-28 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112750

Bug ID: 112750
   Summary: wrong code with _BitInt(256) and above and
__builtin_sub_overflow_p() at -O0
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu

Created attachment 56708
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56708&action=edit
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc testcase.c
$ ./a.out 
Aborted

__builtin_sub_overflow_p() should indicate overflow, since 0xC00C598D can't fit
into int.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r14-5935-20231128165834-gf45d5e30bd9-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r14-5935-20231128165834-gf45d5e30bd9-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.0 20231128 (experimental) (GCC)
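
[Illustration, not part of the bug report.] The expectation stated in the report, that 0xC00C598D cannot fit into int, can be checked with ordinary integer types. The sketch below does not use _BitInt and does not reproduce the bug. It uses __builtin_sub_overflow, the sibling of __builtin_sub_overflow_p that stores its result through a pointer, whereas the _p form takes a value of the result type and stores nothing. The helper name sub_overflows_int is hypothetical.

```cpp
// True when a - b cannot be represented in int.  __builtin_sub_overflow
// is a GCC/Clang built-in: the subtraction is performed as if with
// infinite precision and the result is then checked against the type
// that `result` points to.
bool sub_overflows_int(long long a, long long b)
{
  int result;
  return __builtin_sub_overflow(a, b, &result);
}
```

For example, sub_overflows_int(0xC00C598DLL, 0) reports overflow because 3222034829 exceeds the largest 32-bit int, while sub_overflows_int(5, 3) does not.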

[Bug c++/112744] Nested name specifier wrongly produces ambiguity in accessing static field

2023-11-28 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112744

Marek Polacek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-28
                 CC|                            |mpolacek at gcc dot gnu.org

--- Comment #1 from Marek Polacek  ---
Confirmed, I think.  Since a is static, there is only one copy.  So it should
not matter that A is an ambiguous base of D (which it is).  See [class]/note-3.

For the error line, in finish_class_member_access_expr scope will be A, so we
do
 3496   /* Find the base of OBJECT_TYPE corresponding to SCOPE.  */
 3497   access_path = lookup_base (object_type, scope, ba_check,
 3498  NULL, complain);
where object_type=D, scope=A.  But the ba_check means we give an error.

We don't know at this point that name refers to a static data member.  But we
can look it up, and maybe use ba_any.

Not a regression.
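
[Illustration, not part of the Bugzilla comment.] The rule referred to above, that a static member found through an ambiguous base is not itself ambiguous, can be sketched as follows. The hierarchy reuses the names A, D and a from the comment, but the code is not the PR's testcase and bump_through_d is a hypothetical helper. The qualified form that the PR is about appears only in a comment (written here as d.A::a, which is an assumption about the testcase), because affected compilers reject it.

```cpp
// Illustrative hierarchy: A is an ambiguous base of D.
struct A { static int a; };
int A::a = 0;

struct B : A {};
struct C : A {};
struct D : B, C {};

int bump_through_d(D &d)
{
  d.a = 41;   // OK: `a` is static, so both lookup paths name one entity
  ++D::a;     // OK for the same reason
  // A qualified access such as `d.A::a` should be equally fine, but it
  // is the kind of expression the PR reports as wrongly diagnosed.
  return A::a;
}
```

No conversion from D to A is needed to reach a static member, which is why the ambiguity of the base should not matter here.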

Re: [PATCH] [GCC] match.pd: Simplify rule for bitwise not with casts

2023-11-28 Thread Andrew Pinski
On Tue, Nov 28, 2023 at 7:38 AM  wrote:
>
> From: Ezra Sitorus 
>
> Add the transform rule (T)(~A) -> ~(T)(A) for view_convert. The simplified 
> result could be a single assembly instruction when chained with other 
> instructions.
>
> gcc/ChangeLog:
> * match.pd: Add new transform rule.
> * 
> testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpretq_vmvnq.c: Add 
> new test
> * testsuite/gcc.target/arm/simd/vreinterpretq_vmvnq_1.c: Add new test
> ---
>  gcc/match.pd  |  5 
>  .../advsimd-intrinsics/vreinterpretq_vmvnq.c  | 25 ++
>  .../arm/simd/vreinterpretq_vmvnq_1.c  | 26 +++
>  3 files changed, 56 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpretq_vmvnq.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/vreinterpretq_vmvnq_1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 95225e4ca5f..273230a7681 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3576,6 +3576,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   && !TYPE_OVERFLOW_SANITIZED (type))
>(convert (op! @0 @1)
>
> +/* (T)(~A) -> ~(T)A  */
> +  (simplify
> +   (view_convert (bit_not @0))
> +   (bit_not (view_convert @0)))

This is not right for a few reasons. The outer type needs to be an
integral (scalar or vector) type for this to be valid. Plus this might
not be a good (or valid) idea to do for boolean types.
So I think the following check would be needed:
if ((INTEGRAL_TYPE_P (type) && TREE_CODE (type) != BOOLEAN_TYPE)
|| (VECTOR_TYPE_P (type) && INTEGRAL_TYPE_P (TREE_TYPE (type))
&& !VECTOR_BOOLEAN_TYPE_P (type)))

Note this might also cause issues with enum types which sometimes have
constrained type ranges.

Thanks,
Andrew Pinski


> +
>/* ~A + A -> -1 */
>(simplify
> (plus:c (convert? (bit_not @0)) (convert? @0))
> diff --git 
> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpretq_vmvnq.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpretq_vmvnq.c
> new file mode 100644
> index 00000000000..ed82c844bd4
> --- /dev/null
> +++ 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpretq_vmvnq.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include <arm_neon.h>
> +
> +int64x2_t test_vector1(int32x4_t a, int32x4_t b)
> +{
> +  return vandq_s64(vreinterpretq_s64_s32(vmvnq_s32(a)),
> +   vreinterpretq_s64_s32(b));
> +}
> +
> +int64x2_t test_vector2(int32x4_t a, int16x8_t b)
> +{
> +  return vandq_s64(vreinterpretq_s64_s32(vmvnq_s32(a)),
> +   vreinterpretq_s64_s16(b));
> +}
> +
> +int64x2_t test_vector3(int32x4_t a, int64x2_t b)
> +{
> +  return vandq_s64(vreinterpretq_s64_s32(vmvnq_s32(a)), b);
> +}
> +
> +/* { dg-final { scan-assembler-times {\tbic\t} 3 } } */
> +/* { dg-final { scan-assembler-not {\tand\t} } } */
> +/* { dg-final { scan-assembler-not {\tmvn\t} } } */
> diff --git a/gcc/testsuite/gcc.target/arm/simd/vreinterpretq_vmvnq_1.c 
> b/gcc/testsuite/gcc.target/arm/simd/vreinterpretq_vmvnq_1.c
> new file mode 100644
> index 00000000000..a34425100ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/vreinterpretq_vmvnq_1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon" } */
> +
> +#include <arm_neon.h>
> +
> +int64x2_t test_vector1(int32x4_t a, int32x4_t b)
> +{
> +  return vandq_s64(vreinterpretq_s64_s32(vmvnq_s32(a)),
> +   vreinterpretq_s64_s32(b));
> +}
> +
> +int64x2_t test_vector2(int32x4_t a, int16x8_t b)
> +{
> +  return vandq_s64(vreinterpretq_s64_s32(vmvnq_s32(a)),
> +   vreinterpretq_s64_s16(b));
> +}
> +
> +int64x2_t test_vector3(int32x4_t a, int64x2_t b)
> +{
> +  return vandq_s64(vreinterpretq_s64_s32(vmvnq_s32(a)), b);
> +}
> +
> +/* { dg-final { scan-assembler-times {\tvbic\t} 3 } } */
> +/* { dg-final { scan-assembler-not {\tvand\t} } } */
> +/* { dg-final { scan-assembler-not {\tvmvn\t} } } */
> --
> 2.25.1
>


Re: [PATCH] tree-sra: Avoid returns of references to SRA candidates

2023-11-28 Thread Richard Biener



> On 28.11.2023 at 18:38, Jan Hubicka wrote:
> 
> 
>> 
>> 
>> 
 On 28.11.2023 at 17:59, Jan Hubicka wrote:
>>> 
>>> 
 
> On Tue, 28 Nov 2023, Martin Jambor wrote:
> 
> On Tue, Nov 28 2023, Richard Biener wrote:
>> On Mon, 27 Nov 2023, Martin Jambor wrote:
>> 
>>> Hi,
>>> 
>>> The enhancement to address PR 109849 contained an important thinko:
>>> any reference that is passed to a function and does not
>>> escape must also not happen to be aliased by the return value of the
>>> function.  This has quickly transpired as bugs PR 112711 and PR
>>> 112721.
>>> 
>>> Just as IPA-modref does a good enough job to allow us to rely on the
>>> escaped set of variables, it seems to be doing well also on updating
>>> EAF_NOT_RETURNED_DIRECTLY call argument flag which happens to address
>>> exactly the situation we need to avoid.  Of course, if a call
>>> statement ignores any returned value, we also do not need to check the
>>> flag.
>> 
>> But what about EAF_NOT_RETURNED_INDIRECTLY?  Don't you need to
>> verify the parameter doesn't escape through the return at all?
>> 
> 
> I thought EAF_NOT_RETURNED_INDIRECTLY prohibits things like "return
> param->next" but those are not a problem (whatever next points to cannot
> be an SRA candidate and any ADDR_EXPR storing its address there would
> trigger a disqualification or at least an assert).  But I guess I am
> wrong, what is actually the exact meaning of the flag?
 
 I thought it's return (x.ptr = param, );
 
 so the parameter is reachable from the return value.
 
 But let's let Honza answer...
>>> It is the same difference as direct/indirect escape, so it checks whether
>>> values pointed to by arg can be possibly returned.  Indeed maybe we
>>> should think of a better name - the other interpretation did not even
>>> occur to me, but it makes sense.
>> 
>> So does the directly returned flag cover my interpretation of indirect or is 
>> there a hole?
> 
> Stores go through:
> 
>  /* Handle *lhs = name.  */
>  if (assign && gimple_assign_rhs1 (assign) == name)
>    {
>      if (dump_file)
>        fprintf (dump_file, "%*s  ssa name saved to memory\n",
>                 m_depth * 4, "");
>      m_lattice[index].merge (0);
>    }
> 
> So we give up on any flags.  So far modref does not try to track values
> in memory at all.  I suppose PTA info does not help me here, since the
> memory the value is stored to may not escape, but later it may be read
> and copied into something that does escape?

Yeah, we currently don’t track (reliably) what parameters point to and whether 
that escapes.  Or rather, you can’t query this info.

Richard 

> Honza
>> 
>> Richard
>> 
>>> Honza
 
 Richard.
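
For readers following the thread, the three return shapes being discussed can
be sketched as below.  This is only an illustration under stated assumptions:
the function and type names (return_direct, return_indirect, return_via_store,
node, box) are hypothetical and do not come from the patch, and the comments
paraphrase the thread rather than document how IPA-modref classifies each
function.

```cpp
#include <cassert>

// Hypothetical types, used only for this illustration.
struct node { node *next; int value; };
struct box { int *ptr; };

// Shape 1: the argument itself is the return value.  This is the case the
// thread associates with EAF_NOT_RETURNED_DIRECTLY.
int *return_direct(int *param) { return param; }

// Shape 2: a value loaded through the argument is returned, i.e. the
// "return param->next" reading of "returned indirectly".
node *return_indirect(node *param) { return param->next; }

// Shape 3: the argument is stored to memory and that memory is returned, so
// the argument is reachable from the return value.  Per the thread, such a
// store makes modref give up on all flags for the argument.
box *return_via_store(box *x, int *param) { x->ptr = param; return x; }

// Checks that each shape really hands the caller a pointer that aliases, or
// leads back to, the original object.
void check_shapes() {
    int local = 1;
    node tail{nullptr, 2};
    node head{&tail, 1};
    box b{nullptr};

    assert(return_direct(&local) == &local);
    assert(return_indirect(&head) == &tail);
    assert(return_via_store(&b, &local)->ptr == &local);
}
```

In all three shapes the caller ends up with a pointer that aliases or reaches
the object it passed in, which is why a reference that "does not escape" can
still be a problem if it comes back through the return value.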


[Bug c++/112749] New: GCC accepts invalid code in concepts (requires clause incorrectly satisfied)

2023-11-28 Thread novulae at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112749

            Bug ID: 112749
           Summary: GCC accepts invalid code in concepts (requires clause
                    incorrectly satisfied)
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: novulae at hotmail dot com
  Target Milestone: ---

The below example is ill-formed - the requires clause should not be satisfied
due to the failure to instantiate W1, but GCC accepts the code
nonetheless.

Clang rejects the code.

$ cat gcc-bug.cpp
template<class T> concept C1 = true;

template<class T> using W1 = decltype(nonexistent(0));

template<class T>
requires C1<W1<T>>
void func(T);

void test() {
  func(0);
}

$ g++ -c -std=c++20 gcc-bug.cpp

$ clang++ -c -std=c++20 gcc-bug.cpp
gcc-bug.cpp:10:3: error: no matching function for call to 'func'
   10 |   func(0);
      |   ^~~~
gcc-bug.cpp:7:6: note: candidate template ignored: constraints not satisfied
[with T = int]
    7 | void func(T);
      |      ^
gcc-bug.cpp:3:39: note: because substituted constraint expression is
ill-formed: use of undeclared identifier 'nonexistent'
    3 | template<class T> using W1 = decltype(nonexistent(0));
      |                                       ^
1 error generated.

[Bug middle-end/112748] memmove(ptr, ptr, n) call optimized out even at -O0 with -fsanitize=undefined

2023-11-28 Thread tavianator at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112748

--- Comment #2 from Tavian Barnes  ---
(In reply to Andrew Pinski from comment #1)
> Does -fsanitize=address remove it?

Yes, it's still removed with -fsanitize=address.

While ASAN is necessary to check that the memory is really allocated, UBSAN
should at least check that ptr is not NULL.  So it shouldn't be removed in
either case.
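
A minimal sketch of the call shape under discussion, for illustration only.
The helper name self_move is hypothetical and not taken from the report.  With
a valid pointer the call leaves the buffer unchanged, which is presumably why
it gets folded away; the argument in this comment is that a sanitizer build
should still see the call so that a null ptr can be diagnosed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: moves a buffer onto itself.  For a valid pointer this
// has no visible effect on the buffer contents.
void *self_move(void *ptr, std::size_t n) {
    return std::memmove(ptr, ptr, n);
}

// Exercises the helper with a valid buffer only; a null pointer is
// deliberately not passed here, since that is the undefined case.
void check_self_move() {
    char buf[] = "abc";
    assert(self_move(buf, sizeof buf) == buf);
    assert(std::strcmp(buf, "abc") == 0);
}
```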

[Bug middle-end/112748] memmove(ptr, ptr, n) call optimized out even at -O0 with -fsanitize=undefined

2023-11-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112748

--- Comment #1 from Andrew Pinski  ---
Does -fsanitize=address remove it?

[Bug c++/94264] Array-to-pointer conversion not performed on array prvalues

2023-11-28 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94264

Jason Merrill  changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |jason at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |jason at gcc dot gnu.org
         Depends on|                              |53220
             Status|NEW                           |ASSIGNED

--- Comment #7 from Jason Merrill  ---
This is due to my PR53220 change to discourage use of compound-literals in ways
that produce a dangling pointer when the C++ compiler treats them as prvalues,
unlike C where they have variable lifetime.

I think the change was always wrong, but wasn't really a problem until we added
array prvalues in C++17.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220
[Bug 53220] [4.7/4.8 Regression] g++ mis-compiles compound literals
