Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-07-18 Thread Hongtao Liu via Gcc-patches
On Mon, Jul 17, 2023 at 7:38 PM Uros Bizjak  wrote:
>
> On Mon, Jul 17, 2023 at 10:28 AM Hongtao Liu  wrote:
> >
> > I'd like to ping for this patch (only patch 1/2, for patch 2/2, I
> > think that may not be necessary).
> >
> > On Mon, May 15, 2023 at 9:20 AM Hongtao Liu  wrote:
> > >
> > > ping.
> > >
> > > On Fri, Apr 21, 2023 at 9:55 PM liuhongt  wrote:
> > > >
> > > > > > +  if (!TARGET_SSE2)
> > > > > > +{
> > > > > > +  if (c_dialect_cxx ()
> > > > > > +   && cxx_dialect > cxx20)
> > > > >
> > > > > Formatting, both conditions are short, so just put them on one line.
> > > > Changed.
> > > >
> > > > > But for the C++23 macros, more importantly I think we really should
> > > > > also in ix86_target_macros_internal add
> > > > >   if (c_dialect_cxx ()
> > > > >   && cxx_dialect > cxx20
> > > > >   && (isa_flag & OPTION_MASK_ISA_SSE2))
> > > > > {
> > > > >   def_or_undef (parse_in, "__STDCPP_FLOAT16_T__");
> > > > >   def_or_undef (parse_in, "__STDCPP_BFLOAT16_T__");
> > > > > }
> > > > > plus associated libstdc++ changes.  It can be done incrementally 
> > > > > though.
> > > > Added in PATCH 2/2
> > > >
> > > > > > +  if (flag_building_libgcc)
> > > > > > + {
> > > > > > +   /* libbid uses __LIBGCC_HAS_HF_MODE__ and 
> > > > > > __LIBGCC_HAS_BF_MODE__
> > > > > > +  to check backend support of _Float16 and __bf16 type.  */
> > > > >
> > > > > That is actually the case only for HFmode, but not for BFmode right 
> > > > > now.
> > > > > So, we need further work.  One is to add the BFmode support in there,
> > > > > and another one is make sure the _Float16 <-> _Decimal* and __bf16 <->
> > > > > _Decimal* conversions are compiled in also if not -msse2 by default.
> > > > > One way to do that is wrap the HF and BF mode related functions on x86
> > > > > #ifndef __SSE2__ into the pragmas like intrin headers use (but then
> > > > > perhaps we don't need to undef this stuff here), another is not 
> > > > > provide
> > > > > the hf/bf support in that case from the TUs where they are provided 
> > > > > now,
> > > > > but from a different one which would be compiled with -msse2.
> > > > Add CFLAGS-_hf_to_sd.c += -msse2, similar for other files in libbid, 
> > > > just like
> > > > we did before for HFtype softfp. Then no need to undef libgcc macros.
> > > >
> > > > > >/* We allowed the user to turn off SSE for kernel mode.  Don't 
> > > > > > crash if
> > > > > >   some less clueful developer tries to use floating-point 
> > > > > > anyway.  */
> > > > > > -  if (needed_sseregs && !TARGET_SSE)
> > > > > > +  if (needed_sseregs
> > > > > > +  && (!TARGET_SSE
> > > > > > +   || (VALID_SSE2_TYPE_MODE (mode)
> > > > > > +   && !TARGET_SSE2)))
> > > > >
> > > > > Formatting, no need to split this up that much.
> > > > >   if (needed_sseregs
> > > > >   && (!TARGET_SSE
> > > > >   || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
> > > > > or even better
> > > > >   if (needed_sseregs
> > > > >   && (!TARGET_SSE || (VALID_SSE2_TYPE_MODE (mode) && 
> > > > > !TARGET_SSE2)))
> > > > > will do it.
> > > > Changed.
> > > >
> > > > > Instead of this, just use
> > > > >   if (!float16_type_node)
> > > > > {
> > > > >   float16_type_node = ix86_float16_type_node;
> > > > >   callback (float16_type_node);
> > > > >   float16_type_node = NULL_TREE;
> > > > > }
> > > > >   if (!bfloat16_type_node)
> > > > > {
> > > > >   bfloat16_type_node = ix86_bf16_type_node;
> > > > >   callback (bfloat16_type_node);
> > > > >   bfloat16_type_node = NULL_TREE;
> > > > > }
> > > > Changed.
> > > >
> > > >
> > > > > > +static const char *
> > > > > > +ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> > > > > > +{
> > > > > > +  if (element_mode (fromtype) != element_mode (totype))
> > > > > > +{
> > > > > > +  /* Do no allow conversions to/from BFmode/HFmode scalar types
> > > > > > +  when TARGET_SSE2 is not available.  */
> > > > > > +  if ((TYPE_MODE (fromtype) == BFmode
> > > > > > +|| TYPE_MODE (fromtype) == HFmode)
> > > > > > +   && !TARGET_SSE2)
> > > > >
> > > > > First of all, not really sure if this should be purely about scalar
> > > > > modes, not also complex and vector modes involving those inner modes.
> > > > > Because complex or vector modes with BF/HF elements will be without
> > > > > TARGET_SSE2 for sure lowered into scalar code and that can't be 
> > > > > handled
> > > > > either.
> > > > > So if (!TARGET_SSE2 && GET_MODE_INNER (TYPE_MODE (fromtype)) == 
> > > > > BFmode)
> > > > > or even better
> > > > > if (!TARGET_SSE2 && element_mode (fromtype) == BFmode)
> > > > > ?
> > > > > Or even better remember the 2 modes above into machine_mode 
> > > > > temporaries
> > > > > and just use those in the != comparison and for the checks?
> > > > >
> > > > > Also, I think it is weird to tell user 

Re: [PATCH v7, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2023-07-18 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/9/26 11:35, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000.
> Tests show that outputs of xs[min/max]dp are consistent with the standard
> of C99 fmin/max.
> 
>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
> of smin/max when fast-math is not set. While fast-math is set, xs[min/max]dp
> are folded to MIN/MAX_EXPR in gimple, and finally expanded to smin/max.
> 
>   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.

Sorry for the late review, this patch is okay for trunk with the below
nit tweaked or not.  Thanks!

> 
> ChangeLog
> 2022-09-26 Haochen Gui 
> 
> gcc/
>   PR target/103605
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Gimple
>   fold RS6000_BIF_XSMINDP and RS6000_BIF_XSMAXDP when fast-math is set.
>   * config/rs6000/rs6000.md (FMINMAX): New int iterator.
>   (minmax_op): New int attribute.
>   (UNSPEC_FMAX, UNSPEC_FMIN): New unspecs.
>   (f3): New pattern by UNSPEC_FMAX and UNSPEC_FMIN.
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xsmaxdp): Set
>   pattern to fmaxdf3.
>   (__builtin_vsx_xsmindp): Set pattern to fmindf3.
> 
> gcc/testsuite/
>   PR target/103605
>   * gcc.dg/powerpc/pr103605.h: New.
>   * gcc.dg/powerpc/pr103605-1.c: New.
>   * gcc.dg/powerpc/pr103605-2.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index e925ba9fad9..944ae9fe55c 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1588,6 +1588,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>gimple_set_location (g, gimple_location (stmt));
>gsi_replace (gsi, g, true);
>return true;
> +/* fold into MIN_EXPR when fast-math is set.  */
> +case RS6000_BIF_XSMINDP:
>  /* flavors of vec_min.  */
>  case RS6000_BIF_XVMINDP:
>  case RS6000_BIF_XVMINSP:
> @@ -1614,6 +1616,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>gimple_set_location (g, gimple_location (stmt));
>gsi_replace (gsi, g, true);
>return true;
> +/* fold into MAX_EXPR when fast-math is set.  */
> +case RS6000_BIF_XSMAXDP:
>  /* flavors of vec_max.  */
>  case RS6000_BIF_XVMAXDP:
>  case RS6000_BIF_XVMAXSP:
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f4a9f24bcc5..8b735493b40 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1613,10 +1613,10 @@
>  XSCVSPDP vsx_xscvspdp {}
> 
>const double __builtin_vsx_xsmaxdp (double, double);
> -XSMAXDP smaxdf3 {}
> +XSMAXDP fmaxdf3 {}
> 
>const double __builtin_vsx_xsmindp (double, double);
> -XSMINDP smindf3 {}
> +XSMINDP fmindf3 {}
> 
>const double __builtin_vsx_xsrdpi (double);
>  XSRDPI vsx_xsrdpi {}
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bf85baa5370..ae0dd98f0f9 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,8 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_FMAX
> +   UNSPEC_FMIN
>])
> 
>  ;;
> @@ -5341,6 +5343,22 @@ (define_insn_and_split "*s3_fpr"
>DONE;
>  })
> 
> +
> +(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
> +
> +(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
> +  (UNSPEC_FMIN "min")])
> +
> +(define_insn "f3"
> +  [(set (match_operand:SFDF 0 "vsx_register_operand" "=wa")
> + (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "wa")
> +   (match_operand:SFDF 2 "vsx_register_operand" "wa")]
> +  FMINMAX))]
> +  "TARGET_VSX && !flag_finite_math_only"
> +  "xsdp %x0,%x1,%x2"
> +  [(set_attr "type" "fp")]
> +)
> +
>  (define_expand "movcc"
> [(set (match_operand:GPR 0 "gpc_reg_operand")
>(if_then_else:GPR (match_operand 1 "comparison_operator")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103605-1.c
> new file mode 100644
> index 000..923deec6a1e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103605-1.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */

Nit: Add a comment here like:

/* Verify that GCC generates expected min/max hw insns instead of fmin/fmax 
calls. */

> +/* { dg-final { scan-assembler-times {\mxsmaxdp\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mxsmindp\M} 3 } } */> +
> +#include "pr103605.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103605-2.c
> new file mode 100644
> index 

Ping [PATCH v7, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2023-07-18 Thread HAO CHEN GUI via Gcc-patches
Hi,
  As the ticket(PR107013, adding fmin/max to RTL code) is suspended, I ping
this patch. The unspec of fmin/max can be replaced with corresponding RTL
code after that ticket is fixed.

https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602181.html

Thanks
Gui Haochen


在 2022/9/26 11:35, HAO CHEN GUI 写道:
> Hi,
>   This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000.
> Tests show that outputs of xs[min/max]dp are consistent with the standard
> of C99 fmin/max.
> 
>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
> of smin/max when fast-math is not set. While fast-math is set, xs[min/max]dp
> are folded to MIN/MAX_EXPR in gimple, and finally expanded to smin/max.
> 
>   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-09-26 Haochen Gui 
> 
> gcc/
>   PR target/103605
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Gimple
>   fold RS6000_BIF_XSMINDP and RS6000_BIF_XSMAXDP when fast-math is set.
>   * config/rs6000/rs6000.md (FMINMAX): New int iterator.
>   (minmax_op): New int attribute.
>   (UNSPEC_FMAX, UNSPEC_FMIN): New unspecs.
>   (f3): New pattern by UNSPEC_FMAX and UNSPEC_FMIN.
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xsmaxdp): Set
>   pattern to fmaxdf3.
>   (__builtin_vsx_xsmindp): Set pattern to fmindf3.
> 
> gcc/testsuite/
>   PR target/103605
>   * gcc.dg/powerpc/pr103605.h: New.
>   * gcc.dg/powerpc/pr103605-1.c: New.
>   * gcc.dg/powerpc/pr103605-2.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index e925ba9fad9..944ae9fe55c 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1588,6 +1588,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>gimple_set_location (g, gimple_location (stmt));
>gsi_replace (gsi, g, true);
>return true;
> +/* fold into MIN_EXPR when fast-math is set.  */
> +case RS6000_BIF_XSMINDP:
>  /* flavors of vec_min.  */
>  case RS6000_BIF_XVMINDP:
>  case RS6000_BIF_XVMINSP:
> @@ -1614,6 +1616,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>gimple_set_location (g, gimple_location (stmt));
>gsi_replace (gsi, g, true);
>return true;
> +/* fold into MAX_EXPR when fast-math is set.  */
> +case RS6000_BIF_XSMAXDP:
>  /* flavors of vec_max.  */
>  case RS6000_BIF_XVMAXDP:
>  case RS6000_BIF_XVMAXSP:
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f4a9f24bcc5..8b735493b40 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1613,10 +1613,10 @@
>  XSCVSPDP vsx_xscvspdp {}
> 
>const double __builtin_vsx_xsmaxdp (double, double);
> -XSMAXDP smaxdf3 {}
> +XSMAXDP fmaxdf3 {}
> 
>const double __builtin_vsx_xsmindp (double, double);
> -XSMINDP smindf3 {}
> +XSMINDP fmindf3 {}
> 
>const double __builtin_vsx_xsrdpi (double);
>  XSRDPI vsx_xsrdpi {}
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bf85baa5370..ae0dd98f0f9 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,8 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_FMAX
> +   UNSPEC_FMIN
>])
> 
>  ;;
> @@ -5341,6 +5343,22 @@ (define_insn_and_split "*s3_fpr"
>DONE;
>  })
> 
> +
> +(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
> +
> +(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
> +  (UNSPEC_FMIN "min")])
> +
> +(define_insn "f3"
> +  [(set (match_operand:SFDF 0 "vsx_register_operand" "=wa")
> + (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "wa")
> +   (match_operand:SFDF 2 "vsx_register_operand" "wa")]
> +  FMINMAX))]
> +  "TARGET_VSX && !flag_finite_math_only"
> +  "xsdp %x0,%x1,%x2"
> +  [(set_attr "type" "fp")]
> +)
> +
>  (define_expand "movcc"
> [(set (match_operand:GPR 0 "gpc_reg_operand")
>(if_then_else:GPR (match_operand 1 "comparison_operator")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103605-1.c
> new file mode 100644
> index 000..923deec6a1e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103605-1.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +/* { dg-final { scan-assembler-times {\mxsmaxdp\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mxsmindp\M} 3 } } */
> +
> +#include "pr103605.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605-2.c 
> 

[PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-18 Thread Hao Liu OS via Gcc-patches
This only affects the new costs in aarch64 backend.  Currently, the reduction
latency of vector body is too large as it is multiplied by stmt count.  As the
scalar reduction latency is small, the new costs model may think "scalar code
would issue more quickly" and increase the vector body cost a lot, which will
miss vectorization opportunities.

Tested by bootstrapping on aarch64-linux-gnu.

gcc/ChangeLog:

PR target/110625
* config/aarch64/aarch64.cc (count_ops): Remove the '* count'
for reduction_latency.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110625.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc   |  5 +--
 gcc/testsuite/gcc.target/aarch64/pr110625.c | 46 +
 2 files changed, 47 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636..27afa64b7d5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16788,10 +16788,7 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
 {
   unsigned int base
= aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
-
-  /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
-that's not yet the case.  */
-  ops->reduction_latency = MAX (ops->reduction_latency, base * count);
+  ops->reduction_latency = MAX (ops->reduction_latency, base);
 }
 
   /* Assume that multiply-adds will become a single operation.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625.c 
b/gcc/testsuite/gcc.target/aarch64/pr110625.c
new file mode 100644
index 000..0965cac33a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details 
-fno-tree-slp-vectorize" } */
+/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
+
+/* Do not increase the vector body cost due to the incorrect reduction latency
+Original vector body cost = 51
+Scalar issue estimate:
+  ...
+  reduction latency = 2
+  estimated min cycles per iteration = 2.00
+  estimated cycles per vector iteration (for VF 2) = 4.00
+Vector issue estimate:
+  ...
+  reduction latency = 8  <-- Too large
+  estimated min cycles per iteration = 8.00
+Increasing body cost to 102 because scalar code would issue more quickly
+  ...
+missed:  cost model: the vector iteration cost = 102 divided by the scalar 
iteration cost = 44 is greater or equal to the vectorization factor = 2.
+missed:  not vectorized: vectorization not profitable.  */
+
+typedef struct
+{
+  unsigned short m1, m2, m3, m4;
+} the_struct_t;
+typedef struct
+{
+  double m1, m2, m3, m4, m5;
+} the_struct2_t;
+
+double
+bar (the_struct2_t *);
+
+double
+foo (double *k, unsigned int n, the_struct_t *the_struct)
+{
+  unsigned int u;
+  the_struct2_t result;
+  for (u = 0; u < n; u++, k--)
+{
+  result.m1 += (*k) * the_struct[u].m1;
+  result.m2 += (*k) * the_struct[u].m2;
+  result.m3 += (*k) * the_struct[u].m3;
+  result.m4 += (*k) * the_struct[u].m4;
+}
+  return bar ();
+}
-- 
2.34.1


Re: [PATCH v3] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-18 Thread Jeff Law via Gcc-patches




On 7/18/23 17:42, Vineet Gupta wrote:

Hi Manolis,

On 7/18/23 11:01, Jeff Law via Gcc-patches wrote:
Vineet @ Rivos has indicated he stumbled across an ICE with the V3 
code.  Hopefully he'll get a testcase for that extracted shortly. 


Yeah, I was trying to build SPEC2017 with this patch and ran into ICE 
for several of them with -Ofast build: The reduced test from 455.nab is 
attached here.

The issue happens with v2 as well, so not something introduced by v3.

There's ICE in cprop_hardreg which immediately follows f-m-o.


The protagonist is ins 93 which starts off in combine as a simple set of 
a DF 0.


| sff.i.288r.combine:(insn 93 337 94 8 (set (reg/v:DF 236 [ e ])
| sff.i.288r.combine- (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 
190 {*movdf_hardfloat_rv64}


Subsequently reload transforms it into SP + offset

| sff.i.303r.reload:(insn 93 337 94 9 (set (mem/c:DF (plus:DI (reg/f:DI 
2 sp)

| sff.i.303r.reload- (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
| sff.i.303r.reload- (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190 
{*movdf_hardfloat_rv64}

| sff.i.303r.reload- (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])

It gets processed by f-m-o and lands in cprop_hardreg, where it triggers 
ICE.


| (insn 93 337 523 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
|     (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
|     (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 -1
^^^
|      (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
|    (nil)))
| during RTL pass: cprop_hardreg

Here's my analysis:

f-m-o: do_check_validity() -> insn_invalid_p() tries to recog() a 
modified version of insn 93 (actually there is no change, so perhaps 
something we can optimize later). The corresponding md pattern 
movdf_hardfloat_rv64 no longer matches since it expects REG_P for 
operand0, while reload has converted it into SP + offset. f-m-o then 
does the right thing by invalidating INSN_CODE=-1 for a subsequent 
recog() to work correctly.
But it seems this -1 lingers into the next pass, and trips up 
copyprop_hardreg_forward_1() -> extract_constrain_insn()

So I don't know what the right fix here should be.
This is a bug in the RISC-V backend.  I actually fixed basically the 
same bug in another backend that was exposed by the f-m-o code.





In a run with -fno-fold-mem-offsets, the same insn 93 is successfully 
grok'ed by cprop_hardreg,


| (insn 93 337 522 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
|    (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
|    (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190 
{*movdf_hardfloat_rv64}

^^^
| (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
|    (nil)))

P.S. I wonder if it is a good idea in general to call recog() post 
reload since the insn could be changed sufficiently to no longer match 
the md patterns. Of course I don't know the answer.

If this ever causes a problem, it's a backend bug.  It's that simple.

Conceptually it should always be safe to set INSN_CODE to -1 for any insn.

Odds are for this specific case in the RV backend, we just need a 
constraint to store 0.0 into a memory location.  That can actually be 
implemented as a store from x0 since 0.0 has the bit pattern 0x0.  This 
is probably a good thing to expose anyway as an optimization and can 
move forward independently of the f-m-o patch.






P.S.2 When debugging code, I noticed a minor annoyance in the patch with 
the whole fold_mem_offsets_driver() switch-case indirection. It doesn't 
seem to be serving any purpose, and we could simply call corresponding 
do_* routines in execute () itself.
We were in the process of squashing some of this out of the 
implementation.   I hadn't looked at the V3 patch to see how much 
progress had been made on this yet.


Thanks for digging into this!

jeff


[PATCH v2] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-18 Thread YunQiang Su
PR #104914

When work with
  int val;
  ((unsigned char*))[3] = *buf;
  if (val > 0) ...
The RTX mode is obtained from REG instead of SUBREG, which make
D is used instead of .  Thus something wrong happens
on sign-extend default architectures, like MIPS64.

Let's use str_rtx and mode of str_rtx as the parameters for
store_integral_bit_field if:
  modes of op0 and str_rtx are INT;
  length of op0 is greater than str_rtx.

This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
mips64el-linux-gnuabi64 without regression.

gcc/ChangeLog:
PR: 104914.
* expmed.cc(store_bit_field_1): Pass str_rtx and its mode
to store_integral_bit_field if the length of op0 is greater
than str_rtx.

gcc/testsuite/ChangeLog:
PR: 104914.
* gcc.target/mips/pr104914.c: New testcase.
---
 gcc/expmed.cc| 20 +---
 gcc/testsuite/gcc.target/mips/pr104914.c | 17 +
 2 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index fbd4ce2d42f..5531c19e891 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -850,6 +850,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
  since that case is valid for any mode.  The following cases are only
  valid for integral modes.  */
   opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (op0));
+  opt_scalar_int_mode str_mode = int_mode_for_mode (GET_MODE (str_rtx));
   scalar_int_mode imode;
   if (!op0_mode.exists () || imode != GET_MODE (op0))
 {
@@ -881,9 +882,22 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
op0 = gen_lowpart (op0_mode.require (), op0);
 }
 
-  return store_integral_bit_field (op0, op0_mode, ibitsize, ibitnum,
-  bitregion_start, bitregion_end,
-  fieldmode, value, reverse, fallback_p);
+  /* If MODEs of str_rtx and op0 are INT, and the length of op0 is greater than
+ str_rtx, it means that str_rtx has a shorter SUBREG: int32 on 64 mach/ABI
+ is an example.  For this case, we should use the mode of SUBREG, otherwise
+ bad code will generate for sign-extension ports, like MIPS.  */
+  bool use_str_mode = false;
+  if (GET_MODE_CLASS (GET_MODE (str_rtx)) == MODE_INT
+  && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+  && known_gt (GET_MODE_SIZE (GET_MODE (op0)),
+  GET_MODE_SIZE (GET_MODE (str_rtx
+use_str_mode = true;
+
+  return store_integral_bit_field (use_str_mode ? str_rtx : op0,
+  use_str_mode ? str_mode : op0_mode,
+  ibitsize, ibitnum, bitregion_start,
+  bitregion_end, fieldmode, value,
+  reverse, fallback_p);
 }
 
 /* Subroutine of store_bit_field_1, with the same arguments, except
diff --git a/gcc/testsuite/gcc.target/mips/pr104914.c 
b/gcc/testsuite/gcc.target/mips/pr104914.c
new file mode 100644
index 000..fd6ef6af446
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr104914.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=mips64r2 -mabi=64" } */
+
+/* { dg-final { scan-assembler-not "\tdins\t" } } */
+
+NOMIPS16 int test (const unsigned char *buf)
+{
+  int val;
+  ((unsigned char*))[0] = *buf++;
+  ((unsigned char*))[1] = *buf++;
+  ((unsigned char*))[2] = *buf++;
+  ((unsigned char*))[3] = *buf++;
+  if(val > 0)
+return 1;
+  else
+return 0;
+}
-- 
2.30.2



[PATCH] Store_bit_field_1: Use SUBREG instead of REG if possible

2023-07-18 Thread YunQiang Su
PR #104914

When work with
  int val;
  ((unsigned char*))[3] = *buf;
  if (val > 0) ...
The RTX mode is obtained from REG instead of SUBREG, which make
D is used instead of .  Thus something wrong happens
on sign-extend default architectures, like MIPS64.

Let's use str_rtx and mode of str_rtx as the parameters for
store_integral_bit_field if:
  modes of op0 and str_rtx are INT;
  length of op0 is greater than str_rtx.

This patch has been tested on aarch64-linux-gnu, x86_64-linux-gnu,
mips64el-linux-gnuabi64 without regression.

gcc/ChangeLog:
PR: 104914.
* expmed.cc(store_bit_field_1): Pass str_rtx and its mode
to store_integral_bit_field if the length of op0 is greater
than str_rtx.

gcc/testsuite/ChangeLog:
PR: 104914.
* gcc.target/mips/pr104914.c: New testcase.
---
 gcc/expmed.cc|  20 +-
 gcc/testsuite/gcc.target/mips/pr104914.c |  17 ++
 gcc/xx   | 233 +++
 gcc/xx.log   |   1 +
 4 files changed, 268 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/pr104914.c
 create mode 100644 gcc/xx
 create mode 100644 gcc/xx.log

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index fbd4ce2d42f..5531c19e891 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -850,6 +850,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
  since that case is valid for any mode.  The following cases are only
  valid for integral modes.  */
   opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (op0));
+  opt_scalar_int_mode str_mode = int_mode_for_mode (GET_MODE (str_rtx));
   scalar_int_mode imode;
   if (!op0_mode.exists () || imode != GET_MODE (op0))
 {
@@ -881,9 +882,22 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
op0 = gen_lowpart (op0_mode.require (), op0);
 }
 
-  return store_integral_bit_field (op0, op0_mode, ibitsize, ibitnum,
-  bitregion_start, bitregion_end,
-  fieldmode, value, reverse, fallback_p);
+  /* If MODEs of str_rtx and op0 are INT, and the length of op0 is greater than
+ str_rtx, it means that str_rtx has a shorter SUBREG: int32 on 64 mach/ABI
+ is an example.  For this case, we should use the mode of SUBREG, otherwise
+ bad code will generate for sign-extension ports, like MIPS.  */
+  bool use_str_mode = false;
+  if (GET_MODE_CLASS (GET_MODE (str_rtx)) == MODE_INT
+  && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+  && known_gt (GET_MODE_SIZE (GET_MODE (op0)),
+  GET_MODE_SIZE (GET_MODE (str_rtx
+use_str_mode = true;
+
+  return store_integral_bit_field (use_str_mode ? str_rtx : op0,
+  use_str_mode ? str_mode : op0_mode,
+  ibitsize, ibitnum, bitregion_start,
+  bitregion_end, fieldmode, value,
+  reverse, fallback_p);
 }
 
 /* Subroutine of store_bit_field_1, with the same arguments, except
diff --git a/gcc/testsuite/gcc.target/mips/pr104914.c 
b/gcc/testsuite/gcc.target/mips/pr104914.c
new file mode 100644
index 000..fd6ef6af446
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr104914.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=mips64r2 -mabi=64" } */
+
+/* { dg-final { scan-assembler-not "\tdins\t" } } */
+
+NOMIPS16 int test (const unsigned char *buf)
+{
+  int val;
+  ((unsigned char*))[0] = *buf++;
+  ((unsigned char*))[1] = *buf++;
+  ((unsigned char*))[2] = *buf++;
+  ((unsigned char*))[3] = *buf++;
+  if(val > 0)
+return 1;
+  else
+return 0;
+}
diff --git a/gcc/xx b/gcc/xx
new file mode 100644
index 000..664e8ba8b25
--- /dev/null
+++ b/gcc/xx
@@ -0,0 +1,233 @@
+DEFTREECODE (ERROR_MARK, "error_mark", tcc_exceptional, 0)
+DEFTREECODE (IDENTIFIER_NODE, "identifier_node", tcc_exceptional, 0)
+DEFTREECODE (TREE_LIST, "tree_list", tcc_exceptional, 0)
+DEFTREECODE (TREE_VEC, "tree_vec", tcc_exceptional, 0)
+DEFTREECODE (BLOCK, "block", tcc_exceptional, 0)
+DEFTREECODE (OFFSET_TYPE, "offset_type", tcc_type, 0)
+DEFTREECODE (ENUMERAL_TYPE, "enumeral_type", tcc_type, 0)
+DEFTREECODE (BOOLEAN_TYPE, "boolean_type", tcc_type, 0)
+DEFTREECODE (INTEGER_TYPE, "integer_type", tcc_type, 0)
+DEFTREECODE (REAL_TYPE, "real_type", tcc_type, 0)
+DEFTREECODE (POINTER_TYPE, "pointer_type", tcc_type, 0)
+DEFTREECODE (REFERENCE_TYPE, "reference_type", tcc_type, 0)
+DEFTREECODE (NULLPTR_TYPE, "nullptr_type", tcc_type, 0)
+DEFTREECODE (FIXED_POINT_TYPE, "fixed_point_type", tcc_type, 0)
+DEFTREECODE (COMPLEX_TYPE, "complex_type", tcc_type, 0)
+DEFTREECODE (VECTOR_TYPE, "vector_type", tcc_type, 0)
+DEFTREECODE (ARRAY_TYPE, "array_type", tcc_type, 0)
+DEFTREECODE (RECORD_TYPE, "record_type", tcc_type, 0)
+DEFTREECODE (UNION_TYPE, "union_type", tcc_type, 0)/* C union type */

[Bug preprocessor/103902] GCC requires a space between string-literal and identifier in a literal-operator-id where the identifier is not in basic character set

2023-07-18 Thread lhyatt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902

Lewis Hyatt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
   Target Milestone|--- |14.0
 Resolution|--- |FIXED

--- Comment #7 from Lewis Hyatt  ---
Fixed for GCC 14.

[Bug preprocessor/103902] GCC requires a space between string-literal and identifier in a literal-operator-id where the identifier is not in basic character set

2023-07-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Lewis Hyatt :

https://gcc.gnu.org/g:1d3e4f4e2d19c3394dc018118a78c1f4b59cb5c2

commit r14-2629-g1d3e4f4e2d19c3394dc018118a78c1f4b59cb5c2
Author: Lewis Hyatt 
Date:   Tue Jul 18 17:16:08 2023 -0400

libcpp: Handle extended characters in user-defined literal suffix
[PR103902]

The PR complains that we do not handle UTF-8 in the suffix for a
user-defined
literal, such as:

bool operator ""_Ï (unsigned long long);

In fact we don't handle any extended identifier characters there, whether
UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space
after
the "" tokens is included, since then the identifier is lexed in the
"normal" way as its own token. But when it is lexed as part of the string
token, this is handled in lex_string() with a one-off loop that is not
aware
of extended characters.

This patch fixes it by adding a new function scan_cur_identifier() that can
be used to lex an identifier while in the middle of lexing another token.

BTW, the other place that has been mis-lexing identifiers is
lex_identifier_intern(), which is used to implement #pragma push_macro
and #pragma pop_macro. This does not support extended characters either.
I will add that in a subsequent patch, because it can't directly reuse the
new function, but rather needs to lex from a string instead of a
cpp_buffer.

With scan_cur_identifier(), we do also correctly warn about bidi and
normalization issues in the extended identifiers comprising the suffix.

libcpp/ChangeLog:

PR preprocessor/103902
* lex.cc (identifier_diagnostics_on_lex): New function refactoring
some common code.
(lex_identifier_intern): Use the new function.
(lex_identifier): Don't run identifier diagnostics here, rather let
the call site do it when needed.
(_cpp_lex_direct): Adjust the call sites of lex_identifier ()
acccordingly.
(struct scan_id_result): New struct.
(scan_cur_identifier): New function.
(create_literal2): New function.
(lit_accum::create_literal2): New function.
(is_macro): Folded into new function...
(maybe_ignore_udl_macro_suffix): ...here.
(is_macro_not_literal_suffix): Folded likewise.
(lex_raw_string): Handle UTF-8 in UDL suffix via
scan_cur_identifier ().
(lex_string): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/103902
* g++.dg/cpp0x/udlit-extended-id-1.C: New test.
* g++.dg/cpp0x/udlit-extended-id-2.C: New test.
* g++.dg/cpp0x/udlit-extended-id-3.C: New test.
* g++.dg/cpp0x/udlit-extended-id-4.C: New test.

Re: [PATCH v1] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-18 Thread juzhe.zh...@rivai.ai
   /* The RTL variable which stores the dynamic FRM value.  We always use this
  RTX to restore dynamic FRM rounding mode in mode switching.  */
   rtx dynamic_frm;
+
+  /* The boolean variables indicates there is at least one static rounding
+ mode instruction in the function or not.  */
+  bool static_frm_p;


Add a structure wrapper to wrap these 2 variable up.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-07-19 11:28
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support CALL for RVV floating-point dynamic rounding
From: Pan Li 
 
In basic dynamic rounding mode, we simply ignore call instructions and
we would like to take care of call in this PATCH.
 
During the call, the frm may be updated or keep as is. Thus, we must
make sure at least 2 things.
 
1. The static frm before call should not pollute the frm value in call.
2. The updated frm value in call should be sticky after call completed.
 
We will perfrom some steps to make above happen.
 
1. Mark call instruction with new mode DYN_CALL.
2. Mark the instruction after CALL from NONE to DYN.
3. When emit for a DYN_CALL, we will restore the frm value.
4. When emit from a DYN_CALL, we will backup the frm value.
 
Let's take a flow for this.
 
   +-+
   | Entry (DYN) | <- frrm a5
   +-+
  /   \
+---+ +---+
| VFADD | | VFADD RTZ |  <- fsrmi 1(RTZ)
+---+ +---+
  ||
+---+ +---+
| CALL  | | CALL  |  <- fsrm a5
+---+ +---+
  |   |
+---+ +---+
| SHIFT | <- frrm a5  | VFADD |  <- frrm a5
+---+ +---+
  |  /
+---+   /
| VFADD RUP | <- fsrm1 3(RUP)
+---+ /
   \ /
+-+
| Exit (DYN_EXIT) | <- fsrm a5
+-+
 
Please *NOTE* some corn cases like no instruction after a call is not
well handled, and will be coverred in another PATCH(s) soon.
 
Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (struct machine_function): Add new field
static_frm_p.
(riscv_emit_frm_mode_set): Add DYN_CALL emit.
(riscv_frm_mode_needed): New function for frm mode needed.
(riscv_mode_needed): Extrac function for frm.
(riscv_frm_mode_after): Add DYN_CALL after.
* config/riscv/vector.md (frm_mode): Add dyn_call.
(fsrmsi_restore_exit): Rename to _volatile.
(fsrmsi_restore_volatile): Likewise.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Fix
tests cases.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-4.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-5.c: New test.
---
gcc/config/riscv/riscv.cc | 73 +++-
gcc/config/riscv/vector.md|  4 +-
.../rvv/base/float-point-dynamic-frm-33.c | 31 +++
.../rvv/base/float-point-dynamic-frm-34.c | 32 +++
.../rvv/base/float-point-dynamic-frm-35.c | 32 +++
.../rvv/base/float-point-dynamic-frm-36.c | 29 +++
.../rvv/base/float-point-dynamic-frm-37.c | 36 
.../rvv/base/float-point-dynamic-frm-38.c | 34 
.../rvv/base/float-point-dynamic-frm-39.c | 36 
.../rvv/base/float-point-dynamic-frm-40.c | 34 
.../rvv/base/float-point-dynamic-frm-41.c | 37 +
.../rvv/base/float-point-dynamic-frm-42.c | 37 +
.../rvv/base/float-point-dynamic-frm-43.c | 38 +
.../rvv/base/float-point-dynamic-frm-44.c | 40 +
.../riscv/rvv/base/float-point-frm-insert-7.c |  5 +-
.../riscv/rvv/base/float-point-frm-run-4.c| 82 ++
.../riscv/rvv/base/float-point-frm-run-5.c| 83 +++
17 files changed, 655 insertions(+), 8 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c
create mode 100644 

[PATCH v1] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-18 Thread Pan Li via Gcc-patches
From: Pan Li 

In basic dynamic rounding mode, we simply ignore call instructions and
we would like to take care of call in this PATCH.

During the call, the frm may be updated or keep as is. Thus, we must
make sure at least 2 things.

1. The static frm before call should not pollute the frm value in call.
2. The updated frm value in call should be sticky after call completed.

We will perfrom some steps to make above happen.

1. Mark call instruction with new mode DYN_CALL.
2. Mark the instruction after CALL from NONE to DYN.
3. When emit for a DYN_CALL, we will restore the frm value.
4. When emit from a DYN_CALL, we will backup the frm value.

Let's take a flow for this.

   +-+
   | Entry (DYN) | <- frrm a5
   +-+
  /   \
+---+ +---+
| VFADD | | VFADD RTZ |  <- fsrmi 1(RTZ)
+---+ +---+
  ||
+---+ +---+
| CALL  | | CALL  |  <- fsrm a5
+---+ +---+
  |   |
+---+ +---+
| SHIFT | <- frrm a5  | VFADD |  <- frrm a5
+---+ +---+
  |  /
+---+   /
| VFADD RUP | <- fsrm1 3(RUP)
+---+ /
   \ /
+-+
| Exit (DYN_EXIT) | <- fsrm a5
+-+

Please *NOTE* some corn cases like no instruction after a call is not
well handled, and will be coverred in another PATCH(s) soon.

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv.cc (struct machine_function): Add new field
static_frm_p.
(riscv_emit_frm_mode_set): Add DYN_CALL emit.
(riscv_frm_mode_needed): New function for frm mode needed.
(riscv_mode_needed): Extrac function for frm.
(riscv_frm_mode_after): Add DYN_CALL after.
* config/riscv/vector.md (frm_mode): Add dyn_call.
(fsrmsi_restore_exit): Rename to _volatile.
(fsrmsi_restore_volatile): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Fix
tests cases.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-4.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-5.c: New test.
---
 gcc/config/riscv/riscv.cc | 73 +++-
 gcc/config/riscv/vector.md|  4 +-
 .../rvv/base/float-point-dynamic-frm-33.c | 31 +++
 .../rvv/base/float-point-dynamic-frm-34.c | 32 +++
 .../rvv/base/float-point-dynamic-frm-35.c | 32 +++
 .../rvv/base/float-point-dynamic-frm-36.c | 29 +++
 .../rvv/base/float-point-dynamic-frm-37.c | 36 
 .../rvv/base/float-point-dynamic-frm-38.c | 34 
 .../rvv/base/float-point-dynamic-frm-39.c | 36 
 .../rvv/base/float-point-dynamic-frm-40.c | 34 
 .../rvv/base/float-point-dynamic-frm-41.c | 37 +
 .../rvv/base/float-point-dynamic-frm-42.c | 37 +
 .../rvv/base/float-point-dynamic-frm-43.c | 38 +
 .../rvv/base/float-point-dynamic-frm-44.c | 40 +
 .../riscv/rvv/base/float-point-frm-insert-7.c |  5 +-
 .../riscv/rvv/base/float-point-frm-run-4.c| 82 ++
 .../riscv/rvv/base/float-point-frm-run-5.c| 83 +++
 17 files changed, 655 insertions(+), 8 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c
 create mode 100644 

[PATCH-1, combine] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-07-18 Thread HAO CHEN GUI via Gcc-patches
Hi,
  The shift mode will be widen in combine pass if the operand has a normal
subreg. But when the target already has rotate/mask/insert instructions on
the narrow mode, it's unnecessary to widen the mode for lshiftrt. As
the lshiftrt is commonly converted to rotate/mask insn, the widen mode
blocks it to be further combined to rotate/mask/insert insn. The PR93738
shows the case.

The lshiftrt:SI (subreg:SI (reg:DI)) is converted to
subreg:SI (lshiftrt:DI (reg:DI)) and fails to match rotate/mask pattern.

Trying 13, 10 -> 14:
   13: r127:SI=r125:SI&0xf0ff
  REG_DEAD r125:SI
   10: r124:SI=r129:DI#4 0>>0xc&0xf00
  REG_DEAD r129:DI
   14: r128:SI=r127:SI|r124:SI

Failed to match this instruction:
(set (reg:SI 128)
(ior:SI (and:SI (reg:SI 125 [+-2 ])
(const_int -3841 [0xf0ff]))
(and:SI (subreg:SI (zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 20 [0x14])) 4)
(const_int 3840 [0xf00]
Failed to match this instruction:
(set (reg:SI 128)
(ior:SI (and:SI (reg:SI 125 [+-2 ])
(const_int -3841 [0xf0ff]))
(and:SI (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
(const_int 12 [0xc]))
(const_int 4294967295 [0x])) 4)
(const_int 3840 [0xf00]

If not widen the shift mode, it can be combined to rotate/mask/insert insn
as expected.

Trying 13, 10 -> 14:
   13: r127:SI=r125:SI&0xf0ff
  REG_DEAD r125:SI
   10: r124:SI=r129:DI#4 0>>0xc&0xf00
  REG_DEAD r129:DI
   14: r128:SI=r127:SI|r124:SI
  REG_DEAD r127:SI
  REG_DEAD r124:SI
Successfully matched this instruction:
(set (reg:SI 128)
(ior:SI (and:SI (reg:SI 125 [+-2 ])
(const_int -3841 [0xf0ff]))
(and:SI (lshiftrt:SI (subreg:SI (reg:DI 129) 4)
(const_int 12 [0xc]))
(const_int 3840 [0xf00]


  This patch adds a target hook to indicate if rotate/mask instructions are
supported on certain mode. If it's true, widen lshiftrt mode is skipped
and shift is done on original mode.

  The patch fixes the regression of other rs6000 test cases. They're listed
in the second patch.

  The patch passed regression test on Power Linux and x86 platforms.

Thanks
Gui Haochen

ChangeLog
combine: Not winden shift mode when target has rotate/mask instruction on
original mode

To winden shift mode is unnecessary when target already has rotate/mask
instuctions on the original mode.  It might blocks the further combine
optimization on the original mode.  For instance, further combine the insns
to a rotate/mask/insert instruction on the original mode.

This patch adds a hook to indicate if a target supports rotate/mask
instructions on the certain mode.  If it returns true, the widen shift
mode will be skipped on lshiftrt.

gcc/
PR target/93738
* combine.cc (try_widen_shift_mode): Skip to widen mode for lshiftrt
when the target has rotate/mask instructions on original mode.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_HAVE_ROTATE_AND_MASK): Add.
* target.def (have_rotate_and_mask): New target hook.
* targhooks.cc (default_have_rotate_and_mask): New function.
* targhooks.h (default_have_rotate_and_mask): Declare.

patch.diff
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 304c020ec79..f22fe42931b 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10475,20 +10475,25 @@ try_widen_shift_mode (enum rtx_code code, rtx op, int 
count,
   return orig_mode;

 case LSHIFTRT:
-  /* Similarly here but with zero bits.  */
-  if (HWI_COMPUTABLE_MODE_P (mode)
- && (nonzero_bits (op, mode) & ~GET_MODE_MASK (orig_mode)) == 0)
-   return mode;
-
-  /* We can also widen if the bits brought in will be masked off.  This
-operation is performed in ORIG_MODE.  */
-  if (outer_code == AND)
+  /* Skip wider mode when the target has rotate and mask instructions on
+orig_mode.  */
+  if (!targetm.have_rotate_and_mask (orig_mode))
{
- int care_bits = low_bitmask_len (orig_mode, outer_const);
-
- if (care_bits >= 0
- && GET_MODE_PRECISION (orig_mode) - care_bits >= count)
+ /* Similarly here but with zero bits.  */
+ if (HWI_COMPUTABLE_MODE_P (mode)
+ && (nonzero_bits (op, mode) & ~GET_MODE_MASK (orig_mode)) == 0)
return mode;
+
+ /* We can also widen if the bits brought in will be masked off.
+This operation is performed in ORIG_MODE.  */
+ if (outer_code == AND)
+   {
+ int care_bits = low_bitmask_len (orig_mode, outer_const);
+
+ if (care_bits >= 0
+ && GET_MODE_PRECISION (orig_mode) - care_bits >= count)
+   return mode;
+   }
}
   /* fall through */

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi

Re: [PATCH] RISC-V: Fix testcase failed when default -mcmodel=medany

2023-07-18 Thread Lehua Ding
Hi Robin,

> In general I'm fine with this small change of course, I just wonder if
> the testcase is not brittle anyway. From what I can tell the respective
> change is independent of the actual number of registers so maybe it's enough 
> to
> not compare the fully body but just make sure the addis are not present?
> That way, the test could also work for -march=rv64 (which saves one
> register less anyway regardless of mcmodel - but the change still helps)
> or maybe even with instruction scheduling.  Would you mind checking this 
> still?

I think you are rigth, I would like to remove the `-mcmodel=medany` option and
relax assert from `__riscv_save/restore_4` to `__riscv_save/restore_(3|4)` to 
let
this testcase not brittle on any -mcmodel.  Then I'm also going to add another
testcase (I dont known how to run -march=rv32imafc and -march=rv64imafc on
the same testcase) that uses -march=rv64imafc.

Removing scheduling option will result in a change in the order of the assert
assembly, and I don't feel like removing it because the order may be different 
for
different microarchitectures.

Best,
Lehua

V2 patch:

gcc/testsuite/ChangeLog:

* gcc.target/riscv/stack_save_restore.c: Moved to...
* gcc.target/riscv/stack_save_restore_2.c: ...here.
* gcc.target/riscv/stack_save_restore_1.c: New test.

---
 .../gcc.target/riscv/stack_save_restore_1.c   | 40 +++
 ..._save_restore.c => stack_save_restore_2.c} |  6 +--
 2 files changed, 43 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
 rename gcc/testsuite/gcc.target/riscv/{stack_save_restore.c => 
stack_save_restore_2.c} (90%)

diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c 
b/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
new file mode 100644
index 000..255ce5f40c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imafc -mabi=lp64f -msave-restore -O2 
-fno-schedule-insns -fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops 
-fno-lto" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+char my_getchar();
+float getf();
+
+/*
+** bar:
+** callt0,__riscv_save_(3|4)
+** addisp,sp,-2032
+** ...
+** li  t0,-12288
+** add sp,sp,t0
+** ...
+** li  t0,12288
+** add sp,sp,t0
+** ...
+** addisp,sp,2032
+** tail__riscv_restore_(3|4)
+*/
+int bar()
+{
+  float volatile farray[3568];
+
+  float sum = 0;
+  float f1 = getf();
+  float f2 = getf();
+  float f3 = getf();
+  float f4 = getf();
+
+  for (int i = 0; i < 3568; i++)
+  {
+farray[i] = my_getchar() * 1.2;
+sum += farray[i];
+  }
+
+  return sum + f1 + f2 + f3 + f4;
+}
+
diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore.c 
b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
similarity index 90%
rename from gcc/testsuite/gcc.target/riscv/stack_save_restore.c
rename to gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
index 522e706cfbf..4ce5e0118a4 100644
--- a/gcc/testsuite/gcc.target/riscv/stack_save_restore.c
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
@@ -6,8 +6,8 @@ char my_getchar();
 float getf();
 
 /*
-**bar:
-** callt0,__riscv_save_4
+** bar:
+** callt0,__riscv_save_(3|4)
 ** addisp,sp,-2032
 ** ...
 ** li  t0,-12288
@@ -17,7 +17,7 @@ float getf();
 ** add sp,sp,t0
 ** ...
 ** addisp,sp,2032
-** tail__riscv_restore_4
+** tail__riscv_restore_(3|4)
 */
 int bar()
 {
-- 
2.36.3



Re: rs6000: Fix expected counts powerpc/p9-vec-length-full

2023-07-18 Thread Kewen.Lin via Gcc-patches
Hi Carl,

The issue was tracked by PR109971 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971)
and I think it had been resolved.

btw, when the expected insn count changes, it does expose some
issues but which can be either test or functionality issue, if
it's taken as a test issue, it needs some justification why
it changes like that and the change is expected.

BR,
Kewen

on 2023/7/18 23:39, Carl Love wrote:
> Ping
> 
> On Thu, 2023-06-01 at 16:11 -0700, Carl Love wrote:
>> GCC maintainers:
>>
>> The following patch updates the expected instruction counts in four
>> tests.  The counts in all of the tests changed with commit
>> f574e2dfae79055f16d0c63cc12df24815d8ead6.  
>>
>> The updated counts have been verified on both Power 9 and Power 10.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>>   Carl 
>>
>> 
>> rs6000: Fix expected counts powerpc/p9-vec-length-full tests
>>
>> The counts for instructions lxvl and stxvl in tests:
>>
>>   p9-vec-length-full-1.c
>>   p9-vec-length-full-2.c
>>   p9-vec-length-full-6.c
>>   p9-vec-length-full-7.c
>>
>> changed with commit:
>>
>>commit f574e2dfae79055f16d0c63cc12df24815d8ead6
>>Author: Ju-Zhe Zhong 
>>Date:   Thu May 25 22:42:35 2023 +0800
>>
>>  VECT: Add decrement IV iteration loop control by variable amount
>> support
>>
>>  This patch is supporting decrement IV by following the flow
>> designed by
>>  Richard:
>>...
>>
>> The expected counts for lxvl changed from 20 to 40 and the counts for
>> stxvl
>> changed from 10 to 20 in the first three tests.  The number of stxvl
>> instructions changed from 12 to 20 in p9-vec-length-full-7.c.  This
>> patch updates the number of expected instructions in the four tests.
>>
>> The counts have been verified on Power 9 and Power 10.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c | 4 ++--
>>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c | 4 ++--
>>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c | 2 +-
>>  4 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
>> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
>> index f01f1c54fa5..5e4f34421d3 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
>> @@ -12,5 +12,5 @@
>>  /* { dg-final { scan-assembler-not   {\mstxv\M} } } */
>>  /* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
>>  /* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
>> -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
>> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
>> +/* { dg-final { scan-assembler-times {\mlxvl\M} 40 } } */
>> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
>> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
>> index f546e97fa7d..c7d927382c3 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
>> @@ -12,5 +12,5 @@
>>  /* { dg-final { scan-assembler-not   {\mstxv\M} } } */
>>  /* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
>>  /* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
>> -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
>> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
>> +/* { dg-final { scan-assembler-times {\mlxvl\M} 40 } } */
>> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
>> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
>> index 65ddf2b098a..f3be3842c62 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
>> @@ -11,5 +11,5 @@
>>  /* It can use normal vector load for constant vector load.  */
>>  /* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
>>  /* 64bit/32bit pairs won't use partial vectors.  */
>> -/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
>> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
>> +/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
>> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
>> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
>> index e0e51d9a972..da086f1826a 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
>> @@ -12,4 +12,4 @@
>>
>>  /* Each type has one stxvl excepting for int8 and uint8, that have
>> two due to
>> rtl pass bbro duplicating the block which has one stxvl.  */
>> -/* { dg-final { scan-assembler-times {\mstxvl\M} 

[PATCH-2, rs6000] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-07-18 Thread HAO CHEN GUI via Gcc-patches
Hi,
  The patch relies on the fist patch. The reason of the change is also described
in the first patch. This patch implements the target hook have_rotate_and_mask.
It also modifies some test cases. The regression of rlwimi-2.c is fixed. For
rlwinm-0.c and rlwinm-2.c, one more 32bit rotate/mask instruction is generated
and one less 64bit rotate/mask instruction.

  The patch passed regression test on Power Linux platforms. Test shows the 
patch
has no performance regression on SPECint.

Thanks
Gui Haochen

ChangeLog
rs6000: implement target hook have_rotate_and_mask

gcc/
PR target/93738
* config/rs6000/rs6000.cc (TARGET_HAVE_ROTATE_AND_MASK): Define.
(rs6000_have_rotate_and_mask): New function.

gcc/testsuite/
PR target/93738
* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
rotate instuctions.
* gcc.target/powerpc/rlwinm-0.c: Likewise.
* gcc.target/powerpc/rlwinm-2.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 44b448d2ba6..98873afddb4 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1764,6 +1764,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_CONST_ANCHOR
 #define TARGET_CONST_ANCHOR 0x8000

+#undef TARGET_HAVE_ROTATE_AND_MASK
+#define TARGET_HAVE_ROTATE_AND_MASK rs6000_have_rotate_and_mask
+
 

 /* Processor table.  */
@@ -29097,6 +29100,17 @@ rs6000_opaque_type_invalid_use_p (gimple *stmt)
   return false;
 }

+bool
+rs6000_have_rotate_and_mask (machine_mode mode)
+{
+  gcc_assert (SCALAR_INT_MODE_P (mode));
+
+  if (mode == SImode || mode == DImode)
+return true;
+
+  return false;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 #include "gt-rs6000.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c 
b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
index bafa371db73..62344a95aa0 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
@@ -6,10 +6,9 @@
 /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } } 
*/
+/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 6728 { target lp64 } } } 
*/

-/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } 
} */
-/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } 
*/
+/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 } } */

 /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c 
b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
index 4f4fca2d8ef..b6b1b227c7e 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
@@ -7,10 +7,10 @@
 /* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } } } 
*/

 /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 } } 
} */
-/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } } } 
*/
+/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3094 { target lp64 } } } 
*/
 /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+srdi} 12 { target lp64 } } } */


 #define SL
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-2.c 
b/gcc/testsuite/gcc.target/powerpc/rlwinm-2.c
index bddcfe2b76f..0315ca91dd7 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwinm-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-2.c
@@ -7,9 +7,9 @@
 /* { dg-final { scan-assembler-times {(?n)^\s+rldic} 2726 { target lp64 } } } 
*/

 /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 833 { target ilp32 } } } 
*/
-/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 720 { target lp64 } } } 
*/
+/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 721 { target lp64 } } } 
*/
 /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+srdi} 12 { target lp64 } } } */

 /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 2518 } } */



Re: [PATCH, rs6000] Generate mfvsrwz for all platforms and remove redundant zero extend [PR106769]

2023-07-18 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/6/19 09:14, HAO CHEN GUI wrote:
> Hi,
>   This patch modifies vsx extract expander and generates mfvsrwz/stxsiwx
> for all platforms when the mode is V4SI and the index of extracted element
> is 1 for BE and 2 for LE. Also this patch adds a insn pattern for mfvsrwz
> which can help eliminate redundant zero extend.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Generate mfvsrwz for all platforms and remove redundant zero extend
> 
> mfvsrwz has lower latency than xxextractuw.  So it should be generated

Nice, it also has lower latency than vextuw[lr]x.

> even with p9 vector enabled if possible.  Also the instruction is
> already zero extended.  A combine pattern is needed to eliminate
> redundant zero extend instructions.
> 
> gcc/
>   PR target/106769
>   * config/rs6000/vsx.md (expand vsx_extract_): Skip calling
>   gen_vsx_extract__p9 when it can be implemented by
>   mfvsrwz/stxsiwx.
>   (*vsx_extract__di_p9): Not generate the insn when it can
>   be generated by mfvsrwz.
>   (mfvsrwz): New insn pattern.
>   (*vsx_extract_si): Rename to...
>   (vsx_extract_si): ..., remove redundant insn condition and
>   generate the insn on p9 when it can be implemented by
>   mfvsrwz/stxsiwx.  Add a dup alternative for simple vector moving.
>   Remove reload_completed from split condition as it's unnecessary.
>   Remove unnecessary checking from preparation statements.  Set
>   type and length attributes for each alternative.
> 
> gcc/testsuite/
>   PR target/106769
>   * gcc.target/powerpc/pr106769.h: New.
>   * gcc.target/powerpc/pr106769-p8.c: New.
>   * gcc.target/powerpc/pr106769-p9.c: New.
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0a34ceebeb5..09b0f83db86 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3728,7 +3728,9 @@ (define_expand  "vsx_extract_"

I noticed that we special case VSX_D for vsx_extract_, then it
has two special define_insn for special operand 2 (0/1) like:

  "define_insn "*vsx_extract__0" and "..._1".

I wonder if we can do a similar thing to special case VSX_W, it seems
more clear?

>"VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
>  {
>/* If we have ISA 3.0, we can do a xxextractuw/vextractu{b,h}.  */
> -  if (TARGET_P9_VECTOR)
> +  if (TARGET_P9_VECTOR
> +  && (mode != V4SImode
> +   || INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2)))
>  {
>emit_insn (gen_vsx_extract__p9 (operands[0], operands[1],
>   operands[2]));
> @@ -3798,7 +3800,9 @@ (define_insn_and_split "*vsx_extract__di_p9"
> (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,")
> (parallel [(match_operand:QI 2 "const_int_operand" "n,n")]
> (clobber (match_scratch:SI 3 "=r,X"))]
> -  "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
> +  "TARGET_VEXTRACTUB
> +   && (mode != V4SImode
> +   || INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))"
>"#"
>"&& reload_completed"
>[(parallel [(set (match_dup 4)
> @@ -3830,58 +3834,67 @@ (define_insn_and_split "*vsx_extract__store_p9"
> (set (match_dup 0)
>   (match_dup 3))])
> 
> -(define_insn_and_split  "*vsx_extract_si"
> -  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z")
> +(define_insn "mfvsrwz"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI
> +   (vec_select:SI
> + (match_operand:V4SI 1 "vsx_register_operand" "wa")
> + (parallel [(match_operand:QI 2 "const_int_operand" "n")]
> +   (clobber (match_scratch:V4SI 3 "=v"))]
> +  "TARGET_DIRECT_MOVE_64BIT
> +   && INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 2)"
> +  "mfvsrwz %0,%x1"
> +  [(set_attr "type" "mfvsr")
> +   (set_attr "isa" "p8v")])
> +
> +(define_insn_and_split  "vsx_extract_si"
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z,wa")
>   (vec_select:SI
> -  (match_operand:V4SI 1 "gpc_reg_operand" "v,v,v")
> -  (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n")])))
> -   (clobber (match_scratch:V4SI 3 "=v,v,v"))]
> -  "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT && 
> !TARGET_P9_VECTOR"
> -  "#"
> -  "&& reload_completed"
> +  (match_operand:V4SI 1 "gpc_reg_operand" "v,v,v,0")
> +  (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
> +   (clobber (match_scratch:V4SI 3 "=v,v,v,v"))]
> +  "TARGET_DIRECT_MOVE_64BIT
> +   && (!TARGET_P9_VECTOR || INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 
> 2))"
> +{
> +   if (which_alternative == 0)
> + return "mfvsrwz %0,%x1";
> +
> +   if (which_alternative == 1)
> + return "xxlor %x0,%x1,%x1";
> +
> +   if (which_alternative == 2)
> + return "stxsiwx %x1,%y0";
> +
> +   return ASM_COMMENT_START " vec_extract to same register";
> +}
> +  "&& INTVAL (operands[2]) != 

[Bug target/106460] internal compiler error: output_operand: invalid expression as operand on -O1

2023-07-18 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106460

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Jiu Fu Guo  ---
Fixed in trunk.

[Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-07-18 Thread hliu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625

--- Comment #6 from Hao Liu  ---
Thanks for the confirmation about the reduction latency.  I'll create a simple
patch to fix this.

> Discounting the loads, we do have 15 general operations.

That's true, and there are indeed 8 general operations for scalar loop.  As the
count_ops() is accurate, it seems maybe the Cost of Vector Body is too large
(Vector inside of loop cost: 51):

*k_48 4 times vec_perm costs 12 in body
*k_48 1 times unaligned_load (misalign -1) costs 4 in body
_5->m1 1 times vec_perm costs 3 in body
_5->m4 1 times unaligned_load (misalign -1) costs 4 in body
(int) _24 2 times vec_promote_demote costs 4 in body
(double) _25 4 times vec_promote_demote costs 8 in body
_2 * _26 4 times vector_stmt costs 8 in body

If it is small enough, even the vect-body cost is increased according to the
issue-info, SLP is still profitable.  I'm not quite familiar with this part and
it may affect all aarch64 targets, so I think it's hard to fix by me.  It would
be great if you will look at how to fix this.

Re: Enum processor_features adding on i386 backend

2023-07-18 Thread Andrew Pinski via Gcc
On Tue, Jul 18, 2023 at 7:50 PM Jiang, Haochen via Gcc  wrote:
>
> Hi all,
>
> As you all know that we are continuously working on new ISA implementation 
> for i386 backend.
>
> There is one thing that I am really curious about when I read the code.
>
> In gcc/config/i386/i386-cpuinfo.h, we have such comment:
>
> /* ISA Features supported. New features have to be inserted at the end.  */
>
> Why new features have to be inserted at the end? I did a quick investigation 
> and found that it was
> added at this mailing thread originally at libgcc/config/i386/cpuinfo.c:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2015-September/428915.html
>
> In the thread, it seems that it might cause unwanted ABI change. Could anyone 
> kindly tell me something
> more about that? Should this rule still be kept for now after about eight 
> years since then?

YES we should keep it.
___builtin_cpu_supports uses those #s so the ABI is exposed to the
object level even if those #s are not exposed to the user directly
those are still exposed as an ABI that needs to be expandable that
way.

Thanks,
Andrew


>
> Thx,
> Haochen
>


Enum processor_features adding on i386 backend

2023-07-18 Thread Jiang, Haochen via Gcc
Hi all,

As you all know that we are continuously working on new ISA implementation for 
i386 backend.

There is one thing that I am really curious about when I read the code.

In gcc/config/i386/i386-cpuinfo.h, we have such comment:

/* ISA Features supported. New features have to be inserted at the end.  */

Why new features have to be inserted at the end? I did a quick investigation 
and found that it was
added at this mailing thread originally at libgcc/config/i386/cpuinfo.c:

https://gcc.gnu.org/pipermail/gcc-patches/2015-September/428915.html

In the thread, it seems that it might cause unwanted ABI change. Could anyone 
kindly tell me something
more about that? Should this rule still be kept for now after about eight years 
since then?

Thx,
Haochen



[Bug c++/102724] ICE in genericize_spaceship, at cp/method.c:1089

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102724

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-07-19

--- Comment #2 from Andrew Pinski  ---
Confirmed.

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||patch
URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2023-July/62
   ||4847.html

--- Comment #9 from Andrew Pinski  ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624847.html

[PATCH] Fix PR110726: a | (a == b) can sometimes produce wrong code

2023-07-18 Thread Andrew Pinski via Gcc-patches
So I had missed/forgot that EQ_EXPR could have an non boolean
type for generic when I implemented r14-2556-g0407ae8a7732d9.
This patch adds check for one bit precision intergal type
which fixes the problem.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/110726

gcc/ChangeLog:

* match.pd ((a|b)&(a==b),a|(a==b),(a)|(a==b)):
Add checks to make sure the type was one bit precision
intergal type.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/bitops-1.c: New test.
---
 gcc/match.pd  | 12 +--
 .../gcc.c-torture/execute/bitops-1.c  | 33 +++
 2 files changed, 42 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/bitops-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 054e6585876..4dfe92623f7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1229,7 +1229,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* (a | b) & (a == b)  -->  a & b (boolean version of the above).  */
 (simplify
  (bit_and:c (bit_ior @0 @1) (nop_convert? (eq:c @0 @1)))
- (bit_and @0 @1))
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && TYPE_PRECISION (TREE_TYPE (@0)) == 1)
+  (bit_and @0 @1)))
 
 /* a | ~(a ^ b)  -->  a | ~b  */
 (simplify
@@ -1239,7 +1241,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* a | (a == b)  -->  a | (b^1) (boolean version of the above). */
 (simplify
  (bit_ior:c @0 (nop_convert? (eq:c @0 @1)))
- (bit_ior @0 (bit_xor @1 { build_one_cst (type); })))
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && TYPE_PRECISION (TREE_TYPE (@0)) == 1)
+  (bit_ior @0 (bit_xor @1 { build_one_cst (type); }
 
 /* (a | b) | (a &^ b)  -->  a | b  */
 (for op (bit_and bit_xor)
@@ -1255,7 +1259,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* (a & b) | (a == b)  -->  a == b  */
 (simplify
  (bit_ior:c (bit_and:c @0 @1) (nop_convert?@2 (eq @0 @1)))
- @2)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && TYPE_PRECISION (TREE_TYPE (@0)) == 1)
+  @2))
 
 /* ~(~a & b)  -->  a | ~b  */
 (simplify
diff --git a/gcc/testsuite/gcc.c-torture/execute/bitops-1.c 
b/gcc/testsuite/gcc.c-torture/execute/bitops-1.c
new file mode 100644
index 000..cfaa6b9fd26
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/bitops-1.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/110726 */
+
+#define DECLS(n,VOL)   \
+__attribute__((noinline,noclone))  \
+int h##n(VOL int A, VOL int B){\
+return (A | B) & (A == B); \
+}  \
+__attribute__((noinline,noclone))  \
+int i##n(VOL int A, VOL int B){\
+return A | (A == B);   \
+}  \
+__attribute__((noinline,noclone))  \
+int k##n(VOL int A, VOL int B){\
+return (A & B) | (A == B); \
+}  \
+
+DECLS(0,)
+DECLS(1,volatile)
+
+int values[] = { 0, 1, 2, 3, -1, -2, -3, 0x10080 };
+int numvalues = sizeof(values)/sizeof(values[0]);
+
+int main(){
+for(int A = 0; A < numvalues; A++)
+  for(int B = 0; B < numvalues; B++)
+   {
+ int a = values[A];
+ int b = values[B];
+ if (h0 (a, b) != h1 (a, b)) __builtin_abort();
+ if (i0 (a, b) != i1 (a, b)) __builtin_abort();
+ if (k0 (a, b) != k1 (a, b)) __builtin_abort();
+   }
+}
-- 
2.31.1



[Bug target/109504] [12/13/14 Regression] Compilation fails with pragma GCC target sse4.1 and immintrin.h

2023-07-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109504

--- Comment #9 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:9a19fa8b616f83474c35cc5b34a3865073ced829

commit r14-2628-g9a19fa8b616f83474c35cc5b34a3865073ced829
Author: liuhongt 
Date:   Tue Apr 18 14:53:04 2023 +0800

Support type _Float16/__bf16 independent of SSE2.

Enable _Float16 and __bf16 all the time but issue errors when the
types are used in conversion, unary operation, binary operation,
parameter passing or value return when TARGET_SSE2 is not available.

Also undef macros which are used by libgcc/libstdc++ to check the
backend support of the _Float16/__bf16 types when TARGET_SSE2 is not
available.

gcc/ChangeLog:

PR target/109504
* config/i386/i386-builtins.cc
(ix86_register_float16_builtin_type): Remove TARGET_SSE2.
(ix86_register_bf16_builtin_type): Ditto.
* config/i386/i386-c.cc (ix86_target_macros): When TARGET_SSE2
isn't available, undef the macros which are used to check the
backend support of the _Float16/__bf16 types when building
libstdc++ and libgcc.
* config/i386/i386.cc (construct_container): Issue errors for
HFmode/BFmode when TARGET_SSE2 is not available.
(function_value_32): Ditto.
(ix86_scalar_mode_supported_p): Remove TARGET_SSE2 for
HFmode/BFmode.
(ix86_libgcc_floating_mode_supported_p): Ditto.
(ix86_emit_support_tinfos): Adjust codes.
(ix86_invalid_conversion): Return diagnostic message string
when there's conversion from/to BF/HFmode w/o TARGET_SSE2.
(ix86_invalid_unary_op): New function.
(ix86_invalid_binary_op): Ditto.
(TARGET_INVALID_UNARY_OP): Define.
(TARGET_INVALID_BINARY_OP): Define.
* config/i386/immintrin.h [__SSE2__]: Remove for fp16/bf16
related instrinsics header files.
* config/i386/i386.h (VALID_SSE2_TYPE_MODE): New macro.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr109504.c: New test.
* gcc.target/i386/sse2-bfloat16-1.c: Adjust error info.
* gcc.target/i386/sse2-float16-1.c: Ditto.
* gcc.target/i386/sse2-float16-4.c: New test.
* gcc.target/i386/sse2-float16-5.c: New test.
* g++.target/i386/float16-1.C: Adjust error info.

libgcc/ChangeLog:

* config/i386/t-softfp: Add -msse2 to libbid HFtype related
files.

Re: [PATCH v2 0/8] Add Loongson SX/ASX instruction support to LoongArch target.

2023-07-18 Thread PanChenghui
Got it, I will fix the commit info in next version.

I haven't test GCC with these flags before, so I will try to build and
run regression test with BOOT_CFLAGS later. 

On Tue, 2023-07-18 at 20:26 +0800, Xi Ruoyao wrote:
> On Tue, 2023-07-18 at 19:06 +0800, Chenghui Pan wrote:
> > Lulu Cheng (8):
> >   LoongArch: Added Loongson SX vector directive compilation
> > framework.
> >   LoongArch: Added Loongson SX base instruction support.
> >   LoongArch: Added Loongson SX directive builtin function support.
> >   LoongArch: Added Loongson ASX vector directive compilation
> > framework.
> >   LoongArch: Added Loongson ASX base instruction support.
> >   LoongArch: Added Loongson ASX directive builtin function support.
> 
> Let's always use "Add".
> 
> >   LoongArch: Add Loongson SX directive test cases.
> >   LoongArch: Add Loongson ASX directive test cases.
> 
> Have you tested this series by bootstrapping and regtesting GCC with
> BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx" and
> BOOT_CFLAGS="-O3 -mlasx"?  This may catch some mistakes early.
> 
> And I'll rebuild the entire system with these GCC patches and -mlasx
> in
> Aug (after Glibc-2.38 release) as a field test too.
> 



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Ben Boeckel via Gcc
On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote:
> On 6/25/23 12:36, Ben Boeckel wrote:
> > On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
> >> On 6/22/23 22:45, Ben Boeckel wrote:
> >>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
>  On 1/25/23 16:06, Ben Boeckel wrote:
> > They affect the build, so report them via `-MF` mechanisms.
> 
>  Why isn't this covered by the existing code in preprocessed_module?
> >>>
> >>> It appears as though it is neutered in patch 3 where
> >>> `write_make_modules_deps` is used in `make_write` (or will use that name
> >>
> >> Why do you want to record the transitive modules? I would expect just 
> >> noting the
> >> ones with imports directly in the TU would suffice (i.e check the 
> >> 'outermost' arg)
> > 
> > FWIW, only GCC has "fat" modules. MSVC and Clang both require the
> > transitive closure to be passed. The idea there is to minimize the size
> > of individual module files.
> > 
> > If GCC only reads the "fat" modules, then only those should be recorded.
> > If it reads other modules, they should be recorded as well.

For clarification, given:

* a.cppm
```
export module a;
```

* b.cppm
```
export module b;
import a;
```

* use.cppm
```
import b;
```

in a "fat" module setup, `use.cppm` only needs to be told about
`b.cmi` because it contains everything that an importer needs to know
about the `a` module (reachable types, re-exported bits, whatever). With
the "thin" modules, `a.cmi` must be specified when compiling `use.cppm`
to satisfy anything that may be required transitively (e.g., a return
type of some `b` symbol is from `a`). MSVC and Clang (17+) use this
model. I don't know MSVC's rationale, but Clang's is to make CMIs
relocatable by not embedding paths to dependent modules in CMIs. This
should help caching and network transfer sizes in distributed builds.
LLVM/Clang discussion:


https://discourse.llvm.org/t/c-20-modules-should-the-bmis-contain-paths-to-their-dependent-bmis/70422
https://github.com/llvm/llvm-project/issues/62707

Maybe I'm missing how this *actually* works in GCC as I've really only
interacted with it through the command line, but I've not needed to
mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
read through some embedded information in `b.cmi` or does `b.cmi`
include enough information to not need to read it at all? If the former,
distributed builds are going to have a problem knowing what files to
send just from the command line (I'll call this "implicit thin"). If the
latter, that is the "fat" CMI that I'm thinking of.

> But wouldn't the transitive modules be dependencies of the direct 
> imports, so (re)building the direct imports would first require building 
> the transitive modules anyway?  Expressing the transitive closure of 
> dependencies for each importer seems redundant when it can be easily 
> derived from the direct dependencies of each module.

I'm not concerned whether it is transitive or not, really. If a file is
read, it should be reported here regardless of the reason. Note that
caching mechanisms may skip actually *doing* the reading, but the
dependencies should still be reported from the cached results as-if the
real work had been performed.

--Ben


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Ben Boeckel via Gcc-patches
On Tue, Jul 18, 2023 at 16:52:44 -0400, Jason Merrill wrote:
> On 6/25/23 12:36, Ben Boeckel wrote:
> > On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:
> >> On 6/22/23 22:45, Ben Boeckel wrote:
> >>> On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:
>  On 1/25/23 16:06, Ben Boeckel wrote:
> > They affect the build, so report them via `-MF` mechanisms.
> 
>  Why isn't this covered by the existing code in preprocessed_module?
> >>>
> >>> It appears as though it is neutered in patch 3 where
> >>> `write_make_modules_deps` is used in `make_write` (or will use that name
> >>
> >> Why do you want to record the transitive modules? I would expect just 
> >> noting the
> >> ones with imports directly in the TU would suffice (i.e check the 
> >> 'outermost' arg)
> > 
> > FWIW, only GCC has "fat" modules. MSVC and Clang both require the
> > transitive closure to be passed. The idea there is to minimize the size
> > of individual module files.
> > 
> > If GCC only reads the "fat" modules, then only those should be recorded.
> > If it reads other modules, they should be recorded as well.

For clarification, given:

* a.cppm
```
export module a;
```

* b.cppm
```
export module b;
import a;
```

* use.cppm
```
import b;
```

in a "fat" module setup, `use.cppm` only needs to be told about
`b.cmi` because it contains everything that an importer needs to know
about the `a` module (reachable types, re-exported bits, whatever). With
the "thin" modules, `a.cmi` must be specified when compiling `use.cppm`
to satisfy anything that may be required transitively (e.g., a return
type of some `b` symbol is from `a`). MSVC and Clang (17+) use this
model. I don't know MSVC's rationale, but Clang's is to make CMIs
relocatable by not embedding paths to dependent modules in CMIs. This
should help caching and network transfer sizes in distributed builds.
LLVM/Clang discussion:


https://discourse.llvm.org/t/c-20-modules-should-the-bmis-contain-paths-to-their-dependent-bmis/70422
https://github.com/llvm/llvm-project/issues/62707

Maybe I'm missing how this *actually* works in GCC as I've really only
interacted with it through the command line, but I've not needed to
mention `a.cmi` when compiling `use.cppm`. Is `a.cmi` referenced and
read through some embedded information in `b.cmi` or does `b.cmi`
include enough information to not need to read it at all? If the former,
distributed builds are going to have a problem knowing what files to
send just from the command line (I'll call this "implicit thin"). If the
latter, that is the "fat" CMI that I'm thinking of.

> But wouldn't the transitive modules be dependencies of the direct 
> imports, so (re)building the direct imports would first require building 
> the transitive modules anyway?  Expressing the transitive closure of 
> dependencies for each importer seems redundant when it can be easily 
> derived from the direct dependencies of each module.

I'm not concerned whether it is transitive or not, really. If a file is
read, it should be reported here regardless of the reason. Note that
caching mechanisms may skip actually *doing* the reading, but the
dependencies should still be reported from the cached results as-if the
real work had been performed.

--Ben


[PATCH 8/8] c++, lto: Use TYPE_REF_IS_LVALUE

2023-07-18 Thread Ken Matsui via Gcc-patches
gcc/cp/ChangeLog:

* decl.cc (copy_fn_p): Use TYPE_REF_IS_LVALUE.
* init.cc (maybe_warn_list_ctor): Likewise.
* method.cc (early_check_defaulted_comparison): Likewise.
* pt.cc (maybe_adjust_types_for_deduction): Likewise.
(invalid_nontype_parm_type_p): Likewise.
* tree.cc (structural_type_p): Likewise.

gcc/lto/ChangeLog:

* lto-common.cc (lto_fixup_prevailing_type): Use
TYPE_REF_IS_LVALUE.

Signed-off-by: Ken Matsui 
---
 gcc/cp/decl.cc| 3 +--
 gcc/cp/init.cc| 3 +--
 gcc/cp/method.cc  | 3 +--
 gcc/cp/pt.cc  | 6 ++
 gcc/cp/tree.cc| 2 +-
 gcc/lto/lto-common.cc | 2 +-
 6 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index c08056ca3d2..5e84f6a6f78 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -15250,8 +15250,7 @@ copy_fn_p (const_tree d)
   /* Pass by value copy assignment operator.  */
   result = -1;
 }
-  else if (TYPE_REF_P (arg_type)
-  && !TYPE_REF_IS_RVALUE (arg_type)
+  else if (TYPE_REF_IS_LVALUE (arg_type)
   && TYPE_MAIN_VARIANT (TREE_TYPE (arg_type)) == DECL_CONTEXT (d))
 {
   if (CP_TYPE_CONST_P (TREE_TYPE (arg_type)))
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 01eb4b55889..8006576bfd6 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -778,8 +778,7 @@ maybe_warn_list_ctor (tree member, tree init)
   tree initlist = non_reference (parm);
 
   /* Do not warn if the parameter is an lvalue reference to non-const.  */
-  if (TYPE_REF_P (parm) && !TYPE_REF_IS_RVALUE (parm)
-  && !CP_TYPE_CONST_P (initlist))
+  if (TYPE_REF_IS_LVALUE (parm) && !CP_TYPE_CONST_P (initlist))
 return;
 
   tree targs = CLASSTYPE_TI_ARGS (initlist);
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 524efc4e260..59be43d12c6 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1207,8 +1207,7 @@ early_check_defaulted_comparison (tree fn)
   tree parmtype = TREE_VALUE (parmnode);
   if (CLASS_TYPE_P (parmtype))
saw_byval = true;
-  else if (TYPE_REF_P (parmtype)
-  && !TYPE_REF_IS_RVALUE (parmtype)
+  else if (TYPE_REF_IS_LVALUE (parmtype)
   && TYPE_QUALS (TREE_TYPE (parmtype)) == TYPE_QUAL_CONST)
{
  saw_byref = true;
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index cb82e1b268b..56207411a7d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -22772,8 +22772,7 @@ maybe_adjust_types_for_deduction (tree tparms,
 too, but here handle it by stripping the reference from PARM
 rather than by adding it to ARG.  */
   if (forwarding_reference_p (*parm, TPARMS_PRIMARY_TEMPLATE (tparms))
- && TYPE_REF_P (*arg)
- && !TYPE_REF_IS_RVALUE (*arg))
+ && TYPE_REF_IS_LVALUE (*arg))
*parm = TREE_TYPE (*parm);
   /* Nothing else to do in this case.  */
   return 0;
@@ -27595,8 +27594,7 @@ invalid_nontype_parm_type_p (tree type, tsubst_flags_t 
complain)
 return false;
   else if (TYPE_PTR_P (type))
 return false;
-  else if (TYPE_REF_P (type)
-  && !TYPE_REF_IS_RVALUE (type))
+  else if (TYPE_REF_IS_LVALUE (type))
 return false;
   else if (TYPE_PTRMEM_P (type))
 return false;
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 799183dc646..1ba7539b497 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -4928,7 +4928,7 @@ structural_type_p (tree t, bool explain)
   if (SCALAR_TYPE_P (t))
 return true;
   /* an lvalue reference type, or */
-  if (TYPE_REF_P (t) && !TYPE_REF_IS_RVALUE (t))
+  if (TYPE_REF_IS_LVALUE (t))
 return true;
   /* a literal class type with the following properties:
  - all base classes and non-static data members are public and non-mutable
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 4b748ced87f..6321a70d04a 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -998,7 +998,7 @@ lto_fixup_prevailing_type (tree t)
  TYPE_NEXT_PTR_TO (t) = TYPE_POINTER_TO (TREE_TYPE (t));
  TYPE_POINTER_TO (TREE_TYPE (t)) = t;
}
-  else if (TYPE_REF_P (t) && !TYPE_REF_IS_RVALUE (t))
+  else if (TYPE_REF_IS_LVALUE (t))
{
  TYPE_NEXT_REF_TO (t) = TYPE_REFERENCE_TO (TREE_TYPE (t));
  TYPE_REFERENCE_TO (TREE_TYPE (t)) = t;
-- 
2.41.0



[PATCH 7/8] tree: Define TYPE_REF_IS_LVALUE

2023-07-18 Thread Ken Matsui via Gcc-patches
This patch defines TYPE_REF_IS_LVALUE to determine if a type is a C++
lvalue reference.

gcc/ChangeLog:

* tree.h (TYPE_REF_IS_LVALUE): Define.

Signed-off-by: Ken Matsui 
---
 gcc/tree.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree.h b/gcc/tree.h
index 347e676e737..0b2cb894241 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1054,6 +1054,10 @@ extern void omp_clause_range_check_failed (const_tree, 
const char *, int,
 #define TYPE_REF_IS_RVALUE(NODE) \
   (REFERENCE_TYPE_CHECK (NODE)->base.private_flag)
 
+/* True if reference type NODE is a C++ lvalue reference.  */
+#define TYPE_REF_IS_LVALUE(NODE) \
+  (TYPE_REF_P (NODE) && !TYPE_REF_IS_RVALUE (NODE))
+
 /* Nonzero in a _DECL if the use of the name is defined as a
deprecated feature by __attribute__((deprecated)).  */
 #define TREE_DEPRECATED(NODE) \
-- 
2.41.0



[PATCH 6/8] tree: Remove POINTER_TYPE_P

2023-07-18 Thread Ken Matsui via Gcc-patches
Since POINTER_TYPE_P was completely replaced by INDIRECT_TYPE_P, it can
be deleted.

gcc/ChangeLog:

* tree.h (POINTER_TYPE_P): Remove.

Signed-off-by: Ken Matsui 
---
 gcc/tree.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/gcc/tree.h b/gcc/tree.h
index d548dce63f7..347e676e737 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -691,13 +691,6 @@ extern void omp_clause_range_check_failed (const_tree, 
const char *, int,
 #define INDIRECT_TYPE_P(NODE)  \
   (TYPE_PTR_P (NODE) || TYPE_REF_P (NODE))
 
-/* Nonzero if TYPE represents a pointer or reference type.
-   (It should be renamed to INDIRECT_TYPE_P.)  Keep these checks in
-   ascending code order.  */
-
-#define POINTER_TYPE_P(TYPE) \
-  (TREE_CODE (TYPE) == POINTER_TYPE || TYPE_REF_P (TYPE))
-
 /* Nonzero if TYPE represents a pointer to function.  */
 #define FUNCTION_POINTER_TYPE_P(TYPE) \
   (INDIRECT_TYPE_P (TYPE) && TREE_CODE (TREE_TYPE (TYPE)) == FUNCTION_TYPE)
-- 
2.41.0



Re: [PATCH v3] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-18 Thread Vineet Gupta

Hi Manolis,

On 7/18/23 11:01, Jeff Law via Gcc-patches wrote:
Vineet @ Rivos has indicated he stumbled across an ICE with the V3 
code.  Hopefully he'll get a testcase for that extracted shortly. 


Yeah, I was trying to build SPEC2017 with this patch and ran into ICE 
for several of them with -Ofast build: The reduced test from 455.nab is 
attached here.

The issue happens with v2 as well, so not something introduced by v3.

There's ICE in cprop_hardreg which immediately follows f-m-o.


The protagonist is ins 93 which starts off in combine as a simple set of 
a DF 0.


| sff.i.288r.combine:(insn 93 337 94 8 (set (reg/v:DF 236 [ e ])
| sff.i.288r.combine- (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 
190 {*movdf_hardfloat_rv64}


Subsequently reload transforms it into SP + offset

| sff.i.303r.reload:(insn 93 337 94 9 (set (mem/c:DF (plus:DI (reg/f:DI 
2 sp)

| sff.i.303r.reload- (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
| sff.i.303r.reload- (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190 
{*movdf_hardfloat_rv64}

| sff.i.303r.reload- (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])

It gets processed by f-m-o and lands in cprop_hardreg, where it triggers 
ICE.


| (insn 93 337 523 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
|     (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
|     (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 -1
^^^
|      (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
|    (nil)))
| during RTL pass: cprop_hardreg

Here's my analysis:

f-m-o: do_check_validity() -> insn_invalid_p() tries to recog() a 
modified version of insn 93 (actually there is no change, so perhaps 
something we can optimize later). The corresponding md pattern 
movdf_hardfloat_rv64 no longer matches since it expects REG_P for 
operand0, while reload has converted it into SP + offset. f-m-o then 
does the right thing by invalidating INSN_CODE=-1 for a subsequent 
recog() to work correctly.
But it seems this -1 lingers into the next pass, and trips up 
copyprop_hardreg_forward_1() -> extract_constrain_insn()

So I don't know what the right fix here should be.

In a run with -fno-fold-mem-offsets, the same insn 93 is successfully 
grok'ed by cprop_hardreg,


| (insn 93 337 522 11 (set (mem/c:DF (plus:DI (reg/f:DI 2 sp)
|    (const_int 8 [0x8])) [4 %sfp+-8 S8 A64])
|    (const_double:DF 0.0 [0x0.0p+0])) "sff.i":23:11 190 
{*movdf_hardfloat_rv64}

^^^
| (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
|    (nil)))

P.S. I wonder if it is a good idea in general to call recog() post 
reload since the insn could be changed sufficiently to no longer match 
the md patterns. Of course I don't know the answer.


P.S.2 When debugging code, I noticed a minor annoyance in the patch with 
the whole fold_mem_offsets_driver() switch-case indirection. It doesn't 
seem to be serving any purpose, and we could simply call corresponding 
do_* routines in execute () itself.


Thx,
-Vineeta, c, d, g, h, i, j, k, m, p, q, r, aa, t, u, x;
double *f, *s;
double l, n, o, v, w;
b() {
  double e, ad, ae, af, ag, ah, ai, aj, am, an, ap, aq, ar, as, at, av, aw;
  for (; q; q = aa) {
r = f[x];
ah = f[r + 2] - g;
af = f[0] - f[r];
ag = 1 - f[r + 1];
av = ae * af * ah * ai;
aj = h - w * p * ah;
am = o + av * af;
an = j * o * av * ag;
ap = (am - m) * ad * (k - an - n) * a - v * c;
ar = (aj - l) * c;
if (a)
  ;
else
az:
  switch (d) {
  case 1:
e = aw * i;
break;
  case 2:
exit(1);
  }
s[0] = ap;
t += at * aq;
u = as += at * ar;
if (c)
  goto az;
  }
  return e;
}


[PATCH 4/8] c++, tree: Move INDIRECT_TYPE_P to tree.h

2023-07-18 Thread Ken Matsui via Gcc-patches
This patch moves INDIRECT_TYPE_P from cp/cp-tree.h to tree.h to unify
POINTER_TYPE_P in tree.h to INDIRECT_TYPE_P, which are equivalent.

gcc/cp/ChangeLog:

* cp-tree.h (INDIRECT_TYPE_P): Remove.

gcc/ChangeLog:

* tree.h (INDIRECT_TYPE_P): Define.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-tree.h | 4 
 gcc/tree.h   | 5 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 681aa95e57f..5236b168ecc 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4706,10 +4706,6 @@ get_vec_init_expr (tree t)
 #define TYPE_PTRDATAMEM_P(NODE)\
   (TREE_CODE (NODE) == OFFSET_TYPE)
 
-/* Returns true if NODE is a pointer or a reference.  */
-#define INDIRECT_TYPE_P(NODE)  \
-  (TYPE_PTR_P (NODE) || TYPE_REF_P (NODE))
-
 /* Returns true if NODE is an object type:
 
  [basic.types]
diff --git a/gcc/tree.h b/gcc/tree.h
index 48d57764d9c..c369b8470ab 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -686,6 +686,11 @@ extern void omp_clause_range_check_failed (const_tree, 
const char *, int,
 #define TYPE_REF_P(NODE)   \
   (TREE_CODE (NODE) == REFERENCE_TYPE)
 
+/* Nonzero if TYPE represents a pointer or reference type.
+   Keep these checks in ascending code order.  */
+#define INDIRECT_TYPE_P(NODE)  \
+  (TYPE_PTR_P (NODE) || TYPE_REF_P (NODE))
+
 /* Nonzero if TYPE represents a pointer or reference type.
(It should be renamed to INDIRECT_TYPE_P.)  Keep these checks in
ascending code order.  */
-- 
2.41.0



[PATCH 3/8] c++, tree: Move TYPE_PTR_P to tree.h

2023-07-18 Thread Ken Matsui via Gcc-patches
This patch moves TYPE_PTR_P from cp/cp-tree.h to tree.h to unify
POINTER_TYPE_P in tree.h to INDIRECT_TYPE_P in cp/cp-tree.h, which are
equivalent.

gcc/cp/ChangeLog:

* cp-tree.h (TYPE_PTR_P): Remove.

gcc/ChangeLog:

* tree.h (TYPE_PTR_P): Define.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-tree.h | 4 
 gcc/tree.h   | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8c96d868650..681aa95e57f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4706,10 +4706,6 @@ get_vec_init_expr (tree t)
 #define TYPE_PTRDATAMEM_P(NODE)\
   (TREE_CODE (NODE) == OFFSET_TYPE)
 
-/* Returns true if NODE is a pointer.  */
-#define TYPE_PTR_P(NODE)   \
-  (TREE_CODE (NODE) == POINTER_TYPE)
-
 /* Returns true if NODE is a pointer or a reference.  */
 #define INDIRECT_TYPE_P(NODE)  \
   (TYPE_PTR_P (NODE) || TYPE_REF_P (NODE))
diff --git a/gcc/tree.h b/gcc/tree.h
index d0d19153f91..48d57764d9c 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -678,6 +678,10 @@ extern void omp_clause_range_check_failed (const_tree, 
const char *, int,
 #define AGGREGATE_TYPE_P(TYPE) \
   (TREE_CODE (TYPE) == ARRAY_TYPE || RECORD_OR_UNION_TYPE_P (TYPE))
 
+/* Nonzero if this type is a pointer.  */
+#define TYPE_PTR_P(NODE)   \
+  (TREE_CODE (NODE) == POINTER_TYPE)
+
 /* Nonzero if this type is a reference.  */
 #define TYPE_REF_P(NODE)   \
   (TREE_CODE (NODE) == REFERENCE_TYPE)
-- 
2.41.0



[PATCH 2/8] gcc: Use TYPE_REF_P

2023-07-18 Thread Ken Matsui via Gcc-patches
gcc/ada/ChangeLog:

* gcc-interface/trans.cc (return_slot_opt_for_pure_call_p): Use 
TYPE_REF_P.
* gcc-interface/utils2.cc (build_unary_op): Likewise.

gcc/ChangeLog:

* alias.cc (get_alias_set): Use TYPE_REF_P.
* config/gcn/gcn-tree.cc (gcn_goacc_get_worker_red_decl): Likewise.
* fold-const.cc (fold_unary_loc): Likewise.
* gimple.cc (gimple_call_nonnull_result_p): Likewise.
* gimplify.cc (gimplify_decl_expr): Likewise.
(omp_notice_variable): Likewise.
(omp_accumulate_sibling_list): Likewise.
(omp_build_struct_sibling_lists): Likewise.
(gimplify_scan_omp_clauses): Likewise.
(omp_shared_to_firstprivate_optimizable_decl_p): Likewise.
(gimplify_adjust_omp_clauses_1): Likewise.
(gimplify_omp_taskloop_expr): Likewise.
* ipa-free-lang-data.cc (find_decls_types_r): Likewise.
* ipa-sra.cc (create_parameter_descriptors): Likewise.
* langhooks.cc (lhd_omp_scalar_p): Likewise.
* match.pd: Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_rec_input_clauses): Likewise.
(create_task_copyfn): Likewise.
(lower_omp_target): Likewise.
(lower_omp_1): Likewise.
* omp-simd-clone.cc (simd_clone_clauses_extract): Likewise.
(simd_clone_linear_addend): Likewise.
(simd_clone_adjust): Likewise.
* tree-inline.cc (remap_type_1): Likewise.
* tree.cc (reconstruct_complex_type): Likewise.
(verify_type_variant): Likewise.
(verify_type): Likewise.
(nonnull_arg_p): Likewise.
* tree.h (POINTER_TYPE_P): Likewise.
* var-tracking.cc (prepare_call_arguments): Likewise.
(vt_add_function_parameter): Likewise.

gcc/c-family/ChangeLog:

* c-attribs.cc (has_attribute): Use TYPE_REF_P.
* c-common.cc (c_common_truthvalue_conversion): Likewise.
(c_apply_type_quals_to_decl): Likewise.
* c-pretty-print.cc (c_pretty_printer::unary_expression): Likewise.
* c-ubsan.cc (ubsan_maybe_instrument_reference_or_call): Likewise.
* c-warn.cc (warn_if_unused_value): Likewise.

gcc/cp/ChangeLog:

* constexpr.cc (check_bit_cast_type): Use TYPE_REF_P.
* coroutines.cc (analyze_fn_parms): Likewise.
* decl.cc (finish_function): Likewise.
* init.cc (find_allocator_temps_r): Likewise.
* method.cc (early_check_defaulted_comparison): Likewise.
(build_comparison_op): Likewise.
* pt.cc (value_dependent_expression_p): Likewise.
* semantics.cc (cp_oacc_check_attachments): Likewise.
(finish_omp_target_clauses_r): Likewise.
(finish_omp_target_clauses): Likewise.

gcc/d/ChangeLog:

* d-builtins.cc (build_frontend_type): Use TYPE_REF_P.
* d-codegen.cc (indirect_ref): Likewise.
* d-convert.cc (d_truthvalue_conversion): Likewise.

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_descriptor_data_get): Use TYPE_REF_P.
(structure_alloc_comps): Likewise.
* trans-decl.cc (gfc_generate_function_code): Likewise.
* trans-expr.cc (gfc_conv_class_to_class): Likewise.
* trans-intrinsic.cc (gfc_conv_intrinsic_caf_get): Likewise.
(conv_caf_send): Likewise.
(trans_caf_is_present): Likewise.
(conv_intrinsic_atomic_op): Likewise.
(conv_intrinsic_atomic_ref): Likewise.
(conv_intrinsic_atomic_cas): Likewise.
* trans-openmp.cc (gfc_omp_privatize_by_reference): Likewise.
(gfc_omp_finish_clause): Likewise.
(gfc_omp_scalar_p): Likewise.
(gfc_trans_omp_array_reduction_or_udr): Likewise.
(gfc_trans_omp_clauses): Likewise.

gcc/lto/ChangeLog:

* lto-common.cc (lto_fixup_prevailing_type): Use TYPE_REF_P.

gcc/m2/ChangeLog:

* gm2-gcc/m2tree.cc (m2tree_skip_reference_type): Use TYPE_REF_P.
* gm2-gcc/m2treelib.cc (m2treelib_get_set_value): Likewise.
* m2pp.cc (m2pp_parameter): Likewise.
(m2pp_param_type): Likewise.

gcc/objc/ChangeLog:

* objc-act.cc: Use TYPE_REF_P.

Signed-off-by: Ken Matsui 
---
 gcc/ada/gcc-interface/trans.cc  |  2 +-
 gcc/ada/gcc-interface/utils2.cc |  2 +-
 gcc/alias.cc|  2 +-
 gcc/c-family/c-attribs.cc   |  2 +-
 gcc/c-family/c-common.cc|  7 +++
 gcc/c-family/c-pretty-print.cc  |  2 +-
 gcc/c-family/c-ubsan.cc |  4 ++--
 gcc/c-family/c-warn.cc  |  2 +-
 gcc/config/gcn/gcn-tree.cc  |  2 +-
 gcc/cp/constexpr.cc |  2 +-
 gcc/cp/coroutines.cc|  2 +-
 gcc/cp/decl.cc  |  2 +-
 gcc/cp/init.cc  |  2 +-
 gcc/cp/method.cc|  4 ++--
 gcc/cp/pt.cc|  2 +-
 gcc/cp/semantics.cc | 10 +-
 gcc/d/d-builtins.cc |  2 +-
 gcc/d/d-codegen.cc  |  2 +-
 gcc/d/d-convert.cc  |  4 ++--
 gcc/fold-const.cc   |  2 

[PATCH 1/8] c++, tree: Move TYPE_REF_P to tree.h

2023-07-18 Thread Ken Matsui via Gcc-patches
This patch moves TYPE_REF_P from cp/cp-tree.h to tree.h to simplify the
same code as it and to declare TYPE_REF_IS_LVALUE that determines if a
type is a C++ lvalue reference.

gcc/cp/ChangeLog:

* cp-tree.h (TYPE_REF_P): Remove.

gcc/ChangeLog:

* tree.h (TYPE_REF_P): Define.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-tree.h | 4 
 gcc/tree.h   | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3de0e154c12..8c96d868650 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4710,10 +4710,6 @@ get_vec_init_expr (tree t)
 #define TYPE_PTR_P(NODE)   \
   (TREE_CODE (NODE) == POINTER_TYPE)
 
-/* Returns true if NODE is a reference.  */
-#define TYPE_REF_P(NODE)   \
-  (TREE_CODE (NODE) == REFERENCE_TYPE)
-
 /* Returns true if NODE is a pointer or a reference.  */
 #define INDIRECT_TYPE_P(NODE)  \
   (TYPE_PTR_P (NODE) || TYPE_REF_P (NODE))
diff --git a/gcc/tree.h b/gcc/tree.h
index 4c04245e2b1..ca3e0ce8f5e 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -678,6 +678,10 @@ extern void omp_clause_range_check_failed (const_tree, 
const char *, int,
 #define AGGREGATE_TYPE_P(TYPE) \
   (TREE_CODE (TYPE) == ARRAY_TYPE || RECORD_OR_UNION_TYPE_P (TYPE))
 
+/* Nonzero if this type is a reference.  */
+#define TYPE_REF_P(NODE)   \
+  (TREE_CODE (NODE) == REFERENCE_TYPE)
+
 /* Nonzero if TYPE represents a pointer or reference type.
(It should be renamed to INDIRECT_TYPE_P.)  Keep these checks in
ascending code order.  */
-- 
2.41.0



[PATCH 0/8] Tweak predicate macros in tree

2023-07-18 Thread Ken Matsui via Gcc-patches
This patch series tweaks predicate macros in tree.h to make the code more
readable. TYPE_REF_P is moved to tree.h and used for INDIRECT_TYPE_P and
TYPE_REF_IS_LVALUE. TYPE_PTR_P is also moved to tree.h and used for
INDIRECT_TYPE_P. POINTER_TYPE_P in tree.h is replaced with INDIRECT_TYPE_P
since it is ambiguous. TYPE_REF_IS_LVALUE is defined in tree.h through
TYPE_REF_P and TYPE_REF_IS_RVALUE. The same behavior codes with those
predicate macros are replaced for clarity.

These works were all the way up to implementing __is_lvalue_reference
built-in trait and optimizing the is_lvalue_reference trait. However, those
changes were dropped since I did not observe any performance improvements.
For those who are interested in the benchmark results, they can be found
below:

1. is_lvalue_reference

https://github.com/ken-matsui/gcc-benches/blob/main/is_lvalue_reference.md#tue-jul-18-033708-pm-pdt-2023

Time: +1.35432%
Peak Memory Usage: -0.103283%
Total Memory Usage: No difference

2. is_lvalue_reference_v

https://github.com/ken-matsui/gcc-benches/blob/main/is_lvalue_reference_v.md#tue-jul-18-034236-pm-pdt-2023

Time: No difference
Peak Memory Usage: -0.426872%
Total Memory Usage: -0.677638%

Ken Matsui (8):
  c++, tree: Move TYPE_REF_P to tree.h
  gcc: Use TYPE_REF_P
  c++, tree: Move TYPE_PTR_P to tree.h
  c++, tree: Move INDIRECT_TYPE_P to tree.h
  gcc: Use INDIRECT_TYPE_P instead of POINTER_TYPE_P
  tree: Remove POINTER_TYPE_P
  tree: Define TYPE_REF_IS_LVALUE
  c++, lto: Use TYPE_REF_IS_LVALUE

 gcc/ada/gcc-interface/ada-tree.h   |   2 +-
 gcc/ada/gcc-interface/decl.cc  |   6 +-
 gcc/ada/gcc-interface/trans.cc |  16 +--
 gcc/ada/gcc-interface/utils.cc |  12 +-
 gcc/ada/gcc-interface/utils2.cc|  14 +-
 gcc/alias.cc   |  12 +-
 gcc/analyzer/analyzer.cc   |   4 +-
 gcc/analyzer/call-details.h|   2 +-
 gcc/analyzer/call-summary.cc   |   2 +-
 gcc/analyzer/checker-event.cc  |   4 +-
 gcc/analyzer/constraint-manager.cc |   2 +-
 gcc/analyzer/engine.cc |   4 +-
 gcc/analyzer/program-state.cc  |   2 +-
 gcc/analyzer/region-model-manager.cc   |   6 +-
 gcc/analyzer/region-model.cc   |   6 +-
 gcc/analyzer/sm.cc |   4 +-
 gcc/analyzer/svalue.cc |   2 +-
 gcc/analyzer/varargs.cc|   2 +-
 gcc/asan.cc|   4 +-
 gcc/builtins.cc|  24 ++--
 gcc/c-family/c-ada-spec.cc |   2 +-
 gcc/c-family/c-attribs.cc  |  32 ++---
 gcc/c-family/c-common.cc   |  41 +++---
 gcc/c-family/c-omp.cc  |   8 +-
 gcc/c-family/c-pretty-print.cc |   4 +-
 gcc/c-family/c-ubsan.cc|  10 +-
 gcc/c-family/c-warn.cc |  34 ++---
 gcc/c/c-decl.cc|   8 +-
 gcc/c/c-parser.cc  |   4 +-
 gcc/c/c-typeck.cc  |  40 +++---
 gcc/c/gimple-parser.cc |   8 +-
 gcc/calls.cc   |   2 +-
 gcc/cfgexpand.cc   |   6 +-
 gcc/cgraph.cc  |   2 +-
 gcc/cgraphunit.cc  |   2 +-
 gcc/config/aarch64/aarch64-builtins.cc |   2 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc |   2 +-
 gcc/config/aarch64/aarch64.cc  |   6 +-
 gcc/config/arc/arc.cc  |   2 +-
 gcc/config/arm/arm-builtins.cc |   6 +-
 gcc/config/arm/arm-mve-builtins.cc |   2 +-
 gcc/config/avr/avr.cc  |   6 +-
 gcc/config/epiphany/epiphany.cc|   2 +-
 gcc/config/gcn/gcn-tree.cc |   2 +-
 gcc/config/gcn/gcn.cc  |   6 +-
 gcc/config/i386/i386-builtins.cc   |   2 +-
 gcc/config/i386/i386-options.cc|   2 +-
 gcc/config/i386/i386.cc|  10 +-
 gcc/config/m32c/m32c.cc|   2 +-
 gcc/config/m68k/m68k.cc|   4 +-
 gcc/config/mips/mips.cc|   2 +-
 gcc/config/mn10300/mn10300.cc  |   2 +-
 gcc/config/msp430/msp430.cc|   2 +-
 gcc/config/nios2/nios2.cc  |   2 +-
 gcc/config/pa/pa.cc|   4 +-
 gcc/config/pru/pru-passes.cc   |   2 +-
 gcc/config/pru/pru.cc  |   2 +-
 gcc/config/rs6000/rs6000-builtin.cc|   4 +-
 gcc/config/rs6000/rs6000-c.cc  |  10 +-
 gcc/config/rs6000/rs6000.cc|   4 +-
 gcc/config/s390/s390-c.cc  |   2 +-
 gcc/config/s390/s390.cc|  12 +-
 gcc/config/sparc/sparc.cc  |   2 +-
 gcc/convert.cc |   4 +-
 gcc/cp/class.cc|   4 +-
 

[Bug c/110728] should __attribute__((cleanup())) callback get invoked for indirect edges of asm goto

2023-07-18 Thread rjmccall at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110728

--- Comment #5 from John McCall  ---
> If this gets changed in GCC, I'd be happy to modify clang to match that 
> updated behavior.

Policy-wise, I don't think clang would accept a patch making this UB
(effectively what not calling the destructor/cleanup means) instead of
ill-formed unless a standards body forced us to.  Not calling the
destructor/cleanup seems like clearly undesirable behavior, and if we can
define that away in the compiler with relative ease, we should.

[Bug c/44677] Warn for variables incremented but not used (+=, ++)

2023-07-18 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44677

Vincent Lefèvre  changed:

   What|Removed |Added

 CC||vincent-gcc at vinc17 dot net

--- Comment #17 from Vincent Lefèvre  ---
(In reply to Martin Sebor from comment #10)
> $ cat pr95217.c && clang -S -Wall -Wextra --analyze pr95217.c
[...]
> pr95217.c:8:3: warning: Value stored to 'p' is never read
>   p += 1;  // missing warning
>   ^~
> pr95217.c:13:3: warning: Value stored to 'p' is never read
>   p = p + 1;   // missing warning
>   ^   ~
> 2 warnings generated.

Clang (15 and above) with -Wunused-but-set-variable now detects the issue on
the "p++" and "p += 1" forms (ditto with other combined assignment operators),
but not on "p = p + 1".

Such forms (p++, etc.) are common, so that detecting an unused variable is very
useful.

Like Paul did for Emacs (comment 13), I've just fixed two issues in GNU MPFR
(one cosmetic about a useless loop variable and one important in the
testsuite), found with Clang 16. The references:
https://gitlab.inria.fr/mpfr/mpfr/-/commit/4c110cf4773b3c07de2a33acbee591cedb083c80
https://gitlab.inria.fr/mpfr/mpfr/-/commit/b34d867fa41934d12d0d4dbaaa0242d6d3eb32c7

For the second MPFR issue, there was an "err++" for each error found by the
function in the testsuite, but err was not tested at the end, so that potential
errors were never reported.

[Bug c/95057] missing -Wunused-but-set-variable warning on multiple assignments, not all of them used

2023-07-18 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95057

--- Comment #6 from Vincent Lefèvre  ---
Well, for the ++, --, +=, -=, *=, etc. operators, that's PR44677 (though it is
unclear on what it should cover).

[Bug c/95057] missing -Wunused-but-set-variable warning on multiple assignments, not all of them used

2023-07-18 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95057

--- Comment #5 from Vincent Lefèvre  ---
FYI, Clang 16 does not warn either on the testcases provided in comment 0 (bug
report).

But contrary to GCC (tested with master r14-1713-g6631fe419c6 - Debian
gcc-snapshot package 20230613-1), Clang 15 and 16 both warn on f and g below:

int h (void);

void f (void)
{
  int i = h ();
  i++;
}

void g (void)
{
  for (int i = 0 ;; i++)
if (h ())
  break;
}

zira:~> clang-15 -c -Wunused-but-set-variable warn-inc.c
warn-inc.c:5:7: warning: variable 'i' set but not used
[-Wunused-but-set-variable]
  int i = h ();
  ^
warn-inc.c:11:12: warning: variable 'i' set but not used
[-Wunused-but-set-variable]
  for (int i = 0 ;; i++)
   ^
2 warnings generated.

Thanks to this detection, a test with Clang 16 found two issues in GNU MPFR
(one cosmetic about a useless loop variable and one important in the
testsuite). So it is useful to consider such particular cases.

[PATCH] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT

2023-07-18 Thread Ken Matsui via Gcc-patches
This patch defines _GLIBCXX_HAS_BUILTIN_TRAIT, which will be used as a
flag to toggle built-in traits in the type_traits header. Through this
macro function and _GLIBCXX_NO_BUILTIN_TRAITS macro, we can switch the
use of built-in traits without needing to modify the source code.

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/bits/c++config | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index dd47f274d5f..de13f61db71 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -854,7 +854,11 @@ namespace __gnu_cxx
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
 #endif
 
-#undef _GLIBCXX_HAS_BUILTIN
+// Returns true if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the compiler
+// has a corresponding built-in type trait. _GLIBCXX_NO_BUILTIN_TRAITS is
+// defined to disable the use of built-in traits.
+#define _GLIBCXX_HAS_BUILTIN_TRAIT(BT)  \
+  (!defined(_GLIBCXX_NO_BUILTIN_TRAITS) && _GLIBCXX_HAS_BUILTIN(BT))
 
 // Mark code that should be ignored by the compiler, but seen by Doxygen.
 #define _GLIBCXX_DOXYGEN_ONLY(X)
-- 
2.41.0



[Bug c++/110114] [13/14 Regression] ICE on calling overloaded function in case of incomplete argument type and C++ designated initializers

2023-07-18 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110114

--- Comment #4 from Marek Polacek  ---
This should work; I'll test it tomorrow.

--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -986,6 +986,11 @@ build_aggr_conv (tree type, tree ctor, int flags,
tsubst_flags_t complain)
   tree empty_ctor = NULL_TREE;
   hash_set pset;

+  /* We've called complete_type on TYPE before calling this function, but
+ perhaps it wasn't successful.  */
+  if (!COMPLETE_TYPE_P (type))
+return nullptr;
+
   /* We already called reshape_init in implicit_conversion, but it might not
  have done anything in the case of parenthesized aggr init.  */

[Bug c++/110114] [13/14 Regression] ICE on calling overloaded function in case of incomplete argument type and C++ designated initializers

2023-07-18 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110114

Marek Polacek  changed:

   What|Removed |Added

   Priority|P3  |P2
   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

[Bug c/110728] should __attribute__((cleanup())) callback get invoked for indirect edges of asm goto

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110728

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=37722

--- Comment #4 from Andrew Pinski  ---
Oh and computed goto has a similar issue too (PR 37722 which is for C++
deconstructors but cleanup is the same idea there).

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

--- Comment #8 from Andrew Pinski  ---
(In reply to Sergei Trofimovich from comment #7)
> Silly question: I would expect -O0 not to perform any simplifications like
> that.  Does `gcc` have a knob to disable `match.pd` templates? Or some of
> them are crucial for correctness?

Some are crucial for correctness usually dealing with c++. you could always use
dbg_cnt to try to narrow down which one is failing.


See 
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/dbgcnt.def;h=9e2f1d857b49c6c331cc7d1c419eafa0f2403e96;hb=HEAD

For on how to use dbg_cnt

match is the counter here.

Also for generic folding -fdump-tree-original-folding will have the debug info
on what match pattern was applied during "fold" (of generic).

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

--- Comment #7 from Sergei Trofimovich  ---
Silly question: I would expect -O0 not to perform any simplifications like
that.  Does `gcc` have a knob to disable `match.pd` templates? Or some of them
are crucial for correctness?

[Bug c/110728] should __attribute__((cleanup())) callback get invoked for indirect edges of asm goto

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110728

--- Comment #3 from Andrew Pinski  ---
That is even doing:
```
int g();
int h()
{
l0:;
int x __attribute__((cleanup(test4cleanup)));
if (g()) goto l0;
}
```
Produces the same result which is why I said this is the same as PR 91951
really.

[Bug c/110728] should __attribute__((cleanup())) callback get invoked for indirect edges of asm goto

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110728

--- Comment #2 from Andrew Pinski  ---
Techincally x does not go out of scope until the } so I don't see how this
would act any other way.

[Bug c/110728] should __attribute__((cleanup())) callback get invoked for indirect edges of asm goto

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110728

--- Comment #1 from Andrew Pinski  ---
I suspect PR 91951 is the same really.

[Bug c/110728] New: should __attribute__((cleanup())) callback get invoked for indirect edges of asm goto

2023-07-18 Thread ndesaulniers at google dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110728

Bug ID: 110728
   Summary: should __attribute__((cleanup())) callback get invoked
for indirect edges of asm goto
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ndesaulniers at google dot com
  Target Milestone: ---

Consider the following test case:

```c
void test4cleanup(int*);
// No errors expected.
void test4(void) {
l0:;
int x __attribute__((cleanup(test4cleanup)));
asm goto("# %l0"l0);
}

```

gcc trunk today generates effectively the following control flow:
```
test4:
.LFB0:
subq$24, %rsp
.L2:
#APP
# .L2
#NO_APP
leaq12(%rsp), %rdi
calltest4cleanup
addq$24, %rsp
ret
```
so if the inline asm blob jumps to `l0`, then the cleanup function is not run.

That seemed surprising, at least when we discussed it on this thread.
https://reviews.llvm.org/D155342#4511887

A fellow Linux kernel dev (who introduced the usage of
__attribute__((cleanup())) (and asm goto, coincidentally) to the kernel) agreed
(on IRC).

For now in clang, we produce a diagnostic since the behavior seems surprising. 
If this gets changed in GCC, I'd be happy to modify clang to match that updated
behavior.

[PATCH] c++: fix ICE with is_really_empty_class [PR110106]

2023-07-18 Thread Marek Polacek via Gcc-patches
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and branches?

-- >8 --

is_really_empty_class is liable to crash when it gets an incomplete
or dependent type.  Since r11-557, we pass the yet-uninstantiated
class type S<0> of the PARM_DECL s to is_really_empty_class -- because
of the potential_rvalue_constant_expression -> is_rvalue_constant_expression
change in cp_parser_constant_expression.  Here we're not parsing
a template so we did not check COMPLETE_TYPE_P as we should.

PR c++/110106

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1): Check COMPLETE_TYPE_P
even when !processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept80.C: New test.
---
 gcc/cp/constexpr.cc |  2 +-
 gcc/testsuite/g++.dg/cpp0x/noexcept80.C | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept80.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 6e8f1c2b61e..1f59c5472fb 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9116,7 +9116,7 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
   if (now && want_rval)
{
  tree type = TREE_TYPE (t);
- if ((processing_template_decl && !COMPLETE_TYPE_P (type))
+ if (!COMPLETE_TYPE_P (type)
  || dependent_type_p (type)
  || is_really_empty_class (type, /*ignore_vptr*/false))
/* An empty class has no data to read.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept80.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept80.C
new file mode 100644
index 000..3e90af747e2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept80.C
@@ -0,0 +1,12 @@
+// PR c++/110106
+// { dg-do compile { target c++11 } }
+
+template struct S
+{
+};
+
+struct G {
+  G(S<0>);
+};
+
+void y(S<0> s) noexcept(noexcept(G{s}));

base-commit: fca089e8a47314a40ad93527ba9f9d0d374b3afb
-- 
2.41.0



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Nathan Sidwell via Gcc

On 7/18/23 16:52, Jason Merrill wrote:

On 6/25/23 12:36, Ben Boeckel wrote:

On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the
ones with imports directly in the TU would suffice (i.e check the 'outermost' 
arg)


FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.


Please explain what you mean by fat modules.  There seems to be confusion.



But wouldn't the transitive modules be dependencies of the direct imports, so 
(re)building the direct imports would first require building the transitive 
modules anyway?  Expressing the transitive closure of dependencies for each 
importer seems redundant when it can be easily derived from the direct 
dependencies of each module.


Jason



--
Nathan Sidwell



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Nathan Sidwell via Gcc-patches

On 7/18/23 16:52, Jason Merrill wrote:

On 6/25/23 12:36, Ben Boeckel wrote:

On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the
ones with imports directly in the TU would suffice (i.e check the 'outermost' 
arg)


FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.


Please explain what you mean by fat modules.  There seems to be confusion.



But wouldn't the transitive modules be dependencies of the direct imports, so 
(re)building the direct imports would first require building the transitive 
modules anyway?  Expressing the transitive closure of dependencies for each 
importer seems redundant when it can be easily derived from the direct 
dependencies of each module.


Jason



--
Nathan Sidwell



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Jason Merrill via Gcc

On 6/25/23 12:36, Ben Boeckel wrote:

On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the
ones with imports directly in the TU would suffice (i.e check the 'outermost' 
arg)


FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.


But wouldn't the transitive modules be dependencies of the direct 
imports, so (re)building the direct imports would first require building 
the transitive modules anyway?  Expressing the transitive closure of 
dependencies for each importer seems redundant when it can be easily 
derived from the direct dependencies of each module.


Jason



Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-18 Thread Jason Merrill via Gcc-patches

On 6/25/23 12:36, Ben Boeckel wrote:

On Fri, Jun 23, 2023 at 08:12:41 -0400, Nathan Sidwell wrote:

On 6/22/23 22:45, Ben Boeckel wrote:

On Thu, Jun 22, 2023 at 17:21:42 -0400, Jason Merrill wrote:

On 1/25/23 16:06, Ben Boeckel wrote:

They affect the build, so report them via `-MF` mechanisms.


Why isn't this covered by the existing code in preprocessed_module?


It appears as though it is neutered in patch 3 where
`write_make_modules_deps` is used in `make_write` (or will use that name


Why do you want to record the transitive modules? I would expect just noting the
ones with imports directly in the TU would suffice (i.e check the 'outermost' 
arg)


FWIW, only GCC has "fat" modules. MSVC and Clang both require the
transitive closure to be passed. The idea there is to minimize the size
of individual module files.

If GCC only reads the "fat" modules, then only those should be recorded.
If it reads other modules, they should be recorded as well.


But wouldn't the transitive modules be dependencies of the direct 
imports, so (re)building the direct imports would first require building 
the transitive modules anyway?  Expressing the transitive closure of 
dependencies for each importer seems redundant when it can be easily 
derived from the direct dependencies of each module.


Jason



Re: [PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-07-18 Thread Jason Merrill via Gcc-patches

On 3/2/23 18:07, Lewis Hyatt wrote:

The PR complains that we do not handle UTF-8 in the suffix for a user-defined
literal, such as:

bool operator ""_π (unsigned long long);

In fact we don't handle any extended identifier characters there, whether
UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
the "" tokens is included, since then the identifier is lexed in the "normal"
way as its own token. But when it is lexed as part of the string token, this
is handled in lex_string() with a one-off loop that is not aware of extended
characters.

This patch fixes it by adding a new function scan_cur_identifier() that can be
used to lex an identifier while in the middle of lexing another token.

BTW, the other place that has been mis-lexing identifiers is
lex_identifier_intern(), which is used to implement #pragma push_macro
and #pragma pop_macro. This does not support extended characters either.
I will add that in a subsequent patch, because it can't directly reuse the
new function, but rather needs to lex from a string instead of a cpp_buffer.

With scan_cur_identifier(), we do also correctly warn about bidi and
normalization issues in the extended identifiers comprising the suffix.

libcpp/ChangeLog:

PR preprocessor/103902
* lex.cc (identifier_diagnostics_on_lex): New function refactoring
some common code.
(lex_identifier_intern): Use the new function.
(lex_identifier): Don't run identifier diagnostics here, rather let
the call site do it when needed.
(_cpp_lex_direct): Adjust the call sites of lex_identifier ()
acccordingly.
(struct scan_id_result): New struct.
(scan_cur_identifier): New function.
(create_literal2): New function.
(lit_accum::create_literal2): New function.
(is_macro): Folded into new function...
(maybe_ignore_udl_macro_suffix): ...here.
(is_macro_not_literal_suffix): Folded likewise.
(lex_raw_string): Handle UTF-8 in UDL suffix via scan_cur_identifier ().
(lex_string): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/103902
* g++.dg/cpp0x/udlit-extended-id-1.C: New test.
* g++.dg/cpp0x/udlit-extended-id-2.C: New test.
* g++.dg/cpp0x/udlit-extended-id-3.C: New test.
* g++.dg/cpp0x/udlit-extended-id-4.C: New test.
---

Notes:
 Hello-
 
 This is the updated version of the patch, incorporating feedback from Jakub

 and Jason, most recently discussed here:
 
 https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612073.html
 
 Please let me know how it looks? It is simpler than before with the new

 approach. Thanks!
 
 One thing to note. As Jason clarified for me, a usage like this:
 
  #pragma GCC poison _x

 const char * operator "" _x (const char *, unsigned long);
 
 The space between the "" and the _x is currently allowed but will be

 deprecated in C++23. GCC currently will complain about the poisoned use of
 _x in this case, and this patch, which is just focused on handling UTF-8
 properly, does not change this. But it seems that it would be correct
 not to apply poison in this case. I can try to follow up with a patch to do
 so, if it seems worthwhile? Given the syntax is deprecated, maybe it's not
 worth it...


Right, the deprecation is intended to avoid this problem; it's fine for 
us to complain.



 For the time being, this patch does add a testcase for the above and xfails
 it. For the case where no space is present, which is the part touched by 
the
 present patch, existing behavior is preserved correctly and no diagnostics
 such as poison are issued for the UDL suffix. (Contrary to v1 of this
 patch.)
 
 Thanks! bootstrap + regtested all languages on x86-64 Linux with

 no regressions.
 



-/* Returns true if a macro has been defined.
-   This might not work if compile with -save-temps,
-   or preprocess separately from compilation.  */
+/* Helper function to check if a string format macro, say from inttypes.h, is
+   placed touching a string literal, in which case it could be parsed as a 
C++11
+   user-defined string literal thus breaking the program.



User-defined literals
+   outside of namespace std must start with a single underscore, so assume
+   anything of that form really is a UDL suffix.  We don't need to worry about
+   UDLs defined inside namespace std because their names are reserved, so 
cannot
+   be used as macro names in valid programs.


I'd prefer to leave this rationale comment on the _[^_] check rather 
than hoist it out of the function; OK with that change.  Thank you very 
much for your persistence.



Return TRUE if the UDL should be
+   ignored for now and preserved for potential macro expansion.  */
  
  static bool

-is_macro(cpp_reader *pfile, const uchar *base)
+maybe_ignore_udl_macro_suffix (cpp_reader *pfile, 

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

--- Comment #6 from Andrew Pinski  ---
Created attachment 55576
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55576=edit
Full testcase

[Bug fortran/110720] [13 Regression] Internal compiler error (segmentation fault) in gfc_expression_rank

2023-07-18 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110720

--- Comment #3 from anlauf at gcc dot gnu.org ---
(In reply to Paul Thomas from comment #2)
> (In reply to anlauf from comment #1)
> > Looks like a dup of pr109948, which is fixed on 14-trunk.
> > 
> > @Paul: do you think your patch (commit r14-1487-g3c2eba4) is
> > backportable to 13-branch?
> 
> I know that it is :-)
> 
> I'll propose a backport to the list unless you want to OK it here?

I don't remember any regressions from the fix on mainline, so OK from me.
Let's fix this issue in time before 13.2 comes out...

:-)

[Bug fortran/110720] [13 Regression] Internal compiler error (segmentation fault) in gfc_expression_rank

2023-07-18 Thread pault at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110720

--- Comment #2 from Paul Thomas  ---
(In reply to anlauf from comment #1)
> Looks like a dup of pr109948, which is fixed on 14-trunk.
> 
> @Paul: do you think your patch (commit r14-1487-g3c2eba4) is
> backportable to 13-branch?

I know that it is :-)

I'll propose a backport to the list unless you want to OK it here?

Cheers

Paul

[Bug testsuite/110419] [14 regression] new test case gfortran.dg/value_9.f90 in r14-2050-gd130ae8499e0c6 fails

2023-07-18 Thread dje at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110419

David Edelsohn  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||dje at gcc dot gnu.org
   Last reconfirmed||2023-07-18
 Ever confirmed|0   |1

--- Comment #16 from David Edelsohn  ---
As I wrote in issue 110360, the bug appears to be the memory layout and padding
assumed by GFortran that does not take into account endianness.

I have changed val() to print both c and x, and not halt.

  subroutine val (x, c)
character(kind=1), intent(in) :: x  ! control: pass by reference
character(kind=1), value  :: c
print *, "by value(kind=1): ", x
print *, "by value(kind=1): ", c
!if (c /= x)   stop 1
c = "*"
if (c /= "*") stop 2
  end


The output is:

 by value(kind=1): B
 by value(kind=1): B
 by value(kind=1): A
 by value(kind=1): A
 by value(kind=1): A
 by value(kind=1):<- c
 by value(kind=1): A
 by value(kind=1):<- c
 by value(kind=1): A
 by value(kind=1):<- c
 by value(kind=1): 1
 by value(kind=1):<- c
 by value(kind=1): 1
 by value(kind=1):<- c


The assembly language for the first few calls is

# call val  ("B","B")
lwz 31,LC..5(2)  LOAD ADDRESS of x
mr 3,31  COPY address to first parameter
li 6,1
li 5,1
lbzu 4,148(3)LOAD BYTE of c as second parameter
slwi 4,4,24  SHIFT c 24 bits
bl .val.4
# call val  ("A",char(65))
mr 30,31 COPY ADDRESS of x
li 6,1
li 5,1
lbzu 4,152(30)   LOAD BYTE of c as second parameter
slwi 4,4,24  SHIFT c 24 bits
mr 3,30  COPY address of first parameter
bl .val.4
# call val  ("A",char(a))
li 6,1
li 5,1
li 4,65  <- c NOT SHIFTED
mr 3,30  <- x
bl .val.4
# call val  ("A",mychar(65))
li 6,1
li 5,1
li 4,65  <- c NOT SHIFTED
mr 3,30  <- x
bl .val.4
# call val  ("A",mychar(a))
li 6,1
li 5,1
li 4,65  <- c NOT SHIFTED
mr 3,30  <- x
bl .val.4

GFortran is not taking account of endianness for the layout of values in memory
compared to constants loaded into registers.  This isn't an ABI issue of the
target, this is a memory layout and register layout issue of GFortran.

On a big endian system, a character / byte is loaded at the LSB, but GFortran
seems to be comparing it to a memory image with the character / byte stored at
the MSB, which would be correct for little endian.  In some cases, GFortran is
shifting the value and in other cases it is not.

GFortran does not seem to have a consistent view of the memory layout for
characters / bytes loaded into a larger object.

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

--- Comment #5 from Andrew Pinski  ---
Created attachment 55575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55575=edit
Patch which should fix this

So I didn't realize I need the check for "INTEGRAL_TYPE_P (TREE_TYPE (@0))
  && TYPE_PRECISION (TREE_TYPE (@0)) == 1" On these.

[Bug target/110724] Unnecessary alignment on branch to unconditional branch targets

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #5 from Andrew Pinski  ---
Plus if this is just the return case how important is that because maybe we
should be inlining this kind of function.  Plus this is a memcpy, why not just
use the expansion of __builtin_memcpy here (which is tuned for each target
better).

[Bug c++/110106] [11/12/13/14 Regression] ICE on noexcept(noexcept(...)) with optional

2023-07-18 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110106

Marek Polacek  changed:

   What|Removed |Added

   Priority|P3  |P2
   Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

[Bug target/110724] Unnecessary alignment on branch to unconditional branch targets

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

Andrew Pinski  changed:

   What|Removed |Added

  Component|middle-end  |target

--- Comment #4 from Andrew Pinski  ---
I noticed clang/LLVM does not do any alignment here but I doubt that is correct
thing to do.

Anyways I don't there is any bug here in the middle-end because the alignment
is what the backend requests. Now the question is the alignment needed or not,
that is for some x86 person to do benchmarking really. And if this needs more
than the simple heurstics than what is currently supported in the middle-end
well that is still up to the Intel/AMD person to benchmark.

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

--- Comment #4 from Andrew Pinski  ---
/* a | (a == b)  -->  a | (b^1) (boolean version of the above). */
(simplify
 (bit_ior:c @0 (nop_convert? (eq:c @0 @1)))
 (bit_ior @0 (bit_xor @1 { build_one_cst (type); })))


So I guess the issue is eq on generic has an integer type rather than boolean
type ...

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-07-18
 Status|UNCONFIRMED |NEW
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #3 from Andrew Pinski  ---
It worked at r14-2547-g71a907abdb4c03d4a3419190d .

Mine, looks like r14-2556-g0407ae8a7732d9 caused it ...

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |14.0

[Bug target/110722] FP is Saved/Restored around inline assembly

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110722

--- Comment #1 from Andrew Pinski  ---
I think this will be fixed with -momit-leaf-frame-pointer patch at :
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624752.html

[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

--- Comment #2 from Andrew Pinski  ---
Whoops wrong bug report.

Re: Fix profile update in scale_profile_for_vect_loop

2023-07-18 Thread Thiago Jung Bauermann via Gcc-patches


Hello,

Jan Hubicka via Gcc-patches  writes:

> Hi,
> when vectorizing 4 times, we sometimes do
>   for
> <4x vectorized body>
>   for
> <2x vectorized body>
>   for
> <1x vectorized body>
>
> Here the second two fors handling epilogue never iterates.
> Currently vecotrizer thinks that the middle for itrates twice.
> This turns out to be scale_profile_for_vect_loop that uses 
> niter_for_unrolled_loop.
>
> At that time we know epilogue will iterate at most 2 times
> but niter_for_unrolled_loop does not know that the last iteration
> will be taken by the epilogue-of-epilogue and thus it think
> that the loop may iterate once and exit in middle of second
> iteration.
>
> We already do correct job updating niter bounds and this is
> just ordering issue.  This patch makes us to first update
> the bounds and then do updating of the loop.  I re-implemented
> the function more correctly and precisely.
>
> The loop reducing iteration factor for overly flat profiles is bit funny, but
> only other method I can think of is to compute sreal scale that would have
> similar overhead I think.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> gcc/ChangeLog:
>
>   PR middle-end/110649
>   * tree-vect-loop.cc (scale_profile_for_vect_loop):
>   (vect_transform_loop):
>   (optimize_mask_stores):

Our CI detected regressions on aarch64-linux-gnu with this commit in
gcc.target/aarch64/sve/aarch64-sve.exp. I checked today's trunk and it
still fails. I filed the following bug report with the details:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110727

Could you please check?

-- 
Thiago


[Bug middle-end/110726] [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

--- Comment #1 from Andrew Pinski  ---
I think this will be fixed with -momit-leaf-frame-pointer patch at :
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624752.html

[Bug target/110727] New: gcc.target/aarch64/sve/aarch64-sve.exp has two new failures since commit 061f74c0673

2023-07-18 Thread thiago.bauermann at linaro dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110727

Bug ID: 110727
   Summary: gcc.target/aarch64/sve/aarch64-sve.exp has two new
failures since commit 061f74c0673
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: thiago.bauermann at linaro dot org
  Target Milestone: ---

Created attachment 55574
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55574=edit
Tarball containing testsuite log files for first bad and last good commits.

Our CI detected that commit 061f74c06735 "Fix profile update in
scale_profile_for_vect_loop" introduced these testsuite failures on
aarch64-linux:

Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
FAIL: gcc.target/aarch64/sve/live_1.c scan-assembler-times
\\twhilelo\\tp[0-7].b,  2
FAIL: gcc.target/aarch64/sve/live_1.c scan-assembler-times
\\twhilelo\\tp[0-7].h,  4

I confirmed that they are still present in trunk as of commit c11a3aedec26
"tree-ssa-loop-ch improvements, part 3" from today.

Tested on Ubuntu 22.04 with:

$ ~/src/gcc/configure --disable-bootstrap --disable-multilib && make -j 60
$ make -C gcc check-c RUNTESTFLAGS=gcc.target/aarch64/sve/aarch64-sve.exp

I'll attach gcc.sum and gcc.log from commit c11a3aedec26 as well as gcc.sum and
gcc.log from its parent, which was the last commit where the test passed.

[PATCH RESEND] c: add -Wmissing-variable-declarations [PR65213]

2023-07-18 Thread Hamza Mahfooz
Resolves:
PR c/65213 - Extend -Wmissing-declarations to variables [i.e. add
-Wmissing-variable-declarations]

gcc/c-family/ChangeLog:

PR c/65213
* c.opt (-Wmissing-variable-declarations): New option.

gcc/c/ChangeLog:

PR c/65213
* c-decl.cc (start_decl): Handle
-Wmissing-variable-declarations.

gcc/ChangeLog:

PR c/65213
* doc/invoke.texi (-Wmissing-variable-declarations): Document
new option.

gcc/testsuite/ChangeLog:

PR c/65213
* gcc.dg/Wmissing-variable-declarations.c: New test.

Signed-off-by: Hamza Mahfooz 
---
 gcc/c-family/c.opt|  4 +++
 gcc/c/c-decl.cc   | 10 +-
 gcc/doc/invoke.texi   | 11 +--
 .../gcc.dg/Wmissing-variable-declarations.c   | 33 +++
 4 files changed, 55 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wmissing-variable-declarations.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4abdc8d0e77..0ed87fcc7be 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1048,6 +1048,10 @@ Wmissing-prototypes
 C ObjC Var(warn_missing_prototypes) Warning
 Warn about global functions without prototypes.
 
+Wmissing-variable-declarations
+C ObjC Var(warn_missing_variable_declarations) Warning
+Warn about global variables without previous declarations.
+
 Wmudflap
 C ObjC C++ ObjC++ WarnRemoved
 
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index ecd10ebb69c..1f9eb44dbaa 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5340,6 +5340,7 @@ start_decl (struct c_declarator *declarator, struct 
c_declspecs *declspecs,
location_t *lastloc /* = NULL */)
 {
   tree decl;
+  tree old_decl;
   tree tem;
   tree expr = NULL_TREE;
   enum deprecated_states deprecated_state = DEPRECATED_NORMAL;
@@ -5360,7 +5361,9 @@ start_decl (struct c_declarator *declarator, struct 
c_declspecs *declspecs,
   if (!decl || decl == error_mark_node)
 return NULL_TREE;
 
-  if (tree lastdecl = lastloc ? lookup_last_decl (decl) : NULL_TREE)
+  old_decl = lookup_last_decl (decl);
+
+  if (tree lastdecl = lastloc ? old_decl : NULL_TREE)
 if (lastdecl != error_mark_node)
   *lastloc = DECL_SOURCE_LOCATION (lastdecl);
 
@@ -5372,6 +5375,11 @@ start_decl (struct c_declarator *declarator, struct 
c_declspecs *declspecs,
   && TREE_PUBLIC (decl))
 warning (OPT_Wmain, "%q+D is usually a function", decl);
 
+  if (warn_missing_variable_declarations && VAR_P (decl)
+  && !DECL_EXTERNAL (decl) && TREE_PUBLIC (decl) && old_decl == NULL_TREE)
+warning_at (DECL_SOURCE_LOCATION (decl), 
OPT_Wmissing_variable_declarations,
+   "no previous declaration for %qD", decl);
+
   if (initialized)
 /* Is it valid for this decl to have an initializer at all?
If not, set INITIALIZED to zero, which will indirectly
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 88e3c625030..c2a0562b9e6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -498,8 +498,8 @@ Objective-C and Objective-C++ Dialects}.
 
 @item C and Objective-C-only Warning Options
 @gccoptlist{-Wbad-function-cast  -Wmissing-declarations
--Wmissing-parameter-type  -Wmissing-prototypes  -Wnested-externs
--Wold-style-declaration  -Wold-style-definition
+-Wmissing-parameter-type -Wmissing-prototypes -Wmissing-variable-declarations
+-Wnested-externs -Wold-style-declaration  -Wold-style-definition
 -Wstrict-prototypes  -Wtraditional  -Wtraditional-conversion
 -Wdeclaration-after-statement  -Wpointer-sign}
 
@@ -9610,6 +9610,13 @@ provide prototypes and a non-matching declaration 
declares an
 overload rather than conflict with an earlier declaration.
 Use @option{-Wmissing-declarations} to detect missing declarations in C++.
 
+@opindex Wmissing-variable-declarations
+@opindex Wno-missing-variable-declarations
+@item -Wmissing-variable-declarations @r{(C and Objective-C only)}
+Warn if a global variable is defined without a previous declaration.
+Use this option to detect global variables that do not have a matching
+extern declaration in a header file.
+
 @opindex Wmissing-declarations
 @opindex Wno-missing-declarations
 @item -Wmissing-declarations
diff --git a/gcc/testsuite/gcc.dg/Wmissing-variable-declarations.c 
b/gcc/testsuite/gcc.dg/Wmissing-variable-declarations.c
new file mode 100644
index 000..b292dbe8c22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wmissing-variable-declarations.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-Wmissing-variable-declarations" } */
+
+int b0; /* { dg-warning "no previous declaration for 'b0'" } */
+
+int b1 = 1; /* { dg-warning "no previous declaration for 'b1'" } */
+
+int b2; /* { dg-warning "no previous declaration for 'b2'" } */
+int b2 = 2; 
+
+struct {
+int g0;
+} b3; /* { dg-warning "no previous declaration for 'b3'" } */
+
+int b4; /* { dg-warning "no previous declaration for 'b4'" } */
+int b4 = 3;
+extern int b4;

[Bug middle-end/110726] New: [14 Regression] wrong code on llvm-16 around 'a |= a == 0'

2023-07-18 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110726

Bug ID: 110726
   Summary: [14 Regression] wrong code on llvm-16 around 'a |= a
== 0'
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: slyfox at gcc dot gnu.org
  Target Milestone: ---

This week's gcc r14-2600-g3a407070b610a8 fails llvm-16 test suite as:

  Failed Tests (1):
LLVM-Unit ::
Support/./SupportTests/BlockFrequencyTest.SaturatingRightShift

It looks like the reproducer is trivial and happens even on -O0:

// $ cat bug.cc
int main(void) {
  unsigned long long freq = 0x10080ULL;

  freq >>= 2;
  freq |= freq == 0;

  if (freq != 0x4020ULL)
  __builtin_trap();
}

$ gcc-14 bug.cc -o a && ./a
Illegal instruction (core dumped)

$ $ gcc bug.cc -o a && ./a


$ gcc -v
Using built-in specs.
COLLECT_GCC=/<>/gcc-14.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/<>/gcc-14.0.0/libexec/gcc/x86_64-unknown-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../source/configure --prefix=/<>/gcc-14.0.0
--with-gmp-include=/<>/gmp-6.2.1-dev/include
--with-gmp-lib=/<>/gmp-6.2.1/lib
--with-mpfr-include=/<>/mpfr-4.2.0-dev/include
--with-mpfr-lib=/<>/mpfr-4.2.0/lib --with-mpc=/<>/libmpc-1.3.1
--with-native-system-header-dir=/<>/glibc-2.37-8-dev/include
--with-build-sysroot=/ --program-prefix= --enable-lto --disable-libstdcxx-pch
--without-included-gettext --with-system-zlib --enable-checking=release
--enable-static --enable-languages=c,c++ --disable-multilib --enable-plugin
--disable-libcc1 --with-isl=/<>/isl-0.20 --disable-bootstrap
--build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu
--target=x86_64-unknown-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0  (experimental) (GCC)

[Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets

2023-07-18 Thread javier.martinez.bugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #3 from Javier Martinez  
---
The generic tuning of 16:11:8 looks reasonable to me, I do not argue against
it.



From Anger Fog’s Optimizing subroutines in assembly language:

> Most microprocessors fetch code in aligned 16-byte or 32-byte blocks.
> If an important subroutine entry or jump label happens to be near the
> end of a 16-byte block then the microprocessor will only get a few 
> useful bytes of code when fetching that block of code. It may have
> to fetch the next 16 bytes too before it can decode the first instructions
> after the label. This can be avoided by aligning important subroutine
> entries and loop entries by 16. Aligning by 8 will assure that at least 8
> bytes of code can be loaded with the first instruction fetch, which may
> be sufficient if the instructions are small.



This looks like the reason behind the alignment. That section of the book
goes on to explain the inconvenience (execution of nops) of alignment on labels
reachable by other means than branching - which I presume lead to the :m and
:m2 tuning values, the distinction between -falign-labels and -falign-jumps,
and the reason padding is removed when my label is reachable by fall-through
with [[unlikely]].



All this is fine. 

My thesis is that this alignment strategy is always unnecessary in one specific
circumstance - when the branch target is itself an unconditional branch of size
1, as in:



.L1:

  ret 



Because the ret instruction will never cross a block boundary, and the
instructions following the ret must not execute, so there is no front-end stall
to avoid.

[Bug fortran/110725] [13/14 Regression,openmp] internal compiler error: in expand_expr_real_1, at expr.cc:10897

2023-07-18 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110725

kargl at gcc dot gnu.org changed:

   What|Removed |Added

Summary|[13/14 Regression] internal |[13/14 Regression,openmp]
   |compiler error: in  |internal compiler error: in
   |expand_expr_real_1, at  |expand_expr_real_1, at
   |expr.cc:10897   |expr.cc:10897
 CC||kargl at gcc dot gnu.org

--- Comment #2 from kargl at gcc dot gnu.org ---
Reduced testcase.  The '!$omp end teams' line in subroutine initial appears to
be out-of-place.


  module swim_mod

  INTEGER, PARAMETER :: N1=7702, N2=7702

  DOUBLE PRECISION, ALLOCATABLE, DIMENSION(:,:) :: U, V

  INTEGER :: M, N, MP1, NP1

!$omp declare target(U, V)
!$omp declare target(M,N,MP1,NP1)

  CONTAINS

  SUBROUTINE ALLOC
 IMPLICIT NONE
!$omp target update to(M,N,MP1,NP1)
!$omp&
 ALLOCATE(U(NP1,MP1), V(NP1,MP1))
  END SUBROUTINE

  SUBROUTINE INITAL
  INTEGER I
!$omp target
!$omp teams
!$omp distribute parallel do simd
  DO 75 I=1,M
 U(I+1,N+1) = U(I+1,1)
 V(I,1) = V(I,N+1)
   75 CONTINUE
!$omp end teams
  U(1,N+1) = U(M+1,1)
  V(M+1,1) = V(1,N+1)
!$omp end target
  END SUBROUTINE

  end module swim_mod

Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-07-18 Thread Qing Zhao via Gcc-patches


> On Jul 18, 2023, at 11:37 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On Jul 17, 2023, at 7:40 PM, Kees Cook  wrote:
>> 
>> On Mon, Jul 17, 2023 at 09:17:48PM +, Qing Zhao wrote:
>>> 
 On Jul 13, 2023, at 4:31 PM, Kees Cook  wrote:
 
 In the bug, the problem is that "p" isn't known to be allocated, if I'm
 reading that correctly?
>>> 
>>> I think that the major point in PR109557 
>>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557):
>>> 
>>> for the following pointer p.3_1, 
>>> 
>>> p.3_1 = p;
>>> _2 = __builtin_object_size (p.3_1, 0);
>>> 
>>> Question: why the size of p.3_1 cannot use the TYPE_SIZE of the pointee of 
>>> p when the TYPE_SIZE can be determined at compile time?
>>> 
>>> Answer:  From just knowing the type of the pointee of p, the compiler 
>>> cannot determine the size of the object.  
>> 
>> Why is that? "p" points to "struct P", which has a fixed size. There
>> must be an assumption somewhere that a pointer is allocated, otherwise
>> __bos would almost never work?
> 
> My understanding from the comments in PR109557 was: 
> 
> In general the pointer could point to the first object of an array that has 
> more elements, or to an object of a different type. 
> Without seeing the real allocation to the pointer, the compiler cannot assume 
> that the pointer point to an object that has
> the exact same type as its declaration. 
> 
> Sid and Martin, is the above understand correctly?
> 
> Honestly, I am still not 100% clear on this yet.
> 
> Jakub, Sid and Martin, could  you please explain a little bit more here, 
> especially for the following small example?
> 
> [opc@qinzhao-ol8u3-x86 109557]$ cat t.c
> #include 
> #include 
> struct P {
>  int k;
>  int x[10]; 
> } *p;
> 
> void store(int a, int b) 
> {
>  p = (struct P *)malloc (sizeof (struct P));
>  p->k = a;
>  p->x[b] = 0;
>  assert (__builtin_dynamic_object_size (p->x, 1) == sizeof (int[10]));
>  assert (__builtin_dynamic_object_size (p, 1) == sizeof (struct P));
>  return;
> }
> 
> int main()
> {
>  store(7, 7);
>  assert (__builtin_dynamic_object_size (p->x, 1) == sizeof (int[10]));
>  assert (__builtin_dynamic_object_size (p, 1) == sizeof (struct P));
>  free (p);
> }
> [opc@qinzhao-ol8u3-x86 109557]$ sh t
> /home/opc/Install/latest/bin/gcc -O -fdump-tree-all t.c
> a.out: t.c:21: main: Assertion `__builtin_dynamic_object_size (p->x, 1) == 
> sizeof (int[10])' failed.
> t: line 19: 859944 Aborted (core dumped) ./a.out
> 
> 

A correction to the above compilation option:
/home/opc/Install/latest-d/bin/gcc -O -fstrict-flex-arrays=3 t.c
a.out: t.c:22: main: Assertion `__builtin_dynamic_object_size (p, 1) == sizeof 
(struct P)' failed.
t: line 19: 918833 Aborted (core dumped) ./a.out

(All others keep the same).

Sorry for the mistake.

Qing

> In the above, among the 4 assertions, only the last one failed.
> 
> Why GCC can use the TYPE_SIZE of the pointee of the pointer “p->x” as the 
> size of the object, 
> but cannot use the TYPE_SIZE of the pointee of the pointer “p” as the size of 
> the object? 
> 
> 
>> 
>>> Therefore the bug has been closed. 
>>> 
>>> In your following testing 5:
>>> 
 I'm not sure this is a reasonable behavior, but
 let me get into the specific test, which looks like this:
 
 TEST(counted_by_seen_by_bdos)
 {
  struct annotated *p;
  int index = MAX_INDEX + unconst;
 
  p = alloc_annotated(index);
 
  REPORT_SIZE(p->array);
 /* 1 */ EXPECT_EQ(sizeof(*p), offsetof(typeof(*p), array));
  /* Check array size alone. */
 /* 2 */ EXPECT_EQ(__builtin_object_size(p->array, 1), SIZE_MAX);
 /* 3 */ EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1), p->foo * 
 sizeof(*p->array));
  /* Check check entire object size. */
 /* 4 */ EXPECT_EQ(__builtin_object_size(p, 1), SIZE_MAX);
 /* 5 */ EXPECT_EQ(__builtin_dynamic_object_size(p, 1), sizeof(*p) + p->foo 
 * sizeof(*p->array));
 }
 
 Test 5 should pass as well, since, again, p can be examined. Passing p
 to __bdos implies it is allocated and the __counted_by annotation can be
 examined.
>>> 
>>> Since the call to the routine “alloc_annotated" cannot be inlined, GCC does 
>>> not see any allocation calls for the pointer p.
>> 
>> Correct.
>> 
>>> At the same time, due to the same reason as PR109986, GCC cannot determine 
>>> the size of the object by just knowing the TYPE_SIZE
>>> of the pointee of p.  
>> 
>> So the difference between test 3 and test 5 is that "p" is explicitly
>> dereferenced to find "array", and therefore an assumption is established
>> that "p" is allocated?
> 
> Yes, this might be the assumption that GCC made  -:)
>> 
>>> So, this is exactly the same issue as PR109557.  It’s an existing behavior 
>>> per the current __buitlin_object_size algorithm.
>>> I am still not very sure whether the situation in PR109557 can be improved 
>>> or not, but either way, it’s 

[Bug fortran/110360] ABI issue with character,value dummy argument

2023-07-18 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110360

--- Comment #26 from anlauf at gcc dot gnu.org ---
(In reply to David Edelsohn from comment #25)
> The problem on big endian systems is that GFortran is passing the character
> with the wrong padding.
[...]
> GFortran is not taking account of endianness for the layout of values in
> memory compared to constants loaded into registers.  This isn't an ABI issue
> of the target, this is a memory layout and register layout issue of GFortran.

Frankly speaking, this is a place where I have zero knowledge.

> Let me know if you need more information or tests.

There is pr110419 which tracks the testsuite regression on BE systems.

Mikael added some info there.  Maybe you can have a look, too.

[Bug fortran/95947] PACK intrinsic returns blank strings when an allocatable character array with allocatable length is used

2023-07-18 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95947

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||anlauf at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #9 from anlauf at gcc dot gnu.org ---
Fixed on mainline for gcc-14 so far, and on 13-branch.
Might be backported further after waiting some time.

Thanks for the report!

[Bug fortran/110658] MINVAL/MAXVAL and deferred-length character arrays

2023-07-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110658

--- Comment #4 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Harald Anlauf
:

https://gcc.gnu.org/g:ccf94ab2abb6969c04d51c7879f07edfbb97ae55

commit r13-7584-gccf94ab2abb6969c04d51c7879f07edfbb97ae55
Author: Harald Anlauf 
Date:   Sun Jul 16 22:17:27 2023 +0200

Fortran: intrinsics and deferred-length character arguments
[PR95947,PR110658]

gcc/fortran/ChangeLog:

PR fortran/95947
PR fortran/110658
* trans-expr.cc (gfc_conv_procedure_call): For intrinsic procedures
whose result characteristics depends on the first argument and
which
can be of type character, the character length will not be
deferred.

gcc/testsuite/ChangeLog:

PR fortran/95947
PR fortran/110658
* gfortran.dg/deferred_character_37.f90: New test.

(cherry picked from commit 95ddd2659849a904509067ec3a2770135149a722)

[Bug fortran/95947] PACK intrinsic returns blank strings when an allocatable character array with allocatable length is used

2023-07-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95947

--- Comment #8 from CVS Commits  ---
The releases/gcc-13 branch has been updated by Harald Anlauf
:

https://gcc.gnu.org/g:ccf94ab2abb6969c04d51c7879f07edfbb97ae55

commit r13-7584-gccf94ab2abb6969c04d51c7879f07edfbb97ae55
Author: Harald Anlauf 
Date:   Sun Jul 16 22:17:27 2023 +0200

Fortran: intrinsics and deferred-length character arguments
[PR95947,PR110658]

gcc/fortran/ChangeLog:

PR fortran/95947
PR fortran/110658
* trans-expr.cc (gfc_conv_procedure_call): For intrinsic procedures
whose result characteristics depends on the first argument and
which
can be of type character, the character length will not be
deferred.

gcc/testsuite/ChangeLog:

PR fortran/95947
PR fortran/110658
* gfortran.dg/deferred_character_37.f90: New test.

(cherry picked from commit 95ddd2659849a904509067ec3a2770135149a722)

Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets

2023-07-18 Thread Richard Sandiford via Gcc-patches
Manolis Tsamis  writes:
> On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
>  wrote:
>>
>> Manolis Tsamis  writes:
>> > noce_convert_multiple_sets has been introduced and extended over time to 
>> > handle
>> > if conversion for blocks with multiple sets. Currently this is focused on
>> > register moves and rejects any sort of arithmetic operations.
>> >
>> > This series is an extension to allow more sequences to take part in if
>> > conversion. The first patch is a required change to emit correct code and 
>> > the
>> > second patch whitelists a larger number of operations through
>> > bb_ok_for_noce_convert_multiple_sets.
>> >
>> > For targets that have a rich selection of conditional instructions,
>> > like aarch64, I have seen an ~5x increase of profitable if conversions for
>> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
>> > benchmarks and I have not seen performance regressions on either x64 / 
>> > aarch64.
>>
>> Interesting results.  Are you free to say which target you used for aarch64?
>>
>> If I've understood the cost heuristics correctly, we'll allow a "predictable"
>> branch to be replaced by up to 5 simple conditional instructions and an
>> "unpredictable" branch to be replaced by up to 10 simple conditional
>> instructions.  That seems pretty high.  And I'm not sure how well we
>> guess predictability in the absence of real profile information.
>>
>> So my gut instinct was that the limitations of the current code might
>> be saving us from overly generous limits.  It sounds from your results
>> like that might not be the case though.
>>
>> Still, if it does turn out to be the case in future, I agree we should
>> fix the costs rather than hamstring the code.
>>
>
> My writing may have been confusing, but with "~5x increase of
> profitable if conversions" I just meant that ifcvt considers these
> profitable, not that they actually are when executed in particular
> hardware.

Yeah, sorry, I'd read that part as measuring the number of if-converisons.
But...

> But at the same time I haven't yet seen any obvious performance
> regressions in some benchmarks that I have ran.

...it was a pleasant surprise that doing so much more if-conversion
didn't make things noticeably worse. :)

> In any case it could be interesting to microbenchmark branches vs
> conditional instructions and see how sane these numbers are.

I think for this we really do need the real workload, since it's
hard to measure realistic branch mispredict penalties with a
microbenchmark.

> [...]
>> (2) Don't you also need to update the "rewiring" mechanism, to cope
>> with cases where the then block has something like:
>>
>>   if (a == 0) {
>> a = b op c;   ->a' = a == 0 ? b op c : a;
>> d = a op b;   ->d = a == 0 ? a' op b : d;
>>   } a = a'
>>
>> At the moment the code only handles regs and subregs, whereas but IIUC
>> it should now iterate over all the regs in the SET_SRC.  And I suppose
>> that creates the need for multiple possible rewirings in the same insn,
>> so that it isn't a simple insn -> index mapping any more.
>>
>
> Indeed, I believe this current patch cannot properly handle these. I
> will create testcases for this and see what changes need to be done in
> the next iteration so that correct code is generated.

Perhaps we should change the way that the rewiring is done.
At the moment, need_cmov_or_rewire detects the renumbering
ahead of time.  But it might be easier to:

- have noce_convert_multiple_sets_1 keep track of which
  SET_DESTs it has replaced with temporaries.

- for each subsequent instruction, go through that list in order
  and use insn_propagation (from recog.h) to apply each replacement.

That might be simpler, and should also be more robust, since the
insn_propagation routines return false on failure.

Thanks,
Richard



[Bug c++/110197] [13/14 Regression] Empty constexpr object constructor erronously claims out of range access

2023-07-18 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110197

--- Comment #2 from Marek Polacek  ---
// PR c++/110197

namespace std {
constexpr bool __is_constant_evaluated() {
  return __builtin_is_constant_evaluated();
}
template  using enable_if_t = _Tp;
template  struct __array_traits {
  typedef _Tp _Type[_Nm];
};
template  struct array {
  typename __array_traits<_Tp, _Nm>::_Type _M_elems;
};
template  array(_Tp) -> array, 1>;
struct char_traits {
  static constexpr unsigned length() {
__is_constant_evaluated();
return 0;
  }
};
struct basic_string_view {
  using traits_type = char_traits;
  constexpr basic_string_view(const char *)
  : _M_len{traits_type::length()}, _M_str{} {}
  long _M_len;
  char _M_str;
};
} // namespace std
struct Currency {
  constexpr Currency(std::basic_string_view) {}
};
void get_c() { constexpr std::array c{Currency{""}}; }

Re: [PATCH v3] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-18 Thread Jeff Law via Gcc-patches




On 7/18/23 11:15, Manolis Tsamis wrote:

On Fri, Jul 14, 2023 at 8:35 AM Jeff Law  wrote:




On 7/13/23 09:05, Manolis Tsamis wrote:

In this version I have made f-m-o able to also eliminate constant
moves in addition to the add constant instructions.
This increases the number of simplified/eliminated instructions and is
a good addition for RISC style ISAs where these are more common.

This has led to pr52146.c failing in x86, which I haven't been able to
find a way to fix.
This involves directly writing to a constant address with -mx32

The code
  movl$-18874240, %eax
  movl$0, (%eax)

is 'optimized' to
  movl$0, %eax
  movl$0, -18874240(%eax)

Which is actually
  movl$0, -18874240

which is wrong per the ticket.
The fix for the ticket involved changes to legitimate_address_p which
f-m-o does call but it doesn't reject due to the existence of (%eax)
which in turn is actually zero.
I believe this is not strictly an f-m-o issue since the pass calls all
the required functions to test whether the newly synthesized memory
instruction is valid.

Any ideas on how to solve this issue is appreciated.

I wonder if costing might be useful here.  I would expect the 2nd
sequence is the most costly of the three if address costing models are
reasonably accurate.

Another way would be to look at the length of the memory reference insn.
   If it's larger, then it's likely more costly.

That's what I've got off the top of my head.



I could test whether the cost function prefers the version that we
want, but that would be a workaround I would like to avoid. It may
also be the case that this reproduces with a different sequence where
the unwanted code is actually more profitable.

I was trying to find out whether the original fix can be extended in a
way that solves this, because having an address that is reported as
legitimate but is actually not could also create issues elsewhere.
But I don't yet have a suggestion on how to fix it yet.
I was thinking a bit more about this yesterday, and even in the case 
where the new mem crosses a boundary thus making the memory load/store 
more expensive I think we're still OK.


The key is at worst we will have changed an earlier instruction like t = 
sp +  into t = sp which should reduce the cost of that earlier 
instruction.  And I would expect the vast majority of the time we 
completely eliminate that earlier instruction.


So this may ultimately be a non-issue.

Vineet @ Rivos has indicated he stumbled across an ICE with the V3 code. 
 Hopefully he'll get a testcase for that extracted shortly.


jeff


[Bug fortran/110725] [13/14 Regression] internal compiler error: in expand_expr_real_1, at expr.cc:10897

2023-07-18 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110725

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
Summary|internal compiler error: in |[13/14 Regression] internal
   |expand_expr_real_1, at  |compiler error: in
   |expr.cc:10897   |expand_expr_real_1, at
   ||expr.cc:10897
   Target Milestone|--- |13.2
   Keywords||ice-on-valid-code, openmp
   Last reconfirmed||2023-07-18
   Priority|P3  |P4

--- Comment #1 from anlauf at gcc dot gnu.org ---
The code is accepted by Nvidia.

No ICE at -O0, but ICE at -Og, -O1, and higher.

ICE confirmed also on x86_64, so likely not target-specific.

[Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2023-07-18
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
The generic tuning is:
  "16:11:8",/* Loop alignment.  */
  "16:11:8",/* Jump alignment.  */
  "0:0:8",  /* Label alignment.  */

The the operands are described as:
https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Optimize-Options.html#index-falign-functions
```
-falign-functions=n:m:n2
...
Examples: -falign-functions=32 aligns functions to the next 32-byte boundary,
-falign-functions=24 aligns to the next 32-byte boundary only if this can be
done by skipping 23 bytes or less, -falign-functions=32:7 aligns to the next
32-byte boundary only if this can be done by skipping 6 bytes or less.

The second pair of n2:m2 values allows you to specify a secondary alignment:
-falign-functions=64:7:32:3 aligns to the next 64-byte boundary if this can be
done by skipping 6 bytes or less, otherwise aligns to the next 32-byte boundary
if this can be done by skipping 2 bytes or less. If m2 is not specified, it
defaults to n2.
```

So align jumps to 16 byte if 11 or less bytes can be used or 8 byte alignment
Which is exactly what this does:
.p2align 4,,7  # <-- unnecessary alignment
.p2align 3

[Bug middle-end/110724] Unnecessary alignment on branch to unconditional branch targets

2023-07-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

--- Comment #1 from Andrew Pinski  ---
I think this is by design.

Adding -fno-align-jumps makes the alignment go away.

Re: [PATCH] Read global value/mask in IPA.

2023-07-18 Thread Aldy Hernandez via Gcc-patches



On 7/17/23 15:14, Aldy Hernandez wrote:

Instead of reading the known zero bits in IPA, read the value/mask
pair which is available.

There is a slight change of behavior here.  I have removed the check
for SSA_NAME, as the ranger can calculate the range and value/mask for
INTEGER_CST.  This simplifies the code a bit, since there's no special
casing when setting the jfunc bits.  The default range for VR is
undefined, so I think it's safe just to check for undefined_p().


Final round of tests revealed a regression for which I've adjusted the 
testcase.


It turns out g++.dg/ipa/pure-const-3.C fails because IPA can now pick up 
value/mask from any pass that has an integrated ranger.  The test was 
previously disabling evrp and CCP, but now VRP[12], jump threading, and 
DOM can make value/mask adjustments visible to IPA so they must be 
disabled as well.


We've run into these scenarios multiple times in the past-- any 
improvements to the ranger pipeline causes everyone to get smarter, 
making changes visible earlier in the pipeline.


AldyFrom e1dfd4d6b3d3bf09d55b6ea3ac732462c7030802 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Fri, 14 Jul 2023 12:38:16 +0200
Subject: [PATCH] Read global value/mask in IPA.

Instead of reading the known zero bits in IPA, read the value/mask
pair which is available.

There is a slight change of behavior here.  I have removed the check
for SSA_NAME, as the ranger can calculate the range and value/mask for
INTEGER_CST.  This simplifies the code a bit, since there's no special
casing when setting the jfunc bits.  The default range for VR is
undefined, so I think it's safe just to check for undefined_p().

gcc/ChangeLog:

	* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Read global
	value/mask.

gcc/testsuite/ChangeLog:

	* g++.dg/ipa/pure-const-3.C: Adjust for smarter value/mask being
	read by ranger earlier than expected by test.
---
 gcc/ipa-prop.cc | 18 --
 gcc/testsuite/g++.dg/ipa/pure-const-3.C |  2 +-
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 5d790ff1265..4f6ed7b89bd 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2402,8 +2402,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
 	}
   else
 	{
-	  if (TREE_CODE (arg) == SSA_NAME
-	  && param_type
+	  if (param_type
 	  && Value_Range::supports_type_p (TREE_TYPE (arg))
 	  && Value_Range::supports_type_p (param_type)
 	  && irange::supports_p (TREE_TYPE (arg))
@@ -2422,15 +2421,14 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
 	gcc_assert (!jfunc->m_vr);
 	}
 
-  if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
-	  && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
+  if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) && !vr.undefined_p ())
 	{
-	  if (TREE_CODE (arg) == SSA_NAME)
-	ipa_set_jfunc_bits (jfunc, 0,
-widest_int::from (get_nonzero_bits (arg),
-		  TYPE_SIGN (TREE_TYPE (arg;
-	  else
-	ipa_set_jfunc_bits (jfunc, wi::to_widest (arg), 0);
+	  irange  = as_a  (vr);
+	  irange_bitmask bm = r.get_bitmask ();
+	  signop sign = TYPE_SIGN (TREE_TYPE (arg));
+	  ipa_set_jfunc_bits (jfunc,
+			  widest_int::from (bm.value (), sign),
+			  widest_int::from (bm.mask (), sign));
 	}
   else if (POINTER_TYPE_P (TREE_TYPE (arg)))
 	{
diff --git a/gcc/testsuite/g++.dg/ipa/pure-const-3.C b/gcc/testsuite/g++.dg/ipa/pure-const-3.C
index b4a4673e86e..e43cf09af27 100644
--- a/gcc/testsuite/g++.dg/ipa/pure-const-3.C
+++ b/gcc/testsuite/g++.dg/ipa/pure-const-3.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-ipa-vrp -fdump-tree-optimized -fno-tree-ccp -fdisable-tree-evrp"  } */
+/* { dg-options "-O2 -fno-ipa-vrp -fdump-tree-optimized -fno-tree-ccp -fdisable-tree-evrp -fdisable-tree-vrp1 -fdisable-tree-vrp2 -fno-thread-jumps -fno-tree-dominator-opts"  } */
 int *ptr;
 static int barvar;
 static int b(int a);
-- 
2.40.1



[Bug c++/110338] Implement C++26 language features

2023-07-18 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110338
Bug 110338 depends on bug 110340, which changed state.

Bug 110340 Summary: [C++26] P2621R2 - Remove undefined behavior from lexing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110340

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug c++/110340] [C++26] P2621R2 - Remove undefined behavior from lexing

2023-07-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110340

--- Comment #3 from CVS Commits  ---
The trunk branch has been updated by Marek Polacek :

https://gcc.gnu.org/g:fca089e8a47314a40ad93527ba9f9d0d374b3afb

commit r14-2626-gfca089e8a47314a40ad93527ba9f9d0d374b3afb
Author: Marek Polacek 
Date:   Tue Jul 18 13:26:39 2023 -0400

c++: Add tests for P2621, no UB in lexer [PR110340]

C++26 P2621 removes UB in the lexer and either makes the construct valid
or ill-formed.  We're already handling this correctly so this patch only
adds tests.

PR c++/110340

gcc/testsuite/ChangeLog:

* g++.dg/cpp/string-4.C: New test.
* g++.dg/cpp/ucn-2.C: New test.

[Bug c++/110340] [C++26] P2621R2 - Remove undefined behavior from lexing

2023-07-18 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110340

Marek Polacek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Marek Polacek  ---
Done.

[pushed] c++: Add tests for P2621, no UB in lexer [PR110340]

2023-07-18 Thread Marek Polacek via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

C++26 P2621 removes UB in the lexer and either makes the construct valid
or ill-formed.  We're already handling this correctly so this patch only
adds tests.

PR c++/110340

gcc/testsuite/ChangeLog:

* g++.dg/cpp/string-4.C: New test.
* g++.dg/cpp/ucn-2.C: New test.
---
 gcc/testsuite/g++.dg/cpp/string-4.C |  6 ++
 gcc/testsuite/g++.dg/cpp/ucn-2.C| 15 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp/string-4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp/ucn-2.C

diff --git a/gcc/testsuite/g++.dg/cpp/string-4.C 
b/gcc/testsuite/g++.dg/cpp/string-4.C
new file mode 100644
index 000..37d0388413c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp/string-4.C
@@ -0,0 +1,6 @@
+// P2621R2 - UB? In My Lexer?
+// { dg-do compile }
+
+// [lex.phases] If a U+0027 APOSTROPHE or a U+0022 QUOTATION
+// MARK matches the last category, the program is ill-formed.
+const char * foo=" // { dg-error "terminating|expected" }
diff --git a/gcc/testsuite/g++.dg/cpp/ucn-2.C b/gcc/testsuite/g++.dg/cpp/ucn-2.C
new file mode 100644
index 000..c5583e06dd6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp/ucn-2.C
@@ -0,0 +1,15 @@
+// P2621R2 - UB? In My Lexer?
+// { dg-do compile }
+
+// Line splicing can form a universal-character-name [lex.charset].
+int \\
+u\
+0\
+3\
+9\
+1 = 0;
+
+// [cpp.concat] Concatenation can form a universal-character-name.
+#define CONCAT(x,y) x##y
+
+int CONCAT(\,u0393)=0;

base-commit: 05fc7db93452841280ddc5cdf71b33498ed576dc
-- 
2.41.0



[Bug fortran/110725] New: internal compiler error: in expand_expr_real_1, at expr.cc:10897

2023-07-18 Thread tonycurtis32 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110725

Bug ID: 110725
   Summary: internal compiler error: in expand_expr_real_1, at
expr.cc:10897
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tonycurtis32 at gmail dot com
  Target Milestone: ---

Created attachment 55573
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55573=edit
preprocessed source

arch = a64fx (arm 8.2 + sve)

==

Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/lustre/software/gcc/13.1.0/libexec/gcc/aarch64-unknown-linux-gnu/13.1.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-13.1.0/configure --with-mpfr-lib=/lib64
--with-gmp-lib=/lib64 --with-gmp-include=/usr/include
--with-mpfr-include=/usr/include --with-mpc-lib=/lib64
--with-mpc-include=/usr/include --prefix=/lustre/software/gcc/13.1.0
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (GCC)

==

command-line = gfortran -ggdb -fopenmp -mcmodel=large -mcpu=a64fx -O3 -o swim
swim.F
(-O0 compiles ok, gcc 12.2 works fine too with optimization)

==

swim.F:107:72:

  107 |   DO 50 I=1,MP1
  |   
1
Warning: Fortran 2018 deleted feature: Shared DO termination label 50 at (1)
swim.F:129:72:

  129 |   DO 60 I=1,M
  |   
1
Warning: Fortran 2018 deleted feature: Shared DO termination label 60 at (1)
swim.F:173:72:

  173 |   DO 86 I=1,MP1
  |   
1
Warning: Fortran 2018 deleted feature: Shared DO termination label 86 at (1)
swim.F:206:72:

  206 |   DO 100 I=1,M
  |   
1
Warning: Fortran 2018 deleted feature: Shared DO termination label 100 at (1)
swim.F:278:72:

  278 |   DO 200 I=1,M
  |   
1
Warning: Fortran 2018 deleted feature: Shared DO termination label 200 at (1)
swim.F:340:72:

  340 |   DO 400 I=1,MP1
  |   
1
Warning: Fortran 2018 deleted feature: Shared DO termination label 400 at (1)
swim.F:376:72:

  376 |   DO 300 I=1,M
  |   
1
Warning: Fortran 2018 deleted feature: Shared DO termination label 300 at (1)
during RTL pass: expand
swim.F:404:72:

  404 | !$omp target
  |   
^
internal compiler error: in expand_expr_real_1, at expr.cc:10897
0x9d1067 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:10897
0x9cce9b expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:9000
0x9cce9b expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
../../gcc-13.1.0/gcc/expr.h:310
0x9cce9b expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:11234
0x9ce88f expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:9000
0x9ce88f expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:11463
0x9cff3b expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:9000
0x9cff3b expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:10805
0x9cce9b expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:9000
0x9cce9b expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
../../gcc-13.1.0/gcc/expr.h:310
0x9cce9b expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:11234
0x9cff3b expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:9000
0x9cff3b expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:10805
0x9da10f expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier,
rtx_def**, bool)
../../gcc-13.1.0/gcc/expr.cc:9000
0x9da10f store_expr(tree_node*, rtx_def*, int, bool, bool)
../../gcc-13.1.0/gcc/expr.cc:6330
0x9db73b expand_assignment(tree_node*, tree_node*, bool)

Re: [PATCH v3] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-18 Thread Manolis Tsamis
On Fri, Jul 14, 2023 at 8:35 AM Jeff Law  wrote:
>
>
>
> On 7/13/23 09:05, Manolis Tsamis wrote:
> > In this version I have made f-m-o able to also eliminate constant
> > moves in addition to the add constant instructions.
> > This increases the number of simplified/eliminated instructions and is
> > a good addition for RISC style ISAs where these are more common.
> >
> > This has led to pr52146.c failing in x86, which I haven't been able to
> > find a way to fix.
> > This involves directly writing to a constant address with -mx32
> >
> > The code
> >  movl$-18874240, %eax
> >  movl$0, (%eax)
> >
> > is 'optimized' to
> >  movl$0, %eax
> >  movl$0, -18874240(%eax)
> >
> > Which is actually
> >  movl$0, -18874240
> >
> > which is wrong per the ticket.
> > The fix for the ticket involved changes to legitimate_address_p which
> > f-m-o does call but it doesn't reject due to the existence of (%eax)
> > which in turn is actually zero.
> > I believe this is not strictly an f-m-o issue since the pass calls all
> > the required functions to test whether the newly synthesized memory
> > instruction is valid.
> >
> > Any ideas on how to solve this issue is appreciated.
> I wonder if costing might be useful here.  I would expect the 2nd
> sequence is the most costly of the three if address costing models are
> reasonably accurate.
>
> Another way would be to look at the length of the memory reference insn.
>   If it's larger, then it's likely more costly.
>
> That's what I've got off the top of my head.
>

I could test whether the cost function prefers the version that we
want, but that would be a workaround I would like to avoid. It may
also be the case that this reproduces with a different sequence where
the unwanted code is actually more profitable.

I was trying to find out whether the original fix can be extended in a
way that solves this, because having an address that is reported as
legitimate but is actually not could also create issues elsewhere.
But I don't yet have a suggestion on how to fix it yet.

Thanks,
Manolis

> Jeff


[Bug rtl-optimization/110701] [14 Regression] Wrong code at -O1/2/3/s on x86_64-linux-gnu

2023-07-18 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110701

Roger Sayle  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |roger at 
nextmovesoftware dot com
 Status|NEW |ASSIGNED

--- Comment #6 from Roger Sayle  ---
I have a fix (to combine.cc's record_dead_and_set_regs_1).  Bootstrapping and
regression testing.

[Bug rtl-optimization/110724] New: Unnecessary alignment on branch to unconditional branch targets

2023-07-18 Thread javier.martinez.bugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110724

Bug ID: 110724
   Summary: Unnecessary alignment on branch to unconditional
branch targets
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: javier.martinez.bugzilla at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/f7qMxxfMj

void duff(int * __restrict to, const int * __restrict from, const int count)
{
int n = (count+7) / 8;
switch(count%8)
{
   case 0: do { *to++ = *from++;
   case 7:  *to++ = *from++;
   case 6:  *to++ = *from++;
   case 5:  *to++ = *from++;
   case 4:  *to++ = *from++;
   case 3:  *to++ = *from++;
   case 2:  *to++ = *from++;
   [[likely]] case 1:  *to++ = *from++;
} while (--n>0);
}
}

Trunk with O3:
jle .L1
[...]
lea rax, [rax+4]
jmp .L5# <-- no fall-through to ret
.p2align 4,,7  # <-- unnecessary alignment
.p2align 3
.L1:
ret


I believe this 16-byte alignment is done to put the branch target at the
beginning of a front-end instruction fetch block. That however seems
unnecessary when the branch target is itself an unconditional branch, as the
instructions to follow will not retire.

In this example the degrade is code size / instruction caching only, as there
is no possible fall-through to .L1 that would cause nop's to be consumed.
Changing the C++ attribute to [[unlikely]] introduces fall-through, and GCC
seems to remove the padding, which is great.

Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets

2023-07-18 Thread Manolis Tsamis
Hi Richard,

Thanks for your insightful reply.

On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
 wrote:
>
> Manolis Tsamis  writes:
> > noce_convert_multiple_sets has been introduced and extended over time to 
> > handle
> > if conversion for blocks with multiple sets. Currently this is focused on
> > register moves and rejects any sort of arithmetic operations.
> >
> > This series is an extension to allow more sequences to take part in if
> > conversion. The first patch is a required change to emit correct code and 
> > the
> > second patch whitelists a larger number of operations through
> > bb_ok_for_noce_convert_multiple_sets.
> >
> > For targets that have a rich selection of conditional instructions,
> > like aarch64, I have seen an ~5x increase of profitable if conversions for
> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> > benchmarks and I have not seen performance regressions on either x64 / 
> > aarch64.
>
> Interesting results.  Are you free to say which target you used for aarch64?
>
> If I've understood the cost heuristics correctly, we'll allow a "predictable"
> branch to be replaced by up to 5 simple conditional instructions and an
> "unpredictable" branch to be replaced by up to 10 simple conditional
> instructions.  That seems pretty high.  And I'm not sure how well we
> guess predictability in the absence of real profile information.
>
> So my gut instinct was that the limitations of the current code might
> be saving us from overly generous limits.  It sounds from your results
> like that might not be the case though.
>
> Still, if it does turn out to be the case in future, I agree we should
> fix the costs rather than hamstring the code.
>

My writing may have been confusing, but with "~5x increase of
profitable if conversions" I just meant that ifcvt considers these
profitable, not that they actually are when executed in particular
hardware.
But at the same time I haven't yet seen any obvious performance
regressions in some benchmarks that I have ran.
In any case it could be interesting to microbenchmark branches vs
conditional instructions and see how sane these numbers are.

> > Some samples that previously resulted in a branch but now better use these
> > instructions can be seen in the provided test case.
> >
> > Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
> > failing with an ICE but I believe that it's not an issue of this change.
> > force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
> > is provided through emit_conditional_move.
>
> I guess that needs to be fixed first though.  (Thanks for checking both
> targets.)
>

I get a feeling this may be fixed if I properly take care of your
points 1 & 2 below. I will report on that.

> My main comments on the series are:
>
> (1) It isn't obvious which operations should be included in the list
> in patch 2 and which shouldn't.  Also, the patch only checks the
> outermost operation, and so it allows the inner rtxes to be
> arbitrarily complex.
>
> Because of that, it might be better to remove the condition
> altogether and just rely on the other routines to do costing and
> correctness checks.
>

That is true; I wanted to somehow only allow "normal arithmetic
operations" and avoid generating sequences with stranger codes. I will
try and see what happens if I remove the condition altogether. I also
totally missed the fact that I was allowing arbitrarily complex inner
rtxes so thanks for pointing that out.

> (2) Don't you also need to update the "rewiring" mechanism, to cope
> with cases where the then block has something like:
>
>   if (a == 0) {
> a = b op c;   ->a' = a == 0 ? b op c : a;
> d = a op b;   ->d = a == 0 ? a' op b : d;
>   } a = a'
>
> At the moment the code only handles regs and subregs, whereas but IIUC
> it should now iterate over all the regs in the SET_SRC.  And I suppose
> that creates the need for multiple possible rewirings in the same insn,
> so that it isn't a simple insn -> index mapping any more.
>

Indeed, I believe this current patch cannot properly handle these. I
will create testcases for this and see what changes need to be done in
the next iteration so that correct code is generated.

Thanks,
Manolis

> Thanks,
> Richard
>
> >
> >
> > Changes in v2:
> > - Change "conditional moves" to "conditional instructions"
> > in bb_ok_for_noce_convert_multiple_sets's comment.
> >
> > Manolis Tsamis (2):
> >   ifcvt: handle sequences that clobber flags in
> > noce_convert_multiple_sets
> >   ifcvt: Allow more operations in multiple set if conversion
> >
> >  gcc/ifcvt.cc  | 109 ++
> >  .../aarch64/ifcvt_multiple_sets_arithm.c  |  67 +++
> >  2 files changed, 127 insertions(+), 49 deletions(-)
> >  create mode 100644 
> > 

[Bug fortran/110723] ICE with allocatable character lhs and parenthesized array with vector subscript

2023-07-18 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110723

--- Comment #1 from anlauf at gcc dot gnu.org ---
(In reply to anlauf from comment #0)
> Not sure if this is the right place.

Actually, the following still fails:

  n =  (m([2])//"")   ! ICE

:-(

Generally stripping parentheses generates a failure in
gfortran.dg/reassoc_2.f90, which is bad...

Re: [PATCH] testsuite: fix dwarf2/utf-1.C with DWARF4

2023-07-18 Thread Jason Merrill via Gcc-patches

On 7/5/23 17:51, Marek Polacek wrote:

Running
$ make check-c++ RUNTESTFLAGS='--target_board=unix\{-gdwarf-5,-gdwarf-4\} 
dwarf2.exp=utf-1.C'
shows
FAIL: g++.dg/debug/dwarf2/utf-1.C  -std=gnu++20  scan-assembler-times 
DW_AT_encoding \\(0x10\\) 3
because with -gdwarf-4 the output is:

   .byte   0x10# DW_AT_encoding

but with -gdwarf-5 the output is the expected:

 # DW_AT_encoding (0x10)

The difference is caused by the DWARF5 optimize_implicit_const
optimization:


I suppose we could do what testsuite/rust/debug/chartype.rs does
and just run the test with -gdwarf-4.

Tested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/testsuite/ChangeLog:

* g++.dg/debug/dwarf2/utf-1.C: Use -gdwarf-4.  Adjust expected
output.
---
  gcc/testsuite/g++.dg/debug/dwarf2/utf-1.C | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/utf-1.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/utf-1.C
index 43b354f1bb5..0ce4d8727d6 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/utf-1.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/utf-1.C
@@ -1,8 +1,13 @@
  // { dg-do compile { target c++20 } }
-// { dg-options { -gdwarf -dA } }
+// { dg-options { -gdwarf-4 -dA } }
  
  // Test that all three use DW_ATE_UTF.

-// { dg-final { scan-assembler-times {DW_AT_encoding \(0x10\)} 3 } }
+// This test uses -gdwarf-4 since in DWARF5 optimize_implicit_const
+// would optimize the output from:
+//   .byte   0x10# DW_AT_encoding
+// into:
+//   # DW_AT_encoding (0x10)
+// { dg-final { scan-assembler-times "0x10\[ \t]\[^\n\r]* DW_AT_encoding" 3 } }
  
  char8_t c8;

  char16_t c16;

base-commit: be240fc6acc9714e66afbfbe6dc193844bfcba05




  1   2   3   >