Re: [PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general

2023-11-07 Thread Richard Biener
On Wed, Nov 8, 2023 at 4:48 AM Lehua Ding  wrote:
>
> This patch does not make any functional changes. It mainly refactors two parts:
>
> 1. The ira_allocno's objects field is expanded to a scalable array, and
>    multi-word pseudo registers are split and tracked only when necessary.
> 2. Since the objects array has been expanded, more subreg objects will
>    pass through later, rather than the previous fixed two. Therefore, it
>    is necessary to modify the detection of whether two objects conflict;
>    the check pulls the registers occupied by the object back to the
>    first register of the allocno for the comparison.

Did you profile this before/after?  RA performance is critical ...

> gcc/ChangeLog:
>
> * hard-reg-set.h (struct HARD_REG_SET): Add operator>>.
> * ira-build.cc (init_object_start_and_nregs): New func.
> (find_object): Ditto.
> (ira_create_allocno): Adjust.
> (ira_set_allocno_class): Set subreg info.
> (ira_create_allocno_objects): Adjust.
> (init_regs_with_subreg): Collect access in subreg.
> (ira_build): Call init_regs_with_subreg
> (ira_destroy): Clear regs_with_subreg
> * ira-color.cc (setup_profitable_hard_regs): Adjust.
> (get_conflict_and_start_profitable_regs): Adjust.
> (check_hard_reg_p): Adjust.
> (assign_hard_reg): Adjust.
> (improve_allocation): Adjust.
> * ira-int.h (struct ira_object): Adjust fields.
> (struct ira_allocno): Adjust objects field.
> (ALLOCNO_NUM_OBJECTS): Adjust.
> (ALLOCNO_UNIT_SIZE): New.
> (ALLOCNO_TRACK_SUBREG_P): New.
> (ALLOCNO_NREGS): New.
> (OBJECT_SIZE): New.
> (OBJECT_OFFSET): New.
> (OBJECT_START): New.
> (OBJECT_NREGS): New.
> (find_object): New.
> (has_subreg_object_p): New.
> (get_full_object): New.
> * ira.cc (check_allocation): Adjust.
>
> ---
>  gcc/hard-reg-set.h |  33 +++
>  gcc/ira-build.cc   | 106 +++-
>  gcc/ira-color.cc   | 234 ++---
>  gcc/ira-int.h  |  45 -
>  gcc/ira.cc |  52 --
>  5 files changed, 349 insertions(+), 121 deletions(-)
>
> diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
> index b0bb9bce074..760eadba186 100644
> --- a/gcc/hard-reg-set.h
> +++ b/gcc/hard-reg-set.h
> @@ -113,6 +113,39 @@ struct HARD_REG_SET
>  return !operator== (other);
>}
>
> +  HARD_REG_SET
> +  operator>> (unsigned int shift_amount) const

This is a quite costly operation, why do we need it instead
of keeping an "offset" for set queries?
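
To make the trade-off in the question above concrete, here is a minimal
sketch, not taken from the patch. RegSet is a hypothetical two-word stand-in
for HARD_REG_SET, and shift_right and test_with_offset are illustrative
names. The shift builds a whole new set, touching every word; the offset
query reads a single bit and leaves the set untouched.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HARD_REG_SET: 128 register bits in two words.
struct RegSet
{
  static constexpr unsigned bits_per_elt = 64;
  static constexpr unsigned num_elts = 2;
  static constexpr unsigned num_regs = bits_per_elt * num_elts;

  std::array<std::uint64_t, num_elts> elts{};

  void set (unsigned regno)
  {
    assert (regno < num_regs);
    elts[regno / bits_per_elt] |= std::uint64_t{1} << (regno % bits_per_elt);
  }

  bool test (unsigned regno) const
  {
    if (regno >= num_regs)
      return false;
    return ((elts[regno / bits_per_elt] >> (regno % bits_per_elt)) & 1) != 0;
  }
};

// Multi-word logical right shift: bit N of the result is bit N + amount
// of SRC.  Every word of the result has to be written.
RegSet shift_right (const RegSet &src, unsigned amount)
{
  RegSet res;
  const unsigned word_shift = amount / RegSet::bits_per_elt;
  const unsigned bit_shift = amount % RegSet::bits_per_elt;
  for (unsigned i = 0; i + word_shift < RegSet::num_elts; ++i)
    {
      std::uint64_t lo = src.elts[i + word_shift] >> bit_shift;
      std::uint64_t hi = 0;
      // Bits carried down from the next higher word, if there is one.
      if (bit_shift != 0 && i + word_shift + 1 < RegSet::num_elts)
        hi = src.elts[i + word_shift + 1]
             << (RegSet::bits_per_elt - bit_shift);
      res.elts[i] = lo | hi;
    }
  return res;
}

// Offset alternative: answer the same question with one bit lookup.
bool test_with_offset (const RegSet &set, unsigned regno, unsigned offset)
{
  return set.test (regno + offset);
}
```

For any set s and register r, shift_right (s, k).test (r) and
test_with_offset (s, r, k) agree; only the amount of work differs.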

> +  {
> +if (shift_amount == 0)
> +  return *this;
> +
> +HARD_REG_SET res;
> +unsigned int total_bits = sizeof (HARD_REG_ELT_TYPE) * 8;
> +if (shift_amount >= total_bits)
> +  {
> +   unsigned int n_elt = shift_amount % total_bits;
> +   shift_amount -= n_elt * total_bits;
> +   for (unsigned int i = 0; i < ARRAY_SIZE (elts) - n_elt - 1; i += 1)
> + res.elts[i] = elts[i + n_elt];
> +   /* clear upper n_elt elements.  */
> +   for (unsigned int i = 0; i < n_elt; i += 1)
> + res.elts[ARRAY_SIZE (elts) - 1 - i] = 0;
> +  }
> +
> +if (shift_amount > 0)
> +  {
> +   /* The left bits of an element be shifted.  */
> +   HARD_REG_ELT_TYPE left = 0;
> +   /* Total bits of an element.  */
> +   for (int i = ARRAY_SIZE (elts); i >= 0; --i)
> + {
> +   res.elts[i] = (elts[i] >> shift_amount) | left;
> +   left = elts[i] << (total_bits - shift_amount);
> + }
> +  }
> +return res;
> +  }
> +
>HARD_REG_ELT_TYPE elts[HARD_REG_SET_LONGS];
>  };
>  typedef const HARD_REG_SET &const_hard_reg_set;
> diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
> index 93e46033170..07aba27c1c9 100644
> --- a/gcc/ira-build.cc
> +++ b/gcc/ira-build.cc
> @@ -440,6 +440,40 @@ initiate_allocnos (void)
>memset (ira_regno_allocno_map, 0, max_reg_num () * sizeof (ira_allocno_t));
>  }
>
> +/* Update OBJ's start and nregs field according A and OBJ info.  */
> +static void
> +init_object_start_and_nregs (ira_allocno_t a, ira_object_t obj)
> +{
> +  enum reg_class aclass = ALLOCNO_CLASS (a);
> +  gcc_assert (aclass != NO_REGS);
> +
> +  machine_mode mode = ALLOCNO_MODE (a);
> +  int nregs = ira_reg_class_max_nregs[aclass][mode];
> +  if (ALLOCNO_TRACK_SUBREG_P (a))
> +{
> +  poly_int64 end = OBJECT_OFFSET (obj) + OBJECT_SIZE (obj);
> +  for (int i = 0; i < nregs; i += 1)
> +   {
> + poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
> + if (OBJECT_START (obj) < 0 && maybe_lt (OBJECT_OFFSET (obj), right))
> +   {
> + OBJECT_START (obj) = i;
> +   }
> + if (OBJECT_NREGS (obj) < 0 && maybe_le (end, right))
> +   {
> + OBJECT_NREGS (obj) = i + 1 - OBJECT_START (obj);
> + break;
> + 

Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-07 Thread Richard Biener
On Wed, Nov 8, 2023 at 2:18 AM Hongtao Liu  wrote:
>
> On Tue, Nov 7, 2023 at 10:34 PM Richard Biener
>  wrote:
> >
> > On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu  wrote:
> > >
> > > On Tue, Nov 7, 2023 at 4:10 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Nov 7, 2023 at 7:08 AM liuhongt  wrote:
> > > > >
> > > > > analyze_and_compute_bitop_with_inv_effect assumes the first operand is
> > > > > loop invariant which is not the case when it's INTEGER_CST.
> > > > >
> > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > > Ok for trunk?
> > > >
> > > > So this addresses a missed optimization, right?  It seems to me that
> > > > even with two SSA names we are only "lucky" when rhs1 is the invariant
> > > > one.  So instead of swapping this way I'd do
> > > > Yes, it's a missed optimization.
> > > > And I think expr_invariant_in_loop_p (loop, match_op[1]) should be
> > > > enough: if match_op[1] is a loop invariant, it must be false for the
> > > > below conditions (there couldn't be any header_phi from its
> > > > definition).
> >
> > Yes, all I said is that when you now care for op1 being INTEGER_CST
> > it could also be an invariant SSA name and thus only after swapping op0/op1
> > we could have a successful match, no?
> Sorry, the commit message is a little bit misleading.
> At first, I just wanted to handle the INTEGER_CST case (with TREE_CODE
> (match_op[1]) == INTEGER_CST), but then I realized that this could
> probably be extended to the normal SSA_NAME case as well, so I used
> expr_invariant_in_loop_p, which should theoretically be able to handle
> the SSA_NAME case as well.
>
> if (expr_invariant_in_loop_p (loop, match_op[1])) is true, w/o
> swapping it must return NULL_TREE for below conditions.
> if (expr_invariant_in_loop_p (loop, match_op[1])) is false, w/
> swapping it must return NULL_TREE too.
> So it can cover the both cases you mentioned, no need for a loop to
> iterate 2 match_ops for all conditions.

Sorry if it appears we're going in circles ;)

> 3692  if (TREE_CODE (match_op[1]) != SSA_NAME
> 3693  || !expr_invariant_in_loop_p (loop, match_op[0])
> 3694  || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[1])))

but this only checks match_op[1] (an SSA name at this point) for being defined
by the header PHI.  What if expr_invariant_in_loop_p (loop, match_op[1])
and header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[0])),
which I think can happen when both ops are SSA names?

The only canonicalization we have is that constant operands are put second so
it would have been more natural to write the matching with the other operand
order (but likely you'd have been unlucky for the existing testcases then).
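
The "try both operand orders" idea can be sketched outside of GCC as
follows. This is only an illustration: Operand and its three flags are
hypothetical stand-ins for the TREE_CODE, expr_invariant_in_loop_p and
header-PHI checks discussed above, and canonicalize_operands is not a
function from the patch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical summary of what the real checks would report per operand.
struct Operand
{
  bool is_ssa_name;            // stands in for TREE_CODE (op) == SSA_NAME
  bool loop_invariant;         // stands in for expr_invariant_in_loop_p
  bool defined_by_header_phi;  // stands in for the header PHI test
};

// Try both operand orders.  On success the pair is canonicalized so that
// ops[0] is the invariant and ops[1] is the header-PHI result, and true
// is returned.  On failure the pair is left untouched.
bool canonicalize_operands (Operand (&ops)[2])
{
  for (unsigned i = 0; i < 2; ++i)
    {
      const Operand &phi_cand = ops[i];
      const Operand &inv_cand = ops[1 - i];
      if (phi_cand.is_ssa_name
          && phi_cand.defined_by_header_phi
          && inv_cand.loop_invariant)
        {
          if (i == 0)
            std::swap (ops[0], ops[1]);
          return true;
        }
    }
  return false;
}
```

With this shape a constant in either position and a "swapped" pair of SSA
names go through the same code path.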

> 3695  || gimple_bb (header_phi) != loop->header
> 3696  || gimple_phi_num_args (header_phi) != 2)
> 3697return NULL_TREE;
> 3698
> 3699  if (PHI_ARG_DEF_FROM_EDGE (header_phi, loop_latch_edge (loop)) != 
> phidef)
> 3700return NULL_TREE;
>
>
> >
> > > >
> > > >  unsigned i;
> > > >  for (i = 0; i < 2; ++i)
> > > >if (TREE_CODE (match_op[i]) == SSA_NAME
> > > >&& ...)
> > > > break; /* found! */
> > > >
> > > >   if (i == 2)
> > > > return NULL_TREE;
> > > >   if (i == 0)
> > > > std::swap (match_op[0], match_op[1]);
> > > >
> > > > to also handle a "swapped" pair of SSA names?
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR tree-optimization/105735
> > > > > PR tree-optimization/111972
> > > > > * tree-scalar-evolution.cc
> > > > > (analyze_and_compute_bitop_with_inv_effect): Handle bitop with
> > > > > INTEGER_CST.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/i386/pr105735-3.c: New test.
> > > > > ---
> > > > >  gcc/testsuite/gcc.target/i386/pr105735-3.c | 87 
> > > > > ++
> > > > >  gcc/tree-scalar-evolution.cc   |  3 +
> > > > >  2 files changed, 90 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-3.c
> > > > >
> > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105735-3.c 
> > > > > b/gcc/testsuite/gcc.target/i386/pr105735-3.c
> > > > > new file mode 100644
> > > > > index 000..9e268a1a997
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/i386/pr105735-3.c
> > > > > @@ -0,0 +1,87 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O1 -fdump-tree-sccp-details" } */
> > > > > +/* { dg-final { scan-tree-dump-times {final value replacement} 8 
> > > > > "sccp" } } */
> > > > > +
> > > > > +unsigned int
> > > > > +__attribute__((noipa))
> > > > > +foo (unsigned int tmp)
> > > > > +{
> > > > > +  for (int bit = 0; bit < 64; bit++)
> > > > > +tmp &= 11304;
> > > > > +  return tmp;
> > > > > +}
> > > > > +
> > > > > +unsigned int
> > > > > +__attribute__((noipa))
> > > > > +foo1 (unsigned int tmp)
> > > > > +{
> > > > > +  for (int bit = 63; bit >= 0; bit -=3)
> > > > > +tmp &= 11304;
> > > > > +  return tmp;
> > > > > 

RE: [PATCH V2] test: Fix bb-slp-33.c for RVV

2023-11-07 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, November 8, 2023 2:58 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH V2] test: Fix bb-slp-33.c for RVV

On Tue, 7 Nov 2023, Juzhe-Zhong wrote:

> gcc/testsuite/ChangeLog:

OK.

>   * gcc.dg/vect/bb-slp-33.c: Rewrite the condition.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-33.c | 35 ---
>  1 file changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> index bbb13ef798e..74af8dd27ae 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> @@ -32,16 +32,33 @@ int main()
>a[4] = 7;
>check_vect ();
>test(a, b);
> -  if (a[0] != 1
> -  || a[1] != 2
> -  || a[2] != 3
> -  || a[3] != 4
> -  || a[4] != 7
> -  || a[5] != 0
> -  || a[6] != 0
> -  || a[7] != 0
> -  || a[8] != 0)
> +  if (a[0] != 1)
>  abort ();
> +  __asm__ volatile ("");
> +  if (a[1] != 2)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[2] != 3)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[3] != 4)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[4] != 7)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[5] != 0)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[6] != 0)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[7] != 0)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[8] != 0)
> +abort ();
> +  __asm__ volatile ("");
>return 0;
>  }
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] [i386] APX: Fix ICE due to movti postreload splitter [PR112394]

2023-11-07 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 3:33 PM Hongyu Wang  wrote:
>
> Hi,
>
> When APX EGPR is enabled, the TImode move pattern *movti_internal allows
> moves between gpr and sse reg using the constraint pair ("r","Yd"). A
> post-reload splitter then transforms such a move to vec_extractv2di, but under
> -msse4.1 -mno-avx EGPR is not allowed for its enabled alternative, which
> causes an ICE because the insn does not match its constraints. To prevent the
> ICE, we need to adjust the constraint that corresponds to "Yd". Add a new "jc"
> constraint to disable EGPR under -mno-avx.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>
> OK for trunk?
LGTM.
>
> gcc/ChangeLog:
>
> PR target/112394
> * config/i386/constraints.md (jc): New constraint that prohibits
> EGPR on -mno-avx.
> * config/i386/i386.md (*movdi_internal): Change the r constraint
> that corresponds to Yd to jc.
> (*movti_internal): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/112394
> * gcc.target/i386/pr112394.c: New test.
> ---
>  gcc/config/i386/constraints.md   |  3 +++
>  gcc/config/i386/i386.md  |  8 
>  gcc/testsuite/gcc.target/i386/pr112394.c | 24 
>  3 files changed, 31 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr112394.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index f6275740eb2..74c2f0f2d32 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -434,3 +434,6 @@ (define_address_constraint "jb"
>(and (match_operand 0 "vsib_address_operand")
> (not (and (match_test "TARGET_APX_EGPR")
>  (match_test "x86_extended_rex2reg_mentioned_p (op)")
> +
> +(define_register_constraint  "jc"
> + "TARGET_APX_EGPR && !TARGET_AVX ? GENERAL_GPR16 : GENERAL_REGS")
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index ecc74e9994e..ec39c2dd512 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2454,8 +2454,8 @@ (define_insn "*movoi_internal_avx"
> (set_attr "mode" "OI")])
>
>  (define_insn "*movti_internal"
> -  [(set (match_operand:TI 0 "nonimmediate_operand" "=!r ,o ,v,v ,v 
> ,m,?r,?Yd")
> -   (match_operand:TI 1 "general_operand"  "riFo,re,C,BC,vm,v,Yd,r"))]
> +  [(set (match_operand:TI 0 "nonimmediate_operand" "=!r ,o ,v,v ,v 
> ,m,?jc,?Yd")
> +   (match_operand:TI 1 "general_operand"  
> "riFo,re,C,BC,vm,v,Yd,jc"))]
>"(TARGET_64BIT
>  && !(MEM_P (operands[0]) && MEM_P (operands[1])))
> || (TARGET_SSE
> @@ -2537,9 +2537,9 @@ (define_split
>
>  (define_insn "*movdi_internal"
>[(set (match_operand:DI 0 "nonimmediate_operand"
> -"=r  ,o  ,r,r  ,r,m ,*y,*y,?*y,?m,?r,?*y,?Yv,?v,?v,m ,m,?r 
> ,?*Yd,?r,?v,?*y,?*x,*k,*k  ,*r,*m,*k")
> +"=r  ,o  ,r,r  ,r,m ,*y,*y,?*y,?m,?r,?*y,?Yv,?v,?v,m 
> ,m,?jc,?*Yd,?r,?v,?*y,?*x,*k,*k  ,*r,*m,*k")
> (match_operand:DI 1 "general_operand"
> -"riFo,riF,Z,rem,i,re,C ,*y,Bk ,*y,*y,r  ,C  ,?v,Bk,?v,v,*Yd,r   ,?v,r  
> ,*x ,*y ,*r,*kBk,*k,*k,CBC"))]
> +"riFo,riF,Z,rem,i,re,C ,*y,Bk ,*y,*y,r  ,C  ,?v,Bk,?v,v,*Yd,jc  ,?v,r  
> ,*x ,*y ,*r,*kBk,*k,*k,CBC"))]
>"!(MEM_P (operands[0]) && MEM_P (operands[1]))
> && ix86_hardreg_mov_ok (operands[0], operands[1])"
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/pr112394.c 
> b/gcc/testsuite/gcc.target/i386/pr112394.c
> new file mode 100644
> index 000..c582f6ea6bd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr112394.c
> @@ -0,0 +1,24 @@
> +/* PR target/112394 */
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-msse4.1 -m64 -O -mapxf" } */
> +
> +typedef int __attribute__((__vector_size__ (8))) A;
> +typedef int __attribute__((__vector_size__ (16))) B;
> +typedef char __attribute__((__vector_size__ (4))) C;
> +typedef char __attribute__((__vector_size__ (32))) D;
> +typedef _Complex __int128 CU;
> +typedef _Float16 __attribute__((__vector_size__ (8))) F;
> +D d;
> +B b;
> +CU gcu;
> +
> +int
> +foo (char c, int, int, int, int, CU cu, int x)
> +{
> +  d /= c | d;
> +  F f = __builtin_convertvector (b, F);
> +  cu /= gcu;
> +  A a = (A) f;
> +  int i = cu + x;
> +  return ((C) a[0])[1] + i + c;
> +}
> --
> 2.31.1
>


-- 
BR,
Hongtao


Re: [PATCH v2] c: Add -Wreturn-mismatch warning, split from -Wreturn-type

2023-11-07 Thread Florian Weimer
* Florian Weimer:

> The existing -Wreturn-type option covers both constraint violations
> (which are mandatory to diagnose) and warnings that have known
> false positives.  The new -Wreturn-mismatch warning is only about
> the constraint violations (missing or extra return expressions),
> and should eventually be turned into a permerror.
>
> The -std=gnu89 test cases show that by default, we do not warn for
> return; in a function not returning void.  This matches previous
> practice for -Wreturn-type.
>
> gcc/c-family/
>
>   * c.opt (Wreturn-mismatch): New.
>
> gcc/c/
>
>   * c-typeck.cc (c_finish_return): Use pedwarn with
>   OPT_Wreturn_mismatch for missing/extra return expressions.
>
> gcc/
>
>   * doc/invoke.texi (Warning Options): Document
>   -Wreturn-mismatch.  Update -Wreturn-type documentation.
>
> gcc/testsuite/
>
>   * gcc.dg/Wreturn-mismatch-1.c: New.
>   * gcc.dg/Wreturn-mismatch-2.c: New.
>   * gcc.dg/Wreturn-mismatch-3.c: New.
>   * gcc.dg/Wreturn-mismatch-4.c: New.
>   * gcc.dg/Wreturn-mismatch-5.c: New.
>   * gcc.dg/Wreturn-mismatch-6.c: New.
>   * gcc.dg/noncompile/pr55976-1.c: Change -Werror=return-type
>   to -Werror=return-mismatch.
>   * gcc.dg/noncompile/pr55976-2.c: Likewise.
>
> ---
> v2: Update comment in gcc.dg/noncompile/pr55976-2.c.  Do not produce
> an error in C90 pedantic-error mode for return; in a function
> returning non-void.  Add gcc.dg/Wreturn-mismatch-6.c to demonstrate
> this behavior.

Ping?  Original patch:



Thanks,
Florian



Re: [PATCH V2] test: Fix bb-slp-33.c for RVV

2023-11-07 Thread Richard Biener
On Tue, 7 Nov 2023, Juzhe-Zhong wrote:

> gcc/testsuite/ChangeLog:

OK.

>   * gcc.dg/vect/bb-slp-33.c: Rewrite the condition.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-33.c | 35 ---
>  1 file changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> index bbb13ef798e..74af8dd27ae 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> @@ -32,16 +32,33 @@ int main()
>a[4] = 7;
>check_vect ();
>test(a, b);
> -  if (a[0] != 1
> -  || a[1] != 2
> -  || a[2] != 3
> -  || a[3] != 4
> -  || a[4] != 7
> -  || a[5] != 0
> -  || a[6] != 0
> -  || a[7] != 0
> -  || a[8] != 0)
> +  if (a[0] != 1)
>  abort ();
> +  __asm__ volatile ("");
> +  if (a[1] != 2)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[2] != 3)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[3] != 4)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[4] != 7)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[5] != 0)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[6] != 0)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[7] != 0)
> +abort ();
> +  __asm__ volatile ("");
> +  if (a[8] != 0)
> +abort ();
> +  __asm__ volatile ("");
>return 0;
>  }
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] libgcc: Add {unsigned ,}__int128 <-> _Decimal{32,64,128} conversion support [PR65833]

2023-11-07 Thread Jakub Jelinek
Hi!

The following patch adds the missing
{unsigned ,}__int128 <-> _Decimal{32,64,128}
conversion support into libgcc.a on top of the _BitInt support
(doing it without that would be a larger amount of code and I hope all
the targets which support __int128 will eventually support _BitInt,
after all it is a required part of C23) and because it is in libgcc.a
only, it doesn't hurt that much if it is added for some architectures
only in GCC 15.
Initially I thought about doing this on the compiler side, but doing
it on the library side seems to be easier and more -Os friendly.
The tests currently require bitint effective target, that can be
removed when all the int128 targets support bitint.
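
As a rough illustration of what routing __int128 through the _BitInt
helpers involves, the sketch below splits a 128-bit value into 32-bit limbs
and joins them again. It is not the libgcc code: the names are made up, the
limb order is fixed to least significant first (an assumption; the patch
itself selects the order with BITINT_END), and it relies on the __int128
extension being available.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Int128 = __int128;
using UInt128 = unsigned __int128;

// Split VALUE into four 32-bit limbs, least significant limb first.
// The two's complement bit pattern is preserved, so negative values
// simply come out with their upper limbs filled with ones.
std::array<std::uint32_t, 4> split_limbs32 (Int128 value)
{
  const UInt128 bits = static_cast<UInt128> (value);
  std::array<std::uint32_t, 4> limbs{};
  for (unsigned i = 0; i < 4; ++i)
    limbs[i] = static_cast<std::uint32_t> (bits >> (32 * i));
  return limbs;
}

// Inverse of split_limbs32: rebuild the 128-bit value from its limbs.
Int128 join_limbs32 (const std::array<std::uint32_t, 4> &limbs)
{
  UInt128 bits = 0;
  for (unsigned i = 0; i < 4; ++i)
    bits |= static_cast<UInt128> (limbs[i]) << (32 * i);
  return static_cast<Int128> (bits);
}
```

Once the value is in limb form, a single generic limb-array conversion
routine can serve every width, which is why the wrapper functions in the
patch stay so small.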

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-08  Jakub Jelinek  

PR libgcc/65833
libgcc/
* config/t-softfp (softfp_bid_list): Add
{U,}TItype <-> _Decimal{32,64,128} conversions.
* soft-fp/floattisd.c: New file.
* soft-fp/floattidd.c: New file.
* soft-fp/floattitd.c: New file.
* soft-fp/floatuntisd.c: New file.
* soft-fp/floatuntidd.c: New file.
* soft-fp/floatuntitd.c: New file.
* soft-fp/fixsdti.c: New file.
* soft-fp/fixddti.c: New file.
* soft-fp/fixtdti.c: New file.
* soft-fp/fixunssdti.c: New file.
* soft-fp/fixunsddti.c: New file.
* soft-fp/fixunstdti.c: New file.
gcc/testsuite/
* gcc.dg/dfp/int128-1.c: New test.
* gcc.dg/dfp/int128-2.c: New test.
* gcc.dg/dfp/int128-3.c: New test.
* gcc.dg/dfp/int128-4.c: New test.

--- libgcc/config/t-softfp.jj   2023-09-08 11:29:20.142767499 +0200
+++ libgcc/config/t-softfp  2023-11-06 10:55:19.117642736 +0100
@@ -69,7 +69,9 @@ softfp_bid_list :=
 ifeq ($(decimal_float),yes)
 ifeq ($(enable_decimal_float),bid)
 softfp_bid_list += bitintpow10 \
-  $(foreach m,sd dd td,fix$(m)bitint floatbitint$(m))
+  $(foreach m,sd dd td,fix$(m)bitint floatbitint$(m) \
+   fix$(m)ti fixuns$(m)ti \
+   floatti$(m) floatunti$(m))
 endif
 endif
 
--- libgcc/soft-fp/floattisd.c.jj   2023-11-06 09:58:23.431361481 +0100
+++ libgcc/soft-fp/floattisd.c  2023-11-06 09:58:56.149904156 +0100
@@ -0,0 +1,53 @@
+/* Software floating-point emulation.
+   Convert a 128bit signed integer to _Decimal32.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#include "soft-fp.h"
+#include "bitint.h"
+
+#if defined(__BITINT_MAXWIDTH__) && defined(__SIZEOF_INT128__)
+extern _Decimal32 __bid_floatbitintsd (const UBILtype *, SItype);
+extern _Decimal32 __bid_floattisd (TItype);
+
+_Decimal32
+__bid_floattisd (TItype i)
+{
+  UBILtype ib[128 / BIL_TYPE_SIZE];
+#if BIL_TYPE_SIZE == 128
+  ib[0] = i;
+#elif BIL_TYPE_SIZE == 64
+  ib[BITINT_END (0, 1)] = i >> 64;
+  ib[BITINT_END (1, 0)] = i;
+#elif BIL_TYPE_SIZE == 32
+  ib[BITINT_END (0, 3)] = i >> 96;
+  ib[BITINT_END (1, 2)] = i >> 64;
+  ib[BITINT_END (2, 1)] = i >> 32;
+  ib[BITINT_END (3, 0)] = i;
+#else
+#error Unsupported UBILtype
+#endif
+  return __bid_floatbitintsd (ib, -128);
+}
+#endif
--- libgcc/soft-fp/floattidd.c.jj   2023-11-06 09:50:17.991146599 +0100
+++ libgcc/soft-fp/floattidd.c  2023-11-06 09:57:58.283712969 +0100
@@ -0,0 +1,53 @@
+/* Software floating-point emulation.
+   Convert a 128bit signed integer to _Decimal64.
+
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime 

Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-11-07 Thread Kito Cheng
On Wed, Nov 8, 2023 at 2:37 PM juzhe.zh...@rivai.ai
 wrote:
>
> Another question occurs to me.
>
> Is it necessary to have so many variants of vsetvl?
>
> I am thinking about a redesign:
>
> __riscv_vsetvl_e8mf8
> __riscv_vsetvl_e16mf4
> __riscv_vsetvl_e32mf2
> __riscv_vsetvl_e64m1
>
> They are quite redundant: they all give the same result.
>
> Maybe they could just be designed as:
>
> __riscv_vsetvl_ratio64
>
> I am not proposing it, since the current interface has been in use for a long
> time. I am just raising my concern.

Yeah, I agree those variants have the same behavior, and even the same
semantics, in the current intrinsic model. One reason is that we didn't
have a smart vsetvli insertion pass at the design stage; it is also more
obvious to the user which vsetvli intrinsic to pick. However, I don't
intend to change that interface. The reason is simple: as you mentioned,
it has been in use for a long time, and changing it would be a huge
disturbance.
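
The redundancy is easy to check by hand: the four intrinsics named above
all have the same SEW/LMUL ratio, and VLMAX depends only on VLEN and that
ratio. A small sketch follows; the function names are made up for
illustration, and LMUL is passed as a numerator/denominator pair.

```cpp
#include <cassert>

// SEW/LMUL ratio, with LMUL given as lmul_num / lmul_den
// (mf8 is 1/8, m1 is 1/1, and so on).
unsigned sew_lmul_ratio (unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  assert (sew != 0 && lmul_num != 0 && lmul_den != 0);
  return sew * lmul_den / lmul_num;
}

// VLMAX = VLEN * LMUL / SEW = VLEN / (SEW/LMUL ratio).
unsigned vlmax (unsigned vlen, unsigned sew, unsigned lmul_num,
                unsigned lmul_den)
{
  return vlen / sew_lmul_ratio (sew, lmul_num, lmul_den);
}
```

e8mf8, e16mf4, e32mf2 and e64m1 all give ratio 64, so for VLEN = 128 each
of them has VLMAX = 2. Likewise e32m1 and e16mf2 both give ratio 32.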

The same argument may apply to the vbool* types, but vbool* is partly
there for historical reasons* and also because we haven't found a better
way to model it.

* We defined MLEN in the v-spec a long time ago; I forget whether it was 0.7 or 0.8.


Re: [PATCH] RISC-V: Nan-box the result of movhf on soft-fp16

2023-11-07 Thread Kito Cheng
Thanks for the patch!! We also hit the same issue in internal testing
and are still trying to figure out how to resolve it. This patch is
a little bit magic, so let me take a closer look.. :P

On Wed, Nov 8, 2023 at 11:08 AM KuanLin Chen  wrote:
>
>  According to the spec, fmv.h checks whether the input operands are correctly
>  NaN-boxed. If not, the input value is treated as an n-bit canonical NaN.
>  This patch fixes the issue that operands returned by soft-fp16 libgcc
>  (i.e., __truncdfhf2) were not correctly NaN-boxed.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_legitimize_move): Expand movhf
> with NaN-boxed value.
>
> * config/riscv/riscv.md (*movhf_softfloat_unspec): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/_Float16-nanboxing.c: New test.
>
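
For readers unfamiliar with NaN-boxing, here is a minimal sketch of the
rule described in the quoted text. It assumes a 32-bit FP register
(FLEN = 32) and works on raw bit patterns; the function names are made up
and this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Bit pattern of the canonical quiet NaN in IEEE half precision.
constexpr std::uint16_t canonical_nan_half = 0x7E00;

// Box a 16-bit value into a 32-bit register image: upper bits all ones.
std::uint32_t nan_box_half (std::uint16_t half_bits)
{
  return 0xFFFF0000u | half_bits;
}

// A register image holds a valid boxed half only if the upper 16 bits
// are all ones.
bool is_nan_boxed_half (std::uint32_t reg_bits)
{
  return (reg_bits >> 16) == 0xFFFFu;
}

// What a consumer that checks boxing sees: the low 16 bits when the value
// is correctly boxed, the canonical NaN otherwise.
std::uint16_t read_half (std::uint32_t reg_bits)
{
  if (!is_nan_boxed_half (reg_bits))
    return canonical_nan_half;
  return static_cast<std::uint16_t> (reg_bits & 0xFFFFu);
}
```

A half value that arrives zero-extended, e.g. 0x00003C00 for 1.0, would
therefore be read back as a NaN instead of 1.0, which matches the kind of
failure the patch description is about.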


Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-11-07 Thread juzhe.zh...@rivai.ai
Another question occurs to me.

Is it necessary to have so many variants of vsetvl?

I am thinking about a redesign:

__riscv_vsetvl_e8mf8
__riscv_vsetvl_e16mf4
__riscv_vsetvl_e32mf2
__riscv_vsetvl_e64m1

They are quite redundant: they all give the same result.

Maybe they could just be designed as:

__riscv_vsetvl_ratio64

I am not proposing it, since the current interface has been in use for a long
time. I am just raising my concern.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-11-08 14:33
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
OK, then LGTM, thanks for the explanation :)
 
On Wed, Nov 8, 2023 at 2:33 PM juzhe.zh...@rivai.ai
 wrote:
>
> More details:
>
> bb 1   bb 2
>    \   /
>     bb 3
>
> The VSETVL pass can only do VSETVL demand fusion: it fuses the demand from
> bb 3 into bb 1, and the demand from bb 3 into bb 2.
> It cannot remove blocks bb 1 and bb 2 and create a new bb 4 to hold the
> vsetvl when bb 1 and bb 2 have the same vsetvl:
>
> bb 4 (new block)
>   |
> bb 3
>
> I don't think we should do this in the VSETVL pass.
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-11-08 14:16
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
> I thought vsetvli insertion would try to merge them into one for those
> cases? Could you explain a bit more why they are not fused now?
> Not an objection, since I can imagine that this would be easier to
> process; just wondering why.
>
> On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong  wrote:
> >
> > Our user vsetvl intrinsics are defined to just calculate the VL output,
> > which is the number of elements to be processed. Such intrinsics do not
> > have any side effects.  We should normalize them when they have the same ratio.
> >
> > E.g. the __riscv_vsetvl_e8mf8 result is the same as __riscv_vsetvl_e64m1.
> >
> > Normalizing them allows us to have better codegen.
> > Consider the following example:
> >
> > #include "riscv_vector.h"
> >
> > void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, 
> > int cond, int avl) {
> >
> >   size_t vl;
> >   if (cond)
> > vl = __riscv_vsetvl_e32m1(avl);
> >   else
> > vl = __riscv_vsetvl_e16mf2(avl);
> >   for (size_t i = 0; i < n; i += 1) {
> > vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
> > vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
> > vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
> > __riscv_vse32_v_i32m1(out, c, vl);
> >   }
> > }
> >
> > Before this patch:
> >
> > foo:
> > beq a5,zero,.L2
> > vsetvli a6,a6,e32,m1,tu,ma
> > .L3:
> > li  a5,0
> > beq a4,zero,.L9
> > .L4:
> > vle32.v v1,0(a0)
> > addi a5,a5,1
> > vle32.v v1,0(a1)
> > vle32.v v1,0(a2)
> > vse32.v v1,0(a3)
> > bne a4,a5,.L4
> > .L9:
> > ret
> > .L2:
> > vsetvli zero,a6,e32,m1,tu,ma
> > j   .L3
> >
> > After this patch:
> >
> > foo:
> > li  a5,0
> > vsetvli zero,a6,e32,m1,tu,ma
> > beq a4,zero,.L9
> > .L4:
> > vle32.v v1,0(a0)
> > addi a5,a5,1
> > vle32.v v1,0(a1)
> > vle32.v v1,0(a2)
> > vse32.v v1,0(a3)
> > bne a4,a5,.L4
> > .L9:
> > ret
> >
> > PR target/112092
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-vector-builtins-bases.cc: Normalize the 
> > vsetvls.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test.
> > * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test.
> > * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test.
> >
> > ---
> >  .../riscv/riscv-vector-builtins-bases.cc  | 24 +-
> >  .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  |  2 +-
> >  .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  |  3 +--
> >  .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c  | 25 +++
> >  .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c  | 25 +++
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c   |  2 +-
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c   |  2 +-
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  2 +-
> >  .../riscv/rvv/vsetvl/vsetvlmax-13.c   |  4 +--
> >  

Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-11-07 Thread Kito Cheng
OK, then LGTM, thanks for the explanation :)

On Wed, Nov 8, 2023 at 2:33 PM juzhe.zh...@rivai.ai
 wrote:
>
> More details:
>
> bb 1   bb 2
>    \   /
>     bb 3
>
> The VSETVL pass can only do VSETVL demand fusion: it fuses the demand from
> bb 3 into bb 1, and the demand from bb 3 into bb 2.
> It cannot remove blocks bb 1 and bb 2 and create a new bb 4 to hold the
> vsetvl when bb 1 and bb 2 have the same vsetvl:
>
> bb 4 (new block)
>   |
> bb 3
>
> I don't think we should do this in the VSETVL pass.
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-11-08 14:16
> To: Juzhe-Zhong
> CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
> Subject: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
> I thought vsetvli insertion would try to merge them into one for those
> cases? Could you explain a bit more why they are not fused now?
> Not an objection, since I can imagine that this would be easier to
> process; just wondering why.
>
> On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong  wrote:
> >
> > Our user vsetvl intrinsics are defined to just calculate the VL output,
> > which is the number of elements to be processed. Such intrinsics do not
> > have any side effects.  We should normalize them when they have the same ratio.
> >
> > E.g. the __riscv_vsetvl_e8mf8 result is the same as __riscv_vsetvl_e64m1.
> >
> > Normalizing them allows us to have better codegen.
> > Consider the following example:
> >
> > #include "riscv_vector.h"
> >
> > void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, 
> > int cond, int avl) {
> >
> >   size_t vl;
> >   if (cond)
> > vl = __riscv_vsetvl_e32m1(avl);
> >   else
> > vl = __riscv_vsetvl_e16mf2(avl);
> >   for (size_t i = 0; i < n; i += 1) {
> > vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
> > vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
> > vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
> > __riscv_vse32_v_i32m1(out, c, vl);
> >   }
> > }
> >
> > Before this patch:
> >
> > foo:
> > beq a5,zero,.L2
> > vsetvli a6,a6,e32,m1,tu,ma
> > .L3:
> > li  a5,0
> > beq a4,zero,.L9
> > .L4:
> > vle32.v v1,0(a0)
> > addi a5,a5,1
> > vle32.v v1,0(a1)
> > vle32.v v1,0(a2)
> > vse32.v v1,0(a3)
> > bne a4,a5,.L4
> > .L9:
> > ret
> > .L2:
> > vsetvli zero,a6,e32,m1,tu,ma
> > j   .L3
> >
> > After this patch:
> >
> > foo:
> > li  a5,0
> > vsetvli zero,a6,e32,m1,tu,ma
> > beq a4,zero,.L9
> > .L4:
> > vle32.v v1,0(a0)
> > addi a5,a5,1
> > vle32.v v1,0(a1)
> > vle32.v v1,0(a2)
> > vse32.v v1,0(a3)
> > bne a4,a5,.L4
> > .L9:
> > ret
> >
> > PR target/112092
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-vector-builtins-bases.cc: Normalize the 
> > vsetvls.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test.
> > * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto.
> > * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test.
> > * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test.
> >
> > ---
> >  .../riscv/riscv-vector-builtins-bases.cc  | 24 +-
> >  .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  |  2 +-
> >  .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  |  3 +--
> >  .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c  | 25 +++
> >  .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c  | 25 +++
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c   |  2 +-
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c   |  2 +-
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  2 +-
> >  .../riscv/rvv/vsetvl/vsetvlmax-13.c   |  4 +--
> >  .../riscv/rvv/vsetvl/vsetvlmax-15.c   |  6 ++---
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c |  4 +--
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c |  2 +-
> >  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c |  4 +--
> >  13 files changed, 83 insertions(+), 22 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-2.c
> >
> > diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> > b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> > index 0298b7987a1..d70468542ee 100644
> > --- 

Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-11-07 Thread juzhe.zh...@rivai.ai
More details:

bb 1   bb 2
  \/
   bb 3

The VSETVL pass can only do VSETVL demand fusion: it fuses the demand from bb 3
into bb 1, and the demand from bb 3 into bb 2.
We are not able to remove blocks bb 1 and bb 2 and create a new bb 4 to hold the
vsetvl if bb 1 and bb 2 have the same vsetvl:

bb 4 (new block)
  |
bb 3

I don't think we should do this in the VSETVL pass.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-11-08 14:16
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
I thought vsetvli insertion will try to merge them into one for those
cases? Could you explain few more reasons why they are not fused now?
Not an objection since I could imageing that would be easier to
process, just wondering why.
 
On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong  wrote:
>
> Since our user vsetvl intrinsics are defined as just calculate the VL output
> which is the number of the elements to be processed. Such intrinsics do not
> have any side effects.  We should normalize them when they have same ratio.
>
> E.g __riscv_vsetvl_e8mf8 result is same as __riscv_vsetvl_e64m1.
>
> Normalize them can allow us have better codegen.
> Consider this following example:
>
> #include "riscv_vector.h"
>
> void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, 
> int cond, int avl) {
>
>   size_t vl;
>   if (cond)
> vl = __riscv_vsetvl_e32m1(avl);
>   else
> vl = __riscv_vsetvl_e16mf2(avl);
>   for (size_t i = 0; i < n; i += 1) {
> vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
> vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
> vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
> __riscv_vse32_v_i32m1(out, c, vl);
>   }
> }
>
> Before this patch:
>
> foo:
> beq a5,zero,.L2
> vsetvli a6,a6,e32,m1,tu,ma
> .L3:
> li  a5,0
> beq a4,zero,.L9
> .L4:
> vle32.v v1,0(a0)
> addi a5,a5,1
> vle32.v v1,0(a1)
> vle32.v v1,0(a2)
> vse32.v v1,0(a3)
> bne a4,a5,.L4
> .L9:
> ret
> .L2:
> vsetvli zero,a6,e32,m1,tu,ma
> j   .L3
>
> After this patch:
>
> foo:
> li  a5,0
> vsetvli zero,a6,e32,m1,tu,ma
> beq a4,zero,.L9
> .L4:
> vle32.v v1,0(a0)
> addi a5,a5,1
> vle32.v v1,0(a1)
> vle32.v v1,0(a2)
> vse32.v v1,0(a3)
> bne a4,a5,.L4
> .L9:
> ret
>
> PR target/112092
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Normalize the vsetvls.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test.
> * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test.
> * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test.
>
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  | 24 +-
>  .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  |  3 +--
>  .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c  | 25 +++
>  .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c  | 25 +++
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c   |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c   |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  2 +-
>  .../riscv/rvv/vsetvl/vsetvlmax-13.c   |  4 +--
>  .../riscv/rvv/vsetvl/vsetvlmax-15.c   |  6 ++---
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c |  4 +--
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c |  4 +--
>  13 files changed, 83 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-2.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 0298b7987a1..d70468542ee 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -131,19 +131,31 @@ public:
>
>  tree type = builtin_types[e.type.index].vector;
>  machine_mode mode = TYPE_MODE (type);
> -machine_mode inner_mode = GET_MODE_INNER (mode);
> +/* Normalize same RATO (SEW/LMUL) into same vsetvl instruction.
> +
> +- e8,mf8/e16,mf4/e32,mf2/e64,m1 --> e8mf8
> + 

Re: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-11-07 Thread juzhe.zh...@rivai.ai
Before the VSETVL pass, the code is as follows:

bb 1:
vsetvli e16mf2 -> set a6
bb 2:
vsetvli e32m1 -> set a6
bb 3:
...
vle (use a6) e32m1 TU
vle (use a6) e32m1 TU
vse (use a6) e32m1 TU

The VSETVL pass only does VSETVL information fusion; it doesn't do CFG block
fusion.

The VSETVL pass succeeds on the following fusion:

Change bb 1 vsetvli e16mf2 -> e32m1TU
Change bb 2 vsetvli e32m1 -> e32m1TU

But the VSETVL pass can't remove bb 1 and bb 2 and create a new block, say bb 4, to
hold vsetvli e32m1TU.

So you will see:
bb 1:
vsetvli e32m1TU
bb 2:
vsetvli e32m1TU
bb 3:
...
vle
vle
vse

With this patch, vsetvl e16mf2 and vsetvl e32m1 are normalized into the same
vsetvl e8mf4.
Then, before the VSETVL pass, we will see:

bb 1
vsetvli e8mf4
bb 2:
...
vle
vle
vse

Since the later vle/vle/vse use e32m1TU, the VSETVL pass fuses their demand into bb 1
and changes vsetvli e8mf4 into:

bb 1
vsetvli e32m1TU
bb 2:
...
vle
vle
vse
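
The normalization step described above can be sketched in C. This is only an illustration, not the code from the patch: the struct `vtype_cfg`, the function `normalize_to_e8`, and the log2 encoding of SEW and LMUL are made up for this example. It assumes, as the patch does, that two configurations with the same SEW/LMUL ratio can be represented by the e8 configuration with that ratio.

```c
#include <assert.h>

/* Hypothetical encoding, not GCC's: SEW and LMUL are stored as log2
   values (e8 = 3 ... e64 = 6, mf8 = -3 ... m8 = 3).  */
struct vtype_cfg
{
  int sew_log2;
  int lmul_log2;
};

/* Return the e8 configuration that has the same SEW/LMUL ratio as CFG.
   log2 (ratio) = log2 (SEW) - log2 (LMUL), and for e8 we need
   log2 (LMUL) = 3 - log2 (ratio).  */
static struct vtype_cfg
normalize_to_e8 (struct vtype_cfg cfg)
{
  int ratio_log2 = cfg.sew_log2 - cfg.lmul_log2;
  struct vtype_cfg e8 = { 3, 3 - ratio_log2 };
  /* Only mf8 ... m8 are valid LMUL values.  */
  assert (e8.lmul_log2 >= -3 && e8.lmul_log2 <= 3);
  return e8;
}
```

With this encoding, e16mf2 is `{4, -1}` and e32m1 is `{5, 0}`; both have ratio 32 and both map to `{3, -2}`, i.e. e8mf4, so the two user vsetvls in the example become identical before the VSETVL pass runs.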


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-11-08 14:16
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]
I thought vsetvli insertion will try to merge them into one for those
cases? Could you explain few more reasons why they are not fused now?
Not an objection since I could imageing that would be easier to
process, just wondering why.
 
On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong  wrote:
>
> Since our user vsetvl intrinsics are defined as just calculate the VL output
> which is the number of the elements to be processed. Such intrinsics do not
> have any side effects.  We should normalize them when they have same ratio.
>
> E.g __riscv_vsetvl_e8mf8 result is same as __riscv_vsetvl_e64m1.
>
> Normalize them can allow us have better codegen.
> Consider this following example:
>
> #include "riscv_vector.h"
>
> void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, 
> int cond, int avl) {
>
>   size_t vl;
>   if (cond)
> vl = __riscv_vsetvl_e32m1(avl);
>   else
> vl = __riscv_vsetvl_e16mf2(avl);
>   for (size_t i = 0; i < n; i += 1) {
> vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
> vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
> vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
> __riscv_vse32_v_i32m1(out, c, vl);
>   }
> }
>
> Before this patch:
>
> foo:
> beq a5,zero,.L2
> vsetvli a6,a6,e32,m1,tu,ma
> .L3:
> li  a5,0
> beq a4,zero,.L9
> .L4:
> vle32.v v1,0(a0)
> addi a5,a5,1
> vle32.v v1,0(a1)
> vle32.v v1,0(a2)
> vse32.v v1,0(a3)
> bne a4,a5,.L4
> .L9:
> ret
> .L2:
> vsetvli zero,a6,e32,m1,tu,ma
> j   .L3
>
> After this patch:
>
> foo:
> li  a5,0
> vsetvli zero,a6,e32,m1,tu,ma
> beq a4,zero,.L9
> .L4:
> vle32.v v1,0(a0)
> addi a5,a5,1
> vle32.v v1,0(a1)
> vle32.v v1,0(a2)
> vse32.v v1,0(a3)
> bne a4,a5,.L4
> .L9:
> ret
>
> PR target/112092
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Normalize the vsetvls.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test.
> * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test.
> * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test.
>
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  | 24 +-
>  .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  |  3 +--
>  .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c  | 25 +++
>  .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c  | 25 +++
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c   |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c   |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  2 +-
>  .../riscv/rvv/vsetvl/vsetvlmax-13.c   |  4 +--
>  .../riscv/rvv/vsetvl/vsetvlmax-15.c   |  6 ++---
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c |  4 +--
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c |  4 +--
>  13 files changed, 83 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-2.c
>
> diff --git 

Re: [PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-11-07 Thread Kito Cheng
I thought vsetvli insertion would try to merge them into one for those
cases? Could you explain a bit more about why they are not fused now?
Not an objection, since I can imagine that this would be easier to
process; just wondering why.

On Wed, Nov 8, 2023 at 2:11 PM Juzhe-Zhong  wrote:
>
> Since our user vsetvl intrinsics are defined as just calculate the VL output
> which is the number of the elements to be processed. Such intrinsics do not
> have any side effects.  We should normalize them when they have same ratio.
>
> E.g __riscv_vsetvl_e8mf8 result is same as __riscv_vsetvl_e64m1.
>
> Normalize them can allow us have better codegen.
> Consider this following example:
>
> #include "riscv_vector.h"
>
> void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n, 
> int cond, int avl) {
>
>   size_t vl;
>   if (cond)
> vl = __riscv_vsetvl_e32m1(avl);
>   else
> vl = __riscv_vsetvl_e16mf2(avl);
>   for (size_t i = 0; i < n; i += 1) {
> vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
> vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
> vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
> __riscv_vse32_v_i32m1(out, c, vl);
>   }
> }
>
> Before this patch:
>
> foo:
> beq a5,zero,.L2
> vsetvli a6,a6,e32,m1,tu,ma
> .L3:
> li  a5,0
> beq a4,zero,.L9
> .L4:
> vle32.v v1,0(a0)
> addi a5,a5,1
> vle32.v v1,0(a1)
> vle32.v v1,0(a2)
> vse32.v v1,0(a3)
> bne a4,a5,.L4
> .L9:
> ret
> .L2:
> vsetvli zero,a6,e32,m1,tu,ma
> j   .L3
>
> After this patch:
>
> foo:
> li  a5,0
> vsetvli zero,a6,e32,m1,tu,ma
> beq a4,zero,.L9
> .L4:
> vle32.v v1,0(a0)
> addi a5,a5,1
> vle32.v v1,0(a1)
> vle32.v v1,0(a2)
> vse32.v v1,0(a3)
> bne a4,a5,.L4
> .L9:
> ret
>
> PR target/112092
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Normalize the vsetvls.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test.
> * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto.
> * gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test.
> * gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test.
>
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  | 24 +-
>  .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  |  3 +--
>  .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c  | 25 +++
>  .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c  | 25 +++
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c   |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c   |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  2 +-
>  .../riscv/rvv/vsetvl/vsetvlmax-13.c   |  4 +--
>  .../riscv/rvv/vsetvl/vsetvlmax-15.c   |  6 ++---
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c |  4 +--
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c |  2 +-
>  .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c |  4 +--
>  13 files changed, 83 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-2.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 0298b7987a1..d70468542ee 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -131,19 +131,31 @@ public:
>
>  tree type = builtin_types[e.type.index].vector;
>  machine_mode mode = TYPE_MODE (type);
> -machine_mode inner_mode = GET_MODE_INNER (mode);
> +/* Normalize same RATO (SEW/LMUL) into same vsetvl instruction.
> +
> +- e8,mf8/e16,mf4/e32,mf2/e64,m1 --> e8mf8
> +- e8,mf4/e16,mf2/e32,m1/e64,m2  --> e8mf4
> +- e8,mf2/e16,m1/e32,m2/e64,m4   --> e8mf2
> +- e8,m1/e16,m2/e32,m4/e64,m8--> e8m1
> +- e8,m2/e16,m4/e32,m8   --> e8m2
> +- e8,m4/e16,m8  --> e8m4
> +- e8,m8 --> e8m8
> +*/
>  /* SEW.  */
> -e.add_input_operand (Pmode,
> -gen_int_mode (GET_MODE_BITSIZE (inner_mode), Pmode));
> +e.add_input_operand (Pmode, gen_int_mode (8, Pmode));
>
>  /* LMUL.  */
> -

[PATCH] RISC-V: Normalize user vsetvl intrinsics[PR112092]

2023-11-07 Thread Juzhe-Zhong
Our user vsetvl intrinsics are defined to just calculate the VL output,
which is the number of elements to be processed. Such intrinsics do not
have any side effects, so we should normalize them when they have the same
SEW/LMUL ratio.

E.g. the result of __riscv_vsetvl_e8mf8 is the same as that of __riscv_vsetvl_e64m1.
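
To make the ratio argument concrete, here is a minimal C sketch (not part of the patch). It assumes the RVV relation VLMAX = VLEN * LMUL / SEW and simplifies the returned VL to min (AVL, VLMAX), which ignores the extra freedom the ISA gives implementations when AVL is between VLMAX and 2 * VLMAX. The helper names `vlmax` and `vsetvl_result` and the log2 encoding of SEW and LMUL are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code.  SEW and LMUL are passed as log2
   values: e8 = 3 ... e64 = 6, mf8 = -3 ... m8 = 3.  */
static size_t
vlmax (size_t vlen_bits, int sew_log2, int lmul_log2)
{
  /* VLMAX = VLEN * LMUL / SEW = VLEN >> (log2 (SEW) - log2 (LMUL)).  */
  int ratio_log2 = sew_log2 - lmul_log2;
  assert (ratio_log2 >= 0 && ratio_log2 <= 6);
  return vlen_bits >> ratio_log2;
}

/* Simplified model of the VL that a vsetvl intrinsic returns.  */
static size_t
vsetvl_result (size_t avl, size_t vlen_bits, int sew_log2, int lmul_log2)
{
  size_t max = vlmax (vlen_bits, sew_log2, lmul_log2);
  return avl < max ? avl : max;
}
```

In this model only the difference `sew_log2 - lmul_log2` enters the computation, so e8mf8 `(3, -3)` and e64m1 `(6, 0)` give the same VL for any AVL and VLEN.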

Normalizing them allows better codegen.
Consider the following example:

#include "riscv_vector.h"

void foo(int32_t *in1, int32_t *in2, int32_t *in3, int32_t *out, size_t n,
         int cond, int avl) {

  size_t vl;
  if (cond)
    vl = __riscv_vsetvl_e32m1(avl);
  else
    vl = __riscv_vsetvl_e16mf2(avl);
  for (size_t i = 0; i < n; i += 1) {
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
    vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
    vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
    __riscv_vse32_v_i32m1(out, c, vl);
  }
}

Before this patch:

foo:
beq a5,zero,.L2
vsetvli a6,a6,e32,m1,tu,ma
.L3:
li  a5,0
beq a4,zero,.L9
.L4:
vle32.v v1,0(a0)
addi a5,a5,1
vle32.v v1,0(a1)
vle32.v v1,0(a2)
vse32.v v1,0(a3)
bne a4,a5,.L4
.L9:
ret
.L2:
vsetvli zero,a6,e32,m1,tu,ma
j   .L3

After this patch:

foo:
li  a5,0
vsetvli zero,a6,e32,m1,tu,ma
beq a4,zero,.L9
.L4:
vle32.v v1,0(a0)
addia5,a5,1
vle32.v v1,0(a1)
vle32.v v1,0(a2)
vse32.v v1,0(a3)
bne a4,a5,.L4
.L9:
ret

PR target/112092

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Normalize the vsetvls.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109743-1.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/pr109743-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-13.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/pr112092-1.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112092-2.c: New test.

---
 .../riscv/riscv-vector-builtins-bases.cc  | 24 +-
 .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  |  3 +--
 .../gcc.target/riscv/rvv/vsetvl/pr112092-1.c  | 25 +++
 .../gcc.target/riscv/rvv/vsetvl/pr112092-2.c  | 25 +++
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-11.c   |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-15.c   |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  2 +-
 .../riscv/rvv/vsetvl/vsetvlmax-13.c   |  4 +--
 .../riscv/rvv/vsetvl/vsetvlmax-15.c   |  6 ++---
 .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-5.c |  4 +--
 .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-7.c |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvlmax-8.c |  4 +--
 13 files changed, 83 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr112092-2.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 0298b7987a1..d70468542ee 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -131,19 +131,31 @@ public:
 
 tree type = builtin_types[e.type.index].vector;
 machine_mode mode = TYPE_MODE (type);
-machine_mode inner_mode = GET_MODE_INNER (mode);
+/* Normalize same RATO (SEW/LMUL) into same vsetvl instruction.
+
+- e8,mf8/e16,mf4/e32,mf2/e64,m1 --> e8mf8
+- e8,mf4/e16,mf2/e32,m1/e64,m2  --> e8mf4
+- e8,mf2/e16,m1/e32,m2/e64,m4   --> e8mf2
+- e8,m1/e16,m2/e32,m4/e64,m8--> e8m1
+- e8,m2/e16,m4/e32,m8   --> e8m2
+- e8,m4/e16,m8  --> e8m4
+- e8,m8 --> e8m8
+*/
 /* SEW.  */
-e.add_input_operand (Pmode,
-gen_int_mode (GET_MODE_BITSIZE (inner_mode), Pmode));
+e.add_input_operand (Pmode, gen_int_mode (8, Pmode));
 
 /* LMUL.  */
-e.add_input_operand (Pmode, gen_int_mode (get_vlmul (mode), Pmode));
+machine_mode e8_mode
+  = get_vector_mode (QImode, GET_MODE_NUNITS (mode)).require ();
+e.add_input_operand (Pmode, gen_int_mode (get_vlmul (e8_mode), Pmode));
 
 /* TAIL_ANY.  */
-e.add_input_operand (Pmode, gen_int_mode (get_prefer_tail_policy (), 
Pmode));
+e.add_input_operand (Pmode,
+gen_int_mode (get_prefer_tail_policy (), Pmode));
 
 /* MASK_ANY.  */
-e.add_input_operand (Pmode, gen_int_mode 

Re: [PATCH][_Hashtable] Add missing destructor call

2023-11-07 Thread François Dumont



On 07/11/2023 00:28, Jonathan Wakely wrote:
> On Mon, 6 Nov 2023 at 21:39, François Dumont  wrote:
> >
> > Noticed looking for other occasion to replace __try/__catch with RAII
> > helper.
> >
> >   libstdc++: [_Hashtable] Add missing node destructor call
> >
> >   libstdc++-v3/ChangeLog:
> >
> >   * include/bits/hashtable_policy.h
> >   (_Hashtable_alloc<>::_M_allocate_node): Add missing call to
> > node destructor
> >   on construct exception.
> >
> > Tested under Linux x64, ok to commit ?
>
> OK.
>
> Is this missing on any branches too?

Clearly all maintained branches.

> I don't think it's actually a problem, since it's a trivial destructor anyway.

Yes, me neither. I was only thinking about sanity checker tools when
doing this, so no plan for backports.




Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-07 Thread juzhe.zh...@rivai.ai
Thanks Lehua.

I appreciate the tons of work that went into supporting subreg liveness tracking.

One nit: I think you should mention the following PRs:

106694
89967
106146
99161 

No need to send a V2 now. You can send a V2 after Richard and Vlad have reviewed.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-11-08 11:47
To: gcc-patches
CC: vmakarov; richard.sandiford; juzhe.zhong; lehua.ding
Subject: [PATCH 0/7] ira/lra: Support subreg coalesce
Hi,
 
These patches add support for subreg coalescing in the
register allocation passes (ira and lra).
 
Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):
 
```
#include <riscv_vector.h>

void
foo (int32_t *in, int32_t *out, size_t m)
{
  vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
  vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
  vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
  for (size_t i = 0; i < m; i++)
    {
      v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
      v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
    }
  *(vint32m1_t*)(out+4*0) = v0;
  *(vint32m1_t*)(out+4*1) = v1;
}
```
 
Before these patches:
 
```
foo:
li a5,32
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v4,0(a0)
vmv1r.v v2,v4
vmv1r.v v1,v5
beq a2,zero,.L2
li a5,0
vsetivli zero,4,e32,m1,ta,ma
.L3:
addi a5,a5,1
vadd.vv v2,v2,v2
vmul.vv v1,v1,v1
bne a2,a5,.L3
.L2:
vs1r.v v2,0(a1)
addi a1,a1,16
vs1r.v v1,0(a1)
ret
```
 
After these patches:
 
```
foo:
li a5,32
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v2,0(a0)
beq a2,zero,.L2
li a5,0
vsetivli zero,4,e32,m1,ta,ma
.L3:
addi a5,a5,1
vadd.vv v2,v2,v2
vmul.vv v3,v3,v3
bne a2,a5,.L3
.L2:
vs1r.v v2,0(a1)
addi a1,a1,16
vs1r.v v3,0(a1)
ret
```
 
As you can see, the two redundant vmv1r.v instructions were removed.
They appear because the current ira pass is conservative when calculating
the live ranges of pseudo registers that occupy multiple hard registers.
Consider the two RTL instructions below, where r134 occupies two physical
registers and r135 and r136 each occupy one physical register.
At insn 12, ira considers the entire r134 pseudo register
to be live, so r135 conflicts with r134, as shown in the ira
dump. Then, when the physical registers are allocated, r135 and
r136 are allocated first because they are inside the loop body and
have higher priority. This makes it difficult to assign r136 so that it
overlaps with r134, i.e., to assign r136 to hr100, which would eliminate
the need for the vmv1r.v instruction. Thus two vmv1r.v instructions
appear.
 
If we refine the live information of r134 down to each subreg,
we can remove this conflict. We can then create copies for sets
that reference a subreg, which raises the priority of the r134 allocation,
so that registers with larger alignment requirements are allocated
physical registers first. In RVV, a pseudo register occupying
two physical registers must start at a register number that is a multiple of 2.
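
The alignment rule and the way a subreg maps onto a register group can be sketched as follows. This is an illustration only, not IRA code: `group_aligned_p` and `subreg_hard_reg` are hypothetical helpers, and register numbers here are plain vector register indexes (v0, v1, ...), not GCC hard register numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers, not IRA code.  BASE is the first register of a
   pseudo that occupies NREGS consecutive registers.  The group is
   aligned when BASE is a multiple of NREGS.  */
static bool
group_aligned_p (int base, int nregs)
{
  assert (nregs > 0);
  return base % nregs == 0;
}

/* Register that holds part INDEX of the group starting at BASE.  */
static int
subreg_hard_reg (int base, int nregs, int index)
{
  assert (index >= 0 && index < nregs);
  return base + index;
}
```

In the "after" assembly above, the two-register value is loaded into v2: the group is aligned (2 is a multiple of 2), and its parts 0 and 1 are v2 and v3, which are exactly the registers the loop then uses for v0 and v1 without any vmv1r.v.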
 
```
(insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
(subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) 
"/app/example.c":7:19 998 {*movrvvm1si_whole}
 (nil))
(insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
(subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) 
"/app/example.c":8:19 998 {*movrvvm1si_whole}
 (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
(nil)))
```
 
ira dump:
 
;; a1(r136,l0) conflicts: a3(r135,l0)
;; total conflict hard regs:
;; conflict hard regs:
;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
;; total conflict hard regs:
;; conflict hard regs:
;; a6(r134,l0) conflicts: a3(r135,l0)
;; total conflict hard regs:
;; conflict hard regs:
;;
;; ...
  Popping a1(r135,l0)  -- assign reg 97
  Popping a3(r136,l0)  -- assign reg 98
  Popping a4(r137,l0)  -- assign reg 15
  Popping a5(r140,l0)  -- assign reg 12
  Popping a10(r145,l0)  -- assign reg 12
  Popping a2(r139,l0)  -- assign reg 11
  Popping a9(r144,l0)  -- assign reg 11
  Popping a0(r142,l0)  -- assign reg 11
  Popping a6(r134,l0)  -- assign reg 100
  Popping a7(r143,l0)  -- assign reg 10
  Popping a8(r141,l0)  -- assign reg 15
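
The difference between tracking liveness for the whole pseudo and tracking it per subreg can be modelled with a small C sketch. It is a toy model under stated assumptions, not IRA's data structures: program points are plain integers, a range is live over [start, end), and the names `live_range`, `conflict_whole_p` and `conflict_part_p` are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a value is live over [start, end).  */
struct live_range
{
  int start;
  int end;
};

static bool
ranges_overlap_p (struct live_range a, struct live_range b)
{
  return a.start < b.end && b.start < a.end;
}

/* Whole-register view: a multi-register pseudo is live while any of its
   NPARTS parts is live, so it conflicts with OTHER if any part overlaps.  */
static bool
conflict_whole_p (const struct live_range *parts, int nparts,
                  struct live_range other)
{
  for (int i = 0; i < nparts; i++)
    if (ranges_overlap_p (parts[i], other))
      return true;
  return false;
}

/* Per-part view: only the part that would share a hard register with
   OTHER matters.  */
static bool
conflict_part_p (const struct live_range *parts, int part,
                 struct live_range other)
{
  return ranges_overlap_p (parts[part], other);
}
```

For example, suppose the low part of r134 is live over [0, 11), its high part over [0, 12), and r135 over [11, 20); these numbers are made up to mirror insns 11 and 12. The whole-register check reports a conflict because the high part is still live at point 11, while the per-part check for the low part reports none, so r135 could reuse the register holding the low part.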
 
AArch64 SVE has the same problem. Consider the following
code (https://godbolt.org/z/MYrK7Ghaj):
 
```
#include <arm_sve.h>
 
int bar (svbool_t pg, int64_t* base, int n, int64_t *in1, int64_t *in2, 
int64_t*out)
{
  svint64x4_t result = svld4_s64 (pg, base);
  svint64_t v0 = svget4_s64(result, 0);
  svint64_t v1 = svget4_s64(result, 1);
  svint64_t v2 = svget4_s64(result, 2);
  svint64_t v3 = svget4_s64(result, 3);
 
  for (int i = 0; i < n; i += 1)
{
svint64_t v18 = svld1_s64(pg, in1);
svint64_t v19 = svld1_s64(pg, in2);
v0 = svmad_s64_z(pg, v0, v18, v19);
v1 = svmad_s64_z(pg, v1, v18, v19);
v2 = svmad_s64_z(pg, v2, v18, v19);
v3 = svmad_s64_z(pg, v3, v18, v19);
}
  svst1_s64(pg, out+0,v0);
  svst1_s64(pg, out+1,v1);
  svst1_s64(pg, 

[PATCH 6/7] lra: Apply live_subreg df_problem to lra pass

2023-11-07 Thread Lehua Ding
This patch changes the uses of the old live data to use the new live_subreg data.

gcc/ChangeLog:

* lra-coalesce.cc (update_live_info): Update.
(lra_coalesce): Update.
* lra-constraints.cc (update_ebb_live_info): Update.
(get_live_on_other_edges): Update.
(inherit_in_ebb): Update.
(lra_inheritance): Update.
(fix_bb_live_info): Update.
(remove_inheritance_pseudos): Update.
* lra-lives.cc (make_hard_regno_live): Update.
(make_hard_regno_dead): Update.
(mark_regno_live): Update.
(mark_regno_dead): Update.
(class bb_data_pseudos): Update.
(live_trans_fun): Update.
(live_con_fun_0): Update.
(live_con_fun_n): Update.
(initiate_live_solver): Update.
(finish_live_solver): Update.
(process_bb_lives): Update.
(lra_create_live_ranges_1): Update.
* lra-remat.cc (dump_candidates_and_remat_bb_data): Update.
(calculate_livein_cands): Update.
(do_remat): Update.
* lra-spills.cc (spill_pseudos): Update.

---
 gcc/lra-coalesce.cc|  20 ++-
 gcc/lra-constraints.cc |  93 ++---
 gcc/lra-lives.cc   | 308 -
 gcc/lra-remat.cc   |  13 +-
 gcc/lra-spills.cc  |  22 ++-
 5 files changed, 354 insertions(+), 102 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index 04a5bbd714b..abfc54f1cc2 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -188,19 +188,25 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (_pseudos_bitmap, first_coalesced_pseudo[j]);
   if (! bitmap_empty_p (_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, _pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, _pseudos_bitmap);
+  bitmap_and_compl_into (all, _pseudos_bitmap);
+  bitmap_ior_into (all, _pseudos_bitmap);
+
+  bitmap_and_compl_into (full, _pseudos_bitmap);
+  bitmap_ior_and_compl_into (full, _pseudos_bitmap, partial);
+
+  bitmap_and_compl_into (partial, _pseudos_bitmap);
+  bitmap_ior_and_compl_into (partial, _pseudos_bitmap, full);
 }
 }
 
@@ -303,8 +309,10 @@ lra_coalesce (void)
   bitmap_initialize (_pseudos_bitmap, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (DF_LIVE_SUBREG_IN (bb), DF_LIVE_SUBREG_FULL_IN (bb),
+   DF_LIVE_SUBREG_PARTIAL_IN (bb));
+  update_live_info (DF_LIVE_SUBREG_OUT (bb), DF_LIVE_SUBREG_FULL_OUT (bb),
+   DF_LIVE_SUBREG_PARTIAL_OUT (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 0607c8be7cb..c3ad846b97b 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6571,34 +6571,75 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = DF_LIVE_SUBREG_IN (prev_bb);
+ bitmap subreg_full_in = DF_LIVE_SUBREG_FULL_IN (prev_bb);
+ bitmap subreg_partial_in = DF_LIVE_SUBREG_PARTIAL_IN (prev_bb);
+ subregs_live *range_in = DF_LIVE_SUBREG_RANGE_IN (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (_only_regs, 0, j, bi)
if (bitmap_bit_p (_regs, j))
- bitmap_set_bit (df_get_live_in (prev_bb), j);
-   else
- bitmap_clear_bit (df_get_live_in (prev_bb), j);
+ {
+   bitmap_set_bit (subreg_all_in, j);
+   bitmap_set_bit (subreg_full_in, j);
+   if (bitmap_bit_p (subreg_partial_in, j))
+ {
+   bitmap_clear_bit (subreg_partial_in, j);
+   range_in->remove_live (j);
+ }
+ }
+   else if (bitmap_bit_p (subreg_all_in, j))
+ {
+   bitmap_clear_bit (subreg_all_in, j);
+   bitmap_clear_bit (subreg_full_in, j);
+   if (bitmap_bit_p (subreg_partial_in, j))
+ {
+   bitmap_clear_bit (subreg_partial_in, j);
+   range_in->remove_live (j);
+ }
+ }
}
+  

[PATCH 4/7] ira: Support subreg copy

2023-11-07 Thread Lehua Ding
This patch changes copies from being between two allocnos to being between
two objects, that is, it allows partial copies between pseudo registers.

gcc/ChangeLog:

* ira-build.cc (find_allocno_copy): Removed.
(ira_create_object): Adjust.
(find_object): New.
(ira_create_copy): Adjust.
(add_allocno_copy_to_list): Adjust.
(swap_allocno_copy_ends_if_necessary): Adjust.
(ira_add_allocno_copy): Adjust.
(print_copy): Adjust.
(print_allocno_copies): Adjust.
(ira_flattening): Adjust.
* ira-color.cc (INCLUDE_VECTOR): use std::vector
(struct allocno_color_data): New fields.
(struct allocno_hard_regs_subnode): More comments.
(form_allocno_hard_regs_nodes_forest): More comments.
(update_left_conflict_sizes_p): More comments.
(struct update_cost_queue_elem): New field.
(queue_update_cost): Adjust.
(get_next_update_cost): Adjust.
(update_costs_from_allocno): Adjust.
(update_conflict_hard_regno_costs): Adjust.
(assign_hard_reg): Adjust.
(objects_conflict_by_live_ranges_p): New.
(allocno_thread_conflict_p): Removed.
(object_thread_conflict_p): New.
(merge_threads): Adjust.
(form_threads_from_copies): Adjust.
(form_threads_from_bucket): Adjust.
(form_threads_from_colorable_allocno): Adjust.
(init_allocno_threads): Adjust.
(add_allocno_to_bucket): Adjust.
(delete_allocno_from_bucket): Adjust.
(allocno_copy_cost_saving): Adjust.
(color_allocnos): Adjust.
(color_pass): Adjust.
(update_curr_costs): Adjust.
(coalesce_allocnos): Adjust.
(ira_reuse_stack_slot): Adjust.
(ira_initiate_assign): Adjust.
(ira_finish_assign): Adjust.
* ira-conflicts.cc (allocnos_conflict_for_copy_p): Removed.
(REG_SUBREG_P): Adjust.
(subreg_move_p): New.
(regs_non_conflict_for_copy_p): New.
(subreg_reg_align_and_times_p): New.
(process_regs_for_copy): Adjust.
(add_insn_allocno_copies): Adjust.
(propagate_copies): Adjust.
* ira-emit.cc (add_range_and_copies_from_move_list): Adjust.
* ira-int.h (struct ira_object): New field.
(OBJECT_INDEX): New macro.
(struct ira_allocno_copy): Adjust fields.
(ira_add_allocno_copy): Exported.
(find_object): Exported.
(subreg_move_p): Exported.
* ira.cc (print_redundant_copies): Adjust.

---
 gcc/ira-build.cc | 150 +++-
 gcc/ira-color.cc | 541 +++
 gcc/ira-conflicts.cc | 173 +++---
 gcc/ira-emit.cc  |  10 +-
 gcc/ira-int.h|  13 +-
 gcc/ira.cc   |   5 +-
 6 files changed, 645 insertions(+), 247 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 5fb7a9f800f..1c47f81ce9d 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -36,9 +36,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "subreg-live-range.h"
 
-static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
-ira_loop_tree_node_t);
-
 /* The root of the loop tree corresponding to the all function.  */
 ira_loop_tree_node_t ira_loop_tree_root;
 
@@ -463,6 +460,7 @@ ira_create_object (ira_allocno_t a, int start, int nregs)
   OBJECT_LIVE_RANGES (obj) = NULL;
   OBJECT_START (obj) = start;
   OBJECT_NREGS (obj) = nregs;
+  OBJECT_INDEX (obj) = ALLOCNO_NUM_OBJECTS (a);
 
   ira_object_id_map_vec.safe_push (obj);
   ira_object_id_map
@@ -519,6 +517,16 @@ find_object (ira_allocno_t a, poly_int64 offset, poly_int64 size)
   return find_object (a, subreg_start, subreg_nregs);
 }
 
+/* Return object in allocno A for REG.  */
+ira_object_t
+find_object (ira_allocno_t a, rtx reg)
+{
+  if (has_subreg_object_p (a) && read_modify_subreg_p (reg))
+return find_object (a, SUBREG_BYTE (reg), GET_MODE_SIZE (GET_MODE (reg)));
+  else
+return find_object (a, 0, ALLOCNO_NREGS (a));
+}
+
 /* Return the object in allocno A which match START & NREGS.  Create when not
found.  */
 ira_object_t
@@ -1502,27 +1510,36 @@ initiate_copies (void)
 /* Return copy connecting A1 and A2 and originated from INSN of
LOOP_TREE_NODE if any.  */
 static ira_copy_t
-find_allocno_copy (ira_allocno_t a1, ira_allocno_t a2, rtx_insn *insn,
+find_allocno_copy (ira_object_t obj1, ira_object_t obj2, rtx_insn *insn,
   ira_loop_tree_node_t loop_tree_node)
 {
   ira_copy_t cp, next_cp;
-  ira_allocno_t another_a;
+  ira_object_t another_obj;
 
+  ira_allocno_t a1 = OBJECT_ALLOCNO (obj1);
   for (cp = ALLOCNO_COPIES (a1); cp != NULL; cp = next_cp)
 {
-  if (cp->first == a1)
+  ira_allocno_t first_a = OBJECT_ALLOCNO (cp->first);
+  ira_allocno_t second_a = OBJECT_ALLOCNO (cp->second);
+  if (first_a == a1)
{
  next_cp = 

[PATCH 3/7] ira: Support subreg live range track

2023-11-07 Thread Lehua Ding
This patch extends the register live ranges in ira to track the lifetime
of subregs, enabling finer-grained tracking of the live range and
conflicts of part of a pseudo register. Allocnos are divided into two
categories: those with a single object, and those that contain subreg
objects.
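The effect of per-object live ranges can be sketched as follows. This is an
illustration only, under the assumption that live ranges are half-open
intervals of program points; `LiveRange` and `live_ranges_overlap` are
hypothetical names, not IRA's data structures:

```cpp
#include <vector>

// Half-open interval [start, finish) of program points (an assumption).
struct LiveRange {
    int start;
    int finish;
};

// Two objects can conflict only if some pair of their ranges overlaps.
bool live_ranges_overlap(const std::vector<LiveRange> &a,
                         const std::vector<LiveRange> &b) {
    for (const LiveRange &x : a)
        for (const LiveRange &y : b)
            if (x.start < y.finish && y.start < x.finish)
                return true;
    return false;
}
```

With a single object, a pseudo whose upper half dies late keeps its lower
half in conflict as well; with one object per subreg, the lower half's range
can end earlier and the overlap disappears.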

gcc/ChangeLog:

* ira-build.cc (init_object_start_and_nregs): Removed.
(ira_create_object): Adjust.
(find_object): New.
(find_object_anyway): New.
(ira_create_allocno): Removed regs_with_subreg.
(ira_set_allocno_class): Adjust.
(get_range): New.
(ira_copy_allocno_objects): New.
(merge_hard_reg_conflicts): Adjust.
(create_cap_allocno): Adjust.
(find_subreg_p): New.
(add_subregs): New.
(create_insn_allocnos): Adjust.
(create_bb_allocnos): Adjust.
(move_allocno_live_ranges): Adjust.
(copy_allocno_live_ranges):  Adjust.
(setup_min_max_allocno_live_range_point): Adjust.
(init_regs_with_subreg): Removed.
(ira_build): Removed.
(ira_destroy): Removed.
* ira-color.cc (INCLUDE_MAP): use std::map
(setup_left_conflict_sizes_p): Adjust.
(push_allocno_to_stack): Adjust.
* ira-conflicts.cc (record_object_conflict): Adjust.
(build_object_conflicts): Adjust.
(build_conflicts): Adjust.
(print_allocno_conflicts): Adjust.
* ira-emit.cc (modify_move_list): Adjust.
* ira-int.h (struct ira_object): Adjust.
(struct ira_allocno): Adjust.
(OBJECT_SIZE): New.
(OBJECT_OFFSET): New.
(OBJECT_SUBWORD): New.
(find_object): New.
(find_object_anyway): New.
(ira_copy_allocno_objects):  New.
* ira-lives.cc (INCLUDE_VECTOR): use std::vector.
(set_subreg_conflict_hard_regs): New.
(make_hard_regno_dead): Adjust.
(make_object_live): Adjust.
(update_allocno_pressure_excess_length): Adjust.
(make_object_dead): Adjust.
(mark_pseudo_regno_live): New.
(add_subreg_point): New.
(mark_pseudo_object_live): New.
(mark_pseudo_regno_subword_live): Removed.
(mark_pseudo_regno_subreg_live): New.
(mark_pseudo_regno_subregs_live): New.
(mark_pseudo_reg_live): New.
(mark_pseudo_regno_dead): Removed.
(mark_pseudo_object_dead): New.
(mark_pseudo_regno_subword_dead): Removed.
(mark_pseudo_regno_subreg_dead): New.
(mark_pseudo_reg_dead): Adjust.
(process_single_reg_class_operands): Adjust.
(process_out_of_region_eh_regs): Adjust.
(process_bb_node_lives): Adjust.
(class subreg_live_item): New.
(create_subregs_live_ranges): New.
(ira_create_allocno_live_ranges): Adjust.
* subreg-live-range.h: New fields.

---
 gcc/ira-build.cc| 275 +
 gcc/ira-color.cc|  68 --
 gcc/ira-conflicts.cc|  48 ++--
 gcc/ira-emit.cc |   2 +-
 gcc/ira-int.h   |  21 +-
 gcc/ira-lives.cc| 522 +---
 gcc/subreg-live-range.h |  16 ++
 7 files changed, 653 insertions(+), 299 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 7df98164503..5fb7a9f800f 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -29,10 +29,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-config.h"
 #include "regs.h"
 #include "memmodel.h"
+#include "tm_p.h"
 #include "ira.h"
 #include "ira-int.h"
 #include "sparseset.h"
 #include "cfgloop.h"
+#include "subreg-live-range.h"
 
 static ira_copy_t find_allocno_copy (ira_allocno_t, ira_allocno_t, rtx_insn *,
 ira_loop_tree_node_t);
@@ -440,49 +442,14 @@ initiate_allocnos (void)
   memset (ira_regno_allocno_map, 0, max_reg_num () * sizeof (ira_allocno_t));
 }
 
-/* Update OBJ's start and nregs field according A and OBJ info.  */
-static void
-init_object_start_and_nregs (ira_allocno_t a, ira_object_t obj)
-{
-  enum reg_class aclass = ALLOCNO_CLASS (a);
-  gcc_assert (aclass != NO_REGS);
-
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (ALLOCNO_TRACK_SUBREG_P (a))
-{
-  poly_int64 end = OBJECT_OFFSET (obj) + OBJECT_SIZE (obj);
-  for (int i = 0; i < nregs; i += 1)
-   {
- poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
- if (OBJECT_START (obj) < 0 && maybe_lt (OBJECT_OFFSET (obj), right))
-   {
- OBJECT_START (obj) = i;
-   }
- if (OBJECT_NREGS (obj) < 0 && maybe_le (end, right))
-   {
- OBJECT_NREGS (obj) = i + 1 - OBJECT_START (obj);
- break;
-   }
-   }
-  gcc_assert (OBJECT_START (obj) >= 0 && OBJECT_NREGS (obj) > 0);
-}
-  else
-{
-  OBJECT_START (obj) = 0;
-  OBJECT_NREGS (obj) = nregs;
-}
-}
-
 /* Create and return an 

[PATCH 7/7] lra: Support subreg live range track and conflict detect

2023-11-07 Thread Lehua Ding
This patch implements live-range tracking for subregs in lra and updates
conflict detection to match.

gcc/ChangeLog:

* ira-build.cc (print_copy): Adjust print.
(setup_pseudos_has_subreg_object): New.
(ira_build): collect subreg object allocno.
* lra-assigns.cc (set_offset_conflicts): New.
(setup_live_pseudos_and_spill_after_risky_transforms): Adjust.
(lra_assign): Adjust.
* lra-constraints.cc (process_alt_operands): Relax.
* lra-int.h (GCC_LRA_INT_H): New include.
(struct lra_live_range): New field subreg.
(struct lra_insn_reg): New fields.
(get_range_hard_regs):  Exported.
(get_nregs): New.
(has_subreg_object_p): New.
* lra-lives.cc (INCLUDE_VECTOR): New.
(lra_live_range_pool): New.
(create_live_range): Adjust.
(lra_merge_live_ranges): Adjust.
(update_pseudo_point): Adjust.
(class bb_data_pseudos): New.
(mark_regno_live): Adjust.
(mark_regno_dead): Adjust.
(process_bb_lives): Adjust.
(remove_some_program_points_and_update_live_ranges): Adjust.
(lra_print_live_range_list): Adjust print.
(class subreg_live_item): New class.
(create_subregs_live_ranges): New.
(lra_create_live_ranges_1): Add subreg live ranges.
* lra.cc (get_range_blocks): New.
(get_range_hard_regs): New.
(new_insn_reg): Adjust.
(collect_non_operand_hard_regs): Adjust.
(initialize_lra_reg_info_element): Adjust.
(reg_same_range_p): New.
(add_regs_to_insn_regno_info): Adjust.
* subreg-live-range.h: New constructor.

---
 gcc/ira-build.cc|  40 -
 gcc/lra-assigns.cc  | 111 ++--
 gcc/lra-constraints.cc  |  18 +-
 gcc/lra-int.h   |  33 
 gcc/lra-lives.cc| 361 ++--
 gcc/lra.cc  | 139 ++--
 gcc/subreg-live-range.h |   1 +
 7 files changed, 614 insertions(+), 89 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 379f877ca67..cba38d5fecb 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -95,6 +95,9 @@ int ira_copies_num;
basic block.  */
 static int last_basic_block_before_change;
 
+/* Record these pseudos which has subreg object. Used by LRA pass.  */
+bitmap_head pseudos_has_subreg_object;
+
 /* Initialize some members in loop tree node NODE.  Use LOOP_NUM for
the member loop_num.  */
 static void
@@ -1688,8 +1691,13 @@ print_copy (FILE *f, ira_copy_t cp)
 {
   ira_allocno_t a1 = OBJECT_ALLOCNO (cp->first);
   ira_allocno_t a2 = OBJECT_ALLOCNO (cp->second);
-  fprintf (f, "  cp%d:a%d(r%d)<->a%d(r%d)@%d:%s\n", cp->num, ALLOCNO_NUM (a1),
-  ALLOCNO_REGNO (a1), ALLOCNO_NUM (a2), ALLOCNO_REGNO (a2), cp->freq,
+  fprintf (f, "  cp%d:a%d(r%d", cp->num, ALLOCNO_NUM (a1), ALLOCNO_REGNO (a1));
+  if (ALLOCNO_NREGS (a1) != OBJECT_NREGS (cp->first))
+fprintf (f, "_obj%d", OBJECT_INDEX (cp->first));
+  fprintf (f, ")<->a%d(r%d", ALLOCNO_NUM (a2), ALLOCNO_REGNO (a2));
+  if (ALLOCNO_NREGS (a2) != OBJECT_NREGS (cp->second))
+fprintf (f, "_obj%d", OBJECT_INDEX (cp->second));
+  fprintf (f, ")@%d:%s\n", cp->freq,
   cp->insn != NULL   ? "move"
   : cp->constraint_p ? "constraint"
  : "shuffle");
@@ -3706,6 +3714,33 @@ update_conflict_hard_reg_costs (void)
 }
 }
 
+/* Setup speudos_has_subreg_object.  */
+static void
+setup_pseudos_has_subreg_object ()
+{
+  bitmap_initialize (&pseudos_has_subreg_object, &reg_obstack);
+  ira_allocno_t a;
+  ira_allocno_iterator ai;
+  FOR_EACH_ALLOCNO (a, ai)
+if (has_subreg_object_p (a))
+  {
+   bitmap_set_bit (&pseudos_has_subreg_object, ALLOCNO_REGNO (a));
+   if (ira_dump_file != NULL)
+ {
+   fprintf (ira_dump_file,
+"  a%d(r%d, nregs: %d) has subreg objects:\n",
+ALLOCNO_NUM (a), ALLOCNO_REGNO (a), ALLOCNO_NREGS (a));
+   ira_allocno_object_iterator oi;
+   ira_object_t obj;
+   FOR_EACH_ALLOCNO_OBJECT (a, obj, oi)
+ fprintf (ira_dump_file, "object %d: start: %d, nregs: %d\n",
+  OBJECT_INDEX (obj), OBJECT_START (obj),
+  OBJECT_NREGS (obj));
+   fprintf (ira_dump_file, "\n");
+ }
+  }
+}
+
 /* Create a internal representation (IR) for IRA (allocnos, copies,
loop tree nodes).  The function returns TRUE if we generate loop
structure (besides nodes representing all function and the basic
@@ -3726,6 +3761,7 @@ ira_build (void)
   create_allocnos ();
   ira_costs ();
   create_allocno_objects ();
+  setup_pseudos_has_subreg_object ();
   ira_create_allocno_live_ranges ();
   remove_unnecessary_regions (false);
   ira_compress_allocno_live_ranges ();
diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index d2ebcfd5056..6588a740162 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ 

[PATCH 5/7] ira: Add all nregs >= 2 pseudos to tracke subreg list

2023-11-07 Thread Lehua Ding
This patch fully relaxes the restriction so that all eligible subregs are
tracked.
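As a rough illustration of the bookkeeping this relies on, the sketch below
derives the size of one register and the registers a subreg touches. It is
not the patch's code: it assumes constant byte sizes (the real code uses
poly_int64) and a mode size that divides evenly by the register count, and
`RegSpan`, `unit_size` and `subreg_span` are hypothetical names:

```cpp
// Registers covered by a subreg, relative to the first register of the
// pseudo.
struct RegSpan {
    int start;
    int nregs;
};

// Size in bytes of one hard register for a value of MODE_SIZE bytes held
// in NREGS registers.
int unit_size(int mode_size, int nregs) {
    return mode_size / nregs;
}

// Registers touched by a subreg at byte OFFSET that is SIZE bytes long,
// when each register holds UNIT bytes.
RegSpan subreg_span(int offset, int size, int unit) {
    int start = offset / unit;
    int end = (offset + size + unit - 1) / unit;  // one past the last register
    return {start, end - start};
}
```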

gcc/ChangeLog:

* ira-build.cc (get_reg_unit_size): New.
(has_same_nregs): New.
(ira_set_allocno_class): Relax.

---
 gcc/ira-build.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 1c47f81ce9d..379f877ca67 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -607,6 +607,37 @@ ira_create_allocno (int regno, bool cap_p,
   return a;
 }
 
+/* Return single register size of allocno A.  */
+static poly_int64
+get_reg_unit_size (ira_allocno_t a)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ALLOCNO_NREGS (a);
+  poly_int64 block_size = REGMODE_NATURAL_SIZE (mode);
+  int nblocks = get_nblocks (mode);
+  gcc_assert (nblocks % nregs == 0);
+  return block_size * (nblocks / nregs);
+}
+
+/* Return true if TARGET_CLASS_MAX_NREGS and TARGET_HARD_REGNO_NREGS results is
+   same. It should be noted that some targets may not implement these two very
+   uniformly, and need to be debugged step by step. For example, in V3x1DI mode
+   in AArch64, TARGET_CLASS_MAX_NREGS returns 2 but TARGET_HARD_REGNO_NREGS
+   returns 3. They are in conflict and need to be repaired in the Hook of
+   AArch64.  */
+static bool
+has_same_nregs (ira_allocno_t a)
+{
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+if (REGNO_REG_CLASS (i) != NO_REGS
+   && reg_class_subset_p (REGNO_REG_CLASS (i), ALLOCNO_CLASS (a))
+   && ALLOCNO_NREGS (a) != hard_regno_nregs (i, ALLOCNO_MODE (a)))
+  return false;
+  return true;
+}
+
 /* Set up register class for A and update its conflict hard
registers.  */
 void
@@ -624,12 +655,12 @@ ira_set_allocno_class (ira_allocno_t a, enum reg_class aclass)
 
   if (aclass == NO_REGS)
 return;
-  /* SET the unit_size of one register.  */
-  machine_mode mode = ALLOCNO_MODE (a);
-  int nregs = ira_reg_class_max_nregs[aclass][mode];
-  if (nregs == 2 && maybe_eq (GET_MODE_SIZE (mode), nregs * UNITS_PER_WORD))
+  gcc_assert (!ALLOCNO_TRACK_SUBREG_P (a));
+  /* Set unit size and track_subreg_p flag for pseudo which need occupied multi
+ hard regs.  */
+  if (ALLOCNO_NREGS (a) > 1 && has_same_nregs (a))
 {
-  ALLOCNO_UNIT_SIZE (a) = UNITS_PER_WORD;
+  ALLOCNO_UNIT_SIZE (a) = get_reg_unit_size (a);
   ALLOCNO_TRACK_SUBREG_P (a) = true;
   return;
 }
-- 
2.36.3



[PATCH 2/7] ira: Add live_subreg problem and apply to ira pass

2023-11-07 Thread Lehua Ding
This patch adds a live_subreg problem, which extends the original live_reg
problem to track the lifetime of subregs. The ira pass now uses the new
live data in place of the old.
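One way to picture the distinction the FULL and PARTIAL sets listed below
appear to draw is the following sketch. It is an assumption-laden
illustration, not the df problem itself: a pseudo is modelled as up to 32
blocks with one liveness bit each, and `Liveness` and `classify` are
hypothetical names:

```cpp
#include <cstdint>

enum class Liveness { Dead, Partial, Full };

// BLOCKS_LIVE has bit I set when block I of the pseudo is live; NBLOCKS is
// the number of blocks the pseudo spans (1 to 32).
Liveness classify(std::uint32_t blocks_live, unsigned nblocks) {
    std::uint32_t all = nblocks >= 32 ? ~std::uint32_t(0)
                                      : (std::uint32_t(1) << nblocks) - 1;
    blocks_live &= all;
    if (blocks_live == 0)
        return Liveness::Dead;
    return blocks_live == all ? Liveness::Full : Liveness::Partial;
}
```

Only the Partial case needs a per-block range to be stored; Dead and Full
can be represented by plain per-register bits.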

gcc/ChangeLog:

* Makefile.in: Add subreg-live-range.o
* df-problems.cc (struct df_live_subreg_problem_data): New df problem.
(need_track_subreg): helper function.
(get_range): helper function.
(remove_subreg_range): helper function.
(add_subreg_range): helper function.
(df_live_subreg_free_bb_info): df function.
(df_live_subreg_alloc): Ditto.
(df_live_subreg_reset): Ditto.
(df_live_subreg_bb_local_compute): Ditto.
(df_live_subreg_local_compute): Ditto.
(df_live_subreg_init): Ditto.
(df_live_subreg_check_result): Ditto.
(df_live_subreg_confluence_0): Ditto.
(df_live_subreg_confluence_n): Ditto.
(df_live_subreg_transfer_function): Ditto.
(df_live_subreg_finalize): Ditto.
(df_live_subreg_free): Ditto.
(df_live_subreg_top_dump): Ditto.
(df_live_subreg_bottom_dump): Ditto.
(df_live_subreg_add_problem): Ditto.
* df.h (enum df_problem_id): New df problem.
(DF_LIVE_SUBREG_INFO): New macro.
(DF_LIVE_SUBREG_IN): Ditto.
(DF_LIVE_SUBREG_OUT): Ditto.
(DF_LIVE_SUBREG_FULL_IN): Ditto.
(DF_LIVE_SUBREG_FULL_OUT): Ditto.
(DF_LIVE_SUBREG_PARTIAL_IN): Ditto.
(DF_LIVE_SUBREG_PARTIAL_OUT): Ditto.
(DF_LIVE_SUBREG_RANGE_IN): Ditto.
(DF_LIVE_SUBREG_RANGE_OUT): Ditto.
(class subregs_live): New class.
(class basic_block_subreg_live_info): New class.
(class df_live_subreg_bb_info): New class.
(df_live_subreg): New function.
(df_live_subreg_add_problem): Ditto.
(df_live_subreg_finalize): Ditto.
(class subreg_range): New class.
(need_track_subreg): Exported.
(remove_subreg_range): Exported.
(add_subreg_range): Exported.
(df_live_subreg_get_bb_info): Exported.
* ira-build.cc (create_bb_allocnos): Use new live data.
(create_loop_allocnos): Ditto.
* ira-color.cc (ira_loop_edge_freq): Ditto.
* ira-emit.cc (generate_edge_moves): Ditto.
(add_ranges_and_copies): Ditto.
* ira-lives.cc (process_out_of_region_eh_regs): Ditto.
(process_bb_node_lives): Ditto.
* ira.cc (find_moveable_pseudos): Ditto.
(interesting_dest_for_shprep_1): Ditto.
(allocate_initial_values): Ditto.
(ira): Ditto.
* reginfo.cc (get_nblocks_slow): Helper function.
* rtl.h (get_nblocks_slow): Helper function.
(get_nblocks): Helper function.
* timevar.def (TV_DF_LIVE_SUBREG): New timevar.
* subreg-live-range.cc: New file.
* subreg-live-range.h: New file.

---
 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 889 ++-
 gcc/df.h |  93 +++-
 gcc/ira-build.cc |  14 +-
 gcc/ira-color.cc |   8 +-
 gcc/ira-emit.cc  |  12 +-
 gcc/ira-lives.cc |   7 +-
 gcc/ira.cc   |  20 +-
 gcc/reginfo.cc   |  14 +
 gcc/rtl.h|  14 +
 gcc/subreg-live-range.cc | 649 
 gcc/subreg-live-range.h  | 326 ++
 gcc/timevar.def  |   1 +
 13 files changed, 2008 insertions(+), 40 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 29cec21c825..e4403b5a30c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1675,6 +1675,7 @@ OBJS = \
store-motion.o \
streamer-hooks.o \
stringpool.o \
+subreg-live-range.o \
substring-locations.o \
target-globals.o \
targhooks.o \
diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index d2cfaf7f50f..2585c762fd1 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "rtl.h"
 #include "df.h"
+#include "subreg-live-range.h"
 #include "memmodel.h"
 #include "tm_p.h"
 #include "insn-config.h"
@@ -1344,8 +1345,894 @@ df_lr_verify_transfer_functions (void)
   bitmap_clear (&all_blocks);
 }
 
+/*
+   REGISTER AND SUBREG LIVES
+   Like DF_RL, but fine-grained tracking of subreg lifecycle.
+   
*/
+
+/* Private data used to verify the solution for this problem.  */
+struct df_live_subreg_problem_data
+{
+  /* An obstack for the bitmaps we need for this problem.  */
+  bitmap_obstack live_subreg_bitmaps;
+  bool has_subreg_live_p;
+};
+
+/* Helper functions */
+
+/* Return true if REGNO is a pseudo and MODE is a multil regs size.  */
+bool
+need_track_subreg (int 

[PATCH 1/7] ira: Refactor the handling of register conflicts to make it more general

2023-11-07 Thread Lehua Ding
This patch makes no functional changes. It mainly refactors two parts:

1. The ira_allocno's objects field is expanded to a scalable array, and
   multi-word pseudo registers are split and tracked only when necessary.
2. Since the objects array has been expanded, more subreg objects will pass
   through later, rather than the previous fixed two. The check for whether
   two objects conflict is therefore changed: the registers occupied by an
   object are pulled back to the first register of the allocno before the
   comparison.
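Item 2 can be pictured with the sketch below. It is a simplification and not
the patch's implementation: a single 64-bit word stands in for HARD_REG_SET,
every object is assumed to be one register wide, and `RegSet`,
`shift_right`, `conflicts_for_first_reg` and `first_reg_allowed` are
hypothetical names. If an object that begins `start` registers into its
allocno cannot use hard register H, then the allocno cannot begin at
H - start, which is a right shift of the object's conflict set:

```cpp
#include <cstdint>

// Bit N set means hard register N is in the set.
using RegSet = std::uint64_t;

// Shift a set towards register 0; a shift of 64 or more empties it.
RegSet shift_right(RegSet set, unsigned amount) {
    return amount >= 64 ? 0 : set >> amount;
}

// Conflicts of an object that begins OBJ_START registers into its allocno,
// expressed in terms of the allocno's first register.
RegSet conflicts_for_first_reg(RegSet obj_conflicts, unsigned obj_start) {
    return shift_right(obj_conflicts, obj_start);
}

// True if an allocno with the given per-object conflict sets and start
// offsets may be placed with its first register at FIRST_REG.
bool first_reg_allowed(unsigned first_reg, const RegSet *obj_conflicts,
                       const unsigned *obj_starts, unsigned n_objects) {
    RegSet forbidden = 0;
    for (unsigned i = 0; i < n_objects; ++i)
        forbidden |= conflicts_for_first_reg(obj_conflicts[i], obj_starts[i]);
    return first_reg < 64 && ((forbidden >> first_reg) & 1) == 0;
}
```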

gcc/ChangeLog:

* hard-reg-set.h (struct HARD_REG_SET): Add operator>>.
* ira-build.cc (init_object_start_and_nregs): New func.
(find_object): Ditto.
(ira_create_allocno): Adjust.
(ira_set_allocno_class): Set subreg info.
(ira_create_allocno_objects): Adjust.
(init_regs_with_subreg): Collect access in subreg.
(ira_build): Call init_regs_with_subreg
(ira_destroy): Clear regs_with_subreg
* ira-color.cc (setup_profitable_hard_regs): Adjust.
(get_conflict_and_start_profitable_regs): Adjust.
(check_hard_reg_p): Adjust.
(assign_hard_reg): Adjust.
(improve_allocation): Adjust.
* ira-int.h (struct ira_object): Adjust fields.
(struct ira_allocno): Adjust objects filed.
(ALLOCNO_NUM_OBJECTS): Adjust.
(ALLOCNO_UNIT_SIZE): New.
(ALLOCNO_TRACK_SUBREG_P): New.
(ALLOCNO_NREGS): New.
(OBJECT_SIZE): New.
(OBJECT_OFFSET): New.
(OBJECT_START): New.
(OBJECT_NREGS): New.
(find_object): New.
(has_subreg_object_p): New.
(get_full_object): New.
* ira.cc (check_allocation): Adjust.

---
 gcc/hard-reg-set.h |  33 +++
 gcc/ira-build.cc   | 106 +++-
 gcc/ira-color.cc   | 234 ++---
 gcc/ira-int.h  |  45 -
 gcc/ira.cc |  52 --
 5 files changed, 349 insertions(+), 121 deletions(-)

diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
index b0bb9bce074..760eadba186 100644
--- a/gcc/hard-reg-set.h
+++ b/gcc/hard-reg-set.h
@@ -113,6 +113,39 @@ struct HARD_REG_SET
 return !operator== (other);
   }
 
+  HARD_REG_SET
+  operator>> (unsigned int shift_amount) const
+  {
+if (shift_amount == 0)
+  return *this;
+
+HARD_REG_SET res;
+/* Total bits of an element.  */
+unsigned int total_bits = sizeof (HARD_REG_ELT_TYPE) * 8;
+unsigned int n_elts = ARRAY_SIZE (elts);
+/* Number of whole elements to shift by.  */
+unsigned int n_elt = shift_amount / total_bits;
+/* Remaining shift within one element.  */
+shift_amount %= total_bits;
+
+for (unsigned int i = 0; i < n_elts; i += 1)
+  {
+    /* Source elements past the end read as zero.  */
+    HARD_REG_ELT_TYPE low = 0;
+    HARD_REG_ELT_TYPE high = 0;
+    if (i + n_elt < n_elts)
+      low = elts[i + n_elt];
+    if (i + n_elt + 1 < n_elts)
+      high = elts[i + n_elt + 1];
+    res.elts[i] = low >> shift_amount;
+    /* Bits from the next element fill the top.  Skip this when the
+       remaining shift is zero: a full-width shift is undefined.  */
+    if (shift_amount > 0)
+      res.elts[i] |= high << (total_bits - shift_amount);
+  }
+return res;
+  }
+
   HARD_REG_ELT_TYPE elts[HARD_REG_SET_LONGS];
 };
 typedef const HARD_REG_SET &const_hard_reg_set;
diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index 93e46033170..07aba27c1c9 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -440,6 +440,40 @@ initiate_allocnos (void)
   memset (ira_regno_allocno_map, 0, max_reg_num () * sizeof (ira_allocno_t));
 }
 
+/* Update OBJ's start and nregs field according A and OBJ info.  */
+static void
+init_object_start_and_nregs (ira_allocno_t a, ira_object_t obj)
+{
+  enum reg_class aclass = ALLOCNO_CLASS (a);
+  gcc_assert (aclass != NO_REGS);
+
+  machine_mode mode = ALLOCNO_MODE (a);
+  int nregs = ira_reg_class_max_nregs[aclass][mode];
+  if (ALLOCNO_TRACK_SUBREG_P (a))
+{
+  poly_int64 end = OBJECT_OFFSET (obj) + OBJECT_SIZE (obj);
+  for (int i = 0; i < nregs; i += 1)
+   {
+ poly_int64 right = ALLOCNO_UNIT_SIZE (a) * (i + 1);
+ if (OBJECT_START (obj) < 0 && maybe_lt (OBJECT_OFFSET (obj), right))
+   {
+ OBJECT_START (obj) = i;
+   }
+ if (OBJECT_NREGS (obj) < 0 && maybe_le (end, right))
+   {
+ OBJECT_NREGS (obj) = i + 1 - OBJECT_START (obj);
+ break;
+   }
+   }
+  gcc_assert (OBJECT_START (obj) >= 0 && OBJECT_NREGS (obj) > 0);
+}
+  else
+{
+  OBJECT_START (obj) = 0;
+  OBJECT_NREGS (obj) = nregs;
+}
+}
+
 /* Create and return an object corresponding to a new allocno A.  */
 static ira_object_t
 ira_create_object (ira_allocno_t a, int subword)
@@ -460,15 +494,36 @@ ira_create_object (ira_allocno_t a, int subword)
   OBJECT_MIN (obj) = INT_MAX;
   OBJECT_MAX (obj) = -1;
   

[PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-07 Thread Lehua Ding
Hi,

These patches add support for subreg coalescing in the
register allocation passes (ira and lra).

Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):

```
#include <riscv_vector.h>

void
foo (int32_t *in, int32_t *out, size_t m)
{
  vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
  vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
  vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
  for (size_t i = 0; i < m; i++)
{
  v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
  v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
}
  *(vint32m1_t*)(out+4*0) = v0;
  *(vint32m1_t*)(out+4*1) = v1;
}
```

Before these patches:

```
foo:
li  a5,32
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v4,0(a0)
vmv1r.v v2,v4
vmv1r.v v1,v5
beq a2,zero,.L2
li  a5,0
vsetivli zero,4,e32,m1,ta,ma
.L3:
addi a5,a5,1
vadd.vv v2,v2,v2
vmul.vv v1,v1,v1
bne a2,a5,.L3
.L2:
vs1r.v  v2,0(a1)
addi a1,a1,16
vs1r.v  v1,0(a1)
ret
```

After these patches:

```
foo:
li  a5,32
vsetvli zero,a5,e32,m2,ta,ma
vle32.v v2,0(a0)
beq a2,zero,.L2
li  a5,0
vsetivli zero,4,e32,m1,ta,ma
.L3:
addi a5,a5,1
vadd.vv v2,v2,v2
vmul.vv v3,v3,v3
bne a2,a5,.L3
.L2:
vs1r.v  v2,0(a1)
addi a1,a1,16
vs1r.v  v3,0(a1)
ret
```

As you can see, the two redundant vmv1r.v instructions are removed.
They appear because the current ira pass is conservative when
calculating the live range of pseudo registers that occupy multiple
hard registers. Consider the two RTL instructions below, where r134
occupies two physical registers and r135 and r136 occupy one physical
register each. At insn 12, ira considers the entire r134 pseudo register
to be live, so r135 conflicts with r134, as shown in the ira dump
below. Then, when physical registers are allocated, r135 and r136 are
allocated first because they are inside the loop body and have higher
priority. This makes it difficult to assign r136 to overlap with r134,
i.e., to assign r136 to hr100, which would eliminate the need for the
vmv1r.v instruction. Thus two vmv1r.v instructions appear.

If we refine the live information of r134 down to each subreg, this
conflict goes away. We can then create copies for the sets that
reference a subreg, which raises the priority of the r134 allocation
and lets registers with larger alignment requirements be allocated
physical registers first. In RVV, a pseudo register occupying two
physical registers must be aligned to a multiple of two.

```
(insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
(subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) 
"/app/example.c":7:19 998 {*movrvvm1si_whole}
 (nil))
(insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
(subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) 
"/app/example.c":8:19 998 {*movrvvm1si_whole}
 (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
(nil)))
```
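The alignment requirement mentioned above can be sketched as a simple
predicate. This is an illustration rather than anything taken from the
patches: `aligned_for_group` is a hypothetical name, and the sketch assumes
that hard register 96 is the first vector register, which is consistent with
the register numbers in the dump below but is not stated in this thread.

```cpp
// True if HARD_REG can be the first register of a group of GROUP_SIZE
// consecutive registers in a bank whose first register is BANK_FIRST.
bool aligned_for_group(int hard_reg, int bank_first, int group_size) {
    return hard_reg >= bank_first
           && (hard_reg - bank_first) % group_size == 0;
}
```

Under that assumption, a two-register pseudo may start at hard register 98
or 100 but not at 97, so placing one-register pseudos first can block the
aligned slots.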

ira dump:

;; a1(r136,l0) conflicts: a3(r135,l0)
;; total conflict hard regs:
;; conflict hard regs:
;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
;; total conflict hard regs:
;; conflict hard regs:
;; a6(r134,l0) conflicts: a3(r135,l0)
;; total conflict hard regs:
;; conflict hard regs:
;;
;; ...
  Popping a1(r135,l0)  -- assign reg 97
  Popping a3(r136,l0)  -- assign reg 98
  Popping a4(r137,l0)  -- assign reg 15
  Popping a5(r140,l0)  -- assign reg 12
  Popping a10(r145,l0)  -- assign reg 12
  Popping a2(r139,l0)  -- assign reg 11
  Popping a9(r144,l0)  -- assign reg 11
  Popping a0(r142,l0)  -- assign reg 11
  Popping a6(r134,l0)  -- assign reg 100
  Popping a7(r143,l0)  -- assign reg 10
  Popping a8(r141,l0)  -- assign reg 15

The AArch64 SVE has the same problem. Consider the following
code (https://godbolt.org/z/MYrK7Ghaj):

```
#include <arm_sve.h>

int bar (svbool_t pg, int64_t* base, int n, int64_t *in1, int64_t *in2, int64_t*out)
{
  svint64x4_t result = svld4_s64 (pg, base);
  svint64_t v0 = svget4_s64(result, 0);
  svint64_t v1 = svget4_s64(result, 1);
  svint64_t v2 = svget4_s64(result, 2);
  svint64_t v3 = svget4_s64(result, 3);

  for (int i = 0; i < n; i += 1)
{
svint64_t v18 = svld1_s64(pg, in1);
svint64_t v19 = svld1_s64(pg, in2);
v0 = svmad_s64_z(pg, v0, v18, v19);
v1 = svmad_s64_z(pg, v1, v18, v19);
v2 = svmad_s64_z(pg, v2, v18, v19);
v3 = svmad_s64_z(pg, v3, v18, v19);
}
  svst1_s64(pg, out+0,v0);
  svst1_s64(pg, out+1,v1);
  svst1_s64(pg, out+2,v2);
  svst1_s64(pg, out+3,v3);
}
```

Before these patches:

```
bar:
ld4d    {z4.d - z7.d}, p0/z, [x0]
mov z26.d, z4.d

[PATCH] diagnostics: pch: Remember diagnostic pragmas in a PCH [PR64117]

2023-11-07 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64117

This fixes an old PR / enhancement request. Bootstrap + regtest all
languages on x86-64 Linux. Please let me know if it looks OK? Thanks!

-Lewis

-- >8 --

As the PR points out, we do not currently record in a PCH whether any
diagnostics were reclassified by pragmas, so a header that takes care to
adjust some diagnostics for downstream users does not work as designed when
it is precompiled.  Implement that feature by adding a new interface
diagnostic_context::save_to_pch() / restore_from_pch(), which is called on
the global diagnostic context object.

This feature also exposes the need for a small tweak to the C++ frontend.
That frontend processes `#pragma GCC diagnostic push' and `#pragma GCC
diagnostic pop' pragmas twice, once while preprocessing and once again while
compiling.  In case a translation unit contains an unbalanced set of pushes
and pops, that results in twice as many leftover states as there should be.
This does not cause any problems for a single translation unit, but if the
snapshot of the state is preserved in a PCH, then it becomes observable, for
example with a setup like in the new testcase pragma-diagnostic-3.C:

t.h:

 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored...
 //no pop at end of the file

t.c

 #include "t.h"
 #pragma GCC diagnostic pop
 //expect to be at the initial state here

If t.h has been precompiled, and if the push had been processed twice at the
time the PCH was written, then the state will not be reset as expected in
t.c. Address that by having the C++ frontend reset the push/pop history
before starting its second pass.
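The doubling described above can be modelled with a toy counter. This is
only a sketch of the reasoning, not the diagnostic classifier itself:
`PragmaHistory`, `process` and `leftover_depth` are hypothetical names, and
the history is reduced to a push depth.

```cpp
// Toy model of the push/pop history: only the nesting depth is kept.
struct PragmaHistory {
    int depth = 0;
    void push() { ++depth; }
    void pop() { if (depth > 0) --depth; }
    void reset() { depth = 0; }
};

// Apply a header's pragmas once; '+' is a push and '-' is a pop.
void process(PragmaHistory &h, const char *pragmas) {
    for (const char *p = pragmas; *p != '\0'; ++p) {
        if (*p == '+')
            h.push();
        else if (*p == '-')
            h.pop();
    }
}

// Depth left over after the same pragmas are seen in PASSES passes,
// optionally resetting the history before each pass after the first.
int leftover_depth(const char *pragmas, int passes, bool reset_between) {
    PragmaHistory h;
    for (int i = 0; i < passes; ++i) {
        if (i > 0 && reset_between)
            h.reset();
        process(h, pragmas);
    }
    return h.depth;
}
```

An unbalanced header ("+") seen in two passes leaves a depth of 2 without a
reset, so a single pop in t.c does not return to the initial state; with the
reset the depth is 1 and the pop does.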

gcc/ChangeLog:

PR pch/64117
* Makefile.in: Add tree-diagnostic.cc to GTFILES
* diagnostic.cc (diagnostic_option_classifier::init): Handle the
classification history members, which had been omitted here.
(diagnostic_option_classifier::fini): Likewise.
(diagnostic_option_classifier::classify_diagnostic): Refactor some
logic to...
(diagnostic_context::get_original_option_classification): ...this
new function.
* diagnostic.h (struct diagnostic_option_classifier::state): Declare.
(diagnostic_option_classifier::save_state_to): Declare.
(diagnostic_option_classifier::restore_from_pch): Declare.
(diagnostic_option_classifier::get_state): Declare.
(diagnostic_option_classifier::restore_state): Declare.
(diagnostic_option_classifier::free_state): Declare.
(diagnostic_context::get_original_option_classification): Declare.
(diagnostic_context::get_classifier): New accessor functions.
(diagnostic_context::save_to_pch): Declare.
(diagnostic_context::restore_from_pch): Declare.
* ggc-common.cc (gt_pch_save): Use the new PCH interface to handle
the global diagnostic context.
(gt_pch_restore): Likewise.
* tree-diagnostic.cc (struct diagnostic_option_classifier::state):
New struct.
(struct diagnostic_pch_data): New struct.
(pch): New GC root.
(diagnostic_context::save_to_pch): New function.
(diagnostic_context::restore_from_pch): New function.
(diagnostic_option_classifier::save_state_to): New function.
(diagnostic_option_classifier::restore_from_pch): New function.
(diagnostic_option_classifier::get_state): New function.
(diagnostic_option_classifier::restore_state): New function.
(diagnostic_option_classifier::free_state): New function.

gcc/cp/ChangeLog:

PR pch/64117
* parser.cc (cp_lexer_new_main): Restore the diagnostics
classifications to their original state after the first pass through
the input.

gcc/testsuite/ChangeLog:

PR pch/64117
* g++.dg/pch/pragma-diagnostic-1.C: New test.
* g++.dg/pch/pragma-diagnostic-1.Hs: New test.
* g++.dg/pch/pragma-diagnostic-2.C: New test.
* g++.dg/pch/pragma-diagnostic-2.Hs: New test.
* g++.dg/pch/pragma-diagnostic-3.C: New test.
* g++.dg/pch/pragma-diagnostic-3.Hs: New test.
* g++.dg/pch/pragma-diagnostic-4.C: New test.
* g++.dg/pch/pragma-diagnostic-4.Hs: New test.
* g++.dg/pch/pragma-diagnostic-5.C: New test.
* g++.dg/pch/pragma-diagnostic-5.Hs: New test.
* gcc.dg/pch/pragma-diagnostic-1.c: New test.
* gcc.dg/pch/pragma-diagnostic-1.hs: New test.
* gcc.dg/pch/pragma-diagnostic-2.c: New test.
* gcc.dg/pch/pragma-diagnostic-2.hs: New test.
* gcc.dg/pch/pragma-diagnostic-3.c: New test.
* gcc.dg/pch/pragma-diagnostic-3.hs: New test.
* gcc.dg/pch/pragma-diagnostic-4.c: New test.
* gcc.dg/pch/pragma-diagnostic-4.hs: New test.
---
 gcc/Makefile.in   |   1 +
 gcc/cp/parser.cc  |  11 ++
 gcc/diagnostic.cc |  26 ++-
 

[PATCH] RISC-V: Nan-box the result of movhf on soft-fp16

2023-11-07 Thread KuanLin Chen
 According to spec, fmv.h checks if the input operands are correctly
 NaN-boxed. If not, the input value is treated as an n-bit canonical NaN.
 This patch fixes the issue that operands returned by soft-fp16 libgcc
 (i.e., __truncdfhf2) were not correctly NaN-boxed.
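
For readers unfamiliar with NaN-boxing, here is a minimal sketch of the
rule described above.  It is not code from the patch: it assumes 32-bit
FP registers (FLEN=32), and the helper names are made up for
illustration.

```c
#include <stdint.h>

/* Canonical quiet NaN bit pattern for IEEE binary16.  */
#define HF_CANONICAL_NAN 0x7e00u

/* Box a 16-bit half-precision bit pattern into a 32-bit register image:
   all bits above the value are set to 1.  (Hypothetical helper.)  */
static uint32_t
nan_box_hf (uint16_t bits)
{
  return 0xffff0000u | bits;
}

/* A register image is correctly boxed when its upper 16 bits are all 1.  */
static int
is_nan_boxed_hf (uint32_t reg)
{
  return (reg & 0xffff0000u) == 0xffff0000u;
}

/* Model of how a half-precision consumer reads a register: a value that
   is not correctly boxed is treated as the canonical NaN.  */
static uint16_t
read_hf (uint32_t reg)
{
  return is_nan_boxed_hf (reg) ? (uint16_t) reg : HF_CANONICAL_NAN;
}
```

In this model a soft-fp16 result of 0x3c00 (1.0) left unboxed in the
register reads back as a NaN, which is the failure mode the patch
description refers to.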

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand movhf
with Nan-boxing value.
* config/riscv/riscv.md (*movhf_softfloat_unspec): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Float16-nanboxing.c: New test.

0001-RISC-V-Nan-box-the-result-of-movhf-on-soft-fp16.patch
Description: Binary data


PING^4 [PATCH v2] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]

2023-11-07 Thread Kewen.Lin
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609993.html

BR,
Kewen

>>> on 2023/1/16 17:08, Kewen.Lin via Gcc-patches wrote:
 Hi,

 As Honza pointed out in [1], the current uses of function
 optimize_function_for_speed_p in rs6000_option_override_internal
 are too early, since the results of the queries
 optimize_function_for_{speed,size}_p could change later due
 to profile feedback, function attribute handling, etc.

 This patch moves the optimize_function_for_speed_p checks to all
 the places that use the corresponding flags, which follows
 existing practice.  Maybe we can cache the result somewhere at an
 appropriate time, but that's a separate matter.
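
As an illustration of the "query at the use site" idea, here is a small
standalone model.  The struct and function names are hypothetical and
only mirror the shape of the fusion_gpr_mem_load change in the diff
further down; this is not GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-function state that may change after
   option processing (profile feedback, attributes, ...).  */
struct fn_model
{
  bool optimize_for_speed;
};

/* The flag takes effect if it is enabled and either the user asked for
   it explicitly or the *current* function is optimized for speed.  The
   speed query is made here, at the point of use, rather than being
   folded into the flag once at option-override time.  */
static bool
use_fusion_sign_p (const struct fn_model *fn, bool flag_enabled,
                   bool flag_explicit)
{
  return flag_enabled && (flag_explicit || fn->optimize_for_speed);
}
```

With this shape, a cold function and a hot function compiled in the same
translation unit can get different answers from the same flag setting.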

 Comparing with v1[2], this version added one test case for
 SAVE_TOC_INDIRECT as Segher questioned and suggested, and it
 also considered the possibility of explicit option (see test
 cases pr108184-2.c and pr108184-4.c).  I believe that, except
 for the intentional change on optimize_function_for_{speed,size}_p,
 there is no other functional change.

 [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607527.html
 [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609379.html

 Bootstrapped and regtested on powerpc64-linux-gnu P8,
 powerpc64le-linux-gnu P{9,10} and powerpc-ibm-aix.

 Is it ok for trunk?

 BR,
 Kewen
 -
 gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
all optimize_function_for_speed_p uses.
(fusion_gpr_load_p): Call optimize_function_for_speed_p along
with TARGET_P8_FUSION_SIGN.
(expand_fusion_gpr_load): Likewise.
(rs6000_call_aix): Call optimize_function_for_speed_p along with
TARGET_SAVE_TOC_INDIRECT.
* config/rs6000/predicates.md (fusion_gpr_mem_load): Call
optimize_function_for_speed_p along with TARGET_P8_FUSION_SIGN.

 gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108184-1.c: New test.
* gcc.target/powerpc/pr108184-2.c: New test.
* gcc.target/powerpc/pr108184-3.c: New test.
* gcc.target/powerpc/pr108184-4.c: New test.
 ---
  gcc/config/rs6000/predicates.md   |  5 +++-
  gcc/config/rs6000/rs6000.cc   | 19 +-
  gcc/testsuite/gcc.target/powerpc/pr108184-1.c | 16 
  gcc/testsuite/gcc.target/powerpc/pr108184-2.c | 15 +++
  gcc/testsuite/gcc.target/powerpc/pr108184-3.c | 25 +++
  gcc/testsuite/gcc.target/powerpc/pr108184-4.c | 24 ++
  6 files changed, 97 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-1.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-2.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-3.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184-4.c

 diff --git a/gcc/config/rs6000/predicates.md 
 b/gcc/config/rs6000/predicates.md
 index a1764018545..9f84468db84 100644
 --- a/gcc/config/rs6000/predicates.md
 +++ b/gcc/config/rs6000/predicates.md
 @@ -1878,7 +1878,10 @@ (define_predicate "fusion_gpr_mem_load"

/* Handle sign/zero extend.  */
if (GET_CODE (op) == ZERO_EXTEND
 -  || (TARGET_P8_FUSION_SIGN && GET_CODE (op) == SIGN_EXTEND))
 +  || (TARGET_P8_FUSION_SIGN
 +&& GET_CODE (op) == SIGN_EXTEND
 +&& (rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN
 +|| optimize_function_for_speed_p (cfun
  {
op = XEXP (op, 0);
mode = GET_MODE (op);
 diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
 index 6ac3adcec6b..f47d21980a9 100644
 --- a/gcc/config/rs6000/rs6000.cc
 +++ b/gcc/config/rs6000/rs6000.cc
 @@ -3997,8 +3997,7 @@ rs6000_option_override_internal (bool global_init_p)
/* If we can shrink-wrap the TOC register save separately, then use
   -msave-toc-indirect unless explicitly disabled.  */
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
 -  && flag_shrink_wrap_separate
 -  && optimize_function_for_speed_p (cfun))
 +  && flag_shrink_wrap_separate)
  rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;

/* Enable power8 fusion if we are tuning for power8, even if we aren't
 @@ -4032,7 +4031,6 @@ rs6000_option_override_internal (bool global_init_p)
   zero extending load, and an explicit sign extension.  */
if (TARGET_P8_FUSION
&& !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
 -  && optimize_function_for_speed_p (cfun)
&& optimize >= 3)
  rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;

 @@ -25690,7 +25688,10 @@ 

PING^6 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-11-07 Thread Kewen.Lin
Hi,

Gentle ping this series:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607146.html

BR,
Kewen

> on 2022/11/24 17:15, Kewen Lin wrote:
>> Hi,
>>
>> Following Segher's suggestion, this patch series is to rework
>> function rs6000_emit_vector_compare for vector float and int
>> in multiple steps, it's based on the previous attempts [1][2].
>> As mentioned in [1], the need to rework this for float is to
>> make a centralized place for vector float comparison handlings
>> instead of supporting with swapping ops and reversing code etc.
>> dispersedly.  It's also for a subsequent patch to handle
>> comparison operators with or without trapping math (PR105480).
>> With the handling on vector float reworked, we can further make
>> the handling on vector int simplified as shown.
>>
>> For Segher's concern about whether this rework causes any
>> assembly change, I previously constructed two testcases, for
>> vector float[3] and int[4] respectively.  They showed that most
>> cases are fine, except for differences on LE and UNGT, which are
>> an improvement since GE is used instead of GT ior EQ.  The
>> associated test case in patch 3/9 is a good example.
>>
>> Besides, w/ and w/o the whole patch series, I built the whole
>> SPEC2017 at options -O3 and -Ofast separately, checked the
>> differences on object assembly.  The result showed that the
>> most are unchanged, except for:
>>
>>   * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
>> 9 object files with differences.
>>
>>   * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
>> one and 527.cam4_r has 4 object files with differences.
>>
>> By looking into these differences, all significant differences
>> are caused by the known improvement mentioned above transforming
>> GT ior EQ to GE, which can also affect unrolling decision due
>> to insn count.  Some other trivial differences are branch
>> target offset difference, nop difference for alignment, vsx
>> register number differences etc.
>>
>> I also evaluated the runtime performance for these changed
>> benchmarks, the result is neutral.
>>
>> These patches are bootstrapped and regress-tested
>> incrementally on powerpc64-linux-gnu P7 & P8, and
>> powerpc64le-linux-gnu P9 & P10.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
>> [3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
>> [4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html
>>
>> Kewen Lin (9):
>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
>>   rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
>>   rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5
>>
>>  gcc/config/rs6000/rs6000.cc | 180 ++--
>>  gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
>>  2 files changed, 74 insertions(+), 131 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c
>>
>


PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-07 Thread Kewen.Lin
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

BR,
Kewen

on 2023/10/25 10:45, Kewen.Lin wrote:
> Hi,
> 
> This is almost a repost of v2, which was posted at [1] in March,
> except for:
>   1) rebased from r14-4810 which is relatively up-to-date,
>  some conflicts on "int to bool" return type change have
>  been resolved;
>   2) adjust commit log a bit;
>   3) fix misspelled "articial" with "artificial" somewhere;
> 
> --
> *v2 comments*:
> 
> By addressing Alexander's comments, against v1 this
> patch v2 mainly:
> 
>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>   - Introduce enum rgn_bb_deps_free_action for three
> kinds of actions to free deps;
>   - Change function free_deps_for_bb_no_real_insns_p to
> resolve_forw_deps which only focuses on forward deps;
>   - Extend the handlings to cover dbg-cnt sched_block,
> add one test case for it;
>   - Move free_trg_info call in schedule_region to an
> appropriate place.
> 
> One thing I'm not sure about is the change in function
> sched_rgn_local_finish, currently the invocation to
> sched_rgn_local_free is guarded with !sel_sched_p (),
> so I just follow it, but the initialization of those
> structures (in sched_rgn_local_init) isn't guarded
> with !sel_sched_p (), it looks odd.
> 
> --
> 
> As PR108273 shows, when there is one block which only has
> NOTE_P and LABEL_P insns at non-debug mode but has some
> extra DEBUG_INSN_P insns at debug mode, after scheduling
> it, the DFA states would be different between debug mode
> and non-debug mode.  At non-debug mode, the block
> meets no_real_insns_p, so it gets skipped; at debug
> mode, it gets scheduled, and even though it only has NOTE_P,
> LABEL_P and DEBUG_INSN_P insns, the call of function
> advance_one_cycle will change the DFA state.  PR108519 also
> shows this issue can be exposed by some scheduler changes.
> 
> This patch changes function no_real_insns_p to
> function no_real_nondebug_insns_p by taking debug insns into
> account, which makes us not try to schedule a block
> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
> resulting in consistent DFA states between non-debug and
> debug mode.
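
The difference between the two predicates can be seen in a toy model.
This is only a sketch: the enum and the array-based "block" are
simplifications invented here, whereas the real functions walk the insn
chain and test NOTE_P, LABEL_P and DEBUG_INSN_P.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified insn kinds (hypothetical model, not GCC's rtx codes).  */
enum insn_kind { INSN_NOTE, INSN_LABEL, INSN_DEBUG, INSN_REAL };

/* Old predicate: anything that is not a note or label counts as real,
   so a debug insn makes the block look schedulable.  */
static bool
no_real_insns_model (const enum insn_kind *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i] != INSN_NOTE && insns[i] != INSN_LABEL)
      return false;
  return true;
}

/* New predicate: debug insns are ignored as well, so -g and non -g
   builds make the same decision for the block.  */
static bool
no_real_nondebug_insns_model (const enum insn_kind *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (insns[i] != INSN_NOTE && insns[i] != INSN_LABEL
        && insns[i] != INSN_DEBUG)
      return false;
  return true;
}
```

For a block holding only a note, a label and a debug insn, the old
predicate says "schedule it" while the new one says "skip it", matching
what happens for the same block without -g.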
> 
> Changing no_real_insns_p to no_real_nondebug_insns_p caused an
> ICE when doing free_block_dependencies.  The root cause is
> that we create dependencies for debug insns; those
> dependencies are expected to be resolved while scheduling
> insns, but they get skipped after this change.
> From checking the code, it looks reasonable to skip
> computing block dependences for no_real_nondebug_insns_p
> blocks.  There is also another issue, exposed by building
> the SPEC2017 bmks at -O2 -g: we could skip scheduling a
> block which already has its dependency graph built, so it
> has dependencies computed and rgn_n_insns accumulated; the
> later verification that the graph becomes exhausted by
> scheduling would then fail as follows:
> 
>   /* Sanity check: verify that all region insns were
>      scheduled.  */
>   gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> 
> and some forward deps also aren't resolved.
> 
> As Alexander pointed out, the current debug count handling
> also suffers from a similar issue, so this patch handles these
> two cases together: one is a block that gets skipped by
> !dbg_cnt (sched_block), the other is a block which
> is not no_real_nondebug_insns_p initially but becomes
> no_real_nondebug_insns_p due to speculative scheduling.
> 
> This patch can be bootstrapped and regress-tested on
> x86_64-redhat-linux, aarch64-linux-gnu and
> powerpc64{,le}-linux-gnu.
> 
> I also verified this patch can pass SPEC2017 both intrate
> and fprate bmks building at -g -O2/-O3.
> 
> Any thoughts?  Is it ok for trunk?
> 
> [1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
> [2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html
> 
> BR,
> Kewen
> -
>   PR rtl-optimization/108273
> 
> gcc/ChangeLog:
> 
>   * haifa-sched.cc (no_real_insns_p): Rename to ...
>   (no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>   * sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>   no_real_nondebug_insns_p.
>   * sched-int.h (no_real_insns_p): Rename to ...
>   (no_real_nondebug_insns_p): ... this.
>   * sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>   (bb_deps_free_actions): New static variable.
>   (compute_block_dependences): Skip for no_real_nondebug_insns_p.
>   (resolve_forw_deps): New function.
>   (free_block_dependencies): Check bb_deps_free_actions and call
>   function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTIFICIAL.
>   (compute_priorities): Replace no_real_insns_p with
>   no_real_nondebug_insns_p.
>   (schedule_region): Replace no_real_insns_p with
>   no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTIFICIAL if the block
>   get 

Re: [PATCH 07/12] mode-switching: Allow targets to set the mode for EH handlers

2023-11-07 Thread Jeff Law




On 11/7/23 17:15, Richard Sandiford wrote:

Thanks for the reviews.

Jeff Law  writes:

On 11/5/23 11:48, Richard Sandiford wrote:

The mode-switching pass already had hooks to say what mode
an entity is in on entry to a function and what mode it must
be in on return.  For SME, we also want to say what mode an
entity is guaranteed to be in on entry to an exception handler.

gcc/
* target.def (mode_switching.eh_handler): New hook.
* doc/tm.texi.in (TARGET_MODE_EH_HANDLER): New @hook.
* doc/tm.texi: Regenerate.
* mode-switching.cc (optimize_mode_switching): Use eh_handler
to get the mode on entry to an exception handler.

Can we ever have a path to the exception handler triggered by a normal
control flow and if so, presumably we want this to apply in that case too?


Not directly AFAIK.  The handler has to handle the EH_DATA_REGNOs,
call __cxa_begin_catch, etc.  So even if there is fall-through at
the source level, I think there'd always be a block that is only
reached through abnormal control flow.  So...


It looks like that's the semantics of the implementation by way of using
bb_has_eh_pred.  Just want to make sure that's the semantics you want in
that oddball case.

Assuming it is the semantics you want, it's OK for the trunk, though you
might want to twiddle the docs slightly to mention that case.


...I think these EH blocks are pure re-entry points.  I suppose some
targets might have entities whose state is call-preserved, so that it's
not changed by EH edges.  But that might also apply to other abnormal
control flow too, so it's probably a separate issue/feature.

OK.  I wouldn't be surprised if there's state that wouldn't be correct 
if we had code which jumped directly into the EH handler path.  Makes me 
wonder if the C++ language might actually prohibit such shenanigans.


Jeff


Re: [PATCH 10/12] mode-switching: Use 1-based edge aux fields

2023-11-07 Thread Jeff Law




On 11/7/23 17:35, Richard Sandiford wrote:


I could have sworn that there was something that checked that passes
left edge aux fields clear, but it looks like I misremembered.  So I
probably need to stick a clear_aux_for_edges () call above the first
main loop (for 12/12) and keep the initialisation here as well.

That does sound vaguely familiar.   Maybe it was a one-off test someone 
did.




That kind-of takes away the point of shifting to 1-based values
in the first place.  Ah well...

Your call.  I'd tend to lean towards inserting the clear_aux call if we 
don't have something that's consistently verifying aux state. 
Alternately we can return to the -1 handling.  I doubt it's all that 
important from a compile-time standpoint.


jeff


Re: [PATCH] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-07 Thread chenglulu



On 2023/11/6 at 7:36 PM, Xi Ruoyao wrote:

This is isomorphic to the LLVM changes [1-2].

On LoongArch, the LL and SC instructions have memory barrier semantics:

- LL:  + 
- SC:  + 

But the compare and swap operation is allowed to fail, and if it fails
the SC instruction is not executed, thus the guarantee of acquiring
semantics cannot be ensured. Therefore, an acquire barrier needs to be
generated when failure_memorder includes an acquire operation.

On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an
acquire barrier; on CPUs implementing LoongArch v1.00, it is a full
barrier.  So it's always enough for acquire semantics.  OTOH if
acquire semantics are not needed, we still need the "dbar 0x700" as the
load-load barrier like all LL-SC loops.
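
At the source level, the case being discussed is a compare-and-swap
whose failure ordering includes acquire.  A minimal C11 sketch (the
wrapper name try_claim is invented here; the atomics call is standard
<stdatomic.h>):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to replace EXPECTED with DESIRED in *SLOT.  The success ordering
   is acq_rel; the failure ordering is acquire, so even when the store
   does not happen the load that observed the mismatch must still act
   as an acquire.  *OBSERVED receives the value actually seen.  */
static bool
try_claim (atomic_int *slot, int expected, int desired, int *observed)
{
  bool ok = atomic_compare_exchange_strong_explicit
    (slot, &expected, desired,
     memory_order_acq_rel, memory_order_acquire);
  *observed = expected;
  return ok;
}
```

On the failure path the SC is never executed, so the compiler cannot
rely on it for the acquire guarantee; that is why a barrier after the
loop is still required for such orderings.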


I don't think there's a problem with the logic. I'm also working on 
correcting the content of the atomic functions now, and I'm doing a 
correctness test, including this modification, and I'll email you back 
after the correctness test is completed.


Thanks!



Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-07 Thread Hongtao Liu
On Tue, Nov 7, 2023 at 10:34 PM Richard Biener
 wrote:
>
> On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu  wrote:
> >
> > On Tue, Nov 7, 2023 at 4:10 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Nov 7, 2023 at 7:08 AM liuhongt  wrote:
> > > >
> > > > analyze_and_compute_bitop_with_inv_effect assumes the first operand is
> > > > loop invariant which is not the case when it's INTEGER_CST.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > Ok for trunk?
> > >
> > > So this addresses a missed optimization, right?  It seems to me that
> > > even with two SSA names we are only "lucky" when rhs1 is the invariant
> > > one.  So instead of swapping this way I'd do
> > Yes, it's a missed optimization.
> > And I think expr_invariant_in_loop_p (loop, match_op[1]) should be
> > enough: if match_op[1] is a loop invariant, it must be false for the
> > conditions below (there couldn't be any header_phi from its
> > definition).
>
> Yes, all I said is that when you now care for op1 being INTEGER_CST
> it could also be an invariant SSA name and thus only after swapping op0/op1
> we could have a successful match, no?
Sorry, the commit message is a little bit misleading.
At first, I just wanted to handle the INTEGER_CST case (with TREE_CODE
(match_op[1]) == INTEGER_CST), but then I realized that this could
probably be extended to the normal SSA_NAME case as well, so I used
expr_invariant_in_loop_p, which should theoretically be able to handle
the SSA_NAME case as well.

If expr_invariant_in_loop_p (loop, match_op[1]) is true, then w/o
swapping it must return NULL_TREE for the conditions below.
If expr_invariant_in_loop_p (loop, match_op[1]) is false, then w/
swapping it must return NULL_TREE too.
So it covers both cases you mentioned; there is no need for a loop to
iterate over the 2 match_ops for all conditions.

  if (TREE_CODE (match_op[1]) != SSA_NAME
      || !expr_invariant_in_loop_p (loop, match_op[0])
      || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[1])))
      || gimple_bb (header_phi) != loop->header
      || gimple_phi_num_args (header_phi) != 2)
    return NULL_TREE;

  if (PHI_ARG_DEF_FROM_EDGE (header_phi, loop_latch_edge (loop)) != phidef)
    return NULL_TREE;
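
For context, the kind of closed form that final value replacement is
after for these loops can be checked with a small standalone C sketch.
The helper names are made up, and this only models the arithmetic; it
says nothing about how the pass itself is structured.

```c
/* Apply "tmp OP= c" N times, the way the loops in the new test do.  */
static unsigned
and_loop (unsigned tmp, unsigned c, int n)
{
  for (int i = 0; i < n; i++)
    tmp &= c;
  return tmp;
}

static unsigned
ior_loop (unsigned tmp, unsigned c, int n)
{
  for (int i = 0; i < n; i++)
    tmp |= c;
  return tmp;
}

static unsigned
xor_loop (unsigned tmp, unsigned c, int n)
{
  for (int i = 0; i < n; i++)
    tmp ^= c;
  return tmp;
}

/* Closed forms: AND and IOR with an invariant are idempotent, so one
   application is enough once the loop runs at all; XOR only depends on
   whether the iteration count is odd.  */
static unsigned
and_closed (unsigned tmp, unsigned c, int n)
{
  return n > 0 ? (tmp & c) : tmp;
}

static unsigned
ior_closed (unsigned tmp, unsigned c, int n)
{
  return n > 0 ? (tmp | c) : tmp;
}

static unsigned
xor_closed (unsigned tmp, unsigned c, int n)
{
  return (n & 1) ? (tmp ^ c) : tmp;
}
```

The constant 11304 in the test is on the right-hand side of the bitop,
which is the operand position the original code did not treat as the
invariant.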


>
> > >
> > >  unsigned i;
> > >  for (i = 0; i < 2; ++i)
> > >if (TREE_CODE (match_op[i]) == SSA_NAME
> > >&& ...)
> > > break; /* found! */
> > >
> > >   if (i == 2)
> > > return NULL_TREE;
> > >   if (i == 0)
> > > std::swap (match_op[0], match_op[1]);
> > >
> > > to also handle a "swapped" pair of SSA names?
> > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR tree-optimization/105735
> > > > PR tree-optimization/111972
> > > > * tree-scalar-evolution.cc
> > > > (analyze_and_compute_bitop_with_inv_effect): Handle bitop with
> > > > INTEGER_CST.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/pr105735-3.c: New test.
> > > > ---
> > > >  gcc/testsuite/gcc.target/i386/pr105735-3.c | 87 ++
> > > >  gcc/tree-scalar-evolution.cc   |  3 +
> > > >  2 files changed, 90 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-3.c
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr105735-3.c 
> > > > b/gcc/testsuite/gcc.target/i386/pr105735-3.c
> > > > new file mode 100644
> > > > index 000..9e268a1a997
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr105735-3.c
> > > > @@ -0,0 +1,87 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O1 -fdump-tree-sccp-details" } */
> > > > +/* { dg-final { scan-tree-dump-times {final value replacement} 8 
> > > > "sccp" } } */
> > > > +
> > > > +unsigned int
> > > > +__attribute__((noipa))
> > > > +foo (unsigned int tmp)
> > > > +{
> > > > +  for (int bit = 0; bit < 64; bit++)
> > > > +tmp &= 11304;
> > > > +  return tmp;
> > > > +}
> > > > +
> > > > +unsigned int
> > > > +__attribute__((noipa))
> > > > +foo1 (unsigned int tmp)
> > > > +{
> > > > +  for (int bit = 63; bit >= 0; bit -=3)
> > > > +tmp &= 11304;
> > > > +  return tmp;
> > > > +}
> > > > +
> > > > +unsigned int
> > > > +__attribute__((noipa))
> > > > +foo2 (unsigned int tmp)
> > > > +{
> > > > +  for (int bit = 0; bit < 64; bit++)
> > > > +tmp |= 11304;
> > > > +  return tmp;
> > > > +}
> > > > +
> > > > +unsigned int
> > > > +__attribute__((noipa))
> > > > +foo3 (unsigned int tmp)
> > > > +{
> > > > +  for (int bit = 63; bit >= 0; bit -=3)
> > > > +tmp |= 11304;
> > > > +  return tmp;
> > > > +}
> > > > +
> > > > +unsigned int
> > > > +__attribute__((noipa))
> > > > +foo4 (unsigned int tmp)
> > > > +{
> > > > +  for (int bit = 0; bit < 64; bit++)
> > > > +tmp ^= 11304;
> > > > +  return tmp;
> > > > +}
> > > > +
> > > > +unsigned int
> > > > +__attribute__((noipa))
> > > > +foo5 (unsigned int tmp)
> > > > +{
> > > > +  for (int bit = 0; bit < 63; bit++)
> > > > +tmp ^= 11304;
> > > > 

Re: [PATCH 1/2] libdiagnostics: header and examples

2023-11-07 Thread David Malcolm
On Tue, 2023-11-07 at 19:02 -0500, Lewis Hyatt wrote:
> On Mon, Nov 6, 2023 at 8:29 PM David Malcolm 
> wrote:
> > 
> > Here's a work-in-progress patch for GCC that adds a libdiagnostics.h
> > header describing the public interface, along with various testcases
> > that show usage examples for the API.  Various aspects of this need
> > work; posting now for early feedback on overall direction.
> > 
> > How does the interface look?
> > 
> ...
> > +typedef unsigned int diagnostic_location_t;
> 
> One comment that occurred to me... for GCC we have a lot of PRs that
> are unhappy about the 32-bit location_t and the consequent issues that
> arise with very large source files, or with very long lines that lose
> column information.
> So far GCC has been able to get by with "don't do that" advice, but a
> more general libdiagnostics may need to avoid that arbitrary
> limitation? I feel like it may not be that long before GCC needs to
> deal with it as well, perhaps with a configure option, but even now,
> it could make sense for libdiagnostics to use a 64-bit location_t
> itself from the outset, so it won't need to change later, even if it's
> practically restricted to 32 bits for now.

That's a good point.

Perhaps the interface should give back a pointer to an opaque type, so
it would be:

  typedef struct diagnostic_location diagnostic_location;

and e.g.

extern const diagnostic_location *
diagnostic_manager_new_location_from_file_and_line (diagnostic_manager *diag_mgr,
                                                    const diagnostic_file *file,
                                                    diagnostic_line_num_t line_num);


where the diagnostic_manager owns the underlying memory.

Dave
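
The opaque-pointer idea above can be sketched in a few lines of C.
Everything named demo_* below is hypothetical and exists only for this
illustration; it is not the proposed libdiagnostics API.

```c
#include <stdlib.h>

/* Clients would see only these forward declarations, so the width of
   whatever is stored inside a location is not part of the ABI.  */
typedef struct demo_location demo_location;
typedef struct demo_manager demo_manager;

/* Private definitions (would live in the library, not the header).  */
struct demo_location
{
  const char *file;
  unsigned long long line;	/* free to be wider than 32 bits */
  demo_location *next;
};

struct demo_manager
{
  demo_location *head;		/* the manager owns every location */
};

/* Create a location owned by MGR; returns NULL on allocation failure.  */
const demo_location *
demo_manager_new_location (demo_manager *mgr, const char *file,
			   unsigned long long line)
{
  demo_location *loc = malloc (sizeof *loc);
  if (!loc)
    return NULL;
  loc->file = file;
  loc->line = line;
  loc->next = mgr->head;
  mgr->head = loc;
  return loc;
}

unsigned long long
demo_location_get_line (const demo_location *loc)
{
  return loc->line;
}

/* Free every location the manager handed out.  */
void
demo_manager_release (demo_manager *mgr)
{
  while (mgr->head)
    {
      demo_location *next = mgr->head->next;
      free (mgr->head);
      mgr->head = next;
    }
}
```

Because clients only ever hold a pointer, the representation behind it
could later change (for example to a wider encoding) without changing
the public typedef.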



Re: [PATCH 10/12] mode-switching: Use 1-based edge aux fields

2023-11-07 Thread Richard Sandiford
Jeff Law  writes:
> On 11/5/23 11:49, Richard Sandiford wrote:
>> The pass used the edge aux field to record which mode change
>> should happen on the edge, with -1 meaning "none".  It's more
>> convenient for later patches to leave aux zero for "none",
>> and use numbers based at 1 to record a change.
>> 
>> gcc/
>>  * mode-switching.cc (commit_mode_sets): Use 1-based edge aux values.
> So my only worry here is the state of the aux field as we enter mode 
> switching.  ISTM the old code would never depend on that previous state 
> since it always initialized eg->aux to -1, then conditionally overwrote 
> that value if there was an insertion.  Then it could clear all the aux 
> fields once a particular entity was resolved.
>
> It appears now that for the first entity we depend on the aux field 
> being clear as we enter mode switching.  Or am I missing something?

Yeah, we'd rely on that.  12/12 would too for all edges.

I could have sworn that there was something that checked that passes
left edge aux fields clear, but it looks like I misremembered.  So I
probably need to stick a clear_aux_for_edges () call above the first
main loop (for 12/12) and keep the initialisation here as well.

That kind-of takes away the point of shifting to 1-based values
in the first place.  Ah well...

Thanks,
Richard
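
The 1-based encoding discussed in this thread can be sketched as
follows.  The struct and helper names are invented for illustration and
do not come from mode-switching.cc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CFG edge with a scratch pointer.  */
struct edge_model
{
  void *aux;
};

/* Record that MODE (numbered from 0) must be set on E.  Storing
   MODE + 1 keeps a null aux free to mean "no change".  */
static void
set_mode_change (struct edge_model *e, int mode)
{
  e->aux = (void *) (intptr_t) (mode + 1);
}

static int
has_mode_change (const struct edge_model *e)
{
  return e->aux != NULL;
}

/* Only meaningful when has_mode_change (e) is true.  */
static int
get_mode_change (const struct edge_model *e)
{
  return (int) (intptr_t) e->aux - 1;
}

/* The scheme relies on aux being clear on entry, hence an explicit
   reset before the first use.  */
static void
clear_aux (struct edge_model *edges, size_t n)
{
  for (size_t i = 0; i < n; i++)
    edges[i].aux = NULL;
}
```

The point raised in review is visible here: mode 0 is distinguishable
from "none" only if every aux field starts out null, which is what the
explicit clear guarantees.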


[committed] testsuite: Rename c2x-*, gnu2x-* tests to c23-*, gnu23-*

2023-11-07 Thread Joseph Myers
Completing the move to refer to C23 in place of C2X, rename all tests
with "c2x" or "gnu2x" in their names to use "c23" or "gnu23" instead.
17 files in the testsuite that referred to such tests (or, in one
case, a generated .i file to be scanned) by those names are updated
for the renaming.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/testsuite/
* gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c: Move to ...
* gcc.dg/atomic/c23-stdatomic-lockfree-char8_t.c: ... here.
* gcc.dg/atomic/c2x-stdatomic-var-init-1.c: Move to ...
* gcc.dg/atomic/c23-stdatomic-var-init-1.c: ... here.
* gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c: Move to ...
* gcc.dg/atomic/gnu23-stdatomic-lockfree-char8_t.c: ... here.
Update reference to moved file.
* gcc.dg/c2x-align-1.c: Move to ...
* gcc.dg/c23-align-1.c: ... here.
* gcc.dg/c2x-align-6.c: Move to ...
* gcc.dg/c23-align-6.c: ... here.
* gcc.dg/c2x-attr-deprecated-1.c: Move to ...
* gcc.dg/c23-attr-deprecated-1.c: ... here.  Update reference to
moved file.
* gcc.dg/c2x-attr-deprecated-2.c: Move to ...
* gcc.dg/c23-attr-deprecated-2.c: ... here.
* gcc.dg/c2x-attr-deprecated-3.c: Move to ...
* gcc.dg/c23-attr-deprecated-3.c: ... here.
* gcc.dg/c2x-attr-deprecated-4.c: Move to ...
* gcc.dg/c23-attr-deprecated-4.c: ... here.
* gcc.dg/c2x-attr-fallthrough-1.c: Move to ...
* gcc.dg/c23-attr-fallthrough-1.c: ... here.
* gcc.dg/c2x-attr-fallthrough-2.c: Move to ...
* gcc.dg/c23-attr-fallthrough-2.c: ... here.
* gcc.dg/c2x-attr-fallthrough-3.c: Move to ...
* gcc.dg/c23-attr-fallthrough-3.c: ... here.
* gcc.dg/c2x-attr-fallthrough-4.c: Move to ...
* gcc.dg/c23-attr-fallthrough-4.c: ... here.
* gcc.dg/c2x-attr-fallthrough-5.c: Move to ...
* gcc.dg/c23-attr-fallthrough-5.c: ... here.
* gcc.dg/c2x-attr-fallthrough-6.c: Move to ...
* gcc.dg/c23-attr-fallthrough-6.c: ... here.
* gcc.dg/c2x-attr-maybe_unused-1.c: Move to ...
* gcc.dg/c23-attr-maybe_unused-1.c: ... here.
* gcc.dg/c2x-attr-maybe_unused-2.c: Move to ...
* gcc.dg/c23-attr-maybe_unused-2.c: ... here.
* gcc.dg/c2x-attr-maybe_unused-3.c: Move to ...
* gcc.dg/c23-attr-maybe_unused-3.c: ... here.
* gcc.dg/c2x-attr-maybe_unused-4.c: Move to ...
* gcc.dg/c23-attr-maybe_unused-4.c: ... here.
* gcc.dg/c2x-attr-nodiscard-1.c: Move to ...
* gcc.dg/c23-attr-nodiscard-1.c: ... here.
* gcc.dg/c2x-attr-nodiscard-2.c: Move to ...
* gcc.dg/c23-attr-nodiscard-2.c: ... here.
* gcc.dg/c2x-attr-nodiscard-3.c: Move to ...
* gcc.dg/c23-attr-nodiscard-3.c: ... here.
* gcc.dg/c2x-attr-nodiscard-4.c: Move to ...
* gcc.dg/c23-attr-nodiscard-4.c: ... here.
* gcc.dg/c2x-attr-noreturn-1.c: Move to ...
* gcc.dg/c23-attr-noreturn-1.c: ... here.
* gcc.dg/c2x-attr-noreturn-2.c: Move to ...
* gcc.dg/c23-attr-noreturn-2.c: ... here.
* gcc.dg/c2x-attr-noreturn-3.c: Move to ...
* gcc.dg/c23-attr-noreturn-3.c: ... here.
* gcc.dg/c2x-attr-syntax-1.c: Move to ...
* gcc.dg/c23-attr-syntax-1.c: ... here.
* gcc.dg/c2x-attr-syntax-2.c: Move to ...
* gcc.dg/c23-attr-syntax-2.c: ... here.
* gcc.dg/c2x-attr-syntax-3.c: Move to ...
* gcc.dg/c23-attr-syntax-3.c: ... here.
* gcc.dg/c2x-attr-syntax-4.c: Move to ...
* gcc.dg/c23-attr-syntax-4.c: ... here.
* gcc.dg/c2x-attr-syntax-5.c: Move to ...
* gcc.dg/c23-attr-syntax-5.c: ... here.
* gcc.dg/c2x-attr-syntax-6.c: Move to ...
* gcc.dg/c23-attr-syntax-6.c: ... here.
* gcc.dg/c2x-attr-syntax-7.c: Move to ...
* gcc.dg/c23-attr-syntax-7.c: ... here.
* gcc.dg/c2x-auto-1.c: Move to ...
* gcc.dg/c23-auto-1.c: ... here.
* gcc.dg/c2x-auto-2.c: Move to ...
* gcc.dg/c23-auto-2.c: ... here.
* gcc.dg/c2x-auto-3.c: Move to ...
* gcc.dg/c23-auto-3.c: ... here.
* gcc.dg/c2x-auto-4.c: Move to ...
* gcc.dg/c23-auto-4.c: ... here.
* gcc.dg/c2x-binary-constants-1.c: Move to ...
* gcc.dg/c23-binary-constants-1.c: ... here.
* gcc.dg/c2x-binary-constants-2.c: Move to ...
* gcc.dg/c23-binary-constants-2.c: ... here.
* gcc.dg/c2x-binary-constants-3.c: Move to ...
* gcc.dg/c23-binary-constants-3.c: ... here.
* gcc.dg/c2x-bool-1.c: Move to ...
* gcc.dg/c23-bool-1.c: ... here.
* gcc.dg/c2x-bool-2.c: Move to ...
* gcc.dg/c23-bool-2.c: ... here.
* gcc.dg/c2x-bool-limits-1.c: Move to ...
* gcc.dg/c23-bool-limits-1.c: ... here.
* gcc.dg/c2x-builtins-1.c: Move to ...
* gcc.dg/c23-builtins-1.c: ... here.
* gcc.dg/c2x-complit-1.c: 

Re: [PATCH 07/12] mode-switching: Allow targets to set the mode for EH handlers

2023-11-07 Thread Richard Sandiford
Thanks for the reviews.

Jeff Law  writes:
> On 11/5/23 11:48, Richard Sandiford wrote:
>> The mode-switching pass already had hooks to say what mode
>> an entity is in on entry to a function and what mode it must
>> be in on return.  For SME, we also want to say what mode an
>> entity is guaranteed to be in on entry to an exception handler.
>> 
>> gcc/
>>  * target.def (mode_switching.eh_handler): New hook.
>>  * doc/tm.texi.in (TARGET_MODE_EH_HANDLER): New @hook.
>>  * doc/tm.texi: Regenerate.
>>  * mode-switching.cc (optimize_mode_switching): Use eh_handler
>>  to get the mode on entry to an exception handler.
> Can we ever have a path to the exception handler triggered by a normal 
> control flow and if so, presumably we want this to apply in that case too?

Not directly AFAIK.  The handler has to handle the EH_DATA_REGNOs,
call __cxa_begin_catch, etc.  So even if there is fall-through at
the source level, I think there'd always be a block that is only
reached through abnormal control flow.  So...

> It looks like that's the semantics of the implementation by way of using 
> bb_has_eh_pred.  Just want to make sure that's the semantics you want in 
> that oddball case.
>
> Assuming it is the semantics you want, it's OK for the trunk, though you 
> might want to twiddle the docs slightly to mention that case.

...I think these EH blocks are pure re-entry points.  I suppose some
targets might have entities whose state is call-preserved, so that it's
not changed by EH edges.  But that might also apply to other abnormal
control flow too, so it's probably a separate issue/feature.

Thanks,
Richard


Re: [PATCH 1/2] libdiagnostics: header and examples

2023-11-07 Thread Lewis Hyatt
On Mon, Nov 6, 2023 at 8:29 PM David Malcolm  wrote:
>
> Here's a work-in-progress patch for GCC that adds a libdiagnostics.h
> header describing the public interface, along with various testcases
> that show usage examples for the API.  Various aspects of this need
> work; posting now for early feedback on overall direction.
>
> How does the interface look?
>
...
> +typedef unsigned int diagnostic_location_t;

One comment that occurred to me... for GCC we have a lot of PRs that
are unhappy about the 32-bit location_t and the consequent issues that
arise with very large source files, or with very long lines that lose
column information.
So far GCC has been able to get by with "don't do that" advice, but a
more general libdiagnostics may need to avoid that arbitrary
limitation? I feel like it may not be that long before GCC needs to
deal with it as well, perhaps with a configure option, but even now,
it could make sense for libdiagnostics to use a 64-bit location_t
itself from the outset, so it won't need to change later, even if it's
practically restricted to 32 bits for now.
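
A minimal sketch of the suggestion above, assuming the typedef is simply
widened in the public header.  Only the name diagnostic_location_t comes
from the quoted patch; the limit macro and both helper functions are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of the quoted typedef: 64 bits wide from the
   outset, so the public ABI would not need to change later.  */
typedef uint64_t diagnostic_location_t;

static_assert (sizeof (diagnostic_location_t) == 8,
               "location handle is expected to be 64 bits wide");

/* Hypothetical limit: the implementation may still hand out only
   values that fit in 32 bits for now.  */
#define DIAGNOSTIC_LOCATION_MAX_FOR_NOW UINT32_MAX

/* Hypothetical helper: widen a value taken from a 32-bit location
   table into the public 64-bit handle.  */
static inline diagnostic_location_t
diagnostic_location_from_u32 (uint32_t internal)
{
  return (diagnostic_location_t) internal;
}

/* Hypothetical helper: check that a handle is within today's
   practical 32-bit range.  */
static inline int
diagnostic_location_in_range_p (diagnostic_location_t loc)
{
  return loc <= DIAGNOSTIC_LOCATION_MAX_FOR_NOW;
}
```

With this shape, callers would only ever store the 64-bit handle, so the
32-bit restriction could presumably be lifted later without an interface
change.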

-Lewis


Re: [C PATCH 6/6] c23: construct composite type for tagged types

2023-11-07 Thread Joseph Myers
On Sat, 26 Aug 2023, Martin Uecker via Gcc-patches wrote:

> @@ -501,9 +510,61 @@ composite_type (tree t1, tree t2)
>   return build_type_attribute_variant (t1, attributes);
>}
>  
> -case ENUMERAL_TYPE:
>  case RECORD_TYPE:
>  case UNION_TYPE:
> +  if (flag_isoc2x && !comptypes_same_p (t1, t2))
> + {
> +   gcc_checking_assert (COMPLETE_TYPE_P (t1) && COMPLETE_TYPE_P (t2));
> +   gcc_checking_assert (comptypes (t1, t2));
> +
> +   /* If a composite type for these two types is already under
> +  construction, return it.  */
> +
> +   for (struct composite_cache *c = cache; c != NULL; c = c->next)
> + if (c->t1 == t1 && c->t2 == t2)
> +return c->composite;
> +
> +   /* Otherwise, create a new type node and link it into the cache.  */
> +
> +   tree n = make_node (code1);
> +   struct composite_cache cache2 = { t1, t2, n, cache };
> +   cache = 
> +
> +   tree f1 = TYPE_FIELDS (t1);
> +   tree f2 = TYPE_FIELDS (t2);
> +   tree fields = NULL_TREE;
> +
> +   for (tree a = f1, b = f2; a && b;
> +a = DECL_CHAIN (a), b = DECL_CHAIN (b))
> + {
> +   tree ta = TREE_TYPE (a);
> +   tree tb = TREE_TYPE (b);
> +
> +   gcc_assert (DECL_NAME (a) == DECL_NAME (b));
> +   gcc_assert (comptypes (ta, tb));
> +
> +   tree f = build_decl (input_location, FIELD_DECL, DECL_NAME (a),
> +composite_type_internal (ta, tb, cache));
> +
> +   DECL_FIELD_CONTEXT (f) = n;
> +   DECL_CHAIN (f) = fields;

There is a lot more per-field setup done in grokdeclarator, grokfield and 
finish_struct when a struct or union is defined.  I'm concerned that just 
calling build_decl here and then missing most of the per-field setup done 
elsewhere will not get the composite type set up correctly, especially in 
cases such as bit-fields and packed structures.

Note that the test you have of bit-fields (c2x-tag-composite-3.c) probably 
doesn't exercise this code, because the two types are the same (defined in 
the same scope, so it would be an error if they weren't the same) and so 
the comptypes_same_p check should short-circuit this code.  You need to 
test such issues in cases where the types are genuinely not the same - and 
for bit-fields, that includes ensuring you cover code paths that depend on 
each of DECL_BIT_FIELD, DECL_C_BIT_FIELD, DECL_BIT_FIELD_TYPE, to make 
sure that all of those are correct.
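
For reference, the cache of in-progress composites in the quoted hunk can
be sketched in isolation as below.  This is a simplified illustration, not
the patch's code: `struct type', its `size' and `next' members, and the
function names are hypothetical stand-ins for GCC's tree nodes, and the
sketch deliberately omits the per-field setup discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a type node.  NEXT plays the role of a
   field whose type may refer back to the enclosing type.  */
struct type
{
  int size;
  struct type *next;
};

/* One entry per composite currently under construction.  Entries live
   on the stack of the recursive calls, as in the quoted hunk.  */
struct composite_cache
{
  const struct type *t1;
  const struct type *t2;
  struct type *composite;
  const struct composite_cache *next;
};

static struct type *
composite_internal (const struct type *t1, const struct type *t2,
                    const struct composite_cache *cache)
{
  if (t1 == NULL || t2 == NULL)
    return NULL;

  /* If a composite for this pair is already under construction,
     return it; this is what terminates recursion on cyclic types.  */
  for (const struct composite_cache *c = cache; c != NULL; c = c->next)
    if (c->t1 == t1 && c->t2 == t2)
      return c->composite;

  /* Otherwise create a new node and link it into the cache before
     recursing into the "field".  */
  struct type *n = calloc (1, sizeof *n);
  assert (n != NULL);
  struct composite_cache cache2 = { t1, t2, n, cache };

  n->size = t1->size > t2->size ? t1->size : t2->size;
  n->next = composite_internal (t1->next, t2->next, &cache2);
  return n;
}

struct type *
composite (const struct type *t1, const struct type *t2)
{
  return composite_internal (t1, t2, NULL);
}
```

Because each cache entry is a local variable of the call that created it,
entries disappear automatically as the recursion unwinds.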

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

2023-11-07 Thread Patrick O'Neill
Thanks for pointing this out Juzhe, we're investigating how the CI got 
confused here. We'll let you know what we find out.


Patrick


On 11/7/23 14:48, 钟居哲 wrote:

Please note that those FAILs are not caused by this patch.
They are caused by this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0c42741ad95af3a1e3ac07350da4c3a94865ed63

It seems that the precommit CI failed to locate the real root cause.


juzhe.zh...@rivai.ai

*From:* Patrick O'Neill 
*Date:* 2023-11-08 03:21
*To:* pan2.li ; gcc-patches

*CC:* juzhe.zhong ; yanzhang.wang
; kito.cheng

*Subject:* Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff
size autovec

Ah sorry for the noise - I just saw that this was resolved with a
subsequent patch:

Precommit run:
https://github.com/ewlu/gcc-precommit-ci/issues/608#issuecomment-1798058721

Patrick

On 11/7/23 11:17, Patrick O'Neill wrote:

Hi Pan,
This patch (9acea4376fd98696ba51e59f417c94911a4d8248) causes
cond_widen_reduc-2.c to start failing on:

linux/newlib: rv32/64gc
linux/newlib: rv32gcv
linux/newlib: rv32/64gc_zba_zbb_zbc_zbs

FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3

Debug log output:

spawn -ignore SIGHUP
  /github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
  -B/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
  /github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
  -fdiagnostics-plain-output -ftree-vectorize -O2 --param
  riscv-autovec-lmul=dynamic -march=rv64gcv_zvfh_zvl128b
  -mabi=lp64d --param riscv-autovec-preference=scalable --param
  riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math
  -ffat-lto-objects -fno-ident -S -o cond_widen_reduc-2.s
PASS: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c (test for excess errors)
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
Executing on host:
  /github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
  -B/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
  /github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow
  -fdiagnostics-plain-output -ftree-vectorize -O2 --param
  riscv-autovec-lmul=dynamic --param
  riscv-autovec-preference=fixed-vlmax --param
  riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math -lm -o
  ./cond_widen_reduc_run-1.exe (timeout = 600)

These failures are still on trunk (b7d05f13e86bf49bfb78c9876deba388efc6082e).

Thanks,
Patrick

Postcommit CI bisection:
https://github.com/patrick-rivos/gcc-postcommit-ci/issues/130

On 11/5/23 01:30,pan2...@intel.com  wrote:

From: Pan Li
This patch would like to support the FP below API auto vectorization
with different type size
+---------+-----------+----------+
| API     | RV64      | RV32     |
+---------+-----------+----------+
| irint   | DF => SI  | DF => SI |
| irintf  | -         | -        |
| lrint   | -         | DF => SI |
| 

Re: [C PATCH 4/6] c23: tag compatibility rules for enums

2023-11-07 Thread Joseph Myers
On Sat, 26 Aug 2023, Martin Uecker via Gcc-patches wrote:

> Allow redefinition of enum types and enumerators.
> 
> gcc/c:
>   * c-decl.cc (start_num): Allow redefinition.

start_enum not start_num.

> @@ -9606,9 +9624,15 @@ start_enum (location_t loc, struct c_enum_contents 
> *the_enum, tree name,
>if (name != NULL_TREE)
>  enumtype = lookup_tag (ENUMERAL_TYPE, name, true, );
>  
> +  if (flag_isoc2x && enumtype != NULL_TREE
> +  && TREE_CODE (enumtype) == ENUMERAL_TYPE
> +  && TYPE_VALUES (enumtype) != NULL_TREE)
> +enumtype = NULL_TREE;

Much the same comment applies as on the struct/union patch regarding 
ensuring nested redefinitions are detected when there's a previous 
definition outside the two nested definitions, in addition to the point 
there about making sure that a definition nested inside an enum type 
specifier for another definition of the same enum gets detected.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Given the introduction of system registers associated with the Guarded
> Control Stack extension to Armv9.4-a in Binutils and their reliance on
> the `+gcs' modifier, we implement the necessary changes in GCC to
> allow for them to be recognized by the compiler.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-option-extensions.def (gcs): New.
>   * config/aarch64/aarch64.h (AARCH64_ISA_GCS): New.
>   (TARGET_GCS):  Likewise.
>   * doc/invoke.texi (AArch64 Options): Describe GCS.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-option-extensions.def | 2 ++
>  gcc/config/aarch64/aarch64.h | 6 ++
>  gcc/doc/invoke.texi  | 2 ++
>  3 files changed, 10 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index da31f7c32d1..e72c039b612 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -155,4 +155,6 @@ AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")
>  
>  AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
>  
> +AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
> +
>  #undef AARCH64_OPT_EXTENSION
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 1b3c800ec89..69ef54553d7 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -230,6 +230,7 @@ enum class aarch64_feature : unsigned char {
>  #define AARCH64_ISA_CSSC(aarch64_isa_flags & AARCH64_FL_CSSC)
>  #define AARCH64_ISA_D128(aarch64_isa_flags & AARCH64_FL_D128)
>  #define AARCH64_ISA_THE (aarch64_isa_flags & AARCH64_FL_THE)
> +#define AARCH64_ISA_GCS (aarch64_isa_flags & AARCH64_FL_GCS)
>  
>  /* AARCH64_FL options necessary for system register implementation.  */
>  
> @@ -403,6 +404,11 @@ enum class aarch64_feature : unsigned char {
>  enabled through +the.  */
>  #define TARGET_THE (AARCH64_ISA_THE)
>  
> +/*  Armv9.4-A Guarded Control Stack extension system registers are
> +enabled through +gcs.  */
> +#define TARGET_GCS (AARCH64_ISA_GCS)
> +
> +
>  /* Standard register usage.  */
>  
>  /* 31 64-bit general purpose registers R0-R30:
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 88327ce9681..88ee1fdb524 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21032,6 +21032,8 @@ Enable the Pointer Authentication Extension.
>  Enable the Common Short Sequence Compression instructions.
>  @item d128
>  Enable support for 128-bit system register read/write instructions.
> +@item gcs
> +Enable support for Armv9.4-a Guarded Control Stack extension.
>  @item the
>  Enable support for Armv8.9-a/9.4-a translation hardening extension.


Re: [C PATCH 3/6] c23: tag compatibility rules for struct and unions

2023-11-07 Thread Joseph Myers
On Sat, 26 Aug 2023, Martin Uecker via Gcc-patches wrote:

>   types (convert_for_assignment): Ingore qualifiers.

"Ignore".

> @@ -1993,6 +1993,24 @@ locate_old_decl (tree decl)
>   decl, TREE_TYPE (decl));
>  }
>  
> +static tree
> +previous_tag (tree type)

This function needs a comment documenting its semantics.

> @@ -8651,6 +8672,12 @@ start_struct (location_t loc, enum tree_code code, 
> tree name,
>  
>if (name != NULL_TREE)
>  ref = lookup_tag (code, name, true, );
> +
> +  /* For C2X, even if we already have a completed definition,
> + we do not use it. We will check for consistency later.  */
> +  if (flag_isoc2x && ref && TYPE_SIZE (ref))
> +ref = NULL_TREE;
> +
>if (ref && TREE_CODE (ref) == code)
>  {
>if (TYPE_STUB_DECL (ref))

This comes before the check for nested redefinitions (which are still 
invalid) - so meaning that, if ref is set to NULL_TREE here, the check 
for nested redefinitions won't apply.

You have a testcase for nested redefinitions in a slightly different case 
(where the struct's first definition hasn't finished when the nested 
definition is encountered).  But what about the case where: first, the 
struct gets defined; then, in the same scope, it gets redefined, with the 
redefinition containing a nested redefinition?  I don't see anything here 
to detect that case of nested redefinitions.

For enums, note that nested redefinitions include cases where the nesting 
is inside an enum type specifier (currently diagnosed by GCC following an 
ordinary redefinition path, not one for nested definitions).

typedef __SIZE_TYPE__ size_t;
enum e : typeof (sizeof (enum e : size_t { A })) { A };

is invalid because the definitions of enum e are nested, so should be 
diagnosed, and there should be a test that it is.

> @@ -8315,6 +8332,13 @@ digest_init (location_t init_loc, tree type, tree 
> init, tree origtype,
>  conversion.  */
>   inside_init = convert (type, inside_init);
>  
> +  if ((code == RECORD_TYPE || code == UNION_TYPE)
> +   && !comptypes (TYPE_MAIN_VARIANT (type), TYPE_MAIN_VARIANT (TREE_TYPE 
> (inside_init
> + {
> +   error_init (init_loc, "invalid initializer %qT %qT", type, TREE_TYPE 
> (inside_init));
> +   return error_mark_node;
> + }

I'd expect some words between the two type names, or explaining how they 
relate to the initialization, rather than just two type names in 
succession with no explanation of what's the type of the initializer and 
what's the type of the object being initialized.

> diff --git a/gcc/testsuite/gcc.dg/c2x-tag-1.c 
> b/gcc/testsuite/gcc.dg/c2x-tag-1.c

> +struct r { int a; char b[0]; };

I tend to think tests such as this, involving GNU extensions ([0] arrays), 
should go in gnu23-* tests not c23-* ones.

(I'm currently testing the final C2X -> C23 patch, that renames existing 
tests.  The next revision of this patch series will need updating for the 
renaming in both file names and file contents.)

> +++ b/gcc/testsuite/gcc.dg/c2x-tag-10.c

This is definitely a GNU extensions test (VLAs in structures).

> +++ b/gcc/testsuite/gcc.dg/c2x-tag-4.c

Another GNU extensions test (GNU attributes).

> diff --git a/gcc/testsuite/gcc.dg/c2x-tag-7.c 
> b/gcc/testsuite/gcc.dg/c2x-tag-7.c

Another GNU extensions test (VLAs in structures).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Given the introduction of optional 128-bit page table descriptor and
> translation hardening extension support with the Armv9.4-a
> architecture, this introduces the relevant flags to enable the reading
> and writing of 128-bit system registers.
>
> The `+d128' -march modifier enables the use of the following ACLE
> builtin functions:
>
>   * __uint128_t __arm_rsr128(const char *special_register);
>   * void __arm_wsr128(const char *special_register, __uint128_t value);
>
> and defines the __ARM_FEATURE_SYSREG128 macro to 1.
>
> Finally, the `rcwmask_el1' and `rcwsmask_el1' 128-bit system register
> implementations are also reliant on the enablement of the `+the' flag,
> which is thus also implemented in this patch.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-arches.def (armv8.9-a): New.
>   (armv9.4-a): Likewise.
>   * config/aarch64/aarch64-option-extensions.def (d128): Likewise.
>   (the): Likewise.
>   * config/aarch64/aarch64.h (AARCH64_ISA_V9_4A): Likewise.
>   (AARCH64_ISA_V8_9A): Likewise.
>   (TARGET_ARMV9_4): Likewise.
>   (AARCH64_ISA_D128): Likewise.
>   (AARCH64_ISA_THE): Likewise.
>   (TARGET_D128): Likewise.
>   * doc/invoke.texi (AArch64 Options): Document new -march flags
>   and extensions.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-arches.def|  2 ++
>  gcc/config/aarch64/aarch64-c.cc  |  1 +
>  gcc/config/aarch64/aarch64-option-extensions.def |  4 
>  gcc/config/aarch64/aarch64.h | 15 +++
>  gcc/doc/invoke.texi  |  6 ++
>  5 files changed, 28 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-arches.def 
> b/gcc/config/aarch64/aarch64-arches.def
> index 7ae92aa8e98..becccb801d0 100644
> --- a/gcc/config/aarch64/aarch64-arches.def
> +++ b/gcc/config/aarch64/aarch64-arches.def
> @@ -39,10 +39,12 @@ AARCH64_ARCH("armv8.5-a", generic,   V8_5A, 
> 8,  (V8_4A, SB, SSBS, PR
>  AARCH64_ARCH("armv8.6-a", generic,   V8_6A, 8,  (V8_5A, I8MM, 
> BF16))
>  AARCH64_ARCH("armv8.7-a", generic,   V8_7A, 8,  (V8_6A, LS64))
>  AARCH64_ARCH("armv8.8-a", generic,   V8_8A, 8,  (V8_7A, MOPS))
> +AARCH64_ARCH("armv8.9-a", generic,   V8_9A, 8,  (V8_8A))
>  AARCH64_ARCH("armv8-r",   generic,   V8R  , 8,  (V8_4A))
>  AARCH64_ARCH("armv9-a",   generic,   V9A  , 9,  (V8_5A, SVE2))
>  AARCH64_ARCH("armv9.1-a", generic,   V9_1A, 9,  (V8_6A, V9A))
>  AARCH64_ARCH("armv9.2-a", generic,   V9_2A, 9,  (V8_7A, V9_1A))
>  AARCH64_ARCH("armv9.3-a", generic,   V9_3A, 9,  (V8_8A, V9_2A))
> +AARCH64_ARCH("armv9.4-a", generic,   V9_4A, 9,  (V8_9A, V9_3A))
>  
>  #undef AARCH64_ARCH
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index be8b7236cf9..cacf8e8ed25 100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -206,6 +206,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>aarch64_def_or_undef (TARGET_LS64,
>   "__ARM_FEATURE_LS64", pfile);
>aarch64_def_or_undef (AARCH64_ISA_RCPC, "__ARM_FEATURE_RCPC", pfile);
> +  aarch64_def_or_undef (TARGET_D128, "__ARM_FEATURE_SYSREG128", pfile);
>  
>/* Not for ACLE, but required to keep "float.h" correct if we switch
>   target between implementations that do or do not support ARMv8.2-A
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index 825f3bf7758..da31f7c32d1 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -151,4 +151,8 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
>  
>  AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
>  
> +AARCH64_OPT_EXTENSION("d128", D128, (), (), (), "d128")
> +
> +AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
> +
>  #undef AARCH64_OPT_EXTENSION
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 84e6f79ca83..1b3c800ec89 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -219,13 +219,17 @@ enum class aarch64_feature : unsigned char {
>  #define AARCH64_ISA_PAUTH   (aarch64_isa_flags & AARCH64_FL_PAUTH)
>  #define AARCH64_ISA_V8_7A   (aarch64_isa_flags & AARCH64_FL_V8_7A)
>  #define AARCH64_ISA_V8_8A   (aarch64_isa_flags & AARCH64_FL_V8_8A)
> +#define AARCH64_ISA_V8_9A   (aarch64_isa_flags & AARCH64_FL_V8_9A)
>  #define AARCH64_ISA_V9A (aarch64_isa_flags & AARCH64_FL_V9A)
>  #define AARCH64_ISA_V9_1A  (aarch64_isa_flags & AARCH64_FL_V9_1A)
>  #define AARCH64_ISA_V9_2A  (aarch64_isa_flags & AARCH64_FL_V9_2A)
>  #define AARCH64_ISA_V9_3A  (aarch64_isa_flags & AARCH64_FL_V9_3A)
> +#define AARCH64_ISA_V9_4A  (aarch64_isa_flags & 

Re: [PATCH 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento  writes:
> This patch updates `aarch64-sys-regs.def', bringing it into sync with
> the Binutils source.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sys-regs.def (par_el1): New.
>   (rcwmask_el1): Likewise.
>   (rcwsmask_el1): Likewise.
>   (ttbr0_el1): Likewise.
>   (ttbr0_el12): Likewise.
>   (ttbr0_el2): Likewise.
>   (ttbr1_el1): Likewise.
>   (ttbr1_el12): Likewise.
>   (ttbr1_el2): Likewise.
>   (vttbr_el2): Likewise.
>   (gcspr_el0): Likewise.
>   (gcspr_el1): Likewise.
>   (gcspr_el12): Likewise.
>   (gcspr_el2): Likewise.
>   (gcspr_el3): Likewise.
>   (gcscre0_el1): Likewise.
>   (gcscr_el1): Likewise.
>   (gcscr_el12): Likewise.
>   (gcscr_el2): Likewise.
>   (gcscr_el3): Likewise.

LGTM.  Process-wise, I think we should consider simple copies of this file
from binutils to be pre-approved/obvious.

Thanks,
Richard

> ---
>  gcc/config/aarch64/aarch64-sys-regs.def | 30 +
>  1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sys-regs.def 
> b/gcc/config/aarch64/aarch64-sys-regs.def
> index d24a2455503..96bdadb0b0f 100644
> --- a/gcc/config/aarch64/aarch64-sys-regs.def
> +++ b/gcc/config/aarch64/aarch64-sys-regs.def
> @@ -419,6 +419,16 @@
>SYSREG ("fpcr",CPENC (3,3,4,4,0),  0,  
> AARCH64_NO_FEATURES)
>SYSREG ("fpexc32_el2", CPENC (3,4,5,3,0),  0,  
> AARCH64_NO_FEATURES)
>SYSREG ("fpsr",CPENC (3,3,4,4,1),  0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("gcspr_el0",   CPENC (3,3,2,5,1),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcspr_el1",   CPENC (3,0,2,5,1),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcspr_el2",   CPENC (3,4,2,5,1),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcspr_el12",  CPENC (3,5,2,5,1),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcspr_el3",   CPENC (3,6,2,5,1),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcscre0_el1", CPENC (3,0,2,5,2),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcscr_el1",   CPENC (3,0,2,5,0),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcscr_el2",   CPENC (3,4,2,5,0),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcscr_el12",  CPENC (3,5,2,5,0),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
> +  SYSREG ("gcscr_el3",   CPENC (3,6,2,5,0),  F_ARCHEXT,  
> AARCH64_FEATURE (GCS))
>SYSREG ("gcr_el1", CPENC (3,0,1,0,6),  F_ARCHEXT,  
> AARCH64_FEATURE (MEMTAG))
>SYSREG ("gmid_el1",CPENC (3,1,0,0,4),  
> F_REG_READ|F_ARCHEXT,   AARCH64_FEATURE (MEMTAG))
>SYSREG ("gpccr_el3",   CPENC (3,6,2,1,6),  0,  
> AARCH64_NO_FEATURES)
> @@ -584,7 +594,7 @@
>SYSREG ("oslar_el1",   CPENC (2,0,1,0,4),  F_REG_WRITE,
> AARCH64_NO_FEATURES)
>SYSREG ("oslsr_el1",   CPENC (2,0,1,1,4),  F_REG_READ, 
> AARCH64_NO_FEATURES)
>SYSREG ("pan", CPENC (3,0,4,2,3),  F_ARCHEXT,  
> AARCH64_FEATURE (PAN))
> -  SYSREG ("par_el1", CPENC (3,0,7,4,0),  0,  
> AARCH64_NO_FEATURES)
> +  SYSREG ("par_el1", CPENC (3,0,7,4,0),  F_REG_128,  
> AARCH64_NO_FEATURES)
>SYSREG ("pmbidr_el1",  CPENC (3,0,9,10,7), 
> F_REG_READ|F_ARCHEXT,   AARCH64_FEATURE (PROFILE))
>SYSREG ("pmblimitr_el1",   CPENC (3,0,9,10,0), F_ARCHEXT,  
> AARCH64_FEATURE (PROFILE))
>SYSREG ("pmbptr_el1",  CPENC (3,0,9,10,1), F_ARCHEXT,  
> AARCH64_FEATURE (PROFILE))
> @@ -746,6 +756,8 @@
>SYSREG ("prlar_el2",   CPENC (3,4,6,8,1),  F_ARCHEXT,  
> AARCH64_FEATURE (V8R))
>SYSREG ("prselr_el1",  CPENC (3,0,6,2,1),  F_ARCHEXT,  
> AARCH64_FEATURE (V8R))
>SYSREG ("prselr_el2",  CPENC (3,4,6,2,1),  F_ARCHEXT,  
> AARCH64_FEATURE (V8R))
> +  SYSREG ("rcwmask_el1", CPENC (3,0,13,0,6), F_ARCHEXT|F_REG_128,
> AARCH64_FEATURE (THE))
> +  SYSREG ("rcwsmask_el1",CPENC (3,0,13,0,3), F_ARCHEXT|F_REG_128,
> AARCH64_FEATURE (THE))
>SYSREG ("revidr_el1",  CPENC (3,0,0,0,6),  F_REG_READ, 
> AARCH64_NO_FEATURES)
>SYSREG ("rgsr_el1",CPENC (3,0,1,0,5),  F_ARCHEXT,  
> AARCH64_FEATURE (MEMTAG))
>SYSREG ("rmr_el1", CPENC (3,0,12,0,2), 0,  
> AARCH64_NO_FEATURES)
> @@ -1034,13 +1046,13 @@
>SYSREG ("trfcr_el1",  

Re: [PATCH 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Implement the ACLE builtins for 128-bit system register manipulation:
>
>   * __uint128_t __arm_rsr128(const char *special_register);
>   * void __arm_wsr128(const char *special_register, __uint128_t value);
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New
>   `enum aarch64_builtins' value.
>   (AARCH64_WSR128): Likewise.
>   (aarch64_init_rwsr_builtins): Init `__builtin_aarch64_rsr128'
>   and `__builtin_aarch64_wsr128' builtins.
>   (aarch64_expand_rwsr_builtin): Extend function to handle
>   `__builtin_aarch64_{rsr|wsr}128'.
>   * config/aarch64/aarch64-protos.h (aarch64_retrieve_sysreg):
>   Update function signature.
>   * config/aarch64/aarch64.cc (F_REG_128): New.
>   (aarch64_retrieve_sysreg): Add 128-bit register mode check.
>   * config/aarch64/aarch64.md (UNSPEC_SYSREG_RTI): New.
>   (UNSPEC_SYSREG_WTI): Likewise.
>   (aarch64_read_sysregti): Likewise.
>   (aarch64_write_sysregti): Likewise.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc | 50 +-
>  gcc/config/aarch64/aarch64-protos.h|  2 +-
>  gcc/config/aarch64/aarch64.cc  |  6 +++-
>  gcc/config/aarch64/aarch64.md  | 18 ++
>  gcc/config/aarch64/arm_acle.h  | 11 ++
>  5 files changed, 77 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index c5f20f68bca..40d3788b5e0 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -815,11 +815,13 @@ enum aarch64_builtins
>AARCH64_RSR64,
>AARCH64_RSRF,
>AARCH64_RSRF64,
> +  AARCH64_RSR128,
>AARCH64_WSR,
>AARCH64_WSRP,
>AARCH64_WSR64,
>AARCH64_WSRF,
>AARCH64_WSRF64,
> +  AARCH64_WSR128,
>AARCH64_BUILTIN_MAX
>  };
>  
> @@ -1842,6 +1844,10 @@ aarch64_init_rwsr_builtins (void)
>  = build_function_type_list (double_type_node, const_char_ptr_type, NULL);
>AARCH64_INIT_RWSR_BUILTINS_DECL (RSRF64, rsrf64, fntype);
>  
> +  fntype
> += build_function_type_list (uint128_type_node, const_char_ptr_type, 
> NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (RSR128, rsr128, fntype);
> +
>fntype
>  = build_function_type_list (void_type_node, const_char_ptr_type,
>   uint32_type_node, NULL);
> @@ -1867,6 +1873,12 @@ aarch64_init_rwsr_builtins (void)
>  = build_function_type_list (void_type_node, const_char_ptr_type,
>   double_type_node, NULL);
>AARCH64_INIT_RWSR_BUILTINS_DECL (WSRF64, wsrf64, fntype);
> +
> +  fntype
> += build_function_type_list (void_type_node, const_char_ptr_type,
> + uint128_type_node, NULL);
> +  AARCH64_INIT_RWSR_BUILTINS_DECL (WSR128, wsr128, fntype);
> +
>  }
>  
>  /* Initialize the memory tagging extension (MTE) builtins.  */
> @@ -2710,6 +2722,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
> fcode)
>tree arg0, arg1;
>rtx const_str, input_val, subreg;
>enum machine_mode mode;
> +  enum insn_code icode;
>class expand_operand ops[2];
>  
>arg0 = CALL_EXPR_ARG (exp, 0);
> @@ -2718,7 +2731,18 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
> fcode)
>  || fcode == AARCH64_WSRP
>  || fcode == AARCH64_WSR64
>  || fcode == AARCH64_WSRF
> -|| fcode == AARCH64_WSRF64);
> +|| fcode == AARCH64_WSRF64
> +|| fcode == AARCH64_WSR128);
> +
> +  bool op128 = (fcode == AARCH64_RSR128 || fcode == AARCH64_WSR128);
> +  enum machine_mode sysreg_mode = op128 ? TImode : DImode;
> +
> +  if (op128 && !TARGET_D128)
> +{
> +  error_at (EXPR_LOCATION (exp), "128-bit system register suppport 
> requires "
> +  "the +d128 Armv9.4-A extension");

Elsewhere we've put feature names in quotes, since they're code
or code-adjacent.  Probably also best to drop Armv9.4-A part,
since the requirement is tied only to +d128.  So:

  error_at (EXPR_LOCATION (exp),
"128-bit system register suppport requires the %"
" extension");

(formatted that way to avoid a long line).
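
Since the 128-bit builtins are rejected without +d128, user code would
typically guard on the __ARM_FEATURE_SYSREG128 macro that patch 1/5 of
this series defines.  The following is a minimal sketch under that
assumption; the wrapper names read_par_el1_128 and have_sysreg128 are
hypothetical, and par_el1 is used only because the sys-regs patch in this
series marks it F_REG_128.

```c
#if defined (__ARM_FEATURE_SYSREG128)
#include <arm_acle.h>

/* Hypothetical wrapper around the ACLE builtin.  It is only defined
   here, not called: reading par_el1 needs EL1 privilege.  */
static inline __uint128_t
read_par_el1_128 (void)
{
  return __arm_rsr128 ("par_el1");
}
#endif

/* Hypothetical helper: report at run time whether this translation
   unit was compiled with the 128-bit sysreg builtins available.  */
static inline int
have_sysreg128 (void)
{
#if defined (__ARM_FEATURE_SYSREG128)
  return 1;
#else
  return 0;
#endif
}
```

On targets or option sets without +d128 the wrapper is compiled out, so
the error discussed above is never reached.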

> +  return const0_rtx;
> +}
>  
>/* Argument 0 (system register name) must be a string literal.  */
>gcc_assert (TREE_CODE (arg0) == ADDR_EXPR
> @@ -2741,7 +2765,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int 
> fcode)
>  sysreg_name[pos] = TOLOWER (sysreg_name[pos]);
>  
>const char* name_output = aarch64_retrieve_sysreg ((const char *) 
> sysreg_name,
> -  write_op);
> +  write_op, op128);
>if (name_output == NULL)
>  {
>error_at (EXPR_LOCATION (exp), "invalid system register name 
> provided");
> @@ -2760,13 +2784,17 @@ 

RE: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch

2023-11-07 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, November 7, 2023 3:04 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH 5/21]middle-end: update vectorizer's control update to
> support picking an exit other than loop latch
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > As requested, the vectorizer is now free to pick its own exit which
> > can be different than what the loop CFG infrastructure uses.  The
> > vectorizer makes use of this to vectorize loops that it previously could
> > not.
> >
> > But this means that loop control must be materialized in the block
> > that needs it lest we corrupt the SSA chain.  This makes it so we use
> > the vectorizer's main IV block instead of the loop infra.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-loop-manip.cc (standard_iv_increment_position):
> Conditionally
> > take dest BB.
> > * tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
> > * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
> > (vect_set_loop_condition_partial_vectors_avx512): Likewise.
> > (vect_set_loop_condition_normal): Likewise.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> > index
> >
> bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00
> 301423df1
> > 11cbe7bf7ba8 100644
> > --- a/gcc/tree-ssa-loop-manip.h
> > +++ b/gcc/tree-ssa-loop-manip.h
> > @@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge, bool
> > = false);  extern basic_block ip_end_pos (class loop *);  extern
> > basic_block ip_normal_pos (class loop *);  extern void
> > standard_iv_increment_position (class loop *,
> > -   gimple_stmt_iterator *, bool *);
> > +   gimple_stmt_iterator *, bool *,
> > +   basic_block = NULL);
> >  extern bool
> >  gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned
> int,
> >sbitmap, edge, vec *, int); 
> > diff
> --git
> > a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index
> >
> e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5d
> de6c294
> > 92ce4d4e4550 100644
> > --- a/gcc/tree-ssa-loop-manip.cc
> > +++ b/gcc/tree-ssa-loop-manip.cc
> > @@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
> >
> >  /* Stores the standard position for induction variable increment in LOOP
> > (just before the exit condition if it is available and latch block is empty,
> > -   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
> > -   the increment should be inserted after *BSI.  */
> > +   end of the latch block otherwise) to BSI.  If DEST_BB is specified then that
> > +   basic block is used as the destination instead of the loop latch source
> > +   block.  INSERT_AFTER is set to true if the increment should be inserted after
> > +   *BSI.  */
> >
> >  void
> >  standard_iv_increment_position (class loop *loop, gimple_stmt_iterator *bsi,
> > -   bool *insert_after)
> > +   bool *insert_after, basic_block dest_bb)
> >  {
> > -  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos (loop);
> > +  basic_block bb = dest_bb;
> > +  if (!bb)
> > +bb = ip_normal_pos (loop);
> > +  basic_block latch = ip_end_pos (loop);
> 
> I don't think that's a good API extension.  Given that we don't support an early
> exit after the main IV exit, doesn't this code already work fine as-is?  It chooses
> the last exit.  The position is also not semantically relevant, we just try to keep
> the latch empty here (that is, it's a bit of a "bad" API).
> 
> So, do you really need this change?

I'll double check.  I remember needing it to fix an ICE before, but I also later
re-did the way the alternative main exits were handled.  At the end of the series,
as I was writing the cover letter, this change also seemed... off to me; I should
have checked it again before submitting it.

> 
> Maybe we're really using standard_iv_increment_position wrong here, the
> result is supposed to _only_ feed the PHI latch argument.

Could be; this was needed before I changed the way I handled the IV updates for
the alternate exit loops.  I'll double check and drop it if not needed.

Thanks,
Tamar

> 
> Richard.
> 
> >gimple *last = last_nondebug_stmt (latch);
> >
> >if (!bb
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index 6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a6465a547d1ea2d3d940373 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop 

Re: [C PATCH 2/6] c23: recursive type checking of tagged type

2023-11-07 Thread Joseph Myers
On Sat, 26 Aug 2023, Martin Uecker via Gcc-patches wrote:

> Adapt the old and unused code for type checking for C23.
> 
> gcc/c/:
>   * c-typeck.c (struct comptypes_data): Add anon_field flag.
>   (comptypes, comptypes_check_unum_int,
>   comptypes_check_different_types): Remove old cache.
>   (tagged_tu_types_compatible_p): Rewrite.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Extend existing unit tests for the ACLE system register manipulation
> functions to include 128-bit tests.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
>   (set_wsr128): Likewise.
> ---
>  gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 30 +++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
> index 3af4b960306..e7725022316 100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr.c
> @@ -1,11 +1,15 @@
>  /* Test the __arm_[r,w]sr ACLE intrinsics family.  */
>  /* Check that function variants for different data types handle types 
> correctly.  */
>  /* { dg-do compile } */
> -/* { dg-options "-O1 -march=armv8.4-a" } */
> +/* { dg-options "-O1 -march=armv9.4-a+d128" } */
>  /* { dg-final { check-function-bodies "**" "" } } */

I'm nervous about having our only tests for 64-bit reads and writes
using such a high minimum version.  Could the file instead be compiled
without any minimum architecture and have tests that work with plain
-march=armv8-a?  Then the test could switch to other architectures
where necessary using #pragma GCC target.  This test...

>  #include 
>  
> +#ifndef __ARM_FEATURE_SYSREG128
> +#error "__ARM_FEATURE_SYSREG128 feature macro not defined."
> +#endif
> +

...would still work with a #pragma GCC target.

Thanks,
Richard

>  /*
>  ** get_rsr:
>  ** ...
> @@ -66,6 +70,17 @@ get_rsrf64 ()
>return __arm_rsrf64("trcseqstr");
>  }
>  
> +/*
> +** get_rsr128:
> +**   mrrs    x0, x1, s3_0_c7_c4_0
> +** ...
> +*/
> +__uint128_t
> +get_rsr128 ()
> +{
> +  __arm_rsr128("par_el1");
> +}
> +
>  /*
>  ** set_wsr32:
>  ** ...
> @@ -129,6 +144,18 @@ set_wsrf64(double a)
>__arm_wsrf64("trcseqstr", a);
>  }
>  
> +/*
> +** set_wsr128:
> +** ...
> +**   msrr    s3_0_c7_c4_0, x0, x1
> +** ...
> +*/
> +void
> +set_wsr128 (__uint128_t c)
> +{
> +  __arm_wsr128 ("par_el1", c);
> +}
> +
>  /*
>  ** set_custom:
>  ** ...
> @@ -142,3 +169,4 @@ void set_custom()
>__uint64_t b = __arm_rsr64("S1_2_C3_C4_5");
>__arm_wsr64("S1_2_C3_C4_5", b);
>  }
> +
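
The suggestion above, building the file for the baseline and switching architecture only where a test needs it, could look roughly like the sketch below. It is not the reworked test. As an assumption, "+crc" and __ARM_FEATURE_CRC32 stand in for "+d128" and __ARM_FEATURE_SYSREG128, since the latter need a compiler with this series applied, and built_with_target_pragma is a hypothetical helper. On anything other than GCC for AArch64 the sketch compiles to a stub.

```cpp
// Sketch: switch the target for one section of a translation unit with
// #pragma GCC target, leaving the rest of the file at the baseline -march.
// Assumption: "+crc" / __ARM_FEATURE_CRC32 are stand-ins for
// "+d128" / __ARM_FEATURE_SYSREG128 from the patch under review.

#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)

#pragma GCC push_options
#pragma GCC target ("+crc")

// The feature macro is expected to follow the pragma, so a check like the
// one in the patch can stay in the file.
#ifndef __ARM_FEATURE_CRC32
#error "__ARM_FEATURE_CRC32 feature macro not defined."
#endif

// Tests that need the extension would be placed in this section.
static const bool extension_section_compiled = true;

#pragma GCC pop_options

#else

// Not GCC for AArch64: nothing to switch, keep the sketch compilable.
static const bool extension_section_compiled = false;

#endif

// Hypothetical helper: reports whether the pragma-guarded section was built.
bool built_with_target_pragma() { return extension_section_compiled; }
```

With this layout the 64-bit read and write tests outside the push/pop pair would still be compiled for whatever -march the dg-options line gives.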


Re: Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

2023-11-07 Thread 钟居哲
Please note those FAILs are not caused by this patch.
They are caused by this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0c42741ad95af3a1e3ac07350da4c3a94865ed63

It seems that precommit CI failed to locate the real root cause.



juzhe.zh...@rivai.ai
 
From: Patrick O'Neill
Date: 2023-11-08 03:21
To: pan2.li; gcc-patches
CC: juzhe.zhong; yanzhang.wang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec
Ah sorry for the noise - I just saw that this was resolved with a subsequent 
patch:

Precommit run: 
https://github.com/ewlu/gcc-precommit-ci/issues/608#issuecomment-1798058721

Patrick
On 11/7/23 11:17, Patrick O'Neill wrote:
Hi Pan,
This patch (9acea4376fd98696ba51e59f417c94911a4d8248) causes 
cond_widen_reduc-2.c to start failing on:
linux/newlib: rv32/64gc
linux/newlib: rv32gcv
linux/newlib: rv32/64gc_zba_zbb_zbc_zbs
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c 
scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c 
scan-assembler-times \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c 
scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
Debug log output:
spawn -ignore SIGHUP 
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
 -march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output 
-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic 
-march=rv64gcv_zvfh_zvl128b -mabi=lp64d --param 
riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2 
-fno-vect-cost-model -ffast-math -ffat-lto-objects -fno-ident -S -o 
cond_widen_reduc-2.s
PASS: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c (test for excess 
errors)
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: 
\\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c 
scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: 
\\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c 
scan-assembler-times \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: 
\\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c 
scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
Executing on host: 
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
  
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
  -march=rv32gcv -mabi=ilp32d -mcmodel=medlow   -fdiagnostics-plain-output   
-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic --param 
riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 
-fno-vect-cost-model -ffast-math  -lm  -o ./cond_widen_reduc_run-1.exe
(timeout = 600)
These failures are still on trunk (b7d05f13e86bf49bfb78c9876deba388efc6082e).
Thanks,
Patrick
Postcommit CI bisection: 
https://github.com/patrick-rivos/gcc-postcommit-ci/issues/130

On 11/5/23 01:30, pan2...@intel.com wrote:
From: Pan Li 
This patch would like to support the FP below API auto vectorization
with different type size
+---------+-----------+----------+
| API     | RV64      | RV32     |
+---------+-----------+----------+
| irint   | DF => SI  | DF => SI |
| irintf  | -         | -        |
| lrint   | -         | DF => SI |
| lrintf  | SF => DI  | -        |
| llrint  | -         | -        |
| llrintf | SF => DI  | SF => DI |
+---------+-----------+----------+
Given below code:
void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrintf (in[i]);
}
Before this patch:
test_lrintf:
  beq     a2,zero,.L8
  slli    a5,a2,32
  srli    a2,a5,30
  add     a4,a1,a2
.L3:
  flw     fa5,0(a1)
  addi    a1,a1,4
  addi    a0,a0,8
  fcvt.l.s a5,fa5,dyn
  sd      a5,-8(a0)
  bne     a1,a4,.L3
After this patch:
test_lrintf:
  beq     a2,zero,.L8
  slli    a2,a2,32
  srli    a2,a2,32
.L3:
  vsetvli a5,a2,e32,mf2,ta,ma
  vle32.v v2,0(a1)
  slli    a3,a5,2
  slli    a4,a5,3
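
For reference, the scalar semantics that both assembly sequences have to preserve can be checked on any host with a small sketch like the one below. It assumes the default round-to-nearest-even rounding mode and uses std::lrint on float in place of __builtin_lrintf; lrintf_loop_matches_expected is a hypothetical helper, not part of the patch or its tests.

```cpp
#include <cassert>
#include <cmath>

// Same loop shape as the quoted test: each 32-bit float is rounded to a long
// (the "SF => DI" row of the table when long is 64 bits wide).
void test_lrintf(long *out, const float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = std::lrint(in[i]);
}

// Hypothetical helper: compares the loop against hand-computed results.
// Assumes the default rounding mode (round to nearest, ties to even).
bool lrintf_loop_matches_expected()
{
  const float in[5] = {0.5f, 1.5f, 2.5f, -1.5f, 3.75f};
  const long expected[5] = {0, 2, 2, -2, 4};
  long out[5] = {0, 0, 0, 0, 0};

  test_lrintf(out, in, 5);

  for (unsigned i = 0; i < 5; i++)
    if (out[i] != expected[i])
      return false;
  return true;
}
```

The halfway cases are the interesting ones: a vectorized conversion that rounded ties away from zero would return 1, 2, 3, -2, 4 here instead.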
  

Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2023-11-07 Thread Kwok Cheung Yeung
Yes, I believe that is the right fix. The version in 
libgomp/config/accel/ should then override the version in libgomp/ for 
accelerator targets.
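
As a rough model of why the move helps, the lookup can be pictured as "search each config_path directory in order, then fall back to the top-level libgomp directory". The sketch below is a simplified illustration under that assumption, not libgomp's actual build logic; find_source and the demo helpers are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the first location of FILE, searching libgomp/config/<dir>/ for each
// entry of CONFIG_PATH in order and then libgomp/ itself.  Returns an empty
// string when nothing matches (the "No rule to make target" situation).
std::string find_source(const std::vector<std::string> &config_path,
                        const std::set<std::string> &tree,
                        const std::string &file)
{
  for (const std::string &dir : config_path)
    {
      std::string candidate = "libgomp/config/" + dir + "/" + file;
      if (tree.count(candidate))
        return candidate;
    }
  std::string fallback = "libgomp/" + file;
  return tree.count(fallback) ? fallback : std::string();
}

// Before the move: a target whose config_path lacks "linux" finds nothing.
bool demo_missing_before_move()
{
  std::set<std::string> tree;
  tree.insert("libgomp/config/linux/target-indirect.c");
  tree.insert("libgomp/config/accel/target-indirect.c");
  std::vector<std::string> path(1, "posix");
  return find_source(path, tree, "target-indirect.c").empty();
}

// After the move: the same target falls back to the top-level copy.
bool demo_found_after_move()
{
  std::set<std::string> tree;
  tree.insert("libgomp/target-indirect.c");
  tree.insert("libgomp/config/accel/target-indirect.c");
  std::vector<std::string> path(1, "posix");
  return find_source(path, tree, "target-indirect.c")
         == "libgomp/target-indirect.c";
}

// After the move: an accelerator target still picks its own version first.
bool demo_accel_overrides()
{
  std::set<std::string> tree;
  tree.insert("libgomp/target-indirect.c");
  tree.insert("libgomp/config/accel/target-indirect.c");
  std::vector<std::string> path(1, "accel");
  return find_source(path, tree, "target-indirect.c")
         == "libgomp/config/accel/target-indirect.c";
}
```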


I'll do a quick check that this works as expected and push it ASAP. 
Sorry for breaking the build for so many targets!


Kwok

On 07/11/2023 9:51 pm, Jakub Jelinek wrote:

On Tue, Nov 07, 2023 at 09:37:22PM +, Joseph Myers wrote:

This looks like targets that libgomp/configure.tgt does *not* have any
special handling for, and so never adds "linux" to config_path for.


Indeed, I don't really see anything linux specific about the
libgomp/config/linux/target-indirect.c
so wonder if the right fix isn't
git mv libgomp/{config/linux/,}target-indirect.c

Jakub



Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2023-11-07 Thread Jakub Jelinek
On Tue, Nov 07, 2023 at 09:37:22PM +, Joseph Myers wrote:
> This looks like targets that libgomp/configure.tgt does *not* have any 
> special handling for, and so never adds "linux" to config_path for.

Indeed, I don't really see anything linux specific about the
libgomp/config/linux/target-indirect.c
so wonder if the right fix isn't
git mv libgomp/{config/linux/,}target-indirect.c

Jakub



Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2023-11-07 Thread Joseph Myers
I'm seeing build failures "make[5]: *** No rule to make target 
'target-indirect.c', needed by 'target-indirect.lo'.  Stop." for many 
targets in my glibc bot.

https://sourceware.org/pipermail/libc-testresults/2023q4/012061.html

FAIL: compilers-arc-linux-gnu gcc build
FAIL: compilers-arc-linux-gnuhf gcc build
FAIL: compilers-arceb-linux-gnu gcc build
FAIL: compilers-csky-linux-gnuabiv2 gcc build
FAIL: compilers-csky-linux-gnuabiv2-soft gcc build
FAIL: compilers-m68k-linux-gnu gcc build
FAIL: compilers-m68k-linux-gnu-coldfire gcc build
FAIL: compilers-m68k-linux-gnu-coldfire-soft gcc build
FAIL: compilers-microblaze-linux-gnu gcc build
FAIL: compilers-microblazeel-linux-gnu gcc build
FAIL: compilers-nios2-linux-gnu gcc build
FAIL: compilers-or1k-linux-gnu-soft gcc build
FAIL: compilers-riscv32-linux-gnu-rv32imac-ilp32 gcc build
FAIL: compilers-riscv32-linux-gnu-rv32imafdc-ilp32 gcc build
FAIL: compilers-riscv32-linux-gnu-rv32imafdc-ilp32d gcc build
FAIL: compilers-sh3-linux-gnu gcc build
FAIL: compilers-sh3eb-linux-gnu gcc build
FAIL: compilers-sh4-linux-gnu gcc build
FAIL: compilers-sh4-linux-gnu-soft gcc build
FAIL: compilers-sh4eb-linux-gnu gcc build
FAIL: compilers-sh4eb-linux-gnu-soft gcc build
FAIL: compilers-x86_64-gnu gcc build

This looks like targets that libgomp/configure.tgt does *not* have any 
special handling for, and so never adds "linux" to config_path for.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v3] c-family: Enable -fpermissive for C and ObjC

2023-11-07 Thread Joseph Myers
On Tue, 7 Nov 2023, Florian Weimer wrote:

> Future changes will treat some C front end warnings similar to
> -Wnarrowing.
> 
> gcc/
> 
>   * doc/invoke.texi (Warning Options): Mention C diagnostics
>   for -fpermissive.
> 
> gcc/c-family/
> 
>   * c.opt (fpermissive): Enable for C and ObjC.
>   * c-opts.cc (c_common_post_options): Enable -fpermissive.
> 
> (snip)
> 
> Joseph, would you be able to review this?
> 
> There are no new tests because there are no permerrors yet.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[committed] c: Change T2X_* format checking macros to T23_*

2023-11-07 Thread Joseph Myers
Analogous to previous changes to code that matched "c2x"
(case-insensitive), also update T2X_* macros used in format checking
tables to be named T23_*.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c-family/
* c-format.h (T2X_UI): Rename to T23_UI.
(T2X_UL): Rename to T23_UL.
(T2X_ULL): Rename to T23_ULL.
(T2X_US): Rename to T23_US.
(T2X_UC): Rename to T23_UC.
(T2X_ST): Rename to T23_ST.
(T2X_UPD): Rename to T23_UPD.
(T2X_UIM): Rename to T23_UIM.
(T2X_D32): Rename to T23_D32.
(T2X_D64): Rename to T23_D64.
(T2X_D128): Rename to T23_D128.
(T2X_I8): Rename to T23_I8.
(T2X_I16): Rename to T23_I16.
(T2X_I32): Rename to T23_I32.
(T2X_I64): Rename to T23_I64.
(T2X_U8): Rename to T23_U8.
(T2X_U16): Rename to T23_U16.
(T2X_U32): Rename to T23_U32.
(T2X_U64): Rename to T23_U64.
(T2X_IF8): Rename to T23_IF8.
(T2X_IF16): Rename to T23_IF16.
(T2X_IF32): Rename to T23_IF32.
(T2X_IF64): Rename to T23_IF64.
(T2X_UF8): Rename to T23_UF8.
(T2X_UF16): Rename to T23_UF16.
(T2X_UF32): Rename to T23_UF32.
(T2X_UF64): Rename to T23_UF64.
* c-format.cc: Update all uses of T2X_* macros to use T23_*.

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 06c84d019fa..1a25141071f 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -709,20 +709,20 @@ static const format_flag_pair strfmon_flag_pairs[] =
 static const format_char_info print_char_table[] =
 {
   /* C89 conversion specifiers.  */
-  { "di",  0, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  TEX_LL,  
T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN,   T2X_I8,  T2X_I16, 
T2X_I32, T2X_I64, T2X_IF8, T2X_IF16, T2X_IF32, T2X_IF64 }, "-wp0 +'I",  "i",  
NULL },
-  { "oxX", 0, STD_C89, { T89_UI,  T99_UC,  T89_US,  T89_UL,  T9L_ULL, TEX_ULL, 
T99_ST,  T99_UPD, T99_UIM, BADLEN,  BADLEN,  BADLEN,   T2X_U8,  T2X_U16, 
T2X_U32, T2X_U64, T2X_UF8, T2X_UF16, T2X_UF32, T2X_UF64 }, "-wp0#", "i",  
NULL },
-  { "u",   0, STD_C89, { T89_UI,  T99_UC,  T89_US,  T89_UL,  T9L_ULL, TEX_ULL, 
T99_ST,  T99_UPD, T99_UIM, BADLEN,  BADLEN,  BADLEN,   T2X_U8,  T2X_U16, 
T2X_U32, T2X_U64, T2X_UF8, T2X_UF16, T2X_UF32, T2X_UF64 }, "-wp0'I","i",  
NULL },
-  { "fgG", 0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_LD,  
BADLEN,  BADLEN,  BADLEN,  T2X_D32, T2X_D64, T2X_D128, BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp0 +#'I", "",   
NULL },
-  { "eE",  0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_LD,  
BADLEN,  BADLEN,  BADLEN,  T2X_D32, T2X_D64, T2X_D128, BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp0 +#I",  "",   
NULL },
+  { "di",  0, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  TEX_LL,  
T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN,   T23_I8,  T23_I16, 
T23_I32, T23_I64, T23_IF8, T23_IF16, T23_IF32, T23_IF64 }, "-wp0 +'I",  "i",  
NULL },
+  { "oxX", 0, STD_C89, { T89_UI,  T99_UC,  T89_US,  T89_UL,  T9L_ULL, TEX_ULL, 
T99_ST,  T99_UPD, T99_UIM, BADLEN,  BADLEN,  BADLEN,   T23_U8,  T23_U16, 
T23_U32, T23_U64, T23_UF8, T23_UF16, T23_UF32, T23_UF64 }, "-wp0#", "i",  
NULL },
+  { "u",   0, STD_C89, { T89_UI,  T99_UC,  T89_US,  T89_UL,  T9L_ULL, TEX_ULL, 
T99_ST,  T99_UPD, T99_UIM, BADLEN,  BADLEN,  BADLEN,   T23_U8,  T23_U16, 
T23_U32, T23_U64, T23_UF8, T23_UF16, T23_UF32, T23_UF64 }, "-wp0'I","i",  
NULL },
+  { "fgG", 0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_LD,  
BADLEN,  BADLEN,  BADLEN,  T23_D32, T23_D64, T23_D128, BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp0 +#'I", "",   
NULL },
+  { "eE",  0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_LD,  
BADLEN,  BADLEN,  BADLEN,  T23_D32, T23_D64, T23_D128, BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp0 +#I",  "",   
NULL },
   { "c",   0, STD_C89, { T89_I,   BADLEN,  BADLEN,  T94_WI,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-w","",   
NULL },
   { "s",   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  T94_W,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-wp",   "cR", 
NULL },
   { "p",   1, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,   BADLEN,   BADLEN }, "-w","c",  
NULL },
-  { "n",   1, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  BADLEN,  
T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN,   T2X_I8,  T2X_I16, 
T2X_I32, T2X_I64, T2X_IF8, T2X_IF16, 

Re: [PATCH] c++: decltype of (by-value captured reference) [PR79620]

2023-11-07 Thread Patrick Palka
On Tue, 7 Nov 2023, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> OK for trunk?
> 
> -- >8 --
> 
> The capture decltype handling in finish_decltype_type wasn't looking
> through implicit INDIRECT_REF (added by convert_from_reference), which
> caused us to incorrectly resolve decltype((x)) to float& below.

Oops, this should say decltype((r)).  We already correctly resolve
decltype((x)) to const float& (since x isn't a reference).
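
The case that already works, decltype((x)) for a non-reference x captured by copy, can be checked with a short sketch like the following. It deliberately leaves out the by-value captured reference r, since that is the case this patch changes; paren_decltype_in_lambda is a hypothetical helper name.

```cpp
#include <cassert>
#include <type_traits>

// Inside a lambda body, decltype((x)) for a by-copy capture behaves as if x
// named the closure's data member: const-qualified unless the lambda is
// mutable.  decltype(x) without the extra parentheses still gives the
// declared type of the outer variable.
bool paren_decltype_in_lambda()
{
  float x = 1.0f;

  auto non_mutable = [=]() -> bool {
    return std::is_same<decltype(x), float>::value
           && std::is_same<decltype((x)), const float &>::value;
  };

  auto is_mutable = [=]() mutable -> bool {
    return std::is_same<decltype((x)), float &>::value;
  };

  return non_mutable() && is_mutable();
}
```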

> 
> We still don't fully accept the example ultimately because when
> processing the decltype inside the first lambda's trailing return type,
> we're in lambda type scope but not yet in lambda function scope that
> the check looks for, which seems like an orthogonal bug.
> 
>   PR c++/79620
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (STRIP_REFERENCE_REF): Define.
>   * semantics.cc (finish_decltype_type): Use it to look
>   through implicit INDIRECT_REF when deciding whether to
>   call capture_decltype.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/lambda/lambda-decltype3.C: New test.
> ---
>  gcc/cp/cp-tree.h  |  4 +++
>  gcc/cp/semantics.cc   |  4 +--
>  .../g++.dg/cpp0x/lambda/lambda-decltype3.C| 28 +++
>  3 files changed, 34 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index b2603d4830e..1fa710d7154 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -4084,6 +4084,10 @@ struct GTY(()) lang_decl {
> && TREE_TYPE (TREE_OPERAND (NODE, 0)) \
> && TYPE_REF_P (TREE_TYPE (TREE_OPERAND ((NODE), 0
>  
> +/* Look through an implicit INDIRECT_REF from convert_from_reference.  */
> +#define STRIP_REFERENCE_REF(NODE)\
> +  (REFERENCE_REF_P (NODE) ? TREE_OPERAND (NODE, 0) : NODE)
> +
>  /* True iff this represents an lvalue being treated as an rvalue during 
> return
> or throw as per [class.copy.elision].  */
>  #define IMPLICIT_RVALUE_P(NODE) \
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index f583dedd6cf..8df4521bf7c 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -11717,10 +11717,10 @@ finish_decltype_type (tree expr, bool 
> id_expression_or_member_access_p,
>transformed into an access to a corresponding data member
>of the closure type that would have been declared if x
>were a use of the denoted entity.  */
> -  if (outer_automatic_var_p (expr)
> +  if (outer_automatic_var_p (STRIP_REFERENCE_REF (expr))
> && current_function_decl
> && LAMBDA_FUNCTION_P (current_function_decl))
> - type = capture_decltype (expr);
> + type = capture_decltype (STRIP_REFERENCE_REF (expr));
>else if (error_operand_p (expr))
>   type = error_mark_node;
>else if (expr == current_class_ptr)
> diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C 
> b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C
> new file mode 100644
> index 000..7fc157aefb5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C
> @@ -0,0 +1,28 @@
> +// PR c++/79620
> +// [expr.prim.id.unqual] example 1
> +// { dg-do compile { target c++11 } }
> +
> +void f() {
> +  float x, &r = x;
> +
> +  [=]() -> decltype((x)) {  // lambda returns float const& because this 
> lambda is not mutable and
> +// x is an lvalue
> +decltype(x) y1; // y1 has type float
> +decltype((x)) y2 = y1;  // y2 has type float const&
> +decltype(r) r1 = y1;// r1 has type float&
> +decltype((r)) r2 = y2;  // r2 has type float const&
> +return y2;  // { dg-bogus "'float&' to 'const float'" "" 
> { xfail *-*-* } }
> +  };
> +
> +  [=](decltype((x)) y) {
> +decltype((x)) z = x;// OK, y has type float&, z has type float 
> const&
> +  };
> +
> +  [=] {
> +[](decltype((x)) y) {}; // OK, lambda takes a parameter of type 
> float const&
> +
> +[x=1](decltype((x)) y) {
> +  decltype((x)) z = x;  // OK, y has type int&, z has type int const&
> +};
> +  };
> +}
> -- 
> 2.43.0.rc0.23.g8be77c5de6
> 
> 



[PATCH] c++: decltype of (by-value captured reference) [PR79620]

2023-11-07 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

The capture decltype handling in finish_decltype_type wasn't looking
through implicit INDIRECT_REF (added by convert_from_reference), which
caused us to incorrectly resolve decltype((x)) to float& below.

We still don't fully accept the example ultimately because when
processing the decltype inside the first lambda's trailing return type,
we're in lambda type scope but not yet in lambda function scope that
the check looks for, which seems like an orthogonal bug.
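
The "look through an implicit INDIRECT_REF" step can be pictured with a toy model like the one below. It only mirrors the shape of REFERENCE_REF_P and STRIP_REFERENCE_REF; the Node type and every function name in it are hypothetical and are not GCC's tree API.

```cpp
#include <cassert>

// Toy stand-ins for tree nodes; all names here are hypothetical.
enum class Code { VarDecl, IndirectRef };

struct Node {
  Code code;
  bool has_reference_type;  // VarDecl: declared with a reference type
  const Node *operand;      // IndirectRef: the wrapped expression
};

// Shape of REFERENCE_REF_P: a dereference whose operand has reference type.
bool is_reference_ref(const Node &n)
{
  return n.code == Code::IndirectRef
         && n.operand != nullptr
         && n.operand->has_reference_type;
}

// Shape of STRIP_REFERENCE_REF: return the wrapped operand, or N itself.
const Node &strip_reference_ref(const Node &n)
{
  return is_reference_ref(n) ? *n.operand : n;
}

// A use of reference variable r arrives wrapped; stripping recovers the decl,
// which is what a check on the variable itself needs to see.
bool demo_strip_finds_decl()
{
  Node r = {Code::VarDecl, true, nullptr};
  Node use = {Code::IndirectRef, false, &r};
  return &strip_reference_ref(use) == &r;
}

// A plain, non-reference variable is returned unchanged.
bool demo_plain_decl_unchanged()
{
  Node x = {Code::VarDecl, false, nullptr};
  return &strip_reference_ref(x) == &x;
}
```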

PR c++/79620

gcc/cp/ChangeLog:

* cp-tree.h (STRIP_REFERENCE_REF): Define.
* semantics.cc (finish_decltype_type): Use it to look
through implicit INDIRECT_REF when deciding whether to
call capture_decltype.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-decltype3.C: New test.
---
 gcc/cp/cp-tree.h  |  4 +++
 gcc/cp/semantics.cc   |  4 +--
 .../g++.dg/cpp0x/lambda/lambda-decltype3.C| 28 +++
 3 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b2603d4830e..1fa710d7154 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4084,6 +4084,10 @@ struct GTY(()) lang_decl {
&& TREE_TYPE (TREE_OPERAND (NODE, 0))   \
&& TYPE_REF_P (TREE_TYPE (TREE_OPERAND ((NODE), 0
 
+/* Look through an implicit INDIRECT_REF from convert_from_reference.  */
+#define STRIP_REFERENCE_REF(NODE)  \
+  (REFERENCE_REF_P (NODE) ? TREE_OPERAND (NODE, 0) : NODE)
+
 /* True iff this represents an lvalue being treated as an rvalue during return
or throw as per [class.copy.elision].  */
 #define IMPLICIT_RVALUE_P(NODE) \
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index f583dedd6cf..8df4521bf7c 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11717,10 +11717,10 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
 transformed into an access to a corresponding data member
 of the closure type that would have been declared if x
 were a use of the denoted entity.  */
-  if (outer_automatic_var_p (expr)
+  if (outer_automatic_var_p (STRIP_REFERENCE_REF (expr))
  && current_function_decl
  && LAMBDA_FUNCTION_P (current_function_decl))
-   type = capture_decltype (expr);
+   type = capture_decltype (STRIP_REFERENCE_REF (expr));
   else if (error_operand_p (expr))
type = error_mark_node;
   else if (expr == current_class_ptr)
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C
new file mode 100644
index 000..7fc157aefb5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype3.C
@@ -0,0 +1,28 @@
+// PR c++/79620
+// [expr.prim.id.unqual] example 1
+// { dg-do compile { target c++11 } }
+
+void f() {
+  float x, &r = x;
+
+  [=]() -> decltype((x)) {  // lambda returns float const& because this 
lambda is not mutable and
+// x is an lvalue
+decltype(x) y1; // y1 has type float
+decltype((x)) y2 = y1;  // y2 has type float const&
+decltype(r) r1 = y1;// r1 has type float&
+decltype((r)) r2 = y2;  // r2 has type float const&
+return y2;  // { dg-bogus "'float&' to 'const float'" "" { 
xfail *-*-* } }
+  };
+
+  [=](decltype((x)) y) {
+decltype((x)) z = x;// OK, y has type float&, z has type float 
const&
+  };
+
+  [=] {
+[](decltype((x)) y) {}; // OK, lambda takes a parameter of type float 
const&
+
+[x=1](decltype((x)) y) {
+  decltype((x)) z = x;  // OK, y has type int&, z has type int const&
+};
+  };
+}
-- 
2.43.0.rc0.23.g8be77c5de6



[PATCH] c++: decltype of capture proxy [PR79378, PR96917]

2023-11-07 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

We usually don't see capture proxies in finish_decltype_type because
process_outer_var_ref is a no-op inside an unevaluated context and
so a use of a capture inside decltype refers directly to the captured
variable.  But we can still see a capture proxy during decltype(auto)
deduction and for decltype of an init-capture, which suggests we need
to handle capture proxies specially within finish_decltype_type (since
they're always implicitly const).  This patch adds such handling.
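
As background for the decltype(auto) case, the deduction rule outside of any lambda is sketched below: an unparenthesized id-expression yields the declared type, while a parenthesized lvalue yields a reference. The new tests expect the unparenthesized result to stay the same inside a lambda body, which is why the implicit const of a capture proxy must not leak through. The sketch needs C++14, and decltype_auto_rules is a hypothetical helper.

```cpp
#include <cassert>
#include <type_traits>

// decltype(auto) applies the decltype rules to the initializer expression.
bool decltype_auto_rules()
{
  int x = 0;
  const int cx = 0;

  decltype(auto) a = x;    // id-expression: declared type, int
  decltype(auto) b = (x);  // parenthesized lvalue: int&
  decltype(auto) ca = cx;  // id-expression: declared type, const int

  b = 5;  // writes through the reference, so x changes too
  (void) a;
  (void) ca;

  return std::is_same<decltype(a), int>::value
         && std::is_same<decltype(b), int &>::value
         && std::is_same<decltype(ca), const int>::value
         && x == 5;
}
```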

PR c++/79378
PR c++/96917

gcc/cp/ChangeLog:

* semantics.cc (finish_decltype_type): Handle an id-expression
naming a capture proxy specially.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto7.C: New test.
* g++.dg/cpp1y/lambda-init20.C: New test.
---
 gcc/cp/semantics.cc | 28 +--
 gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C | 53 +
 gcc/testsuite/g++.dg/cpp1y/lambda-init20.C  | 22 +
 3 files changed, 98 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-init20.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 4059e74bdb7..f583dedd6cf 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11643,12 +11643,30 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
   /* Fall through for fields that aren't bitfields.  */
  gcc_fallthrough ();
 
-case FUNCTION_DECL:
 case VAR_DECL:
-case CONST_DECL:
-case PARM_DECL:
-case RESULT_DECL:
-case TEMPLATE_PARM_INDEX:
+ if (is_capture_proxy (expr))
+   {
+ if (is_normal_capture_proxy (expr))
+   {
+ expr = DECL_CAPTURED_VARIABLE (expr);
+ type = TREE_TYPE (expr);
+ type = non_reference (type);
+   }
+ else
+   {
+ expr = DECL_VALUE_EXPR (expr);
+ gcc_assert (TREE_CODE (expr) == COMPONENT_REF);
+ expr = TREE_OPERAND (expr, 1);
+ type = TREE_TYPE (expr);
+   }
+ break;
+   }
+ /* Fall through.  */
+   case FUNCTION_DECL:
+   case CONST_DECL:
+   case PARM_DECL:
+   case RESULT_DECL:
+   case TEMPLATE_PARM_INDEX:
  expr = mark_type_use (expr);
   type = TREE_TYPE (expr);
  if (VAR_P (expr) && DECL_NTTP_OBJECT_P (expr))
diff --git a/gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C 
b/gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C
new file mode 100644
index 000..a37b9db38d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/decltype-auto7.C
@@ -0,0 +1,53 @@
+// PR c++/96917
+// { dg-do compile { target c++14 } }
+
+int main() {
+  int x = 0;
+  int y = 0;
+  const int cx = 0;
+  const int cy = 0;
+
+  [x, &y, cx, &cy] {
+decltype(auto) a = x;
+using ty1 = int;
+using ty1 = decltype(x);
+using ty1 = decltype(a);
+
+decltype(auto) b = y;
+using ty2 = int;
+using ty2 = decltype(y);
+using ty2 = decltype(b);
+
+decltype(auto) ca = cx;
+using ty3 = const int;
+using ty3 = decltype(cx);
+using ty3 = decltype(ca);
+
+decltype(auto) cb = cy;
+using ty4 = const int;
+using ty4 = decltype(cy);
+using ty4 = decltype(cb);
+  };
+
+  [x=x, &y=y, cx=cx, &cy=cy] {
+decltype(auto) a = x;
+using ty1 = int;
+using ty1 = decltype(x);
+using ty1 = decltype(a);
+
+decltype(auto) b = y;
+using ty2 = int&;
+using ty2 = decltype(y);
+using ty2 = decltype(b);
+
+decltype(auto) ca = cx;
+using ty3 = int;
+using ty3 = decltype(cx);
+using ty3 = decltype(ca);
+
+decltype(auto) cb = cy;
+using ty4 = const int&;
+using ty4 = decltype(cy);
+using ty4 = decltype(cb);
+  };
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-init20.C 
b/gcc/testsuite/g++.dg/cpp1y/lambda-init20.C
new file mode 100644
index 000..a06b77a664d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-init20.C
@@ -0,0 +1,22 @@
+// PR c++/79378
+// { dg-do compile { target c++14 } }
+
+int main() {
+  int x = 0;
+  [x=x, &r=x] {
+using ty1 = int;
+using ty1 = decltype(x);
+
+using ty2 = int&;
+using ty2 = decltype(r);
+  };
+
+  const int cx = 0;
+  [x=cx, &r=cx] {
+using ty1 = int;
+using ty1 = decltype(x);
+
+using ty2 = const int&;
+using ty2 = decltype(r);
+  };
+}
-- 
2.43.0.rc0.23.g8be77c5de6



Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

2023-11-07 Thread Patrick O'Neill
Ah sorry for the noise - I just saw that this was resolved with a 
subsequent patch:


Precommit run: 
https://github.com/ewlu/gcc-precommit-ci/issues/608#issuecomment-1798058721


Patrick

On 11/7/23 11:17, Patrick O'Neill wrote:

Hi Pan,

This patch (9acea4376fd98696ba51e59f417c94911a4d8248) causes
cond_widen_reduc-2.c to start failing on:
linux/newlib: rv32/64gc
linux/newlib: rv32gcv
linux/newlib: rv32/64gc_zba_zbb_zbc_zbs

FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
scan-assembler-times \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3

Debug log output:
spawn -ignore SIGHUP
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
-B/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output
-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic
-march=rv64gcv_zvfh_zvl128b -mabi=lp64d --param
riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2
-fno-vect-cost-model -ffast-math -ffat-lto-objects -fno-ident -S -o
cond_widen_reduc-2.s
PASS: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c (test for
excess errors)
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c:
\\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c:
\\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
scan-assembler-times \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c:
\\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
Executing on host:
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
-B/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output
-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic --param
riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2
-fno-vect-cost-model -ffast-math -lm -o ./cond_widen_reduc_run-1.exe
(timeout = 600)

These failures are still on trunk
(b7d05f13e86bf49bfb78c9876deba388efc6082e).

Thanks,
Patrick

Postcommit CI bisection:
https://github.com/patrick-rivos/gcc-postcommit-ci/issues/130

On 11/5/23 01:30,pan2...@intel.com  wrote:

From: Pan Li

This patch adds auto-vectorization support for the FP APIs below when
the source and destination types differ in size:

+---------+-----------+----------+
| API     | RV64      | RV32     |
+---------+-----------+----------+
| irint   | DF => SI  | DF => SI |
| irintf  | -         | -        |
| lrint   | -         | DF => SI |
| lrintf  | SF => DI  | -        |
| llrint  | -         | -        |
| llrintf | SF => DI  | SF => DI |
+---------+-----------+----------+

Given below code:
void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

Before this patch:
test_lrintf:
  beq       a2,zero,.L8
  slli      a5,a2,32
  srli      a2,a5,30
  add       a4,a1,a2
.L3:
  flw       fa5,0(a1)
  addi      a1,a1,4
  addi      a0,a0,8
  fcvt.l.s  a5,fa5,dyn
  sd        a5,-8(a0)
  bne       a1,a4,.L3

After this patch:
test_lrintf:
  beq       a2,zero,.L8
  slli      a2,a2,32
  srli      a2,a2,32
.L3:
  vsetvli   a5,a2,e32,mf2,ta,ma
  vle32.v   v2,0(a1)
  slli      a3,a5,2
  slli      a4,a5,3
  vfwcvt.x.f.v v1,v2
  sub       a2,a2,a5
  vse64.v   v1,0(a0)
  add       a1,a1,a3
  add       a0,a0,a4
  bne       a2,zero,.L3

Unfortunately, HF mode is not included because it requires
additional middle-end support in internal-fn.def.

gcc/ChangeLog:

* config/riscv/autovec.md: Remove the size check of lrint.
* config/riscv/riscv-v.cc (emit_vec_narrow_cvt_x_f): New help
emit 

Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec

2023-11-07 Thread Patrick O'Neill

Hi Pan,

This patch (9acea4376fd98696ba51e59f417c94911a4d8248) causes
cond_widen_reduc-2.c to start failing on:

linux/newlib: rv32/64gc
linux/newlib: rv32gcv
linux/newlib: rv32/64gc_zba_zbb_zbc_zbs

FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3

Debug log output:

spawn -ignore SIGHUP
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
-B/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output
-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic
-march=rv64gcv_zvfh_zvl128b -mabi=lp64d --param
riscv-autovec-preference=scalable --param riscv-autovec-lmul=m2
-fno-vect-cost-model -ffast-math -ffat-lto-objects -fno-ident -S -o
cond_widen_reduc-2.s
PASS: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c (test for excess errors)
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvfwredusum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 2
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvwredsum\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t found 0 times
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c scan-assembler-times \\tvwredsumu\\.vs\\tv[0-9]+,v[0-9]+,v[0-9]+,v0\\.t 3
Executing on host:
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
-B/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
/github/patrick-postcommit-runner-2/_work/gcc-postcommit-ci/gcc-postcommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
-march=rv32gcv -mabi=ilp32d -mcmodel=medlow -fdiagnostics-plain-output
-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic --param
riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2
-fno-vect-cost-model -ffast-math -lm -o ./cond_widen_reduc_run-1.exe
(timeout = 600)

These failures are still on trunk
(b7d05f13e86bf49bfb78c9876deba388efc6082e).

Thanks,
Patrick

Postcommit CI bisection:
https://github.com/patrick-rivos/gcc-postcommit-ci/issues/130


On 11/5/23 01:30,pan2...@intel.com  wrote:


From: Pan Li

This patch adds auto-vectorization support for the FP APIs below when
the source and destination types differ in size:

+---------+-----------+----------+
| API     | RV64      | RV32     |
+---------+-----------+----------+
| irint   | DF => SI  | DF => SI |
| irintf  | -         | -        |
| lrint   | -         | DF => SI |
| lrintf  | SF => DI  | -        |
| llrint  | -         | -        |
| llrintf | SF => DI  | SF => DI |
+---------+-----------+----------+

Given below code:
void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

Before this patch:
test_lrintf:
  beq       a2,zero,.L8
  slli      a5,a2,32
  srli      a2,a5,30
  add       a4,a1,a2
.L3:
  flw       fa5,0(a1)
  addi      a1,a1,4
  addi      a0,a0,8
  fcvt.l.s  a5,fa5,dyn
  sd        a5,-8(a0)
  bne       a1,a4,.L3

After this patch:
test_lrintf:
  beq       a2,zero,.L8
  slli      a2,a2,32
  srli      a2,a2,32
.L3:
  vsetvli   a5,a2,e32,mf2,ta,ma
  vle32.v   v2,0(a1)
  slli      a3,a5,2
  slli      a4,a5,3
  vfwcvt.x.f.v v1,v2
  sub       a2,a2,a5
  vse64.v   v1,0(a0)
  add       a1,a1,a3
  add       a0,a0,a4
  bne       a2,zero,.L3

Unfortunately, HF mode is not included because it requires
additional middle-end support in internal-fn.def.

gcc/ChangeLog:

* config/riscv/autovec.md: Remove the size check of lrint.
* config/riscv/riscv-v.cc (emit_vec_narrow_cvt_x_f): New help
emit func impl.
(emit_vec_widden_cvt_x_f): New help emit func impl.
(emit_vec_rounding_to_integer): New func impl to emit the
rounding from FP to integer.
(expand_vec_lrint): Leverage 

[PATCH v3] c-family: Enable -fpermissive for C and ObjC

2023-11-07 Thread Florian Weimer
Future changes will treat some C front end warnings similar to
-Wnarrowing.

gcc/

* doc/invoke.texi (Warning Options): Mention C diagnostics
for -fpermissive.

gcc/c-family/

* c.opt (fpermissive): Enable for C and ObjC.
* c-opts.cc (c_common_post_options): Enable -fpermissive.

(snip)

Joseph, would you be able to review this?

There are no new tests because there are no permerrors yet.

Thanks,
Florian
---
v3: One operator per line, and use OPTION_SET_P.

 gcc/c-family/c-opts.cc | 14 ++
 gcc/c-family/c.opt |  2 +-
 gcc/doc/invoke.texi|  8 ++--
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 87da6c180cd..a7963646fbc 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -854,6 +854,20 @@ c_common_post_options (const char **pfilename)
   && flag_unsafe_math_optimizations == 0)
 flag_fp_contract_mode = FP_CONTRACT_OFF;
 
+  /* C language modes before C99 enable -fpermissive by default, but
+ only if -pedantic-errors is not specified.  Also treat
+ -fno-permissive as a subset of -pedantic-errors that does not
+ reject certain GNU extensions also present the defaults for later
+ language modes.  */
+  if (!c_dialect_cxx ()
+  && !flag_isoc99
+  && !global_dc->m_pedantic_errors
+  && !OPTION_SET_P (flag_permissive))
+{
+  flag_permissive = 1;
+  global_dc->m_permissive = 1;
+}
+
   /* If we are compiling C, and we are outside of a standards mode,
  we can permit the new values from ISO/IEC TS 18661-3 for
  FLT_EVAL_METHOD.  Otherwise, we must restrict the possible values to
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 359f071e632..4984cd796f4 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2116,7 +2116,7 @@ C ObjC C++ ObjC++
 Look for and use PCH files even when preprocessing.
 
 fpermissive
-C++ ObjC++ Var(flag_permissive)
+C ObjC C++ ObjC++ Var(flag_permissive)
 Downgrade conformance errors to warnings.
 
 fplan9-extensions
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 80bb1efac40..8527dbd1823 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6174,13 +6174,17 @@ errors by @option{-pedantic-errors}.  For instance:
 Downgrade some required diagnostics about nonconformant code from
 errors to warnings.  Thus, using @option{-fpermissive} allows some
 nonconforming code to compile.  Some C++ diagnostics are controlled
-only by this flag, but it also downgrades some diagnostics that have
-their own flag:
+only by this flag, but it also downgrades some C and C++ diagnostics
+that have their own flag:
 
 @gccoptlist{
 -Wnarrowing @r{(C++)}
 }
 
+The @option{-fpermissive} option is the default for historic C language
+modes (@option{-std=c89}, @option{-std=gnu89}, @option{-std=c90},
+@option{-std=gnu90}).
+
 @opindex Wall
 @opindex Wno-all
 @item -Wall

base-commit: 7e3c58bfc1d957e48faf0752758da0fed811ed58
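
The condition added to c_common_post_options can be modelled in plain C for
readers who want to check which invocations end up permissive. This is only a
sketch under stated assumptions: struct c_options and effective_permissive are
hypothetical names invented here, and the fields merely stand in for the
compiler state the hunk consults (c_dialect_cxx, flag_isoc99,
m_pedantic_errors, OPTION_SET_P (flag_permissive)). It is not GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state read by the new hunk.  */
struct c_options
{
  bool is_cxx;          /* models c_dialect_cxx () */
  bool isoc99;          /* models flag_isoc99: C99 or a later mode */
  bool pedantic_errors; /* -pedantic-errors was given */
  bool permissive_set;  /* -fpermissive or -fno-permissive was given */
  bool permissive;      /* value of flag_permissive before the hunk runs */
};

/* Value of flag_permissive after the hunk: pre-C99 C modes become
   permissive unless -pedantic-errors or an explicit -f[no-]permissive
   says otherwise; every other case keeps the incoming value.  */
bool
effective_permissive (struct c_options o)
{
  if (!o.is_cxx && !o.isoc99 && !o.pedantic_errors && !o.permissive_set)
    return true;
  return o.permissive;
}
```

Under this model, -std=gnu89 alone yields a permissive compile, while
-std=gnu89 -pedantic-errors, -std=gnu89 -fno-permissive and -std=gnu17 do not.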



[committed] i386: Make flags_reg_operand a special predicate

2023-11-07 Thread Uros Bizjak
There is no need to check the mode in flags_reg_operand predicate. The
mode in flags setting instructions is checked with ix86_match_ccmode.

The patch avoids "warning: operand X missing mode?" warnings with
VOIDmode flags_reg_operand predicate.

gcc/ChangeLog:

* config/i386/predicates.md ("flags_reg_operand"):
Make predicate special to avoid automatic mode checks.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index a63b8cd7be5..b5a86257c9e 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -88,7 +88,7 @@ (define_predicate "ax_reg_operand"
(match_test "REGNO (op) == AX_REG")))
 
 ;; Return true if op is the flags register.
-(define_predicate "flags_reg_operand"
+(define_special_predicate "flags_reg_operand"
   (and (match_code "reg")
(match_test "REGNO (op) == FLAGS_REG")))
 


Re: [PATCH v2 0/3] libgfortran: empty array fixes

2023-11-07 Thread Harald Anlauf

Hi Mikael,

this is OK.

Thanks for the patches!

Harald

On 11/7/23 11:24, Mikael Morin wrote:

Hello,

Harald's review of the previous version [1] of these patches spotted a possible
misbehaving case in one patch, and a latent bug in the area of the second
patch.
So here is the second try, bootstrapped and regression tested on
x86_64-pc-linux-gnu.
OK for master?

Mikael

[1]:
https://gcc.gnu.org/pipermail/fortran/2023-November/059896.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635305.html

Changes from version 1:
  * Add patch 1/3 to the series fixing the unallocated empty result issue.
  * In patch 2/3 (formerly 1/2) clamp negative extent to zero.
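
The clamping mentioned for patch 2/3 can be pictured with a small C helper.
This is a sketch only: clamped_extent is a hypothetical name, ptrdiff_t is
assumed as the index type, and the extent formula (upper bound minus lower
bound plus one) is the usual Fortran definition rather than code taken from
the patch.

```c
#include <stddef.h>

/* Hypothetical helper: number of elements along one dimension, given
   inclusive lower and upper bounds.  A dimension whose upper bound lies
   below its lower bound is empty, so a negative result is clamped to 0.  */
ptrdiff_t
clamped_extent (ptrdiff_t lbound, ptrdiff_t ubound)
{
  ptrdiff_t extent = ubound - lbound + 1;
  return extent < 0 ? 0 : extent;
}
```

With bounds 5:2 the raw difference is -2, and the helper reports an extent
of 0 instead.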


Mikael Morin (3):
   libgfortran: Don't skip allocation if size is zero [PR112412]
   libgfortran: Remove early return if extent is zero [PR112371]
   libgfortran: Remove empty array descriptor first dimension overwrite
 [PR112371]

  gcc/testsuite/gfortran.dg/allocated_4.f90 | 195 +++
  gcc/testsuite/gfortran.dg/bound_10.f90| 207 
  gcc/testsuite/gfortran.dg/bound_11.f90| 588 ++
  libgfortran/generated/all_l1.c|   9 +-
  libgfortran/generated/all_l16.c   |   9 +-
  libgfortran/generated/all_l2.c|   9 +-
  libgfortran/generated/all_l4.c|   9 +-
  libgfortran/generated/all_l8.c|   9 +-
  libgfortran/generated/any_l1.c|   9 +-
  libgfortran/generated/any_l16.c   |   9 +-
  libgfortran/generated/any_l2.c|   9 +-
  libgfortran/generated/any_l4.c|   9 +-
  libgfortran/generated/any_l8.c|   9 +-
  libgfortran/generated/count_16_l.c|   9 +-
  libgfortran/generated/count_1_l.c |   9 +-
  libgfortran/generated/count_2_l.c |   9 +-
  libgfortran/generated/count_4_l.c |   9 +-
  libgfortran/generated/count_8_l.c |   9 +-
  libgfortran/generated/findloc1_c10.c  |  18 +-
  libgfortran/generated/findloc1_c16.c  |  18 +-
  libgfortran/generated/findloc1_c17.c  |  18 +-
  libgfortran/generated/findloc1_c4.c   |  18 +-
  libgfortran/generated/findloc1_c8.c   |  18 +-
  libgfortran/generated/findloc1_i1.c   |  18 +-
  libgfortran/generated/findloc1_i16.c  |  18 +-
  libgfortran/generated/findloc1_i2.c   |  18 +-
  libgfortran/generated/findloc1_i4.c   |  18 +-
  libgfortran/generated/findloc1_i8.c   |  18 +-
  libgfortran/generated/findloc1_r10.c  |  18 +-
  libgfortran/generated/findloc1_r16.c  |  18 +-
  libgfortran/generated/findloc1_r17.c  |  18 +-
  libgfortran/generated/findloc1_r4.c   |  18 +-
  libgfortran/generated/findloc1_r8.c   |  18 +-
  libgfortran/generated/findloc1_s1.c   |  18 +-
  libgfortran/generated/findloc1_s4.c   |  18 +-
  libgfortran/generated/iall_i1.c   |  30 +-
  libgfortran/generated/iall_i16.c  |  30 +-
  libgfortran/generated/iall_i2.c   |  30 +-
  libgfortran/generated/iall_i4.c   |  30 +-
  libgfortran/generated/iall_i8.c   |  30 +-
  libgfortran/generated/iany_i1.c   |  30 +-
  libgfortran/generated/iany_i16.c  |  30 +-
  libgfortran/generated/iany_i2.c   |  30 +-
  libgfortran/generated/iany_i4.c   |  30 +-
  libgfortran/generated/iany_i8.c   |  30 +-
  libgfortran/generated/iparity_i1.c|  30 +-
  libgfortran/generated/iparity_i16.c   |  30 +-
  libgfortran/generated/iparity_i2.c|  30 +-
  libgfortran/generated/iparity_i4.c|  30 +-
  libgfortran/generated/iparity_i8.c|  30 +-
  libgfortran/generated/maxloc1_16_i1.c |  30 +-
  libgfortran/generated/maxloc1_16_i16.c|  30 +-
  libgfortran/generated/maxloc1_16_i2.c |  30 +-
  libgfortran/generated/maxloc1_16_i4.c |  30 +-
  libgfortran/generated/maxloc1_16_i8.c |  30 +-
  libgfortran/generated/maxloc1_16_r10.c|  30 +-
  libgfortran/generated/maxloc1_16_r16.c|  30 +-
  libgfortran/generated/maxloc1_16_r17.c|  30 +-
  libgfortran/generated/maxloc1_16_r4.c |  30 +-
  libgfortran/generated/maxloc1_16_r8.c |  30 +-
  libgfortran/generated/maxloc1_16_s1.c |  30 +-
  libgfortran/generated/maxloc1_16_s4.c |  30 +-
  libgfortran/generated/maxloc1_4_i1.c  |  30 +-
  libgfortran/generated/maxloc1_4_i16.c |  30 +-
  libgfortran/generated/maxloc1_4_i2.c  |  30 +-
  libgfortran/generated/maxloc1_4_i4.c  |  30 +-
  libgfortran/generated/maxloc1_4_i8.c  |  30 +-
  libgfortran/generated/maxloc1_4_r10.c |  30 +-
  libgfortran/generated/maxloc1_4_r16.c |  30 +-
  libgfortran/generated/maxloc1_4_r17.c |  30 +-
  libgfortran/generated/maxloc1_4_r4.c  |  30 +-
  libgfortran/generated/maxloc1_4_r8.c  |  30 +-
  libgfortran/generated/maxloc1_4_s1.c  |  30 +-
  libgfortran/generated/maxloc1_4_s4.c  |  30 +-
  libgfortran/generated/maxloc1_8_i1.c  |  30 +-
  libgfortran/generated/maxloc1_8_i16.c |  30 +-
  

[PATCH] gcc/configure: Regenerate

2023-11-07 Thread Martin Jambor
On Mon, Nov 06 2023, Martin Jambor wrote:
>
[...]
>
> I'm not sure what that means, whether a wrong version of
> autoconf/automake was used (though when I accidentally tried that, it
> has always complained loudly) or if some environment difference can
> cause this.  Perhaps I should change the script not to care about
> commits though that won't happen soon (or perhaps I should drop the
> checks completely) but would people be OK with me checking in the patch
> above (with appropriate ChangeLog) to silence buildbot for a while
> again?
>

I have committed the following to silence the tester.

Probably because of a rebase of changes to gcc/configure, there are
line comment mismatches between what we have and what would be
generated. This patch brings them in line so that consistency
checkers are happy.

gcc/ChangeLog:

2023-11-07  Martin Jambor  

* configure: Regenerate.
---
 gcc/configure | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 4d0357cbc28..0d818ae6850 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -2,7 +2,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19995 "configure"
+#line 20003 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -20106,7 +20106,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 20101 "configure"
+#line 20109 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
-- 
2.42.0



Re: [PATCH] RISC-V: Use stdint-gcc.h in rvv testsuite

2023-11-07 Thread Palmer Dabbelt

On Tue, 07 Nov 2023 01:45:19 PST (-0800), christoph.muell...@vrull.eu wrote:

From: Christoph Müllner 

stdint.h can be replaced with stdint-gcc.h to resolve some missing
system headers in non-multilib installations.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadmemidx-helpers.h:
Replace stdint.h with stdint-gcc.h.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h 
b/gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h
index a97f08c5cc1..9d8ce124a93 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmemidx-helpers.h
@@ -1,7 +1,7 @@
 #ifndef XTHEADMEMIDX_HELPERS_H
 #define XTHEADMEMIDX_HELPERS_H

-#include <stdint.h>
+#include <stdint-gcc.h>

 #define intX_t long
 #define uintX_t unsigned long



Presumably this still passes the tests?  If so it LGTM so

Reviewed-by: Palmer Dabbelt 

Thanks!


[PATCH] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-11-07 Thread Manos Anagnostakis
This is an RTL pass that detects store forwarding from stores to larger loads
(load pairs).

This optimization is SPEC2017-driven and was found to be beneficial for some
benchmarks, through testing on ampere1/ampere1a machines.

For example, it can transform cases like

str  d5, [sp, #320]
fmul d5, d31, d29
ldp  d31, d17, [sp, #312] # Large load from small store

to

str  d5, [sp, #320]
fmul d5, d31, d29
ldr  d31, [sp, #312]
ldr  d17, [sp, #320]

Currently, the pass is disabled by default on all architectures and enabled by
a target-specific option.

If deemed beneficial enough for a default, it will be enabled on
ampere1/ampere1a, or other architectures as well, without needing to be turned
on by this option.
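
The situation the pass looks for can be sketched with byte ranges in plain C.
This is an illustration under stated assumptions, not the pass itself:
struct mem_range, ranges_overlap_p and wide_load_after_narrow_store_p are
hypothetical names, offsets are taken relative to one shared base register,
and the real implementation works on RTL (the patch exposes
memrefs_conflict_p for that purpose).

```c
#include <assert.h>
#include <stdbool.h>

/* Byte range [offset, offset + size) relative to a common base register.
   Hypothetical type for illustration only.  */
struct mem_range
{
  long offset;
  long size;
};

/* True when the two byte ranges share at least one byte.  */
static bool
ranges_overlap_p (struct mem_range a, struct mem_range b)
{
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* True when LOAD reads bytes written by STORE but is wider than STORE,
   i.e. the "large load from small store" case in the example above.  */
bool
wide_load_after_narrow_store_p (struct mem_range store, struct mem_range load)
{
  assert (store.size > 0 && load.size > 0);
  return load.size > store.size && ranges_overlap_p (store, load);
}
```

For the example above, the 8-byte store at #320 followed by the 16-byte ldp
at #312 satisfies the predicate, whereas neither of the two 8-byte ldr
instructions at #312 and #320 does.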

Bootstrapped and regtested on aarch64-linux.

gcc/ChangeLog:

* alias.cc (memrefs_conflict_p): Expose static function.
* alias.h (memrefs_conflict_p): Expose static function.
* config.gcc: Add aarch64-store-forwarding.o to extra_objs.
* config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
* config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): 
Declare.
* config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
(aarch64-store-forwarding-threshold): New param.
* config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
* doc/invoke.texi: Document new option and new param.
* config/aarch64/aarch64-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
* gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
* gcc.target/aarch64/ldp_ssll_overlap.c: New test.

Signed-off-by: Manos Anagnostakis 
Co-Authored-By: Manolis Tsamis 
Co-Authored-By: Philipp Tomsich 
---
 gcc/alias.cc  |   2 +-
 gcc/alias.h   |   1 +
 gcc/config.gcc|   1 +
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 .../aarch64/aarch64-store-forwarding.cc   | 347 ++
 gcc/config/aarch64/aarch64.opt|   9 +
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/doc/invoke.texi   |  12 +-
 .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
 .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
 .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
 12 files changed, 481 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 86d8f7104ad..303683f85e3 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -2488,7 +2488,7 @@ offset_overlap_p (poly_int64 c, poly_int64 xsize, 
poly_int64 ysize)
one for X + non-constant and Y + non-constant when X and Y are equal.
If that is fixed the TBAA hack for union type-punning can be removed.  */

-static int
+int
 memrefs_conflict_p (poly_int64 xsize, rtx x, poly_int64 ysize, rtx y,
poly_int64 c)
 {
diff --git a/gcc/alias.h b/gcc/alias.h
index ab06ac9055f..49836f7d808 100644
--- a/gcc/alias.h
+++ b/gcc/alias.h
@@ -41,6 +41,7 @@ bool alias_ptr_types_compatible_p (tree, tree);
 int compare_base_decls (tree, tree);
 bool refs_same_for_tbaa_p (tree, tree);
 bool mems_same_for_tbaa_p (rtx, rtx);
+int memrefs_conflict_p (poly_int64, rtx, poly_int64, rtx, poly_int64);

 /* This alias set can be used to force a memory to conflict with all
other memories, creating a barrier across which no memory reference
diff --git a/gcc/config.gcc b/gcc/config.gcc
index ba6d63e33ac..ae50d36004f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -350,6 +350,7 @@ aarch64*-*-*)
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o 
falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
+   extra_objs="${extra_objs} aarch64-store-forwarding.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
\$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-passes.def 
b/gcc/config/aarch64/aarch64-passes.def
index 6ace797b738..fa79e8adca8 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -23,3 +23,4 @@ INSERT_PASS_BEFORE (pass_reorder_blocks, 1, 
pass_track_speculation);
 INSERT_PASS_AFTER (pass_machine_reorg, 1, 


Re: [PATCH] attribs: Fix ICE with -Wno-attributes= [PR112339]

2023-11-07 Thread Marek Polacek
On Fri, Nov 03, 2023 at 06:43:49PM -0400, Jason Merrill wrote:
> LGTM but I'd like Marek to approve it.

Both hunks look correct to me.  Patch is OK, thanks!
 
> On Fri, Nov 3, 2023, 3:12 PM Jakub Jelinek  wrote:
> 
> > Hi!
> >
> > The following testcase ICEs, because with -Wno-attributes=foo::no_sanitize
> > (but generally any other non-gnu namespace and some gnu well known
> > attribute name within that other namespace) the FEs don't really parse
> > attribute
> > arguments of such attribute, but lookup_attribute_spec is non-NULL with
> > NULL handler and such attributes are added to DECL_ATTRIBUTES or
> > TYPE_ATTRIBUTES and then when e.g. middle-end does lookup_attribute
> > on a particular attribute and expects the attribute to mean something
> > and/or have a particular verified arguments, it can crash when seeing
> > the foreign attribute in there instead.
> >
> > The following patch fixes that by never adding ignored attributes
> > to DECL_ATTRIBUTES/TYPE_ATTRIBUTES, previously that was the case just
> > for attributes in ignored namespace (where lookup_attribute_spec
> > returned NULL).  We don't really know anything about those attributes,
> > so shouldn't pretend we know something about them, especially when
> > the arguments are error_mark_node or NULL instead of something that
> > would have been parsed.  And it would be really weird if we normally
> > ignore say [[clang::unused]] attribute, but when people use
> > -Wno-attributes=clang::unused we actually treated it as gnu::unused.
> > All the user asked for is suppress warnings about that attribute being
> > unknown.
> >
> > The first hunk is just playing safe, I'm worried people could use
> > -Wno-attributes=gnu::
> > and get various crashes with known GNU attributes not being actually
> > parsed and recorded (or worse e.g. when we tweak standard attributes
> > into GNU attributes and we wouldn't add those).
> > The -Wno-attributes= documentation says that it suppresses warning about
> > unknown attributes, so I think -Wno-attributes=gnu:: should prevent
> > warning about say [[gnu::foobarbaz]] attribute, but not about
> > [[gnu::unused]] because the latter is a known attribute.
> > The routine would return true for any scoped attribute in the ignored
> > namespace, with the change it ignores only unknown attributes in ignored
> > namespace, known ones in there will be ignored only if they have
> > max_length of -2 (e.g. with
> > -Wno-attributes=gnu:: -Wno-attributes=gnu::foobarbaz).
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2023-11-03  Jakub Jelinek  
> >
> > PR c/112339
> > * attribs.cc (attribute_ignored_p): Only return true for
> > attr_namespace_ignored_p if as is NULL.
> > (decl_attributes): Never add ignored attributes.
> >
> > * c-c++-common/ubsan/Wno-attributes-1.c: New test.
> >
> > --- gcc/attribs.cc.jj   2023-11-02 20:22:02.017016371 +0100
> > +++ gcc/attribs.cc  2023-11-03 08:45:32.688726738 +0100
> > @@ -579,9 +579,9 @@ attribute_ignored_p (tree attr)
> >  return false;
> >if (tree ns = get_attribute_namespace (attr))
> >  {
> > -  if (attr_namespace_ignored_p (ns))
> > -   return true;
> >const attribute_spec *as = lookup_attribute_spec (TREE_PURPOSE
> > (attr));
> > +  if (as == NULL && attr_namespace_ignored_p (ns))
> > +   return true;
> >if (as && as->max_length == -2)
> > return true;
> >  }
> > @@ -862,7 +862,10 @@ decl_attributes (tree *node, tree attrib
> > }
> > }
> >
> > -  if (no_add_attrs)
> > +  if (no_add_attrs
> > + /* Don't add attributes registered just for
> > -Wno-attributes=foo::bar
> > +purposes.  */
> > + || attribute_ignored_p (attr))
> > continue;
> >
> >if (spec->handler != NULL)
> > --- gcc/testsuite/c-c++-common/ubsan/Wno-attributes-1.c.jj  2023-11-03
> > 08:44:13.752837201 +0100
> > +++ gcc/testsuite/c-c++-common/ubsan/Wno-attributes-1.c 2023-11-03
> > 08:44:13.751837215 +0100
> > @@ -0,0 +1,9 @@
> > +/* PR c/112339 */
> > +/* { dg-do compile { target { c++11 || c } } } */
> > +/* { dg-options "-Wno-attributes=foo::no_sanitize -fsanitize=undefined" }
> > */
> > +/* { dg-additional-options "-std=c2x" { target c } } */
> > +
> > +[[foo::no_sanitize("bounds")]] void
> > +foo (void)
> > +{
> > +}
> >
> > Jakub
> >
> >

Marek
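
For reference, the decision the first hunk makes for a scoped attribute can be
written as a tiny stand-alone C model. It is a sketch under stated
assumptions: struct attr_spec and attribute_ignored_model are hypothetical
names, and only the max_length field that the quoted diff reads is modelled.
The real function operates on trees and first handles attributes without a
namespace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an attribute_spec entry.  */
struct attr_spec
{
  int max_length; /* -2 marks an entry added for -Wno-attributes=ns::name */
};

/* SPEC is the lookup result for the attribute (NULL when unknown);
   NS_IGNORED tells whether its namespace was named by -Wno-attributes=ns::.
   After the patch, an ignored namespace only hides unknown attributes.  */
bool
attribute_ignored_model (const struct attr_spec *spec, bool ns_ignored)
{
  if (spec == NULL && ns_ignored)
    return true;
  if (spec != NULL && spec->max_length == -2)
    return true;
  return false;
}
```

In this model a known attribute in an ignored namespace is no longer treated
as ignored, which matches the [[gnu::unused]] versus [[gnu::foobarbaz]]
distinction described in the quoted text.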



Re: [pushed] [RA]: Modify cost calculation for dealing with pseudo equivalences

2023-11-07 Thread Saurabh Jha
Hey,

This is causing an ICE. Bug here: 112337 - arm: ICE in arm_effective_regno
when compiling for MVE (gnu.org)

Regards,
Saurabh


From: Vladimir Makarov 
Sent: Thursday, October 26, 2023 3:00 PM
To: gcc-patches@gcc.gnu.org 
Subject: [pushed] [RA]: Modify cost calculation for dealing with pseudo 
equivalences



This is the second attempt to improve RA cost calculation for pseudos
with equivalences.  The patch explanation is in the log message.

The patch was successfully bootstrapped and tested on x86-64, aarch64,
and ppc64le.  The patch was also benchmarked on x86-64 spec2017.
specfp2017 performance did not change, specint2017 improved by 0.3%.




Re: [PATCH] binutils: experimental use of libdiagnostics in gas

2023-11-07 Thread David Malcolm
On Tue, 2023-11-07 at 16:57 +0100, Clément Chigot wrote:
> > > However, I'm not sure how you're planning to make the transition. But
> > > currently, it looks like libdiagnostics is either enabled and thus
> > > the new format being produced, either it's not and we do have the
> > > legacy format. I think the transition should be smoother than that,
> > > there are probably thousands of tests, scripts, whatever out in the
> > > wild expecting this legacy format. Allowing both formats within the
> > > same executable, basically chosen by a flag, would probably ease the
> > > transition.
> >
> > Yes.  I'm assuming that consumers of libdiagnostics would have a
> > configure-time test for the availability of libdiagnostics.  In the
> > example I gave, it was just a compile-time "choice" (I'm not an expert
> > at autotools, so I hacked all of that up for now)... but if the feature
> > is available, it could be a run-time choice.
> >
> > We've been adding new features to GCC's diagnostic output over the
> > years (adding column numbers, showing macro expansions, quoting source
> > code with underlines, fix-it hints, etc), but each time we've added a
> > flag to turn them off (e.g. -fno-diagnostics-show-line-numbers,
> > -fno-diagnostics-show-labels, etc).
> >
> > As of GCC 11 we have a -fdiagnostics-plain-output which "requests that
> > diagnostic output look as plain as possible, which may be useful when
> > running dejagnu or other utilities that need to parse diagnostics
> > output and prefer that it remain more stable over time."
>
> I guess starting by a configure-time choice before such flags like
> -fdiagnostics-plain-output are implemented is enough. I merely wish
> that there is a way to preserve the old output, giving people the time
> to experiment and then transitioning all their tools.
>
> We can also introduce a flag like "-fdiagnostics-legacy-output`.
> Though, I'm fearing it will never be removed, meaning maintaining the
> previous code forever...

One other consideration here may be bootstrapping a toolchain (e.g.
bringing up a new CPU architecture) and thus minimizing dependencies.
binutils is such an important component that perhaps you'd want to
retain a minimal implementation of diagnostics that doesn't bring in an
external requirement? especially one based on GCC 14 (which itself
requires GCC 4.8 or later to build), e.g. configuring
"--without-libdiagnostics" or somesuch?

> A configure-time choice can be more easily
> enabled by default in a few versions and then removed completely after
> another bunch of versions.
> 
> 

[...snip...]

Dave



Re: [PATCH] binutils: experimental use of libdiagnostics in gas

2023-11-07 Thread Clément Chigot
> > However, I'm not sure how you're planning to make the transition. But
> > currently, it looks like libdiagnostics is either enabled and thus
> > the new format being produced, either it's not and we do have the
> > legacy format. I think the transition should be smoother than that,
> > there are probably thousands of tests, scripts, whatever out in the
> > wild expecting this legacy format. Allowing both formats within the
> > same executable, basically chosen by a flag, would probably ease the
> > transition.
>
> Yes.  I'm assuming that consumers of libdiagnostics would have a
> configure-time test for the availability of libdiagnostics.  In the
> example I gave, it was just a compile-time "choice" (I'm not an expert
> at autotools, so I hacked all of that up for now)... but if the feature
> is available, it could be a run-time choice.
>
> We've been adding new features to GCC's diagnostic output over the
> years (adding column numbers, showing macro expansions, quoting source
> code with underlines, fix-it hints, etc), but each time we've added a
> flag to turn them off (e.g. -fno-diagnostics-show-line-numbers,  -fno-
> diagnostics-show-labels, etc).
>
> As of GCC 11 we have a -fdiagnostics-plain-output which "requests that
> diagnostic output look as plain as possible, which may be useful when
> running dejagnu or other utilities that need to parse diagnostics
> output and prefer that it remain more stable over time."

I guess starting with a configure-time choice, before flags like
-fdiagnostics-plain-output are implemented, is enough. I merely wish
that there is a way to preserve the old output, giving people the time
to experiment and then transition all their tools.

We can also introduce a flag like "-fdiagnostics-legacy-output".
Though, I fear it will never be removed, meaning maintaining the
previous code forever... A configure-time choice can more easily be
enabled by default for a few versions and then removed completely after
another bunch of versions.

> In the implementation patch I made the text sink turn on everything by
> default here:
>   m_dc.m_source_printing.enabled = true; // FIXME
>   m_dc.m_source_printing.colorize_source_p = true; // FIXME
>   m_dc.m_source_printing.show_labels_p = true; // FIXME
>   m_dc.m_source_printing.show_line_numbers_p = true; // FIXME
>   m_dc.m_source_printing.min_margin_width = 6; // FIXME
> and I didn't provide a way of turning things off.  So maybe the API
> needs a way to tweak options of the text sink?  Maybe:
>
>   diagnostic_text_sink_set_source_printing (sink, true);
>   diagnostic_text_sink_set_colorize_source (sink, COLORIZE_IF_AT_TTY);
>
> ...etc.
>
> Also, I made no particular effort to make the output similar to before,
> hence e.g. the difference in capitalization "Error: " vs "error: ".  Is
> that capitalization something that you'd want to remain consistent?

If we keep a way to produce the old output, I don't think so. And it
would probably be better to be coherent across gcc, gas, etc.

> >
> > Apart from that, just a few remarks on the implementation details,
> > see below.
>
> [...snip...]
>
> > > @@ -101,6 +109,29 @@ had_warnings (void)
> > >return warning_count;
> > >  }
> > >
> > > +#if USE_LIBDIAGNOSTICS
> > > +static diagnostic_manager *diag_mgr;
> >
> > Would it make sense for an application to have several
> > "diagnostic_manager" instances?
> > If not, I'm wondering whether this variable should be hidden
> > inside libdiagnostics itself, so that calls don't each need
> > this diag_mgr argument.
>
> Although it might not make sense for binutils-style use-cases to have
> multiple diagnostic_manager instances (since these are all implemented
> as standalone programs), I think in general it *does* make sense.
>
> I've found it's usually a bad idea for a shared library to have global
> state, since at some point a consumer of the library is a shared
> library itself, at which point users of the 2nd library see unexpected
> interactions.
>
> Consider the case of a linting tool implemented as a shared library:
> the tool has no knowledge of where it's going to be embedded: e.g. in
> an IDE, or as part of some other system.  Perhaps the IDE is
> multithreaded.  So I think it's better for the user of the diagnostic
> API (here the lint tool) to have an explicit "context" pointer that
> it's sending diagnostics to, rather than having it be implicit inside
> the library.

Thanks for the explanation. I'm fine with that now.
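
To illustrate the explicit-context design discussed above, here is a
minimal sketch in C.  The names diag_ctx, diag_ctx_init and
diag_ctx_error are hypothetical and are not part of the libdiagnostics
API; the only point is that all state lives in an object the caller
passes in, so two users in the same process cannot interfere with each
other.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical miniature of a context-based diagnostics API.  These
   names are illustrative only; this is not the libdiagnostics interface.  */
typedef struct
{
  FILE *sink;           /* where this context writes its messages */
  unsigned error_count; /* per-context state instead of a global */
} diag_ctx;

/* Initialize CTX to write to SINK (NULL means "count only").  */
static void
diag_ctx_init (diag_ctx *ctx, FILE *sink)
{
  assert (ctx != NULL);
  ctx->sink = sink;
  ctx->error_count = 0;
}

/* Report an error through CTX; no hidden library-wide state is touched.  */
static void
diag_ctx_error (diag_ctx *ctx, const char *file, int line, const char *msg)
{
  assert (ctx != NULL && file != NULL && msg != NULL);
  if (ctx->sink)
    fprintf (ctx->sink, "%s:%d: error: %s\n", file, line, msg);
  ctx->error_count++;
}
```

With this shape, a lint library embedded in an IDE can keep its own
diag_ctx while the IDE keeps another, and the error counts stay
separate.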

> >
> > > +#endif
> > > +
> > > +void messages_init (void)
> > > +{
> > > +#if USE_LIBDIAGNOSTICS
> > > +  diag_mgr = diagnostic_manager_new ();
> > > +  diagnostic_manager_add_text_sink (diag_mgr, stderr,
> > > +   DIAGNOSTIC_COLORIZE_IF_TTY);
> > > +  diagnostic_manager_add_sarif_sink (diag_mgr, stderr,
> > > +
> > > DIAGNOSTIC_SARIF_VERSION_2_1_0);
> > > +#endif
> > > +}
> > > +
> > > +void messages_end (void)
> > > +{
> > > +#if USE_LIBDIAGNOSTICS
> > > +  

Re: [PATCH] testsuite: nodiscard-reason-nonstring.C FAIL in C++26

2023-11-07 Thread Jakub Jelinek
On Tue, Nov 07, 2023 at 10:50:42AM -0500, Marek Polacek wrote:
> Tested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> Since r14-5071, we emit an extra error for this test (the first one):
> 
> nodiscard-reason-nonstring.C:5:13: error: expected string-literal before 
> numeric constant
> nodiscard-reason-nonstring.C:5:36: error: 'nodiscard' attribute argument must 
> be a string constant
> 
> so the test needs adjusting.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/nodiscard-reason-nonstring.C: Adjust dg-error.

LGTM.

Jakub



[PATCH] testsuite: nodiscard-reason-nonstring.C FAIL in C++26

2023-11-07 Thread Marek Polacek
Tested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Since r14-5071, we emit an extra error for this test (the first one):

nodiscard-reason-nonstring.C:5:13: error: expected string-literal before 
numeric constant
nodiscard-reason-nonstring.C:5:36: error: 'nodiscard' attribute argument must 
be a string constant

so the test needs adjusting.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nodiscard-reason-nonstring.C: Adjust dg-error.
---
 gcc/testsuite/g++.dg/cpp2a/nodiscard-reason-nonstring.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/cpp2a/nodiscard-reason-nonstring.C 
b/gcc/testsuite/g++.dg/cpp2a/nodiscard-reason-nonstring.C
index 091c3e56bd2..cbc04b406c6 100644
--- a/gcc/testsuite/g++.dg/cpp2a/nodiscard-reason-nonstring.C
+++ b/gcc/testsuite/g++.dg/cpp2a/nodiscard-reason-nonstring.C
@@ -2,7 +2,7 @@
 /* { dg-do compile { target c++20 } } */
 /* { dg-options "-O" } */
 
-[[nodiscard(123)]] int check1 (void); /* { dg-error "nodiscard\[^\n\r]*must be 
a string constant" } */
+[[nodiscard(123)]] int check1 (void); /* { dg-error "nodiscard\[^\n\r]*must be 
a string constant|expected string-literal" } */
 
 void
 test (void)

base-commit: 75e5a467811da4237d5c43b455202c832f6e064e
-- 
2.41.0



Re: [PATCH 2/2] libdiagnostics: work-in-progress implementation

2023-11-07 Thread Simon Sobisch




On 07.11.2023 at 15:59, David Malcolm wrote:
> On Tue, 2023-11-07 at 08:54 +0100, Simon Sobisch wrote:
> > Thank you for your work and providing this patch.
> >
> > GCC related questions:
> >
> > Is it planned to change GCC diagnostics to use libdiagnostic itself?
>
> No.  GCC uses C++ internally, and the internal diagnostic API is
> written in C++. libdiagnostic wraps up this C++ API in a C interface.
> GCC would continue using the C++ interface internally.

Why not provide both a C and a C++ API in libdiagnostics?
GNU programs (and also others) are written in both.

The benefit of using it within GCC itself ("eat your own dogfood") would
be that more or less any need that GCC has is also found in the
library... thinking again, this may also make it "too heavy" - not sure.

> > Is it planned to "directly" add features or would the result for GCC
> > be identical (apart from build changes)?
> >
> > So far it looks like it wouldn't be possible to "just build
> > libdiagnostics", and much less to "just distribute its source" for
> > that purpose, is it?
>
> Correct: libdiagnostics is just an extra .cc file within the rest of
> GCC, and almost all the work is being done in other .cc files.

Maybe call that "status quo - initial patch"? ;-)

> > As building GCC does take a significant amount of resources and
> > system-wide switching to a new GCC version is considered a serious
> > task (distributions commonly stay with their major GCC version for
> > quite some time), I'd search for an option to build a
> > "self-contained" version that does not need the complete necessary
> > toolset and may also be distributed separately.
>
> It's possible to reduce the resources by disabling bootstrapping, and
> only enabling a minimal set of languages.
>
> I'd see libdiagnostics as coming from the distribution build of GCC.  I
> suppose distributions might want to have a simple build of GCC and ship
> just the .so/.h file from libdiagnostics from the build.

Agreed. But as a "user" I would like to have that "easy" option, too.
As a maintainer who plans to move to libdiagnostics, it would be _very_
helpful to be able to use it as a sub-project, in case it isn't available.

> > This definitely can come later, too; I _guess_ this would mean moving
> > part of GCC's code into a sub-folder libdiagnostics and using it as a
> > subproject for configure/make, with the option to run "make dist" in
> > that subfolder alone, too.
>
> It would involve a lot of refactoring :)

Something to "consider along, do later", I guess.

> > The main reason for that would be to allow applications to move from
> > their previous own diagnostics to libdiagnostics; if it isn't
> > available on the system they can build and install it as a
> > subproject, too; and to be able to build libdiagnostics with a much
> > reduced dependency list.
>
> I can try to come up with a minimal recipe for building gcc if all you
> want is libdiagnostics

Thanks, that already helps a lot.

Simon


[committed] OpenMP: invoke.texi - mention C attribute syntax for -fopenmp(-simd)

2023-11-07 Thread Tobias Burnus

This is a followup to Jakub's commits that add OpenMP [[omp::...]] C23 attribute
support; namely, this updates invoke.texi's -fopenmp/-fopenmp-simd entries. Cf.

https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-fopenmp

Committed as r14-5224-g75e5a467811da4 as obvious.

Tobias

PS: Jakub's related commits:

5648446cdaa openmp: Mention C attribute syntax in documentation
26cf0694163 openmp: Adjust handling of __has_attribute 
(omp::directive)/sequence and add omp::decl
8067caa85d0 openmp: Add omp::decl support for C2X
40b9af020fc openmp: Add support for omp::directive and omp::sequence attributes 
in C2X
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
Munich; limited liability company (GmbH); Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: Munich; Commercial register: 
Munich, HRB 106955
commit 75e5a467811da4237d5c43b455202c832f6e064e
Author: Tobias Burnus 
Date:   Tue Nov 7 16:23:45 2023 +0100

OpenMP: invoke.texi - mention C attribute syntax for -fopenmp(-simd)

gcc/ChangeLog:

* doc/invoke.texi (-fopenmp, -fopenmp-simd): Adjust wording for
attribute syntax supported also in C.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e23ef05f5c1..2caa7ceec40 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2776,9 +2776,9 @@ can be omitted, to use a target-specific default value.
 @opindex fopenmp
 @cindex OpenMP parallel
 @item -fopenmp
-Enable handling of OpenMP directives @samp{#pragma omp} in C/C++,
+Enable handling of OpenMP directives @samp{#pragma omp},
 @samp{[[omp::directive(...)]]}, @samp{[[omp::sequence(...)]]} and
-@samp{[[omp::decl(...)]]} in C++ and @samp{!$omp} in Fortran.  It
+@samp{[[omp::decl(...)]]} in C/C++ and @samp{!$omp} in Fortran.  It
 additionally enables the conditional compilation sentinel @samp{!$} in
 Fortran.  In fixed source form Fortran, the sentinels can also start with
 @samp{c} or @samp{*}.  When @option{-fopenmp} is specified, the
@@ -2795,9 +2795,9 @@ have support for @option{-pthread}. @option{-fopenmp} implies
 Enable handling of OpenMP's @code{simd}, @code{declare simd},
 @code{declare reduction}, @code{assume}, @code{ordered}, @code{scan}
 and @code{loop} directive, and of combined or composite directives with
-@code{simd} as constituent with @code{#pragma omp} in C/C++,
+@code{simd} as constituent with @code{#pragma omp},
 @code{[[omp::directive(...)]]}, @code{[[omp::sequence(...)]]} and
-@code{[[omp::decl(...)]]} in C++ and @code{!$omp} in Fortran.  It
+@code{[[omp::decl(...)]]} in C/C++ and @code{!$omp} in Fortran.  It
 additionally enables the conditional compilation sentinel @samp{!$} in
 Fortran.  In fixed source form Fortran, the sentinels can also start with
 @samp{c} or @samp{*}.  Other OpenMP directives are ignored.  Unless


[pushed] aarch64: Add a %Z operand modifier for SVE registers

2023-11-07 Thread Richard Sandiford
This patch adds a %Z operand modifier that prints registers as SVE z
registers.  The SME patches need this, but so do Tamar's patches.
I'm separating this out to unblock those.

We should probably document the [wxbhsdqZ] modifiers as
user-facing, but doing that for all of them is a separate patch.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.cc (aarch64_print_operand): Add a %Z
modifier for SVE registers.
---
 gcc/config/aarch64/aarch64.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index cb65ccc8465..968a9ac439d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12091,6 +12091,10 @@ sizetochar (int size)
  'N':  Take the duplicated element in a vector constant
and print the negative of it in decimal.
  'b/h/s/d/q':  Print a scalar FP/SIMD register name.
+ 'Z':  Same for SVE registers.  ('z' was already taken.)
+   Note that it is not necessary to use %Z for operands
+   that have SVE modes.  The convention is to use %Z
+   only for non-SVE (or potentially non-SVE) modes.
  'S/T/U/V':Print a FP/SIMD register name for a register 
list.
The register printed is the FP/SIMD register name
of X + 0/1/2/3 for S/T/U/V.
@@ -12263,6 +12267,8 @@ aarch64_print_operand (FILE *f, rtx x, int code)
 case 's':
 case 'd':
 case 'q':
+case 'Z':
+  code = TOLOWER (code);
   if (!REG_P (x) || !FP_REGNUM_P (REGNO (x)))
{
  output_operand_lossage ("incompatible floating point / vector 
register operand for '%%%c'", code);
-- 
2.25.1



Re: Re: [PATCH V2] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread 钟居哲
Yes! Thanks a lot.

Fix as you suggested in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635591.html 




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-07 21:50
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH V2] test: Fix FAIL of pr97428.c for RVV
On Tue, 7 Nov 2023, Juzhe-Zhong wrote:
 
> This test shows "vectorizing stmts using SLP" 4 times instead of 2 for RVV.
> The reason is that RVV has 512-bit vectors.
> Here is a comparison between RVV and ARM SVE:
> https://godbolt.org/z/xc5KE5rPs
> 
> I confirmed that GCN also matches 4 SLP. This patch passes on both GCN and RVV.
> 
> Ok for trunk?
 
Does --param vect-epilogues-nomask=0 help?
 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/pr97428.c: Adapt for RVV and GCN.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/pr97428.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c 
> b/gcc/testsuite/gcc.dg/vect/pr97428.c
> index ad6416096aa..f77adb1be97 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr97428.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
> @@ -43,5 +43,6 @@ void foo_i2(dcmlx4_t dst[], const dcmlx_t src[], int n)
>  /* { dg-final { scan-tree-dump "Detected interleaving store of size 16" 
> "vect" } } */
>  /* We're not able to peel & apply re-aligning to make accesses well-aligned 
> for !vect_hw_misalign,
> but we could by peeling the stores for alignment and applying re-aligning 
> loads.  */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> { xfail { ! vect_hw_misalign } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" 
> { target { { vect_hw_misalign } && { ! vect512 } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" 
> { target { vect512 } } } } */
>  /* { dg-final { scan-tree-dump-not "gap of 6 elements" "vect" } } */
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 


Re: Re: [PATCH] test: Fix bb-slp-33.c for RVV

2023-11-07 Thread 钟居哲
Thanks Richi.

Adapted the condition in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635589.html 




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-07 21:48
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] test: Fix bb-slp-33.c for RVV
On Tue, 7 Nov 2023, Juzhe-Zhong wrote:
 
> As https://godbolt.org/z/hPsqahEa5 shows, RVV fails the dump check
> because "vectorizing stmts using SLP" appears 3 times instead of 2.
> 
> The root cause is this code in main:
> 
>   if (a[0] != 1
>   || a[1] != 2
>   || a[2] != 3
>   || a[3] != 4
>   || a[4] != 7
>   || a[5] != 0
>   || a[6] != 0
>   || a[7] != 0
>   || a[8] != 0)
> abort ();
> 
> is vectorized. So add -fno-tree-vectorize to avoid the confusing check.
 
Uh, please don't add optimize attributes.  If you see this vectorized
(as reduction?) then please instead rewrite the condition as
 
if (a[0] != 1)
   abort ();
__asm__ volatile ("");
if (a[1] != 2)
   abort ();
...
 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/bb-slp-33.c: Add -fno-tree-vectorize to main.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-33.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c 
> b/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> index bbb13ef798e..f44cbdcfbcf 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
> @@ -17,7 +17,8 @@ test(int *__restrict__ a, int *__restrict__ b)
>a[8] = 0;
>  }
>  
> -int main()
> +int __attribute__((optimize(("-fno-tree-vectorize"
> +main()
>  {
>int a[9];
>int b[4];
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 


[PATCH V3] test: Fix FAIL of pr97428.c for RVV

2023-11-07 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97428.c: Add additional compile option for riscv.

---
 gcc/testsuite/gcc.dg/vect/pr97428.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c 
b/gcc/testsuite/gcc.dg/vect/pr97428.c
index ad6416096aa..60dd984cfd3 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97428.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" { target 
riscv*-*-* } } */
 
 typedef struct { double re, im; } dcmlx_t;
 typedef struct { double re[4], im[4]; } dcmlx4_t;
-- 
2.36.3



[PATCH] c++: fix tf_decltype manipulation for COMPOUND_EXPR

2023-11-07 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

In the COMPOUND_EXPR case of tsubst_expr, we were redundantly clearing
the tf_decltype flag when substituting the LHS and also neglecting to
propagate it when substituting the RHS.  This patch corrects this flag
manipulation, which allows us to accept the below testcase.

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) <case COMPOUND_EXPR>: Don't redundantly
clear tf_decltype when substituting the LHS.  Propagate
tf_decltype when substituting the RHS.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype-call7.C: New test.
---
 gcc/cp/pt.cc| 9 -
 gcc/testsuite/g++.dg/cpp0x/decltype-call7.C | 9 +
 2 files changed, 13 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/decltype-call7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 521749df525..5f879287a58 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20382,11 +20382,10 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
 
 case COMPOUND_EXPR:
   {
-   tree op0 = tsubst_expr (TREE_OPERAND (t, 0), args,
-   complain & ~tf_decltype, in_decl);
-   RETURN (build_x_compound_expr (EXPR_LOCATION (t),
-  op0,
-  RECUR (TREE_OPERAND (t, 1)),
+   tree op0 = RECUR (TREE_OPERAND (t, 0));
+   tree op1 = tsubst_expr (TREE_OPERAND (t, 1), args,
+   complain|decltype_flag, in_decl);
+   RETURN (build_x_compound_expr (EXPR_LOCATION (t), op0, op1,
   templated_operator_saved_lookups (t),
   complain|decltype_flag));
   }
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype-call7.C 
b/gcc/testsuite/g++.dg/cpp0x/decltype-call7.C
new file mode 100644
index 000..4ce3e68381e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype-call7.C
@@ -0,0 +1,9 @@
+// { dg-do compile { target c++11 } }
+
+struct A;
+template<class T> A f();
+
+template<class T>
+decltype(42, f<T>()) g();
+
+using type = decltype(g<int>());
-- 
2.43.0.rc0.23.g8be77c5de6



[PATCH V2] test: Fix bb-slp-33.c for RVV

2023-11-07 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-33.c: Rewrite the condition.

---
 gcc/testsuite/gcc.dg/vect/bb-slp-33.c | 35 ---
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
index bbb13ef798e..74af8dd27ae 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-33.c
@@ -32,16 +32,33 @@ int main()
   a[4] = 7;
   check_vect ();
   test(a, b);
-  if (a[0] != 1
-  || a[1] != 2
-  || a[2] != 3
-  || a[3] != 4
-  || a[4] != 7
-  || a[5] != 0
-  || a[6] != 0
-  || a[7] != 0
-  || a[8] != 0)
+  if (a[0] != 1)
 abort ();
+  __asm__ volatile ("");
+  if (a[1] != 2)
+abort ();
+  __asm__ volatile ("");
+  if (a[2] != 3)
+abort ();
+  __asm__ volatile ("");
+  if (a[3] != 4)
+abort ();
+  __asm__ volatile ("");
+  if (a[4] != 7)
+abort ();
+  __asm__ volatile ("");
+  if (a[5] != 0)
+abort ();
+  __asm__ volatile ("");
+  if (a[6] != 0)
+abort ();
+  __asm__ volatile ("");
+  if (a[7] != 0)
+abort ();
+  __asm__ volatile ("");
+  if (a[8] != 0)
+abort ();
+  __asm__ volatile ("");
   return 0;
 }
 
-- 
2.36.3



Re: [PATCH] test: Recover sdiv_pow2 check and remove test of RISC-V

2023-11-07 Thread Richard Biener
On Tue, 7 Nov 2023, Juzhe-Zhong wrote:

> gcc/testsuite/ChangeLog:

OK.

>   * gcc.dg/vect/vect-sdiv-pow2-1.c: Recover scan check.
>   * lib/target-supports.exp: Remove riscv.
> 
> ---
>  gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c | 2 +-
>  gcc/testsuite/lib/target-supports.exp| 4 +---
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> index 8056c2a6748..49ecbe216f2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> @@ -79,5 +79,5 @@ main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" 
> } } */
> +/* { dg-final { scan-tree-dump {\.DIV_POW2} "vect" { target 
> vect_sdiv_pow2_si } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 18 "vect" { target 
> vect_sdiv_pow2_si } } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 0317fc102ef..8f6cdf16661 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -8077,9 +8077,7 @@ proc check_effective_target_vect_mulhrs_hi {} {
>  
>  proc check_effective_target_vect_sdiv_pow2_si {} {
>  return [expr { ([istarget aarch64*-*-*]
> - && [check_effective_target_aarch64_sve])
> -|| ([istarget riscv*-*-*]
> -&& [check_effective_target_riscv_v]) }]
> + && [check_effective_target_aarch64_sve]) }]
>  }
>  
>  # Return 1 if the target plus current options supports a vector
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch

2023-11-07 Thread Richard Biener
On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> As requested, the vectorizer is now free to pick its own exit, which can be
> different from what the loop CFG infrastructure uses.  The vectorizer makes use
> of this to vectorize loops that it previously could not.
> 
> But this means that loop control must be materialized in the block that needs
> it, lest we corrupt the SSA chain.  This makes it so we use the vectorizer's main
> IV block instead of the loop infra.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-loop-manip.cc (standard_iv_increment_position): Conditionally
>   take dest BB.
>   * tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
>   * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
>   (vect_set_loop_condition_partial_vectors_avx512): Likewise.
>   (vect_set_loop_condition_normal): Likewise.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> index 
> bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00301423df111cbe7bf7ba8
>  100644
> --- a/gcc/tree-ssa-loop-manip.h
> +++ b/gcc/tree-ssa-loop-manip.h
> @@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge, bool = 
> false);
>  extern basic_block ip_end_pos (class loop *);
>  extern basic_block ip_normal_pos (class loop *);
>  extern void standard_iv_increment_position (class loop *,
> - gimple_stmt_iterator *, bool *);
> + gimple_stmt_iterator *, bool *,
> + basic_block = NULL);
>  extern bool
>  gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned int,
>  sbitmap, edge, vec<edge> *, int);
> diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc
> index 
> e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5dde6c29492ce4d4e4550
>  100644
> --- a/gcc/tree-ssa-loop-manip.cc
> +++ b/gcc/tree-ssa-loop-manip.cc
> @@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
>  
>  /* Stores the standard position for induction variable increment in LOOP
> (just before the exit condition if it is available and latch block is 
> empty,
> -   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
> -   the increment should be inserted after *BSI.  */
> +   end of the latch block otherwise) to BSI.  If DEST_BB is specified then 
> that
> +   basic block is used as the destination instead of the loop latch source
> +   block.  INSERT_AFTER is set to true if the increment should be inserted 
> after
> +   *BSI.  */
>  
>  void
>  standard_iv_increment_position (class loop *loop, gimple_stmt_iterator *bsi,
> - bool *insert_after)
> + bool *insert_after, basic_block dest_bb)
>  {
> -  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos (loop);
> +  basic_block bb = dest_bb;
> +  if (!bb)
> +bb = ip_normal_pos (loop);
> +  basic_block latch = ip_end_pos (loop);

I don't think that's a good API extension.  Given that we don't support
an early exit after the main IV exit doesn't this code already work
fine as-is?  It chooses the last exit.  The position is also not
semantically relevant, we just try to keep the latch empty here
(that is, it's a bit of a "bad" API).

So, do you really need this change?

Maybe we're really using standard_iv_increment_position wrong here,
the result is supposed to _only_ feed the PHI latch argument.

Richard.

>gimple *last = last_nondebug_stmt (latch);
>  
>if (!bb
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> 6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a6465a547d1ea2d3d940373
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>tree index_before_incr, index_after_incr;
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  standard_iv_increment_position (loop, &incr_gsi, &insert_after, 
> exit_e->src);
>if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
>  {
>/* Create an IV that counts down from niters_total and whose step
> @@ -1017,7 +1018,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
> loop *loop,
>tree index_before_incr, index_after_incr;
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> +   exit_edge->src);
>create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
>&incr_gsi, insert_after, &index_before_incr,

Re: [PATCH] binutils: experimental use of libdiagnostics in gas

2023-11-07 Thread Jan Beulich
On 07.11.2023 15:32, David Malcolm wrote:
> On Tue, 2023-11-07 at 11:03 +0100, Jan Beulich wrote:
>> On 06.11.2023 23:29, David Malcolm wrote:
>>> All of the locations are just lines; does gas do column numbers at
>>> all?
>>> (or ranges?)
>>
>> It currently doesn't, which is primarily related to the scrubbing
>> done
>> before lines are actually processed.
> 
> How complicated/desirable would it be to track locations in .s files at
> the column level?  I confess I didn't look at the parsing code at all.

At the parsing level tracking may be feasible, but as said the scrubber
(zapping in particular redundant whitespace, but also doing other
"interesting" things) is the problem point here, imo.
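
To make the problem concrete, here is a minimal sketch in C.  It is not
gas's actual scrubber; scrub_line and its col_map parameter are
hypothetical.  It collapses runs of blanks and, to keep column
information alive, records for every output column the input column it
came from.  Without such a map, a column in the scrubbed text says
nothing about the column in the original .s file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scrubber, for illustration only: copy IN to OUT,
   collapsing each run of spaces/tabs into a single space.  COL_MAP[i]
   receives the 0-based input column of output character i.  OUT and
   COL_MAP must have room for the whole input.  Returns the output
   length.  */
static size_t
scrub_line (const char *in, char *out, size_t *col_map)
{
  assert (in != NULL && out != NULL && col_map != NULL);
  size_t o = 0;
  int prev_blank = 0;
  for (size_t i = 0; in[i] != '\0'; i++)
    {
      int blank = (in[i] == ' ' || in[i] == '\t');
      if (blank && prev_blank)
        continue;               /* redundant whitespace is dropped */
      out[o] = blank ? ' ' : in[i];
      col_map[o] = i;           /* remember where this character was */
      o++;
      prev_blank = blank;
    }
  out[o] = '\0';
  return o;
}
```

For the input "mov   r0,   r1" this yields "mov r0, r1": the operand r1
sits at column 8 of the scrubbed line but at column 12 of the source,
and only the map recovers the latter.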

Jan


Re: [PATCH 2/2] libdiagnostics: work-in-progress implementation

2023-11-07 Thread David Malcolm
On Tue, 2023-11-07 at 08:54 +0100, Simon Sobisch wrote:
> Thank you for your work and providing this patch.
> 
> GCC related questions:
> 
> Is it planned to change GCC diagnostics to use libdiagnostic itself?

No.  GCC uses C++ internally, and the internal diagnostic API is
written in C++. libdiagnostic wraps up this C++ API in a C interface. 
GCC would continue using the C++ interface internally.

> 
> Is it planned to "directly" add features or would the result for GCC
> be 
> identical (apart from build changes)?
> 
> So far it looks like it wouldn't be possible to "just build 
> libdiagnostics", and much less to "just distribute its source" for
> that 
> purpose, is it?

Correct: libdiagnostics is just an extra .cc file within the rest of
GCC, and almost all the work is being done in other .cc files.

> As building GCC does take a significant amount of resources and 
> system-wide switching to a new GCC version is considered a serious
> task 
> (distributions commonly stay with their major GCC version for quite
> some 
> time), I'd search for an option to building a "self-contained"
> version 
> that does not need the complete necessary toolset and may also be 
> distributed separately.

It's possible to reduce the resources by disabling bootstrapping, and
only enabling a minimal set of languages.

I'd see libdiagnostics as coming from the distribution build of GCC.  I
suppose distributions might want to have a simple build of GCC and ship
just the .so/.h file from libdiagnostics from the build.

> 
> This definitely can come later, too; I _guess_ this would mean moving
> part of GCCs code in a sub-folder libdiagnostics and use it as 
> subproject for configure/make, with then option to run "make dist" in
> that subfolder alone, too.

It would involve a lot of refactoring :)

> 
> The main reason for that would be to allow applications move from
> their 
> previous own diagnostic to libdiagnostics, if it isn't available on
> the 
> system they can build and install it as subproject, too; and to be
> able 
> to build libdiagnostics with a much reduced dependency list.

I can try to come up with a minimal recipe for building gcc if all you
want is libdiagnostics

> 
> 
> Thank you for considering that,
> Simon

Thanks
Dave

[...snip...]



Re: [PATCH 6/21]middle-end: support multiple exits in loop versioning

2023-11-07 Thread Richard Biener
On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This has loop versioning use the vectorizer's IV exit edge when it's available
> since single_exit (..) fails with multiple exits.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_loop_versioning): Support multiple
>   exits.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> 3d59119787d6afdc5a6465a547d1ea2d3d940373..58b4b9c11d8b844ee86156cdfcba7f838030a7c2
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -4180,12 +4180,24 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>If loop versioning wasn't done from loop, but scalar_loop instead,
>merge_bb will have already just a single successor.  */
>  
> -  merge_bb = single_exit (loop_to_version)->dest;
> +  /* Due to the single_exit check above we should only get here when
> +  loop == loop_to_version, that means we can use loop_vinfo to get the
> +  exits.  */

You mean LOOP_VINFO_EARLY_BREAKS can only ever version the loop
itself?  That's correct.  All inner loops of loop_to_version have
a single exit unless it's loop itself.

Please reword a bit and instead do

   edge exit_edge;
   if (loop_to_version == loop)
     exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
   else
     exit_edge = single_exit (loop_to_version);

OK with that change.

Richard.

> +  edge exit_edge = single_exit (loop_to_version);
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + {
> +   /* In early exits the main exit will fail into the merge block of the
> +  alternative exits.  So we need the single successor of the main
> +  exit here to find the merge block.  */
> +   exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
> + }
> +  gcc_assert (exit_edge);
> +  merge_bb = exit_edge->dest;
>if (EDGE_COUNT (merge_bb->preds) >= 2)
>   {
> gcc_assert (EDGE_COUNT (merge_bb->preds) >= 2);
> -   new_exit_bb = split_edge (single_exit (loop_to_version));
> -   new_exit_e = single_exit (loop_to_version);
> +   new_exit_bb = split_edge (exit_edge);
> +   new_exit_e = exit_edge;
> e = EDGE_SUCC (new_exit_bb, 0);
>  
> for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi);


Re: [PATCH] binutils: experimental use of libdiagnostics in gas

2023-11-07 Thread David Malcolm
On Tue, 2023-11-07 at 08:04 +0100, Simon Sobisch wrote:
> Thank you very much for this proof-of-concept use!
> 
> Inspecting it raises the following questions to me, both for a possible
> binutils implementation and for the library use in general:
> 
> * How should the application set the relevant context (often lines are
>    shown before/after)?

Currently the source-printing code has this logic (in
gcc/diagnostic-show-locus.cc):
- filter locations within the diagnostic to "sufficiently sane" ones
(thus ignoring e.g. ranges that end before they start)
- look at all of the remaining locations that are in the same source
file as the primary location of the diagnostic
- determine a set of runs of source lines; layout::calculate_line_spans
has this comment:

/* We want to print the pertinent source code at a diagnostic.  The
   rich_location can contain multiple locations.  This will have been
   filtered into m_exploc (the caret for the primary location) and
   m_layout_ranges, for those ranges within the same source file.

   We will print a subset of the lines within the source file in question,
   as a collection of "spans" of lines.

   This function populates m_line_spans with an ordered, disjoint list of
   the line spans of interest.

   Printing a gap between line spans takes one line, so, when printing
   line numbers, we allow a gap of up to one line between spans when
   merging, since it makes more sense to print the source line rather than a
   "gap-in-line-numbering" line.  When not printing line numbers, it's
   better to be more explicit about what's going on, so keeping them as
   separate spans is preferred.

   For example, if the primary range is on lines 8-10, with secondary ranges
   covering lines 5-6 and lines 13-15:

     004
     005                   |RANGE 1
     006                   |RANGE 1
     007
     008  |PRIMARY RANGE
     009  |PRIMARY CARET
     010  |PRIMARY RANGE
     011
     012
     013                                       |RANGE 2
     014                                       |RANGE 2
     015                                       |RANGE 2
     016

   With line numbering on, we want two spans: lines 5-10 and lines 13-15.

   With line numbering off (with span headers), we want three spans: lines 5-6,
   lines 8-10, and lines 13-15.  */
(end of quote)

This algorithm could be tweaked if you want extra lines of context,
perhaps having an integer option N for N extra lines of before/after
context around each run.
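
As an illustration only (this is not the code in gcc/diagnostic-show-locus.cc), the merging rule described in that comment, plus the suggested N extra lines of context, could be sketched roughly as follows.  The names line_span, merge_line_spans and extra_context are made up for this sketch:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a run of source lines (1-based, inclusive).
struct line_span
{
  int first;
  int last;
};

// Sketch of the span-merging rule: sort the spans, then merge neighbours
// that touch or overlap.  When line numbers are shown, a gap of one line
// between spans is also merged, since printing the gap marker would cost
// a line anyway.  EXTRA_CONTEXT is the hypothetical "N extra lines"
// option: each span is widened by that many lines on both sides first.
std::vector<line_span>
merge_line_spans (std::vector<line_span> spans, bool show_line_numbers,
                  int extra_context = 0)
{
  for (line_span &s : spans)
    {
      s.first = std::max (1, s.first - extra_context);
      s.last += extra_context;
    }

  std::sort (spans.begin (), spans.end (),
             [] (const line_span &a, const line_span &b)
             { return a.first < b.first; });

  const int max_gap = show_line_numbers ? 1 : 0;
  std::vector<line_span> result;
  for (const line_span &s : spans)
    {
      if (!result.empty () && s.first <= result.back ().last + 1 + max_gap)
        result.back ().last = std::max (result.back ().last, s.last);
      else
        result.push_back (s);
    }
  return result;
}
```

With the ranges from the comment (8-10, 5-6 and 13-15) this gives two spans when line numbers are shown and three when they are not, matching the expectations stated there.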


> * Should it be possible to override msgid used to display the
>    warning/error type?
>    If this would be possible then the text sink in messages_init may be
>    adjusted to replace the label with _("Warning") and _("Error"), which
>    would leave the text output "as-is" (if the text sink is configured to
>    not output the source line); this would make it usable without
>    adjusting the testsuite and to adopt to a standard later.

For the msgids, I was more thinking of the messages of the diagnostics
themselves; for instance, in GCC the error format string:

   "Invalid argument %d for builtin %qF"

has fr.po translation:

   "Argument %d invalide pour la fonction interne %qF"

But it sounds like gas also has capitalized severities (e.g.
"Warning"), so maybe that should simply be a flag in the text sink.

> 
> 
> Notes for the SARIF output:
> * the region contains an error, according to the linked JSON spec
>    startColumn has a minimum of 1 (I guess you'd just leave it away if
>    the application did not set it)

Good catch; looks like a bug in my SARIF output code.  I've filed it
for myself as:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112425


> * the application should have the option to pre-set the sourceLanguage
>    for the diagnostic_manager (maybe even make that a positional argument
>    that needs to be passed but can be NULL) and override it when
>    specifying a region

Why?

Note that the sourceLanguage can always be NULL.  I considered setting
it for gas, but it's not clear what the value can be, so I just omit
it; see:
  https://github.com/oasis-tcs/sarif-spec/issues/608



[...snip...]

Thanks for the feedback
Dave



Re: Re: [PATCH] test: Fix FAIL of vect-sdiv-pow2-1.c for RVV test: Fix FAIL of vect-sdiv-pow2-1.c for RVV#

2023-11-07 Thread 钟居哲
Ok.  Sorry for the inconvenience.

Here is the patch to fix it as you suggested:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635580.html

I disabled this test for RISC-V; instead, I will add it to the RISC-V-specific testsuite.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-07 22:36
To: 钟居哲
CC: gcc-patches; Jeff Law
Subject: Re: Re: [PATCH] test: Fix FAIL of vect-sdiv-pow2-1.c for RVV test: Fix 
FAIL of vect-sdiv-pow2-1.c for RVV#
On Tue, 7 Nov 2023, 钟居哲 wrote:
 
> Hi, Richi.
> 
> We don't have an explicit SDIV_POW2 pattern but we still want to test it to
> make sure we can vectorize the SDIV_POW2 pattern which will be recognized.
> 
> Maybe we should add another target check ?
 
No, you should simply _not_ add riscv*-*-* to 
check_effective_target_vect_sdiv_pow2_si when you don't implement that
pattern!
 
The test is specifically for this very pattern, not a test whether
we can vectorize this at all.
 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-07 21:45
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw
> Subject: Re: [PATCH] test: Fix FAIL of vect-sdiv-pow2-1.c for RVV test: Fix 
> FAIL of vect-sdiv-pow2-1.c for RVV#
> On Tue, 7 Nov 2023, Juzhe-Zhong wrote:
>  
> > RVV didn't explicitly enable the DIV_POW2 optab but we can vectorize it.
> > We should check pattern recognition instead of the explicit pattern check.
>  
> But I see
>  
> proc check_effective_target_vect_sdiv_pow2_si {} {
> return [expr { ([istarget aarch64*-*-*]
> && [check_effective_target_aarch64_sve])
>|| ([istarget riscv*-*-*]
>&& [check_effective_target_riscv_v]) }]
>  
> so if you don't have sdiv_pow2_si then please don't advertise it.
>  
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.dg/vect/vect-sdiv-pow2-1.c: Fix dump check.
> > 
> > ---
> >  gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c 
> > b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> > index 49ecbe216f2..8056c2a6748 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> > @@ -79,5 +79,5 @@ main (void)
> >return 0;
> >  }
> >  
> > -/* { dg-final { scan-tree-dump {\.DIV_POW2} "vect" { target 
> > vect_sdiv_pow2_si } } } */
> > +/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" 
> > "vect" } } */
> >  /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 18 "vect" { 
> > target vect_sdiv_pow2_si } } } */
> > 
>  
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 


Re: [PATCH v4 1/2] c++: Initial support for P0847R7 (Deducing this) [PR102609]

2023-11-07 Thread waffl3x
I guess I'll be attaching all new e-mails here.

I found a new, kinda scary issue.

```
bool
start_preparsed_function (tree decl1, tree attrs, int flags)
{
  tree ctype = NULL_TREE;
  bool doing_friend = false;

  /* Sanity check.  */
  gcc_assert (VOID_TYPE_P (TREE_VALUE (void_list_node)));
  gcc_assert (TREE_CHAIN (void_list_node) == NULL_TREE);

  tree fntype = TREE_TYPE (decl1);
  if (TREE_CODE (fntype) == METHOD_TYPE)
    ctype = TYPE_METHOD_BASETYPE (fntype);
  else if (DECL_XOBJ_MEMBER_FUNC_P (decl1))
    ctype = DECL_CONTEXT (decl1);
```

```
  if (ctype && !doing_friend && !DECL_STATIC_FUNCTION_P (decl1))
    {
      /* We know that this was set up by `grokclassfn'.  We do not
         wait until `store_parm_decls', since evil parse errors may
         never get us to that point.  Here we keep the consistency
         between `current_class_type' and `current_class_ptr'.  */
      tree t = DECL_ARGUMENTS (decl1);

      gcc_assert (t != NULL_TREE && TREE_CODE (t) == PARM_DECL);
      gcc_assert (TYPE_PTR_P (TREE_TYPE (t))
                  || DECL_XOBJ_MEMBER_FUNC_P (decl1));

      cp_function_chain->x_current_class_ref
        = cp_build_fold_indirect_ref (t);
      /* Set this second to avoid shortcut in cp_build_indirect_ref.  */
      cp_function_chain->x_current_class_ptr = t;

      /* Constructors and destructors need to know whether they're "in
         charge" of initializing virtual base classes.  */
      /* SNIP IRRELEVANT */
    }
```

I made changes in this function, which suddenly sent execution into the
second code block. It seems this block was being bypassed until the fix
at the top of the function. Initially this was to fix a problem with
lambdas, but suddenly a lot of other stuff seems to be breaking. I
haven't run the tests yet but... I have a really bad feeling about this.

So my concerns here are, one, this seems kind of important upon looking
at it, what kind of stuff might have been broken when this was being
bypassed that I didn't notice? And two, how in the world was it working
when this was being bypassed?

I have a hunch that some of the reinterpretation and "just works"
behavior might have had something to do with this block of code being
bypassed. I also suspect that this area will need some changes to make
by-value xobj parameters work. However, I'm a little confused at why
this block is necessary at all. Like I have noted before, when
attempting to call a by-value xobj member function, if there are no
viable conversions, the call will fail. So it's checking for that
somewhere.

normal.C: In explicit object member function 'uintptr_t S::f(this uintptr_t)':
normal.C:15:33: error: invalid type argument (have 'uintptr_t' {aka 'long unsigned int'})
   15 |   uintptr_t f(this uintptr_t n) {
  | ^
normal.C: In explicit object member function 'uintptr_t S::g(this FromS)':
normal.C:18:34: error: invalid type argument (have 'FromS')
   18 |   uintptr_t g(this FromS from_s) {
  |  ^

But now that we are entering this code block, (when compiling
explicit-obj-by-value2.C) these errors are popping up. Why now? Why
isn't this being handled in the same place other things are? How
important is this block of code really? Is this the origin of the weird
errors where rvalue refs are being accepted for functions that take
const rvalue refs?

Is this code just setting up the 'this' pointer? I have so many guesses
and so many questions. I don't think I can just bypass it though, but
maybe I can? This is another one that feels really deep down the rabbit
hole so I would appreciate any insight that can be provided.

Anyway, I had thought that I probably just need to change how
build_over_call works to get passing to by-value xobj params to work.
But now that I've found this code and gotten these new errors to
surprise me, now I'm a little worried.

If it's just for the 'this' pointer then it's probably fine. I need to
sleep now, I hadn't planned on looking at this for as long as I did but
I got sucked in. I think tomorrow I will go back to bypassing this code
block and try making changes to build_over_call to see if that
does the trick. But things feel all over the place now so I'm a little
concerned about what else I might be neglecting.

Thanks,
Alex



[PATCH] test: Recover sdiv_pow2 check and remove test of RISC-V

2023-11-07 Thread Juzhe-Zhong
gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-sdiv-pow2-1.c: Recover scan check.
* lib/target-supports.exp: Remove riscv.

---
 gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c | 2 +-
 gcc/testsuite/lib/target-supports.exp| 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
index 8056c2a6748..49ecbe216f2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
@@ -79,5 +79,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" "vect" } 
} */
+/* { dg-final { scan-tree-dump {\.DIV_POW2} "vect" { target vect_sdiv_pow2_si 
} } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 18 "vect" { target 
vect_sdiv_pow2_si } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 0317fc102ef..8f6cdf16661 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8077,9 +8077,7 @@ proc check_effective_target_vect_mulhrs_hi {} {
 
 proc check_effective_target_vect_sdiv_pow2_si {} {
 return [expr { ([istarget aarch64*-*-*]
-   && [check_effective_target_aarch64_sve])
-  || ([istarget riscv*-*-*]
-  && [check_effective_target_riscv_v]) }]
+   && [check_effective_target_aarch64_sve]) }]
 }
 
 # Return 1 if the target plus current options supports a vector
-- 
2.36.3



Re: [PATCH] Do not prepend target triple to -fuse-ld=lld,mold.

2023-11-07 Thread Richard Biener
On Tue, 7 Nov 2023, Tatsuyuki Ishi wrote:

> > On Oct 16, 2023, at 18:16, Richard Biener  wrote:
> > 
> > On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> > 
> >> 
> >> 
> >>> On Oct 16, 2023, at 17:55, Richard Biener  wrote:
> >>> 
> >>> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> >>> 
>  
>  
> > On Oct 16, 2023, at 17:39, Richard Biener  wrote:
> > 
> > On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> > 
> >> lld and mold are platform-agnostic and not prefixed with target triple.
> >> Prepending the target triple makes it less likely to find the intended
> >> linker executable.
> >> 
> >> A potential breaking change is that we no longer try to search for
> >> triple-prefixed lld/mold binaries anymore. However, since there doesn't
> >> seem to be support to build LLVM or mold with triple-prefixed 
> >> executable
> >> names, it seems better to just not bother with that case.
> >> 
> >>PR driver/111605
> >> 
> >> gcc/Changelog:
> >> 
> >>* collect2.cc (main): Do not prepend target triple to
> >>-fuse-ld=lld,mold.
> >> ---
> >> gcc/collect2.cc | 13 -
> >> 1 file changed, 8 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/gcc/collect2.cc b/gcc/collect2.cc
> >> index 63b9a0c233a..c943f9f577c 100644
> >> --- a/gcc/collect2.cc
> >> +++ b/gcc/collect2.cc
> >> @@ -865,12 +865,15 @@ main (int argc, char **argv)
> >> int i;
> >> 
> >> for (i = 0; i < USE_LD_MAX; i++)
> >> -full_ld_suffixes[i]
> >> #ifdef CROSS_DIRECTORY_STRUCTURE
> >> -  = concat (target_machine, "-", ld_suffixes[i], NULL);
> >> -#else
> >> -  = ld_suffixes[i];
> >> -#endif
> >> +/* lld and mold are platform-agnostic and not prefixed with target
> >> +   triple.  */
> >> +if (!(i == USE_LLD_LD || i == USE_MOLD_LD))
> >> +  full_ld_suffixes[i] = concat (target_machine, "-", 
> >> ld_suffixes[i],
> >> +  NULL);
> >> +else
> >> +#endif
> >> +  full_ld_suffixes[i] = ld_suffixes[i];
> >> 
> >> p = argv[0] + strlen (argv[0]);
> >> while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))
> > 
> > Since we later do
> > 
> > /* Search the compiler directories for `ld'.  We have protection against
> >   recursive calls in find_a_file.  */
> > if (ld_file_name == 0)
> > >  ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], X_OK);
> > /* Search the ordinary system bin directories
> >   for `ld' (if native linking) or `TARGET-ld' (if cross).  */
> > if (ld_file_name == 0)
> > >  ld_file_name = find_a_file (&path, full_ld_suffixes[selected_linker], X_OK);
> > 
> > I wonder how having full_ld_suffixes[LLD|MOLD] == ld_suffixes[LLD|MOLD]
> > fixes anything?
>  
>  Per the linked PR, the intended use case for this is when one wants to 
>  use their system lld/mold with a separately packaged cross toolchain, 
>  without requiring them to symlink their system lld/mold into the cross 
>  toolchain bin directory.
>  
>  (Note that the first search is against COMPILER_PATH while the latter is 
>  against PATH).
> >>> 
> >>> Ah.  So what about instead adding here
> >>> 
> >>>  /* Search the ordinary system bin directories for mold/lld even in
> >>> a cross configuration.  */
> >>>  if (ld_file_name == 0
> >>>      && selected_linker == ...)
> >>>    ld_file_name = find_a_file (&path, ld_suffixes[selected_linker], X_OK);
> >>> 
> >>> instead?  That would keep things working in case the user has a
> >>> xyz-arch-mold in the system dir but uses GNU ld on the host
> >>> otherwise, lacking a 'mold' binary there?
> >>> 
> >>> That is, we'd only add, not change what we search for.
> >> 
> >> I considered that, but as described in commit message, it doesn't seem 
> >> anyone has created stuff named xyz-arch-lld or xyz-arch-mold. Closest is 
> >> Gentoo's symlink mentioned in this thread, but that's xyz-arch-ld -> 
> >> ld.lld/mold.
> >> As such, this feels like a quirk, not something we need to keep 
> >> compatibility for.
> > 
> > I don't have a good idea whether this is the case or not unfortunately
> > so if it's my call I would err on the safe side.
> > 
> > We seem to recognize mold and lld only since GCC 12 which both are
> > still maintained so I think we might want to do the change on all
> > those branches?
> > 
> > If you feel confident there's indeed no such installs then let's go
> > with your original patch.
> > 
> > Thus, OK for trunk and the affected branches after a while of no
> > reported issues.
> 
> Hi,
> 
> Can I consider this an approval for this patch to be applied to trunk?

Yes.
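
For anyone skimming the thread, the naming rule in the approved patch can be restated as a small standalone sketch.  This is not the collect2.cc code: linker_kind and full_ld_name are hypothetical names, and the CROSS flag stands in for the CROSS_DIRECTORY_STRUCTURE build-time macro:

```cpp
#include <string>

// Hypothetical selector, loosely modelled on the USE_*_LD values
// mentioned in the patch.
enum class linker_kind { DEFAULT_LD, GOLD_LD, BFD_LD, LLD_LD, MOLD_LD };

// Return the executable name searched for in the ordinary system bin
// directories.  In a cross configuration the target triple is prepended,
// except for lld and mold, which are platform-agnostic and are not
// normally installed under a triple-prefixed name.
std::string
full_ld_name (const std::string &target_machine, const std::string &suffix,
              linker_kind kind, bool cross)
{
  const bool platform_agnostic
    = kind == linker_kind::LLD_LD || kind == linker_kind::MOLD_LD;
  if (cross && !platform_agnostic)
    return target_machine + "-" + suffix;
  return suffix;
}
```

Under this rule a cross toolchain asked for -fuse-ld=mold looks for a plain "ld.mold" on PATH, while -fuse-ld=bfd keeps looking for the triple-prefixed name.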

> I would appreciate if this patch could be tested in GCC 14 prereleases.
> 
> I suppose backporting after no reported issues in GCC 14 would be the plan 
> here?
> 
> Please 

Re: Re: [PATCH] test: Fix FAIL of vect-sdiv-pow2-1.c for RVV test: Fix FAIL of vect-sdiv-pow2-1.c for RVV#

2023-11-07 Thread Richard Biener
On Tue, 7 Nov 2023, 钟居哲 wrote:

> Hi, Richi.
> 
> We don't have an explicit SDIV_POW2 pattern but we still want to test it to
> make sure we can vectorize the SDIV_POW2 pattern which will be recognized.
> 
> Maybe we should add another target check ?

No, you should simply _not_ add riscv*-*-* to 
check_effective_target_vect_sdiv_pow2_si when you don't implement that
pattern!

The test is specifically for this very pattern, not a test whether
we can vectorize this at all.

> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-07 21:45
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw
> Subject: Re: [PATCH] test: Fix FAIL of vect-sdiv-pow2-1.c for RVV test: Fix 
> FAIL of vect-sdiv-pow2-1.c for RVV#
> On Tue, 7 Nov 2023, Juzhe-Zhong wrote:
>  
> > RVV didn't explicitly enable the DIV_POW2 optab but we can vectorize it.
> > We should check pattern recognition instead of the explicit pattern check.
>  
> But I see
>  
> proc check_effective_target_vect_sdiv_pow2_si {} {
> return [expr { ([istarget aarch64*-*-*]
> && [check_effective_target_aarch64_sve])
>|| ([istarget riscv*-*-*]
>&& [check_effective_target_riscv_v]) }]
>  
> so if you don't have sdiv_pow2_si then please don't advertise it.
>  
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.dg/vect/vect-sdiv-pow2-1.c: Fix dump check.
> > 
> > ---
> >  gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c 
> > b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> > index 49ecbe216f2..8056c2a6748 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> > @@ -79,5 +79,5 @@ main (void)
> >return 0;
> >  }
> >  
> > -/* { dg-final { scan-tree-dump {\.DIV_POW2} "vect" { target 
> > vect_sdiv_pow2_si } } } */
> > +/* { dg-final { scan-tree-dump "vect_recog_divmod_pattern: detected" 
> > "vect" } } */
> >  /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 18 "vect" { 
> > target vect_sdiv_pow2_si } } } */
> > 
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [V2 PATCH] Handle bitop with INTEGER_CST in analyze_and_compute_bitop_with_inv_effect.

2023-11-07 Thread Richard Biener
On Tue, Nov 7, 2023 at 2:03 PM Hongtao Liu  wrote:
>
> On Tue, Nov 7, 2023 at 4:10 PM Richard Biener
>  wrote:
> >
> > On Tue, Nov 7, 2023 at 7:08 AM liuhongt  wrote:
> > >
> > > analyze_and_compute_bitop_with_inv_effect assumes the first operand is
> > > loop invariant which is not the case when it's INTEGER_CST.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> >
> > So this addresses a missed optimization, right?  It seems to me that
> > even with two SSA names we are only "lucky" when rhs1 is the invariant
> > one.  So instead of swapping this way I'd do
> Yes, it's a missed optimization.
> And I think expr_invariant_in_loop_p (loop, match_op[1]) should be
> enough: if match_op[1] is a loop invariant, it must be false for the
> below conditions (there couldn't be any header_phi from its
> definition).

Yes, all I said is that when you now care for op1 being INTEGER_CST
it could also be an invariant SSA name and thus only after swapping op0/op1
we could have a successful match, no?
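
As background for the testcase quoted below, here is a rough sketch of the arithmetic behind this kind of final value replacement, written as standalone C++ rather than in terms of the tree-scalar-evolution.cc internals.  The names bitop, bitop_loop and bitop_final_value are invented for the sketch, and it only models the case where one operand is loop invariant:

```cpp
#include <cstdint>

enum class bitop { AND, OR, XOR };

// Reference behaviour: apply "value OP= inv" NITER times, as the loops in
// the testcase do with the constant 11304.
uint32_t
bitop_loop (bitop op, uint32_t init, uint32_t inv, uint64_t niter)
{
  uint32_t value = init;
  for (uint64_t i = 0; i < niter; ++i)
    switch (op)
      {
      case bitop::AND: value &= inv; break;
      case bitop::OR:  value |= inv; break;
      case bitop::XOR: value ^= inv; break;
      }
  return value;
}

// Closed form of the same computation.  AND and OR are idempotent, so one
// application is as good as many; XOR cancels itself, so only the parity
// of the iteration count matters.
uint32_t
bitop_final_value (bitop op, uint32_t init, uint32_t inv, uint64_t niter)
{
  if (niter == 0)
    return init;
  switch (op)
    {
    case bitop::AND: return init & inv;
    case bitop::OR:  return init | inv;
    case bitop::XOR: return (niter & 1) ? (init ^ inv) : init;
    }
  return init;
}
```

The closed form does not depend on which side of the operator the invariant sits, which is why matching either operand order is reasonable.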

> >
> >  unsigned i;
> >  for (i = 0; i < 2; ++i)
> >if (TREE_CODE (match_op[i]) == SSA_NAME
> >&& ...)
> > break; /* found! */
> >
> >   if (i == 2)
> > return NULL_TREE;
> >   if (i == 0)
> > std::swap (match_op[0], match_op[1]);
> >
> > to also handle a "swapped" pair of SSA names?
> >
> > > gcc/ChangeLog:
> > >
> > > PR tree-optimization/105735
> > > PR tree-optimization/111972
> > > * tree-scalar-evolution.cc
> > > (analyze_and_compute_bitop_with_inv_effect): Handle bitop with
> > > INTEGER_CST.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr105735-3.c: New test.
> > > ---
> > >  gcc/testsuite/gcc.target/i386/pr105735-3.c | 87 ++
> > >  gcc/tree-scalar-evolution.cc   |  3 +
> > >  2 files changed, 90 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr105735-3.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr105735-3.c 
> > > b/gcc/testsuite/gcc.target/i386/pr105735-3.c
> > > new file mode 100644
> > > index 000..9e268a1a997
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr105735-3.c
> > > @@ -0,0 +1,87 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O1 -fdump-tree-sccp-details" } */
> > > +/* { dg-final { scan-tree-dump-times {final value replacement} 8 "sccp" 
> > > } } */
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +foo (unsigned int tmp)
> > > +{
> > > +  for (int bit = 0; bit < 64; bit++)
> > > +tmp &= 11304;
> > > +  return tmp;
> > > +}
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +foo1 (unsigned int tmp)
> > > +{
> > > +  for (int bit = 63; bit >= 0; bit -=3)
> > > +tmp &= 11304;
> > > +  return tmp;
> > > +}
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +foo2 (unsigned int tmp)
> > > +{
> > > +  for (int bit = 0; bit < 64; bit++)
> > > +tmp |= 11304;
> > > +  return tmp;
> > > +}
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +foo3 (unsigned int tmp)
> > > +{
> > > +  for (int bit = 63; bit >= 0; bit -=3)
> > > +tmp |= 11304;
> > > +  return tmp;
> > > +}
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +foo4 (unsigned int tmp)
> > > +{
> > > +  for (int bit = 0; bit < 64; bit++)
> > > +tmp ^= 11304;
> > > +  return tmp;
> > > +}
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +foo5 (unsigned int tmp)
> > > +{
> > > +  for (int bit = 0; bit < 63; bit++)
> > > +tmp ^= 11304;
> > > +  return tmp;
> > > +}
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +f (unsigned int tmp, int bit)
> > > +{
> > > +  unsigned int res = tmp;
> > > +  for (int i = 0; i < bit; i++)
> > > +res &= 11304;
> > > +  return res;
> > > +}
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +f1 (unsigned int tmp, int bit)
> > > +{
> > > +  unsigned int res = tmp;
> > > +  for (int i = 0; i < bit; i++)
> > > +res |= 11304;
> > > +  return res;
> > > +}
> > > +
> > > +unsigned int
> > > +__attribute__((noipa))
> > > +f2 (unsigned int tmp, int bit)
> > > +{
> > > +  unsigned int res = tmp;
> > > +  for (int i = 0; i < bit; i++)
> > > +res ^= 11304;
> > > +  return res;
> > > +}
> > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > > index 70b17c5bca1..f61277c32df 100644
> > > --- a/gcc/tree-scalar-evolution.cc
> > > +++ b/gcc/tree-scalar-evolution.cc
> > > @@ -3689,6 +3689,9 @@ analyze_and_compute_bitop_with_inv_effect (class 
> > > loop* loop, tree phidef,
> > >match_op[0] = gimple_assign_rhs1 (def);
> > >match_op[1] = gimple_assign_rhs2 (def);
> > >
> > > +  if (expr_invariant_in_loop_p (loop, match_op[1]))
> > > +std::swap (match_op[0], match_op[1]);
> > > +
> > >if (TREE_CODE (match_op[1]) != SSA_NAME
> > >|| !expr_invariant_in_loop_p (loop, match_op[0])
> > >|| !(header_phi = dyn_cast  

Re: [PATCH v1] RISC-V: Support FP floor to i/l/ll diff size autovec

2023-11-07 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-11-07 22:30
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support FP floor to i/l/ll diff size autovec
From: Pan Li 
 
This patch supports auto-vectorization of the FP floor APIs below when
the source and destination types differ in size.
 
+----------+-----------+----------+
| API      | RV64      | RV32     |
+----------+-----------+----------+
| ifloor   | DF => SI  | DF => SI |
| ifloorf  | -         | -        |
| lfloor   | -         | DF => SI |
| lfloorf  | SF => DI  | -        |
| llfloor  | -         | -        |
| llfloorf | SF => DI  | SF => DI |
+----------+-----------+----------+
 
Given below code:
void
test_lfloorf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lfloorf (in[i]);
}
 
Before this patch:
.L3:
  flw  fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,8
  call floorf
  fcvt.l.s a5,fa0,rtz
  sd   a5,-8(s1)
  bne  s2,s0,.L3
 
After this patch:
  fsrmi 2  // RDN mode
.L3:
  vsetvli  a5,a2,e32,mf2,ta,ma
  vle32.v  v2,0(a1)
  slli a3,a5,2
  slli a4,a5,3
  vfwcvt.x.f.v v1,v2
  sub  a2,a2,a5
  vse64.v  v1,0(a0)
  add  a1,a1,a3
  add  a0,a0,a4
  bne  a2,zero,.L3
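
For reference, the scalar sequence above calls floorf and then converts with round-toward-zero, while the vector sequence sets the rounding mode to RDN and does a single widening convert.  Both should give the same per-element result, which can be sketched in plain C++ as below.  The helper name floor_to_long is made up for this sketch, and it assumes the input is finite and within the range of long:

```cpp
#include <cmath>

// Convert X to long, rounding toward negative infinity, without changing
// the floating-point environment.  The cast truncates toward zero, so for
// negative non-integers it lands one above the floor and is adjusted.
// Assumes X is finite and representable in long after rounding.
long
floor_to_long (float x)
{
  long truncated = static_cast<long> (x);
  if (static_cast<double> (truncated) > static_cast<double> (x))
    --truncated;
  return truncated;
}
```

For example, -1.5f maps to -2 and 2.7f maps to 2, the same as converting the result of std::floor.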
 
Unfortunately, the HF mode is not included because it requires
additional middle-end support in internal-fn.def.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md: Remove the size check of lfloor.
* config/riscv/riscv-v.cc (expand_vec_lfloor): Leverage
emit_vec_rounding_to_integer for floor.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-ifloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-rv32-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloorf-rv64-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloorf-rv64-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llfloorf-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llfloorf-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-ifloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloorf-rv64-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llfloorf-0.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   |  6 +-
gcc/config/riscv/riscv-v.cc   |  8 +-
.../riscv/rvv/autovec/unop/math-ifloor-1.c| 18 
.../rvv/autovec/unop/math-ifloor-run-1.c  | 83 ++
.../rvv/autovec/unop/math-lfloor-rv32-0.c | 18 
.../rvv/autovec/unop/math-lfloor-rv32-run-0.c | 83 ++
.../rvv/autovec/unop/math-lfloorf-rv64-0.c| 18 
.../autovec/unop/math-lfloorf-rv64-run-0.c| 84 +++
.../riscv/rvv/autovec/unop/math-llfloorf-0.c  | 19 +
.../rvv/autovec/unop/math-llfloorf-run-0.c| 84 +++
.../riscv/rvv/autovec/vls/math-ifloor-1.c | 27 ++
.../rvv/autovec/vls/math-lfloor-rv32-0.c  | 27 ++
.../rvv/autovec/vls/math-lfloorf-rv64-0.c | 27 ++
.../riscv/rvv/autovec/vls/math-llfloorf-0.c   | 27 ++
14 files changed, 520 insertions(+), 9 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-rv32-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-rv32-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloorf-rv64-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloorf-rv64-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloorf-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloorf-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ifloor-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloor-rv32-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloorf-rv64-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llfloorf-0.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b59bb880a45..973dc4ac235 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2486,8 +2486,7 @@ (define_expand "lceil2"
(define_expand "lfloor2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
-&& known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE 
(mode))"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
   {
 

Re: [PATCH] binutils: experimental use of libdiagnostics in gas

2023-11-07 Thread David Malcolm
On Tue, 2023-11-07 at 11:03 +0100, Jan Beulich wrote:
> On 06.11.2023 23:29, David Malcolm wrote:
> > Here's a patch for gas in binutils that makes it use libdiagnostics
> > (with some nasty hardcoded paths to specific places on my hard drive
> > to make it easier to develop the API).
> > 
> > For now this hardcodes adding two sinks: a text sink on stderr, and
> > also a SARIF output to stderr (which happens after all regular
> > output).
> > 
> > For example, without this patch:
> > 
> >    gas testsuite/gas/all/warn-1.s
> > 
> > emits:
> > VVV
> > V
> > testsuite/gas/all/warn-1.s: Assembler messages:
> > testsuite/gas/all/warn-1.s:3: Warning: a warning message
> > testsuite/gas/all/warn-1.s:4: Error: .warning argument must be a
> > string
> > testsuite/gas/all/warn-1.s:5: Warning: .warning directive invoked
> > in source file
> > testsuite/gas/all/warn-1.s:6: Warning: .warning directive invoked
> > in source file
> > testsuite/gas/all/warn-1.s:7: Warning:
> > ^^^
> > ^
> > 
> > whereas with this patch:
> >   LD_LIBRARY_PATH=/home/david/coding-3/gcc-newgit-canvas-
> > 2023/build/gcc ./as-new testsuite/gas/all/warn-1.s
> > emits:
> > 
> > VVV
> > V
> > testsuite/gas/all/warn-1.s:3: warning: a warning message
> >     3 |  .warning "a warning message"   ;# { dg-warning "Warning: a
> > warning message" }
> >   |
> > testsuite/gas/all/warn-1.s:4: error: .warning argument must be a
> > string
> >     4 |  .warning a warning message ;# { dg-error "Error:
> > .warning argument must be a string" }
> >   |
> > testsuite/gas/all/warn-1.s:5: warning: .warning directive invoked
> > in source file
> >     5 |  .warning   ;# { dg-warning "Warning:
> > .warning directive invoked in source file" }
> >   |
> > testsuite/gas/all/warn-1.s:6: warning: .warning directive invoked
> > in source file
> >     6 |  .warning ".warning directive invoked in source file"   ;#
> > { dg-warning "Warning: .warning directive invoked in source file" }
> >   |
> > testsuite/gas/all/warn-1.s:7: warning:
> >     7 |  .warning ""    ;# { dg-warning "Warning: "
> > }
> >   |

[...snip...]

> > which I see:
> > - drops the leading "Assembler messages" warning,
> > - changes the capitalization of the "Warning" -> "warning" etc
> > - quotes the pertinent line in the .s file
> > 
> > All of the locations are just lines; does gas do column numbers at
> > all?
> > (or ranges?)
> 
> It currently doesn't, which is primarily related to the scrubbing
> done
> before lines are actually processed.

How complicated/desirable would it be to track locations in .s files at
the column level?  I confess I didn't look at the parsing code at all.

> 
> I take it that the lack of column information is why there are lines
> of
> this form
> 
>   |
> 
> in the example output above. 

Yes: those lines are for annotation information such as underlining
specific columns.

> Them uniformly not carrying any information
> would make it desirable for them to be suppressed.

In GCC we typically have column information, so I'd never noticed this
behavior, but it ought to be fixable, to simply not display these if
there's no column info.
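
To make the suggestion above concrete, here is a minimal sketch in C of a printer that emits the "|" annotation line only when a column is known. It is not libdiagnostics or gas code; struct loc, format_diag and the "column == 0 means unknown" convention are hypothetical names and assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical location record; column == 0 is assumed to mean
   "no column information available".  */
struct loc
{
  const char *file;
  unsigned line;
  unsigned column;
};

/* Format one diagnostic into BUF and return the number of text lines
   written.  The header and the quoted source line are always written;
   the annotation line with a caret is written only when a column is
   known.  */
static int
format_diag (char *buf, size_t size, struct loc where, const char *kind,
	     const char *msg, const char *src)
{
  assert (buf != NULL && size > 0);

  int lines = 2;
  int n = snprintf (buf, size, "%s:%u: %s: %s\n%5u | %s\n",
		    where.file, where.line, kind, msg, where.line, src);

  if (where.column > 0 && n > 0 && (size_t) n < size)
    {
      /* Right-justify the caret so it lands under the given column.  */
      snprintf (buf + n, size - n, "      | %*s\n", (int) where.column, "^");
      lines++;
    }
  return lines;
}
```

With a location whose column is 0 the function writes two lines and no "|" annotation line; with a nonzero column it adds a third line carrying the caret.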

> 
> > @@ -172,16 +203,34 @@ as_tsktsk (const char *format, ...)
> >  static void
> >  as_warn_internal (const char *file, unsigned int line, char
> > *buffer)
> >  {
> > +#if !USE_LIBDIAGNOSTICS
> >    bool context = false;
> > +#endif
> >  
> >    ++warning_count;
> >  
> >    if (file == NULL)
> >  {
> >    file = as_where_top ();
> > +#if !USE_LIBDIAGNOSTICS
> >    context = true;
> > +#endif
> 
> I can't spot how this context information would be replaced. It works
> for macros only right now, but the hope is to eventually extend it
> also to .include files.

I confess I hacked this up, and I didn't check what this code does.
I see now that it calls as_report_context, which iterates over macro
expansions calling as_info_where with successively larger "indent"
values.

I could extend the patch to cover that.

More ambitiously, GCC's location tracking supports recording macro
expansions and include files, and the diagnostics subsystem has a way
of printing this information.  So potentially libdiagnostics could
provide API support for this - but I haven't yet looked at the
feasibility.

> 
> > @@ -199,6 +248,7 @@ as_warn_internal (const char *file, unsigned
> > int line, char *buffer)
> >  #ifndef NO_LISTING
> >    listing_warning (buffer);
> >  #endif
> > +#endif /* #else clause of #if USE_LIBDIAGNOSTICS */
> 
> This listing integration of course needs to remain irrespective of
> which way of emitting diagnostics is used.

Likewise; I think I just put the #endif in the wrong place above.

Thanks for the feedback; hope this is constructive.
Dave



[PATCH v1] RISC-V: Support FP floor to i/l/ll diff size autovec

2023-11-07 Thread pan2 . li
From: Pan Li 

This patch adds auto-vectorization support for the FP floor APIs below
when the source and destination element sizes differ.

+----------+-----------+----------+
| API      | RV64      | RV32     |
+----------+-----------+----------+
| ifloor   | DF => SI  | DF => SI |
| ifloorf  | -         | -        |
| lfloor   | -         | DF => SI |
| lfloorf  | SF => DI  | -        |
| llfloor  | -         | -        |
| llfloorf | SF => DI  | SF => DI |
+----------+-----------+----------+

Given below code:
void
test_lfloorf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lfloorf (in[i]);
}

Before this patch:
.L3:
  flw  fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,8
  call floorf
  fcvt.l.s a5,fa0,rtz
  sd   a5,-8(s1)
  bne  s2,s0,.L3

After this patch:
  fsrmi 2  // RDN mode
.L3:
  vsetvli  a5,a2,e32,mf2,ta,ma
  vle32.v  v2,0(a1)
  slli a3,a5,2
  slli a4,a5,3
  vfwcvt.x.f.v v1,v2
  sub  a2,a2,a5
  vse64.v  v1,0(a0)
  add  a1,a1,a3
  add  a0,a0,a4
  bne  a2,zero,.L3

Unfortunately, HF mode is not included because it requires
additional middle-end support in internal-fn.def.
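
As a rough scalar illustration of why the rounding mode matters: the scalar code above converts with rtz after calling floorf, while the vector code sets RDN before the widening convert. A plain C cast truncates toward zero, whereas lfloorf rounds toward negative infinity. The sketch below is only a reference model under the assumption that the inputs fit in a long; convert_rtz, convert_rdn and ref_lfloorf are hypothetical helper names, not part of the patch.

```c
#include <assert.h>

/* Convert with truncation toward zero, which is what a plain C cast
   does.  */
static long
convert_rtz (float x)
{
  return (long) x;
}

/* Convert rounding toward negative infinity, i.e. the value lfloorf
   yields for inputs that fit in a long.  Written without libm: truncate
   first, then step down by one if truncation rounded up.  */
static long
convert_rdn (float x)
{
  long t = (long) x;
  if ((float) t > x)
    t--;
  return t;
}

/* Scalar reference for the SF => DI loop in the commit message.  */
static void
ref_lfloorf (long *out, const float *in, unsigned count)
{
  assert (out != 0 && in != 0);
  for (unsigned i = 0; i < count; i++)
    out[i] = convert_rdn (in[i]);
}
```

The two conversions agree for non-negative inputs and for exact integers; they differ only for negative inputs with a fractional part, for example -1.5f.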

gcc/ChangeLog:

* config/riscv/autovec.md: Remove the size check of lfloor.
* config/riscv/riscv-v.cc (expand_vec_lfloor): Leverage
emit_vec_rounding_to_integer for floor.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ifloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-rv32-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloorf-rv64-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloorf-rv64-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llfloorf-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-llfloorf-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-ifloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-rv32-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloorf-rv64-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-llfloorf-0.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   |  6 +-
 gcc/config/riscv/riscv-v.cc   |  8 +-
 .../riscv/rvv/autovec/unop/math-ifloor-1.c| 18 
 .../rvv/autovec/unop/math-ifloor-run-1.c  | 83 ++
 .../rvv/autovec/unop/math-lfloor-rv32-0.c | 18 
 .../rvv/autovec/unop/math-lfloor-rv32-run-0.c | 83 ++
 .../rvv/autovec/unop/math-lfloorf-rv64-0.c| 18 
 .../autovec/unop/math-lfloorf-rv64-run-0.c| 84 +++
 .../riscv/rvv/autovec/unop/math-llfloorf-0.c  | 19 +
 .../rvv/autovec/unop/math-llfloorf-run-0.c| 84 +++
 .../riscv/rvv/autovec/vls/math-ifloor-1.c | 27 ++
 .../rvv/autovec/vls/math-lfloor-rv32-0.c  | 27 ++
 .../rvv/autovec/vls/math-lfloorf-rv64-0.c | 27 ++
 .../riscv/rvv/autovec/vls/math-llfloorf-0.c   | 27 ++
 14 files changed, 520 insertions(+), 9 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ifloor-run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-rv32-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloor-rv32-run-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloorf-rv64-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-lfloorf-rv64-run-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloorf-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-llfloorf-run-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-ifloor-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloor-rv32-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-lfloorf-rv64-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-llfloorf-0.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b59bb880a45..973dc4ac235 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2486,8 +2486,7 @@ (define_expand "lceil2"
 (define_expand "lfloor2"
   [(match_operand:   0 "register_operand")
(match_operand:V_VLS_F_CONVERT_SI 1 "register_operand")]
-  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math
-&& known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE 
(mode))"
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
   {
 riscv_vector::expand_vec_lfloor (operands[0], operands[1], mode, 
mode);
 DONE;
@@ 

Re: [PATCH] Do not prepend target triple to -fuse-ld=lld,mold.

2023-11-07 Thread Tatsuyuki Ishi
> On Oct 16, 2023, at 18:16, Richard Biener  wrote:
> 
> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> 
>> 
>> 
>>> On Oct 16, 2023, at 17:55, Richard Biener  wrote:
>>> 
>>> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
>>> 
 
 
>>>>> On Oct 16, 2023, at 17:39, Richard Biener  wrote:
>>>>> 
>>>>> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
>>>>> 
>>>>>> lld and mold are platform-agnostic and not prefixed with target triple.
>>>>>> Prepending the target triple makes it less likely to find the intended
>>>>>> linker executable.
>>>>>> 
>>>>>> A potential breaking change is that we no longer try to search for
>>>>>> triple-prefixed lld/mold binaries anymore. However, since there doesn't
>>>>>> seem to be support to build LLVM or mold with triple-prefixed executable
>>>>>> names, it seems better to just not bother with that case.
>>>>>> 
>>>>>>  PR driver/111605
>>>>>> 
>>>>>> gcc/Changelog:
>>>>>> 
>>>>>>  * collect2.cc (main): Do not prepend target triple to
>>>>>>  -fuse-ld=lld,mold.
>>>>>> ---
>>>>>> gcc/collect2.cc | 13 -
>>>>>> 1 file changed, 8 insertions(+), 5 deletions(-)
>>>>>> 
>>>>>> diff --git a/gcc/collect2.cc b/gcc/collect2.cc
>>>>>> index 63b9a0c233a..c943f9f577c 100644
>>>>>> --- a/gcc/collect2.cc
>>>>>> +++ b/gcc/collect2.cc
>>>>>> @@ -865,12 +865,15 @@ main (int argc, char **argv)
>>>>>> int i;
>>>>>> 
>>>>>> for (i = 0; i < USE_LD_MAX; i++)
>>>>>> -full_ld_suffixes[i]
>>>>>> #ifdef CROSS_DIRECTORY_STRUCTURE
>>>>>> -  = concat (target_machine, "-", ld_suffixes[i], NULL);
>>>>>> -#else
>>>>>> -  = ld_suffixes[i];
>>>>>> -#endif
>>>>>> +/* lld and mold are platform-agnostic and not prefixed with target
>>>>>> +   triple.  */
>>>>>> +if (!(i == USE_LLD_LD || i == USE_MOLD_LD))
>>>>>> +  full_ld_suffixes[i] = concat (target_machine, "-", ld_suffixes[i],
>>>>>> +NULL);
>>>>>> +else
>>>>>> +#endif
>>>>>> +  full_ld_suffixes[i] = ld_suffixes[i];
>>>>>> 
>>>>>> p = argv[0] + strlen (argv[0]);
>>>>>> while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))
>>>>> 
>>>>> Since we later do
>>>>> 
>>>>> /* Search the compiler directories for `ld'.  We have protection against
>>>>>   recursive calls in find_a_file.  */
>>>>> if (ld_file_name == 0)
>>>>>  ld_file_name = find_a_file (, ld_suffixes[selected_linker], 
>>>>> X_OK);
>>>>> /* Search the ordinary system bin directories
>>>>>   for `ld' (if native linking) or `TARGET-ld' (if cross).  */
>>>>> if (ld_file_name == 0)
>>>>>  ld_file_name = find_a_file (, full_ld_suffixes[selected_linker], 
>>>>> X_OK);
>>>>> 
>>>>> I wonder how having full_ld_suffixes[LLD|MOLD] == ld_suffixes[LLD|MOLD]
>>>>> fixes anything?
>>>> 
>>>> Per the linked PR, the intended use case for this is when one wants to use 
>>>> their system lld/mold with a separately packaged cross toolchain, without 
>>>> requiring them to symlink their system lld/mold into the cross toolchain 
>>>> bin directory.
>>>> 
>>>> (Note that the first search is against COMPILER_PATH while the latter is 
>>>> against PATH).
>>> 
>>> Ah.  So what about instead adding here
>>> 
>>>  /* Search the ordinary system bin directories for mold/lld even in
>>> a cross configuration.  */
>>>  if (ld_file_name == 0
>>>  && selected_linker == ...)
>>>ld_file_name = find_a_file (, ld_suffixes[selected_linker], X_OK);
>>> 
>>> instead?  That would keep things working in case the user has a
>>> xyz-arch-mold in the system dir but uses GNU ld on the host
>>> otherwise, lacking a 'mold' binary there?
>>> 
>>> That is, we'd only add, not change what we search for.
>> 
>> I considered that, but as described in commit message, it doesn't seem 
>> anyone has created stuff named xyz-arch-lld or xyz-arch-mold. Closest is 
>> Gentoo's symlink mentioned in this thread, but that's xyz-arch-ld -> 
>> ld.lld/mold.
>> As such, this feels like a quirk, not something we need to keep 
>> compatibility for.
> 
> I don't have a good idea whether this is the case or not unfortunately
> so if it's my call I would err on the safe side.
> 
> We seem to recognize mold and lld only since GCC 12 which both are
> still maintained so I think we might want to do the change on all
> those branches?
> 
> If you feel confident there's indeed no such installs then let's go
> with your original patch.
> 
> Thus, OK for trunk and the affected branches after a while of no
> reported issues.

Hi,

Can I consider this an approval for this patch to be applied to trunk?
I would appreciate if this patch could be tested in GCC 14 prereleases.

I suppose backporting after no reported issues in GCC 14 would be the plan here?

Please let me know in case of misunderstandings.

Thanks,
Tatsuyuki.

> Thanks,
> Richard.
> 
>> The proposed change seems simple enough though, so if you consider this 
>> a compatibility issue I can go for that way as well.
> 
>> Tatsuyuki.
>> 
>>> 
>>> Thanks,
>>> Richard.
>> 
>> 
> 
> -- 
> Richard Biener <rguent...@suse.de>
> SUSE 
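
For readers following the thread, the naming rule proposed by the original patch can be sketched in isolation as below. This is not the collect2.cc code: the enum, the suffix table and full_linker_name are hypothetical stand-ins, and a NULL triple is assumed to model a native (non-cross) configuration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical linker selector, loosely modelled on -fuse-ld= values.  */
enum linker { LD_PLAIN, LD_BFD, LD_GOLD, LD_LLD, LD_MOLD, LD_MAX };

static const char *const suffixes[LD_MAX]
  = { "ld", "ld.bfd", "ld.gold", "ld.lld", "ld.mold" };

/* Write into BUF the name to look up on PATH.  In a cross configuration
   (TRIPLE != NULL) the triple is prepended, except for lld and mold,
   which are platform-agnostic and are searched under their plain
   names.  */
static void
full_linker_name (char *buf, size_t size, const char *triple,
		  enum linker which)
{
  assert (buf != NULL && size > 0 && which < LD_MAX);

  if (triple != NULL && which != LD_LLD && which != LD_MOLD)
    snprintf (buf, size, "%s-%s", triple, suffixes[which]);
  else
    snprintf (buf, size, "%s", suffixes[which]);
}
```

Richard's alternative would instead keep the prefixed name and add one more lookup with the plain name for lld/mold; the sketch only models the variant from the original patch.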

RE: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks

2023-11-07 Thread Richard Biener
On Tue, 7 Nov 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, November 7, 2023 10:53 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: Re: [PATCH 3/21]middle-end: Implement code motion and
> > dependency analysis for early breaks
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > When performing early break vectorization we need to be sure that the
> > > vector operations are safe to perform.  A simple example is e.g.
> > >
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >vect_b[i] = x + i;
> > >if (vect_a[i]*2 != x)
> > >  break;
> > >vect_a[i] = x;
> > >  }
> > >
> > > where the store to vect_b is not allowed to be executed
> > > unconditionally since if we exit through the early break it wouldn't
> > > have been done for the full VF iteration.
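
A small scalar model of the hazard described above, assuming N = 8 and a vectorization factor of 4 (both picked only for illustration). The function naive_chunked_loop is a hypothetical stand-in for a vector loop that performs the vect_b store for a whole chunk before testing the exit; it is not what the vectorizer emits.

```c
#include <assert.h>
#include <string.h>

#define N 8
#define VF 4

/* The loop from the example, executed one iteration at a time.  */
static void
scalar_loop (int *vect_a, int *vect_b, int x)
{
  for (int i = 0; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] * 2 != x)
	break;
      vect_a[i] = x;
    }
}

/* Naive chunked version: stores VF elements of vect_b before testing
   whether any lane takes the early exit.  */
static void
naive_chunked_loop (int *vect_a, int *vect_b, int x)
{
  for (int i = 0; i < N; i += VF)
    {
      int exit_taken = 0;

      for (int l = 0; l < VF; l++)
	vect_b[i + l] = x + i + l;

      for (int l = 0; l < VF; l++)
	if (vect_a[i + l] * 2 != x)
	  exit_taken = 1;

      if (exit_taken)
	break;

      for (int l = 0; l < VF; l++)
	vect_a[i + l] = x;
    }
}
```

If the exit is taken at i = 5, the scalar loop leaves vect_b[6] and vect_b[7] untouched, while the naive chunked loop has already overwritten them. That is the store the patch sinks below the break.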
> > >
> > > Effectively, the code motion determines:
> > >   - is it safe/possible to vectorize the function
> > >   - what updates to the VUSES should be performed if we do
> > >   - Which statements need to be moved
> > >   - Which statements can't be moved:
> > > * values that are live must be reachable through all exits
> > > * values that aren't single use and shared by the use/def chain of 
> > > the cond
> > >   - The final insertion point of the instructions.  In the cases we have
> > > multiple early exit statements this should be the one closest to the loop
> > > latch itself.
> > >
> > > After motion the loop above is:
> > >
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >... y = x + i;
> > >if (vect_a[i]*2 != x)
> > >  break;
> > >vect_b[i] = y;
> > >vect_a[i] = x;
> > >
> > >  }
> > >
> > > The operation is split into two, during data ref analysis we determine
> > > validity of the operation and generate a worklist of actions to
> > > perform if we vectorize.
> > >
> > > After peeling and just before statement transformation we replay this
> > > worklist which moves the statements and updates book keeping only in
> > > the main loop that's to be vectorized.  This includes updating of USES in 
> > > exit
> > blocks.
> > >
> > > At the moment we don't support this for epilog nomasks since the
> > > additional vectorized epilog's stmt UIDs are not found.
> > 
> > As of UIDs note that UIDs are used for dominance checking in
> > vect_stmt_dominates_stmt_p and that at least is used during transform when
> > scheduling SLP.  Moving stmts around invalidates this UID order (I don't see
> > you "renumbering" UIDs).
> > 
> 
> Just some responses to questions while I process the rest.
> 
> I see, yeah I didn't encounter it because I punted SLP support.  As you said
> for SLP we indeed don't need this.
> 
> > More comments below.
> > 
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-data-refs.cc (validate_early_exit_stmts): New.
> > >   (vect_analyze_early_break_dependences): New.
> > >   (vect_analyze_data_ref_dependences): Use them.
> > >   * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > >   early_breaks.
> > >   (move_early_exit_stmts): New.
> > >   (vect_transform_loop): Use it.
> > >   * tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> > >   * tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > >   (class _loop_vec_info): Add early_breaks, early_break_conflict,
> > >   early_break_vuses.
> > >   (LOOP_VINFO_EARLY_BREAKS): New.
> > >   (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> > >   (LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> > >   (LOOP_VINFO_EARLY_BRK_VUSES): New.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60 100644
> > > --- a/gcc/tree-vect-data-refs.cc
> > > +++ b/gcc/tree-vect-data-refs.cc
> > > @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
> > >return opt_result::success ();
> > >  }
> > >
> > > +/* This function tries to validate whether an early break vectorization
> > > +   is possible for the current instruction sequence. Returns True if
> > > +   possible, otherwise False.
> > > +
> > > +   Requirements:
> > > + - Any memory access must be to a fixed size buffer.
> > > + - There must not be any loads and stores to the same object.
> > > + - Multiple loads are allowed as long as they don't alias.
> > > +
> > > +   NOTE:
> > > + This implementation is very conservative. Any overlapping loads/stores
> > > + that take place before the early break statement gets rejected aside from
> > > + WAR dependencies.
> > > +
> > > + i.e.:
> > > +
> > > + a[i] = 8
> > > + c = a[i]
> > > + if (b[i])
> 

Re: [PATCH] vect/ifcvt: Add vec_cond fallback and check for vector versioning.

2023-11-07 Thread Robin Dapp
> isn't is_cond_op implied by mask != NULL?  That said, if we ever end
> up here with a non-cond op but a loop mask we effectively want the
> same behavior so I think eliding is_cond_op and instead checking
> mask != NULL_TREE below is more future proof.
> 
> OK with that change.

Thanks, attached v2 with the is_cond_op removed and the comment
adjusted.  Also made the test specific to x86 and aarch64.
Going to commit it later.

Regards
 Robin
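
For context, the scalar idea behind replacing a conditional reduction with a select can be sketched as below: the masked-off operand is replaced by the neutral value of the reduction operator (0 for |), so the reduction itself runs unconditionally. This is only a C model of that equivalence, not the GIMPLE the patch generates; reduce_branchy and reduce_select are hypothetical names.

```c
#include <assert.h>

/* Conditional reduction in the shape of the new test case: the update
   of U is guarded by C.  */
static unsigned long long
reduce_branchy (const int *vals, int n, int c, unsigned long long u)
{
  for (int i = 0; i < n; i++)
    if (c)
      u |= vals[i];
  return u;
}

/* Same reduction in select form: when C is false the operand becomes 0,
   the neutral value for |, and the OR is executed unconditionally.  */
static unsigned long long
reduce_select (const int *vals, int n, int c, unsigned long long u)
{
  for (int i = 0; i < n; i++)
    u |= c ? (unsigned long long) vals[i] : 0ULL;
  return u;
}
```

Both functions return the same value for any inputs, which is what makes the select form a valid fallback when a conditional operation is not available.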

gcc/ChangeLog:

PR tree-optimization/112361
PR target/112359
PR middle-end/112406

* tree-if-conv.cc (convert_scalar_cond_reduction): Remember if
loop was versioned and only then create COND_OPs.
(predicate_scalar_phi): Do not create COND_OP when not
vectorizing.
* tree-vect-loop.cc (vect_expand_fold_left): Re-create
VEC_COND_EXPR.
(vectorize_fold_left_reduction): Pass mask to
vect_expand_fold_left.

gcc/testsuite/ChangeLog:

* gcc.dg/pr112359.c: New test.
---
 gcc/testsuite/gcc.dg/pr112359.c | 15 
 gcc/tree-if-conv.cc | 41 ++---
 gcc/tree-vect-loop.cc   | 22 --
 3 files changed, 63 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr112359.c

diff --git a/gcc/testsuite/gcc.dg/pr112359.c b/gcc/testsuite/gcc.dg/pr112359.c
new file mode 100644
index 000..38d588f3958
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr112359.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target aarch64*-*-* i?86-*-* x86_64-*-* } } */
+/* { dg-additional-options "-O -std=gnu99 -mavx512fp16 -ftree-loop-if-convert" { target i?86-*-* x86_64-*-* } } */
+/* { dg-additional-options "-O -std=gnu99 -march=armv8.4-a+sve -ftree-loop-if-convert" { target aarch64*-*-* } } */
+
+int i, c;
+unsigned long long u;
+
+void
+foo (void)
+{
+  for (; i; i++)
+if (c)
+  u |= i;
+}
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index a7a751b668c..0190cf2369e 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1845,12 +1845,15 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
 res_2 = res_13 + _ifc__1;
   Argument SWAP tells that arguments of conditional expression should be
   swapped.
+  LOOP_VERSIONED is true if we assume that we versioned the loop for
+  vectorization.  In that case we can create a COND_OP.
   Returns rhs of resulting PHI assignment.  */
 
 static tree
 convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
   tree cond, tree op0, tree op1, bool swap,
-  bool has_nop, gimple* nop_reduc)
+  bool has_nop, gimple* nop_reduc,
+  bool loop_versioned)
 {
   gimple_stmt_iterator stmt_it;
   gimple *new_assign;
@@ -1874,7 +1877,7 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
  The COND_OP will have a neutral_op else value.  */
   internal_fn ifn;
   ifn = get_conditional_internal_fn (reduction_op);
-  if (ifn != IFN_LAST
+  if (loop_versioned && ifn != IFN_LAST
   && vectorized_internal_fn_supported_p (ifn, TREE_TYPE (lhs))
   && !swap)
 {
@@ -2129,11 +2132,13 @@ cmp_arg_entry (const void *p1, const void *p2, void * /* data.  */)
The generated code is inserted at GSI that points to the top of
basic block's statement list.
If PHI node has more than two arguments a chain of conditional
-   expression is produced.  */
+   expression is produced.
+   LOOP_VERSIONED should be true if we know that the loop was versioned for
+   vectorization. */
 
 
 static void
-predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
+predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi, bool loop_versioned)
 {
   gimple *new_stmt = NULL, *reduc, *nop_reduc;
   tree rhs, res, arg0, arg1, op0, op1, scev;
@@ -2213,7 +2218,8 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
  /* Convert reduction stmt into vectorizable form.  */
  rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
   true_bb != gimple_bb (reduc),
-  has_nop, nop_reduc);
+  has_nop, nop_reduc,
+  loop_versioned);
  redundant_ssa_names.safe_push (std::make_pair (res, rhs));
}
   else
@@ -2311,7 +2317,8 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
{
  /* Convert reduction stmt into vectorizable form.  */
  rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
-  swap, has_nop, nop_reduc);
+  swap, has_nop, nop_reduc,
+  loop_versioned);
  redundant_ssa_names.safe_push 
