Re: [PATCH] vect test: Remove xfail for riscv

2023-08-28 Thread Richard Biener via Gcc-patches
On Tue, 29 Aug 2023, Juzhe-Zhong wrote:

> We are planning to enable the "vect" testsuite with scalable vector
> auto-vectorization.
> 
> This test now XPASSes on RISC-V, as it already does on ARM SVE:
> XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1

OK

> ---
>  gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c 
> b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> index e9ec4ca0da3..c2d3031bc0c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> @@ -47,4 +47,4 @@ int main (void)
>  }
>  
>  /* Until we support multiple types in the inner loop  */
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { 
> xfail { ! { aarch64*-*-* riscv*-*-* } } } } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Bind RTL to a TREE expr (Re: [Bug target/111166])

2023-08-28 Thread Richard Biener via Gcc-patches
On Tue, 29 Aug 2023, Jiufu Guo wrote:

> 
> Hi All!
> 
> "rguenth at gcc dot gnu.org"  writes:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66
> ...
> >
> >
> > At RTL expansion time we store to D.2865 where its DECL_RTL is r82:TI so
> > we can hardly fix it there.  Only a later pass could figure each of the
> > insns fully define the reg.
> >
> > Jiufu Guo is working to improve what we choose for DECL_RTL, but for
> > incoming params / outgoing return.  This is a case where we could,
> > with -fno-tree-vectorize, improve DECL_RTL for an automatic var and
> > choose not TImode but something like a (concat:TI reg:DI reg:DI).
> 
> Here is the patch about improving the parameters and returns in
> registers.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628213.html
> 
> I have a question about how to bind an RTL to a TREE expression.
> In this patch, a map TREE->RTL is used. But it would be better if
> there was a faster way.
> 
> We have DECL_RTL/INCOMING_RTL, but they can only be bound to
> DECL(or PARM). In the above patch, the TREE can be an EXPR
> (e.g. COMPONENT_REF/ARRAY_REF).
> 
> Is there a way to achieve this? Thanks for suggestions!

No, but we don't need to bind RTL to COMPONENT_REF and friends;
what we want to change is the DECL_RTL of the underlying DECL.

Richard.
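
To illustrate the point about the underlying DECL, here is a toy sketch rather
than GCC code. The Node type, its Kind tags, base_decl and rtl_for are all
hypothetical stand-ins for tree, TREE_CODE and DECL_RTL; the only idea taken
from the discussion is that a reference is peeled back to its base DECL and
the RTL is looked up there, so no per-expression TREE->RTL map is needed.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for GCC's tree codes.
enum class Kind { VarDecl, ParmDecl, ComponentRef, ArrayRef };

struct Node {
  Kind kind;
  std::string name;     // only meaningful for decls
  const Node *operand;  // the object being referenced; null for decls
  std::string rtl;      // stand-in for DECL_RTL; only set on decls
};

// Strip COMPONENT_REF / ARRAY_REF wrappers until a decl is reached.
const Node *base_decl(const Node *n) {
  while (n && (n->kind == Kind::ComponentRef || n->kind == Kind::ArrayRef))
    n = n->operand;
  return n;
}

// The RTL for any reference is looked up on its base decl.
const std::string &rtl_for(const Node *ref) {
  const Node *d = base_decl(ref);
  assert(d != nullptr);
  return d->rtl;
}
```

With this shape, changing what is stored on the decl (for example a concat of
two DImode registers instead of one TImode register) is automatically seen by
every reference that is based on it.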


[PATCH] vect test: Remove xfail for riscv

2023-08-28 Thread Juzhe-Zhong
We are planning to enable the "vect" testsuite with scalable vector
auto-vectorization.

This test now XPASSes on RISC-V, as it already does on ARM SVE:
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1

---
 gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c 
b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
index e9ec4ca0da3..c2d3031bc0c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
@@ -47,4 +47,4 @@ int main (void)
 }
 
 /* Until we support multiple types in the inner loop  */
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail 
{ ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail 
{ ! { aarch64*-*-* riscv*-*-* } } } } } */
-- 
2.36.3



Re: [PATCH 5/9] arm: [MVE intrinsics] add support for p8 and p16 polynomial types

2023-08-28 Thread Christophe Lyon via Gcc-patches
On Tue, 29 Aug 2023 at 08:06, Prathamesh Kulkarni <
prathamesh.kulka...@linaro.org> wrote:

> On Tue, 15 Aug 2023 at 00:05, Christophe Lyon via Gcc-patches
>  wrote:
> >
> > Although they look like aliases for u8 and u16, we need to define them
> > so that we can handle p8 and p16 suffixes with the general framework.
> >
> > They will be used by vmull[bt]q_poly intrinsics.
> Hi Christophe,
>

Hi Prathamesh,


> It seems your patch committed in 9bae37ec8dc32027dedf9a32bf15754ebad6da38
> broke arm bootstrap build due to Werror=missing-field-initializers:
>
> https://ci.linaro.org/job/tcwg_bootstrap_build--master-arm-bootstrap-build/199/artifact/artifacts/notify/mail-body.txt/*view*/
>
> I think this happens because the commit adds a new member to
> type_suffix_info:
> -  unsigned int spare : 13;
> +  /* True if the suffix is for a polynomial type.  */
> +  unsigned int poly_p : 1;
> +  unsigned int spare : 12;
>
> but probably misses an initializer in arm-mve-builtins.cc:type_suffixes:
>   { "", NUM_VECTOR_TYPES, TYPE_bool, 0, 0, false, false, false,
> 0, VOIDmode }
>
Yeah, exactly. I had noticed this after sending the patch, but forgot to
fix it when I pushed the patch.

Fixed as obvious with the attached patch (r14-3538-gacaf9e333dbc2e).
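
For readers following along, the failure mode can be reproduced with a small
stand-alone sketch. The suffix_info struct below is a hypothetical, cut-down
record and not the real type_suffix_info; the names stale and fixed are made up
for illustration. It shows how an initializer written before a field was added
silently shifts its values and leaves the last member without an initializer,
which is what -Wmissing-field-initializers (enabled by -Wextra) reports.

```cpp
#include <cassert>

// Hypothetical, cut-down bit-field record (not the real type_suffix_info).
struct suffix_info {
  const char *name;
  unsigned int integer_p : 1;
  unsigned int unsigned_p : 1;
  unsigned int float_p : 1;
  unsigned int poly_p : 1;   // newly added field
  unsigned int spare : 12;
  unsigned int mode : 16;
};

// Initializer written for the old layout (no poly_p): the 0 meant for
// `spare` now lands in `poly_p`, the 7 meant for `mode` lands in `spare`,
// and `mode` has no initializer, so it is zero-initialized.
const suffix_info stale = { "", false, false, false, 0, 7 };

// Updated initializer: one value per field, in declaration order.
const suffix_info fixed = { "", false, false, false, false, 0, 7 };
```

Compiling this with -Wextra should flag the stale initializer only; with
-Werror in a bootstrap that warning becomes a build failure.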

Thanks,

Christophe


> Thanks,
> Prathamesh
> >
> > 2023-08-14  Christophe Lyon  
> >
> > gcc/
> > * config/arm/arm-mve-builtins.cc (type_suffixes): Handle poly_p
> > field.
> > (TYPES_poly_8_16): New.
> > (poly_8_16): New.
> > * config/arm/arm-mve-builtins.def (p8): New type suffix.
> > (p16): Likewise.
> > * config/arm/arm-mve-builtins.h (enum type_class_index): Add
> > TYPE_poly.
> > (struct type_suffix_info): Add poly_p field.
> > ---
> >  gcc/config/arm/arm-mve-builtins.cc  | 6 ++
> >  gcc/config/arm/arm-mve-builtins.def | 2 ++
> >  gcc/config/arm/arm-mve-builtins.h   | 5 -
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/arm/arm-mve-builtins.cc
> b/gcc/config/arm/arm-mve-builtins.cc
> > index 7eec9d2861c..fa8b0ad36b3 100644
> > --- a/gcc/config/arm/arm-mve-builtins.cc
> > +++ b/gcc/config/arm/arm-mve-builtins.cc
> > @@ -128,6 +128,7 @@ CONSTEXPR const type_suffix_info
> type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
> >  TYPE_##CLASS == TYPE_signed || TYPE_##CLASS == TYPE_unsigned, \
> >  TYPE_##CLASS == TYPE_unsigned, \
> >  TYPE_##CLASS == TYPE_float, \
> > +TYPE_##CLASS == TYPE_poly, \
> >  0, \
> >  MODE },
> >  #include "arm-mve-builtins.def"
> > @@ -177,6 +178,10 @@ CONSTEXPR const type_suffix_info
> type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
> >  #define TYPES_all_signed(S, D) \
> >S (s8), S (s16), S (s32)
> >
> > +/* _p8 _p16.  */
> > +#define TYPES_poly_8_16(S, D) \
> > +  S (p8), S (p16)
> > +
> >  /* _u8 _u16 _u32.  */
> >  #define TYPES_all_unsigned(S, D) \
> >S (u8), S (u16), S (u32)
> > @@ -275,6 +280,7 @@ DEF_MVE_TYPES_ARRAY (integer_8);
> >  DEF_MVE_TYPES_ARRAY (integer_8_16);
> >  DEF_MVE_TYPES_ARRAY (integer_16_32);
> >  DEF_MVE_TYPES_ARRAY (integer_32);
> > +DEF_MVE_TYPES_ARRAY (poly_8_16);
> >  DEF_MVE_TYPES_ARRAY (signed_16_32);
> >  DEF_MVE_TYPES_ARRAY (signed_32);
> >  DEF_MVE_TYPES_ARRAY (reinterpret_integer);
> > diff --git a/gcc/config/arm/arm-mve-builtins.def
> b/gcc/config/arm/arm-mve-builtins.def
> > index e3f37876210..e2cf1baf370 100644
> > --- a/gcc/config/arm/arm-mve-builtins.def
> > +++ b/gcc/config/arm/arm-mve-builtins.def
> > @@ -63,6 +63,8 @@ DEF_MVE_TYPE_SUFFIX (u8, uint8x16_t, unsigned, 8,
> V16QImode)
> >  DEF_MVE_TYPE_SUFFIX (u16, uint16x8_t, unsigned, 16, V8HImode)
> >  DEF_MVE_TYPE_SUFFIX (u32, uint32x4_t, unsigned, 32, V4SImode)
> >  DEF_MVE_TYPE_SUFFIX (u64, uint64x2_t, unsigned, 64, V2DImode)
> > +DEF_MVE_TYPE_SUFFIX (p8, uint8x16_t, poly, 8, V16QImode)
> > +DEF_MVE_TYPE_SUFFIX (p16, uint16x8_t, poly, 16, V8HImode)
> >  #undef REQUIRES_FLOAT
> >
> >  #define REQUIRES_FLOAT true
> > diff --git a/gcc/config/arm/arm-mve-builtins.h
> b/gcc/config/arm/arm-mve-builtins.h
> > index c9b51a0c77b..37b8223dfb2 100644
> > --- a/gcc/config/arm/arm-mve-builtins.h
> > +++ b/gcc/config/arm/arm-mve-builtins.h
> > @@ -146,6 +146,7 @@ enum type_class_index
> >TYPE_float,
> >TYPE_signed,
> >TYPE_unsigned,
> > +  TYPE_poly,
> >NUM_TYPE_CLASSES
> >  };
> >
> > @@ -221,7 +222,9 @@ struct type_suffix_info
> >unsigned int unsigned_p : 1;
> >/* True if the suffix is for a floating-point type.  */
> >unsigned int float_p : 1;
> > -  unsigned int spare : 13;
> > +  /* True if the suffix is for a polynomial type.  */
> > +  unsigned int poly_p : 1;
> > +  unsigned int spare : 12;
> >
> >/* The associated vector or predicate mode.  */
> >machine_mode vector_mode : 16;
> > --
> > 2.34.1
> >
>
From acaf9e333dbc2eb811848c169f95ec7a8ca0e2e7 Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Tue, 29 Aug 2023 06:35:06 +
Subject: [PATCH] arm: Fix bootstrap / add m

Re: [wwwdocs] projects/gomp: Update implementation status and minor fixes

2023-08-28 Thread Gerald Pfeifer
On Fri, 25 Aug 2023, Tobias Burnus wrote:
> It also fixes a couple of bugs and adds links providing more details
> for two items (a PR link as in libgomp.texi and a section in the manual).

Nice changes, thanks.

+Some are only stubs; see manual (

Re: [PATCH 5/9] arm: [MVE intrinsics] add support for p8 and p16 polynomial types

2023-08-28 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 15 Aug 2023 at 00:05, Christophe Lyon via Gcc-patches
 wrote:
>
> Although they look like aliases for u8 and u16, we need to define them
> so that we can handle p8 and p16 suffixes with the general framework.
>
> They will be used by vmull[bt]q_poly intrinsics.
Hi Christophe,
It seems your patch, committed as 9bae37ec8dc32027dedf9a32bf15754ebad6da38,
broke the arm bootstrap build due to -Werror=missing-field-initializers:
https://ci.linaro.org/job/tcwg_bootstrap_build--master-arm-bootstrap-build/199/artifact/artifacts/notify/mail-body.txt/*view*/

I think this happens because the commit adds a new member to type_suffix_info:
-  unsigned int spare : 13;
+  /* True if the suffix is for a polynomial type.  */
+  unsigned int poly_p : 1;
+  unsigned int spare : 12;

but probably misses an initializer in arm-mve-builtins.cc:type_suffixes:
  { "", NUM_VECTOR_TYPES, TYPE_bool, 0, 0, false, false, false,
0, VOIDmode }

Thanks,
Prathamesh
>
> 2023-08-14  Christophe Lyon  
>
> gcc/
> * config/arm/arm-mve-builtins.cc (type_suffixes): Handle poly_p
> field.
> (TYPES_poly_8_16): New.
> (poly_8_16): New.
> * config/arm/arm-mve-builtins.def (p8): New type suffix.
> (p16): Likewise.
> * config/arm/arm-mve-builtins.h (enum type_class_index): Add
> TYPE_poly.
> (struct type_suffix_info): Add poly_p field.
> ---
>  gcc/config/arm/arm-mve-builtins.cc  | 6 ++
>  gcc/config/arm/arm-mve-builtins.def | 2 ++
>  gcc/config/arm/arm-mve-builtins.h   | 5 -
>  3 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm-mve-builtins.cc 
> b/gcc/config/arm/arm-mve-builtins.cc
> index 7eec9d2861c..fa8b0ad36b3 100644
> --- a/gcc/config/arm/arm-mve-builtins.cc
> +++ b/gcc/config/arm/arm-mve-builtins.cc
> @@ -128,6 +128,7 @@ CONSTEXPR const type_suffix_info 
> type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
>  TYPE_##CLASS == TYPE_signed || TYPE_##CLASS == TYPE_unsigned, \
>  TYPE_##CLASS == TYPE_unsigned, \
>  TYPE_##CLASS == TYPE_float, \
> +TYPE_##CLASS == TYPE_poly, \
>  0, \
>  MODE },
>  #include "arm-mve-builtins.def"
> @@ -177,6 +178,10 @@ CONSTEXPR const type_suffix_info 
> type_suffixes[NUM_TYPE_SUFFIXES + 1] = {
>  #define TYPES_all_signed(S, D) \
>S (s8), S (s16), S (s32)
>
> +/* _p8 _p16.  */
> +#define TYPES_poly_8_16(S, D) \
> +  S (p8), S (p16)
> +
>  /* _u8 _u16 _u32.  */
>  #define TYPES_all_unsigned(S, D) \
>S (u8), S (u16), S (u32)
> @@ -275,6 +280,7 @@ DEF_MVE_TYPES_ARRAY (integer_8);
>  DEF_MVE_TYPES_ARRAY (integer_8_16);
>  DEF_MVE_TYPES_ARRAY (integer_16_32);
>  DEF_MVE_TYPES_ARRAY (integer_32);
> +DEF_MVE_TYPES_ARRAY (poly_8_16);
>  DEF_MVE_TYPES_ARRAY (signed_16_32);
>  DEF_MVE_TYPES_ARRAY (signed_32);
>  DEF_MVE_TYPES_ARRAY (reinterpret_integer);
> diff --git a/gcc/config/arm/arm-mve-builtins.def 
> b/gcc/config/arm/arm-mve-builtins.def
> index e3f37876210..e2cf1baf370 100644
> --- a/gcc/config/arm/arm-mve-builtins.def
> +++ b/gcc/config/arm/arm-mve-builtins.def
> @@ -63,6 +63,8 @@ DEF_MVE_TYPE_SUFFIX (u8, uint8x16_t, unsigned, 8, V16QImode)
>  DEF_MVE_TYPE_SUFFIX (u16, uint16x8_t, unsigned, 16, V8HImode)
>  DEF_MVE_TYPE_SUFFIX (u32, uint32x4_t, unsigned, 32, V4SImode)
>  DEF_MVE_TYPE_SUFFIX (u64, uint64x2_t, unsigned, 64, V2DImode)
> +DEF_MVE_TYPE_SUFFIX (p8, uint8x16_t, poly, 8, V16QImode)
> +DEF_MVE_TYPE_SUFFIX (p16, uint16x8_t, poly, 16, V8HImode)
>  #undef REQUIRES_FLOAT
>
>  #define REQUIRES_FLOAT true
> diff --git a/gcc/config/arm/arm-mve-builtins.h 
> b/gcc/config/arm/arm-mve-builtins.h
> index c9b51a0c77b..37b8223dfb2 100644
> --- a/gcc/config/arm/arm-mve-builtins.h
> +++ b/gcc/config/arm/arm-mve-builtins.h
> @@ -146,6 +146,7 @@ enum type_class_index
>TYPE_float,
>TYPE_signed,
>TYPE_unsigned,
> +  TYPE_poly,
>NUM_TYPE_CLASSES
>  };
>
> @@ -221,7 +222,9 @@ struct type_suffix_info
>unsigned int unsigned_p : 1;
>/* True if the suffix is for a floating-point type.  */
>unsigned int float_p : 1;
> -  unsigned int spare : 13;
> +  /* True if the suffix is for a polynomial type.  */
> +  unsigned int poly_p : 1;
> +  unsigned int spare : 12;
>
>/* The associated vector or predicate mode.  */
>machine_mode vector_mode : 16;
> --
> 2.34.1
>


Re: [pushed] analyzer: fix ICE in text art strings support

2023-08-28 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 25 Aug 2023 at 18:15, David Malcolm via Gcc-patches
 wrote:
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Pushed to trunk as r14-3481-g99a3fcb8ff0bf2.
Hi David,
It seems the new tests FAIL on arm in the LTO bootstrap configuration:
https://ci.linaro.org/job/tcwg_bootstrap_check--master-arm-check_bootstrap_lto-build/263/artifact/artifacts/06-check_regression/fails.sum/*view*/
Please let me know if you need any help in reproducing these failures.

Thanks,
Prathamesh
>
> gcc/analyzer/ChangeLog:
> * access-diagram.cc (class string_region_spatial_item): Remove
> assumption that the string is written to the start of the cluster.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/analyzer/out-of-bounds-diagram-17.c: New test.
> * gcc.dg/analyzer/out-of-bounds-diagram-18.c: New test.
> * gcc.dg/analyzer/out-of-bounds-diagram-19.c: New test.
> ---
>  gcc/analyzer/access-diagram.cc| 57 ---
>  .../analyzer/out-of-bounds-diagram-17.c   | 34 +++
>  .../analyzer/out-of-bounds-diagram-18.c   | 38 +
>  .../analyzer/out-of-bounds-diagram-19.c   | 45 +++
>  4 files changed, 155 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-17.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-18.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-19.c
>
> diff --git a/gcc/analyzer/access-diagram.cc b/gcc/analyzer/access-diagram.cc
> index d7b669a4e38e..a51d594b5b2c 100644
> --- a/gcc/analyzer/access-diagram.cc
> +++ b/gcc/analyzer/access-diagram.cc
> @@ -1509,10 +1509,16 @@ public:
>out.add_all_bytes_in_range (m_actual_bits);
>  else
>{
> -   byte_range head_of_string (0, m_ellipsis_head_len);
> +   byte_range bytes (0, 0);
> +   bool valid = m_actual_bits.as_concrete_byte_range (&bytes);
> +   gcc_assert (valid);
> +   byte_range head_of_string (bytes.get_start_byte_offset (),
> +  m_ellipsis_head_len);
> out.add_all_bytes_in_range (head_of_string);
> byte_range tail_of_string
> - (TREE_STRING_LENGTH (string_cst) - m_ellipsis_tail_len,
> + ((bytes.get_start_byte_offset ()
> +   + TREE_STRING_LENGTH (string_cst)
> +   - m_ellipsis_tail_len),
>m_ellipsis_tail_len);
> out.add_all_bytes_in_range (tail_of_string);
> /* Adding the above pair of ranges will also effectively add
> @@ -1535,11 +1541,14 @@ public:
>  tree string_cst = get_string_cst ();
>  if (m_show_full_string)
>{
> -   for (byte_offset_t byte_idx = bytes.get_start_byte_offset ();
> -   byte_idx < bytes.get_next_byte_offset ();
> -   byte_idx = byte_idx + 1)
> -add_column_for_byte (t, btm, sm, byte_idx,
> - byte_idx_table_y, byte_val_table_y);
> +   for (byte_offset_t byte_idx_within_cluster
> + = bytes.get_start_byte_offset ();
> +   byte_idx_within_cluster < bytes.get_next_byte_offset ();
> +   byte_idx_within_cluster = byte_idx_within_cluster + 1)
> +add_column_for_byte
> +  (t, btm, sm, byte_idx_within_cluster,
> +   byte_idx_within_cluster - bytes.get_start_byte_offset (),
> +   byte_idx_table_y, byte_val_table_y);
>
> if (m_show_utf8)
>  {
> @@ -1566,10 +1575,13 @@ public:
>  = decoded_char.m_start_byte - TREE_STRING_POINTER 
> (string_cst);
>byte_size_t size_in_bytes
>  = decoded_char.m_next_byte - decoded_char.m_start_byte;
> -  byte_range bytes (start_byte_idx, size_in_bytes);
> +  byte_range cluster_bytes_for_codepoint
> +(start_byte_idx + bytes.get_start_byte_offset (),
> + size_in_bytes);
>
>const table::rect_t code_point_table_rect
> -= btm.get_table_rect (&m_string_reg, bytes,
> += btm.get_table_rect (&m_string_reg,
> +  cluster_bytes_for_codepoint,
>utf8_code_point_table_y, 1);
>char buf[100];
>sprintf (buf, "U+%04x", decoded_char.m_ch);
> @@ -1579,7 +1591,8 @@ public:
>if (show_unichars)
>  {
>const table::rect_t character_table_rect
> -= btm.get_table_rect (&m_string_reg, bytes,
> += btm.get_table_rect (&m_string_reg,
> +  cluster_bytes_for_codepoint,
>utf8_character_table_y, 1);
>if (cpp_is_printable_char (decoded_char.m_ch))
>  t.set_cell_span (character_table_rect,
> @@ -1598,12 +1611,14 @@ public:
>{
> /* Head of string.  */
> for (int byte_idx =

[PATCH] doc: Add fpatchable-function-entry to Option-Summary page[PR110983]

2023-08-28 Thread Mao via Gcc-patches
The -fpatchable-function-entry option is missing from the "Option Summary"
section of both the web documentation [1] and the man page.

This patch adds it.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html

---
 gcc/doc/invoke.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 16aa92b5e86..6571180af0c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -649,7 +649,8 @@ Objective-C and Objective-C++ Dialects}.
 -finstrument-functions  -finstrument-functions-once
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{}
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{}
--fprofile-prefix-map=@var{old}=@var{new}}
+-fprofile-prefix-map=@var{old}=@var{new}
+-fpatchable-function-entry=@var{N}@r{[},@var{M}@r{]}}
 
 @item Preprocessor Options
 @xref{Preprocessor Options,,Options Controlling the Preprocessor}.
-- 
2.34.1



Bind RTL to a TREE expr (Re: [Bug target/111166])

2023-08-28 Thread Jiufu Guo via Gcc-patches


Hi All!

"rguenth at gcc dot gnu.org"  writes:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66
...
>
>
> At RTL expansion time we store to D.2865 where its DECL_RTL is r82:TI so
> we can hardly fix it there.  Only a later pass could figure each of the
> insns fully define the reg.
>
> Jiufu Guo is working to improve what we choose for DECL_RTL, but for
> incoming params / outgoing return.  This is a case where we could,
> with -fno-tree-vectorize, improve DECL_RTL for an automatic var and
> choose not TImode but something like a (concat:TI reg:DI reg:DI).

Here is the patch that improves the handling of parameters and return
values passed in registers.
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628213.html

I have a question about how to bind an RTL to a TREE expression.
In this patch, a map TREE->RTL is used. But it would be better if
there was a faster way.

We have DECL_RTL/INCOMING_RTL, but they can only be bound to a
DECL (or PARM). In the above patch, the TREE can be an EXPR
(e.g. COMPONENT_REF/ARRAY_REF).

Is there a way to achieve this? Thanks for suggestions!

BR,
Jeff (Jiufu Guo)


Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-28 Thread Eric Feng via Gcc-patches
On Tue, Aug 29, 2023 at 12:32 AM Eric Feng  wrote:
>
> Hi Dave,
>
> Thanks for the feedback. I've addressed the changes you mentioned in
> addition to adding more test cases. I've also taken this chance to
> split the test files according to known function subclasses, as you previously
> suggested. Since there were also some changes to the core analyzer, I've done a
> bootstrap and regtested the patch as well. Does it look OK for trunk?
Apologies, I forgot to mention that the bootstrap and regtest were done on
aarch64-unknown-linux-gnu.
>
> Best,
> Eric
>
> ---
>
> This patch introduces initial support for reference count checking of
> PyObjects in relation to the Python/C API for the CPython plugin.
> Additionally, the core analyzer underwent several modifications to
> accommodate this feature. These include:
>
> - Introducing support for callbacks at the end of
>   region_model::pop_frame. This is our current point of validation for
>   the reference count of PyObjects.
> - An added optional custom stmt_finder parameter to
>   region_model_context::warn. This aids in emitting a diagnostic
>   concerning the reference count, especially when the stmt_finder is
>   NULL, which is currently the case during region_model::pop_frame.
>
> The current diagnostic we emit relating to the reference count
> appears as follows:
>
> rc3.c:23:10: warning: expected  to 
> have reference count: ‘1’ but ob_refcnt field is: ‘2’
>23 |   return list;
>   |  ^~~~
>   ‘create_py_object’: events 1-4
> |
> |4 |   PyObject* item = PyLong_FromLong(3);
> |  |^~
> |  ||
> |  |(1) when ‘PyLong_FromLong’ succeeds
> |5 |   PyObject* list = PyList_New(1);
> |  |~
> |  ||
> |  |(2) when ‘PyList_New’ succeeds
> |..
> |   14 |   PyList_Append(list, item);
> |  |   ~
> |  |   |
> |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> |..
> |   23 |   return list;
> |  |  
> |  |  |
> |  |  (4) here
> |
>
> This is a WIP in several ways:
> - Enhancing the diagnostic for better clarity. For instance, users should
>   expect to see the variable name 'item' instead of the placeholder in the
>   diagnostic above.
> - Currently, functions returning PyObject * are assumed to always produce
>   a new reference.
> - The validation of reference count is only for PyObjects created within a
>   function body. Verifying reference counts for PyObjects passed as
>   parameters is not supported in this patch.
>
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
> * engine.cc (impl_region_model_context::warn): New optional parameter.
> * exploded-graph.h (class impl_region_model_context): Likewise.
> * region-model.cc (region_model::pop_frame): New callback feature for
>   region_model::pop_frame.
> * region-model.h (struct append_regions_cb_data): Likewise.
> (class region_model): Likewise.
> (class region_model_context): New optional parameter.
> (class region_model_context_decorator): Likewise.
>
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
> * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   checking for PyObjects.
> * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   added more tests).
> * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   more tests).
> * gcc.dg/plugin/plugin.exp: New tests.
> * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.
>
> Signed-off-by: Eric Feng 
>
> ---
>  gcc/analyzer/engine.cc|   8 +-
>  gcc/analyzer/exploded-graph.h |   4 +-
>  gcc/analyzer/region-model.cc  |   3 +
>  gcc/analyzer/region-model.h   |  48 ++-
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-
>  c => cpython-plugin-test-PyList_Append.c} |  56 +--
>  .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
>  .../cpython-plugin-test-PyLong_FromLong.c |  38 ++
>  ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
>  .../cpython-plugin-test-refcnt-checking.c |  78 
>  gcc/testsuite/gcc.dg/plugin/plugin.exp|   5 +-
>  11 files changed, 612 insertions(+), 42 deletions(-)
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => 
> cpython-plugin-test-PyList_Append.c} (64%)
>  create mode 100644 
> gcc/testsuite/gcc.dg/plugin/cpython-plugin-

[PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-28 Thread Eric Feng via Gcc-patches
Hi Dave,

Thanks for the feedback. I've addressed the changes you mentioned in
addition to adding more test cases. I've also taken this chance to 
split the test files according to known function subclasses, as you previously 
suggested. Since there were also some changes to the core analyzer, I've done a
bootstrap and regtested the patch as well. Does it look OK for trunk?

Best,
Eric

---

This patch introduces initial support for reference count checking of
PyObjects in relation to the Python/C API for the CPython plugin.
Additionally, the core analyzer underwent several modifications to
accommodate this feature. These include:

- Introducing support for callbacks at the end of
  region_model::pop_frame. This is our current point of validation for
  the reference count of PyObjects.
- An added optional custom stmt_finder parameter to
  region_model_context::warn. This aids in emitting a diagnostic
  concerning the reference count, especially when the stmt_finder is
  NULL, which is currently the case during region_model::pop_frame.

The current diagnostic we emit relating to the reference count
appears as follows:

rc3.c:23:10: warning: expected  to 
have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(3);
|  |^~
|  ||
|  |(1) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(1);
|  |~
|  ||
|  |(2) when ‘PyList_New’ succeeds
|..
|   14 |   PyList_Append(list, item);
|  |   ~
|  |   |
|  |   (3) when ‘PyList_Append’ succeeds, moving buffer
|..
|   23 |   return list;
|  |  
|  |  |
|  |  (4) here
|

This is a WIP in several ways:
- Enhancing the diagnostic for better clarity. For instance, users should
  expect to see the variable name 'item' instead of the placeholder in the
  diagnostic above.
- Currently, functions returning PyObject * are assumed to always produce
  a new reference.
- The validation of reference count is only for PyObjects created within a
  function body. Verifying reference counts for PyObjects passed as
  parameters is not supported in this patch.
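
As a rough illustration of what the checker is counting in the diagnostic
above, here is a toy model that does not use the real Python/C API. Obj,
new_obj, incref, decref, list_append and the two create_* functions are
hypothetical names; the only behaviour borrowed from CPython is that appending
to a list adds a reference to the item rather than taking over the caller's.

```cpp
#include <cassert>
#include <vector>

// Toy object with a reference count; a hypothetical stand-in for PyObject.
struct Obj {
  long refcnt = 1;           // a freshly created object has one owner
  std::vector<Obj *> items;  // used when the object acts as a list
};

Obj *new_obj() { return new Obj; }    // plays the role of PyLong_FromLong / PyList_New
void incref(Obj *o) { ++o->refcnt; }
void decref(Obj *o) { --o->refcnt; }  // deallocation is omitted in this toy model

// Like PyList_Append: the list takes its own reference to the item.
void list_append(Obj *list, Obj *item) {
  incref(item);
  list->items.push_back(item);
}

// Same shape as create_py_object in the diagnostic: the local reference
// to `item` is never released.
Obj *create_leaky(Obj **item_out) {
  Obj *item = new_obj();
  Obj *list = new_obj();
  list_append(list, item);
  *item_out = item;
  return list;  // item's count is 2, but only the list still refers to it
}

// Balanced variant: the local reference is dropped before returning.
Obj *create_balanced(Obj **item_out) {
  Obj *item = new_obj();
  Obj *list = new_obj();
  list_append(list, item);
  decref(item);
  *item_out = item;
  return list;  // item's count is 1, matching its one remaining owner
}
```

In the leaky variant the count stored on the item (2) exceeds the number of
owners still reachable when the frame is popped (1), which is the mismatch the
warning describes.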

gcc/analyzer/ChangeLog:
  PR analyzer/107646
* engine.cc (impl_region_model_context::warn): New optional parameter.
* exploded-graph.h (class impl_region_model_context): Likewise.
* region-model.cc (region_model::pop_frame): New callback feature for
  region_model::pop_frame.
* region-model.h (struct append_regions_cb_data): Likewise.
(class region_model): Likewise.
(class region_model_context): New optional parameter.
(class region_model_context_decorator): Likewise.

gcc/testsuite/ChangeLog:
  PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
  checking for PyObjects.
* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
  added more tests).
* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
  more tests).
* gcc.dg/plugin/plugin.exp: New tests.
* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

Signed-off-by: Eric Feng 

---
 gcc/analyzer/engine.cc|   8 +-
 gcc/analyzer/exploded-graph.h |   4 +-
 gcc/analyzer/region-model.cc  |   3 +
 gcc/analyzer/region-model.h   |  48 ++-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-
 c => cpython-plugin-test-PyList_Append.c} |  56 +--
 .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
 .../cpython-plugin-test-PyLong_FromLong.c |  38 ++
 ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
 .../cpython-plugin-test-refcnt-checking.c |  78 
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   5 +-
 11 files changed, 612 insertions(+), 42 deletions(-)
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => 
cpython-plugin-test-PyList_Append.c} (64%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => 
cpython-plugin-test-no-plugin.c} (100%)
 create mode 100644 
gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
in

Re: [PATCH] RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}

2023-08-28 Thread Lehua Ding

Here is the V3 patch, which addresses the comments. Thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628650.html

--
Best,
Lehua



Re: [PATCH V2] RISC-V: Refactor and clean expand_cond_len_{unop, binop, ternop}

2023-08-28 Thread Lehua Ding

This patch is invalid; please see V3. Sorry about that.

On 2023/8/29 11:43, Lehua Ding wrote:

V2 changes: Address the comments from Robin.

Hi,

This patch refactors the code of expand_cond_len_{unop,binop,ternop}.
It introduces a new unified function, expand_cond_len_op, that does the main
work. The expand_cond_len_{unop,binop,ternop} functions now only decide how
to pass the operands to the intrinsic patterns.

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust
* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
(get_vlmax_rtx): Exported.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
(emit_vlmax_masked_gather_mu_insn): Adjust.
(get_vlmax_rtx): New func.
(expand_load_store): Adjust.
(expand_cond_len_unop): Call expand_cond_len_op.
(expand_cond_len_op): New subroutine.
(expand_cond_len_binop): Call expand_cond_len_op.
(expand_cond_len_ternop): Call expand_cond_len_op.
(expand_lanes_load_store): Adjust.
---
  gcc/config/riscv/autovec.md |   6 +-
  gcc/config/riscv/riscv-protos.h |  16 ++-
  gcc/config/riscv/riscv-v.cc | 166 ++--
  3 files changed, 60 insertions(+), 128 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 20ab0693b98..7a6247d9d6b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -971,9 +971,9 @@
rtx mask = gen_reg_rtx (mask_mode);
riscv_vector::expand_vec_cmp (mask, LT, operands[1], zero);
  
-  rtx ops[] = {operands[0], mask, operands[1], operands[1]};
-  riscv_vector::emit_vlmax_masked_mu_insn (code_for_pred (NEG, mode),
-  riscv_vector::RVV_UNOP_MU, ops);
+  rtx ops[] = {operands[0], mask, operands[1], operands[1],
+   riscv_vector::get_vlmax_rtx (mode)};
+  riscv_vector::expand_cond_len_unop (NEG, ops);
DONE;
  })
  
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 0e0470280f8..4137bb14b80 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -181,25 +181,20 @@ namespace riscv_vector {
  #define RVV_VUNDEF(MODE)                                                       \
    gen_rtx_UNSPEC (MODE, gen_rtvec (1, gen_rtx_REG (SImode, X0_REGNUM)),       \
  UNSPEC_VUNDEF)
+
+/* The value means the number of operands for insn_expander.  */
  enum insn_type
  {
RVV_MISC_OP = 1,
RVV_UNOP = 2,
-  RVV_UNOP_M = RVV_UNOP + 2,
-  RVV_UNOP_MU = RVV_UNOP + 2,
-  RVV_UNOP_TU = RVV_UNOP + 2,
-  RVV_UNOP_TUMU = RVV_UNOP + 2,
+  RVV_UNOP_MASK = RVV_UNOP + 2,
RVV_BINOP = 3,
-  RVV_BINOP_MU = RVV_BINOP + 2,
-  RVV_BINOP_TU = RVV_BINOP + 2,
-  RVV_BINOP_TUMU = RVV_BINOP + 2,
+  RVV_BINOP_MASK = RVV_BINOP + 2,
RVV_MERGE_OP = 4,
RVV_CMP_OP = 4,
RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand.  */
RVV_TERNOP = 5,
-  RVV_TERNOP_MU = RVV_TERNOP + 1,
-  RVV_TERNOP_TU = RVV_TERNOP + 1,
-  RVV_TERNOP_TUMU = RVV_TERNOP + 1,
+  RVV_TERNOP_MASK = RVV_TERNOP + 1,
RVV_WIDEN_TERNOP = 4,
RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
RVV_SLIDE_OP = 4,  /* Dest, VUNDEF, source and offset.  */
@@ -260,6 +255,7 @@ void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
  void emit_scalar_move_insn (unsigned, rtx *, rtx = 0);
  void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
  enum vlmul_type get_vlmul (machine_mode);
+rtx get_vlmax_rtx (machine_mode);
  unsigned int get_ratio (machine_mode);
  unsigned int get_nf (machine_mode);
  machine_mode get_subpart_mode (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b783fb8ab00..5ba2f59ef07 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -761,28 +761,6 @@ emit_vlmax_fp_ternary_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
e.emit_insn ((enum insn_code) icode, ops);
  }
  
-/* This function emits a {NONVLMAX, TAIL_UNDISTURBED, MASK_ANY} vsetvli followed
- * by the ternary operation which always has a real merge operand.  */
-static void
-emit_nonvlmax_fp_ternary_tu_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
-{
-  machine_mode dest_mode = GET_MODE (ops[0]);
-  machine_mode mask_mode = get_mask_mode (dest_mode);
-  insn_expander e (/*OP_NUM*/ op_num,
- /*HAS_DEST_P*/ true,
- /*FULLY_UNMASKED_P*/ false,
- /*USE_REAL_MERGE_P*/ true,
- /*HAS_AVL_P*/ true,
- /*VLMAX_P*/ false,
- /*DEST_MODE*/ dest_mode,
- /*MASK_MODE*/ mask_mode);
-  e.set_policy (TAIL_UNDISTURBED);
-  e.set_policy (MASK_ANY);
-  e.set_rounding_mode (FRM_DYN);
-  e.set_vl (vl);
-  e.emit_insn (

[PATCH V3] RISC-V: Refactor and clean expand_cond_len_{unop, binop, ternop}

2023-08-28 Thread Lehua Ding
V3 changes: Address the comments from Robin.

Hi,

This patch refactors the code of expand_cond_len_{unop,binop,ternop}.
It introduces a new unified function, expand_cond_len_op, that does the
main work, so the expand_cond_len_{unop,binop,ternop} functions only need
to decide how to pass the operands to the intrinsic patterns.
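
To picture the shape of this refactoring, here is a minimal sketch in plain
C. It is not the code from the patch: operand_t, struct emitted_op and all
*_sketch functions are hypothetical stand-ins (the real code works on rtx
and emits insns), and the binop/ternop operand layouts are assumptions
extrapolated from the unop layout visible in the autovec.md hunk
(dest, mask, source, merge, len).

```c
#include <assert.h>

/* Hypothetical stand-in for an rtx operand.  */
typedef int operand_t;

/* Hypothetical record of what the shared helper was asked to emit.  */
struct emitted_op
{
  int n_sources;                    /* 1 = unary, 2 = binary, 3 = ternary.  */
  operand_t dest, mask, merge, len;
  operand_t src[3];
};

/* Shared helper: dest, mask, merge and length handling is common to all
   conditional-length operations; only the source list differs.  */
struct emitted_op
cond_len_op_sketch (operand_t dest, operand_t mask, const operand_t *src,
                    int n_sources, operand_t merge, operand_t len)
{
  assert (n_sources >= 1 && n_sources <= 3);
  struct emitted_op e = { n_sources, dest, mask, merge, len, { 0, 0, 0 } };
  for (int i = 0; i < n_sources; i++)
    e.src[i] = src[i];
  return e;
}

/* Thin wrappers: each one only knows where its operands sit in ops[].  */

/* Assumed layout: ops = { dest, mask, src, merge, len }.  */
struct emitted_op
cond_len_unop_sketch (const operand_t *ops)
{
  return cond_len_op_sketch (ops[0], ops[1], &ops[2], 1, ops[3], ops[4]);
}

/* Assumed layout: ops = { dest, mask, src1, src2, merge, len }.  */
struct emitted_op
cond_len_binop_sketch (const operand_t *ops)
{
  return cond_len_op_sketch (ops[0], ops[1], &ops[2], 2, ops[4], ops[5]);
}

/* Assumed layout: ops = { dest, mask, src1, src2, src3, merge, len }.  */
struct emitted_op
cond_len_ternop_sketch (const operand_t *ops)
{
  return cond_len_op_sketch (ops[0], ops[1], &ops[2], 3, ops[5], ops[6]);
}
```

The point of the split is that policy decisions live in one place, while
the wrappers stay trivial enough to check by eye.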

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust
* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
(get_vlmax_rtx): Exported.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
(emit_vlmax_masked_gather_mu_insn): Adjust.
(get_vlmax_rtx): New func.
(expand_load_store): Adjust.
(expand_cond_len_unop): Call expand_cond_len_op.
(expand_cond_len_op): New subroutine.
(expand_cond_len_binop): Call expand_cond_len_op.
(expand_cond_len_ternop): Call expand_cond_len_op.
(expand_lanes_load_store): Adjust.
---
 gcc/config/riscv/autovec.md |   6 +-
 gcc/config/riscv/riscv-protos.h |  16 ++--
 gcc/config/riscv/riscv-v.cc | 162 ++--
 3 files changed, 58 insertions(+), 126 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 20ab0693b98..7a6247d9d6b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -971,9 +971,9 @@
   rtx mask = gen_reg_rtx (mask_mode);
   riscv_vector::expand_vec_cmp (mask, LT, operands[1], zero);
 
-  rtx ops[] = {operands[0], mask, operands[1], operands[1]};
-  riscv_vector::emit_vlmax_masked_mu_insn (code_for_pred (NEG, mode),
-  riscv_vector::RVV_UNOP_MU, ops);
+  rtx ops[] = {operands[0], mask, operands[1], operands[1],
+   riscv_vector::get_vlmax_rtx (mode)};
+  riscv_vector::expand_cond_len_unop (NEG, ops);
   DONE;
 })
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 0e0470280f8..4137bb14b80 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -181,25 +181,20 @@ namespace riscv_vector {
 #define RVV_VUNDEF(MODE)                                                       \
   gen_rtx_UNSPEC (MODE, gen_rtvec (1, gen_rtx_REG (SImode, X0_REGNUM)),        \
  UNSPEC_VUNDEF)
+
+/* The value means the number of operands for insn_expander.  */
 enum insn_type
 {
   RVV_MISC_OP = 1,
   RVV_UNOP = 2,
-  RVV_UNOP_M = RVV_UNOP + 2,
-  RVV_UNOP_MU = RVV_UNOP + 2,
-  RVV_UNOP_TU = RVV_UNOP + 2,
-  RVV_UNOP_TUMU = RVV_UNOP + 2,
+  RVV_UNOP_MASK = RVV_UNOP + 2,
   RVV_BINOP = 3,
-  RVV_BINOP_MU = RVV_BINOP + 2,
-  RVV_BINOP_TU = RVV_BINOP + 2,
-  RVV_BINOP_TUMU = RVV_BINOP + 2,
+  RVV_BINOP_MASK = RVV_BINOP + 2,
   RVV_MERGE_OP = 4,
   RVV_CMP_OP = 4,
   RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand.  */
   RVV_TERNOP = 5,
-  RVV_TERNOP_MU = RVV_TERNOP + 1,
-  RVV_TERNOP_TU = RVV_TERNOP + 1,
-  RVV_TERNOP_TUMU = RVV_TERNOP + 1,
+  RVV_TERNOP_MASK = RVV_TERNOP + 1,
   RVV_WIDEN_TERNOP = 4,
   RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
   RVV_SLIDE_OP = 4,  /* Dest, VUNDEF, source and offset.  */
@@ -260,6 +255,7 @@ void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
 void emit_scalar_move_insn (unsigned, rtx *, rtx = 0);
 void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
 enum vlmul_type get_vlmul (machine_mode);
+rtx get_vlmax_rtx (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
 machine_mode get_subpart_mode (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b783fb8ab00..bf247788659 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -761,28 +761,6 @@ emit_vlmax_fp_ternary_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
   e.emit_insn ((enum insn_code) icode, ops);
 }
 
-/* This function emits a {NONVLMAX, TAIL_UNDISTURBED, MASK_ANY} vsetvli followed
- * by the ternary operation which always has a real merge operand.  */
-static void
-emit_nonvlmax_fp_ternary_tu_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
-{
-  machine_mode dest_mode = GET_MODE (ops[0]);
-  machine_mode mask_mode = get_mask_mode (dest_mode);
-  insn_expander e (/*OP_NUM*/ op_num,
- /*HAS_DEST_P*/ true,
- /*FULLY_UNMASKED_P*/ false,
- /*USE_REAL_MERGE_P*/ true,
- /*HAS_AVL_P*/ true,
- /*VLMAX_P*/ false,
- /*DEST_MODE*/ dest_mode,
- /*MASK_MODE*/ mask_mode);
-  e.set_policy (TAIL_UNDISTURBED);
-  e.set_policy (MASK_ANY);
-  e.set_rounding_mode (FRM_DYN);
-  e.set_vl (vl);
-  e.emit_insn ((enum insn_code) icode, ops);
-}
-
 /* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
  * actual

[PATCH V2] RISC-V: Refactor and clean expand_cond_len_{unop, binop, ternop}

2023-08-28 Thread Lehua Ding
V2 changes: Address the comments from Robin.

Hi,

This patch refactors the code of expand_cond_len_{unop,binop,ternop}.
It introduces a new unified function, expand_cond_len_op, that does the
main work, so the expand_cond_len_{unop,binop,ternop} functions only need
to decide how to pass the operands to the intrinsic patterns.

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust
* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
(get_vlmax_rtx): Exported.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
(emit_vlmax_masked_gather_mu_insn): Adjust.
(get_vlmax_rtx): New func.
(expand_load_store): Adjust.
(expand_cond_len_unop): Call expand_cond_len_op.
(expand_cond_len_op): New subroutine.
(expand_cond_len_binop): Call expand_cond_len_op.
(expand_cond_len_ternop): Call expand_cond_len_op.
(expand_lanes_load_store): Adjust.
---
 gcc/config/riscv/autovec.md |   6 +-
 gcc/config/riscv/riscv-protos.h |  16 ++-
 gcc/config/riscv/riscv-v.cc | 166 ++--
 3 files changed, 60 insertions(+), 128 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 20ab0693b98..7a6247d9d6b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -971,9 +971,9 @@
   rtx mask = gen_reg_rtx (mask_mode);
   riscv_vector::expand_vec_cmp (mask, LT, operands[1], zero);
 
-  rtx ops[] = {operands[0], mask, operands[1], operands[1]};
-  riscv_vector::emit_vlmax_masked_mu_insn (code_for_pred (NEG, mode),
-  riscv_vector::RVV_UNOP_MU, ops);
+  rtx ops[] = {operands[0], mask, operands[1], operands[1],
+   riscv_vector::get_vlmax_rtx (mode)};
+  riscv_vector::expand_cond_len_unop (NEG, ops);
   DONE;
 })
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 0e0470280f8..4137bb14b80 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -181,25 +181,20 @@ namespace riscv_vector {
 #define RVV_VUNDEF(MODE)                                                       \
   gen_rtx_UNSPEC (MODE, gen_rtvec (1, gen_rtx_REG (SImode, X0_REGNUM)),        \
  UNSPEC_VUNDEF)
+
+/* The value means the number of operands for insn_expander.  */
 enum insn_type
 {
   RVV_MISC_OP = 1,
   RVV_UNOP = 2,
-  RVV_UNOP_M = RVV_UNOP + 2,
-  RVV_UNOP_MU = RVV_UNOP + 2,
-  RVV_UNOP_TU = RVV_UNOP + 2,
-  RVV_UNOP_TUMU = RVV_UNOP + 2,
+  RVV_UNOP_MASK = RVV_UNOP + 2,
   RVV_BINOP = 3,
-  RVV_BINOP_MU = RVV_BINOP + 2,
-  RVV_BINOP_TU = RVV_BINOP + 2,
-  RVV_BINOP_TUMU = RVV_BINOP + 2,
+  RVV_BINOP_MASK = RVV_BINOP + 2,
   RVV_MERGE_OP = 4,
   RVV_CMP_OP = 4,
   RVV_CMP_MU_OP = RVV_CMP_OP + 2, /* +2 means mask and maskoff operand.  */
   RVV_TERNOP = 5,
-  RVV_TERNOP_MU = RVV_TERNOP + 1,
-  RVV_TERNOP_TU = RVV_TERNOP + 1,
-  RVV_TERNOP_TUMU = RVV_TERNOP + 1,
+  RVV_TERNOP_MASK = RVV_TERNOP + 1,
   RVV_WIDEN_TERNOP = 4,
   RVV_SCALAR_MOV_OP = 4, /* +1 for VUNDEF according to vector.md.  */
   RVV_SLIDE_OP = 4,  /* Dest, VUNDEF, source and offset.  */
@@ -260,6 +255,7 @@ void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
 void emit_scalar_move_insn (unsigned, rtx *, rtx = 0);
 void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
 enum vlmul_type get_vlmul (machine_mode);
+rtx get_vlmax_rtx (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
 machine_mode get_subpart_mode (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b783fb8ab00..5ba2f59ef07 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -761,28 +761,6 @@ emit_vlmax_fp_ternary_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
   e.emit_insn ((enum insn_code) icode, ops);
 }
 
-/* This function emits a {NONVLMAX, TAIL_UNDISTURBED, MASK_ANY} vsetvli followed
- * by the ternary operation which always has a real merge operand.  */
-static void
-emit_nonvlmax_fp_ternary_tu_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
-{
-  machine_mode dest_mode = GET_MODE (ops[0]);
-  machine_mode mask_mode = get_mask_mode (dest_mode);
-  insn_expander e (/*OP_NUM*/ op_num,
- /*HAS_DEST_P*/ true,
- /*FULLY_UNMASKED_P*/ false,
- /*USE_REAL_MERGE_P*/ true,
- /*HAS_AVL_P*/ true,
- /*VLMAX_P*/ false,
- /*DEST_MODE*/ dest_mode,
- /*MASK_MODE*/ mask_mode);
-  e.set_policy (TAIL_UNDISTURBED);
-  e.set_policy (MASK_ANY);
-  e.set_rounding_mode (FRM_DYN);
-  e.set_vl (vl);
-  e.emit_insn ((enum insn_code) icode, ops);
-}
-
 /* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
  * actual 

[PATCH 0/1] RISC-V: Imply 'Zicsr' from 'Zcmt'

2023-08-28 Thread Tsukasa OI via Gcc-patches
This is a subset of my patch set
"RISC-V: Add stub support for existing extensions"

for faster review.

Since 'Zcmt' requires 'Zicsr' (and, unlike the other changes in the patch
set above, the missing implication is a bug), this small patch has been
split out.

Thanks,
Tsukasa




Tsukasa OI (1):
  RISC-V: Imply 'Zicsr' from 'Zcmt'

 gcc/common/config/riscv/riscv-common.cc | 1 +
 1 file changed, 1 insertion(+)


base-commit: 818cc9f2d2f3dbbd4004ff85d3125d92d1e430c9
-- 
2.42.0



[PATCH 1/1] RISC-V: Imply 'Zicsr' from 'Zcmt'

2023-08-28 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

As the specification states, the 'Zcmt' extension depends on the 'Zca' and
'Zicsr' extensions.  This commit reflects this implication.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implication from 'Zcmt' to 'Zicsr'.
---
 gcc/common/config/riscv/riscv-common.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index a5b62cda3a09..1315c8a745ec 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -142,6 +142,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zcb",  "zca"},
   {"zcmp", "zca"},
   {"zcmt", "zca"},
+  {"zcmt", "zicsr"},
 
   {NULL, NULL}
 };
-- 
2.42.0
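
As an illustration of what an implication table like riscv_implied_info
means in practice, the following is a minimal sketch in C. It is not GCC's
actual implementation: struct implied_pair, implied_sketch, contains and
expand_implied_sketch are hypothetical names, and the table is only a small
excerpt modelled on the entries visible in the hunk above. It expands a set
of extension names with everything they imply, following chains
transitively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair: enabling EXT also enables IMPLIED.  */
struct implied_pair
{
  const char *ext;
  const char *implied;
};

/* Excerpt modelled on the table in the patch; NULL-terminated.  */
static const struct implied_pair implied_sketch[] = {
  {"zcb",  "zca"},
  {"zcmp", "zca"},
  {"zcmt", "zca"},
  {"zcmt", "zicsr"},
  {NULL, NULL}
};

/* Return nonzero if NAME is already among the first N entries of SET.  */
static int
contains (const char **set, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (set[i], name) == 0)
      return 1;
  return 0;
}

/* Expand SET (N entries, capacity CAP) in place with all implied
   extensions and return the new count.  Names appended during the walk
   are visited by the outer loop as well, so implications of implications
   are also picked up.  */
size_t
expand_implied_sketch (const char **set, size_t n, size_t cap)
{
  for (size_t i = 0; i < n; i++)
    for (const struct implied_pair *p = implied_sketch; p->ext; p++)
      if (strcmp (set[i], p->ext) == 0 && !contains (set, n, p->implied))
        {
          assert (n < cap);
          set[n++] = p->implied;
        }
  return n;
}
```

With this table, starting from just "zcmt" yields "zcmt", "zca" and
"zicsr"; without the added row, "zicsr" would be missing.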



[PATCH v3 3/3] RISC-V: Add stub support for existing extensions (unprivileged)

2023-08-28 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not make any use of them.

This commit adds stub support for standard unprivileged extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c, except for the
not-yet-merged 'Zce', 'Zcmp' and 'Zcmt' support).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.
---
 gcc/common/config/riscv/riscv-common.cc|  1 +
 gcc/testsuite/gcc.target/riscv/predef-31.c | 31 ++
 2 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-31.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 8e2b3ba6d621..f142212f2edc 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -142,6 +142,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zcb",  "zca"},
   {"zcmp", "zca"},
   {"zcmt", "zca"},
+  {"zcmt", "zicsr"},
 
   {"smaia", "ssaia"},
   {"smstateen", "ssstateen"},
diff --git a/gcc/testsuite/gcc.target/riscv/predef-31.c b/gcc/testsuite/gcc.target/riscv/predef-31.c
new file mode 100644
index ..4ea11442f995
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-31.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_zcb -mabi=lp64 -mcmodel=medlow -misa-spec=20191213" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 64
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i) || (__riscv_i != (2 * 1000 * 1000 + 1 * 1000))
+#error "__riscv_i"
+#endif
+
+#if defined(__riscv_e)
+#error "__riscv_e"
+#endif
+
+#if !defined(__riscv_zca)
+#error "__riscv_zca"
+#endif
+
+#if !defined(__riscv_zcb)
+#error "__riscv_zcb"
+#endif
+
+  return 0;
+}
-- 
2.42.0



[PATCH v3 2/3] RISC-V: Add stub support for existing extensions (vendor)

2023-08-28 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not make any use of them.

This commit adds stub support for vendor extensions to
riscv_ext_version_table (there are no riscv_implied_info entries to add;
all information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add stub support for all vendor extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-30.c: New test for a stub
vendor extension 'XVentanaCondOps'.
---
 gcc/common/config/riscv/riscv-common.cc|  2 ++
 gcc/testsuite/gcc.target/riscv/predef-30.c | 27 ++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-30.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 3502993026d6..8e2b3ba6d621 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -322,6 +322,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
diff --git a/gcc/testsuite/gcc.target/riscv/predef-30.c b/gcc/testsuite/gcc.target/riscv/predef-30.c
new file mode 100644
index ..9784b9ce5033
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-30.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_xventanacondops -mabi=lp64 -mcmodel=medlow -misa-spec=20191213" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 64
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i) || (__riscv_i != (2 * 1000 * 1000 + 1 * 1000))
+#error "__riscv_i"
+#endif
+
+#if defined(__riscv_e)
+#error "__riscv_e"
+#endif
+
+#if !defined(__riscv_xventanacondops)
+#error "__riscv_xventanacondops"
+#endif
+
+  return 0;
+}
-- 
2.42.0



[PATCH v3 1/3] RISC-V: Add stub support for existing extensions (privileged)

2023-08-28 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not make any use of them.

As a start, this commit adds stub support for *privileged* extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from privileged extensions.
(riscv_ext_version_table): Add stub support for all privileged
extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-29.c: New test for a stub privileged
extension 'Smstateen' with some implications.
---
 gcc/common/config/riscv/riscv-common.cc| 18 +++
 gcc/testsuite/gcc.target/riscv/predef-29.c | 35 ++
 2 files changed, 53 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-29.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index a5b62cda3a09..3502993026d6 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -143,6 +143,14 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zcmp", "zca"},
   {"zcmt", "zca"},
 
+  {"smaia", "ssaia"},
+  {"smstateen", "ssstateen"},
+  {"smepmp", "zicsr"},
+  {"ssaia", "zicsr"},
+  {"sscofpmf", "zicsr"},
+  {"ssstateen", "zicsr"},
+  {"sstc", "zicsr"},
+
   {NULL, NULL}
 };
 
@@ -288,8 +296,18 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zcmp", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zcmt", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"smaia", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"smepmp",ISA_SPEC_CLASS_NONE, 1, 0},
+  {"smstateen", ISA_SPEC_CLASS_NONE, 1, 0},
+
+  {"ssaia", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"sscofpmf",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"ssstateen", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"sstc",  ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"svpbmt",  ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xtheadba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadbb", ISA_SPEC_CLASS_NONE, 1, 0},
diff --git a/gcc/testsuite/gcc.target/riscv/predef-29.c b/gcc/testsuite/gcc.target/riscv/predef-29.c
new file mode 100644
index ..61c6429be558
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-29.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_smstateen -mabi=lp64 -mcmodel=medlow -misa-spec=20191213" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 64
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i) || (__riscv_i != (2 * 1000 * 1000 + 1 * 1000))
+#error "__riscv_i"
+#endif
+
+#if defined(__riscv_e)
+#error "__riscv_e"
+#endif
+
+#if !defined(__riscv_zicsr)
+#error "__riscv_zicsr"
+#endif
+
+#if !defined(__riscv_smstateen)
+#error "__riscv_smstateen"
+#endif
+
+#if !defined(__riscv_ssstateen)
+#error "__riscv_ssstateen"
+#endif
+
+  return 0;
+}
-- 
2.42.0
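
A note on the `__riscv_i != (2 * 1000 * 1000 + 1 * 1000)` check in the test
above: as far as I understand the RISC-V C API convention, the value of an
architecture test macro encodes the extension version as
major * 1,000,000 + minor * 1,000 + revision, so the expression checks for
'I' version 2.1. The sketch below only illustrates that arithmetic;
riscv_arch_test_value_sketch is a hypothetical helper, not something GCC or
the test suite provides.

```c
#include <assert.h>

/* Hypothetical helper: encode an extension version the way the
   architecture test macros are assumed to encode it, i.e.
   major * 1,000,000 + minor * 1,000 + revision.  */
long
riscv_arch_test_value_sketch (int major, int minor, int revision)
{
  /* Minor and revision must each fit in three decimal digits.  */
  assert (major >= 0);
  assert (minor >= 0 && minor < 1000);
  assert (revision >= 0 && revision < 1000);
  return major * 1000000L + minor * 1000L + revision;
}
```

Under that assumption, riscv_arch_test_value_sketch (2, 1, 0) gives 2001000,
matching the test, and a stub entry registered as version 1.0 in
riscv_ext_version_table would presumably yield 1000000.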



[PATCH v3 0/3] RISC-V: Add stub support for existing extensions

2023-08-28 Thread Tsukasa OI via Gcc-patches
PATCH v1:

PATCH v2:



Changes: v1 -> v2 (only in PATCH 3/3)
==

Removed: 'Zvkn' -> 'Zvknha' implication (not to cause test failure)
Added:   'Zfa' -> 'F' implication (I simply forgot to add it in PATCH v1)


Changes: v2 -> v3 (only in PATCH 3/3)
==

Changed: 'Zcmt' -> 'Zcicsr' to
 'Zcmt' ->  'Zicsr' (fix typo)
Rebased against commit 17c22f466162
("RISC-V: Minimal support for ZC* extensions.")
and commit 30699b999e94
("[PATCH v10] RISC-V: Add support for the Zfa extension").
Slightly modified the commit message to reflect the background.

As a result of the rebase, PATCH 3/3 became nearly empty (except for a test case).


Thanks,
Tsukasa




Tsukasa OI (3):
  RISC-V: Add stub support for existing extensions (privileged)
  RISC-V: Add stub support for existing extensions (vendor)
  RISC-V: Add stub support for existing extensions (unprivileged)

 gcc/common/config/riscv/riscv-common.cc| 21 +
 gcc/testsuite/gcc.target/riscv/predef-29.c | 35 ++
 gcc/testsuite/gcc.target/riscv/predef-30.c | 27 +
 gcc/testsuite/gcc.target/riscv/predef-31.c | 31 +++
 4 files changed, 114 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-29.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-30.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-31.c


base-commit: 818cc9f2d2f3dbbd4004ff85d3125d92d1e430c9
-- 
2.42.0



[PATCH] RISC-V: Make arch-24.c to test "success" case

2023-08-28 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

arch-24.c and arch-25.c are exactly the same and redundant.  The author
suspects that the original author intended to test two base ISAs (RV32I and
RV64I) so this commit changes arch-24.c to test that RV32I+Zcf does not
cause any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-24.c: Test RV32I+Zcf instead.
---
 gcc/testsuite/gcc.target/riscv/arch-24.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/arch-24.c b/gcc/testsuite/gcc.target/riscv/arch-24.c
index 3be4ade65a77..af15c3234b5e 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-24.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-24.c
@@ -1,5 +1,3 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64i_zcf -mabi=lp64" } */
+/* { dg-options "-march=rv32i_zcf -mabi=ilp32" } */
 int foo() {}
-/* { dg-error "'-march=rv64i_zcf': zcf extension supports in rv32 only" "" { target *-*-* } 0 } */
-/* { dg-error "'-march=rv64i_zca_zcf': zcf extension supports in rv32 only" "" { target *-*-* } 0 } */

base-commit: 818cc9f2d2f3dbbd4004ff85d3125d92d1e430c9
-- 
2.42.0



[PATCH v2] RISC-V: Make PR 102957 tests more comprehensive

2023-08-28 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" the test failures caused by that change (in pr102957.c,
by testing for the error message introduced by the first change).

However, the latter change partially breaks the original intent of the PR
102957 test case, because we wanted to make sure that we can parse a valid
two-letter extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit adds pr102957-2.c to make sure that there are no errors when
we parse a valid two-letter extension name.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957-2.c: New test case using the 'Zk'
extension to continue testing whether we can use valid two-letter
extensions.
---
 gcc/testsuite/gcc.target/riscv/pr102957-2.c | 5 +
 1 file changed, 5 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr102957-2.c

diff --git a/gcc/testsuite/gcc.target/riscv/pr102957-2.c b/gcc/testsuite/gcc.target/riscv/pr102957-2.c
new file mode 100644
index ..fe6241466354
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr102957-2.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gzk -mabi=lp64" } */
+int foo()
+{
+}

base-commit: 818cc9f2d2f3dbbd4004ff85d3125d92d1e430c9
-- 
2.42.0



[PATCH] RISC-V: Fix ASM check of vlmax_switch_vtype-16.c

2023-08-28 Thread Juzhe-Zhong
There is a failure:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c   -O2   scan-assembler-times vsetvli\\s+zero,\\s*zero 2

Change the expected count from "2" to "3"; the generated assembly is correct
and better.

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Fix ASM check.

---
 .../gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c
index 24c3dc53764..a1587e7e20f 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c
@@ -52,7 +52,7 @@ void f (void * restrict in, void * restrict out, int32_t * a, int32_t * b, int n
 }
 
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero} 2 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero} 3 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-flto" no-opts "-g" } } } } */
-- 
2.36.3



Re: [PATCH v2 3/3] RISC-V: Add stub support for existing extensions (unprivileged)

2023-08-28 Thread Tsukasa OI via Gcc-patches
On 2023/08/29 10:42, Jeff Law wrote:
> 
> 
> On 8/14/23 00:09, Tsukasa OI wrote:
>> From: Tsukasa OI 
>>
>> After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
>> extensions") changed how do we handle unknown extensions, we have no
>> guarantee that we can share the same architectural string with Binutils
>> (specifically, the assembler).
>>
>> To avoid compilation errors on shared Assembler-C/C++ projects, GCC
>> should
>> support almost all extensions that Binutils support, even if the GCC does
>> not touch a thing.
>>
>> This commit adds stub supported standard unprivileged extensions to
>> riscv_ext_version_table and its implications to riscv_implied_info
>> (all information is copied from Binutils' bfd/elfxx-riscv.c except not
>> yet
>> merged 'Zce', 'Zcmp' and 'Zcmt' support).
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc
>> (riscv_implied_info): Add implications from unprivileged extensions.
>> (riscv_ext_version_table): Add stub support for all unprivileged
>> extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/predef-31.c: New test for a stub unprivileged
>> extension 'Zcb' with some implications.
> This series (most likely patch 3/3) seems to break arch-24.c and arch-25.c.
> 
> Please fix and post a V3.
> 
> Jeff
> 

I think it was a hidden merge failure with partial Zc* extensions
support by Jiawei (and I already fixed it in the internal version).
I'll re-review it and submit as v3 if it's okay.

I don't recall the exact test cases that failed (when I tested), but looking
at arch-24.c and arch-25.c, which you pointed out, they have a minor issue
(independent of this patch set).  I'll submit a minor fix for those
files later.

Thanks,
Tsukasa


Re: [PATCH] RISC-V: Fix AVL/VL get ICE[VSETVL PASS]

2023-08-28 Thread Lehua Ding

Committed, thanks Kito.

On 2023/8/29 10:46, Kito Cheng via Gcc-patches wrote:

Assuming that prev is a vsetvli instruction is a rather strong assumption,
but it is guarded with gcc_assert, so this looks like a reasonable fix to me.
LGTM :)

On Tue, Aug 29, 2023 at 10:37 AM Juzhe-Zhong  wrote:


Fix a bunch of ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (internal compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (test for excess errors)

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc (vector_insn_info::get_avl_or_vl_reg): 
New function.
 (pass_vsetvl::compute_local_properties): Fix bug.
 (pass_vsetvl::commit_vsetvls): Ditto.
 * config/riscv/riscv-vsetvl.h: New function.

---
  gcc/config/riscv/riscv-vsetvl.cc | 46 +---
  gcc/config/riscv/riscv-vsetvl.h  |  1 +
  2 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index f7ae6c16bee..73d672b083b 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2290,6 +2290,32 @@ vector_insn_info::global_merge (const vector_insn_info &merge_info,
return new_info;
  }

+/* Wrapper helps to return the AVL or VL operand for the
+   vector_insn_info. Return AVL if the AVL is not VLMAX.
+   Otherwise, return the VL operand.  */
+rtx
+vector_insn_info::get_avl_or_vl_reg (void) const
+{
+  gcc_assert (has_avl_reg ());
+  if (!vlmax_avl_p (get_avl ()))
+return get_avl ();
+
+  if (has_vl_op (get_insn ()->rtl ()) || vsetvl_insn_p (get_insn ()->rtl ()))
+return ::get_vl (get_insn ()->rtl ());
+
+  if (get_avl_source ())
+return get_avl_reg_rtx ();
+
+  /* A DIRTY (polluted EMPTY) block if:
+   - get_insn is scalar move (no AVL or VL operand).
+   - get_avl_source is null (no def in the current DIRTY block).
+ Then we trace the previous insn which must be the insn
+ already inserted in Phase 2 to get the VL operand for VLMAX.  */
+  rtx_insn *prev_rinsn = PREV_INSN (get_insn ()->rtl ());
+  gcc_assert (prev_rinsn && vsetvl_insn_p (prev_rinsn));
+  return ::get_vl (prev_rinsn);
+}
+
  bool
  vector_insn_info::update_fault_first_load_avl (insn_info *insn)
  {
@@ -3166,19 +3192,17 @@ pass_vsetvl::compute_local_properties (void)
 bitmap_clear_bit (m_vector_manager->vector_transp[curr_bb_idx], i);
   else if (expr->has_avl_reg ())
 {
- rtx avl = vlmax_avl_p (expr->get_avl ())
- ? get_vl (expr->get_insn ()->rtl ())
- : expr->get_avl ();
+ rtx reg = expr->get_avl_or_vl_reg ();
   for (const insn_info *insn : bb->real_nondebug_insns ())
 {
- if (find_access (insn->defs (), REGNO (avl)))
+ if (find_access (insn->defs (), REGNO (reg)))
 {
   bitmap_clear_bit (
 m_vector_manager->vector_transp[curr_bb_idx], i);
   break;
 }
   else if (vlmax_avl_p (expr->get_avl ())
-  && find_access (insn->uses (), REGNO (avl)))
+  && find_access (insn->uses (), REGNO (reg)))
 {
   bitmap_clear_bit (
 m_vector_manager->vector_transp[curr_bb_idx], i);
@@ -3649,17 +3673,7 @@ pass_vsetvl::commit_vsetvls (void)
   = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, reaching_out, NULL_RTX);
else if (vlmax_avl_p (reaching_out.get_avl ()))
 {
- rtx vl = NULL_RTX;
- /* For user VSETVL VL, AVL. We need to use VL operand here, so we
-don't directly use get_avl_reg_rtx (). Instead, we use the VL
-of the INSN->RTL ().  */
- if (!reaching_out.get_avl_source ())
-   {
- gcc_assert (vsetvl_insn_p (reaching_out.get_insn ()->rtl ()));
- vl = get_vl (reaching_out.get_insn ()->rtl ());
-   }
- else
-   vl = reaching_out.get_avl_reg_rtx ();
+ rtx vl = reaching_out.get_avl_or_vl_reg ();
   new_pat = gen_vsetvl_pat (VSETVL_NORMAL, reaching_out, vl);
 }
else
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 4b5825d7f6b..2a315e45f31 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++

[PATCH, rs6000] Call vector load/store with length expand only on 64-bit Power10 [PR96762]

2023-08-28 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch adds a "TARGET_64BIT" check when calling the vector load/store
with length expander in expand_block_move. It matches the expand condition
of "lxvl" and "stxvl" defined in vsx.md.

  This patch fixes the ICE that occurred with the test case on 32-bit Power10.
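
As a rough illustration of the guard after this change, here is a minimal
Python model of the condition. The function and parameter names are made up
for this sketch; only the shape of the boolean test follows the
expand_block_move hunk in the patch below.

```python
def use_lxvl_stxvl(*, unaligned_vsx, power10, is_64bit, bytes_left,
                   orig_bytes, align_bits, strict_alignment):
    """Model of the guard for the load/store-with-length path.

    All names are illustrative; they stand in for
    TARGET_BLOCK_OPS_UNALIGNED_VSX, TARGET_POWER10, TARGET_64BIT,
    bytes, orig_bytes, align and STRICT_ALIGNMENT in the patch.
    """
    return bool(
        unaligned_vsx
        and power10 and is_64bit          # is_64bit models the new check
        and bytes_left < 16 and orig_bytes > 16
        and bytes_left not in (1, 2, 4, 8)
        and (align_bits >= 128 or not strict_alignment)
    )


# Same block move, once on a 64-bit target and once on a 32-bit one.
common = dict(unaligned_vsx=True, power10=True, bytes_left=3,
              orig_bytes=19, align_bits=8, strict_alignment=False)
print(use_lxvl_stxvl(is_64bit=True, **common))
print(use_lxvl_stxvl(is_64bit=False, **common))
```

In this model the first call takes the lxvl/stxvl path and the second does
not, which is the behavior the new test case relies on for ilp32.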

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen


ChangeLog
rs6000: call vector load/store with length expand only on 64-bit Power10

gcc/
PR target/96762
* config/rs6000/rs6000-string.cc (expand_block_move): Call vector
load/store with length expand only on 64-bit Power10.

gcc/testsuite/
PR target/96762
* gcc.target/powerpc/pr96762.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index cd8ee8c..d1b48c2 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -2811,8 +2811,9 @@ expand_block_move (rtx operands[], bool might_overlap)
  gen_func.mov = gen_vsx_movv2di_64bit;
}
   else if (TARGET_BLOCK_OPS_UNALIGNED_VSX
-  && TARGET_POWER10 && bytes < 16
-  && orig_bytes > 16
+  /* Only use lxvl/stxvl on 64bit POWER10.  */
+  && TARGET_POWER10 && TARGET_64BIT
+  && bytes < 16 && orig_bytes > 16
   && !(bytes == 1 || bytes == 2
|| bytes == 4 || bytes == 8)
   && (align >= 128 || !STRICT_ALIGNMENT))
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96762.c 
b/gcc/testsuite/gcc.target/powerpc/pr96762.c
new file mode 100644
index 000..1145dd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96762.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target ilp32 } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+extern void foo (char *);
+
+void
+bar (void)
+{
+  char zj[] = "";
+  foo (zj);
+}


Re: [PATCH] RISC-V: Fix AVL/VL get ICE[VSETVL PASS]

2023-08-28 Thread Kito Cheng via Gcc-patches
Assuming that the previous insn is a vsetvli instruction is a fairly strong
assumption, but it is guarded by gcc_assert, so this looks like a reasonable
fix to me. LGTM :)
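
For readers following along, the lookup order in get_avl_or_vl_reg can be
sketched as a small Python model. The classes and field names below are toy
stand-ins invented for this sketch (they are not GCC data structures); only
the order of the four cases and the final assertion follow the quoted patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Insn:
    """Toy stand-in for an RTL insn; every field is illustrative."""
    vl_reg: Optional[str] = None     # VL operand, if the insn has one
    is_vsetvl: bool = False
    prev: Optional["Insn"] = None    # previous insn in the stream


@dataclass
class Info:
    """Toy stand-in for vector_insn_info."""
    insn: Insn
    avl: str                         # a register name, or "VLMAX"
    avl_source_reg: Optional[str] = None


def get_avl_or_vl_reg(info: Info) -> Optional[str]:
    # 1. A non-VLMAX AVL is returned as is.
    if info.avl != "VLMAX":
        return info.avl
    # 2. The insn itself has a VL operand (or is a vsetvl).
    if info.insn.vl_reg is not None or info.insn.is_vsetvl:
        return info.insn.vl_reg
    # 3. The AVL has a known source definition.
    if info.avl_source_reg is not None:
        return info.avl_source_reg
    # 4. Fall back to the previous insn, which must be a vsetvl.
    prev = info.insn.prev
    assert prev is not None and prev.is_vsetvl, "expected a preceding vsetvl"
    return prev.vl_reg


vsetvl = Insn(vl_reg="t1", is_vsetvl=True)
scalar_move = Insn(prev=vsetvl)
print(get_avl_or_vl_reg(Info(scalar_move, avl="VLMAX")))
```

The last case is the one discussed above: if the preceding insn were not a
vsetvl, the model raises an AssertionError instead of returning a bogus
register, which mirrors what the gcc_assert is there for.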

On Tue, Aug 29, 2023 at 10:37 AM Juzhe-Zhong  wrote:
>
> Fix bunch of ICE in "vect" testsuite:
> FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: 
> Segmentation fault)
> FAIL: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
> FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (internal 
> compiler error: Segmentation fault)
> FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (test for 
> excess errors)
> FAIL: gcc.dg/vect/vect-alias-check-20.c (internal compiler error: 
> Segmentation fault)
> FAIL: gcc.dg/vect/vect-alias-check-20.c (test for excess errors)
> FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (internal 
> compiler error: Segmentation fault)
> FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (test for 
> excess errors)
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (vector_insn_info::get_avl_or_vl_reg): 
> New function.
> (pass_vsetvl::compute_local_properties): Fix bug.
> (pass_vsetvl::commit_vsetvls): Ditto.
> * config/riscv/riscv-vsetvl.h: New function.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 46 +---
>  gcc/config/riscv/riscv-vsetvl.h  |  1 +
>  2 files changed, 31 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index f7ae6c16bee..73d672b083b 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -2290,6 +2290,32 @@ vector_insn_info::global_merge (const vector_insn_info 
> &merge_info,
>return new_info;
>  }
>
> +/* Wrapper helps to return the AVL or VL operand for the
> +   vector_insn_info. Return AVL if the AVL is not VLMAX.
> +   Otherwise, return the VL operand.  */
> +rtx
> +vector_insn_info::get_avl_or_vl_reg (void) const
> +{
> +  gcc_assert (has_avl_reg ());
> +  if (!vlmax_avl_p (get_avl ()))
> +return get_avl ();
> +
> +  if (has_vl_op (get_insn ()->rtl ()) || vsetvl_insn_p (get_insn ()->rtl ()))
> +return ::get_vl (get_insn ()->rtl ());
> +
> +  if (get_avl_source ())
> +return get_avl_reg_rtx ();
> +
> +  /* A DIRTY (polluted EMPTY) block if:
> +   - get_insn is scalar move (no AVL or VL operand).
> +   - get_avl_source is null (no def in the current DIRTY block).
> + Then we trace the previous insn which must be the insn
> + already inserted in Phase 2 to get the VL operand for VLMAX.  */
> +  rtx_insn *prev_rinsn = PREV_INSN (get_insn ()->rtl ());
> +  gcc_assert (prev_rinsn && vsetvl_insn_p (prev_rinsn));
> +  return ::get_vl (prev_rinsn);
> +}
> +
>  bool
>  vector_insn_info::update_fault_first_load_avl (insn_info *insn)
>  {
> @@ -3166,19 +3192,17 @@ pass_vsetvl::compute_local_properties (void)
> bitmap_clear_bit (m_vector_manager->vector_transp[curr_bb_idx], 
> i);
>   else if (expr->has_avl_reg ())
> {
> - rtx avl = vlmax_avl_p (expr->get_avl ())
> - ? get_vl (expr->get_insn ()->rtl ())
> - : expr->get_avl ();
> + rtx reg = expr->get_avl_or_vl_reg ();
>   for (const insn_info *insn : bb->real_nondebug_insns ())
> {
> - if (find_access (insn->defs (), REGNO (avl)))
> + if (find_access (insn->defs (), REGNO (reg)))
> {
>   bitmap_clear_bit (
> m_vector_manager->vector_transp[curr_bb_idx], i);
>   break;
> }
>   else if (vlmax_avl_p (expr->get_avl ())
> -  && find_access (insn->uses (), REGNO (avl)))
> +  && find_access (insn->uses (), REGNO (reg)))
> {
>   bitmap_clear_bit (
> m_vector_manager->vector_transp[curr_bb_idx], i);
> @@ -3649,17 +3673,7 @@ pass_vsetvl::commit_vsetvls (void)
>   = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, reaching_out, NULL_RTX);
>else if (vlmax_avl_p (reaching_out.get_avl ()))
> {
> - rtx vl = NULL_RTX;
> - /* For user VSETVL VL, AVL. We need to use VL operand here, so we
> -don't directly use get_avl_reg_rtx (). Instead, we use the VL
> -of the INSN->RTL ().  */
> - if (!reaching_out.get_avl_source ())
> -   {
> - gcc_assert (vsetvl_insn_p (reaching_out.get_insn ()->rtl ()));
> - vl = get_vl (reaching_out.get_insn ()->rtl ());
> -   }
> - else
> -   vl = reaching_out.get_avl_reg_rtx ();
> + rtx vl = reaching_out.get_avl_or_vl_reg ();
>   new_pat = gen_vsetvl_pat (VSETVL_NORMAL, reaching_out, vl);
> }
>else
> diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc

Re: Re: [PATCH 0/2] support cm.push cm.pop cm.popret in zcmp and resolve confilct with shrink-wrap-separate

2023-08-28 Thread Kito Cheng via Gcc-patches
> 1. flag_shrink_wrap_separate seems better than flag_shrink_wrap.

(flag_)shrink_wrap_separate seems to be a sub-optimization of
(flag_)shrink_wrap, so I am fine if flag_shrink_wrap_separate is
enough.

> 2. to pass the zcmp testcases, i will add fno-shrink-wrap-separate option.

OK


[COMMITTED V3] RISC-V: Fix error combine of pred_mov pattern

2023-08-28 Thread Lehua Ding
V3 change: Adjust the code format as Jeff suggests.

This patch fixes PR110943, where wrong code is generated because combine
incorrectly merges some pred_mov patterns. Consider this code:

```

void foo9 (void *base, void *out, size_t vl)
{
int64_t scalar = *(int64_t*)(base + 100);
vint64m2_t v = __riscv_vmv_v_x_i64m2 (0, 1);
*(vint64m2_t*)out = v;
}
```

RTL before combine pass:

```
(insn 11 10 12 2 (set (reg/v:RVVM2DI 134 [ v ])
(if_then_else:RVVM2DI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2DI repeat [
(const_int 0 [0])
])
(unspec:RVVM2DI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":6:20 1089 {pred_movrvvm2di})
(insn 14 13 0 2 (set (mem:RVVM2DI (reg/v/f:DI 136 [ out ]) [1 MEM[(vint64m2_t 
*)out_4(D)]+0 S[32, 32] A128])
(reg/v:RVVM2DI 134 [ v ])) "/app/example.c":7:23 717 
{*movrvvm2di_whole})
```

RTL after combine pass:
```
(insn 14 13 0 2 (set (mem:RVVM2DI (reg:DI 138) [1 MEM[(vint64m2_t *)out_4(D)]+0 
S[32, 32] A128])
(if_then_else:RVVM2DI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2DI repeat [
(const_int 0 [0])
])
(unspec:RVVM2DI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "/app/example.c":7:23 1089 {pred_movrvvm2di})
```

This combine changes the semantics of insn 14. I split the @pred_mov pattern
and restrict the condition of @pred_mov.

PR target/110943

gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_int_or_double_0_operand):
New predicate.
* config/riscv/riscv-vector-builtins.cc 
(function_expander::function_expander):
force_reg mem target operand.
* config/riscv/vector.md (@pred_mov): Wrapper.
(*pred_mov): Remove imm -> reg pattern.
(*pred_broadcast_imm): Add imm -> reg pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Adjust.
* gcc.target/riscv/rvv/base/pr110943.c: New test.

---
 gcc/config/riscv/predicates.md|  5 +
 gcc/config/riscv/riscv-vector-builtins.cc |  9 +-
 gcc/config/riscv/vector.md| 98 +++
 .../gcc.target/riscv/rvv/base/pr110943.c  | 33 +++
 .../riscv/rvv/base/zvfhmin-intrinsic.c| 10 +-
 5 files changed, 106 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110943.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 618ad607047..51cf7eb7514 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -296,6 +296,11 @@
   (and (match_code "const_vector")
(match_test "satisfies_constraint_Wc0 (op)")))
 
+(define_predicate "vector_const_int_or_double_0_operand"
+  (and (match_code "const_vector")
+   (match_test "satisfies_constraint_vi (op)
+|| satisfies_constraint_Wc0 (op)")))
+
 (define_predicate "vector_move_operand"
   (ior (match_operand 0 "nonimmediate_operand")
(and (match_code "const_vector")
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index ad4a9098620..4a7eb47972e 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3471,7 +3471,14 @@ function_expander::function_expander (const 
function_instance &instance,
 exp (exp_in), target (target_in), opno (0)
 {
   if (!function_returns_void_p ())
-create_output_operand (&m_ops[opno++], target, TYPE_MODE (TREE_TYPE 
(exp)));
+{
+  if (target != NULL_RTX && MEM_P (target))
+   /* Since there is no intrinsic where target is a mem operand, it
+  should be converted to reg if it is a mem operand.  */
+   target = force_reg (GET_MODE (target), target);
+  create_output_operand (&m_ops[opno++], target,
+TYPE_MODE (TREE_TYPE (exp)));
+}
 }
 
 /* Take argument ARGNO from EXP's argument list and convert it into
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index a442e0fdd3c..d6bfbe81fcc 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1450,69 +1

[PATCH] RISC-V: Fix AVL/VL get ICE[VSETVL PASS]

2023-08-28 Thread Juzhe-Zhong
Fix a bunch of ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/vect-alias-check-16.c (internal compiler error: Segmentation 
fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (internal 
compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-16.c -flto -ffat-lto-objects (test for 
excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c (internal compiler error: Segmentation 
fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c (test for excess errors)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (internal 
compiler error: Segmentation fault)
FAIL: gcc.dg/vect/vect-alias-check-20.c -flto -ffat-lto-objects (test for 
excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vector_insn_info::get_avl_or_vl_reg): 
New function.
(pass_vsetvl::compute_local_properties): Fix bug.
(pass_vsetvl::commit_vsetvls): Ditto.
* config/riscv/riscv-vsetvl.h: New function.

---
 gcc/config/riscv/riscv-vsetvl.cc | 46 +---
 gcc/config/riscv/riscv-vsetvl.h  |  1 +
 2 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index f7ae6c16bee..73d672b083b 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2290,6 +2290,32 @@ vector_insn_info::global_merge (const vector_insn_info 
&merge_info,
   return new_info;
 }
 
+/* Wrapper helps to return the AVL or VL operand for the
+   vector_insn_info. Return AVL if the AVL is not VLMAX.
+   Otherwise, return the VL operand.  */
+rtx
+vector_insn_info::get_avl_or_vl_reg (void) const
+{
+  gcc_assert (has_avl_reg ());
+  if (!vlmax_avl_p (get_avl ()))
+return get_avl ();
+
+  if (has_vl_op (get_insn ()->rtl ()) || vsetvl_insn_p (get_insn ()->rtl ()))
+return ::get_vl (get_insn ()->rtl ());
+
+  if (get_avl_source ())
+return get_avl_reg_rtx ();
+
+  /* A DIRTY (polluted EMPTY) block if:
+   - get_insn is scalar move (no AVL or VL operand).
+   - get_avl_source is null (no def in the current DIRTY block).
+ Then we trace the previous insn which must be the insn
+ already inserted in Phase 2 to get the VL operand for VLMAX.  */
+  rtx_insn *prev_rinsn = PREV_INSN (get_insn ()->rtl ());
+  gcc_assert (prev_rinsn && vsetvl_insn_p (prev_rinsn));
+  return ::get_vl (prev_rinsn);
+}
+
 bool
 vector_insn_info::update_fault_first_load_avl (insn_info *insn)
 {
@@ -3166,19 +3192,17 @@ pass_vsetvl::compute_local_properties (void)
bitmap_clear_bit (m_vector_manager->vector_transp[curr_bb_idx], i);
  else if (expr->has_avl_reg ())
{
- rtx avl = vlmax_avl_p (expr->get_avl ())
- ? get_vl (expr->get_insn ()->rtl ())
- : expr->get_avl ();
+ rtx reg = expr->get_avl_or_vl_reg ();
  for (const insn_info *insn : bb->real_nondebug_insns ())
{
- if (find_access (insn->defs (), REGNO (avl)))
+ if (find_access (insn->defs (), REGNO (reg)))
{
  bitmap_clear_bit (
m_vector_manager->vector_transp[curr_bb_idx], i);
  break;
}
  else if (vlmax_avl_p (expr->get_avl ())
-  && find_access (insn->uses (), REGNO (avl)))
+  && find_access (insn->uses (), REGNO (reg)))
{
  bitmap_clear_bit (
m_vector_manager->vector_transp[curr_bb_idx], i);
@@ -3649,17 +3673,7 @@ pass_vsetvl::commit_vsetvls (void)
  = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, reaching_out, NULL_RTX);
   else if (vlmax_avl_p (reaching_out.get_avl ()))
{
- rtx vl = NULL_RTX;
- /* For user VSETVL VL, AVL. We need to use VL operand here, so we
-don't directly use get_avl_reg_rtx (). Instead, we use the VL
-of the INSN->RTL ().  */
- if (!reaching_out.get_avl_source ())
-   {
- gcc_assert (vsetvl_insn_p (reaching_out.get_insn ()->rtl ()));
- vl = get_vl (reaching_out.get_insn ()->rtl ());
-   }
- else
-   vl = reaching_out.get_avl_reg_rtx ();
+ rtx vl = reaching_out.get_avl_or_vl_reg ();
  new_pat = gen_vsetvl_pat (VSETVL_NORMAL, reaching_out, vl);
}
   else
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 4b5825d7f6b..2a315e45f31 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -335,6 +335,7 @@ public:
 
   rtl_ssa::insn_info *get_insn () const { return m_insn; }
   const bool *get_demands (void) const { return m_demands; }
+  rtx get_avl_or_vl_reg (void) const;
   rtx get_avl_reg_rtx (void) const
   {
 return gen_rtx_REG (Pmode, get_avl_source

Re: [RFC PATCH 2/2] RISC-V: Fix documentation of __builtin_riscv_pause

2023-08-28 Thread Tsukasa OI via Gcc-patches
On 2023/08/29 8:09, Hans-Peter Nilsson wrote:
> On Mon, 28 Aug 2023, Jeff Law via Gcc-patches wrote:
>>
>>
>> On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:
>>> From: Tsukasa OI 
>>>
>>> This built-in does not imply the 'Xgnuzihintpausestate' extension.
>>> It does not change architectural state (because all HINTs are prohibited
>>> from doing that).
>>>
>>> gcc/ChangeLog:
>>>
>>> * doc/extend.texi: Fix the description of __builtin_riscv_pause.
>> I've pushed this to the trunk.
> 
> I randomly noticed a typo: "hart", perhaps for "part"?
> Not sure though.
> 
> brgds, H-P
> 

Hi H-P,

As Jeff mentioned, the word "hart" in the RISC-V world means a
HARdware Thread and is commonly used to represent a hardware-based unit of
execution.

Tsukasa


Re: [PATCH] RISC-V: Revive test case PR 102957

2023-08-28 Thread Tsukasa OI via Gcc-patches
On 2023/08/29 7:01, Jeff Law wrote:
> 
> 
> On 8/11/23 08:29, Tsukasa OI wrote:
>> On 2023/08/11 23:15, Jeff Law wrote:
> 
>>>
>>
>> Originally, it tested that a two letter extension ('Zb') is accepted by
>> GCC (because the background of PR 102957 was GCC assumed multi-letter
>> 'Z' extensions are three letters or more).
>>
>> After rejecting unrecognized extensions, "dg-error" is added **just to
>> avoid the test failure** and that doesn't look right.  Yes, we now don't
>> have an ICE (like in the original report) but after the PR 102957 fix,
>> we just accepted it, not rejecting it.
>>
>> Instead, we have a valid (recognized) two-letter 'Z' extension: 'Zk'.  I
>> think replacing "zb" with "zk" is more correct considering the original
>> bug report (PR 102957) and its assumption.
>>
>> cf. 
> Thanks.  It still seems to me we want to  have two tests here.
> 
> I would suggest leaving pr102957.c alone since that tests that we give a
> proper error for "zb".  Then create a new test that verifies "zk" is
> accepted without error.
> 
> jeff
> 

Okay, that's a great compromise.

I will make v2 to add pr102957-2.c (like so) to reflect my intention and
keep the original pr102957.c.

Thanks,
Tsukasa


Re: [PATCH 1/1] RISC-V: Make "prefetch.i" built-in usable

2023-08-28 Thread Tsukasa OI via Gcc-patches
On 2023/08/29 6:20, Jeff Law wrote:
> 
> 
> On 8/9/23 21:10, Tsukasa OI via Gcc-patches wrote:
>> From: Tsukasa OI 
>>
>> The "__builtin_riscv_zicbop_cbo_prefetchi" built-in function was so badly
>> broken that it was practically unusable.  It emitted "prefetch.i" but with
>> no meaningful arguments.
>>
>> Though incompatible, this commit completely changes the function prototype
>> of this built-in and makes it usable.  To minimize the functionality issues,
>> it renames the built-in to "__builtin_riscv_zicbop_prefetch_i".
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-cmo.def: Fix function prototype.
>> * config/riscv/riscv.md (riscv_prefetchi_): Fix instruction
>> prototype.  Remove possible prefetch type argument.
>> * doc/extend.texi: Document __builtin_riscv_zicbop_prefetch_i.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/cmo-zicbop-1.c: Reflect new built-in prototype.
>> * gcc.target/riscv/cmo-zicbop-2.c: Likewise.
>> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>> index 688fd697255b..5658c7b7e113 100644
>> --- a/gcc/config/riscv/riscv.md
>> +++ b/gcc/config/riscv/riscv.md
>> @@ -3273,9 +3273,8 @@
>>   })
>>     (define_insn "riscv_prefetchi_"
>> -  [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
>> -  (match_operand:X 1 "imm5_operand" "i")]
>> -  UNSPECV_PREI)]
>> +  [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")]
>> +    UNSPECV_PREI)]
>>     "TARGET_ZICBOP"
>>     "prefetch.i\t%a0"
>>   )
> What I would suggest is making a new predicate that accepts either a
> register or a register+offset where the offset fits in a signed 12 bit
> immediate.  Use that for operand 0's predicate and I think this will
> "just work" and cover all the cases supported by the prefetch.i
> instruction.
> 
> Jeff
> 
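
A minimal Python sketch of the offset check such a predicate would need,
assuming the "signed 12 bit immediate" rule suggested above. The helper names
are hypothetical. The extra multiple-of-32 condition is an assumption about
the Zicbop encoding (low five immediate bits not available for the offset),
not something stated in this thread, so please check it against the
specification.

```python
def fits_simm12(offset: int) -> bool:
    """True if OFFSET fits in a signed 12-bit immediate."""
    return -2048 <= offset <= 2047


def valid_prefetch_i_offset(offset: int) -> bool:
    """Hypothetical check for a register+offset prefetch.i operand.

    The `% 32` test is an assumption about the Zicbop encoding and is
    not taken from this thread.
    """
    return fits_simm12(offset) and offset % 32 == 0


for off in (0, 32, -2048, 2016, 2047, 4096):
    print(off, fits_simm12(off), valid_prefetch_i_offset(off))
```

A plain register operand corresponds to offset 0 here, so the current
register-only pattern is the special case that both checks accept.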

Seems reasonable.

If we have to break the compatibility anyway, adding an offset argument
is not a bad idea (though I think they will use inline assembly if a
non-zero offset is required).

I will try to add an *optional* offset argument (with default value 0), like
the __builtin_speculation_safe_value built-in function, in the next version.

Thanks,
Tsukasa


Re: [RFC PATCH 1/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-28 Thread Tsukasa OI via Gcc-patches



On 2023/08/29 6:12, Jeff Law wrote:
> 
> 
> On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:
>> From: Tsukasa OI 
>>
>> The "pause" RISC-V hint instruction requires the 'Zihintpause' extension
>> (in the assembler).  However, GCC emits "pause" unconditionally, causing
>> an assembler error when compiling code with __builtin_riscv_pause while
>> the 'Zihintpause' extension is disabled.
>>
>> However, the "pause" instruction code (0x0100000f) is a HINT and emitting
>> its instruction code is safe in any environment.
>>
>> This commit implements handling for the 'Zihintpause' extension and emits
>> ".insn 0x0100000f" instead of "pause" only if the extension is disabled
>> (making the diagnostics better).
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc
>> (riscv_ext_version_table): Implement the 'Zihintpause' extension,
>> version 2.0.  (riscv_ext_flag_table) Add 'Zihintpause' handling.
>> * config/riscv/riscv-builtins.cc: Remove availability predicate
>> "always" and add "hint_pause" and "hint_pause_pseudo", corresponding
>> the existence of the 'Zihintpause' extension.
>> (riscv_builtins) Split builtin implementation depending on the
>> existence of the 'Zihintpause' extension.
>> * config/riscv/riscv-opts.h
>> (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
>> * config/riscv/riscv.md (riscv_pause): Make it only available when
>> the 'Zihintpause' extension is enabled.  (riscv_pause_insn) New
>> "pause" implementation when the 'Zihintpause' extension is disabled.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/builtin_pause.c: Removed.
>> * gcc.target/riscv/zihintpause.c:
>> New test when the 'Zihintpause' extension is enabled.
>> * gcc.target/riscv/zihintpause-noarch.c:
>> New test when the 'Zihintpause' extension is disabled.
> So I cleaned this up a bit per the list discussion and pushed the final
> result.  Hopefully I didn't goof anything too badly ;-)  The net is we
> emit "pause" or a suitable .insn based on TARGET_ZIHINTPAUSE.
> 
> Jeff
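
To make the "safe in any environment" point concrete, here is a small Python
sketch that splits the 32-bit pause encoding 0x0100000f into the FENCE
instruction fields. The helper names are made up for this sketch, and
pause_asm is only a toy model of the choice described above, not the GCC
implementation.

```python
PAUSE = 0x0100000F  # 32-bit encoding of the pause hint


def decode_fence(insn: int) -> dict:
    """Split a 32-bit word into the FENCE instruction fields."""
    return {
        "opcode": insn & 0x7F,          # 0x0F is MISC-MEM
        "rd": (insn >> 7) & 0x1F,
        "funct3": (insn >> 12) & 0x7,
        "rs1": (insn >> 15) & 0x1F,
        "succ": (insn >> 20) & 0xF,
        "pred": (insn >> 24) & 0xF,     # bit 24 is the W bit
        "fm": (insn >> 28) & 0xF,
    }


def pause_asm(has_zihintpause: bool) -> str:
    """Toy model: mnemonic if the extension is on, raw .insn otherwise."""
    return "pause" if has_zihintpause else ".insn 0x0100000f"


print(decode_fence(PAUSE))
print(pause_asm(True), "|", pause_asm(False))
```

The decoded word is an ordinary FENCE with only the W predecessor bit set and
everything else zero, which is why an assembler without 'Zihintpause' can
still take it as a raw .insn.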

Thanks!  I had been having trouble typing on the keyboard for a
while, and I appreciate you doing that instead of me (your modifications were
mostly ones I would have made too).

Also, it seems that I will no longer need to ask you to remove the leading
"[PATCH xxx]" (having it in the commit title was not my intention, and I
worried that you had been doing something inefficient [other than "git am"]).

Tsukasa


Re: [PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/28/23 19:28, Fei Gao wrote:

On 2023-08-29 06:54  Jeff Law  wrote:




On 8/28/23 01:47, Fei Gao wrote:

no functional changes but allow targets to check shrink-wrap-separate enabled 
or not.

     gcc/ChangeLog:

   * shrink-wrap.cc (try_shrink_wrapping_separate):call
     use_shrink_wrapping_separate.
   (use_shrink_wrapping_separate): wrap the condition
     check in use_shrink_wrapping_separate.
   * shrink-wrap.h (use_shrink_wrapping_separate): add to extern

So as I mentioned earlier today in the older thread, can we use
override_options to do this?

If we look at aarch64_override_options we have this:

    /* The pass to insert speculation tracking runs before
   shrink-wrapping and the latter does not know how to update the
   tracking status.  So disable it in this case.  */
    if (aarch64_track_speculation)
  flag_shrink_wrap = 0;

We kind of want this instead

    if (flag_shrink_wrap)
  {
    turn off whatever target bits enable the cm.push/cm.pop insns
  }


This does imply that we have a distinct target flag to enable/disable
those instructions.  But that seems like a good thing to have anyway.

I'm afraid we cannot simply resolve the conflict based on
flag_shrink_wrap/flag_shrink_wrap_separate only, as they're set true from -O1
onwards, which means zcmp is almost always disabled unless
-fno-shrink-wrap/-fno-shrink-wrap-separate are explicitly given.

Yea, but I would generally expect that if someone is really concerned
about code size, they're probably using -Os which (hopefully) would not
have shrink-wrapping enabled.




So after discussion with Kito, we would like to turn on zcmp for -Os and
shrink-wrap-separate for speed-preferred optimization.
use_shrink_wrapping_separate in this patch provides the chance for this check.
No new hook is needed.

Seems reasonable to me if Kito is OK with it.

jeff


Re: [PATCH] mklog: fix bugs of --append option

2023-08-28 Thread Lehua Ding
Committed the V2 patch, which additionally fixes some code format warnings.
Thanks, Jeff.


On 2023/8/29 7:38, Jeff Law wrote:



On 7/19/23 02:21, Lehua Ding wrote:

Hi,

This little patch fixes two bugs in mklog.py with the --append option.
The first bug is that the regexp used is not accurate enough to
determine the top of the diff area. The second bug is that if `---`
is not a true start, it needs to be added back to the patch file.

contrib/ChangeLog:

* mklog.py: Fix regexp and add missed `---`

OK.  Sorry for the delay.
jeff



--
Best,
Lehua

From 7a720dcba582674f94486e96c2abf9b542727f90 Mon Sep 17 00:00:00 2001
From: Lehua Ding 
Date: Tue, 18 Jul 2023 18:08:47 +0800
Subject: Re: [PATCH] mklog: fix bugs of --append option

Hi,

This little patch fixes two bugs in mklog.py with the --append option.
The first bug is that the regexp used is not accurate enough to
determine the top of the diff area. The second bug is that if `---`
is not a true start, it needs to be added back to the patch file.
It additionally fixes the Python code format errors that Martin reported.
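
Both bugs can be reproduced outside of mklog.py with a short sketch. The two
regular expressions are copied from the diff below; insert_changelog and its
arguments are hypothetical names for this illustration and only mirror the
shape of the fixed loop.

```python
import re

# Old and new diffstat patterns, copied from the diff.
OLD_STAT = re.compile(r"\s[^\s]+\s+\|\s\d+\s[+\-]+\n")
NEW_STAT = re.compile(r"\s[^\s]+\s+\|\s+\d+\s[+\-]+\n")


def insert_changelog(patch_lines, output):
    """Insert OUTPUT before the `---` line that starts the diffstat.

    Hypothetical helper: a `---` line is held back until the next
    line shows whether a diffstat really follows.
    """
    lines = []
    # 1 -> no candidate `---` seen, 2 -> candidate seen, 3 -> inserted
    state = 1
    for line in patch_lines:
        if state == 1 and line == "---\n":
            state = 2
        elif state == 2 and NEW_STAT.match(line):
            lines += [output, "---\n", line]
            state = 3
        else:
            if state == 2:
                # Not the true start: put the held-back `---` back.
                lines.append("---\n")
                state = 1
            lines.append(line)
    return lines


padded = " gcc/config/riscv/riscv-vsetvl.h  |  1 +\n"
print(bool(OLD_STAT.match(padded)), bool(NEW_STAT.match(padded)))

patch = ["Subject: x\n", "---\n", "not a diffstat\n", "---\n", padded]
print(insert_changelog(patch, "ChangeLog entry\n"))
```

The padded diffstat line (two spaces before the count) is only matched by the
new pattern, and the first `---` in the sample patch survives in the output
because it is re-appended when the following line is not a diffstat.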

Best,
Lehua

contrib/ChangeLog:

* mklog.py: Fix bugs.
---
 contrib/mklog.py | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 26230b9b4f2..0abefcd9374 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -374,7 +374,8 @@ if __name__ == '__main__':
 args.fill_up_bug_titles, args.pr_numbers)
 if args.append:
 if (not args.input):
-raise Exception("`-a or --append` option not support standard 
input")
+raise Exception("`-a or --append` option not support standard "
+"input")
 lines = []
 with open(args.input, 'r', newline='\n') as f:
 # 1 -> not find the possible start of diff log
@@ -384,13 +385,14 @@ if __name__ == '__main__':
 for line in f:
 if maybe_diff_log == 1 and line == "---\n":
 maybe_diff_log = 2
-elif maybe_diff_log == 2 and \
- re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
+elif (maybe_diff_log == 2 and
+  re.match(r"\s[^\s]+\s+\|\s+\d+\s[+\-]+\n", line)):
 lines += [output, "---\n", line]
 maybe_diff_log = 3
 else:
 # the possible start is not the true start.
 if maybe_diff_log == 2:
+lines.append("---\n")
 maybe_diff_log = 1
 lines.append(line)
 with open(args.input, "w") as f:
-- 
2.36.1



Re: [PATCH v2 3/3] RISC-V: Add stub support for existing extensions (unprivileged)

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/14/23 00:09, Tsukasa OI wrote:

From: Tsukasa OI 

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects, GCC should
support almost all extensions that Binutils supports, even if GCC itself
does nothing with them.

This commit adds stub support for standard unprivileged extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c except the not
yet merged 'Zce', 'Zcmp' and 'Zcmt' support).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.

This series (most likely patch 3/3) seems to break arch-24.c and arch-25.c.

Please fix and post a V3.

Jeff


Re: Re: [PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-28 Thread Fei Gao
On 2023-08-29 06:54  Jeff Law  wrote:
>
>
>
>On 8/28/23 01:47, Fei Gao wrote:
>> no functional changes but allow targets to check shrink-wrap-separate 
>> enabled or not.
>>
>>    gcc/ChangeLog:
>>
>>  * shrink-wrap.cc (try_shrink_wrapping_separate):call
>>    use_shrink_wrapping_separate.
>>  (use_shrink_wrapping_separate): wrap the condition
>>    check in use_shrink_wrapping_separate.
>>  * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
>So as I mentioned earlier today in the older thread, can we use
>override_options to do this?
>
>If we look at aarch64_override_options we have this:
>
>   /* The pass to insert speculation tracking runs before
>  shrink-wrapping and the latter does not know how to update the
>  tracking status.  So disable it in this case.  */
>   if (aarch64_track_speculation)
> flag_shrink_wrap = 0;
>
>We kind of want this instead
>
>   if (flag_shrink_wrap)
> {
>   turn off whatever target bits enable the cm.push/cm.pop insns
> }
>
>
>This does imply that we have a distinct target flag to enable/disable
>those instructions.  But that seems like a good thing to have anyway. 
I'm afraid we cannot simply resolve the conflict based on
flag_shrink_wrap/flag_shrink_wrap_separate only, as they're set true from -O1
onwards, which means zcmp is almost always disabled unless
-fno-shrink-wrap/-fno-shrink-wrap-separate are explicitly given.

So after discussion with Kito, we would like to turn on zcmp for -Os and
shrink-wrap-separate for speed-preferred optimization.
use_shrink_wrapping_separate in this patch provides the chance for this check.
No new hook is needed.
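
One possible reading of this policy, written as a small Python model. The
function and parameter names are invented for this sketch, and the exact
condition is an assumption drawn from the discussion (zcmp push/pop for -Os,
or when separate shrink-wrapping is switched off explicitly); it is not the
code in the patch.

```python
def use_zcmp_push_pop(*, has_zcmp: bool, optimize_size: bool,
                      shrink_wrap_separate: bool) -> bool:
    """Toy model of choosing cm.push/cm.pop over separate shrink-wrapping.

    Assumption for this sketch: zcmp wins when optimizing for size, or
    when separate shrink-wrapping has been disabled by the user.
    """
    return has_zcmp and (optimize_size or not shrink_wrap_separate)


# -Os with zcmp available: prefer cm.push/cm.pop.
print(use_zcmp_push_pop(has_zcmp=True, optimize_size=True,
                        shrink_wrap_separate=True))
# -O2 with zcmp available: keep separate shrink-wrapping.
print(use_zcmp_push_pop(has_zcmp=True, optimize_size=False,
                        shrink_wrap_separate=True))
```

Under this reading, passing -fno-shrink-wrap-separate at -O2 would also
select cm.push/cm.pop, which matches the plan to add that option to the zcmp
test cases.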

Please let me know what you think.

BR, 
Fei

>
>jeff

Re: [pushed][PATCH v2] LoongArch: Enable '-free' starting at -O2.

2023-08-28 Thread chenglulu

Pushed to r14-3533.

On 2023/8/28 at 5:21 PM, Xi Ruoyao wrote:

On Mon, 2023-08-28 at 11:46 +0800, Lulu Cheng wrote:

v1 -> v2:
 1. Modify Changelog information format.

gcc/ChangeLog:

 * common/config/loongarch/loongarch-common.cc:
 Enable '-free' on O2 and above.
 * doc/invoke.texi: Modify the description information
 of the '-free' compilation option and add the LoongArch
 description.

gcc/testsuite/ChangeLog:

 * gcc.target/loongarch/sign-extend.c: New test.

LGTM.


---
  .../config/loongarch/loongarch-common.cc  |  1 +
  gcc/doc/invoke.texi   |  4 +--
  .../gcc.target/loongarch/sign-extend.c    | 25 +++
  3 files changed, 28 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/sign-extend.c

diff --git a/gcc/common/config/loongarch/loongarch-common.cc 
b/gcc/common/config/loongarch/loongarch-common.cc
index fce32fa3f8d..c5ed37d27a6 100644
--- a/gcc/common/config/loongarch/loongarch-common.cc
+++ b/gcc/common/config/loongarch/loongarch-common.cc
@@ -35,6 +35,7 @@ static const struct default_options 
loongarch_option_optimization_table[] =
  {
    { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
    { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
+  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
    { OPT_LEVELS_NONE, 0, NULL, 0 }
  };
  
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a32dabf0405..16aa92b5e86 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12639,8 +12639,8 @@ Attempt to remove redundant extension instructions.  This is especially
  helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit
  registers after writing to their lower 32-bit half.
  
-Enabled for Alpha, AArch64, PowerPC, RISC-V, SPARC, h83000 and x86 at levels
-@option{-O2}, @option{-O3}, @option{-Os}.
+Enabled for Alpha, AArch64, LoongArch, PowerPC, RISC-V, SPARC, h83000 and x86 at
+levels @option{-O2}, @option{-O3}, @option{-Os}.
  
  @opindex fno-lifetime-dse
  @opindex flifetime-dse
diff --git a/gcc/testsuite/gcc.target/loongarch/sign-extend.c 
b/gcc/testsuite/gcc.target/loongarch/sign-extend.c
new file mode 100644
index 000..3f339d06bbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/sign-extend.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=lp64d -O2" } */
+/* { dg-final { scan-assembler-times "slli.w" 1 } } */
+
+extern int PL_savestack_ix;
+extern int PL_regsize;
+extern int PL_savestack_max;
+void Perl_savestack_grow_cnt (int need);
+extern void Perl_croak (char *);
+
+int
+S_regcppush(int parenfloor)
+{
+  int retval = PL_savestack_ix;
+  int paren_elems_to_push = (PL_regsize - parenfloor) * 4;
+  int p;
+
+  if (paren_elems_to_push < 0)
+    Perl_croak ("panic: paren_elems_to_push < 0");
+
+  if (PL_savestack_ix + (paren_elems_to_push + 6) > PL_savestack_max)
+    Perl_savestack_grow_cnt (paren_elems_to_push + 6);
+
+  return retval;
+}




Re: Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread juzhe.zh...@rivai.ai
>> Juzhe mentioned he doesn't want to commit this before
>> all/most bugs are addresses anyway, right?
Yes.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-28 22:27
To: Kito Cheng; Juzhe-Zhong
CC: rdapp.gcc; gcc-patches; kito.cheng
Subject: Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA 
vectorization
> LGTM from my side, but I would like to wait Robin is ok too
 
In principle I'm OK with it as well, realizing we will still need to fine-tune
a lot here anyway.  For now, IMHO it's good to have some additional test 
coverage
in the vector space but we should not expect every test to be correct/a good 
match
for everything we do yet.  Juzhe mentioned he doesn't want to commit this before
all/most bugs are addressed anyway, right?
 
Regards
Robin
 


Re: [PATCH] mklog: fix bugs of --append option

2023-08-28 Thread Jeff Law via Gcc-patches




On 7/19/23 02:21, Lehua Ding wrote:

Hi,

This little patch fixes two bugs of mklog.py with --append option.
The first bug is that the regexp used is not accurate enough to
determine the top of diff area. The second bug is that if `---`
is not a true start, it needs to be added back to the patch file.

contrib/ChangeLog:

* mklog.py: Fix regexp and add missed `---`

OK.  Sorry for the delay.
jeff


Re: [PATCH] fwprop: Allow UNARY_P and check register pressure.

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/24/23 08:06, Robin Dapp via Gcc-patches wrote:

Ping.  I refined the code and some comments a bit and added a test
case.

My question in general would still be:  Is this something we want
given that we potentially move some of combine's work a bit towards
the front of the RTL pipeline?

Regards
  Robin

Subject: [PATCH] fwprop: Allow UNARY_P and check register pressure.

This patch enables the forwarding of UNARY_P sources.  As this
involves potentially replacing a vector register with a scalar register
the ira_hoist_pressure machinery is used to calculate the change in
register pressure.  If the propagation would increase the pressure
beyond the number of hard regs, we don't perform it.
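
The profitability rule described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the fwprop implementation: `pressure_model` and `propagation_allowed` are hypothetical names, and real register pressure is tracked per register class and per basic block.

```cpp
#include <cassert>

// Hypothetical model of one register class: how many registers are
// live at the point of interest and how many hard registers exist.
struct pressure_model
{
  int live;
  int hard_regs;
};

// DELTA is the change in live registers the propagation would cause.
// The propagation is rejected only if the resulting pressure would
// exceed the number of hard registers.
bool
propagation_allowed (const pressure_model &m, int delta)
{
  return m.live + delta <= m.hard_regs;
}
```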

gcc/ChangeLog:

* fwprop.cc (fwprop_propagation::profitable_p): Add unary
handling.
(fwprop_propagation::update_register_pressure): New function.
(fwprop_propagation::register_pressure_high_p): New function
(reg_single_def_for_src_p): Look through unary expressions.
(try_fwprop_subst_pattern): Check register pressure.
(forward_propagate_into): Call new function.
(fwprop_init): Init register pressure.
(fwprop_done): Clean up register pressure.
(fwprop_insn): Add comment.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c: New test.
So I was hoping that Richard S. would chime in here as he knows this 
code better than anyone.


This looks like a much better implementation of something I've done 
before :-)  Basically imagine a target where a sign/zero extension can 
be folded into arithmetic for free.  We put in various hacks to this 
code to encourage more propagations of extensions.


I still think this is valuable.  As we lower from gimple->RTL we're 
going to still have artifacts in the RTL that we're going to want to 
optimize away.  fwprop has certain advantages over combine, including 
the fact that it runs earlier, pre-loop.



It looks generally sensible to me.  But give Richard S. another week to 
chime in.  He seems to be around, but may be slammed with stuff right now.


jeff



[PATCH v2] c++: tweaks for explicit conversion fns diagnostic

2023-08-28 Thread Marek Polacek via Gcc-patches
On Fri, Aug 25, 2023 at 08:34:37PM -0400, Jason Merrill wrote:
> On 8/25/23 19:37, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > 
> > 1) When saying that a conversion is erroneous because it would use
> > an explicit constructor, it might be nice to show where exactly
> > the explicit constructor is located.  For example, with this patch:
> > 
> > [...]
> > explicit.C:4:12: note: 'S::S(int)' declared here
> >  4 |   explicit S(int) { }
> >|^
> > 
> > 2) When a conversion doesn't work out merely because the conversion
> > function necessary to do the conversion couldn't be used because
> > it was marked explicit, it would be useful to the user to say so,
> > rather than just saying "cannot convert".  For example, with this patch:
> > 
> > explicit.C:13:12: error: cannot convert 'S' to 'bool' in initialization
> > 13 |   bool b = S{1};
> >|^~~~
> >||
> >|S
> > explicit.C:5:12: note: explicit conversion function was not considered
> >  5 |   explicit operator bool() const { return true; }
> >|^~~~
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (convert_like_internal): Show where the conversion function
> > was declared.
> > (maybe_show_nonconverting_candidate): New.
> > * cp-tree.h (maybe_show_nonconverting_candidate): Declare.
> > * typeck.cc (convert_for_assignment): Call it.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/diagnostic/explicit.C: New test.
> > ---
> >   gcc/cp/call.cc | 41 +++---
> >   gcc/cp/cp-tree.h   |  1 +
> >   gcc/cp/typeck.cc   |  5 +++
> >   gcc/testsuite/g++.dg/diagnostic/explicit.C | 16 +
> >   4 files changed, 59 insertions(+), 4 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/explicit.C
> > 
> > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > index 23e458d3252..09ebcf6a115 100644
> > --- a/gcc/cp/call.cc
> > +++ b/gcc/cp/call.cc
> > @@ -8459,12 +8459,21 @@ convert_like_internal (conversion *convs, tree 
> > expr, tree fn, int argnum,
> > if (pedwarn (loc, 0, "converting to %qT from initializer list "
> >  "would use explicit constructor %qD",
> >  totype, convfn))
> > - inform (loc, "in C++11 and above a default constructor "
> > - "can be explicit");
> > + {
> > +   inform (loc, "in C++11 and above a default constructor "
> > +   "can be explicit");
> > +   inform (DECL_SOURCE_LOCATION (convfn), "%qD declared here",
> > +   convfn);
> 
> I'd swap these two informs.

Done.
 
> > +++ b/gcc/testsuite/g++.dg/diagnostic/explicit.C
> > @@ -0,0 +1,16 @@
> > +// { dg-do compile { target c++11 } }
> > +
> > +struct S {
> > +  explicit S(int) { }
> > +  explicit operator bool() const { return true; } // { dg-message 
> > "explicit conversion function was not considered" }
> > +  explicit operator int() const { return 42; } // { dg-message "explicit 
> > conversion function was not considered" }
> > +};
> > +
> > +void
> > +g ()
> > +{
> > +  S s = {1}; // { dg-error "would use explicit constructor" }
> > +  bool b = S{1}; // { dg-error "cannot convert .S. to .bool. in 
> > initialization" }
> > +  int i;
> > +  i = S{2}; // { dg-error "cannot convert .S. to .int. in assignment" }
> > +}
> 
> Let's also test other copy-initialization contexts: parameter passing,
> return, throw, aggregate member initialization.

Done except for throw.  To handle arg passing I moved the call to
maybe_show_nonconverting_candidate one line down.  I guess a testcase
for throw would be

struct T {
  T() { } // #1
  explicit T(const T&) { } // #2
};

void
g ()
{
  T t{};
  throw t;
}

but #2 isn't a viable candidate so this would take more effort to handle.
We just say about #1 that "candidate expects 0 arguments, 1 provided".

clang++ says

e.C:3:12: note: explicit constructor is not a candidate
3 |   explicit T(const T&) { }
  |^

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
1) When saying that a conversion is erroneous because it would use
an explicit constructor, it might be nice to show where exactly
the explicit constructor is located.  For example, with this patch:

[...]
explicit.C:4:12: note: 'S::S(int)' declared here
4 |   explicit S(int) { }
  |^

2) When a conversion doesn't work out merely because the conversion
function necessary to do the conversion couldn't be used because
it was marked explicit, it would be useful to the user to say so,
rather than just saying "cannot convert".  For example, with this patch:

explicit.C:13:12: error: cannot convert 'S' to 'bool' in initialization
   13 |   bool b = S{1};
  |^~~~
  |

Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:

This patch tries to fix the 2% regression in 510.parest_r on
ampere1 in the tracker. (Previous discussion is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)

1. Add testcases for the problem. For an op list in the form of
"acc = a * b + c * d + acc", currently reassociation doesn't
swap the operands so that more FMAs can be generated.
After widening_mul the result looks like:

_1 = .FMA(a, b, acc_0);
acc_1 = .FMA(c, d, _1);

While previously (before the "Handle FMA friendly..." patch),
widening_mul's result was like:

_1 = a * b;
_2 = .FMA (c, d, _1);
acc_1 = acc_0 + _2;

If the code fragment is in a loop, some architectures can execute
the latter in parallel, so the performance can be much faster than
the former. For the small testcase, the performance gap is over
10% on both ampere1 and neoverse-n1. So the point here is to avoid
turning the last statement into FMA, and keep it a PLUS_EXPR as
much as possible. (If we are rewriting the op list into parallel,
no special treatment is needed, since the last statement after
rewrite_expr_tree_parallel will be PLUS_EXPR anyway.)

2. Function result_feeds_back_from_phi_p is to check for cross
backedge dependency. Added new enum fma_state to describe the
state of FMA candidates.
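
The two shapes from item 1 can be written out in plain C++ to make the dependency difference visible. This is only an illustrative sketch: the function names are made up here, and `std::fma` stands in for the internal `.FMA` call.

```cpp
#include <cassert>
#include <cmath>

// Shape after the "Handle FMA friendly..." change: the accumulator
// passes through two dependent FMAs, so a loop-carried chain is two
// FMA latencies long per iteration.
double
step_chained (double a, double b, double c, double d, double acc)
{
  double t = std::fma (a, b, acc);
  return std::fma (c, d, t);
}

// Earlier shape: the multiply and the FMA do not depend on ACC, so
// only the final addition sits on the loop-carried chain.
double
step_split (double a, double b, double c, double d, double acc)
{
  double t1 = a * b;
  double t2 = std::fma (c, d, t1);
  return acc + t2;
}
```

Both functions compute a * b + c * d + acc; they differ only in which operations have to wait for the previous iteration's accumulator.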

With this patch, there's a 3% improvement in 510.parest_r 1-copy
run on ampere1. The compile options are:
"-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".

Best regards,
Di Zhao



 PR tree-optimization/110279

gcc/ChangeLog:

 * tree-ssa-reassoc.cc (enum fma_state): New enum to
 describe the state of FMA candidates for an op list.
 (rewrite_expr_tree_parallel): Changed boolean
 parameter to enum type.
 (result_feeds_back_from_phi_p): New function to check
 for cross backedge dependency.
 (rank_ops_for_fma): Return enum fma_state. Added new
 parameter.
 (reassociate_bb): If there's backedge dependency in an
 op list, swap the operands before rewrite_expr_tree.

gcc/testsuite/ChangeLog:

 * gcc.dg/pr110279.c: New test.
Not a review, but more of a question -- isn't this transformation's 
profitability uarch sensitive?  ie, just because it's bad for a set of 
aarch64 uarches, doesn't mean it's bad everywhere.


And in general we shy away from trying to adjust gimple code based on 
uarch preferences.


It seems the right place to do this is gimple->rtl expansion.

Jeff


Re: [RFC PATCH 2/2] RISC-V: Fix documentation of __builtin_riscv_pause

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/28/23 17:09, Hans-Peter Nilsson wrote:

On Mon, 28 Aug 2023, Jeff Law via Gcc-patches wrote:



On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.

I've pushed this to the trunk.


I randomly noticed a typo: "hart", perhaps for "part"?
Not sure though.

Not a typo.  "hart" has a well defined meaning in the risc-v world.

Thanks,
jeff


Re: [RFC PATCH 2/2] RISC-V: Fix documentation of __builtin_riscv_pause

2023-08-28 Thread Hans-Peter Nilsson
On Mon, 28 Aug 2023, Jeff Law via Gcc-patches wrote:
> 
> 
> On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:
> > From: Tsukasa OI 
> > 
> > This built-in does not imply the 'Xgnuzihintpausestate' extension.
> > It does not change architectural state (because all HINTs are prohibited
> > from doing that).
> > 
> > gcc/ChangeLog:
> > 
> > * doc/extend.texi: Fix the description of __builtin_riscv_pause.
> I've pushed this to the trunk.

I randomly noticed a typo: "hart", perhaps for "part"?
Not sure though.

brgds, H-P


Re: [PATCH] c++: CWG 2359, wrong copy-init with designated init [PR91319]

2023-08-28 Thread Marek Polacek via Gcc-patches
On Mon, Aug 28, 2023 at 06:27:26PM -0400, Jason Merrill wrote:
> On 8/25/23 12:44, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > 
> > This CWG clarifies that designated initializer support 
> > direct-initialization.
> > Just be careful what Note 2 in [dcl.init.aggr]/4.2 says: "If the
> > initialization is by designated-initializer-clause, its form determines
> > whether copy-initialization or direct-initialization is performed."  Hence
> > this patch sets CONSTRUCTOR_IS_DIRECT_INIT only when we are dealing with
> > ".x{}", but not ".x = {}".
> > 
> > PR c++/91319
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.cc (cp_parser_initializer_list): Set CONSTRUCTOR_IS_DIRECT_INIT
> > when the designated initializer is of the .x{} form.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/desig30.C: New test.
> > ---
> >   gcc/cp/parser.cc |  6 ++
> >   gcc/testsuite/g++.dg/cpp2a/desig30.C | 22 ++
> >   2 files changed, 28 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig30.C
> > 
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index eeb22e44fb4..b3d5c65b469 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -25718,6 +25718,7 @@ cp_parser_initializer_list (cp_parser* parser, 
> > bool* non_constant_p,
> > tree designator;
> > tree initializer;
> > bool clause_non_constant_p;
> > +  bool direct_p = false;
> > location_t loc = cp_lexer_peek_token (parser->lexer)->location;
> > /* Handle the C++20 syntax, '. id ='.  */
> > @@ -25740,6 +25741,8 @@ cp_parser_initializer_list (cp_parser* parser, 
> > bool* non_constant_p,
> >   if (cp_lexer_next_token_is (parser->lexer, CPP_EQ))
> > /* Consume the `='.  */
> > cp_lexer_consume_token (parser->lexer);
> > + else
> > +   direct_p = true;
> > }
> > /* Also, if the next token is an identifier and the following one 
> > is a
> >  colon, we are looking at the GNU designated-initializer
> > @@ -25817,6 +25820,9 @@ cp_parser_initializer_list (cp_parser* parser, 
> > bool* non_constant_p,
> > if (clause_non_constant_p && non_constant_p)
> > *non_constant_p = true;
> > +  if (TREE_CODE (initializer) == CONSTRUCTOR)
> > +   CONSTRUCTOR_IS_DIRECT_INIT (initializer) |= direct_p;
> 
> Why |= rather than = ?

CONSTRUCTOR_IS_DIRECT_INIT could already have been set earlier so using
= might wrongly clear it.  I saw this in direct-enum-init1.C.
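
A tiny stand-alone illustration of that difference, using a hypothetical `ctor_node` in place of a real CONSTRUCTOR tree node:

```cpp
#include <cassert>

// Hypothetical stand-in for a node carrying a "direct-init" flag.
struct ctor_node
{
  bool is_direct_init;
};

// OR-ing keeps a flag that was already set earlier.
void
mark_with_or (ctor_node &n, bool direct_p)
{
  n.is_direct_init |= direct_p;
}

// Plain assignment overwrites it, clearing it when DIRECT_P is false.
void
mark_with_assign (ctor_node &n, bool direct_p)
{
  n.is_direct_init = direct_p;
}
```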

Marek



Re: [PATCH] c++: refine CWG 2369 satisfaction vs non-dep convs [PR99599]

2023-08-28 Thread Jason Merrill via Gcc-patches

On 8/24/23 09:31, Patrick Palka wrote:

On Wed, 23 Aug 2023, Jason Merrill wrote:


On 8/21/23 21:51, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look like
a reasonable approach?  I didn't observe any compile time/memory impact
of this change.

-- >8 --

As described in detail in the PR, CWG 2369 has the surprising
consequence of introducing constraint recursion in seemingly valid and
innocent code.

This patch attempts to fix this surprising behavior for the majority
of problematic use cases.  Rather than checking satisfaction before
_all_ non-dependent conversions, as specified by the CWG issue,
this patch makes us first check "safe" non-dependent conversions,
then satisfaction, then followed by "unsafe" non-dependent conversions.
In this case, a conversion is "safe" if computing it is guaranteed
to not induce template instantiation.  This patch heuristically
determines "safety" by checking for a constructor template or conversion
function template in the (class) parm or arg types respectively.
If neither type has such a member, then computing the conversion
should not induce instantiation (modulo satisfaction checking of
non-template constructor and conversion functions I suppose).

+ /* We're checking only non-instantiating conversions.
+A conversion may instantiate only if it's to/from a
+class type that has a constructor template/conversion
+function template.  */
+ tree parm_nonref = non_reference (parm);
+ tree type_nonref = non_reference (type);
+
+ if (CLASS_TYPE_P (parm_nonref))
+   {
+ if (!COMPLETE_TYPE_P (parm_nonref)
+ && CLASSTYPE_TEMPLATE_INSTANTIATION (parm_nonref))
+   return unify_success (explain_p);
+
+ tree ctors = get_class_binding (parm_nonref,
+ complete_ctor_identifier);
+ for (tree ctor : lkp_range (ctors))
+   if (TREE_CODE (ctor) == TEMPLATE_DECL)
+ return unify_success (explain_p);


Today we discussed maybe checking CLASSTYPE_NON_AGGREGATE?


Done; all dups of this PR seem to use tag types that are aggregates, so this
seems like a good simplification.  I also made us punt if the arg type has a
constrained non-template conversion function.



Also, instantiation can also happen when checking for conversion to a pointer
or reference to base class.


Oops, I suppose we just need to strip pointer types upfront as well.  The
!COMPLETE_TYPE_P && CLASSTYPE_TEMPLATE_INSTANTIATION tests will then make
sure we deem a potential derived-to-base conversion unsafe if appropriate
IIUC.

How does the following look?

-- >8 --

Subject: [PATCH] c++: refine CWG 2369 satisfaction vs non-dep convs [PR99599]

PR c++/99599

gcc/cp/ChangeLog:

* config-lang.in (gtfiles): Add search.cc.
* pt.cc (check_non_deducible_conversions): Add bool parameter
passed down to check_non_deducible_conversion.
(fn_type_unification): Call check_non_deducible_conversions
an extra time before satisfaction with noninst_only_p=true.
(check_non_deducible_conversion): Add bool parameter controlling
whether to compute only conversions that are guaranteed to
not induce template instantiation.
* search.cc (conversions_cache): Define.
(lookup_conversions): Use it to cache the lookup.  Improve cache
rate by considering TYPE_MAIN_VARIANT of the type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-nondep4.C: New test.
---
  gcc/cp/config-lang.in |  1 +
  gcc/cp/pt.cc  | 81 +--
  gcc/cp/search.cc  | 14 +++-
  gcc/testsuite/g++.dg/cpp2a/concepts-nondep4.C | 21 +
  4 files changed, 110 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nondep4.C

@@ -22921,6 +22933,65 @@ check_non_deducible_conversion (tree parm, tree arg, 
unification_kind_t strict,
  {
bool ok = false;
tree conv_arg = TYPE_P (arg) ? NULL_TREE : arg;
+  if (conv_p && *conv_p)
+   {
+ /* This conversion was already computed earlier (when
+computing only non-instantiating conversions).  */
+ gcc_checking_assert (!noninst_only_p);
+ return unify_success (explain_p);
+   }
+  if (noninst_only_p)
+   {
+ /* We're checking only non-instantiating conversions.
+Computing a conversion may induce template instantiation
+only if ... */


Let's factor this whole block out into another function.

Incidentally, CWG1092 is a related problem with defaulted functions, 
which I dealt with in a stricter way: when LOOKUP_DEFAULTED we ignore a 
conversion from the parameter being copied to a non-reference-related 
type.  As a follow-on, it might make sense to use this test there as well?


Re: [PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/28/23 01:47, Fei Gao wrote:

no functional changes but allow targets to check shrink-wrap-separate enabled 
or not.

   gcc/ChangeLog:

 * shrink-wrap.cc (try_shrink_wrapping_separate):call
   use_shrink_wrapping_separate.
 (use_shrink_wrapping_separate): wrap the condition
   check in use_shrink_wrapping_separate.
 * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
So as I mentioned earlier today in the older thread, can we use 
override_options to do this?


If we look at aarch64_override_options we have this:

  /* The pass to insert speculation tracking runs before
 shrink-wrapping and the latter does not know how to update the
 tracking status.  So disable it in this case.  */
  if (aarch64_track_speculation)
flag_shrink_wrap = 0;

We kind of want this instead

  if (flag_shrink_wrap)
{
  turn off whatever target bits enable the cm.push/cm.pop insns
}


This does imply that we have a distinct target flag to enable/disable 
those instructions.  But that seems like a good thing to have anyway.


jeff


Re: [PATCH V3] riscv: generate builtin macro for compilation with strict alignment:

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/15/23 12:29, Edwin Lu wrote:

This patch is a modification of
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610115.html
following the discussion on
https://github.com/riscv-non-isa/riscv-c-api-doc/issues/32

Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false.

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
  Generate __riscv_unaligned_avoid with value 1 or
  __riscv_unaligned_slow with value 1 or
  __riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override):
 Define riscv_user_wants_strict_align. Set
 riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: Check for
 __riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for
 __riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for
 __riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.
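
For illustration, user code could test the three macros named in the ChangeLog along these lines. This is a sketch only: `unaligned_policy` is a hypothetical helper, and it assumes at most one of the three macros is defined for a given compilation.

```cpp
#include <cassert>
#include <cstring>

// Report which unaligned-access macro, if any, the compiler defined.
// On non-RISC-V targets or older compilers none is defined.
const char *
unaligned_policy ()
{
#if defined (__riscv_unaligned_fast)
  return "fast";
#elif defined (__riscv_unaligned_slow)
  return "slow";
#elif defined (__riscv_unaligned_avoid)
  return "avoid";
#else
  return "unknown";
#endif
}
```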
OK.  Though I'm pretty sure the commit hooks are going to complain about 
your ChangeLog :-)


jeff


Re: [PATCH 1/2] allow target to check shrink-wrap-separate enabled or not

2023-08-28 Thread Jeff Law via Gcc-patches




On 6/25/23 20:29, Fei Gao wrote:

hi Jeff

Please see my earlier reply here.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg310656.html

Maybe you scrolled past it in so many emails:)
Oh, so the issue isn't really the set of components being wrapped, but 
the way in which we save them.  Yea, that's going to need some tinkering.


It does make me wonder if we can handle this in riscv_override_options. 
That's a pretty standard place to deal with option conflicts.  We ought 
to be able to check if both options are enabled, then disable zcmp 
push/pop at that point without introducing any new hooks.



jeff


Re: [PATCH] c++: use conversion_obstack_sentinel throughout

2023-08-28 Thread Jason Merrill via Gcc-patches

On 8/25/23 12:33, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

This replaces manual memory management via conversion_obstack_alloc(0)
and obstack_free with the recently added conversion_obstack_sentinel,
and also uses the latter in build_user_type_conversion and
build_operator_new_call.
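
The general pattern (record a high-water mark on entry, release back to it on every exit path) can be sketched with a small RAII class. This is not GCC's obstack code: `arena`, `arena_sentinel` and `work` are hypothetical and only model the idea.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size bump allocator (alignment is ignored here).
class arena
{
public:
  std::size_t mark () const { return used_; }
  void release_to (std::size_t m) { used_ = m; }

  void *alloc (std::size_t n)
  {
    if (n > sizeof buf_ - used_)
      return nullptr;
    void *p = buf_ + used_;
    used_ += n;
    return p;
  }

private:
  unsigned char buf_[4096];
  std::size_t used_ = 0;
};

// Remembers the high-water mark on construction and releases
// everything allocated after it on destruction.
class arena_sentinel
{
public:
  explicit arena_sentinel (arena &a) : arena_ (a), mark_ (a.mark ()) {}
  ~arena_sentinel () { arena_.release_to (mark_); }
  arena_sentinel (const arena_sentinel &) = delete;
  arena_sentinel &operator= (const arena_sentinel &) = delete;

private:
  arena &arena_;
  std::size_t mark_;
};

// Early returns need no explicit cleanup: the sentinel handles it.
bool
work (arena &a, bool fail_early)
{
  arena_sentinel s (a);
  a.alloc (64);
  if (fail_early)
    return false;
  a.alloc (128);
  return true;
}
```

Compared with a manual mark/free pair, this removes the need to route every exit through a cleanup label.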

gcc/cp/ChangeLog:

* call.cc (build_user_type_conversion): Free allocated
conversions.
(build_converted_constant_expr_internal): Use
conversion_obstack_sentinel instead.
(perform_dguide_overload_resolution): Likewise.
(build_new_function_call): Likewise.
(build_operator_new_call): Free allocated conversions.
(build_op_call): Use conversion_obstack_sentinel instead.
(build_conditional_expr): Use conversion_obstack_sentinel
instead, and hoist it out to the outermost scope.
(build_new_op): Use conversion_obstack_sentinel instead
and set it up before the first goto.  Remove second unneeded goto.
(build_op_subscript): Use conversion_obstack_sentinel instead.
(ref_conv_binds_to_temporary): Likewise.
(build_new_method_call): Likewise.
(can_convert_arg): Likewise.
(can_convert_arg_bad): Likewise.
(perform_implicit_conversion_flags): Likewise.
(perform_direct_initialization_if_possible): Likewise.
(initialize_reference): Likewise.
---
  gcc/cp/call.cc | 107 ++---
  1 file changed, 22 insertions(+), 85 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 673ec91d60e..432ac99b4bb 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -4646,6 +4646,9 @@ build_user_type_conversion (tree totype, tree expr, int 
flags,
tree ret;
  
auto_cond_timevar tv (TV_OVERLOAD);

+
+  conversion_obstack_sentinel cos;
+
cand = build_user_type_conversion_1 (totype, expr, flags, complain);
  
if (cand)

@@ -4698,15 +4701,13 @@ build_converted_constant_expr_internal (tree type, tree 
expr,
int flags, tsubst_flags_t complain)
  {
conversion *conv;
-  void *p;
tree t;
location_t loc = cp_expr_loc_or_input_loc (expr);
  
if (error_operand_p (expr))

  return error_mark_node;
  
-  /* Get the high-water mark for the CONVERSION_OBSTACK.  */

-  p = conversion_obstack_alloc (0);
+  conversion_obstack_sentinel cos;
  
conv = implicit_conversion (type, TREE_TYPE (expr), expr,

  /*c_cast_p=*/false, flags, complain);
@@ -4815,9 +4816,6 @@ build_converted_constant_expr_internal (tree type, tree 
expr,
expr = error_mark_node;
  }
  
-  /* Free all the conversions we allocated.  */

-  obstack_free (&conversion_obstack, p);
-
return expr;
  }
  
@@ -4985,8 +4983,7 @@ perform_dguide_overload_resolution (tree dguides, const vec *args,
  
gcc_assert (deduction_guide_p (OVL_FIRST (dguides)));
  
-  /* Get the high-water mark for the CONVERSION_OBSTACK.  */

-  void *p = conversion_obstack_alloc (0);
+  conversion_obstack_sentinel cos;
  
z_candidate *cand = perform_overload_resolution (dguides, args, &candidates,

   &any_viable_p, complain);
@@ -4999,9 +4996,6 @@ perform_dguide_overload_resolution (tree dguides, const 
vec *args,
else
  result = cand->fn;
  
-  /* Free all the conversions we allocated.  */

-  obstack_free (&conversion_obstack, p);
-
return result;
  }
  
@@ -5015,7 +5009,6 @@ build_new_function_call (tree fn, vec **args,

  {
struct z_candidate *candidates, *cand;
bool any_viable_p;
-  void *p;
tree result;
  
if (args != NULL && *args != NULL)

@@ -5028,8 +5021,7 @@ build_new_function_call (tree fn, vec **args,
if (flag_tm)
  tm_malloc_replacement (fn);
  
-  /* Get the high-water mark for the CONVERSION_OBSTACK.  */

-  p = conversion_obstack_alloc (0);
+  conversion_obstack_sentinel cos;
  
cand = perform_overload_resolution (fn, *args, &candidates, &any_viable_p,

  complain);
@@ -5061,9 +5053,6 @@ build_new_function_call (tree fn, vec **args,
  == BUILT_IN_NORMAL)
 result = coro_validate_builtin_call (result);
  
-  /* Free all the conversions we allocated.  */

-  obstack_free (&conversion_obstack, p);
-
return result;
  }
  
@@ -5108,6 +5097,8 @@ build_operator_new_call (tree fnname, vec **args,

if (*args == NULL)
  return error_mark_node;
  
+  conversion_obstack_sentinel cos;

+
/* Based on:
  
 [expr.new]

@@ -5234,7 +5225,6 @@ build_op_call (tree obj, vec **args, 
tsubst_flags_t complain)
tree fns, convs, first_mem_arg = NULL_TREE;
bool any_viable_p;
tree result = NULL_TREE;
-  void *p;
  
auto_cond_timevar tv (TV_OVERLOAD);
  
@@ -5273,8 +5263,7 @@ build_op_call (tree obj, vec **args, tsubst_flags_t complain)

return error_mark_node;
  }
  

Re: [PATCH] c++: more dummy non_constant_p arg avoidance

2023-08-28 Thread Jason Merrill via Gcc-patches

On 8/25/23 13:41, Patrick Palka wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?  This
reduces calls to is_rvalue_constant_expression from
cp_parser_constant_expression by 10% for stdc++.h.


OK.


-- >8 --

As a follow-up to Marek's r14-3088-ga263152643bbec, this patch makes
us avoid passing an effectively dummy non_constant_p argument in two
more spots in the parser.
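
The idea can be shown with a reduced model in which a null out-parameter lets the callee skip work entirely. All names here are hypothetical; `costly_constant_check` merely stands in for the constant-expression check the real parser would run.

```cpp
#include <cassert>

int checks_run = 0;  // counts how often the costly check actually runs

// Pretend that non-negative values are "constant".
bool
costly_constant_check (int value)
{
  ++checks_run;
  return value >= 0;
}

// Only performs the check when the caller wants the answer.
void
parse_clause (int value, bool *non_constant_p)
{
  if (non_constant_p)
    *non_constant_p = !costly_constant_check (value);
}

// Forwards a real pointer only if our own caller passed one.
void
parse_list (const int *values, int n, bool *non_constant_p)
{
  for (int i = 0; i < n; ++i)
    {
      bool clause_non_constant_p = false;
      parse_clause (values[i],
                    non_constant_p ? &clause_non_constant_p : nullptr);
      if (non_constant_p && clause_non_constant_p)
        *non_constant_p = true;
    }
}
```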

gcc/cp/ChangeLog:

* parser.cc (cp_parser_parenthesized_expression_list_elt): Pass
nullptr as non_constant_p to cp_parser_braced_list if our
non_constant_p is null.
(cp_parser_initializer_list): Likewise to
cp_parser_initializer_clause.
---
  gcc/cp/parser.cc | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 774706ac607..a8cc91059c1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -8099,7 +8099,10 @@ cp_parser_parenthesized_expression_list_elt (cp_parser 
*parser, bool cast_p,
/* A braced-init-list.  */
cp_lexer_set_source_position (parser->lexer);
maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
-  expr = cp_parser_braced_list (parser, &expr_non_constant_p);
+  expr = cp_parser_braced_list (parser,
+   (non_constant_p != nullptr
+   ? &expr_non_constant_p
+   : nullptr));
if (non_constant_p && expr_non_constant_p)
*non_constant_p = true;
  }
@@ -25812,9 +25815,11 @@ cp_parser_initializer_list (cp_parser* parser, bool* 
non_constant_p,
  
/* Parse the initializer.  */

initializer = cp_parser_initializer_clause (parser,
- &clause_non_constant_p);
+ (non_constant_p != nullptr
+  ? &clause_non_constant_p
+  : nullptr));
/* If any clause is non-constant, so is the entire initializer.  */
-  if (clause_non_constant_p && non_constant_p)
+  if (non_constant_p && clause_non_constant_p)
*non_constant_p = true;
  
/* If we have an ellipsis, this is an initializer pack




Re: [PATCH] c++: CWG 2359, wrong copy-init with designated init [PR91319]

2023-08-28 Thread Jason Merrill via Gcc-patches

On 8/25/23 12:44, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --

This CWG clarifies that designated initializers support direct-initialization.
Just be careful what Note 2 in [dcl.init.aggr]/4.2 says: "If the
initialization is by designated-initializer-clause, its form determines
whether copy-initialization or direct-initialization is performed."  Hence
this patch sets CONSTRUCTOR_IS_DIRECT_INIT only when we are dealing with
".x{}", but not ".x = {}".

PR c++/91319

gcc/cp/ChangeLog:

* parser.cc (cp_parser_initializer_list): Set CONSTRUCTOR_IS_DIRECT_INIT
when the designated initializer is of the .x{} form.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig30.C: New test.
---
  gcc/cp/parser.cc |  6 ++
  gcc/testsuite/g++.dg/cpp2a/desig30.C | 22 ++
  2 files changed, 28 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig30.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index eeb22e44fb4..b3d5c65b469 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -25718,6 +25718,7 @@ cp_parser_initializer_list (cp_parser* parser, bool* 
non_constant_p,
tree designator;
tree initializer;
bool clause_non_constant_p;
+  bool direct_p = false;
location_t loc = cp_lexer_peek_token (parser->lexer)->location;
  
/* Handle the C++20 syntax, '. id ='.  */

@@ -25740,6 +25741,8 @@ cp_parser_initializer_list (cp_parser* parser, bool* 
non_constant_p,
  if (cp_lexer_next_token_is (parser->lexer, CPP_EQ))
/* Consume the `='.  */
cp_lexer_consume_token (parser->lexer);
+ else
+   direct_p = true;
}
/* Also, if the next token is an identifier and the following one is a
 colon, we are looking at the GNU designated-initializer
@@ -25817,6 +25820,9 @@ cp_parser_initializer_list (cp_parser* parser, bool* 
non_constant_p,
if (clause_non_constant_p && non_constant_p)
*non_constant_p = true;
  
+  if (TREE_CODE (initializer) == CONSTRUCTOR)

+   CONSTRUCTOR_IS_DIRECT_INIT (initializer) |= direct_p;


Why |= rather than = ?

Jason



Re: [PATCH 1/2] allow target to check shrink-wrap-separate enabled or not

2023-08-28 Thread Jeff Law via Gcc-patches




On 6/25/23 20:29, Fei Gao wrote:

hi Jeff

Please see my earlier reply here.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg310656.html

Maybe you scrolled past it in so many emails:)

It definitely got lost in my mountain of mail.

jeff


Re: [PATCH] RISC-V: Revive test case PR 102957

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/11/23 08:29, Tsukasa OI wrote:

On 2023/08/11 23:15, Jeff Law wrote:






Originally, it tested that a two letter extension ('Zb') is accepted by
GCC (because the background of PR 102957 was GCC assumed multi-letter
'Z' extensions are three letters or more).

After rejecting unrecognized extensions, "dg-error" is added **just to
avoid the test failure** and that doesn't look right.  Yes, we now don't
have an ICE (like in the original report) but after the PR 102957 fix,
we just accepted it, not rejecting it.

Instead, we have a valid (recognized) two-letter 'Z' extension: 'Zk'.  I
think replacing "zb" with "zk" is more correct considering the original
bug report (PR 102957) and its assumption.

cf. 

Thanks.  It still seems to me we want to  have two tests here.

I would suggest leaving pr102957.c alone since that tests that we give a 
proper error for "zb".  Then create a new test that verifies "zk" is 
accepted without error.


jeff


Re: [PATCH] RISC-V: Add Types to Un-Typed Vector Instructions:

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/28/23 13:03, Edwin Lu wrote:

Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates vector instructions to ensure that no insn is left
without a type attribute. It creates a placeholder type "vector" for insns
where a type isn't clear.

Tested for regressions using rv32/rv64 gc/gcv multilib with newlib/linux.

gcc/Changelog:

* config/riscv/autovec-vls.md: Update types
* config/riscv/riscv.md: Add vector placeholder type
* config/riscv/vector.md: Update types

OK
jeff




Re: [PATCH V2] RISC-V: Fix error combine of pred_mov pattern

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/10/23 06:21, Lehua Ding wrote:


+;; vle.v/vse.v,vmv.v.v
+(define_insn_and_split "*pred_mov"
+  [(set (match_operand:V_VLS 0 "nonimmediate_operand""=vr,vr,
vd, m,vr,vr")
+(if_then_else:V_VLS
+  (unspec:
+[(match_operand: 1 "vector_mask_operand"   "vmWc1,   Wc1,
vm, vmWc1,   Wc1,   Wc1")
+ (match_operand 4 "vector_length_operand"  "   rK,rK,
rK,rK,rK,rK")
+ (match_operand 5 "const_int_operand"  "i, i, 
i, i, i, i")
+ (match_operand 6 "const_int_operand"  "i, i, 
i, i, i, i")
+ (match_operand 7 "const_int_operand"  "i, i, 
i, i, i, i")
+ (reg:SI VL_REGNUM)
+ (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+  (match_operand:V_VLS 3 "reg_or_mem_operand"  "m, m, 
m,vr,vr,vr")
+  (match_operand:V_VLS 2 "vector_merge_operand""0,vu,
vu,vu,vu, 0")))]
+  "TARGET_VECTOR && (register_operand (operands[0], mode)
+ || register_operand (operands[3], mode))"

Just a formatting nit in the pattern's condition.

"(TARGET_VECTOR
  && (register_operand (operands[0], mode)
  || register_operand (operands[3], mode)))"

OK with that change.  No need to wait for another approval.  Just update 
the patch, commit and post the committed patch to the list for archival 
purposes.


Thanks, and sorry for the long wait.  I just get busy sometimes.

jeff


Re: [PATCH] RISC-V: Fix error combine of pred_mov pattern

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/11/23 10:30, Lehua Ding wrote:

 > But combine doesn't run at -O0.  So something is inconsistent.  I
 > certainly believe we need to avoid the mem->mem case, but that's
 > independent of combine and affects all optimization levels.

This is a new bug that shows up when running all tests after fixing the combine bug.

OK.  I must have misunderstood.   Thanks for clarifying.



 > I think we can simplify to just
 > !(MEM_P (operands[0]) && MEM_P (operands[1]))

 > I would have expected those to be handled by the constraints rather than
 > the pattern's condition.

Yeh, the condition of the V2 becomes much simpler after split.

That was the hope.  It is worth noting that for simple moves eg movsi,
movdi, movsf, movdf, etc there is a requirement that a single insn 
support all the valid combinations.  But I don't think we've ever had 
that requirement for vector modes and the situations where it's 
important are much less likely to trigger for vector moves.  Even more 
so given how the cond_mov patterns are implemented for RISC-V.


Jeff


Re: [PATCH 1/1] RISC-V: Make "prefetch.i" built-in usable

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/9/23 21:10, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

The "__builtin_riscv_zicbop_cbo_prefetchi" built-in function was so badly
broken that it was practically unusable.  It emitted "prefetch.i" but with no
meaningful arguments.

Though incompatible, this commit completely changes the function prototype
of this built-in and makes it usable.  To minimize the functionality issues,
it renames the built-in to "__builtin_riscv_zicbop_prefetch_i".

gcc/ChangeLog:

* config/riscv/riscv-cmo.def: Fix function prototype.
* config/riscv/riscv.md (riscv_prefetchi_): Fix instruction
prototype.  Remove possible prefetch type argument.
* doc/extend.texi: Document __builtin_riscv_zicbop_prefetch_i.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbop-1.c: Reflect new built-in prototype.
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 688fd697255b..5658c7b7e113 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3273,9 +3273,8 @@
  })
  
  (define_insn "riscv_prefetchi_"

-  [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
-  (match_operand:X 1 "imm5_operand" "i")]
-  UNSPECV_PREI)]
+  [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")]
+UNSPECV_PREI)]
"TARGET_ZICBOP"
"prefetch.i\t%a0"
  )

What I would suggest is making a new predicate that accepts either a
register or a register+offset where the offset fits in a signed 12 bit 
immediate.  Use that for operand 0's predicate and I think this will 
"just work" and cover all the cases supported by the prefetch.i instruction.


Jeff
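
The predicate suggested above boils down to a range check on the offset.  A minimal sketch in plain C, under stated assumptions: struct reg_offset_addr, fits_simm12 and prefetch_address_ok are hypothetical names, not anything in the RISC-V backend, and any further encoding restrictions of prefetch.i are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address form: base register plus constant offset.  */
struct reg_offset_addr
{
  int base_regno;   /* register number, illustrative only */
  long offset;      /* 0 for the plain-register form */
};

/* True if OFFSET fits in a signed 12-bit immediate, i.e. [-2048, 2047].  */
bool fits_simm12(long offset)
{
  return offset >= -2048 && offset <= 2047;
}

/* Sketch of the suggested predicate: accept a register, or a register
   plus an offset that fits in a signed 12-bit immediate.  */
bool prefetch_address_ok(const struct reg_offset_addr *addr)
{
  assert(addr != NULL);
  return addr->base_regno >= 0 && fits_simm12(addr->offset);
}
```

In the real machine description this would be a predicate in the backend rather than a C helper; the sketch only shows the accepted range.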


Re: [RFC PATCH 2/2] RISC-V: Fix documentation of __builtin_riscv_pause

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.

I've pushed this to the trunk.
jeff


Re: [RFC PATCH 1/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-28 Thread Jeff Law via Gcc-patches



On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension
(in the assembler).  However, GCC emits "pause" unconditionally, causing
an assembler error when compiling code with __builtin_riscv_pause while
the 'Zihintpause' extension is disabled.

However, the "pause" instruction code (0x010f) is a HINT and emitting
its instruction code is safe in any environment.

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x010f" instead of "pause" only if the extension is disabled
(making the diagnostics better).
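
For context, a typical use of the built-in is a polite spin-wait.  A minimal sketch, assuming hypothetical helpers cpu_relax and spin_until_set (neither is part of GCC); the __riscv guard only keeps the example compilable on other targets, where the hint is simply omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Emit the RISC-V pause hint where available; a no-op elsewhere.  */
static inline void cpu_relax(void)
{
#if defined(__riscv)
  __builtin_riscv_pause();
#endif
}

/* Hypothetical helper: poll *FLAG up to MAX_SPINS + 1 times, pausing
   between polls.  Returns the number of pauses taken before the flag was
   seen nonzero, or -1 if it never became nonzero within the budget.  */
int spin_until_set(const volatile int *flag, int max_spins)
{
  assert(flag != NULL && max_spins >= 0);
  for (int spins = 0; spins <= max_spins; spins++)
    {
      if (*flag != 0)
        return spins;
      cpu_relax();
    }
  return -1;
}
```

Code like this is what hits the assembler error described above when it is built for a RISC-V target without 'Zihintpause'.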

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_ext_version_table): Implement the 'Zihintpause' extension,
version 2.0.  (riscv_ext_flag_table) Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause" and "hint_pause_pseudo", corresponding
to the existence of the 'Zihintpause' extension.
(riscv_builtins) Split builtin implementation depending on the
existence of the 'Zihintpause' extension.
* config/riscv/riscv-opts.h
(MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Make it only available when
the 'Zihintpause' extension is enabled.  (riscv_pause_insn) New
"pause" implementation when the 'Zihintpause' extension is disabled.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause.c:
New test when the 'Zihintpause' extension is enabled.
* gcc.target/riscv/zihintpause-noarch.c:
New test when the 'Zihintpause' extension is disabled.

So I cleaned this up a bit per the list discussion and pushed the final
result.  Hopefully I didn't goof anything too badly ;-)  The net is we 
emit "pause" or a suitable .insn based on TARGET_ZIHINTPAUSE.


Jeff

commit c2d04dd659c499d8df19f68d0602ad4c7d7065c2
Author: Tsukasa OI 
Date:   Mon Aug 28 15:04:13 2023 -0600

RISC-V: __builtin_riscv_pause for all environment

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension (in
the assembler).  However, GCC emits "pause" unconditionally, causing an
assembler error when compiling code with __builtin_riscv_pause while the
'Zihintpause' extension is disabled.

However, the "pause" instruction code (0x010f) is a HINT and emitting its
instruction code is safe in any environment.

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x010f" instead of "pause" only if the extension is disabled (making
the diagnostics better).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Implement the 'Zihintpause' extension, version 2.0.
(riscv_ext_flag_table) Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause".
(riscv_builtins) : Add "pause" extension.
* config/riscv/riscv-opts.h (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Adjust output based on
TARGET_ZIHINTPAUSE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c: New test when the 'Zihintpause'
extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
* gcc.target/riscv/zihintpause-noarch.c: New test when the 'Zihintpause'
extension is disabled.

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 128a7020172..a5b62cda3a0 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -224,6 +224,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zihintntl", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zihintpause", ISA_SPEC_CLASS_NONE, 2, 0},
 
   {"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1381,6 +1382,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zkt",&gcc_options::x_riscv_zk_subext, MASK_ZKT},
 
   {"zihintntl", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTNTL},
+  {"zihintpause", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTPAUSE},
 
   {"zicboz", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOZ},
   {"zicbom", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOM},
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 79681d75962..8afe7b7e97d 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -122,7 +122,7 @@ AVAIL (clmul_zbkc32_or_zbc32, (TARGET_ZBKC || TARGET_ZBC) 
&& !TARGET_64BIT)
 

[PATCH] MATCH: Move `(x | y) & (~x ^ y)` over to use bitwise_inverted_equal_p

2023-08-28 Thread Andrew Pinski via Gcc-patches
This moves the match pattern `(x | y) & (~x ^ y)` over to use
bitwise_inverted_equal_p.
This now also allows optimizing comparisons and also catches the missed
`(~x | y) & (x ^ y)` transformation into `~x & y`.
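
Both identities are easy to confirm by brute force.  A minimal sketch, assuming hypothetical helper names (lhs_and, lhs_andn, check_all_pairs) and 8-bit bytes; it covers only the plain integer case, not the comparison case handled through wascmp in the pattern.

```c
#include <assert.h>
#include <limits.h>

/* Left-hand side of `(x | y) & (~x ^ y)`, truncated to 8 bits.  */
unsigned char lhs_and(unsigned x, unsigned y)
{
  return (unsigned char) ((x | y) & (~x ^ y));
}

/* Left-hand side of `(~x | y) & (x ^ y)`, truncated to 8 bits.  */
unsigned char lhs_andn(unsigned x, unsigned y)
{
  return (unsigned char) ((~x | y) & (x ^ y));
}

/* Check both identities for every pair of 8-bit values.
   Returns the number of pairs checked.  */
long check_all_pairs(void)
{
  long checked = 0;
  for (unsigned x = 0; x <= UCHAR_MAX; x++)
    for (unsigned y = 0; y <= UCHAR_MAX; y++)
      {
        assert(lhs_and(x, y) == (unsigned char) (x & y));
        assert(lhs_andn(x, y) == (unsigned char) (~x & y));
        checked++;
      }
  return checked;
}
```

Since every operator involved works bit by bit, checking 8-bit values is enough to cover wider integer types as well.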

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/47
* match.pd (`(x | y) & (~x ^ y)`) Use bitwise_inverted_equal_p
instead of matching bit_not.

gcc/testsuite/ChangeLog:

PR tree-optimization/47
* gcc.dg/tree-ssa/cmpbit-4.c: New test.
---
 gcc/match.pd |  7 +++-
 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c | 47 
 2 files changed, 52 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c

diff --git a/gcc/match.pd b/gcc/match.pd
index e6bdc3149b6..47d2733211a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1616,8 +1616,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* (x | y) & (~x ^ y) -> x & y */
 (simplify
- (bit_and:c (bit_ior:c @0 @1) (bit_xor:c @1 (bit_not @0)))
- (bit_and @0 @1))
+ (bit_and:c (bit_ior:c @0 @1) (bit_xor:c @1 @2))
+ (with { bool wascmp; }
+  (if (bitwise_inverted_equal_p (@0, @2, wascmp)
+   && (!wascmp || element_precision (type) == 1))
+   (bit_and @0 @1))))
 
 /* (~x | y) & (x | ~y) -> ~(x ^ y) */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c
new file mode 100644
index 000..cdba5d623af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-4.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+
+int g(int x, int y)
+{
+  int xp = ~x;
+  return (x | y) & (xp ^ y); // x & y
+}
+int g0(int x, int y)
+{
+  int xp = ~x;
+  return (xp | y) & (x ^ y); // ~x & y
+}
+
+_Bool gb(_Bool x, _Bool y)
+{
+  _Bool xp = !x;
+  return (x | y) & (xp ^ y); // x & y
+}
+_Bool gb0(_Bool x, _Bool y)
+{
+  _Bool xp = !x;
+  return (xp | y) & (x ^ y); // !x & y
+}
+
+
+_Bool gbi(int a, int b)
+{
+  _Bool x = a < 2;
+  _Bool y = b < 3;
+  _Bool xp = !x;
+  return (x | y) & (xp ^ y); // x & y
+}
+_Bool gbi0(int a, int b)
+{
+  _Bool x = a < 2;
+  _Bool y = b < 3;
+  _Bool xp = !x;
+  return (xp | y) & (x ^ y); // !x & y
+}
+
+/* All of these should be optimized to `x & y` or `~x & y` */
+/* { dg-final { scan-tree-dump-times "le_expr, " 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "gt_expr, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_and_expr, " 6 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_not_expr, " 2 "optimized" } } */
-- 
2.31.1



[PATCH ver 4] rs6000, add overloaded DFP quantize support

2023-08-28 Thread Carl Love via Gcc-patches


GCC maintainers:

Version 4, additional define_insn name fix.  Change Log fix for the
UNSPEC_DQUAN.  Retested patch on Power 10 LE.

Version 3, fixed the built-in instance names.  Missed removing the "n"
from the name.  Added the tighter constraints on the predicates for the
define_insn.  Updated the wording for the built-ins in the
documentation file.  Changed the test file name again.  Updated the
ChangeLog file, added the PR target line.  Retested the patch on Power
10LE and Power 8 and Power 9.

Version 2, renamed the built-in instances.  Changed the name of the
overloaded built-in.  Added the missing documentation for the new
built-ins.  Fixed typos.  Changed name of the test.  Updated the
effective target for the test.  Retested the patch on Power 10LE and
Power 8 and Power 9.

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new builtins.

The patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love



rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int RM)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md (UNSPEC_DQUAN): New unspec.
(dfp_dqua_, dfp_dquai_): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448
---
 gcc/config/rs6000/dfp.md   |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def  |  15 ++
 gcc/config/rs6000/rs6000-overload.def  |  10 ++
 gcc/doc/extend.texi|  17 ++
 gcc/testsuite/gcc.target/powerpc/pr93448.c | 200 +
 5 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..bf4a227b0eb 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@ (define_c_enum "unspec"
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@ (define_insn "dfp_dscri_"
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_dqua_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_dquai_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "s5bit_cint_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 8a294d6c934..ce40600e803 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_dqua (_Decimal64, _Decimal64, \
+  const int<2>);
+DFPQUAN_64

Re: [PATCH ver 3] rs6000, add overloaded DFP quantize support

2023-08-28 Thread Carl Love via Gcc-patches
On Mon, 2023-08-28 at 10:21 +0800, Kewen.Lin wrote:
> Hi Carl,



> > 
> > A testcase is added for the new built-in definitions.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/dfp.md: New UNSPEC_DQUAN.
> 
> Nit: (UNSPEC_DQUAN): New unspec.

Fixed.

> 



> > +(define_insn "dfp_dqua_"
> > +  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
> > +(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
> > + (match_operand:DDTD 2 "gpc_reg_operand" "d")
> > + (match_operand:SI 3 "const_0_to_3_operand" "n")]
> > + UNSPEC_DQUAN))]
> > +  "TARGET_DFP"
> > +  "dqua %0,%1,%2,%3"
> > +  [(set_attr "type" "dfp")
> > +   (set_attr "size" "")])
> > +
> > +(define_insn "dfp_dqua_i"
> 
> Sorry for nitpicking, but what I suggested previously was
> "dfp_dquai_"
> instead of "dfp_dqua_i", "dquai" matches the according mnemonic so
> it's
> read better, i expands to "idd" and "itd" that look odd to me.
> Do you agree "dquai" is better?  If yes, the changelog and the
> related
> expanders need to be updated as well.
> 
> The others look good to me, thanks!

We need to get it right, so don't be sorry for nitpicking.  My bad for
not getting it right the first time.

Fixed.


Carl 



[PATCH] Fix cond-bool-2.c on powerpc and other targets

2023-08-28 Thread Andrew Pinski via Gcc-patches
This adds `--param logical-op-non-short-circuit=1` to the testcase
so it becomes a target-independent testcase now.
I filed PR 111217 as the variant of the testcase which fails independently
of the param.

Committed as obvious after testing to make sure it passes on powerpc now.

gcc/testsuite/ChangeLog:

PR testsuite/111215
* gcc.dg/tree-ssa/cond-bool-2.c: Add
`--param logical-op-non-short-circuit=1` to the options.
---
 gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c
index b3e7e25dec6..7de89cc0de2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cond-bool-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* { dg-options "-O2 --param logical-op-non-short-circuit=1 -fdump-tree-optimized-raw" } */
 
 /* PR tree-optimization/95929 */
 
-- 
2.31.1



Re: [PATCH] fortran: Restore interface to its previous state on error [PR48776]

2023-08-28 Thread Harald Anlauf via Gcc-patches

Hi Mikael,

On 8/27/23 21:22, Mikael Morin via Gcc-patches wrote:

Hello,

this fixes an old error-recovery bug.
Tested on x86_64-pc-linux-gnu.

OK for master?


I have only a minor comment:


+/* Free the leading members of the gfc_interface linked list given in INTR
+   up to the END element (exclusive: the END element is not freed).
+   If END is not nullptr, it is assumed that END is in the linked list starting
+   with INTR.  */
+
+static void
+free_interface_elements_until (gfc_interface *intr, gfc_interface *end)
+{
+  gfc_interface *next;
+
+  for (; intr != end; intr = next)


Would it make sense to add a protection for intr == NULL, i.e.:

+  for (; intr && intr != end; intr = next)

Just to prevent a NULL pointer dereference in case there
is a corruption of the chain or something else went wrong.
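
The effect of the extra check can be shown on a stripped-down list.  A minimal sketch in plain C, assuming a hypothetical struct node in place of gfc_interface and hypothetical helpers free_until and push; it is not the Fortran front end code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for gfc_interface: only the link matters here.  */
struct node
{
  struct node *next;
};

/* Free list elements from HEAD up to, but not including, END.
   The extra "head != NULL" test stops the walk if END is never reached.
   Returns the number of elements freed.  */
int free_until(struct node *head, struct node *end)
{
  int freed = 0;
  struct node *next;

  for (; head != NULL && head != end; head = next)
    {
      next = head->next;
      free(head);
      freed++;
    }
  return freed;
}

/* Helper for the example: allocate a node in front of TAIL.  */
struct node *push(struct node *tail)
{
  struct node *n = malloc(sizeof *n);
  assert(n != NULL);
  n->next = tail;
  return n;
}
```

If END is not in the chain, the guarded loop frees the whole list and stops at the terminating null pointer, where the unguarded loop would dereference it.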

Otherwise it looks good to me.

It appears that your patch similarly fixes PR107923.  :-)

Thanks for the patch!

Harald




[PATCH] RISC-V: Add Types to Un-Typed Vector Instructions:

2023-08-28 Thread Edwin Lu
Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates vector instructions to ensure that no insn is left
without a type attribute. It creates a placeholder type "vector" for insns
where a type isn't clear.

Tested for regressions using rv32/rv64 gc/gcv multilib with newlib/linux. 

gcc/Changelog:

* config/riscv/autovec-vls.md: Update types
* config/riscv/riscv.md: Add vector placeholder type
* config/riscv/vector.md: Update types

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/autovec-vls.md | 15 ---
 gcc/config/riscv/riscv.md   |  3 ++-
 gcc/config/riscv/vector.md  | 17 -
 3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
index 1b1d940d779..35b86de25c7 100644
--- a/gcc/config/riscv/autovec-vls.md
+++ b/gcc/config/riscv/autovec-vls.md
@@ -68,6 +68,7 @@ (define_insn_and_split "*mov_mem_to_mem"
   }
 DONE;
   }
+  [(set_attr "type" "vmov")]
 )
 
 (define_insn_and_split "*mov"
@@ -89,6 +90,7 @@ (define_insn_and_split "*mov"
 gcc_assert (ok_p);
 DONE;
   }
+  [(set_attr "type" "vmov")]
 )
 
 (define_expand "mov"
@@ -130,7 +132,9 @@ (define_insn_and_split "*mov_lra"
riscv_vector::RVV_UNOP, operands, 
operands[2]);
 }
   DONE;
-})
+}
+  [(set_attr "type" "vmov")]
+)
 
 (define_insn "*mov_vls"
   [(set (match_operand:VLS 0 "register_operand" "=vr")
@@ -157,6 +161,7 @@ (define_insn_and_split "@vec_duplicate"
riscv_vector::RVV_UNOP, operands);
 DONE;
   }
+  [(set_attr "type" "vector")]
 )
 
 ;; -
@@ -180,7 +185,9 @@ (define_insn_and_split "3"
   riscv_vector::emit_vlmax_insn (code_for_pred (, mode),
 riscv_vector::RVV_BINOP, operands);
   DONE;
-})
+}
+[(set_attr "type" "vector")]
+)
 
 ;; 
---
 ;;  [INT] Unary operations
@@ -201,4 +208,6 @@ (define_insn_and_split "2"
   insn_code icode = code_for_pred (, mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
-})
+}
+[(set_attr "type" "vector")]
+)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 47d14d99903..4d062307ad9 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -410,6 +410,7 @@ (define_attr "ext_enabled" "no,yes"
 ;; vgather  vector register gather instructions
 ;; vcompressvector compress instruction
 ;; vmov whole vector register move
+;; vector   unknown vector instruction
 (define_attr "type"
   "unknown,branch,jump,call,load,fpload,store,fpstore,
mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
@@ -429,7 +430,7 @@ (define_attr "type"
vired,viwred,vfredu,vfredo,vfwredu,vfwredo,
vmalu,vmpop,vmffs,vmsfs,vmiota,vmidx,vimovvx,vimovxv,vfmovvf,vfmovfv,
vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,
-   vgather,vcompress,vmov"
+   vgather,vcompress,vmov,vector"
   (cond [(eq_attr "got" "load") (const_string "load")
 
 ;; If a doubleword move uses these expensive instructions,
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index a442e0fdd3c..ea836968878 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -878,13 +878,15 @@ (define_insn "@vundefined"
   [(set (match_operand:V 0 "register_operand" "=vr")
(unspec:V [(reg:SI X0_REGNUM)] UNSPEC_VUNDEF))]
   "TARGET_VECTOR"
-  "")
+  ""
+  [(set_attr "type" "vector")])
 
 (define_insn "@vundefined"
   [(set (match_operand:VB 0 "register_operand" "=vr")
(unspec:VB [(reg:SI X0_REGNUM)] UNSPEC_VUNDEF))]
   "TARGET_VECTOR"
-  "")
+  ""
+  [(set_attr "type" "vector")])
 
 (define_expand "@vreinterpret"
   [(set (match_operand:V 0 "register_operand")
@@ -935,7 +937,8 @@ (define_insn "@vlmax_avl"
   [(set (match_operand:P 0 "register_operand" "=r")
(unspec:P [(match_operand:P 1 "const_int_operand" "i")] UNSPEC_VLMAX))]
   "TARGET_VECTOR"
-  "")
+  ""
+  [(set_attr "type" "vector")])
 
 ;; Set VXRM
 (define_insn "vxrmsi"
@@ -1135,7 +1138,9 @@ (define_insn_and_split "*mov_lra"
 riscv_vector::RVV_UNOP, operands, 
operands[2]);
 }
   DONE;
-})
+}
+[(set_attr "type" "vector")]
+)
 
 (define_insn_and_split "*mov_lra"
   [(set (match_operand:VB 0 "reg_or_mem_operand" "=vr, m,vr")
@@ -1155,7 +1160,9 @@ (define_insn_and_split "*mov_lra"
 riscv_vector::RVV_UNOP, operands, 
operands[2]);
 }
   DONE;
-})
+}
+[(set_attr "type" "vector")]
+)
 
 ;; Define tuple modes data movement.
 ;; operands[2] is used to save the offset of each subpart.
-- 
2.34.1



Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Jeff Law via Gcc-patches




On 8/28/23 08:27, Robin Dapp via Gcc-patches wrote:

LGTM from my side, but I would like to wait until Robin is OK with it too


In principle I'm OK with it as well, realizing we will still need to fine-tune
a lot here anyway.  For now, IMHO it's good to have some additional test
coverage in the vector space but we should not expect every test to be
correct/a good match for everything we do yet.  Juzhe mentioned he doesn't
want to commit this before all/most bugs are addressed anyway, right?

No strong opinions on my side.  We could enable now knowing there are
failures and that list of failures becomes a TODO list.  Or we could
wait for more to be working before committing to keep our test results
reasonably clean.


I could make an argument for either direction, but since Juzhe & Robin 
are doing most of the autovec work, happy to go with whatever they prefer.


jeff


[April 2022 PING] cpp: new built-in __EXP_COUNTER__

2023-08-28 Thread Kaz Kylheku via Gcc-patches
On 2022-06-13 16:13, Kaz Kylheku wrote:
> Pinging this item:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593473.html
> 
> Thanks.


[PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-08-28 Thread Manos Anagnostakis
This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
to provide the requested behaviour for handling ldp and stp:

  /* Allow the tuning structure to disable LDP instruction formation
 from combining instructions (e.g., in peephole2).
 TODO: Implement fine-grained tuning control for LDP and STP:
   1. control policies for load and store separately;
   2. support the following policies:
  - default (use what is in the tuning structure)
  - always
  - never
  - aligned (only if the compiler can prove that the
load will be aligned to 2 * element_size)  */

It provides two new and concrete command-line options -mldp-policy and
-mstp-policy to give the ability to control load and store policies
separately as stated in part 1 of the TODO.

The accepted values for both options are:
- default: Use the ldp/stp policy defined in the corresponding tuning
  structure.
- always: Emit ldp/stp regardless of alignment.
- never: Do not emit ldp/stp.
- aligned: In order to emit ldp/stp, first check if the load/store will
  be aligned to 2 * element_size.
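
The four values can be read as a small decision function.  A minimal sketch in plain C, under stated assumptions: enum pair_policy, parse_pair_policy and pair_allowed are hypothetical names for illustration, not the enums or functions this patch adds, and the known alignment is passed in as a byte count.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names; the patch defines its own enums.  */
enum pair_policy
{
  PAIR_POLICY_DEFAULT,
  PAIR_POLICY_ALWAYS,
  PAIR_POLICY_NEVER,
  PAIR_POLICY_ALIGNED
};

/* Map an option string to a policy.  Returns false for unknown values.  */
bool parse_pair_policy(const char *arg, enum pair_policy *out)
{
  /* Order matches the enumerators above.  */
  static const char *const names[] =
    { "default", "always", "never", "aligned" };

  for (int i = 0; i < 4; i++)
    if (strcmp(arg, names[i]) == 0)
      {
        *out = (enum pair_policy) i;
        return true;
      }
  return false;
}

/* Decide whether a pair may be formed.  TUNING is the policy from the
   tuning structure and must not itself be PAIR_POLICY_DEFAULT.  */
bool pair_allowed(enum pair_policy requested, enum pair_policy tuning,
                  unsigned align_bytes, unsigned element_size)
{
  assert(tuning != PAIR_POLICY_DEFAULT && element_size > 0);
  enum pair_policy p = requested == PAIR_POLICY_DEFAULT ? tuning : requested;

  switch (p)
    {
    case PAIR_POLICY_ALWAYS:
      return true;
    case PAIR_POLICY_NEVER:
      return false;
    default: /* PAIR_POLICY_ALIGNED */
      return align_bytes % (2 * element_size) == 0;
    }
}
```

For example, with 8-byte elements the "aligned" policy accepts a 16-byte aligned access and rejects one that is only known to be 8-byte aligned.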

gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (struct tune_params): Add
appropriate enums for the policies.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
options.
* config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
function to parse ldp-policy option.
(aarch64_parse_stp_policy): New function to parse stp-policy option.
(aarch64_override_options_internal): Call parsing functions.
(aarch64_operands_ok_for_ldpstp): Add option-value check and
alignment check and remove superseded ones.
(aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
alignment check and remove superseded ones.
* config/aarch64/aarch64.opt: Add options.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ldp_aligned.c: New test.
* gcc.target/aarch64/ldp_always.c: New test.
* gcc.target/aarch64/ldp_never.c: New test.
* gcc.target/aarch64/stp_aligned.c: New test.
* gcc.target/aarch64/stp_always.c: New test.
* gcc.target/aarch64/stp_never.c: New test.

Signed-off-by: Manos Anagnostakis 
---
Changes in v2:
- Fixed committed ldp tests to correctly trigger
  and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
- Added "-mcpu=generic" to committed tests to guarantee generic target code
  generation and not cause the regressions of v1.

 gcc/config/aarch64/aarch64-protos.h   |  24 ++
 gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
 gcc/config/aarch64/aarch64.cc | 229 ++
 gcc/config/aarch64/aarch64.opt|   8 +
 .../gcc.target/aarch64/ldp_aligned.c  |  66 +
 gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
 gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
 .../gcc.target/aarch64/stp_aligned.c  |  60 +
 gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
 gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
 10 files changed, 586 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 70303d6fd95..be1d73490ed 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -568,6 +568,30 @@ struct tune_params
   /* Place prefetch struct pointer at the end to enable type checking
  errors when tune_params misses elements (e.g., from erroneous merges).  */
   const struct cpu_prefetch_tune *prefetch;
+/* An enum specifying how to handle load pairs using a fine-grained policy:
+   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
+   to at least double the alignment of the type.
+   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
+   - LDP_POLICY_NEVER: Do not emit ldp.  */
+
+  enum aarch64_ldp_policy_model
+  {
+LDP_POLICY_ALIGNED,
+LDP_POLICY_ALWAYS,
+LDP_POLICY_NEVER
+  } ldp_policy_model;
+/* An enum specifying how to handle store pairs using a fine-grained policy:
+   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
+   to at least double the alignment of the type.
+   - STP_POLICY_ALWAYS: Emit stp regardless of alignment.
+   - STP_POLICY_NEVER: Do not emit stp.  */
+
+  enum aarch64_stp_policy_model
+  {
+STP_POLICY_ALIGNED,
+STP_POLICY_ALWAYS,
+STP_POLICY_NEVER
+  } stp_policy

[RFC] > WIDE_INT_MAX_PREC support in wide-int

2023-08-28 Thread Jakub Jelinek via Gcc-patches
Hi!

While the _BitInt series isn't committed yet, I had a quick look at
lifting the current lowest limitation on maximum _BitInt precision,
that wide_int can only support precisions up to WIDE_INT_MAX_PRECISION - 1.

Note, other limits if that is lifted are INTEGER_CST currently using 3
unsigned char members and so being able to only hold up to 255 * 64 = 16320
bit numbers and then TYPE_PRECISION being 16-bit, so limiting us to 65535
bits.  The INTEGER_CST limit could be dealt with by dropping the
int_length.offset "cache" and making int_length.extended and
int_length.unextended members unsigned short rather than unsigned char.
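
The arithmetic behind these limits can be written down as a tiny C++ sketch. The helper name max_precision_bits is made up for illustration, and 64 is assumed as the number of bits per HOST_WIDE_INT element.

```cpp
// Largest precision (in bits) representable when the element count is
// stored in a field whose maximum value is `max_len` and each element
// holds `bits_per_elt` bits.
constexpr unsigned long max_precision_bits(unsigned long max_len,
                                           unsigned long bits_per_elt)
{
  return max_len * bits_per_elt;
}

// unsigned char length fields: 255 elements of 64 bits each.
static_assert(max_precision_bits(255, 64) == 16320,
              "matches the 16320 bit figure quoted above");
```

With 16-bit length fields the same formula gives 65535 * 64 bits, so the 16-bit TYPE_PRECISION (65535 bits) would then be the tighter bound.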

The following patch, so far just compile tested, changes wide_int_storage
to be a union; for precisions up to WIDE_INT_MAX_PRECISION inclusive it
will work as before (just no longer being a trivially copyable type and
having an inline destructor), while larger precisions instead use a pointer
to a heap allocated array.
For wide_int this is fairly easy (of course, I'd need to see what the
patch does to gcc code size and compile time performance, some
growth/slowdown is certain), but I'd like to brainstorm on
widest_int/widest2_int.
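
The inline-array-or-heap-pointer idea can be sketched in isolation as below. This is only an illustration under assumptions, not the patch itself: the class name small_or_heap_storage, the alias hwi, and the constants kHwiBits, kInlineElts and kInlinePrecision are hypothetical stand-ins for wide_int_storage, HOST_WIDE_INT and WIDE_INT_MAX_PRECISION, and copying is simply disabled instead of being implemented.

```cpp
#include <cstdint>

// Hypothetical stand-ins for HOST_WIDE_INT and WIDE_INT_MAX_PRECISION.
using hwi = std::int64_t;
constexpr unsigned kHwiBits = 64;
constexpr unsigned kInlineElts = 4;
constexpr unsigned kInlinePrecision = kInlineElts * kHwiBits;

class small_or_heap_storage
{
  union
  {
    hwi val[kInlineElts];  // used when precision <= kInlinePrecision
    hwi *valp;             // heap array for larger precisions
  } u;
  unsigned precision;

  // Number of elements needed to hold `prec` bits, rounded up.
  static unsigned elts_for(unsigned prec)
  {
    return (prec + kHwiBits - 1) / kHwiBits;
  }

public:
  explicit small_or_heap_storage(unsigned prec) : precision(prec)
  {
    if (on_heap())
      u.valp = new hwi[elts_for(prec)]();  // zero-initialized
    else
      for (unsigned i = 0; i < kInlineElts; ++i)
        u.val[i] = 0;
  }

  // Copying is left out of this sketch to keep it short.
  small_or_heap_storage(const small_or_heap_storage &) = delete;
  small_or_heap_storage &operator=(const small_or_heap_storage &) = delete;

  // The destructor is what makes the type no longer trivially copyable.
  ~small_or_heap_storage()
  {
    if (on_heap())
      delete[] u.valp;
  }

  bool on_heap() const { return precision > kInlinePrecision; }
  hwi *write_val() { return on_heap() ? u.valp : u.val; }
  const hwi *get_val() const { return on_heap() ? u.valp : u.val; }
  unsigned get_precision() const { return precision; }
};
```

Every accessor goes through get_val/write_val, so callers never need to know which union member is active.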

Currently it is a constant precision storage with WIDE_INT_MAX_PRECISION
precision (widest2_int twice that), so memory layout-wide on at least 64-bit
hosts identical to wide_int, just it doesn't have precision member and so
32 bits smaller on 32-bit hosts.  It is used in lots of places.

I think the most common is what is done e.g. in tree_int_cst* comparisons
and similarly, using wi::to_widest () to just compare INTEGER_CSTs.
That case actually doesn't even use wide_int but widest_extended_tree
as storage, unless stored into widest_int in between (that happens in
various spots as well).  For comparisons, it would be fine if
widest_int_storage/widest_extended_tree storages had a dynamic precision,
WIDE_INT_MAX_PRECISION for most of the cases (if only
precision < WIDE_INT_MAX_PRECISION is involved), otherwise the needed
precision (e.g. for binary ops) which would be what we say have in
INTEGER_CST or some type, rounded up to whole multiples of HOST_WIDE_INTs
and if unsigned with multiple of HOST_WIDE_INT precision, have another
HWI to make it always sign-extended.

Another common case is how e.g. tree-ssa-ccp.cc uses them, that is mostly
for bitwise ops and so I think the above would be just fine for that case.

Another case is how tree-ssa-loop-niter.cc uses it, I think for such a usage
it really wants something widest, perhaps we could just try to punt for
_BitInt(N) for N >= WIDE_INT_MAX_PRECISION in there, so that we never care
about bits beyond that limit?

Some passes only use widest_int after the bitint lowering spot, we don't
really need to care about those.

I think another possibility could be to make widest_int_storage etc. always
pretend it has 65536 bit precision or something similarly large and make the
decision on whether inline array or pointer is used in the storage be done
using len.  Unfortunately, set_len method is usually called after filling
the array, not before it (it even sign-extends some cases, so it has to be
done that late).

Or for e.g. binary ops compute widest_int precision based on the 2 (for
binary) or 1 (for unary) operand's .len involved?

Thoughts on this?

Note, the wide-int.cc change is just to show it does something, it would be
a waste to put that into self-test when _BitInt can support such sizes.

2023-08-28  Jakub Jelinek  

* wide-int.h (wide_int_storage): Replace val member with a union of
val and valp.  Declare destructor.
(wide_int_storage::wide_int_storage): Initialize precision to 0
in default ctor.  Allocate u.valp if needed in copy ctor.
(wide_int_storage::~wide_int_storage): New.
(wide_int_storage::operator =): Delete and/or allocate u.valp if
needed.
(wide_int_storage::get_val, wide_int_storage::write_val): Return
u.valp for precision > WIDE_INT_MAX_PRECISION, otherwise u.val.
(wide_int_storage::set_len): Use write_val instead of accessing
val directly.
(wide_int_storage::create): Allocate u.valp if needed.
* value-range.h (irange::maybe_resize): Use a loop instead of
memcpy.
* wide-int.cc (wide_int_cc_tests): Add a test for 4096 bit wide_int
addition.

--- gcc/wide-int.h.jj   2023-06-07 09:42:14.997126190 +0200
+++ gcc/wide-int.h  2023-08-28 15:09:06.498448770 +0200
@@ -1065,7 +1065,11 @@ namespace wi
 class GTY(()) wide_int_storage
 {
 private:
-  HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
+  union
+  {
+HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
+HOST_WIDE_INT *valp;
+  } GTY((skip)) u;
   unsigned int len;
   unsigned int precision;
 
@@ -1073,6 +1077,7 @@ public:
   wide_int_storage ();
   template <typename T>
   wide_int_storage (const T &);
+  ~wide_int_storage ();
 
   /* The standard generic_wide_int storage methods.  */
   unsigned int get_precision () const;
@@ -1104,7 +1109,7 @@ na

Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Robin Dapp via Gcc-patches
> LGTM from my side, but I would like to wait until Robin is OK with it too

In principle I'm OK with it as well, realizing we will still need to fine-tune
a lot here anyway.  For now, IMHO it's good to have some additional test
coverage in the vector space but we should not expect every test to be
correct/a good match for everything we do yet.  Juzhe mentioned he doesn't
want to commit this before all/most bugs are addressed anyway, right?

Regards
 Robin


Re: [PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Kito Cheng via Gcc-patches
LGTM from my side, but I would like to wait until Robin is OK with it too

Juzhe-Zhong wrote on Mon, Aug 28, 2023 at 19:43:

> XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER
> LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect
> "vect_recog_widen_mult_pattern: detected" 1
> FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect
> "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect
> "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-vfa-vect-101.c scan-tree-dump-times vect "can't
> determine dependence" 1
> FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect "possible
> dependence between data-refs" 1
> FAIL: gcc.dg/vect/no-vfa-vect-102a.c scan-tree-dump-times vect "possible
> dependence between data-refs" 1
> FAIL: gcc.dg/vect/no-vfa-vect-37.c scan-tree-dump-times vect "can't
> determine dependence" 2
> FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorized 1 loop" 2
> FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
> FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr63341-1.c execution test
> FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr63341-2.c execution test
> FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump vect
> "can't force alignment"
> FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump-not
> vect "misalign = 0"
> FAIL: gcc.dg/vect/pr65310.c scan-tree-dump vect "can't force alignment"
> FAIL: gcc.dg/vect/pr65310.c scan-tree-dump-not vect "misalign = 0"
> FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorized 0 loops in function" 2
> FAIL: gcc.dg/vect/pr65518.c scan-tree-dump-times vect "vectorized 0 loops
> in function" 2
> FAIL: gcc.dg/vect/pr68445.c -flto -ffat-lto-objects  scan-tree-dump vect
> "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/pr68445.c scan-tree-dump vect "vectorizing stmts using
> SLP"
> FAIL: gcc.dg/vect/pr88598-1.c -flto -ffat-lto-objects  scan-tree-dump-not
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-1.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-2.c -flto -ffat-lto-objects  scan-tree-dump-not
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-2.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-3.c -flto -ffat-lto-objects  scan-tree-dump-not
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-3.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr94994.c execution test
> FAIL: gcc.dg/vect/pr97835.c -flto -ffat-lto-objects  scan-tree-dump vect
> "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/pr97835.c scan-tree-dump vect "vectorizing stmts using
> SLP"
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts
> using SLP" 4
> FAIL: gcc.dg/vect/slp-11a.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-11a.c scan-tree-dump-times vect "vectorized 0 loops"
> 1
> FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorized 0 loops"
> 1
> FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorized 0 loops"
> 1
> FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorizing stmts
> using SLP" 0
> FAIL: gcc.dg/vect/slp-15.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-15.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vect "vectorizing stmts
> using SLP" 0
> FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times
> vect "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorized 0 loops"
> 1
>

[PATCH] libcpp, v2: Small incremental patch for P1854R4 [PR110341]

2023-08-28 Thread Jakub Jelinek via Gcc-patches
Hi!

Sorry, testing revealed an unused uchar *outbuf; declaration breaking the
build, here is the same patch with that one line removed,
bootstrapped/regtested on x86_64-linux and i686-linux (on top of the earlier
PR110341 patch).

On Sat, Aug 26, 2023 at 01:11:06PM +0200, Jakub Jelinek via Gcc-patches wrote:
> The following incremental patch to the PR110341 posted patch uses
> a special conversion callback instead of conversion from host charset
> (UTF-8/UTF-EBCDIC) to UTF-32, and also ignores all diagnostics from the
> second cpp_interpret_string which should just count chars.  The UTF-EBCDIC
> is untested, but simple enough that it should just work.

2023-08-28  Jakub Jelinek  

PR c++/110341
* charset.cc (one_count_chars, convert_count_chars): New functions.
(narrow_str_to_charconst): Call cpp_interpret_string with type
rather than CPP_STRING32, temporarily override for that call
pfile->cb.diagnostic to noop_diagnostic_cb and
pfile->narrow_cset_desc.func to convert_count_chars and just compare
str.len against str2.len.
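
The counting idea can be illustrated outside libcpp with a small, self-contained C++ sketch. It is not the code from the patch: the function name count_utf8_chars is hypothetical, only UTF-8 input is handled (no UTF-EBCDIC), and the validation is deliberately simplified (overlong encodings and surrogates are not rejected).

```cpp
#include <cstddef>
#include <string>

// Returns the number of UTF-8 encoded characters in `s`, or -1 if the
// input is truncated or has a malformed lead or continuation byte.
long count_utf8_chars(const std::string &s)
{
  long count = 0;
  std::size_t i = 0;
  while (i < s.size())
    {
      unsigned char lead = static_cast<unsigned char>(s[i]);
      std::size_t len;
      if (lead < 0x80)
        len = 1;                      // ASCII
      else if ((lead & 0xE0) == 0xC0)
        len = 2;
      else if ((lead & 0xF0) == 0xE0)
        len = 3;
      else if ((lead & 0xF8) == 0xF0)
        len = 4;
      else
        return -1;                    // stray continuation or invalid lead

      if (s.size() - i < len)
        return -1;                    // truncated sequence

      // Every following byte must look like 10xxxxxx.
      for (std::size_t k = 1; k < len; ++k)
        if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
          return -1;

      i += len;
      ++count;
    }
  return count;
}
```

Comparing this count against the length of the string in the execution character set is the kind of check the ChangeLog entry describes with str.len and str2.len.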

--- libcpp/charset.cc.jj   2023-08-25 17:14:14.098733396 +0200
+++ libcpp/charset.cc   2023-08-28 12:57:44.858858994 +0200
@@ -446,6 +446,73 @@ one_utf16_to_utf8 (iconv_t bigend, const
   return 0;
 }
 
+
+/* Special routine which just counts number of characters in the
+   string, what exactly is stored into the output doesn't matter
+   as long as it is one uchar per character.  */
+
+static inline int
+one_count_chars (iconv_t, const uchar **inbufp, size_t *inbytesleftp,
+uchar **outbufp, size_t *outbytesleftp)
+{
+  cppchar_t s = 0;
+  int rval;
+
+  /* Check for space first, since we know exactly how much we need.  */
+  if (*outbytesleftp < 1)
+return E2BIG;
+
+#if HOST_CHARSET == HOST_CHARSET_ASCII
+  rval = one_utf8_to_cppchar (inbufp, inbytesleftp, &s);
+  if (rval)
+return rval;
+#else
+  if (*inbytesleftp < 1)
+return EINVAL;
+  static const uchar utf_ebcdic_map[256] = {
+/* See table 4 in http://unicode.org/reports/tr16/tr16-7.2.html  */
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 1, 1, 1, 1,
+1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 1, 1, 1, 1, 1,
+1, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 1, 1, 1, 1,
+9, 9, 9, 9, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
+2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
+2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
+2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2,
+2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 3, 3,
+1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3,
+1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4,
+1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 5, 5, 5,
+1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 6, 6, 7, 7, 0
+  };
+  rval = utf_ebcdic_map[**inbufp];
+  if (rval == 9)
+return EILSEQ;
+  if (rval == 0)
+rval = 1;
+  if (rval >= 2)
+{
+  if (*inbytesleftp < rval)
+   return EINVAL;
+  for (int i = 1; i < rval; ++i)
+   if (utf_ebcdic_map[(*inbufp)[i]] != 9)
+ return EILSEQ;
+}
+  *inbytesleftp -= rval;
+  *inbufp += rval;
+#endif
+
+  **outbufp = ' ';
+
+  *outbufp += 1;
+  *outbytesleftp -= 1;
+  return 0;
+}
+
+
 /* Helper routine for the next few functions.  The 'const' on
one_conversion means that we promise not to modify what function is
pointed to, which lets the inliner see through it.  */
@@ -529,6 +596,15 @@ convert_utf32_utf8 (iconv_t cd, const uc
   return conversion_loop (one_utf32_to_utf8, cd, from, flen, to);
 }
 
+/* Magic conversion which just counts characters from input, so
+   only to->len is significant.  */
+static bool
+convert_count_chars (iconv_t cd, const uchar *from,
+size_t flen, struct _cpp_strbuf *to)
+{
+  return conversion_loop (one_count_chars, cd, from, flen, to);
+}
+
 /* Identity conversion, used when we have no alternative.  */
 static bool
 convert_no_conversion (iconv_t cd ATTRIBUTE_UNUSED,
@@ -2623,15 +2699,22 @@ narrow_str_to_charconst (cpp_reader *pfi
 ill-formed.  We need to count the number of c-chars and compare
 that to str.len.  */
   cpp_string str2 = { 0, 0 };
-  if (cpp_interpret_string (pfile, &token->val.str, 1, &str2,
-   CPP_STRING32))
+  bool (*saved_diagnostic_handler) (cpp_reader *, enum cpp_diagnostic_level,
+   enum cpp_warning_reason, rich_location *,
+   const char *, va_list *)
+   ATTRIBUTE_FPTR_PRINTF(5,0);
+  saved_diagnostic_handler = pfile->cb.diagnostic;
+  pfile->cb.diagnostic = noop_diagnostic_cb;
+  convert_f save_func = pfile->narrow_cset_desc.func;
+  pfile->narrow_cset_desc.func = convert_count_chars;
+  bool ret = cpp_interpret_string 

[PATCH] c++, v2: Fix up mangling of function/block scope static structured bindings and emit abi tags [PR111069]

2023-08-28 Thread Jakub Jelinek via Gcc-patches
Hi!

On Thu, Aug 24, 2023 at 06:39:10PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Maybe do this in mangle_decomp, based on the actual mangling in process
> > instead of this pseudo-mangling?
> 
> Not sure that is possible, for 2 reasons:
> 1) determine_local_discriminator otherwise works on DECL_NAME, not mangled
>names, so if one uses (albeit implementation reserved)
>_ZZN1N3fooI1TB3bazEEivEDC1h1iEB6foobar and similar identifiers, they
>could clash with the counting of the structured bindings
> 2) seems the local discriminator counting shouldn't take into account
>details like abi tags, e.g. if I have:

The following updated patch handles everything except it leaves for the
above 2 reasons the determination of local discriminator where it was.
I had to add a new (defaulted) argument to cp_finish_decl and do
cp_maybe_mangle_decomp from there, so that it is after e.g. auto type
deduction and maybe_commonize_var (which had to be changed as well) and
spots in cp_finish_decl where we need or might need mangled names already.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

There is one difference between g++ with this patch and clang++,
g++ uses
_ZZ3barI1TB3quxEivEDC1o1pEB3qux
while clang++ uses
_ZZ3barI1TB3quxEivEDC1o1pE
but from what I can see, such a difference is there also when just using
normal local decls:
struct [[gnu::abi_tag ("foobar")]] S { int i; };
struct [[gnu::abi_tag ("qux")]] T { int i; S j; int k; };

inline int
foo ()
{
  static S c;
  static T d;
  return ++c.i + ++d.i;
}

template <typename>
inline int
bar ()
{
  static S c;
  static T d;
  return ++c.i + ++d.i;
}

int (*p) () = &foo;
int (*q) () = &bar<T>;
where both compilers mangle c in foo as:
_ZZ3foovE1cB6foobar
and d in there as
_ZZ3foovE1dB3qux
and similarly both compilers mangle c in bar as
_ZZ3barI1TB3quxEivE1cB6foobar
but g++ mangles d in bar as
_ZZ3barI1TB3quxEivE1dB3qux
while clang++ mangles it as just
_ZZ3barI1TB3quxEivE1d
No idea what is right or wrong according to Itanium mangling.
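
For reference, the tag suffixes seen in the names above follow the pattern 'B', then the tag length, then the tag itself. A minimal C++ sketch of that encoding, with the hypothetical helper name append_abi_tags and no attempt to model which tags a compiler decides to emit:

```cpp
#include <string>
#include <vector>

// Appends each tag to `mangled` as 'B' + <length> + <identifier>,
// e.g. the tag "foobar" becomes "B6foobar".
std::string append_abi_tags(std::string mangled,
                            const std::vector<std::string> &tags)
{
  for (const std::string &tag : tags)
    mangled += "B" + std::to_string(tag.size()) + tag;
  return mangled;
}
```

Applied to _ZZ3foovE1c with the tag foobar this reproduces _ZZ3foovE1cB6foobar, and applied to _ZZ3barI1TB3quxEivEDC1o1pE with the tag qux it reproduces the g++ spelling of the structured binding name.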

2023-08-28  Jakub Jelinek  

PR c++/111069
gcc/
* common.opt (fabi-version=): Document version 19.
* doc/invoke.texi (-fabi-version=): Likewise.
gcc/c-family/
* c-opts.cc (c_common_post_options): Change latest_abi_version to 19.
gcc/cp/
* cp-tree.h (determine_local_discriminator): Add NAME argument with
NULL_TREE default.
(struct cp_decomp): New type.
(cp_finish_decl): Add DECOMP argument defaulted to nullptr.
(cp_maybe_mangle_decomp): Remove declaration.
* decl.cc (determine_local_discriminator): Add NAME argument, use it
if non-NULL, otherwise compute it the old way.
(maybe_commonize_var): Don't return early for structured bindings.
(cp_finish_decl): Add DECOMP argument, if non-NULL, call
cp_maybe_mangle_decomp.
(cp_maybe_mangle_decomp): Make it static with a forward declaration.
Call determine_local_discriminator.
* mangle.cc (find_decomp_unqualified_name): Remove.
(write_unqualified_name): Don't call find_decomp_unqualified_name.
(mangle_decomp): Handle mangling of static function/block scope
structured bindings.  Don't call decl_mangling_context twice.  Call
check_abi_tags, call write_abi_tags for abi version >= 19 and emit
-Wabi warnings if needed.
(write_guarded_var_name): Handle structured bindings.
(mangle_ref_init_variable): Use write_guarded_var_name.
* parser.cc (cp_convert_range_for, cp_parser_decomposition_declaration,
cp_finish_omp_range_for): Don't call cp_maybe_mangle_decomp, adjust
cp_finish_decl callers.
* pt.cc (tsubst_expr): Likewise.
gcc/testsuite/
* g++.dg/cpp2a/decomp8.C: New test.
* g++.dg/cpp2a/decomp9.C: New test.
* g++.dg/abi/macro0.C: Expect __GXX_ABI_VERSION 1019 rather than
1018.

--- gcc/common.opt.jj   2023-08-28 10:32:41.519579280 +0200
+++ gcc/common.opt  2023-08-28 10:35:30.337342832 +0200
@@ -1010,6 +1010,9 @@ Driver Undocumented
 ; 18: Corrects errors in mangling of lambdas with additional context.
 ; Default in G++ 13.
 ;
+; 19: Emits ABI tags if needed in structured binding mangled names.
+; Default in G++ 14.
+;
 ; Additional positive integers will be assigned as new versions of
 ; the ABI become the default version of the ABI.
 fabi-version=
--- gcc/doc/invoke.texi.jj  2023-08-28 10:32:42.322568643 +0200
+++ gcc/doc/invoke.texi 2023-08-28 10:35:30.342342766 +0200
@@ -3016,6 +3016,9 @@ in C++14 and up.
 Version 18, which first appeard in G++ 13, fixes manglings of lambdas
 that have additional context.
 
+Version 19, which first appeard in G++ 14, fixes manglings of structured
+bindings to include ABI tags.
+
 See also @option{-Wabi}.
 
 @opindex fabi-compat-version
--- gcc/c-family/c-opts.cc.jj   2023-08-28 10:32:41.462580035 +0200
+++ gcc/c-family/c-opts.cc  2023-08-28 10:35:30.338342819 +0200
@@ -974,7

Re: Ping^^ [PATCH V5 2/2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-08-28 Thread Richard Biener via Gcc-patches
On Wed, 23 Aug 2023, guojiufu wrote:

> Hi,
> 
> I would like to have a gentle ping...
> 
> BR,
> Jeff (Jiufu Guo)
> 
> On 2023-08-07 10:45, guojiufu via Gcc-patches wrote:
> > Hi,
> > 
> > Gentle ping...
> > 
> > On 2023-07-18 22:05, Jiufu Guo wrote:
> >> Hi,
> >> 
> >> Integer expression "(X - N * M) / N" can be optimized to "X / N - M"
> >> if there is no wrap/overflow/underflow and "X - N * M" has the same
> >> sign with "X".
> >> 
> >> Compare the previous version:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
> >> - APIs: overflow, nonnegative_p and nonpositive_p are moved close
> >>   to value range.
> >> - Use above APIs in match.pd.
> >> 
> >> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> >> Is this patch ok for trunk?
> >> 
> >> BR,
> >> Jeff (Jiufu Guo)
> >> 
> >>  PR tree-optimization/108757
> >> 
> >> gcc/ChangeLog:
> >> 
> >>  * match.pd ((X - N * M) / N): New pattern.
> >>  ((X + N * M) / N): New pattern.
> >>  ((X + C) div_rshift N): New pattern.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >>  * gcc.dg/pr108757-1.c: New test.
> >>  * gcc.dg/pr108757-2.c: New test.
> >>  * gcc.dg/pr108757.h: New test.
> >> 
> >> ---
> >>  gcc/match.pd  |  85 +++
> >>  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
> >>  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
> >>  gcc/testsuite/gcc.dg/pr108757.h   | 233 
> >> ++
> >>  4 files changed, 355 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
> >> 
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index 8543f777a28..39dbb0567dc 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -942,6 +942,91 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>  #endif
> >> 
> >> 
> >> +#if GIMPLE
> >> +(for div (trunc_div exact_div)
> >> + /* Simplify (t + M*N) / N -> t / N + M.  */
> >> + (simplify
> >> +  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)

The :c on the plus isn't necessary?

> >> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
> >> +  (if (INTEGRAL_TYPE_P (type)
> >> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
> >> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
> >> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)

the multiplication doesn't overflow

> >> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
> >> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
> >> +   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)

the add doesn't overflow

> >> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
> >> +   && (TYPE_UNSIGNED (type)
> >> + || (vr0.nonnegative_p () && vr4.nonnegative_p ())
> >> + || (vr0.nonpositive_p () && vr4.nonpositive_p (

I don't know what this checks - the add result and the add first
argument are not of opposite sign.  Huh.  At least this part
needs an explaining comment.

Sorry if we hashed this out before, but you can see I forgot
and it's not obvious.
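
One way to see why the sign matters: with truncating division the fold can change the result when t and t + M*N have opposite signs, even if nothing overflows. The C++ sketch below checks the identity on concrete values; the function name fold_is_exact is made up for illustration, and that this is what the nonnegative_p/nonpositive_p test guards against is an assumption based on the patch description.

```cpp
// Returns true when (t + m*n) / n equals t / n + m under C++
// truncating division.  Assumes n != 0 and that no intermediate
// result overflows.
bool fold_is_exact(long long t, long long m, long long n)
{
  return (t + m * n) / n == t / n + m;
}
```

For t = 7, M = 3, N = 2 both sides are 6.  For t = -1, M = 1, N = 2 the
left side is (-1 + 2) / 2 = 0 while the right side is -1 / 2 + 1 = 1, so
the fold would be wrong; here t is negative and t + M*N is positive.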

> >> +  (plus (div @0 @2) @1
> >> +
> >> + /* Simplify (t - M*N) / N -> t / N - M.  */
> >> + (simplify
> >> +  (div (minus@4 @0 (mult:c@3 @1 @2)) @2)
> >> +  (with {value_range vr0, vr1, vr2, vr3, vr4;}
> >> +  (if (INTEGRAL_TYPE_P (type)
> >> +   && get_range_query (cfun)->range_of_expr (vr1, @1)
> >> +   && get_range_query (cfun)->range_of_expr (vr2, @2)
> >> +   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
> >> +   && get_range_query (cfun)->range_of_expr (vr0, @0)
> >> +   && get_range_query (cfun)->range_of_expr (vr3, @3)
> >> +   && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
> >> +   && get_range_query (cfun)->range_of_expr (vr4, @4)
> >> +   && (TYPE_UNSIGNED (type)
> >> + || (vr0.nonnegative_p () && vr4.nonnegative_p ())
> >> + || (vr0.nonpositive_p () && vr4.nonpositive_p (
> >> +  (minus (div @0 @2) @1)

looks like exactly the same - if you use a

 (for addsub (plus minus)

you should be able to do range_op_handler (addsub).

> >> +
> >> +/* Simplify
> >> +   (t + C) / N -> t / N + C / N where C is multiple of N.
> >> +   (t + C) >> N -> t >> N + C>>N if low N bits of C is 0.  */
> >> +(for op (trunc_div exact_div rshift)
> >> + (simplify
> >> +  (op (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)
> >> +   (with
> >> +{
> >> +  wide_int c = wi::to_wide (@1);
> >> +  wide_int n = wi::to_wide (@2);
> >> +  bool is_rshift = op == RSHIFT_EXPR;
> >> +  bool neg_c = false;
> >> +  bool ok = false;
> >> +  value_range vr0;
> >> +  if (INTEGRAL_TYPE_P (type)
> >> +&& get_range_query (cfun)->range_of_expr (vr0, @0))
> >> +{
> >> +ok = is_rshift ? wi::ctz (c) >= n.to_shwi ()
> >> +   : wi::multiple_of_p (c, n, TYPE_SIGN (type));
> >> +value_range vr1, vr3;
> >> +ok = ok && get_range_query (cfun)->range_of_expr (vr1, @1)

RE: [PATCH] RISC-V: Fix uninitialized probability for GIMPLE IR tests

2023-08-28 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Kito Cheng via Gcc-patches
Sent: Monday, August 28, 2023 8:59 PM
To: Juzhe-Zhong
Cc: GCC Patches; Kito Cheng
Subject: Re: [PATCH] RISC-V: Fix uninitialized probability for GIMPLE IR tests

LGTM

Juzhe-Zhong wrote on Mon, Aug 28, 2023 at 19:40:

> This patch fixes uninitialized probability in GIMPLE IR code tests:
> FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10a.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/slp-reduc-10b.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10b.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/slp-reduc-10c.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10c.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/slp-reduc-10d.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10d.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/slp-reduc-10e.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10e.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/vect-cond-arith-2.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/vect-cond-arith-2.c (test for excess errors)
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (test for
> excess errors)
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion):
> Skip never probability.
> (pass_vsetvl::compute_probabilities): Fix uninitialized probability.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 48e89fe2c03..f7ae6c16bee 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -3272,6 +3272,10 @@ pass_vsetvl::earliest_fusion (void)
>   if (expr.empty_p ())
> continue;
>   edge eg = INDEX_EDGE (m_vector_manager->vector_edge_list, ed);
> + /* If it is the edge that we never reach, skip its possible PRE
> +fusion conservatively.  */
> + if (eg->probability == profile_probability::never ())
> +   break;
>   if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
>   || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
> break;
> @@ -4359,7 +4363,14 @@ pass_vsetvl::compute_probabilities (void)
>FOR_EACH_EDGE (e, ei, cfg_bb->succs)
> {
>   auto &new_prob = get_block_info (e->dest).probability;
> - if (!new_prob.initialized_p ())
> + /* Normally, the edge probability should be initialized.
> +However, some special testing code which is written in
> +GIMPLE IR style forces the edge probability uninitialized,
> +so we conservatively set it as never so that it will not
> +affect PRE (Phase 3 && Phase 4).  */
> + if (!e->probability.initialized_p ())
> +   new_prob = profile_probability::never ();
> + else if (!new_prob.initialized_p ())
> new_prob = curr_prob * e->probability;
>   else if (new_prob == profile_probability::always ())
> continue;
> --
> 2.36.3
>
>


Re: [PATCH] RISC-V: Fix uninitialized probability for GIMPLE IR tests

2023-08-28 Thread Kito Cheng via Gcc-patches
LGTM

On Mon, Aug 28, 2023 at 19:40, Juzhe-Zhong wrote:

> This patch fixes the uninitialized probability in GIMPLE IR code tests:
> FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10a.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/slp-reduc-10b.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10b.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/slp-reduc-10c.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10c.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/slp-reduc-10d.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10d.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/slp-reduc-10e.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10e.c (test for excess errors)
> FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (test for excess
> errors)
> FAIL: gcc.dg/vect/vect-cond-arith-2.c (internal compiler error: in
> compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/vect-cond-arith-2.c (test for excess errors)
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (internal
> compiler error: in compute_probabilities, at
> config/riscv/riscv-vsetvl.cc:4358)
> FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (test for
> excess errors)
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion):
> Skip never probability.
> (pass_vsetvl::compute_probabilities): Fix uninitialized probability.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 48e89fe2c03..f7ae6c16bee 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -3272,6 +3272,10 @@ pass_vsetvl::earliest_fusion (void)
>   if (expr.empty_p ())
> continue;
>   edge eg = INDEX_EDGE (m_vector_manager->vector_edge_list, ed);
> + /* If it is the edge that we never reach, skip its possible PRE
> +fusion conservatively.  */
> + if (eg->probability == profile_probability::never ())
> +   break;
>   if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
>   || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
> break;
> @@ -4359,7 +4363,14 @@ pass_vsetvl::compute_probabilities (void)
>FOR_EACH_EDGE (e, ei, cfg_bb->succs)
> {
>   auto &new_prob = get_block_info (e->dest).probability;
> - if (!new_prob.initialized_p ())
> + /* Normally, the edge probability should be initialized.
> +However, some special testing code which is written in
> +GIMPLE IR style forces the edge probability uninitialized,
> +so we conservatively set it as never so that it will not
> +affect PRE (Phase 3 && Phase 4).  */
> + if (!e->probability.initialized_p ())
> +   new_prob = profile_probability::never ();
> + else if (!new_prob.initialized_p ())
> new_prob = curr_prob * e->probability;
>   else if (new_prob == profile_probability::always ())
> continue;
> --
> 2.36.3
>
>


Re: Re: [PATCH V3] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread juzhe.zh...@rivai.ai
Ok.
It reduced some failures, and the new report is in the commit log of V4:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628580.html 




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-28 18:29
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V3] RISC-V: Enable vec_int testsuite for RVV VLA 
vectorization
On 8/28/23 12:16, Juzhe-Zhong wrote:
> FAIL: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects  scan-tree-dump slp2 
> "unsupported unaligned access"
> FAIL: gcc.dg/vect/bb-slp-10.c scan-tree-dump slp2 "unsupported unaligned 
> access"
> XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect 
> "vect_recog_widen_mult_pattern: detected" 1
> XPASS: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
> "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
> "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-section-anchors-vect-69.c scan-tree-dump-times vect 
> "vectorized 3 loops" 1
> FAIL: gcc.dg/vect/no-vfa-vect-101.c scan-tree-dump-times vect "can't 
> determine dependence" 1
> FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect "possible 
> dependence between data-refs" 1
> FAIL: gcc.dg/vect/no-vfa-vect-102a.c scan-tree-dump-times vect "possible 
> dependence between data-refs" 1
> FAIL: gcc.dg/vect/no-vfa-vect-37.c scan-tree-dump-times vect "can't determine 
> dependence" 2
> FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 1 loop" 2
> FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
> FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr63341-1.c execution test
> FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr63341-2.c execution test
> FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump vect 
> "can't force alignment"
> FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump-not vect 
> "misalign = 0"
> FAIL: gcc.dg/vect/pr65310.c scan-tree-dump vect "can't force alignment"
> FAIL: gcc.dg/vect/pr65310.c scan-tree-dump-not vect "misalign = 0"
> FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 0 loops in function" 2
> FAIL: gcc.dg/vect/pr65518.c scan-tree-dump-times vect "vectorized 0 loops in 
> function" 2
> FAIL: gcc.dg/vect/pr68445.c -flto -ffat-lto-objects  scan-tree-dump vect 
> "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/pr68445.c scan-tree-dump vect "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/pr88598-1.c -flto -ffat-lto-objects  scan-tree-dump-not 
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-1.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-2.c -flto -ffat-lto-objects  scan-tree-dump-not 
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-2.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-3.c -flto -ffat-lto-objects  scan-tree-dump-not 
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-3.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr94994.c execution test
> FAIL: gcc.dg/vect/pr97835.c -flto -ffat-lto-objects  scan-tree-dump vect 
> "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/pr97835.c scan-tree-dump vect "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 4
> FAIL: gcc.dg/vect/slp-11a.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-11a.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-tim

[PATCH V4] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Juzhe-Zhong
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect 
"vect_recog_widen_mult_pattern: detected" 1
FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
"Alignment of access forced using peeling" 2
FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
"Alignment of access forced using peeling" 2
FAIL: gcc.dg/vect/no-vfa-vect-101.c scan-tree-dump-times vect "can't determine 
dependence" 1
FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect "possible 
dependence between data-refs" 1
FAIL: gcc.dg/vect/no-vfa-vect-102a.c scan-tree-dump-times vect "possible 
dependence between data-refs" 1
FAIL: gcc.dg/vect/no-vfa-vect-37.c scan-tree-dump-times vect "can't determine 
dependence" 2
FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump vect "can't 
force alignment"
FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump-not vect 
"misalign = 0"
FAIL: gcc.dg/vect/pr65310.c scan-tree-dump vect "can't force alignment"
FAIL: gcc.dg/vect/pr65310.c scan-tree-dump-not vect "misalign = 0"
FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops in function" 2
FAIL: gcc.dg/vect/pr65518.c scan-tree-dump-times vect "vectorized 0 loops in 
function" 2
FAIL: gcc.dg/vect/pr68445.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorizing stmts using SLP"
FAIL: gcc.dg/vect/pr68445.c scan-tree-dump vect "vectorizing stmts using SLP"
FAIL: gcc.dg/vect/pr88598-1.c -flto -ffat-lto-objects  scan-tree-dump-not 
optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-1.c scan-tree-dump-not optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-2.c -flto -ffat-lto-objects  scan-tree-dump-not 
optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-2.c scan-tree-dump-not optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-3.c -flto -ffat-lto-objects  scan-tree-dump-not 
optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-3.c scan-tree-dump-not optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/pr97835.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorizing stmts using SLP"
FAIL: gcc.dg/vect/pr97835.c scan-tree-dump vect "vectorizing stmts using SLP"
FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 4
FAIL: gcc.dg/vect/slp-11a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-11a.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0
FAIL: gcc.dg/vect/slp-15.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-15.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0
FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0
FAIL: gcc.dg/vect/slp-19b.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-19b.c -flto -ffat-lto-objects  scan-tree

[PATCH] RISC-V: Fix uninitialized probability for GIMPLE IR tests

2023-08-28 Thread Juzhe-Zhong
This patch fixes the uninitialized probability in GIMPLE IR code tests:
FAIL: gcc.dg/vect/slp-reduc-10a.c (internal compiler error: in 
compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (internal compiler 
error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10a.c -flto -ffat-lto-objects (test for excess 
errors)
FAIL: gcc.dg/vect/slp-reduc-10b.c (internal compiler error: in 
compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10b.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (internal compiler 
error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10b.c -flto -ffat-lto-objects (test for excess 
errors)
FAIL: gcc.dg/vect/slp-reduc-10c.c (internal compiler error: in 
compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10c.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (internal compiler 
error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10c.c -flto -ffat-lto-objects (test for excess 
errors)
FAIL: gcc.dg/vect/slp-reduc-10d.c (internal compiler error: in 
compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10d.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (internal compiler 
error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10d.c -flto -ffat-lto-objects (test for excess 
errors)
FAIL: gcc.dg/vect/slp-reduc-10e.c (internal compiler error: in 
compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10e.c (test for excess errors)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (internal compiler 
error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/slp-reduc-10e.c -flto -ffat-lto-objects (test for excess 
errors)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (internal compiler error: in 
compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/vect-cond-arith-2.c (test for excess errors)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (internal 
compiler error: in compute_probabilities, at config/riscv/riscv-vsetvl.cc:4358)
FAIL: gcc.dg/vect/vect-cond-arith-2.c -flto -ffat-lto-objects (test for excess 
errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Skip 
never probability.
(pass_vsetvl::compute_probabilities): Fix uninitialized probability.

---
 gcc/config/riscv/riscv-vsetvl.cc | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 48e89fe2c03..f7ae6c16bee 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3272,6 +3272,10 @@ pass_vsetvl::earliest_fusion (void)
  if (expr.empty_p ())
continue;
  edge eg = INDEX_EDGE (m_vector_manager->vector_edge_list, ed);
+ /* If it is the edge that we never reach, skip its possible PRE
+fusion conservatively.  */
+ if (eg->probability == profile_probability::never ())
+   break;
  if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
  || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
break;
@@ -4359,7 +4363,14 @@ pass_vsetvl::compute_probabilities (void)
   FOR_EACH_EDGE (e, ei, cfg_bb->succs)
{
  auto &new_prob = get_block_info (e->dest).probability;
- if (!new_prob.initialized_p ())
+ /* Normally, the edge probability should be initialized.
+However, some special testing code which is written in
+GIMPLE IR style forces the edge probability uninitialized,
+so we conservatively set it as never so that it will not
+affect PRE (Phase 3 && Phase 4).  */
+ if (!e->probability.initialized_p ())
+   new_prob = profile_probability::never ();
+ else if (!new_prob.initialized_p ())
new_prob = curr_prob * e->probability;
  else if (new_prob == profile_probability::always ())
continue;
-- 
2.36.3
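
The branch order introduced in compute_probabilities can be illustrated outside of GCC with a small model. This is only a sketch under stated assumptions: `prob`, `toy_edge` and `propagate` are hypothetical names, an empty `std::optional` stands in for an uninitialized profile_probability, and 0.0 stands in for profile_probability::never (). Only the first two branches of the patched if-chain are modelled.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-in for profile_probability (an assumption of this sketch):
// an empty optional models "uninitialized", 0.0 models never ().
using prob = std::optional<double>;

struct toy_edge
{
  int dest;          // index of the destination block
  prob probability;  // may be uninitialized, as in the GIMPLE IR tests
};

// Update the probability of every destination block reached from one
// source block whose own probability is CURR_PROB.
void
propagate (const prob &curr_prob, const std::vector<toy_edge> &succs,
           std::vector<prob> &block_prob)
{
  assert (curr_prob.has_value ());
  for (const toy_edge &e : succs)
    {
      prob &new_prob = block_prob[e.dest];
      if (!e.probability.has_value ())
        // Checked first, so an uninitialized edge never reaches the
        // multiplication below.
        new_prob = 0.0;
      else if (!new_prob.has_value ())
        new_prob = *curr_prob * *e.probability;
    }
}
```

The point of the ordering is that the edge is examined before it is used as a factor; a destination reached only through an uninitialized edge ends up with the conservative "never" value.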



Re: [PATCH] RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}

2023-08-28 Thread Lehua Ding

Hi Robin,

Thanks for reviewing.


> Cleaning up here is good, right now it's not really an insn_type but
> indeed just the number of operands.  My original idea was to have an
> insn type and a mostly unified expander that performs all necessary
> operations depending on the insn_type.  Just to give an idea of why it's
> called that way.
>
>> +  rtx ops[RVV_BINOP_MASK] = {target, mask, target, op, sel};
>> +  emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MASK, ops);
>
> One of the ideas was that a function emit_vlmax_masked_mu_insn would already
> know that it's dealing with a mask and we would just pass something like
> RVV_BINOP.  The other way would be to just have emit_vlmax_mu_insn or
> something and let the rest be deduced from the insn_type.  Even the vlmax
> I intended to have mostly implicit but that somehow got lost during
> refactorings :)  No need to change anything for now, just for perspective
> again.

I think the ideas in these two comments will be reflected in the next
patch, which refactors the emit_vlmax/nonvlmax_xxx functions (abstracting
several uniform types of RVV instructions), thanks a lot!

> Would you mind renaming op_num (i.e. usually understood as operand_number) into
> num_ops or nops? (i.e. number of operands).  That way we would be more in line
> with what the later expander functions do.

OK, I will rename op_num to num_ops.

> I would actually prefer to keep "ops" because it's already clear from the
> function name that we work with a conditional function (and we don't have
> any other ops).

No problem.

> We're already a bit inconsistent with how we pass mask, merge and the source
> operands.  Maybe we could also unify this a bit?  I don't have a clear
> preference for either, though.
>
>> +  rtx cond_ops[RVV_BINOP_MASK] = {dest, mask, merge, src1, src2};
>
> Here, the merge comes before the sources as well.
>
>> +  rtx cond_ops[RVV_TERNOP_MASK] = {dest, mask, src1, src2, src3, merge};
>
> And here, the merge comes last.  I realize this makes sense in the context
> of a ternary operation because the merge is always "real".  As our vector
> patterns are similar, maybe we should use this ordering all the time?

Yes, the ternary merge is not placed in operand 2 as it is for binop or unop.
I discussed it with Juzhe and it could be unified, but that would require
changes to the intrinsic part, so I would suggest a separate patch to
unify the operand order. The unified order would look like this:

  DEST, MASK, MERGE (these three operands are fixed for most insns)
  OPS (the number can be 1, 2, 3)
  VL, TAIL_POLICY, MASK_POLICY, AVL_TYPE, ROUNDING_MODE


--
Best,
Lehua
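
As a rough illustration of the operand layout proposed above (a sketch only, not code from the patch): `unified_operands` is a hypothetical helper, and plain strings stand in for rtx values. The trailing names are copied from the list in the mail and are not meant as real RVV identifiers.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build an operand list in the proposed order
// DEST, MASK, MERGE, then 1 to 3 source operands, then the trailing
// control operands.
std::vector<std::string>
unified_operands (const std::string &dest, const std::string &mask,
                  const std::string &merge,
                  const std::vector<std::string> &sources)
{
  // The first three slots are the same for unop, binop and ternop.
  std::vector<std::string> ops = {dest, mask, merge};
  ops.insert (ops.end (), sources.begin (), sources.end ());
  for (const char *name : {"VL", "TAIL_POLICY", "MASK_POLICY", "AVL_TYPE",
                           "ROUNDING_MODE"})
    ops.push_back (name);
  return ops;
}
```

With this layout the merge operand sits at index 2 regardless of how many sources follow, which is the consistency the discussion is after.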



Re: Re: [PATCH 0/2] support cm.push cm.pop cm.popret in zcmp and resolve conflict with shrink-wrap-separate

2023-08-28 Thread Fei Gao

On 2023-08-28 17:27  Kito Cheng  wrote:
>
>I would prefer to decouple the shrink-wrap part by checking
>flag_shrink_wrap. I mean, let's disable zcmp code gen if flag_shrink_wrap
>is true for now, and do a follow-up patch series with the shrink-wrap.[cc|h]
>changes?

OK. Some details for you to confirm:
1. flag_shrink_wrap_separate seems better than flag_shrink_wrap.
2. To pass the zcmp testcases, I will add the -fno-shrink-wrap-separate option.

BR, 
Fei

>
>On Mon, Aug 28, 2023 at 3:48 PM Fei Gao  wrote:
>>
>> The first is a helper patch to allow targets to check shrink-wrap-separate 
>> enabled or not.
>> The second is zcmp extension implementation in RISC-V.
>>
>> Fei Gao (2):
>>   allow target to check shrink-wrap-separate enabled or not
>>   support cm.push cm.pop cm.popret in zcmp and resolve conflict with 
>>shrink-wrap-separate
>>
>>  gcc/config/riscv/iterators.md |   15 +
>>  gcc/config/riscv/predicates.md    |   96 ++
>>  gcc/config/riscv/riscv-protos.h   |    2 +
>>  gcc/config/riscv/riscv.cc |  455 ++-
>>  gcc/config/riscv/riscv.h  |   25 +
>>  gcc/config/riscv/riscv.md |    2 +
>>  gcc/config/riscv/zc.md    | 1042 +
>>  gcc/shrink-wrap.cc    |   25 +-
>>  gcc/shrink-wrap.h |    1 +
>>  gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  256 
>>  gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  256 
>>  .../gcc.target/riscv/zcmp_push_fpr.c  |   34 +
>>  .../riscv/zcmp_shrink_wrap_separate.c |   93 ++
>>  .../riscv/zcmp_shrink_wrap_separate2.c    |   93 ++
>>  .../gcc.target/riscv/zcmp_stack_alignment.c   |   24 +
>>  15 files changed, 2357 insertions(+), 62 deletions(-)
>>  create mode 100644 gcc/config/riscv/zc.md
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_push_fpr.c
>>  create mode 100644 
>>gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
>>  create mode 100644 
>>gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c
>>
>> --
>> 2.17.1
>>

Re: [PATCH] RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}

2023-08-28 Thread Robin Dapp via Gcc-patches
Hi Lehua,

thanks for starting with the refactoring.  I have some minor comments.

> +/* The value means the number of operands for insn_expander.  */
>  enum insn_type
>  {
>RVV_MISC_OP = 1,
>RVV_UNOP = 2,
> -  RVV_UNOP_M = RVV_UNOP + 2,
> -  RVV_UNOP_MU = RVV_UNOP + 2,
> -  RVV_UNOP_TU = RVV_UNOP + 2,
> -  RVV_UNOP_TUMU = RVV_UNOP + 2,
> +  RVV_UNOP_MASK = RVV_UNOP + 2,

Cleaning up here is good, right now it's not really an insn_type but
indeed just the number of operands.  My original idea was to have an
insn type and a mostly unified expander that performs all necessary
operations depending on the insn_type.  Just to give an idea of why it's
called that way.

> +  rtx ops[RVV_BINOP_MASK] = {target, mask, target, op, sel};
> +  emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MASK, ops);

One of the ideas was that a function emit_vlmax_masked_mu_insn would already
know that it's dealing with a mask and we would just pass something like
RVV_BINOP.  The other way would be to just have emit_vlmax_mu_insn or
something and let the rest be deduced from the insn_type.  Even the vlmax
I intended to have mostly implicit but that somehow got lost during
refactorings :)  No need to change anything for now, just for perspective
again. 

> -/* Expand unary ops COND_LEN_*.  */
> -void
> -expand_cond_len_unop (rtx_code code, rtx *ops)
> +/* Subroutine to expand COND_LEN_* patterns.  */
> +static void
> +expand_cond_len_op (rtx_code code, unsigned icode, int op_num, rtx *cond_ops,
> + rtx len)
>  {

Would you mind renaming op_num (i.e. usually understood as operand_number) into
num_ops or nops? (i.e. number of operands).  That way we would be more in line
with what the later expander functions do.

> -  rtx dest = ops[0];
> -  rtx mask = ops[1];
> -  rtx src = ops[2];
> -  rtx merge = ops[3];
> -  rtx len = ops[4];
> +  rtx dest = cond_ops[0];
> +  rtx mask = cond_ops[1];

I would actually prefer to keep "ops" because it's already clear from the
function name that we work with a conditional function (and we don't have
any other ops).

>  
> +/* Expand unary ops COND_LEN_*.  */
> +void
> +expand_cond_len_unop (rtx_code code, rtx *ops)
> +{
> +  rtx dest = ops[0];
> +  rtx mask = ops[1];
> +  rtx src = ops[2];
> +  rtx merge = ops[3];
> +  rtx len = ops[4];
> +
> +  machine_mode mode = GET_MODE (dest);
> +  insn_code icode = code_for_pred (code, mode);
> +  rtx cond_ops[RVV_UNOP_MASK] = {dest, mask, merge, src};
> +  expand_cond_len_op (code, icode, RVV_UNOP_MASK, cond_ops, len);
> +}

We're already a bit inconsistent with how we pass mask, merge and the source
operands.  Maybe we could also unify this a bit?  I don't have a clear
preference for either, though.

> +  rtx cond_ops[RVV_BINOP_MASK] = {dest, mask, merge, src1, src2};

Here, the merge comes before the sources as well.

> +  rtx cond_ops[RVV_TERNOP_MASK] = {dest, mask, src1, src2, src3, merge};
And here, the merge comes last.  I realize this makes sense in the context
of a ternary operation because the merge is always "real".  As our vector
patterns are similar, maybe we should use this ordering all the time?

Regards
 Robin
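
To make the "value is the number of operands" convention and the num_ops naming concrete, here is a minimal sketch. The first three enumerators are taken from the quoted patch; RVV_BINOP = 3 is inferred from the five-element RVV_BINOP_MASK array and is an assumption, as is the hypothetical helper `all_operands_present`, in which ints stand in for rtx values.

```cpp
#include <cassert>

// Each enumerator's value is an operand count, so it can size an array.
enum insn_type
{
  RVV_MISC_OP = 1,
  RVV_UNOP = 2,
  RVV_UNOP_MASK = RVV_UNOP + 2,
  RVV_BINOP = 3,  // assumption: inferred from the array sizes in the patch
  RVV_BINOP_MASK = RVV_BINOP + 2
};

// Hypothetical helper; 0 models a missing operand.  NUM_OPS is the number
// of operands in OPS, not the index of one operand.
bool
all_operands_present (const int *ops, int num_ops)
{
  assert (num_ops > 0);
  for (int i = 0; i < num_ops; ++i)
    if (ops[i] == 0)
      return false;
  return true;
}
```

A caller would declare `int ops[RVV_BINOP_MASK]` and pass `RVV_BINOP_MASK` as num_ops, mirroring how the patch passes the enumerator alongside the operand array.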



Re: [PATCH] c++: Implement C++ DR 2406 - [[fallthrough]] attribute and iteration statements

2023-08-28 Thread Richard Biener via Gcc-patches
On Fri, 25 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch implements
> CWG 2406 - [[fallthrough]] attribute and iteration statements
> The genericization of some loops leaves nothing at all or just a label
> after a body of a loop, so if the loop is later followed by
> case or default label in a switch, the fallthrough statement isn't
> diagnosed.
> 
> The following patch implements it by marking the IFN_FALLTHROUGH call
> in such a case, such that during gimplification it can be pedantically
> diagnosed even if it is followed by case or default label or some normal
> labels followed by case/default labels.
> 
> While looking into this, I've discovered other problems.
> expand_FALLTHROUGH_r is removing the IFN_FALLTHROUGH calls from the IL,
> but wasn't telling that to walk_gimple_stmt/walk_gimple_seq_mod, so
> the callers would then skip the next statement after it, and it would
> return non-NULL if the removed stmt was last in the sequence.  This could
> lead to wi->callback_result being set even if it didn't appear at the very
> end of switch sequence.
> The patch makes use of wi->removed_stmt such that the callers properly
> know what happened, and use different way to handle the end of switch
> sequence case.
> 
> That change discovered a bug in the gimple-walk handling of
> wi->removed_stmt.  If that flag is set, the callback is telling the callers
> that the current statement has been removed and so the innermost
> walk_gimple_seq_mod shouldn't gsi_next.  The problem is that
> wi->removed_stmt is only reset at the start of a walk_gimple_stmt, but that
> can be too late for some cases.  If we have two nested gimple sequences,
> say GIMPLE_BIND as the last stmt of some gimple seq, we remove the last
> statement inside of that GIMPLE_BIND, set wi->removed_stmt there, don't
> do gsi_next correctly because already gsi_remove moved us to the next stmt,
> there is no next stmt, so we return back to the caller, but wi->removed_stmt
> is still set and so we don't do gsi_next even in the outer sequence, despite
> the GIMPLE_BIND (etc.) not being removed.  That means we walk the
> GIMPLE_BIND with its whole sequence again.
> The patch fixes that by resetting wi->removed_stmt after we've used that
> flag in walk_gimple_seq_mod.  Nothing really uses that flag after the
> outermost walk_gimple_seq_mod, it is just a private notification that
> the stmt callback has removed a stmt.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The gimple-walk.cc/gimplify.cc changes are OK, I don't understand
the c-gimplify.cc one.

Thanks,
Richard.
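
The stale wi->removed_stmt problem described in the quoted mail can be reproduced with a toy walker. This is a sketch under stated assumptions, not the gimple-walk.cc code: `stmt`, `walk_info`, `walk_stmt` and `walk_seq` are hypothetical, a std::list models a gimple sequence, and the `reset_after_use` switch exists only to compare the two behaviours.

```cpp
#include <list>
#include <string>
#include <vector>

struct stmt
{
  std::string name;
  std::list<stmt> body;  // non-empty for a container, e.g. a bind
};

struct walk_info
{
  bool removed_stmt = false;
  bool reset_after_use = true;  // false models the behaviour before the fix
  std::vector<std::string> visited;
};

void walk_seq (std::list<stmt> &seq, walk_info &wi);

// Visit one statement and remove it if it is named "fallthrough".
// After a removal IT already points at the following statement.
void
walk_stmt (std::list<stmt> &seq, std::list<stmt>::iterator &it, walk_info &wi)
{
  wi.removed_stmt = false;  // the flag is only reset here before the fix
  wi.visited.push_back (it->name);
  if (it->name == "fallthrough")
    {
      it = seq.erase (it);
      wi.removed_stmt = true;
      return;
    }
  walk_seq (it->body, wi);
}

void
walk_seq (std::list<stmt> &seq, walk_info &wi)
{
  for (auto it = seq.begin (); it != seq.end ();)
    {
      walk_stmt (seq, it, wi);
      if (!wi.removed_stmt)
        ++it;
      else if (wi.reset_after_use)
        wi.removed_stmt = false;  // consume the flag once it has been used
    }
}
```

When the removed statement is the last one inside a nested sequence, the flag is still set on return to the outer loop unless it is cleared after use, so the outer loop does not advance and walks the enclosing statement a second time.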

> 2023-08-25  Jakub Jelinek  
> 
> gcc/
>   * gimplify.cc (expand_FALLTHROUGH_r): Use wi->removed_stmt after
>   gsi_remove, change the way of passing fallthrough stmt at the end
>   of sequence to expand_FALLTHROUGH.  Diagnose IFN_FALLTHROUGH
>   with GF_CALL_NOTHROW flag.
>   (expand_FALLTHROUGH): Change loc into array of 2 location_t elts,
>   don't test wi.callback_result, instead check whether first
>   elt is not UNKNOWN_LOCATION and in that case pedwarn with the
>   second location.
>   * gimple-walk.cc (walk_gimple_seq_mod): Clear wi->removed_stmt
>   after the flag has been used.
> gcc/c-family/
>   * c-gimplify.cc (genericize_c_loop): For C++ mark IFN_FALLTHROUGH
>   call at the end of loop body as TREE_NOTHROW.
> gcc/testsuite/
>   * g++.dg/DRs/dr2406.C: New test.
> 
> --- gcc/gimplify.cc.jj   2023-08-23 11:22:28.115592483 +0200
> +++ gcc/gimplify.cc   2023-08-25 13:43:58.711847414 +0200
> @@ -2588,17 +2588,33 @@ expand_FALLTHROUGH_r (gimple_stmt_iterat
>*handled_ops_p = false;
>break;
>  case GIMPLE_CALL:
> +  static_cast<location_t *>(wi->info)[0] = UNKNOWN_LOCATION;
>if (gimple_call_internal_p (stmt, IFN_FALLTHROUGH))
>   {
> +   location_t loc = gimple_location (stmt);
> gsi_remove (gsi_p, true);
> +   wi->removed_stmt = true;
> +
> +   /* nothrow flag is added by genericize_c_loop to mark fallthrough
> +  statement at the end of some loop's body.  Those should be
> +  always diagnosed, either because they indeed don't precede
> +  a case label or default label, or because the next statement
> +  is not within the same iteration statement.  */
> +   if ((stmt->subcode & GF_CALL_NOTHROW) != 0)
> + {
> +   pedwarn (loc, 0, "attribute %<fallthrough%> not preceding "
> +"a case label or default label");
> +   break;
> + }
> +
> if (gsi_end_p (*gsi_p))
>   {
> -   *static_cast<location_t *>(wi->info) = gimple_location (stmt);
> -   return integer_zero_node;
> +   static_cast<location_t *>(wi->info)[0] = BUILTINS_LOCATION;
> +   static_cast<location_t *>(wi->info)[1] = loc;
> +   break;
>   }
>  
> bool found = false;
> -   location_t loc = gimple_location (stmt);
>  
> gimple_stmt_iterator gsi2 = *gsi_p;
>

Re: [PATCH V3] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Robin Dapp via Gcc-patches
On 8/28/23 12:16, Juzhe-Zhong wrote:
> FAIL: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects  scan-tree-dump slp2 
> "unsupported unaligned access"
> FAIL: gcc.dg/vect/bb-slp-10.c scan-tree-dump slp2 "unsupported unaligned 
> access"
> XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER 
> LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect 
> "vect_recog_widen_mult_pattern: detected" 1
> XPASS: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
> "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
> "Alignment of access forced using peeling" 2
> FAIL: gcc.dg/vect/no-section-anchors-vect-69.c scan-tree-dump-times vect 
> "vectorized 3 loops" 1
> FAIL: gcc.dg/vect/no-vfa-vect-101.c scan-tree-dump-times vect "can't 
> determine dependence" 1
> FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect "possible 
> dependence between data-refs" 1
> FAIL: gcc.dg/vect/no-vfa-vect-102a.c scan-tree-dump-times vect "possible 
> dependence between data-refs" 1
> FAIL: gcc.dg/vect/no-vfa-vect-37.c scan-tree-dump-times vect "can't determine 
> dependence" 2
> FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 1 loop" 2
> FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
> FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr63341-1.c execution test
> FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr63341-2.c execution test
> FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump vect 
> "can't force alignment"
> FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump-not vect 
> "misalign = 0"
> FAIL: gcc.dg/vect/pr65310.c scan-tree-dump vect "can't force alignment"
> FAIL: gcc.dg/vect/pr65310.c scan-tree-dump-not vect "misalign = 0"
> FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 0 loops in function" 2
> FAIL: gcc.dg/vect/pr65518.c scan-tree-dump-times vect "vectorized 0 loops in 
> function" 2
> FAIL: gcc.dg/vect/pr68445.c -flto -ffat-lto-objects  scan-tree-dump vect 
> "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/pr68445.c scan-tree-dump vect "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/pr88598-1.c -flto -ffat-lto-objects  scan-tree-dump-not 
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-1.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-2.c -flto -ffat-lto-objects  scan-tree-dump-not 
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-2.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-3.c -flto -ffat-lto-objects  scan-tree-dump-not 
> optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr88598-3.c scan-tree-dump-not optimized "REDUC_PLUS"
> FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr94994.c execution test
> FAIL: gcc.dg/vect/pr97835.c -flto -ffat-lto-objects  scan-tree-dump vect 
> "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/pr97835.c scan-tree-dump vect "vectorizing stmts using SLP"
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 4
> FAIL: gcc.dg/vect/slp-11a.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-11a.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorizing stmts 
> using SLP" 0
> FAIL: gcc.dg/vect/slp-15.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-15.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 0
> FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vect "vectorized 0 loops" 1
> FAIL: gcc.dg/vect/slp-15.c scan-tre

Re: Re: [PATCH V2] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread juzhe.zh...@rivai.ai
OK. Added -Wno-psabi, which removes 5 FAILs.

V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628572.html 




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-28 16:22
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Enable vec_int testsuite for RVV VLA 
vectorization
Thanks,
 
just giving my quick thoughts on some of the FAILs:
 
> Test report:
> FAIL: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects  scan-tree-dump slp2 
> "unsupported unaligned access"
> FAIL: gcc.dg/vect/bb-slp-10.c scan-tree-dump slp2 "unsupported unaligned 
> access"
 
For these we would need to add riscv to target_vect_element_align_preferred.
That might depend on uarch, though. 
 
> FAIL: gcc.dg/vect/bb-slp-70.c (test for excess errors)
> FAIL: gcc.dg/vect/bb-slp-70.c -flto -ffat-lto-objects (test for excess errors)
> FAIL: gcc.dg/vect/bb-slp-layout-17.c (test for excess errors)
> FAIL: gcc.dg/vect/bb-slp-layout-17.c -flto -ffat-lto-objects (test for excess 
> errors)
 
For these we need -Wno-psabi for now.   Besides, I still wanted to provide a
popcount fallback sometime soon.
 
> FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump vect 
> "can't force alignment"
> FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump-not vect 
> "misalign = 0"
> FAIL: gcc.dg/vect/pr65310.c scan-tree-dump vect "can't force alignment"
> FAIL: gcc.dg/vect/pr65310.c scan-tree-dump-not vect "misalign = 0"
 
Same as above with vect_element_align_preferred.
 
> XPASS: gcc.dg/vect/vect-10.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 1 loops" 1
> XPASS: gcc.dg/vect/vect-10.c scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/vect-104.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "possible dependence between data-refs" 1
> FAIL: gcc.dg/vect/vect-104.c scan-tree-dump-times vect "possible dependence 
> between data-refs" 1
> FAIL: gcc.dg/vect/vect-109.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "unsupported unaligned access" 2
> FAIL: gcc.dg/vect/vect-109.c scan-tree-dump-times vect "unsupported unaligned 
> access" 2
> XPASS: gcc.dg/vect/vect-24.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "vectorized 3 loops" 1
> XPASS: gcc.dg/vect/vect-24.c scan-tree-dump-times vect "vectorized 3 loops" 1
> FAIL: gcc.dg/vect/vect-26.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "Alignment of access forced using peeling" 1
> FAIL: gcc.dg/vect/vect-26.c scan-tree-dump-times vect "Alignment of access 
> forced using peeling" 1
> FAIL: gcc.dg/vect/vect-27.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-27.c scan-tree-dump-times vect "Vectorizing an 
> unaligned access" 1
> FAIL: gcc.dg/vect/vect-29.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-29.c scan-tree-dump-times vect "Vectorizing an 
> unaligned access" 1
> FAIL: gcc.dg/vect/vect-33.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "Alignment of access forced using versioning" 1
> FAIL: gcc.dg/vect/vect-33.c scan-tree-dump-times vect "Alignment of access 
> forced using versioning" 1
> FAIL: gcc.dg/vect/vect-72.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-72.c scan-tree-dump-times vect "Vectorizing an 
> unaligned access" 1
> FAIL: gcc.dg/vect/vect-75-big-array.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-75-big-array.c scan-tree-dump-times vect "Vectorizing 
> an unaligned access" 1
> FAIL: gcc.dg/vect/vect-75.c -flto -ffat-lto-objects  scan-tree-dump-times 
> vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-75.c scan-tree-dump-times vect "Vectorizing an 
> unaligned access" 1
> FAIL: gcc.dg/vect/vect-77-alignchecks.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-77-alignchecks.c scan-tree-dump-times vect 
> "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-77-global.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-77-global.c scan-tree-dump-times vect "Vectorizing an 
> unaligned access" 1
> FAIL: gcc.dg/vect/vect-78-alignchecks.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-78-alignchecks.c scan-tree-dump-times vect 
> "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-78-global.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> FAIL: gcc.dg/vect/vect-78-global.c scan-tree-dump-times vect "Vectorizing an 
> unaligned access" 1
> FAIL: gcc.dg/vect/vect-89-big-array.c -flto -ffat-lto-objects  
> scan-tree-dump-times vect "Alignment of access forced using peeling" 1
>

[PATCH V3] RISC-V: Enable vec_int testsuite for RVV VLA vectorization

2023-08-28 Thread Juzhe-Zhong
FAIL: gcc.dg/vect/bb-slp-10.c -flto -ffat-lto-objects  scan-tree-dump slp2 
"unsupported unaligned access"
FAIL: gcc.dg/vect/bb-slp-10.c scan-tree-dump slp2 "unsupported unaligned access"
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
XPASS: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect 
"vect_recog_widen_mult_pattern: detected" 1
XPASS: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP 
VECTORIZED." 1
FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
"Alignment of access forced using peeling" 2
FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
"Alignment of access forced using peeling" 2
FAIL: gcc.dg/vect/no-section-anchors-vect-69.c scan-tree-dump-times vect 
"vectorized 3 loops" 1
FAIL: gcc.dg/vect/no-vfa-vect-101.c scan-tree-dump-times vect "can't determine 
dependence" 1
FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect "possible 
dependence between data-refs" 1
FAIL: gcc.dg/vect/no-vfa-vect-102a.c scan-tree-dump-times vect "possible 
dependence between data-refs" 1
FAIL: gcc.dg/vect/no-vfa-vect-37.c scan-tree-dump-times vect "can't determine 
dependence" 2
FAIL: gcc.dg/vect/pr57705.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr57705.c scan-tree-dump-times vect "vectorized 1 loop" 2
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump vect "can't 
force alignment"
FAIL: gcc.dg/vect/pr65310.c -flto -ffat-lto-objects  scan-tree-dump-not vect 
"misalign = 0"
FAIL: gcc.dg/vect/pr65310.c scan-tree-dump vect "can't force alignment"
FAIL: gcc.dg/vect/pr65310.c scan-tree-dump-not vect "misalign = 0"
FAIL: gcc.dg/vect/pr65518.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops in function" 2
FAIL: gcc.dg/vect/pr65518.c scan-tree-dump-times vect "vectorized 0 loops in 
function" 2
FAIL: gcc.dg/vect/pr68445.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorizing stmts using SLP"
FAIL: gcc.dg/vect/pr68445.c scan-tree-dump vect "vectorizing stmts using SLP"
FAIL: gcc.dg/vect/pr88598-1.c -flto -ffat-lto-objects  scan-tree-dump-not 
optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-1.c scan-tree-dump-not optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-2.c -flto -ffat-lto-objects  scan-tree-dump-not 
optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-2.c scan-tree-dump-not optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-3.c -flto -ffat-lto-objects  scan-tree-dump-not 
optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr88598-3.c scan-tree-dump-not optimized "REDUC_PLUS"
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/pr97835.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorizing stmts using SLP"
FAIL: gcc.dg/vect/pr97835.c scan-tree-dump vect "vectorizing stmts using SLP"
FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 4
FAIL: gcc.dg/vect/slp-11a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-11a.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12c.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-12c.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0
FAIL: gcc.dg/vect/slp-15.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-15.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-15.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 0
FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree

Re: [PATCH V2] RISC-V: Disable user vsetvl fusion into EMPTY or DIRTY (Polluted EMPTY) block

2023-08-28 Thread Lehua Ding

Committed, thanks Kito.

On 2023/8/28 17:55, Kito Cheng via Gcc-patches wrote:

LGTM, that's much clearer than v1 to me :)

On Mon, Aug 28, 2023 at 5:54 PM Juzhe-Zhong  wrote:


This patch fixes the following ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (internal compiler error: 
in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (internal compiler error: 
in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (internal 
compiler error: in anticipatable_occurrence_p, at 
config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (test for 
excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (internal 
compiler error: in anticipatable_occurrence_p, at 
config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (test for 
excess errors)

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Fix bug.

---
  gcc/config/riscv/riscv-vsetvl.cc | 38 ++--
  1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 682f795c8e1..48e89fe2c03 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3285,12 +3285,46 @@ pass_vsetvl::earliest_fusion (void)
   gcc_assert (!(eg->flags & EDGE_ABNORMAL));
   vector_insn_info new_info = vector_insn_info ();
   profile_probability prob = src_block_info.probability;
+ /* We don't fuse user vsetvl into EMPTY or
+DIRTY (EMPTY but polluted) block for these
+following reasons:
+
+   - The user vsetvl instruction is configured as
+ no side effects that the previous passes
+ (GSCE, Loop-invariant, ..., etc)
+ should be able to do a good job on optimization
+ of user explicit vsetvls so we don't need to
+ PRE optimization (The user vsetvls should be
+ on the optimal local already before this pass)
+ again for user vsetvls in VSETVL PASS here
+ (Phase 3 && Phase 4).
+
+   - Allowing user vsetvls be optimized in PRE
+ optimization here (Phase 3 && Phase 4) will
+ complicate the codes so much so we prefer user
+ vsetvls be optimized in post-optimization
+ (Phase 5 && Phase 6).  */
+ if (vsetvl_insn_p (expr.get_insn ()->rtl ()))
+   {
+ if (src_block_info.reaching_out.empty_p ())
+   continue;
+ else if (src_block_info.reaching_out.dirty_p ()
+  && !src_block_info.reaching_out.compatible_p (expr))
+   {
+ new_info.set_empty ();
+ /* Update probability as uninitialized status so that
+we won't try to fuse any demand info into such EMPTY
+block any more.  */
+ prob = profile_probability::uninitialized ();
+ update_block_info (eg->src->index, prob, new_info);
+ continue;
+   }
+   }

   if (src_block_info.reaching_out.empty_p ())
 {
   if (src_block_info.probability
-   == profile_probability::uninitialized ()
- || vsetvl_insn_p (expr.get_insn ()->rtl ()))
+ == profile_probability::uninitialized ())
 continue;
   new_info = expr.global_merge (

Re: [PATCH V2] RISC-V: Disable user vsetvl fusion into EMPTY or DIRTY (Polluted EMPTY) block

2023-08-28 Thread Kito Cheng via Gcc-patches
LGTM, that's much clearer than v1 to me :)

On Mon, Aug 28, 2023 at 5:54 PM Juzhe-Zhong  wrote:
>
> This patch fixes the following ICEs in the "vect" testsuite:
> FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in 
> anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors)
> FAIL: gcc.dg/vect/pr109025.c (internal compiler error: in 
> anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/pr109025.c (test for excess errors)
> FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (internal compiler 
> error: in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (test for excess errors)
> FAIL: gcc.dg/vect/pr42604.c (internal compiler error: in 
> anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/pr42604.c (test for excess errors)
> FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (internal compiler error: 
> in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (test for excess errors)
> FAIL: gcc.dg/vect/vect-double-reduc-3.c (internal compiler error: in 
> anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/vect-double-reduc-3.c (test for excess errors)
> FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (internal 
> compiler error: in anticipatable_occurrence_p, at 
> config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (test for 
> excess errors)
> FAIL: gcc.dg/vect/vect-double-reduc-7.c (internal compiler error: in 
> anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/vect-double-reduc-7.c (test for excess errors)
> FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (internal 
> compiler error: in anticipatable_occurrence_p, at 
> config/riscv/riscv-vsetvl.cc:314)
> FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (test for 
> excess errors)
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Fix 
> bug.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 38 ++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 682f795c8e1..48e89fe2c03 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -3285,12 +3285,46 @@ pass_vsetvl::earliest_fusion (void)
>   gcc_assert (!(eg->flags & EDGE_ABNORMAL));
>   vector_insn_info new_info = vector_insn_info ();
>   profile_probability prob = src_block_info.probability;
> + /* We don't fuse user vsetvl into EMPTY or
> +DIRTY (EMPTY but polluted) block for these
> +following reasons:
> +
> +   - The user vsetvl instruction is configured as
> + no side effects that the previous passes
> + (GSCE, Loop-invariant, ..., etc)
> + should be able to do a good job on optimization
> + of user explicit vsetvls so we don't need to
> + PRE optimization (The user vsetvls should be
> + on the optimal local already before this pass)
> + again for user vsetvls in VSETVL PASS here
> + (Phase 3 && Phase 4).
> +
> +   - Allowing user vsetvls be optimized in PRE
> + optimization here (Phase 3 && Phase 4) will
> + complicate the codes so much so we prefer user
> + vsetvls be optimized in post-optimization
> + (Phase 5 && Phase 6).  */
> + if (vsetvl_insn_p (expr.get_insn ()->rtl ()))
> +   {
> + if (src_block_info.reaching_out.empty_p ())
> +   continue;
> + else if (src_block_info.reaching_out.dirty_p ()
> +  && !src_block_info.reaching_out.compatible_p 
> (expr))
> +   {
> + new_info.set_empty ();
> + /* Update probability as uninitialized status so that
> +we won't try to fuse any demand info into such EMPTY
> +block any more.  */
> + prob = profile_probability::uninitialized ();
> + update_block_info (eg->src->index, prob, new_info);
> + continue;
> +   }
> +   }
>
>   if (src_block_info.reaching_out.empty_p ())
> {
>   if (src_block_info.probability
> -   == profile_probability::uninitialized ()
> - || vsetvl_insn_p (expr.get_insn ()->rtl ()))
> + == profile_probability::un

Re: Re: [PATCH] RISC-V: Disable user vsetvl fusion into EMPTY block

2023-08-28 Thread juzhe.zh...@rivai.ai
Address comments:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628568.html 




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-08-28 16:58
To: Robin Dapp
CC: Juzhe-Zhong; gcc-patches; kito.cheng
Subject: Re: [PATCH] RISC-V: Disable user vsetvl fusion into EMPTY block
Is it possible to skip that at a higher level, like this?
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 682f795c8e1..654d25de593 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3269,7 +3269,7 @@ pass_vsetvl::earliest_fusion (void)
  for (size_t i = 0; i < m_vector_manager->vector_exprs.length (); i++)
   {
 auto &expr = *m_vector_manager->vector_exprs[i];
- if (expr.empty_p ())
+ if (expr.empty_p () || vsetvl_insn_p (expr.get_insn ()->rtl ()))
   continue;
 edge eg = INDEX_EDGE (m_vector_manager->vector_edge_list, ed);
 if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
 


[PATCH V2] RISC-V: Disable user vsetvl fusion into EMPTY or DIRTY (Polluted EMPTY) block

2023-08-28 Thread Juzhe-Zhong
This patch fixes the following ICEs in the "vect" testsuite:
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/no-scevccp-outer-2.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c (test for excess errors)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (internal compiler error: 
in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr109025.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c (test for excess errors)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (internal compiler error: 
in anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/pr42604.c -flto -ffat-lto-objects (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (internal 
compiler error: in anticipatable_occurrence_p, at 
config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-3.c -flto -ffat-lto-objects (test for 
excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (internal compiler error: in 
anticipatable_occurrence_p, at config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c (test for excess errors)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (internal 
compiler error: in anticipatable_occurrence_p, at 
config/riscv/riscv-vsetvl.cc:314)
FAIL: gcc.dg/vect/vect-double-reduc-7.c -flto -ffat-lto-objects (test for 
excess errors)

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::earliest_fusion): Fix bug.

---
 gcc/config/riscv/riscv-vsetvl.cc | 38 ++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 682f795c8e1..48e89fe2c03 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3285,12 +3285,46 @@ pass_vsetvl::earliest_fusion (void)
  gcc_assert (!(eg->flags & EDGE_ABNORMAL));
  vector_insn_info new_info = vector_insn_info ();
  profile_probability prob = src_block_info.probability;
+ /* We don't fuse user vsetvl into EMPTY or
+DIRTY (EMPTY but polluted) block for these
+following reasons:
+
+   - The user vsetvl instruction is configured as
+ no side effects that the previous passes
+ (GSCE, Loop-invariant, ..., etc)
+ should be able to do a good job on optimization
+ of user explicit vsetvls so we don't need to
+ PRE optimization (The user vsetvls should be
+ on the optimal local already before this pass)
+ again for user vsetvls in VSETVL PASS here
+ (Phase 3 && Phase 4).
+
+   - Allowing user vsetvls be optimized in PRE
+ optimization here (Phase 3 && Phase 4) will
+ complicate the codes so much so we prefer user
+ vsetvls be optimized in post-optimization
+ (Phase 5 && Phase 6).  */
+ if (vsetvl_insn_p (expr.get_insn ()->rtl ()))
+   {
+ if (src_block_info.reaching_out.empty_p ())
+   continue;
+ else if (src_block_info.reaching_out.dirty_p ()
+  && !src_block_info.reaching_out.compatible_p (expr))
+   {
+ new_info.set_empty ();
+ /* Update probability as uninitialized status so that
+we won't try to fuse any demand info into such EMPTY
+block any more.  */
+ prob = profile_probability::uninitialized ();
+ update_block_info (eg->src->index, prob, new_info);
+ continue;
+   }
+   }
 
  if (src_block_info.reaching_out.empty_p ())
{
  if (src_block_info.probability
-   == profile_probability::uninitialized ()
- || vsetvl_insn_p (expr.get_insn ()->rtl ()))
+ == profile_probability::uninitialized ())
continue;
  new_info = expr.global_merge (expr, eg->src->index);
  new_info.set_dirty ();
-- 
2.36.3
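
The guard added at the top of the loop body can be summarised as a small decision function. The following is only a sketch under assumptions: `BlockState`, `SrcBlock`, `Action` and `user_vsetvl_guard` are hypothetical stand-ins invented for illustration and are not the types or functions used in riscv-vsetvl.cc. It models just the new early-exit logic for user vsetvl instructions, not the fusion that follows it.

```cpp
#include <cassert>

// Hypothetical model of the source block's reaching-out info.
enum class BlockState { Empty, Dirty, Valid };

// What the guard decides for the current (expr, edge) pair.
enum class Action {
  Skip,                      // "continue": do not fuse into an EMPTY block
  ResetToEmptyAndSkip,       // DIRTY and incompatible: reset block, then skip
  ContinueWithExistingLogic  // fall through to the pre-existing fusion code
};

// Hypothetical stand-in for the per-block info consulted by the pass.
struct SrcBlock {
  BlockState reaching_out;
  bool compatible_with_expr;  // models reaching_out.compatible_p (expr)
  bool prob_initialized;      // false models profile_probability::uninitialized ()
};

// Mirrors the control flow of the added hunk: a user vsetvl is never fused
// into an EMPTY block, and a DIRTY block that is incompatible with it is
// reset to EMPTY and marked so that nothing else is fused into it later.
Action user_vsetvl_guard(SrcBlock &block, bool expr_is_user_vsetvl) {
  if (expr_is_user_vsetvl) {
    if (block.reaching_out == BlockState::Empty)
      return Action::Skip;
    if (block.reaching_out == BlockState::Dirty && !block.compatible_with_expr) {
      block.reaching_out = BlockState::Empty;
      block.prob_initialized = false;
      return Action::ResetToEmptyAndSkip;
    }
  }
  return Action::ContinueWithExistingLogic;
}
```

In this model a compatible DIRTY block, or any expression that is not a user vsetvl, is left to the existing code path unchanged.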



Re: [PATCH] alias-analyis: try to find ADDR_EXPR for SSA_NAME ptr

2023-08-28 Thread Richard Biener via Gcc-patches
On Mon, Aug 28, 2023 at 11:35 AM Di Zhao OS via Gcc-patches
 wrote:
>
> This patch tries to improve alias-analysis between an SSA_NAME and
> a declaration a little. For a case like:
>
> int array1[10], array2[10];
> ptr1 = array1 + x;
> ptr2 = ptr1 + y;
>
> , *ptr2 should not alias with array2.
>
> If we can't disambiguate from points-to information, this patch
> tries to find a determined ADDR_EXPR following the defining
> statements and then disambiguate by compare_base_decls. On spec2017
> 502.gcc, several thousand new non-aliasing relations are
> found. (No obvious improvements or regressions found, though.)
>
> Bootstrapped and tested on aarch64-unknown-linux-gnu.
>
> Thanks,
> Di Zhao
>
>
> gcc/ChangeLog:
>
> * tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Try
> to find ADDR_EXPR for SSA_NAME ptrs.
>
> ---
>  gcc/tree-ssa-alias.cc | 42 ++
>  1 file changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
> index cf38fe506a8..a6fe1e7b227 100644
> --- a/gcc/tree-ssa-alias.cc
> +++ b/gcc/tree-ssa-alias.cc
> @@ -271,6 +271,38 @@ ptr_deref_may_alias_decl_p (tree ptr, tree decl)
>return ptr_deref_may_alias_decl_p (ptr, decl);
>  }
>
> +  if (TREE_CODE (ptr) == SSA_NAME)
> +{
> +  /* First disambiguate from points-to information.  */
> +  pi = SSA_NAME_PTR_INFO (ptr);
> +  if (pi && !pt_solution_includes (&pi->pt, decl))
> +   return false;
> +
> +  /* Try to find an ADDR_EXPR by walking the defining statements, so we 
> can
> +probably disambiguate from compare_base_decls.  */
> +  gimple *def_stmt;
> +  while ((def_stmt = SSA_NAME_DEF_STMT (ptr)))
> +   {
> + if (is_gimple_assign (def_stmt)
> + && gimple_assign_rhs_code (def_stmt) == POINTER_PLUS_EXPR)

This is going to very badly regress compile-time with long chains of
pointer adjustments, so it is not really appropriate.

What are the sources of missed points-to information here?  Sometimes
transforms unnecessarily drop or forget to propagate info.

What are the passes that actually benefit most from this change?  We could
implement a light-weight points-to info "forwarding" pass, for example.  If so,
then your patch should update points-to info on the stmts it visits during
this walk to make future walks cheaper - and IMHO it definitely would need
an upper bound on the number of stmts walked.  Something like

  auto_vec ptrs;
  while (..)
{
   result = check alias;
   ptrs.quick_push (ptr);
}
  update points-to-info of ptrs;
  return result;
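
A self-contained sketch of that idea, outside of GCC: it models the bounded walk plus the write-back of the discovered base to every visited pointer. PtrNode, make_chain and may_alias_decl are hypothetical names for illustration only, not GCC APIs, and the integer "decl" ids stand in for real declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an SSA pointer (hypothetical, not a GCC type): either its
// base declaration is known (base_decl >= 0), or it is an offset of another
// pointer (pred >= 0).
struct PtrNode {
  int pred = -1;       // index of the pointer this one is derived from
  int base_decl = -1;  // known base declaration id, or -1 if unknown
};

// Build a chain of LENGTH pointers (LENGTH >= 1): node 0 has a known base,
// node i is derived from node i - 1.
std::vector<PtrNode> make_chain(std::size_t length, int base_decl) {
  std::vector<PtrNode> nodes(length);
  nodes[0].base_decl = base_decl;
  for (std::size_t i = 1; i < length; ++i)
    nodes[i].pred = static_cast<int>(i) - 1;
  return nodes;
}

// Conservative query: returns false only when a base is found and differs
// from DECL.  Inspects the starting pointer plus at most MAX_STEPS
// predecessors, and records the base on every visited node so that later
// queries stop at the first node.
bool may_alias_decl(std::vector<PtrNode> &nodes, int ptr, int decl,
                    std::size_t max_steps) {
  std::vector<int> visited;
  int cur = ptr;
  int base = -1;
  for (std::size_t steps = 0; cur >= 0 && steps <= max_steps; ++steps) {
    if (nodes[cur].base_decl >= 0) {
      base = nodes[cur].base_decl;
      break;
    }
    visited.push_back(cur);
    cur = nodes[cur].pred;
  }
  if (base < 0)
    return true;  // bound hit or no base found: assume it may alias
  for (int v : visited)
    nodes[v].base_decl = base;  // cache the result for future walks
  return base == decl;
}
```

With a four-element chain based on declaration 7, a query against declaration 8 with a generous bound returns false and caches the base, so a repeat query succeeds even with a bound of 0; with a bound of 1 on a fresh chain the walk gives up and conservatively returns true.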

I'll also note that other APIs have very much the same issue
(you can look for POINTER_PLUS_EXPR walks of GENERIC
arguments), so splitting out the POINTER_PLUS_EXPR handling
would be good.

But first I'd like to have the first questions answered.

> +   {
> + ptr = gimple_assign_rhs1 (def_stmt);
> + if (TREE_CODE (ptr) != SSA_NAME)
> +   break;
> +   }
> + /* See if we can find a certain defining source.  */
> + else if (gimple_code (def_stmt) == GIMPLE_PHI
> +  && gimple_phi_num_args (def_stmt) == 1)
> +   {
> + ptr = PHI_ARG_DEF (def_stmt, 0);
> + if (TREE_CODE (ptr) != SSA_NAME)
> +   break;
> +   }
> + else
> +   break;
> +   }
> +}
> +
>/* ADDR_EXPR pointers either just offset another pointer or directly
>   specify the pointed-to set.  */
>if (TREE_CODE (ptr) == ADDR_EXPR)
> @@ -279,7 +311,7 @@ ptr_deref_may_alias_decl_p (tree ptr, tree decl)
>if (base
>   && (TREE_CODE (base) == MEM_REF
>   || TREE_CODE (base) == TARGET_MEM_REF))
> -   ptr = TREE_OPERAND (base, 0);
> +   return ptr_deref_may_alias_decl_p (TREE_OPERAND (base, 0), decl);
>else if (base
>&& DECL_P (base))
> return compare_base_decls (base, decl) != 0;
> @@ -294,13 +326,7 @@ ptr_deref_may_alias_decl_p (tree ptr, tree decl)
>if (!may_be_aliased (decl))
>  return false;
>
> -  /* If we do not have useful points-to information for this pointer
> - we cannot disambiguate anything else.  */
> -  pi = SSA_NAME_PTR_INFO (ptr);
> -  if (!pi)
> -return true;
> -
> -  return pt_solution_includes (&pi->pt, decl);
> +  return true;
>  }
>
>  /* Return true if dereferenced PTR1 and PTR2 may alias.
> --
> 2.25.1


[PATCH] alias-analysis: try to find ADDR_EXPR for SSA_NAME ptr

2023-08-28 Thread Di Zhao OS via Gcc-patches
This patch tries to improve alias-analysis between an SSA_NAME and
a declaration a little. For a case like:

int array1[10], array2[10];
ptr1 = array1 + x;
ptr2 = ptr1 + y;

, *ptr2 should not alias with array2.

If we can't disambiguate from points-to information, this patch
walks the defining statements to find a definite ADDR_EXPR
and then disambiguates with compare_base_decls. On spec2017
502.gcc, several thousand new non-aliasing relations are
found. (No obvious improvements or regressions found, though.)
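
As a source-level illustration of the case above (a sketch only; the function name store_then_load is made up for this example and is not part of the patch):

```cpp
#include <cassert>

int array1[10], array2[10];

// ptr2 is derived from array1 only, so in a valid program the store through
// it cannot change array2.  A compiler that can prove this is free to return
// the constant 1 instead of reloading array2[0] after the store.
int store_then_load(int x, int y) {
  int *ptr1 = array1 + x;
  int *ptr2 = ptr1 + y;
  array2[0] = 1;
  *ptr2 = 2;
  return array2[0];
}
```

Here x + y has to stay within the bounds of array1 for the program to be valid; whether a given GCC version actually folds the load depends on the alias information it has for ptr2.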

Bootstrapped and tested on aarch64-unknown-linux-gnu.

Thanks,
Di Zhao


gcc/ChangeLog:

* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): try
to find ADDR_EXPR for SSA_NAME ptrs.

---
 gcc/tree-ssa-alias.cc | 42 ++
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index cf38fe506a8..a6fe1e7b227 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -271,6 +271,38 @@ ptr_deref_may_alias_decl_p (tree ptr, tree decl)
   return ptr_deref_may_alias_decl_p (ptr, decl);
 }
 
+  if (TREE_CODE (ptr) == SSA_NAME)
+{
+  /* First disambiguate from points-to information.  */
+  pi = SSA_NAME_PTR_INFO (ptr);
+  if (pi && !pt_solution_includes (&pi->pt, decl))
+   return false;
+
+  /* Try to find an ADDR_EXPR by walking the defining statements, so we can
+probably disambiguate from compare_base_decls.  */
+  gimple *def_stmt;
+  while ((def_stmt = SSA_NAME_DEF_STMT (ptr)))
+   {
+ if (is_gimple_assign (def_stmt)
+ && gimple_assign_rhs_code (def_stmt) == POINTER_PLUS_EXPR)
+   {
+ ptr = gimple_assign_rhs1 (def_stmt);
+ if (TREE_CODE (ptr) != SSA_NAME)
+   break;
+   }
+ /* See if we can find a certain defining source.  */
+ else if (gimple_code (def_stmt) == GIMPLE_PHI
+  && gimple_phi_num_args (def_stmt) == 1)
+   {
+ ptr = PHI_ARG_DEF (def_stmt, 0);
+ if (TREE_CODE (ptr) != SSA_NAME)
+   break;
+   }
+ else
+   break;
+   }
+}
+
   /* ADDR_EXPR pointers either just offset another pointer or directly
  specify the pointed-to set.  */
   if (TREE_CODE (ptr) == ADDR_EXPR)
@@ -279,7 +311,7 @@ ptr_deref_may_alias_decl_p (tree ptr, tree decl)
   if (base
  && (TREE_CODE (base) == MEM_REF
  || TREE_CODE (base) == TARGET_MEM_REF))
-   ptr = TREE_OPERAND (base, 0);
+   return ptr_deref_may_alias_decl_p (TREE_OPERAND (base, 0), decl);
   else if (base
   && DECL_P (base))
return compare_base_decls (base, decl) != 0;
@@ -294,13 +326,7 @@ ptr_deref_may_alias_decl_p (tree ptr, tree decl)
   if (!may_be_aliased (decl))
 return false;
 
-  /* If we do not have useful points-to information for this pointer
- we cannot disambiguate anything else.  */
-  pi = SSA_NAME_PTR_INFO (ptr);
-  if (!pi)
-return true;
-
-  return pt_solution_includes (&pi->pt, decl);
+  return true;
 }
 
 /* Return true if dereferenced PTR1 and PTR2 may alias.
-- 
2.25.1


Re: [PATCH] s390: Fix builtins vec_rli and verll

2023-08-28 Thread Andreas Krebbel via Gcc-patches
Hi Stefan,

do you really need to introduce a new flag for U64 given that the type of the 
builtin is unsigned long?

Andreas

On 8/21/23 17:56, Stefan Schulze Frielinghaus wrote:
> The second argument of these builtins is an unsigned immediate.  For
> vec_rli the API allows immediates up to 64 bits whereas the instruction
> verll only allows immediates up to 32 bits.  Since the shift count
> equals the immediate modulo vector element size, truncating those
> immediates is fine.
> 
> Bootstrapped and regtested on s390.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtins.def (O_U64): New.
>   (O1_U64): Ditto.
>   (O2_U64): Ditto.
>   (O3_U64): Ditto.
>   (O4_U64): Ditto.
>   (O_M12): Change bit position.
>   (O_S2): Ditto.
>   (O_S3): Ditto.
>   (O_S4): Ditto.
>   (O_S5): Ditto.
>   (O_S8): Ditto.
>   (O_S12): Ditto.
>   (O_S16): Ditto.
>   (O_S32): Ditto.
>   (O_ELEM): Ditto.
>   (O_LIT): Ditto.
>   (OB_DEF_VAR): Add operand constraints.
>   (B_DEF): Ditto.
>   * config/s390/s390.cc (s390_const_operand_ok): Honour 64 bit
>   operands.
> ---
>  gcc/config/s390/s390-builtins.def | 60 ++-
>  gcc/config/s390/s390.cc   |  6 ++--
>  2 files changed, 39 insertions(+), 27 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-builtins.def 
> b/gcc/config/s390/s390-builtins.def
> index a16983b18bd..c829f445a11 100644
> --- a/gcc/config/s390/s390-builtins.def
> +++ b/gcc/config/s390/s390-builtins.def
> @@ -28,6 +28,7 @@
>  #undef O_U12
>  #undef O_U16
>  #undef O_U32
> +#undef O_U64
>  
>  #undef O_M12
>  
> @@ -88,6 +89,11 @@
>  #undef O3_U32
>  #undef O4_U32
>  
> +#undef O1_U64
> +#undef O2_U64
> +#undef O3_U64
> +#undef O4_U64
> +
>  #undef O1_M12
>  #undef O2_M12
>  #undef O3_M12
> @@ -157,20 +163,21 @@
>  #define O_U12    7 /* unsigned 16 bit literal */
>  #define O_U16    8 /* unsigned 16 bit literal */
>  #define O_U32    9 /* unsigned 32 bit literal */
> +#define O_U64   10 /* unsigned 64 bit literal */
>  
> -#define O_M12   10 /* matches bitmask of 12 */
> +#define O_M12   11 /* matches bitmask of 12 */
>  
> -#define O_S2    11 /* signed  2 bit literal */
> -#define O_S3    12 /* signed  3 bit literal */
> -#define O_S4    13 /* signed  4 bit literal */
> -#define O_S5    14 /* signed  5 bit literal */
> -#define O_S8    15 /* signed  8 bit literal */
> -#define O_S12   16 /* signed 12 bit literal */
> -#define O_S16   17 /* signed 16 bit literal */
> -#define O_S32   18 /* signed 32 bit literal */
> +#define O_S2    12 /* signed  2 bit literal */
> +#define O_S3    13 /* signed  3 bit literal */
> +#define O_S4    14 /* signed  4 bit literal */
> +#define O_S5    15 /* signed  5 bit literal */
> +#define O_S8    16 /* signed  8 bit literal */
> +#define O_S12   17 /* signed 12 bit literal */
> +#define O_S16   18 /* signed 16 bit literal */
> +#define O_S32   19 /* signed 32 bit literal */
>  
> -#define O_ELEM  19 /* Element selector requiring modulo arithmetic. */
> -#define O_LIT   20 /* Operand must be a literal fitting the target type.  */
> +#define O_ELEM  20 /* Element selector requiring modulo arithmetic. */
> +#define O_LIT   21 /* Operand must be a literal fitting the target type.  */
>  
>  #define O_SHIFT 5
>  
> @@ -223,6 +230,11 @@
>  #define O3_U32 (O_U32 << (2 * O_SHIFT))
>  #define O4_U32 (O_U32 << (3 * O_SHIFT))
>  
> +#define O1_U64 O_U64
> +#define O2_U64 (O_U64 << O_SHIFT)
> +#define O3_U64 (O_U64 << (2 * O_SHIFT))
> +#define O4_U64 (O_U64 << (3 * O_SHIFT))
> +
>  #define O1_M12 O_M12
>  #define O2_M12 (O_M12 << O_SHIFT)
>  #define O3_M12 (O_M12 << (2 * O_SHIFT))
> @@ -1989,19 +2001,19 @@ B_DEF  (s390_verllvf,   vrotlv4si3,   
>   0,
>  B_DEF  (s390_verllvg,   vrotlv2di3, 0,   
>B_VX,   0,  BT_FN_UV2DI_UV2DI_UV2DI)
>  
>  OB_DEF (s390_vec_rli,   s390_vec_rli_u8,
> s390_vec_rli_s64,   B_VX,   BT_FN_OV4SI_OV4SI_ULONG)
> -OB_DEF_VAR (s390_vec_rli_u8,s390_verllb,0,   
>0,  BT_OV_UV16QI_UV16QI_ULONG)
> -OB_DEF_VAR (s390_vec_rli_s8,s390_verllb,0,   
>0,  BT_OV_V16QI_V16QI_ULONG)
> -OB_DEF_VAR (s390_vec_rli_u16,   s390_verllh,0,   
>0,  BT_OV_UV8HI_UV8HI_ULONG)
> -OB_DEF_VAR (s390_vec_rli_s16,   s390_verllh,0,   
>0,  BT_OV_V8HI_V8HI_ULONG)
> -OB_DEF_VAR (s390_vec_rli_u32,   s390_verllf,0,   
>0,  BT_OV_UV4SI_UV4SI_ULONG)
> -OB_DEF_VAR (s390_vec_rli_s32,   s390_verllf,0,   
>0,  BT_OV_V4SI_V4SI_ULONG)
> -OB_DEF_VAR (s390_vec_rli_u64,   s390_verllg,0,   
>0,  BT_OV_UV2DI_UV2DI_ULONG)
> -OB_DEF_VAR (s390
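
The truncation argument in the quoted commit message can be checked in isolation. The sketch below is a plain C++ model, not the s390 builtins: rotl_elem and truncate_imm are hypothetical helpers. It relies on the fact that 2^32 is a multiple of every element width involved (8, 16, 32, 64), so dropping the upper 32 bits of the immediate does not change the count modulo the element size.

```cpp
#include <cassert>
#include <cstdint>

// Rotate one unsigned vector element left.  The effective count is the
// immediate modulo the element width in bits, as described in the commit
// message.
template <typename T>
T rotl_elem(T value, std::uint64_t count) {
  const unsigned bits = sizeof(T) * 8;
  const unsigned n = static_cast<unsigned>(count % bits);
  if (n == 0)
    return value;
  return static_cast<T>((value << n) | (value >> (bits - n)));
}

// Model of narrowing a 64-bit immediate to the 32 bits the instruction
// accepts.
std::uint32_t truncate_imm(std::uint64_t imm) {
  return static_cast<std::uint32_t>(imm);
}
```

For example, the immediate 0x100000005 truncates to 5, and rotating a 32-bit or 64-bit element by either value gives the same result.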

Re: [PATCH 0/2] support cm.push cm.pop cm.popret in zcmp and resolve conflict with shrink-wrap-separate

2023-08-28 Thread Kito Cheng via Gcc-patches
I would prefer to decouple the shrink-wrap part by checking
flag_shrink_wrap.  That is, for now disable zcmp code generation when
flag_shrink_wrap is true, and handle the shrink-wrap.[cc|h] changes in
a follow-up patch series?

On Mon, Aug 28, 2023 at 3:48 PM Fei Gao  wrote:
>
> The first is a helper patch to allow targets to check shrink-wrap-separate 
> enabled or not.
> The second is zcmp extension implementation in RISC-V.
>
> Fei Gao (2):
>   allow target to check shrink-wrap-separate enabled or not
>   support cm.push cm.pop cm.popret in zcmp and resolve conflict with 
> shrink-wrap-separate
>
>  gcc/config/riscv/iterators.md |   15 +
>  gcc/config/riscv/predicates.md|   96 ++
>  gcc/config/riscv/riscv-protos.h   |2 +
>  gcc/config/riscv/riscv.cc |  455 ++-
>  gcc/config/riscv/riscv.h  |   25 +
>  gcc/config/riscv/riscv.md |2 +
>  gcc/config/riscv/zc.md| 1042 +
>  gcc/shrink-wrap.cc|   25 +-
>  gcc/shrink-wrap.h |1 +
>  gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   |  256 
>  gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   |  256 
>  .../gcc.target/riscv/zcmp_push_fpr.c  |   34 +
>  .../riscv/zcmp_shrink_wrap_separate.c |   93 ++
>  .../riscv/zcmp_shrink_wrap_separate2.c|   93 ++
>  .../gcc.target/riscv/zcmp_stack_alignment.c   |   24 +
>  15 files changed, 2357 insertions(+), 62 deletions(-)
>  create mode 100644 gcc/config/riscv/zc.md
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_push_fpr.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c
>
> --
> 2.17.1
>


Re: [PATCH] s390: Fix some builtin definitions

2023-08-28 Thread Andreas Krebbel via Gcc-patches
On 8/21/23 17:58, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested on s390.  Ok for mainline?
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390-builtins.def (s390_vec_signed_flt): Fix
>   builtin flag.
>   (s390_vec_unsigned_flt): Ditto.
>   (s390_vec_revb_flt): Ditto.
>   (s390_vec_reve_flt): Ditto.
>   (s390_vclfnhs): Fix operand flags.
>   (s390_vclfnls): Ditto.
>   (s390_vcrnfs): Ditto.
>   (s390_vcfn): Ditto.
>   (s390_vcnf): Ditto.

Ok. Thanks!

Andreas


> ---
>  gcc/config/s390/s390-builtins.def | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-builtins.def 
> b/gcc/config/s390/s390-builtins.def
> index c829f445a11..964d86c74a0 100644
> --- a/gcc/config/s390/s390-builtins.def
> +++ b/gcc/config/s390/s390-builtins.def
> @@ -2846,12 +2846,12 @@ B_DEF  (s390_vcelfb,
> floatunsv4siv4sf2,  0,
>  B_DEF  (s390_vcdlgb,floatunsv2div2df2,  0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_UV2DI)
>  
>  OB_DEF (s390_vec_signed,
> s390_vec_signed_flt,s390_vec_signed_dbl,B_VX,   BT_FN_OV4SI_OV4SI)
> -OB_DEF_VAR (s390_vec_signed_flt,s390_vcfeb, 0,   
>B_VXE2, BT_OV_V4SI_V4SF)
> +OB_DEF_VAR (s390_vec_signed_flt,s390_vcfeb, B_VXE2,  
>0,  BT_OV_V4SI_V4SF)
>  OB_DEF_VAR (s390_vec_signed_dbl,s390_vcgdb, 0,   
>0,  BT_OV_V2DI_V2DF)
>  
>  OB_DEF (s390_vec_unsigned,  
> s390_vec_unsigned_flt,s390_vec_unsigned_dbl,B_VX,   BT_FN_OV4SI_OV4SI)
> -OB_DEF_VAR (s390_vec_unsigned_flt,  s390_vclfeb,0,   
>  B_VXE2, BT_OV_UV4SI_V4SF)
> -OB_DEF_VAR (s390_vec_unsigned_dbl,  s390_vclgdb,0,   
>  0,  BT_OV_UV2DI_V2DF)
> +OB_DEF_VAR (s390_vec_unsigned_flt,  s390_vclfeb,B_VXE2,  
>0,  BT_OV_UV4SI_V4SF)
> +OB_DEF_VAR (s390_vec_unsigned_dbl,  s390_vclgdb,0,   
>0,  BT_OV_UV2DI_V2DF)
>  
>  B_DEF  (s390_vcfeb, fix_truncv4sfv4si2, 0,   
>B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SI_V4SF)
>  B_DEF  (s390_vcgdb, fix_truncv2dfv2di2, 0,   
>B_VX,   O2_U4 | O3_U3,  BT_FN_V2DI_V2DF)
> @@ -2929,7 +2929,7 @@ OB_DEF_VAR (s390_vec_revb_s32,  s390_vlbrf, 
> 0,
>  OB_DEF_VAR (s390_vec_revb_u32,  s390_vlbrf, 0,   
>0,  BT_OV_UV4SI_UV4SI)
>  OB_DEF_VAR (s390_vec_revb_s64,  s390_vlbrg, 0,   
>0,  BT_OV_V2DI_V2DI)
>  OB_DEF_VAR (s390_vec_revb_u64,  s390_vlbrg, 0,   
>0,  BT_OV_UV2DI_UV2DI)
> -OB_DEF_VAR (s390_vec_revb_flt,  s390_vlbrf_flt, 0,   
>B_VXE,  BT_OV_V4SF_V4SF)
> +OB_DEF_VAR (s390_vec_revb_flt,  s390_vlbrf_flt, B_VXE,   
>0,  BT_OV_V4SF_V4SF)
>  OB_DEF_VAR (s390_vec_revb_dbl,  s390_vlbrg_dbl, 0,   
>0,  BT_OV_V2DF_V2DF)
>  
>  B_DEF  (s390_vlbrh, bswapv8hi,  0,   
>B_VX,   0,   BT_FN_V8HI_V8HI)
> @@ -2960,7 +2960,7 @@ OB_DEF_VAR (s390_vec_reve_u32,  s390_vlerf, 
> 0,
>  OB_DEF_VAR (s390_vec_reve_b64,  s390_vlerg, 0,   
>0,  BT_OV_BV2DI_BV2DI)
>  OB_DEF_VAR (s390_vec_reve_s64,  s390_vlerg, 0,   
>0,  BT_OV_V2DI_V2DI)
>  OB_DEF_VAR (s390_vec_reve_u64,  s390_vlerg, 0,   
>0,  BT_OV_UV2DI_UV2DI)
> -OB_DEF_VAR (s390_vec_reve_flt,  s390_vlerf_flt, 0,   
>B_VXE,  BT_OV_V4SF_V4SF)
> +OB_DEF_VAR (s390_vec_reve_flt,  s390_vlerf_flt, B_VXE,   
>0,  BT_OV_V4SF_V4SF)
>  OB_DEF_VAR (s390_vec_reve_dbl,  s390_vlerg_dbl, 0,   
>0,  BT_OV_V2DF_V2DF)
>  
>  B_DEF  (s390_vlerb, eltswapv16qi,   0,   
>B_VX,   0,   BT_FN_V16QI_V16QI)
> @@ -3037,10 +3037,10 @@ B_DEF  (s390_vstrszf,vstrszv4si,  
>   0,
>  
>  /* arch 14 builtins */
>  
> -B_DEF  (s390_vclfnhs,vclfnhs_v8hi,  0,   
>B_NNPA, O3_U4,  BT_FN_V4SF_V8HI_UINT)
> -B_DEF  (s390_vclfnls,vclfnls_v8hi,  0,   
>B_NNPA, O3_U4,  BT_FN_V4SF_V8HI_UINT)
> +B_DEF  (s390_vclfnhs,vclfnhs_v8hi,  0,   
>B_NNPA, O2_U4,

Re: [PATCH v2] LoongArch: Enable '-free' starting at -O2.

2023-08-28 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-08-28 at 11:46 +0800, Lulu Cheng wrote:
> v1 -> v2:
> 1. Modify Changelog information format.
> 
> gcc/ChangeLog:
> 
> * common/config/loongarch/loongarch-common.cc:
> Enable '-free' on O2 and above.
> * doc/invoke.texi: Modify the description information
> of the '-free' compilation option and add the LoongArch
> description.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/sign-extend.c: New test.

LGTM.

> ---
>  .../config/loongarch/loongarch-common.cc  |  1 +
>  gcc/doc/invoke.texi   |  4 +--
>  .../gcc.target/loongarch/sign-extend.c    | 25 +++
>  3 files changed, 28 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/sign-extend.c
> 
> diff --git a/gcc/common/config/loongarch/loongarch-common.cc 
> b/gcc/common/config/loongarch/loongarch-common.cc
> index fce32fa3f8d..c5ed37d27a6 100644
> --- a/gcc/common/config/loongarch/loongarch-common.cc
> +++ b/gcc/common/config/loongarch/loongarch-common.cc
> @@ -35,6 +35,7 @@ static const struct default_options 
> loongarch_option_optimization_table[] =
>  {
>    { OPT_LEVELS_ALL, OPT_fasynchronous_unwind_tables, NULL, 1 },
>    { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
> +  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
>    { OPT_LEVELS_NONE, 0, NULL, 0 }
>  };
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a32dabf0405..16aa92b5e86 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12639,8 +12639,8 @@ Attempt to remove redundant extension instructions.  
> This is especially
>  helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit
>  registers after writing to their lower 32-bit half.
>  
> -Enabled for Alpha, AArch64, PowerPC, RISC-V, SPARC, h83000 and x86 at levels
> -@option{-O2}, @option{-O3}, @option{-Os}.
> +Enabled for Alpha, AArch64, LoongArch, PowerPC, RISC-V, SPARC, h83000 and 
> x86 at
> +levels @option{-O2}, @option{-O3}, @option{-Os}.
>  
>  @opindex fno-lifetime-dse
>  @opindex flifetime-dse
> diff --git a/gcc/testsuite/gcc.target/loongarch/sign-extend.c 
> b/gcc/testsuite/gcc.target/loongarch/sign-extend.c
> new file mode 100644
> index 000..3f339d06bbd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/sign-extend.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mabi=lp64d -O2" } */
> +/* { dg-final { scan-assembler-times "slli.w" 1 } } */
> +
> +extern int PL_savestack_ix;
> +extern int PL_regsize;
> +extern int PL_savestack_max;
> +void Perl_savestack_grow_cnt (int need);
> +extern void Perl_croak (char *);
> +
> +int
> +S_regcppush(int parenfloor)
> +{
> +  int retval = PL_savestack_ix;
> +  int paren_elems_to_push = (PL_regsize - parenfloor) * 4;
> +  int p;
> +
> +  if (paren_elems_to_push < 0)
> +    Perl_croak ("panic: paren_elems_to_push < 0");
> +
> +  if (PL_savestack_ix + (paren_elems_to_push + 6) > PL_savestack_max)
> +    Perl_savestack_grow_cnt (paren_elems_to_push + 6);
> +
> +  return retval;
> +}

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

