[PATCH V3] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-30 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richi and Richard.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a, 
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
{
  if (cond[i])
a[i * step + base] = b[i * step + base];
}
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, control_mask, bias)

This patch doesn't apply these patterns in the vectorizer; it just adds the
patterns and updates the documentation.

I will send a patch that applies these patterns in the vectorizer soon after
this patch is approved.

Thanks.

gcc/ChangeLog:

* doc/md.texi: Add len_mask_gather/scatter.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* internal-fn.h (internal_fn_len_index): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 17 
 gcc/internal-fn.cc  | 64 ++---
 gcc/internal-fn.def |  8 --
 gcc/internal-fn.h   |  1 +
 gcc/optabs.def  |  2 ++
 5 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..df41b5251d4 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element 
@var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand 
(operand 5),
+a mask operand (operand 6) as well as a bias operand (operand 7).  Similar to 
len_maskload,
+the instruction loads at most (operand 5 + operand 7) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an 
extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand 
(operand 5),
+a mask operand (operand 6) as well as a bias operand (operand 7).  The 
instruction stores
+at most (operand 5 + operand 7) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..955b2216b11 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,6 +3546,17 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
 {
+  if (optab == len_mask_scatter_store_optab)
+   {
+ tree len = gimple_call_arg (stmt, internal_fn_len_index (ifn));
+ rtx len_rtx = expand_normal (len);
+ create_convert_operand_from (&ops[i++], len_rtx,
+  TYPE_MODE (TREE_TYPE (len)),
+  TYPE_UNSIGNED (TREE_TYPE (len)));
+ tree biast = gimple_call_arg (stmt, gimple_call_num_args (stmt) - 1);
+ rtx bias = expand_normal (biast);
+ create_in

[PATCH] c++: Fix ICE with parameter pack of decltype(auto) [PR103497]

2023-06-30 Thread Nathaniel Shead via Gcc-patches
On Thu, Jun 29, 2023 at 01:43:07PM -0400, Jason Merrill wrote:
> On 6/24/23 09:24, Nathaniel Shead wrote:
> > On Fri, Jun 23, 2023 at 11:59:51AM -0400, Patrick Palka wrote:
> > > Hi,
> > > 
> > > On Sat, 22 Apr 2023, Nathaniel Shead via Gcc-patches wrote:
> > > 
> > > > Bootstrapped and tested on x86_64-pc-linux-gnu.
> > > > 
> > > > -- 8< --
> > > > 
> > > > This patch raises an error early when the decltype(auto) specifier is
> > > > used as a parameter of a function. This prevents any issues with an
> > > > unexpected tree type later on when performing the call.
> > > 
> > > Thanks very much for the patch!  Some minor comments below.
> > > 
> > > > 
> > > > PR 103497
> > > 
> > > We should include the bug component name when referring to the PR in the
> > > commit message (i.e. PR c++/103497) so that upon pushing the patch the
> > > post-commit hook automatically adds a comment to the PR referring to the
> > > commit.  I could be wrong but AFAIK the hook only performs this when the
> > > component name is included.
> > 
> > Thanks for the review! Fixed.
> > 
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * parser.cc (cp_parser_simple_type_specifier): Add check for
> > > > decltype(auto) as function parameter.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/pr103497.C: New test.
> > > > 
> > > > Signed-off-by: Nathaniel Shead 
> > > > ---
> > > >   gcc/cp/parser.cc| 10 ++
> > > >   gcc/testsuite/g++.dg/pr103497.C |  7 +++
> > > >   2 files changed, 17 insertions(+)
> > > >   create mode 100644 gcc/testsuite/g++.dg/pr103497.C
> > > > 
> > > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > > index e5f032f2330..1415e07e152 100644
> > > > --- a/gcc/cp/parser.cc
> > > > +++ b/gcc/cp/parser.cc
> > > > @@ -19884,6 +19884,16 @@ cp_parser_simple_type_specifier (cp_parser* 
> > > > parser,
> > > > && cp_lexer_peek_nth_token (parser->lexer, 2)->type != 
> > > > CPP_SCOPE)
> > > >   {
> > > > type = saved_checks_value (token->u.tree_check_value);
> > > > +  /* Within a function parameter declaration, decltype(auto) is 
> > > > always an
> > > > +error.  */
> > > > +  if (parser->auto_is_implicit_function_template_parm_p
> > > > + && TREE_CODE (type) == TEMPLATE_TYPE_PARM
> > > 
> > > We could check is_auto (type) here instead, to avoid any confusion with
> > > checking AUTO_IS_DECLTYPE for a non-auto TEMPLATE_TYPE_PARM.
> > > 
> > > > + && AUTO_IS_DECLTYPE (type))
> > > > +   {
> > > > + error_at (token->location,
> > > > +   "cannot declare a parameter with 
> > > > %");
> > > > + type = error_mark_node;
> > > > +   }
> > > > if (decl_specs)
> > > > {
> > > >   cp_parser_set_decl_spec_type (decl_specs, type,
> > > > diff --git a/gcc/testsuite/g++.dg/pr103497.C 
> > > > b/gcc/testsuite/g++.dg/pr103497.C
> > > > new file mode 100644
> > > > index 000..bcd421c2907
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/g++.dg/pr103497.C
> > > > @@ -0,0 +1,7 @@
> > > > +// { dg-do compile { target c++14 } }
> > > > +
> > > > +void foo(decltype(auto)... args);  // { dg-error "parameter with 
> > > > .decltype.auto..|no parameter packs" }
> > > 
> > > I noticed for
> > > 
> > >void foo(decltype(auto) arg);
> > > 
> > > we already issue an identical error from grokdeclarator.  Perhaps we could
> > > instead extend the error handling there to detect decltype(auto)... as 
> > > well,
> > > rather than adding new error handling in cp_parser_simple_type_specifier?
> > 
> > Ah thanks, I didn't notice this; this simplifies the change a fair bit.
> > How about this patch instead?
> > 
> > Regtested on x86_64-pc-linux-gnu.
> > 
> > -- 8< --
> > 
> > This patch ensures that checks for usages of 'auto' in function
> > parameters also consider parameter packs, since 'type_uses_auto' does
> > not seem to consider this case.
> > 
> > PR c++/103497
> > 
> > gcc/cp/ChangeLog:
> > 
> > * decl.cc (grokdeclarator): Check for decltype(auto) in
> > parameter pack.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1y/decltype-auto-103497.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/decl.cc| 3 +++
> >   gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C | 8 
> >   2 files changed, 11 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C
> > 
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index 60f107d50c4..aaf691fce68 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -14044,6 +14044,9 @@ grokdeclarator (const cp_declarator *declarator,
> > error ("cannot use %<::%> in parameter declaration");
> > tree auto_node = type_uses_auto (type);
> > +  if (!auto_node && parameter_pack_p)
> > +   auto_node = type_uses_auto (PACK_EXPANSION_PATTERN (type));
> 
> Hmm, I wonder if type

[PATCH V4] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-30 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richi and Richard.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
{
  if (cond[i])
a[i * step + base] = b[i * step + base];
}
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, control_mask, bias)

This patch doesn't apply these patterns in the vectorizer; it just adds the
patterns and updates the documentation.
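
As a rough illustration of the intended semantics, here is a scalar model in C
of a length- and mask-controlled gather and scatter over byte elements. It is
only a sketch under stated assumptions, not code from the patch: the function
names, the per-lane byte mask, and the explicit "nunits" parameter are
hypothetical, and a lane is treated as active when its index is below
(len + bias) and its mask element is nonzero. The patterns leave inactive lanes
of a load undefined; the model simply leaves them untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a len+mask gather load of bytes.
   Lane i is active only if i < len + bias and mask[i] is nonzero.  */
static void
model_len_mask_gather_load (uint8_t *dest, const uint8_t *base,
                            const int32_t *offset, int scale,
                            int len, const uint8_t *mask, int bias,
                            int nunits)
{
  assert (scale > 0 && len + bias <= nunits);
  for (int i = 0; i < nunits; ++i)
    {
      if (i < len + bias && mask[i])
        dest[i] = base[(ptrdiff_t) offset[i] * scale];
      /* Inactive lanes are undefined in the pattern; the model leaves
         them unchanged.  */
    }
}

/* Hypothetical scalar model of the matching len+mask scatter store.
   Only active lanes write to memory.  */
static void
model_len_mask_scatter_store (uint8_t *base, const int32_t *offset,
                              int scale, const uint8_t *src, int len,
                              const uint8_t *mask, int bias, int nunits)
{
  assert (scale > 0 && len + bias <= nunits);
  for (int i = 0; i < nunits; ++i)
    {
      if (i < len + bias && mask[i])
        base[(ptrdiff_t) offset[i] * scale] = src[i];
    }
}
```

In the example loop, one vector iteration would roughly correspond to a gather
from b followed by a scatter to a, with offset[i] = i * step + base, mask[i]
derived from cond[i], and len taken from SELECT_VL.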

I will send a patch that applies these patterns in the vectorizer soon after
this patch is approved.

Thanks.

gcc/ChangeLog:

* doc/md.texi: Add len_mask_gather/scatter.
* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
(expand_gather_load_optab_fn): Ditto.
(internal_load_fn_p): Ditto.
(internal_store_fn_p): Ditto.
(internal_gather_scatter_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
(LEN_MASK_SCATTER_STORE): Ditto.
* internal-fn.h (internal_fn_len_index): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 17 
 gcc/internal-fn.cc  | 67 +++--
 gcc/internal-fn.def |  8 --
 gcc/internal-fn.h   |  1 +
 gcc/optabs.def  |  2 ++
 5 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..df41b5251d4 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element 
@var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand 
(operand 5),
+a mask operand (operand 6) as well as a bias operand (operand 7).  Similar to 
len_maskload,
+the instruction loads at most (operand 5 + operand 7) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an 
extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand 
(operand 5),
+a mask operand (operand 6) as well as a bias operand (operand 7).  The 
instruction stores
+at most (operand 5 + operand 7) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..6401eeeccb9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,9 +3546,23 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, 
direct_optab optab)
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
 {
+  if (optab == len_mask_scatter_store_optab)
+   {
+ tree len = gimple_call_arg (stmt, internal_fn_len_index (ifn));
+ rtx len_rtx = expand_normal (len);
+ create_convert_operand_from (&ops[i++], len_rtx,
+  TYPE_MODE (TREE_TYPE (len)),
+  TYPE_UNSIGNED (TREE_TYPE (len)));
+   }
   tree mask = gimple_call_arg (stmt, mask_index);
   rtx mask_rtx = expand_normal (mask);
   create_input_operand (&ops[i

Re: Re: [PATCH V2] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-30 Thread juzhe.zh...@rivai.ai
Hi, Richi. I have added "BIAS" and sent V4:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623293.html 

Forget about V3. I made a mistake there, sorry about that.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-30 14:26
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V2] Machine Description: Add LEN_MASK_{GATHER_LOAD, 
SCATTER_STORE} pattern
On Fri, 30 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, Richi and Richard.
> 
> This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
> handle flow control by mask and loop control by length on gather/scatter
> memory operations. Consider the following case:
> 
> #include <stdint.h>
> void
> f (uint8_t *restrict a, 
>uint8_t *restrict b, int n,
>int base, int step,
>int *restrict cond)
> {
>   for (int i = 0; i < n; ++i)
> {
>   if (cond[i])
> a[i * step + base] = b[i * step + base];
> }
> }
> 
> We hope RVV can vectorize such case into following IR:
> 
> loop_len = SELECT_VL
> control_mask = comparison
> v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask)
> LEN_SCATTER_STORE (... v, ..., loop_len, control_mask)
> 
> This patch doesn't apply such patterns into vectorizer, just add patterns
> and update the documents.
 
I see this doesn't add a BIAS argument - I think we should be consistent
here, at least for the memory access internal functions.  I'll note
that 'len' has issues with scatter/gather anyway since the trick of
handling 'len' in bytes by only providing QImode variants doesn't work
here.  So maybe that's good enough for an argument to not add bias
either ...
 
Richard, do you have any opinion here?
 
> Will send patch which apply such patterns into vectorizer soon after this
> patch is approved.
> 
> Thanks.
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.
> * internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
> (expand_gather_load_optab_fn): Ditto.
> (internal_load_fn_p): Ditto.
> (internal_store_fn_p): Ditto.
> (internal_gather_scatter_fn_p): Ditto.
> (internal_fn_mask_index): Ditto.
 
Can you add internal_fn_len_index please and make use of it?
Should have asked for this in the previous patch already I guess.
 
Otherwise this looks good to me.
 
Thanks,
Richard.
 
> (internal_fn_stored_value_index): Ditto.
> * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
> (LEN_MASK_SCATTER_STORE): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi | 17 +
>  gcc/internal-fn.cc  | 32 ++--
>  gcc/internal-fn.def |  8 ++--
>  gcc/optabs.def  |  2 ++
>  4 files changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9648fdc846a..b84aaab7075 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element 
> @var{i}
>  of the result should be loaded from memory and clear if element @var{i}
>  of the result should be set to zero.
>  
> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand 
> (operand 5)
> +as well as a mask operand (operand 6). Similar to len_maskload, the 
> instruction loads
> +at most (operand 5) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
> +Mask elements @var{i} with i > (operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an 
> extra mask operand as
>  operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be stored to memory.
>  
> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand 
> (operand 5)
> +as well as a mask operand (operand 6). The instruction stores at most 
> (operand 5) elements
> +of (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
> +Mask elements @var{i} with i > (operand 5) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 9017176dc7a..e4b558e33d8 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn 

Flip the nvptx port to LRA (was: [PATCH] Turn on LRA on all targets)

2023-06-30 Thread Thomas Schwinge
Hi!

On 2023-04-29T09:06:54-0600, Jeff Law via Gcc-patches wrote:
> On 4/29/23 07:37, Roger Sayle wrote:
>>
>> Segher Boessenkool wrote:
>>> I send this patch now so that people can start testing.
>>>
>>> --- a/gcc/config/nvptx/nvptx.cc
>>> +++ b/gcc/config/nvptx/nvptx.cc
>>> @@ -7601,9 +7601,6 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree 
>>> name, tree value)
>>> #undef TARGET_ATTRIBUTE_TABLE
>>> #define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
>>>
>>> -#undef TARGET_LRA_P
>>> -#define TARGET_LRA_P hook_bool_void_false
>>> -
>>> #undef TARGET_LEGITIMATE_ADDRESS_P
>>> #define TARGET_LEGITIMATE_ADDRESS_P nvptx_legitimate_address_p
>>
>> I've tested Segher's patch on nvptx-none with make and make -k check and
>> can confirm there are no new regressions.

Confirmed.  Also, no change in nvptx target libraries built.  As
expected.

>> Nvptx is unique in that it doesn't use register allocation, i.e. GCC's only
>> TARGET_NO_REGISTER_ALLOCATION target, so it's a little odd that it specifies
>> which register allocator it doesn't use.
>>
>> I hope this helps,
>
> It does.  Consider a patch which flips the nvptx port to LRA as
> pre-approved.

Pushed to master branch commit f7e3123638712773e8c01e17aae9dc64d9342016
"Flip the nvptx port to LRA", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From f7e3123638712773e8c01e17aae9dc64d9342016 Mon Sep 17 00:00:00 2001
From: Segher Boessenkool 
Date: Sun, 23 Apr 2023 16:47:52 +
Subject: [PATCH] Flip the nvptx port to LRA

... understanding that "turn on LRA" is an exaggeration here, given that nvptx
isn't actually doing register allocation ('TARGET_NO_REGISTER_ALLOCATION').

	gcc/
	* config/nvptx/nvptx.cc (TARGET_LRA_P): Remove.

Co-authored-by: Thomas Schwinge 
---
 gcc/config/nvptx/nvptx.cc | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index e3b0304d5376..16ed78030d73 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -7633,9 +7633,6 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
 #undef TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P nvptx_legitimate_address_p
 
-- 
2.39.2



[x86 PATCH] Add STV support for DImode and SImode rotations by constant.

2023-06-30 Thread Roger Sayle

This patch implements scalar-to-vector (STV) support for DImode and SImode
rotations by constant bit counts.  Scalar rotations are almost always
optimal on x86, requiring only one or two instructions, but it is also
possible to implement these efficiently with SSE2, requiring only one
or two instructions for SImode rotations and at most 3 instructions for
DImode rotations.  This allows GCC to STV rotations with a small or no
penalty if there are other (net) benefits to converting a chain.  An
example of the benefits is shown below, which is based upon the BLAKE2
cryptographic hash function:

unsigned long long a,b,c,d;

unsigned long rot(unsigned long long x, int y)
{
  return (x<<y) | (x>>(64-y));
}

void foo()
{
  d = rot(d ^ a,32);
  c = c + d;
  b = rot(b ^ c,24);
  a = a + b;
  d = rot(d ^ a,16);
  c = c + d;
  b = rot(b ^ c,63);
}

where with -m32 -O2 -msse2

Before (59 insns, 247 bytes):
foo:    pushl   %edi
        xorl    %edx, %edx
        pushl   %esi
        pushl   %ebx
        subl    $16, %esp
        movq    a, %xmm1
        movq    d, %xmm0
        movq    b, %xmm2
        pxor    %xmm1, %xmm0
        psrlq   $32, %xmm0
        movd    %xmm0, %eax
        movd    %edx, %xmm0
        movd    %eax, %xmm3
        punpckldq       %xmm0, %xmm3
        movq    c, %xmm0
        paddq   %xmm3, %xmm0
        pxor    %xmm0, %xmm2
        movd    %xmm2, %ecx
        psrlq   $32, %xmm2
        movd    %xmm2, %ebx
        movl    %ecx, %eax
        shldl   $24, %ebx, %ecx
        shldl   $24, %eax, %ebx
        movd    %ebx, %xmm4
        movd    %ecx, %xmm2
        punpckldq       %xmm4, %xmm2
        movdqa  .LC0, %xmm4
        pand    %xmm4, %xmm2
        paddq   %xmm2, %xmm1
        movq    %xmm1, a
        pxor    %xmm3, %xmm1
        movd    %xmm1, %esi
        psrlq   $32, %xmm1
        movd    %xmm1, %edi
        movl    %esi, %eax
        shldl   $16, %edi, %esi
        shldl   $16, %eax, %edi
        movd    %esi, %xmm1
        movd    %edi, %xmm3
        punpckldq       %xmm3, %xmm1
        pand    %xmm4, %xmm1
        movq    %xmm1, d
        paddq   %xmm1, %xmm0
        movq    %xmm0, c
        pxor    %xmm2, %xmm0
        movd    %xmm0, 8(%esp)
        psrlq   $32, %xmm0
        movl    8(%esp), %eax
        movd    %xmm0, 12(%esp)
        movl    12(%esp), %edx
        shrdl   $1, %edx, %eax
        xorl    %edx, %edx
        movl    %eax, b
        movl    %edx, b+4
        addl    $16, %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        ret

After (32 insns, 165 bytes):
        movq    a, %xmm1
        xorl    %edx, %edx
        movq    d, %xmm0
        movq    b, %xmm2
        movdqa  .LC0, %xmm4
        pxor    %xmm1, %xmm0
        psrlq   $32, %xmm0
        movd    %xmm0, %eax
        movd    %edx, %xmm0
        movd    %eax, %xmm3
        punpckldq       %xmm0, %xmm3
        movq    c, %xmm0
        paddq   %xmm3, %xmm0
        pxor    %xmm0, %xmm2
        pshufd  $68, %xmm2, %xmm2
        psrldq  $5, %xmm2
        pand    %xmm4, %xmm2
        paddq   %xmm2, %xmm1
        movq    %xmm1, a
        pxor    %xmm3, %xmm1
        pshuflw $147, %xmm1, %xmm1
        pand    %xmm4, %xmm1
        movq    %xmm1, d
        paddq   %xmm1, %xmm0
        movq    %xmm0, c
        pxor    %xmm2, %xmm0
        pshufd  $20, %xmm0, %xmm0
        psrlq   $1, %xmm0
        pshufd  $136, %xmm0, %xmm0
        pand    %xmm4, %xmm0
        movq    %xmm0, b
        ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?
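
For readers unfamiliar with why a 64-bit rotation by a constant can cost up to
three vector instructions, the following portable C sketch decomposes it into a
left shift, a right shift and an OR. The helper names are hypothetical, and the
instruction names in the comments are only one plausible SSE2 mapping, not
necessarily the sequence this patch emits. The sketch also shows that a rotation
by 32 is just a swap of the two 32-bit halves, the kind of special case that a
single shuffle can express.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by N bits (0 <= N < 64) using two shifts and an OR.
   The "& 63" keeps the right-shift count in range when N is 0.  */
static uint64_t
rotl64 (uint64_t x, unsigned int n)
{
  assert (n < 64);
  uint64_t hi = x << n;               /* e.g. psllq $n       */
  uint64_t lo = x >> ((64 - n) & 63); /* e.g. psrlq $(64-n)  */
  return hi | lo;                     /* e.g. por            */
}

/* A rotation by 32 exchanges the low and high 32-bit halves.  */
static uint64_t
swap_halves64 (uint64_t x)
{
  return (x << 32) | (x >> 32);
}
```

Unlike the rot function in the example above, rotl64 is also well defined for a
rotation count of 0, because it never shifts by 64.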


2023-06-30  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Provide
gains/costs for ROTATE and ROTATERT (by an integer constant).
(general_scalar_chain::convert_rotate): New helper function to
convert a DImode or SImode rotation by an integer constant into
SSE vector form.
(general_scalar_chain::convert_insn): Call the new convert_rotate
for ROTATE and ROTATERT.
(general_scalar_to_vector_candidate_p): Consider ROTATE and
ROTATERT to be candidates if the second operand is an integer
constant, valid for a rotation (or shift) in the given mode.
* config/i386/i386-features.h (general_scalar_chain): Add new
helper method convert_rotate.

gcc/testsuite/ChangeLog
* gcc.target/i386/rotate-6.c: New test case.
* gcc.target/i386/sse2-stv-1.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4a3b07a..b98baba 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -582,6 +582,25 @@ general_scalar_chain::compute_convert_gain ()
  igain -= vector_const_cost (XEXP (src, 0));
break;
 
+ case ROTATE:
+ case ROTATERT:
+   igain += m * ix86_cost->shift_const;
+   

LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block' (was: [PATCH v3] Streamer: Fix out of range memory access of machine mode)

2023-06-30 Thread Thomas Schwinge
Hi!

On 2023-06-29T22:14:59+0200, I wrote:
> [the new] 'file_data->mode_bits' needs to be considered [somewhere]
>
> Easiest is in 'gcc/lto-streamer.h:class lto_input_block' to capture
> 'lto_file_decl_data *file_data' instead of just
> 'unsigned char *mode_table', and adjust all users.

I've split this out as a preparational "no change in behavior" patch; is
"LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block'"
OK to push, see attached?


Grüße
 Thomas


From 1b75a8680bdef16633e3fa2479832a1b71dae43f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 29 Jun 2023 21:33:06 +0200
Subject: [PATCH] LTO: Capture 'lto_file_decl_data *file_data' in 'class
 lto_input_block'

... instead of just 'unsigned char *mode_table'.  Preparation for a forthcoming
change, where we need to capture an additional 'file_data' item, so it seems
easier to just capture that one proper.

	gcc/
	* lto-streamer.h (class lto_input_block): Capture
	'lto_file_decl_data *file_data' instead of just
	'unsigned char *mode_table'.
	* ipa-devirt.cc (ipa_odr_read_section): Adjust.
	* ipa-fnsummary.cc (inline_read_section): Likewise.
	* ipa-icf.cc (sem_item_optimizer::read_section): Likewise.
	* ipa-modref.cc (read_section): Likewise.
	* ipa-prop.cc (ipa_prop_read_section, read_replacements_section):
	Likewise.
	* ipa-sra.cc (isra_read_summary_section): Likewise.
	* lto-cgraph.cc (input_cgraph_opt_section): Likewise.
	* lto-section-in.cc (lto_create_simple_input_block): Likewise.
	* lto-streamer-in.cc (lto_read_body_or_constructor)
	(lto_input_toplevel_asms): Likewise.
	* tree-streamer.h (bp_unpack_machine_mode): Likewise.
	gcc/lto/
	* lto-common.cc (lto_read_decls): Adjust.
---
 gcc/ipa-devirt.cc  |  2 +-
 gcc/ipa-fnsummary.cc   |  2 +-
 gcc/ipa-icf.cc |  2 +-
 gcc/ipa-modref.cc  |  2 +-
 gcc/ipa-prop.cc|  4 ++--
 gcc/ipa-sra.cc |  2 +-
 gcc/lto-cgraph.cc  |  2 +-
 gcc/lto-section-in.cc  |  2 +-
 gcc/lto-streamer-in.cc |  6 +++---
 gcc/lto-streamer.h | 10 +-
 gcc/lto/lto-common.cc  |  2 +-
 gcc/tree-streamer.h|  6 +++---
 12 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 2c61a497cee..87529be4515 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -4147,7 +4147,7 @@ ipa_odr_read_section (struct lto_file_decl_data *file_data, const char *data,
   class data_in *data_in;
 
   lto_input_block ib ((const char *) data + main_offset, header->main_size,
-		  file_data->mode_table);
+		  file_data);
 
   data_in
 = lto_data_in_create (file_data, (const char *) data + string_offset,
diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index a5f5a50c8a5..37c1edc2f3a 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -4528,7 +4528,7 @@ inline_read_section (struct lto_file_decl_data *file_data, const char *data,
   unsigned int f_count;
 
   lto_input_block ib ((const char *) data + main_offset, header->main_size,
-		  file_data->mode_table);
+		  file_data);
 
   data_in =
 lto_data_in_create (file_data, (const char *) data + string_offset,
diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index cb9f768d85d..836d0914ded 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -2204,7 +2204,7 @@ sem_item_optimizer::read_section (lto_file_decl_data *file_data,
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset, 0,
-			   header->main_size, file_data->mode_table);
+			   header->main_size, file_data);
 
   data_in
 = lto_data_in_create (file_data, (const char *) data + string_offset,
diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
index e3196df8aa9..278b2dbd828 100644
--- a/gcc/ipa-modref.cc
+++ b/gcc/ipa-modref.cc
@@ -3816,7 +3816,7 @@ read_section (struct lto_file_decl_data *file_data, const char *data,
   unsigned int f_count;
 
   lto_input_block ib ((const char *) data + main_offset, header->main_size,
-		  file_data->mode_table);
+		  file_data);
 
   data_in
 = lto_data_in_create (file_data, (const char *) data + string_offset,
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 704fe01b02c..8f2119b72e3 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5337,7 +5337,7 @@ ipa_prop_read_section (struct lto_file_decl_data *file_data, const char *data,
   unsigned int count;
 
   lto_input_block ib_main ((const char *) data + main_offset,
-			   header->main_size, file_data->mode_table);
+			   header->main_size, file_data);
 
   data_in =
 lto_data_in_create (file_data, (const char *) data + string_offset,
@@ -5561,7 +5561,7 @@ read_replacements_section (struct lto_file_decl_data *file_data,
   unsigned int count;
 
   lto_input_block ib_main ((const char *) da

Re: [PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-30 Thread Oluwatamilore Adebayo via Gcc-patches
> Sorry, my fault.  I was using the original type names in this
> suggestion, rather than the TYPE1…TYPE5 ones.  Should be:
> 
>WIDEN_ABD exists to optimize the case where TYPE4 is at least
>twice as wide as TYPE3.

Change made.

> Lingering use of “L” suffixes here.  Maybe:
> 
>  stmts that constitute the pattern, principally:
>   out = IFN_ABD (x, y)
>   out = IFN_WIDEN_ABD (x, y)

Change made.

> > +  if (TYPE_PRECISION (out_type) >= TYPE_PRECISION (abd_in_type) * 2
> > +  && TYPE_PRECISION (abd_out_type) != stmt_vinfo->min_output_precision)
> 
> Sorry for not noticing last time, but I think the second condition
> would be more natural as:
> 
>   && stmt_vinfo->min_output_precision >= TYPE_PRECISION (abd_in_type) * 2)
> 
> (There's no distinction between abd_in_type and abd_out_type at this point,
> so it seems clearer to use the same value in both conditions.)

Change made.

> > +  gassign *last_stmt = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > +  if (!last_stmt || !gimple_assign_cast_p (last_stmt))
> 
> I think this should be:
> 
>   if (!last_stmt || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (last_stmt)))
> 
> gimple_assign_cast_p is more general, and allows conversions
> between integral and non-integral types.

Change made.

> > +  tree in_type = TREE_TYPE (last_rhs);
> > +  tree out_type = TREE_TYPE (gimple_assign_lhs (last_stmt));
> > +  if (TYPE_PRECISION (in_type) * 2 != TYPE_PRECISION (out_type))
> > +return NULL;
> 
> I think this also needs to require TYPE_UNSIGNED (in_type):
> 
>   if (TYPE_PRECISION (in_type) * 2 != TYPE_PRECISION (out_type)
>   || !TYPE_UNSIGNED (in_type))
> return NULL;
> 
> That is, the extension has to be a zero extension rather than
> a sign extension.
> 
> For example:
> 
>   int32_t a, b, c;
>   int64_t d;
> 
>   c = IFN_ABD (a, b);
>   d = (int64_t) c;
> 
> sign-extends the ABD result to 64 bits, and so a == INT_MAX
> && b == INT_MIN gives:
> 
>   c = -1    (UINT_MAX converted to signed)
>   d = -1
> 
> But IFN_WIDEN_ABD would give d == UINT_MAX instead.

Change made.

> > +  gimple *pattern_stmt = STMT_VINFO_STMT (abd_pattern_vinfo);
> > +  if (gimple_assign_cast_p (pattern_stmt))
> > +{
> > +   tree op = gimple_assign_rhs1 (pattern_stmt);
> > +   vect_unpromoted_value unprom;
> > +   op = vect_look_through_possible_promotion (vinfo, op, &unprom);
> > +
> > +  if (!op)
> > +   return NULL;
> > +
> > +  abd_pattern_vinfo = vect_get_internal_def (vinfo, op);
> > +  if (!abd_pattern_vinfo)
> > +   return NULL;
> > +
> > +  pattern_stmt = STMT_VINFO_STMT (abd_pattern_vinfo);
> > +}
> 
> I think the code quoted above reduces to:
> 
>   vect_unpromoted_value unprom;
>   tree op = vect_look_through_possible_promotion (vinfo, last_rhs, &unprom);
>   if (!op || TYPE_PRECISION (TREE_TYPE (op)) != TYPE_PRECISION (in_type))
> return NULL;
> 
>   stmt_vec_info abd_pattern_vinfo = vect_get_internal_def (vinfo, op);
>   if (!abd_pattern_vinfo)
> return NULL;
>   abd_pattern_vinfo = vect_stmt_to_vectorize (abd_pattern_vinfo);
> 
> ...
> 
> > +  tree abd_oprnd0 = gimple_call_arg (abd_stmt, 0);
> > +  tree abd_oprnd1 = gimple_call_arg (abd_stmt, 1);
> > +  if (TYPE_PRECISION (TREE_TYPE (abd_oprnd0)) != TYPE_PRECISION (in_type))
> > +return NULL;
> 
> With the changes above, this check would not be necessary.

Both changes made.

Updated patch will be in the next email.


Re: [PATCH 2/2] AArch64: New RTL for ABDL

2023-06-30 Thread Oluwatamilore Adebayo via Gcc-patches
> Sorry, my fault, but I meant the comment about avoiding
> (minus (max…) (min…)) for both patterns, not just the first.

Change made.

> I think the review suggestions for 1/2 will change the tests.
> For example:
> 
> TEST2(signed, short, char)

This is the case and the tests have been updated to reflect it.

Updated patch in next email.


[PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-30 Thread Oluwatamilore Adebayo via Gcc-patches
From: oluade01 

This updates vect_recog_abd_pattern to recognize the widening
variant of absolute difference (ABDL, ABDL2).
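
As a rough illustration of what the widening variant computes, the following C sketch
models the signed hi/lo forms on a 16-lane byte vector. The function name
widen_sabd_half, the fixed lane count, and the little-endian meaning of "low" and
"high" are assumptions made for this example only; they are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

enum { N = 16 };   /* assumed number of input lanes */

/* Hypothetical scalar model: take the low (high == 0) or high
   (high != 0) N/2 lanes of a and b, compute |a[i] - b[i]| without
   overflow, and store each result in a lane twice as wide.  */
static void widen_sabd_half(const int8_t a[N], const int8_t b[N],
                            int16_t out[N / 2], int high)
{
    size_t base = high ? N / 2 : 0;
    for (size_t i = 0; i < N / 2; i++) {
        int x = a[base + i], y = b[base + i];   /* promoted to int */
        out[i] = (int16_t) (x > y ? x - y : y - x);
    }
}
```

The point of widening is visible for inputs such as 127 and -128: the difference 255
does not fit in a signed 8-bit lane but fits in the 16-bit result.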

gcc/ChangeLog:

* internal-fn.cc (widening_fn_p, decomposes_to_hilo_fn_p):
Add IFN_VEC_WIDEN_ABD to the switch statement.
* internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
* optabs.def (vec_widen_sabd_optab,
vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
vec_widen_sabd_odd_optab, vec_widen_sabd_even_optab,
vec_widen_uabd_optab,
vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
vec_widen_uabd_odd_optab, vec_widen_uabd_even_optab):
New optabs.
* tree-vect-patterns.cc (vect_recog_abd_pattern): Update
to build a VEC_WIDEN_ABD call if the input precision is smaller
than the precision of the output.
(vect_recog_widen_abd_pattern): Should an ABD expression be
found preceding an extension, replace the two with a
VEC_WIDEN_ABD.
---
 gcc/doc/md.texi   |  17 
 gcc/internal-fn.def   |   5 +
 gcc/optabs.def|  10 ++
 gcc/tree-vect-patterns.cc | 190 --
 4 files changed, 174 insertions(+), 48 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index e11b10d2fca11016232921bc85e47975f700e6c6..f518d9ed6e170b2e2b9557fffed2cf47aa02633d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5617,6 +5617,23 @@ signed/unsigned elements of size S@.  Subtract the high/low elements of 2 from
 1 and widen the resulting elements. Put the N/2 results of size 2*S in the
 output vector (operand 0).
 
+@cindex @code{vec_widen_sabd_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_sabd_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_sabd_odd_@var{m}} instruction pattern
+@cindex @code{vec_widen_sabd_even_@var{m}} instruction pattern
+@cindex @code{vec_widen_uabd_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_uabd_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_uabd_odd_@var{m}} instruction pattern
+@cindex @code{vec_widen_uabd_even_@var{m}} instruction pattern
+@item @samp{vec_widen_uabd_hi_@var{m}}, @samp{vec_widen_uabd_lo_@var{m}}
+@itemx @samp{vec_widen_uabd_odd_@var{m}}, @samp{vec_widen_uabd_even_@var{m}}
+@itemx @samp{vec_widen_sabd_hi_@var{m}}, @samp{vec_widen_sabd_lo_@var{m}}
+@itemx @samp{vec_widen_sabd_odd_@var{m}}, @samp{vec_widen_sabd_even_@var{m}}
+Signed/Unsigned widening absolute difference.  Operands 1 and 2 are
+vectors with N signed/unsigned elements of size S@.  Find the absolute
+difference between operands 1 and 2 and widen the resulting elements.
+Put the N/2 results of size 2*S in the output vector (operand 0).
+
 @cindex @code{vec_addsub@var{m}3} instruction pattern
 @item @samp{vec_addsub@var{m}3}
 Alternating subtract, add with even lanes doing subtract and odd
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 116965f4830cec8f60642ff011a86b6562e2c509..d67274d68b49943a88c531e903fd03b42343ab97 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -352,6 +352,11 @@ DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS,
first,
vec_widen_ssub, vec_widen_usub,
binary)
+DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_ABD,
+   ECF_CONST | ECF_NOTHROW,
+   first,
+   vec_widen_sabd, vec_widen_uabd,
+   binary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 35b835a6ac56d72417dac8ddfd77a8a7e2475e65..68dfa1550f791a2fe833012157601ecfa68f1e09 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -418,6 +418,11 @@ OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
 OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
 OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a")
 OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a")
+OPTAB_D (vec_widen_sabd_optab, "vec_widen_sabd_$a")
+OPTAB_D (vec_widen_sabd_hi_optab, "vec_widen_sabd_hi_$a")
+OPTAB_D (vec_widen_sabd_lo_optab, "vec_widen_sabd_lo_$a")
+OPTAB_D (vec_widen_sabd_odd_optab, "vec_widen_sabd_odd_$a")
+OPTAB_D (vec_widen_sabd_even_optab, "vec_widen_sabd_even_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -436,6 +441,11 @@ OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
 OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
 OPTAB_D (vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a")
 OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a")
+OPTAB_D (vec_widen_uabd_optab, "vec_widen_uabd_$a")
+OPTAB_D (vec_widen_uabd_hi_optab, "vec_widen_uabd_hi_$a")

[PATCH 2/2] AArch64: New RTL for ABDL

2023-06-30 Thread Oluwatamilore Adebayo via Gcc-patches
From: oluade01 

This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2).

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(vec_widen_abdl_lo_, vec_widen_abdl_hi_):
Expansions for abd vec widen optabs.
(aarch64_abdl_insn): VQW based abdl RTL.
* config/aarch64/iterators.md (USMAX_EXT): Code attributes
that give the appropriate extend RTL for the max RTL.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_2.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_3.c: Added ABDL testcases.
* gcc.target/aarch64/abd_none_4.c: Added ABDL testcases.
* gcc.target/aarch64/abd_run_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases.
* gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
---
 gcc/config/aarch64/aarch64-simd.md| 60 +
 gcc/config/aarch64/iterators.md   |  3 +
 gcc/testsuite/gcc.target/aarch64/abd_2.c  | 45 +++---
 gcc/testsuite/gcc.target/aarch64/abd_3.c  | 46 +++---
 gcc/testsuite/gcc.target/aarch64/abd_4.c  | 34 
 gcc/testsuite/gcc.target/aarch64/abd_none_2.c | 73 
 gcc/testsuite/gcc.target/aarch64/abd_none_3.c | 73 
 gcc/testsuite/gcc.target/aarch64/abd_none_4.c | 84 +++
 gcc/testsuite/gcc.target/aarch64/abd_run_1.c  | 29 +++
 .../gcc.target/aarch64/abd_widen_2.c  | 50 +++
 .../gcc.target/aarch64/abd_widen_3.c  | 50 +++
 .../gcc.target/aarch64/abd_widen_4.c  | 56 +
 gcc/testsuite/gcc.target/aarch64/sve/abd_1.c  | 57 +++--
 gcc/testsuite/gcc.target/aarch64/sve/abd_2.c  | 47 +--
 .../gcc.target/aarch64/sve/abd_none_1.c   | 73 
 .../gcc.target/aarch64/sve/abd_none_2.c   | 80 ++
 16 files changed, 803 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/abd_widen_4.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bf90202ba2ad3f62f2020486d21256f083effb07..fddf40da57693e8158b17e6045b2a064552ad273 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -975,6 +975,66 @@ (define_expand "aarch64_abdl2"
   }
 )
 
+(define_insn "aarch64_abdl_hi_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (abs:
+ (minus:
+   (ANY_EXTEND:
+ (vec_select:
+   (match_operand:VQW 1 "register_operand" "w")
+   (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
+   (ANY_EXTEND:
+ (vec_select:
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_dup 3))]
+  "TARGET_SIMD"
+  "abdl2\t%0., %1., %2."
+  [(set_attr "type" "neon_abd_long")]
+)
+
+(define_insn "aarch64_abdl_lo_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (abs:
+ (minus:
+   (ANY_EXTEND:
+ (vec_select:
+   (match_operand:VQW 1 "register_operand" "w")
+   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+   (ANY_EXTEND:
+ (vec_select:
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_dup 3))]
+  "TARGET_SIMD"
+  "abdl\t%0., %1., %2."
+  [(set_attr "type" "neon_abd_long")]
+)
+
+(define_expand "vec_widen_abd_hi_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+emit_insn (gen_aarch64_abdl_hi_internal (operands[0], operands[1],
+  operands[2], p));
+DONE;
+  }
+)
+
+(define_expand "vec_widen_abd_lo_"
+  [(match_operand: 0 "register_operand")
+   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
+   (ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
+emit_insn (gen_aarch64_abdl_lo_internal (operands[0], operands[1],
+  operands[2], p));
+DONE;
+  }
+)
+
 (define_insn "aarch64_abal"
   [(set (match_operand: 0 "register_operand" "=w")
(plus:
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
d9c7354730ac5870c0042f1e30fb1140a117d110..1385842d0a51b3f4a0871af4d

Re: [x86 PATCH] Add STV support for DImode and SImode rotations by constant.

2023-06-30 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 30, 2023 at 9:29 AM Roger Sayle  wrote:
>
>
> This patch implements scalar-to-vector (STV) support for DImode and SImode
> rotations by constant bit counts.  Scalar rotations are almost always
> optimal on x86, requiring only one or two instructions, but it is also
> possible to implement these efficiently with SSE2, requiring only one
> or two instructions for SImode rotations and at most 3 instructions for
> DImode rotations.  This allows GCC to STV rotations with a small or no
> penalty if there are other (net) benefits to converting a chain.  An
> example of the benefits is shown below, which is based upon the BLAKE2
> cryptographic hash function:
>
> unsigned long long a,b,c,d;
>
> unsigned long rot(unsigned long long x, int y)
> {
>   return (x<<y) | (x>>(64-y));
> }
>
> void foo()
> {
>   d = rot(d ^ a,32);
>   c = c + d;
>   b = rot(b ^ c,24);
>   a = a + b;
>   d = rot(d ^ a,16);
>   c = c + d;
>   b = rot(b ^ c,63);
> }
>
> where with -m32 -O2 -msse2
>
> Before (59 insns, 247 bytes):
> foo:    pushl   %edi
>         xorl    %edx, %edx
>         pushl   %esi
>         pushl   %ebx
>         subl    $16, %esp
>         movq    a, %xmm1
>         movq    d, %xmm0
>         movq    b, %xmm2
>         pxor    %xmm1, %xmm0
>         psrlq   $32, %xmm0
>         movd    %xmm0, %eax
>         movd    %edx, %xmm0
>         movd    %eax, %xmm3
>         punpckldq       %xmm0, %xmm3
>         movq    c, %xmm0
>         paddq   %xmm3, %xmm0
>         pxor    %xmm0, %xmm2
>         movd    %xmm2, %ecx
>         psrlq   $32, %xmm2
>         movd    %xmm2, %ebx
>         movl    %ecx, %eax
>         shldl   $24, %ebx, %ecx
>         shldl   $24, %eax, %ebx
>         movd    %ebx, %xmm4
>         movd    %ecx, %xmm2
>         punpckldq       %xmm4, %xmm2
>         movdqa  .LC0, %xmm4
>         pand    %xmm4, %xmm2
>         paddq   %xmm2, %xmm1
>         movq    %xmm1, a
>         pxor    %xmm3, %xmm1
>         movd    %xmm1, %esi
>         psrlq   $32, %xmm1
>         movd    %xmm1, %edi
>         movl    %esi, %eax
>         shldl   $16, %edi, %esi
>         shldl   $16, %eax, %edi
>         movd    %esi, %xmm1
>         movd    %edi, %xmm3
>         punpckldq       %xmm3, %xmm1
>         pand    %xmm4, %xmm1
>         movq    %xmm1, d
>         paddq   %xmm1, %xmm0
>         movq    %xmm0, c
>         pxor    %xmm2, %xmm0
>         movd    %xmm0, 8(%esp)
>         psrlq   $32, %xmm0
>         movl    8(%esp), %eax
>         movd    %xmm0, 12(%esp)
>         movl    12(%esp), %edx
>         shrdl   $1, %edx, %eax
>         xorl    %edx, %edx
>         movl    %eax, b
>         movl    %edx, b+4
>         addl    $16, %esp
>         popl    %ebx
>         popl    %esi
>         popl    %edi
>         ret
>
> After (32 insns, 165 bytes):
>         movq    a, %xmm1
>         xorl    %edx, %edx
>         movq    d, %xmm0
>         movq    b, %xmm2
>         movdqa  .LC0, %xmm4
>         pxor    %xmm1, %xmm0
>         psrlq   $32, %xmm0
>         movd    %xmm0, %eax
>         movd    %edx, %xmm0
>         movd    %eax, %xmm3
>         punpckldq       %xmm0, %xmm3
>         movq    c, %xmm0
>         paddq   %xmm3, %xmm0
>         pxor    %xmm0, %xmm2
>         pshufd  $68, %xmm2, %xmm2
>         psrldq  $5, %xmm2
>         pand    %xmm4, %xmm2
>         paddq   %xmm2, %xmm1
>         movq    %xmm1, a
>         pxor    %xmm3, %xmm1
>         pshuflw $147, %xmm1, %xmm1
>         pand    %xmm4, %xmm1
>         movq    %xmm1, d
>         paddq   %xmm1, %xmm0
>         movq    %xmm0, c
>         pxor    %xmm2, %xmm0
>         pshufd  $20, %xmm0, %xmm0
>         psrlq   $1, %xmm0
>         pshufd  $136, %xmm0, %xmm0
>         pand    %xmm4, %xmm0
>         movq    %xmm0, b
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-30  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.cc (compute_convert_gain): Provide
> gains/costs for ROTATE and ROTATERT (by an integer constant).
> (general_scalar_chain::convert_rotate): New helper function to
> convert a DImode or SImode rotation by an integer constant into
> SSE vector form.
> (general_scalar_chain::convert_insn): Call the new convert_rotate
> for ROTATE and ROTATERT.
> (general_scalar_to_vector_candidate_p): Consider ROTATE and
> ROTATERT to be candidates if the second operand is an integer
> constant, valid for a rotation (or shift) in the given mode.
> * config/i386/i386-features.h (general_scalar_chain): Add new
> helper method convert_rotate.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/rotate-6.c: New test case.
> * gcc.target/i386/sse2-stv-1.c: Likewise.

LGTM.

Please note that AVX512VL provides VPROLD/VPROLQ and VPRORD/VPRORQ
native rotate instructions that can come in handy here.
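
As a side note on the quoted "After" code, the `pshuflw $147` there can be read as a
64-bit rotate left by 16: a rotation by a multiple of 16 bits is just a permutation of
the four 16-bit lanes. The C sketch below checks that reading with a scalar model; the
function names are made up for this illustration and the model covers only the low 64
bits of the register.

```c
#include <stdint.h>

/* Plain 64-bit rotate left, guarded against the undefined shift by 64.  */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Scalar model of a PSHUFLW-style shuffle on the low 64 bits: result
   lane i (16 bits wide) is source lane ((imm >> (2 * i)) & 3).  */
static uint64_t shuffle_low_words(uint64_t x, unsigned imm)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        unsigned sel = (imm >> (2 * i)) & 3;
        uint64_t lane = (x >> (16 * sel)) & 0xffff;
        r |= lane << (16 * i);
    }
    return r;
}
```

With the immediate 147 (binary 10 01 00 11) the lanes are selected as 3, 0, 1, 2, which
matches rotl64 (x, 16) for any x.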

[PATCH] middle-end/110489 - avoid useless work on statistics

2023-06-30 Thread Richard Biener via Gcc-patches
When we call statistics_fini_pass we unconditionally allocate
the statistics hash and traverse it.  When a TU has many small
functions this can take considerable time.  The following avoids
this by never allocating the hash from this function.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/110489
* statistics.cc (curr_statistics_hash): Add argument
indicating whether we should allocate the hash.
(statistics_fini_pass): If the hash isn't allocated
only print the summary header.
---
 gcc/statistics.cc | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/statistics.cc b/gcc/statistics.cc
index 1708e0d3aac..6d1eefd544e 100644
--- a/gcc/statistics.cc
+++ b/gcc/statistics.cc
@@ -88,7 +88,7 @@ static unsigned nr_statistics_hashes;
statistics.  */
 
 static stats_counter_table_type *
-curr_statistics_hash (void)
+curr_statistics_hash (bool alloc = true)
 {
   unsigned idx;
 
@@ -99,6 +99,9 @@ curr_statistics_hash (void)
   && statistics_hashes[idx])
 return statistics_hashes[idx];
 
+  if (!alloc)
+return nullptr;
+
   if (idx >= nr_statistics_hashes)
 {
   statistics_hashes = XRESIZEVEC (stats_counter_table_type *,
@@ -202,23 +205,27 @@ statistics_fini_pass (void)
   if (current_pass->static_pass_number == -1)
 return;
 
+  stats_counter_table_type *stat_hash = curr_statistics_hash (false);
+
   if (dump_file
   && dump_flags & TDF_STATS)
 {
   fprintf (dump_file, "\n");
   fprintf (dump_file, "Pass statistics of \"%s\": ", current_pass->name);
   fprintf (dump_file, "\n");
-  curr_statistics_hash ()
-   ->traverse_noresize  (NULL);
+  if (stat_hash)
+   stat_hash->traverse_noresize  (NULL);
   fprintf (dump_file, "\n");
 }
+
+  if (!stat_hash)
+return;
+
   if (statistics_dump_file
   && !(statistics_dump_flags & TDF_STATS
   || statistics_dump_flags & TDF_DETAILS))
-curr_statistics_hash ()
-  ->traverse_noresize  (NULL);
-  curr_statistics_hash ()
-->traverse_noresize  (NULL);
+stat_hash->traverse_noresize  (NULL);
+  stat_hash->traverse_noresize  (NULL);
 }
 
 /* Helper for printing summary information.  */
-- 
2.35.3
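
The idea behind the patch above, a lookup that only allocates when the caller asks for
it, can be sketched in isolation as follows. This is a simplified C model with
hypothetical names (struct table, get_table, MAX_TABLES), not the statistics.cc code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { MAX_TABLES = 8 };                 /* assumed fixed capacity */

struct table { int entries; };           /* stand-in for the real hash table */

static struct table *tables[MAX_TABLES];

/* Return the table for idx.  When it does not exist yet, allocate it
   only if alloc is true; otherwise report its absence with NULL so
   that read-only callers never pay for an allocation.  */
static struct table *get_table(size_t idx, bool alloc)
{
    if (idx >= MAX_TABLES)
        return NULL;
    if (tables[idx])
        return tables[idx];
    if (!alloc)
        return NULL;
    tables[idx] = calloc(1, sizeof *tables[idx]);
    return tables[idx];
}
```

A caller that only wants to dump existing counters passes false and returns early on
NULL, which is the shape of the change made to statistics_fini_pass.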


Re: LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block' (was: [PATCH v3] Streamer: Fix out of range memory access of machine mode)

2023-06-30 Thread Richard Biener via Gcc-patches
On Fri, 30 Jun 2023, Thomas Schwinge wrote:

> Hi!
> 
> On 2023-06-29T22:14:59+0200, I wrote:
> > [the new] 'file_data->mode_bits' needs to be considered [somewhere]
> >
> > Easiest is in 'gcc/lto-streamer.h:class lto_input_block' to capture
> > 'lto_file_decl_data *file_data' instead of just
> > 'unsigned char *mode_table', and adjust all users.
> 
> I've split this out as a preparational "no change in behavior" patch; is
> "LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block'"
> OK to push, see attached?

Yes.

Richard.

> 
> Grüße
>  Thomas
> 
> 
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH v1 2/6] LoongArch: Added Loongson SX base instruction support.

2023-06-30 Thread WANG Xuerui




On 2023/6/30 10:16, Chenghui Pan wrote:

[snip]
---
  gcc/config/loongarch/constraints.md|  128 +-
  gcc/config/loongarch/loongarch-builtins.cc |   10 +
  gcc/config/loongarch/loongarch-modes.def   |   38 +
  gcc/config/loongarch/loongarch-protos.h|   31 +
  gcc/config/loongarch/loongarch.cc  | 2235 +-
  gcc/config/loongarch/loongarch.h   |   65 +-
  gcc/config/loongarch/loongarch.md  |   44 +-
  gcc/config/loongarch/lsx.md| 4490 
  gcc/config/loongarch/predicates.md |  333 +-
  9 files changed, 7184 insertions(+), 190 deletions(-)
  create mode 100644 gcc/config/loongarch/lsx.md

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index 7a38cd07ae9..1dd56af07c4 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -30,8 +30,7 @@
  ;; "h" <-unused
  ;; "i" "Matches a general integer constant." (Global non-architectural)
  ;; "j" SIBCALL_REGS
-;; "k" "A memory operand whose address is formed by a base register and
-;;  (optionally scaled) index register."
+;; "k" <-unused
  ;; "l" "A signed 16-bit constant."
  ;; "m" "A memory operand whose address is formed by a base register and offset
  ;;  that is suitable for use in instructions with the same addressing mode
@@ -80,13 +79,14 @@
  ;; "N" <-unused
  ;; "O" <-unused
  ;; "P" <-unused
-;; "Q" <-unused
+;; "Q" "A signed 12-bit constant"
  ;; "R" <-unused
  ;; "S" <-unused
  ;; "T" <-unused
  ;; "U" <-unused
  ;; "V" "Matches a non-offsettable memory reference." (Global non-architectural)
-;; "W" <-unused
+;; "W" "A memory address based on a member of @code{BASE_REG_CLASS}.  This is
+;; true for all references."
  ;; "X" "Matches anything." (Global non-architectural)
  ;; "Y" -
  ;;"Yd"
@@ -214,6 +214,63 @@ (define_constraint "Le"
(and (match_code "const_int")
 (match_test "loongarch_addu16i_imm12_operand_p (ival, SImode)")))
  
+(define_constraint "M"

+  "A constant that cannot be loaded using @code{lui}, @code{addiu}
+   or @code{ori}."
+  (and (match_code "const_int")
+   (not (match_test "IMM12_OPERAND (ival)"))
+   (not (match_test "IMM12_OPERAND_UNSIGNED (ival)"))
+   (not (match_test "LU12I_OPERAND (ival)")))))
+
+(define_constraint "N"
+  "A constant in the range -65535 to -1 (inclusive)."
+  (and (match_code "const_int")
+   (match_test "ival >= -0xffff && ival < 0")))
+
+(define_constraint "O"
+  "A signed 15-bit constant."
+  (and (match_code "const_int")
+   (match_test "ival >= -0x4000 && ival < 0x4000")))
+
+(define_constraint "P"
+  "A constant in the range 1 to 65535 (inclusive)."
+  (and (match_code "const_int")
+   (match_test "ival > 0 && ival < 0x10000")))


These constraints are meant to be exposed for developers to use, right? 
If not, they should probably be marked "@internal"; if so, you 
should update the docs as well.


Also, these are not documented in the comment block at the top of the file.


+
+;; General constraints
+
+(define_memory_constraint "R"
+  "An address that can be used in a non-macro load or store."
+  (and (match_code "mem")
+   (match_test "loongarch_address_insns (XEXP (op, 0), mode, false) == 1")))


Similarly, is this "R" constraint meant to be exposed as well? Sure 
one's free to choose letters but "R" IMO strongly implies something 
related to registers, not addresses...



+(define_constraint "S"
+  "@internal
+   A constant call address."
+  (and (match_operand 0 "call_insn_operand")
+   (match_test "CONSTANT_P (op)")))


Additionally, IMO we probably should minimize our use of single-letter 
constraints that don't overlap with other architectures' similar usage. 
(I know that several projects have accepted LSX/LASX code well ahead of 
this series, but I don't know off the top of my head if their code used 
any inline asm instead of C intrinsics. Intuitively this shouldn't be a 
concern though.)


Overall, I'd recommend moving all single-letter constraints added here 
to a two-letter namespace, so everything is better namespaced and easier 
to remember (e.g. if we choose something like "Vx" or "Yx" for 
everything vector-related, it'd be a lot easier to mentally associate 
the two-letter incantations with correct semantics.)



+
+(define_constraint "YG"
+  "@internal
+   A vector zero."
+  (and (match_code "const_vector")
+   (match_test "op == CONST0_RTX (mode)")))
+
+(define_constraint "YA"
+  "@internal
+   An unsigned 6-bit constant."
+  (and (match_code "const_int")
+   (match_test "UIMM6_OPERAND (ival)")))
+
+(define_constraint "YB"
+  "@internal
+   A signed 10-bit constant."
+  (and (match_code "const_int")
+   (match_test "IMM10_OPERAND (ival)")))
+
+(define_constraint "Yb"
+   "@internal"
+   (match_operand 0 "qi_mask_operand"))
+
  (define_constraint "Yd"
"@internal
 A constant @code{move_operand} that can be safely l

Re: PR108672 re-fixed after [PATCH] libstdc++: Synchronize PSTL with upstream

2023-06-30 Thread Jonathan Wakely via Gcc-patches
On Fri, 30 Jun 2023 at 04:48, Hans-Peter Nilsson wrote:
>
> > Date: Mon, 26 Jun 2023 11:57:49 -0700
> > From: Thomas Rodgers via Gcc-patches 
>
> > On Wed, May 17, 2023 at 12:32 PM Jonathan Wakely  wrote:
> > > All the actual code changes look good.
>
> Unfortunately, this overwrote the fix for PR108672.  I take
> it there's a step missing from the synchronization process;
> a check that no local commits are overwritten?  Sounds like
> something that can be fully scripted (not volunteering) or
> already available (like, "list all commits affecting
> contents touched by/between two named commits").
>
> I did *not* check whether any other local commits were also
> overwritten.  Also, not sure about whether better try to get
> this upstreamed: __INT32_TYPE__ seems gcc-specific.

Clang does support it too, but I agree that upstream might not want that change.


> Anyway, r13-5702-g72058eea9d407e was "re-committed" per
> below as obvious after regtesting cris-elf.

Thanks.

I'll add an include/pstl/LOCAL_PATCHES file listing the commits we
apply locally after importing the upstream sources.

Based on git history, the initial list of commits is:

r9-6908-g0360f9ad4048ea
r9-6942-g9eda9f9231f287
r9-7071-ga34d6343a758f6
r10-572-g34d878c7bc86d4
r10-1314-g32bab8b6ad0a90
r11-7339-g7e647d71d556b7
r12-7699-gac73c944eac88f
r13-3708-ge3b10249119fb4
r13-5702-g72058eea9d407e

But several of those have been incorporated upstream, or were
reapplied correctly to our downstream copies. We'll go through the
list and find which ones need to stay there.

It looks like r10-1314-g32bab8b6ad0a90 was lost and should be re-applied.



[v4] Streamer: Fix out of range memory access of machine mode

2023-06-30 Thread Thomas Schwinge
Hi!

On 2023-06-30T01:39:39+, "Li, Pan2"  wrote:
> That’s very cool, thanks Thomas for help!

:-)

> Let’s wait the AMD test running result for the final version of the patch.

That's all looking good, too.

> From: juzhe.zh...@rivai.ai 
> Sent: Friday, June 30, 2023 9:27 AM

> Could you merge your patch after you tested?

I've done that, and with (already approved)

"LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block'"
split out, OK to push the attached
v4 "Streamer: Fix out of range memory access of machine mode"?


Grüße
 Thomas
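
For readers following the quoted discussion below, the sizing problem can be sketched
in a few lines of C. This is a standalone illustration with hypothetical helper names
(ceil_log2_u, mode_table_size, index_fits_fixed_table); it does not reproduce the
streamer code, and the value 300 in the usage note is an arbitrary example rather than
any real MAX_MACHINE_MODE.

```c
#include <stddef.h>

/* Smallest number of bits b such that (1 << b) >= x, for x below 2^31.  */
static unsigned ceil_log2_u(unsigned x)
{
    unsigned bits = 0;
    while ((1u << bits) < x)
        bits++;
    return bits;
}

/* Table size derived from the number of machine modes, instead of a
   hard-coded 1 << 8.  */
static size_t mode_table_size(unsigned max_machine_mode)
{
    return (size_t) 1 << ceil_log2_u(max_machine_mode);
}

/* Whether a mode index would still be in range for the old fixed
   256-entry table.  */
static int index_fits_fixed_table(unsigned mode_index)
{
    return mode_index < (1u << 8);
}
```

With 300 modes, for example, the derived table has 512 entries, whereas index 299
would fall outside a fixed 256-entry table.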


> From: Thomas Schwinge
> Date: 2023-06-30 04:14

> Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine mode
> Hi!
>
> On 2023-06-29T11:29:57+0200, I wrote:
>> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches 
>> <gcc-patches@gcc.gnu.org> wrote:
>>> We have already extended the machine mode from 8 to 16 bits. But there is
>>> still one place missing in the streamer: it has a hard-coded array for
>>> the machine mode, with size 256.
>>>
>>> In the lto pass, we memset the array by MAX_MACHINE_MODE count but the
>>> value of the MAX_MACHINE_MODE will grow as more and more modes are
>>> added, while the machine mode array in tree-streamer still has size 256.
>>>
>>> Then, when the MAX_MACHINE_MODE is greater than 256, the memset of
>>> lto_output_init_mode_table will unexpectedly touch memory out of range.
>>
>> Uh.  :-O
>>
>>> This patch takes MAX_MACHINE_MODE as the size of the array in the
>>> streamer, to make sure there is no potential unexpected memory access
>>> in future. Meanwhile, this patch also adjusts some places which assume
>>> MAX_MACHINE_MODE <= 256.
>>
>> Thanks to Jakub and Richard for guidance re the offloading compilation
>> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
>> and stream-in, and a modes mapping table.
>>
>> However, with this patch, there are ICEs all over the place...  I'm
>> having a look.
>
> Your patch has all the right ideas, there are just a few additional
> changes necessary.  Please merge in the attached
> "f into Streamer: Fix out of range memory access of machine mode", with
> 'Co-authored-by: Thomas Schwinge <tho...@codesourcery.com>'.  This has
> already survived compiler-side 'lto.exp' testing and
> 'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now
> running (not expecting any bad surprises).  Will let you know by (my)
> tomorrow morning in case there are any more problems.
>
> Explanation:
>
>>> --- a/gcc/lto-streamer-in.cc
>>> +++ b/gcc/lto-streamer-in.cc
>>> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
>>>  internal_error ("cannot read LTO mode table from %s",
>>>   file_data->file_name);
>>>
>>> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
>>> -  file_data->mode_table = table;
>>>const struct lto_simple_header_with_strings *header
>>>  = (const struct lto_simple_header_with_strings *) data;
>>>int string_offset;
>>> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
>>>   header->string_size, vNULL);
>>>bitpack_d bp = streamer_read_bitpack (&ib);
>>>
>>> +  unsigned mode_bits = bp_unpack_value (&bp, 5);
>>> +  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
>>> +
>>> +  file_data->mode_table = table;
>>> +  file_data->mode_bits = mode_bits;
>
> Here, we set 'file_data->mode_bits' for the offloading case (where
> 'lto_input_mode_table' is called) -- but it's not set for the
> non-offloading case (where 'lto_input_mode_table' isn't called).  (See my
> 'gcc/lto/lto-common.cc:lto_read_decls' change.)  That's "not currently a
> problem", as 'file_data->mode_bits' isn't used anywhere...
>
>>> --- a/gcc/lto-streamer.h
>>> +++ b/gcc/lto-streamer.h
>>> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data
>>>int order_base;
>>>
>>>int unit_base;
>>> +
>>> +  unsigned mode_bits;
>>>  };
>
>>>  inline machine_mode
>>>  bp_unpack_machine_mode (struct bitpack_d *bp)
>>>  {
>>> -  return (machine_mode)
>>> -((class lto_input_block *)
>>> - bp->stream)->mode_table[bp_unpack_enum (bp, machine_mode, 1 << 8)];
>>> +  int last = 1 << ceil_log2 (MAX_MACHINE_MODE);
>>> +  lto_input_block *input_block = (class lto_input_block *) bp->stream;
>>> +  int index = bp_unpack_enum (bp, machine_mode, last);
>>> +
>>> +  return (machine_mode) input_block->mode_table[index];
>>>  }
>
> ..., but 'file_data->mode_bits' needs to be considered here, in the
> stream-in for offloading, where 'file_data->mode_bits' -- that is, the
> host 'MAX_MACHINE_MODE' -- very likely is different from the offload
> device 'MAX_MACHINE_MODE'.
>
> Easiest is in 'gcc/lto-streamer.h:class lto_input_block' to capture
> 'lto_file_decl_data *file_da

Re: [PATCH] tree.h: Hide wi::from_mpz from GENERATOR_FILE

2023-06-30 Thread Richard Biener via Gcc-patches
On Fri, Jun 30, 2023 at 7:20 AM Kewen.Lin  wrote:
>
> Hi,
>
> Similar to r0-85707-g34917a102a4e0c for PR35051, the uses
> of mpz_t should be guarded with "#ifndef GENERATOR_FILE".
> This patch is to fix it and avoid some possible build
> errors.
>
> Bootstrapped and regress-tested on x86_64-redhat-linux,
> and powerpc64{,le}-linux-gnu.  And cross-build well on
> power for 40+ different ports.
>
> Is it ok for trunk?

OK.

> gcc/ChangeLog:
>
> * tree.h (wi::from_mpz): Hide from GENERATOR_FILE.
> ---
>  gcc/tree.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 1854fe4a7d4..7e92a12f9cb 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -6460,7 +6460,9 @@ namespace wi
>
>wide_int min_value (const_tree);
>wide_int max_value (const_tree);
> +#ifndef GENERATOR_FILE
>wide_int from_mpz (const_tree, mpz_t, bool);
> +#endif
>  }
>
>  template 
> --
> 2.39.3


Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-30 Thread Richard Biener via Gcc-patches
On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches
 wrote:
>
> When we collect just user events for autofdo with lbr we get some events
> where branch sources are kernel addresses and branch targets are user
> addresses. Without kernel MMAP events create_gcov can't make sense of
> kernel addresses. Currently create_gcov fails if it can't map at least
> 95% of events. We sometimes get below this threshold with just user
> events. The change is to collect both user events and kernel events.

Does this require elevated privileges?  Can we instead "fix" create_gcov here?

> Tested on x86_64-pc-linux-gnu.
>
> ChangeLog:
>
> * Makefile.in: Collect both kernel and user events for autofdo
> * Makefile.tpl: Collect both kernel and user events for autofdo
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp: Collect both kernel and user events for autofdo
> ---
>  Makefile.in   | 2 +-
>  Makefile.tpl  | 2 +-
>  gcc/testsuite/lib/target-supports.exp | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile.in b/Makefile.in
> index f19a9db621e..04307ca561b 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -404,7 +404,7 @@ MAKEINFO = @MAKEINFO@
>  EXPECT = @EXPECT@
>  RUNTEST = @RUNTEST@
>
> -AUTO_PROFILE = gcc-auto-profile -c 1000
> +AUTO_PROFILE = gcc-auto-profile --all -c 1000
>
>  # This just becomes part of the MAKEINFO definition passed down to
>  # sub-makes.  It lets flags be given on the command line while still
> diff --git a/Makefile.tpl b/Makefile.tpl
> index 3a5b7ed3c92..d0fe7e2fb77 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -407,7 +407,7 @@ MAKEINFO = @MAKEINFO@
>  EXPECT = @EXPECT@
>  RUNTEST = @RUNTEST@
>
> -AUTO_PROFILE = gcc-auto-profile -c 1000
> +AUTO_PROFILE = gcc-auto-profile --all -c 1000
>
>  # This just becomes part of the MAKEINFO definition passed down to
>  # sub-makes.  It lets flags be given on the command line while still
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 4d04df2a709..b16853d76df 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -704,7 +704,7 @@ proc check_effective_target_keeps_null_pointer_checks { } {
>  # this allows parallelism of 16 and higher of parallel gcc-auto-profile
>  proc profopt-perf-wrapper { } {
>  global srcdir
> -return "$srcdir/../config/i386/gcc-auto-profile -m8 "
> +return "$srcdir/../config/i386/gcc-auto-profile --all -m8 "
>  }
>
>  # Return true if profiling is supported on the target.
> --
> 2.25.1
>


Re: [PATCH 1/3] targhooks: Extend legitimate_address_p with code_helper [PR110248]

2023-06-30 Thread Richard Biener via Gcc-patches
On Fri, Jun 30, 2023 at 7:38 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR110248 shows, some middle-end passes like IVOPTs can
> query the target hook legitimate_address_p with some
> artificially constructed rtx to determine whether some
> addressing modes are supported by target for some gimple
> statement.  But for now the existing legitimate_address_p
> only checks the given mode, it's unable to distinguish
> some special cases unfortunately, for example, for LEN_LOAD
> ifn on Power port, we would expand it with lxvl hardware
> insn, which only supports one register to hold the address
> (the other register is holding the length), that is we
> don't support base (reg) + index (reg) addressing mode for
> sure.  But hook legitimate_address_p only considers the
> given mode which would be some vector mode for LEN_LOAD
> ifn, and we do support base + index addressing mode for
> normal vector load and store insns, so the hook will return
> true for the query unexpectedly.
>
> This patch is to introduce one extra argument of type
> code_helper for hook legitimate_address_p, it makes targets
> able to handle some special case like what's described
> above.  The subsequent patches will show how to leverage
> the new code_helper argument.
>
> I didn't separate those target specific adjustments to
> their own patches, since those changes are no function
> changes.  One typical change is to add one unnamed argument
> with default ERROR_MARK, some ports need to include tree.h
> in their {port}-protos.h since the hook is used in some
> machine description files.  I've cross-built a corresponding
> cc1 successfully for at least one triple of each affected
> target so I believe they are safe.  But feel free to correct
> me if separating is needed for the review of this patch.
>
> Besides, it's bootstrapped and regtested on
> x86_64-redhat-linux and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

Is defaulting the arguments in the targets necessary for
the middle-end or only for direct uses in the targets?

It looks OK in general but please give others some time to
comment.

Thanks,
Richard.

> BR,
> Kewen
> --
> PR tree-optimization/110248
>
> gcc/ChangeLog:
>
> * coretypes.h (class code_helper): Add forward declaration.
> * doc/tm.texi: Regenerate.
> * lra-constraints.cc (valid_address_p): Call target hook
> targetm.addr_space.legitimate_address_p with an extra parameter
> ERROR_MARK as its prototype changes.
> * recog.cc (memory_address_addr_space_p): Likewise.
> * reload.cc (strict_memory_address_addr_space_p): Likewise.
> * target.def (legitimate_address_p, addr_space.legitimate_address_p):
> Extend with one more argument of type code_helper, update the
> documentation accordingly.
> * targhooks.cc (default_legitimate_address_p): Adjust for the
> new code_helper argument.
> (default_addr_space_legitimate_address_p): Likewise.
> * targhooks.h (default_legitimate_address_p): Likewise.
> (default_addr_space_legitimate_address_p): Likewise.
> * config/aarch64/aarch64.cc (aarch64_legitimate_address_hook_p):
> Adjust with extra unnamed code_helper argument with default ERROR_MARK.
> * config/alpha/alpha.cc (alpha_legitimate_address_p): Likewise.
> * config/arc/arc.cc (arc_legitimate_address_p): Likewise.
> * config/arm/arm-protos.h (arm_legitimate_address_p): Likewise.
> (tree.h): New include for tree_code ERROR_MARK.
> * config/arm/arm.cc (arm_legitimate_address_p): Adjust with extra
> unnamed code_helper argument with default ERROR_MARK.
> * config/avr/avr.cc (avr_addr_space_legitimate_address_p): Likewise.
> * config/bfin/bfin.cc (bfin_legitimate_address_p): Likewise.
> * config/bpf/bpf.cc (bpf_legitimate_address_p): Likewise.
> * config/c6x/c6x.cc (c6x_legitimate_address_p): Likewise.
> * config/cris/cris-protos.h (cris_legitimate_address_p): Likewise.
> (tree.h): New include for tree_code ERROR_MARK.
> * config/cris/cris.cc (cris_legitimate_address_p): Adjust with extra
> unnamed code_helper argument with default ERROR_MARK.
> * config/csky/csky.cc (csky_legitimate_address_p): Likewise.
> * config/epiphany/epiphany.cc (epiphany_legitimate_address_p):
> Likewise.
> * config/frv/frv.cc (frv_legitimate_address_p): Likewise.
> * config/ft32/ft32.cc (ft32_addr_space_legitimate_address_p): 
> Likewise.
> * config/gcn/gcn.cc (gcn_addr_space_legitimate_address_p): Likewise.
> * config/h8300/h8300.cc (h8300_legitimate_address_p): Likewise.
> * config/i386/i386.cc (ix86_legitimate_address_p): Likewise.
> * config/ia64/ia64.cc (ia64_legitimate_address_p): Likewise.
> * config/iq2000/iq2000.cc (iq2000_legitimate_address_p): Likewise.
> * config/lm32/lm32.cc (lm32_legitimate_address_p):

Re: [PATCH 2/3] ivopts: Call valid_mem_ref_p with code_helper [PR110248]

2023-06-30 Thread Richard Biener via Gcc-patches
On Fri, Jun 30, 2023 at 7:46 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR110248 shows, to get the expected query results for
> that case internal functions LEN_{LOAD,STORE} is unable to
> adopt some addressing modes, we need to pass down the
> related IFN code as well.  This patch is to make IVOPTs
> pass down ifn code for USE_PTR_ADDRESS type uses, it
> adjusts the related {strict_,}memory_address_addr_space_p
> and valid_mem_ref_p functions as well.
>
> Bootstrapped and regtested on x86_64-redhat-linux and
> powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

LGTM.

> BR,
> Kewen
> -
> PR tree-optimization/110248
>
> gcc/ChangeLog:
>
> * recog.cc (memory_address_addr_space_p): Add one more argument ch of
> type code_helper and pass it to
> targetm.addr_space.legitimate_address_p instead of ERROR_MARK.
> (offsettable_address_addr_space_p): Update one function pointer with
> one more argument of type code_helper as its assignees
> memory_address_addr_space_p and strict_memory_address_addr_space_p
> have been adjusted, and adjust some call sites with ERROR_MARK.
> * recog.h (tree.h): New include header file for tree_code ERROR_MARK.
> (memory_address_addr_space_p): Adjust with one more unnamed argument
> of type code_helper with default ERROR_MARK.
> (strict_memory_address_addr_space_p): Likewise.
> * reload.cc (strict_memory_address_addr_space_p): Add one unnamed
> argument of type code_helper.
> * tree-ssa-address.cc (valid_mem_ref_p): Add one more argument ch of
> type code_helper and pass it to memory_address_addr_space_p.
> * tree-ssa-address.h (valid_mem_ref_p): Adjust the declaration with
> one more unnamed argument of type code_helper with default value
> ERROR_MARK.
> * tree-ssa-loop-ivopts.cc (get_address_cost): Use ERROR_MARK as code
> by default, change it with ifn code for USE_PTR_ADDRESS type use, and
> pass it to all valid_mem_ref_p calls.
> ---
>  gcc/recog.cc| 13 ++---
>  gcc/recog.h | 10 +++---
>  gcc/reload.cc   |  2 +-
>  gcc/tree-ssa-address.cc |  4 ++--
>  gcc/tree-ssa-address.h  |  3 ++-
>  gcc/tree-ssa-loop-ivopts.cc | 18 +-
>  6 files changed, 31 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/recog.cc b/gcc/recog.cc
> index 692c258def6..2bff6c03e4d 100644
> --- a/gcc/recog.cc
> +++ b/gcc/recog.cc
> @@ -1802,8 +1802,8 @@ pop_operand (rtx op, machine_mode mode)
> for mode MODE in address space AS.  */
>
>  bool
> -memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED,
> -rtx addr, addr_space_t as)
> +memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED, rtx addr,
> +addr_space_t as, code_helper ch)
>  {
>  #ifdef GO_IF_LEGITIMATE_ADDRESS
>gcc_assert (ADDR_SPACE_GENERIC_P (as));
> @@ -1813,8 +1813,7 @@ memory_address_addr_space_p (machine_mode mode ATTRIBUTE_UNUSED,
>   win:
>return true;
>  #else
> -  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as,
> - ERROR_MARK);
> +  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as, ch);
>  #endif
>  }
>
> @@ -2430,7 +2429,7 @@ offsettable_address_addr_space_p (int strictp, machine_mode mode, rtx y,
>rtx z;
>rtx y1 = y;
>rtx *y2;
> -  bool (*addressp) (machine_mode, rtx, addr_space_t) =
> +  bool (*addressp) (machine_mode, rtx, addr_space_t, code_helper) =
>  (strictp ? strict_memory_address_addr_space_p
>  : memory_address_addr_space_p);
>poly_int64 mode_sz = GET_MODE_SIZE (mode);
> @@ -2469,7 +2468,7 @@ offsettable_address_addr_space_p (int strictp, machine_mode mode, rtx y,
>*y2 = plus_constant (address_mode, *y2, mode_sz - 1);
>/* Use QImode because an odd displacement may be automatically invalid
>  for any wider mode.  But it should be valid for a single byte.  */
> -  good = (*addressp) (QImode, y, as);
> +  good = (*addressp) (QImode, y, as, ERROR_MARK);
>
>/* In any case, restore old contents of memory.  */
>*y2 = y1;
> @@ -2504,7 +2503,7 @@ offsettable_address_addr_space_p (int strictp, machine_mode mode, rtx y,
>
>/* Use QImode because an odd displacement may be automatically invalid
>   for any wider mode.  But it should be valid for a single byte.  */
> -  return (*addressp) (QImode, z, as);
> +  return (*addressp) (QImode, z, as, ERROR_MARK);
>  }
>
>  /* Return true if ADDR is an address-expression whose effect depends
> diff --git a/gcc/recog.h b/gcc/recog.h
> index badf8e3dc1c..c6ef619c5dd 100644
> --- a/gcc/recog.h
> +++ b/gcc/recog.h
> @@ -20,6 +20,9 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_RECOG_H
>  #define GCC_RECOG_H
>
> +/* For enum tree_code ERROR_MARK.  */
> +#include "tr

Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-30 Thread Sam James via Gcc-patches


Richard Biener via Gcc-patches  writes:

> On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches
>  wrote:
>>
>> When we collect just user events for autofdo with lbr we get some events
>> where branch sources are kernel addresses and branch targets are user
>> addresses. Without kernel MMAP events create_gcov can't make sense of
>> kernel addresses. Currently create_gcov fails if it can't map at least
>> 95% of events. We sometimes get below this threshold with just user
>> events. The change is to collect both user events and kernel events.
>
> Does this require elevated privileges?  Can we instead "fix" create_gcov here?

Right, requiring privileges for this is going to be a no-go for a lot of
builders. In a distro context, for example, it means we can't consider
autofdo at all.


Re: PR82943 - Suggested patch to fix

2023-06-30 Thread Paul Richard Thomas via Gcc-patches
Hi All,

I have gone through the PDT problem reports and made sure that they
block PR82173.

To my utter astonishment, (i) there might be only one duplicate; and
(ii) only 82649, 84119, 90218, 95541, 99079, 102901 & 105380 (out of
50 PRs) depend on the representation.

Regards

Paul


Re: [PATCH 1/3] targhooks: Extend legitimate_address_p with code_helper [PR110248]

2023-06-30 Thread Kewen.Lin via Gcc-patches
Hi Richi,

Thanks for your review!

on 2023/6/30 16:56, Richard Biener wrote:
> On Fri, Jun 30, 2023 at 7:38 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR110248 shows, some middle-end passes like IVOPTs can
>> query the target hook legitimate_address_p with some
>> artificially constructed rtx to determine whether some
>> addressing modes are supported by target for some gimple
>> statement.  But for now the existing legitimate_address_p
>> only checks the given mode, it's unable to distinguish
>> some special cases unfortunately, for example, for LEN_LOAD
>> ifn on Power port, we would expand it with lxvl hardware
>> insn, which only supports one register to hold the address
>> (the other register is holding the length), that is we
>> don't support base (reg) + index (reg) addressing mode for
>> sure.  But hook legitimate_address_p only considers the
>> given mode which would be some vector mode for LEN_LOAD
>> ifn, and we do support base + index addressing mode for
>> normal vector load and store insns, so the hook will return
>> true for the query unexpectedly.
>>
>> This patch is to introduce one extra argument of type
>> code_helper for hook legitimate_address_p, it makes targets
>> able to handle some special case like what's described
>> above.  The subsequent patches will show how to leverage
>> the new code_helper argument.
>>
>> I didn't separate those target specific adjustments to
>> their own patches, since those changes are no function
>> changes.  One typical change is to add one unnamed argument
>> with default ERROR_MARK, some ports need to include tree.h
>> in their {port}-protos.h since the hook is used in some
>> machine description files.  I've cross-built a corresponding
>> cc1 successfully for at least one triple of each affected
>> target so I believe they are safe.  But feel free to correct
>> me if separating is needed for the review of this patch.
>>
>> Besides, it's bootstrapped and regtested on
>> x86_64-redhat-linux and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
> 
> Is defaulting the arguments in the targets necessary for
> the middle-end or only for direct uses in the targets?

It's only for the direct uses in target code.  The call sites
of these hooks in generic code pass either the given
code_helper variable or an explicit ERROR_MARK, so they
don't rely on the default set in the target code.
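
To make the point about defaulted arguments concrete, here is a minimal sketch, not GCC code.  All names below (toy_code, toy_address, toy_legitimate_address_p and the two callers) are hypothetical stand-ins for the real hook and its users.  It assumes a target that rejects base + index addressing for a length-controlled load, as described for lxvl above.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these are GCC's real types.
enum toy_code { TOY_ERROR_MARK, TOY_IFN_LEN_LOAD };

struct toy_address
{
  bool has_base;   // a base register is present
  bool has_index;  // an index register is present
};

// Toy hook.  The trailing argument defaults to TOY_ERROR_MARK so that
// existing direct callers inside the "target" keep compiling unchanged.
static bool
toy_legitimate_address_p (const toy_address &addr,
                          toy_code code = TOY_ERROR_MARK)
{
  if (!addr.has_base)
    return false;
  // Assumption: the length-controlled load needs the second register
  // for the length, so reg + reg addressing is rejected for it.
  if (code == TOY_IFN_LEN_LOAD && addr.has_index)
    return false;
  return true;
}

// Direct use inside the target: relies on the default argument.
static bool
toy_target_internal_check (const toy_address &addr)
{
  return toy_legitimate_address_p (addr);
}

// Generic-code style use: always passes the code explicitly.
static bool
toy_generic_check (const toy_address &addr, toy_code code)
{
  return toy_legitimate_address_p (addr, code);
}
```

With a base + index address, toy_target_internal_check accepts it (the default behaves like the old hook), while toy_generic_check with TOY_IFN_LEN_LOAD rejects it.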

> 
> It looks OK in general but please give others some time to
> comment.

OK, thanks again!

BR,
Kewen

> 
> Thanks,
> Richard.
> 
>> BR,
>> Kewen
>> --
>> PR tree-optimization/110248
>>
>> gcc/ChangeLog:
>>
>> * coretypes.h (class code_helper): Add forward declaration.
>> * doc/tm.texi: Regenerate.
>> * lra-constraints.cc (valid_address_p): Call target hook
>> targetm.addr_space.legitimate_address_p with an extra parameter
>> ERROR_MARK as its prototype changes.
>> * recog.cc (memory_address_addr_space_p): Likewise.
>> * reload.cc (strict_memory_address_addr_space_p): Likewise.
>> * target.def (legitimate_address_p, addr_space.legitimate_address_p):
>> Extend with one more argument of type code_helper, update the
>> documentation accordingly.
>> * targhooks.cc (default_legitimate_address_p): Adjust for the
>> new code_helper argument.
>> (default_addr_space_legitimate_address_p): Likewise.
>> * targhooks.h (default_legitimate_address_p): Likewise.
>> (default_addr_space_legitimate_address_p): Likewise.
>> * config/aarch64/aarch64.cc (aarch64_legitimate_address_hook_p):
>> Adjust with extra unnamed code_helper argument with default ERROR_MARK.
>> * config/alpha/alpha.cc (alpha_legitimate_address_p): Likewise.
>> * config/arc/arc.cc (arc_legitimate_address_p): Likewise.
>> * config/arm/arm-protos.h (arm_legitimate_address_p): Likewise.
>> (tree.h): New include for tree_code ERROR_MARK.
>> * config/arm/arm.cc (arm_legitimate_address_p): Adjust with extra
>> unnamed code_helper argument with default ERROR_MARK.
>> * config/avr/avr.cc (avr_addr_space_legitimate_address_p): Likewise.
>> * config/bfin/bfin.cc (bfin_legitimate_address_p): Likewise.
>> * config/bpf/bpf.cc (bpf_legitimate_address_p): Likewise.
>> * config/c6x/c6x.cc (c6x_legitimate_address_p): Likewise.
>> * config/cris/cris-protos.h (cris_legitimate_address_p): Likewise.
>> (tree.h): New include for tree_code ERROR_MARK.
>> * config/cris/cris.cc (cris_legitimate_address_p): Adjust with extra
>> unnamed code_helper argument with default ERROR_MARK.
>> * config/csky/csky.cc (csky_legitimate_address_p): Likewise.
>> * config/epiphany/epiphany.cc (epiphany_legitimate_address_p):
>> Likewise.
>> * config/frv/frv.cc (frv_legitimate_address_p): Likewise.
>> * config/ft32/ft32.cc (ft32_addr_space_legitimate_address_p): 
>> Likewise.
>> 

[PATCH] Fix couple of endianness issues in fold_ctor_reference

2023-06-30 Thread Eric Botcazou via Gcc-patches
Hi,

fold_ctor_reference attempts to use a recursive local processing in order to 
call native_encode_expr on the leaf nodes of the constructor, before falling 
back to calling native_encode_initializer if this fails.  There are a couple 
of issues related to endianness present in it:
  1) it does not specifically handle integral bit-fields; now these are left 
justified on big-endian platforms so cannot be treated like ordinary fields.
  2) it does not check that the constructor uses the native storage order.

Proposed fix attached, tested on x86-64/Linux and SPARC/Solaris, OK for the 
mainline and some branches?
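
As a rough illustration of issue 1), the offset arithmetic used by the fix can be modelled in isolation.  This is only a sketch: toy_bitfield_inner_offset and its parameters are hypothetical names, and it assumes the field value is encoded in the full width of its integer mode, as the patch does via GET_MODE_BITSIZE.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the adjustment for integral bit-fields.
// ENCODING_BITS is the width of the integer mode used to encode the
// value; FIELD_BITS is the declared width of the bit-field.
static uint64_t
toy_bitfield_inner_offset (uint64_t inner_offset, unsigned encoding_bits,
                           unsigned field_bits, bool bytes_big_endian)
{
  // On big-endian the bit-field's bits sit at the end of the encoded
  // buffer, so skip the leading padding bits before reading.
  if (bytes_big_endian)
    inner_offset += encoding_bits - field_bits;
  return inner_offset;
}
```

For a 3-bit field encoded in a 32-bit mode, a reference at bit 0 of the field maps to bit 29 of the encoding on big-endian and stays at bit 0 on little-endian.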


2023-06-30  Eric Botcazou  

* gimple-fold.cc (fold_array_ctor_reference): Fix head comment.
(fold_nonarray_ctor_reference): Likewise.  Specifically deal
with integral bit-fields.
(fold_ctor_reference): Check that the constructor uses the
native storage order.


2023-06-30  Eric Botcazou  

* gcc.c-torture/execute/20230630-1.c: New test.
* gcc.c-torture/execute/20230630-2.c: Likewise.

-- 
Eric Botcazou

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 3d46b76edeb..e80a72dfa22 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -7849,12 +7849,11 @@ get_base_constructor (tree base, poly_int64_pod *bit_offset,
 }
 }
 
-/* CTOR is CONSTRUCTOR of an array type.  Fold a reference of SIZE bits
-   to the memory at bit OFFSET. When non-null, TYPE is the expected
-   type of the reference; otherwise the type of the referenced element
-   is used instead. When SIZE is zero, attempt to fold a reference to
-   the entire element which OFFSET refers to.  Increment *SUBOFF by
-   the bit offset of the accessed element.  */
+/* CTOR is a CONSTRUCTOR of an array or vector type.  Fold a reference of SIZE
+   bits to the memory at bit OFFSET.  If non-null, TYPE is the expected type of
+   the reference; otherwise the type of the referenced element is used instead.
+   When SIZE is zero, attempt to fold a reference to the entire element OFFSET
+   refers to.  Increment *SUBOFF by the bit offset of the accessed element.  */
 
 static tree
 fold_array_ctor_reference (tree type, tree ctor,
@@ -8019,13 +8018,11 @@ fold_array_ctor_reference (tree type, tree ctor,
   return type ? build_zero_cst (type) : NULL_TREE;
 }
 
-/* CTOR is CONSTRUCTOR of an aggregate or vector.  Fold a reference
-   of SIZE bits to the memory at bit OFFSET.   When non-null, TYPE
-   is the expected type of the reference; otherwise the type of
-   the referenced member is used instead.  When SIZE is zero,
-   attempt to fold a reference to the entire member which OFFSET
-   refers to; in this case.  Increment *SUBOFF by the bit offset
-   of the accessed member.  */
+/* CTOR is a CONSTRUCTOR of a record or union type.  Fold a reference of SIZE
+   bits to the memory at bit OFFSET.  If non-null, TYPE is the expected type of
+   the reference; otherwise the type of the referenced member is used instead.
+   When SIZE is zero, attempt to fold a reference to the entire member OFFSET
+   refers to.  Increment *SUBOFF by the bit offset of the accessed member.  */
 
 static tree
 fold_nonarray_ctor_reference (tree type, tree ctor,
@@ -8037,8 +8034,7 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
   unsigned HOST_WIDE_INT cnt;
   tree cfield, cval;
 
-  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield,
-			cval)
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), cnt, cfield, cval)
 {
   tree byte_offset = DECL_FIELD_OFFSET (cfield);
   tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
@@ -8110,6 +8106,19 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
 	return NULL_TREE;
 
 	  offset_int inner_offset = offset_int (offset) - bitoffset;
+
+	  /* Integral bit-fields are left-justified on big-endian targets, so
+	 we must arrange for native_encode_int to look at the MSB.  */
+  if (DECL_BIT_FIELD (cfield) && INTEGRAL_TYPE_P (TREE_TYPE (cfield)))
+	{
+	  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
+		return NULL_TREE;
+	  const unsigned int encoding_size
+		= GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (cfield)));
+	  if (BYTES_BIG_ENDIAN)
+		inner_offset += encoding_size - wi::to_offset (field_size);
+	}
+
 	  return fold_ctor_reference (type, cval,
   inner_offset.to_uhwi (), size,
   from_decl, suboff);
@@ -8122,7 +8131,7 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
   return build_zero_cst (type);
 }
 
-/* CTOR is value initializing memory.  Fold a reference of TYPE and
+/* CTOR is a value initializing memory.  Fold a reference of TYPE and
bit size POLY_SIZE to the memory at bit POLY_OFFSET.  When POLY_SIZE
is zero, attempt to fold a reference to the entire subobject
which OFFSET refers to.  This is used when folding accesses to
@@ -8163,7 +8172,8 @@ fold_ctor_reference (tree type, tree ctor, const poly_uint64 &poly_offset,
 	}
 

Re: PR108672 re-fixed after [PATCH] libstdc++: Synchronize PSTL with upstream

2023-06-30 Thread Jonathan Wakely via Gcc-patches
On Fri, 30 Jun 2023 at 09:42, Jonathan Wakely  wrote:
>
> On Fri, 30 Jun 2023 at 04:48, Hans-Peter Nilsson wrote:
> >
> > > Date: Mon, 26 Jun 2023 11:57:49 -0700
> > > From: Thomas Rodgers via Gcc-patches 
> >
> > > > On Wed, May 17, 2023 at 12:32 PM Jonathan Wakely wrote:
> > > > All the actual code changes look good.
> >
> > Unfortunately, this overwrote the fix for PR108672.  I take
> > it there's a step missing from the synchronization process;
> > a check that no local commits are overwritten?  Sounds like
> > something that can be fully scripted (not volunteering) or
> > already available (like, "list all commits affecting
> > contents touched by/between two named commits").
> >
> > I did *not* check whether any other local commits were also
> > overwritten.  Also, not sure about whether better try to get
> > this upstreamed: __INT32_TYPE__ seems gcc-specific.
>
> Clang does support it too, but I agree that upstream might not want
> that change.
>
>
> > Anyway, r13-5702-g72058eea9d407e was "re-committed" per
> > below as obvious after regtesting cris-elf.
>
> Thanks.
>
> I'll add an include/pstl/LOCAL_PATCHES file listing the commits we
> apply locally after importing the upstream sources.
>
> Based on git history, the initial list of commits is:

For the record:

> r9-6908-g0360f9ad4048ea
Upstream: c7c6413119380828d92e8beb5fb2f35d3f2e1572
> r9-6942-g9eda9f9231f287
Upstream: 2e15f4ac572bcf429ec12e8f3efbb8ad254042c7
> r9-7071-ga34d6343a758f6
Upstream: 8a497a958be1f4656eff9664e1be29491f3795d2
and 86d4ec756b5e0bc72d7e78fc574e05802959ead4
> r10-572-g34d878c7bc86d4
Not upstream (GCC-specific).
> r10-1314-g32bab8b6ad0a90
Not upstream (probably should be)
> r11-7339-g7e647d71d556b7
Upstream: b152f9f392d42e1c8e29f382f876480631575190
> r12-7699-gac73c944eac88f
Not upstream (GCC-specific)
> r13-3708-ge3b10249119fb4
Not upstream (GCC-specific)
> r13-5702-g72058eea9d407e
Not upstream (GCC-specific)
>
> But several of those have been incorporated upstream, or were
> reapplied correctly to our downstream copies. We'll go through the
> list and find which ones need to stay there.
>
> It looks like r10-1314-g32bab8b6ad0a90 was lost and should be re-applied.



Re: [PATCH] Fix couple of endianness issues in fold_ctor_reference

2023-06-30 Thread Richard Biener via Gcc-patches
On Fri, Jun 30, 2023 at 11:29 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> fold_ctor_reference attempts to use a recursive local processing in order to
> call native_encode_expr on the leaf nodes of the constructor, before falling
> back to calling native_encode_initializer if this fails.  There are a couple
> of issues related to endianness present in it:
>   1) it does not specifically handle integral bit-fields; now these are left
> justified on big-endian platforms so cannot be treated like ordinary fields.
>   2) it does not check that the constructor uses the native storage order.
>
> Proposed fix attached, tested on x86-64/Linux and SPARC/Solaris, OK for the
> mainline and some branches?

OK.

thanks,
Richard.

>
> 2023-06-30  Eric Botcazou  
>
> * gimple-fold.cc (fold_array_ctor_reference): Fix head comment.
> (fold_nonarray_ctor_reference): Likewise.  Specifically deal
> with integral bit-fields.
> (fold_ctor_reference): Check that the constructor uses the
> native storage order.
>
>
> 2023-06-30  Eric Botcazou  
>
> * gcc.c-torture/execute/20230630-1.c: New test.
> * gcc.c-torture/execute/20230630-2.c: Likewise.
>
> --
> Eric Botcazou


Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-30 Thread Robin Dapp via Gcc-patches
> The explicit conversions I see are because we need the output of the
> conversion in multiple vfmul instructions.  That won't be helped by
> the patch you've proposed.

FWIW, on my local branch with the patch applied, I see that the vfwmuls
are being generated (all of the vfmuls are replaced).

> It'll need to be a define_insn_and_split as it's a 3->3 splitter.  The
> split will emit the two extensions and the widening multiply as 3
> distinct insns.

I tried this and while it worked for the first vfwmul the subsequent
ones are not being combined/optimized.  Now I'm not a combine expert
at all but it looks as if the source float_extends are being deleted

 deferring deletion of insn with uid = 39.
 deferring deletion of insn with uid = 37.

with that pattern successfully matched, while they are only "rescanned"
with the synthetic "single widen" one.  Them being deleted (or rather
absorbed by the vfwmul) no further combination is possible (until after
split?)

This seems to be a fundamental difference between the two approaches.
Maybe the "double widen" pattern can be adjusted to also handle this
or I did something wrong when writing the splitter?

With the "single widen" pattern, however, it works more or less
naturally therefore I'd still suggest going for it.

Regards
 Robin


[PATCH V5] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

2023-06-30 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richi and Richard.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, control_mask, bias)

This patch doesn't apply these patterns in the vectorizer; it just adds the
patterns and updates the documentation.

I will send a patch that applies these patterns in the vectorizer soon after
this patch is approved.

Thanks.
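
For readers unfamiliar with the intended semantics, the following is a scalar sketch of what the two operations are meant to do, based only on the description in this patch: a lane is processed when its index is below (length + bias) and its mask bit is set.  Everything here is hypothetical (the toy_ names, the fixed TOY_VF of 4 lanes, and std::min standing in for SELECT_VL); it is a reference model, not how the vectorizer or any target implements it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr long TOY_VF = 4;  // assumed number of lanes per vector

// Model of LEN_MASK_GATHER_LOAD: load base[offsets[i]] into lane i when
// i < len + bias and mask[i] is set.  Other lanes are left untouched,
// standing in for "undefined".
static void
toy_len_mask_gather_load (uint8_t *dest, const uint8_t *base,
                          const long *offsets, long len,
                          const bool *mask, long bias)
{
  for (long i = 0; i < TOY_VF; ++i)
    if (i < len + bias && mask[i])
      dest[i] = base[offsets[i]];
}

// Model of LEN_MASK_SCATTER_STORE: store lane i to base[offsets[i]]
// under the same length and mask conditions.
static void
toy_len_mask_scatter_store (uint8_t *base, const long *offsets,
                            const uint8_t *src, long len,
                            const bool *mask, long bias)
{
  for (long i = 0; i < TOY_VF; ++i)
    if (i < len + bias && mask[i])
      base[offsets[i]] = src[i];
}

// The example loop f, processed TOY_VF elements at a time.
static void
toy_f (uint8_t *a, const uint8_t *b, int n, int base, int step,
       const int *cond)
{
  for (long i = 0; i < n; i += TOY_VF)
    {
      long loop_len = std::min<long> (TOY_VF, n - i);  // stand-in for SELECT_VL
      long offsets[TOY_VF] = {};
      bool control_mask[TOY_VF] = {};
      uint8_t v[TOY_VF] = {};
      for (long lane = 0; lane < loop_len; ++lane)
        {
          offsets[lane] = (i + lane) * step + base;
          control_mask[lane] = cond[i + lane] != 0;
        }
      toy_len_mask_gather_load (v, b, offsets, loop_len, control_mask, 0);
      toy_len_mask_scatter_store (a, offsets, v, loop_len, control_mask, 0);
    }
}
```

The length handles the loop tail (the last iteration processes fewer than TOY_VF lanes), while the mask handles the if (cond[i]) condition; that split is the reason for carrying both operands.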

---
 gcc/doc/md.texi | 17 
 gcc/internal-fn.cc  | 67 +++--
 gcc/internal-fn.def |  8 --
 gcc/internal-fn.h   |  1 +
 gcc/optabs.def  |  2 ++
 5 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..df41b5251d4 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5),
+a mask operand (operand 6) as well as a bias operand (operand 7).  Similar to len_maskload,
+the instruction loads at most (operand 5 + operand 7) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be undefined.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5),
+a mask operand (operand 6) as well as a bias operand (operand 7).  The instruction stores
+at most (operand 5 + operand 7) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..da3827481e9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[8];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,9 +3546,23 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
 {
+  if (optab == len_mask_scatter_store_optab)
+   {
+ tree len = gimple_call_arg (stmt, internal_fn_len_index (ifn));
+ rtx len_rtx = expand_normal (len);
+ create_convert_operand_from (&ops[i++], len_rtx,
+  TYPE_MODE (TREE_TYPE (len)),
+  TYPE_UNSIGNED (TREE_TYPE (len)));
+   }
   tree mask = gimple_call_arg (stmt, mask_index);
   rtx mask_rtx = expand_normal (mask);
   create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
+  if (optab == len_mask_scatter_store_optab)
+   {
+ tree biast = gimple_call_arg (stmt, gimple_call_num_args (stmt) - 1);
+ rtx bias = expand_normal (biast);
+ create_input_operand (&ops[i++], bias, QImode);
+   }
 }
 
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)),
@@ -3559,7 +3573,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
 /* Expand {MASK_,}GATHER_LOAD call CALL using optab OPTAB.  */
 
 static void
-expand_gather_load

[PATCH] VECT: Apply LEN_MASK_GATHER_LOAD/SCATTER_STORE into vectorizer

2023-06-30 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.
It seems that implementing LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE is
simple and the code change is not big.

Here is an example:

#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

With this patch:

   [local count: 84095460]:
  _58 = (unsigned int) base_19(D);
  _61 = (unsigned long) b_20(D);
  _63 = (unsigned long) a_21(D);
  vect_cst__105 = [vec_duplicate_expr] _58;
  _110 = (unsigned long) n_16(D);

   [local count: 504572759]:
  # vect_vec_iv_.8_95 = PHI <_96(7), { 0, 1, 2, ... }(6)>
  # vectp_cond.9_99 = PHI 
  # ivtmp_111 = PHI 
  _113 = .SELECT_VL (ivtmp_111, POLY_INT_CST [4, 4]);
  _96 = vect_vec_iv_.8_95 + { POLY_INT_CST [4, 4], ... };
  ivtmp_98 = _113 * 4;
  vect__24.11_101 = .LEN_MASK_LOAD (vectp_cond.9_99, 32B, _113, { -1, ... }, 0);
  mask__14.12_103 = vect__24.11_101 != { 0, ... };
  vect__59.13_104 = VIEW_CONVERT_EXPR(vect_vec_iv_.8_95);
  vect__60.14_106 = vect__59.13_104 + vect_cst__105;
  vect__12.15_107 = VIEW_CONVERT_EXPR(vect__60.14_106);
  vect_patt_5.16_108 = .LEN_MASK_GATHER_LOAD (_61, vect__12.15_107, 4, { 0, ... }, _113, mask__14.12_103, 0);
  .LEN_MASK_SCATTER_STORE (_63, vect__12.15_107, 4, vect_patt_5.16_108, _113, mask__14.12_103, 0);
  vectp_cond.9_100 = vectp_cond.9_99 + ivtmp_98;
  ivtmp_112 = ivtmp_111 - _113;
  if (ivtmp_112 != 0)
goto ; [83.33%]
  else
goto ; [16.67%]
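
For readers less familiar with the IR, the loop above can be read as the following scalar model. This is only a sketch under stated assumptions, not the vectorizer's output: `VF`, `select_vl` and `f_model` are hypothetical names, `.SELECT_VL` is modelled as "at most VF elements per iteration" (a real target may choose differently), and the bias operand is taken to be 0 as in the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the number of lanes per vector; the real
   value is target- and mode-dependent (POLY_INT_CST [4, 4] in the dump).  */
enum { VF = 4 };

/* Assumed model of .SELECT_VL: process at most VF elements per iteration.
   A real target may legally pick a different length.  */
static size_t
select_vl (size_t remaining)
{
  return remaining < VF ? remaining : VF;
}

/* Scalar emulation of the vectorized loop: the length bounds the active
   lanes (loop control) and the mask selects lanes among them (flow
   control).  The bias operand is assumed to be 0.  */
void
f_model (uint8_t *restrict a, uint8_t *restrict b, int n,
         int base, int step, int *restrict cond)
{
  size_t remaining = n > 0 ? (size_t) n : 0;
  size_t done = 0;
  while (remaining != 0)
    {
      size_t len = select_vl (remaining);
      uint8_t v[VF] = { 0 };
      /* LEN_MASK_GATHER_LOAD: only lanes below LEN with a set mask bit
         access memory.  */
      for (size_t lane = 0; lane < len; ++lane)
        if (cond[done + lane])
          v[lane] = b[(int) (done + lane) * step + base];
      /* LEN_MASK_SCATTER_STORE: same lane selection for the stores.  */
      for (size_t lane = 0; lane < len; ++lane)
        if (cond[done + lane])
          a[(int) (done + lane) * step + base] = v[lane];
      done += len;
      remaining -= len;
    }
}
```

Lanes at or beyond the selected length are never touched, whatever their mask bit says, which is why a single call can carry both the loop control and the flow control.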

gcc/ChangeLog:

* optabs-query.cc (supports_vec_gather_load_p): Apply 
LEN_MASK_GATHER_LOAD/SCATTER_STORE.
(supports_vec_scatter_store_p): Ditto.
* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.

---
 gcc/optabs-query.cc|  2 +
 gcc/tree-vect-data-refs.cc | 18 -
 gcc/tree-vect-stmts.cc | 81 +-
 3 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 2fdd0d34354..bf1f484e874 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -676,6 +676,7 @@ supports_vec_gather_load_p (machine_mode mode)
 this_fn_optabs->supports_vec_gather_load[mode]
   = (supports_vec_convert_optab_p (gather_load_optab, mode)
 || supports_vec_convert_optab_p (mask_gather_load_optab, mode)
+|| supports_vec_convert_optab_p (len_mask_gather_load_optab, mode)
 ? 1 : -1);
 
   return this_fn_optabs->supports_vec_gather_load[mode] > 0;
@@ -692,6 +693,7 @@ supports_vec_scatter_store_p (machine_mode mode)
 this_fn_optabs->supports_vec_scatter_store[mode]
   = (supports_vec_convert_optab_p (scatter_store_optab, mode)
 || supports_vec_convert_optab_p (mask_scatter_store_optab, mode)
+|| supports_vec_convert_optab_p (len_mask_scatter_store_optab, mode)
 ? 1 : -1);
 
   return this_fn_optabs->supports_vec_scatter_store[mode] > 0;
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ebe93832b1e..01016284c48 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3873,16 +3873,24 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, 
bool masked_p,
 return false;
 
   /* Work out which function we need.  */
-  internal_fn ifn, alt_ifn;
+  internal_fn ifn, alt_ifn, len_mask_ifn;
   if (read_p)
 {
   ifn = masked_p ? IFN_MASK_GATHER_LOAD : IFN_GATHER_LOAD;
   alt_ifn = IFN_MASK_GATHER_LOAD;
+  /* When target supports LEN_MASK_GATHER_LOAD, we always
+use LEN_MASK_GATHER_LOAD regardless whether len and
+mask are valid or not.  */
+  len_mask_ifn = IFN_LEN_MASK_GATHER_LOAD;
 }
   else
 {
   ifn = masked_p ? IFN_MASK_SCATTER_STORE : IFN_SCATTER_STORE;
   alt_ifn = IFN_MASK_SCATTER_STORE;
+  /* When target supports LEN_MASK_SCATTER_STORE, we always
+use LEN_MASK_SCATTER_STORE regardless whether len and
+mask are valid or not.  */
+  len_mask_ifn = IFN_LEN_MASK_SCATTER_STORE;
 }
 
   for (;;)
@@ -3909,6 +3917,14 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, 
bool masked_p,
  *offset_vectype_out = offset_vectype;
  return true;
}
+  else if (internal_gather_scatter_fn_supported_p (len_mask_ifn, vectype,
+  memory_type,
+  offset_vectype, scale))
+   {
+ *ifn_out = ifn;
+ *offset_vectype_out = offset_vectype;
+ return true;
+   }
 
   if (TYPE_PRECISION (offset_type) >= POINTER_SIZE
  && TYPE_PRECISION (offset_type) >= element_bits)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 68faa8ead39..fa0387353cf 100644
--- a/gcc/tr

[COMMITTED] MAINTAINERS file: Added myself to Write After Approval and DCO

2023-06-30 Thread Rishi Raj via Gcc-patches
 From 50cb9df7209125f9466336d23efdd4fbeda9c4d5 Mon Sep 17 00:00:00 2001
From: rsh-raj 
Date: Fri, 30 Jun 2023 16:04:48 +0530
Subject: [PATCH] MAINTAINERS file: Added myself to Write After Approval and
 DCO

ChangeLog:

2023-06-30  Rishi Raj  

* MAINTAINERS: Added myself to Write After Approval and DCO
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bac773ad0af..2a0eb5b52b5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -610,6 +610,7 @@ Hafiz Abid Qadeer 
 Yao Qi 
 Jerry Quinn 
 Navid Rahimi 
+Rishi Raj 
 Easwaran Raman 
 Joe Ramsay 
 Rolf Rasmussen 
@@ -749,6 +750,7 @@ Immad Mir 
 Gaius Mulley 
 Siddhesh Poyarekar 
 Navid Rahimi 
+Rishi Raj 
 Trevor Saunders 
 Bill Schmidt 
 Nathan Sidwell 
-- 
2.40.1


[PATCH] tree-optimization/110496 - TYPE_PRECISION issue with store-merging

2023-06-30 Thread Richard Biener via Gcc-patches
When store-merging looks for bswap opportunities we also handle
BIT_FIELD_REFs, where we verify that the referenced object is of scalar
type but do not check the result type we eventually use.
That check is done later, but only after we have already queried
TYPE_PRECISION.  The following re-orders this.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
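
The re-ordering amounts to "verify first, query second".  A toy sketch of that shape, assuming nothing about the real GCC data structures (every `toy_*` name below is hypothetical and merely stands in for tree types, TYPE_PRECISION and verify_symbolic_number_p):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for GCC's tree types; these names are hypothetical.  */
enum toy_kind { TOY_INTEGER, TOY_VECTOR };

struct toy_type
{
  enum toy_kind kind;
  unsigned precision;   /* Only meaningful for TOY_INTEGER.  */
};

/* Models a strict accessor: querying the precision of a non-scalar
   type is a hard error.  */
static unsigned
toy_precision (const struct toy_type *t)
{
  assert (t->kind == TOY_INTEGER);
  return t->precision;
}

/* Models the verification step: only scalar integer types pass.  */
static bool
toy_verify (const struct toy_type *t)
{
  return t->kind == TOY_INTEGER;
}

/* Returns the byte range, or 0 when the type is rejected.  Verification
   runs first, so toy_precision is never reached for a vector type.  */
unsigned
toy_range_bytes (const struct toy_type *t)
{
  if (!toy_verify (t))
    return 0;
  return toy_precision (t) / 8;
}
```

With the two calls in the opposite order, the vector case would hit the assertion before it could be rejected.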

PR tree-optimization/110496
* gimple-ssa-store-merging.cc (find_bswap_or_nop_1): Re-order
verifying and TYPE_PRECISION query for the BIT_FIELD_REF case.

* gcc.dg/pr110496.c: New testcase.
---
 gcc/gimple-ssa-store-merging.cc |  5 -
 gcc/testsuite/gcc.dg/pr110496.c | 26 ++
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr110496.c

diff --git a/gcc/gimple-ssa-store-merging.cc b/gcc/gimple-ssa-store-merging.cc
index 401496a9231..0d19b98ed73 100644
--- a/gcc/gimple-ssa-store-merging.cc
+++ b/gcc/gimple-ssa-store-merging.cc
@@ -650,10 +650,13 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number 
*n, int limit)
 
  /* Convert.  */
  n->type = TREE_TYPE (rhs1);
+ if (!verify_symbolic_number_p (n, stmt))
+   return NULL;
+
  if (!n->base_addr)
n->range = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
 
- return verify_symbolic_number_p (n, stmt) ? stmt : NULL;
+ return stmt;
}
 
   return NULL;
diff --git a/gcc/testsuite/gcc.dg/pr110496.c b/gcc/testsuite/gcc.dg/pr110496.c
new file mode 100644
index 00000000000..3c3d12fb532
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110496.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+long contents, f_num;
+int decide();
+int f_MV0__x;
+void f() {
+  unsigned char *rptr;
+  unsigned char valbuf[6];
+  rptr = (unsigned char *)contents;
+  if (decide())
+do {
+  __builtin_memcpy(valbuf, &f_MV0__x, sizeof(int));
+  (&valbuf[0])[4] = (&valbuf[0])[5] = 0;
+} while (0);
+  else {
+int MV0__x = f_num;
+__builtin_memcpy(valbuf, &MV0__x, sizeof(int));
+(&valbuf[0])[4] = (&valbuf[0])[5] = 0;
+  }
+  rptr[1] = valbuf[4];
+  rptr[2] = valbuf[5];
+  rptr[4] = valbuf[1];
+  rptr[5] = valbuf[2];
+  __builtin_memset(valbuf, 0, 8);
+}
-- 
2.35.3


[pushed] wwwdocs: gcc-14: Add list markup for C++ release notes

2023-06-30 Thread Gerald Pfeifer
List items can only appear in lists. :-)

This fixes up

  commit b38079855ead7f7e358d17bc06642d031de5e29b
  Author: Marek Polacek 
  Date:   Thu Jun 22 14:44:43 2023 -0400

C++26 P2752R3 - Static storage for braced initializers implemented

Gerald

---
 htdocs/gcc-14/changes.html | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 00165740..3f797642 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -76,6 +76,7 @@ a work-in-progress.
 
 C++
 
+
   Several C++26 features have been implemented:
 
   https://wg21.link/P2752R3";>P2752R3, Static storage for
@@ -83,6 +84,7 @@ a work-in-progress.
   
 
   
+
 
 
 
-- 
2.41.0


Re: [PATCH 2/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl

2023-06-30 Thread Richard Biener via Gcc-patches
On Tue, Jun 13, 2023 at 4:03 AM Kewen Lin  wrote:
>
> This patch adds one extra argument cost_vec to function
> vect_build_gather_load_calls, so that we can do costing
> next to the transform in vect_build_gather_load_calls.
> For now, the implementation just follows the handlings in
> vect_model_load_cost, it isn't so good, so placing one
> FIXME for any further improvement.  This patch should not
> cause any functional changes.
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vect_build_gather_load_calls): Add the handlings
> on costing with one extra argument cost_vec.
> (vectorizable_load): Adjust the call to vect_build_gather_load_calls.
> (vect_model_load_cost): Assert it won't get VMAT_GATHER_SCATTER with
> gs_info.decl set any more.
> ---
>  gcc/tree-vect-stmts.cc | 31 +++
>  1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 44514658be3..744cdf40e26 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1135,6 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
>   slp_tree slp_node,
>   stmt_vector_for_cost *cost_vec)
>  {
> +  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
> +
>unsigned int inside_cost = 0, prologue_cost = 0;
>bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>
> @@ -2819,7 +2821,8 @@ vect_build_gather_load_calls (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>   gimple_stmt_iterator *gsi,
>   gimple **vec_stmt,
>   gather_scatter_info *gs_info,
> - tree mask)
> + tree mask,
> + stmt_vector_for_cost *cost_vec)
>  {
>loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> @@ -2831,6 +2834,23 @@ vect_build_gather_load_calls (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>poly_uint64 gather_off_nunits
>  = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
>
> +  /* FIXME: Keep the previous costing way in vect_model_load_cost by costing
> + N scalar loads, but it should be tweaked to use target specific costs
> + on related gather load calls.  */
> +  if (!vec_stmt)

Going over the series now.  I'm collecting comments but wanted to get
this one out here:
I'd rather see if (cost_vec) here; that 'vec_stmt' argument is quite
legacy (I think it can be completely purged everywhere).

> +{
> +  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
> +  unsigned int inside_cost;
> +  inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
> + scalar_load, stmt_info, 0, vect_body);
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"vect_model_load_cost: inside_cost = %d, "
> +"prologue_cost = 0 .\n",
> +inside_cost);
> +  return;
> +}
> +
>tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl));
>tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl));
>tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
> @@ -9483,13 +9503,8 @@ vectorizable_load (vec_info *vinfo,
>
>if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl)
>  {
> -  if (costing_p)
> -   vect_model_load_cost (vinfo, stmt_info, ncopies, vf, 
> memory_access_type,
> - alignment_support_scheme, misalignment, 
> &gs_info,
> - slp_node, cost_vec);
> -  else
> -   vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, 
> &gs_info,
> - mask);
> +  vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, 
> &gs_info,
> +   mask, cost_vec);
>return true;
>  }
>
> --
> 2.31.1
>


RE: [PATCH v1] RISC-V: Refactor vxrm_mode attr for type attr equal

2023-06-30 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff and Juzhe.

Pan

-Original Message-
From: Jeff Law  
Sent: Friday, June 30, 2023 4:58 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Refactor vxrm_mode attr for type attr equal



On 6/29/23 00:00, pan2...@intel.com wrote:
> From: Pan Li 
> 
> This patch would like to refactor the vxrm_mode attr for duplicated
> eq_attr condition. The common condition of attr is extracted to one
> place instead of many places.
> 
> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * config/riscv/vector.md: Refactor the common condition.
OK
jeff


Re: [PATCH 3/9] vect: Adjust vectorizable_load costing on VMAT_INVARIANT

2023-06-30 Thread Richard Biener via Gcc-patches
On Tue, Jun 13, 2023 at 4:03 AM Kewen Lin  wrote:
>
> This patch adjusts the cost handling on VMAT_INVARIANT in
> function vectorizable_load.  We don't call function
> vect_model_load_cost for it any more.
>
> To make the costing on VMAT_INVARIANT better, this patch is
> to query hoist_defs_of_uses for hoisting decision, and add
> costs for different "where" based on it.  Currently function
> hoist_defs_of_uses would always hoist the defs of all SSA
> uses, adding one argument HOIST_P aims to avoid the actual
> hoisting during costing phase.
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
> (vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
> hoisting decision and without calling vect_model_load_cost.
> (vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
> and remove VMAT_INVARIANT related handlings.
> ---
>  gcc/tree-vect-stmts.cc | 61 +++---
>  1 file changed, 39 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 744cdf40e26..19c61d703c8 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1135,7 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
>   slp_tree slp_node,
>   stmt_vector_for_cost *cost_vec)
>  {
> -  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
> +  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
> + && memory_access_type != VMAT_INVARIANT);
>
>unsigned int inside_cost = 0, prologue_cost = 0;
>bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
> @@ -1238,16 +1239,6 @@ vect_model_load_cost (vec_info *vinfo,
>ncopies * assumed_nunits,
>scalar_load, stmt_info, 0, vect_body);
>  }
> -  else if (memory_access_type == VMAT_INVARIANT)
> -{
> -  /* Invariant loads will ideally be hoisted and splat to a vector.  */
> -  prologue_cost += record_stmt_cost (cost_vec, 1,
> -scalar_load, stmt_info, 0,
> -vect_prologue);
> -  prologue_cost += record_stmt_cost (cost_vec, 1,
> -scalar_to_vec, stmt_info, 0,
> -vect_prologue);
> -}
>else
>  vect_get_load_cost (vinfo, stmt_info, ncopies,
> alignment_support_scheme, misalignment, first_stmt_p,
> @@ -9121,10 +9112,11 @@ permute_vec_elements (vec_info *vinfo,
>  /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP,
> inserting them on the loops preheader edge.  Returns true if we
> were successful in doing so (and thus STMT_INFO can be moved then),
> -   otherwise returns false.  */
> +   otherwise returns false.  HOIST_P indicates if we want to hoist the
> +   definitions of all SSA uses, it would be false when we are costing.  */
>
>  static bool
> -hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
> +hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p)
>  {
>ssa_op_iter i;
>tree op;
> @@ -9158,6 +9150,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class loop 
> *loop)
>if (!any)
>  return true;
>
> +  if (!hoist_p)
> +return true;
> +
>FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
>  {
>gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> @@ -9510,14 +9505,6 @@ vectorizable_load (vec_info *vinfo,
>
>if (memory_access_type == VMAT_INVARIANT)
>  {
> -  if (costing_p)
> -   {
> - vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
> -   memory_access_type, alignment_support_scheme,
> -   misalignment, &gs_info, slp_node, cost_vec);
> - return true;
> -   }
> -
>gcc_assert (!grouped_load && !mask && !bb_vinfo);
>/* If we have versioned for aliasing or the loop doesn't
>  have any data dependencies that would preclude this,
> @@ -9525,7 +9512,37 @@ vectorizable_load (vec_info *vinfo,
>  thus we can insert it on the preheader edge.  */
>bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
>   && !nested_in_vect_loop
> - && hoist_defs_of_uses (stmt_info, loop));
> + && hoist_defs_of_uses (stmt_info, loop, !costing_p));

'hoist_defs_of_uses' should ideally be computed once at analysis time and
the result remembered.  It's not so easy in this case, so maybe just
add a comment for this here.

> +  if (costing_p)
> +   {
> + if (hoist_p)
> +   {
> + unsigned int prologue_cost;
> + prologue_cost = record_stmt_cost (cost_vec, 1, scalar_load,
> + 

Re: [PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-30 Thread Richard Sandiford via Gcc-patches
Oluwatamilore Adebayo  writes:
> From: oluade01 
>
> This updates vect_recog_abd_pattern to recognize the widening
> variant of absolute difference (ABDL, ABDL2).
>
> gcc/ChangeLog:
>
>   * internal-fn.cc (widening_fn_p, decomposes_to_hilo_fn_p):
>   Add IFN_VEC_WIDEN_ABD to the switch statement.
>   * internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab.
>   * optabs.def (vec_widen_sabd_optab,
>   vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab,
>   vec_widen_sabd_odd_even, vec_widen_sabd_even_optab,
>   vec_widen_uabd_optab,
>   vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab,
>   vec_widen_uabd_odd_even, vec_widen_uabd_even_optab):
>   New optabs.
>   * tree-vect-patterns.cc (vect_recog_abd_pattern): Update to
>   build a VEC_WIDEN_ABD call if the input precision is smaller
>   than the precision of the output.
>   (vect_recog_widen_abd_pattern): Should an ABD expression be
>   found preceding an extension, replace the two with a
>   VEC_WIDEN_ABD.

Thanks.  Testing on an updated trunk shows that we need to check
INTEGRAL_TYPE_P…

> @@ -1703,6 +1736,66 @@ vect_recog_widen_minus_pattern (vec_info *vinfo, 
> stmt_vec_info last_stmt_info,
> &subtype);
>  }
>  
> +/* Try to detect abd on widened inputs, converting IFN_ABD
> +   to IFN_VEC_WIDEN_ABD.  */
> +static gimple *
> +vect_recog_widen_abd_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> +   tree *type_out)
> +{
> +  gassign *last_stmt = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> +  if (!last_stmt || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (last_stmt)))
> +return NULL;
> +
> +  tree last_rhs = gimple_assign_rhs1 (last_stmt);
> +
> +  tree in_type = TREE_TYPE (last_rhs);
> +  tree out_type = TREE_TYPE (gimple_assign_lhs (last_stmt));
> +  if (TYPE_PRECISION (in_type) * 2 != TYPE_PRECISION (out_type)
> +  || !TYPE_UNSIGNED (in_type))
> +return NULL;

…here to avoid new stricter testing for TYPE_PRECISION on VECTOR_TYPEs.
I've pushed the series with that change.

Thanks again for working on this.  It's a really nice improvement.

Richard

> +
> +  vect_unpromoted_value unprom;
> +  tree op = vect_look_through_possible_promotion (vinfo, last_rhs, &unprom);
> +  if (!op || TYPE_PRECISION (TREE_TYPE (op)) != TYPE_PRECISION (in_type))
> +return NULL;
> +
> +  stmt_vec_info abd_pattern_vinfo = vect_get_internal_def (vinfo, op);
> +  if (!abd_pattern_vinfo)
> +return NULL;
> +
> +  abd_pattern_vinfo = vect_stmt_to_vectorize (abd_pattern_vinfo);
> +  gcall *abd_stmt = dyn_cast <gcall *> (STMT_VINFO_STMT (abd_pattern_vinfo));
> +  if (!abd_stmt
> +  || !gimple_call_internal_p (abd_stmt)
> +  || gimple_call_internal_fn (abd_stmt) != IFN_ABD)
> +return NULL;
> +
> +  tree vectype_in = get_vectype_for_scalar_type (vinfo, in_type);
> +  tree vectype_out = get_vectype_for_scalar_type (vinfo, out_type);
> +
> +  code_helper dummy_code;
> +  int dummy_int;
> +  auto_vec<tree> dummy_vec;
> +  if (!supportable_widening_operation (vinfo, IFN_VEC_WIDEN_ABD, stmt_vinfo,
> +vectype_out, vectype_in,
> +&dummy_code, &dummy_code,
> +&dummy_int, &dummy_vec))
> +return NULL;
> +
> +  vect_pattern_detected ("vect_recog_widen_abd_pattern", last_stmt);
> +
> +  *type_out = vectype_out;
> +
> +  tree abd_oprnd0 = gimple_call_arg (abd_stmt, 0);
> +  tree abd_oprnd1 = gimple_call_arg (abd_stmt, 1);
> +  tree widen_abd_result = vect_recog_temp_ssa_var (out_type, NULL);
> +  gcall *widen_abd_stmt = gimple_build_call_internal (IFN_VEC_WIDEN_ABD, 2,
> +   abd_oprnd0, abd_oprnd1);
> +  gimple_call_set_lhs (widen_abd_stmt, widen_abd_result);
> +  gimple_set_location (widen_abd_stmt, gimple_location (last_stmt));
> +  return widen_abd_stmt;
> +}
> +
>  /* Function vect_recog_ctz_ffs_pattern
>  
> Try to find the following pattern:
> @@ -6670,6 +6763,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>{ vect_recog_mask_conversion_pattern, "mask_conversion" },
>{ vect_recog_widen_plus_pattern, "widen_plus" },
>{ vect_recog_widen_minus_pattern, "widen_minus" },
> +  { vect_recog_widen_abd_pattern, "widen_abd" },
>/* These must come after the double widening ones.  */
>  };
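
As a scalar illustration of the shape the new recognizer targets (an absolute difference whose result is then converted to a type of exactly twice the precision), here is a minimal sketch.  The function names are hypothetical, and this is not a claim that GCC vectorizes this exact source into VEC_WIDEN_ABD on any given target:

```c
#include <stdint.h>

/* Absolute difference in the narrow type: for unsigned 8-bit inputs the
   result always fits in 8 bits.  */
static uint8_t
abd_u8 (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : (uint8_t) (b - a);
}

/* An ABD followed by a conversion to a type of exactly twice the
   precision (8 bits to 16 bits), applied element by element.  */
void
widen_abd_u8 (uint16_t *restrict out, const uint8_t *restrict x,
              const uint8_t *restrict y, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (uint16_t) abd_u8 (x[i], y[i]);
}
```

Because the narrow result is unsigned, the widening is a plain zero extension, which matches the TYPE_UNSIGNED (in_type) check in the quoted pattern.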


[committed] Regenerate lto-plugin/Makefile.in

2023-06-30 Thread Martin Jambor
Hi,

On Thu, Jun 29 2023, Marek Polacek wrote:
> On Thu, Jun 29, 2023 at 05:58:22PM +0200, Martin Jambor wrote:

[...]

>> 
>> Unfortunately I won't have time to actually look at this in the next 2-3
>> weeks, so I am inclined to just trust the verification script (which
>> essentially runs autoconf/automake everywhere and then expects no diff)
>> and commit the one-line change.  What do you think, does that make sense
>> (even without looking at why other Makefile.in files did not change)?
>
> Yes please, go ahead with the one line change meanwhile.  Thanks!
>
> I've opened PR110467 for the build problem.
>
> Marek


Commit the regenerated lto-plugin/Makefile.in in order to reflect the changes
that introduced --enable-host-pie.

lto-plugin/ChangeLog:

2023-06-30  Martin Jambor  

* Makefile.in: Regenerate.
---
 lto-plugin/Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index cb568e1e09f..f6f5b020ff5 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -298,6 +298,7 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_bind_now = @enable_host_bind_now@
 exec_prefix = @exec_prefix@
 gcc_build_dir = @gcc_build_dir@
 get_gcc_base_ver = @get_gcc_base_ver@
-- 
2.41.0



Re: [PATCH 0/9] vect: Move costing next to the transform for vect load

2023-06-30 Thread Richard Biener via Gcc-patches
On Tue, Jun 13, 2023 at 4:07 AM Kewen Lin  wrote:
>
> This patch series follows Richi's suggestion at the link [1],
> which suggests structuring vectorizable_load to make costing
> next to the transform, in order to make it easier to keep
> costing and the transform in sync.  For now, it's a known
> issue that what we cost can be inconsistent with what we
> transform, as the case in PR82255 and some other associated
> test cases in the patches of this series show.
>
> Basically this patch series makes costing not call function
> vect_model_load_cost any more.  To make the review and
> bisection easy, I organized the changes according to the
> memory access types of vector load.  For each memory access
> type, firstly it follows the handlings in the function
> vect_model_load_cost to avoid missing anything, then refines
> further by referring to the transform code, I also checked
> them with some typical test cases to verify.  Hope the
> subjects of patches are clear enough.
>
> The whole series can be bootstrapped and regtested
> incrementally on:
>   - x86_64-redhat-linux
>   - aarch64-linux-gnu
>   - powerpc64-linux-gnu P7, P8 and P9
>   - powerpc64le-linux-gnu P8, P9 and P10
>
> By considering the current vector test buckets are mainly
> tested without cost model, I also verified the whole patch
> series was neutral for SPEC2017 int/fp on Power9 at O2,
> O3 and Ofast separately.

I went through the series now and I like it overall (well, I suggested
the change).
Looking at the changes I think we want some followup to reduce the
mess in the final loop nest.  We already have some VMAT_* cases handled
separately, maybe we can split out some more cases.  Maybe we should
bite the bullet and duplicate that loop nest for the different VMAT_* cases.
Maybe we can merge some of the if (!costing_p) checks by clever
re-ordering.  So what this series doesn't improve is overall readability
of the code (indent and our 80 char line limit).

The change also makes it more difficult(?) to separate analysis and transform
though in the end I hope that analysis will actually "code generate" to a (SLP)
data structure so the target will have a chance to see the actual flow of insns.

That said, I'd like to hear from Richard whether he thinks this is a step
in the right direction.

Are you willing to followup with doing the same re-structuring to
vectorizable_store?

OK from my side with the few comments addressed.  The patch likely needs refresh
after the RVV changes in this area?

Thanks,
Richard.

> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html
>
> Kewen Lin (9):
>   vect: Move vect_model_load_cost next to the transform in vectorizable_load
>   vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && 
> gs_info.decl
>   vect: Adjust vectorizable_load costing on VMAT_INVARIANT
>   vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and 
> VMAT_STRIDED_SLP
>   vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER
>   vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES
>   vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE
>   vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE
>   vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS
>
>  .../vect/costmodel/ppc/costmodel-pr82255.c|  31 +
>  .../costmodel/ppc/costmodel-vect-reversed.c   |  22 +
>  gcc/testsuite/gcc.target/i386/pr70021.c   |   2 +-
>  gcc/tree-vect-stmts.cc| 651 ++
>  4 files changed, 432 insertions(+), 274 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c
>
> --
> 2.31.1
>


RE: [v4] Streamer: Fix out of range memory access of machine mode

2023-06-30 Thread Li, Pan2 via Gcc-patches
Thanks, Thomas, for making it happen.

Then we have 2 patches, right? V4 Streamer and V1 LTO: Capture. I am not quite
sure whether these two have any dependency on each other when committing (I
suppose both are well tested and approved). If there is anything I can do to
make progress, please feel free to let me know.

Again, I very much appreciate the great help from Thomas; it really saved my day!

Pan

-Original Message-
From: Thomas Schwinge  
Sent: Friday, June 30, 2023 4:50 PM
To: Li, Pan2 ; juzhe.zh...@rivai.ai; 
gcc-patches@gcc.gnu.org; Richard Biener ; Jakub Jelinek 

Cc: Robin Dapp ; jeffreya...@gmail.com; Wang, Yanzhang 
; kito.ch...@gmail.com; Tobias Burnus 

Subject: [v4] Streamer: Fix out of range memory access of machine mode

Hi!

On 2023-06-30T01:39:39+, "Li, Pan2"  wrote:
> That’s very cool, thanks Thomas for help!

:-)

> Let’s wait the AMD test running result for the final version of the patch.

That's all looking good, too.

> From: juzhe.zh...@rivai.ai 
> Sent: Friday, June 30, 2023 9:27 AM

> Could you merge your patch after you tested?

I've done that, and with (already approved)

"LTO: Capture 'lto_file_decl_data *file_data' in 'class lto_input_block'"
split out, OK to push the attached
v4 "Streamer: Fix out of range memory access of machine mode"?


Grüße
 Thomas


> From: Thomas Schwinge
> Date: 2023-06-30 04:14

> Subject: Re: [PATCH v3] Streamer: Fix out of range memory access of machine 
> mode
> Hi!
>
> On 2023-06-29T11:29:57+0200, I wrote:
>> On 2023-06-21T15:58:24+0800, Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> We extended the machine mode from 8 to 16 bits already, but there is still
>>> one place missing in the streamer: it has one hard-coded array
>>> for the machine mode, of size 256.
>>>
>>> In the lto pass, we memset the array by MAX_MACHINE_MODE count, but the
>>> value of MAX_MACHINE_MODE will grow as more and more modes are
>>> added, while the machine mode array in tree-streamer still has size 256.
>>>
>>> Then, when MAX_MACHINE_MODE is greater than 256, the memset in
>>> lto_output_init_mode_table will unexpectedly touch memory out of range.
>>
>> Uh.  :-O
>>
>>> This patch takes MAX_MACHINE_MODE as the size of the
>>> array in the streamer, to make sure there is no potential unexpected
>>> memory access in future.  Meanwhile, this patch also adjusts some places
>>> which assume MAX_MACHINE_MODE <= 256.
>>
>> Thanks to Jakub and Richard for guidance re the offloading compilation
>> case, where we've got different 'MAX_MACHINE_MODE's between stream-out
>> and stream-in, and a modes mapping table.
>>
>> However, with this patch, there are ICEs all over the place...  I'm
>> having a look.
>
> Your patch has all the right ideas, there are just a few additional
> changes necessary.  Please merge in the attached
> "f into Streamer: Fix out of range memory access of machine mode", with
> 'Co-authored-by: Thomas Schwinge <tho...@codesourcery.com>'.  This has
> already survived compiler-side 'lto.exp' testing and
> 'check-target-libgomp' with Nvidia GPU offloading; AMD GPU testing is now
> running (not expecting any bad surprises).  Will let you know by (my)
> tomorrow morning in case there are any more problems.
>
> Explanation:
>
>>> --- a/gcc/lto-streamer-in.cc
>>> +++ b/gcc/lto-streamer-in.cc
>>> @@ -1985,8 +1985,6 @@ lto_input_mode_table (struct lto_file_decl_data 
>>> *file_data)
>>>  internal_error ("cannot read LTO mode table from %s",
>>>   file_data->file_name);
>>>
>>> -  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << 8);
>>> -  file_data->mode_table = table;
>>>const struct lto_simple_header_with_strings *header
>>>  = (const struct lto_simple_header_with_strings *) data;
>>>int string_offset;
>>> @@ -1998,16 +1996,22 @@ lto_input_mode_table (struct lto_file_decl_data 
>>> *file_data)
>>>   header->string_size, vNULL);
>>>bitpack_d bp = streamer_read_bitpack (&ib);
>>>
>>> +  unsigned mode_bits = bp_unpack_value (&bp, 5);
>>> +  unsigned char *table = ggc_cleared_vec_alloc (1 << 
>>> mode_bits);
>>> +
>>> +  file_data->mode_table = table;
>>> +  file_data->mode_bits = mode_bits;
>
> Here, we set 'file_data->mode_bits' for the offloading case (where
> 'lto_input_mode_table' is called) -- but it's not set for the
> non-offloading case (where 'lto_input_mode_table' isn't called).  (See my
> 'gcc/lto/lto-common.cc:lto_read_decls' change.)  That's "not currently a
> problem", as 'file_data->mode_bits' isn't used anywhere...
>
>>> --- a/gcc/lto-streamer.h
>>> +++ b/gcc/lto-streamer.h
>>> @@ -604,6 +604,8 @@ struct GTY(()) lto_file_decl_data
>>>int order_base;
>>>
>>>int unit_base;
>>> +
>>> +  unsigned mode_bits;
>>>  };
>
>>>  inline machine_mode
>>>  bp_unpack_machine_mode (struct bitpack_d *bp)
>>>  {
>>> -  return (machine_mode)
>>> -   

Adjust LTO mode tables for "Machine_Mode: Extend machine_mode from 8 to 16 bits" (was: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits)

2023-06-30 Thread Thomas Schwinge
Hi!

On 2023-05-13T16:44:41+0800, Kito Cheng via Gcc-patches 
 wrote:
> Tried this patch and I ran into some issues, some variables are using
> unsigned char to hold machine mode and will have problems when the
> number of modes is larger than 255...
>
> And here is the fix:

> --- a/gcc/genmodes.cc
> +++ b/gcc/genmodes.cc
> @@ -1141,10 +1141,10 @@ inline __attribute__((__always_inline__))\n\
> #else\n\
> extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
> #endif\n\
> -unsigned char\n\
> +unsigned short\n\
> mode_inner_inline (machine_mode mode)\n\
> {\n\
> -  extern const unsigned char mode_inner[NUM_MACHINE_MODES];\n\
> +  extern const unsigned short mode_inner[NUM_MACHINE_MODES];\n\
>   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
>   switch (mode)\n\
> {");
> @@ -1529,7 +1529,7 @@ emit_mode_wider (void)
>   int c;
>   struct mode_data *m;
>
> -  print_decl ("unsigned char", "mode_next", "NUM_MACHINE_MODES");
> +  print_decl ("unsigned short", "mode_next", "NUM_MACHINE_MODES");

Etc.

Instead of 's%char%short', shouldn't we really be using
'enum machine_mode' here?  (I understand such a change may require some
further surgery, but wouldn't it be the correct thing to do?)


And, in context of working on
"Streamer: Fix out of range memory access of machine mode", I found
another one, see attached
'[WIP] Adjust LTO mode tables for "Machine_Mode: Extend machine_mode from 8 to 16 bits"'
(..., which applies on top of the former.)  There, in fact, I did change
to 'enum machine_mode' instead of 's%char%short' -- correct?  Any
comments on the 'GTY' issues: (1) 'const' build error,
(2) '[build-gcc]/gcc/gtype-desc.cc' changes, and (3) is 'GTY((atomic))'
actually the right thing to use, here?

In particular, the 'lto_mode_identity_table' changes would seem necessary
to keep standard LTO ('-flto') functional for large 'machine_mode' size?


Bernhard: Fancy writing a Coccinelle check whether there are any more
places where we put what originally was a 'machine_mode' type into a
'char' (or, into a non-'machine_mode' generally)?  ;-) Hey, just a Friday
afternoon idea!


Grüße
 Thomas


From 0fd8f65bb87b11ef8ae366a797aec572d67b284f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 30 Jun 2023 13:23:55 +0200
Subject: [PATCH] [WIP] Adjust LTO mode tables for "Machine_Mode: Extend
 machine_mode from 8 to 16 bits"

---
 gcc/lto-streamer-in.cc |  2 +-
 gcc/lto-streamer.h | 56 +-
 gcc/lto/lto-common.cc  | 10 
 gcc/lto/lto-common.h   |  2 +-
 gcc/tree-streamer.h|  2 +-
 5 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 1876e1967ec..bbd44504ff8 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1997,7 +1997,7 @@ lto_input_mode_table (struct lto_file_decl_data *file_data)
   bitpack_d bp = streamer_read_bitpack (&ib);
 
   unsigned mode_bits = bp_unpack_value (&bp, 5);
-  unsigned char *table = ggc_cleared_vec_alloc<unsigned char> (1 << mode_bits);
+  machine_mode *table = ggc_cleared_vec_alloc<machine_mode> (1 << mode_bits);
 
   file_data->mode_table = table;
   file_data->mode_bits = mode_bits;
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 0556b34c837..4d83741e4c6 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -596,7 +596,61 @@ struct GTY(()) lto_file_decl_data
   hash_map * GTY((skip)) resolution_map;
 
   /* Mode translation table.  */
-  const unsigned char *mode_table;
+  /*TODO const
+With 'const', we get:
+
+gtype-desc.cc: In function 'void gt_pch_nx_lto_file_decl_data(void*)':
+gtype-desc.cc:6531:34: error: invalid conversion from 'const void*' to 'void*' [-fpermissive]
+ gt_pch_note_object ((*x).mode_table, x, gt_pch_p_18lto_file_decl_data);
+  ^
+In file included from [...]/source-gcc/gcc/hash-table.h:247:0,
+ from [...]/source-gcc/gcc/coretypes.h:486,
+ from gtype-desc.cc:23:
+[...]/source-gcc/gcc/ggc.h:47:12: note:   initializing argument 1 of 'int gt_pch_note_object(void*, void*, gt_note_pointers, size_t)'
+ extern int gt_pch_note_object (void *, void *, gt_note_pointers,
+^
+make[2]: *** [Makefile:1180: gtype-desc.o] Error 1
+   */
+  machine_mode * GTY((atomic)) mode_table;
+  /*
+This (without 'const') changes '[build-gcc]/gcc/gtype-desc.cc' as follows:
+
+@@ -2566,7 +2566,9 @@ gt_ggc_mx_lto_file_decl_data (void *x_p)
+   gt_ggc_m_17lto_in_decl_state ((*x).global_decl_state);
+   gt_ggc_m_29hash_table_decl_state_hasher_ ((*x).function_decl_states);
+ 

Re: Adjust LTO mode tables for "Machine_Mode: Extend machine_mode from 8 to 16 bits" (was: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits)

2023-06-30 Thread Kito Cheng via Gcc-patches
> On 2023-05-13T16:44:41+0800, Kito Cheng via Gcc-patches 
>  wrote:
> > Tried this patch and I ran into some issues, some variables are using
> > unsigned char to hold machine mode and will have problems when the
> > number of modes is larger than 255...
> >
> > And here is the fix:
>
> > --- a/gcc/genmodes.cc
> > +++ b/gcc/genmodes.cc
> > @@ -1141,10 +1141,10 @@ inline __attribute__((__always_inline__))\n\
> > #else\n\
> > extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
> > #endif\n\
> > -unsigned char\n\
> > +unsigned short\n\
> > mode_inner_inline (machine_mode mode)\n\
> > {\n\
> > -  extern const unsigned char mode_inner[NUM_MACHINE_MODES];\n\
> > +  extern const unsigned short mode_inner[NUM_MACHINE_MODES];\n\
> >   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
> >   switch (mode)\n\
> > {");
> > @@ -1529,7 +1529,7 @@ emit_mode_wider (void)
> >   int c;
> >   struct mode_data *m;
> >
> > -  print_decl ("unsigned char", "mode_next", "NUM_MACHINE_MODES");
> > +  print_decl ("unsigned short", "mode_next", "NUM_MACHINE_MODES");
>
> Etc.
>
> Instead of 's%char%short', shouldn't we really be using
> 'enum machine_mode' here?  (I understand such a change may require some
> further surgery, but wouldn't it be the correct thing to do?)

Hmmm, I think maybe what we need is to leverage C++ language features
to declare the enum with an explicit underlying type, like this:

enum machine_mode : uint16_t
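
A minimal sketch of that idea, assuming nothing about how genmodes.cc would actually emit it. The enum name and enumerators below are hypothetical stand-ins, not GCC's generated ones. With a fixed underlying type, tables can hold the enum type directly and still cost 16 bits per entry:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the generated enum; the enumerators are
// illustrative only and do not match GCC's real mode list.
enum machine_mode_sketch : std::uint16_t
{
  SKETCH_VOIDmode,
  SKETCH_SImode,
  SKETCH_MAX_MODE = 300   // deliberately larger than 255
};

// The storage size is fixed by the underlying type, not by the largest
// enumerator, so arrays of the enum stay as compact as arrays of uint16_t.
static_assert (sizeof (machine_mode_sketch) == sizeof (std::uint16_t),
	       "enum occupies exactly its underlying type");
static_assert (std::is_same<std::underlying_type<machine_mode_sketch>::type,
			    std::uint16_t>::value,
	       "underlying type is the one requested");

// A table indexed by mode can then be declared with the enum type itself
// instead of 'unsigned char' or 'unsigned short'.
constexpr machine_mode_sketch mode_inner_sketch[] = {
  SKETCH_VOIDmode, SKETCH_SImode
};
```

Whether this interacts well with the GTY machinery discussed above is a separate question that this sketch does not address.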


Fix predictions of conditionals with __builtin_expect

2023-06-30 Thread Jan Hubicka via Gcc-patches
Hi,
while looking into the std::vector _M_realloc_insert codegen I noticed that 
call of __throw_bad_alloc is predicted with 10% probability. This is because
the conditional guarding it has __builtin_expect (cond, 0) on it.  This
incorrectly takes precedence over more reliable heuristics predicting that call
to cold noreturn is likely not going to happen.
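
For illustration only, the shape of code affected looks roughly like the sketch below. The function names are hypothetical and this is not the libstdc++ source: a condition annotated with __builtin_expect (cond, 0) guards a call to a cold, noreturn function, so both heuristics apply to the same branch.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical cold, noreturn error path (stands in for something like
// __throw_bad_alloc).
[[noreturn, gnu::cold]] inline void
report_length_error ()
{
  std::abort ();
}

// Hypothetical helper: doubles a capacity, bailing out if the result
// would exceed MAX.
constexpr std::size_t
grow_capacity (std::size_t n, std::size_t max)
{
  // The hint says "this condition is expected to be false".  The branch
  // also leads to a cold noreturn call, which is the stronger signal.
  if (__builtin_expect (n > max / 2, 0))
    report_length_error ();
  return n * 2;
}
```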

So I reordered the predictors so __builtin_expect_with_probability comes first
after predictors that never make a mistake (so the user can use it to always
specify the outcome by hand).  I also downgraded the malloc predictor since I do
think user-defined malloc functions & new operators may behave in funny ways, and
moved the usual __builtin_expect after the noreturn cold predictor.

This triggered latent bug in expr_expected_value_1 where

  if (*predictor < predictor2)
    *predictor = predictor2;

should be:

  if (predictor2 < *predictor)
    *predictor = predictor2;

which eventually triggered an ICE on combining heuristics.  This made me notice
that we can do slightly better while combining expected values in case only 
one of the parameters (such as in a*b when we expect a==0) can determine
overall result.

Note that the new code may pick weaker heuristics in case that both values are
predicted.  Not sure if this scenario is worth the extra CPU time: there is
no correct way to combine the probabilities anyway since we do not know if
the predictions are independent, so I think users should not rely on it.
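
To make the a*b case concrete, here is a simplified model of the idea. It is not the GCC code, which works on trees via fold_build2; the Expected type and helper names are assumptions made for this sketch. If one operand is expected to be zero, the product is determined even when nothing is known about the other operand.

```cpp
// Hypothetical model of "expected value, if any" for an operand.
struct Expected
{
  bool known;
  long value;
};

constexpr Expected unknown () { return { false, 0 }; }
constexpr Expected known (long v) { return { true, v }; }

// Expected value of a * b: one operand alone may decide the result.
constexpr Expected
expected_mul (Expected a, Expected b)
{
  return (a.known && a.value == 0) ? known (0)
	 : (b.known && b.value == 0) ? known (0)
	 : (a.known && b.known) ? known (a.value * b.value)
	 : unknown ();
}
```

The model leaves out which predictor and probability get attached to the result, which is where the "weaker heuristics" caveat above comes in.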

Fixing this issue uncovered another problem.  In 2018 Martin Liska added
code predicting that MALLOC returns non-NULL, but instead it predicts
that it returns true (boolean 1).  This sort of works for a testcase testing
 malloc (10) != NULL
but, for example, we will predict
 malloc (10) == malloc (10)
as true, which is not right, and such a comparison may happen in real code.

I think the proper way is to update expr_expected_value_1 to work with value
ranges, but that needs greater surgery so I decided to postpone this and
only add a FIXME and file PR110499.

Bootstrapped/regtested x86_64-linux.  Will commit it shortly.

gcc/ChangeLog:

PR middle-end/109849
* predict.cc (estimate_bb_frequencies): Turn to static function.
(expr_expected_value_1): Fix handling of binary expressions with
predicted values.
* predict.def (PRED_MALLOC_NONNULL): Move later in the priority queue.
(PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost top of the
priority queue.
* predict.h (estimate_bb_frequencies): No longer declare it.

gcc/testsuite/ChangeLog:

PR middle-end/109849
* gcc.dg/predict-18.c: Improve testcase.

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 5e3c1d69ca4..688c0970f1c 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -89,6 +90,7 @@ static void predict_paths_leading_to_edge (edge, enum 
br_predictor,
 static bool can_predict_insn_p (const rtx_insn *);
 static HOST_WIDE_INT get_predictor_value (br_predictor, HOST_WIDE_INT);
 static void determine_unlikely_bbs ();
+static void estimate_bb_frequencies (bool force);
 
 /* Information we hold about each branch predictor.
Filled using information from predict.def.  */
@@ -2485,7 +2487,11 @@ expr_expected_value_1 (tree type, tree op0, enum 
tree_code code,
{
  if (predictor)
*predictor = PRED_MALLOC_NONNULL;
- return boolean_true_node;
+ /* FIXME: This is wrong and we need to convert the logic
+to value ranges.  This makes predictor to assume that
+malloc always returns (size_t)1 which is not the same
+as returning non-NULL.  */
+ return fold_convert (type, boolean_true_node);
}
 
  if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
@@ -2563,7 +2569,9 @@ expr_expected_value_1 (tree type, tree op0, enum 
tree_code code,
  case BUILT_IN_REALLOC:
if (predictor)
  *predictor = PRED_MALLOC_NONNULL;
-   return boolean_true_node;
+   /* FIXME: This is wrong and we need to convert the logic
+  to value ranges.  */
+   return fold_convert (type, boolean_true_node);
  default:
break;
}
@@ -2575,18 +2583,43 @@ expr_expected_value_1 (tree type, tree op0, enum 
tree_code code,
   if (get_gimple_rhs_class (code) == GIMPLE_BINARY_RHS)
 {
   tree res;
+  tree nop0 = op0;
+  tree nop1 = op1;
+  if (TREE_CODE (op0) != INTEGER_CST)
+   {
+ /* See if expected value of op0 is good enough to determine the 
result.  */
+ nop0 = expr_expected_value (op0, visited, predictor, probability);
+ if (nop0
+ && (res = fold_build2 (code, type, nop0, op1)) != NULL
+ && TREE_CODE (res) == INTEGER_CST)
+   return res;
+ if (!nop0)
+   nop0 = op0;
+}
  

[committed] fold-const+optabs: Change return type of predicate functions from int to bool

2023-06-30 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function arguments from int to bool.

gcc/ChangeLog:

* fold-const.h (multiple_of_p): Change return type from int to bool.
* fold-const.cc (split_tree): Change neg1_p, neg_litp_p,
neg_conp_p and neg_var_p variables to bool.
(const_binop): Change sat_p variable to bool.
(merge_ranges): Change no_overlap variable to bool.
(extract_muldiv_1): Change same_p variable to bool.
(tree_swap_operands_p): Update function body for bool return type.
(fold_truth_andor): Change commutative variable to bool.
(multiple_of_p): Change return type
from int to bool and adjust function body accordingly.
* optabs.h (expand_twoval_unop): Change return type from int to bool.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
* optabs.cc (add_equal_note): Ditto.
(widen_operand): Change no_extend argument from int to bool.
(expand_binop): Ditto.
(expand_twoval_unop): Change return type
from int to bool and adjust function body accordingly.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index ac90a594fcc..a02ede79fed 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -922,8 +922,8 @@ split_tree (tree in, tree type, enum tree_code code,
 {
   tree op0 = TREE_OPERAND (in, 0);
   tree op1 = TREE_OPERAND (in, 1);
-  int neg1_p = TREE_CODE (in) == MINUS_EXPR;
-  int neg_litp_p = 0, neg_conp_p = 0, neg_var_p = 0;
+  bool neg1_p = TREE_CODE (in) == MINUS_EXPR;
+  bool neg_litp_p = false, neg_conp_p = false, neg_var_p = false;
 
   /* First see if either of the operands is a literal, then a constant.  */
   if (TREE_CODE (op0) == INTEGER_CST || TREE_CODE (op0) == REAL_CST
@@ -1450,7 +1450,7 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
   FIXED_VALUE_TYPE f2;
   FIXED_VALUE_TYPE result;
   tree t, type;
-  int sat_p;
+  bool sat_p;
   bool overflow_p;
 
   /* The following codes are handled by fixed_arithmetic.  */
@@ -5680,7 +5680,7 @@ bool
 merge_ranges (int *pin_p, tree *plow, tree *phigh, int in0_p, tree low0,
  tree high0, int in1_p, tree low1, tree high1)
 {
-  int no_overlap;
+  bool no_overlap;
   int subset;
   int temp;
   tree tem;
@@ -6855,7 +6855,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
> GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (type)))
? wide_type : type);
   tree t1, t2;
-  int same_p = tcode == code;
+  bool same_p = tcode == code;
   tree op0 = NULL_TREE, op1 = NULL_TREE;
   bool sub_strict_overflow_p;
 
@@ -7467,17 +7467,17 @@ bool
 tree_swap_operands_p (const_tree arg0, const_tree arg1)
 {
   if (CONSTANT_CLASS_P (arg1))
-return 0;
+return false;
   if (CONSTANT_CLASS_P (arg0))
-return 1;
+return true;
 
   STRIP_NOPS (arg0);
   STRIP_NOPS (arg1);
 
   if (TREE_CONSTANT (arg1))
-return 0;
+return false;
   if (TREE_CONSTANT (arg0))
-return 1;
+return true;
 
   /* It is preferable to swap two SSA_NAME to ensure a canonical form
  for commutative and comparison operators.  Ensuring a canonical
@@ -7486,21 +7486,21 @@ tree_swap_operands_p (const_tree arg0, const_tree arg1)
   if (TREE_CODE (arg0) == SSA_NAME
   && TREE_CODE (arg1) == SSA_NAME
   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
-return 1;
+return true;
 
   /* Put SSA_NAMEs last.  */
   if (TREE_CODE (arg1) == SSA_NAME)
-return 0;
+return false;
   if (TREE_CODE (arg0) == SSA_NAME)
-return 1;
+return true;
 
   /* Put variables last.  */
   if (DECL_P (arg1))
-return 0;
+return false;
   if (DECL_P (arg0))
-return 1;
+return true;
 
-  return 0;
+  return false;
 }
 
 
@@ -9693,10 +9693,10 @@ fold_truth_andor (location_t loc, enum tree_code code, 
tree type,
   tree a01 = TREE_OPERAND (arg0, 1);
   tree a10 = TREE_OPERAND (arg1, 0);
   tree a11 = TREE_OPERAND (arg1, 1);
-  int commutative = ((TREE_CODE (arg0) == TRUTH_OR_EXPR
- || TREE_CODE (arg0) == TRUTH_AND_EXPR)
-&& (code == TRUTH_AND_EXPR
-|| code == TRUTH_OR_EXPR));
+  bool commutative = ((TREE_CODE (arg0) == TRUTH_OR_EXPR
+  || TREE_CODE (arg0) == TRUTH_AND_EXPR)
+ && (code == TRUTH_AND_EXPR
+ || code == TRUTH_OR_EXPR));
 
   if (operand_equal_p (a00, a10, 0))
return fold_build2_loc (loc, TREE_CODE (arg0), type, a00,
@@ -14012,8 +14012,8 @@ fold_binary_initializer_loc (location_t loc, tree_code 

[committed] libstdc++: Make std::random_device throw more std::system_error [PR105081]

2023-06-30 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

In r14-289-gf9412cedd6c0e7 I made the std::random_device constructor
throw std::system_error for unrecognized tokens. But it still throws
std::runtime_error for a token such as "rdseed" that is recognized but
not supported at runtime by the CPU the program is running on.

With this change we throw std::system_error for those cases too. This
fixes the following failures on Intel CPUs without rdseed support:

FAIL: 26_numerics/random/random_device/94087.cc execution test
FAIL: 26_numerics/random/random_device/cons/token.cc execution test
FAIL: 26_numerics/random/random_device/entropy.cc execution test
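
From the caller's side, the difference is which handler runs. The sketch below is only an illustration with a hypothetical helper name, not part of the patch or the testsuite. Since std::system_error derives from std::runtime_error, the more specific handler has to come first.

```cpp
#include <random>
#include <stdexcept>
#include <string>
#include <system_error>

enum class DeviceStatus { ok, system_error, runtime_error };

// Hypothetical helper: try to construct and use a random_device for
// TOKEN and report how that went instead of propagating the exception.
inline DeviceStatus
probe_random_device (const std::string& token)
{
  try
    {
      std::random_device rd (token);
      (void) rd ();
      return DeviceStatus::ok;
    }
  catch (const std::system_error&)
    {
      // Carries an error code describing why the device was unavailable.
      return DeviceStatus::system_error;
    }
  catch (const std::runtime_error&)
    {
      // Plain runtime_error with only a message.
      return DeviceStatus::runtime_error;
    }
}
```

What probe_random_device ("rdseed") returns depends on the CPU and the library version, so no particular result is claimed here.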

libstdc++-v3/ChangeLog:

PR libstdc++/105081
* src/c++11/random.cc (random_device::_M_init): Throw
std::system_error when the requested device is a valid token but
not available at runtime.
---
 libstdc++-v3/src/c++11/random.cc | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index 6ecdc7169ab..cece6edbfc7 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -373,6 +373,15 @@ namespace std _GLIBCXX_VISIBILITY(default)
  "(const std::string&):"
  " unsupported token"));
 
+#if defined ENOSYS
+const int unsupported = ENOSYS;
+#elif defined ENOTSUP
+const int unsupported = ENOTSUP;
+#else
+const int unsupported = 0;
+#endif
+int err = 0;
+
 #ifdef _GLIBCXX_USE_CRT_RAND_S
 if (which & rand_s)
 {
@@ -407,6 +416,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
  return;
}
}
+  err = unsupported;
 }
 #endif // USE_RDSEED
 
@@ -427,6 +437,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
  return;
}
}
+  err = unsupported;
 }
 #endif // USE_RDRAND
 
@@ -438,6 +449,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
_M_func = &__ppc_darn;
return;
  }
+   err = unsupported;
   }
 #endif // USE_DARN
 
@@ -458,6 +470,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
_M_func = &__libc_getentropy;
return;
  }
+   err = unsupported;
   }
 #endif // _GLIBCXX_HAVE_GETENTROPY
 
@@ -477,6 +490,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
   if (_M_file)
return;
 #endif // USE_POSIX_FILE_IO
+  err = errno;
 }
 #endif // _GLIBCXX_USE_DEV_RANDOM
 
@@ -493,9 +507,12 @@ namespace std _GLIBCXX_VISIBILITY(default)
 }
 #endif
 
-std::__throw_runtime_error(
-   __N("random_device::random_device(const std::string&):"
-   " device not available"));
+auto msg = __N("random_device::random_device(const std::string&):"
+  " device not available");
+if (err)
+  std::__throw_syserr(err, msg);
+else
+  std::__throw_runtime_error(msg);
 #endif // USE_MT19937
   }
 
-- 
2.41.0



[PATCH] libstdc++: Fix iostream init for Clang on darwin [PR110432]

2023-06-30 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Patrick, PTAL.

-- >8 --

The __has_attribute(init_priority) check in <iostream> is true for Clang
on darwin, which means that user code including <iostream> thinks the
library will initialize the global streams. However, when libstdc++ is
built by GCC on darwin, the __has_attribute(init_priority) check is
false, which means that the library thinks that user code will do the
initialization when <iostream> is included. This means that the
initialization is never done.

Add an autoconf check so that the header and the library both make their
decision based on the static properties of GCC at build time, with a
consistent outcome.

As a belt and braces check, also do the initialization in <iostream> if
the compiler including that header doesn't support the attribute (even
if the library also contains the initialization). This might result in
redundant initialization done in <iostream>, but ensures the
initialization happens somewhere if there's any doubt about the
attribute working correctly due to missing linker support.
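
The two decisions can be modelled as a small truth table. This is a sketch with hypothetical names: the booleans stand for what __has_attribute(init_priority) reports for the compiler including the header and for the compiler that built the library.

```cpp
// Who ends up constructing ios_base::Init.
struct Outcome
{
  bool header_inits;
  bool library_inits;
};

// Before: each side consults __has_attribute for its own compiler.
constexpr Outcome
before_patch (bool user_has_attr, bool build_has_attr)
{
  return { !user_has_attr, build_has_attr };
}

// After: the library follows the configure-time macro, and the header
// initializes unless both the macro and the user's compiler agree.
constexpr Outcome
after_patch (bool user_has_attr, bool build_has_attr)
{
  return { !(build_has_attr && user_has_attr), build_has_attr };
}

constexpr bool
initialized (Outcome o)
{
  return o.header_inits || o.library_inits;
}

// Clang user code against a GCC-built library on darwin.
static_assert (!initialized (before_patch (true, false)),
	       "old scheme: nobody initializes the streams");
static_assert (initialized (after_patch (true, false)),
	       "new scheme: the header falls back to initializing");
```

In the new scheme every combination initializes at least once, and the (false, true) case initializes in both places, which is the redundancy mentioned above.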

libstdc++-v3/ChangeLog:

PR libstdc++/110432
* acinclude.m4 (GLIBCXX_CHECK_INIT_PRIORITY): New.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_INIT_PRIORITY.
* include/std/iostream:
* src/c++98/ios_base_init.h: Use new autoconf macro instead of
__has_attribute.
---
 libstdc++-v3/acinclude.m4  | 27 ++
 libstdc++-v3/config.h.in   |  3 ++
 libstdc++-v3/configure | 51 ++
 libstdc++-v3/configure.ac  |  3 ++
 libstdc++-v3/include/std/iostream  |  2 +-
 libstdc++-v3/src/c++98/ios_base_init.h |  2 +-
 6 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 277ae10e031..823832f97d4 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5680,6 +5680,33 @@ AC_DEFUN([GLIBCXX_CHECK_ALIGNAS_CACHELINE], [
   AC_LANG_RESTORE
 ])
 
+dnl
+dnl Check whether iostream initialization should be done in the library,
+dnl using the init_priority attribute.
+dnl
+dnl Defines:
+dnl  _GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE if GCC supports the init_priority
+dnlattribute for the target.
+dnl
+AC_DEFUN([GLIBCXX_CHECK_INIT_PRIORITY], [
+AC_LANG_SAVE
+  AC_LANG_CPLUSPLUS
+
+  AC_MSG_CHECKING([whether init_priority attribute is supported])
+  AC_TRY_COMPILE(, [
+  #if ! __has_attribute(init_priority)
+  #error init_priority not supported
+  #endif
+], [ac_init_priority=yes], [ac_init_priority=no])
+  if test "$ac_init_priority" = yes; then
+AC_DEFINE_UNQUOTED(_GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE, 1,
+  [Define if init_priority should be used for iostream initialization.])
+  fi
+  AC_MSG_RESULT($ac_init_priority)
+
+  AC_LANG_RESTORE
+])
+
 # Macros from the top-level gcc directory.
 m4_include([../config/gc++filt.m4])
 m4_include([../config/tls.m4])
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 9770c178767..fc0f2522027 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -550,6 +550,9 @@ GLIBCXX_ZONEINFO_DIR
 # For src/c++11/shared_ptr.cc alignment.
 GLIBCXX_CHECK_ALIGNAS_CACHELINE
 
+# For using init_priority in ios_init.cc
+GLIBCXX_CHECK_INIT_PRIORITY
+
 # Define documentation rules conditionally.
 
 # See if makeinfo has been installed and is modern enough
diff --git a/libstdc++-v3/include/std/iostream 
b/libstdc++-v3/include/std/iostream
index cfd124dcf43..ec337cf89dd 100644
--- a/libstdc++-v3/include/std/iostream
+++ b/libstdc++-v3/include/std/iostream
@@ -75,7 +75,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // For construction of filebuffers for cout, cin, cerr, clog et. al.
   // When the init_priority attribute is usable, we do this initialization
   // in the compiled library instead (src/c++98/globals_io.cc).
-#if !__has_attribute(__init_priority__)
+#if !(_GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE && __has_attribute(init_priority))
   static ios_base::Init __ioinit;
 #elif defined(_GLIBCXX_SYMVER_GNU)
   __extension__ __asm (".globl _ZSt21ios_base_library_initv");
diff --git a/libstdc++-v3/src/c++98/ios_base_init.h 
b/libstdc++-v3/src/c++98/ios_base_init.h
index b600ec3298e..f7edfc84625 100644
--- a/libstdc++-v3/src/c++98/ios_base_init.h
+++ b/libstdc++-v3/src/c++98/ios_base_init.h
@@ -8,6 +8,6 @@
 // constructor when statically linking with libstdc++.a), instead of
 // doing so in (each TU that includes) <iostream>.
 // This needs to be done in the same TU that defines the stream objects.
-#if __has_attribute(init_priority)
+#if _GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE
 static ios_base::Init __ioinit __attribute__((init_priority(90)));
 #endif
-- 
2.41.0



Re: [PATCH] libstdc++: Fix iostream init for Clang on darwin [PR110432]

2023-06-30 Thread Patrick Palka via Gcc-patches
On Fri, 30 Jun 2023, Jonathan Wakely wrote:

> Tested x86_64-linux. Patrick, PTAL.
> 
> -- >8 --
> 
> The __has_attribute(init_priority) check in <iostream> is true for Clang
> on darwin, which means that user code including <iostream> thinks the
> library will initialize the global streams. However, when libstdc++ is
> built by GCC on darwin, the __has_attribute(init_priority) check is
> false, which means that the library thinks that user code will do the
> initialization when <iostream> is included. This means that the
> initialization is never done.
> 
> Add an autoconf check so that the header and the library both make their
> decision based on the static properties of GCC at build time, with a
> consistent outcome.
> 
> As a belt and braces check, also do the initialization in <iostream> if
> the compiler including that header doesn't support the attribute (even
> if the library also contains the initialization). This might result in
> redundant initialization done in <iostream>, but ensures the
> initialization happens somewhere if there's any doubt about the
> attribute working correctly due to missing linker support.
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/110432
>   * acinclude.m4 (GLIBCXX_CHECK_INIT_PRIORITY): New.
>   * config.h.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Use GLIBCXX_CHECK_INIT_PRIORITY.
>   * include/std/iostream:

Missing ChangeLog entry?

>   * src/c++98/ios_base_init.h: Use new autoconf macro instead of
>   __has_attribute.
> ---
>  libstdc++-v3/acinclude.m4  | 27 ++
>  libstdc++-v3/config.h.in   |  3 ++
>  libstdc++-v3/configure | 51 ++
>  libstdc++-v3/configure.ac  |  3 ++
>  libstdc++-v3/include/std/iostream  |  2 +-
>  libstdc++-v3/src/c++98/ios_base_init.h |  2 +-
>  6 files changed, 86 insertions(+), 2 deletions(-)
> 
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index 277ae10e031..823832f97d4 100644
> --- a/libstdc++-v3/acinclude.m4
> +++ b/libstdc++-v3/acinclude.m4
> @@ -5680,6 +5680,33 @@ AC_DEFUN([GLIBCXX_CHECK_ALIGNAS_CACHELINE], [
>AC_LANG_RESTORE
>  ])
>  
> +dnl
> +dnl Check whether iostream initialization should be done in the library,
> +dnl using the init_priority attribute.
> +dnl
> +dnl Defines:
> +dnl  _GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE if GCC supports the init_priority
> +dnlattribute for the target.
> +dnl
> +AC_DEFUN([GLIBCXX_CHECK_INIT_PRIORITY], [
> +AC_LANG_SAVE
> +  AC_LANG_CPLUSPLUS
> +
> +  AC_MSG_CHECKING([whether init_priority attribute is supported])
> +  AC_TRY_COMPILE(, [
> +  #if ! __has_attribute(init_priority)
> +  #error init_priority not supported
> +  #endif
> +  ], [ac_init_priority=yes], [ac_init_priority=no])
> +  if test "$ac_init_priority" = yes; then
> +AC_DEFINE_UNQUOTED(_GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE, 1,
> +  [Define if init_priority should be used for iostream initialization.])
> +  fi
> +  AC_MSG_RESULT($ac_init_priority)
> +
> +  AC_LANG_RESTORE
> +])
> +
>  # Macros from the top-level gcc directory.
>  m4_include([../config/gc++filt.m4])
>  m4_include([../config/tls.m4])
> diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
> index 9770c178767..fc0f2522027 100644
> --- a/libstdc++-v3/configure.ac
> +++ b/libstdc++-v3/configure.ac
> @@ -550,6 +550,9 @@ GLIBCXX_ZONEINFO_DIR
>  # For src/c++11/shared_ptr.cc alignment.
>  GLIBCXX_CHECK_ALIGNAS_CACHELINE
>  
> +# For using init_priority in ios_init.cc
> +GLIBCXX_CHECK_INIT_PRIORITY
> +
>  # Define documentation rules conditionally.
>  
>  # See if makeinfo has been installed and is modern enough
> diff --git a/libstdc++-v3/include/std/iostream 
> b/libstdc++-v3/include/std/iostream
> index cfd124dcf43..ec337cf89dd 100644
> --- a/libstdc++-v3/include/std/iostream
> +++ b/libstdc++-v3/include/std/iostream
> @@ -75,7 +75,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// For construction of filebuffers for cout, cin, cerr, clog et. al.
>// When the init_priority attribute is usable, we do this initialization
>// in the compiled library instead (src/c++98/globals_io.cc).
> -#if !__has_attribute(__init_priority__)
> +#if !(_GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE && __has_attribute(init_priority))

This should check __init_priority__ since init_priority is a non-reserved
name I think?  LGTM otherwise.

>static ios_base::Init __ioinit;
>  #elif defined(_GLIBCXX_SYMVER_GNU)
>__extension__ __asm (".globl _ZSt21ios_base_library_initv");
> diff --git a/libstdc++-v3/src/c++98/ios_base_init.h 
> b/libstdc++-v3/src/c++98/ios_base_init.h
> index b600ec3298e..f7edfc84625 100644
> --- a/libstdc++-v3/src/c++98/ios_base_init.h
> +++ b/libstdc++-v3/src/c++98/ios_base_init.h
> @@ -8,6 +8,6 @@
>  // constructor when statically linking with libstdc++.a), instead of
>  // doing so in (each TU that includes) <iostream>.
>  // This needs to be done in the same TU that defines the stream objects.
> -#if __has_attribute(init_pri

[committed] libstdc++: Fix unused warning for new variable

2023-06-30 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux and x86_64-linux. Pushed to trunk.

-- >8 --

This newly-introduced variable isn't used on all paths, so add the
[[maybe_unused]] attribute.

libstdc++-v3/ChangeLog:

* src/c++11/random.cc (random_device::_M_init): Add maybe_unused
attribute.
---
 libstdc++-v3/src/c++11/random.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index cece6edbfc7..75989bd3337 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -374,11 +374,11 @@ namespace std _GLIBCXX_VISIBILITY(default)
  " unsupported token"));
 
 #if defined ENOSYS
-const int unsupported = ENOSYS;
+[[maybe_unused]] const int unsupported = ENOSYS;
 #elif defined ENOTSUP
-const int unsupported = ENOTSUP;
+[[maybe_unused]] const int unsupported = ENOTSUP;
 #else
-const int unsupported = 0;
+[[maybe_unused]] const int unsupported = 0;
 #endif
 int err = 0;
 
-- 
2.41.0



Re: [PATCH] libstdc++: Fix iostream init for Clang on darwin [PR110432]

2023-06-30 Thread Jonathan Wakely via Gcc-patches
On Fri, 30 Jun 2023 at 15:29, Patrick Palka  wrote:
>
> On Fri, 30 Jun 2023, Jonathan Wakely wrote:
>
> > Tested x86_64-linux. Patrick, PTAL.
> >
> > -- >8 --
> >
> > The __has_attribute(init_priority) check in <iostream> is true for Clang
> > on darwin, which means that user code including <iostream> thinks the
> > library will initialize the global streams. However, when libstdc++ is
> > built by GCC on darwin, the __has_attribute(init_priority) check is
> > false, which means that the library thinks that user code will do the
> > initialization when <iostream> is included. This means that the
> > initialization is never done.
> >
> > Add an autoconf check so that the header and the library both make their
> > decision based on the static properties of GCC at build time, with a
> > consistent outcome.
> >
> > As a belt and braces check, also do the initialization in <iostream> if
> > the compiler including that header doesn't support the attribute (even
> > if the library also contains the initialization). This might result in
> > redundant initialization done in <iostream>, but ensures the
> > initialization happens somewhere if there's any doubt about the
> > attribute working correctly due to missing linker support.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/110432
> >   * acinclude.m4 (GLIBCXX_CHECK_INIT_PRIORITY): New.
> >   * config.h.in: Regenerate.
> >   * configure: Regenerate.
> >   * configure.ac: Use GLIBCXX_CHECK_INIT_PRIORITY.
> >   * include/std/iostream:
>
> Missing ChangeLog entry?
>
> >   * src/c++98/ios_base_init.h: Use new autoconf macro instead of
> >   __has_attribute.
> > ---
> >  libstdc++-v3/acinclude.m4  | 27 ++
> >  libstdc++-v3/config.h.in   |  3 ++
> >  libstdc++-v3/configure | 51 ++
> >  libstdc++-v3/configure.ac  |  3 ++
> >  libstdc++-v3/include/std/iostream  |  2 +-
> >  libstdc++-v3/src/c++98/ios_base_init.h |  2 +-
> >  6 files changed, 86 insertions(+), 2 deletions(-)
> >
> > diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> > index 277ae10e031..823832f97d4 100644
> > --- a/libstdc++-v3/acinclude.m4
> > +++ b/libstdc++-v3/acinclude.m4
> > @@ -5680,6 +5680,33 @@ AC_DEFUN([GLIBCXX_CHECK_ALIGNAS_CACHELINE], [
> >AC_LANG_RESTORE
> >  ])
> >
> > +dnl
> > +dnl Check whether iostream initialization should be done in the library,
> > +dnl using the init_priority attribute.
> > +dnl
> > +dnl Defines:
> > +dnl  _GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE if GCC supports the init_priority
> > +dnlattribute for the target.
> > +dnl
> > +AC_DEFUN([GLIBCXX_CHECK_INIT_PRIORITY], [
> > +AC_LANG_SAVE
> > +  AC_LANG_CPLUSPLUS
> > +
> > +  AC_MSG_CHECKING([whether init_priority attribute is supported])
> > +  AC_TRY_COMPILE(, [
> > +  #if ! __has_attribute(init_priority)
> > +  #error init_priority not supported
> > +  #endif
> > +  ], [ac_init_priority=yes], [ac_init_priority=no])
> > +  if test "$ac_init_priority" = yes; then
> > +AC_DEFINE_UNQUOTED(_GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE, 1,
> > > +  [Define if init_priority should be used for iostream initialization.])
> > +  fi
> > +  AC_MSG_RESULT($ac_init_priority)
> > +
> > +  AC_LANG_RESTORE
> > +])
> > +
> >  # Macros from the top-level gcc directory.
> >  m4_include([../config/gc++filt.m4])
> >  m4_include([../config/tls.m4])
> > diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
> > index 9770c178767..fc0f2522027 100644
> > --- a/libstdc++-v3/configure.ac
> > +++ b/libstdc++-v3/configure.ac
> > @@ -550,6 +550,9 @@ GLIBCXX_ZONEINFO_DIR
> >  # For src/c++11/shared_ptr.cc alignment.
> >  GLIBCXX_CHECK_ALIGNAS_CACHELINE
> >
> > +# For using init_priority in ios_init.cc
> > +GLIBCXX_CHECK_INIT_PRIORITY
> > +
> >  # Define documentation rules conditionally.
> >
> >  # See if makeinfo has been installed and is modern enough
> > > diff --git a/libstdc++-v3/include/std/iostream b/libstdc++-v3/include/std/iostream
> > index cfd124dcf43..ec337cf89dd 100644
> > --- a/libstdc++-v3/include/std/iostream
> > +++ b/libstdc++-v3/include/std/iostream
> > @@ -75,7 +75,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >// For construction of filebuffers for cout, cin, cerr, clog et. al.
> >// When the init_priority attribute is usable, we do this initialization
> >// in the compiled library instead (src/c++98/globals_io.cc).
> > -#if !__has_attribute(__init_priority__)
> > > +#if !(_GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE && __has_attribute(init_priority))
>
> This should check __init_priority__ since init_priority is a non-reserved
> name I think?  LGTM otherwise.

Thanks, fixed and pushed to trunk. Backport to gcc-13 to follow.
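
For illustration only, a minimal sketch of the guard logic discussed above. The helper demo_header_must_init is hypothetical and is not libstdc++ code; it just models the preprocessor condition as a function of two inputs, namely what the library build recorded at configure time and what the compiler including the header reports for __has_attribute(__init_priority__):

```cpp
// Hypothetical model (not libstdc++ code) of the condition from the patch:
//   #if !(_GLIBCXX_USE_INIT_PRIORITY_ATTRIBUTE
//         && __has_attribute(__init_priority__))
// configured:    result recorded when the library was built.
// compiler_attr: result seen by the compiler that includes <iostream>.
constexpr bool demo_header_must_init (bool configured, bool compiler_attr)
{
  return !(configured && compiler_attr);
}

// The header skips its own initialization only when both sides agree that
// the attribute is usable; in every other case it initializes the streams.
static_assert (!demo_header_must_init (true, true),
               "library does the initialization");
static_assert (demo_header_must_init (true, false),
               "header initializes too (possibly redundant)");
static_assert (demo_header_must_init (false, true),
               "header initializes (GCC-built library, Clang user on darwin)");
static_assert (demo_header_must_init (false, false),
               "header initializes");
```

Under this model the mismatch described in the PR (library says no, including compiler says yes) can no longer leave the streams uninitialized.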



Re: [PATCH 3/3] analyzer: add text-art visualizations of out-of-bounds accesses [PR106626]

2023-06-30 Thread Martin Jambor
Hi David,

On Wed, May 31 2023, David Malcolm via Gcc-patches wrote:
> This patch extends -Wanalyzer-out-of-bounds so that, where possible, it
> will emit a text art diagram visualizing the spatial relationship between

[...]


>
> gcc/ChangeLog:
>   PR analyzer/106626
>   * Makefile.in (ANALYZER_OBJS): Add analyzer/access-diagram.o.
>   * doc/invoke.texi (Wanalyzer-out-of-bounds): Add description of
>   text art.
>   (fanalyzer-debug-text-art): New.
>
> gcc/analyzer/ChangeLog:
>   PR analyzer/106626
>   * access-diagram.cc: New file.
>   * access-diagram.h: New file.
>   * analyzer.h (class region_offset): Add default ctor.
>   (region_offset::make_byte_offset): New decl.
>   (region_offset::concrete_p): New.
>   (region_offset::get_concrete_byte_offset): New.
>   (region_offset::calc_symbolic_bit_offset): New decl.
>   (region_offset::calc_symbolic_byte_offset): New decl.
>   (region_offset::dump_to_pp): New decl.
>   (region_offset::dump): New decl.
>   (operator<, operator<=, operator>, operator>=): New decls for
>   region_offset.
>   * analyzer.opt
>   (-param=analyzer-text-art-string-ellipsis-threshold=): New.
>   (-param=analyzer-text-art-string-ellipsis-head-len=): New.
>   (-param=analyzer-text-art-string-ellipsis-tail-len=): New.
>   (-param=analyzer-text-art-ideal-canvas-width=): New.

contrib/check-params-in-docs.py now complains that:

  $ ./gcc/xgcc -Bgcc --help=param &>/tmp/params.txt
  $ ../src/contrib/check-params-in-docs.py ../src/gcc/doc/invoke.texi /tmp/params.txt
  Missing:
  @item analyzer-text-art-string-ellipsis-threshold
  The number of bytes at which to ellipsize string literals in

  @item analyzer-text-art-string-ellipsis-head-len
  The number of literal bytes to show at the head of a string

  @item analyzer-text-art-string-ellipsis-tail-len
  The number of literal bytes to show at the tail of a string

  @item analyzer-text-art-ideal-canvas-width
  The ideal width in characters of text art diagrams generated by the

Can you please add the respective documentation entries?

Thanks!

Martin


Re: [PATCH v2 1/3] c++: Track lifetimes in constant evaluation [PR70331, PR96630, PR98675]

2023-06-30 Thread Nathaniel Shead via Gcc-patches
On Mon, Jun 26, 2023 at 03:37:32PM -0400, Patrick Palka wrote:
> On Sun, 25 Jun 2023, Nathaniel Shead wrote:
> 
> > On Fri, Jun 23, 2023 at 12:43:21PM -0400, Patrick Palka wrote:
> > > On Wed, 29 Mar 2023, Nathaniel Shead via Gcc-patches wrote:
> > > 
> > > > This adds rudimentary lifetime tracking in C++ constexpr contexts,
> > > > allowing the compiler to report errors with using values after their
> > > > backing has gone out of scope. We don't yet handle other ways of ending
> > > > lifetimes (e.g. explicit destructor calls).
> > > 
> > > Awesome!
> > > 
> > > > 
> > > > PR c++/96630
> > > > PR c++/98675
> > > > PR c++/70331
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * constexpr.cc (constexpr_global_ctx::put_value): Mark value as
> > > > in lifetime.
> > > > (constexpr_global_ctx::remove_value): Mark value as expired.
> > > > (cxx_eval_call_expression): Remove comment that is no longer
> > > > applicable.
> > > > (non_const_var_error): Add check for expired values.
> > > > (cxx_eval_constant_expression): Add checks for expired values.
> > > > Forget local variables at end of bind expressions. Forget temporaries
> > > > at end of cleanup points.
> > > > * cp-tree.h (struct lang_decl_base): New flag to track expired values
> > > > in constant evaluation.
> > > > (DECL_EXPIRED_P): Access the new flag.
> > > > (SET_DECL_EXPIRED_P): Modify the new flag.
> > > > * module.cc (trees_out::lang_decl_bools): Write out the new flag.
> > > > (trees_in::lang_decl_bools): Read in the new flag.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp0x/constexpr-ice20.C: Update error raised by test.
> > > > * g++.dg/cpp1y/constexpr-lifetime1.C: New test.
> > > > * g++.dg/cpp1y/constexpr-lifetime2.C: New test.
> > > > * g++.dg/cpp1y/constexpr-lifetime3.C: New test.
> > > > * g++.dg/cpp1y/constexpr-lifetime4.C: New test.
> > > > * g++.dg/cpp1y/constexpr-lifetime5.C: New test.
> > > > 
> > > > Signed-off-by: Nathaniel Shead 
> > > > ---
> > > >  gcc/cp/constexpr.cc   | 69 +++
> > > >  gcc/cp/cp-tree.h  | 10 ++-
> > > >  gcc/cp/module.cc  |  2 +
> > > >  gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  2 +-
> > > >  .../g++.dg/cpp1y/constexpr-lifetime1.C| 13 
> > > >  .../g++.dg/cpp1y/constexpr-lifetime2.C| 20 ++
> > > >  .../g++.dg/cpp1y/constexpr-lifetime3.C| 13 
> > > >  .../g++.dg/cpp1y/constexpr-lifetime4.C| 11 +++
> > > >  .../g++.dg/cpp1y/constexpr-lifetime5.C| 11 +++
> > > >  9 files changed, 137 insertions(+), 14 deletions(-)
> > > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
> > > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
> > > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
> > > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
> > > >  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
> > > > 
> > > > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > > > index 3de60cfd0f8..bdbc12144a7 100644
> > > > --- a/gcc/cp/constexpr.cc
> > > > +++ b/gcc/cp/constexpr.cc
> > > > @@ -1185,10 +1185,22 @@ public:
> > > >void put_value (tree t, tree v)
> > > >{
> > > >  bool already_in_map = values.put (t, v);
> > > > +if (!already_in_map && DECL_P (t))
> > > > +  {
> > > > +   if (!DECL_LANG_SPECIFIC (t))
> > > > + retrofit_lang_decl (t);
> > > > +   if (DECL_LANG_SPECIFIC (t))
> > > > + SET_DECL_EXPIRED_P (t, false);
> > > > +  }
> > > 
> > > Since this new flag would be used only during constexpr evaluation,
> > > could we instead use an on-the-side hash_set in constexpr_global_ctx for
> > > tracking expired-ness?  That way we won't have to allocate a
> > > DECL_LANG_SPECIFIC structure for decls that lack it, and won't have to
> > > worry about the flag in other parts of the compiler.
> > 
> > I've tried this but I haven't been able to get it to work well. The main
> > issue I'm running into is the caching of function calls in constant
> > evaluation. For example, consider the following:
> > 
> > constexpr const double& test() {
> >   const double& local = 3.0;
> >   return local;
> > }
> > 
> > constexpr int foo(const double&) { return 5; }
> > 
> > constexpr int a = foo(test());
> > static_assert(test() == 3.0);
> > 
> > When constant-evaluating 'a', we evaluate 'test()'. It returns a value
> > that ends its lifetime immediately, so we mark this in 'ctx->global' as
> > expired. However, 'foo()' never actually evaluates this expired value,
> > so the initialisation of 'a' succeeds.
> > 
> > However, then when the static assert

[PATCH v2 3/3] c++: Improve location information in constexpr evaluation

2023-06-30 Thread Nathaniel Shead via Gcc-patches
On Fri, Jun 23, 2023 at 01:09:14PM -0400, Patrick Palka wrote:
> On Wed, 29 Mar 2023, Nathaniel Shead via Gcc-patches wrote:
> 
> > This patch caches the current expression's location information in the
> > constexpr_global_ctx struct, which allows subexpressions that have lost
> > location information to still provide accurate diagnostics. Also
> > rewrites a number of 'error' calls as 'error_at' to provide more
> > specific location information.
> > 
> > The primary effect of this change is that many errors within evaluation
> > of a constexpr function will now point at the offending expression (with
> > expansion tracing information) rather than just the outermost call.
> 
> This seems like a great improvement!
> 
> In other parts of the frontend, e.g. during substitution from
> tsubst_expr or tsubst_copy_and_build, we do something similar by
> setting/restoring input_location directly.  (We've since added the RAII
> class iloc_sentinel for this.)  I wonder if that'd be preferable here?

I didn't consider that; I've given it a try and I think it's nicer.
Doing it this way also updated a number of 'error' calls that I hadn't
fixed up in this version; generally this meant nicer error messages, but
I had to override it for a couple of cases where I felt the errors it
raised were worse (by adding context that made no sense).

I'm still bootstrapping/regtesting but I'll send out an updated version
of this sometime tomorrow when it's done. Thanks!
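
As a rough illustration of the set/restore approach mentioned above, here is a minimal RAII sketch. All names (demo_location_t, demo_input_location, demo_loc_sentinel, demo_eval_at) are hypothetical stand-ins; GCC's real location_t, input_location and iloc_sentinel are not reproduced here and may differ in detail:

```cpp
#include <cassert>

// Hypothetical stand-ins for the compiler's global "current location".
typedef unsigned demo_location_t;
demo_location_t demo_input_location = 0;

// Sets the global location on construction and restores the previous
// value when the enclosing scope is left, including on early return.
class demo_loc_sentinel
{
public:
  explicit demo_loc_sentinel (demo_location_t loc)
    : m_saved (demo_input_location)
  {
    demo_input_location = loc;
  }
  ~demo_loc_sentinel () { demo_input_location = m_saved; }

  demo_loc_sentinel (const demo_loc_sentinel &) = delete;
  demo_loc_sentinel &operator= (const demo_loc_sentinel &) = delete;

private:
  demo_location_t m_saved;
};

// Models evaluating one subexpression at LOC: any diagnostic issued inside
// would see LOC, and the caller's location is back in place afterwards.
demo_location_t demo_eval_at (demo_location_t loc)
{
  demo_loc_sentinel s (loc);
  return demo_input_location;
}
```

The appeal of this shape is that plain 'error' calls pick up the right location without each call site being rewritten, which matches the effect described in the reply.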

> > 
> > gcc/cp/ChangeLog:
> > 
> > * constexpr.cc (constexpr_global_ctx): New field for cached
> > tree location, defaulting to input_location.
> > (cxx_eval_internal_function): Fall back to ctx->global->loc
> > rather than input_location.
> > (modifying_const_object_error): Likewise.
> > (cxx_eval_dynamic_cast_fn): Likewise.
> > (eval_and_check_array_index): Likewise.
> > (cxx_eval_array_reference): Likewise.
> > (cxx_eval_bit_field_ref): Likewise.
> > (cxx_eval_component_reference): Likewise.
> > (cxx_eval_indirect_ref): Likewise.
> > (cxx_eval_store_expression): Likewise.
> > (cxx_eval_increment_expression): Likewise.
> > (cxx_eval_loop_expr): Likewise.
> > (cxx_eval_binary_expression): Likewise.
> > (cxx_eval_constant_expression): Cache location of trees for use
> > in errors, and prefer it instead of input_location.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/constexpr-48089.C: Updated diagnostic locations.
> > * g++.dg/cpp0x/constexpr-diag3.C: Likewise.
> > * g++.dg/cpp0x/constexpr-ice20.C: Likewise.
> > * g++.dg/cpp1y/constexpr-89481.C: Likewise.
> > * g++.dg/cpp1y/constexpr-lifetime1.C: Likewise.
> > * g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
> > * g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
> > * g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
> > * g++.dg/cpp1y/constexpr-lifetime5.C: Likewise.
> > * g++.dg/cpp1y/constexpr-union5.C: Likewise.
> > * g++.dg/cpp1y/pr68180.C: Likewise.
> > * g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
> > * g++.dg/cpp2a/bit-cast11.C: Likewise.
> > * g++.dg/cpp2a/bit-cast12.C: Likewise.
> > * g++.dg/cpp2a/bit-cast14.C: Likewise.
> > * g++.dg/cpp2a/constexpr-98122.C: Likewise.
> > * g++.dg/cpp2a/constexpr-dynamic17.C: Likewise.
> > * g++.dg/cpp2a/constexpr-init1.C: Likewise.
> > * g++.dg/cpp2a/constexpr-new12.C: Likewise.
> > * g++.dg/cpp2a/constexpr-new3.C: Likewise.
> > * g++.dg/ext/constexpr-vla2.C: Likewise.
> > * g++.dg/ext/constexpr-vla3.C: Likewise.
> > * g++.dg/ubsan/pr63956.C: Likewise.
> > 
> > libstdc++/ChangeLog:
> > 
> > * testsuite/25_algorithms/equal/constexpr_neg.cc: Updated
> > diagnostics locations.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >  gcc/cp/constexpr.cc   | 83 +++
> >  gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  | 10 +--
> >  gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |  2 +-
> >  gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  4 +-
> >  gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |  3 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime1.C|  1 +
> >  .../g++.dg/cpp1y/constexpr-lifetime2.C|  4 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime3.C|  4 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime4.C|  2 +-
> >  .../g++.dg/cpp1y/constexpr-lifetime5.C|  4 +-
> >  gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |  4 +-
> >  gcc/testsuite/g++.dg/cpp1y/pr68180.C  |  4 +-
> >  .../g++.dg/cpp1z/constexpr-lambda6.C  |  4 +-
> >  gcc/testsuite/g++.dg/cpp2a/bit-cast11.C   | 10 +--
> >  gcc/testsuite/g++.dg/cpp2a/bit-cast12.C   | 10 +--
> >  gcc/testsuite/g++.dg/cpp2a/bit-cast14.C   | 14 ++--
> >  gcc/testsuite/g++.dg/cpp2a/constexpr-98122.C  |  4 +-
> >  .../g++.dg/cpp2a/constexpr-dynamic17.C|  5 +-
> >  gcc/testsuite/g++.dg/cpp2a/constexpr-init1.C  |  5 +-
> >  gcc/testsuite/g++.dg/cpp2a/constexpr-new12.C  |  6 

[pushed 1/2] jit: avoid using __vector in testcase [PR110466]

2023-06-30 Thread David Malcolm via Gcc-patches
r13-4531-gd2e782cb99c311 added test coverage to libgccjit's vector
support, but used __vector, which doesn't work on Power.  Additionally
the size param to gcc_jit_type_get_vector was wrong.

Fixed thusly.

Successfully regtested on x86_64-pc-linux-gnu.
Verified fix on powerpc64le-unknown-linux-gnu (gcc112 in Compile Farm).
Pushed to trunk as r14--g6735d660839533.

gcc/testsuite/ChangeLog:
PR jit/110466
* jit.dg/test-expressions.c (run_test_of_comparison): Fix size
param to gcc_jit_type_get_vector.
(verify_comparisons): Use a typedef rather than __vector.

Co-authored-by: Marek Polacek 
---
 gcc/testsuite/jit.dg/test-expressions.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/jit.dg/test-expressions.c b/gcc/testsuite/jit.dg/test-expressions.c
index 13b3baf79ea..2337b01907e 100644
--- a/gcc/testsuite/jit.dg/test-expressions.c
+++ b/gcc/testsuite/jit.dg/test-expressions.c
@@ -417,7 +417,7 @@ static void run_test_of_comparison(gcc_jit_context *ctxt,
 const char *expected)
 {
   gcc_jit_type *vec_type =
-gcc_jit_type_get_vector (type, 4);
+gcc_jit_type_get_vector (type, 2);
 
   CHECK_STRING_VALUE (
 make_test_of_comparison (ctxt,
@@ -560,17 +560,17 @@ verify_comparisons (gcc_jit_result *result)
   CHECK_VALUE (test_COMPARISON_GE_on_int (1, 2), 0);
   CHECK_VALUE (test_COMPARISON_GE_on_int (2, 1), 1);
 
-  typedef int __vector __attribute__ ((__vector_size__ (sizeof(int) * 2)));
-  typedef __vector (*test_vec_fn) (__vector, __vector);
+  typedef int v2si __attribute__ ((__vector_size__ (sizeof(int) * 2)));
+  typedef v2si (*test_vec_fn) (v2si, v2si);
 
-  __vector zero_zero = {0, 0};
-  __vector zero_one = {0, 1};
-  __vector one_zero = {1, 0};
+  v2si zero_zero = {0, 0};
+  v2si zero_one = {0, 1};
+  v2si one_zero = {1, 0};
 
-  __vector true_true = {-1, -1};
-  __vector false_true = {0, -1};
-  __vector true_false = {-1, 0};
-  __vector false_false = {0, 0};
+  v2si true_true = {-1, -1};
+  v2si false_true = {0, -1};
+  v2si true_false = {-1, 0};
+  v2si false_false = {0, 0};
 
   test_vec_fn test_COMPARISON_EQ_on_vec_int =
 (test_vec_fn)gcc_jit_result_get_code (result,
@@ -615,7 +615,7 @@ verify_comparisons (gcc_jit_result *result)
   CHECK_VECTOR_VALUE (2, test_COMPARISON_GE_on_vec_int (zero_one, one_zero), 
false_true);
 
   typedef float __vector_f __attribute__ ((__vector_size__ (sizeof(float) * 
2)));
-  typedef __vector (*test_vec_f_fn) (__vector_f, __vector_f);
+  typedef v2si (*test_vec_f_fn) (__vector_f, __vector_f);
 
   __vector_f zero_zero_f = {0, 0};
   __vector_f zero_one_f = {0, 1};
-- 
2.26.3



[pushed 2/2] jit.exp: handle dwarf version mismatch in jit-check-debug-info [PR110466]

2023-06-30 Thread David Malcolm via Gcc-patches
Successfully regtested on x86_64-pc-linux-gnu.
Verified fix on powerpc64le-unknown-linux-gnu (gcc112 in Compile Farm).
Pushed to trunk as r14-2223-gc3c0ba5436170e.

gcc/testsuite/ChangeLog:
PR jit/110466
* jit.dg/jit.exp (jit-check-debug-info): Gracefully handle too
early versions of gdb that don't support our dwarf version, via
"unsupported".
---
 gcc/testsuite/jit.dg/jit.exp | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
index 3568dbb9d63..8bf7e51c24f 100644
--- a/gcc/testsuite/jit.dg/jit.exp
+++ b/gcc/testsuite/jit.dg/jit.exp
@@ -440,6 +440,10 @@ proc jit-check-debug-info { obj_file cmds match } {
 send $cmd
 }
 expect {
+   -re "Dwarf Error: wrong version in compilation unit header" {
+   set testcase [testname-for-summary]
+   unsupported "$testcase: gdb does not support dwarf version"
+   }
 -re $match { pass OK }
 default { fail FAIL }
 }
-- 
2.26.3



Re: Adjust LTO mode tables for "Machine_Mode: Extend machine_mode from 8 to 16 bits" (was: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits)

2023-06-30 Thread Thomas Schwinge
Hi!

On 2023-06-30T20:45:38+0800, Kito Cheng  wrote:
>> On 2023-05-13T16:44:41+0800, Kito Cheng via Gcc-patches 
>>  wrote:
>> > Tried this patch and I ran into some issues, some variables are using
>> > unsigned char to hold machine mode and will have problems when the
>> > number of modes is larger than 255...
>> >
>> > And here is the fix:
>>
>> > --- a/gcc/genmodes.cc
>> > +++ b/gcc/genmodes.cc
>> > @@ -1141,10 +1141,10 @@ inline __attribute__((__always_inline__))\n\
>> > #else\n\
>> > extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
>> > #endif\n\
>> > -unsigned char\n\
>> > +unsigned short\n\
>> > mode_inner_inline (machine_mode mode)\n\
>> > {\n\
>> > -  extern const unsigned char mode_inner[NUM_MACHINE_MODES];\n\
>> > +  extern const unsigned short mode_inner[NUM_MACHINE_MODES];\n\
>> >   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
>> >   switch (mode)\n\
>> > {");
>> > @@ -1529,7 +1529,7 @@ emit_mode_wider (void)
>> >   int c;
>> >   struct mode_data *m;
>> >
>> > -  print_decl ("unsigned char", "mode_next", "NUM_MACHINE_MODES");
>> > +  print_decl ("unsigned short", "mode_next", "NUM_MACHINE_MODES");
>>
>> Etc.
>>
>> Instead of 's%char%short', shouldn't we really be using
>> 'enum machine_mode' here?  (I understand such a change may require some
>> further surgery, but wouldn't it be the correct thing to do?)
>
> Hmmm, I think maybe what we need is to leverage C++ language features
> to declare enum with underlying types like that:
>
> enum machine_mode : uint16_t

Eh, so that's the reason/confusion (or, at least some of it...) here: my
(naïve...) assumption has been that 'enum machine_mode' already does have
a fixed underlying type -- but apparently it does not, so defaults to
'unsigned int'!

(gdb) ptype lto_mode_identity_table
type = const enum machine_mode : unsigned int {E_VOIDmode, E_BLKmode, E_CCmode, [...], NUM_MACHINE_MODES = 130} *

So, yeah, should we fix that, and then generally use 'enum machine_mode'
instead of 'char' vs. 'short'?  (Or, which other "detail" do I fail to
recognize this time?)
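
To make the distinction concrete, a minimal sketch with hypothetical enums (these are not GCC's real machine_mode). Without a fixed underlying type the implementation chooses the representation; with one, the size is pinned:

```cpp
#include <cstdint>
#include <type_traits>

// No fixed underlying type: the implementation picks an integer type that
// can represent 0..130, so sizeof is implementation-defined.
enum demo_mode_plain { DEMO_P_VOIDmode, DEMO_P_BLKmode, DEMO_P_NUM_MODES = 130 };

// Fixed underlying type, as suggested in the quoted mail.
enum demo_mode_fixed : std::uint16_t
{
  DEMO_F_VOIDmode, DEMO_F_BLKmode, DEMO_F_NUM_MODES = 130
};

static_assert (std::is_same<std::underlying_type<demo_mode_fixed>::type,
                            std::uint16_t>::value,
               "fixed enum uses exactly the requested type");
static_assert (sizeof (demo_mode_fixed) == sizeof (std::uint16_t),
               "fixed enum is 16 bits wide");
```

Nothing is asserted about sizeof (demo_mode_plain) on purpose, since that is exactly the part left to the implementation.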


Grüße
 Thomas


RE: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector

2023-06-30 Thread Tamar Christina via Gcc-patches
Hi Jason,

Thanks for the review. I only now realized I should have split them between C 
and C++.

Will do so on the respins.

> 
> On 6/28/23 09:41, Tamar Christina wrote:
> > Hi All,
> >
> > FORTRAN currently has a pragma NOVECTOR for indicating that
> > vectorization should not be applied to a particular loop.
> >
> > ICC/ICX also has such a pragma for C and C++ called #pragma novector.
> >
> > As part of this patch series I need a way to easily turn off
> > vectorization of particular loops, particularly for testsuite reasons.
> >
> > This patch proposes a #pragma GCC novector that does the same for C
> > and C++ as gfortran does for FORTRAN and what ICC/ICX does for C and C++.
> >
> > I added only some basic tests here, but the next patch in the series
> > uses this in the testsuite in about 800 tests.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/c-family/ChangeLog:
> >
> > * c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
> > * c-pragma.cc (init_pragma): Use it.
> >
> > gcc/c/ChangeLog:
> >
> > * c-parser.cc (c_parser_while_statement, c_parser_do_statement,
> > c_parser_for_statement, c_parser_statement_after_labels,
> > c_parse_pragma_novector, c_parser_pragma): Wire through novector and
> > default to false.
> 
> I'll let the C maintainers review the C changes.
> 
> > gcc/cp/ChangeLog:
> >
> > * cp-tree.def (RANGE_FOR_STMT): Update comment.
> > * cp-tree.h (RANGE_FOR_NOVECTOR): New.
> > (cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
> > finish_for_cond): Add novector param.
> > * init.cc (build_vec_init): Default novector to false.
> > * method.cc (build_comparison_op): Likewise.
> > * parser.cc (cp_parser_statement): Likewise.
> > (cp_parser_for, cp_parser_c_for, cp_parser_range_for,
> > cp_convert_range_for, cp_parser_iteration_statement,
> > cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
> > (cp_parser_pragma_novector): New.
> > * pt.cc (tsubst_expr): Likewise.
> > * semantics.cc (finish_while_stmt_cond, finish_do_stmt,
> > finish_for_cond): Likewise.
> >
> > gcc/ChangeLog:
> >
> > * doc/extend.texi: Document it.
> > * tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
> > * tree.h (TREE_LANG_FLAG_7): New.
> 
> This doesn't seem necessary; I think only flags 1 and 6 are currently used in
> RANGE_FOR_STMT.

Ah fair, I thought every option needed to occupy a specific bit. I'll try to 
re-use one.

> 
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/vect/vect-novector-pragma.cc: New test.
> > * gcc.dg/vect/vect-novector-pragma.c: New test.
> >
> > --- inline copy of patch --
> >...
> > @@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
> >  not included. */
> >
> >   static tree
> > -cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
> > +cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
> > +  bool novector)
> 
> I wonder about combining the ivdep and novector parameters here and in
> other functions?  Up to you.

As in, combine them in e.g. a struct?

> 
> > @@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
> > break;
> >   }
> > const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
> > -   unsigned short unroll;
> > +   unsigned short unroll = 0;
> > +   bool novector = false;
> > cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
> > -   if (tok->type == CPP_PRAGMA
> > -   && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
> > +
> > +   while (tok->type == CPP_PRAGMA)
> >   {
> > -   tok = cp_lexer_consume_token (parser->lexer);
> > -   unroll = cp_parser_pragma_unroll (parser, tok);
> > -   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   switch (cp_parser_pragma_kind (tok))
> > + {
> > +   case PRAGMA_UNROLL:
> > + {
> > +   tok = cp_lexer_consume_token (parser->lexer);
> > +   unroll = cp_parser_pragma_unroll (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   case PRAGMA_NOVECTOR:
> > + {
> > +   tok = cp_lexer_consume_token (parser->lexer);
> > +   novector = cp_parser_pragma_novector (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   default:
> > + gcc_unreachable ();
> > + }
> >   }
> 
> Repeating this pattern three times for the three related pragmas is too much;
> please combine the three cases into one.

Sure, I had some trouble combining them before because of the initial token being
consumed, but think I know a way.

Thanks for the review, will send updated split patch Monday.
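
One possible shape for the combined handling, as a sketch only. The token and hint types below are hypothetical; the real parser works on cp_token and the cp_parser_pragma_* helpers, and the eventual patch may look different. The idea is a single loop and a single switch that accept the loop pragmas in any order:

```cpp
// Hypothetical token kinds and tokens (not the C++ front end's types).
enum demo_kind { DEMO_IVDEP, DEMO_UNROLL, DEMO_NOVECTOR, DEMO_OTHER };

struct demo_token { demo_kind kind; unsigned short arg; };

struct demo_hints
{
  bool ivdep;
  unsigned short unroll;
  bool novector;
  int consumed;   // number of pragma tokens eaten
};

// Consume leading loop pragmas with one loop and one switch, stopping at
// the first token that is not a loop pragma.
constexpr demo_hints demo_parse_loop_pragmas (const demo_token *toks, int n)
{
  demo_hints h = { false, 0, false, 0 };
  for (int i = 0; i < n; ++i)
    {
      switch (toks[i].kind)
        {
        case DEMO_IVDEP:    h.ivdep = true;         break;
        case DEMO_UNROLL:   h.unroll = toks[i].arg; break;
        case DEMO_NOVECTOR: h.novector = true;      break;
        default:            return h;   // first non-pragma token ends the run
        }
      ++h.consumed;
    }
  return h;
}

// Sample input: novector, unroll 4, ivdep, then the loop itself.
constexpr demo_token demo_input[] = {
  { DEMO_NOVECTOR, 0 }, { DEMO_UNROLL, 4 }, { DEMO_IVDEP, 0 }, { DEMO_OTHER, 0 }
};
```

Because the first pragma is handled inside the same loop as the rest, there is no special case for the token that triggered the handler.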

C

Re: Adjust LTO mode tables for "Machine_Mode: Extend machine_mode from 8 to 16 bits" (was: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits)

2023-06-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 30, 2023 at 08:45:38PM +0800, Kito Cheng wrote:
> Hmmm, I think maybe what we need is to leverage C++ language features
> to declare enum with underlying types like that:
> 
> enum machine_mode : uint16_t

What would be the advantage of doing that?
I mean, on most hosts using unsigned rather than unsigned short is
actually faster, and for the cases where we care about the size
(e.g. mode in RTL, DECLs and the like) we already use enum bitfields.

Jakub
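
A small sketch of the enum bit-field idea, using hypothetical types (demo_mode and demo_node are not GCC's machine_mode or its tree/RTL nodes). The enum itself keeps whatever width the host finds convenient, while the storage that matters is narrowed explicitly:

```cpp
// Hypothetical enum with no fixed underlying type.
enum demo_mode { DEMO_VOIDmode, DEMO_BLKmode, DEMO_NUM_MODES = 130 };

// Size-sensitive structure: the mode is stored in a 16-bit bit-field.
struct demo_node
{
  demo_mode mode : 16;
  unsigned code : 16;
};

// Store a mode into the narrow field and read it back.
constexpr demo_mode demo_roundtrip (demo_mode m)
{
  demo_node n = { DEMO_VOIDmode, 0 };
  n.mode = m;
  return n.mode;
}

static_assert (demo_roundtrip (DEMO_NUM_MODES) == DEMO_NUM_MODES,
               "values above 127 survive the 16-bit field");
```

Local variables and parameters of type demo_mode stay at the natural enum width; only the struct member is packed.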



Re: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector

2023-06-30 Thread Jason Merrill via Gcc-patches
On Fri, Jun 30, 2023, 12:18 PM Tamar Christina 
wrote:

> Hi Jason,
>
> Thanks for the review. I only now realized I should have split them
> between C and C++.
>
> Will do so on the respins.
>
> >
> > On 6/28/23 09:41, Tamar Christina wrote:
> > > Hi All,
> > >
> > > FORTRAN currently has a pragma NOVECTOR for indicating that
> > > vectorization should not be applied to a particular loop.
> > >
> > > ICC/ICX also has such a pragma for C and C++ called #pragma novector.
> > >
> > > As part of this patch series I need a way to easily turn off
> > > vectorization of particular loops, particularly for testsuite reasons.
> > >
> > > This patch proposes a #pragma GCC novector that does the same for C
> > > and C++ as gfortran does for FORTRAN and what ICC/ICX does for C and C++.
> > >
> > > I added only some basic tests here, but the next patch in the series
> > > uses this in the testsuite in about 800 tests.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/c-family/ChangeLog:
> > >
> > > * c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
> > > * c-pragma.cc (init_pragma): Use it.
> > >
> > > gcc/c/ChangeLog:
> > >
> > > * c-parser.cc (c_parser_while_statement, c_parser_do_statement,
> > > c_parser_for_statement, c_parser_statement_after_labels,
> > > c_parse_pragma_novector, c_parser_pragma): Wire through novector and
> > > default to false.
> >
> > I'll let the C maintainers review the C changes.
> >
> > > gcc/cp/ChangeLog:
> > >
> > > * cp-tree.def (RANGE_FOR_STMT): Update comment.
> > > * cp-tree.h (RANGE_FOR_NOVECTOR): New.
> > > (cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
> > > finish_for_cond): Add novector param.
> > > * init.cc (build_vec_init): Default novector to false.
> > > * method.cc (build_comparison_op): Likewise.
> > > * parser.cc (cp_parser_statement): Likewise.
> > > (cp_parser_for, cp_parser_c_for, cp_parser_range_for,
> > > cp_convert_range_for, cp_parser_iteration_statement,
> > > cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
> > > (cp_parser_pragma_novector): New.
> > > * pt.cc (tsubst_expr): Likewise.
> > > * semantics.cc (finish_while_stmt_cond, finish_do_stmt,
> > > finish_for_cond): Likewise.
> > >
> > > gcc/ChangeLog:
> > >
> > > * doc/extend.texi: Document it.
> > > * tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
> > > * tree.h (TREE_LANG_FLAG_7): New.
> >
> > This doesn't seem necessary; I think only flags 1 and 6 are currently used in
> > RANGE_FOR_STMT.
>
> Ah fair, I thought every option needed to occupy a specific bit. I'll try
> to re-use one.
>
> >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * g++.dg/vect/vect-novector-pragma.cc: New test.
> > > * gcc.dg/vect/vect-novector-pragma.c: New test.
> > >
> > > --- inline copy of patch --
> > >...
> > > @@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
> > >  not included. */
> > >
> > >   static tree
> > > -cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
> > > +cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
> > > +  bool novector)
> >
> > I wonder about combining the ivdep and novector parameters here and in
> > other functions?  Up to you.
>
> As in, combine them in e.g. a struct?
>

I was thinking in an int or enum.
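
A sketch of what the int-or-enum variant could look like. The names are hypothetical and nothing here is taken from the patch; the point is only that the two bools collapse into one flags parameter:

```cpp
// Hypothetical flag bits replacing separate 'ivdep' and 'novector' bools.
enum demo_loop_flag : unsigned
{
  DEMO_LOOP_NONE     = 0,
  DEMO_LOOP_IVDEP    = 1u << 0,
  DEMO_LOOP_NOVECTOR = 1u << 1
};

constexpr unsigned demo_add_flag (unsigned flags, demo_loop_flag f)
{
  return flags | f;
}

constexpr bool demo_has_flag (unsigned flags, demo_loop_flag f)
{
  return (flags & f) != 0;
}

// A function that used to take (bool ivdep, bool novector) would instead
// take one 'unsigned flags' argument and query it like this:
static_assert (demo_has_flag (demo_add_flag (DEMO_LOOP_NONE, DEMO_LOOP_NOVECTOR),
                              DEMO_LOOP_NOVECTOR),
               "novector bit is set");
```

Adding a further loop pragma later would then mean one more enumerator rather than one more parameter on every function in the chain.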

>
> > > @@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser,
> > enum pragma_context context, bool *if_p)
> > > break;
> > >   }
> > > const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
> > > -   unsigned short unroll;
> > > +   unsigned short unroll = 0;
> > > +   bool novector = false;
> > > cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
> > > -   if (tok->type == CPP_PRAGMA
> > > -   && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
> > > +
> > > +   while (tok->type == CPP_PRAGMA)
> > >   {
> > > -   tok = cp_lexer_consume_token (parser->lexer);
> > > -   unroll = cp_parser_pragma_unroll (parser, tok);
> > > -   tok = cp_lexer_peek_token (the_parser->lexer);
> > > +   switch (cp_parser_pragma_kind (tok))
> > > + {
> > > +   case PRAGMA_UNROLL:
> > > + {
> > > +   tok = cp_lexer_consume_token (parser->lexer);
> > > +   unroll = cp_parser_pragma_unroll (parser, tok);
> > > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > > +   break;
> > > + }
> > > +   case PRAGMA_NOVECTOR:
> > > + {
> > > +   tok = cp_lexer_consume_token (parser->lexer);
> > > +   novector = cp_parser_pragma_novector (parser, tok);
> > > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > > +   break;
> > > + }
> > > +   default:
> > > + gcc_unr

Re: [PATCH 1/2] go: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-30 Thread Peter Bergner via Gcc-patches
On 6/22/23 10:30 PM, Ian Lance Taylor wrote:
> On Thu, Jun 22, 2023, 4:47 PM Peter Bergner wrote:
> > On 6/22/23 6:37 PM, Peter Bergner via Gcc-patches wrote:
> > > On 6/16/23
> >> On Fri, Jun 16, 2023 at 9:00 AM Paul E. Murphy via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>> TARGET_AIX is defined to a non-zero value on linux and maybe other
> >>> powerpc64le targets.  This leads to unexpected behavior such as
> >>> dropping the .go_export section when linking a shared library
> >>> on linux/powerpc64le.
> >>>
> >>> Instead, use TARGET_AIX_OS to toggle AIX specific behavior.
> >>>
> >>> Fixes golang/go#60798.
> >>>
> >>> gcc/go/ChangeLog:
> >>>
> >>>         * go-backend.cc [TARGET_AIX]: Rename and update usage to
> >>>         TARGET_AIX_OS.
> >>>         * go-lang.cc: Likewise.
> >>
> >> This is OK.
> >>
> >> Thanks.
> >>
> >> Ian
> >
> > I pushed this to trunk for Paul.
> 
> I see this is broken on the release branches too.  Are backports ok
> after some burn-in on trunk?
> 
> Yes.  Thanks.

Ok, I backported the Go fix to GCC 13, 12, 11 and 10 (before the 10.5 freeze).
I also backported the rust change to GCC 13, which was the first release
with rust.  Thanks.

Peter



[PATCH 1/7] Fix up merge/formatting errors

2023-06-30 Thread Julian Brown
This patch fixes a couple of minor merge/formatting errors.

2023-06-30  Julian Brown  

gcc/fortran/
* parse.cc (decode_omp_directive): Add missing break.

gcc/
* gimplify.cc (gimplify_adjust_omp_clauses): Fix indentation.
---
 gcc/fortran/parse.cc | 1 +
 gcc/gimplify.cc  | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index 73f15608260..2467adf5836 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -902,6 +902,7 @@ decode_omp_directive (void)
   break;
 case 't':
   matchs ("tile sizes", gfc_match_omp_tile, ST_OMP_TILE);
+  break;
 case 'u':
   matchs ("unroll", gfc_match_omp_unroll, ST_OMP_UNROLL);
   break;
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 1e90d2ed031..707a0c046de 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -13996,8 +13996,8 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, 
gimple_seq body, tree *list_p,
 fb_lvalue) == GS_ERROR)
remove = true;
  gimplify_omp_ctxp = ctx;
-  break;
-}
+ break;
+   }
 
 if ((code == OMP_TARGET
  || code == OMP_TARGET_DATA
-- 
2.25.1



[PATCH 0/7] [og13] OpenMP: lvalue parsing and "declare mapper" support

2023-06-30 Thread Julian Brown
This patch series provides generalised lvalue ("locator list item")
parsing for OpenMP "map", "to" and "from" clauses for C and C++, and
"declare mapper" support for C, C++ and Fortran.  It is based on the
latter part of the patch series sent upstream previously:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html

(parts 7, 8, 9, 10 and 11), with adjustments necessary due to other
patches already on og13 and also a few follow-up fixes.

Tested with offloading to AMD GCN.  I will apply (to og13) shortly.

Julian Brown (7):
  Fix up merge/formatting errors
  OpenMP: OpenMP 5.2 semantics for pointers with unmapped target
  OpenMP: lvalue parsing for map/to/from clauses (C++)
  OpenMP: C++ "declare mapper" support
  OpenMP: lvalue parsing for map clauses (C)
  OpenMP: Support OpenMP 5.0 "declare mapper" directives for C
  OpenMP: Fortran "!$omp declare mapper" support

 gcc/c-family/c-common.h   |7 +-
 gcc/c-family/c-omp.cc |  323 +-
 gcc/c-family/c-pretty-print.cc|   12 +
 gcc/c/c-decl.cc   |  169 +
 gcc/c/c-objc-common.h |   12 +
 gcc/c/c-parser.cc |  463 ++-
 gcc/c/c-tree.h|   10 +
 gcc/c/c-typeck.cc |  128 +-
 gcc/cp/constexpr.cc   |   10 +
 gcc/cp/cp-gimplify.cc |6 +
 gcc/cp/cp-objcp-common.h  |9 +
 gcc/cp/cp-tree.h  |   19 +-
 gcc/cp/decl.cc|   27 +-
 gcc/cp/decl2.cc   |   54 +-
 gcc/cp/error.cc   |   34 +
 gcc/cp/parser.cc  |  497 ++-
 gcc/cp/parser.h   |3 +
 gcc/cp/pt.cc  |   78 +-
 gcc/cp/semantics.cc   |  261 +-
 gcc/cp/typeck.cc  |   50 +
 gcc/fortran/dump-parse-tree.cc|4 +
 gcc/fortran/f95-lang.cc   |7 +
 gcc/fortran/gfortran.h|   56 +-
 gcc/fortran/match.cc  |9 +-
 gcc/fortran/match.h   |1 +
 gcc/fortran/module.cc |  251 +-
 gcc/fortran/openmp.cc |  299 +-
 gcc/fortran/parse.cc  |   18 +-
 gcc/fortran/resolve.cc|2 +
 gcc/fortran/st.cc |2 +-
 gcc/fortran/symbol.cc |   16 +
 gcc/fortran/trans-decl.cc |   33 +-
 gcc/fortran/trans-openmp.cc   |  785 +++-
 gcc/fortran/trans-stmt.h  |1 +
 gcc/fortran/trans.h   |3 +
 gcc/gimplify.cc   |  583 ++-
 gcc/langhooks-def.h   |   13 +
 gcc/langhooks.cc  |   35 +
 gcc/langhooks.h   |   16 +
 gcc/omp-general.h |   86 +
 .../c-c++-common/gomp/declare-mapper-12.c |   22 +
 .../c-c++-common/gomp/declare-mapper-3.c  |   30 +
 .../c-c++-common/gomp/declare-mapper-4.c  |   78 +
 .../c-c++-common/gomp/declare-mapper-5.c  |   26 +
 .../c-c++-common/gomp/declare-mapper-6.c  |   23 +
 .../c-c++-common/gomp/declare-mapper-7.c  |   29 +
 .../c-c++-common/gomp/declare-mapper-8.c  |   43 +
 .../c-c++-common/gomp/declare-mapper-9.c  |   34 +
 gcc/testsuite/c-c++-common/gomp/map-6.c   |   14 +-
 gcc/testsuite/g++.dg/gomp/array-section-1.C   |   38 +
 gcc/testsuite/g++.dg/gomp/array-section-2.C   |   63 +
 .../g++.dg/gomp/bad-array-section-1.C |   35 +
 .../g++.dg/gomp/bad-array-section-10.C|   35 +
 .../g++.dg/gomp/bad-array-section-11.C|   36 +
 .../g++.dg/gomp/bad-array-section-2.C |   33 +
 .../g++.dg/gomp/bad-array-section-3.C |   28 +
 .../g++.dg/gomp/bad-array-section-4.C |   50 +
 .../g++.dg/gomp/bad-array-section-5.C |   50 +
 .../g++.dg/gomp/bad-array-section-6.C |   24 +
 .../g++.dg/gomp/bad-array-section-7.C |   36 +
 .../g++.dg/gomp/bad-array-section-8.C |   53 +
 .../g++.dg/gomp/bad-array-section-9.C |   39 +
 gcc/testsuite/g++.dg/gomp/declare-mapper-1.C  |   58 +
 gcc/testsuite/g++.dg/gomp/declare-mapper-2.C  |   30 +
 .../gomp/has_device_addr-non-lvalue-1.C   |   36 +
 gcc/testsuite/g++.dg/gomp/ind-base-3.C|   37 +
 gcc/testsuite/g++.dg/gomp/map-assignment-1.C  |   12 +
 gcc/testsuite/g++.dg/gomp/map-inc-1.C |   10 +
 gcc/testsuite/g++.dg/gomp/map-lvalue-ref-1.C  |   19 +
 gcc/testsuite/g++.dg/gomp/map-ptrmem-1.C  |   37 +
 gcc/testsuite/g++.dg/gomp/map-ptrmem-2.C  |   40 +
 .../g++.dg/gomp/map-static-cast-lvalue-1.C|   17 +
 gcc/testsuite/g++.dg/gomp/map-ternary-1.C  

[PATCH 2/7] OpenMP: OpenMP 5.2 semantics for pointers with unmapped target

2023-06-30 Thread Julian Brown
This patch fixes two more cases where an unmapped target pointer results
in a null pointer on the target instead of a copy of the host pointer.
The latter behaviour is required by OpenMP 5.2, which is a change from
earlier versions of the standard.  This change has already been made in
one place by Tobias's patch here:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622018.html

But this patch makes a similar adjustment in other places
(i.e. for GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION).
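
The behavioural change described above can be pictured with a small,
self-contained sketch.  This is not libgomp code: the struct, the function
name and the omp52 flag are hypothetical and only model the idea that a
pointer whose target is unmapped now keeps its host value instead of
becoming null.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one host-to-device mapping.  */
struct mapping
{
  uintptr_t host_start;
  uintptr_t host_end;   /* One past the last mapped host byte.  */
  uintptr_t tgt_start;
};

/* Translate HOST to a device address using the N entries in MAPS.
   If no entry covers HOST, return the host value unchanged when OMP52
   is nonzero (the OpenMP 5.2 behaviour described above), or 0 (a null
   pointer) otherwise (the earlier behaviour).  */
uintptr_t
translate_pointer (const struct mapping *maps, size_t n,
                   uintptr_t host, int omp52)
{
  for (size_t i = 0; i < n; i++)
    if (host >= maps[i].host_start && host < maps[i].host_end)
      return maps[i].tgt_start + (host - maps[i].host_start);

  return omp52 ? host : 0;
}
```

In the real patch the equivalent decision is made in gomp_map_pointer and
gomp_attach_pointer, as the libgomp/target.c hunks below show.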

These changes also revealed a problem with DECL_VALUE_EXPR handling in
gimplify.cc, which this patch also fixes.

2023-06-30  Julian Brown  

gcc/
* gimplify.cc (gimplify_scan_omp_clauses): Add note about
DECL_VALUE_EXPR handling for struct mapping nodes.
(gimplify_adjust_omp_clauses): Perform DECL_VALUE_EXPR substitution
before DECL_P check.

libgomp/
* target.c (gomp_map_pointer): Modify zero-length array section
pointer handling.
(gomp_attach_pointer): Likewise.
* testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2
semantics.
* testsuite/libgomp.c++/target-this-3.C: Likewise.
* testsuite/libgomp.c++/target-this-4.C: Likewise.
---
 gcc/gimplify.cc   | 20 ++-
 libgomp/target.c  |  7 +++
 .../testsuite/libgomp.c++/target-lambda-1.C   |  5 -
 libgomp/testsuite/libgomp.c++/target-this-3.C | 11 ++
 libgomp/testsuite/libgomp.c++/target-this-4.C | 11 ++
 5 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 707a0c046de..0e856b903ec 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -12090,7 +12090,13 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
 
  /* Adding the decl for a struct access: we haven't created
 GOMP_MAP_STRUCT nodes yet, so this statement needs to predict
-whether they will be created in gimplify_adjust_omp_clauses.  */
+whether they will be created in gimplify_adjust_omp_clauses.
+NOTE: Technically we should probably look through DECL_VALUE_EXPR
+here because something that looks like a DECL_P may actually be a
+struct access, e.g. variables in a lambda closure
+(__closure->__foo) or class members (this->foo). Currently in both
+those cases we map the whole of the containing object (directly in
+the C++ FE) though, so struct nodes are not created.  */
  if (c == grp_end
  && addr_tokens[0]->type == STRUCTURE_BASE
  && addr_tokens[0]->u.structure_base_kind == BASE_DECL
@@ -13895,6 +13901,18 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, 
gimple_seq body, tree *list_p,
  remove = true;
  break;
}
+ /* If we have a DECL_VALUE_EXPR (e.g. this is a class member and/or
+a variable captured in a lambda closure), look through that now
+before the DECL_P check below.  (A code other than COMPONENT_REF,
+i.e. INDIRECT_REF, will be a VLA/variable-length array
+section.  A global var may be a variable in a common block.  We
+don't want to do this here for either of those.)  */
+ if ((ctx->region_type & ORT_ACC) == 0
+ && DECL_P (decl)
+ && !is_global_var (decl)
+ && DECL_HAS_VALUE_EXPR_P (decl)
+ && TREE_CODE (DECL_VALUE_EXPR (decl)) == COMPONENT_REF)
+   decl = OMP_CLAUSE_DECL (c) = DECL_VALUE_EXPR (decl);
  if (TREE_CODE (decl) == TARGET_EXPR)
{
  if (gimplify_expr (&OMP_CLAUSE_DECL (c), pre_p, NULL,
diff --git a/libgomp/target.c b/libgomp/target.c
index fbc84c68952..4447675cd16 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -855,7 +855,7 @@ gomp_map_pointer (struct target_mem_desc *tgt, struct 
goacc_asyncqueue *aq,
   if (n == NULL)
 {
   if (allow_zero_length_array_sections)
-   cur_node.tgt_offset = 0;
+   cur_node.tgt_offset = cur_node.host_start;
   else if (devicep->is_usm_ptr_func
   && devicep->is_usm_ptr_func ((void*)cur_node.host_start))
cur_node.tgt_offset = cur_node.host_start;
@@ -1023,9 +1023,8 @@ gomp_attach_pointer (struct gomp_device_descr *devicep,
{
  if (allow_zero_length_array_sections)
/* When allowing attachment to zero-length array sections, we
-  allow attaching to NULL pointers when the target region is not
-  mapped.  */
-   data = 0;
+  copy the host pointer when the target region is not mapped.  */
+   data = target;
  else
{
  gomp_mutex_unlock (&devicep->lock);
diff --git a/libgomp/testsuite/libgomp.c++/target-lambda-1.C 
b/libgomp/testsuite/libgomp.c++/target-lambda-1.

[PATCH 4/7] OpenMP: C++ "declare mapper" support

2023-06-30 Thread Julian Brown
This patch adds support for OpenMP 5.0 "declare mapper" functionality
for C++.  I've merged it to og13 based on the last version
posted upstream, with some minor changes due to the newly-added
'present' map modifier support.  There's also a fix to splay-tree
traversal in gimplify.cc:omp_instantiate_implicit_mappers, and this patch
omits the rearrangement of gimplify.cc:gimplify_{scan,adjust}_omp_clauses
that I separated out into its own patch and applied (to og13) already.

2023-06-30  Julian Brown  

gcc/c-family/
* c-common.h (omp_mapper_list): Add forward declaration.
(c_omp_find_nested_mappers, c_omp_instantiate_mappers): Add prototypes.
* c-omp.cc (c_omp_find_nested_mappers): New function.
(remap_mapper_decl_info): New struct.
(remap_mapper_decl_1, omp_instantiate_mapper,
c_omp_instantiate_mappers): New functions.

gcc/cp/
* constexpr.cc (reduced_constant_expression_p): Add OMP_DECLARE_MAPPER
case.
(cxx_eval_constant_expression, potential_constant_expression_1):
Likewise.
* cp-gimplify.cc (cxx_omp_finish_mapper_clauses): New function.
* cp-objcp-common.h (LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_MAPPER_LOOKUP, LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define langhooks.
* cp-tree.h (lang_decl_base): Add omp_declare_mapper_p field.  Recount
spare bits comment.
(DECL_OMP_DECLARE_MAPPER_P): New macro.
(omp_mapper_id, cp_check_omp_declare_mapper, omp_instantiate_mappers,
cxx_omp_finish_mapper_clauses, cxx_omp_mapper_lookup,
cxx_omp_extract_mapper_directive, cxx_omp_map_array_section): Add
prototypes.
* decl.cc (check_initializer): Add OpenMP declare mapper support.
(cp_finish_decl): Set DECL_INITIAL for OpenMP declare mapper var decls
as appropriate.
* decl2.cc (mark_used): Instantiate OpenMP "declare mapper" magic var
decls.
* error.cc (dump_omp_declare_mapper): New function.
(dump_simple_decl): Use above.
* parser.cc (cp_parser_omp_clause_map): Add KIND parameter.  Support
"mapper" modifier.
(cp_parser_omp_all_clauses): Add KIND argument to
cp_parser_omp_clause_map call.
(cp_parser_omp_target): Call omp_instantiate_mappers before
finish_omp_clauses.
(cp_parser_omp_declare_mapper): New function.
(cp_parser_omp_declare): Add "declare mapper" support.
* pt.cc (tsubst_decl): Adjust name of "declare mapper" magic var decls
once we know their type.
(tsubst_omp_clauses): Call omp_instantiate_mappers before
finish_omp_clauses, for target regions.
(tsubst_expr): Support OMP_DECLARE_MAPPER nodes.
(instantiate_decl): Instantiate initialiser (i.e. definition) for OpenMP
declare mappers.
* semantics.cc (gimplify.h): Include.
(omp_mapper_id, omp_mapper_lookup, omp_extract_mapper_directive,
cxx_omp_map_array_section, cp_check_omp_declare_mapper): New functions.
(finish_omp_clauses): Delete GOMP_MAP_PUSH_MAPPER_NAME and
GOMP_MAP_POP_MAPPER_NAME artificial clauses.
(omp_target_walk_data): Add MAPPERS field.
(finish_omp_target_clauses_r): Scan for uses of struct/union/class type
variables.
(finish_omp_target_clauses): Create artificial mapper binding clauses
for used structs/unions/classes in offload region.

gcc/fortran/
* parse.cc (tree.h, fold-const.h, tree-hash-traits.h): Add includes
(for additions to omp-general.h).

gcc/
* gimplify.cc (gimplify_omp_ctx): Add IMPLICIT_MAPPERS field.
(new_omp_context): Initialise IMPLICIT_MAPPERS hash map.
(delete_omp_context): Delete IMPLICIT_MAPPERS hash map.
(instantiate_mapper_info): New structs.
(remap_mapper_decl_1, omp_mapper_copy_decl, omp_instantiate_mapper,
omp_instantiate_implicit_mappers): New functions.
(gimplify_scan_omp_clauses): Handle MAPPER_BINDING clauses.
(gimplify_adjust_omp_clauses): Instantiate implicit declared mappers.
(gimplify_omp_declare_mapper): New function.
(gimplify_expr): Call above function.
* langhooks-def.h (lhd_omp_finish_mapper_clauses,
lhd_omp_mapper_lookup, lhd_omp_extract_mapper_directive,
lhd_omp_map_array_section): Add prototypes.
(LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_MAPPER_LOOKUP, LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define macros.
(LANG_HOOK_DECLS): Add above macros.
* langhooks.cc (lhd_omp_finish_mapper_clauses,
lhd_omp_mapper_lookup, lhd_omp_extract_mapper_directive,
lhd_omp_map_array_section): New dummy functions.
* langhooks.h (lang_hooks_for_decls): Add OMP_FINISH_MAPPER_CLAUSES,
OMP_MAPPER_LOOKUP, OMP_EXTRACT_MAPPER_DIRECTIVE, OMP_

[PATCH 7/7] OpenMP: Fortran "!$omp declare mapper" support

2023-06-30 Thread Julian Brown
This patch implements "omp declare mapper" functionality for Fortran,
following the equivalent support for C and C++.  This version of the
patch has been merged to og13 and contains various fixes for e.g.:

  * Mappers with deferred-length strings

  * Array descriptors not being appropriately transferred
to the offload target (see "OMP_MAP_POINTER_ONLY" and
gimplify.cc:omp_maybe_get_descriptor_from_ptr).

2023-06-30  Julian Brown  

gcc/fortran/
* dump-parse-tree.cc (show_attr): Show omp_udm_artificial_var flag.
(show_omp_namelist): Support OMP_MAP_POINTER_ONLY and OMP_MAP_UNSET.
* f95-lang.cc (LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define language hooks.
* gfortran.h (gfc_statement): Add ST_OMP_DECLARE_MAPPER.
(symbol_attribute): Add omp_udm_artificial_var attribute.
(gfc_omp_map_op): Add OMP_MAP_POINTER_ONLY and OMP_MAP_UNSET.
(gfc_omp_namelist): Add udm pointer to u2 union.
(gfc_omp_udm): New struct.
(gfc_omp_namelist_udm): New struct.
(gfc_symtree): Add omp_udm pointer.
(gfc_namespace): Add omp_udm_root symtree. Add omp_udm_ns flag.
(gfc_free_omp_namelist): Update prototype.
(gfc_free_omp_udm, gfc_omp_udm_find, gfc_find_omp_udm,
gfc_resolve_omp_udms): Add prototypes.
* match.cc (gfc_free_omp_namelist): Change FREE_NS and FREE_ALIGN
parameters to LIST number, to handle freeing user-defined mapper
namelists safely.
* match.h (gfc_match_omp_declare_mapper): Add prototype.
* module.cc (ab_attribute): Add AB_OMP_DECLARE_MAPPER_VAR.
(attr_bits): Add OMP_DECLARE_MAPPER_VAR.
(mio_symbol_attribute): Read/write AB_OMP_DECLARE_MAPPER_VAR attribute.
Set referenced attr on read.
(omp_map_clause_ops, omp_map_cardinality): New arrays.
(load_omp_udms, check_omp_declare_mappers): New functions.
(read_module): Load and check OMP declare mappers.
(write_omp_udm, write_omp_udms): New functions.
(write_module): Write OMP declare mappers.
* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_variable_list,
gfc_match_omp_to_link, gfc_match_omp_depend_sink,
gfc_match_omp_clause_reduction): Update calls to gfc_free_omp_namelist.
(gfc_free_omp_udm, gfc_find_omp_udm, gfc_omp_udm_find,
gfc_match_omp_declare_mapper): New functions.
(gfc_match_omp_clauses): Add DEFAULT_MAP_OP parameter. Update calls to
gfc_free_omp_namelist.  Add declare mapper support.
(resolve_omp_clauses): Add declare mapper support.  Update calls to
gfc_free_omp_namelist.
(gfc_resolve_omp_udm, gfc_resolve_omp_udms): New functions.
* parse.cc (decode_omp_directive): Add declare mapper support.
(case_omp_decl): Add ST_OMP_DECLARE_MAPPER case.
(gfc_ascii_statement): Add ST_OMP_DECLARE_MAPPER case.
* resolve.cc (resolve_types): Call gfc_resolve_omp_udms.
* st.cc (gfc_free_statement): Update call to gfc_free_omp_namelist.
* symbol.cc (free_omp_udm_tree): New function.
(gfc_free_namespace): Call above.
* trans-decl.cc (omp_declare_mapper_ns): New global.
(gfc_finish_var_decl, gfc_generate_function_code): Support declare
mappers.
(gfc_trans_deferred_vars): Ignore artificial declare-mapper vars.
* trans-openmp.cc (tree-iterator.h): Include.
(toc_directive): New enum.
(gfc_trans_omp_array_section): Change OP and OPENMP parameters to
toc_directive CD ('clause directive').
(gfc_omp_finish_mapper_clauses, gfc_omp_extract_mapper_directive,
gfc_omp_map_array_section): New functions.
(omp_clause_directive): New enum.
(gfc_trans_omp_clauses): Remove DECLARE_SIMD and OPENACC parameters.
Replace with toc_directive CD, defaulting to TOC_OPENMP.  Add declare
mapper support and OMP_MAP_POINTER_ONLY support.
(gfc_trans_omp_construct, gfc_trans_oacc_executable_directive,
gfc_trans_oacc_combined_directive): Update calls to
gfc_trans_omp_clauses.
(gfc_subst_replace, gfc_subst_prepend_ref): New variables.
(gfc_subst_in_expr_1, gfc_subst_in_expr, gfc_subst_mapper_var,
gfc_trans_omp_instantiate_mapper, gfc_trans_omp_instantiate_mappers,
gfc_record_mapper_bindings_code_fn, gfc_record_mapper_bindings_expr_fn,
gfc_find_nested_mappers, gfc_record_mapper_bindings): New functions.
(gfc_typespec * hash traits): New template.
(omp_declare_mapper_ns): Extern declaration.
(gfc_trans_omp_target): Call gfc_trans_omp_instantiate_mappers and
gfc_record_mapper_bindings. Update calls to gfc_trans_omp_clauses.
(gfc_trans_omp_declare_simd, gfc_trans_omp_declare_variant): Update
calls to gfc_trans_omp_clauses.
(gfc_trans_omp_mapper_name, gfc

[PATCH 5/7] OpenMP: lvalue parsing for map clauses (C)

2023-06-30 Thread Julian Brown
This patch adds support for parsing general lvalues ("locator list item
types") for OpenMP "map", "to" and "from" clauses to the C front-end,
similar to the previously-posted patch for C++.

2023-06-30  Julian Brown  

gcc/c/
* c-pretty-print.cc (c_pretty_printer::postfix_expression,
c_pretty_printer::expression): Add OMP_ARRAY_SECTION support.
* c-parser.cc (c_parser_braced_init, c_parser_conditional_expression):
Don't allow OpenMP array section.
(c_parser_postfix_expression): Don't allow array section in statement
expression.
(c_parser_postfix_expression_after_primary): Add support
for OpenMP array section parsing.
(c_parser_expr_list): Don't allow OpenMP array section here.
(c_parser_omp_variable_list): Change ALLOW_DEREF parameter to
MAP_LVALUE.  Support parsing of general lvalues in "map", "to" and
"from" clauses.
(c_parser_omp_var_list_parens): Change ALLOW_DEREF parameter to
MAP_LVALUE.  Update call to c_parser_omp_variable_list.
(c_parser_oacc_data_clause): Update calls to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_reduction): Use OMP_ARRAY_SECTION tree node
instead of TREE_LIST for array sections.
(c_parser_omp_target): Allow GOMP_MAP_ATTACH.
* c-tree.h (c_omp_array_section_p): Add extern declaration.
(build_omp_array_section): Add prototype.
* c-typeck.cc (c_omp_array_section_p): Add flag.
(mark_exp_read): Support OMP_ARRAY_SECTION.
(build_omp_array_section): Add function.
(build_external_ref): Tweak error path for OpenMP array sections.
(handle_omp_array_sections_1): Use OMP_ARRAY_SECTION tree code instead
of TREE_LIST.  Handle more kinds of expressions.
(c_oacc_check_attachments): Use OMP_ARRAY_SECTION instead of TREE_LIST
for array sections.
(c_finish_omp_clauses): Use OMP_ARRAY_SECTION instead of TREE_LIST.
Check for supported expression types.

gcc/testsuite/
* gcc.dg/gomp/bad-array-section-c-1.c: New test.
* gcc.dg/gomp/bad-array-section-c-2.c: New test.
* gcc.dg/gomp/bad-array-section-c-3.c: New test.
* gcc.dg/gomp/bad-array-section-c-4.c: New test.
* gcc.dg/gomp/bad-array-section-c-5.c: New test.
* gcc.dg/gomp/bad-array-section-c-6.c: New test.
* gcc.dg/gomp/bad-array-section-c-7.c: New test.
* gcc.dg/gomp/bad-array-section-c-8.c: New test.

libgomp/
* testsuite/libgomp.c-c++-common/ind-base-4.c: New test.
* testsuite/libgomp.c-c++-common/unary-ptr-1.c: New test.
---
 gcc/c-family/c-pretty-print.cc|  12 ++
 gcc/c/c-parser.cc | 184 +++---
 gcc/c/c-tree.h|   2 +
 gcc/c/c-typeck.cc | 113 ---
 .../gcc.dg/gomp/bad-array-section-c-1.c   |  16 ++
 .../gcc.dg/gomp/bad-array-section-c-2.c   |  13 ++
 .../gcc.dg/gomp/bad-array-section-c-3.c   |  24 +++
 .../gcc.dg/gomp/bad-array-section-c-4.c   |  26 +++
 .../gcc.dg/gomp/bad-array-section-c-5.c   |  15 ++
 .../gcc.dg/gomp/bad-array-section-c-6.c   |  16 ++
 .../gcc.dg/gomp/bad-array-section-c-7.c   |  26 +++
 .../gcc.dg/gomp/bad-array-section-c-8.c   |  21 ++
 .../libgomp.c-c++-common/ind-base-4.c |  50 +
 .../libgomp.c-c++-common/unary-ptr-1.c|  16 ++
 14 files changed, 487 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-4.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-5.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-6.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-7.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/bad-array-section-c-8.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/ind-base-4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/unary-ptr-1.c

diff --git a/gcc/c-family/c-pretty-print.cc b/gcc/c-family/c-pretty-print.cc
index 7536a7c471f..225ac7ef285 100644
--- a/gcc/c-family/c-pretty-print.cc
+++ b/gcc/c-family/c-pretty-print.cc
@@ -1615,6 +1615,17 @@ c_pretty_printer::postfix_expression (tree e)
   pp_c_right_bracket (this);
   break;
 
+case OMP_ARRAY_SECTION:
+  postfix_expression (TREE_OPERAND (e, 0));
+  pp_c_left_bracket (this);
+  if (TREE_OPERAND (e, 1))
+   expression (TREE_OPERAND (e, 1));
+  pp_colon (this);
+  if (TREE_OPERAND (e, 2))
+   expression (TREE_OPERAND (e, 2));
+  pp_c_right_bracket (this);
+  break;
+
 case CALL_EXPR:
   {
call_expr_arg_iterator iter;
@@ -2664,6 +2675,7 @@ c_pretty_printer::expression (tree

[PATCH 6/7] OpenMP: Support OpenMP 5.0 "declare mapper" directives for C

2023-06-30 Thread Julian Brown
This patch adds support for "declare mapper" directives (and the "mapper"
modifier on "map" clauses) for C.
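
For readers unfamiliar with the directive, here is a minimal sketch of the
kind of user code this is meant to accept, written from the OpenMP 5.0
syntax rather than taken from the patch or its testsuite.  The type myvec_t
and the function vec_sum are made up for illustration.  Without OpenMP
enabled the pragmas are ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Illustrative type: a length plus a pointer to that many doubles.  */
typedef struct myvec
{
  size_t len;
  double *data;
} myvec_t;

/* Default mapper for myvec_t: map the struct itself together with the
   array section that its data pointer refers to.  */
#pragma omp declare mapper (myvec_t v) map (v, v.data[0:v.len])

/* Sum the elements of V inside a target region.  Mapping V is expected
   to go through the mapper declared above, so v.data is usable on the
   device.  */
double
vec_sum (myvec_t v)
{
  double total = 0.0;
#pragma omp target map (to: v) map (tofrom: total)
  for (size_t i = 0; i < v.len; i++)
    total += v.data[i];
  return total;
}
```

The point of the mapper is that callers only name v in the map clause; the
pointed-to data comes along without each target region spelling out the
array section.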

2023-06-30  Julian Brown  

gcc/c/
* c-decl.cc (c_omp_mapper_id, c_omp_mapper_decl, c_omp_mapper_lookup,
c_omp_extract_mapper_directive, c_omp_map_array_section,
c_omp_scan_mapper_bindings_r, c_omp_scan_mapper_bindings): New
functions.
* c-objc-common.h (LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_MAPPER_LOOKUP, LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define langhooks for C.
* c-parser.cc (c_parser_omp_clause_map): Add KIND parameter.  Handle
mapper modifier.
(c_parser_omp_all_clauses): Update call to c_parser_omp_clause_map with
new kind argument.
(c_parser_omp_target): Instantiate explicit mappers and record bindings
for implicit mappers.
(c_parser_omp_declare_mapper): Parse "declare mapper" directives.
(c_parser_omp_declare): Support "declare mapper".
* c-tree.h (c_omp_finish_mapper_clauses, c_omp_mapper_lookup,
c_omp_extract_mapper_directive, c_omp_map_array_section,
c_omp_mapper_id, c_omp_mapper_decl, c_omp_scan_mapper_bindings,
c_omp_instantiate_mappers): Add prototypes.
* c-typeck.cc (c_finish_omp_clauses): Handle GOMP_MAP_PUSH_MAPPER_NAME
and GOMP_MAP_POP_MAPPER_NAME.
(c_omp_finish_mapper_clauses): New function (langhook).

gcc/testsuite/
* c-c++-common/gomp/declare-mapper-4.c: Enable for C.
* c-c++-common/gomp/declare-mapper-5.c: Likewise.
* c-c++-common/gomp/declare-mapper-6.c: Likewise.
* c-c++-common/gomp/declare-mapper-7.c: Likewise.
* c-c++-common/gomp/declare-mapper-8.c: Likewise.
* c-c++-common/gomp/declare-mapper-9.c: Likewise.
* c-c++-common/gomp/declare-mapper-12.c: Enable for C.
* gcc.dg/gomp/declare-mapper-10.c: New test.
* gcc.dg/gomp/declare-mapper-11.c: New test.

libgomp/
* testsuite/libgomp.c-c++-common/declare-mapper-9.c: Enable for C.
* testsuite/libgomp.c-c++-common/declare-mapper-10.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-11.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-12.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-13.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-14.c: Likewise.
---
 gcc/c/c-decl.cc   | 169 +++
 gcc/c/c-objc-common.h |  12 +
 gcc/c/c-parser.cc | 279 --
 gcc/c/c-tree.h|   8 +
 gcc/c/c-typeck.cc |  15 +
 .../c-c++-common/gomp/declare-mapper-12.c |   2 +-
 .../c-c++-common/gomp/declare-mapper-4.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-5.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-6.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-7.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-8.c  |   2 +-
 .../c-c++-common/gomp/declare-mapper-9.c  |   2 +-
 gcc/testsuite/gcc.dg/gomp/declare-mapper-10.c |  61 
 gcc/testsuite/gcc.dg/gomp/declare-mapper-11.c |  33 +++
 .../libgomp.c-c++-common/declare-mapper-10.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-11.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-12.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-13.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-14.c  |   2 +-
 .../libgomp.c-c++-common/declare-mapper-9.c   |   2 +-
 20 files changed, 573 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/declare-mapper-10.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/declare-mapper-11.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index b82bf5b4a48..ca9c72f99e5 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -13136,6 +13136,175 @@ c_check_omp_declare_reduction_r (tree *tp, int *, 
void *data)
   return NULL_TREE;
 }
 
+/* Return identifier to look up for omp declare mapper.  */
+
+tree
+c_omp_mapper_id (tree mapper_id)
+{
+  const char *p = NULL;
+
+  const char prefix[] = "omp declare mapper ";
+
+  if (mapper_id == NULL_TREE)
+p = "";
+  else if (TREE_CODE (mapper_id) == IDENTIFIER_NODE)
+p = IDENTIFIER_POINTER (mapper_id);
+  else
+return error_mark_node;
+
+  size_t lenp = sizeof (prefix);
+  size_t len = strlen (p);
+  char *name = XALLOCAVEC (char, lenp + len);
+  memcpy (name, prefix, lenp - 1);
+  memcpy (name + lenp - 1, p, len + 1);
+  return get_identifier (name);
+}
+
+/* Lookup MAPPER_ID in the current scope, or create an artificial
+   VAR_DECL, bind it into the current scope and return it.  */
+
+tree
+c_omp_mapper_decl (tree mapper_id)
+{
+  struct c_binding *b = I_SYMBOL_BINDING (mapper_id);
+  if (b != NULL && B_IN_CURRENT_SCOPE (b))
+return b->decl;
+
+  tree decl = build_decl (BUILTINS_LOCATION, VAR_DECL,
+ 

[PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-06-30 Thread Pat Haugen via Gcc-patches

Updated from prior version to address latest review comment (simplify
umod<mode>3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.
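
As a plain C illustration of the div/mul/sub sequence mentioned above (a
sketch of the arithmetic only, not of the RTL the patch emits; the function
names are made up):

```c
#include <stdint.h>

/* Signed remainder without a modulo instruction:
   a % b == a - (a / b) * b, with the division truncating toward zero.
   The caller must ensure b != 0 and avoid INT64_MIN / -1, exactly as
   for the % operator itself.  */
int64_t
mod_via_div (int64_t a, int64_t b)
{
  int64_t quotient = a / b;        /* divide */
  int64_t product = quotient * b;  /* multiply */
  return a - product;              /* subtract */
}

/* Unsigned variant; the caller must ensure b != 0.  */
uint64_t
umod_via_div (uint64_t a, uint64_t b)
{
  return a - (a / b) * b;
}
```

The result matches the C % operator for every input on which % is defined,
which is why the expanders in the diff below can substitute the three-insn
sequence for the single modulo instruction.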

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
scalar modulo.
* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
(define_expand umod<mode>3): New.
(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
(umodti3, modti3): Disable.

gcc/testsuite/
* gcc.target/powerpc/clone1.c: Add xfails.
* gcc.target/powerpc/clone3.c: Likewise.
* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
* gcc.target/powerpc/mod-2.c: Likewise.
* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 07c3a3d15ac..72abf285301 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22157,7 +22157,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
*total = rs6000_cost->divsi;
}
   /* Add in shift and subtract for MOD unless we have a mod instruction. */
-  if (!TARGET_MODULO && (code == MOD || code == UMOD))
+  if ((!TARGET_MODULO
+  || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+&& (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;
 
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
rs6000_asm_output_opcode (STREAM);  \
 }  \
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..555c8525333 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3422,6 +3422,17 @@ (define_expand "mod3"
FAIL;
 
   operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+   {
+ temp1 = gen_reg_rtx (mode);
+ temp2 = gen_reg_rtx (mode);
+
+ emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+ emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+ emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+ DONE;
+   }
 }
   else
 {
@@ -3441,17 +3452,36 @@ (define_insn "*mod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
 (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
 
+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+   (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+ (match_operand:GPR 2 "gpc_reg_operand")))]
+  "TARGET_MODULO"
+{
+  if (RS6000_DISABLE_SCALAR_MODULO)
+{
+  rtx temp1 = gen_reg_rtx (mode);
+  rtx temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+}
+})
 
-(define_insn "umod3"
+(define_insn "*umod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
 (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
@@ -3508,7 +3538,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
   [(set_attr "type" "vecdiv")
(set_attr "size" "128")])
@@ -3517,7 +3547,7 @@ (define_insn "modti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(mod:TI
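
Note on the expander hunks above: when RS6000_DISABLE_SCALAR_MODULO is set, they emit a divide, a multiply and a subtract in place of a single modulo instruction. A minimal C sketch of that identity follows; the function names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Signed remainder computed as a - (a / b) * b, mirroring the
   div/mul/sub sequence the expander emits.  The caller must ensure
   b != 0 and must avoid a == INT64_MIN with b == -1.  */
int64_t mod_via_div(int64_t a, int64_t b)
{
    int64_t quotient = a / b;        /* corresponds to gen_div3 */
    int64_t product = quotient * b;  /* corresponds to gen_mul3 */
    return a - product;              /* corresponds to gen_sub3 */
}

/* Unsigned variant, mirroring the udiv/mul/sub sequence.
   The caller must ensure b != 0.  */
uint64_t umod_via_udiv(uint64_t a, uint64_t b)
{
    uint64_t quotient = a / b;
    uint64_t product = quotient * b;
    return a - product;
}
```

Because C division truncates toward zero, the result has the sign of the dividend, the same as the % operator.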

[PATCH] libstdc++: Enable OpenMP 5.0 pragmas in PSTL headers

2023-06-30 Thread Jonathan Wakely via Gcc-patches
Jakub made a similar change a few years ago, but I think it got lost
in the recent PSTL rebase.

Tested x86_64-linux.

Does this look OK for trunk?

-- >8 --

This reapplies r10-1314-g32bab8b6ad0a90 which was lost in the recent
PSTL rebase from upstream.

* include/pstl/pstl_config.h (_PSTL_PRAGMA_SIMD_SCAN,
_PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN, _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN):
Define to OpenMP 5.0 pragmas even for GCC 10.0+.
(_PSTL_UDS_PRESENT): Define to 1 for GCC 10.0+.
---
 libstdc++-v3/include/pstl/pstl_config.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/pstl/pstl_config.h b/libstdc++-v3/include/pstl/pstl_config.h
index 74d2139c736..ccb9dd32838 100644
--- a/libstdc++-v3/include/pstl/pstl_config.h
+++ b/libstdc++-v3/include/pstl/pstl_config.h
@@ -82,7 +82,8 @@
 #define _PSTL_PRAGMA_FORCEINLINE
 #endif
 
-#if defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 1900
+#if (defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 1900) || \
+(!defined(__INTEL_COMPILER) && _PSTL_GCC_VERSION >= 10)
 #define _PSTL_PRAGMA_SIMD_SCAN(PRM) _PSTL_PRAGMA(omp simd reduction(inscan, PRM))
 #define _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN(PRM) _PSTL_PRAGMA(omp scan inclusive(PRM))
 #define _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(PRM) _PSTL_PRAGMA(omp scan exclusive(PRM))
@@ -126,7 +127,8 @@
 #define _PSTL_UDR_PRESENT
 #endif
 
-#if defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626
+#if (defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626) || \
+(!defined(__INTEL_COMPILER) && _PSTL_GCC_VERSION >= 10)
 #   define _PSTL_UDS_PRESENT
 #endif
 
-- 
2.41.0
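
For readers unfamiliar with the OpenMP 5.0 scan pragmas that these macros expand to, here is a minimal C sketch of an inclusive prefix sum using the same two directives. The function name is hypothetical and the code is not taken from the PSTL headers. The pragmas only take effect when building with -fopenmp or -fopenmp-simd; without those flags they are ignored and the loop computes the same result sequentially.

```c
#include <stddef.h>

/* Inclusive prefix sum: out[i] = in[0] + ... + in[i].
   The two pragmas correspond to what _PSTL_PRAGMA_SIMD_SCAN and
   _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN expand to for a "+" reduction.  */
void inclusive_prefix_sum(const int *in, int *out, size_t n)
{
    int sum = 0;
#pragma omp simd reduction(inscan, + : sum)
    for (size_t i = 0; i < n; ++i) {
        sum += in[i];            /* input phase: update the scan variable */
#pragma omp scan inclusive(sum)
        out[i] = sum;            /* scan phase: read the running total */
    }
}
```

An exclusive scan would instead read the scan variable before the directive and update it afterwards, using "omp scan exclusive(sum)".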



RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-30 Thread Eugene Rozenfeld via Gcc-patches
I don't run this with elevated privileges but I set 
/proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated 
privileges.

If that's not acceptable, the only fix I can think of is to make that event 
mapping threshold percentage a parameter to create_gcov and pass something low 
enough. 80% instead of the current threshold of 95% should work, although it's 
a bit fragile.

Eugene

-Original Message-
From: Sam James  
Sent: Friday, June 30, 2023 1:59 AM
To: Richard Biener 
Cc: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo 
tests and autoprofiledbootstrap


Richard Biener via Gcc-patches  writes:

> On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches 
>  wrote:
>>
>> When we collect just user events for autofdo with lbr we get some 
>> events where branch sources are kernel addresses and branch targets 
>> are user addresses. Without kernel MMAP events create_gcov can't make 
>> sense of kernel addresses. Currently create_gcov fails if it can't 
>> map at least 95% of events. We sometimes get below this threshold with just 
>> user events. The change is to collect both user events and kernel events.
>
> Does this require elevated privileges?  Can we instead "fix" create_gcov here?

Right, requiring privileges for this is going to be a no-go for a lot of 
builders. In a distro context, for example, it means we can't consider autofdo 
at all.


RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo tests and autoprofiledbootstrap

2023-06-30 Thread Eugene Rozenfeld via Gcc-patches
I also set /proc/sys/kernel/perf_event_paranoid to 1 instead of the default 2.

-Original Message-
From: Gcc-patches  On 
Behalf Of Eugene Rozenfeld via Gcc-patches
Sent: Friday, June 30, 2023 2:44 PM
To: Sam James ; Richard Biener 
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for 
autofdo tests and autoprofiledbootstrap

I don't run this with elevated privileges but I set 
/proc/sys/kernel/kptr_restrict to 0. Setting that does require elevated 
privileges.

If that's not acceptable, the only fix I can think of is to make that event 
mapping threshold percentage a parameter to create_gcov and pass something low 
enough. 80% instead of the current threshold of 95% should work, although it's 
a bit fragile.

Eugene

-Original Message-
From: Sam James 
Sent: Friday, June 30, 2023 1:59 AM
To: Richard Biener 
Cc: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] Re: [PATCH] Collect both user and kernel events for autofdo 
tests and autoprofiledbootstrap


Richard Biener via Gcc-patches  writes:

> On Fri, Jun 30, 2023 at 7:28 AM Eugene Rozenfeld via Gcc-patches 
>  wrote:
>>
>> When we collect just user events for autofdo with lbr we get some 
>> events where branch sources are kernel addresses and branch targets 
>> are user addresses. Without kernel MMAP events create_gcov can't make 
>> sense of kernel addresses. Currently create_gcov fails if it can't 
>> map at least 95% of events. We sometimes get below this threshold with just 
>> user events. The change is to collect both user events and kernel events.
>
> Does this require elevated privileges?  Can we instead "fix" create_gcov here?

Right, requiring privileges for this is going to be a no-go for a lot of 
builders. In a distro context, for example, it means we can't consider autofdo 
at all.


Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-30 Thread Carl Love via Gcc-patches
Kewen:

On Fri, 2023-06-30 at 11:37 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/30 05:36, Carl Love wrote:
> > Kewen:
> > 
> > On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > > > Yea, I was going with a runnable test and didn't include the
> > > > instruction counts.  Added back in.  Rather than doing it by
> > > > processor version (P8, P9, P10) I was able to do it by BE/LE.
> > > > The instruction counts were the same for LE across processor
> > > > versions but there are a few instruction counts that vary with
> > > > BE and LE.
> > > 
> > > But the original test case only checks for cpu-types (processor
> > > version) but not for endianness, which means that for the bif
> > > usages there should be no difference between endiannesses.  Why
> > > does this change with your new test case?  Could you have a
> > > further look and make it consistent with some adjustment if
> > > possible?  As we know, checking insn counts is sometimes fragile,
> > > so I think we should try our best to make it as robust as
> > > possible in the first place.
> > > 
> > > Besides, the original case also has some differences between
> > > p7/p8 and p9.
> > >   
> > 
> > There are differences on P8 LE versus BE.  I did a diff between the
> > P8 and P9 tests:
> > 
> >  diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> > 3,4c3,4
> > < /* { dg-require-effective-target powerpc_p8vector_ok } */
> > < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> > ---
> > > /* { dg-require-effective-target powerpc_p9vector_ok } */
> > > /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> > 12c12
> > < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> > ---
> > > /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> > 23d22
> > < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> > 37c36
> > < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> > ---
> > > /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> > 
> > So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
> > xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are
> > different between the two architectures.  I then wrote a script to
> > compile the CPU specific test on Power 8, Power 9 and Power 10
> > architectures and then grep for the above list of instructions.
> > If I run the script on P8 BE and LE I get
> > 
> >              Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
> >              (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
> > instruction  count         count       count       count       count
> > vperm        1             1           0           0           0
> > vpermr       0             0           0           0           0
> > xxpermr      0             0           1           0           1
> > xvmsubadp    1             0           1           1           1
> > xvmsubmdp    0             1           0           0           0
> > xvsubdp      1             1           1           1           1
> > 
> 
> Thanks for looking into this and making this statistics.
> 
> Is there a typo for column nilram?   Otherwise, the below insn check
> 
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> 
> would fail there.

Yes, there is a typo in the nilram column.  The test generates a vperm
instruction.

#if defined (__BIG_ENDIAN__) || defined (_ARCH_PWR9)
  dst[8].d = vec_perm (src0[8].d, src1[8].d, src2[8].uc);
 f74:   e9 3f 00 78     ld      r9,120(r31)
 f78:   39 29 07 00     addi    r9,r9,1792
 f7c:   f5 89 00 01     lxv     vs12,0(r9)
 f80:   e9 3f 00 80     ld      r9,128(r31)
 f84:   39 29 07 00     addi    r9,r9,1792
 f88:   f4 09 00 01     lxv     vs0,0(r9)
 f8c:   e9 3f 00 88     ld      r9,136(r31)
 f90:   39 29 07 00     addi    r9,r9,1792
 f94:   f4 09 00 89     lxv     vs32,128(r9)
 f98:   e9 3f 00 70     ld      r9,112(r31)
 f9c:   39 29 07 00     addi    r9,r9,1792
 fa0:   f0 2c 64 91     xxmr    vs33,vs12
 fa4:   f1 a0 04 91     xxmr    vs45,vs0
 fa8:   10 01 68 2b     vperm   v0,v1,v13,v0
 ...

> 

> > 
> > I had played with putting -Wno-inline on the command line but that
> > didn't seem to make any difference.  However, your suggestion of
> > __attribute__ ((noipa)) does prevent the inlining and we don't get
> > the second copy of the instructions showing up. The inlining eliminated
> > the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.
> 
> -Winline is an option for a warning: "Warn if a function that is
> declared as inline cannot be inlined."  I think what you wanted is
> -fno-inline, and it's good to know noipa helps here.

Yea, my bad.  Didn't read the manual very carefully.  
> 
> > The i

Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-30 Thread Jeff Law via Gcc-patches




On 6/30/23 04:14, Robin Dapp wrote:

The explicit conversions I see are because we need the output of the
conversion in multiple vfmul instructions.  That won't be helped by
the patch you've proposed.


FWIW on my local branch and the patch applied I see that the vfwmuls
are being generated (all of the vfmuls are replaced).


It'll need to be a define_insn_and_split as it's a 3->3 splitter.  The
split will emit the two extensions and the widening multiply as 3
distinct insns.


I tried this and while it worked for the first vfwmul the subsequent
ones are not being combined/optimized.  Now I'm not a combine expert
at all but it looks as if the source float_extends are being deleted

  deferring deletion of insn with uid = 39.
  deferring deletion of insn with uid = 37.

with that pattern successfully matched, while they are only "rescanned"
with the synthetic "single widen" one.  Them being deleted (or rather
absorbed by the vfwmul) no further combination is possible (until after
split?)

This seems to be a fundamental difference between the two approaches.
Maybe the "double widen" pattern can be adjusted to also handle this
or I did something wrong when writing the splitter?

With the "single widen" pattern, however, it works more or less
naturally therefore I'd still suggest going for it.
I'd hoped to have time to revisit all of this today, but I'm quickly 
running out of time.


There has to be some kind of mismatch between the patch or testcase or 
what we're looking at to judge success.


Monday and Tuesday are holidays in the US.  Naturally that means the 
rest of my work week is going to be busier than normal.  I don't want to 
hold things up unnecessarily.


While I really don't see the need to have the bridge pattern, I'm still 
willing to believe that I've missed something, which is why I wanted to 
dive into it myself.  For example, we have heuristics to avoid trying 
too many 4->n combine patterns and we might be tripping over that or who 
knows what.


So my suggestion is that if both of you are getting the desired code, 
then Robin handle the review side of the two patches that introduce the 
helper patterns.


Jeff


[PATCH] c-family: Implement pragma_lex () for preprocess-only mode

2023-06-30 Thread Lewis Hyatt via Gcc-patches
In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
add a new libcpp callback, on_token_lex (), that ensures the preprocessor
sees these tokens too.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare new function.
* c-opts.cc (c_common_init): Call it.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare new function.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.

libcpp/ChangeLog:

* include/cpplib.h (struct cpp_callbacks): Add new callback
on_token_lex.
* macro.cc (cpp_get_token_1): Support new callback.
---

Notes:
Hello-

In r13-1544, I added support for processing `#pragma GCC diagnostic' in
preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
that patch I called into libcpp directly to obtain the tokens needed to
process the pragma. As part of the review, Jason noted that it would
probably be better to make pragma_lex () usable in preprocess-only mode, and
we decided just to add a comment about that for the time being, and to go
ahead and implement that in the future, if it became necessary to support
other pragmas during preprocessing.

I think now is a good time to proceed with that plan, because I would like
to fix PR87299, which is about another pragma (#pragma GCC target) not
working in preprocess-only mode. This patch makes the necessary changes for
pragma_lex () to work in preprocess-only mode.

I have also added a new callback, on_token_lex (), to libcpp. This is so the
preprocessor can see and stream out all the tokens that pragma_lex () gets
from libcpp, since it won't otherwise see them.  This seemed the simplest
approach to me. Another possibility would be to add a wrapper function in
c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
also stream the token in preprocess-only mode, and then change all calls
into libcpp in that file to use the wrapper function.  The libcpp callback
seemed cleaner to me FWIW.

There are no new tests added here, since it's just a change of
implementation covered by existing tests. Bootstrap + regtest all languages
looks good on x86-64 Linux.

Please let me know what you think? Thanks!

-Lewis

 gcc/c-family/c-common.h  |  3 +++
 gcc/c-family/c-opts.cc   |  1 +
 gcc/c-family/c-pragma.cc | 56 ++--
 gcc/c-family/c-pragma.h  |  2 ++
 gcc/c/c-parser.cc| 34 
 gcc/cp/parser.cc | 50 +++
 libcpp/include/cpplib.h  |  4 +++
 libcpp/macro.cc  |  3 +++
 8 files changed, 105 insertions(+), 48 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..78fc5248ba6 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void c_in

[PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Vineet Gupta
Ran into a minor snafu in const splitting code when playing with test
case from an old PR/23813.

long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }

This currently generates

li      a5,-252645376
addi    a5,a5,241
li      a0,-252645376
slli    a5,a5,32
addi    a0,a0,240
add     a0,a5,a0
ret

The signed math in hival extraction introduces an additional bit,
causing loval == hival check to fail.

| riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
| 702 unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
| (gdb) n
| 703 unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
| (gdb)
| 704 rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
| (gdb) p/x val
| $2 = 0xf0f0f0f0f0f0f0f0
| (gdb) p/x loval
| $3 = 0xf0f0f0f0
| (gdb) p/x hival
| $4 = 0xf0f0f0f1
   ^^^
Fix that by eliding the subtraction in shift.

With patch:

li      a5,-252645376
addi    a5,a5,240
slli    a0,a5,32
add     a0,a0,a5
ret

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): Elide subtraction of
  loval in hival computation.
* (riscv_split_integer_cost): Ditto.
* (riscv_build_integer): Ditto.

Signed-off-by: Vineet Gupta 
---
I wasn't planning to do any more work on large const stuff, but I just ran
into this on a random BZ entry while searching for redundant constant stuff.
The test seemed too good to pass up :-)
---
 gcc/config/riscv/riscv.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5ac187c1b1b4..377d3aac794b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
   && (value > INT32_MAX || value < INT32_MIN))
 {
   unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
   struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
   struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
   int hi_cost, lo_cost;
@@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
 {
   int cost;
   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
   struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
 
   cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
@@ -700,7 +700,7 @@ static rtx
 riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
 {
   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
 
   riscv_move_integer (lo, lo, loval, mode);
-- 
2.34.1



Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Vineet Gupta




On 6/30/23 16:33, Vineet Gupta wrote:

Ran into a minor snafu in const splitting code when playing with test
case from an old PR/23813.

long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }

This currently generates

li      a5,-252645376
addi    a5,a5,241
li      a0,-252645376
slli    a5,a5,32
addi    a0,a0,240
add     a0,a5,a0
ret

The signed math in hival extraction introduces an additional bit,
causing loval == hival check to fail.

| riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
../gcc/config/riscv/riscv.cc:702
| 702 unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
| (gdb)n
| 703 unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
| (gdb)


FWIW (and I missed adding this observation to the changelog) I pondered 
about using unsigned loval/hival with zext_hwi() but that in certain 
cases can cause additional insns


e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI 
0x_8000




| 704 rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
| (gdb) p/x val
| $2 = 0xf0f0f0f0f0f0f0f0
| (gdb) p/x loval
| $3 = 0xf0f0f0f0
| (gdb) p/x hival
| $4 = 0xf0f0f0f1
^^^
Fix that by eliding the subtraction in shift.

With patch:

li      a5,-252645376
addi    a5,a5,240
slli    a0,a5,32
add     a0,a0,a5
ret

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): hival computation
  do elide subtraction of loval.
* (riscv_split_integer_cost): Ditto.
* (riscv_build_integer): Ditto

Signed-off-by: Vineet Gupta 
---
I wasn't planning to do any more work on large const stuff, but just ran into 
it this
on a random BZ entry when trying search for redundant constant stuff.
The test seemed too good to pass :-)
---
  gcc/config/riscv/riscv.cc | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5ac187c1b1b4..377d3aac794b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
&& (value > INT32_MAX || value < INT32_MIN))
  {
unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
int hi_cost, lo_cost;
@@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
  {
int cost;
unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
  
cost = 2 + riscv_build_integer (codes, loval, VOIDmode);

@@ -700,7 +700,7 @@ static rtx
  riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
  {
unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
  
riscv_move_integer (lo, lo, loval, mode);




Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Andrew Waterman via Gcc-patches
I don't believe this is correct; the subtraction is needed to account
for the fact that the low part might be negative, resulting in a
borrow from the high part.  See the output for your test case below:

$ cat test.c
#include <stdio.h>

int main()
{
  unsigned long result, tmp;

asm (
  "li   %1,-252645376\n"
  "addi %1,%1,240\n"
  "slli %0,%1,32\n"
  "add  %0,%0,%1"
: "=r" (result), "=r" (tmp));

  printf("%lx\n", result);

  return 0;
}
$ riscv64-unknown-elf-gcc -O2 test.c
$ spike pk a.out
bbl loader
f0f0f0eff0f0f0f0
$
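
The borrow can also be checked on the host without a RISC-V simulator. The following is a minimal C sketch, assuming a 64-bit HOST_WIDE_INT; sext64 is a hypothetical stand-in for GCC's sext_hwi, and the other helper names are likewise made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of X to 64 bits, kept in an unsigned
   type so that all arithmetic wraps modulo 2^64.  */
static uint64_t sext64(uint64_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    if (bits == 64)
        return x;
    uint64_t sign = (uint64_t)1 << (bits - 1);
    x &= (sign << 1) - 1;       /* keep only the low BITS bits */
    return (x ^ sign) - sign;   /* propagate the sign bit upward */
}

/* High half as computed before the patch: subtracting loval first
   compensates for the borrow caused by a negative low half.  */
static uint64_t hival_with_sub(uint64_t val)
{
    uint64_t loval = sext64(val, 32);
    return sext64((val - loval) >> 32, 32);
}

/* High half as computed by the proposed patch.  */
static uint64_t hival_without_sub(uint64_t val)
{
    return sext64(val >> 32, 32);
}

/* Recombine the halves the way the generated code does:
   shift the high half left by 32, then add the low half.  */
static uint64_t recombine(uint64_t hival, uint64_t loval)
{
    return (hival << 32) + loval;
}
```

For val = 0xF0F0F0F0F0F0F0F0 the low half sign-extends to a negative number, so recombining with hival_with_sub gives back val, while recombining with hival_without_sub gives 0xF0F0F0EFF0F0F0F0, the same value the spike run above prints.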


On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta  wrote:
>
>
>
> On 6/30/23 16:33, Vineet Gupta wrote:
> > Ran into a minor snafu in const splitting code when playing with test
> > case from an old PR/23813.
> >
> >   long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >
> > This currently generates
> >
> >   li      a5,-252645376
> >   addi    a5,a5,241
> >   li      a0,-252645376
> >   slli    a5,a5,32
> >   addi    a0,a0,240
> >   add     a0,a5,a0
> >   ret
> >
> > The signed math in hival extraction introduces an additional bit,
> > causing loval == hival check to fail.
> >
> > | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
> > ../gcc/config/riscv/riscv.cc:702
> > | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > | (gdb)n
> > | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > | (gdb)
>
> FWIW (and I missed adding this observation to the changelog) I pondered
> about using unsigned loval/hival with zext_hwi() but that in certain
> cases can cause additional insns
>
> e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI
> 0x_8000
>
>
> > | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> > | (gdb) p/x val
> > | $2 = 0xf0f0f0f0f0f0f0f0
> > | (gdb) p/x loval
> > | $3 = 0xf0f0f0f0
> > | (gdb) p/x hival
> > | $4 = 0xf0f0f0f1
> > ^^^
> > Fix that by eliding the subtraction in shift.
> >
> > With patch:
> >
> >   li      a5,-252645376
> >   addi    a5,a5,240
> >   slli    a0,a5,32
> >   add     a0,a0,a5
> >   ret
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.cc (riscv_split_integer): hival computation
> > do elide subtraction of loval.
> >   * (riscv_split_integer_cost): Ditto.
> >   * (riscv_build_integer): Ditto
> >
> > Signed-off-by: Vineet Gupta 
> > ---
> > I wasn't planning to do any more work on large const stuff, but just ran 
> > into it this
> > on a random BZ entry when trying search for redundant constant stuff.
> > The test seemed too good to pass :-)
> > ---
> >   gcc/config/riscv/riscv.cc | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 5ac187c1b1b4..377d3aac794b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
> > HOST_WIDE_INT value,
> > && (value > INT32_MAX || value < INT32_MIN))
> >   {
> > unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
> > -  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
> > +  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
> > struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
> > struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
> > int hi_cost, lo_cost;
> > @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
> >   {
> > int cost;
> > unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> > struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
> >
> > cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
> > @@ -700,7 +700,7 @@ static rtx
> >   riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
> >   {
> > unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> > rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >
> > riscv_move_integer (lo, lo, loval, mode);
>


Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-30 Thread Carl Love via Gcc-patches
Kewen:

On Fri, 2023-06-30 at 15:20 -0700, Carl Love wrote:
> So, went to look at the assembly to verify my comment on the
> difference
> being related to the loads. I decided to actually count the
> instructions just to verify the number in the assembly files. 
> Before,
> I just looked at the assembly briefly but didn't dig in very deep.
> 
> If I compile the tests and dump the assembly with:
>   gcc -g -mcpu=power8 -o vsx-vector-6-func-2lop vsx-vector-6-func-2lop.c
> 
>   objdump -S -d vsx-vector-6-func-2lop > vsx-vector-6-func-2lop.dump
>   
>   grep xxlor vsx-vector-6-func-2lop.dump | wc
>   4  28 192
> 
> So we see 4 xxlor instructions, not 32 as expected for BE or 22 as
> expected for LE as the test claims.  I get the same count of 4 on
> both makalu and genoa. 

With a little help from Peter and Julian Wang.  Objdump decodes some of
the xxlor instructions as xxmr instsructions.  The xxmr is a new
mnemonic which will be out in the next ISA.  But objdump already
produces it.  So if you add the counts for grep xxlor and grep xxmr you
get a total of 34 which agress with the count of xxlor in the gcc -S
generated assembly.

  Carl 



Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-30 Thread Peter Bergner via Gcc-patches
On 6/30/23 5:20 PM, Carl Love wrote:
> So, we have the issue that looking at the assembly gives different
> instruction counts than what 
> 
>dg-final { scan-assembler-times {\mxxlor\M} }
> 
> comes up with???

I recommend not even counting xxlor at all, since the majority of
them come from vsx register copies and whether and how many we
generate seemingly varies with the phase of the moon, day of the
week, etc. etc. 

If you really want to verify an xxlor count, you almost have to extract
the given test into its own file so it's not corrupted by any of the
other tests and it has to be as small as possible and compiled with
a fair amount of optimization.  Even then you may get some copies.
So I'd recommend just removing the xxlor counts altogether.

Peter




Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-30 Thread Peter Bergner via Gcc-patches
On 6/30/23 6:50 PM, Carl Love wrote:
> With a little help from Peter and Julian Wang: objdump decodes some of
> the xxlor instructions as xxmr instructions.  The xxmr is a new
> mnemonic which will be out in the next ISA, but objdump already
> produces it.  So if you add the counts for grep xxlor and grep xxmr you
> get a total of 34, which agrees with the count of xxlor in the gcc -S
> generated assembly.

Right, xxmr is new and objdump defaults to emitting it for xxlor's used
as copies.   You can use the -Mraw objdump option to display the real
mnemonics instead of any extended mnemonics.

Peter





Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Vineet Gupta




On 6/30/23 16:50, Andrew Waterman wrote:

I don't believe this is correct; the subtraction is needed to account
for the fact that the low part might be negative, resulting in a
borrow from the high part.  See the output for your test case below:

$ cat test.c
#include <stdio.h>

int main()
{
   unsigned long result, tmp;

asm (
   "li   %1,-252645376\n"
   "addi %1,%1,240\n"
   "slli %0,%1,32\n"
   "add  %0,%0,%1"
 : "=r" (result), "=r" (tmp));

   printf("%lx\n", result);

   return 0;
}
$ riscv64-unknown-elf-gcc -O2 test.c
$ spike pk a.out
bbl loader
f0f0f0eff0f0f0f0
$


Thx for the quick feedback Andrew. I'm clearly lacking in signed math :-(
So is it possible to have a better code seq for the testcase at all ?

-Vineet




On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta  wrote:



On 6/30/23 16:33, Vineet Gupta wrote:

Ran into a minor snafu in const splitting code when playing with test
case from an old PR/23813.

   long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }

This currently generates

   li      a5,-252645376
   addi    a5,a5,241
   li      a0,-252645376
   slli    a5,a5,32
   addi    a0,a0,240
   add     a0,a5,a0
   ret

The signed math in hival extraction introduces an additional bit,
causing loval == hival check to fail.

| riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
../gcc/config/riscv/riscv.cc:702
| 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
| (gdb)n
| 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
| (gdb)

FWIW (and I missed adding this observation to the changelog) I pondered
about using unsigned loval/hival with zext_hwi() but that in certain
cases can cause additional insns

e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI
0x_8000



| 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
| (gdb) p/x val
| $2 = 0xf0f0f0f0f0f0f0f0
| (gdb) p/x loval
| $3 = 0xf0f0f0f0
| (gdb) p/x hival
| $4 = 0xf0f0f0f1
 ^^^
Fix that by eliding the subtraction in shift.

With patch:

   li      a5,-252645376
   addi    a5,a5,240
   slli    a0,a5,32
   add     a0,a0,a5
   ret

gcc/ChangeLog:

   * config/riscv/riscv.cc (riscv_split_integer): Elide subtraction
   of loval in hival computation.
   (riscv_split_integer_cost): Ditto.
   (riscv_build_integer): Ditto.

Signed-off-by: Vineet Gupta 
---
I wasn't planning to do any more work on large const stuff, but just ran into
this on a random BZ entry when trying to search for redundant constant stuff.
The test seemed too good to pass :-)
---
   gcc/config/riscv/riscv.cc | 6 +++---
   1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5ac187c1b1b4..377d3aac794b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
 && (value > INT32_MAX || value < INT32_MIN))
   {
 unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
 struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
 struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
 int hi_cost, lo_cost;
@@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
   {
 int cost;
 unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
 struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];

 cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
@@ -700,7 +700,7 @@ static rtx
   riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
   {
 unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
-  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
 rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);

 riscv_move_integer (lo, lo, loval, mode);




Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Andrew Waterman via Gcc-patches
On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta  wrote:
>
>
>
> On 6/30/23 16:50, Andrew Waterman wrote:
> > I don't believe this is correct; the subtraction is needed to account
> > for the fact that the low part might be negative, resulting in a
> > borrow from the high part.  See the output for your test case below:
> >
> > $ cat test.c
> > #include <stdio.h>
> >
> > int main()
> > {
> >    unsigned long result, tmp;
> >
> > asm (
> >    "li      %1,-252645376\n"
> >    "addi    %1,%1,240\n"
> >    "slli    %0,%1,32\n"
> >    "add     %0,%0,%1"
> >  : "=r" (result), "=r" (tmp));
> >
> >printf("%lx\n", result);
> >
> >return 0;
> > }
> > $ riscv64-unknown-elf-gcc -O2 test.c
> > $ spike pk a.out
> > bbl loader
> > f0f0f0eff0f0f0f0
> > $
>
> Thx for the quick feedback Andrew. I'm clearly lacking in signed math :-(
> So is it possible to have a better code seq for the testcase at all?

You're welcome!

When Zba is implemented, then inserting a zext.w would do the trick;
see below.  (The generalization is that the zext.w is needed if the
32-bit constant is negative.)  When Zba is not implemented, I think
the original sequence is optimal.

li      a5, -252645376
addi    a5, a5, 240
slli    a0, a5, 32
zext.w  a5, a5
add     a0, a0, a5
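
To see why the zext.w matters, the sequence can be modelled with plain 64-bit arithmetic. This is only a sketch under the assumption that each instruction behaves as on RV64 (li sign-extends its 32-bit immediate, zext.w clears the upper 32 bits); `li32`, `zext_w` and `repeat_low_word` are names made up for the illustration.

```c
#include <stdint.h>

/* "li" of a 32-bit immediate on RV64: the value is sign-extended. */
static uint64_t li32(int32_t imm)
{
    return (uint64_t)(int64_t)imm;
}

/* Zba zext.w: keep the low 32 bits, clear the upper 32. */
static uint64_t zext_w(uint64_t x)
{
    return x & UINT64_C(0xffffffff);
}

/* Model of the sequence, with the zext.w made optional. */
static uint64_t repeat_low_word(int use_zext_w)
{
    uint64_t a5 = li32(-252645376);   /* li      a5, -252645376 */
    a5 += 240;                        /* addi    a5, a5, 240    */
    uint64_t a0 = a5 << 32;           /* slli    a0, a5, 32     */
    if (use_zext_w)
        a5 = zext_w(a5);              /* zext.w  a5, a5         */
    return a0 + a5;                   /* add     a0, a0, a5     */
}
```

With the zext.w the model yields `0xf0f0f0f0f0f0f0f0`; without it the sign-extended upper half of a5 is added in as well and the result is `0xf0f0f0eff0f0f0f0`.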


>
> -Vineet
>
> >
> >
> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta  wrote:
> >>
> >>
> >> On 6/30/23 16:33, Vineet Gupta wrote:
> >>> Ran into a minor snafu in const splitting code when playing with test
> >>> case from an old PR/23813.
> >>>
> >>>long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >>>
> >>> This currently generates
> >>>
> >>>    li      a5,-252645376
> >>>    addi    a5,a5,241
> >>>    li      a0,-252645376
> >>>    slli    a5,a5,32
> >>>    addi    a0,a0,240
> >>>    add     a0,a5,a0
> >>>    ret
> >>>
> >>> The signed math in hival extraction introduces an additional bit,
> >>> causing loval == hival check to fail.
> >>>
> >>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
> >>> ../gcc/config/riscv/riscv.cc:702
> >>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >>> | (gdb)n
> >>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >>> | (gdb)
> >> FWIW (and I missed adding this observation to the changelog) I pondered
> >> about using unsigned loval/hival with zext_hwi() but that in certain
> >> cases can cause additional insns
> >>
> >> e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI
> >> 0x_8000
> >>
> >>
> >>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >>> | (gdb) p/x val
> >>> | $2 = 0xf0f0f0f0f0f0f0f0
> >>> | (gdb) p/x loval
> >>> | $3 = 0xf0f0f0f0
> >>> | (gdb) p/x hival
> >>> | $4 = 0xf0f0f0f1
> >>>  ^^^
> >>> Fix that by eliding the subtraction in shift.
> >>>
> >>> With patch:
> >>>
> >>>    li      a5,-252645376
> >>>    addi    a5,a5,240
> >>>    slli    a0,a5,32
> >>>    add     a0,a0,a5
> >>>    ret
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* config/riscv/riscv.cc (riscv_split_integer): hival computation
> >>>  do elide subtraction of loval.
> >>>* (riscv_split_integer_cost): Ditto.
> >>>* (riscv_build_integer): Ditto
> >>>
> >>> Signed-off-by: Vineet Gupta 
> >>> ---
> >>> I wasn't planning to do any more work on large const stuff, but just ran 
> >>> into it this
> >>> on a random BZ entry when trying search for redundant constant stuff.
> >>> The test seemed too good to pass :-)
> >>> ---
> >>>gcc/config/riscv/riscv.cc | 6 +++---
> >>>1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >>> index 5ac187c1b1b4..377d3aac794b 100644
> >>> --- a/gcc/config/riscv/riscv.cc
> >>> +++ b/gcc/config/riscv/riscv.cc
> >>> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
> >>> HOST_WIDE_INT value,
> >>>  && (value > INT32_MAX || value < INT32_MIN))
> >>>{
> >>>  unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 
> >>> 32);
> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
> >>>  struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
> >>>  struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
> >>>  int hi_cost, lo_cost;
> >>> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
> >>>{
> >>>  int cost;
> >>>  unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >>>  struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
> >>>
> >>>  cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
> >>> @@ -700,7 +700,7 @@ static rtx
> >>>riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
>

Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Palmer Dabbelt

On Fri, 30 Jun 2023 17:25:54 PDT (-0700), Andrew Waterman wrote:

On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta  wrote:




On 6/30/23 16:50, Andrew Waterman wrote:
> I don't believe this is correct; the subtraction is needed to account
> for the fact that the low part might be negative, resulting in a
> borrow from the high part.  See the output for your test case below:
>
> $ cat test.c
> #include <stdio.h>
>
> int main()
> {
>    unsigned long result, tmp;
>
> asm (
>    "li      %1,-252645376\n"
>    "addi    %1,%1,240\n"
>    "slli    %0,%1,32\n"
>    "add     %0,%0,%1"
>  : "=r" (result), "=r" (tmp));
>
>printf("%lx\n", result);
>
>return 0;
> }
> $ riscv64-unknown-elf-gcc -O2 test.c
> $ spike pk a.out
> bbl loader
> f0f0f0eff0f0f0f0
> $

Thx for the quick feedback Andrew. I'm clearly lacking in signed math :-(
So is it possible to have a better code seq for the testcase at all?


You're welcome!

When Zba is implemented, then inserting a zext.w would do the trick;
see below.  (The generalization is that the zext.w is needed if the
32-bit constant is negative.)  When Zba is not implemented, I think
the original sequence is optimal.

li      a5, -252645376
addi    a5, a5, 240
slli    a0, a5, 32
zext.w  a5, a5
add     a0, a0, a5


For the non-Zba case, I think we can leverage the two high parts 
starting out the same to save an instruction generating the constant.  
So for the original code sequence of 


   li      a5,-252645376
   addi    a5,a5,241
   li      a0,-252645376
   slli    a5,a5,32
   addi    a0,a0,240
   add     a0,a5,a0
   ret

we could instead generate

   li      a5,-252645376
   addi    a0,a5,240
   addi    a5,a5,241
   slli    a5,a5,32
   add     a0,a5,a0
   ret

which IIUC produces the same result.  I think something along the lines 
of this (with the corresponding cost function updates) would do it


   diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
   index de578b5b899..32b6033a966 100644
   --- a/gcc/config/riscv/riscv.cc
   +++ b/gcc/config/riscv/riscv.cc
   @@ -704,7 +704,13 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode 
mode)
  rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);

  riscv_move_integer (hi, hi, hival, mode);

   -  riscv_move_integer (lo, lo, loval, mode);
   +  if (riscv_integer_cost (loval - hival) + 1 < riscv_integer_cost (loval)) {
   +rtx delta = gen_reg_rtx (mode);
   +riscv_move_integer (delta, delta, loval - hival, mode);
   +lo = gen_rtx_fmt_ee (PLUS, mode, hi, delta);
   +  } else {
   +riscv_move_integer (lo, lo, loval, mode);
   +  }

  hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));

  hi = force_reg (mode, hi);

though I suppose that would produce a slightly different sequence that has the
same number of instructions but a slightly longer dependency chain, something
more like

   li      a5,-252645376
   addi    a5,a5,241
   addi    a0,a5,-1
   slli    a5,a5,32
   add     a0,a5,a0
   ret

Take that all with a grain of salt, though, as I just ate some very spicy
chicken and can barely see straight :)
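
As a sanity check on the "IIUC produces the same result" part, the three non-Zba sequences above can be modelled on the host. This is a rough sketch that assumes RV64 semantics (li sign-extends, arithmetic wraps at 64 bits); the function names are invented for the illustration and nothing here is GCC code.

```c
#include <stdint.h>

/* "li" of a 32-bit immediate on RV64: the value is sign-extended. */
static uint64_t li32(int32_t imm)
{
    return (uint64_t)(int64_t)imm;
}

/* Current output: two separate li instructions, 6 insns before ret. */
static uint64_t original_sequence(void)
{
    uint64_t a5 = li32(-252645376);   /* li   a5,-252645376 */
    a5 += 241;                        /* addi a5,a5,241     */
    uint64_t a0 = li32(-252645376);   /* li   a0,-252645376 */
    a5 <<= 32;                        /* slli a5,a5,32      */
    a0 += 240;                        /* addi a0,a0,240     */
    return a5 + a0;                   /* add  a0,a5,a0      */
}

/* Suggested output: both halves derived from one li, 5 insns. */
static uint64_t shared_li_sequence(void)
{
    uint64_t a5 = li32(-252645376);   /* li   a5,-252645376 */
    uint64_t a0 = a5 + 240;           /* addi a0,a5,240     */
    a5 += 241;                        /* addi a5,a5,241     */
    a5 <<= 32;                        /* slli a5,a5,32      */
    return a5 + a0;                   /* add  a0,a5,a0      */
}

/* What the sketched diff would likely emit: lo = hi + (loval - hival). */
static uint64_t delta_sequence(void)
{
    uint64_t a5 = li32(-252645376);   /* li   a5,-252645376 */
    a5 += 241;                        /* addi a5,a5,241     */
    uint64_t a0 = a5 - 1;             /* addi a0,a5,-1      */
    a5 <<= 32;                        /* slli a5,a5,32      */
    return a5 + a0;                   /* add  a0,a5,a0      */
}
```

In this model all three functions return `0xf0f0f0f0f0f0f0f0`, so the shorter sequences are at least arithmetically equivalent to the current one for this constant.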







-Vineet

>
>
> On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta  wrote:
>>
>>
>> On 6/30/23 16:33, Vineet Gupta wrote:
>>> Ran into a minor snafu in const splitting code when playing with test
>>> case from an old PR/23813.
>>>
>>>long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
>>>
>>> This currently generates
>>>
>>>    li      a5,-252645376
>>>    addi    a5,a5,241
>>>    li      a0,-252645376
>>>    slli    a5,a5,32
>>>    addi    a0,a0,240
>>>    add     a0,a5,a0
>>>    ret
>>>
>>> The signed math in hival extraction introduces an additional bit,
>>> causing loval == hival check to fail.
>>>
>>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
../gcc/config/riscv/riscv.cc:702
>>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>>> | (gdb)n
>>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>>> | (gdb)
>> FWIW (and I missed adding this observation to the changelog) I pondered
>> about using unsigned loval/hival with zext_hwi() but that in certain
>> cases can cause additional insns
>>
>> e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI
>> 0x_8000
>>
>>
>>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>>> | (gdb) p/x val
>>> | $2 = 0xf0f0f0f0f0f0f0f0
>>> | (gdb) p/x loval
>>> | $3 = 0xf0f0f0f0
>>> | (gdb) p/x hival
>>> | $4 = 0xf0f0f0f1
>>>  ^^^
>>> Fix that by eliding the subtraction in shift.
>>>
>>> With patch:
>>>
>>>    li      a5,-252645376
>>>    addi    a5,a5,240
>>>    slli    a0,a5,32
>>>    add     a0,a0,a5
>>>    ret
>>>
>>> gcc/ChangeLog:
>>>
>>>* config/riscv/riscv.cc (riscv_split_integer): hival computation
>>>  do elide subtraction of loval.
>>>* (riscv_split

[PATCH ver 2] rs6000, __builtin_set_fpscr_rn add return value

2023-06-30 Thread Carl Love via Gcc-patches


GCC maintainers:

Ver 2,  Went back thru the requirements and emails.  Not sure where I
came up with the requirement for an overloaded version with double
argument.  Removed the overloaded version with the double argument. 
Added the macro to announce if the __builtin_set_fpscr_rn returns a
void or a double with the FPSCR bits.  Updated the documentation file. 
Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
file.  Per request, the original test file functionality was not
changed.  Just changed the name from test_fpscr_rn_builtin.c to 
test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
new test file, test_fpscr_rn_builtin_2.c.

The GLibC team requested a builtin to replace the mffscrn and
mffscrni inline asm instructions in the GLibC code.  Previously there
was discussion on adding builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be better to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void. 
So, changing the return type to double and returning the  FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backwardly
compatible and work for all ISAs.

The following patch changes the return type of the
 __builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.
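
A caller that wants to work with both the old and the new return type might look roughly like the sketch below. It assumes, as the description above suggests, that the returned double carries the FPSCR image in its bit pattern with RN in the two least significant bits; `rn_from_fpscr_image` and `swap_rounding_mode` are hypothetical helper names, and only `__builtin_set_fpscr_rn` and `__SET_FPSCR_RN_RETURNS_FPSCR__` come from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Pull the 2-bit RN field out of a double whose bit pattern holds
   the FPSCR image (assumption: RN is the two least significant bits). */
static unsigned rn_from_fpscr_image(double fpscr_image)
{
    uint64_t bits;
    memcpy(&bits, &fpscr_image, sizeof bits);
    return (unsigned)(bits & 0x3);
}

#if defined(__SET_FPSCR_RN_RETURNS_FPSCR__)
/* Only compiled when the compiler announces the double-returning builtin.
   Sets the new rounding mode and reports the one that was in effect. */
static unsigned swap_rounding_mode(int new_rn)
{
    double old_fields = __builtin_set_fpscr_rn(new_rn);
    return rn_from_fpscr_image(old_fields);
}
#endif
```

Code built with an older compiler, where the macro is not defined, would keep calling the builtin for its side effect only and ignore the (void) result.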

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 


--
rs6000, __builtin_set_fpscr_rn add return value

Change the return value from void to double.  The return value consists of
the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
overloaded version which accepts a double argument.

The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
double return value and the new double argument.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
builtin definition return type.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add check, define
__SET_FPSCR_RN_RETURNS_FPSCR__ macro.
* config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
define_expand.
(rs6000_update_fpscr_rn_field): New define_expand.
(rs6000_set_fpscr_rn): Added return argument.  Updated to use new
rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field
define_expands.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value and new double argument.  Add description for
__SET_FPSCR_RN_RETURNS_FPSCR__ macro.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
test_fpscr_rn_builtin_1.c.  Added comment.
* gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
return value of __builtin_set_fpscr_rn builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   2 +-
 gcc/config/rs6000/rs6000-c.cc |   4 +
 gcc/config/rs6000/rs6000.md   |  87 +++---
 gcc/doc/extend.texi   |  26 ++-
 ...rn_builtin.c => test_fpscr_rn_builtin_1.c} |   6 +
 .../powerpc/test_fpscr_rn_builtin_2.c | 153 ++
 6 files changed, 246 insertions(+), 32 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{test_fpscr_rn_builtin.c => 
test_fpscr_rn_builtin_1.c} (92%)
 create mode 100644 gcc

[PATCH v3 0/3] c++: Track lifetimes in constant evaluation [PR70331,...]

2023-06-30 Thread Nathaniel Shead via Gcc-patches
This is an update of the patch series at
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614811.html

Changes since v2:

- Use a separate 'hash_set' to track expired variables instead of
  adding a flag to 'lang_decl_base'.
- Use 'iloc_sentinel' to propagate location information down to
  subexpressions instead of manually saving and falling back to a
  parent expression's location.
- Update more tests with improved error location information.

Bootstrapped and regtested on x86_64-pc-linux-gnu.

---

Nathaniel Shead (3):
  c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]
  c++: Improve constexpr error for dangling local variables
  c++: Improve location information in constant evaluation

 gcc/cp/constexpr.cc   | 158 +++---
 gcc/cp/semantics.cc   |   5 +-
 gcc/cp/typeck.cc  |   5 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  |  10 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |   8 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |   8 +-
 .../g++.dg/cpp0x/constexpr-delete2.C  |   5 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |   2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |   1 +
 .../g++.dg/cpp0x/constexpr-recursion.C|   6 +-
 gcc/testsuite/g++.dg/cpp0x/overflow1.C|   2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |   5 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-lifetime1.C|  14 ++
 .../g++.dg/cpp1y/constexpr-lifetime2.C|  20 +++
 .../g++.dg/cpp1y/constexpr-lifetime3.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime4.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime5.C|  11 ++
 .../g++.dg/cpp1y/constexpr-tracking-const14.C |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const16.C |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const18.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const19.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const21.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const22.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const3.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const4.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const7.C  |   3 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |   4 +-
 gcc/testsuite/g++.dg/cpp1y/pr68180.C  |   4 +-
 .../g++.dg/cpp1z/constexpr-lambda6.C  |   4 +-
 .../g++.dg/cpp1z/constexpr-lambda8.C  |   5 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast11.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast12.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast14.C   |  14 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-98122.C  |   4 +-
 .../g++.dg/cpp2a/constexpr-dynamic17.C|   5 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-init1.C  |   5 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-new12.C  |   6 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/constinit10.C  |   5 +-
 .../g++.dg/cpp2a/is-corresponding-member4.C   |   4 +-
 gcc/testsuite/g++.dg/ext/constexpr-vla2.C |   4 +-
 gcc/testsuite/g++.dg/ext/constexpr-vla3.C |   4 +-
 gcc/testsuite/g++.dg/ubsan/pr63956.C  |  23 +--
 .../g++.dg/warn/Wreturn-local-addr-6.C|   3 -
 .../25_algorithms/equal/constexpr_neg.cc  |   7 +-
 .../testsuite/26_numerics/gcd/105844.cc   |  10 +-
 .../testsuite/26_numerics/lcm/105844.cc   |  14 +-
 48 files changed, 330 insertions(+), 143 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C

-- 
2.41.0



[PATCH v3 1/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

2023-06-30 Thread Nathaniel Shead via Gcc-patches
This adds rudimentary lifetime tracking in C++ constexpr contexts,
allowing the compiler to report errors with using values after their
backing has gone out of scope. We don't yet handle other ways of
accessing values outside their lifetime (e.g. following explicit
destructor calls).

PR c++/96630
PR c++/98675
PR c++/70331

gcc/cp/ChangeLog:

* constexpr.cc (constexpr_global_ctx::remove_value): Mark value as
outside lifetime.
(find_expired_values): New function.
(outside_lifetime_error): New function.
(cxx_eval_call_expression): Don't cache calls that return references to
values outside their lifetime.
(cxx_eval_constant_expression): Add checks for out-of-lifetime values.
Forget local variables at end of bind expressions, and temporaries
after cleanup points.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime1.C: New test.
* g++.dg/cpp1y/constexpr-lifetime2.C: New test.
* g++.dg/cpp1y/constexpr-lifetime3.C: New test.
* g++.dg/cpp1y/constexpr-lifetime4.C: New test.
* g++.dg/cpp1y/constexpr-lifetime5.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc   | 112 ++
 .../g++.dg/cpp1y/constexpr-lifetime1.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime2.C|  20 
 .../g++.dg/cpp1y/constexpr-lifetime3.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime4.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime5.C|  11 ++
 6 files changed, 160 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index cca0435bafc..bc59b4aab67 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1165,6 +1165,8 @@ public:
   hash_set<tree> *modifiable;
   /* Number of heap VAR_DECL deallocations.  */
   unsigned heap_dealloc_count;
+  /* Values that are not within their lifetime.  */
+  hash_set<tree> outside_lifetime;
   /* Constructor.  */
   constexpr_global_ctx ()
 : constexpr_ops_count (0), cleanups (NULL), modifiable (nullptr),
@@ -1188,7 +1190,12 @@ public:
 if (!already_in_map && modifiable)
   modifiable->add (t);
   }
-  void remove_value (tree t) { values.remove (t); }
+  void remove_value (tree t)
+  {
+if (DECL_P (t))
+  outside_lifetime.add (t);
+values.remove (t);
+  }
 };
 
 /* Helper class for constexpr_global_ctx.  In some cases we want to avoid
@@ -2509,6 +2516,22 @@ cxx_eval_dynamic_cast_fn (const constexpr_ctx *ctx, tree 
call,
   return cp_build_addr_expr (obj, complain);
 }
 
+/* Look for expired values in the expression *TP, called through
+   cp_walk_tree.  DATA is ctx->global->outside_lifetime.  */
+
+static tree
+find_expired_values (tree *tp, int *walk_subtrees, void *data)
+{
+  hash_set<tree> *outside_lifetime = (hash_set<tree> *) data;
+
+  if (TYPE_P (*tp))
+*walk_subtrees = 0;
+  else if (outside_lifetime->contains (*tp))
+return *tp;
+
+  return NULL_TREE;
+}
+
 /* Data structure used by replace_decl and replace_decl_r.  */
 
 struct replace_decl_data
@@ -3160,10 +3183,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  for (tree save_expr : save_exprs)
ctx->global->remove_value (save_expr);
 
- /* Remove the parms/result from the values map.  Is it worth
-bothering to do this when the map itself is only live for
-one constexpr evaluation?  If so, maybe also clear out
-other vars from call, maybe in BIND_EXPR handling?  */
+ /* Remove the parms/result from the values map.  */
  ctx->global->remove_value (res);
  for (tree parm = parms; parm; parm = TREE_CHAIN (parm))
ctx->global->remove_value (parm);
@@ -3210,13 +3230,20 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t,
cacheable = false;
}
 
-   /* Rewrite all occurrences of the function's RESULT_DECL with the
-  current object under construction.  */
-   if (!*non_constant_p && ctx->object
-   && CLASS_TYPE_P (TREE_TYPE (res))
-   && !is_empty_class (TREE_TYPE (res)))
- if (replace_decl (&result, res, ctx->object))
-   cacheable = false;
+ /* Also don't cache a call if we return a pointer to an expired
+value.  */
+ if (cacheable && (cp_walk_tree_without_duplicates
+   (&result, find_expired_values,
+&ctx->global->outside_lifetime)))
+   cacheable = false;
+
+ /* Rewrite all occurrences of the function's RESULT

[PATCH v3 2/3] c++: Improve constexpr error for dangling local variables

2023-06-30 Thread Nathaniel Shead via Gcc-patches
Currently, when typeck discovers that a return statement will refer to a
local variable it rewrites to return a null pointer. This causes the
error messages for using the return value in a constant expression to be
unhelpful, especially for reference return values.

This patch removes this "optimisation". Relying on this raises a warning
by default and causes UB anyway, so there should be no issue in doing
so. We also suppress additional warnings from later passes that detect
this as a dangling pointer, since we've already indicated this anyway.

gcc/cp/ChangeLog:

* semantics.cc (finish_return_stmt): Suppress dangling pointer
reporting on return statement if already reported.
* typeck.cc (check_return_expr): Don't set return expression to
zero for dangling addresses.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime5.C: Test reported message is
correct.
* g++.dg/warn/Wreturn-local-addr-6.C: Remove check for return
value optimisation.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/semantics.cc  | 5 -
 gcc/cp/typeck.cc | 5 +++--
 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C | 4 ++--
 gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C | 3 ---
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..107407de513 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1260,7 +1260,10 @@ finish_return_stmt (tree expr)
 
   r = build_stmt (input_location, RETURN_EXPR, expr);
   if (no_warning)
-suppress_warning (r, OPT_Wreturn_type);
+{
+  suppress_warning (r, OPT_Wreturn_type);
+  suppress_warning (r, OPT_Wdangling_pointer_);
+}
   r = maybe_cleanup_point_expr_void (r);
   r = add_stmt (r);
 
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 859b133a18d..47233b3b717 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -11273,8 +11273,9 @@ check_return_expr (tree retval, bool *no_warning)
   else if (!processing_template_decl
   && maybe_warn_about_returning_address_of_local (retval, loc)
   && INDIRECT_TYPE_P (valtype))
-   retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
-build_zero_cst (TREE_TYPE (retval)));
+   /* Suppress the Wdangling-pointer warning in the return statement
+  that would otherwise occur.  */
+   *no_warning = true;
 }
 
   /* A naive attempt to reduce the number of -Wdangling-reference false
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
index a4bc71d890a..ad3ef579f63 100644
--- a/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
@@ -1,11 +1,11 @@
 // { dg-do compile { target c++14 } }
 // { dg-options "-Wno-return-local-addr" }
 
-constexpr const int& id(int x) { return x; }
+constexpr const int& id(int x) { return x; }  // { dg-message "note: declared 
here" }
 
 constexpr bool test() {
   const int& y = id(3);
   return y == 3;
 }
 
-constexpr bool x = test();  // { dg-error "" }
+constexpr bool x = test();  // { dg-error "accessing object outside its 
lifetime" }
diff --git a/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C 
b/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C
index fae8b7e766f..ec8e241d83e 100644
--- a/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C
+++ b/gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C
@@ -24,6 +24,3 @@ return_addr_local_as_intref (void)
 
   return (const intptr_t&)a;   // { dg-warning "\\\[-Wreturn-local-addr]" } */
 }
-
-/* Verify that the return value has been replaced with zero:
-  { dg-final { scan-tree-dump-times "return 0;" 2 "optimized" } } */
-- 
2.41.0



[PATCH v3 3/3] c++: Improve location information in constant evaluation

2023-06-30 Thread Nathaniel Shead via Gcc-patches
This patch updates 'input_location' during constant evaluation to ensure
that errors in subexpressions that lack location information still
provide accurate diagnostics.

By itself this change causes some small regressions in diagnostic
quality for circumstances where errors used 'input_location' but the
location of the parent subexpression doesn't make sense, so this patch
also includes a couple of other small diagnostic improvements to improve
the most egregious cases.

gcc/cp/ChangeLog:

* constexpr.cc (modifying_const_object_error): Find the source
location of the const object's declaration.
(cxx_eval_store_expression): Fall back to the location of the
target object when evaluating initialiser.
(cxx_eval_constant_expression): Update input_location to the location
of the currently evaluated expression.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/equal/constexpr_neg.cc: Update diagnostic
locations.
* testsuite/26_numerics/gcd/105844.cc: Likewise.
* testsuite/26_numerics/lcm/105844.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-48089.C: Update diagnostic locations.
* g++.dg/cpp0x/constexpr-70323.C: Likewise.
* g++.dg/cpp0x/constexpr-70323a.C: Likewise.
* g++.dg/cpp0x/constexpr-delete2.C: Likewise.
* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
* g++.dg/cpp0x/constexpr-ice20.C: Likewise.
* g++.dg/cpp0x/constexpr-recursion.C: Likewise.
* g++.dg/cpp0x/overflow1.C: Likewise.
* g++.dg/cpp1y/constexpr-89285.C: Likewise.
* g++.dg/cpp1y/constexpr-89481.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime1.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime5.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const14.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const16.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const18.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const19.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const21.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const22.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const3.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const4.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const7.C: Likewise.
* g++.dg/cpp1y/constexpr-union5.C: Likewise.
* g++.dg/cpp1y/pr68180.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda8.C: Likewise.
* g++.dg/cpp2a/bit-cast11.C: Likewise.
* g++.dg/cpp2a/bit-cast12.C: Likewise.
* g++.dg/cpp2a/bit-cast14.C: Likewise.
* g++.dg/cpp2a/constexpr-98122.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic17.C: Likewise.
* g++.dg/cpp2a/constexpr-init1.C: Likewise.
* g++.dg/cpp2a/constexpr-new12.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise.
* g++.dg/cpp2a/constinit10.C: Likewise.
* g++.dg/cpp2a/is-corresponding-member4.C: Likewise.
* g++.dg/ext/constexpr-vla2.C: Likewise.
* g++.dg/ext/constexpr-vla3.C: Likewise.
* g++.dg/ubsan/pr63956.C: Likewise.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc   | 46 ++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  | 10 ++--
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |  8 ++--
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |  8 ++--
 .../g++.dg/cpp0x/constexpr-delete2.C  |  5 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  1 +
 .../g++.dg/cpp0x/constexpr-recursion.C|  6 +--
 gcc/testsuite/g++.dg/cpp0x/overflow1.C|  2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |  5 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-lifetime1.C|  1 +
 .../g++.dg/cpp1y/constexpr-lifetime2.C|  4 +-
 .../g++.dg/cpp1y/constexpr-lifetime3.C|  4 +-
 .../g++.dg/cpp1y/constexpr-lifetime4.C|  2 +-
 .../g++.dg/cpp1y/constexpr-lifetime5.C|  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const14.C |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const16.C |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const18.C |  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const19.C |  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const21.C |  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const22.C |  4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const3.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const4.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const7.C  |  3 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |  4 +-
 gcc/testsuite/g++.dg/cpp1y/pr68180.C  |  4 +-
 .../g++.dg/cpp1z/constexpr-lambda6.C  |  4 +-
 .../g++.dg/cpp1z/constexpr-l

[PATCH] lto: Bypass assembler when generating LTO object files. & libiberty: lto: Addition of .symtab in elf file.

2023-06-30 Thread Rishi Raj via Gcc-patches
This series of two patches enables the compiler to output the LTO object
file without invoking an assembler. As of now, .symtab is emitted with the
__gnu_lto_slim symbol. To test, follow the instructions in the commit
message of patch 1. Also, as suggested by Honza, I am putting these patches
on the devel/bypass-asm branch.
From 46c766d242fd248abc49201cf6419735c8415a6f Mon Sep 17 00:00:00 2001
From: Rishi Raj 
Date: Sat, 1 Jul 2023 10:28:11 +0530
Subject: [PATCH 1/2] lto: Bypass assembler when generating LTO object files.

This patch applies Jan Hubicka's previous patch to the current sources.
The compiler is now able to produce object files without an assembler,
although a lot of things are still missing, such as the __gnu_lto_slim
symbol, debug symbols, etc. They will be added in future patches. To test
this patch, use the commands below.
1) ./xgcc -B ./ -O3 a.c -flto -S -fbypass-asm=crtbegin.o  -o a.o
2) ./xgcc -B ./ -O2 a.o -flto
3)  ./a.out

We currently support ELF only (Mach-O, COFF, XCOFF, etc. will be dealt
with later), so this will only work on a Linux machine. I have tested
this on my machine (Arch Linux, Advanced Micro Devices x86-64) and
all LTO test cases passed as expected.
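
For readers skimming the langhooks.cc hunk further down: the bypass branch of
lhd_begin_section opens the output object lazily, on the first call only, and
then forwards every call to the object writer. The following is a minimal
standalone sketch of that pattern, not the code in the patch. The names
out_file, open_out_file, begin_section and open_calls are hypothetical
stand-ins introduced here for illustration; only the "static initialized
flag" shape is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the object file handle; not GCC's lto_file.  */
struct out_file
{
  const char *name;
  int sections_begun;
};

static struct out_file the_file;
static struct out_file *current_out_file; /* NULL until first use.  */
static int open_calls;                    /* How many times we opened.  */

/* Hypothetical stand-in for an "open the object file" routine.  */
static struct out_file *
open_out_file (const char *name)
{
  open_calls++;
  the_file.name = name;
  the_file.sections_begun = 0;
  return &the_file;
}

/* Same shape as the bypass branch: open the output on the first call
   only, then begin the requested section.  */
static void
begin_section (const char *section_name)
{
  static bool initialized = false;
  if (!initialized)
    {
      /* Nothing may have been opened before the first section.  */
      assert (current_out_file == NULL);
      current_out_file = open_out_file ("a.o");
      initialized = true;
    }
  (void) section_name; /* A real writer would record the name.  */
  current_out_file->sections_begun++;
}
```

Calling begin_section several times opens the file once and counts each
section, which is why the early return in the real hook can skip the
assembler-oriented bookkeeping entirely.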

gcc/ChangeLog:

* Makefile.in:
* common.opt:
* langhooks.cc (lhd_begin_section):
(lhd_append_data):
(lhd_end_section):
* lto/lto-object.cc: Moved to...
* lto-object.cc: ...here.
* lto-streamer.h (struct lto_section_slot):
(struct lto_section_list):
(struct lto_file):
(lto_obj_file_open):
(lto_obj_file_close):
(lto_obj_build_section_table):
(lto_obj_create_section_hash_table):
(lto_obj_begin_section):
(lto_obj_append_data):
(lto_obj_end_section):
(lto_set_current_out_file):
(lto_get_current_out_file):
* toplev.cc (compile_file):
(lang_dependent_init):

gcc/lto/ChangeLog:

* Make-lang.in:
* lto-common.cc (lto_file_read):
* lto-lang.cc:
* lto.h (struct lto_file):
(lto_obj_file_open):
(lto_obj_file_close):
(struct lto_section_list):
(lto_obj_build_section_table):
(lto_obj_create_section_hash_table):
(lto_obj_begin_section):
(lto_obj_append_data):
(lto_obj_end_section):
(lto_set_current_out_file):
(lto_get_current_out_file):
(struct lto_section_slot):

Signed-off-by: Rishi Raj 
---
 gcc/Makefile.in |  1 +
 gcc/common.opt  |  3 +++
 gcc/langhooks.cc| 29 +++-
 gcc/{lto => }/lto-object.cc | 29 +---
 gcc/lto-streamer.h  | 35 ++
 gcc/lto/Make-lang.in|  4 ++--
 gcc/lto/lto-common.cc   |  3 ++-
 gcc/lto/lto-lang.cc |  1 +
 gcc/lto/lto.h   | 38 -
 gcc/toplev.cc   | 19 ---
 10 files changed, 110 insertions(+), 52 deletions(-)
 rename gcc/{lto => }/lto-object.cc (94%)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c478ec85201..c9ae222fb59 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1560,6 +1560,7 @@ OBJS = \
  lto-section-out.o \
  lto-opts.o \
  lto-compress.o \
+ lto-object.o \
  mcf.o \
  mode-switching.o \
  modulo-sched.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 25f650e2dae..ba7a18ece8c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1169,6 +1169,9 @@ fbtr-bb-exclusive
 Common Ignore
 Does nothing.  Preserved for backward compatibility.

+fbypass-asm=
+Common Joined Var(flag_bypass_asm)
+
 fcallgraph-info
 Common RejectNegative Var(flag_callgraph_info) Init(NO_CALLGRAPH_INFO);
 Output callgraph information on a per-file basis.
diff --git a/gcc/langhooks.cc b/gcc/langhooks.cc
index 9a1a9eccca9..a76ed974d58 100644
--- a/gcc/langhooks.cc
+++ b/gcc/langhooks.cc
@@ -38,6 +38,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "cgraph.h"
 #include "debug.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "lto-streamer.h"

 /* Do nothing; in many cases the default hook.  */

@@ -817,6 +821,19 @@ lhd_begin_section (const char *name)
 {
   section *section;

+  if (flag_bypass_asm)
+{
+  static int initialized = false;
+  if (!initialized)
+ {
+  gcc_assert (asm_out_file == NULL);
+  lto_set_current_out_file (lto_obj_file_open (asm_file_name, true));
+  initialized = true;
+ }
+  lto_obj_begin_section (name);
+  return;
+}
+
   /* Save the old section so we can restore it in lto_end_asm_section.  */
   gcc_assert (!saved_section);
   saved_section = in_section;
@@ -833,8 +850,13 @@ lhd_begin_section (const char *name)
imple