Re: [PATCH] [x86] define builtins for "shared" avxneconvert-avx512bf16vl builtins.

2022-11-17 Thread Hongtao Liu via Gcc-patches
On Fri, Nov 18, 2022 at 3:50 PM Jakub Jelinek  wrote:
>
> On Fri, Nov 18, 2022 at 09:45:22AM +0800, liuhongt via Gcc-patches wrote:
> > This should fix incorrect error when call those builtin with
> > -mavxneconvert and w/o -mavx512bf16 -mavx512vl.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > Ready to push to trunk.
> >
> > gcc/ChangeLog:
> >
> >   * config/i386/i386-builtins.cc (def_builtin): Hanlde "shared"
>
> Just a nit: s/Hanlde/Handle/
Thanks.
>
> >   avx512bf16vl-avxneconvert builtins.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/avxneconvert-1.c: New test.
>
> Jakub
>


-- 
BR,
Hongtao


Re: [PATCH] [x86] define builtins for "shared" avxneconvert-avx512bf16vl builtins.

2022-11-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 18, 2022 at 09:45:22AM +0800, liuhongt via Gcc-patches wrote:
> This should fix incorrect error when call those builtin with
> -mavxneconvert and w/o -mavx512bf16 -mavx512vl.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ready to push to trunk.
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386-builtins.cc (def_builtin): Hanlde "shared"

Just a nit: s/Hanlde/Handle/

>   avx512bf16vl-avxneconvert builtins.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/avxneconvert-1.c: New test.

Jakub



Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-17 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 17, 2022 at 07:15:05PM -0500, Marek Polacek wrote:
> > --- gcc/cp/decl.cc.jj   2022-11-16 14:44:43.692339668 +0100
> > +++ gcc/cp/decl.cc  2022-11-17 20:53:44.102011594 +0100
> > @@ -5600,6 +5600,57 @@ groktypename (cp_decl_specifier_seq *typ
> >return type;
> >  }
> >  
> > +/* For C++17 and older diagnose static or thread_local decls in constexpr
> > +   or consteval functions.  For C++20 similarly, except if they are
> 
> In C++17 we don't support consteval so I guess drop the "or consteval "?

I just forgot to update the function comment.

Anyway, I think:

> BTW, I notice that the patch breaks
> g++.dg/cpp1y/lambda-generic-func1.C
> g++.dg/cpp1z/constexpr-lambda16.C
> Maybe they just need dg- tweaks.

this is actually a real bug and I'm not sure how to resolve that.

We have there:

int main()
{
  [](auto i) { if (i) { int j; static int k; return i + j; } return i; }(0);
}

and for C++17/20 I presume something (haven't figured out yet what) marks
the lambda operator() when still a template as constexpr and then
cp_finish_decl -> diagnose_static_in_constexpr pedwarns on it.
For the above, perhaps we could figure out that there is a static int k; in the
operator() and not turn it into constexpr, but what if there is
something that would e.g. satisfy decl_maybe_constant_var_p but not
decl_constant_var_p when actually instantiated?
Without my patch, the diagnostic is in start_decl, which isn't called again
during instantiation, so I presume we mark it as constexpr and then we'd
diagnose it during constant evaluation.

Jakub



[PATCH] tree-optimization/107647 - avoid FMA from SLP with -ffp-contract=off

2022-11-17 Thread Richard Biener via Gcc-patches
We can synthesize FMA operations like vfmaddsub231ps only with
-ffp-contract=fast, so properly guard the transform in SLP pattern
detection.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107647
* tree-vect-slp-patterns.cc (addsub_pattern::recognize): Only
allow FMA generation with -ffp-contract=fast for FP types.
(complex_mul_pattern::matches): Likewise.

* gcc.target/i386/pr107647.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr107647.c | 17 +++++++++++++++++
 gcc/tree-vect-slp-patterns.cc            | 15 +++++++++++----
 2 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107647.c

diff --git a/gcc/testsuite/gcc.target/i386/pr107647.c b/gcc/testsuite/gcc.target/i386/pr107647.c
new file mode 100644
index 0000000..45fcb55d698
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107647.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffp-contract=off -mavx2 -mfma" } */
+
+void cscal(int n, float da_r, float *x)
+{
+  for (int i = 0; i < n; i += 4)
+    {
+      float temp0  =  da_r * x[i]   - x[i+1];
+      float temp1  =  da_r * x[i+2] - x[i+3];
+      x[i+1]       =  da_r * x[i+1] + x[i];
+      x[i+3]       =  da_r * x[i+3] + x[i+2];
+      x[i]         =  temp0;
+      x[i+2]       =  temp1;
+    }
+}
+
+/* { dg-final { scan-assembler-not "fma" } } */
diff --git a/gcc/tree-vect-slp-patterns.cc b/gcc/tree-vect-slp-patterns.cc
index dc694b8e531..122d697a809 100644
--- a/gcc/tree-vect-slp-patterns.cc
+++ b/gcc/tree-vect-slp-patterns.cc
@@ -1035,8 +1035,11 @@ complex_mul_pattern::matches (complex_operation_t op,
   auto_vec left_op, right_op;
   slp_tree add0 = NULL;
 
-  /* Check if we may be a multiply add.  */
+  /* Check if we may be a multiply add.  It's only valid to form FMAs
+ with -ffp-contract=fast.  */
   if (!mul0
+  && (flag_fp_contract_mode == FP_CONTRACT_FAST
+ || !FLOAT_TYPE_P (SLP_TREE_VECTYPE (l0node[0])))
   && vect_match_expression_p (l0node[0], PLUS_EXPR))
 {
   auto vals = SLP_TREE_CHILDREN (l0node[0]);
@@ -1501,9 +1504,13 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
 }
 
   /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... }
- (l0add_p), see whether we have FMA variants.  */
-  if (!l0add_p
-  && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
+ (l0add_p), see whether we have FMA variants.  We can only form FMAs
+ if allowed via -ffp-contract=fast.  */
+  if (flag_fp_contract_mode != FP_CONTRACT_FAST
+      && FLOAT_TYPE_P (SLP_TREE_VECTYPE (l0node)))
+    ;
+  else if (!l0add_p
+           && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
 {
   /* (c * d) -+ a */
   if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node))
-- 
2.35.3
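
For readers wondering why contraction has to be guarded at all: a fused
multiply-add rounds once, while a separate multiply and add round twice, so
the two can give different results.  The following is a minimal sketch in C
and is not part of the patch.  The helper names are made up for
illustration, and the single-rounding variant is emulated by widening to
double rather than by using a real FMA instruction.

```c
#include <assert.h>
#include <float.h>

/* Multiply, round to float, then add: two roundings, which is what
   -ffp-contract=off asks for.  The volatile store keeps the compiler
   from fusing the two operations.  */
static float
mul_add_unfused (float a, float b, float c)
{
  volatile float product = a * b;
  return product + c;
}

/* Emulate a single-rounding multiply-add by widening to double.  The
   product of two floats is exact in double; the double addition can
   still round in general, so this is an illustration and not a
   general replacement for a hardware FMA.  */
static float
mul_add_single_rounding (float a, float b, float c)
{
  assert (FLT_MANT_DIG * 2 <= DBL_MANT_DIG);
  return (float) ((double) a * (double) b + (double) c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the unfused version returns 0
because the product is rounded to 1 + 2^-11 before the add, while the
single-rounding version returns 2^-24.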


Re: [PATCH] RISC-V: Note that __builtin_riscv_pause() implies Xgnuzihintpausestate

2022-11-17 Thread Kito Cheng via Gcc-patches
Wait, what's Xgnuzihintpausestate???


On Fri, Nov 18, 2022 at 12:30 PM Palmer Dabbelt  wrote:
>
> gcc/ChangeLog:
>
> * doc/extend.texi (__builtin_riscv_pause): Imply
> Xgnuzihintpausestate.
> ---
>  gcc/doc/extend.texi | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index b1dd39e64b8..26f14e61bc8 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21103,7 +21103,9 @@ Returns the value that is currently set in the @samp{tp} register.
>  @end deftypefn
>
>  @deftypefn {Built-in Function}  void __builtin_riscv_pause (void)
> -Generates the @code{pause} (hint) machine instruction.
> +Generates the @code{pause} (hint) machine instruction.  This implies the
> +Xgnuzihintpausestate extension, which redefines the @code{pause} instruction to
> +change architectural state.
>  @end deftypefn
>
>  @node RX Built-in Functions
> --
> 2.38.1
>


Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-17 Thread HAO CHEN GUI via Gcc-patches
Hi David,

On 2022/11/17 21:24, David Edelsohn wrote:
> This is better, but the pattern should be near and after the existing 
> cbranch4 patterns earlier in the file, not the *cbranch pattern.  It 
> doesn't match the comment.
Sure, I will put it after existing "cbranch4" patterns.

> 
> Why are you using zero_constant predicate instead of matching (const_int 0) 
> for operand 2?
The "const_int 0" is an operand rather than a predicate. We need a predicate
here.

> 
> Why does this need the new all_branch_comparison_operator?  Can the ifcvt 
> optimization correctly elide the 2 insn sequence?
Because rs6000 defines the "*cbranch_2insn" insn, such insns are generated after
expand.

(jump_insn 50 47 51 11 (set (pc)
        (if_then_else (ge (reg:CCFP 156)
                (const_int 0 [0]))
            (label_ref 53)
            (pc))) "/home/guihaoc/gcc/gcc-mainline-base/gmp/mpz/cmpabs_d.c":80:7 884 {*cbranch_2insn}
     (expr_list:REG_DEAD (reg:CCFP 156)
        (int_list:REG_BR_PROB 633507684 (nil)))
 -> 53)

In prepare_cmp_insn, the comparison is verified by insn_operand_matches. If
extra_insn_branch_comparison_operator is not included in the "cbranchcc4"
predicate, it hits an ICE here.

  if (GET_MODE_CLASS (mode) == MODE_CC)
    {
      enum insn_code icode = optab_handler (cbranch_optab, CCmode);
      test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
      gcc_assert (icode != CODE_FOR_nothing
                  && insn_operand_matches (icode, 0, test));
      *ptest = test;
      return;
    }

The real conditional move is generated by emit_conditional_move_1. Commonly
"*cbranch_2insn" can't be optimized out and it returns NULL_RTX.

  if (COMPARISON_P (comparison))
    {
      saved_pending_stack_adjust save;
      save_pending_stack_adjust (&save);
      last = get_last_insn ();
      do_pending_stack_adjust ();
      machine_mode cmpmode = comp.mode;
      prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
                        GET_CODE (comparison), NULL_RTX, unsignedp,
                        OPTAB_WIDEN, &comparison, &cmpmode);
      if (comparison)
        {
          rtx res = emit_conditional_move_1 (target, comparison,
                                             op2, op3, mode);
          if (res != NULL_RTX)
            return res;
        }
      delete_insns_since (last);
      restore_pending_stack_adjust (&save);

I think that extra_insn_branch_comparison_operator should be included in
the "cbranchcc4" predicates, as such insns exist, and that it should be left
to emit_conditional_move to decide whether they can be optimized or not.

Thanks for your comments.
Gui Haochen


Re: Re: [PATCH] Ver.2: Add compile option "-msmall-data-limit=0" to avoid using .srodata section for riscv.

2022-11-17 Thread 陈逸轩
Thank you very much for your example! I have sent a new patch according to your 
guide.

Jeff Law jeffreya...@gmail.com wrote:
> 
> On 11/17/22 02:53, Yixuan Chen wrote:
> > 2022-11-17  Yixuan Chen  
> >
> >  * gcc/testsuite/gcc.dg/pr25521.c: Add compile option 
> > "-msmall-data-limit=0" to avoid using .srodata section for riscv.
> > ---
> >   gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
> > index 74fe2ae6626..628ddf1a761 100644
> > --- a/gcc/testsuite/gcc.dg/pr25521.c
> > +++ b/gcc/testsuite/gcc.dg/pr25521.c
> > @@ -2,7 +2,8 @@
> >  sections.
> >   
> >  { dg-require-effective-target elf }
> > -   { dg-do compile } */
> > +   { dg-do compile }
> > +   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */
> >   
> >   const volatile int foo = 30;
> >   
> 
> Wouldn't this be better?  It avoids a target specific conditional by 
> instead extending what we look for to cover [s]rodata sections.
> 
> 
> Thoughts?
> 
> Jeff


[PATCH] optimize the testcase for architectures that use ".srodata"

2022-11-17 Thread Yixuan Chen
2022-11-18  Yixuan Chen  

* gcc.dg/pr25521.c: optimize the testcase for architectures that use
".srodata"

testsuite/gcc.dg/pr25521.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
index 74fe2ae6626..63363a03b9f 100644
--- a/gcc/testsuite/gcc.dg/pr25521.c
+++ b/gcc/testsuite/gcc.dg/pr25521.c
@@ -7,4 +7,4 @@
 const volatile int foo = 30;
 
 
-/* { dg-final { scan-assembler "\\.rodata" } } */
+/* { dg-final { scan-assembler "\\.s\?rodata" } } */
-- 
2.37.2
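
To see what the relaxed pattern accepts, here is a rough sketch that uses
POSIX regcomp/regexec as a stand-in for the Tcl regular expression engine
that scan-assembler actually uses.  The helper name matches_rodata is
hypothetical, and the pattern is written with the Tcl quoting layer
(the "\\." and "\?") already removed.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT contains ".rodata" or ".srodata", 0 otherwise.
   The pattern is the dg-final regex with the Tcl quoting stripped.  */
static int
matches_rodata (const char *text)
{
  regex_t re;
  int rc = regcomp (&re, "\\.s?rodata", REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

Both a ".section .rodata" line and a ".section .srodata" line match, while
a ".sdata" line does not, so the test keeps rejecting writable sections.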



Re: [PATCH] Ver.2: Add compile option "-msmall-data-limit=0" to avoid using .srodata section for riscv.

2022-11-17 Thread Oria Chiuan via Gcc-patches
Thank you very much for your patient explanation!

On Fri, Nov 18, 2022 at 13:02, Palmer Dabbelt wrote:

> On Thu, 17 Nov 2022 19:30:23 PST (-0800), oriachi...@gmail.com wrote:
> > Got it, I used to regard this test case as targeting at test if the const
> > data would use the ".rodata" section.
>
> Sorry, I'm not quite sure what you're trying to say here.  Here's a dump
> of how I see things:
>
> In some targets (RISC-V and MIPS) there's multiple copies of the
> data/rodata sections, with the small data/rodata ending up in the small
> sections (`.sdata` and `.srodata`).  I've never actually been 100% on
> that being allowed by any spec, but MIPS did it long before RISC-V so I
> figure software is expected to tolerate the oddness.
>
> In RISC-V we use it to try and place as many symbols as possible close
> to GP, so we're more likely to relax to GP-relative addressing
> sequences.  IIRC that's pretty much the same as MIPS, though they have
> slightly different addressing requirements.
>
> For targets that function this way `.srodata` and `.rodata` are
> functionally equivalent (assuming you're not playing any GP tricks to
> relocate, but those are way out of what's supported).  So unless the
> test is trying to dig into performance issues differences between these
> sections, it should just allow code to target either.
>
> >
> > On Fri, Nov 18, 2022 at 07:59, Palmer Dabbelt wrote:
> >
> >> On Thu, 17 Nov 2022 13:50:00 PST (-0800), gcc-patches@gcc.gnu.org
> wrote:
> >> >
> >> > On 11/17/22 02:53, Yixuan Chen wrote:
> >> >> 2022-11-17  Yixuan Chen  
> >> >>
> >> >>  * gcc/testsuite/gcc.dg/pr25521.c: Add compile option
> >> "-msmall-data-limit=0" to avoid using .srodata section for riscv.
> >> >> ---
> >> >>   gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
> >> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/gcc/testsuite/gcc.dg/pr25521.c
> >> b/gcc/testsuite/gcc.dg/pr25521.c
> >> >> index 74fe2ae6626..628ddf1a761 100644
> >> >> --- a/gcc/testsuite/gcc.dg/pr25521.c
> >> >> +++ b/gcc/testsuite/gcc.dg/pr25521.c
> >> >> @@ -2,7 +2,8 @@
> >> >>  sections.
> >> >>
> >> >>  { dg-require-effective-target elf }
> >> >> -   { dg-do compile } */
> >> >> +   { dg-do compile }
> >> >> +   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } }
> */
> >> >>
> >> >>   const volatile int foo = 30;
> >> >>
> >> >
> >> > Wouldn't this be better?  It avoids a target specific conditional by
> >> > instead extending what we look for to cover [s]rodata sections.
> >> >
> >> >
> >> > Thoughts?
> >> >
> >> > Jeff
> >> > diff --git a/gcc/testsuite/gcc.dg/pr25521.c
> >> b/gcc/testsuite/gcc.dg/pr25521.c
> >> > index 74fe2ae6626..63363a03b9f 100644
> >> > --- a/gcc/testsuite/gcc.dg/pr25521.c
> >> > +++ b/gcc/testsuite/gcc.dg/pr25521.c
> >> > @@ -7,4 +7,4 @@
> >> >  const volatile int foo = 30;
> >> >
> >> >
> >> > -/* { dg-final { scan-assembler "\\.rodata" } } */
> >> > +/* { dg-final { scan-assembler "\\.s\?rodata" } } */
> >>
> >> That's how I usually do it for these tests, there's some other targets
> >> with sdata too so it fixes the test for everyone.  IIRC I said something
> >> like that in the v1, but sorry if I'm just getting it confused with some
> >> other patch.
> >>
> >> There's a few of these that need to get chased down for every release,
> >> maybe we should add some sort of DG helper?  Not sure that'd keep folks
> >> from matching on .data, though...
> >>
>


Re: [PATCH] RISC-V: Add support for AIA ISA extensions (Ssaia and Smaia)

2022-11-17 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 18:12:23 PST (-0800), christoph.muell...@vrull.eu wrote:

From: Christoph Müllner 

This patch adds support for the two AIA ISA extensions Ssaia and Smaia.
They are not relevant for the compiler, but the assembler might want
to validate the CSRs. Therefore, all this patch does is recognize the
extension names and emit a feature macro (incl. a test).


This is pretty far in the weeds, but the AIA PDF says

   extension Smaia encompasses all added CSRs and all modifications to 
   interrupt response behavior that the AIA specifies for a hart, over 
   all privilege levels


but only a subset of AIA has been frozen.  I think that's fine, assuming
we're decoupling ourselves from the ISA strings (and thus extension
names).  We just need to document it somewhere -- presumably in invoke.texi,
but that doesn't document anything else yet so we don't really have a
pattern to match.



Signed-off-by: Christoph Müllner 
---
 gcc/common/config/riscv/riscv-common.cc |  2 ++
 gcc/testsuite/gcc.target/riscv/smaia.c  | 13 +++++++++++++
 gcc/testsuite/gcc.target/riscv/ssaia.c  | 13 +++++++++++++
 3 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/smaia.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/ssaia.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 4b7f777c103..674eded07b7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -219,6 +219,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =

   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},

+  {"smaia", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"ssaia", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},

diff --git a/gcc/testsuite/gcc.target/riscv/smaia.c b/gcc/testsuite/gcc.target/riscv/smaia.c
new file mode 100644
index 0000000..9ca80236245
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/smaia.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_smaia" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_smaia" { target { rv32 } } } */
+
+#ifndef __riscv_smaia
+#error Feature macro not defined
+#endif
+
+int
+foo (int a)
+{
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/ssaia.c b/gcc/testsuite/gcc.target/riscv/ssaia.c
new file mode 100644
index 0000000..b20e0eb10f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/ssaia.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_ssaia" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_ssaia" { target { rv32 } } } */
+
+#ifndef __riscv_ssaia
+#error Feature macro not defined
+#endif
+
+int
+foo (int a)
+{
+  return a;
+}


Re: [PATCH] Ver.2: Add compile option "-msmall-data-limit=0" to avoid using .srodata section for riscv.

2022-11-17 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 19:30:23 PST (-0800), oriachi...@gmail.com wrote:

Got it, I used to regard this test case as targeting at test if the const
data would use the ".rodata" section.


Sorry, I'm not quite sure what you're trying to say here.  Here's a dump 
of how I see things:


In some targets (RISC-V and MIPS) there's multiple copies of the 
data/rodata sections, with the small data/rodata ending up in the small 
sections (`.sdata` and `.srodata`).  I've never actually been 100% on 
that being allowed by any spec, but MIPS did it long before RISC-V so I 
figure software is expected to tolerate the oddness.


In RISC-V we use it to try and place as many symbols as possible close 
to GP, so we're more likely to relax to GP-relative addressing 
sequences.  IIRC that's pretty much the same as MIPS, though they have 
slightly different addressing requirements.


For targets that function this way `.srodata` and `.rodata` are
functionally equivalent (assuming you're not playing any GP tricks to
relocate, but those are way out of what's supported).  So unless the
test is trying to dig into performance differences between these
sections, it should just allow code to target either.
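
As a rough model of the placement rule described above, an object lands in
a small section when its size is non-zero and no larger than the small-data
limit.  This is an assumption based on how -msmall-data-limit is described,
not a copy of GCC's code, and the helper name uses_small_section is made up
for illustration.

```c
#include <stddef.h>

/* Return 1 if an object of SIZE bytes would go to a small section
   (.sdata/.srodata) under a small-data limit of LIMIT bytes, and 0 if
   it would stay in the regular section.  A limit of 0 disables the
   small sections entirely.  Hypothetical helper, for illustration.  */
static int
uses_small_section (size_t size, size_t limit)
{
  return size > 0 && size <= limit;
}
```

With a limit of 8 bytes, for example, the 4-byte foo from pr25521.c would
go to `.srodata`, while -msmall-data-limit=0 would keep it in `.rodata`.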




On Fri, Nov 18, 2022 at 07:59, Palmer Dabbelt wrote:


On Thu, 17 Nov 2022 13:50:00 PST (-0800), gcc-patches@gcc.gnu.org wrote:
>
> On 11/17/22 02:53, Yixuan Chen wrote:
>> 2022-11-17  Yixuan Chen  
>>
>>  * gcc/testsuite/gcc.dg/pr25521.c: Add compile option
"-msmall-data-limit=0" to avoid using .srodata section for riscv.
>> ---
>>   gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/testsuite/gcc.dg/pr25521.c
b/gcc/testsuite/gcc.dg/pr25521.c
>> index 74fe2ae6626..628ddf1a761 100644
>> --- a/gcc/testsuite/gcc.dg/pr25521.c
>> +++ b/gcc/testsuite/gcc.dg/pr25521.c
>> @@ -2,7 +2,8 @@
>>  sections.
>>
>>  { dg-require-effective-target elf }
>> -   { dg-do compile } */
>> +   { dg-do compile }
>> +   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */
>>
>>   const volatile int foo = 30;
>>
>
> Wouldn't this be better?  It avoids a target specific conditional by
> instead extending what we look for to cover [s]rodata sections.
>
>
> Thoughts?
>
> Jeff
> diff --git a/gcc/testsuite/gcc.dg/pr25521.c
b/gcc/testsuite/gcc.dg/pr25521.c
> index 74fe2ae6626..63363a03b9f 100644
> --- a/gcc/testsuite/gcc.dg/pr25521.c
> +++ b/gcc/testsuite/gcc.dg/pr25521.c
> @@ -7,4 +7,4 @@
>  const volatile int foo = 30;
>
>
> -/* { dg-final { scan-assembler "\\.rodata" } } */
> +/* { dg-final { scan-assembler "\\.s\?rodata" } } */

That's how I usually do it for these tests, there's some other targets
with sdata too so it fixes the test for everyone.  IIRC I said something
like that in the v1, but sorry if I'm just getting it confused with some
other patch.

There's a few of these that need to get chased down for every release,
maybe we should add some sort of DG helper?  Not sure that'd keep folks
from matching on .data, though...



Re: [PATCH 2/7] riscv: riscv-cores.def: Add T-Head XuanTie C906

2022-11-17 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 20:50:19 PST (-0800), gcc-patches@gcc.gnu.org wrote:

On Sun, Nov 13, 2022 at 10:46:31PM +0100, Christoph Muellner wrote:

From: Christoph Müllner 

This adds T-Head's XuanTie C906 to the list of known cores as "thead-c906".
The C906 has been shipping for quite some time (it is the core of the Allwinner D1).
Note that the tuning struct for the C906 is already part of GCC (it is
also named "thead-c906").

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_CORE): Add "thead-c906".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-thead-c906.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-cores.def       |  2 ++
 .../gcc.target/riscv/mcpu-thead-c906.c | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-thead-c906.c

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 31ad34682c5..648a010e09b 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -73,4 +73,6 @@ RISCV_CORE("sifive-s76",  "rv64imafdc", "sifive-7-series")
 RISCV_CORE("sifive-u54",  "rv64imafdc", "sifive-5-series")
 RISCV_CORE("sifive-u74",  "rv64imafdc", "sifive-7-series")

+RISCV_CORE("thead-c906",  "rv64imafdc", "thead-c906")
+


I think it makes more sense for thead-c906 to include the extended
instructions by default.


Seems reasonable to me, but Kito understands this stuff better than I 
do.  IMO `-mtune=thead-c906` should leave the ISA targets alone and just 
set the tune info, and `-mcpu=thead-c906` should do that and also set 
the ISA to whatever's implemented on that core.


That said, I was playing around with some B-extension multilib stuff 
recently and am pretty sure this stuff is all a bit broken.  Maybe we 
should punt on enabling all these extensions for `-mcpu` until we have 
that sorted out?  IMO we're at the point where having ISA-dependent 
multilib paths on Linux makes sense, but that risks throwing another 
wrench into distro folks.


Maybe it doesn't matter, though?  IIUC distros aren't shipping multilib 
right now so the bugs won't manifest for users.


Re: [PATCH] RISC-V: branch-(not)equals-zero compares against $zero

2022-11-17 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 14:44:31 PST (-0800), jeffreya...@gmail.com wrote:


On 11/8/22 12:55, Philipp Tomsich wrote:

If we are testing a register or a paradoxical subreg (i.e. anything that is not
a partial subreg) for equality/non-equality with zero, we can generate a branch
that compares against $zero.  This will work for QI, HI, SI and DImode, so we
enable this for ANYI.

2020-08-30  gcc/ChangeLog:

* config/riscv/riscv.md (*branch_equals_zero): Added pattern.


I've gone back and forth on this a few times.  As you know, I hate
subregs in the target descriptions and I guess I need to extend that to
querying if something is a subreg or not rather than just subregs
appearing in the RTL.


Presumably the idea behind rejecting partial subregs is that the bits
outside the partial subreg are unspecified, but that's also going to be
true if we're looking at a hardreg in QImode (for example) irrespective of
it being wrapped in a subreg.


I don't doubt it works the vast majority of the time, but I haven't been
able to convince myself it'll work all the time.  How do we ensure that
the bits outside the mode are zero?  I've been bitten by this kind of
problem before, and it's safe to say it was exceedingly painful to find.


I don't really understand the middle-end issues here (if there are
any?), but I'm pretty sure code like this has passed by a few times
before and we've yet to find a reliable way to optimize these cases.
There's a bunch of patterns where knowing the XLEN-extension of shorter
values would let us generate better code, but there's also cases where
we'd generate worse code by ensuring any extension scheme is followed.


Every time I've seen this come up before I've managed to convince myself 
we can't really fix the problem in the backend, though: if we always 
generate extended values in registers then we just push the cost over to 
the other patterns.  The only way I've come up with to handle something 
like this is to push more types into the middle-end so we can track 
these high bits and generate the faster sequences where we know what 
they are.  That seems like a huge mess, though, and every time it comes 
up folks run away ;)


Sorry if that's kind of vague, I usually find a way to break these but 
my box isn't cooperating with GCC builds today so I haven't even gotten 
that far yet...
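
The hazard being discussed can be modelled in plain C.  In this sketch a
uint64_t stands in for an XLEN-wide register and the helper names are
hypothetical.  A QImode value only defines the low 8 bits, so if the
remaining bits are unspecified, comparing the whole register against zero
can give the wrong answer.

```c
#include <stdint.h>

/* What the source program means: test only the low 8 bits.  */
static int
qi_value_is_zero (uint64_t reg)
{
  return (uint8_t) reg == 0;
}

/* What a branch against $zero tests: every bit of the register.  */
static int
full_register_is_zero (uint64_t reg)
{
  return reg == 0;
}
```

For a register holding 0xAB00 the QImode value is zero but the full
register is not, so the two tests only agree if something guarantees the
upper bits are zero.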


Re: [PATCH 2/7] riscv: riscv-cores.def: Add T-Head XuanTie C906

2022-11-17 Thread cooper.qu--- via Gcc-patches
On Sun, Nov 13, 2022 at 10:46:31PM +0100, Christoph Muellner wrote:
> From: Christoph Müllner 
> 
> This adds T-Head's XuanTie C906 to the list of known cores as "thead-c906".
> The C906 has been shipping for quite some time (it is the core of the Allwinner D1).
> Note that the tuning struct for the C906 is already part of GCC (it is
> also named "thead-c906").
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv-cores.def (RISCV_CORE): Add "thead-c906".
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/mcpu-thead-c906.c: New test.
> 
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/config/riscv/riscv-cores.def       |  2 ++
>  .../gcc.target/riscv/mcpu-thead-c906.c | 18 ++++++++++++++++++
>  2 files changed, 20 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-thead-c906.c
> 
> diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
> index 31ad34682c5..648a010e09b 100644
> --- a/gcc/config/riscv/riscv-cores.def
> +++ b/gcc/config/riscv/riscv-cores.def
> @@ -73,4 +73,6 @@ RISCV_CORE("sifive-s76",  "rv64imafdc", "sifive-7-series")
>  RISCV_CORE("sifive-u54",  "rv64imafdc", "sifive-5-series")
>  RISCV_CORE("sifive-u74",  "rv64imafdc", "sifive-7-series")
>  
> +RISCV_CORE("thead-c906",  "rv64imafdc", "thead-c906")
> +

I think it makes more sense for thead-c906 to include the extended
instructions by default.


Thanks,
Cooper


Re: [PATCH 7/7] riscv: Add basic extension support for XTheadFmv and XTheadInt

2022-11-17 Thread cooper.qu--- via Gcc-patches
On Sun, Nov 13, 2022 at 10:46:36PM +0100, Christoph Muellner wrote:
> From: Christoph Müllner 
> 
> This patch add basic support for the XTheadFmv and XTheadInt
> ISA extension. As both extensions only contain instruction,
> which are not supposed to be emitted by the compiler, the support
> only covers awareness of the extension name in the march string
> and the definition of a feature test macro.
>

I think XTheadFmv instructions can be emitted when data is moved between
DImode and DFmode on rv32 targets. The instructions are similar to the
move instructions of the new standard extension "zfa".

Thanks,
Cooper


Re: [PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-17 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 18:14:18 PST (-0800), christoph.muell...@vrull.eu wrote:

On Wed, Nov 9, 2022 at 4:01 AM Palmer Dabbelt  wrote:


These extensions were recently frozen [1].  As per Andrew's post [2]
we're meant to ignore these in software, this just adds them to the list
of allowed extensions and otherwise ignores them.  I added these under
SPEC_CLASS_NONE even though the PDF lists them as 20190614 because it
seems pointless to add another spec class just to accept two extensions
we then ignore.

1:
https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/HZGoqP1eyps/m/GTNKRLJoAQAJ
2:
https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/QKjQhChrq9Q/m/7gqdkctgAgAJ

gcc/ChangeLog

* common/config/riscv/riscv-common.cc: Add Zihpm and Zicntr
extensions.

---

These deserve documentation, a test case, and a NEWS entry.  I didn't
write those yet because it's not super clear this is the way we wanted
to go, though: just flat out ignoring the ISA feels like the wrong thing
to do, but the guidance here is pretty clear.  Still feels odd, though.




We already have the infrastructure in GAS to check the CSR numbers.
It is an optional feature, but it is here and working.
We follow the guidance in the default configuration (CSR checking needs to
be turned on).
As long as we want to keep this infrastructure, there is no question whether
we should continue to support new extensions as required by this feature:
we have to, because anything else would lead to a broken feature.

The question of whether CSR checking in GAS should be removed does not have
to be answered right now if there is doubt about making the wrong decision.

Additionally, I fully agree that we can not ignore unknown extensions.
We must report an unknown extension in the march string to the user.
And even without CSR checking, GCC needs to be aware of all extensions
(e.g. for possible future support of -march=native).

So I think this patch should go in (together with a test).

That's why I also sent something similar for Smaia and Ssaia:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606640.html


That's a different problem: with Zihpm and Zicntr we're ignoring known
extensions, so we can pretend the ISA didn't make a backwards-incompatible
change.  That requires explicitly ignoring words in the ISA manual, which
is something we've tried very hard not to do in the past -- maybe less so
these days, but IMO it's still worth calling out (see the
__builtin_riscv_pause() doc patch, for example).



BR
Christoph






We've also still got an open discussion on how we want to handle -march
going forwards that's pretty relevant here, so I figured it'd be best to
send this out sooner rather than later as it's sort of related.
---
 gcc/common/config/riscv/riscv-common.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 4b7f777c103..72981f05ac7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -190,6 +190,9 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},

+  {"zicntr", ISA_SPEC_CLASS_NONE, 2, 0},
+  {"zihpm",  ISA_SPEC_CLASS_NONE, 2, 0},
+
   {"zk",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkn",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zks",   ISA_SPEC_CLASS_NONE, 1, 0},
--
2.38.1
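
For readers who have not looked at riscv-common.cc: entries like the ones
added here are found by name, and a name that is missing from the table can
be reported as unknown.  The following is a heavily simplified,
hypothetical sketch of that idea.  The struct, the table contents, and the
find_ext function are illustrative only and are not GCC's actual data
structures.

```c
#include <stddef.h>
#include <string.h>

/* One known extension with its default version.  */
struct ext_version
{
  const char *name;
  int major;
  int minor;
};

/* A tiny stand-in for the real table.  */
static const struct ext_version ext_table[] =
{
  {"zicbom", 1, 0},
  {"zicntr", 2, 0},
  {"zihpm", 2, 0},
};

/* Return the table entry for NAME, or NULL if the extension is not
   known, in which case a caller could diagnose the -march string.  */
static const struct ext_version *
find_ext (const char *name)
{
  for (size_t i = 0; i < sizeof ext_table / sizeof ext_table[0]; i++)
    if (strcmp (ext_table[i].name, name) == 0)
      return &ext_table[i];
  return NULL;
}
```

Looking up "zicntr" yields the entry with version 2.0, while an unlisted
name such as "zfoo" yields NULL.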




[PATCH] RISC-V: Note that __builtin_riscv_pause() implies Xgnuzihintpausestate

2022-11-17 Thread Palmer Dabbelt
gcc/ChangeLog:

* doc/extend.texi (__builtin_riscv_pause): Imply
Xgnuzihintpausestate.
---
 gcc/doc/extend.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b1dd39e64b8..26f14e61bc8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21103,7 +21103,9 @@ Returns the value that is currently set in the @samp{tp} register.
 @end deftypefn
 
 @deftypefn {Built-in Function}  void __builtin_riscv_pause (void)
-Generates the @code{pause} (hint) machine instruction.
+Generates the @code{pause} (hint) machine instruction.  This implies the
+Xgnuzihintpausestate extension, which redefines the @code{pause} instruction to
+change architectural state.
 @end deftypefn
 
 @node RX Built-in Functions
-- 
2.38.1



Re: [PATCH] invoke: RISC-V's -march doesn't take ISA strings

2022-11-17 Thread Palmer Dabbelt

On Wed, 09 Nov 2022 01:52:12 PST (-0800), christoph.muell...@vrull.eu wrote:

On Wed, Nov 9, 2022 at 4:00 AM Palmer Dabbelt  wrote:


On Tue, 08 Nov 2022 05:40:10 PST (-0800), christoph.muell...@vrull.eu
wrote:
> On Mon, Nov 7, 2022 at 8:01 PM Palmer Dabbelt 
wrote:
>
>> The docs say we take ISA strings, but that's never really been the case:
>> at a bare minimum we've required lower case strings, but there's
>> generally been some subtle differences as well in things like version
>> handling and such.  We talked about removing the lower case requirement
>> in the last GNU toolchain meeting and we've always called other
>> differences just bugs.  We don't have profile support yet, but based on
>> the discussions on the RISC-V lists it looks like we're going to have
>> some differences there as well.
>
>
>> So let's just stop pretending these are ISA strings.  That's been a
>> headache for years now, if we're meant to just be ISA-string-like here
>> then we don't have to worry about all these long-tail ISA string parsing
>> issues.
>>
>
> You are right, we should first properly specify the -march string,
> before we talk about the implementation details of the parser.
>
> I tried to collect all the recent change requests and undocumented
> properties of the -march string and worked on a first draft
specification.
> As the -march flag should share a common behavior across different
> compilers and tools, I've made a PR to the RISC-V toolchain-conventions
> repo:
>   https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/26
>
> Do you mind if we continue the discussion there?

IMO trying to handle this with another RISC-V spec is a waste of time:
we've spent many years trying to follow the specs here, it's pretty
clear they're just not meant to be read in that level of detail.  This
sort of problem is all over the place in RISC-V land, moving to a
different spec doesn't fix the problem.



I created the documentation in response to the comment in your patch
about the flag being "woefully under-documented".
You can call my attempt to address this a "waste of time", but a more
constructive approach would be appreciated.


We need to document it in invoke (still .texi?  Not sure if that's 
changing along with sphinx...).  That's really been the case for quite a 
while now, we've had users complain about it.  We've just sort of been 
lazy and called it an ISA string with some small exceptions, but if 
something like this goes in then we don't have that excuse any more.



The reason I created a PR over there in the riscv-toolchain-conventions
repo is that it is the agreed place to document the common behavior of
RISC-V compilers/tools (e.g. command line flags), i.e. to ensure that
LLVM developers can also contribute to a common solution.


That's very different than what you suggested.  What GCC does needs to 
be discussed on the GCC mailing lists and documented along with GCC.  If 
you want to document what all RISC-V compilers do that's up to you, but 
that PR describes things that neither GCC nor LLVM currently do.  We've 
been through this a bunch of times, it's the same discussion again.



If I understand correctly, you want something between the documentation that
you wrote as part of this patch and the PR that I created.
If so, then please let me know the details you don't want to have documented
in my proposal.


You can do whatever you want with your time, that's your decision.  That 
said, I still consider this a waste of time, for two reasons:


* We still need to document the GCC behavior along with GCC.  Nothing 
 from the RISC-V foundation changes that.  Even if that documentation 
 perfectly described the GCC behavior at any given time, there's all 
 sorts of versioning and licensing issues that make it unusable in 
 practice.
* We tried using the RISC-V specs as a single point of agreement, that 
 was the ISA string.  There's been years worth of issues around this, 
 we just have different definitions of some basic terms like 
 "compatible".  That's fine, every community does things their own way, 
 but moving these definitions to a different RISC-V spec doesn't change 
 anything.


So if you want to go write something in that repo then you're more than 
welcome to.  I just don't think it solves any problems -- we've got two 
standards, we can't fix that by adding a third.



Anyway, thanks for your feedback.
I'll quote/reference it in the PR so it won't get lost.




>> Link: https://lists.riscv.org/g/sig-toolchains/message/486
>>
>> gcc/ChangeLog
>>
>> doc/invoke.texi (RISC-V): -march doesn't take ISA strings.
>>
>> ---
>>
>> This is now woefully under-documented, as we can't even fall back on the
>> "it's just an ISA string" excuse any more.  I'm happy to go document
>> that, but figured I'd just send this along now so we can have the
>> discussion.
>> ---
>>  gcc/doc/invoke.texi | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git 

Re: [PATCH 5/7] riscv: thead: Add support for XTheadBb ISA extension

2022-11-17 Thread cooper.qu--- via Gcc-patches
On Sun, Nov 13, 2022 at 10:46:34PM +0100, Christoph Muellner wrote:
> +(define_expand "extv"
> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> + (sign_extract:GPR (match_operand:GPR 1 "register_operand" "r")
> +  (match_operand 2 "const_int_operand")
> +  (match_operand 3 "const_int_operand")))]
> +  "TARGET_XTHEADBB"
> +{
> +  if (TARGET_XTHEADBB
> +  && ((INTVAL (operands[2]) + INTVAL (operands[3]))
> +   >= GET_MODE_BITSIZE (GET_MODE (operands[1])).to_constant ()))
> +FAIL;
> +})
> +
> +(define_expand "extzv"
> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> + (zero_extract:GPR (match_operand:GPR 1 "register_operand" "r")
> +  (match_operand 2 "const_int_operand")
> +  (match_operand 3 "const_int_operand")))]
> +  "TARGET_XTHEADBB"
> +{
> +  if (TARGET_XTHEADBB
> +  && ((INTVAL (operands[2]) + INTVAL (operands[3]))
> +   >= GET_MODE_BITSIZE (GET_MODE (operands[1])).to_constant ()))
> +FAIL;
I doubt this check is necessary here; other architectures do not seem to
have added it. But there's nothing wrong with adding it.
> +
> +(define_insn "*th_ext"
> +  [(set (match_operand:X 0 "register_operand" "=r")
> + (sign_extract:X (match_operand:X 1 "register_operand" "r")
> + (match_operand 2 "const_int_operand")
> + (match_operand 3 "const_int_operand")))]
> +  "TARGET_XTHEADBB"
> +{
> +  operands[3] = GEN_INT (INTVAL (operands[2]) + INTVAL (operands[3]));
> +  return "th.ext\t%0,%1,%2,%3";
> +}
> +  [(set_attr "type" "bitmanip")])
> +
> +(define_insn "*th_extu"
> +  [(set (match_operand:X 0 "register_operand" "=r")
> + (zero_extract:X (match_operand:X 1 "register_operand" "r")
> + (match_operand 2 "const_int_operand")
> + (match_operand 3 "const_int_operand")))]
> +  "TARGET_XTHEADBB"
> +{
> +  operands[3] = GEN_INT (INTVAL (operands[2]) + INTVAL (operands[3]));
> +  return "th.extu\t%0,%1,%2,%3";
> +}
> +  [(set_attr "type" "bitmanip")])
> +

I think operands[3] should be:
operands[3] = GEN_INT (INTVAL (operands[2]) + INTVAL (operands[3]) - 1);
because ext and extu extract the bits %2..%3, so when the size is 1,
%2 equals %3.
A small optimization is also possible here: extzv can generate c.andi
when the start bit is 0 and the size is less than 7.
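
To illustrate the off-by-one, here is a minimal host-side sketch in C.
It assumes, as stated above, that the instruction extracts an inclusive
msb..lsb bit range; the helper names extu_model and zero_extract_model
are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical model of an unsigned bit-field extract that takes the
   most and least significant bit positions, both inclusive.  */
static uint64_t
extu_model (uint64_t x, unsigned msb, unsigned lsb)
{
  unsigned size = msb - lsb + 1;
  uint64_t mask = size >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << size) - 1;
  return (x >> lsb) & mask;
}

/* zero_extract supplies (size, pos); the last bit of the field is
   pos + size - 1, not pos + size.  */
static uint64_t
zero_extract_model (uint64_t x, unsigned size, unsigned pos)
{
  return extu_model (x, pos + size - 1, pos);
}
```

With size 1 and pos 4 this calls extu_model (x, 4, 4), i.e. msb equals
lsb, which is the case described above.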

> +/* { dg-final { scan-assembler-times "th.revw\t" 2 } } */
> +/* { dg-final { scan-assembler-times "th.rev\t" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-srri.c 
> b/gcc/testsuite/gcc.target/riscv/xtheadbb-srri.c
> new file mode 100644
> index 000..cd992ae3f0a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-srri.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_xtheadbb -mabi=lp64" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-g" } } */
> +
> +unsigned long foo1(unsigned long rs1)
> +{
> +long shamt = __riscv_xlen - 11;
> +return (rs1 << shamt) |
> +(rs1 >> ((__riscv_xlen - shamt) & (__riscv_xlen - 1)));
> +}
> +unsigned long foo2(unsigned long rs1)
> +{
> +unsigned long shamt = __riscv_xlen - 11;
> +return (rs1 >> shamt) |
> +(rs1 << ((__riscv_xlen - shamt) & (__riscv_xlen - 1)));
> +}
> +
> +/* { dg-final { scan-assembler-times "th.srri" 2 } } */

Why is there no testcase for ff1 here? It can be generated by the builtin 
function '__builtin_clzl'.
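
A rough sketch of what such a test might look like, modeled on
xtheadbb-srri.c above.  The function name and the dg-final scan pattern
are my assumptions and are untested against the patch.

```c
/* { dg-do compile } */
/* { dg-options "-march=rv64gc_xtheadbb -mabi=lp64" } */
/* { dg-skip-if "" { *-*-* } { "-O0" "-g" } } */

/* Count leading zeros.  The argument must be non-zero, since the
   result of __builtin_clzl (0) is undefined.  */
int
foo_clz (unsigned long rs1)
{
  return __builtin_clzl (rs1);
}

/* Assumed scan pattern, not taken from the patch.  */
/* { dg-final { scan-assembler-times "th.ff1" 1 } } */
```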


Thanks,
Cooper


Re: [PATCH] Ver.2: Add compile option "-msmall-data-limit=0" to avoid using .srodata section for riscv.

2022-11-17 Thread Oria Chiuan via Gcc-patches
Got it. I had assumed this test case was meant to check whether const
data is placed in the ".rodata" section.

On Fri, Nov 18, 2022 at 07:59, Palmer Dabbelt wrote:

> On Thu, 17 Nov 2022 13:50:00 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> >
> > On 11/17/22 02:53, Yixuan Chen wrote:
> >> 2022-11-17  Yixuan Chen  
> >>
> >>  * gcc/testsuite/gcc.dg/pr25521.c: Add compile option
> "-msmall-data-limit=0" to avoid using .srodata section for riscv.
> >> ---
> >>   gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/pr25521.c
> b/gcc/testsuite/gcc.dg/pr25521.c
> >> index 74fe2ae6626..628ddf1a761 100644
> >> --- a/gcc/testsuite/gcc.dg/pr25521.c
> >> +++ b/gcc/testsuite/gcc.dg/pr25521.c
> >> @@ -2,7 +2,8 @@
> >>  sections.
> >>
> >>  { dg-require-effective-target elf }
> >> -   { dg-do compile } */
> >> +   { dg-do compile }
> >> +   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */
> >>
> >>   const volatile int foo = 30;
> >>
> >
> > Wouldn't this be better?  It avoids a target specific conditional by
> > instead extending what we look for to cover [s]rodata sections.
> >
> >
> > Thoughts?
> >
> > Jeff
> > diff --git a/gcc/testsuite/gcc.dg/pr25521.c
> b/gcc/testsuite/gcc.dg/pr25521.c
> > index 74fe2ae6626..63363a03b9f 100644
> > --- a/gcc/testsuite/gcc.dg/pr25521.c
> > +++ b/gcc/testsuite/gcc.dg/pr25521.c
> > @@ -7,4 +7,4 @@
> >  const volatile int foo = 30;
> >
> >
> > -/* { dg-final { scan-assembler "\\.rodata" } } */
> > +/* { dg-final { scan-assembler "\\.s\?rodata" } } */
>
> That's how I usually do it for these tests, there's some other targets
> with sdata too so it fixes the test for everyone.  IIRC I said something
> like that in the v1, but sorry if I'm just getting it confused with some
> other patch.
>
> There's a few of these that need to get chased down for every release,
> maybe we should add some sort of DG helper?  Not sure that'd keep folks
> from matching on .data, though...
>
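
For reference, a small sketch of why the relaxed pattern covers both
section names.  It uses POSIX extended regular expressions rather than
the Tcl regexps that scan-assembler actually uses, on the assumption
that the two agree for a pattern this simple; line_matches is a made-up
helper.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE contains a match for the extended regex PATTERN,
   0 if it does not, and -1 if PATTERN fails to compile.  */
static int
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

The original "\.rodata" does not match a line naming .srodata because
the dot has to be followed directly by "r"; "\.s?rodata" accepts both
spellings while still rejecting .sdata.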


[PATCH 2/2] Fix PR middle-end/107705: ICE after redeclaration error

2022-11-17 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that after we create a call expression
in the C front-end, we replace the decl's type with
an error mark node. We then end up calling
aggregate_value_p on the call expression
whose decl has the error mark as its type,
and we ICE.

The fix is to check the function type for an error operand
after aggregate_value_p has processed the call expression
to obtain it.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/ChangeLog:

PR middle-end/107705
* function.cc (aggregate_value_p): Return 0 if
the function type was an error operand.

gcc/testsuite/ChangeLog:

* gcc.dg/redecl-22.c: New test.
---
 gcc/function.cc  | 3 +++
 gcc/testsuite/gcc.dg/redecl-22.c | 9 +
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/redecl-22.c

diff --git a/gcc/function.cc b/gcc/function.cc
index 361aa5f7ed1..9c8773bbc59 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2090,6 +2090,9 @@ aggregate_value_p (const_tree exp, const_tree fntype)
   if (VOID_TYPE_P (type))
 return 0;
 
+  if (error_operand_p (fntype))
+return 0;
+
   /* If a record should be passed the same as its first (and only) member
  don't pass it as an aggregate.  */
   if (TREE_CODE (type) == RECORD_TYPE && TYPE_TRANSPARENT_AGGR (type))
diff --git a/gcc/testsuite/gcc.dg/redecl-22.c b/gcc/testsuite/gcc.dg/redecl-22.c
new file mode 100644
index 000..7758570fabe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/redecl-22.c
@@ -0,0 +1,9 @@
+/* We used to ICE in the gimplifier, PR 107705 */
+/* { dg-do compile } */
+/* { dg-options "-w" } */
+int f (void)
+{
+  int (*p) (void) = 0; // { dg-note "" }
+  return p ();
+  int p = 1; // { dg-error "" }
+}
-- 
2.17.1



[PATCH 1/2] Fix PRs 106764, 106765, and 107307, all ICE after invalid re-declaration

2022-11-17 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is the gimplifier returns GS_ERROR but
in some cases we don't check that soon enough and try
to do other work which could crash.
So the fix in these two cases is to return GS_ERROR
early if the gimplify_* functions have returned GS_ERROR.
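
The shape of the fix, as a standalone sketch in C.  Everything here
(the status enum, its values, lower_operand, lower_pair) is a
hypothetical stand-in for illustration and not the gimplifier's actual
code.

```c
/* Hypothetical status codes, ordered so that the smaller value is the
   worse outcome (an assumption made for this sketch).  */
enum status { ST_ERROR = -2, ST_UNHANDLED = -1, ST_OK = 0, ST_ALL_DONE = 1 };

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Stand-in for a sub-step: fails when given a null operand.  */
static enum status
lower_operand (const int *op)
{
  return op ? ST_ALL_DONE : ST_ERROR;
}

/* Lower two operands, then do follow-up work that is only safe when
   both were lowered successfully.  */
static enum status
lower_pair (const int *a, const int *b, int *sum)
{
  enum status ret = lower_operand (a);
  enum status tret = lower_operand (b);
  ret = MIN (ret, tret);

  /* Bail out before touching the operands; without this check the
     dereferences below would crash on an erroneous input.  */
  if (ret == ST_ERROR)
    return ST_ERROR;

  *sum = *a + *b;
  return ret;
}
```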

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/ChangeLog:

PR c/106764
PR c/106765
PR c/107307
* gimplify.cc (gimplify_compound_lval): Return GS_ERROR
if gimplify_expr has returned GS_ERROR.
(gimplify_call_expr): Likewise.

gcc/testsuite/ChangeLog:

PR c/106764
PR c/106765
PR c/107307
* gcc.dg/redecl-19.c: New test.
* gcc.dg/redecl-20.c: New test.
* gcc.dg/redecl-21.c: New test.
---
 gcc/gimplify.cc  | 5 +
 gcc/testsuite/gcc.dg/redecl-19.c | 5 +
 gcc/testsuite/gcc.dg/redecl-20.c | 9 +
 gcc/testsuite/gcc.dg/redecl-21.c | 9 +
 4 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/redecl-19.c
 create mode 100644 gcc/testsuite/gcc.dg/redecl-20.c
 create mode 100644 gcc/testsuite/gcc.dg/redecl-21.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f06ce3cc77a..c62a966e918 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3272,6 +3272,8 @@ gimplify_compound_lval (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
   tret = gimplify_expr (p, pre_p, post_p, is_gimple_min_lval,
fallback | fb_lvalue);
   ret = MIN (ret, tret);
+  if (ret == GS_ERROR)
+return GS_ERROR;
 
   /* Step 2a: if we have component references we do not support on
  registers then make sure the base isn't a register.  Of course
@@ -3709,6 +3711,9 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool 
want_value)
   ret = gimplify_expr (&CALL_EXPR_FN (*expr_p), pre_p, NULL,
   is_gimple_call_addr, fb_rvalue);
 
+  if (ret == GS_ERROR)
+return GS_ERROR;
+
   nargs = call_expr_nargs (*expr_p);
 
   /* Get argument types for verification.  */
diff --git a/gcc/testsuite/gcc.dg/redecl-19.c b/gcc/testsuite/gcc.dg/redecl-19.c
new file mode 100644
index 000..cc10685448b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/redecl-19.c
@@ -0,0 +1,5 @@
+/* We used to ICE in the gimplifier, PR 106764 */
+/* { dg-do compile } */
+/* { dg-options "-w" } */
+(*a)(); // { dg-note "" }
+b(){a()} a; // { dg-error "" }
diff --git a/gcc/testsuite/gcc.dg/redecl-20.c b/gcc/testsuite/gcc.dg/redecl-20.c
new file mode 100644
index 000..07f52115ec8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/redecl-20.c
@@ -0,0 +1,9 @@
+/* We used to ICE in the gimplifier, PR 107307 */
+// { dg-do compile }
+// { dg-options "-w" }
+void f ()
+{
+  const struct { int a[1]; } b; // { dg-note "" }
+  int *c = b.a;
+  int *b; // { dg-error "" }
+}
diff --git a/gcc/testsuite/gcc.dg/redecl-21.c b/gcc/testsuite/gcc.dg/redecl-21.c
new file mode 100644
index 000..2f2a6548a57
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/redecl-21.c
@@ -0,0 +1,9 @@
+/* We used to ICE in the gimplifier, PR 106765 */
+/* { dg-do compile } */
+/* { dg-options "-w" } */
+struct a {
+  int b
+} c() {
+  struct a a; // { dg-note "" }
+  a.b;
+  d a; // { dg-error "" }
-- 
2.17.1



Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-17 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 17, 2022 at 9:59 PM Richard Sandiford
 wrote:
>
> Hongtao Liu  writes:
> > On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford
> >  wrote:
> >>
> >> Hongtao Liu  writes:
> >> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> Tamar Christina  writes:
> >> >> >> -Original Message-
> >> >> >> From: Hongtao Liu 
> >> >> >> Sent: Tuesday, November 15, 2022 9:37 AM
> >> >> >> To: Tamar Christina 
> >> >> >> Cc: Richard Sandiford ; Tamar Christina 
> >> >> >> via
> >> >> >> Gcc-patches ; nd ;
> >> >> >> rguent...@suse.de
> >> >> >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of 
> >> >> >> subvectors from
> >> >> >> arbitrary element position inside a vector
> >> >> >>
> >> >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> > > -Original Message-
> >> >> >> > > From: Hongtao Liu 
> >> >> >> > > Sent: Tuesday, November 15, 2022 8:36 AM
> >> >> >> > > To: Tamar Christina 
> >> >> >> > > Cc: Richard Sandiford ; Tamar 
> >> >> >> > > Christina
> >> >> >> > > via Gcc-patches ; nd ;
> >> >> >> > > rguent...@suse.de
> >> >> >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of
> >> >> >> > > subvectors from arbitrary element position inside a vector
> >> >> >> > >
> >> >> >> > > Hi:
> >> >> >> > >   I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022-
> >> >> >> > > November/606040.html.
> >> >> >> > > >  }
> >> >> >> > > >
> >> >> >> > > >/* See if we can get a better vector mode before extracting.
> >> >> >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index
> >> >> >> > > >
> >> >> >> > >
> >> >> >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600
> >> >> >> > > 9
> >> >> >> > > 0
> >> >> >> > > > a453cc6a28d9 100644
> >> >> >> > > > --- a/gcc/optabs.cc
> >> >> >> > > > +++ b/gcc/optabs.cc
> >> >> >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode
> >> >> >> mode,
> >> >> >> > > rtx v0, rtx v1,
> >> >> >> > > >v0_qi = gen_lowpart (qimode, v0);
> >> >> >> > > >v1_qi = gen_lowpart (qimode, v1);
> >> >> >> > > >if (targetm.vectorize.vec_perm_const != NULL
> >> >> >> > > > + && targetm.can_change_mode_class (mode, qimode,
> >> >> >> > > > + ALL_REGS)
> >> >> >> > > It looks like you want to guard gen_lowpart, shouldn't it be 
> >> >> >> > > better
> >> >> >> > > to use validate_subreg  or (tmp = gen_lowpart_if_possible (mode,
> >> >> >> target_qi)).
> >> >> >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but 
> >> >> >> > > not
> >> >> >> > > to guard gen_lowpart.
> >> >> >> >
> >> >> >> > Hmm I don't think this is quite true, there are existing usages in
> >> >> >> > expr.cc and rtlanal.cc that do this and aren't part of RA.  As I
> >> >> >> > mentioned before, for instance, the canonicalization of vec_select
> >> >> >> > to subreg in rtlanal uses this.
> >> >> >> In theory, we need to iterate through all reg classes that can be 
> >> >> >> assigned for
> >> >> >> both qimode and mode, if any regclass returns true for
> >> >> >> targetm.can_change_mode_class, the bitcast(validate_subreg) should 
> >> >> >> be ok.
> >> >> >> Here we just passed ALL_REGS.
> >> >> >
> >> >> > Yes, and most targets where this transformation is valid return true 
> >> >> > here.
> >> >> >
> >> >> > I've checked:
> >> >> >  * alpha
> >> >> >  * arm
> >> >> >  * aarch64
> >> >> >  * rs6000
> >> >> >  * s390
> >> >> >  * sparc
> >> >> >  * pa
> >> >> >  * mips
> >> >> >
> >> >> > And even the default example that other targets use from the 
> >> >> > documentation
> >> >> > would return true as the size of the modes are the same.
> >> >> >
> >> >> > X86 and RISCV are the only two targets that I found (but didn't check 
> >> >> > all) that
> >> >> > blankly return a result based on just the register classes.
> >> >> >
> >> >> > That is to say, there are more targets that adhere to the 
> >> >> > interpretation that
> >> >> > rclass here means "should be possible in some class in rclass" rather 
> >> >> > than
> >> >> > "should be possible in ALL classes of rclass".
> >> >>
> >> >> Yeah, I agree.  A query "can something stored in ALL_REGS change from
> >> >> mode M1 to mode M2?" is meaningful if at least one register R in 
> >> >> ALL_REGS
> >> >> can hold both M1 and M2.  It's then the target's job to answer
> >> >> conservatively so that the result covers all such R.
> >> >>
> >> >> In principle it's OK for a target to err on the side of caution and 
> >> >> forbid
> >> >> things that are actually OK.  But that's going to risk losing 
> >> >> performance
> >> >> in some cases, and sometimes that loss of performance will be 
> >> >> unacceptable.
> >> >> IMO that's what's happening here.  The target is applying x87 rules to
> >> >> things that (AIUI) are never stored in x87 registers, and so losing
> >> > Yes, it can be optimized since some modes will never be assigned to x87 
> >> > registers.
> 

Re: [PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-17 Thread Christoph Müllner
On Wed, Nov 9, 2022 at 4:01 AM Palmer Dabbelt  wrote:

> These extensions were recently frozen [1].  As per Andrew's post [2]
> we're meant to ignore these in software, this just adds them to the list
> of allowed extensions and otherwise ignores them.  I added these under
> SPEC_CLASS_NONE even though the PDF lists them as 20190614 because it
> seems pointless to add another spec class just to accept two extensions
> we then ignore.
>
> 1:
> https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/HZGoqP1eyps/m/GTNKRLJoAQAJ
> 2:
> https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/QKjQhChrq9Q/m/7gqdkctgAgAJ
>
> gcc/ChangeLog
>
> * common/config/riscv/riscv-common.cc: Add Zihpm and Zicntr
> extensions.
>
> ---
>
> This deserves documentation, a test case, and a NEWS entry.  I didn't
> write those yet because it's not super clear this is the way we wanted
> to go, though: just flat out ignoring the ISA feels like the wrong thing
> to do, but the guidance here is pretty clear.  Still feels odd, though.
>


We already have the infrastructure in GAS to check the CSR numbers.
It is an optional feature, but it is here and working.
We follow the guidance in the default configuration (CSR checking needs to
be turned on).
As long as we want to keep this infrastructure, there is no question that
we should continue to support new extensions as required by this feature:
we have to, because anything else will lead to a broken feature.

The question of whether CSR checking in GAS should be removed does not
have to be answered right now if there is doubt about making the wrong
decision.

Additionally, I fully agree that we can not ignore unknown extensions.
We must report an unknown extension in the march string to the user.
And even without CSR checking, GCC needs to be aware of all extensions
(e.g. for possible future support of -march=native).

So I think this patch should go in (together with a test).

That's why I also sent something similar for Smaia and Ssaia:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606640.html
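
A test for the Zicntr/Zihpm patch could follow the same shape as the
ones in that Smaia/Ssaia patch.  A rough sketch, assuming the feature
macros are named __riscv_zicntr and __riscv_zihpm by analogy with
__riscv_smaia (I have not verified this against the patch):

```c
/* { dg-do compile } */
/* { dg-options "-march=rv64gc_zicntr_zihpm" { target { rv64 } } } */
/* { dg-options "-march=rv32gc_zicntr_zihpm" { target { rv32 } } } */

/* The __riscv guard only keeps the sketch compilable on other hosts;
   the macro names are assumed, by analogy with __riscv_smaia.  */
#if defined (__riscv) \
    && (!defined (__riscv_zicntr) || !defined (__riscv_zihpm))
#error Feature macro not defined
#endif

int
foo (int a)
{
  return a;
}
```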

BR
Christoph





> We've also still got an open discussion on how we want to handle -march
> going forwards that's pretty relevant here, so I figured it'd be best to
> send this out sooner rather than later as it's sort of related.
> ---
>  gcc/common/config/riscv/riscv-common.cc | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc
> b/gcc/common/config/riscv/riscv-common.cc
> index 4b7f777c103..72981f05ac7 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -190,6 +190,9 @@ static const struct riscv_ext_version
> riscv_ext_version_table[] =
>{"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
>{"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},
>
> +  {"zicntr", ISA_SPEC_CLASS_NONE, 2, 0},
> +  {"zihpm",  ISA_SPEC_CLASS_NONE, 2, 0},
> +
>{"zk",ISA_SPEC_CLASS_NONE, 1, 0},
>{"zkn",   ISA_SPEC_CLASS_NONE, 1, 0},
>{"zks",   ISA_SPEC_CLASS_NONE, 1, 0},
> --
> 2.38.1
>
>


[PATCH] RISC-V: Add support for AIA ISA extensions (Ssaia and Smaia)

2022-11-17 Thread Christoph Muellner
From: Christoph Müllner 

This patch adds support for the two AIA ISA extensions Ssaia and Smaia.
They are not relevant for the compiler, but the assembler might want
to validate the CSRs. Therefore, all this patch does is recognize the
extension names and emit a feature macro (plus tests).

Signed-off-by: Christoph Müllner 
---
 gcc/common/config/riscv/riscv-common.cc |  2 ++
 gcc/testsuite/gcc.target/riscv/smaia.c  | 13 +
 gcc/testsuite/gcc.target/riscv/ssaia.c  | 13 +
 3 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/smaia.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/ssaia.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4b7f777c103..674eded07b7 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -219,6 +219,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"zmmul", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"smaia", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"ssaia", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
 
diff --git a/gcc/testsuite/gcc.target/riscv/smaia.c 
b/gcc/testsuite/gcc.target/riscv/smaia.c
new file mode 100644
index 000..9ca80236245
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/smaia.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_smaia" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_smaia" { target { rv32 } } } */
+
+#ifndef __riscv_smaia
+#error Feature macro not defined
+#endif
+
+int
+foo (int a)
+{
+  return a;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/ssaia.c 
b/gcc/testsuite/gcc.target/riscv/ssaia.c
new file mode 100644
index 000..b20e0eb10f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/ssaia.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_ssaia" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_ssaia" { target { rv32 } } } */
+
+#ifndef __riscv_ssaia
+#error Feature macro not defined
+#endif
+
+int
+foo (int a)
+{
+  return a;
+}
-- 
2.38.1



[PATCH] [x86] define builtins for "shared" avxneconvert-avx512bf16vl builtins.

2022-11-17 Thread liuhongt via Gcc-patches
This should fix the incorrect error when calling those builtins with
-mavxneconvert and without -mavx512bf16 -mavx512vl.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/i386-builtins.cc (def_builtin): Handle "shared"
avx512bf16vl-avxneconvert builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avxneconvert-1.c: New test.
---
 gcc/config/i386/i386-builtins.cc   |  2 ++
 gcc/testsuite/gcc.target/i386/avxneconvert-1.c | 11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avxneconvert-1.c

diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc
index 9412cf1acc8..eacdf072244 100644
--- a/gcc/config/i386/i386-builtins.cc
+++ b/gcc/config/i386/i386-builtins.cc
@@ -285,6 +285,8 @@ def_builtin (HOST_WIDE_INT mask, HOST_WIDE_INT mask2,
 avx512vl exist.  */
  || (mask2 == OPTION_MASK_ISA2_AVXVNNI)
  || (mask2 == OPTION_MASK_ISA2_AVXIFMA)
+ || (mask2 == (OPTION_MASK_ISA2_AVXNECONVERT
+   | OPTION_MASK_ISA2_AVX512BF16))
  || (lang_hooks.builtin_function
  == lang_hooks.builtin_function_ext_scope))
{
diff --git a/gcc/testsuite/gcc.target/i386/avxneconvert-1.c 
b/gcc/testsuite/gcc.target/i386/avxneconvert-1.c
new file mode 100644
index 000..2bb129c3f72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avxneconvert-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mavxneconvert -O2" } */
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef __bf16 v8bf __attribute__((vector_size(16)));
+
+v8bf
+foo (v8sf a)
+{
+  return __builtin_ia32_cvtneps2bf16_v8sf (a);
+}
-- 
2.27.0



Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-17 Thread Jason Merrill via Gcc-patches

On 11/17/22 15:42, Jakub Jelinek wrote:

On Thu, Nov 17, 2022 at 07:42:40PM +0100, Jakub Jelinek via Gcc-patches wrote:

I thought for older C++ this is to catch
void
foo ()
{
   constexpr int a = ({ static constexpr int b = 2; b; });
}
and for C++23 the only 3 spots that diagnose those.
But perhaps for C++20 or older we can check if the var has a context
of a constexpr function (then assume cp_finish_decl errored or pedwarned
already) and only error or pedwarn otherwise.


We could, but I wouldn't bother to enforce this specially for 
statement-expressions, which are already an extension.


OTOH, we should test that static constexpr is handled properly for 
lambdas, i.e. this should still fail:


constexpr int q = [](int i)
{ static constexpr int x = 42; return x+i; }(24);


So, here is an updated patch, which in constexpr.cc will accept
DECL_EXPR of decl_*constant_var_p static/thread_local non-extern vars
for C++23 or if they are not declared in constexpr/consteval function.
So, the statement expression case will remain hard error for C++ <= 20 rather 
than
pedwarn, because due to the ctx->quiet vs. !ctx->quiet case I don't see
what else we could do, either something is a constant expression, or
it is not, but whether it is or is not shouldn't depend on
-Wpedantic/-Wno-pedantic/-Werror=pedantic.

2022-11-17  Jakub Jelinek  

gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Bump __cpp_constexpr
value from 202207L to 202211L.
gcc/cp/
* constexpr.cc (cxx_eval_constant_expression): Implement C++23
P2647R1 - Permitting static constexpr variables in constexpr functions.
Allow decl_constant_var_p static or thread_local vars for
C++23 and later or if they are declared inside of constexpr or
consteval function.
(potential_constant_expression_1): Similarly, except use
decl_maybe_constant_var_p instead of decl_constant_var_p if
processing_template_decl.
* decl.cc (diagnose_static_in_constexpr): New function.
(start_decl): Remove diagnostics of static or thread_local
vars in constexpr or consteval functions.
(cp_finish_decl): Call diagnose_static_in_constexpr.
gcc/testsuite/
* g++.dg/cpp23/constexpr-nonlit17.C: New test.
* g++.dg/cpp23/constexpr-nonlit18.C: New test.
* g++.dg/cpp23/constexpr-nonlit19.C: New test.
* g++.dg/cpp23/constexpr-nonlit20.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Adjust expected __cpp_constexpr
value.
* g++.dg/ext/stmtexpr19.C: Don't expect an error for C++20 or later.

--- gcc/c-family/c-cppbuiltin.cc.jj 2022-11-17 09:00:42.106249011 +0100
+++ gcc/c-family/c-cppbuiltin.cc2022-11-17 09:01:49.286320527 +0100
@@ -1074,7 +1074,7 @@ c_cpp_builtins (cpp_reader *pfile)
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
- cpp_define (pfile, "__cpp_constexpr=202207L");
+ cpp_define (pfile, "__cpp_constexpr=202211L");
  cpp_define (pfile, "__cpp_multidimensional_subscript=202211L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
  cpp_define (pfile, "__cpp_static_call_operator=202207L");
--- gcc/cp/constexpr.cc.jj  2022-11-17 08:48:30.530357181 +0100
+++ gcc/cp/constexpr.cc 2022-11-17 20:53:15.432408015 +0100
@@ -7100,17 +7100,35 @@ cxx_eval_constant_expression (const cons
/* Allow __FUNCTION__ etc.  */
&& !DECL_ARTIFICIAL (r))
  {
-   if (!ctx->quiet)
+   bool ok = decl_constant_var_p (r);
+   /* Since P2647R1 control can pass through definitions of static
+  or thread_local vars usable in constant expressions.
+  In C++20 or older, if such vars are declared inside of
+  constexpr or consteval function, diagnose_static_in_constexpr
+  should have already pedwarned on those.  Otherwise they could
+  be e.g. in a statement expression, reject those before
+  C++23.  */
+   if (ok && cxx_dialect < cxx23)
  {
-   if (CP_DECL_THREAD_LOCAL_P (r))
- error_at (loc, "control passes through definition of %qD "
-"with thread storage duration", r);
-   else
- error_at (loc, "control passes through definition of %qD "
-"with static storage duration", r);
+   tree fnctx = decl_function_context (r);
+   if (fnctx == NULL_TREE
+   || !DECL_DECLARED_CONSTEXPR_P (fnctx))
+ ok = false;
+ }
+   if (!ok)
+ {
+   if (!ctx->quiet)
+ {
+   if (CP_DECL_THREAD_LOCAL_P (r))
+ error_at (loc, "control passes through definition of "
+   

Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-17 Thread Marek Polacek via Gcc-patches
On Thu, Nov 17, 2022 at 09:42:17PM +0100, Jakub Jelinek wrote:
> On Thu, Nov 17, 2022 at 07:42:40PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > I thought for older C++ this is to catch
> > void
> > foo ()
> > {
> >   constexpr int a = ({ static constexpr int b = 2; b; });
> > }
> > and for C++23 the only 3 spots that diagnose those.
> > But perhaps for C++20 or older we can check if the var has a context
> > of a constexpr function (then assume cp_finish_decl errored or pedwarned
> > already) and only error or pedwarn otherwise.
> 
> So, here is an updated patch, which in constexpr.cc will accept
> DECL_EXPR of decl_*constant_var_p static/thread_local non-extern vars
> for C++23 or if they are not declared in constexpr/consteval function.
> So, the statement expression case will remain hard error for C++ <= 20 rather 
> than
> pedwarn, because due to the ctx->quiet vs. !ctx->quiet case I don't see
> what else we could do, either something is a constant expression, or
> it is not, but whether it is or is not shouldn't depend on
> -Wpedantic/-Wno-pedantic/-Werror=pedantic.
> 
> 2022-11-17  Jakub Jelinek  
> 
> gcc/c-family/
>   * c-cppbuiltin.cc (c_cpp_builtins): Bump __cpp_constexpr
>   value from 202207L to 202211L.
> gcc/cp/
>   * constexpr.cc (cxx_eval_constant_expression): Implement C++23
>   P2647R1 - Permitting static constexpr variables in constexpr functions.
>   Allow decl_constant_var_p static or thread_local vars for
>   C++23 and later or if they are declared inside of constexpr or
>   consteval function.
>   (potential_constant_expression_1): Similarly, except use
>   decl_maybe_constant_var_p instead of decl_constant_var_p if
>   processing_template_decl.
>   * decl.cc (diagnose_static_in_constexpr): New function.
>   (start_decl): Remove diagnostics of static or thread_local
>   vars in constexpr or consteval functions.
>   (cp_finish_decl): Call diagnose_static_in_constexpr.
> gcc/testsuite/
>   * g++.dg/cpp23/constexpr-nonlit17.C: New test.
>   * g++.dg/cpp23/constexpr-nonlit18.C: New test.
>   * g++.dg/cpp23/constexpr-nonlit19.C: New test.
>   * g++.dg/cpp23/constexpr-nonlit20.C: New test.
>   * g++.dg/cpp23/feat-cxx2b.C: Adjust expected __cpp_constexpr
>   value.
>   * g++.dg/ext/stmtexpr19.C: Don't expect an error for C++20 or later. 
> 
> --- gcc/c-family/c-cppbuiltin.cc.jj   2022-11-17 09:00:42.106249011 +0100
> +++ gcc/c-family/c-cppbuiltin.cc  2022-11-17 09:01:49.286320527 +0100
> @@ -1074,7 +1074,7 @@ c_cpp_builtins (cpp_reader *pfile)
> /* Set feature test macros for C++23.  */
> cpp_define (pfile, "__cpp_size_t_suffix=202011L");
> cpp_define (pfile, "__cpp_if_consteval=202106L");
> -   cpp_define (pfile, "__cpp_constexpr=202207L");
> +   cpp_define (pfile, "__cpp_constexpr=202211L");
> cpp_define (pfile, "__cpp_multidimensional_subscript=202211L");
> cpp_define (pfile, "__cpp_named_character_escapes=202207L");
> cpp_define (pfile, "__cpp_static_call_operator=202207L");
> --- gcc/cp/constexpr.cc.jj2022-11-17 08:48:30.530357181 +0100
> +++ gcc/cp/constexpr.cc   2022-11-17 20:53:15.432408015 +0100
> @@ -7100,17 +7100,35 @@ cxx_eval_constant_expression (const cons
>   /* Allow __FUNCTION__ etc.  */
>   && !DECL_ARTIFICIAL (r))
> {
> - if (!ctx->quiet)
> + bool ok = decl_constant_var_p (r);
> + /* Since P2647R1 control can pass through definitions of static
> +or thread_local vars usable in constant expressions.
> +In C++20 or older, if such vars are declared inside of
> +constexpr or consteval function, diagnose_static_in_constexpr
> +should have already pedwarned on those.  Otherwise they could
> +be e.g. in a statement expression, reject those before
> +C++23.  */
> + if (ok && cxx_dialect < cxx23)
> {
> - if (CP_DECL_THREAD_LOCAL_P (r))
> -   error_at (loc, "control passes through definition of %qD "
> -  "with thread storage duration", r);
> - else
> -   error_at (loc, "control passes through definition of %qD "
> -  "with static storage duration", r);
> + tree fnctx = decl_function_context (r);
> + if (fnctx == NULL_TREE
> + || !DECL_DECLARED_CONSTEXPR_P (fnctx))
> +   ok = false;

FWIW, I couldn't find a way to trigger this code.

> +   }
> + if (!ok)
> +   {
> + if (!ctx->quiet)
> +   {
> + if (CP_DECL_THREAD_LOCAL_P (r))
> +   error_at (loc, "control passes through definition of "
> +  "%qD with thread storage duration", r);
> + else
> +   error_at (loc, "control passes through definition of 

Re: [PATCH 4/7] RISC-V: Recognize sign-extract + and cases for XVentanaCondOps

2022-11-17 Thread Philipp Tomsich
On Fri, 18 Nov 2022 at 00:56, Palmer Dabbelt  wrote:
>
> On Thu, 17 Nov 2022 15:41:26 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> >
> > On 11/12/22 14:29, Philipp Tomsich wrote:
> >> Users might use explicit arithmetic operations to create a mask and
> >> then and it, in a sequence like
> >>  cond = (bits >> SHIFT) & 1;
> >>  mask = ~(cond - 1);
> >>  val &= mask;
> >> which will present as a single-bit sign-extract.
> >>
> >> Dependening on what combination of XVentanaCondOps and Zbs are
> >> available, this will map to the following sequences:
> >>   - bexti + vt.maskc, if both Zbs and XVentanaCondOps are present
> >>   - andi + vt.maskc, if only XVentanaCondOps is available and the
> >>  sign-extract is operating on bits 10:0 (bit
> >>  11 can't be reached, as the immediate is
> >>  sign-extended)
> >>   - slli + srli + and, otherwise.
> >>
> >> gcc/ChangeLog:
> >>
> >>  * config/riscv/xventanacondops.md: Recognize SIGN_EXTRACT
> >>of a single-bit followed by AND for XVentanaCondOps.
> >>
> >> Signed-off-by: Philipp Tomsich 
> >> ---
> >>
> >>   gcc/config/riscv/xventanacondops.md | 46 +
> >>   1 file changed, 46 insertions(+)
> >>
> >> diff --git a/gcc/config/riscv/xventanacondops.md 
> >> b/gcc/config/riscv/xventanacondops.md
> >> index 7930ef1d837..3e9d5833a4b 100644
> >> --- a/gcc/config/riscv/xventanacondops.md
> >> +++ b/gcc/config/riscv/xventanacondops.md
> >> @@ -73,3 +73,49 @@
> >> "TARGET_XVENTANACONDOPS"
> >> [(set (match_dup 5) (match_dup 1))
> >>  (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 5) (const_int 0)))
> >> +
> >> +;; Users might use explicit arithmetic operations to create a mask and
> >> +;; then and it, in a sequence like
> >
> > Nit.  Seems like a word is missing.  "make and then and it"??
> >
> >
> > Do we really care about TARGET_XVENTANACONDOPS && ! TARGET_ZBS?
>
> I guess that's really more of a question for the Ventana folks, but
> assuming all the Ventana widgets have Zbs then it seems reasonable to
> just couple them -- there's already enough options in RISC-V land to
> test everything, might as well make sure what slips through the cracks
> isn't being built.
>
> Probably best to have a comment saying why here, and then something to
> enforce the dependency in -march (either as an implict extension
> dependency, or just a warning/error) so users don't get tripped up on
> configs that aren't expected to work.

With an eye to (the proposed) ZiCondOps, I'd rather pull this in once
XVentanaCondOps is applied.
That said, we'll need to add a test-case for these.

> > If there's a good reason to care about the !TARGET_ZBS case, then OK
> > with the nit fixed.   If we agree that the !TARGET_ZBS case isn't all
> > that important, then obviously OK with that pattern removed too.
> >
> > I'm about out of oomph today.  I may take a look at 7/7 tonight though.
> > Given it hits target independent code we probably want to get resolution
> > on that patch sooner rather than later.
>
> Thanks, there's no way we would have gotten this all sorted out so fast
> without the help!


Re: [PATCH 4/7] RISC-V: Recognize sign-extract + and cases for XVentanaCondOps

2022-11-17 Thread Philipp Tomsich
On Fri, 18 Nov 2022 at 00:41, Jeff Law  wrote:
>
>
> On 11/12/22 14:29, Philipp Tomsich wrote:
> > Users might use explicit arithmetic operations to create a mask and
> > then and it, in a sequence like
> >  cond = (bits >> SHIFT) & 1;
> >  mask = ~(cond - 1);
> >  val &= mask;
> > which will present as a single-bit sign-extract.
> >
> > Dependening on what combination of XVentanaCondOps and Zbs are
> > available, this will map to the following sequences:
> >   - bexti + vt.maskc, if both Zbs and XVentanaCondOps are present
> >   - andi + vt.maskc, if only XVentanaCondOps is available and the
> >  sign-extract is operating on bits 10:0 (bit
> >   11 can't be reached, as the immediate is
> >   sign-extended)
> >   - slli + srli + and, otherwise.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/xventanacondops.md: Recognize SIGN_EXTRACT
> > of a single-bit followed by AND for XVentanaCondOps.
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >   gcc/config/riscv/xventanacondops.md | 46 +
> >   1 file changed, 46 insertions(+)
> >
> > diff --git a/gcc/config/riscv/xventanacondops.md 
> > b/gcc/config/riscv/xventanacondops.md
> > index 7930ef1d837..3e9d5833a4b 100644
> > --- a/gcc/config/riscv/xventanacondops.md
> > +++ b/gcc/config/riscv/xventanacondops.md
> > @@ -73,3 +73,49 @@
> > "TARGET_XVENTANACONDOPS"
> > [(set (match_dup 5) (match_dup 1))
> >  (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 5) (const_int 0)))
> > +
> > +;; Users might use explicit arithmetic operations to create a mask and
> > +;; then and it, in a sequence like
>
> Nit.  Seems like a word is missing.  "make and then and it"??
>
>
> Do we really care about TARGET_XVENTANACONDOPS && ! TARGET_ZBS?

While Ventana might not plan to have this combination, nothing
prevents someone from implementing only one of these, just as
users might choose to override the -march string.  Also note that (the
proposed) ZiCondOps will share most of its infrastructure with
XVentanaCondOps, so we will have the same situation there.

> If there's a good reason to care about the !TARGET_ZBS case, then OK
> with the nit fixed.   If we agree that the !TARGET_ZBS case isn't all
> that important, then obviously OK with that pattern removed too.
>
> I'm about out of oomph today.  I may take a look at 7/7 tonight though.
> Given it hits target independent code we probably want to get resolution
> on that patch sooner rather than later.
>
> jeff
>


Re: [PATCH] c++: Reject UDLs in certain contexts [PR105300]

2022-11-17 Thread Jason Merrill via Gcc-patches

On 11/16/22 20:12, Marek Polacek wrote:

On Wed, Nov 16, 2022 at 08:22:39AM -0500, Jason Merrill wrote:

On 11/15/22 19:35, Marek Polacek wrote:

On Tue, Nov 15, 2022 at 06:58:39PM -0500, Jason Merrill wrote:

On 11/12/22 06:53, Marek Polacek wrote:

In this PR, we are crashing because we've encountered a UDL where a
string-literal is expected.  This patch makes the parser reject string
and character UDLs in all places where the grammar requires a
string-literal and not a user-defined-string-literal.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


Since the grammar has

user-defined-string-literal :
string-literal ud-suffix

maybe we want to move the UDL handling out to a cp_parser_udl_string_literal
that calls cp_parser_string_literal?


Umm, maybe, but the UDL handling code seems to be too entrenched in
cp_parser_string_literal and I don't think it's going to be easy to extract
it :/.


Fair enough; maybe a wrapper, then?


As in, have a cp_parser_udl_string_literal wrapper that calls
cp_parser_string_literal with udl_ok=true, rename cp_parser_string_literal,
introduce a new cp_parser_string_literal wrapper that passes udl_ok=false?


That's what I was thinking.  And the new cp_parser_string_literal could 
also omit the lookup_udlit parm.



One problem with cp_parser_udl_string_literal is that it's too similar to
cp_parser_userdef_string_literal, which would be confusing, I think.


True, probably better to use that name instead, and rename the current 
one to something like finish_userdef_string_literal
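
A rough sketch of the wrapper arrangement being discussed, as a toy model
only.  None of these names or types exist in the GCC parser: toy_token and
the toy_* functions are hypothetical stand-ins, and the real
cp_parser_string_literal takes different parameters.

```cpp
#include <string>

// Toy stand-in for a token; this is not GCC's cp_token.
struct toy_token
{
  std::string text;       // contents of the literal
  std::string ud_suffix;  // empty unless the literal carries a ud-suffix
};

// Shared worker (hypothetical name).  Rejects a UDL when udl_ok is false;
// otherwise stores the literal text in *out and returns true.
bool
toy_parse_string_literal_1 (const toy_token &tok, bool udl_ok,
                            std::string *out)
{
  if (!tok.ud_suffix.empty () && !udl_ok)
    return false;
  *out = tok.text;
  return true;
}

// For grammar positions that require a plain string-literal.
bool
toy_parse_string_literal (const toy_token &tok, std::string *out)
{
  return toy_parse_string_literal_1 (tok, /*udl_ok=*/false, out);
}

// For grammar positions that also accept a user-defined-string-literal.
bool
toy_parse_userdef_string_literal (const toy_token &tok, std::string *out)
{
  return toy_parse_string_literal_1 (tok, /*udl_ok=*/true, out);
}
```

The point of the split is that callers pick the grammar production by name
instead of passing a boolean at every call site.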


Jason



Re: [PATCH] Ver.2: Add compile option "-msmall-data-limit=0" to avoid using .srodata section for riscv.

2022-11-17 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 13:50:00 PST (-0800), gcc-patches@gcc.gnu.org wrote:


On 11/17/22 02:53, Yixuan Chen wrote:

2022-11-17  Yixuan Chen  

 * gcc/testsuite/gcc.dg/pr25521.c: Add compile option 
"-msmall-data-limit=0" to avoid using .srodata section for riscv.
---
  gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
index 74fe2ae6626..628ddf1a761 100644
--- a/gcc/testsuite/gcc.dg/pr25521.c
+++ b/gcc/testsuite/gcc.dg/pr25521.c
@@ -2,7 +2,8 @@
 sections.

 { dg-require-effective-target elf }
-   { dg-do compile } */
+   { dg-do compile }
+   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */

  const volatile int foo = 30;



Wouldn't this be better?  It avoids a target specific conditional by
instead extending what we look for to cover [s]rodata sections.


Thoughts?

Jeff
diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
index 74fe2ae6626..63363a03b9f 100644
--- a/gcc/testsuite/gcc.dg/pr25521.c
+++ b/gcc/testsuite/gcc.dg/pr25521.c
@@ -7,4 +7,4 @@
 const volatile int foo = 30;


-/* { dg-final { scan-assembler "\\.rodata" } } */
+/* { dg-final { scan-assembler "\\.s\?rodata" } } */


That's how I usually do it for these tests, there's some other targets 
with sdata too so it fixes the test for everyone.  IIRC I said something 
like that in the v1, but sorry if I'm just getting it confused with some 
other patch.
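
As a quick sanity check of what the widened pattern accepts, here is a small
sketch using std::regex.  The helper name is made up for illustration, and it
assumes that ECMAScript syntax treats this particular pattern the same way
the Tcl regex used by scan-assembler does.

```cpp
#include <regex>
#include <string>

// Illustrative helper (hypothetical name): does an assembler line mention
// a .rodata or .srodata section?  The pattern is the regex that the
// dg-final string "\\.s\?rodata" denotes.
inline bool mentions_rodata (const std::string &asm_line)
{
  static const std::regex pattern (R"(\.s?rodata)");
  // Unanchored search, like scan-assembler.
  return std::regex_search (asm_line, pattern);
}
```

Both "\t.section\t.rodata" and "\t.section\t.srodata" match, while a line
that only names .sdata does not.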


There's a few of these that need to get chased down for every release, 
maybe we should add some sort of DG helper?  Not sure that'd keep folks 
from matching on .data, though...


Re: [PATCH] c++: constinit on pointer to function [PR104066]

2022-11-17 Thread Jason Merrill via Gcc-patches

On 11/17/22 13:38, Marek Polacek wrote:

[dcl.constinit]: "The constinit specifier shall be applied only to
a declaration of a variable with static or thread storage duration."

Thus, this ought to be OK:

   constinit void (*p)() = nullptr;

but the error message I introduced when implementing constinit was
not looking at funcdecl_p, so the code above was rejected.

Fixed thus.  I'm checking constinit_p first because I think that's
far more likely to be false than funcdecl_p.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
I think I'd like to backport this all the way back to 10.


OK for trunk and release branches.


PR c++/104066

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Check funcdecl_p before complaining
about constinit.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constinit18.C: New test.
---
  gcc/cp/decl.cc   |  2 +-
  gcc/testsuite/g++.dg/cpp2a/constinit18.C | 12 
  2 files changed, 13 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constinit18.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index d28889ed865..9a7b1a6c381 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -13071,7 +13071,7 @@ grokdeclarator (const cp_declarator *declarator,
  "an array", name);
return error_mark_node;
  }
-   if (constinit_p)
+   if (constinit_p && funcdecl_p)
  {
error_at (declspecs->locations[ds_constinit],
  "%<constinit%> on function return type is not "
diff --git a/gcc/testsuite/g++.dg/cpp2a/constinit18.C b/gcc/testsuite/g++.dg/cpp2a/constinit18.C
new file mode 100644
index 000..51b4f0273be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constinit18.C
@@ -0,0 +1,12 @@
+// PR c++/104066
+// { dg-do compile { target c++20 } }
+
+constinit void (*p)() = nullptr;
+constinit void (*pp)() = nullptr;
+void fn();
+constinit void ()() = fn;
+
+extern constinit long (* const syscall_reexported) (long, ...);
+
+constinit void bad (); // { dg-error ".constinit. on function return type is not allowed" }
+constinit void bad () { } // { dg-error ".constinit. on function return type is not allowed" }

base-commit: ee892832ea19b21a3420ef042e582204fac852a2




Re: [PATCH 4/7] RISC-V: Recognize sign-extract + and cases for XVentanaCondOps

2022-11-17 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 15:41:26 PST (-0800), gcc-patches@gcc.gnu.org wrote:


On 11/12/22 14:29, Philipp Tomsich wrote:

Users might use explicit arithmetic operations to create a mask and
then and it, in a sequence like
 cond = (bits >> SHIFT) & 1;
 mask = ~(cond - 1);
 val &= mask;
which will present as a single-bit sign-extract.

Depending on what combination of XVentanaCondOps and Zbs are
available, this will map to the following sequences:
  - bexti + vt.maskc, if both Zbs and XVentanaCondOps are present
  - andi + vt.maskc, if only XVentanaCondOps is available and the
 sign-extract is operating on bits 10:0 (bit
11 can't be reached, as the immediate is
sign-extended)
  - slli + srli + and, otherwise.
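
To make the idiom concrete, a small sketch of the mask sequence from the
description above.  The helper name and the 64-bit unsigned type are
assumptions chosen for illustration.

```cpp
#include <cstdint>

// Illustrative helper (hypothetical name): keeps val if bit `shift` of
// `bits` is set, otherwise yields 0.
inline std::uint64_t
keep_if_bit_set (std::uint64_t val, std::uint64_t bits, unsigned shift)
{
  std::uint64_t cond = (bits >> shift) & 1;  // 0 or 1
  std::uint64_t mask = ~(cond - 1);          // 1 -> all ones, 0 -> 0
  return val & mask;
}
```

The result equals ((bits >> shift) & 1) ? val : 0, which is why the sequence
can be treated as a sign-extract of a single bit followed by an AND.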

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Recognize SIGN_EXTRACT
  of a single-bit followed by AND for XVentanaCondOps.

Signed-off-by: Philipp Tomsich 
---

  gcc/config/riscv/xventanacondops.md | 46 +
  1 file changed, 46 insertions(+)

diff --git a/gcc/config/riscv/xventanacondops.md b/gcc/config/riscv/xventanacondops.md
index 7930ef1d837..3e9d5833a4b 100644
--- a/gcc/config/riscv/xventanacondops.md
+++ b/gcc/config/riscv/xventanacondops.md
@@ -73,3 +73,49 @@
"TARGET_XVENTANACONDOPS"
[(set (match_dup 5) (match_dup 1))
 (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 5) (const_int 0)))
+
+;; Users might use explicit arithmetic operations to create a mask and
+;; then and it, in a sequence like


Nit.  Seems like a word is missing.  "make and then and it"??


Do we really care about TARGET_XVENTANACONDOPS && ! TARGET_ZBS?


I guess that's really more of a question for the Ventana folks, but 
assuming all the Ventana widgets have Zbs then it seems reasonable to 
just couple them -- there's already enough options in RISC-V land to 
test everything, might as well make sure what slips through the cracks 
isn't being built.


Probably best to have a comment saying why here, and then something to 
enforce the dependency in -march (either as an implicit extension 
dependency, or just a warning/error) so users don't get tripped up on 
configs that aren't expected to work.



If there's a good reason to care about the !TARGET_ZBS case, then OK
with the nit fixed.   If we agree that the !TARGET_ZBS case isn't all
that important, then obviously OK with that pattern removed too.

I'm about out of oomph today.  I may take a look at 7/7 tonight though. 
Given it hits target independent code we probably want to get resolution
on that patch sooner rather than later.


Thanks, there's no way we would have gotten this all sorted out so fast 
without the help!


Re: [PATCH 2/5] c++: Set the locus of the function result decl

2022-11-17 Thread Jason Merrill via Gcc-patches

On 11/17/22 14:02, Bernhard Reutner-Fischer wrote:

On Thu, 17 Nov 2022 09:53:32 -0500
Jason Merrill  wrote:


On 11/17/22 03:56, Bernhard Reutner-Fischer wrote:

On Tue, 15 Nov 2022 18:52:41 -0500
Jason Merrill  wrote:
   

On 11/12/22 13:45, Bernhard Reutner-Fischer wrote:

gcc/cp/ChangeLog:

* decl.cc (start_function): Set the result decl source location to
the location of the typespec.

---
Bootstrapped and regtested on x86_64-unknown-linux with no regressions.
Ok for trunk?

Cc: Nathan Sidwell 
Cc: Jason Merrill 
---
gcc/cp/decl.cc | 15 ++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 6e98ea35a39..ed40815e645 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17449,6 +17449,8 @@ start_function (cp_decl_specifier_seq *declspecs,
tree attrs)
{
  tree decl1;
+  tree result;
+  bool ret;


We now prefer to declare new variables as late as possible, usually when
they are initialized.


Moved. Ok like attached? Bootstrapped and regtested fine.
   

  decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, &attrs);
  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
@@ -17461,7 +17463,18 @@ start_function (cp_decl_specifier_seq *declspecs,
gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
 integer_type_node));

-  return start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);

+  ret = start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
+
+  /* decl1 might be ggc_freed here.  */
+  decl1 = current_function_decl;
+
+  /* Set the result decl source location to the location of the typespec.  */
+  if (TREE_CODE (decl1) == FUNCTION_DECL
+  && declspecs->locations[ds_type_spec] != UNKNOWN_LOCATION
+  && (result = DECL_RESULT (decl1)) != NULL_TREE
+  && DECL_SOURCE_LOCATION (result) == input_location)
+DECL_SOURCE_LOCATION (result) = declspecs->locations[ds_type_spec];


One way to handle the template case would be for the code in
start_preparsed_function that sets DECL_RESULT to check whether decl1 is
a template instantiation, and in that case copy the location from the
template's DECL_RESULT, i.e.

DECL_RESULT (DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1)))


Well, that would probably work if something would set the location of
that template result decl properly, which nothing does out of the box.


Hmm, it should get set by your patch, since templates go through
start_function like normal functions.


diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index ed7226b82f0..65d78c82a2d 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17230,6 +17231,17 @@ start_preparsed_function (tree decl1, tree attrs, int flags)
 cp_apply_type_quals_to_decl (cp_type_quals (restype), resdecl);
   }
   
+  /* Set the result decl source location to the location of the typespec.  */

+  if (DECL_RESULT (decl1)
+  && !DECL_USE_TEMPLATE (decl1)
+  && DECL_TEMPLATE_INFO (decl1)
+  && DECL_TI_TEMPLATE (decl1)
+  && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1))
+  && DECL_RESULT (DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1))))


This condition is true only for the template definition, for which you
haven't gotten to your start_function change yet.

Instead, you want to copy the location for instantiations, i.e. check
DECL_TEMPLATE_INSTANTIATION instead of !DECL_USE_TEMPLATE.


No, that makes no difference.


Hmm, when I stop there when processing the instantiation the template's 
DECL_RESULT has the right location information, e.g. for


template  int f() { return 42; }

int main()
{
  f();
}

#1  0x00f950e8 in instantiate_body (pattern=0x77ff5080 f>, args=, d=0x7fffe971e600 f>, nested_p=false) at /home/jason/gt/gcc/cp/pt.cc:26470
#0  start_preparsed_function (decl1=, 
attrs=, flags=1) at /home/jason/gt/gcc/cp/decl.cc:17252

(gdb) p expand_location (input_location)
$13 = {file = 0x4962370 "wa.C", line = 1, column = 24, data = 0x0, sysp = false}
(gdb) p expand_location (DECL_SOURCE_LOCATION (DECL_RESULT (DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1)))))
$14 = {file = 0x4962370 "wa.C", line = 1, column = 20, data = 0x0, sysp = false}



But really I'm not interested in the template case, i only mentioned
them because they don't work and in case somebody wanted to have correct
locations.
I remember just frustration when i looked at those a year ago.


I'd like to get the template case right while we're looking at it.  I 
guess I can add that myself if you're done trying.



Is the hunk for normal functions OK for trunk?


You also need a testcase for the desired behavior, with e.g.
{ dg-error "23:" }


thanks,




+  DECL_SOURCE_LOCATION (DECL_RESULT (decl1))
+   = DECL_SOURCE_LOCATION (


Open paren goes on the new line.


+   DECL_RESULT (DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1))));
  /* Record the decl so that the function name is defined.

Re: [PATCH 4/7] RISC-V: Recognize sign-extract + and cases for XVentanaCondOps

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/12/22 14:29, Philipp Tomsich wrote:

Users might use explicit arithmetic operations to create a mask and
then and it, in a sequence like
 cond = (bits >> SHIFT) & 1;
 mask = ~(cond - 1);
 val &= mask;
which will present as a single-bit sign-extract.

Depending on what combination of XVentanaCondOps and Zbs are
available, this will map to the following sequences:
  - bexti + vt.maskc, if both Zbs and XVentanaCondOps are present
  - andi + vt.maskc, if only XVentanaCondOps is available and the
 sign-extract is operating on bits 10:0 (bit
11 can't be reached, as the immediate is
sign-extended)
  - slli + srli + and, otherwise.

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Recognize SIGN_EXTRACT
  of a single-bit followed by AND for XVentanaCondOps.

Signed-off-by: Philipp Tomsich 
---

  gcc/config/riscv/xventanacondops.md | 46 +
  1 file changed, 46 insertions(+)

diff --git a/gcc/config/riscv/xventanacondops.md b/gcc/config/riscv/xventanacondops.md
index 7930ef1d837..3e9d5833a4b 100644
--- a/gcc/config/riscv/xventanacondops.md
+++ b/gcc/config/riscv/xventanacondops.md
@@ -73,3 +73,49 @@
"TARGET_XVENTANACONDOPS"
[(set (match_dup 5) (match_dup 1))
 (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 5) (const_int 0)))
+
+;; Users might use explicit arithmetic operations to create a mask and
+;; then and it, in a sequence like


Nit.  Seems like a word is missing.  "make and then and it"??


Do we really care about TARGET_XVENTANACONDOPS && ! TARGET_ZBS?


If there's a good reason to care about the !TARGET_ZBS case, then OK 
with the nit fixed.   If we agree that the !TARGET_ZBS case isn't all 
that important, then obviously OK with that pattern removed too.


I'm about out of oomph today.  I may take a look at 7/7 tonight though.  
Given it hits target independent code we probably want to get resolution 
on that patch sooner rather than later.


jeff



Re: [PATCH 6/7] RISC-V: Support immediates in XVentanaCondOps

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/12/22 14:29, Philipp Tomsich wrote:

When if-conversion encounters sequences using immediates, the
sequences can't trivially map back onto vt.maskc/vt.maskcn (even if
beneficial) due to vt.maskc and vt.maskcn not having immediate forms.

This adds a splitter to rewrite opportunities for XVentanaCondOps that
operate on an immediate by first putting the immediate into a register
to enable the non-immediate vt.maskc/vt.maskcn instructions to operate
on the value.

Consider code, such as

   long func2 (long a, long c)
   {
 if (c)
   a = 2;
 else
   a = 5;
 return a;
   }

which will be converted to

   func2:
        seqz    a0,a2
        neg     a0,a0
        andi    a0,a0,3
        addi    a0,a0,2
        ret

Following this change, we generate

        li      a0,3
        vt.maskcn       a0,a0,a2
        addi    a0,a0,2
        ret

This commit also introduces a simple unit test for if-conversion with
immediate (literal) values as the sources for simple sets in the THEN
and ELSE blocks. The test checks that Ventana's conditional mask
instruction (vt.maskc) is emitted as part of the resultant branchless
instruction sequence.
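
For reference, a sketch of why the three-instruction sequence computes the
same value as func2.  The semantics assumed for vt.maskcn (rs1 if rs2 is
zero, otherwise 0) are inferred from the sequences quoted above and not
taken from an ISA document; the function names are illustrative.

```cpp
// Assumed model of vt.maskcn rd, rs1, rs2: rs1 when rs2 == 0, else 0.
inline long maskcn_model (long rs1, long rs2)
{
  return rs2 == 0 ? rs1 : 0;
}

// Branchless form of func2: start from the THEN value 2 and add the
// difference 5 - 2 = 3 only when c is zero.
inline long func2_branchless (long c)
{
  long t = 3;                // li        a0,3
  t = maskcn_model (t, c);   // vt.maskcn a0,a0,a2
  return t + 2;              // addi      a0,a0,2
}
```

So the immediate that gets loaded into a register is the difference between
the two arms, and the remaining arm is added back afterwards.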

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Support immediates for
  vt.maskc/vt.maskcn through a splitter.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/xventanacondops-ifconv-imm.c: New test.


OK once we've cleared the non-technical hurdles to committing vendor 
specific extensions.



Jeff




Re: [PATCH 5/7] RISC-V: Recognize bexti in negated if-conversion

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/12/22 14:29, Philipp Tomsich wrote:

While the positive case "if ((bits >> SHAMT) & 1)" for SHAMT 0..10 can
trigger conversion into efficient branchless sequences
   - with Zbs (bexti + neg + and)
   - with XVentanaCondOps (andi + vt.maskc)
the inverted/negated case results in
   andi a5,a0,1024
   seqz a5,a5
   neg a5,a5
   and a5,a5,a1
due to how the sequence presents to the combine pass.

This adds an additional splitter to reassociate the polarity reversed
case into bexti + addi, if Zbs is present.
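
A small sketch of the equivalence behind this reassociation, for bit 10 as in
the example above.  The function names are illustrative, and the assumption
that the addi immediate is -1 is inferred from the arithmetic and is not
stated in the patch description.

```cpp
#include <cstdint>

// Mask as computed by "andi + seqz + neg" for bit 10.
inline std::uint64_t mask_andi_seqz_neg (std::uint64_t x)
{
  std::uint64_t t = x & 1024;  // andi: isolate bit 10
  t = (t == 0);                // seqz: 1 if the bit is clear, else 0
  return 0 - t;                // neg:  all ones if the bit is clear, else 0
}

// The same mask computed as "bexti + addi" (immediate assumed to be -1).
inline std::uint64_t mask_bexti_addi (std::uint64_t x)
{
  std::uint64_t bit = (x >> 10) & 1;  // bexti: single-bit extract
  return bit - 1;                     // addi:  1 -> 0, 0 -> all ones
}
```

Both functions return all ones when bit 10 is clear and 0 when it is set, so
the final "and" with the other operand is unchanged.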

Signed-off-by: Philipp Tomsich 

gcc/ChangeLog:

 * config/riscv/xventanacondops.md: Add split to reassociate
   "andi + seqz + neg" into "bexti + addi".


OK once we've cleared the non-technical hurdles to committing vendor 
specific extensions.



Seeing the number of reassociation splitters that have been submitted 
and the fact that there's been a fair amount of commonality makes me 
wonder if we should instead be beefing up the generic split point code 
in combine.  Though IIRC that code may be strictly splitting without 
reassociation or rewriting..  Hmm, anyway, worth keeping in mind that we 
have some generic code to try and find good points to break down a 
complex insn into components that might be recognizable.



jeff




Re: [PATCH 3/7] RISC-V: Support noce_try_store_flag_mask as vt.maskc

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/12/22 14:29, Philipp Tomsich wrote:

When if-conversion in noce_try_store_flag_mask starts the sequence off
with an order-operator, our patterns for vt.maskc will receive the
result of the order-operator as a register argument; consequently,
they can't know that the result will be either 1 or 0.

To convey this information (and make vt.maskc applicable), we wrap
the result of the order-operator in a eq/ne against (const_int 0).
This commit adds the split pattern to handle these cases.

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Add split to wrap an
   order-operator suitably for generating vt.maskc.

Signed-off-by: Philipp Tomsich 

Ref vrull/gcc#157

RISC-V: Recognize 'ge'/'le' operators as 'slt'/'sgt'

During if-conversion, if noce_try_store_flag_mask succeeds, we may see
 if (cur < next) {
 next = 0;
 }
transformed into
27: r82:SI=ltu(r76:DI,r75:DI)
   REG_DEAD r76:DI
28: r81:SI=r82:SI^0x1
   REG_DEAD r82:SI
29: r80:DI=zero_extend(r81:SI)
   REG_DEAD r81:SI

This currently escapes the combiner, as RISC-V does not have a pattern
to apply the 'slt' instruction to 'geu' verbs.  By adding a pattern in
this commit, we match such cases.
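
The RTL above is simply "greater or equal, unsigned" spelled as a less-than
followed by an XOR with 1.  A sketch of that identity, with an illustrative
function name and 64-bit operands assumed:

```cpp
#include <cstdint>

// Model of the dump: r82 = ltu (a, b); r81 = r82 ^ 1.
inline unsigned geu_via_ltu (std::uint64_t a, std::uint64_t b)
{
  unsigned lt = a < b;  // ltu: 1 if a < b, else 0
  return lt ^ 1u;       // flip the flag: 1 if a >= b, else 0
}
```

This only models the dump; it does not claim to reproduce the exact operand
order used by the new pattern.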

gcc/ChangeLog:

* config/riscv/predicates.md (anyge_operator): Define.
(anygt_operator): Define.
(anyle_operator): Define.
(anylt_operator): Define.
* config/riscv/riscv.md (*sge_): Add a
  pattern to map 'geu' onto slt w/ reversed operands.
* config/riscv/riscv.md: Helpers for ge & le.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-le-01.c: New test.
* gcc.target/riscv/xventanacondops-lt-03.c: New test.


Presumably the two splitters in riscv.md can't live in 
xventanacondops.md due to ordering issues?


OK once we've cleared the non-technical hurdles to committing vendor 
specific extensions.



Jeff



Re: [PATCH 2/7] RISC-V: Generate vt.maskc on noce_try_store_flag_mask if-conversion

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/12/22 14:29, Philipp Tomsich wrote:

Adds a pattern to map the output of noce_try_store_flag_mask
if-conversion in the combiner onto vt.maskc; the input patterns
supported are similar to the following:
   (set (reg/v/f:DI 75 [  ])
(and:DI (neg:DI (ne:DI (reg:DI 82)
(const_int 0 [0])))
(reg/v/f:DI 75 [  ])))
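
Read as plain arithmetic, the RTL above keeps the value when the condition
register is non-zero and produces 0 otherwise.  A sketch of that reading,
with an illustrative function name and 64-bit unsigned operands assumed:

```cpp
#include <cstdint>

// Model of (and (neg (ne cond 0)) val).
inline std::uint64_t and_neg_ne (std::uint64_t val, std::uint64_t cond)
{
  std::uint64_t flag = (cond != 0);  // ne:  0 or 1
  std::uint64_t mask = 0 - flag;     // neg: 0 or all ones
  return val & mask;                 // and: val or 0
}
```

This is the value the pattern maps onto a single vt.maskc, instead of three
separate instructions.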

This reduces dynamic instruction counts for the perlbench-workload in
SPEC CPU2017 by 0.8230%, 0.4689%, and 0.2332% (respectively, for
each of the 3 workloads in the 'ref'-workload).

To ensure that the combine-pass doesn't get confused about
profitability, we recognize the idiom as requiring a single
instruction when the XVentanaCondOps extension is present.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Recognize idiom for
  vt.maskc as a single insn with TARGET_XVENTANACONDOPS.
* config/riscv/riscv.md: Include xventanacondops.md.
* config/riscv/xventanacondops.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-ne-03.c: New test.
* gcc.target/riscv/xventanacondops-ne-04.c: New test.


OK once we've cleared the non-technical hurdles to committing vendor 
specific extensions.




Jeff




Re: [PATCH 1/7] RISC-V: Recognize xventanacondops extension

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/12/22 14:29, Philipp Tomsich wrote:

This adds the xventanacondops extension to the option parsing and as a
default for the ventana-vt1 core:

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Recognize
   "xventanacondops" as part of an architecture string.
* config/riscv/riscv-cores.def (RISCV_CORE): Enable
  "xventanacondops" by default for "ventana-vt1".
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): Define.
(TARGET_XVENTANACONDOPS): Define.
* config/riscv/riscv.opt: Add "riscv_xventanacondops".


OK once we've cleared the non-technical hurdles to committing vendor 
specific extensions.



Jeff




Re: [PATCH] RISC-V: branch-(not)equals-zero compares against $zero

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/8/22 12:55, Philipp Tomsich wrote:

If we are testing a register or a paradoxical subreg (i.e. anything that is not
a partial subreg) for equality/non-equality with zero, we can generate a branch
that compares against $zero.  This will work for QI, HI, SI and DImode, so we
enable this for ANYI.

2020-08-30  gcc/ChangeLog:

* config/riscv/riscv.md (*branch_equals_zero): Added pattern.


I've gone back and forth on this a few times.  As you know, I hate 
subregs in the target descriptions and I guess I need to extend that to 
querying if something is a subreg or not rather than just subregs 
appearing in the RTL.



Presumably the idea behind rejecting partial subregs is that the bits 
outside the partial are unspecified, but that's also going to be true if 
we're looking at a hardreg in QImode (for example) irrespective of it 
being wrapped in a subreg.



I don't doubt it works the vast majority of the time, but I haven't been 
able to convince myself it'll work all the time.  How do we ensure that 
the bits outside the mode are zero?  I've been bitten by this kind of 
problem before, and it's safe to say it was exceedingly painful to find.
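
A small C model of the concern, purely as an illustration: the 64-bit
variable stands in for a hard register holding a QImode value, and the
function names are invented for this sketch rather than taken from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* What a QImode equality test means: only the low 8 bits count.  */
int
qi_is_zero (uint64_t reg)
{
  return (uint8_t) reg == 0;
}

/* What a branch against $zero tests on a 64-bit target:
   every bit of the register.  */
int
full_reg_is_zero (uint64_t reg)
{
  return reg == 0;
}
```

With reg = 0x1200 the low byte is zero but the register is not, so the
two tests disagree.  The pattern is only safe if something guarantees
the bits outside the mode are zero (or otherwise well defined).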



Jeff




Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Fangrui Song via Gcc-patches
On Thu, Nov 17, 2022 at 1:55 PM Andrew Pinski  wrote:
>
> On Thu, Nov 17, 2022 at 1:46 PM Fangrui Song  wrote:
> >
> > On Thu, Nov 17, 2022 at 1:37 PM Andrew Pinski  wrote:
> > >
> > > On Thu, Nov 17, 2022 at 1:21 PM maskray--- via Gcc-patches
> > >  wrote:
> > > >
> > > > > +.. option:: -mdirect-extern-access, -mno-direct-extern-access
> > > > > +
> > > > > +  Use direct accesses for external data symbols.  It avoids a GOT 
> > > > > indirection
> > > > > +  on all external data symbols with :option:`-fpie` or 
> > > > > :option:`-fPIE`.  This is
> > > > > +  useful for executables linked with :option:`-static` or 
> > > > > :option:`-static-pie`.
> > > > > +  With :option:`-fpic` or :option:`-fPIC`, it only affects accesses 
> > > > > to protected
> > > > > +  data symbols.  It has no effect on non-position independent code.  
> > > > > The default
> > > > > +  is :option:`-mno-direct-extern-access`.
> > > > > +
> > > > > +  .. warning::
> > > > > +
> > > > > +Use :option:`-mdirect-extern-access` either in shared libraries 
> > > > > or in
> > > > > +executables, but not in both.  Protected symbols used both in a 
> > > > > shared
> > > > > +library and executable may cause linker errors or fail to work 
> > > > > correctly.
> > > >
> > > > I think current GCC and Clang's behavior is:
> > > >
> > > > * -mdirect-extern-access is the default for -fno-pic. This is to enable 
> > > > optimizations for -static programs but may introduce copy relocations.
> > > > * -mno-direct-extern-access is the default for -fpie and -fpic. This 
> > > > uses some GOT-generating relocations which can be optimized out (lld, 
> > > > see https://maskray.me/blog/2021-08-29-all-about-global-offset-table) 
> > > > but the instruction is nevertheless slightly longer.
> > > >
> > > > (-mdirect-extern-access for -fpic probably doesn't make sense.)
> > > >
> > > > The option I introduced to Clang is -fdirect-access-external-data
> > > > (see 
> > > > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected).
> > > > If -mdirect-extern-access gets more popular, I can add a Clang alias.
> > > > But I am opposed to forcing a GNU property for 
> > > > -mdirect-extern-access/-mno-direct-extern-access.
> > > >
> > > > FWIW I used 
> > > > https://gist.github.com/MaskRay/c03a90922003df666551589f1629df22 to 
> > > > test my Clang changes related to -fno-semantic-interposition
> > > > on various visibility attributes x non-weak/weak x nopic/pie/pic x 
> > > > dllimport/not x ...
> > >
> > >
> > > The x86_64 discussion about this is here
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 .
> > > I think clang changing the ABI is just broken and should think twice
> > > before we do it for GCC.
> > >
> > > And there is a lot of visibility protected issues filed in GCC bug
> > > databases specifically about copy relocs too.
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56527
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37611
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19520
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=28875
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=28877
> > > I also suspect clang's behavior is still broken too.
> > >
> > > Thanks,
> > > Andrew
> >
> > Well, I don't think Clang changed ABI regarding -fno-pic/-fpie/-fpic.
> > As I did archaeology on
> > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
> > "Reflection on protected data symbols and copy relocations"
> > GCC 5 x86-64 made a change and GCC aarch64 accidentally picked up the 
> > change.
>
> You missed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 (or
> rather r5-7961-ga5eef8e9b02474 ) was the change to fix protected .

I didn't. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 proposed a problem.
It could be resolved as "wontfix. copy relocations are just
incompatible with protected symbols as an optimization (which is the
very purpose of inventing protected)"
but was resolved by pessimizing GCC codegen. This led to heated
discussion in several places including
https://sourceware.org/legacy-ml/binutils/2016-03/msg00312.html (which
my article linked to).

>
> >
> > """
> > On the GCC side, in -fpic mode, using GOT-generating relocations when
> > accessing a protected variable subverts the point using the protected
> > visibility. The unneeded pessimization is the foremost complaint. The
> > pessimization applies to all ports with #define TARGET_BINDS_LOCAL_P
> > default_binds_local_p_2. aarch64 moved to default_binds_local_p_2
> > accidentally by
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=cbddf64c0243816b45e6680754a251c603245dbc.
>
> This was NOT by accident. In fact you just looked into the commit and
> NOT the actual email which submitted the patch:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2015-04/msg01432.html
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65780

This aarch64 commit was by accident. The code happened to 

Re: [PATCH] match.pd: rewrite select to branchless expression

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/8/22 13:15, Andrew Pinski via Gcc-patches wrote:

On Tue, Nov 8, 2022 at 12:02 PM Michael Collison  wrote:

This patch transforms (cond (and (x , 0x1) == 0), y, (z op y)) into
(-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
0x1)) & z ) op y.

Matching these patterns allows GCC to generate branchless code for one of
the functions in coremark.

Bootstrapped and tested on x86 and RISC-V. Okay?

This seems like a (much) reduced (simplified?) version of
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584411.html .
I have not had time for the last year to go through the comments on
that patch and resubmit it though.
It seems like you are aiming for one specific case in coremarks rather
than a more generic fix too.


I'm fairly confident it was developed independently.  Michael did this 
transformation for LLVM and reached out to me a month or two ago for 
suggestions on the GCC implementation.



My recollection is I suggested phi-opt or match.pd with a slight 
preference for phi-opt as I wasn't offhand sure if we'd have a form 
suitable for match.pd.



The pattern is just a conditional xor/ior when all is said and done.  
While the inspiration comes from coremark, I don't think it's supposed 
to be specific to coremark.
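
For reference, a minimal C sketch of the equivalence being matched.  The
function names are illustrative only and do not come from the patch; the
transformation works for both '^' and '|' because 0 is the identity
element of each.

```c
#include <assert.h>
#include <stdint.h>

/* Select form: (x & 1) == 0 ? y : (z op y).  */
uint32_t
select_xor (uint32_t x, uint32_t y, uint32_t z)
{
  return (x & 1) == 0 ? y : (z ^ y);
}

uint32_t
select_ior (uint32_t x, uint32_t y, uint32_t z)
{
  return (x & 1) == 0 ? y : (z | y);
}

/* Branchless form: (-(x & 1) & z) op y.
   -(x & 1) is 0 when the low bit is clear and all-ones when it is
   set, so the AND yields either 0 or z before OP is applied.  */
uint32_t
branchless_xor (uint32_t x, uint32_t y, uint32_t z)
{
  return (-(x & 1) & z) ^ y;
}

uint32_t
branchless_ior (uint32_t x, uint32_t y, uint32_t z)
{
  return (-(x & 1) & z) | y;
}

/* Count disagreements between the two forms over a small
   exhaustive range of inputs.  */
unsigned
count_mismatches (void)
{
  unsigned bad = 0;
  for (uint32_t x = 0; x < 4; x++)
    for (uint32_t y = 0; y < 16; y++)
      for (uint32_t z = 0; z < 16; z++)
        {
          bad += select_xor (x, y, z) != branchless_xor (x, y, z);
          bad += select_ior (x, y, z) != branchless_ior (x, y, z);
        }
  return bad;
}
```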



jeff




Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Andrew Pinski via Gcc-patches
On Thu, Nov 17, 2022 at 1:46 PM Fangrui Song  wrote:
>
> On Thu, Nov 17, 2022 at 1:37 PM Andrew Pinski  wrote:
> >
> > On Thu, Nov 17, 2022 at 1:21 PM maskray--- via Gcc-patches
> >  wrote:
> > >
> > > > +.. option:: -mdirect-extern-access, -mno-direct-extern-access
> > > > +
> > > > +  Use direct accesses for external data symbols.  It avoids a GOT 
> > > > indirection
> > > > +  on all external data symbols with :option:`-fpie` or 
> > > > :option:`-fPIE`.  This is
> > > > +  useful for executables linked with :option:`-static` or 
> > > > :option:`-static-pie`.
> > > > +  With :option:`-fpic` or :option:`-fPIC`, it only affects accesses to 
> > > > protected
> > > > +  data symbols.  It has no effect on non-position independent code.  
> > > > The default
> > > > +  is :option:`-mno-direct-extern-access`.
> > > > +
> > > > +  .. warning::
> > > > +
> > > > +Use :option:`-mdirect-extern-access` either in shared libraries or 
> > > > in
> > > > +executables, but not in both.  Protected symbols used both in a 
> > > > shared
> > > > +library and executable may cause linker errors or fail to work 
> > > > correctly.
> > >
> > > I think current GCC and Clang's behavior is:
> > >
> > > * -mdirect-extern-access is the default for -fno-pic. This is to enable 
> > > optimizations for -static programs but may introduce copy relocations.
> > > * -mno-direct-extern-access is the default for -fpie and -fpic. This uses 
> > > some GOT-generating relocations which can be optimized out (lld, see 
> > > https://maskray.me/blog/2021-08-29-all-about-global-offset-table) but the 
> > > instruction is nevertheless slightly longer.
> > >
> > > (-mdirect-extern-access for -fpic probably doesn't make sense.)
> > >
> > > The option I introduced to Clang is -fdirect-access-external-data
> > > (see 
> > > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected).
> > > If -mdirect-extern-access gets more popular, I can add a Clang alias.
> > > But I am opposed to forcing a GNU property for 
> > > -mdirect-extern-access/-mno-direct-extern-access.
> > >
> > > FWIW I used 
> > > https://gist.github.com/MaskRay/c03a90922003df666551589f1629df22 to test 
> > > my Clang changes related to -fno-semantic-interposition
> > > on various visibility attributes x non-weak/weak x nopic/pie/pic x 
> > > dllimport/not x ...
> >
> >
> > The x86_64 discussion about this is here
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 .
> > I think clang changing the ABI is just broken and should think twice
> > before we do it for GCC.
> >
> > And there is a lot of visibility protected issues filed in GCC bug
> > databases specifically about copy relocs too.
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56527
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37611
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19520
> > https://sourceware.org/bugzilla/show_bug.cgi?id=28875
> > https://sourceware.org/bugzilla/show_bug.cgi?id=28877
> > I also suspect clang's behavior is still broken too.
> >
> > Thanks,
> > Andrew
>
> Well, I don't think Clang changed ABI regarding -fno-pic/-fpie/-fpic.
> As I did archaeology on
> https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
> "Reflection on protected data symbols and copy relocations"
> GCC 5 x86-64 made a change and GCC aarch64 accidentally picked up the change.

You missed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 (or
rather r5-7961-ga5eef8e9b02474 ) was the change to fix protected .


>
> """
> On the GCC side, in -fpic mode, using GOT-generating relocations when
> accessing a protected variable subverts the point using the protected
> visibility. The unneeded pessimization is the foremost complaint. The
> pessimization applies to all ports with #define TARGET_BINDS_LOCAL_P
> default_binds_local_p_2. aarch64 moved to default_binds_local_p_2
> accidentally by
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=cbddf64c0243816b45e6680754a251c603245dbc.

This was NOT by accident. In fact you just looked into the commit and
NOT the actual email which submitted the patch:
https://gcc.gnu.org/legacy-ml/gcc-patches/2015-04/msg01432.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65780

"As s390/arm/aarch64 seems to work fine
(generate a COPY relocation and thus define symbol locally) in non-PIE
executables, this patch changes those to a function that has been added for
that behavior."


Thanks,
Andrew Pinski

>
> For GCC<5 (and all versions of Clang), direct accesses to protected
> variables are produced in -fpic code. Mixing such object files can
> still silently break copy relocations on protected data symbols.
> Therefore, GNU ld made the controversial change
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5
> to error in -shared mode.
> """
>
>
> > >
> > > On 2022-11-17, Ramana Radhakrishnan wrote:
> > > >On Thu, Nov 17, 2022 at 5:30 PM Richard Sandiford via 

Re: [PATCH] Ver.2: Add compile option "-msmall-data-limit=0" to avoid using .srodata section for riscv.

2022-11-17 Thread Jeff Law via Gcc-patches


On 11/17/22 02:53, Yixuan Chen wrote:

2022-11-17  Yixuan Chen  

 * gcc/testsuite/gcc.dg/pr25521.c: Add compile option 
"-msmall-data-limit=0" to avoid using .srodata section for riscv.
---
  gcc/testsuite/gcc.dg/pr25521.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
index 74fe2ae6626..628ddf1a761 100644
--- a/gcc/testsuite/gcc.dg/pr25521.c
+++ b/gcc/testsuite/gcc.dg/pr25521.c
@@ -2,7 +2,8 @@
 sections.
  
 { dg-require-effective-target elf }

-   { dg-do compile } */
+   { dg-do compile }
+   { dg-options "-msmall-data-limit=0" { target { riscv*-*-* } } } */
  
  const volatile int foo = 30;
  


Wouldn't this be better?  It avoids a target specific conditional by 
instead extending what we look for to cover [s]rodata sections.



Thoughts?

Jeff
diff --git a/gcc/testsuite/gcc.dg/pr25521.c b/gcc/testsuite/gcc.dg/pr25521.c
index 74fe2ae6626..63363a03b9f 100644
--- a/gcc/testsuite/gcc.dg/pr25521.c
+++ b/gcc/testsuite/gcc.dg/pr25521.c
@@ -7,4 +7,4 @@
 const volatile int foo = 30;
 
 
-/* { dg-final { scan-assembler "\\.rodata" } } */
+/* { dg-final { scan-assembler "\\.s\?rodata" } } */
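
A quick way to sanity-check the widened pattern outside the testsuite.
This sketch uses POSIX extended regular expressions as a stand-in for
the Tcl regexps that scan-assembler really uses, and the helper name is
made up for this example; for a pattern this simple the two should
behave the same, but that is an assumption.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if TEXT contains a match for the extended regex PATTERN,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int
contains_match (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With the pattern written as "\\.s?rodata" in C source, both a ".rodata"
and a ".srodata" section directive match, whereas the original
"\\.rodata" matches only the former.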


Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Fangrui Song via Gcc-patches
On Thu, Nov 17, 2022 at 1:37 PM Andrew Pinski  wrote:
>
> On Thu, Nov 17, 2022 at 1:21 PM maskray--- via Gcc-patches
>  wrote:
> >
> > > +.. option:: -mdirect-extern-access, -mno-direct-extern-access
> > > +
> > > +  Use direct accesses for external data symbols.  It avoids a GOT 
> > > indirection
> > > +  on all external data symbols with :option:`-fpie` or :option:`-fPIE`.  
> > > This is
> > > +  useful for executables linked with :option:`-static` or 
> > > :option:`-static-pie`.
> > > +  With :option:`-fpic` or :option:`-fPIC`, it only affects accesses to 
> > > protected
> > > +  data symbols.  It has no effect on non-position independent code.  The 
> > > default
> > > +  is :option:`-mno-direct-extern-access`.
> > > +
> > > +  .. warning::
> > > +
> > > +Use :option:`-mdirect-extern-access` either in shared libraries or in
> > > +executables, but not in both.  Protected symbols used both in a 
> > > shared
> > > +library and executable may cause linker errors or fail to work 
> > > correctly.
> >
> > I think current GCC and Clang's behavior is:
> >
> > * -mdirect-extern-access is the default for -fno-pic. This is to enable 
> > optimizations for -static programs but may introduce copy relocations.
> > * -mno-direct-extern-access is the default for -fpie and -fpic. This uses 
> > some GOT-generating relocations which can be optimized out (lld, see 
> > https://maskray.me/blog/2021-08-29-all-about-global-offset-table) but the 
> > instruction is nevertheless slightly longer.
> >
> > (-mdirect-extern-access for -fpic probably doesn't make sense.)
> >
> > The option I introduced to Clang is -fdirect-access-external-data
> > (see 
> > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected).
> > If -mdirect-extern-access gets more popular, I can add a Clang alias.
> > But I am opposed to forcing a GNU property for 
> > -mdirect-extern-access/-mno-direct-extern-access.
> >
> > FWIW I used 
> > https://gist.github.com/MaskRay/c03a90922003df666551589f1629df22 to test my 
> > Clang changes related to -fno-semantic-interposition
> > on various visibility attributes x non-weak/weak x nopic/pie/pic x 
> > dllimport/not x ...
>
>
> The x86_64 discussion about this is here
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 .
> I think clang changing the ABI is just broken and should think twice
> before we do it for GCC.
>
> And there is a lot of visibility protected issues filed in GCC bug
> databases specifically about copy relocs too.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56527
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37611
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19520
> https://sourceware.org/bugzilla/show_bug.cgi?id=28875
> https://sourceware.org/bugzilla/show_bug.cgi?id=28877
> I also suspect clang's behavior is still broken too.
>
> Thanks,
> Andrew

Well, I don't think Clang changed ABI regarding -fno-pic/-fpie/-fpic.
As I did archaeology on
https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
"Reflection on protected data symbols and copy relocations"
GCC 5 x86-64 made a change and GCC aarch64 accidentally picked up the change.

"""
On the GCC side, in -fpic mode, using GOT-generating relocations when
accessing a protected variable subverts the point using the protected
visibility. The unneeded pessimization is the foremost complaint. The
pessimization applies to all ports with #define TARGET_BINDS_LOCAL_P
default_binds_local_p_2. aarch64 moved to default_binds_local_p_2
accidentally by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=cbddf64c0243816b45e6680754a251c603245dbc.

For GCC<5 (and all versions of Clang), direct accesses to protected
variables are produced in -fpic code. Mixing such object files can
still silently break copy relocations on protected data symbols.
Therefore, GNU ld made the controversial change
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5
to error in -shared mode.
"""


> >
> > On 2022-11-17, Ramana Radhakrishnan wrote:
> > >On Thu, Nov 17, 2022 at 5:30 PM Richard Sandiford via Gcc-patches
> > > wrote:
> > >>
> > >> Wilco Dijkstra  writes:
> > >> > Hi Richard,
> > >> >
> > >> >> Can you go into more detail about:
> > >> >>
> > >> >>Use :option:`-mdirect-extern-access` either in shared libraries or 
> > >> >> in
> > >> >>executables, but not in both.  Protected symbols used both in a 
> > >> >> shared
> > >> >>library and executable may cause linker errors or fail to work 
> > >> >> correctly
> > >> >>
> > >> >> If this is LLVM's default for PIC (and by assumption shared 
> > >> >> libraries),
> > >> >> is it then invalid to use -mdirect-extern-access for any PIEs that
> > >> >> are linked against those shared libraries and use protected symbols
> > >> >> from those libraries?  How would a user know that one of the shared
> > >> >> libraries they're linking against was built in this way?
> > >> >
> > >> 

Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Andrew Pinski via Gcc-patches
On Thu, Nov 17, 2022 at 1:21 PM maskray--- via Gcc-patches
 wrote:
>
> > +.. option:: -mdirect-extern-access, -mno-direct-extern-access
> > +
> > +  Use direct accesses for external data symbols.  It avoids a GOT 
> > indirection
> > +  on all external data symbols with :option:`-fpie` or :option:`-fPIE`.  
> > This is
> > +  useful for executables linked with :option:`-static` or 
> > :option:`-static-pie`.
> > +  With :option:`-fpic` or :option:`-fPIC`, it only affects accesses to 
> > protected
> > +  data symbols.  It has no effect on non-position independent code.  The 
> > default
> > +  is :option:`-mno-direct-extern-access`.
> > +
> > +  .. warning::
> > +
> > +Use :option:`-mdirect-extern-access` either in shared libraries or in
> > +executables, but not in both.  Protected symbols used both in a shared
> > +library and executable may cause linker errors or fail to work 
> > correctly.
>
> I think current GCC and Clang's behavior is:
>
> * -mdirect-extern-access is the default for -fno-pic. This is to enable 
> optimizations for -static programs but may introduce copy relocations.
> * -mno-direct-extern-access is the default for -fpie and -fpic. This uses 
> some GOT-generating relocations which can be optimized out (lld, see 
> https://maskray.me/blog/2021-08-29-all-about-global-offset-table) but the 
> instruction is nevertheless slightly longer.
>
> (-mdirect-extern-access for -fpic probably doesn't make sense.)
>
> The option I introduced to Clang is -fdirect-access-external-data
> (see 
> https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected).
> If -mdirect-extern-access gets more popular, I can add a Clang alias.
> But I am opposed to forcing a GNU property for 
> -mdirect-extern-access/-mno-direct-extern-access.
>
> FWIW I used https://gist.github.com/MaskRay/c03a90922003df666551589f1629df22 
> to test my Clang changes related to -fno-semantic-interposition
> on various visibility attributes x non-weak/weak x nopic/pie/pic x 
> dllimport/not x ...


The x86_64 discussion about this is here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 .
I think clang changing the ABI is just broken and should think twice
before we do it for GCC.

And there is a lot of visibility protected issues filed in GCC bug
databases specifically about copy relocs too.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56527
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37611
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19520
https://sourceware.org/bugzilla/show_bug.cgi?id=28875
https://sourceware.org/bugzilla/show_bug.cgi?id=28877
I also suspect clang's behavior is still broken too.

Thanks,
Andrew

>
> On 2022-11-17, Ramana Radhakrishnan wrote:
> >On Thu, Nov 17, 2022 at 5:30 PM Richard Sandiford via Gcc-patches
> > wrote:
> >>
> >> Wilco Dijkstra  writes:
> >> > Hi Richard,
> >> >
> >> >> Can you go into more detail about:
> >> >>
> >> >>Use :option:`-mdirect-extern-access` either in shared libraries or in
> >> >>executables, but not in both.  Protected symbols used both in a 
> >> >> shared
> >> >>library and executable may cause linker errors or fail to work 
> >> >> correctly
> >> >>
> >> >> If this is LLVM's default for PIC (and by assumption shared libraries),
> >> >> is it then invalid to use -mdirect-extern-access for any PIEs that
> >> >> are linked against those shared libraries and use protected symbols
> >> >> from those libraries?  How would a user know that one of the shared
> >> >> libraries they're linking against was built in this way?
> >> >
> >> > Yes, the usage model is that you'd either use it for static PIE or only 
> >> > on
> >> > data that is not shared. If you get it wrong then you'll get the copy
> >> > relocation error.
> >>
> >> Thanks.  I think I'm still missing something though.  If, for the
> >> non-executable case, people should only use the feature on data that
> >> is not shared, why do we need to relax the binds-local condition for
> >> protected symbols on -fPIC?  Oughtn't the symbol to be hidden rather
> >> than protected if the data isn't shared?
> >>
> >> I can understand the reasoning for the PIE changes but I'm still
> >> struggling with the PIC-but-not-PIE bits.
> >
> >I think I'm with Richard S on hidden vs protected on first reading. I
> >can see why this works out of the box and can even be default for
> >static-pie.
> >
> >Any reason why this is not on by default - it's early enough in the
> >stage3 cycle and we can always flip the defaults if there are more
> >problems found.
> >
> >You probably need a rebase for the documentation bits.
> >
> >regards
> >Ramana
> >
> >
> >Ramana
>
>
> +  is :option:`-mno-direct-extern-access`.


Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2022-11-17 Thread Lewis Hyatt via Gcc-patches
On Sat, Nov 05, 2022 at 12:23:28PM -0400, David Malcolm wrote:
> On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote:
> [...snip...]
> > 
> > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > index 5890c18bdc3..2935d7fb236 100644
> > --- a/gcc/c-family/c-common.cc
> > +++ b/gcc/c-family/c-common.cc
> > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const 
> > char *file, location_t loc)
> >const line_map_ordinary *ord_map
> > = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
> >  
> > +  if (ord_map->reason == LC_GEN)
> > +   continue;
> > +
> >if (const line_map_ordinary *from
> >   = linemap_included_from_linemap (line_table, ord_map))
> > /* We cannot use pointer equality, because with preprocessed
> >input all filename strings are unique.  */
> > -   if (0 == strcmp (from->to_file, file))
> > +   if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file))
> >   {
> > last_include_ord_map = from;
> > last_ord_map_after_include = NULL;
> 
> [...snip...]
> 
> I'm not a fan of having the "to_file" field change meaning based on
> whether reason is LC_GEN.
> 
> How involved would it be to split line_map_ordinary into two
> subclasses, so that we'd have this hierarchy (with indentation showing
> inheritance):
> 
> line_map
>   line_map_ordinary
> line_map_ordinary_file
> line_map_ordinary_generated
>   line_map_macro
> 
> Alternatively, how about renaming "to_file" to be "data" (or "m_data"),
> to emphasize that it might not be a filename, and that we have to check
> everywhere we access that field.
> 
> Please can all those checks for LC_GEN go into an inline function so we
> can write e.g.
>   map->generated_p ()
> or somesuch.
> 
> If I'm reading things right, patch 6 adds the sole usage of this in
> destringize_and_run.  Would we ever want to discriminate between
> different kinds of generated buffers?
> 
> [...snip...]
> 
> > @@ -796,10 +798,13 @@ diagnostic_report_current_module (diagnostic_context 
> > *context, location_t where)
> >  N_("of module"),
> >  N_("In module imported at"),   /* 6 */
> >  N_("imported at"),
> > +N_("In buffer generated from"),   /* 8 */
> > };
> 
> We use the wording "destringized" in:
> 
> so maybe this should be "In buffer destringized from" ???  (I'm not
> sure) 
> 
> [...snip...]
> 
> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index 483cb6e940d..3cf5480551d 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> 
> [..snip...]
> 
> > @@ -58,7 +64,7 @@ public:
> >~file_cache_slot ();
> 
> My initial thought reading the input.cc part of this patch was that I
> want it to be very clear when a file_cache_slot is for a real file vs
> when we're replaying generated data.  I'd hoped that this could have
> been expressed via inheritance, but we preallocate all the cache slots
> once in an array in file_cache's ctor and the slots get reused over
> time.  So instead of that, can we please have some kind of:
> 
>bool file_slot_p () const;
>bool generated_slot_p () const;
> 
> or somesuch, so that we can have clear assertions and conditionals
> about the current state of a slot (I think the discriminating condition
> is that generated_data_len > 0, right?)
> 
> If I'm reading things right, it looks like file_cache_slot::m_file_path
> does double duty after this patch, and is either a filename, or a
> pointer to the generated data.  If so, please can the patch rename it,
> and have all usage guarded appropriately.  Can it be a union? (or does
> the ctor prevent that?)
> 
> [...snip...]
>  
> > @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned 
> > *highest_use_count)
> > num_file_slots files are cached.  */
> >  
> >  file_cache_slot*
> > -file_cache::add_file (const char *file_path)
> > +file_cache::add_file (const char *file_path, unsigned int 
> > generated_data_len)
> 
> Can we split this into two functions: one for files, and one for
> generated data?  (add_file vs add_generated_data?)
> 
> >  {
> >  
> > -  FILE *fp = fopen (file_path, "r");
> > -  if (fp == NULL)
> > -return NULL;
> > +  FILE *fp;
> > +  if (generated_data_len)
> > +fp = NULL;
> > +  else
> > +{
> > +  fp = fopen (file_path, "r");
> > +  if (fp == NULL)
> > +   return NULL;
> > +}
> >  
> >unsigned highest_use_count = 0;
> >file_cache_slot *r = evicted_cache_tab_entry (_use_count);
> > -  if (!r->create (in_context, file_path, fp, highest_use_count))
> > +  if (!r->create (in_context, file_path, fp, highest_use_count,
> > + generated_data_len))
> >  return NULL;
> >return r;
> >  }
> 
> [...snip...]
> 
> > @@ -535,11 +571,12 @@ file_cache::~file_cache ()
> > it.  */
> >  
> >  file_cache_slot*
> > -file_cache::lookup_or_add_file (const char *file_path)
> > 

Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread maskray--- via Gcc-patches

+.. option:: -mdirect-extern-access, -mno-direct-extern-access
+
+  Use direct accesses for external data symbols.  It avoids a GOT indirection
+  on all external data symbols with :option:`-fpie` or :option:`-fPIE`.  This 
is
+  useful for executables linked with :option:`-static` or 
:option:`-static-pie`.
+  With :option:`-fpic` or :option:`-fPIC`, it only affects accesses to 
protected
+  data symbols.  It has no effect on non-position independent code.  The 
default
+  is :option:`-mno-direct-extern-access`.
+
+  .. warning::
+
+Use :option:`-mdirect-extern-access` either in shared libraries or in
+executables, but not in both.  Protected symbols used both in a shared
+library and executable may cause linker errors or fail to work correctly.


I think current GCC and Clang's behavior is:

* -mdirect-extern-access is the default for -fno-pic. This is to enable 
optimizations for -static programs but may introduce copy relocations.
* -mno-direct-extern-access is the default for -fpie and -fpic. This uses some 
GOT-generating relocations which can be optimized out (lld, see 
https://maskray.me/blog/2021-08-29-all-about-global-offset-table) but the 
instruction is nevertheless slightly longer.

(-mdirect-extern-access for -fpic probably doesn't make sense.)

The option I introduced to Clang is -fdirect-access-external-data
(see 
https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected).
If -mdirect-extern-access gets more popular, I can add a Clang alias.
But I am opposed to forcing a GNU property for 
-mdirect-extern-access/-mno-direct-extern-access.

FWIW I used https://gist.github.com/MaskRay/c03a90922003df666551589f1629df22 to 
test my Clang changes related to -fno-semantic-interposition
on various visibility attributes x non-weak/weak x nopic/pie/pic x 
dllimport/not x ...

On 2022-11-17, Ramana Radhakrishnan wrote:

On Thu, Nov 17, 2022 at 5:30 PM Richard Sandiford via Gcc-patches
 wrote:


Wilco Dijkstra  writes:
> Hi Richard,
>
>> Can you go into more detail about:
>>
>>Use :option:`-mdirect-extern-access` either in shared libraries or in
>>executables, but not in both.  Protected symbols used both in a shared
>>library and executable may cause linker errors or fail to work correctly
>>
>> If this is LLVM's default for PIC (and by assumption shared libraries),
>> is it then invalid to use -mdirect-extern-access for any PIEs that
>> are linked against those shared libraries and use protected symbols
>> from those libraries?  How would a user know that one of the shared
>> libraries they're linking against was built in this way?
>
> Yes, the usage model is that you'd either use it for static PIE or only on
> data that is not shared. If you get it wrong then you'll get the copy
> relocation error.

Thanks.  I think I'm still missing something though.  If, for the
non-executable case, people should only use the feature on data that
is not shared, why do we need to relax the binds-local condition for
protected symbols on -fPIC?  Oughtn't the symbol to be hidden rather
than protected if the data isn't shared?

I can understand the reasoning for the PIE changes but I'm still
struggling with the PIC-but-not-PIE bits.


I think I'm with Richard S on hidden vs protected on first reading. I
can see why this works out of the box and can even be default for
static-pie.

Any reason why this is not on by default - it's early enough in the
stage3 cycle and we can always flip the defaults if there are more
problems found.

You probably need a rebase for the documentation bits.

regards
Ramana


Ramana



+  is :option:`-mno-direct-extern-access`.


Re: [PATCH]AArch64 Fix vector re-interpretation between partial SIMD modes

2022-11-17 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> While writing a patch series I started getting incorrect codegen out from
> VEC_PERM on partial struct types.
>
> It turns out that this was happening because the TARGET_CAN_CHANGE_MODE_CLASS
> implementation has a slight bug in it.  The hook only checked for SIMD to
> Partial but never Partial to SIMD.   This resulted in incorrect subregs to be
> generated from the fallback code in VEC_PERM_EXPR expansions.
>
> I have unfortunately not been able to trigger it using a standalone testcase 
> as
> the mid-end optimizes away the permute every time I try to describe a permute
> that would result in the bug.
>
> The patch now rejects any conversion of partial SIMD struct types, unless they
> are both partial structures of the same number of registers or one is a SIMD
> type whose size is less than 8 bytes.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? And backport to GCC 12?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_can_change_mode_class): Restrict
>   conversions between partial struct types properly.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> d3c3650d7d728f56adb65154127dc7b72386c5a7..84dbe2f4ea7d03b424602ed98a34e7824217dc91
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -26471,9 +26471,10 @@ aarch64_can_change_mode_class (machine_mode from,
>bool from_pred_p = (from_flags & VEC_SVE_PRED);
>bool to_pred_p = (to_flags & VEC_SVE_PRED);
>  
> -  bool from_full_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | 
> VEC_STRUCT));
>bool to_partial_advsimd_struct_p = (to_flags == (VEC_ADVSIMD | VEC_STRUCT
>  | VEC_PARTIAL));
> +  bool from_partial_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | 
> VEC_STRUCT
> +| VEC_PARTIAL));
>  
>/* Don't allow changes between predicate modes and other modes.
>   Only predicate registers can hold predicate modes and only
> @@ -26496,9 +26497,23 @@ aarch64_can_change_mode_class (machine_mode from,
>  return false;
>  
>/* Don't allow changes between partial and full Advanced SIMD structure
> - modes.  */
> -  if (from_full_advsimd_struct_p && to_partial_advsimd_struct_p)
> -return false;
> + modes unless both are a partial struct with the same number of registers
> + or the vector bitsizes must be the same.  */
> +  if (to_partial_advsimd_struct_p ^ from_partial_advsimd_struct_p)
> +{
> +  /* If they're both partial structures, allow if they have the same 
> number
> +  or registers.  */
> +  if (to_partial_advsimd_struct_p == from_partial_advsimd_struct_p)
> + return known_eq (GET_MODE_SIZE (from), GET_MODE_SIZE (to));

It looks like the ^ makes this line unreachable.  I guess it should
be a separate top-level condition.
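
To illustrate the point: the outer `^` test is true only when the two flags differ, so an equality check nested inside it can never succeed. A minimal sketch, assuming plain bools in place of the real flag tests; the function names are hypothetical and this is not the aarch64 code itself:

```cpp
#include <cassert>

// Hypothetical stand-in for the patched control flow.  The two bools model
// to_partial_advsimd_struct_p and from_partial_advsimd_struct_p.
// Returns true if the nested "both partial" branch is reached.
bool nested_equal_branch_reached (bool to_partial, bool from_partial)
{
  if (to_partial ^ from_partial)        // true only when the flags differ
    {
      if (to_partial == from_partial)   // so this is never true here
        return true;
    }
  return false;
}

// One possible restructuring along the lines of the review comment:
// test the "both partial" case as a separate top-level condition.
bool both_partial_branch_reached (bool to_partial, bool from_partial)
{
  if (to_partial && from_partial)
    return true;
  return false;
}

// Exhaustively confirm the nested branch is dead for all four inputs.
void check_nested_branch_is_dead ()
{
  for (int to = 0; to < 2; ++to)
    for (int from = 0; from < 2; ++from)
      assert (!nested_equal_branch_reached (to != 0, from != 0));
}
```

With the check hoisted to top level, the "same number of registers" comparison from the patch would then be reachable when both modes are partial structures.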

> +  /* If one is a normal SIMD register, allow only if no larger than 
> 64-bit.  */
> +  if ((to_flags & VEC_ADVSIMD) == to_flags)
> + return known_le (GET_MODE_SIZE (to), 8);
> +  else if ((from_flags & VEC_ADVSIMD) == from_flags)
> + return known_le (GET_MODE_SIZE (from), 8);
> +
> +  return false;
> +}

I don't think we need to restrict this to SIMD modes.  A plain DI would
be OK too.  So I think it should just be:

return (known_le (GET_MODE_SIZE (to), 8)
|| known_le (GET_MODE_SIZE (from), 8));

Thanks,
Richard

>  
>if (maybe_ne (BITS_PER_SVE_VECTOR, 128u))
>  {


[PATCH] libcpp: Add missing config for --enable-valgrind-annotations [PR107691]

2022-11-17 Thread Bernhard Reutner-Fischer via Gcc-patches
---
ceb17928e5d1d5 copied (parts of) the valgrind annotation checks from gcc
to libcpp. The above copies the missing pieces to libcpp to diagnose
when libcpp is configured with --enable-valgrind-annotations but
valgrind is not installed.

Tested with --enable-valgrind-annotations without valgrind installed
where it fixes errors thrown by test(1). And once with valgrind
installed.
Ok for trunk?

libcpp/ChangeLog:

PR preprocessor/107691
* configure.ac: Add valgrind header checks.
* configure: Regenerate.
---
 libcpp/configure.ac | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/libcpp/configure.ac b/libcpp/configure.ac
index 9b6042518e5..89ac99b04bd 100644
--- a/libcpp/configure.ac
+++ b/libcpp/configure.ac
@@ -226,6 +226,40 @@ case x$enable_languages in
 esac
 AC_SUBST(CET_HOST_FLAGS)
 
+dnl # This check AC_REQUIREs various stuff, so it *must not* be inside
+dnl # an if statement.  This was the source of very frustrating bugs
+dnl # in converting to autoconf 2.5x!
+AC_CHECK_HEADER(valgrind.h, have_valgrind_h=yes, have_valgrind_h=no)
+
+# It is certainly possible that there's valgrind but no valgrind.h.
+# GCC relies on making annotations so we must have both.
+AC_MSG_CHECKING(for VALGRIND_DISCARD in <valgrind/memcheck.h>)
+AC_PREPROC_IFELSE([AC_LANG_SOURCE(
+  [[#include <valgrind/memcheck.h>
+#ifndef VALGRIND_DISCARD
+#error VALGRIND_DISCARD not defined
+#endif]])],
+  [gcc_cv_header_valgrind_memcheck_h=yes],
+  [gcc_cv_header_valgrind_memcheck_h=no])
+AC_MSG_RESULT($gcc_cv_header_valgrind_memcheck_h)
+AC_MSG_CHECKING(for VALGRIND_DISCARD in <memcheck.h>)
+AC_PREPROC_IFELSE([AC_LANG_SOURCE(
+  [[#include <memcheck.h>
+#ifndef VALGRIND_DISCARD
+#error VALGRIND_DISCARD not defined
+#endif]])],
+  [gcc_cv_header_memcheck_h=yes],
+  [gcc_cv_header_memcheck_h=no])
+AC_MSG_RESULT($gcc_cv_header_memcheck_h)
+if test $gcc_cv_header_valgrind_memcheck_h = yes; then
+  AC_DEFINE(HAVE_VALGRIND_MEMCHECK_H, 1,
+   [Define if valgrind's valgrind/memcheck.h header is installed.])
+fi
+if test $gcc_cv_header_memcheck_h = yes; then
+  AC_DEFINE(HAVE_MEMCHECK_H, 1,
+   [Define if valgrind's memcheck.h header is installed.])
+fi
+
 AC_ARG_ENABLE(valgrind-annotations,
 [AS_HELP_STRING([--enable-valgrind-annotations],
[enable valgrind runtime interaction])], [],
@@ -235,6 +269,7 @@ if test x$enable_valgrind_annotations != xno \
   if (test $have_valgrind_h = no \
   && test $gcc_cv_header_memcheck_h = no \
   && test $gcc_cv_header_valgrind_memcheck_h = no); then
+AC_MSG_ERROR([*** valgrind annotations requested, but])
 AC_MSG_ERROR([*** Can't find valgrind/memcheck.h, memcheck.h or 
valgrind.h])
   fi
   AC_DEFINE(ENABLE_VALGRIND_ANNOTATIONS, 1,
-- 
2.38.1



Re: [PATCH] ARM: Make ARMv8-M attribute cmse_nonsecure_call work in Ada

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Mon, Oct 24, 2022 at 9:55 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> unlike most other machine attributes, this one does not work in Ada because,
> while it applies to pointer-to-function types, it is explicitly marked as
> requiring declarations in the implementation.
>
> Now, in Ada, machine attributes are specified like this:
>
>   type Non_Secure is access procedure;
>   pragma Machine_Attribute (Non_Secure, "cmse_nonsecure_call");
>
> i.e. not attached to the declaration of Non_Secure (testcase attached).
>
> So the attached patch extends the support to Ada by also accepting
> pointer-to-function types in the handler.
>
> Tested on arm-eabi, OK for the mainline?
>


Ok if no regressions, perhaps the test needs to be in the ada test suite ?

regards

Ramana


>
> 2022-10-24  Eric Botcazou  
>
> * config/arm/arm.cc (arm_attribute_table) : 
> Change
> decl_required field to false.
> (arm_handle_cmse_nonsecure_call): Deal with a TYPE node.
>
>
> --
> Eric Botcazou


Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 17, 2022 at 5:30 PM Richard Sandiford via Gcc-patches
 wrote:
>
> Wilco Dijkstra  writes:
> > Hi Richard,
> >
> >> Can you go into more detail about:
> >>
> >>Use :option:`-mdirect-extern-access` either in shared libraries or in
> >>executables, but not in both.  Protected symbols used both in a shared
> >>library and executable may cause linker errors or fail to work correctly
> >>
> >> If this is LLVM's default for PIC (and by assumption shared libraries),
> >> is it then invalid to use -mdirect-extern-access for any PIEs that
> >> are linked against those shared libraries and use protected symbols
> >> from those libraries?  How would a user know that one of the shared
> >> libraries they're linking against was built in this way?
> >
> > Yes, the usage model is that you'd either use it for static PIE or only on
> > data that is not shared. If you get it wrong then you'll get the copy
> > relocation error.
>
> Thanks.  I think I'm still missing something though.  If, for the
> non-executable case, people should only use the feature on data that
> is not shared, why do we need to relax the binds-local condition for
> protected symbols on -fPIC?  Oughtn't the symbol to be hidden rather
> than protected if the data isn't shared?
>
> I can understand the reasoning for the PIE changes but I'm still
> struggling with the PIC-but-not-PIE bits.

I think I'm with Richard S on hidden vs protected on first reading. I
can see why this works out of the box and can even be default for
static-pie.

Any reason why this is not on by default - it's early enough in the
stage3 cycle and we can always flip the defaults if there are more
problems found.

You probably need a rebase for the documentation bits.

regards
Ramana


Ramana


[PATCH] Fortran: reject NULL actual argument without explicit interface [PR107576]

2022-11-17 Thread Harald Anlauf via Gcc-patches
Dear all,

one cannot pass a NULL actual argument to a procedure without an
explicit interface.  This is detected and reported by NAG and Intel.
(Cray accepts this silently, and some other brands ICE.)

The testcase by Gerhard even tricked gfortran into inconsistent
behavior which could lead to an ICE with -fallow-argument-mismatch,
or silently accepting invalid code.

The solution is to reject such code, see attached patch.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this is marked as a regression which started at v7,
OK for backports to open branches?

Thanks,
Harald

From c6b19d662f51b1e2d2691e81cfeb68ad953a4c09 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 17 Nov 2022 21:36:49 +0100
Subject: [PATCH] Fortran: reject NULL actual argument without explicit
 interface [PR107576]

gcc/fortran/ChangeLog:

	PR fortran/107576
	* interface.cc (gfc_procedure_use): Reject NULL as actual argument
	when there is no explicit procedure interface.

gcc/testsuite/ChangeLog:

	PR fortran/107576
	* gfortran.dg/null_actual_3.f90: New test.
---
 gcc/fortran/interface.cc|  8 
 gcc/testsuite/gfortran.dg/null_actual_3.f90 | 18 ++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/null_actual_3.f90

diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index 616ae2b1197..73799c175b7 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -4162,6 +4162,14 @@ gfc_procedure_use (gfc_symbol *sym, gfc_actual_arglist **ap, locus *where)
 	  return false;
 	}

+	  if (a->expr && a->expr->expr_type == EXPR_NULL)
+	{
+	  gfc_error ("Passing intrinsic NULL as actual argument at %L "
+			 "requires an explicit interface", &a->expr->where);
+	  a->expr->error = 1;
+	  return false;
+	}
+
 	  /* TS 29113, C407b.  */
 	  if (a->expr && a->expr->expr_type == EXPR_VARIABLE
 	  && symbol_rank (a->expr->symtree->n.sym) == -1)
diff --git a/gcc/testsuite/gfortran.dg/null_actual_3.f90 b/gcc/testsuite/gfortran.dg/null_actual_3.f90
new file mode 100644
index 000..ea49f9630c9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/null_actual_3.f90
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! { dg-options "-fallow-argument-mismatch -w" }
+! PR fortran/107576
+! Contributed by G.Steinmetz
+
+program p
+  implicit none
+  interface
+ subroutine r(y)
+   integer, pointer :: y(:)
+ end subroutine r
+  end interface
+  integer, pointer :: z(:) => null()
+  call r(z)
+  call s(z)
+  call r(null(z))
+  call s(null(z)) ! { dg-error "requires an explicit interface" }
+end
--
2.35.3



Re: [PATCH] [range-ops] Implement sqrt.

2022-11-17 Thread Joseph Myers
On Thu, 17 Nov 2022, Jakub Jelinek via Gcc-patches wrote:

> On Thu, Nov 17, 2022 at 06:59:45PM +0000, Joseph Myers wrote:
> > On Thu, 17 Nov 2022, Aldy Hernandez via Gcc-patches wrote:
> > 
> > > So... is the optimization wrong?  Are we not allowed to substitute
> > > that NAN if we know it's gonna happen?  Should we also allow F F F F F
> > > in the test?  Or something else?
> > 
> > This seems like the usual ambiguity about what transformations 
> > -ftrapping-math (on by default) is meant to prevent.
> > 
> > Generally it's understood to prevent transformations that add *or remove* 
> > exceptions, so folding a case that raises "invalid" to a NaN (with 
> > "invalid" no longer raised) is invalid with -ftrapping-math.  But that 
> > doesn't tend to be applied if the operation raising the exceptions has a 
> > result that is otherwise unused - in such a case the operation may still 
> > be removed completely (the exception isn't properly treated as a side 
> > effect to avoid dead code elimination; cf. Marc Glisse's -ffenv-access 
> > patches from August 2020).  And it may often also not be applied to 
> > "inexact".
> 
> The problem is that the above model I'm afraid is largely incompatible with
> the optimizations ranger provides.

That model is more an empirical description of when the nominal 
-ftrapping-math semantics tend to be respected, than a coherent design for 
any kind of API commitment to what the option does or what the default 
trapping-math rules are.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-17 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 17, 2022 at 07:42:40PM +0100, Jakub Jelinek via Gcc-patches wrote:
> I thought for older C++ this is to catch
> void
> foo ()
> {
>   constexpr int a = ({ static constexpr int b = 2; b; });
> }
> and for C++23 the only 3 spots that diagnose those.
> But perhaps for C++20 or older we can check if the var has a context
> of a constexpr function (then assume cp_finish_decl errored or pedwarned
> already) and only error or pedwarn otherwise.

So, here is an updated patch, which in constexpr.cc will accept
DECL_EXPR of decl_*constant_var_p static/thread_local non-extern vars
for C++23 or if they are not declared in constexpr/consteval function.
So, the statement expression case will remain hard error for C++ <= 20 rather 
than
pedwarn, because due to the ctx->quiet vs. !ctx->quiet case I don't see
what else we could do, either something is a constant expression, or
it is not, but whether it is or is not shouldn't depend on
-Wpedantic/-Wno-pedantic/-Werror=pedantic.
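
For readers who want to see what the feature permits, here is a minimal sketch that keys off the __cpp_constexpr value this patch bumps. The function digit_char is a made-up example, not taken from the new tests; the fallback branch assumes at least C++14:

```cpp
#include <cassert>

// Hypothetical example: return the decimal digit character for n in [0, 9].
// With P2647R1 (feature-test value 202211L) the lookup table may be a
// static constexpr local; older dialects use a non-static constexpr local.
constexpr char digit_char (int n)
{
#if defined(__cpp_constexpr) && __cpp_constexpr >= 202211L
  static constexpr char digits[] = "0123456789";
#else
  constexpr char digits[] = "0123456789";
#endif
  return digits[n];
}

// Usable in a constant expression in either configuration.
static_assert (digit_char (7) == '7', "digit_char is a constant expression");
```

The static variant avoids re-materializing the table on each run-time call, which is the usual motivation given for allowing it.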

2022-11-17  Jakub Jelinek  

gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Bump __cpp_constexpr
value from 202207L to 202211L.
gcc/cp/
* constexpr.cc (cxx_eval_constant_expression): Implement C++23
P2647R1 - Permitting static constexpr variables in constexpr functions.
Allow decl_constant_var_p static or thread_local vars for
C++23 and later or if they are declared inside of constexpr or
consteval function.
(potential_constant_expression_1): Similarly, except use
decl_maybe_constant_var_p instead of decl_constant_var_p if
processing_template_decl.
* decl.cc (diagnose_static_in_constexpr): New function.
(start_decl): Remove diagnostics of static or thread_local
vars in constexpr or consteval functions.
(cp_finish_decl): Call diagnose_static_in_constexpr.
gcc/testsuite/
* g++.dg/cpp23/constexpr-nonlit17.C: New test.
* g++.dg/cpp23/constexpr-nonlit18.C: New test.
* g++.dg/cpp23/constexpr-nonlit19.C: New test.
* g++.dg/cpp23/constexpr-nonlit20.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Adjust expected __cpp_constexpr
value.
* g++.dg/ext/stmtexpr19.C: Don't expect an error for C++20 or later. 

--- gcc/c-family/c-cppbuiltin.cc.jj 2022-11-17 09:00:42.106249011 +0100
+++ gcc/c-family/c-cppbuiltin.cc2022-11-17 09:01:49.286320527 +0100
@@ -1074,7 +1074,7 @@ c_cpp_builtins (cpp_reader *pfile)
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
- cpp_define (pfile, "__cpp_constexpr=202207L");
+ cpp_define (pfile, "__cpp_constexpr=202211L");
  cpp_define (pfile, "__cpp_multidimensional_subscript=202211L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
  cpp_define (pfile, "__cpp_static_call_operator=202207L");
--- gcc/cp/constexpr.cc.jj  2022-11-17 08:48:30.530357181 +0100
+++ gcc/cp/constexpr.cc 2022-11-17 20:53:15.432408015 +0100
@@ -7100,17 +7100,35 @@ cxx_eval_constant_expression (const cons
/* Allow __FUNCTION__ etc.  */
&& !DECL_ARTIFICIAL (r))
  {
-   if (!ctx->quiet)
+   bool ok = decl_constant_var_p (r);
+   /* Since P2647R1 control can pass through definitions of static
+  or thread_local vars usable in constant expressions.
+  In C++20 or older, if such vars are declared inside of
+  constexpr or consteval function, diagnose_static_in_constexpr
+  should have already pedwarned on those.  Otherwise they could
+  be e.g. in a statement expression, reject those before
+  C++23.  */
+   if (ok && cxx_dialect < cxx23)
  {
-   if (CP_DECL_THREAD_LOCAL_P (r))
- error_at (loc, "control passes through definition of %qD "
-"with thread storage duration", r);
-   else
- error_at (loc, "control passes through definition of %qD "
-"with static storage duration", r);
+   tree fnctx = decl_function_context (r);
+   if (fnctx == NULL_TREE
+   || !DECL_DECLARED_CONSTEXPR_P (fnctx))
+ ok = false;
+ }
+   if (!ok)
+ {
+   if (!ctx->quiet)
+ {
+   if (CP_DECL_THREAD_LOCAL_P (r))
+ error_at (loc, "control passes through definition of "
+"%qD with thread storage duration", r);
+   else
+ error_at (loc, "control passes through definition of "
+"%qD with static storage duration", r);
+ }
+   *non_constant_p = true;
+   break;
  }
-

Re: [PATCH][GCC] arm: Add support for new frame unwinding instruction "0xb5".

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Thu, Nov 10, 2022 at 10:38 AM Srinath Parvathaneni via Gcc-patches
 wrote:
>
> Hi,
>
> This patch adds support for Arm frame unwinding instruction "0xb5" [1]. When
> an exception is taken and "0xb5" instruction is encountered during runtime
> stack-unwinding, we use effective vsp as modifier in pointer authentication.
> On completion of stack unwinding if "0xb5" instruction is not encountered
> then CFA will be used as modifier in pointer authentication.
>
> [1] 
> https://github.com/ARM-software/abi-aa/releases/download/2022Q3/ehabi32.pdf
>
> Regression tested on arm-none-eabi target and found no regressions.
>
> Ok for master?
>

No, not yet.

Presumably the logic to produce 0xb5 is in the source base and this
was tested with suitable options that produce said opcode ? I see no
logic in place to produce the said opcode in the backend in a quick
read as the pacbti patches still seem to be in review?

So what was the test suite run actually testing ?

regards
Ramana


> Regards,
> Srinath.
>
> gcc/ChangeLog:
>
> 2022-11-09  Srinath Parvathaneni  
>
> * libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode opcode
> "0xb5".
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
> index 
> e48854587c667a959aa66ccc4982231f6ecc..73e4942a39b34a83c2da85def6b13e82ec501552
>  100644
> --- a/libgcc/config/arm/pr-support.c
> +++ b/libgcc/config/arm/pr-support.c
> @@ -107,7 +107,9 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>_uw op;
>int set_pc;
>int set_pac = 0;
> +  int set_pac_sp = 0;
>_uw reg;
> +  _uw sp;
>
>set_pc = 0;
>for (;;)
> @@ -124,10 +126,11 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>  #if defined(TARGET_HAVE_PACBTI)
>   if (set_pac)
> {
> - _uw sp;
>   _uw lr;
>   _uw pac;
> - _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, 
> &sp);
> + if (!set_pac_sp)
> +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32,
> +&sp);
>   _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32, 
> &lr);
>   _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
>_UVRSD_UINT32, &pac);
> @@ -259,7 +262,19 @@ __gnu_unwind_execute (_Unwind_Context * context, 
> __gnu_unwind_state * uws)
>   continue;
> }
>
> - if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
> + /* Use current VSP as modifier in PAC validation.  */
> + if (op == 0xb5)
> +   {
> + if (set_pac)
> +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32,
> +&sp);
> + else
> +   return _URC_FAILURE;
> + set_pac_sp = 1;
> + continue;
> +   }
> +
> + if ((op & 0xfd) == 0xb6)  /* Obsolete FPA.  */
> return _URC_FAILURE;
>
>   /* op & 0xf8 == 0xb8.  */
>
>
>


Re: [Patch Arm] Fix PR 92999

2022-11-17 Thread Ramana Radhakrishnan via Gcc-patches
On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan
 wrote:
>
> On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
>  wrote:
> >
> > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> >  wrote:
> > >
> > >
> > >
> > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > > >
> > > >
> > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > > >> PR92999 is a case where the VFP calling convention does not allocate
> > > >> enough FP registers for a homogenous aggregate containing FP16 values.
> > > >> I believe this is the complete fix but would appreciate another set of
> > > >> eyes on this.
> > > >>
> > > >> Could I get a hand with a regression test run on an armhf environment
> > > >> while I fix my environment ?
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >> PR target/92999
> > > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
> > > >> aggregates with elements smaller than SFmode.
> > > >>
> > > >> gcc/testsuite/ChangeLog:
> > > >>
> > > >> * gcc.target/arm/pr92999.c: New test.
> > > >>
> > > >>
> > > >> Thanks,
> > > >> Ramana
> > > >>
> > > >> Signed-off-by: Ramana Radhakrishnan 
> > > >
> > > > I'm not sure about this.  The AAPCS does not mention a base type of a
> > > > half-precision FP type as an appropriate homogeneous aggregate for using
> > > > VFP registers for either calling or returning.
> >
> > Ooh interesting, thanks for taking a look and poking at the AAPCS and
> > that's a good catch. BF16 should also have the same behaviour as FP16
> > , I suspect ?
>
> I suspect I got caught out by the definition of the Homogenous
> aggregate from Section 5.3.5
> (https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates)
> which simply suggests it's an aggregate of fundamental types which
> lists half precision floating point .
>
> FTR, ideally I should have read 7.1.2.1
> (https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
> :)
>
>
>
> >
> > > >
> > > > So perhaps the bug is that we try to treat this as a homogeneous
> > > > aggregate at all.
> >
> > Yep I agree - I'll take a look again tomorrow and see if I can get a fix.
> >
> > (And thanks Alex for the test run, I might trouble you again while I
> > still (slowly) get some of my boards back up)
>
>
> and as promised take 2. I'd really prefer another review on this one
> to see if I've not missed anything in the cases below.

Ping  ?

Ramana

>
> regards
> Ramana
>
>
> >
> > regards,
> > Ramana
> >
> >
> > >
> > > R.


Re: [PATCH v2] tree-object-size: Support strndup and strdup

2022-11-17 Thread Siddhesh Poyarekar

Ping!

On 2022-11-04 08:48, Siddhesh Poyarekar wrote:

Use string length of input to strdup to determine the usable size of the
resulting object.  Avoid doing the same for strndup since there's a
chance that the input may be too large, resulting in an unnecessary
overhead or worse, the input may not be NULL terminated, resulting in a
crash where there would otherwise have been none.
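
As a rough illustration of the sizes the new tests expect, here is a small model. It is only a sketch with hypothetical helper names, not the tree-object-size.cc implementation: strdup is sized from the input length, while strndup is bounded from its size argument alone, without reading the input:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model for strdup (in): the copy holds the whole string
// plus its terminating NUL.
std::size_t strdup_size_estimate (const char *in)
{
  assert (in != nullptr);
  return std::strlen (in) + 1;
}

// Hypothetical model for strndup (in, bound), computed without looking at
// the input: at most BOUND characters plus the NUL are copied.
std::size_t strndup_max_estimate (std::size_t bound)
{
  return bound + 1;
}

// The smallest possible result is an empty string, that is, just the NUL.
std::size_t strndup_min_estimate ()
{
  return 1;
}
```

These values line up with the checks added to builtin-dynamic-object-size-0.c: strlen (str) + 1 for strdup, and 5 (maximum) or 1 (minimum) for strndup with a bound of 4.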

gcc/ChangeLog:

* tree-object-size.cc (todo): New variable.
(object_sizes_execute): Use it.
(strdup_object_size): New function.
(call_object_size): Use it.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): New tests.
(main): Call them.
* gcc.dg/builtin-dynamic-object-size-1.c: Silence overread
warnings.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
---
Tested:

- x86_64 bootstrap and testsuite run
- i686 build and testsuite run
- ubsan bootstrap

  .../gcc.dg/builtin-dynamic-object-size-0.c| 43 +
  .../gcc.dg/builtin-dynamic-object-size-1.c|  2 +-
  .../gcc.dg/builtin-dynamic-object-size-2.c|  2 +-
  .../gcc.dg/builtin-dynamic-object-size-3.c|  2 +-
  .../gcc.dg/builtin-dynamic-object-size-4.c|  2 +-
  gcc/testsuite/gcc.dg/builtin-object-size-1.c  | 94 +-
  gcc/testsuite/gcc.dg/builtin-object-size-2.c  | 94 +-
  gcc/testsuite/gcc.dg/builtin-object-size-3.c  | 95 ++-
  gcc/testsuite/gcc.dg/builtin-object-size-4.c  | 94 +-
  gcc/tree-object-size.cc   | 84 +++-
  10 files changed, 502 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 01a280b2d7b..4f1606a486b 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -479,6 +479,40 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, 
int incr)
return __builtin_dynamic_object_size (ptr, 0);
  }
  
+/* strdup/strndup.  */

+
+size_t
+__attribute__ ((noinline))
+test_strdup (const char *in)
+{
+  char *res = __builtin_strdup (in);
+  return __builtin_dynamic_object_size (res, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strndup (const char *in, size_t bound)
+{
+  char *res = __builtin_strndup (in, bound);
+  return __builtin_dynamic_object_size (res, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strdup_min (const char *in)
+{
+  char *res = __builtin_strdup (in);
+  return __builtin_dynamic_object_size (res, 2);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strndup_min (const char *in, size_t bound)
+{
+  char *res = __builtin_strndup (in, bound);
+  return __builtin_dynamic_object_size (res, 2);
+}
+
  /* Other tests.  */
  
  struct TV4

@@ -651,6 +685,15 @@ main (int argc, char **argv)
int *t = test_pr105736 ();
if (__builtin_dynamic_object_size (t, 0) != -1)
  FAIL ();
+  const char *str = "hello world";
+  if (test_strdup (str) != __builtin_strlen (str) + 1)
+FAIL ();
+  if (test_strndup (str, 4) != 5)
+FAIL ();
+  if (test_strdup_min (str) != __builtin_strlen (str) + 1)
+FAIL ();
+  if (test_strndup_min (str, 4) != 1)
+FAIL ();
  
if (nfails > 0)

  __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
index 7cc8b1c9488..8f17c8edcaf 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
@@ -1,5 +1,5 @@
  /* { dg-do run } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -Wno-stringop-overread" } */
  /* { dg-require-effective-target alloca } */
  
  #define __builtin_object_size __builtin_dynamic_object_size

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
index 267dbf48ca7..3677782ff1c 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
@@ -1,5 +1,5 @@
  /* { dg-do run } */
-/* { dg-options 

Re: [PATCH] [range-ops] Implement sqrt.

2022-11-17 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 17, 2022 at 06:59:45PM +0000, Joseph Myers wrote:
> On Thu, 17 Nov 2022, Aldy Hernandez via Gcc-patches wrote:
> 
> > So... is the optimization wrong?  Are we not allowed to substitute
> > that NAN if we know it's gonna happen?  Should we also allow F F F F F
> > in the test?  Or something else?
> 
> This seems like the usual ambiguity about what transformations 
> -ftrapping-math (on by default) is meant to prevent.
> 
> Generally it's understood to prevent transformations that add *or remove* 
> exceptions, so folding a case that raises "invalid" to a NaN (with 
> "invalid" no longer raised) is invalid with -ftrapping-math.  But that 
> doesn't tend to be applied if the operation raising the exceptions has a 
> result that is otherwise unused - in such a case the operation may still 
> be removed completely (the exception isn't properly treated as a side 
> effect to avoid dead code elimination; cf. Marc Glisse's -ffenv-access 
> patches from August 2020).  And it may often also not be applied to 
> "inexact".

The problem is that the above model I'm afraid is largely incompatible with
the optimizations ranger provides.
A strict model where no operations that could raise exceptions are discarded
is easy, we let frange optimize as much as it wants and just tell DCE not to
eliminate operations that can raise exceptions.
But in the model where some exceptions can be discarded if results are unused
but not others where they are used, there is no way to distinguish between
the result of the operation really isn't needed and ranger figured out a
result (or usable range of something) and therefore the result of the
operation isn't needed.
Making frange more limited with -ftrapping-math, making it punt for
operations that could raise an exception would be quite drastic
pessimization.  Perhaps for -ftrapping-math we could say no frange value is
singleton and so at least for most of operations we actually wouldn't
optimize out the whole computation when we know the result?  Still, we could
also just have
r = long_computation (x, y, z);
if (r > 42.0)
and if frange figures out that r must be [256.0, 1024.0] and never NAN, we'd
still happily optimize away the comparison.
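
To make that last case concrete, here is a much simplified sketch of how a known range decides a comparison. The simple_frange type and fold_greater_than are hypothetical stand-ins, not the real frange API:

```cpp
#include <cassert>

// Hypothetical, much simplified stand-in for a floating-point range:
// every non-NaN value lies in [lo, hi]; maybe_nan says whether NaN is
// also possible.
struct simple_frange
{
  double lo;
  double hi;
  bool maybe_nan;
};

enum fold_result { FOLD_FALSE, FOLD_TRUE, FOLD_UNKNOWN };

// Decide whether "r > c" has the same result for every value in the range.
fold_result fold_greater_than (const simple_frange &r, double c)
{
  assert (r.lo <= r.hi);
  // Ordered comparisons with NaN are false, so a possible NaN is still
  // compatible with folding to false...
  if (r.hi <= c)
    return FOLD_FALSE;
  // ... but it rules out folding to true.
  if (r.lo > c && !r.maybe_nan)
    return FOLD_TRUE;
  return FOLD_UNKNOWN;
}
```

With r in [256.0, 1024.0] and no NaN, fold_greater_than reports FOLD_TRUE for c = 42.0, so the comparison goes away even if long_computation itself is kept.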

Jakub



Re: [PATCH 2/5] c++: Set the locus of the function result decl

2022-11-17 Thread Bernhard Reutner-Fischer via Gcc-patches
On Thu, 17 Nov 2022 09:53:32 -0500
Jason Merrill  wrote:

> On 11/17/22 03:56, Bernhard Reutner-Fischer wrote:
> > On Tue, 15 Nov 2022 18:52:41 -0500
> > Jason Merrill  wrote:
> >   
> >> On 11/12/22 13:45, Bernhard Reutner-Fischer wrote:  
> >>> gcc/cp/ChangeLog:
> >>>
> >>>   * decl.cc (start_function): Set the result decl source location to
> >>>   the location of the typespec.
> >>>
> >>> ---
> >>> Bootstrapped and regtested on x86_64-unknown-linux with no regressions.
> >>> Ok for trunk?
> >>>
> >>> Cc: Nathan Sidwell 
> >>> Cc: Jason Merrill 
> >>> ---
> >>>gcc/cp/decl.cc | 15 ++-
> >>>1 file changed, 14 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> >>> index 6e98ea35a39..ed40815e645 100644
> >>> --- a/gcc/cp/decl.cc
> >>> +++ b/gcc/cp/decl.cc
> >>> @@ -17449,6 +17449,8 @@ start_function (cp_decl_specifier_seq *declspecs,
> >>>   tree attrs)
> >>>{
> >>>  tree decl1;
> >>> +  tree result;
> >>> +  bool ret;  
> >>
> >> We now prefer to declare new variables as late as possible, usually when
> >> they are initialized.  
> > 
> > Moved. Ok like attached? Bootstrapped and regtested fine.
> >   
> >>>  decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, &attrs);
> >>>  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
> >>> @@ -17461,7 +17463,18 @@ start_function (cp_decl_specifier_seq *declspecs,
> >>>gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)),
> >>>integer_type_node));
> >>>
> >>> -  return start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
> >>> +  ret = start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT);
> >>> +
> >>> +  /* decl1 might be ggc_freed here.  */
> >>> +  decl1 = current_function_decl;
> >>> +
> >>> +  /* Set the result decl source location to the location of the 
> >>> typespec.  */
> >>> +  if (TREE_CODE (decl1) == FUNCTION_DECL
> >>> +  && declspecs->locations[ds_type_spec] != UNKNOWN_LOCATION
> >>> +  && (result = DECL_RESULT (decl1)) != NULL_TREE
> >>> +  && DECL_SOURCE_LOCATION (result) == input_location)
> >>> +DECL_SOURCE_LOCATION (result) = declspecs->locations[ds_type_spec];  
> >>
> >> One way to handle the template case would be for the code in
> >> start_preparsed_function that sets DECL_RESULT to check whether decl1 is
> >> a template instantiation, and in that case copy the location from the
> >> template's DECL_RESULT, i.e.
> >>
> >> DECL_RESULT (DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1)))  
> > 
> > Well, that would probably work if something would set the location of
> > that template result decl properly, which nothing does out of the box.  
> 
> Hmm, it should get set by your patch, since templates go through 
> start_function like normal functions.
> 
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index ed7226b82f0..65d78c82a2d 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -17230,6 +17231,17 @@ start_preparsed_function (tree decl1, tree attrs, 
> > int flags)
> > cp_apply_type_quals_to_decl (cp_type_quals (restype), resdecl);
> >   }
> >   
> > +  /* Set the result decl source location to the location of the typespec.  
> > */
> > +  if (DECL_RESULT (decl1)
> > +  && !DECL_USE_TEMPLATE (decl1)
> > +  && DECL_TEMPLATE_INFO (decl1)
> > +  && DECL_TI_TEMPLATE (decl1)
> > +  && DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1))
> > +  && DECL_RESULT (DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1  
> 
> This condition is true only for the template definition, for which you 
> haven't gotten to your start_function change yet.
> 
> Instead, you want to copy the location for instantiations, i.e. check 
> DECL_TEMPLATE_INSTANTIATION instead of !DECL_USE_TEMPLATE.

No, that makes no difference.
But really I'm not interested in the template case; I only mentioned
it because it doesn't work, in case somebody wants correct locations
there.
I just remember frustration when I looked at those a year ago.

Is the hunk for normal functions OK for trunk?

thanks,

> 
> > +  DECL_SOURCE_LOCATION (DECL_RESULT (decl1))
> > +   = DECL_SOURCE_LOCATION (  
> 
> Open paren goes on the new line.
> 
> > +   DECL_RESULT (DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1; >   
> >   /* Record the decl so that the function name is defined.
> >If we already have a decl for this name, and it is a FUNCTION_DECL,
> >use the old decl.  */
> > 
> > (gdb) call inform(DECL_SOURCE_LOCATION (DECL_RESULT (decl1)), "decl1 result 
> > locus before")
> > ../tmp4/return-narrow-2.cc:7:3: note: decl1 result locus before
> >  7 |   { return _M_finish != 0; }
> >|   ^
> > (gdb) n
> > (gdb) call inform(DECL_SOURCE_LOCATION (DECL_RESULT (decl1)), "decl1 result 
> > locus from TI")
> > ../tmp4/return-narrow-2.cc:7:3: note: decl1 result locus from TI
> > (gdb) p DECL_SOURCE_LOCATION (DECL_RESULT (decl1))
> > $1 = 

Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Philipp Tomsich
On Thu, 17 Nov 2022 at 19:56, Andrew Waterman  wrote:
>
> On Thu, Nov 17, 2022 at 10:52 AM Philipp Tomsich
>  wrote:
> >
> > On Thu, 17 Nov 2022 at 19:33, Andrew Waterman  wrote:
> > >
> > > Am I wrong to worry that this will increase dynamic instruction count
> > > when used in a loop?  The obvious code is more efficient when the
> > > constant loads can be hoisted out of a loop.  Or does the cost model
> > > account for this somehow?
> >
> > With this change merged, GCC still hoists the constants out of the
> > loop (just checked with a quick test case).
> > So the cost model seems correct (whether intentionally or accidentally).
>
> Cool, thanks for checking.

We have an updated cost-model for IF_THEN_ELSE brewing, but it didn't
make the cut (and will need some more adjustments and a lot more
testing).
It seems to make a difference on some SPEC workloads.  I don't have a
timeline on finalizing that cost-model improvement yet.

>
> >
> > Thanks,
> > Philipp.
> >
> > >
> > >
> > > On Sun, Nov 13, 2022 at 12:50 PM Philipp Tomsich
> > >  wrote:
> > > >
> > > > Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) 
> > > > ..."
> > > > that can be expressed as bexti + bexti + andn.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/riscv/bitmanip.md 
> > > > (*branch_mask_twobits_equals_singlebit):
> > > > Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and 
> > > > C has one
> > > > > of these two bits set.
> > > > * config/riscv/predicates.md (const_twobits_operand): New 
> > > > predicate.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/riscv/zbs-if_then_else-01.c: New test.
> > > >
> > > > Signed-off-by: Philipp Tomsich 
> > > > ---
> > > >
> > > >  gcc/config/riscv/bitmanip.md  | 42 +++
> > > >  gcc/config/riscv/predicates.md|  5 +++
> > > >  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
> > > >  3 files changed, 67 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> > > >
> > > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > > > index 7a8f4e35880..2cea394671f 100644
> > > > --- a/gcc/config/riscv/bitmanip.md
> > > > +++ b/gcc/config/riscv/bitmanip.md
> > > > @@ -690,3 +690,45 @@
> > > >"TARGET_ZBS"
> > > >[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) 
> > > > (match_dup 2)))
> > > > (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> > > > +
> > > > +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> > > > +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> > > > +  [(set (pc)
> > > > +   (if_then_else (match_operator 1 "equality_operator"
> > > > +  [(and:X (match_operand:X 2 "register_operand" 
> > > > "r")
> > > > +  (match_operand:X 3 
> > > > "const_twobits_operand" "i"))
> > > > +   (match_operand:X 4 "single_bit_mask_operand" 
> > > > "i")])
> > > > +(label_ref (match_operand 0 "" ""))
> > > > +(pc)))
> > > > +   (clobber (match_scratch:X 5 "="))
> > > > +   (clobber (match_scratch:X 6 "="))]
> > > > +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
> > > > +  "#"
> > > > +  "&& reload_completed"
> > > > +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> > > > + (const_int 1)
> > > > + (match_dup 8)))
> > > > +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> > > > + (const_int 1)
> > > > + (match_dup 9)))
> > > > +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> > > > +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 
> > > > 0)])
> > > > +  (label_ref (match_dup 0))
> > > > +  (pc)))]
> > > > +{
> > > > +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> > > > +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> > > > +
> > > > +   /* Make sure that the reference value has one of the bits of the 
> > > > mask set */
> > > > +   if ((twobits_mask & singlebit_mask) == 0)
> > > > +  FAIL;
> > > > +
> > > > +   int setbit = ctz_hwi (singlebit_mask);
> > > > +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> > > > +
> > > > +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : 
> > > > NE,
> > > > +mode, operands[6], GEN_INT(0));
> > > > +
> > > > +   operands[8] = GEN_INT (setbit);
> > > > +   operands[9] = GEN_INT (clearbit);
> > > > +})
> > > > diff --git a/gcc/config/riscv/predicates.md 
> > > > b/gcc/config/riscv/predicates.md
> > > > index 490bff688a7..6e34829a59b 100644
> > > > --- a/gcc/config/riscv/predicates.md
> > > > +++ b/gcc/config/riscv/predicates.md
> > > > @@ 

Re: [PATCH] [range-ops] Implement sqrt.

2022-11-17 Thread Joseph Myers
On Thu, 17 Nov 2022, Aldy Hernandez via Gcc-patches wrote:

> So... is the optimization wrong?  Are we not allowed to substitute
> that NAN if we know it's gonna happen?  Should we also allow F F F F F
> in the test?  Or something else?

This seems like the usual ambiguity about what transformations 
-ftrapping-math (on by default) is meant to prevent.

Generally it's understood to prevent transformations that add *or remove* 
exceptions, so folding a case that raises "invalid" to a NaN (with 
"invalid" no longer raised) is invalid with -ftrapping-math.  But that 
doesn't tend to be applied if the operation raising the exceptions has a 
result that is otherwise unused - in such a case the operation may still 
be removed completely (the exception isn't properly treated as a side 
effect to avoid dead code elimination; cf. Marc Glisse's -ffenv-access 
patches from August 2020).  And it may often also not be applied to 
"inexact".
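
As an illustration of the "invalid" case, here is a small C++ sketch using
<cfenv>.  The function name is hypothetical, and volatile is used only to keep
the compiler from evaluating the division at compile time; this is a sketch
of the observable behaviour, not of anything GCC does internally.

```cpp
#include <cfenv>

// Returns true if num / den, evaluated at run time, raises FE_INVALID.
bool division_raises_invalid(double num, double den)
{
    volatile double n = num, d = den;   // force run-time operands
    std::feclearexcept(FE_ALL_EXCEPT);  // start from a clean flag state
    volatile double q = n / d;          // volatile store keeps the division
    (void) q;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

Folding 0.0 / 0.0 to a NaN constant would keep the value but lose the flag
that fetestexcept observes here, which is the kind of transformation
-ftrapping-math is generally understood to forbid.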

There have been various past discussions of possible ways to split up the 
different effects of options such as -ftrapping-math into finer-grained 
options allowing more control of what transformations are permitted - see 
e.g. 
 
and bug 54192.  There is also the question in that context of which 
sub-options should be enabled by default at all.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Philipp Tomsich
On Thu, 17 Nov 2022 at 19:28, Andrew Pinski  wrote:
>
> On Thu, Nov 17, 2022 at 10:25 AM Andrew Pinski  wrote:
> >
> > On Sun, Nov 13, 2022 at 12:51 PM Philipp Tomsich
> >  wrote:
> > >
> > > Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) 
> > > ..."
> > > that can be expressed as bexti + bexti + andn.
> >
> > > Can't you also handle the if ((a & twobits) == 0) case by doing a similar thing?
> > That is:
> > two bexti + and and then compare against zero which is exactly the
> > same # of instructions as the above case.

We can form any 2-bit constant with BSETI + BSETI (no OR required).
So no explicit support for that case is required (an AND + BEQ
will be formed anyway).
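
For reference, the two sequences discussed here can be modelled in plain C++.
This is only a sketch: bexti, bseti and andn below are software models of the
instructions, lowest_set_bit stands in for ctz_hwi, and the function names are
made up.  It assumes the mask has exactly two bits set and the reference
value is one of them, as the pattern's predicates and FAIL check require.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Software models of the instructions named in the thread.
constexpr u64 bexti(u64 rs1, unsigned pos) { return (rs1 >> pos) & 1u; }
constexpr u64 bseti(u64 rs1, unsigned pos) { return rs1 | (u64{1} << pos); }
constexpr u64 andn(u64 rs1, u64 rs2) { return rs1 & ~rs2; }

// Index of the lowest set bit (stand-in for ctz_hwi); v must be nonzero.
unsigned lowest_set_bit(u64 v)
{
    assert(v != 0);
    unsigned n = 0;
    while ((v & 1u) == 0) { v >>= 1; ++n; }
    return n;
}

// The source-level condition.
bool plain_test(u64 a, u64 twobits, u64 singlebit)
{
    return (a & twobits) == singlebit;
}

// bexti + bexti + andn, then a compare against zero.
bool zbs_test(u64 a, u64 twobits, u64 singlebit)
{
    assert((twobits & singlebit) != 0);   // mirrors the FAIL check
    unsigned setbit = lowest_set_bit(singlebit);
    unsigned clearbit = lowest_set_bit(twobits & ~singlebit);
    u64 t5 = bexti(a, setbit);            // bit that must be 1
    u64 t6 = bexti(a, clearbit);          // bit that must be 0
    t6 = andn(t5, t6);                    // t5 & ~t6
    return t6 != 0;
}

// The "== 0" case: mask built with bseti + bseti, then AND and compare.
bool both_clear(u64 a, unsigned i, unsigned j)
{
    u64 mask = bseti(bseti(0, i), j);
    return (a & mask) == 0;
}
```

The model ignores the !SMALL_OPERAND condition, which only decides when the
pattern is worth using, not what it computes.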

> >
> >
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/bitmanip.md 
> > > (*branch_mask_twobits_equals_singlebit):
> > > Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C 
> > > has one
> > > > of these two bits set.
> > > * config/riscv/predicates.md (const_twobits_operand): New 
> > > predicate.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/riscv/zbs-if_then_else-01.c: New test.
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > ---
> > >
> > >  gcc/config/riscv/bitmanip.md  | 42 +++
> > >  gcc/config/riscv/predicates.md|  5 +++
> > >  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
> > >  3 files changed, 67 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> > >
> > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > > index 7a8f4e35880..2cea394671f 100644
> > > --- a/gcc/config/riscv/bitmanip.md
> > > +++ b/gcc/config/riscv/bitmanip.md
> > > @@ -690,3 +690,45 @@
> > >"TARGET_ZBS"
> > >[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) 
> > > (match_dup 2)))
> > > (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> > > +
> > > +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> > > +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> > > +  [(set (pc)
> > > +   (if_then_else (match_operator 1 "equality_operator"
> > > +  [(and:X (match_operand:X 2 "register_operand" "r")
> > > +  (match_operand:X 3 "const_twobits_operand" 
> > > "i"))
> > > +   (match_operand:X 4 "single_bit_mask_operand" 
> > > "i")])
> > > +(label_ref (match_operand 0 "" ""))
> > > +(pc)))
> > > +   (clobber (match_scratch:X 5 "="))
> > > +   (clobber (match_scratch:X 6 "="))]
> > > +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
>
> Is there a reason why you can't do this at expand time? I think there
> are recent patches floating around which are supposed to help with that
> case and the RISCV backend just needs to plug into that infrastructure
> too.

I may have missed the specific patches you refer to (pointer to the
relevant series appreciated).

However, if we move this to expand-time, then ifcvt.cc will run after
(and may form this case once our support for polarity-reversed bit
tests is merged).
So there is good reason to have this pattern.

> Thanks,
> Andrew Pinski
>
> > > +  "#"
> > > +  "&& reload_completed"
> > > +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> > > + (const_int 1)
> > > + (match_dup 8)))
> > > +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> > > + (const_int 1)
> > > + (match_dup 9)))
> > > +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> > > +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
> > > +  (label_ref (match_dup 0))
> > > +  (pc)))]
> > > +{
> > > +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> > > +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> > > +
> > > +   /* Make sure that the reference value has one of the bits of the mask 
> > > set */
> > > +   if ((twobits_mask & singlebit_mask) == 0)
> > > +  FAIL;
> > > +
> > > +   int setbit = ctz_hwi (singlebit_mask);
> > > +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> > > +
> > > +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
> > > +mode, operands[6], GEN_INT(0));
> > > +
> > > +   operands[8] = GEN_INT (setbit);
> > > +   operands[9] = GEN_INT (clearbit);
> > > +})
> > > diff --git a/gcc/config/riscv/predicates.md 
> > > b/gcc/config/riscv/predicates.md
> > > index 490bff688a7..6e34829a59b 100644
> > > --- a/gcc/config/riscv/predicates.md
> > > +++ b/gcc/config/riscv/predicates.md
> > > @@ -321,6 +321,11 @@
> > >(and (match_code "const_int")
> > > (match_test "popcount_hwi (~UINTVAL (op)) == 2")))
> > >
> > > +;; A 

Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Andrew Waterman
On Thu, Nov 17, 2022 at 10:52 AM Philipp Tomsich
 wrote:
>
> On Thu, 17 Nov 2022 at 19:33, Andrew Waterman  wrote:
> >
> > Am I wrong to worry that this will increase dynamic instruction count
> > when used in a loop?  The obvious code is more efficient when the
> > constant loads can be hoisted out of a loop.  Or does the cost model
> > account for this somehow?
>
> With this change merged, GCC still hoists the constants out of the
> loop (just checked with a quick test case).
> So the cost model seems correct (whether intentionally or accidentally).

Cool, thanks for checking.

>
> Thanks,
> Philipp.
>
> >
> >
> > On Sun, Nov 13, 2022 at 12:50 PM Philipp Tomsich
> >  wrote:
> > >
> > > Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) 
> > > ..."
> > > that can be expressed as bexti + bexti + andn.
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/bitmanip.md 
> > > (*branch_mask_twobits_equals_singlebit):
> > > Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C 
> > > has one
> > > > of these two bits set.
> > > * config/riscv/predicates.md (const_twobits_operand): New 
> > > predicate.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/riscv/zbs-if_then_else-01.c: New test.
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > ---
> > >
> > >  gcc/config/riscv/bitmanip.md  | 42 +++
> > >  gcc/config/riscv/predicates.md|  5 +++
> > >  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
> > >  3 files changed, 67 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> > >
> > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > > index 7a8f4e35880..2cea394671f 100644
> > > --- a/gcc/config/riscv/bitmanip.md
> > > +++ b/gcc/config/riscv/bitmanip.md
> > > @@ -690,3 +690,45 @@
> > >"TARGET_ZBS"
> > >[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) 
> > > (match_dup 2)))
> > > (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> > > +
> > > +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> > > +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> > > +  [(set (pc)
> > > +   (if_then_else (match_operator 1 "equality_operator"
> > > +  [(and:X (match_operand:X 2 "register_operand" "r")
> > > +  (match_operand:X 3 "const_twobits_operand" 
> > > "i"))
> > > +   (match_operand:X 4 "single_bit_mask_operand" 
> > > "i")])
> > > +(label_ref (match_operand 0 "" ""))
> > > +(pc)))
> > > +   (clobber (match_scratch:X 5 "="))
> > > +   (clobber (match_scratch:X 6 "="))]
> > > +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
> > > +  "#"
> > > +  "&& reload_completed"
> > > +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> > > + (const_int 1)
> > > + (match_dup 8)))
> > > +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> > > + (const_int 1)
> > > + (match_dup 9)))
> > > +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> > > +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
> > > +  (label_ref (match_dup 0))
> > > +  (pc)))]
> > > +{
> > > +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> > > +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> > > +
> > > +   /* Make sure that the reference value has one of the bits of the mask 
> > > set */
> > > +   if ((twobits_mask & singlebit_mask) == 0)
> > > +  FAIL;
> > > +
> > > +   int setbit = ctz_hwi (singlebit_mask);
> > > +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> > > +
> > > +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
> > > +mode, operands[6], GEN_INT(0));
> > > +
> > > +   operands[8] = GEN_INT (setbit);
> > > +   operands[9] = GEN_INT (clearbit);
> > > +})
> > > diff --git a/gcc/config/riscv/predicates.md 
> > > b/gcc/config/riscv/predicates.md
> > > index 490bff688a7..6e34829a59b 100644
> > > --- a/gcc/config/riscv/predicates.md
> > > +++ b/gcc/config/riscv/predicates.md
> > > @@ -321,6 +321,11 @@
> > >(and (match_code "const_int")
> > > (match_test "popcount_hwi (~UINTVAL (op)) == 2")))
> > >
> > > +;; A CONST_INT operand that has exactly two bits set.
> > > +(define_predicate "const_twobits_operand"
> > > +  (and (match_code "const_int")
> > > +   (match_test "popcount_hwi (UINTVAL (op)) == 2")))
> > > +
> > >  ;; A CONST_INT operand that fits into the unsigned half of a
> > >  ;; signed-immediate after the top bit has been cleared.
> > >  (define_predicate "uimm_extra_bit_operand"
> > > diff --git 

Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Philipp Tomsich
On Thu, 17 Nov 2022 at 19:33, Andrew Waterman  wrote:
>
> Am I wrong to worry that this will increase dynamic instruction count
> when used in a loop?  The obvious code is more efficient when the
> constant loads can be hoisted out of a loop.  Or does the cost model
> account for this somehow?

With this change merged, GCC still hoists the constants out of the
loop (just checked with a quick test case).
So the cost model seems correct (whether intentionally or accidentally).

Thanks,
Philipp.

>
>
> On Sun, Nov 13, 2022 at 12:50 PM Philipp Tomsich
>  wrote:
> >
> > Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) ..."
> > that can be expressed as bexti + bexti + andn.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/bitmanip.md 
> > (*branch_mask_twobits_equals_singlebit):
> > Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C 
> > has one
> > > of these two bits set.
> > * config/riscv/predicates.md (const_twobits_operand): New predicate.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/zbs-if_then_else-01.c: New test.
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >  gcc/config/riscv/bitmanip.md  | 42 +++
> >  gcc/config/riscv/predicates.md|  5 +++
> >  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
> >  3 files changed, 67 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> >
> > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > index 7a8f4e35880..2cea394671f 100644
> > --- a/gcc/config/riscv/bitmanip.md
> > +++ b/gcc/config/riscv/bitmanip.md
> > @@ -690,3 +690,45 @@
> >"TARGET_ZBS"
> >[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) 
> > (match_dup 2)))
> > (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> > +
> > +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> > +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> > +  [(set (pc)
> > +   (if_then_else (match_operator 1 "equality_operator"
> > +  [(and:X (match_operand:X 2 "register_operand" "r")
> > +  (match_operand:X 3 "const_twobits_operand" 
> > "i"))
> > +   (match_operand:X 4 "single_bit_mask_operand" "i")])
> > +(label_ref (match_operand 0 "" ""))
> > +(pc)))
> > +   (clobber (match_scratch:X 5 "="))
> > +   (clobber (match_scratch:X 6 "="))]
> > +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
> > +  "#"
> > +  "&& reload_completed"
> > +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> > + (const_int 1)
> > + (match_dup 8)))
> > +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> > + (const_int 1)
> > + (match_dup 9)))
> > +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> > +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
> > +  (label_ref (match_dup 0))
> > +  (pc)))]
> > +{
> > +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> > +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> > +
> > +   /* Make sure that the reference value has one of the bits of the mask 
> > set */
> > +   if ((twobits_mask & singlebit_mask) == 0)
> > +  FAIL;
> > +
> > +   int setbit = ctz_hwi (singlebit_mask);
> > +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> > +
> > +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
> > +mode, operands[6], GEN_INT(0));
> > +
> > +   operands[8] = GEN_INT (setbit);
> > +   operands[9] = GEN_INT (clearbit);
> > +})
> > diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> > index 490bff688a7..6e34829a59b 100644
> > --- a/gcc/config/riscv/predicates.md
> > +++ b/gcc/config/riscv/predicates.md
> > @@ -321,6 +321,11 @@
> >(and (match_code "const_int")
> > (match_test "popcount_hwi (~UINTVAL (op)) == 2")))
> >
> > +;; A CONST_INT operand that has exactly two bits set.
> > +(define_predicate "const_twobits_operand"
> > +  (and (match_code "const_int")
> > +   (match_test "popcount_hwi (UINTVAL (op)) == 2")))
> > +
> >  ;; A CONST_INT operand that fits into the unsigned half of a
> >  ;; signed-immediate after the top bit has been cleared.
> >  (define_predicate "uimm_extra_bit_operand"
> > diff --git a/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c 
> > b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> > new file mode 100644
> > index 000..d249a841ff9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc_zbb_zbs -mabi=lp64" } */
> > 

Re: [PATCH] c++, v3: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-17 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 17, 2022 at 09:42:18AM -0500, Jason Merrill wrote:
> > --- gcc/cp/constexpr.cc.jj  2022-11-17 08:48:30.530357181 +0100
> > +++ gcc/cp/constexpr.cc 2022-11-17 09:56:50.479522863 +0100
> > @@ -7098,7 +7098,8 @@ cxx_eval_constant_expression (const cons
> > && (TREE_STATIC (r)
> > || (CP_DECL_THREAD_LOCAL_P (r) && !DECL_REALLY_EXTERN (r)))
> > /* Allow __FUNCTION__ etc.  */
> > -   && !DECL_ARTIFICIAL (r))
> > +   && !DECL_ARTIFICIAL (r)
> > +   && (cxx_dialect < cxx20 || !decl_constant_var_p (r)))
> 
> I don't think we need to check cxx_dialect here since
> diagnose_static_in_constexpr will have already complained.

I thought for older C++ this is to catch
void
foo ()
{
  constexpr int a = ({ static constexpr int b = 2; b; });
}
and for C++23 these are the only 3 spots that diagnose those.
But perhaps for C++20 or older we can check if the var has a context
of a constexpr function (then assume cp_finish_decl errored or pedwarned
already) and only error or pedwarn otherwise.

> 
> >   {
> > if (!ctx->quiet)
> >   {
> > @@ -9588,7 +9589,12 @@ potential_constant_expression_1 (tree t,
> > tmp = DECL_EXPR_DECL (t);
> > if (VAR_P (tmp) && !DECL_ARTIFICIAL (tmp))
> > {
> > - if (CP_DECL_THREAD_LOCAL_P (tmp) && !DECL_REALLY_EXTERN (tmp))
> > + if (CP_DECL_THREAD_LOCAL_P (tmp)
> > + && !DECL_REALLY_EXTERN (tmp)
> > + && (cxx_dialect < cxx20
> > + || (processing_template_decl
> > + ? !decl_maybe_constant_var_p (tmp)
> > + : !decl_constant_var_p (tmp
> 
> Or here.
> 
> > {
> >   if (flags & tf_error)
> > constexpr_error (DECL_SOURCE_LOCATION (tmp), fundef_p,
> > @@ -9596,7 +9602,11 @@ potential_constant_expression_1 (tree t,
> >  "% context", tmp);
> >   return false;
> > }
> > - else if (TREE_STATIC (tmp))
> > + else if (TREE_STATIC (tmp)
> > +  && (cxx_dialect < cxx20
> > +  || (processing_template_decl
> > +  ? !decl_maybe_constant_var_p (tmp)
> > +  : !decl_constant_var_p (tmp
> > {
> >   if (flags & tf_error)
> > constexpr_error (DECL_SOURCE_LOCATION (tmp), fundef_p,

And these too.

> > +static void
> > +diagnose_static_in_constexpr (tree decl)
> > +{
> > +  if (current_function_decl && VAR_P (decl)
> > +  && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
> > +  && cxx_dialect < cxx23
> > +  && (cxx_dialect < cxx20
> > + || (processing_template_decl
> > + ? !decl_maybe_constant_var_p (decl)
> > + : !decl_constant_var_p (decl
> 
> For (maybe) constant variables let's make this error a pedwarn in C++20 and
> below.

Ok.
> 
> > +{
> > +  bool ok = false;
> > +  if (CP_DECL_THREAD_LOCAL_P (decl) && !DECL_REALLY_EXTERN (decl))
> > +   error_at (DECL_SOURCE_LOCATION (decl),
> > + "%qD defined % in %qs function only "
> > + "available with %<-std=c++2b%> or %<-std=gnu++2b%>", decl,
> > + DECL_IMMEDIATE_FUNCTION_P (current_function_decl)
> > + ? "consteval" : "constexpr");
> > +  else if (TREE_STATIC (decl))
> > +   error_at (DECL_SOURCE_LOCATION (decl),
> > + "%qD defined % in %qs function only available "
> > + "with %<-std=c++2b%> or %<-std=gnu++2b%>", decl,
> > + DECL_IMMEDIATE_FUNCTION_P (current_function_decl)
> > + ? "consteval" : "constexpr");
> > +  else
> > +   ok = true;
> > +  if (!ok)
> > +   cp_function_chain->invalid_constexpr = true;
> > +}
> > +}
> > +
> >   /* Process a DECLARATOR for a function-scope or namespace-scope
> >  variable or function declaration.
> >  (Function definitions go through start_function; class member
> > @@ -5860,28 +5895,8 @@ start_decl (const cp_declarator *declara
> > DECL_THIS_STATIC (decl) = 1;
> >   }
> > -  if (current_function_decl && VAR_P (decl)
> > -  && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
> > -  && cxx_dialect < cxx23)
> > -{
> > -  bool ok = false;
> > -  if (CP_DECL_THREAD_LOCAL_P (decl) && !DECL_REALLY_EXTERN (decl))
> > -   error_at (DECL_SOURCE_LOCATION (decl),
> > - "%qD defined % in %qs function only "
> > - "available with %<-std=c++2b%> or %<-std=gnu++2b%>", decl,
> > - DECL_IMMEDIATE_FUNCTION_P (current_function_decl)
> > - ? "consteval" : "constexpr");
> > -  else if (TREE_STATIC (decl))
> > -   error_at (DECL_SOURCE_LOCATION (decl),
> > - "%qD defined % in %qs function only available "
> > - "with %<-std=c++2b%> or %<-std=gnu++2b%>", decl,
> > - DECL_IMMEDIATE_FUNCTION_P (current_function_decl)
> > - ? "consteval" : "constexpr");
> > -  else
> > -   ok = true;
> > -  if (!ok)
> > -   

[PATCH] c++: constinit on pointer to function [PR104066]

2022-11-17 Thread Marek Polacek via Gcc-patches
[dcl.constinit]: "The constinit specifier shall be applied only to
a declaration of a variable with static or thread storage duration."

Thus, this ought to be OK:

  constinit void (*p)() = nullptr;

but the error message I introduced when implementing constinit was
not looking at funcdecl_p, so the code above was rejected.

Fixed thus.  I'm checking constinit_p first because I think that's
far more likely to be false than funcdecl_p.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
I think I'd like to backport this all the way back to 10.

PR c++/104066

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Check funcdecl_p before complaining
about constinit.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constinit18.C: New test.
---
 gcc/cp/decl.cc   |  2 +-
 gcc/testsuite/g++.dg/cpp2a/constinit18.C | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constinit18.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index d28889ed865..9a7b1a6c381 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -13071,7 +13071,7 @@ grokdeclarator (const cp_declarator *declarator,
  "an array", name);
return error_mark_node;
  }
-   if (constinit_p)
+   if (constinit_p && funcdecl_p)
  {
error_at (declspecs->locations[ds_constinit],
  "%<constinit%> on function return type is not "
diff --git a/gcc/testsuite/g++.dg/cpp2a/constinit18.C b/gcc/testsuite/g++.dg/cpp2a/constinit18.C
new file mode 100644
index 000..51b4f0273be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constinit18.C
@@ -0,0 +1,12 @@
+// PR c++/104066
+// { dg-do compile { target c++20 } }
+
+constinit void (*p)() = nullptr;
+constinit void (*pp)() = nullptr;
+void fn();
+constinit void ()() = fn;
+
+extern constinit long (* const syscall_reexported) (long, ...);
+
+constinit void bad (); // { dg-error ".constinit. on function return type is not allowed" }
+constinit void bad () { } // { dg-error ".constinit. on function return type is not allowed" }

base-commit: ee892832ea19b21a3420ef042e582204fac852a2
-- 
2.38.1



Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Andrew Waterman
Am I wrong to worry that this will increase dynamic instruction count
when used in a loop?  The obvious code is more efficient when the
constant loads can be hoisted out of a loop.  Or does the cost model
account for this somehow?


On Sun, Nov 13, 2022 at 12:50 PM Philipp Tomsich
 wrote:
>
> Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) ..."
> that can be expressed as bexti + bexti + andn.
>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md 
> (*branch_mask_twobits_equals_singlebit):
> Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C has 
> one
> of these two bits set.
> * config/riscv/predicates.md (const_twobits_operand): New predicate.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zbs-if_then_else-01.c: New test.
>
> Signed-off-by: Philipp Tomsich 
> ---
>
>  gcc/config/riscv/bitmanip.md  | 42 +++
>  gcc/config/riscv/predicates.md|  5 +++
>  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
>  3 files changed, 67 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index 7a8f4e35880..2cea394671f 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -690,3 +690,45 @@
>"TARGET_ZBS"
>[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
> 2)))
> (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> +
> +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> +  [(set (pc)
> +   (if_then_else (match_operator 1 "equality_operator"
> +  [(and:X (match_operand:X 2 "register_operand" "r")
> +  (match_operand:X 3 "const_twobits_operand" 
> "i"))
> +   (match_operand:X 4 "single_bit_mask_operand" "i")])
> +(label_ref (match_operand 0 "" ""))
> +(pc)))
> +   (clobber (match_scratch:X 5 "="))
> +   (clobber (match_scratch:X 6 "="))]
> +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
> +  "#"
> +  "&& reload_completed"
> +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> + (const_int 1)
> + (match_dup 8)))
> +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> + (const_int 1)
> + (match_dup 9)))
> +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
> +  (label_ref (match_dup 0))
> +  (pc)))]
> +{
> +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> +
> +   /* Make sure that the reference value has one of the bits of the mask set 
> */
> +   if ((twobits_mask & singlebit_mask) == 0)
> +  FAIL;
> +
> +   int setbit = ctz_hwi (singlebit_mask);
> +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> +
> +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
> +mode, operands[6], GEN_INT(0));
> +
> +   operands[8] = GEN_INT (setbit);
> +   operands[9] = GEN_INT (clearbit);
> +})
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 490bff688a7..6e34829a59b 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -321,6 +321,11 @@
>(and (match_code "const_int")
> (match_test "popcount_hwi (~UINTVAL (op)) == 2")))
>
> +;; A CONST_INT operand that has exactly two bits set.
> +(define_predicate "const_twobits_operand"
> +  (and (match_code "const_int")
> +   (match_test "popcount_hwi (UINTVAL (op)) == 2")))
> +
>  ;; A CONST_INT operand that fits into the unsigned half of a
>  ;; signed-immediate after the top bit has been cleared.
>  (define_predicate "uimm_extra_bit_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c 
> b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> new file mode 100644
> index 000..d249a841ff9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb_zbs -mabi=lp64" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
> +
> +void g();
> +
> +void f1 (long a)
> +{
> +  if ((a & ((1ul << 33) | (1 << 4))) == (1ul << 33))
> +g();
> +}
> +
> +void f2 (long a)
> +{
> +  if ((a & 0x12) == 0x10)
> +g();
> +}
> +
> +/* { dg-final { scan-assembler-times "bexti\t" 2 } } */
> +/* { dg-final { scan-assembler-times "andn\t" 1 } } */
> --
> 2.34.1
>
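
For readers following the thread, the equivalence the patch relies on can be sketched in plain C. This is an illustrative model only, not code from the patch: bext() stands in for the Zbs bexti instruction, and all function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a single-bit extract: returns bit `pos` of `x` as 0 or 1.  */
uint64_t bext(uint64_t x, unsigned pos)
{
  assert(pos < 64);
  return (x >> pos) & 1u;
}

/* The "obvious" form: mask, then compare against the single-bit constant.  */
int test_plain(uint64_t a, uint64_t twobits, uint64_t singlebit)
{
  return (a & twobits) == singlebit;
}

/* The split form: two bit extracts, an and-not, then a test against zero.
   setbit is the bit that must be 1, clearbit the bit that must be 0.  */
int test_split(uint64_t a, unsigned setbit, unsigned clearbit)
{
  uint64_t t5 = bext(a, setbit);
  uint64_t t6 = bext(a, clearbit);
  return (t5 & ~t6) != 0;
}

/* Returns 1 if both forms agree for all four combinations of the two bits.
   Every other bit of `a` is set to 1 to show that it does not matter.  */
int forms_agree(unsigned setbit, unsigned clearbit)
{
  uint64_t twobits = (UINT64_C(1) << setbit) | (UINT64_C(1) << clearbit);
  uint64_t singlebit = UINT64_C(1) << setbit;

  for (unsigned s = 0; s < 2; s++)
    for (unsigned c = 0; c < 2; c++)
      {
        uint64_t a = ~twobits
                     | ((uint64_t) s << setbit)
                     | ((uint64_t) c << clearbit);
        if (test_plain(a, twobits, singlebit)
            != test_split(a, setbit, clearbit))
          return 0;
      }
  return 1;
}
```

With setbit = 33 and clearbit = 4 this corresponds to f1 in the new test, and with setbit = 4 and clearbit = 1 to f2.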


Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Andrew Pinski via Gcc-patches
On Thu, Nov 17, 2022 at 10:25 AM Andrew Pinski  wrote:
>
> On Sun, Nov 13, 2022 at 12:51 PM Philipp Tomsich
>  wrote:
> >
> > Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) ..."
> > that can be expressed as bexti + bexti + andn.
>
> > Can't you also handle the if ((a & twobits) == 0) case by doing a similar thing?
> > That is:
> > two bexti + and, then compare against zero, which is exactly the
> > same number of instructions as the above case.
>
>
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/bitmanip.md 
> > (*branch_mask_twobits_equals_singlebit):
> > Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C 
> > has one
> > of these two bits set.
> > * config/riscv/predicates.md (const_twobits_operand): New predicate.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/zbs-if_then_else-01.c: New test.
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >  gcc/config/riscv/bitmanip.md  | 42 +++
> >  gcc/config/riscv/predicates.md|  5 +++
> >  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
> >  3 files changed, 67 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> >
> > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > index 7a8f4e35880..2cea394671f 100644
> > --- a/gcc/config/riscv/bitmanip.md
> > +++ b/gcc/config/riscv/bitmanip.md
> > @@ -690,3 +690,45 @@
> >"TARGET_ZBS"
> >[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) 
> > (match_dup 2)))
> > (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> > +
> > +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> > +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> > +  [(set (pc)
> > +   (if_then_else (match_operator 1 "equality_operator"
> > +  [(and:X (match_operand:X 2 "register_operand" "r")
> > +  (match_operand:X 3 "const_twobits_operand" 
> > "i"))
> > +   (match_operand:X 4 "single_bit_mask_operand" "i")])
> > +(label_ref (match_operand 0 "" ""))
> > +(pc)))
> > +   (clobber (match_scratch:X 5 "="))
> > +   (clobber (match_scratch:X 6 "="))]
> > +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"

Is there a reason why you can't do this at expand time? I think there
are recent patches floating around which are supposed to help with that
case, and the RISC-V backend just needs to plug into that infrastructure
too.

Thanks,
Andrew Pinski

> > +  "#"
> > +  "&& reload_completed"
> > +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> > + (const_int 1)
> > + (match_dup 8)))
> > +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> > + (const_int 1)
> > + (match_dup 9)))
> > +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> > +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
> > +  (label_ref (match_dup 0))
> > +  (pc)))]
> > +{
> > +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> > +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> > +
> > +   /* Make sure that the reference value has one of the bits of the mask 
> > set */
> > +   if ((twobits_mask & singlebit_mask) == 0)
> > +  FAIL;
> > +
> > +   int setbit = ctz_hwi (singlebit_mask);
> > +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> > +
> > +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
> > +mode, operands[6], GEN_INT(0));
> > +
> > +   operands[8] = GEN_INT (setbit);
> > +   operands[9] = GEN_INT (clearbit);
> > +})
> > diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> > index 490bff688a7..6e34829a59b 100644
> > --- a/gcc/config/riscv/predicates.md
> > +++ b/gcc/config/riscv/predicates.md
> > @@ -321,6 +321,11 @@
> >(and (match_code "const_int")
> > (match_test "popcount_hwi (~UINTVAL (op)) == 2")))
> >
> > +;; A CONST_INT operand that has exactly two bits set.
> > +(define_predicate "const_twobits_operand"
> > +  (and (match_code "const_int")
> > +   (match_test "popcount_hwi (UINTVAL (op)) == 2")))
> > +
> >  ;; A CONST_INT operand that fits into the unsigned half of a
> >  ;; signed-immediate after the top bit has been cleared.
> >  (define_predicate "uimm_extra_bit_operand"
> > diff --git a/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c 
> > b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> > new file mode 100644
> > index 000..d249a841ff9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc_zbb_zbs -mabi=lp64" } */
> > 

Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Andrew Pinski via Gcc-patches
On Sun, Nov 13, 2022 at 12:51 PM Philipp Tomsich
 wrote:
>
> Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) ..."
> that can be expressed as bexti + bexti + andn.

Can't you also handle the if ((a & twobits) == 0) case by doing a similar thing?
That is:
two bexti + and, then compare against zero, which is exactly the
same number of instructions as the above case.


>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md 
> (*branch_mask_twobits_equals_singlebit):
> Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C has 
> one
> of these two bits set.
> * config/riscv/predicates.md (const_twobits_operand): New predicate.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zbs-if_then_else-01.c: New test.
>
> Signed-off-by: Philipp Tomsich 
> ---
>
>  gcc/config/riscv/bitmanip.md  | 42 +++
>  gcc/config/riscv/predicates.md|  5 +++
>  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
>  3 files changed, 67 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index 7a8f4e35880..2cea394671f 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -690,3 +690,45 @@
>"TARGET_ZBS"
>[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
> 2)))
> (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> +
> +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> +  [(set (pc)
> +   (if_then_else (match_operator 1 "equality_operator"
> +  [(and:X (match_operand:X 2 "register_operand" "r")
> +  (match_operand:X 3 "const_twobits_operand" 
> "i"))
> +   (match_operand:X 4 "single_bit_mask_operand" "i")])
> +(label_ref (match_operand 0 "" ""))
> +(pc)))
> +   (clobber (match_scratch:X 5 "="))
> +   (clobber (match_scratch:X 6 "="))]
> +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
> +  "#"
> +  "&& reload_completed"
> +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> + (const_int 1)
> + (match_dup 8)))
> +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> + (const_int 1)
> + (match_dup 9)))
> +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
> +  (label_ref (match_dup 0))
> +  (pc)))]
> +{
> +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> +
> +   /* Make sure that the reference value has one of the bits of the mask set 
> */
> +   if ((twobits_mask & singlebit_mask) == 0)
> +  FAIL;
> +
> +   int setbit = ctz_hwi (singlebit_mask);
> +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> +
> +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
> +mode, operands[6], GEN_INT(0));
> +
> +   operands[8] = GEN_INT (setbit);
> +   operands[9] = GEN_INT (clearbit);
> +})
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 490bff688a7..6e34829a59b 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -321,6 +321,11 @@
>(and (match_code "const_int")
> (match_test "popcount_hwi (~UINTVAL (op)) == 2")))
>
> +;; A CONST_INT operand that has exactly two bits set.
> +(define_predicate "const_twobits_operand"
> +  (and (match_code "const_int")
> +   (match_test "popcount_hwi (UINTVAL (op)) == 2")))
> +
>  ;; A CONST_INT operand that fits into the unsigned half of a
>  ;; signed-immediate after the top bit has been cleared.
>  (define_predicate "uimm_extra_bit_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c 
> b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> new file mode 100644
> index 000..d249a841ff9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb_zbs -mabi=lp64" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */

It would be useful to add a rv32 testcase too.

Thanks,
Andrew Pinski

> +
> +void g();
> +
> +void f1 (long a)
> +{
> +  if ((a & ((1ul << 33) | (1 << 4))) == (1ul << 33))
> +g();
> +}
> +
> +void f2 (long a)
> +{
> +  if ((a & 0x12) == 0x10)
> +g();
> +}
> +
> +/* { dg-final { scan-assembler-times "bexti\t" 2 } } */
> +/* { dg-final { scan-assembler-times "andn\t" 1 } } */
> --
> 2.34.1
>


[COMMITTED] Fix PR 107734: valgrind errors with sbitmap in match.pd

2022-11-17 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

sbitmap is a simple bitmap and the memory allocated is not cleared
on creation; you have to clear it or set it to all ones before using
it.  This is unlike bitmap which is a sparse bitmap and the entries are
cleared as created.
The code added in r13-4044-gdc95e1e9702f2f missed that.
This patch fixes that mistake.

Committed as obvious after a bootstrap and test on x86_64-linux-gnu.

gcc/ChangeLog:

PR middle-end/107734
* match.pd (perm + vector op pattern): Clear the sbitmap before
use.
---
 gcc/match.pd | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 5aba1653b80..a4d1386fd9f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8288,6 +8288,8 @@ and,
   if (sel.encoding ().encoded_full_vector_p ())
 {
   auto_sbitmap seen (nelts);
+  bitmap_clear (seen);
+
   unsigned HOST_WIDE_INT count = 0, i;
 
   for (i = 0; i < nelts; i++)
-- 
2.17.1
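
A minimal sketch of the pitfall described above, using a toy bitmap rather than GCC's sbitmap (all toy_* names and count_distinct are hypothetical). The storage comes from malloc and is therefore indeterminate until cleared, so the clear call has to come before the first test of a bit.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Toy fixed-size bitmap; a stand-in for illustration only.  */
struct toy_bitmap {
  size_t nbits;
  unsigned char *bytes;
};

/* Number of bytes needed for nbits bits (at least one).  */
size_t toy_nbytes(size_t nbits)
{
  size_t n = (nbits + CHAR_BIT - 1) / CHAR_BIT;
  return n ? n : 1;
}

/* Allocate the storage WITHOUT initializing it, as malloc does.  */
struct toy_bitmap toy_alloc(size_t nbits)
{
  struct toy_bitmap b = { nbits, malloc(toy_nbytes(nbits)) };
  assert(b.bytes != NULL);
  return b;
}

void toy_clear(struct toy_bitmap *b)
{
  memset(b->bytes, 0, toy_nbytes(b->nbits));
}

void toy_set(struct toy_bitmap *b, size_t i)
{
  assert(i < b->nbits);
  b->bytes[i / CHAR_BIT] |= (unsigned char) (1u << (i % CHAR_BIT));
}

int toy_test(const struct toy_bitmap *b, size_t i)
{
  assert(i < b->nbits);
  return (b->bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Count the distinct indices in idx[0..n-1], each below nelts.  */
size_t count_distinct(const unsigned *idx, size_t n, size_t nelts)
{
  struct toy_bitmap seen = toy_alloc(nelts);
  size_t count = 0;

  /* Without this call, toy_test below would read indeterminate bytes.  */
  toy_clear(&seen);

  for (size_t i = 0; i < n; i++)
    if (!toy_test(&seen, idx[i]))
      {
        toy_set(&seen, idx[i]);
        count++;
      }

  free(seen.bytes);
  return count;
}

/* Fixed sample: the indices 0, 2 and 5 appear, some of them twice.  */
size_t demo_distinct(void)
{
  const unsigned idx[] = { 0, 2, 2, 5, 0 };
  return count_distinct(idx, sizeof idx / sizeof idx[0], 8);
}
```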



[COMMITTED] [PR tree-optimization/107732] [range-ops] Handle attempt to abs() negatives.

2022-11-17 Thread Aldy Hernandez via Gcc-patches
The threader is creating a scenario where we are trying to solve:

[NEGATIVES] = abs(x)

While solving this we have an intermediate value of UNDEFINED because
we have no positive numbers.  But then we try to union the negative
pair to the final result by querying the bounds.  Since neither
UNDEFINED nor NAN have bounds, they need to be specially handled.

PR tree-optimization/107732

gcc/ChangeLog:

* range-op-float.cc (foperator_abs::op1_range): Early exit when
result is undefined.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr107732.c: New test.
---
 gcc/range-op-float.cc|  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr107732.c | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107732.c

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index adb0cbaa6d5..ee88511eba0 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -1407,7 +1407,7 @@ foperator_abs::op1_range (frange &r, tree type,
   neg_nan.set_nan (type, true);
   r.union_ (neg_nan);
 }
-  if (r.known_isnan ())
+  if (r.known_isnan () || r.undefined_p ())
 return true;
   // Then add the negative of each pair:
   // ABS(op1) = [5,20] would yield op1 => [-20,-5][5,20].
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107732.c b/gcc/testsuite/gcc.dg/tree-ssa/pr107732.c
new file mode 100644
index 000..b216f38db0e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107732.c
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-options "-O2" }
+
+double sqrt(double);
+double a, b, c;
+void d() {
+  for (;;) {
+c = __builtin_fabs(a);
+sqrt(c);
+if (a)
+  a = b;
+  }
+}
-- 
2.38.1
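
The early exit can be illustrated with a toy interval model. This is a sketch under simplifying assumptions, not the frange code: there are no NANs here, toy_range and its helpers are invented names, and the empty set is represented by an explicit flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy closed interval.  When `undefined` is true the set is empty and
   lo/hi carry no meaning, so they must not be read.  */
struct toy_range {
  bool undefined;
  double lo, hi;
};

/* Solution of lhs = abs(x): a non-negative piece and its mirror image.
   npieces is 0 when there is no solution, otherwise 2.  */
struct toy_solution {
  int npieces;
  struct toy_range pos, neg;
};

struct toy_range toy_make(double lo, double hi)
{
  assert(lo <= hi);
  struct toy_range r = { false, lo, hi };
  return r;
}

/* Intersect with [0, +inf); the result may be empty.  */
struct toy_range toy_intersect_nonneg(struct toy_range r)
{
  if (r.undefined || r.hi < 0.0)
    {
      r.undefined = true;
      return r;
    }
  if (r.lo < 0.0)
    r.lo = 0.0;
  return r;
}

struct toy_solution toy_abs_op1_range(struct toy_range lhs)
{
  struct toy_range empty = { true, 0.0, 0.0 };
  struct toy_solution s = { 0, empty, empty };

  struct toy_range pos = toy_intersect_nonneg(lhs);

  /* Early exit: an empty intermediate result has no bounds to negate.  */
  if (pos.undefined)
    return s;

  /* Then add the negative of the pair: [5,20] also yields [-20,-5].  */
  s.pos = pos;
  s.neg = toy_make(-pos.hi, -pos.lo);
  s.npieces = 2;
  return s;
}
```

An all-negative left-hand side such as [-20, -5] takes the early exit and yields no pieces, while [5, 20] yields [5, 20] and [-20, -5].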



Re: [PATCH] maintainer-scripts/gcc_release: compress xz in parallel

2022-11-17 Thread Sam James via Gcc-patches


> On 8 Nov 2022, at 07:14, Sam James  wrote:
> 
> 1. This should speed up decompression for folks, as parallel xz
>   creates a different archive which can be decompressed in parallel.
> 
>   Note that this different method is enabled by default in a new
>   xz release coming shortly anyway (>= 5.3.3_alpha1).
> 
>   I build GCC regularly from the weekly snapshots
>   and so the decompression time adds up.
> 
> 2. It should speed up compression on the webserver a bit.
> 
>   Note that -T0 won't be the default in the new xz release,
>   only the parallel compression mode (which enables parallel
>   decompression).
> 
>   -T0 detects the number of cores available.
> 
>   So, if a different number of threads is preferred, it's fine
>   to set e.g. -T2, etc.
> 
> Signed-off-by: Sam James 
> ---
> maintainer-scripts/gcc_release | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/maintainer-scripts/gcc_release b/maintainer-scripts/gcc_release
> index 2456908d716..962b8efe99a 100755
> --- a/maintainer-scripts/gcc_release
> +++ b/maintainer-scripts/gcc_release
> @@ -609,7 +609,7 @@ FILE_LIST=""
> # Programs we use.
> 
> BZIP2="${BZIP2:-bzip2}"
> -XZ="${XZ:-xz --best}"
> +XZ="${XZ:-xz -T0 --best}"
> CVS="${CVS:-cvs -f -Q -z9}"
> DIFF="${DIFF:-diff -Nrcpad}"
> ENV="${ENV:-env}"
> --
> 2.38.1
> 

ping




Re: [PATCH] [range-ops] Implement sqrt.

2022-11-17 Thread Aldy Hernandez via Gcc-patches
This may be DCE.

DOM uses ranger through simplify_using_ranges::fold_cond() to fold the
following conditional to false, because we know x_185 is a NAN:

 x_185 = __builtin_sqrtf (-1.0e+0);
if (x_185 ord x_185)

I believe we can do that, because there are no user observable
effects.  But DCE removes the sqrt which could trap:

Eliminating unnecessary statements:
Deleting : x_185 = __builtin_sqrtf (-1.0e+0);

Is DCE allowed to remove that sqrtf call?

Thanks.
Aldy

On Thu, Nov 17, 2022 at 5:48 PM Aldy Hernandez  wrote:
>
>
>
> On 11/17/22 17:40, Aldy Hernandez wrote:
> > To go along with whatever magic we're gonna tack along to the
> > range-ops sqrt implementation, here is another revision addressing the
> > VARYING issue you pointed out.
> >
> > A few things...
> >
> > Instead of going through trees, I decided to call do_mpfr_arg1
> > directly.  Let's not go the wide int <-> tree rat hole in this one.
> >
> > The function do_mpfr_arg1 bails on +INF, so I had to handle it manually.
> >
> > There's a regression in gfortran.dg/ieee/ieee_6.f90, which I'm not
> > sure how to handle.  We are failing because we are calculating
> > sqrt(-1) and expecting certain IEEE flags set.  These flags aren't
> > set, presumably because we folded sqrt(-1) into a NAN directly:
> >
> >  // All negatives.
> >  if (real_compare (LT_EXPR, _ub, ))
> >{
> >  real_nan (, "", 0, TYPE_MODE (type));
> >  ub = lb;
> >  maybe_nan = true;
> >  return;
> >}
>
> FWIW, we could return [-0.0, +INF] +-NAN which would keep us from
> eliding the sqrt, but it'd be a pity to keep the sqrt unless it's
> mandated by some IEEE canon.
>
> Aldy
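
As a small illustration of why the conditional folds to false: "x ord x" is false exactly when x is a NaN. The sketch below assumes an IEEE 754 target and uses the NAN macro as a stand-in for the result of sqrtf (-1.0); the function names are invented, and the sketch does not model the trap or exception-flag question raised above.

```c
#include <assert.h>
#include <math.h>

/* C99 spelling of "x ord x": true unless x is a NaN.  */
int is_ordered_with_itself(double x)
{
  return !isunordered(x, x);
}

/* Stand-in for the value that sqrt of a negative number produces on an
   IEEE 754 target.  This is an assumption of the sketch, not a call
   into libm.  */
double nan_stand_in(void)
{
  return NAN;
}

/* Once x is known to be a NaN, the comparison can only be false.  */
void check_fold_to_false(void)
{
  double x = nan_stand_in();
  assert(!is_ordered_with_itself(x));
}
```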



Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Hi Richard,
>
>> Can you go into more detail about:
>>
>>Use :option:`-mdirect-extern-access` either in shared libraries or in
>>executables, but not in both.  Protected symbols used both in a shared
>>library and executable may cause linker errors or fail to work correctly
>>
>> If this is LLVM's default for PIC (and by assumption shared libraries),
>> is it then invalid to use -mdirect-extern-access for any PIEs that
>> are linked against those shared libraries and use protected symbols
>> from those libraries?  How would a user know that one of the shared
>> libraries they're linking against was built in this way?
>
> Yes, the usage model is that you'd either use it for static PIE or only on
> data that is not shared. If you get it wrong then you'll get the copy
> relocation error.

Thanks.  I think I'm still missing something though.  If, for the
non-executable case, people should only use the feature on data that
is not shared, why do we need to relax the binds-local condition for
protected symbols on -fPIC?  Oughtn't the symbol to be hidden rather
than protected if the data isn't shared?

I can understand the reasoning for the PIE changes but I'm still
struggling with the PIC-but-not-PIE bits.

> In the future we need to decide what the ABI is and
> ensure GCC and LLVM are compatible. An import feature to mark symbols
> that may be overridden by a shared library would be useful too.
>
>> It looks like the main difference between this implementation and
>> the x86 one is that x86 allows direct accesses to common symbols.
>> What's the reason for not doing that for AArch64?  Does it not work,
>> is it a false optimisation (i.e. pessimisation), or did it not seem
>> important now that -fno-common is the default?
>
> I don't see any difference in the way common symbols are accessed on x86,
> so it's not clear which cases common_local_p param actually affects (eg. with
> -fPIC there is always a GOT indirection for common symbols).

Hmm, OK.  Could it be for one of the other languages?  But yeah,
if we don't have a testcase for it, I agree it's better to leave
things as they are.

Thanks,
Richard


Re: [PATCH] [range-ops] Implement sqrt.

2022-11-17 Thread Aldy Hernandez via Gcc-patches




On 11/17/22 17:40, Aldy Hernandez wrote:

> To go along with whatever magic we're gonna tack along to the
> range-ops sqrt implementation, here is another revision addressing the
> VARYING issue you pointed out.
>
> A few things...
>
> Instead of going through trees, I decided to call do_mpfr_arg1
> directly.  Let's not go the wide int <-> tree rat hole in this one.
>
> The function do_mpfr_arg1 bails on +INF, so I had to handle it manually.
>
> There's a regression in gfortran.dg/ieee/ieee_6.f90, which I'm not
> sure how to handle.  We are failing because we are calculating
> sqrt(-1) and expecting certain IEEE flags set.  These flags aren't
> set, presumably because we folded sqrt(-1) into a NAN directly:
>
>  // All negatives.
>  if (real_compare (LT_EXPR, _ub, ))
>    {
>  real_nan (, "", 0, TYPE_MODE (type));
>  ub = lb;
>  maybe_nan = true;
>  return;
>    }


FWIW, we could return [-0.0, +INF] +-NAN which would keep us from 
eliding the sqrt, but it'd be a pity to keep the sqrt unless it's 
mandated by some IEEE canon.


Aldy



PING^5 [PATCH] testsuite: Verify that module-mapper is available

2022-11-17 Thread Torbjorn SVENSSON via Gcc-patches

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604895.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-11-02 19:13, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602844.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-10-25 16:24, Torbjorn SVENSSON via Gcc-patches wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603544.html

Kind regards,
Torbjörn

On 2022-10-14 09:42, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602843.html

Kind regards,
Torbjörn

On 2022-10-05 11:17, Torbjorn SVENSSON wrote:

Hi,

Ping, 
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602111.html


Kind regards,
Torbjörn

On 2022-09-23 14:03, Torbjörn SVENSSON wrote:

For some test cases, it's required that the optional module mapper
"g++-mapper-server" is built. As the server is not required, the
test cases will fail if it can't be found.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_is_prog_name_available):
New.
* lib/target-supports-dg.exp
(dg-require-prog-name-available): New.
* g++.dg/modules/modules.exp: Verify availability of module
mapper.

Signed-off-by: Torbjörn SVENSSON  
---
  gcc/testsuite/g++.dg/modules/modules.exp | 31 
  gcc/testsuite/lib/target-supports-dg.exp | 15 
  gcc/testsuite/lib/target-supports.exp    | 15 
  3 files changed, 61 insertions(+)

diff --git a/gcc/testsuite/g++.dg/modules/modules.exp 
b/gcc/testsuite/g++.dg/modules/modules.exp

index afb323d0efd..4784803742a 100644
--- a/gcc/testsuite/g++.dg/modules/modules.exp
+++ b/gcc/testsuite/g++.dg/modules/modules.exp
@@ -279,6 +279,29 @@ proc module-init { src } {
  return $option_list
  }
+# Return 1 if requirements are met
+proc module-check-requirements { tests } {
+    foreach test $tests {
+    set tmp [dg-get-options $test]
+    foreach op $tmp {
+    switch [lindex $op 0] {
+    "dg-additional-options" {
+    # Example strings to match:
+    # -fmodules-ts -fmodule-mapper=|@g++-mapper-server\\ 
-t\\ [srcdir]/inc-xlate-1.map

+    # -fmodules-ts -fmodule-mapper=|@g++-mapper-server
+    if [regexp -- {(^| )-fmodule-mapper=\|@([^\\ ]*)} 
[lindex $op 2] dummy dummy2 prog] {

+    verbose "Checking that mapper exist: $prog"
+    if { ![ check_is_prog_name_available $prog ] } {
+    return 0
+    }
+    }
+    }
+    }
+    }
+    }
+    return 1
+}
+
  # cleanup any detritus from previous run
  cleanup_module_files [find $DEFAULT_REPO *.gcm]
@@ -307,6 +330,14 @@ foreach src [lsort [find $srcdir/$subdir 
{*_a.[CHX}]] {

  set tests [lsort [find [file dirname $src] \
    [regsub {_a.[CHX]$} [file tail $src] 
{_[a-z].[CHX]}]]]

+    if { ![module-check-requirements $tests] } {
+    set testcase [regsub {_a.[CH]} $src {}]
+    set testcase \
+    [string range $testcase [string length "$srcdir/"] end]
+    unsupported $testcase
+    continue
+    }
+
  set std_list [module-init $src]
  foreach std $std_list {
  set mod_files {}
diff --git a/gcc/testsuite/lib/target-supports-dg.exp 
b/gcc/testsuite/lib/target-supports-dg.exp

index aa2164bc789..6ce3b2b1a1b 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -683,3 +683,18 @@ proc dg-require-symver { args } {
  set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
  }
  }
+
+# If this target does not provide prog named "$args", skip this test.
+
+proc dg-require-prog-name-available { args } {
+    # The args are within another list; pull them out.
+    set args [lindex $args 0]
+
+    set prog [lindex $args 1]
+
+    if { ![ check_is_prog_name_available $prog ] } {
+    upvar dg-do-what dg-do-what
+    set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+    }
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp

index 703aba412a6..c3b7a6c17b3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11928,3 +11928,18 @@ main:
  .byte 0
    } ""]
  }
+
+# Return 1 if this target has prog named "$prog", 0 otherwise.
+
+proc check_is_prog_name_available { prog } {
+    global tool
+
+    set options [list "additional_flags=-print-prog-name=$prog"]
+    set output [lindex [${tool}_target_compile "" "" "none" 
$options] 0]

+
+    if { $output == $prog } {
+    return 0
+    }
+
+    return 1
+}
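
The check in check_is_prog_name_available comes down to a string comparison. Below is a sketch of the same logic in C, assuming (as the Tcl above relies on) that the driver echoes the bare name when -print-prog-name cannot resolve it, and that the output has already been trimmed. The function name and the example path in the note are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* `output` is what the driver printed for -print-prog-name=PROG.
   Returns 1 if it differs from the bare program name, i.e. the driver
   resolved PROG to some path, and 0 if the name came back unchanged.  */
int prog_name_resolved(const char *output, const char *prog)
{
  assert(output != NULL && prog != NULL);
  return strcmp(output, prog) != 0;
}
```

For example, an output of "g++-mapper-server" for the program "g++-mapper-server" counts as not available, while an output such as "/opt/gcc/libexec/g++-mapper-server" counts as available.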


[PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic

2022-11-17 Thread Andrea Corallo via Gcc-patches
From: Stam Markianos-Wright 

It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic
resolution would fall back to the `__ARM_undef` failure state.

This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79` and
`6bd4ce64eb48a72eca300cb52773e6101d646004`, but it previously wasn't
identified, because the tests were not checking for this kind of failure.

The above commits changed the definitions of the intrinsics from using
`[u]int[8/16/32]_t` types for the scalar argument to using `int`. This
allowed `int` to be supported in user code through the overloaded
`#defines`, but seems to have broken the `[u]int[8/16/32]_t` types.

The solution implemented by this patch is to explicitly use a new
_Generic mapping from all the `[u]int[8/16/32]_t` types for int. With this
change, both `int` and `[u]int[8/16/32]_t` parameters are supported from
user code and are handled by the overloading mechanism correctly.
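
A reduced illustration of the _Generic behaviour discussed above. It is a sketch with invented macro names, not the arm_mve.h definitions: a selection that lists only fixed-width types sends a plain int argument to the default branch, while listing int explicitly lets both spellings resolve.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical selection that knows only the fixed-width types.
   int32_t is left out on purpose: on many hosts it is the same type as
   int, and _Generic rejects two associations with compatible types.  */
#define KIND_STRICT(x) _Generic((x), \
    int8_t: "s8", \
    int16_t: "s16", \
    default: "undef")

/* The same selection with an explicit association for plain int.  */
#define KIND_WITH_INT(x) _Generic((x), \
    int8_t: "s8", \
    int16_t: "s16", \
    int: "int", \
    default: "undef")

/* The literal 1 has type int, so the strict form falls to "undef".  */
const char *strict_kind_of_literal(void)
{
  return KIND_STRICT(1);
}

const char *relaxed_kind_of_literal(void)
{
  return KIND_WITH_INT(1);
}

/* A genuine int8_t argument is still matched by its own association.  */
const char *relaxed_kind_of_int8(void)
{
  int8_t v = 1;
  return KIND_WITH_INT(v);
}

void check_selection(void)
{
  assert(strcmp(strict_kind_of_literal(), "undef") == 0);
  assert(strcmp(relaxed_kind_of_literal(), "int") == 0);
}
```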

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types.
(__arm_vaddq_m_n_s32): Likewise.
(__arm_vaddq_m_n_s16): Likewise.
(__arm_vaddq_m_n_u8): Likewise.
(__arm_vaddq_m_n_u32): Likewise.
(__arm_vaddq_m_n_u16): Likewise.
(__arm_vaddq_m): Fix Overloading.
(__ARM_mve_coerce3): New.
---
 gcc/config/arm/arm_mve.h | 78 
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 684f997520f..951dc25374b 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive, uint16x8_t 
__a, uint16x8_t __b, mve_pr
 
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b, mve_pred16_t 
__p)
+__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, 
mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b, 
mve_pred16_t __p)
+__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, 
mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b, 
mve_pred16_t __p)
+__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, 
mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b, 
mve_pred16_t __p)
+__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, 
mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b, 
mve_pred16_t __p)
+__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, 
mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b, 
mve_pred16_t __p)
+__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, 
mve_pred16_t __p)
 {
   return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
 }
@@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive, uint16x8_t __a, 
uint16x8_t __b, mve_pred16
 
 __extension__ extern __inline int8x16_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t 
__p)
 {
  return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t 
__p)
 {
  return __arm_vaddq_m_n_s32 (__inactive, __a, __b, __p);
 }
 
 __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t 
__p)
 {
  return 

Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Philipp Tomsich
On Thu, 17 Nov 2022 at 17:39, Jeff Law  wrote:
>
>
> On 11/17/22 08:12, Philipp Tomsich wrote:
> >
> > This serves as an assertion only, as that case is nonsensical and
> > will be optimized away by earlier passes (as "(a & C) == T" with C and T
> > sharing no bits will always be false).
> > AFAIK the preceding transforms should always clean such a check up,
> > but we can't exclude the possibility that with enough command line
> > overrides and params we might see such a nonsensical test making it
> > all the way to the backend.
>
> Good!  I was thinking in the back of my mind that the no-sharing-bits
> case should have been handled in the generic optimizers.  Thanks for
> clarifying.
>
>
> >
> > What would you recommend? Adding this to the pattern's condition feels
> > a bit redundant.
>
> We can leave it in the splitter.
>
>
> > In fact, I am leaning towards hiding the !SMALL_OPERAND check in yet
> > another predicate that combines const_twobits_operand with a
> > match_test for !SMALL_OPERAND.

I'll send a v2 with this cleaned up (and look into clarifying things
around the FAIL).

Philipp.
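
The remark quoted above, that a test of the form "(a & C) == T" can never
succeed when T has a bit that C lacks, is easy to check by brute force.
A minimal sketch, assuming 8-bit values for brevity; both helper names are
hypothetical and not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Brute force: is there any 8-bit value a with (a & c) == t?  */
bool
can_ever_match (uint8_t c, uint8_t t)
{
  for (unsigned a = 0; a <= UINT8_MAX; a++)
    if ((a & c) == t)
      return true;
  return false;
}

/* Closed form: the test is satisfiable exactly when t has no bit
   set outside of c.  */
bool
can_match_closed_form (uint8_t c, uint8_t t)
{
  return (t & ~c) == 0;
}
```

With c = 0x0a (two bits) and t = 0x04 (a single bit outside c), both
helpers report that no value of a can match, which is the case the
assertion in the splitter guards against.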


[PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-17 Thread Andrea Corallo via Gcc-patches
From: Stam Markianos-Wright 

This patch adds explicit references to other float types
to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

gcc/ChangeLog:
PR 107515
* config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.
---
 gcc/config/arm/arm_mve.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index fd1876b57a0..f6b42dc3fab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35582,6 +35582,9 @@ enum {
short: __ARM_mve_type_int_n, \
int: __ARM_mve_type_int_n, \
long: __ARM_mve_type_int_n, \
+   _Float16: __ARM_mve_type_fp_n, \
+   __fp16: __ARM_mve_type_fp_n, \
+   float: __ARM_mve_type_fp_n, \
double: __ARM_mve_type_fp_n, \
long long: __ARM_mve_type_int_n, \
unsigned char: __ARM_mve_type_int_n, \
-- 
2.25.1
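
For readers unfamiliar with why each float type has to be listed: a C11
generic selection matches on the exact type of its controlling expression
and does not apply arithmetic promotions, so a float argument does not
select the double association.  A minimal sketch of that behaviour; the
tag names and both macros are hypothetical stand-ins, not the real
__ARM_mve_typeid:

```c
/* Hypothetical tags standing in for the __ARM_mve_type_* values.  */
enum { TAG_UNKNOWN = 0, TAG_INT_N = 1, TAG_FP_N = 2 };

/* Only double is listed among the floating-point types.  */
#define TYPEID_BEFORE(x) _Generic ((x), \
	int: TAG_INT_N, \
	double: TAG_FP_N, \
	default: TAG_UNKNOWN)

/* float is listed explicitly as well.  */
#define TYPEID_AFTER(x) _Generic ((x), \
	int: TAG_INT_N, \
	float: TAG_FP_N, \
	double: TAG_FP_N, \
	default: TAG_UNKNOWN)
```

Here TYPEID_BEFORE (1.0f) falls through to the default association, while
TYPEID_AFTER (1.0f) selects the floating-point tag.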



PING^2 [PATCH] cpp/remap: Only override if string matched

2022-11-17 Thread Torbjorn SVENSSON via Gcc-patches

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604898.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-11-02 19:21, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604062.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-10-20 22:48, Torbjörn SVENSSON wrote:

For systems with HAVE_DOS_BASED_FILE_SYSTEM set, only override the
pointer if the backslash pattern matches.

Output without this patch:
.../gcc/testsuite/gcc.dg/cpp/pr71681-2.c:5:10: fatal error: a/t2.h: No such file or directory


With patch applied, no output and the test case succeeds.

libcpp/ChangeLog

* files.cc: Ensure pattern matches before use.

Signed-off-by: Torbjörn SVENSSON 
---
  libcpp/files.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/files.cc b/libcpp/files.cc
index 24208f7b0f8..a18b1caf48d 100644
--- a/libcpp/files.cc
+++ b/libcpp/files.cc
@@ -1833,7 +1833,7 @@ remap_filename (cpp_reader *pfile, _cpp_file *file)
  #ifdef HAVE_DOS_BASED_FILE_SYSTEM
    {
  const char *p2 = strchr (fname, '\\');
-    if (!p || (p > p2))
+    if (!p || (p2 && p > p2))
    p = p2;
    }
  #endif
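
The corrected condition can be tried in isolation.  A minimal sketch,
assuming a standalone helper (first_dir_separator is a hypothetical name,
not a libcpp function) that returns the first '/' or '\' in a file name:

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first '/' or '\\' in fname, or NULL if
   fname contains neither.  */
const char *
first_dir_separator (const char *fname)
{
  const char *p = strchr (fname, '/');
  const char *p2 = strchr (fname, '\\');

  /* Use the backslash position only when no slash was found, or when
     a backslash was found and it comes before the slash.  */
  if (!p || (p2 && p > p2))
    p = p2;
  return p;
}
```

For "a/t2.h" there is no backslash, so p2 is NULL and the slash position
is kept.  Without the p2 check the comparison is made against a null
pointer, which presumably is how the valid pointer was being overwritten
in the failing test.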


Re: [PATCH] AArch64: Add support for -mdirect-extern-access

2022-11-17 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

> Can you go into more detail about:
>
>    Use :option:`-mdirect-extern-access` either in shared libraries or in
>    executables, but not in both.  Protected symbols used both in a shared
>    library and executable may cause linker errors or fail to work correctly
>
> If this is LLVM's default for PIC (and by assumption shared libraries),
> is it then invalid to use -mdirect-extern-access for any PIEs that
> are linked against those shared libraries and use protected symbols
> from those libraries?  How would a user know that one of the shared
> libraries they're linking against was built in this way?

Yes, the usage model is that you'd either use it for static PIE or only on
data that is not shared. If you get it wrong then you'll get the copy
relocation error. In the future we need to decide what the ABI is and
ensure GCC and LLVM are compatible. An import feature to mark symbols
that may be overridden by a shared library would be useful too.

> It looks like the main difference between this implementation and
> the x86 one is that x86 allows direct accesses to common symbols.
> What's the reason for not doing that for AArch64?  Does it not work,
> is it a false optimisation (i.e. pessimisation), or did it not seem
> important now that -fno-common is the default?

I don't see any difference in the way common symbols are accessed on x86,
so it's not clear which cases common_local_p param actually affects (eg. with
-fPIC there is always a GOT indirection for common symbols).

Cheers,
Wilco

PING^2 [PATCH] testsuite: Windows paths use \ and not /

2022-11-17 Thread Torbjorn SVENSSON via Gcc-patches

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604896.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-11-02 19:16, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604312.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-10-25 17:15, Torbjörn SVENSSON wrote:

Without this patch, the following error is reported on Windows:

In file included from 
t:\build\arm-none-eabi\include\c++\11.3.1\string:54,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\bits\locale_classes.h:40,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\bits\ios_base.h:41,

   from t:\build\arm-none-eabi\include\c++\11.3.1\ios:42,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\ostream:38,
   from 
t:\build\arm-none-eabi\include\c++\11.3.1\iostream:39:
t:\build\arm-none-eabi\include\c++\11.3.1\bits\range_access.h:36:10: 
note: include 
't:\build\arm-none-eabi\include\c++\11.3.1\initializer_list' 
translated to import
arm-none-eabi-g++.exe: warning: 
.../gcc/testsuite/g++.dg/modules/pr99023_b.X: linker input file unused 
because linking not done
FAIL: g++.dg/modules/pr99023_b.X -std=c++2a  dg-regexp 6 not found: 
"[^\n]*: note: include '[^\n]*/initializer_list' translated to import\n"


gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99023_b.X: Match Windows paths too.

Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.dg/modules/pr99023_b.X | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/modules/pr99023_b.X 
b/gcc/testsuite/g++.dg/modules/pr99023_b.X

index 3d82f34868b..ca5f32e5bcc 100644
--- a/gcc/testsuite/g++.dg/modules/pr99023_b.X
+++ b/gcc/testsuite/g++.dg/modules/pr99023_b.X
@@ -3,5 +3,5 @@
  // { dg-prune-output {linker input file unused} }
-// { dg-regexp {[^\n]*: note: include '[^\n]*/initializer_list' 
translated to import\n} }
+// { dg-regexp {[^\n]*: note: include '[^\n]*[/\\]initializer_list' 
translated to import\n} }

  NO DO NOT COMPILE
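
The effect of accepting either separator can be illustrated outside of
DejaGnu.  A minimal sketch using POSIX regcomp rather than the Tcl regular
expressions that dg-regexp uses; matches_import_note is a hypothetical
helper, and the pattern is a simplified form of the one in the test.  Note
that in a POSIX bracket expression a backslash is literal, so the class is
written with a single backslash there:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if line contains the "translated to import" note with
   either '/' or '\\' immediately before initializer_list.  */
bool
matches_import_note (const char *line)
{
  regex_t re;
  /* The C string "[/\\]" is the bracket expression [/\], matching a
     slash or a backslash.  */
  if (regcomp (&re,
	       "note: include '.*[/\\]initializer_list' translated to import",
	       REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

Both a Unix-style path ending in /initializer_list and a Windows-style
path ending in \initializer_list are accepted by this pattern.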


[PATCH 01/35] arm: improve vcreateq* tests

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcreateq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcreateq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcreateq_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcreateq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcreateq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcreateq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcreateq_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcreateq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vcreateq_f16.c | 23 ++-
 .../arm/mve/intrinsics/vcreateq_f32.c | 23 ++-
 .../arm/mve/intrinsics/vcreateq_s16.c | 23 ++-
 .../arm/mve/intrinsics/vcreateq_s32.c | 23 ++-
 .../arm/mve/intrinsics/vcreateq_s64.c | 23 ++-
 .../arm/mve/intrinsics/vcreateq_s8.c  | 23 ++-
 .../arm/mve/intrinsics/vcreateq_u16.c | 23 ++-
 .../arm/mve/intrinsics/vcreateq_u32.c | 23 ++-
 .../arm/mve/intrinsics/vcreateq_u64.c | 23 ++-
 .../arm/mve/intrinsics/vcreateq_u8.c  | 23 ++-
 10 files changed, 220 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
index fb3601edb94..c39303daa03 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+** vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+** ...
+*/
 float16x8_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_f16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+** ...
+** vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+** vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+** ...
+*/
+float16x8_t
+foo1 ()
+{
+  return vcreateq_f16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
index 4f4da62eed7..ad66f4407cd 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+** vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+** ...
+*/
 float32x4_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_f32 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+** ...
+** vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+** vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+** ...
+*/
+float32x4_t
+foo1 ()
+{
+  return vcreateq_f32 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
index 103be6310bd..7e70a486513 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
@@ -1,13 +1,34 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+** vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+** ...
+*/
 int16x8_t
 foo (uint64_t a, uint64_t b)
 {
   return vcreateq_s16 (a, b);
 }
 
-/* { dg-final { scan-assembler "vmov"  }  } */
+/*
+**foo1:
+** ...
+** vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
+** vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
+** ...
+*/
+int16x8_t
+foo1 ()
+{
+  return vcreateq_s16 (1, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s32.c
index 

Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Jeff Law via Gcc-patches



On 11/17/22 08:12, Philipp Tomsich wrote:
>
> This serves as an assertion only, as that case is nonsensical and
> will be optimized away by earlier passes (as "(a & C) == T" with C and T
> sharing no bits will always be false).
> AFAIK the preceding transforms should always clean such a check up,
> but we can't exclude the possibility that with enough command line
> overrides and params we might see such a nonsensical test making it
> all the way to the backend.

Good!  I was thinking in the back of my mind that the no-sharing-bits
case should have been handled in the generic optimizers.  Thanks for
clarifying.

> What would you recommend? Adding this to the pattern's condition feels
> a bit redundant.

We can leave it in the splitter.

> In fact, I am leaning towards hiding the !SMALL_OPERAND check in yet
> another predicate that combines const_twobits_operand with a
> match_test for !SMALL_OPERAND.


Sure.

jeff




Re: [PATCH] [range-ops] Implement sqrt.

2022-11-17 Thread Aldy Hernandez via Gcc-patches
To go along with whatever magic we're gonna tack along to the
range-ops sqrt implementation, here is another revision addressing the
VARYING issue you pointed out.

A few things...

Instead of going through trees, I decided to call do_mpfr_arg1
directly.  Let's not go the wide int <-> tree rat hole in this one.

The function do_mpfr_arg1 bails on +INF, so I had to handle it manually.

There's a regression in gfortran.dg/ieee/ieee_6.f90, which I'm not
sure how to handle.  We are failing because we are calculating
sqrt(-1) and expecting certain IEEE flags set.  These flags aren't
set, presumably because we folded sqrt(-1) into a NAN directly:

// All negatives.
if (real_compare (LT_EXPR, _ub, ))
  {
real_nan (, "", 0, TYPE_MODE (type));
ub = lb;
maybe_nan = true;
return;
  }

The failing part of the test is:

  if (.not. (all(flags .eqv. [.false.,.false.,.true.,.true.,.false.]) &
 .or. all(flags .eqv. [.false.,.false.,.true.,.true.,.true.]) &
 .or. all(flags .eqv. [.false.,.false.,.true.,.false.,.false.]) &
 .or. all(flags .eqv.
[.false.,.false.,.true.,.false.,.true.]))) STOP 5

But we are generating F F F F F.  Google has informed me that that 3rd
flag is IEEE_INVALID.

So... is the optimization wrong?  Are we not allowed to substitute
that NAN if we know it's gonna happen?  Should we also allow F F F F F
in the test?  Or something else?

Thanks.
Aldy

On Wed, Nov 16, 2022 at 9:33 PM Jakub Jelinek  wrote:
>
> On Mon, Nov 14, 2022 at 09:55:29PM +, Joseph Myers wrote:
> > On Sun, 13 Nov 2022, Jakub Jelinek via Gcc-patches wrote:
> >
> > > So, I wonder if we don't need to add a target hook where targets will be
> > > able to provide upper bound on error for floating point functions for
> > > different floating point modes and some way to signal unknown
> > > accuracy/can't be trusted, in which case we would give up or return
> > > just the range for VARYING.
> >
> > Note that the figures given in the glibc manual are purely empirical
> > (largest errors observed for inputs in the glibc testsuite on a system
> > that was then used to update the libm-test-ulps files); they don't
> > constitute any kind of guarantee about either the current implementation
> > or the API, nor are they formally verified, nor do they come from
> > exhaustive testing (though worst cases from exhaustive testing for float
> > may have been added to the glibc testsuite in some cases).  (I think the
> > only functions known to give huge errors for some inputs, outside of any
> > IBM long double issues, are the Bessel functions and cpow functions.  But
> > even if other functions don't have huge errors, and some
> > architecture-specific implementations might have issues, there are
> > certainly some cases where errors can exceed the 9ulp threshold on what
> > the libm tests will accept in libm-test-ulps files, which are thus
> > considered glibc bugs.  (That's 9ulp from the correctly rounded value,
> > computed in ulp of that value.  For IBM long double it's 16ulp instead,
> > treating the format as having a fixed 106 bits of precision.  Both figures
> > are empirical ones chosen based on what bounds sufficed for most libm
> > functions some years ago; ideally, with better implementations of some
> > functions we could probably bring those numbers down.))
>
> I know I can't get guarantees without formal proofs and even ulps from
> reported errors are better than randomized testing.
> But I think at least for non-glibc we want to be able to get a rough idea
> of the usual error range in ulps.
>
> This is what I came up with so far (link with
> gcc -o ulp-tester{,.c} -O2 -lmpfr -lm
> ), it still doesn't verify that functions are always within the mathematical
> range of results ([-0.0, Inf] for sqrt, [-1.0, 1.0] for sin/cos etc.), guess
> that would be useful and verify the program actually does what is intended.
> One can supply just one argument (number of tests, first 46 aren't really
> random) or two, in the latter case the second should be upward, downward or
> towardzero to use non-default rounding mode.
> The idea is that we'd collect ballpark estimates for roundtonearest and
> then estimates for the other 3 rounding modes, the former would be used
> without -frounding-math, max over all 4 rounding modes for -frounding-math
> as gcc will compute using mpfr always in round to nearest.
>
> Jakub
From 759bcd4b4b6f70fcec045b24fb6874aaca989549 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Sun, 13 Nov 2022 18:39:59 +0100
Subject: [PATCH] [range-ops] Implement sqrt.

gcc/ChangeLog:

	* fold-const-call.cc (do_mpfr_arg1): Remove static.
	* gimple-range-op.cc (class cfn_sqrt): New.
	(gimple_range_op_handler::maybe_builtin_call): Add sqrt case.
	* realmpfr.h (do_mpfr_arg1): Add extern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/vrp124.c: New test.
---
 gcc/fold-const-call.cc |  2 +-
 gcc/gimple-range-op.cc | 56 

[PATCH 17/35] arm: improve tests and fix vadd*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/ChangeLog:

* config/arm/mve.md (mve_vaddlvq_p_v4si)
(mve_vaddq_n_, mve_vaddvaq_)
(mve_vaddlvaq_v4si, mve_vaddq_n_f)
(mve_vaddlvaq_p_v4si, mve_vaddq, mve_vaddq_f):
Fix spacing.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c: Improve test.
* gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddlvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddlvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvaq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddvq_p_u16.c: Likewise.
* 

[PATCH 09/35] arm: improve tests for vmax*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxaq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxaq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxaq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vmaxaq_m_s16.c | 25 +--
 .../arm/mve/intrinsics/vmaxaq_m_s32.c | 25 +--
 .../arm/mve/intrinsics/vmaxaq_m_s8.c  | 25 +--
 .../arm/mve/intrinsics/vmaxaq_s16.c   | 16 +++-
 .../arm/mve/intrinsics/vmaxaq_s32.c   | 16 +++-
 .../gcc.target/arm/mve/intrinsics/vmaxaq_s8.c | 16 +++-
 .../arm/mve/intrinsics/vmaxavq_p_s16.c| 41 ---
 .../arm/mve/intrinsics/vmaxavq_p_s32.c| 41 ---
 .../arm/mve/intrinsics/vmaxavq_p_s8.c | 41 ---
 .../arm/mve/intrinsics/vmaxavq_s16.c  | 29 ++---
 .../arm/mve/intrinsics/vmaxavq_s32.c  | 29 ++---
 .../arm/mve/intrinsics/vmaxavq_s8.c   | 29 ++---
 .../arm/mve/intrinsics/vmaxnmaq_f16.c | 16 +++-
 .../arm/mve/intrinsics/vmaxnmaq_f32.c | 16 +++-
 .../arm/mve/intrinsics/vmaxnmaq_m_f16.c   | 25 +--
 .../arm/mve/intrinsics/vmaxnmaq_m_f32.c   | 25 +--
 

[PATCH 29/35] arm: improve tests for vqdmul*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c: Improve tests.
* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c: Likewise.
---
 .../arm/mve/intrinsics/vqdmulhq_m_n_s16.c | 26 ---
 .../arm/mve/intrinsics/vqdmulhq_m_n_s32.c | 26 ---
 .../arm/mve/intrinsics/vqdmulhq_m_n_s8.c  | 26 ---
 .../arm/mve/intrinsics/vqdmulhq_m_s16.c   | 26 ---
 .../arm/mve/intrinsics/vqdmulhq_m_s32.c   | 26 ---
 .../arm/mve/intrinsics/vqdmulhq_m_s8.c| 26 ---
 .../arm/mve/intrinsics/vqdmulhq_n_s16.c   | 16 ++--
 .../arm/mve/intrinsics/vqdmulhq_n_s32.c   | 16 ++--
 .../arm/mve/intrinsics/vqdmulhq_n_s8.c| 16 ++--
 .../arm/mve/intrinsics/vqdmulhq_s16.c | 16 ++--
 .../arm/mve/intrinsics/vqdmulhq_s32.c | 16 ++--
 .../arm/mve/intrinsics/vqdmulhq_s8.c  | 16 ++--
 .../arm/mve/intrinsics/vqdmullbq_m_n_s16.c| 26 ---
 .../arm/mve/intrinsics/vqdmullbq_m_n_s32.c| 26 ---
 .../arm/mve/intrinsics/vqdmullbq_m_s16.c  | 26 ---
 .../arm/mve/intrinsics/vqdmullbq_m_s32.c  | 26 ---
 .../arm/mve/intrinsics/vqdmullbq_n_s16.c  | 16 ++--
 .../arm/mve/intrinsics/vqdmullbq_n_s32.c  | 16 ++--
 .../arm/mve/intrinsics/vqdmullbq_s16.c| 16 ++--
 .../arm/mve/intrinsics/vqdmullbq_s32.c| 16 ++--
 .../arm/mve/intrinsics/vqdmulltq_m_n_s16.c| 26 ---
 .../arm/mve/intrinsics/vqdmulltq_m_n_s32.c| 26 ---
 .../arm/mve/intrinsics/vqdmulltq_m_s16.c  | 26 ---
 .../arm/mve/intrinsics/vqdmulltq_m_s32.c  | 26 ---
 .../arm/mve/intrinsics/vqdmulltq_n_s16.c  | 16 ++--
 .../arm/mve/intrinsics/vqdmulltq_n_s32.c  | 16 ++--
 .../arm/mve/intrinsics/vqdmulltq_s16.c| 16 ++--
 .../arm/mve/intrinsics/vqdmulltq_s32.c| 16 ++--
 28 files changed, 504 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
index 57ab85eaf52..a5c1a106205 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vqdmulht.s16 q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqdmulhq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmulht.s16"  }  } */
 
+/*
+**foo1:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?:

[PATCH 32/35] arm: improve tests for vqsubq*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqsubq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vqsubq_m_n_s16.c   | 26 ++--
 .../arm/mve/intrinsics/vqsubq_m_n_s32.c   | 26 ++--
 .../arm/mve/intrinsics/vqsubq_m_n_s8.c| 26 ++--
 .../arm/mve/intrinsics/vqsubq_m_n_u16.c   | 42 +--
 .../arm/mve/intrinsics/vqsubq_m_n_u32.c   | 42 +--
 .../arm/mve/intrinsics/vqsubq_m_n_u8.c| 42 +--
 .../arm/mve/intrinsics/vqsubq_m_s16.c | 26 ++--
 .../arm/mve/intrinsics/vqsubq_m_s32.c | 26 ++--
 .../arm/mve/intrinsics/vqsubq_m_s8.c  | 26 ++--
 .../arm/mve/intrinsics/vqsubq_m_u16.c | 26 ++--
 .../arm/mve/intrinsics/vqsubq_m_u32.c | 26 ++--
 .../arm/mve/intrinsics/vqsubq_m_u8.c  | 26 ++--
 .../arm/mve/intrinsics/vqsubq_n_s16.c | 16 ++-
 .../arm/mve/intrinsics/vqsubq_n_s32.c | 16 ++-
 .../arm/mve/intrinsics/vqsubq_n_s8.c  | 16 ++-
 .../arm/mve/intrinsics/vqsubq_n_u16.c | 28 -
 .../arm/mve/intrinsics/vqsubq_n_u32.c | 28 -
 .../arm/mve/intrinsics/vqsubq_n_u8.c  | 28 -
 .../arm/mve/intrinsics/vqsubq_s16.c   | 16 ++-
 .../arm/mve/intrinsics/vqsubq_s32.c   | 16 ++-
 .../gcc.target/arm/mve/intrinsics/vqsubq_s8.c | 16 ++-
 .../arm/mve/intrinsics/vqsubq_u16.c   | 16 ++-
 .../arm/mve/intrinsics/vqsubq_u32.c   | 16 ++-
 .../gcc.target/arm/mve/intrinsics/vqsubq_u8.c | 16 ++-
 24 files changed, 516 insertions(+), 72 deletions(-)
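
Reader's note on the instruction these function bodies match: vqsub is a per-lane saturating subtract. The sketch below models a single signed 16-bit lane in plain C, assuming the usual clamp-to-range semantics. The helper name ref_vqsub_s16_lane is hypothetical; it is not part of arm_mve.h or of this patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of one signed 16-bit lane of vqsub.s16:
   subtract in a wider type, then clamp to the int16_t range.  */
static int16_t
ref_vqsub_s16_lane (int16_t a, int16_t b)
{
  int32_t diff = (int32_t) a - (int32_t) b;

  if (diff > INT16_MAX)
    return INT16_MAX;
  if (diff < INT16_MIN)
    return INT16_MIN;
  return (int16_t) diff;
}
```

The predicated (_m) forms tested here additionally select, per lane, between this result and the corresponding lane of the inactive argument; the sketch leaves that out.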

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
index abcff4f0e3c..39b8089919d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vqsubt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int16x8_t
 foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqsubq_m_n_s16 (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
 
+/*
+**foo1:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vqsubt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int16x8_t
 foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
 {
   return vqsubq_m (inactive, a, b, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
index 23e59ff12a2..ed6b92ddcf5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include 

[PATCH 21/35] arm: improve tests for vhaddq_m*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vhaddq_m_n_s16.c   | 26 ++--
 .../arm/mve/intrinsics/vhaddq_m_n_s32.c   | 26 ++--
 .../arm/mve/intrinsics/vhaddq_m_n_s8.c| 26 ++--
 .../arm/mve/intrinsics/vhaddq_m_n_u16.c   | 42 +--
 .../arm/mve/intrinsics/vhaddq_m_n_u32.c   | 42 +--
 .../arm/mve/intrinsics/vhaddq_m_n_u8.c| 42 +--
 .../arm/mve/intrinsics/vhaddq_m_s16.c | 26 ++--
 .../arm/mve/intrinsics/vhaddq_m_s32.c | 26 ++--
 .../arm/mve/intrinsics/vhaddq_m_s8.c  | 26 ++--
 .../arm/mve/intrinsics/vhaddq_m_u16.c | 26 ++--
 .../arm/mve/intrinsics/vhaddq_m_u32.c | 26 ++--
 .../arm/mve/intrinsics/vhaddq_m_u8.c  | 26 ++--
 .../arm/mve/intrinsics/vhaddq_n_s16.c | 16 ++-
 .../arm/mve/intrinsics/vhaddq_n_s32.c | 16 ++-
 .../arm/mve/intrinsics/vhaddq_n_s8.c  | 16 ++-
 .../arm/mve/intrinsics/vhaddq_n_u16.c | 28 -
 .../arm/mve/intrinsics/vhaddq_n_u32.c | 28 -
 .../arm/mve/intrinsics/vhaddq_n_u8.c  | 28 -
 .../arm/mve/intrinsics/vhaddq_s16.c   | 16 ++-
 .../arm/mve/intrinsics/vhaddq_s32.c   | 16 ++-
 .../gcc.target/arm/mve/intrinsics/vhaddq_s8.c | 16 ++-
 .../arm/mve/intrinsics/vhaddq_u16.c   | 16 ++-
 .../arm/mve/intrinsics/vhaddq_u32.c   | 16 ++-
 .../gcc.target/arm/mve/intrinsics/vhaddq_u8.c | 16 ++-
 .../arm/mve/intrinsics/vhaddq_x_n_s16.c   | 26 ++--
 .../arm/mve/intrinsics/vhaddq_x_n_s32.c   | 26 ++--
 .../arm/mve/intrinsics/vhaddq_x_n_s8.c| 26 ++--
 .../arm/mve/intrinsics/vhaddq_x_n_u16.c   | 42 +--
 .../arm/mve/intrinsics/vhaddq_x_n_u32.c   | 42 +--
 .../arm/mve/intrinsics/vhaddq_x_n_u8.c| 42 +--
 .../arm/mve/intrinsics/vhaddq_x_s16.c | 25 +--
 .../arm/mve/intrinsics/vhaddq_x_s32.c | 25 +--
 .../arm/mve/intrinsics/vhaddq_x_s8.c  | 25 +--
 .../arm/mve/intrinsics/vhaddq_x_u16.c | 25 +--
 .../arm/mve/intrinsics/vhaddq_x_u32.c | 25 +--
 .../arm/mve/intrinsics/vhaddq_x_u8.c  | 25 +--
 36 files changed, 828 insertions(+), 114 deletions(-)
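
Reader's note on the instruction these function bodies match: vhadd is a per-lane halving add. The sketch below models a single signed 16-bit lane, assuming the sum is formed at full precision and then halved with rounding toward minus infinity (the rounding variant is a separate instruction and is not modelled). The helper name ref_vhadd_s16_lane is hypothetical; it is not part of arm_mve.h or of this patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of one signed 16-bit lane of vhadd.s16:
   add in a wider type so the sum cannot overflow, then halve,
   rounding toward minus infinity.  */
static int16_t
ref_vhadd_s16_lane (int16_t a, int16_t b)
{
  int32_t sum = (int32_t) a + (int32_t) b;
  int32_t half = sum / 2;	/* C division truncates toward zero.  */

  /* Adjust odd negative sums so the result is floor (sum / 2).  */
  if (sum < 0 && sum % 2 != 0)
    half -= 1;
  return (int16_t) half;
}
```

Because the intermediate sum is wider than the lane, the halved result always fits back into int16_t.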

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c
index e90af963697..0bd03832ff5 100644
--- 

[PATCH 26/35] arm: improve tests for vmlasq*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vmlasq_m_n_s16.c   | 34 ++---
 .../arm/mve/intrinsics/vmlasq_m_n_s32.c   | 34 ++---
 .../arm/mve/intrinsics/vmlasq_m_n_s8.c| 34 ++---
 .../arm/mve/intrinsics/vmlasq_m_n_u16.c   | 50 ---
 .../arm/mve/intrinsics/vmlasq_m_n_u32.c   | 50 ---
 .../arm/mve/intrinsics/vmlasq_m_n_u8.c| 50 ---
 .../arm/mve/intrinsics/vmlasq_n_s16.c | 24 ++---
 .../arm/mve/intrinsics/vmlasq_n_s32.c | 24 ++---
 .../arm/mve/intrinsics/vmlasq_n_s8.c  | 24 ++---
 .../arm/mve/intrinsics/vmlasq_n_u16.c | 36 ++---
 .../arm/mve/intrinsics/vmlasq_n_u32.c | 36 ++---
 .../arm/mve/intrinsics/vmlasq_n_u8.c  | 36 ++---
 12 files changed, 348 insertions(+), 84 deletions(-)
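
Reader's note on the parameter renaming in the diffs below (a, b, c become m1, m2, add): the new names reflect the role of each operand, since vmlas multiplies two vectors lane by lane and adds a scalar. The sketch below models the unpredicated 16-bit form in plain C, assuming the result wraps to the lane width. The names ref_vmlas_n_s16 and LANES are hypothetical; they are not part of arm_mve.h or of this patch.

```c
#include <stdint.h>

#define LANES 8	/* Eight 16-bit lanes in a 128-bit vector.  */

/* Hypothetical scalar model of the unpredicated vmlas.s16:
   out[i] = m1[i] * m2[i] + add, truncated to 16 bits.  The arithmetic is
   done on unsigned values so that wrap-around is well defined; converting
   the wrapped value back to int16_t relies on the usual two's complement
   behaviour (implementation-defined before C23).  */
static void
ref_vmlas_n_s16 (int16_t out[LANES], const int16_t m1[LANES],
		 const int16_t m2[LANES], int16_t add)
{
  for (int i = 0; i < LANES; i++)
    {
      uint32_t wide = (uint32_t) (int32_t) m1[i] * (uint32_t) (int32_t) m2[i]
		      + (uint32_t) (int32_t) add;
      out[i] = (int16_t) (uint16_t) wide;
    }
}
```

The predicated (_m) forms tested here apply the same computation only to lanes enabled by the predicate p; that selection is not modelled.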

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
index bf66e616ec7..af6e588adad 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vmlast.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
 {
-  return vmlasq_m_n_s16 (a, b, c, p);
+  return vmlasq_m_n_s16 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s16"  }  } */
 
+/*
+**foo1:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vmlast.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
 {
-  return vmlasq_m (a, b, c, p);
+  return vmlasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
index 53c21e2e5b6..9d0cc3076d9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vmlast.s32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
 {
-  return vmlasq_m_n_s32 (a, b, c, p);
+  return vmlasq_m_n_s32 (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s32"  }  } */
 
+/*
+**foo1:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vmlast.s32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
 {
-  return vmlasq_m (a, b, c, p);
+  return vmlasq_m (m1, m2, add, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vmlast.s32"  }  } */
+/* { dg-final { 

[PATCH 34/35] arm: improve tests for vrshlq*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c: Improve tests.
* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vrshlq_m_n_s16.c   | 25 +++---
 .../arm/mve/intrinsics/vrshlq_m_n_s32.c   | 25 +++---
 .../arm/mve/intrinsics/vrshlq_m_n_s8.c| 25 +++---
 .../arm/mve/intrinsics/vrshlq_m_n_u16.c   | 25 +++---
 .../arm/mve/intrinsics/vrshlq_m_n_u32.c   | 25 +++---
 .../arm/mve/intrinsics/vrshlq_m_n_u8.c| 25 +++---
 .../arm/mve/intrinsics/vrshlq_m_s16.c | 26 ---
 .../arm/mve/intrinsics/vrshlq_m_s32.c | 26 ---
 .../arm/mve/intrinsics/vrshlq_m_s8.c  | 26 ---
 .../arm/mve/intrinsics/vrshlq_m_u16.c | 26 ---
 .../arm/mve/intrinsics/vrshlq_m_u32.c | 26 ---
 .../arm/mve/intrinsics/vrshlq_m_u8.c  | 26 ---
 .../arm/mve/intrinsics/vrshlq_n_s16.c | 16 ++--
 .../arm/mve/intrinsics/vrshlq_n_s32.c | 16 ++--
 .../arm/mve/intrinsics/vrshlq_n_s8.c  | 16 ++--
 .../arm/mve/intrinsics/vrshlq_n_u16.c | 16 ++--
 .../arm/mve/intrinsics/vrshlq_n_u32.c | 16 ++--
 .../arm/mve/intrinsics/vrshlq_n_u8.c  | 16 ++--
 .../arm/mve/intrinsics/vrshlq_s16.c   | 16 ++--
 .../arm/mve/intrinsics/vrshlq_s32.c   | 16 ++--
 .../gcc.target/arm/mve/intrinsics/vrshlq_s8.c | 16 ++--
 .../arm/mve/intrinsics/vrshlq_u16.c   | 16 ++--
 .../arm/mve/intrinsics/vrshlq_u32.c   | 16 ++--
 .../gcc.target/arm/mve/intrinsics/vrshlq_u8.c | 16 ++--
 .../arm/mve/intrinsics/vrshlq_x_s16.c | 25 +++---
 .../arm/mve/intrinsics/vrshlq_x_s32.c | 25 +++---
 .../arm/mve/intrinsics/vrshlq_x_s8.c  | 25 +++---
 .../arm/mve/intrinsics/vrshlq_x_u16.c | 25 +++---
 .../arm/mve/intrinsics/vrshlq_x_u32.c | 25 +++---
 .../arm/mve/intrinsics/vrshlq_x_u8.c  | 25 +++---
 30 files changed, 564 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
index cf51de6aa9c..c7d1f3a5b1c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
@@ -1,22 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vrshlt.s16  q[0-9]+, (?:ip|fp|r[0-9]+)(?:   @.*|)
+** ...
+*/
 int16x8_t
 foo (int16x8_t a, int32_t b, mve_pred16_t p)
 {
   return vrshlq_m_n_s16 (a, b, p);
 }
 
-/* { dg-final { 

[PATCH 18/35] arm: improve tests for vmulq*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vmulq_f16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vmulq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_u8.c: Likewise.
---
 .../gcc.target/arm/mve/intrinsics/vmulq_f16.c | 16 ++-
 .../gcc.target/arm/mve/intrinsics/vmulq_f32.c | 16 ++-
 .../arm/mve/intrinsics/vmulq_m_f16.c  | 26 ++--
 .../arm/mve/intrinsics/vmulq_m_f32.c  | 26 ++--
 .../arm/mve/intrinsics/vmulq_m_n_f16.c| 42 +--
 .../arm/mve/intrinsics/vmulq_m_n_f32.c| 42 +--
 .../arm/mve/intrinsics/vmulq_m_n_s16.c| 26 ++--
 .../arm/mve/intrinsics/vmulq_m_n_s32.c| 26 ++--
 .../arm/mve/intrinsics/vmulq_m_n_s8.c | 26 ++--
 .../arm/mve/intrinsics/vmulq_m_n_u16.c| 42 +--
 .../arm/mve/intrinsics/vmulq_m_n_u32.c| 42 +--
 .../arm/mve/intrinsics/vmulq_m_n_u8.c | 42 +--
 .../arm/mve/intrinsics/vmulq_m_s16.c  | 26 ++--
 .../arm/mve/intrinsics/vmulq_m_s32.c  | 26 ++--
 .../arm/mve/intrinsics/vmulq_m_s8.c   | 26 ++--
 .../arm/mve/intrinsics/vmulq_m_u16.c  | 26 ++--
 .../arm/mve/intrinsics/vmulq_m_u32.c  | 26 ++--
 .../arm/mve/intrinsics/vmulq_m_u8.c   | 26 ++--
 .../arm/mve/intrinsics/vmulq_n_f16.c  | 28 -
 .../arm/mve/intrinsics/vmulq_n_f32.c  | 28 -
 .../arm/mve/intrinsics/vmulq_n_s16.c  | 16 ++-
 .../arm/mve/intrinsics/vmulq_n_s32.c  | 16 ++-
 .../arm/mve/intrinsics/vmulq_n_s8.c   | 16 ++-
 .../arm/mve/intrinsics/vmulq_n_u16.c  | 28 -
 .../arm/mve/intrinsics/vmulq_n_u32.c  | 28 -
 .../arm/mve/intrinsics/vmulq_n_u8.c   | 28 -
 .../gcc.target/arm/mve/intrinsics/vmulq_s16.c | 16 ++-
 .../gcc.target/arm/mve/intrinsics/vmulq_s32.c | 16 ++-
 

[PATCH 25/35] arm: improve tests and fix vmlaldavaxq*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/ChangeLog:

* config/arm/mve.md (mve_vmlaldavaq_)
(mve_vmlaldavaxq_s, mve_vmlaldavaxq_p_): Fix
spacing vs tabs.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c: Improve tests.
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c: Likewise.
---
 gcc/config/arm/mve.md |  6 ++--
 .../arm/mve/intrinsics/vmlaldavaxq_p_s16.c| 32 +++
 .../arm/mve/intrinsics/vmlaldavaxq_p_s32.c| 32 +++
 .../arm/mve/intrinsics/vmlaldavaxq_s16.c  | 24 ++
 .../arm/mve/intrinsics/vmlaldavaxq_s32.c  | 24 ++
 5 files changed, 91 insertions(+), 27 deletions(-)
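
Reader's note on why the mve.md hunks below matter for the tests: a function-body pattern that expects a tab between the mnemonic and its operands will not match output that uses a space there. The sketch below shows this with POSIX extended regular expressions. It is a simplified illustration and not how DejaGnu's check-function-bodies is implemented; the helper asm_line_matches, the pattern string and the sample assembler lines are all hypothetical, and the non-capturing groups of the real patterns are written as plain groups because POSIX ERE has no (?:...) syntax.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if LINE matches the POSIX extended regex PATTERN, 0 if it does
   not, and -1 if PATTERN fails to compile.  Anchoring is left to the
   pattern itself.  */
static int
asm_line_matches (const char *pattern, const char *line)
{
  regex_t re;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0 ? 1 : 0;
}

/* Hypothetical pattern in the spirit of the function-body checks: the
   mnemonic and its operands must be separated by a tab.  */
static const char vmlaldavax_pattern[] =
  "^\tvmlaldavax\\.s16\t(ip|fp|r[0-9]+), (ip|fp|r[0-9]+), q[0-9]+, q[0-9]+$";
```

With this pattern, a line such as "\tvmlaldavax.s16\tr0, r1, q0, q1" matches, while the same line with a space after the mnemonic does not, which is the kind of mismatch the template change from a space to \t avoids.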

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 714dc6fc7ce..d2ffae6a425 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -4163,7 +4163,7 @@ (define_insn "mve_vmlaldavaq_"
 VMLALDAVAQ))
   ]
   "TARGET_HAVE_MVE"
-  "vmlaldava.%# %Q0, %R0, %q2, %q3"
+  "vmlaldava.%#\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
 ])
 
@@ -4179,7 +4179,7 @@ (define_insn "mve_vmlaldavaxq_s"
 VMLALDAVAXQ_S))
   ]
   "TARGET_HAVE_MVE"
-  "vmlaldavax.s%# %Q0, %R0, %q2, %q3"
+  "vmlaldavax.s%#\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
 ])
 
@@ -6126,7 +6126,7 @@ (define_insn "mve_vmlaldavaxq_p_"
 VMLALDAVAXQ_P))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vmlaldavaxt.%# %Q0, %R0, %q2, %q3"
+  "vpst\;vmlaldavaxt.%#\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
index f33d3880236..87f0354a636 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vmlaldavaxt.s16 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:   @.*|)
+** ...
+*/
 int64_t
-foo (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+foo (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
 {
-  return vmlaldavaxq_p_s16 (a, b, c, p);
+  return vmlaldavaxq_p_s16 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
 
+/*
+**foo1:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vmlaldavaxt.s16 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:   @.*|)
+** ...
+*/
 int64_t
-foo1 (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
+foo1 (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
 {
-  return vmlaldavaxq_p (a, b, c, p);
+  return vmlaldavaxq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
index ab072a9850e..d26bf5b90af 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
@@ -1,21 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vmlaldavaxt.s32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:   @.*|)
+** ...
+*/
 int64_t
-foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+foo (int64_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
 {
-  return vmlaldavaxq_p_s32 (a, b, c, p);
+  return vmlaldavaxq_p_s32 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmlaldavaxt.s32"  }  } */
 
+/*
+**foo1:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vmlaldavaxt.s32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:   @.*|)
+** ...
+*/
 int64_t
-foo1 (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
+foo1 (int64_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
 {
-  return vmlaldavaxq_p (a, b, c, p);
+  return vmlaldavaxq_p (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vmlaldavaxt.s32"  }  } */
+/* { dg-final { 

[PATCH 28/35] arm: improve tests for vqdmlahq_m*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: Likewise.
---
 .../arm/mve/intrinsics/vqdmlahq_m_n_s16.c | 34 ++-
 .../arm/mve/intrinsics/vqdmlahq_m_n_s32.c | 34 ++-
 .../arm/mve/intrinsics/vqdmlahq_m_n_s8.c  | 34 ++-
 .../arm/mve/intrinsics/vqdmlahq_n_s16.c   | 24 +
 .../arm/mve/intrinsics/vqdmlahq_n_s32.c   | 24 +
 .../arm/mve/intrinsics/vqdmlahq_n_s8.c| 24 +
 .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c| 34 ++-
 .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c| 34 ++-
 .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c | 34 ++-
 .../arm/mve/intrinsics/vqdmlashq_n_s16.c  | 24 +
 .../arm/mve/intrinsics/vqdmlashq_n_s32.c  | 24 +
 .../arm/mve/intrinsics/vqdmlashq_n_s8.c   | 24 +
 12 files changed, 264 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
index d8c4f4bab8e..94d93874542 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vqdmlaht.s16 q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int16x8_t
-foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m_n_s16 (a, b, c, p);
+  return vqdmlahq_m_n_s16 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
 
+/*
+**foo1:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vqdmlaht.s16 q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int16x8_t
-foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
+foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m (a, b, c, p);
+  return vqdmlahq_m (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
index 361f5d00bdf..a3dab7fa02e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
@@ -1,23 +1,41 @@
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
 /* { dg-add-options arm_v8_1m_mve } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vqdmlaht.s32 q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int32x4_t
-foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m_n_s32 (a, b, c, p);
+  return vqdmlahq_m_n_s32 (add, m1, m2, p);
 }
 
-/* { dg-final { scan-assembler "vpst" } } */
-/* { dg-final { scan-assembler "vqdmlaht.s32"  }  } */
 
+/*
+**foo1:
+** ...
+** vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
+** ...
+** vpst(?: @.*|)
+** ...
+** vqdmlaht.s32 q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 int32x4_t
-foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
+foo1 (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
 {
-  return vqdmlahq_m (a, b, c, p);
+  return vqdmlahq_m (add, m1, m2, p);
 }
 
-/* { dg-final { 

[PATCH 22/35] arm: improve tests for vhsubq_m*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vhsubq_m_n_s16.c   | 26 ++--
 .../arm/mve/intrinsics/vhsubq_m_n_s32.c   | 26 ++--
 .../arm/mve/intrinsics/vhsubq_m_n_s8.c| 26 ++--
 .../arm/mve/intrinsics/vhsubq_m_n_u16.c   | 42 +--
 .../arm/mve/intrinsics/vhsubq_m_n_u32.c   | 42 +--
 .../arm/mve/intrinsics/vhsubq_m_n_u8.c| 42 +--
 .../arm/mve/intrinsics/vhsubq_m_s16.c | 26 ++--
 .../arm/mve/intrinsics/vhsubq_m_s32.c | 26 ++--
 .../arm/mve/intrinsics/vhsubq_m_s8.c  | 26 ++--
 .../arm/mve/intrinsics/vhsubq_m_u16.c | 26 ++--
 .../arm/mve/intrinsics/vhsubq_m_u32.c | 26 ++--
 .../arm/mve/intrinsics/vhsubq_m_u8.c  | 26 ++--
 .../arm/mve/intrinsics/vhsubq_n_s16.c | 16 ++-
 .../arm/mve/intrinsics/vhsubq_n_s32.c | 16 ++-
 .../arm/mve/intrinsics/vhsubq_n_s8.c  | 16 ++-
 .../arm/mve/intrinsics/vhsubq_n_u16.c | 28 -
 .../arm/mve/intrinsics/vhsubq_n_u32.c | 28 -
 .../arm/mve/intrinsics/vhsubq_n_u8.c  | 28 -
 .../arm/mve/intrinsics/vhsubq_s16.c   | 16 ++-
 .../arm/mve/intrinsics/vhsubq_s32.c   | 16 ++-
 .../gcc.target/arm/mve/intrinsics/vhsubq_s8.c | 16 ++-
 .../arm/mve/intrinsics/vhsubq_u16.c   | 16 ++-
 .../arm/mve/intrinsics/vhsubq_u32.c   | 16 ++-
 .../gcc.target/arm/mve/intrinsics/vhsubq_u8.c | 16 ++-
 .../arm/mve/intrinsics/vhsubq_x_n_s16.c   | 26 ++--
 .../arm/mve/intrinsics/vhsubq_x_n_s32.c   | 26 ++--
 .../arm/mve/intrinsics/vhsubq_x_n_s8.c| 26 ++--
 .../arm/mve/intrinsics/vhsubq_x_n_u16.c   | 42 +--
 .../arm/mve/intrinsics/vhsubq_x_n_u32.c   | 42 +--
 .../arm/mve/intrinsics/vhsubq_x_n_u8.c| 42 +--
 .../arm/mve/intrinsics/vhsubq_x_s16.c | 25 +--
 .../arm/mve/intrinsics/vhsubq_x_s32.c | 25 +--
 .../arm/mve/intrinsics/vhsubq_x_s8.c  | 25 +--
 .../arm/mve/intrinsics/vhsubq_x_u16.c | 25 +--
 .../arm/mve/intrinsics/vhsubq_x_u32.c | 25 +--
 .../arm/mve/intrinsics/vhsubq_x_u8.c  | 25 +--
 36 files changed, 828 insertions(+), 114 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c
index 27dcb7be957..6390589808f 100644
--- 

[PATCH 08/35] arm: improve tests for vmin*

2022-11-17 Thread Andrea Corallo via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vminaq_m_s16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vminaq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminaq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminaq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminaq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminaq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmaq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmaq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_m_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_m_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_m_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_m_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_m_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_m_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_x_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_x_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_x_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_x_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_x_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminq_x_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vminaq_m_s16.c | 25 +--
 .../arm/mve/intrinsics/vminaq_m_s32.c | 25 +--
 .../arm/mve/intrinsics/vminaq_m_s8.c  | 25 +--
 .../arm/mve/intrinsics/vminaq_s16.c   | 16 +++-
 .../arm/mve/intrinsics/vminaq_s32.c   | 16 +++-
 .../gcc.target/arm/mve/intrinsics/vminaq_s8.c | 16 +++-
 .../arm/mve/intrinsics/vminavq_p_s16.c| 41 ---
 .../arm/mve/intrinsics/vminavq_p_s32.c| 41 ---
 .../arm/mve/intrinsics/vminavq_p_s8.c | 41 ---
 .../arm/mve/intrinsics/vminavq_s16.c  | 29 ++---
 .../arm/mve/intrinsics/vminavq_s32.c  | 29 ++---
 .../arm/mve/intrinsics/vminavq_s8.c   | 29 ++---
 .../arm/mve/intrinsics/vminnmaq_f16.c | 16 +++-
 .../arm/mve/intrinsics/vminnmaq_f32.c | 16 +++-
 .../arm/mve/intrinsics/vminnmaq_m_f16.c   | 25 +--
 .../arm/mve/intrinsics/vminnmaq_m_f32.c   | 25 +--
 
