Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-11-14 Thread Richard Biener via Gcc-patches
On Mon, Nov 14, 2022 at 4:57 PM Tamar Christina via Gcc-patches
 wrote:
>
> > -----Original Message-----
> > From: Richard Biener 
> > Sent: Saturday, November 5, 2022 2:23 PM
> > To: Aldy Hernandez 
> > Cc: Tamar Christina ; Jeff Law
> > ; gcc-patches@gcc.gnu.org; nd ;
> > MacLeod, Andrew 
> > Subject: Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support
> > for bit-test-and-branch operations
> >
> > On Wed, 2 Nov 2022, Aldy Hernandez wrote:
> >
> > > On Wed, Nov 2, 2022 at 10:55 AM Tamar Christina
> > > wrote:
> > > >
> > > > Hi Aldy,
> > > >
> > > > I'm trying to use Ranger to determine if a range of an expression is a
> > > > single bit.
> > > >
> > > > If possible in case of a mask then also the position of the bit that's
> > > > being checked by the mask (or the mask itself).
> > >
> > > Just instantiate a ranger, and ask for the range of an SSA name (or an
> > > arbitrary tree expression) at a particular gimple statement (or an
> > > edge):
> > >
> > > gimple_ranger ranger;
> > > int_range_max r;
> > > if (ranger.range_of_expr (r, , )) {
> > >   // do stuff with range "r"
> > >   if (r.singleton_p ()) {
> > >     wide_int num = r.lower_bound ();
> > >     // Check the bits in NUM, etc...
> > >   }
> > > }
> > >
> > > You can see the full ranger API in gimple-range.h.
> > >
> > > Note that instantiating a new ranger is relatively lightweight, but
> > > it's not free.  So unless you're calling range_of_expr sporadically,
> > > you probably want to have one instance for your pass.  You can pass
> > > the gimple_ranger around your pass.  Another way of doing this
> > > is calling enable_ranger() at pass start, and then doing:
> > >
> > >   get_range_query (cfun)->range_of_expr (r, , );
> > >
> > > gimple-loop-versioning.cc has an example of using enable_ranger /
> > > disable_ranger.
> > >
> > > I am assuming you are interested in ranges for integers / pointers.
> > > Otherwise (floats, etc) you'd have to use "Value_Range" instead of
> > > int_range_max.  I can give you examples on that if necessary.
> > >
> > > Let me know if that helps.
>
> It did! I ended up going with Richi's suggestion, but the snippet was very
> helpful for a different range-based patch I'm prototyping.
>
> Many thanks for the example!
>
> >
> > I think you maybe just want get_nonzero_bits?
>
> Ah, looks like that uses range info as well.  Thanks!
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * dojump.cc (do_jump): Pass along value.
> (do_jump_by_parts_greater_rtx): Likewise.
> (do_jump_by_parts_zero_rtx): Likewise.
> (do_jump_by_parts_equality_rtx): Likewise.
> (do_compare_rtx_and_jump): Likewise.
> (do_compare_and_jump): Likewise.
> * dojump.h (do_compare_rtx_and_jump): New.
> * optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
> (validate_test_and_branch): New.
> (emit_cmp_and_jump_insns): Optionally take a value, and when value is
> supplied then check if it's suitable for tbranch.
> * optabs.def (tbranch$a4): New.
> * doc/md.texi (tbranch@var{mode}4): Document it.
> * optabs.h (emit_cmp_and_jump_insns):
> * tree.h (tree_zero_one_valued_p): New.
>
> --- inline copy of patch ---
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 34825549ed4e315b07d36dc3d63bae0cc0a3932d..342e8c4c670de251a35689d1805acceb72a8f6bf 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6958,6 +6958,13 @@ case, you can and should make operand 1's predicate reject some operators
>  in the @samp{cstore@var{mode}4} pattern, or remove the pattern altogether
>  from the machine description.
>
> +@cindex @code{tbranch@var{mode}4} instruction pattern
> +@item @samp{tbranch@var{mode}4}
> +Conditional branch instruction combined with a bit test-and-compare
> +instruction.  Operand 0 is a comparison operator.  Operand 1 is the
> +operand of the comparison.  Operand 2 is the bit position of Operand 1 to
> +test.
> +Operand 3 is the @code{code_label} to jump to.
> +
>  @cindex @code{cbranch@var{mode}4} instruction pattern
>  @item @samp{cbranch@var{mode}4}
>  Conditional branch instruction combined with a compare instruction.
> diff --git a/gcc/dojump.h b/gcc/dojump.h
> index e379cceb34bb1765cb575636e4c05b61501fc2cf..d1d79c490c420a805fe48d58740a79c1f25fb839 100644
> --- a/gcc/dojump.h
> +++ b/gcc/dojump.h
> @@ -71,6 +71,10 @@ extern void jumpifnot (tree exp, rtx_code_label *label,
>  extern void jumpifnot_1 (enum tree_code, tree, tree, rtx_code_label *,
>  profile_probability);
>
> +extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int, tree,
> +machine_mode, rtx, rtx_code_label *,
> +rtx_code_label *, profile_probability);
> +
>  extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int,
>  

[committed] c++: Fix a typo in function name

2022-11-14 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed I've made a typo in the name of the function.
Fixed thusly.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed
as obvious to trunk.

2022-11-15  Jakub Jelinek  

* cp-tree.h (next_common_initial_seqence): Rename to ...
(next_common_initial_sequence): ... this.
* typeck.cc (next_common_initial_seqence): Rename to ...
(next_common_initial_sequence): ... this.
(layout_compatible_type_p): Call next_common_initial_sequence
rather than next_common_initial_seqence.
* semantics.cc (is_corresponding_member_aggr): Likewise.

--- gcc/cp/cp-tree.h.jj 2022-11-14 13:35:34.311158621 +0100
+++ gcc/cp/cp-tree.h	2022-11-14 13:41:29.817322405 +0100
@@ -7982,7 +7982,7 @@ extern bool comp_except_specs (const_t
 extern bool comptypes  (tree, tree, int);
 extern bool same_type_ignoring_top_level_qualifiers_p (tree, tree);
 extern bool similar_type_p (tree, tree);
-extern bool next_common_initial_seqence(tree &, tree &);
+extern bool next_common_initial_sequence   (tree &, tree &);
 extern bool layout_compatible_type_p   (tree, tree);
 extern bool compparms  (const_tree, const_tree);
 extern int comp_cv_qualification   (const_tree, const_tree);
--- gcc/cp/typeck.cc.jj 2022-11-14 13:35:34.474156404 +0100
+++ gcc/cp/typeck.cc	2022-11-14 13:42:07.328812124 +0100
@@ -1779,7 +1779,7 @@ similar_type_p (tree type1, tree type2)
the common initial sequence.  */
 
 bool
-next_common_initial_seqence (tree , tree )
+next_common_initial_sequence (tree , tree )
 {
   while (memb1)
 {
@@ -1871,7 +1871,7 @@ layout_compatible_type_p (tree type1, tr
{
  while (1)
{
- if (!next_common_initial_seqence (field1, field2))
+ if (!next_common_initial_sequence (field1, field2))
return false;
  if (field1 == NULL_TREE)
return true;
--- gcc/cp/semantics.cc.jj  2022-11-14 13:35:34.429157016 +0100
+++ gcc/cp/semantics.cc 2022-11-14 13:41:47.930076022 +0100
@@ -11665,7 +11665,7 @@ is_corresponding_member_aggr (location_t
   tree ret = boolean_false_node;
   while (1)
 {
-  bool r = next_common_initial_seqence (field1, field2);
+  bool r = next_common_initial_sequence (field1, field2);
   if (field1 == NULL_TREE || field2 == NULL_TREE)
break;
   if (r

Jakub



Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Richard Biener via Gcc-patches
On Tue, Nov 15, 2022 at 12:01 AM Philipp Tomsich
 wrote:
>
> On Mon, 14 Nov 2022 at 23:47, Palmer Dabbelt  wrote:
> >
> > [Trying to join the threads here.]
> >
> > On Mon, 14 Nov 2022 13:28:23 PST (-0800), philipp.toms...@vrull.eu wrote:
> > > Jeff,
> > >
> > > On Mon, 14 Nov 2022 at 22:23, Jeff Law  wrote:
> > >>
> > >>
> > >> On 11/14/22 13:00, Palmer Dabbelt wrote:
> > >> > On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
> > >> >>
> > >> >> This series provides support for the Ventana VT1 (a 4-way superscalar
> > >> >> rv64gc_zba_zbb_zbc_zbs_zifencei_xventanacondops core) including support
> > >> >> for the supported instruction fusion patterns.
> > >> >>
> > >> >> This includes the addition of the fusion-aware scheduling
> > >> >> infrastructure for RISC-V and implements idiom recognition for the
> > >> >> fusion patterns supported by VT1.
> > >> >>
> > >> >> Note that we don't signal support for XVentanaCondOps at this point,
> > >> >> as the XVentanaCondOps support is in-flight separately. Changing the
> > >> >> defaults for VT1 can happen late in the cycle, so no need to link the
> > >> >> two different changesets.
> > >> >>
> > >> >> Changes in v2:
> > >> >> - Rebased and changed over to .rst-based documentation
> > >> >> - Updated to catch more fusion cases
> > >> >> - Signals support for Zifencei
> > >> >>
> > >> >> Philipp Tomsich (2):
> > >> >>   RISC-V: Add basic support for the Ventana-VT1 core
> > >> >>   RISC-V: Add instruction fusion (for ventana-vt1)
> > >> >>
> > >> >>  gcc/config/riscv/riscv-cores.def  |   3 +
> > >> >>  gcc/config/riscv/riscv-opts.h |   2 +-
> > >> >>  gcc/config/riscv/riscv.cc | 233 ++
> > >> >>  .../risc-v-options.rst|   5 +-
> > >> >>  4 files changed, 240 insertions(+), 3 deletions(-)
> > >> >
> > >> > I guess we never really properly talked about this on the GCC mailing
> > >> > lists, but IMO it's fine to start taking code for designs that have
> > >> > been announced under the assumption that if the hardware doesn't
> > >> > actually show up according to those timelines that it will be assumed
> > >> > to have never existed and thus be removed more quickly than usual.
> > >> Absolutely.   I have zero interest in carrying around code for
> > >> nonexistent or dead variants.
> > >> >
> > >> > That said, I can't find anything describing that the VT-1 exists aside
> > >> > from these patches.  Is there anything that describes this design and
> > >> > when it's expected to be available?
> > >>
> > >> What do you need?  I can give some broad overview information on the
> > >> design, but it would likely just mirror what's already been mentioned in
> > >> these patches.
> > >>
> > >>
> > >> As far as schedules.  I'm not sure what I can say.  I'll check on that.
> >
> > I'm less worried about the "does this pipeline model match the HW" bits:
> > at least until the HW is publicly available, all we can do is rely
> > on the vendor (and even after the HW is public the vendor might be the
> > only one who cares enough to figure things out, nothing we can really do
> > upstream there).  We've had some issues with nobody caring enough about
> > the C906 pipeline model to sort out whether some patches are a net win,
> > but if nobody (including the vendor) cares about the HW enough to
> > benchmark things then there's not much we can do.
> >
> > My bigger worry is getting roped in to supporting a bunch of hardware
> > that doesn't actually exist yet and may never make it outside some
> > vendor's lab.  That can generally be a ton of work and filters
> > throughout GCC, even outside of the RISC-V backend.  We've already got
> > enough chaos just trying to follow the ISA; chasing down issues related
> > to hardware that may not ever manifest is just going to lead to
> > craziness.
> >
> > So on my end the point of the schedule is to have something we can look
> > at and determine that the hardware is somehow defunct.  The fairest way
> > we could come up with was to tie it to some sort of company announcement
> > of the hardware: obviously everyone knows their internal timelines, but
> > that's not fair to companies that don't employ someone with commit
> > access.  Requiring some sort of public announcement means everyone has
> > the same rules to play by, IMO that's really important in RISC-V land as
> > there's so many vendors.
> >
> > >> It was never my intention to bypass any process/procedures here. So if I
> > >> did, my apologies.
> > >
> > > The controversial part is XVentanaCondOps (as it is a vendor-defined
> > > extension), so I'll certainly hold off on that until both you and
> > > Palmer are in agreement on how to proceed there.
> >
> > The pipeline models are essentially in the same spot.  We've got a bit
> > of a precedent there for taking them just based on an announcement, but
> > there isn't one here.
> >
> > [and the 

Re: [PATCH] [PR68097] Try to avoid recursing for floats in tree_*_nonnegative_warnv_p.

2022-11-14 Thread Richard Biener via Gcc-patches
On Mon, Nov 14, 2022 at 8:05 PM Aldy Hernandez  wrote:
>
>
>
> On 11/14/22 10:12, Richard Biener wrote:
> > On Sat, Nov 12, 2022 at 7:30 PM Aldy Hernandez  wrote:
> >>
> >> It irks me that a PR named "we should track ranges for floating-point
> >> hasn't been closed in this release.  This is an attempt to do just
> >> that.
> >>
> >> As mentioned in the PR, even though we track ranges for floats, it has
> >> been suggested that avoiding recursing through SSA defs in
> >> gimple_assign_nonnegative_warnv_p is also a goal.  We can do this with
> >> various ranger components without the need for a heavy handed approach
> >> (i.e. a full ranger).
> >>
> >> I have implemented two versions of known_float_sign_p() that answer
> >> the question whether we definitely know the sign for an operation or a
> >> tree expression.
> >>
> >> Both versions use get_global_range_query, which is a wrapper to query
> > >> global ranges.  This means that no caching or propagation is done.
> > >> In the case of an SSA name, we just return the global range for it (think
> >> SSA_NAME_RANGE_INFO).  In the case of a tree code with operands, we
> >> also use get_global_range_query to resolve the operands, and then call
> >> into range-ops, which is our lowest level component.  There is no
> >> ranger or gori involved.  All we're doing is resolving the operation
> >> with the ranges passed.
> >>
> >> This is enough to avoid recursing in the case where we definitely know
> >> the sign of a range.  Otherwise, we still recurse.
> >>
> >> Note that instead of get_global_range_query(), we could use
> >> get_range_query() which uses a ranger (if active in a pass), or
> >> get_global_range_query if not.  This would allow passes that have an
> >> active ranger (with enable_ranger) to use a full ranger.  These passes
> > >> are currently VRP, loop unswitching, DOM, loop versioning, etc.  If
> >> no ranger is active, get_range_query defaults to global ranges, so
> >> there's no additional penalty.
> >>
> >> Would this be acceptable, at least enough to close (or rename the PR ;-))?
> >
> > I think the checks would belong to the gimple_stmt_nonnegative_warnv_p
> > function only (that's the SSA name entry from the fold-const.cc ones)?
>
> That was my first approach, but I thought I'd cover the unary and binary
> operators as well, since they had other callers.  But I'm happy with
> just the top-level tweak.  It's a lot less code :).

@@ -9234,6 +9235,15 @@ bool
 gimple_stmt_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
                                  int depth)
 {
+  tree type = gimple_range_type (stmt);
+  if (type && frange::supports_p (type))
+    {
+      frange r;
+      bool sign;
+      return (get_global_range_query ()->range_of_stmt (r, stmt)
+              && r.signbit_p (sign)
+              && sign == false);
+    }

the above means we never fall through to the switch below if
frange::supports_p (type); that's possibly good enough, as I
don't think we ever call this very function directly but it gets
invoked via recursion through operands only.  But of course
I wonder what types are not supported by frange and whether
the manual processing we fall through to does anything meaningful
for those?

I won't ask you to thoroughly answer this now but please put in
a comment reflecting the above before the switch stmt.

   switch (gimple_code (stmt))


Otherwise OK, in case your tree gets back to bootstrapping ;)

> >
> > I also notice the use of 'bool' for the "sign".  That's not really
> > descriptive.  We have SIGNED and UNSIGNED (aka enum signop), not sure
> > if that's the perfect match vs. NEGATIVE and NONNEGATIVE.  Maybe the
> > functions' names are just bad and they should be known_float_negative_p?
>
> The bool sign is to keep in line with real.*, and was suggested by Jeff
> (in real.* not here).  I'm happy to change the entire frange API to use
> signop.  It is cleaner.  If that's acceptable, I could do that as a
> follow-up.
>
> How's this, pending tests once I figure out why my trees have been
> broken all day :-/.
>
> Aldy
>
> p.s. First it was sphinx failure, now I'm seeing this:
> /home/aldyh/src/clean/gcc/match.pd:7935:8 error: return statement not
> allowed in C expression
> return NULL_TREE;
> ^

Supposedly somebody pushed and reverted this transient error?  Yep,
Tamar did.

Richard.


[PATCH, V1 1/1] RISC-V: Make R_RISCV_SUB6 conforms to riscv abi standard

2022-11-14 Thread zengxiao
From: zengxiao 

This patch makes R_RISCV_SUB6 conform to the RISC-V psABI: for
R_RISCV_SUB6, only the lower 6 bits of the relocated field are significant.
The proposed specification can be found in section 8.5, "Relocations", of
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/v1.0-rc4/riscv-abi.pdf

bfd/ChangeLog:

* elfxx-riscv.c (riscv_elf_add_sub_reloc): Take the lower 6 bits as the
significant bits.

binutils/ChangeLog:

* testsuite/binutils-all/riscv/dwarf-SUB6.d: New test.
* testsuite/binutils-all/riscv/dwarf-SUB6.s: New test.

reviewed-by: gao...@eswincomputing.com
 jinyanji...@eswincomputing.com

Signed-off-by: zengxiao 
---
 bfd/elfxx-riscv.c |  7 +
 .../testsuite/binutils-all/riscv/dwarf-SUB6.d | 31 +++
 .../testsuite/binutils-all/riscv/dwarf-SUB6.s | 12 +++
 3 files changed, 50 insertions(+)
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s

diff --git a/bfd/elfxx-riscv.c b/bfd/elfxx-riscv.c
index 300ccf49534..e71d4a456f2 100644
--- a/bfd/elfxx-riscv.c
+++ b/bfd/elfxx-riscv.c
@@ -994,6 +994,13 @@ riscv_elf_add_sub_reloc (bfd *abfd,
   relocation = old_value + relocation;
   break;
 case R_RISCV_SUB6:
+  {
+    bfd_vma six_bit_valid_value = old_value & howto->dst_mask;
+    six_bit_valid_value -= relocation;
+    relocation = (six_bit_valid_value & howto->dst_mask) |
+                 (old_value & ~howto->dst_mask);
+  }
+  break;
 case R_RISCV_SUB8:
 case R_RISCV_SUB16:
 case R_RISCV_SUB32:
diff --git a/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
new file mode 100644
index 000..47d5ae570d7
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
@@ -0,0 +1,31 @@
+#PROG: objcopy
+#objdump: --dwarf=frames
+
+tmpdir/riscvcopy.o: file format elf32-littleriscv
+
+Contents of the .eh_frame section:
+
+
+ 0020  CIE
+  Version:   3
+  Augmentation:  "zR"
+  Code alignment factor: 1
+  Data alignment factor: -4
+  Return address column: 1
+  Augmentation data: 1b
+  DW_CFA_def_cfa_register: r2 \(sp\)
+  DW_CFA_def_cfa_offset: 48
+  DW_CFA_offset: r1 \(ra\) at cfa-4
+  DW_CFA_offset: r8 \(s0\) at cfa-8
+  DW_CFA_def_cfa: r8 \(s0\) ofs 0
+  DW_CFA_restore: r1 \(ra\)
+  DW_CFA_restore: r8 \(s0\)
+  DW_CFA_def_cfa: r2 \(sp\) ofs 48
+  DW_CFA_def_cfa_offset: 0
+  DW_CFA_nop
+
+0024 0010 0028 FDE cie= pc=002c..002c
+  DW_CFA_nop
+  DW_CFA_nop
+  DW_CFA_nop
+
diff --git a/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s
new file mode 100644
index 000..fe959f59d9b
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s
@@ -0,0 +1,12 @@
+.attribute arch, "rv32i2p0_m2p0_a2p0_f2p0_c2p0"
+.cfi_startproc
+.cfi_def_cfa_offset 48
+.cfi_offset 1, -4
+.cfi_offset 8, -8
+.cfi_def_cfa 8, 0
+.cfi_restore 1
+.cfi_restore 8
+.cfi_def_cfa 2, 48
+.cfi_def_cfa_offset 0
+.cfi_endproc
+
\ No newline at end of file
-- 
2.34.1



[PATCH, V1 0/1] RISC-V: Make R_RISCV_SUB6 conforms to riscv abi standard

2022-11-14 Thread zengxiao
From: zengxiao 

Hi all RISC-V folks:

When riscv-objdump is used to display DWARF information, errors such as
the following appear:
DW_CFA_??? (User defined call frame op: 0x3c)

This error occurs because riscv-objdump does not apply R_RISCV_SUB6 as
the RISC-V psABI specifies, whereas riscv-readelf does and therefore
produces correct output.

Test cases that describe the error in detail are available at
https://github.com/zeng-xiao/gnu-bug-fix/tree/main/EG-769

zengxiao (1):
  RISC-V: Make R_RISCV_SUB6 conforms to riscv abi standard

 bfd/elfxx-riscv.c |  7 +
 .../testsuite/binutils-all/riscv/dwarf-SUB6.d | 31 +++
 .../testsuite/binutils-all/riscv/dwarf-SUB6.s | 12 +++
 3 files changed, 50 insertions(+)
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s

-- 
2.34.1



[PATCH, V1 1/1] RISC-V: Make R_RISCV_SUB6 conforms to riscv abi standard

2022-11-14 Thread zengxiao
From: zengxiao 

This patch makes R_RISCV_SUB6 conform to the RISC-V psABI: for
R_RISCV_SUB6, only the lower 6 bits of the relocated field are significant.
The proposed specification can be found in section 8.5, "Relocations", of
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/v1.0-rc4/riscv-abi.pdf

bfd/ChangeLog:

* elfxx-riscv.c (riscv_elf_add_sub_reloc): Take the lower
6 bits as the significant bits.
---
 bfd/elfxx-riscv.c |  7 +
 .../testsuite/binutils-all/riscv/dwarf-SUB6.d | 31 +++
 .../testsuite/binutils-all/riscv/dwarf-SUB6.s | 12 +++
 3 files changed, 50 insertions(+)
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s

diff --git a/bfd/elfxx-riscv.c b/bfd/elfxx-riscv.c
index 300ccf49534..e71d4a456f2 100644
--- a/bfd/elfxx-riscv.c
+++ b/bfd/elfxx-riscv.c
@@ -994,6 +994,13 @@ riscv_elf_add_sub_reloc (bfd *abfd,
   relocation = old_value + relocation;
   break;
 case R_RISCV_SUB6:
+  {
+    bfd_vma six_bit_valid_value = old_value & howto->dst_mask;
+    six_bit_valid_value -= relocation;
+    relocation = (six_bit_valid_value & howto->dst_mask) |
+                 (old_value & ~howto->dst_mask);
+  }
+  break;
 case R_RISCV_SUB8:
 case R_RISCV_SUB16:
 case R_RISCV_SUB32:
diff --git a/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
new file mode 100644
index 000..47d5ae570d7
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
@@ -0,0 +1,31 @@
+#PROG: objcopy
+#objdump: --dwarf=frames
+
+tmpdir/riscvcopy.o: file format elf32-littleriscv
+
+Contents of the .eh_frame section:
+
+
+ 0020  CIE
+  Version:   3
+  Augmentation:  "zR"
+  Code alignment factor: 1
+  Data alignment factor: -4
+  Return address column: 1
+  Augmentation data: 1b
+  DW_CFA_def_cfa_register: r2 \(sp\)
+  DW_CFA_def_cfa_offset: 48
+  DW_CFA_offset: r1 \(ra\) at cfa-4
+  DW_CFA_offset: r8 \(s0\) at cfa-8
+  DW_CFA_def_cfa: r8 \(s0\) ofs 0
+  DW_CFA_restore: r1 \(ra\)
+  DW_CFA_restore: r8 \(s0\)
+  DW_CFA_def_cfa: r2 \(sp\) ofs 48
+  DW_CFA_def_cfa_offset: 0
+  DW_CFA_nop
+
+0024 0010 0028 FDE cie= pc=002c..002c
+  DW_CFA_nop
+  DW_CFA_nop
+  DW_CFA_nop
+
diff --git a/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s
new file mode 100644
index 000..fe959f59d9b
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s
@@ -0,0 +1,12 @@
+.attribute arch, "rv32i2p0_m2p0_a2p0_f2p0_c2p0"
+.cfi_startproc
+.cfi_def_cfa_offset 48
+.cfi_offset 1, -4
+.cfi_offset 8, -8
+.cfi_def_cfa 8, 0
+.cfi_restore 1
+.cfi_restore 8
+.cfi_def_cfa 2, 48
+.cfi_def_cfa_offset 0
+.cfi_endproc
+
\ No newline at end of file
-- 
2.34.1



[PATCH] Remove Score documentation

2022-11-14 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Score target support was removed in r5-3909-g3daa7bbf791203
but it looks like some of the documentation was missed.
This removes it.

Committed as obvious after a "make html".

Thanks,
Andrew

gcc/ChangeLog:

* doc/invoke.texi: Remove Score option section.
---
 gcc/doc/invoke.texi | 52 
 1 file changed, 52 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ef88f2a..55e8a14 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1316,13 +1316,6 @@ See RS/6000 and PowerPC Options.
 -mwarn-framesize  -mwarn-dynamicstack  -mstack-size  -mstack-guard @gol
 -mhotpatch=@var{halfwords},@var{halfwords}}
 
-@emph{Score Options}
-@gccoptlist{-meb  -mel @gol
--mnhwloop @gol
--muls @gol
--mmac @gol
--mscore5  -mscore5u  -mscore7  -mscore7d}
-
 @emph{SH Options}
 @gccoptlist{-m1  -m2  -m2e @gol
 -m2a-nofpu  -m2a-single-only  -m2a-single  -m2a @gol
@@ -19726,7 +19719,6 @@ platform.
 * RS/6000 and PowerPC Options::
 * RX Options::
 * S/390 and zSeries Options::
-* Score Options::
 * SH Options::
 * Solaris 2 Options::
 * SPARC Options::
@@ -30424,50 +30416,6 @@ This option can be overridden for individual functions with the
 @code{hotpatch} attribute.
 @end table
 
-@node Score Options
-@subsection Score Options
-@cindex Score Options
-
-These options are defined for Score implementations:
-
-@table @gcctabopt
-@item -meb
-@opindex meb
-Compile code for big-endian mode.  This is the default.
-
-@item -mel
-@opindex mel
-Compile code for little-endian mode.
-
-@item -mnhwloop
-@opindex mnhwloop
-Disable generation of @code{bcnz} instructions.
-
-@item -muls
-@opindex muls
-Enable generation of unaligned load and store instructions.
-
-@item -mmac
-@opindex mmac
-Enable the use of multiply-accumulate instructions. Disabled by default.
-
-@item -mscore5
-@opindex mscore5
-Specify the SCORE5 as the target architecture.
-
-@item -mscore5u
-@opindex mscore5u
-Specify the SCORE5U of the target architecture.
-
-@item -mscore7
-@opindex mscore7
-Specify the SCORE7 as the target architecture. This is the default.
-
-@item -mscore7d
-@opindex mscore7d
-Specify the SCORE7D as the target architecture.
-@end table
-
 @node SH Options
 @subsection SH Options
 
-- 
1.8.3.1



[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"

2022-11-14 Thread Sandra Loosemore via Gcc-patches
Here is yet another attempt at a patch to auto-generate SIMD clones for 
functions that already have the "declare target" attribute.  This 
version v4 is derived from the previous v2 version, since v3 seemed to 
be a dead end.


I have added conditionals to restrict the auto-generation at -O2 to the 
offload compiler, and extended the syntax of the 
-fopenmp-target-simd-clone to allow explicit control over whether it 
applies to host, target, or both -- this primarily to allow better test 
coverage.  I've added infrastructure to support testing on the offload 
compiler, added new test cases, and reworked the existing test cases to 
scan for interesting things written to the dump file instead of 
examining the .s output.


I hope it is not too late to consider this patch given that I've been 
trying to get this feature in for months already.  Also, I kind of got 
caught in the Sphinx churn last week, relating to the documentation 
parts of this patch.  :-(  I understand that if this patch is accepted I 
am also on the hook to come up with a further patch to try to GC unused 
clones after vectorization; I haven't started on that piece yet.


-Sandra

From 771be96d2dc7b8868ba06cf8ec6afe7a3337ac89 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Tue, 15 Nov 2022 03:40:12 +
Subject: [PATCH] OpenMP: Generate SIMD clones for functions with "declare
 target"

This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution.  The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled for offload processing at -O2 and higher.

gcc/ChangeLog:

	* common.opt (fopenmp-target-simd-clone): New option.
	(target_simd_clone_device): New enum to go with it.
	* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
	* flag-types.h (enum omp_target_simd_clone_device_kind): New.
	* omp-simd-clone.cc (auto_simd_fail): New function.
	(auto_simd_check_stmt): New function.
	(plausible_type_for_simd_clone): New function.
	(ok_for_auto_simd_clone): New function.
	(simd_clone_create): Add force_local argument, make the symbol
	have internal linkage if it is true.
	(expand_simd_clones): Also check for cloneable functions with
	"omp declare target".  Pass explicit_p argument to
	simd_clone.compute_vecsize_and_simdlen target hook.
	* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
	Add bool explicit_p argument.
	* doc/tm.texi: Regenerated.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/gcn/gcn.cc
	(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/i386/i386.cc
	(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/target-simd-clone-1.C: New.
	* g++.dg/gomp/target-simd-clone-2.C: New.
	* gcc.dg/gomp/target-simd-clone-1.c: New.
	* gcc.dg/gomp/target-simd-clone-2.c: New.
	* gcc.dg/gomp/target-simd-clone-3.c: New.
	* gcc.dg/gomp/target-simd-clone-4.c: New.
	* gcc.dg/gomp/target-simd-clone-5.c: New.
	* gcc.dg/gomp/target-simd-clone-6.c: New.
	* gcc.dg/gomp/target-simd-clone-7.c: New.
	* gcc.dg/gomp/target-simd-clone-8.c: New.
	* lib/scanoffloadipa.exp: New.

libgomp/ChangeLog:

	* testsuite/lib/libgomp.exp: Load scanoffloadipa.exp library.
	* testsuite/libgomp.c/target-simd-clone-1.c: New.
	* testsuite/libgomp.c/target-simd-clone-2.c: New.
	* testsuite/libgomp.c/target-simd-clone-3.c: New.
---
 gcc/common.opt|  22 ++
 gcc/config/aarch64/aarch64.cc |  24 +-
 gcc/config/gcn/gcn.cc |  10 +-
 gcc/config/i386/i386.cc   |  27 +-
 gcc/doc/invoke.texi   |  23 +-
 gcc/doc/tm.texi   |   2 +-
 gcc/flag-types.h  |   9 +
 gcc/omp-simd-clone.cc | 309 --
 gcc/opts.cc   |   2 +
 gcc/target.def|   2 +-
 .../g++.dg/gomp/target-simd-clone-1.C |  25 ++
 .../g++.dg/gomp/target-simd-clone-2.C |  23 ++
 .../gcc.dg/gomp/target-simd-clone-1.c |  25 ++
 .../gcc.dg/gomp/target-simd-clone-2.c |  22 ++
 .../gcc.dg/gomp/target-simd-clone-3.c |  22 ++
 .../gcc.dg/gomp/target-simd-clone-4.c |  26 ++
 .../gcc.dg/gomp/target-simd-clone-5.c |  28 ++
 .../gcc.dg/gomp/target-simd-clone-6.c |  27 ++
 .../gcc.dg/gomp/target-simd-clone-7.c |  15 +
 .../gcc.dg/gomp/target-simd-clone-8.c |  25 ++
 gcc/testsuite/lib/scanoffloadipa.exp  | 148 +
 libgomp/testsuite/lib/libgomp.exp |   1 +
 

[PATCH] Remove the picoChip documentation

2022-11-14 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

PicoChip support was removed in r5-3431-g157e859ffe3b5d, but it seems
the documentation was missed.

Committed as obvious after running "make html" to make sure the
building of the documentation still works.

Thanks,
Andrew Pinski

gcc/ChangeLog:

* doc/extend.texi: Remove picoChip builtin section.
* doc/invoke.texi: Remove picoChip option section.
---
 gcc/doc/extend.texi | 37 -
 gcc/doc/invoke.texi | 53 -
 2 files changed, 90 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ca84f3a..608bbe1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -14647,7 +14647,6 @@ instructions, but allow the compiler to schedule those calls.
 * Other MIPS Built-in Functions::
 * MSP430 Built-in Functions::
 * NDS32 Built-in Functions::
-* picoChip Built-in Functions::
 * Basic PowerPC Built-in Functions::
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
@@ -17774,42 +17773,6 @@ Enable global interrupt.
 Disable global interrupt.
 @end deftypefn
 
-@node picoChip Built-in Functions
-@subsection picoChip Built-in Functions
-
-GCC provides an interface to selected machine instructions from the
-picoChip instruction set.
-
-@table @code
-@item int __builtin_sbc (int @var{value})
-Sign bit count.  Return the number of consecutive bits in @var{value}
-that have the same value as the sign bit.  The result is the number of
-leading sign bits minus one, giving the number of redundant sign bits in
-@var{value}.
-
-@item int __builtin_byteswap (int @var{value})
-Byte swap.  Return the result of swapping the upper and lower bytes of
-@var{value}.
-
-@item int __builtin_brev (int @var{value})
-Bit reversal.  Return the result of reversing the bits in
-@var{value}.  Bit 15 is swapped with bit 0, bit 14 is swapped with bit 1,
-and so on.
-
-@item int __builtin_adds (int @var{x}, int @var{y})
-Saturating addition.  Return the result of adding @var{x} and @var{y},
-storing the value 32767 if the result overflows.
-
-@item int __builtin_subs (int @var{x}, int @var{y})
-Saturating subtraction.  Return the result of subtracting @var{y} from
-@var{x}, storing the value @minus{}32768 if the result overflows.
-
-@item void __builtin_halt (void)
-Halt.  The processor stops execution.  This built-in is useful for
-implementing assertions.
-
-@end table
-
 @node Basic PowerPC Built-in Functions
 @subsection Basic PowerPC Built-in Functions
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 12be55f..ef88f2a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1190,10 +1190,6 @@ Objective-C and Objective-C++ Dialects}.
 -mint32  -mno-int16  -mint16  -mno-int32 @gol
 -msplit  -munix-asm  -mdec-asm  -mgnu-asm  -mlra}
 
-@emph{picoChip Options}
-@gccoptlist{-mae=@var{ae_type}  -mvliw-lookahead=@var{N} @gol
--msymbol-as-address  -mno-inefficient-warnings}
-
 @emph{PowerPC Options}
 See RS/6000 and PowerPC Options.
 
@@ -19723,7 +19719,6 @@ platform.
 * Nvidia PTX Options::
 * OpenRISC Options::
 * PDP-11 Options::
-* picoChip Options::
 * PowerPC Options::
 * PRU Options::
 * RISC-V Options::
@@ -28396,54 +28391,6 @@ Use the new LRA register allocator.  By default, the old ``reload''
 allocator is used.
 @end table
 
-@node picoChip Options
-@subsection picoChip Options
-@cindex picoChip options
-
-These @samp{-m} options are defined for picoChip implementations:
-
-@table @gcctabopt
-
-@item -mae=@var{ae_type}
-@opindex mcpu
-Set the instruction set, register set, and instruction scheduling
-parameters for array element type @var{ae_type}.  Supported values
-for @var{ae_type} are @samp{ANY}, @samp{MUL}, and @samp{MAC}.
-
-@option{-mae=ANY} selects a completely generic AE type.  Code
-generated with this option runs on any of the other AE types.  The
-code is not as efficient as it would be if compiled for a specific
-AE type, and some types of operation (e.g., multiplication) do not
-work properly on all types of AE.
-
-@option{-mae=MUL} selects a MUL AE type.  This is the most useful AE type
-for compiled code, and is the default.
-
-@option{-mae=MAC} selects a DSP-style MAC AE.  Code compiled with this
-option may suffer from poor performance of byte (char) manipulation,
-since the DSP AE does not provide hardware support for byte load/stores.
-
-@item -msymbol-as-address
-Enable the compiler to directly use a symbol name as an address in a
-load/store instruction, without first loading it into a
-register.  Typically, the use of this option generates larger
-programs, which run faster than when the option isn't used.  However, the
-results vary from program to program, so it is left as a user option,
-rather than being permanently enabled.
-
-@item -mno-inefficient-warnings
-Disables warnings about the generation of inefficient code.  These
-warnings can be generated, for example, when compiling code that

[PATCH] Remove documentation for MeP

2022-11-14 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

MeP support was removed in r7-1614-g0609abdad81e26
but it looks like the documentation for the target
was missed.

Committed as obvious after doing "make html" to
make sure the documentation is fine.

Thanks,
Andrew Pinski

gcc/ChangeLog:

* doc/extend.texi: Remove MeP documentation.
* doc/invoke.texi: Remove MeP Options documentation.
---
 gcc/doc/extend.texi | 190 
 gcc/doc/invoke.texi | 171 --
 2 files changed, 361 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8da0db9..ca84f3a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2542,7 +2542,6 @@ GCC plugins may provide their own attributes.
 * M32R/D Function Attributes::
 * m68k Function Attributes::
 * MCORE Function Attributes::
-* MeP Function Attributes::
 * MicroBlaze Function Attributes::
 * Microsoft Windows Function Attributes::
 * MIPS Function Attributes::
@@ -5392,45 +5391,6 @@ basic @code{asm} and C code may appear to work, they cannot be
 depended upon to work reliably and are not supported.
 @end table
 
-@node MeP Function Attributes
-@subsection MeP Function Attributes
-
-These function attributes are supported by the MeP back end:
-
-@table @code
-@item disinterrupt
-@cindex @code{disinterrupt} function attribute, MeP
-On MeP targets, this attribute causes the compiler to emit
-instructions to disable interrupts for the duration of the given
-function.
-
-@item interrupt
-@cindex @code{interrupt} function attribute, MeP
-Use this attribute to indicate
-that the specified function is an interrupt handler.  The compiler generates
-function entry and exit sequences suitable for use in an interrupt handler
-when this attribute is present.
-
-@item near
-@cindex @code{near} function attribute, MeP
-This attribute causes the compiler to assume the called
-function is close enough to use the normal calling convention,
-overriding the @option{-mtf} command-line option.
-
-@item far
-@cindex @code{far} function attribute, MeP
-On MeP targets this causes the compiler to use a calling convention
-that assumes the called function is too far away for the built-in
-addressing modes.
-
-@item vliw
-@cindex @code{vliw} function attribute, MeP
-The @code{vliw} attribute tells the compiler to emit
-instructions in VLIW mode instead of core mode.  Note that this
-attribute is not allowed unless a VLIW coprocessor has been configured
-and enabled through command-line options.
-@end table
-
 @node MicroBlaze Function Attributes
 @subsection MicroBlaze Function Attributes
 
@@ -7336,7 +7296,6 @@ attributes.
 * IA-64 Variable Attributes::
 * LoongArch Variable Attributes::
 * M32R/D Variable Attributes::
-* MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
 * MSP430 Variable Attributes::
 * Nvidia PTX Variable Attributes::
@@ -8182,70 +8141,6 @@ Medium and large model objects may live anywhere in the 32-bit address space
 addresses).
 @end table
 
-@node MeP Variable Attributes
-@subsection MeP Variable Attributes
-
-The MeP target has a number of addressing modes and busses.  The
-@code{near} space spans the standard memory space's first 16 megabytes
-(24 bits).  The @code{far} space spans the entire 32-bit memory space.
-The @code{based} space is a 128-byte region in the memory space that
-is addressed relative to the @code{$tp} register.  The @code{tiny}
-space is a 65536-byte region relative to the @code{$gp} register.  In
-addition to these memory regions, the MeP target has a separate 16-bit
-control bus which is specified with @code{cb} attributes.
-
-@table @code
-
-@item based
-@cindex @code{based} variable attribute, MeP
-Any variable with the @code{based} attribute is assigned to the
-@code{.based} section, and is accessed with relative to the
-@code{$tp} register.
-
-@item tiny
-@cindex @code{tiny} variable attribute, MeP
-Likewise, the @code{tiny} attribute assigned variables to the
-@code{.tiny} section, relative to the @code{$gp} register.
-
-@item near
-@cindex @code{near} variable attribute, MeP
-Variables with the @code{near} attribute are assumed to have addresses
-that fit in a 24-bit addressing mode.  This is the default for large
-variables (@code{-mtiny=4} is the default) but this attribute can
-override @code{-mtiny=} for small variables, or override @code{-ml}.
-
-@item far
-@cindex @code{far} variable attribute, MeP
-Variables with the @code{far} attribute are addressed using a full
-32-bit address.  Since this covers the entire memory space, this
-allows modules to make no assumptions about where variables might be
-stored.
-
-@item io
-@cindex @code{io} variable attribute, MeP
-@itemx io (@var{addr})
-Variables with the @code{io} attribute are used to address
-memory-mapped peripherals.  If an address is specified, the variable
-is assigned that address, else it is not assigned an address (it is
-assumed some other module assigns an 

[PATCH] Fix @opindex for mcall-aixdesc and mcall-openbsd

2022-11-14 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

For mcall-aixdesc, the opindex was just m, which was wrong.
For mcall-openbsd, the opindex was mcall-netbsd, which was wrong.
These two have been broken since the options were added to the documentation
back in r0-92913-g244609a618b094.

Committed as obvious after a "make html" and checking the options index.

Thanks,
Andrew

gcc/ChangeLog:

* doc/invoke.texi: Fix opindex for mcall-aixdesc and mcall-openbsd.
---
 gcc/doc/invoke.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dc2da464ebb..0276fbf4550 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -29640,7 +29640,7 @@ Specify both @option{-mcall-sysv} and @option{-meabi} options.
 Specify both @option{-mcall-sysv} and @option{-mno-eabi} options.
 
 @item -mcall-aixdesc
-@opindex m
+@opindex mcall-aixdesc
 On System V.4 and embedded PowerPC systems compile code for the AIX
 operating system.
 
@@ -29660,7 +29660,7 @@ On System V.4 and embedded PowerPC systems compile code for the
 NetBSD operating system.
 
 @item -mcall-openbsd
-@opindex mcall-netbsd
+@opindex mcall-openbsd
 On System V.4 and embedded PowerPC systems compile code for the
 OpenBSD operating system.
 
-- 
2.17.1



Re: [PATCH] RISC-V: Optimal RVV epilogue logic.

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 20:13, Kito Cheng wrote:

I would suggest adding a separate case with scan-assembler-not to demonstrate
this patch.


Agreed.  One way to do this would be to have new tests which have the 
proper dg-directives for testing this issue and #include the original test.



So, something like this:



/* { dg-do compile } */
/* { dg-options "-march=rv32gcv -mabi=ilp32 -mpreferred-stack-boundary=3 -O3 -fno-schedule-insns -fno-schedule-insns2" } */


#include "spill-1.c"

/* Make sure we do not have a useless SP adjustment.  */

/* { dg-final { scan-assembler-not {addi\s+sp,\s*sp,\s*0\s} } } */


The key thing to know is that the dg directives are parsed by the 
framework before preprocessing.  So the dg-directives in spill-1.c would 
not affect this new test.  That requires us to provide our own, both for 
how to run the test and what to look for.



Jeff




Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 17:53, Palmer Dabbelt wrote:

On Mon, 14 Nov 2022 16:46:37 PST (-0800), Kito Cheng wrote:

Hi Christoph:


This patch implements expansions for the cmpstrsi and the cmpstrnsi
builtins using Zbb instructions (if available).
This allows inlining calls to strcmp() and strncmp().

The expansion basically emits a peeled comparison sequence (i.e. a peeled
comparison loop) which compares XLEN bits per step if possible.

The emitted sequence can be controlled by setting the maximum number
of compared bytes (-mstring-compare-inline-limit).


I would like to have a unified option interface,
maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
And add some option like this:
-minline-str[n]cmp=[bitmanip|vector|auto] in future,
since I assume we'll have different versions of those things.


Can we just decide that from mtune?  We'll probably have 
uarch-specific string functions at some point, might as well start 
planning for it now.


Sure, though the implementation isn't terribly tied to any uarch at the 
moment and I doubt uarch approaches would make significant impacts here 
-- we're peeling off some small number of iterations fairly 
generically.  The uarch specific stuff would be the code in glibc 
selected by an ifunc.  uarch variants for block copiers seem inevitable 
though :-)



I don't have strong opinions here, so if we want to key off mtune, 
sure.  If we want to have variants for scalar vs vector that's quite 
reasonable too.  Or if we want to go all the way to uarch specific 
implementations, I won't object.









gcc/ChangeLog:

    * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
  prototype.
    * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
  macros.
    (GEN_EMIT_HELPER2): New helper macros.
    (expand_strncmp_zbb_sequence): New function.
    (riscv_emit_str_compare_zbb): New function.
    (riscv_expand_strn_compare): New function.
    * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
  for strn_compare.
    (cmpstrsi): Invoke expansion functions for strn_compare.
    * config/riscv/riscv.opt: Add new parameter
  '-mstring-compare-inline-limit'.


We need to document this option.


Yes, definitely needs documentation.  Thanks for catching that.


jeff
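For readers following the option discussion, here is a rough C model of the peeled, word-at-a-time comparison described above. It is a sketch under stated assumptions, not the patch's expansion: XLEN is assumed to be 64, Zbb's `orc.b` is modelled in software, and `orc_b`, `load_word` and `inline_strncmp_model` are hypothetical names. A real inline expansion performs full-width loads and has to make sure they cannot fault; this model loads byte-wise up to the NUL so that it stays valid C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORD_BYTES 8 /* assumes XLEN = 64 */

/* Software model of Zbb orc.b: every nonzero byte becomes 0xff and
   every zero byte stays 0x00.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < WORD_BYTES; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Load up to WORD_BYTES bytes of S, stopping at the terminating NUL and
   zero-padding the rest, so this C model never reads past the string.  */
static uint64_t
load_word (const char *s)
{
  unsigned char buf[WORD_BYTES] = { 0 };
  for (int i = 0; i < WORD_BYTES && s[i] != '\0'; i++)
    buf[i] = (unsigned char) s[i];
  uint64_t w;
  memcpy (&w, buf, sizeof w);
  return w;
}

/* Hypothetical model of an inline strncmp (A, B, N) expansion: peel
   word-sized steps while no more than LIMIT bytes have been compared,
   then hand whatever is left to the library strncmp.  */
static int
inline_strncmp_model (const char *a, const char *b, size_t n, size_t limit)
{
  assert (a != NULL && b != NULL);
  size_t done = 0;
  while (done + WORD_BYTES <= n && done + WORD_BYTES <= limit)
    {
      uint64_t wa = load_word (a + done);
      uint64_t wb = load_word (b + done);
      /* Leave the peeled sequence on a mismatch, or when WA holds a NUL
         (its orc.b result is not all-ones).  */
      if (wa != wb || orc_b (wa) != UINT64_MAX)
        break;
      done += WORD_BYTES;
    }
  /* The first DONE bytes are equal and NUL-free, so comparing the
     remainder gives a result with the same sign as strncmp (A, B, N).  */
  return strncmp (a + done, b + done, n - done);
}
```

With `limit` set to 8, comparing `"abcdefgh12345678X"` against `"abcdefgh12345678Y"` peels a single word and leaves the remaining nine bytes to `strncmp`, which is roughly the trade-off a byte limit like `-mstring-compare-inline-limit` controls.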



[PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-14 Thread Hongyu Wang via Gcc-patches
Hi,

According to PR 107676, the documentation of -mrelax-cmpxchg-loop is nonsensical.
Adjust the wording according to the comments.

Bootstrapped on x86_64-pc-linux-gnu, ok for trunk?

gcc/ChangeLog:

PR target/107676
* doc/invoke.texi: Reword the description of
-mrelax-cmpxchg-loop.
---
 gcc/doc/invoke.texi | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 40f667a630a..bdd7c319aef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -33805,10 +33805,12 @@ registers.
 
 @item -mrelax-cmpxchg-loop
 @opindex mrelax-cmpxchg-loop
-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
-execute pause if load value is not expected. This reduces excessive
-cachline bouncing when and works for all atomic logic fetch builtins
-that generates compare and swap loop.
+For compare and swap loops that are emitted by some __atomic_* builtins
+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
+counterparts), emit an atomic load before the cmpxchg instruction. If the
+loaded value is not equal to the expected value, execute a pause instead of
+directly running the cmpxchg instruction. This might reduce excessive
+cacheline bouncing.
 
 @item -mindirect-branch=@var{choice}
 @opindex mindirect-branch
-- 
2.18.1



Re: [PATCH] RISC-V: Optimal RVV epilogue logic.

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 09:29, jiawei wrote:

Skip generating the add insn if the adjustment size is zero.

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_expand_epilogue):
New if control segment.

---
  gcc/config/riscv/riscv.cc | 18 ++
  1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 02a01ca0b7c..af138db7545 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5186,24 +5186,26 @@ riscv_expand_epilogue (int style)
}
  
/* Get an rtx for STEP1 that we can add to BASE.  */

-  rtx adjust = GEN_INT (step1.to_constant ());
-  if (!SMALL_OPERAND (step1.to_constant ()))
+  if (step1.to_constant () != 0){


This doesn't follow GCC formatting rules.  The open curly brace should go on 
a new line, indented two spaces further in.  This will (of course) cause 
other code to need to be reindented as well.



Jeff
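To make the requested layout concrete, here is a small, self-contained C sketch. It assumes a hypothetical `emit_sp_adjust` helper that prints assembly text instead of building RTL (the real code uses `GEN_INT`, `emit_insn` and `RISCV_PROLOGUE_TEMP`), and shows the patch's intent of emitting nothing for a zero step, so no `addi sp,sp,0` appears, written in GNU style: the brace on its own line indented two spaces, the body two spaces further in. The 12-bit immediate range and the use of `t0` as the temporary are assumptions modelled on RISC-V `addi`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the epilogue step under review: write the
   stack-pointer adjustment for STEP into BUF as assembly text and
   return its length.  A zero STEP produces no instruction at all.  */
static size_t
emit_sp_adjust (char *buf, size_t len, long step)
{
  assert (buf != NULL && len > 0);
  buf[0] = '\0';
  if (step != 0)
    {
      /* addi takes a signed 12-bit immediate.  */
      if (step >= -2048 && step <= 2047)
        snprintf (buf, len, "addi\tsp,sp,%ld\n", step);
      else
        /* Too large: materialize the constant in a temporary first.  */
        snprintf (buf, len, "li\tt0,%ld\nadd\tsp,sp,t0\n", step);
    }
  return strlen (buf);
}
```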


Re: Re: [PATCH] RISC-V: Optimal RVV epilogue logic.

2022-11-14 Thread Kito Cheng
I would suggest adding a separate case with scan-assembler-not to demonstrate
this patch.


On Tue, Nov 15, 2022 at 10:47, juzhe.zh...@rivai.ai wrote:

> I think you should change the assembler checks in the "spill-*.c" cases.
> Check that they don't contain the redundant "addi sp,sp,0" instruction.
> Let's see whether Kito agrees with that.
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* jiawei 
> *Date:* 2022-11-15 10:37
> *To:* Kito Cheng 
> *CC:* gcc-patches ; kito.cheng
> ; palmer ; juzhe.zhong
> ; christoph.muellner ;
> philipp.tomsich ; wuwei2016
> 
> *Subject:* Re: Re: [PATCH] RISC-V: Optimal RVV epilogue logic.
>  -Original Message-
>  From: "Kito Cheng" 
>  Sent: 2022-11-15 09:48:26 (Tuesday)
>  To: jiawei 
>  Cc: gcc-patches@gcc.gnu.org, kito.ch...@sifive.com,
> pal...@rivosinc.com, juzhe.zh...@rivai.ai, christoph.muell...@vrull.eu,
> philipp.toms...@vrull.eu, wuwei2...@iscas.ac.cn
>  Subject: Re: [PATCH] RISC-V: Optimal RVV epilogue logic.
> 
>  Could you provide some testcase?
>
> Sorry for not giving a clear description,
>
> You can use almost all test cases in gcc.target/riscv/rvv/base/spill-*.c:
>
> compile them with -march=rv64gcv and check the assembly files spill-*.s.
>
> Before this patch, the generated assembly contains an additional `addi sp,sp,0`:
>
> ```
> csrr    t0,vlenb
> slli    t1,t0,1
> add     sp,sp,t1
> addi    sp,sp,0
> ld      s0,24(sp)
> addi    sp,sp,32
> jr      ra
> ```
>
> After this patch it is removed:
>
> ```
> csrr    t0,vlenb
> slli    t1,t0,1
> add     sp,sp,t1
> ld      s0,24(sp)
> addi    sp,sp,32
> jr      ra
> ```
>
> 
>  On Tue, Nov 15, 2022 at 12:29 AM jiawei  wrote:
>  
>   Skip generating the add insn if the adjustment size is zero.
>  
>   gcc/ChangeLog:
>  
>   * config/riscv/riscv.cc (riscv_expand_epilogue):
>   New if control segment.
>  
>   ---
>gcc/config/riscv/riscv.cc | 18 ++
>1 file changed, 10 insertions(+), 8 deletions(-)
>  
>   diff --git a/gcc/config/riscv/riscv.cc
> b/gcc/config/riscv/riscv.cc
>   index 02a01ca0b7c..af138db7545 100644
>   --- a/gcc/config/riscv/riscv.cc
>   +++ b/gcc/config/riscv/riscv.cc
>   @@ -5186,24 +5186,26 @@ riscv_expand_epilogue (int style)
>   }
>  
>  /* Get an rtx for STEP1 that we can add to BASE.  */
>   -  rtx adjust = GEN_INT (step1.to_constant ());
>   -  if (!SMALL_OPERAND (step1.to_constant ()))
>   +  if (step1.to_constant () != 0){
>   +rtx adjust = GEN_INT (step1.to_constant ());
>   +if (!SMALL_OPERAND (step1.to_constant ()))
>   {
> riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode), adjust);
> adjust = RISCV_PROLOGUE_TEMP (Pmode);
>   }
>  
>   -  insn = emit_insn (
>   +insn = emit_insn (
>  gen_add3_insn (stack_pointer_rtx,
> stack_pointer_rtx, adjust));
>  
>   -  rtx dwarf = NULL_RTX;
>   -  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode,
> stack_pointer_rtx,
>   +rtx dwarf = NULL_RTX;
>   +rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode,
> stack_pointer_rtx,
>GEN_INT (step2));
>  
>   -  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx,
> dwarf);
>   -  RTX_FRAME_RELATED_P (insn) = 1;
>   +dwarf = alloc_reg_note (REG_CFA_DEF_CFA,
> cfa_adjust_rtx, dwarf);
>   +RTX_FRAME_RELATED_P (insn) = 1;
>  
>   -  REG_NOTES (insn) = dwarf;
>   +REG_NOTES (insn) = dwarf;
>   +  }
>}
>  else if (frame_pointer_needed)
>{
>   --
>   2.25.1
>  
> 
>
>


Re: Re: [PATCH] RISC-V: Optimal RVV epilogue logic.

2022-11-14 Thread juzhe.zh...@rivai.ai
I think you should change the assembler checks in the "spill-*.c" cases.
Check that they don't contain the redundant "addi sp,sp,0" instruction.
Let's see whether Kito agrees with that.


juzhe.zh...@rivai.ai
 
From: jiawei
Date: 2022-11-15 10:37
To: Kito Cheng
CC: gcc-patches; kito.cheng; palmer; juzhe.zhong; christoph.muellner; 
philipp.tomsich; wuwei2016
Subject: Re: Re: [PATCH] RISC-V: Optimal RVV epilogue logic.
 -Original Message-
 From: "Kito Cheng" 
 Sent: 2022-11-15 09:48:26 (Tuesday)
 To: jiawei 
 Cc: gcc-patches@gcc.gnu.org, kito.ch...@sifive.com, pal...@rivosinc.com, 
juzhe.zh...@rivai.ai, christoph.muell...@vrull.eu, philipp.toms...@vrull.eu, 
wuwei2...@iscas.ac.cn
 Subject: Re: [PATCH] RISC-V: Optimal RVV epilogue logic.
 
 Could you provide some testcase?
 
Sorry for not giving a clear description, 
 
You can use almost all test cases in gcc.target/riscv/rvv/base/spill-*.c:

compile them with -march=rv64gcv and check the assembly files spill-*.s.

Before this patch, the generated assembly contains an additional `addi sp,sp,0`:

```
csrr    t0,vlenb
slli    t1,t0,1
add     sp,sp,t1
addi    sp,sp,0
ld      s0,24(sp)
addi    sp,sp,32
jr      ra
```

After this patch it is removed:

```
csrr    t0,vlenb
slli    t1,t0,1
add     sp,sp,t1
ld      s0,24(sp)
addi    sp,sp,32
jr      ra
```
 
 
 On Tue, Nov 15, 2022 at 12:29 AM jiawei  wrote:
 
  Skip generating the add insn if the adjustment size is zero.
 
  gcc/ChangeLog:
 
  * config/riscv/riscv.cc (riscv_expand_epilogue):
  New if control segment.
 
  ---
   gcc/config/riscv/riscv.cc | 18 ++
   1 file changed, 10 insertions(+), 8 deletions(-)
 
  diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
  index 02a01ca0b7c..af138db7545 100644
  --- a/gcc/config/riscv/riscv.cc
  +++ b/gcc/config/riscv/riscv.cc
  @@ -5186,24 +5186,26 @@ riscv_expand_epilogue (int style)
  }
 
 /* Get an rtx for STEP1 that we can add to BASE.  */
  -  rtx adjust = GEN_INT (step1.to_constant ());
  -  if (!SMALL_OPERAND (step1.to_constant ()))
  +  if (step1.to_constant () != 0){
  +rtx adjust = GEN_INT (step1.to_constant ());
  +if (!SMALL_OPERAND (step1.to_constant ()))
  {
riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode), adjust);
adjust = RISCV_PROLOGUE_TEMP (Pmode);
  }
 
  -  insn = emit_insn (
  +insn = emit_insn (
 gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx, 
adjust));
 
  -  rtx dwarf = NULL_RTX;
  -  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
  +rtx dwarf = NULL_RTX;
  +rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
   GEN_INT (step2));
 
  -  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, 
dwarf);
  -  RTX_FRAME_RELATED_P (insn) = 1;
  +dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, 
dwarf);
  +RTX_FRAME_RELATED_P (insn) = 1;
 
  -  REG_NOTES (insn) = dwarf;
  +REG_NOTES (insn) = dwarf;
  +  }
   }
 else if (frame_pointer_needed)
   {
  --
  2.25.1
 



Re: Re: [PATCH] RISC-V: Optimal RVV epilogue logic.

2022-11-14 Thread jiawei
 -Original Message-
 From: "Kito Cheng" 
 Sent: 2022-11-15 09:48:26 (Tuesday)
 To: jiawei 
 Cc: gcc-patches@gcc.gnu.org, kito.ch...@sifive.com, pal...@rivosinc.com, 
juzhe.zh...@rivai.ai, christoph.muell...@vrull.eu, philipp.toms...@vrull.eu, 
wuwei2...@iscas.ac.cn
 Subject: Re: [PATCH] RISC-V: Optimal RVV epilogue logic.
 
 Could you provide some testcase?

Sorry for not giving a clear description, 

You can use almost all test cases in gcc.target/riscv/rvv/base/spill-*.c:

compile them with -march=rv64gcv and check the assembly files spill-*.s.

Before this patch, the generated assembly contains an additional `addi sp,sp,0`:

```
csrr    t0,vlenb
slli    t1,t0,1
add     sp,sp,t1
addi    sp,sp,0
ld      s0,24(sp)
addi    sp,sp,32
jr      ra
```

After this patch it is removed:

```
csrr    t0,vlenb
slli    t1,t0,1
add     sp,sp,t1
ld      s0,24(sp)
addi    sp,sp,32
jr      ra
```

 
 On Tue, Nov 15, 2022 at 12:29 AM jiawei  wrote:
 
  Skip generating the add insn if the adjustment size is zero.
 
  gcc/ChangeLog:
 
  * config/riscv/riscv.cc (riscv_expand_epilogue):
  New if control segment.
 
  ---
   gcc/config/riscv/riscv.cc | 18 ++
   1 file changed, 10 insertions(+), 8 deletions(-)
 
  diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
  index 02a01ca0b7c..af138db7545 100644
  --- a/gcc/config/riscv/riscv.cc
  +++ b/gcc/config/riscv/riscv.cc
  @@ -5186,24 +5186,26 @@ riscv_expand_epilogue (int style)
  }
 
 /* Get an rtx for STEP1 that we can add to BASE.  */
  -  rtx adjust = GEN_INT (step1.to_constant ());
  -  if (!SMALL_OPERAND (step1.to_constant ()))
  +  if (step1.to_constant () != 0){
  +rtx adjust = GEN_INT (step1.to_constant ());
  +if (!SMALL_OPERAND (step1.to_constant ()))
  {
riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode), adjust);
adjust = RISCV_PROLOGUE_TEMP (Pmode);
  }
 
  -  insn = emit_insn (
  +insn = emit_insn (
 gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx, 
adjust));
 
  -  rtx dwarf = NULL_RTX;
  -  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
  +rtx dwarf = NULL_RTX;
  +rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
   GEN_INT (step2));
 
  -  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, 
dwarf);
  -  RTX_FRAME_RELATED_P (insn) = 1;
  +dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, 
dwarf);
  +RTX_FRAME_RELATED_P (insn) = 1;
 
  -  REG_NOTES (insn) = dwarf;
  +REG_NOTES (insn) = dwarf;
  +  }
   }
 else if (frame_pointer_needed)
   {
  --
  2.25.1
 


Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

2022-11-14 Thread Kito Cheng
On Tue, Nov 15, 2022 at 8:53 AM Palmer Dabbelt  wrote:
>
> On Mon, 14 Nov 2022 16:46:37 PST (-0800), Kito Cheng wrote:
> > Hi Christoph:
> >
> >> This patch implements expansions for the cmpstrsi and the cmpstrnsi
> >> builtins using Zbb instructions (if available).
> >> This allows inlining calls to strcmp() and strncmp().
> >>
> >> The expansion basically emits a peeled comparison sequence (i.e. a peeled
> >> comparison loop) which compares XLEN bits per step if possible.
> >>
> >> The emitted sequence can be controlled by setting the maximum number
> >> of compared bytes (-mstring-compare-inline-limit).
> >
> > I would like to have a unified option interface,
> > maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
> > And add some option like this:
> > -minline-str[n]cmp=[bitmanip|vector|auto] in future,
> > since I assume we'll have different versions of those things.
>
> Can we just decide that from mtune?  We'll probably have uarch-specific
> string functions at some point, might as well start planning for it now.

I assume you mean the -minline-str[n]cmp=[bitmanip|vector|auto] part?
I think this part needs more discussion, and we could defer it until
we reach consensus.

But for the -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit part, I favor having
those two options to disable and/or fine-tune those parameters.

>
> >> gcc/ChangeLog:
> >>
> >> * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> >>   prototype.
> >> * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >>   macros.
> >> (GEN_EMIT_HELPER2): New helper macros.
> >> (expand_strncmp_zbb_sequence): New function.
> >> (riscv_emit_str_compare_zbb): New function.
> >> (riscv_expand_strn_compare): New function.
> >> * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> >>   for strn_compare.
> >> (cmpstrsi): Invoke expansion functions for strn_compare.
> >> * config/riscv/riscv.opt: Add new parameter
> >>   '-mstring-compare-inline-limit'.
> >
> > We need to document this option.


Re: [PATCH] RISC-V: Optimal RVV epilogue logic.

2022-11-14 Thread Kito Cheng via Gcc-patches
Could you provide some testcase?

On Tue, Nov 15, 2022 at 12:29 AM jiawei  wrote:
>
> Skip generating the add insn if the adjustment size is zero.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_expand_epilogue):
> New if control segment.
>
> ---
>  gcc/config/riscv/riscv.cc | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 02a01ca0b7c..af138db7545 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5186,24 +5186,26 @@ riscv_expand_epilogue (int style)
> }
>
>/* Get an rtx for STEP1 that we can add to BASE.  */
> -  rtx adjust = GEN_INT (step1.to_constant ());
> -  if (!SMALL_OPERAND (step1.to_constant ()))
> +  if (step1.to_constant () != 0){
> +rtx adjust = GEN_INT (step1.to_constant ());
> +if (!SMALL_OPERAND (step1.to_constant ()))
> {
>   riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode), adjust);
>   adjust = RISCV_PROLOGUE_TEMP (Pmode);
> }
>
> -  insn = emit_insn (
> +insn = emit_insn (
>gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx, adjust));
>
> -  rtx dwarf = NULL_RTX;
> -  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +rtx dwarf = NULL_RTX;
> +rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>  GEN_INT (step2));
>
> -  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
> -  RTX_FRAME_RELATED_P (insn) = 1;
> +dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
> +RTX_FRAME_RELATED_P (insn) = 1;
>
> -  REG_NOTES (insn) = dwarf;
> +REG_NOTES (insn) = dwarf;
> +  }
>  }
>else if (frame_pointer_needed)
>  {
> --
> 2.25.1
>


Re: [PATCH] lto: Stream current working directory for first streamed relative filename and adjust relative paths [PR93865]

2022-11-14 Thread Ian Lance Taylor via Gcc-patches
On Thu, Sep 10, 2020 at 1:39 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> If the gcc -c -flto ... commands to compile some or all objects are run in a
> different directory (or in different directories) from the directory in
> which the gcc -flto link line is invoked, then the .debug_line will be
> incorrect if there are any relative filenames, it will use those relative
> filenames while .debug_info will contain a different DW_AT_comp_dir.
>
> The following patch streams (at most once after each clear_line_info)
> the current working directory (what we record in DW_AT_comp_dir) when
> encountering the first relative pathname, and when reading the location info
> reads it back and, if the current working directory at that point is
> different from the saved one, adjusts relative paths by prepending a relative
> prefix that leads from the current working directory to the previously saved
> path (falling back to the absolute directory, e.g. for a DOS e:\\foo vs.
> d:\\bar change).
>
> Bootstrapped/regtested on x86_64-linux (both lto bootstrap and normal one;
> i686-linux doesn't build due to some unrelated libgo bugs), ok for trunk?
>
> 2020-09-10  Jakub Jelinek  
>
> PR debug/93865
> * lto-streamer.h (struct output_block): Add emit_pwd member.
> * lto-streamer-out.c: Include toplev.h.
> (clear_line_info): Set emit_pwd.
> (lto_output_location_1): Encode the ob->current_file != xloc.file
> bit directly into the location number.  If changing file, emit
> additionally a bit whether pwd is emitted and emit it before the
> first relative pathname since clear_line_info.
> (output_function, output_constructor): Don't call clear_line_info
> here.
> * lto-streamer-in.c (struct string_pair_map): New type.
> (struct string_pair_map_hasher): New type.
> (string_pair_map_hasher::hash): New method.
> (string_pair_map_hasher::equal): New method.
> (path_name_pair_hash_table, string_pair_map_allocator): New variables.
> (relative_path_prefix, canon_relative_path_prefix,
> canon_relative_file_name): New functions.
> (canon_file_name): Add relative_prefix argument, if non-NULL
> and string is a relative path, return canon_relative_file_name.
> (lto_location_cache::input_location_and_block): Decode file change
> bit from the location number.  If changing file, unpack bit whether
> pwd is streamed and stream in pwd.  Adjust canon_file_name caller.
> (lto_free_file_name_hash): Delete path_name_pair_hash_table
> and string_pair_map_allocator.
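The idea of prefixing relative file names with a path from the new working directory to the saved one can be sketched in C. This is a simplified, POSIX-only model under stated assumptions (absolute `/`-separated directories with no trailing slash, no `.` or `..` components, and no DOS drive fallback); `relative_prefix` and `append` are hypothetical names, not the patch's `relative_path_prefix`.

```c
#include <assert.h>
#include <string.h>

/* Append S to OUT (capacity LEN, current length *POS).  Returns 0 on
   success or -1 if the result would not fit.  */
static int
append (char *out, size_t len, size_t *pos, const char *s)
{
  size_t n = strlen (s);
  if (*pos + n + 1 > len)
    return -1;
  memcpy (out + *pos, s, n + 1);
  *pos += n;
  return 0;
}

/* Write into OUT a relative prefix leading from directory FROM to
   directory TO (e.g. "/a/b/c" -> "/a/d" gives "../../d/").
   Returns OUT, or NULL if LEN is too small.  */
static char *
relative_prefix (const char *from, const char *to, char *out, size_t len)
{
  assert (from[0] == '/' && to[0] == '/' && len > 0);
  size_t i = 0, common = 0;
  /* Longest common prefix, remembered only at component boundaries.  */
  while (from[i] != '\0' && from[i] == to[i])
    if (from[i++] == '/')
      common = i;
  if ((from[i] == '\0' || from[i] == '/') && (to[i] == '\0' || to[i] == '/'))
    common = i;

  const char *up = from + common;   /* components of FROM to leave */
  const char *down = to + common;   /* components of TO to enter */
  while (*up == '/')
    up++;
  while (*down == '/')
    down++;

  /* One "../" per remaining component of FROM.  */
  size_t pos = 0, ups = (*up != '\0');
  for (const char *p = up; *p != '\0'; p++)
    ups += (*p == '/');

  out[0] = '\0';
  for (size_t k = 0; k < ups; k++)
    if (append (out, len, &pos, "../") != 0)
      return NULL;
  if (*down != '\0'
      && (append (out, len, &pos, down) != 0
          || append (out, len, &pos, "/") != 0))
    return NULL;
  return out;
}
```

Under this model, a relative name such as `foo.c` recorded while compiling in `/home/u/src` and read back in `/home/u/build` would become `../src/foo.c`.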


Hi, I've noticed that this patch is incomplete.  It streams the result
of get_src_pwd without passing it through remap_debug_filename.  As in
comp_dir_output in dwarf2out.cc, we should always remap all file and
directory names, including the result of get_src_pwd.

Ian
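Ian's point can be illustrated with a toy prefix remapper in the spirit of `-fdebug-prefix-map=OLD=NEW`. The sketch assumes a single OLD=NEW pair and plain string-prefix matching; `remap_prefix` is a hypothetical helper, not GCC's `remap_debug_filename`. It only shows why the streamed working directory needs the same treatment as file names: otherwise the directory keeps the unmapped prefix while the names next to it are mapped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If PATH starts with OLD, write NEW_PREFIX followed by the rest of PATH
   into OUT; otherwise copy PATH unchanged.  Returns OUT, or NULL if the
   result does not fit in LEN bytes.  */
static const char *
remap_prefix (const char *path, const char *old, const char *new_prefix,
              char *out, size_t len)
{
  assert (path != NULL && old != NULL && new_prefix != NULL && len > 0);
  size_t olen = strlen (old);
  int n;
  if (strncmp (path, old, olen) == 0)
    n = snprintf (out, len, "%s%s", new_prefix, path + olen);
  else
    n = snprintf (out, len, "%s", path);
  return (n >= 0 && (size_t) n < len) ? out : NULL;
}
```

Running the working directory through the same remapping as the file names before streaming it keeps the two consistent.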




> --- gcc/lto-streamer.h.jj   2020-09-09 09:08:13.102815586 +0200
> +++ gcc/lto-streamer.h  2020-09-09 12:36:13.120070769 +0200
> @@ -718,6 +718,7 @@ struct output_block
>int current_col;
>bool current_sysp;
>bool reset_locus;
> +  bool emit_pwd;
>tree current_block;
>
>/* Cache of nodes written in this section.  */
> --- gcc/lto-streamer-out.c.jj   2020-09-09 09:08:13.077815963 +0200
> +++ gcc/lto-streamer-out.c  2020-09-09 13:21:34.093021582 +0200
> @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.
>  #include "file-prefix-map.h" /* remap_debug_filename()  */
>  #include "output.h"
>  #include "ipa-utils.h"
> +#include "toplev.h"
>
>
>  static void lto_write_tree (struct output_block*, tree, bool);
> @@ -61,6 +62,7 @@ clear_line_info (struct output_block *ob
>ob->current_col = 0;
>ob->current_sysp = false;
>ob->reset_locus = true;
> +  ob->emit_pwd = true;
>/* Initialize to something that will never appear as block,
>   so that the first location with block in a function etc.
>   always streams a change_block bit and the first block.  */
> @@ -189,9 +191,6 @@ lto_output_location_1 (struct output_blo
>  {
>location_t loc = LOCATION_LOCUS (orig_loc);
>
> -  bp_pack_int_in_range (bp, 0, RESERVED_LOCATION_COUNT,
> -   loc < RESERVED_LOCATION_COUNT
> -   ? loc : RESERVED_LOCATION_COUNT);
>if (loc >= RESERVED_LOCATION_COUNT)
>  {
>expanded_location xloc = expand_location (loc);
> @@ -207,13 +206,30 @@ lto_output_location_1 (struct output_blo
>   ob->reset_locus = false;
> }
>
> -  bp_pack_value (bp, ob->current_file != xloc.file, 1);
> +  /* As RESERVED_LOCATION_COUNT is 2, we can use the spare value of
> +3 without wasting additional bits to signalize file change.
> +If RESERVED_LOCATION_COUNT changes, reconsider this.  */
> +  gcc_checking_assert (RESERVED_LOCATION_COUNT == 2);
> +  bp_pack_int_in_range (bp, 0, RESERVED_LOCATION_COUNT + 1,
> +   
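The hunk above is cut off mid-call, but its comment states the idea. With RESERVED_LOCATION_COUNT == 2, a value packed in the range [0, 3] still costs 2 bits. The spare value 3 can therefore mean "real location, file changed", which saves the separate file-change bit. The sketch below assumes two things: the truncated argument is the old `loc < RESERVED_LOCATION_COUNT ? loc : RESERVED_LOCATION_COUNT` plus the file-change flag, and packing a value in [0, max] costs floor_log2 (max) + 1 bits. All names are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for RESERVED_LOCATION_COUNT; the patch asserts it is 2.
constexpr unsigned reserved_location_count = 2;

// Bits needed to store a value in [0, MAX], i.e. floor_log2 (MAX) + 1 for
// MAX > 0 (assumed to match what bp_pack_int_in_range spends).
constexpr unsigned bits_for_range (unsigned max)
{
  return max == 0 ? 0 : 1 + bits_for_range (max >> 1);
}

// One tag replaces "reserved or real" plus a separate file-change bit:
//   0 .. reserved-1   reserved location
//   reserved          real location, same file as the previous one
//   reserved + 1      real location, file changed
constexpr unsigned encode_tag (unsigned loc, bool file_changed)
{
  return loc < reserved_location_count
         ? loc
         : reserved_location_count + (file_changed ? 1u : 0u);
}

struct decoded_tag
{
  bool is_real;          // true for a real (non-reserved) location
  bool file_changed;     // only meaningful when is_real
  unsigned reserved_loc; // only meaningful when !is_real
};

constexpr decoded_tag decode_tag (unsigned tag)
{
  return tag < reserved_location_count
         ? decoded_tag { false, false, tag }
         : decoded_tag { true, tag == reserved_location_count + 1, 0 };
}

// For a real location the old scheme spent 2 bits plus 1 file-change bit;
// the single tag costs 2 bits in total.
static_assert (bits_for_range (reserved_location_count) + 1 == 3, "old cost");
static_assert (bits_for_range (reserved_location_count + 1) == 2, "new cost");
```

This is also why the patch adds gcc_checking_assert (RESERVED_LOCATION_COUNT == 2). With a larger count the extra value could push the range past a power of two and cost an additional bit after all.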

Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

2022-11-14 Thread Palmer Dabbelt

On Mon, 14 Nov 2022 16:46:37 PST (-0800), Kito Cheng wrote:
> Hi Christoph:
>
>> This patch implements expansions for the cmpstrsi and the cmpstrnsi
>> builtins using Zbb instructions (if available).
>> This allows inlining calls to strcmp() and strncmp().
>>
>> The expansion basically emits a peeled comparison sequence (i.e. a peeled
>> comparison loop) which compares XLEN bits per step if possible.
>>
>> The emitted sequence can be controlled by setting the maximum number
>> of compared bytes (-mstring-compare-inline-limit).
>
> I would like to have a unified option interface,
> maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
> And add some option like this:
> -minline-str[n]cmp=[bitmanip|vector|auto] in future,
> since I assume we'll have different versions of those things.

Can we just decide that from mtune?  We'll probably have uarch-specific
string functions at some point, might as well start planning for it now.

>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
>>   prototype.
>> * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
>>   macros.
>> (GEN_EMIT_HELPER2): New helper macros.
>> (expand_strncmp_zbb_sequence): New function.
>> (riscv_emit_str_compare_zbb): New function.
>> (riscv_expand_strn_compare): New function.
>> * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
>>   for strn_compare.
>> (cmpstrsi): Invoke expansion functions for strn_compare.
>> * config/riscv/riscv.opt: Add new parameter
>>   '-mstring-compare-inline-limit'.
>
> We need to document this option.


Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

2022-11-14 Thread Kito Cheng via Gcc-patches
Hi Christoph:

> This patch implements expansions for the cmpstrsi and the cmpstrnsi
> builtins using Zbb instructions (if available).
> This allows inlining calls to strcmp() and strncmp().
>
> The expansion basically emits a peeled comparison sequence (i.e. a peeled
> comparison loop) which compares XLEN bits per step if possible.
>
> The emitted sequence can be controlled by setting the maximum number
> of compared bytes (-mstring-compare-inline-limit).

I would like to have a unified option interface,
maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
And add some option like this:
-minline-str[n]cmp=[bitmanip|vector|auto] in future,
since I assume we'll have different versions of those things.

>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
>   prototype.
> * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
>   macros.
> (GEN_EMIT_HELPER2): New helper macros.
> (expand_strncmp_zbb_sequence): New function.
> (riscv_emit_str_compare_zbb): New function.
> (riscv_expand_strn_compare): New function.
> * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
>   for strn_compare.
> (cmpstrsi): Invoke expansion functions for strn_compare.
> * config/riscv/riscv.opt: Add new parameter
>   '-mstring-compare-inline-limit'.

We need to document this option.
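The patch compares one word (XLEN bits) per step. It only needs byte-wise work when the two words differ or one of them contains the terminating NUL. The sketch below shows that control flow in portable C++, assuming 64-bit words and buffers readable for at least LIMIT bytes. It is not the patch's Zbb sequence: it uses the well-known "has a zero byte" bit trick where Zbb code would typically use orc.b. The names strcmp_peeled_sketch and has_zero_byte are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if any byte of W is zero (exact SWAR test, no false positives).
static bool has_zero_byte (std::uint64_t w)
{
  return ((w - 0x0101010101010101ull) & ~w & 0x8080808080808080ull) != 0;
}

// Compare up to LIMIT bytes one 8-byte word per step, then defer to
// std::strcmp for whatever remains.  Assumes A and B are readable for at
// least LIMIT bytes even if the strings themselves are shorter.
int strcmp_peeled_sketch (const char *a, const char *b, std::size_t limit)
{
  std::size_t off = 0;
  for (; off + 8 <= limit; off += 8)
    {
      std::uint64_t wa, wb;
      std::memcpy (&wa, a + off, 8);  // memcpy avoids alignment/aliasing UB
      std::memcpy (&wb, b + off, 8);
      // A difference or a terminator lies in this word: every earlier byte
      // was equal and non-NUL, so finishing with strcmp gives the answer.
      if (wa != wb || has_zero_byte (wa))
        return std::strcmp (a + off, b + off);
    }
  return std::strcmp (a + off, b + off);
}
```

Here LIMIT plays roughly the role of -mstring-compare-inline-limit. Past that limit the sketch falls back to the library call, as the inline expansion would.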


[PATCH v2] c++: Disable -Wignored-qualifiers for template args [PR107492]

2022-11-14 Thread Marek Polacek via Gcc-patches
On Thu, Nov 03, 2022 at 03:22:12PM -0400, Jason Merrill wrote:
> On 11/1/22 13:01, Marek Polacek wrote:
> > It seems wrong to issue a -Wignored-qualifiers warning for code like:
> > 
> >   static_assert(!is_same_v<void(*)(), const void(*)()>);
> > 
> > because there the qualifier matters.  Likewise in template
> > specialization:
> > 
> >   template<typename> struct S { };
> >   template<> struct S<void(*)()> { };
> >   template<> struct S<const void(*)()> { }; // OK, not a redefinition
> > 
> > I'm of the mind that we should disable the warning for template
> > arguments, as in the patch below.
> 
> Hmm, I'm not sure why we would want to treat template arguments differently
> from other type-ids.  Maybe only warn if funcdecl_p?

I think that makes sense.  There are other contexts in which cv-quals
matter, for instance trailing-return-type.  Updated patch below, plus
I've extended the testcase.  Thanks,

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
It seems wrong to issue a -Wignored-qualifiers warning for code like:

  static_assert(!is_same_v<void(*)(), const void(*)()>);

because there the qualifier matters.  Likewise in template
specialization:

  template<typename> struct S { };
  template<> struct S<void(*)()> { };
  template<> struct S<const void(*)()> { }; // OK, not a redefinition

And likewise in other type-id contexts such as trailing-return-type:

  auto g() -> const void (*)();

This patch limits the warning to the function declaration context only.
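The point of the examples above is that in a type-id the qualifier is part of the type. Two specializations whose arguments differ only in that 'const' are therefore distinct and not a redefinition. The sketch below assumes a compiler that, like GCC in the testcase above, treats these two pointer types as distinct; S and id are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// Primary template plus two explicit specializations whose arguments differ
// only in the 'const' on the pointed-to function's return type.
template<typename> struct S { static constexpr int id = 0; };
template<> struct S<void (*)()> { static constexpr int id = 1; };
template<> struct S<const void (*)()> { static constexpr int id = 2; };

// If the 'const' were dropped, the second specialization above would be a
// redefinition of the first; instead the two types are distinct.
static_assert (!std::is_same<void (*)(), const void (*)()>::value,
               "'const' on the return type is part of the type");
```

A -Wignored-qualifiers warning in these contexts would therefore be misleading. Only the function-declaration case, such as const void f(), really drops the qualifier.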

PR c++/107492

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Only emit a -Wignored-qualifiers warning
when funcdecl_p.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wignored-qualifiers3.C: New test.
---
 gcc/cp/decl.cc|  6 -
 .../g++.dg/warn/Wignored-qualifiers3.C| 24 +++
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wignored-qualifiers3.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 890cfcabd35..67b9f24d7d6 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -13038,7 +13038,11 @@ grokdeclarator (const cp_declarator *declarator,
 
if (type_quals != TYPE_UNQUALIFIED)
  {
-   if (SCALAR_TYPE_P (type) || VOID_TYPE_P (type))
+   /* It's wrong, for instance, to issue a -Wignored-qualifiers
+  warning for
+   static_assert(!is_same_v<void(*)(), const void(*)()>);
+   because there the qualifier matters.  */
+   if (funcdecl_p && (SCALAR_TYPE_P (type) || VOID_TYPE_P (type)))
  warning_at (typespec_loc, OPT_Wignored_qualifiers, "type "
  "qualifiers ignored on function return type");
/* [dcl.fct] "A volatile-qualified return type is
diff --git a/gcc/testsuite/g++.dg/warn/Wignored-qualifiers3.C b/gcc/testsuite/g++.dg/warn/Wignored-qualifiers3.C
new file mode 100644
index 000..dedb38fc995
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wignored-qualifiers3.C
@@ -0,0 +1,24 @@
+// PR c++/107492
+// { dg-do compile { target c++14 } }
+// { dg-additional-options "-Wignored-qualifiers" }
+
+// Here the 'const' matters, so don't warn.
+template<typename> struct S { };
+template<> struct S<void(*)()> { };
+template<> struct S<const void(*)()> { }; // { dg-bogus "ignored" }
+
+template<typename T, typename U> constexpr bool is_same_v = false;
+template<typename T> constexpr bool is_same_v<T, T> = true;
+
+static_assert( ! is_same_v< void(*)(), const void(*)() >, ""); // { dg-bogus "ignored" }
+
+// Here the 'const' matters as well -> don't warn.
+auto g() -> const void (*)(); // { dg-bogus "ignored" }
+auto g() -> const void (*)() { return nullptr; } // { dg-bogus "ignored" }
+
+// Here as well.
+const void (*h)() = static_cast<const void (*)()>(h); // { dg-bogus "ignored" }
+
+// But let's keep the warning here.
+const void f(); // { dg-warning "ignored" }
+const void f() { } // { dg-warning "ignored" }

base-commit: c41bbfcaf9d6ef5b57a7e89bba70b861c08a686b
-- 
2.38.1



Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 14:49, Christoph Müllner wrote:
> We can take this further, but then the following questions pop up:
> * how much data processing per loop iteration?

I have no idea because I don't have any real data.  Last time I gathered
any data on this issue was circa 1988 :-)

> * what about unaligned strings?

I'd punt.  I don't think we can depend on having high-performance
unaligned access.  You could do a dynamic check of alignment, but you'd
really need to know that they're aligned often enough that the cost of
the dynamic check can often be recovered.

> Happy to get suggestions/opinions for improvement.

I think this is pretty good without additional data that would indicate
that handling unaligned cases or a different number of loop peels would
be a notable improvement.

Jeff


Re: [PATCH] ira: Remove duplicate `memset' over `full_costs' from `assign_hard_reg'

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 16:21, Maciej W. Rozycki wrote:
> Remove duplicate clearing of `full_costs' made in `assign_hard_reg',
> which has been there since the beginning, i.e. commit 058e97ecf33a
> ("IRA has been merged into trunk"),
> .
>
> gcc/
> * ira-color.cc (assign_hard_reg): Remove duplicate `memset' over
> `full_costs'.
> ---
> Hi,
>
>   I find this fairly obvious, OK to apply?

Seems obvious to me as well.  OK.

jeff




Re: [PATCH] c++: Add testcase for DR 2392

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/14/22 00:36, Jakub Jelinek wrote:

Hi!

Working virtually out of Baker Island.

The testcase from DR 2392 passes, so I assume we don't need to do
anything further for the DR.

Tested on x86_64-linux, ok for trunk?


OK.


2022-11-13  Jakub Jelinek  

* g++.dg/DRs/dr2392.C: Add testcase for DR 2392.

--- gcc/testsuite/g++.dg/DRs/dr2392.C.jj  2022-11-13 20:49:22.107817793 -1200
+++ gcc/testsuite/g++.dg/DRs/dr2392.C   2022-11-13 20:49:17.506880524 -1200
@@ -0,0 +1,12 @@
+// DR 2392
+// { dg-do compile { target c++11 } }
+
+template 
+constexpr int
+foo ()
+{
+  T t;
+  return 1;
+}
+
+using V = decltype (new int[foo ()]);

Jakub





Re: [PATCH] c++: Allow attributes on concepts - DR 2428

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/14/22 00:40, Jakub Jelinek wrote:

Hi!

Working virtually out of Baker Island.

The following patch adds parsing of attributes to concept definition,
allows deprecated attribute to be specified (some ugliness needed
because CONCEPT_DECL is a cp/*.def tree code and so can't be mentioned
in c-family/ directly; used what is used for objc method decls,
an alternative would be a langhook)


Several of the codes in c-common.def are C++-only, you might just move 
it over?



and checks TREE_DEPRECATED in
build_standard_check (not sure if that is the right spot, or whether
it shouldn't be checked also for variable and function concepts and
how to write testcase coverage for that).


I wouldn't bother with var/fn concepts, they're obsolete.


Lightly tested so far.

2022-11-13  Jakub Jelinek  

gcc/c-family/
* c-common.h (c_concept_decl): Declare.
* c-attribs.cc (handle_deprecated_attribute): Allow deprecated
attribute on CONCEPT_DECL if flag_concepts.
gcc/c/
* c-decl.cc (c_concept_decl): New function.
gcc/cp/
* cp-tree.h (finish_concept_definition): Add ATTRS parameter.
* parser.cc (cp_parser_concept_definition): Parse attributes in
between identifier and =.  Adjust finish_concept_definition
caller.
* pt.cc (finish_concept_definition): Add ATTRS parameter.  Call
cplus_decl_attributes.
* constraint.cc (build_standard_check): If CONCEPT_DECL is
TREE_DEPRECATED, emit -Wdeprecated-declarations warnings.
* tree.cc (c_concept_decl): New function.
gcc/testsuite/
* g++.dg/cpp2a/concepts-dr2428.C: New test.

--- gcc/c-family/c-common.h.jj  2022-10-27 21:00:53.698247586 -1200
+++ gcc/c-family/c-common.h 2022-11-13 21:49:37.934598359 -1200
@@ -831,6 +831,7 @@ extern tree (*make_fname_decl) (location
  
  /* In c-decl.cc and cp/tree.cc.  FIXME.  */
  extern void c_register_addr_space (const char *str, addr_space_t as);
+extern bool c_concept_decl (enum tree_code);
  
  /* In c-common.cc.  */
  extern bool in_late_binary_op;
--- gcc/c-family/c-attribs.cc.jj  2022-10-09 19:31:57.177988375 -1200
+++ gcc/c-family/c-attribs.cc   2022-11-13 21:52:37.920152731 -1200
@@ -4211,7 +4211,8 @@ handle_deprecated_attribute (tree *node,
  || VAR_OR_FUNCTION_DECL_P (decl)
  || TREE_CODE (decl) == FIELD_DECL
  || TREE_CODE (decl) == CONST_DECL
- || objc_method_decl (TREE_CODE (decl)))
+ || objc_method_decl (TREE_CODE (decl))
+ || (flag_concepts && c_concept_decl (TREE_CODE (decl
TREE_DEPRECATED (decl) = 1;
else if (TREE_CODE (decl) == LABEL_DECL)
{
--- gcc/c/c-decl.cc.jj  2022-11-12 23:29:08.181504470 -1200
+++ gcc/c/c-decl.cc 2022-11-13 21:50:38.178779716 -1200
@@ -12987,6 +12987,14 @@ c_register_addr_space (const char *word,
ridpointers [rid] = id;
  }
  
+/* C doesn't have CONCEPT_DECL.  */
+
+bool
+c_concept_decl (enum tree_code)
+{
+  return false;
+}
+
  /* Return identifier to look up for omp declare reduction.  */
  
  tree

--- gcc/cp/cp-tree.h.jj 2022-11-11 20:30:10.138056914 -1200
+++ gcc/cp/cp-tree.h  2022-11-13 20:58:39.443218815 -1200
@@ -8324,7 +8324,7 @@ struct diagnosing_failed_constraint
  extern cp_expr finish_constraint_or_expr  (location_t, cp_expr, cp_expr);
  extern cp_expr finish_constraint_and_expr (location_t, cp_expr, cp_expr);
  extern cp_expr finish_constraint_primary_expr (cp_expr);
-extern tree finish_concept_definition  (cp_expr, tree);
+extern tree finish_concept_definition  (cp_expr, tree, tree);
  extern tree combine_constraint_expressions  (tree, tree);
  extern tree append_constraint (tree, tree);
  extern tree get_constraints (const_tree);
--- gcc/cp/parser.cc.jj 2022-11-08 22:39:13.325041007 -1200
+++ gcc/cp/parser.cc  2022-11-13 20:58:15.692542640 -1200
@@ -29672,6 +29672,8 @@ cp_parser_concept_definition (cp_parser
return NULL_TREE;
  }
  
+  tree attrs = cp_parser_attributes_opt (parser);
+
if (!cp_parser_require (parser, CPP_EQ, RT_EQ))
  {
cp_parser_skip_to_end_of_statement (parser);
@@ -29688,7 +29690,7 @@ cp_parser_concept_definition (cp_parser
   but continue as if it were.  */
cp_parser_consume_semicolon_at_end_of_statement (parser);
  
-  return finish_concept_definition (id, init);
+  return finish_concept_definition (id, init, attrs);
  }
  
  // -- //

--- gcc/cp/pt.cc.jj 2022-11-07 20:54:37.341399829 -1200
+++ gcc/cp/pt.cc  2022-11-13 21:01:18.333053377 -1200
@@ -29027,7 +29027,7 @@ placeholder_type_constraint_dependent_p
 the TEMPLATE_DECL. */
  
  tree
-finish_concept_definition (cp_expr id, tree init)
+finish_concept_definition (cp_expr id, tree init, tree attrs)
  {
gcc_assert (identifier_p (id));
gcc_assert (processing_template_decl);
@@ -29061,6 +29061,9 @@ 

Re: [PATCH 2/2] c++: remove i_c_e_p parm from tsubst_copy_and_build

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/10/22 09:56, Patrick Palka wrote:

AFAICT the only purpose of tsubst_copy_and_build's
integral_constant_expression_p boolean parameter is to diagnose certain
constructs that aren't allowed to appear in a C++98 integral constant
expression context, specifically casts to a non-integral type (diagnosed
from the *_CAST_EXPR case of tsubst_copy_and_build) or dependent names
that resolve to a non-constant decl (diagnosed from the IDENTIFIER_NODE
case of tsubst_copy_and_build).  The parameter has no effect outside of
C++98 AFAICT.

But diagnosing such constructs should arguably be done by
is_constant_expression after substitution, and doing it during
substitution by way of an additional parameter complicates the API of
this workhorse function for functionality that's specific to C++98.
And it seems is_constant_expression already does a good job of diagnosing
the aforementioned two constructs in C++98 mode, at least as far as our
testsuite is concerned.

So this patch gets rid of this parameter from tsubst_copy_and_build,
tsubst_expr and tsubst_copy_and_build_call_args.  The only interesting
changes are those to potential_constant_expression_1 and the
IDENTIFIER_NODE and *_CAST_EXPR cases of tsubst_copy_and_build; the rest
are mechanical adjustments to these functions and their call sites.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1)
: Use
cast_valid_in_integral_constant_expression_p instead of
open coding it.
* constraint.cc (tsubst_valid_expression_requirement): Adjust
calls to tsubst_copy_and_build and tsubst_expr.
(tsubst_constraint): Likewise.
(satisfy_atom): Likewise.
(diagnose_trait_expr): Likewise.
* cp-tree.h (tsubst_copy_and_build): Remove i_c_e_p parameter.
(tsubst_expr): Likewise.
* init.cc (get_nsdmi): Adjust calls to tsubst_copy_and_build
and tsubst_expr.
* pt.cc (expand_integer_pack): Likewise.
(instantiate_non_dependent_expr_internal): Likewise.
(tsubst_friend_function): Likewise.
(tsubst_attribute): Likewise.
(instantiate_class_template): Likewise.
(tsubst_template_arg): Likewise.
(gen_elem_of_pack_expansion_instantiation): Likewise.
(tsubst_fold_expr_init): Likewise.
(tsubst_pack_expansion): Likewise.
(tsubst_default_argument): Likewise.
(tsubst_function_decl): Likewise.
(tsubst_decl): Likewise.
(tsubst_arg_types): Likewise.
(tsubst_exception_specification): Likewise.
(tsubst): Likewise.
(tsubst_init): Likewise.
(tsubst_copy): Likewise.
(tsubst_omp_clause_decl): Likewise.
(tsubst_omp_clauses): Likewise.
(tsubst_copy_asm_operands): Likewise.
(tsubst_omp_for_iterator): Likewise.
(tsubst_expr): Likewise.  Remove i_c_e_p parameter.
(tsubst_omp_udr): Likewise.
(tsubst_non_call_postfix_expression): Likewise.  Remove i_c_e_p 
parameter.
(tsubst_lambda_expr): Likewise.
(tsubst_copy_and_build_call_args): Likewise.
(tsubst_copy_and_build): Likewise.  Remove i_c_e_p parameter.
: Adjust call to finish_id_expression
following removal of i_c_e_p.
: Remove C++98-specific cast validity check
guarded by i_c_e_p.
(maybe_instantiate_noexcept): Adjust calls to
tsubst_copy_and_build and tsubst_expr.
(instantiate_body): Likewise.
(instantiate_decl): Likewise.
(tsubst_initializer_list): Likewise.
(tsubst_enum): Likewise.

gcc/objcp/ChangeLog:

* objcp-lang.cc (objcp_tsubst_copy_and_build): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/template/crash55.C: Don't expect additional
C++98-specific diagnostics.
* g++.dg/template/ref3.C: Remove C++98-specific xfail.
---
  gcc/cp/constexpr.cc |   4 +-
  gcc/cp/constraint.cc|  14 +-
  gcc/cp/cp-tree.h|   6 +-
  gcc/cp/init.cc  |   6 +-
  gcc/cp/pt.cc| 240 
  gcc/objcp/objcp-lang.cc |   3 +-
  gcc/testsuite/g++.dg/template/crash55.C |   3 +-
  gcc/testsuite/g++.dg/template/ref3.C|   3 +-
  8 files changed, 93 insertions(+), 186 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 15b4f2c4a08..e665839f5b1 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9460,9 +9460,7 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
  case STATIC_CAST_EXPR:
  case REINTERPRET_CAST_EXPR:
  case IMPLICIT_CONV_EXPR:
-  if (cxx_dialect < cxx11
- && !dependent_type_p (TREE_TYPE (t))
- && !INTEGRAL_OR_ENUMERATION_TYPE_P (TREE_TYPE (t)))
+  if (!cast_valid_in_integral_constant_expression_p (TREE_TYPE 

Re: [PATCH 1/2] c++: remove function_p parm from tsubst_copy_and_build

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/10/22 09:56, Patrick Palka wrote:

The function_p parameter of tsubst_copy_and_build (added in r69316) is
inspected only in its IDENTIFIER_NODE case, where it controls whether we
diagnose unqualified name lookup failure for the given identifier.  But
I think ever since r173965, we never substitute an IDENTIFIER_NODE with
function_p=true for which the lookup can possibly fail, and therefore
the flag is effectively unneeded.

Before that commit, we would incorrectly repeat unqualified lookup for
an ADL-enabled CALL_EXPR at instantiation time, which naturally could
fail and thus motivated the flag.  Afterwards, we no longer substitute
an IDENTIFIER_NODE callee when koenig_p is true so the flag isn't needed
for its original purpose.  What about when koenig_p=false?  Apparently
we still may have an IDENTIFIER_NODE callee in this case, namely when
unqualified name lookup found a dependent local function declaration,
but repeating that lookup can't fail.  (It also can't fail for USING_DECL
callees.)

So this patch removes this effectively unneeded parameter from
tsubst_copy_and_build.  It also updates an outdated comment in the
CALL_EXPR case about when we may see an IDENTIFIER_NODE callee with
koenig_p=false.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

* cp-lang.cc (objcp_tsubst_copy_and_build): Remove
function_p parameter.
* cp-objcp-common.h (objcp_tsubst_copy_and_build):
Likewise.
* cp-tree.h (tsubst_copy_and_build): Likewise.
* init.cc (get_nsdmi): Adjust calls to tsubst_copy_and_build.
* pt.cc (expand_integer_pack): Likewise.
(instantiate_non_dependent_expr_internal): Likewise.
(tsubst_function_decl): Likewise.
(tsubst_arg_types): Likewise.
(tsubst_exception_specification): Likewise.
(tsubst): Likewise.
(tsubst_copy_asm_operands): Likewise.
(tsubst_expr): Likewise.
(tsubst_non_call_postfix_expression): Likewise.
(tsubst_lambda_expr): Likewise.
(tsubst_copy_and_build_call_args): Likewise.
(tsubst_copy_and_build): Remove function_p parameter
and adjust function comment.  Adjust recursive calls.
: Update outdated comment about when
we can see an IDENTIFIER_NODE callee with koenig_p=false.
(maybe_instantiate_noexcept): Adjust calls to
tsubst_copy_and_build.

gcc/objcp/ChangeLog:

* objcp-lang.cc (objcp_tsubst_copy_and_build): Remove
function_p parameter.
---
  gcc/cp/cp-lang.cc|  3 +--
  gcc/cp/cp-objcp-common.h |  3 +--
  gcc/cp/cp-tree.h |  2 +-
  gcc/cp/init.cc   |  2 +-
  gcc/cp/pt.cc | 46 
  gcc/objcp/objcp-lang.cc  |  5 ++---
  6 files changed, 19 insertions(+), 42 deletions(-)

diff --git a/gcc/cp/cp-lang.cc b/gcc/cp/cp-lang.cc
index c3cfde56cc6..a3f29eda0d6 100644
--- a/gcc/cp/cp-lang.cc
+++ b/gcc/cp/cp-lang.cc
@@ -116,8 +116,7 @@ tree
  objcp_tsubst_copy_and_build (tree /*t*/,
 tree /*args*/,
 tsubst_flags_t /*complain*/,
-tree /*in_decl*/,
-bool /*function_p*/)
+tree /*in_decl*/)
  {
return NULL_TREE;
  }
diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index 1a67f14d9b3..f4ba0c9e012 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -24,8 +24,7 @@ along with GCC; see the file COPYING3.  If not see
  /* In cp/objcp-common.c, cp/cp-lang.cc and objcp/objcp-lang.cc.  */
  
  extern tree cp_get_debug_type (const_tree);
-extern tree objcp_tsubst_copy_and_build (tree, tree, tsubst_flags_t,
-tree, bool);
+extern tree objcp_tsubst_copy_and_build (tree, tree, tsubst_flags_t, tree);
  
  extern int cp_decl_dwarf_attribute (const_tree, int);
  extern int cp_type_dwarf_attribute (const_tree, int);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index d13bb3d4c0e..40fd2e1ebb9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7383,7 +7383,7 @@ extern tree tsubst_default_argument   (tree, int, tree, tree,
 tsubst_flags_t);
  extern tree tsubst (tree, tree, tsubst_flags_t, tree);
  extern tree tsubst_copy_and_build (tree, tree, tsubst_flags_t,
-tree, bool = false, bool = false);
+tree, bool = false);
  extern tree tsubst_expr (tree, tree, tsubst_flags_t,
   tree, bool);
  extern tree tsubst_pack_expansion (tree, tree, tsubst_flags_t, tree);
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 3d5d3904944..fee49090de7 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -622,7 +622,7 @@ get_nsdmi (tree 

Re: [PATCH] c++: Implement C++23 P2589R1 - static operator[]

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/10/22 21:40, Jakub Jelinek wrote:

Hi!

As stage1 is very close, here is a patch that implements the static
operator[] paper.
One thing that doesn't work properly is the same problem as I've filed
yesterday for static operator() - PR107624 - that side-effects of
the postfix-expression on which the call or subscript operator are
applied are thrown away, I assume we have to add them into COMPOUND_EXPR
somewhere after we find out that we've chosen a static member function
operator.


Indeed.  The code in build_new_method_call for this case has the comment

  /* In an expression of the form `a->f()' where `f' turns
     out to be a static member function, `a' is
     none-the-less evaluated.  */
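The rule in that comment can be shown with an ordinary static member function. The sketch below uses the hypothetical names W, get_w, calls and call_through_object. By the same rule, a static operator() or operator[] applied to get_w () still needs get_w () to be evaluated; that lost side effect is what PR107624 is about.

```cpp
#include <cassert>

// Hypothetical type with a static member function.
struct W
{
  static int f () { return 42; }
};

static int calls = 0;

// Object expression with a visible side effect.
W *get_w ()
{
  static W w;
  ++calls;
  return &w;
}

// Calls a static member through an object expression: get_w () is still
// evaluated, so CALLS is incremented even though f uses no object.
int call_through_object ()
{
  return get_w ()->f ();
}
```

This is why a fix would need to keep the postfix-expression's side effects, for instance by wrapping them in a COMPOUND_EXPR as the quoted mail suggests, rather than discarding the object expression.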


Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
provided the paper gets voted into C++23?


OK.


2022-11-11  Jakub Jelinek  

gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Bump C++23
__cpp_multidimensional_subscript macro value to 202211L.
gcc/cp/
* decl.cc (grok_op_properties): Implement C++23 P2589R1
- static operator[].  Handle operator[] similarly to operator()
- allow static member functions, but pedwarn on it for C++20 and
older.  Unlike operator(), perform rest of checks on it though for
C++20.
* call.cc (add_operator_candidates): For operator[] with class
typed first parameter, pass that parameter as first_arg and
an adjusted arglist without that parameter.
gcc/testsuite/
* g++.dg/cpp23/subscript9.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Expect a newer
__cpp_multidimensional_subscript value.
* g++.old-deja/g++.bugs/900210_10.C: Don't expect an error
for C++23 or later.

--- gcc/c-family/c-cppbuiltin.cc.jj 2022-10-14 09:35:56.182990495 +0200
+++ gcc/c-family/c-cppbuiltin.cc  2022-11-10 22:29:12.539832741 +0100
@@ -1075,7 +1075,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
  cpp_define (pfile, "__cpp_constexpr=202110L");
- cpp_define (pfile, "__cpp_multidimensional_subscript=202110L");
+ cpp_define (pfile, "__cpp_multidimensional_subscript=202211L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
  cpp_define (pfile, "__cpp_static_call_operator=202207L");
  cpp_define (pfile, "__cpp_implicit_move=202207L");
--- gcc/cp/decl.cc.jj   2022-11-08 09:54:37.313400209 +0100
+++ gcc/cp/decl.cc  2022-11-10 21:26:06.891359343 +0100
@@ -15377,7 +15377,15 @@ grok_op_properties (tree decl, bool comp
   an enumeration, or a reference to an enumeration.  13.4.0.6 */
if (! methodp || DECL_STATIC_FUNCTION_P (decl))
  {
-  if (operator_code == CALL_EXPR)
+  if (operator_code == TYPE_EXPR
+ || operator_code == COMPONENT_REF
+ || operator_code == NOP_EXPR)
+   {
+ error_at (loc, "%qD must be a non-static member function", decl);
+ return false;
+   }
+
+  if (operator_code == CALL_EXPR || operator_code == ARRAY_REF)
{
  if (! DECL_STATIC_FUNCTION_P (decl))
{
@@ -15386,52 +15394,41 @@ grok_op_properties (tree decl, bool comp
}
  if (cxx_dialect < cxx23
  /* For lambdas we diagnose static lambda specifier elsewhere.  */
- && ! LAMBDA_FUNCTION_P (decl)
+ && (operator_code == ARRAY_REF || ! LAMBDA_FUNCTION_P (decl))
  /* For instantiations, we have diagnosed this already.  */
  && ! DECL_USE_TEMPLATE (decl))
pedwarn (loc, OPT_Wc__23_extensions, "%qD may be a static member "
- "function only with %<-std=c++23%> or %<-std=gnu++23%>", decl);
- /* There are no further restrictions on the arguments to an
-overloaded "operator ()".  */
- return true;
-   }
-  if (operator_code == TYPE_EXPR
- || operator_code == COMPONENT_REF
- || operator_code == ARRAY_REF
- || operator_code == NOP_EXPR)
-   {
- error_at (loc, "%qD must be a non-static member function", decl);
- return false;
+"function only with %<-std=c++23%> or %<-std=gnu++23%>",
+decl);
}
-
-  if (DECL_STATIC_FUNCTION_P (decl))
+  else if (DECL_STATIC_FUNCTION_P (decl))
{
  error_at (loc, "%qD must be either a non-static member "
"function or a non-member function", decl);
  return false;
}
-
-  for (tree arg = argtypes; ; arg = TREE_CHAIN (arg))
-   {
- if (!arg || arg == void_list_node)
-   {
- if (complain)
-   error_at(loc, "%qD must have an argument of class or "
-"enumerated type", decl);
- return false;
-   }
+  else

[PATCH] ira: Remove duplicate `memset' over `full_costs' from `assign_hard_reg'

2022-11-14 Thread Maciej W. Rozycki
Remove duplicate clearing of `full_costs' made in `assign_hard_reg', 
which has been there since the beginning, i.e. commit 058e97ecf33a
("IRA has been merged into trunk"), 
.

gcc/
* ira-color.cc (assign_hard_reg): Remove duplicate `memset' over 
`full_costs'.
---
Hi,

 I find this fairly obvious, OK to apply?

  Maciej
---
 gcc/ira-color.cc |1 -
 1 file changed, 1 deletion(-)

gcc-ira-assign-hard-reg-full-costs-dup.diff
Index: gcc/gcc/ira-color.cc
===
--- gcc.orig/gcc/ira-color.cc
+++ gcc/gcc/ira-color.cc
@@ -1961,7 +1961,6 @@ assign_hard_reg (ira_allocno_t a, bool r
   aclass = ALLOCNO_CLASS (a);
   class_size = ira_class_hard_regs_num[aclass];
   best_hard_regno = -1;
-  memset (full_costs, 0, sizeof (int) * class_size);
   mem_cost = 0;
   memset (costs, 0, sizeof (int) * class_size);
   memset (full_costs, 0, sizeof (int) * class_size);


Re: [PATCH] c++: init_priority and SUPPORTS_INIT_PRIORITY [PR107638]

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/11/22 08:47, Patrick Palka wrote:

The commit r13-3706-gd0a492faa6478c for correcting the result of
__has_attribute(init_priority) causes a bootstrap failure on hppa64-hpux
because it assumes SUPPORTS_INIT_PRIORITY expands to a simple constant,
but on this target SUPPORTS_INIT_PRIORITY is defined as

   #define SUPPORTS_INIT_PRIORITY (TARGET_GNU_LD ? 1 : 0)

(where TARGET_GNU_LD expands to something in terms of global_options)
which means we can't use this macro to statically exclude the entry
for init_priority when defining the cxx_attribute_table.

So instead of trying to exclude init_priority from the attribute table
for the sake of __has_attribute, this patch just makes __has_attribute
handle init_priority specially.
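The difference between a preprocessor constant and a value known only at run time is easy to see in isolation. The sketch below uses invented names (options_sketch, global_options_sketch, SUPPORTS_INIT_PRIORITY_SKETCH, has_attribute_sketch); none of them is GCC code. Because the macro reads a variable, it cannot guard a table entry with #if, but it can be tested at run time. The patch does the same in c_common_has_attribute and handle_init_priority_attribute.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for target option state that is only known at run
// time (the mail notes TARGET_GNU_LD expands to something using
// global_options).
struct options_sketch { bool gnu_ld; };
static options_sketch global_options_sketch = { false };

// Not a preprocessor constant: "#if SUPPORTS_INIT_PRIORITY_SKETCH" would
// fail to preprocess, because '.' is not valid in an #if expression.
#define SUPPORTS_INIT_PRIORITY_SKETCH (global_options_sketch.gnu_ld ? 1 : 0)

// Keep the attribute known unconditionally and decide at run time what
// __has_attribute-style queries report.
int has_attribute_sketch (const char *name)
{
  if (std::strcmp (name, "init_priority") == 0)
    return SUPPORTS_INIT_PRIORITY_SKETCH ? 1 : 0;
  return 0;  // other attributes are out of scope for this sketch
}
```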

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Also sanity checked by artificially defining SUPPORTS_INIT_PRIORITY
to 0.


OK.


PR c++/107638

gcc/c-family/ChangeLog:

* c-lex.cc (c_common_has_attribute): Return 1 for init_priority
iff SUPPORTS_INIT_PRIORITY.

gcc/cp/ChangeLog:

* tree.cc (cxx_attribute_table): Don't conditionally exclude
the init_priority entry.
(handle_init_priority_attribute): Remove ATTRIBUTE_UNUSED.
Return error_mark_node if !SUPPORTS_INIT_PRIORITY.
---
  gcc/c-family/c-lex.cc |  9 +
  gcc/cp/tree.cc| 11 +++
  2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index 89c65aca28a..2fe562c7ccf 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -380,6 +380,15 @@ c_common_has_attribute (cpp_reader *pfile, bool std_syntax)
result = 201907;
  else if (is_attribute_p ("assume", attr_name))
result = 202207;
+ else if (is_attribute_p ("init_priority", attr_name))
+   {
+ /* The (non-standard) init_priority attribute is always
+included in the attribute table, but we don't want to
+advertise the attribute unless the target actually
+supports init priorities.  */
+ result = SUPPORTS_INIT_PRIORITY ? 1 : 0;
+ attr_name = NULL_TREE;
+   }
}
  else
{
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index c30bbeb0839..2324c2269fc 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -5010,10 +5010,8 @@ const struct attribute_spec cxx_attribute_table[] =
  {
/* { name, min_len, max_len, decl_req, type_req, fn_type_req,
 affects_type_identity, handler, exclude } */
-#if SUPPORTS_INIT_PRIORITY
{ "init_priority",  1, 1, true,  false, false, false,
  handle_init_priority_attribute, NULL },
-#endif
{ "abi_tag", 1, -1, false, false, false, true,
  handle_abi_tag_attribute, NULL },
{ NULL, 0, 0, false, false, false, false, NULL, NULL }
@@ -5041,13 +5039,19 @@ const struct attribute_spec std_attribute_table[] =
  
  /* Handle an "init_priority" attribute; arguments as in
     struct attribute_spec.handler.  */
-ATTRIBUTE_UNUSED static tree
+static tree
  handle_init_priority_attribute (tree* node,
tree name,
tree args,
int /*flags*/,
bool* no_add_attrs)
  {
+  if (!SUPPORTS_INIT_PRIORITY)
+    /* Treat init_priority as an unrecognized attribute (mirroring the
+       result of __has_attribute) if the target doesn't support init
+       priorities.  */
+    return error_mark_node;
+
tree initp_expr = TREE_VALUE (args);
tree decl = *node;
tree type = TREE_TYPE (decl);
@@ -5105,7 +5109,6 @@ handle_init_priority_attribute (tree* node,
 pri);
  }
  
-  gcc_assert (SUPPORTS_INIT_PRIORITY);

SET_DECL_INIT_PRIORITY (decl, pri);
DECL_HAS_INIT_PRIORITY_P (decl) = 1;
return NULL_TREE;




Re: [PATCH] c++: Disable -Wdangling-reference when initing T

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/11/22 10:22, Marek Polacek wrote:

Non-const lvalue references can't bind to a temporary, so the
warning should not be emitted if we're initializing something of that
type.  I'm not disabling the warning when the function itself returns
a non-const lvalue reference, that would regress at least

   const int  = std::any_cast(std::any());

in Wdangling-reference2.C where the any_cast returns an int&.

Unfortunately, this patch means we'll stop diagnosing

   int& fn(int&& x) { return static_cast(x); }
   void test ()
   {
 int  = fn(4);
   }

where there's a genuine dangling reference.  OTOH, the patch
should suppress false positives with iterators, like:

   auto  = *candidates.begin ();

and arguably that's more important than detecting some relatively
obscure cases.  It's probably not worth it making the warning more
complicated by, for instance, not warning when a fn returns 'int&'
but takes 'const int&' (because then it can't return its argument).
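The new filter can be approximated outside the compiler with standard type traits. This is a hypothetical model of the rule described above, not GCC's implementation. `dangling_warning_candidate` is an illustrative name, and the sketch ignores two details: the `std::pair` special case, and the object-versus-function reference distinction that `TYPE_REF_OBJ_P` makes.

```cpp
#include <type_traits>

// Hypothetical model of the patched check: a declaration is only a
// warning candidate if its type is T&& or const T&.  A non-const lvalue
// reference (T&) cannot bind to a temporary, so it is skipped.
// (The std::pair special case is not modelled here.)
template <typename Decl>
constexpr bool dangling_warning_candidate() {
  return std::is_rvalue_reference<Decl>::value ||
         (std::is_lvalue_reference<Decl>::value &&
          std::is_const<typename std::remove_reference<Decl>::type>::value);
}

// These mirror the expectations in Wdangling-reference7.C.
static_assert(dangling_warning_candidate<const int&>(), "r1: warns");
static_assert(!dangling_warning_candidate<int&>(), "r2/r3: no warning");
static_assert(dangling_warning_candidate<int&&>(), "r4/r5: warns");
static_assert(dangling_warning_candidate<const int&&>(), "r6: warns");
```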

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

* call.cc (maybe_warn_dangling_reference): Don't warn when initializing
a non-const lvalue reference.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/elision4.C: Remove dg-warning.
* g++.dg/warn/Wdangling-reference1.C: Turn dg-warning into dg-bogus.
* g++.dg/warn/Wdangling-reference7.C: New test.
---
  gcc/cp/call.cc   | 10 --
  gcc/testsuite/g++.dg/cpp23/elision4.C|  4 ++--
  gcc/testsuite/g++.dg/warn/Wdangling-reference1.C |  4 ++--
  gcc/testsuite/g++.dg/warn/Wdangling-reference7.C | 16 
  4 files changed, 28 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference7.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index bd3b64a7e26..ef618d5c485 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13679,8 +13679,14 @@ maybe_warn_dangling_reference (const_tree decl, tree init)
  {
if (!warn_dangling_reference)
  return;
-  if (!(TYPE_REF_OBJ_P (TREE_TYPE (decl))
-        || std_pair_ref_ref_p (TREE_TYPE (decl))))
+  tree type = TREE_TYPE (decl);
+  /* Only warn if what we're initializing has type T&& or const T&, or
+     std::pair.  (A non-const lvalue reference can't
+     bind to a temporary.)  */
+  if (!((TYPE_REF_OBJ_P (type)
+         && (TYPE_REF_IS_RVALUE (type)
+             || CP_TYPE_CONST_P (TREE_TYPE (type))))
+        || std_pair_ref_ref_p (type)))
  return;
/* Don't suppress the diagnostic just because the call comes from
   a system header.  If the DECL is not in a system header, or if
diff --git a/gcc/testsuite/g++.dg/cpp23/elision4.C b/gcc/testsuite/g++.dg/cpp23/elision4.C
index d39053ad741..77dcffcdaad 100644
--- a/gcc/testsuite/g++.dg/cpp23/elision4.C
+++ b/gcc/testsuite/g++.dg/cpp23/elision4.C
@@ -34,6 +34,6 @@ T& temporary2(T&& x) { return static_cast(x); }
  void
  test ()
  {
-  int& r1 = temporary1 (42); // { dg-warning "dangling reference" }
-  int& r2 = temporary2 (42); // { dg-warning "dangling reference" }
+  int& r1 = temporary1 (42);
+  int& r2 = temporary2 (42);
  }
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference1.C b/gcc/testsuite/g++.dg/warn/Wdangling-reference1.C
index 97c81ee716c..1718c28165e 100644
--- a/gcc/testsuite/g++.dg/warn/Wdangling-reference1.C
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference1.C
@@ -139,6 +139,6 @@ struct Y {
  // x1 = Y::operator int&& (_EXPR )
  int&& x1 = Y(); // { dg-warning "dangling reference" }
  int&& x2 = Y{}; // { dg-warning "dangling reference" }
-int& x3 = Y(); // { dg-warning "dangling reference" }
-int& x4 = Y{}; // { dg-warning "dangling reference" }
+int& x3 = Y(); // { dg-bogus "dangling reference" }
+int& x4 = Y{}; // { dg-bogus "dangling reference" }
  const int& t1 = Y().foo(10); // { dg-warning "dangling reference" }
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference7.C b/gcc/testsuite/g++.dg/warn/Wdangling-reference7.C
new file mode 100644
index 000..4b0de2d8670
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference7.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wdangling-reference" }
+
+int& ref(const int&);
+int&& rref(const int&);
+
+void
+g ()
+{
+  const int& r1 = ref (1); // { dg-warning "dangling reference" }
+  int& r2 = ref (2); // { dg-bogus "dangling reference" }
+  auto& r3 = ref (3); // { dg-bogus "dangling reference" }
+  int&& r4 = rref (4); // { dg-warning "dangling reference" }
+  auto&& r5 = rref (5); // { dg-warning "dangling reference" }
+  const int&& r6 = rref (6); // { dg-warning "dangling reference" }
+}

base-commit: 0a7b437ca71e2721e9bcf070762fc54ef7991aeb




Re: [PATCH] c++: Add testcase for DR 2604

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/14/22 01:43, Jakub Jelinek wrote:

Hi!

Working virtually out of Baker Island.

As the following testcase shows, I think we don't inherit template's
attributes into specializations.

Tested on x86_64-linux, ok for trunk?


OK.


2022-11-13  Jakub Jelinek  

* g++.dg/DRs/dr2604.C: New test.

--- gcc/testsuite/g++.dg/DRs/dr2604.C.jj	2022-11-13 23:39:45.725712300 -1200
+++ gcc/testsuite/g++.dg/DRs/dr2604.C   2022-11-13 23:39:38.712807673 -1200
@@ -0,0 +1,53 @@
+// DR 2604 - Attributes for an explicit specialization.
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-parameter" }
+
+template
+[[noreturn]] void
+foo ([[maybe_unused]] int i)
+{
+  for (;;);
+}
+
+template<>
+void
+foo (int i) // { dg-warning "unused parameter 'i'" }
+{
+}
+
+template
+void
+bar (int i)// { dg-warning "unused parameter 'i'" }
+{
+}
+
+template<>
+[[noreturn]] void
+bar ([[maybe_unused]] int i)
+{
+  for (;;);
+}
+
+[[noreturn]] void
+baz ()
+{
+  foo (0);
+}
+
+[[noreturn]] void
+qux ()
+{
+  foo (0);
+}  // { dg-warning "'noreturn' function does return" }
+
+[[noreturn]] void
+garply ()
+{
+  bar (0);
+}  // { dg-warning "'noreturn' function does return" }
+
+[[noreturn]] void
+corge ()
+{
+  bar (0);
+}

Jakub





Re: [PATCH] c++: P2448 - Relaxing some constexpr restrictions [PR106649]

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/9/22 10:53, Marek Polacek wrote:

This patch implements C++23 P2448, which lifts more restrictions on the
constexpr keyword.  It's effectively going the way of being just a hint
(hello, inline!).

The gist is relatively simple: in C++23, a constexpr function's return
type/parameter type doesn't have to be a literal type; and you can have
a constexpr function for which no invocation satisfies the requirements
of a core constant expression.  For example,

   void f(int& i); // not constexpr

   constexpr void g(int& i) {
 f(i); // unconditionally calls a non-constexpr function
   }

is now OK, even though there isn't an invocation of 'g' that would be
a constant expression.  Maybe 'f' will be made constexpr soon, or maybe
this depends on the version of C++ used, and similar.  The patch is
unfortunately not that trivial.  The important bit is to use the new
require_potential_rvalue_constant_expression_fncheck in
maybe_save_constexpr_fundef (and where appropriate).  It has a new flag
that says that we're checking the body of a constexpr function, and in
that case it's OK to find constructs that aren't a constant expression.

Since it's useful to be able to check for problematic constructs even
in C++23, this patch implements a new warning, -Winvalid-constexpr,
which is a pedwarn turned on by default in C++20 and earlier, and which
can be turned on in C++23 as well, in which case it's an ordinary warning.
This I implemented by using the new function constexpr_error, used in
p_c_e_1 and friends.  (In some cases I believe fundef_p will always be
false (= hard error), but it made sense to me to be consistent and use
constexpr_error throughout p_c_e_1.)

While working on this I think I found a bug, see constexpr-nonlit15.C
and .  This patch doesn't address that.

I also don't love that in C++23, if you don't use -Winvalid-constexpr,
and call a constexpr function that in fact isn't constexpr-ready yet,
sometimes all you get is an error saying "called in a constant expression"
like in constexpr-nonlit12.C.  This could be remedied by some tweaks to
explain_invalid_constexpr_fn, I reckon (it gives up on !DECL_DEFAULTED_FN).


And also in maybe_save_constexpr_fundef: if -Wno-invalid-constexpr,
"complain" should be false so we save the definition for
explain_invalid_constexpr_fn to refer to.



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/106649

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Update value of __cpp_constexpr for
C++23.
* c-opts.cc (c_common_post_options): Set warn_invalid_constexpr
depending on cxx_dialect.
* c.opt (Winvalid-constexpr): New option.

gcc/cp/ChangeLog:

* constexpr.cc (constexpr_error): New function.
(is_valid_constexpr_fn): Use constexpr_error.
(maybe_save_constexpr_fundef): Call
require_potential_rvalue_constant_expression_fncheck rather than
require_potential_rvalue_constant_expression.
(non_const_var_error): Add a bool parameter.  Use constexpr_error.
(inline_asm_in_constexpr_error): Likewise.
(cxx_eval_constant_expression): Adjust calls to non_const_var_error
and inline_asm_in_constexpr_error.
(potential_constant_expression_1): Add a bool parameter.  Use
constexpr_error.
(require_potential_rvalue_constant_expression_fncheck): New function.
* cp-tree.h (require_potential_rvalue_constant_expression_fncheck):
Declare.
* method.cc (struct comp_info): Call
require_potential_rvalue_constant_expression_fncheck rather than
require_potential_rvalue_constant_expression.

gcc/ChangeLog:

* doc/gcc/gcc-command-options/option-summary.rst: Add
-Winvalid-constexpr.
* doc/gcc/gcc-command-options/options-controlling-c++-dialect.rst:
Document -Winvalid-constexpr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-ctor2.C: Expect an error in c++20_down only.
* g++.dg/cpp0x/constexpr-default-ctor.C: Likewise.
* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
* g++.dg/cpp0x/constexpr-ex1.C: Likewise.
* g++.dg/cpp0x/constexpr-friend.C: Likewise.
* g++.dg/cpp0x/constexpr-generated1.C: Likewise.
* g++.dg/cpp0x/constexpr-ice5.C: Likewise.
* g++.dg/cpp0x/constexpr-ice6.C: Likewise.
* g++.dg/cpp0x/constexpr-memfn1.C: Likewise.
* g++.dg/cpp0x/constexpr-neg2.C: Likewise.
* g++.dg/cpp0x/constexpr-non-const-arg.C: Likewise.
* g++.dg/cpp0x/constexpr-reinterpret1.C: Likewise.
* g++.dg/cpp0x/pr65327.C: Likewise.
* g++.dg/cpp1y/constexpr-105050.C: Likewise.
* g++.dg/cpp1y/constexpr-89285-2.C: Likewise.
* g++.dg/cpp1y/constexpr-89285.C: Likewise.
* g++.dg/cpp1y/constexpr-89785-2.C: Likewise.
* g++.dg/cpp1y/constexpr-local4.C: Likewise.
* g++.dg/cpp1y/constexpr-neg1.C: Likewise.
* 

Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Philipp Tomsich
On Mon, 14 Nov 2022 at 23:47, Palmer Dabbelt  wrote:
>
> [Trying to join the threads here.]
>
> On Mon, 14 Nov 2022 13:28:23 PST (-0800), philipp.toms...@vrull.eu wrote:
> > Jeff,
> >
> > On Mon, 14 Nov 2022 at 22:23, Jeff Law  wrote:
> >>
> >>
> >> On 11/14/22 13:00, Palmer Dabbelt wrote:
> >> > On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
> >> >>
> >> >> This series provides support for the Ventana VT1 (a 4-way superscalar
> >> >> rv64gc_zba_zbb_zbc_zbs_zifenci_xventanacondops core) including support
> >> >> for the supported instruction fusion patterns.
> >> >>
> >> >> This includes the addition of the fusion-aware scheduling
> >> >> infrastructure for RISC-V and implements idiom recognition for the
> >> >> fusion patterns supported by VT1.
> >> >>
> >> >> Note that we don't signal support for XVentanaCondOps at this point,
> >> >> as the XVentanaCondOps support is in-flight separately. Changing the
> >> >> defaults for VT1 can happen late in the cycle, so no need to link the
> >> >> two different changesets.
> >> >>
> >> >> Changes in v2:
> >> >> - Rebased and changed over to .rst-based documentation
> >> >> - Updated to catch more fusion cases
> >> >> - Signals support for Zifencei
> >> >>
> >> >> Philipp Tomsich (2):
> >> >>   RISC-V: Add basic support for the Ventana-VT1 core
> >> >>   RISC-V: Add instruction fusion (for ventana-vt1)
> >> >>
> >> >>  gcc/config/riscv/riscv-cores.def  |   3 +
> >> >>  gcc/config/riscv/riscv-opts.h |   2 +-
> >> >>  gcc/config/riscv/riscv.cc | 233 ++
> >> >>  .../risc-v-options.rst|   5 +-
> >> >>  4 files changed, 240 insertions(+), 3 deletions(-)
> >> >
> >> > I guess we never really properly talked about this on the GCC mailing
> >> > lists, but IMO it's fine to start taking code for designs that have
> >> > been announced under the assumption that if the hardware doesn't
> >> > actually show up according to those timelines that it will be assumed
> >> > to have never existed and thus be removed more quickly than usual.
> >> Absolutely.   I have zero interest in carrying around code for
> >> nonexistent or dead variants.
> >> >
> >> > That said, I can't find anything describing that the VT-1 exists aside
> >> > from these patches.  Is there anything that describes this design and
> >> > when it's expected to be available?
> >>
> >> What do you need?  I can give some broad overview information on the
> >> design, but it would likely just mirror what's already been mentioned in
> >> these patches.
> >>
> >>
> >> As far as schedules.  I'm not sure what I can say.  I'll check on that.
>
> I'm less worried about the "does this pipeline model match the HW" bits:
> at least until the HW is publicly available, all we can do is rely
> on the vendor (and even after the HW is public the vendor might be the
> only one who cares enough to figure things out; there's nothing we can
> really do upstream there).  We've had some issues with nobody caring
> enough about the C906 pipeline model to sort out whether some patches are
> a net win, but if nobody (including the vendor) cares about the HW enough
> to benchmark things then there's not much we can do.
>
> My bigger worry is getting roped into supporting a bunch of hardware
> that doesn't actually exist yet and may never make it outside some
> vendor's lab.  That can generally be a ton of work, and it filters
> throughout GCC, even outside of the RISC-V backend.  We've already got
> enough chaos just trying to follow the ISA; chasing down issues related
> to hardware that may never manifest is just going to lead to
> craziness.
>
> So on my end the point of the schedule is to have something we can look
> at and determine that the hardware is somehow defunct.  The fairest way
> we could come up with was to tie it to some sort of company announcement
> of the hardware: obviously everyone knows their internal timelines, but
> that's not fair to companies that don't employ someone with commit
> access.  Requiring some sort of public announcement means everyone has
> the same rules to play by, IMO that's really important in RISC-V land as
> there's so many vendors.
>
> >> It was never my intention to bypass any process/procedures here. So if I
> >> did, my apologies.
> >
> > The controversial part is XVentanaCondOps (as it is a vendor-defined
> > extension), so I'll certainly hold off on that until both you and
> > Palmer are in agreement on how to proceed there.
>
> The pipeline models are essentially in the same spot.  We've got a bit
> of a precedent there for taking them just based on an announcement, but
> there isn't one here.
>
> [and the other side of the thread]
>
> On Mon, 14 Nov 2022 13:14:35 PST (-0800), philipp.toms...@vrull.eu wrote:
> > On Mon, 14 Nov 2022 at 21:58, Palmer Dabbelt  wrote:
> >>
> >> On Mon, 14 Nov 2022 12:03:38 PST (-0800), philipp.toms...@vrull.eu wrote:
> >> > On Mon, 14 Nov 2022 at 

[committed] wwwdocs: gcc-13: Add release notes for more C23 features

2022-11-14 Thread Joseph Myers
diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 41d07e57..d033628b 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -112,9 +112,41 @@ a work-in-progress.
 
   https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3042.htm;>N3042,
  Introduce the nullptr constant
+  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2963.htm;>N2963,
+  Enhanced Enumerations (fixed underlying types)
+  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2975.pdf;>N2975,
+   Relax requirements for variadic parameter lists
+  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3007.htm;>N3007,
+  Type inference for object definitions (auto)
+  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3018.htm;>N3018,
+   The constexpr specifier for object definitions
+  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3038.htm;>N3038,
+   Introduce storage-class specifiers for compound literals
+  typeof (previously supported as an extension)
+   and typeof_unqual
+  New
+   keywords alignas, alignof, bool,
+   false, static_assert, thread_local,
+   true
+  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2764.pdf;>N2764,
+  The noreturn attribute
   Support for empty initializer braces
+  __STDC_VERSION_*_H__ header version macros
+  Removal of ATOMIC_VAR_INIT
+  unreachable macro
+  in stddef.h
+  Removal of trigraphs
+  Removal of unprototyped functions
+  printf and scanf format checking
+  with -Wformat for %wN
+  and %wfN format length modifiers
+  https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2836.pdf;>N2836,
+   Identifier Syntax using Unicode Standard Annex 31
 
   
+  In addition to those C23 features, existing features adopted in
+  C23 have been adjusted to follow C23 requirements and are not diagnosed
+  with -std=c2x -Wpedantic.
   New warnings:
 
   -Wenum-int-mismatch warns about mismatches between an

-- 
Joseph S. Myers
jos...@codesourcery.com


[pushed] c++: only declare satisfied friends

2022-11-14 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- >8 --

A friend declaration can only have constraints if it is defined.  If
multiple instantiations of a class template define the same friend function
signature, it's an error, but that shouldn't happen if it's constrained to
only be declared in one instantiation.

Currently we don't mangle requirements, so the foos all mangle the same and
actually instantiating #1 will break, but for now we can test that they're
considered distinct.

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_function): Check satisfaction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-friend11.C: New test.
---
 gcc/cp/pt.cc  |  3 +++
 .../g++.dg/cpp2a/concepts-friend11.C  | 21 +++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend11.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 57917de321f..af96c5ca25f 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11284,6 +11284,9 @@ tsubst_friend_function (tree decl, tree args)
  not_tmpl = DECL_TEMPLATE_RESULT (new_friend);
  new_friend_result_template_info = DECL_TEMPLATE_INFO (not_tmpl);
}
+  else if (!constraints_satisfied_p (new_friend))
+   /* Only define a constrained hidden friend when satisfied.  */
+   return error_mark_node;
 
   /* Inside pushdecl_namespace_level, we will push into the
 current namespace. However, the friend function should go
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-friend11.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-friend11.C
new file mode 100644
index 000..0350ac3553e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-friend11.C
@@ -0,0 +1,21 @@
+// CWG2596
+// { dg-do compile { target c++20 } }
+
+struct Base {};
+
+int foo(Base&) { return 0; } // #0
+
+template
+struct S : Base {
+  friend int foo(Base&) requires (N == 1) { return 1; }  // #1
+  // friend int foo(Base&) requires (N == 2) { return 3; }  // #2
+};
+
+S<1> s1;
+S<2> s2;  // OK, no conflict between #1 and #0
+int x = foo(s1);  // { dg-error "ambiguous" }
+int y = foo(s2);  // OK, selects #0
+
+// ??? currently the foos all mangle the same, so comment out #2
+// and only test that #1 isn't multiply defined and overloads with #0.
+// The 2596 example does not include #0 and expects both calls to work.

base-commit: e7c12a921525b2aa27ca4814c42c63d61a6d954e
-- 
2.31.1



Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Palmer Dabbelt

[Trying to join the threads here.]

On Mon, 14 Nov 2022 13:28:23 PST (-0800), philipp.toms...@vrull.eu wrote:

Jeff,

On Mon, 14 Nov 2022 at 22:23, Jeff Law  wrote:



On 11/14/22 13:00, Palmer Dabbelt wrote:
> On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
>>
>> This series provides support for the Ventana VT1 (a 4-way superscalar
>> rv64gc_zba_zbb_zbc_zbs_zifenci_xventanacondops core) including support
>> for the supported instruction fusion patterns.
>>
>> This includes the addition of the fusion-aware scheduling
>> infrastructure for RISC-V and implements idiom recognition for the
>> fusion patterns supported by VT1.
>>
>> Note that we don't signal support for XVentanaCondOps at this point,
>> as the XVentanaCondOps support is in-flight separately. Changing the
>> defaults for VT1 can happen late in the cycle, so no need to link the
>> two different changesets.
>>
>> Changes in v2:
>> - Rebased and changed over to .rst-based documentation
>> - Updated to catch more fusion cases
>> - Signals support for Zifencei
>>
>> Philipp Tomsich (2):
>>   RISC-V: Add basic support for the Ventana-VT1 core
>>   RISC-V: Add instruction fusion (for ventana-vt1)
>>
>>  gcc/config/riscv/riscv-cores.def  |   3 +
>>  gcc/config/riscv/riscv-opts.h |   2 +-
>>  gcc/config/riscv/riscv.cc | 233 ++
>>  .../risc-v-options.rst|   5 +-
>>  4 files changed, 240 insertions(+), 3 deletions(-)
>
> I guess we never really properly talked about this on the GCC mailing
> lists, but IMO it's fine to start taking code for designs that have
> been announced under the assumption that if the hardware doesn't
> actually show up according to those timelines that it will be assumed
> to have never existed and thus be removed more quickly than usual.
Absolutely.   I have zero interest in carrying around code for
nonexistent or dead variants.
>
> That said, I can't find anything describing that the VT-1 exists aside
> from these patches.  Is there anything that describes this design and
> when it's expected to be available?

What do you need?  I can give some broad overview information on the
design, but it would likely just mirror what's already been mentioned in
these patches.


As far as schedules.  I'm not sure what I can say.  I'll check on that.


I'm less worried about the "does this pipeline model match the HW" bits:
at least until the HW is publicly available, all we can do is rely
on the vendor (and even after the HW is public the vendor might be the
only one who cares enough to figure things out; there's nothing we can
really do upstream there).  We've had some issues with nobody caring
enough about the C906 pipeline model to sort out whether some patches are
a net win, but if nobody (including the vendor) cares about the HW enough
to benchmark things then there's not much we can do.


My bigger worry is getting roped into supporting a bunch of hardware
that doesn't actually exist yet and may never make it outside some
vendor's lab.  That can generally be a ton of work, and it filters
throughout GCC, even outside of the RISC-V backend.  We've already got
enough chaos just trying to follow the ISA; chasing down issues related
to hardware that may never manifest is just going to lead to
craziness.


So on my end the point of the schedule is to have something we can look 
at and determine that the hardware is somehow defunct.  The fairest way 
we could come up with was to tie it to some sort of company announcement 
of the hardware: obviously everyone knows their internal timelines, but 
that's not fair to companies that don't employ someone with commit 
access.  Requiring some sort of public announcement means everyone has
the same rules to play by, IMO that's really important in RISC-V land as 
there's so many vendors.



It was never my intention to bypass any process/procedures here. So if I
did, my apologies.


The controversial part is XVentanaCondOps (as it is a vendor-defined
extension), so I'll certainly hold off on that until both you and
Palmer are in agreement on how to proceed there.


The pipeline models are essentially in the same spot.  We've got a bit 
of a precedent there for taking them just based on an announcement, but 
there isn't one here.


[and the other side of the thread]

On Mon, 14 Nov 2022 13:14:35 PST (-0800), philipp.toms...@vrull.eu wrote:

On Mon, 14 Nov 2022 at 21:58, Palmer Dabbelt  wrote:


On Mon, 14 Nov 2022 12:03:38 PST (-0800), philipp.toms...@vrull.eu wrote:
> On Mon, 14 Nov 2022 at 21:00, Palmer Dabbelt  wrote:
>>
>> On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
>> >
>> > This series provides support for the Ventana VT1 (a 4-way superscalar
>> > rv64gc_zba_zbb_zbc_zbs_zifenci_xventanacondops core) including support
>> > for the supported instruction fusion patterns.
>> >
>> > This includes the addition of the fusion-aware scheduling
>> > 

Re: Revert Sphinx documentation [Was: Issues with Sphinx]

2022-11-14 Thread Gerald Pfeifer
On Mon, 14 Nov 2022, Jonathan Wakely wrote:
> I formatted my new region/endregion pragmas on one line because that
> seemed to be how it should be done for rSt, e.g. we had:
> 
> ``#pragma GCC push_options`` ``#pragma GCC pop_options``
> 
> But I think the attached patch is more correct for how we document
> pragmas in texinfo.
> 
> OK for trunk?

Looks good to my eyes. Thank you for caring for such details, Jonathan!

Gerald


[committed] ira: Fix `create_insn_allocnos' `outer' parameter documentation

2022-11-14 Thread Maciej W. Rozycki
The parameter of `create_insn_allocnos' for any parent expression of `x' 
has always been called `outer' rather than `parent', just as added by 
commit d1bb282efbf9 ("Fix for "FAIL: tmpdir-gcc.dg-struct-layout-1/t028 
c_compat_x_tst.o compile, (internal compiler error)""), 
.  Correct 
inline documentation accordingly.

gcc/
* ira-build.cc (create_insn_allocnos): Fix documentation.
---
Hi,

 Committed as obvious.

  Maciej
---
 gcc/ira-build.cc |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

gcc-ira-create-insn-allocnos-outer-doc.diff
Index: gcc/gcc/ira-build.cc
===
--- gcc.orig/gcc/ira-build.cc
+++ gcc/gcc/ira-build.cc
@@ -1832,7 +1832,7 @@ static basic_block curr_bb;
 
 /* This recursive function creates allocnos corresponding to
pseudo-registers containing in X.  True OUTPUT_P means that X is
-   an lvalue.  PARENT corresponds to the parent expression of X.  */
+   an lvalue.  OUTER corresponds to the parent expression of X.  */
 static void
 create_insn_allocnos (rtx x, rtx outer, bool output_p)
 {


Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-11-14 Thread Richard Sandiford via Gcc-patches
(Sorry, immediately following up to myself for a second time recently.)

Richard Sandiford  writes:
> Tamar Christina  writes:
>>> 
>>> The same thing ought to work for smov, so it would be good to do both.
>>> That would also make the split between the original and new patterns more
>>> obvious: left shift for the old pattern, right shift for the new pattern.
>>> 
>>
>> Done, though because umov can do multilevel extensions I couldn't combine
>> them into a single pattern.
>
> Hmm, but the pattern is:
>
> (define_insn "*si3_insn2_uxtw"
>   [(set (match_operand:GPI 0 "register_operand" "=r,r,r")
>   (zero_extend:GPI (LSHIFTRT_ONLY:SI
> (match_operand:SI 1 "register_operand" "w,r,r")
> (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]
>
> GPI is just SI or DI, so in the SI case we're zero-extending SI to SI,
> which isn't a valid operation.  The original patch was just for extending
> to DI, which seems correct.  The choice between printing %x for smov and
> %w for umov can then depend on the code.

My original comment quoted above was about using smov in the zero-extend
pattern.  I.e. the original:

(define_insn "*si3_insn2_uxtw"
  [(set (match_operand:DI 0 "register_operand" "=r,?r,r")
(zero_extend:DI (LSHIFTRT:SI
 (match_operand:SI 1 "register_operand" "w,r,r")
 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]

could instead be:

(define_insn "*si3_insn2_uxtw"
  [(set (match_operand:DI 0 "register_operand" "=r,?r,r")
(zero_extend:DI (SHIFTRT:SI
 (match_operand:SI 1 "register_operand" "w,r,r")
 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]

with the pattern using "smov %w0, ..." for the ashiftrt case.
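To show why a lane move can stand in for these right shifts, here is a hypothetical scalar model. Lane 0 is the least significant element, and the function names are made up for the sketch. An arithmetic right shift of a 32-bit value by 16 or 24 (the `size == 16` and `size == 8` cases in the quoted switch) equals sign-extending halfword lane 1 or byte lane 3, which is what `smov` does. A logical shift corresponds to `umov`. For 64-bit (DI) values the analogous lanes are `h[3]` and `b[7]`.

```cpp
#include <cstdint>

// Scalar models of SMOV/UMOV lane extracts (lane 0 = least significant).
// Conversions to narrower signed types rely on two's complement wrapping,
// which GCC guarantees and C++20 mandates.
constexpr std::int32_t smov_w_h1(std::uint32_t x) {
  return static_cast<std::int16_t>(x >> 16);   // smov w0, v1.h[1]
}
constexpr std::int32_t smov_w_b3(std::uint32_t x) {
  return static_cast<std::int8_t>(x >> 24);    // smov w0, v1.b[3]
}
constexpr std::uint32_t umov_w_h1(std::uint32_t x) {
  return static_cast<std::uint16_t>(x >> 16);  // umov w0, v1.h[1]
}
constexpr std::int64_t smov_x_h3(std::uint64_t x) {
  return static_cast<std::int16_t>(x >> 48);   // smov x0, v1.h[3]
}
constexpr std::int64_t smov_x_b7(std::uint64_t x) {
  return static_cast<std::int8_t>(x >> 56);    // smov x0, v1.b[7]
}

// The shifts the patterns match.  Right-shifting a negative value is
// implementation-defined before C++20; GCC shifts arithmetically.
constexpr std::int32_t asr32(std::int32_t x, int n) { return x >> n; }
constexpr std::int64_t asr64(std::int64_t x, int n) { return x >> n; }

static_assert(asr32(static_cast<std::int32_t>(0x80010000u), 16) ==
                  smov_w_h1(0x80010000u), "asr #16 == smov h[1]");
static_assert(asr32(static_cast<std::int32_t>(0xF0000000u), 24) ==
                  smov_w_b3(0xF0000000u), "asr #24 == smov b[3]");
static_assert((0x80010000u >> 16) == umov_w_h1(0x80010000u),
              "lsr #16 == umov h[1]");
```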

Thanks,
Richard

>
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64.md (*si3_insn_uxtw): Split SHIFT into
>>  left and right ones.
>>  (*aarch64_ashr_sisd_or_int_3, *si3_insn2_sxtw): Support
>>  smov.
>>  * config/aarch64/constraints.md (Usl): New.
>>  * config/aarch64/iterators.md (LSHIFTRT_ONLY, ASHIFTRT_ONLY): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/aarch64/shift-read_1.c: New test.
>>  * gcc.target/aarch64/shift-read_2.c: New test.
>>  * gcc.target/aarch64/shift-read_3.c: New test.
>>
>> --- inline copy of patch ---
>>
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index c333fb1f72725992bb304c560f1245a242d5192d..2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -5370,20 +5370,42 @@ (define_split
>>  
>>  ;; Arithmetic right shift using SISD or Integer instruction
>>  (define_insn "*aarch64_ashr_sisd_or_int_3"
>> -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,,")
>> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r,,")
>>  (ashiftrt:GPI
>> -  (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
>> +  (match_operand:GPI 1 "register_operand" "r,r,w,w,w,w")
>>(match_operand:QI 2 "aarch64_reg_or_shift_imm_di"
>> -   "Us,r,Us,w,0")))]
>> +   "Us,r,Us,Usl,w,0")))]
>>""
>> -  "@
>> -   asr\t%0, %1, %2
>> -   asr\t%0, %1, %2
>> -   sshr\t%0, %1, %2
>> -   #
>> -   #"
>> -  [(set_attr "type" "bfx,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")
>> -   (set_attr "arch" "*,*,simd,simd,simd")]
>> +  {
>> +    switch (which_alternative)
>> +    {
>> +      case 0:
>> +        return "asr\t%0, %1, %2";
>> +      case 1:
>> +        return "asr\t%0, %1, %2";
>> +      case 2:
>> +        return "sshr\t%0, %1, %2";
>> +      case 3:
>> +        {
>> +          int val = INTVAL (operands[2]);
>> +          int size = 32 - val;
>> +
>> +          if (size == 16)
>> +            return "smov\\t%w0, %1.h[1]";
>> +          if (size == 8)
>> +            return "smov\\t%w0, %1.b[3]";
>
> This only looks right for SI, not DI.  (But we can do something
> similar for DI.)
>
> Thanks,
> Richard
>
>> +  gcc_unreachable ();
>> +}
>> +  case 4:
>> +return "#";
>> +  case 5:
>> +return "#";
>> +  default:
>> +gcc_unreachable ();
>> +}
>> +  }
>> +  [(set_attr "type" "bfx,shift_reg,neon_shift_imm,neon_to_gp, 
>> neon_shift_reg,neon_shift_reg")
>> +   (set_attr "arch" "*,*,simd,simd,simd,simd")]
>>  )
>>  
>>  (define_split
>> @@ -5493,7 +5515,7 @@ (define_insn "*rol3_insn"
>>  ;; zero_extend version of shifts
>>  (define_insn "*si3_insn_uxtw"
>>[(set (match_operand:DI 0 "register_operand" "=r,r")
>> -(zero_extend:DI (SHIFT_no_rotate:SI
>> +(zero_extend:DI (SHIFT_arith:SI
>>   (match_operand:SI 1 "register_operand" "r,r")
>>   (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"]
>>""
>> @@ -5528,6 +5550,68 @@ (define_insn "*rolsi3_insn_uxtw"
>>[(set_attr "type" "rotate_imm")]
>>  )
>>  
>> 

Re: [PATCH] [range-ops] Implement sqrt.

2022-11-14 Thread Joseph Myers
On Sun, 13 Nov 2022, Jakub Jelinek via Gcc-patches wrote:

> So, I wonder if we don't need to add a target hook where targets will be
> able to provide upper bound on error for floating point functions for
> different floating point modes and some way to signal unknown accuracy/can't
> be trusted, in which case we would give up or return just the range for
> VARYING.

Note that the figures given in the glibc manual are purely empirical 
(largest errors observed for inputs in the glibc testsuite on a system 
that was then used to update the libm-test-ulps files); they don't 
constitute any kind of guarantee about either the current implementation 
or the API, nor are they formally verified, nor do they come from 
exhaustive testing (though worst cases from exhaustive testing for float 
may have been added to the glibc testsuite in some cases).  (I think the 
only functions known to give huge errors for some inputs, outside of any 
IBM long double issues, are the Bessel functions and cpow functions.  But 
even if other functions don't have huge errors, and some 
architecture-specific implementations might have issues, there are 
certainly some cases where errors can exceed the 9ulp threshold on what 
the libm tests will accept in libm-test-ulps files, which are thus 
considered glibc bugs.  (That's 9ulp from the correctly rounded value, 
computed in ulp of that value.  For IBM long double it's 16ulp instead, 
treating the format as having a fixed 106 bits of precision.  Both figures 
are empirical ones chosen based on what bounds sufficed for most libm 
functions some years ago; ideally, with better implementations of some 
functions we could probably bring those numbers down.))
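To make "computed in ulp of that value" concrete, here is a simplified C sketch. It assumes IEEE binary64 doubles, and it takes the ulp as the gap from the correctly rounded value to the next double above it. `next_up` and `ulp_error` are hypothetical helpers, not the definition glibc's libm tests use, which handles more cases.

```c
#include <stdint.h>
#include <string.h>

/* Next representable double above X.  Assumes X is finite and positive,
   so adding one to its IEEE binary64 bit pattern gives the next value up.  */
static double next_up (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits++;
  memcpy (&x, &bits, sizeof bits);
  return x;
}

/* Error of COMPUTED in ulps of CORRECT (the correctly rounded result).
   Simplification: the ulp is the distance from CORRECT to the next double
   above it.  */
static double ulp_error (double computed, double correct)
{
  double ulp = next_up (correct) - correct;
  double diff = computed > correct ? computed - correct : correct - computed;
  return diff / ulp;
}
```

Under this definition, a result nine representable doubles above 3.0 measures exactly 9ulp.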

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-11-14 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> 
>> The same thing ought to work for smov, so it would be good to do both.
>> That would also make the split between the original and new patterns more
>> obvious: left shift for the old pattern, right shift for the new pattern.
>> 
>
> Done, though because umov can do multi-level extensions I couldn't combine them
> into a single pattern.

Hmm, but the pattern is:

(define_insn "*si3_insn2_uxtw"
  [(set (match_operand:GPI 0 "register_operand" "=r,r,r")
(zero_extend:GPI (LSHIFTRT_ONLY:SI
  (match_operand:SI 1 "register_operand" "w,r,r")
  (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"]

GPI is just SI or DI, so in the SI case we're zero-extending SI to SI,
which isn't a valid operation.  The original patch was just for extending
to DI, which seems correct.  The choice between printing %x for smov and
%w for umov can then depend on the code.
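A small C model shows why the extension target should be DI, and why smov needs %x while umov can use %w. It assumes the AArch64 rule that writing a W register zeroes bits 32-63. The function names are hypothetical stand-ins for the instructions, not GCC code.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of V without implementation-defined casts.  */
static int32_t sext16 (uint32_t v)
{
  return (int32_t) ((v ^ 0x8000u) & 0xffffu) - 0x8000;
}

/* Writing a W register zero-extends the 32-bit result into the X register. */
static uint64_t write_w (uint32_t v)
{
  return v;
}

/* umov wD, vN.h[1]: zero-extend halfword lane 1 (bits 16-31).  */
static uint64_t umov_w_h1 (uint32_t x)
{
  return write_w ((x >> 16) & 0xffffu);
}

/* smov wD, vN.h[1]: sign-extend lane 1 to 32 bits; the W write then
   clears bits 32-63.  */
static uint64_t smov_w_h1 (uint32_t x)
{
  return write_w ((uint32_t) sext16 (x >> 16));
}

/* smov xD, vN.h[1]: sign-extend lane 1 all the way to 64 bits.  */
static uint64_t smov_x_h1 (uint32_t x)
{
  return (uint64_t) (int64_t) sext16 (x >> 16);
}
```

For x = 0x8001abcd, zero_extend:DI (lshiftrt:SI x 16) is 0x8001, which matches `umov_w_h1`. By contrast, sign_extend:DI (ashiftrt:SI x 16) is 0xffffffffffff8001, which only `smov_x_h1` produces. `smov_w_h1` yields 0x00000000ffff8001.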

>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (*si3_insn_uxtw): Split SHIFT into
>   left and right ones.
>   (*aarch64_ashr_sisd_or_int_3, *si3_insn2_sxtw): Support
>   smov.
>   * config/aarch64/constraints.md (Usl): New.
>   * config/aarch64/iterators.md (LSHIFTRT_ONLY, ASHIFTRT_ONLY): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/shift-read_1.c: New test.
>   * gcc.target/aarch64/shift-read_2.c: New test.
>   * gcc.target/aarch64/shift-read_3.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> c333fb1f72725992bb304c560f1245a242d5192d..2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5370,20 +5370,42 @@ (define_split
>  
>  ;; Arithmetic right shift using SISD or Integer instruction
>  (define_insn "*aarch64_ashr_sisd_or_int_3"
> -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,,")
> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r,,")
>   (ashiftrt:GPI
> -   (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
> +   (match_operand:GPI 1 "register_operand" "r,r,w,w,w,w")
> (match_operand:QI 2 "aarch64_reg_or_shift_imm_di"
> -"Us,r,Us,w,0")))]
> +"Us,r,Us,Usl,w,0")))]
>""
> -  "@
> -   asr\t%0, %1, %2
> -   asr\t%0, %1, %2
> -   sshr\t%0, %1, %2
> -   #
> -   #"
> -  [(set_attr "type" 
> "bfx,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")
> -   (set_attr "arch" "*,*,simd,simd,simd")]
> +  {
> +switch (which_alternative)
> +{
> +  case 0:
> + return "asr\t%0, %1, %2";
> +  case 1:
> + return "asr\t%0, %1, %2";
> +  case 2:
> + return "sshr\t%0, %1, %2";
> +  case 3:
> + {
> +   int val = INTVAL (operands[2]);
> +   int size = 32 - val;
> +
> +   if (size == 16)
> + return "smov\\t%w0, %1.h[1]";
> +   if (size == 8)
> + return "smov\\t%w0, %1.b[3]";

This only looks right for SI, not DI.  (But we can do something
similar for DI.)

Thanks,
Richard

> +   gcc_unreachable ();
> + }
> +  case 4:
> + return "#";
> +  case 5:
> + return "#";
> +  default:
> + gcc_unreachable ();
> +}
> +  }
> +  [(set_attr "type" "bfx,shift_reg,neon_shift_imm,neon_to_gp, 
> neon_shift_reg,neon_shift_reg")
> +   (set_attr "arch" "*,*,simd,simd,simd,simd")]
>  )
>  
>  (define_split
> @@ -5493,7 +5515,7 @@ (define_insn "*rol3_insn"
>  ;; zero_extend version of shifts
>  (define_insn "*si3_insn_uxtw"
>[(set (match_operand:DI 0 "register_operand" "=r,r")
> - (zero_extend:DI (SHIFT_no_rotate:SI
> + (zero_extend:DI (SHIFT_arith:SI
>(match_operand:SI 1 "register_operand" "r,r")
>(match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"]
>""
> @@ -5528,6 +5550,68 @@ (define_insn "*rolsi3_insn_uxtw"
>[(set_attr "type" "rotate_imm")]
>  )
>  
> +(define_insn "*si3_insn2_sxtw"
> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,r")
> + (sign_extend:GPI (ASHIFTRT_ONLY:SI
> +   (match_operand:SI 1 "register_operand" "w,r,r")
> +   (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"]
> +  "mode != DImode || satisfies_constraint_Usl (operands[2])"
> +  {
> +switch (which_alternative)
> +{
> +  case 0:
> + {
> +   int val = INTVAL (operands[2]);
> +   int size = 32 - val;
> +
> +   if (size == 16)
> + return "smov\\t%0, %1.h[1]";
> +   if (size == 8)
> + return "smov\\t%0, %1.b[3]";
> +   gcc_unreachable ();
> + }
> +  case 1:
> + return "\\t%0, %1, %2";
> +  case 2:
> + return "\\t%0, %1, %2";
> +  default:
> + gcc_unreachable ();
> +  }
> +  }
> +  [(set_attr "type" "neon_to_gp,bfx,shift_reg")]
> +)
> +
> +(define_insn "*si3_insn2_uxtw"
> +  
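
On the SI-versus-DI point in the review: an arithmetic right shift that leaves 8 or 16 bits reads the topmost byte or halfword lane of the register. That lane is b[3]/h[1] for SI, but b[7]/h[3] for DI. The helper below is a hypothetical sketch of that mapping, plus a C model of the smov read, assuming little-endian lane numbering within the low 64 bits of the vector register.

```c
#include <stdint.h>

/* Hypothetical helper: for "asr" of a MODE_BITS-wide register by SHIFT,
   return the element width an SMOV would read (8 or 16) and store its
   lane index in *LANE; return 0 when no single SMOV applies.  */
static int smov_lane_for_asr (int mode_bits, int shift, int *lane)
{
  int size = mode_bits - shift;     /* bits that survive the shift */
  if (size != 8 && size != 16)
    return 0;
  *lane = mode_bits / size - 1;     /* topmost lane of that width */
  return size;
}

/* Model of "smov xD, vN.<b|h>[LANE]": read lane LANE of width WIDTH from
   the low 64 bits of REG and sign-extend it to 64 bits.  */
static int64_t smov_read (uint64_t reg, int width, int lane)
{
  uint64_t mask = (UINT64_C (1) << width) - 1;
  uint64_t bits = (reg >> (lane * width)) & mask;
  uint64_t sign = UINT64_C (1) << (width - 1);
  return (int64_t) (bits ^ sign) - (int64_t) sign;
}
```

For example, `asr x0, x1, #48` corresponds to lane h[3], and `asr x0, x1, #56` to lane b[7].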

Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 8:28 PM Jeff Law  wrote:

>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > This patch implements expansions for the cmpstrsi and the cmpstrnsi
> > builtins using Zbb instructions (if available).
> > This allows calls to strcmp() and strncmp() to be inlined.
> >
> > The expansion basically emits a peeled comparison sequence (i.e. a peeled
> > comparison loop) which compares XLEN bits per step if possible.
> >
> > The emitted sequence can be controlled by setting the maximum number
> > of compared bytes (-mstring-compare-inline-limit).
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> > prototype.
> >   * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> > macros.
> >   (GEN_EMIT_HELPER2): New helper macros.
> >   (expand_strncmp_zbb_sequence): New function.
> >   (riscv_emit_str_compare_zbb): New function.
> >   (riscv_expand_strn_compare): New function.
> >   * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> > for strn_compare.
> >   (cmpstrsi): Invoke expansion functions for strn_compare.
> >   * config/riscv/riscv.opt: Add new parameter
> > '-mstring-compare-inline-limit'.
>
> Presumably the hybrid inline + out of line approach is to capture the
> fact that most strings compare unequal early, then punt out to the
> library if they don't follow that model?  It looks like we're structured
> for that case by peeling iterations rather than having a fully inlined
> approach.  Just want to confirm...
>

Yes, this was inspired by gcc/config/rs6000/rs6000-string.cc
(e.g. expand_strncmp_gpr_sequence).

The current implementation emits an unrolled loop to process up to N
characters. For longer strings, we hand over to libc to process the
remainder. The handover implies a call overhead and, of course, a
well-optimized str(n)cmp implementation would be beneficial (once we have
the information in user space for ifuncs).
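Here is a rough C model of that hybrid scheme. It compares byte-at-a-time for clarity, whereas the real expansion compares XLEN bits per step. `strncmp_hybrid` and its `limit` parameter are hypothetical; `limit` loosely stands in for -mstring-compare-inline-limit.

```c
#include <stddef.h>
#include <string.h>

/* Compare up to LIMIT bytes "inline" (the peeled part), then hand the
   remainder over to the library strncmp.  Returns <0, 0 or >0.  */
static int strncmp_hybrid (const char *a, const char *b, size_t n,
                           size_t limit)
{
  size_t i;
  for (i = 0; i < n && i < limit; i++)
    {
      unsigned char ca = (unsigned char) a[i];
      unsigned char cb = (unsigned char) b[i];
      if (ca != cb)
        return ca < cb ? -1 : 1;  /* early mismatch: no library call */
      if (ca == '\0')
        return 0;                 /* both strings ended together */
    }
  if (i == n)
    return 0;                     /* all N bytes compared equal */
  return strncmp (a + i, b + i, n - i);  /* handover for the tail */
}
```

Most unequal strings are resolved in the peeled part. Only long common prefixes pay the call overhead.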

We can take this further, but then the following questions pop up:
* how much data processing per loop iteration?
* what about unaligned strings?

Happy to get suggestions/opinions for improvement.


> I was a bit worried about the "readahead" problem that arises when
> reading more than a byte and a NUL is found in the first string.  If
> you're not careful, the readahead of the second string could fault.  But
> it looks like we avoid that by requiring word alignment on both strings.
>

Yes, aligned strings are not affected by the readahead.

I wonder if we should add dynamic alignment checks for cases where the
compiler cannot derive XLEN alignment, so we capture more cases (e.g.
character arrays have a guaranteed alignment of 1, but are often allocated
with a higher actual alignment on the stack).
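Below is a minimal sketch of both ideas, assuming a power-of-two XLEN and a page size that is a multiple of it. `both_aligned` is a hypothetical runtime guard for taking the word-at-a-time path. `load_crosses_page` shows why an aligned word load past the NUL cannot fault: the load never touches a second page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if both pointers are aligned to XLEN_BYTES (a power of two).  */
static bool both_aligned (const void *a, const void *b, size_t xlen_bytes)
{
  return (((uintptr_t) a | (uintptr_t) b) & (xlen_bytes - 1)) == 0;
}

/* True if a LEN-byte load at ADDR touches two pages of PAGE bytes.  A load
   aligned to LEN, where LEN divides PAGE, never does.  */
static bool load_crosses_page (uintptr_t addr, size_t len, size_t page)
{
  return addr / page != (addr + len - 1) / page;
}
```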


> > +
> > +/* Emit a string comparison sequence using Zbb instruction.
> > +
> > +   OPERANDS[0] is the target (result).
> > +   OPERANDS[1] is the first source.
> > +   OPERANDS[2] is the second source.
> > +   If NO_LENGTH is zero, then:
> > +   OPERANDS[3] is the length.
> > +   OPERANDS[4] is the alignment in bytes.
> > +   If NO_LENGTH is nonzero, then:
> > +   OPERANDS[3] is the alignment in bytes.
>
> Ugh.  I guess it's inevitable unless we want to drop the array and pass
> each element individually (in which case we'd pass a NULL_RTX in the
> case we don't have a length argument).
>

I will split the array into individual rtx arguments as suggested.


> I'd like to give others a chance to chime in here.  Everything looks
> sensible here, but I may have missed something.  So give the other
> maintainers a couple days to chime in before committing.
>
>
> Jeff
>
>


Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Philipp Tomsich
Jeff,

On Mon, 14 Nov 2022 at 22:23, Jeff Law  wrote:
>
>
> On 11/14/22 13:00, Palmer Dabbelt wrote:
> > On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
> >>
> >> This series provides support for the Ventana VT1 (a 4-way superscalar
> >> rv64gc_zba_zbb_zbc_zbs_zifenci_xventanacondops core) including support
> >> for the supported instruction fusion patterns.
> >>
> >> This includes the addition of the fusion-aware scheduling
> >> infrastructure for RISC-V and implements idiom recognition for the
> >> fusion patterns supported by VT1.
> >>
> >> Note that we don't signal support for XVentanaCondOps at this point,
> >> as the XVentanaCondOps support is in-flight separately. Changing the
> >> defaults for VT1 can happen late in the cycle, so no need to link the
> >> two different changesets.
> >>
> >> Changes in v2:
> >> - Rebased and changed over to .rst-based documentation
> >> - Updated to catch more fusion cases
> >> - Signals support for Zifencei
> >>
> >> Philipp Tomsich (2):
> >>   RISC-V: Add basic support for the Ventana-VT1 core
> >>   RISC-V: Add instruction fusion (for ventana-vt1)
> >>
> >>  gcc/config/riscv/riscv-cores.def  |   3 +
> >>  gcc/config/riscv/riscv-opts.h |   2 +-
> >>  gcc/config/riscv/riscv.cc | 233 ++
> >>  .../risc-v-options.rst|   5 +-
> >>  4 files changed, 240 insertions(+), 3 deletions(-)
> >
> > I guess we never really properly talked about this on the GCC mailing
> > lists, but IMO it's fine to start taking code for designs that have
> > been announced under the assumption that if the hardware doesn't
> > actually show up according to those timelines that it will be assumed
> > to have never existed and thus be removed more quickly than usual.
> Absolutely.   I have zero interest in carrying around code for
> nonexistent or dead variants.
> >
> > That said, I can't find anything describing that the VT-1 exists aside
> > from these patches.  Is there anything that describes this design and
> > when it's expected to be available?
>
> What do you need?  I can give some broad overview information on the
> design, but it would likely just mirror what's already been mentioned in
> these patches.
>
>
> As far as schedules.  I'm not sure what I can say.  I'll check on that.
>
>
> It was never my intention to bypass any process/procedures here. So if I
> did, my apologies.

The controversial part is XVentanaCondOps (as it is a vendor-defined
extension), so I'll certainly hold off on that until both you and
Palmer are in agreement on how to proceed there.

Thanks,
Philipp.

> jeff
>


Re: [PATCH 1/5] c: Set the locus of the function result decl

2022-11-14 Thread Joseph Myers
On Sun, 13 Nov 2022, Bernhard Reutner-Fischer via Gcc-patches wrote:

> Bootstrapped and regtested on x86_64-unknown-linux with no regressions.
> Ok for trunk?
> 
> Cc: Joseph Myers 
> ---
> gcc/c/ChangeLog:
> 
>   * c-decl.cc (start_function): Set the result decl source
>   location to the location of the typespec.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 13:00, Palmer Dabbelt wrote:
> On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
>>
>> This series provides support for the Ventana VT1 (a 4-way superscalar
>> rv64gc_zba_zbb_zbc_zbs_zifenci_xventanacondops core) including support
>> for the supported instruction fusion patterns.
>>
>> This includes the addition of the fusion-aware scheduling
>> infrastructure for RISC-V and implements idiom recognition for the
>> fusion patterns supported by VT1.
>>
>> Note that we don't signal support for XVentanaCondOps at this point,
>> as the XVentanaCondOps support is in-flight separately. Changing the
>> defaults for VT1 can happen late in the cycle, so no need to link the
>> two different changesets.
>>
>> Changes in v2:
>> - Rebased and changed over to .rst-based documentation
>> - Updated to catch more fusion cases
>> - Signals support for Zifencei
>>
>> Philipp Tomsich (2):
>>   RISC-V: Add basic support for the Ventana-VT1 core
>>   RISC-V: Add instruction fusion (for ventana-vt1)
>>
>>  gcc/config/riscv/riscv-cores.def  |   3 +
>>  gcc/config/riscv/riscv-opts.h |   2 +-
>>  gcc/config/riscv/riscv.cc | 233 ++
>>  .../risc-v-options.rst    |   5 +-
>>  4 files changed, 240 insertions(+), 3 deletions(-)
>
> I guess we never really properly talked about this on the GCC mailing
> lists, but IMO it's fine to start taking code for designs that have
> been announced under the assumption that if the hardware doesn't
> actually show up according to those timelines that it will be assumed
> to have never existed and thus be removed more quickly than usual.
Absolutely.   I have zero interest in carrying around code for
nonexistent or dead variants.
>
> That said, I can't find anything describing that the VT-1 exists aside
> from these patches.  Is there anything that describes this design and
> when it's expected to be available?

What do you need?  I can give some broad overview information on the
design, but it would likely just mirror what's already been mentioned in
these patches.


As far as schedules.  I'm not sure what I can say.  I'll check on that.


It was never my intention to bypass any process/procedures here. So if I
did, my apologies.



jeff



Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Philipp Tomsich
On Mon, 14 Nov 2022 at 21:58, Palmer Dabbelt  wrote:
>
> On Mon, 14 Nov 2022 12:03:38 PST (-0800), philipp.toms...@vrull.eu wrote:
> > On Mon, 14 Nov 2022 at 21:00, Palmer Dabbelt  wrote:
> >>
> >> On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
> >> >
> >> > This series provides support for the Ventana VT1 (a 4-way superscalar
> >> > rv64gc_zba_zbb_zbc_zbs_zifenci_xventanacondops core) including support
> >> > for the supported instruction fusion patterns.
> >> >
> >> > This includes the addition of the fusion-aware scheduling
> >> > infrastructure for RISC-V and implements idiom recognition for the
> >> > fusion patterns supported by VT1.
> >> >
> >> > Note that we don't signal support for XVentanaCondOps at this point,
> >> > as the XVentanaCondOps support is in-flight separately.  Changing the
> >> > defaults for VT1 can happen late in the cycle, so no need to link the
> >> > two different changesets.
> >> >
> >> > Changes in v2:
> >> > - Rebased and changed over to .rst-based documentation
> >> > - Updated to catch more fusion cases
> >> > - Signals support for Zifencei
> >> >
> >> > Philipp Tomsich (2):
> >> >   RISC-V: Add basic support for the Ventana-VT1 core
> >> >   RISC-V: Add instruction fusion (for ventana-vt1)
> >> >
> >> >  gcc/config/riscv/riscv-cores.def  |   3 +
> >> >  gcc/config/riscv/riscv-opts.h |   2 +-
> >> >  gcc/config/riscv/riscv.cc | 233 ++
> >> >  .../risc-v-options.rst|   5 +-
> >> >  4 files changed, 240 insertions(+), 3 deletions(-)
> >>
> >> I guess we never really properly talked about this on the GCC mailing
> >> lists, but IMO it's fine to start taking code for designs that have been
> >> announced under the assumption that if the hardware doesn't actually
> >> show up according to those timelines that it will be assumed to have
> >> never existed and thus be removed more quickly than usual.
> >>
> >> That said, I can't find anything describing that the VT-1 exists aside
> >> from these patches.  Is there anything that describes this design and
> >> when it's expected to be available?
> >
> > I have to defer to Jeff on this one.
>
> Looks like you already committed it, though:
>
> 991cfe5b30c ("RISC-V: Add instruction fusion (for ventana-vt1)")
> b4fca4fc70d ("RISC-V: Add basic support for the Ventana-VT1 core")
>
> We talked about this multiple times and I thought you were on board with
> the proposed "hardware needs to be announced" changes, did I
> misunderstand that?

Sorry — I had assumed that the "basic support" changes were agreed
upon between you and Jeff, given that Jeff had given the OK.

My position is still the same as discussed at LPC that "hardware needs
to be announced".

Thanks,
Philipp.


Re: [PATCH 6/7] riscv: Add support for strlen inline expansion

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 7:17 PM Jeff Law  wrote:

>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > This patch implements the expansion of the strlen builtin
> > using Zbb instructions (if available) for aligned strings
> > using the following sequence:
> >
> >li  a3,-1
> >addia4,a0,8
> > .L2:  ld  a5,0(a0)
> >addia0,a0,8
> >orc.b   a5,a5
> >beq a5,a3,6 <.L2>
> >not a5,a5
> >ctz a5,a5
> >srlia5,a5,0x3
> >add a0,a0,a5
> >sub a0,a0,a4
> >
> > This allows calls to strlen() to be inlined, with optimized code for
> > determining the length of a string.
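The quoted sequence can be modelled in portable C. This is a hedged sketch, and several names and conditions are assumptions rather than GCC or libc code. `orc_b` is a software model of Zbb orc.b, and the load is little-endian as on RISC-V. The buffer is assumed readable in whole 8-byte chunks, which stands in for the aligned-load guarantee the hardware sequence relies on. `__builtin_ctzll` is the GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of Zbb orc.b: each byte becomes 0xff if nonzero, else 0x00.  */
static uint64_t orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= UINT64_C (0xff) << (8 * i);
  return r;
}

/* Little-endian 8-byte load, like "ld" on RISC-V.  */
static uint64_t load_le64 (const unsigned char *p)
{
  uint64_t w = 0;
  for (int i = 0; i < 8; i++)
    w |= (uint64_t) p[i] << (8 * i);
  return w;
}

/* Word-at-a-time strlen mirroring the sequence above.  S must be 8-byte
   aligned and readable up to the end of the chunk holding the NUL.  */
static size_t strlen_orcb (const char *s)
{
  const unsigned char *p = (const unsigned char *) s;
  uint64_t w;
  for (;;)
    {
      w = orc_b (load_le64 (p));   /* ld + orc.b */
      if (w != UINT64_MAX)         /* beq a5,a3 with a3 = -1 */
        break;
      p += 8;                      /* addi a0,a0,8 */
    }
  /* not + ctz: the first zero byte becomes the lowest set bit of ~w;
     srli by 3 turns the bit index into a byte index.  */
  size_t idx = (size_t) __builtin_ctzll (~w) >> 3;
  return (size_t) (p - (const unsigned char *) s) + idx;
}
```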
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-protos.h (riscv_expand_strlen): New
> > prototype.
> >   * config/riscv/riscv-string.cc (riscv_emit_unlikely_jump): New
> > function.
> >   (GEN_EMIT_HELPER2): New helper macro.
> >   (GEN_EMIT_HELPER3): New helper macro.
> >   (do_load_from_addr): New helper function.
> >   (riscv_expand_strlen_zbb): New function.
> >   (riscv_expand_strlen): New function.
> >   * config/riscv/riscv.md (strlen): Invoke expansion
> > functions for strlen.
> >
> >
> > +extern bool riscv_expand_strlen (rtx[]);
>
> Consider adding the number of elements in the RTX array here. Martin S's
> work from a little while ago will make use of it to try and catch
> over-reads and over-writes if the data is available.
>

Done.
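Here is a C illustration of the suggestion, using a hypothetical function. Writing the parameter as `ops[3]` instead of `ops[]` does not change the C type, which is still a pointer, but it records the expected element count. GCC 11 and later can use such bounds in diagnostics. -Warray-parameter reports redeclarations with a mismatched bound. Callers passing a smaller array may be flagged by one of the -Wstringop-* warnings, depending on the version.

```c
/* Hypothetical expander that reads exactly three operands; the bound in
   the declaration documents that contract for readers and the compiler.  */
static int expand_sum (const int ops[3]);

static int expand_sum (const int ops[3])
{
  return ops[0] + ops[1] + ops[2];
}
```

A call such as `int two[2] = { 1, 2 }; expand_sum (two);` is the kind of over-read such bounds let the compiler catch.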


>
>
> >
> >   /* Information about one CPU we know about.  */
> >   struct riscv_cpu_info {
> > diff --git a/gcc/config/riscv/riscv-string.cc
> b/gcc/config/riscv/riscv-string.cc
> > index 1137df475be..bf96522b608 100644
> > --- a/gcc/config/riscv/riscv-string.cc
> > +++ b/gcc/config/riscv/riscv-string.cc
> > @@ -38,6 +38,81 @@
> >   #include "predict.h"
> >   #include "optabs.h"
> >
> > +/* Emit unlikely jump instruction.  */
> > +
> > +static rtx_insn *
> > +riscv_emit_unlikely_jump (rtx insn)
> > +{
> > +  rtx_insn *jump = emit_jump_insn (insn);
> > +  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
> > +  return jump;
> > +}
>
> I was a bit surprised that we didn't have this as a generic routine.
> Consider adding this to emit-rtl.cc along with its companion
> emit_likely_jump.  Not a requirement to move forward, but it seems like
> the right thing to do.
>

I created both and called them emit_[un]likely_jump_insn() to match
emit_jump_insn().


>
>
>
>
> > +
> > +/* Emit proper instruction depending on type of dest.  */
>
> s/type/mode/
>

Done.


>
>
>
> > +
> > +/* Emit proper instruction depending on type of dest.  */
>
> s/type/mode/
>

Done.


>
>
> You probably want to undefine GEN_EMIT_HELPER once you're done with
> them.  That's become fairly standard practice for this kind of helper
> macro.
>

Done.
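The practice looks roughly like this. `GEN_BINOP_HELPER` is a hypothetical stand-in for the patch's GEN_EMIT_HELPER2/3, whose real bodies are not shown here.

```c
/* Helper macro that stamps out one small function per operator.  */
#define GEN_BINOP_HELPER(NAME, OP)          \
  static long do_##NAME (long a, long b)    \
  {                                         \
    return a OP b;                          \
  }

GEN_BINOP_HELPER (add, +)
GEN_BINOP_HELPER (sub, -)
GEN_BINOP_HELPER (and, &)

/* Done with the generator: undefine it so it cannot leak into, or clash
   with, later code in the same file.  */
#undef GEN_BINOP_HELPER
```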


>
> OK with the nits fixed.  Your call on whether or not to move the
> implementation of emit_likely_jump and emit_unlikely_jump into emit-rtl.cc.
>

I've made all the requested and suggested changes and retested.
Thanks!


>
>
> Jeff
>
>
>


Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Palmer Dabbelt

On Mon, 14 Nov 2022 12:03:38 PST (-0800), philipp.toms...@vrull.eu wrote:
> On Mon, 14 Nov 2022 at 21:00, Palmer Dabbelt  wrote:
>>
>> On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
>> >
>> > This series provides support for the Ventana VT1 (a 4-way superscalar
>> > rv64gc_zba_zbb_zbc_zbs_zifenci_xventanacondops core) including support
>> > for the supported instruction fusion patterns.
>> >
>> > This includes the addition of the fusion-aware scheduling
>> > infrastructure for RISC-V and implements idiom recognition for the
>> > fusion patterns supported by VT1.
>> >
>> > Note that we don't signal support for XVentanaCondOps at this point,
>> > as the XVentanaCondOps support is in-flight separately.  Changing the
>> > defaults for VT1 can happen late in the cycle, so no need to link the
>> > two different changesets.
>> >
>> > Changes in v2:
>> > - Rebased and changed over to .rst-based documentation
>> > - Updated to catch more fusion cases
>> > - Signals support for Zifencei
>> >
>> > Philipp Tomsich (2):
>> >   RISC-V: Add basic support for the Ventana-VT1 core
>> >   RISC-V: Add instruction fusion (for ventana-vt1)
>> >
>> >  gcc/config/riscv/riscv-cores.def  |   3 +
>> >  gcc/config/riscv/riscv-opts.h |   2 +-
>> >  gcc/config/riscv/riscv.cc | 233 ++
>> >  .../risc-v-options.rst|   5 +-
>> >  4 files changed, 240 insertions(+), 3 deletions(-)
>>
>> I guess we never really properly talked about this on the GCC mailing
>> lists, but IMO it's fine to start taking code for designs that have been
>> announced under the assumption that if the hardware doesn't actually
>> show up according to those timelines that it will be assumed to have
>> never existed and thus be removed more quickly than usual.
>>
>> That said, I can't find anything describing that the VT-1 exists aside
>> from these patches.  Is there anything that describes this design and
>> when it's expected to be available?
>
> I have to defer to Jeff on this one.

Looks like you already committed it, though:

991cfe5b30c ("RISC-V: Add instruction fusion (for ventana-vt1)")
b4fca4fc70d ("RISC-V: Add basic support for the Ventana-VT1 core")

We talked about this multiple times and I thought you were on board with
the proposed "hardware needs to be announced" changes, did I
misunderstand that?


Re: [PATCH v2] c, analyzer: support named constants in analyzer [PR106302]

2022-11-14 Thread Marek Polacek via Gcc-patches
On Fri, Nov 11, 2022 at 10:23:10PM -0500, David Malcolm wrote:
> Changes since v1: ported the doc changes from texinfo to sphinx
> 
> Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> 
> Are the C frontend parts OK for trunk?  (I can self-approve the
> analyzer parts)

Sorry for the delay.
 
> The patch adds an interface for frontends to call into the analyzer as
> the translation unit finishes.  The analyzer can then call back into the
> frontend to ask about the values of the named constants it cares about
> whilst the frontend's data structures are still around.
> 
> The patch implements this for the C frontend, which looks up the names
> by looking for named CONST_DECLs (which handles enum values).  Failing
> that, it attempts to look up the values of macros but only the simplest
> cases are supported (a non-traditional macro with a single CPP_NUMBER
> token).  It does this by building a buffer containing the macro
> definition and rerunning a lexer on it.
> 
> The analyzer gracefully handles the cases where named values aren't
> found (such as anything more complicated than described above).
> 
> The patch ports the analyzer to use this mechanism for "O_RDONLY",
> "O_WRONLY", and "O_ACCMODE".  I have successfully tested my socket patch
> to also use this for "SOCK_STREAM" and "SOCK_DGRAM", so the technique
> seems to work.

So this works well for code like

enum __socket_type {
SOCK_STREAM = 1,

#define SOCK_STREAM SOCK_STREAM
};

?
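The question turns on the two lookup paths in the patch. An enum value is found as a CONST_DECL. The macro path only accepts a body that is a single number token, so `#define SOCK_STREAM SOCK_STREAM` falls through to the enum. The sketch below is a rough standalone analogue of that macro restriction. `simple_macro_value` is hypothetical. GCC itself re-lexes the token with c_lex_with_flags, which also handles suffixes such as `42U` that strtoll does not.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Accept DEF only if it is a single unsigned integer literal (decimal,
   0x hex or 0 octal), optionally surrounded by blanks.  */
static bool simple_macro_value (const char *def, long long *out)
{
  while (isspace ((unsigned char) *def))
    def++;
  if (!isdigit ((unsigned char) *def))   /* rejects "-1", "(4)", names */
    return false;
  char *end;
  errno = 0;
  long long v = strtoll (def, &end, 0);  /* base 0: 42, 0x2a, 052 */
  if (errno == ERANGE)
    return false;
  while (isspace ((unsigned char) *end))
    end++;
  if (*end != '\0')                      /* more tokens, e.g. "1 << 7" */
    return false;
  *out = v;
  return true;
}
```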

> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index d70697b1d63..efe19fbe70b 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -72,6 +72,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "memmodel.h"
>  #include "c-family/known-headers.h"
>  #include "bitmap.h"
> +#include "analyzer/analyzer-language.h"
> +#include "toplev.h"
>  
>  /* We need to walk over decls with incomplete struct/union/enum types
> after parsing the whole translation unit.
> @@ -1662,6 +1664,87 @@ static bool c_parser_objc_diagnose_bad_element_prefix
>(c_parser *, struct c_declspecs *);
>  static location_t c_parser_parse_rtl_body (c_parser *, char *);
>  
> +#if ENABLE_ANALYZER
> +
> +namespace ana {
> +
> +/* Concrete implementation of ana::translation_unit for the C frontend.  */
> +
> +class c_translation_unit : public translation_unit
> +{
> +public:
> +  /* Implementation of translation_unit::lookup_constant_by_id for use by the
> + analyzer to look up named constants in the user's source code.  */
> +  tree lookup_constant_by_id (tree id) const final override
> +  {
> +/* Consider decls.  */
> +if (tree decl = lookup_name (id))
> +  if (TREE_CODE (decl) == CONST_DECL)
> + if (tree value = DECL_INITIAL (decl))
> +   if (TREE_CODE (value) == INTEGER_CST)
> + return value;
> +
> +/* Consider macros.  */
> +cpp_hashnode *hashnode = C_CPP_HASHNODE (id);
> +if (cpp_macro_p (hashnode))
> +  if (tree value = consider_macro (hashnode->value.macro))
> + return value;
> +
> +return NULL_TREE;
> +  }
> +
> +private:
> +  /* Attempt to get an INTEGER_CST from MACRO.
> + Only handle the simplest cases: where MACRO's definition is a single
> + token containing a number, by lexing the number again.
> + This will handle e.g.
> +   #define NAME 42
> + and other bases but not negative numbers, parentheses or e.g.
> +   #define NAME 1 << 7
> + as doing so would require a parser.  */
> +  tree consider_macro (cpp_macro *macro) const
> +  {
> +if (macro->paramc > 0)
> +  return NULL_TREE;
> +if (macro->kind == cmk_traditional)

Do you really want to handle cmk_assert?  I'd say you want

  if (macro->kind != cmk_macro)

> +  return NULL_TREE;
> +if (macro->count != 1)
> +  return NULL_TREE;
> +const cpp_token &tok = macro->exp.tokens[0];
> +if (tok.type != CPP_NUMBER)
> +  return NULL_TREE;
> +
> +cpp_reader *old_parse_in = parse_in;
> +parse_in = cpp_create_reader (c_dialect_cxx () ? CLK_GNUCXX: CLK_GNUC89,
> +   ident_hash, line_table);

Why not always CLK_GNUC89 since we're in the C FE?

> +
> +pretty_printer pp;
> +pp_string (&pp, (const char *)tok.val.str.text);

A space after ')'.

> +pp_newline (&pp);
> +cpp_push_buffer (parse_in,
> +  (const unsigned char *)pp_formatted_text (&pp),

Likewise.

> +  strlen (pp_formatted_text (&pp)),
> +  0);
> +
> +tree value;
> +location_t loc;
> +unsigned char cpp_flags;
> +c_lex_with_flags (&value, &loc, &cpp_flags, 0);
> +
> +cpp_destroy (parse_in);
> +parse_in = old_parse_in;
> +
> +if (value && TREE_CODE (value) == INTEGER_CST)
> +  return value;
> +
> +return NULL_TREE;
> +  }
> +};
> +
> +} // namespace ana
> +
> +#endif /* #if ENABLE_ANALYZER */
> +
>  /* Parse a translation unit (C90 6.7, C99 6.9, C11 6.9).
>  
> translation-unit:
> @@ -1722,6 +1805,14 @@ 

[PATCH][committed]middle-end: Fix addsub patch removing return statements

2022-11-14 Thread Tamar Christina via Gcc-patches
Hi All,

My recent patch had return statements in the match.pd expressions,
which were recently outlawed.  Unfortunately I didn't rebase this
patch before committing, so this broke the build.

I've just reflowed the conditions to avoid the returns.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Committed to fix the build; I think it's still trivial as it just reflows the
code.

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: Remove returns.

--- inline copy of patch -- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 
4701578c96451d56e5235d5e0bc5d0a0378c1435..946dcd1b301c8fbcfe36e534651ff0b183b324ef
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7984,51 +7984,51 @@ and,
{
  /* Build a vector of integers from the tree mask.  */
  vec_perm_builder builder;
- if (!tree_to_vec_perm_builder (&builder, @2))
-   return NULL_TREE;
-
- /* Create a vec_perm_indices for the integer vector.  */
- poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
- vec_perm_indices sel (builder, 2, nelts);
}
-   (if (sel.series_p (0, 2, 0, 2))
+   (if (tree_to_vec_perm_builder (&builder, @2))
 (with
  {
+   /* Create a vec_perm_indices for the integer vector.  */
+   poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
+   vec_perm_indices sel (builder, 2, nelts);
machine_mode vec_mode = TYPE_MODE (type);
machine_mode wide_mode;
-   if (!GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
-  || !VECTOR_MODE_P (wide_mode)
-  || (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
-   != GET_MODE_UNIT_BITSIZE (wide_mode)))
-return NULL_TREE;
-
-   tree stype = lang_hooks.types.type_for_mode (GET_MODE_INNER (wide_mode),
-   TYPE_UNSIGNED (type));
-   if (TYPE_MODE (stype) == BLKmode)
-return NULL_TREE;
-   tree ntype = build_vector_type_for_mode (stype, wide_mode);
-   if (!VECTOR_TYPE_P (ntype))
-return NULL_TREE;
-
-   /* The format has to be a non-extended ieee format.  */
-   const struct real_format *fmt_old = FLOAT_MODE_FORMAT (vec_mode);
-   const struct real_format *fmt_new = FLOAT_MODE_FORMAT (wide_mode);
-   if (fmt_old == NULL || fmt_new == NULL)
-return NULL_TREE;
-
-   /* If the target doesn't support v1xx vectors, try using scalar mode xx
- instead.  */
-   if (known_eq (GET_MODE_NUNITS (wide_mode), 1)
-  && !target_supports_op_p (ntype, NEGATE_EXPR, optab_vector))
-ntype = stype;
  }
- (if (fmt_new->signbit_rw
-== fmt_old->signbit_rw + GET_MODE_UNIT_BITSIZE (vec_mode)
- && fmt_new->signbit_rw == fmt_new->signbit_ro
- && targetm.can_change_mode_class (TYPE_MODE (ntype), TYPE_MODE (type), ALL_REGS)
- && ((optimize_vectors_before_lowering_p () && VECTOR_TYPE_P (ntype))
- || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
-  (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))
+ (if (sel.series_p (0, 2, 0, 2)
+  && GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
+ && VECTOR_MODE_P (wide_mode)
+ && (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
+ == GET_MODE_UNIT_BITSIZE (wide_mode)))
+   (with
+{
+  tree stype
+= lang_hooks.types.type_for_mode (GET_MODE_INNER (wide_mode),
+  TYPE_UNSIGNED (type));
+  tree ntype = build_vector_type_for_mode (stype, wide_mode);
+
+  /* The format has to be a non-extended ieee format.  */
+  const struct real_format *fmt_old = FLOAT_MODE_FORMAT (vec_mode);
+  const struct real_format *fmt_new = FLOAT_MODE_FORMAT (wide_mode);
+}
+(if (TYPE_MODE (stype) != BLKmode
+ && VECTOR_TYPE_P (ntype)
+ && fmt_old != NULL
+ && fmt_new != NULL)
+ (with
+  {
+/* If the target doesn't support v1xx vectors, try using
+   scalar mode xx instead.  */
+   if (known_eq (GET_MODE_NUNITS (wide_mode), 1)
+   && !target_supports_op_p (ntype, NEGATE_EXPR, optab_vector))
+ ntype = stype;
+  }
+  (if (fmt_new->signbit_rw
+   == fmt_old->signbit_rw + GET_MODE_UNIT_BITSIZE (vec_mode)
+   && fmt_new->signbit_rw == fmt_new->signbit_ro
+   && targetm.can_change_mode_class (TYPE_MODE (ntype), TYPE_MODE (type), ALL_REGS)
+   && ((optimize_vectors_before_lowering_p () && VECTOR_TYPE_P (ntype))
+   || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
+   (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))
 
 (simplify
  (vec_perm @0 @1 VECTOR_CST@2)
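The pattern above rewrites the permute into a negate of a double-width view of @1 followed by an add. As a minimal sketch of why the signbit_rw check (new sign bit == old sign bit + element width) matters, the C below packs two float lanes into one 64-bit element (lane 0 low, lane 1 high, an assumption matching a V2SF to V1DF view_convert), negates it as a double, and unpacks. Only lane 1 changes sign. It assumes the host implements double negation as a pure sign-bit flip (IEEE 754 negate); the function names are illustrative, not GCC code.

```c
#include <stdint.h>
#include <string.h>

/* Bit-level views of one float lane (bits are copied, not converted).  */
static uint32_t float_bits (float f) { uint32_t u; memcpy (&u, &f, sizeof u); return u; }
static float bits_float (uint32_t u) { float f; memcpy (&f, &u, sizeof f); return f; }

/* View two float lanes as one 64-bit element: lane 0 low, lane 1 high.  */
static double view_as_wide (float lane0, float lane1)
{
  uint64_t w = (uint64_t) float_bits (lane0) | ((uint64_t) float_bits (lane1) << 32);
  double d;
  memcpy (&d, &w, sizeof d);
  return d;
}

/* out = { a0 + b0, a1 - b1 }, computed as a + view (-(wide view of b)).
   Negating the wide element flips bit 63 = 31 + 32, which is the sign
   bit of lane 1 only.  */
void addsub_via_wide_negate (const float a[2], const float b[2], float out[2])
{
  double nb = -view_as_wide (b[0], b[1]);
  uint64_t w;
  memcpy (&w, &nb, sizeof w);
  out[0] = a[0] + bits_float ((uint32_t) w);
  out[1] = a[1] + bits_float ((uint32_t) (w >> 32));
}
```

For example, a = {1.0f, 2.0f} and b = {0.5f, 0.25f} give {1.5f, 1.75f}.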





Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Philipp Tomsich
On Mon, 14 Nov 2022 at 21:00, Palmer Dabbelt  wrote:
>
> On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
> >
> > This series provides support for the Ventana VT1 (a 4-way superscalar
> > rv64gc_zba_zbb_zbc_zbs_zifencei_xventanacondops core) including support
> > for the supported instruction fusion patterns.
> >
> > This includes the addition of the fusion-aware scheduling
> > infrastructure for RISC-V and implements idiom recognition for the
> > fusion patterns supported by VT1.
> >
> > Note that we don't signal support for XVentanaCondOps at this point,
> > as the XVentanaCondOps support is in-flight separately.  Changing the
> > defaults for VT1 can happen late in the cycle, so no need to link the
> > two different changesets.
> >
> > Changes in v2:
> > - Rebased and changed over to .rst-based documentation
> > - Updated to catch more fusion cases
> > - Signals support for Zifencei
> >
> > Philipp Tomsich (2):
> >   RISC-V: Add basic support for the Ventana-VT1 core
> >   RISC-V: Add instruction fusion (for ventana-vt1)
> >
> >  gcc/config/riscv/riscv-cores.def  |   3 +
> >  gcc/config/riscv/riscv-opts.h |   2 +-
> >  gcc/config/riscv/riscv.cc | 233 ++
> >  .../risc-v-options.rst|   5 +-
> >  4 files changed, 240 insertions(+), 3 deletions(-)
>
> I guess we never really properly talked about this on the GCC mailing
> lists, but IMO it's fine to start taking code for designs that have been
> announced under the assumption that if the hardware doesn't actually
> show up according to those timelines, it will be assumed to have
> never existed and thus be removed more quickly than usual.
>
> That said, I can't find anything describing that the VT-1 exists aside
> from these patches.  Is there anything that describes this design and
> when it's expected to be available?

I have to defer to Jeff on this one.

Philipp.


Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2022-11-14 Thread Palmer Dabbelt

On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:


This series provides support for the Ventana VT1 (a 4-way superscalar
rv64gc_zba_zbb_zbc_zbs_zifencei_xventanacondops core) including support
for the supported instruction fusion patterns.

This includes the addition of the fusion-aware scheduling
infrastructure for RISC-V and implements idiom recognition for the
fusion patterns supported by VT1.

Note that we don't signal support for XVentanaCondOps at this point,
as the XVentanaCondOps support is in-flight separately.  Changing the
defaults for VT1 can happen late in the cycle, so no need to link the
two different changesets.

Changes in v2:
- Rebased and changed over to .rst-based documentation
- Updated to catch more fusion cases
- Signals support for Zifencei

Philipp Tomsich (2):
  RISC-V: Add basic support for the Ventana-VT1 core
  RISC-V: Add instruction fusion (for ventana-vt1)

 gcc/config/riscv/riscv-cores.def  |   3 +
 gcc/config/riscv/riscv-opts.h |   2 +-
 gcc/config/riscv/riscv.cc | 233 ++
 .../risc-v-options.rst|   5 +-
 4 files changed, 240 insertions(+), 3 deletions(-)


I guess we never really properly talked about this on the GCC mailing 
lists, but IMO it's fine to start taking code for designs that have been 
announced under the assumption that if the hardware doesn't actually 
show up according to those timelines, it will be assumed to have 
never existed and thus be removed more quickly than usual.


That said, I can't find anything describing that the VT-1 exists aside 
from these patches.  Is there anything that describes this design and 
when it's expected to be available?


Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/13/22 16:05, Christoph Muellner wrote:

From: Christoph Müllner 

This patch implements expansions for the cmpstrsi and the cmpstrnsi
builtins using Zbb instructions (if available).
This allows inlining calls to strcmp() and strncmp().

The expansion basically emits a peeled comparison sequence (i.e. a peeled
comparison loop) which compares XLEN bits per step if possible.

The emitted sequence can be controlled by setting the maximum number
of compared bytes (-mstring-compare-inline-limit).
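A minimal C model of the "peeled comparison, then punt to the library" idea described above, assuming both strings are readable in whole 8-byte words up to LIMIT (the expansion gets this by requiring word alignment). orc_b, load_le64 and strcmp_peeled are hypothetical names; the real expansion derives the result from the words directly, whereas this sketch finishes the deciding word byte by byte for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of Zbb orc.b: nonzero bytes become 0xff, zero bytes stay 0x00.  */
static uint64_t orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Little-endian 8-byte load, independent of host byte order.  */
static uint64_t load_le64 (const unsigned char *p)
{
  uint64_t w = 0;
  for (int i = 0; i < 8; i++)
    w |= (uint64_t) p[i] << (8 * i);
  return w;
}

/* Compare word by word for up to LIMIT bytes, then fall back to strcmp.  */
int strcmp_peeled (const unsigned char *a, const unsigned char *b, size_t limit)
{
  assert (a != NULL && b != NULL);
  size_t off = 0;
  for (; off + 8 <= limit; off += 8)
    {
      uint64_t wa = load_le64 (a + off);
      uint64_t wb = load_le64 (b + off);
      /* Words differ, or A's word holds a NUL: the result is in this word.  */
      if (wa != wb || orc_b (wa) != UINT64_MAX)
        {
          for (size_t i = off; i < off + 8; i++)
            if (a[i] != b[i] || a[i] == 0)
              return (int) a[i] - (int) b[i];
        }
    }
  /* Undecided within LIMIT bytes: call the out-of-line strcmp.  */
  return strcmp ((const char *) a + off, (const char *) b + off);
}
```

Most unequal strings are decided in the first word, which is the case the peeled sequence is meant to capture.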

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
  prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
  macros.
(GEN_EMIT_HELPER2): New helper macros.
(expand_strncmp_zbb_sequence): New function.
(riscv_emit_str_compare_zbb): New function.
(riscv_expand_strn_compare): New function.
* config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
  for strn_compare.
(cmpstrsi): Invoke expansion functions for strn_compare.
* config/riscv/riscv.opt: Add new parameter
  '-mstring-compare-inline-limit'.


Presumably the hybrid inline + out of line approach is to capture the 
fact that most strings compare unequal early, then punt out to the 
library if they don't follow that model?  It looks like we're structured 
for that case by peeling iterations rather than having a fully inlined 
approach.  Just want to confirm...



I was a bit worried about the "readahead" problem that arises when 
reading more than a byte and a NUL is found in the first string.  If 
you're not careful, the readahead of the second string could fault.  But 
it looks like we avoid that by requiring word alignment on both strings.
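The reason word alignment avoids the readahead fault can be checked arithmetically: if the page size is a multiple of the access width, an access aligned to that width never straddles two pages, so a word load that contains at least one valid string byte stays on a mapped page. A minimal sketch (access_crosses_page is a hypothetical helper):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a WIDTH-byte access starting at ADDR touches two pages.  */
bool access_crosses_page (uintptr_t addr, size_t width, size_t page_size)
{
  assert (width > 0 && width <= page_size);
  return (addr % page_size) + width > page_size;
}
```

With 4096-byte pages, an 8-byte load at 4088 stays inside the page, while an unaligned one at 4092 crosses into the next.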




+
+/* Emit a string comparison sequence using Zbb instructions.
+
+   OPERANDS[0] is the target (result).
+   OPERANDS[1] is the first source.
+   OPERANDS[2] is the second source.
+   If NO_LENGTH is zero, then:
+   OPERANDS[3] is the length.
+   OPERANDS[4] is the alignment in bytes.
+   If NO_LENGTH is nonzero, then:
+   OPERANDS[3] is the alignment in bytes.


Ugh.  I guess it's inevitable unless we want to drop the array and pass 
each element individually (in which case we'd pass a NULL_RTX in the 
case we don't have a length argument).



I'd like to give others a chance to chime in here.  Everything looks 
sensible here, but I may have missed something.  So give the other 
maintainers a couple days to chime in before committing.



Jeff



Re: [PATCH v2 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 11:55, Philipp Tomsich wrote:

On Mon, 14 Nov 2022 at 17:06, Jeff Law  wrote:


On 11/13/22 13:48, Philipp Tomsich wrote:

The Ventana VT1 core supports quad-issue and instruction fusion.
This implements TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
together and adds idiom matching for the supported fusion cases.

gcc/ChangeLog:

   * config/riscv/riscv.cc (enum riscv_fusion_pairs): Add symbolic
   constants to identify supported fusion patterns.
   (struct riscv_tune_param): Add fusible_op field.
   (riscv_macro_fusion_p): Implement.
   (riscv_fusion_enabled_p): Implement.
   (riscv_macro_fusion_pair_p): Implement and recognize fusible
   idioms for Ventana VT1.
   (TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
   (TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to riscv_macro_fusion_pair_p.

You know the fusion rules for VT1 better than I...  I'm happy to largely
defer to you on this.

I do wonder if going forward hand matching RTL like this is going to be
an unmaintainable mess and whether or not we would be better served
using insn attributes to describe instruction fusion.

I had thought about that, too.
In the end our team decided to stay away from it for the time being:
fusion frequently needs to look at second-level properties and whether
the first instruction's output register is overwritten by the second
instruction.  So we stuck with the same style of idiom matching
that is also used for AArch64 today.


Yea, we're still going to have to grub around to get the operands.  But 
we'd know the overall form of the insn and the types of its operands was 
right.  But it's still going to be clunky either way.  My worry with the 
attribute approach is we'll end up with a horrible mess of attributes 
due to multiple fusion implementations and that we'll need to split 
alternatives so that we can tag them more precisely, etc.


It feels like we almost need a DSL to specify this stuff, much like we 
have for scheduling models.
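To make the discussion concrete, here is a minimal C sketch of hand-written idiom matching on a hypothetical, simplified instruction record (not RTL). It checks an "slli rd,rs,32 ; srli rd,rd,N" pair, which is what the RISCV_FUSE_ZEXTW / RISCV_FUSE_ZEXTWS checks quoted in this thread appear to target, including the second-level property that the second instruction reads and overwrites the first one's destination. The enum and struct names, and the exact VT1 semantics, are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified instruction form; not GCC's RTL.  */
enum op_kind { OP_SLLI, OP_SRLI, OP_OTHER };
struct insn { enum op_kind op; int rd, rs1; long imm; };
enum fusion { FUSE_NONE, FUSE_ZEXTW, FUSE_ZEXTWS };

/* Match "slli rd,rs,32 ; srli rd,rd,N".  */
enum fusion match_zext_pair (const struct insn *prev, const struct insn *curr)
{
  assert (prev != NULL && curr != NULL);
  if (prev->op != OP_SLLI || prev->imm != 32 || curr->op != OP_SRLI)
    return FUSE_NONE;
  /* CURR must consume PREV's result and overwrite it in place.  */
  if (curr->rs1 != prev->rd || curr->rd != curr->rs1)
    return FUSE_NONE;
  if (curr->imm == 32)
    return FUSE_ZEXTW;   /* zero-extend the low 32 bits */
  if (curr->imm < 32)
    return FUSE_ZEXTWS;  /* zero-extend, then shift left by 32 - N */
  return FUSE_NONE;
}
```

Even this small case needs operand and register checks beyond the insn form, which is the part an attribute-based description would still leave to C.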



Jeff



Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 8:04 PM Jeff Law  wrote:

>
> On 11/14/22 01:29, Christoph Müllner wrote:
>
>
>
> On Mon, Nov 14, 2022 at 8:59 AM Philipp Tomsich 
> wrote:
>
>> On Mon, 14 Nov 2022 at 03:48, Vineet Gupta  wrote:
>> >
>> >
>> >
>> > On 11/13/22 15:05, Christoph Muellner wrote:
>> > >
>> > > +static bool
>> > > +riscv_overlap_op_by_pieces (void)
>> > > +{
>> > > +  return tune_param->overlap_op_by_pieces;
>> >
>> > Does this not need to be gated on unaligned access enabled as well?
>>
>> I assume you mean "&& !STRICT_ALIGNMENT"?
>>
>
> I think the case that slow_unaligned_access and overlap_op_by_pieces will
> both be set will not occur (we can defer the discussion about that until
> then).
> Gating overlap_op_by_pieces with !TARGET_STRICT_ALIGN is a good idea.
> It will be fixed for a v2.
>
> OK with that change.
>
>
> I'm still working through 7/7.  I may not have enough left in my tank to
> get through that one today.
>

No worries!
Thank you very much for the fast reviews!


>
> jeff
>
>


Re: [PATCH] [PR68097] Try to avoid recursing for floats in tree_*_nonnegative_warnv_p.

2022-11-14 Thread Aldy Hernandez via Gcc-patches



On 11/14/22 10:12, Richard Biener wrote:

On Sat, Nov 12, 2022 at 7:30 PM Aldy Hernandez  wrote:


It irks me that a PR named "we should track ranges for floating-point..."
hasn't been closed in this release.  This is an attempt to do just
that.

As mentioned in the PR, even though we track ranges for floats, it has
been suggested that avoiding recursing through SSA defs in
gimple_assign_nonnegative_warnv_p is also a goal.  We can do this with
various ranger components without the need for a heavy handed approach
(i.e. a full ranger).

I have implemented two versions of known_float_sign_p() that answer
the question whether we definitely know the sign for an operation or a
tree expression.

Both versions use get_global_range_query, which is a wrapper to query
global ranges.  This means that no caching or propagation is done.
In the case of an SSA, we just return the global range for it (think
SSA_NAME_RANGE_INFO).  In the case of a tree code with operands, we
also use get_global_range_query to resolve the operands, and then call
into range-ops, which is our lowest level component.  There is no
ranger or gori involved.  All we're doing is resolving the operation
with the ranges passed.

This is enough to avoid recursing in the case where we definitely know
the sign of a range.  Otherwise, we still recurse.
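A minimal sketch of the "do we definitely know the sign" question, on a hypothetical range record rather than GCC's frange: the sign bit is known only when both endpoints share it (ordering -0.0 before +0.0) and no NaN is possible, since this sketch does not track a NaN's sign. known_nonnegative mirrors the signbit_p (sign) && sign == false test in the patch; all names are assumptions.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical float range; -0.0 orders before +0.0.  */
struct frange_sketch
{
  double lo, hi;
  bool maybe_nan;
};

/* True (with *SIGN set) if every value in R has the same sign bit.  */
bool known_signbit (struct frange_sketch r, bool *sign)
{
  assert (r.lo <= r.hi);
  if (r.maybe_nan)
    return false;             /* NaN sign not tracked in this sketch */
  bool lo_neg = signbit (r.lo) != 0;
  bool hi_neg = signbit (r.hi) != 0;
  if (lo_neg != hi_neg)
    return false;             /* e.g. [-0.0, +0.0] or [-1.0, 2.0] */
  *sign = lo_neg;
  return true;
}

/* Sign bit known and clear.  */
bool known_nonnegative (struct frange_sketch r)
{
  bool sign;
  return known_signbit (r, &sign) && !sign;
}
```

When this returns false the caller cannot conclude anything, which is where the existing recursion would take over.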

Note that instead of get_global_range_query(), we could use
get_range_query() which uses a ranger (if active in a pass), or
get_global_range_query if not.  This would allow passes that have an
active ranger (with enable_ranger) to use a full ranger.  These passes
are currently VRP, loop unswitching, DOM, loop versioning, etc.  If
no ranger is active, get_range_query defaults to global ranges, so
there's no additional penalty.

Would this be acceptable, at least enough to close (or rename the PR ;-))?


I think the checks would belong to the gimple_stmt_nonnegative_warnv_p function
only (that's the SSA name entry from the fold-const.cc ones)?


That was my first approach, but I thought I'd cover the unary and binary 
operators as well, since they had other callers.  But I'm happy with 
just the top-level tweak.  It's a lot less code :).




I also notice the use of 'bool' for the "sign".  That's not really
descriptive.  We
have SIGNED and UNSIGNED (aka enum signop), not sure if that's the
perfect match vs. NEGATIVE and NONNEGATIVE.  Maybe the functions
name is just bad and they should be known_float_negative_p?


The bool sign is to keep in line with real.*, and was suggested by Jeff 
(in real.* not here).  I'm happy to change the entire frange API to use 
signop.  It is cleaner.  If that's acceptable, I could do that as a 
follow-up.


How's this, pending tests once I figure out why my trees have been 
broken all day :-/.


Aldy

p.s. First it was sphinx failure, now I'm seeing this:
/home/aldyh/src/clean/gcc/match.pd:7935:8 error: return statement not allowed in C expression

   return NULL_TREE;
   ^

From 6e36626aec81bf97f8f54116a291574c16cbc205 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Sat, 12 Nov 2022 11:58:07 +0100
Subject: [PATCH] [PR68097] Try to avoid recursing for floats in
 gimple_stmt_nonnegative_warnv_p.

It irks me that a PR named "we should track ranges for floating-point..."
hasn't been closed in this release.  This is an attempt to do just
that.

As mentioned in the PR, even though we track ranges for floats, it has
been suggested that avoiding recursing through SSA defs in
gimple_assign_nonnegative_warnv_p is also a goal.  This patch uses a
global range query (no on-demand lookups, just global ranges and
minimal folding) to determine if the range of a statement is known to
be non-negative.

	PR tree-optimization/68097

gcc/ChangeLog:

	* gimple-fold.cc (gimple_stmt_nonnegative_warnv_p): Call
	range_of_stmt for floats.
---
 gcc/gimple-fold.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 0a212e6d0d4..79cc4d7f569 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-strlen.h"
 #include "varasm.h"
 #include "internal-fn.h"
+#include "gimple-range.h"
 
 enum strlen_range_kind {
   /* Compute the exact constant string length.  */
@@ -9234,6 +9235,15 @@ bool
 gimple_stmt_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
  int depth)
 {
+  tree type = gimple_range_type (stmt);
+  if (type && frange::supports_p (type))
+    {
+      frange r;
+      bool sign;
+      return (get_global_range_query ()->range_of_stmt (r, stmt)
+	      && r.signbit_p (sign)
+	      && sign == false);
+    }
   switch (gimple_code (stmt))
 {
 case GIMPLE_ASSIGN:
-- 
2.38.1



Re: [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec

2022-11-14 Thread Philipp Tomsich
On Mon, 14 Nov 2022 at 17:51, Jeff Law  wrote:
>
>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Philipp Tomsich 
> >
> > As a basis for optimized string functions (e.g., the by-pieces
> > implementations), we need orc.b available.  This adds orc.b as an
> > unspec, so we can expand to it.
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/bitmanip.md (orcb2): Add orc.b as an
> > unspec.
> >  * config/riscv/riscv.md: Add UNSPEC_ORC_B.
> In general, we should prefer to express things as "real" RTL rather than
> UNSPECS.  In this particular case expressing the orc could be done with
> a handful of IOR expressions, though they'd probably need to reference
> byte SUBREGs of the input and I dislike explicit SUBREGs in the md file
> even more than UNSPECs.  So
>
> OK.

Applied to master. Thanks!
(After using emacs' whitespace-cleanup to fix the damage that
Christoph's vim did to the ChangeLog...)

Philipp.

>
>
> Jeff
>
>
> ps.  We could consider this a reduc_ior_scal insn, but that may be
> actively harmful.  Having vector ops on the general and vector registers
> is a wart I hope we can avoid.
>
>
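For reference, the orc.b semantics discussed above can be modelled in C: each result byte is 0xff if the corresponding input byte is nonzero and 0x00 otherwise. The second helper shows why it is useful for string functions: a word contains a NUL byte exactly when orc.b of it is not all ones. Function names are illustrative.

```c
#include <stdint.h>

/* Software model of Zbb orc.b (bitwise OR-combine within each byte).  */
uint64_t orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* A word has a NUL byte iff orc.b of it is not all ones.  */
int word_has_nul (uint64_t x)
{
  return orc_b (x) != UINT64_MAX;
}
```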


Re: [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 12:01, Christoph Müllner wrote:



On Mon, Nov 14, 2022 at 6:16 PM Jeff Law  wrote:


On 11/13/22 16:05, Christoph Muellner wrote:
> From: Christoph Müllner 
>
> The current implementation of riscv_block_move_straight() emits a couple
> of load-store pairs with maximum width (e.g. 8-byte for RV64).
> The remainder is handed over to move_by_pieces(), which emits code based on
> target settings like slow_unaligned_access and overlap_op_by_pieces.
>
> move_by_pieces() will emit overlapping memory accesses with maximum
> width only if the given length exceeds the size of one access
> (e.g. 15-bytes for 8-byte accesses).
>
> This patch changes the implementation of riscv_block_move_straight()
> such that it preserves a remainder within the interval
> [delta..2*delta) instead of [0..delta), so that overlapping memory
> access may be emitted (if the requirements for them are given).
>
> gcc/ChangeLog:
>
>       * config/riscv/riscv-string.c (riscv_block_move_straight):
>         Adjust range for emitted load/store pairs.
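A minimal C sketch of the scheme described in the commit message, assuming DELTA = 8 (the widest access on RV64) and LEN >= DELTA: straight DELTA-sized copies run only while at least 2*DELTA bytes remain, so the remainder lands in [DELTA, 2*DELTA), and the tail is finished with two DELTA-sized accesses that overlap when the remainder is not exactly DELTA. memcpy stands in for the load/store pairs; block_move_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DELTA 8  /* widest access, e.g. 8 bytes on RV64 */

void block_move_sketch (unsigned char *dst, const unsigned char *src, size_t len)
{
  assert (len >= DELTA);
  size_t off = 0;
  /* Straight word copies, keeping the remainder in [DELTA, 2*DELTA).  */
  while (len - off >= 2 * DELTA)
    {
      uint64_t w;
      memcpy (&w, src + off, DELTA);
      memcpy (dst + off, &w, DELTA);
      off += DELTA;
    }
  /* Two possibly overlapping word accesses cover the remainder.  */
  uint64_t head, tail;
  memcpy (&head, src + off, DELTA);
  memcpy (&tail, src + len - DELTA, DELTA);
  memcpy (dst + off, &head, DELTA);
  memcpy (dst + len - DELTA, &tail, DELTA);
}
```

For a 15-byte copy this gives accesses at offsets 0 and 7, rather than one 8-byte pair followed by 4-, 2- and 1-byte pieces.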

The change to riscv_expand_block_move isn't noted in the
ChangeLog.  OK
with that fixed (I'm assuming you want to attempt to use overlapping
word ops for that case).


The change in riscv_expand_block_move is a code cleanup.
At the beginning of riscv_expand_block_move we do the following:
  unsigned HOST_WIDE_INT length = UINTVAL (length);


AH, missed that.


Thanks,

Jeff


Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 01:29, Christoph Müllner wrote:



On Mon, Nov 14, 2022 at 8:59 AM Philipp Tomsich 
 wrote:


On Mon, 14 Nov 2022 at 03:48, Vineet Gupta 
wrote:
>
>
>
> On 11/13/22 15:05, Christoph Muellner wrote:
> >
> > +static bool
> > +riscv_overlap_op_by_pieces (void)
> > +{
> > +  return tune_param->overlap_op_by_pieces;
>
> Does this not need to be gated on unaligned access enabled as well?

I assume you mean "&& !STRICT_ALIGNMENT"?


I think the case that slow_unaligned_access and overlap_op_by_pieces will
both be set will not occur (we can defer the discussion about that 
until then).

Gating overlap_op_by_pieces with !TARGET_STRICT_ALIGN is a good idea.
It will be fixed for a v2.


OK with that change.


I'm still working through 7/7.  I may not have enough left in my tank to 
get through that one today.



jeff



Re: [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 6:16 PM Jeff Law  wrote:

>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > The current implementation of riscv_block_move_straight() emits a couple
> > of load-store pairs with maximum width (e.g. 8-byte for RV64).
> > The remainder is handed over to move_by_pieces(), which emits code based on
> > target settings like slow_unaligned_access and overlap_op_by_pieces.
> >
> > move_by_pieces() will emit overlapping memory accesses with maximum
> > width only if the given length exceeds the size of one access
> > (e.g. 15-bytes for 8-byte accesses).
> >
> > This patch changes the implementation of riscv_block_move_straight()
> > such that it preserves a remainder within the interval
> > [delta..2*delta) instead of [0..delta), so that overlapping memory
> > access may be emitted (if the requirements for them are given).
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-string.c (riscv_block_move_straight):
> > Adjust range for emitted load/store pairs.
>
> The change to riscv_expand_block_move isn't noted in the ChangeLog.  OK
> with that fixed (I'm assuming you want to attempt to use overlapping
> word ops for that case).
>

The change in riscv_expand_block_move is a code cleanup.
At the beginning of riscv_expand_block_move we do the following:
  unsigned HOST_WIDE_INT length = UINTVAL (length);
The signature of riscv_block_move_straight takes an "unsigned HOST_WIDE_INT
length".
So we can simply reuse length instead of doing "INTVAL (length)", which is
redundant and uses the wrong signedness (INTVAL vs UINTVAL).

Also, the ChangeLog entry for the test was missing.

Thanks,
Christoph


>
>
> jeff
>
>
>


[PATCH][committed]middle-end: Fix can_special_div_by_const doc.

2022-11-14 Thread Tamar Christina via Gcc-patches
Hi All,

This commits the typo fix so that it matches the tm.texi file and fixes the bootstrap.

Committed under the obvious rule.

Thanks,
Tamar

gcc/ChangeLog:

* target.def: Fix typo.

--- inline copy of patch -- 
diff --git a/gcc/target.def b/gcc/target.def
index f491e2233cf18760631f148dacf18d0e0b133e4c..41c62a68f03a44aadbaa66ec5ad11ced3cc0801f 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1907,7 +1907,7 @@ DEFHOOK
  "This hook is used to test whether the target has a special method of\n\
 division of vectors of type @var{vectype} using the value @var{constant},\n\
 and producing a vector of type @var{vectype}.  The division\n\
-will then not be decomposed by the and kept as a div.\n\
+will then not be decomposed by the vectorizer and kept as a div.\n\
 \n\
 When the hook is being used to test whether the target supports a special\n\
 divide, @var{in0}, @var{in1}, and @var{output} are all null.  When the hook\n\








Re: [PATCH v2 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2022-11-14 Thread Philipp Tomsich
On Mon, 14 Nov 2022 at 17:06, Jeff Law  wrote:
>
>
> On 11/13/22 13:48, Philipp Tomsich wrote:
> > The Ventana VT1 core supports quad-issue and instruction fusion.
> > This implements TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
> > together and adds idiom matching for the supported fusion cases.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.cc (enum riscv_fusion_pairs): Add symbolic
> >   constants to identify supported fusion patterns.
> >   (struct riscv_tune_param): Add fusible_op field.
> >   (riscv_macro_fusion_p): Implement.
> >   (riscv_fusion_enabled_p): Implement.
> >   (riscv_macro_fusion_pair_p): Implement and recognize fusible
> >   idioms for Ventana VT1.
> >   (TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
> >   (TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to 
> > riscv_macro_fusion_pair_p.
>
> You know the fusion rules for VT1 better than I...  I'm happy to largely
> defer to you on this.
>
> I do wonder if going forward hand matching RTL like this is going to be
> an unmaintainable mess and whether or not we would be better served
> using insn attributes to describe instruction fusion.

I had thought about that, too.
In the end our team decided to stay away from it for the time being:
fusion frequently needs to look at second-level properties and whether
the first instruction's output register is overwritten by the second
instruction.  So we stuck with the same style of idiom matching
that is also used for AArch64 today.

That said, both the RISC-V and the AArch64 implementations of this are
on my list of things to refactor in a quiet hour.

>
>
>
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> > Changes in v2:
> > - Update fusion patterns and catch some missing idioms/fusion pairs.
> >
> >   gcc/config/riscv/riscv.cc | 219 ++
> >   1 file changed, 219 insertions(+)
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 31d651f8744..43ba520885c 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> >
> > +static bool
> > +riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
> > +{
> > +  rtx prev_set = single_set (prev);
> > +  rtx curr_set = single_set (curr);
> > +  /* prev and curr are simple SET insns i.e. no flag setting or branching. 
> >  */
> > +  bool simple_sets_p = prev_set && curr_set && !any_condjump_p (curr);
> > +
> > +  if (!riscv_macro_fusion_p ())
> > +return false;
> > +
> > +  if (simple_sets_p && (riscv_fusion_enabled_p (RISCV_FUSE_ZEXTW) ||
> > + riscv_fusion_enabled_p (RISCV_FUSE_ZEXTH)))
>
> Formatting nit.  Bring the && down to a new line and if you still need a
> line break for the "||",  then the "||" should be on a new line as
> well.  Something like this...
>
>
> if (simple_sets_p
>     && (riscv_fusion_enabled_p (RISCV_FUSE_ZEXTW)
>         || riscv_fusion_enabled_p (RISCV_FUSE_ZEXTH)))
>
>
> > +   && REGNO (XEXP (SET_SRC (curr_set), 0)) == REGNO(SET_DEST 
> > (curr_set))
>
> Space before open paren on this line.
>
>
> >
> > +   && (( INTVAL (XEXP (SET_SRC (curr_set), 1)) == 32
> > + && riscv_fusion_enabled_p(RISCV_FUSE_ZEXTW) )
> > +   || ( INTVAL (XEXP (SET_SRC (curr_set), 1)) < 32
> > +&& riscv_fusion_enabled_p(RISCV_FUSE_ZEXTWS
>
> Extraneous spaces after the open parens before INTVALs above.
>
>
> > +   && REGNO (XEXP (SET_SRC (curr_set), 0)) == REGNO(SET_DEST 
> > (curr_set))
>
> Missing whitespace before open paren on this line.
>
>
> OK with the nits fixed.

Applied to master with these fixes (and a fix for the typo in the
commit message that Jakub spotted).
Thanks!

Philipp.


[wwwdocs] cxx-status: Add C++23 papers from the Nov 2022 Kona WG21 plenary

2022-11-14 Thread Marek Polacek via Gcc-patches
Pushed.

commit b97d1aba41d95ae9220fe08a991738a58a716212
Author: Marek Polacek 
Date:   Mon Nov 14 13:49:21 2022 -0500

cxx-status: Add C++23 papers from the Nov 2022 Kona WG21 plenary

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 8650f3cd..3454cfc9 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -356,6 +356,42 @@
https://gcc.gnu.org/PR106658;>No

 
+
+
+   static operator[] 
+   https://wg21.link/p2589;>P2589R1
+   https://gcc.gnu.org/PR107684;>No
+   __cpp_multidimensional_subscript >= 202211L 
+
+
+
+   DR: Permitting static constexpr variables in
+ constexpr functions 
+   https://wg21.link/p2647;>P2647R1
+   https://gcc.gnu.org/PR107685;>No
+   __cpp_constexpr >= 202211L 
+
+
+
+   DR: consteval needs to propagate up 
+   https://wg21.link/p2564;>P2564R3
+   https://gcc.gnu.org/PR107687;>No
+   __cpp_consteval >= 202211L 
+
+
+
+   DR: Meaningful exports 
+   https://wg21.link/p2615;>P2615R1
+   https://gcc.gnu.org/PR107688;>No
+   
+
+
+
+   Wording for P2644R1 Fix for Range-based for Loop 
+   https://wg21.link/p2718;>P2718R0
+   https://gcc.gnu.org/PR107637;>No
+   
+
 

Re: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

2022-11-14 Thread Alexander Monakov via Gcc-patches


On Mon, 14 Nov 2022, Joshi, Tejas Sanjay wrote:

> [Public]
> 
> Hi,

Hi. I'm still waiting for feedback on fixes for existing models:
https://inbox.sourceware.org/gcc-patches/5ae6fc21-edc6-133-aee2-a41e16eb...@ispras.ru/T/#t
did you have a chance to look at those?

> PFA the patch which adds znver4 instruction reservations separately from older
> znver versions:
> * This also models separate div, fdiv and ssediv units accordingly.

Why are you modeling 'fdiv' and 'ssediv' separately? When preparing the above
patches, I checked that x87 and SSE divisions use the same hardware unit, and
I don't see a strong reason to artificially clone it in the model.

(integer divider is a separate unit from the floating-point divider)

> * Does not blow up the insn-automata.cc size (it grew from 201502 to 206141
> for me.)
> * The patch successfully builds, bootstraps, and passes make check.
> * I have also run spec, showing no regressions for 1-copy 3-iteration runs. 
> However, I observe 1.5% gain for 507.cactuBSSN_r.

I have a question on AVX512 modeling in your patch:

> +;; AVX instructions
> +(define_insn_reservation "znver4_sse_log" 1
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "sselog,sselog1")
> +(and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF")
> + (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fpu")
> +
> +(define_insn_reservation "znver4_sse_log_evex" 1
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "sselog,sselog1")
> +(and (eq_attr "mode" "V16SF,V8DF")
> + (eq_attr "memory" "none"))))
> +  "znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
> +

This is an AVX512 instruction, and you're modeling that it occupies two ports at
once and thus has half throughput, but later in the AVX512 section:

> +;; AVX512 instructions
> +(define_insn_reservation "znver4_sse_mul_evex" 3
> +  (and (eq_attr "cpu" "znver4")
> +   (and (eq_attr "type" "ssemul")
> +(and (eq_attr "mode" "V16SF,V8DF")
> + (eq_attr "memory" "none"))))
> +  "znver4-double,znver4-fpu0|znver4-fpu3")

none of the instructions are modeled this way. If that's on purpose, can you
add a comment? It's surprising, since generally AVX512 has half throughput
compared to AVX256 on Zen 4, but the model doesn't seem to reflect that.

Alexander


Re: [PATCH v2 1/2] RISC-V: Add basic support for the Ventana-VT1 core

2022-11-14 Thread Philipp Tomsich
Applied to master. Thanks!

Philipp.

On Mon, 14 Nov 2022 at 16:52, Jeff Law  wrote:
>
>
> On 11/13/22 13:48, Philipp Tomsich wrote:
> > The Ventana-VT1 core is compatible with rv64gc, Zb[abcs], Zifencei and
> > XVentanaCondOps.
> > This introduces a placeholder -mcpu=ventana-vt1, so tooling and
> > scripts don't need to change once full support (pipeline, tuning,
> > etc.) will become public later.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-cores.def (RISCV_TUNE): Add ventana-vt1.
> >   (RISCV_CORE): Ditto.
> >   * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): 
> > Ditto.
> >   * config/riscv/riscv.cc: Add tune_info for ventana-vt1.
> >   * config/riscv/riscv.md: Add ventana-vt1.
> >   * 
> > doc/gcc/gcc-command-options/machine-dependent-options/risc-v-options.rst:
> >   Document -mcpu= and -mtune with ventana-vt1.
>
> OK.
>
>
> WRT the scheduler description.  I have one, but I think it's on the
> server at the vacation house which went offline a couple weeks ago and
> due to health reasons I haven't been up there to reset the internet
> connection.  Worst case I can just rebuild it from scratch, it's not
> that complex.
>
> Jeff


Re: [PATCH] libstdc++: Enable _GLIBCXX_WEAK_DEFINITION on more platforms

2022-11-14 Thread Arsen Arsenović via Gcc-patches
Evening,

Similar to the other patch, pinging this before it drifts unreasonably
far into S3, and hence into GCC14.

Archive:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603931.html

Have a great evening :)
-- 
Arsen Arsenović




Ping: [PATCH] libstdc++: Enable building libstdc++.{a, so} when !HOSTED

2022-11-14 Thread Arsen Arsenović via Gcc-patches
Evening,

Since S1 is closed now, best to ping this patch before it drifts into
GCC14.

Archive link:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604031.html

Have a great evening!
-- 
Arsen Arsenović




Re: [PATCH 6/7] riscv: Add support for strlen inline expansion

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/13/22 16:05, Christoph Muellner wrote:

From: Christoph Müllner 

This patch implements the expansion of the strlen builtin
using Zbb instructions (if available) for aligned strings
using the following sequence:

      li      a3,-1
      addi    a4,a0,8
.L2:  ld      a5,0(a0)
      addi    a0,a0,8
      orc.b   a5,a5
      beq     a5,a3,6 <.L2>
      not     a5,a5
      ctz     a5,a5
      srli    a5,a5,0x3
      add     a0,a0,a5
      sub     a0,a0,a4

This allows calls to strlen() to be inlined, with optimized code for
determining the length of a string.
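A C++ model of the same loop may help readers check the sequence. It is a sketch under stated assumptions:

- orc.b is emulated with a branch-free bit trick.
- The host is little-endian, like RISC-V.
- The buffer must be readable in whole 8-byte words past the terminator. The real expansion instead relies on aligned loads not crossing a page.

`strlen_by_words` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Emulates orc.b: each result byte is 0xff if the source byte is non-zero,
// otherwise 0x00.
static std::uint64_t orc_b(std::uint64_t x) {
  const std::uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;
  // Bit 7 of each byte ends up set iff that byte of x is non-zero;
  // (x & low7) + low7 cannot carry across byte boundaries.
  std::uint64_t hi = (((x & low7) + low7) | x) & ~low7;
  // Spread each per-byte flag (0x80 -> 0x01) into a full 0xff byte.
  return (hi >> 7) * 0xff;
}

// Word-at-a-time strlen mirroring the assembly above (little-endian only).
std::size_t strlen_by_words(const char* s) {
  const char* p = s;
  for (;;) {
    std::uint64_t word;
    std::memcpy(&word, p, sizeof word);          // ld    a5,0(a0)
    p += 8;                                      // addi  a0,a0,8
    std::uint64_t m = orc_b(word);               // orc.b a5,a5
    if (m != ~std::uint64_t{0}) {                // beq   a5,a3,.L2 (a3 = -1)
      std::uint64_t zeros = ~m;                  // not   a5,a5
      unsigned idx = __builtin_ctzll(zeros) >> 3;  // ctz + srli 3: first NUL byte
      // add/sub: (p - s) is 8 bytes past the word containing the NUL.
      return static_cast<std::size_t>(p - s) - 8 + idx;
    }
  }
}
```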

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_strlen): New
  prototype.
* config/riscv/riscv-string.cc (riscv_emit_unlikely_jump): New
  function.
(GEN_EMIT_HELPER2): New helper macro.
(GEN_EMIT_HELPER3): New helper macro.
(do_load_from_addr): New helper function.
(riscv_expand_strlen_zbb): New function.
(riscv_expand_strlen): New function.
* config/riscv/riscv.md (strlen): Invoke expansion
  functions for strlen.


+extern bool riscv_expand_strlen (rtx[]);


Consider adding the number of elements in the RTX array here. Martin S's 
work from a little while ago will make use of it to try and catch 
over-reads and over-writes if the data is available.



  
  /* Information about one CPU we know about.  */

  struct riscv_cpu_info {
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 1137df475be..bf96522b608 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -38,6 +38,81 @@
  #include "predict.h"
  #include "optabs.h"
  
+/* Emit unlikely jump instruction.  */

+
+static rtx_insn *
+riscv_emit_unlikely_jump (rtx insn)
+{
+  rtx_insn *jump = emit_jump_insn (insn);
+  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
+  return jump;
+}


I was a bit surprised that we didn't have this as a generic routine.   
Consider adding this to emit-rtl.cc along with its companion 
emit_likely_jump.  Not a requirement to move forward, but it seems like 
the right thing to do.






+
+/* Emit proper instruction depending on type of dest.  */


s/type/mode/




+
+/* Emit proper instruction depending on type of dest.  */


s/type/mode/


You probably want to undefine the GEN_EMIT_HELPER macros once you're done
with them.  That's become fairly standard practice for these kinds of helper
macros.
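Here is a minimal C++ sketch of the define-use-undefine pattern being suggested. The macro name `GEN_BINOP_HELPER` and the `do_add`/`do_sub` functions are hypothetical. The 32/64-bit dispatch stands in for dispatch on the destination's mode; this is not the patch's actual GEN_EMIT_HELPER2/3.

```cpp
#include <cstdint>

// Stamps out one helper per operation that picks the 32- or 64-bit
// variant based on the destination width.
#define GEN_BINOP_HELPER(NAME, OP)                            \
  static std::uint64_t do_##NAME(int bits, std::uint64_t a,   \
                                 std::uint64_t b) {           \
    if (bits == 32)                                           \
      return static_cast<std::uint32_t>(a OP b);              \
    return a OP b;                                            \
  }

GEN_BINOP_HELPER(add, +)
GEN_BINOP_HELPER(sub, -)

// Done with the helper: undefine it so it cannot leak into, or clash
// with, code later in the translation unit.
#undef GEN_BINOP_HELPER
```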


OK with the nits fixed.  Your call on whether or not to move the 
implementation of emit_likely_jump and emit_unlikely_jump into emit-rtl.cc.



Jeff




[PATCH] tree-optimization/107485 - fix non-call exception ICE with inlining

2022-11-14 Thread Richard Biener via Gcc-patches
Inlining performs a wrong non-call exception fixup for VEC_COND_EXPRs,
which on the branch do not have their condition properly split out in
the first place.
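The sketch below is a source-level analogy that uses GCC's vector extensions; it is not the GIMPLE fix itself. It shows why boolean_type_node is the wrong type for the split-out condition of a VEC_COND_EXPR. A vector comparison produces a vector mask with one signed-integer lane per element, not a single bool. `lane_mask` and `select_lanes` are hypothetical names.

```cpp
// GCC vector extension: comparing two vectors yields a vector of signed
// integers (each lane -1 or 0), not a scalar bool.
typedef int v4si __attribute__((vector_size(16)));

v4si lane_mask(v4si a, v4si b) {
  return a < b;
}

// Lane-wise select driven by the mask, which is what a VEC_COND_EXPR computes.
v4si select_lanes(v4si mask, v4si x, v4si y) {
  return (mask & x) | (~mask & y);
}
```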

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to
the GCC 10 branch, which is the only one where this code snippet still exists.

PR tree-optimization/107485
* tree-inline.c (remap_gimple_stmt): Use correct type for
split out condition of [VEC_]COND_EXPRs.
---
 gcc/tree-inline.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index c20c25ceb50..658b09c07d2 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -1979,11 +1979,10 @@ remap_gimple_stmt (gimple *stmt, copy_body_data *id)
 || gimple_assign_rhs_code (ass) == VEC_COND_EXPR)
&& gimple_could_trap_p (ass))
  {
-   gassign *cmp
- = gimple_build_assign (make_ssa_name (boolean_type_node),
-gimple_assign_rhs1 (ass));
+   tree def = make_ssa_name (TREE_TYPE (gimple_assign_rhs1 (ass)));
+   gassign *cmp = gimple_build_assign (def, gimple_assign_rhs1 (ass));
gimple_seq_add_stmt (, cmp);
-   gimple_assign_set_rhs1 (ass, gimple_assign_lhs (cmp));
+   gimple_assign_set_rhs1 (ass, def);
  }
 }
 
-- 
2.35.3


[PATCH][committed] aarch64: Add support for +cssc

2022-11-14 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch adds codegen for FEAT_CSSC from the 2022 Architecture extensions.
It fits various existing optabs in GCC quite well.
There are instructions for scalar signed/unsigned min/max, abs, ctz, popcount.
We have expanders for these already, so they are wired up to emit single-insn
patterns for the new TARGET_CSSC.
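The kind of scalar code these patterns target can be sketched in C++ as below. The function names are hypothetical. Whether each function compiles to a single CSSC instruction with a `+cssc` -march setting is an assumption to check in the generated assembly.

```cpp
#include <cstdint>
#include <cstdlib>

// Scalar idioms that map onto the optabs the CSSC patterns implement.
std::uint64_t umax64(std::uint64_t a, std::uint64_t b) { return a > b ? a : b; }  // umax
std::int64_t smin64(std::int64_t a, std::int64_t b) { return a < b ? a : b; }     // smin
std::int64_t abs64(std::int64_t a) { return std::llabs(a); }                       // abs
int ctz64(std::uint64_t x) { return __builtin_ctzll(x); }        // ctz (x must be non-zero)
int popcount64(std::uint64_t x) { return __builtin_popcountll(x); }                // cnt
```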

These instructions are enabled by the +cssc command-line extension.
This version of the patch follows Andrew's suggestion for the expanders.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def (cssc): Define.
* config/aarch64/aarch64.h (AARCH64_ISA_CSSC): Define.
(TARGET_CSSC): Likewise.
* config/aarch64/aarch64.md (*aarch64_abs2_cssc_ins): New 
define_insn.
(abs2): Adjust for the above.
(aarch64_umax3_insn): New define_insn.
(umax3): Adjust for the above.
(*aarch64_popcount2_cssc_insn): New define_insn.
(popcount2): Adjust for the above.
(3): New define_insn.
* config/aarch64/constraints.md (Usm): Define.
(Uum): Likewise.
* doc/invoke.texi (AArch64 options): Document +cssc.
* config/aarch64/iterators.md (MAXMIN_NOUMAX): New code iterator.
* config/aarch64/predicates.md (aarch64_sminmax_immediate): Define.
(aarch64_sminmax_operand): Likewise.
(aarch64_uminmax_immediate): Likewise.
(aarch64_uminmax_operand): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cssc_1.c: New test.
* gcc.target/aarch64/cssc_2.c: New test.
* gcc.target/aarch64/cssc_3.c: New test.
* gcc.target/aarch64/cssc_4.c: New test.
* gcc.target/aarch64/cssc_5.c: New test.


cssc.patch
Description: cssc.patch


Re: [PATCH] Fix gdb FilteringTypePrinter (again)

2022-11-14 Thread François Dumont via Gcc-patches

Any chance of a review for this one?

On 06/10/22 19:38, François Dumont wrote:

Hi

Looks like the previous patch was not enough. When using it in the 
context of a build without dual abi and versioned namespace I started 
having failures again. I guess I hadn't rebuilt everything properly.


This time I think the problem was in those lines:

    if self.type_obj == type_obj:
        return strip_inline_namespaces(self.name)

I've added a call to gdb.types.get_basic_type so that we do not 
compare a type with its typedef.


Thanks for the pointer to the doc !

In doing so, I eventually used your code, Jonathan, to make 
FilteringTypePrinter more specific to a given instantiation.


    libstdc++: Fix gdb FilteringTypePrinter

    Once we have found a matching FilteringTypePrinter instance, we look up
    the associated typedef and check that the returned Python Type is equal
    to the Type to recognize. But a gdb Python Type includes properties that
    distinguish a typedef from the actual type, so use
    gdb.types.get_basic_type to check whether we are indeed on the same type.

    Additionally, enhance the FilteringTypePrinter matching mechanism by
    introducing targ1 which, if not None, is used as the first template
    parameter.

    libstdc++-v3/ChangeLog:

    * python/libstdcxx/v6/printers.py (FilteringTypePrinter): Rename
    'match' field to 'template'. Add self.targ1 to specify the first
    template parameter of the instantiation to match.
    (add_one_type_printer): Add targ1 optional parameter, default to None.
    Use gdb.types.get_basic_type to compare the type to recognize and the
    type returned from the typedef lookup.
    (register_type_printers): Adapt calls to add_one_type_printers.


Tested on Linux x86_64 in normal mode and with versioned namespace, with
and without dual abi.


François





Re: [PATCH] PR 107189 Remove useless _Alloc_node

2022-11-14 Thread François Dumont via Gcc-patches

Gentle reminder.

Sorry if I should have committed it as trivial but I cannot do it 
anymore now that I asked :-)



On 12/10/22 22:18, François Dumont wrote:

libstdc++: Remove _Alloc_node instance in _Rb_tree [PR107189]

    libstdc++-v3/ChangeLog:

    PR libstdc++/107189
    * include/bits/stl_tree.h (_Rb_tree<>::_M_insert_range_equal): Remove
    unused _Alloc_node instance.

Ok to commit ?

François





Re: [PATCH v3] c++: parser - Support for target address spaces in C++

2022-11-14 Thread Jason Merrill via Gcc-patches

On 11/10/22 06:40, Georg-Johann Lay wrote:



On 10.11.22 at 15:08, Paul Iannetta wrote:

On Thu, Nov 03, 2022 at 02:38:39PM +0100, Georg-Johann Lay wrote:

[PATCH v3] c++: parser - Support for target address spaces in C++

2. Will it work with compound literals?
===

Currently, the following C code works for target avr:

const __flash char *pHallo = (const __flash char[]) { "Hallo" };

This is a pointer in RAM (AS0) that holds the address of a string in flash
(AS1) and is initialized with that address. Unfortunately, this does not
work locally:

const __flash char* get_hallo (void)
{
    [static] const __flash char *p2 = (const __flash char[]) { "Hallo2" };

 return p2;
}

foo.c: In function 'get_hallo':
foo.c: error: compound literal qualified by address-space qualifier

Is there any way to make this work now? Would be great!


I don't object to allowing this, but what's the advantage of this pattern over


static __flash const char p2[] = "Hallo2";

?


Currently, I implement the same restrictions as the C front-end, but I
think that this restriction could be lifted.


Hi Paul,

this would be great.  FYI, due to AVR quirks, .rodata is located in RAM.
The reason behind this is that in functions like

char get_letter (const char *c)
{
     return *c;
}

there is no means to determine whether get_letter was called with a 
const char* or a char*.  Accessing flash vs. RAM would require different 
instructions, thus .rodata is part of RAM, so that RAM accesses will 
work in either case.


The obvious problem is that this wastes RAM. One way out is to define an
address space in flash and to pass const __flash char*, where the respective
objects are located in flash (.progmem.data in the case of avr).


This is fine for objects which the application creates, but there are 
also artificial objects like vtables or cswtch tables.



3. Will TARGET_ADDR_SPACE_DIAGNOSE_USAGE still work?


Currently there is target hook TARGET_ADDR_SPACE_DIAGNOSE_USAGE.
I did not see it in your patches, so maybe I just missed it? See
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gccint/Named-Address-Spaces.html#index-TARGET_005fADDR_005fSPACE_005fDIAGNOSE_005fUSAGE


That was a point I overlooked in my previous patch.  This will be in
my new revision where I also add implicit conversion between
address spaces and also expose TARGET_ADDR_SPACE_CONVERT.


4. Will it be possible to put C++ virtual tables in ASs, and how?
=


Currently, I do not allow the declaration of instances of classes in
an address space, mainly to not have to cope with the handling of the
this pointer.  That is,

   __flash Myclass *t;

does not work.  Nevertheless, I admit that this would be nice to
have.


One big complaint about avr-g++ is that there is no way to put vtables in
flash (address-space 1) and to access them accordingly.  How can this be
achieved with C++ address spaces?


Do you want only the vtables in the flash address space, or do you want
to be able to have the whole class content there?


My question is about vtables, not the bits that represent some object.
vtables are stored independently of objects, usually in .rodata + 
comdat.  Notice that vtables are read-only and in static storage, even 
if objects are neither.


The problem with vtables is that the user has no handle to specify where 
to locate them -- and even if they had one, due to AVR quirks the right 
instructions must be used to access them.  Thus just putting vtables in 
flash by means of some section attribute won't work; only address spaces 
can do the trick.



1. If you only want the vtables, I think that a target hook called
at vtable creation would do the trick.


Yes, that would be enough, see https://gcc.gnu.org/PR43745


As you say there, this would be an ABI change, so there would need to be 
a transition strategy.  I don't know to what extent AVR users try to use 
older compiled code vs. always rebuilding everything.



Johann


2. If you want to be able to have pointers to classes in __flash, I
will need to extend the support I have currently implemented to
handle the this pointer qualified with an address space.
In retrospect, I think this has to be implemented.

Paul


Would be great if this would work, but I think this can be really 
tricky, because it's already tricky for non-class objects.


Indeed, especially if objects of the same type can live either in flash 
or RAM: you'd need 2 or more of each method for the different accesses. 
Perhaps via cloning.


Simpler might be to declare that objects of a particular class can only 
live in flash.


A user has to specify __flash explicitly, which is quite different to 
plain objects.  For example, a const int can live in .rodata, but in 
cases like


extern int func();
extern const int ival;
const int ival = func();

ival would live in .bss and be initialized at runtime by a static 
constructor. 
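The example above can be turned into a runnable C++ program with a hypothetical `func` that counts its calls. It shows that `ival` gets its value from code run at startup rather than from a constant. Where the object actually lands (.bss plus a static constructor, or folded to a constant) depends on the compiler and optimization level. Check with `objdump -t` or similar rather than relying on this sketch.

```cpp
// Constant-initialized, so it is already zero before any dynamic
// initialization runs.
static int func_calls = 0;

int func() {
  ++func_calls;
  return 42;
}

extern const int ival;
const int ival = func();  // initializer is not a constant expression
const int cval = 7;       // constant-initialized; a candidate for read-only data
```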

Re: [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 09:51, Jeff Law wrote:


On 11/13/22 16:05, Christoph Muellner wrote:

From: Philipp Tomsich 

As a basis for optimized string functions (e.g., the by-pieces
implementations), we need orc.b available.  This adds orc.b as an
unspec, so we can expand to it.

gcc/ChangeLog:

 * config/riscv/bitmanip.md (orcb2): Add orc.b as an
  unspec.
 * config/riscv/riscv.md: Add UNSPEC_ORC_B.
In general, we should prefer to express things as "real" RTL rather 
than UNSPECS.  In this particular case expressing the orc could be 
done with a handful of IOR expressions, though they'd probably need to 
reference byte SUBREGs of the input and I dislike explicit SUBREGs in 
the md file even more than UNSPECs. 


Mis-read the specs on orc.  So ignore the comment about expressing this 
as a handful of IORs and about it being reduc_ior_scal.
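The small C++ emulation below illustrates the distinction; it is an illustrative sketch with hypothetical names. orc.b keeps a per-byte result across the full word, whereas a byte-wise IOR reduction collapses the word into a single value.

```cpp
#include <cstdint>

// orc.b (Zbb): each byte of the result is 0xff if the corresponding input
// byte is non-zero, else 0x00. The result stays a full-width bitmask.
std::uint64_t orc_b(std::uint64_t x) {
  std::uint64_t r = 0;
  for (int i = 0; i < 8; ++i)
    if ((x >> (8 * i)) & 0xff)
      r |= std::uint64_t{0xff} << (8 * i);
  return r;
}

// A reduction-style reading: IOR all bytes together into one byte,
// which loses the position of each non-zero byte.
std::uint8_t ior_reduce_bytes(std::uint64_t x) {
  std::uint8_t r = 0;
  for (int i = 0; i < 8; ++i)
    r |= static_cast<std::uint8_t>(x >> (8 * i));
  return r;
}
```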


jeff




Re: [PATCH] libatomic: Add support for LSE and LSE2

2022-11-14 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra via Gcc-patches  writes:
> Add support for AArch64 LSE and LSE2 to libatomic.  Disable outline atomics,
> and use LSE ifuncs for 1-8 byte atomics and LSE2 ifuncs for 16-byte atomics.
> On Neoverse V1, 16-byte atomics are ~4x faster due to avoiding locks.
>
> Note this is safe since we swap all 16-byte atomics using the same ifunc,
> so they either use locks or LSE2 atomics, but never a mix. This also improves
> ABI compatibility with LLVM: its inlined 16-byte atomics are compatible with
> the new libatomic if LSE2 is supported.
>
> Passes regress, OK for commit?
>
> libatomic/
> Makefile.in: Regenerated with automake 1.15.1.
> Makefile.am: Add atomic_16.S for AArch64.
> configure.tgt: Disable outline atomics in AArch64 build.
> config/linux/aarch64/atomic_16.S: New file - implementation of
> ifuncs for 128-bit atomics.
> config/linux/aarch64/host-config.h: Enable ifuncs, use LSE 
> (HWCAP_ATOMICS)
> for 1-8-byte atomics and LSE2 (HWCAP_USCAT) for 16-byte atomics.
>
> ---
> diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
> index 
> d88515e4a03bd812334ae0b7bf4c0bba119455dc..41e5da28512150780a2018386e22b4e70afcfa3f
>  100644
> --- a/libatomic/Makefile.am
> +++ b/libatomic/Makefile.am
> @@ -127,6 +127,8 @@ if HAVE_IFUNC
>  if ARCH_AARCH64_LINUX
>  IFUNC_OPTIONS = -march=armv8-a+lse
>  libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
> _$(s)_1_.lo,$(SIZEOBJS)))
> +libatomic_la_SOURCES += atomic_16.S
> +
>  endif
>  if ARCH_ARM_LINUX
>  IFUNC_OPTIONS = -march=armv7-a+fp -DHAVE_KERNEL64
> diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
> index 
> 80d25653dc75cca995c8b0b2107a55f1234a6d52..89e29fc60a7fb74341b2f0f805e461847073082c
>  100644
> --- a/libatomic/Makefile.in
> +++ b/libatomic/Makefile.in
> @@ -90,13 +90,14 @@ build_triplet = @build@
>  host_triplet = @host@
>  target_triplet = @target@
>  @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_1 = $(foreach 
> s,$(SIZES),$(addsuffix _$(s)_1_.lo,$(SIZEOBJS)))
> -@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = $(foreach \
> +@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_2 = atomic_16.S
> +@ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(foreach \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   s,$(SIZES),$(addsuffix \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   _$(s)_1_.lo,$(SIZEOBJS))) \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   $(addsuffix \
>  @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@   _8_2_.lo,$(SIZEOBJS))
> -@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_3 = $(addsuffix 
> _8_1_.lo,$(SIZEOBJS))
> -@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix 
> _16_1_.lo,$(SIZEOBJS)) \
> +@ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@am__append_4 = $(addsuffix 
> _8_1_.lo,$(SIZEOBJS))
> +@ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@am__append_5 = $(addsuffix 
> _16_1_.lo,$(SIZEOBJS)) \
>  @ARCH_X86_64_TRUE@@HAVE_IFUNC_TRUE@ $(addsuffix 
> _16_2_.lo,$(SIZEOBJS))
>  
>  subdir = .
> @@ -154,8 +155,11 @@ am__uninstall_files_from_dir = { \
>}
>  am__installdirs = "$(DESTDIR)$(toolexeclibdir)"
>  LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_LTLIBRARIES)
> +@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@am__objects_1 =  \
> +@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@   atomic_16.lo
>  am_libatomic_la_OBJECTS = gload.lo gstore.lo gcas.lo gexch.lo \
> - glfree.lo lock.lo init.lo fenv.lo fence.lo flag.lo
> + glfree.lo lock.lo init.lo fenv.lo fence.lo flag.lo \
> + $(am__objects_1)
>  libatomic_la_OBJECTS = $(am_libatomic_la_OBJECTS)
>  AM_V_lt = $(am__v_lt_@AM_V@)
>  am__v_lt_ = $(am__v_lt_@AM_DEFAULT_V@)
> @@ -165,9 +169,9 @@ libatomic_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC 
> $(AM_LIBTOOLFLAGS) \
>   $(LIBTOOLFLAGS) --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
>   $(libatomic_la_LDFLAGS) $(LDFLAGS) -o $@
>  libatomic_convenience_la_DEPENDENCIES = $(libatomic_la_LIBADD)
> -am__objects_1 = gload.lo gstore.lo gcas.lo gexch.lo glfree.lo lock.lo \
> - init.lo fenv.lo fence.lo flag.lo
> -am_libatomic_convenience_la_OBJECTS = $(am__objects_1)
> +am__objects_2 = gload.lo gstore.lo gcas.lo gexch.lo glfree.lo lock.lo \
> + init.lo fenv.lo fence.lo flag.lo $(am__objects_1)
> +am_libatomic_convenience_la_OBJECTS = $(am__objects_2)
>  libatomic_convenience_la_OBJECTS =  \
>   $(am_libatomic_convenience_la_OBJECTS)
>  AM_V_P = $(am__v_P_@AM_V@)
> @@ -185,6 +189,16 @@ am__v_at_1 =
>  depcomp = $(SHELL) $(top_srcdir)/../depcomp
>  am__depfiles_maybe = depfiles
>  am__mv = mv -f
> +CPPASCOMPILE = $(CCAS) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
> + $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CCASFLAGS) $(CCASFLAGS)
> +LTCPPASCOMPILE = $(LIBTOOL) $(AM_V_lt) $(AM_LIBTOOLFLAGS) \
> + $(LIBTOOLFLAGS) --mode=compile $(CCAS) $(DEFS) \
> + $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) \
> + $(AM_CCASFLAGS) $(CCASFLAGS)
> +AM_V_CPPAS = $(am__v_CPPAS_@AM_V@)
> +am__v_CPPAS_ = 

Re: GCC 13.0.0 Status Report (2022-11-14), Stage 3 in effect now

2022-11-14 Thread Xi Ruoyao via Gcc-patches
Hi Martin,

Is it allowed to merge libsanitizer from LLVM in stage 3?  If not I'd
like to cherry pick some commits from LLVM [to fix some stupid errors
I've made in LoongArch libasan :(].

On Mon, 2022-11-14 at 13:21 +, Richard Biener via Gcc-patches wrote:
> Status
> ==
> 
> The GCC development branch which will become GCC 13 is now in
> bugfixing mode (Stage 3) until the end of Jan 15th.
> 
> As usual the first weeks of Stage 3 are used to feature patches
> posted late during Stage 1.  At some point unreviewed features
> need to be postponed for the next Stage 1.
> 
> 
> Quality Data
> 
> 
> Priority          #   Change from last report
> --------        ---   -----------------------
> P1               33
> P2              473
> P3              113   +  29
> P4              253   +   6
> P5               25
> --------        ---   -----------------------
> Total P1-P3     619   +  29
> Total           897   +  35
> 
> 
> Previous Report
> ===
> 
> https://gcc.gnu.org/pipermail/gcc/2022-October/239690.html

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/13/22 16:05, Christoph Muellner wrote:

From: Christoph Müllner 

The current implementation of riscv_block_move_straight() emits a couple
of load-store pairs with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces(), which emits code based
on target settings like slow_unaligned_access and overlap_op_by_pieces.

move_by_pieces() will emit overlapping memory accesses with maximum
width only if the given length exceeds the size of one access
(e.g. 15 bytes for 8-byte accesses).

This patch changes the implementation of riscv_block_move_straight()
such that it preserves a remainder within the interval
[delta..2*delta) instead of [0..delta), so that overlapping memory
accesses may be emitted (if the requirements for them are met).
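Here is a sketch of the remainder arithmetic as described. The patch's actual loop bounds are not quoted here, so `straight_bytes` and `copy_overlapping_tail` are hypothetical helpers. With delta = 8, a 15-byte copy is left entirely to the tail, which two overlapping 8-byte accesses can cover.

```cpp
#include <cstddef>
#include <cstring>

// Bytes copied by full-width load/store pairs when the remainder must stay
// in [delta, 2*delta) (for length >= delta) rather than [0, delta).
std::size_t straight_bytes(std::size_t length, std::size_t delta) {
  if (length < delta)
    return 0;
  return (length / delta - 1) * delta;
}

// Copies a remainder r with delta <= r < 2*delta using two delta-sized
// accesses: one at the start and one ending at r. They overlap when r < 2*delta.
void copy_overlapping_tail(char* dst, const char* src, std::size_t r,
                           std::size_t delta) {
  std::memcpy(dst, src, delta);
  std::memcpy(dst + r - delta, src + r - delta, delta);
}
```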

gcc/ChangeLog:

* config/riscv/riscv-string.c (riscv_block_move_straight):
  Adjust range for emitted load/store pairs.


The change to riscv_expand_block_move isn't noted in the ChangeLog.  OK 
with that fixed (I'm assuming you want to attempt to use overlapping 
word ops for that case).



jeff




Re: [PATCH 4/7] riscv: Move riscv_block_move_loop to separate file

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/13/22 16:05, Christoph Muellner wrote:

From: Christoph Müllner 

Let's try not to accumulate too much functionality in a single file,
as this does not really help with maintaining or extending the code.
So in order to add more functionality similar to riscv_block_move_loop,
let's move this function to a separate file.

This change does not make any functional changes.
It does modify a single line in the existing code
that check_GNU_style.py complained about.

gcc/ChangeLog:

* config.gcc: Add new object riscv-string.o
* config/riscv/riscv-protos.h (riscv_expand_block_move): Remove
  duplicated prototype and move to new section for
  riscv-string.cc.
* config/riscv/riscv.cc (riscv_block_move_straight): Remove function.
(riscv_adjust_block_mem): Likewise.
(riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
* config/riscv/riscv.md (cpymemsi): Move to new section for
  riscv-string.cc.
* config/riscv/t-riscv: Add compile rule for riscv-string.o
* config/riscv/riscv-string.c: New file.


OK.  Note I suspect the commit hooks are going to complain about your 
ChangeLog formatting.



jeff




Re: [PATCH 2/7] riscv: bitmanip/zbb: Add prefix/postfix and enable visibility

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/13/22 16:05, Christoph Muellner wrote:

From: Christoph Müllner 

INSNs are usually postfixed by a number representing the argument count.
Given the instructions will be used in a later commit, let's make them
visible, but add a "riscv_" prefix to avoid conflicts with standard
INSNs.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*_not): Rename INSN.
(riscv__not3): Rename INSN.
(*xor_not): Rename INSN.
(xor_not3): Rename INSN.


Not strictly necessary, but given how often I've seen ports expose an 
insn with a standard name, but ever so slightly different semantics and 
the ensuing code correctness issues, I like the idea of prefixing.



OK

jeff




Re: [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/13/22 16:05, Christoph Muellner wrote:

From: Philipp Tomsich 

As a basis for optimized string functions (e.g., the by-pieces
implementations), we need orc.b available.  This adds orc.b as an
unspec, so we can expand to it.

gcc/ChangeLog:

 * config/riscv/bitmanip.md (orcb2): Add orc.b as an
  unspec.
 * config/riscv/riscv.md: Add UNSPEC_ORC_B.
In general, we should prefer to express things as "real" RTL rather than 
UNSPECS.  In this particular case expressing the orc could be done with 
a handful of IOR expressions, though they'd probably need to reference 
byte SUBREGs of the input and I dislike explicit SUBREGs in the md file 
even more than UNSPECs.  So


OK.


Jeff


ps.  We could consider this a reduc_ior_scal insn, but that may be 
actively harmful.  Having vector ops on the general and vector registers 
is a wart I hope we can avoid.





[PATCH] RISC-V: Optimal RVV epilogue logic.

2022-11-14 Thread jiawei
Skip generating the add insn if the adjustment size is zero.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_epilogue): New if control segment.

---
 gcc/config/riscv/riscv.cc | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 02a01ca0b7c..af138db7545 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5186,24 +5186,26 @@ riscv_expand_epilogue (int style)
}
 
   /* Get an rtx for STEP1 that we can add to BASE.  */
-  rtx adjust = GEN_INT (step1.to_constant ());
-  if (!SMALL_OPERAND (step1.to_constant ()))
+  if (step1.to_constant () != 0){
+rtx adjust = GEN_INT (step1.to_constant ());
+if (!SMALL_OPERAND (step1.to_constant ()))
{
  riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode), adjust);
  adjust = RISCV_PROLOGUE_TEMP (Pmode);
}
 
-  insn = emit_insn (
+insn = emit_insn (
   gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx, adjust));
 
-  rtx dwarf = NULL_RTX;
-  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+rtx dwarf = NULL_RTX;
+rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
 GEN_INT (step2));
 
-  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
-  RTX_FRAME_RELATED_P (insn) = 1;
+dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
+RTX_FRAME_RELATED_P (insn) = 1;
 
-  REG_NOTES (insn) = dwarf;
+REG_NOTES (insn) = dwarf;
+  }
 }
   else if (frame_pointer_needed)
 {
-- 
2.25.1



Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/14/22 08:56, Tamar Christina wrote:


gcc/ChangeLog:

* dojump.cc (do_jump): Pass along value.
(do_jump_by_parts_greater_rtx): Likewise.
(do_jump_by_parts_zero_rtx): Likewise.
(do_jump_by_parts_equality_rtx): Likewise.
(do_compare_rtx_and_jump): Likewise.
(do_compare_and_jump): Likewise.
* dojump.h (do_compare_rtx_and_jump): New.
* optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
(validate_test_and_branch): New.
(emit_cmp_and_jump_insns): Optionally take a value, and when a value is
supplied, check if it's suitable for tbranch.
* optabs.def (tbranch$a4): New.
* doc/md.texi (tbranch@var{mode}4): Document it.
* optabs.h (emit_cmp_and_jump_insns):
* tree.h (tree_zero_one_valued_p): New.


OK.

jeff



[PATCH][X86_64] Separate znver4 insn reservations from older znvers

2022-11-14 Thread Joshi, Tejas Sanjay via Gcc-patches
[Public]

Hi,

PFA the patch which adds znver4 instruction reservations separately from older 
znver versions:
* This also models separate div, fdiv and ssediv units accordingly.
* Does not blow up the insn-automata.cc size (it grew from 201502 to 206141 for me).
* The patch successfully builds, bootstraps, and passes make check.
* I have also run SPEC, showing no regressions for 1-copy 3-iteration runs.
In addition, I observe a 1.5% gain for 507.cactuBSSN_r.

Is it ok for trunk?

Thanks and Regards,
Tejas

Also, should I inline such long patches?

gcc/ChangeLog:

* gcc/common/config/i386/i386-common.cc (processor_alias_table):
Use CPU_ZNVER4 for znver4.
* config/i386/i386.md: Add znver4.md.
* config/i386/znver4.md: New.

---
 gcc/common/config/i386/i386-common.cc |2 +-
 gcc/config/i386/i386.md   |1 +
 gcc/config/i386/znver4.md | 1028 +
 3 files changed, 1030 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/i386/znver4.md

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index f66bdd5a2af..4b01c3540e5 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2113,7 +2113,7 @@ const pta processor_alias_table[] =
   {"znver3", PROCESSOR_ZNVER3, CPU_ZNVER3,
 PTA_ZNVER3,
 M_CPU_SUBTYPE (AMDFAM19H_ZNVER3), P_PROC_AVX2},
-  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER3,
+  {"znver4", PROCESSOR_ZNVER4, CPU_ZNVER4,
 PTA_ZNVER4,
 M_CPU_SUBTYPE (AMDFAM19H_ZNVER4), P_PROC_AVX512F},
   {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8081df76741..c18dfe2af9e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1312,6 +1312,7 @@
 (include "bdver3.md")
 (include "btver2.md")
 (include "znver.md")
+(include "znver4.md")
 (include "geode.md")
 (include "atom.md")
 (include "slm.md")
diff --git a/gcc/config/i386/znver4.md b/gcc/config/i386/znver4.md
new file mode 100644
index 000..e3892d1df2f
--- /dev/null
+++ b/gcc/config/i386/znver4.md
@@ -0,0 +1,1028 @@
+;; Copyright (C) 2012-2022 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+;;
+
+
+(define_attr "znver4_decode" "direct,vector,double"
+  (const_string "direct"))
+
+;; AMD znver4 Scheduling
+;; Modeling automatons for zen decoders, integer execution pipes,
+;; AGU pipes, branch, floating point execution and fp store units.
+(define_automaton "znver4, znver4_ieu, znver4_idiv, znver4_fdiv, 
znver4_ssediv, znver4_agu, znver4_bru, znver4_fpu, znver4_fp_store")
+
+;; Decoders unit has 4 decoders and all of them can decode fast path
+;; and vector type instructions.
+(define_cpu_unit "znver4-decode0" "znver4")
+(define_cpu_unit "znver4-decode1" "znver4")
+(define_cpu_unit "znver4-decode2" "znver4")
+(define_cpu_unit "znver4-decode3" "znver4")
+
+;; Currently blocking all decoders for vector path instructions as
+;; they are dispatched separately as a microcode sequence.
+(define_reservation "znver4-vector" 
"znver4-decode0+znver4-decode1+znver4-decode2+znver4-decode3")
+
+;; Direct instructions can be issued to any of the four decoders.
+(define_reservation "znver4-direct" 
"znver4-decode0|znver4-decode1|znver4-decode2|znver4-decode3")
+
+;; Fix me: Need to revisit this later to simulate fast path double behavior.
+(define_reservation "znver4-double" "znver4-direct")
+
+
+;; Integer unit 4 ALU pipes.
+(define_cpu_unit "znver4-ieu0" "znver4_ieu")
+(define_cpu_unit "znver4-ieu1" "znver4_ieu")
+(define_cpu_unit "znver4-ieu2" "znver4_ieu")
+(define_cpu_unit "znver4-ieu3" "znver4_ieu")
+(define_reservation "znver4-ieu" 
"znver4-ieu0|znver4-ieu1|znver4-ieu2|znver4-ieu3")
+
+;; 3 AGU pipes in znver4
+(define_cpu_unit "znver4-agu0" "znver4_agu")
+(define_cpu_unit "znver4-agu1" "znver4_agu")
+(define_cpu_unit "znver4-agu2" "znver4_agu")
+(define_reservation "znver4-agu-reserve" "znver4-agu0|znver4-agu1|znver4-agu2")
+
+;; Load is 4 cycles. We do not model reservation of load unit.
+(define_reservation "znver4-load" "znver4-agu-reserve")
+(define_reservation "znver4-store" "znver4-agu-reserve")
+
+;; vectorpath (microcoded) instructions are single issue instructions.
+;; So, they occupy all the integer units.
+(define_reservation "znver4-ivector" 

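As a rough illustration of what the `znver4-direct` and `znver4-vector` reservations express, here is a toy C model. It is not how genautomata builds its automata, and every name in it is hypothetical: a direct-path insn takes any one free decoder in the current cycle, while a vector-path (microcoded) insn needs all four decoders at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one bit per decoder in a per-cycle "busy" mask.  */
#define ZNVER4_NUM_DECODERS 4
#define ZNVER4_ALL_DECODERS ((1u << ZNVER4_NUM_DECODERS) - 1u)

/* "znver4-direct": reserve any single free decoder.  Return the new busy
   mask, or -1 if every decoder is already taken this cycle.  */
static int
reserve_direct (unsigned busy)
{
  assert ((busy & ~ZNVER4_ALL_DECODERS) == 0);
  for (int i = 0; i < ZNVER4_NUM_DECODERS; i++)
    if (!(busy & (1u << i)))
      return (int) (busy | (1u << i));
  return -1;
}

/* "znver4-vector": reserve all four decoders, which is only possible in
   a cycle where nothing else has been decoded yet.  */
static int
reserve_vector (unsigned busy)
{
  assert ((busy & ~ZNVER4_ALL_DECODERS) == 0);
  return busy == 0 ? (int) ZNVER4_ALL_DECODERS : -1;
}

/* Count how many more direct-path insns fit in the cycle described by BUSY.  */
static int
direct_insns_per_cycle (unsigned busy)
{
  int n = 0;
  int next;
  while ((next = reserve_direct (busy)) >= 0)
    {
      busy = (unsigned) next;
      n++;
    }
  return n;
}
```

Since `znver4-double` is currently just an alias for `znver4-direct`, it would behave exactly like `reserve_direct` in this model.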
Re: [PATCH v2 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2022-11-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 14, 2022 at 09:06:10AM -0700, Jeff Law via Gcc-patches wrote:
> 
> On 11/13/22 13:48, Philipp Tomsich wrote:
> > The Ventana VT1 core supports quad-issue and instruction fusion.
> > This implements TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
> > together and adds idiom matching for the supported fusion cases.
> > 
> > gcc/ChangeLog:
> > 
> > * config/riscv/riscv.cc (enum riscv_fusion_pairs): Add symbolic
> > constants to identify supported fusion patterns.
> > (struct riscv_tune_param): Add fusible_op field.
> > (riscv_macro_fusion_p): Implement.
> > (riscv_fusion_enabled_p): Implement.
> > (riscv_macro_fusion_pair_p): Implement and recoginze fusible

s/recoginze/recognize/

> > idioms for Ventana VT1.
> > (TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
> > (TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to riscv_macro_fusion_pair_p.
> 
> You know the fusion rules for VT1 better than I...  I'm happy to largely
> defer to you on this.
> 
> I do wonder if going forward hand matching RTL like this is going to be an
> unmaintainable mess and whether or not we would be better served using insn
> attributes to describe instruction fusion.

Jakub



Re: [PATCH v2 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/13/22 13:48, Philipp Tomsich wrote:

The Ventana VT1 core supports quad-issue and instruction fusion.
This implements TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
together and adds idiom matching for the supported fusion cases.

gcc/ChangeLog:

* config/riscv/riscv.cc (enum riscv_fusion_pairs): Add symbolic
constants to identify supported fusion patterns.
(struct riscv_tune_param): Add fusible_op field.
(riscv_macro_fusion_p): Implement.
(riscv_fusion_enabled_p): Implement.
(riscv_macro_fusion_pair_p): Implement and recoginze fusible
idioms for Ventana VT1.
(TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to riscv_macro_fusion_pair_p.


You know the fusion rules for VT1 better than I...  I'm happy to largely 
defer to you on this.


I do wonder if going forward hand matching RTL like this is going to be 
an unmaintainable mess and whether or not we would be better served 
using insn attributes to describe instruction fusion.






Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Update fusion patterns and catch some missing idioms/fusion pairs.

  gcc/config/riscv/riscv.cc | 219 ++
  1 file changed, 219 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 31d651f8744..43ba520885c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc

+static bool
+riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
+{
+  rtx prev_set = single_set (prev);
+  rtx curr_set = single_set (curr);
+  /* prev and curr are simple SET insns i.e. no flag setting or branching.  */
+  bool simple_sets_p = prev_set && curr_set && !any_condjump_p (curr);
+
+  if (!riscv_macro_fusion_p ())
+return false;
+
+  if (simple_sets_p && (riscv_fusion_enabled_p (RISCV_FUSE_ZEXTW) ||
+   riscv_fusion_enabled_p (RISCV_FUSE_ZEXTH)))


Formatting nit.  Bring the && down to a new line and if you still need a 
line break for the "||",  then the "||" should be on a new line as 
well.  Something like this...



  if (simple_sets_p
      && (riscv_fusion_enabled_p (RISCV_FUSE_ZEXTW)
          || riscv_fusion_enabled_p (RISCV_FUSE_ZEXTH)))



+ && REGNO (XEXP (SET_SRC (curr_set), 0)) == REGNO(SET_DEST (curr_set))


Missing space before the open paren on this line.




+ && (( INTVAL (XEXP (SET_SRC (curr_set), 1)) == 32
+   && riscv_fusion_enabled_p(RISCV_FUSE_ZEXTW) )
+ || ( INTVAL (XEXP (SET_SRC (curr_set), 1)) < 32
+  && riscv_fusion_enabled_p(RISCV_FUSE_ZEXTWS


Extraneous spaces after the open parens before INTVALs above.



+ && REGNO (XEXP (SET_SRC (curr_set), 0)) == REGNO(SET_DEST (curr_set))


Missing whitespace before open paren on this line.


OK with the nits fixed.


Jeff
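To make the ZEXTW/ZEXTWS check in the quoted `riscv_macro_fusion_pair_p` hunk easier to follow, here is a simplified C model. It assumes (the excerpt does not show this) that the first instruction of the pair is `slli rd, rs, 32`; the second must be `srli rd, rd, N` on the same register, with N == 32 selecting ZEXTW and N < 32 selecting ZEXTWS, mirroring the `INTVAL` checks in the patch. The instruction record and enum names are hypothetical, and the multi-line condition uses the operator-at-line-start layout requested in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record; the real hook inspects RTL
   through single_set, SET_SRC, SET_DEST and friends.  */
enum toy_op { TOY_SLLI, TOY_SRLI, TOY_OTHER };

struct toy_insn
{
  enum toy_op op;
  int rd;   /* Destination register number.  */
  int rs1;  /* Source register number.  */
  int imm;  /* Shift amount.  */
};

enum toy_fusion { TOY_FUSE_NONE, TOY_FUSE_ZEXTW, TOY_FUSE_ZEXTWS };

/* Classify PREV/CURR as "slli rd, rs, 32; srli rd, rd, N".  N == 32 is a
   zero-extension of the low word (ZEXTW); N < 32 leaves the zero-extended
   word shifted left by 32 - N (ZEXTWS).  */
static enum toy_fusion
classify_zext_pair (const struct toy_insn *prev, const struct toy_insn *curr,
                    bool zextw_enabled, bool zextws_enabled)
{
  assert (prev != NULL && curr != NULL);

  if (prev->op != TOY_SLLI
      || prev->imm != 32
      || curr->op != TOY_SRLI
      || curr->rs1 != prev->rd
      || curr->rd != prev->rd)
    return TOY_FUSE_NONE;

  if (curr->imm == 32 && zextw_enabled)
    return TOY_FUSE_ZEXTW;
  if (curr->imm < 32 && zextws_enabled)
    return TOY_FUSE_ZEXTWS;
  return TOY_FUSE_NONE;
}
```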



Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-14 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Nov 2022 at 15:58, Arsen Arsenović wrote:
>
>
> Jonathan Wakely  writes:
> >> It's the first thing the recipe does:
> >>
> >> install-data-local: gdb.py
> >> @$(mkdir_p) $(DESTDIR)$(toolexeclibdir)
> >>
> >> That's why I'm suggesting to do the same thing for the debug dir.
> >
> > This presumably means it has the problems that mkinstalldirs is
> > supposed to solve, but is that only relevant for Solaris 8, i.e. not
> > relevant?
>
> Ah, sorry.  I didn't notice that it did that at the top.  Yes, mkdir -p
> is at least as good then.
>

I've pushed it to trunk now, thanks for finding the problem.



RE: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-14 Thread Tamar Christina via Gcc-patches
Hello,

Ping and updated patch.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.md (*tb1): Rename to...
(*tb1): ... this.
(tbranch4): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/tbz_1.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd712bde55c7c72e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -943,12 +943,29 @@ (define_insn "*cb1"
  (const_int 1)))]
 )
 
-(define_insn "*tb1"
+(define_expand "tbranch4"
   [(set (pc) (if_then_else
- (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
-   (const_int 1)
-   (match_operand 1
- "aarch64_simd_shift_imm_" "n"))
+   (match_operator 0 "aarch64_comparison_operator"
+[(match_operand:ALLI 1 "register_operand")
+ (match_operand:ALLI 2 "aarch64_simd_shift_imm_")])
+   (label_ref (match_operand 3 "" ""))
+   (pc)))]
+  "optimize > 0"
+{
+  rtx bitvalue = gen_reg_rtx (DImode);
+  rtx tmp = simplify_gen_subreg (DImode, operands[1], GET_MODE (operands[1]), 0);
+  emit_insn (gen_extzv (bitvalue, tmp, const1_rtx, operands[2]));
+  operands[2] = const0_rtx;
+  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), bitvalue,
+operands[2]);
+})
+
+(define_insn "*tb1"
+  [(set (pc) (if_then_else
+ (EQL (zero_extract:GPI (match_operand:ALLI 0 "register_operand" "r")
+(const_int 1)
+(match_operand 1
+  "aarch64_simd_shift_imm_" "n"))
   (const_int 0))
 (label_ref (match_operand 2 "" ""))
 (pc)))
@@ -959,15 +976,15 @@ (define_insn "*tb1"
   {
if (get_attr_far_branch (insn) == 1)
  return aarch64_gen_far_branch (operands, 2, "Ltb",
-"\\t%0, %1, ");
+"\\t%0, %1, ");
else
  {
operands[1] = GEN_INT (HOST_WIDE_INT_1U << UINTVAL (operands[1]));
-   return "tst\t%0, %1\;\t%l2";
+   return "tst\t%0, %1\;\t%l2";
  }
   }
 else
-  return "\t%0, %1, %l2";
+  return "\t%0, %1, %l2";
   }
   [(set_attr "type" "branch")
(set (attr "length")
diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_1.c b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
new file mode 100644
index ..86f5d3e23cf7f1ea6f3596549ce1a0cff6774463
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
@@ -0,0 +1,95 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99  -fno-unwind-tables -fno-asynchronous-unwind-tables" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include 
+
+void h(void);
+
+/*
+** g1:
+** tbnz x[0-9]+, #?0, .L([0-9]+)
+** ret
+** ...
+*/
+void g1(bool x)
+{
+  if (__builtin_expect (x, 0))
+h ();
+}
+
+/*
+** g2:
+** tbz x[0-9]+, #?0, .L([0-9]+)
+** b   h
+** ...
+*/
+void g2(bool x)
+{
+  if (__builtin_expect (x, 1))
+h ();
+}
+
+/*
+** g3_ge:
+** tbnz w[0-9]+, #?31, .L[0-9]+
+** b   h
+** ...
+*/
+void g3_ge(int x)
+{
+  if (__builtin_expect (x >= 0, 1))
+h ();
+}
+
+/*
+** g3_gt:
+** cmp w[0-9]+, 0
+** ble .L[0-9]+
+** b   h
+** ...
+*/
+void g3_gt(int x)
+{
+  if (__builtin_expect (x > 0, 1))
+h ();
+}
+
+/*
+** g3_lt:
+** tbz w[0-9]+, #?31, .L[0-9]+
+** b   h
+** ...
+*/
+void g3_lt(int x)
+{
+  if (__builtin_expect (x < 0, 1))
+h ();
+}
+
+/*
+** g3_le:
+** cmp w[0-9]+, 0
+** bgt .L[0-9]+
+** b   h
+** ...
+*/
+void g3_le(int x)
+{
+  if (__builtin_expect (x <= 0, 1))
+h ();
+}
+
+/*
+** g5:
+** mov w[0-9]+, 65279
+** tst w[0-9]+, w[0-9]+
+** beq .L[0-9]+
+** b   h
+** ...
+*/ 
+void g5(int x)
+{
+  if (__builtin_expect (x & 0xfeff, 1))
+h ();
+}

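The `tbranch` expander in this patch lowers the branch to a one-bit `zero_extract` compared against zero, while the far-branch path of the `tb` insn builds the mask `HOST_WIDE_INT_1U << pos` for `tst`. A small plain-C sketch (hypothetical helper names, not RTL) checks that the two formulations agree, and shows why `x >= 0` on a 32-bit `int` reduces to testing bit 31, as the `g3_ge`/`g3_lt` tests expect, whereas `x > 0` and `x <= 0` still need a `cmp`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* tbz/tbnz form: extract one bit (a width-1 zero_extract) and compare it
   with zero.  */
static bool
bit_set_extract (uint64_t x, unsigned pos)
{
  assert (pos < 64);
  return ((x >> pos) & 1u) != 0;
}

/* Far-branch fallback form: "tst x, #(1 << pos)" followed by b.eq/b.ne.  */
static bool
bit_set_mask (uint64_t x, unsigned pos)
{
  assert (pos < 64);
  return (x & (UINT64_C (1) << pos)) != 0;
}

/* For a 32-bit signed value, x >= 0 is exactly "bit 31 clear".  */
static bool
ge_zero_via_sign_bit (int32_t x)
{
  return !bit_set_extract ((uint32_t) x, 31);
}
```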



Re: [PATCH] libstdc++: Fix python/ not making install directories

2022-11-14 Thread Arsen Arsenović via Gcc-patches

Jonathan Wakely  writes:
>> It's the first thing the recipe does:
>>
>> install-data-local: gdb.py
>> @$(mkdir_p) $(DESTDIR)$(toolexeclibdir)
>>
>> That's why I'm suggesting to do the same thing for the debug dir.
>
> This presumably means it has the problems that mkinstalldirs is
> supposed to solve, but is that only relevant for Solaris 8, i.e. not
> relevant?

Ah, sorry.  I didn't notice that it did that at the top.  Yes, mkdir -p
is at least as good then.

Thanks,
-- 
Arsen Arsenović




Re: [PATCH v2 1/2] RISC-V: Add basic support for the Ventana-VT1 core

2022-11-14 Thread Philipp Tomsich
On Mon, 14 Nov 2022 at 16:52, Jeff Law  wrote:
>
>
> On 11/13/22 13:48, Philipp Tomsich wrote:
> > The Ventana-VT1 core is compatible with rv64gc, Zb[abcs], Zifencei and
> > XVentanaCondOps.
> > This introduces a placeholder -mcpu=ventana-vt1, so tooling and
> > scripts don't need to change once full support (pipeline, tuning,
> > etc.) becomes public later.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-cores.def (RISCV_TUNE): Add ventana-vt1.
> >   (RISCV_CORE): Ditto.
> >   * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Ditto.
> >   * config/riscv/riscv.cc: Add tune_info for ventana-vt1.
> >   * config/riscv/riscv.md: Add ventana-vt1.
> >   * doc/gcc/gcc-command-options/machine-dependent-options/risc-v-options.rst:
> >   Document -mcpu= and -mtune with ventana-vt1.
>
> OK.
>
>
> WRT the scheduler description.  I have one, but I think it's on the
> server at the vacation house which went offline a couple weeks ago and
> due to health reasons I haven't been up there to reset the internet
> connection.  Worst case I can just rebuild it from scratch, it's not
> that complex.

This series is pointing 'ventana-vt1' back to 'generic', so we could
also add the pipeline description later in the release cycle...

Philipp.


RE: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations

2022-11-14 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Saturday, November 5, 2022 2:23 PM
> To: Aldy Hernandez 
> Cc: Tamar Christina ; Jeff Law
> ; gcc-patches@gcc.gnu.org; nd ;
> MacLeod, Andrew 
> Subject: Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support
> for bit-test-and-branch operations
> 
> On Wed, 2 Nov 2022, Aldy Hernandez wrote:
> 
> > On Wed, Nov 2, 2022 at 10:55 AM Tamar Christina wrote:
> > >
> > > Hi Aldy,
> > >
> > > I'm trying to use Ranger to determine if a range of an expression is a
> > > single bit.
> > >
> > > If possible in case of a mask then also the position of the bit that's
> > > being checked by the mask (or the mask itself).
> >
> > Just instantiate a ranger, and ask for the range of an SSA name (or an
> > arbitrary tree expression) at a particular gimple statement (or an
> > edge):
> >
> > gimple_ranger ranger;
> > int_range_max r;
> > if (ranger.range_of_expr (r, , )) {
> >   // do stuff with range "r"
> >   if (r.singleton_p ()) {
> > wide_int num = r.lower_bound ();
> > // Check the bits in NUM, etc...
> >   }
> > }
> >
> > You can see the full ranger API in gimple-range.h.
> >
> > Note that instantiating a new ranger is relatively lightweight, but
> > it's not free.  So unless you're calling range_of_expr sporadically,
> > you probably want to have one instance for your pass.  You can pass
> > the gimple_ranger around your pass.  Another way of doing this
> > is calling enable_ranger () at pass start, and then doing:
> >
> >   get_range_query (cfun)->range_of_expr (r, , );
> >
> > gimple-loop-versioning.cc has an example of using enable_ranger /
> > disable_ranger.
> >
> > I am assuming you are interested in ranges for integers / pointers.
> > Otherwise (floats, etc) you'd have to use "Value_Range" instead of
> > int_range_max.  I can give you examples on that if necessary.
> >
> > Let me know if that helps.

It did!  I ended up going with Richi's suggestion, but the snippet was very
helpful for a different range-based patch I'm prototyping.

Many thanks for the example!

> 
> I think you maybe just want get_nonzero_bits?

Ah, looks like that uses range info as well.  Thanks!
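For reference, the property under discussion (whether an expression has at most one bit that can be nonzero, and which bit that is) can be computed from a plain nonzero-bits mask. The helper below is hypothetical and is not GCC's `get_nonzero_bits` or `tree_zero_one_valued_p`; it only shows the power-of-two test and the bit-position lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Given a mask of bits that may be nonzero, return the position of the
   only bit that can be set, or -1 if no bit or several bits can be set.  */
static int
single_nonzero_bit_pos (uint64_t nonzero_bits)
{
  if (nonzero_bits == 0 || (nonzero_bits & (nonzero_bits - 1)) != 0)
    return -1;

  int pos = 0;
  while ((nonzero_bits >> pos) != 1)
    pos++;
  assert (pos < 64);
  return pos;
}
```

A 0/1-valued expression corresponds to a mask of `1` (bit 0), while a mask such as `0xfeff` (test `g5` in the AArch64 part of this series) has several candidate bits and would not qualify.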

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* dojump.cc (do_jump): Pass along value.
(do_jump_by_parts_greater_rtx): Likewise.
(do_jump_by_parts_zero_rtx): Likewise.
(do_jump_by_parts_equality_rtx): Likewise.
(do_compare_rtx_and_jump): Likewise.
(do_compare_and_jump): Likewise.
* dojump.h (do_compare_rtx_and_jump): New.
* optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
(validate_test_and_branch): New.
(emit_cmp_and_jump_insns): Optionally take a value, and when value is
supplied then check if it's suitable for tbranch.
* optabs.def (tbranch$a4): New.
* doc/md.texi (tbranch@var{mode}4): Document it.
* optabs.h (emit_cmp_and_jump_insns):
* tree.h (tree_zero_one_valued_p): New.

--- inline copy of patch ---
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 34825549ed4e315b07d36dc3d63bae0cc0a3932d..342e8c4c670de251a35689d1805acceb72a8f6bf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6958,6 +6958,13 @@ case, you can and should make operand 1's predicate reject some operators
 in the @samp{cstore@var{mode}4} pattern, or remove the pattern altogether
 from the machine description.
 
+@cindex @code{tbranch@var{mode}4} instruction pattern
+@item @samp{tbranch@var{mode}4}
+Conditional branch instruction combined with a bit test-and-compare
+instruction.  Operand 0 is a comparison operator.  Operand 1 is the
+operand of the comparison.  Operand 2 is the bit position of Operand 1 to test.
+Operand 3 is the @code{code_label} to jump to.
+
 @cindex @code{cbranch@var{mode}4} instruction pattern
 @item @samp{cbranch@var{mode}4}
 Conditional branch instruction combined with a compare instruction.
diff --git a/gcc/dojump.h b/gcc/dojump.h
index e379cceb34bb1765cb575636e4c05b61501fc2cf..d1d79c490c420a805fe48d58740a79c1f25fb839 100644
--- a/gcc/dojump.h
+++ b/gcc/dojump.h
@@ -71,6 +71,10 @@ extern void jumpifnot (tree exp, rtx_code_label *label,
 extern void jumpifnot_1 (enum tree_code, tree, tree, rtx_code_label *,
 profile_probability);
 
+extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int, tree,
+machine_mode, rtx, rtx_code_label *,
+rtx_code_label *, profile_probability);
+
 extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int,
 machine_mode, rtx, rtx_code_label *,
 rtx_code_label *, profile_probability);
diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 2af0cd1aca3b6af13d5d8799094ee93f18022296..190324f36f1a31990f8c49bc8c0f45c23da5c31e 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -619,7 +619,7 @@ do_jump (tree 

Re: [PATCH v2 1/2] RISC-V: Add basic support for the Ventana-VT1 core

2022-11-14 Thread Jeff Law via Gcc-patches



On 11/13/22 13:48, Philipp Tomsich wrote:

The Ventana-VT1 core is compatible with rv64gc, Zb[abcs], Zifencei and
XVentanaCondOps.
This introduces a placeholder -mcpu=ventana-vt1, so tooling and
scripts don't need to change once full support (pipeline, tuning,
etc.) becomes public later.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_TUNE): Add ventana-vt1.
(RISCV_CORE): Ditto.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Ditto.
* config/riscv/riscv.cc: Add tune_info for ventana-vt1.
* config/riscv/riscv.md: Add ventana-vt1.
* doc/gcc/gcc-command-options/machine-dependent-options/risc-v-options.rst:
Document -mcpu= and -mtune with ventana-vt1.


OK.


WRT the scheduler description.  I have one, but I think it's on the 
server at the vacation house which went offline a couple weeks ago and 
due to health reasons I haven't been up there to reset the internet 
connection.  Worst case I can just rebuild it from scratch, it's not 
that complex.


Jeff


RE: [PATCH][GCC] aarch64: Add support for Cortex-X3 CPU.

2022-11-14 Thread Srinath Parvathaneni via Gcc-patches
Hi,

> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Monday, November 14, 2022 2:47 PM
> To: Srinath Parvathaneni ; gcc-patc...@gcc.gnu.org
> Cc: Richard Sandiford 
> Subject: RE: [PATCH][GCC] aarch64: Add support for Cortex-X3 CPU.
> 
> 
> 
> > -Original Message-
> > From: Srinath Parvathaneni 
> > Sent: Friday, November 11, 2022 3:08 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Kyrylo Tkachov
> > 
> > Subject: [PATCH][GCC] aarch64: Add support for Cortex-X3 CPU.
> >
> > Hi,
> >
> > This patch adds support for Cortex-X3 CPU.
> >
> > Bootstrapped on aarch64-none-linux-gnu and found no regressions.
> >
> > Ok for GCC master?
> 
> Ok, but the documentation needs to be rebased as we've moved back to
> .texi.

Thank you Kyrill, I have rebased the patch, updated invoke.texi and committed it.

Regards,
Srinath.

> Thanks,
> Kyrill
> 
> >
> > Regards,
> > Srinath.
> >
> > gcc/ChangeLog:
> >
> > 2022-11-09  Srinath Parvathaneni  
> >
> > * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add
> > Cortex-X3 CPU.
> > * config/aarch64/aarch64-tune.md: Regenerate.
> > * doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst:
> > Document Cortex-X3 CPU.
> >
> >
> > ### Attachment also inlined for ease of reply ###
> >
> >
> > diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> > index 3055da9b268b6b71bc3bd6db721812b387e8dd44..a2062468136bf1c38b941c53868d26dafedda276 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -172,6 +172,8 @@ AARCH64_CORE("cortex-a715",  cortexa715, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG,
> >
> >  AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd48, -1)
> >
> > +AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd4e, -1)
> > +
> >  AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
> >
> >  AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
> > diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
> > index 22ec1be5a4c71b930221d2c4f1e62df57df0cadf..74c4384712b202058a58f1da0ca28adec97a6b9b 100644
> > --- a/gcc/config/aarch64/aarch64-tune.md
> > +++ b/gcc/config/aarch64/aarch64-tune.md
> > @@ -1,5 +1,5 @@
> >  ;; -*- buffer-read-only: t -*-
> >  ;; Generated automatically by gentune.sh from aarch64-cores.def
> >  (define_attr "tune"
> > -	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,neoversen2,demeter,neoversev2"
> > +	"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa710,cortexa715,cortexx2,cortexx3,neoversen2,demeter,neoversev2"
> >  	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> > diff --git a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst b/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
> > index d97515d9e54feaa85a2ead4e9b73f0eb966cb39f..7cc369ef95e510e30873159b8e2130c4f77a57d3 100644
> > --- a/gcc/doc/gcc/gcc-command-options/machine-dependent-options/aarch64-options.rst
> > +++ 

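The trailing numeric columns of the `AARCH64_CORE` entries (`0x41`, `0xd4e` for Cortex-X3) are, as far as we understand, the implementer and part-number values that native CPU detection compares against the `MIDR_EL1` register (as exposed in `/proc/cpuinfo` on Linux). A hedged C sketch of that decoding, using the architectural field positions (implementer in bits [31:24], part number in bits [15:4]); the lookup table and the MIDR values are illustrative only and are not GCC's driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 fields: implementer in bits [31:24], part number in [15:4].  */
static unsigned
midr_implementer (uint64_t midr)
{
  return (unsigned) ((midr >> 24) & 0xff);
}

static unsigned
midr_part_num (uint64_t midr)
{
  return (unsigned) ((midr >> 4) & 0xfff);
}

/* Illustrative subset of the implementer/part columns above.  */
struct core_id
{
  const char *name;
  unsigned implementer;
  unsigned part;
};

static const struct core_id known_cores[] = {
  { "cortex-x2", 0x41, 0xd48 },
  { "cortex-x3", 0x41, 0xd4e },
  { "neoverse-n2", 0x41, 0xd49 },
};

/* Return the table entry matching MIDR, or NULL if none does.  */
static const struct core_id *
lookup_core (uint64_t midr)
{
  for (size_t i = 0; i < sizeof known_cores / sizeof known_cores[0]; i++)
    if (known_cores[i].implementer == midr_implementer (midr)
        && known_cores[i].part == midr_part_num (midr))
      return &known_cores[i];
  return NULL;
}
```

For example, a hypothetical MIDR value of `0x411FD4E0` decodes to implementer `0x41` and part `0xd4e`, and therefore to `cortex-x3`.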
Re: [RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers

2022-11-14 Thread Christoph Müllner
On Mon, Nov 14, 2022 at 2:48 PM Richard Biener wrote:

> On Mon, Nov 14, 2022 at 12:46 PM Christoph Müllner wrote:
> >
> > On Mon, Nov 14, 2022 at 11:10 AM Richard Biener <richard.guent...@gmail.com> wrote:
> >>
> >> On Mon, Nov 14, 2022 at 10:32 AM Christoph Müllner wrote:
> >> >
> >> > On Mon, Nov 14, 2022 at 10:00 AM Richard Biener <richard.guent...@gmail.com> wrote:
> >> >>
> >> >> On Mon, Nov 14, 2022 at 9:13 AM Christoph Müllner wrote:
> >> >> >
> >> >> > On Mon, Nov 14, 2022 at 8:31 AM Richard Biener <richard.guent...@gmail.com> wrote:
> >> >> >>
> >> >> >> On Sun, Nov 13, 2022 at 4:09 PM Christoph Muellner wrote:
> >> >> >> >
> >> >> >> > From: Christoph Müllner 
> >> >> >> >
> >> >> >> > This patch adds a new pass that looks up function pointer assignments,
> >> >> >> > and adds guarded direct calls to the call sites of the function
> >> >> >> > pointers.
> >> >> >> >
> >> >> >> > E.g.: Let's assume an assignment to a function pointer as follows:
> >> >> >> > b->cb = 
> >> >> >> >   Other parts of the program can use the function pointer as follows:
> >> >> >> > b->cb ();
> >> >> >> >   With this pass the invocation will be transformed to:
> >> >> >> > if (b->cb == myfun)
> >> >> >> >   myfun();
> >> >> >> > else
> >> >> >> >   b->cb ();
> >> >> >> >
> >> >> >> > The impact of the dynamic guard is expected to be less than the speedup
> >> >> >> > gained by the optimizations it enables (e.g. inlining or constant propagation).
> >> >> >>
> >> >> >> We have speculative devirtualization doing this very transform, shouldn't you
> >> >> >> improve that instead of inventing another specialized pass?
> >> >> >
> >> >> >
> >> >> > Yes, it can be integrated into ipa-devirt.
> >> >> >
> >> >> > The reason we initially decided to move it into its own file was that C++ devirtualization
> >> >> > and function pointer dereferencing/devirtualization will likely not use the same analysis.
> >> >> > E.g. ODR only applies to C++, and C++ virtual tables are not directly exposed to the user.
> >> >> > So we figured that different things should not be merged together, but reuse
> >> >> > of common code to avoid duplication is mandatory.
> >> >>
> >> >> Btw, in another context the idea came up to build candidates based on available
> >> >> API/ABI (that can be indirectly called).  That would help for example the
> >> >> get_ref calls in refine_subpel in the x264 benchmark.  Maybe what you do is actually
> >> >> the very same thing (but look for explicit address-taking) - I didn't look into whether
> >> >> you prune the list of candidates based on API/ABI.
> >> >
> >> > No, I don't consider API/ABI at all (do you have a pointer so I can get a better understanding of that idea?).
> >>
> >> No, it was just an idea discussed internally.
> >>
> >> > Adding guards for all possible functions with the same API/ABI seems expensive (I might misunderstand the idea).
> >> > My patch adds a maximum of 1 test per call site.
> >> >
> >> > What I do is look at which addresses are assigned to the function pointer.
> >> > If there is more than one assigned function, I drop the function pointer from the list of candidates.
> >>
> >> OK.  If the program is type correct that's probably going to work well
> >> enough.  If there is more than one candidate then you could prune those by
> >> simple API checks, like matching up the number of arguments or void vs.
> >> non-void return type.  More advanced pruning might lose some valid
> >> candidates (API vs. ABI compatibility), but it's only heuristic pruning in
> >> any case.
> >>
> >> It would probably help depending on what exactly "assigned to the
> >> function pointer" means.  If the function pointer is not from directly
> >> visible static storage then matching up assignments and uses is going to
> >> be a difficult IPA problem itself.  So our original idea was for
> >>
> >>  (*fnptr) (args ...);
> >>
> >> look for all possible definitions in the (LTO) unit that match the
> >> call signature and that have their address taken and that possibly could
> >> be pointed to by fnptr and if that's a single one, speculatively
> >> devirtualize that.
> >
> >
> > Understood. That's an interesting idea.
> > Assuming that functions with identical signatures are rare,
> > both approaches should find similar candidates.
> >
> > I wonder why the API/ABI compatibility checks are needed
> > if we only consider functions assigned to a function pointer.
> > I.e. if call-site and callee don't match, wouldn't the indirect call
> > suffer from the same incompatibility?
>
> At least in C land mismatches are not unheard of (working across TUs).
>
> >
> > The patch currently looks at the following properties of the RHS of a function pointer assignment:
> > * rhs = gimple_assign_rhs1 (stmt)
> > * rhs_t = TREE_TYPE 

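The transformation described in the RFC above (a guarded direct call with the original indirect call kept as a fallback) can be written out by hand in C. This is only a sketch of the resulting code shape, with `struct obj`, `myfun` and `otherfun` as hypothetical stand-ins; the proposed pass itself operates on GIMPLE.

```c
#include <assert.h>
#include <stddef.h>

struct obj
{
  int (*cb) (int);
};

static int myfun (int x) { return x + 1; }
static int otherfun (int x) { return x * 2; }

/* Original code: an opaque indirect call.  */
static int
call_indirect (const struct obj *b, int arg)
{
  return b->cb (arg);
}

/* After the transformation: a guarded direct call that the compiler can
   inline or constant-propagate into, with the indirect call as fallback.  */
static int
call_guarded (const struct obj *b, int arg)
{
  assert (b != NULL && b->cb != NULL);
  if (b->cb == myfun)
    return myfun (arg);   /* Direct call: candidate for inlining.  */
  return b->cb (arg);     /* Fallback keeps the original semantics.  */
}
```

Whatever `b->cb` points to, both versions return the same result; the guard only adds a pointer comparison, which is the cost the RFC expects to be outweighed by the optimizations enabled on the direct call.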
[PATCH 2/2] IBM zSystems: Save argument registers to the stack -mpreserve-args

2022-11-14 Thread Andreas Krebbel via Gcc-patches
This adds support for preserving the contents of parameter registers on
the stack and emitting CFI for them.  This is useful for applications
which want to implement their own stack unwinding and need access to
function arguments.

With the -mpreserve-args option, GPRs and FPRs are saved to the stack
slots which are reserved for stdargs in the register save area.

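A minimal C model of the range check that the patch adds as `s390_preserve_gpr_arg_in_range_p`: with `-mpreserve-args`, a register range needs saving if it overlaps the argument GPRs starting at r2. The constants are assumptions for a 64-bit target (arguments in r2-r6, so five argument GPRs), and the function takes plain integers instead of the `crtl`/`cfun` state the real code reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: GPR arguments live in r2-r6.  */
#define TOY_GPR2_REGNUM 2
#define TOY_GP_ARG_NUM_REG 5

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Does [FIRST, LAST] overlap the GPRs holding incoming arguments?  The
   number of argument GPRs is the named GPR arguments plus the va_list GPR
   size, capped at the number of argument registers, as in the patch.  */
static bool
preserve_gpr_arg_in_range_p (bool preserve_args, int named_gprs,
                             int va_list_gprs, int first, int last)
{
  assert (first <= last);
  int num_arg_regs = min_int (named_gprs + va_list_gprs, TOY_GP_ARG_NUM_REG);
  return (num_arg_regs
          && preserve_args
          && first <= TOY_GPR2_REGNUM + num_arg_regs - 1
          && last >= TOY_GPR2_REGNUM);
}
```

For instance, with two named GPR arguments only r2 and r3 are involved, so a save range of r4-r6 does not trigger preservation.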
gcc/ChangeLog:

* config/s390/s390.cc (s390_restore_gpr_p): New function.
(s390_preserve_gpr_arg_in_range_p): New function.
(s390_preserve_gpr_arg_p): New function.
(s390_preserve_fpr_args_p): New function.
(s390_preserve_fpr_arg_p): New function.
(s390_register_info_stdarg_fpr): Rename to ...
(s390_register_info_arg_fpr): ... this. Add -mpreserve-args handling.
(s390_register_info_stdarg_gpr): Rename to ...
(s390_register_info_arg_gpr): ... this. Add -mpreserve-args handling.
(s390_register_info): Use the renamed functions above.
(s390_optimize_register_info): Likewise.
(save_fpr): Generate CFI for -mpreserve-args.
(save_gprs): Generate CFI for -mpreserve-args. Drop return value.
(s390_emit_prologue): Adjust to changed calling convention of save_gprs.
(s390_optimize_prologue): Likewise.
* config/s390/s390.opt: New option -mpreserve-args.

gcc/testsuite/ChangeLog:

* gcc.target/s390/preserve-args-1.c: New test.
* gcc.target/s390/preserve-args-2.c: New test.
---
 gcc/config/s390/s390.cc   | 263 +-
 gcc/config/s390/s390.opt  |   4 +
 .../gcc.target/s390/preserve-args-1.c |  17 ++
 .../gcc.target/s390/preserve-args-2.c |  19 ++
 4 files changed, 229 insertions(+), 74 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/preserve-args-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/preserve-args-2.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index f5c75395cf3..5e197b5314b 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -411,6 +411,53 @@ struct s390_address
 #define FP_ARG_NUM_REG (TARGET_64BIT? 4 : 2)
 #define VEC_ARG_NUM_REG 8
 
+/* Return TRUE if GPR REGNO is supposed to be restored in the function
+   epilogue.  */
+static inline bool
+s390_restore_gpr_p (int regno)
+{
+  return (cfun_frame_layout.first_restore_gpr != -1
+ && regno >= cfun_frame_layout.first_restore_gpr
+ && regno <= cfun_frame_layout.last_restore_gpr);
+}
+
+/* Return TRUE if any of the registers in range [FIRST, LAST] is saved
+   because of -mpreserve-args.  */
+static inline bool
+s390_preserve_gpr_arg_in_range_p (int first, int last)
+{
+  int num_arg_regs = MIN (crtl->args.info.gprs + cfun->va_list_gpr_size,
+ GP_ARG_NUM_REG);
+  return (num_arg_regs
+ && s390_preserve_args_p
+ && first <= GPR2_REGNUM + num_arg_regs - 1
+ && last >= GPR2_REGNUM);
+}
+
+static inline bool
+s390_preserve_gpr_arg_p (int regno)
+{
+  return s390_preserve_gpr_arg_in_range_p (regno, regno);
+}
+
+/* Return TRUE if FPR arguments need to be saved onto the stack due to -mpreserve-args.  */
+static inline bool
+s390_preserve_fpr_args_p (void)
+{
+  return (s390_preserve_args_p
+ && (crtl->args.info.fprs + cfun->va_list_fpr_size));
+}
+
+static inline bool
+s390_preserve_fpr_arg_p (int regno)
+{
+  int num_arg_regs = MIN (crtl->args.info.fprs + cfun->va_list_fpr_size,
+ FP_ARG_NUM_REG);
+  return (s390_preserve_args_p
+ && regno <= FPR0_REGNUM + num_arg_regs - 1
+ && regno >= FPR0_REGNUM);
+}
+
 /* A couple of shortcuts.  */
 #define CONST_OK_FOR_J(x) \
CONST_OK_FOR_CONSTRAINT_P((x), 'J', "J")
@@ -9893,61 +9940,90 @@ s390_register_info_gprtofpr ()
 }
 
 /* Set the bits in fpr_bitmap for FPRs which need to be saved due to
-   stdarg.
+   stdarg or -mpreserve-args.
This is a helper routine for s390_register_info.  */
-
 static void
-s390_register_info_stdarg_fpr ()
+s390_register_info_arg_fpr ()
 {
   int i;
-  int min_fpr;
-  int max_fpr;
+  int min_stdarg_fpr = INT_MAX, max_stdarg_fpr = -1;
+  int min_preserve_fpr = INT_MAX, max_preserve_fpr = -1;
+  int min_fpr, max_fpr;
 
   /* Save the FP argument regs for stdarg. f0, f2 for 31 bit and
  f0-f4 for 64 bit.  */
-  if (!cfun->stdarg
-  || !TARGET_HARD_FLOAT
-  || !cfun->va_list_fpr_size
-  || crtl->args.info.fprs >= FP_ARG_NUM_REG)
-return;
+  if (cfun->stdarg
+  && TARGET_HARD_FLOAT
+  && cfun->va_list_fpr_size
+  && crtl->args.info.fprs < FP_ARG_NUM_REG)
+{
+  min_stdarg_fpr = crtl->args.info.fprs;
+  max_stdarg_fpr = min_stdarg_fpr + cfun->va_list_fpr_size - 1;
+  if (max_stdarg_fpr >= FP_ARG_NUM_REG)
+   max_stdarg_fpr = FP_ARG_NUM_REG - 1;
+
+  /* FPR argument regs start at f0.  */
+  min_stdarg_fpr += FPR0_REGNUM;
+  max_stdarg_fpr += FPR0_REGNUM;
+}
+
+  if (s390_preserve_fpr_args_p ())
+{
+  
