[C++ PATCH take #2] PR c++/96442: Improved error recovery in enumerations.

2022-06-05 Thread Roger Sayle

Hi Jason,
My apologies for the long delay, but I've finally got around to
implementing your suggested improvements (implied by your review):
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591504.html
of my patch for PR c++/96442:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590716.html

Your "How does that happen?" question was insightful and led to a cleaner
solution: setting ENUM_UNDERLYING_TYPE to integer_type_node when
issuing the error, so that this invariant holds during the parser's
error recovery.  I've also moved the new testcase to the g++.dg/parse
subdirectory, as per your feedback on my previous ICE-on-invalid fixes.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new (unexpected) failures.  Ok for mainline?


2022-06-05  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96442
* decl.cc (start_enum): When emitting a "must be integral" error,
set ENUM_UNDERLYING_TYPE to integer_type_node, to avoid an ICE
downstream in build_enumerator.

gcc/testsuite/ChangeLog
PR c++/96442
* g++.dg/parse/pr96442.C: New test case.

Thanks again,
Roger
--

> -Original Message-
> From: Jason Merrill 
> Sent: 10 March 2022 05:06
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Subject: Re: [C++ PATCH] PR c++/96442: Another improved error recovery in
> enumerations.
> 
> On 2/22/22 08:02, Roger Sayle wrote:
> >
> > This patch resolves PR c++/96442, another ICE-after-error regression.
> > In this case, invalid code attempts to use a non-integral type as the
> > underlying type for an enumeration (a record_type in the example given
> > in the bugzilla PR), for which the parser emits an error message but
> > allows the inappropriate type to leak to downstream code.
> 
> How does that happen?
> 
> Would it help to change dependent_type_p in start_enum to
> WILDCARD_TYPE_P?
> 
> > The minimal
> > safe fix is to double check that the enumeration's underlying type
> > EUTYPE satisfies INTEGRAL_TYPE_P before calling int_fits_type_p in
> > build_enumerator.  This is a one line fix, but correcting indentation
> > and storing a common subexpression in a variable makes the change look
> > a little bigger.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new (unexpected) failures.  Ok for mainline?
> >
> >
> > 2022-02-22  Roger Sayle  
> >
> > gcc/cp/ChangeLog
> > PR c++/96442
> > * decl.cc (build_enumerator): Check ENUM_UNDERLYING_TYPE is
> > INTEGRAL_TYPE_P before calling int_fits_type_p.
> >
> > gcc/testsuite/ChangeLog
> > PR c++/96442
> > * g++.dg/pr96442.C: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index e0d397d..ca735d3 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -16306,8 +16306,11 @@ start_enum (tree name, tree enumtype, tree underlying_type,
   else if (dependent_type_p (underlying_type))
ENUM_UNDERLYING_TYPE (enumtype) = underlying_type;
   else
-error ("underlying type %qT of %qT must be an integral type", 
-   underlying_type, enumtype);
+   {
+ error ("underlying type %qT of %qT must be an integral type", 
+underlying_type, enumtype);
+ ENUM_UNDERLYING_TYPE (enumtype) = integer_type_node;
+   }
 }
 
   /* If into a template class, the returned enum is always the first
diff --git a/gcc/testsuite/g++.dg/parse/pr96442.C b/gcc/testsuite/g++.dg/parse/pr96442.C
new file mode 100644
index 000..235bb11
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/pr96442.C
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+enum struct a : struct {};
+template  enum class a : class c{};
+enum struct a {b};
+// { dg-excess-errors "" }


Re: [1/2] PR96463 - aarch64 specific changes

2022-06-05 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 1 Jun 2022 at 14:12, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Thu, 12 May 2022 at 16:15, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > On Wed, 11 May 2022 at 12:44, Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> Prathamesh Kulkarni  writes:
> >> >> > On Fri, 6 May 2022 at 16:00, Richard Sandiford
> >> >> >  wrote:
> >> >> >>
> >> >> >> Prathamesh Kulkarni  writes:
> >> >> >> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> >> >> > index c24c0548724..1ef4ea2087b 100644
> >> >> >> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> >> >> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> >> >> > @@ -44,6 +44,14 @@
> >> >> >> >  #include "aarch64-sve-builtins-shapes.h"
> >> >> >> >  #include "aarch64-sve-builtins-base.h"
> >> >> >> >  #include "aarch64-sve-builtins-functions.h"
> >> >> >> > +#include "aarch64-builtins.h"
> >> >> >> > +#include "gimple-ssa.h"
> >> >> >> > +#include "tree-phinodes.h"
> >> >> >> > +#include "tree-ssa-operands.h"
> >> >> >> > +#include "ssa-iterators.h"
> >> >> >> > +#include "stringpool.h"
> >> >> >> > +#include "value-range.h"
> >> >> >> > +#include "tree-ssanames.h"
> >> >> >>
> >> >> >> Minor, but: I think the preferred approach is to include "ssa.h"
> >> >> >> rather than include some of these headers directly.
> >> >> >>
> >> >> >> >
> >> >> >> >  using namespace aarch64_sve;
> >> >> >> >
> >> >> >> > @@ -1207,6 +1215,56 @@ public:
> >> >> >> >  insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
> >> >> >> >  return e.use_contiguous_load_insn (icode);
> >> >> >> >}
> >> >> >> > +
> >> >> >> > +  gimple *
> >> >> >> > +  fold (gimple_folder &f) const OVERRIDE
> >> >> >> > +  {
> >> >> >> > +tree arg0 = gimple_call_arg (f.call, 0);
> >> >> >> > +tree arg1 = gimple_call_arg (f.call, 1);
> >> >> >> > +
> >> >> >> > +/* Transform:
> >> >> >> > +   lhs = svld1rq ({-1, -1, ... }, arg1)
> >> >> >> > +   into:
> >> >> >> > +   tmp = mem_ref [(int * {ref-all}) arg1]
> >> >> >> > +   lhs = vec_perm_expr.
> >> >> >> > +   on little endian target.  */
> >> >> >> > +
> >> >> >> > +if (!BYTES_BIG_ENDIAN
> >> >> >> > + && integer_all_onesp (arg0))
> >> >> >> > +  {
> >> >> >> > + tree lhs = gimple_call_lhs (f.call);
> >> >> >> > + auto simd_type = aarch64_get_simd_info_for_type (Int32x4_t);
> >> >> >>
> >> >> >> Does this work for other element sizes?  I would have expected it
> >> >> >> to be the (128-bit) Advanced SIMD vector associated with the same
> >> >> >> element type as the SVE vector.
> >> >> >>
> >> >> >> The testcase should cover more than just int32x4_t -> svint32_t,
> >> >> >> just to be sure.
> >> >> > In the attached patch, it obtains corresponding advsimd type with:
> >> >> >
> >> >> > tree eltype = TREE_TYPE (lhs_type);
> >> >> > unsigned nunits = 128 / TREE_INT_CST_LOW (TYPE_SIZE (eltype));
> >> >> > tree vectype = build_vector_type (eltype, nunits);
> >> >> >
> >> >> > While this seems to work with different element sizes, I am not
> >> >> > sure if it's the correct approach?
> >> >>
> >> >> Yeah, that looks correct.  Other SVE code uses aarch64_vq_mode
> >> >> to get the vector mode associated with a .Q “element”, so an
> >> >> alternative would be:
> >> >>
> >> >> machine_mode vq_mode = aarch64_vq_mode (TYPE_MODE (eltype)).require ();
> >> >> tree vectype = build_vector_type_for_mode (eltype, vq_mode);
> >> >>
> >> >> which is more explicit about wanting an Advanced SIMD vector.
> >> >>
> >> >> >> > +
> >> >> >> > + tree elt_ptr_type
> >> >> >> > +   = build_pointer_type_for_mode (simd_type.eltype, VOIDmode, true);
> >> >> >> > + tree zero = build_zero_cst (elt_ptr_type);
> >> >> >> > +
> >> >> >> > + /* Use element type alignment.  */
> >> >> >> > + tree access_type
> >> >> >> > +   = build_aligned_type (simd_type.itype, TYPE_ALIGN (simd_type.eltype));
> >> >> >> > +
> >> >> >> > + tree tmp = make_ssa_name_fn (cfun, access_type, 0);
> >> >> >> > + gimple *mem_ref_stmt
> >> >> >> > +   = gimple_build_assign (tmp, fold_build2 (MEM_REF, access_type, arg1, zero));
> >> >> >>
> >> >> >> Long line.  Might be easier to format by assigning the
> >> >> >> fold_build2 result to a temporary variable.
> >> >> >>
> >> >> >> > + gsi_insert_before (f.gsi, mem_ref_stmt, GSI_SAME_STMT);
> >> >> >> > +
> >> >> >> > + tree mem_ref_lhs = gimple_get_lhs (mem_ref_stmt);
> >> >> >> > + tree vectype = TREE_TYPE (mem_ref_lhs);
> >> >> >> > + tree lhs_type = TREE_TYPE (lhs);
> >> >> >>
> >> >> >> Is this necessary?  The code above supplied the types and I wouldn't
> >> >> >> have expected them to change during the build process.
> >> >> >>
> >> >> >> > +
> >> >> >> > + int source_nelts = TYPE_VECTOR_S

[PATCH take #2] Fold truncations of left shifts in match.pd

2022-06-05 Thread Roger Sayle

Hi Richard,
Many thanks for taking the time to explain how vectorization is supposed
to work.  I now see that vect_recog_rotate_pattern in tree-vect-patterns.cc
is supposed to handle lowering of rotations to (vector) shifts, and
completely agree that adding support for signed types (using appropriate
casts to unsigned_type_for and casting the result back to the original
signed type) is a better approach to avoid the regression of pr98674.c.

I've also implemented your suggestion of combining the proposed new
(convert (lshift @1 INTEGER_CST@2)) transformation with the existing
one, at the same time adding support for simplifying valid shifts wider
than the narrower type, such as (short)(x << 20), to constant zero.
Although this optimization is already performed during the tree-ssa
passes, it's convenient to also catch it here during constant folding.

This revised patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32}, with no new failures.  Ok for mainline?

2022-06-05  Roger Sayle  
Richard Biener  

gcc/ChangeLog
* match.pd (convert (lshift @1 INTEGER_CST@2)): Narrow integer
left shifts by a constant when the result is truncated, and the
shift constant is well-defined.
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Add
support for rotations of signed integer types, by lowering
using unsigned vector shifts.

gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-4.c: New test case.
* gcc.dg/optimize-bswaphi-1.c: Update found bswap count.
* gcc.dg/tree-ssa/pr61839_3.c: Shift is now optimized before VRP.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Remove obsolete tests.
* gcc.dg/vect/vect-over-widen-1.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.


Thanks again,
Roger
--

> -Original Message-
> From: Richard Biener 
> Sent: 02 June 2022 12:03
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [PATCH] Fold truncations of left shifts in match.pd
> 
> On Thu, Jun 2, 2022 at 12:55 PM Roger Sayle 
> wrote:
> >
> >
> > Hi Richard,
> > > +  /* RTL expansion knows how to expand rotates using shift/or.  */
> > > + if (icode == CODE_FOR_nothing
> > > +  && (code == LROTATE_EXPR || code == RROTATE_EXPR)
> > > +  && optab_handler (ior_optab, vec_mode) != CODE_FOR_nothing
> > > +  && optab_handler (ashl_optab, vec_mode) != CODE_FOR_nothing)
> > > +icode = (int) optab_handler (lshr_optab, vec_mode);
> > >
> > > but we then get the vector costing wrong.
> >
> > The issue is that we currently get the (relative) vector costing wrong.
> > Currently for gcc.dg/vect/pr98674.c, the vectorizer thinks the scalar
> > code requires two shifts and an ior, so believes its profitable to
> > vectorize this loop using two vector shifts and an vector ior.  But
> > once match.pd simplifies the truncate and recognizes the HImode rotate we
> end up with:
> >
> > pr98674.c:6:16: note:   ==> examining statement: _6 = _1 r>> 8;
> > pr98674.c:6:16: note:   vect_is_simple_use: vectype vector(8) short int
> > pr98674.c:6:16: note:   vect_is_simple_use: operand 8, type of def: constant
> > pr98674.c:6:16: missed:   op not supported by target.
> > pr98674.c:8:33: missed:   not vectorized: relevant stmt not supported: _6 = _1 r>> 8;
> > pr98674.c:6:16: missed:  bad operation or unsupported loop bound.
> >
> >
> > Clearly, it's a win to vectorize HImode rotates, when the backend can
> > perform
> > 8 (or 16) rotations at a time, but using 3 vector instructions, even
> > when a scalar rotate can performed in a single instruction.
> > Fundamentally, vectorization may still be desirable/profitable even when the
> backend doesn't provide an optab.
> 
> Yes, as said it's tree-vect-patterns.cc job to handle this not natively
> supported rotate by re-writing it.  Can you check why
> vect_recog_rotate_pattern does not
> do this?  Ah, the code only handles !TYPE_UNSIGNED (type) - not sure why
> though (for rotates it should not matter and for the lowered sequence we can
> convert to desired signedness to get arithmetic/logical shifts)?
> 
> > The current situation where the i386's backend provides expanders to
> > lower rotations (or vcond) into individual instruction sequences, also
> > interferes with vector costing.  It's the vector cost function that
> > needs to be fixed, not the generated code made worse (or the backend
> > bloated performing its own RTL expansion workarounds).
> >
> > Is it instead ok to mark pr98674.c as XFAIL (a regression)?
> > The tweak to tree-vect-stmts.cc was based on the assumption that we
> > wished to continue vectorizing this loop.  Improving scalar code
> > generation really shouldn't disable vectorization like this.
> 
>

RE: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations.

2022-06-05 Thread Roger Sayle

Hi Uros,
Many thanks for your speedy review.  This revised patch implements
all three of your recommended improvements; the use of
ix86_binary_operator_ok with code UNKNOWN, the removal of
"n" constraints from const_int_operand predicates, and the use
of force_reg (for input operands, and REG_P for output operands)
to ensure that it's always safe to call gen_lowpart/gen_highpart.

[If we proceed with the recent proposal to split double word 
addition, subtraction and other operations before reload, then
these new add/sub variants should be updated at the same time,
but for now this patch keeps double word patterns consistent].
 
This revised patch has been retested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without 
--target_board=unix{-m32} with no new failures.  Ok for mainline?


2022-06-05  Roger Sayle  
Uroš Bizjak  

gcc/ChangeLog
PR target/91681
* config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
(*add3_doubleword_zext): New define_insn_and_split.
(*sub3_doubleword_zext): New define_insn_and_split.
(*concat3_1): New define_insn_and_split replacing
previous define_split for implementing DST = (HI<<32)|LO as
pair of move instructions, setting lopart and hipart.
(*concat3_2): Likewise.
(*concat3_3): Likewise, where HI is zero_extended.
(*concat3_4): Likewise, where HI is zero_extended.
* config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
(kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
(kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
(vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP unspec.
(vec_pack_trunc_): Likewise.

gcc/testsuite/ChangeLog
PR target/91681
* g++.target/i386/pr91681.C: New test case (from the PR).
* gcc.target/i386/pr91681-1.c: New int128 test case.
* gcc.target/i386/pr91681-2.c: Likewise.
* gcc.target/i386/pr91681-3.c: Likewise, but for ia32.


Thanks again,
Roger
--

> -Original Message-
> From: Uros Bizjak 
> Sent: 03 June 2022 11:08
> To: Roger Sayle 
> Cc: GCC Patches 
> Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more
> optimizations.
> 
> On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle 
> wrote:
> >
> >
> > Technically, PR target/91681 has already been resolved; we now
> > recognize the highpart multiplication at the tree-level, we no longer
> > use the stack, and we currently generate the same number of
> > instructions as LLVM.  However, it is still possible to do better, the
> > current x86_64 code to generate a double word addition of a zero extended
> operand, looks like:
> >
> > xorl%r11d, %r11d
> > addq%r10, %rax
> > adcq%r11, %rdx
> >
> > when it's possible (as LLVM does) to use an immediate constant:
> >
> > addq%r10, %rax
> > adcq$0, %rdx
> >
> > To do this, the backend required one or two simple changes, that then
> > themselves required one or two more obscure tweaks.
> >
> > The simple starting point is to define a zero_extendditi2 pattern, for
> > zero extension from DImode to TImode on TARGET_64BIT that is split
> > after reload.  Double word (TImode) addition/subtraction is split
> > after reload, so that constrains when things should happen.
> >
> > With zero extension now visible to combine, we add two new
> > define_insn_and_split that add/subtract a zero extended operand in
> > double word mode.  These apply to both 32-bit and 64-bit code
> > generation, to produce adc $0 and sbb $0.
> >
> > The first strange tweak is that these new patterns interfere with the
> > optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair of
> > register moves, or more accurately the combine splitter no longer
> > triggers as we're now converting two instructions into two
> > instructions (not three instructions into two instructions).  This is
> > easily repaired (and extended to handle TImode) by changing from a
> > pair of define_split (that handle operand commutativity) to a set of
> > four define_insn_and_split (again to handle operand commutativity).
> >
> > The other/final strange tweak that the above splitters now interfere
> > with AVX512's kunpckdq instruction which is defined as identical RTL,
> > DW:DI = (HI:SI<<32)|zero_extend(LO:SI).  To distinguish this, and also
> > avoid AVX512 mask registers being used by reload to perform SImode
> > scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP) to the
> > unpack mask operations, which matches what sse.md does for the other
> > mask specific (logic) operations.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-06-03  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/91681
> > * config/i386/i386.md (zero_

[PATCH] PR tree-optimization/105835: Two narrowing patterns for match.pd.

2022-06-05 Thread Roger Sayle

This patch resolves PR tree-optimization/105835, which is a code quality
(dead code elimination) regression at -O1 triggered/exposed by a recent
change to canonicalize X&-Y as X*Y.  The new (shorter) form exposes some
missed optimization opportunities that can be handled by adding some
extra simplifications to match.pd.

One transformation is to simplify "(short)(x ? 65535 : 0)" into the
equivalent "x ? -1 : 0", or more accurately x ? (short)-1 : (short)0",
as INTEGER_CSTs record their type, and integer conversions can be
pushed inside COND_EXPRs reducing the number of gimple statements.

The other transformation is that (short)(X * 65535), where X is [0,1],
into the equivalent (short)X * -1, (or again (short)-1 where tree's
INTEGER_CSTs encode their type).  This is valid because multiplications
where one operand is [0,1] are guaranteed not to overflow, and hence
integer conversions can also be pushed inside these multiplications.

These narrowing conversion optimizations can be identified by range
analyses, such as EVRP, but these are only performed at -O2 and above,
which is why this regression is only visible with -O1.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2022-06-05  Roger Sayle  

gcc/ChangeLog
* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)):
Narrow integer multiplication by a zero_one_valued_p operand.
(convert (cond @1 INTEGER_CST@2 INTEGER_CST@3)): Push integer
conversions inside COND_EXPR where both data operands are
integer constants.

gcc/testsuite/ChangeLog
* gcc.dg/pr105835.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 2d3ffc4..d705947 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1800,6 +1800,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && !TYPE_UNSIGNED (TREE_TYPE (@0)))
   (mult (convert @0) @1)))
 
+/* Narrow integer multiplication by a zero_one_valued_p operand.
+   Multiplication by [0,1] is guaranteed not to overflow.  */
+(simplify
+ (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
+ (if (INTEGRAL_TYPE_P (type)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0)))
+  (mult (convert @1) (convert @2
+
 /* Convert ~ (-A) to A - 1.  */
 (simplify
  (bit_not (convert? (negate @0)))
@@ -4265,6 +4274,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 )
 #endif
 
+(simplify
+ (convert (cond@0 @1 INTEGER_CST@2 INTEGER_CST@3))
+ (if (INTEGRAL_TYPE_P (type)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+  (cond @1 (convert @2) (convert @3
+
 /* Simplification moved from fold_cond_expr_with_comparison.  It may also
be extended.  */
 /* This pattern implements two kinds simplification:
diff --git a/gcc/testsuite/gcc.dg/pr105835.c b/gcc/testsuite/gcc.dg/pr105835.c
new file mode 100644
index 000..354c81c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr105835.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+void foo();
+
+static int b;
+
+static short a(short c, unsigned short d) { return c - d; }
+
+int main() {
+int e = -(0 < b);
+if (a(1, e))
+b = 0;
+else
+foo();
+}
+
+/* { dg-final { scan-tree-dump-not "goto" "optimized" } } */


[x86 PATCH] Double word implementation of and; cmp to not; test optimization.

2022-06-05 Thread Roger Sayle

This patch extends the recent and;cmp to not;test optimization to also
perform this transformation for TImode on TARGET_64BIT and DImode on -m32.
One motivation for this is that it's a step towards fixing the current
failure of gcc.target/i386/pr65105-5.c on -m32.

A more direct benefit for x86_64 is that the following code:

int foo(__int128 x, __int128 y)
{
  return (x & y) == y;
}

improves (with -O2 -mbmi) from:

movq%rdi, %r8
movq%rsi, %rdi
movq%rdx, %rsi
andq%rcx, %rdi
movq%r8, %rax
andq%rdx, %rax
movq%rdi, %rdx
xorq%rsi, %rax
xorq%rcx, %rdx
orq %rdx, %rax
sete%al
movzbl  %al, %eax
ret

to the much better:

movq%rdi, %r8
movq%rsi, %rdi
andn%rdx, %r8, %rax
andn%rcx, %rdi, %rsi
orq %rsi, %rax
sete%al
movzbl  %al, %eax
ret

The major theme of this patch is to generalize many of i386.md's
*di3_doubleword patterns to become *_doubleword patterns, i.e.
whenever there exists a "double word" optimization for DImode with -m32,
there should be an equivalent TImode optimization on TARGET_64BIT.

The following patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, where on TARGET_64BIT there are
no new failures, but paradoxically with --target_board=unix{-m32}
the other dg-final clause in gcc.target/i386/pr65105-5.c now fails.
Counter-intuitively, this is progress, and pr65105-5.c may now be
fixed (without using peephole2) simply by tweaking the STV pass to
handle andn/test (in a follow-up patch).
OK for mainline?


2022-06-05  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs) : Provide costs
for double word comparisons and tests (comparisons against zero).
* config/i386/i386.md (*test_not_doubleword): Split DWI
and;cmp into andn;cmp $0 as a pre-reload splitter.
(define_expand and3): Generalize from SWIM1248x to SWIDWI.
(define_insn_and_split "*anddi3_doubleword"): Rename/generalize...
(define_insn_and_split "*and3_doubleword"): ... to this.
(define_insn "*andndi3_doubleword"): Rename and generalize...
(define_insn "*andn3_doubleword): ... to this.
(define_split): Split andn when TARGET_BMI for both  modes.
(define_split): Split andn when !TARGET_BMI for both  modes.
(define_expand 3): Generalize from SWIM1248x to
SWIDWI.
(define_insn_and_split "*3_doubleword): Generalize
from DI mode to both  modes.

gcc/testsuite/ChangeLog
* gcc.target/i386/testnot-3.c: New test case.


Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index df5c80d..af11669 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20918,6 +20918,19 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
  return true;
}
 
+  if (SCALAR_INT_MODE_P (GET_MODE (op0))
+ && GET_MODE_SIZE (GET_MODE (op0)) > UNITS_PER_WORD)
+   {
+ if (op1 == const0_rtx)
+   *total = cost->add
++ rtx_cost (op0, GET_MODE (op0), outer_code, opno, speed);
+ else
+   *total = 3*cost->add
++ rtx_cost (op0, GET_MODE (op0), outer_code, opno, speed)
++ rtx_cost (op1, GET_MODE (op0), outer_code, opno, speed);
+ return true;
+   }
+
   /* The embedded comparison operand is completely free.  */
   if (!general_operand (op0, GET_MODE (op0)) && op1 == const0_rtx)
*total = 0;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2b1d65b..502416b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9785,9 +9785,24 @@
(set (reg:CCZ FLAGS_REG)
(compare:CCZ (and:SWI (match_dup 2) (match_dup 1))
 (const_int 0)))]
-{
-  operands[2] = gen_reg_rtx (mode);
-})
+  "operands[2] = gen_reg_rtx (mode);")
+
+;; Split and;cmp (as optimized by combine) into andn;cmp $0
+(define_insn_and_split "*test_not_doubleword"
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ
+ (and:DWI
+   (not:DWI (match_operand:DWI 0 "register_operand"))
+   (match_operand:DWI 1 "nonimmediate_operand"))
+ (const_int 0)))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(parallel
+  [(set (match_dup 2) (and:DWI (not:DWI (match_dup 0)) (match_dup 1)))
+   (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG) (compare:CCZ (match_dup 2) (const_int 0)))]
+  "operands[2] = gen_reg_rtx (mode);")
 
 ;; Convert HImode/SImode test instructions with immediate to QImode ones.
 ;; i386 does not allow to encode test with 8bit sign extended immediate, so
@@ -9846,19 +9861,21 @@
 ;; it should be done with splitters.
 
 (define_expand "and3"
-  [(set (match_operand:SWIM1248x 0 "nonimmediate_operand")
-   (and:SWIM124

Re: [PING] PR middle-end/95126: Expand small const structs as immediate constants

2022-06-05 Thread Andreas Schwab
This breaks Ada on aarch64 in stage3, probably a miscompiled stage2
compiler.  For example:

/opt/gcc/gcc-20220605/Build/./prev-gcc/xgcc 
-B/opt/gcc/gcc-20220605/Build/./prev-gcc/ -B/usr/aarch64-suse-linux/bin/ 
-B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem 
/usr/aarch64-suse-linux/include -isystem /usr/aarch64-suse-linux/sys-include   
-fchecking=1 -c -g -O2 -fchecking=1  -gnatpg -gnata -W -Wall -nostdinc -I- -I. 
-Iada/generated -Iada -I../../gcc/ada -Iada/libgnat -I../../gcc/ada/libgnat 
-Iada/gcc-interface -I../../gcc/ada/gcc-interface ../../gcc/ada/spark_xrefs.adb 
-o ada/spark_xrefs.o
+===GNAT BUG DETECTED==+
| 13.0.0 20220605 (experimental) [master ad6919374be] (aarch64-suse-linux) |
| Assert_Failure failed precondition from sinfo-nodes.ads:5419 |
| Error detected at types.ads:53:28|
| Compiling ../../gcc/ada/spark_xrefs.adb  |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered.  |
| Also include sources listed below.   |
+==+

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [x86 PATCH] Recognize vpcmov in combine with -mxop.

2022-06-05 Thread Uros Bizjak via Gcc-patches
On Sat, Jun 4, 2022 at 1:03 PM Roger Sayle  wrote:
>
>
> By way of an apology for causing PR target/105791, where I'd overlooked
> the need to support V1TImode in TARGET_XOP's vpcmov instruction, this
> patch further improves support for TARGET_XOP's vpcmov instruction, by
> recognizing it in combine.
>
> Currently, the test case:
>
> typedef int v4si __attribute__ ((vector_size (16)));
> v4si foo(v4si c, v4si t, v4si f)
> {
> return (c&t)|(~c&f);
> }
>
> on x86_64 with -O2 -mxop generates:
> vpxor   %xmm2, %xmm1, %xmm1
> vpand   %xmm0, %xmm1, %xmm1
> vpxor   %xmm2, %xmm1, %xmm0
> ret
>
> but with this patch now generates:
> vpcmov  %xmm0, %xmm2, %xmm1, %xmm0
> ret
>
> On its own, the new combine splitter works fine on TARGET_64BIT, but
> alas with -m32 combine incorrectly thinks the replacement instruction
> is more expensive, as IF_THEN_ELSE isn't currently/correctly handled
> in ix86_rtx_costs.  So to avoid the need for a target selector in the
> new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov
> has a latency of two cycles [it's now an obsolete instruction set
> extension and there's unlikely to ever be a processor where this
> instruction has a different timing], and while there I also added
> rtx_costs for x86_64's integer conditional move instructions (which
> have single cycle latency).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-04  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_rtx_costs): Add a new case for
> IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and
> TARGET_CMOVE's (scalar integer) conditional moves.
> * config/i386/sse.md (define_split): Recognize XOP's vpcmov
> from its equivalent (canonical) pxor;pand;pxor sequence.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/xop-pcmov3.c: New test case.

OK with a nit below.

Thanks,
Uros.

+{
+  operands[5] = REGNO (operands[4]) == REGNO (operands[1]) ? operands[2]
+   : operands[1];
+})
+

Please expand this to enhance readability; it is a bit too cryptic for me ...


Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations.

2022-06-05 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 5, 2022 at 1:48 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> Many thanks for your speedy review.  This revised patch implements
> all three of your recommended improvements; the use of
> ix86_binary_operator_ok with code UNKNOWN, the removal of
> "n" constraints from const_int_operand predicates, and the use
> of force_reg (for input operands, and REG_P for output operands)
> to ensure that it's always safe to call gen_lowpart/gen_highpart.
>
> [If we proceed with the recent proposal to split double word
> addition, subtraction and other operations before reload, then
> these new add/sub variants should be updated at the same time,
> but for now this patch keeps double word patterns consistent].
>
> This revised patch has been retested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, both with and without
> --target_board=unix{-m32} with no new failures.  Ok for mainline?

+(define_insn_and_split "*concat3_1"
+  [(set (match_operand: 0 "register_operand" "=r")
+ (any_or_plus:
+  (ashift: (match_operand: 1 "register_operand" "r")
+ (match_operand: 2 "const_int_operand"))
+  (zero_extend: (match_operand:DWIH 3 "register_operand" "r"]
+  "INTVAL (operands[2]) ==  * BITS_PER_UNIT
+   && REG_P (operands[0])
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 5) (match_dup 6))]
+{
+  operands[1] = force_reg (mode, operands[1]);
+  operands[4] = gen_lowpart (mode, operands[0]);
+  operands[5] = gen_highpart (mode, operands[0]);
+  operands[6] = gen_lowpart (mode, operands[1]);
+})

Hm, but in this particular case (and other) you can use
split_double_mode on operands[0], instead of manually splitting REG_P
constrained operands, and it will handle everything correctly. Please
note that split_double_mode has:

split_double_mode (machine_mode mode, rtx operands[],
   int num, rtx lo_half[], rtx hi_half[])

so with some care you can use:

"split_double_mode (<DWI>mode, &operands[0], 1, &operands[4], &operands[5]);"

followed by:

operands[6] = simplify_gen_subreg (<MODE>mode, op, <DWI>mode, 0);

The above line is partially what split_double_mode does.

This is the approach other pre_reload doubleword splitters take, it
looks the safest (otherwise it would break left and right with
existing patterns ...), and the most effective to me.
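[Editorial sketch: putting the two suggested lines together, the splitter's preparation statements might shrink to something like the following. This assumes Uros's `op` stands for `operands[1]` and keeps the operand numbering of the pattern quoted above; it is an illustration of the review comment, not the committed code.]

```
{
  operands[1] = force_reg (<DWI>mode, operands[1]);
  split_double_mode (<DWI>mode, &operands[0], 1, &operands[4], &operands[5]);
  operands[6] = simplify_gen_subreg (<MODE>mode, operands[1], <DWI>mode, 0);
})
```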

Please also get approval for sse.md change from Hongtao, AVX512F stuff
is in a separate playground.

Uros.


>
> 2022-06-05  Roger Sayle  
> Uroš Bizjak  
>
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
> (*add<dwi>3_doubleword_zext): New define_insn_and_split.
> (*sub<dwi>3_doubleword_zext): New define_insn_and_split.
> (*concat<mode><dwi>3_1): New define_insn_and_split replacing
> previous define_split for implementing DST = (HI<<32)|LO as
> pair of move instructions, setting lopart and hipart.
> (*concat<mode><dwi>3_2): Likewise.
> (*concat<mode><dwi>3_3): Likewise, where HI is zero_extended.
> (*concat<mode><dwi>3_4): Likewise, where HI is zero_extended.
> * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
> (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
> (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
> (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP unspec.
> (vec_pack_trunc_<mode>): Likewise.
>
> gcc/testsuite/ChangeLog
> PR target/91681
> * g++.target/i386/pr91681.C: New test case (from the PR).
> * gcc.target/i386/pr91681-1.c: New int128 test case.
> * gcc.target/i386/pr91681-2.c: Likewise.
> * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.
>
>
> Thanks again,
> Roger
> --
>
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: 03 June 2022 11:08
> > To: Roger Sayle 
> > Cc: GCC Patches 
> > Subject: Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more
> > optimizations.
> >
> > On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle 
> > wrote:
> > >
> > >
> > > Technically, PR target/91681 has already been resolved; we now
> > > recognize the highpart multiplication at the tree-level, we no longer
> > > use the stack, and we currently generate the same number of
> > > instructions as LLVM.  However, it is still possible to do better, the
> > > current x86_64 code to generate a double word addition of a zero extended
> > operand, looks like:
> > >
> > > xorl%r11d, %r11d
> > > addq%r10, %rax
> > > adcq%r11, %rdx
> > >
> > > when it's possible (as LLVM does) to use an immediate constant:
> > >
> > > addq%r10, %rax
> > > adcq$0, %rdx
> > >
> > > To do this, the backend required one or two simple changes, that then
> > > themselves required one or two more obscure tweaks.
> > >
> > > The simple starting point is to define a zero_extendditi2 pattern, for
> > > zero extension from DImode to TImode on TARGET_64BIT that is split
> > > after reloa

Re: [x86 PATCH] Double word implementation of and; cmp to not; test optimization.

2022-06-05 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 5, 2022 at 7:19 PM Roger Sayle  wrote:
>
>
> This patch extends the recent and;cmp to not;test optimization to also
> perform this transformation for TImode on TARGET_64BIT and DImode on -m32.
> One motivation for this is that it's a step toward fixing the current
> failure of gcc.target/i386/pr65105-5.c on -m32.
>
> A more direct benefit for x86_64 is that the following code:
>
> int foo(__int128 x, __int128 y)
> {
>   return (x & y) == y;
> }
>
> improves (with -O2 -mbmi) from:
>
> movq%rdi, %r8
> movq%rsi, %rdi
> movq%rdx, %rsi
> andq%rcx, %rdi
> movq%r8, %rax
> andq%rdx, %rax
> movq%rdi, %rdx
> xorq%rsi, %rax
> xorq%rcx, %rdx
> orq %rdx, %rax
> sete%al
> movzbl  %al, %eax
> ret
>
> to the much better:
>
> movq%rdi, %r8
> movq%rsi, %rdi
> andn%rdx, %r8, %rax
> andn%rcx, %rdi, %rsi
> orq %rsi, %rax
> sete%al
> movzbl  %al, %eax
> ret
>
> The major theme of this patch is to generalize many of i386.md's
> *di3_doubleword patterns to become *_doubleword patterns, i.e.
> whenever there exists a "double word" optimization for DImode with -m32,
> there should be an equivalent TImode optimization on TARGET_64BIT.

No, please do not mix two different themes in one patch.

OTOH, the only TImode optimization that can be used with SSE registers
is with logic instructions and some constant shifts, but there is no
TImode arithmetic. I assume your end goal is to introduce STV for
TImode on 64-bit targets, because DImode patterns for x86_32 were
introduced to avoid early decomposition by middle end and to split
instructions that STV didn't convert to vector instructions after STV
pass. So, let's start with basic V1TImode support before optimizations
are introduced.

Uros.

> The following patch has been tested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, where on TARGET_64BIT there are
> no new failures, but paradoxically with --target_board=unix{-m32}
> the other dg-final clause in gcc.target/i386/pr65105-5.c now fails.
> Counter-intuitively, this is progress, and pr65105-5.c may now be
> fixed (without using peephole2) simply by tweaking the STV pass to
> handle andn/test (in a follow-up patch).
> OK for mainline?
>
>
> 2022-06-05  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_rtx_costs) <case COMPARE>: Provide costs
> for double word comparisons and tests (comparisons against zero).
> * config/i386/i386.md (*test<mode>_not_doubleword): Split DWI
> and;cmp into andn;cmp $0 as a pre-reload splitter.
> (define_expand and<mode>3): Generalize from SWIM1248x to SWIDWI.
> (define_insn_and_split "*anddi3_doubleword"): Rename/generalize...
> (define_insn_and_split "*and<mode>3_doubleword"): ... to this.
> (define_insn "*andndi3_doubleword"): Rename and generalize...
> (define_insn "*andn<mode>3_doubleword"): ... to this.
> (define_split): Split andn when TARGET_BMI for both <DWI> modes.
> (define_split): Split andn when !TARGET_BMI for both <DWI> modes.
> (define_expand <code><mode>3): Generalize from SWIM1248x to
> SWIDWI.
> (define_insn_and_split "*<code><mode>3_doubleword"): Generalize
> from DI mode to both <DWI> modes.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/testnot-3.c: New test case.
>
>
> Thanks again,
> Roger
> --
>


Re: [PATCH] x86: harmonize __builtin_ia32_psadbw*() types

2022-06-05 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 2, 2022 at 5:04 PM Jan Beulich  wrote:
>
> The 64-bit, 128-bit, and 512-bit variants have VDI return type, in
> line with instruction behavior. Make the 256-bit builtin match, thus
> also making it match the insn it expands to (using VI8_AVX2_AVX512BW).
>
> gcc/
>
> * config/i386/i386-builtin.def (__builtin_ia32_psadbw256):
> Change type.
> * config/i386/i386-builtin-types.def: New function type
> (V4DI, V32QI, V32QI).
> * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
> V4DI_FTYPE_V32QI_V32QI.

LGTM, but please let HJ have the final approval.

Uros.

>
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -1217,7 +1217,7 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_mulv8si3, 
> "__builtin_ia32_pmulld256"  , IX86_BUILTIN_PMULLD256  , UNKNOWN, (int) 
> V8SI_FTYPE_V8SI_V8SI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_vec_widen_umult_even_v8si, 
> "__builtin_ia32_pmuludq256", IX86_BUILTIN_PMULUDQ256, UNKNOWN, (int) 
> V4DI_FTYPE_V8SI_V8SI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_iorv4di3, "__builtin_ia32_por256", 
> IX86_BUILTIN_POR256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI)
> -BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_psadbw, 
> "__builtin_ia32_psadbw256", IX86_BUILTIN_PSADBW256, UNKNOWN, (int) 
> V16HI_FTYPE_V32QI_V32QI)
> +BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_psadbw, 
> "__builtin_ia32_psadbw256", IX86_BUILTIN_PSADBW256, UNKNOWN, (int) 
> V4DI_FTYPE_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufbv32qi3, 
> "__builtin_ia32_pshufb256", IX86_BUILTIN_PSHUFB256, UNKNOWN, (int) 
> V32QI_FTYPE_V32QI_V32QI)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufdv3, 
> "__builtin_ia32_pshufd256", IX86_BUILTIN_PSHUFD256, UNKNOWN, (int) 
> V8SI_FTYPE_V8SI_INT)
>  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufhwv3, 
> "__builtin_ia32_pshufhw256", IX86_BUILTIN_PSHUFHW256, UNKNOWN, (int) 
> V16HI_FTYPE_V16HI_INT)
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -516,6 +516,7 @@ DEF_FUNCTION_TYPE (V8DI, V8DI, V2DI, INT
>  DEF_FUNCTION_TYPE (V8DI, V8DI, V2DI, INT, V8DI, UQI)
>  DEF_FUNCTION_TYPE (V8DI, V8DI, V4DI, INT, V8DI, UQI)
>  DEF_FUNCTION_TYPE (V4DI, V8SI, V8SI)
> +DEF_FUNCTION_TYPE (V4DI, V32QI, V32QI)
>  DEF_FUNCTION_TYPE (V8DI, V64QI, V64QI)
>  DEF_FUNCTION_TYPE (V4DI, V4DI, V2DI)
>  DEF_FUNCTION_TYPE (V4DI, PCV4DI, V4DI)
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -10359,6 +10359,7 @@ ix86_expand_args_builtin (const struct b
>  case V8SI_FTYPE_V16HI_V16HI:
>  case V4DI_FTYPE_V4DI_V4DI:
>  case V4DI_FTYPE_V8SI_V8SI:
> +case V4DI_FTYPE_V32QI_V32QI:
>  case V8DI_FTYPE_V64QI_V64QI:
>if (comparison == UNKNOWN)
> return ix86_expand_binop_builtin (icode, exp, target);
>


Re: [PING] PR middle-end/95126: Expand small const structs as immediate constants

2022-06-05 Thread Rainer Orth
Andreas Schwab  writes:

> This breaks Ada on aarch64 in stage3, probably a miscompiled stage2
> compiler.  For example:
>
> /opt/gcc/gcc-20220605/Build/./prev-gcc/xgcc
> -B/opt/gcc/gcc-20220605/Build/./prev-gcc/ -B/usr/aarch64-suse-linux/bin/
> -B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem
> /usr/aarch64-suse-linux/include -isystem
> /usr/aarch64-suse-linux/sys-include -fchecking=1 -c -g -O2 -fchecking=1
> -gnatpg -gnata -W -Wall -nostdinc -I- -I. -Iada/generated -Iada
> -I../../gcc/ada -Iada/libgnat -I../../gcc/ada/libgnat -Iada/gcc-interface
> -I../../gcc/ada/gcc-interface ../../gcc/ada/spark_xrefs.adb -o
> ada/spark_xrefs.o
> +===GNAT BUG DETECTED==+
> | 13.0.0 20220605 (experimental) [master ad6919374be] (aarch64-suse-linux) |
> | Assert_Failure failed precondition from sinfo-nodes.ads:5419 |
> | Error detected at types.ads:53:28|
> | Compiling ../../gcc/ada/spark_xrefs.adb  |
> | Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
> | Use a subject line meaningful to you and us to track the bug.|
> | Include the entire contents of this bug box in the report.   |
> | Include the exact command that you entered.  |
> | Also include sources listed below.   |
> +==+

Confirmed: this also happens on i386-pc-solaris2.11,
sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.


[PATCH]: libgompd add parallel handle functions

2022-06-05 Thread Mohamed Sayed via Gcc-patches
This patch adds parallel region handles specified in section 5.5.3.
From examining libgomp code, I found that struct gomp_team describes the
parallel region.
The thread handle gives the address of gomp_thread, so I tried to
access *team via gomp_thread->ts->team.
The parallel handle is the team pointer in team state.
I have a question about ompd_get_task_parallel_handle
https://www.openmp.org/spec-html/5.0/openmpsu218.html
How can I reach gomp_team from gomp_task?
And the union in gomp_task has two entries, gomp_sem_t and gomp_team.

libgomp/ChangeLog

2022-06-06  Mohamed Sayed  


* Makefile.am: (libgompd_la_SOURCES): Add ompd-parallel.c.
* Makefile.in: Regenerate.
* libgompd.map: Add ompd_get_curr_parallel_handle,
ompd_get_enclosing_parallel_handle, ompd_rel_parallel_handle
and ompd_parallel_handle_compare symbol versions.
* ompd-support.h: Add gompd_access (gomp_team_state, team) and
gompd_access (gomp_team, prev_ts).
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 6d913a93e7f..4e215450b25 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -94,7 +94,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c ompd-support.c
 
-libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
+libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c ompd-parallel.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 40f896b5f03..ab66ad1c8f0 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -233,7 +233,8 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
oacc-target.lo ompd-support.lo $(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
-am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo
+am_libgompd_la_OBJECTS = ompd-init.lo ompd-helper.lo ompd-icv.lo \
+   ompd-parallel.lo
 libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -583,7 +584,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c ompd-support.c $(am__append_7)
-libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c
+libgompd_la_SOURCES = ompd-init.c ompd-helper.c ompd-icv.c ompd-parallel.c
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info 
$(libtool_VERSION)
@@ -800,6 +801,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-helper.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-icv.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-init.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-support.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
diff --git a/libgomp/libgompd.map b/libgomp/libgompd.map
index 85bdc3695f6..1662dc56962 100644
--- a/libgomp/libgompd.map
+++ b/libgomp/libgompd.map
@@ -16,6 +16,10 @@ OMPD_5.1 {
 ompd_thread_handle_compare;
 ompd_get_thread_id;
 ompd_get_device_from_thread;
+ompd_get_curr_parallel_handle;
+ompd_get_enclosing_parallel_handle;
+ompd_rel_parallel_handle;
+ompd_parallel_handle_compare;
   local:
 *;
 };
diff --git a/libgomp/ompd-support.h b/libgomp/ompd-support.h
index 39d55161132..48a2e6133f5 100644
--- a/libgomp/ompd-support.h
+++ b/libgomp/ompd-support.h
@@ -83,12 +83,15 @@ extern __UINT64_TYPE__ gompd_state;
   gompd_access (gomp_thread_pool, threads) \
   gompd_access (gomp_thread, ts) \
   gompd_access (gomp_team_state, team_id) \
-  gompd_access (gomp_task, icv)
+  gompd_access (gomp_task, icv) \
+  gompd_access (gomp_team_state, team) \
+  gompd_access (gomp_team, prev_ts)
 
 #define GOMPD_SIZES(gompd_size) \
   gompd_size (gomp_thread) \
   gompd_size (gomp_task_icv) \
-  gompd_size (gomp_task)
+  gompd_size (gomp_task) 
+  
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 #pragma GCC visibility pop


[PATCH-1, rs6000] Replace shift and ior insns with one rotate and mask insn for bswap pattern [PR93453]

2022-06-05 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch replaces shift and ior insns with one rotate and mask
insn for the split patterns which are for DI byte swap on Power6 and
before. The test cases shows the optimization.

  Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-06-06 Haochen Gui 

gcc/
* config/rs6000/rs6000.md (split for DI load byte swap): Merge shift
and ior insns to one rotate and mask insn.
(split for DI register byte swap): Likewise.

gcc/testsuite/
* gcc.target/powerpc/pr93453-1.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bf85baa5370..2e38195aaac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2828,8 +2828,8 @@ (define_split
   emit_insn (gen_bswapsi2 (dest_32, word2));
 }

-  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
-  emit_insn (gen_iordi3 (dest, dest, op3));
+  emit_insn (gen_rotldi3_insert_3 (dest, op3, GEN_INT (32), dest,
+  GEN_INT ((HOST_WIDE_INT_1U << 32) - 1)));
   DONE;
 })

@@ -2914,10 +2914,10 @@ (define_split
   rtx op3_si  = simplify_gen_subreg (SImode, op3, DImode, lo_off);

   emit_insn (gen_lshrdi3 (op2, src, GEN_INT (32)));
-  emit_insn (gen_bswapsi2 (dest_si, src_si));
-  emit_insn (gen_bswapsi2 (op3_si, op2_si));
-  emit_insn (gen_ashldi3 (dest, dest, GEN_INT (32)));
-  emit_insn (gen_iordi3 (dest, dest, op3));
+  emit_insn (gen_bswapsi2 (op3_si, src_si));
+  emit_insn (gen_bswapsi2 (dest_si, op2_si));
+  emit_insn (gen_rotldi3_insert_3 (dest, op3, GEN_INT (32), dest,
+  GEN_INT ((HOST_WIDE_INT_1U << 32) - 1)));
   DONE;
 })

diff --git a/gcc/testsuite/gcc.target/powerpc/pr93453-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
new file mode 100644
index 000..4271886561f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-mdejagnu-cpu=power6 -O2" } */
+
+unsigned long load_byte_reverse (unsigned long *in)
+{
+   return __builtin_bswap64 (*in);
+}
+
+unsigned long byte_reverse (unsigned long in)
+{
+   return __builtin_bswap64 (in);
+}
+
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */


Re: [PATCH] x86: harmonize __builtin_ia32_psadbw*() types

2022-06-05 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 6, 2022 at 3:17 AM Uros Bizjak via Gcc-patches
 wrote:
>
> On Thu, Jun 2, 2022 at 5:04 PM Jan Beulich  wrote:
> >
> > The 64-bit, 128-bit, and 512-bit variants have VDI return type, in
> > line with instruction behavior. Make the 256-bit builtin match, thus
> > also making it match the insn it expands to (using VI8_AVX2_AVX512BW).
> >
> > gcc/
> >
> > * config/i386/i386-builtin.def (__builtin_ia32_psadbw256):
> > Change type.
> > * config/i386/i386-builtin-types.def: New function type
> > (V4DI, V32QI, V32QI).
> > * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
> > V4DI_FTYPE_V32QI_V32QI.
>
> LGTM, but please let HJ have the final approval.
I think it was just a typo and not intentional, so Ok for the trunk.
>
> Uros.
>
> >
> > --- a/gcc/config/i386/i386-builtin.def
> > +++ b/gcc/config/i386/i386-builtin.def
> > @@ -1217,7 +1217,7 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_mulv8si3, 
> > "__builtin_ia32_pmulld256"  , IX86_BUILTIN_PMULLD256  , UNKNOWN, (int) 
> > V8SI_FTYPE_V8SI_V8SI)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_vec_widen_umult_even_v8si, 
> > "__builtin_ia32_pmuludq256", IX86_BUILTIN_PMULUDQ256, UNKNOWN, (int) 
> > V4DI_FTYPE_V8SI_V8SI)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_iorv4di3, 
> > "__builtin_ia32_por256", IX86_BUILTIN_POR256, UNKNOWN, (int) 
> > V4DI_FTYPE_V4DI_V4DI)
> > -BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_psadbw, 
> > "__builtin_ia32_psadbw256", IX86_BUILTIN_PSADBW256, UNKNOWN, (int) 
> > V16HI_FTYPE_V32QI_V32QI)
> > +BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_psadbw, 
> > "__builtin_ia32_psadbw256", IX86_BUILTIN_PSADBW256, UNKNOWN, (int) 
> > V4DI_FTYPE_V32QI_V32QI)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufbv32qi3, 
> > "__builtin_ia32_pshufb256", IX86_BUILTIN_PSHUFB256, UNKNOWN, (int) 
> > V32QI_FTYPE_V32QI_V32QI)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufdv3, 
> > "__builtin_ia32_pshufd256", IX86_BUILTIN_PSHUFD256, UNKNOWN, (int) 
> > V8SI_FTYPE_V8SI_INT)
> >  BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_pshufhwv3, 
> > "__builtin_ia32_pshufhw256", IX86_BUILTIN_PSHUFHW256, UNKNOWN, (int) 
> > V16HI_FTYPE_V16HI_INT)
> > --- a/gcc/config/i386/i386-builtin-types.def
> > +++ b/gcc/config/i386/i386-builtin-types.def
> > @@ -516,6 +516,7 @@ DEF_FUNCTION_TYPE (V8DI, V8DI, V2DI, INT
> >  DEF_FUNCTION_TYPE (V8DI, V8DI, V2DI, INT, V8DI, UQI)
> >  DEF_FUNCTION_TYPE (V8DI, V8DI, V4DI, INT, V8DI, UQI)
> >  DEF_FUNCTION_TYPE (V4DI, V8SI, V8SI)
> > +DEF_FUNCTION_TYPE (V4DI, V32QI, V32QI)
> >  DEF_FUNCTION_TYPE (V8DI, V64QI, V64QI)
> >  DEF_FUNCTION_TYPE (V4DI, V4DI, V2DI)
> >  DEF_FUNCTION_TYPE (V4DI, PCV4DI, V4DI)
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -10359,6 +10359,7 @@ ix86_expand_args_builtin (const struct b
> >  case V8SI_FTYPE_V16HI_V16HI:
> >  case V4DI_FTYPE_V4DI_V4DI:
> >  case V4DI_FTYPE_V8SI_V8SI:
> > +case V4DI_FTYPE_V32QI_V32QI:
> >  case V8DI_FTYPE_V64QI_V64QI:
> >if (comparison == UNKNOWN)
> > return ix86_expand_binop_builtin (icode, exp, target);
> >



-- 
BR,
Hongtao


Re: [PATCH] Update {skylake,icelake,alderlake}_cost to add a bit preference to vector store.

2022-06-05 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 1, 2022 at 11:56 PM H.J. Lu via Gcc-patches
 wrote:
>
> On Tue, May 31, 2022 at 10:06 PM Cui,Lili  wrote:
> >
> > This patch is to update {skylake,icelake,alderlake}_cost to add a bit 
> > preference to vector store.
> > Since the integer vector construction cost has changed, we need to adjust 
> > the load and store costs for Intel processors.
> >
> > With the patch applied
> > 538.imagic_r:gets ~6% improvement on ADL for multicopy.
> > 525.x264_r  :gets ~2% improvement on ADL and ICX for multicopy.
> > with no measurable changes for other benchmarks.
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Ok for trunk?
> >
> > Thanks,
> > Lili.
> >
> > gcc/ChangeLog
> >
> > PR target/105493
> > * config/i386/x86-tune-costs.h (skylake_cost): Raise the gpr load 
> > cost
> > from 4 to 6 and gpr store cost from 6 to 8. Change SSE loads and
> > unaligned loads cost from {6, 6, 6, 10, 20} to {8, 8, 8, 8, 16}.
> > (icelake_cost): Ditto.
> > (alderlake_cost): Raise the gpr store cost from 6 to 8 and SSE 
> > loads,
> > stores and unaligned stores cost from {6, 6, 6, 10, 15} to
> > {8, 8, 8, 10, 15}.
> >
> > gcc/testsuite/
> >
> > PR target/105493
> > * gcc.target/i386/pr91446.c: Adjust to expect vectorization
> > * gcc.target/i386/pr99881.c: XFAIL.
> > ---
> >  gcc/config/i386/x86-tune-costs.h| 26 -
> >  gcc/testsuite/gcc.target/i386/pr91446.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr99881.c |  2 +-
> >  3 files changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/gcc/config/i386/x86-tune-costs.h 
> > b/gcc/config/i386/x86-tune-costs.h
> > index ea34a939c68..6c9066c84cc 100644
> > --- a/gcc/config/i386/x86-tune-costs.h
> > +++ b/gcc/config/i386/x86-tune-costs.h
> > @@ -1897,15 +1897,15 @@ struct processor_costs skylake_cost = {
> >8,   /* "large" insn */
> >17,  /* MOVE_RATIO */
> >17,  /* CLEAR_RATIO */
> > -  {4, 4, 4},   /* cost of loading integer registers
> > +  {6, 6, 6},   /* cost of loading integer registers
> >in QImode, HImode and SImode.
> >Relative to reg-reg move (2).  */
> > -  {6, 6, 6},   /* cost of storing integer 
> > registers */
> > -  {6, 6, 6, 10, 20},   /* cost of loading SSE register
> > +  {8, 8, 8},   /* cost of storing integer 
> > registers */
> > +  {8, 8, 8, 8, 16},/* cost of loading SSE register
> >in 32bit, 64bit, 128bit, 256bit 
> > and 512bit */
> >{8, 8, 8, 8, 16},/* cost of storing SSE register
> >in 32bit, 64bit, 128bit, 256bit 
> > and 512bit */
> > -  {6, 6, 6, 10, 20},   /* cost of unaligned loads.  */
> > +  {8, 8, 8, 8, 16},/* cost of unaligned loads.  */
> >{8, 8, 8, 8, 16},/* cost of unaligned stores.  */
> >2, 2, 4, /* cost of moving XMM,YMM,ZMM 
> > register */
> >6,   /* cost of moving SSE register to 
> > integer.  */
> > @@ -2023,15 +2023,15 @@ struct processor_costs icelake_cost = {
> >8,   /* "large" insn */
> >17,  /* MOVE_RATIO */
> >17,  /* CLEAR_RATIO */
> > -  {4, 4, 4},   /* cost of loading integer registers
> > +  {6, 6, 6},   /* cost of loading integer registers
> >in QImode, HImode and SImode.
> >Relative to reg-reg move (2).  */
> > -  {6, 6, 6},   /* cost of storing integer 
> > registers */
> > -  {6, 6, 6, 10, 20},   /* cost of loading SSE register
> > +  {8, 8, 8},   /* cost of storing integer 
> > registers */
> > +  {8, 8, 8, 8, 16},/* cost of loading SSE register
> >in 32bit, 64bit, 128bit, 256bit 
> > and 512bit */
> >{8, 8, 8, 8, 16},/* cost of storing SSE register
> >in 32bit, 64bit, 128bit, 256bit 
> > and 512bit */
> > -  {6, 6, 6, 10, 20},   /* cost of unaligned loads.  */
> > +  {8, 8, 8, 8, 16},/* cost of unaligned loads.  */
> >{8, 8, 8, 8, 16},/* cost of unaligned stores.  */
> >2, 2, 4, /* cost of moving XMM,YMM,ZMM 
> > register */
> >6,   /* cost of moving SSE regis

[PATCH] inline: Rebuild target option node for caller [PR105459]

2022-06-05 Thread Kewen.Lin via Gcc-patches
Hi,

PR105459 exposes one issue in inline_call handling that when it
decides to copy FP flags from callee to caller and rebuild the
optimization node for caller fndecl, it's possible that the target
option node is also necessary to be rebuilt.  Without updating
target option node early, it can make nodes share the same target
option node wrongly, later when we want to unshare it somewhere
(like in target hook) it can get unexpected results, like ICE on
uninitialized secondary member of target globals exposed in this PR.

Commit r12-3721 makes it get exact fp_expression info and causes
more optimization chances and then exposes this issue.  Commit r11-5855
introduces two target options to shadow flag_excess_precision and
flag_unsafe_math_optimizations and shows the need to rebuild target
node in inline_call when optimization node changes.

As commented in PR105459, I tried to postpone init_function_start
in cgraph_node::expand, but abandoned it since I thought it just
concealed the issue.  And I also tried to adjust the target node
when current function switching, but failed since we get the NULL
cfun and fndecl in WPA phase.

Bootstrapped and regtested on x86_64-redhat-linux, powerpc64-linux-gnu
P8 and powerpc64le-linux-gnu P9.

Any thoughts?  Is it OK for trunk?

BR,
Kewen
-

PR tree-optimization/105459

gcc/ChangeLog:

* ipa-inline-transform.cc (inline_call): Rebuild target option node
once optimization node gets rebuilt.

gcc/testsuite/ChangeLog:

* gcc.dg/lto/pr105459_0.c: New test.
---
 gcc/ipa-inline-transform.cc   | 50 +--
 gcc/testsuite/gcc.dg/lto/pr105459_0.c | 35 +++
 2 files changed, 83 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr105459_0.c

diff --git a/gcc/ipa-inline-transform.cc b/gcc/ipa-inline-transform.cc
index 07288e57c73..edba58377f4 100644
--- a/gcc/ipa-inline-transform.cc
+++ b/gcc/ipa-inline-transform.cc
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-modref.h"
 #include "symtab-thunks.h"
 #include "symtab-clones.h"
+#include "target.h"

 int ncalls_inlined;
 int nfunctions_inlined;
@@ -469,8 +470,53 @@ inline_call (struct cgraph_edge *e, bool update_original,
 }

   /* Reload global optimization flags.  */
-  if (reload_optimization_node && DECL_STRUCT_FUNCTION (to->decl) == cfun)
-set_cfun (cfun, true);
+  if (reload_optimization_node)
+{
+  /* Only need to check and update target option node
+when target_option_default_node is not NULL.  */
+  if (target_option_default_node)
+   {
+ /* Save the current context for optimization and target option
+node.  */
+ tree old_optimize
+   = build_optimization_node (&global_options, &global_options_set);
+ tree old_target_opt
+   = build_target_option_node (&global_options, &global_options_set);
+
+ /* Restore optimization with new optimization node.  */
+ tree new_optimize = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl);
+ if (old_optimize != new_optimize)
+   cl_optimization_restore (&global_options, &global_options_set,
+TREE_OPTIMIZATION (new_optimize));
+
+ /* Restore target option with the one from caller fndecl.  */
+ tree cur_target_opt = DECL_FUNCTION_SPECIFIC_TARGET (to->decl);
+ if (!cur_target_opt)
+   cur_target_opt = target_option_default_node;
+ cl_target_option_restore (&global_options, &global_options_set,
+   TREE_TARGET_OPTION (cur_target_opt));
+
+ /* Update target option as optimization changes.  */
+ targetm.target_option.override ();
+
+ /* Rebuild target option node for caller fndecl and replace
+with it if the node changes.  */
+ tree new_target_opt
+   = build_target_option_node (&global_options, &global_options_set);
+ if (cur_target_opt != new_target_opt)
+   DECL_FUNCTION_SPECIFIC_TARGET (to->decl) = new_target_opt;
+
+ /* Restore the context with previous saved nodes.  */
+ if (old_optimize != new_optimize)
+   cl_optimization_restore (&global_options, &global_options_set,
+TREE_OPTIMIZATION (old_optimize));
+ cl_target_option_restore (&global_options, &global_options_set,
+   TREE_TARGET_OPTION (old_target_opt));
+   }
+
+  if (DECL_STRUCT_FUNCTION (to->decl) == cfun)
+   set_cfun (cfun, true);
+}

   /* If aliases are involved, redirect edge to the actual destination and
  possibly remove the aliases.  */
diff --git a/gcc/testsuite/gcc.dg/lto/pr105459_0.c 
b/gcc/testsuite/gcc.dg/lto/pr105459_0.c
new file mode 100644
index 000..c799e6ef23d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr105459_0.c
@@ -0,0 +1,35 @@
+/* { dg-lto-do link } */
+/* { dg-lto-op

[PATCH] Update document for VECTOR_MODES_WITH_PREFIX

2022-06-05 Thread Kewen.Lin via Gcc-patches
Hi,

r10-3912 updated the format of VECTOR_MODES_WITH_PREFIX by
adding one more parameter ORDER, the related document is out
of date.  So update the document for ORDER.
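[Editorial illustration of the four-argument form, not taken verbatim from any modes.def file; the prefix and order value here are hypothetical:]

```
/* 16-byte integer vector modes named with a "VNx" prefix
   (VNx16QI, VNx8HI, VNx4SI, VNx2DI), given top-level order 0;
   smaller order numbers sort first.  */
VECTOR_MODES_WITH_PREFIX (VNx, INT, 16, 0);
```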

Is it ok for trunk?

BR,
Kewen
-

gcc/ChangeLog:

* machmode.def (VECTOR_MODES_WITH_PREFIX): Update document for
parameter ORDER.
---
 gcc/machmode.def | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/machmode.def b/gcc/machmode.def
index b62a5fbc683..d62563601fc 100644
--- a/gcc/machmode.def
+++ b/gcc/machmode.def
@@ -142,9 +142,10 @@ along with GCC; see the file COPYING3.  If not see
than two bytes (if CLASS is FLOAT).  CLASS must be INT or
FLOAT.  The names follow the same rule as VECTOR_MODE uses.

- VECTOR_MODES_WITH_PREFIX (PREFIX, CLASS, WIDTH);
+ VECTOR_MODES_WITH_PREFIX (PREFIX, CLASS, WIDTH, ORDER);
Like VECTOR_MODES, but start the mode names with PREFIX instead
-   of the usual "V".
+   of the usual "V".  ORDER is the top-level sorting order of the
+   mode, with smaller numbers indicating a higher priority.

  VECTOR_BOOL_MODE (NAME, COUNT, COMPONENT, BYTESIZE)
 Create a vector mode called NAME that contains COUNT boolean
--
2.27.0