date:20210729

Re: [PATCH] c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]

2021-07-29 Thread Jason Merrill via Gcc-patches


On 7/29/21 4:38 PM, Jason Merrill wrote:

On 7/29/21 3:50 AM, Jakub Jelinek wrote:

Hi!

The following patch attempts to implement the compiler helpers for
libstdc++ std::is_pointer_interconvertible_base_of trait and
std::is_pointer_interconvertible_with_class template function.

For the former __is_pointer_interconvertible_base_of trait that checks
first whether base and derived aren't non-union class types that are the
same ignoring toplevel cv-qualifiers, otherwise if derived is
unambiguously derived from base without cv-qualifiers, derived being a
complete type, and if so, my limited understanding of any derived object
being pointer-interconvertible with base subobject IMHO implies (because
one can't inherit from unions or unions can't inherit) that we check if
derived is standard layout type and we walk bases of derived
recursively, stopping on a class that has any non-static data members and
check if any of the bases is base.  On class with non-static data members
no bases are compared already.

The latter is implemented using a FE
__builtin_is_pointer_interconvertible_with_class, but because on the
library side it will be a template function, the builtin takes ...
arguments and only during folding verifies it has a single argument with
pointer to member type.  The initial errors IMHO can only happen if one
uses the builtin incorrectly by hand, the template function should ensure
that it has exactly a single argument that has pointer to member type.
Otherwise, again with my limited understanding of what
the template function should do and pointer-interconvertibility,
it folds to false for pointer-to-member-function, errors if
basetype of the OFFSET_TYPE is incomplete, folds to false
for non-std-layout basetype, then finds the first non-static
data member in the basetype or its bases (by ignoring
DECL_FIELD_IS_BASE FIELD_DECLs that are empty, recursing into
DECL_FIELD_IS_BASE FIELD_DECLs type that are non-empty (I think
std layout should ensure there is at most one), for unions
checks if membertype is same type as any of the union FIELD_DECLs,
for non-unions the first other FIELD_DECL only, and for anonymous
aggregates similarly (union vs. non-union) but recurses into the
anon aggr types).  If membertype doesn't match the type of the
first non-static data member (or for unions any of the members),
then the builtin folds to false, otherwise the builtin folds to
a check whether the argument is equal to OFFSET_TYPE of 0 or not,
either at compile time if it is constant (e.g. for constexpr
folding) or at runtime otherwise.
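
The last step described above (a check against offset zero once the type-based tests have passed) can be illustrated with a portable sketch.  This does not call the builtin; at_offset_zero is a hypothetical helper that only models the "is this member at offset 0" half of the fold, and the structs A, P and U are made up for the illustration.  The standard-layout precondition is shown separately with std::is_standard_layout.

```cpp
#include <cassert>
#include <type_traits>

struct A { int a; int b; };            // standard-layout; 'a' is the first member
struct P { virtual ~P() {} int p; };   // has a virtual function: not standard-layout
union U { int i; double d; };          // all members of a union share the same address

static_assert(std::is_standard_layout<A>::value, "A is standard-layout");
static_assert(!std::is_standard_layout<P>::value, "P is not standard-layout");

// Hypothetical helper, not the GCC builtin: true when the member designated
// by pm has the same address as the enclosing object.  pm must be non-null.
template <typename S, typename M>
bool at_offset_zero(M S::*pm)
{
  S obj{};
  return static_cast<const void*>(&(obj.*pm))
         == static_cast<const void*>(&obj);
}
```

With these definitions, at_offset_zero(&A::a) holds and at_offset_zero(&A::b) does not, while both members of U qualify.  The real builtin additionally rejects non-std-layout non-union classes before looking at the offset, which this helper does not attempt.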

As I wrote in the PR, I've tried my testcases with MSVC on godbolt
that claims to implement it, and https://godbolt.org/z/3PnjM33vM
for the first testcase shows it disagrees with my expectations on
static_assert (std::is_pointer_interconvertible_base_of_v);
static_assert (std::is_pointer_interconvertible_base_of_v);


I think these are correct.


static_assert (!std::is_pointer_interconvertible_base_of_v);
static_assert (!std::is_pointer_interconvertible_base_of_v);
static_assert (std::is_pointer_interconvertible_base_of_v);


I think these are wrong, given my comment below about CWG2254.


Is that a bug in my patch or is MSVC buggy on these (or mix thereof)?
https://godbolt.org/z/aYeYnne9d
shows the second testcase, here it differs on:
static_assert (std::is_pointer_interconvertible_with_class (::b));
static_assert (std::is_pointer_interconvertible_with_class (::g));
static_assert (std::is_pointer_interconvertible_with_class (::b));

static_assert (std::is_pointer_interconvertible_with_class (::a));
static_assert (std::is_pointer_interconvertible_with_class (::b));
Again, my bug, MSVC bug, mix thereof?


MSVC bug, I think.


Oh, and there is another thing, the standard has an example:
struct A { int a; };    // a standard-layout class
struct B { int b; };    // a standard-layout class
struct C: public A, public B { };   // not a standard-layout class

static_assert( is_pointer_interconvertible_with_class( ::b ) );
   // Succeeds because, despite its appearance, ::b has type
   // “pointer to member of B of type int”.
static_assert( is_pointer_interconvertible_with_class( ::b ) );
   // Forces the use of class C, and fails.
It seems to work as written with MSVC (second assertion fails),
but fails with GCC with the patch:
/tmp/1.C:22:57: error: no matching function for call to ‘is_pointer_interconvertible_with_class(int B::*)’
    22 | static_assert( is_pointer_interconvertible_with_class( ::b ) );
   |
~^
/tmp/1.C:8:1: note: candidate: ‘template constexpr bool std::is_pointer_interconvertible_with_class(M S::*)’

 8 | is_pointer_interconvertible_with_class (M S::*m) noexcept
   | ^~
/tmp/1.C:8:1: note:   template argument deduction/substitution failed:
/tmp/1.C:22:57: note:   mismatched types ‘C’ and ‘B’
    22 | static_assert(
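
The "mismatched types ‘C’ and ‘B’" note above comes down to the type of the pointer-to-member expression.  A minimal sketch, assuming the A/B/C definitions from the standard's example; deduces_class is a hypothetical helper, not the library function, and only reports which class is deduced for S from an M S::* parameter.

```cpp
#include <cassert>
#include <type_traits>

struct A { int a; };
struct B { int b; };
struct C : public A, public B { };

// Naming the member through C still yields a pointer to member of B.
static_assert(std::is_same<decltype(&C::b), int B::*>::value,
              "&C::b has type int B::*");
static_assert(!std::is_same<decltype(&C::b), int C::*>::value,
              "&C::b does not have type int C::*");

// Hypothetical helper: true when S is deduced as Expected.
template <class Expected, class S, class M>
constexpr bool deduces_class(M S::*)
{
  return std::is_same<Expected, S>::value;
}

static_assert(deduces_class<B>(&C::b), "S is deduced as B");
// An explicit conversion to int C::* changes what is deduced.
static_assert(deduces_class<C>(static_cast<int C::*>(&C::b)),
              "S is deduced as C after the conversion");
```

This only demonstrates the types involved; how the library template should behave when the class is given explicitly is the question raised in the message.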

[PATCH v2] Make loops_list support an optional loop_p root

2021-07-29 Thread Kewen.Lin via Gcc-patches

on 2021/7/29 下午4:01, Richard Biener wrote:
> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin  wrote:
>>
>> on 2021/7/22 下午8:56, Richard Biener wrote:
>>> On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin  wrote:
>>>>
>>>> Hi,
>>>>
>>>> This v2 has addressed some review comments/suggestions:
>>>>
>>>>   - Use "!=" instead of "<" in function operator!= (const Iter )
>>>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
>>>>     to support loop hierarchy tree rather than just a function,
>>>>     and adjust to use loops* accordingly.
>>>
>>> I actually meant struct loop *, not struct loops * ;)  At the point
>>> we pondered to make loop invariant motion work on single
>>> loop nests we gave up not only but also because it iterates
>>> over the loop nest but all the iterators only ever can process
>>> all loops, not say, all loops inside a specific 'loop' (and
>>> including that 'loop' if LI_INCLUDE_ROOT).  So the
>>> CTOR would take the 'root' of the loop tree as argument.
>>>
>>> I see that doesn't trivially fit how loops_list works, at least
>>> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
>>> could be adjusted to do ONLY_INNERMOST as well?
>>>
>>
>>
>> Thanks for the clarification!  I just realized that the previous
>> version with struct loops* is problematic, all traversal is
>> still bounded with outer_loop == NULL.  I think what you expect
>> is to respect the given loop_p root boundary.  Since we just
>> record the loops' nums, I think we still need the function* fn?
> 
> Would it simplify things if we recorded the actual loop *?
> 

I'm afraid it's unsafe to record the loop*.  I had the same
question, why the loop iterator uses an index rather than a loop*,
when I first read this code.  I guess the design of loop processing
allows its user to update or even delete the following loops that
are still to be visited.  For example, the user may do some tricks
on one loop, duplicate the loop and its children somewhere else,
and then remove the loop and its children.  When iterating onto
its children later, the "index" way checks validity with get_loop
at that point, but with the "loop *" way some recorded pointers
become dangling; they can't be checked for validity by themselves,
so a side linear search seems to be needed to ensure validity.
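
The difference between the two ways can be sketched as follows.  This is a simplified model under stated assumptions, not GCC code: Loop, LoopTable and visit_live are hypothetical names, and LoopTable::get merely plays the role that get_loop plays in the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Loop { int num; };

// Hypothetical loop table: slot i holds loop number i, or null once removed.
struct LoopTable {
  std::vector<std::unique_ptr<Loop>> slots;

  int add()
  {
    int n = static_cast<int>(slots.size());
    slots.push_back(std::unique_ptr<Loop>(new Loop{n}));
    return n;
  }
  void remove(int num) { slots[num].reset(); }

  // Validity check by index: returns null for removed or unknown loops.
  Loop* get(int num) const
  {
    if (num < 0 || static_cast<std::size_t>(num) >= slots.size())
      return nullptr;
    return slots[num].get();
  }
};

// Visit previously recorded loop numbers, skipping loops removed meanwhile.
std::vector<int> visit_live(const LoopTable& table,
                            const std::vector<int>& to_visit)
{
  std::vector<int> seen;
  for (int num : to_visit)
    if (Loop* l = table.get(num))
      seen.push_back(l->num);
  return seen;
}
```

Had to_visit recorded Loop* values instead, a removal between recording and visiting would leave a dangling pointer with no cheap way to detect it.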

> There's still the to_visit reserve which needs a bound on
> the number of loops for efficiency reasons.
> 

Yes, I still keep the fn in the updated version.

>> So I add one optional argument loop_p root and update the
>> visiting codes accordingly.  Before this change, the previous
>> visiting uses the outer_loop == NULL as the termination condition,
>> it perfectly includes the root itself, but with this given root,
>> we have to use it as the termination condition to avoid iterating
>> onto its possible existing next.
>>
>> For LI_ONLY_INNERMOST, I was thinking whether we can use the
>> code like:
>>
>> struct loops *fn_loops = loops_for_fn (fn)->larray;
>> for (i = 0; vec_safe_iterate (fn_loops, i, ); i++)
>> if (aloop != NULL
>> && aloop->inner == NULL
>> && flow_loop_nested_p (tree_root, aloop))
>>  this->to_visit.quick_push (aloop->num);
>>
>> it has the stable bound, but if the given root only has several
>> child loops, it can be much worse if there are many loops in fn.
>> It seems impossible to predict the given root loop hierarchy size,
>> maybe we can still use the original linear searching for the case
>> loops_for_fn (fn) == root?  But since this visiting seems not so
>> performance critical, I chose to share the code originally used
>> for FROM_INNERMOST, hope it can have better readability and
>> maintainability.
> 
> I was indeed looking for something that has execution/storage
> bound on the subtree we're interested in.  If we pull the CTOR
> out-of-line we can probably keep the linear search for
> LI_ONLY_INNERMOST when looking at the whole loop tree.
> 

OK, I've moved the suggested single loop tree walker out-of-line
to cfgloop.c, and brought the linear search back for
LI_ONLY_INNERMOST when looking at the whole loop tree.

> It just seemed to me that we can eventually re-use a
> single loop tree walker for all orders, just adjusting the
> places we push.
> 

Wow, good point!  Indeed, I have further unified the handling of
all orders into a single function walk_loop_tree.
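
The idea of one walker for all orders, differing only in where it pushes, might look roughly like the sketch below.  It is an assumption-laden illustration, not the patch: Node, Order, walk_subtree and collect are hypothetical names, and the real walk_loop_tree works on struct loop and the LI_* flags.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a loop node; not GCC's struct loop.
struct Node {
  int num;
  std::vector<const Node*> inner;
};

enum class Order { FromOutermost, FromInnermost, OnlyInnermost };

// Walk the subtree rooted at 'root' only; siblings of root are never reached.
void walk_subtree(const Node* n, const Node* root, Order order,
                  bool include_root, std::vector<int>& out)
{
  const bool leaf = n->inner.empty();
  const bool eligible = (n != root || include_root)
                        && (order != Order::OnlyInnermost || leaf);

  if (eligible && order == Order::FromOutermost)
    out.push_back(n->num);               // push before the children
  for (const Node* child : n->inner)
    walk_subtree(child, root, order, include_root, out);
  if (eligible && order != Order::FromOutermost)
    out.push_back(n->num);               // push after the children
}

std::vector<int> collect(const Node* root, Order order, bool include_root)
{
  std::vector<int> out;
  walk_subtree(root, root, order, include_root, out);
  return out;
}
```

Because the recursion starts at the given root, both the running time and the size of the result are bounded by the subtree of interest rather than by the number of loops in the function.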

>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu, also
>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>>
>> Does the attached patch meet what you expect?
> 
> So yeah, it's probably close to what is sensible.  Not sure
> whether optimizing the loops for the !only_push_innermost_p
> case is important - if we manage to produce a single
> walker with conditionals based on 'flags' then IPA-CP should
> produce optimal clones as well I guess.
> 

Thanks for the comments, the updated v2 is attached.
Comparing with v1, it does:

  - Unify one single loop

Re: [PATCH] c++: __builtin_is_pointer_interconvertible_with_class incremental fix [PR101539]

2021-07-29 Thread Jason Merrill via Gcc-patches


On 7/29/21 10:21 AM, Jakub Jelinek wrote:

On Thu, Jul 29, 2021 at 09:50:10AM +0200, Jakub Jelinek via Gcc-patches wrote:

Now that I'm writing the above text and rereading the
pointer-interconvertibility definition, I think my first_nonstatic_data_member_p
and fold_builtin_is_pointer_inverconvertible_with_class have one bug,
for unions the pointer inter-convertibility doesn't talk about std layout at
all, so I think I need to check for std_layout_type_p only for non-union
class types and accept any union, std_layout_type_p or not.  But when
recursing from a union type into anonymous structure type punt if the
anonymous structure type is not std_layout_type_p + add testcase coverage.


For this part, here is an incremental fix.  Tested on x86_64-linux.

It also shows that in the case of a union with a non-std-layout
anonymous structure in it (we're beyond the standard here, because
anonymous structures are not in the standard), in testcases like:
struct D {};
struct E { [[no_unique_address]] D e; };
union Y { int a; struct : public E { short b; long c; }; long long d; };


We don't already reject an anonymous struct with bases?  I think we 
should do so, in fixup_anonymous_aggr.  We might even require anonymous 
structs to be standard-layout.



the builtin will return false for ::b - while ::b is at offset zero,
the anonymous structure is not std-layout and therefore the
pointer-interconvertibility rules say pointers aren't interconvertible.
But in case like:
union Y2 { int a; struct : public E { int b; long c; }; long long d; };
it will return true for ::b - while the same applies, there is
another union member with int type.  In theory when seeing the PTRMEM_CST
we could still differentiate, ::a is ok but ::b is not.  But as soon
as we have just an INTEGER_CST with OFFSET_TYPE or need to check it at
runtime, all we know is that we have pointer to int data member in Y2
at offset 0, and that is the same for ::a and ::b.


Yep.

I'm inclined not to handle this extension case specifically.


2021-07-29  Jakub Jelinek  

PR c++/101539
* semantics.c (first_nonstatic_data_member_p): Don't recurse into
non-std-layout non-union class types from union type.
(fold_builtin_is_pointer_inverconvertible_with_class): Don't check
std-layout type for union types.

* g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C: Add
tests for non-std-layout union type.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class2.C: Likewise.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class4.C: Add
tests for non-std-layout anonymous class type in union.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class5.C: Likewise.

--- gcc/cp/semantics.c.jj   2021-07-28 23:06:38.665443459 +0200
+++ gcc/cp/semantics.c  2021-07-29 15:44:30.659713391 +0200
@@ -10631,7 +10631,9 @@ first_nonstatic_data_member_p (tree type
  if (TREE_CODE (type) != UNION_TYPE)
return first_nonstatic_data_member_p (TREE_TYPE (field),
  membertype);
- if (first_nonstatic_data_member_p (TREE_TYPE (field), membertype))
+ if ((TREE_CODE (TREE_TYPE (field)) == UNION_TYPE
+  || std_layout_type_p (TREE_TYPE (field)))
+ && first_nonstatic_data_member_p (TREE_TYPE (field), membertype))
return true;
}
else if (TREE_CODE (type) != UNION_TYPE)
@@ -10677,7 +10679,8 @@ fold_builtin_is_pointer_inverconvertible
if (!complete_type_or_else (basetype, NULL_TREE))
  return boolean_false_node;
  
-  if (!std_layout_type_p (basetype))
+  if (TREE_CODE (basetype) != UNION_TYPE
+  && !std_layout_type_p (basetype))
  return boolean_false_node;
  
if (!first_nonstatic_data_member_p (basetype, membertype))

--- gcc/testsuite/g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C.jj 2021-07-28 23:06:38.667443431 +0200
+++ gcc/testsuite/g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C 2021-07-29 15:46:55.809743808 +0200
@@ -28,6 +28,7 @@ union U { int a; double b; long long c;
  struct V { union { int a; long b; }; int c; };
  union X { int a; union { short b; long c; }; long long d; };
  struct Y { void foo () {} };
+union Z { int a; private: int b; protected: int c; public: int d; };
  
  static_assert (std::is_pointer_interconvertible_with_class (::b));
  static_assert (!std::is_pointer_interconvertible_with_class (::b2));
@@ -60,3 +61,5 @@ static_assert (std::is_pointer_interconv
  static_assert (std::is_pointer_interconvertible_with_class (::d));
  static_assert (!std::is_pointer_interconvertible_with_class ((int B::*) nullptr));
  static_assert (!std::is_pointer_interconvertible_with_class (::foo));
+static_assert (std::is_pointer_interconvertible_with_class (::a));
+static_assert (std::is_pointer_interconvertible_with_class (::d));
---

PING^w: [PATCH] ipa-devirt: check precision mismatch of enum values [PR101396]

2021-07-29 Thread Xi Ruoyao via Gcc-patches

Ping again.

On Sun, 2021-07-11 at 01:48 +0800, Xi Ruoyao wrote:
> We are comparing enum values (in wide_int) to check ODR violation.
> However, if we compare two wide_int values with different precision,
> we'll trigger an assert, leading to ICE.  With enum-base introduced
> in C++11, it's easy to sink into this situation.
> 
> To fix the issue, we need to explicitly check this kind of mismatch,
> and emit a proper warning message if there is such one.
> 
> Bootstrapped & regtested on x86_64-linux-gnu.  Ok for trunk?
> 
> gcc/
> 
> PR ipa/101396
> * ipa-devirt.c (ipa_odr_read_section): Compare the precision of
>   enum values, and emit a warning if they mismatch.
> 
> gcc/testsuite/
> 
> PR ipa/101396
> * g++.dg/lto/pr101396_0.C: New test.
> * g++.dg/lto/pr101396_1.C: New test.
> ---
>  gcc/ipa-devirt.c  |  9 +
>  gcc/testsuite/g++.dg/lto/pr101396_0.C | 12 
>  gcc/testsuite/g++.dg/lto/pr101396_1.C | 10 ++
>  3 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/lto/pr101396_0.C
>  create mode 100644 gcc/testsuite/g++.dg/lto/pr101396_1.C
> 
> diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
> index 8cd1100aba9..8deec75b2df 100644
> --- a/gcc/ipa-devirt.c
> +++ b/gcc/ipa-devirt.c
> @@ -4193,6 +4193,8 @@ ipa_odr_read_section (struct lto_file_decl_data *file_data, const char *data,
>   if (do_warning != -1 || j >= this_enum.vals.length ())
> continue;
>   if (strcmp (id, this_enum.vals[j].name)
> + || (val.get_precision() !=
> + this_enum.vals[j].val.get_precision())
>   || val != this_enum.vals[j].val)
> {
>   warn_name = xstrdup (id);
> @@ -4260,6 +4262,13 @@ ipa_odr_read_section (struct lto_file_decl_data *file_data, const char *data,
>     "name %qs differs from name %qs defined"
>     " in another translation unit",
>     this_enum.vals[j].name, warn_name);
> + else if (this_enum.vals[j].val.get_precision() !=
> +  warn_value.get_precision())
> +   inform (this_enum.vals[j].locus,
> +   "name %qs is defined as %u-bit while another "
> +   "translation unit defines it as %u-bit",
> +   warn_name, this_enum.vals[j].val.get_precision(),
> +   warn_value.get_precision());
>   /* FIXME: In case there is easy way to print wide_ints,
>  perhaps we could do it here instead of overflow check.  */
>   else if (wi::fits_shwi_p (this_enum.vals[j].val)
> diff --git a/gcc/testsuite/g++.dg/lto/pr101396_0.C b/gcc/testsuite/g++.dg/lto/pr101396_0.C
> new file mode 100644
> index 000..b7a2947a880
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr101396_0.C
> @@ -0,0 +1,12 @@
> +/* { dg-lto-do link } */
> +
> +enum A : __UINT32_TYPE__ { // { dg-lto-warning "6: type 'A' violates the C\\+\\+ One Definition Rule" }
> +  a, // { dg-lto-note "3: name 'a' is defined as 32-bit while another translation unit defines it as 64-bit" }
> +  b,
> +  c
> +};
> +
> +int main()
> +{
> +  return (int) A::a;
> +}
> diff --git a/gcc/testsuite/g++.dg/lto/pr101396_1.C b/gcc/testsuite/g++.dg/lto/pr101396_1.C
> new file mode 100644
> index 000..a6d032d694d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr101396_1.C
> @@ -0,0 +1,10 @@
> +enum A : __UINT64_TYPE__ { // { dg-lto-note "6: an enum with different value name is defined in another translation unit" }
> +  a, // { dg-lto-note "3: mismatching definition" }
> +  b,
> +  c
> +};
> +
> +int f(enum A x)
> +{
> +  return (int) x;
> +}
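
The ordering of the checks in the ipa-devirt.c hunk above (precision first, value second) can be modelled in a small standalone sketch.  WideValue, same_value and classify are hypothetical names; WideValue is only a toy stand-in for wide_int whose comparison asserts on mismatched precision, as described in the patch rationale.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for wide_int: a value together with its bit precision.
struct WideValue {
  unsigned precision;
  std::uint64_t bits;
};

// Like the comparison described above, this is only valid for equal
// precisions and asserts otherwise.
bool same_value(const WideValue& a, const WideValue& b)
{
  assert(a.precision == b.precision);
  return a.bits == b.bits;
}

enum class Mismatch { None, Precision, Value };

// Check the precision before comparing values, so the assertion in
// same_value can never fire.
Mismatch classify(const WideValue& a, const WideValue& b)
{
  if (a.precision != b.precision)
    return Mismatch::Precision;
  return same_value(a, b) ? Mismatch::None : Mismatch::Value;
}
```

In the testcase pair above, enumerator a is 32-bit in one translation unit and 64-bit in the other, which corresponds to Mismatch::Precision in this model.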


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

PING^5: [PATCH] mips: Fix up mips_atomic_assign_expand_fenv [PR94780]

2021-07-29 Thread Xi Ruoyao via Gcc-patches

Ping again.

On Wed, 2021-06-23 at 11:11 +0800, Xi Ruoyao wrote:
> Commit message shamelessly copied from 1777beb6b129 by jakub:
> 
> This function, because it is sometimes called even outside of function
> bodies, uses create_tmp_var_raw rather than create_tmp_var.  But in
> order for that to work, when first referenced, the VAR_DECLs need to
> appear in a TARGET_EXPR so that during gimplification the var gets the
> right DECL_CONTEXT and is added to local decls.
> 
> Bootstrapped & regtested on mips64el-linux-gnu.  Ok for trunk and
> backport to 11, 10, and 9?
> 
> gcc/
> 
> * config/mips/mips.c (mips_atomic_assign_expand_fenv): Use
>   TARGET_EXPR instead of MODIFY_EXPR.
> ---
>  gcc/config/mips/mips.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 8f043399a8e..89d1be6cea6 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -22439,12 +22439,12 @@ mips_atomic_assign_expand_fenv (tree *hold,
> tree *clear, tree *update)
>    tree get_fcsr = mips_builtin_decls[MIPS_GET_FCSR];
>    tree set_fcsr = mips_builtin_decls[MIPS_SET_FCSR];
>    tree get_fcsr_hold_call = build_call_expr (get_fcsr, 0);
> -  tree hold_assign_orig = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> - fcsr_orig_var, get_fcsr_hold_call);
> +  tree hold_assign_orig = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> + fcsr_orig_var, get_fcsr_hold_call,
> NULL, NULL);
>    tree hold_mod_val = build2 (BIT_AND_EXPR, MIPS_ATYPE_USI,
> fcsr_orig_var,
>   build_int_cst (MIPS_ATYPE_USI,
> 0xf003));
> -  tree hold_assign_mod = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -    fcsr_mod_var, hold_mod_val);
> +  tree hold_assign_mod = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +    fcsr_mod_var, hold_mod_val, NULL,
> NULL);
>    tree set_fcsr_hold_call = build_call_expr (set_fcsr, 1,
> fcsr_mod_var);
>    tree hold_all = build2 (COMPOUND_EXPR, MIPS_ATYPE_USI,
>   hold_assign_orig, hold_assign_mod);
> @@ -22454,8 +22454,8 @@ mips_atomic_assign_expand_fenv (tree *hold,
> tree *clear, tree *update)
>    *clear = build_call_expr (set_fcsr, 1, fcsr_mod_var);
>  
>    tree get_fcsr_update_call = build_call_expr (get_fcsr, 0);
> -  *update = build2 (MODIFY_EXPR, MIPS_ATYPE_USI,
> -   exceptions_var, get_fcsr_update_call);
> +  *update = build4 (TARGET_EXPR, MIPS_ATYPE_USI,
> +   exceptions_var, get_fcsr_update_call, NULL, NULL);
>    tree set_fcsr_update_call = build_call_expr (set_fcsr, 1,
> fcsr_orig_var);
>    *update = build2 (COMPOUND_EXPR, void_type_node, *update,
>     set_fcsr_update_call);


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

PING^5: [PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-07-29 Thread Xi Ruoyao via Gcc-patches

Ping again.

On Mon, 2021-06-21 at 21:42 +0800, Xi Ruoyao wrote:
> Middle-end started to emit vec_cmp and vec_cmpu since GCC 11, causing
> ICE on MIPS with MSA enabled.  Add the pattern to prevent it.
> 
> Bootstrapped and regression tested on mips64el-linux-gnu.
> Ok for trunk?
> 
> gcc/
> 
> * config/mips/mips-protos.h (mips_expand_vec_cmp_expr): Declare.
> * config/mips/mips.c (mips_expand_vec_cmp_expr): New function.
> * config/mips/mips-msa.md (vec_cmp): New expander.
>   (vec_cmpu): New expander.
> ---
>  gcc/config/mips/mips-msa.md   | 22 ++
>  gcc/config/mips/mips-protos.h |  1 +
>  gcc/config/mips/mips.c    | 11 +++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
> index 3ecf2bde19f..3a67f25be56 100644
> --- a/gcc/config/mips/mips-msa.md
> +++ b/gcc/config/mips/mips-msa.md
> @@ -435,6 +435,28 @@
>    DONE;
>  })
>  
> +(define_expand "vec_cmp"
> +  [(match_operand: 0 "register_operand")
> +   (match_operator 1 ""
> + [(match_operand:MSA 2 "register_operand")
> +  (match_operand:MSA 3 "register_operand")])]
> +  "ISA_HAS_MSA"
> +{
> +  mips_expand_vec_cmp_expr (operands);
> +  DONE;
> +})
> +
> +(define_expand "vec_cmpu"
> +  [(match_operand: 0 "register_operand")
> +   (match_operator 1 ""
> + [(match_operand:IMSA 2 "register_operand")
> +  (match_operand:IMSA 3 "register_operand")])]
> +  "ISA_HAS_MSA"
> +{
> +  mips_expand_vec_cmp_expr (operands);
> +  DONE;
> +})
> +
>  (define_insn "msa_insert_"
>    [(set (match_operand:MSA 0 "register_operand" "=f,f")
> (vec_merge:MSA
> diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-
> protos.h
> index 2cf4ed50292..a685f7f7dd5 100644
> --- a/gcc/config/mips/mips-protos.h
> +++ b/gcc/config/mips/mips-protos.h
> @@ -385,6 +385,7 @@ extern mulsidi3_gen_fn mips_mulsidi3_gen_fn (enum
> rtx_code);
>  
>  extern void mips_register_frame_header_opt (void);
>  extern void mips_expand_vec_cond_expr (machine_mode, machine_mode,
> rtx *);
> +extern void mips_expand_vec_cmp_expr (rtx *);
>  
>  /* Routines implemented in mips-d.c  */
>  extern void mips_d_target_versions (void);
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 00a8eef96aa..8f043399a8e 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -22321,6 +22321,17 @@ mips_expand_msa_cmp (rtx dest, enum rtx_code
> cond, rtx op0, rtx op1)
>  }
>  }
>  
> +void
> +mips_expand_vec_cmp_expr (rtx *operands)
> +{
> +  rtx cond = operands[1];
> +  rtx op0 = operands[2];
> +  rtx op1 = operands[3];
> +  rtx res = operands[0];
> +
> +  mips_expand_msa_cmp (res, GET_CODE (cond), op0, op1);
> +}
> +
>  /* Expand VEC_COND_EXPR, where:
>     MODE is mode of the result
>     VIMODE equivalent integer mode


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] doc: correct documentation of "call" (et al) operand 2.

2021-07-29 Thread Hans-Peter Nilsson

An old itch being scratched: the documentation lies; it's not "the
number of registers used as operands", unless the target makes a
special arrangement to that effect, and there's nothing in the guts of
gcc setting up or assuming those semantics.

Instead, see calls.c:expand_call, variable next_arg_reg.  Or just
consider the variable name.  The text is somewhat transcribed from the
head comment of emit_call_1 for parameter next_arg_reg.  Most
important is to document the relation to function_arg_info::end_marker()
and the TARGET_FUNCTION_ARG hook.

The "normally" in the head comment, in "normally it is the first
arg-register beyond those used for args in this call, or 0 if all the
arg-registers are used in this call" means "by default", unless the
target tests end_marker_p and does something special, but the port is
free to return whatever it likes when it sees the end-marker.

And, I do mean "whatever it likes" because if the port doesn't
actually mention that operand in the RTX emitted for its "call" or
"call_value" patterns ("usually" define_expands), it can be any
mumbo-jumbo, such as a VOIDmode register, which seems like it happens
for some targets, or NULL, that happens for others.  Returning a
VOIDmode register until recently included MMIX, where it made it into
the emitted RTL, confusing later passes, recently exposed as an ICE.
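
The "by default" semantics described above can be modelled outside of GCC as follows.  Everything here is hypothetical: kFirstArgReg, kMaxArgRegs, ArgInfo, CumulativeArgs, function_arg and operand2_for_call are illustration-only names, kNoReg stands in for a NULL rtx, and register numbers replace real RTL.

```cpp
#include <cassert>

// Hypothetical target parameters, chosen only for the illustration.
const int kFirstArgReg = 16;
const int kMaxArgRegs = 4;
const int kNoReg = -1;   // stand-in for NULL

struct ArgInfo { bool end_marker; };
struct CumulativeArgs { int regs_used; };

// Model of a target hook that does not special-case the end marker:
// it answers with the next free argument register, or kNoReg if none is left.
int function_arg(const CumulativeArgs& cum, const ArgInfo& arg)
{
  // A port is free to return something else when arg.end_marker is set;
  // this sketch treats the end marker like one more argument.
  (void) arg;
  return cum.regs_used < kMaxArgRegs ? kFirstArgReg + cum.regs_used : kNoReg;
}

// Model of what ends up as operand 2 of "call": the hook's answer for the
// end marker, asked after all real arguments have been passed to it.
int operand2_for_call(int nargs)
{
  CumulativeArgs cum{0};
  for (int i = 0; i < nargs; ++i)
    if (function_arg(cum, ArgInfo{false}) != kNoReg)
      ++cum.regs_used;
  return function_arg(cum, ArgInfo{true});
}
```

Under these assumptions a call with two register arguments yields register 18 as operand 2, not the count 2, which is the distinction the documentation change is about.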

Tested by inspecting the info and generated pdf for sanity.

Ok for the doc part?

gcc:
* doc/md.texi (call): Correct information about operand 2.
* config/mmix/mmix.md ("call", "call_value"): Remove fixed FIXMEs.
---
 gcc/config/mmix/mmix.md | 12 +++-
 gcc/doc/md.texi |  8 ++--
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 07681e2ad292..f6d1bc1ad0f7 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7048,8 +7048,12 @@ machines.
 @item @samp{call}
 Subroutine call instruction returning no value.  Operand 0 is the
 function to call; operand 1 is the number of bytes of arguments pushed
-as a @code{const_int}; operand 2 is the number of registers used as
-operands.
+as a @code{const_int}.  Operand 2 is the result of calling the target
+hook @code{TARGET_FUNCTION_ARG} with the second argument @code{arg}
+yielding true for @code{arg.end_marker_p ()}, in a call after all
+parameters have been passed to that hook.  By default this is the first
+register beyond those used for arguments in the call, or @code{NULL} if
+all the argument-registers are used in the call.

 On most machines, operand 2 is not actually stored into the RTL
 pattern.  It is supplied for the sake of some RISC machines which need
diff --git a/gcc/config/mmix/mmix.md b/gcc/config/mmix/mmix.md
index a6d7608ef50c..33e9c60982d6 100644
--- a/gcc/config/mmix/mmix.md
+++ b/gcc/config/mmix/mmix.md
@@ -999,10 +999,8 @@ (define_expand "call"
 = mmix_get_hard_reg_initial_val (Pmode,
 MMIX_INCOMING_RETURN_ADDRESS_REGNUM);

-  /* FIXME: There's a bug in gcc which causes NULL to be passed as
- operand[2] when we get out of registers, which later confuses gcc.
- Work around it by replacing it with const_int 0.  Possibly documentation
- error too.  */
+  /* NULL gets passed as operand[2] when we get out of registers,
+ which later confuses gcc.  Replace it with const_int 0.  */
   if (operands[2] == NULL_RTX)
 operands[2] = const0_rtx;

@@ -1036,14 +1034,10 @@ (define_expand "call_value"
 = mmix_get_hard_reg_initial_val (Pmode,
 MMIX_INCOMING_RETURN_ADDRESS_REGNUM);

-  /* FIXME: See 'call'.  */
+  /* See 'call'.  */
   if (operands[3] == NULL_RTX)
 operands[3] = const0_rtx;

-  /* FIXME: Documentation bug: operands[3] (operands[2] for 'call') is the
- *next* argument register, not the number of arguments in registers.
- (There used to be code here where that mattered.)  */
-
   operands[5] = gen_rtx_REG (DImode, MMIX_INCOMING_RETURN_ADDRESS_REGNUM);
 }")

-- 
2.20.1

Committed: Fix MMIX breakage; ICE in df_ref_record, at df-scan.c:2598

2021-07-29 Thread Hans-Peter Nilsson

This bug made me dive into some of the murkier waters of gcc, namely
the source of operand 2 to the "call" pattern.  It can be pretty
poisonous, but is unused (either directly or later) by most targets.

The target function_arg (and function_incoming_arg) hook can, unless
the end-marker is specially handled, cause a VOIDmode reg RTX to be
generated for the function-arguments end-marker.  expand_call then
passes this on to the target "call" pattern as operand[2] (which is
wrongly documented or wrongly implemented, see comment in mmix.c).
Most targets that do not handle it specially leave it unused, in
that operand 2 does not make it into the insn generated for the
"call" (et al) patterns.  Of course, the MMIX port stands out here:
the RTX makes it into the generated RTX but is then actually unused
and is just a placeholder; see mmix_print_operand 'p'.

Anyway, df-scan inspects the emitted call rtx and horks on the
void-mode RTX (actually: that it represents a zero-sized register
range) from r12-1702.

While I could replace or remove the emitted unused call insn operand,
that would still leave unusable rtx to future users of function_arg
actually looking for next_arg_reg.  Better replace VOIDmode with
DImode here; that's the "natural" mode of MMIX registers.

(As a future improvement, I'll also remove the placeholder argument
and replace the intended user, the print_operand output modifier 'p'
(as in "PUSHJ $%p2,%0"), with some punctuation, perhaps '!'
(as in "PUSHJ $%!,%0").)

I inspected all ports, but other targets emit a special
function_arg_info::end_marker cookie or just don't emit "call"
operand[2] (etc) in the expanded "call" pattern.

gcc:
* config/mmix/mmix.c (mmix_function_arg_1): Avoid
generating a VOIDmode register for e.g the
function_arg_info::end_marker.
---
 gcc/config/mmix/mmix.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/config/mmix/mmix.c b/gcc/config/mmix/mmix.c
index 40bfb4314ddd..db7af7b75b6d 100644
--- a/gcc/config/mmix/mmix.c
+++ b/gcc/config/mmix/mmix.c
@@ -667,10 +667,17 @@ mmix_function_arg_1 (const cumulative_args_t argsp_v,
 {
   CUMULATIVE_ARGS *argsp = get_cumulative_args (argsp_v);

+  /* The mode of the argument will be VOIDmode for the "end_marker".  Make sure
+ we don't ever generate a VOIDmode register; later passes will barf on that.
+ We may want to use the register number, so return something nominally
+ useful.  Thus, for VOIDmode, use DImode, being the natural mode for the
+ register.  */
+  machine_mode mode = arg.mode == VOIDmode ? DImode : arg.mode;
+
   /* Last-argument marker.  */
   if (arg.end_marker_p ())
 return (argsp->regs < MMIX_MAX_ARGS_IN_REGS)
-  ? gen_rtx_REG (arg.mode,
+  ? gen_rtx_REG (mode,
 (incoming
  ? MMIX_FIRST_INCOMING_ARG_REGNUM
  : MMIX_FIRST_ARG_REGNUM) + argsp->regs)
@@ -678,10 +685,10 @@ mmix_function_arg_1 (const cumulative_args_t argsp_v,

   return (argsp->regs < MMIX_MAX_ARGS_IN_REGS
  && !targetm.calls.must_pass_in_stack (arg)
- && (GET_MODE_BITSIZE (arg.mode) <= 64
+ && (GET_MODE_BITSIZE (mode) <= 64
  || argsp->lib
  || TARGET_LIBFUNC))
-? gen_rtx_REG (arg.mode,
+? gen_rtx_REG (mode,
   (incoming
? MMIX_FIRST_INCOMING_ARG_REGNUM
: MMIX_FIRST_ARG_REGNUM)
-- 
2.20.1

Re: [PATCH] fix breakage from "libstdc++: Remove unnecessary uses of "

2021-07-29 Thread Jonathan Wakely via Gcc-patches

On Thu, 29 Jul 2021, 23:44 Hans-Peter Nilsson,  wrote:

> Commit r12-2534 was incomplete and (by inspection derived from
> an MMIX build) failing for targets without an insn for
> compare_and_swap for pointer-size objects, IOW for targets for
> which "ATOMIC_POINTER_LOCK_FREE != 2" is true:
>
> x/gcc/libstdc++-v3/src/c++17/memory_resource.cc: In member function
>  'std::pmr::memory_resource*
>
>  std::pmr::{anonymous}::atomic_mem_res::exchange(std::pmr::memory_resource*)':
> x/gcc/libstdc++-v3/src/c++17/memory_resource.cc:140:21: error:
>  'exchange' is not a member of 'std'
>   140 | return std::exchange(val, r);
>   | ^~~~
> make[5]: *** [Makefile:577: memory_resource.lo] Error 1
> make[5]: Leaving directory
>  '/home/hp/tmp/newmmix-r12-2579-p3/gccobj/mmix/libstdc++-v3/src/c++17'
>
> This fix was derived from edits elsewhere in that patch.
>
> Tested mmix-knuth-mmixware, restoring build (together with
> target-reviving patches as MMIX is currently and at that commit
> broken for target-specific reasons).
>
> Ok to commit?
>

Yes, thanks.

(We could also just include <utility> to get the declaration of
std::exchange, since this isn't a header so we can include whatever we
want, but this is fine.)
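
For readers unfamiliar with the pattern, here is a minimal sketch of a mutex-guarded exchange written against the public std::exchange from <utility>, roughly the alternative mentioned above. The resource and locked_resource_ptr types are hypothetical stand-ins for illustration; this is not the libstdc++ atomic_mem_res code.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical stand-in for std::pmr::memory_resource.
struct resource { int id; };

// A pointer slot whose updates are serialized by a mutex, for targets
// where a pointer-sized atomic exchange is assumed to be unavailable.
class locked_resource_ptr {
  std::mutex mx;
  resource* val = nullptr;

public:
  // Store r and return the previously stored pointer.
  resource* exchange(resource* r) {
    std::lock_guard<std::mutex> lock(mx);
    return std::exchange(val, r);
  }

  // Read the current pointer under the same lock.
  resource* load() {
    std::lock_guard<std::mutex> lock(mx);
    return val;
  }
};

// Small self-check showing the "store new, return old" behaviour.
inline void exchange_demo() {
  resource a{1}, b{2};
  locked_resource_ptr p;
  assert(p.exchange(&a) == nullptr);  // slot starts empty
  assert(p.exchange(&b) == &a);       // previous value comes back
  assert(p.load() == &b);             // latest value is stored
}
```

Inside the library, the patch uses the internal std::__exchange instead, which avoids pulling the whole of <utility> into the translation unit.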



> libstdc++-v3/:
> * src/c++17/memory_resource.cc: Use __exchange instead
> of std::exchange.
> ---
>  libstdc++-v3/src/c++17/memory_resource.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
> index 5dfc29fc0ec8..1ba79903f870 100644
> --- a/libstdc++-v3/src/c++17/memory_resource.cc
> +++ b/libstdc++-v3/src/c++17/memory_resource.cc
> @@ -29,7 +29,7 @@
>  #include 
>  #if ATOMIC_POINTER_LOCK_FREE != 2
>  # include// std::mutex, std::lock_guard
> -# include // std::exchange
> +# include // std::__exchange
>  #endif
>
>  namespace std _GLIBCXX_VISIBILITY(default)
> @@ -117,7 +117,7 @@ namespace pmr
>memory_resource* exchange(memory_resource* r)
>{
> lock_guard lock(mx);
> -   return std::exchange(val, r);
> +   return std::__exchange(val, r);
>}
>  };
>  #else
> @@ -137,7 +137,7 @@ namespace pmr
>
>memory_resource* exchange(memory_resource* r)
>{
> -   return std::exchange(val, r);
> +   return std::__exchange(val, r);
>}
>  };
>  #endif // ATOMIC_POINTER_LOCK_FREE == 2
> --
> 2.20.1
>
>

[PATCH] fix breakage from "libstdc++: Remove unnecessary uses of "

2021-07-29 Thread Hans-Peter Nilsson

Commit r12-2534 was incomplete and (by inspection derived from
an MMIX build) failing for targets without an insn for
compare_and_swap for pointer-size objects, IOW for targets for
which "ATOMIC_POINTER_LOCK_FREE != 2" is true:

x/gcc/libstdc++-v3/src/c++17/memory_resource.cc: In member function
 'std::pmr::memory_resource*
 std::pmr::{anonymous}::atomic_mem_res::exchange(std::pmr::memory_resource*)':
x/gcc/libstdc++-v3/src/c++17/memory_resource.cc:140:21: error:
 'exchange' is not a member of 'std'
  140 | return std::exchange(val, r);
  | ^~~~
make[5]: *** [Makefile:577: memory_resource.lo] Error 1
make[5]: Leaving directory
 '/home/hp/tmp/newmmix-r12-2579-p3/gccobj/mmix/libstdc++-v3/src/c++17'

This fix was derived from edits elsewhere in that patch.

Tested mmix-knuth-mmixware, restoring build (together with
target-reviving patches as MMIX is currently and at that commit
broken for target-specific reasons).

Ok to commit?

libstdc++-v3/:
* src/c++17/memory_resource.cc: Use __exchange instead
of std::exchange.
---
 libstdc++-v3/src/c++17/memory_resource.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index 5dfc29fc0ec8..1ba79903f870 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -29,7 +29,7 @@
 #include 
 #if ATOMIC_POINTER_LOCK_FREE != 2
 # include// std::mutex, std::lock_guard
-# include // std::exchange
+# include // std::__exchange
 #endif

 namespace std _GLIBCXX_VISIBILITY(default)
@@ -117,7 +117,7 @@ namespace pmr
   memory_resource* exchange(memory_resource* r)
   {
lock_guard lock(mx);
-   return std::exchange(val, r);
+   return std::__exchange(val, r);
   }
 };
 #else
@@ -137,7 +137,7 @@ namespace pmr

   memory_resource* exchange(memory_resource* r)
   {
-   return std::exchange(val, r);
+   return std::__exchange(val, r);
   }
 };
 #endif // ATOMIC_POINTER_LOCK_FREE == 2
-- 
2.20.1

New German PO file for 'gcc' (version 11.2.0)

2021-07-29 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

https://translationproject.org/latest/gcc/de.po

(This file, 'gcc-11.2.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[PATCH] Add testcases that got lost when tree-ssa was merged

2021-07-29 Thread apinski--- via Gcc-patches

From: Andrew Pinski 

While looking at some older PRs (PR 16016 in this case),
I noticed that some of the testcases were removed when
the tree-ssa branch was merged. This patch adds them back in.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/testsuite/ChangeLog:

PR testsuite/101517
* g++.dg/warn/Wunused-18.C: New test.
* gcc.c-torture/compile/20030405-2.c: New test.
* gcc.c-torture/compile/20040304-2.c: New test.
* gcc.dg/20030612-2.c: New test.
---
 gcc/testsuite/g++.dg/warn/Wunused-18.C| 13 +
 .../gcc.c-torture/compile/20030405-2.c| 58 +++
 .../gcc.c-torture/compile/20040304-2.c| 45 ++
 gcc/testsuite/gcc.dg/20030612-2.c | 20 +++
 4 files changed, 136 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wunused-18.C
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/20030405-2.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/20040304-2.c
 create mode 100644 gcc/testsuite/gcc.dg/20030612-2.c

diff --git a/gcc/testsuite/g++.dg/warn/Wunused-18.C b/gcc/testsuite/g++.dg/warn/Wunused-18.C
new file mode 100644
index 000..06d1a0516bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wunused-18.C
@@ -0,0 +1,13 @@
+// PR c++/14199
+// { dg-options "-W -Wall -Wunused" }
+
+struct X { 
+static void foo (); 
+}; 
+ 
+template <typename T>
+void foo (const T &t) {
+  t.foo(); 
+}
+
+template void foo (const X &); 
diff --git a/gcc/testsuite/gcc.c-torture/compile/20030405-2.c b/gcc/testsuite/gcc.c-torture/compile/20030405-2.c
new file mode 100644
index 000..2e61f1fa3ff
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/20030405-2.c
@@ -0,0 +1,58 @@
+/* PR optimization/10024 */
+extern int *allegro_errno;
+typedef long fixed;
+extern inline int
+fixfloor (fixed x)
+{
+  if (x >= 0)
+return (x >> 16);
+  else
+return ~((~x) >> 16);
+}
+extern inline int
+fixtoi (fixed x)
+{
+  return fixfloor (x) + ((x & 0x8000) >> 15);
+}
+extern inline fixed
+ftofix (double x)
+{
+  if (x > 32767.0)
+{
+  *allegro_errno = 34;
+  return 0x7FFF;
+}
+  if (x < -32767.0)
+{
+  *allegro_errno = 34;
+  return -0x7FFF;
+}
+  return (long) (x * 65536.0 + (x < 0 ? -0.5 : 0.5));
+}
+extern inline double
+fixtof (fixed x)
+{
+  return (double) x / 65536.0;
+}
+extern inline fixed
+fixdiv (fixed x, fixed y)
+{
+  if (y == 0)
+{
+  *allegro_errno = 34;
+  return (x < 0) ? -0x7FFF : 0x7FFF;
+}
+  else
+return ftofix (fixtof (x) / fixtof (y));
+}
+extern inline fixed
+itofix (int x)
+{
+  return x << 16;
+}
+
+int
+foo (int n)
+{
+  return fixtoi (fixdiv (itofix (512), itofix (n)));
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/20040304-2.c b/gcc/testsuite/gcc.c-torture/compile/20040304-2.c
new file mode 100644
index 000..146d42f23d6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/20040304-2.c
@@ -0,0 +1,45 @@
+/* PR optimization/14235 */
+/* Origin:  */
+
+typedef signed char int8_t;
+typedef short int16_t;
+typedef int int32_t;
+typedef unsigned long long uint64_t;
+
+static const uint64_t LOW_BYTE_MASK = 0x00ffULL;
+static const uint64_t HIGH_BYTE_MASK = 0xff00ULL;
+static const uint64_t WORD_MASK = 0xffffULL;
+static const uint64_t DWORD_MASK = 0xffffffffULL;
+
+extern uint64_t *srca_mask;
+extern int *assert_thrown;
+
+void foo()
+{
+  uint64_t tempA = 0; /* actually a bunch of code to set A */ 
+  uint64_t tempB = 0; /* actually a bunch of code to set B */ 
+
+  /* cast A to right size */
+  tempA = (((*srca_mask == LOW_BYTE_MASK) || 
+(*srca_mask == HIGH_BYTE_MASK)) ?
+   ((int8_t)tempA) : 
+   ((*srca_mask == WORD_MASK) ? 
+((int16_t)tempA) : 
+((*srca_mask == DWORD_MASK) ? 
+ ((int32_t)tempA) : 
+ tempA)));
+
+  /* cast B to right size */
+  tempB = (((*srca_mask == LOW_BYTE_MASK) || 
+(*srca_mask == HIGH_BYTE_MASK)) ? 
+   ((int8_t)tempB) : 
+   ((*srca_mask == WORD_MASK) ? 
+((int16_t)tempB) : 
+((*srca_mask == DWORD_MASK) ? 
+ ((int32_t)tempB) : 
+ tempB))); 
+
+  if ((int) tempA > (int) tempB) { 
+*assert_thrown = 1;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/20030612-2.c b/gcc/testsuite/gcc.dg/20030612-2.c
new file mode 100644
index 000..f9f212caba1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/20030612-2.c
@@ -0,0 +1,20 @@
+/* Derived from PR middle-end/168.  */
+
+/* { dg-do compile } */
+/* { dg-options "-W" } */
+
+extern void foo ();
+
+unsigned char uc;
+unsigned short int usi;
+unsigned int ui;
+
+
+void bar()
+{
+  if (uc + usi >= ui)  /* { dg-bogus "between signed and unsigned" } */
+foo ();
+  if (uc * usi >= ui)  /* { dg-bogus "between signed and unsigned" } */
+foo ();
+}
+

Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16

2021-07-29 Thread Paul A. Clarke via Gcc-patches

On Tue, Jul 27, 2021 at 10:29:13PM -0400, David Edelsohn via Gcc-patches wrote:
> > Add a naive implementation of the subject x86 intrinsic to
> > ease porting.
> >
> > 2021-07-15  Paul A. Clarke  
> >
> > gcc
> > * config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
> 
> Segher already approved this with the changes requested.

Segher said:
| This does not compute the index correctly for big endian (it needs to
| walk from right to left for that).  The construction of the return value
| looks wrong as well.
| 
| Okay for trunk with that fixed.  Thanks!

I responded:
| I'm not seeing the issue here. The values are numbered by element order,
| and the results are in the "first" (minimum value) and "second" (index of
| first encountered minimum value in element order) elements of the result.

I did not get a response, nor did I change any code. It feels like a stretch
to equate the above exchange to "approved", so I'll continue to wait for
explicit approval.

PC

Re: [patch][version 7]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-29 Thread Kees Cook via Gcc-patches

On Thu, Jul 29, 2021 at 08:02:43PM +, Qing Zhao wrote:
> This is the 7th version of the patch for the new security feature for GCC.
> I have tested it with bootstrap on both x86 and aarch64, regression testing
> on both x86 and aarch64.
> Also compile CPU2017 (running is ongoing), without any issue.
> 
> Please take a look and let me know any issue.

All my kernel tests pass; this looks great! Thank you. :)

-- 
Kees Cook

Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-07-29 Thread Joseph Myers

On Thu, 29 Jul 2021, Hongtao Liu via Gcc-patches wrote:

> > Rather than using FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 whenever TARGET_SSE2
> > (i.e. whenever the type is available), it might make more sense to follow
> > AArch64 and use it only when the hardware instructions are available.  In
> > any case, it seems peculiar to use a different threshold in the "fast"
> We want to provide some debuggability to the software emulation.
> When there's inconsistency between software emulation and hardware
> instructions, users can still debug on non-avx512fp16 processor w/
> software emulation and extra option -fexcess-precision=standard,

But that's not the purpose of -fexcess-precision=standard.  The purpose is 
only: when the default case is non-conforming, make it conforming instead.  
The default case is non-conforming only when the back end has insn 
patterns pretending to be able to do arithmetic on formats it can't 
actually do arithmetic on - that is, x87 arithmetic where the insn 
patterns pretend to support SFmode and DFmode arithmetic but actually use 
XFmode (and the similar issue for older m68k, but that back end doesn't 
actually have the required support for -fexcess-precision=standard).

So -fexcess-precision=standard should not do anything different from 
-fexcess-precision=fast regarding _Float16.

If you want to be able to enable or disable excess precision for _Float16 
separately from the underlying hardware support, that might provide a case 
for supporting extra options, say -fexcess-precision=16 that means follow 
the semantics of FLT_EVAL_METHOD == 16 (and with an error for that option 
on architectures where the given FLT_EVAL_METHOD value isn't supported).  
But that shouldn't be done by making -fexcess-precision=standard do 
something outside its scope.
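
For orientation, the FLT_EVAL_METHOD values this thread refers to can be summarized in a small sketch. The enum and function names are hypothetical; the meanings of 0, 1 and 2 follow the C standard, and 16 is the value discussed above for _Float16. Everything else is lumped together as "other", which is a simplification.

```cpp
#include <cfloat>

// Hypothetical classification of FLT_EVAL_METHOD values, for illustration only.
enum class eval_kind {
  own_type,        // 0: operations evaluated in the range/precision of their type
  as_double,       // 1: float and double operations evaluated as double
  as_long_double,  // 2: all operations evaluated as long double (the x87 case)
  float16,         // 16: _Float16 operations evaluated as _Float16
  other            // -1 (indeterminable) or any other implementation-defined value
};

constexpr eval_kind classify_flt_eval_method(int method) {
  return method == 0    ? eval_kind::own_type
         : method == 1  ? eval_kind::as_double
         : method == 2  ? eval_kind::as_long_double
         : method == 16 ? eval_kind::float16
                        : eval_kind::other;
}

// Classification for the translation unit being compiled.
constexpr eval_kind current_eval_kind() {
  return classify_flt_eval_method(FLT_EVAL_METHOD);
}

static_assert(classify_flt_eval_method(2) == eval_kind::as_long_double,
              "x87-style excess precision");
```

Compiling the same file with -mfpmath=387 and with -mfpmath=sse on x86 is one way to see current_eval_kind() change, assuming the compiler exposes the difference through FLT_EVAL_METHOD.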

> Also since TARGET_C_EXCESS_PRECISION is not related to type, for
> testcase w/o _Float16 and is supposed to be run on x86 fpu, if gcc
> is built w/ --with-arch=sapphirerapid, it will regress those
> testcases, i.e. gcc.target/i386/excess-precision-*.c, that's why we
> can't follow AArch64.

Those tests use -mfpmath=387.

In the -mfpmath=387 case, it seems reasonable to keep the rule of 
promoting to long double, regardless of hardware _Float16 support (-msse2 
must also be in effect for the type to be supported at all by the back 
end).  It's the -mfpmath=sse case for which I think following AArch64 is 
appropriate.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.

2021-07-29 Thread Joseph Myers

On Tue, 27 Jul 2021, Hongtao Liu via Gcc-patches wrote:

> modified   gcc/emit-rtl.c
> @@ -928,6 +928,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
>   fix them all.  */
>if (omode == word_mode)
>  ;
> +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> + here. Though extract_bit_field is the culprit here, not the backends.  */
> +  else if (imode == HFmode && omode == SImode)
> +;

You can't reference HFmode by name at all in any target-independent file, 
outside of a #ifdef HAVE_HFmode conditional.  It's only defined in 
architecture-specific <arch>-modes.def for those architectures supporting 
that mode, so you'll have an undefined identifier building for other 
targets if you reference it in a generic source file.  You have to 
condition things on the logical properties of the mode that are relevant, 
not on the target-specific name (or use a HAVE_HFmode conditional, but 
basing things on logical properties is clearly better where possible).
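
To make the suggestion concrete, here is a stand-alone sketch of a check phrased in terms of mode properties rather than a target-specific mode name. The mode_desc type and the function name are hypothetical; in GCC itself the equivalent test would be written with the existing mode-class and size queries, and this sketch does not claim to be the form the patch should take.

```cpp
// Stand-alone model, not GCC source.
enum class mode_class { integer, floating };

// Hypothetical description of a machine mode by its logical properties.
struct mode_desc {
  mode_class cls;
  unsigned bits;
};

// Property-based counterpart of "imode == HFmode && omode == SImode":
// a 16-bit floating-point inner mode viewed through a 32-bit integer outer mode.
// Nothing here names a mode that only exists on some targets.
constexpr bool half_float_in_32bit_int(mode_desc inner, mode_desc outer) {
  return inner.cls == mode_class::floating && inner.bits == 16
         && outer.cls == mode_class::integer && outer.bits == 32;
}

static_assert(half_float_in_32bit_int(mode_desc{mode_class::floating, 16},
                                      mode_desc{mode_class::integer, 32}),
              "the HF-in-SI case is accepted");
```

Because the condition only mentions class and width, it compiles on every target, whether or not that target defines a 16-bit float mode.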

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]

2021-07-29 Thread Jason Merrill via Gcc-patches


On 7/29/21 3:50 AM, Jakub Jelinek wrote:

Hi!

The following patch attempts to implement the compiler helpers for
libstdc++ std::is_pointer_interconvertible_base_of trait and
std::is_pointer_interconvertible_with_class template function.

For the former __is_pointer_interconvertible_base_of trait that checks first
whether base and derived aren't non-union class types that are the same
ignoring toplevel cv-qualifiers, otherwise if derived is unambiguously
derived from base without cv-qualifiers, derived being a complete type,
and if so, my limited understanding of any derived object being
pointer-interconvertible with base subobject IMHO implies (because one can't
inherit from unions or unions can't inherit) that we check if derived is
standard layout type and we walk bases of derived
recursively, stopping on a class that has any non-static data members and
check if any of the bases is base.  On class with non-static data members
no bases are compared already.

The latter is implemented using a FE
__builtin_is_pointer_interconvertible_with_class, but because on the library
side it will be a template function, the builtin takes ... arguments and
only during folding verifies it has a single argument with pointer to member
type.  The initial errors IMHO can only happen if one uses the builtin
incorrectly by hand, the template function should ensure that it has
exactly a single argument that has pointer to member type.
Otherwise, again with my limited understanding of what
the template function should do and pointer-interconvertibility,
it folds to false for pointer-to-member-function, errors if
basetype of the OFFSET_TYPE is incomplete, folds to false
for non-std-layout basetype, then finds the first non-static
data member in the basetype or its bases (by ignoring
DECL_FIELD_IS_BASE FIELD_DECLs that are empty, recursing into
DECL_FIELD_IS_BASE FIELD_DECLs type that are non-empty (I think
std layout should ensure there is at most one), for unions
checks if membertype is same type as any of the union FIELD_DECLs,
for non-unions the first other FIELD_DECL only, and for anonymous
aggregates similarly (union vs. non-union) but recurses into the
anon aggr types.  If membertype doesn't match the type of
first non-static data member (or for unions any of the members),
then the builtin folds to false, otherwise the builtin folds to
a check whether the argument is equal to OFFSET_TYPE of 0 or not,
either at compile time if it is constant (e.g. for constexpr
folding) or at runtime otherwise.
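
As a rough illustration of the properties the builtin relies on, the following sketch uses only long-standing library facilities (std::is_standard_layout and offsetof) rather than the new traits, so it does not depend on the patch. The struct names are hypothetical and follow the shape of the standard's example; D is an extra assumption added here for contrast.

```cpp
#include <cstddef>
#include <type_traits>

struct A { int a; };
struct B { int b; };
struct C : A, B { };  // data members in two bases: not standard-layout
struct D : A { };     // all data members live in A: standard-layout

static_assert(std::is_standard_layout<A>::value, "A is standard-layout");
static_assert(std::is_standard_layout<D>::value, "D is standard-layout");
static_assert(!std::is_standard_layout<C>::value, "C is not standard-layout");

// In a standard-layout class the first non-static data member sits at offset 0,
// which is the "OFFSET_TYPE of 0" comparison the description mentions.
static_assert(offsetof(A, a) == 0, "first member is at offset 0");

// Runtime view of the same fact: a D object and its inherited first member
// share an address.
inline bool first_member_shares_address(const D &d) {
  return static_cast<const void *>(&d) == static_cast<const void *>(&d.a);
}
```

The new traits answer the same question at compile time without the caller having to reason about layout by hand.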

As I wrote in the PR, I've tried my testcases with MSVC on godbolt
that claims to implement it, and https://godbolt.org/z/3PnjM33vM
for the first testcase shows it disagrees with my expectations on
static_assert (std::is_pointer_interconvertible_base_of_v);
static_assert (std::is_pointer_interconvertible_base_of_v);
static_assert (!std::is_pointer_interconvertible_base_of_v);
static_assert (!std::is_pointer_interconvertible_base_of_v);
static_assert (std::is_pointer_interconvertible_base_of_v);
Is that a bug in my patch or is MSVC buggy on these (or mix thereof)?
https://godbolt.org/z/aYeYnne9d
shows the second testcase, here it differs on:
static_assert (std::is_pointer_interconvertible_with_class (::b));
static_assert (std::is_pointer_interconvertible_with_class (::g));
static_assert (std::is_pointer_interconvertible_with_class (::b));
static_assert (std::is_pointer_interconvertible_with_class (::a));
static_assert (std::is_pointer_interconvertible_with_class (::b));
Again, my bug, MSVC bug, mix thereof?

Oh, and there is another thing, the standard has an example:
struct A { int a; };                // a standard-layout class
struct B { int b; };                // a standard-layout class
struct C: public A, public B { };   // not a standard-layout class

static_assert( is_pointer_interconvertible_with_class( &C::b ) );
   // Succeeds because, despite its appearance, &C::b has type
   // “pointer to member of B of type int”.
static_assert( is_pointer_interconvertible_with_class<C>( &C::b ) );
   // Forces the use of class C, and fails.
It seems to work as written with MSVC (second assertion fails),
but fails with GCC with the patch:
/tmp/1.C:22:57: error: no matching function for call to 
‘is_pointer_interconvertible_with_class(int B::*)’
22 | static_assert( is_pointer_interconvertible_with_class( ::b ) );
   |~^
/tmp/1.C:8:1: note: candidate: ‘template constexpr bool 
std::is_pointer_interconvertible_with_class(M S::*)’
 8 | is_pointer_interconvertible_with_class (M S::*m) noexcept
   | ^~
/tmp/1.C:8:1: note:   template argument deduction/substitution failed:
/tmp/1.C:22:57: note:   mismatched types ‘C’ and ‘B’
22 | static_assert( is_pointer_interconvertible_with_class( ::b ) );
   |~^
the second int argument isn't deduced.

This boils down

Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-07-29 Thread Christoph Müllner via Gcc-patches

On Thu, Jul 29, 2021 at 8:54 PM Palmer Dabbelt  wrote:
>
> On Tue, 27 Jul 2021 02:32:12 PDT (-0700), cmuell...@gcc.gnu.org wrote:
> > Ok, so if I understand correctly Palmer and Andrew prefer
> > overlap_op_by_pieces to be controlled
> > by its own field in the riscv_tune_param struct and not by the field
> > slow_unaligned_access in this struct
> > (i.e. slow_unaligned_access==false is not enough to imply
> > overlap_op_by_pieces==true).
>
> I guess, but I'm not really worried about this at that level of detail
> right now.  It's not like the tune structures form any sort of external
> interface we have to keep stable, we can do whatever we want with those
> fields so I'd just aim for encoding the desired behavior as simply as
> possible rather than trying to build something extensible.
>
> There are really two questions we need to answer: is this code actually
> faster for the C906, and is this what the average users wants under -Os.

I never mentioned -Os.
My main goal is code compiled for -O2, -O3 or even -Ofast.
And I want to execute code as fast as possible.

Loading hot data from cache is faster when done by a single
load-word instruction than by 4 load-byte instructions.
Fewer instructions mean less pressure on the instruction cache.
Fewer instructions mean less work for the CPU pipeline.
Architectures that have no penalty for unaligned accesses
therefore observe a performance benefit.
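
To show what "overlapping" means here, the following sketch copies 7 bytes with two 4-byte moves whose ranges share one byte, next to a plain byte-wise copy. The function names are hypothetical and the code is portable C++, not what the RISC-V back end emits; whether the overlapping form is actually faster on a given core is exactly the open question in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reference: seven single-byte moves.
inline void copy7_bytewise(unsigned char *dst, const unsigned char *src) {
  for (int i = 0; i < 7; ++i)
    dst[i] = src[i];
}

// Two 4-byte moves covering bytes 0..3 and 3..6; byte 3 is copied twice.
// The second access is unaligned whenever the first one is aligned.
// Assumes dst and src do not overlap each other.
inline void copy7_overlapping(unsigned char *dst, const unsigned char *src) {
  std::uint32_t lo, hi;
  std::memcpy(&lo, src, 4);      // load bytes 0..3
  std::memcpy(&hi, src + 3, 4);  // load bytes 3..6
  std::memcpy(dst, &lo, 4);      // store bytes 0..3
  std::memcpy(dst + 3, &hi, 4);  // store bytes 3..6
}

// Self-check: both strategies produce the same result.
inline void check_overlapping_copy() {
  const unsigned char src[7] = {1, 2, 3, 4, 5, 6, 7};
  unsigned char a[7] = {0};
  unsigned char b[7] = {0};
  copy7_bytewise(a, src);
  copy7_overlapping(b, src);
  assert(std::memcmp(a, b, 7) == 0);
}
```

The overlapping version needs four memory operations instead of fourteen, at the cost of one unaligned load and one unaligned store.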

What I understand from Andrew's email is that it is not that simple
and an implementation might have a penalty for overlapping accesses
that is high enough to make them worth avoiding. I don't have the
details for the C906, so I can't say if that's the case.

> That first one is pretty easy: just running those simple code sequences
> under a sweep of page offsets should be sufficient to determine if this
> is always faster (in which case it's an easy yes), if it's always slower
> (an easy no), or if there's some slow cases like page/cache line
> crossing (in which case we'd need to think a bit).
>
> The second one is a bit trickier.  In the past we'd said this sort of
> "actively misalign accesses to generate smaller code" approach
> isn't suitable for -Os (as most machines still have very slow unaligned
> accesses) but is suitable for -Oz (don't remember if that ever ended up
> in GCC, though).  That still seems like a reasonable decision, but if it
> turns out that implementations with fast unaligned accesses become the
> norm then it'd probably be worth revisiting it.  Not sure exactly how to
> determine that tipping point, but I think we're a long way away from it
> right now.
>
> IMO it's really just premature to try and design an encoding of the
> tuning parameters until we have an idea of what they are, as we'll just
> end up devolving down the path of trying to encode all possible hardware
> and that's generally a huge waste of time.  Since there's no ABI here we
> can refactor this however we want as new tunings show up.

I guess you mean that there needs to be a clear benefit for a supported
machine in GCC. Either obviously (see below), by measurement results,
or by decision of the machine's maintainer (especially if the decision
is a trade-off).

>
> > I don't have access to pipeline details that give proof that there are cases
> > where this patch causes a performance penalty.
> >
> > So, I leave this here as a summary for someone who has enough information
> > and interest to move this forward:
> > * the original patch should be sufficient, but does not have tests:
> >   https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575791.html
> > * the tests can be taken from this patch:
> >   https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575864.html
> >   Note, that there is a duplicated "sw" in builtins-overlap-6.c, which
> > should be a "sd".
> >
> > Thanks for the feedback!
>
> Cool.  Looks like the C906 is starting to show up in the real world, so
> we should be able to find someone who has access to one and cares enough
> to at least run some simple benchmarks of these code sequences.  IMO
> that's a pretty low interest bar, so I don't see any harm in waiting --
> when the hardware is common then I'm sure someone will care enough to
> give this a shot, and until then it's not really impacting anyone either
> way.
>
> The -Os thing is a bigger discussion, and while I'm happy to have it I
> don't really think we're even close to these being common enough yet.  I
> saw your memmove patch and think the same rationale might apply there,
> but I haven't looked closely and won't have time to for a bit as I've
> got to get around to the other projects.

The cpymemsi patch is also targeting -O2 or higher for fast code execution.
And it is one of the cases where there is an obvious performance benefit
for all machines that have slow_unaligned_access==false.

At the moment the cpymemsi expansion for RISC-V is implemented as if
there is no machine with slow_unaligned_access==false.
And in fact there is a

Re: [PATCH v2] c++: Accept C++11 attribute-definition [PR101582]

2021-07-29 Thread Jason Merrill via Gcc-patches


On 7/29/21 5:28 AM, Jakub Jelinek wrote:

On Wed, Jul 28, 2021 at 04:32:08PM -0400, Jason Merrill wrote:

As the following testcase shows, we don't parse properly
C++11 attribute-declaration:
https://eel.is/c++draft/dcl.dcl#nt:attribute-declaration

cp_parser_toplevel_declaration just handles empty-declaration parsing
(with diagnostics for C++98)


This seems to be a bug: from the comments, cp_parser_toplevel_declaration is
intended to only handle #pragma parsing, everything else should be in
cp_parser_declaration.

As a result, we wrongly reject

extern "C" ;

So please move empty-declaration and attribute-declaration handling into
cp_parser_declaration.


So like this?
It means we allow for modules
export ;
or
export [[]];
where we previously rejected those, which is allowed by the grammar and
invalid because of
https://eel.is/c++draft/module.interface#3
but we allowed already before
export {}
which suffers from the same problem - the export-declaration doesn't declare
at least one name.  So I think there just should be something that tracks if
the module exported at least one name and if not, diagnose it at the end of
cp_parser_module_export (and adjust the new modules testcase).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-29  Jakub Jelinek  

PR c++/101582
* parser.c (cp_parser_skip_std_attribute_spec_seq): Add a forward
declaration.
(cp_parser_declaration): Parse empty-declaration and
attribute-declaration.
(cp_parser_toplevel_declaration): Don't parse empty-declaration here.

* g++.dg/cpp0x/gen-attrs-45.C: Expect a warning about ignored
attributes instead of error.
* g++.dg/cpp0x/gen-attrs-75.C: New test.
* g++.dg/modules/pr101582-1.C: New test.

--- gcc/cp/parser.c.jj  2021-07-28 23:06:38.658443554 +0200
+++ gcc/cp/parser.c 2021-07-28 23:12:10.955941089 +0200
@@ -2507,6 +2507,8 @@ static tree cp_parser_std_attribute_spec
(cp_parser *);
  static tree cp_parser_std_attribute_spec_seq
(cp_parser *);
+static size_t cp_parser_skip_std_attribute_spec_seq
+  (cp_parser *, size_t);
  static size_t cp_parser_skip_attributes_opt
(cp_parser *, size_t);
  static bool cp_parser_extension_opt
@@ -14410,6 +14412,31 @@ cp_parser_declaration (cp_parser* parser
cp_token *token2 = (token1->type == CPP_EOF
  ? token1 : cp_lexer_peek_nth_token (parser->lexer, 2));
  
+  if (token1->type == CPP_SEMICOLON)

+{
+  cp_lexer_consume_token (parser->lexer);
+  /* A declaration consisting of a single semicolon is invalid
+   * before C++11.  Allow it unless we're being pedantic.  */
+  if (cxx_dialect < cxx11)
+   pedwarn (input_location, OPT_Wpedantic, "extra %<;%>");
+  return;
+}
+  else if (cp_lexer_nth_token_is (parser->lexer,
+ cp_parser_skip_std_attribute_spec_seq (parser,
+1),
+ CPP_SEMICOLON))
+{
+  location_t attrs_loc = token1->location;
+  tree std_attrs = cp_parser_std_attribute_spec_seq (parser);
+  if (std_attrs != NULL_TREE)
+   warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
+   OPT_Wattributes,
+   "attributes in attribute declaration are ignored");


Let's not mention the obscure attribute-declaration grammar nonterminal, 
"attribute ignored" seems sufficient.



+  if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+   cp_lexer_consume_token (parser->lexer);
+  return;
+}
+
/* Get the high-water mark for the DECLARATOR_OBSTACK.  */
void *p = obstack_alloc (_obstack, 0);
  
@@ -14560,14 +14587,6 @@ cp_parser_toplevel_declaration (cp_parse

 cp_parser_declaration.  (A #pragma at block scope is
 handled in cp_parser_statement.)  */
  cp_parser_pragma (parser, pragma_external, NULL);
-  else if (token->type == CPP_SEMICOLON)
-{
-  cp_lexer_consume_token (parser->lexer);
-  /* A declaration consisting of a single semicolon is invalid
-   * before C++11.  Allow it unless we're being pedantic.  */
-  if (cxx_dialect < cxx11)
-   pedwarn (input_location, OPT_Wpedantic, "extra %<;%>");
-}
else
  /* Parse the declaration itself.  */
  cp_parser_declaration (parser, NULL_TREE);
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-45.C.jj  2021-07-26 09:13:08.504121494 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-45.C   2021-07-28 23:07:05.095085351 +0200
@@ -1,4 +1,4 @@
  // PR c++/52906
  // { dg-do compile { target c++11 } }
  
-[[gnu::deprecated]]; // { dg-error "does not declare anything" }

+[[gnu::deprecated]]; // { dg-warning "attributes in attribute declaration are ignored" }
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C.jj  2021-07-28 23:07:05.095085351 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C   2021-07-29

Re: [PATCH 0/2] New target hook TARGET_COMPUTE_MULTILIB and implementation for RISC-V

2021-07-29 Thread Palmer Dabbelt


On Thu, 29 Jul 2021 11:44:09 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

ping

On Wed, Jul 21, 2021 at 5:28 PM Kito Cheng  wrote:


This patch set allows a target to use a customized multi-lib mechanism
rather than the built-in multi-lib mechanism.

The motivation of this patch is that RISC-V might have very complicated
multi-lib re-use rules*, which are hard to maintain and use with the
current multi-lib scripts; we even hit the "argument list too long" error
when we tried to add more multi-lib reuse rules.

* Here is an example for RISC-V multi-lib rules:
https://gist.github.com/kito-cheng/0289cd42d9a756382e5afeb77b42b73b

V2 Changes:
- NO changes for first patch(TARGET_COMPUTE_MULTILIB part) since first version.
- Handle option other than -march and -mabi for riscv_compute_multilib.


This generally LGTM, but I think it's the sort of thing that should be 
looked at by a global reviewer.  There's a bit of a policy decision 
being made here in that this allows external hooks during the build 
process.


I'm fine with this, as it's just the multilib list, those are really 
specific to a specific toolchain distribution, and there's never going 
to be a way to catalog all the interesting cases for the embedded 
toolchains.  I'm still not comfortable calling that a review, though, as 
these things are subtle and I don't always have the same bar for 
external bits that the rest of the GCC folks do.

[committed] Reinstate branch-on-bit insns for H8

2021-07-29 Thread Jeff Law via Gcc-patches

The branch-on-bit patterns have been disabled since the transition away 
from cc0 on the H8.  This patch reinstates them.  This tends to be a 
fairly nice win since the bit test is 2 or 4 bytes, while the 
and-with-constant approach is going to be 2-6 bytes, or worse if the 
input value doesn't die.


Variable bit handling isn't supported yet.  I can't convince myself the 
old variable bit test patterns were ever used.  Some contemplation is 
still worthwhile here, as these are crazy expensive, though none of my 
ideas from last night looked viable.



Tested overnight in my tester without regressions.  Committing to the trunk.

Jeff
commit 0c6d21faa426bd6e6fdb3a6b47af530e49944118
Author: Jeff Law 
Date:   Thu Jul 29 14:32:59 2021 -0400

Reinstate branch-on-bit insns for H8

gcc/
* config/h8300/h8300-modes.def: Add CCZ, CCV and CCC, drop CCZNV.
* config/h8300/h8300.md (H8cc mode iterator): Add CCZ.
(cc mode_attr): Similarly.
(ccz subst_attr): Similarly.
* config/h8300/jumpcall.md: Add new patterns for branch-on-bit.
* config/h8300/testcompare.md: Remove various cc0 based patterns
that had been commented out.  Add pattern to set CCZ from a bit
test.

diff --git a/gcc/config/h8300/h8300-modes.def b/gcc/config/h8300/h8300-modes.def
index 23b777b2966..6ab52606a9a 100644
--- a/gcc/config/h8300/h8300-modes.def
+++ b/gcc/config/h8300/h8300-modes.def
@@ -18,4 +18,6 @@
.  */
 
 CC_MODE (CCZN);
-CC_MODE (CCZNV);
+CC_MODE (CCZ);
+CC_MODE (CCV);
+CC_MODE (CCC);
diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md
index e596987a6a6..7f49e4284f2 100644
--- a/gcc/config/h8300/h8300.md
+++ b/gcc/config/h8300/h8300.md
@@ -140,11 +140,11 @@
 
 ;; The modes we're supporting.  This is used when we want to generate
 ;; multiple patterns where only the mode differs from a single template
-(define_mode_iterator H8cc [CC CCZN])
+(define_mode_iterator H8cc [CC CCZN CCZ])
 
 ;; This is used to generate multiple define_substs from a single
 ;; template for the different variants we might have.
-(define_mode_attr cc [(CC "cc") (CCZN "cczn")])
+(define_mode_attr cc [(CC "cc") (CCZN "cczn") (CCZ "ccz")])
 
 ;; The primary substitution pattern.   is used to create multiple
 ;; substitutions based on the CC bits that are set.
@@ -165,6 +165,7 @@
 ;; apply the subst_cczn or subset_cc define_subst to generate a
 ;; new pattern that compare-elim can use
 (define_subst_attr "cczn" "subst_cczn" "" "_cczn")
+(define_subst_attr "ccz" "subst_ccz" "" "_ccz")
 (define_subst_attr "cc" "subst_cc" "" "_cc")
 
 ;; Type of delay slot.  NONE means the instruction has no delay slot.
diff --git a/gcc/config/h8300/jumpcall.md b/gcc/config/h8300/jumpcall.md
index e1f04183564..3e59fee58bd 100644
--- a/gcc/config/h8300/jumpcall.md
+++ b/gcc/config/h8300/jumpcall.md
@@ -143,6 +143,52 @@
   [(set_attr "type" "bitbranch")
(set_attr "length_table" "bitbranch")])
 
+(define_insn_and_split ""
+  [(set (pc)
+   (if_then_else (match_operator 3 "eqne_operator"
+   [(zero_extract:QHSI (match_operand:QHSI 1 
"register_operand" "r")
+   (const_int 1)
+   (match_operand 2 
"const_int_operand" "n"))
+(const_int 0)])
+ (label_ref (match_operand 0 "" ""))
+ (pc)))]
+  "INTVAL (operands[2]) < 16"
+  "#"
+  "&& reload_completed"
+  [(set (reg:CCZ CC_REG)
+   (eq (zero_extract:QHSI (match_dup 1) (const_int 1) (match_dup 2))
+   (const_int 0)))
+   (set (pc)
+   (if_then_else (match_op_dup 3 [(reg:CCZ CC_REG) (const_int 0)])
+ (label_ref (match_dup 0))
+ (pc)))])
+
+(define_insn_and_split ""
+  [(set (pc)
+   (if_then_else (match_operator 3 "eqne_operator"
+   [(zero_extract:SI (match_operand:SI 1 
"register_operand" "r")
+ (const_int 1)
+ (match_operand 2 "const_int_operand" 
"n"))
+(const_int 0)])
+ (label_ref (match_operand 0 "" ""))
+ (pc)))
+   (clobber (match_scratch:SI 4 "="))]
+  "INTVAL (operands[2]) >= 16"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 4)
+  (ior:SI (and:SI (match_dup 4) (const_int -65536))
+  (lshiftrt:SI (match_dup 1) (const_int 16
+ (clobber (reg:CC CC_REG))])
+   (set (reg:CCZ CC_REG)
+   (eq (zero_extract:SI (match_dup 4) (const_int 1) (match_dup 2))
+   (const_int 0)))
+   (set (pc)
+   (if_then_else (match_op_dup 3 [(reg:CCZ CC_REG) (const_int 0)])
+ (label_ref (match_dup 0))
+ (pc)))]
+  "operands[2] = GEN_INT (INTVAL (operands[2]) - 16);")
+
 ;; Unconditional and other jump

Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-07-29 Thread Palmer Dabbelt

On Tue, 27 Jul 2021 02:32:12 PDT (-0700), cmuell...@gcc.gnu.org wrote:

Ok, so if I understand correctly Palmer and Andrew prefer
overlap_op_by_pieces to be controlled
by its own field in the riscv_tune_param struct and not by the field
slow_unaligned_access in this struct
(i.e. slow_unaligned_access==false is not enough to imply
overlap_op_by_pieces==true).

I guess, but I'm not really worried about this at that level of detail 
right now.  It's not like the tune structures form any sort of external 
interface we have to keep stable, we can do whatever we want with those 
fields so I'd just aim for encoding the desired behavior as simply as 
possible rather than trying to build something extensible.

There are really two questions we need to answer: is this code actually 
faster for the C906, and is this what the average user wants under -Os.  

That first one is pretty easy: just running those simple code sequences 
under a sweep of page offsets should be sufficient to determine if this 
is always faster (in which case it's an easy yes), if it's always slower 
(an easy no), or if there's some slow cases like page/cache line 
crossing (in which case we'd need to think a bit).
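
A rough sketch of what such a sweep could look like, assuming the sequence of interest is the 15-byte memset from earlier in the thread lowered to two overlapping 8-byte stores. The function names are hypothetical, and the memcpy calls only stand in for single 64-bit stores; whether the compiler emits them that way depends on the target and options. Timing is left out; a real benchmark would time the call at each offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero 15 bytes with two 8-byte stores that overlap by one byte.  The
   memcpy calls stand in for single (possibly misaligned) 64-bit stores.  */
void
zero15_overlapping (char *dst)
{
  const uint64_t zero = 0;
  memcpy (dst, &zero, 8);       /* bytes 0..7 */
  memcpy (dst + 7, &zero, 8);   /* bytes 7..14 */
}

/* Run the sequence at every offset in [0, span) of BUF and count the
   offsets where exactly 15 bytes were cleared.  BUF must hold at least
   span + 15 bytes.  A benchmark would time the call at each offset.  */
size_t
sweep_offsets (char *buf, size_t span)
{
  size_t ok = 0;
  for (size_t off = 0; off < span; off++)
    {
      memset (buf, 0x55, span + 15);
      zero15_overlapping (buf + off);
      size_t cleared = 0;
      for (size_t i = 0; i < span + 15; i++)
        cleared += (buf[i] == 0);
      if (cleared == 15 && buf[off] == 0 && buf[off + 14] == 0)
        ok++;
    }
  return ok;
}
```

To cover page and cache-line crossings, the buffer would need to be page-aligned and span at least one page boundary, which this sketch does not arrange.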

The second one is a bit trickier.  In the past we'd said this sort of 
"actively misalign accesses to generate smaller code" thing isn't 
suitable for -Os (as most machines still have very slow unaligned 
accesses) but is suitable for -Oz (don't remember if that ever ended up 
in GCC, though).  That still seems like a reasonable decision, but if it 
turns out that implementations with fast unaligned accesses become the 
norm then it'd probably be worth revisiting it.  Not sure exactly how to 
determine that tipping point, but I think we're a long way away from it 
right now.

IMO it's really just premature to try and design an encoding of the 
tuning parameters until we have an idea of what they are, as we'll just 
end up devolving down the path of trying to encode all possible hardware 
and that's generally a huge waste of time.  Since there's no ABI here we 
can refactor this however we want as new tunings show up.

I don't have access to pipeline details that give proof that there are cases
where this patch causes a performance penalty.

So, I leave this here as a summary for someone who has enough information and
interest to move this forward:
* the original patch should be sufficient, but does not have tests:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575791.html
* the tests can be taken from this patch:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575864.html
  Note, that there is a duplicated "sw" in builtins-overlap-6.c, which
should be a "sd".

Thanks for the feedback!

Cool.  Looks like the C906 is starting to show up in the real world, so 
we should be able to find someone who has access to one and cares enough 
to at least run some simple benchmarks of these code sequences.  IMO 
that's a pretty low interest bar, so I don't see any harm in waiting -- 
when the hardware is common then I'm sure someone will care enough to 
give this a shot, and until then it's not really impacting anyone either 
way.

The -Os thing is a bigger discussion, and while I'm happy to have it I 
don't really think we're even close to these being common enough yet.  I 
saw your memmove patch and think the same rationale might apply there, 
but I haven't looked closely and won't have time to for a bit as I've 
got to get around to the other projects.

On Tue, Jul 27, 2021 at 3:48 AM Palmer Dabbelt  wrote:

On Mon, 26 Jul 2021 03:05:21 PDT (-0700), Andrew Waterman wrote:
> On Thu, Jul 22, 2021 at 10:27 AM Palmer Dabbelt  wrote:
>>
>> On Thu, 22 Jul 2021 06:29:46 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>> > Could you add a testcase? Otherwise LGTM.
>> >
>> > Option: -O2 -mtune=thead-c906 -march=rv64gc -mabi=lp64
>> > void foo(char *dst){
>> >__builtin_memset(dst, 0, 15);
>> > }
>>
>> I'd like to see:
>>
>> * Test results.  This is only on for one target right now, so relying on
>>   it to just work on others isn't a good idea.
>> * Something to demonstrate this doesn't break -mstrict-align.
>> * Some sort of performance analysis.  Most machines that support
>>   unaligned access do so with some performance degradation,
>
> Also, some machines that gracefully support misaligned accesses under
> most circumstances nevertheless experience a perf degradation when the
> load depends on two stores that overlap partially but not fully.  This
> transformation will obviously trigger such behavior from time to time.

Ya, I thought I wrote a response to this but I guess it's just in a
buffer somewhere.  The code sequences this is generating are really the
worst case for unaligned stores: one of them is always guaranteed to be
misaligned, and it partially overlaps with a store one cycle away.

We're really only saving a handful of instructions at best here, so
there's not much room for error when it comes to

Re: [PATCH 0/2] New target hook TARGET_COMPUTE_MULTILIB and implementation for RISC-V

2021-07-29 Thread Kito Cheng via Gcc-patches

ping

On Wed, Jul 21, 2021 at 5:28 PM Kito Cheng  wrote:
>
> This patch set allows a target to use a customized multi-lib mechanism rather
> than the built-in multi-lib mechanism.
>
> The motivation is that RISC-V might have very complicated multi-lib re-use
> rules*, which are hard to maintain and use with the current multi-lib scripts;
> we even hit the "argument list too long" error when we tried to add more
> multi-lib reuse rules.
>
> * Here is an example for RISC-V multi-lib rules:
> https://gist.github.com/kito-cheng/0289cd42d9a756382e5afeb77b42b73b
>
> V2 Changes:
> - NO changes for first patch(TARGET_COMPUTE_MULTILIB part) since first 
> version.
> - Handle option other than -march and -mabi for riscv_compute_multilib.
>
>

[r12-2591 Regression] FAIL: g++.dg/warn/Wstringop-overflow-4.C -std=gnu++98 (test for excess errors) on Linux/x86_64

2021-07-29 Thread sunil.k.pandey via Gcc-patches

On Linux/x86_64,

2e96b5f14e4025691b57d2301d71aa6092ed44bc is the first bad commit
commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc
Author: Aldy Hernandez 
Date:   Tue Jun 15 12:32:51 2021 +0200

Backwards jump threader rewrite with ranger.

caused

FAIL: gcc.dg/tree-prof/20050826-2.c scan-tree-dump-not dom2 "Invalid sum"
FAIL: g++.dg/warn/Wstringop-overflow-4.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/warn/Wstringop-overflow-4.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/warn/Wstringop-overflow-4.C  -std=gnu++2a (test for excess errors)
FAIL: g++.dg/warn/Wstringop-overflow-4.C  -std=gnu++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2591/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-prof.exp=gcc.dg/tree-prof/20050826-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-prof.exp=gcc.dg/tree-prof/20050826-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-prof.exp=gcc.dg/tree-prof/20050826-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-prof.exp=gcc.dg/tree-prof/20050826-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at skpgkp2 at gmail dot com)

[PATCH v2] Fix for powerpc64 long double complex divide failure

2021-07-29 Thread Patrick McGehearty via Gcc-patches

This patch resolves the failure of powerpc64 long double complex divide
in native ibm long double format after the patch "Practical improvement
to libgcc complex divide".

The new code uses the following macros which are intended to be mapped
to appropriate values according to the underlying hardware representation.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101104

RBIG      a value near the maximum representation
RMIN      a value near the minimum representation
          (but not in the subnormal range)
RMIN2     a value moderately less than 1
RMINSCAL  the inverse of RMIN2
RMAX2     RBIG * RMIN2  - a value to limit scaling to not overflow

When "long double" values were not using the IEEE 128-bit format but
the traditional IBM 128-bit, the previous code used the LDBL values
which caused overflow for RMINSCAL. The new code uses the DBL values.

RBIG  LDBL_MAX = 0x1.f800p+1022
  DBL_MAX  = 0x1.f000p+1022

RMIN  LDBL_MIN = 0x1.p-969
RMIN  DBL_MIN  = 0x1.p-1022

RMIN2 LDBL_EPSILON = 0x0.1000p-1022 = 0x1.0p-1074
RMIN2 DBL_EPSILON  = 0x1.p-52

RMINSCAL 1/LDBL_EPSILON = inf (1.0p+1074 does not fit in IBM 128-bit).
 1/DBL_EPSILON  = 0x1.p+52

RMAX2 = RBIG * RMIN2 = 0x1.f800p-52
RBIG * RMIN2 = 0x1.f000p+970

The MAX and MIN values have only modest changes since the exponent
field for IBM 128-bit floating point values is the same size as
the exponent field for IBM 64-bit floating point values. However
the EPSILON field is considerably different. Due to how small
values can be represented in the lower 64 bits of the IBM 128-bit
floating point, EPSILON is extremely small, so far beyond the
desired value that inversion of the value overflows and even
without the overflow, the RMAX2 is so small as to eliminate
most usage of the test.

Instead of just replacing the use of KF_EPSILON with DF_EPSILON, we
replace all uses of KF_* with DF_*. Since the exponent fields are
essentially the same, we gain the positive benefits from the new
formula while avoiding all under/overflow issues in the #defines.
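
As a hedged illustration of why the DF_* values are safe here, the sketch below restates the new definitions with the `<float.h>` equivalents (an assumption: `double` is IEEE binary64, so DBL_EPSILON is 2^-52). The accessor function names are hypothetical and exist only so the values can be inspected.

```c
#include <float.h>

/* Mirrors the DF_* based definitions from the patch using <float.h>.  */
#define RBIG      (DBL_MAX / 2)
#define RMIN      (DBL_MIN)
#define RMIN2     (DBL_EPSILON)
#define RMINSCAL  (1 / DBL_EPSILON)
#define RMAX2     (RBIG * RMIN2)

/* Return nonzero if X is a finite, positive value.  */
int
finite_positive (double x)
{
  return x > 0 && x <= DBL_MAX;
}

/* 1 / 2^-52 = 2^52, comfortably inside the double range.  */
double
rminscal_value (void)
{
  return RMINSCAL;
}

/* (DBL_MAX / 2) * 2^-52, a finite scaling limit.  */
double
rmax2_value (void)
{
  return RMAX2;
}
```

With the double-precision epsilon the inverse is 2^52 and RMAX2 stays a large finite value, in contrast to the overflow described above for the IBM long double epsilon.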

The change has been tested on gcc135.fsffrance.org and gains the
expected improvements in accuracy for long double complex divide.

libgcc/
PR target/101104
* config/rs6000/_divkc3.c (RBIG, RMIN, RMIN2, RMINSCAL, RMAX2):
Fix long double complex divide for native IBM 128-bit.
---
 libgcc/config/rs6000/_divkc3.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
index a1d29d2..2b229c8 100644
--- a/libgcc/config/rs6000/_divkc3.c
+++ b/libgcc/config/rs6000/_divkc3.c
@@ -38,10 +38,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #endif
 
 #ifndef __LONG_DOUBLE_IEEE128__
-#define RBIG   (__LIBGCC_KF_MAX__ / 2)
-#define RMIN   (__LIBGCC_KF_MIN__)
-#define RMIN2  (__LIBGCC_KF_EPSILON__)
-#define RMINSCAL (1 / __LIBGCC_KF_EPSILON__)
+#define RBIG   (__LIBGCC_DF_MAX__ / 2)
+#define RMIN   (__LIBGCC_DF_MIN__)
+#define RMIN2  (__LIBGCC_DF_EPSILON__)
+#define RMINSCAL (1 / __LIBGCC_DF_EPSILON__)
 #define RMAX2  (RBIG * RMIN2)
 #else
 #define RBIG   (__LIBGCC_TF_MAX__ / 2)
-- 
1.8.3.1

Re: [PATCH] c++: suppress all warnings on memper pointers to work around dICE [PR101219]

2021-07-29 Thread Jason Merrill via Gcc-patches


On 7/22/21 7:15 PM, Sergei Trofimovich wrote:

From: Sergei Trofimovich 

r12-1804 ("cp: add support for per-location warning groups.") among other
things removed warning suppression from a few places including ptrmemfuncs.

Currently ptrmemfuncs don't have valid BINFO attached which causes ICEs
in access checks:

 crash_signal
 gcc/toplev.c:328
 perform_or_defer_access_check(tree_node*, tree_node*, tree_node*, int, 
access_failure_info*)
 gcc/cp/semantics.c:490
 finish_non_static_data_member(tree_node*, tree_node*, tree_node*)
 gcc/cp/semantics.c:2208
 ...

The change suppresses warnings again until we provide BINFOs for ptrmemfuncs.


We don't need BINFOs for PMFs, we need to avoid paths that expect them.

It looks like the problem is with tsubst_copy_and_build calling 
finish_non_static_data_member instead of build_ptrmemfunc_access_expr.



PR c++/101219

gcc/cp/ChangeLog:

* typeck.c (build_ptrmemfunc_access_expr): Suppress all warnings
to avoid ICE.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr101219.C: New test.


This doesn't need to be in torture; it has nothing to do with optimization.


---
  gcc/cp/typeck.c |  6 +-
  gcc/testsuite/g++.dg/torture/pr101219.C | 10 ++
  2 files changed, 15 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/torture/pr101219.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index a483e1f988d..aa91fd21c7b 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -3326,7 +3326,11 @@ build_ptrmemfunc_access_expr (tree ptrmem, tree 
member_name)
 member = DECL_CHAIN (member))
  if (DECL_NAME (member) == member_name)
break;
-  return build_simple_component_ref (ptrmem, member);
+  tree r = build_simple_component_ref (ptrmem, member);
+  /* Suppress warning to avoid exposing missing BINFO for ptrmem
+ synthetic structs: PR101219.  */
+  suppress_warning(r);
+  return r;
  }
  
  /* Given an expression PTR for a pointer, return an expression

diff --git a/gcc/testsuite/g++.dg/torture/pr101219.C 
b/gcc/testsuite/g++.dg/torture/pr101219.C
new file mode 100644
index 000..c8d30448187
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr101219.C
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall" } */
+struct S { void m(); };
+
+template  bool f() {
+  /* In PR101219 gcc used to ICE in warning code. */
+  void (S::*mp)();
+
+  return ::m == mp;
+}

[committed] d: Generate Object class if it doesn't exist during TypeInfo emission (PR101672)

2021-07-29 Thread Iain Buclaw via Gcc-patches

Hi,

This patch adds a check to make_frontend_typeinfo to generate a stub
Object class if one doesn't exist in the run-time library.

Having a stub will prevent errors from occurring when compiling D code
with an empty object.d.  Though if it were to actually be used
implicitly then an error should occur.

Bootstrapped and regression tested on x86_64-linux-gnu/-mx32/-m32, and
committed to mainline.

Regards,
Iain

---
gcc/d/ChangeLog:

PR d/101672
* typeinfo.cc (make_frontend_typeinfo): Generate Object class if it
doesn't exist.
(check_typeinfo_type): Don't warn if there's no location.

gcc/testsuite/ChangeLog:

PR d/101672
* gdc.dg/pr100967.d: Update test.
* gdc.dg/pr101672.d: New test.
---
 gcc/d/typeinfo.cc   | 23 +--
 gcc/testsuite/gdc.dg/pr100967.d |  2 +-
 gcc/testsuite/gdc.dg/pr101672.d | 19 +++
 3 files changed, 41 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr101672.d

diff --git a/gcc/d/typeinfo.cc b/gcc/d/typeinfo.cc
index 9d6464deb07..a1f0543d58e 100644
--- a/gcc/d/typeinfo.cc
+++ b/gcc/d/typeinfo.cc
@@ -205,12 +205,30 @@ make_frontend_typeinfo (Identifier *ident, 
ClassDeclaration *base = NULL)
   if (!object_module->_scope)
 object_module->importAll (NULL);
 
+  /* Object class doesn't exist, create a stub one that will cause an error if
+ used.  */
+  Loc loc = (object_module->md) ? object_module->md->loc : object_module->loc;
+  if (!base)
+{
+  if (!ClassDeclaration::object)
+   {
+ ClassDeclaration *object
+   = ClassDeclaration::create (loc, Identifier::idPool ("Object"),
+   NULL, NULL, true);
+ object->parent = object_module;
+ object->members = new Dsymbols;
+ object->storage_class |= STCtemp;
+   }
+
+  base = ClassDeclaration::object;
+}
+
   /* Assignment of global typeinfo variables is managed by the ClassDeclaration
  constructor, so only need to new the declaration here.  */
-  Loc loc = (object_module->md) ? object_module->md->loc : object_module->loc;
   ClassDeclaration *tinfo = ClassDeclaration::create (loc, ident, NULL, NULL,
  true);
   tinfo->parent = object_module;
+  tinfo->members = new Dsymbols;
   dsymbolSemantic (tinfo, object_module->_scope);
   tinfo->baseClass = base;
   /* This is a compiler generated class, and shouldn't be mistaken for being
@@ -1316,6 +1334,7 @@ public:
 tree type = tinfo_types[get_typeinfo_kind (tid->tinfo)];
 gcc_assert (type != NULL_TREE);
 
+/* Built-in typeinfo will be referenced as one-only.  */
 tid->csym = declare_extern_var (ident, type);
 DECL_LANG_SPECIFIC (tid->csym) = build_lang_decl (tid);
 
@@ -1400,7 +1419,7 @@ check_typeinfo_type (const Loc , Scope *sc)
   /* If TypeInfo has not been declared, warn about each location once.  */
   static Loc warnloc;
 
-  if (!warnloc.equals (loc))
+  if (loc.filename && !warnloc.equals (loc))
{
  error_at (make_location_t (loc),
"% could not be found, "
diff --git a/gcc/testsuite/gdc.dg/pr100967.d b/gcc/testsuite/gdc.dg/pr100967.d
index 582ad582676..bb83c299ced 100644
--- a/gcc/testsuite/gdc.dg/pr100967.d
+++ b/gcc/testsuite/gdc.dg/pr100967.d
@@ -1,7 +1,7 @@
 // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100967
 // { dg-do compile }
 
-module object; // { dg-error "class object.TypeInfo missing or corrupt 
object.d" }
+module object;
 
 extern(C) int main()
 {
diff --git a/gcc/testsuite/gdc.dg/pr101672.d b/gcc/testsuite/gdc.dg/pr101672.d
new file mode 100644
index 000..292fd761fb1
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr101672.d
@@ -0,0 +1,19 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101672
+// { dg-do compile }
+
+module object;
+
+interface I101672
+{
+static int i101672;
+}
+
+class A101672 : I101672 // { dg-error "class object.A101672 missing or corrupt 
object.d" }
+{
+static int a101672;
+}
+
+class B101672 : A101672
+{
+static int b101672;
+}
-- 
2.30.2

[committed] d: Return the correct value for C++ constructor calls (PR101664)

2021-07-29 Thread Iain Buclaw via Gcc-patches

Hi,

C++ constructors return void, even though the D front-end semantic
treats them as implicitly returning `this'.  To handle this correctly,
the object reference is cached and used as the result of the expression.
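
The idea can be sketched in plain C, purely as an analogy (the struct and function names are hypothetical, and this is not the code generated by the front-end): the constructor-like function returns nothing, so the saved object reference is supplied as the value of the whole expression via a comma expression.

```c
struct S { int i; };

/* Like a C++ constructor at the call level: initializes *SELF and
   returns void.  */
void
S_ctor (struct S *self, int n)
{
  self->i = n;
}

/* Save the object reference, make the call, then yield the saved
   reference as the value of the whole expression.  */
struct S *
construct (struct S *self, int n)
{
  struct S *saved = self;
  return (S_ctor (saved, n), saved);
}
```

The comma expression plays the role that the compound expression plays in the patch: the call is evaluated for its side effects and the cached reference becomes the result.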

Bootstrapped and regression tested on x86_64-linux-gnu/-mx32/-m32,
committed to mainline, and backported to the gcc-11 release branch.

Regards,
Iain

---
gcc/d/ChangeLog:

PR d/101664
* expr.cc (ExprVisitor::visit (CallExp *)): Use object expression as
result for C++ constructor calls.

gcc/testsuite/ChangeLog:

PR d/101664
* gdc.dg/extern-c++/extern-c++.exp: New.
* gdc.dg/extern-c++/pr101664.d: New test.
* gdc.dg/extern-c++/pr101664_1.cc: New test.
---
 gcc/d/expr.cc | 13 +++
 .../gdc.dg/extern-c++/extern-c++.exp  | 39 +++
 gcc/testsuite/gdc.dg/extern-c++/pr101664.d| 15 +++
 gcc/testsuite/gdc.dg/extern-c++/pr101664_1.cc | 10 +
 4 files changed, 77 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/extern-c++/extern-c++.exp
 create mode 100644 gcc/testsuite/gdc.dg/extern-c++/pr101664.d
 create mode 100644 gcc/testsuite/gdc.dg/extern-c++/pr101664_1.cc

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 99ca958c7c4..85269c6b2be 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -1751,6 +1751,7 @@ public:
 tree callee = NULL_TREE;
 tree object = NULL_TREE;
 tree cleanup = NULL_TREE;
+tree returnvalue = NULL_TREE;
 TypeFunction *tf = NULL;
 
 /* Calls to delegates can sometimes look like this.  */
@@ -1819,6 +1820,15 @@ public:
else
  fndecl = build_address (fndecl);
 
+   /* C++ constructors return void, even though front-end semantic
+  treats them as implicitly returning `this'.  Set returnvalue
+  to override the result of this expression.  */
+   if (fd->isCtorDeclaration () && fd->linkage == LINKcpp)
+ {
+   thisexp = d_save_expr (thisexp);
+   returnvalue = thisexp;
+ }
+
callee = build_method_call (fndecl, thisexp, fd->type);
  }
  }
@@ -1885,6 +1895,9 @@ public:
build the call expression.  */
 tree exp = d_build_call (tf, callee, object, e->arguments);
 
+if (returnvalue != NULL_TREE)
+  exp = compound_expr (exp, returnvalue);
+
 if (tf->isref)
   exp = build_deref (exp);
 
diff --git a/gcc/testsuite/gdc.dg/extern-c++/extern-c++.exp 
b/gcc/testsuite/gdc.dg/extern-c++/extern-c++.exp
new file mode 100644
index 000..d38f993faaf
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/extern-c++/extern-c++.exp
@@ -0,0 +1,39 @@
+#   Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# Load support procs.
+load_lib gdc-dg.exp
+
+# We are mixing D and C++ code, need to pull in libstdc++
+global GDC_INCLUDE_CXX_FLAGS
+set GDC_INCLUDE_CXX_FLAGS 1
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+if [check_no_compiler_messages extern_c++_tests assembly {
+   // C++
+   int main() { return 0; }
+}] {
+gdc-dg-runtest [lsort \
+  [glob -nocomplain $srcdir/$subdir/*.d ] ] "" ""
+}
+
+set GDC_INCLUDE_CXX_FLAGS 0
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gdc.dg/extern-c++/pr101664.d 
b/gcc/testsuite/gdc.dg/extern-c++/pr101664.d
new file mode 100644
index 000..57b3d903582
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/extern-c++/pr101664.d
@@ -0,0 +1,15 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101664
+// { dg-do run }
+// { dg-options "-O2" }
+// { dg-additional-sources "pr101664_1.cc" }
+
+extern(C++) struct S101664
+{
+int i;
+this(int);
+}
+
+void main()
+{
+assert(S101664(1).i == 1);
+}
diff --git a/gcc/testsuite/gdc.dg/extern-c++/pr101664_1.cc 
b/gcc/testsuite/gdc.dg/extern-c++/pr101664_1.cc
new file mode 100644
index 000..066e784293d
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/extern-c++/pr101664_1.cc
@@ -0,0 +1,10 @@
+struct S101664
+{
+  int i;
+  S101664 (int n);
+};
+
+S101664::S101664 (int n)
+: i(n)
+{
+}
-- 
2.30.2

[committed] d: Ensure casting from bool results in either 0 or 1 (PR96435)

2021-07-29 Thread Iain Buclaw via Gcc-patches

Hi,

When casting from bool, the result must be either 0 or 1; any other value
violates @safe code, so enforce that it is never invalid.

This patch does that by lowering rvalue reads into `(bool & 1)'.
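
A small C sketch of the masking idea, assuming a bool is stored in one byte (the function names are hypothetical): whatever bit pattern ends up in the storage, only bit 0 is read, so the result is always 0 or 1.

```c
/* Normalize the raw storage byte of a bool to 0 or 1 by testing bit 0.  */
int
bool_byte_to_int (unsigned char raw)
{
  return raw & 1;
}

/* Return 1 if every possible storage byte maps to 0 or 1.  */
int
always_zero_or_one (void)
{
  for (unsigned v = 0; v < 256; v++)
    {
      int r = bool_byte_to_int ((unsigned char) v);
      if (r != 0 && r != 1)
        return 0;
    }
  return 1;
}
```

This is why a value written through a union, as in the test case in this patch, can no longer be observed as an out-of-range index.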

Bootstrapped and regression tested on x86_64-linux-gnu/-mx32/-m32,
committed to mainline, and backported to the gcc-11, gcc-10 and gcc-9
release branches.

Regards,
Iain

---
gcc/d/ChangeLog:

PR d/96435
* d-convert.cc (convert_for_rvalue): New function.
* d-tree.h (convert_for_rvalue): Declare.
* expr.cc (ExprVisitor::visit (CastExp *)): Use convert_for_rvalue.
(build_return_dtor): Likewise.

gcc/testsuite/ChangeLog:

PR d/96435
* gdc.dg/torture/pr96435.d: New test.
---
 gcc/d/d-convert.cc | 36 ++
 gcc/d/d-tree.h |  1 +
 gcc/d/expr.cc  | 13 ++
 gcc/testsuite/gdc.dg/torture/pr96435.d | 21 +++
 4 files changed, 66 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/torture/pr96435.d

diff --git a/gcc/d/d-convert.cc b/gcc/d/d-convert.cc
index 237c084acf5..d43485dca77 100644
--- a/gcc/d/d-convert.cc
+++ b/gcc/d/d-convert.cc
@@ -602,6 +602,42 @@ convert_expr (tree exp, Type *etype, Type *totype)
   return result ? result : convert (build_ctype (totype), exp);
 }
 
+/* Return a TREE representation of EXPR, whose type has been converted from
+ * ETYPE to TOTYPE, and is being used in an rvalue context.  */
+
+tree
+convert_for_rvalue (tree expr, Type *etype, Type *totype)
+{
+  tree result = NULL_TREE;
+
+  Type *ebtype = etype->toBasetype ();
+  Type *tbtype = totype->toBasetype ();
+
+  switch (ebtype->ty)
+{
+case Tbool:
+  /* If casting from bool, the result is either 0 or 1, any other value
+violates @safe code, so enforce that it is never invalid.  */
+  if (CONSTANT_CLASS_P (expr))
+   result = d_truthvalue_conversion (expr);
+  else
+   {
+ /* Reinterpret the boolean as an integer and test the first bit.
+The generated code should end up being equivalent to:
+   *cast(ubyte *) & 1;  */
+ machine_mode bool_mode = TYPE_MODE (TREE_TYPE (expr));
+ tree mtype = lang_hooks.types.type_for_mode (bool_mode, 1);
+ result = fold_build2 (BIT_AND_EXPR, mtype,
+   build_vconvert (mtype, expr),
+   build_one_cst (mtype));
+   }
+
+  result = convert (build_ctype (tbtype), result);
+  break;
+}
+
+  return result ? result : convert_expr (expr, etype, totype);
+}
 
 /* Apply semantics of assignment to a value of type TOTYPE to EXPR
(e.g., pointer = array -> pointer = [0])
diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index b03d60a5c0e..f210b8b1a6e 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -598,6 +598,7 @@ extern bool decl_with_nonnull_addr_p (const_tree);
 extern tree d_truthvalue_conversion (tree);
 extern tree d_convert (tree, tree);
 extern tree convert_expr (tree, Type *, Type *);
+extern tree convert_for_rvalue (tree, Type *, Type *);
 extern tree convert_for_assignment (tree, Type *, Type *);
 extern tree convert_for_argument (tree, Parameter *);
 extern tree convert_for_condition (tree, Type *);
diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index b78778eb8ef..99ca958c7c4 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -1491,7 +1491,7 @@ public:
 if (tbtype->ty == Tvoid)
   this->result_ = build_nop (build_ctype (tbtype), result);
 else
-  this->result_ = convert_expr (result, ebtype, tbtype);
+  this->result_ = convert_for_rvalue (result, ebtype, tbtype);
   }
 
   /* Build a delete expression.  */
@@ -3169,11 +3169,14 @@ build_return_dtor (Expression *e, Type *type, 
TypeFunction *tf)
   tree result = build_expr (e);
 
   /* Convert for initializing the DECL_RESULT.  */
-  result = convert_expr (result, e->type, type);
-
-  /* If we are returning a reference, take the address.  */
   if (tf->isref)
-result = build_address (result);
+{
+  /* If we are returning a reference, take the address.  */
+  result = convert_expr (result, e->type, type);
+  result = build_address (result);
+}
+  else
+result = convert_for_rvalue (result, e->type, type);
 
   /* The decl to store the return expression.  */
   tree decl = DECL_RESULT (cfun->decl);
diff --git a/gcc/testsuite/gdc.dg/torture/pr96435.d 
b/gcc/testsuite/gdc.dg/torture/pr96435.d
new file mode 100644
index 000..c6d8785ec5b
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/torture/pr96435.d
@@ -0,0 +1,21 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96435
+// { dg-do run }
+
+@safe bool test96435()
+{
+int[2] array = [16, 678];
+union U { int i; bool b; }
+U u;
+u.i = 0xDEADBEEF;
+assert(array[u.b] == 678);
+return u.b;
+}
+
+@safe void main()
+{
+auto b = test96435();
+if (b)
+assert(true);
+if (!b)
+

[committed] d: Remove generated D header files on error (PR101657)

2021-07-29 Thread Iain Buclaw via Gcc-patches

Hi,

This patch adds a clean-up for removing any generated DI header files
created before semantic analysis was run.

If an error occurs later during compilation, remember that we generated
the headers, so that they can be removed before exit.
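
The clean-up logic can be sketched in standalone C as follows. This is an illustration under assumptions, not the front-end code: the function name and the array-of-paths interface are hypothetical, and only the standard `remove` call is used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Remove the generated files in PATHS if an error was recorded after they
   were written.  Returns how many files were removed.  */
size_t
remove_headers_on_error (const char *const *paths, size_t n,
                         int errorcount, bool dump_headers)
{
  size_t removed = 0;
  if (!errorcount || !dump_headers)
    return 0;
  for (size_t i = 0; i < n; i++)
    if (remove (paths[i]) == 0)
      removed++;
  return removed;
}
```

The flag matters because headers are only written on some runs; without it, an error exit could try to delete files that were never generated by this invocation.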

Bootstrapped and regression tested on x86_64-linux-gnu/-mx32/-m32, and
committed to mainline.

Regards,
Iain

---
gcc/d/ChangeLog:

PR d/101657
* d-lang.cc (d_parse_file): Remove generated D header files on error.

gcc/testsuite/ChangeLog:

PR d/101657
* gdc.dg/pr101657.d: New test.
---
 gcc/d/d-lang.cc | 19 +++
 gcc/testsuite/gdc.dg/pr101657.d | 14 ++
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/pr101657.d

diff --git a/gcc/d/d-lang.cc b/gcc/d/d-lang.cc
index 6ad3823d910..ac0945b1f34 100644
--- a/gcc/d/d-lang.cc
+++ b/gcc/d/d-lang.cc
@@ -1000,6 +1000,10 @@ d_parse_file (void)
}
 }
 
+  /* If an error occurs later during compilation, remember that we generated
+ the headers, so that they can be removed before exit.  */
+  bool dump_headers = false;
+
   if (global.errors)
 goto had_errors;
 
@@ -1019,6 +1023,8 @@ d_parse_file (void)
 
  genhdrfile (m);
}
+
+  dump_headers = true;
 }
 
   if (global.errors)
@@ -1243,6 +1249,19 @@ d_parse_file (void)
  exit with an error status.  */
   errorcount += (global.errors + global.warnings);
 
+  /* Remove generated .di files on error.  */
+  if (errorcount && dump_headers)
+{
+  for (size_t i = 0; i < modules.length; i++)
+   {
+ Module *m = modules[i];
+ if (d_option.fonly && m != Module::rootModule)
+   continue;
+
+ remove (m->hdrfile->toChars ());
+   }
+}
+
   /* Write out globals.  */
   d_finish_compilation (vec_safe_address (global_declarations),
vec_safe_length (global_declarations));
diff --git a/gcc/testsuite/gdc.dg/pr101657.d b/gcc/testsuite/gdc.dg/pr101657.d
new file mode 100644
index 000..0d77c36f030
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr101657.d
@@ -0,0 +1,14 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101657
+// { dg-do compile }
+// { dg-additional-options "-H" }
+
+void fun101657()
+{
+fail; // { dg-error "undefined identifier 'fail'" }
+}
+
+// { dg-final { if ![file exists pr101657.di] \{} }
+// { dg-final { pass "gdc.dg/pr101657.d   (file exists pr101657.di)" } }
+// { dg-final { \} else \{  } }
+// { dg-final { fail "gdc.dg/pr101657.d   (file exists pr101657.di)" } }
+// { dg-final { \}  } }
-- 
2.30.2

[committed] d: Don't escape quoted format strings in escape_d_format (PR101656)

2021-07-29 Thread Iain Buclaw via Gcc-patches

Hi,

This patch prepares the escape_d_format function to handle being given a
quoted string.  Something that the self-hosted D front-end does with a
new format helper for symbols.

If the format string is enclosed by two '`' characters, then don't
escape the first and last characters.

There are no tests as only the self-hosted front-end has the necessary
change that turns this on.
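
As a rough illustration of the rule, here is a stand-alone sketch that
handles only the backtick case.  The name `escape_backticks` is
hypothetical, and the real escape_d_format works on an obstack and
covers more than backticks.

```cpp
#include <cstddef>
#include <string>

// Escape '`' with a backslash, except for a pair of backticks that
// encloses the whole string.  All other characters are copied as-is.
std::string
escape_backticks (const std::string &format)
{
  bool quoted = false;
  std::size_t last = 0;

  // A string counts as quoted only if it starts and ends with '`' and
  // those are two distinct characters.
  if (!format.empty () && format.front () == '`')
    {
      last = format.size () - 1;
      if (last != 0 && format[last] == '`')
        quoted = true;
    }

  std::string out;
  for (std::size_t i = 0; i < format.size (); ++i)
    {
      if (format[i] == '`' && (!quoted || (i != 0 && i != last)))
        out += '\\';
      out += format[i];
    }
  return out;
}
```

A lone "`" is not treated as quoted, so it is still escaped, which
mirrors the `format_len` check in the patch.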

Bootstrapped and regression tested on x86_64-linux-gnu/-mx32/-m32, and
committed to mainline.

Regards,
Iain

---
gcc/d/ChangeLog:

PR d/101656
* d-diagnostic.cc (escape_d_format): Don't escape quoted format
strings.
---
 gcc/d/d-diagnostic.cc | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/d/d-diagnostic.cc b/gcc/d/d-diagnostic.cc
index 7043abe10bd..1982bd954a8 100644
--- a/gcc/d/d-diagnostic.cc
+++ b/gcc/d/d-diagnostic.cc
@@ -135,10 +135,21 @@ expand_d_format (const char *format)
 static char *
 escape_d_format (const char *format)
 {
+  bool quoted = false;
+  size_t format_len = 0;
   obstack buf;
 
   gcc_obstack_init (&buf);
 
+  /* If the format string is enclosed by two '`' characters, then don't escape
+ the first and last characters.  */
+  if (*format == '`')
+{
+  format_len = strlen (format) - 1;
+  if (format_len && format[format_len] == '`')
+   quoted = true;
+}
+
   for (const char *p = format; *p; p++)
 {
   switch (*p)
@@ -152,7 +163,8 @@ escape_d_format (const char *format)
case '`':
  /* Escape '`' characters so that expand_d_format does not confuse them
 for a quoted string.  */
- obstack_1grow (&buf, '\\');
+ if (!quoted || (p != format && p != (format + format_len)))
+   obstack_1grow (&buf, '\\');
  break;
 
default:
-- 
2.30.2

RE: [ARM] PR66791: Replace builtins in vld1

2021-07-29 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 29 July 2021 15:45
> To: Kyrylo Tkachov 
> Cc: gcc Patches ; Richard Earnshaw
> 
> Subject: Re: [ARM] PR66791: Replace builtins in vld1
> 
> On Thu, 29 Jul 2021 at 14:57, Kyrylo Tkachov 
> wrote:
> >
> > Hi Prathamesh,
> >
> > > -Original Message-
> > > From: Prathamesh Kulkarni 
> > > Sent: 26 July 2021 22:24
> > > To: gcc Patches ; Kyrylo Tkachov
> > > ; Richard Earnshaw
> > > 
> > > Subject: [ARM] PR66791: Replace builtins in vld1
> > >
> > > Hi,
> > > Similar to aarch64, this patch replaces call to builtin by
> > > dereferencing __a in vld1_p64, vld1_s64 and vld1_u64.
> > >
> > > The patch changes code-gen for the intrinsic as follows:
> > > Before patch:
> > > > vld1.64 {d16}, [r0:64]
> > > > vmov    r0, r1, d16 @ int
> > > > bx      lr
> > > >
> > > > After patch:
> > > > ldrd    r0, [r0]
> > > > bx      lr
> > >
> > > I assume the code-gen after patch is correct, since it loads two
> > > consecutive words from [r0] into r0 and r1 ?
> >
> > Yes, this looks correct.
> >
> > >
> > > Bootstrapped+tested on arm-linux-gnueabihf.
> > > OK to commit ?
> >
> > Ok. Can we now remove the vld1 builtin definition?
> Does the attached patch look OK ?
> I suppose we can only remove entry for di since the patch replaces
> calls to only __builtin_neon_vld1di ?

Yeah, we can just remove the DI entry.
Ok if this passes the usual testing.
Thanks,
Kyrill
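
For readers without an ARM toolchain, the shape of the change can be
sketched with the generic GCC/Clang vector extension.  The type `i64x1`
and the function `load1_s64` are hypothetical stand-ins for int64x1_t
and vld1_s64 from arm_neon.h; the sketch only shows that the load
becomes a plain dereference the compiler can see through.

```cpp
#include <cstdint>

// Hypothetical one-lane 64-bit vector type built with the GCC/Clang
// vector extension; the real int64x1_t comes from arm_neon.h.
typedef std::int64_t i64x1 __attribute__ ((vector_size (8)));

// Build the one-element vector directly from the pointed-to value
// instead of calling a target builtin.
i64x1
load1_s64 (const std::int64_t *a)
{
  i64x1 v = { *a };
  return v;
}
```

Because the body is an ordinary load, the optimizer is free to pick
whatever instruction suits the target.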

> 
> Thanks,
> Prathamesh
> > Thanks,
> > Kyrill
> >
> > >
> > > Thanks,
> > > Prathamesh

Re: [ARM] PR66791: Replace builtins in vld1

2021-07-29 Thread Prathamesh Kulkarni via Gcc-patches

On Thu, 29 Jul 2021 at 14:57, Kyrylo Tkachov  wrote:
>
> Hi Prathamesh,
>
> > -Original Message-
> > From: Prathamesh Kulkarni 
> > Sent: 26 July 2021 22:24
> > To: gcc Patches ; Kyrylo Tkachov
> > ; Richard Earnshaw
> > 
> > Subject: [ARM] PR66791: Replace builtins in vld1
> >
> > Hi,
> > Similar to aarch64, this patch replaces call to builtin by
> > dereferencing __a in vld1_p64, vld1_s64 and vld1_u64.
> >
> > The patch changes code-gen for the intrinsic as follows:
> > Before patch:
> > vld1.64 {d16}, [r0:64]
> > vmov    r0, r1, d16 @ int
> > bx      lr
> >
> > After patch:
> > ldrd    r0, [r0]
> > bx      lr
> >
> > I assume the code-gen after patch is correct, since it loads two
> > consecutive words from [r0] into r0 and r1 ?
>
> Yes, this looks correct.
>
> >
> > Bootstrapped+tested on arm-linux-gnueabihf.
> > OK to commit ?
>
> Ok. Can we now remove the vld1 builtin definition?
Does the attached patch look OK ?
I suppose we can only remove entry for di since the patch replaces
calls to only __builtin_neon_vld1di ?

Thanks,
Prathamesh
> Thanks,
> Kyrill
>
> >
> > Thanks,
> > Prathamesh
gcc/ChangeLog:

PR target/66791
* config/arm/arm_neon.h (vld1_p64): Replace call to builtin by
explicitly dereferencing __a.
(vld1_s64): Likewise.
(vld1_u64): Likewise.
* config/arm/arm_neon_builtins.def (vld1): Remove entry for di
and change to VAR13.

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 41b596b5fc6..5a91d15bf75 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10301,7 +10301,7 @@ __extension__ extern __inline poly64x1_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_p64 (const poly64_t * __a)
 {
-  return (poly64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a);
+  return (poly64x1_t) { *__a };
 }
 
 #pragma GCC pop_options
@@ -10330,7 +10330,7 @@ __extension__ extern __inline int64x1_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_s64 (const int64_t * __a)
 {
-  return (int64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a);
+  return (int64x1_t) { *__a };
 }
 
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
@@ -10374,7 +10374,7 @@ __extension__ extern __inline uint64x1_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_u64 (const uint64_t * __a)
 {
-  return (uint64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a);
+  return (uint64x1_t) { *__a };
 }
 
 __extension__ extern __inline poly8x8_t
diff --git a/gcc/config/arm/arm_neon_builtins.def 
b/gcc/config/arm/arm_neon_builtins.def
index 70438ac1848..fb6d66e594a 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -302,8 +302,8 @@ VAR1 (TERNOP, vtbx1, v8qi)
 VAR1 (TERNOP, vtbx2, v8qi)
 VAR1 (TERNOP, vtbx3, v8qi)
 VAR1 (TERNOP, vtbx4, v8qi)
-VAR14 (LOAD1, vld1,
-v8qi, v4hi, v4hf, v2si, v2sf, di, v16qi, v8hi, v8hf, v4si, v4sf, v2di,
+VAR13 (LOAD1, vld1,
+v8qi, v4hi, v4hf, v2si, v2sf, v16qi, v8hi, v8hf, v4si, v4sf, v2di,
 v4bf, v8bf)
 VAR12 (LOAD1LANE, vld1_lane,
v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di, v4bf, v8bf)

Re: [PATCH 0/13] v2 warning control by group and location (PR 74765)

2021-07-29 Thread Martin Sebor via Gcc-patches


On 7/29/21 2:26 AM, Andrew Burgess wrote:

* Martin Sebor  [2021-07-28 10:16:59 -0600]:


On 7/28/21 5:14 AM, Andrew Burgess wrote:

* Martin Sebor via Gcc-patches  [2021-07-19 09:08:35 -0600]:


On 7/17/21 2:36 PM, Jan-Benedict Glaw wrote:

Hi Martin!

On Fri, 2021-06-04 15:27:04 -0600, Martin Sebor  wrote:

This is a revised patch series to add warning control by group and
location, updated based on feedback on the initial series.

[...]

My automated checking (in this case: Using Debian's "gcc-snapshot"
package) indicates that between versions 1:20210527-1 and
1:20210630-1, building GDB breaks. Your patch is a likely candidate.
It's a case where a method asks for a nonnull argument and later on
checks for NULLness again. The build log is currently available at
(http://wolf.lug-owl.de:8080/jobs/gdb-vax-linux/5), though obviously
breaks for any target:

configure --target=vax-linux --prefix=/tmp/gdb-vax-linux
make all-gdb

[...]
[all 2021-07-16 19:19:25]   CXX    compile/compile.o
[all 2021-07-16 19:19:30] In file included from ./../gdbsupport/common-defs.h:126,
[all 2021-07-16 19:19:30]  from ./defs.h:28,
[all 2021-07-16 19:19:30]  from compile/compile.c:20:
[all 2021-07-16 19:19:30] ./../gdbsupport/gdb_unlinker.h: In constructor 'gdb::unlinker::unlinker(const char*)':
[all 2021-07-16 19:19:30] ./../gdbsupport/gdb_assert.h:35:4: error: 'nonnull' argument 'filename' compared to NULL [-Werror=nonnull-compare]
[all 2021-07-16 19:19:30]35 |   ((void) ((expr) ? 0 :   \
[all 2021-07-16 19:19:30]   |   ~^~~~
[all 2021-07-16 19:19:30]36 |(gdb_assert_fail (#expr, __FILE__, __LINE__, FUNCTION_NAME), 0)))
[all 2021-07-16 19:19:30]   |~
[all 2021-07-16 19:19:30] ./../gdbsupport/gdb_unlinker.h:38:5: note: in expansion of macro 'gdb_assert'
[all 2021-07-16 19:19:30]38 | gdb_assert (filename != NULL);
[all 2021-07-16 19:19:30]   | ^~
[all 2021-07-16 19:19:31] cc1plus: all warnings being treated as errors
[all 2021-07-16 19:19:31] make[1]: *** [Makefile:1641: compile/compile.o] Error 1
[all 2021-07-16 19:19:31] make[1]: Leaving directory '/var/lib/laminar/run/gdb-vax-linux/5/binutils-gdb/gdb'
[all 2021-07-16 19:19:31] make: *** [Makefile:11410: all-gdb] Error 2


Code is this:

31 class unlinker
32 {
33  public:
34
35   unlinker (const char *filename) ATTRIBUTE_NONNULL (2)
36 : m_filename (filename)
37   {
38 gdb_assert (filename != NULL);
39   }

I'm quite undecided whether this is bad behavior of GCC or bad coding
style in Binutils/GDB, or both.


A warning should be expected in this case.  Before the recent GCC
change it was inadvertently suppressed in gdb_assert macros by its
operand being enclosed in parentheses.


This issue was just posted to the GDB list, and I wanted to clarify my
understanding a bit.

I believe that (at least by default) adding the nonnull attribute
allows GCC to assume (in the above case) that filename will not be
NULL and generate code accordingly.

Additionally, passing an explicit NULL (i.e. 'unlinker obj (NULL)')
would cause a compile time error.

But, there's nothing to actually stop a NULL being passed due to, say,
a logic bug in the program.  So, something like this would compile
fine:

extern const char *ptr;
unlinker obj (ptr);

And in a separate compilation unit:

const char *ptr = NULL;

Obviously, the run time behaviour of such a program would be
undefined.

Given the above then, it doesn't seem crazy to want to do something
like the above, that is, add an assert to catch a logic bug in the
program.

Is there an approved mechanism through which I can tell GCC that I
really do want to do a comparison to NULL, without any warning, and
without the check being optimised out?
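
The thread does not settle on a mechanism.  One pattern that avoids
comparing a nonnull parameter at all, offered here purely as an
assumption and not as something endorsed on the list, is to keep the
attribute on an inner function and do the check in an outer entry point
that does not carry the attribute.  Both function names below are
hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Declared nonnull: inside this function the compiler may assume that
// S is never a null pointer.
__attribute__ ((nonnull (1)))
std::size_t
length_nonnull (const char *s)
{
  return std::strlen (s);
}

// Checked entry point without the attribute, so the null test is an
// ordinary comparison on an ordinary parameter.  Returns -1 when given
// a null pointer instead of forwarding it.
long
checked_length (const char *s)
{
  if (s == nullptr)
    return -1;
  return static_cast<long> (length_nonnull (s));
}
```

The check happens before the pointer ever reaches the nonnull
parameter, so it is not the comparison that -Wnonnull-compare is about.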




Thanks for your feedback.


The manual says -fno-delete-null-pointer-checks is supposed to
prevent the removal of the null function argument test so I'd
expect adding attribute optimize ("no-delete-null-pointer-checks")
to the definition of the function to have that effect but in my
testing it didn't work (and didn't give a warning for the two
attributes on the same declaration).  That seems worth filing
a bug for.


I've since been pointed at this:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100404

Comment #3 of which discusses exactly this issue.


I had forgotten about that discussion and opened PR 101665.  Thanks
for the pointer!  Let me link the two and see about updating
the manual and maybe also issuing a warning when both attributes
are set on the same function.

Martin

Re: Question about divide by 0 and what we can do with it

2021-07-29 Thread Andrew MacLeod via Gcc-patches


On 7/29/21 3:19 AM, Richard Biener wrote:

On Wed, Jul 28, 2021 at 4:39 PM Andrew MacLeod  wrote:


Which has removed the second call to builtin_abort() (even before we
get to EVRP!)

So the issue doesn't seem to be removing the divide by 0; it seems to be
a pattern match for [0,1] that is triggering.

I would argue the test case should not be testing for not removing the
divide by 0... Because we can now fold c_14 to be 486097858, and I
think that is a valid transformation?  (assuming no non-call exceptions
of course)

I think it's a valid transform, even with -fnon-call-exceptions when
-fdelete-dead-exceptions is enabled (like for Ada and C++).

You'd have to dig into history to tell why we added this testcase in that way.

Richard.


It looks like the PR added an optimization to simplification which 
checks whether the LHS of a binop is a constant and the RHS of the 
binop is a 2-value range.


If so, it tries folding each range individually and if that is 
successful, turns it into a  CONDITION ?  LHSopRHS1 : LHSopRHS2


There are 4 test files; the second one is triggering the failure because 
it deals with / and %, and is specifically checking to make sure the 
invalid cases like / 0 and % 0 are not transformed.


With EVRP now able to do something with both the division and the 
modulus, the expressions the test scans for, to make sure they were not 
transformed, are no longer present in the dump.


In fact, the things it checks for:

/* Dont optimize 972195717 / 0 in function foo.  */
/* { dg-final { scan-tree-dump-times "972195717 / " 1  "evrp" } } */
/* Dont optimize 972195717 % 0 in function bar.  */
/* { dg-final { scan-tree-dump-times "972195717 % " 1 "evrp" } } */
/* May optimize in function bar2, but EVRP doesn't perform this yet.  */
/* { dg-final { scan-tree-dump-times "972195715 % " 0 "evrp" } } */


The last line was already adjusted once when EVRP got smarter and 
started doing the transformation on   CST % 1 : 2


I see a couple of options...  we could delete the testcase, but I'm not 
sure that is useful.


1) just change those 1's to 0's and move on, as was done the last time 
this came up.


2) also add a check that __builtin_abort is only called once. This would 
then be checking that we do all these transformations:


CST / [0,2]  will produce CST/2
CST % [0, 2] will produce either [0,0] or [1,1] based on CST.
CST % [1,2]  will produce  [0,1], but be transformed from the % into the 
?:  form.
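
The three cases above can be modelled with a small sketch.  It is not
how EVRP is implemented; `possible_results` and `fold_to_constant` are
hypothetical helpers that enumerate the candidate right-hand values and
skip a zero divisor, on the assumption that such a path need not be
preserved.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Results of `lhs OP rhs` for each candidate rhs, where OP is '/' or
// '%'.  A zero divisor is skipped because that path has no defined
// result.
std::set<std::int64_t>
possible_results (char op, std::int64_t lhs,
                  const std::vector<std::int64_t> &rhs_values)
{
  std::set<std::int64_t> results;
  for (std::int64_t rhs : rhs_values)
    {
      if (rhs == 0)
        continue;
      results.insert (op == '/' ? lhs / rhs : lhs % rhs);
    }
  return results;
}

// Return true and set RESULT when exactly one result is possible, so
// the whole expression can be replaced by a constant.
bool
fold_to_constant (char op, std::int64_t lhs,
                  const std::vector<std::int64_t> &rhs_values,
                  std::int64_t &result)
{
  const std::set<std::int64_t> results
    = possible_results (op, lhs, rhs_values);
  if (results.size () != 1)
    return false;
  result = *results.begin ();
  return true;
}
```

With the constant from the testcase, a divisor that is either 0 or 2
leaves 972195717 / 2 = 486097858 as the only result, while a divisor
that is either 1 or 2 leaves two results for %, which is the case that
stays a ?: selection.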


Is this acceptable?  The testcase change would look like:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
index cfec54de991..e579fcb680f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_2.c
@@ -45,10 +45,14 @@ int bar2 ()
   return 0;
 }
 
+/* EVRP now transforms all 3 functions. */
+/* { dg-final { scan-tree-dump-times "__builtin_abort" 1 "evrp" } } */
 
 /* Dont optimize 972195717 / 0 in function foo.  */
-/* { dg-final { scan-tree-dump-times "972195717 / " 1  "evrp" } } */
+/* { dg-final { scan-tree-dump-times "972195717 / " 0  "evrp" } } */
 /* Dont optimize 972195717 % 0 in function bar.  */
-/* { dg-final { scan-tree-dump-times "972195717 % " 1 "evrp" } } */
+/* { dg-final { scan-tree-dump-times "972195717 % " 0 "evrp" } } */
 /* May optimize in function bar2, but EVRP doesn't perform this yet.  */
 /* { dg-final { scan-tree-dump-times "972195715 % " 0 "evrp" } } */
+
+

[PATCH] RISC-V: Allow unaligned accesses in cpymemsi expansion

2021-07-29 Thread Christoph Muellner via Gcc-patches

The RISC-V cpymemsi expansion is called whenever the by-pieces
infrastructure does not take care of the builtin expansion.
Currently, by-pieces handles e.g. memcpy() with n <= 24 bytes.
The code emitted by the by-pieces infrastructure performs
unaligned accesses if the target's riscv_slow_unaligned_access_p
is false (and n is not 1).

If n > 24, then the RISC-V cpymemsi expansion is called, which is
implemented in riscv_expand_block_move(). The current implementation
does not check riscv_slow_unaligned_access_p and never emits unaligned
accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion. And that's what this patch
is doing.

The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.

Without the patch, a memcpy() with n==25 will be expanded only
if the given pointers are aligned. With the patch, unaligned
pointers are also accepted if riscv_slow_unaligned_access_p is false.
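
The alignment selection can be sketched as plain arithmetic.  The
function below is a hypothetical model, not code from riscv.c: the
parameters stand in for riscv_slow_unaligned_access_p, MEM_ALIGN of the
two operands, BITS_PER_WORD and BITS_PER_UNIT, and the choice of a full
word for the fast-unaligned case is an assumption about what "sets the
allowed alignment accordingly" means.

```cpp
#include <algorithm>

// Width in bits of each load/store used for the block move.
unsigned
block_move_access_bits (bool slow_unaligned_access,
                        unsigned src_align_bits, unsigned dest_align_bits,
                        unsigned bits_per_word = 64,
                        unsigned bits_per_unit = 8)
{
  // With slow unaligned accesses, respect the weaker of the two known
  // alignments; otherwise allow word-sized accesses regardless.
  const unsigned align = slow_unaligned_access
                           ? std::min (src_align_bits, dest_align_bits)
                           : bits_per_word;

  // Same clamp as in riscv_block_move_straight after the patch.
  return std::max (bits_per_unit, std::min (bits_per_word, align));
}
```

For a byte-aligned source on a 64-bit target this yields 8-bit accesses
when unaligned access is slow and 64-bit accesses when it is fast.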

gcc/ChangeLog:

* config/riscv/riscv.c (riscv_block_move_straight): Add
parameter align.
(riscv_adjust_block_mem): Replace parameter length by parameter
align.
(riscv_block_move_loop): Add parameter align.
(riscv_expand_block_move): Set alignment properly if the target
has fast unaligned access.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtins-strict-align.c: New test.
* gcc.target/riscv/builtins-unaligned-1.c: New test.
* gcc.target/riscv/builtins-unaligned-2.c: New test.
* gcc.target/riscv/builtins-unaligned-3.c: New test.
* gcc.target/riscv/builtins-unaligned-4.c: New test.
* gcc.target/riscv/builtins.h: New test.

Signed-off-by: Christoph Muellner 
---
 gcc/config/riscv/riscv.c  | 53 +++
 .../gcc.target/riscv/builtins-strict-align.c  | 13 +
 .../gcc.target/riscv/builtins-unaligned-1.c   | 15 ++
 .../gcc.target/riscv/builtins-unaligned-2.c   | 15 ++
 .../gcc.target/riscv/builtins-unaligned-3.c   | 15 ++
 .../gcc.target/riscv/builtins-unaligned-4.c   | 15 ++
 gcc/testsuite/gcc.target/riscv/builtins.h | 10 
 7 files changed, 115 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-strict-align.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-unaligned-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-unaligned-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-unaligned-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-unaligned-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/builtins.h

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 576960bb37c..0596a9ff1b6 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3173,11 +3173,13 @@ riscv_legitimize_call_address (rtx addr)
   return addr;
 }
 
-/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST
+   with accesses that are ALIGN bytes aligned.
Assume that the areas do not overlap.  */
 
 static void
-riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
+riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+  unsigned HOST_WIDE_INT align)
 {
   unsigned HOST_WIDE_INT offset, delta;
   unsigned HOST_WIDE_INT bits;
@@ -3185,8 +3187,7 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length)
   enum machine_mode mode;
   rtx *regs;
 
-  bits = MAX (BITS_PER_UNIT,
- MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+  bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align));
 
   mode = mode_for_size (bits, MODE_INT, 0).require ();
   delta = bits / BITS_PER_UNIT;
@@ -3211,21 +3212,20 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length)
 {
   src = adjust_address (src, BLKmode, offset);
   dest = adjust_address (dest, BLKmode, offset);
-  move_by_pieces (dest, src, length - offset,
- MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
+  move_by_pieces (dest, src, length - offset, align, RETURN_BEGIN);
 }
 }
 
 /* Helper function for doing a loop-based block operation on memory
-   reference MEM.  Each iteration of the loop will operate on LENGTH
-   bytes of MEM.
+   reference MEM.
 
Create a new base register for use within the loop and point it to
the start of MEM.  Create a new memory reference that uses this
-   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+   register and has an alignment of ALIGN.  Store them in *LOOP_REG
+   and *LOOP_MEM respectively.  */
 
 static void
-riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,

[ARM] PR66791: Replace builtin in vld1_dup intrinsics

2021-07-29 Thread Prathamesh Kulkarni via Gcc-patches

Hi,
The attached patch replaces the builtins in the vld1_dup intrinsics with
calls to the corresponding vdup_n intrinsics and removes the entry for
vld1_dup from arm_neon_builtins.def.
Bootstrapped+tested on arm-linux-gnueabihf.
OK to commit ?
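
A portable model of the rewrite, using std::array in place of the NEON
vector types, looks like this.  The names `i8x8`, `dup_n_s8` and
`ld1_dup_s8` are hypothetical stand-ins for int8x8_t, vdup_n_s8 and
vld1_dup_s8, so this shows only the shape of the change.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of an 8-lane vector of signed bytes.
using i8x8 = std::array<std::int8_t, 8>;

// Broadcast one scalar into every lane.
i8x8
dup_n_s8 (std::int8_t value)
{
  i8x8 v;
  v.fill (value);
  return v;
}

// Load-and-duplicate expressed as "dereference, then broadcast",
// instead of a dedicated builtin.
i8x8
ld1_dup_s8 (const std::int8_t *a)
{
  return dup_n_s8 (*a);
}
```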

Thanks,
Prathamesh
gcc/ChangeLog:

PR target/66791
* config/arm/arm_neon.h (vld1_dup_s8): Replace builtin with call to
corresponding vdup_n intrinsic.
(vld1_dup_s16): Likewise.
(vld1_dup_s32): Likewise.
(vld1_dup_f32): Likewise.
(vld1_dup_u8): Likewise.
(vld1_dup_u16): Likewise.
(vld1_dup_u32): Likewise.
(vld1_dup_p8): Likewise.
(vld1_dup_p16): Likewise.
(vld1_dup_p64): Likewise.
(vld1_dup_s64): Likewise.
(vld1_dup_u64): Likewise.
(vld1q_dup_s8): Likewise.
(vld1q_dup_s16): Likewise.
(vld1q_dup_s32): Likewise.
(vld1q_dup_f32): Likewise.
(vld1q_dup_u8): Likewise.
(vld1q_dup_u16): Likewise.
(vld1q_dup_u32): Likewise.
(vld1q_dup_p8): Likewise.
(vld1q_dup_p16): Likewise.
(vld1q_dup_p64): Likewise.
(vld1q_dup_s64): Likewise.
(vld1q_dup_u64): Likewise.
* config/arm/arm_neon_builtins.def (vld1_dup): Remove entry.


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 41b596b5fc6..bc55dacffd3 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10683,21 +10683,21 @@ __extension__ extern __inline int8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_s8 (const int8_t * __a)
 {
-  return (int8x8_t)__builtin_neon_vld1_dupv8qi ((const __builtin_neon_qi *) 
__a);
+  return vdup_n_s8 (*__a);
 }
 
 __extension__ extern __inline int16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_s16 (const int16_t * __a)
 {
-  return (int16x4_t)__builtin_neon_vld1_dupv4hi ((const __builtin_neon_hi *) 
__a);
+  return vdup_n_s16 (*__a);
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_s32 (const int32_t * __a)
 {
-  return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) 
__a);
+  return vdup_n_s32 (*__a);
 }
 
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
@@ -10714,42 +10714,42 @@ __extension__ extern __inline float32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_f32 (const float32_t * __a)
 {
-  return (float32x2_t)__builtin_neon_vld1_dupv2sf ((const __builtin_neon_sf *) 
__a);
+  return vdup_n_f32 (*__a);
 }
 
 __extension__ extern __inline uint8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_u8 (const uint8_t * __a)
 {
-  return (uint8x8_t)__builtin_neon_vld1_dupv8qi ((const __builtin_neon_qi *) 
__a);
+  return vdup_n_u8 (*__a);
 }
 
 __extension__ extern __inline uint16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_u16 (const uint16_t * __a)
 {
-  return (uint16x4_t)__builtin_neon_vld1_dupv4hi ((const __builtin_neon_hi *) 
__a);
+  return vdup_n_u16 (*__a);
 }
 
 __extension__ extern __inline uint32x2_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_u32 (const uint32_t * __a)
 {
-  return (uint32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) 
__a);
+  return vdup_n_u32 (*__a);
 }
 
 __extension__ extern __inline poly8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_p8 (const poly8_t * __a)
 {
-  return (poly8x8_t)__builtin_neon_vld1_dupv8qi ((const __builtin_neon_qi *) 
__a);
+  return vdup_n_p8 (*__a);
 }
 
 __extension__ extern __inline poly16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_p16 (const poly16_t * __a)
 {
-  return (poly16x4_t)__builtin_neon_vld1_dupv4hi ((const __builtin_neon_hi *) 
__a);
+  return vdup_n_p16 (*__a);
 }
 
 #pragma GCC push_options
@@ -10758,7 +10758,7 @@ __extension__ extern __inline poly64x1_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_p64 (const poly64_t * __a)
 {
-  return (poly64x1_t)__builtin_neon_vld1_dupdi ((const __builtin_neon_di *) 
__a);
+  return vdup_n_p64 (*__a);
 }
 
 #pragma GCC pop_options
@@ -10766,35 +10766,35 @@ __extension__ extern __inline int64x1_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_s64 (const int64_t * __a)
 {
-  return (int64x1_t)__builtin_neon_vld1_dupdi ((const __builtin_neon_di *) 
__a);
+  return vdup_n_s64 (*__a);
 }
 
 __extension__ extern __inline uint64x1_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_dup_u64 (const uint64_t * __a)
 {
-  return (uint64x1_t)__builtin_neon_vld1_dupdi ((const __builtin_neon_di *) 
__a);
+  return vdup_n_u64 (*__a);
 }
 
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__,

[PATCH] c++: __builtin_is_pointer_interconvertible_with_class incremental fix [PR101539]

2021-07-29 Thread Jakub Jelinek via Gcc-patches

On Thu, Jul 29, 2021 at 09:50:10AM +0200, Jakub Jelinek via Gcc-patches wrote:
> Now that I'm writing the above text and rereading the
> pointer-interconvertibility definition, I think my 
> first_nonstatic_data_member_p
> and fold_builtin_is_pointer_inverconvertible_with_class have one bug,
> for unions the pointer inter-convertibility doesn't talk about std layout at
> all, so I think I need to check for std_layout_type_p only for non-union
> class types and accept any union, std_layout_type_p or not.  But when
> recursing from a union type into anonymous structure type punt if the
> anonymous structure type is not std_layout_type_p + add testcase coverage.

For this part, here is an incremental fix.  Tested on x86_64-linux.

It also shows that in the case (we're beyond the standard in this case
because anonymous structures are not in the standard) of union with
non-std-layout anonymous structure in it, in the case in the testcases like:
struct D {};
struct E { [[no_unique_address]] D e; };
union Y { int a; struct : public E { short b; long c; }; long long d; };
the builtin will return false for ::b - while ::b is at offset zero,
the anonymous structure is not std-layout and therefore the
pointer-interconvertibility rules say pointers aren't interconvertible.
But in case like:
union Y2 { int a; struct : public E { int b; long c; }; long long d; };
it will return true for ::b - while the same applies, there is
another union member with int type.  In theory when seeing the PTRMEM_CST
we could still differentiate, ::a is ok but ::b is not.  But as soon
as we have just an INTEGER_CST with OFFSET_TYPE or need to check it at
runtime, all we know is that we have pointer to int data member in Y2
at offset 0, and that is the same for ::a and ::b.
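
The union part of the fix can be illustrated without the C++20 trait.
The sketch below reuses the union Z added to the testcases; the helper
`members_share_union_address` and the constant `z_is_standard_layout`
are hypothetical.  It shows that Z is not a standard-layout type, yet
its accessible members still live at the address of the union object,
which is why checking std-layout for unions would be too strict.

```cpp
#include <type_traits>

// Same shape as the union Z from the testcase: mixed access control
// makes it a non-standard-layout type.
union Z
{
  int a;
private:
  int b;
protected:
  int c;
public:
  int d;
};

// Whether the standard-layout trait holds for Z.
constexpr bool z_is_standard_layout = std::is_standard_layout<Z>::value;

// Compare the address of a Z object with the addresses of its public
// members.
bool
members_share_union_address ()
{
  Z z;
  z.a = 0;
  const void *whole = &z;
  return whole == static_cast<const void *> (&z.a)
         && whole == static_cast<const void *> (&z.d);
}
```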

2021-07-29  Jakub Jelinek  

PR c++/101539
* semantics.c (first_nonstatic_data_member_p): Don't recurse into
non-std-layout non-union class types from union type.
(fold_builtin_is_pointer_inverconvertible_with_class): Don't check
std-layout type for union types.

* g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C: Add
tests for non-std-layout union type.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class2.C: Likewise.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class4.C: Add
tests for non-std-layout anonymous class type in union.
* g++.dg/cpp2a/is-pointer-interconvertible-with-class5.C: Likewise.

--- gcc/cp/semantics.c.jj   2021-07-28 23:06:38.665443459 +0200
+++ gcc/cp/semantics.c  2021-07-29 15:44:30.659713391 +0200
@@ -10631,7 +10631,9 @@ first_nonstatic_data_member_p (tree type
  if (TREE_CODE (type) != UNION_TYPE)
return first_nonstatic_data_member_p (TREE_TYPE (field),
  membertype);
- if (first_nonstatic_data_member_p (TREE_TYPE (field), membertype))
+ if ((TREE_CODE (TREE_TYPE (field)) == UNION_TYPE
+  || std_layout_type_p (TREE_TYPE (field)))
+ && first_nonstatic_data_member_p (TREE_TYPE (field), membertype))
return true;
}
   else if (TREE_CODE (type) != UNION_TYPE)
@@ -10677,7 +10679,8 @@ fold_builtin_is_pointer_inverconvertible
   if (!complete_type_or_else (basetype, NULL_TREE))
 return boolean_false_node;

-  if (!std_layout_type_p (basetype))
+  if (TREE_CODE (basetype) != UNION_TYPE
+  && !std_layout_type_p (basetype))
 return boolean_false_node;

   if (!first_nonstatic_data_member_p (basetype, membertype))
--- gcc/testsuite/g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C.jj 
2021-07-28 23:06:38.667443431 +0200
+++ gcc/testsuite/g++.dg/cpp2a/is-pointer-interconvertible-with-class1.C
2021-07-29 15:46:55.809743808 +0200
@@ -28,6 +28,7 @@ union U { int a; double b; long long c;
 struct V { union { int a; long b; }; int c; };
 union X { int a; union { short b; long c; }; long long d; };
 struct Y { void foo () {} };
+union Z { int a; private: int b; protected: int c; public: int d; };

 static_assert (std::is_pointer_interconvertible_with_class (::b));
 static_assert (!std::is_pointer_interconvertible_with_class (::b2));
@@ -60,3 +61,5 @@ static_assert (std::is_pointer_interconv
 static_assert (std::is_pointer_interconvertible_with_class (::d));
 static_assert (!std::is_pointer_interconvertible_with_class ((int B::*) 
nullptr));
 static_assert (!std::is_pointer_interconvertible_with_class (::foo));
+static_assert (std::is_pointer_interconvertible_with_class (::a));
+static_assert (std::is_pointer_interconvertible_with_class (::d));
--- gcc/testsuite/g++.dg/cpp2a/is-pointer-interconvertible-with-class2.C.jj 
2021-07-28 23:06:38.667443431 +0200
+++ gcc/testsuite/g++.dg/cpp2a/is-pointer-interconvertible-with-class2.C
2021-07-29 15:48:33.075423974 +0200
@@ -28,6 +28,7 @@ union U { int a; double b; long long c;
 struct V { union { int a; long b; }; int c; };
 union X { int a; union

[PATCH 31/34] rs6000: Debug support

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_debug_type): New function.
(def_builtin): Change debug formatting for easier parsing and
include more information.
(rs6000_init_builtins): Add dump of autogenerated builtins.
(altivec_init_builtins): Dump __builtin_altivec_mask_for_load for
completeness.
---
 gcc/config/rs6000/rs6000-call.c | 191 +++-
 1 file changed, 185 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index d14d58be7d7..8e2f76f1b5c 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -8880,6 +8880,106 @@ rs6000_gimplify_va_arg (tree valist, tree type, 
gimple_seq *pre_p,
 
 /* Builtins.  */
 
+/* Debug utility to translate a type node to a single token.  */
+static
+const char *rs6000_debug_type (tree type)
+{
+  if (type == void_type_node)
+return "void";
+  else if (type == long_integer_type_node)
+return "long";
+  else if (type == long_unsigned_type_node)
+return "ulong";
+  else if (type == long_long_integer_type_node)
+return "longlong";
+  else if (type == long_long_unsigned_type_node)
+return "ulonglong";
+  else if (type == bool_V2DI_type_node)
+return "vbll";
+  else if (type == bool_V4SI_type_node)
+return "vbi";
+  else if (type == bool_V8HI_type_node)
+return "vbs";
+  else if (type == bool_V16QI_type_node)
+return "vbc";
+  else if (type == bool_int_type_node)
+return "bool";
+  else if (type == dfloat64_type_node)
+return "_Decimal64";
+  else if (type == double_type_node)
+return "double";
+  else if (type == intDI_type_node)
+return "sll";
+  else if (type == intHI_type_node)
+return "ss";
+  else if (type == ibm128_float_type_node)
+return "__ibm128";
+  else if (type == opaque_V4SI_type_node)
+return "opaque";
+  else if (POINTER_TYPE_P (type))
+return "void*";
+  else if (type == intQI_type_node || type == char_type_node)
+return "sc";
+  else if (type == dfloat32_type_node)
+return "_Decimal32";
+  else if (type == float_type_node)
+return "float";
+  else if (type == intSI_type_node || type == integer_type_node)
+return "si";
+  else if (type == dfloat128_type_node)
+return "_Decimal128";
+  else if (type == long_double_type_node)
+return "longdouble";
+  else if (type == intTI_type_node)
+return "sq";
+  else if (type == unsigned_intDI_type_node)
+return "ull";
+  else if (type == unsigned_intHI_type_node)
+return "us";
+  else if (type == unsigned_intQI_type_node)
+return "uc";
+  else if (type == unsigned_intSI_type_node)
+return "ui";
+  else if (type == unsigned_intTI_type_node)
+return "uq";
+  else if (type == unsigned_V1TI_type_node)
+return "vuq";
+  else if (type == unsigned_V2DI_type_node)
+return "vull";
+  else if (type == unsigned_V4SI_type_node)
+return "vui";
+  else if (type == unsigned_V8HI_type_node)
+return "vus";
+  else if (type == unsigned_V16QI_type_node)
+return "vuc";
+  else if (type == V16QI_type_node)
+return "vsc";
+  else if (type == V1TI_type_node)
+return "vsq";
+  else if (type == V2DF_type_node)
+return "vd";
+  else if (type == V2DI_type_node)
+return "vsll";
+  else if (type == V4SF_type_node)
+return "vf";
+  else if (type == V4SI_type_node)
+return "vsi";
+  else if (type == V8HI_type_node)
+return "vss";
+  else if (type == pixel_V8HI_type_node)
+return "vp";
+  else if (type == pcvoid_type_node)
+return "voidc*";
+  else if (type == float128_type_node)
+return "_Float128";
+  else if (type == vector_pair_type_node)
+return "__vector_pair";
+  else if (type == vector_quad_type_node)
+return "__vector_quad";
+  else
+return "unknown";
+}
+
 static void
 def_builtin (const char *name, tree type, enum rs6000_builtins code)
 {
@@ -8908,7 +9008,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
   /* const function, function only depends on the inputs.  */
   TREE_READONLY (t) = 1;
   TREE_NOTHROW (t) = 1;
-  attr_string = ", const";
+  attr_string = "= const";
 }
   else if ((classify & RS6000_BTC_PURE) != 0)
 {
@@ -8916,7 +9016,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 external state.  */
   DECL_PURE_P (t) = 1;
   TREE_NOTHROW (t) = 1;
-  attr_string = ", pure";
+  attr_string = "= pure";
 }
   else if ((classify & RS6000_BTC_FP) != 0)
 {
@@ -8930,12 +9030,12 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
{
  DECL_PURE_P (t) = 1;
  DECL_IS_NOVOPS (t) = 1;
- attr_string = ", fp, pure";
+ attr_string = "= fp, pure";
}
   else
{
  TREE_READONLY (t) = 1;
- attr_string = ", fp, const";
+ attr_string = "= fp, const";
}
 }

[PATCH 23/34] rs6000: Builtin expansion, part 1

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-17  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): New
forward decl.
(rs6000_invalid_new_builtin): New stub function.
(rs6000_expand_builtin): Call rs6000_expand_new_builtin.
(rs6000_expand_ldst_mask): New stub function.
(new_cpu_expand_builtin): Likewise.
(elemrev_icode): Likewise.
(ldv_expand_builtin): Likewise.
(lxvrse_expand_builtin): Likewise.
(lxvrze_expand_builtin): Likewise.
(stv_expand_builtin): Likewise.
(new_mma_expand_builtin): Likewise.
(new_htm_expand_builtin): Likewise.
(rs6000_expand_new_builtin): New function.
---
 gcc/config/rs6000/rs6000-call.c | 526 
 1 file changed, 526 insertions(+)
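
The switch-over pattern this patch introduces (check new_builtins_are_live first, otherwise fall through to the old code) can be sketched in isolation as follows. This is only an illustration under stated assumptions: the flag is modeled as a plain global, and `expand_old_builtin`, `expand_new_builtin` and `expand_builtin` are hypothetical placeholders that return a tag instead of RTL.

```cpp
#include <string>

// Stand-in for the new_builtins_are_live switch tested in the patch.
static bool new_builtins_are_live = false;

// Placeholder bodies: each returns a tag so the chosen path is observable.
static std::string expand_old_builtin (int) { return "old"; }
static std::string expand_new_builtin (int) { return "new"; }

// Entry point: defer to the new path first, as rs6000_expand_builtin does,
// and only otherwise run the existing implementation.
std::string
expand_builtin (int fcode)
{
  if (new_builtins_are_live)
    return expand_new_builtin (fcode);
  return expand_old_builtin (fcode);
}
```

Keeping both paths behind one flag lets the series land piecewise, with stubs filled in by later patches, without changing behavior until the flag is flipped.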

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index f75b5c8176c..4719d074455 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
 static void rs6000_common_init_builtins (void);
 static void htm_init_builtins (void);
 static void mma_init_builtins (void);
+static rtx rs6000_expand_new_builtin (tree, rtx, rtx, machine_mode, int);
 static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
 
 
@@ -11664,6 +11665,14 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
 error ("%qs is not supported with the current options", name);
 }
 
+/* Raise an error message for a builtin function that is called without the
+   appropriate target options being set.  */
+
+static void
+rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
+{
+}
+
 /* Target hook for early folding of built-ins, shamelessly stolen
from ia64.c.  */
 
@@ -14255,6 +14264,9 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
   machine_mode mode ATTRIBUTE_UNUSED,
   int ignore ATTRIBUTE_UNUSED)
 {
+  if (new_builtins_are_live)
+return rs6000_expand_new_builtin (exp, target, subtarget, mode, ignore);
+
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   enum rs6000_builtins fcode
 = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
@@ -14547,6 +14559,520 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
   gcc_unreachable ();
 }
 
+/* Expand ALTIVEC_BUILTIN_MASK_FOR_LOAD.  */
+rtx
+rs6000_expand_ldst_mask (rtx target, tree arg0)
+ {
+  return target;
+ }
+
+/* Expand the CPU builtin in FCODE and store the result in TARGET.  */
+static rtx
+new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
+   tree exp ATTRIBUTE_UNUSED, rtx target)
+{
+  return target;
+}
+
+static insn_code
+elemrev_icode (rs6000_gen_builtins fcode)
+{
+  return (insn_code) 0;
+}
+
+static rtx
+ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
+{
+  return target;
+}
+
+static rtx
+lxvrse_expand_builtin (rtx target, insn_code icode, rtx *op,
+  machine_mode tmode, machine_mode smode)
+{
+  return target;
+}
+
+static rtx
+lxvrze_expand_builtin (rtx target, insn_code icode, rtx *op,
+  machine_mode tmode, machine_mode smode)
+{
+  return target;
+}
+
+static rtx
+stv_expand_builtin (insn_code icode, rtx *op,
+   machine_mode tmode, machine_mode smode)
+{
+  return NULL_RTX;
+}
+
+/* Expand the MMA built-in in EXP.  */
+static rtx
+new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
+   rs6000_gen_builtins fcode)
+{
+  return target;
+}
+
+/* Expand the HTM builtin in EXP and store the result in TARGET.  */
+static rtx
+new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
+   tree exp, rtx target)
+{
+  return const0_rtx;
+}
+
+/* Expand an expression EXP that calls a built-in function,
+   with result going to TARGET if that's convenient
+   (and in mode MODE if that's convenient).
+   SUBTARGET may be used as the target for computing one of EXP's operands.
+   IGNORE is nonzero if the value is to be ignored.
+   Use the new builtin infrastructure.  */
+static rtx
+rs6000_expand_new_builtin (tree exp, rtx target,
+  rtx subtarget ATTRIBUTE_UNUSED,
+  machine_mode ignore_mode ATTRIBUTE_UNUSED,
+  int ignore ATTRIBUTE_UNUSED)
+{
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  enum rs6000_gen_builtins fcode
+= (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  size_t uns_fcode = (size_t)fcode;
+  enum insn_code icode = rs6000_builtin_info_x[uns_fcode].icode;
+
+  /* We have two different modes (KFmode, TFmode) that are the IEEE 128-bit
+ floating point type, depending on whether long double is the IBM extended
+ double (KFmode) or long double is IEEE 128-bit (TFmode).  It is simpler if
+ we only define one variant of the

[PATCH 21/34] rs6000: Handle some recent MMA builtin changes

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-07-27  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint
flag.
(ASSEMBLE_PAIR): Likewise.
(BUILD_ACC): Likewise.
(DISASSEMBLE_ACC): Likewise.
(DISASSEMBLE_PAIR): Likewise.
(PMXVBF16GER2): Likewise.
(PMXVBF16GER2NN): Likewise.
(PMXVBF16GER2NP): Likewise.
(PMXVBF16GER2PN): Likewise.
(PMXVBF16GER2PP): Likewise.
(PMXVF16GER2): Likewise.
(PMXVF16GER2NN): Likewise.
(PMXVF16GER2NP): Likewise.
(PMXVF16GER2PN): Likewise.
(PMXVF16GER2PP): Likewise.
(PMXVF32GER): Likewise.
(PMXVF32GERNN): Likewise.
(PMXVF32GERNP): Likewise.
(PMXVF32GERPN): Likewise.
(PMXVF32GERPP): Likewise.
(PMXVF64GER): Likewise.
(PMXVF64GERNN): Likewise.
(PMXVF64GERNP): Likewise.
(PMXVF64GERPN): Likewise.
(PMXVF64GERPP): Likewise.
(PMXVI16GER2): Likewise.
(PMXVI16GER2PP): Likewise.
(PMXVI16GER2S): Likewise.
(PMXVI16GER2SPP): Likewise.
(PMXVI4GER8): Likewise.
(PMXVI4GER8PP): Likewise.
(PMXVI8GER4): Likewise.
(PMXVI8GER4PP): Likewise.
(PMXVI8GER4SPP): Likewise.
(XVBF16GER2): Likewise.
(XVBF16GER2NN): Likewise.
(XVBF16GER2NP): Likewise.
(XVBF16GER2PN): Likewise.
(XVBF16GER2PP): Likewise.
(XVF16GER2): Likewise.
(XVF16GER2NN): Likewise.
(XVF16GER2NP): Likewise.
(XVF16GER2PN): Likewise.
(XVF16GER2PP): Likewise.
(XVF32GER): Likewise.
(XVF32GERNN): Likewise.
(XVF32GERNP): Likewise.
(XVF32GERPN): Likewise.
(XVF32GERPP): Likewise.
(XVF64GER): Likewise.
(XVF64GERNN): Likewise.
(XVF64GERNP): Likewise.
(XVF64GERPN): Likewise.
(XVF64GERPP): Likewise.
(XVI16GER2): Likewise.
(XVI16GER2PP): Likewise.
(XVI16GER2S): Likewise.
(XVI16GER2SPP): Likewise.
(XVI4GER8): Likewise.
(XVI4GER8PP): Likewise.
(XVI8GER4): Likewise.
(XVI8GER4PP): Likewise.
(XVI8GER4SPP): Likewise.
(XXMFACC): Likewise.
(XXMTACC): Likewise.
(XXSETACCZ): Likewise.
(ASSEMBLE_PAIR_V): Likewise.
(BUILD_PAIR): Likewise.
(DISASSEMBLE_PAIR_V): Likewise.
(LXVP): New.
(STXVP): New.
* config/rs6000/rs6000-call.c
(rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
RS6000_BIF_STXVP.
* config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint.
(parse_bif_attrs): Handle ismmaint.
(write_decls): Add bif_mmaint_bit and bif_is_mmaint.
(write_bif_static_init): Handle ismmaint.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 145 ---
 gcc/config/rs6000/rs6000-call.c  |  32 -
 gcc/config/rs6000/rs6000-gen-builtins.c  |  38 +++---
 3 files changed, 129 insertions(+), 86 deletions(-)
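
To illustrate how an attribute list such as {mma,quad,mmaint} could be turned into per-builtin flag bits, here is a small standalone sketch. It is not the code in rs6000-gen-builtins.c: `parse_attrs`, `BIF_ATTR_ERROR` and the bit values are hypothetical, and only the attribute names and the bif_is_mmaint idea come from the patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical bit assignments; the generated bif_*_bit values may differ.
constexpr unsigned BIF_MMA_BIT = 1u << 0;
constexpr unsigned BIF_QUAD_BIT = 1u << 1;
constexpr unsigned BIF_PAIR_BIT = 1u << 2;
constexpr unsigned BIF_MMAINT_BIT = 1u << 3;
constexpr unsigned BIF_ATTR_ERROR = ~0u;

// Parse a brace-enclosed, comma-separated attribute list into a bit mask.
// Return BIF_ATTR_ERROR for a malformed list or an unknown attribute.
unsigned
parse_attrs (const std::string &text)
{
  if (text.size () < 2 || text.front () != '{' || text.back () != '}')
    return BIF_ATTR_ERROR;

  unsigned mask = 0;
  std::size_t pos = 1;
  const std::size_t end = text.size () - 1;	// index of the closing brace
  while (pos < end)
    {
      std::size_t comma = text.find (',', pos);
      if (comma == std::string::npos || comma > end)
	comma = end;
      const std::string name = text.substr (pos, comma - pos);
      if (name == "mma")
	mask |= BIF_MMA_BIT;
      else if (name == "quad")
	mask |= BIF_QUAD_BIT;
      else if (name == "pair")
	mask |= BIF_PAIR_BIT;
      else if (name == "mmaint")
	mask |= BIF_MMAINT_BIT;
      else
	return BIF_ATTR_ERROR;
      pos = comma + 1;
    }
  return mask;
}

// Predicate in the spirit of bif_is_mmaint.
constexpr bool
bif_is_mmaint (unsigned mask)
{
  return (mask & BIF_MMAINT_BIT) != 0;
}
```

With this shape, the folding code only needs a single bit test to decide whether a builtin expands to its internal counterpart at GIMPLE time.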

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 91dce7fbc91..c1bf545c408 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -129,6 +129,7 @@
 ;   mma  Needs special handling for MMA
 ;   quad MMA instruction using a register quad as an input operand
 ;   pair MMA instruction using a register pair as an input operand
+;   mmaint   MMA instruction expanding to internal call at GIMPLE time
 ;   no32bit  Not valid for TARGET_32BIT
 ;   32bit    Requires different handling for TARGET_32BIT
 ;   cpu  This is a "cpu_is" or "cpu_supports" builtin
@@ -3584,415 +3585,421 @@
 
 [mma]
   void __builtin_mma_assemble_acc (v512 *, vuc, vuc, vuc, vuc);
-ASSEMBLE_ACC nothing {mma}
+ASSEMBLE_ACC nothing {mma,mmaint}
 
   v512 __builtin_mma_assemble_acc_internal (vuc, vuc, vuc, vuc);
 ASSEMBLE_ACC_INTERNAL mma_assemble_acc {mma}
 
   void __builtin_mma_assemble_pair (v256 *, vuc, vuc);
-ASSEMBLE_PAIR nothing {mma}
+ASSEMBLE_PAIR nothing {mma,mmaint}
 
   v256 __builtin_mma_assemble_pair_internal (vuc, vuc);
 ASSEMBLE_PAIR_INTERNAL vsx_assemble_pair {mma}
 
   void __builtin_mma_build_acc (v512 *, vuc, vuc, vuc, vuc);
-BUILD_ACC nothing {mma}
+BUILD_ACC nothing {mma,mmaint}
 
   v512 __builtin_mma_build_acc_internal (vuc, vuc, vuc, vuc);
 BUILD_ACC_INTERNAL mma_assemble_acc {mma}
 
   void __builtin_mma_disassemble_acc (void *, v512 *);
-DISASSEMBLE_ACC nothing {mma,quad}
+DISASSEMBLE_ACC nothing {mma,quad,mmaint}
 
   vuc __builtin_mma_disassemble_acc_internal (v512, const int<2>);
 DISASSEMBLE_ACC_INTERNAL mma_disassemble_acc {mma}
 
   void __builtin_mma_disassemble_pair (void *, v256 *);
-DISASSEMBLE_PAIR nothing {mma,pair}
+DISASSEMBLE_PAIR nothing {mma,pair,mmaint}
 
   vuc __builtin_mma_disassemble_pair_internal (v256, const int<2>);
 DISASSEMBLE_PAIR_INTERNAL

[PATCH 29/34] rs6000: Update rs6000_builtin_decl

2021-07-29 Thread Bill Schmidt via Gcc-patches

Create a new version of this function that uses the new infrastructure,
and particularly checks for supported builtins the new way.

2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_new_builtin_decl): New
function.
(rs6000_builtin_decl): Call it.
---
 gcc/config/rs6000/rs6000-call.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index eaf62d734f1..d14d58be7d7 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -16320,11 +16320,31 @@ rs6000_init_builtins (void)
 }
 }
 
+static tree
+rs6000_new_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+{
+  rs6000_gen_builtins fcode = (rs6000_gen_builtins) code;
+
+  if (fcode >= RS6000_OVLD_MAX)
+return error_mark_node;
+
+  if (!rs6000_new_builtin_is_supported_p (fcode))
+{
+  rs6000_invalid_new_builtin (fcode);
+  return error_mark_node;
+}
+
+  return rs6000_builtin_decls_x[code];
+}
+
 /* Returns the rs6000 builtin decl for CODE.  */
 
 tree
 rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
 {
+  if (new_builtins_are_live)
+return rs6000_new_builtin_decl (code, initialize_p);
+
   HOST_WIDE_INT fnmask;
 
   if (code >= RS6000_BUILTIN_COUNT)
-- 
2.27.0
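
The lookup in rs6000_new_builtin_decl has two guards before it indexes the decl table: a range check and a "supported" check. A minimal sketch of that shape, assuming a tiny hypothetical table (`builtin_decls`, `builtin_supported`) and a string sentinel in place of error_mark_node:

```cpp
#include <string>

// Hypothetical miniature of the builtin enumeration; BIF_MAX plays the role
// of the upper bound tested in the patch.
enum gen_builtin { BIF_NONE, BIF_A, BIF_B, BIF_MAX };

// Hypothetical tables; the real ones hold tree nodes and target checks.
static const char *const builtin_decls[BIF_MAX] = { "", "decl_a", "decl_b" };
static const bool builtin_supported[BIF_MAX] = { false, true, false };

// Sentinel standing in for error_mark_node.
static const std::string error_mark = "<error>";

// Return the decl for CODE, or the error sentinel if CODE is out of range
// or names a builtin that is not supported with the current options.
std::string
new_builtin_decl (unsigned code)
{
  if (code >= BIF_MAX)
    return error_mark;
  if (!builtin_supported[code])
    return error_mark;
  return builtin_decls[code];
}
```

The range check has to come first, since the supported check and the table access both index by the code.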

[PATCH 33/34] rs6000: Test case adjustments

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-07-19  Bill Schmidt  

gcc/testsuite/
* gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Adjust.
* gcc.target/powerpc/bfp/scalar-extract-sig-2.c: Adjust.
* gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Adjust.
* gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Adjust.
* gcc.target/powerpc/bfp/scalar-insert-exp-8.c: Adjust.
* gcc.target/powerpc/bfp/scalar-test-neg-2.c: Adjust.
* gcc.target/powerpc/bfp/scalar-test-neg-3.c: Adjust.
* gcc.target/powerpc/bfp/scalar-test-neg-5.c: Adjust.
* gcc.target/powerpc/byte-in-set-2.c: Adjust.
* gcc.target/powerpc/cmpb-2.c: Adjust.
* gcc.target/powerpc/cmpb32-2.c: Adjust.
* gcc.target/powerpc/crypto-builtin-2.c: Adjust.
* gcc.target/powerpc/fold-vec-splat-floatdouble.c: Adjust.
* gcc.target/powerpc/fold-vec-splat-longlong.c: Adjust.
* gcc.target/powerpc/fold-vec-splat-misc-invalid.c: Adjust.
* gcc.target/powerpc/int_128bit-runnable.c: Adjust.
* gcc.target/powerpc/p8vector-builtin-8.c: Adjust.
* gcc.target/powerpc/pr80315-1.c: Adjust.
* gcc.target/powerpc/pr80315-2.c: Adjust.
* gcc.target/powerpc/pr80315-3.c: Adjust.
* gcc.target/powerpc/pr80315-4.c: Adjust.
* gcc.target/powerpc/pr88100.c: Adjust.
* gcc.target/powerpc/pragma_misc9.c: Adjust.
* gcc.target/powerpc/pragma_power8.c: Adjust.
* gcc.target/powerpc/pragma_power9.c: Adjust.
* gcc.target/powerpc/test_fpscr_drn_builtin_error.c: Adjust.
* gcc.target/powerpc/test_fpscr_rn_builtin_error.c: Adjust.
* gcc.target/powerpc/test_mffsl.c: Adjust.
* gcc.target/powerpc/vec-gnb-2.c: Adjust.
* gcc.target/powerpc/vsu/vec-all-nez-7.c: Adjust.
* gcc.target/powerpc/vsu/vec-any-eqz-7.c: Adjust.
* gcc.target/powerpc/vsu/vec-cmpnez-7.c: Adjust.
* gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c: Adjust.
* gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c: Adjust.
* gcc.target/powerpc/vsu/vec-xst-len-12.c: Adjust.
* gcc.target/powerpc/vsu/vec-xl-len-13.c: Adjust.
---
 .../gcc.target/powerpc/bfp/scalar-extract-exp-2.c  |  2 +-
 .../gcc.target/powerpc/bfp/scalar-extract-sig-2.c  |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-2.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-5.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-8.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-2.c |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-3.c |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-5.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c  |  2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c|  2 +-
 .../gcc.target/powerpc/crypto-builtin-2.c  | 14 +++---
 .../powerpc/fold-vec-splat-floatdouble.c   |  4 ++--
 .../gcc.target/powerpc/fold-vec-splat-longlong.c   | 10 +++---
 .../powerpc/fold-vec-splat-misc-invalid.c  |  8 
 .../gcc.target/powerpc/int_128bit-runnable.c   |  6 +++---
 .../gcc.target/powerpc/p8vector-builtin-8.c|  1 +
 gcc/testsuite/gcc.target/powerpc/pr80315-1.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-2.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-3.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-4.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr88100.c | 12 ++--
 gcc/testsuite/gcc.target/powerpc/pragma_misc9.c|  2 +-
 gcc/testsuite/gcc.target/powerpc/pragma_power8.c   |  2 ++
 gcc/testsuite/gcc.target/powerpc/pragma_power9.c   |  3 +++
 .../powerpc/test_fpscr_drn_builtin_error.c |  4 ++--
 .../powerpc/test_fpscr_rn_builtin_error.c  | 12 ++--
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c  |  3 ++-
 gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c   |  2 +-
 .../gcc.target/powerpc/vsu/vec-all-nez-7.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-any-eqz-7.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-cmpnez-7.c  |  2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c  |  2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c  |  2 +-
 .../gcc.target/powerpc/vsu/vec-xl-len-13.c |  2 +-
 .../gcc.target/powerpc/vsu/vec-xst-len-12.c|  2 +-
 36 files changed, 65 insertions(+), 62 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
index 922180675fc..53b67c95cf9 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
@@ -14,7 +14,7 @@ get_exponent (double *p)
 {
   double source = *p;
 
-  return scalar_extract_exp (source);  /* { dg-error "'__builtin_vec_scalar_extract_exp' is not supported in this compiler configuration" } */
+  return scalar_extract_exp (source);  /* { dg-error

[PATCH 25/34] rs6000: Builtin expansion, part 3

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-03-05  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (new_cpu_expand_builtin):
Implement.
---
 gcc/config/rs6000/rs6000-call.c | 100 
 1 file changed, 100 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 13a24dd9713..4f5aed137fb 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -14666,6 +14666,106 @@ static rtx
 new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
tree exp ATTRIBUTE_UNUSED, rtx target)
 {
+  /* __builtin_cpu_init () is a nop, so expand to nothing.  */
+  if (fcode == RS6000_BIF_CPU_INIT)
+return const0_rtx;
+
+  if (target == 0 || GET_MODE (target) != SImode)
+target = gen_reg_rtx (SImode);
+
+#ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
+  tree arg = TREE_OPERAND (CALL_EXPR_ARG (exp, 0), 0);
+  /* Target clones create an ARRAY_REF instead of a STRING_CST; convert it
+ back to a STRING_CST.  */
+  if (TREE_CODE (arg) == ARRAY_REF
+  && TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST
+  && TREE_CODE (TREE_OPERAND (arg, 1)) == INTEGER_CST
+  && compare_tree_int (TREE_OPERAND (arg, 1), 0) == 0)
+arg = TREE_OPERAND (arg, 0);
+
+  if (TREE_CODE (arg) != STRING_CST)
+{
+  error ("builtin %qs only accepts a string argument",
+rs6000_builtin_info_x[(size_t) fcode].bifname);
+  return const0_rtx;
+}
+
+  if (fcode == RS6000_BIF_CPU_IS)
+{
+  const char *cpu = TREE_STRING_POINTER (arg);
+  rtx cpuid = NULL_RTX;
+  for (size_t i = 0; i < ARRAY_SIZE (cpu_is_info); i++)
+   if (strcmp (cpu, cpu_is_info[i].cpu) == 0)
+ {
+   /* The CPUID value in the TCB is offset by _DL_FIRST_PLATFORM.  */
+   cpuid = GEN_INT (cpu_is_info[i].cpuid + _DL_FIRST_PLATFORM);
+   break;
+ }
+  if (cpuid == NULL_RTX)
+   {
+ /* Invalid CPU argument.  */
+ error ("cpu %qs is an invalid argument to builtin %qs",
+cpu, rs6000_builtin_info_x[(size_t) fcode].bifname);
+ return const0_rtx;
+   }
+
+  rtx platform = gen_reg_rtx (SImode);
+  rtx tcbmem = gen_const_mem (SImode,
+ gen_rtx_PLUS (Pmode,
+   gen_rtx_REG (Pmode, TLS_REGNUM),
+   GEN_INT (TCB_PLATFORM_OFFSET)));
+  emit_move_insn (platform, tcbmem);
+  emit_insn (gen_eqsi3 (target, platform, cpuid));
+}
+  else if (fcode == RS6000_BIF_CPU_SUPPORTS)
+{
+  const char *hwcap = TREE_STRING_POINTER (arg);
+  rtx mask = NULL_RTX;
+  int hwcap_offset;
+  for (size_t i = 0; i < ARRAY_SIZE (cpu_supports_info); i++)
+   if (strcmp (hwcap, cpu_supports_info[i].hwcap) == 0)
+ {
+   mask = GEN_INT (cpu_supports_info[i].mask);
+   hwcap_offset = TCB_HWCAP_OFFSET (cpu_supports_info[i].id);
+   break;
+ }
+  if (mask == NULL_RTX)
+   {
+ /* Invalid HWCAP argument.  */
+ error ("%s %qs is an invalid argument to builtin %qs",
+"hwcap", hwcap,
+rs6000_builtin_info_x[(size_t) fcode].bifname);
+ return const0_rtx;
+   }
+
+  rtx tcb_hwcap = gen_reg_rtx (SImode);
+  rtx tcbmem = gen_const_mem (SImode,
+ gen_rtx_PLUS (Pmode,
+   gen_rtx_REG (Pmode, TLS_REGNUM),
+   GEN_INT (hwcap_offset)));
+  emit_move_insn (tcb_hwcap, tcbmem);
+  rtx scratch1 = gen_reg_rtx (SImode);
+  emit_insn (gen_rtx_SET (scratch1, gen_rtx_AND (SImode, tcb_hwcap, mask)));
+  rtx scratch2 = gen_reg_rtx (SImode);
+  emit_insn (gen_eqsi3 (scratch2, scratch1, const0_rtx));
+  emit_insn (gen_rtx_SET (target, gen_rtx_XOR (SImode, scratch2, const1_rtx)));
+}
+  else
+gcc_unreachable ();
+
+  /* Record that we have expanded a CPU builtin, so that we can later
+ emit a reference to the special symbol exported by LIBC to ensure we
+ do not link against an old LIBC that doesn't support this feature.  */
+  cpu_builtin_p = true;
+
+#else
+  warning (0, "builtin %qs needs GLIBC (2.23 and newer) that exports hardware "
+  "capability bits", rs6000_builtin_info_x[(size_t) fcode].bifname);
+
+  /* For old LIBCs, always return FALSE.  */
+  emit_move_insn (target, GEN_INT (0));
+#endif /* TARGET_LIBC_PROVIDES_HWCAP_IN_TCB */
+
   return target;
 }
 
-- 
2.27.0
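
The cpu_supports expansion above computes "(hwcap & mask) != 0" in three steps: an AND, an equality test against zero, and an XOR with 1 to invert the result. The following standalone sketch mirrors that sequence on plain integers, together with the name lookup that precedes it. The table contents (`feature_a`, `feature_b` and their masks) are made up for illustration and are not real HWCAP names or bits.

```cpp
#include <cstdint>
#include <cstring>

struct HwcapEntry
{
  const char *name;
  std::uint32_t mask;
};

// Hypothetical names and masks, for illustration only.
static const HwcapEntry hwcap_table[] = {
  { "feature_a", 0x1u },
  { "feature_b", 0x4u },
};

// Look up NAME; on success store its mask in MASK and return true.
bool
lookup_hwcap (const char *name, std::uint32_t &mask)
{
  for (const HwcapEntry &e : hwcap_table)
    if (std::strcmp (name, e.name) == 0)
      {
	mask = e.mask;
	return true;
      }
  return false;
}

// Mirror the emitted sequence: AND with the mask, compare with zero,
// then XOR with 1 so the result is 1 exactly when the bit is set.
int
cpu_supports (std::uint32_t hwcap_word, std::uint32_t mask)
{
  std::uint32_t scratch1 = hwcap_word & mask;
  int scratch2 = (scratch1 == 0);
  return scratch2 ^ 1;
}
```

An unknown name makes the lookup fail, which corresponds to the "invalid argument" error path in the patch rather than to a false result.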

[PATCH 15/34] rs6000: Execute the automatic built-in initialization code

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-03-04  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000-builtins.h): New #include.
(rs6000_init_builtins): Call rs6000_autoinit_builtins; skip the old
initialization logic when new builtins are enabled.
---
 gcc/config/rs6000/rs6000-call.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b1338191926..be34a196be0 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -69,6 +69,7 @@
 #include "opts.h"
 
 #include "rs6000-internal.h"
+#include "rs6000-builtins.h"
 
 #if TARGET_MACHO
 #include "gstab.h"  /* for N_SLINE */
@@ -13648,6 +13649,17 @@ rs6000_init_builtins (void)
 = build_pointer_type (build_qualified_type (void_type_node,
TYPE_QUAL_CONST));
 
+  /* Execute the autogenerated initialization code for builtins.  */
+  rs6000_autoinit_builtins ();
+
+  if (new_builtins_are_live)
+{
+#ifdef SUBTARGET_INIT_BUILTINS
+  SUBTARGET_INIT_BUILTINS;
+#endif
+  return;
+}
+
   /* Create Altivec, VSX and MMA builtins on machines with at least the
  general purpose extensions (970 and newer) to allow the use of
  the target attribute.  */
-- 
2.27.0

Re: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-29 Thread H.J. Lu via Gcc-patches

On Tue, Jul 13, 2021 at 9:24 AM H.J. Lu  wrote:
>
> On Tue, Jul 13, 2021 at 8:41 AM Joseph Myers  wrote:
> >
> > On Tue, 13 Jul 2021, H.J. Lu wrote:
> >
> > > On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei  
> > > wrote:
> > > >
> > > > > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> > > >
> > > > Can you please explain the behavior here? Is there difference between 
> > > > _Float16 and _Complex _Float16 when return? I.e.,
> > > > 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
> > > > 2, For a single _Float16 value, are both real part and imaginary part 
> > > > returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?
> > >
> > > Here is the v2 patch to add the missing _Float16 bits.   The PDF file is 
> > > at
> > >
> > > https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI
> >
> > This PDF shows _Complex _Float16 as having a size of 2 bytes (should be
> > 4-byte size, 2-byte alignment).
> >
> > It also seems to change double from 4-byte to 8-byte alignment, which is
> > wrong.  And it's inconsistent about whether it covers the long double =
> > double (Android) case - it shows that case for _Complex long double but
> > not for long double itself.
>
> Here is the v3 patch with the fixes.  I also updated the PDF file.

Here is the final patch I checked in.   _Complex _Float16 is changed to return
in XMM0 register.   The new PDF file is at

https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

-- 
H.J.
From 4ce1007486d28b13da36bbf216b2e470818d7ee1 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 1 Jul 2021 13:58:00 -0700
Subject: [PATCH] Add optional _Float16 support

1. Pass _Float16 and _Complex _Float16 values on stack.
2. Return _Float16 and _Complex _Float16 values in XMM0 register.
---
 low-level-sys-info.tex | 70 +-
 1 file changed, 49 insertions(+), 21 deletions(-)
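
As a cross-check of the complex rows in the table below (and of the review comment that _Complex _Float16 should have 4-byte size and 2-byte alignment), a complex type is laid out as a pair of its scalar type: twice the size, same alignment. A minimal sketch, where `Layout` and `complex_of` are hypothetical helpers and the scalar numbers are copied from the table in this patch:

```cpp
#include <cstddef>

struct Layout
{
  std::size_t size;
  std::size_t align;
};

// A complex value is a (real, imaginary) pair of the scalar type:
// twice the size, same alignment.
constexpr Layout
complex_of (Layout scalar)
{
  return Layout{ 2 * scalar.size, scalar.align };
}

// Scalar sizes and alignments as listed in the i386 table of this patch.
constexpr Layout float16_layout{ 2, 2 };
constexpr Layout float_layout{ 4, 4 };
constexpr Layout double_layout{ 8, 4 };
constexpr Layout float80_layout{ 12, 4 };
constexpr Layout float128_layout{ 16, 16 };

static_assert (complex_of (float16_layout).size == 4, "complex _Float16 size");
static_assert (complex_of (float16_layout).align == 2, "complex _Float16 align");
static_assert (complex_of (double_layout).size == 16, "complex double size");
```
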

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index acaf30e..860ff66 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a
 \subsubsection{Fundamental Types}
 
 Table~\ref{basic-types} shows the correspondence between ISO C
-scalar types and the processor scalar types.  \code{__float80},
+scalar types and the processor scalar types.  \code{_Float16},
+\code{__float80},
 \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and
 \code{__m512} types are optional.
 
@@ -79,23 +80,28 @@ scalar types and the processor scalar types.  \code{__float80},
 & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\
 & \texttt{\textit{any-type} (*)()} & & \\
 \hline
-Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\
 \cline{2-5}
-point & \texttt{double} & 8 & 4 & double (IEEE-754) \\
-& \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+& \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\
+\cline{2-5}
+& \texttt{float} & 4 & 4 & single (IEEE-754) \\
+\cline{2-5}
+Floating- & \texttt{double} & 8 & 4 & double (IEEE-754) \\
+point & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
 \cline{2-5}
 & \texttt{__float80}$^{\dagger\dagger}$  & 12 & 4 & 80-bit extended (IEEE-754) \\
 & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
 \cline{2-5}
 & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\
 \hline
-Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
+& \texttt{_Complex _Float16} $^{\dagger\dagger\dagger\dagger\dagger}$ & 4 & 2 & complex 16-bit (IEEE-754) \\
+\cline{2-5}
+& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
 \cline{2-5}
-Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
-point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
+Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
 \cline{2-5}
-& \texttt{_Complex __float80}$^{\dagger\dagger}$  & 24 & 4 & complex 80-bit extended (IEEE-754) \\
-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$  & & & \\
+point & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\
+& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
 \cline{2-5}
 & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 128-bit extended (IEEE-754) \\
 \hline
@@ -125,6 +131,8 @@ The \texttt{long double} type is 64-bit, the same as the \texttt{double}
 type, on the Android{\texttrademark} platform.  More information on the
 Android{\texttrademark} platform is available from
 \url{http://www.android.com/}.}\\
+\multicolumn{5}{p{13cm}}{\myfontsize

[PATCH 27/34] rs6000: Builtin expansion, part 5

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-17  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (new_mma_expand_builtin):
Implement.
---
 gcc/config/rs6000/rs6000-call.c | 103 
 1 file changed, 103 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 89984d65a46..f37ee9b25ab 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -15024,6 +15024,109 @@ static rtx
 new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
rs6000_gen_builtins fcode)
 {
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  tree arg;
+  call_expr_arg_iterator iter;
+  const struct insn_operand_data *insn_op;
+  rtx op[MAX_MMA_OPERANDS];
+  unsigned nopnds = 0;
+  bool void_func = TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
+  machine_mode tmode = VOIDmode;
+
+  if (!void_func)
+{
+  tmode = insn_data[icode].operand[0].mode;
+  if (!target
+ || GET_MODE (target) != tmode
+ || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+   target = gen_reg_rtx (tmode);
+  op[nopnds++] = target;
+}
+  else
+target = const0_rtx;
+
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+{
+  if (arg == error_mark_node)
+   return const0_rtx;
+
+  rtx opnd;
+  insn_op = &insn_data[icode].operand[nopnds];
+  if (TREE_CODE (arg) == ADDR_EXPR
+ && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0))))
+   opnd = DECL_RTL (TREE_OPERAND (arg, 0));
+  else
+   opnd = expand_normal (arg);
+
+  if (!(*insn_op->predicate) (opnd, insn_op->mode))
+   {
+ if (!strcmp (insn_op->constraint, "n"))
+   {
+ if (!CONST_INT_P (opnd))
+   error ("argument %d must be an unsigned literal", nopnds);
+ else
+   error ("argument %d is an unsigned literal that is "
+  "out of range", nopnds);
+ return const0_rtx;
+   }
+ opnd = copy_to_mode_reg (insn_op->mode, opnd);
+   }
+
+  /* Some MMA instructions have INOUT accumulator operands, so force
+their target register to be the same as their input register.  */
+  if (!void_func
+ && nopnds == 1
+ && !strcmp (insn_op->constraint, "0")
+ && insn_op->mode == tmode
+ && REG_P (opnd)
+ && (*insn_data[icode].operand[0].predicate) (opnd, tmode))
+   target = op[0] = opnd;
+
+  op[nopnds++] = opnd;
+}
+
+  rtx pat;
+  switch (nopnds)
+{
+case 1:
+  pat = GEN_FCN (icode) (op[0]);
+  break;
+case 2:
+  pat = GEN_FCN (icode) (op[0], op[1]);
+  break;
+case 3:
+  /* The ASSEMBLE builtin source operands are reversed in little-endian
+mode, so reorder them.  */
+  if (fcode == RS6000_BIF_ASSEMBLE_PAIR_V_INTERNAL && !WORDS_BIG_ENDIAN)
+   std::swap (op[1], op[2]);
+  pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+  break;
+case 4:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+  break;
+case 5:
+  /* The ASSEMBLE builtin source operands are reversed in little-endian
+mode, so reorder them.  */
+  if (fcode == RS6000_BIF_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
+   {
+ std::swap (op[1], op[4]);
+ std::swap (op[2], op[3]);
+   }
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+  break;
+case 6:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
+  break;
+case 7:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6]);
+  break;
+default:
+  gcc_unreachable ();
+}
+  if (!pat)
+return NULL_RTX;
+  emit_insn (pat);
+
   return target;
 }
 
-- 
2.27.0
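
The little-endian reordering in cases 3 and 5 above amounts to reversing the source operands while leaving the destination op[0] in place: swapping op[1] with op[2] for the pair, and op[1] with op[4] plus op[2] with op[3] for the accumulator. A standalone sketch of that observation, using integers in place of rtx operands (`reorder_assemble_operands` is a hypothetical helper, not a function in the patch):

```cpp
#include <algorithm>
#include <vector>

// Reverse the source operands op[1..n-1] for little-endian targets,
// keeping the destination operand op[0] where it is.
void
reorder_assemble_operands (std::vector<int> &op, bool words_big_endian)
{
  if (!words_big_endian && op.size () > 2)
    std::reverse (op.begin () + 1, op.end ());
}
```

In the patch the swap is additionally keyed on the specific ASSEMBLE builtin codes, so other 3- and 5-operand builtins keep their operand order.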

[PATCH 28/34] rs6000: Builtin expansion, part 6

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (new_htm_spr_num): New function.
(new_htm_expand_builtin): Implement.
(rs6000_expand_new_builtin): Handle 32-bit and endian cases.
---
 gcc/config/rs6000/rs6000-call.c | 202 
 1 file changed, 202 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index f37ee9b25ab..eaf62d734f1 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -15130,11 +15130,171 @@ new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
   return target;
 }
 
+/* Return the appropriate SPR number associated with the given builtin.  */
+static inline HOST_WIDE_INT
+new_htm_spr_num (enum rs6000_gen_builtins code)
+{
+  if (code == RS6000_BIF_GET_TFHAR
+  || code == RS6000_BIF_SET_TFHAR)
+return TFHAR_SPR;
+  else if (code == RS6000_BIF_GET_TFIAR
+  || code == RS6000_BIF_SET_TFIAR)
+return TFIAR_SPR;
+  else if (code == RS6000_BIF_GET_TEXASR
+  || code == RS6000_BIF_SET_TEXASR)
+return TEXASR_SPR;
+  gcc_assert (code == RS6000_BIF_GET_TEXASRU
+ || code == RS6000_BIF_SET_TEXASRU);
+  return TEXASRU_SPR;
+}
+
 /* Expand the HTM builtin in EXP and store the result in TARGET.  */
 static rtx
 new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
tree exp, rtx target)
 {
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  bool nonvoid = TREE_TYPE (TREE_TYPE (fndecl)) != void_type_node;
+
+  if (!TARGET_POWERPC64
+  && (fcode == RS6000_BIF_TABORTDC
+ || fcode == RS6000_BIF_TABORTDCI))
+{
+  error ("builtin %qs is only valid in 64-bit mode", bifaddr->bifname);
+  return const0_rtx;
+}
+
+  rtx op[MAX_HTM_OPERANDS], pat;
+  int nopnds = 0;
+  tree arg;
+  call_expr_arg_iterator iter;
+  insn_code icode = bifaddr->icode;
+  bool uses_spr = bif_is_htmspr (*bifaddr);
+  rtx cr = NULL_RTX;
+
+  if (uses_spr)
+icode = rs6000_htm_spr_icode (nonvoid);
+  const insn_operand_data *insn_op = &insn_data[icode].operand[0];
+
+  if (nonvoid)
+{
+  machine_mode tmode = (uses_spr) ? insn_op->mode : E_SImode;
+  if (!target
+ || GET_MODE (target) != tmode
+ || (uses_spr && !(*insn_op->predicate) (target, tmode)))
+   target = gen_reg_rtx (tmode);
+  if (uses_spr)
+   op[nopnds++] = target;
+}
+
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+{
+  if (arg == error_mark_node || nopnds >= MAX_HTM_OPERANDS)
+   return const0_rtx;
+
+  insn_op = &insn_data[icode].operand[nopnds];
+  op[nopnds] = expand_normal (arg);
+
+  if (!(*insn_op->predicate) (op[nopnds], insn_op->mode))
+   {
+ if (!strcmp (insn_op->constraint, "n"))
+   {
+ int arg_num = (nonvoid) ? nopnds : nopnds + 1;
+ if (!CONST_INT_P (op[nopnds]))
+   error ("argument %d must be an unsigned literal", arg_num);
+ else
+   error ("argument %d is an unsigned literal that is "
+  "out of range", arg_num);
+ return const0_rtx;
+   }
+ op[nopnds] = copy_to_mode_reg (insn_op->mode, op[nopnds]);
+   }
+
+  nopnds++;
+}
+
+  /* Handle the builtins for extended mnemonics.  These accept
+ no arguments, but map to builtins that take arguments.  */
+  switch (fcode)
+{
+case RS6000_BIF_TENDALL:  /* Alias for: tend. 1  */
+case RS6000_BIF_TRESUME:  /* Alias for: tsr. 1  */
+  op[nopnds++] = GEN_INT (1);
+  break;
+case RS6000_BIF_TSUSPEND: /* Alias for: tsr. 0  */
+  op[nopnds++] = GEN_INT (0);
+  break;
+default:
+  break;
+}
+
+  /* If this builtin accesses SPRs, then pass in the appropriate
+ SPR number and SPR regno as the last two operands.  */
+  if (uses_spr)
+{
+  machine_mode mode = (TARGET_POWERPC64) ? DImode : SImode;
+  op[nopnds++] = gen_rtx_CONST_INT (mode, new_htm_spr_num (fcode));
+}
+  /* If this builtin accesses a CR, then pass in a scratch
+ CR as the last operand.  */
+  else if (bif_is_htmcr (*bifaddr))
+{
+  cr = gen_reg_rtx (CCmode);
+  op[nopnds++] = cr;
+}
+
+  switch (nopnds)
+{
+case 1:
+  pat = GEN_FCN (icode) (op[0]);
+  break;
+case 2:
+  pat = GEN_FCN (icode) (op[0], op[1]);
+  break;
+case 3:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+  break;
+case 4:
+  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+  break;
+default:
+  gcc_unreachable ();
+}
+  if (!pat)
+return NULL_RTX;
+  emit_insn (pat);
+
+  if (bif_is_htmcr (*bifaddr))
+{
+  if (fcode == RS6000_BIF_TBEGIN)
+   {
+ /* Emit code to set TARGET to true or false depending on
+whether the tbegin. instruction succeeded or failed
+to start a transaction.  We do this by placing the 1's
+complement of CR's EQ

[PATCH 22/34] rs6000: Support for vectorizing built-in functions

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-03-05  Bill Schmidt  

gcc/
* config/rs6000/rs6000.c (rs6000-builtins.h): New include.
(rs6000_new_builtin_vectorized_function): New function.
(rs6000_new_builtin_md_vectorized_function): Likewise.
(rs6000_builtin_vectorized_function): Call
rs6000_new_builtin_vectorized_function.
(rs6000_builtin_md_vectorized_function): Call
rs6000_new_builtin_md_vectorized_function.
---
 gcc/config/rs6000/rs6000.c | 200 +
 1 file changed, 200 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..ceba25d028c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -78,6 +78,7 @@
 #include "case-cfn-macros.h"
 #include "ppc-auxv.h"
 #include "rs6000-internal.h"
+#include "rs6000-builtins.h"
 #include "opts.h"
 
 /* This file should be included last.  */
@@ -5489,6 +5490,198 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
   return nunroll;
 }
 
+/* Returns a function decl for a vectorized version of the builtin function
+   with builtin function code FN and the result vector type TYPE, or NULL_TREE
+   if it is not available.  */
+
+static tree
+rs6000_new_builtin_vectorized_function (unsigned int fn, tree type_out,
+   tree type_in)
+{
+  machine_mode in_mode, out_mode;
+  int in_n, out_n;
+
+  if (TARGET_DEBUG_BUILTIN)
+fprintf (stderr, "rs6000_new_builtin_vectorized_function (%s, %s, %s)\n",
+combined_fn_name (combined_fn (fn)),
+GET_MODE_NAME (TYPE_MODE (type_out)),
+GET_MODE_NAME (TYPE_MODE (type_in)));
+
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+  || TREE_CODE (type_in) != VECTOR_TYPE)
+return NULL_TREE;
+
+  out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+
+  switch (fn)
+{
+CASE_CFN_COPYSIGN:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_CPSGNDP];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_CPSGNSP];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_COPYSIGN_V4SF];
+  break;
+CASE_CFN_CEIL:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIP];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIP];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_VRFIP];
+  break;
+CASE_CFN_FLOOR:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIM];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIM];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_VRFIM];
+  break;
+CASE_CFN_FMA:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVMADDDP];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVMADDSP];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_VMADDFP];
+  break;
+CASE_CFN_TRUNC:
+  if (VECTOR_UNIT_VSX_P (V2DFmode)
+ && out_mode == DFmode && out_n == 2
+ && in_mode == DFmode && in_n == 2)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIZ];
+  if (VECTOR_UNIT_VSX_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIZ];
+  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+ && out_mode == SFmode && out_n == 4
+ && in_mode == SFmode && in_n == 4)
+   return

[PATCH 07/34] rs6000: Add power8-vector builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-04-01  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add power8-vector stanza.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 438 +++
 1 file changed, 438 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index bffce52ee47..f13fb13b0ad 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -1996,3 +1996,441 @@
 
   const unsigned long long __builtin_divdeu (unsigned long long, unsigned long long);
 DIVDEU diveu_di {}
+
+
+; Power8 vector built-ins.
+[power8-vector]
+  const vsll __builtin_altivec_abs_v2di (vsll);
+ABS_V2DI absv2di2 {}
+
+  const vsc __builtin_altivec_bcddiv10_v16qi (vsc);
+BCDDIV10_V16QI bcddiv10_v16qi {}
+
+  const vsc __builtin_altivec_bcdmul10_v16qi (vsc);
+BCDMUL10_V16QI bcdmul10_v16qi {}
+
+  const vsc __builtin_altivec_eqv_v16qi (vsc, vsc);
+EQV_V16QI eqvv16qi3 {}
+
+  const vuc __builtin_altivec_eqv_v16qi_uns (vuc, vuc);
+EQV_V16QI_UNS eqvv16qi3 {}
+
+  const vsq __builtin_altivec_eqv_v1ti (vsq, vsq);
+EQV_V1TI eqvv1ti3 {}
+
+  const vuq __builtin_altivec_eqv_v1ti_uns (vuq, vuq);
+EQV_V1TI_UNS eqvv1ti3 {}
+
+  const vd __builtin_altivec_eqv_v2df (vd, vd);
+EQV_V2DF eqvv2df3 {}
+
+  const vsll __builtin_altivec_eqv_v2di (vsll, vsll);
+EQV_V2DI eqvv2di3 {}
+
+  const vull __builtin_altivec_eqv_v2di_uns (vull, vull);
+EQV_V2DI_UNS eqvv2di3 {}
+
+  const vf __builtin_altivec_eqv_v4sf (vf, vf);
+EQV_V4SF eqvv4sf3 {}
+
+  const vsi __builtin_altivec_eqv_v4si (vsi, vsi);
+EQV_V4SI eqvv4si3 {}
+
+  const vui __builtin_altivec_eqv_v4si_uns (vui, vui);
+EQV_V4SI_UNS eqvv4si3 {}
+
+  const vss __builtin_altivec_eqv_v8hi (vss, vss);
+EQV_V8HI eqvv8hi3 {}
+
+  const vus __builtin_altivec_eqv_v8hi_uns (vus, vus);
+EQV_V8HI_UNS eqvv8hi3 {}
+
+  const vsc __builtin_altivec_nand_v16qi (vsc, vsc);
+NAND_V16QI nandv16qi3 {}
+
+  const vuc __builtin_altivec_nand_v16qi_uns (vuc, vuc);
+NAND_V16QI_UNS nandv16qi3 {}
+
+  const vsq __builtin_altivec_nand_v1ti (vsq, vsq);
+NAND_V1TI nandv1ti3 {}
+
+  const vuq __builtin_altivec_nand_v1ti_uns (vuq, vuq);
+NAND_V1TI_UNS nandv1ti3 {}
+
+  const vd __builtin_altivec_nand_v2df (vd, vd);
+NAND_V2DF nandv2df3 {}
+
+  const vsll __builtin_altivec_nand_v2di (vsll, vsll);
+NAND_V2DI nandv2di3 {}
+
+  const vull __builtin_altivec_nand_v2di_uns (vull, vull);
+NAND_V2DI_UNS nandv2di3 {}
+
+  const vf __builtin_altivec_nand_v4sf (vf, vf);
+NAND_V4SF nandv4sf3 {}
+
+  const vsi __builtin_altivec_nand_v4si (vsi, vsi);
+NAND_V4SI nandv4si3 {}
+
+  const vui __builtin_altivec_nand_v4si_uns (vui, vui);
+NAND_V4SI_UNS nandv4si3 {}
+
+  const vss __builtin_altivec_nand_v8hi (vss, vss);
+NAND_V8HI nandv8hi3 {}
+
+  const vus __builtin_altivec_nand_v8hi_uns (vus, vus);
+NAND_V8HI_UNS nandv8hi3 {}
+
+  const vsc __builtin_altivec_neg_v16qi (vsc);
+NEG_V16QI negv16qi2 {}
+
+  const vd __builtin_altivec_neg_v2df (vd);
+NEG_V2DF negv2df2 {}
+
+  const vsll __builtin_altivec_neg_v2di (vsll);
+NEG_V2DI negv2di2 {}
+
+  const vf __builtin_altivec_neg_v4sf (vf);
+NEG_V4SF negv4sf2 {}
+
+  const vsi __builtin_altivec_neg_v4si (vsi);
+NEG_V4SI negv4si2 {}
+
+  const vss __builtin_altivec_neg_v8hi (vss);
+NEG_V8HI negv8hi2 {}
+
+  const vsc __builtin_altivec_orc_v16qi (vsc, vsc);
+ORC_V16QI orcv16qi3 {}
+
+  const vuc __builtin_altivec_orc_v16qi_uns (vuc, vuc);
+ORC_V16QI_UNS orcv16qi3 {}
+
+  const vsq __builtin_altivec_orc_v1ti (vsq, vsq);
+ORC_V1TI orcv1ti3 {}
+
+  const vuq __builtin_altivec_orc_v1ti_uns (vuq, vuq);
+ORC_V1TI_UNS orcv1ti3 {}
+
+  const vd __builtin_altivec_orc_v2df (vd, vd);
+ORC_V2DF orcv2df3 {}
+
+  const vsll __builtin_altivec_orc_v2di (vsll, vsll);
+ORC_V2DI orcv2di3 {}
+
+  const vull __builtin_altivec_orc_v2di_uns (vull, vull);
+ORC_V2DI_UNS orcv2di3 {}
+
+  const vf __builtin_altivec_orc_v4sf (vf, vf);
+ORC_V4SF orcv4sf3 {}
+
+  const vsi __builtin_altivec_orc_v4si (vsi, vsi);
+ORC_V4SI orcv4si3 {}
+
+  const vui __builtin_altivec_orc_v4si_uns (vui, vui);
+ORC_V4SI_UNS orcv4si3 {}
+
+  const vss __builtin_altivec_orc_v8hi (vss, vss);
+ORC_V8HI orcv8hi3 {}
+
+  const vus __builtin_altivec_orc_v8hi_uns (vus, vus);
+ORC_V8HI_UNS orcv8hi3 {}
+
+  const vsc __builtin_altivec_vclzb (vsc);
+VCLZB clzv16qi2 {}
+
+  const vsll __builtin_altivec_vclzd (vsll);
+VCLZD clzv2di2 {}
+
+  const vss __builtin_altivec_vclzh (vss);
+VCLZH clzv8hi2 {}
+
+  const vsi __builtin_altivec_vclzw (vsi);
+VCLZW clzv4si2 {}
+
+  const vuc __builtin_altivec_vgbbd (vuc);
+VGBBD p8v_vgbbd {}
+
+  const vsq __builtin_altivec_vaddcuq (vsq, vsq);
+VADDCUQ altivec_vaddcuq {}
+
+  const vsq __builtin_altivec_vaddecuq (vsq, vsq, vsq);
+VADDECUQ altivec_vaddecuq {}
+
+  const vsq __builtin_altivec_vaddeuqm (vsq, vsq, vsq);
+

[PATCH 34/34] rs6000: Enable the new builtin support

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-03-05  Bill Schmidt  

gcc/
* config/rs6000/rs6000-gen-builtins.c (write_init_file):
Initialize new_builtins_are_live to 1.
---
 gcc/config/rs6000/rs6000-gen-builtins.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index 5fc56eff6d1..b0f90de8cae 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -2764,7 +2764,7 @@ write_init_file (void)
   fprintf (init_file, "#include \"rs6000-builtins.h\"\n");
   fprintf (init_file, "\n");
 
-  fprintf (init_file, "int new_builtins_are_live = 0;\n\n");
+  fprintf (init_file, "int new_builtins_are_live = 1;\n\n");
 
   fprintf (init_file, "tree rs6000_builtin_decls_x[RS6000_OVLD_MAX];\n\n");
 
-- 
2.27.0

[PATCH 26/34] rs6000: Builtin expansion, part 4

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (elemrev_icode): Implement.
(ldv_expand_builtin): Likewise.
(lxvrse_expand_builtin): Likewise.
(lxvrze_expand_builtin): Likewise.
(stv_expand_builtin): Likewise.
---
 gcc/config/rs6000/rs6000-call.c | 217 
 1 file changed, 217 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 4f5aed137fb..89984d65a46 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -14772,12 +14772,114 @@ new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
 static insn_code
 elemrev_icode (rs6000_gen_builtins fcode)
 {
+  switch (fcode)
+{
+default:
+  gcc_unreachable ();
+case RS6000_BIF_ST_ELEMREV_V1TI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v1ti
+ : CODE_FOR_vsx_st_elemrev_v1ti);
+case RS6000_BIF_ST_ELEMREV_V2DF:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v2df
+ : CODE_FOR_vsx_st_elemrev_v2df);
+case RS6000_BIF_ST_ELEMREV_V2DI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v2di
+ : CODE_FOR_vsx_st_elemrev_v2di);
+case RS6000_BIF_ST_ELEMREV_V4SF:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v4sf
+ : CODE_FOR_vsx_st_elemrev_v4sf);
+case RS6000_BIF_ST_ELEMREV_V4SI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v4si
+ : CODE_FOR_vsx_st_elemrev_v4si);
+case RS6000_BIF_ST_ELEMREV_V8HI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v8hi
+ : CODE_FOR_vsx_st_elemrev_v8hi);
+case RS6000_BIF_ST_ELEMREV_V16QI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v16qi
+ : CODE_FOR_vsx_st_elemrev_v16qi);
+case RS6000_BIF_LD_ELEMREV_V2DF:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v2df
+ : CODE_FOR_vsx_ld_elemrev_v2df);
+case RS6000_BIF_LD_ELEMREV_V1TI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v1ti
+ : CODE_FOR_vsx_ld_elemrev_v1ti);
+case RS6000_BIF_LD_ELEMREV_V2DI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v2di
+ : CODE_FOR_vsx_ld_elemrev_v2di);
+case RS6000_BIF_LD_ELEMREV_V4SF:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v4sf
+ : CODE_FOR_vsx_ld_elemrev_v4sf);
+case RS6000_BIF_LD_ELEMREV_V4SI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v4si
+ : CODE_FOR_vsx_ld_elemrev_v4si);
+case RS6000_BIF_LD_ELEMREV_V8HI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v8hi
+ : CODE_FOR_vsx_ld_elemrev_v8hi);
+case RS6000_BIF_LD_ELEMREV_V16QI:
+  return (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v16qi
+ : CODE_FOR_vsx_ld_elemrev_v16qi);
+}
+  gcc_unreachable ();
   return (insn_code) 0;
 }
 
 static rtx
 ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
 {
+  rtx pat, addr;
+  bool blk = (icode == CODE_FOR_altivec_lvlx
+ || icode == CODE_FOR_altivec_lvlxl
+ || icode == CODE_FOR_altivec_lvrx
+ || icode == CODE_FOR_altivec_lvrxl);
+
+  if (target == 0
+  || GET_MODE (target) != tmode
+  || !(*insn_data[icode].operand[0].predicate) (target, tmode))
+target = gen_reg_rtx (tmode);
+
+  op[1] = copy_to_mode_reg (Pmode, op[1]);
+
+  /* For LVX, express the RTL accurately by ANDing the address with -16.
+ LVXL and LVE*X expand to use UNSPECs to hide their special behavior,
+ so the raw address is fine.  */
+  if (icode == CODE_FOR_altivec_lvx_v1ti
+  || icode == CODE_FOR_altivec_lvx_v2df
+  || icode == CODE_FOR_altivec_lvx_v2di
+  || icode == CODE_FOR_altivec_lvx_v4sf
+  || icode == CODE_FOR_altivec_lvx_v4si
+  || icode == CODE_FOR_altivec_lvx_v8hi
+  || icode == CODE_FOR_altivec_lvx_v16qi)
+{
+  rtx rawaddr;
+  if (op[0] == const0_rtx)
+   rawaddr = op[1];
+  else
+   {
+ op[0] = copy_to_mode_reg (Pmode, op[0]);
+ rawaddr = gen_rtx_PLUS (Pmode, op[1], op[0]);
+   }
+  addr = gen_rtx_AND (Pmode, rawaddr, gen_rtx_CONST_INT (Pmode, -16));
+  addr = gen_rtx_MEM (blk ? BLKmode : tmode, addr);
+
+  emit_insn (gen_rtx_SET (target, addr));
+}
+  else
+{
+  if (op[0] == const0_rtx)
+   addr = gen_rtx_MEM (blk ? BLKmode : tmode, op[1]);
+  else
+   {
+ op[0] = copy_to_mode_reg (Pmode, op[0]);
+ addr = gen_rtx_MEM (blk ? BLKmode : tmode,
+ gen_rtx_PLUS (Pmode, op[1], op[0]));
+   }
+
+  pat = GEN_FCN (icode) (target, addr);
+  if (!pat)
+   return 0;
+  emit_insn (pat);
+}
+
   return target;
 }
 
@@ -14785,6 +14887,42 @@ static rtx
 lxvrse_expand_builtin (rtx target, insn_code icode, rtx *op,
   machine_mode tmode, machine_mode smode)
 {
+  rtx pat, addr;
+  op[1] = copy_to_mode_reg (Pmode, op[1]);
+
+  if (op[0] == const0_rtx)
+addr =

[PATCH 17/34] rs6000: Add sanity to V2DI_type_node definitions

2021-07-29 Thread Bill Schmidt via Gcc-patches

It seems quite strange for these to be "vector long" for 64-bit and
"vector long long" for 32-bit, when "vector long long" will do for both.

2021-03-04  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Change
initialization of V2DI_type_node and unsigned_V2DI_type_node.
---
 gcc/config/rs6000/rs6000-call.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index be34a196be0..7a8bc5f537c 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13296,9 +13296,13 @@ rs6000_init_builtins (void)
 (TARGET_ALTIVEC)  ? ", altivec" : "",
 (TARGET_VSX)  ? ", vsx" : "");
 
-  V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64 ? "__vector long"
-  : "__vector long long",
-  long_long_integer_type_node, 2);
+  if (new_builtins_are_live)
+V2DI_type_node = rs6000_vector_type ("__vector long long",
+long_long_integer_type_node, 2);
+  else
+V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64 ? "__vector long"
+: "__vector long long",
+long_long_integer_type_node, 2);
   ptr_V2DI_type_node
 = build_pointer_type (build_qualified_type (V2DI_type_node,
TYPE_QUAL_CONST));
@@ -13349,7 +13353,12 @@ rs6000_init_builtins (void)
 = build_pointer_type (build_qualified_type (unsigned_V4SI_type_node,
TYPE_QUAL_CONST));
 
-  unsigned_V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64
+  if (new_builtins_are_live)
+unsigned_V2DI_type_node
+  = rs6000_vector_type ("__vector unsigned long long",
+   long_long_unsigned_type_node, 2);
+  else
+unsigned_V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64
   ? "__vector unsigned long"
   : "__vector unsigned long long",
   long_long_unsigned_type_node, 2);
-- 
2.27.0
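
A note on why "vector long long" can serve both 32-bit and 64-bit modes
in the V2DI change above: the C and C++ standards guarantee that long long
holds at least 64 bits, while long may be only 32 bits wide.  The sketch
below just checks that property on the host compiler; the names
long_long_bits and two_elements_fill_128_bits are made up for illustration
and do not appear in GCC.

```cpp
#include <cassert>
#include <climits>

// Width in bits of the element type used for the 2 x 64-bit vector.
constexpr int long_long_bits = sizeof (long long) * CHAR_BIT;

// The language standards require long long to hold at least 64 bits.
// There is no such guarantee for long, which is 32 bits on ILP32 ABIs.
static_assert (long_long_bits >= 64, "long long must be at least 64 bits");

// True when two long long elements exactly fill a 128-bit vector
// register.  This is an assumption about the ABI (long long being
// exactly 64 bits), not something the standard promises.
constexpr bool
two_elements_fill_128_bits ()
{
  return 2 * long_long_bits == 128;
}
```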

[PATCH 32/34] rs6000: Update altivec.h for automated interfaces

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/altivec.h: Delete a number of #defines that are
now superfluous; alphabetize; include rs6000-vecdefines.h; include
some synonyms.
---
 gcc/config/rs6000/altivec.h | 519 +++-
 1 file changed, 38 insertions(+), 481 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 5b631c7ebaf..9dfa285ccd1 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -55,32 +55,36 @@
 #define __CR6_LT   2
 #define __CR6_LT_REV   3
 
-/* Synonyms.  */
+#include "rs6000-vecdefines.h"
+
+/* Deprecated interfaces.  */
+#define vec_lvx vec_ld
+#define vec_lvxl vec_ldl
+#define vec_stvx vec_st
+#define vec_stvxl vec_stl
 #define vec_vaddcuw vec_addc
 #define vec_vand vec_and
 #define vec_vandc vec_andc
-#define vec_vrfip vec_ceil
 #define vec_vcmpbfp vec_cmpb
 #define vec_vcmpgefp vec_cmpge
 #define vec_vctsxs vec_cts
 #define vec_vctuxs vec_ctu
 #define vec_vexptefp vec_expte
-#define vec_vrfim vec_floor
-#define vec_lvx vec_ld
-#define vec_lvxl vec_ldl
 #define vec_vlogefp vec_loge
 #define vec_vmaddfp vec_madd
 #define vec_vmhaddshs vec_madds
-#define vec_vmladduhm vec_mladd
 #define vec_vmhraddshs vec_mradds
+#define vec_vmladduhm vec_mladd
 #define vec_vnmsubfp vec_nmsub
 #define vec_vnor vec_nor
 #define vec_vor vec_or
-#define vec_vpkpx vec_packpx
 #define vec_vperm vec_perm
-#define vec_permxor __builtin_vec_vpermxor
+#define vec_vpkpx vec_packpx
 #define vec_vrefp vec_re
+#define vec_vrfim vec_floor
 #define vec_vrfin vec_round
+#define vec_vrfip vec_ceil
+#define vec_vrfiz vec_trunc
 #define vec_vrsqrtefp vec_rsqrte
 #define vec_vsel vec_sel
 #define vec_vsldoi vec_sld
@@ -91,440 +95,53 @@
 #define vec_vspltisw vec_splat_s32
 #define vec_vsr vec_srl
 #define vec_vsro vec_sro
-#define vec_stvx vec_st
-#define vec_stvxl vec_stl
 #define vec_vsubcuw vec_subc
 #define vec_vsum2sws vec_sum2s
 #define vec_vsumsws vec_sums
-#define vec_vrfiz vec_trunc
 #define vec_vxor vec_xor
 
+/* For _ARCH_PWR8.  Always define to support #pragma GCC target.  */
+#define vec_vclz vec_cntlz
+#define vec_vgbbd vec_gb
+#define vec_vmrgew vec_mergee
+#define vec_vmrgow vec_mergeo
+#define vec_vpopcntu vec_popcnt
+#define vec_vrld vec_rl
+#define vec_vsld vec_sl
+#define vec_vsrd vec_sr
+#define vec_vsrad vec_sra
+
+/* For _ARCH_PWR9.  Always define to support #pragma GCC target.  */
+#define vec_extract_fp_from_shorth vec_extract_fp32_from_shorth
+#define vec_extract_fp_from_shortl vec_extract_fp32_from_shortl
+#define vec_vctz vec_cnttz
+
+/* Synonyms.  */
 /* Functions that are resolved by the backend to one of the
typed builtins.  */
-#define vec_vaddfp __builtin_vec_vaddfp
-#define vec_addc __builtin_vec_addc
-#define vec_adde __builtin_vec_adde
-#define vec_addec __builtin_vec_addec
-#define vec_vaddsws __builtin_vec_vaddsws
-#define vec_vaddshs __builtin_vec_vaddshs
-#define vec_vaddsbs __builtin_vec_vaddsbs
-#define vec_vavgsw __builtin_vec_vavgsw
-#define vec_vavguw __builtin_vec_vavguw
-#define vec_vavgsh __builtin_vec_vavgsh
-#define vec_vavguh __builtin_vec_vavguh
-#define vec_vavgsb __builtin_vec_vavgsb
-#define vec_vavgub __builtin_vec_vavgub
-#define vec_ceil __builtin_vec_ceil
-#define vec_cmpb __builtin_vec_cmpb
-#define vec_vcmpeqfp __builtin_vec_vcmpeqfp
-#define vec_cmpge __builtin_vec_cmpge
-#define vec_vcmpgtfp __builtin_vec_vcmpgtfp
-#define vec_vcmpgtsw __builtin_vec_vcmpgtsw
-#define vec_vcmpgtuw __builtin_vec_vcmpgtuw
-#define vec_vcmpgtsh __builtin_vec_vcmpgtsh
-#define vec_vcmpgtuh __builtin_vec_vcmpgtuh
-#define vec_vcmpgtsb __builtin_vec_vcmpgtsb
-#define vec_vcmpgtub __builtin_vec_vcmpgtub
-#define vec_vcfsx __builtin_vec_vcfsx
-#define vec_vcfux __builtin_vec_vcfux
-#define vec_cts __builtin_vec_cts
-#define vec_ctu __builtin_vec_ctu
-#define vec_cpsgn __builtin_vec_copysign
-#define vec_double __builtin_vec_double
-#define vec_doublee __builtin_vec_doublee
-#define vec_doubleo __builtin_vec_doubleo
-#define vec_doublel __builtin_vec_doublel
-#define vec_doubleh __builtin_vec_doubleh
-#define vec_expte __builtin_vec_expte
-#define vec_float __builtin_vec_float
-#define vec_float2 __builtin_vec_float2
-#define vec_floate __builtin_vec_floate
-#define vec_floato __builtin_vec_floato
-#define vec_floor __builtin_vec_floor
-#define vec_loge __builtin_vec_loge
-#define vec_madd __builtin_vec_madd
-#define vec_madds __builtin_vec_madds
-#define vec_mtvscr __builtin_vec_mtvscr
-#define vec_reve __builtin_vec_vreve
-#define vec_vmaxfp __builtin_vec_vmaxfp
-#define vec_vmaxsw __builtin_vec_vmaxsw
-#define vec_vmaxsh __builtin_vec_vmaxsh
-#define vec_vmaxsb __builtin_vec_vmaxsb
-#define vec_vminfp __builtin_vec_vminfp
-#define vec_vminsw __builtin_vec_vminsw
-#define vec_vminsh __builtin_vec_vminsh
-#define vec_vminsb __builtin_vec_vminsb
-#define vec_mradds __builtin_vec_mradds
-#define vec_vmsumshm __builtin_vec_vmsumshm
-#define

[PATCH 30/34] rs6000: Miscellaneous uses of rs6000_builtins_decl_x

2021-07-29 Thread Bill Schmidt via Gcc-patches

There are a few leftover places where we use the old rs6000_builtin_decls
array, but we need to use rs6000_builtin_decls_x instead when the new
builtins infrastructure is in play.

2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Use
rs6000_builtin_decls_x when appropriate.
(add_condition_to_bb): Likewise.
(rs6000_atomic_assign_expand_fenv): Likewise.
---
 gcc/config/rs6000/rs6000.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ceba25d028c..112453b2908 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -22453,12 +22453,16 @@ rs6000_builtin_reciprocal (tree fndecl)
   if (!RS6000_RECIP_AUTO_RSQRTE_P (V2DFmode))
return NULL_TREE;
 
+  if (new_builtins_are_live)
+   return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_2DF];
   return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_2DF];
 
 case VSX_BUILTIN_XVSQRTSP:
   if (!RS6000_RECIP_AUTO_RSQRTE_P (V4SFmode))
return NULL_TREE;
 
+  if (new_builtins_are_live)
+   return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_4SF];
   return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_4SF];
 
 default:
@@ -25047,7 +25051,10 @@ add_condition_to_bb (tree function_decl, tree version_decl,
 
   tree bool_zero = build_int_cst (bool_int_type_node, 0);
   tree cond_var = create_tmp_var (bool_int_type_node);
-  tree predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
+  tree predicate_decl
+= (new_builtins_are_live
+   ? rs6000_builtin_decls_x[(int) RS6000_BIF_CPU_SUPPORTS]
+   : rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS]);
   const char *arg_str = rs6000_clone_map[clone_isa].name;
   tree predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
   gimple *call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
@@ -27687,8 +27694,14 @@ rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
   return;
 }
 
-  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
-  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
+  tree mffs
+= (new_builtins_are_live
+   ? rs6000_builtin_decls_x[RS6000_BIF_MFFS]
+   : rs6000_builtin_decls[RS6000_BUILTIN_MFFS]);
+  tree mtfsf
+= (new_builtins_are_live
+   ? rs6000_builtin_decls_x[RS6000_BIF_MTFSF]
+   : rs6000_builtin_decls[RS6000_BUILTIN_MTFSF]);
   tree call_mffs = build_call_expr (mffs, 0);
 
   /* Generates the equivalent of feholdexcept (&fenv_var)
-- 
2.27.0
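
The pattern repeated in the hunks above is a lookup that picks one of two
decl tables depending on new_builtins_are_live, each table indexed by its
own enum.  Here is a minimal sketch of that selection, assuming strings in
place of GCC's tree decls; old_code, new_code, old_decls, new_decls and
select_decl are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: in GCC the two arrays hold `tree' decls and
// are indexed by two different builtin enums.
enum old_code { OLD_MFFS, OLD_MTFSF, OLD_MAX };
enum new_code { NEW_MFFS, NEW_MTFSF, NEW_MAX };

static const std::string old_decls[OLD_MAX] = { "old mffs", "old mtfsf" };
static const std::string new_decls[NEW_MAX] = { "new mffs", "new mtfsf" };

// Return the decl from whichever table is currently live.  Both codes
// are passed because the two enums need not share numbering.
inline const std::string &
select_decl (bool new_live, old_code o, new_code n)
{
  return new_live ? new_decls[n] : old_decls[o];
}
```

Wrapping the choice in one helper is a design alternative to the inline
conditional expressions used in the patch; the patch itself keeps the
conditionals at each use site.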

[PATCH 24/34] rs6000: Builtin expansion, part 2

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-03-05  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin):
Implement.
(rs6000_expand_ldst_mask): Likewise.
(rs6000_init_builtins): Initialize altivec_builtin_mask_for_load.
---
 gcc/config/rs6000/rs6000-call.c | 101 +++-
 1 file changed, 100 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 4719d074455..13a24dd9713 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -11671,6 +11671,75 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
 static void
 rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
 {
+  size_t uns_fncode = (size_t) fncode;
+  const char *name = rs6000_builtin_info_x[uns_fncode].bifname;
+
+  switch (rs6000_builtin_info_x[uns_fncode].enable)
+{
+case ENB_P5:
+  error ("%qs requires the %qs option", name, "-mcpu=power5");
+  break;
+case ENB_P6:
+  error ("%qs requires the %qs option", name, "-mcpu=power6");
+  break;
+case ENB_ALTIVEC:
+  error ("%qs requires the %qs option", name, "-maltivec");
+  break;
+case ENB_CELL:
+  error ("%qs is only valid for the cell processor", name);
+  break;
+case ENB_VSX:
+  error ("%qs requires the %qs option", name, "-mvsx");
+  break;
+case ENB_P7:
+  error ("%qs requires the %qs option", name, "-mcpu=power7");
+  break;
+case ENB_P7_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=power7", "-m64", "-mpowerpc64");
+  break;
+case ENB_P8:
+  error ("%qs requires the %qs option", name, "-mcpu=power8");
+  break;
+case ENB_P8V:
+  error ("%qs requires the %qs option", name, "-mpower8-vector");
+  break;
+case ENB_P9:
+  error ("%qs requires the %qs option", name, "-mcpu=power9");
+  break;
+case ENB_P9_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=power9", "-m64", "-mpowerpc64");
+  break;
+case ENB_P9V:
+  error ("%qs requires the %qs option", name, "-mpower9-vector");
+  break;
+case ENB_IEEE128_HW:
+  error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
+  break;
+case ENB_DFP:
+  error ("%qs requires the %qs option", name, "-mhard-dfp");
+  break;
+case ENB_CRYPTO:
+  error ("%qs requires the %qs option", name, "-mcrypto");
+  break;
+case ENB_HTM:
+  error ("%qs requires the %qs option", name, "-mhtm");
+  break;
+case ENB_P10:
+  error ("%qs requires the %qs option", name, "-mcpu=power10");
+  break;
+case ENB_P10_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=power10", "-m64", "-mpowerpc64");
+  break;
+case ENB_MMA:
+  error ("%qs requires the %qs option", name, "-mmma");
+  break;
+default:
+case ENB_ALWAYS:
+  gcc_unreachable ();
+};
 }
 
 /* Target hook for early folding of built-ins, shamelessly stolen
@@ -14563,7 +14632,33 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 rtx
 rs6000_expand_ldst_mask (rtx target, tree arg0)
  {
-  return target;
+  int icode2 = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr_direct
+   : (int) CODE_FOR_altivec_lvsl_direct);
+  machine_mode tmode = insn_data[icode2].operand[0].mode;
+  machine_mode mode = insn_data[icode2].operand[1].mode;
+  rtx op, addr, pat;
+
+  gcc_assert (TARGET_ALTIVEC);
+
+  gcc_assert (POINTER_TYPE_P (TREE_TYPE (arg0)));
+  op = expand_expr (arg0, NULL_RTX, Pmode, EXPAND_NORMAL);
+  addr = memory_address (mode, op);
+  /* We need to negate the address.  */
+  op = gen_reg_rtx (GET_MODE (addr));
+  emit_insn (gen_rtx_SET (op, gen_rtx_NEG (GET_MODE (addr), addr)));
+  op = gen_rtx_MEM (mode, op);
+
+  if (target == 0
+  || GET_MODE (target) != tmode
+  || ! (*insn_data[icode2].operand[0].predicate) (target, tmode))
+target = gen_reg_rtx (tmode);
+
+  pat = GEN_FCN (icode2) (target, op);
+  if (!pat)
+return 0;
+  emit_insn (pat);
+
+   return target;
  }
 
 /* Expand the CPU builtin in FCODE and store the result in TARGET.  */
@@ -15463,6 +15558,10 @@ rs6000_init_builtins (void)
   /* Execute the autogenerated initialization code for builtins.  */
   rs6000_autoinit_builtins ();
 
+  if (new_builtins_are_live)
+altivec_builtin_mask_for_load
+  = rs6000_builtin_decls_x[RS6000_BIF_MASK_FOR_LOAD];
+
   if (new_builtins_are_live)
 {
 #ifdef SUBTARGET_INIT_BUILTINS
-- 
2.27.0

[PATCH 18/34] rs6000: Always initialize vector_pair and vector_quad nodes

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-03-24  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Remove
TARGET_EXTRA_BUILTINS guard.
---
 gcc/config/rs6000/rs6000-call.c | 51 -
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 7a8bc5f537c..0c555f29f7d 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13542,34 +13542,31 @@ rs6000_init_builtins (void)
 ieee128_float_type_node = ibm128_float_type_node = long_double_type_node;
 
   /* Vector pair and vector quad support.  */
-  if (TARGET_EXTRA_BUILTINS)
-{
-  vector_pair_type_node = make_node (OPAQUE_TYPE);
-  SET_TYPE_MODE (vector_pair_type_node, OOmode);
-  TYPE_SIZE (vector_pair_type_node) = bitsize_int (GET_MODE_BITSIZE (OOmode));
-  TYPE_PRECISION (vector_pair_type_node) = GET_MODE_BITSIZE (OOmode);
-  TYPE_SIZE_UNIT (vector_pair_type_node) = size_int (GET_MODE_SIZE (OOmode));
-  SET_TYPE_ALIGN (vector_pair_type_node, 256);
-  TYPE_USER_ALIGN (vector_pair_type_node) = 0;
-  lang_hooks.types.register_builtin_type (vector_pair_type_node,
- "__vector_pair");
-  ptr_vector_pair_type_node
-   = build_pointer_type (build_qualified_type (vector_pair_type_node,
-   TYPE_QUAL_CONST));
+  vector_pair_type_node = make_node (OPAQUE_TYPE);
+  SET_TYPE_MODE (vector_pair_type_node, OOmode);
+  TYPE_SIZE (vector_pair_type_node) = bitsize_int (GET_MODE_BITSIZE (OOmode));
+  TYPE_PRECISION (vector_pair_type_node) = GET_MODE_BITSIZE (OOmode);
+  TYPE_SIZE_UNIT (vector_pair_type_node) = size_int (GET_MODE_SIZE (OOmode));
+  SET_TYPE_ALIGN (vector_pair_type_node, 256);
+  TYPE_USER_ALIGN (vector_pair_type_node) = 0;
+  lang_hooks.types.register_builtin_type (vector_pair_type_node,
+ "__vector_pair");
+  ptr_vector_pair_type_node
+= build_pointer_type (build_qualified_type (vector_pair_type_node,
+   TYPE_QUAL_CONST));
 
-  vector_quad_type_node = make_node (OPAQUE_TYPE);
-  SET_TYPE_MODE (vector_quad_type_node, XOmode);
-  TYPE_SIZE (vector_quad_type_node) = bitsize_int (GET_MODE_BITSIZE (XOmode));
-  TYPE_PRECISION (vector_quad_type_node) = GET_MODE_BITSIZE (XOmode);
-  TYPE_SIZE_UNIT (vector_quad_type_node) = size_int (GET_MODE_SIZE (XOmode));
-  SET_TYPE_ALIGN (vector_quad_type_node, 512);
-  TYPE_USER_ALIGN (vector_quad_type_node) = 0;
-  lang_hooks.types.register_builtin_type (vector_quad_type_node,
- "__vector_quad");
-  ptr_vector_quad_type_node
-   = build_pointer_type (build_qualified_type (vector_quad_type_node,
-   TYPE_QUAL_CONST));
-}
+  vector_quad_type_node = make_node (OPAQUE_TYPE);
+  SET_TYPE_MODE (vector_quad_type_node, XOmode);
+  TYPE_SIZE (vector_quad_type_node) = bitsize_int (GET_MODE_BITSIZE (XOmode));
+  TYPE_PRECISION (vector_quad_type_node) = GET_MODE_BITSIZE (XOmode);
+  TYPE_SIZE_UNIT (vector_quad_type_node) = size_int (GET_MODE_SIZE (XOmode));
+  SET_TYPE_ALIGN (vector_quad_type_node, 512);
+  TYPE_USER_ALIGN (vector_quad_type_node) = 0;
+  lang_hooks.types.register_builtin_type (vector_quad_type_node,
+ "__vector_quad");
+  ptr_vector_quad_type_node
+= build_pointer_type (build_qualified_type (vector_quad_type_node,
+   TYPE_QUAL_CONST));
 
   /* Initialize the modes for builtin_function_type, mapping a machine mode to
  tree type node.  */
-- 
2.27.0

[PATCH 20/34] rs6000: Handle gimple folding of target built-ins

2021-07-29 Thread Bill Schmidt via Gcc-patches

This is another patch that looks bigger than it really is.  Because we
have a new namespace for the builtins, allowing us to have both the old
and new builtin infrastructure supported at once, we need versions of
these functions that use the new builtin namespace.  Otherwise the code is
unchanged.
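
For readers skimming the series, the forwarding described above can be
sketched in isolation.  This is only an illustration under assumptions: the
flag and the two folder functions below are hypothetical stand-ins, not the
rs6000 code, and they return strings purely so the chosen path is visible.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the flag tested in the patch; the real
// new_builtins_are_live is defined elsewhere in the series.
static bool new_builtins_are_live_sketch = false;

// Hypothetical folder that consults the old builtin namespace.
static std::string
fold_with_old_tables (int code)
{
  assert (code >= 0);  // builtin codes are enum values, never negative
  return "old:" + std::to_string (code);
}

// Hypothetical folder that consults the new builtin namespace.
static std::string
fold_with_new_tables (int code)
{
  assert (code >= 0);
  return "new:" + std::to_string (code);
}

// Single entry point: callers are unchanged, and the flag decides which
// copy of the logic runs.  This mirrors the shape of the early return
// added to rs6000_gimple_fold_builtin.
std::string
fold_builtin_sketch (int code)
{
  if (new_builtins_are_live_sketch)
    return fold_with_new_tables (code);
  return fold_with_old_tables (code);
}
```

Keeping one entry point means both infrastructures can coexist until the
old one is removed, at which point only the forwarding test needs to go.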

2021-07-29  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
New forward decl.
(rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin.
(rs6000_new_builtin_valid_without_lhs): New function.
(rs6000_gimple_fold_new_mma_builtin): Likewise.
(rs6000_gimple_fold_new_builtin): Likewise.
---
 gcc/config/rs6000/rs6000-call.c | 1160 +++
 1 file changed, 1160 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b08440fd074..cb2503351c4 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
 static void rs6000_common_init_builtins (void);
 static void htm_init_builtins (void);
 static void mma_init_builtins (void);
+static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
 
 
 /* Hash table to keep track of the argument types for builtin functions.  */
@@ -12018,6 +12019,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
 bool
 rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
+  if (new_builtins_are_live)
+return rs6000_gimple_fold_new_builtin (gsi);
+
   gimple *stmt = gsi_stmt (*gsi);
   tree fndecl = gimple_call_fndecl (stmt);
   gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
@@ -12965,6 +12969,35 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return false;
 }
 
+/*  Helper function to sort out which built-ins may be valid without having
+a LHS.  */
+static bool
+rs6000_new_builtin_valid_without_lhs (enum rs6000_gen_builtins fn_code,
+ tree fndecl)
+{
+  if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node)
+return true;
+
+  switch (fn_code)
+{
+case RS6000_BIF_STVX_V16QI:
+case RS6000_BIF_STVX_V8HI:
+case RS6000_BIF_STVX_V4SI:
+case RS6000_BIF_STVX_V4SF:
+case RS6000_BIF_STVX_V2DI:
+case RS6000_BIF_STVX_V2DF:
+case RS6000_BIF_STXVW4X_V16QI:
+case RS6000_BIF_STXVW4X_V8HI:
+case RS6000_BIF_STXVW4X_V4SF:
+case RS6000_BIF_STXVW4X_V4SI:
+case RS6000_BIF_STXVD2X_V2DF:
+case RS6000_BIF_STXVD2X_V2DI:
+  return true;
+default:
+  return false;
+}
+}
+
 /* Check whether a builtin function is supported in this target
configuration.  */
 bool
@@ -13056,6 +13089,1133 @@ rs6000_new_builtin_is_supported_p (enum rs6000_gen_builtins fncode)
   return true;
 }
 
+/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
+   __vector_quad arguments into pass-by-value arguments, leading to more
+   efficient code generation.  */
+static bool
+rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
+   rs6000_gen_builtins fn_code)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  size_t fncode = (size_t) fn_code;
+
+  if (!bif_is_mma (rs6000_builtin_info_x[fncode]))
+return false;
+
+  /* Each call that can be gimple-expanded has an associated built-in
+ function that it will expand into.  If this one doesn't, we have
+ already expanded it!  */
+  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
+return false;
+
+  bifdata *bd = &rs6000_builtin_info_x[fncode];
+  unsigned nopnds = bd->nargs;
+  gimple_seq new_seq = NULL;
+  gimple *new_call;
+  tree new_decl;
+
+  /* Compatibility built-ins; we used to call these
+ __builtin_mma_{dis,}assemble_pair, but now we call them
+ __builtin_vsx_{dis,}assemble_pair.  Handle the old versions.  */
+  if (fncode == RS6000_BIF_ASSEMBLE_PAIR)
+fncode = RS6000_BIF_ASSEMBLE_PAIR_V;
+  else if (fncode == RS6000_BIF_DISASSEMBLE_PAIR)
+fncode = RS6000_BIF_DISASSEMBLE_PAIR_V;
+
+  if (fncode == RS6000_BIF_DISASSEMBLE_ACC
+  || fncode == RS6000_BIF_DISASSEMBLE_PAIR_V)
+{
+  /* This is an MMA disassemble built-in function.  */
+  push_gimplify_context (true);
+  unsigned nvec = (fncode == RS6000_BIF_DISASSEMBLE_ACC) ? 4 : 2;
+  tree dst_ptr = gimple_call_arg (stmt, 0);
+  tree src_ptr = gimple_call_arg (stmt, 1);
+  tree src_type = TREE_TYPE (src_ptr);
+  tree src = create_tmp_reg_or_ssa_name (TREE_TYPE (src_type));
+  gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
+
+  /* If we are not disassembling an accumulator/pair or our destination is
+another accumulator/pair, then just copy the entire thing as is.  */
+  if ((fncode == RS6000_BIF_DISASSEMBLE_ACC
+  && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+ || (fncode ==

[PATCH 19/34] rs6000: Handle overloads during program parsing

2021-07-29 Thread Bill Schmidt via Gcc-patches

Although this patch looks quite large, the changes are fairly minimal.
Most of it is duplicating the large function that does the overload
resolution using the automatically generated data structures instead of
the old hand-generated ones.  This doesn't make the patch terribly easy to
review, unfortunately.  Just be aware that generally we aren't changing
the logic and functionality of overload handling.
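
As a rough picture of what table-driven overload resolution means here, a
minimal sketch follows.  It is not the generated data structures from this
series: the OverloadInstance record, the string type names, and
resolve_overload_sketch are all hypothetical, and the exact-match comparison
is a simplification of the real type-compatibility checks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one instance of an overloaded builtin.
struct OverloadInstance
{
  std::string builtin_name;         // specific builtin this instance maps to
  std::vector<std::string> params;  // parameter type names, in order
};

// Return the first instance whose parameter list matches ARGS exactly,
// or nullptr if no instance matches.
const OverloadInstance *
resolve_overload_sketch (const std::vector<OverloadInstance> &table,
                         const std::vector<std::string> &args)
{
  for (const OverloadInstance &inst : table)
    {
      // Sanity check on the table itself: every entry names a builtin.
      assert (!inst.builtin_name.empty ());
      if (inst.params == args)
        return &inst;
    }
  return nullptr;
}
```

The point of the sketch is only that the matching loop stays the same when
the table is produced by a generator rather than written by hand.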

2021-06-07  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (rs6000-builtins.h): New include.
(altivec_resolve_new_overloaded_builtin): New forward decl.
(rs6000_new_builtin_type_compatible): New function.
(altivec_resolve_overloaded_builtin): Call
altivec_resolve_new_overloaded_builtin.
(altivec_build_new_resolved_builtin): New function.
(altivec_resolve_new_overloaded_builtin): Likewise.
* config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported_p):
Likewise.
---
 gcc/config/rs6000/rs6000-c.c| 1083 +++
 gcc/config/rs6000/rs6000-call.c |   91 +++
 2 files changed, 1174 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index afcb5bb6e39..a986e57fe7d 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -35,6 +35,10 @@
 #include "langhooks.h"
 #include "c/c-tree.h"
 
+#include "rs6000-builtins.h"
+
+static tree
+altivec_resolve_new_overloaded_builtin (location_t, tree, void *);
 
 
 /* Handle the machine specific pragma longcall.  Its syntax is
@@ -811,6 +815,30 @@ is_float128_p (tree t)
  && t == long_double_type_node));
 }
   
+static bool
+rs6000_new_builtin_type_compatible (tree t, tree u)
+{
+  if (t == error_mark_node)
+return false;
+
+  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (u))
+return true;
+
+  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+  && is_float128_p (t) && is_float128_p (u))
+return true;
+
+  if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u))
+{
+  t = TREE_TYPE (t);
+  u = TREE_TYPE (u);
+  if (TYPE_READONLY (u))
+   t = build_qualified_type (t, TYPE_QUAL_CONST);
+}
+
+  return lang_hooks.types_compatible_p (t, u);
+}
+
 static inline bool
 rs6000_builtin_type_compatible (tree t, int id)
 {
@@ -927,6 +955,10 @@ tree
 altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
void *passed_arglist)
 {
+  if (new_builtins_are_live)
+return altivec_resolve_new_overloaded_builtin (loc, fndecl,
+  passed_arglist);
+
   vec<tree, va_gc> *arglist = static_cast<vec<tree, va_gc> *> (passed_arglist);
   unsigned int nargs = vec_safe_length (arglist);
   enum rs6000_builtins fcode
@@ -1930,3 +1962,1054 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 return error_mark_node;
   }
 }
+
+/* Build a tree for a function call to an Altivec non-overloaded builtin.
+   The overloaded builtin that matched the types and args is described
+   by DESC.  The N arguments are given in ARGS, respectively.
+
+   Actually the only thing it does is calling fold_convert on ARGS, with
+   a small exception for vec_{all,any}_{ge,le} predicates. */
+
+static tree
+altivec_build_new_resolved_builtin (tree *args, int n, tree fntype,
+   tree ret_type,
+   rs6000_gen_builtins bif_id,
+   rs6000_gen_builtins ovld_id)
+{
+  tree argtypes = TYPE_ARG_TYPES (fntype);
+  tree arg_type[MAX_OVLD_ARGS];
+  tree fndecl = rs6000_builtin_decls_x[bif_id];
+  tree call;
+
+  for (int i = 0; i < n; i++)
+arg_type[i] = TREE_VALUE (argtypes), argtypes = TREE_CHAIN (argtypes);
+
+  /* The AltiVec overloading implementation is overall gross, but this
+ is particularly disgusting.  The vec_{all,any}_{ge,le} builtins
+ are completely different for floating-point vs. integer vector
+ types, because the former has vcmpgefp, but the latter should use
+ vcmpgtXX.
+
+ In practice, the second and third arguments are swapped, and the
+ condition (LT vs. EQ, which is recognizable by bit 1 of the first
+ argument) is reversed.  Patch the arguments here before building
+ the resolved CALL_EXPR.  */
+  if (n == 3
+  && ovld_id == RS6000_OVLD_VEC_CMPGE_P
+  && bif_id != RS6000_BIF_VCMPGEFP_P
+  && bif_id != RS6000_BIF_XVCMPGEDP_P)
+{
+  std::swap (args[1], args[2]);
+  std::swap (arg_type[1], arg_type[2]);
+
+  args[0] = fold_build2 (BIT_XOR_EXPR, TREE_TYPE (args[0]), args[0],
+build_int_cst (NULL_TREE, 2));
+}
+
+  /* If the number of arguments to an overloaded function increases,
+ we must expand this switch.  */
+  gcc_assert (MAX_OVLD_ARGS <= 4);
+
+  switch (n)
+{
+case 0:
+  call = build_call_expr (fndecl, 0);
+  break;
+case 1:
+  call = build_call_expr (fndecl, 1,
+ fully_fold_convert

[PATCH 16/34] rs6000: Darwin builtin support

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-03-04  Bill Schmidt  

gcc/
* config/rs6000/darwin.h (SUBTARGET_INIT_BUILTINS): Use the new
decl when new_builtins_are_live.
* config/rs6000/rs6000-builtin-new.def (__builtin_cfstring): New
built-in.
---
 gcc/config/rs6000/darwin.h   | 8 ++--
 gcc/config/rs6000/rs6000-builtin-new.def | 6 ++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index 42f39e60305..6abf8e84f54 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -504,8 +504,12 @@
 #define SUBTARGET_INIT_BUILTINS \
 do {   \
   darwin_patch_builtins ();\
-  rs6000_builtin_decls[(unsigned) (RS6000_BUILTIN_CFSTRING)]   \
-= darwin_init_cfstring_builtins ((unsigned) (RS6000_BUILTIN_CFSTRING)); \
+  if (new_builtins_are_live)   \
+rs6000_builtin_decls_x[(unsigned) (RS6000_BIF_CFSTRING)]   \
+  = darwin_init_cfstring_builtins ((unsigned) (RS6000_BIF_CFSTRING)); \
+  else \
+rs6000_builtin_decls[(unsigned) (RS6000_BUILTIN_CFSTRING)] \
+  = darwin_init_cfstring_builtins ((unsigned) (RS6000_BUILTIN_CFSTRING)); \
 } while(0)
 
 /* So far, there is no rs6000_fold_builtin, if one is introduced, then
diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 322dbe1f713..91dce7fbc91 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -187,6 +187,12 @@
 ; Builtins that have been around since time immemorial or are just
 ; considered available everywhere.
 [always]
+; __builtin_cfstring is for Darwin, which will replace the decl we
+; create here with another one during subtarget processing.  We just
+; need to ensure it has a slot in the builtin enumeration.
+  void __builtin_cfstring ();
+CFSTRING nothing {}
+
   void __builtin_cpu_init ();
 CPU_INIT nothing {cpu}
 
-- 
2.27.0

[PATCH 13/34] rs6000: Add Cell builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-07  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add cell stanza.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 27 
 1 file changed, 27 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 805bdc87acd..322dbe1f713 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -1102,6 +1102,33 @@
 VEC_SET_V8HI nothing {set}
 
 
+; Cell builtins.
+[cell]
+  pure vsc __builtin_altivec_lvlx (signed long, const void *);
+LVLX altivec_lvlx {ldvec}
+
+  pure vsc __builtin_altivec_lvlxl (signed long, const void *);
+LVLXL altivec_lvlxl {ldvec}
+
+  pure vsc __builtin_altivec_lvrx (signed long, const void *);
+LVRX altivec_lvrx {ldvec}
+
+  pure vsc __builtin_altivec_lvrxl (signed long, const void *);
+LVRXL altivec_lvrxl {ldvec}
+
+  void __builtin_altivec_stvlx (vsc, signed long, void *);
+STVLX altivec_stvlx {stvec}
+
+  void __builtin_altivec_stvlxl (vsc, signed long, void *);
+STVLXL altivec_stvlxl {stvec}
+
+  void __builtin_altivec_stvrx (vsc, signed long, void *);
+STVRX altivec_stvrx {stvec}
+
+  void __builtin_altivec_stvrxl (vsc, signed long, void *);
+STVRXL altivec_stvrxl {stvec}
+
+
 ; VSX builtins.
 [vsx]
   pure vd __builtin_altivec_lvx_v2df (signed long, const void *);
-- 
2.27.0

[PATCH 11/34] rs6000: Add MMA builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-16  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add mma stanza.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 416 +++
 1 file changed, 416 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 6b7a79549a4..4b65d54d913 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -3332,3 +3332,419 @@
 
   const unsigned long long __builtin_pextd (unsigned long long, unsigned long long);
 PEXTD pextd {}
+
+
+[mma]
+  void __builtin_mma_assemble_acc (v512 *, vuc, vuc, vuc, vuc);
+ASSEMBLE_ACC nothing {mma}
+
+  v512 __builtin_mma_assemble_acc_internal (vuc, vuc, vuc, vuc);
+ASSEMBLE_ACC_INTERNAL mma_assemble_acc {mma}
+
+  void __builtin_mma_assemble_pair (v256 *, vuc, vuc);
+ASSEMBLE_PAIR nothing {mma}
+
+  v256 __builtin_mma_assemble_pair_internal (vuc, vuc);
+ASSEMBLE_PAIR_INTERNAL vsx_assemble_pair {mma}
+
+  void __builtin_mma_build_acc (v512 *, vuc, vuc, vuc, vuc);
+BUILD_ACC nothing {mma}
+
+  v512 __builtin_mma_build_acc_internal (vuc, vuc, vuc, vuc);
+BUILD_ACC_INTERNAL mma_assemble_acc {mma}
+
+  void __builtin_mma_disassemble_acc (void *, v512 *);
+DISASSEMBLE_ACC nothing {mma,quad}
+
+  vuc __builtin_mma_disassemble_acc_internal (v512, const int<2>);
+DISASSEMBLE_ACC_INTERNAL mma_disassemble_acc {mma}
+
+  void __builtin_mma_disassemble_pair (void *, v256 *);
+DISASSEMBLE_PAIR nothing {mma,pair}
+
+  vuc __builtin_mma_disassemble_pair_internal (v256, const int<2>);
+DISASSEMBLE_PAIR_INTERNAL vsx_disassemble_pair {mma}
+
+  void __builtin_mma_pmxvbf16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2 nothing {mma}
+
+  v512 __builtin_mma_pmxvbf16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2_INTERNAL mma_pmxvbf16ger2 {mma}
+
+  void __builtin_mma_pmxvbf16ger2nn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2NN nothing {mma,quad}
+
+  v512 __builtin_mma_pmxvbf16ger2nn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2NN_INTERNAL mma_pmxvbf16ger2nn {mma,quad}
+
+  void __builtin_mma_pmxvbf16ger2np (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2NP nothing {mma,quad}
+
+  v512 __builtin_mma_pmxvbf16ger2np_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2NP_INTERNAL mma_pmxvbf16ger2np {mma,quad}
+
+  void __builtin_mma_pmxvbf16ger2pn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2PN nothing {mma,quad}
+
+  v512 __builtin_mma_pmxvbf16ger2pn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2PN_INTERNAL mma_pmxvbf16ger2pn {mma,quad}
+
+  void __builtin_mma_pmxvbf16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2PP nothing {mma,quad}
+
+  v512 __builtin_mma_pmxvbf16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVBF16GER2PP_INTERNAL mma_pmxvbf16ger2pp {mma,quad}
+
+  void __builtin_mma_pmxvf16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2 nothing {mma}
+
+  v512 __builtin_mma_pmxvf16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2_INTERNAL mma_pmxvf16ger2 {mma}
+
+  void __builtin_mma_pmxvf16ger2nn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2NN nothing {mma,quad}
+
+  v512 __builtin_mma_pmxvf16ger2nn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2NN_INTERNAL mma_pmxvf16ger2nn {mma,quad}
+
+  void __builtin_mma_pmxvf16ger2np (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2NP nothing {mma,quad}
+
+  v512 __builtin_mma_pmxvf16ger2np_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2NP_INTERNAL mma_pmxvf16ger2np {mma,quad}
+
+  void __builtin_mma_pmxvf16ger2pn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2PN nothing {mma,quad}
+
+  v512 __builtin_mma_pmxvf16ger2pn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2PN_INTERNAL mma_pmxvf16ger2pn {mma,quad}
+
+  void __builtin_mma_pmxvf16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2PP nothing {mma,quad}
+
+  v512 __builtin_mma_pmxvf16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+PMXVF16GER2PP_INTERNAL mma_pmxvf16ger2pp {mma,quad}
+
+  void __builtin_mma_pmxvf32ger (v512 *, vuc, vuc, const int<4>, const int<4>);
+PMXVF32GER nothing {mma}
+
+  v512 __builtin_mma_pmxvf32ger_internal (vuc, vuc, const int<4>, const int<4>);
+PMXVF32GER_INTERNAL mma_pmxvf32ger {mma}
+
+  void __builtin_mma_pmxvf32gernn (v512 *, vuc, vuc, const int<4>, const

[PATCH 12/34] rs6000: Add miscellaneous builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-15  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add ieee128-hw, dfp,
crypto, and htm stanzas.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 215 +++
 1 file changed, 215 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 4b65d54d913..805bdc87acd 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -2811,6 +2811,221 @@
 XL_LEN_R xl_len_r {}
 
 
+; Builtins requiring hardware support for IEEE-128 floating-point.
+[ieee128-hw]
+  fpmath _Float128 __builtin_addf128_round_to_odd (_Float128, _Float128);
+ADDF128_ODD addkf3_odd {}
+
+  fpmath _Float128 __builtin_divf128_round_to_odd (_Float128, _Float128);
+DIVF128_ODD divkf3_odd {}
+
+  fpmath _Float128 __builtin_fmaf128_round_to_odd (_Float128, _Float128, _Float128);
+FMAF128_ODD fmakf4_odd {}
+
+  fpmath _Float128 __builtin_mulf128_round_to_odd (_Float128, _Float128);
+MULF128_ODD mulkf3_odd {}
+
+  const signed int __builtin_vsx_scalar_cmp_exp_qp_eq (_Float128, _Float128);
+VSCEQPEQ xscmpexpqp_eq_kf {}
+
+  const signed int __builtin_vsx_scalar_cmp_exp_qp_gt (_Float128, _Float128);
+VSCEQPGT xscmpexpqp_gt_kf {}
+
+  const signed int __builtin_vsx_scalar_cmp_exp_qp_lt (_Float128, _Float128);
+VSCEQPLT xscmpexpqp_lt_kf {}
+
+  const signed int __builtin_vsx_scalar_cmp_exp_qp_unordered (_Float128, _Float128);
+VSCEQPUO xscmpexpqp_unordered_kf {}
+
+  fpmath _Float128 __builtin_sqrtf128_round_to_odd (_Float128);
+SQRTF128_ODD sqrtkf2_odd {}
+
+  fpmath _Float128 __builtin_subf128_round_to_odd (_Float128, _Float128);
+SUBF128_ODD subkf3_odd {}
+
+  fpmath double __builtin_truncf128_round_to_odd (_Float128);
+TRUNCF128_ODD trunckfdf2_odd {}
+
+  const signed long long __builtin_vsx_scalar_extract_expq (_Float128);
+VSEEQP xsxexpqp_kf {}
+
+  const signed __int128 __builtin_vsx_scalar_extract_sigq (_Float128);
+VSESQP xsxsigqp_kf {}
+
+  const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned __int128, unsigned long long);
+VSIEQP xsiexpqp_kf {}
+
+  const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, unsigned long long);
+VSIEQPF xsiexpqpf_kf {}
+
+  const signed int __builtin_vsx_scalar_test_data_class_qp (_Float128, const int<7>);
+VSTDCQP xststdcqp_kf {}
+
+  const signed int __builtin_vsx_scalar_test_neg_qp (_Float128);
+VSTDCNQP xststdcnegqp_kf {}
+
+
+
+; Decimal floating-point builtins.
+[dfp]
+  const _Decimal64 __builtin_ddedpd (const int<2>, _Decimal64);
+DDEDPD dfp_ddedpd_dd {}
+
+  const _Decimal128 __builtin_ddedpdq (const int<2>, _Decimal128);
+DDEDPDQ dfp_ddedpd_td {}
+
+  const _Decimal64 __builtin_denbcd (const int<1>, _Decimal64);
+DENBCD dfp_denbcd_dd {}
+
+  const _Decimal128 __builtin_denbcdq (const int<1>, _Decimal128);
+DENBCDQ dfp_denbcd_td {}
+
+  const _Decimal128 __builtin_denb2dfp_v16qi (vsc);
+DENB2DFP_V16QI dfp_denbcd_v16qi {}
+
+  const _Decimal64 __builtin_diex (signed long long, _Decimal64);
+DIEX dfp_diex_dd {}
+
+  const _Decimal128 __builtin_diexq (signed long long, _Decimal128);
+DIEXQ dfp_diex_td {}
+
+  const _Decimal64 __builtin_dscli (_Decimal64, const int<6>);
+DSCLI dfp_dscli_dd {}
+
+  const _Decimal128 __builtin_dscliq (_Decimal128, const int<6>);
+DSCLIQ dfp_dscli_td {}
+
+  const _Decimal64 __builtin_dscri (_Decimal64, const int<6>);
+DSCRI dfp_dscri_dd {}
+
+  const _Decimal128 __builtin_dscriq (_Decimal128, const int<6>);
+DSCRIQ dfp_dscri_td {}
+
+  const signed long long __builtin_dxex (_Decimal64);
+DXEX dfp_dxex_dd {}
+
+  const signed long long __builtin_dxexq (_Decimal128);
+DXEXQ dfp_dxex_td {}
+
+  const _Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long);
+PACK_TD packtd {}
+
+  void __builtin_set_fpscr_drn (const int[0,7]);
+SET_FPSCR_DRN rs6000_set_fpscr_drn {}
+
+  const unsigned long __builtin_unpack_dec128 (_Decimal128, const int<1>);
+UNPACK_TD unpacktd {}
+
+
+[crypto]
+  const vull __builtin_crypto_vcipher (vull, vull);
+VCIPHER crypto_vcipher_v2di {}
+
+  const vuc __builtin_crypto_vcipher_be (vuc, vuc);
+VCIPHER_BE crypto_vcipher_v16qi {}
+
+  const vull __builtin_crypto_vcipherlast (vull, vull);
+VCIPHERLAST crypto_vcipherlast_v2di {}
+
+  const vuc __builtin_crypto_vcipherlast_be (vuc, vuc);
+VCIPHERLAST_BE crypto_vcipherlast_v16qi {}
+
+  const vull __builtin_crypto_vncipher (vull, vull);
+VNCIPHER crypto_vncipher_v2di {}
+
+  const vuc __builtin_crypto_vncipher_be (vuc, vuc);
+VNCIPHER_BE crypto_vncipher_v16qi {}
+
+  const vull __builtin_crypto_vncipherlast (vull, vull);
+VNCIPHERLAST crypto_vncipherlast_v2di {}
+
+  const vuc __builtin_crypto_vncipherlast_be (vuc, vuc);
+VNCIPHERLAST_BE crypto_vncipherlast_v16qi {}
+
+  const vull __builtin_crypto_vsbox (vull);
+VSBOX crypto_vsbox_v2di {}
+
+  const

[PATCH 10/34] rs6000: Add Power10 builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-07-28  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add power10 and power10-64
stanzas.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 523 +++
 1 file changed, 523 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 8885df089a6..6b7a79549a4 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -2809,3 +2809,526 @@
 
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
+
+
+[power10]
+  const vbq __builtin_altivec_cmpge_1ti (vsq, vsq);
+CMPGE_1TI vector_nltv1ti {}
+
+  const vbq __builtin_altivec_cmpge_u1ti (vuq, vuq);
+CMPGE_U1TI vector_nltuv1ti {}
+
+  const vbq __builtin_altivec_cmple_1ti (vsq, vsq);
+CMPLE_1TI vector_ngtv1ti {}
+
+  const vbq __builtin_altivec_cmple_u1ti (vuq, vuq);
+CMPLE_U1TI vector_ngtuv1ti {}
+
+  const unsigned long long __builtin_altivec_cntmbb (vuc, const int<1>);
+VCNTMBB vec_cntmb_v16qi {}
+
+  const unsigned long long __builtin_altivec_cntmbd (vull, const int<1>);
+VCNTMBD vec_cntmb_v2di {}
+
+  const unsigned long long __builtin_altivec_cntmbh (vus, const int<1>);
+VCNTMBH vec_cntmb_v8hi {}
+
+  const unsigned long long __builtin_altivec_cntmbw (vui, const int<1>);
+VCNTMBW vec_cntmb_v4si {}
+
+  const vsq __builtin_altivec_div_v1ti (vsq, vsq);
+DIV_V1TI vsx_div_v1ti {}
+
+  const vsq __builtin_altivec_dives (vsq, vsq);
+DIVES_V1TI vsx_dives_v1ti {}
+
+  const vuq __builtin_altivec_diveu (vuq, vuq);
+DIVEU_V1TI vsx_diveu_v1ti {}
+
+  const vsq __builtin_altivec_mods (vsq, vsq);
+MODS_V1TI vsx_mods_v1ti {}
+
+  const vuq __builtin_altivec_modu (vuq, vuq);
+MODU_V1TI vsx_modu_v1ti {}
+
+  const vuc __builtin_altivec_mtvsrbm (unsigned long long);
+MTVSRBM vec_mtvsr_v16qi {}
+
+  const vull __builtin_altivec_mtvsrdm (unsigned long long);
+MTVSRDM vec_mtvsr_v2di {}
+
+  const vus __builtin_altivec_mtvsrhm (unsigned long long);
+MTVSRHM vec_mtvsr_v8hi {}
+
+  const vuq __builtin_altivec_mtvsrqm (unsigned long long);
+MTVSRQM vec_mtvsr_v1ti {}
+
+  const vui __builtin_altivec_mtvsrwm (unsigned long long);
+MTVSRWM vec_mtvsr_v4si {}
+
+  pure signed __int128 __builtin_altivec_se_lxvrbx (signed long, const signed char *);
+SE_LXVRBX vsx_lxvrbx {lxvrse}
+
+  pure signed __int128 __builtin_altivec_se_lxvrhx (signed long, const signed short *);
+SE_LXVRHX vsx_lxvrhx {lxvrse}
+
+  pure signed __int128 __builtin_altivec_se_lxvrwx (signed long, const signed int *);
+SE_LXVRWX vsx_lxvrwx {lxvrse}
+
+  pure signed __int128 __builtin_altivec_se_lxvrdx (signed long, const signed long long *);
+SE_LXVRDX vsx_lxvrdx {lxvrse}
+
+  void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char *);
+TR_STXVRBX vsx_stxvrbx {stvec}
+
+  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int *);
+TR_STXVRHX vsx_stxvrhx {stvec}
+
+  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed short *);
+TR_STXVRWX vsx_stxvrwx {stvec}
+
+  void __builtin_altivec_tr_stxvrdx (vsq, signed long, signed long long *);
+TR_STXVRDX vsx_stxvrdx {stvec}
+
+  const vuq __builtin_altivec_udiv_v1ti (vuq, vuq);
+UDIV_V1TI vsx_udiv_v1ti {}
+
+  const vull __builtin_altivec_vcfuged (vull, vull);
+VCFUGED vcfuged {}
+
+  const vsc __builtin_altivec_vclrlb (vsc, signed int);
+VCLRLB vclrlb {}
+
+  const vsc __builtin_altivec_vclrrb (vsc, signed int);
+VCLRRB vclrrb {}
+
+  const signed int __builtin_altivec_vcmpaet_p (vsq, vsq);
+VCMPAET_P vector_ae_v1ti_p {}
+
+  const vbq __builtin_altivec_vcmpequt (vsq, vsq);
+VCMPEQUT vector_eqv1ti {}
+
+  const signed int __builtin_altivec_vcmpequt_p (signed int, vsq, vsq);
+VCMPEQUT_P vector_eq_v1ti_p {pred}
+
+  const vbq __builtin_altivec_vcmpgtst (vsq, vsq);
+VCMPGTST vector_gtv1ti {}
+
+  const signed int __builtin_altivec_vcmpgtst_p (signed int, vsq, vsq);
+VCMPGTST_P vector_gt_v1ti_p {pred}
+
+  const vbq __builtin_altivec_vcmpgtut (vuq, vuq);
+VCMPGTUT vector_gtuv1ti {}
+
+  const signed int __builtin_altivec_vcmpgtut_p (signed int, vuq, vuq);
+VCMPGTUT_P vector_gtu_v1ti_p {pred}
+
+  const vbq __builtin_altivec_vcmpnet (vsq, vsq);
+VCMPNET vcmpnet {}
+
+  const signed int __builtin_altivec_vcmpnet_p (vsq, vsq);
+VCMPNET_P vector_ne_v1ti_p {}
+
+  const vull __builtin_altivec_vclzdm (vull, vull);
+VCLZDM vclzdm {}
+
+  const vull __builtin_altivec_vctzdm (vull, vull);
+VCTZDM vctzdm {}
+
+  const vsll __builtin_altivec_vdivesd (vsll, vsll);
+VDIVESD dives_v2di {}
+
+  const vsi __builtin_altivec_vdivesw (vsi, vsi);
+VDIVESW dives_v4si {}
+
+  const vull __builtin_altivec_vdiveud (vull, vull);
+VDIVEUD diveu_v2di {}
+
+  const vui __builtin_altivec_vdiveuw (vui, vui);
+VDIVEUW diveu_v4si {}
+
+  const vsll __builtin_altivec_vdivsd (vsll, vsll);
+VDIVSD divv2di3 {}
+
+  const vsi

[PATCH 09/34] rs6000: Add more type nodes to support builtin processing

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-10  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Initialize
various pointer type nodes.
* config/rs6000/rs6000.h (rs6000_builtin_type_index): Add enum
values for various pointer types.
(ptr_V16QI_type_node): New macro.
(ptr_V1TI_type_node): New macro.
(ptr_V2DI_type_node): New macro.
(ptr_V2DF_type_node): New macro.
(ptr_V4SI_type_node): New macro.
(ptr_V4SF_type_node): New macro.
(ptr_V8HI_type_node): New macro.
(ptr_unsigned_V16QI_type_node): New macro.
(ptr_unsigned_V1TI_type_node): New macro.
(ptr_unsigned_V8HI_type_node): New macro.
(ptr_unsigned_V4SI_type_node): New macro.
(ptr_unsigned_V2DI_type_node): New macro.
(ptr_bool_V16QI_type_node): New macro.
(ptr_bool_V8HI_type_node): New macro.
(ptr_bool_V4SI_type_node): New macro.
(ptr_bool_V2DI_type_node): New macro.
(ptr_bool_V1TI_type_node): New macro.
(ptr_pixel_type_node): New macro.
(ptr_intQI_type_node): New macro.
(ptr_uintQI_type_node): New macro.
(ptr_intHI_type_node): New macro.
(ptr_uintHI_type_node): New macro.
(ptr_intSI_type_node): New macro.
(ptr_uintSI_type_node): New macro.
(ptr_intDI_type_node): New macro.
(ptr_uintDI_type_node): New macro.
(ptr_intTI_type_node): New macro.
(ptr_uintTI_type_node): New macro.
(ptr_long_integer_type_node): New macro.
(ptr_long_unsigned_type_node): New macro.
(ptr_float_type_node): New macro.
(ptr_double_type_node): New macro.
(ptr_long_double_type_node): New macro.
(ptr_dfloat64_type_node): New macro.
(ptr_dfloat128_type_node): New macro.
(ptr_ieee128_type_node): New macro.
(ptr_ibm128_type_node): New macro.
(ptr_vector_pair_type_node): New macro.
(ptr_vector_quad_type_node): New macro.
(ptr_long_long_integer_type_node): New macro.
(ptr_long_long_unsigned_type_node): New macro.
---
 gcc/config/rs6000/rs6000-call.c | 151 
 gcc/config/rs6000/rs6000.h  |  82 +
 2 files changed, 233 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 8b16d65e684..b1338191926 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13298,25 +13298,63 @@ rs6000_init_builtins (void)
   V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64 ? "__vector long"
   : "__vector long long",
   long_long_integer_type_node, 2);
+  ptr_V2DI_type_node
+= build_pointer_type (build_qualified_type (V2DI_type_node,
+   TYPE_QUAL_CONST));
+
   V2DF_type_node = rs6000_vector_type ("__vector double", double_type_node, 2);
+  ptr_V2DF_type_node
+= build_pointer_type (build_qualified_type (V2DF_type_node,
+   TYPE_QUAL_CONST));
+
   V4SI_type_node = rs6000_vector_type ("__vector signed int",
   intSI_type_node, 4);
+  ptr_V4SI_type_node
+= build_pointer_type (build_qualified_type (V4SI_type_node,
+   TYPE_QUAL_CONST));
+
   V4SF_type_node = rs6000_vector_type ("__vector float", float_type_node, 4);
+  ptr_V4SF_type_node
+= build_pointer_type (build_qualified_type (V4SF_type_node,
+   TYPE_QUAL_CONST));
+
   V8HI_type_node = rs6000_vector_type ("__vector signed short",
   intHI_type_node, 8);
+  ptr_V8HI_type_node
+= build_pointer_type (build_qualified_type (V8HI_type_node,
+   TYPE_QUAL_CONST));
+
   V16QI_type_node = rs6000_vector_type ("__vector signed char",
intQI_type_node, 16);
+  ptr_V16QI_type_node
+= build_pointer_type (build_qualified_type (V16QI_type_node,
+   TYPE_QUAL_CONST));
 
   unsigned_V16QI_type_node = rs6000_vector_type ("__vector unsigned char",
unsigned_intQI_type_node, 16);
+  ptr_unsigned_V16QI_type_node
+= build_pointer_type (build_qualified_type (unsigned_V16QI_type_node,
+   TYPE_QUAL_CONST));
+
   unsigned_V8HI_type_node = rs6000_vector_type ("__vector unsigned short",
   unsigned_intHI_type_node, 8);
+  ptr_unsigned_V8HI_type_node
+= build_pointer_type (build_qualified_type (unsigned_V8HI_type_node,
+   TYPE_QUAL_CONST));
+
   unsigned_V4SI_type_node = rs6000_vector_type ("__vector unsigned int",
   unsigned_intSI_type_node, 4);
+

[PATCH 08/34] rs6000: Add Power9 builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-15  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add power9-vector, power9,
and power9-64 stanzas.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 375 +++
 1 file changed, 375 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index f13fb13b0ad..8885df089a6 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -2434,3 +2434,378 @@
 
   const double __builtin_vsx_xscvspdpn (vf);
 XSCVSPDPN vsx_xscvspdpn {}
+
+
+; Power9 vector builtins.
+[power9-vector]
+  const vss __builtin_altivec_convert_4f32_8f16 (vf, vf);
+CONVERT_4F32_8F16 convert_4f32_8f16 {}
+
+  const vss __builtin_altivec_convert_4f32_8i16 (vf, vf);
+CONVERT_4F32_8I16 convert_4f32_8i16 {}
+
+  const signed int __builtin_altivec_first_match_index_v16qi (vsc, vsc);
+VFIRSTMATCHINDEX_V16QI first_match_index_v16qi {}
+
+  const signed int __builtin_altivec_first_match_index_v8hi (vss, vss);
+VFIRSTMATCHINDEX_V8HI first_match_index_v8hi {}
+
+  const signed int __builtin_altivec_first_match_index_v4si (vsi, vsi);
+VFIRSTMATCHINDEX_V4SI first_match_index_v4si {}
+
+  const signed int __builtin_altivec_first_match_or_eos_index_v16qi (vsc, vsc);
+VFIRSTMATCHOREOSINDEX_V16QI first_match_or_eos_index_v16qi {}
+
+  const signed int __builtin_altivec_first_match_or_eos_index_v8hi (vss, vss);
+VFIRSTMATCHOREOSINDEX_V8HI first_match_or_eos_index_v8hi {}
+
+  const signed int __builtin_altivec_first_match_or_eos_index_v4si (vsi, vsi);
+VFIRSTMATCHOREOSINDEX_V4SI first_match_or_eos_index_v4si {}
+
+  const signed int __builtin_altivec_first_mismatch_index_v16qi (vsc, vsc);
+VFIRSTMISMATCHINDEX_V16QI first_mismatch_index_v16qi {}
+
+  const signed int __builtin_altivec_first_mismatch_index_v8hi (vss, vss);
+VFIRSTMISMATCHINDEX_V8HI first_mismatch_index_v8hi {}
+
+  const signed int __builtin_altivec_first_mismatch_index_v4si (vsi, vsi);
+VFIRSTMISMATCHINDEX_V4SI first_mismatch_index_v4si {}
+
+  const signed int __builtin_altivec_first_mismatch_or_eos_index_v16qi (vsc, vsc);
+VFIRSTMISMATCHOREOSINDEX_V16QI first_mismatch_or_eos_index_v16qi {}
+
+  const signed int __builtin_altivec_first_mismatch_or_eos_index_v8hi (vss, vss);
+VFIRSTMISMATCHOREOSINDEX_V8HI first_mismatch_or_eos_index_v8hi {}
+
+  const signed int __builtin_altivec_first_mismatch_or_eos_index_v4si (vsi, vsi);
+VFIRSTMISMATCHOREOSINDEX_V4SI first_mismatch_or_eos_index_v4si {}
+
+  const vsc __builtin_altivec_vadub (vsc, vsc);
+VADUB vaduv16qi3 {}
+
+  const vss __builtin_altivec_vaduh (vss, vss);
+VADUH vaduv8hi3 {}
+
+  const vsi __builtin_altivec_vaduw (vsi, vsi);
+VADUW vaduv4si3 {}
+
+  const vsll __builtin_altivec_vbpermd (vsll, vsc);
+VBPERMD altivec_vbpermd {}
+
+  const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
+VCLZLSBB_V16QI vclzlsbb_v16qi {}
+
+  const signed int __builtin_altivec_vclzlsbb_v4si (vsi);
+VCLZLSBB_V4SI vclzlsbb_v4si {}
+
+  const signed int __builtin_altivec_vclzlsbb_v8hi (vss);
+VCLZLSBB_V8HI vclzlsbb_v8hi {}
+
+  const vsc __builtin_altivec_vctzb (vsc);
+VCTZB ctzv16qi2 {}
+
+  const vsll __builtin_altivec_vctzd (vsll);
+VCTZD ctzv2di2 {}
+
+  const vss __builtin_altivec_vctzh (vss);
+VCTZH ctzv8hi2 {}
+
+  const vsi __builtin_altivec_vctzw (vsi);
+VCTZW ctzv4si2 {}
+
+  const signed int __builtin_altivec_vctzlsbb_v16qi (vsc);
+VCTZLSBB_V16QI vctzlsbb_v16qi {}
+
+  const signed int __builtin_altivec_vctzlsbb_v4si (vsi);
+VCTZLSBB_V4SI vctzlsbb_v4si {}
+
+  const signed int __builtin_altivec_vctzlsbb_v8hi (vss);
+VCTZLSBB_V8HI vctzlsbb_v8hi {}
+
+  const signed int __builtin_altivec_vcmpaeb_p (vsc, vsc);
+VCMPAEB_P vector_ae_v16qi_p {}
+
+  const signed int __builtin_altivec_vcmpaed_p (vsll, vsll);
+VCMPAED_P vector_ae_v2di_p {}
+
+  const signed int __builtin_altivec_vcmpaedp_p (vd, vd);
+VCMPAEDP_P vector_ae_v2df_p {}
+
+  const signed int __builtin_altivec_vcmpaefp_p (vf, vf);
+VCMPAEFP_P vector_ae_v4sf_p {}
+
+  const signed int __builtin_altivec_vcmpaeh_p (vss, vss);
+VCMPAEH_P vector_ae_v8hi_p {}
+
+  const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
+VCMPAEW_P vector_ae_v4si_p {}
+
+  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
+VCMPNEB vcmpneb {}
+
+  const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
+VCMPNEB_P vector_ne_v16qi_p {}
+
+  const signed int __builtin_altivec_vcmpned_p (vsll, vsll);
+VCMPNED_P vector_ne_v2di_p {}
+
+  const signed int __builtin_altivec_vcmpnedp_p (vd, vd);
+VCMPNEDP_P vector_ne_v2df_p {}
+
+  const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
+VCMPNEFP_P vector_ne_v4sf_p {}
+
+  const vss __builtin_altivec_vcmpneh (vss, vss);
+VCMPNEH vcmpneh {}
+
+  const signed int __builtin_altivec_vcmpneh_p (vss, vss);
+VCMPNEH_P vector_ne_v8hi_p {}
+
+  const vsi

[PATCH 06/34] rs6000: Add power7 and power7-64 builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-04-02  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add power7 and power7-64
stanzas.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 39 
 1 file changed, 39 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index ca694be1ac3..bffce52ee47 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -1957,3 +1957,42 @@
 
   const vsll __builtin_vsx_xxspltd_2di (vsll, const int<1>);
 XXSPLTD_V2DI vsx_xxspltd_v2di {}
+
+
+; Power7 builtins (ISA 2.06).
+[power7]
+  const unsigned int __builtin_addg6s (unsigned int, unsigned int);
+ADDG6S addg6s {}
+
+  const signed long __builtin_bpermd (signed long, signed long);
+BPERMD bpermd_di {}
+
+  const unsigned int __builtin_cbcdtd (unsigned int);
+CBCDTD cbcdtd {}
+
+  const unsigned int __builtin_cdtbcd (unsigned int);
+CDTBCD cdtbcd {}
+
+  const signed int __builtin_divwe (signed int, signed int);
+DIVWE dive_si {}
+
+  const unsigned int __builtin_divweu (unsigned int, unsigned int);
+DIVWEU diveu_si {}
+
+  const vsq __builtin_pack_vector_int128 (unsigned long long, unsigned long long);
+PACK_V1TI packv1ti {}
+
+  void __builtin_ppc_speculation_barrier ();
+SPECBARR speculation_barrier {}
+
+  const unsigned long __builtin_unpack_vector_int128 (vsq, const int<1>);
+UNPACK_V1TI unpackv1ti {}
+
+
+; Power7 builtins requiring 64-bit GPRs (even with 32-bit addressing).
+[power7-64]
+  const signed long long __builtin_divde (signed long long, signed long long);
+DIVDE dive_di {}
+
+  const unsigned long long __builtin_divdeu (unsigned long long, unsigned long long);
+DIVDEU diveu_di {}
-- 
2.27.0

[PATCH 05/34] rs6000: Add available-everywhere and ancient builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-07  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add always, power5, and
power6 stanzas.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 72 
 1 file changed, 72 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 974cdc8c37c..ca694be1ac3 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -184,6 +184,78 @@
 
 
 
+; Builtins that have been around since time immemorial or are just
+; considered available everywhere.
+[always]
+  void __builtin_cpu_init ();
+CPU_INIT nothing {cpu}
+
+  bool __builtin_cpu_is (string);
+CPU_IS nothing {cpu}
+
+  bool __builtin_cpu_supports (string);
+CPU_SUPPORTS nothing {cpu}
+
+  unsigned long long __builtin_ppc_get_timebase ();
+GET_TB rs6000_get_timebase {}
+
+  double __builtin_mffs ();
+MFFS rs6000_mffs {}
+
+; This will break for long double == _Float128.  libgcc history.
+  const long double __builtin_pack_longdouble (double, double);
+PACK_TF packtf {}
+
+  unsigned long __builtin_ppc_mftb ();
+MFTB rs6000_mftb_di {32bit}
+
+  void __builtin_mtfsb0 (const int<5>);
+MTFSB0 rs6000_mtfsb0 {}
+
+  void __builtin_mtfsb1 (const int<5>);
+MTFSB1 rs6000_mtfsb1 {}
+
+  void __builtin_mtfsf (const int<8>, double);
+MTFSF rs6000_mtfsf {}
+
+  const __ibm128 __builtin_pack_ibm128 (double, double);
+PACK_IF packif {}
+
+  void __builtin_set_fpscr_rn (const int[0,3]);
+SET_FPSCR_RN rs6000_set_fpscr_rn {}
+
+  const double __builtin_unpack_ibm128 (__ibm128, const int<1>);
+UNPACK_IF unpackif {}
+
+; This will break for long double == _Float128.  libgcc history.
+  const double __builtin_unpack_longdouble (long double, const int<1>);
+UNPACK_TF unpacktf {}
+
+
+; Builtins that have been around just about forever, but not quite.
+[power5]
+  fpmath double __builtin_recipdiv (double, double);
+RECIP recipdf3 {}
+
+  fpmath float __builtin_recipdivf (float, float);
+RECIPF recipsf3 {}
+
+  fpmath double __builtin_rsqrt (double);
+RSQRT rsqrtdf2 {}
+
+  fpmath float __builtin_rsqrtf (float);
+RSQRTF rsqrtsf2 {}
+
+
+; Power6 builtins.
+[power6]
+  const signed long __builtin_p6_cmpb (signed long, signed long);
+CMPB cmpbdi3 {}
+
+  const signed int __builtin_p6_cmpb_32 (signed int, signed int);
+CMPB_32 cmpbsi3 {}
+
+
 ; AltiVec builtins.
 [altivec]
   const vsc __builtin_altivec_abs_v16qi (vsc);
-- 
2.27.0
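
Each entry in the stanzas above pairs a prototype line with a second line that reads as an identifier, an insn pattern name, and a brace-delimited attribute list (for example `CPU_INIT nothing {cpu}` or `MFFS rs6000_mffs {}`). The following is a minimal sketch of splitting that second line into its three fields. The struct `bif_entry` and the function `parse_bif_line` are hypothetical names introduced for illustration; this is not the parser in rs6000-gen-builtins.c, and the field meanings are inferred from the entries shown here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the second line of a builtin entry.  */
struct bif_entry
{
  char id[64];       /* e.g. "CPU_INIT" */
  char pattern[64];  /* e.g. "nothing" or "rs6000_mffs" */
  char attrs[64];    /* text between the braces; empty for "{}" */
};

/* Parse "ID pattern {attrs}" into OUT.  Returns 1 on success and 0 if
   the line does not have that shape.  */
int
parse_bif_line (const char *line, struct bif_entry *out)
{
  int n;

  out->attrs[0] = '\0';
  n = sscanf (line, " %63s %63s {%63[^}]", out->id, out->pattern,
              out->attrs);
  if (n == 3)
    return 1;

  /* "%[^}]" needs at least one character, so an empty attribute set
     "{}" stops the scan after two fields.  Accept it explicitly, but
     reject lines where the braces were read as the pattern name.  */
  if (n == 2 && out->pattern[0] != '{' && strstr (line, "{}") != NULL)
    return 1;

  return 0;
}
```

Calling `parse_bif_line ("CPU_INIT nothing {cpu}", &e)` fills `e.id`, `e.pattern`, and `e.attrs`; for an entry with `{}` the attribute field is left empty.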

[PATCH 04/34] rs6000: Add VSX builtins

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-07  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Add vsx stanza.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 857 +++
 1 file changed, 857 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index f1aa5529cdd..974cdc8c37c 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -1028,3 +1028,860 @@
 
   const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
 VEC_SET_V8HI nothing {set}
+
+
+; VSX builtins.
+[vsx]
+  pure vd __builtin_altivec_lvx_v2df (signed long, const void *);
+LVX_V2DF altivec_lvx_v2df {ldvec}
+
+  pure vsll __builtin_altivec_lvx_v2di (signed long, const void *);
+LVX_V2DI altivec_lvx_v2di {ldvec}
+
+  pure vd __builtin_altivec_lvxl_v2df (signed long, const void *);
+LVXL_V2DF altivec_lvxl_v2df {ldvec}
+
+  pure vsll __builtin_altivec_lvxl_v2di (signed long, const void *);
+LVXL_V2DI altivec_lvxl_v2di {ldvec}
+
+  const vd __builtin_altivec_nabs_v2df (vd);
+NABS_V2DF vsx_nabsv2df2 {}
+
+  const vsll __builtin_altivec_nabs_v2di (vsll);
+NABS_V2DI nabsv2di2 {}
+
+  void __builtin_altivec_stvx_v2df (vd, signed long, void *);
+STVX_V2DF altivec_stvx_v2df {stvec}
+
+  void __builtin_altivec_stvx_v2di (vsll, signed long, void *);
+STVX_V2DI altivec_stvx_v2di {stvec}
+
+  void __builtin_altivec_stvxl_v2df (vd, signed long, void *);
+STVXL_V2DF altivec_stvxl_v2df {stvec}
+
+  void __builtin_altivec_stvxl_v2di (vsll, signed long, void *);
+STVXL_V2DI altivec_stvxl_v2di {stvec}
+
+  const vd __builtin_altivec_vand_v2df (vd, vd);
+VAND_V2DF andv2df3 {}
+
+  const vsll __builtin_altivec_vand_v2di (vsll, vsll);
+VAND_V2DI andv2di3 {}
+
+  const vull __builtin_altivec_vand_v2di_uns (vull, vull);
+VAND_V2DI_UNS andv2di3 {}
+
+  const vd __builtin_altivec_vandc_v2df (vd, vd);
+VANDC_V2DF andcv2df3 {}
+
+  const vsll __builtin_altivec_vandc_v2di (vsll, vsll);
+VANDC_V2DI andcv2di3 {}
+
+  const vull __builtin_altivec_vandc_v2di_uns (vull, vull);
+VANDC_V2DI_UNS andcv2di3 {}
+
+  const vsll __builtin_altivec_vcmpequd (vull, vull);
+VCMPEQUD vector_eqv2di {}
+
+  const int __builtin_altivec_vcmpequd_p (int, vsll, vsll);
+VCMPEQUD_P vector_eq_v2di_p {pred}
+
+  const vsll __builtin_altivec_vcmpgtsd (vsll, vsll);
+VCMPGTSD vector_gtv2di {}
+
+  const int __builtin_altivec_vcmpgtsd_p (int, vsll, vsll);
+VCMPGTSD_P vector_gt_v2di_p {pred}
+
+  const vsll __builtin_altivec_vcmpgtud (vull, vull);
+VCMPGTUD vector_gtuv2di {}
+
+  const int __builtin_altivec_vcmpgtud_p (int, vsll, vsll);
+VCMPGTUD_P vector_gtu_v2di_p {pred}
+
+  const vd __builtin_altivec_vnor_v2df (vd, vd);
+VNOR_V2DF norv2df3 {}
+
+  const vsll __builtin_altivec_vnor_v2di (vsll, vsll);
+VNOR_V2DI norv2di3 {}
+
+  const vull __builtin_altivec_vnor_v2di_uns (vull, vull);
+VNOR_V2DI_UNS norv2di3 {}
+
+  const vd __builtin_altivec_vor_v2df (vd, vd);
+VOR_V2DF iorv2df3 {}
+
+  const vsll __builtin_altivec_vor_v2di (vsll, vsll);
+VOR_V2DI iorv2di3 {}
+
+  const vull __builtin_altivec_vor_v2di_uns (vull, vull);
+VOR_V2DI_UNS iorv2di3 {}
+
+  const vd __builtin_altivec_vperm_2df (vd, vd, vuc);
+VPERM_2DF altivec_vperm_v2df {}
+
+  const vsll __builtin_altivec_vperm_2di (vsll, vsll, vuc);
+VPERM_2DI altivec_vperm_v2di {}
+
+  const vull __builtin_altivec_vperm_2di_uns (vull, vull, vuc);
+VPERM_2DI_UNS altivec_vperm_v2di_uns {}
+
+  const vd __builtin_altivec_vreve_v2df (vd);
+VREVE_V2DF altivec_vrevev2df2 {}
+
+  const vsll __builtin_altivec_vreve_v2di (vsll);
+VREVE_V2DI altivec_vrevev2di2 {}
+
+  const vd __builtin_altivec_vsel_2df (vd, vd, vd);
+VSEL_2DF vector_select_v2df {}
+
+  const vsll __builtin_altivec_vsel_2di (vsll, vsll, vsll);
+VSEL_2DI_B vector_select_v2di {}
+
+  const vull __builtin_altivec_vsel_2di_uns (vull, vull, vull);
+VSEL_2DI_UNS vector_select_v2di_uns {}
+
+  const vd __builtin_altivec_vsldoi_2df (vd, vd, const int<4>);
+VSLDOI_2DF altivec_vsldoi_v2df {}
+
+  const vsll __builtin_altivec_vsldoi_2di (vsll, vsll, const int<4>);
+VSLDOI_2DI altivec_vsldoi_v2di {}
+
+  const vd __builtin_altivec_vxor_v2df (vd, vd);
+VXOR_V2DF xorv2df3 {}
+
+  const vsll __builtin_altivec_vxor_v2di (vsll, vsll);
+VXOR_V2DI xorv2di3 {}
+
+  const vull __builtin_altivec_vxor_v2di_uns (vull, vull);
+VXOR_V2DI_UNS xorv2di3 {}
+
+  const signed __int128 __builtin_vec_ext_v1ti (vsq, signed int);
+VEC_EXT_V1TI nothing {extract}
+
+  const double __builtin_vec_ext_v2df (vd, signed int);
+VEC_EXT_V2DF nothing {extract}
+
+  const signed long long __builtin_vec_ext_v2di (vsll, signed int);
+VEC_EXT_V2DI nothing {extract}
+
+  const vsq __builtin_vec_init_v1ti (signed __int128);
+VEC_INIT_V1TI nothing {init}
+
+  const vd __builtin_vec_init_v2df (double, double);
+VEC_INIT_V2DF nothing {init}
+
+  const

[PATCH 03/34] rs6000: Add the rest of the [altivec] stanza to the builtins file

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-10  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Finish altivec stanza.
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Move
initialization of pcvoid_type_node here...
(altivec_init_builtins): ...from here.
* config/rs6000/rs6000.h (rs6000_builtin_type_index): Add
RS6000_BTI_const_ptr_void.
(pcvoid_type_node): New macro.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 831 +++
 gcc/config/rs6000/rs6000-call.c  |   7 +-
 gcc/config/rs6000/rs6000.h   |   2 +
 3 files changed, 836 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index a84a3def2d5..f1aa5529cdd 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -197,3 +197,834 @@
 
   const vss __builtin_altivec_abs_v8hi (vss);
 ABS_V8HI absv8hi2 {}
+
+  const vsc __builtin_altivec_abss_v16qi (vsc);
+ABSS_V16QI altivec_abss_v16qi {}
+
+  const vsi __builtin_altivec_abss_v4si (vsi);
+ABSS_V4SI altivec_abss_v4si {}
+
+  const vss __builtin_altivec_abss_v8hi (vss);
+ABSS_V8HI altivec_abss_v8hi {}
+
+  const vf __builtin_altivec_copysignfp (vf, vf);
+COPYSIGN_V4SF vector_copysignv4sf3 {}
+
+  void __builtin_altivec_dss (const int<2>);
+DSS altivec_dss {}
+
+  void __builtin_altivec_dssall ();
+DSSALL altivec_dssall {}
+
+  void __builtin_altivec_dst (void *, const int, const int<2>);
+DST altivec_dst {}
+
+  void __builtin_altivec_dstst (void *, const int, const int<2>);
+DSTST altivec_dstst {}
+
+  void __builtin_altivec_dststt (void *, const int, const int<2>);
+DSTSTT altivec_dststt {}
+
+  void __builtin_altivec_dstt (void *, const int, const int<2>);
+DSTT altivec_dstt {}
+
+  fpmath vsi __builtin_altivec_fix_sfsi (vf);
+FIX_V4SF_V4SI fix_truncv4sfv4si2 {}
+
+  fpmath vui __builtin_altivec_fixuns_sfsi (vf);
+FIXUNS_V4SF_V4SI fixuns_truncv4sfv4si2 {}
+
+  fpmath vf __builtin_altivec_float_sisf (vsi);
+FLOAT_V4SI_V4SF floatv4siv4sf2 {}
+
+  pure vsc __builtin_altivec_lvebx (signed long, const void *);
+LVEBX altivec_lvebx {ldvec}
+
+  pure vss __builtin_altivec_lvehx (signed long, const void *);
+LVEHX altivec_lvehx {ldvec}
+
+  pure vsi __builtin_altivec_lvewx (signed long, const void *);
+LVEWX altivec_lvewx {ldvec}
+
+  pure vuc __builtin_altivec_lvsl (signed long, const void *);
+LVSL altivec_lvsl {ldvec}
+
+  pure vuc __builtin_altivec_lvsr (signed long, const void *);
+LVSR altivec_lvsr {ldvec}
+
+  pure vsi __builtin_altivec_lvx (signed long, const void *);
+LVX altivec_lvx_v4si {ldvec}
+
+  pure vsq __builtin_altivec_lvx_v1ti (signed long, const void *);
+LVX_V1TI altivec_lvx_v1ti {ldvec}
+
+  pure vsc __builtin_altivec_lvx_v16qi (signed long, const void *);
+LVX_V16QI altivec_lvx_v16qi {ldvec}
+
+  pure vf __builtin_altivec_lvx_v4sf (signed long, const void *);
+LVX_V4SF altivec_lvx_v4sf {ldvec}
+
+  pure vsi __builtin_altivec_lvx_v4si (signed long, const void *);
+LVX_V4SI altivec_lvx_v4si {ldvec}
+
+  pure vss __builtin_altivec_lvx_v8hi (signed long, const void *);
+LVX_V8HI altivec_lvx_v8hi {ldvec}
+
+  pure vsi __builtin_altivec_lvxl (signed long, const void *);
+LVXL altivec_lvxl_v4si {ldvec}
+
+  pure vsc __builtin_altivec_lvxl_v16qi (signed long, const void *);
+LVXL_V16QI altivec_lvxl_v16qi {ldvec}
+
+  pure vf __builtin_altivec_lvxl_v4sf (signed long, const void *);
+LVXL_V4SF altivec_lvxl_v4sf {ldvec}
+
+  pure vsi __builtin_altivec_lvxl_v4si (signed long, const void *);
+LVXL_V4SI altivec_lvxl_v4si {ldvec}
+
+  pure vss __builtin_altivec_lvxl_v8hi (signed long, const void *);
+LVXL_V8HI altivec_lvxl_v8hi {ldvec}
+
+  const vsc __builtin_altivec_mask_for_load (const void *);
+MASK_FOR_LOAD altivec_lvsr_direct {ldstmask}
+
+  vss __builtin_altivec_mfvscr ();
+MFVSCR altivec_mfvscr {}
+
+  void __builtin_altivec_mtvscr (vsi);
+MTVSCR altivec_mtvscr {}
+
+  const vsll __builtin_altivec_vmulesw (vsi, vsi);
+VMULESW vec_widen_smult_even_v4si {}
+
+  const vull __builtin_altivec_vmuleuw (vui, vui);
+VMULEUW vec_widen_umult_even_v4si {}
+
+  const vsll __builtin_altivec_vmulosw (vsi, vsi);
+VMULOSW vec_widen_smult_odd_v4si {}
+
+  const vull __builtin_altivec_vmulouw (vui, vui);
+VMULOUW vec_widen_umult_odd_v4si {}
+
+  const vsc __builtin_altivec_nabs_v16qi (vsc);
+NABS_V16QI nabsv16qi2 {}
+
+  const vf __builtin_altivec_nabs_v4sf (vf);
+NABS_V4SF vsx_nabsv4sf2 {}
+
+  const vsi __builtin_altivec_nabs_v4si (vsi);
+NABS_V4SI nabsv4si2 {}
+
+  const vss __builtin_altivec_nabs_v8hi (vss);
+NABS_V8HI nabsv8hi2 {}
+
+  void __builtin_altivec_stvebx (vsc, signed long, void *);
+STVEBX altivec_stvebx {stvec}
+
+  void __builtin_altivec_stvehx (vss, signed long, void *);
+STVEHX altivec_stvehx {stvec}
+
+  void

[PATCH 02/34] rs6000: Add gengtype handling to the build machinery

2021-07-29 Thread Bill Schmidt via Gcc-patches

2021-06-07  Bill Schmidt  

gcc/
* config.gcc (target_gtfiles): Add ./rs6000-builtins.h.
* config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Set.
---
 gcc/config.gcc | 1 +
 gcc/config/rs6000/t-rs6000 | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fe2205b4bc2..a880823e562 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -492,6 +492,7 @@ powerpc*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.c"
+   target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
;;
 pru-*-*)
cpu_type=pru
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index e0e8ab8d828..92766d8ea25 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -22,6 +22,7 @@ TM_H += $(srcdir)/config/rs6000/rs6000-builtin.def
 TM_H += $(srcdir)/config/rs6000/rs6000-cpus.def
 TM_H += $(srcdir)/config/rs6000/rs6000-modes.h
 PASSES_EXTRA += $(srcdir)/config/rs6000/rs6000-passes.def
+EXTRA_GTYPE_DEPS += $(srcdir)/config/rs6000/rs6000-builtin-new.def
 
 rs6000-pcrel-opt.o: $(srcdir)/config/rs6000/rs6000-pcrel-opt.c
$(COMPILE) $<
-- 
2.27.0

[PATCH 01/34] rs6000: Incorporate new builtins code into the build machinery

2021-07-29 Thread Bill Schmidt via Gcc-patches

Differences from previous version:
 - Removed the change to add rs6000-c.o to extra_objs (unnecessary)
 - Avoided race condition and documented how this works

2021-07-27  Bill Schmidt  

gcc/
* config.gcc (powerpc*-*-*): Add rs6000-builtins.o to extra_objs.
* config/rs6000/rs6000-gen-builtins.c (main): Close init_file
last.
* config/rs6000/t-rs6000 (rs6000-gen-builtins.o): New target.
(rbtree.o): Likewise.
(rs6000-gen-builtins): Likewise.
(rs6000-builtins.c): Likewise.
(rs6000-builtins.h): Likewise.
(rs6000.o): Add dependency.
(EXTRA_HEADERS): Add rs6000-vecdefines.h.
(rs6000-vecdefines.h): New target.
(rs6000-builtins.o): Likewise.
(rs6000-call.o): Add rs6000-builtins.h as a dependency.
(rs6000-c.o): Likewise.
---
 gcc/config.gcc  |  1 +
 gcc/config/rs6000/rs6000-gen-builtins.c |  4 ++-
 gcc/config/rs6000/t-rs6000  | 46 ++---
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 93e2b3219b9..fe2205b4bc2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -476,6 +476,7 @@ powerpc*-*-*)
cpu_type=rs6000
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+   extra_objs="${extra_objs} rs6000-builtins.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index e5d3b71b622..c401a44e104 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -2979,9 +2979,11 @@ main (int argc, const char **argv)
   exit (1);
 }
 
+  /* Always close init_file last.  This avoids race conditions in the
+ build machinery.  See comments in t-rs6000.  */
   fclose (header_file);
-  fclose (init_file);
   fclose (defines_file);
+  fclose (init_file);
 
   return 0;
 }
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 44f7ffb35fe..e0e8ab8d828 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -27,10 +27,6 @@ rs6000-pcrel-opt.o: $(srcdir)/config/rs6000/rs6000-pcrel-opt.c
$(COMPILE) $<
$(POSTCOMPILE)
 
-rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c
-   $(COMPILE) $<
-   $(POSTCOMPILE)
-
 rs6000-string.o: $(srcdir)/config/rs6000/rs6000-string.c
$(COMPILE) $<
$(POSTCOMPILE)
@@ -47,7 +43,47 @@ rs6000-logue.o: $(srcdir)/config/rs6000/rs6000-logue.c
$(COMPILE) $<
$(POSTCOMPILE)
 
-rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
+rs6000-gen-builtins.o: $(srcdir)/config/rs6000/rs6000-gen-builtins.c
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
+rbtree.o: $(srcdir)/config/rs6000/rbtree.c
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
+rs6000-gen-builtins: rs6000-gen-builtins.o rbtree.o
+   $(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \
+   $(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS)
+
+# TODO: Whenever GNU make 4.3 is the minimum required, we should use
+# grouped targets on this:
+#rs6000-builtins.c rs6000-builtins.h rs6000-vecdefines.h &: 
+#   
+# For now, the header files depend on rs6000-builtins.c, which avoids
+# races because the .c file is closed last in rs6000-gen-builtins.c.
+rs6000-builtins.c: rs6000-gen-builtins \
+  $(srcdir)/config/rs6000/rs6000-builtin-new.def \
+  $(srcdir)/config/rs6000/rs6000-overload.def
+   ./rs6000-gen-builtins $(srcdir)/config/rs6000/rs6000-builtin-new.def \
+   $(srcdir)/config/rs6000/rs6000-overload.def rs6000-builtins.h \
+   rs6000-builtins.c rs6000-vecdefines.h
+
+rs6000-builtins.h: rs6000-builtins.c
+
+rs6000.o: rs6000-builtins.h
+
+EXTRA_HEADERS += rs6000-vecdefines.h
+rs6000-vecdefines.h: rs6000-builtins.c
+
+rs6000-builtins.o: rs6000-builtins.c
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
+rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c rs6000-builtins.h
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
+rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c rs6000-builtins.h
$(COMPILE) $<
$(POSTCOMPILE)
 
-- 
2.27.0
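
The race-avoidance idea in the patch above (the generator closes the .c file last, and the headers depend on that .c file in the makefile) can be illustrated with a minimal sketch. The function `write_outputs` and the file names used with it are hypothetical and are not taken from rs6000-gen-builtins.c; the sketch only shows the close ordering, assuming the third argument names the file that make treats as the primary target.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from rs6000-gen-builtins.c: write three
   generated files and close the one make tracks as the primary target
   last.  Returns 0 on success and 1 on any I/O failure.  */
int
write_outputs (const char *header_name, const char *defines_name,
               const char *init_name)
{
  FILE *header_file = fopen (header_name, "w");
  FILE *defines_file = fopen (defines_name, "w");
  FILE *init_file = fopen (init_name, "w");
  int failed = !header_file || !defines_file || !init_file;

  if (!failed)
    {
      fputs ("/* generated header */\n", header_file);
      fputs ("/* generated defines */\n", defines_file);
      fputs ("/* generated init code */\n", init_file);
    }

  /* Secondary outputs first ...  */
  if (header_file && fclose (header_file) != 0)
    failed = 1;
  if (defines_file && fclose (defines_file) != 0)
    failed = 1;
  /* ... and the primary output last, so it is the final file to be
     completed on disk.  */
  if (init_file && fclose (init_file) != 0)
    failed = 1;

  return failed;
}
```

A caller would pass the two header names first and the .c file name last, for example `write_outputs ("sketch-builtins.h", "sketch-vecdefines.h", "sketch-builtins.c")`. Anything that waits on the last file then finds the other two already flushed and closed.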

[PATCHv4 00/34] Replace the Power target-specific builtin machinery

2021-07-29 Thread Bill Schmidt via Gcc-patches

Hi!

Original patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568840.html

V2 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572231.html

V3 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573020.html

Thanks for all the reviews so far!  I've now committed all of the
rs6000-gen-builtins.c and rbtree.[ch] patches, along with the patch
to generic code to handle build-time GC roots in gengtype.  These
constituted the first 22 patches of the V3 series.

In this version of the series, I've made some changes in response to
reviews from Segher and Will Schmidt, and incorporated some upstream
changes since the V3 posting.  Mapping from V4 patches to V3 patches:

   V4   =>   V3
  ----      ----
  0001      0023
   ..        ..
  0020      0042
  0021      (new)
  0022      0043
   ..        ..
  0034      0055

The new patch 0021 handles the MMA changes, which required rethinking
how I handle "internal" MMA builtins.

Thanks again for the ongoing reviews!

Bill

Bill Schmidt (34):
  rs6000: Incorporate new builtins code into the build machinery
  rs6000: Add gengtype handling to the build machinery
  rs6000: Add the rest of the [altivec] stanza to the builtins file
  rs6000: Add VSX builtins
  rs6000: Add available-everywhere and ancient builtins
  rs6000: Add power7 and power7-64 builtins
  rs6000: Add power8-vector builtins
  rs6000: Add Power9 builtins
  rs6000: Add more type nodes to support builtin processing
  rs6000: Add Power10 builtins
  rs6000: Add MMA builtins
  rs6000: Add miscellaneous builtins
  rs6000: Add Cell builtins
  rs6000: Add remaining overloads
  rs6000: Execute the automatic built-in initialization code
  rs6000: Darwin builtin support
  rs6000: Add sanity to V2DI_type_node definitions
  rs6000: Always initialize vector_pair and vector_quad nodes
  rs6000: Handle overloads during program parsing
  rs6000: Handle gimple folding of target built-ins
  rs6000: Handle some recent MMA builtin changes
  rs6000: Support for vectorizing built-in functions
  rs6000: Builtin expansion, part 1
  rs6000: Builtin expansion, part 2
  rs6000: Builtin expansion, part 3
  rs6000: Builtin expansion, part 4
  rs6000: Builtin expansion, part 5
  rs6000: Builtin expansion, part 6
  rs6000: Update rs6000_builtin_decl
  rs6000: Miscellaneous uses of rs6000_builtins_decl_x
  rs6000: Debug support
  rs6000: Update altivec.h for automated interfaces
  rs6000: Test case adjustments
  rs6000: Enable the new builtin support

 gcc/config.gcc|2 +
 gcc/config/rs6000/altivec.h   |  519 +-
 gcc/config/rs6000/darwin.h|8 +-
 gcc/config/rs6000/rs6000-builtin-new.def  | 3806 ++
 gcc/config/rs6000/rs6000-c.c  | 1083 +++
 gcc/config/rs6000/rs6000-call.c   | 3437 +-
 gcc/config/rs6000/rs6000-gen-builtins.c   |   44 +-
 gcc/config/rs6000/rs6000-overload.def | 6104 +
 gcc/config/rs6000/rs6000.c|  219 +-
 gcc/config/rs6000/rs6000.h|   84 +
 gcc/config/rs6000/t-rs6000|   47 +-
 .../powerpc/bfp/scalar-extract-exp-2.c|2 +-
 .../powerpc/bfp/scalar-extract-sig-2.c|2 +-
 .../powerpc/bfp/scalar-insert-exp-2.c |2 +-
 .../powerpc/bfp/scalar-insert-exp-5.c |2 +-
 .../powerpc/bfp/scalar-insert-exp-8.c |2 +-
 .../powerpc/bfp/scalar-test-neg-2.c   |2 +-
 .../powerpc/bfp/scalar-test-neg-3.c   |2 +-
 .../powerpc/bfp/scalar-test-neg-5.c   |2 +-
 .../gcc.target/powerpc/byte-in-set-2.c|2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c |2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c   |2 +-
 .../gcc.target/powerpc/crypto-builtin-2.c |   14 +-
 .../powerpc/fold-vec-splat-floatdouble.c  |4 +-
 .../powerpc/fold-vec-splat-longlong.c |   10 +-
 .../powerpc/fold-vec-splat-misc-invalid.c |8 +-
 .../gcc.target/powerpc/int_128bit-runnable.c  |6 +-
 .../gcc.target/powerpc/p8vector-builtin-8.c   |1 +
 gcc/testsuite/gcc.target/powerpc/pr80315-1.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-2.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-3.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-4.c  |2 +-
 gcc/testsuite/gcc.target/powerpc/pr88100.c|   12 +-
 .../gcc.target/powerpc/pragma_misc9.c |2 +-
 .../gcc.target/powerpc/pragma_power8.c|2 +
 .../gcc.target/powerpc/pragma_power9.c|3 +
 .../powerpc/test_fpscr_drn_builtin_error.c|4 +-
 .../powerpc/test_fpscr_rn_builtin_error.c |   12 +-
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c |3 +-
 gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c  |2 +-
 .../gcc.target/powerpc/vsu/vec-all-nez-7.c|2 +-
 .../gcc.target/powerpc/vsu/vec-any-eqz-7.c|2 +-

Re: [PATCH] libgccjit: add some reflection functions in the jit C api

2021-07-29 Thread Antoni Boucher via Gcc-patches

David: PING

On Monday, 19 July 2021 at 12:10 -0400, Antoni Boucher wrote:
> I'm sending the patch once again for review/approval.
> 
> I fixed the doc to use the new function names.
> 
> On Friday, 18 June 2021 at 16:37 -0400, David Malcolm wrote:
> > On Fri, 2021-06-18 at 15:41 -0400, Antoni Boucher wrote:
> > > I have write access now.
> > 
> > Great.
> > 
> > > I'm not sure how I'm supposed to send my patches:
> > > should I put it in personal branches and you'll merge them?
> > 
> > Please send them to this mailing list for review; once they're
> > approved
> > you can merge them.
> > 
> > > 
> > > And for the MAINTAINERS file, should I just push to master right
> > > away,
> > > after sending it to the mailing list?
> > 
> > I think people just push the MAINTAINERS change and then let the
> > list
> > know, since it makes a good test that write access is working
> > correctly.
> > 
> > Dave
> > 
> > > 
> > > Thanks for your help!
> > > 
> > > On Friday, 18 June 2021 at 12:09 -0400, David Malcolm wrote:
> > > > On Fri, 2021-06-18 at 11:55 -0400, Antoni Boucher wrote:
> > > > > On Friday, 11 June 2021 at 14:00 -0400, David Malcolm wrote:
> > > > > > On Fri, 2021-06-11 at 08:15 -0400, Antoni Boucher wrote:
> > > > > > > Thank you for your answer.
> > > > > > > I attached the updated patch.
> > > > > > 
> > > > > > BTW you (or possibly me) dropped the mailing lists; was
> > > > > > that
> > > > > > deliberate?
> > > > > 
> > > > > Oh, my bad.
> > > > > 
> > > > 
> > > > [...]
> > > > 
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > I have signed the FSF copyright attribution.
> > > > > > 
> > > > > > I can push changes on your behalf, but I'd prefer it if you
> > > > > > did
> > > > > > it,
> > > > > > especially given that you have various other patches you
> > > > > > want
> > > > > > to
> > > > > > get
> > > > > > in.
> > > > > > 
> > > > > > Instructions on how to get push rights to the git repo are
> > > > > > here:
> > > > > >   https://gcc.gnu.org/gitwrite.html
> > > > > > 
> > > > > > I can sponsor you.
> > > > > 
> > > > > Thanks.
> > > > > I did sign up to get push rights.
> > > > > Have you accepted my request to get those?
> > > > 
> > > > I did, but I didn't see any kind of notification.  Did you get
> > > > an
> > > > email
> > > > about it?
> > > > 
> > > > 
> > > > Dave
> > > > 
> > > 
> > > 
> > 
> > 
>

Re: [PATCH 42/55] rs6000: Handle gimple folding of target built-ins

2021-07-29 Thread Bill Schmidt via Gcc-patches




On 7/28/21 4:21 PM, will schmidt wrote:

On Thu, 2021-06-17 at 10:19 -0500, Bill Schmidt via Gcc-patches wrote:


+/* Vector compares; EQ, NE, GE, GT, LE.  */
+case RS6000_BIF_VCMPEQUB:
+case RS6000_BIF_VCMPEQUH:
+case RS6000_BIF_VCMPEQUW:
+case RS6000_BIF_VCMPEQUD:
+  fold_compare_helper (gsi, EQ_EXPR, stmt);
+  return true;
+
+case RS6000_BIF_VCMPNEB:
+case RS6000_BIF_VCMPNEH:
+case RS6000_BIF_VCMPNEW:
+  fold_compare_helper (gsi, NE_EXPR, stmt);
+  return true;
+
Noting that entries for _CMPNET, _VCMPEQUT, etc. are missing from this
version versus the non-new version of this function.
I believe this was/is deliberate and by design.
Same with entries for P10V_BUILTIN_CMPLE_1TI, etc. below.



Indeed not!  This is something I missed when new code was added after I 
posted the original patch series.  I'll reinstate the quadword 
compares.  Thanks for spotting this!


Bill

[committed] testsuite: Fix up two tests for recent libstdc++ header changes [PR101647]

2021-07-29 Thread Jakub Jelinek via Gcc-patches

Hi!

After recent libstdc++ header changes  no longer includes
(parts of?)  and doesn't have to and  no longer includes
(parts of?) .
This patch fixes:
testsuite/g++.dg/pr71389.C:10:39: error: aggregate 'std::array, 16> v13' has incomplete type and cannot be defined
as well as
testsuite/g++.dg/cpp0x/initlist48.C:11:6: error: 'initializer_list' in namespace 'std' does not name a template type; did you mean 'uninitialized_fill'?

Tested on x86_64-linux, committed to trunk as obvious:

2021-07-29  Jakub Jelinek  

PR testsuite/101647
* g++.dg/pr71389.C: Include  instead of .
* g++.dg/cpp0x/initlist48.C: Include also .

--- gcc/testsuite/g++.dg/cpp0x/initlist48.C
+++ gcc/testsuite/g++.dg/cpp0x/initlist48.C
@@ -2,6 +2,7 @@
 // { dg-do compile { target c++11 } }
 
 #include 
+#include 
 
 struct Foo{
 int i;
--- gcc/testsuite/g++.dg/pr71389.C
+++ gcc/testsuite/g++.dg/pr71389.C
@@ -1,7 +1,7 @@
 // { dg-do compile { target i?86-*-* x86_64-*-* } }
 // { dg-options "-std=c++11 -O3 -march=ivybridge" }
 
-#include 
+#include 
 
 extern int le_s6, le_s9, le_s11;
 long foo_v14[16][16];


Jakub

Re: [committed] amdgcn: Fix attributes for LLVM-12 [PR 100208]

2021-07-29 Thread Andrew Stubbs


On 29/07/2021 08:34, Richard Biener wrote:

On Wed, Jul 28, 2021 at 3:04 PM Andrew Stubbs  wrote:


This patch follows up my previous patch and supports more variants of
LLVM 12.

There are still other incompatibilities with LLVM 12, but at least the
ELF attributes should now automatically tune to any LLVM 9, 10, or 12
assembler. (It would be nice if one set of options would just work
everywhere, but no.)

LLVM 11 was not tested, but is broken in other ways in any case. LLVM 13
(dev) needs more work.

Unfortunately, the need for configure tests and the CLI instability
within the LLVM 12 release branch means that GCC probably needs to be
rebuilt whenever LLVM is upgraded, even for minor versions.


Is it possible to handle some incompatibilities with command line arguments
to llvm-mc in a wrapper script that could dispatch based on the
installed llvm-mc
version?  Or maybe some specs magic that passes down -mllvm-mc-version=XYZ
from a %{llvm-mc-version} specs handler that somehow queries the installed
llvm-mc?  Or is the llvm-mc version not enough to decide things?  I realize this
still needs adjustments for each new llvm-mc version that pops up.


Hopefully things will stabilize and all this will become overkill, but 
if not then we may have to consider some of those ideas.


Andrew

[OG11, committed] amdgcn: Fix attributes for LLVM-12 [PR 100208]

2021-07-29 Thread Andrew Stubbs


Now backported to devel/omp/gcc-11.

Andrew

On 28/07/2021 14:03, Andrew Stubbs wrote:
This patch follows up my previous patch and supports more variants of 
LLVM 12.


There are still other incompatibilities with LLVM 12, but at least the
ELF attributes should now automatically tune to any LLVM 9, 10, or 12
assembler. (It would be nice if one set of options would just work
everywhere, but no.)


LLVM 11 was not tested, but is broken in other ways in any case. LLVM 13 
(dev) needs more work.


Unfortunately, the need for configure tests and the CLI instability 
within the LLVM 12 release branch means that GCC probably needs to be 
rebuilt whenever LLVM is upgraded, even for minor versions.


Andrew

[Patch] testsuite/lib/gfortran.exp: Add -I for ISO.h [PR101305, PR101660] (was: Re: [Patch] gfortran.dg/dg.exp: Add libgfortran as -I flag for ISO.h [PR101305] (was: [PATCH 3/3] [PR libfortran/10130

2021-07-29 Thread Tobias Burnus


On 29.07.21 09:09, Jakub Jelinek wrote:

On Thu, Jul 29, 2021 at 12:56:32AM +0200, Jakub Jelinek wrote:

On Wed, Jul 28, 2021 at 01:22:53PM +0200, Tobias Burnus wrote:

gfortran.dg/dg.exp: Add libgfortran as -I flag for ISO*.h [PR101305]

Wouldn't it be better to do that in gcc/testsuite/lib/gfortran.exp
to GFORTRAN_UNDER_TEST there next to
-B$specpath/libgfortran/ ?


I guess so – and that's what I did. However, I had to ensure that it
gets reset otherwise it picks up the wrong header in multilib runs; this
also affects the -B$specpath/libgfortran bit, but I think that makes sense.


Though, I guess we need that mostly for the C FE, so perhaps it needs to go
at the start of additional_flags=, whether TEST_ALWAYS_FLAGS is empty or
not.


For the main testsuite (gcc/testsuite/*fortran*/), I believe the patch
above is sufficient as everything runs through GFORTRAN_UNDER_TEST.

I am also inclined not to add flags to TEST_ALWAYS_FLAGS which then
might get applied to other/pure C/C++ tests.

Regarding libgomp: that one uses xgcc for the compilation. I don't
really see a need to use the Fortran array descriptor from a C program
in libgomp's testsuite. Thus, I am inclined to ignore libgomp.
Otherwise, as libgomp does not gfortran_init and handles libraries
separately, I think the code needs to be put into
libgomp.*fortran/fortran.exp.

Thoughts? Okay?

Tobias

testsuite/lib/gfortran.exp: Add -I for ISO*.h [PR101305, PR101660]

This patch adds -I$specdir/libgfortran to GFORTRAN_UNDER_TEST, when
set by proc gfortran_init. As the $specdir depends on the multilib
setting, it has to be re-set for a different multilib; hence, we track
whether a previous call to gfortran_init set that var or whether it
was set differently.

gcc/testsuite/
	PR libfortran/101305
	PR fortran/101660

	* lib/gfortran.exp (gfortran_init): Add -I $specdir/libgfortran to
	GFORTRAN_UNDER_TEST; update it when set by previous gfortran_init call.
	* gfortran.dg/ISO_Fortran_binding_1.c: Use <...> not "..." for
	ISO_Fortran_binding.h's #include.
	* gfortran.dg/ISO_Fortran_binding_10.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_11.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_12.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_15.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_16.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_17.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_18.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_3.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_5.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_6.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_7.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_8.c: Likewise.
	* gfortran.dg/ISO_Fortran_binding_9.c: Likewise.
	* gfortran.dg/PR94327.c: Likewise.
	* gfortran.dg/PR94331.c: Likewise.
	* gfortran.dg/bind_c_array_params_3_aux.c: Likewise.
	* gfortran.dg/iso_fortran_binding_uint8_array_driver.c: Likewise.
	* gfortran.dg/pr93524.c: Likewise.

diff --git a/gcc/testsuite/lib/gfortran.exp b/gcc/testsuite/lib/gfortran.exp
index 1e7da1110bc..cae6738b4b8 100644
--- a/gcc/testsuite/lib/gfortran.exp
+++ b/gcc/testsuite/lib/gfortran.exp
@@ -151,6 +151,7 @@ proc gfortran_init { args } {
 global gcc_warning_prefix
 global gcc_error_prefix
 global TEST_ALWAYS_FLAGS
+global gfortran_init_set_GFORTRAN_UNDER_TEST
 
 # We set LC_ALL and LANG to C so that we get the same error messages as expected.
 setenv LC_ALL C
@@ -166,7 +167,11 @@ proc gfortran_init { args } {
   setenv LANG C.ASCII
 }
 
-if ![info exists GFORTRAN_UNDER_TEST] then {
+# GFORTRAN_UNDER_TEST as set below contains $specpath, which depends on
+# the used multilib config. Thus, its value may need to be reset;
+# that's tracked via gfortran_init_set_GFORTRAN_UNDER_TEST.
+if { ![info exists GFORTRAN_UNDER_TEST]
+	 || [info exists gfortran_init_set_GFORTRAN_UNDER_TEST] } then {
 	if [info exists TOOL_EXECUTABLE] {
 	set GFORTRAN_UNDER_TEST $TOOL_EXECUTABLE
 	} else {
@@ -178,7 +183,8 @@ proc gfortran_init { args } {
 		} else {
 		set specpath [get_multilibs]
 		}
-		set GFORTRAN_UNDER_TEST [findfile $base_dir/../../gfortran "$base_dir/../../gfortran -B$base_dir/../../ -B$specpath/libgfortran/" [findfile $base_dir/gfortran "$base_dir/gfortran -B$base_dir/" [transform gfortran]]]
+		set gfortran_init_set_GFORTRAN_UNDER_TEST 1
+		set GFORTRAN_UNDER_TEST [findfile $base_dir/../../gfortran "$base_dir/../../gfortran -B$base_dir/../../ -B$specpath/libgfortran/ -I$specpath/libgfortran" [findfile $base_dir/gfortran "$base_dir/gfortran -B$base_dir/" [transform gfortran]]]
 	}
 	}
 }
diff --git a/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c b/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c
---

[PATCH v2] c++: Accept C++11 attribute-definition [PR101582]

2021-07-29 Thread Jakub Jelinek via Gcc-patches

On Wed, Jul 28, 2021 at 04:32:08PM -0400, Jason Merrill wrote:
> > As the following testcase shows, we don't parse properly
> > C++11 attribute-declaration:
> > https://eel.is/c++draft/dcl.dcl#nt:attribute-declaration
> > 
> > cp_parser_toplevel_declaration just handles empty-declaration parsing
> > (with diagnostics for C++98)
> 
> This seems to be a bug: from the comments, cp_parser_toplevel_declaration is
> intended to only handle #pragma parsing, everything else should be in
> cp_parser_declaration.
> 
> As a result, we wrongly reject
> 
> extern "C" ;
> 
> So please move empty-declaration and attribute-declaration handling into
> cp_parser_declaration.

So like this?
It means we allow for modules
export ;
or
export [[]];
where we previously rejected those, which is allowed by the grammar and
invalid because of
https://eel.is/c++draft/module.interface#3
but we allowed already before
export {}
which suffers from the same problem - the export-declaration doesn't declare
at least one name.  So I think there just should be something that tracks if
the module exported at least one name and if not, diagnose it at the end of
cp_parser_module_export (and adjust the new modules testcase).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-07-29  Jakub Jelinek  

PR c++/101582
* parser.c (cp_parser_skip_std_attribute_spec_seq): Add a forward
declaration.
(cp_parser_declaration): Parse empty-declaration and
attribute-declaration.
(cp_parser_toplevel_declaration): Don't parse empty-declaration here.

* g++.dg/cpp0x/gen-attrs-45.C: Expect a warning about ignored
attributes instead of error.
* g++.dg/cpp0x/gen-attrs-75.C: New test.
* g++.dg/modules/pr101582-1.C: New test.

--- gcc/cp/parser.c.jj  2021-07-28 23:06:38.658443554 +0200
+++ gcc/cp/parser.c 2021-07-28 23:12:10.955941089 +0200
@@ -2507,6 +2507,8 @@ static tree cp_parser_std_attribute_spec
   (cp_parser *);
 static tree cp_parser_std_attribute_spec_seq
   (cp_parser *);
+static size_t cp_parser_skip_std_attribute_spec_seq
+  (cp_parser *, size_t);
 static size_t cp_parser_skip_attributes_opt
   (cp_parser *, size_t);
 static bool cp_parser_extension_opt
@@ -14410,6 +14412,31 @@ cp_parser_declaration (cp_parser* parser
   cp_token *token2 = (token1->type == CPP_EOF
  ? token1 : cp_lexer_peek_nth_token (parser->lexer, 2));
 
+  if (token1->type == CPP_SEMICOLON)
+{
+  cp_lexer_consume_token (parser->lexer);
+  /* A declaration consisting of a single semicolon is invalid
+   * before C++11.  Allow it unless we're being pedantic.  */
+  if (cxx_dialect < cxx11)
+   pedwarn (input_location, OPT_Wpedantic, "extra %<;%>");
+  return;
+}
+  else if (cp_lexer_nth_token_is (parser->lexer,
+ cp_parser_skip_std_attribute_spec_seq (parser,
+1),
+ CPP_SEMICOLON))
+{
+  location_t attrs_loc = token1->location;
+  tree std_attrs = cp_parser_std_attribute_spec_seq (parser);
+  if (std_attrs != NULL_TREE)
+   warning_at (make_location (attrs_loc, attrs_loc, parser->lexer),
+   OPT_Wattributes,
+   "attributes in attribute declaration are ignored");
+  if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+   cp_lexer_consume_token (parser->lexer);
+  return;
+}
+
   /* Get the high-water mark for the DECLARATOR_OBSTACK.  */
   void *p = obstack_alloc (_obstack, 0);
 
@@ -14560,14 +14587,6 @@ cp_parser_toplevel_declaration (cp_parse
cp_parser_declaration.  (A #pragma at block scope is
handled in cp_parser_statement.)  */
 cp_parser_pragma (parser, pragma_external, NULL);
-  else if (token->type == CPP_SEMICOLON)
-{
-  cp_lexer_consume_token (parser->lexer);
-  /* A declaration consisting of a single semicolon is invalid
-   * before C++11.  Allow it unless we're being pedantic.  */
-  if (cxx_dialect < cxx11)
-   pedwarn (input_location, OPT_Wpedantic, "extra %<;%>");
-}
   else
 /* Parse the declaration itself.  */
 cp_parser_declaration (parser, NULL_TREE);
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-45.C.jj  2021-07-26 09:13:08.504121494 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-45.C   2021-07-28 23:07:05.095085351 +0200
@@ -1,4 +1,4 @@
 // PR c++/52906
 // { dg-do compile { target c++11 } }
 
-[[gnu::deprecated]]; // { dg-error "does not declare anything" }
+[[gnu::deprecated]]; // { dg-warning "attributes in attribute declaration are ignored" }
--- gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C.jj  2021-07-28 23:07:05.095085351 +0200
+++ gcc/testsuite/g++.dg/cpp0x/gen-attrs-75.C   2021-07-29 10:59:09.630326797 +0200
@@ -0,0 +1,35 @@
+// PR c++/101582
+// { dg-do compile }
+// { dg-options "" }
+
+;
+[[]] [[]] [[]];// { dg-warning

RE: [ARM] PR66791: Replace builtins in vld1

2021-07-29 Thread Kyrylo Tkachov via Gcc-patches

Hi Prathamesh,

> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 26 July 2021 22:24
> To: gcc Patches ; Kyrylo Tkachov
> ; Richard Earnshaw
> 
> Subject: [ARM] PR66791: Replace builtins in vld1
> 
> Hi,
> Similar to aarch64, this patch replaces call to builtin by
> dereferencing __a in vld1_p64, vld1_s64 and vld1_u64.
> 
> The patch changes code-gen for the intrinsic as follows:
> Before patch:
> vld1.64 {d16}, [r0:64]
> vmov    r0, r1, d16 @ int
> bx      lr
> 
> After patch:
> ldrd    r0, [r0]
> bx      lr
> 
> I assume the code-gen after patch is correct, since it loads two
> consecutive words from [r0] into r0 and r1 ?

Yes, this looks correct.

> 
> Bootstrapped+tested on arm-linux-gnueabihf.
> OK to commit ?

Ok. Can we now remove the vld1 builtin definition?
Thanks,
Kyrill

> 
> Thanks,
> Prathamesh

[PATCH] aarch64: Don't include vec_select high-half in SIMD subtract cost

2021-07-29 Thread Jonathan Wright via Gcc-patches

Hi,

The Neon subtract-long/subtract-widen instructions can select the top
or bottom half of the operand registers. This selection does not
change the cost of the underlying instruction and this should be
reflected by the RTL cost function.

This patch adds RTL tree traversal in the Neon subtract cost function
to match vec_select high-half of its operands. This traversal
prevents the cost of the vec_select from being added into the cost of
the subtract - meaning that these instructions can now be emitted in
the combine pass as they are no longer deemed prohibitively
expensive.
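
To illustrate the idea only, here is a toy sketch of a cost walk that skips a high-half select under a subtract. It is not the code in aarch64.c: the node kinds, the cost numbers and the function names (node_cost, naive_cost, operand_cost, minus_cost) are all assumptions made up for this example.

```cpp
#include <cassert>

// Hypothetical node kinds; these are not GCC's RTL codes.
enum class Op { Reg, VecSelectHigh, Extend, Minus };

struct Node
{
  Op op;
  const Node *lhs;
  const Node *rhs;
};

// Made-up per-node costs, chosen only to make the effect visible.
int node_cost (Op op)
{
  switch (op)
    {
    case Op::Reg: return 0;
    case Op::VecSelectHigh: return 4;
    case Op::Extend: return 1;
    case Op::Minus: return 4;
    }
  return 0;
}

// Naive cost: every node in the tree is charged.
int naive_cost (const Node *n)
{
  if (n == nullptr)
    return 0;
  return node_cost (n->op) + naive_cost (n->lhs) + naive_cost (n->rhs);
}

// Cost of one operand of the subtract.  For extend (vec_select_high (x))
// the vec_select is not charged and costing continues at x.
int operand_cost (const Node *n)
{
  if (n != nullptr && n->op == Op::Extend
      && n->lhs != nullptr && n->lhs->op == Op::VecSelectHigh)
    return node_cost (Op::Extend) + naive_cost (n->lhs->lhs);
  return naive_cost (n);
}

// Adjusted cost of a subtract whose operands may select the high half.
int minus_cost (const Node *minus)
{
  assert (minus != nullptr && minus->op == Op::Minus);
  return node_cost (Op::Minus)
	 + operand_cost (minus->lhs) + operand_cost (minus->rhs);
}
```

With these made-up numbers, a subtract of two extended high halves costs less under minus_cost than under naive_cost, which is the kind of difference that decides whether a combination is considered profitable.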

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-28  Jonathan Wright  

* config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
of vec_select high-half from being added into Neon subtract
cost.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vsubX_high_cost.c: New test.


rb14711.patch
Description: rb14711.patch

[PATCH] aarch64: Don't include vec_select high-half in SIMD add cost

2021-07-29 Thread Jonathan Wright via Gcc-patches

Hi,

The Neon add-long/add-widen instructions can select the top or bottom
half of the operand registers. This selection does not change the
cost of the underlying instruction and this should be reflected by
the RTL cost function.

This patch adds RTL tree traversal in the Neon add cost function to
match vec_select high-half of its operands. This traversal prevents
the cost of the vec_select from being added into the cost of the
add - meaning that these instructions can now be emitted in the
combine pass as they are no longer deemed prohibitively expensive.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-28  Jonathan Wright  

* config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
of vec_select high-half from being added into Neon add cost.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vaddX_high_cost.c: New test.


rb14710.patch
Description: rb14710.patch

Re: [PATCH 0/13] v2 warning control by group and location (PR 74765)

2021-07-29 Thread Andrew Burgess

* Martin Sebor  [2021-07-28 10:16:59 -0600]:

> On 7/28/21 5:14 AM, Andrew Burgess wrote:
> > * Martin Sebor via Gcc-patches  [2021-07-19 
> > 09:08:35 -0600]:
> > 
> > > On 7/17/21 2:36 PM, Jan-Benedict Glaw wrote:
> > > > Hi Martin!
> > > > 
> > > > On Fri, 2021-06-04 15:27:04 -0600, Martin Sebor  
> > > > wrote:
> > > > > This is a revised patch series to add warning control by group and
> > > > > location, updated based on feedback on the initial series.
> > > > [...]
> > > > 
> > > > My automated checking (in this case: Using Debian's "gcc-snapshot"
> > > > package) indicates that between versions 1:20210527-1 and
> > > > 1:20210630-1, building GDB breaks. Your patch is a likely candidate.
> > > > It's a case where a method asks for a nonnull argument and later on
> > > > checks for NULLness again. The build log is currently available at
> > > > (http://wolf.lug-owl.de:8080/jobs/gdb-vax-linux/5), though obviously
> > > > breaks for any target:
> > > > 
> > > > configure --target=vax-linux --prefix=/tmp/gdb-vax-linux
> > > > make all-gdb
> > > > 
> > > > [...]
> > > > [all 2021-07-16 19:19:25]   CXXcompile/compile.o
> > > > [all 2021-07-16 19:19:30] In file included from 
> > > > ./../gdbsupport/common-defs.h:126,
> > > > [all 2021-07-16 19:19:30]  from ./defs.h:28,
> > > > [all 2021-07-16 19:19:30]  from compile/compile.c:20:
> > > > [all 2021-07-16 19:19:30] ./../gdbsupport/gdb_unlinker.h: In 
> > > > constructor 'gdb::unlinker::unlinker(const char*)':
> > > > [all 2021-07-16 19:19:30] ./../gdbsupport/gdb_assert.h:35:4: error: 
> > > > 'nonnull' argument 'filename' compared to NULL [-Werror=nonnull-compare]
> > > > [all 2021-07-16 19:19:30]35 |   ((void) ((expr) ? 0 :   
> > > > \
> > > > [all 2021-07-16 19:19:30]   |   
> > > > ~^~~~
> > > > [all 2021-07-16 19:19:30]36 |(gdb_assert_fail (#expr, 
> > > > __FILE__, __LINE__, FUNCTION_NAME), 0)))
> > > > [all 2021-07-16 19:19:30]   |
> > > > ~
> > > > [all 2021-07-16 19:19:30] ./../gdbsupport/gdb_unlinker.h:38:5: note: in 
> > > > expansion of macro 'gdb_assert'
> > > > [all 2021-07-16 19:19:30]38 | gdb_assert (filename != NULL);
> > > > [all 2021-07-16 19:19:30]   | ^~
> > > > [all 2021-07-16 19:19:31] cc1plus: all warnings being treated as errors
> > > > [all 2021-07-16 19:19:31] make[1]: *** [Makefile:1641: 
> > > > compile/compile.o] Error 1
> > > > [all 2021-07-16 19:19:31] make[1]: Leaving directory 
> > > > '/var/lib/laminar/run/gdb-vax-linux/5/binutils-gdb/gdb'
> > > > [all 2021-07-16 19:19:31] make: *** [Makefile:11410: all-gdb] Error 2
> > > > 
> > > > 
> > > > Code is this:
> > > > 
> > > >31 class unlinker
> > > >32 {
> > > >33  public:
> > > >34
> > > >35   unlinker (const char *filename) ATTRIBUTE_NONNULL (2)
> > > >36 : m_filename (filename)
> > > >37   {
> > > >38 gdb_assert (filename != NULL);
> > > >39   }
> > > > 
> > > > I'm quite undecided whether this is bad behavior of GCC or bad coding
> > > > style in Binutils/GDB, or both.
> > > 
> > > A warning should be expected in this case.  Before the recent GCC
> > > change it was inadvertently suppressed in gdb_assert macros by its
> > > operand being enclosed in parentheses.
> > 
> > This issue was just posted to the GDB list, and I wanted to clarify my
> > understanding a bit.
> > 
> > I believe that (at least by default) adding the nonnull attribute
> > allows GCC to assume (in the above case) that filename will not be
> > NULL and generate code accordingly.
> > 
> > Additionally, passing an explicit NULL (i.e. 'unlinker obj (NULL)')
> > would cause a compile time error.
> > 
> > But, there's nothing to actually stop a NULL being passed due to, say,
> > a logic bug in the program.  So, something like this would compile
> > fine:
> > 
> >extern const char *ptr;
> >unlinker obj (ptr);
> > 
> > And in a separate compilation unit:
> > 
> >const char *ptr = NULL;
> > 
> > Obviously, the run time behaviour of such a program would be
> > undefined.
> > 
> > Given the above then, it doesn't seem crazy to want to do something
> > like the above, that is, add an assert to catch a logic bug in the
> > program.
> > 
> > Is there an approved mechanism through which I can tell GCC that I
> > really do want to do a comparison to NULL, without any warning, and
> > without the check being optimised out?
>

Thanks for your feedback.

> The manual says -fno-delete-null-pointer-checks is supposed to
> prevent the removal of the null function argument test so I'd
> expect adding attribute optimize ("no-delete-null-pointer-checks")
> to the definition of the function to have that effect but in my
> testing it didn't work (and didn't give a warning for the two
>
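
For what it's worth, one pattern that avoids comparing a nonnull parameter to NULL is to do the check in an unannotated wrapper. This is only a sketch and not an approved GCC mechanism; checked_length and try_length are hypothetical names, and it says nothing about whether the optimize attribute mentioned above works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Annotated callee: the compiler may assume `s` is non-null in the body,
// so no null comparison is written here.
__attribute__ ((nonnull (1)))
std::size_t checked_length (const char *s)
{
  return std::strlen (s);
}

// Unannotated wrapper (hypothetical name): the null test lives in a
// function where no nonnull assumption applies to `s`.
bool try_length (const char *s, std::size_t *out)
{
  assert (out != nullptr);
  if (s == nullptr)
    return false;
  *out = checked_length (s);
  return true;
}
```

The trade-off is that callers have to go through the wrapper; a direct call to the annotated function with a null pointer obtained at run time is still undefined.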

Re: [PATCH] Make loops_list support an optional loop_p root

2021-07-29 Thread Richard Biener via Gcc-patches

On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin  wrote:
>
> on 2021/7/22 8:56 PM, Richard Biener wrote:
> > On Tue, Jul 20, 2021 at 4:37 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> This v2 has addressed some review comments/suggestions:
> >>
> >>   - Use "!=" instead of "<" in function operator!= (const Iter )
> >>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
> >> to support loop hierarchy tree rather than just a function,
> >> and adjust to use loops* accordingly.
> >
> > I actually meant struct loop *, not struct loops * ;)  At the point
> > we pondered to make loop invariant motion work on single
> > loop nests we gave up not only but also because it iterates
> > over the loop nest but all the iterators only ever can process
> > all loops, not say, all loops inside a specific 'loop' (and
> > including that 'loop' if LI_INCLUDE_ROOT).  So the
> > CTOR would take the 'root' of the loop tree as argument.
> >
> > I see that doesn't trivially fit how loops_list works, at least
> > not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> > could be adjusted to do ONLY_INNERMOST as well?
> >
>
>
> Thanks for the clarification!  I just realized that the previous
> version with struct loops* is problematic, all traversal is
> still bounded with outer_loop == NULL.  I think what you expect
> is to respect the given loop_p root boundary.  Since we just
> record the loops' nums, I think we still need the function* fn?

Would it simplify things if we recorded the actual loop *?

There's still the to_visit reserve which needs a bound on
the number of loops for efficiency reasons.

> So I add one optional argument loop_p root and update the
> visiting codes accordingly.  Before this change, the previous
> visiting uses the outer_loop == NULL as the termination condition,
> it perfectly includes the root itself, but with this given root,
> we have to use it as the termination condition to avoid to iterate
> onto its possible existing next.
>
> For LI_ONLY_INNERMOST, I was thinking whether we can use the
> code like:
>
> struct loops *fn_loops = loops_for_fn (fn)->larray;
> for (i = 0; vec_safe_iterate (fn_loops, i, ); i++)
> if (aloop != NULL
> && aloop->inner == NULL
> && flow_loop_nested_p (tree_root, aloop))
>  this->to_visit.quick_push (aloop->num);
>
> it has the stable bound, but if the given root only has several
> child loops, it can be much worse if there are many loops in fn.
> It seems impossible to predict the given root loop hierarchy size,
> maybe we can still use the original linear searching for the case
> loops_for_fn (fn) == root?  But since this visiting seems not so
> performance critical, I chose to share the code originally used
> for FROM_INNERMOST, hope it can have better readability and
> maintainability.

I was indeed looking for something that has execution/storage
bound on the subtree we're interested in.  If we pull the CTOR
out-of-line we can probably keep the linear search for
LI_ONLY_INNERMOST when looking at the whole loop tree.

It just seemed to me that we can eventually re-use a
single loop tree walker for all orders, just adjusting the
places we push.
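
As a rough sketch of that idea (not the cfgloop.h code), a single walker over a subtree could look like the following. The Loop struct, the WALK_* flags and loops_under are hypothetical stand-ins for struct loop, the LI_* flags and the loops_list constructor; the point is only that the root's `next` link is never followed and that the order depends on where the push happens.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GCC's struct loop; only the links used here.
struct Loop
{
  explicit Loop (int n) : num (n), inner (nullptr), next (nullptr) {}
  int num;
  Loop *inner;  // first child loop
  Loop *next;   // next sibling loop
};

// Hypothetical flags modelled on the LI_* names used in the thread.
enum WalkFlags : unsigned
{
  WALK_INCLUDE_ROOT = 1u << 0,
  WALK_FROM_INNERMOST = 1u << 1,
  WALK_ONLY_INNERMOST = 1u << 2
};

// One walker for all orders: only the place of the push changes.
void walk (const Loop *loop, bool is_root, unsigned flags,
	   std::vector<int> &out)
{
  const bool innermost = loop->inner == nullptr;
  const bool wanted = (!is_root || (flags & WALK_INCLUDE_ROOT) != 0)
		      && ((flags & WALK_ONLY_INNERMOST) == 0 || innermost);
  const bool push_after
    = (flags & (WALK_FROM_INNERMOST | WALK_ONLY_INNERMOST)) != 0;

  if (wanted && !push_after)
    out.push_back (loop->num);
  // Siblings of children are visited, siblings of the root are not.
  for (const Loop *child = loop->inner; child != nullptr; child = child->next)
    walk (child, false, flags, out);
  if (wanted && push_after)
    out.push_back (loop->num);
}

// Collect the loop numbers in the subtree rooted at `root`.
std::vector<int> loops_under (const Loop *root, unsigned flags)
{
  assert (root != nullptr);
  std::vector<int> out;
  walk (root, true, flags, out);
  return out;
}
```

Both time and storage here are bounded by the size of the subtree, at the price of recursion; the real iterator also has to reserve to_visit up front, which this sketch does not model.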

>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped on ppc64le P9 with bootstrap-O3 config.
>
> Does the attached patch meet what you expect?

So yeah, it's probably close to what is sensible.  Not sure
whether optimizing the loops for the !only_push_innermost_p
case is important - if we manage to produce a single
walker with conditionals based on 'flags' then IPA-CP should
produce optimal clones as well I guess.

Richard.

>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * cfgloop.h (loops_list::loops_list): Add one optional argument root
> and adjust accordingly.

[PATCH] c++: Implement P0466R5 __cpp_lib_is_pointer_interconvertible compiler helpers [PR101539]

2021-07-29 Thread Jakub Jelinek via Gcc-patches

Hi!

The following patch attempts to implement the compiler helpers for
libstdc++ std::is_pointer_interconvertible_base_of trait and
std::is_pointer_interconvertible_with_class template function.

For the former, the __is_pointer_interconvertible_base_of trait first checks
whether base and derived are non-union class types that are the same
ignoring toplevel cv-qualifiers; otherwise it checks whether derived is
unambiguously derived from base ignoring cv-qualifiers, derived being a
complete type.  If so, my limited understanding of any derived object being
pointer-interconvertible with its base subobject IMHO implies (because one
can't inherit from unions and unions can't inherit) that we check whether
derived is a standard-layout type and walk the bases of derived
recursively, stopping at a class that has any non-static data members, and
check whether any of the bases is base.  Once a class with non-static data
members is reached, its bases are not compared any further.

The latter is implemented using a FE
__builtin_is_pointer_interconvertible_with_class, but because on the library
side it will be a template function, the builtin takes ... arguments and
only during folding verifies it has a single argument with pointer to member
type.  The initial errors IMHO can only happen if one uses the builtin
incorrectly by hand, the template function should ensure that it has
exactly a single argument that has pointer to member type.
Otherwise, again with my limited understanding of what
the template function should do and pointer-interconvertibility,
it folds to false for pointer-to-member-function, errors if
basetype of the OFFSET_TYPE is incomplete, folds to false
for non-std-layout basetype, then finds the first non-static
data member in the basetype or its bases (by ignoring
DECL_FIELD_IS_BASE FIELD_DECLs that are empty, recursing into
DECL_FIELD_IS_BASE FIELD_DECLs type that are non-empty (I think
std layout should ensure there is at most one), for unions
checks if membertype is same type as any of the union FIELD_DECLs,
for non-unions the first other FIELD_DECL only, and for anonymous
aggregates similarly (union vs. non-union) but recurses into the
anon aggr types.  If membertype doesn't match the type of
first non-static data member (or for unions any of the members),
then the builtin folds to false, otherwise the builtin folds to
a check whether the argument is equal to OFFSET_TYPE of 0 or not,
either at compile time if it is constant (e.g. for constexpr
folding) or at runtime otherwise.
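
To make the standard-layout precondition concrete, here is a small sketch using only traits that already exist. It does not use the new builtins; the struct names follow the standard's example, and first_member_at_offset_zero is a hypothetical helper. The guarded part compiles only if the library defines __cpp_lib_is_pointer_interconvertible.

```cpp
#include <cstddef>
#include <type_traits>

struct A { int a; };        // a standard-layout class
struct B { int b; };        // a standard-layout class
struct C : A, B { };        // not standard-layout: members in two bases

static_assert (std::is_standard_layout<A>::value, "A is standard-layout");
static_assert (!std::is_standard_layout<C>::value,
	       "C is not standard-layout");

// For a standard-layout class the first non-static data member lives at
// offset 0, which is what a pointer-to-member "offset of 0" check
// amounts to.
constexpr bool first_member_at_offset_zero ()
{
  return offsetof (A, a) == 0;
}

#ifdef __cpp_lib_is_pointer_interconvertible
// Only checked when the library provides the C++20 traits.
static_assert (std::is_pointer_interconvertible_base_of_v<A, A>, "");
static_assert (!std::is_pointer_interconvertible_base_of_v<A, C>, "");
static_assert (std::is_pointer_interconvertible_with_class (&A::a), "");
#endif
```

The unguarded assertions hold on any C++11 compiler; the guarded ones only restate cases where the standard wording leaves little room for disagreement.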

As I wrote in the PR, I've tried my testcases with MSVC on godbolt
that claims to implement it, and https://godbolt.org/z/3PnjM33vM
for the first testcase shows it disagrees with my expectations on
static_assert (std::is_pointer_interconvertible_base_of_v);
static_assert (std::is_pointer_interconvertible_base_of_v);
static_assert (!std::is_pointer_interconvertible_base_of_v);
static_assert (!std::is_pointer_interconvertible_base_of_v);
static_assert (std::is_pointer_interconvertible_base_of_v);
Is that a bug in my patch or is MSVC buggy on these (or mix thereof)?
https://godbolt.org/z/aYeYnne9d
shows the second testcase, here it differs on:
static_assert (std::is_pointer_interconvertible_with_class (::b));
static_assert (std::is_pointer_interconvertible_with_class (::g));
static_assert (std::is_pointer_interconvertible_with_class (::b));
static_assert (std::is_pointer_interconvertible_with_class (::a));
static_assert (std::is_pointer_interconvertible_with_class (::b));
Again, my bug, MSVC bug, mix thereof?

Oh, and there is another thing, the standard has an example:
struct A { int a; };                 // a standard-layout class
struct B { int b; };                 // a standard-layout class
struct C: public A, public B { };    // not a standard-layout class

static_assert( is_pointer_interconvertible_with_class( &C::b ) );
  // Succeeds because, despite its appearance, &C::b has type
  // “pointer to member of B of type int”.
static_assert( is_pointer_interconvertible_with_class<C>( &C::b ) );
  // Forces the use of class C, and fails.
It seems to work as written with MSVC (second assertion fails),
but fails with GCC with the patch:
/tmp/1.C:22:57: error: no matching function for call to 
‘is_pointer_interconvertible_with_class(int B::*)’
   22 | static_assert( is_pointer_interconvertible_with_class( ::b ) );
  |~^
/tmp/1.C:8:1: note: candidate: ‘template constexpr bool 
std::is_pointer_interconvertible_with_class(M S::*)’
8 | is_pointer_interconvertible_with_class (M S::*m) noexcept
  | ^~
/tmp/1.C:8:1: note:   template argument deduction/substitution failed:
/tmp/1.C:22:57: note:   mismatched types ‘C’ and ‘B’
   22 | static_assert( is_pointer_interconvertible_with_class( ::b ) );
  |~^
the second int argument isn't deduced.
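
The type claim in the standard's comment can be checked without the new trait.  A minimal sketch, assuming the same A, B and C as in the example: since b is declared in B, the address of b taken through C has type int B::*, so deduction against M S::* picks S = B unless S is given explicitly.  The helper deduces_class_b is hypothetical and only mirrors the shape of the library function.

```cpp
#include <cassert>
#include <type_traits>

struct A { int a; };
struct B { int b; };
struct C : public A, public B { };

// &C::b names a member declared in B, so its type is int B::*.
static_assert (std::is_same_v<decltype (&C::b), int B::*>);

// Hypothetical helper mirroring the shape of the library function:
// reports whether the class deduced from a pointer to member is B.
template <class S, class M>
constexpr bool deduces_class_b (M S::*) { return std::is_same_v<S, B>; }

// Without an explicit template argument, S is deduced as B, not C.
constexpr bool deduced_from_c_b = deduces_class_b (&C::b);
```

The explicit-argument form is deliberately left out of the sketch, since that is exactly the call whose behaviour is in question here.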

This boils down to:
template 
bool foo (M S::*m) noexcept;
struct A {

[OpenACC] Extract 'pass_oacc_loop_designation' out of 'pass_oacc_device_lower' (was: [PATCH 1/4] openacc: Middle-end worker-partitioning support)

2021-07-29 Thread Thomas Schwinge

Hi Julian!

On 2021-03-02T04:20:11-0800, Julian Brown  wrote:
> This patch implements worker-partitioning support in the middle end,
> [...]

I've first separately pushed the mostly "mechanical changes" re
"[OpenACC] Extract 'pass_oacc_loop_designation' out of
'pass_oacc_device_lower'" to master branch in commit
0829ab79d37be6c59072af0c4f54043f7e9d23ea, see attached.

A few comments there:

> --- a/gcc/omp-offload.c
> +++ b/gcc/omp-offload.c

> @@ -1367,6 +1368,8 @@ oacc_loop_xform_head_tail (gcall *from, int level)
>else if (gimple_call_internal_p (stmt, IFN_GOACC_REDUCTION))
>   *gimple_call_arg_ptr (stmt, 3) = replacement;
>
> +  update_stmt (stmt);
> +
>gsi_next (&gsi);
>while (gsi_end_p (gsi))
>   gsi = gsi_start_bb (single_succ (gsi_bb (gsi)));
> @@ -1391,25 +1394,28 @@ oacc_loop_process (oacc_loop *loop)
> [...]
> +   update_stmt (call);

Sneaky.  ACK.

>  /* Main entry point for oacc transformations which run on the device
> compiler after LTO, so we know what the target device is at this
> point (including the host fallback).  */
>
>  static unsigned int
> -execute_oacc_device_lower ()
> +execute_oacc_loop_designation ()

This does not just OpenACC loop designation but also includes the general
OpenACC offloaded function classification (diagnostics) as well as
OpenACC 'nohost' clause handling for OpenACC 'routine', meaning that the
"loop designation" name is not totally accurate.  But I couldn't easily
come up with anything more accurate (or an easy way to split out these
things), so I left it at that.

(Also, for later, I wonder if not all the 'oacc_loop' stuff could/should
move into its own new file 'gcc/omp-oacc-loop.cc'.  Also, the tag
'oacc_loop' isn't totally accurate either, for this also deals with
OpenACC 'routine' level of parallelism -- maybe 'oacc_lop' instead of
'oacc_loop' etc.)

> @@ -2051,10 +2072,36 @@ execute_oacc_device_lower ()
>   free_oacc_loop (l);
>  }
>
> +  free_oacc_loop (loops);
> +
>/* Offloaded targets may introduce new basic blocks, which require
>   dominance information to update SSA.  */
>calculate_dominance_info (CDI_DOMINATORS);
>
> +  return 0;
> +}

I do confirm the manual 'calculate_dominance_info (CDI_DOMINATORS)'
necessary in the original state (where this is in the middle of the two
"passes"), but given 'TODO_cleanup_cfg' as part of 'todo_flags_finish'
for new 'pass_oacc_loop_designation', we no longer need that now, as far
as I can tell.  So I removed the manual
'calculate_dominance_info (CDI_DOMINATORS)' -- but please do tell if
there is a reason to keep it.

>  namespace {
>
> +const pass_data pass_data_oacc_loop_designation =
> +{
> +  GIMPLE_PASS, /* type */
> +  "oaccloops", /* name */
> +  OPTGROUP_OMP, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg
> +  | TODO_rebuild_alias, /* todo_flags_finish */
> +};

Do you remember why you added 'TODO_rebuild_alias' here?
'pass_oacc_device_lower' doesn't have it, and neither does
'pass_oacc_loop_designation' in your original (2017-11-27) internal
gcn/master branch commit 81ee7ef64cdfa47c01f24c79b8ebd03242c9f3eb
"Split device-lowering/gimple workers into three passes".  So I
removed that -- but please do tell if there is a reason to keep it.

Regards
 Thomas

From 0829ab79d37be6c59072af0c4f54043f7e9d23ea Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 2 Mar 2021 04:20:11 -0800
Subject: [PATCH] [OpenACC] Extract 'pass_oacc_loop_designation' out of
 'pass_oacc_device_lower'

This really is a separate step -- and another pass to be added between the two,
later on.

	gcc/
	* omp-offload.c (oacc_loop_xform_head_tail, oacc_loop_process):
	'update_stmt' after modification.
	(pass_oacc_loop_designation): New function, extracted out of...
	(pass_oacc_device_lower): ... this.
	(pass_data_oacc_loop_designation, pass_oacc_loop_designation)
	(make_pass_oacc_loop_designation): New
	* passes.def: Add it.
	* tree-parloops.c (create_parallel_loop): Adjust.
	* tree-pass.h (make_pass_oacc_loop_designation): New.
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c:
	's%oaccdevlow%oaccloops%g'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/classify-parallel.c: Likewise.
	* c-c++-common/goacc/classify-routine-nohost.c: Likewise.
	* c-c++-common/goacc/classify-routine.c: Likewise.
	* c-c++-common/goacc/classify-serial.c: Likewise.
	* c-c++-common/goacc/routine-nohost-1.c: Likewise.
	* g++.dg/goacc/template.C: Likewise.
	*

Re: [PATCH v4] Use range-based for loops for traversing loops

2021-07-29 Thread Richard Biener via Gcc-patches

On Tue, Jul 27, 2021 at 4:11 AM Kewen.Lin  wrote:
>
> on 2021/7/24 12:10 AM, Martin Sebor wrote:
> > On 7/23/21 2:35 AM, Kewen.Lin wrote:
> >> Hi,
> >>
> >> Compared to v2, this v3 removes the new CTOR with struct loops *loops
> >> as Richi clarified.  I'd like to support it in a separate follow-up
> >> patch by extending the existing CTOR with an optional argument loop_p
> >> root.
> >
> > Looks very nice (and quite a bit work)!  Thanks again!
> >
> > Not to make even more work for you, but it occurred to me that
> > the declaration of the loop control variable could be simplified
> > by the use of auto like so:
> >
> >  for (auto loop: loops_list (cfun, ...))
> >
>
> Thanks for the suggestion!  Updated in v4 accordingly.
>
> I was under the impression that using C++11 auto is debatable since it sometimes
> may make things less clear.  But I think you are right, using auto here won't
> make it harder to read, just more concise.  Thanks again.
>
> > I spotted what looks to me like a few minor typos in the docs
> > diff:
> >
> > diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
> > index a135656ed01..27697b08728 100644
> > --- a/gcc/doc/loop.texi
> > +++ b/gcc/doc/loop.texi
> > @@ -79,14 +79,14 @@ and its subloops in the numbering.  The index of a loop 
> > never changes.
> >
> > The entries of the @code{larray} field should not be accessed directly.
> > The function @code{get_loop} returns the loop description for a loop with
> > -the given index.  @code{number_of_loops} function returns number of
> > -loops in the function.  To traverse all loops, use @code{FOR_EACH_LOOP}
> > -macro.  The @code{flags} argument of the macro is used to determine
> > -the direction of traversal and the set of loops visited.  Each loop is
> > -guaranteed to be visited exactly once, regardless of the changes to the
> > -loop tree, and the loops may be removed during the traversal.  The newly
> > -created loops are never traversed, if they need to be visited, this
> > -must be done separately after their creation.
> > +the given index.  @code{number_of_loops} function returns number of loops
> > +in the function.  To traverse all loops, use range-based for loop with
> >
> > Missing article:
> >
> >   use a range-based for loop
> >
> > +class @code{loop_list} instance. The @code{flags} argument of the macro
> >
> > Is that loop_list or loops_list?
> >
> > IIUC, it's also not a macro anymore, right?  The flags argument
> > is passed to the loop_list ctor, no?
> >
>
> Oops, thanks for catching all above ones!  Fixed in v4.
>
> Bootstrapped and regtested again on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, also
> bootstrapped again on ppc64le P9 with bootstrap-O3 config.
>
> Is it ok for trunk?

OK.

Thanks,
Richard.
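
For readers who have not seen the pattern, here is a minimal sketch of how a class becomes usable in a range-based for loop with auto.  It is not GCC's loops_list; toy_loop, toy_loops_list and the skip_root flag are hypothetical stand-ins, and the list of loops is simply snapshotted into a vector by the constructor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GCC's 'loop' structure.
struct toy_loop { int num; };

// Snapshot of the loops to visit; begin ()/end () are all that
// a range-based for loop needs.
class toy_loops_list
{
public:
  explicit toy_loops_list (const std::vector<toy_loop *> &all,
                           bool skip_root = false)
  {
    for (toy_loop *l : all)
      if (!(skip_root && l->num == 0))
        to_visit_.push_back (l);
  }

  std::vector<toy_loop *>::const_iterator begin () const
  { return to_visit_.begin (); }
  std::vector<toy_loop *>::const_iterator end () const
  { return to_visit_.end (); }

private:
  std::vector<toy_loop *> to_visit_;
};

// Usage in the style suggested in the review: the temporary list
// lives for the whole loop, and 'auto' deduces toy_loop *.
int sum_loop_numbers (const std::vector<toy_loop *> &all)
{
  int sum = 0;
  for (auto loop : toy_loops_list (all, /*skip_root=*/true))
    sum += loop->num;
  return sum;
}
```

Because the elements are collected up front, changes to the underlying container during the walk do not affect the set of visited elements in this sketch.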

> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * cfgloop.h (as_const): New function.
> (class loop_iterator): Rename to ...
> (class loops_list): ... this.
> (loop_iterator::next): Rename to ...
> (loops_list::Iter::fill_curr_loop): ... this and adjust.
> (loop_iterator::loop_iterator): Rename to ...
> (loops_list::loops_list): ... this and adjust.
> (loops_list::Iter): New class.
> (loops_list::iterator): New type.
> (loops_list::const_iterator): New type.
> (loops_list::begin): New function.
> (loops_list::end): Likewise.
> (loops_list::begin const): Likewise.
> (loops_list::end const): Likewise.
> (FOR_EACH_LOOP): Remove.
> (FOR_EACH_LOOP_FN): Remove.
> * cfgloop.c (flow_loops_dump): Adjust FOR_EACH_LOOP* with range-based
> for loop with loops_list instance.
> (sort_sibling_loops): Likewise.
> (disambiguate_loops_with_multiple_latches): Likewise.
> (verify_loop_structure): Likewise.
> * cfgloopmanip.c (create_preheaders): Likewise.
> (force_single_succ_latches): Likewise.
> * config/aarch64/falkor-tag-collision-avoidance.c
> (execute_tag_collision_avoidance): Likewise.
> * config/mn10300/mn10300.c (mn10300_scan_for_setlb_lcc): Likewise.
> * config/s390/s390.c (s390_adjust_loops): Likewise.
> * doc/loop.texi: Likewise.
> * gimple-loop-interchange.cc (pass_linterchange::execute): Likewise.
> * gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
> * gimple-loop-versioning.cc (loop_versioning::analyze_blocks): 
> Likewise.
> (loop_versioning::make_versioning_decisions): Likewise.
> * gimple-ssa-split-paths.c (split_paths): Likewise.
> * graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): 
> Likewise.
> * graphite.c (canonicalize_loop_form): Likewise.
> (graphite_transform_loops): Likewise.
> * ipa-fnsummary.c (analyze_function_body): Likewise.
> * ipa-pure-const.c (analyze_function): Likewise.
> * loop-doloop.c (doloop_optimize_loops): Likewise.
> * loop-init.c

[PATCH 1/5] IBM Z: Get rid of vec merge unspec

2021-07-29 Thread Andreas Krebbel via Gcc-patches

This patch gets rid of the unspecs we were using for the vector merge
instruction and replaces it with generic rtx.

gcc/ChangeLog:

* config/s390/s390-modes.def: Add more vector modes to support
concatenation of two vectors.
* config/s390/s390-protos.h (s390_expand_merge_perm_const): Add
prototype.
(s390_expand_merge): Likewise.
* config/s390/s390.c (s390_expand_merge_perm_const): New function.
(s390_expand_merge): New function.
* config/s390/s390.md (UNSPEC_VEC_MERGEH, UNSPEC_VEC_MERGEL):
Remove constant definitions.
* config/s390/vector.md (V_HW_2): Add mode iterators.
(VI_HW_4, V_HW_4): Rename VI_HW_4 to V_HW_4.
(vec_2x_nelts, vec_2x_wide): New mode attributes.
(*vmrhb, *vmrlb, *vmrhh, *vmrlh, *vmrhf, *vmrlf, *vmrhg, *vmrlg):
New pattern definitions.
(vec_widen_umult_lo_, vec_widen_umult_hi_)
(vec_widen_smult_lo_, vec_widen_smult_hi_)
(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf, vec_unpacks_lo_v2df)
(vec_unpacks_hi_v2df): Adjust expanders to emit non-unspec RTX for
vec merge.
* config/s390/vx-builtins.md (V_HW_4): Remove mode iterator. Now
in vector.md.
(vec_mergeh, vec_mergel): Use s390_expand_merge to
emit vec merge pattern.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c:
Instead of vpdi with 0 and 5 vmrlg and vmrhg are used now.
* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: Likewise.
* gcc.target/s390/zvector/vec-types.h: New test.
* gcc.target/s390/zvector/vec_merge.c: New test.
---
 gcc/config/s390/s390-modes.def|  11 +-
 gcc/config/s390/s390-protos.h |   2 +
 gcc/config/s390/s390.c|  36 
 gcc/config/s390/s390.md   |   2 -
 gcc/config/s390/vector.md | 204 +++---
 gcc/config/s390/vx-builtins.md|  35 ++-
 .../long-double-asm-in-out-hard-fp-reg.c  |   8 +-
 .../long-double-asm-inout-hard-fp-reg.c   |   6 +-
 .../gcc.target/s390/zvector/vec-types.h   |  37 
 .../gcc.target/s390/zvector/vec_merge.c   |  88 
 10 files changed, 367 insertions(+), 62 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-types.h
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec_merge.c

diff --git a/gcc/config/s390/s390-modes.def b/gcc/config/s390/s390-modes.def
index 6d814fc490c..245c2b811d4 100644
--- a/gcc/config/s390/s390-modes.def
+++ b/gcc/config/s390/s390-modes.def
@@ -259,14 +259,17 @@ CC_MODE (CCVFANY);
 
 /* Vector modes.  */
 
-VECTOR_MODES (INT, 2);/* V2QI */
-VECTOR_MODES (INT, 4);/*V4QI V2HI */
-VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI */
-VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI */
+VECTOR_MODES (INT, 2);/*   V2QI */
+VECTOR_MODES (INT, 4);/*  V4QI V2HI */
+VECTOR_MODES (INT, 8);/* V8QI V4HI V2SI */
+VECTOR_MODES (INT, 16);   /*   V16QI V8HI V4SI V2DI */
+VECTOR_MODES (INT, 32);   /* V32QI V16HI V8SI V4DI V2TI */
 
 VECTOR_MODE (FLOAT, SF, 2);   /* V2SF */
 VECTOR_MODE (FLOAT, SF, 4);   /* V4SF */
+VECTOR_MODE (FLOAT, SF, 8);   /* V8SF */
 VECTOR_MODE (FLOAT, DF, 2);   /* V2DF */
+VECTOR_MODE (FLOAT, DF, 4);   /* V4DF */
 
 VECTOR_MODE (INT, QI, 1); /* V1QI */
 VECTOR_MODE (INT, HI, 1); /* V1HI */
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 289e018cf0f..4b03c6e99f5 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -122,6 +122,8 @@ extern void s390_expand_vec_compare_cc (rtx, enum rtx_code, 
rtx, rtx, bool);
 extern enum rtx_code s390_reverse_condition (machine_mode, enum rtx_code);
 extern void s390_expand_vcond (rtx, rtx, rtx, enum rtx_code, rtx, rtx);
 extern void s390_expand_vec_init (rtx, rtx);
+extern rtx s390_expand_merge_perm_const (machine_mode, bool);
+extern void s390_expand_merge (rtx, rtx, rtx, bool);
 extern rtx s390_build_signbit_mask (machine_mode);
 extern rtx s390_return_addr_rtx (int, rtx);
 extern rtx s390_back_chain_rtx (void);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index b1d3b99784d..b1a9ca9d8aa 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7014,6 +7014,42 @@ s390_expand_vec_init (rtx target, rtx vals)
 }
 }
 
+/* Return a parallel of constant integers to be used as permutation
+   vector for a vector merge operation in MODE.  If HIGH_P is true the
+   left-most elements of the source vectors are merged otherwise the
+   right-most elements.  */
+rtx
+s390_expand_merge_perm_const (machine_mode mode, bool high_p)
+{
+  int nelts = GET_MODE_NUNITS (mode);
+  rtx perm[16];
+  int addend = high_p ? 0 : nelts;
+
+  for (int i = 0; i < nelts; i++)
+

[PATCH 3/5] IBM Z: Remove redundant V_HW_64 mode iterator.

2021-07-29 Thread Andreas Krebbel via Gcc-patches

gcc/ChangeLog:

* config/s390/vector.md (V_HW_64): Remove mode iterator.
(*vec_load_pair): Use V_HW_2 instead of V_HW_64.
* config/s390/vx-builtins.md
(vec_scatter_element_SI): Use V_HW_2 instead of
V_HW_64.
---
 gcc/config/s390/vector.md  |  7 +++
 gcc/config/s390/vx-builtins.md | 14 +++---
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 6a6370b5275..b372bf171f7 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -36,7 +36,6 @@ (define_mode_iterator V_HW  [V16QI V8HI V4SI V2DI (V1TI 
"TARGET_VXE") V2DF
 (define_mode_iterator V_HW2 [V16QI V8HI V4SI V2DI V2DF (V4SF "TARGET_VXE")
 (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
-(define_mode_iterator V_HW_64 [V2DI V2DF])
 (define_mode_iterator VT_HW_HSDT [V8HI V4SI V4SF V2DI V2DF V1TI V1TF TI TF])
 (define_mode_iterator V_HW_HSD [V8HI V4SI (V4SF "TARGET_VXE") V2DI V2DF])
 
@@ -1972,9 +1971,9 @@ (define_expand "vec_cmp"
 })
 
 (define_insn "*vec_load_pair"
-  [(set (match_operand:V_HW_64   0 "register_operand" 
"=v,v")
-   (vec_concat:V_HW_64 (match_operand: 1 "register_operand"  
"d,v")
-   (match_operand: 2 "register_operand"  
"d,v")))]
+  [(set (match_operand:V_HW_2   0 "register_operand" 
"=v,v")
+   (vec_concat:V_HW_2 (match_operand: 1 "register_operand"  "d,v")
+  (match_operand: 2 "register_operand"  
"d,v")))]
   "TARGET_VX"
   "@
vlvgp\t%v0,%1,%2
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 3799e833187..3e7b8541887 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -452,17 +452,17 @@ (define_insn "vec_scatter_element_DI"
 
 ; A 31 bit target address is generated from 64 bit elements
 ; vsceg
-(define_insn "vec_scatter_element_SI"
+(define_insn "vec_scatter_element_SI"
   [(set (mem:
 (plus:SI (subreg:SI
-  (unspec: [(match_operand:V_HW_64 1 
"register_operand"   "v")
- (match_operand:QI  3 
"const_mask_operand" "C")]
+  (unspec: [(match_operand:V_HW_2 1 
"register_operand"   "v")
+ (match_operand:QI 3 
"const_mask_operand" "C")]
 UNSPEC_VEC_EXTRACT) 4)
- (match_operand:SI  2 
"address_operand"   "ZQ")))
-   (unspec: [(match_operand:V_HW_640 
"register_operand"   "v")
+ (match_operand:SI 2 
"address_operand"   "ZQ")))
+   (unspec: [(match_operand:V_HW_20 
"register_operand"   "v")
   (match_dup 3)] UNSPEC_VEC_EXTRACT))]
-  "TARGET_VX && !TARGET_64BIT && UINTVAL (operands[3]) < GET_MODE_NUNITS 
(mode)"
-  "vsce\t%v0,%O2(%v1,%R2),%3"
+  "TARGET_VX && !TARGET_64BIT && UINTVAL (operands[3]) < GET_MODE_NUNITS 
(mode)"
+  "vsce\t%v0,%O2(%v1,%R2),%3"
   [(set_attr "op_type" "VRV")])
 
 ; Element size and target address size is the same
-- 
2.31.1

[PATCH 2/5] IBM Z: Get rid of vpdi unspec

2021-07-29 Thread Andreas Krebbel via Gcc-patches

The patch gets rid of the unspec used for the vector permute double
immediate instruction and replaces it with generic rtx.
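
As a rough illustration of the new vec_permi expander, the sketch below reproduces its two GEN_INT computations in plain C++.  The function name permi_to_indices is hypothetical; the indices refer to the four-element concatenation of op1 and op2 that the vec_select picks from.

```cpp
#include <cassert>
#include <utility>

// Given the 2-bit vec_permi selector, return the two element indices
// into the concatenation { op1[0], op1[1], op2[0], op2[1] }.
constexpr std::pair<int, int> permi_to_indices (int val)
{
  int first = (val & 2) >> 1;   // 0 or 1: doubleword taken from op1
  int second = (val & 1) + 2;   // 2 or 3: doubleword taken from op2
  return {first, second};
}
```

So selector 1 yields indices (0, 3) and selector 2 yields (1, 2), which are the two parallels matched by the *vpdi1 and *vpdi4 patterns.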

gcc/ChangeLog:

* config/s390/s390.md (UNSPEC_VEC_PERMI): Remove constant
definition.
* config/s390/vector.md (*vpdi1, *vpdi4): New pattern
definitions.
* config/s390/vx-builtins.md (*vec_permi): Emit generic rtx
instead of an unspec.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/vec-permi.c: Removed.
* gcc.target/s390/zvector/vec_permi.c: New test.
---
 gcc/config/s390/s390.md   |  1 -
 gcc/config/s390/vector.md | 26 
 gcc/config/s390/vx-builtins.md| 26 +++-
 .../gcc.target/s390/zvector/vec-permi.c   | 54 ---
 .../gcc.target/s390/zvector/vec_permi.c   | 66 +++
 5 files changed, 102 insertions(+), 71 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-permi.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec_permi.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index d896faee0fb..1b894a926ce 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -166,7 +166,6 @@ (define_c_enum "unspec" [
UNSPEC_VEC_PACK_UNSIGNED_SATURATE_CC
UNSPEC_VEC_PACK_UNSIGNED_SATURATE_GENCC
UNSPEC_VEC_PERM
-   UNSPEC_VEC_PERMI
UNSPEC_VEC_EXTEND
UNSPEC_VEC_STORE_LEN
UNSPEC_VEC_STORE_LEN_R
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 7507aec1c8e..6a6370b5275 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -767,6 +767,32 @@ (define_insn "*vec_perm"
   "vperm\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
+
+; First DW of op1 and second DW of op2
+(define_insn "*vpdi1"
+  [(set (match_operand:V_HW_2   0 "register_operand" "=v")
+   (vec_select:V_HW_2
+(vec_concat:
+ (match_operand:V_HW_2 1 "register_operand"  "v")
+ (match_operand:V_HW_2 2 "register_operand"  "v"))
+(parallel [(const_int 0) (const_int 3)])))]
+  "TARGET_VX"
+  "vpdi\t%v0,%v1,%v2,1"
+  [(set_attr "op_type" "VRR")])
+
+; Second DW of op1 and first of op2
+(define_insn "*vpdi4"
+  [(set (match_operand:V_HW_2   0 "register_operand" "=v")
+   (vec_select:V_HW_2
+(vec_concat:
+ (match_operand:V_HW_2 1 "register_operand"  "v")
+ (match_operand:V_HW_2 2 "register_operand"  "v"))
+(parallel [(const_int 1) (const_int 2)])))]
+  "TARGET_VX"
+  "vpdi\t%v0,%v1,%v2,4"
+  [(set_attr "op_type" "VRR")])
+
+
 (define_insn "*vmrhb"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 (vec_select:V16QI
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 5abe43b9e53..3799e833187 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -403,28 +403,22 @@ (define_insn "vec_zperm"
   "vperm\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
+; Incoming op3 is in vec_permi format and will be turned into a
+; permute vector consisting of op3 and op4.
 (define_expand "vec_permi"
-  [(set (match_operand:V_HW_64  0 "register_operand"   "")
-   (unspec:V_HW_64 [(match_operand:V_HW_64 1 "register_operand"   "")
-(match_operand:V_HW_64 2 "register_operand"   "")
-(match_operand:QI  3 "const_mask_operand" "")]
-   UNSPEC_VEC_PERMI))]
+  [(set (match_operand:V_HW_2   0 "register_operand" "")
+   (vec_select:V_HW_2
+(vec_concat:
+ (match_operand:V_HW_2 1 "register_operand" "")
+ (match_operand:V_HW_2 2 "register_operand" ""))
+(parallel [(match_operand:QI 3 "const_mask_operand" "") (match_dup 
4)])))]
   "TARGET_VX"
 {
   HOST_WIDE_INT val = INTVAL (operands[3]);
-  operands[3] = GEN_INT ((val & 1) | (val & 2) << 1);
+  operands[3] = GEN_INT ((val & 2) >> 1);
+  operands[4] = GEN_INT ((val & 1) + 2);
 })
 
-(define_insn "*vec_permi"
-  [(set (match_operand:V_HW_64  0 "register_operand"  "=v")
-   (unspec:V_HW_64 [(match_operand:V_HW_64 1 "register_operand"   "v")
-(match_operand:V_HW_64 2 "register_operand"   "v")
-(match_operand:QI  3 "const_mask_operand" "C")]
-   UNSPEC_VEC_PERMI))]
-  "TARGET_VX && (UINTVAL (operands[3]) & 10) == 0"
-  "vpdi\t%v0,%v1,%v2,%b3"
-  [(set_attr "op_type" "VRR")])
-
 
 ; Vector replicate
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-permi.c 
b/gcc/testsuite/gcc.target/s390/zvector/vec-permi.c
deleted file mode 100644
index c0a852b9703..000
--- a/gcc/testsuite/gcc.target/s390/zvector/vec-permi.c
+++ /dev/null
@@ -1,54 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O3 -march=z13 -mzarch --save-temps" } */
-/* { dg-do run { target { s390_z13_hw } } } */
-
-/*
- * The vector intrinsic vec_permi(a, b, c) chooses one of the two

[PATCH 0/5] IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST

2021-07-29 Thread Andreas Krebbel via Gcc-patches

This patchset, after some prep work, provides an initial
implementation of the TARGET_VECTORIZE_VEC_PERM_CONST hook for IBM Z.
Only the vmrh, vmrl, and vpdi instructions are exploited so far.  More
instructions will be added with follow-on patches.

Bootstrapped and regression tested on s390x.

As expected various occurrences of the vperm instruction get replaced
with vmr* and vpdi.

I'll commit the patches after giving it a few days for comments.

Andreas Krebbel (5):
  IBM Z: Get rid of vec merge unspec
  IBM Z: Get rid of vpdi unspec
  IBM Z: Remove redundant V_HW_64 mode iterator.
  IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vector merge
  IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vpdi

 gcc/config/s390/s390-modes.def|  11 +-
 gcc/config/s390/s390-protos.h |   2 +
 gcc/config/s390/s390.c| 191 ++
 gcc/config/s390/s390.md   |   3 -
 gcc/config/s390/vector.md | 238 +++---
 gcc/config/s390/vx-builtins.md|  75 +++---
 .../long-double-asm-in-out-hard-fp-reg.c  |   8 +-
 .../long-double-asm-inout-hard-fp-reg.c   |   6 +-
 .../gcc.target/s390/vector/perm-merge.c   | 104 
 .../gcc.target/s390/vector/perm-vpdi.c|  49 
 .../gcc.target/s390/vector/vec-types.h|  35 +++
 .../gcc.target/s390/zvector/vec-permi.c   |  54 
 .../gcc.target/s390/zvector/vec-types.h   |  37 +++
 .../gcc.target/s390/zvector/vec_merge.c   |  88 +++
 .../gcc.target/s390/zvector/vec_permi.c   |  66 +
 15 files changed, 822 insertions(+), 145 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/perm-merge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/perm-vpdi.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-types.h
 delete mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-permi.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-types.h
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec_merge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec_permi.c

-- 
2.31.1

[PATCH 5/5] IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vpdi

2021-07-29 Thread Andreas Krebbel via Gcc-patches

This patch makes use of the vector permute double immediate
instruction for constant permute vectors.
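
The selection logic can be summarised in a few lines.  The following is only a sketch of the decision made in expand_perm_with_vpdi, written as a free function with a hypothetical name; it returns the vpdi immediate for the two permutations the patch handles and -1 for everything else (including the 0 and 5 cases, which are left to vmrhg and vmrlg).

```cpp
#include <cassert>

// Returns 1 or 4 for the two selections handled via vpdi in the patch
// and -1 otherwise.  p0/p1 index into the concatenation of both inputs.
constexpr int vpdi_immediate (unsigned nelt, unsigned p0, unsigned p1)
{
  if (nelt != 2)
    return -1;          // only two-element vectors (V2DI, V2DF)
  if (p0 == 0 && p1 == 3)
    return 1;           // first element of src1, second of src2
  if (p0 == 1 && p1 == 2)
    return 4;           // second element of src1, first of src2
  return -1;
}
```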

gcc/ChangeLog:

* config/s390/s390.c (expand_perm_with_vpdi): New function.
(vectorize_vec_perm_const_1): Call expand_perm_with_vpdi.
* config/s390/vector.md (*vpdi1, @vpdi1): Enable a
parameterized expander.
(*vpdi4, @vpdi4): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/perm-vpdi.c: New test.
---
 gcc/config/s390/s390.c| 47 ++
 gcc/config/s390/vector.md |  5 +-
 .../gcc.target/s390/vector/perm-vpdi.c| 49 +++
 3 files changed, 98 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/perm-vpdi.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 684241b00b8..20c52c83c72 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16981,6 +16981,50 @@ expand_perm_with_merge (const struct expand_vec_perm_d &d)
   return merge_lo_p || merge_hi_p;
 }
 
+/* Try to expand the vector permute operation described by D using the
+   vector permute doubleword immediate instruction vpdi.  Return true
+   if vpdi could be used.
+
+   VPDI allows 4 different immediate values (0, 1, 4, 5). The 0 and 5
+   cases are covered by vmrhg and vmrlg already.  So we only care
+   about the 1, 4 cases here.
+   1 - First element of src1 and second of src2
+   4 - Second element of src1 and first of src2  */
+static bool
+expand_perm_with_vpdi (const struct expand_vec_perm_d &d)
+{
+  bool vpdi1_p = false;
+  bool vpdi4_p = false;
+  rtx op0_reg, op1_reg;
+
+  // Only V2DI and V2DF are supported here.
+  if (d.nelt != 2)
+return false;
+
+  if (d.perm[0] == 0 && d.perm[1] == 3)
+vpdi1_p = true;
+
+  if (d.perm[0] == 1 && d.perm[1] == 2)
+vpdi4_p = true;
+
+  if (!vpdi1_p && !vpdi4_p)
+return false;
+
+  if (d.testing_p)
+return true;
+
+  op0_reg = force_reg (GET_MODE (d.op0), d.op0);
+  op1_reg = force_reg (GET_MODE (d.op1), d.op1);
+
+  if (vpdi1_p)
+emit_insn (gen_vpdi1 (d.vmode, d.target, op0_reg, op1_reg));
+
+  if (vpdi4_p)
+emit_insn (gen_vpdi4 (d.vmode, d.target, op0_reg, op1_reg));
+
+  return true;
+}
+
 /* Try to find the best sequence for the vector permute operation
described by D.  Return true if the operation could be
expanded.  */
@@ -16990,6 +17034,9 @@ vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
   if (expand_perm_with_merge (d))
 return true;
 
+  if (expand_perm_with_vpdi (d))
+return true;
+
   return false;
 }
 
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index b372bf171f7..1b0ae47ab49 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -768,7 +768,7 @@ (define_insn "*vec_perm"
 
 
 ; First DW of op1 and second DW of op2
-(define_insn "*vpdi1"
+(define_insn "@vpdi1"
   [(set (match_operand:V_HW_2   0 "register_operand" "=v")
(vec_select:V_HW_2
 (vec_concat:
@@ -780,7 +780,7 @@ (define_insn "*vpdi1"
   [(set_attr "op_type" "VRR")])
 
 ; Second DW of op1 and first of op2
-(define_insn "*vpdi4"
+(define_insn "@vpdi4"
   [(set (match_operand:V_HW_2   0 "register_operand" "=v")
(vec_select:V_HW_2
 (vec_concat:
@@ -926,7 +926,6 @@ (define_insn_and_split "tf_to_fprx2"
   operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
 })
 
-; vec_perm_const for V2DI using vpdi?
 
 ;;
 ;; Vector integer arithmetic instructions
diff --git a/gcc/testsuite/gcc.target/s390/vector/perm-vpdi.c 
b/gcc/testsuite/gcc.target/s390/vector/perm-vpdi.c
new file mode 100644
index 000..cc925315b37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/perm-vpdi.c
@@ -0,0 +1,49 @@
+/* { dg-do run { target { s390*-*-* } } } */
+/* { dg-options "-O3 -mzarch -march=z14 -mzvector --save-temps" } */
+
+/* { dg-final { scan-assembler-times "\tvmrhg\t" 3 } } */
+/* { dg-final { scan-assembler-times "\tvmrlg\t" 3 } } */
+/* { dg-final { scan-assembler-times "\tvpdi\t" 6 } } */
+
+#include "vec-types.h"
+#include 
+
+#define GEN_PERMI_BITS(VEC_TYPE, BITS) \
+  VEC_TYPE __attribute__((noinline))   \
+  permi_##BITS##_##VEC_TYPE(VEC_TYPE a, VEC_TYPE b) {  \
+return (VEC_TYPE){a[((BITS) & 2) >> 1], b[(BITS) & 1] }; }
+
+#define GEN_PERMI(VEC_TYPE)\
+  GEN_PERMI_BITS(VEC_TYPE, 0); \
+  GEN_PERMI_BITS(VEC_TYPE, 1); \
+  GEN_PERMI_BITS(VEC_TYPE, 2); \
+  GEN_PERMI_BITS(VEC_TYPE, 3); \
+
+GEN_PERMI(v2di)
+GEN_PERMI(uv2di)
+GEN_PERMI(v2df)
+
+
+#define CHECK_PERMI_BITS(VEC_TYPE, BITS)   \
+  VEC_TYPE r##BITS = permi_##BITS##_##VEC_TYPE (a, b); \
+  if (r##BITS[0] != ((BITS) & 2) >> 1  \
+  || r##BITS[1] != ((BITS) & 1) + 2)   \
+__builtin_abort();
+
+#define CHECK_PERMI(VEC_TYPE)  \
+  {

[PATCH 4/5] IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vector merge

2021-07-29 Thread Andreas Krebbel via Gcc-patches

This patch implements the TARGET_VECTORIZE_VEC_PERM_CONST hook in the IBM Z
backend. The initial implementation only exploits the vector merge
instruction but there is more to come.
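
To show which permutations count as a merge, here is a small standalone sketch of the index test performed in expand_perm_with_merge.  The enum and function names are hypothetical, and a std::vector stands in for the perm array of expand_vec_perm_d.

```cpp
#include <cassert>
#include <vector>

enum class merge_kind { none, high, low };

// Classify a constant permutation over the concatenation of two
// nelt-element vectors.  For four elements, { 0, 4, 1, 5 } is a
// merge high and { 2, 6, 3, 7 } is a merge low.
merge_kind classify_merge (const std::vector<unsigned> &perm)
{
  const unsigned nelt = static_cast<unsigned> (perm.size ());
  if (nelt == 0 || nelt % 2)
    return merge_kind::none;

  bool hi = true, lo = true;
  for (unsigned t = 0; t < nelt; t++)
    {
      // Even result slots come from op0, odd ones from op1.
      if (perm[t] != t / 2 + (t % 2) * nelt)
        hi = false;
      if (perm[t] != (t + nelt) / 2 + (t % 2) * nelt)
        lo = false;
    }

  if (hi)
    return merge_kind::high;
  return lo ? merge_kind::low : merge_kind::none;
}
```

For two-element vectors the same test accepts { 0, 2 } and { 1, 3 }, i.e. the cases that map to vmrhg and vmrlg.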

gcc/ChangeLog:

* config/s390/s390.c (MAX_VECT_LEN): Define macro.
(struct expand_vec_perm_d): Define struct.
(expand_perm_with_merge): New function.
(vectorize_vec_perm_const_1): New function.
(s390_vectorize_vec_perm_const): New function.
(TARGET_VECTORIZE_VEC_PERM_CONST): Define target macro.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/perm-merge.c: New test.
* gcc.target/s390/vector/vec-types.h: New test.
---
 gcc/config/s390/s390.c| 108 ++
 .../gcc.target/s390/vector/perm-merge.c   | 104 +
 .../gcc.target/s390/vector/vec-types.h|  35 ++
 3 files changed, 247 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/perm-merge.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-types.h

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index b1a9ca9d8aa..684241b00b8 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16928,6 +16928,110 @@ s390_md_asm_adjust (vec , vec 
,
   return after_md_seq;
 }
 
+#define MAX_VECT_LEN   16
+
+struct expand_vec_perm_d
+{
+  rtx target, op0, op1;
+  unsigned char perm[MAX_VECT_LEN];
+  machine_mode vmode;
+  unsigned char nelt;
+  bool testing_p;
+};
+
+/* Try to expand the vector permute operation described by D using the
+   vector merge instructions vml and vmh.  Return true if vector merge
+   could be used.  */
+static bool
+expand_perm_with_merge (const struct expand_vec_perm_d &d)
+{
+  bool merge_lo_p = true;
+  bool merge_hi_p = true;
+
+  if (d.nelt % 2)
+return false;
+
+  // For V4SI this checks for: { 0, 4, 1, 5 }
+  for (int telt = 0; telt < d.nelt; telt++)
+if (d.perm[telt] != telt / 2 + (telt % 2) * d.nelt)
+  {
+   merge_hi_p = false;
+   break;
+  }
+
+  if (!merge_hi_p)
+{
+  // For V4SI this checks for: { 2, 6, 3, 7 }
+  for (int telt = 0; telt < d.nelt; telt++)
+   if (d.perm[telt] != (telt + d.nelt) / 2 + (telt % 2) * d.nelt)
+ {
+   merge_lo_p = false;
+   break;
+ }
+}
+  else
+merge_lo_p = false;
+
+  if (d.testing_p)
+return merge_lo_p || merge_hi_p;
+
+  if (merge_lo_p || merge_hi_p)
+s390_expand_merge (d.target, d.op0, d.op1, merge_hi_p);
+
+  return merge_lo_p || merge_hi_p;
+}
+
+/* Try to find the best sequence for the vector permute operation
+   described by D.  Return true if the operation could be
+   expanded.  */
+static bool
+vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
+{
+  if (expand_perm_with_merge (d))
+return true;
+
+  return false;
+}
+
+/* Return true if we can emit instructions for the constant
+   permutation vector in SEL.  If OUTPUT, IN0, IN1 are non-null the
+   hook is supposed to emit the required INSNs.  */
+
+bool
+s390_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, rtx op1,
+  const vec_perm_indices &sel)
+{
+  struct expand_vec_perm_d d;
+  unsigned char perm[MAX_VECT_LEN];
+  unsigned int i, nelt;
+
+  if (!s390_vector_mode_supported_p (vmode) || GET_MODE_SIZE (vmode) != 16)
+return false;
+
+  d.target = target;
+  d.op0 = op0;
+  d.op1 = op1;
+
+  d.vmode = vmode;
+  gcc_assert (VECTOR_MODE_P (d.vmode));
+  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
+  d.testing_p = target == NULL_RTX;
+
+  gcc_assert (target == NULL_RTX || REG_P (target));
+  gcc_assert (sel.length () == nelt);
+  gcc_checking_assert (sizeof (d.perm) == sizeof (perm));
+
+  for (i = 0; i < nelt; i++)
+{
+  unsigned char e = sel[i];
+  gcc_assert (e < 2 * nelt);
+  d.perm[i] = e;
+  perm[i] = e;
+}
+
+  return vectorize_vec_perm_const_1 (d);
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -17238,6 +17342,10 @@ s390_md_asm_adjust (vec , vec 
,
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST s390_md_asm_adjust
 
+#undef TARGET_VECTORIZE_VEC_PERM_CONST
+#define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-s390.h"
diff --git a/gcc/testsuite/gcc.target/s390/vector/perm-merge.c b/gcc/testsuite/gcc.target/s390/vector/perm-merge.c
new file mode 100644
index 000..51b23ddd886
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/perm-merge.c
@@ -0,0 +1,104 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z14 -mzvector --save-temps" } */
+/* { dg-do run { target { s390_z14_hw } } } */
+
+/* { dg-final { scan-assembler-times "\tvmrhb\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tvmrlb\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tvmrhh\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tvmrlh\t" 2 } } */
+/* { dg-final {

Re: [committed] amdgcn: Fix attributes for LLVM-12 [PR 100208]

2021-07-29 Thread Richard Biener via Gcc-patches

On Wed, Jul 28, 2021 at 3:04 PM Andrew Stubbs wrote:
>
> This patch follows up my previous patch and supports more variants of
> LLVM 12.
>
> There are still other incompatibilities with LLVM 12, but at least the
> ELF attributes should now automatically tune to any LLVM 9, 10, or 12
> assembler.  (It would be nice if one set of options would just work
> everywhere, but no.)
>
> LLVM 11 was not tested, but is broken in other ways in any case. LLVM 13
> (dev) needs more work.
>
> Unfortunately, the need for configure tests and the CLI instability
> within the LLVM 12 release branch means that GCC probably needs to be
> rebuilt whenever LLVM is upgraded, even for minor versions.

Is it possible to handle some incompatibilities with command line arguments
to llvm-mc in a wrapper script that could dispatch based on the installed
llvm-mc version?  Or maybe some specs magic that passes down
-mllvm-mc-version=XYZ from a %{llvm-mc-version} specs handler that somehow
queries the installed llvm-mc?  Or is the llvm-mc version not enough to
decide things?  I realize this still needs adjustments for each new llvm-mc
version that pops up.

Richard.

> Andrew

Re: [PATCH 1/2] Fix debug info for ignored decls at start of assembly

2021-07-29 Thread Richard Biener

On Wed, 28 Jul 2021, Bernd Edlinger wrote:

> On 7/28/21 2:51 PM, Richard Biener wrote:
> > On Mon, 26 Jul 2021, Bernd Edlinger wrote:
> > 
> >> Ignored function decls that are compiled at the start of
> >> the assembly have bogus line numbers until the first .file
> >> directive, as reported in PR101575.
> >>
> >> The workaround for this issue is to emit a dummy .file
> >> directive when the first function is DECL_IGNORED_P, when
> >> that is not already done, mostly for -gdwarf-4.
> > 
> > I wonder if it makes sense to unconditionally announce the
> > TU with a .file directive at the beginning.  ISTR this is
> > what we now do with -gdwarf-5.
> > 
> 
> Yes, that would work, even when the file name is not guessed
> correctly.
> 
> Initially I had "" unconditionally here, and it did
> not really hurt, except that it is visible with readelf.

I think I'd prefer that, since if we don't announce a .file
before the first assembler statement but ask gas to produce
line info, it might be tempted to create line info referencing
the possibly temporary filename of the assembler file, which
is undesirable from a build reproducibility point of view.

Richard.

> > Note get_AT_string (comp_unit_die (), DW_AT_name) doesn't
> > work with LTO, you'll get  then.
> > 
> 
> Yeah, that's why I wanted to restrict that to the case where
> it's absolutely necessary.
> 
> > Is the dwarf assembler bug reported/fixed?  Can you include
> > a reference please?
> > 
> 
> I've just added a bug report, it's unlikely to be fixed IMHO:
> https://sourceware.org/bugzilla/show_bug.cgi?id=28149
> 
> I will add that to the patch description:
> 
> Ignored function decls that are compiled at the start of
> the assembly have bogus line numbers until the first .file
> directive, as reported in PR101575.
> 
> The corresponding binutils bug report is
> https://sourceware.org/bugzilla/show_bug.cgi?id=28149
> 
> The workaround for this issue is to emit a dummy .file
> directive when the first function is DECL_IGNORED_P, when
> that is not already done, mostly for -gdwarf-4.
> 
> 
> Thanks
> Bernd.
> 
> > Thanks,
> > Richard.
> > 
> >> 2021-07-24  Bernd Edlinger  
> >>
> >>PR ada/101575
> >>* dwarf2out.c (dwarf2out_begin_prologue): Move init
> >>of fde->ignored_debug to dwarf2out_set_ignored_loc.
> >>(dwarf2out_set_ignored_loc): This is now also called
> >>when no .loc statement is to be generated, in that case
> >>we emit a dummy .file statement when needed.
> >>* final.c (final_start_function_1,
> >>final_scan_insn_1): Call debug_hooks->set_ignored_loc
> >>for all DECL_IGNORED_P functions.
> >> ---
> >>  gcc/dwarf2out.c | 29 +
> >>  gcc/final.c |  5 ++---
> >>  2 files changed, 27 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> >> index 884f1e1..8de0d6f 100644
> >> --- a/gcc/dwarf2out.c
> >> +++ b/gcc/dwarf2out.c
> >> @@ -1115,7 +1115,6 @@ dwarf2out_begin_prologue (unsigned int line ATTRIBUTE_UNUSED,
> >>fde->dw_fde_current_label = dup_label;
> >>fde->in_std_section = (fnsec == text_section
> >> || (cold_text_section && fnsec == cold_text_section));
> >> -  fde->ignored_debug = DECL_IGNORED_P (current_function_decl);
> >>in_text_section_p = fnsec == text_section;
> >>  
> >>/* We only want to output line number information for the genuine dwarf2
> >> @@ -28546,10 +28545,32 @@ dwarf2out_set_ignored_loc (unsigned int line, unsigned int column,
> >>  {
> >>dw_fde_ref fde = cfun->fde;
> >>  
> >> -  fde->ignored_debug = false;
> >> -  set_cur_line_info_table (function_section (fde->decl));
> >> +  if (filename)
> >> +{
> >> +  set_cur_line_info_table (function_section (fde->decl));
> >> +
> >> +  dwarf2out_source_line (line, column, filename, 0, true);
> >> +}
> >> +  else
> >> +{
> >> +  fde->ignored_debug = true;
> >> +
> >> +  /* Work around for PR101575: output a dummy .file directive.  */
> >> +  if (in_first_function_p
> >> +&& debug_info_level >= DINFO_LEVEL_TERSE
> >> +&& dwarf_debuginfo_p ()
> >> +#if defined(HAVE_AS_GDWARF_5_DEBUG_FLAG) && defined(HAVE_AS_WORKING_DWARF_N_FLAG)
> >> +&& dwarf_version <= 4
> >> +#endif
> >> +&& output_asm_line_debug_info ())
> >> +  {
> >> +const char *filename0 = get_AT_string (comp_unit_die (), DW_AT_name);
> >>  
> >> -  dwarf2out_source_line (line, column, filename, 0, true);
> >> +if (filename0 == NULL)
> >> +  filename0 = "";
> >> +maybe_emit_file (lookup_filename (filename0));
> >> +  }
> >> +}
> >>  }
> >>  
> >>  /* Record the beginning of a new source file.  */
> >> diff --git a/gcc/final.c b/gcc/final.c
> >> index ac6892d..82a5767 100644
> >> --- a/gcc/final.c
> >> +++ b/gcc/final.c
> >> @@ -1725,7 +1725,7 @@ final_start_function_1 (rtx_insn **firstp, FILE *file, int *seen,
> >>if (!dwarf2_debug_info_emitted_p (current_function_decl))
> >>

Re: Question about divide by 0 and what we can do with it

2021-07-29 Thread Richard Biener via Gcc-patches

On Wed, Jul 28, 2021 at 4:39 PM Andrew MacLeod wrote:
>
> So I'm seeing what appears to me to be inconsistent behaviour.
>
> in pr96094.c we see:
>
> int
> foo (int x)
> {
>if (x >= 2U)
>  return 34;
>return 34 / x;
> }
>
> x has a range of [0,1] and since / 0 is undefined, the expectation is
> that we fold this to "return 34" and vrp1 does this:
>
>[local count: 767403281]:
>x_6 = ASSERT_EXPR ;
>_4 = 34 / x_6;
>
> [local count: 1073741824]:
># _2 = PHI <34(2), _4(3)>
>
> Transformed to
>
>[local count: 767403281]:
>_4 = 34;
>
> [local count: 1073741824]:
># _2 = PHI <34(2), _4(3)>
>
>
> but if we go to tree-ssa/pr61839_2.c, we see:
>
>volatile unsigned b = 1U;
>int c = 1;
>c = (a + 972195718) % (b ? 2 : 0);
>if (c == 1)
>  ;
>else
>  __builtin_abort ();
>
> /* Dont optimize 972195717 / 0 in function foo.  */
> /* { dg-final { scan-tree-dump-times "972195717 / " 1  "evrp" } } */
>
>
> So why is it OK to optimize out the divide in the first case, but not in
> the second?
>
> Furthermore, if I tweak the second testcase to:
>
>int a = -1;
>volatile unsigned b = 1U;
>int c = 1;
>c = (a + 972195718) / (b ? 2 : 0);
>if (c == 486097858)
>  ;
>else
>  __builtin_abort ();
>
>int d = 1;
>d = (a + 972195718) / (b ? 1 : 0);
>if (d == 972195717)
>  ;
>else
>  __builtin_abort ();
>return d;
>
> Note the only difference is the first case divides by 0 or 2, the second
> case by 0 or 1...
>
> we quite happily produce:
>
>:
>b ={v} 1;
>b.1_2 ={v} b;
>if (b.1_2 != 0)
>  goto ; [INV]
>else
>  goto ; [INV]
>
> :
>
> :
># iftmp.0_7 = PHI <2(2), 0(3)>
>c_14 = 972195717 / iftmp.0_7;
>if (c_14 == 486097858)
>  goto ; [INV]
>else
>  goto ; [INV]
>
> :
>__builtin_abort ();
>
> :
>b.2_4 ={v} b;
>_5 = b.2_4 != 0;
>_6 = (int) _5;
>return 972195717;
>
>
> Which has removed the second call to __builtin_abort () (even before we
> get to EVRP!)
>
> So the issue doesn't seem to be removing the divide by 0, it seems to be
> a pattern match for [0,1] that is triggering.
>
> I would argue the test case should not be testing for not removing the
> divide by 0... Because we can now fold c_14 to be 486097858, and I
> think that is a valid transformation?  (assuming no non-call exceptions
> of course)

I think it's a valid transform, even with -fnon-call-exceptions when
-fdelete-dead-exceptions is enabled (like for Ada and C++).
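
To make the [0,1] argument concrete, here is a small sketch based on the
pr96094.c snippet quoted above. The names foo_folded and
agree_on_defined_inputs are hypothetical and only serve the illustration;
the point is that the folded form can differ from the original only for
x == 0, where the original has no defined behaviour anyway.

```c
#include <assert.h>

/* As in the quoted pr96094.c snippet.  */
int
foo (int x)
{
  if (x >= 2U)		/* x is converted to unsigned, so negatives exit here too.  */
    return 34;
  return 34 / x;	/* Reached only for x == 0 or x == 1.  */
}

/* Hypothetical hand-written equivalent of what vrp1 produces: dividing by
   0 is undefined, so only x == 1 remains and 34 / 1 == 34.  */
int
foo_folded (int x)
{
  (void) x;
  return 34;
}

/* Return 1 if both versions agree on every defined input in
   [-limit, limit], i.e. on everything except x == 0.  */
int
agree_on_defined_inputs (int limit)
{
  assert (limit >= 0);
  for (int x = -limit; x <= limit; x++)
    if (x != 0 && foo (x) != foo_folded (x))
      return 0;
  return 1;
}
```

The sketch deliberately never calls foo (0), since that is exactly the
input whose behaviour is undefined.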

You'd have to dig into history to tell why we added this testcase in that way.

Richard.

>
> Andrew
>

Re: [Patch] gfortran.dg/dg.exp: Add libgfortran as -I flag for ISO*.h [PR101305] (was: [PATCH 3/3] [PR libfortran/101305] Fix ISO_Fortran_binding.h paths in gfortran testsuite)

2021-07-29 Thread Jakub Jelinek via Gcc-patches

On Thu, Jul 29, 2021 at 12:56:32AM +0200, Jakub Jelinek wrote:
> On Wed, Jul 28, 2021 at 01:22:53PM +0200, Tobias Burnus wrote:
> > gfortran.dg/dg.exp: Add libgfortran as -I flag for ISO*.h [PR101305]
> > 
> > gcc/testsuite/
> > PR libfortran/101305
> > * gfortran.dg/dg.exp: Add '-I /libgfortran'
> > compile flag.
> 
> Wouldn't it be better to do that in gcc/testsuite/lib/gfortran.exp
> to GFORTRAN_UNDER_TEST there next to
> -B$specpath/libgfortran/ ?
> So that we don't add it for the installed gfortran testing - there
> we want to test what installed gfortran will do,
> and will affect also libgomp testing.

Though, I guess we need that mostly for the C FE, so perhaps it needs to go
at the start of additional_flags=, whether TEST_ALWAYS_FLAGS is empty or
not.

Jakub

Re: [PATCH V3] Use preferred mode for doloop IV [PR61837]

2021-07-29 Thread guojiufu via Gcc-patches


On 2021-07-27 23:40, Jeff Law wrote:

On 7/27/2021 12:27 AM, Richard Biener wrote:

On Fri, 23 Jul 2021, Jeff Law wrote:



On 7/15/2021 4:08 AM, Jiufu Guo via Gcc-patches wrote:

Refine code for V2 according to review comments:
* Use an if check instead of an assert, and refine the assert
* Use better RE check for test case, e.g. (?n)/(?p)
* Use better wording for target.def

Currently, the doloop.xx variable uses the same type as niter, which may
be shorter than the word size.  For some targets, it would be better to
use a word-size type.  For example, on a 64-bit system, a subreg may be
needed to access a 32-bit value.  Using a 64-bit type may then be better
for niter if its value can be represented in both 32 and 64 bits.

This patch adds a target hook for querying the preferred mode for the
doloop IV, and updates the mode accordingly.
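
As a source-level picture of the idea (not the GIMPLE that ivopts actually
produces), the two functions below compute the same sum. The function names
are hypothetical, and the sketch assumes an LP64 target where unsigned long
is the word-size type.

```c
#include <assert.h>
#include <stddef.h>

/* Source form: a 32-bit induction variable drives the loop.  */
unsigned long
sum_up (const unsigned int *a, unsigned int n)
{
  unsigned long s = 0;

  assert (a != NULL || n == 0);
  for (unsigned int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Doloop-like form: the iteration count is widened once to a word-size
   counter, which then counts down to zero.  The widening is safe here
   because every unsigned int value fits in unsigned long.  */
unsigned long
sum_doloop (const unsigned int *a, unsigned int n)
{
  unsigned long s = 0;
  unsigned long remaining = n;	/* niter in the preferred (wider) type.  */
  const unsigned int *p = a;

  assert (a != NULL || n == 0);
  while (remaining != 0)
    {
      s += *p++;
      remaining--;
    }
  return s;
}
```

On a target whose count register is 64 bits wide, the second form avoids
having to treat the counter as a narrower subreg of that register.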

Bootstrap and regtest pass on powerpc64le, is this ok for trunk?

BR.
Jiufu

gcc/ChangeLog:

2021-07-15  Jiufu Guo  

  PR target/61837
  * config/rs6000/rs6000.c (TARGET_PREFERRED_DOLOOP_MODE): New hook.
  (rs6000_preferred_doloop_mode): New hook.
  * doc/tm.texi: Regenerate.
  * doc/tm.texi.in: Add hook preferred_doloop_mode.
  * target.def (preferred_doloop_mode): New hook.
  * targhooks.c (default_preferred_doloop_mode): New hook.
  * targhooks.h (default_preferred_doloop_mode): New hook.
  * tree-ssa-loop-ivopts.c (compute_doloop_base_on_mode): New function.
  (add_iv_candidate_for_doloop): Call targetm.preferred_doloop_mode
  and compute_doloop_base_on_mode.

gcc/testsuite/ChangeLog:

2021-07-15  Jiufu Guo  

  PR target/61837
  * gcc.target/powerpc/pr61837.c: New test.

My first reaction was that whatever type corresponds to the target's
word_mode would be the right choice.  But then I remembered things like
dbCC on m68k which had a more limited range.  While I don't think m68k
uses the doloop bits, it's a clear example that the most desirable type
may not correspond to the word type for the target.

So my concern with this patch is it's introducing more target
dependencies into the gimple pipeline which is generally considered
undesirable from a design standpoint.  Is there any way to lower from
whatever type is chosen by ivopts to the target's desired type at the
gimple->rtl border rather than doing it in ivopts?

I think that's difficult - after all we want to base other IV uses on
the doloop IV if possible.  So IMHO it's not different from IVOPTs
choosing different IVs based on RTL costing and target addressing mode
availability so I wasn't worried about those additional target
dependences at this point of the GIMPLE pipeline.

Yea, you're probably right on both accounts.   With that resolved I
think this is OK for the trunk.

Thanks for your patience Jiufu and thanks for chiming in Richi.


Thanks for all your help!

The patch was committed as r12-2585.

I notice that I ignored one guality case (gfortran.dg/guality/arg1.f90).
It changed from 'pass' to 'unsupported'.  The issue can be reproduced
on a similar test case without this patch.  I just opened PR101669 for it.


BR,
Jiufu



jeff

Re: [PATCH] c/101512 - fix missing address-taking in c_common_mark_addressable_vec

2021-07-29 Thread Richard Biener

On Wed, 28 Jul 2021, Joseph Myers wrote:

> On Wed, 21 Jul 2021, Jakub Jelinek via Gcc-patches wrote:
> 
> > I wonder if instead when trying to wrap
> > C_MAYBE_CONST_EXPR into a VIEW_CONVERT_EXPR we shouldn't be
> > removing that C_MAYBE_CONST_EXPR and perhaps adding it around the
> > VIEW_CONVERT_EXPR.  E.g. various routines in c/c-typeck.c like
> > build_unary_op remember int_operands, remove_c_maybe_const_expr
> > and at the end note_integer_operands.
> > 
> > If Joseph thinks it is ok to have C_MAYBE_CONST_EXPR inside of
> > VCE, then the patch looks good to me.
> 
> There are specific cases when a C_MAYBE_CONST_EXPR mustn't appear inside 
> another expression: any case where the inner expression is required to be 
> fully folded (this implies nested C_MAYBE_CONST_EXPR aren't allowed) and 
> any case where the expression might appear (possibly unevaluated) in an 
> integer constant expression (any C_MAYBE_CONST_EXPR noting that needs to 
> be at top level).
> 
> If the expressions involved here can never appear in an integer constant 
> expression and do not need to be fully folded, I think it's OK to have 
> C_MAYBE_CONST_EXPR inside VIEW_CONVERT_EXPR.

Since the expression in question involves GCC extensions to the C language
(vector types), it's not clear if they can be part of an integer constant
expression.  The case is something like

typedef long V __attribute__((vector_size(16)));

const long x = ((V)(V){1+1, 2+2})[2-1];

which we reject.  In particular c_fully_fold_internal is fed

VIEW_CONVERT_EXPR(VIEW_CONVERT_EXPR(<<< 
Unknown tree: compound_literal_expr
static V __compound_literal.0 = { 2, 4 }; >>>))[3]

in this case (no C_MAYBE_CONST_EXPR here), but we are not even turning
the COMPOUND_LITERAL_EXPR into a VECTOR_CST.

So I conclude that indeed the expressions involved here can never
appear in an integer constant expression.
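
For contrast, a minimal sketch of the same subscripted vector literal used
where no constant expression is required, assuming a compiler with the GNU
vector extensions (GCC or Clang). The function name second_element is
hypothetical.

```c
#include <assert.h>

typedef long V __attribute__ ((vector_size (16)));

/* Evaluate the subscripted vector literal at run time, inside a function
   body, where the initializer does not have to be a constant expression.  */
long
second_element (void)
{
  V v = (V){ 1 + 1, 2 + 2 };

  assert (sizeof (V) == 16);
  return v[2 - 1];
}
```

Here the subscript simply selects the second element, so the function
returns 4; only the use as a static initializer runs into the constant
expression requirement discussed above.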

Pushed to trunk.

Richard.
