Re: [PATCH] libgccjit: Add support for types used by atomic builtins [PR96066] [PR96067]

2021-12-03 Thread Antoni Boucher via Gcc-patches
David: PING

In case you missed it, that's the last patch left to review for now.

On Sunday, November 21, 2021 at 16:44 -0500, Antoni Boucher wrote:
> Thanks for the review!
> I updated the patch.
> 
> See notes below.
> 
> On Saturday, November 20, 2021 at 13:50 -0500, David Malcolm wrote:
> > On Sat, 2021-11-20 at 11:27 -0500, Antoni Boucher wrote:
> > > Hi.
> > > Here's the updated patch.
> > > Thanks for the review!
> > 
> > Thanks for the updated patch...
> > 
> > > 
> > > On Thursday, May 20, 2021 at 16:24 -0400, David Malcolm wrote:
> > > > On Mon, 2021-05-17 at 21:02 -0400, Antoni Boucher via Jit
> > > > wrote:
> > > > > Hello.
> > > > > This patch fixes the issue with using atomic builtins in
> > > > > libgccjit.
> > > > > Thanks for reviewing it.
> > > > 
> > > > [...snip...]
> > > >  
> > > > > diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
> > > > > index 117ff70114c..de876ff9fa6 100644
> > > > > --- a/gcc/jit/jit-recording.c
> > > > > +++ b/gcc/jit/jit-recording.c
> > > > > @@ -2598,8 +2598,18 @@ recording::memento_of_get_pointer::accepts_writes_from (type *rtype)
> > > > >  return false;
> > > > >  
> > > > >    /* It's OK to assign to a (const T *) from a (T *).  */
> > > > > -  return m_other_type->unqualified ()
> > > > > -    ->accepts_writes_from (rtype_points_to);
> > > > > +  if (m_other_type->unqualified ()
> > > > > +    ->accepts_writes_from (rtype_points_to)) {
> > > > > +  return true;
> > > > > +  }
> > > > > +
> > > > > +  /* It's OK to assign to a (volatile const T *) from a (volatile const T *). */
> > > > > +  if (m_other_type->unqualified ()->unqualified ()
> > > > > +    ->accepts_writes_from (rtype_points_to->unqualified ())) {
> > > > > +  return true;
> > > > > +  }
> > > > 
> > > > Presumably you need this to get the atomic builtins working?
> > > > 
> > > > If I'm reading the above correctly, the new test doesn't
> > > > distinguish between the 3 different kinds of qualifiers (aligned,
> > > > volatile, and const), it merely tries to strip some of them off.
> > > > 
> > > > It's not valid to e.g. assign to a (aligned T *) from a (const T *).
> > > > 
> > > > Maybe we need an internal enum to discriminate between different
> > > > subclasses of decorated_type?
> > 
> > I'm still concerned about this case; my reading of the updated patch
> > is that this case is still not quite correctly handled (see notes
> > below). I don't think we currently have test coverage for assignment
> > to e.g. (aligned T *) from a (const T*); I feel that it should be an
> > error, without an explicit cast.
> > 
> > Please can you add a testcase for this?
> 
> Done.
> 
> > 
> > If you want to go the extra mile, given that this is code created
> > through an API, you could have a testcase that iterates through all
> > possible combinations of qualifiers (for both source and destination
> > pointer), and verifies that libgccjit at least doesn't crash on them
> > (and hopefully does the right thing on each one)  :/
> > 
> > (perhaps doing each one in a different gcc_jit_context)
> > 
> > Might be nice to update test-fuzzer.c for the new qualifiers; I don't
> > think I've touched it in a long time.
> 
> Done.
> 
> > 
> > [...snip...]
> > 
> > > diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h
> > > index 4a994fe7094..60aaba2a246 100644
> > > --- a/gcc/jit/jit-recording.h
> > > +++ b/gcc/jit/jit-recording.h
> > > @@ -545,6 +545,8 @@ public:
> > >    virtual bool is_float () const = 0;
> > >    virtual bool is_bool () const = 0;
> > >    virtual type *is_pointer () = 0;
> > > +  virtual type *is_volatile () { return NULL; }
> > > +  virtual type *is_const () { return NULL; }
> > >    virtual type *is_array () = 0;
> > >    virtual struct_ *is_struct () { return NULL; }
> > >    virtual bool is_void () const { return false; }
> > > @@ -687,6 +689,13 @@ public:
> > >    /* Strip off the "const", giving the underlying type.  */
> > >    type *unqualified () FINAL OVERRIDE { return m_other_type; }
> > >  
> > > +  virtual bool is_same_type_as (type *other)
> > > +  {
> > > +    return m_other_type->is_same_type_as (other->is_const ());
> > > +  }
> > 
> > What happens if other->is_const () returns NULL, and
> >   m_other_type->is_same_type_as ()
> > tries to call a vfunc on it...
> 
> Fixed.
> 
> > 
> > > +
> > > +  virtual type *is_const () { return m_other_type; }
> > > +
> > >    void replay_into (replayer *) FINAL OVERRIDE;
> > >  
> > >  private:
> > > @@ -701,9 +710,16 @@ public:
> > >    memento_of_get_volatile (type *other_type)
> > >    : decorated_type (other_type) {}
> > >  
> > > +  virtual bool is_same_type_as (type *other)
> > > +  {
> > > +    return m_other_type->is_same_type_as (other->is_volatile ());
> > > +  }
> > 
> > ...with similar considerations here.
> > 
> > i.e. is it possible for the user to create combinati
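
The qualifier-compatibility question in this thread can be illustrated with a small, self-contained C++ sketch. It is not libgccjit code: the class names only loosely mirror recording::type, and the qual_kind enum is a hypothetical take on the "internal enum to discriminate between different subclasses of decorated_type" suggested above. It shows the NULL check before recursing in is_same_type_as, and a pointer-assignment rule under which the destination may add const/volatile but never drop them. Requiring "aligned" to match exactly is an assumption of this sketch, not a libgccjit decision.

```cpp
#include <cassert>

// Hypothetical miniature of a decorated-type hierarchy (not libgccjit code).
enum class qual_kind { none, const_q, volatile_q, aligned_q };

struct type {
  virtual ~type() = default;
  virtual type *is_const() { return nullptr; }
  virtual type *is_volatile() { return nullptr; }
  virtual type *unqualified() { return this; }
  virtual qual_kind kind() const { return qual_kind::none; }
  virtual bool is_same_type_as(type *other) { return this == other; }
};

struct decorated_type : type {
  explicit decorated_type(type *other) : m_other(other) {}
  type *unqualified() override { return m_other; }
  type *m_other;
};

struct const_type : decorated_type {
  using decorated_type::decorated_type;
  type *is_const() override { return m_other; }
  qual_kind kind() const override { return qual_kind::const_q; }
  bool is_same_type_as(type *other) override {
    // is_const () returns nullptr for a non-const OTHER: check before recursing.
    type *inner = other->is_const();
    return inner != nullptr && m_other->is_same_type_as(inner);
  }
};

struct volatile_type : decorated_type {
  using decorated_type::decorated_type;
  type *is_volatile() override { return m_other; }
  qual_kind kind() const override { return qual_kind::volatile_q; }
  bool is_same_type_as(type *other) override {
    type *inner = other->is_volatile();
    return inner != nullptr && m_other->is_same_type_as(inner);
  }
};

struct aligned_type : decorated_type {
  using decorated_type::decorated_type;
  qual_kind kind() const override { return qual_kind::aligned_q; }
};

enum : unsigned { Q_CONST = 1u, Q_VOLATILE = 2u, Q_ALIGNED = 4u };

// Strip every qualifier from T (updating T in place); return them as a bitmask.
unsigned strip_quals(type *&t) {
  unsigned q = 0;
  for (;;) {
    switch (t->kind()) {
    case qual_kind::none: return q;
    case qual_kind::const_q: q |= Q_CONST; break;
    case qual_kind::volatile_q: q |= Q_VOLATILE; break;
    case qual_kind::aligned_q: q |= Q_ALIGNED; break;
    }
    t = t->unqualified();
  }
}

// May a (DST *) be assigned from a (SRC *) without a cast?  DST may add
// const/volatile but not drop them; "aligned" must match (an assumption).
bool pointee_accepts_writes_from(type *dst, type *src) {
  unsigned dq = strip_quals(dst);
  unsigned sq = strip_quals(src);
  bool cv_ok = (sq & (Q_CONST | Q_VOLATILE) & ~dq) == 0;
  bool aligned_ok = (dq & Q_ALIGNED) == (sq & Q_ALIGNED);
  return cv_ok && aligned_ok && dst->is_same_type_as(src);
}
```

Because strip_quals walks the whole qualifier chain, const(volatile(T)) and volatile(const(T)) are treated alike, which is one way to cover the "all combinations of qualifiers" testcase idea.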

[PATCH v2] c++: Handle auto(x) in parameter-declaration-clause [PR103401]

2021-12-03 Thread Marek Polacek via Gcc-patches
On Thu, Dec 02, 2021 at 12:56:38PM -0500, Jason Merrill wrote:
> On 12/2/21 10:27, Marek Polacek wrote:
> > On Wed, Dec 01, 2021 at 11:24:58PM -0500, Jason Merrill wrote:
> > > On 12/1/21 10:16, Marek Polacek wrote:
> > > > In C++23, auto(x) is valid, so decltype(auto(x)) should also be valid,
> > > > so
> > > > 
> > > > void f(decltype(auto(0)));
> > > > 
> > > > should be just as
> > > > 
> > > > void f(int);
> > > > 
> > > > but currently, every time we see 'auto' in a parameter-declaration-clause,
> > > > we try to synthesize_implicit_template_parm for it, creating a new template
> > > > parameter list.  The code above actually has us calling s_i_t_p twice;
> > > > once from cp_parser_decltype_expr -> cp_parser_postfix_expression which
> > > > fails and then again from cp_parser_decltype_expr -> cp_parser_expression.
> > > > So it looks like we have f and we accept ill-formed code.
> > > > 
> > > > So we need to be more careful about synthesizing the implicit template
> > > > parameter.  cp_parser_postfix_expression looked like a sensible place.
> > > 
> > > Does this cover other uses of auto in decltype, such as
> > > 
> > > void f(decltype(new auto{0}));
> > 
> > Yes: the clearing of auto_is_implicit_function_template_parm_p will happen
> > here too.
> > 
> > However, I'm noticing this:
> > 
> >void f1(decltype(new auto{0}));
> >void f2(decltype(new int{0}));
> > 
> >void
> >g ()
> >{
> >  int i;
> >  void f3(decltype(new auto{0}));
> >  void f4(decltype(new int{0}));
> >  f1 (&i); // error: no matching function for call to f1(int*)
> >   // couldn't deduce template parameter auto:1
> >  f2 (&i);
> >  f3 (&i);
> >  f4 (&i);
> >}
> > I think the error we issue is bogus.  (My patch doesn't change this.
> > clang++ accepts.)  Should I file a PR (and investigate)?
> 
> That certainly suggests that auto_is_implicit_function_template_parm_p isn't
> getting cleared soon enough for f1.

Exactly right.
 
> > > ?  Should we adjust this flag in cp_parser_decltype along with all the
> > > other flags?
> > 
> > I think that's possible, but wouldn't cover auto in default arguments, or
> > array bounds.
> 
> I guess cp_parser_sizeof_operand would need the same change.
> 
> Do we currently handle auto in default arguments wrong?  Ah, I see that we
> currently set auto_is_... for the whole parameter declaration clause, rather
> than just for the decl-specifier-seq of parameters as the standard
> specifies:
> 
> "A placeholder-type-specifier of the form type-constraint opt auto can be
> used as a decl-specifier of the decl-specifier-seq of a
> parameter-declaration of a function declaration or lambda-expression"

Thanks.  How about this then?  The patch gives the rationale.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In C++23, auto(x) is valid, so decltype(auto(x)) should also be valid,
so

  void f(decltype(auto(0)));

should be just as

  void f(int);

but currently, every time we see 'auto' in a parameter-declaration-clause,
we try to synthesize_implicit_template_parm for it, creating a new template
parameter list.  The code above actually has us calling s_i_t_p twice;
once from cp_parser_decltype_expr -> cp_parser_postfix_expression which
fails and then again from cp_parser_decltype_expr -> cp_parser_expression.
So it looks like we have f and we accept ill-formed code.

This shows that we need to be more careful about synthesizing the
implicit template parameter.  [dcl.spec.auto.general] says that "A
placeholder-type-specifier of the form type-constraint opt auto can be
used as a decl-specifier of the decl-specifier-seq of a
parameter-declaration of a function declaration or lambda-expression..."
so this patch turns off auto_is_... after we've parsed the decl-specifier-seq.

That doesn't quite cut it yet, though, because we also need to handle an
auto nested in the decl-specifier:

  void f(decltype(new auto{0}));

therefore the cp_parser_decltype change.
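
A toy model of the flag handling described above, written as a sketch rather than GCC's parser code: temp_override is a hypothetical stand-in for GCC's make_temp_override, and toy_parser only counts how many implicit template parameters an 'auto' would synthesize. The flag is set only while the decl-specifier-seq is parsed and is cleared again for a decltype operand, so an 'auto' inside decltype(new auto{0}) synthesizes nothing.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's make_temp_override: give a variable a new
// value for the lifetime of the guard, then restore the previous value.
template <typename T>
class temp_override {
public:
  temp_override(T &var, T value) : m_var(var), m_saved(var) { m_var = value; }
  ~temp_override() { m_var = m_saved; }
  temp_override(const temp_override &) = delete;
  temp_override &operator=(const temp_override &) = delete;

private:
  T &m_var;
  T m_saved;
};

// Toy parser: the flag name mirrors GCC's, nothing else does.
struct toy_parser {
  bool auto_is_implicit_function_template_parm_p = false;
  int synthesized_parms = 0;

  void see_auto() {
    if (auto_is_implicit_function_template_parm_p)
      ++synthesized_parms;  // would call synthesize_implicit_template_parm
  }

  // An 'auto' inside decltype is part of an expression, not a placeholder in
  // the decl-specifier-seq, so the flag is cleared while parsing the operand.
  void parse_decltype_operand() {
    temp_override<bool> guard(auto_is_implicit_function_template_parm_p, false);
    see_auto();
  }

  // The flag is set only while parsing a parameter's decl-specifier-seq.
  void parse_parameter_decl_specifiers(bool auto_inside_decltype) {
    temp_override<bool> guard(auto_is_implicit_function_template_parm_p, true);
    if (auto_inside_decltype)
      parse_decltype_operand();
    else
      see_auto();
  }
};
```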

The second hunk broke lambda-generic-85713-2.C but I think the error we
issue with this patch is in fact correct, and clang++ agrees.

The r11-1913 change is OK: we need to make sure that we see '(auto)' after
decltype to go ahead with 'decltype(auto)'.

PR c++/103401

gcc/cp/ChangeLog:

* parser.c (cp_parser_decltype): Clear
auto_is_implicit_function_template_parm_p.
(cp_parser_parameter_declaration): Clear
auto_is_implicit_function_template_parm_p after parsing the
decl-specifier-seq.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-85713-2.C: Add dg-error.
* g++.dg/cpp23/auto-fncast7.C: New test.
* g++.dg/cpp23/auto-fncast8.C: New test.
* g++.dg/cpp23/auto-fncast9.C: New test.
---
 gcc/cp/parser.c   | 19 +++
 .../g++.dg/cpp1y/lambda-generic-85713-2.C |  2 +-
 gcc/testsuite/g++.dg/cpp23/aut

Re: [PATCH] fix up compute_objsize (including PR 103143)

2021-12-03 Thread Jeff Law via Gcc-patches




On 11/8/2021 7:34 PM, Martin Sebor via Gcc-patches wrote:

The pointer-query code that implements compute_objsize() that's
in turn used by most middle end access warnings now has a few
warts in it and (at least) one bug.  With the exception of
the bug the warts aren't behind any user-visible bugs that
I know of but they do cause problems in new code I've been
implementing on top of it.  Besides fixing the one bug (just
a typo) the attached patch cleans up these latent issues:

1) It moves the bndrng member from the access_ref class to
   access_data.  As a FIXME in the code notes, the member never
   did belong in the former and only takes up space in the cache.

2) The compute_objsize_r() function is big, unwieldy, and tedious
   to step through because of all the if statements that are better
   coded as one switch statement.  This change factors out more
   of its code into smaller handler functions as has been suggested
   and done a few times before.

3) (2) exposed a few places where I fail to pass the current
   GIMPLE statement down to ranger.  This leads to worse quality
   range info, including possible false positives and negatives.
   I just spotted these problems in code review but I haven't
   taken the time to come up with test cases.  This change fixes
   these oversights as well.

4) The handling of PHI statements is also in one big, hard-to-
   follow function.  This change moves the handling of each PHI
   argument into its own handler which merges it into the previous
   argument.  This makes the code easier to work with and opens it
   to reuse also for MIN_EXPR and MAX_EXPR.  (This is primarily
   used to print informational notes after warnings.)

5) Finally, the patch factors code to dump each access_ref
   cached by the pointer_query cache out of pointer_query::dump
   and into access_ref::dump.  This helps with debugging.
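
Item 4's merge-one-argument-at-a-time structure can be sketched as follows. This is an illustrative simplification, not pointer-query code: size_range is a hypothetical stand-in for access_ref that keeps only a range of object sizes, and the merge simply keeps the union of the ranges, which is an assumption about how the real handler combines arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified stand-in for access_ref: only the range
// of sizes of the object a pointer may point to.
struct size_range {
  long min, max;
};

// Merge one more operand (a PHI argument, or an operand of MIN_EXPR /
// MAX_EXPR) into the running result.  The pointer may refer to either
// object, so this sketch keeps the union of both ranges.
size_range merge_operand(size_range acc, size_range arg) {
  return {std::min(acc.min, arg.min), std::max(acc.max, arg.max)};
}

// Handle PHI arguments one at a time, each merged into the previous result.
size_range merge_phi_args(const std::vector<size_range> &args) {
  assert(!args.empty());
  size_range acc = args.front();
  for (std::size_t i = 1; i < args.size(); ++i)
    acc = merge_operand(acc, args[i]);
  return acc;
}
```

Keeping the per-operand step separate is what would let the same merge be reused for MIN_EXPR and MAX_EXPR, as the message suggests.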

These changes should have no user-visible effect and other than
a regression test for the typo (PR 103143) come with no tests.
They've been tested on x86_64-linux.

Sigh.  You've identified 6 distinct changes above.  The 5 you've 
enumerated plus a typo fix somewhere.  There's no reason why they need 
to be a single patch and many reasons why they should be a series of 
independent patches.    Combining them into a single patch isn't how we 
do things and it hides the actual bugfix in here.


Please send a fix for the typo first since that should be able to 
trivially go forward.  Then a patch for item #1.  That should be 
trivial to review when it's pulled out from the rest of the patch. 
Beyond that, your choice on ordering, but you need to break this down.





Jeff



Re: [PATCH] rs6000: Builtins test changes for test_fpscr_[d]rn_builtin_error.c

2021-12-03 Thread Peter Bergner via Gcc-patches
On 12/2/21 4:24 PM, Segher Boessenkool wrote:
> On Thu, Dec 02, 2021 at 10:43:24AM -0600, Bill Schmidt wrote:
>> The new built-in infrastructure is now enabled!
> 
> Congratulations, and thanks for all the work!

A big +1!

Peter





Re: [PATCH v2] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-12-03 Thread Peter Bergner via Gcc-patches
On 12/2/21 9:46 PM, Kewen.Lin via Gcc-patches wrote:
> on 2021/11/30 12:57 AM, Segher Boessenkool wrote:
>> On Wed, Sep 01, 2021 at 02:55:51PM +0800, Kewen.Lin wrote:
>>> This patch is to fix the inconsistent behaviors for non-LTO mode
>>> and LTO mode.  As Martin pointed out, currently the function
>>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
>>> NULL, but it's wrong, we should use the command line options
>>> from target_option_default_node as default.
>>
>> This is not documented.
>>
> 
> Yeah, but according to the document for the target attribute [1],
> "Multiple target back ends implement the target attribute to specify
> that a function is to be compiled with different target options than
> specified on the command line. The original target command-line options
> are ignored. ", it seems to say the function without any target
> attribute/pragma will be compiled with target options specified on the
> command line.  I think it's a normal expectation for users.
> 
> Except for the inconsistent behaviors between LTO and non-LTO,
> it can also make the below case different.

I thought Martin and richi mentioned that target attribute options
are treated as if they are appended to the end of the command line
options, so they can potentially override earlier options, but they
don't actually ignore them?

Peter
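
A simplified model of the inlining check under discussion, assuming the behaviour the patch asks for (a callee without a target attribute gets the command-line options) and the "attribute options are appended to the command line" reading Peter describes. The ISA bits and the target_attr type are hypothetical, and compatibility is modelled as a plain subset test on feature bits; rs6000_can_inline_p works on the real option masks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ISA feature bits; not rs6000's real OPTION_MASK_* values.
enum : std::uint32_t {
  ISA_ALTIVEC = 1u << 0,
  ISA_VSX = 1u << 1
};

// Features a target attribute/pragma turns on or off.
struct target_attr {
  std::uint32_t enable = 0;
  std::uint32_t disable = 0;
};

// A function's effective ISA: start from the command-line options and apply
// the attribute on top.  ATTR == nullptr means "no target attribute", which
// here means "use the command-line options", not "always inlinable".
std::uint32_t effective_isa(std::uint32_t cmdline, const target_attr *attr) {
  if (attr == nullptr)
    return cmdline;
  return (cmdline | attr->enable) & ~attr->disable;
}

// The callee may be inlined only if the caller supports every feature the
// callee may use.
bool can_inline(std::uint32_t caller_isa, std::uint32_t callee_isa) {
  return (caller_isa & callee_isa) == callee_isa;
}
```

With the command line enabling AltiVec and VSX, a caller whose attribute disables VSX must not inline an attribute-less callee, because that callee is compiled with the full command-line ISA.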



Re: [PATCH] libcpp: Fix up handling of deferred pragmas [PR102432]

2021-12-03 Thread Marek Polacek via Gcc-patches
On Fri, Dec 03, 2021 at 11:27:27AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> The https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557903.html
> change broke the following testcases.  The problem is when a pragma
> namespace allows expansion (i.e. p->is_nspace && p->allow_expansion),
> e.g. the omp or acc namespaces do, then when parsing the second pragma
> token we do it with pfile->state.in_directive set,
> pfile->state.prevent_expansion clear and pfile->state.in_deferred_pragma
> clear (the last one because we don't know yet if it will be a deferred
> pragma or not).  If the pragma line only contains a single name
> and newline after it, and there exists a function-like macro with the
> same name, the preprocessor needs to peek in funlike_invocation_p
> the next token whether it isn't ( but in this case it will see a newline.
> As pfile->state.in_directive is set, we don't read anything after the
> newline, pfile->buffer->need_line is set and CPP_EOF is lexed, which
> funlike_invocation_p doesn't push back.  Because name is a function-like
> macro and on the pragma line there is no ( after the name, it isn't
> expanded, and control flow returns to do_pragma.  If name is valid
> deferred pragma, we set pfile->state.in_deferred_pragma (and really
> need it set so that e.g. end_directive later on doesn't eat all the
> tokens from the pragma line).
> 
> Before Nathan's change (which unfortunately didn't contain a rationale
> for why it is better to do it like that), this wasn't a problem: the
> next _cpp_lex_direct call, made when we want the next token, would return
> CPP_PRAGMA_EOF when it saw buffer->need_line, which would turn off
> pfile->state.in_deferred_pragma and following get token would already
> read the next line.  But Nathan's patch replaced it with an assertion
> failure that now triggers and CPP_PRAGMA_EOL is done only when lexing
> the '\n'.  Except for this special case that works fine, but in
> this case it doesn't because when peeking the token we still didn't know
> that it will be a deferred pragma.
> I've tried to fix that up in do_pragma by detecting this and pushing
> CPP_PRAGMA_EOL as lookahead, but that doesn't work because end_directive
> still needs to see pfile->state.in_deferred_pragma set.
> 
> So, this patch effectively reverts part of Nathan's change: CPP_PRAGMA_EOL
> addition isn't done only when parsing the '\n', but is now done in both
> places, in the first one instead of the assertion failure.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, thanks.

Marek
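
The sequence of lexer states Jakub describes can be modelled with a toy lexer. It is hypothetical and much simpler than libcpp: the members only echo need_line and in_deferred_pragma, and the shared end_of_line helper represents the patch's choice of producing CPP_PRAGMA_EOL on both paths instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class tok { name, open_paren, pragma_eol, eof };

// Hypothetical toy model of the lexer state described above; not libcpp code.
struct toy_lexer {
  std::vector<tok> line;           // tokens on the current pragma line
  std::size_t pos = 0;
  bool need_line = false;          // set once the newline has been consumed
  bool in_deferred_pragma = false;

  // The line is exhausted.  Inside a deferred pragma, end it with
  // PRAGMA_EOL (clearing the state); otherwise the directive sees EOF.
  tok end_of_line() {
    if (in_deferred_pragma) {
      in_deferred_pragma = false;
      return tok::pragma_eol;
    }
    return tok::eof;
  }

  tok lex() {
    if (need_line)
      return end_of_line();  // the path that had become an assertion failure
    if (pos < line.size())
      return line[pos++];
    need_line = true;        // lexing the '\n' itself
    return end_of_line();
  }
};
```

Feeding it a line that holds a single name reproduces the scenario: the peek for '(' returns EOF, do_pragma then marks the pragma as deferred, and the next request still yields PRAGMA_EOL.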



[committed] libstdc++: Simplify emplace member functions in _Rb_tree

2021-12-03 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


This introduces a new RAII type to simplify the emplace members which
currently use try-catch blocks to deallocate a node if an exception is
thrown by the comparisons done during insertion. The new type is created
on the stack and manages the allocation of a new node and deallocates it
in the destructor if it wasn't inserted into the tree. It also provides
helper functions for doing the insertion, releasing ownership of the
node to the tree.
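
A minimal sketch of the RAII node-handle idea, assuming a toy unbalanced binary search tree rather than libstdc++'s red-black tree. toy_tree, auto_node, create_node and drop_node are hypothetical names chosen to echo _Auto_node, _M_create_node and _M_drop_node. The handle frees the node in its destructor unless insert() has transferred ownership, so emplace_unique needs no try/catch.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy unbalanced BST (hypothetical, not libstdc++'s _Rb_tree).
template <typename T, typename Compare = std::less<T>>
class toy_tree {
  struct node {
    T value;
    node *left = nullptr;
    node *right = nullptr;
    template <typename... Args>
    explicit node(Args &&...args) : value(std::forward<Args>(args)...) {}
  };

  // Counterparts of _M_create_node / _M_drop_node.
  template <typename... Args>
  node *create_node(Args &&...args) {
    node *n = new node(std::forward<Args>(args)...);
    ++live_nodes;
    return n;
  }

  void drop_node(node *n) {
    delete n;
    --live_nodes;
  }

  // Owns a newly created node until insert() hands it over to the tree.
  struct auto_node {
    template <typename... Args>
    explicit auto_node(toy_tree &t, Args &&...args)
        : tree(t), ptr(t.create_node(std::forward<Args>(args)...)) {}

    ~auto_node() {
      if (ptr != nullptr)
        tree.drop_node(ptr);  // not inserted: duplicate key or exception
    }

    auto_node(const auto_node &) = delete;
    auto_node &operator=(const auto_node &) = delete;

    void insert(node **link) {
      *link = ptr;
      ptr = nullptr;  // ownership released to the tree
    }

    toy_tree &tree;
    node *ptr;
  };

  void destroy(node *n) {
    if (n == nullptr)
      return;
    destroy(n->left);
    destroy(n->right);
    drop_node(n);
  }

  node *root = nullptr;
  Compare comp;

public:
  int live_nodes = 0;  // nodes currently allocated

  toy_tree() = default;
  toy_tree(const toy_tree &) = delete;
  toy_tree &operator=(const toy_tree &) = delete;
  ~toy_tree() { destroy(root); }

  // Returns true if inserted, false if an equivalent value was present.
  // If comp throws, ~auto_node frees the node.
  template <typename... Args>
  bool emplace_unique(Args &&...args) {
    auto_node z(*this, std::forward<Args>(args)...);
    node **link = &root;
    while (*link != nullptr) {
      if (comp(z.ptr->value, (*link)->value))
        link = &(*link)->left;
      else if (comp((*link)->value, z.ptr->value))
        link = &(*link)->right;
      else
        return false;
    }
    z.insert(link);
    return true;
  }
};
```

For example, emplacing 5, 3, 8 and then 5 again leaves three live nodes: the duplicate's node is released by the handle's destructor when emplace_unique returns false.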

Also, we don't need to use long qualified names if we put the return
type after the nested-name-specifier.
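
A small illustration of the trailing-return-type point, using a hypothetical box template: once the declarator has named box<T>::, a return type written after -> is looked up in the class scope, so the typename box<T>:: prefix is no longer needed.

```cpp
#include <cassert>

template <typename T>
struct box {
  using iterator = T *;
  iterator first();
  T data[2];
};

// With a leading return type, the nested type needs full qualification:
//   template <typename T>
//   typename box<T>::iterator box<T>::first() { return data; }
// With a trailing return type, plain 'iterator' is found in box<T>'s scope.
template <typename T>
auto box<T>::first() -> iterator {
  return data;
}
```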

libstdc++-v3/ChangeLog:

* include/bits/stl_tree.h (_Rb_tree::_Auto_node): Define new
RAII helper for creating and inserting new nodes.
(_Rb_tree::_M_insert_node): Use trailing-return-type to simplify
out-of-line definition.
(_Rb_tree::_M_insert_lower_node): Likewise.
(_Rb_tree::_M_insert_equal_lower_node): Likewise.
(_Rb_tree::_M_emplace_unique): Likewise. Use _Auto_node.
(_Rb_tree::_M_emplace_equal): Likewise.
(_Rb_tree::_M_emplace_hint_unique): Likewise.
(_Rb_tree::_M_emplace_hint_equal): Likewise.
---
 libstdc++-v3/include/bits/stl_tree.h | 148 ++-
 1 file changed, 78 insertions(+), 70 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index 55b8c9c7cb2..336f4ed97b7 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -1624,6 +1624,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__y.begin(), __y.end());
   }
 #endif
+
+private:
+#if __cplusplus >= 201103L
+  // An RAII _Node handle
+  struct _Auto_node
+  {
+   template<typename... _Args>
+ _Auto_node(_Rb_tree& __t, _Args&&... __args)
+ : _M_t(__t),
+   _M_node(__t._M_create_node(std::forward<_Args>(__args)...))
+ { }
+
+   ~_Auto_node()
+   {
+ if (_M_node)
+   _M_t._M_drop_node(_M_node);
+   }
+
+   _Auto_node(_Auto_node&& __n)
+   : _M_t(__n._M_t), _M_node(__n._M_node)
+   { __n._M_node = nullptr; }
+
+   const _Key&
+   _M_key() const
+   { return _S_key(_M_node); }
+
+   iterator
+   _M_insert(pair<_Base_ptr, _Base_ptr> __p)
+   {
+ auto __it = _M_t._M_insert_node(__p.first, __p.second, _M_node);
+ _M_node = nullptr;
+ return __it;
+   }
+
+   iterator
+   _M_insert_equal_lower()
+   {
+ auto __it = _M_t._M_insert_equal_lower_node(_M_node);
+ _M_node = nullptr;
+ return __it;
+   }
+
+   _Rb_tree& _M_t;
+   _Link_type _M_node;
+  };
+#endif // C++11
 };
 
   template= 201103L
   template
-typename _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator
+auto
 _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
 _M_insert_node(_Base_ptr __x, _Base_ptr __p, _Link_type __z)
+-> iterator
 {
   bool __insert_left = (__x != 0 || __p == _M_end()
|| _M_impl._M_key_compare(_S_key(__z),
@@ -2342,9 +2389,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _Key, typename _Val, typename _KeyOfValue,
            typename _Compare, typename _Alloc>
-typename _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator
+auto
 _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
 _M_insert_lower_node(_Base_ptr __p, _Link_type __z)
+-> iterator
 {
   bool __insert_left = (__p == _M_end()
|| !_M_impl._M_key_compare(_S_key(__p),
@@ -2358,9 +2406,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<typename _Key, typename _Val, typename _KeyOfValue,
            typename _Compare, typename _Alloc>
-typename _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator
+auto
 _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
 _M_insert_equal_lower_node(_Link_type __z)
+-> iterator
 {
   _Link_type __x = _M_begin();
   _Base_ptr __y = _M_end();
@@ -2376,100 +2425,59 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
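   template<typename _Key, typename _Val, typename _KeyOfValue,
            typename _Compare, typename _Alloc>
 template<typename... _Args>
-  pair<typename _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator, bool>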
+  auto
   _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
   _M_emplace_unique(_Args&&... __args)
+  -> pair<iterator, bool>
   {
-   _Link_type __z = _M_create_node(std::forward<_Args>(__args)...);
-
-   __try
- {
-   typedef pair<iterator, bool> _Res;
-   auto __res = _M_get_insert_unique_pos(_S_key(__z));
-   if (__res.second)
- return _Res(_M_insert_node(__res.first, __res.second, __z), true);
-   
-   _M_drop_node(__z);
-   return _Res(iterator(__res.first), false);
- }
-   __catch(...)
- {
-   _M_drop_node(__z);
-   __throw_exception_again;
- }
+   _Auto_node __z(*this, std::forward<_Args>(__args)...);
+   auto __res = _M_get_insert_unique_pos(__z._M_key());
+   if (__res.second)
+ return {__z._M_insert(__res), true};
+   return {iterator(__res.first), false};
   }
 
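   template<typename _Key, typename _Val, typename _KeyOfValue,
            typename _Compare, typename _Alloc>
 template<typename... _Args>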
-  typename 

Re: std::basic_string<_Tp> constructor point of instantiation woes?

2021-12-03 Thread Jonathan Wakely via Gcc-patches
On Mon, 22 Nov 2021 at 16:31, Stephan Bergmann via Libstdc++
 wrote:
>
> When using recent libstdc++ trunk with Clang in C++20 mode,
> std::u16string literals as in
>
> > #include <string>
> > int main() {
> >   using namespace std::literals;
> >   u""s;
> > }
>
> started to cause linker failures due to undefined
>
> > _ZNSt7__cxx1112basic_stringIDsSt11char_traitsIDsESaIDsEE12_M_constructIPKDsEEvT_S8_St20forward_iterator_tag
>
> After some head scratching, I found the more insightful
>
> > $ cat test.cc
> > #include <string>
> > constexpr std::string s("", 0);
>
> > $ clang++ -std=c++20 -fsyntax-only test.cc
> > test.cc:2:23: error: constexpr variable 's' must be initialized by a constant expression
> > constexpr std::string s("", 0);
> >   ^~~~
> > ~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/basic_string.h:620:2: note: undefined function '_M_construct' cannot be used in a constant expression
> > _M_construct(__s, __s + __n, std::forward_iterator_tag());
> > ^
> > test.cc:2:23: note: in call to 'basic_string(&""[0], 0, std::allocator())'
> > constexpr std::string s("", 0);
> >   ^
> > ~/gcc/trunk/inst/lib/gcc/x86_64-pc-linux-gnu/12.0.0/../../../../include/c++/12.0.0/bits/basic_string.h:331:9: note: declared here
> > _M_construct(_FwdIterator __beg, _FwdIterator __end,
> > ^
> > 1 error generated.
>
> and after some more head scratching found Clang to complain about the
> reduced
>
> > template struct S {
> > constexpr void f();
> > constexpr S() { f(); };
> > };
> > S s1;
> > template constexpr void S::f() {}
> > constexpr S s2;
>
> (about which GCC does not complain).  Not entirely sure who is right,
> but what would help Clang is to move the definitions of the literal
> operators in basic_string.h (which implicitly instantiate the
> corresponding std::basic_string<_Tp> constructor) past the definition of
> _M_construct (which is called from the constructor) in basic_string.tcc;
> something like

The .tcc files are something of an anachronism, as I think they were
supposed to have the non-inline function definitions which might be
subject to 'export' for separate compilation. Except that feature was
removed from C++11, and so now it's just a fairly pointless separation
between inline and non-inline functions ... except where we've muddied
the waters by changing some to 'inline' without moving them to the
other file (because why bother).

That said, all the one- or two-line inline functions like the literal
operators and to_string are all in basic_string.h and having to move
some arbitrary subset of them into the other file, after the
non-inline definitions, is a bit annoying.

I think this is https://bugs.llvm.org/show_bug.cgi?id=24128
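
For reference, here is a reduced sketch of the ordering Stephan suggests, where the member function is defined before anything instantiates the constructor. The template parameter list and the S<int> specialisation are assumptions, because the archived repro lost its angle brackets; whether Clang or GCC is right about the original ordering is left open, as in the thread.

```cpp
#include <cassert>

template <typename T>
struct S {
  constexpr int f() const;
  constexpr S() : v(f()) {}
  int v;
};

// In the reduced repro, an object like "S<int> s1;" appeared before this
// definition, implicitly instantiating the constructor while S<T>::f was
// still undefined.  Defining f first avoids that question entirely.
template <typename T>
constexpr int S<T>::f() const {
  return 42;
}

// f is already defined when the constructor is constant-evaluated here.
constexpr S<int> s2;
static_assert(s2.v == 42, "constructor was constant-evaluated");
```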



Re: rs6000: Fix up flag_shrink_wrap handling in presence of -mrop-protect [PR101324]

2021-12-03 Thread Peter Bergner via Gcc-patches
On 12/3/21 3:27 PM, Peter Bergner wrote:
> On 12/3/21 2:39 PM, Peter Bergner wrote:
>> On 10/29/21 4:45 PM, Segher Boessenkool wrote:
>>> On Wed, Oct 27, 2021 at 10:17:39PM -0500, Peter Bergner wrote:
 2021-10-27  Martin Liska  

 gcc/
PR target/101324
* config/rs6000/rs6000.c (rs6000_option_override_internal): Move the
disabling of shrink-wrapping when using -mrop-protect from here...
(rs6000_override_options_after_change): ...to here.

 2021-10-27  Peter Bergner  

 gcc/testsuite/
PR target/101324
* gcc.target/powerpc/pr101324.c: New test.
>>>
>>> Okay for trunk with similar robustification.  Thanks!
>>
>> With the rop_ok change finally committed, I have finally pushed this
>> change with your suggested test case changes.  Thanks Martin for the
>> fix and Segher for the reviews.
> 
> So I just checked and we have the same failure on GCC11 too.
> Ok for the GCC11 release branch after this has burned-in on
> trunk for a couple of days?
> 
> Ditto for the rop_ok testsuite patch that goes with this one?

FYI, both commits backported cleanly and showed no testsuite regressions
on powerpc64le-linux.

Peter
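
The reason for moving the check can be modelled in a few lines. This is a hypothetical miniature, not rs6000.c: the TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook runs again when optimization options change via #pragma GCC optimize or the optimize attribute, whereas rs6000_option_override_internal belongs to the initial option processing, so a constraint that lives only there can be undone when the flags are re-derived for a function. rederive_for_function is a made-up stand-in for that re-derivation.

```cpp
#include <cassert>

// Hypothetical miniature of the option state (field names echo the flags).
struct options {
  bool rop_protect = false;  // -mrop-protect
  bool shrink_wrap = true;   // flag_shrink_wrap, on by default when optimizing
};

// Runs at startup and again whenever optimization options are re-derived,
// so the -mrop-protect constraint is re-imposed every time.
void override_options_after_change(options &o) {
  if (o.rop_protect)
    o.shrink_wrap = false;
}

// Runs once during initial option processing.  In this sketch it delegates
// to the hook; a check placed only here would not survive re-derivation.
void option_override_internal(options &o) {
  override_options_after_change(o);
}

// Model of re-deriving optimization flags for one function: defaults are
// restored first, then the after-change hook runs.
options rederive_for_function(options o) {
  o.shrink_wrap = true;
  override_options_after_change(o);
  return o;
}
```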





[pushed] c++: avoid redundant scope in diagnostics

2021-12-03 Thread Jason Merrill via Gcc-patches
We can make some function signatures shorter to print by omitting redundant
nested-name-specifiers in the rest of the declarator.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* error.c (current_dump_scope): New variable.
(dump_scope): Check it.
(dump_function_decl): Set it.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/scope1.C: New test.
---
 gcc/cp/error.c   | 10 +-
 gcc/testsuite/g++.dg/diagnostic/scope1.C | 12 
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/scope1.C

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 98c1f0e4bdf..daea3b39a15 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -211,6 +211,10 @@ dump_module_suffix (cxx_pretty_printer *pp, tree decl)
   }
 }
 
+/* The scope of the declaration we're currently printing, to avoid redundantly
+   dumping the same scope on parameter types.  */
+static tree current_dump_scope;
+
 /* Dump a scope, if deemed necessary.  */
 
 static void
@@ -218,7 +222,7 @@ dump_scope (cxx_pretty_printer *pp, tree scope, int flags)
 {
   int f = flags & (TFF_SCOPE | TFF_CHASE_TYPEDEF);
 
-  if (scope == NULL_TREE)
+  if (scope == NULL_TREE || scope == current_dump_scope)
 return;
 
   /* Enum values within an unscoped enum will be CONST_DECL with an
@@ -1756,6 +1760,10 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, int flags)
   else
 dump_scope (pp, CP_DECL_CONTEXT (t), flags);
 
+  /* Name lookup for the rest of the function declarator is implicitly in the
+ scope of the function, so avoid printing redundant scope qualifiers.  */
+  auto cds = make_temp_override (current_dump_scope, CP_DECL_CONTEXT (t));
+
   dump_function_name (pp, t, dump_function_name_flags);
 
   if (!(flags & TFF_NO_FUNCTION_ARGUMENTS))
diff --git a/gcc/testsuite/g++.dg/diagnostic/scope1.C b/gcc/testsuite/g++.dg/diagnostic/scope1.C
new file mode 100644
index 000..14d0a1bfab6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/scope1.C
@@ -0,0 +1,12 @@
+// Test for avoiding redundant scope qualifiers.
+
+struct A
+{
+  struct B { };
+  static void f(B,B);  // { dg-message {A::f\(B, B\)} }
+};
+
+int main()
+{
+  A::f(42);// { dg-error "no match" }
+}

base-commit: 7e71909af2cf3aeec9bed4f6a3cc42c1d17cd661
-- 
2.27.0
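
A toy version of the idea, written as a sketch: toy_printer and its string-based scopes are hypothetical, while the real change in error.c works on trees and uses make_temp_override. The function's own scope is recorded while the parameters are printed, so a parameter type in that scope loses its redundant qualifier, matching the A::f(B, B) expectation in scope1.C.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy pretty-printer; not cp/error.c.
struct toy_printer {
  std::string current_dump_scope;  // empty: nothing is being suppressed

  std::string qualified(const std::string &scope, const std::string &name) const {
    if (scope.empty() || scope == current_dump_scope)
      return name;  // omit the redundant "A::"
    return scope + "::" + name;
  }

  struct param {
    std::string scope, name;
  };

  // Print SCOPE::FN(PARAMS...).  While the parameters are printed, names in
  // the function's own scope need no qualifier, mirroring name lookup.
  std::string function_decl(const std::string &scope, const std::string &fn,
                            const std::vector<param> &params) {
    std::string out = qualified(scope, fn) + "(";
    std::string saved = current_dump_scope;  // save/restore, like make_temp_override
    current_dump_scope = scope;
    for (std::size_t i = 0; i < params.size(); ++i) {
      if (i != 0)
        out += ", ";
      out += qualified(params[i].scope, params[i].name);
    }
    current_dump_scope = saved;
    return out + ")";
  }
};
```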



[PATCH] pru: Fixup flags for .pru_irq_map section

2021-12-03 Thread Dimitar Dimitrov
I intend to merge this patch next week, unless I hear objections.  I
consider it a bug fix which fits the Stage 3 criteria.  It fixes the
RPMSG firmware examples in the latest version 6.0 of TI's PRU Software
Package.

The .pru_irq_map section has been introduced by Linux kernel 5.10:
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c75c9fdac66efd8b54773368254ef330c276171b
This section must not be loaded into target memory.

Binutils already includes the corresponding fix in the linker script:
  
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=44b357eb9aefc77a8385e631d8e3035a664f2333

gcc/ChangeLog:

* config/pru/pru.c (pru_section_type_flags): New function.
(TARGET_SECTION_TYPE_FLAGS): Wire it.

gcc/testsuite/ChangeLog:

* gcc.target/pru/pru_irq_map.c: New test.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/config/pru/pru.c   | 19 +++
 gcc/testsuite/gcc.target/pru/pru_irq_map.c |  8 
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/pru/pru_irq_map.c

diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c
index 9f264b4698d..01d283fd9e7 100644
--- a/gcc/config/pru/pru.c
+++ b/gcc/config/pru/pru.c
@@ -2022,6 +2022,23 @@ pru_assemble_integer (rtx x, unsigned int size, int aligned_p)
 }
 }
 
+/* Implement TARGET_SECTION_TYPE_FLAGS.  */
+
+static unsigned int
+pru_section_type_flags (tree decl, const char *name, int reloc)
+{
+  unsigned int flags = default_section_type_flags (decl, name, reloc);
+
+  /* The .pru_irq_map section is not meant to be loaded into the target
+ memory.  Instead its contents are read by the host remoteproc loader.
+ To prevent being marked as a loadable (allocated) section, the
+ .pru_irq_map section is intercepted and marked as a debug section.  */
+  if (!strcmp (name, ".pru_irq_map"))
+flags = SECTION_DEBUG | SECTION_RETAIN;
+
+  return flags;
+}
+
 /* Implement TARGET_ASM_FILE_START.  */
 
 static void
@@ -3071,6 +3088,8 @@ pru_unwind_word_mode (void)
 #define TARGET_ASM_FUNCTION_PROLOGUE pru_asm_function_prologue
 #undef TARGET_ASM_INTEGER
 #define TARGET_ASM_INTEGER pru_assemble_integer
+#undef TARGET_SECTION_TYPE_FLAGS
+#define TARGET_SECTION_TYPE_FLAGS pru_section_type_flags
 
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START pru_file_start
diff --git a/gcc/testsuite/gcc.target/pru/pru_irq_map.c b/gcc/testsuite/gcc.target/pru/pru_irq_map.c
new file mode 100644
index 000..4f9a5e71b01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/pru/pru_irq_map.c
@@ -0,0 +1,8 @@
+/* Test the special handling of .pru_irq_map section.  */
+
+/* { dg-do compile } */
+
+int my_int_map __attribute__((section(".pru_irq_map")));
+
+/* Section must not have the allocated flag.  */
+/* { dg-final { scan-assembler "\.section\[ \t\]+.pru_irq_map,\[ \]*\"\",[ ]*@progbits" } } */
-- 
2.33.1
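
A self-contained sketch of the flag override, with hypothetical SEC_* bit values rather than GCC's real SECTION_* flags and a toy default_flags in place of default_section_type_flags. It also shows why the new test expects an empty flag string: a section treated as debug is not marked allocated ('a'), and this toy ignores the retain bit when printing.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical flag bits; GCC's real SECTION_* values differ.
enum : unsigned {
  SEC_WRITE = 1u << 0,
  SEC_CODE = 1u << 1,
  SEC_DEBUG = 1u << 2,  // debug sections are not allocated
  SEC_RETAIN = 1u << 3
};

// Toy stand-in for default_section_type_flags: code for .text*, data otherwise.
unsigned default_flags(const char *name) {
  return std::strncmp(name, ".text", 5) == 0 ? SEC_CODE : SEC_WRITE;
}

// Same shape as pru_section_type_flags in the patch above.
unsigned section_type_flags(const char *name) {
  unsigned flags = default_flags(name);
  if (std::strcmp(name, ".pru_irq_map") == 0)
    flags = SEC_DEBUG | SEC_RETAIN;  // read by the host, never loaded
  return flags;
}

// ELF flag letters for a .section directive: 'a' unless debug, then 'w'/'x'.
// The retain bit is deliberately not printed in this toy.
std::string elf_flag_letters(unsigned flags) {
  std::string s;
  if (!(flags & SEC_DEBUG))
    s += 'a';
  if (flags & SEC_WRITE)
    s += 'w';
  if (flags & SEC_CODE)
    s += 'x';
  return s;
}
```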



Re: rs6000: Fix up flag_shrink_wrap handling in presence of -mrop-protect [PR101324]

2021-12-03 Thread Peter Bergner via Gcc-patches
On 12/3/21 2:39 PM, Peter Bergner wrote:
> On 10/29/21 4:45 PM, Segher Boessenkool wrote:
>> On Wed, Oct 27, 2021 at 10:17:39PM -0500, Peter Bergner wrote:
>>> 2021-10-27  Martin Liska  
>>>
>>> gcc/
>>> PR target/101324
>>> * config/rs6000/rs6000.c (rs6000_option_override_internal): Move the
>>> disabling of shrink-wrapping when using -mrop-protect from here...
>>> (rs6000_override_options_after_change): ...to here.
>>>
>>> 2021-10-27  Peter Bergner  
>>>
>>> gcc/testsuite/
>>> PR target/101324
>>> * gcc.target/powerpc/pr101324.c: New test.
>>
>> Okay for trunk with similar robustification.  Thanks!
> 
> With the rop_ok change finally committed, I have finally pushed this
> change with your suggested test case changes.  Thanks Martin for the
> fix and Segher for the reviews.

So I just checked and we have the same failure on GCC11 too.
Ok for the GCC11 release branch after this has burned-in on
trunk for a couple of days?

Ditto for the rop_ok testsuite patch that goes with this one?

Peter



[PATCH 2/2] Use dominators to reduce ranger cache-filling.

2021-12-03 Thread Andrew MacLeod via Gcc-patches
When a request is made for the range of an ssa_name at some location, 
the first thing we do is invoke range_of_stmt() to ensure we have looked 
at the definition and have an evaluation for the name at a global 
level.  I recently added a patch which dramatically reduces the call 
stack requirements for that call.


Once we have confirmed the definition range has been set, a call is made
for the range-on-entry to the block of the use.  This is performed by
the cache, which walks the CFG predecessors looking for blocks where
ranges are created (exported), existing range-on-entry cache hits, or
the definition block.  Once this list of predecessors has been created,
a forward walk pushes the ranges through the successor edges of all
the blocks that were visited in the initial walk.


If the use is far from the definition, we end up filling the same value
into a lot of cache entries along these paths.  Likewise, uses which are
far from a range-modifying statement push the same value into a lot of
blocks.
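As a rough illustration of why distant uses are expensive, here is a toy C model; it is not the ranger's actual code. It assumes that no block between the use and the definition modifies the range and that there are no existing cache hits, in which case the backward walk plus the forward fill amounts to writing one on-entry entry per block reached. The names `toy_cfg` and `naive_fill` are hypothetical.

```c
#include <stdbool.h>

#define MAX_BB 16

/* Hypothetical toy CFG: preds[b] lists the n_preds[b] predecessors of
   block b.  This is not GCC's basic_block API.  */
struct toy_cfg
{
  int n_preds[MAX_BB];
  int preds[MAX_BB][MAX_BB];
};

/* Walk predecessors backwards from USE_BB until DEF_BB is reached, and
   record an on-entry cache entry in FILLED for every block visited on the
   way.  Returns the number of cache entries written.  */
int
naive_fill (const struct toy_cfg *cfg, int use_bb, int def_bb,
            bool filled[MAX_BB])
{
  bool visited[MAX_BB] = { false };
  int worklist[MAX_BB];         /* Each block is pushed at most once.  */
  int top = 0, count = 0;

  worklist[top++] = use_bb;
  visited[use_bb] = true;
  while (top > 0)
    {
      int bb = worklist[--top];
      if (bb == def_bb)
        continue;               /* The definition block ends this path.  */
      filled[bb] = true;        /* One on-entry entry for this block.  */
      count++;
      for (int i = 0; i < cfg->n_preds[bb]; i++)
        {
          int p = cfg->preds[bb][i];
          if (!visited[p])
            {
              visited[p] = true;
              worklist[top++] = p;
            }
        }
    }
  return count;
}
```

The count grows with the number of blocks between the use and the definition, even though every entry written holds the same value.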


This patch tries to address at least some of these inefficiencies.  It
recognizes two opportunities.


First, if there is no range modifying stmt between this use and the last 
range we saw in a dominating block, we can just use the value from the 
dominating block and not fill in all the cache entries between here and 
there.  This is the biggest win.


Second, if there is a range-modifying statement at the end of some
block, we will have to do the appropriate cache walk to this point, but
it's possible the range-on-entry to THAT block might be able to use a
dominating range, and we can prevent the walk from going any further
than this block.


Combined, this should prevent a lot of unnecessary ranges from being
plugged into the cache.


To visualize:

bb4:
  a = foo()
<..>
bb60:
   if (a < 30)
<...>
bb110:
    g = a + 10

If the first place we ask for a is in bb110, we walk the CFG from bb110
all the way back to bb4, on all paths leading back, and then fill all
those cache entries.

With this patch,
  a) if bb60 does not dominate bb110, the request will scan the 
dominators, arrive at the definition block, have seen no range modifier, 
and simply set the on-entry for 110 to the range of a. done.
  b) if bb60 does dominate bb110, we have no idea which edge out of bb60
dominates it, so we will revert to the existing cache algorithm.  Before
doing so, it checks and determines that there are no modifiers between
bb60 and the def in bb4, and so sets the on-entry cache for bb60 to be
the range of a.  Now when we do the cache fill walk, it only has to go
back as far as bb60 instead of all the way to bb4.


Otherwise we just revert to what we do now (as we also do when dominators
are not available).  I have yet to see a case where we miss something we
used to get, but that does not mean there isn't one :-).
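The dominator check in a) and b) can be modelled the same way. This is a simplified sketch, not the patch's `range_from_dom`: hypothetical `idom[]` and `modifies[]` arrays stand in for GCC's dominator information and GORI exports. It walks up the immediate dominators of the use block and succeeds only if the definition block is reached without passing a block that can refine the name's range on its outgoing edges; as a conservative choice, this toy also treats a refining definition block as a blocker.

```c
#include <stdbool.h>

/* Hypothetical toy model: idom[b] is the immediate dominator of block b
   (-1 for the entry block), and modifies[b] is true when block b ends in
   a statement that can refine the name's range on its outgoing edges.
   Returns true when the on-entry range of USE_BB can simply be the range
   computed at the definition in DEF_BB.  */
bool
toy_range_from_dom (const int *idom, const bool *modifies,
                    int use_bb, int def_bb)
{
  for (int bb = idom[use_bb]; bb != -1; bb = idom[bb])
    {
      if (modifies[bb])
        return false;           /* Fall back to the normal cache walk.  */
      if (bb == def_bb)
        return true;            /* Reached the definition unmodified.  */
    }
  return false;                 /* DEF_BB was not found above USE_BB.  */
}
```

Case b) corresponds to calling the same function with the refining dominator (bb60) as the starting block: if it returns true, that block's on-entry entry can be seeded directly, and the regular fill walk stops there instead of continuing back to bb4.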


The cumulative performance impact of this when compiling a set of 390 GCC
source files at -O2 (measured via callgrind) is pretty decent.  Negative
numbers are a compile-time decrease, so -10% means 10% faster
compilation.


EVRP     : %change from trunk is -26.31% (!)
VRP2     : %change from trunk is -9.53%
thread_jumps_full   : %change from trunk is -15.8%
Total compilation time  : %change from trunk is -1.06%

So it's not insignificant.
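The figures above are relative changes against trunk. A one-line helper makes the sign convention explicit (the function name is illustrative only):

```c
/* Percent change of a measurement (e.g. callgrind instruction count)
   relative to a baseline.  Negative means the patched compiler did less
   work than the baseline.  */
double
percent_change (double baseline, double patched)
{
  return (patched - baseline) / baseline * 100.0;
}
```

For example, a pass that drops from 100 to 90 units of work shows up as roughly -10%.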

Risk would be very low, unless dominators are screwed up mid-pass, but
then the relation code would be screwed up too.


Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK?

Andrew




From ea2f90151dcaeea2b5c372f900e1eef735269e18 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 3 Dec 2021 11:02:19 -0500
Subject: [PATCH 2/2] Use dominators to reduce cache-filling.

Before walking the CFG and filling all cache entries, check if the
same information is available in a dominator.

	* gimple-range-cache.cc (ranger_cache::fill_block_cache): Check for
	a range from dominators before filling the cache.
	(ranger_cache::range_from_dom): New.
---
 gcc/gimple-range-cache.cc | 73 +++
 gcc/gimple-range-cache.h  |  1 +
 2 files changed, 74 insertions(+)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index fe31e9462aa..47e95ec23be 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1312,6 +1312,20 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
   fprintf (dump_file, " : ");
 }
 
+  // If there are dominators, check if a dominator can supply the range.
+  if (dom_info_available_p (CDI_DOMINATORS)
+      && range_from_dom (block_result, name, bb))
+    {
+      m_on_entry.set_bb_range (name, bb, block_result);
+      if (DEBUG_RANGE_CACHE)
+	{
+	  fprintf (dump_file, "Filled from dominator! :  ");
+	  block_result.dump (dump_file);
+	  fprintf (dump_file, "\n");
+	}
+      return;
+    }
+
   while (m_workback.length () > 0)
 {
   basic_block node = m_workback.pop ();
@@ -1394,3 +1408,62 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 fprin

[PATCH 1/2] - Add BB option for outgoing_edge_range_p.

2021-12-03 Thread Andrew MacLeod via Gcc-patches
has_edge_range_p() and may_recompute_p() currently only take an optional 
edge as a parameter.  They only indicate if a range *might* be 
calculated for a name, but they do not do any calculations.


To determine the results, they always consult the exports for the
edge->src block.  The answer is the same for every edge leaving that
block, so it's really a basic block property.  This patch makes it
possible to ask directly with the BB, and has the edge version call that.


Without this, if you want to know whether a basic block can affect a
range, you have to pick an arbitrary outgoing edge and ask with that,
which seems a little awkward.
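The refactoring follows a common pattern: the edge overload forwards to the block overload, because the answer depends only on `e->src`. Below is a minimal sketch with hypothetical stand-in types (`toy_block`, `toy_edge`); it is not GCC's `gori_compute` API. The patch's handling of a NULL block ("anywhere in the IL") is left out, and this toy simply returns false for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's basic_block and edge.  */
struct toy_block { bool exports_name; };
struct toy_edge { struct toy_block *src; };

/* Block form: can any edge leaving BB produce a range for the name?  */
bool
toy_has_edge_range_p_bb (const struct toy_block *bb)
{
  return bb != NULL && bb->exports_name;
}

/* Edge form: the answer depends only on E->src, so just delegate.  */
bool
toy_has_edge_range_p_edge (const struct toy_edge *e)
{
  assert (e != NULL);   /* Plays the role of gcc_checking_assert (e).  */
  return toy_has_edge_range_p_bb (e->src);
}
```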


This patch is used by the next one.

Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK?

Andrew
From cfe5cdcace399a8e615e103022140359790b3c8b Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 3 Dec 2021 10:51:18 -0500
Subject: [PATCH 1/2] Add BB option for outgoing_edge_range_p and
 may_recompute_p.

There are times we only need to know if any edge from a block can calculate
a range.

	* gimple-range-gori.h (class gori_compute): Add prototypes.
	* gimple-range-gori.cc (gori_compute::has_edge_range_p): Add alternate
	API for basic block.  Call it from the edge alternative.
	(gori_compute::may_recompute_p): Ditto.
---
 gcc/gimple-range-gori.cc | 74 +---
 gcc/gimple-range-gori.h  |  6 ++--
 2 files changed, 51 insertions(+), 29 deletions(-)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 0dba34b58c5..6c17267ad37 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -1166,33 +1166,12 @@ gori_compute::compute_operand1_and_operand2_range (irange &r,
   r.intersect (op_range);
   return true;
 }
-// Return TRUE if a range can be calculated or recomputed for NAME on edge E.
-
-bool
-gori_compute::has_edge_range_p (tree name, edge e)
-{
-  // Check if NAME is an export or can be recomputed.
-  if (e)
-    return is_export_p (name, e->src) || may_recompute_p (name, e);
-
-  // If no edge is specified, check if NAME can have a range calculated
-  // on any edge.
-  return is_export_p (name) || may_recompute_p (name);
-}
-
-// Dump what is known to GORI computes to listing file F.
-
-void
-gori_compute::dump (FILE *f)
-{
-  gori_map::dump (f);
-}
 
-// Return TRUE if NAME can be recomputed on edge E.  If any direct dependant
-// is exported on edge E, it may change the computed value of NAME.
+// Return TRUE if NAME can be recomputed on any edge exiting BB.  If any
+// direct dependant is exported, it may also change the computed value of NAME.
 
 bool
-gori_compute::may_recompute_p (tree name, edge e)
+gori_compute::may_recompute_p (tree name, basic_block bb)
 {
   tree dep1 = depend1 (name);
   tree dep2 = depend2 (name);
@@ -1207,13 +1186,47 @@ gori_compute::may_recompute_p (tree name, edge e)
 return false;
 
   // If edge is specified, check if NAME can be recalculated on that edge.
-  if (e)
-    return ((is_export_p (dep1, e->src))
-	|| (dep2 && is_export_p (dep2, e->src)));
+  if (bb)
+    return ((is_export_p (dep1, bb))
+	|| (dep2 && is_export_p (dep2, bb)));
 
   return (is_export_p (dep1)) || (dep2 && is_export_p (dep2));
 }
 
+// Return TRUE if NAME can be recomputed on edge E.  If any direct dependant
+// is exported on edge E, it may change the computed value of NAME.
+
+bool
+gori_compute::may_recompute_p (tree name, edge e)
+{
+  gcc_checking_assert (e);
+  return may_recompute_p (name, e->src);
+}
+
+
+// Return TRUE if a range can be calculated or recomputed for NAME on any
+// edge exiting BB.
+
+bool
+gori_compute::has_edge_range_p (tree name, basic_block bb)
+{
+  // Check if NAME is an export or can be recomputed.
+  if (bb)
+    return is_export_p (name, bb) || may_recompute_p (name, bb);
+
+  // If no block is specified, check for anywhere in the IL.
+  return is_export_p (name) || may_recompute_p (name);
+}
+
+// Return TRUE if a range can be calculated or recomputed for NAME on edge E.
+
+bool
+gori_compute::has_edge_range_p (tree name, edge e)
+{
+  gcc_checking_assert (e);
+  return has_edge_range_p (name, e->src);
+}
+
 // Calculate a range on edge E and return it in R.  Try to evaluate a
 // range for NAME on this edge.  Return FALSE if this is either not a
 // control edge or NAME is not defined by this edge.
@@ -1287,6 +1300,13 @@ gori_compute::outgoing_edge_range_p (irange &r, edge e, tree name,
   return false;
 }
 
+// Dump what is known to GORI computes to listing file F.
+
+void
+gori_compute::dump (FILE *f)
+{
+  gori_map::dump (f);
+}
 
 // 
 //  GORI iterator.  Although we have bitmap iterators, don't expose that it
diff --git a/gcc/gimple-range-gori.h b/gcc/gimple-range-gori.h
index ec0b95145f0..b15497e9f59 100644
--- a/gcc/gimple-range-gori.h
+++ b/gcc/gimple-range-gori.h
@@ -158,10 +158,12 @@ class gori_compute : public gori_map
 public:
   gori_compute (int not_executa

Re: rs6000: Fix up flag_shrink_wrap handling in presence of -mrop-protect [PR101324]

2021-12-03 Thread Peter Bergner via Gcc-patches
On 10/29/21 4:45 PM, Segher Boessenkool wrote:
> On Wed, Oct 27, 2021 at 10:17:39PM -0500, Peter Bergner wrote:
>> 2021-10-27  Martin Liska  
>>
>> gcc/
>>  PR target/101324
>>  * config/rs6000/rs6000.c (rs6000_option_override_internal): Move the
>>  disabling of shrink-wrapping when using -mrop-protect from here...
>>  (rs6000_override_options_after_change): ...to here.
>>
>> 2021-10-27  Peter Bergner  
>>
>> gcc/testsuite/
>>  PR target/101324
>>  * gcc.target/powerpc/pr101324.c: New test.
> 
> Okay for trunk with similar robustification.  Thanks!

With the rop_ok change finally committed, I have finally pushed this
change with your suggested test case changes.  Thanks Martin for the
fix and Segher for the reviews.

Peter



Re: [PATCH] rs6000: testsuite: Add rop_ok effective-target function

2021-12-03 Thread Peter Bergner via Gcc-patches
On 12/2/21 5:15 PM, Segher Boessenkool wrote:
>> Tested on powerpc64le*-linux with no regressions.  Ok for mainline?
> 
> What can "*" be there other than the empty string?  Which values of "*"
> did you test?  :-)

Heh, too used to typing powerpc64*-linux.  Yeah, in this case * == "".



> Okay for trunk.  Thanks!

Thanks, pushed.

Peter




[PATCH v3] elf: Add _dl_find_object function

2021-12-03 Thread Florian Weimer via Gcc-patches
It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed.  It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).

It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested
by Torvald Riegel.  Two copies of the supporting data structures are
used, also achieving full async-signal-safety.
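The general idea of a lock-free lookup over two copies of a table can be sketched as below. This is a simplified, hypothetical illustration (`toy_find_object`, `toy_publish` and the table layout are made up), not glibc's dl-find_object.c. A single writer fills the inactive copy and then bumps a version counter; readers search the copy selected by the counter and retry if it changed meanwhile. Because the writer never touches the copy readers are using, a reader running in a signal handler that interrupted the writer still sees a complete table, which is what makes the two-copy scheme attractive for async-signal-safety. A production version must also make the racing reads of the table well-defined (for example with relaxed atomic loads), which this sketch glosses over.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct toy_mapping
{
  uintptr_t start, end;         /* Half-open address range [start, end).  */
  int object_id;
};

#define TOY_MAX 8

struct toy_table
{
  size_t count;
  struct toy_mapping entries[TOY_MAX];   /* Sorted by start address.  */
};

static struct toy_table toy_copies[2];
/* The low bit selects the active copy.  */
static _Atomic unsigned long toy_version;

/* Binary search for the mapping containing ADDR; -1 if none.  */
static int
toy_search (const struct toy_table *t, uintptr_t addr)
{
  size_t lo = 0, hi = t->count;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (addr < t->entries[mid].start)
        hi = mid;
      else if (addr >= t->entries[mid].end)
        lo = mid + 1;
      else
        return t->entries[mid].object_id;
    }
  return -1;
}

/* Reader: never blocks; retries if the writer switched copies meanwhile.  */
int
toy_find_object (uintptr_t addr)
{
  for (;;)
    {
      unsigned long v = atomic_load_explicit (&toy_version,
                                              memory_order_acquire);
      int result = toy_search (&toy_copies[v & 1], addr);
      atomic_thread_fence (memory_order_acquire);
      if (atomic_load_explicit (&toy_version, memory_order_relaxed) == v)
        return result;
    }
}

/* Writer (a single writer is assumed, e.g. serialized by a lock): build the
   new table in the inactive copy, then publish it by bumping the version.  */
void
toy_publish (const struct toy_table *new_table)
{
  unsigned long v = atomic_load_explicit (&toy_version, memory_order_relaxed);
  toy_copies[(v + 1) & 1] = *new_table;
  atomic_store_explicit (&toy_version, v + 1, memory_order_release);
}
```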

---
v3: Introduce _dlfo_lookup, to consolidate the core object-finding
logic, as suggested by Adhemerval.  Added some struct padding
suggested by Jakub.

 NEWS   |   4 +
 bits/dl_find_object.h  |  32 +
 dlfcn/Makefile |   2 +-
 dlfcn/dlfcn.h  |  27 +
 elf/Makefile   |  47 +-
 elf/Versions   |   3 +
 elf/dl-close.c |   4 +
 elf/dl-find_object.c   | 842 +
 elf/dl-find_object.h   | 115 +++
 elf/dl-libc_freeres.c  |   2 +
 elf/dl-open.c  |   5 +
 elf/dl-support.c   |   3 +
 elf/libc-dl_find_object.c  |  26 +
 elf/rtld.c |  11 +
 elf/rtld_static_init.c |   1 +
 elf/tst-dl_find_object-mod1.c  |  10 +
 elf/tst-dl_find_object-mod2.c  |  15 +
 elf/tst-dl_find_object-mod3.c  |  10 +
 elf/tst-dl_find_object-mod4.c  |  10 +
 elf/tst-dl_find_object-mod5.c  |  11 +
 elf/tst-dl_find_object-mod6.c  |  11 +
 elf/tst-dl_find_object-mod7.c  |  10 +
 elf/tst-dl_find_object-mod8.c  |  10 +
 elf/tst-dl_find_object-mod9.c  |  10 +
 elf/tst-dl_find_object-static.c|  22 +
 elf/tst-dl_find_object-threads.c   | 275 +++
 elf/tst-dl_find_object.c   | 240 ++
 include/atomic_wide_counter.h  |  14 +
 include/bits/dl_find_object.h  |   1 +
 include/dlfcn.h|   2 +
 include/link.h |   3 +
 manual/Makefile|   2 +-
 manual/dynlink.texi| 137 
 manual/libdl.texi  |  10 -
 manual/probes.texi |   2 +-
 manual/threads.texi|   2 +-
 sysdeps/arm/bits/dl_find_object.h  |  25 +
 sysdeps/generic/ldsodefs.h |   5 +
 sysdeps/mach/hurd/i386/libc.abilist|   1 +
 sysdeps/nios2/bits/dl_find_object.h|  25 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist|   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist|   1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/microblaze/be/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/microblaze/le/libc.abilist |   1 +
 .../unix/sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../unix/sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../unix/sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../unix/sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist |   1 +
 .../sysv/linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../linux/powerpc/powerpc32/nofpu/libc.abilist |   1 +
 .../sysv/linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../sysv/linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist|  

[PATCH take #2] PR target/43892: Some carry flag (CA) optimizations on PowerPC.

2021-12-03 Thread Roger Sayle

Doh!  This time with the patch attached...

This patch resolves PR target/43892 (suboptimal add with carry) by adding
four new define_insn_and_split patterns to the rs6000 backend, each of which
recognizes a pair of instructions where the first instruction sets the carry
flag and the second one consumes it.  It also adds a commutative variant of
add<mode>3_carry_in_0 (aka "addze") to catch cases, not caught by recog's
insn canonicalization, where CA_REG appears first.

For the add32carry function in the original PR:

unsigned int add32carry(unsigned int sum, unsigned int x) {
  unsigned int z = sum + x;
  if (sum + x < x)
    z++;
  return z;
}

previously "-O2 -m32" would generate:

add32carry:
add 3,3,4
subfc 4,4,3
subfe 9,9,9
subf 3,9,3
blr

with this patch we now generate:

add32carry:
addc 3,3,4
addze 3,3
blr
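To see why addc/addze suffices, the two instructions can be modelled in C, assuming 32-bit registers as with -m32 (`toy_addc` and `toy_addze` are illustrative helpers, not GCC or ISA definitions): addc records the carry out of the addition, and the source's `sum + x < x` test is exactly that carry.

```c
#include <stdint.h>

/* Result of a carrying operation: the 32-bit value and the CA bit.  */
struct toy_ca_result { uint32_t value; unsigned ca; };

/* addc: rD = rA + rB, CA = carry out of the 32-bit addition.  */
static struct toy_ca_result
toy_addc (uint32_t a, uint32_t b)
{
  struct toy_ca_result r;
  r.value = a + b;
  r.ca = r.value < a;           /* Unsigned wrap-around means carry out.  */
  return r;
}

/* addze: rD = rA + CA.  */
static uint32_t
toy_addze (uint32_t a, unsigned ca)
{
  return a + ca;
}

/* The function from the PR.  */
uint32_t
add32carry (uint32_t sum, uint32_t x)
{
  uint32_t z = sum + x;
  if (sum + x < x)
    z++;
  return z;
}

/* The new two-instruction sequence.  */
uint32_t
add32carry_addc_addze (uint32_t sum, uint32_t x)
{
  struct toy_ca_result r = toy_addc (sum, x);   /* addc 3,3,4 */
  return toy_addze (r.value, r.ca);             /* addze 3,3  */
}
```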

And for the related examples in the new test case,

unsigned long add_leu(unsigned long a, unsigned long b, unsigned long c) {
  return a + (b <= c);
}

unsigned long add_geu(unsigned long a, unsigned long b, unsigned long c) {
  return a + (b >= c);
}

On powerpc64 with -O2 we'd previously generate:

add_leu:
subfc 4,4,5
subfe 9,9,9
addi 9,9,1
add 3,9,3
blr
add_geu:
subfc 5,5,4
subfe 9,9,9
addi 9,9,1
add 3,9,3
blr

but with this patch we now generate:

add_leu:
subfc 4,4,5
addze 3,3
blr
add_geu:
subfc 5,5,4
addze 3,3
blr
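Likewise for the 64-bit examples: `subfc rD,rA,rB` computes rB - rA and sets CA when there is no borrow, i.e. when rB >= rA as unsigned values, so CA already holds the comparison result that addze adds in. A small C model, assuming 64-bit registers and using hypothetical helper names, can be checked against the original functions:

```c
#include <stdint.h>

/* The functions from the new test case.  */
unsigned long add_leu(unsigned long a, unsigned long b, unsigned long c) {
  return a + (b <= c);
}

unsigned long add_geu(unsigned long a, unsigned long b, unsigned long c) {
  return a + (b >= c);
}

/* CA bit produced by "subfc rD,rA,rB": 1 when rB >= rA (no borrow).  */
static unsigned
toy_subfc_ca (uint64_t ra, uint64_t rb)
{
  return rb >= ra;
}

/* add_leu: subfc 4,4,5 computes c - b with CA = (b <= c); addze 3,3.  */
uint64_t
add_leu_model (uint64_t a, uint64_t b, uint64_t c)
{
  return a + toy_subfc_ca (b, c);
}

/* add_geu: subfc 5,5,4 computes b - c with CA = (b >= c); addze 3,3.  */
uint64_t
add_geu_model (uint64_t a, uint64_t b, uint64_t c)
{
  return a + toy_subfc_ca (c, b);
}
```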

This patch has been tested on powerpc64-unknown-linux-gnu (many thanks to
gcc203.fsffrance.org on the GCC compile farm) with a make bootstrap and make
-k check with no new failures.

Ok for mainline?


2021-12-03  Roger Sayle  

gcc/ChangeLog
PR target/43892
* config/rs6000/rs6000.md (*add<mode>3_carry_in_0_2): New
define_insn to recognize commutative form of add<mode>3_carry_in_0.
(*add<mode>3_geu, *add<mode>3_leu, *subf<mode>3_carry_in_xx_subf,
*add<mode>3_carry_in_addc): New define_insn_and_split patterns.

gcc/testsuite/ChangeLog
PR target/43892
* gcc.target/powerpc/addcmp.c: New test case.
* gcc.target/powerpc/pr43892.c: New test case.


Many thanks in advance.
Roger
--

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..90c23556ccb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2067,6 +2067,16 @@
   "addze %0,%1"
   [(set_attr "type" "add")])
 
+;; Non-canonical form of add<mode>3_carry_in_0
+(define_insn "*add<mode>3_carry_in_0_2"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+   (plus:GPR (reg:GPR CA_REGNO)
+ (match_operand:GPR 1 "gpc_reg_operand" "r")))
+   (clobber (reg:GPR CA_REGNO))]
+  ""
+  "addze %0,%1"
+  [(set_attr "type" "add")])
+
 (define_insn "add3_carry_in_m1"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(plus:GPR (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
@@ -2078,6 +2088,95 @@
   [(set_attr "type" "add")])
 
 
+;; PR target/43892 -> subf<mode>3_carry ; add<mode>3_carry_in_0
+(define_insn_and_split "*add<mode>3_geu"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=r")
+   (plus:P (geu:P (match_operand:P 1 "gpc_reg_operand" "r")
+  (match_operand:P 2 "gpc_reg_operand" "r"))
+   (match_operand:P 3 "gpc_reg_operand" "r")))
+   (clobber (match_scratch:P 4 "=r"))
+   (clobber (reg:P CA_REGNO))]
+  ""
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (GET_CODE (operands[4]) == SCRATCH)
+    operands[4] = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_subf<mode>3_carry (operands[4], operands[1], operands[2]));
+  emit_insn (gen_add<mode>3_carry_in_0 (operands[0], operands[3]));
+  DONE;
+}
+  [(set_attr "type" "two")
+   (set_attr "length" "8")])
+
+;; PR target/43892 -> subf<mode>3_carry ; add<mode>3_carry_in_0
+(define_insn_and_split "*add<mode>3_leu"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=r")
+   (plus:P (leu:P (match_operand:P 1 "gpc_reg_operand" "r")
+  (match_operand:P 2 "gpc_reg_operand" "r"))
+   (match_operand:P 3 "gpc_reg_operand" "r")))
+   (clobber (match_scratch:P 4 "=r"))
+   (clobber (reg:P CA_REGNO))]
+  ""
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (GET_CODE (operands[4]) == SCRATCH)
+    operands[4] = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_subf<mode>3_carry (operands[4], operands[2], operands[1]));
+  emit_insn (gen_add<mode>3_carry_in_0 (operands[0], operands[3]));
+  DONE;
+}
+  [(set_attr "type" "two")
+   (set_attr "length" "8")])
+
+;; PR target/43892 -> subf<mode>3_carry_in_xx ; subf<mode>3
+(define_insn_and_split "*subf<mode>3_carry_in_xx_subf"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=r")
+   (plus:P (minus:P (match_operand:P 1 "gpc_reg_operand" "r")
+(reg:P CA_REGNO))
+   (const_int 1)))
+   (clobber (match_scratch:P 2 "=r"))
+   (clobber (reg:P CA_REGNO))]
+  ""
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (GET_CODE (operands[2]) == SCRATCH)
+    operands[2] = gen_reg_rtx (<MODE>mode);
+  emit_in

Re: [PATCH] c++, v2: Allow indeterminate unsigned char or std::byte in bit_cast - P1272R4

2021-12-03 Thread Jason Merrill via Gcc-patches

On 12/2/21 15:56, Jakub Jelinek wrote:

On Thu, Dec 02, 2021 at 03:32:58PM -0500, Jason Merrill wrote:

So IMHO we need the patch I've posted (with the testcases slightly adjusted
not to do those extra copies afterwards for now),
and try to deal with the lvalue-to-rvalue conversion of indeterminate later,
and once done, perhaps in a copy of those testcases readd those extra copies
afterwards and verify it is rejected.


Makes sense, except:


+ gcc_assert (end == 1 || end == 2);


This seems to assert that the value representation of a bit-field can't span
more than two bytes, which is wrong for, say,

struct A
{
   int : 1;
   unsigned __int128 c : 128; // value representation spans 17 bytes
};


Sure, for arbitrary bitfields yes.  But, the patch only deals with
unsigned char and std::byte TYPE_MAIN_VARIANT (DECL_BIT_FIELD_TYPE (fld))
bitfields (others don't have the type listed in the standard and so should
error right away, which is done by keeping the mask bits uncleared).
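The byte-span arithmetic behind this exchange can be checked with a small helper. It is a hypothetical illustration, not the patch's code: it assumes 8-bit bytes, takes the bit offset as given rather than deriving it from any ABI layout rules, and whether the patch's `end` counts bytes exactly this way is an assumption. A field of at most 8 bits can touch at most two bytes, whereas a 128-bit field starting at bit 1 touches 17.

```c
#include <assert.h>

/* Number of bytes touched by a bit-field whose value representation starts
   at bit BITPOS (counted from the start of the object) and is WIDTH bits
   wide.  Assumes 8-bit bytes and WIDTH > 0.  */
unsigned
bitfield_bytes_spanned (unsigned bitpos, unsigned width)
{
  assert (width > 0);
  unsigned first_byte = bitpos / 8;
  unsigned last_byte = (bitpos + width - 1) / 8;
  return last_byte - first_byte + 1;
}
```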


Ah, of course.  The patch is OK, though you might factor


+  else if (is_byte_access_type (type)
+  && TYPE_MAIN_VARIANT (type) != char_type_node)


into is_byte_access_type_not_plain_char.  OK either way.

Jason



Re: [PATCH v2] c++: Fix for decltype(auto) and parenthesized expr [PR103403]

2021-12-03 Thread Jason Merrill via Gcc-patches

On 12/2/21 18:12, Marek Polacek wrote:

On Wed, Dec 01, 2021 at 11:16:27PM -0500, Jason Merrill wrote:

On 12/1/21 10:16, Marek Polacek wrote:

In r11-4758, I tried to fix this problem:

int &&i = 0;
decltype(auto) j = i; // should behave like int &&j = i; error

wherein do_auto_deduction was getting confused with a REFERENCE_REF_P
and it didn't realize its operand was a name, not an expression, and
deduced the wrong type.

Unfortunately that fix broke this:

int&& r = 1;
decltype(auto) rr = (r);

where 'rr' should be 'int &' since '(r)' is an expression, not a name.  But
because I stripped the INDIRECT_REF with the r11-4758 change, we deduced
'rr's type as if decltype had gotten a name, resulting in 'int &&'.

I suspect I thought that the REF_PARENTHESIZED_P check when setting
'bool id' in do_auto_deduction would handle the (r) case, but that's not
the case; while the documentation for REF_PARENTHESIZED_P specifically says
it can be set in INDIRECT_REF, we don't actually do so.

This patch sets REF_PARENTHESIZED_P even on REFERENCE_REF_P, so that
do_auto_deduction can use it.

It also removes code in maybe_undo_parenthesized_ref that I think is
dead -- and we don't hit it while running dg.exp.  To adduce more data,
it also looks dead here:
https://splichal.eu/lcov/gcc/cp/semantics.c.gcov.html


Agreed, that code is dead since r9-1417.


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and 11?

PR c++/103403

gcc/cp/ChangeLog:

* cp-gimplify.c (cp_fold): Don't recurse if maybe_undo_parenthesized_ref
doesn't change its argument.
* pt.c (do_auto_deduction): Don't strip REFERENCE_REF_P trees.  Also
REF_PARENTHESIZED_P for REFERENCE_REF_P.
* semantics.c (force_paren_expr): Set REF_PARENTHESIZED_P on
REFERENCE_REF_P trees too.
(maybe_undo_parenthesized_ref): Remove dead code.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto2.C: New test.
* g++.dg/cpp1y/decltype-auto3.C: New test.
---
   gcc/cp/cp-gimplify.c|  3 ++-
   gcc/cp/pt.c |  5 ++---
   gcc/cp/semantics.c  | 18 --
   gcc/testsuite/g++.dg/cpp1y/decltype-auto2.C | 12 
   gcc/testsuite/g++.dg/cpp1y/decltype-auto3.C | 12 
   5 files changed, 32 insertions(+), 18 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto2.C
   create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto3.C

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 0a002db14e7..e3ede02a48e 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2421,7 +2421,8 @@ cp_fold (tree x)
 if (REF_PARENTHESIZED_P (x))
{
  tree p = maybe_undo_parenthesized_ref (x);
- return cp_fold (p);
+ if (p != x)
+   return cp_fold (p);
}
 goto unary;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f4b9d9673fb..c5b41b57028 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29889,11 +29889,10 @@ do_auto_deduction (tree type, tree init, tree auto_node,
 else if (AUTO_IS_DECLTYPE (auto_node))
   {
 tree stripped_init = tree_strip_any_location_wrapper (init);
-  if (REFERENCE_REF_P (stripped_init))
-   stripped_init = TREE_OPERAND (stripped_init, 0);
 bool id = (DECL_P (stripped_init)
 || ((TREE_CODE (init) == COMPONENT_REF
- || TREE_CODE (init) == SCOPE_REF)
+ || TREE_CODE (init) == SCOPE_REF
+ || REFERENCE_REF_P (init))
 && !REF_PARENTHESIZED_P (init)));


This change seems wrong; not all references are id-expressions or member
accesses.  For instance, a call to a function returning a reference is not.
I think we still want to depend on the TREE_CODE, but we need to look at the
TREE_CODE of stripped_init, not init.


Yes indeed, id was wrong.  I didn't notice because we still deduced to the
right type!

This version only strips REFERENCE_REF_P when they aren't REF_PARENTHESIZED_P,
so that, as the comment says, I can tell '(r)' and 'r' apart.  Since we can
deduce the correct type even with the wrong id, I tested this by using the
two functions in decltype-auto4.C, and adding gcc_assert(id) or !id so that
I can verify that id looks correct with this patch.

Also add a new test case: decltype-auto4.C.  It would still be nice to test
more; I'm not sure how well we test [dcl.type.decltype]/1.1 and /1.2.  I'm
also adding decomp-decltype1.C, which fixes decltype + structured binding.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk / 11?


OK.


-- >8 --
In r11-4758, I tried to fix this problem:

   int &&i = 0;
   decltype(auto) j = i; // should behave like int &&j = i; error

wherein do_auto_deduction was getting confused with a REFERENCE_REF_P
and it didn't realize its operand was a name, not an expression, and
deduced the wrong type.

Unfortunately th

[PATCH 6/6] rs6000: Rename arrays to remove temporary _x suffix

2021-12-03 Thread Bill Schmidt via Gcc-patches
From: Bill Schmidt 

Hi!

While we had two sets of built-in infrastructure at once, I added _x as a
suffix to two arrays to disambiguate the old and new versions.  Time to fix
that also.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill

2021-12-02  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(altivec_resolve_overloaded_builtin): Likewise.  Also rename
rs6000_builtin_info_x to rs6000_builtin_info.
* config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Rename
rs6000_builtin_info_x to rs6000_builtin_info.
(rs6000_builtin_is_supported): Likewise.
(rs6000_gimple_fold_mma_builtin): Likewise.  Also rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(rs6000_gimple_fold_builtin): Rename rs6000_builtin_info_x to
rs6000_builtin_info.
(cpu_expand_builtin): Likewise.
(rs6000_expand_builtin): Likewise.
(rs6000_init_builtins): Likewise.  Also rename rs6000_builtin_decls_x
to rs6000_builtin_decls.
(rs6000_builtin_decl): Rename rs6000_builtin_decls_x to
rs6000_builtin_decls.
* config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
rename rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_bif_static_init): In generated code, rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_init_bif_table): In generated code, rename
rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
rs6000_builtin_info_x to rs6000_builtin_info.
(write_init_ovld_table): In generated code, rename
rs6000_builtin_decls_x to rs6000_builtin_decls.
(write_init_file): Likewise.
* config/rs6000/rs6000.c (rs6000_builtin_vectorized_function):
Likewise.
(rs6000_builtin_md_vectorized_function): Likewise.
(rs6000_builtin_reciprocal): Likewise.
(add_condition_to_bb): Likewise.
(rs6000_atomic_assign_expand_fenv): Likewise.
---
 gcc/config/rs6000/rs6000-c.c| 64 -
 gcc/config/rs6000/rs6000-call.c | 46 +-
 gcc/config/rs6000/rs6000-gen-builtins.c | 27 +--
 gcc/config/rs6000/rs6000.c  | 58 +++---
 4 files changed, 96 insertions(+), 99 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f790c72d621..e0ebdeed548 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -867,7 +867,7 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
 {
   tree argtypes = TYPE_ARG_TYPES (fntype);
   tree arg_type[MAX_OVLD_ARGS];
-  tree fndecl = rs6000_builtin_decls_x[bif_id];
+  tree fndecl = rs6000_builtin_decls[bif_id];
 
   for (int i = 0; i < n; i++)
 {
@@ -1001,13 +1001,13 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
  case E_SFmode:
{
  /* For floats use the xvmulsp instruction directly.  */
- tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULSP];
+ tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
  return build_call_expr (call, 2, arg0, arg1);
}
  case E_DFmode:
{
  /* For doubles use the xvmuldp instruction directly.  */
- tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULDP];
+ tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP];
  return build_call_expr (call, 2, arg0, arg1);
}
  /* Other types are errors.  */
@@ -1066,7 +1066,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
vec_safe_push (params, arg0);
vec_safe_push (params, arg1);
tree call = altivec_resolve_overloaded_builtin
- (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_CMPEQ],
+ (loc, rs6000_builtin_decls[RS6000_OVLD_VEC_CMPEQ],
   params);
/* Use save_expr to ensure that operands used more than once
   that may have side effects (like calls) are only evaluated
@@ -1076,7 +1076,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
vec_safe_push (params, call);
vec_safe_push (params, call);
return altivec_resolve_overloaded_builtin
- (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_NOR], params);
+ (loc, rs6000_builtin_decls[RS6000_OVLD_VEC_NOR], params);
  }
  /* Other types are errors.  */
default:
@@ -1129,9 +1129,9 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
  vec_safe_push (params, arg1);
 
  if (fcode

[PATCH 5/6] rs6000: Rename functions with "new" in their names

2021-12-03 Thread Bill Schmidt via Gcc-patches
From: Bill Schmidt 

Hi!

While we had two sets of built-in functionality at the same time, I put "new"
in the names of quite a few functions.  Time to undo that.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill

2021-12-02  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin):
Remove forward declaration.
(rs6000_new_builtin_type_compatible): Rename to
rs6000_builtin_type_compatible.
(rs6000_builtin_type_compatible): Remove.
(altivec_resolve_overloaded_builtin): Remove.
(altivec_build_new_resolved_builtin): Rename to
altivec_build_resolved_builtin.
(altivec_resolve_new_overloaded_builtin): Rename to
altivec_resolve_overloaded_builtin.  Remove static keyword.  Adjust
called function names.
* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Remove
forward declaration.
(rs6000_gimple_fold_new_builtin): Likewise.
(rs6000_invalid_new_builtin): Rename to rs6000_invalid_builtin.
(rs6000_gimple_fold_builtin): Remove.
(rs6000_new_builtin_valid_without_lhs): Rename to
rs6000_builtin_valid_without_lhs.
(rs6000_new_builtin_is_supported): Rename to
rs6000_builtin_is_supported.
(rs6000_gimple_fold_new_mma_builtin): Rename to
rs6000_gimple_fold_mma_builtin.
(rs6000_gimple_fold_new_builtin): Rename to
rs6000_gimple_fold_builtin.  Remove static keyword.  Adjust called
function names.
(rs6000_expand_builtin): Remove.
(new_cpu_expand_builtin): Rename to cpu_expand_builtin.
(new_mma_expand_builtin): Rename to mma_expand_builtin.
(new_htm_spr_num): Rename to htm_spr_num.
(new_htm_expand_builtin): Rename to htm_expand_builtin.  Change name
of called function.
(rs6000_expand_new_builtin): Rename to rs6000_expand_builtin.  Remove
static keyword.  Adjust called function names.
(rs6000_new_builtin_decl): Rename to rs6000_builtin_decl.  Remove
static keyword.
(rs6000_builtin_decl): Remove.
* config/rs6000/rs6000-gen-builtins.c (write_decls): In generated code,
rename rs6000_new_builtin_is_supported to rs6000_builtin_is_supported.
* config/rs6000/rs6000-internal.h (rs6000_invalid_new_builtin): Rename
to rs6000_invalid_builtin.
* config/rs6000/rs6000.c (rs6000_new_builtin_vectorized_function):
Rename to rs6000_builtin_vectorized_function.
(rs6000_new_builtin_md_vectorized_function): Rename to
rs6000_builtin_md_vectorized_function.
(rs6000_builtin_vectorized_function): Remove.
(rs6000_builtin_md_vectorized_function): Remove.
---
 gcc/config/rs6000/rs6000-c.c| 120 +---
 gcc/config/rs6000/rs6000-call.c |  99 ++-
 gcc/config/rs6000/rs6000-gen-builtins.c |   3 +-
 gcc/config/rs6000/rs6000-internal.h |   2 +-
 gcc/config/rs6000/rs6000.c  |  31 ++
 5 files changed, 80 insertions(+), 175 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index d44edf585aa..f790c72d621 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -37,9 +37,6 @@
 
 #include "rs6000-internal.h"
 
-static tree altivec_resolve_new_overloaded_builtin (location_t, tree, void *);
-
-
 /* Handle the machine specific pragma longcall.  Its syntax is
 
# pragma longcall ( TOGGLE )
@@ -817,7 +814,7 @@ is_float128_p (tree t)
 
 /* Return true iff ARGTYPE can be compatibly passed as PARMTYPE.  */
 static bool
-rs6000_new_builtin_type_compatible (tree parmtype, tree argtype)
+rs6000_builtin_type_compatible (tree parmtype, tree argtype)
 {
   if (parmtype == error_mark_node)
     return false;
@@ -840,23 +837,6 @@ rs6000_new_builtin_type_compatible (tree parmtype, tree argtype)
   return lang_hooks.types_compatible_p (parmtype, argtype);
 }
 
-static inline bool
-rs6000_builtin_type_compatible (tree t, int id)
-{
-  tree builtin_type;
-  builtin_type = rs6000_builtin_type (id);
-  if (t == error_mark_node)
-    return false;
-  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (builtin_type))
-    return true;
-  else if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
-           && is_float128_p (t) && is_float128_p (builtin_type))
-    return true;
-  else
-    return lang_hooks.types_compatible_p (t, builtin_type);
-}
-
-
 /* In addition to calling fold_convert for EXPR of type TYPE, also
call c_fully_fold to remove any C_MAYBE_CONST_EXPRs that could be
hiding there (PR47197).  */
@@ -873,16 +853,6 @@ fully_fold_convert (tree type, tree expr)
   return result;
 }
 
-/* Implementation of the resolve_overloaded_builtin target hook, to
-   support Altivec's overloaded builtins.  */
-
-tree
-altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
-  

[PATCH 3/6] rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def

2021-12-03 Thread Bill Schmidt via Gcc-patches
From: Bill Schmidt 

Hi!

This patch just renames a file and updates the build machinery accordingly.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill

2021-12-02  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtin-new.def: Rename to...
* config/rs6000/rs6000-builtins.def: ...this.
* config/rs6000/rs6000-gen-builtins.c: Adjust header commentary.
* config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Rename
rs6000-builtin-new.def to rs6000-builtins.def.
(rs6000-builtins.c): Likewise.
---
 .../rs6000/{rs6000-builtin-new.def => rs6000-builtins.def}  | 0
 gcc/config/rs6000/rs6000-gen-builtins.c | 4 ++--
 gcc/config/rs6000/t-rs6000  | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)
 rename gcc/config/rs6000/{rs6000-builtin-new.def => rs6000-builtins.def} (100%)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtins.def
similarity index 100%
rename from gcc/config/rs6000/rs6000-builtin-new.def
rename to gcc/config/rs6000/rs6000-builtins.def
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index 78b2486aafc..9c61b7d9fe6 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -22,7 +22,7 @@ along with GCC; see the file COPYING3.  If not see
recognition code for Power targets, based on text files that
describe the built-in functions and vector overloads:
 
-  rs6000-builtin-new.def  Table of built-in functions
+  rs6000-builtins.def     Table of built-in functions
   rs6000-overload.def     Table of overload functions
 
Both files group similar functions together in "stanzas," as
@@ -125,7 +125,7 @@ along with GCC; see the file COPYING3.  If not see
 
The second line contains the  that this particular instance of
the overloaded function maps to.  It must match a token that appears in
-   rs6000-builtin-new.def.  Optionally, a second token may appear.  If only
+   rs6000-builtins.def.  Optionally, a second token may appear.  If only
one token is on the line, it is also used to build the unique identifier
for the overloaded function.  If a second token is present, the second
token is used instead for this purpose.  This is necessary in cases
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index d48a4b1be6c..3d3143a171d 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -22,7 +22,7 @@ TM_H += $(srcdir)/config/rs6000/rs6000-builtin.def
 TM_H += $(srcdir)/config/rs6000/rs6000-cpus.def
 TM_H += $(srcdir)/config/rs6000/rs6000-modes.h
 PASSES_EXTRA += $(srcdir)/config/rs6000/rs6000-passes.def
-EXTRA_GTYPE_DEPS += $(srcdir)/config/rs6000/rs6000-builtin-new.def
+EXTRA_GTYPE_DEPS += $(srcdir)/config/rs6000/rs6000-builtins.def
 
 rs6000-pcrel-opt.o: $(srcdir)/config/rs6000/rs6000-pcrel-opt.c
$(COMPILE) $<
@@ -59,10 +59,10 @@ build/rs6000-gen-builtins$(build_exeext): build/rs6000-gen-builtins.o \
 # For now, the header files depend on rs6000-builtins.c, which avoids
 # races because the .c file is closed last in rs6000-gen-builtins.c.
 rs6000-builtins.c: build/rs6000-gen-builtins$(build_exeext) \
-  $(srcdir)/config/rs6000/rs6000-builtin-new.def \
+  $(srcdir)/config/rs6000/rs6000-builtins.def \
   $(srcdir)/config/rs6000/rs6000-overload.def
$(RUN_GEN) ./build/rs6000-gen-builtins$(build_exeext) \
-   $(srcdir)/config/rs6000/rs6000-builtin-new.def \
+   $(srcdir)/config/rs6000/rs6000-builtins.def \
$(srcdir)/config/rs6000/rs6000-overload.def rs6000-builtins.h \
rs6000-builtins.c rs6000-vecdefines.h
 
-- 
2.27.0



[PATCH 0/6] rs6000: Remove "old" built-in function infrastructure

2021-12-03 Thread Bill Schmidt via Gcc-patches
From: Bill Schmidt 

Hi!

Now that the new built-in function support is all upstream and enabled, it
seems safe and prudent to remove the old code to avoid confusion.  I broke this
up to the extent possible, but the first patch is a bit large and messy because
so many dead functions have to be removed when taking out the
"new_builtins_are_live" variable.

Bill Schmidt (6):
  rs6000: Remove new_builtins_are_live and dead code it was guarding
  rs6000: Remove altivec_overloaded_builtins array and initialization
  rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def
  rs6000: Remove rs6000-builtin.def and associated data and functions
  rs6000: Rename functions with "new" in their names
  rs6000: Rename arrays to remove temporary _x suffix

 gcc/config/rs6000/darwin.h| 8 +-
 gcc/config/rs6000/rs6000-builtin.def  |  3350 ---
 ...00-builtin-new.def => rs6000-builtins.def} | 0
 gcc/config/rs6000/rs6000-c.c  |  1342 +-
 gcc/config/rs6000/rs6000-call.c   | 17810 +++-
 gcc/config/rs6000/rs6000-gen-builtins.c   |   115 +-
 gcc/config/rs6000/rs6000-internal.h   | 2 +-
 gcc/config/rs6000/rs6000-protos.h | 3 -
 gcc/config/rs6000/rs6000.c|   334 +-
 gcc/config/rs6000/rs6000.h|58 -
 gcc/config/rs6000/t-rs6000| 7 +-
 11 files changed, 3173 insertions(+), 19856 deletions(-)
 delete mode 100644 gcc/config/rs6000/rs6000-builtin.def
 rename gcc/config/rs6000/{rs6000-builtin-new.def => rs6000-builtins.def} (100%)

-- 
2.27.0



Re: [PATCH] rs6000: Fix use of wrong enum for built-in function code.

2021-12-03 Thread Bill Schmidt via Gcc-patches
On 12/3/21 10:26 AM, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Dec 02, 2021 at 04:53:18PM -0600, Bill Schmidt wrote:
>> I discovered this bug while working on patches to remove the old built-ins
>> infrastructure.  I missed a spot in converting from the rs6000_builtins enum to
>> the rs6000_gen_builtins enum.  This fixes it.  The fix is technically not right
>> if new_builtins_are_enabled were to be set to zero, but we're not going to do
>> that anymore, and the remnants of that code will be removed shortly.
>> gcc/
>>  * config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Fix builtin
>>  identifiers.
> What an informative changelog ;-)
>
> Okay for trunk.  Thanks!

Thanks!  Pushed as r12-5776.

Bill

>
>
> Segher


Re: PING^1 [PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-12-03 Thread H.J. Lu via Gcc-patches
On Fri, Dec 3, 2021 at 8:55 AM Uros Bizjak  wrote:
>
> On Fri, Dec 3, 2021 at 2:24 PM H.J. Lu  wrote:
> >
> > On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu  wrote:
> > >
> > > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
> > > and store, independent of -mprefer-vector-width=bits:
> > >
> > > 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES
> > > which are enabled for Intel Sapphire Rapids processor.
> > > 2. Add -mmove-max=bits to set the maximum number of bits that can be moved from
> > > memory to memory efficiently.  The default value is derived from
> > > X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES, and the
> > > preferred vector width.
> > > 3. Add -mstore-max=bits to set the maximum number of bits that can be stored to
> > > memory efficiently.  The default value is derived from
> > > X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the
> > > preferred vector width.
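Items 2 and 3 say each default is derived from the new tuning flags plus the preferred vector width. The following is a minimal sketch of one plausible derivation order, assuming that an explicit option always wins, that AVX512 tuning takes priority over AVX256 tuning, and that the preferred vector width is used as-is only when neither tuning flag is set. The enum mirrors the ordering of GCC's `PVW_*` values, but `default_move_max` and its parameters are hypothetical stand-ins, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the ordering of GCC's prefer_vector_width values (PVW_*).  */
enum pvw { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

/* Hypothetical model of the -mmove-max default.  The -mstore-max default
   would have the same shape, using the *_STORE_BY_PIECES tuning flags.  */
static enum pvw
default_move_max (bool explicitly_set, enum pvw explicit_value,
                  bool tune_avx512_by_pieces, bool tune_avx256_by_pieces,
                  enum pvw prefer_vector_width)
{
  if (explicitly_set)            /* -mmove-max=bits given on the command line */
    return explicit_value;
  if (tune_avx512_by_pieces)     /* e.g. X86_TUNE_AVX512_MOVE_BY_PIECES */
    return PVW_AVX512;
  if (tune_avx256_by_pieces)     /* e.g. X86_TUNE_AVX256_MOVE_BY_PIECES */
    return PVW_AVX256;
  return prefer_vector_width;    /* assumed fallback */
}
```

In this model, a tuning with only the AVX512 move flag set selects 512-bit moves even when the preferred vector width is 256, which matches the "independent of -mprefer-vector-width" behavior described above.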
> > >
> > > gcc/
> > >
> > > PR target/103269
> > > * config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE
> > > and PVW_NONE to ix86_target_string.
> > > * config/i386/i386-options.c (ix86_target_string): Add arguments
> > > for move_max and store_max.
> > > (ix86_target_string::add_vector_width): New lambda.
> > > (ix86_debug_options): Pass ix86_move_max and ix86_store_max to
> > > ix86_target_string.
> > > (ix86_function_specific_print): Pass ptr->x_ix86_move_max and
> > > ptr->x_ix86_store_max to ix86_target_string.
> > > (ix86_valid_target_attribute_tree): Handle x_ix86_move_max and
> > > x_ix86_store_max.
> > > (ix86_option_override_internal): Set the default x_ix86_move_max
> > > and x_ix86_store_max.
> > > * config/i386/i386-options.h (ix86_target_string): Add
> > > prefer_vector_width and prefer_vector_width.
> > > * config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed.
> > > (TARGET_AVX256_STORE_BY_PIECES): Likewise.
> > > (MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max ==
> > > PVW_AVX512.  Use 32 if ix86_move_max or ix86_store_max >=
> > > PVW_AVX256.
> > > (STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512.
> > > Use 32 if ix86_store_max >= PVW_AVX256.
> > > * config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits.
> > > * config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New.
> > > (X86_TUNE_AVX512_STORE_BY_PIECES): Likewise.
> > > * doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/103269
> > > * gcc.target/i386/pieces-memcpy-17.c: New test.
> > > * gcc.target/i386/pieces-memcpy-18.c: Likewise.
> > > * gcc.target/i386/pieces-memcpy-19.c: Likewise.
> > > * gcc.target/i386/pieces-memcpy-20.c: Likewise.
> > > * gcc.target/i386/pieces-memcpy-21.c: Likewise.
> > > * gcc.target/i386/pieces-memset-45.c: Likewise.
> > > * gcc.target/i386/pieces-memset-46.c: Likewise.
> > > * gcc.target/i386/pieces-memset-47.c: Likewise.
> > > * gcc.target/i386/pieces-memset-48.c: Likewise.
> > > * gcc.target/i386/pieces-memset-49.c: Likewise.
>
> LGTM with two grammar fixes below.

Fixed.

> Thanks,
> Uros.

This is the patch I am checking in.

Thanks.

> > > ---
> > >  gcc/config/i386/i386-expand.c |  1 +
> > >  gcc/config/i386/i386-options.c| 75 +--
> > >  gcc/config/i386/i386-options.h|  6 +-
> > >  gcc/config/i386/i386.h| 18 ++---
> > >  gcc/config/i386/i386.opt  |  8 ++
> > >  gcc/config/i386/x86-tune.def  | 10 +++
> > >  gcc/doc/invoke.texi   | 13 
> > >  .../gcc.target/i386/pieces-memcpy-17.c| 16 
> > >  .../gcc.target/i386/pieces-memcpy-18.c| 16 
> > >  .../gcc.target/i386/pieces-memcpy-19.c| 16 
> > >  .../gcc.target/i386/pieces-memcpy-20.c| 16 
> > >  .../gcc.target/i386/pieces-memcpy-21.c| 16 
> > >  .../gcc.target/i386/pieces-memset-45.c| 16 
> > >  .../gcc.target/i386/pieces-memset-46.c| 17 +
> > >  .../gcc.target/i386/pieces-memset-47.c| 17 +
> > >  .../gcc.target/i386/pieces-memset-48.c| 17 +
> > >  .../gcc.target/i386/pieces-memset-49.c| 16 
> > >  17 files changed, 276 insertions(+), 18 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c
> > >  create mode

Re: [PATCH 0/4] Add aarch64-darwin support for off-stack trampolines

2021-12-03 Thread Jeff Law via Gcc-patches

On 12/3/2021 12:53 AM, Iain Sandoe wrote:

On 3 Dec 2021, at 03:12, Jeff Law  wrote:

On 11/22/2021 7:49 AM, Maxim Blinov wrote:

Hi all, apologies for forgetting to add the cover letter.

No worries.  I'd already assumed this was to support aarch64 trampolines on 
darwin by having them live elsewhere as managed entities.

The motivation of this work is to provide (limited) support for GCC
nested function trampolines on targets that do not have an executable
stack. This code has been (roughly) tested by creating several
thousand nested functions (i.e. enough to force allocation of a new
page), making sure all the nested functions execute correctly, and
consequently returning back up and ensuring that the pages are freed
when there are no more active trampolines in them.
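The paragraph above describes a pool in which trampolines are carved out of pages, a new page is allocated once the existing ones are full, and a page is released when its last active trampoline goes away. The following is a hypothetical C sketch of that bookkeeping only. It is not the patch's runtime code: `calloc`/`free` stand in for mapping and unmapping executable pages, no machine code is written, and every name is invented for illustration. It also shows why `longjmp` is a problem: if a frame is unwound without calling `tramp_free`, its slot is never returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Tiny on purpose, so a test can fill a page; a real page holds many more.  */
#define SLOTS_PER_PAGE 4

struct tramp_page
{
  struct tramp_page *next;       /* singly linked list of live pages */
  unsigned live;                 /* slots currently handed out */
  bool used[SLOTS_PER_PAGE];
};

struct tramp_slot
{
  struct tramp_page *page;
  unsigned index;
};

static struct tramp_page *pages;
static unsigned page_count;

static struct tramp_slot
tramp_alloc (void)
{
  /* Reuse a free slot on an existing page if there is one.  */
  for (struct tramp_page *p = pages; p; p = p->next)
    if (p->live < SLOTS_PER_PAGE)
      for (unsigned i = 0; i < SLOTS_PER_PAGE; i++)
        if (!p->used[i])
          {
            p->used[i] = true;
            p->live++;
            return (struct tramp_slot) { p, i };
          }

  /* Every page is full: grab a fresh one (stands in for mmap).  */
  struct tramp_page *p = calloc (1, sizeof *p);
  if (!p)
    abort ();
  p->next = pages;
  pages = p;
  page_count++;
  p->used[0] = true;
  p->live = 1;
  return (struct tramp_slot) { p, 0 };
}

/* Called when the nested function's frame exits normally.  A longjmp
   past that frame skips this call, so the slot leaks.  */
static void
tramp_free (struct tramp_slot s)
{
  struct tramp_page *p = s.page;
  p->used[s.index] = false;
  if (--p->live > 0)
    return;

  /* Last active trampoline on the page: unlink it and release it
     (stands in for munmap).  */
  struct tramp_page **link = &pages;
  while (*link != p)
    link = &(*link)->next;
  *link = p->next;
  free (p);
  page_count--;
}
```

Allocating five slots in this model forces a second page; freeing all five releases both pages again, which mirrors the test strategy described above at a much smaller scale.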

Right.  I'm looking at this wondering if we should do something similar for our 
new architecture.  Avoiding executable stacks is a good thing :-)

One of the limitations of the implementation in its current state is
the inability to track longjmps. There has been some discussion about
instrumenting calls to setjmp/longjmp so that the state of trampolines
is correctly tracked and freed when necessary, however that hasn't
been worked on yet.

So in the longjmp case, we just leak trampolines, right?  I'd think that should 
be quite uncommon.  It'd be nice to fix, but the benefits of non-executable 
stacks may ultimately be enough to overcome the limitation.

The other question is why not do a scheme similar to what Ada does with 
function descriptors?  Is that not feasible for some reason?  I realize that 
hasn't been plumbed into the C/C++ compilers, but it may be another viable 
option.

The problem is that it breaks ABI ;)

It was worth asking :-)

[in a function ptr] we need an address bit to test to determine if we are 
handling a case which has the descriptor, or if we have a regular indirect call.

Unfortunately, although aarch64 aligns functions to 4 bytes, the two lower bits 
are reserved by Arm, and therefore we’d have to force function alignment to 
8 bytes, and that’s an ABI break (one that cannot reasonably be expected to happen) 
for aarch64-darwin (or any other arch that has a release in the wild, I’d 
imagine).

(FWIW, this is what I’ve currently implemented on my development branch [not 
for C++, since that has no nested functions].  I implemented the change for 
Fortran and re-used Martin Uecker’s proposed C impl. - but I used one of the 
reserved address bits [as a work-around to get people going with the port])

Good to know.  I keep thinking I should revamp how our port handles 
nested functions.  Right now we're using the tried and true trampolines, 
but we've got low bits available, so we could go with function 
descriptor approach.  Or we could go with trampolines in mmap'd space 
approach.  I just don't want to use old fashioned trampolines :-)

Here’s the thread discussing the situation when Martin proposed the change for 
C.

https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html

Yea, I vaguely remember the thread.  ABI stability is a pain :(

jeff
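Iain's explanation turns on one detail: with function descriptors, every indirect call site must test a spare low bit of the target address to tell a plain code pointer from a pointer to a descriptor. The following is a minimal C model of that test. It assumes bits 0 and 1 are unavailable (as Iain describes for aarch64), so the tag must be bit 2, which only works if every target is 8-byte aligned. Real code addresses are modeled as small aligned structs, since a portable C program cannot tag genuine function pointers. All names are hypothetical; this shows the general shape of the scheme, not the code on Iain's branch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Every callee takes a static chain, which plain functions ignore.  */
typedef int (*impl_fn) (void *chain, int arg);

/* Hypothetical stand-ins for call targets.  Both are forced to 8-byte
   alignment, so bits 0-2 of their addresses are always zero.  */
struct plain { alignas (8) impl_fn code; };
struct descriptor { alignas (8) void *chain; impl_fn code; };

/* Bits 0 and 1 are assumed reserved, so the tag has to be bit 2.  */
#define DESC_BIT ((uintptr_t) 4)

/* What an indirect call site would have to do under this scheme.  */
static int
call_tagged (uintptr_t target, int arg)
{
  if (target & DESC_BIT)
    {
      const struct descriptor *d
        = (const struct descriptor *) (target & ~DESC_BIT);
      return d->code (d->chain, arg);   /* pass the static chain */
    }
  const struct plain *p = (const struct plain *) target;
  return p->code (NULL, arg);           /* ordinary indirect call */
}

/* Example callees: one ordinary, one that uses its static chain.  */
static int twice (void *chain, int x) { (void) chain; return 2 * x; }
static int add_chain (void *chain, int x) { return *(int *) chain + x; }
```

If targets were only 4-byte aligned, bit 2 could already be set in a legitimate address, and the test in `call_tagged` would misclassify it. That is why the scheme needs the alignment change Iain describes as an ABI break.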


[PATCH] x86: Scan leal in PR target/83782 tests for x32

2021-12-03 Thread H.J. Lu via Gcc-patches
Update PR target/83782 tests to scan leal for x32 to fix:

FAIL: gcc.target/i386/pr83782-1.c scan-assembler leaq[ \\t]foo\\(%rip\\),[ \\t]%rax
FAIL: gcc.target/i386/pr83782-2.c scan-assembler leaq[ \\t]foo\\(%rip\\),[ \\t]%rax

PR target/83782
* gcc.target/i386/pr83782-1.c: Also scan leal for x32.
* gcc.target/i386/pr83782-2.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pr83782-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/pr83782-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr83782-1.c b/gcc/testsuite/gcc.target/i386/pr83782-1.c
index f4c7370142a..ce97b12e65d 100644
--- a/gcc/testsuite/gcc.target/i386/pr83782-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr83782-1.c
@@ -21,6 +21,6 @@ bar(void)
 }
 
 /* { dg-final { scan-assembler {leal[ \t]foo@GOTOFF\(%[^,]*\),[ \t]%eax} { target ia32 } } } */
-/* { dg-final { scan-assembler {leaq[ \t]foo\(%rip\),[ \t]%rax} { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler {lea(?:l|q)[ \t]foo\(%rip\),[ \t]%(?:e|r)ax} { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-not "foo@GOT\\\(" { target ia32 } } } */
 /* { dg-final { scan-assembler-not "foo@GOTPCREL\\\(" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr83782-2.c b/gcc/testsuite/gcc.target/i386/pr83782-2.c
index 6c6528fff46..e25d258bbda 100644
--- a/gcc/testsuite/gcc.target/i386/pr83782-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr83782-2.c
@@ -21,6 +21,6 @@ bar(void)
 }
 
 /* { dg-final { scan-assembler {leal[ \t]foo@GOTOFF\(%[^,]*\),[ \t]%eax} { target ia32 } } } */
-/* { dg-final { scan-assembler {leaq[ \t]foo\(%rip\),[ \t]%rax} { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler {lea(?:l|q)[ \t]foo\(%rip\),[ \t]%(?:e|r)ax} { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-not "foo@GOT\\\(" { target ia32 } } } */
 /* { dg-final { scan-assembler-not "foo@GOTPCREL\\\(" { target { ! ia32 } } } } */
-- 
2.33.1
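The new dg-final pattern accepts both the x86-64 form (`leaq ... %rax`) and the x32 form (`leal ... %eax`). Below is a small C check of the same idea using POSIX `regcomp`/`regexec`. Assumption worth flagging: POSIX extended regular expressions have no `(?:...)` non-capturing groups (the Tcl regular expressions used by DejaGnu do), so this sketch uses plain `(l|q)` and `(e|r)` groups. These match the same strings, and `matches_rip_lea` is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* POSIX ERE version of the updated scan-assembler pattern.  "\\(" in
   the C string becomes "\(" in the ERE, which matches a literal '('.  */
static bool
matches_rip_lea (const char *asm_line)
{
  static const char pattern[]
    = "lea(l|q)[ \t]foo\\(%rip\\),[ \t]%(e|r)ax";
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, asm_line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

Like the dg-final pattern, this one stays loose and also accepts mixed forms such as `leaq ... %eax`. It checks the RIP-relative addressing of `foo` rather than pinning the exact operand size.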



Re: [PATCH, v2, OpenMP 5.0] Remove array section base-pointer mapping semantics, and other front-end adjustments (mainline trunk)

2021-12-03 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 19, 2021 at 09:54:12PM +0800, Chung-Lin Tang wrote:
> 2021-11-19  Chung-Lin Tang  
> 
> gcc/c/ChangeLog:
> 
>   * c-parser.c (struct omp_dim): New struct type for use inside
>   c_parser_omp_variable_list.
>   (c_parser_omp_variable_list): Allow multiple levels of array and
>   component accesses in array section base-pointer expression.
>   (c_parser_omp_clause_to): Set 'allow_deref' to true in call to
>   c_parser_omp_var_list_parens.
>   (c_parser_omp_clause_from): Likewise.
>   * c-typeck.c (handle_omp_array_sections_1): Extend allowed range
>   of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
>   POINTER_PLUS_EXPR.
>   (c_finish_omp_clauses): Extend allowed range of expressions
>   involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
> 
> gcc/cp/ChangeLog:
> 
>   * parser.c (struct omp_dim): New struct type for use inside
>   cp_parser_omp_var_list_no_open.
>   (cp_parser_omp_var_list_no_open): Allow multiple levels of array and
>   component accesses in array section base-pointer expression.
>   (cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
>   cp_parser_omp_var_list for to/from clauses.
>   * semantics.c (handle_omp_array_sections_1): Extend allowed range
>   of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
>   POINTER_PLUS_EXPR.
>   (handle_omp_array_sections): Adjust pointer map generation of
>   references.
>   (finish_omp_clauses): Extend allowed range of expressions
>   involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
> 
> gcc/fortran/ChangeLog:
> 
>   * trans-openmp.c (gfc_trans_omp_array_section): Do not generate
>   GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.
> 
> gcc/ChangeLog:
> 
>   * gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
>   accommodate case where 'offset' return of get_inner_reference is
>   non-NULL.
>   (is_or_contains_p): Further robustify conditions.
>   (omp_target_reorder_clauses): In alloc/to/from sorting phase, also
>   move following GOMP_MAP_ALWAYS_POINTER maps along.  Add new sorting
>   phase where we make sure pointers with an attach/detach map are ordered
>   correctly.
>   (gimplify_scan_omp_clauses): Add modifications to avoid creating
>   GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
>   * c-c++-common/gomp/target-enter-data-1.c: New testcase.
>   * c-c++-common/gomp/target-implicit-map-2.c: New testcase.
> 
> libgomp/ChangeLog:
> 
>   * target.c (gomp_map_vars_existing): Make sure attached pointer is
>   not overwritten during cross-host/device copying.
>   (gomp_update): Likewise.
>   (gomp_exit_data): Likewise.
>   * testsuite/libgomp.c++/target-11.C: Adjust testcase.
>   * testsuite/libgomp.c++/target-12.C: Likewise.
>   * testsuite/libgomp.c++/target-15.C: Likewise.
>   * testsuite/libgomp.c++/target-16.C: Likewise.
>   * testsuite/libgomp.c++/target-17.C: Likewise.
>   * testsuite/libgomp.c++/target-21.C: Likewise.
>   * testsuite/libgomp.c++/target-23.C: Likewise.
>   * testsuite/libgomp.c/target-23.c: Likewise.
>   * testsuite/libgomp.c/target-29.c: Likewise.
>   * testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.

> @@ -12982,6 +12991,7 @@ c_parser_omp_variable_list (c_parser *parser,
>  
>while (1)
>  {
> +  auto_vec dims;
>bool array_section_p = false;
>if (kind == OMP_CLAUSE_DEPEND || kind == OMP_CLAUSE_AFFINITY)
>   {

Wouldn't it be better to have the dims variable outside of the loop?
You do dims.truncate (0); anyway, so when it is used, it should always
start with an empty vector, but if it is outside of the loop, it won't
need to be freed and allocated again for every list item.
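Jakub's suggestion trades one allocation per list item for a single allocation reused across the whole loop. The following rough C analogue uses a simple growable array in place of `auto_vec`. `struct dim`, `struct dim_vec` and the two `parse_items_*` functions are hypothetical, and `dim_vec_truncate` plays the role of `dims.truncate (0)`. The sketch counts how often the allocator is called for each placement of the vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct omp_dim.  */
struct dim { int low_bound, length; };

/* Minimal growable array, a rough C analogue of auto_vec.  */
struct dim_vec { struct dim *data; size_t len, cap; };

static unsigned allocations;  /* number of calls into realloc */

static void
dim_vec_push (struct dim_vec *v, struct dim d)
{
  if (v->len == v->cap)
    {
      size_t ncap = v->cap ? 2 * v->cap : 4;
      struct dim *p = realloc (v->data, ncap * sizeof *p);
      if (!p)
        abort ();
      v->data = p;
      v->cap = ncap;
      allocations++;
    }
  v->data[v->len++] = d;
}

/* Like truncate (0): forget the elements but keep the storage.  */
static void dim_vec_truncate (struct dim_vec *v) { v->len = 0; }

static void
dim_vec_release (struct dim_vec *v)
{
  free (v->data);
  v->data = NULL;
  v->len = v->cap = 0;
}

/* Vector declared once, outside the loop, as Jakub suggests.  */
static void
parse_items_hoisted (int n_items, int dims_per_item)
{
  struct dim_vec dims = { NULL, 0, 0 };
  for (int i = 0; i < n_items; i++)
    {
      dim_vec_truncate (&dims);  /* each item starts with an empty vector */
      for (int j = 0; j < dims_per_item; j++)
        dim_vec_push (&dims, (struct dim) { j, 1 });
    }
  dim_vec_release (&dims);
}

/* Vector declared inside the loop, as in the patch under review.  */
static void
parse_items_per_iteration (int n_items, int dims_per_item)
{
  for (int i = 0; i < n_items; i++)
    {
      struct dim_vec dims = { NULL, 0, 0 };
      for (int j = 0; j < dims_per_item; j++)
        dim_vec_push (&dims, (struct dim) { j, 1 });
      dim_vec_release (&dims);
    }
}
```

With ten list items of three dimensions each, the hoisted version allocates once and the per-iteration version allocates ten times. Both see an empty vector at the start of every item.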

> +   else
> + {
> +   for (unsigned i = 0; i < dims.length (); i++)
> + t = tree_cons (dims[i].low_bound, dims[i].length, t);
> + }

No {}s around single statement body.

>  static tree
>  cp_parser_omp_var_list_no_open (cp_parser *parser, enum omp_clause_code kind,
>   tree list, bool *colon,
>   bool allow_deref = false)
>  {
> +  auto_vec dims;
> +  bool array_section_p;
>cp_token *token;
>bool saved_colon_corrects_to_scope_p = parser->colon_corrects_to_scope_p;
>if (colon)

Here it is correctly outside of the loop ;)

Otherwise LGTM.

Jakub



Re: PING^1 [PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-12-03 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 3, 2021 at 2:24 PM H.J. Lu  wrote:
>
> On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu  wrote:
> >
> > Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
> > and store, independent of -mprefer-vector-width=bits:
> >
> > 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES
> > which are enabled for Intel Sapphire Rapids processor.
> > 2. Add -mmove-max=bits to set the maximum number of bits that can be moved from
> > memory to memory efficiently.  The default value is derived from
> > X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES, and the
> > preferred vector width.
> > 3. Add -mstore-max=bits to set the maximum number of bits that can be stored to
> > memory efficiently.  The default value is derived from
> > X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES and the
> > preferred vector width.
> >
> > gcc/
> >
> > PR target/103269
> > * config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE
> > and PVW_NONE to ix86_target_string.
> > * config/i386/i386-options.c (ix86_target_string): Add arguments
> > for move_max and store_max.
> > (ix86_target_string::add_vector_width): New lambda.
> > (ix86_debug_options): Pass ix86_move_max and ix86_store_max to
> > ix86_target_string.
> > (ix86_function_specific_print): Pass ptr->x_ix86_move_max and
> > ptr->x_ix86_store_max to ix86_target_string.
> > (ix86_valid_target_attribute_tree): Handle x_ix86_move_max and
> > x_ix86_store_max.
> > (ix86_option_override_internal): Set the default x_ix86_move_max
> > and x_ix86_store_max.
> > * config/i386/i386-options.h (ix86_target_string): Add
> > prefer_vector_width and prefer_vector_width.
> > * config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed.
> > (TARGET_AVX256_STORE_BY_PIECES): Likewise.
> > (MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max ==
> > PVW_AVX512.  Use 32 if ix86_move_max or ix86_store_max >=
> > PVW_AVX256.
> > (STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512.
> > Use 32 if ix86_store_max >= PVW_AVX256.
> > * config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits.
> > * config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New.
> > (X86_TUNE_AVX512_STORE_BY_PIECES): Likewise.
> > * doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits.
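The ChangeLog entries for `MOVE_MAX` and `STORE_MAX_PIECES` describe two slightly different rules. The following simplified model restates them as C functions. Assumptions: the enum mirrors the ordering of GCC's `PVW_*` values, `fallback_bytes` stands in for whatever the header yields when neither 512-bit nor 256-bit pieces apply, and any ISA checks the real macros may also perform are omitted. The function names are hypothetical.

```c
#include <assert.h>

/* Mirrors the ordering of GCC's prefer_vector_width values (PVW_*).  */
enum pvw { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

/* MOVE_MAX per the ChangeLog: 64 if either limit is AVX512,
   32 if either limit is at least AVX256.  */
static int
model_move_max (enum pvw move_max, enum pvw store_max, int fallback_bytes)
{
  if (move_max == PVW_AVX512 || store_max == PVW_AVX512)
    return 64;
  if (move_max >= PVW_AVX256 || store_max >= PVW_AVX256)
    return 32;
  return fallback_bytes;
}

/* STORE_MAX_PIECES per the ChangeLog: looks only at the store limit.  */
static int
model_store_max_pieces (enum pvw store_max, int fallback_bytes)
{
  if (store_max == PVW_AVX512)
    return 64;
  if (store_max >= PVW_AVX256)
    return 32;
  return fallback_bytes;
}
```

The asymmetry is the point of the model: a wide store limit alone is enough to raise `MOVE_MAX`, but a wide move limit never raises `STORE_MAX_PIECES`.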
> >
> > gcc/testsuite/
> >
> > PR target/103269
> > * gcc.target/i386/pieces-memcpy-17.c: New test.
> > * gcc.target/i386/pieces-memcpy-18.c: Likewise.
> > * gcc.target/i386/pieces-memcpy-19.c: Likewise.
> > * gcc.target/i386/pieces-memcpy-20.c: Likewise.
> > * gcc.target/i386/pieces-memcpy-21.c: Likewise.
> > * gcc.target/i386/pieces-memset-45.c: Likewise.
> > * gcc.target/i386/pieces-memset-46.c: Likewise.
> > * gcc.target/i386/pieces-memset-47.c: Likewise.
> > * gcc.target/i386/pieces-memset-48.c: Likewise.
> > * gcc.target/i386/pieces-memset-49.c: Likewise.

LGTM with two grammar fixes below.

Thanks,
Uros.

> > ---
> >  gcc/config/i386/i386-expand.c |  1 +
> >  gcc/config/i386/i386-options.c| 75 +--
> >  gcc/config/i386/i386-options.h|  6 +-
> >  gcc/config/i386/i386.h| 18 ++---
> >  gcc/config/i386/i386.opt  |  8 ++
> >  gcc/config/i386/x86-tune.def  | 10 +++
> >  gcc/doc/invoke.texi   | 13 
> >  .../gcc.target/i386/pieces-memcpy-17.c| 16 
> >  .../gcc.target/i386/pieces-memcpy-18.c| 16 
> >  .../gcc.target/i386/pieces-memcpy-19.c| 16 
> >  .../gcc.target/i386/pieces-memcpy-20.c| 16 
> >  .../gcc.target/i386/pieces-memcpy-21.c| 16 
> >  .../gcc.target/i386/pieces-memset-45.c| 16 
> >  .../gcc.target/i386/pieces-memset-46.c| 17 +
> >  .../gcc.target/i386/pieces-memset-47.c| 17 +
> >  .../gcc.target/i386/pieces-memset-48.c| 17 +
> >  .../gcc.target/i386/pieces-memset-49.c| 16 
> >  17 files changed, 276 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-45.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-46.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-47.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-48.c
> >  creat

Re: [PATCH, v5, OpenMP 5.0] Improve OpenMP target support for C++ [PR92120 v5]

2021-12-03 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 16, 2021 at 08:43:27PM +0800, Chung-Lin Tang wrote:
> 2021-11-16  Chung-Lin Tang  
> 
>   PR middle-end/92120
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (finish_omp_target): New declaration.
>   (finish_omp_target_clauses): Likewise.
>   * parser.c (cp_parser_omp_clause_map): Adjust call to
>   cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
>   (cp_parser_omp_target): Factor out code, adjust into calls to new
>   function finish_omp_target.
>   * pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
>   OMP_TARGET case.
>   * semantics.c (handle_omp_array_sections_1): Add handling to create
>   'this->member' from 'member' FIELD_DECL. Remove case of rejecting
>   'this' when not in declare simd.
>   (handle_omp_array_sections): Likewise.
>   (finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
>   map clauses. Handle 'A->member' case in map clauses. Remove case of
>   rejecting 'this' when not in declare simd.
>   (struct omp_target_walk_data): New struct for walking over
>   target-directive tree body.
>   (finish_omp_target_clauses_r): New function for tree walk.
>   (finish_omp_target_clauses): New function.
>   (finish_omp_target): New function.
> 
> gcc/c/ChangeLog:
> 
>   * c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
>   call to c_parser_omp_variable_list to 'true'.
>   * c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
>   array base handling.
>   (c_finish_omp_clauses): Handle 'A->member' case in map clauses.
> 
> gcc/ChangeLog:
> 
>   * gimplify.c ("tree-hash-traits.h"): Add include.
>   (gimplify_scan_omp_clauses): Change struct_map_to_clause to type
>   hash_map *. Adjust struct map handling to handle
>   cases of *A and A->B expressions. Under !DECL_P case of
>   GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
>   struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
>   handling component_ref_p case. Add unshare_expr and gimplification
>   when created GOMP_MAP_STRUCT is not a DECL. Add code to add
>   firstprivate pointer for *pointer-to-struct case.
>   (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
>   exit data directives code to earlier position.
>   * omp-low.c (lower_omp_target):
>   Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
>   GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
>   * tree-pretty-print.c (dump_omp_clause): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/gomp/target-3.c: New testcase.
>   * g++.dg/gomp/target-3.C: New testcase.
>   * g++.dg/gomp/target-lambda-1.C: New testcase.
>   * g++.dg/gomp/target-lambda-2.C: New testcase.
>   * g++.dg/gomp/target-this-1.C: New testcase.
>   * g++.dg/gomp/target-this-2.C: New testcase.
>   * g++.dg/gomp/target-this-3.C: New testcase.
>   * g++.dg/gomp/target-this-4.C: New testcase.
>   * g++.dg/gomp/target-this-5.C: New testcase.
>   * g++.dg/gomp/this-2.C: Adjust testcase.
> 
> include/ChangeLog:
> 
>   * gomp-constants.h (enum gomp_map_kind):
>   Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
>   GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
>   (GOMP_MAP_POINTER_P):
>   Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.
> 
> libgomp/ChangeLog:
> 
>   * libgomp.h (gomp_attach_pointer): Add bool parameter.
>   * oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
>   (goacc_enter_data_internal): Likewise.
>   * target.c (gomp_map_vars_existing): Update assert condition to
>   include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
>   (gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
>   parameter, add support for mapping a pointer with NULL target.
>   (gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
>   parameter, add support for attaching a pointer with NULL target.
>   (gomp_map_vars_internal): Update calls to gomp_map_pointer and
>   gomp_attach_pointer, add handling for
>   GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
>   GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
>   * testsuite/libgomp.c++/target-23.C: New testcase.
>   * testsuite/libgomp.c++/target-lambda-1.C: New testcase.
>   * testsuite/libgomp.c++/target-lambda-2.C: New testcase.
>   * testsuite/libgomp.c++/target-this-1.C: New testcase.
>   * testsuite/libgomp.c++/target-this-2.C: New testcase.
>   * testsuite/libgomp.c++/target-this-3.C: New testcase.
>   * testsuite/libgomp.c++/target-this-4.C: New testcase.
>   * testsuite/libgomp.c++/target-this-5.C: New testcase.

> +/* Used to walk OpenMP target directive body.  */
> +
> +struct omp_target_walk_data
> +{
> +  tree current_object;
> +  bool this_expr_accessed;
> 

Re: Dominators question

2021-12-03 Thread Richard Biener via Gcc-patches
On December 3, 2021 3:15:25 PM GMT+01:00, Andrew MacLeod wrote:
>When something like the loop unswitching code adds elements to the CFGs,
>does this invalidate the dominators?  Or are they updated?  Or is it in
>an in-between state?
>
>I'm curious because a) the relation code uses it under the covers, and b)
>I'm looking to add a ranger caching improvement which also uses
>dominators if they are available.
>
>When blocks are added, I wonder what will happen to
>
>   1) dom_info_available_p (CDI_DOMINATORS)  (is it still true), and 
>then what happens to
>
>   2) get_immediate_dominator (CDI_DOMINATORS, bb);  for one of the 
>newly added BBs.

Dominators are generally updated by most high-level CFG manipulations; just the
fast queries are invalidated. If a pass uses a CFG manipulation that does not
update dominators, you will get ICEs or silently wrong code...

Richard. 

>Thanks
>
>Andrew
>



Re: [PATCH] rs6000: Fix use of wrong enum for built-in function code.

2021-12-03 Thread Segher Boessenkool
Hi!

On Thu, Dec 02, 2021 at 04:53:18PM -0600, Bill Schmidt wrote:
> I discovered this bug while working on patches to remove the old built-ins
> infrastructure.  I missed a spot in converting from the rs6000_builtins enum to
> the rs6000_gen_builtins enum.  This fixes it.  The fix is technically not right
> if new_builtins_are_enabled were to be set to zero, but we're not going to do
> that anymore, and the remnants of that code will be removed shortly.

> gcc/
>   * config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Fix builtin
>   identifiers.

What an informative changelog ;-)

Okay for trunk.  Thanks!


Segher


Re: [PATCH v2 0/2] RISC-V: add gcc support for Scalar Cryptography v1.0.0-rc6

2021-12-03 Thread Kito Cheng via Gcc-patches
Hi SiYu:

Committed, thanks!

On Thu, Nov 25, 2021 at 12:42 AM Palmer Dabbelt  wrote:
>
> On Wed, 24 Nov 2021 02:00:33 PST (-0800), Kito Cheng wrote:
> > I would prefer to accept those patchset even with no builtin function
> > or intrinsic function yet,
> > this not only add the support of -march option, but also introduce the
> > predefined macros like __riscv_zk*,
> > which could be used in *.S file to check if those instructions are
> > available or not.
>
> That makes sense, I guess I hadn't thought of that use case.
>
> > On Wed, Nov 24, 2021 at 11:23 AM Palmer Dabbelt  wrote:
> >>
> >> [Changing to Jim's new address]
> >>
> >> On Mon, 22 Nov 2021 00:19:08 PST (-0800), s...@isrc.iscas.ac.cn wrote:
> >> > From: SiYu Wu 
> >> >
> >> > This patch adds GCC back-end support for the RISC-V Scalar Cryptography
> >> > Extension (k-ext), including the machine description, builtin definitions and
> >> > testcases for each k-ext subset.
> >> >
> >> > A note about Zbkx: Zbkx should be implemented as part of bitmanip's Zbp, but
> >> > since Zbp is not included in bitmanip spec v1.0, and crypto's v1.0
> >> > release will come earlier than bitmanip's next release, we are
> >> > implementing it here for now.
> >> >
> >> > Version logs:
> >> >
> >> > v2: As Kito mentioned, this patch now only includes the arch-string-related
> >> > changes; the builtins and md changes are not included, pending the builtins
> >> > and intrinsics being added to the spec.  Also removed the unnecessary patches
> >> > and added ChangeLogs.
> >>
> >> I don't think there's anything wrong with what's here, but IMO we should
> >> hold off on merging until GCC does something with these extensions.
> >>
> >> IIUC all this enables is passing "-march=*Zk*" instead of
> >> "-Wa,-march=*Zk*", and while that is useful I'm worried it'll just make
> >> more of a headache for users who lose a simple way to detect the
> >> intrinsics.  IMO forcing users to pass -Wa properly encodes the "GCC
> >> doesn't support these, but binutils does" scenario pretty sanely, and
> >> users doing things at this level of complexity should be used to that
> >> already because it happens somewhat frequently.
> >>
> >> I'm not sure if I'm missing some use case for this, though.
> >>
> >> > SiYu Wu (2):
> >> >   RISC-V: Add option defines for Scalar Cryptography
> >> >   RISC-V: Add implied defines of Zk, Zkn and Zks
> >> >
> >> >  gcc/common/config/riscv/riscv-common.c | 38 +-
> >> >  gcc/config/riscv/arch-canonicalize | 16 ++-
> >> >  gcc/config/riscv/riscv-opts.h  | 22 +++
> >> >  gcc/config/riscv/riscv.opt |  3 ++
> >> >  4 files changed, 77 insertions(+), 2 deletions(-)


Re: [Patch 3/8, Arm, GCC] Add option -mbranch-protection. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

2021-12-03 Thread Richard Earnshaw via Gcc-patches




On 28/10/2021 12:42, Tejas Belagod via Gcc-patches wrote:




-Original Message-
From: Richard Earnshaw 
Sent: Monday, October 11, 2021 1:58 PM
To: Tejas Belagod ; gcc-patches@gcc.gnu.org
Subject: Re: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.

On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:

Hi,

Add -mbranch-protection option and its associated parsing routines.
This option enables the code-generation of pointer signing and
authentication instructions in function prologues and epilogues.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* common/config/arm/arm-common.c
 (arm_print_hit_for_pacbti_option): New.
 (arm_progress_next_token): New.
 (arm_parse_pac_ret_clause): New routine for parsing the
pac-ret clause for -mbranch-protection.
(arm_parse_pacbti_option): New routine to parse all the options
to -mbranch-protection.
* config/arm/arm-protos.h (arm_parse_pacbti_option): Export.
* config/arm/arm.c (arm_configure_build_target): Handle option
to -mbranch-protection.
* config/arm/arm.opt (mbranch-protection): New.
(arm_enable_pacbti): New.



You're missing documentation for invoke.texi.

Also, how does this differ from the existing option in aarch64?  Can the code
from that be adapted to be made common to both targets rather than doing
a new implementation?

Finally, there are far too many manifest constants in this patch; they need
replacing with enums or #defines as appropriate if we cannot share the
aarch64 code.


Thanks for the reviews.

Add -mbranch-protection option.  This option enables the code-generation of
pointer signing and authentication instructions in function prologues and
epilogues.

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm.c (arm_configure_build_target): Parse and validate
-mbranch-protection option and initialize appropriate data structures.
* config/arm/arm.opt: New option -mbranch-protection.


.../arm.opt (-mbranch-protection) : New option.


* doc/invoke.texi: Document -mbranch-protection.


.../invoke.texi (Arm Options): Document it.



Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap



+@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}][+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]
+@opindex mbranch-protection
+Select the branch protection features to use.
+@samp{none} is the default and turns off all types of branch protection.
+@samp{standard} turns on all types of branch protection features.  If a feature
+has additional tuning options, then @samp{standard} sets it to its standard
+level.
+@samp{pac-ret[+@var{leaf}]} turns on return address signing to its standard
+level: signing functions that save the return address to memory (non-leaf
+functions will practically always do this).  The optional argument @samp{leaf}
+can be used to extend the signing to include leaf functions.
+@samp{bti} turns on branch target identification mechanism.

This doesn't really use the right documentation style.  Closer would be:

===
@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}][+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]
@opindex mbranch-protection
Enable branch protection features (armv8.1-m.main only).
@samp{none} generate code without branch protection or return address 
signing.
@samp{standard[+@var{leaf}]} generate code with all branch protection 
features enabled at their standard level.
@samp{pac-ret[+@var{leaf}]} generate code with return address signing 
set to its standard level, which is to sign all functions that save the 
return address to memory.
@samp{leaf} When return address signing is enabled, also sign leaf 
functions even if they do not write the return address to memory.
@samp{bti} Add landing-pad instructions at the permitted targets of
indirect branch instructions.


If the @samp{+pacbti} architecture extension is not enabled, then all 
branch protection and return address signing operations are constrained 
to use only the instructions defined in the architectural-NOP space. 
The generated code will remain backwards-compatible with earlier 
versions of the architecture, but the additional security can be enabled 
at run time on processors that support the @samp{PACBTI} extension.


Branch target enforcement using BTI can only be enabled at runtime if 
all code in the application has been compiled with at least 
@samp{-mbranch-protection=bti}.


The default is to generate code without branch protection or return 
address signing.



R.


[PATCH 2/2] RISC-V: Minimal support of vector extensions

2021-12-03 Thread Kito Cheng
gcc/ChangeLog:

* common/config/riscv/riscv-common.c (riscv_implied_info): Add
vector extensions.
(riscv_ext_version_table): Add version info for vector extensions.
(riscv_ext_flag_table): Add option mask for vector extensions.
* config/riscv/riscv-opts.h (MASK_VECTOR_EEW_32): New.
(MASK_VECTOR_EEW_64): New.
(MASK_VECTOR_EEW_FP_32): New.
(MASK_VECTOR_EEW_FP_64): New.
(MASK_ZVL32B): New.
(MASK_ZVL64B): New.
(MASK_ZVL128B): New.
(MASK_ZVL256B): New.
(MASK_ZVL512B): New.
(MASK_ZVL1024B): New.
(MASK_ZVL2048B): New.
(MASK_ZVL4096B): New.
(MASK_ZVL8192B): New.
(MASK_ZVL16384B): New.
(MASK_ZVL32768B): New.
(MASK_ZVL65536B): New.
(TARGET_ZVL32B): New.
(TARGET_ZVL64B): New.
(TARGET_ZVL128B): New.
(TARGET_ZVL256B): New.
(TARGET_ZVL512B): New.
(TARGET_ZVL1024B): New.
(TARGET_ZVL2048B): New.
(TARGET_ZVL4096B): New.
(TARGET_ZVL8192B): New.
(TARGET_ZVL16384B): New.
(TARGET_ZVL32768B): New.
(TARGET_ZVL65536B): New.
* config/riscv/riscv.opt (Mask(VECTOR)): New.
(riscv_vector_eew_flags): New.
(riscv_zvl_flags): New.

gcc/testsuite/ChangeLog:

* testsuite/gcc.target/riscv/predef-14.c: New.
* testsuite/gcc.target/riscv/predef-15.c: Ditto.
* testsuite/gcc.target/riscv/predef-16.c: Ditto.
---
 gcc/common/config/riscv/riscv-common.c | 86 
 gcc/config/riscv/riscv-opts.h  | 31 
 gcc/config/riscv/riscv.opt |  8 ++
 gcc/testsuite/gcc.target/riscv/predef-14.c | 83 
 gcc/testsuite/gcc.target/riscv/predef-15.c | 91 ++
 gcc/testsuite/gcc.target/riscv/predef-16.c | 91 ++
 6 files changed, 390 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-16.c

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index 31b1c833965..5cf15024aef 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -50,6 +50,38 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"d", "f"},
   {"f", "zicsr"},
   {"d", "zicsr"},
+
+  {"v", "zvl128b"},
+  {"v", "zve64d"},
+
+  {"zve32f", "f"},
+  {"zve64f", "f"},
+  {"zve64d", "d"},
+
+  {"zve32x", "zvl32b"},
+  {"zve32f", "zve32x"},
+  {"zve32f", "zvl32b"},
+
+  {"zve64x", "zve32x"},
+  {"zve64x", "zvl64b"},
+  {"zve64f", "zve32f"},
+  {"zve64f", "zve64x"},
+  {"zve64f", "zvl64b"},
+  {"zve64d", "zve64f"},
+  {"zve64d", "zvl64b"},
+
+  {"zvl64b", "zvl32b"},
+  {"zvl128b", "zvl64b"},
+  {"zvl256b", "zvl128b"},
+  {"zvl512b", "zvl256b"},
+  {"zvl1024b", "zvl512b"},
+  {"zvl2048b", "zvl1024b"},
+  {"zvl4096b", "zvl2048b"},
+  {"zvl8192b", "zvl4096b"},
+  {"zvl16384b", "zvl8192b"},
+  {"zvl32768b", "zvl16384b"},
+  {"zvl65536b", "zvl32768b"},
+
   {NULL, NULL}
 };
 
@@ -101,11 +133,34 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
   {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
 
+
+  {"v",   ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbs", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zve32x", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zve32f", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zve32d", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zve64x", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zve64f", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zve64d", ISA_SPEC_CLASS_NONE, 1, 0},
+
+  {"zvl32b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl64b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl128b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl256b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl512b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl1024b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl2048b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl4096b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl8192b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl16384b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -940,6 +995,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"f", &gcc_options::x_target_flags, MASK_HARD_FLOAT},
   {"d", &gcc_options::x_target_flags, MASK_DOUBLE_FLOAT},
   {"c", &gcc_options::x_target_flags, MASK_RVC},
+  {"v", &gcc_options::x_target_flags, MASK_VECTOR},
 
   {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
@@ -949,6 +1005,36 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
   {"zbs",&gcc_options:

[PATCH 1/2] RISC-V: Allow extension name contain digit

2021-12-03 Thread Kito Cheng
The RISC-V spec previously only allowed alphabetical extension names; however,
the vector extension adds several extensions whose names contain digits, so we
extend the naming rule.

Ref:
https://github.com/riscv/riscv-isa-manual/pull/718

gcc/ChangeLog:

* common/config/riscv/riscv-common.c
(riscv_subset_list::parse_multiletter_ext): Allow extension names to
contain digits.
---
 gcc/common/config/riscv/riscv-common.c | 42 +++---
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index b8dd0aeac3e..31b1c833965 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -760,24 +760,58 @@ riscv_subset_list::parse_multiletter_ext (const char *p,
   bool explicit_version_p = false;
   char *ext;
   char backup;
+  size_t len;
+  size_t end_of_version_pos, i;
+  bool found_any_number = false;
+  bool found_minor_version = false;
 
-  while (*++q != '\0' && *q != '_' && !ISDIGIT (*q))
+  /* Parse until end of this extension including version number.  */
+  while (*++q != '\0' && *q != '_')
;
 
   backup = *q;
   *q = '\0';
-  ext = xstrdup (subset);
+  len = q - subset;
   *q = backup;
 
+  end_of_version_pos = len;
+  /* Find the begin of version string.  */
+  for (i = len -1; i > 0; --i)
+   {
+ if (ISDIGIT (subset[i]))
+   {
+ found_any_number = true;
+ continue;
+   }
+ /* Might be version seperator, but need to check one more char,
+we only allow p, so we could stop parsing if found
+any more `p`.  */
+ if (subset[i] == 'p' &&
+ !found_minor_version &&
+ found_any_number && ISDIGIT (subset[i-1]))
+   {
+ found_minor_version = true;
+ continue;
+   }
+
+ end_of_version_pos = i + 1;
+ break;
+   }
+
+  backup = subset[end_of_version_pos];
+  subset[end_of_version_pos] = '\0';
+  ext = xstrdup (subset);
+  subset[end_of_version_pos] = backup;
+
   end_of_version
-   = parsing_subset_version (ext, q, &major_version, &minor_version,
+   = parsing_subset_version (ext, subset + end_of_version_pos, &major_version, &minor_version,
  /* std_ext_p= */ false, &explicit_version_p);
   free (ext);
 
   if (end_of_version == NULL)
return NULL;
 
-  *q = '\0';
+  subset[end_of_version_pos] = '\0';
 
   if (strlen (subset) == 1)
{
-- 
2.34.0



[PATCH 0/2] RISC-V: Vector extensions support

2021-12-03 Thread Kito Cheng
This patch set adds basic -march option support and feature-test macros for
the vector extensions, and also extends the arch-string syntax for the vector
extensions.  It would be better to change the RISC-V ISA manual first, but we
have not received a response [1] yet, and that would block all of the vector
extensions, so we decided to fix it here first.

[1] https://github.com/riscv/riscv-isa-manual/pull/718





Re: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

2021-12-03 Thread Jeff Law via Gcc-patches




On 11/29/2021 4:51 PM, Navid Rahimi wrote:

Jeff,

Sorry for bringing back this thread.

Quick question: you mentioned checking TYPE_PRECISION to make sure the type
is a canonical Boolean type (and not one of the fancy signed/unsigned Boolean types
from Ada that Andrew mentioned).
But I noticed that truth_valued_p does already check for:
(if (INTEGRAL_TYPE_P (type) && TYPE_PRECISION (type) == 1))

So in this case, no other Boolean type should fall into
truth_valued_p [1]? Is that right?

Yes, correct.

Jeff



Re: [PATCH] Improve AutoFDO count propagation algorithm

2021-12-03 Thread Jeff Law via Gcc-patches




On 12/2/2021 7:53 PM, Eugene Rozenfeld via Gcc-patches wrote:

When a basic block A has been annotated with a count and it has only one
successor (or predecessor) B, we can propagate A's count to B.
Without this change, the algorithm could leave B without an annotation if B had
other unannotated predecessors (or successors). For example, in the test case I
added, the loop header block was left unannotated, which prevented loop unrolling.

gcc/ChangeLog:
 * auto-profile.c (afdo_propagate_edge): Improve count propagation algorithm.

gcc/testsuite/ChangeLog:
 * gcc.dg/tree-prof/init-array.c: New test for unrolling inner loops.

Seems quite sensible.  I can also easily argue this is a bugfix, even
though there isn't a BZ associated with this issue that I'm aware of.


OK for the trunk.

Thanks,
Jeff



Re: [Patch 2/8, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti. [Was RE: [Patch 1/7, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti.]

2021-12-03 Thread Richard Earnshaw via Gcc-patches




On 28/10/2021 12:41, Tejas Belagod via Gcc-patches wrote:




-Original Message-
From: Richard Earnshaw 
Sent: Monday, October 11, 2021 1:29 PM
To: Tejas Belagod ; gcc-patches@gcc.gnu.org
Subject: Re: [Patch 1/7, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti.

On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:

Hi,

This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
This feature enables pointer signing and authentication instructions
on M-class architectures.

Tested on arm-none-eabi. OK for trunk?

2021-10-04  Tejas Belagod  

gcc/Changelog:

* config/arm/arm-cpus.in: Define new feature pacbti.
* config/arm/arm.h (TARGET_HAVE_PACBTI): New.



"+pacbti" needs to be documented in invoke.texi at the appropriate place.



Thanks for the reviews.

This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
This feature enables pointer signing and authentication instructions
on M-class architectures.

2021-10-25  Tejas Belagod  

gcc/Changelog:

* config/arm/arm-cpus.in: Define new feature pacbti.
* config/arm/arm.h (TARGET_HAVE_PACBTI): New.
* doc/invoke.texi: Document new feature pacbti.


This isn't in the correct style:

gcc/Changelog:

* config/arm/arm.h (TARGET_HAVE_PACBTI): New macro.
* config/arm/arm-cpus.in (pacbti): New feature.
* doc/invoke.texi (Arm Options): Document it.

would be better.




Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap



R.


Otherwise OK.

R.


Re: [Patch] libgomp.texi: Update OMP_PLACES

2021-12-03 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 20, 2021 at 12:54:05PM +0200, Tobias Burnus wrote:
> libgomp/ChangeLog:
> 
>   * libgomp.texi (OMP_PLACES): Extend description for OMP 5.1 changes.
> 
> diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
> index e9fa8ba0bf7..aee82ef2ba2 100644
> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi
> @@ -2031,25 +2031,33 @@ When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
>  @table @asis
>  @item @emph{Description}:
>  The thread placement can be either specified using an abstract name or by an
> -explicit list of the places.  The abstract names @code{threads}, @code{cores}
> -and @code{sockets} can be optionally followed by a positive number in
> -parentheses, which denotes the how many places shall be created.  With
> -@code{threads} each place corresponds to a single hardware thread; @code{cores}
> -to a single core with the corresponding number of hardware threads; and with
> -@code{sockets} the place corresponds to a single socket.  The resulting
> -placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
> -variable.
> +explicit list of the places.  The abstract names @code{threads}, @code{cores},
> +@code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
> +followed by a positive number in parentheses, which denotes the how many places
> +shall be created.  With @code{threads} each place corresponds to a single
> +hardware thread; @code{cores} to a single core with the corresponding number of
> +hardware threads; with @code{sockets} the place corresponds to a single
> +socket; with @code{ll_caches} to a set of cores that shares the last level
> +cache on the device; and @code{numa_domains} to a set of cores for which their
> +closest memory on the device is the same meory and at a similar distance from

s/meory/memory/

> +the cores.  The resulting placement can be shown by setting the
> +@env{OMP_DISPLAY_ENV} environment variable.
>  
>  Alternatively, the placement can be specified explicitly as comma-separated
>  list of places.  A place is specified by set of nonnegative numbers in curly
> -braces, denoting the denoting the hardware threads.  The hardware threads
> +braces, denoting the denoting the hardware threads.  (The curly braces can be

Preexisting issue, "denoting the " is repeated twice, can you please fix
that?

> +omitted when only a single number has been specified.)  The hardware threads

Also, I wouldn't add ()s around the above sentence.

>  belonging to a place can either be specified as comma-separated list of
>  nonnegative thread numbers or using an interval.  Multiple places can also be
>  either specified by a comma-separated list of places or by an interval.  To
> -specify an interval, a colon followed by the count is placed after after
> +specify an interval, a colon followed by the count is placed after
>  the hardware thread number or the place.  Optionally, the length can be
>  followed by a colon and the stride number -- otherwise a unit stride is
> -assumed.  For instance, the following specifies the same places list:
> +assumed. Placing an exclamation mark (@code{!}) directly before a curly

Two spaces after .

> +brace or numbers inside the curley braces (excluding intervals) will

s/curley/curly/

> +exclude those hardware threads.
> +
> +For instance, the following specifies the same places list:
>  @code{"@{0,1,2@}, @{3,4,6@}, @{7,8,9@}, @{10,11,12@}"};
>  @code{"@{0:3@}, @{3:3@}, @{7:3@}, @{10:3@}"}; and @code{"@{0:2@}:4:3"}.
>  

Otherwise LGTM.

Jakub



Re: [PATCH v2] regrename: Skip renaming if instruction is noop move.

2021-12-03 Thread Jeff Law via Gcc-patches




On 12/2/2021 9:26 PM, Jojo R wrote:

Skip renaming if the instruction is a no-op move, as it will
be removed for performance.

gcc/
* regrename.c (find_rename_reg): Return satisfied regno
if instruction is noop move.

OK
jeff



Re: [PR103028] test ifcvt trap_if seq more strictly after reload

2021-12-03 Thread Jeff Law via Gcc-patches




On 12/3/2021 2:21 AM, Alexandre Oliva via Gcc-patches wrote:

When -fif-conversion2 is enabled, we attempt to replace conditional
branches around unconditional traps with conditional traps.  That
canonicalizes compares, which may change an immediate that barely fits
into one that doesn't.

The compare for the trap is first checked using the cbranch
predicates, and then the compare and conditional trap insns are
emitted and recognized.

In the failing s390x testcase, i <=u 0x_ is canonicalized into
i

OK.  I wouldn't be surprised if there are other cases where we're using
recog_memoized post-reload when we should be using valid_insn_p.


Jeff


Re: [Patch 1/8, Arm, AArch64, GCC] Refactor mbranch-protection option parsing and make it common to AArch32 and AArch64 backends. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

2021-12-03 Thread Richard Earnshaw via Gcc-patches




On 30/11/2021 11:11, Andrea Corallo via Gcc-patches wrote:

Tejas Belagod via Gcc-patches  writes:


Ping for this series.

Thanks,
Tejas.


Hi all,

pinging this series.

BR

   Andrea



It would be easier to find the 'series' if the messages were properly 
threaded together...


R.


Re: [Patch 1/8, Arm, AArch64, GCC] Refactor mbranch-protection option parsing and make it common to AArch32 and AArch64 backends. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

2021-12-03 Thread Richard Earnshaw via Gcc-patches




On 10/11/2021 13:55, Andrea Corallo via Gcc-patches wrote:

Tejas Belagod via Gcc-patches  writes:

[...]


This change refactors all the mbranch-protection option parsing code and types
to make it common to both AArch32 and AArch64 backends.  This change also pulls
in some supporting types from AArch64 to make it common
(aarch_parse_opt_result).  The significant changes in this patch are the
movement of all branch protection parsing routines from aarch64.c to
aarch-common.c and supporting data types and static data structures.  This
patch also pre-declares the variables and types that the aarch32 back end needs
for the moved function sign scope and key variables, to prepare for the impending
series of patches that add parsing of -mbranch-protection to the aarch32 back end.

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.c: Include aarch-common.h.
(all_architectures): Fix comment.
(aarch64_parse_extension): Rename return type, enum value names.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Rename
factored out aarch_ra_sign_scope and aarch_ra_sign_key variables.
Also rename corresponding enum values.
* config/aarch64/aarch64-opts.h (aarch64_function_type): Factor out
aarch64_function_type and move it to common code as aarch_function_type
in aarch-common.h.
* config/aarch64/aarch64-protos.h: Include common types header, move out
types aarch64_parse_opt_result and aarch64_key_type to aarch-common.h
* config/aarch64/aarch64.c: Move mbranch-protection parsing types and
functions out into aarch-common.h and aarch-common.c.  Fix up all the name
changes resulting from the move.
* config/aarch64/aarch64.md: Fix up aarch64_ra_sign_key type name change
and enum value.
* config/aarch64/aarch64.opt: Include aarch-common.h to import type move.
Fix up name changes from factoring out common code and data.
* config/arm/aarch-common-protos.h: Export factored out routines to both
backends.
* config/arm/aarch-common.c: Include newly factored out types.  Move all
mbranch-protection code and data structures from aarch64.c.
* config/arm/aarch-common.h: New header that declares types shared between
aarch32 and aarch64 backends.
* config/arm/arm-protos.h: Declare types and variables that are made common
to aarch64 and aarch32 backends - aarch_ra_sign_key, aarch_ra_sign_scope and
aarch_enable_bti.


Tested the following configurations. OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.


Hi Tejas,

going through the code I've spotted a couple of indentation nits that I
guess are coming from the original source that was moved.


diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c


[...]


+  /* Copy the last processed token into the argument to pass it back.
+Used by option and attribute validation to print the offending token.  */
+  if (last_str)
+{
+  if (str) strcpy (*last_str, str);
+  else *last_str = NULL;


I think we should have new lines after both if and else here.


Agreed.  This doesn't match the GNU style.




+}


There should also be a blank line here, before the next if clause.


+  if (res == AARCH_PARSE_OK)
+{
+  /* If needed, alloc the accepted string then copy in const_str.
+   Used by override_option_after_change_1.  */
+  if (!accepted_branch_protection_string)
+   accepted_branch_protection_string = (char *) xmalloc (
+ BRANCH_PROTECT_STR_MAX
+   + 1);

^^
 Indentation


It would be best to split this just before the '='; then the rest of
the statement should fit on one line.




+  strncpy (accepted_branch_protection_string, const_str,
+   BRANCH_PROTECT_STR_MAX + 1);

^^
 Same

+  /* Forcibly null-terminate.  */
+  accepted_branch_protection_string[BRANCH_PROTECT_STR_MAX] = '\0';
+}
+  return res;
+}


Thanks

   Andrea



+++ b/gcc/config/arm/aarch-common.h
@@ -0,0 +1,73 @@
+/* Types shared between arm and aarch64.
+
+   Copyright (C) 1991-2021 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+

I think this should be

/* Types shared between arm and aarch64.

   Copyright (C) 2009-2021 Free Software Foundation, Inc.
   Contributed by Arm Ltd.

Since all of the code has been derived from the aarch64 port.

-struct aarch64_branch_protect_type
-{
-  /* The type's name that the user passes to the branch-protection option
-strin

Re: [patch] Fortran/OpenMP: Support most of 5.1 atomic extensions

2021-12-03 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 15, 2021 at 12:29:31PM +0100, Tobias Burnus wrote:
> --- a/gcc/fortran/dump-parse-tree.c
> +++ b/gcc/fortran/dump-parse-tree.c
> @@ -1926,6 +1930,22 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
>fputc (' ', dumpfile);
>fputs (memorder, dumpfile);
>  }
> +  if (omp_clauses->fail != OMP_MEMORDER_UNSET)
> +{
> +  const char *memorder;
> +  switch (omp_clauses->fail)
> + {
> + case OMP_MEMORDER_ACQ_REL: memorder = "ACQ_REL"; break;

No need for the above line.

> + case OMP_MEMORDER_ACQUIRE: memorder = "AQUIRE"; break;
> + case OMP_MEMORDER_RELAXED: memorder = "RELAXED"; break;
> + case OMP_MEMORDER_RELEASE: memorder = "RELEASE"; break;

Nor the above line.  They aren't allowed for the fail clause and
you already reject them during parsing, so the default: gcc_unreachable ();
can handle it fine.

> + case OMP_MEMORDER_SEQ_CST: memorder = "SEQ_CST"; break;
> + default: gcc_unreachable ();

> @@ -1449,8 +1452,9 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
>gcc_checking_assert (OMP_MASK1_LAST <= 64 && OMP_MASK2_LAST <= 64);
>*cp = NULL;
>while (1)
> -{
> -  if ((first || gfc_match_char (',') != MATCH_YES)
> +{ 

Why the added trailing whitespace after { ?

> +  match m = MATCH_NO;
> +  if ((first || (m = gfc_match_char (',')) != MATCH_YES)
> && (needs_space && gfc_match_space () != MATCH_YES))
>   break;
>needs_space = false;
> @@ -1460,7 +1464,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
>gfc_omp_namelist **head;
>old_loc = gfc_current_locus;
>char pc = gfc_peek_ascii_char ();
> -  match m;
> +  if (pc == '\n' && m == MATCH_YES)
> + {
> +   gfc_error ("Clause expected at %C after tailing comma");

Do you mean trailing ?

> +   if ((mask & OMP_CLAUSE_FAIL)
> +   && (m = gfc_match_dupl_check (c->fail == OMP_MEMORDER_UNSET,
> + "fail", true)) != MATCH_NO)
> + {
> +   if (m == MATCH_ERROR)
> + goto error;
> +   if (gfc_match ("seq_cst") == MATCH_YES)
> + c->fail = OMP_MEMORDER_SEQ_CST;
> +   else if (gfc_match ("acquire") == MATCH_YES)
> + c->fail = OMP_MEMORDER_ACQUIRE;
> +   else if (gfc_match ("relaxed") == MATCH_YES)
> + c->fail = OMP_MEMORDER_RELAXED;
> +   else
> + {
> +   gfc_error ("Expected SEQ_CST, ACQUIRE or RELAXED at %C");
> +   break;
> + }

Here is where you make sure c->fail isn't OMP_MEMORDER_{RELEASE,ACQ_REL} ...

> -static void
> +/*static */ void
>  resolve_omp_atomic (gfc_code *code)

Why?

> +  if (stmt && !capture_stmt && next->block->block)
> + {
> +   if (next->block->block->expr1)
> + {
> +   gfc_error ("Expected ELSE at %L in atomic compare capture",
> +   &next->block->block->expr1->where);
> + }

No {}s around single statement body.

> @@ -4508,6 +4508,17 @@ gfc_trans_omp_atomic (gfc_code *code)
>  case OMP_MEMORDER_SEQ_CST: mo = OMP_MEMORY_ORDER_SEQ_CST; break;
>  default: gcc_unreachable ();
>  }
> +  switch (atomic_code->ext.omp_clauses->fail)
> +{
> +case OMP_MEMORDER_UNSET: fail_mo = OMP_FAIL_MEMORY_ORDER_UNSPECIFIED; break;
> +case OMP_MEMORDER_ACQ_REL: fail_mo = OMP_FAIL_MEMORY_ORDER_RELAXED; break;
> +case OMP_MEMORDER_ACQUIRE: fail_mo = OMP_FAIL_MEMORY_ORDER_ACQUIRE; break;
> +case OMP_MEMORDER_RELAXED: fail_mo = OMP_FAIL_MEMORY_ORDER_RELAXED; break;
> +case OMP_MEMORDER_RELEASE: fail_mo = OMP_FAIL_MEMORY_ORDER_RELEASE; break;
> +case OMP_MEMORDER_SEQ_CST: fail_mo = OMP_FAIL_MEMORY_ORDER_SEQ_CST; break;

Again, no reason to handle OMP_MEMORDER_ACQ_REL and OMP_MEMORDER_RELEASE
above.
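The reviewer's point, that only SEQ_CST, ACQUIRE and RELAXED can reach these switches because the parser rejects the rest, can be sketched as follows. The enum is a hypothetical stand-in for gfortran's memory-order enum, and abort () plays the role of gcc_unreachable ().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>   /* strcmp, for checking the returned names.  */

/* Hypothetical stand-in for the front end's memory-order values.  */
enum memorder
{
  MEMORDER_UNSET, MEMORDER_ACQ_REL, MEMORDER_ACQUIRE, MEMORDER_RELAXED,
  MEMORDER_RELEASE, MEMORDER_SEQ_CST
};

/* Name of a FAIL memory order.  Only the three values accepted by the
   parser get a case; anything else indicates a bug.  */
static const char *
fail_memorder_name (enum memorder mo)
{
  switch (mo)
    {
    case MEMORDER_ACQUIRE: return "ACQUIRE";
    case MEMORDER_RELAXED: return "RELAXED";
    case MEMORDER_SEQ_CST: return "SEQ_CST";
    default: abort ();   /* Unreachable: rejected during parsing.  */
    }
}
```

Dropping the ACQ_REL and RELEASE cases keeps the switch in sync with what the parser can actually produce.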

Otherwise LGTM.

Jakub



Dominators question

2021-12-03 Thread Andrew MacLeod via Gcc-patches
When something like the loop unswitching code adds blocks to the CFG,
does this invalidate the dominators?  Or are they updated?  Or are they
left in an in-between state?


I'm curious because a) the relation code uses them under the covers, and b)
I'm looking to add a ranger caching improvement which also uses
dominators if they are available.


When blocks are added, I wonder what will happen to:

  1) dom_info_available_p (CDI_DOMINATORS): is it still true?

  2) get_immediate_dominator (CDI_DOMINATORS, bb) for one of the
     newly added BBs.


Thanks

Andrew



Re: [PATCH] Loop unswitching: support gswitch statements.

2021-12-03 Thread Andrew MacLeod via Gcc-patches

On 12/2/21 11:02, Martin Liška wrote:

On 12/2/21 15:27, Andrew MacLeod wrote:
ranger->gori().outgoing_edge_range_p (irange &r, edge e, tree name, 
*get_global_range_query ());


Thank you! It works for me!

Martin

BTW, this applies to names not on the stmt as well.  The function
returns TRUE if an outgoing range can be calculated, and FALSE if
not.  So:


a = b + 20
if (a > 40)    <- returns TRUE for outgoing range of 'a' or 'b'
    {
   c = foo()
   if (c > 60)   <- returns false for 'a' or 'b'
   {
   if (b < 100)   <- Also returns TRUE for 'a' or 'b'

The final block, from the EVRP dump:

     :
    if (b_4(D) <= 99)
  goto ; [INV]
    else
  goto ; [INV]

4->5  (T) b_4(D) :  unsigned int [21, 99]
4->5  (T) a_5 : unsigned int [41, 119]
4->6  (F) b_4(D) :  unsigned int [100, 4294967275]
4->6  (F) a_5 : unsigned int [120, +INF]

Shows that it has a range for both 'a' and 'b'.
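The listing's numbers can be reproduced with a toy interval model. This is not ranger's API or irange; it is a sketch that assumes b starts with no known range, that a = b + 20 in 32-bit unsigned arithmetic, and that no interval involved wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Toy closed interval [lo, hi] of 32-bit unsigned values.  */
typedef struct { uint32_t lo, hi; } urange;

static urange
urange_intersect (urange x, urange y)
{
  urange r = { x.lo > y.lo ? x.lo : y.lo, x.hi < y.hi ? x.hi : y.hi };
  assert (r.lo <= r.hi);   /* This toy model cannot represent an empty range.  */
  return r;
}

/* R + K; valid only when no element wraps around.  */
static urange
urange_add (urange r, uint32_t k)
{
  assert (r.hi <= UINT32_MAX - k);
  urange s = { r.lo + k, r.hi + k };
  return s;
}

/* R - K; valid only when no element wraps around.  */
static urange
urange_sub (urange r, uint32_t k)
{
  assert (r.lo >= k);
  urange s = { r.lo - k, r.hi - k };
  return s;
}
```

Usage: the true edge of (a > 40) gives a in [41, +INF], hence b = a - 20 in [21, 4294967275]. Intersecting with the two sides of (b <= 99) and recomputing a gives the four 4->5 and 4->6 lines above.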

Just to point out that you can use this query on any conditional edge,
even if the name isn't directly mentioned on the stmt.  So if you are keying off
'a', you could simply ask outgoing_edge_range_p ( , a,...) rather
than parsing the condition to see if 'a' is on it.  If there is no
chance that 'a' is affected by the block, it just returns false.


When it's called directly like this, it picks up no ranges from outside
the basic block you are querying, except via the range query you
provide.  The listing shows the default, which is whatever ranger knows.
So if you provided get_global_range_query, those ranges would instead be
something like:


4->5  (T) b_4(D) :  unsigned int [0, 99]
4->5  (T) a_5 : unsigned int [21, 119]
4->6  (F) b_4(D) :  unsigned int [100, +INF]
4->6  (F) a_5 : unsigned int [0,20] [120, +INF-20 ]

It may simplify things a little: if you are unswitching on 'a', you can
just ask each block with a condition whether the range of 'a' can be modified.


Andrew.





PING^1 [PATCH] x86: Add -mmove-max=bits and -mstore-max=bits

2021-12-03 Thread H.J. Lu via Gcc-patches
On Thu, Nov 25, 2021 at 2:47 PM H.J. Lu  wrote:
>
> Add -mmove-max=bits and -mstore-max=bits to enable 256-bit/512-bit move
> and store, independent of -mprefer-vector-width=bits:
>
> 1. Add X86_TUNE_AVX512_MOVE_BY_PIECES and X86_TUNE_AVX512_STORE_BY_PIECES
> which are enabled for Intel Sapphire Rapids processor.
> 2. Add -mmove-max=bits to set the maximum number of bits that can be moved from
> memory to memory efficiently.  The default value is derived from
> X86_TUNE_AVX512_MOVE_BY_PIECES, X86_TUNE_AVX256_MOVE_BY_PIECES, and the
> preferred vector width.
> 3. Add -mstore-max=bits to set the maximum number of bits that can be stored to
> memory efficiently.  The default value is derived from
> X86_TUNE_AVX512_STORE_BY_PIECES, X86_TUNE_AVX256_STORE_BY_PIECES, and the
> preferred vector width.
>
> gcc/
>
> PR target/103269
> * config/i386/i386-expand.c (ix86_expand_builtin): Pass PVW_NONE
> and PVW_NONE to ix86_target_string.
> * config/i386/i386-options.c (ix86_target_string): Add arguments
> for move_max and store_max.
> (ix86_target_string::add_vector_width): New lambda.
> (ix86_debug_options): Pass ix86_move_max and ix86_store_max to
> ix86_target_string.
> (ix86_function_specific_print): Pass ptr->x_ix86_move_max and
> ptr->x_ix86_store_max to ix86_target_string.
> (ix86_valid_target_attribute_tree): Handle x_ix86_move_max and
> x_ix86_store_max.
> (ix86_option_override_internal): Set the default x_ix86_move_max
> and x_ix86_store_max.
> * config/i386/i386-options.h (ix86_target_string): Add
> prefer_vector_width and prefer_vector_width.
> * config/i386/i386.h (TARGET_AVX256_MOVE_BY_PIECES): Removed.
> (TARGET_AVX256_STORE_BY_PIECES): Likewise.
> (MOVE_MAX): Use 64 if ix86_move_max or ix86_store_max ==
> PVW_AVX512.  Use 32 if ix86_move_max or ix86_store_max >=
> PVW_AVX256.
> (STORE_MAX_PIECES): Use 64 if ix86_store_max == PVW_AVX512.
> Use 32 if ix86_store_max >= PVW_AVX256.
> * config/i386/i386.opt: Add -mmove-max=bits and -mstore-max=bits.
> * config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES): New.
> (X86_TUNE_AVX512_STORE_BY_PIECES): Likewise.
> * doc/invoke.texi: Document -mmove-max=bits and -mstore-max=bits.
>
> gcc/testsuite/
>
> PR target/103269
> * gcc.target/i386/pieces-memcpy-17.c: New test.
> * gcc.target/i386/pieces-memcpy-18.c: Likewise.
> * gcc.target/i386/pieces-memcpy-19.c: Likewise.
> * gcc.target/i386/pieces-memcpy-20.c: Likewise.
> * gcc.target/i386/pieces-memcpy-21.c: Likewise.
> * gcc.target/i386/pieces-memset-45.c: Likewise.
> * gcc.target/i386/pieces-memset-46.c: Likewise.
> * gcc.target/i386/pieces-memset-47.c: Likewise.
> * gcc.target/i386/pieces-memset-48.c: Likewise.
> * gcc.target/i386/pieces-memset-49.c: Likewise.
> ---
>  gcc/config/i386/i386-expand.c |  1 +
>  gcc/config/i386/i386-options.c| 75 +--
>  gcc/config/i386/i386-options.h|  6 +-
>  gcc/config/i386/i386.h| 18 ++---
>  gcc/config/i386/i386.opt  |  8 ++
>  gcc/config/i386/x86-tune.def  | 10 +++
>  gcc/doc/invoke.texi   | 13 
>  .../gcc.target/i386/pieces-memcpy-17.c| 16 
>  .../gcc.target/i386/pieces-memcpy-18.c| 16 
>  .../gcc.target/i386/pieces-memcpy-19.c| 16 
>  .../gcc.target/i386/pieces-memcpy-20.c| 16 
>  .../gcc.target/i386/pieces-memcpy-21.c| 16 
>  .../gcc.target/i386/pieces-memset-45.c| 16 
>  .../gcc.target/i386/pieces-memset-46.c| 17 +
>  .../gcc.target/i386/pieces-memset-47.c| 17 +
>  .../gcc.target/i386/pieces-memset-48.c| 17 +
>  .../gcc.target/i386/pieces-memset-49.c| 16 
>  17 files changed, 276 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-17.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-18.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-19.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-20.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memcpy-21.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-45.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-46.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-47.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-48.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pieces-memset-49.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 0d5d1a0e205..7e77ff56ddc 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -12295,6 +1229
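The MOVE_MAX and STORE_MAX_PIECES selection described in the ChangeLog above can be sketched as plain functions. The enum mirrors the PVW_* names used there, with the ordering PVW_NONE < PVW_AVX128 < PVW_AVX256 < PVW_AVX512 assumed; the sketch_* functions are hypothetical, and 0 stands for "fall back to the existing default", which is not modeled here.

```c
#include <assert.h>

/* Assumed ordering of the preferred-vector-width values.  */
enum prefer_vector_width { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

/* Bytes per piece for memory-to-memory moves, per the MOVE_MAX entry.  */
static int
sketch_move_max (enum prefer_vector_width move_max,
                 enum prefer_vector_width store_max)
{
  if (move_max == PVW_AVX512 || store_max == PVW_AVX512)
    return 64;
  if (move_max >= PVW_AVX256 || store_max >= PVW_AVX256)
    return 32;
  return 0;   /* Existing default, not modeled.  */
}

/* Bytes per piece for stores, per the STORE_MAX_PIECES entry.  */
static int
sketch_store_max_pieces (enum prefer_vector_width store_max)
{
  if (store_max == PVW_AVX512)
    return 64;
  if (store_max >= PVW_AVX256)
    return 32;
  return 0;   /* Existing default, not modeled.  */
}
```

Note that MOVE_MAX considers both settings, so a wide -mstore-max alone is enough to raise it.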

Re: [PATCH] PR fortran/103505 - ICE in compare_bound_mpz_t, at fortran/resolve.c:4587

2021-12-03 Thread Mikael Morin

On 02/12/2021 22:48, Harald Anlauf via Fortran wrote:

Dear Fortranners,

specifying invalid constant array declarations (e.g. real array bounds)
could lead to an ICE when the array specs were checked for consistency.
A possible solution, suggested by Steve, is to try an early simplification
before doing that check.

However, regtesting revealed that bad declarations involving e.g.
arithmetic errors, like division by zero, were now handled differently.
(The relevant testcase was gfortran.dg/arith_divide_2.f90.)

I therefore added a new function gfc_try_simplify_expr that accepts the
simplification only if no error occurs.  This still allows arithmetic
errors to be discovered at later stages, and the new function is used here.
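The "accept the simplification only if no error occurs" idea can be sketched with a toy expression type. This is not gfortran's gfc_try_simplify_expr, whose signature the message does not show; toy_expr, toy_simplify and toy_try_simplify are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy expression: a constant integer division NUM / DEN.  */
typedef struct { bool folded; long num, den, value; } toy_expr;

/* Hypothetical simplifier: folds the division, or returns false on an
   arithmetic error such as division by zero.  */
static bool
toy_simplify (toy_expr *e)
{
  if (e->den == 0)
    return false;
  e->value = e->num / e->den;
  e->folded = true;
  return true;
}

/* Work on a copy and commit only on success, so the original expression
   survives unchanged and its error can still be diagnosed later.  */
static bool
toy_try_simplify (toy_expr *e)
{
  toy_expr copy = *e;
  if (!toy_simplify (&copy))
    return false;
  *e = copy;
  return true;
}
```

Working on a copy also protects against a simplifier that partially modifies its argument before failing.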

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


OK. Thanks


Re: [PATCH v2] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers

2021-12-03 Thread Richard Sandiford via Gcc-patches
Andrew Pinski via Gcc-patches  writes:
> On Thu, Nov 18, 2021 at 5:55 PM apinski--- via Gcc-patches
>  wrote:
>>
>> From: Andrew Pinski 
>>
>> The problem here is that aarch64_expand_setmem does not change the alignment
>> for the strict alignment case.  This is a simplified version of what I had
>> previously.
>> So constraining copy_limit to the alignment of the mem in the case of strict
>> align fixes the issue without too many other changes to the code.
>>
>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> Ping?
>
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64.c (aarch64_expand_setmem): Constrain
>> copy_limit to the alignment of the mem if STRICT_ALIGNMENT is
>> true.

This looks correct, but for a modified version of the testcase in the PR:

void f(char *x) { __builtin_memset (x, 0, 216); }

we'll now emit 216 STRB instructions, which seems a bit excessive.

I think in practice the code has only been tuned on targets that
support LDP/STP Q, so how about moving the copy_limit calculation
further up and doing:

  unsigned max_set_size = (copy_limit * 8) / BITS_PER_UNIT;

?

It would be good to have a scan-assembler-not testcase for the testsuite.
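Why the 216-byte memset becomes 216 STRBs, and how the suggested bound avoids it, can be modeled roughly as follows. This is a simplified sketch, not the aarch64 expander. It assumes byte alignment of the destination, that sizes above max_set_size are not expanded inline, and it ignores the overlapping tail stores the real code can use; all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_UNIT 8

/* Chunk size in bits as in the patch: 128 (TImode) when LDP/STP Q is
   disabled by tuning, else 256, clamped to the destination alignment
   under strict alignment.  */
static int
sketch_copy_limit (bool no_ldp_stp_qregs, bool strict_align, int mem_align_bits)
{
  int copy_limit = no_ldp_stp_qregs ? 128 : 256;
  if (strict_align && mem_align_bits < copy_limit)
    copy_limit = mem_align_bits;
  return copy_limit;
}

/* The suggested bound: at most eight chunks are set inline.  */
static unsigned
sketch_max_set_size (int copy_limit)
{
  return (copy_limit * 8) / SKETCH_BITS_PER_UNIT;
}

/* Stores needed for N bytes in COPY_LIMIT-bit chunks, without overlap.  */
static unsigned
sketch_naive_store_count (unsigned n, int copy_limit)
{
  unsigned chunk = copy_limit / SKETCH_BITS_PER_UNIT;
  return (n + chunk - 1) / chunk;
}
```

With 8-bit alignment the chunk is one byte, so 216 bytes means 216 stores, while the bound drops to 8 bytes and the call would no longer be expanded inline. Without strict alignment the bound stays at 256 bytes.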

Thanks,
Richard


>> ---
>>  gcc/config/aarch64/aarch64.c | 13 ++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 7389b5953dc..e9c2e89d8ce 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -23744,9 +23744,16 @@ aarch64_expand_setmem (rtx *operands)
>>/* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
>>   AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  setmem expand
>>   pattern is only turned on for TARGET_SIMD.  */
>> -  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
>> - & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
>> - ? GET_MODE_BITSIZE (TImode) : 256;
>> +  int copy_limit;
>> +
>> +  if (aarch64_tune_params.extra_tuning_flags
>> +  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
>> +copy_limit = GET_MODE_BITSIZE (TImode);
>> +  else
>> +copy_limit = 256;
>> +
>> +  if (STRICT_ALIGNMENT)
>> +copy_limit = MIN (copy_limit, (int)MEM_ALIGN (dst));
>>
>>while (n > 0)
>>  {
>> --
>> 2.17.1
>>


Re: [PATCH 2/5]AArch64 sve: combine nested if predicates

2021-12-03 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> This hashing looks unnecessarily complex.  The values we're hashing are
>> vector SSA_NAMEs, so I think we should be able to hash and compare them
>> as a plain pair of pointers.
>> 
>> The type could then be std::pair and the hashing could be done using
>> pair_hash from hash-traits.h.
>> 
>
> Fancy.. TIL...
>
> Here's the respun patch.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-linux-gnu
> with no issues.

LGTM, just some very minor nits…

> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * tree-vect-stmts.c (prepare_load_store_mask): Rename to...
>   (prepare_vec_mask): ...This and record operations that have already been
>   masked.
>   (vectorizable_call): Use it.
>   (vectorizable_operation): Likewise.
>   (vectorizable_store): Likewise.
>   (vectorizable_load): Likewise.
>   * tree-vectorizer.h (class _loop_vec_info): Add vec_cond_masked_set.
>   (vec_cond_masked_set_type, tree_cond_mask_hash,
>   vec_cond_masked_key): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/pred-combine-and.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c b/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c
> new file mode 100644
> index 
> ..ee927346abe518caa3cba397b11dfd1ee7e93630
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-combine-and.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +void f5(float * restrict z0, float * restrict z1, float *restrict x, float * restrict y, float c, int n)
> +{
> +for (int i = 0; i < n; i++) {
> +float a = x[i];
> +float b = y[i];
> +if (a > b) {
> +z0[i] = a + b;
> +if (a > c) {
> +z1[i] = a - b;
> +}
> +}
> +}
> +}
> +
> +/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-9]+\.s, p[0-9]+/z, z[0-9]+\.s, z[0-9]+\.s} 2 } } */
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 2284ad069e4d521f4e0bd43d34181a258cd636ef..2a02ff0b1e53be6eda49770b240f8f58f3928a87 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1796,23 +1796,30 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype,
>  /* Return the mask input to a masked load or store.  VEC_MASK is the vectorized
> form of the scalar mask condition and LOOP_MASK, if nonnull, is the mask
> that needs to be applied to all loads and stores in a vectorized loop.
> -   Return VEC_MASK if LOOP_MASK is null, otherwise return VEC_MASK & LOOP_MASK.
> +   Return VEC_MASK if LOOP_MASK is null or if VEC_MASK is already masked,
> +   otherwise return VEC_MASK & LOOP_MASK.
>  
> MASK_TYPE is the type of both masks.  If new statements are needed,
> insert them before GSI.  */
>  
>  static tree
> -prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
> -  gimple_stmt_iterator *gsi)
> +prepare_vec_mask (loop_vec_info loop_vinfo, tree mask_type, tree loop_mask,
> +   tree vec_mask, gimple_stmt_iterator *gsi)
>  {
>gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
>if (!loop_mask)
>  return vec_mask;
>  
>gcc_assert (TREE_TYPE (loop_mask) == mask_type);
> +
> +  vec_cond_masked_key cond (vec_mask, loop_mask);
> +  if (loop_vinfo->vec_cond_masked_set.contains (cond))
> +return vec_mask;
> +

Guess this is pushing a personal preference, sorry, but now that
the code is C++11, I think we should use:

  if (loop_vinfo->vec_cond_masked_set.contains ({ vec_mask, loop_mask }))
return vec_mask;

for cases like this, avoiding the need for the separate “cond” variable.
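The suggested design, a set keyed on a plain pair of pointers that is hashed and compared by identity, can be sketched in C. GCC itself would use hash_set with pair_hash from hash-traits.h; this fixed-size, linear-probing table and its names (ptr_pair, pair_set_add, pair_set_contains) are hypothetical. The C99 compound literal plays the role of the C++11 braced pair in the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Key: an ordered pair of pointers, compared by identity.  */
typedef struct { const void *first, *second; } ptr_pair;

#define PAIR_SET_SIZE 64   /* Power of two; fixed size for the sketch.  */

typedef struct
{
  ptr_pair slot[PAIR_SET_SIZE];
  bool used[PAIR_SET_SIZE];
} pair_set;

static size_t
pair_hash (ptr_pair k)
{
  uintptr_t a = (uintptr_t) k.first, b = (uintptr_t) k.second;
  return (size_t) (a ^ (b * 31u) ^ (b >> 4));   /* Order-sensitive mix.  */
}

static bool
pair_equal (ptr_pair x, ptr_pair y)
{
  return x.first == y.first && x.second == y.second;
}

/* Linear probing: the slot holding K, or the first free slot.
   Assumes the table never fills up.  */
static size_t
pair_find_slot (const pair_set *s, ptr_pair k)
{
  size_t i = pair_hash (k) & (PAIR_SET_SIZE - 1);
  while (s->used[i] && !pair_equal (s->slot[i], k))
    i = (i + 1) & (PAIR_SET_SIZE - 1);
  return i;
}

static void
pair_set_add (pair_set *s, ptr_pair k)
{
  size_t i = pair_find_slot (s, k);
  s->slot[i] = k;
  s->used[i] = true;
}

static bool
pair_set_contains (const pair_set *s, ptr_pair k)
{
  return s->used[pair_find_slot (s, k)];
}
```

Usage: after pair_set_add (&set, (ptr_pair) { vec_mask, loop_mask }), the same pair is found, but the swapped pair is not.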

>tree and_res = make_temp_ssa_name (mask_type, NULL, "vec_mask_and");
>gimple *and_stmt = gimple_build_assign (and_res, BIT_AND_EXPR,
> vec_mask, loop_mask);
> +
>gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
>return and_res;
>  }
> @@ -3526,8 +3533,9 @@ vectorizable_call (vec_info *vinfo,
> gcc_assert (ncopies == 1);
> tree mask = vect_get_loop_mask (gsi, masks, vec_num,
> vectype_out, i);
> -   vargs[mask_opno] = prepare_load_store_mask
> - (TREE_TYPE (mask), mask, vargs[mask_opno], gsi);
> +   vargs[mask_opno] = prepare_vec_mask
> + (loop_vinfo, TREE_TYPE (mask), mask,
> +  vargs[mask_opno], gsi);
>   }
>  
> gcall *call;
> @@ -3564,8 +3572,8 @@ vectorizable_call (vec_info *vinfo,
> tree mask = vect_get_loop_mask (gsi, masks, ncopies,
> vectype_out, j);
>  

Re: [PATCH] gcc: vxworks: fix providing stdint.h header

2021-12-03 Thread Olivier Hainque via Gcc-patches



> On 3 Dec 2021, at 11:27, Rasmus Villemoes  wrote:
> 
> Reverting my fix and applying this on top of my gcc-11.2 based branch
> seems to work. I haven't used the compiler for building or running any
> modules (don't have the hardware handy), but I've done an 'objdump -d'
> comparison on all the generated host binaries and target libraries with
> no diff.
> 
> So OK by me.


Thanks for checking. Bootstrap and regression tests passed
on a native x86_64-linux host.

Alex, how does that look to you?

Thanks in advance!




[committed] testsuite: Fix up pr103456.c testcase [PR103456]

2021-12-03 Thread Jakub Jelinek via Gcc-patches
On Wed, Dec 01, 2021 at 02:01:59PM +0530, Siddhesh Poyarekar wrote:
>   PR tree-optimization/103456
>   * gcc.dg/ubsan/pr103456.c: New test.

ubsan.exp cycles through torture options, and those include
-O2 -flto -fno-fat-lto-objects.  But with those options, tree dump
scans don't work for post-IPA passes: for dg-do compile tests, nothing
after IPA is done.  So we get an unresolved testcase:
gcc.dg/ubsan/pr103456.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  : dump file does not exist
UNRESOLVED: gcc.dg/ubsan/pr103456.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-tree-dump-not objsz1 "maximum object size 0"

Fixed by adding -ffat-lto-objects so that we perform the post-IPA
passes.

Tested on x86_64-linux, committed to trunk as obvious.

2021-12-03  Jakub Jelinek  

PR tree-optimization/103456
* gcc.dg/ubsan/pr103456.c: Add -ffat-lto-objects to dg-options.

--- gcc/testsuite/gcc.dg/ubsan/pr103456.c.jj	2021-12-01 10:03:55.404029919 +0100
+++ gcc/testsuite/gcc.dg/ubsan/pr103456.c	2021-12-03 12:06:57.613750230 +0100
@@ -1,6 +1,6 @@
 /* PR tree-optimization/103456 */
 /* { dg-do compile } */
-/* { dg-options "-fsanitize=undefined -O -fdump-tree-objsz" } */
+/* { dg-options "-fsanitize=undefined -O -fdump-tree-objsz -ffat-lto-objects" } */
 
 static char *multilib_options = "m64/m32";
 


Jakub



[PATCH] libcpp: Fix up handling of deferred pragmas [PR102432]

2021-12-03 Thread Jakub Jelinek via Gcc-patches
Hi!

The https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557903.html
change broke the following testcases.  The problem is when a pragma
namespace allows expansion (i.e. p->is_nspace && p->allow_expansion),
e.g. the omp or acc namespaces do, then when parsing the second pragma
token we do it with pfile->state.in_directive set,
pfile->state.prevent_expansion clear and pfile->state.in_deferred_pragma
clear (the last one because we don't know yet if it will be a deferred
pragma or not).  If the pragma line only contains a single name
followed by a newline, and there exists a function-like macro with the
same name, the preprocessor needs to peek at the next token in
funlike_invocation_p to see whether it is '(', but in this case it will
see a newline.
As pfile->state.in_directive is set, we don't read anything after the
newline, pfile->buffer->need_line is set and CPP_EOF is lexed, which
funlike_invocation_p doesn't push back.  Because name is a function-like
macro and on the pragma line there is no ( after the name, it isn't
expanded, and control flow returns to do_pragma.  If name is valid
deferred pragma, we set pfile->state.in_deferred_pragma (and really
need it set so that e.g. end_directive later on doesn't eat all the
tokens from the pragma line).

Before Nathan's change (which unfortunately didn't explain why it is
better to do it that way), this wasn't a problem: the next
_cpp_lex_direct call, made when we want the next token, would return
CPP_PRAGMA_EOL when it saw buffer->need_line, which would turn off
pfile->state.in_deferred_pragma, and the following get-token call would
already read the next line.  But Nathan's patch replaced that with an
assertion, which now fails, and CPP_PRAGMA_EOL is produced only when lexing
the '\n'.  Except for this special case that works fine, but here it
doesn't, because when peeking at the token we still didn't know that it
would be a deferred pragma.
I've tried to fix that up in do_pragma by detecting this and pushing
CPP_PRAGMA_EOL as lookahead, but that doesn't work because end_directive
still needs to see pfile->state.in_deferred_pragma set.

So, this patch effectively reverts part of Nathan's change: CPP_PRAGMA_EOL
is no longer added only when parsing the '\n', but in both places, in the
first one instead of the assertion failure.
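The state transition the patch adds to _cpp_lex_direct can be modeled in isolation. This is a heavily simplified, hypothetical model (toy_reader, toy_lex_direct), not libcpp code; the fields only mirror the libcpp state named in the description.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOK_EOF, TOK_PRAGMA_EOL, TOK_OTHER } tok_type;

/* Hypothetical mirror of the relevant reader state.  */
typedef struct
{
  bool need_line;               /* buffer->need_line */
  bool in_deferred_pragma;      /* state.in_deferred_pragma */
  bool pragma_allow_expansion;  /* state.pragma_allow_expansion */
  int prevent_expansion;        /* state.prevent_expansion */
} toy_reader;

/* When a new line is needed while still inside a deferred pragma, end
   the pragma with a PRAGMA_EOL token instead of asserting.  */
static tok_type
toy_lex_direct (toy_reader *r)
{
  if (r->need_line)
    {
      if (r->in_deferred_pragma)
        {
          r->in_deferred_pragma = false;
          if (!r->pragma_allow_expansion)
            r->prevent_expansion--;
          return TOK_PRAGMA_EOL;
        }
      /* Stands in for the _cpp_get_fresh_line path, modeled as end of
         input.  */
      return TOK_EOF;
    }
  return TOK_OTHER;
}
```

In the '#pragma omp loop' case the peek that found need_line set happened before in_deferred_pragma was set, so the next lex call now ends the pragma cleanly.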

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-12-03  Jakub Jelinek  

PR preprocessor/102432
* lex.c (_cpp_lex_direct): If buffer->need_line while
pfile->state.in_deferred_pragma, return CPP_PRAGMA_EOL token instead
of assertion failure.

* c-c++-common/gomp/pr102432.c: New test.
* c-c++-common/goacc/pr102432.c: New test.

--- libcpp/lex.c.jj	2021-12-01 10:21:16.808869062 +0100
+++ libcpp/lex.c	2021-12-02 10:38:23.010621589 +0100
@@ -3530,7 +3530,21 @@ _cpp_lex_direct (cpp_reader *pfile)
   buffer = pfile->buffer;
   if (buffer->need_line)
 {
-  gcc_assert (!pfile->state.in_deferred_pragma);
+  if (pfile->state.in_deferred_pragma)
+   {
+ /* This can happen in cases like:
+#define loop(x) whatever
+#pragma omp loop
+where when trying to expand loop we need to peek
+next token after loop, but aren't still in_deferred_pragma
+mode but are in in_directive mode, so buffer->need_line
+is set, a CPP_EOF is peeked.  */
+ result->type = CPP_PRAGMA_EOL;
+ pfile->state.in_deferred_pragma = false;
+ if (!pfile->state.pragma_allow_expansion)
+   pfile->state.prevent_expansion--;
+ return result;
+   }
   if (!_cpp_get_fresh_line (pfile))
{
  result->type = CPP_EOF;
--- gcc/testsuite/c-c++-common/gomp/pr102432.c.jj	2021-12-02 10:42:23.627138897 +0100
+++ gcc/testsuite/c-c++-common/gomp/pr102432.c	2021-12-02 10:41:54.163565353 +0100
@@ -0,0 +1,23 @@
+/* PR preprocessor/102432 */
+
+#define loop(x)
+
+void
+foo (void)
+{
+  int i;
+#pragma omp parallel
+#pragma omp loop
+  for (i = 0; i < 64; i++)
+;
+}
+
+void
+bar (void)
+{
+  int i;
+  _Pragma ("omp parallel")
+  _Pragma ("omp loop")
+  for (i = 0; i < 64; i++)
+;
+}
--- gcc/testsuite/c-c++-common/goacc/pr102432.c.jj	2021-12-02 10:43:10.663458092 +0100
+++ gcc/testsuite/c-c++-common/goacc/pr102432.c	2021-12-02 10:43:24.791253606 +0100
@@ -0,0 +1,23 @@
+/* PR preprocessor/102432 */
+
+#define loop(x)
+
+void
+foo (void)
+{
+  int i;
+#pragma acc parallel
+#pragma acc loop
+  for (i = 0; i < 64; i++)
+;
+}
+
+void
+bar (void)
+{
+  int i;
+  _Pragma ("acc parallel")
+  _Pragma ("acc loop")
+  for (i = 0; i < 64; i++)
+;
+}

Jakub



Re: [PATCH] gcc: vxworks: fix providing stdint.h header

2021-12-03 Thread Rasmus Villemoes
On 02/12/2021 16.29, Olivier Hainque wrote:
> Hi Rasmus,
> 
> Some news on this.
> 
>> On 20 Nov 2021, at 09:07, Olivier Hainque  wrote:
>>
>> I'll check how our build sequence proceeds.
> 
> Turns out our build succeeds thanks to the presence
> of a vendor version of stdint.h in the VxWorks6/7 header dirs
> and we're lucky that the need to pre-include yvals.h isn't
> showing up during the libraries' build.
> 
> It is of course not good to use one version during the build
> of libraries then install an alternate version that will be
> used by programs afterwards.
> 
> The attached patch achieves the same kind of thing you
> initiated, only reusing a method previously introduced
> for glimits.h instead of adding a new use_gcc_stdint value,
> which seems a bit less intrusive to me.
> 
> This introduces an indirect dependency on the VxWorks version.h
> for vxcrtstuff objects, for which we then need to apply the same
> tricks as for libgcc2 regarding include paths (to select the system
> header instead of the gcc one).
> 
> I have had a few successful builds and tests with this,
> for both VxWorks 6 and VxWorks 7 configurations.

Reverting my fix and applying this on top of my gcc-11.2 based branch
seems to work. I haven't used the compiler for building or running any
modules (don't have the hardware handy), but I've done an 'objdump -d'
comparison on all the generated host binaries and target libraries with
no diff.

So OK by me.

Thanks,
Rasmus


Re: [PATCH] [i386] Prefer INT_SSE_REGS for SSE_FLOAT_MODE_P in preferred_reload_class.

2021-12-03 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 3, 2021 at 7:19 AM liuhongt  wrote:
>
> Hi:
> > Please also consider TARGET_INTER_UNIT_MOVES_TO_VEC and
> > TARGET_INTER_UNIT_MOVES_FROM_VEC.
> Here's the updated patch.
>
> Also honor TARGET_INTER_UNIT_MOVES_TO/FROM_VEC in
> preferred_{,out_}reload_class.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32\ -march=k8,\ 
> -march=k8}.
> Ok?
>
> gcc/ChangeLog:
>
> PR target/95740
> * config/i386/i386.c (ix86_preferred_output_reload_class):
> don't reload integer register to/from sse register when tune
> "inter_unit_moves_to/from_vec" is off.
> (ix86_preferred_reload_class): Ditto, also prefer
> INT_SSE_REGS for SSE_FLOAT_MODE_P.
> * config/i386/i386.h (INT_SSE_CLASS_P): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr95740.c: New test.

I was thinking about:

--cut here--
@@ -19194,9 +19194,17 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
  return NO_REGS;
}

-  /* Prefer SSE regs only, if we can use them for math.  */
+  /* Prefer SSE if we can use them for math.  Also allow integer regs
+ when moves between register units are cheap.  */
  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
-return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
+{
+  if (TARGET_INTER_UNIT_MOVES_FROM_VEC
+ && TARGET_INTER_UNIT_MOVES_TO_VEC
+ && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (word_mode))
+   return INT_SSE_CLASS_P (regclass) ? regclass : NO_REGS;
+  else
+   return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
+}

  /* Generally when we see PLUS here, it's the function invariant
 (plus soft-fp const_int).  Which can only be computed into general
--cut here--

So, the INT_SSE class is allowed when inter-unit moves are enabled.  The
patch also takes care of 64-bit moves, which are expensive on 32-bit
targets.
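The condition in the proposed hunk for SSE math modes reduces to a small decision, sketched below with hypothetical names. The booleans stand for TARGET_INTER_UNIT_MOVES_TO_VEC and TARGET_INTER_UNIT_MOVES_FROM_VEC, and the sizes for GET_MODE_SIZE (mode) and GET_MODE_SIZE (word_mode).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the two outcomes of the proposal.  */
typedef enum { ALLOW_SSE_ONLY, ALLOW_INT_AND_SSE } sse_math_choice;

/* Integer registers are acceptable only when moves in both directions
   between the units are enabled and the value fits in a word.  */
static sse_math_choice
sketch_sse_math_reload_choice (bool moves_to_vec, bool moves_from_vec,
                               unsigned mode_size, unsigned word_size)
{
  if (moves_to_vec && moves_from_vec && mode_size <= word_size)
    return ALLOW_INT_AND_SSE;
  return ALLOW_SSE_ONLY;
}
```

The size test is what excludes DFmode (8 bytes) on a 32-bit target (4-byte word), while keeping it eligible on x86-64.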

Uros.

> ---
>  gcc/config/i386/i386.c  | 32 +++--
>  gcc/config/i386/i386.h  |  2 ++
>  gcc/testsuite/gcc.target/i386/pr95740.c | 26 
>  3 files changed, 58 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr95740.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 80fee627358..5b90c09a0ba 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -19194,9 +19194,24 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
>return NO_REGS;
>  }
>
> -  /* Prefer SSE regs only, if we can use them for math.  */
> +  /* Unless hard register REGNO is known, it is hard to to tell whether a movd
> + instruction will be generated based on MODE and REGCLASS, because for
> + pseudo-registers, even SFmode could be assigned to INTGER_CLASS_P.  */
> +  if (GENERAL_REG_P (x)
> +  && !TARGET_INTER_UNIT_MOVES_TO_VEC
> +  && MAYBE_SSE_CLASS_P (regclass))
> +return NO_REGS;
> +
> +  if (SSE_REG_P (x)
> +  && !TARGET_INTER_UNIT_MOVES_FROM_VEC
> +  && MAYBE_INTEGER_CLASS_P (regclass))
> +return NO_REGS;
> +
> +  /* Prefer INT_SSE_REGS, enable reload from SSE register to GENERAL_REGS,
> + MAYBE_SSE_CLASS_P is too broad, for sse math, FLOAT_SSE_REGS,
> + FLOAT_INT_SSE_REGS should be disliked.  */
>if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
> -return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
> +return INT_SSE_CLASS_P (regclass) ? regclass : NO_REGS;
>
>/* Generally when we see PLUS here, it's the function invariant
>   (plus soft-fp const_int).  Which can only be computed into general
> @@ -19226,6 +19241,19 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass)
>  static reg_class_t
>  ix86_preferred_output_reload_class (rtx x, reg_class_t regclass)
>  {
> +
> +  /* Handle movement between integer and sse register like
> + ix86_preferred_reload_class.  */
> +  if (GENERAL_REG_P (x)
> +  && !TARGET_INTER_UNIT_MOVES_TO_VEC
> +  && MAYBE_SSE_CLASS_P (regclass))
> +return NO_REGS;
> +
> +  if (SSE_REG_P (x)
> +  && !TARGET_INTER_UNIT_MOVES_FROM_VEC
> +  && MAYBE_INTEGER_CLASS_P (regclass))
> +return NO_REGS;
> +
>/* Restrict the output reload class to the register bank that we are doing
>   math on.  If we would like not to return a subset of CLASS, reject this
>   alternative: if reload cannot do this, it will still use its choice.  */
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 2fda1e0686e..ec90e47904b 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1283,6 +1283,8 @@ enum reg_class
>reg_class_subset_p ((CLASS), FLOAT_REGS)
>  #define SSE_CLASS_P(CLASS) \
>reg_class_subset_p ((CLASS), ALL_SSE_REGS)
> +#define INT_SSE_CLASS_P(CLASS) \
> +  reg_class_subset_p ((CLASS), INT_SSE_REGS)
>  #define MMX_CLASS_P(CLASS) \
>((CLASS) == MMX_REGS)
>  #define MASK_CLASS_P(CLASS) \
> diff --git a/gcc/testsuite/gcc.target/i386/pr95740.c 
> b/gcc/testsuite/gcc.



Re: Improve -fprofile-report

2021-12-03 Thread Jan Hubicka via Gcc-patches
> On 11/27/21 16:56, Jan Hubicka via Gcc-patches wrote:
> > Hi,
> > Profile-report was never properly updated after the switch to the new profile
> > representation.  This patch fixes the way profile mismatches are
> > calculated: we used to collect count and freq mismatches separately,
> > while now we have only counts & probabilities.  So we verify
> > 
> >   - in count: that the total count of incoming edges is close to the actual
> >     count of the BB
> >   - out prob: that the total sum of outgoing edge probabilities is close
> >     to 1 (except for BBs containing noreturn calls or EH).
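A minimal C sketch of the two checks described above, under simplified assumptions: the `sketch_edge` and `sketch_bb` types, the helper names and the tolerance parameters are hypothetical. This is not GCC's `profile_record_check_consistency` code, which works on GCC's own profile count and probability types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified CFG types for illustration only.  */
struct sketch_edge
{
  long long count;   /* Execution count carried by the edge.  */
  double prob;       /* Probability of taking the edge, 0.0 .. 1.0.  */
};

struct sketch_bb
{
  long long count;                 /* Execution count of the block.  */
  const struct sketch_edge *preds; /* Incoming edges.  */
  size_t n_preds;
  const struct sketch_edge *succs; /* Outgoing edges.  */
  size_t n_succs;
  bool noreturn_or_eh;             /* Ends in a noreturn call or EH.  */
};

/* "in count": the summed counts of incoming edges should be close to
   the block's own count.  Blocks without predecessors are skipped.  */
static bool
in_count_mismatch_p (const struct sketch_bb *bb, long long tolerance)
{
  if (bb->n_preds == 0)
    return false;
  long long sum = 0;
  for (size_t i = 0; i < bb->n_preds; i++)
    sum += bb->preds[i].count;
  long long diff = sum > bb->count ? sum - bb->count : bb->count - sum;
  return diff > tolerance;
}

/* "out prob": outgoing edge probabilities should sum to (nearly) 1,
   except for blocks ending in noreturn calls or EH.  */
static bool
out_prob_mismatch_p (const struct sketch_bb *bb, double tolerance)
{
  if (bb->noreturn_or_eh || bb->n_succs == 0)
    return false;
  double sum = 0.0;
  for (size_t i = 0; i < bb->n_succs; i++)
    sum += bb->succs[i].prob;
  double diff = sum > 1.0 ? sum - 1.0 : 1.0 - sum;
  return diff > tolerance;
}
```

Blocks without predecessors (such as an entry block) are simply skipped by the in-count check in this sketch; how GCC itself treats them is not shown here.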
> 
> Hello.
> 
> Can you please CC me when you mention me in an email?
Sorry - I meant to do that but forgot.  It was quite a long debugging
session, since the profile stats code had been somewhat confused since the
last revamp, and it took me a long while to make sense of the results.
> 
> The version you sent is different from what was installed :)

There were more problems in the way profile stats were collected - I
remember writing an email about the updated patch, but it seems it was never
sent.  I am attaching the version of the patch I installed. 

The main change was fixing the way profile mismatches are accounted, since
they used to be accounted with an off-by-one error (a bug introduced a
few years back).  I also added the dump file numbers to make it
easier to find the dumps.
> 
> Pass dump id and name |static mismatch    |dynamic mismatch    |overall
>                       |in count |out prob |in count |out prob  |size     |time
>  15t cfg              |  0      |  0      |0        |0         |165834   |495010
>  17t ompexp           |  0      |  0      |0        |0         |165834   |495010
>  18t walloca          |  0      |  0      |0        |0         |165834   |495010
>  19i visibility       |  0      |  0      |0        |0         |165834   |495010
>  20i build_ssa_passes |  0      |  0      |0        |0         |165834   |495010
> ...
> 
> Can you please rename it to the same format we use for dump files, e.g. 
> 018t.walloca1 ?
> It would be easier for people finding the corresponding dump file.

Do you know how to get that name?
With the numbers it is not too hard to find the dump, but I do not mind
having it either way.
> > 
> > Maritnj: I think we want to track, for start
> 
> You likely mean me, right?

Yep, sorry - it was a long day :(
> 
> > 
> >                     |static mismatch    |dynamic mismatch                |overall
> >                     |in count |out prob |in count             |out prob  |size          |time
> >   1) fixup_cfg      |19 +13   |57 +5    |65581029 -158744835  |0         |34292 +27.9%  |73900655012 -7.8%
> >   2) loop           |612      |24       |861403844            |0         |25182         |59589705822
> >   3) waccess        |817 -10  |26       |994609654 -2636320   |2199665   |33382         |38968666048
> >   4) into_cfglayout |792      |26       |982479501            |2199665   |245988 -1.2%  |287195553058 -0.9%
> >   5) alignments     |792      |22       |1077517775           |750472    |311396        |289957699542
> > 
> > 1) is the situation after IPA passes, 2) is just before loop optimizations,
> > 3) is the end of the GIMPLE optimization queue, 4) is just after expansion,
> > and 5) is the end of RTL.
> > For each of these we can record:
> >   - in count mismatches: 19
> >   - out probability mismatches: 57
> >   - dynamic in count mismatches: 65581029
> >   - dynamic out probability mismatches: 0
> >   - overall size: 34292
> >   - overall time: 73900655012
> > (values are from fixup_cfg stats).
> > Would that be reasonable?
> 
> Yes, I'm going to add that.

Thanks!
Honza


gcc/ChangeLog:

2021-11-28  Jan Hubicka  

* cfghooks.c: Include sreal.h, profile.h.
(profile_record_check_consistency): Fix checking of count consistency;
record also dynamic mismatches.
* cfgrtl.c (rtl_account_profile_record): Similarly.
* tree-cfg.c (gimple_account_profile_record): Likewise.
* cfghooks.h (struct profile_record): Remove num_mismatched_freq_in,
num_mismatched_freq_out, turn time to double, add
dyn_

[PR103028] test ifcvt trap_if seq more strictly after reload

2021-12-03 Thread Alexandre Oliva via Gcc-patches


When -fif-conversion2 is enabled, we attempt to replace conditional
branches around unconditional traps with conditional traps.  That
canonicalizes compares, which may change an immediate that barely fits
into one that doesn't.
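To illustrate how canonicalization can push an immediate out of range, here is a small C sketch. It assumes, purely for illustration, a target whose compare-immediate field holds an unsigned 16-bit value, and the rewrite of `x <=u C` into `x <u C+1`; the helper names are hypothetical, and neither the range nor the exact rewrite is taken from the s390x backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical immediate field: unsigned 16 bits.  */
static bool
fits_uimm16_p (uint64_t c)
{
  return c <= 0xFFFF;
}

/* For unsigned x, (x <= C) is equivalent to (x < C + 1) as long as
   C + 1 does not wrap; this returns the new bound.  */
static uint64_t
leu_to_ltu_bound (uint64_t c)
{
  /* x <= UINT64_MAX is always true and has no "< C + 1" form.  */
  assert (c != UINT64_MAX);
  return c + 1;
}
```

With `C = 0xFFFF` the original bound fits the hypothetical field exactly, while the canonicalized bound `0x10000` no longer does, even though both comparisons accept the same values of `x`.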

The compare for the trap is first checked using the cbranch
predicates, and then the compare and conditional trap insns are
emitted and recognized.

In the failing s390x testcase, i <=u 0x_ is canonicalized into
i  t) {
+    unsigned long long ii;
+    asm("":"=g"(ii):"0"(i));
+    if ((ii <= t))
+      __builtin_trap();
+    return x;
+  }
+
+  return 0;
+}


-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PR103149] introduce asmnesia internal function

2021-12-03 Thread Alexandre Oliva via Gcc-patches
On Dec  2, 2021, Richard Biener  wrote:

> While adding ASMNESIA looks OK at this point, I'm not sure about the
> asm handling stuff.  You mention 'reload' above, do you mean LRA?

I meant the allocation was first visible in -fdump-rtl-reload.  I've
added a note about this problem, and a modified testcase, to PR93027
once I realized that the problem was present in an unpatched compiler,
even at -O0.


In the previous patch, I'd already abandoned the intent to use the +X
constraint for ASMNESIA, because it was not useful and created other
problems.  I'd fixed a number of those additional problems in the
recog.c and cfgexpand.c changes, but those turned out not to be needed
to fix the PR once I backpedaled to +g (or +m).  ASMNESIA also turned
out to be unnecessary, so I prepared and retested another version of the
patch that uses the same switch-to-MEM logic I wrote for ASMNESIA in the
previous patch, but now directly in the detach_value implementation
within the harden-conditional passes, using +m and an addressable
temporary if we find that +g won't do.

If you find that ASMNESIA is still useful as a reusable internal
primitive, I can reintroduce it, now or at a later time.  It would bring
some slight codegen benefit, but we could do without it at this stage.


[PR103149] detach values through mem only if general regs won't do

From: Alexandre Oliva 

When hardening compares or conditional branches, we perform redundant
tests, and to prevent them from being optimized out, we use asm
statements that preserve a value used in a compare, but in a way that
the compiler can no longer assume it's the same value, so it can't
optimize the redundant test away.
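The asm idiom described above can be approximated in GNU C as follows. This is a hand-written sketch, not the GIMPLE that the hardening pass emits: `detach_int` and `hardened_lt` are hypothetical names, and the pass's actual choice of which comparison to repeat may differ. The empty asm with an `=g` output tied to input `0` mirrors the `asm ("" : "=g" (ret) : "0" (val))` form used by `detach_value`.

```c
/* An empty asm "copies" VAL to its result.  The compiler must assume
   the asm could have changed the value, so comparisons on the result
   cannot be folded into comparisons on VAL.  */
static inline int
detach_int (int val)
{
  int ret;
  /* "=g": output in a general register or memory;
     "0": the input shares operand 0's location.  */
  __asm__ ("" : "=g" (ret) : "0" (val));
  return ret;
}

/* Hypothetical hardened "<": compute the result, recompute it with
   detached operands, and trap if the two disagree.  */
static int
hardened_lt (int a, int b)
{
  int r = a < b;
  int r2 = detach_int (a) < detach_int (b);
  if (r != r2)
    __builtin_trap ();
  return r;
}
```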

We used to use +g, but that requires general regs or mem.  You might
think that, if a reg constraint can't be satisfied, the register
allocator will fall back to memory, but that's not so: we decide on
matching MEMs very early on, by using the same addressable operand on
both input and output, and only if the constraint does not allow
registers.  If it does, we use gimple registers and then pseudos as
inputs and outputs, and then inputs can be substituted by equivalent
expressions, and then, if no register constraint fits (e.g. because
that mode won't fit in general regs, or won't fit in regs at all), the
register allocator will give up before even trying to allocate some
temporary memory to unify input and output.

This patch arranges for us to create and use the temporary stack slot
if we can tell the mode requires memory, or won't otherwise fit in
general regs, and thus to use +m for that asm.
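A GNU C sketch of the memory fallback, using a GCC vector type as an example of a value that may not fit in general registers on some targets (whether it actually does is target-dependent). The `+m` operand on an addressable local plays the role of the temporary stack slot; `v4sf` and `detach_via_memory` are illustrative names only, not code from the patch.

```c
/* GCC vector extension type: four floats in 16 bytes.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Detach VAL through an addressable temporary.  "+m" forces the value
   into memory, which can hold any mode, and the compiler must assume
   the asm may have modified it.  */
static inline v4sf
detach_via_memory (v4sf val)
{
  v4sf tmp = val;
  __asm__ ("" : "+m" (tmp));
  return tmp;
}
```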


Regstrapped on x86_64-linux-gnu.  Verified that the mentioned aarch64 PR
is fixed.  Also bootstrapping along with a patch that enables both
compare hardening passes.  Ok to install?


for  gcc/ChangeLog

PR middle-end/103149
* gimple-harden-conditionals.cc (detach_value): Use memory if
general regs won't do.

for  gcc/testsuite/ChangeLog

PR middle-end/103149
* gcc.target/aarch64/pr103149.c: New.
---
 gcc/gimple-harden-conditionals.cc   |   67 +--
 gcc/testsuite/gcc.target/aarch64/pr103149.c |   14 ++
 2 files changed, 75 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr103149.c

diff --git a/gcc/gimple-harden-conditionals.cc b/gcc/gimple-harden-conditionals.cc
index cfa2361d65be0..81867d6e4275f 100644
--- a/gcc/gimple-harden-conditionals.cc
+++ b/gcc/gimple-harden-conditionals.cc
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
+#include "rtl.h"
 #include "tree.h"
 #include "fold-const.h"
 #include "gimple.h"
@@ -132,25 +134,78 @@ detach_value (location_t loc, gimple_stmt_iterator *gsip, tree val)
   tree ret = make_ssa_name (TREE_TYPE (val));
   SET_SSA_NAME_VAR_OR_IDENTIFIER (ret, SSA_NAME_IDENTIFIER (val));
 
-  /* Output asm ("" : "=g" (ret) : "0" (val));  */
+  /* Some modes won't fit in general regs, so we fall back to memory
+     for them.  ??? It would be ideal to try to identify an alternate,
+     wider or more suitable register class, and use the corresponding
+     constraint, but there's no logic to go from register class to
+     constraint, even if there is a corresponding constraint, and even
+     if we could enumerate constraints, we can't get to their string
+     either.  So this will do for now.  */
+  bool need_memory = true;
+  enum machine_mode mode = TYPE_MODE (TREE_TYPE (val));
+  if (mode != BLKmode)
+    for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+      if (TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], i)
+          && targetm.hard_regno_mode_ok (i, mode))
+        {
+          need_memory = false;
+          break;
+        }
+
+  tree asminput = val;
+  tree asmoutput = ret;
+  const char *constraint_out = need_memory ? "=m" : "=g";
+  const char *constraint_in = need_memory ? "m" : "0";
+
+  if (need_memory)

Re: Compare guessed profile frequencies to actual profile feedback in profile dump file

2021-12-03 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sun, 28 Nov 2021 19:52:08 +0100
Jan Hubicka via Gcc-patches  wrote:

>  Basic block  136 guessed freq:   17.548 cummulative:  0.60%  feedback freq:   51.848 cummulative:   1.94% cnt: 101811269914

> diff --git a/gcc/profile.c b/gcc/profile.c
> index d07002d265e..dbf42ff7b2b 100644
> --- a/gcc/profile.c
> +++ b/gcc/profile.c

> +   fprintf (dump_file,
> +" Basic block %4i guessed freq: %12.3f"
> +" cummulative:%6.2f%% "
> +" feedback freq: %12.3f cummulative:%7.2f%%"

s/cummulative/cumulative/g with just one "m"

thanks,


Re: Improve -fprofile-report

2021-12-03 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sat, 27 Nov 2021 16:56:32 +0100
Jan Hubicka via Gcc-patches  wrote:

> --- a/gcc/cfghooks.h
> +++ b/gcc/cfghooks.h
> @@ -36,22 +36,25 @@ along with GCC; see the file COPYING3.  If not see
> and one CFG hook per CFG mode.  */
>  struct profile_record
>  {

> -  /* Likewise for a basic block's successors.  */
> -  int num_mismatched_count_out;
> -  /* A weighted cost of the run-time of the function body.  */
> -  gcov_type_unsigned time;
>/* A weighted cost of the size of the function body.  */
>int size;
>/* True iff this pass actually was run.  */
>bool run;
> +  bool fdo;
>  };
>  

fdo seems to be unused; does it belong to some other patch?
thanks,


Re: [PATCH] fix spelling of -linker-output-auto-nolto-rel

2021-12-03 Thread Eric Botcazou via Gcc-patches
> The transposition nolto -> notlo is confusing and it makes the long
> name even harder to read than it already is - I kept reading it as
> "not lo" until I realized it was a simple typo.

Thanks for catching this!

-- 
Eric Botcazou




[PATCH] [Committed] New testcase for C++/71792, bitfields and auto

2021-12-03 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This testcase used to fail before GCC 6.4.0 due to the wrong
type being used for auto when used with bitfields: the C++
front-end was using the "bitfield" type rather than the
underlying type.

Committed the testcase after a quick check.

PR c++/71792

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr71792.C: New test.
---
 gcc/testsuite/g++.dg/torture/pr71792.C | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr71792.C

diff --git a/gcc/testsuite/g++.dg/torture/pr71792.C b/gcc/testsuite/g++.dg/torture/pr71792.C
new file mode 100644
index 000..607774d755d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr71792.C
@@ -0,0 +1,42 @@
+// { dg-do run { target c++11 } }
+// PR C++/71792
+
+class some_class
+{
+public:
+  unsigned int np  : 4;
+  unsigned int nc  : 8;
+  unsigned int nc0 : 1;
+};
+
+template
+static void test_bug (const some_class &mp) {
+  if (what) {
+    int t = 0;
+    for (auto i = mp.nc0; i < mp.nc; i++) {
+      if (t != i) __builtin_abort ();
+      t++;
+    }
+  }
+}
+
+static void test_ok (const some_class &mp) {
+  int t = 0;
+  for (auto i = mp.nc0; i < mp.nc; i++) {
+    if (t != i) __builtin_abort ();
+    t++;
+  }
+}
+
+int main ()
+{
+  some_class mp;
+  mp.nc0 = 0;
+  mp.nc = 9;
+  mp.np = 3;
+
+  test_bug (mp);
+  test_ok (mp);
+
+  return 0;
+}
-- 
2.17.1